Quantopian Lecture Series: Long-Short Equity Algorithm

EDIT: The backtest originally shared with this post was replaced with a more efficient version. The logic wasn't changed but the code was optimized to run more quickly.

I worked with James Christopher to create an updated version of the algorithm from this post:

https://www.quantopian.com/posts/long-short-pipeline-multi-factor

This new version will be attached to the lecture series as the example long-short equity algorithm. Long-short equity strategies have the advantage of trading many names and are therefore statistically robust. They also have very high maximum capital thresholds, but correspondingly high minimum capital thresholds for profitable trading. For a fuller explanation of these strategies, please see Lecture 17.

I recommend that folks try swapping in their own ranking methodology, as that is the key component. The rebalancing, ordering, and other components are fairly modular and should not need to be updated too much. If you have a new investment thesis or pricing model, that can be inserted simply by changing the custom factors and the ranking logic.
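As a platform-independent illustration of what "swapping in your own ranking methodology" means, here is a minimal sketch in plain pandas; the factor names and values below are made up for the example and are not part of the original algorithm:

```python
import pandas as pd

def rank_stocks(factors: pd.DataFrame) -> pd.Series:
    """Combine factor columns into one composite ranking.

    `factors` is indexed by security, one column per factor. Replacing
    this function is the "ranking methodology" swap described above.
    """
    # Z-score each factor cross-sectionally so they are comparable,
    # then average into a composite score and rank it.
    zscores = (factors - factors.mean()) / factors.std()
    return zscores.mean(axis=1).rank()

# Toy universe of five securities with two hypothetical factors
factors = pd.DataFrame({
    'value':    [0.2, -0.1, 0.4, 0.0, -0.3],
    'momentum': [1.1,  0.5, -0.2, 0.3, 0.8],
}, index=['A', 'B', 'C', 'D', 'E'])

ranks = rank_stocks(factors)
longs = set(ranks.nlargest(2).index)    # top of the ranking -> long book
shorts = set(ranks.nsmallest(2).index)  # bottom of the ranking -> short book
```

A new investment thesis then only changes `rank_stocks`; the long/short book construction around it stays the same.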

NOTE: This algorithm is not intended to perform consistently over all time periods. Ranking schemes have predictive lifecycles, and the goal of someone trying to use this algorithm to trade profitably now should be to find a predictive ranking scheme that will work in the future. For more information on this please see Lecture 18.

Clone Algorithm (994 clones)
# Backtest ID: 56a8157633749711029e987b
We have migrated this algorithm to work with a new version of the Quantopian API. The code is different than the original version, but the investment rationale of the algorithm has not changed. We've put everything you need to know here on one page.
Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

72 responses

Thank you very much for this template!
I don't understand one thing:

In this example the number of stocks to hold is 200 on the long side and 200 on the short side, but if I run the backtest in either minute or daily mode, the number of stocks creeps up to more than 550 within a few years, and so does the leverage, which reaches values around 1.3.

Isn't this an issue when evaluating an algo that is supposed to hold 400 stocks, when instead we have more than 550-600 stocks in the portfolio?
Thank you

Answering my own question...
I think the problem is the number of delisted stocks. In fact, if I modify before_trading_start by adding these lines (note they require import datetime) before updating the universe:

context.output['end_date'] = [stock.end_date for stock in context.output.index]
# Keep only stocks whose end_date is more than a year out
context.output = context.output[context.output['end_date'] > (get_datetime() + datetime.timedelta(days=365))]

I get a perfectly flat num_positions line equal to 400, with longs and shorts at 200 each.
So now the question is: does anyone have a way to filter the pipeline output to reduce the risk of delisted stocks without introducing this lookahead error?

And by the way, what happens to the backtest values when the portfolio includes so many delisted stocks?

Delaney Granizo-Mackenzie,

It took about 4 hours to run a backtest of your algo in minute mode over a full market cycle with the default slippage model enabled.
Are the results in line with Quantopian's goal?
If not, there are two possible problems:
Either the default slippage model is not realistic. Then shouldn't Quantopian fix that first?
Or the long-short strategy in this code is practically not making any money. Then why did Quantopian put it up as the example in the lecture series?
The leverage problem was mentioned by Giuseppe Moriconi.

Clone Algorithm (22 clones)
# Backtest ID: 56a48ddae746f511a3eb7ea9

Hi Giuseppe,

The issue of de-listings has been festering for a while (e.g. https://www.google.com/?gws_rd=ssl#q=quantopian+delisted). The sticky problem is that if a position is held in a stock after it is delisted, there is no way to close it out, so you end up with a build-up of dead stocks (once the stock's event stream shuts down, no transactions can be made). The problem is not only with backtesting, but also with Quantopian live paper trading (since it is basically backtesting with a different data feed).

I have to think for the most part, de-listings are not without warning. For example, I see that Nasdaq publishes relevant lists: http://www.nasdaq.com/markets/go-public.aspx. My hunch is that there is a regulatory process for de-listing that all exchanges follow. For example, if I look up BOULDER BRANDS, INC. on http://www.sec.gov/edgar/searchedgar/companysearch.html, I find that SEC Form 25 was filed on 2016-01-15, announcing the pending suspension. So, if the Quantopian database incorporated such information, one could apply a filter without look-ahead bias.

One way to sorta avoid look-ahead bias is to apply your code, but change days=365 to days=10. This assumes that any de-listing conforms to the 10-day notification period on http://www.investopedia.com/terms/s/sec-form-25.asp (presumably 10 calendar days and not trading days). You could still end up getting stuck with a few dead stocks in your portfolio, if no trades occur for those stocks within the 10 day window.
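Giuseppe's filter with the shorter window can be sketched platform-independently. Here `output` stands in for the pipeline output frame and the `end_date` column for the securities' end-date metadata; both the function name and the data layout are assumptions for illustration:

```python
import datetime
import pandas as pd

def drop_near_delistings(output: pd.DataFrame,
                         today: datetime.datetime,
                         notice_days: int = 10) -> pd.DataFrame:
    """Drop rows whose end_date falls within `notice_days` of today.

    Ten days roughly matches the Form 25 notification period, so this
    avoids the year of lookahead that days=365 introduces.
    """
    cutoff = today + datetime.timedelta(days=notice_days)
    return output[output['end_date'] > cutoff]

# Toy pipeline output: three securities with hypothetical end dates
today = datetime.datetime(2016, 1, 20)
output = pd.DataFrame({
    'rank': [1, 2, 3],
    'end_date': [datetime.datetime(2016, 1, 25),   # delists in 5 days
                 datetime.datetime(2017, 1, 1),    # trades on
                 datetime.datetime(2016, 1, 31)],  # delists in 11 days
}, index=['X', 'Y', 'Z'])

kept = drop_near_delistings(output, today)
```

As noted above, even with this filter a few dead stocks can slip through if no trades occur within the notification window.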

The problem is, when live paper trading, I think you have to remove your de-listing filter, since end_date will never be 10 days out; it'll be the current day (or the next trading day...I'm not sure which). Hence, Q needs to supply a real fix.

This is one reason I've been reluctant to start messing with pipeline, looking at 8000+ stocks. As you found, there will be lots of de-listings, and Q provides no way to manage them.

Grant

Hi Grant and thank you.

The 10 days is a good choice, also because with 365 you would never see how your algo performed over the last year :)

By the way, I humbly suggest to the Q staff that stock-picking algorithms like the ones based on pipeline should be equipped with some tools to handle these extreme situations, because otherwise you will never see the backtest of your idea; you will see a completely different thing instead.

For backtesting, set the end date of the backtest to account for your days=10 setting. Otherwise, I think you'll have no update of stocks in your universe at the end, which could skew your result.

You might also add some error checking code to print out any securities held in your portfolio after their end dates. This way, any time a backtest is run, you can confirm that you didn't end up holding any dead securities.
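That check might look like the following sketch, where `positions` and `end_dates` are hypothetical stand-ins for `context.portfolio.positions` and the securities' end-date metadata:

```python
import datetime

def dead_positions(positions, end_dates, today):
    """Return held symbols whose end_date has already passed.

    A non-empty result after a backtest means the portfolio got stuck
    holding delisted ("dead") securities.
    """
    return [sym for sym in positions
            if end_dates.get(sym, today) < today]

# Toy portfolio: 'AAA' was delisted last month, the others still trade
today = datetime.date(2016, 1, 20)
held = ['AAA', 'BBB', 'CCC']
end_dates = {'AAA': datetime.date(2015, 12, 1),   # already delisted
             'BBB': datetime.date(2017, 1, 1)}    # still trading

stuck = dead_positions(held, end_dates, today)
```

Logging `stuck` at the end of each backtest run gives a quick pass/fail check on whether the delisting filter is working.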

Thanks again, Grant. Actually, I think that if the goal of this post is to provide a robust template for long-short strategies, maybe it would be wise to include some pipeline "best practice" using stocks.end_date; otherwise (maybe I am wrong, but...) I suspect that the outcomes would be systematically biased.

The issue with delisted securities is something we are aware of and working to fix. We have a project that just kicked off to get them resolved without you having to resort to any look ahead bias. I can't commit to when it will be done, but it's in progress.


Hi Delaney,

A few questions/comments:

  1. Does your code filter out ETFs and such? If I wanted to write a long-short strategy that avoided them and just traded in company stocks, how would I do it?
  2. You have a bunch of 'from quantopian...' imports. This approach is a barrier to entry for using pipeline, since how does one know what to import? Is it possible just to do import pipeline or something similar?
  3. I'm kinda surprised that the code takes 4 hours to run, as reported above. If I'm reading it correctly, you should only be doing number-crunching 12 times a year. And you are only doing simple ranking. Yet it takes approx. 2 minutes per call. Am I missing something? Where's the bottleneck?

Grant

How does pipeline handle "when issued" stocks? Say I wanted to exclude them? How would I do it?

I have used the following code in before_trading_start to remove "when issued" stocks.

context.output = pipeline_output('top500')
top500 = context.output.fillna(0)
top500['sid'] = top500.index
top500['symbol'] = top500.sid.apply(lambda x: x.symbol)
# Drop anything whose ticker carries the when-issued '_WI' suffix
top500 = top500[top500.symbol.apply(lambda x: not x.endswith('_WI'))]

My understanding is that '_WI' is insufficient. Have you confirmed that all when-issued securities carry the _WI suffix and that no valid securities can carry it?

References:

https://www.quantopian.com/posts/is-there-a-reliable-way-to-exclude-when-issued-securities-from-fundamentals-screens
https://www.quantopian.com/posts/exit-securities-which-no-longer-trade
Sep 16, 2015, Simon comments: 'There are other symbols which end in WI but do not have an _, but which I still think are when-issued...'
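Given Simon's observation, a filter that catches both spellings might look like the following heuristic. Note that this is a guess at a robust rule, not an authoritative one: it can false-positive on legitimate tickers that happen to end in "WI", and the example symbols below are made up:

```python
def looks_when_issued(symbol: str) -> bool:
    """Heuristic: flag tickers that look when-issued.

    Catches the documented '_WI' suffix and, per Simon's report, bare
    'WI' endings on longer tickers as well. Short tickers like 'SWI'
    are left alone to reduce false positives.
    """
    return symbol.endswith('_WI') or (len(symbol) > 3 and symbol.endswith('WI'))

# Hypothetical symbols: only the middle two should be flagged
flagged = [s for s in ['AAPL', 'HPE_WI', 'PYPLWI', 'SWI']
           if looks_when_issued(s)]
```

Until Quantopian confirms how when-issued securities are tagged in the database, any suffix-based rule like this should be spot-checked against the actual universe.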

I've edited my original algorithm to a much more efficient version that finishes in ~20 minutes.

@Delaney - can you please post your edited algo for us to review? Thanks!

Daniel, I think the algo in the original post at the top was replaced with the improved, edited version.


@Daniel: To clarify, my original post had a slower version of the same algo but the post at the top was edited and now has the improved version. Sorry for the confusion!

Delaney,
I tried to backtest the edited version with your custom slippage model, but I had to stop it because after 4 hours the backtest had completed only two thirds of what the original did.
The leverage problem still exists.
By adding a custom slippage model you actually answered my second question:
Quantopian's default slippage model is not realistic. Am I right?
Don't you see that the long-short strategy in this code is practically not making any money?
Then why did Quantopian put it up as the example in the lecture series?
Are the results in line with Quantopian's goal?

Clone Algorithm (9 clones)
# Backtest ID: 56a8d32380a65211951e0767

Hi All,

I helped make some tweaks to this algorithm the other day to help it take better advantage of the Pipeline API.
Along the way I wrote up a notebook showing how to analyze Pipeline outputs and how to write code in a way that can be transferred back and forth between the IDE and Research (there's still a lot more that could be done on that front, but the design of Pipeline is motivated in large part by a desire to make it easier to write code that's applicable to both environments.)


Is there a list of security types that are traded by this algo, that you wouldn't want in the portfolio if it were part of the Q hedge fund?

  • Leveraged ETFs (taken care of by context.dont_buys = security_lists.leveraged_etf_list)
  • ETFs and such (anything not an individual company stock)
  • When-issued (I'm not sure Karen's code covers everything)
  • Stocks about to be de-listed (apparently tools are in the works)
  • Other?

Delaney,

I am getting the same sluggish execution as Vladimir. I started the backtest on 1/1/2005 over two hours ago, and it is still running, not even up to 2009 yet. Is there a bug, or is this the expected behavior? Also, when I bring up the backtest result webpage, it sorta sputters along drawing the performance charts.

Are you sure you fixed it?

Grant

It took more than 6 hours to run a backtest in minute mode from 01/02/2007 on the edited (definitely not improved) version with default slippage and commission models.

Clone Algorithm (9 clones)
# Backtest ID: 56a9438fbe6a6010ffe57717

Timing code (now with a summary sort bug fix not contained in this backtest):

pipeline_output() is sometimes fast, sometimes slow, once over three minutes. Skipping it for most of the month gives 27:40 elapsed for eleven months, so it doesn't appear to be capturing the main slowdown yet.

To measure the gap instead, add these lines at the end of before_trading_start and the top of handle_data:

    timing(context, 'gap', 'reset')
def handle_data(context, data):
    timing(context, 'gap')

The output doesn't make much sense to me. Remember there's that early return added on some days of the month. Anyway, there may be a time lag behind the scenes between before_trading_start and handle_data, or something like that.

I also tried this instead:

def before_trading_start(context, data):
    timing(context, 'bts')

    [ other code ]

    timing(context, 'bts', 'reset')  # reset last, measuring the round trip spent outside before_trading_start
def handle_data(context, data):

This produces some big numbers. Except for the early days of the month, on other days, when the added return kicks in, it winds up timing the cumulative time. So I gather pipeline is doing a lot of work every day: even though schedule_function exists, pipeline can't really know when its output might be called for. Maybe it would be more efficient overall to only crunch when called, if that's possible.
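For readers without the attached backtest, the timing helper referenced above isn't shown in the thread; a minimal stand-in (the name, signature, and behavior are all assumptions, not the original code) could be:

```python
import time

class Context:
    """Bare stand-in for the algorithm's context object."""
    def __init__(self):
        self.timers = {}

def timing(context, label, action=None):
    """Store a timestamp on 'reset' (or on first use of a label);
    otherwise return seconds elapsed since that label's last reset."""
    now = time.time()
    if action == 'reset' or label not in context.timers:
        context.timers[label] = now
        return 0.0
    return now - context.timers[label]

ctx = Context()
timing(ctx, 'bts', 'reset')   # start the clock, e.g. at the top of before_trading_start
elapsed = timing(ctx, 'bts')  # read it back later, e.g. in handle_data
```

Sprinkling paired reset/read calls across before_trading_start and handle_data is enough to localize which phase the minutes are going to.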

Clone Algorithm (11 clones)
# Backtest ID: 56a9c64a5458b112a0959561

Some comments by Scott Sanderson, Dec. 3, 2015, re: execution speed ( https://www.quantopian.com/posts/how-to-speed-up-pipeline-calculations-1#5660cae8a2018af26d000037 ):

It's still a fair bit slower than I'd like it to be, but the bottleneck at this point is almost entirely network IO with our database.

The bad news here is that there isn't much that you can do as a user to speed this up.
The good news is that this should get faster for free over time as we figure out how to optimize the fundamentals database for pipeline usage patterns.

Perhaps the bottleneck still exists?

Also, I allowed the backtest to continue, that was started 1/1/2005, and in 2014, got:

Something went wrong. Sorry for the inconvenience. Try using the built-in debugger to analyze your code. If you would like help, send us an email.

Not quite ready for prime time, it seems.

Clone Algorithm (14 clones)
# Backtest ID: 56a94734dd99d31109d372c5

We are not thrilled with the performance of this backtest either. We are working to understand why it's so slow and I'll report back on that when I have more information.

In the short term, if you are trying to do quick iteration on factors using this as a template, I recommend taking a look at the research notebook Scotty shared above. Working with pipeline in research is faster and allows for quick iteration.

The confusion over the run time stemmed from Delaney's comment:

I've edited my original algorithm to a much more efficient version that finishes in ~20 minutes.

Apparently, he did not test the problem reported by Vladimir, which was the execution time over a longer backtest period. As a general rule, when a problem is reported and a supposed fix is implemented, the exact problem needs to be tested.

Any feedback on the other questions raised in this thread?

Would it be feasible for Q to present an example that has decent performance vis-a-vis the expectations for the Q fund? It would be more encouraging to start with something that more-or-less works, and attempt to improve it.

I'm working on responding to all the points raised in this thread, and I appreciate everybody's feedback on the algorithm. The returns of the algorithm will depend almost entirely on the quality of the factors chosen for the ranking scheme. While we're working on some factors that have performed well recently, we wanted to get this one out as a template showing how the algorithm works in general. No factors are predictive over all time periods, and selecting factors that will continue to work in the future is precisely the problem an industry quant is trying to solve. If we found something that worked well and released it, it would likely be arbitraged out very quickly. As such, we try to spend our time creating better templates rather than trying to find factors that will produce good returns.

If we found something that worked well and released it, it would likely be arbitraged out very quickly. As such, we try to spend our time creating better templates rather than trying to find factors that will produce good returns.

I find this argument unconvincing, if not a tad hypocritical if true, given that sharing of algos without expectation of their destruction is the premise of this entire forum. I think it's more likely that nobody at Quantopian has come up with an algorithm which would meet Quantopian's own criteria, and therefore cannot share one. Which is not surprising really, it's an exceedingly tough game.

Hi Delaney,

I realize that it is kinda unfair to criticize the performance, since you say up front:

NOTE: This algorithm is not intended to perform consistently over all time periods. Ranking schemes have predictive lifecycles, and the goal of someone trying to use this algorithm to trade profitably now should be to find a predictive ranking scheme that will work in the future.

However, I find it a bit odd that Q has made a big investment in the pipeline API, with the idea that it would support viable factor-based long-short algos for the fund, but you've yet to write some working examples (meaning ones that could be funded). And even before developing the API, how did you get the confidence that lots of unique strategies could be deployed profitably on Q by a large number of users (it is intended to be a crowd-sourced effort, so a handful of long-short algos in the fund won't cut it)?

Simon seems to have pretty good intuition, so when he says "it's an exceedingly tough game" I have to wonder if attempting to write a long-short strategy for Q would be a fool's errand. Personally, I don't work in the hedge fund world (or even in finance generally) and I don't have any acquaintances I could ask ("Hey Joe, how hard would it be to write a decent factor-based long-short strategy that scales to $10M in capital, using U.S. equities?"). So, I have no frame of reference.

You'll learn a lot by doing it yourselves, and then you can teach others. Q has some capable people. For example, let's say you, Fawce, Jess, Justin, and Thomas W. did nothing for two weeks but tried to come up with one or more factor-based long-short strategies? I think you'd gain credibility, learn the workflow first-hand, and potentially have an example algo you could share. Ignoring the opportunity cost, it might cost $50K to pay everyone, which is 0.5% of $10M--in relative terms, a small investment.

Also, I note on https://www.quantopian.com/about#op-28573-algorithm-writer-intern that you are actually looking to publish algos:

Quantopian is seeking an undergraduate intern to help expand our library of trading algorithms. We have dozens of algorithms that are described or otherwise implemented in languages other than Python. We would like these algorithms to be re-written in Python, tested in the Quantopian platform, and shared in the Quantopian community.

Presumably, these would be hedge fund style algos, since that's your business. So, I guess I'm confused about the idea that you wouldn't want to publish viable algos, since they'd be arbitraged away. What would the intern be doing then? Publishing algos that don't work?

On a separate note, you say "ranking schemes have predictive lifecycles," so is there code one could plunk into an algo to flag when the strategy is working and when it is spitting out gibberish? Or is the idea that one would manually run pyfolio and decide when the strategy has run out of gas?

In theory, what you want is an algo that computes thousands of factor scores every day on a basket of stocks, then builds some type of model based on which factors are generating the highest returns over a rolling time period. Definitely a hard problem to solve, and one that requires a lot of computing power and machine learning expertise.

So I agree it doesn't make sense to focus on making static models.
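A toy sketch of that rolling factor-selection idea, with random numbers standing in for real factor returns (the factor names, window length, and weighting rule are all illustrative assumptions):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range('2015-01-01', periods=120, freq='D')

# Hypothetical daily long-short spread returns for three candidate
# factors; in practice these would come from backtests of each factor.
factor_returns = pd.DataFrame(
    rng.normal(0.0005, 0.01, size=(120, 3)),
    index=dates, columns=['value', 'momentum', 'quality'])

# Each day, weight factors by their trailing 60-day mean return,
# zeroing out factors that have been losing money over the window.
trailing = factor_returns.rolling(60).mean()
weights = trailing.clip(lower=0)
weights = weights.div(weights.sum(axis=1), axis=0).fillna(0.0)
```

Each row of `weights` sums to 1 (or 0 when no factor is working), so the model reallocates toward whatever has been predictive recently instead of staying static.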

Miles,

Sounds ugly. Brute force, you'd need to create a highly multi-dimensional response surface on a rolling basis, and then optimize. In theory, one could do it in the research platform, by looping over the factor levels, running a backtest at each, but then one would have to manually update the algo for every new long-short allocation (and for hundreds of stocks, one might bump into the "don't steal our data" limits in the research platform). And it sounds impossible using the backtester. Seems like one would need a backtesting engine that could run all of the days in parallel (vectorized), rather than churning forward in a loop, day-by-day. And running N backtests in parallel would obviously speed things up, too.

Grant

Hi Delaney,

Yet another question: how does one know that daily bar data are sufficient for constructing a real long-short strategy? Intuitively, it seems like the relatively low-frequency data would frame the problem on a time scale that would not be competitive (e.g. to get decent statistics, one might need a month or two of data). Is there any evidence from the literature or elsewhere that a long-short strategy could be constructed with daily data, and accordingly trade relatively infrequently? Is it gonna be like stepping up to the plate in the big leagues with a corkball bat? What workflow does a professional long-short strategy developer follow and with what tools?

Grant

Hey all,

I'm really busy with a round of workshops right now, but making this algorithm template really good and accessible is a super high priority for me. Based on my work and vacation schedule, I expect to come out with a bunch of updated material towards the end of the month. I'll make sure I address all the concerns raised in this thread; some of the answers are not so much simple answers as lectures in and of themselves, so I want to make sure I get them right before releasing the content.

Thanks,
Delaney

Thanks Delaney,

Rome wasn't built in a day. Another thought is that it'd be nice to be able to write out a file from the research platform, and then pull it into the backtester/trading engine. Ideally, one could do this on-the-fly, overnight, if not during the trading day, so that the algo would not need to be stopped during live trading.

Regarding your point about publishing a viable long-short strategy developed by Quantopian, and it being arbitraged away, you wouldn't necessarily need to publish the detailed code (the "secret sauce"), just the overall approach, what you learned in terms of pitfalls and workflow and perhaps some performance results, etc. Have you talked with anyone there about the idea? You wouldn't be able to co-opt the time of a bunch of folks for several weeks to help you to research and write a strategy, without the support of the powers-that-be.

Grant

I am probably not aware of all the issues with this, but I think that switching between research and algos is unnecessary.
The IPython notebook is so comfortable and advanced that just adding some features to enable a real backtest (and maybe live trading too) would make the basic IDE an unnecessary tool, especially since the debug window is very uncomfortable for me.

I know that one can use zipline via research, but it's not very well documented (just one example in the research folder) and not used very much in the community.

I also think that building everything with the IPython notebook would be much easier to read, and so closer to the spirit of the community.

Regarding the topic i think that a real working (not necessarily money making) example to use as a template would be very usefull expecially if it includes the major tools:

  • pipeline
  • managing leverage
  • stop losses
  • take profits
  • trailing profits
  • delisted stocks

-Giuseppe

Good point about a unified user interface based on IPython. Another thing that would help would be to allow spinning up multiple backtests (or whatever analyses) in parallel from the research platform, which can be done from the IDE. It is not obvious that crunching over 8,000 securities per day, across umpteen factors, makes sense in a serial computing environment; I can't imagine that the hedge funds Q is trying to compete against are taking this approach. It would also be a good idea to enable e-mail notifications, so that when compute jobs are complete (or crash), users would be notified.

I've also heard that there are approaches to backtesting that can be vectorized, so that the computation over time is done basically in one compute cycle. With event-driven testing, you are stuck with looping over every minute.
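As a toy illustration of that vectorized idea (synthetic prices and a hypothetical momentum-rank signal, not Quantopian code), an entire period of daily long-short returns can be computed in a few pandas operations instead of a minute-by-minute event loop:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2015-01-01", periods=252, freq="B")
# Synthetic prices for 50 stocks: geometric random walks.
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, (252, 50)), axis=0)),
    index=dates,
    columns=[f"S{i}" for i in range(50)],
)

returns = prices.pct_change()
# Hypothetical signal: yesterday's cross-sectional rank of 20-day momentum.
signal = prices.pct_change(20).rank(axis=1).shift(1)
# Demean the ranks so the book is dollar-neutral, then scale to unit leverage.
weights = signal.sub(signal.mean(axis=1), axis=0)
weights = weights.div(weights.abs().sum(axis=1), axis=0)
# Daily portfolio returns for the whole backtest in one vectorized step.
port_returns = (weights * returns).sum(axis=1)
```

This obviously ignores slippage, commissions, and intraday fills, which is why it is only suitable as a first-pass screen before an event-driven backtest.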

Any update from the Q staff on the numerous issues/questions raised in this thread?

Simon gives an example of how to filter out junk that one would not want in a long-short strategy:

https://www.quantopian.com/posts/equity-long-short

Does his code capture everything that should be excluded?

Also, specifically regarding de-listings, will Q be adding another data set? It seems that you'd need a feed of the filings/announcements of de-listings (and perhaps there are reversals/postponements, etc. which could get messy). Just wondering how you'll approach it.

Hi Grant, I think I read in Delaney's post above that he would come back with answers towards the end of this month.

Yep. Just thought I'd check in. I assume that others at Q can respond, as well. At some point, I'd like to take the plunge and attempt one of these glorious long-short strategies using pipeline. I don't want to get bogged down if the tools aren't ready and/or I don't know how to use them.

@Scott, can you extend your notebook to also call zipline in the notebook? I tried, but failed miserably. I don't know what 'data' to pass to the TradingAlgorithm; my "from quantopian.algorithm import attach_pipeline, pipeline_output" fails; and calls to attach_pipeline fail.

I agree with @Giuseppe and would like to work on a real working example in the notebook, but I'm stuck. I'd like to see:

  • A pipeline created
  • A call to the zipline backtester
  • A pyfolio output

Am I wrong, or are the short stock weights wrong in the edited version of the algorithm?

# before
context.short_list = 1 / average_rank.order().iloc[:200]
context.short_list /= context.short_list.sum()

# after
context.short_weights = (short_ranks / short_ranks.sum())
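For what it's worth, both versions normalize to weights that sum to one, but they rank-weight in opposite directions, which may be exactly the concern. A toy comparison (illustrative names and a cutoff of 3 instead of 200; .order() was the old pandas spelling of .sort_values(), used here so the sketch runs on current pandas):

```python
import pandas as pd

# Hypothetical average ranks for six stocks; lower = stronger short candidate.
average_rank = pd.Series({"A": 1.0, "B": 2.0, "C": 3.0,
                          "D": 4.0, "E": 5.0, "F": 6.0})

# "Before": inverse-rank weights on the 3 lowest-ranked names.
short_list = 1 / average_rank.sort_values().iloc[:3]
short_list /= short_list.sum()

# "After": proportional-rank weights on the same names.
short_ranks = average_rank.sort_values().iloc[:3]
short_weights = short_ranks / short_ranks.sum()

# Both sum to 1, but "before" puts the most weight on the lowest-ranked
# name (A), while "after" puts the least weight there.
```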

I'm currently working on a project that will replicate some existing factors quants use, plus have full research notebooks demonstrating the factor research process. Hopefully this will address a lot of the issues with this algorithm. Stay tuned for more.

Hello Delaney,

Thanks for the update. One of the things I'm observing on the forum is that users bang into limitations of the platform in trying to carry out their research work and formulate algos. This is actually a really good sign, since it means that you have people willing to volunteer their time to write algos for your hedge fund (and perhaps for their own capital, although that's not really your main business, as I understand it). It is clear that the pipeline API was released as a beta version, and that overall the Quantopian platform is beta-esque. All fine. What's missing is a forward-looking architecture, to put it all into context, and to which users can contribute in a cohesive, constructive way, versus scattered bugs, comments, complaints, improvement requests, etc. distributed over forum posts going back several years.

To borrow from the late Steve Jobs, you want to aspire to "insanely great" tools. I suggest lobbying for putting up a wish-list architecture and roadmap and then driving to get it. Personally, I have a number of recommendations, but I'm not sure they are heard and captured.

As I suggest above, one way to sort out what's missing would be to try to research and write a viable algo yourselves, as a team, sharing the results and lessons-learned. Have you given this any more thought? I think you basically rejected it, on the grounds that "If we found something and that worked well and released it, it would likely be arbitraged out very quickly." Perhaps, but you would have lost nothing, unless you put capital toward the public strategy. And your basic premise here anyway is that there are enough orthogonal strategies out there that publishing one won't make a difference in Quantopian's future.

Delaney,

I'm working on responding to all the points raised in this thread, and I appreciate everybody's feedback on the algorithm.

I do not see any response to my questions:

Quantopian's default slippage model is not realistic. Am I right?
Don't you see that the long-short strategy in this code makes practically no money?
Then why did Quantopian put it up as the example in the lecture series?
Are the results in line with Quantopian's goal?

Hello Vladimir,

Regarding the default slippage model, admittedly I don't have any expertise in this area and haven't paid much attention to it. It can be disabled by users, and users can write their own slippage models, too. Aside from the contest, the default slippage model is not imposed on anyone. And I have to wonder if it may be valid for sifting out algos that are particularly robust against slippage. In any case, if you have a specific recommended slippage model, perhaps you could post it in a separate thread? And maybe even put it up in github in a public repository?

Also, Q is in the business of making money. So, if you or anyone else has a convincing case that their algo will work well under realistic slippage and deserves capital, I'd expect they'd listen. In fact, if the contest is the concern, users are free to set up as many parallel algos as they want, to run for 6 months. Enter one into the contest, and run another one in parallel with your favorite slippage model.

That said, if the current default slippage is grossly inadequate for writing long-short algos, as Delaney is promoting, then it should be added to the (unpublished) list of tools that are needed (maybe it is already on the list?).

I think the overarching issue here, as Simon points out above, is that "it's an exceedingly tough game." Q is still pulling together the tools. But they need to maintain their version of a "reality distortion field," both internally and as their public face. Thus, the conversation tends to be one-sided. They are not likely to post a list of the shortcomings of their API, with prioritization and commitments to fix problems. At least that's my take on it. So, I'd be patient.

Since I guess the goal here is to evolve Delaney's example above into a practical template, one thing that is missing is a data integrity check, since there can be errors in data supplied by Q. I'm not clear how to do this in a thorough, robust fashion. Inevitably, when dealing with a dynamic universe of thousands of securities, errors will crop up. So a block of code in the algo is required to catch them. And the algo needs to be written so that it is immune to any errors that slip through the check.

In addition to code in the algo, one suggestion is that Q publish a list of securities with known problems, to be avoided in long-short algos (e.g. security_lists.erroneous_data).
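As a rough sketch of the kind of check being described (plain pandas with illustrative thresholds, not Quantopian code), one could drop any security with missing prices or an implausibly large one-day move before ranking:

```python
import numpy as np
import pandas as pd

def filter_bad_data(prices, max_daily_move=0.5):
    """Drop columns (securities) with missing prices or suspicious jumps.

    A one-day move beyond ``max_daily_move`` (50% by default) is treated
    as a likely data error, e.g. a misapplied split adjustment.
    """
    returns = prices.pct_change()
    has_gaps = prices.isna().any()
    has_spikes = (returns.abs() > max_daily_move).any()
    return prices.loc[:, ~(has_gaps | has_spikes)]

prices = pd.DataFrame({
    "GOOD": [10.0, 10.1, 10.2, 10.3],
    "GAPPY": [10.0, np.nan, 10.2, 10.3],
    "SPIKY": [10.0, 10.1, 25.0, 10.2],  # looks like a bad split adjustment
})
clean = filter_bad_data(prices)
# → only "GOOD" survives
```

The thresholds and the decision to drop (rather than repair) are judgment calls; the point is only that some explicit gate like this should sit between the data feed and the ranking logic.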

For accurate backtesting and Q simulated live trading, the algo also needs:

def cancel_everything(context,data):  
    """  
    Cancels all open orders for all sids.  
    """  
    all_open_orders = get_open_orders()  
    if all_open_orders:  
        for security, oo_for_sid in all_open_orders.iteritems():  
            for order_obj in oo_for_sid:  
                log.info("%s: Cancelling order for %s of %s created on %s" %  
                         (get_datetime(), order_obj.amount,  
                          security.symbol, order_obj.created))  
                cancel_order(order_obj)  

I think the proper usage is to call it in the last minute of trading (using schedule_function), so that it simulates the automatic cancellation of all open orders after the close, when trading real money.
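For reference, that wiring would look something like this inside initialize (a fragment using Quantopian's schedule_function API; it only runs on the platform):

```python
# In initialize(context): cancel all open orders one minute before the
# close, mimicking the end-of-day cancellation a broker performs live.
schedule_function(cancel_everything,
                  date_rules.every_day(),
                  time_rules.market_close(minutes=1))
```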

The de-listing problem was fixed, I think, so this code is no longer needed:

if security in data:  # Work around inability to sell de-listed stocks.
    if security not in context.active_portfolio:
        order_target_percent(security, 0)

Correct? Or is it still required?

Delaney - any progress on this algo?

I'm running a backtest on Delaney's code, starting 1/1/2005. After one hour, it is 27% done. So, it seems we are still at a 4 hour backtest time. Any update on the speed?

@ Karen - "We are working to understand why it's so slow and I'll report back on that when I have more information." Any feedback?

Any updates from the Q team on this? It'd be great to have a practical, working (but not necessarily profitable) long-short template.

Hey everyone. Sorry for the slow responses, I just got back from Asia. Let me detail the approach I'm taking here:

We're going through a list of known investment factors, including standard ones like momentum, mean reversion, etc. We're implementing long short algorithms for each, but also keeping a general base template so that factors can be recombined across different algorithms. Through this process we hope to be able to release this base template, but also the known factors as examples. We're taking into account all the feedback we got on this thread, and making sure we test for speed and robustness. We'll never be able to account for all potential issues, but we hope to release something that's as close as we can get to what an industry long short would look like, and release as much of the infrastructure for development as possible. One of our interns is currently working full time on this project, and I'll let you know when I have more info about ETA. Likely not in the next couple months.

In the meantime, I've actually been using this template a lot, and it's pretty reasonable for most of the use cases I've had. Some issues with speed that are tougher to address, but that might improve as the site undergoes optimizations. The robustness issues like order handling are easier to fix by swapping in new code from other forum algorithms. Clearly not optimal, but we've opted to go for a full-project redesign versus tweaking this template.

Thanks,
Delaney

Delaney,

Thanks for the update. Regarding speed, a few thoughts:

  • Run daily instead of minutely backtests, as a first-pass screen. The slippage model can be manipulated for more realistic fills (e.g. https://www.quantopian.com/posts/trade-at-the-open-slippage-model). Pipeline only supports daily data, so unless there is further minutely refinement in the code, it seems that daily backtest mode could be cajoled into yielding accurate results for development work.
  • Vectorized backtesting? Is it feasible to get realistic "back-of-the-envelope" results? The Q minutely event-driven backtester, with full compatibility with live trading may be overkill for relative comparisons and optimizations.
  • Enabling code to be executed in parallel, GPU computing, etc. (Thomas W. provided an example - see https://www.quantopian.com/posts/zipline-in-the-cloud-optimizing-financial-trading-algorithms)

On a separate topic, I'm not sure what you mean by "a full-project redesign versus tweaking this template." In any case, I'd encourage your unidentified intern to leverage the user base in his/her development process, versus doing a "Ta-da!" release, only to find out that important stuff was not considered. Also, in terms of "release" I'd encourage y'all to use github, since if the code is not under public revision control, you really haven't released anything. As part of the "full-project redesign" you could consider setting up a repository.

A lot of this comes down to the fact that industry quants spend very little time backtesting, and the vast majority of their time in whatever their equivalent is to our research environment. The back-of-the-envelope calculations you mention are effectively looking at the factor's performance in the research environment. Those, if the strategy is meaningful, should model the backtest returns fairly well. Andrew has done a good amount of work towards this in his 101 Alphas Project. I've also started to update the lectures to include the pipeline API; I recommend checking out 13, 14, and 15.

The idea is to spend most of the time in the research environment, and then just use the backtester to do some analysis of transactions and slippage issues, and as a final validation that the model works. In practice industry quants face the same issues: backtesting is generally incredibly slow, and custom infrastructure is built to support very specific backtesting cases. We're trying to make the algorithm faster, but I think a lot of the speed gain will come from researching the factor models in a research environment, which is by nature vectorized.
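That back-of-the-envelope factor check can be sketched in a few lines of pandas (synthetic data and a hypothetical factor; this mirrors the idea, not any Quantopian library): bucket stocks by factor value each day and compare the forward returns of the top and bottom buckets.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n_days, n_stocks = 250, 100

# Hypothetical factor with a small amount of real predictive power baked in.
factor = pd.DataFrame(rng.normal(size=(n_days, n_stocks)))
fwd_returns = pd.DataFrame(
    0.001 * factor.values + rng.normal(scale=0.02, size=(n_days, n_stocks))
)

# Cross-sectional percentile rank of the factor each day.
ranks = factor.rank(axis=1, pct=True)

# Mean forward return in the top and bottom quintile buckets.
top = fwd_returns[ranks > 0.8].stack().mean()
bottom = fwd_returns[ranks <= 0.2].stack().mean()
spread = top - bottom  # positive if high factor values predict high returns
```

If the quantile spread is flat or noisy here, there is little point in paying for a slow event-driven backtest of the same signal.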

By redesigning I mean putting together an intelligent workflow including Andrew's work, plus a bunch of work in development, to enable what I describe above as the industry quant workflow. We will certainly make sure to show iterations to the community for feedback, and I've been thinking of repos we could set up to better support algorithm content. We do release a good amount of algorithm content here:

https://github.com/quantopian/research_public/tree/master/lectures

Thanks Delaney,

You are putting a lot of effort into this long-short factor-based strategy business. Is there a body of work suggesting it will actually work? Where did you get the idea in the first place? How would I know it's not just going to be a colossal waste of time? Do other hedge funds use it to their success? And how would you actually know, since reportedly they are highly secretive?

I've never actually seen a thorough justification by Quantopian, so I figured I'd ask.

Cheers,

Grant

I believe long-short momentum is at the core of the CTA industry, which has $350bn under management:

http://www.barclayhedge.com/research/indices/cta/Money_Under_Management.html

Dan,

You're right about momentum/trend-following being the primary strategy employed by CTAs, but it's a completely different breed of animal than the fully hedged factor models Quantopian is pressing. If compared to an equities momentum factor model, most CTAs are similar in that they are buying the strongest and selling the weakest, but that's about it. CTA trend following programs are usually taking direct long or short positions, unhedged in the general sense. Their "hedge" is mostly due to the diversity of markets they tend to trade, rather than buying one basket of securities while simultaneously selling another basket of securities and treating it as a single trade. The elevated levels of volatility which CTA programs experience from time to time are often due to the fact that they are taking unhedged positions and can be caught on the wrong side of a violent move in one or more instruments. This also allows them to get huge returns at times when they're on the right side of a move.

Graham

Ah, I didn't think of that. To paraphrase, CTA funds take long OR short positions without a requirement to balance the two? Definitely not an attempt to harvest some clean factor premium, but rather a more opportunistic "trading" approach.

Anyway, what's a good sound bite type answer to Grant's mildly inflammatory question?

To piggyback a bit on Dan H.'s response with the barclayhedge hedge fund indices URL: here are the hedge fund style indices presented by Credit Suisse, which break the fund styles down even further. Long/Short Equity is one of them (which, as Graham points out, doesn't really mean they are fully hedged, and they can take some bets), but Credit Suisse also posts an Equity Market Neutral hedge fund index. I'm not sure how they define "neutral," but the definition is likely stringent enough that the component funds look different from the ones in the Long/Short Equity index. Equity Market Neutral is also a substantial style class in the hedge fund world, one of the most popular right behind the less strict Long/Short Equity.

http://www.hedgeindex.com/hedgeindex/en/indexoverview.aspx?indexname=HEDG&cy=USD

Here is the AUM per hedge fund strategy, in which Equity Market Neutral is specifically a line item; I discovered it while poking around the barclayhedge link shared above:
http://www.barclayhedge.com/research/indices/ghs/mum/HF_Money_Under_Management.html

There's a breakdown of common strategies here:

http://www.investopedia.com/university/hedge-fund/strategies.asp

I'm sure somewhat out of date, but was helpful for me.

On the various podcasts I listen to, I hear a lot of talk about hedge funds acquiring vast amounts of data and getting into machine learning and artificial intelligence to sift through it for patterns. I assume the execution of this involves tools like linear regression, long-short factor portfolios, bayesian probabilities and statistical significance tests. All of this is stuff that Q is pushing, but with the crowd providing the intelligence, not robots.

It's a bit off topic, but I wonder how well these AI efforts are going.

O.K. $200B is a pretty big number for the AUM for equity long-short (http://www.barclayhedge.com/research/indices/ghs/mum/Equity_Long_Short.html). Must be legitimate. I'll assume that Q didn't just dream it up, and that it is possible to make money.

Delaney,

FYI - your code no longer runs due to:

get_open_orders(sid=long_stock):  

However, per the help page, it should run:

get_open_orders(sid=sid)
If sid is None or not specified, returns all open orders. If sid is specified, returns open orders for that sid
Parameters
sid: (optional) A security object. Can be also be None.
Returns
If sid is unspecified or None, returns a dictionary keyed by security id. The dictionary contains a list of orders for each sid, oldest first. If a sid is specified, returns a list of open orders for that sid, oldest first.

Your code will run if the security is passed positionally, like this:

get_open_orders(long_stock)

Good catch. We have a version ported over for Quantopian 2 that we'll be releasing shortly, that one should work.

Just be sure to test it over a long time period. When I tried to run your algo, it ran out of memory. See:

https://testdrive.quantopian.com/posts/memory-error-2

I tried to run the Q2 algo, with:

Settings:  
From 2003-01-01 to 2016-04-28 with $10,000,000 initial capital  

I got the error:

InterfaceError: connection already closed
There was a runtime error on line 139.

I do not get the error if I start with this:

Settings:  
From 2005-01-01 to 2016-04-28 with $10,000,000 initial capital  

And for:

Settings:  
From 2005-01-01 to 2016-04-28 with $10,000,000 initial capital  

I eventually get the error:

TimeoutException: Call to before_trading_start timed out
There was a runtime error on line 139.

after about 6 months of simulation.

I ain't feelin' the Q2 love yet...

I gave the Migrated Source Code algo another go, and got:

Something went wrong. Sorry for the inconvenience. Try using the built-in debugger to analyze your code. If you would like help, send us an email.  
InterfaceError: connection already closed  
There was a runtime error on line 139.

Still not working.

# Backtest ID: 5735982aec7ec611af06411f

FYI - I have not read through this entire thread, so I am devoid of its context... but...

@ Grant - Just cloned your algo from your last post and ran a backtest from 2003/01/01 through 2012/01/01. Worked fine on my PC (i5 chip with 4GB RAM). Wondering if you have a hardware deficiency on your end?

@ Q - The culprit in my backtest appears to be ticker "KGC".

- Frank V.
# Backtest ID: 5736dd403f37e60f8f681d7e

Hmm. I still get:

Something went wrong. Sorry for the inconvenience. Try using the built-in debugger to analyze your code. If you would like help, send us an email.  
InterfaceError: connection already closed  
There was a runtime error on line 139.

I thought that the algos run on the server. Maybe there is an interaction with the local PC environment? Internet connection?

# Backtest ID: 5736f16bb487e211de051460

Tried again and it ran.

# Backtest ID: 57370f91c52d070f8f246321

Grant,

I've noticed the same error on a number of my algorithms, and I really think it somehow depends on the time of day you run it. Not sure why, but that's my suspicion.

@Frank - re KGC, I had the same problem. See this thread: https://www.quantopian.com/posts/long-short-pipeline-multi-factor
AFAICS, it's bad stock split data?