How to Build a Stat Arb Strategy on Quantopian?

To have the best shot at winning the Quantopian Open, or at having your algo selected for the Quantopian hedge fund, you need to focus on low-beta strategies with consistent, low-volatility returns.

Statistical arbitrage strategies are a great example of an algo type that achieves this purpose. Instead of investing in a few securities, investing in a large basket of securities will help decrease beta and increase the consistency of returns. The simplest form of a stat arb strategy is a pairs trade, where you compare two individual stocks within the same industry (for example, Coke and Pepsi, or Shell and Exxon). But more interesting and complex strategies can be created by investing in larger baskets of stocks that you believe should behave similarly but for some reason don't.

A typical workflow for these strategies is to filter down to a universe of a couple thousand securities, rank those securities on some factor, and then go long the top decile and short the bottom decile.
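In pandas terms, that rank-and-decile step might look something like the following minimal sketch (the helper name and the equal-weight scheme are illustrative, not part of the Quantopian API):

import pandas as pd

def decile_weights(factor):
    """Equal-weight long/short weights from a factor Series indexed
    by security: long the top decile, short the bottom decile."""
    ranked = factor.dropna().rank()
    n = len(ranked)
    longs = ranked.index[ranked > 0.9 * n]    # top decile
    shorts = ranked.index[ranked <= 0.1 * n]  # bottom decile
    weights = pd.Series(0.0, index=ranked.index)
    weights[longs] = 0.5 / len(longs)         # 50% gross long
    weights[shorts] = -0.5 / len(shorts)      # 50% gross short
    return weights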

Unfortunately, writing a stat arb algo of this type is difficult on Quantopian today. Within before_trading_start(), where you can filter your universe, you are limited to fundamental data only, and many of these strategies rely on pricing data as well. Your universe is restricted to 200 securities, which makes filtering to the best universe difficult. Many of our members have resorted to doing their analysis elsewhere and then importing a buy list via fetcher.
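That workaround looks roughly like this (the URL and the 'signal' column below are placeholders for whatever your offline analysis produces):

def initialize(context):
    # Pull an externally computed buy list into the algo via fetcher.
    fetch_csv('https://example.com/buy_list.csv',  # placeholder URL
              date_column='date',
              symbol_column='symbol')

def handle_data(context, data):
    for stock in data:
        # 'signal' is a hypothetical column carrying the external rank
        if 'signal' in data[stock]:
            order_target_percent(stock, 0.01 * data[stock]['signal'])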

We understand this is far from ideal.

We've spent the last few weeks hard at work designing an API to make implementing stat arb strategies easier. On Thursday, May 14th, at 1 PM ET, I am going to host a webinar to walk through the API design and a couple of pseudocode examples. We haven't built this yet, and I am hoping to get feedback from the community on our design and on what functionality you want.

Please sign up to join me here.

If you can't make it, please feel free to share your thoughts below. At this stage basic examples of what you are trying to do (like this one from Bo Dong) are incredibly helpful and any you are willing to contribute to the discussion are appreciated.

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

35 responses

Well, the number one thing would be to get unrestricted price data into before_trading_start/fundamentals, so that things like market_cap and anything else based on price can be calculated correctly for every day, even years ago, and we can combine fundamental signals like value with price-based signals like relative and absolute momentum. This would also facilitate creating pre-screens on fundamental/price-based factors, followed by portfolio optimization based on shrunken minute/daily/weekly covariance matrices, to come up with high-IR/low-downside-tracking-error portfolios vs. the benchmark.
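For concreteness, the shrinkage step could be as simple as this scikit-learn sketch, where returns is assumed to be a (days x securities) DataFrame of daily returns:

import pandas as pd
from sklearn.covariance import LedoitWolf

def shrunken_cov(returns):
    """Ledoit-Wolf shrinkage estimate of the covariance of a
    (days x securities) returns DataFrame."""
    lw = LedoitWolf().fit(returns.values)
    return pd.DataFrame(lw.covariance_,
                        index=returns.columns,
                        columns=returns.columns)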

Thanks, sounds exciting!

Hi Karen,

One thought would be to allow background processes to be run during the trading day, since the 50-second time-out can be restrictive.

For your API, when will it run? Sometime overnight, before trading starts? Weekends? If so, will it time out and crash the algo (as the 50 second time-out does), or will there be some way to catch the error? And what will be the computing environment?

Grant

I would like the 50-second time-out relaxed for background processes; it does not make sense, especially if my strategy trades only once a day.

A lot of interesting stat arb is possible with options and futures. Otherwise, you are pretty much limited to pairs trading. It would be brilliant to have options and futures data, if that is not a tough ask.

A bit off topic, but is there any way to have "your algo selected for the Quantopian hedge fund" without entering the Open?

How can I use algorithms in my trading?

Would there be any way to link up to outside computing resources (e.g. https://elsen.co/)?

Need to allow simple filtering before trading starts based on correlation, technical indicators, and any other custom criteria.

Will it be recorded so we may access it if we can't make it?

Thanks for all your thoughts. I am not going to go into a huge amount of detail here, not because I am trying to be cagey, but because the explanation of what we are planning to do is better with visuals and code examples.

I will say that we are focused on:

  • @Simon's request to get pricing data into before trading starts
  • @Grant & @Satyapravin's request to have more than 50 seconds to run your models, and to be able to schedule when your model runs
  • @Joe's request to allow filtering before trading starts

@Bharath, we are working on futures, but that will not be part of this discussion. Our first focus for stat-arb will be US Equities, although there are developers currently working on adding support for futures.

@Ethan, I am planning to record the webinar (although in the interest of full disclosure, the last time I did so, it only recorded 1/4 of my screen. Hopefully I can figure that out this time around.)

It'd be nice to be able to run N backtests, varying one or more parameters, so that optimization could be done periodically. An example is the algorithm posted by Peter Poon on https://www.quantopian.com/posts/comparing-olps-algorithms-olmar-up-et-al-dot-on-etfs, where there is a parameter context.eps. This parameter could be re-optimized periodically, as market conditions vary.
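Offline, that re-optimization is just a grid sweep over the parameter; a rough sketch, where run_backtest stands in for whatever harness (e.g. a zipline wrapper) actually runs the strategy:

def sweep_eps(run_backtest, eps_values):
    """Evaluate a user-supplied backtest harness over a grid of eps
    values and return the best one. run_backtest(eps) -> metric."""
    results = {eps: run_backtest(eps) for eps in eps_values}
    return max(results, key=results.get)

# e.g.: best_eps = sweep_eps(run_backtest, [1.001, 1.005, 1.01, 1.02])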

Is there any background material, or are there concepts required, to understand this webinar? I have always focused on pure quant strategies and have little exposure to fundamentals. Are there things I need to brush up on before attending?

How about giving each user a computational sandbox, with disk space for storage, internet connectivity, full control over software tools, limited admin rights, etc.? And the ability to extend computational capabilities for a fee (e.g. memory, compute cores, etc.; completely extensible)? Just break out of the mode of crafting restricted APIs, and give users flexible, high-end computing resources if they want them. As I understand it, you are already paying a fixed monthly fee per user for the Q research platform, regardless of usage. So why not make those resources available for backtesting and live trading? In fact, some users might say, "Forget about research platform access. Spend your money on getting me computational resources I can use for live trading."

Regarding the internet connectivity, I understand that you can't let bulk data flow out, due to licensing restrictions. You figured out the unidirectional flow problem for fetcher, so perhaps something similar could be worked out for the compute sandbox? Maybe a restricted FTP could be set up, so that users could grab files? Or wget?

I don't see any fundamental technical restrictions here. Is there something on the legal or business side that constrains what you could offer?

@Karen,

Thanks for the detailed update.

@Karen
I am not clear on whether we would be able to filter based on the open price of a security. All the cases you mentioned above are for 'before trading starts'. Will the day's open price be available before the start of trading?

To be more specific, will this new feature let me do what I asked in this post:
https://www.quantopian.com/posts/creating-new-universe-every-day

Thanks.

@Panaki,
What you describe should be much easier to do in the new approach. You will be able to filter your universe on price and fundamental data. I highly recommend you join the webinar on Thursday to get the details of what we are planning. You'll also have the ability to let me know if there is anything you don't think will work, so we can take it into consideration.

http://www.homepages.ucl.ac.uk/~ucakgwp/QRSLab/ShaminKinathil.pdf

I thought the above link would be relevant to this post :).
Code is included and it is an interesting use of the particle filter.
Enjoy all.

@Karen
I would like to attend, but cannot. If it's possible to record the webinar, then people like me can watch it later. Thanks.

Hi Karen,

This business of "How to Build a Stat Arb Strategy on Quantopian?" is shrouded in mystery. Could you just whip up 5-10 exemplary algos that you'd love to have in the hedge fund and post them? Seriously, though, is there an outline you could share of the kind of thing you are looking for? I gather it is some sort of stat arbitrage-ish algo with 20 or so specially picked securities, long on some, short on others, in just the right proportions to be market-neutral. When Quantopian first started, there were lots of example algos flying around, which was helpful. Jess Stauth summarized many of them on http://blog.quantopian.com/5-basic-quant-strategies-implemented-by-the-quantopian-community/. Are any of them anywhere close to being hedge-fund-worthy? If not, could such a list be compiled of ones that would be good examples?

Grant

Hi Karen,

It's a long-only, high-beta algo, but nevertheless, here's a use case for parallel processing:

https://www.quantopian.com/posts/time-of-day-dependence

I suspect a time-of-day dependence; performance may depend on the value of 'minutes' in:

schedule_function(trade, date_rules.every_day(), time_rules.market_open(minutes=60))  

So, I'd like to run a backtest for every minute of the day (390 total), pull the results into the Q research platform, and analyze them.
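If those 390 backtests could be launched, pulling them into research for analysis might look roughly like the following (get_backtest is the existing research call; the ids and the exact result fields used here are assumptions):

import pandas as pd

# minute offset -> backtest id (hypothetical ids; fill in real ones)
backtest_ids = {30: 'abc123', 60: 'def456', 90: 'ghi789'}

final_returns = pd.Series({
    minutes: get_backtest(bt_id).cumulative_performance.returns[-1]
    for minutes, bt_id in backtest_ids.items()
})
final_returns.sort_index().plot(title='Final return vs. entry minute')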

Once the dependence is understood, then I could start thinking about how to approach it as an online optimization problem, incorporated into the algo (which might involve using a more efficient optimization approach than running 390 backtests and finding the global optimum, according to some merit function).

There are several other parameters that could be explored, including the return threshold parameter:

context.eps = 1.005  

The size of the trailing window (hard-coded as '8' and '9'):

prices = history(8*390, '1m', 'price')  # 8 trading days of minute bars (390/day)
for n in range(1, 9):                   # window lengths of 1 through 8 days

The number of securities (given by '20'):

order_by(fundamentals.valuation.market_cap.desc()).limit(20))  

The length of the trailing window for the price smoothing:

prices = pd.ewma(prices, span=390).as_matrix(context.stocks)  # ~1 trading day (390-minute) span

Also, sticking with the basic approach, an alternate exchange or index could be used to pick the stocks.

And different percentile ranges of market cap could be examined.

So just for this relatively simple algorithm, potentially, there is a large parameter space to be explored. And it could be that the optimum values of the parameters need to be adjusted periodically (e.g. daily/weekly/monthly/quarterly), which ideally would be done automatically (although initially, it would make sense for a human to "bless" the optimization, prior to it being deployed, which would benefit from the kind of visualization tools you are making available in the Q research platform).

I won't be able to listen to your webinar today, but I suggest bringing up the topic of being able to "call" the live-running algo as a backtest, as it runs live, with parallel computing support, to do the kind of thing described by Thomas Wiecki here:

http://blog.quantopian.com/zipline_in_the_cloud/
https://vimeo.com/63273425

You could architect the system without parallel computing, but I think you'll be hamstrung. My strong recommendation is to build it in from the get-go, and figure out how to pay for it later. And your blueprint should include a means to enable it on a per-user basis. Eventually, you may identify some rock-star quants, and want to give them more resources. And maybe some users would pay for more resources (you might even get a dollar bill to pin up in your cubicle for being the first Q employee to generate revenue).

Grant

Unfortunately, I can't participate. Could you kindly make the webcast available on YouTube?

It'd be nice to be able to run backtests and paper trading on synthetic data (both OHLCV minute bars and fundamental data). Basically, there needs to be a way to do an end-to-end simulation.
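For the bars half of that, even a deliberately simplistic random-walk generator would do for plumbing tests; a sketch:

import numpy as np
import pandas as pd

def synthetic_minute_bars(n_bars, start='2015-01-02 09:31', seed=0):
    """Random-walk OHLCV minute bars for end-to-end plumbing tests.
    The price process here is deliberately simplistic."""
    rng = np.random.RandomState(seed)
    idx = pd.date_range(start, periods=n_bars, freq='T')
    close = 100.0 * np.exp(np.cumsum(rng.normal(0, 0.0005, n_bars)))
    spread = np.abs(rng.normal(0, 0.0003, n_bars)) * close
    openp = np.r_[close[0], close[:-1]]  # previous close as open
    return pd.DataFrame({
        'open':   openp,
        'high':   np.maximum(openp, close) + spread,
        'low':    np.minimum(openp, close) - spread,
        'close':  close,
        'volume': rng.randint(1000, 100000, n_bars),
    }, index=idx)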

Was this webinar recorded? I am interested in backtesting some specific stat arb, spread strategies on U.S. equities. This sounds like exactly what I might be looking for.

Thanks.

The recording of this webinar can be found here. Feel free to post questions here if you have them.

Thanks for the great webinar.
I have a question about the implementation of the concept.
I noticed that the filtering and buildup of the portfolio is done in initialize() via the scheduling mechanism. Doesn't the fact that it lives in initialize() prevent us from changing the filter/factor methodology in response to conditions that arise later in our trades? Basically, the filters/factors for the portfolio update will be fixed and merely rescheduled, and we can't change them dynamically at future stages of our trades, can we?
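For what it's worth, the scheduled callback is fixed at initialize() time, but it can branch on state stored in context, so the effective filter can still change as conditions evolve. A rough sketch (the regime flag and filter helpers are made up):

def initialize(context):
    context.defensive = False
    schedule_function(rebalance, date_rules.week_start(),
                      time_rules.market_open(minutes=30))

def rebalance(context, data):
    # The callback is scheduled once, but which filter it applies
    # can depend on state updated elsewhere in the algorithm.
    if context.defensive:
        weights = low_volatility_filter(context, data)  # hypothetical helper
    else:
        weights = momentum_filter(context, data)        # hypothetical helper
    for stock, w in weights.items():
        order_target_percent(stock, w)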

An essential ingredient in the type of strategies described in the webinar is a portfolio optimizer. This is both a science and an art, because different people want to optimize the portfolio weights under different constraints. I think Quantopian should provide us with this flexibility by providing an optimization solver. There is CVXPY in Python, which is free, or there is commercial optimization software available.
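A dollar-neutral optimizer of the kind described is only a few lines in CVXPY; a sketch (the objective and constraint set are illustrative, alphas is a NumPy array of expected returns, and cov must be positive semidefinite):

import cvxpy as cp

def neutral_portfolio(alphas, cov, gross=1.0):
    """Maximize expected alpha less a variance penalty, subject to
    dollar neutrality and a cap on gross exposure."""
    w = cp.Variable(len(alphas))
    objective = cp.Maximize(alphas @ w - cp.quad_form(w, cov))
    constraints = [cp.sum(w) == 0,          # dollar neutral
                   cp.norm(w, 1) <= gross]  # gross exposure cap
    cp.Problem(objective, constraints).solve()
    return w.value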

When will the webinar code be available, along with the new commands?

Hi Karen,

I haven't gotten around to listening to your webinar yet (will do eventually). I figure you might have an internal spec/requirements doc you are working from. Is it something you could share? Or even put up on GitHub for revision control and public comment? Or maybe there are proprietary details you can't share?

Grant

Hi Grant,
Unfortunately, no, I don't have a spec doc. It's all on the whiteboards on the walls of our office. The webinar is the best we can do at this point.

KR

Thanks Karen,

My sense is that while there may be some truly proprietary details that cannot be shared, the behavior may be a carry-over from a start-up "stealth mode" mentality where there is a belief that you'll spring some wonderful new capability on the market, and one-up the competition. My suggestion is that if you want a truly crowd-sourced hedge fund, you'll need to get the crowd involved in the technical details of your development process (and perhaps some of the relevant business rationale and constraints). Aside from github, I bet there are lots of collaborative tools for doing this sort of thing (probably even virtual whiteboards). Since you don't have anything in place for this go-around, maybe it doesn't make sense. But it would be worth some thought as you move forward. I realize that it is unconventional. Most companies these days are struggling to keep their development work private. What if you just laid it all out there for the masses?

Grant

Hi Grant,
Here are the slides from the webinar and I have tried to include screen shots of the examples. Perhaps this will be more useful for you.

We aren't trying to hide anything, or to be stealthy. My reason for doing the webinar was to get the crowd's input on this feature before we built it. It helped me gather a number of examples from users so that we can make sure they are considered while we build it.

Additionally, all the work we currently have in flight on this is being done in zipline, so paying attention to the various ffc branches there will also give you more insight.

KR

Thanks Karen,

I couldn't sort out how to open the slides, but I did listen to the webinar. Some quick feedback:

  • It is not quite correct that trade data is not available in before_trading_start(): if the data are copied to context, it is available, with only one day of bar data missed, at the outset of backtesting/trading (see the sketch after this list).
  • It wasn't clear to what extent the data will be point-in-time. Will the data sets get updated minutely/daily/weekly/monthly?
  • How will you manage start and end dates, without introducing bias (see discussion on https://www.quantopian.com/posts/dealing-with-securities-that-expire)? And ending up with lots of securities "stuck" in a portfolio, when backtesting, mucking things up?
  • Seems like you are just putting up a database, with no additional computing power (e.g. applying an optimization or statistical test across large numbers of securities). It is not clear that users would be able to run anything more complicated than today.
  • Under live trading, it seems like quality control could be a problem. Presumably, you'll be updating the database overnight, so what if something is off? It'd roll into trading the next day? How will this be managed?
  • Will you have any way to gauge the success of this effort? Will you be able to tell which algos use it in the contest? In the fund?
  • Again, I don't really understand how this will allow custom analyses over large numbers of securities. It seems like more of a database plus some additional spreadsheet-like filtering tools.
  • Will there be any way for users to set up their own databases, either by number crunching in the research platform or by uploading the database?
  • Did you talk with anyone who has done stat arb for a living? If so, what was their feedback on your plans for this API?
  • On an unrelated note, I think Jess commented that for the fund, Q will not use IB, but a "prime broker" -- what does this mean? Will this provide any advantage to users not in the fund?
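The caching workaround from the first bullet, roughly (2015-era API; the first session is inevitably missed):

def initialize(context):
    context.last_prices = None

def handle_data(context, data):
    # Stash the latest day of minute bars on context so that the next
    # morning's before_trading_start can see them.
    context.last_prices = history(390, '1m', 'price')

def before_trading_start(context):
    if context.last_prices is None:
        return  # first session: nothing cached yet
    # filter/rank the universe using yesterday's cached prices here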

Grant

Hi Karen,

I posted an example notebook. It takes over 20 minutes to cycle over the approx. 20,000 securities in the Quantopian database with a relatively simple computation. I'm not clear how the new API will manage this sort of thing. Are you anticipating hours of computation time available, say pre-market? Or running as a parallel job, when the market is open? Or something else?

See https://www.quantopian.com/posts/clonable-notebook-example-looping-over-all-quantopian-securities for the example notebook.

Grant

What is the status of this effort?

It is in progress. We have a team of alpha users working with the API and giving us feedback. We hope to get it into the hands of all users as soon as possible.

Stay tuned for updates soon.

KR