How to Hack the Contest (without cheating)

The devil is in the details, of course, but since Quantopian stresses algorithmic simplicity and performance within specified parameters, the problem is basically straightforward.

Step 1: Define a large set of possible data inputs (e.g., price, P/E, volatility) to consider as algorithmic inputs.
Step 2: Define a simple prototype algorithm that can employ a few of these at a time in making trading decisions.
Step 3: Develop your own backtesting engine and buy historical stock data from data providers.
Step 4: Rent computing power from a cloud provider so you can run thousands of backtests.
Step 5: Use machine learning to find the best parameters and inputs for your prototype algorithm.
Step 6: Enter your winning algorithm(s) in the contest.
Step 7: Profit. ;)
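For concreteness, steps 4–6 could be sketched roughly like this. Everything here is invented for illustration: the "factors" are random noise and toy_backtest stands in for a real backtesting engine run against purchased historical data.

```python
# Hypothetical sketch of steps 4-6: brute-force search over factor
# combinations using a toy backtest. The factor data and returns are
# random stand-ins for real historical data.
import itertools
import random

random.seed(42)

N_DAYS = 250
FACTORS = {name: [random.gauss(0, 1) for _ in range(N_DAYS)]
           for name in ("price_momentum", "pe_ratio", "volatility", "volume")}
RETURNS = [random.gauss(0.0003, 0.01) for _ in range(N_DAYS)]

def toy_backtest(factor_names):
    """Score a combination: mean daily PnL of a naive sign-following rule."""
    pnl = []
    for day in range(N_DAYS):
        signal = sum(FACTORS[f][day] for f in factor_names)
        pnl.append(RETURNS[day] * (1 if signal > 0 else -1))
    return sum(pnl) / len(pnl)

# Try every pair of inputs and keep the best-scoring one.
best = max(itertools.combinations(FACTORS, 2), key=toy_backtest)
print("best pair:", best)
```

In practice the search space and number of backtests would be far larger, which is exactly where the rented cloud computing comes in.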

This would be neither easy nor free, but it is how I would approach the contest if I really needed to win. The intention here is to speculate about ways that someone could use external resources to increase the odds of winning the contest. Anyone have other ideas?


James, what you are describing sounds like overfitting. If you train and tune a model using an extensive parameter search over historical data, you are likely to end up with a model that would have been very good at predicting equity returns in that particular window of time. However, this usually has either no benefit or a negative effect on predicting returns going forward. I encourage you to check out The Dangers of Overfitting, a lecture which explains what overfitting is and why you should be cautious when fitting parameters for a model.

In the contest, the scoring is based entirely on the out-of-sample performance of each submission. In order to succeed in the contest, an algorithm needs to perform well after it was constructed. The allocation evaluation process also looks at out-of-sample performance. If an algorithm exhibits different behavior between the in-sample and out-of-sample periods, it's usually a sign of overfitting. We are looking for algorithms that are consistently profitable, among other things.

In response to "Quantopian stresses algorithmic simplicity and performance within specified parameters", I don't think that specifying parameters like beta neutrality and dollar neutrality makes a trading strategy any simpler. In fact, I would expect a strategy fitting this profile to be more complex, since it has to control for both of these risk factors.

There are certainly limits that exist on the Quantopian platform. We are working hard to improve the platform and remove these limits as fast as we can. Recently, we doubled the memory available to research notebooks and algorithms. We are also working on increasing the number of datasets available on the platform, and speeding up access to these data. In the meantime, if you have data that you want to use that's not yet available on the platform, you can import it using local_csv in research, or Fetcher in an algorithm.


Thank you for the answer. I know what overfitting is and the limitations of machine learning in that respect. But since my post was vague, I see why you may have thought as you did.

The feature that seems to be missing from Quantopian is the ability to interact with the backtester through code. Is that correct?

I would like to be able to create and run backtests dynamically through code and receive the output variables as function output. For example, in pseudo-code:

Stored_risk_parameters = Start_backtest(algorithm, Start_date, End_date)
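Fleshing that pseudo-code out, a hypothetical Python version might look like the following. Quantopian exposes no such API; start_backtest here is just a stub standing in for the imagined "run a backtest from code" entry point, and the returned statistics are made up.

```python
# Hypothetical programmatic-backtest interface (not a real Quantopian API).
from datetime import date

def start_backtest(algorithm, start, end):
    """Stub: pretend to run `algorithm` over [start, end] and return stats."""
    return {"sharpe": 1.2, "beta": 0.05, "max_drawdown": -0.08}

def my_algorithm(context, data):
    pass  # trading logic would go here

stored_risk_parameters = start_backtest(my_algorithm,
                                        date(2015, 1, 1), date(2016, 1, 1))
print(stored_risk_parameters["sharpe"])
```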

The license agreement prohibits using automated means to interact with Quantopian, so it appears that if I wanted to run backtests automatically to guide dynamic algorithm generation, I would have to build my own platform.

Hi James,

Yes, I think I have a better understanding of what you're saying, thanks for clarifying. You're right that there's no automated way to kick off backtests right now.

Out of curiosity, would you use the automated backtests to perform a parameter search? It's not clear to me that this type of fitting would help improve an algo for the reasons I mentioned above. One way that it might help, however, is if you were conducting a sensitivity analysis on parameters. Is that what you would be looking at?

I might package a wide range of data inputs into interchangeable modules and then run tests looking for combinations of modules that perform the best.

For example, if you have normalized data X1 and X2, one could try buying securities with a high X1/X2 ratio and comparing performance with the S&P 500.

You could take 100 data inputs and normalize them as X1-X100, and then test them in various combinations.

Then, if you found several pairs of data that give some results (X1/X2 = P1, X12/X6 = P2), you could try combining them: (P1 + P2)/2, or P1/P2, etc.
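A toy version of that normalize-and-combine step, with random numbers standing in for real per-security data inputs (note the ratios are shifted so denominators stay positive, since z-scores can be negative):

```python
# Illustrative only: normalize random "data inputs" to z-scores, form
# ratio signals P1 and P2 as described above, and average them.
import random
import statistics

random.seed(0)
raw = {f"X{i}": [random.uniform(1, 100) for _ in range(50)] for i in (1, 2, 6, 12)}

def zscore(series):
    mu, sd = statistics.mean(series), statistics.stdev(series)
    return [(v - mu) / sd for v in series]

X = {name: zscore(vals) for name, vals in raw.items()}

def ratio_signal(a, b):
    # Shift both sides by 3 so the denominator cannot hit zero or flip sign.
    return [(3 + ai) / (3 + bi) for ai, bi in zip(X[a], X[b])]

P1 = ratio_signal("X1", "X2")
P2 = ratio_signal("X12", "X6")
combined = [(p1 + p2) / 2 for p1, p2 in zip(P1, P2)]
```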

When you find sources of alpha then you put them into a hedged algorithm:

Find securities with maximum ((X1/X7)+(X3/X5)+(X34/X2))/3 = STOCKS

Then create a hedged portfolio with these securities.

Something like that.
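The hedging step above could be sketched as follows: rank securities by the combined score, go long the top quintile and short the bottom quintile with equal dollar weights, so net exposure is approximately zero. The scores here are random placeholders for the X-ratio signal.

```python
# Sketch of a dollar-neutral hedged portfolio built from a combined score.
import random

random.seed(1)
scores = {f"STOCK{i}": random.gauss(0, 1) for i in range(20)}

ranked = sorted(scores, key=scores.get, reverse=True)
n = len(ranked) // 5                           # quintile size
longs, shorts = ranked[:n], ranked[-n:]

weights = {s: 0.5 / n for s in longs}          # +50% gross long
weights.update({s: -0.5 / n for s in shorts})  # -50% gross short

net = sum(weights.values())                    # ~0: dollar neutral
gross = sum(abs(w) for w in weights.values())  # 1.0: fully invested
```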

Also, if backtesting could be done in code, it would be possible to use the backtester inside a live trading algorithm. For example, one could write an algorithm that switches between several models based on short-term backtests to determine market conditions.
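A toy sketch of that switching idea, with two invented models scored over a trailing window of a synthetic price series (both models and the data are made up for illustration):

```python
# Regime switching via short lookback "backtests": score each candidate
# model on the trailing window and trade whichever scored best.
import random

random.seed(2)
prices = [100.0]
for _ in range(120):
    prices.append(prices[-1] * (1 + random.gauss(0, 0.01)))

def momentum(window):       # buy if the recent trend is up
    return 1 if window[-1] > window[0] else -1

def mean_revert(window):    # bet the last move reverses
    return -1 if window[-1] > window[-2] else 1

models = {"momentum": momentum, "mean_revert": mean_revert}

def trailing_pnl(model, hist, lookback=20):
    """Mini 'backtest': apply the model day by day over the lookback."""
    pnl = 0.0
    for t in range(len(hist) - lookback, len(hist) - 1):
        pos = model(hist[: t + 1][-lookback:])
        pnl += pos * (hist[t + 1] - hist[t]) / hist[t]
    return pnl

best_model = max(models, key=lambda m: trailing_pnl(models[m], prices))
```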

Hi James,

Those are some cool ideas, thanks for sharing.

For factor combination, I wonder if you could achieve something similar by using Alphalens in research? I did something similar in another thread, where I conducted a search over a lookback window space for a particular factor. Of course, Alphalens isn't exactly the same as a backtest, but it's a good test of the predictive power of a factor, or a particular combination of factors. The backtest is more of a simulation of how the signal holds up under real-world conditions like commission and slippage.
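For readers unfamiliar with it, one core thing Alphalens measures is the information coefficient, i.e. the rank correlation between factor values and forward returns. A rough, dependency-free approximation on synthetic data (real usage would feed per-security factor values and next-period returns into Alphalens itself):

```python
# Spearman rank correlation between a factor and forward returns,
# approximating the IC check that Alphalens reports. Assumes no ties.
import random

random.seed(3)

def rankdata(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0] * len(xs)
    for r, i in enumerate(order):
        ranks[i] = r + 1
    return ranks

def spearman_ic(factor, fwd_returns):
    rf, rr = rankdata(factor), rankdata(fwd_returns)
    n = len(rf)
    d2 = sum((a - b) ** 2 for a, b in zip(rf, rr))
    return 1 - 6 * d2 / (n * (n**2 - 1))

factor = [random.gauss(0, 1) for _ in range(100)]
# Forward returns weakly driven by the factor, plus noise.
fwd = [0.3 * f + random.gauss(0, 1) for f in factor]
ic = spearman_ic(factor, fwd)
```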

The computing power available on Quantopian would limit the space you can look over, but you could certainly perform some form of a coarse search.

I still want to add a word of warning against overfitting. You sound like you know what you're doing, but for someone who is a bit newer to the concept, it's important to understand that the more you search a space for combining factors or fitting models, the more likely it is that the system is overfit.

The idea of doing an on-the-fly parameter search is an interesting one. I can see why this might have some value, but I would also think you'd have to have a good sense of the decay rate of a particular signal. If you wait too long to re-fit the model, the signal might be gone, and if you do it too quickly, I imagine it would be overfit. Either way, it sounds interesting. While it might not work on a large number of factors, you might be able to do a coarse search on a small group of factors, maybe just to verify that there's something that could work.

My recommendation would be to start with Alphalens, and maybe try rolling the simulation window as well as a parameter going into the factor.
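A bare-bones illustration of that rolling re-fit structure. The "model" here is just a windowed mean as a stand-in; LOOKBACK and REFIT_EVERY are the knobs trading off signal decay against overfitting discussed above.

```python
# Rolling re-fit: re-estimate parameters on a trailing window every
# REFIT_EVERY steps. The data and "fit" are placeholders.
import random

random.seed(4)
data = [random.gauss(0, 1) for _ in range(300)]

def fit(window):
    """'Fit' = mean of the window (stand-in for a real model fit)."""
    return sum(window) / len(window)

LOOKBACK, REFIT_EVERY = 60, 20
params = []
for t in range(LOOKBACK, len(data), REFIT_EVERY):
    params.append((t, fit(data[t - LOOKBACK:t])))
# params[i] holds the parameter that would drive trading until the next re-fit.
```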

I notice on that thread there is a user asking for similar functionality (the ability to programmatically run backtests) as I was. I see how Alphalens should be able to do something similar as far as testing data goes, though not so much for testing other aspects of a trading algorithm; there may be ways to work around that. At any rate, this seems like a great place to learn and try out ideas quickly, so that is what I am doing here. Thanks for your input!