Great paper on overfitting your backtests

Very interesting paper, but the conclusion left me wanting more. On the one hand, they say that overfitting is unavoidable when selecting parameters from a very large universe of possibilities; on the other, they suggest that more granular data, which creates an even larger universe of possibilities, has likely already been mined by 'large quantitative funds' through their 'expertise and facilities.'

I wonder what specific expertise and facilities they are referring to.

"So when a computer program, such as ours, produces an optimal set of weights, it is selecting from an inconceivably large set of possible weighting sets, and thus statistical overfitting of the backtest data is unavoidable."

"Any underlying actionable information that might exist in such data has long been mined by highly sophisticated computerized algorithms operated by large quantitative funds and other organizations, using much more detailed data (minute-by-minute or even millisecond-by-millisecond records of many thousands of securities), who can afford the expertise and facilities to make such analyses profitable."

And God knows what this says about deep learning and neural nets. Is it all a complete waste of time? If so, what the hell are so many thousands of academics and practitioners wasting their time for?

On the other hand, weather prediction has improved dramatically over the past 70 years, and the atmosphere is just as unfathomable a chaotic system as the stock market. So we all plug on.

Perhaps a slightly irritating piece of research? All negatives and absolutely no positives. Fine, we are all doing it wrong, but these guys offer no answers for us.

In any event, regarding their criticism of using daily and longer-period data as opposed to minute-level data, well, Quantopian addresses that one.

I also happen to feel they are completely and utterly wrong about the use of daily or monthly data for very long-term studies.

In a rare interview on YouTube (https://www.youtube.com/watch?v=QNznD9hMEh0), Jim Simons actually mentions in passing that the Medallion fund has been using machine learning and huge data sets. And they started out in the 1980s ...