Great paper on overfitting your backtests

Very interesting paper, but I felt the conclusion left me wanting more. On the one hand, they say that overfitting is unavoidable when selecting parameters from a very large universe of possibilities. On the other, they suggest that more granular data, which creates an even larger universe of possibilities, has likely already been mined by 'large quantitative funds' through their 'expertise and facilities.'

I wonder what specific expertise and facilities they are referring to.
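
For anyone who wants to see the "very large universe of possibilities" problem in miniature, here is a rough sketch. It is not the paper's program, just a toy under my own assumptions: the returns are pure Gaussian noise (so there is no edge to find), the weightings are drawn at random, and the names `sharpe`, `portfolio_returns` and `best_of_many` are made up for illustration. It picks the weighting with the best in-sample Sharpe ratio and then scores that same weighting on fresh noise.

```python
import random
import statistics


def sharpe(returns):
    """Annualised Sharpe ratio of daily returns (risk-free rate assumed 0)."""
    sd = statistics.pstdev(returns)
    return 0.0 if sd == 0 else statistics.mean(returns) / sd * 252 ** 0.5


def portfolio_returns(weights, asset_returns):
    """Daily portfolio return for a fixed weight vector."""
    return [sum(w * r for w, r in zip(weights, day)) for day in asset_returns]


def best_of_many(n_trials, n_assets=5, n_days=250, seed=0):
    """Try n_trials random weightings on noise; return the winner's
    (in-sample Sharpe, out-of-sample Sharpe)."""
    rng = random.Random(seed)

    def noise_block():
        # Assumption: zero-mean returns with 1% daily volatility, no signal.
        return [[rng.gauss(0.0, 0.01) for _ in range(n_assets)]
                for _ in range(n_days)]

    in_sample, out_sample = noise_block(), noise_block()

    best_w, best_is = None, float("-inf")
    for _ in range(n_trials):
        w = [rng.uniform(-1.0, 1.0) for _ in range(n_assets)]
        s = sharpe(portfolio_returns(w, in_sample))
        if s > best_is:
            best_w, best_is = w, s

    # The "optimal" weights are then scored on data they never saw.
    return best_is, sharpe(portfolio_returns(best_w, out_sample))


if __name__ == "__main__":
    for trials in (1, 20, 200):
        is_sr, oos_sr = best_of_many(trials)
        print(f"{trials:>4} trials: in-sample {is_sr:.2f}, out-of-sample {oos_sr:.2f}")
```

With a fixed seed the in-sample Sharpe can only go up as the number of trials grows, because each larger run includes every weighting the smaller run tried. The out-of-sample figure has no reason to follow it, since the data contain no signal by construction.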

"So when a computer program, such as ours, produces an optimal set of weights, it is selecting from an inconceivably large set of possible weighting sets, and thus statistical overfitting of the backtest data is unavoidable."

"Any underlying actionable information that might exist in such data has long been mined by highly sophisticated computerized algorithms operated by large quantitative funds and other organizations, using much more detailed data (minute-by-minute or even millisecond-by-millisecond records of many thousands of securities), who can afford the expertise and facilities to make such analyses profitable."

And god knows what this says about deep learning and neural nets. Is it all a complete waste of time? If so, what the hell are so many thousands of academics and practitioners wasting their time for?

On the other hand, weather prediction has improved dramatically over the past 70 years, and the weather is just as unfathomable a chaotic system as the stock market. So we all plug on.

Perhaps a slightly irritating piece of research? All the negatives and absolutely no positives. Fine, we are all doing it wrong, but these guys have no answers for us.

In any event, regarding their criticism of using daily and longer data as opposed to minute-by-minute data: Quantopian addresses that one.

I also happen to feel they are completely and utterly wrong about the use of daily or monthly data for very long-term studies.

In a rare interview on YouTube, Jim Simons actually mentions in passing that the Medallion fund has been using machine learning and huge data sets. And they started out in the 1980s ...