Is there a reason why the top-ranking algos have terrible backtests? Could it be that they're betting on one quick day trade? For example, this algo only makes 2% annually but also has a terrible backtest with even lower annual yields.

Why can't Quantopian make a contest that gives more weight to the variables that actually matter for professional money, like max drawdown and backtest results? What hedge fund wants an algo that can only make 2% a year in good conditions?

13 responses

Risk-adjusted return is the only thing that matters. 2% a year with a Sharpe ratio of 7.0 means that Quantopian can dynamically leverage your fund to generate 20% a year at 10x leverage without much pain. That said, you can't conclude anything from just a few days of data. The ranking will become more accurate as time passes.
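The arithmetic behind this claim can be sketched quickly. The numbers (2% return, 7.0 Sharpe, 10x leverage) come from the post above; the key point is that leverage scales return and volatility by the same factor, so the Sharpe ratio itself is unchanged. This deliberately ignores financing costs, which the replies below rightly bring up.

```python
# Hypothetical numbers from the thread: a 2%/year, Sharpe-7.0 strategy.
annual_return = 0.02
sharpe = 7.0
annual_vol = annual_return / sharpe     # implied volatility, ~0.29%

# Apply 10x leverage: return and volatility both scale linearly,
# so the ratio of the two (the Sharpe) is unchanged.
leverage = 10
lev_return = leverage * annual_return   # 20% a year
lev_vol = leverage * annual_vol         # ~2.9%
lev_sharpe = lev_return / lev_vol       # still 7.0 (pre-cost)

print(lev_return, lev_vol, lev_sharpe)
```

Of course, this only holds before margin interest and assuming returns scale linearly, which is exactly what the next posts dispute.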

Really? 10x leverage? Ha! How would you even pay the 2-8% margin fee? And what about the risk metrics going up as the leverage goes up? How do you know it's not already using leverage? If it were so feasible, why wouldn't the OP use 10x leverage on the version submitted? I'm sorry, it's really not that easy.

What do you have to say about the backtest then?

Really? 10x leverage? Ha! How would you even pay the 2-8% margin fee?

The leverage was a hypothetical scenario. The margin structure applies differently to Quantopian since they have better access to money. Please read more about the Quantopian contest rules: https://www.quantopian.com/open/rules

And what about the risk metrics going up as the leverage goes up?

Only your volatility and max drawdown should increase with leverage if your risk-adjusted returns are consistent. Hypothetically, your high return and high Sharpe ratio will help balance the metrics, so your ranking should still be good even with the increased risk.
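To make the drawdown part of this concrete, here is a minimal sketch with an invented daily return series: levering each period's return by 10x makes the compounded max drawdown roughly (though not exactly, because of compounding) 10x worse.

```python
def max_drawdown(returns):
    """Worst peak-to-trough loss of the compounded equity curve."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst

# Invented daily returns, purely for illustration.
daily = [0.001, -0.002, 0.0015, -0.001, 0.002]

mdd_1x = max_drawdown(daily)
mdd_10x = max_drawdown([10 * r for r in daily])
print(mdd_1x, mdd_10x)   # the levered drawdown is ~10x the unlevered one
```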

If it were so feasible, why wouldn't the OP use 10x leverage on the version submitted? I'm sorry, it's really not that easy.

There's a limit to leverage in the contest, and if the contest ranks people correctly, increasing leverage won't do squat if your risk-adjusted returns are poor, since it'll affect all your metrics. If your example fund is to be believed with its 7.0 Sharpe ratio, it's an exceptional algorithm worthy of 1st place.

What do you have to say about the backtest then?

As I've said, you're drawing conclusions from just a few days of data compared to the backtest. The contest runs for six months, and the initial metrics and rankings will swing wildly before they stabilize.

Even 6 months is a pretty darn short period of time. It's just a little blip, to compare with the backtest, to get some sense that things aren't too out of whack. My hunch is that Quantopian will need many years of real-money trading before they'll be eligible for big institutional money. There's a lot of money sloshing around, though, so the wait could be worth it.

Grant,

Kyle Foster asked Q, "why the top ranking algos have terrible backtests?"

Rank   Metric              Value
858    Annual Returns      0.06656%
18     Annual Volatility   0.4190%
850    Sharpe              0.1612
17     Max Drawdown        -0.4414%
804    Stability           0.1683
796    Sortino Ratio       0.2372
16     Beta                0.0006087
       Correlation         -4.164%

I remember your words:
"They need a fund that looks like a bank CD, except with higher return," from the "It's All About That Beta" discussion on May 21, 2015.

You're absolutely right about a CD-like investment.
Not only will the Sharpe ratio get rank 1, as StDev = 0:
the Calmar ratio will get rank 1, as max drawdown = 0,
volatility will get rank 1, as it will be zero,
beta will get rank 1, as it will be zero,
the stability "of doing nothing" will be ranked in the top ten,
and the consistency "of doing nothing" will be near 1.
And it doesn't matter whether your return rank is 100 or 300, because six non-return factors
have already opened the door to Quantopian hedge fund management.
Being exposed to the market is not only a risk; it is an opportunity to make money.
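The argument above is about the rank-averaging scheme the contest rules describe ("an overall rank by averaging the Participant's rank in each criterion"). A toy sketch with invented per-metric ranks shows how a flat, CD-like algo can sweep the risk metrics and come out ahead of an algo that actually makes money:

```python
# Hypothetical per-metric ranks (1 = best), invented for illustration.
# cd_like: terrible return rank, but "perfect" risk ranks from doing nothing.
# real_alpha: best return, middling everywhere else.
ranks = {
    "cd_like":    [300, 1, 1, 1, 1, 1],
    "real_alpha": [1, 100, 100, 100, 100, 100],
}

# Overall score = average of per-metric ranks; lower is better.
avg = {name: sum(r) / len(r) for name, r in ranks.items()}
print(avg)   # cd_like averages ~50.8, real_alpha 83.5: doing nothing wins
```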

Do you think an annual return of 0.06656% is higher than a bank CD?
Do you think an algo with an annual return of 0.06656% should ever be top-ranked in an 8% bull market?
Or maybe the contest ranking system needs to be fixed?
Maybe it is time to replace that "fig leaf counts indicator" with the Information Ratio, which may solve the problem?
The question "why the top ranking algos have terrible backtests?" is still open.
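For readers unfamiliar with the proposed metric: the Information Ratio is the mean excess return over a benchmark divided by the tracking error (the standard deviation of those excess returns), so a do-nothing algo earns no excess return and scores poorly. A minimal sketch with invented return series:

```python
import statistics

# Invented daily return series, purely for illustration.
algo      = [0.001, 0.002, -0.001, 0.0015, 0.001]
benchmark = [0.0005, 0.0015, -0.002, 0.001, 0.0005]

# Excess return per period, relative to the benchmark.
excess = [a - b for a, b in zip(algo, benchmark)]

# Information Ratio = mean excess return / tracking error.
ir = statistics.mean(excess) / statistics.stdev(excess)
print(round(ir, 3))
```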

If I understand the contest rules correctly, the winner is judged solely on his six months of live trading performance, and not on his backtest. I haven't kept up with the details... is this correct? From the rules, we have "Just like our fund selection method, the Participant's algorithm will be judged on a combination of backtest performance and paper trading performance," but I don't see how the combination is done. In fact, it says "We will calculate an overall rank by averaging the Participant's rank in each criterion in paper trading." So, is the backtest considered at all?

The simple answer to the question may be that if the 2-year backtest results are not used to determine the current rank, then the ranking will be a roll of the dice.

But then if the backtest results are used, one runs into the "over-fitting" problem.

I think if Quantopian knew how to align the contest rules with the Q fund, they'd have done it by now. In fact, they tried, and it didn't work out so well. It used to be that the winner got $100K in seed money. They've switched to a less transparent, but presumably better, process for seeding fund algos. As the contest stands now, it is kind of a parallel promotional/motivational effort, with an unknown relationship to the actual Q fund algo selection process.

I agree. The contest leaderboard is not meaningful after only trading 9 days. At this moment, the algorithm has the 2nd-best paper trading score, but if you look at the backtest score, the algorithm is ranked 499th. I'm pretty sure you'll see the live performance slip over the length of the contest. Also, please note that algorithms with leverage that exceeds 3x are disqualified. I certainly agree that our ranking method is imperfect. I'm interested in revising the rules/judging sometime next year.

Hi Dan - Regarding your comment:

I certainly agree that our ranking method is imperfect. I'm interested in revising the rules/judging sometime next year.

It would be worth considering whether you could devise a way to make this effort a community project, perhaps putting out a request-for-comment post/document to kick things off. This would be one way to head off the weeping and gnashing of teeth over the whole thing. If there is significant community ownership and some kind of collective consensus that the rules and judging make sense, then you might end up with a better result. As a side note, for Contest 19, the leader appears to have pretty consistent results, backtest to live trading: https://www.quantopian.com/leaderboard/19/573de28db77d9afa9400016f The algo was submitted on May 19, 2016 11:58 AM. For this sample of one, the contest seems to be working.
I would also be interested to hear when Quantopian plans on adding the "Stability of Return" metric to backtesting results. Also, does the 6-month contest period assume a cold start, or a running start?

It seems to me that the contest rules will have to continue to change over time if the goal is to supply the fund with algos that are uncorrelated to those already in the fund. The challenge is that once the fund has sufficient diversification in the algos, they need to dynamically allocate across them based on market and economic conditions.

Two years of backtest and six months of OOS trading for any single algo is enough. It doesn't need to perform well over a ten-year horizon; that's the expectation for the Q fund. Building predictive models outside of a trading context, 3-6 months is as long as I'd go without re-validating the model. The top-ranked entries in the most recently started contests typically fall off after 3 months. As others have pointed out, this is mostly a result of overfitting. A good reality check on this would be to run a two-year backtest starting three years ago, then a six-month backtest ending at the present.

I think it would be much more helpful if there were some kind of one-to-one correspondence between the contest and the algos that are evaluated and subsequently funded with Q fund seed capital (and larger allocations, as the data become available). This was how the contest started, but the correspondence has become murky, in my opinion. Originally, winning contest algos were awarded $100K seed money directly, but now it is not even clear whether any contest algos have been funded at all. Q now has a stable of Q fund algos that presumably have backtests consistent with their >6-month out-of-sample results, and in some cases, real-money track records. So, it would be feasible to evaluate contest algos relative to this baseline. For example, there is the "Low Correlation to Peers" requirement.
So, one could imagine that contest algos would be compared to the Q fund stable of algos, with whatever correlation metric is used to accept/reject Q fund algos. This might help inform the crowd regarding the degree to which their algos need to be unique.
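The "reality check" suggested above (a two-year backtest starting three years ago, plus a six-month backtest ending at the present) amounts to choosing two date windows; a minimal stdlib sketch of that setup, with the window lengths taken from the post:

```python
from datetime import date, timedelta

today = date.today()

# In-sample window: two years of backtest, starting three years ago.
insample_start = today - timedelta(days=3 * 365)
insample_end = insample_start + timedelta(days=2 * 365)

# Out-of-sample window: the most recent six months, ending today.
oos_start = today - timedelta(days=182)

print("in-sample:    ", insample_start, "to", insample_end)
print("out-of-sample:", oos_start, "to", today)
```

An algo whose in-sample metrics hold up over the recent out-of-sample window is less likely to be overfit; a large gap between the two is the warning sign the post describes.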