Quantopian Leaderboard

Is there a reason why the top-ranking algos have terrible backtests? Could it be that they're betting on one quick day trade? For example, this algo only makes 2% annually but also has a terrible backtest with even lower annual yields.

Why can't Quantopian make a contest that gives more weight to the variables most relevant to professional money, like max drawdown and backtest results? What hedge fund wants an algo that can only make 2% a year in good conditions?

13 responses

Risk-adjusted return is the only thing that matters. 2% a year with a Sharpe ratio of 7.0 means that Quantopian can dynamically leverage your fund to generate 20% a year at 10x leverage without much pain. With that said, you can't draw conclusions from just a few days of data. The ranking will become more accurate as time passes.
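The arithmetic behind this claim can be sketched in a few lines, using the hypothetical numbers from the post (2% annual return, Sharpe 7.0) and assuming a risk-free rate of zero: leverage multiplies both the return and the volatility, so the Sharpe ratio stays constant while the absolute return grows.

```python
# Hypothetical figures from the discussion above; not Quantopian's actual
# methodology. With a risk-free rate of zero, Sharpe = return / volatility.
annual_return = 0.02
sharpe_ratio = 7.0
annual_vol = annual_return / sharpe_ratio   # implied volatility, ~0.29%

for leverage in (1, 5, 10):
    lev_return = annual_return * leverage   # leverage scales the return...
    lev_vol = annual_vol * leverage         # ...and the volatility equally,
    # so the ratio of the two (the Sharpe ratio) is unchanged.
    print(f"{leverage:>2}x: return {lev_return:.1%}, vol {lev_vol:.2%}, "
          f"Sharpe {lev_return / lev_vol:.1f}")
```

At 10x, the 2% return becomes 20% while the Sharpe ratio remains 7.0; financing costs, which the reply below raises, are ignored here.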

Really? 10x leverage? Ha! How would you even pay the 2-8% margin fee? And what about the risk metrics going up as the leverage goes up? How do you know that it's not already using leverage? If it were so feasible, then why wouldn't the OP use 10x leverage on the version submitted? I'm sorry, it's really not that easy.

What do you have to say about the backtest then?

Really? 10x leverage? Ha! How would you even pay the 2-8% margin fee?

The leverage was a hypothetical scenario. The margin structure applies differently to Quantopian since they have better access to capital. Please read more about the Quantopian contest rules:
Also read this about their fund:

And what about the risk metrics going up as the leverage goes up?

Only your volatility and max drawdown should increase with leverage if your risk-adjusted returns are consistent. Hypothetically, your high return and high Sharpe ratio will help balance the metrics, so your ranking should still be good even with the increased risk.
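A toy simulation illustrates this point, assuming a hypothetical daily return series and a risk-free rate of zero: multiplying the returns by a leverage factor scales the volatility and deepens the max drawdown, but leaves the Sharpe ratio untouched.

```python
import numpy as np

# One simulated trading year of hypothetical daily returns.
rng = np.random.default_rng(42)
daily = rng.normal(0.0005, 0.004, 252)

def annualized_sharpe(returns):
    # Risk-free rate assumed zero.
    return float(np.mean(returns) / np.std(returns) * np.sqrt(252))

def max_drawdown(returns):
    # Largest peak-to-trough decline of the compounded equity curve.
    equity = np.cumprod(1 + returns)
    peak = np.maximum.accumulate(equity)
    return float(np.max(1 - equity / peak))

for lev in (1, 3):
    r = daily * lev   # crude leverage model: scale every daily return
    print(f"{lev}x: Sharpe {annualized_sharpe(r):.2f}, "
          f"max drawdown {max_drawdown(r):.2%}")
```

The Sharpe ratio is identical at both leverage levels because mean and standard deviation scale by the same constant; the drawdown grows more than proportionally because losses compound.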

If it were so feasible, then why wouldn't the OP use 10x leverage on the version submitted? I'm sorry, it's really not that easy.

There's a limit on leverage in the contest, and if the contest ranks people correctly, increasing leverage won't do squat if your risk-adjusted returns are poor, as it'll affect all your metrics. If your example fund is to be believed with its 7.0 Sharpe ratio, it's an exceptional algorithm worthy of 1st place.

What do you have to say about the backtest then?

As I've said, you're drawing conclusions from just a few days of data compared to the backtest. The contest runs for 6 months, and the initial metrics and rankings will swing wildly before they stabilize.

Even 6 months is a pretty darn short period of time. It's just a little blip, to compare with the backtest, to get some sense that things aren't too out of whack. My hunch is that Quantopian will need many years of real-money trading before they'll be eligible for big institutional money. There's a lot of money sloshing around, though, so the wait could be worth it.


Kyle Foster asks Quantopian, "why the top ranking algos have terrible backtests?"

Annual Returns
Annual Volatility
Max Drawdown
Sortino Ratio

I remember your words:
"They need a fund that looks like a bank CD, except with higher return," on May 21, 2015, in the "It's All About That Beta" discussion.

And I remember my answer:
You're absolutely right about a CD-like investment.
Not only will the Sharpe Ratio get rank 1, as StDev = 0;
the Calmar Ratio will get rank 1, as Maximum Drawdown = 0,
Volatility will get rank 1, as it will be zero,
Beta will get rank 1, as it will be zero,
Stability "of doing nothing" will be ranked in the top ten,
Consistency "of doing nothing" will be near 1.
And it doesn't matter whether your return rank is 100 or 300, because the 6 non-money-related factors
already open the door to Quantopian Hedge Fund Management.
To be exposed to the market is not only a risk; it is an opportunity to make money.
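The degenerate "do nothing" case described above can be checked directly. In this sketch (the market series is a hypothetical stand-in), a flat equity curve scores perfectly on every risk metric while making no money at all:

```python
import numpy as np

flat = np.zeros(126)   # six months of 0% daily returns: "doing nothing"

# A hypothetical noisy bull-market benchmark for the beta calculation.
rng = np.random.default_rng(0)
market = 0.0006 + rng.normal(0.0, 0.002, 126)

volatility = float(np.std(flat))                 # 0.0 -> best possible rank

equity = np.cumprod(1 + flat)                    # equity curve never dips,
peak = np.maximum.accumulate(equity)
max_drawdown = float(np.max(1 - equity / peak))  # so drawdown is 0.0

# Beta of a constant series is exactly zero: its covariance with
# anything is zero.
beta = float(np.cov(flat, market)[0, 1] / np.var(market, ddof=1))

print(volatility, max_drawdown, beta)            # all zero, return also zero
```

Every risk metric comes out at its ideal value, which is exactly why a ranking built only from these factors rewards sitting in cash.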

Do you think an Annual Return of 0.06656% is higher than a bank CD?
Do you think an algo with an Annual Return of 0.06656% should ever be top ranked in an 8% bull market?
Or maybe the contest ranking system needs to be fixed?
Maybe it is time to replace that "fig leaf counts indicator" with the Information Ratio, which may solve the problem?
The question "why the top ranking algos have terrible backtests?" is still open.
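The suggested fix is worth sketching. The Information Ratio divides active return over a benchmark by tracking error, so "doing nothing" in a rising market scores sharply negative instead of topping the risk-metric ranks. This assumes daily returns and an annualization factor of 252; the benchmark series is a deterministic toy:

```python
import numpy as np

def information_ratio(algo, benchmark, periods_per_year=252):
    # Active return per unit of tracking error, annualized.
    active = np.asarray(algo) - np.asarray(benchmark)
    return float(np.mean(active) / np.std(active) * np.sqrt(periods_per_year))

flat = np.zeros(126)                  # the "do nothing" algo
bull = np.tile([0.002, -0.001], 63)   # toy up-trending benchmark, +0.05%/day

print(information_ratio(flat, bull))  # negative: flat underperforms the market
```

Unlike volatility or max drawdown, this metric cannot be gamed by sitting in cash, because underperforming the benchmark is penalized directly.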

If I understand the contest rules correctly, the winner is judged solely on his 6 months of live trading performance, and not on his backtest. I haven't kept up with the rules; is this correct? From the rules, we have "Just like our fund selection method, the Participant's algorithm will be judged on a combination of backtest performance and paper trading performance," but I don't see how the combination is done. In fact, it says "We will calculate an overall rank by averaging the Participant's rank in each criterion in paper trading." So, is the backtest considered at all?

The simple answer to the question may be that if the 2-year backtest results are not used to determine the current rank, then the ranking will be a roll of the dice.

But then if the backtest results are used, one runs into the "over-fitting" problem:

In the end, Quantopian should now have some data indicating whether the contest, as constructed, is useful. Last I heard, they'd funded 17 algorithms (with their own seed money) and are working to scale up to take on external capital. That's a decent sample. So did any of those algos come from contest entries? How did they rank? What was the relationship between the backtest and the live trading results? How are things faring with real money? Personally, I'd be more interested in ranking my algos against the current Q fund algos, than the contest entries.

I think if Quantopian knew how to align the contest rules with the Q fund, they'd have done it by now. In fact, they tried, and it didn't work out so well. It used to be that the winner got $100K in seed money. They've switched to a less transparent, but presumably better process for seeding fund algos. As the contest stands now, it is kind of a parallel promotional/motivational effort, with an unknown relationship to the actual Q fund algo selection process.

As the contest stands now, it is kind of a parallel promotional/motivational effort, with an unknown relationship to the actual Q fund algo selection process.

I agree.

The contest leaderboard is not meaningful after only trading 9 days.

At this moment, the algorithm has the 2nd best paper trading score. But if you look at the backtest score, the algorithm is ranked 499th. I'm pretty sure you'll see the live performance slip over the length of the contest.

Also, please note that algorithms with leverage that exceeds 3X are disqualified.

I certainly agree that our ranking method is imperfect. I'm interested in revising the rules/judging sometime next year.



Hi Dan -

Regarding your comment:

I certainly agree that our ranking method is imperfect. I'm interested in revising the rules/judging sometime next year.

It would be worth considering whether you could devise a way to make this effort a community project, perhaps putting out a request-for-comment post/document to kick things off. This would be one way to head off the weeping and gnashing of teeth over the whole thing. If there is significant community ownership and some kind of collective consensus that the rules and judging make sense, then you might end up with a better result.

As a side note, for Contest 19, the leader appears to have pretty consistent results, backtest to live trading:

The algo was submitted on May 19, 2016 11:58 AM. For this sample of one, the contest seems to be working.

I would also be interested to hear when Quantopian plans on adding the "Stability of Return" metric to backtesting results. Also, does the 6-month contest period assume a cold start or a running start?

It seems to me that the contest rules will have to continue to change over time if the goal is to supply the fund with algos that are uncorrelated to those already in the fund. The challenge is that once the fund has sufficient diversification in the algos, they need to dynamically allocate across them based on market and economic conditions.

Two years of backtest and six months of OOS trading for any single algo is enough. It doesn't need to perform well over a ten-year horizon; that's the expectation for the Q fund. Building predictive models outside of a trading context, 3-6 months is as long as I'd go without re-validating the model.

The top-ranked entries in the most recently started contests typically fall off after 3 months. As others have pointed out, this is mostly a result of overfitting. A good reality check would be to run a two-year backtest starting three years ago, then a six-month backtest ending at the present.
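The proposed reality check amounts to a simple walk-forward split: select and tune on an older two-year window, then validate on a disjoint recent six months. A sketch of the window arithmetic, with a hypothetical "present" date:

```python
from datetime import date, timedelta

today = date(2016, 8, 1)                            # hypothetical "present"

# Two-year selection window starting three years ago...
selection_start = today - timedelta(days=3 * 365)
selection_end = selection_start + timedelta(days=2 * 365)

# ...and a six-month holdout window ending at the present.
holdout_start = today - timedelta(days=182)

# The gap between the windows guarantees the holdout is out-of-sample.
assert selection_end < holdout_start

print("selection:", selection_start, "->", selection_end)
print("holdout:  ", holdout_start, "->", today)
```

An algo that looks great on the selection window but falls apart on the holdout is a strong candidate for being overfit.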

I think it would be much more helpful if there were some kind of one-to-one correspondence between the contest and the algos that are evaluated and subsequently funded with Q fund seed capital (and larger allocations, as the data become available). This was how the contest started, but the correspondence has become murky, in my opinion. Originally, winning contest algos were awarded $100K seed money directly, but now it is not even clear if any contest algos have been funded at all. Q now has a stable of Q fund algos that presumably have backtests consistent with their > 6-month out-of-sample results, and in some cases, real-money track records. So, it would be feasible to evaluate contest algos relative to this baseline. For example, there is the "Low Correlation to Peers" requirement. So, one could imagine that contest algos would be compared to the Q fund stable of algos, with whatever correlation metric is used to accept/reject Q fund algos. This might help inform the crowd regarding the degree to which their algos need to be unique.
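One could imagine the screen looking something like the sketch below. The function name, the 0.3 threshold, and the use of plain Pearson correlation are all assumptions here; Quantopian's actual "Low Correlation to Peers" metric is not public.

```python
import numpy as np

def passes_peer_correlation(candidate, peers, threshold=0.3):
    # Hypothetical accept/reject rule: the candidate's daily returns must
    # have |correlation| below the threshold against EVERY existing algo.
    c = np.asarray(candidate, dtype=float)
    return all(
        abs(np.corrcoef(c, np.asarray(p, dtype=float))[0, 1]) < threshold
        for p in peers
    )

# Deterministic toy return series: `b` is exactly uncorrelated with `a`,
# while `a` is perfectly correlated with itself.
a = np.array([1.0, -1.0, 1.0, -1.0])
b = np.array([1.0, 1.0, -1.0, -1.0])

print(passes_peer_correlation(b, [a]))   # True  (correlation is 0)
print(passes_peer_correlation(a, [a]))   # False (correlation is 1)
```

Publishing even a rough version of such a screen against the current fund stable would tell contestants up front whether their strategy is redundant.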

Generally, it would be nice to have some sort of overall metric of how close the algo is to what is needed for the Q fund (e.g. a score of 0-10). Algos that score 8 or higher, for example, would be assured of a 'high-touch' evaluation, with detailed feedback from the Q fund team.

Also, if I'm not mistaken, the contest winners are decided solely upon their paper trading record, correct? But is more weight placed on algos that have longer records? Say I had a contest algo that paper traded for 2 years in the contest, with consistent, decent results. It shouldn't get creamed by someone who gets lucky with 6 months of anomalously good short-term results. Am I missing something, or is there an un-level playing field, that incentivizes short-term "pops" in performance?

Another thought would be to issue some form of ownership in Quantopian and/or the fund, proportional to ranking in the contest. As I understand, there are regulatory limitations on who could invest in the Q fund, but if shares were issued at no risk to the owner, then perhaps the regulations would not apply? Overall, this would be a step in the direction of a more open, collective, crowd-sourced effort. My sense is that a new level of engagement will only be achieved by ceding some power to the crowd. As things stand, there is a very counter-productive top-down feel to the whole enterprise, in my opinion.