Scaling contest score by out of sample duration

Many of the contest algos have been running out of sample for several months. They are usually good algos (otherwise they would have been stopped or resubmitted).

I think it would be fairer to either:

  1. Scale the score by the out-of-sample period (a sketch of what this could mean is below), or
  2. Reset the out-of-sample starting point for all algos at the beginning of each contest.

This applies to both contests, but it is more important for the 1-month contest.
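To make option 1 concrete, here is a minimal sketch of one possible weighting scheme. The linear weighting and the six-month cap are my own assumptions for the sake of illustration, not anything Quantopian actually uses:

```python
def scaled_score(raw_score, months_out_of_sample, cap_months=6):
    """Illustrative only: weight the raw contest score by how much
    out-of-sample history backs it, with a cap so very old algos
    don't dominate forever. Linear form and 6-month cap are assumptions."""
    weight = min(months_out_of_sample, cap_months) / float(cap_months)
    return raw_score * weight

# The same raw score counts for more when backed by more out-of-sample data.
print(scaled_score(0.80, 1))  # ~0.13
print(scaled_score(0.80, 3))  # 0.40
print(scaled_score(0.80, 6))  # 0.80
```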

Why?

Metrics calculated over a longer out-of-sample period are more reliable than ones calculated over a shorter period. Obviously, 3-month performance is a lot more reliable than 1-month performance. But with longer run time, drawdown and stability metrics tend to get worse, so algos whose performance is calculated over 3 months or more are very unlikely to win the 1-month contest.

If we want to compare 3-month performance with 1-month performance, there should be some reward for the reliability achieved through the longer run time. Shouldn't there?
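To back up the claim that drawdown-based metrics deteriorate with run time, here is a quick simulation on synthetic returns. The drift and volatility parameters are arbitrary; this is only meant to show the direction of the effect, not Quantopian's actual metrics:

```python
import numpy as np

def max_drawdown(returns):
    """Deepest peak-to-trough loss of the cumulative return curve."""
    curve = np.cumprod(1.0 + returns)
    peaks = np.maximum.accumulate(curve)
    return ((peaks - curve) / peaks).max()

rng = np.random.default_rng(0)
for months in (1, 3, 6):
    days = 21 * months
    # 2000 random "algos" with identical daily drift and volatility.
    dds = [max_drawdown(rng.normal(0.0005, 0.01, days)) for _ in range(2000)]
    print("%d month(s): average max drawdown %.1f%%" % (months, 100 * np.mean(dds)))
```

The average max drawdown deepens as the window grows, even though the underlying return process is identical, which is exactly why longer-running algos look worse on drawdown-based scores.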


This is one of the challenges we've been battling with since the beginning: in a one-month contest, the winner is often someone who just had a "lucky" month. We've done a lot to minimize the role of luck, but it's still there.

Boosting longer-running algorithms is definitely something that we've thought about. I never found a formulation I liked; the cure always ended up being worse than the disease. I haven't found a solution that does what you suggest.

In the end, we decided to resolve it by just removing the "lucky month" problem. We just kicked off our first 6-month contest, and in the future, that's going to be the contest that matters. The long-running algorithms that meet the criteria will be the ones that are rewarded.


Hear, Hear!

Hi Dan,

For 1-month contests, why not simply compare algos over the same period? I.e., for the August contest, all algos would be compared on performance from Aug 1 to Aug 31, regardless of when they were submitted. That would level the playing field and give long-running algos an equal chance of getting a lucky month.

You guys can still keep track of the actual out of sample performance to pick good algos for the fund.
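A rough sketch of what that could look like. The input format and the use of total return as a stand-in for the real contest score are assumptions on my part:

```python
import pandas as pd

def rank_for_contest(daily_returns, start, end):
    """Rank every entrant over the same scoring window, regardless of
    when it was submitted. `daily_returns` is assumed to be a dict of
    algo name -> pd.Series of daily returns indexed by date."""
    scores = {}
    for name, rets in daily_returns.items():
        window = rets.loc[start:end]
        if window.empty:
            continue
        # Total return over the window stands in for the real contest
        # score, which combines several metrics.
        scores[name] = (1.0 + window).prod() - 1.0
    return pd.Series(scores).sort_values(ascending=False)

# e.g. rank_for_contest(all_algo_returns, "2017-08-01", "2017-08-31")
```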

Best


I know this is an old post, but I'm unsure whether this problem has been solved (IMHO it hasn't, as the backtests are only 2 years).

One of the most important questions is what you are actually trying to capture with the "score". Long-term survival? Minimum risk? Short-term volatility-adjusted profit in the current trend environment? I would assume longer-term survival and vol-adjusted returns.

My humble opinion: if your contest lasts only a month and the backtest covers only 2 years, then it is not just hard but impossible to find a scoring method that works. If you have lots of competitors, there is no way to escape survivorship bias, especially if the last 2+ years fall into the same or quite similar market regimes (which is the case at the moment).

I have developed genetic trading algos in the past (and these little things really want to cheat if they can! A few of them exploited a bug in the framework I was using in order to win, for example), and the most reliable selection method was a really long, consistent backtest. I know backtests don't tell you that much, as they can be curve-fitted, but at least they tell you something if you have 10+ (preferably 20+) years of performance that is consistent across different market regimes.

On longer-term tests, one simple fitness score beat all the others I tested and was the most consistent out of sample: "MAR", i.e. annualized profit / max drawdown, almost the same as the "Calmar ratio" you were using at some point, but calculated over the whole period.
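For reference, a minimal implementation of that fitness score as I understand it (annualized return over the whole test divided by the worst drawdown); the exact definition the poster used may differ in details:

```python
import numpy as np

def mar_ratio(daily_returns, periods_per_year=252):
    """MAR ratio: annualized return / max drawdown, both measured over
    the whole period (a close cousin of the Calmar ratio)."""
    returns = np.asarray(daily_returns, dtype=float)
    curve = np.cumprod(1.0 + returns)
    years = len(curve) / float(periods_per_year)
    annualized = curve[-1] ** (1.0 / years) - 1.0
    peaks = np.maximum.accumulate(curve)
    max_dd = ((peaks - curve) / peaks).max()
    return annualized / max_dd if max_dd > 0 else float("inf")

# Example on ten years of synthetic daily returns:
rng = np.random.default_rng(1)
print(mar_ratio(rng.normal(0.0004, 0.01, 252 * 10)))
```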

At the very least, I would consider making the backtest for the competition span a few market regimes and quite a bit longer term (10+ years); otherwise you are in for nasty surprises with real assets and algos that "win" in the current regime but get killed in a downtrend. I could probably make an algo that would be very consistent over the last 2 years + a month, but there is really no point in just aiming to win one competition when the actual target is long-term survival.

Just my 2 cents.

(edit: It seems the contest is not 1 month, it's 6. That's better, but still not free of survivorship bias.)