Backtester Change: Updated Benchmark

Short version: We've changed the benchmark used in backtesting. When you re-run an algorithm through the backtester, you should expect different results for the benchmark and risk calculations.

Long version: A few months ago it was pointed out in this thread that we were using a price-based benchmark rather than a returns-based benchmark. When we thought about it, it was pretty clear we'd chosen the wrong one. If you're comparing your algorithm to a benchmark, you need to look at the returns of that benchmark. This evening we changed the benchmark being used in the backtester.

The effect of this is that most backtest results will look a little different than they did before. The benchmark will look better, which means the algorithm results won't look comparatively as good, and the risk metrics driven by the benchmark will similarly be different. For short backtests, or backtests that didn't include a dividend date, there won't be a change. For an 11-year test, the difference is pretty significant.
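To make the price-based vs. returns-based distinction concrete, here is a minimal sketch. It assumes the gap comes from dividends being reinvested at the close on the day they are paid; the function names and the sample prices and dividend are hypothetical and are not taken from the backtester.

```python
def price_return(prices):
    """Return from the price change alone, ignoring dividends."""
    return prices[-1] / prices[0] - 1.0


def total_return(prices, dividends):
    """Return with each cash dividend reinvested at that day's close.

    dividends[i] is the cash dividend per share paid on day i (0.0 if none).
    Reinvesting at the close is an assumption made for this illustration.
    """
    shares = 1.0
    for price, dividend in zip(prices, dividends):
        # Turn the cash dividend into additional shares at the close.
        shares += shares * dividend / price
    return shares * prices[-1] / prices[0] - 1.0


# Hypothetical data: four daily closes with a single dividend on the third day.
prices = [100.0, 102.0, 100.0, 104.0]
dividends = [0.0, 0.0, 2.0, 0.0]

pr = price_return(prices)             # price change only
tr = total_return(prices, dividends)  # price change plus reinvested dividend
```

With no dividend in the window the two functions return the same number, which is consistent with short backtests not changing.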

We know that having a stable backtester is important, and having an accurate backtester is also important. This is one of those times where the goals are in conflict, and being correct is more important than being consistent. We work very hard to make changes like this as unusual as possible. We're sorry for any inconvenience this change causes.

Example: The backtest below buys a bunch of SPY and holds it. You can see that the returns match pretty closely. The tiny differences in the returns are driven by the real-world effects of a strategy vs. a benchmark.

• The benchmark is assumed to be 100% invested at the market open, while the algorithm has to place an order in the first bar and be filled in the second bar.
• The algorithm can't be 100% invested because it can't hold partial shares. It always has a small cash position, and that cash has 0% return.
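The cash-drag effect in the second bullet can be sketched as follows. This is a rough illustration, not the backtester's fill logic: the capital, prices, and helper names are hypothetical, and the one-bar fill delay from the first bullet is not modelled.

```python
import math


def whole_share_position(capital, price):
    """Buy as many whole shares as the capital allows; the rest stays in cash."""
    shares = math.floor(capital / price)
    cash = capital - shares * price
    return shares, cash


def portfolio_return(capital, buy_price, end_price):
    """Return of a buy-and-hold position where leftover cash earns 0%."""
    shares, cash = whole_share_position(capital, buy_price)
    end_value = shares * end_price + cash
    return end_value / capital - 1.0


# Hypothetical numbers: $10,000 of capital, bought at $130, ending at $143 (+10%).
capital = 10_000.0
buy_price = 130.0
end_price = 143.0

shares, cash = whole_share_position(capital, buy_price)
algo_ret = portfolio_return(capital, buy_price, end_price)
bench_ret = end_price / buy_price - 1.0  # benchmark assumed 100% invested
```

Here the algorithm holds 76 shares and $120 of idle cash, so it returns slightly less than the fully invested benchmark. The gap shrinks as capital grows relative to the share price.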

Still to Come: One of our highly-requested features is to make the benchmark customizable. The change we shipped today gets us a long way towards that - when we put in the new benchmark, we laid the foundation for future changes on the fly. I don't have a delivery target yet but I'm hoping we can finish it sooner rather than later.

Edit 31-Jan-14: Shipped another version of the benchmark - benchmark now even closer to the fully-invested SPY return.

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

7 responses

In the Risk Metrics, the Max Drawdown of 18% looks surprisingly low for the SP500. I cloned this algorithm and ran it with $1m to minimize any effects of uninvested cash. The max value was $1,449,350 on 2007-Oct-09 and the subsequent min value was $653,860 on 2009-Mar-09. That makes the drawdown amount equal to $795,490, or 54.9% of the strategy's prior peak.

Hi Colin,

That sounds like a bug on our end - thanks for letting us know! We'll look into it and I'll let you know when it's fixed.

Drawdown should be 1 - (price / cummax(price))

where price is the value of the strategy at the close of the measurement period (e.g. daily).

Here is a simple implementation, but the calculation is easy to do using the pandas.DataFrame.cummax function.
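As a minimal sketch of the formula above using pandas, the snippet below applies `cummax` to a series of portfolio values. The peak and trough values are the ones quoted earlier in this thread; the other points and the function names are made up for illustration.

```python
import pandas as pd


def drawdown(values: pd.Series) -> pd.Series:
    """Fractional drawdown at each point: 1 - value / running peak."""
    running_peak = values.cummax()
    return 1.0 - values / running_peak


def max_drawdown(values: pd.Series) -> float:
    """Largest peak-to-trough loss as a positive fraction."""
    return float(drawdown(values).max())


# Peak (1,449,350) and trough (653,860) are from the post above;
# the remaining values are hypothetical filler.
values = pd.Series(
    [1_000_000.0, 1_449_350.0, 1_200_000.0, 653_860.0, 900_000.0]
)

mdd = max_drawdown(values)
print(f"max drawdown: {mdd:.1%}")
```

The same expression works column-wise on a DataFrame, since `DataFrame.cummax` computes the running peak for each column.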

Hello Alisa,

Is this confirmed as a bug? If so, does it affect both the algo and benchmark drawdown calculations? And when will it be fixed?

P.

Hi Peter,

We're digging into this issue and I'll let you know when we get to the bottom of it. Thanks to Colin for pointing it out. I'll post any updates here.

Cheers,
Alisa

Taking a quick look at the cumulative.py file in the zipline/finance/risk folder, I would focus on the following lines of code:

402: if self.current_max < self.compounded_log_returns[self.latest_dt]:
403:     self.current_max = self.compounded_log_returns[self.latest_dt]

It appears as if the inequality sign in line 402 is reversed (i.e. I believe it should be >).

Lines 409-411 do the drawdown calculation on log returns, and this looks correct to me. This returns a positive drawdown number (e.g. 0.25 = 25% DD).
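For reference, here is a standalone sketch of a running-maximum drawdown computed on compounded log returns. It is not the code from cumulative.py: the function name, the `-inf` starting peak, and the sample values are assumptions for illustration. In this sketch the peak is raised whenever the stored maximum is below the latest value, and the drawdown comes back as a positive fraction.

```python
import math


def max_drawdown_from_log_returns(compounded_log_returns):
    """Max drawdown (positive fraction) from cumulative log returns."""
    current_max = float("-inf")  # assumed starting value for the running peak
    max_dd = 0.0
    for cur in compounded_log_returns:
        # Raise the running peak when the latest value exceeds it.
        if current_max < cur:
            current_max = cur
        # exp(cur - peak) is the ratio value / peak, so 1 - ratio is the loss.
        dd = 1.0 - math.exp(cur - current_max)
        max_dd = max(max_dd, dd)
    return max_dd


# Hypothetical portfolio values converted to cumulative log returns.
values = [100.0, 120.0, 90.0, 110.0]
log_returns = [math.log(v / values[0]) for v in values]

mdd = max_drawdown_from_log_returns(log_returns)
```

The drop from 120 to 90 gives a drawdown of 0.25, i.e. 25%.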