Question Regarding the Benchmark

Hi All -

I've posted on the forum on a few occasions regarding observed discrepancies between the backtester in Quantopian and a backtester that I developed. One considerable difference I noticed was in the benchmark.

I noticed (and have posted before) that the benchmark, SPY, underperforms an algo that simply buys and holds SPY. Over the time period below, the algo outperforms by about 20 percentage points, diverging from the benchmark over time. I may have misunderstood earlier responses to this inquiry, but I now understand that the benchmark DOES NOT include any dividend distributions! (See the examples below.) Note that I'm not saying anything about reinvesting dividends or about how the dividends are handled; as far as I can tell, the dividends are completely ignored in the benchmark.

The benchmark is essentially penalized when SPY pays its dividend because the dividend is never added to the total return of the benchmark. Is the benchmark supposed to ignore dividend returns from SPY? This seems to err on the side of making algos look like they outperform a buy-and-hold strategy.

SO LONG STORY SHORT: Shouldn't the benchmark include dividends (either reinvested or not)?

13 responses

So in this algo, SPY is purchased on the first day and held throughout the simulation. Dividends are collected but are NOT reinvested.

January 3, 2002 - Simulation start. SPY price = $116.84 / share
July 12, 2013 - Simulation end. SPY price = $167.51 / share
Price return = $50.67 / share = 43.37%
Dividends paid = $26.997 / share
Total return = $77.667 / share = 66.47%
(Note the algo return is close to 66% and the benchmark is close to 43%. So benchmark is price return only?!?)
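
The arithmetic above can be checked with a short Python sketch. It uses only the per-share figures quoted in this post; the variable and function names are illustrative and not part of any Quantopian API.

```python
# Per-share figures quoted in the post (USD).
start_price = 116.84     # SPY on January 3, 2002
end_price = 167.51       # SPY on July 12, 2013
dividends_paid = 26.997  # cash dividends collected, not reinvested


def simple_return(gain, cost):
    """Gain expressed as a fraction of the initial cost."""
    return gain / cost


price_gain = end_price - start_price
price_return = simple_return(price_gain, start_price)
total_return = simple_return(price_gain + dividends_paid, start_price)

print(f"Price return: {price_return:.2%}")
print(f"Total return: {total_return:.2%}")
```

The gap between the two numbers is simply the dividends divided by the starting price, which is the part a price-only benchmark leaves out.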

[Attached backtest, ID 52140c213ed69a06d7f8ec8d. Created with an older version of the backtester; metrics did not load.]

Cloned from Jiaming Kong - this algo reinvests dividends as they are paid.

Note:
Jan 3, 2002 dividend adjusted price of SPY = $93.44
July 12, 2013 dividend adjusted price of SPY = $167.51
Total return = $74.07 = 79.3%

The algo here returns 73.3%, probably because of a small cash position?
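
The reinvested figure can be checked in the same way. This is a rough sketch that assumes the dividend-adjusted start price already folds every later dividend into the starting value, so a simple return on it approximates total return with reinvestment. The names are illustrative.

```python
# Figures quoted in the post above.
adjusted_start_price = 93.44  # dividend-adjusted SPY price, January 3, 2002
end_price = 167.51            # SPY price, July 12, 2013
algo_return = 0.733           # return reported by the reinvesting algo

reinvested_return = (end_price - adjusted_start_price) / adjusted_start_price
shortfall = reinvested_return - algo_return

print(f"Reinvested total return: {reinvested_return:.1%}")
print(f"Algo shortfall:          {shortfall:.1%}")
```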

[Attached backtest, ID 52140e7d4354bf06c78bd174. Created with an older version of the backtester; metrics did not load.]

Daniel,
The order algorithm only takes a rounded integer number of shares to buy. That's why I manually round the numbers before ordering.

desired_amount[i] = np.round(context.portfolio.starting_cash / context.m / prices[i])

That said, you will have a small cash position. If you look at the log output of the algorithm above, the algo sometimes receives a cash dividend and buys 90 shares of SPY, then buys 1 more share the next day because SPY dropped just enough to make it affordable.
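
The cash drag from whole-share orders can be illustrated with a small sketch. Note that it rounds down with `math.floor` rather than using `np.round` as the algo does, so an order never exceeds the available cash; that choice, the function name, and the $1,000 figure are assumptions for illustration only.

```python
import math


def buy_whole_shares(cash, price):
    """Buy as many whole shares as the cash allows.

    Returns (shares_bought, leftover_cash).
    """
    shares = math.floor(cash / price)
    return shares, cash - shares * price


# Hypothetical example: $1,000 of dividend cash at the July 12, 2013 SPY price.
shares, leftover = buy_whole_shares(cash=1000.0, price=167.51)
print(shares, round(leftover, 2))
```

Whatever is left over sits in cash and earns nothing until the next order, which is one plausible reason a reinvesting algo trails the dividend-adjusted return.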

@Jiaming - Absolutely, that makes complete sense. I fully understand your algo and I think it works perfectly. My goal is to understand discrepancies between different tests that I've run - essentially to understand the caveats and assumptions within Quantopian.

Most of us watch SPY as a benchmark to gauge the performance of our algorithm (this may actually be a mistake but that is a different discussion) and if you were to buy-and-hold SPY you would return: ~43% on price return, ~66% with dividends, and ~79% with dividends reinvested. Quantopian has decided to use only the price return - which explains why algos that underperform SPY in my backtester (which uses reinvested dividend return) outperform in Quantopian. Their assumption is not outright wrong but it is a very, very important caveat for investors looking to "beat" the market.

I was really hoping to hear from one of the Quantopian staff on whether the price return was intentionally used or whether the omission of dividends was something they were going to change in the future. Overall, I'm very happy to have made some progress in resolving discrepancies between backtesters.

Great thread, thank you for starting it.

This wasn't a choice that we made with a great deal of thought. The original thought was "We need a benchmark. People have different needs, so we should let them choose their own benchmark. But we don't have a benchmark-chooser feature yet, so let's go with a simple obvious benchmark. SPY sounds good, right?" And so the choice was made. We didn't really consider the SPY dividends or how to apply them.

You've definitely convinced me that our choice had deeper implications than I realized. I agree, we should be using a total-returns value for our benchmark.

I think our path going forward probably has three parts. First is to label what we have better, which should mitigate the problem for now. The second part is to make the benchmark configurable per algorithm, so people can choose the right benchmark for their algo. The third part is to make the default benchmark a smarter choice, like a total-return S&P 500, rather than a price return. I don't have a timeframe on that right now - I need to do some spec work, estimation, and adjust our roadmap.

No worries Dan.

I'm feeling much better overall. For roughly the past three weeks I've been pulling my hair out trying to find the differences between our in-house (less sophisticated, more buggy, and feature-limited) backtester and Quantopian. One by one I've nailed down a few differences in the assumptions that have been made, and I'm seeing things converge. As I'm sure you folks know, when you approach any computer problem with multiple strategies or executions and the answers converge, it's a really good thing!

Hello Dan D.,

I would vote for a total-return benchmark as the priority. Following Daniel's comments about buy-and-hold SPY as a benchmark I see that that gives a 23% or so return from Jan-08 to Aug-13 against the built-in benchmark of 13% or so. The current benchmark significantly 'flatters' users' algos.

P.

@Peter -
Agreed. In the meantime, you can run a separate simulation that buys and holds SPY over the same backtest period, then compare the returns of that SPY buy-and-hold "algo" with those of your own algo. It is an extra step, but at least you can actually compare the algo to SPY (or to your own benchmark).

The ideal case (in my opinion) would be for the user to construct a customized buy-and-hold portfolio to use as a benchmark in the initialize function, possibly consisting of as many as 5 securities. This way, if you are algorithmically constructing portfolios (which is what I do), you can construct a benchmark that is a blend of SPY and AGG, weighted to reflect the risk tolerance objective of the portfolio. Or, if you are building an algo focused on emerging markets, you can select an emerging-markets benchmark. There are literally hundreds of appropriate benchmarks depending on the focus of the strategy.
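
As a rough sketch of the blended-benchmark idea: for a buy-and-hold portfolio that is never rebalanced, the period return is the sum of each security's period return weighted by its initial allocation. The function name, the 60/40 weights, and the AGG return below are made up for illustration; only the SPY figure comes from this thread.

```python
def blended_return(weights, period_returns):
    """Period return of a buy-and-hold blend that is never rebalanced.

    weights: initial allocation per symbol, summing to 1.
    period_returns: total return per symbol over the same period.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * period_returns[symbol] for symbol, w in weights.items())


weights = {"SPY": 0.6, "AGG": 0.4}            # hypothetical allocation
period_returns = {"SPY": 0.6647, "AGG": 0.50}  # AGG value is a placeholder
benchmark_return = blended_return(weights, period_returns)
print(f"Blended benchmark return: {benchmark_return:.2%}")
```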

On a related note, regarding dividends - be aware, Peter, that if you are watching for a price drop below a moving average, as is done in the sample algo, you get false buy and sell signals on dividend ex-dates. This also means moving averages (or any algo based on prices over multiple trading days) could incorporate prices pre- and post-dividend ex-date and would therefore give incorrect signals. For my algos, I generate a csv of dividend-adjusted data and fetch_csv that, use it to dictate buy/sell signals, and then use Quantopian's order function (which buys/sells at non-dividend-adjusted prices and handles dividends as events).
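
One common way to build such dividend-adjusted data is proportional back-adjustment: every close before an ex-date is scaled by 1 - dividend / previous close. The sketch below assumes that method (the post does not say which one was used), and the prices are invented to make the effect obvious.

```python
def back_adjust(closes, dividends):
    """Scale closes before each ex-date so the dividend drop disappears.

    closes: list of unadjusted closing prices, oldest first.
    dividends: dict mapping the ex-date's index in `closes` (must be >= 1)
        to the cash dividend per share.
    """
    adjusted = list(closes)
    for ex_index, dividend in dividends.items():
        if ex_index < 1:
            raise ValueError("ex-date needs a prior close")
        # Factor is based on the unadjusted close just before the ex-date.
        factor = 1.0 - dividend / closes[ex_index - 1]
        for i in range(ex_index):
            adjusted[i] *= factor
    return adjusted


# Invented prices: a $1 dividend goes ex on day 2, nothing else happens.
raw_closes = [100.0, 100.0, 99.0, 99.0]
adjusted_closes = back_adjust(raw_closes, {2: 1.0})

raw_drop = raw_closes[2] / raw_closes[1] - 1.0
adjusted_drop = adjusted_closes[2] / adjusted_closes[1] - 1.0
print(f"Raw ex-date move: {raw_drop:.2%}, adjusted: {adjusted_drop:.2%}")
```

On the raw series the ex-date looks like a 1% drop that could push the price under a moving average; on the adjusted series the price is flat, so no signal fires.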

Hello Daniel,

Your posts have been very informative over the last few weeks and have made me realise that the built-in benchmark understates returns by a substantial percentage depending on the length of the backtest. I'm now running a SPY buy-and-hold for the backtest period as a workaround. I like the customised 'portfolio' benchmark idea.

That's a very good point about dividends that I hadn't picked up on. It would be great if the Quantopian data could include dividend dates that are available to the algo.

P.

Not to be a pest, but the concern should be a risk-adjusted benchmark. If there is anything worth leveraging for this early on, it might be to test an algorithm that is inflation adjusted against a benchmark that is inflation adjusted. Technically, the benchmark should be strategy neutral. An adjustable benchmark would be wonderful, but what we may be better off asking for is the means to use past, template, and user algorithms as a benchmark.

Along the lines of "What the benchmark should measure", I'm curious if others think it would make sense to start only after any pre-roll of data is complete.

When using @batch_transform functions, history, moving averages and the like, my calculations really can't begin until enough days have passed. However, the benchmark starts up right from day one, which doesn't seem like a 1:1 kind of comparison.
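
A minimal sketch of the comparison being suggested: compound the benchmark's daily returns only from the first day the algo is able to trade. The function name, the daily returns, and the two-day warm-up are hypothetical.

```python
def cumulative_return(daily_returns, start=0):
    """Compound daily returns from index `start` to the end of the list."""
    growth = 1.0
    for r in daily_returns[start:]:
        growth *= 1.0 + r
    return growth - 1.0


# Hypothetical benchmark returns; the algo needs 2 days of data before trading.
benchmark_daily = [0.10, 0.0, 0.0, 0.05]
warmup_days = 2

from_day_one = cumulative_return(benchmark_daily)
after_warmup = cumulative_return(benchmark_daily, start=warmup_days)
print(f"From day one: {from_day_one:.1%}, after warm-up: {after_warmup:.1%}")
```

If the benchmark moves sharply during the warm-up window, the two figures can differ a lot even though the algo never had a chance to participate.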

Picking up on this thread, I agree it would be nice if the benchmark were total return. I offer not only a workaround but something that might be a best practice anyway. I don't mean this to be THE way to run a backtest, but a practice to include in the course of examining a strategy, particularly long-only strategies. Run the strategy with a short position representing the benchmark. The return will become an excess return and the standard deviation will become a tracking error. Obviously, one isn't limited to SPY, but might consider an equal-, cap-, or volume-weighted portfolio of the stocks traded. To some extent this is common. The Fama-French factors (high minus low, big minus small, etc.) are effectively long-short portfolios.
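
The same quantities can be computed offline from two return series. This sketch assumes an equal-notional short in the benchmark and ignores borrowing and financing costs, so each period's long-short return is just the strategy return minus the benchmark return. The function name and the sample returns are invented.

```python
import statistics


def active_stats(strategy_returns, benchmark_returns):
    """Mean excess return and tracking error (sample std dev) per period."""
    if len(strategy_returns) != len(benchmark_returns):
        raise ValueError("series must have the same length")
    active = [s - b for s, b in zip(strategy_returns, benchmark_returns)]
    return statistics.mean(active), statistics.stdev(active)


# Invented per-period returns for illustration.
strategy = [0.02, -0.01, 0.03]
benchmark = [0.01, -0.02, 0.01]

excess, tracking_error = active_stats(strategy, benchmark)
print(f"Mean excess return: {excess:.4f}, tracking error: {tracking_error:.4f}")
```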

Closing the loop on this - the change is done, and the benchmark is now total return.