Question Regarding the Benchmark

Hi All -

I've posted on the forum on a few occasions regarding observed discrepancies between the backtester in Quantopian and a backtester that I developed. One considerable difference I noticed was in the benchmark.

I noticed (and have posted before) that the benchmark, SPY, underperforms the algo when the algo is to buy and hold SPY. The algo outperforms by about 20%, in the time period below, diverging from the benchmark with time. I may have misunderstood responses to this inquiry in the past but I now understand that the benchmark DOES NOT include any dividend distributions! (see examples below) Note that I'm not saying anything about reinvesting dividends or about how the dividends are handled - the dividends are completely ignored in the benchmark as far as I can tell.

The benchmark is essentially penalized when SPY pays its dividend because the dividend is never added to the total return of the benchmark. Is the benchmark supposed to ignore dividend returns from SPY? This seems to err on the side of making algos look like they outperform a buy-and-hold strategy.

SO LONG STORY SHORT: Shouldn't the benchmark include dividends (either reinvested or not)?

13 responses

So in this algo, SPY is purchased on the first day and held throughout the simulation. Dividends are collected but are NOT reinvested.

January 3, 2002 - Simulation start. SPY price = $116.84 / share
July 12, 2013 - Simulation end. SPY price = $167.51 / share
Price return = $50.67 / share = 43.37%
Dividends paid = $26.997
Total return = $77.667 = 66.47%
(Note the algo return is close to 66% and the benchmark is close to 43%. So benchmark is price return only?!?)
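
For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes both figures from the per-share numbers quoted in this post. The variable names are mine and it is plain Python, not Quantopian API code.

```python
# Per-share figures quoted in the post (SPY).
start_price = 116.84     # January 3, 2002
end_price = 167.51       # July 12, 2013
dividends_paid = 26.997  # cash dividends collected, not reinvested

# Price return ignores dividends entirely.
price_return = (end_price - start_price) / start_price

# Total return adds the cash dividends to the price gain.
total_return = (end_price - start_price + dividends_paid) / start_price

print("Price return: {:.2%}".format(price_return))
print("Total return: {:.2%}".format(total_return))
```

The gap between the two numbers is just the dividends divided by the starting price, about 23 percentage points over this period.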

#Use SPY 15 day MA to determine when to go to cash
#When not in cash buy and hold the most undervalued sectors

import datetime
import math
import numpy
import pandas

def initialize(context):
    #
    #
    #Variables for later
    context.day=None
    context.pauseUntil=0
    context.timestep=0
    context.margin_req=0 
    context.totalShorts=0
    context.totalLongs=0
    context.cash=0
    context.oldRanks=None
    context.timer=0
    context.holdNmanyDays=63
    context.liquid=1
          
    #
    #
    #Set constraints on borrowing
    context.pct_invested_threshold=1 #Set limit on percent invested (as a decimal)
    context.init_margin=.01 #Set initial margin requirement    
    context.maint_margin=.01 #Set the maintenance margin requirement   


    #
    #
    #Read universe
    context.SPY=sid(8554)
    context.Sectors=[sid(19662), sid(19659), sid(19656), sid(19661), sid(19655), sid(19658), sid(19660), sid(19654), sid(19657)]  
    
 
    

def handle_data(context, data):

    update_newFrame(context,data)
    
    if context.timestep == 1:
        log.debug("Timestep: "+str(context.timestep)+" buying SPY")
        nshares=context.portfolio.cash*0.99/data[sid(8554)].price
        generate_order(sid(8554), nshares, context, data)
        

    
#
#
#    Supporting Functions
#
#

    

def update_newFrame(context, data):
    #
    context.cash = context.portfolio.cash
    context.portvalue = context.portfolio.positions_value
    context.totalShorts=0
    context.totalLongs=0 #reset longs too, otherwise the total accumulates on every bar
    for sym in data.keys():
        if context.portfolio.positions[sym].amount < 0:
            context.totalShorts += (context.portfolio.positions[sym].amount * data[sym].price)
        else:
            context.totalLongs += (context.portfolio.positions[sym].amount * data[sym].price)

    update_portvals(context)
    
    #Handle assigning the timestep number (1 day is 1 timestep)
    if get_datetime().day != context.day: #look for a new day
        context.day=get_datetime().day
        context.timestep += 1
        context.timer += 1
        #log.info ( "Cash: "+str(context.cash)+"; Margin Req: "+str(context.margin_req)+" Avail Cash:"+str(context.cash - context.margin_req) )
        if context.timestep>context.pauseUntil:
            if context.cash < context.margin_req: #check for margin calls daily
                generate_marginCall(context, data)
            
    
def update_portvals(context):
    #Update account information when this function is called
    context.total_equity = context.cash + context.portvalue
    context.pct_invested = (context.totalLongs-context.totalShorts) / context.total_equity
    context.pct_cash = context.cash / context.total_equity
    context.margin_req = abs(context.totalShorts * context.maint_margin)
    
        
def generate_order(sym, size, context, data):    
    #log.info("Call to generate_order")
    if size>0: #Opening a long position    

        #log.info("Buy long "+str(size)+" shares "+str(sym) )
        #log.info("Cash = "+str(context.cash)+"; Current Margin Req.="+str(context.margin_req) )                 

        #Is there enough available cash to buy the position
        if (context.cash-context.margin_req) < size * data[sym].price:
            #log.info("Trade canceled : Insufficient funds.")
            return

        #Deduct price from cash and add to portfolio value
        context.cash -= size * data[sym].price
        context.portvalue += size * data[sym].price
        context.totalLongs += size * data[sym].price
        update_portvals(context)

        #Abort the transaction if the percent invested is greater than the threshold
        #before slippage and commissions
        if context.pct_invested > context.pct_invested_threshold:
            context.cash += size * data[sym].price
            context.portvalue -= size * data[sym].price
            context.totalLongs -= size * data[sym].price
            update_portvals(context)

#            if size>100:
#                log.info("Re-generating order for "+str(size*context.pct_invested_threshold)+" instead of "+str(size))
#                generate_order(sym,size*context.pct_invested_threshold, context,data)
#            return
        
        #Abort the transaction if the investment would generate a margin call
        if context.cash < context.margin_req:
#            log.info("Invest would generate a margin call")
            context.cash += size * data[sym].price
            context.portvalue -= size * data[sym].price
            context.totalLongs -= size * data[sym].price
            update_portvals(context)
            return
    
        order(sym,size)

    else: #Opening a short position
        
        #log.info("Generating a short order for "+str(size)+" shares of "+str(sym)+" and context.cash="+str(context.cash)+" and context.margin_req="+str(context.margin_req) )
        #Is there at least enough available cash to cover the initial maintenance margin
        if (context.cash-context.margin_req) < abs(size * data[sym].price * context.init_margin):
            #log.info("Trade canceled")
            return
        
        #Deduct price from cash and add to portfolio value (note that size is negative)
        context.cash -= size * data[sym].price
        context.portvalue += size * data[sym].price
        context.totalShorts += size * data[sym].price
        update_portvals(context)
        
        #Abort the transaction if the percent invested is greater than the threshold
        #before slippage and commission
        if context.pct_invested > context.pct_invested_threshold:
            context.cash += size * data[sym].price
            context.portvalue -= size * data[sym].price
            context.totalShorts -= size * data[sym].price
            update_portvals(context)
            #log.info("Trade canceled")
            return
            
        #Abort the transaction if the investment would generate a margin call
        if context.cash < context.margin_req:
            context.cash += size * data[sym].price
            context.portvalue -= size * data[sym].price
            context.totalShorts -= size * data[sym].price
            update_portvals(context)
            #log.info("Trade canceled")
            return
        
        order(sym,size)
            
        
def generate_marginCall(context,data):
    #This function should be coded to address margin calls
    #log.info("Margin call")
    return(0)

def liquidate_position(sym,context):
    if context.portfolio.positions[sym].amount != 0:
        log.debug("Liquidating "+str(context.portfolio.positions[sym].amount)+" shares of position "+str(sym))
        order(sym, -context.portfolio.positions[sym].amount)
        

Cloned from Jiaming Kong - this algo reinvests dividends as they are paid.

Note:
Jan 3, 2002 dividend adjusted price of SPY = $93.44
July 12, 2013 dividend adjusted price of SPY = $167.51
Total return = $74.07 = 79.3%

The algo here returns 73.3%, probably because of a small cash position?
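
As a rough check, the reinvested return follows directly from the two dividend-adjusted prices quoted in this post. The sketch is plain Python with my own variable names; the 73.3% is the algo result mentioned above, and I can only assume that cash drag accounts for the gap.

```python
# Dividend-adjusted SPY prices quoted in the post.
adjusted_start = 93.44  # Jan 3, 2002 (adjusted for later dividends)
end_price = 167.51      # July 12, 2013

# With adjusted prices, the simple price ratio already reflects reinvestment.
reinvested_return = end_price / adjusted_start - 1.0

algo_return = 0.733     # backtest result quoted in the post
gap = reinvested_return - algo_return

print("Reinvested return: {:.1%}".format(reinvested_return))
print("Gap to the algo:   {:.1%}".format(gap))
```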

import numpy as np
import datetime

def initialize(context):

    # context.stocks = [sid(19662),sid(19659),sid(19655),sid(19656),sid(19661),sid(19657),sid(19654),sid(19658),sid(19660), sid(8554)]
    context.stocks = [sid(8554)]
    context.m = len(context.stocks)
    context.b_t = np.ones(context.m) / context.m
    context.eps = 1  #change epsilon here
    context.init = True
    context.counter = 0
    
    set_slippage(slippage.VolumeShareSlippage(volume_limit=0.25, price_impact=0))
    set_commission(commission.PerShare(cost=0))
    
def handle_data(context, data):    
    context.counter += 1
    if context.counter == 1:
        buyAndHold(context, data)
    else:
        ReInvest(context, data)

def buyAndHold(context, data):
    for i, stock in enumerate(context.stocks):
        prices = np.zeros_like(context.b_t)
        desired_amount = np.zeros_like(prices)
        prices[i] = data[stock].price
        desired_amount[i] = np.round(context.portfolio.starting_cash / context.m / prices[i])
        log.info("Bought {ticker} @ {price} for {amount} shares".format(ticker = stock, price = prices[i], amount = desired_amount[i]))
        order(stock, desired_amount[i])

def ReInvest(context, data):
    for i, stock in enumerate(context.stocks):
        prices = np.zeros_like(context.b_t)
        desired_amount = np.zeros_like(prices)
        prices[i] = data[stock].price
        desired_amount[i] = np.round(context.portfolio.cash / context.m / prices[i])
        if desired_amount[i] >0:
            log.info("Bought {ticker} @ {price} for {amount} shares".format(ticker = stock, price = prices[i], amount = desired_amount[i]))
            order(stock, desired_amount[i])

Daniel,
The order function only accepts a whole number of shares. That's why I manually round the number before placing the order.

desired_amount[i] = np.round(context.portfolio.starting_cash / context.m / prices[i])

That being said, you will always hold some cash. If you look at the log output of the above algorithm, sometimes the algo receives a cash dividend and buys 90 shares of SPY, then buys 1 more share the next day because SPY has dropped just enough to be affordable.
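
To illustrate the leftover cash, here is a minimal sketch in plain Python (not Quantopian API code). The helper `whole_share_order` and the $10,000 starting cash are hypothetical. It rounds down rather than using `np.round`, which can round up and ask for slightly more than the available cash.

```python
import math

def whole_share_order(cash, price):
    """Return (shares, leftover_cash) when only whole shares can be bought."""
    shares = int(math.floor(cash / price))  # round down so the order is affordable
    leftover = cash - shares * price        # cash that stays uninvested
    return shares, leftover

# Hypothetical account size; the price is the SPY start price from this thread.
shares, leftover = whole_share_order(cash=10000.0, price=116.84)
print(shares, round(leftover, 2))
```

The leftover cash earns nothing in the simulation, which is one reason a reinvesting algo can trail the dividend-adjusted return slightly.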

@Jiaming - Absolutely, that makes complete sense. I fully understand your algo and I think it works perfectly. My goal is to understand discrepancies between different tests that I've run - essentially to understand the caveats and assumptions within Quantopian.

Most of us watch SPY as a benchmark to gauge the performance of our algorithm (this may actually be a mistake but that is a different discussion) and if you were to buy-and-hold SPY you would return: ~43% on price return, ~66% with dividends, and ~79% with dividends reinvested. Quantopian has decided to use only the price return - which explains why algos that underperform SPY in my backtester (which uses reinvested dividend return) outperform in Quantopian. Their assumption is not outwardly wrong but it is a very, very important caveat for investors looking to "beat" the market.

I was really hoping to hear from one of the Quantopian staff on whether the price return was intentionally used or whether the omission of dividends was something they were going to change in the future. Overall, I'm very happy to have made some progress in resolving discrepancies between backtesters.

Great thread, thank you for starting it.

This wasn't a choice that we made with a great deal of thought. The original thought was "We need a benchmark. People have different needs, so we should let them choose their own benchmark. But we don't have a benchmark-chooser feature yet, so let's go with a simple obvious benchmark. SPY sounds good, right?" And so the choice was made. We didn't really consider the SPY dividends or how to apply them.

You've definitely convinced me that our choice had deeper implications than I realized. I agree, we should be using a total-returns value for our benchmark.

I think our path going forward probably has three parts. First is to label what we have better, which should mitigate the problem for now. The second part is to make the benchmark configurable per algorithm, so people can choose the right benchmark for their algo. The third part is to make the default benchmark a smarter choice, like a total-return S&P 500, rather than a price return. I don't have a timeframe on that right now - I need to do some spec work, estimation, and adjust our roadmap.


No worries Dan.

I'm feeling much better overall. For roughly the past 3 weeks I've been pulling my hair out trying to find the differences between our in-house (less sophisticated, more buggy, and feature-limited) backtester and Quantopian. One by one I've nailed down a few differences in the assumptions that have been made, and I'm seeing things converge. As I'm sure you folks know, when you approach any computer problem with multiple strategies or executions and the answers converge, it's a really good thing!

Hello Dan D.,

I would vote for a total-return benchmark as the priority. Following Daniel's comments about buy-and-hold SPY as a benchmark I see that that gives a 23% or so return from Jan-08 to Aug-13 against the built-in benchmark of 13% or so. The current benchmark significantly 'flatters' users' algos.

P.

@Peter -
Agreed. In the meantime, you can run a separate simulation that buys and holds SPY over the same backtest period, then compare the returns of that SPY buy-and-hold "algo" to your own algo. It is an extra step, but at least you can actually compare the algo to SPY (or to your own benchmark).

The ideal case (in my opinion) would be for the user to construct a customized buy-and-hold portfolio in the initialize function to use as a benchmark, possibly consisting of as many as 5 securities. This way, if you are algorithmically constructing portfolios (which is what I do), you can construct a benchmark that is a blend of SPY and AGG, weighted to reflect the risk tolerance objective of the portfolio. Or, if you are building an algo focused on emerging markets, you can select an emerging-markets benchmark. There are literally hundreds of appropriate benchmarks depending on the focus of the strategy.
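
A minimal sketch of the blended buy-and-hold benchmark idea, in plain Python. The function name, the 60/40 weights, and the per-asset returns are placeholders for illustration, not real figures and not an existing Quantopian feature.

```python
def blended_buy_and_hold_return(weights, total_returns):
    """Return of a buy-and-hold blend with the given initial weights.

    weights:       dict of symbol -> initial portfolio weight (must sum to 1)
    total_returns: dict of symbol -> total return over the backtest period
    No rebalancing is assumed, so the blend's return is the weighted sum.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * total_returns[sym] for sym, w in weights.items())

# Placeholder inputs, for illustration only.
weights = {"SPY": 0.6, "AGG": 0.4}
total_returns = {"SPY": 0.66, "AGG": 0.30}

blended = blended_buy_and_hold_return(weights, total_returns)
print("Blended benchmark return: {:.1%}".format(blended))
```

In practice the per-asset total returns would come from separate buy-and-hold backtests over the same period, as described in the workaround above.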

On a related note, regarding dividends - be aware, Peter, that if you are watching for a price drop below a moving average, as is done in the sample algo, you get false buy and sell signals on dividend ex-dates. This also means moving averages (or any algo based on prices over multiple trading days) could incorporate prices from both before and after a dividend ex-date and would therefore give incorrect signals. For my algos, I generate a csv of dividend-adjusted data, load it with fetch_csv, use it to dictate buy/sell signals, and then use Quantopian's order function (which buys/sells at non-dividend-adjusted prices and handles dividends as events).
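
To show why the ex-date drop produces a false signal, here is a minimal sketch of one common convention, proportional back-adjustment, in plain Python. The helper name and the toy prices are hypothetical, and a data vendor's actual adjustment method may differ.

```python
def back_adjust(closes, dividends):
    """Back-adjust closing prices for cash dividends.

    closes:    list of raw closing prices in date order
    dividends: dict of ex-date index -> cash dividend per share
    Every price before an ex-date is scaled by
    (1 - dividend / raw close on the day before the ex-date).
    """
    adjusted = list(closes)
    for ex_idx in sorted(dividends):
        if ex_idx == 0:
            continue  # no earlier prices to adjust
        factor = 1.0 - dividends[ex_idx] / closes[ex_idx - 1]
        for i in range(ex_idx):
            adjusted[i] *= factor
    return adjusted

# Toy data: the price falls by exactly the $1 dividend on the ex-date (index 2).
raw = [100.0, 100.0, 99.0, 99.0]
adjusted = back_adjust(raw, {2: 1.0})
print(adjusted)
```

On the raw series, a moving-average rule sees a 1% drop on the ex-date. On the adjusted series the price is flat, so no signal fires.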

Hello Daniel,

Your posts have been very informative over the last few weeks and have made me realise that the built-in benchmark understates returns by a substantial percentage, depending on the length of the backtest. I'm now running a SPY buy-and-hold for the backtest period as a workaround. I like the customised 'portfolio' benchmark idea.

That's a very good point about dividends that I hadn't picked up on. It would be great if the Quantopian data could include dividend dates that are available to the algo.

P.

Not to be a pest, but the concern should be a risk-adjusted benchmark. If there is anything worth leveraging for this early on, it might be to test an inflation-adjusted algorithm against an inflation-adjusted benchmark. Technically, the benchmark should be strategy-neutral. An adjustable benchmark would be wonderful, but what we may be better off asking for is the means to use past, template, and user algorithms as a benchmark.

Along the lines of "What the benchmark should measure", I'm curious if others think it would make sense to start only after any pre-roll of data is complete.

When using @batch_transform functions, history, moving averages and the like, my calculations really can't begin until enough days have passed. However, the benchmark starts up right from day one, which doesn't seem like a 1:1 kind of comparison.
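
A minimal sketch of what starting the benchmark after the pre-roll could look like, in plain Python. The daily returns and the 2-day warm-up are made-up numbers, and `cumulative_return` is a hypothetical helper rather than anything in the Quantopian API.

```python
def cumulative_return(daily_returns, start=0):
    """Compound daily returns from index `start` to the end of the series."""
    growth = 1.0
    for r in daily_returns[start:]:
        growth *= 1.0 + r
    return growth - 1.0

# Made-up benchmark daily returns; the first 2 days are the algo's warm-up.
benchmark_daily = [0.01, 0.02, -0.01, 0.005, 0.01]
warmup_days = 2

from_day_one = cumulative_return(benchmark_daily)
after_warmup = cumulative_return(benchmark_daily, start=warmup_days)

print("Benchmark from day one:    {:.2%}".format(from_day_one))
print("Benchmark after warm-up:   {:.2%}".format(after_warmup))
```

If the benchmark happens to rally during the warm-up window, the day-one figure credits it with gains the algo never had a chance to capture.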

Picking up on this thread, I agree it would be nice if the benchmark were total return. I offer not only a workaround but something that might be a best practice anyway. I don't mean this to be THE way to run a backtest, but a practice to include in the course of examining a strategy, particularly long-only strategies. Run the strategy with a short position representing the benchmark. The return will become an excess return and the standard deviation will become a tracking error. Obviously one isn't limited to SPY, but might consider an equal-, cap-, or volume-weighted portfolio of the stocks traded. To some extent this is common. The Fama-French factors (high minus low, big minus small, etc.) are effectively long-short portfolios.
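
The same quantities can be approximated outside the backtester from two daily return series. This is a minimal plain-Python sketch with made-up returns; it ignores borrowing and financing costs, which an actual short position would incur.

```python
import statistics

def excess_and_tracking_error(algo_daily, bench_daily):
    """Mean daily excess return and tracking error (sample std of the differences)."""
    if len(algo_daily) != len(bench_daily):
        raise ValueError("return series must have the same length")
    active = [a - b for a, b in zip(algo_daily, bench_daily)]
    return statistics.mean(active), statistics.stdev(active)

# Made-up daily returns, for illustration only.
algo_daily = [0.010, -0.005, 0.007, 0.002]
bench_daily = [0.008, -0.004, 0.005, 0.003]

mean_excess, tracking_error = excess_and_tracking_error(algo_daily, bench_daily)
print("Mean daily excess return: {:.4%}".format(mean_excess))
print("Daily tracking error:     {:.4%}".format(tracking_error))
```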

Closing the loop on this - the change is done, and the benchmark is now total return.