Zipline/Quantopian: Major discrepancies using cross-compatible code

Hi all,
My initial objective was to write code that could be copy/pasted between Zipline and Quantopian, so that I can use a Linux/Spyder Python IDE to write and debug it. The code below meets that objective.

Then I compared the backtest results. I was expecting some differences, but I was surprised by their extent (see the logs below for an example).

So I'm wondering: am I doing something wrong, or is this the previously discussed difference between the Quantopian data feed and Yahoo? And if so, what should I make of it?

I've been using Yahoo data successfully for periodic strategies (tactical allocation strategies rebalanced at intervals longer than one week). I'm looking for convincing, rational information so that I can be as confident with Q, but my experience with Yahoo currently biases my thinking.

Any thoughts?
Also, is there a better way to display results in Zipline than the few lines of code I've copied from https://www.quantopian.com/users/5369480afece9e06440000f6 ?

quantopian:
2004-01-02 PRINT Date 2004-01-02 00:00:00+00:00 Switch Nb: 1
2004-09-01 PRINT Date 2004-09-01 00:00:00+00:00 Switch Nb: 2
2004-12-01 PRINT Date 2004-12-01 00:00:00+00:00 Switch Nb: 3
2004-12-01 PRINT Date 2004-12-01 00:00:00+00:00 CAGR = 0.0130861133
2005-04-01 PRINT Date 2005-04-01 00:00:00+00:00 Switch Nb: 4
2005-08-01 PRINT Date 2005-08-01 00:00:00+00:00 Switch Nb: 5
2005-12-01 PRINT Date 2005-12-01 00:00:00+00:00 CAGR = 0.0372192950866

zipline:
Date 2004-01-02 00:00:00+00:00 Switch Nb: 1
Date 2004-05-03 00:00:00+00:00 Switch Nb: 2
Date 2004-09-01 00:00:00+00:00 Switch Nb: 3
Date 2004-12-01 00:00:00+00:00 Switch Nb: 4
Date 2004-12-01 00:00:00+00:00 CAGR = 0.015788
Date 2005-02-01 00:00:00+00:00 Switch Nb: 5
Date 2005-03-01 00:00:00+00:00 Switch Nb: 6
Date 2005-04-01 00:00:00+00:00 Switch Nb: 7
Date 2005-08-01 00:00:00+00:00 Switch Nb: 8
Date 2005-09-01 00:00:00+00:00 Switch Nb: 9
Date 2005-10-03 00:00:00+00:00 Switch Nb: 10
Date 2005-12-01 00:00:00+00:00 CAGR = 0.00133460940886

'''
OBJECTIVE
---------
Have code that is fully compatible between Zipline and Quantopian, and that can be debugged under Linux/Spyder.

-> To investigate: Zipline and Quantopian results show significant discrepancies in returns and in the number of switches between the two instruments of the pair.

STRATEGY
--------
Literature review:
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1917044
Paired-switching for tactical portfolio allocation

Starting from the end of the first full week of the year, we look at the performance of the two equities
over the prior thirteen weeks (the ranking period), and buy the equity that has the higher return during
the ranking period. The position is held for thirteen weeks (the investment period). At the end of the
investment period the cycle is repeated.

Obviously, the number of weeks in the ranking period and the investment period can be varied
(independently) to optimize the strategy for a given pair of equities. Although the inevitable concerns
on over-fitting associated with such an optimization can be addressed by means of an
appropriate cross-validation methodology

IMPLEMENTATION
--------------

SPY / TLT
rebalance period: 1st day of month
ranking period: 4 months
investment period: 1 month
'''

import math
import pandas as pd
import numpy as np

from datetime import datetime
import pytz

def initialize(context):
    context.stocks = [symbol('SPY'), symbol('TLT')]
    context.nbSwitch = 0
    context.periodCount = 0
    context.lookback = 4*21 # 4 months period, 21 trading days per month
    context.Periodicity = 1 # every x period ; 1 means every period

    try:  # check for Quantopian
        schedule_function(ordering_logic,
                          date_rule=date_rules.month_start(),
                          time_rule=time_rules.market_open(hours=0, minutes=15))
    except:  # running Zipline (or my error above)

        context.schedule_function(ordering_logic,
                                  date_rule=date_rules.month_start(),
                                  time_rule=time_rules.market_open(hours=0, minutes=15))

    context.startDate = datetime(2004, 1, 1, 0, 0, 0, 0, pytz.utc)
    context.endDate = datetime(2014, 1, 1, 0, 0, 0, 0, pytz.utc)

def ordering_logic(context, data):

    context.periodCount += 1
    # execute modulo context.Periodicity
    if context.periodCount % context.Periodicity == 0:
        ror = stockMetrics(context, data)
        if (ror[0] > ror[1]):
            allin(0, context, data)
        else:
            allin(1, context, data)

    pass

def handle_data(context, data):
    position = context.portfolio.positions[context.stocks[0]].amount
    record(position=position)
    pass

def allin(stockid, context, data):
    status = context.portfolio.positions[context.stocks[stockid]].amount
    if status > 0:
        # do nothing, we are already invested
        pass
    else:
        context.nbSwitch += 1
        print("Date "+ str(data[context.stocks[0]].datetime) +"   Switch Nb: " +str(context.nbSwitch))
        if (stockid == 0):
            order_target_percent(context.stocks[stockid], 1)
            order_target_percent(context.stocks[1], 0)
        else:
            order_target_percent(context.stocks[stockid], 1)
            order_target_percent(context.stocks[0], 0)

    get_cagr(context, data)
    pass

def get_cagr(context, data):
    if (context.periodCount % 12 == 0):
        # portf_value: Sum value of all open positions and ending cash balance.
        cagr = np.power(context.portfolio.portfolio_value/float(context.portfolio.starting_cash), 1/float(context.periodCount/12) )-1
        print("Date "+ str(data[context.stocks[0]].datetime) +"   CAGR = " +str(cagr))
    pass

def stockMetrics(context, data):
    rateReturn = 0
    std = 0
    # Request history from the last period days
    prices = history(context.lookback, '1d', 'price')
    opens = history(context.lookback, '1d', 'open_price')
    closes = history(context.lookback, '1d', 'close_price')
    # compute returns over the period
    rateReturn = (prices.ix[-1] - prices.ix[0]) / prices.ix[0]
    # compute standard deviation over period
    std = prices.std()
    #record(retSPY=rateReturn[0], retTLT=rateReturn[1])
    return(rateReturn)

def show_results(algo, data, results):
    # NOTE: `br` (benchmark returns) is not defined anywhere in the posted code;
    # it is assumed to be a date-indexed return series set up elsewhere.
    bm_returns = br[(br.index >= algo.startDate) & (br.index <= algo.endDate)]
    results['benchmark_returns'] = (1 + bm_returns).cumprod().values
    results['algorithm_returns'] = (1 + results.returns).cumprod()
    #sharpe = [risk['sharpe'] for risk in algo.risk_report['one_month']]
    #print("Monthly Sharpe ratios: {0}".format(sharpe))

    #print("ideal netpnl: " + str(round(results.ideal[-1], 2)))
    actual = results.portfolio_value - algo.portfolio.starting_cash
    print("actual netpnl: " + str(actual[-1]))

    fig = pl.figure(1, figsize=(8, 10))
    # ax1 was undefined in the posted code; 311 is assumed from the
    # commented-out add_subplot(313, ...) call below.
    ax1 = fig.add_subplot(311)

    results[['algorithm_returns', 'benchmark_returns']].plot(ax=ax1, sharex=True)
    pl.setp(ax1.get_xticklabels(), visible=False)
    pl.legend(loc=0)

    #    data[algo.stocks].plot(ax=ax2, color='blue')
    #    pl.setp(ax2.get_xticklabels(), visible=False)
    #    pl.legend(loc=0)

    #    ax3 = fig.add_subplot(313, ylabel='Position Size')
    #    results.position.plot(ax=ax3, color='blue')
    #    pl.legend(loc=0)

    pl.gcf().set_size_inches(18, 8)
    pl.show()

'''
if __name__ == '__main__':

    from zipline import TradingAlgorithm  # used below but missing from the posted imports
    from zipline.api import order_target, record, symbol, order_target_percent, history
    from zipline.finance import trading, commission, slippage
    import zipline.utils.events
    from zipline.utils.events import (EventManager, make_eventrule, DateRuleFactory, TimeRuleFactory)
    from zipline.utils.events import DateRuleFactory as date_rules
    from zipline.utils.events import TimeRuleFactory as time_rules

    import matplotlib.pyplot as plt
    import pylab as pl

    algo = TradingAlgorithm(initialize=initialize, handle_data=handle_data, capital_base = 10000)

    data = data.dropna()
    results = algo.run(data)
    show_results(algo, data, results)
'''
8 responses

I've also been looking into Zipline vs. Quantopian, and I have observed some differences in the data between the two as well. I haven't read anywhere why the differences are so large, so I don't have a good answer for you. By the way, you can use the get_environment function to figure out whether your algorithm is running in Zipline or Quantopian.
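As a rough sketch of that suggestion, the check below could replace the try/except around schedule_function in the posted code. It assumes that get_environment('platform') returns the string 'quantopian' on Quantopian and something else (such as 'zipline') elsewhere; the helper name running_on_quantopian is hypothetical, and the function is passed in as an argument so the snippet does not depend on either platform being importable.

```python
def running_on_quantopian(get_environment):
    """Return True when the supplied get_environment callable reports Quantopian.

    `get_environment` is expected to be the platform's own function
    (assumption: calling it with 'platform' returns a platform name string).
    """
    try:
        return get_environment('platform') == 'quantopian'
    except Exception:
        # Function unavailable or called outside a running algorithm.
        return False


# Hypothetical use inside initialize(context):
#     if running_on_quantopian(get_environment):
#         schedule_function(ordering_logic, ...)
#     else:
#         context.schedule_function(ordering_logic, ...)
```

Branching on an explicit platform name avoids the bare except in the original, which would also hide unrelated errors.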

The difference you're seeing is common and comes from differences between the data sources. From our FAQ:

Quantopian uses the last traded price as the close price for the security. Depending on the data source, others may use end-of-day (EOD) prices. For example, Yahoo is an EOD datasource. Yahoo and other EOD data providers get their price and volume data from the official exchange record. Quantopian's data is generated by the actual trades, regardless of what exchange the trade was made on. The EOD sources rarely exactly match data derived from intraday data. For instance, the official close for a NYSE stock is the last trade of the day for the stock on NYSE. But if the stock also trades on Chicago, Pacific or another regional exchange, the last trade on one of those exchanges could be our close.

Also, Quantopian's data is adjusted for splits and mergers, but its close prices are not adjusted for dividends. Hope that helps to explain the differences!


Alisa, you should look into using the official open/close from the primary exchange for daily bars, or maybe offer it as a parameter. Liquidity on the other exchanges is typically much lower than on the primary. Take BAC on 12/8 as an example: NYSE had the largest closing volume, with 128k shares traded at 17.66, whereas the last print at 16:00 was an Arca print of only 200 shares at 17.67. The platform seems flexible about bringing in other data, but the big appeal of your ecosystem is that you provide the data. That lightens the load for newish programmers like myself.
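To make the BAC example concrete, here is a minimal sketch of the two ways of picking a daily close. The prices and sizes are the ones quoted above; the timestamps, the Trade tuple, and both function names are made up for illustration and do not reflect how Quantopian or any vendor actually builds bars.

```python
from collections import namedtuple

# time is an "HH:MM:SS" string, so plain string comparison orders trades.
Trade = namedtuple('Trade', 'time exchange price size')


def last_print_close(trades):
    """Close = price of the latest trade on any exchange."""
    return max(trades, key=lambda t: t.time).price


def primary_close(trades, primary):
    """Close = price of the latest trade on the primary exchange only."""
    on_primary = [t for t in trades if t.exchange == primary]
    return max(on_primary, key=lambda t: t.time).price


# Hypothetical timestamps; prices and sizes taken from the post.
trades = [
    Trade('16:00:00', 'NYSE', 17.66, 128000),
    Trade('16:00:03', 'ARCA', 17.67, 200),
]

print(last_print_close(trades))       # 17.67
print(primary_close(trades, 'NYSE'))  # 17.66
```

A one-cent gap looks small, but a strategy that ranks two instruments by return can flip its choice when the returns are close.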

Alisa, thanks for the reply and explanation.

If I were to use Yahoo price data and make the most of what they provide [close, adjusted close (accounting for splits and dividends), and dividends], and re-adjust the close prices ONLY for splits, I should get closer to the Quantopian results, is that right? That would be a great way for me to gain confidence in Zipline and Q, as I'm getting closer to live trading some of my strategies, and I do believe that designing strategies offline is more efficient (specifically when using multiple files/modules).
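A minimal sketch of the split-only re-adjustment described above, assuming raw (unadjusted) closes and a separate table of split events are available. The split_adjust name and the input layout are hypothetical, and the split dates and ratios would have to come from the data provider.

```python
def split_adjust(closes, splits):
    """Adjust raw closes for splits only (dividends are left untouched).

    closes: list of (date, raw_close) in ascending date order.
    splits: dict mapping the effective date of a split to its ratio
            (e.g. 2.0 for a 2-for-1 split).
    Prices strictly before a split date are divided by the ratio, so the
    series is continuous across the split.
    """
    adjusted = []
    factor = 1.0
    # Walk backwards so each split applies to every earlier date.
    for date, close in reversed(closes):
        adjusted.append((date, close / factor))
        if date in splits:
            factor *= splits[date]
    adjusted.reverse()
    return adjusted


# Made-up prices: a 2-for-1 split effective on the third day.
raw = [('2004-01-02', 100.0), ('2004-01-05', 102.0), ('2004-01-06', 51.5)]
print(split_adjust(raw, {'2004-01-06': 2.0}))
# [('2004-01-02', 50.0), ('2004-01-05', 51.0), ('2004-01-06', 51.5)]
```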

It's nice to be able to cross-validate results from an OSS framework (Z) against a commercial one (Q), and I believe this would only strengthen Q.

Florent, the approach you mentioned would likely get you closer to the Quantopian data. But I'd warn that there may still be some differences. As mentioned, our backtesting data is the aggregated trade data, whereas Yahoo's may include data from pre- and post-market auction pools. All data sources are slightly different (even between Google, Yahoo, and Bloomberg), and we get our data from a private vendor, which we then monitor, clean, and stream into the IDE. We'd love to make our data available offline for use in Zipline, for the reason you mentioned, but we can't redistribute the data per our agreement with the vendor.

When you're ready to port your strategy to Quantopian, I'd suggest using the get_environment method. This will make it easier to move your code over to the IDE.

Thanks for taking the time to follow up.
The strategy is ready; it is indeed the one posted above. I design in Zipline while making sure the code stays compatible with Quantopian. I would never use in Quantopian anything that does not run in Zipline, because Quantopian lacks the benefits of my desktop IDE (Linux/Spyder) in terms of file management and debugging.

Fundamentally, I design everything in Z (1 strategy = multiple files, the usual approach for maintaining a clean and robust environment), and use a script to assemble everything into a single file for Quantopian. That's the best approach I have found to keep code clean across different strategies and prevent copy/pasta! Right now my assembling script is not too robust, but eventually I'll make it better and share it. I'm surprised nothing like this is part of Zipline core.
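For readers wondering what such an assembling script might look like, here is a small sketch. It is not the script mentioned above: the assemble function, its arguments, and the file names are hypothetical. It assumes the project modules are listed in dependency order and are only imported with statements of the form `from module import name`, so that dropping those lines leaves working code.

```python
import re


def assemble(sources, local_modules):
    """Concatenate module sources into one file, dropping project-local imports.

    sources: list of (file_name, source_text) in dependency order.
    local_modules: names of the project's own modules; any line that
                   imports one of them is removed.
    """
    pattern = None
    if local_modules:
        names = '|'.join(re.escape(m) for m in local_modules)
        pattern = re.compile(r'^\s*(?:from|import)\s+(?:%s)\b' % names)

    parts = []
    for name, text in sources:
        lines = text.splitlines()
        if pattern is not None:
            lines = [ln for ln in lines if not pattern.match(ln)]
        parts.append('# ---- %s ----' % name)
        parts.append('\n'.join(lines))
    return '\n'.join(parts) + '\n'


# Hypothetical two-file strategy.
sources = [
    ('metrics.py', 'import numpy as np\ndef ror(p):\n    return p[-1] / p[0] - 1\n'),
    ('algo.py', 'from metrics import ror\ndef initialize(context):\n    pass\n'),
]
single_file = assemble(sources, ['metrics'])
```

Code that refers to a module by name (for example `metrics.ror(...)` after `import metrics`) would break once the import is removed, so a more robust tool would need to rewrite those references as well.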

On the data/performance comparison, I'll report back and share results with different data in the next few days/weeks. I believe instruments with no splits and no dividends should show some level of similarity, and even instruments with splits only should be in good agreement for strategies with long analysis periods (like the one I've posted above).

Hi Florent,

Your research on "Zipline/Quantopian: Major discrepancies using cross-compatible code" is very interesting.
Do you have any early findings that you could share with us?

Hi NT,
Nothing new. I'm waiting on this to be merged: https://github.com/quantopian/zipline/pull/398. I still think the best way would be to rework the Yahoo data so that it keeps the split adjustments but removes the dividend adjustments, and to provide the dividends as cash inflows on the dividend dates (as done in Q, I believe), while also making full-return values available as an option inside the Zipline algorithm. Some algos require this information because they are based on full return, but only for processing: the simulated price feed should always exclude dividends, as in everyday live trading.
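The difference between the two views of return can be sketched as follows. This is an illustration with made-up numbers, not Zipline's or Quantopian's dividend handling: the function name is hypothetical, and it assumes a fixed number of shares held over the whole period with each dividend credited as cash on the date listed.

```python
def price_and_total_return(prices, dividends, shares):
    """Return (price_return, total_return) for a buy-and-hold position.

    prices: list of (date, price) in ascending order, NOT dividend-adjusted.
    dividends: dict mapping a date to the cash dividend per share.
    shares: number of shares held over the whole period.
    """
    start_value = shares * prices[0][1]
    end_value = shares * prices[-1][1]
    # Dividends arrive as cash inflows instead of being baked into prices.
    cash = sum(shares * dividends.get(date, 0.0) for date, _ in prices)
    price_return = end_value / start_value - 1
    total_return = (end_value + cash) / start_value - 1
    return price_return, total_return


# Made-up example: 10 shares, one 0.50 dividend per share.
prices = [('2004-03-01', 100.0), ('2004-03-02', 101.0), ('2004-03-03', 102.0)]
price_ret, total_ret = price_and_total_return(prices, {'2004-03-02': 0.5}, 10)
```

Here the price return is about 2.0% and the total return about 2.5%. A ranking rule based on full return would use the second figure, while orders would still be simulated against the unadjusted prices.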