Sensitivity Analysis a.k.a. "Parameter Optimization" of Pair Trade Input Parameters

Today I'd like to share a notebook in our Research environment that shows how to run backtests of your algo across various input parameters, and then plot the results as heatmaps so you can visualize how sensitive the algo is to small changes in those parameters.

You can step through the notebook one cell at a time, or just do "Run All..." If you go the "Run All..." route, it may take upwards of 30 minutes for the whole notebook to complete, because I have it set up to run 25 total backtests, varying two parameters over five values each.

It is also worth mentioning that, at present, your algo must be written in Zipline, entirely in the Research environment, in order to do this. The notebook I'm sharing here is a basic implementation of a pair trading algo, and you can freely modify the following inputs:

• the 2 stocks in the pair
• start and end dates for the backtest
• the Z-score entry/exit criteria (e.g. +/- 1.0 standard deviations) for entering and exiting trades based on how much the pair's spread has diverged
• the lookback number of days used to compute the hedge ratio (i.e. the number of days used in the regression)
• the lookback number of days used to calculate the Z-score that determines whether the pair is diverging
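For intuition, the two calculations these last inputs control can be sketched outside the notebook with plain numpy (the function names here are my own, not the notebook's; I use numpy's `polyfit` in place of the notebook's statsmodels regression, which yields the same slope):

```python
import numpy as np

def hedge_ratio(y, x):
    """Slope of the best-fit line y = a + b*x -- the hedge ratio."""
    return np.polyfit(x, y, 1)[0]

def spread_zscore(y, x, lookback, z_window):
    """Z-score of the pair's spread over the last z_window observations,
    using a hedge ratio fit over the last `lookback` observations."""
    hedge = hedge_ratio(y[-lookback:], x[-lookback:])
    spread = y[-z_window:] - hedge * x[-z_window:]
    return (spread[-1] - spread.mean()) / spread.std()
```

A trade would then be entered whenever the returned z-score moves beyond +/- the entry threshold (e.g. 1.0).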

The parameters whose sensitivity I'm testing in this example are:

• the lookback number of days used to compute the hedge ratio (i.e. the number of days used in the regression)
• the lookback number of days used to calculate the Z-score that determines whether the pair is diverging

By varying each of these inputs and viewing the resulting heatmaps, I can see whether basing the spread calculation on shorter or longer timeframes results in more profitable trades (via the days used in the hedge-ratio regression), and whether I should trigger trades on shorter- or longer-term divergences (via the days used to compute the Z-score).
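The sweep itself is just a nested loop that fills a grid with one backtest result per cell, which can then be handed to a heatmap plotter. A minimal sketch, where `run_backtest` is a stand-in for whatever runs a single backtest and returns a score such as total return:

```python
import numpy as np
import pandas as pd

def sweep(run_backtest, hedge_lookbacks, z_windows):
    """Run one backtest per (hedge_lookback, z_window) pair and collect
    each result in a grid suitable for a heatmap."""
    grid = pd.DataFrame(index=hedge_lookbacks, columns=z_windows, dtype=float)
    for lb in hedge_lookbacks:
        for zw in z_windows:
            grid.loc[lb, zw] = run_backtest(lb, zw)
    return grid

# five values per parameter -> 25 backtests, as in the notebook
param_range_1 = [int(v) for v in np.linspace(20, 100, 5)]  # hedge ratio lookback
param_range_2 = [int(v) for v in np.linspace(20, 100, 5)]  # z-score window
```

With five values per axis this reproduces the notebook's 25 backtests, and the resulting grid can be passed straight to a plotting routine such as seaborn's heatmap.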

Held "constant" in the backtest are the following:

• trades are entered when the pair's spread diverges by more than +/- 1.0 standard deviations (Z-score)

Feel free to change these values to whatever you prefer. The for() loop that runs all of the backtests can also easily be modified to sweep over the entry and exit Z-score parameters instead.

After you've run your simulations over many different pairs of stocks and found encouraging results, you can simply clone the algo I've shared in the next reply below (the Q Backtester equivalent of the Zipline algo used in this research notebook), quickly change the stocks and parameters to match your research, and then paper-trade it live or enter it in an upcoming contest.

Happy Researching!

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

15 responses

Here's the backtester algo for you to clone.

import numpy as np
import statsmodels.api as sm
import pandas as pd
import pytz

def initialize(context):
    # Quantopian backtester specific variables
    set_slippage(slippage.VolumeShareSlippage(volume_limit=0.025, price_impact=0.1))
    context.y = symbol('USO')
    context.x = symbol('GLD')

    # This defines whether to use a hedge ratio computed N days ago.
    # The rationale: since the algo trades off of mean reversion, a hedge
    # ratio that excludes the N most recent days -- e.g. days when the severe
    # divergences this algo hopes to exploit could have occurred -- may better
    # reflect the historical economic relationship of the stock pair.
    context.use_hedge_ratio_lag = True
    context.hedge_ratio_lag = 2

    # strategy-specific variables
    context.lookback = 20     # used for regression
    context.z_window = 20     # used for z-score calculation; must be <= lookback
    context.entry_z = 1.0     # trade entry triggered when z-score is +/- entry_z
    context.exit_z = 0.0      # trade exit triggered when z-score crosses +/- exit_z

    context.spread = np.array([])
    context.hedge_ratio_history = np.array([])
    context.in_long = False
    context.in_short = False

    if not context.use_hedge_ratio_lag:
        # a lag of 1 means to include the most recent price in the hedge ratio
        # calculation; specifically, this is used for np.array[-1] indexing
        context.hedge_ratio_lag = 1

# Will be called on every trade event for the securities you specify.
def handle_data(context, data):
    if get_open_orders():
        return

    now = get_datetime()
    exchange_time = now.astimezone(pytz.timezone('US/Eastern'))

    # Only trade 30 minutes before market close
    if not (exchange_time.hour == 15 and exchange_time.minute == 30):
        return

    prices = history(35, '1d', 'price').iloc[-context.lookback::]

    y = prices[context.y]
    x = prices[context.x]

    try:
        hedge = hedge_ratio(y, x, add_const=True)
    except ValueError as e:
        log.debug(e)
        return

    context.hedge_ratio_history = np.append(context.hedge_ratio_history, hedge)
    # Wait until we have enough history to apply the hedge-ratio lag
    if context.hedge_ratio_history.size < context.hedge_ratio_lag:
        return
    # Grab the lagged hedge ratio (the previous day's, by default)
    hedge = context.hedge_ratio_history[-context.hedge_ratio_lag]

    # Calculate the current day's spread and add it to the running tally
    context.spread = np.append(context.spread, y[-1] - hedge * x[-1])
    if context.spread.size < context.z_window:
        return

    # Keep only the z-score lookback period
    spreads = context.spread[-context.z_window:]
    zscore = (spreads[-1] - spreads.mean()) / spreads.std()

    if context.in_short and zscore < context.exit_z:
        order_target(context.y, 0)
        order_target(context.x, 0)
        context.in_short = False
        context.in_long = False
        record(stock_Y_pct=0, stock_X_pct=0)
        return

    if context.in_long and zscore > context.exit_z:
        order_target(context.y, 0)
        order_target(context.x, 0)
        context.in_short = False
        context.in_long = False
        record(stock_Y_pct=0, stock_X_pct=0)
        return

    if zscore < -context.entry_z and (not context.in_long):
        y_target_shares = 1
        x_target_shares = -hedge
        context.in_long = True
        context.in_short = False

        (y_target_pct, x_target_pct) = compute_holdings_pct(y_target_shares,
                                                            x_target_shares,
                                                            y[-1], x[-1])
        order_target_percent(context.y, y_target_pct)
        order_target_percent(context.x, x_target_pct)
        record(stock_Y_pct=y_target_pct, stock_X_pct=x_target_pct)
        return

    if zscore > context.entry_z and (not context.in_short):
        y_target_shares = -1
        x_target_shares = hedge
        context.in_short = True
        context.in_long = False

        (y_target_pct, x_target_pct) = compute_holdings_pct(y_target_shares,
                                                            x_target_shares,
                                                            y[-1], x[-1])
        order_target_percent(context.y, y_target_pct)
        order_target_percent(context.x, x_target_pct)
        record(stock_Y_pct=y_target_pct, stock_X_pct=x_target_pct)

def hedge_ratio(y, x, add_const=True):
    if add_const:
        x = sm.add_constant(x)
        model = sm.OLS(y, x).fit()
        return model.params[1]
    model = sm.OLS(y, x).fit()
    return model.params.values

def compute_holdings_pct(y_shares, x_shares, y_price, x_price):
    y_dollars = y_shares * y_price
    x_dollars = x_shares * x_price
    notional_dollars = abs(y_dollars) + abs(x_dollars)
    y_target_pct = y_dollars / notional_dollars
    x_target_pct = x_dollars / notional_dollars
    return (y_target_pct, x_target_pct)
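As a quick sanity check of compute_holdings_pct: it converts share targets into dollar-neutral percentage weights whose absolute values sum to 1. A standalone copy you can experiment with:

```python
def compute_holdings_pct(y_shares, x_shares, y_price, x_price):
    # Convert share targets to dollar exposures, then normalize by gross
    # notional so the two weights sum to 1 in absolute value.
    y_dollars = y_shares * y_price
    x_dollars = x_shares * x_price
    notional_dollars = abs(y_dollars) + abs(x_dollars)
    return (y_dollars / notional_dollars, x_dollars / notional_dollars)

# e.g. long 1 share of Y at $100 and short 0.5 shares of X at $100
# gives weights of 2/3 and -1/3 of gross notional
y_pct, x_pct = compute_holdings_pct(1, -0.5, 100.0, 100.0)
```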



@Justin: This looks great! Thanks for putting this together. I can't wait to dig more into it...

@Tristan, great, I hope it serves useful to you for your research. Please feel free to leave comments regarding any improvements you would like to see, or if you have any questions on how to modify the code.

One thing I noticed is that I left all of the ticker symbols hardcoded. They only appear in 2 places so it's easy to swap out. But thought I'd mention it.

1) They show up when you pull the data with get_pricing.
2) They are in the initialize() function definition, e.g.:

• context.y = symbol('USO')
• context.x = symbol('GLD')

@Justin... i luv your graphics in your notes..however... it seems... it only works with 'USO' and 'GLD'.. when you try.. other pairs... it wont work... try it with other pairs.. you see what I mean... cheers.

@JOHN CHAN, see my response directly above your last post about how to modify the ticker symbols.

Ctrl+F on Windows, or Command+F on Mac, will find text in the IPython notebook just like on a regular webpage. So you can just search for the couple of instances of "USO" and "GLD"; they are only hardcoded in the two places I referenced in my previous reply.

Let me know if this works.

@justin... I mean this one... in your notes... you just change the symbol right?? it wont work just like USO and GLD illustration....

"""
This cell loads in the data for our tickers used in the backtest.
Change the ticker symbols, start_date or end_date to suit your needs.
"""
data = get_pricing(
    ['USO', 'GLD'],    # <-- change these tickers
    start_date='2013-01-01',
    end_date='2015-01-01',
    frequency='minute'
)

Hi John,
Sorry I guess I'm still not understanding your question. All you should have to do is change those ticker strings everywhere in the notebook (I just looked and there are 3 total places where "USO" shows up, for example.)

These are the cell names that contain the ticker symbol strings: In[2] , In[4], In[9]

Then you just have to re-run all the cells of the notebook (by doing Shift+Enter on each cell). Or just doing "Run All.." from the "Run" dropdown menu in the upper-right of the screen.

The heatmap graphics will only update if you first run all the simulations, and then you have to execute each of the cells after that draws the graphics. "Run All" will automatically do this, but if you are running each cell individually using Shift+Enter then you will have to run all of these cells as well.

I've just tried it on my end by changing the tickers in each of those cells to something different, then I re-ran the whole notebook, and it is working for me.

Let me know if I am not understanding the issue you're seeing correctly, and I can try to help further.

Hi, I'm trying to run this in the research module, and I cannot seem to get it to work. I am just copying and pasting the cells. Any advice?

Thanks.

Hi Justin. The code is running well on my research platform. However, it is extremely slow.

When it get to this part

Running the cell below runs a single backtest, to serve as an example before running all 25 backtests later on

# RUN this cell to run a single backtest
algo_obj = TradingAlgorithm(initialize=initialize,
                            handle_data=handle_data,
                            data_frequency='minute')
perf_manual = algo_obj.run(data.transpose(2,1,0))
perf_returns = perf_manual.returns     # grab the daily returns from the algo backtest
(np.cumprod(1+perf_returns)).plot()    # plots the performance of your algo


It took almost 2 hours to complete this. Is that normal? This is run on your server right?

Can someone explain why I would get the following error:

# RUN this cell to run a single backtest
algo_obj = TradingAlgorithm(initialize=initialize,
                            handle_data=handle_data,
                            data_frequency='minute')
perf_manual = algo_obj.run(data.transpose(2,1,0))
perf_returns = perf_manual.returns     # grab the daily returns from the algo backtest
(np.cumprod(1+perf_returns)).plot()    # plots the performance of your algo
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-2516be5a4125> in <module>()
3                             data_frequency='minute')
----> 4 perf_manual = algo_obj.run(data.transpose(2,1,0))
5 perf_returns = perf_manual.returns     # grab the daily returns from the algo backtest
6 (np.cumprod(1+perf_returns)).plot()    # plots the performance of your algo

TypeError: transpose() takes exactly 1 argument (4 given)


Hi - in "param_range_1 = map(int, np.linspace(20, 100, 5)) # hedge ratio lookback", if I want to use decimal places for instance to test over a range of 4 - 6 i.e. (np.linspace(4, 6, 5)) - what do I replace int with?

I tried = map(Decimal, (np.linspace(4, 6, 5)) but no luck, and figured I probably need to import decimals, so then I added 'from decimal import decimal' at the top and am now getting an error msg:
"InputRejected: Importing decimal from decimal raised an ImportError. No modules or attributes with a similar name were found. Our security system is concerned. If you continue to have import errors, your account will be suspended until a human can talk to you." What am I missing?

edit: ok solved by [weight for weight in np.arange(4, 8, .5)] and making a separate weight variable
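For reference, fractional parameter ranges don't need the decimal module at all; numpy's linspace and arange produce floats directly. A sketch of both styles:

```python
import numpy as np

# integer lookbacks, as in the notebook's original sweep
param_range_1 = [int(v) for v in np.linspace(20, 100, 5)]   # 20, 40, 60, 80, 100

# fractional values: np.arange excludes the stop value
weights = [float(w) for w in np.arange(4, 8, 0.5)]          # 4.0, 4.5, ..., 7.5

# or an inclusive fractional range with linspace
fine_range = [float(v) for v in np.linspace(4, 6, 5)]       # 4.0, 4.5, 5.0, 5.5, 6.0
```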

Dear all,
there seems to be an error in this line:

perf_manual = algo_obj.run(data.transpose(2,1,0))

How can this be solved?
Any help is greatly appreciated.
Regards

@Bodo,
What's the error you see? Can you paste it in?

@Justin

I updated the date range to work around one bug, so the backtest doesn't start or end on the first of the month:

data = get_pricing(
['USO', 'GLD'],
start_date='2013-01-02',
end_date = '2015-01-02',
frequency='minute'
)

I'm still getting an error here.

KeyError Traceback (most recent call last)
in ()
3 data_frequency='minute')
----> 4 perf_manual = algo_obj.run(data.transpose(2,1,0))
5 perf_returns = perf_manual.returns # grab the daily returns from the algo backtest
6 (np.cumprod(1+perf_returns)).plot() # plots the performance of your algo

/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in error()
   1271                     "cannot use label indexing with a null key")
   1272                 raise KeyError("the label [%s] is not in the [%s]" %
-> 1273                                (key, self.obj._get_axis_name(axis)))
   1274
   1275             try:

KeyError: 'the label [2013-01-02 14:31:00+00:00] is not in the [index]'

I think it's related to using minute data vs daily data.

@Ryan: Currently Zipline can't do minute backtests in Research. This should be fixed soon.
