Pairs Trading Algorithm

A general algorithm to trade pairs based on the concept of cointegration. I tried three different pairs and don't think I would trade on any of them.

from scipy import stats
import statsmodels.tsa.stattools as ts
import math
import numpy as np
import pandas as pd

def initialize(context):
    # set one of these pair options to test
    pair_option = 1
    
    if pair_option == 1:
        # Can go negative depending on the lookback.
        set_symbol_lookup_date('2015-01-01')
        context.pair = symbols('F', 'GM')
    if pair_option == 2:
        # this one goes through a long period of
        # not coming back to equilibrium
        context.pair = symbols('COKE', 'PEP')
    if pair_option == 3:
        # this one appears to blow up. Interesting to play
        # with the lookback period and see the effects.
        context.pair = symbols('AAPL', 'QCOM')
    
    # I don't think I would trade on any of these.
    
    set_commission(commission.PerShare(cost=0.03, min_trade_cost=None))
    set_slippage(slippage.FixedSlippage(spread=0.00))
    
    # Use a custom function to schedule trades
    schedule_function(func=trade,
                      date_rule=date_rules.every_day(),
                      time_rule=time_rules.market_open(hours=1)
                     )
    
    # Initialize model parameters
    context.initial_setup   = False
    context.entry_amount    = 1000
    context.entry_threshold = 1.5
    context.exit_threshold  = 0.0
    context.adf_threshold   = 0.1
    context.lookback        = 250

def handle_data(context, data):
    # This function is required, but we don't do anything here
    pass

def build_model(context, data):
    # Here we look back at historical data and build our model
    
    # Get data
    prices = history(context.lookback, '1d', 'price', ffill=True)
    
    # run a linear regression on the pair
    beta, alpha, r_value, p_value, std_err = stats.linregress(prices[context.pair[0]],
                                                              prices[context.pair[1]])
    
    # use the results of the linear regression to predict the second
    # fund from the first
    predicted = alpha + beta * prices[context.pair[0]]
    
    # calculate the spread
    spread = prices[context.pair[1]] - predicted
    
    # check to see if the spread is stationary. We will compare this
    # value to context.adf_threshold, but generally we are looking for
    # a value of less than 0.05. The lower the value, the higher our 
    # confidence that the pair is cointegrated.
    adf_pvalue = ts.adfuller(spread)[1]
    
    # get the standard deviation of the spread
    spread_std = spread.std()
    
    # store relevant parameters to be used later
    context.alpha = alpha
    context.beta = beta
    context.spread_std = spread_std
    context.adf_pvalue = adf_pvalue
    context.initial_setup = True

def return_current(context, data):
    # calculate current spread and z-score of spread
    current_spread = (data[context.pair[1]].price - 
                     (context.alpha + context.beta * data[context.pair[0]].price))
    current_z = current_spread / context.spread_std
    return current_spread, current_z

def trade(context, data):
    # Here's the main trading logic
    
    if context.initial_setup == False:
        # the first time through we need to build the model here
        build_model(context, data)
    
    #########################################################################
    # This is a setting to play with. 
    # If set to false, once a trade is entered, the model will
    # not be rebuilt until the trade has been exited. This setting
    # gives the best performance as we wait for the pair to come
    # back to the equilibrium we expected when we initiated the position.
    # If set to true, we continue to rebuild the model and will
    # exit the position when the potentially new equilibrium is reached.
    # this could produce unexpected returns, but it also might be
    # safer if the relationship of the pair alters substantially 
    # after the trade has been entered.
     
    always_rebuild = False
    if always_rebuild:
        build_model(context, data)
    #########################################################################
    
    # calculate current relationship of pair
    current_spread, current_z = return_current(context, data)
    
    # check sign of relationship (above or below equilibrium)
    sign = math.copysign(1, current_z)
    
    # time to exit?
    if len(context.portfolio.positions) > 0 and np.any(sign != context.entry_sign or
                                                       abs(current_z) < context.exit_threshold):
        # if we get here we were in a trade and the pair has come back
        # to equilibrium.
        order_target_percent(context.pair[0], 0)
        order_target_percent(context.pair[1], 0)
    
    # look to enter a trade
    if len(context.portfolio.positions) == 0:
        
        # if we aren't  always rebuilding, we need to rebuild here
        if always_rebuild == False:
            build_model(context, data)
            current_spread, current_z = return_current(context, data)
            # check to see if we are above or below equilibrium
            sign = math.copysign(1, current_z)
        
        
        if (context.adf_pvalue < context.adf_threshold and     # cointegrated
            abs(current_z) >= context.entry_threshold):        # spread is big enough
            
            # record relationship at start of position
            context.entry_sign = sign
            
            # calculate shares to buy based on entry_amount
            # tried to do this with order_target_value() but
            # that would truncate instead of round. This method
            # gets us closer starting values
            shares_pair0 = round(context.entry_amount / data[context.pair[0]].price, 0)
            shares_pair1 = round(context.entry_amount / data[context.pair[1]].price, 0)
            
            order(context.pair[0],      sign * shares_pair0)
            order(context.pair[1], -1 * sign * shares_pair1)
    
    # some interesting values to look at. Can only record 
    # a max of 5 so modify commenting as desired.
    record(p_value = context.adf_pvalue)
    record(spread_z = current_z)
    record(beta = context.beta)
    record(alpha = context.alpha)
    # record(ref = data[context.pair[0]].price, follow = data[context.pair[1]].price)
        
                                                      
                                                
    
7 responses

Thanks for sharing Aaron! This is a good template. Pair trading investigations will be interesting to explore in IPython notebooks using the new research environment. There you can visualize different pair combinations and watch the effect of swapping different securities.

If you don't have research yet, you can submit an algo to the contest and get your access:
https://www.quantopian.com/posts/enter-the-quantopian-open-to-get-early-access-to-the-research-platform


Would the same pairs work with OLS based pair trading?

Not sure I understand your question. This algorithm already uses OLS to model the relationship between the chosen pair. The rest of the logic is centered around the decision to enter the trade: I use cointegration (the ADF p-value of the spread) as a test condition, then wait for the spread to move a specified number of standard deviations from the mean before I take a position.
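To make the reply above concrete, here is a minimal sketch of the OLS fit and the z-score entry test outside the backtester. The function names `fit_spread` and `z_score` and the synthetic prices are assumptions for illustration only. The cointegration check is left out to keep the sketch NumPy-only; in the algorithm it is `ts.adfuller(spread)[1]` compared against `context.adf_threshold`.

```python
import numpy as np

def fit_spread(x, y):
    """Regress y on x by OLS; return (alpha, beta, std of the residual spread)."""
    beta, alpha = np.polyfit(x, y, 1)      # polyfit returns slope, then intercept
    spread = y - (alpha + beta * x)
    return alpha, beta, spread.std(ddof=1)  # ddof=1 matches pandas' Series.std()

def z_score(x_now, y_now, alpha, beta, spread_std):
    """Current spread in units of the historical spread's standard deviation."""
    return (y_now - (alpha + beta * x_now)) / spread_std

# Synthetic pair (hypothetical data): y tracks 5 + 0.8 * x plus noise.
rng = np.random.default_rng(0)
x = 50 + np.cumsum(rng.normal(0, 1, 250))
y = 5 + 0.8 * x + rng.normal(0, 0.5, 250)

alpha, beta, spread_std = fit_spread(x, y)

# A new observation sitting 2.0 above the true line.
z = z_score(x[-1], 5 + 0.8 * x[-1] + 2.0, alpha, beta, spread_std)
entry_threshold = 1.5
enter_trade = abs(z) >= entry_threshold
```

In the posted algorithm the same test is gated by the ADF p-value, so a large z-score alone would not trigger an entry.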

Aaron, a very neat job. Well done.

Hey Aaron,
Nice work on this algo. I like that you consider whether the previous spread has reverted before estimating new parameters; sometimes new estimates cause a position to be closed before the original spread becomes profitable. Tracking the pair's PnL helps avoid that, but it's good to keep the original parameters around anyway.

In my experience, drift in the hedge ratio tends to be the killer in pairs trading. You could try adding a maximum time you'll wait for a reversion, or tracking how far the original parameters have deviated from updated ones and changing them when the model has gone sufficiently out of whack. I added a simple max holding period as an example. I also switched it to use log prices, since the returns of the spread are:

(log(A,t1) - log(A,t2)) + beta*(log(B, t2) - log(B, t1)).
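As a rough sketch, the expression above can be written as a small function. The name `log_spread_return` and the sample prices are hypothetical. If the log spread is defined as s = log(A) - beta * log(B), the expression is exactly s(t1) - s(t2), which the last lines check numerically.

```python
import math

def log_spread_return(a_t1, a_t2, b_t1, b_t2, beta):
    """Evaluate (log A_t1 - log A_t2) + beta * (log B_t2 - log B_t1)."""
    return (math.log(a_t1) - math.log(a_t2)) + beta * (math.log(b_t2) - math.log(b_t1))

def log_spread(a, b, beta):
    """Log spread s = log(A) - beta * log(B)."""
    return math.log(a) - beta * math.log(b)

# Hypothetical prices at two times and an assumed hedge ratio.
a_t1, a_t2 = 31.0, 30.0
b_t1, b_t2 = 15.5, 15.2
beta = 0.9

ret = log_spread_return(a_t1, a_t2, b_t1, b_t2, beta)
diff = log_spread(a_t1, b_t1, beta) - log_spread(a_t2, b_t2, beta)
```

Because the change in the log spread is a linear combination of the two legs' log returns, a z-score built on log prices lines up with the return of the hedged position, which is not true of a spread in raw prices.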

It's heading in the right direction though, you'll just need risk stuff and some decent pairs.

Thanks for sharing,
David

from scipy import stats
import statsmodels.tsa.stattools as ts
import math
import numpy as np
import pandas as pd

def initialize(context):
    # set one of these pair options to test
    pair_option = 1
    
    if pair_option == 1:
        # Can go negative depending on the lookback.
        set_symbol_lookup_date('2015-01-01')
        context.pair = symbols('F', 'GM')
    if pair_option == 2:
        # this one goes through a long period of
        # not coming back to equilibrium
        context.pair = symbols('KO', 'PEP')
    if pair_option == 3:
        # this one appears to blow up. Interesting to play
        # with the lookback period and see the effects.
        context.pair = symbols('AAPL', 'QCOM')
    
    # I don't think I would trade on any of these.
    
    set_commission(commission.PerShare(cost=0.03, min_trade_cost=None))
    set_slippage(slippage.FixedSlippage(spread=0.00))
    
    # Use a custom function to schedule trades
    
    schedule_function(func=trade,
                      date_rule=date_rules.every_day(),
                      time_rule=time_rules.market_open(hours=1)
                     )
    
    # Initialize model parameters
    context.initial_setup   = False
    context.entry_amount    = 10000
    context.entry_threshold = 2.0
    context.exit_threshold  = 0.0
    context.adf_threshold   = 0.2
    context.lookback        = 60*390
    context.hold_period     = pd.tseries.offsets.BDay() * 60
    context.entry_dt        = None

def handle_data(context, data):
    # This function is required, but we don't do anything here
    pass

def build_model(context, data):
    # Here we look back at historical data and build our model
    
    # Get data
    prices = np.log(history(context.lookback, '1m', 'price', ffill=True))
    
    # run a linear regression on the pair
    beta, alpha, r_value, p_value, std_err = stats.linregress(prices[context.pair[0]], 
                                                              prices[context.pair[1]])
    
    # use the results of the linear regression to predict the second
    # fund from the first
    predicted = alpha + beta * prices[context.pair[0]]
    
    # calculate the spread
    spread = prices[context.pair[1]] - predicted
    
    # check to see if the spread is stationary. We will compare this
    # value to context.adf_threshold, but generally we are looking for
    # a value of less than 0.05. The lower the value, the higher our 
    # confidence that the pair is cointegrated.
    adf_pvalue = ts.adfuller(spread)[1]
    
    # get the standard deviation of the spread
    spread_std = spread.std()
    
    # store relevant parameters to be used later
    context.alpha = alpha
    context.beta = beta
    
    context.spread_std = spread_std
    context.adf_pvalue = adf_pvalue
    context.initial_setup = True

def return_current(context, data):
    # calculate current spread and z-score of spread
    current_spread = (np.log(data[context.pair[1]].price) - 
                     (context.alpha + context.beta * np.log(data[context.pair[0]].price)))
    current_z = current_spread / context.spread_std
    return current_spread, current_z

def trade(context, data):
    # Here's the main trading logic
    for stock in context.pair:
        if get_open_orders(stock):
            return
    if context.initial_setup == False:
        # the first time through we need to build the model here
        build_model(context, data)
    
    #########################################################################
    # This is a setting to play with. 
    # If set to false, once a trade is entered, the model will
    # not be rebuilt until the trade has been exited. This setting
    # gives the best performance as we wait for the pair to come
    # back to the equilibrium we expected when we initiated the position.
    # If set to true, we continue to rebuild the model and will
    # exit the position when the potentially new equilibrium is reached.
    # this could produce unexpected returns, but it also might be
    # safer if the relationship of the pair alters substantially 
    # after the trade has been entered.
     
    always_rebuild = False
    if always_rebuild:
        build_model(context, data)
    #########################################################################
    
    # calculate current relationship of pair
    current_spread, current_z = return_current(context, data)
    
    # check sign of relationship (above or below equilibrium)
    sign = math.copysign(1, current_z)
    
    # time to exit?
    if len(context.portfolio.positions) > 0 and np.any(sign != context.entry_sign or
                                                       abs(current_z) < context.exit_threshold):
        # if we get here we were in a trade and the pair has come back
        # to equilibrium.
        order_target_percent(context.pair[0], 0)
        order_target_percent(context.pair[1], 0)
        context.entry_dt = None
        
    if context.entry_dt is not None:
        exit_dt = context.entry_dt + context.hold_period
        if get_datetime() >= exit_dt:
            order_target(context.pair[0], 0)
            order_target(context.pair[1], 0)
            context.entry_dt = None
            return
            
    
    # look to enter a trade
    if len(context.portfolio.positions) == 0:
        
        # if we aren't  always rebuilding, we need to rebuild here
        if always_rebuild == False:
            build_model(context, data)
            current_spread, current_z = return_current(context, data)
            # check to see if we are above or below equilibrium
            sign = math.copysign(1, current_z)
        
        
        if (context.adf_pvalue < context.adf_threshold and     # cointegrated
            abs(current_z) >= context.entry_threshold):        # spread is big enough
            
            # record relationship at start of position
            context.entry_sign = sign
            
            # calculate shares to buy based on entry_amount
            # tried to do this with order_target_value() but
            # that would truncate instead of round. This method
            # gets us closer starting values
            shares_pair0 = round(context.entry_amount / data[context.pair[0]].price, 0)
            shares_pair1 = round(context.entry_amount / data[context.pair[1]].price, 0)
            
            order_target(context.pair[0],      sign * shares_pair0)
            order_target(context.pair[1], -1 * sign * shares_pair1)
            context.entry_dt = get_datetime()
    
    # some interesting values to look at. Can only record 
    # a max of 5 so modify commenting as desired.
    record(p_value = context.adf_pvalue)
    record(spread_z = current_z)
    record(spread=current_spread)
    record(beta = context.beta)
    record(alpha = context.alpha)
    # record(ref = data[context.pair[0]].price, follow = data[context.pair[1]].price)
        
                                                      
                                                
    

Is there anyone who could help me modify this algorithm for a much simpler trend-following strategy? I want to test the spread between DIA and SPY as a simple crossover strategy on an hourly chart. I would like the strategy to start with an equivalent dollar amount of each ETF. I believe the pair ratio is around 0.86 for these ETFs. I would like it to trade a simple moving-average crossover strategy: if the price of the spread goes above the moving average, the strategy would buy an additional 100 or so shares of one ETF, and then go short once it crosses back below the moving average, trading around the core equivalent holding. I am using this strategy as one part of a larger strategy, so I don't expect the results to be spectacular, but there should be reasonably low volatility. This seems like a good starting algorithm, but I know these two are generally correlated and I only want to trade the one pair. Any help would be appreciated. Thanks.
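One possible starting point for the crossover part of this request, sketched in plain NumPy rather than the Quantopian API. The function name `crossover_signal`, the window length, and the sample spread values are all assumptions; how the spread is built from DIA and SPY (for example, DIA minus 0.86 times SPY on hourly bars) and how the signal maps to orders around the core holding are left open.

```python
import numpy as np

def crossover_signal(spread, window):
    """Return +1 where the spread is above its trailing simple moving
    average, -1 where it is below, and 0 where they are equal or the
    window is not yet full."""
    spread = np.asarray(spread, dtype=float)
    signal = np.zeros(len(spread), dtype=int)
    for i in range(window - 1, len(spread)):
        moving_avg = spread[i - window + 1 : i + 1].mean()
        signal[i] = int(np.sign(spread[i] - moving_avg))
    return signal

# Hypothetical spread series that rises and then falls.
spread = [1, 2, 3, 4, 3, 2, 1]
signal = crossover_signal(spread, window=3)
# signal is [0, 0, 1, 1, -1, -1, -1]
```

A change in the signal from one bar to the next marks a cross, which is where the extra shares would be added or removed.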

Hey Aaron and David, thanks for sharing.
For the exit, wouldn't this sub-condition always be False, since context.exit_threshold is set to 0?
abs(current_z) < context.exit_threshold
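A quick way to check this is to isolate the exit test in a small function. The name `should_exit` is hypothetical; the logic mirrors the condition in `trade`. Since an absolute value is never below 0.0, the second clause cannot fire with `exit_threshold = 0.0`, and the exit is driven entirely by the sign flip.

```python
import math

def should_exit(current_z, entry_sign, exit_threshold):
    """Exit when the z-score changes sign relative to entry, or when it
    falls inside the exit band around zero."""
    sign = math.copysign(1, current_z)
    return sign != entry_sign or abs(current_z) < exit_threshold

# With exit_threshold = 0.0 only a sign flip closes the trade.
still_positive = should_exit(0.5, 1.0, 0.0)    # False: same side of equilibrium
crossed_zero = should_exit(-0.1, 1.0, 0.0)     # True: sign flipped

# A positive threshold lets the trade close before the spread crosses zero.
inside_band = should_exit(0.3, 1.0, 0.5)       # True: |z| < 0.5
```

So the threshold clause only matters if `context.exit_threshold` is set above zero.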