
Investment Hypothesis

Pairs trading is a strategy in which two stocks are simultaneously bought and sold short according to their expected future prices. It is usually applied to pairs of stocks that share an underlying economic relationship, and it can be very profitable when the spread between them is mean-reverting. In general, if the spread rises a certain level above its mean, the over-valued stock is shorted and the under-valued stock is bought, in the expectation that both will revert to their 'true' values in the future; the positions are reversed if the spread falls a certain level below the mean.

The main hypothesis of my strategy, derived from the paper that was assigned to me, is that the change in a mean-reverting stock spread can be modeled as an OU process in the following way:

dXt = θ * ( μ - Xt ) * dt + σ * dWt

where μ is the mean and θ is the mean-reversion rate. A higher θ indicates faster convergence to the mean, which means that profits on pairs with higher θ can be realized faster, and therefore in greater volume, than on pairs with lower θ.
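The effect of θ is easy to see in simulation. The sketch below is my own illustration, not part of the strategy code (the function name and parameters are hypothetical); it integrates the OU equation with a simple Euler-Maruyama scheme and compares a fast-reverting path with a slow-reverting one:

```python
import numpy as np

def simulate_ou(theta, mu=0.0, sigma=0.1, x0=0.5, n=1000, dt=1.0 / 252):
    """Euler-Maruyama simulation of dX = theta*(mu - X)*dt + sigma*dW."""
    rng = np.random.default_rng(0)
    x = np.empty(n)
    x[0] = x0
    for t in range(1, n):
        dw = rng.normal(0.0, np.sqrt(dt))
        x[t] = x[t - 1] + theta * (mu - x[t - 1]) * dt + sigma * dw
    return x

fast = simulate_ou(theta=20.0)  # snaps back to the mean within a few days
slow = simulate_ou(theta=1.0)   # takes most of the sample to drift back
```

The fast path spends almost all of its time near the mean, so band-crossing trades on it would be closed (and re-opened) far more often than on the slow path.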

Investment Algorithm

The spread between stocks A and B at time t is defined as Xt = ln( A(t) / A(0) ) − ln( B(t) / B(0) ), t ≥ 0

Trades are made when the current value of the spread crosses the Bollinger bands. The upper and lower bands are defined as μ + 0.5 * σ and μ - 0.5 * σ respectively, where μ is the 30-day moving average of the spread and σ is its 30-day moving standard deviation.
The following course of action is implemented when the spread crosses a Bollinger band:

if X > μ + 0.5 * σ
stock A is shorted and stock B is longed

if X < μ - 0.5 * σ
stock A is longed and stock B is shorted
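As a minimal, standalone illustration of this rule (not the algorithm's code; `pair_signal` and its inputs are hypothetical), the spread and bands can be computed from two price series like this:

```python
import numpy as np
import pandas as pd

def pair_signal(a, b, window=30, width=0.5):
    """Return -1 (short A / long B), +1 (long A / short B), or 0 (no trade).

    a, b: pandas Series of prices for stocks A and B (hypothetical inputs).
    The spread is X_t = ln(A_t / A_0) - ln(B_t / B_0); the bands are the
    `window`-day moving mean +/- `width` moving standard deviations.
    """
    spread = np.log(a / a.iloc[0]) - np.log(b / b.iloc[0])
    mu = spread.rolling(window).mean()
    sd = spread.rolling(window).std()
    x, m, s = spread.iloc[-1], mu.iloc[-1], sd.iloc[-1]
    if x > m + width * s:
        return -1  # spread above the upper band: short A, long B
    if x < m - width * s:
        return 1   # spread below the lower band: long A, short B
    return 0
```

For example, if A spikes upward while B stays flat, the spread jumps above the upper band and the function signals shorting A against a long in B.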

Validating the Investment Hypothesis

A pairs trading strategy requires an economic relationship between the two stocks. Therefore, in my strategy the stocks within a pair are always chosen from the same industry. To do this, I create a pipeline that filters stocks by their industry classification and trading volume. I then use this pipeline to calculate the mean-reversion rate, as well as the Augmented Dickey-Fuller (ADF) statistic, for each pair.

To estimate the mean-reversion rate, I first rewrite the OU process equation as

dXt / dt = θ * ( μ - Xt ) + σ * ( dWt / dt )

Then, creating variables for ( dXt / dt ) and ( μ - Xt ), I run the following regression

dXt / dt = α + β * ( μ - Xt ) + ε

and use the estimated β as an approximation for θ, i.e. the mean-reversion rate.
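This regression can be sketched on a simulated OU path with a known θ, to check that the slope does recover the mean-reversion rate (the helper name and the simulation parameters are my own, not from the paper):

```python
import numpy as np

def estimate_theta(spread, dt):
    """Estimate theta by regressing dX/dt on (sample mean - X)."""
    x = np.asarray(spread, dtype=float)
    dxdt = (x[1:] - x[:-1]) / dt          # discrete approximation of dX/dt
    z = x.mean() - x[:-1]                 # distance from the sample mean
    slope, _ = np.polyfit(z, dxdt, 1)     # OLS slope approximates theta
    return slope

# Sanity check on a simulated OU path with known theta = 10
rng = np.random.default_rng(0)
dt, n, theta, sigma = 1.0 / 252, 20000, 10.0, 0.1
x = np.zeros(n)
for t in range(1, n):
    x[t] = x[t - 1] + theta * (0.0 - x[t - 1]) * dt + sigma * rng.normal(0.0, np.sqrt(dt))
theta_hat = estimate_theta(x, dt)
```

The estimate is slightly biased downward by the discretization, but lands close to the true θ, which is all the pair-ranking step needs.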

The stock pairs with the highest mean-reversion rates and an ADF test p-value below 0.1 (indicating co-integration) are then chosen for the backtest. The attached notebook demonstrates this process for stocks in the energy sector. The seventh cell of the notebook shows the twenty stock pairs with the highest mean-reversion rates, and the eighth cell shows the ADF statistic for each of these pairs. The final two cells arrange the stock labels into the input that is then copied into the backtest.

The backtest is run from July 2018 to July 2019 on minute-level data and includes pairs from the energy and technology industries. The strategy gives a 5.13% return and a Sharpe ratio of 2.19.

My key deviation from the paper is in how I select the stock pairs. The paper is vague on this point: it only briefly mentions that pairs are chosen on the basis of mean-reversion rates and some other characteristics, without describing the exact methodology. My rule of selecting on mean-reversion rate and co-integration is therefore improvised.

A potential shortcoming of my strategy is the look-ahead bias in the pair-selection method, since the mean-reversion rates are calculated over the same year in which the backtest is run. The main reason for this bias is that I was unable to include the pair-selection method inside the backtest environment. Nevertheless, my strategy does support the hypothesis that co-integrated pairs with high mean-reversion rates are likely to provide good returns: I tested several years, and the backtest always shows positive returns, whereas including pairs with very low mean-reversion rates always results in losses.


Overall, my strategy shows that trading on pairs with high mean-reversion rates and a high degree of co-integration can be profitable. Although the returns are ultimately lower than the market's, that is to be expected, since pairs trading returns tend not to track the market. In fact, the backtest shows that the strategy produced positive returns even when the market turned negative. This strategy can therefore be recommended as a relatively safe bet, but not one that is likely to produce outsized returns.



import numpy as np
import statsmodels.api as sm
import pandas as pd

import quantopian.optimize as opt
import quantopian.algorithm as algo
from statsmodels.tsa.stattools import coint, adfuller
from statsmodels import regression

def initialize(context):
    # Quantopian backtester specific variables
    context.stock_pairs = [
(symbol('PSX'), symbol('FANG')) ,
(symbol('APA'), symbol('SLB')) ,
(symbol('MUR'), symbol('COP')) ,
(symbol('SM'), symbol('WLL')) ,
(symbol('HP'), symbol('MUR')) ,
(symbol('APA'), symbol('MRO')) ,
(symbol('CLR'), symbol('MPC')) ,
(symbol('SLB'), symbol('CLR')) ,
(symbol('VLO'), symbol('BHGE')) ,
(symbol('CLR'), symbol('PBF')) ,
(symbol('EA'), symbol('SPOT')) ,
(symbol('YNDX'), symbol('FB')) ,
(symbol('PFPT'), symbol('FB')) ,
(symbol('AMAT'), symbol('FB')) ,
(symbol('INTC'), symbol('SYMC')) ,
(symbol('NXPI'), symbol('PFPT')) ,
(symbol('ADBE'), symbol('ACN')) ,
(symbol('FSLR'), symbol('YNDX')) ,
(symbol('NXPI'), symbol('FB')) ,
(symbol('NXPI'), symbol('YNDX')) ,
(symbol('EMR'), symbol('UPS')) ,
(symbol('GD'), symbol('HII')) ,
(symbol('ADP'), symbol('DOV')) ,
(symbol('GWW'), symbol('AAL')) ,
(symbol('JBHT'), symbol('TRN')) ,
(symbol('GWW'), symbol('TRN')) ,
(symbol('NSC'), symbol('HON')) ,
(symbol('ADP'), symbol('CMI')) ,
(symbol('GWW'), symbol('RHI')) ,
(symbol('IR'), symbol('VRSK')) ,
    ]
    # Flatten the pairs into a de-duplicated list of all traded stocks
    context.stocks = []
    for pair in context.stock_pairs:
        for stock in pair:
            if stock not in context.stocks:
                context.stocks.append(stock)
    context.num_pairs = len(context.stock_pairs)
    # strategy specific variables
    context.lookback = 7000 
    context.target_weights = pd.Series(index=context.stocks, data=0.25)
    context.spread = np.ndarray((context.num_pairs, 0))
    context.inLong = [False] * context.num_pairs
    context.inShort = [False] * context.num_pairs
    # Only do work 30 minutes before close
    schedule_function(func=check_pair_status, date_rule=date_rules.every_day(), time_rule=time_rules.market_close(minutes=30))
# Will be called on every trade event for the securities you specify.
def handle_data(context, data):
    # All work is scheduled in check_pair_status, so nothing happens per bar.
    pass

def check_pair_status(context, data):
    prices = data.history(context.stocks, 'price', 8580, '1m').iloc[-context.lookback::]
    new_spreads = np.ndarray((context.num_pairs, 1))
    for i in range(context.num_pairs):

        (stock_y, stock_x) = context.stock_pairs[i]

        Y = prices[stock_y]
        X = prices[stock_x]
        # The OLS fit can fail on short or degenerate price history, so guard it
        try:
            hedge = hedge_ratio(Y, X, add_const=True)
        except ValueError as e:
            log.warn(e)
            continue
        context.target_weights = get_current_portfolio_weights(context, data)
        spread = np.log(Y/Y.iloc[0]) - np.log(X/X.iloc[0])

        # Mean and standard deviation of the spread over the lookback window
        long_ma = np.mean(spread)
        long_std = np.std(spread)
        bolinger1 = long_ma + 0.5 * long_std
        bolinger2 = long_ma - 0.5 * long_std
        spread_at_t = spread.iloc[-1:]
        # Estimate the mean-reversion rate by regressing dX/dt on (mean - X)
        sample = spread
        dt = 1.0 / len(sample)
        dx = sample[1:] - sample[:-1].values
        Yhat = dx / dt
        Xvar = np.mean(sample[:-1]) - sample[:-1]
        model = regression.linear_model.OLS(Yhat.values, Xvar.values).fit()
        mean_rev = model.params[0]
        ref = 0  # minimum mean-reversion rate required to enter a trade

        # Exit a short-spread position once the spread is back inside the bands
        if context.inShort[i] and bolinger2 < spread_at_t.values[0] < bolinger1:
                context.target_weights[stock_y] = 0
                context.target_weights[stock_x] = 0
                context.inShort[i] = False
                context.inLong[i] = False
                record(X_pct=0, Y_pct=0)
                allocate(context, data)


        if spread_at_t.values[0] < bolinger2 and (not context.inLong[i]) and mean_rev > ref:
                # Only trade if NOT already in a trade 
                y_target_shares = 1
                X_target_shares = -hedge
                context.inLong[i] = True
                context.inShort[i] = False

                (y_target_pct, x_target_pct) = computeHoldingsPct(y_target_shares, X_target_shares, Y.iloc[-1], X.iloc[-1])
                context.target_weights[stock_y] = y_target_pct * (1.0/context.num_pairs)
                context.target_weights[stock_x] = x_target_pct * (1.0/context.num_pairs)
                record(Y_pct=y_target_pct, X_pct=x_target_pct)
                allocate(context, data)

        if spread_at_t.values[0] > bolinger1 and (not context.inShort[i]) and mean_rev > ref:
                # Only trade if NOT already in a trade
                y_target_shares = -1
                X_target_shares = hedge
                context.inShort[i] = True
                context.inLong[i] = False

                (y_target_pct, x_target_pct) = computeHoldingsPct(y_target_shares, X_target_shares, Y.iloc[-1], X.iloc[-1])
                context.target_weights[stock_y] = y_target_pct * (1.0/context.num_pairs)
                context.target_weights[stock_x] = x_target_pct * (1.0/context.num_pairs)
                record(Y_pct=y_target_pct, X_pct=x_target_pct)
                allocate(context, data)
    context.spread = np.hstack([context.spread, new_spreads])

def hedge_ratio(Y, X, add_const=True):
    # Regress Y on X (optionally with an intercept) and return the slope
    if add_const:
        X = sm.add_constant(X)
        model = sm.OLS(Y, X).fit()
        return model.params[1]
    model = sm.OLS(Y, X).fit()
    return model.params[0]

def computeHoldingsPct(yShares, xShares, yPrice, xPrice):
    yDol = yShares * yPrice
    xDol = xShares * xPrice
    notionalDol =  abs(yDol) + abs(xDol)
    y_target_pct = yDol / notionalDol
    x_target_pct = xDol / notionalDol
    return (y_target_pct, x_target_pct)

def get_current_portfolio_weights(context, data):
    positions = context.portfolio.positions
    positions_index = pd.Index(positions)
    share_counts = pd.Series(
        data=[positions[asset].amount for asset in positions],
        index=positions_index,
    )
    current_prices = data.current(positions_index, 'price')
    current_weights = share_counts * current_prices / context.portfolio.portfolio_value
    return current_weights.reindex(positions_index.union(context.stocks), fill_value=0.0)

def allocate(context, data):
    # Match the target weights as closely as possible, given the constraints
    objective = opt.TargetWeights(context.target_weights)
    # No additional constraints are imposed
    constraints = []
    # Place orders to move the portfolio toward the target weights
    algo.order_optimal_portfolio(objective=objective, constraints=constraints)