AmmarTahir_FinalReport

Investment Hypothesis

Pairs trading is a strategy in which two stocks are simultaneously bought and shorted according to their expected future prices. It is usually applied to pairs of stocks that have some underlying economic relationship, and it can be very profitable if the difference between them is mean-reverting. In general, if the difference rises beyond a certain level above the mean, the over-valued stock is shorted and the under-valued stock is bought, with the expectation that both stocks will revert to their 'true' value in the future; the opposite trade is made if the difference falls beyond a certain level below the mean.

The main hypothesis of my strategy, derived from the paper that was assigned to me, is that the change in a mean-reverting stock spread can be modeled as an Ornstein-Uhlenbeck (OU) process in the following way:

dXt = θ * ( μt - Xt ) * dt + σdWt

where μt is the mean of the spread, θ is the mean-reversion rate, σ is the volatility, and Wt is a Wiener process (Brownian motion). A higher θ indicates faster convergence to the mean, so profits on pairs with higher θ can be realized much faster, and therefore in greater volume, than on pairs with lower θ.
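To make the role of θ concrete, the sketch below simulates the OU equation above with an Euler-Maruyama step, X[t+1] = X[t] + θ * (μ - X[t]) * Δt + σ * ΔW. The function name simulate_ou, the constant mean μ, and all parameter values are illustrative assumptions; none of them come from the paper or the backtest.

```python
import math
import random


def simulate_ou(theta, mu, sigma, x0, dt, n_steps, seed=0):
    """Euler-Maruyama path of dX = theta * (mu - X) * dt + sigma * dW."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n_steps):
        x = path[-1]
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over one step
        path.append(x + theta * (mu - x) * dt + sigma * dw)
    return path


# With sigma = 0 the path decays deterministically toward mu,
# and a larger theta closes the gap faster.
slow = simulate_ou(theta=1.0, mu=0.0, sigma=0.0, x0=1.0, dt=0.01, n_steps=100)
fast = simulate_ou(theta=5.0, mu=0.0, sigma=0.0, x0=1.0, dt=0.01, n_steps=100)
print(abs(fast[-1]) < abs(slow[-1]))  # True
```

Setting sigma above zero adds noise around the same pull toward the mean.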

Investment Algorithm

The spread between stocks A and B at time t is defined as Xt = ln( A(t) / A(0) ) − ln( B(t) / B(0) ), t ≥ 0
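As a minimal sketch of the spread definition above, the helper below normalizes each price series by its first value and takes the difference of the logs. The name log_spread and the sample prices are made up for illustration.

```python
import math


def log_spread(prices_a, prices_b):
    """X_t = ln(A(t)/A(0)) - ln(B(t)/B(0)) for two equal-length price lists."""
    if len(prices_a) != len(prices_b):
        raise ValueError("price series must have the same length")
    a0, b0 = prices_a[0], prices_b[0]
    return [math.log(a / a0) - math.log(b / b0)
            for a, b in zip(prices_a, prices_b)]


# Made-up prices: A gains 10% while B gains 5% over the window.
spread = log_spread([100.0, 105.0, 110.0], [50.0, 51.0, 52.5])
print(spread[0])  # 0.0 by construction, since both ratios start at 1
```

Because both series are divided by their starting price, the spread always starts at zero and measures relative performance since time 0.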

Trades are made if the current value of the spread crosses the Bollinger bands. The upper and lower bands in this case are defined as μ + 0.5 * σ and μ - 0.5 * σ respectively, where μ represents the 30-day moving average and σ represents the 30-day moving standard deviation.
The following course of action is implemented if the value of the spread crosses the Bollinger bands:

if X > μ + 0.5 * σ
stock A is shorted and stock B is longed

if X < μ - 0.5 * σ
stock A is longed and stock B is shorted
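The band rule above can be sketched as a small function that looks at the most recent window of spread values. The name band_signal, the returned labels, and the use of the population standard deviation are assumptions made for this illustration; the window length is left to the caller.

```python
import statistics


def band_signal(spread, window, k=0.5):
    """Classify the latest spread value against mean +/- k * std of the window."""
    if len(spread) < window:
        raise ValueError("need at least `window` observations")
    recent = spread[-window:]
    mu = statistics.mean(recent)
    sigma = statistics.pstdev(recent)  # population std (an assumption)
    x = spread[-1]
    if x > mu + k * sigma:
        return "short_A_long_B"   # spread above the upper band
    if x < mu - k * sigma:
        return "long_A_short_B"   # spread below the lower band
    return "hold"


print(band_signal([0.0, 0.0, 0.0, 0.0, 1.0], window=5))   # short_A_long_B
print(band_signal([0.0, 0.0, 0.0, 0.0, -1.0], window=5))  # long_A_short_B
```

A band of half a standard deviation is narrow, so a rule like this triggers often.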

Validating the Investment Hypothesis

A pairs trading strategy requires an economic relationship between the two stocks. Therefore, in my strategy, the stocks within a pair are always chosen from the same industry. To do this, I create a pipeline that filters stocks by their industry classification and their trading volume. I then use this pipeline to calculate the mean-reversion rate, as well as the Augmented Dickey-Fuller (adfuller) statistic, for each pair.

To estimate the mean-reversion rate, I first rewrite the OU process equation as

dXt / dt = θ * ( μt - Xt ) + ( σdWt / dt )

Then, after creating variables for ( dXt / dt ) and ( μt - Xt ), I run the following regression:

dXt / dt = α + β * ( μt - Xt ) + ε

and use β as an approximation for θ i.e. the mean-reversion rate.
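The regression above can be sketched in plain Python as a one-variable least-squares fit. This is an illustration under stated assumptions: the name estimate_theta is hypothetical, μ is taken to be the constant sample mean of the spread, and dt defaults to one time step.

```python
def estimate_theta(spread, dt=1.0):
    """OLS fit of dX/dt = alpha + beta * (mu - X); beta approximates theta."""
    if len(spread) < 3:
        raise ValueError("need at least three observations")
    levels = spread[:-1]
    mu = sum(levels) / len(levels)  # assumption: constant mean = sample mean
    y = [(b - a) / dt for a, b in zip(spread[:-1], spread[1:])]  # dX/dt
    x = [mu - v for v in levels]                                  # mu - X
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("spread is constant; slope is undefined")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta


# Noise-free check: X[t+1] = X[t] + 0.2 * (0 - X[t]) gives X[t] = 0.8 ** t.
series = [0.8 ** t for t in range(50)]
alpha, beta = estimate_theta(series)
print(round(beta, 6))  # 0.2
```

In the noise-free check the slope recovers the true rate, while the intercept absorbs the gap between the sample mean and the true mean of zero.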

The stock pairs with the highest mean-reversion rates and an adfuller statistic below 0.1 (indicating co-integration) are then chosen for inclusion in the backtest. The attached notebook demonstrates this process for stocks in the energy sector. The seventh cell in the notebook shows the twenty stock pairs with the highest mean-reversion rates. The eighth cell shows the adfuller statistic for all of these pairs. The final two cells arrange the stock labels into the input that is then copied and used in the backtest.
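The selection step described above amounts to a filter followed by a ranking. The sketch below assumes the per-pair statistics have already been computed; the function name select_pairs, the placeholder tickers, and the numbers are all invented for illustration and are not taken from the notebook.

```python
def select_pairs(stats, max_adf=0.1, top_n=10):
    """stats maps a pair to (mean_reversion_rate, adfuller_value)."""
    # Keep only pairs that pass the adfuller threshold.
    eligible = [(pair, theta) for pair, (theta, adf) in stats.items()
                if adf < max_adf]
    # Rank the survivors by mean-reversion rate, highest first.
    eligible.sort(key=lambda item: item[1], reverse=True)
    return [pair for pair, _ in eligible[:top_n]]


# Invented values for three placeholder pairs.
stats = {
    ("AAA", "BBB"): (0.9, 0.03),
    ("CCC", "DDD"): (1.5, 0.40),  # fastest reversion, but fails the threshold
    ("EEE", "FFF"): (0.4, 0.08),
}
print(select_pairs(stats, top_n=2))  # [('AAA', 'BBB'), ('EEE', 'FFF')]
```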

The backtest is run from July 2018 to July 2019 on minute-level data and includes pairs from the energy and technology industries. The strategy gives a 5.13% return and a Sharpe ratio of 2.19.

My key deviation from the paper is in the way I select the stock pairs. The paper is quite vague about it and only briefly mentions that the stock pairs are chosen on the basis of mean-reversion rates and some other characteristics. The exact methodology is not described in any detail. Therefore, my formula for selecting on the basis of mean-reversion rates and co-integration is improvised.

A potential shortcoming of my strategy is that there is a forward-looking bias in my method for pairs selection, since the mean-reversion rates are calculated for the year in which the backtest is applied. The main reason for the presence of this bias is that I was unable to include the pair selection method in the backtest environment. However, my strategy does confirm the hypothesis that co-integrated pairs with high mean-reversion rates are likely to provide good returns. I tested this for several years, and the backtest always shows positive returns. On the other hand, including pairs with very low mean-reversion rates always results in losses.
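One common remedy for this kind of bias, which the backtest here does not implement, is to select pairs on a formation window and trade them only on the window that follows. The sketch below generates such index ranges; the name walk_forward_windows and the window lengths are hypothetical.

```python
def walk_forward_windows(n_obs, formation_len, trading_len):
    """Yield (formation, trading) index ranges; trading always follows formation."""
    start = 0
    while start + formation_len + trading_len <= n_obs:
        split = start + formation_len
        yield range(start, split), range(split, split + trading_len)
        start += trading_len  # roll forward by one trading window


windows = list(walk_forward_windows(n_obs=10, formation_len=4, trading_len=3))
for formation, trading in windows:
    print(list(formation), list(trading))
```

Statistics such as the mean-reversion rate would be estimated on each formation range only, so no information from a trading range leaks into the selection.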

Conclusion

Overall, my strategy shows that trading on pairs with high mean-reversion rates and a high degree of co-integration can be profitable. Although the returns are ultimately lower than the market's, that is to be expected, since pairs trading returns tend not to track the market. In fact, the backtest shows that the strategy provided positive returns even when the market's returns turned to losses. Therefore, this strategy may be recommended as a safe bet, but not one that is likely to result in huge returns.


Backtest

import numpy as np
import statsmodels.api as sm
import pandas as pd

import quantopian.optimize as opt
import quantopian.algorithm as algo
from statsmodels import regression

def initialize(context):
    # Quantopian backtester specific variables
    #set_symbol_lookup_date('2014-01-01')

    # Pairs to trade, as (stock_y, stock_x)
    context.stock_pairs = [
        (symbol('PSX'), symbol('FANG')),
        (symbol('APA'), symbol('SLB')),
        (symbol('MUR'), symbol('COP')),
        (symbol('SM'), symbol('WLL')),
        (symbol('HP'), symbol('MUR')),
        (symbol('APA'), symbol('MRO')),
        (symbol('CLR'), symbol('MPC')),
        (symbol('SLB'), symbol('CLR')),
        (symbol('VLO'), symbol('BHGE')),
        (symbol('CLR'), symbol('PBF')),
        (symbol('EA'), symbol('SPOT')),
        (symbol('YNDX'), symbol('FB')),
        (symbol('PFPT'), symbol('FB')),
        (symbol('AMAT'), symbol('FB')),
        (symbol('INTC'), symbol('SYMC')),
        (symbol('NXPI'), symbol('PFPT')),
        (symbol('FSLR'), symbol('YNDX')),
        (symbol('NXPI'), symbol('FB')),
        (symbol('NXPI'), symbol('YNDX')),
        (symbol('EMR'), symbol('UPS')),
        (symbol('GD'), symbol('HII')),
        (symbol('GWW'), symbol('AAL')),
        (symbol('JBHT'), symbol('TRN')),
        (symbol('GWW'), symbol('TRN')),
        (symbol('NSC'), symbol('HON')),
        (symbol('GWW'), symbol('RHI')),
        (symbol('IR'), symbol('VRSK')),
    ]

    #[(symbol('APA'), symbol('APC'))
    #                          , (symbol('MSFT'), symbol('AAPL')), (symbol('ATU'), symbol('AIRM')), (symbol('SAVE'), symbol('XYL')), (symbol('GOOG_L'), symbol('LNKD')),(symbol('APA'), symbol('SLB'))]

    # Universe of stocks for which price history is requested
    context.stocks = symbols(
        'PSX', 'FANG', 'HFC', 'XEC', 'APA', 'SLB', 'SM', 'NOV', 'MUR', 'COP',
        'WLL', 'HP', 'MRO', 'CLR', 'MPC', 'VLO', 'BHGE', 'PBF', 'LNG', 'CVX',
        'EA', 'SPOT', 'YNDX', 'FB', 'PFPT', 'AMAT', 'INTC', 'SYMC', 'NXPI', 'MRVL',
        'FSLR', 'ACN', 'EMR', 'UPS', 'GD', 'HII', 'ROP', 'VRSK', 'DOV', 'GWW',
        'AAL', 'CHRW', 'JBHT', 'TRN', 'NSC', 'HON', 'CMI', 'RHI', 'IR')

    #symbols('APA', 'APC', 'MSFT', 'AAPL', 'ATU', 'AIRM','SAVE', 'XYL', 'GOOG_L', 'LNKD','SLB')

    context.num_pairs = len(context.stock_pairs)
    # strategy specific variables
    context.lookback = 7000

    context.target_weights = pd.Series(index=context.stocks, data=0.25)

    # Per-pair flags: is the pair currently long or short the spread?
    context.inLong = [False] * context.num_pairs
    context.inShort = [False] * context.num_pairs

    # Only do work 30 minutes before close
    schedule_function(func=check_pair_status, date_rule=date_rules.every_day(), time_rule=time_rules.market_close(minutes=30))
    set_max_order_count(1)

# Will be called on every trade event for the securities you specify.
def handle_data(context, data):
    # Our work is now scheduled in check_pair_status
    pass

def check_pair_status(context, data):

    prices = data.history(context.stocks, 'price', 8580, '1m').iloc[-context.lookback::]

    for i in range(context.num_pairs):

        (stock_y, stock_x) = context.stock_pairs[i]

        Y = prices[stock_y]
        X = prices[stock_x]

        # Comment explaining try block
        try:
            # [try body missing in the source; presumably it computes `hedge`, which is used below]
        except ValueError as e:
            log.debug(e)
            return

        context.target_weights = get_current_portfolio_weights(context, data)

        # [lines missing in the source: the definitions of `sample`, `spread_at_t`,
        #  `long_ma` and `long_std` are not shown]
        # Get the std of the long window
        bolinger1 = long_ma + 0.5*long_std  # upper band
        bolinger2 = long_ma - 0.5*long_std  # lower band
        # mean reversion rate: regress dX/dt on (mean - X)
        dt = float(1)/len(sample)
        dx = sample[1:] - sample[:-1].values
        Yhat = dx/dt
        Xvar = np.mean(sample[:-1]) - sample[:-1]
        model = regression.linear_model.OLS(Yhat.values, Xvar.values).fit()
        mean_rev = model.params

        # Minimum mean-reversion rate required to open a trade
        ref = 0

        # [condition line missing in the source; the block below closes the pair's
        #  position and resets its state]
            context.target_weights[stock_y] = 0
            context.target_weights[stock_x] = 0

            context.inShort[i] = False
            context.inLong[i] = False

            record(X_pct=0, Y_pct=0)
            allocate(context, data)
            return

        # Spread below the lower band: go long stock_y and short stock_x
        if spread_at_t.values < bolinger2 and (not context.inLong[i]) and mean_rev > ref:
            y_target_shares = 1
            X_target_shares = -hedge
            context.inLong[i] = True
            context.inShort[i] = False

            (y_target_pct, x_target_pct) = computeHoldingsPct(y_target_shares, X_target_shares, Y[-1], X[-1])

            context.target_weights[stock_y] = y_target_pct * (1.0/context.num_pairs)
            context.target_weights[stock_x] = x_target_pct * (1.0/context.num_pairs)

            record(Y_pct=y_target_pct, X_pct=x_target_pct)
            allocate(context, data)
            return

        # Spread above the upper band: go short stock_y and long stock_x
        if spread_at_t.values > bolinger1 and (not context.inShort[i]) and mean_rev > ref:
            y_target_shares = -1
            X_target_shares = hedge
            context.inShort[i] = True
            context.inLong[i] = False

            (y_target_pct, x_target_pct) = computeHoldingsPct(y_target_shares, X_target_shares, Y[-1], X[-1])

            context.target_weights[stock_y] = y_target_pct * (1.0/context.num_pairs)
            context.target_weights[stock_x] = x_target_pct * (1.0/context.num_pairs)

            record(Y_pct=y_target_pct, X_pct=x_target_pct)
            allocate(context, data)
            return

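# [function header and branch condition missing in the source; the two fits below
#  are presumably the branches of a regression of Y on X]
        model = sm.OLS(Y, X).fit()
        return model.params
    model = sm.OLS(Y, X).fit()
    return model.params.values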

def computeHoldingsPct(yShares, xShares, yPrice, xPrice):
    # Convert share counts into signed fractions of the pair's gross notional
    yDol = yShares * yPrice
    xDol = xShares * xPrice
    notionalDol = abs(yDol) + abs(xDol)
    y_target_pct = yDol / notionalDol
    x_target_pct = xDol / notionalDol
    return (y_target_pct, x_target_pct)

def get_current_portfolio_weights(context, data):
    # Current weight of every held or tracked asset, as a fraction of portfolio value
    positions = context.portfolio.positions
    positions_index = pd.Index(positions)
    share_counts = pd.Series(
        index=positions_index,
        data=[positions[asset].amount for asset in positions]
    )

    current_prices = data.current(positions_index, 'price')
    current_weights = share_counts * current_prices / context.portfolio.portfolio_value
    return current_weights.reindex(positions_index.union(context.stocks), fill_value=0.0)

def allocate(context, data):
    # Set objective to match target weights as closely as possible, given constraints
    objective = opt.TargetWeights(context.target_weights)

    # Define constraints
    constraints = []
    constraints.append(opt.MaxGrossExposure(1.0))

    algo.order_optimal_portfolio(
        objective=objective,
        constraints=constraints,
    )