Back to Community
Coke vs Pepsi: An Integration Trade

I recently did some work in R and the library package quantmod (see link below) to find co-integrated stock pairs. The idea is that some stocks not only move together (correlated) but tend to have a stationary mean-reverting spread.

So for instance, say Pepsi and Coke are priced at $25 and $50, respectively. Therefore, the relative spread is 0.5 (25/50). The idea is that although the prices of each individual stock will vary greatly over time, the spread will remain fairly constant.

My strategy buys Pepsi and sells Coke when the spread narrows below a certain amount as a percentage of its historic standard deviation. It then waits for the spread to widen to somewhere closer to its medium run average and closes the position. It does the opposite when the spread widens too far.

I experimented a bit with the specific parameters and used 2.0 standard deviations away from the medium run average as a trigger to buy and 0.5 standard deviations away from that same average to close out the position. I also use 20 days as the period to calculate the mean and standard deviation of the spread. My bet size was 200k with a requirement of my position value + cash to be greater than 0 to open new trades. I experimented with smaller bets but found that placing larger bets and waiting for reversion was more profitable.

Obviously these parameters need to be optimized.

Finding cointegrated pairs is simple enough. Does anyone else have any strategies on trading on cointegration?

R package for financial analysis:
http://www.quantmod.com/
Quick tutorial on finding cointegrated pairs using R:
http://quanttrader.info/public/testForCoint.html

Clone Algorithm
128
Loading...
Backtest from to with initial capital
Total Returns
--
Alpha
--
Beta
--
Sharpe
--
Sortino
--
Max Drawdown
--
Benchmark Returns
--
Volatility
--
Returns 1 Month 3 Month 6 Month 12 Month
Alpha 1 Month 3 Month 6 Month 12 Month
Beta 1 Month 3 Month 6 Month 12 Month
Sharpe 1 Month 3 Month 6 Month 12 Month
Sortino 1 Month 3 Month 6 Month 12 Month
Volatility 1 Month 3 Month 6 Month 12 Month
Max Drawdown 1 Month 3 Month 6 Month 12 Month
import numpy as np

def initialize(context):
    # trading Coke (KO) and Pepsi (PEP)
    context.stocks = [sid(4283), sid(5885)]
    # $200,000 bets 
    context.bet_amount = 200000
    # long (short) if KO/PEP is below (above) 20-day average +(-) 2 * 20-day standard deviation
    # close long (short) position when KO/PEP rises (falls) to long (short) price -(+) 0.5 * long (short) standard deviation 
    context.trade_tolerance = 2.0
    context.close_tolerance = 0.5
    context.close_trades = {"long": {}, "short": {}}

def handle_data(context, data):
    rval = avgstd(data,context)
    
    if rval is None:
        return
    
    avg_spread, std_spread = rval
    cur_spread = data[context.stocks[0]].price/data[context.stocks[1]].price

    
    signal = calculate_direction(context, cur_spread, avg_spread, std_spread)
    
    a_price, b_price = data[context.stocks[0]].price, data[context.stocks[1]].price
    a_order, b_order = calculate_order(context, signal, a_price, b_price)
    
    if a_order == 0 and b_order == 0: return
    
    calculate_close_trades(context,signal,cur_spread,std_spread,a_order,b_order)
    
    a_order, b_order = calculate_net_orders(context,cur_spread,a_order,b_order)
    
    order(context.stocks[0], a_order)
    order(context.stocks[1], b_order)
    
    if signal:
        logmsg = '\n{sig} a_price: {ap} b_price: {bp} cur_spread: {c} avg_spread: {m} std_spread: {s} a_position: {a} b_position: {b}\n'
        log.info(logmsg.format(
                               sig=signal,
                               ap=a_price,
                               bp=b_price,
                               c=cur_spread, 
                               m=avg_spread, 
                               s=std_spread, 
                               a=a_order,
                               b=b_order
                           ))
    
def calculate_direction(context, cur_spread, avg_spread, std_spread):
    """ calculates the direction of trade given trade tolerance, current spraed, average spread and standard deviation of spread """
    if avg_spread is not None and (context.portfolio.positions_value + context.portfolio.cash) > 0:
        if cur_spread >= avg_spread + std_spread * context.trade_tolerance:
            # if 0 or negative holdings in A, short spread -- short A, long B
            return "short"
        elif cur_spread <= avg_spread - std_spread * context.trade_tolerance:
            # if 0 or positive holdings in A, long spread -- long A, short B
            return "long"
    try: 
        log.info("No cash...")
    except NameError:
        log.info("avg_spread not defined...")
    return None

def calculate_close_trades(context,signal,cur_spread, std_spread, a_order, b_order):
    """ calculates close trades given close tolerance, signal, current spread, standard deviation of spread and  current orders """
    close_order_amount = np.matrix((-1*a_order, -1*b_order))
    if signal =="long":
        close_spread_amount = cur_spread + std_spread*context.close_tolerance
    elif signal == "short":
        close_spread_amount = cur_spread - std_spread*context.close_tolerance
        
    if cur_spread in context.close_trades[signal]:
        context.close_trades[signal][close_spread_amount] += close_order_amount
    else:
        context.close_trades[signal][close_spread_amount] = close_order_amount
    return None

def calculate_net_orders(context, cur_spread, a_order, b_order):
    """ calculates net orders given the close orders from existing trades, current spread and current orders """
    for close_spread in context.close_trades["long"].keys():
        if cur_spread > close_spread:
            amount = context.close_trades["long"][close_spread]
            a_order += amount.item(0)
            b_order += amount.item(1)
            del context.close_trades["long"][close_spread] 
    for close_spread in context.close_trades["short"].keys():
        if cur_spread < close_spread:
            amount = context.close_trades["short"][close_spread]
            a_order += amount.item(0)
            b_order += amount.item(1)
            del context.close_trades["short"][close_spread]
    return (a_order, b_order)
        
def calculate_order(context, signal, a_price, b_price):
    """ returns the net-flat order amounts of KO and PEP given signal and price """
    a_abs_order_amount = int(context.bet_amount/a_price)
    b_abs_order_amount = int(context.bet_amount/b_price)
    if not signal: return (0,0)
    elif signal[0] == "l":
        return ( a_abs_order_amount, (-1 * b_abs_order_amount))
    elif signal[0] == "s":
        return ( (-1 * a_abs_order_amount), b_abs_order_amount)
    return None
   
@batch_transform(refresh_period=1, window_length=20)
def avgstd(datapanel,context):
    """ returns mean and standard devaition of KO/PEP over the last 20 days"""
    a_price = np.array(datapanel['price'][context.stocks[0]])
    b_price = np.array(datapanel['price'][context.stocks[1]])
    
    if a_price is not None and b_price is not None:
        spread = a_price/b_price
        return (spread.mean(), spread.std())
    else:
        return None
This backtest was created using an older version of the backtester. Please re-run this backtest to see results using the latest backtester. Learn more about the recent changes.
There was a runtime error.
4 responses

@Branko, thanks for sharing. This is an interesting idea.

I noticed a problem in the algorithm that result in overbuying your positions.

The way portfolio.positions_value and portfolio.cash are calculated by the backtester is not very useful for hedged positions. The short position is subtracted from the long position. As a result the overall position can run wild but the two variables that are supposed to measure the position remain neutral (close to zero).

So in calculate_direction() where you test for "cash" the result is not going to be what you expect. I suggest directly checking to see if you have a position and not using the aggregate value:

    # this boolean test reveals if a position is already established  
    if context.portfolio.positions[context.stocks[0]].amount == 0:

    # this boolean test is highly misleading because the variables are miscalculated by the backtester  
    if (context.portfolio.positions_value + context.portfolio.cash) > 0:  

In addition you had a short circuit test that would exit from handle_data() if a_order and b_order equal zero. I didn't trace all the logic to find out if this is a problem but in general I wouldn't want to have logic that halted before testing calculate_close_trades()

    if a_order == 0 and b_order == 0: return  

So I commented out that test. As a result calculate_close_trades() will likely be called even if no position is established. I added a quick test inside the function to prevent an error caused by trying to access variable keys that haven't been set yet.

    if not signal in context.close_trades: return  

The modified backtest is attached.

Clone Algorithm
26
Loading...
Backtest from to with initial capital
Total Returns
--
Alpha
--
Beta
--
Sharpe
--
Sortino
--
Max Drawdown
--
Benchmark Returns
--
Volatility
--
Returns 1 Month 3 Month 6 Month 12 Month
Alpha 1 Month 3 Month 6 Month 12 Month
Beta 1 Month 3 Month 6 Month 12 Month
Sharpe 1 Month 3 Month 6 Month 12 Month
Sortino 1 Month 3 Month 6 Month 12 Month
Volatility 1 Month 3 Month 6 Month 12 Month
Max Drawdown 1 Month 3 Month 6 Month 12 Month
import numpy as np

def initialize(context):
    # trading Coke (KO) and Pepsi (PEP)
    context.stocks = [sid(4283), sid(5885)]
    # $200,000 bets 
    context.bet_amount = 200000
    # long (short) if KO/PEP is below (above) 20-day average +(-) 2 * 20-day standard deviation
    # close long (short) position when KO/PEP rises (falls) to long (short) price -(+) 0.5 * long (short) standard deviation 
    context.trade_tolerance = 2.0
    context.close_tolerance = 0.5
    context.close_trades = {"long": {}, "short": {}}

def handle_data(context, data):
    rval = avgstd(data,context)
    
    if rval is None:
        return

    avg_spread, std_spread = rval
    cur_spread = data[context.stocks[0]].price/data[context.stocks[1]].price
    
    signal = calculate_direction(context, cur_spread, avg_spread, std_spread)
    
    a_price, b_price = data[context.stocks[0]].price, data[context.stocks[1]].price
    a_order, b_order = calculate_order(context, signal, a_price, b_price)
    
    #if a_order == 0 and b_order == 0: return
    
    calculate_close_trades(context,signal,cur_spread,std_spread,a_order,b_order)
    
    a_order, b_order = calculate_net_orders(context,cur_spread,a_order,b_order)
    
    order(context.stocks[0], a_order)
    order(context.stocks[1], b_order)
    
    if signal:
        logmsg = '\n{sig} a_price: {ap} b_price: {bp} cur_spread: {c} avg_spread: {m} std_spread: {s} a_position: {a} b_position: {b}\n'
        log.info(logmsg.format(
                               sig=signal,
                               ap=a_price,
                               bp=b_price,
                               c=cur_spread, 
                               m=avg_spread, 
                               s=std_spread, 
                               a=a_order,
                               b=b_order
                           ))
    
def calculate_direction(context, cur_spread, avg_spread, std_spread):
    """ calculates the direction of trade given trade tolerance, current spraed, average spread and standard deviation of spread """
    if avg_spread is not None and context.portfolio.positions[context.stocks[0]].amount == 0:
        # and (context.portfolio.positions_value + context.portfolio.cash) > 0:
        if cur_spread >= avg_spread + std_spread * context.trade_tolerance:
            # if 0 or negative holdings in A, short spread -- short A, long B
            return "short"
        elif cur_spread <= avg_spread - std_spread * context.trade_tolerance:
            # if 0 or positive holdings in A, long spread -- long A, short B
            return "long"
    return None

def calculate_close_trades(context,signal,cur_spread, std_spread, a_order, b_order):
    """ calculates close trades given close tolerance, signal, current spread, standard deviation of spread and  current orders """
    close_order_amount = np.matrix((-1*a_order, -1*b_order))
    if signal =="long":
        close_spread_amount = cur_spread + std_spread*context.close_tolerance
    elif signal == "short":
        close_spread_amount = cur_spread - std_spread*context.close_tolerance

    if not signal in context.close_trades: return
    if cur_spread in context.close_trades[signal]:
        context.close_trades[signal][close_spread_amount] += close_order_amount
    else:
        context.close_trades[signal][close_spread_amount] = close_order_amount
    return None

def calculate_net_orders(context, cur_spread, a_order, b_order):
    """ calculates net orders given the close orders from existing trades, current spread and current orders """
    for close_spread in context.close_trades["long"].keys():
        if cur_spread > close_spread:
            amount = context.close_trades["long"][close_spread]
            a_order += amount.item(0)
            b_order += amount.item(1)
            del context.close_trades["long"][close_spread] 
    for close_spread in context.close_trades["short"].keys():
        if cur_spread < close_spread:
            amount = context.close_trades["short"][close_spread]
            a_order += amount.item(0)
            b_order += amount.item(1)
            del context.close_trades["short"][close_spread]
    return (a_order, b_order)
        
def calculate_order(context, signal, a_price, b_price):
    """ returns the net-flat order amounts of KO and PEP given signal and price """
    a_abs_order_amount = int(context.bet_amount/a_price)
    b_abs_order_amount = int(context.bet_amount/b_price)
    if not signal: return (0,0)
    elif signal[0] == "l":
        return ( a_abs_order_amount, (-1 * b_abs_order_amount))
    elif signal[0] == "s":
        return ( (-1 * a_abs_order_amount), b_abs_order_amount)
    return None
   
@batch_transform(refresh_period=1, window_length=20)
def avgstd(datapanel,context):
    """ returns mean and standard devaition of KO/PEP over the last 20 days"""
    a_price = np.array(datapanel['price'][context.stocks[0]])
    b_price = np.array(datapanel['price'][context.stocks[1]])
    
    if a_price is not None and b_price is not None:
        spread = a_price/b_price
        return (spread.mean(), spread.std())
    else:
        return None
This backtest was created using an older version of the backtester. Please re-run this backtest to see results using the latest backtester. Learn more about the recent changes.
There was a runtime error.

Thanks for the feedback Dennis C.

You're right about the short-circuit. I originally had the calculate_direction function yield a close position signal but had changed that and forgot to remove the short circuit.

In regards to the position limit, I want to be able to put down more bets when the spread continues to widen/collapse as the reversion would be expected to be even more profitable. Do you know of a good way to control hedged positions that would somewhat resemble real life trading apart from testing if there are already any holdings? Without any check, the swings would just be wild but with the check you suggested no activity could take place until spread converges.

Also, any idea how I would be able to optimize the variables? I imagine I would want to optimize them on one data set (in the past) and test them on another (in the more recent past). Hopefully I can do this in the Quantopian platform.

For better position metrics I suggest the home-brew calculation I posted in another thread on the topic. The values abs_cash and abs_capital_used should be calculated once per handle_data(). If you find any errors in this approach please let me know.

# calculate capital_used using absolute amount: abs(amount)*cost_basis  
pos = context.portfolio.positions  
abs_capital_used = sum(abs(pos[s].amount) * pos[s].cost_basis for s in pos)  
record(abs_capital_used = abs_capital_used)  
# calculate free cash: starting_cash + pnl - capital_used  
port = context.portfolio  
abs_cash = port.starting_cash + port.pnl - abs_capital_used  
record(abs_cash = abs_cash)  

You can specify any start and end date (from 2002 to 2013) in your backtests. But Quantopian isn't really set up for parameter optimization. I would caution you that overfitting can easily occur so be very conservative in your optimizations. Much better would be to find natural ratios and averages to use in your formulas. That way it has a much better chance of adapting to out-of-sample data.

Branko,
Thanks for sharing the code, insights, and links.

I recently did a casual survey of Coke and Pepsi historical data. I have a hypothesis that is a twist on cointegrated pairs.
The hypothesis asserts:

  • Coke and Pepsi have movement around dividend time
  • Maybe, a lot of institutional and retail money moves from KO to PEP and back around the dividend cycle
  • The magnitude of movement might be greater than the dividend as a percent of share price
  • Buy PEP on KO ex-Dividend day, and vice versa

I suppose that is more of a calendar feed and scheduling job than a technical cointegration strategy.
I do want to emphasize casual survey. I merely compared decades of KO and PEP visually on a Yahoo Interactive chart, with Dividend events turned on.
The hypothesis appeared truthy, but was not measured and tested. Anyone in the community doing that sort of testing?

I posted an image here:
Cointegrated Pair with Ex-Dividend Timing?

Cheers,
Monte
nFol.io