Cointegration test doesn't match the Lectures example

Hi.
I have written this code to run a cointegration test on a list of tickers and return, for every ticker in the list, the assets it is cointegrated with. I slightly modified the code from the example called "How to Build a Pairs Trading Strategy on Quantopian?" because I am not very familiar with Panels, so I'm sticking with DataFrames for now.

The link to the example:
https://www.quantopian.com/research/notebooks/Cloned%20from%20%22How%20to%20Build%20a%20Pairs%20Trading%20Strategy%20on%20Quantopian%3F%22%202.ipynb

So I get the same pair of symbols coming out as cointegrated as in the example: 'ABGB' and 'CSUN'.
But when I move on and plot the z-score, the graph looks totally different, even though the symbols and the start and end dates are the same as in the example.

Here is the code:

import pandas as pd  
import pandas.io.data as web  
import numpy as np  
import matplotlib.pyplot as plt  
import statsmodels  
from statsmodels.tsa.stattools import coint  
pd.options.display.mpl_style = 'default'

start= '2014-1-1'  
end= '2015-1-1'  
ticker_list = ['ABGB', 'ASTI', 'CSUN', 'DQ', 'FSLR','SPY']  
allData = {}  
for ticker in ticker_list:  
    allData[ticker]= web.get_data_yahoo(ticker, start, end)  
# just prices:
total_df = pd.DataFrame({tic: data['Close'] for tic, data in allData.items()})
# daily percent changes (drop the first row, which is NaN):
daily_returns = total_df.pct_change()[1:]
# cumulative return index:
return_index = (1 + daily_returns).cumprod()

def cointegration_finder(ticker_list):  
    '''
    Populate the dictionary 'result' with each symbol as key
    and the list of symbols with which it is cointegrated as value,
    then convert it to a DataFrame for printing.
    '''
    result= {}  
    for ticker in ticker_list:  
        compare_ticker_to_this_list = [x for x in ticker_list if x != ticker]  
        cointegrated_tickers = [x for x in compare_ticker_to_this_list if coint(total_df[x],total_df[ticker])[1]<.05]  
        result[ticker]= cointegrated_tickers  
    return pd.DataFrame.from_dict(result, orient='index')

df = cointegration_finder(ticker_list)  
print(df)


def zscore(series):  
    return (series - series.mean()) / np.std(series)

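def visualize_spread(x, y):
    # cointegration test result, kept for reference (not used in the plot)
    score, pvalue, _ = coint(x, y)
    # z-score of the raw price difference x - y
    diff_series = x - y
    z = zscore(diff_series)
    # create the figure before plotting; calling plt.figure() afterwards opens a new, empty figure
    plt.figure(figsize=(15, 8))
    z.plot()
    plt.axhline(z.mean(), color='black')
    plt.axhline(1.0, color='red', linestyle='--')
    plt.axhline(-1.0, color='green', linestyle='--')
    plt.show()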

visualize_spread(total_df['ABGB'], total_df['CSUN'])  

Hello Giuseppe,

I'd be happy to help figure out the mismatch. Would you mind replying to this thread with the actual notebook you're working on? If you'd prefer not to for privacy reasons I can try to work with what you have now. The link you shared won't work because it's specific to your account.

Also, indexing into a Panel should yield a DataFrame. I can share an example of this if you'd like.

Lastly, I need to mention this in the lecture, but once you've gotten your analysis working, make sure that you are doing another level of statistical validation on the pairs you find. Because of multiple comparisons bias, you will find many pairs that achieve significant p-values through random chance. Out-of-sample testing can help with this. I can explain more if you're not familiar with multiple comparisons bias.
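
To put a rough number on the multiple comparisons point, here is a minimal sketch using the six tickers from the original post. It counts how many tests a pairwise scan runs and how many would be expected to pass the 0.05 cutoff by chance alone if no pair were truly cointegrated. The Bonferroni-adjusted cutoff is one common, conservative correction, shown only as an illustration; it is not something the lecture prescribes.

```python
from itertools import permutations

# Tickers from the original post; the scan tests every ordered pair (x, ticker).
ticker_list = ['ABGB', 'ASTI', 'CSUN', 'DQ', 'FSLR', 'SPY']
alpha = 0.05

ordered_pairs = list(permutations(ticker_list, 2))
n_tests = len(ordered_pairs)

# If no pair were truly cointegrated, each test would still pass with
# probability alpha (treating p-values as uniform under the null).
expected_false_positives = n_tests * alpha

# Bonferroni correction: divide the cutoff by the number of tests.
bonferroni_alpha = alpha / n_tests

print(n_tests, expected_false_positives, bonferroni_alpha)
```

With only six tickers the scan already expects more than one spurious "pair", which is why a second validation step such as out-of-sample testing matters.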

Thanks,
Delaney

Thank you, Delaney!
Let me try to make my problem clearer. I have attached my notebook ('Untitled.ipynb'), and I run into these two problems when I compare it with yours, which I will clone and attach in my next post because, as far as I can see, I can attach only one notebook per post.

  1. I found that FSLR and ABGB are cointegrated, as you can see in line 12, but ASTI is also cointegrated with CSUN and not vice versa. How is this possible?
  2. When I plot the spread between the two series in line 20, I get completely different results from yours, and I can't figure out why.

Thanks again. Your Lectures are really clear and useful; I will keep following them to get more tools for making my statistical validation more robust.
-Giuseppe

...and this is your notebook from the Lecture.

  1. The mathematical definition of cointegration is not necessarily symmetric. Unlike the notebook, which deals with the simpler example of X - Y, the mathematical test for cointegration deals with X / Y and is more complicated. As far as I know, it is possible to have pairs that are cointegrated in one direction and not the other. I'll be updating the lecture soon to include a discussion of the pair ratio X / Y. This is more in line with the traditional mathematical way of looking at pairs, and leads more naturally into concepts like hedge ratios. You might want to try plotting both, as I have done in the updated notebook. Ultimately both are likely useful signals, and it's up to you to decide which will work better for your use case.

  2. It appears you are plotting the spread between ABGB and CSUN, whereas in my notebook the spread is between ABGB and FSLR. I added a spread plot between ABGB and FSLR, and it appears to line up with my notebook. Let me know if I'm missing something here.
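
One way to see why the result in point 1 can depend on argument order: as far as I know, statsmodels' `coint` uses an Engle-Granger style two-step test, which first regresses one series on the other and then checks the residuals for stationarity. The sketch below uses only NumPy on simulated data (the series `x` and `y` are made up for illustration, not ABGB or CSUN) to show that the two regression directions give different hedge ratios and therefore different residual series, so `coint(x, y)` and `coint(y, x)` need not agree.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: x is a random walk, y tracks x with a lot of noise.
n = 250
x = 50 + np.cumsum(rng.normal(0, 1, n))
y = 0.5 * x + rng.normal(0, 3, n)

def ols_slope(dep, indep):
    """Slope from regressing dep on indep (with an intercept)."""
    return np.cov(dep, indep, ddof=1)[0, 1] / np.var(indep, ddof=1)

beta_y_on_x = ols_slope(y, x)  # hedge ratio when y is the dependent series
beta_x_on_y = ols_slope(x, y)  # hedge ratio when x is the dependent series

# Residual spreads that a two-step test would examine in each direction.
resid_y = (y - y.mean()) - beta_y_on_x * (x - x.mean())
resid_x = (x - x.mean()) - beta_x_on_y * (y - y.mean())

r = np.corrcoef(x, y)[0, 1]
# The slopes are not reciprocals of each other: their product equals r**2,
# which is below 1 unless the two series are perfectly correlated.
print(beta_y_on_x, 1 / beta_x_on_y, beta_y_on_x * beta_x_on_y, r ** 2)
```

Because the residuals differ between the two directions, a stationarity test on them can land on opposite sides of a 0.05 cutoff, which would produce the one-way result seen for ASTI and CSUN.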

Thank you for your kind words about the lecture series; feedback like yours helps us keep improving them. Please don't hesitate to reach out with comments.

Thank you, Delaney. The reply on point one was very clear, and sorry for the trivial error on point 2 :)
-Giuseppe

Happy to help.

Hello Delaney,

I'm also getting a different result from the coint function, in the notebook "Introduction to Pairs Trading".

# compute the p-value of the cointegration test  
# will inform us as to whether the spread between the 2 timeseries is stationary  
# around its mean  
score, pvalue, _ = coint(X,Y)  
print(pvalue)  
0.683824295029  

My result was 0.683824295029, versus 2.75767345363e-16 in the lecture (https://www.quantopian.com/lectures/introduction-to-pairs-trading).

I'm using the same sequence of commands here, in Jupyter with statsmodels 0.8.0.

Bruno

Hey Bruno,

Could you post precisely the series of commands that generate X and Y? I tried rerunning the lecture notebook and it looks fine for me.
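
If X and Y are simulated from random draws, fixing the seed makes it easier to compare runs between two machines. The sketch below is only a hypothetical stand-in: `make_pair` and its construction (a random walk plus an offset and noise) are assumptions for illustration and may not match how the lecture notebook builds its series.

```python
import numpy as np

def make_pair(seed, n=100):
    """Simulate a hypothetical pair: X is a random walk, Y is X plus an offset and noise."""
    rng = np.random.default_rng(seed)
    X = 50 + np.cumsum(rng.normal(0, 1, n))
    Y = X + 5 + rng.normal(0, 1, n)
    return X, Y

X1, Y1 = make_pair(seed=42)
X2, Y2 = make_pair(seed=42)  # same seed: identical series
X3, Y3 = make_pair(seed=7)   # different seed: different series

print(np.array_equal(X1, X2) and np.array_equal(Y1, Y2))
print(np.array_equal(X1, X3))
```

Posting the seed together with the generating commands lets someone else rebuild exactly the same X and Y before comparing p-values.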

Thanks,
Delaney