!!!!! HELP !!!!!!

Hi guys,

I need your help.

In the "Introduction to Pairs Trading" lecture, I found this piece of code:

symbol_list = ['ABGB', 'FSLR']
prices_df = get_pricing(symbol_list, fields=['price'], start_date='2015-01-01', end_date='2016-01-01')['price']
prices_df.columns = map(lambda x: x.symbol, prices_df.columns)

Actually, I'm working in a plain Jupyter notebook, not on Quantopian, so I'd like to find another way to write this, since get_pricing and symbol are Quantopian-specific.

I managed to do this:

symbol_list = ['ABGB', 'ASTI', 'CSUN', 'DQ', 'FSLR','SPY']
start_date='2014-01-01'
end_date='2015-01-01'
df_close = data.DataReader(symbol_list, 'yahoo', start_date, end_date)["Close"]
prices_df.columns = map(lambda x: x.?????????, prices_df.columns)

But I really don't know what to put in ????????? to select, I guess, all the tickers...


Have you checked if the columns aren't already the symbols? Because they are if I use the following code:

from pandas_datareader import data

symbol_list = ['ABGB', 'ASTI', 'CSUN', 'DQ', 'FSLR','SPY']  
start_date='2014-01-01'  
end_date='2015-01-01'  
df_close = data.DataReader(symbol_list, 'yahoo', start_date, end_date)["Close"]

print(df_close.head())  
Symbols         ASTI         DQ       FSLR         SPY  ABGB  CSUN  
Date  
2014-01-02  146220.0  38.000000  57.439999  182.919998   NaN   NaN  
2014-01-03  145000.0  39.090000  56.740002  182.889999   NaN   NaN  
2014-01-06  142420.0  40.049999  51.259998  182.360001   NaN   NaN  
2014-01-07  145880.0  41.930000  52.490002  183.479996   NaN   NaN  
2014-01-08  142020.0  42.380001  51.680000  183.520004   NaN   NaN  

If not, what are the columns in your case?

Yes, I get the same thing, but I preferred to follow the lecture... Maybe the map call is unnecessary here.

Because later on, I get an error doing this:

# Heatmap to show the p-values of the cointegration test between each pair of
# stocks. Only show the value in the upper-diagonal of the heatmap
scores, pvalues, pairs = find_cointegrated_pairs(df_close)
import seaborn
seaborn.heatmap(pvalues, xticklabels=symbol_list, yticklabels=symbol_list,
                cmap='RdYlGn_r', mask=(pvalues >= 0.05))
print(pairs)

and the error was:

'NoneType' object has no attribute 'shape'

and the "shape" comes from the following function:

def find_cointegrated_pairs(data):
    n = data.shape[1]
    score_matrix = np.zeros((n, n))
    pvalue_matrix = np.ones((n, n))
    keys = data.keys()
    pairs = []
    for i in range(n):
        for j in range(i+1, n):
            S1 = data[keys[i]]
            S2 = data[keys[j]]
            result = coint(S1, S2)
            score = result[0]
            pvalue = result[1]
            score_matrix[i, j] = score
            pvalue_matrix[i, j] = pvalue
            if pvalue < 0.05:
                pairs.append((keys[i], keys[j]))
    return score_matrix, pvalue_matrix, pairs

Thanks for your help.

That's odd, because

df_close.shape  

shows me
(252, 6)

Could you post the link to the lecture, or attach the lecture notebook?
EDIT: found it...

Could you give me your e-mail address, please? The attach feature seems to be under maintenance.

Thanks

I couldn't reproduce this error; the only problem I have is the lack of data for ABGB and CSUN on Yahoo. I replaced them with GLD and IAU (both gold ETFs) and everything runs as expected.

Often the problem with notebooks is that something undesirable is still stored in memory; try restarting the kernel and running all the cells again.

import numpy as np  
import pandas as pd  
import statsmodels  
import statsmodels.api as sm  
from statsmodels.tsa.stattools import coint

np.random.seed(107)

import matplotlib.pyplot as plt  

def find_cointegrated_pairs(data):  
    n = data.shape[1]  
    score_matrix = np.zeros((n, n))  
    pvalue_matrix = np.ones((n, n))  
    keys = data.keys()  
    pairs = []  
    for i in range(n):  
        for j in range(i+1, n):  
            S1 = data[keys[i]]  
            S2 = data[keys[j]]  
            result = coint(S1, S2)  
            score = result[0]  
            pvalue = result[1]  
            score_matrix[i, j] = score  
            pvalue_matrix[i, j] = pvalue  
            if pvalue < 0.05:  
                pairs.append((keys[i], keys[j]))  
    return score_matrix, pvalue_matrix, pairs

from pandas_datareader import data

symbol_list = ['ABGB', 'ASTI', 'CSUN', 'DQ', 'FSLR','SPY']  
start_date ='2014-01-01'  
end_date ='2015-01-01'  
df_close = data.DataReader(symbol_list, 'yahoo', start_date, end_date)["Close"]

df_close = df_close.dropna(inplace=True)

# Heatmap to show the p-values of the cointegration test between each pair of
# stocks. Only show the value in the upper-diagonal of the heatmap
scores, pvalues, pairs = find_cointegrated_pairs(df_close)
import seaborn
seaborn.heatmap(pvalues, xticklabels=symbol_list, yticklabels=symbol_list,
                cmap='RdYlGn_r', mask=(pvalues >= 0.05))
print(pairs)

Try this please; you should get an error in the last cell.

Nothing changed, same error... Could you take a look at my comment?

Oh, I see. The problem is this line

df_close = df_close.dropna(inplace=True)  

You could use

df_close = df_close.dropna(axis=1)  

However, the NaNs are in the symbols that are cointegrated...
Replace 'ASTI' and 'ABGB' with 'GLD' and 'IAU'; then you don't need to drop NaNs. Just make sure you replace them in all the following cells.
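To spell out why that line produces None (a minimal sketch with made-up data, unrelated to the lecture): with inplace=True, dropna mutates the object and returns None, so assigning the result back discards the data.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])
result = s.dropna(inplace=True)  # mutates s in place and returns None

print(result is None)  # True
print(len(s))          # 2 -- the NaN row was removed from s itself
```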

CORRECTION: axis=1 and ABGB
Now I don't know any more which ones were the culprits; here is the symbol list that works for me:
symbol_list = ['ABGB', 'DQ', 'FSLR','SPY', 'GLD', 'IAU']

Bro, it's worse than before.

What's it saying?

It deleted everything except the column headers.

I'm going to keep 'ASTI' and 'ABGB'.

The problem is not the symbols but the code.

I've uploaded my edited version. It works up to the "Moving Averages" part; we would need an old pandas version, or to find another way to do this calculation...

here it is


Btw, is there an important reason why you don't use the notebook on Quantopian? It would save you all this trouble ;)

I don't want to use Quantopian because I'd like to be able to use my algorithm on foreign stocks... Would that be possible? Also, I don't like using keywords specific to one platform. I prefer a general notebook that works for everything...

Anyway, it works now. When I replaced them with 'GLD' and 'IAU', I found them cointegrated.
Also, df_close.dropna(inplace=True) deleted all the rows, so it was a big mistake.

I'm wondering whether with Quantopian we can use foreign stocks?

Take a look here, they have a lot of foreign stocks. If I want to see whether they have a particular symbol, I just write

symbols('THESYMBOL').asset_name  

in a notebook and see if the name is what I expected

As expected, there is nothing for Tunisia, but the most important thing is that it covers "international assets"...

How can we replace this piece of code?

rolling_beta = pd.ols(y=S1, x=S2, window_type='rolling', window=30)

I made it work for the mean and std, but not for ols.

Yes, that's the part where I got stuck. I have no answer for this, because (a) I'm not sure what exactly pandas did here, and (b) I don't know much about statistics anyway...
I tried downgrading pandas, but that resulted in a dependency nightmare ;)

@Q
perhaps you can help?

OK, I found out how to calculate the rolling beta "by hand":

# Calculate the rolling beta coefficient
cov = S1.rolling(30).cov(S2)
var = S1.rolling(30).var()
rolling_beta = cov / var

# Get the spread between the two stocks
spread = S2 - rolling_beta * S1

spread.name = 'spread'

# Get the 1 day moving average of the price spread  
spread_mavg1 = spread.rolling(1).mean()  
spread_mavg1.name = 'spread 1d mavg'

# Get the 30 day moving average  
spread_mavg30 = spread.rolling(30).mean()  
spread_mavg30.name = 'spread 30d mavg'

plt.plot(spread_mavg1.index, spread_mavg1.values)  
plt.plot(spread_mavg30.index, spread_mavg30.values)


plt.legend(['1 Day Spread MAVG', '30 Day Spread MAVG'])

plt.ylabel('Spread');  
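As a cross-check on the cov/var construction (again on synthetic series, just for illustration): on any single window, cov(S1, S2)/var(S1) is exactly the OLS slope of S2 regressed on S1, which np.polyfit can confirm.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
S1 = pd.Series(np.cumsum(rng.normal(size=100)))
S2 = pd.Series(0.5 * S1.values + rng.normal(scale=0.1, size=100))

# Rolling beta via cov/var, as in the post above
rolling_beta = S1.rolling(30).cov(S2) / S1.rolling(30).var()

# OLS slope fitted on the last 30-point window only
slope = np.polyfit(S1.iloc[-30:], S2.iloc[-30:], 1)[0]

print(abs(rolling_beta.iloc[-1] - slope) < 1e-8)  # True
```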

Thanks bro :)