Industry Returns Correlation

Could someone please help with the attached piece of code? The goal is to convert the pipeline dataframe into another dataframe that shows the daily mean return for each industry code, so that rows are dates and columns are industry codes. One can then run df.corr() on that dataframe to see the correlation between industry returns. I have been trying for many hours but still no luck.

3 responses

I think you could try using the before_trading_start() method, and within that method use the pandas get_value method. Then every day you can get the mean return for whatever industry code you want. You would then have to import that data into another dataframe in order to run the correlation. I haven't tried this myself, but found this example which might help:
https://www.quantopian.com/posts/access-data-in-pipeline-output-pandas-dataframe

The two methods you are looking for are groupby and unstack.

First, you will want to group by the date (index level 0) and then by the industry code. In pandas 0.20 and later this can be done in one step. However, in version 0.18 (the version currently available here on Quantopian) one cannot mix grouping by an index level and a column. The workaround is to copy the date index level to a column, then group on the two columns, like this:

results['date'] = results.index.get_level_values(level=0)
mean_returns = results.groupby(['date', 'gr']).returns.mean()



That gives us a multi-indexed pandas series with index level 0 as dates, level 1 as industry codes, and values which are the mean returns of each industry group. That is almost what we want, except there should be a column for each industry group rather than a multi-index. The unstack method to the rescue. The stack and unstack methods should be part of everyone's go-to methods. They turn columns into an index and an index into columns, respectively. I remember which is which by thinking of 'stacking' as making the index 'taller' and 'unstacking' as making it 'shorter'. So, a single line will get our industry groups into columns.

mean_returns.unstack()



That's it. Three lines of code (could even be two). Knowing the pandas methods will save a lot of time.
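To see the two steps end to end without a pipeline, here is a minimal sketch on made-up data. The toy `results` frame, its asset labels, and the industry codes 101 and 202 are assumptions for illustration only; the column names `returns` and `gr` follow the snippet above. The last line shows one way the one-step grouping can be written on newer pandas with `pd.Grouper(level=0)`, which will not work on the older version mentioned above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pipeline output: a (date, asset) MultiIndex
# with a 'returns' column and an industry-code column 'gr'.
dates = pd.to_datetime(["2018-01-02", "2018-01-03"])
assets = ["A", "B", "C", "D"]
index = pd.MultiIndex.from_product([dates, assets])
results = pd.DataFrame(
    {
        "returns": [0.01, 0.03, -0.02, 0.00, 0.02, 0.04, 0.01, -0.01],
        "gr": [101, 101, 202, 202] * 2,
    },
    index=index,
)

# Workaround from the post: copy the date level into a column, then group
# on the two columns.
results["date"] = results.index.get_level_values(0)
mean_returns = results.groupby(["date", "gr"])["returns"].mean()

# One row per date, one column per industry code.
industry_returns = mean_returns.unstack()
print(industry_returns)

# Newer pandas only: group on the index level directly, no copied column.
one_step = (
    results.groupby([pd.Grouper(level=0), "gr"])["returns"].mean().unstack()
)
```

Both `industry_returns` and `one_step` should hold the same numbers; calling `.corr()` on either gives the industry-by-industry correlation matrix.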

Now, one little detail about returns. Arithmetic returns cannot really be added or averaged. Consider returns of -10% and 10%. The average is NOT 0%. A 10% loss followed by a 10% gain leaves you with a 1% loss (99% of your original amount), not back where you started. To average returns one must use log returns. The best way to do this is in the pipeline factor definition.

log_returns = Factors.Returns(window_length=2).log1p()



The built in log1p method will turn arithmetic returns into log returns. If you feel more comfortable dealing with arithmetic returns you can turn it back into arithmetic returns with the expm1 method after you get the means.
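The -10% / +10% example can be checked numerically. This is a small sketch using NumPy's `log1p` and `expm1` outside of a pipeline, not the factor methods themselves; the variable names are made up for illustration.

```python
import numpy as np

# The two arithmetic returns from the example: a 10% loss, then a 10% gain.
arithmetic = np.array([-0.10, 0.10])

# Naive average of arithmetic returns suggests no change.
naive_mean = arithmetic.mean()

# Actual compounded result: 0.9 * 1.1 - 1, roughly a 1% loss.
compounded = np.prod(1 + arithmetic) - 1

# Convert to log returns, which can be summed or averaged.
log_returns = np.log1p(arithmetic)

# Summing log returns and converting back recovers the compounded return.
total_from_logs = np.expm1(log_returns.sum())

# Averaging log returns and converting back gives the constant per-period
# arithmetic return that compounds to the same total.
per_period = np.expm1(log_returns.mean())

print(naive_mean, compounded, total_from_logs, per_period)
```

Here `total_from_logs` matches `compounded`, while `naive_mean` does not, which is the point of the paragraph above.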

Finally, to get the correlations do something like this. It's also informative to plot correlations as a heatmap. Seaborn has a nice method to do that.

# Use the pandas corr method to get correlations between mean log returns
# (mean_log_returns is the grouped series from above, built from log returns)
correlations = mean_log_returns.unstack().corr()

# Show a heatmap of the correlations using seaborn
import seaborn as sns
sns.heatmap(correlations);



The attached notebook shows this code in action.
