Industry Returns Correlation

Could someone please help with the attached piece of code? The goal is to convert the pipeline dataframe into another dataframe that shows the daily mean return for each industry code, with dates as rows and industry codes as columns, so that one can run df.corr() on that dataframe to see the correlation between industry returns. I have been trying for many hours but still no luck.

3 responses

I think you could try using the before_trading_start() method, and within that method use the pandas get_value method. Then every day you can get the mean return for whatever industry code you want. You would then have to import that data into another dataframe in order to run the correlation. I haven't tried this myself, but found this example which might help:

The two methods you are looking for are groupby and unstack.

First, you will want to group by the date (index level 0) and then by the industry code. In pandas 0.20 and later this can be done in one step. However, in version 0.18 (the version currently available here on Quantopian) one cannot mix grouping by an index level and a column. The workaround is to copy the date index level to a column, then group on the two columns, like this:

results['date'] = results.index.get_level_values(level=0)  
mean_returns = results.groupby(['date', 'gr']).returns.mean()
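
For anyone who wants to try the grouping step outside the research environment, here is a minimal sketch on made-up data. The column names `returns` and `gr` follow the snippet above; the tickers, dates, and values are invented, and `with_date` and `mean_returns_one_step` are my own names. The second form uses `pd.Grouper(level=0)`, which is one way newer pandas versions can mix an index level with a column.

```python
import pandas as pd

# Invented stand-in for a pipeline result: (date, asset) MultiIndex,
# a `returns` column and an industry-code column `gr`.
dates = pd.to_datetime(["2017-01-03", "2017-01-04"])
assets = ["AAA", "BBB", "CCC"]
index = pd.MultiIndex.from_product([dates, assets])
results = pd.DataFrame(
    {
        "returns": [0.01, 0.03, -0.02, 0.00, 0.02, 0.04],
        "gr": [101, 101, 202, 101, 101, 202],
    },
    index=index,
)

# Workaround from the post: copy the date level into a column,
# then group on two ordinary columns.
with_date = results.copy()
with_date["date"] = with_date.index.get_level_values(0)
mean_returns = with_date.groupby(["date", "gr"])["returns"].mean()

# One-step form for newer pandas: mix an index level and a column.
mean_returns_one_step = results.groupby(
    [pd.Grouper(level=0), "gr"]
)["returns"].mean()

print(mean_returns)
```

Both forms should give the same four values, one per (date, industry code) pair.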

That gives us a multi-indexed pandas series with index level 0 as dates, level 1 as industry codes, and values which are the mean returns of each industry group. Almost what we want, except there should be a column for each industry group rather than a multi-index. The unstack method to the rescue. The stack and unstack methods should be part of everyone's 'goto methods'. They turn columns into an index and an index into columns, respectively. I remember which is which by thinking of 'stacking' as making the index 'taller' and 'unstacking' as making it 'shorter'. So a single line, calling unstack on the grouped series, will get our industry groups into columns.
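
A minimal sketch of the reshaping step on invented numbers. Here `mean_returns` is a hand-built stand-in for the grouped series, and the names `by_industry` and `round_trip` are my own, not from the notebook.

```python
import pandas as pd

# Hand-built stand-in for the grouped series:
# level 0 = date, level 1 = industry code.
index = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2017-01-03"), 101),
        (pd.Timestamp("2017-01-03"), 202),
        (pd.Timestamp("2017-01-04"), 101),
        (pd.Timestamp("2017-01-04"), 202),
    ],
    names=["date", "gr"],
)
mean_returns = pd.Series([0.02, -0.02, 0.01, 0.04], index=index, name="returns")

# unstack moves the innermost index level (industry code) into columns:
# the index gets "shorter", one row per date.
by_industry = mean_returns.unstack()

# stack reverses it: the columns go back into the index, making it "taller".
round_trip = by_industry.stack()

print(by_industry)
```

The result has one row per date and one column per industry code, which is the shape `corr` expects.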


That's it. Three lines of code (could even be two). Knowing the pandas methods will save a lot of time.

Now, one little detail about returns. Arithmetic returns cannot really be added or averaged across periods. Consider returns of -10% and 10%. Their arithmetic average is 0%, but that is NOT the compounded result. After a 10% loss followed by a 10% gain you end up with a 1% loss (99% of your original amount), not back where you started. To average returns one must use log returns. The best place to do this is in the pipeline factor definition:
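
The arithmetic behind that example can be checked in a few lines of plain Python. This is only an illustration of the -10% / +10% case; the variable names are my own.

```python
import math

# A 10% loss followed by a 10% gain (simple/arithmetic returns).
simple_returns = [-0.10, 0.10]

# Compounding multiplies growth factors, so the two periods do not cancel.
growth = 1.0
for r in simple_returns:
    growth *= 1.0 + r
total_simple = growth - 1.0  # about -0.01, i.e. a 1% loss

# The arithmetic mean of the simple returns is 0, which hides the loss.
arithmetic_mean = sum(simple_returns) / len(simple_returns)

# Log returns add across time: their sum is the log of the total growth.
log_returns = [math.log(1.0 + r) for r in simple_returns]
total_from_logs = math.exp(sum(log_returns)) - 1.0

print(total_simple, arithmetic_mean, total_from_logs)
```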

log_returns = Factors.Returns(window_length = 2).log1p()

The built-in log1p method turns arithmetic returns into log returns. If you feel more comfortable dealing with arithmetic returns, you can convert the means back with the expm1 method after you compute them.
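
To make the round trip concrete, here is a small sketch using the functions of the same name from Python's `math` module. These are not the pipeline factor methods, just the same math, and the returns are invented.

```python
import math

simple_returns = [0.02, -0.01, 0.03]  # invented daily returns

# log1p(r) is log(1 + r): simple return -> log return.
log_returns = [math.log1p(r) for r in simple_returns]

# Average in log space, then expm1 (exp(x) - 1) converts the mean back.
mean_log_return = sum(log_returns) / len(log_returns)
mean_as_simple = math.expm1(mean_log_return)

# expm1 undoes log1p element by element.
recovered = [math.expm1(x) for x in log_returns]

print(mean_as_simple)
```

Note that the mean converted back this way is the geometric (compounded) average of the simple returns, not their arithmetic average.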

Finally, to get the correlations do something like this. It's also informative to plot correlations as a heatmap. Seaborn has a nice method to do that.

# Use the pandas `corr` method to get correlations between mean log returns  
correlations = mean_log_returns.unstack().corr()

# Show a heatmap of the correlations using seaborn
import seaborn as sns
sns.heatmap(correlations)
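
As a sanity check on what `corr` returns, here is a sketch with an invented wide frame (dates as rows, industry codes as columns) where the expected correlations are obvious by construction. The name `by_industry` and all values are made up.

```python
import pandas as pd

# Invented wide frame: one row per date, one column per industry code.
by_industry = pd.DataFrame(
    {
        101: [0.010, 0.020, -0.010, 0.005],
        202: [0.020, 0.040, -0.020, 0.010],    # exactly 2x column 101
        303: [-0.010, -0.020, 0.010, -0.005],  # exactly -1x column 101
    },
    index=pd.date_range("2017-01-03", periods=4, freq="D"),
)

# Pairwise Pearson correlation between the columns.
correlations = by_industry.corr()

print(correlations)
```

The result is a square, symmetric frame indexed by industry code on both axes, with 1.0 on the diagonal, which is the kind of input a heatmap function takes.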

The attached notebook shows this code in action.



@ Dan - Can't thank you enough. Always super helpful. Also, thank you very much for additional insight on the log returns.