The two methods you are looking for are groupby and unstack.
First, group by the date (index level 0) and then by the industry code. In pandas 0.20 and later this can be done in one step. However, in version 0.18 (the version currently available here on Quantopian) one cannot mix grouping by an index level and a column. The workaround is to copy the date index level to a column, then group on the two columns, like this:
results['date'] = results.index.get_level_values(level=0)
mean_returns = results.groupby(['date', 'gr']).returns.mean()
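For comparison, here is a minimal sketch of the one-step grouping that newer pandas versions allow, next to the workaround above. The toy `results` frame, its tickers, and its values are made up for illustration and only mimic the shape of a pipeline output (an unnamed date/asset multi-index plus `gr` and `returns` columns). It uses `pd.Grouper(level=0)` to refer to the date level, which is one way to mix an index level with a column.

```python
import pandas as pd

# Hypothetical stand-in for the pipeline output: a (date, asset) MultiIndex
# with an industry code column 'gr' and a 'returns' column.
dates = pd.to_datetime(["2017-01-03", "2017-01-04"])
index = pd.MultiIndex.from_product([dates, ["A", "B", "C", "D"]])
results = pd.DataFrame(
    {
        "gr": [10, 10, 20, 20] * 2,
        "returns": [0.01, 0.03, -0.02, 0.00, 0.02, 0.04, 0.01, -0.03],
    },
    index=index,
)

# One step (newer pandas): group by index level 0 and the 'gr' column together.
one_step = results.groupby([pd.Grouper(level=0), "gr"])["returns"].mean()

# Workaround (older pandas): copy the date level into a column, then group.
copied = results.copy()
copied["date"] = copied.index.get_level_values(0)
two_step = copied.groupby(["date", "gr"])["returns"].mean()
```

Both approaches should produce the same mean return per date and industry code; only the name of the first index level differs.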
That gives us a multi-indexed pandas series with index level 0 as dates, level 1 as industry codes, and values that are the mean returns of each industry group. Almost what we want, except there should be a column for each industry group rather than a multi-index. The unstack method to the rescue. The stack and unstack methods should be part of everyone's 'go-to methods'. They turn columns into an index and an index into columns, respectively. I remember which is which by thinking of 'stacking' as making the index 'taller' and 'unstacking' as making it 'shorter'. So a single line, calling unstack() on the grouped series (mean_returns.unstack()), will get our industry groups into columns.
That's it. Three lines of code (could even be two). Knowing the pandas methods will save a lot of time.
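To make the 'taller' versus 'shorter' picture concrete, here is a small sketch on a hand-built series. The dates, industry codes, and values are invented for illustration; only the shape (dates in level 0, industry codes in level 1) follows the description above.

```python
import pandas as pd

# Hypothetical grouped means: level 0 is the date, level 1 is the industry code.
dates = pd.to_datetime(["2017-01-03", "2017-01-04"])
index = pd.MultiIndex.from_product([dates, [10, 20]], names=["date", "gr"])
mean_returns = pd.Series([0.02, -0.01, 0.03, -0.01], index=index)

# unstack: the innermost index level ('gr') becomes the columns,
# so the index gets 'shorter' (one row per date).
wide = mean_returns.unstack()

# stack: the columns go back into the index, so it gets 'taller' again.
tall = wide.stack()
```

Here `wide` has one row per date and one column per industry code, which is the layout the correlation step needs.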
Now, one little detail about returns. Arithmetic returns cannot really be added or averaged. Consider returns of -10% and 10%. The average is NOT 0%. If you have a 10% loss followed by a 10% gain, you end up with a 1% loss (0.90 x 1.10 = 0.99, or 99% of your original amount), not back where you started. To average returns, one must use log returns. The best way to do this is in the pipeline factor definition.
log_returns = Factors.Returns(window_length = 2).log1p()
The built-in log1p method will turn arithmetic returns into log returns. If you feel more comfortable dealing with arithmetic returns, you can turn the results back into arithmetic returns with the expm1 method after you get the means.
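The -10% / +10% example can be checked with a few lines of NumPy. This is only a sketch of the arithmetic, using `np.log1p` and `np.expm1` on a plain array rather than on a pipeline factor; the variable names are illustrative.

```python
import numpy as np

# Two consecutive arithmetic returns: a 10% loss, then a 10% gain.
arithmetic = np.array([-0.10, 0.10])

# The naive average suggests nothing happened...
naive_mean = arithmetic.mean()

# ...but compounding the two returns leaves a small loss.
compounded = np.prod(1.0 + arithmetic) - 1.0

# Log returns add up, so their mean is meaningful.
log_returns = np.log1p(arithmetic)
mean_log = log_returns.mean()

# Convert the mean log return back to an arithmetic per-period return.
per_period = np.expm1(mean_log)

# Compounding the per-period figure over both periods recovers the true result.
recovered = (1.0 + per_period) ** 2 - 1.0
```

The value of `per_period` is slightly negative, which matches the intuition that the pair of returns loses money overall.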
Finally, to get the correlations, do something like the following. It's also informative to plot the correlations as a heatmap, and seaborn has a nice function for that.
# Use the pandas `corr` method to get correlations between mean log returns
correlations = mean_log_returns.unstack().corr()
# Show a heatmap of the correlations using seaborn
import seaborn as sns
sns.heatmap(correlations)
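For anyone who wants to try the correlation step outside the research environment, here is a self-contained sketch on synthetic data. The industry codes (101, 102, 103), the random numbers, and the shared component are all assumptions made up for the example; two of the groups are built to move together so the correlation matrix has something to show.

```python
import numpy as np
import pandas as pd

# Synthetic daily mean log returns for three hypothetical industry codes.
rng = np.random.RandomState(0)
n_days = 60
dates = pd.date_range("2017-01-03", periods=n_days, freq="B")
codes = [101, 102, 103]

# Groups 101 and 102 share a common component; 103 is independent noise.
shared = rng.normal(0.0, 0.01, n_days)
wide_values = np.column_stack([
    shared + rng.normal(0.0, 0.002, n_days),
    shared + rng.normal(0.0, 0.002, n_days),
    rng.normal(0.0, 0.01, n_days),
])

# Long format, like the grouped series: level 0 is date, level 1 is code.
index = pd.MultiIndex.from_product([dates, codes], names=["date", "gr"])
mean_log_returns = pd.Series(wide_values.ravel(), index=index)

# One column per industry code, then pairwise correlations between columns.
correlations = mean_log_returns.unstack().corr()
```

The result is a symmetric 3 x 3 frame with ones on the diagonal, and it can be passed to `sns.heatmap` in the same way as the real data.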
The attached notebook shows this code in action.