Checking Correlation and Risk Exposure of Alpha Factors

The quant workflow involves testing models to see whether they're predictive of returns. Once you've shown they're predictive, you want to check how they correlate with other models and known risk factors. Here's a video going through that whole workflow.

Here is a notebook I built that lets you check the correlation between two alpha factors and run an alpha factor through our Pyfolio integration to see what its risk exposures look like. You probably want to do this as you're building alpha factors, so you can see whether the effect you found is actually highly correlated with a known risk factor or with another model you're already working on. If it is, that isn't necessarily a bad thing: two models which are 50% correlated will still be 50% uncorrelated, and adding them should help diversify your portfolio.
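To make the diversification point concrete, here is a minimal sketch that is not taken from the notebook. It assumes two factor return streams with known volatilities and a known correlation, and applies the standard two-asset variance formula to a weighted blend. The function name `combined_volatility` and the unit volatilities are illustrative assumptions.

```python
import math


def combined_volatility(vol_a, vol_b, rho, weight_a=0.5):
    """Volatility of a two-stream blend (hypothetical helper for illustration).

    Uses var = (wa*sa)^2 + (wb*sb)^2 + 2*wa*wb*rho*sa*sb.
    """
    weight_b = 1.0 - weight_a
    variance = (
        (weight_a * vol_a) ** 2
        + (weight_b * vol_b) ** 2
        + 2.0 * weight_a * weight_b * rho * vol_a * vol_b
    )
    return math.sqrt(variance)


# Two streams with the same (assumed) volatility of 1.0, equally weighted.
vol_half_correlated = combined_volatility(1.0, 1.0, rho=0.5)
vol_fully_correlated = combined_volatility(1.0, 1.0, rho=1.0)

print(round(vol_half_correlated, 4))   # 0.866
print(round(vol_fully_correlated, 4))  # 1.0
```

Under these assumptions, a 50% correlation still lowers the blended volatility from 1.0 to about 0.87. Only at a correlation of 1.0 does the blend offer no reduction.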

This notebook assumes a working knowledge of Alphalens and Pyfolio. You can get both in our Getting Started Tutorial.

EDIT: Forgot -- I should also note that much of the code in this notebook comes from examples built by Luca.


The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

13 responses

Such an awesome notebook! Thanks for putting this together and for answering the first batch of questions in the Alphalens Questions Thread.

Adding a third factor to check the correlation of returns works fine, but when I try to compute the correlation matrix I get the error below, which I'm not able to troubleshoot due to my limited Python and NumPy skills. :( Could anyone help, please?

np.corrcoef(factor1_returns, factor2_returns, factor3_returns)

ValueError                                Traceback (most recent call last)  
<ipython-input-39-712bb4abfd00> in <module>()  
----> 1 np.corrcoef(factor1_returns, factor2_returns, factor3_returns)

/usr/local/lib/python2.7/dist-packages/numpy/lib/function_base.pyc in corrcoef(x, y, rowvar, bias, ddof)
   2559         warnings.warn('bias and ddof have no effect and are deprecated',  
   2560                       DeprecationWarning)  
-> 2561     c = cov(x, y, rowvar)  
   2562     try:  
   2563         d = diag(c)

/usr/local/lib/python2.7/dist-packages/numpy/lib/function_base.pyc in cov(m, y, rowvar, bias, ddof, fweights, aweights)
   2422         dtype = np.result_type(m, y, np.float64)  
   2423     X = array(m, ndmin=2, dtype=dtype)  
-> 2424     if rowvar == 0 and X.shape[0] != 1:  
   2425         X = X.T  
   2426     if X.shape[0] == 0:

/usr/local/lib/python2.7/dist-packages/pandas/core/generic.pyc in __nonzero__(self)
    890         raise ValueError("The truth value of a {0} is ambiguous. "  
    891                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."  
--> 892                          .format(self.__class__.__name__))  
    894     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Thanks for putting together this notebook. Very helpful.


My bad, I forgot to pass the factors in an array as an argument to np.corrcoef. In the simple case of 2x2 correlation, you can pass them as two arguments and it will work. In higher dimensional cases you just pass all the time series in an array. I've updated the notebook.
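A minimal sketch of the difference, using synthetic data rather than the notebook's factor returns (the `factor*_returns` arrays below are made-up stand-ins). The signature is `np.corrcoef(x, y=None, rowvar=True, ...)`, so a third positional argument is taken as `rowvar`, not as a third series. That is why the traceback above fails on the `rowvar == 0` check.

```python
import numpy as np

# Synthetic stand-ins for three daily factor return series (assumed data).
rng = np.random.RandomState(0)
common = rng.normal(size=250)
factor1_returns = common + rng.normal(size=250)
factor2_returns = common + rng.normal(size=250)
factor3_returns = rng.normal(size=250)

# Two series: passing them as two positional arguments works.
corr_2x2 = np.corrcoef(factor1_returns, factor2_returns)

# Three or more series: pass them together as one list (or 2-D array),
# one series per row, instead of as separate positional arguments.
corr_3x3 = np.corrcoef([factor1_returns, factor2_returns, factor3_returns])

print(corr_2x2.shape)  # (2, 2)
print(corr_3x3.shape)  # (3, 3)
```

If the returns are pandas Series sharing a date index, `pd.concat([...], axis=1).corr()` is an alternative that aligns the series on their index before computing correlations.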

Updated notebook works great with multiple factors now. Thank you - very useful notebook!

The explanation as to why the original notebook didn't work also helps me (slowly) get my head around Numpy and code in general.

Thank you Delaney! This will reduce the number of backtests I do dramatically.

Thanks Delaney, super helpful notebook to move even more of the workflow into the research environment.

Hi Delaney -

If I'm not mistaken, with the factor returns covariance matrix, one can straightaway find the factor combination weights that will minimize the returns variance (see Under the assumption that the past will be replicated perfectly in the future, one could use the code you posted above to find the covariance matrix, and then compute the factor combination weights once-and-for-all. It would be better, though, to do this optimization on a rolling basis, within the algo. Is this possible? It would require coaxing Pipeline to return the factor returns covariance matrix, in addition to the factor values, point-in-time.
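As a rough illustration of the once-and-for-all calculation described above, here is a sketch of the textbook fully invested minimum-variance solution, w = C^-1 1 / (1' C^-1 1). It is not something the notebook or Pipeline provides. The helper `min_variance_weights` and the example covariance matrix are assumptions, and the result is unconstrained, so weights can be negative when factors are strongly correlated.

```python
import numpy as np


def min_variance_weights(cov):
    """Weights summing to 1 that minimize w' C w (hypothetical helper).

    Closed form: w = C^-1 1 / (1' C^-1 1), with no sign or risk-model
    constraints applied.
    """
    cov = np.asarray(cov, dtype=float)
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)  # C^-1 1 without forming the inverse
    return raw / raw.sum()


# Assumed covariance of two uncorrelated factor return streams,
# the second with four times the variance of the first.
example_cov = np.array([[1.0, 0.0],
                        [0.0, 4.0]])
weights = min_variance_weights(example_cov)
print(weights)  # [0.8 0.2]
```

In practice the covariance matrix would come from something like `np.cov` applied to the stacked factor return series. A rolling version would re-estimate it over a trailing window and recompute the weights each period.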

Your notebook also suggests that one should consider minimizing the variance of the portfolio of factors, subject to the Q risk model constraints. Any thoughts on how to do this?

As a footnote, it would seem that most of the value to Q is in the alpha factors themselves, versus how they are combined and the portfolio is constructed (see Wouldn't y'all be happier if the crowd just fed you alpha factors, and then you could do all kinds of fancy dancy stuff to combine and trade them? Having authors be concerned with writing full-up trading algos would seem not to add much value, and potentially bears hidden opportunity costs. Given that you no longer support live real-money trading by the crowd, might it make more sense just to solicit raw alpha factors? Your compensation scheme would need to change (no more $50M allocation headlines and 10% of the algo profits), but from a purely technical standpoint, my sense is that you might be much better off in the long run. If there is interest in discussing this concept, I'd be glad to start another forum thread.

On my wish list is a notebook where I can provide two or more backtest IDs and get a correlation matrix in either returns or holdings space. That way one could easily check how correlated one's Q contest submissions are.

Hi, I'm running the notebook provided, but I get this error:

TypeError: cumulative_returns() takes 1 positional argument but 2 were given

Can someone help?

Apparently the latest Alphalens version breaks the Pyfolio integration. It's reported here:

Hi there, I'm a quantitative analysis student and I'm starting to experiment with high-frequency trading. Could anyone recommend a good introductory course, on Udemy or another platform, for learning Python applied to algorithmic trading and machine learning?

Thank you so much!

I am running the existing code in my notebook, but I am stuck at the factor1 and factor2 returns step and am getting this error:

TypeError: cumulative_returns() takes 1 positional argument but 2 were given.

Please help me resolve this issue, and explain why I am getting this error.
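On the "why": Python raises this TypeError whenever a function that accepts one positional argument is called with two. That fits the report above that a newer Alphalens release changed the function the notebook calls. The sketch below reproduces the message with a stand-in function and shows one possible workaround. It assumes the series already holds one-period simple returns. The `cumulative_returns` defined here is a local stand-in, not the Alphalens implementation, and the return values are made up.

```python
import numpy as np


def cumulative_returns(returns):
    """Stand-in with a one-argument signature (not the Alphalens function).

    Compounds one-period simple returns into a growth-of-1 series.
    """
    returns = np.asarray(returns, dtype=float)
    return np.cumprod(1.0 + returns)


daily_returns = [0.10, -0.50, 0.20]  # made-up one-period returns

# Calling with an extra positional argument (e.g. a period, as older
# notebook code might do) reproduces the error from the thread.
try:
    cumulative_returns(daily_returns, 1)
    message = ""
except TypeError as exc:
    message = str(exc)
print(message)

# Dropping the extra argument works with the one-argument signature.
growth = cumulative_returns(daily_returns)
print(growth)
```

Whether dropping the second argument gives the same numbers as the original notebook depends on what the older function did with it. Pinning the library versions the notebook was written against is the other option.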