Learning how to use Alphalens for factor analysis

Today, we published a new Quantopian tutorial: the Alphalens Tutorial.

This tutorial will teach you how to use Alphalens to analyze the predictiveness of a given alpha factor.

In How to be a Successful Quant, we said: "The hardest and most valuable work is identifying alpha". As such, we encourage you to incorporate Alphalens into your quant workflow to improve your alpha research process.

Feedback is welcome. If you have any questions/comments, please post them here!

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

17 responses

Hi @Cal,

Great tutorial and NBs! I especially find Lessons 3, 4, and 5 very helpful for me!

A few questions:

1. In the Lesson 3 NB, you say that:

If it has an IC mean close to .1 (or higher) over a large trading
universe, that factor is probably really good.

Was this meant to be 'Mean IC' > 0.01 is really good? The Example Tear Sheet on GitHub doesn't even have an IC Mean of anywhere close to 0.1, and that alpha factor (from ExtractAlpha) seems amazingly good, no?

2. Again, in the Lesson 3 NB, you don't seem to specify the number of 'quantiles' or 'periods' anywhere (unless I've missed it). Does AL use 5 quantiles and 1D, 5D, 10D periods by default, if nothing is specified? (This is what I tend to use as well.)

3. In the Lesson 4 NB (awesome NB, by the way - more of this please!), you say:

The point where the line dips below 0 represents when our alpha
factor's predictions stop being useful.

Wouldn't a consistently negative IC mean, mean (pun intended) that the factor actually is 'predictive' if you 'flip' it? :)

4. Again in the Lesson 4 NB, I find it interesting that (in the last cell) you have a factor with positive Ann. Alpha and positive mean IC, but negative 'Mean Period Wise Spread (bps)', and you don't mention why this may be the case (perhaps due to extreme outliers?).

Thanks again for an awesome tutorial! I especially find all the #comments in the code very helpful, and I personally wouldn't mind a #comment after each line of code, explaining what each line does (maybe overkill for some).

On my wish list is also a template NB on how to best combine two (or more) separate factors that are 'predictive' on different Sectors. I personally find that quite difficult to implement, with my still very limited coding 'skills'.

1) No, there is no typo there. An IC mean of .1 would be really good. In fact, if your algorithm has an IC mean of 0.1, you might want to double check the algorithm to make sure there isn't some sort of lookahead bias.

2) Yes, that is true. The Alphalens function get_clean_factor_and_forward_returns() uses 5 quantiles, and 1, 5, and 10 day forward periods by default. You can, and should, change those values, however! For example, if you are researching factors that only update once a quarter (63 trading days), you might want to change your forward looking periods to 1, 5, 10, 21, and 63 days.
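To make "forward periods" concrete, here is a minimal sketch of how a forward return over each period could be computed for a single asset. It uses only the standard library, and forward_returns is a hypothetical helper written for illustration, not Alphalens's own implementation. In Alphalens itself, the equivalent choice is made through the periods argument of get_clean_factor_and_forward_returns(), e.g. periods=(1, 5, 10, 21, 63).

```python
def forward_returns(prices, periods=(1, 5, 10)):
    """Map each period p to a list of p-day forward returns.

    Hypothetical helper for illustration only. Entry t is
    prices[t + p] / prices[t] - 1, or None when there is not
    enough future data to look p days ahead.
    """
    out = {}
    for p in periods:
        column = []
        for t in range(len(prices)):
            if t + p < len(prices):
                column.append(prices[t + p] / prices[t] - 1.0)
            else:
                column.append(None)  # the last p rows have no forward return
        out[p] = column
    return out


# Synthetic price series growing 1% per day (made-up data).
prices = [100.0 * 1.01 ** t for t in range(70)]

# Periods suggested above for a factor that updates once a quarter.
fwd = forward_returns(prices, periods=(1, 5, 10, 21, 63))
```

Note that longer periods leave fewer usable rows: with 70 prices, only the first 7 days have a 63-day forward return, so a quarterly horizon needs a correspondingly long price history.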

3) That is the case, but if an alpha factor's IC mean isn't consistent with your economic rationale, you should be careful about just "flipping" it. That can lead to overfitting.

4) Thanks for the suggestion, I'll look into it when I do my first revision.

Thanks Cal,

For 1), ok fair enough, but how realistic really is an IC mean of 0.1 or higher? Is that really achievable and something we should shoot for? It seems almost impossible if the ExtractAlpha guys 'only' get a 1D IC mean of 0.013?

For 3), wouldn't an IC of -1.0 mean that my prediction is wrong 100% of the time, so if I did the opposite of my prediction (e.g. selling instead of buying) there's still very much predictive power in the model? I thought an IC of 0 meant there's no predictive power (or the prediction is 100% due to random chance)? That's what I gathered from the Investopedia page on IC anyway, but then you guys are the experts at this stuff, not me. :)

1) It will be extremely difficult to attain an IC mean of .1 for 1 day forward returns. In your opinion, does the tutorial make it sound like you should shoot for .1? If so, I might change the wording, because that kind of predictive ability is very rare.

3) You are actually right: perfect "noise" is an IC mean of 0. I have updated my answer above.
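The point about sign can be sketched with a small rank-correlation example. This assumes the IC is computed as a Spearman rank correlation between factor values and forward returns (my understanding of what Alphalens does, but treat it as an assumption). The helpers ranks, pearson, and information_coefficient are hypothetical names written for this illustration, and the data is made up.

```python
def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def information_coefficient(factor, fwd_returns):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(factor), ranks(fwd_returns))


# Made-up cross-section where the factor ranks stocks in exactly the
# wrong order: the highest factor value has the lowest forward return.
factor = [0.9, 0.1, 0.5, 0.3, 0.7]
returns = [-0.04, 0.03, 0.00, 0.01, -0.02]

ic = information_coefficient(factor, returns)
flipped_ic = information_coefficient([-f for f in factor], returns)
```

Here ic comes out at -1 and flipped_ic at +1, so a consistently negative IC is as informative as a positive one of the same size. The overfitting caution still applies: flipping a factor after seeing the data only makes sense if the flipped version has an economic rationale.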

Thanks Cal,

1) Personally I would change it to > 0.01 (which I believe is quite achievable, and not bad in my opinion), as in my view, it's less misleading (obviously the higher the better). That's just my opinion though (which is often wrong), but perhaps there's a good reason to keep it at >0.1?

Thanks again for a great tutorial. I find it very helpful.

Here is an example where I expand on the optimization that is done in the alphalens tutorial. Basically, I optimize in the training period on the correlation lookback window and on the holding period. I use the information coefficient (IC) as the objective function as is done in the tutorial. See the summary comments in the notebook for more of an explanation.

Such quality work @Michael, as always! Really appreciate your work!

@Michael: This is awesome and I think warrants an individual post!

Thanks Thomas. I would be more than happy to create a separate post if you think it would help the community. I just put it in here because I felt like it added on to some of the good work that was done in the Alphalens tutorial.

@Michael: Definitely will be to the benefit of everyone. It certainly fits here but stands alone also.

My god but it's slow. Horribly, horribly slow. It makes a back-test look lightning quick. Admittedly I am running over more than one year of data, but I am using an appropriate screen and currently looking at single factors. I mostly have time to cook a meal while I am waiting for a single cell to run. Such is life.

And by the time I have returned after cooking a Michelin starred dinner, I have forgotten what changes I had made anyway.

Does anyone else find this tool so very slow to use? I wonder whether it's the wrong time of day - more people in the US awake and hence the server is busy? Perhaps this also happens with the back tester?

And 51 minutes later I am still waiting for my pipeline, pricing data cell to calculate.

I guess there are times when you just have to say "goodnight Sooty", pack up, and go and do something else.

Hi @Cal - Within the Returns Analysis table of an Alphalens tear sheet I'm noticing that my Mean Period Wise figures (top, bottom, spread) change meaningfully based on simple changes to the choice of forward periods passed to get_clean_factor_and_forward_returns(). Example:
Period Set #1: periods=(1, 2, 5, 10, 20)
Period Set #2: periods=(2, 5, 10, 20)

Assuming no other changes, the Returns Tables for #1 and #2 will show the same values for Ann. alpha and beta as expected; however, both the Top Quantile and Bottom Quantile figures for Mean Period Wise Return in Set #2 will be 2x their corresponding Set #1 values, and the spread will consequently be off as well. How should I interpret this behavior? Thanks in advance.

Reference code where period range is passed:

from alphalens.utils import get_clean_factor_and_forward_returns

merged_data = get_clean_factor_and_forward_returns(
    factor=factor_data['alpha_factor'],
    prices=pricing_data,
    periods=(2, 5, 10, 20)
)

Hi Cem, thank you for the feedback.

I should add a note to the tutorial to show why that behavior is happening. The numbers passed to the periods argument all depend on the first number as the base unit of time.

For example, periods=(2, 5, 10, 20) will yield the same results as periods=(1, 10, 20, 40).

Does this make sense?
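A rough sketch of why the figures would double: assume the tear sheet rescales each period's mean return to the length of the first period by compounding. That is my reading of the behavior described above, not a confirmed description of the Alphalens internals. The function to_base_rate is a hypothetical helper, and the 5% return is a made-up number.

```python
def to_base_rate(period_return, period_len, base_len):
    """Rescale a return earned over period_len days to base_len days.

    Hypothetical helper: assumes the return compounds at a constant
    rate, so (1 + r) is raised to the power base_len / period_len.
    """
    return (1.0 + period_return) ** (base_len / period_len) - 1.0


ten_day_return = 0.05  # made-up mean 10-day return for one quantile

# periods=(1, 2, 5, 10, 20): the first period is 1 day, so the base is 1 day.
per_1d = to_base_rate(ten_day_return, period_len=10, base_len=1)

# periods=(2, 5, 10, 20): the first period is 2 days, so the base is 2 days.
per_2d = to_base_rate(ten_day_return, period_len=10, base_len=2)

ratio = per_2d / per_1d  # close to 2 for small returns
```

Under that assumption, the same 10-day return is reported per 1 day in Set #1 and per 2 days in Set #2, which is why the Set #2 figures look about 2x larger even though the underlying returns are unchanged.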

@Joakim Arvidsson what makes you say an IC Mean > 0.01 is really good? Is that essentially saying that the factor predicted future returns correctly only 1% of the time?

Hi @Evan Kim,

Sorry for the late response - I had missed this one.

I don’t think I ever said an IC Mean of > 0.01 is ‘really good.’ I said it’s ‘achievable’ and ‘not bad,’ at least on a 1 day forward period. While it’s not quite the same, if you can find a factor that has a 1% edge vs ‘flipping a coin’ over a 1 day forward period, that’s not at all bad in my book. If you can do that every day over hundreds of stocks you won’t be too disappointed, I reckon. That assumes the other 99% is random, netting +-0 (minus trading costs) in a diversified and equally balanced long/short strategy.

Sure, a mean IC of 0.1 for a 1D forward period is immensely better, but how achievable and realistic is that really?

That all said, I’m often wrong and far from an expert at this stuff, so if you or any other data science black-belts can show me how I’m wrong, I’d love to hear it! :)
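On the question of whether an IC of 0.01 means "correct 1% of the time": the IC is a correlation, not a hit rate. One rough way to relate the two, under strong assumptions, is the standard result for two jointly normal, zero-mean variables with correlation rho, where the probability that they share the same sign is 1/2 + arcsin(rho)/pi. The sketch below treats the IC as that correlation, which the rank-based IC only approximates; hit_rate is a hypothetical helper.

```python
import math


def hit_rate(ic):
    """Probability that factor and forward return share the same sign.

    Assumes both are jointly normal with zero mean and correlation ic.
    This is an idealization, not something Alphalens reports.
    """
    return 0.5 + math.asin(ic) / math.pi


coin_flip = hit_rate(0.0)    # exactly 0.5, no predictive power
small_edge = hit_rate(0.01)  # roughly 0.503
large_edge = hit_rate(0.1)   # roughly 0.532
```

So under these assumptions an IC of 0.01 corresponds to calling the direction right a little over 50% of the time, not 1% of the time, and an IC of 0.1 to about 53%. The exact numbers depend on the normality assumption and should be read as an order-of-magnitude guide only.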