alphalens - what does it do?

Is there a tutorial on what alphalens does under-the-hood? I understand that it is a signal screening tool, that spits out a lot of visuals, but how does it accomplish this? And what are the limitations?

30 responses

Not sure what you mean by "under-the-hood". If you're asking about implementation details, I suggest you just read the code here. It's surprisingly short and very educational.

If you have other kind of questions, ask away; I've used it a bit.

Thanks João -

I gather that Alphalens takes as an input a factor that takes a single numeric value per stock every day, and also prices per stock every day, and then attempts to say something about the predictive power of the factor.

Looking at https://github.com/quantopian/alphalens, the term "returns" is used, but how are the portfolio returns computed? How does alphalens determine the portfolio weights versus time from the factors? Or is it equal-weight? For alphalens to work, does the factor need to be formulated in a certain way (e.g. factor values predict returns with a linear model)? Does the factor need to be demeaned, and range between -1 and +1? And normalized, to gross leverage of 1.0? Or does alphalens take care of this (overriding the factor, which at any point in time may call for all long or all short or some mix)?

An alpha is an expression, applied to the cross-section of your
universe of stocks, which returns a vector of real numbers where these
values are predictive of the relative magnitude of future returns.

Is alphalens intended to help sort out a model (possibly nonlinear) that relates the factor values to future returns? Or is there an assumption that the factor effectively linearizes the relationship, so that returns are proportional to factor values?

From what I understand it works as follows. You pass alphalens two things: one is the time series of the factor, the other is "the portfolio". The way I've seen it used, this portfolio is passed to alphalens as a DataFrame whose columns are just the stocks' prices. The relevant function here is utils.get_clean_factor_and_forward_returns. There's nothing better than reading the documentation of the function here. Pay special attention to the description of the variables factor and prices. So I guess that, to answer one of your questions: yes, it's just equal-weighted stocks.

    factor_data = get_clean_factor_and_forward_returns(factor, prices)


Then that function returns a DataFrame with a shape that alphalens likes; let's call it factor_data. Your job as you study a factor is then to pass factor_data into different alphalens functions. For example, to compute your factor's alpha and beta you can take that factor_data and call performance.factor_alpha_beta(factor_data). See code here.

    print(factor_alpha_beta(factor_data))
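For reference, here is a minimal sketch of the two inputs described above, built with pandas only. The tickers, dates and values are made up, and the alphalens call is left as a comment because a real run needs far more history than the longest forward period.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 4 trading days, 3 tickers (names are made up).
dates = pd.date_range("2017-01-02", periods=4, freq="B")
assets = ["AAA", "BBB", "CCC"]

# prices: wide DataFrame, one row per date, one column per asset.
prices = pd.DataFrame(
    [[10.0, 20.0, 30.0],
     [10.5, 19.8, 30.3],
     [10.4, 20.2, 30.9],
     [10.9, 20.0, 31.2]],
    index=dates, columns=assets,
)

# factor: Series with a (date, asset) MultiIndex, one value per stock per day.
factor = pd.Series(
    np.arange(12, dtype=float),
    index=pd.MultiIndex.from_product([dates, assets], names=["date", "asset"]),
    name="factor",
)

# With alphalens installed and enough history, these would be passed as:
#   from alphalens.utils import get_clean_factor_and_forward_returns
#   factor_data = get_clean_factor_and_forward_returns(factor, prices)
print(prices.shape, factor.shape)
```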


Regarding factor returns, there's a function for just that: performance.factor_returns, which you can see here. I've never used it myself, but from what I understand it computes returns for the stocks as if they were weighted by the factor. I imagine that in this case it's important to have the factor ordered so that higher values of the factor are the "good" ones, and lower (negative) values of the factor are the bad ones. According to factor_returns docs, you don't need to demean the factor or anything, that function does it all.

In addition to this you can do other nice things on top, like segment your set of stocks by quantile, or by some grouper function which you define.

Is alphalens intended to help sort out a model (possibly nonlinear) that relates the factor values to future returns? Or is there an assumption that the factor effectively linearizes the relationship, so that returns are proportional to factor values?

At core the predictiveness of the factor is given by some linear regression: in the function factor_alpha_beta you can find this code:

    reg_fit = OLS(y, x).fit()
    alpha, beta = reg_fit.params


where OLS stands for "Ordinary Least Squares".
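A minimal sketch of that kind of regression, using numpy's least-squares solver as a stand-in for the OLS call. It assumes y is the return series of the factor-weighted portfolio and x is the mean return of the universe plus a constant column; the numbers are synthetic and chosen so the answer is known in advance.

```python
import numpy as np

# Hypothetical daily returns: the equal-weight mean of the universe (x)
# and a factor-weighted portfolio (y) built to have known parameters.
rng = np.random.default_rng(0)
universe_ret = rng.normal(0.0, 0.01, size=250)
portfolio_ret = 0.0005 + 0.3 * universe_ret

# Regress y on [1, x]; the intercept is alpha, the slope is beta.
X = np.column_stack([np.ones_like(universe_ret), universe_ret])
(alpha, beta), *_ = np.linalg.lstsq(X, portfolio_ret, rcond=None)
print(round(alpha, 6), round(beta, 3))
```

Because the synthetic portfolio is an exact linear function of the universe return, the fit recovers the intercept and slope that were put in.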

Let me know if you have more questions.

Thanks João -

I'll have to spend some more time with this, when I get the chance.

Regarding this:

it computes returns for the stocks as if they were weighted by the factor

I think this implies that factors should be written as portfolio weights? Or is it simply good enough that return be a monotonic function of the factor (which would be fixed by ranking?)? I guess before I get into alphalens, I need to understand the requirements for an alpha factor, in the context of Quantopian...

I think this implies that factors should be written as portfolio weights?

I'm not sure what this means. What I meant by the phrase you quote is this bit of code in function factor_returns:

    weights = factor_data.groupby(grouper)['factor'] \
        .apply(to_weights, long_short)


That is, if you have a bunch of stocks and their daily prices on the one hand, and you have a "factor" (which is a number for each day and each stock) on the other hand, then you can define a "portfolio return" which is the return of a portfolio made of those stocks where you're long stocks with positive factor and short stocks with negative factor.
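As a sketch of what such a weighting step might look like in the long/short case: the helper below is my own illustration, not the library's function, and the data is made up. It demeans each day's factor values and scales them so the absolute weights sum to 1.

```python
import pandas as pd

def to_weights(group):
    """Demean one day's factor values, then scale to gross leverage 1."""
    demeaned = group - group.mean()
    return demeaned / demeaned.abs().sum()

# Hypothetical factor values for two days and three tickers.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2017-01-02", "2017-01-03"]), ["AAA", "BBB", "CCC"]],
    names=["date", "asset"],
)
factor = pd.Series([1.0, 2.0, 6.0, -3.0, 0.0, 3.0], index=index)

# One weight vector per date: positive = long, negative = short.
weights = factor.groupby(level="date").transform(to_weights)
print(weights)
```

Each day's weights sum to zero (dollar neutral), and stocks above that day's mean factor value end up long while those below it end up short.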

EDIT: If I remember correctly the Sentdex tutorial talks about alphalens at some point. https://www.quantopian.com/tutorials/algorithmic-trading-sentdex

Presumably, alphalens takes the raw factor values, demeans and normalizes and then uses these as the portfolio weights, correct? It doesn't rank and then demean and normalize (which I recall seeing in some example algos).

The function get_clean_factor_and_forward_returns also ranks. There's a quantiles parameter (default 5, but you can change) which requires ranking and then binning the stocks according to the ranking quantiles.

Hmm? I guess I need to understand that detail. If my algo uses raw factor values (not ranked and not demeaned, passed to the Optimize API), but alphalens is working with ranked, demeaned factors, then it might not do a good job of predicting algo performance.

At the other end of the process, I'm wondering if all of the output from alphalens can be rolled up into one or a few simple figures of merit. Say I had lots of factors to evaluate, and wanted to score them without looking at each one individually using alphalens. For example, how would I automatically analyze the 101 Alphas (see https://www.quantopian.com/posts/alpha-compiler)?

@Grant, probably alphalens.tears.create_summary_tear_sheet is enough to understand the quality of your factor, so you might use that with the 101 Alphas and then call alphalens.tears.create_full_tear_sheet only on those factors that show good performance.

If my algo uses raw factor values (not ranked and not demeaned, passed
to the Optimize API), but alphalens is working with ranked, demeaned
factors, then it might not do a good job of predicting algo
performance.

The forward returns demeaning is performed only if long_short=True and it is useful to adjust the factor performance for a dollar neutral algo, but it doesn't mean you have to modify your factor values before passing it to the Optimize API. You also don't have to perform any ranking.

Briefly, this is what Alphalens does: if you know the factor value for each stock and also its future price, you actually know what the expected returns are for each factor value. So Alphalens computes the mean future (forward) return for the factor values, but it does so by dividing the factor into quantiles first, then averaging the future return of each quantile. Those are the values you see pretty much everywhere in the factor returns analysis, except for the plot called "Factor Weighted Long/Short Portfolio Cumulative Return", which is the plot you and João Aparício talked about. In that particular plot the simulated portfolio is not based on quantiles; instead each stock's forward return is weighted by the stock's factor value, and the weight is computed like this:
    factor_demeaned_vals / factor_demeaned_vals.abs().sum()

In your algorithm you can use the same method to compute the weighting for your stocks (or zscore, which gives a similar weighting and is already implemented as a method on pipeline factors), but you can also use equal weighting. You can decide which is best from the Alphalens output: just compare the "Factor Weighted Long/Short Portfolio Cumulative Return" plot to the "Cumulative Return by Quantile" plot.
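The quantile averaging described above can be sketched for a single day as follows. This is only an illustration with made-up numbers and three quantiles; the column names are hypothetical, and a real analysis would repeat this for every date.

```python
import pandas as pd

# Hypothetical single-day cross-section: factor value and 1-day forward return.
day = pd.DataFrame({
    "factor":     [0.1, 0.4, 0.2, 0.9, 0.7, 0.3],
    "forward_1d": [-0.02, 0.00, -0.01, 0.03, 0.02, 0.01],
})

# Bin the stocks into 3 quantiles by factor value (1 = lowest factor).
day["quantile"] = pd.qcut(day["factor"], 3, labels=False) + 1

# Equal-weight mean forward return of each quantile.
mean_ret = day.groupby("quantile")["forward_1d"].mean()
print(mean_ret)
```

Once the stocks are binned, the factor values themselves play no further role: every stock inside a quantile counts equally.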

It is actually worth looking at the code as it is really linear and you can follow very easily what alphalens does.

This video is also very interesting (from minute 28)

Thanks Luca -

So I gather that aside from the Spearman rank correlation monotonicity test, the factor is raw, demeaned and normalized, but not ranked? Or does alphalens manage outliers by ranking?

I'll look at the code at some point, but generally I don't like attempting to read code (and having to unravel unfamiliar Python), when words and equations would be better. I was hoping for a write-up, but I guess none exists.

So I gather that aside from the Spearman rank correlation monotonicity
test, the factor is raw, demeaned and normalized, but not ranked? Or
does alphalens manage outliers by ranking?

Yes, it is raw, and the demeaning and normalization are done only in the "Factor Weighted Long/Short Portfolio Cumulative Return" plot. The other return analysis plots deal with quantile mean returns, so once the stocks have been grouped by quantile (on the raw factor, since grouping on a ranked, normalized or demeaned factor would give the same result), the functions can forget about the factor values: they simply average the forward returns of the stocks (by quantile) using equal weighting.
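A small check of the claim that quantile grouping does not care whether the factor is raw, demeaned, normalized or ranked. The values are made up (including one large outlier) and have no ties; with ties or values sitting exactly on a bin edge the result could differ.

```python
import pandas as pd

# Hypothetical one-day cross-section of raw factor values, with an outlier.
raw = pd.Series([0.3, -1.2, 5.0, 0.8, 250.0, -0.4, 2.1, 1.5, -3.0, 0.0])

def bins(values, q=5):
    """Quantile label (0 = lowest) for each stock."""
    return pd.qcut(values, q, labels=False)

demeaned = raw - raw.mean()
normalized = demeaned / demeaned.abs().sum()
ranked = raw.rank()

# Compare the quantile assignment of each transformed version to the raw one.
same = all(bins(v).equals(bins(raw)) for v in (demeaned, normalized, ranked))
print(bins(raw).tolist(), same)
```

The outlier simply lands in the top quantile alongside the second-largest value, which is the sense in which quantiles "handle" outliers.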

Also, the long_short option I mentioned earlier is used to demean the forward returns, not the factor values (except for the "Factor Weighted Long/Short Portfolio Cumulative Return" plot).

The outliers are handled by the quantiles: once the outliers end up in one of the quantiles they get equal weight, so they are not a problem. This is not true for "Factor Weighted Long/Short Portfolio Cumulative Return", where the factor value is used to compute the stock weighting, but there is one option that might help with that:

    filter_zscore : int or float
        Sets forward returns greater than X standard deviations
        from the mean to nan.
        Caution: this outlier filtering incorporates lookahead bias.


Or you can simply pre-filter your factor DataFrame before calling alphalens, so that you remove the outliers yourself.
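To make the filter_zscore description concrete, here is a sketch of that kind of filter applied to a single made-up series of forward returns. It follows the wording of the docstring above rather than the library's exact code, and the threshold of 2 is only an illustrative value.

```python
import numpy as np
import pandas as pd

# Hypothetical 1-day forward returns; the last one is an extreme outlier.
forward_returns = pd.Series(
    [0.01, -0.02, 0.00, 0.015, -0.01, 0.005, -0.005, 0.02, -0.015, 0.90]
)

filter_zscore = 2  # threshold in standard deviations (illustrative value)

# The mean and std use the whole sample, future observations included;
# that is where the lookahead bias comes from.
distance = (forward_returns - forward_returns.mean()).abs()
mask = distance > filter_zscore * forward_returns.std()

filtered = forward_returns.mask(mask, np.nan)
print(filtered.isna().sum())
```

Note that a single huge value also inflates the standard deviation, so a stricter threshold can fail to catch the very outlier it was meant for.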

EDIT: the factor alpha and beta are computed on a portfolio weighted by factor values, where the raw factor data is demeaned and normalized in the same way as in the "Factor Weighted Long/Short Portfolio Cumulative Return" plot.

Hey Grant, I'll have some free time later today, wanna discuss over skype or something similar? To be honest I could use some refreshing these ideas again.

Thanks João and Luca -

I think I now understand the "Factor Weighted Long/Short Portfolio Cumulative Return" plot--it is simply akin to an ideal backtest using the daily point-in-time raw factor as the portfolio weights, which has been demeaned and normalized (to a gross leverage of 1.0).

In the "Returns Analysis" table, I'd assumed that the first three values (Ann. alpha, t-stat(alpha), & beta) are computed from the "Factor Weighted Long/Short Portfolio Cumulative Return" plot, but I'm not so sure, since there is only one plot (I'd expect three plots--one for each of the three trading frequencies: 1-, 5-, & 10-day). Presumably, the columns labeled 1, 5, & 10 mean that the portfolio is rebalanced on a 1-, 5-, or 10-day frequency, with instantaneous daily factor values (not averaged over the 5- or 10-day periods). However, there is only one "Factor Weighted Long/Short Portfolio Cumulative Return" plot, so does it represent the "1" column in the "Returns Analysis" table? Why not show three plots in the "Factor Weighted Long/Short Portfolio Cumulative Return"--one for each of the frequencies, 1-, 5-, and 10-day? Or am I misinterpreting?

Also, I'm curious how alphalens handles the potential bias of the start date of the analysis (since the 5- and 10-day frequencies select out specific days)? Presumably, nothing is built in; the user would run alphalens using a bunch of start dates to see the effect.

I think I now understand the "Factor Weighted Long/Short Portfolio
Cumulative Return" plot--it is simply akin to an ideal backtest using
the daily point-in-time raw factor as the portfolio weights, which has
been demeaned and normalized (to a gross leverage of 1.0).

Exactly

In the "Returns Analysis" table, I'd assumed that the first three
values (Ann. alpha, t-stat(alpha), & beta) are computed from the
"Factor Weighted Long/Short Portfolio Cumulative Return" plot,

Exactly

but I'm not so sure, since there is only one plot (I'd expect three plots--one
for each of the three trading frequencies: 1-, 5-, & 10-day).
Presumably, the columns labeled 1, 5, & 10 mean that the portfolio is
rebalanced on a 1-, 5-, or 10-day frequency, with instantaneous daily
factor values (not averaged over the 5- or 10-day periods). However,
there is only one "Factor Weighted Long/Short Portfolio Cumulative
Return" plot, so does it represent the "1" column in the "Returns
Analysis" table? Why not show three plots in the "Factor Weighted
Long/Short Portfolio Cumulative Return"--one for each of the
frequencies, 1-, 5-, and 10-day? Or am I misinterpreting?

Maybe you are looking at an old plot? Check this out: you'll see a "Factor Weighted Long/Short Portfolio Cumulative Return" plot for each period (1, 5 and 10).

Also, I'm curious how alphalens handles the potential bias of the start
date of the analysis (since the 5- and 10-day frequencies select out
specific days)? Presumably, nothing is built in; the user would run
alphalens using a bunch of start dates to see the effect.

Have a look at this discussion: How to interpret the cumulative return for N forward period in cumulative factor return plots?

Presumably, the columns labeled 1, 5, & 10 mean that the portfolio is rebalanced on a 1-, 5-, or 10-day frequency

No, Grant. In alphalens the "periods" refer to the time periods over which you want alphalens to investigate whether some predictive power exists. There is no re-balancing going on (and because of this, no sensitivity to frequencies selecting out specific days).

Here's the code of what alphalens does internally:

    for period in periods:
        delta = prices.pct_change(period).shift(-period)


it's a simple pct_change() followed by .shift(-period).
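A tiny worked example of that line, with made-up prices that grow 10% a day so the forward returns are easy to check by eye:

```python
import pandas as pd

# Hypothetical daily closes for one ticker.
prices = pd.DataFrame(
    {"AAA": [100.0, 110.0, 121.0, 133.1, 146.41]},
    index=pd.date_range("2017-01-02", periods=5, freq="B"),
)

for period in (1, 2):
    # Return from day t to day t + period, stored on row t.
    forward = prices.pct_change(period).shift(-period)
    print(period, forward["AAA"].round(4).tolist())
```

The shift is what turns a backward-looking percentage change into a forward return: each row holds the return earned over the next `period` days, and the last `period` rows are NaN because their future prices are not in the data.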

Anyway, I've made a notebook studying the predictive power of the forward_earning_yield using alphalens.


@João Aparício, in the context of the "Factor Weighted Long/Short Portfolio Cumulative Return" plot the periods correspond to the rebalancing period of the simulated portfolio, so Grant's assumption is correct if he was referring to that plot. In general, though, the meaning of periods is what you described.

Thanks again -

I had been looking at https://github.com/quantopian/alphalens where only one plot is shown for "Factor Weighted Long/Short Portfolio Cumulative Return".

So, I gather that one piece of info provided by alphalens is to get some sense if the factor will fly for trading at 1-, 5-, & 10-day frequencies. However, it sounds like for this to make sense, the factor should be formulated to set the portfolio weights directly in the first place. It is a more restrictive definition than I'd been considering, but it seems to be inherent in the Q framework.

However, it sounds like for this to make sense, the factor should be
formulated to set the portfolio weights directly in the first place.

I am not sure I follow what you mean.

@Luca that is not clear to me...

You said

In that particular plot the simulated portfolio is not based on quantiles but each stock forward return is weighted by the stock factor value and the weight is like this: ( ... )

The function that you link to, named factor_returns calculates the weights in the following way:

    weights = factor_data.groupby(grouper)['factor'] \
        .apply(to_weights, long_short)


That means that the weights will be different for each day! Try it out. Compute a factor_data DataFrame and do the steps inside factor_returns one by one.

In other words, if you ask alphalens for, e.g., periods=(5,10), it is NOT true that alphalens will internally calculate two weight arrays, one with a rebalancing frequency of 5 days and the other with a rebalancing frequency of 10. Instead, it calculates one weights array which changes DAILY, and then sums the returns across each asset given by those weights. Periods aren't rebalancing frequencies. They're the time horizons over which you want to test for predictive power. If anything, this should be called daily rebalancing.
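Here is a sketch of the calculation as I'm describing it, with made-up prices and factor values in wide (date by ticker) form and horizons of 1 and 2 days to keep it short. It only illustrates the "one daily weights table, several horizons" reading; it says nothing about how the cumulative plot compounds multi-day returns.

```python
import pandas as pd

dates = pd.date_range("2017-01-02", periods=6, freq="B")
# Hypothetical prices and daily factor values for two tickers.
prices = pd.DataFrame(
    {"AAA": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
     "BBB": [20.0, 20.0, 19.0, 19.0, 18.0, 18.0]},
    index=dates,
)
factor = pd.DataFrame(
    {"AAA": [2.0, 1.0, -3.0, 1.0, 2.0, 1.0],
     "BBB": [0.0, -1.0, 1.0, 0.0, 3.0, -1.0]},
    index=dates,
)

# One weights table, recomputed every day from that day's factor values.
demeaned = factor.sub(factor.mean(axis=1), axis=0)
weights = demeaned.div(demeaned.abs().sum(axis=1), axis=0)

# The same daily weights are applied to forward returns of each horizon.
portfolio = {}
for period in (1, 2):
    forward = prices.pct_change(period).shift(-period)
    portfolio[period] = (weights * forward).sum(axis=1, min_count=1)

print(pd.DataFrame(portfolio))
```

The weights table has one row per day regardless of the horizon; only the forward returns it is multiplied by change with the period.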

Am I missing something here? (I could link the updated notebook, but I'm afraid of spamming the thread even more. Do you want it? I could be wrong, I'm genuinely trying to understand this. But from the internal calculations I can't find the rebalancing you claim.)

@João Aparício I agree with you that periods are the time horizons over which you want to test for predictive power. Still, solely in the context of the "Factor Weighted Long/Short Portfolio Cumulative Return" plot, they can be thought of intuitively as rebalancing periods. How factor values that are computed daily can be used in an algorithm that rebalances every 5 or 10 days is explained here: How to interpret the cumulative return for N forward period in cumulative factor return plots?. It is also briefly explained in plot_cumulative_returns, the function that takes the daily factor values as input and plots the cumulative returns.

@Luca In fact, it makes more sense that it is the way I'm describing. You have one factor per day (and per stock). If (for example) periods=(5,10) what alphalens is telling you are the returns on horizons (5,10) if you bought and sold every day according to the factor of that day, NOT the returns that you'd have if you bought and sold with frequencies (5,10). It's telling you the returns of ONE portfolio (rebalanced daily) on different horizons, not the returns of portfolios rebalanced with different frequencies.

@Luca thanks for the link. I have to go now, I'll come back to you with more thoughts after I've read it :-)

To be specific, on https://github.com/quantopian/alphalens/blob/master/alphalens/examples/predictive_vs_non-predictive_factor.ipynb, we have the plots:

"Factor Weighted Long/Short Portfolio Cumulative Return (1 Fwd Period)"

"Factor Weighted Long/Short Portfolio Cumulative Return (5 Fwd Period)"

"Factor Weighted Long/Short Portfolio Cumulative Return (10 Fwd Period)"

Intuitively, I'd think that these would represent portfolio updates at 1-, 5-, and 10-day frequencies, using the demeaned and normalized factor values as portfolio weights (1.0 gross leverage constraint). Is this correct?

This would seem to be the simplest thing done by alphalens--a back-of-the-envelope trading simulation. Once I understand it, then I'll move on to more complicated slicing and dicing.

Hey all,

Max made a lecture that goes over how to interpret the output of the plots. https://www.quantopian.com/lectures/factor-analysis

In general I think about it this way:

Thinking leads to a hypothesis. The hypothesis leads to a model. The model is tested in alphalens to see if it forecasts future prices; this is done primarily by checking whether the model's weights are linearly correlated with future returns. If yes, the model's weights are fed into optimize, which takes the alpha and applies risk constraints. The alpha will ideally survive constraining without losing all signal, but sometimes it won't. The full process is available for newer folks here:
https://www.quantopian.com/lectures/example-long-short-equity-algorithm
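One common way to put a number on "are the factor values correlated with future returns" is a per-day rank correlation, in the spirit of the Spearman test mentioned earlier in the thread. A minimal sketch with made-up data follows; the column names are hypothetical, and the rank-then-Pearson shortcut is used so that only pandas is needed.

```python
import pandas as pd

# Hypothetical factor values and 1-day forward returns, two days x four tickers.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2017-01-02", "2017-01-03"]), ["AAA", "BBB", "CCC", "DDD"]],
    names=["date", "asset"],
)
data = pd.DataFrame(
    {"factor":     [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0],
     "forward_1d": [-0.01, 0.00, 0.01, 0.05, 0.02, 0.01, -0.01, 0.03]},
    index=index,
)

def rank_correlation(day):
    """Spearman correlation: Pearson correlation of the ranks."""
    return day["factor"].rank().corr(day["forward_1d"].rank())

# One correlation per date, computed on that day's cross-section only.
ic = pd.Series(
    {date: rank_correlation(day) for date, day in data.groupby(level="date")}
)
print(ic)
```

On the first day the factor orders the stocks exactly as their forward returns do; on the second day it mostly gets the order wrong, so the correlation is slightly negative.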


@Grant

Intuitively, I'd think that these would represent portfolio updates at
1-, 5-, and 10-day frequencies, using the demeaned and normalized
factor values as portfolio weights (1.0 gross leverage constraint). Is
this correct?

Yes and the details on how the updates are done at frequencies higher than 1 are in the links I provided earlier.

Luca -

Regarding:

However, it sounds like for this to make sense, the factor should be
formulated to set the portfolio weights directly in the first place.

I'm just saying that for the alphalens alpha & beta values to be representative, then a Q alpha factor simply needs to be the portfolio weights point-in-time. If it is not, then the alphalens prediction and the algo could be quite different.

Keep in mind that alpha is just the leftover returns when beta is removed. I wouldn't really use alphalens to determine the alpha and beta of your factor. Alphalens determines whether a hypothetical alpha has any returns, period. Once you've determined it does, you go to the next step of unbraiding what is alpha and what is beta. That is accomplished using more than just the market as a risk factor. We're working on tooling to enable people to do this better.

@Grant

I'm just saying that for the alphalens alpha & beta values to be
representative, then a Q alpha factor simply needs to be the portfolio
weights point-in-time. If it is not, then the alphalens prediction and
the algo could be quite different.

Actually before computing alpha and beta Alphalens demeans and normalizes the raw factor in the same way as it does when computing "Factor Weighted Long/Short Portfolio Cumulative Return". So the alpha and beta are relative to that particular method of demeaning and normalizing and if the algorithm does it differently it might get slightly different results.

Hi Delaney -

I don't understand:

I wouldn't really use alphalens to determine the alpha and beta of your factor

I'd expect that if I demean and normalize a factor, and then use it to set the portfolio weights in an algo, I'd get an alpha & beta similar to what I'd get using alphalens. Or am I misinterpreting? If this is not the case, how should one interpret the alpha & beta provided by alphalens?

I don't have time, but if someone wanted to give this a go, it would be interesting to see the comparison between alphalens and an algo, for a single factor.

Do you guys know if you can run Alphalens on infrequent data i.e. Economic Data released weekly or monthly? I have posted about this on the thread below. Sorry I am not technical enough to be able to decipher the input format & 'how it works' outside of the standard equity fundamentals.
https://www.quantopian.com/posts/help-with-alphalens-on-infrequent-data