Analyzing Alphalens Results

Here's a notebook I whipped up to practice getting accustomed to the system. The goal was to get the PEG ratio of the SP500 stocks, then rank them within each sector, normalized so the top rankings in each sector were roughly equally desirable. The final result is that the best ranked stocks have the lowest PEG ratio in their sector.

I'm running into some issues in understanding the alphalens results (the last cell). I watched this video (https://www.youtube.com/watch?v=v5IYcBxMDYE) to help me understand as much as I could, but some of the things I still have questions on are below.

1. Where it says "mean period wise return" in the "returns analysis" section, does that indicate that if I held the top quantile of stocks for the given number of days, I would get that level of returns (in basis points)? Eg I would average -1.618 basis points per 10 days for the top quantile?
2. Do the lines on the violin plots indicate the 25th, 50th, and 75th percentile of the spreads of returns?
3. In the "Top Minus Bottom Quantile Mean Returns" graphs, what is the shaded blue area in which the lines lie?
4. I'm overall really lost about the Q-Q graphs lol. How do I read these?

And lastly, nothing to do with alphalens; can anyone give an economic reason for why my returns seem better for middle ranked stocks, and then drop off on either side? I wouldn't have thought stocks ranked by PEG ratio would behave in this manner, and I've double checked my rankings to make sure they worked as intended.

16 responses

Hi Ma,

1. You're right -- mean period-wise return for a particular quantile refers to the average return of stocks in that quantile, broken down by period.
2. Right again -- the dashed lines on the violin plots indicate the 25th, 50th, and 75th percentile values (respectively, from bottom to top).
3. The shaded blue area illustrates an error band around the mean top minus bottom quantile returns. By default, the error band is one standard deviation above/below the mean.
4. You can read more about QQ plots here. In general, we want the distribution of our information coefficient to have fat tails (since we want to long/short exceptionally high/low values). As such, we usually want our QQ plots to have an "S"-shaped curve.
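To make the "S" shape a bit more concrete, here is a minimal sketch that computes the points of a QQ plot by hand. It is not Alphalens code: the Laplace distribution is just an assumed stand-in for "something with fatter tails than a normal", and `laplace_quantile` is a hypothetical helper. The x values are the theoretical normal quantiles, the y values are the quantiles of the fat-tailed distribution at the same probabilities.

```python
import math
from statistics import NormalDist


def laplace_quantile(p, scale=1 / math.sqrt(2)):
    """Inverse CDF of a zero-mean Laplace distribution.

    The default scale gives unit variance, so it is comparable
    to a standard normal. Hypothetical helper for illustration only.
    """
    if p < 0.5:
        return scale * math.log(2 * p)
    return -scale * math.log(2 * (1 - p))


n = 1000
# Plotting positions strictly inside (0, 1).
positions = [(i + 0.5) / n for i in range(n)]

normal_q = [NormalDist().inv_cdf(p) for p in positions]  # x-axis of the QQ plot
fat_q = [laplace_quantile(p) for p in positions]         # y-axis of the QQ plot

checkpoints = [
    ("far left tail", 0),
    ("lower middle", n // 4),
    ("upper middle", 3 * n // 4),
    ("far right tail", n - 1),
]
for label, i in checkpoints:
    print(f"{label:>14}: normal={normal_q[i]:+.2f}  fat-tailed={fat_q[i]:+.2f}")
```

If the two distributions matched, every point would sit on the diagonal y = x. With fat tails the points fall below the diagonal at the far left and above it at the far right, while staying closer to zero than the normal in the middle, which is what produces the bend at both ends.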

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Just adding a note regarding question 1.

Where it says "mean period wise return" in the "returns analysis"
section, does that indicate that if I held the top quantile of stocks
for the given number of days, I would get that level of returns (in
basis points)? Eg I would average -1.618 basis points per 10 days for
the top quantile?

Almost :) The bars for different period lengths are plotted side by side so that we are able to compare them and easily spot the best performing period. Since we cannot directly compare the 1 day mean return to the 5 day (or any other period) mean return due to the cumulative returns effect, what Alphalens shows is the rate of return. The rate of return is the value the returns would have been every day if they had grown at a steady rate.

To answer your question, if I held the top quantile of stocks for the given number of days, I would get that level of returns each day. E.g. I would average -1.618 basis points per day over the 10 day holding period for the top quantile.

Thanks Lucy. Luca, you bring up a good point, but are you sure? Here's an example of a notebook where that would be ridiculous. The cumulative returns over these 8 months for the top quantile are about 1.5, i.e. a gain of about 5000 basis points. However, the "mean period wise return" says 753 basis points for the top quantile. There's no way this can be 753 basis points per day, since we're only netting 5000 basis points per 8 months.

Does it mean 753 basis points per 30 days? That gets a lot closer to 5000. Again, still confused as to what the "mean period wise return" numbers and bar graphs indicate. You're right that we can't compare 1 day returns to 5 day returns (or in this case, 30 day returns to 60 day returns) but this example shows that it can't be daily, unless I'm misunderstanding something else.


Luca you bring a good point, but are you sure?

Never!

Nevertheless I am pretty familiar with Alphalens code and what I wrote above is the intent of Alphalens (the rate of returns is computed here, printed here and plotted here).

...but bugs are always lurking around corners. Are we seeing a bug here?

However, the "mean period wise return" says 753 basis points for the
top quantile. There's no way this can be 753 basis points per day,
since we're only netting 5000 basis points per 8 months.

Let's say we have a top quantile whose daily mean returns over a period of 4 days are: -700, -700, -700 and 3000 bps.

Quantile mean return: (-700 -700 -700 + 3000) /4 = 225 bps

Expected returns after 4 days: (1.0225^4)-1 = 0.093 or 9.3%

Actual returns after 4 days: (0.93*0.93*0.93*1.3)-1 = 0.0457 or about 4.6%

This is just an example but the general rule is that you need to know the actual daily returns to compute the cumulative returns, the average daily returns information is not enough. So your reasoning doesn't say much regarding the correctness of Alphalens results.
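For anyone who wants to check the arithmetic, here is a small sketch of the same 4-day example in plain Python. Nothing here is Alphalens code and the variable names are my own. It also computes the geometric mean daily rate, which is the steady daily rate that does reproduce the actual cumulative return.

```python
import math

# The four daily mean returns from the example: -700, -700, -700, +3000 bps.
daily_returns = [-0.07, -0.07, -0.07, 0.30]
n_days = len(daily_returns)

# Arithmetic mean of the daily returns (225 bps).
arithmetic_mean = sum(daily_returns) / n_days

# What you would get if every day had returned the arithmetic mean.
naive_cumulative = (1 + arithmetic_mean) ** n_days - 1

# What you actually get by compounding the real daily returns.
actual_cumulative = math.prod(1 + r for r in daily_returns) - 1

# The steady daily rate that compounds to the actual cumulative return.
geometric_mean = (1 + actual_cumulative) ** (1 / n_days) - 1

print(f"arithmetic mean:   {arithmetic_mean * 1e4:.0f} bps per day")
print(f"naive cumulative:  {naive_cumulative:.4f}")
print(f"actual cumulative: {actual_cumulative:.4f}")
print(f"geometric mean:    {geometric_mean * 1e4:.0f} bps per day")
```

The gap between the naive and the actual cumulative figure comes entirely from replacing the real daily values with their arithmetic average.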

Anyway, if you find a bug please report it to the github project page so that it gets fixed.

EDIT: fixed after Guy's post

To be honest I still don't follow. When it says "mean period wise return" does it mean per 30 days (or whatever the period is) or does it mean per day?

If it means per day, I don't understand why my results are showing something as high as 753, it should be around 25. Even if the order of returns is totally wonky, that seems like a massive discrepancy.

And if it means per period instead, then we have to do a little more mental interpretation for the graph "Mean Period Wise Return by Factor Quantile", right? That also would be rather strange.

Thanks for your help, hoping you or someone else can still walk me through this lol.

Ah, right. I forgot another important detail that explains the mystery we are seeing. If you look at the code I linked above (the code doesn't lie) you'll see that Alphalens uses the shortest period length as the base period over which to compute the rate of return. This is in line with the idea of allowing a comparison of the performance of different period lengths. In your NB the base period is 30 days, so what we are actually seeing is the 30 day rate of return.

Try adding 1 day to the periods list and you'll see everything converted to a 1 day rate of return, and the results are in line with expectations... at last!
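As a rough sanity check of the numbers in this thread, here is a sketch of the conversion idea. `to_base_period_rate` is a hypothetical helper written for this post, not the actual Alphalens function, and it assumes the conversion is plain compounding at a steady rate.

```python
def to_base_period_rate(period_return, period_len, base_period):
    """Convert a return earned over `period_len` days into the equivalent
    steady rate per `base_period` days.

    Hypothetical helper for illustration, not Alphalens code.
    """
    return (1 + period_return) ** (base_period / period_len) - 1


# The 753 bps figure from the notebook, read as a 30 day rate of return.
rate_30d = 0.0753

# The same growth expressed as a steady 1 day rate.
rate_1d = to_base_period_rate(rate_30d, period_len=30, base_period=1)
print(f"30 day rate: {rate_30d * 1e4:.0f} bps, 1 day rate: {rate_1d * 1e4:.1f} bps")

# A made-up 60 day return of 15%, expressed per 30 day base period
# so it can sit next to the 30 day bar on the same chart.
rate_60d_as_30d = to_base_period_rate(0.15, period_len=60, base_period=30)
print(f"60 day return of 15% as a 30 day rate: {rate_60d_as_30d * 1e4:.0f} bps")
```

Read this way, 753 bps per 30 days works out to a bit under 25 bps per day, which is the order of magnitude Ma was expecting.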


@Luca, looking at your post: https://www.quantopian.com/posts/analyzing-alphalens-results#5b4920eb402af7003f3aaa57

There is a difference when looking at returns using points as opposed to percentages. In the first case, you have a linear equation while in the second you are compounding returns.

To make this more evident, double the series as thus:

average basis points: (-700 -700 -700 +3000 -700 -700 -700 +3000)/8 = 225 bps

cumulative return: (0.93*0.93*0.93*1.30*0.93*0.93*0.93*1.30) - 1 = 0.0934

The cumulative return more than doubled due to compounding.

When you look at compounded returns, it does not matter in which order the returns are taken. Switch the numbers around in the cumulative return series and you will get the same answer.

@Guy -

Thank you, and of course you are right, everybody knows that multiplication is commutative. I don't know why I wrote the sentence "the general rule is that the order in which the returns happen matters", because that was also irrelevant to my point.

The point is that you cannot guess the final value of the cumulative returns knowing the average daily returns. Let's apply my reasoning to your series:

"Expected" (not the statistical meaning) cumulative returns after 8 days : (1.0225^8)-1 = 0.1948 or 19.48%

Actual returns : (0.93*0.93*0.93*1.30*0.93*0.93*0.93*1.30) -1 = 0.0934 or 9.34%

Again, you cannot guess the actual cumulative returns from the average daily returns. It is not the order that matters but the actual values of the daily returns that matter.
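Both halves of this exchange can be checked in a few lines. The sketch below is my own illustration (the `cumulative` helper and the second series are made up): reordering the same daily returns never changes the compounded result, while two series with the same average daily return can compound to different results.

```python
import itertools
import math


def cumulative(returns):
    """Compounded return of a sequence of simple daily returns."""
    return math.prod(1 + r for r in returns) - 1


series_a = [-0.07, -0.07, -0.07, 0.30]   # the series from the example
series_b = [0.0225, 0.0225, 0.0225, 0.0225]  # made-up series, same average

# 1) Order does not matter: every permutation compounds to the same value.
base = cumulative(series_a)
order_is_irrelevant = all(
    math.isclose(cumulative(p), base)
    for p in itertools.permutations(series_a)
)

# 2) The values do matter: same arithmetic mean, different cumulative return.
mean_a = sum(series_a) / len(series_a)
mean_b = sum(series_b) / len(series_b)

print("order is irrelevant:", order_is_irrelevant)
print(f"mean A = {mean_a:.4f}, cumulative A = {cumulative(series_a):.4f}")
print(f"mean B = {mean_b:.4f}, cumulative B = {cumulative(series_b):.4f}")
```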

Do you agree?

Luca,

I think I understand it now! So the mean period-wise return gives the mean return in terms of whatever the base period is, and the bar graphs also represent that mean return over the base period. So in the example I gave, it was the return per 30 days. In the notebook you modified, it was the return per 1 day.

Thanks so much for your help.

Exactly! I am happy we figured out what was happening ;)

@Luca, agree on the first part. However, on your last point, not so much.

What you are looking at is the difference between a linear representation of return, (1+r∙t), and its compounding counterpart, (1+r)^t. The two are equal at t = 1 and in the trivial case t = 0. For 0 < t < 1, the linear side (1+r∙t) gives the higher value. For t > 1, compounding wins and will always give a greater number than (1+r∙t).

If the values are representative of what is being done, then both versions could be used to extrapolate. For t > 1, you will always have (1+r)^t > (1+r∙t). However, you can also find a point where the following holds: (1+r_a)^t = (1+r_b∙t), where r_b has to increase exponentially to catch up with (1+r_a)^t.
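A quick numerical check of the linear versus compounding comparison, as a sketch with a made-up rate of 2% per unit of time (the function names are mine):

```python
import math


def linear_growth(r, t):
    """Growth factor under simple (non-compounding) returns: 1 + r*t."""
    return 1 + r * t


def compound_growth(r, t):
    """Growth factor under compounding: (1 + r)^t."""
    return (1 + r) ** t


r = 0.02  # hypothetical rate per unit of time
for t in (0, 0.5, 1, 2, 10):
    lin = linear_growth(r, t)
    comp = compound_growth(r, t)
    print(f"t={t:>4}: linear={lin:.6f}  compound={comp:.6f}  diff={comp - lin:+.6f}")
```

The difference is zero at t = 0 and t = 1, negative in between, and positive and growing for t > 1.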

The important point might be: are these 4 values representative of what is being analyzed? If yes, then you can make “guesses” or estimates of what might come your way. It might be better to have a ballpark figure than nothing at all, on the condition, evidently, that the numbers are in fact representative.

Note that a sample of 4 is not hard evidence and could not be considered as a satisfactory and statistically significant sample of the return population. To make even a “reasonable” estimate, you would need a lot more data than that.

A strategy's payoff matrix has the equation: F(t) = F(0) + Σ(H∙ΔP) - Σ(Exp.) = n∙x_bar = n∙u∙PT% where PT% is the average percent profit per trading unit. The equation would tend to accept an estimate based on a large n where the law of large numbers would tend to apply.

@Guy - However you put it, you still cannot claim that given an average daily return value you can infer the cumulative return value. So I simply didn't want to dig into the details of the NB without having a sound proof of the existence of a bug. My policy is: first show me the bug, then I might fix it but don't make me do your homework ;)

@Luca, I have no homework to do.

Another question came up if someone could answer. In the attached notebook, in the tearsheet in the very last cell I'm getting a little confused by the graphs. The bar graph shows that I have negative mean returns for my highest quantile, yet the violin chart shows I have positive mean returns for my highest quantile (if I'm correct in assuming that the middle line indicates the mean).

Am I reading these wrong, or is this contradictory info? The rest of the tearsheet indicates that the bar graph is correct, but I'd still like to know why the violin chart is showing a positive mean.
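One possible reading, going by the earlier answer in this thread that the middle dashed line on the violin plot is the 50th percentile (the median) rather than the mean: a skewed set of returns can have a negative mean and a positive median at the same time. The sketch below uses made-up numbers, not data from the notebook.

```python
from statistics import mean, quantiles

# Hypothetical per-stock returns (in bps) for one quantile:
# mostly small gains plus a couple of large losses.
returns_bps = [5, 8, 10, 12, 15, 18, 20, -150, -200]

# What a mean bar chart would summarize.
mean_bps = mean(returns_bps)

# The three cut points a quartile-style violin plot would draw.
q25, q50, q75 = quantiles(returns_bps, n=4)

print(f"mean:   {mean_bps:.1f} bps")
print(f"25th:   {q25:.1f} bps")
print(f"median: {q50:.1f} bps")
print(f"75th:   {q75:.1f} bps")
```

Here the two large losses drag the mean below zero while more than half of the individual returns are still positive, so the bar and the middle violin line disagree in sign without contradicting each other.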
