Alphalens Questions Thread

Hello Folks,

TL;DR Ask questions about Alphalens here and we'll answer them in comments and/or webinars.

During a recent chat with community member Joakim, it became apparent that a similar thread to our Tearsheet Feedback Thread would help people understand Alphalens. Alphalens is currently underused by our community members, but is an incredibly powerful tool that allows you to eliminate a lot of frustration and time from the strategy development process.

Every trading strategy depends on a predictive model. Alphalens tests whether a model is predictive without getting bogged down in any of the details associated with a full backtest. Our Getting Started Tutorial and Factor Analysis with Alphalens Lecture are both resources you can use to learn more about Alphalens on your own.

If you develop a strategy in the backtester, you're going to spend a ton of time debugging the algorithm, making sure the portfolio optimizer works well, and running long backtests. With Alphalens, you can quickly test whether your model is predictive. Most ideas fail at this testing stage, so with Alphalens you can fail faster and iterate, rather than spending far more time testing each idea. Max talks about this workflow in this video. If you want a deeper dive, I go through the whole process of turning an idea into an algorithm in this webinar.

Once you've determined your model (or models) is predictive, you can drop it into our template to further test whether it survives real market conditions like slippage and transaction costs.

Please leave your questions on Alphalens here so that one of us can respond via a comment and/or a recorded webinar or video. We'll run webinars or record videos whenever we have enough questions to go through, up to a reasonable total number. We're not sure what response we'll get here, so please bear with us as we adapt to the level of interest. If you'd like, you can also attach the specific Alphalens output or tearsheet that you'd like help with.

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.


Thanks for this!

I do have a lot of questions around Alphalens and its associated statistics when researching alpha. Hopefully I'm not the only one. Most of my questions are related to the 'Information Analysis' table (see top of second to last cell output in the attached Alphalens Template).

  1. Does a positive IC skew mean that there's more predictive information in the top quantile (for longs) and vice versa for negative skew? If not, what does the IC skew tell me?
  2. If I set the 'null hypothesis' threshold to 5%, and if the p-value is below 5% for the 5 day holding period but above 5% for 1 day (or vice versa), do I accept or reject the 'null hypothesis?' What 'bias' am I vulnerable to if I 'cherrypick' the holding period based on what the IC p-value tells me?
  3. Does choosing a longer test period reduce the likelihood of getting a false-negative p-value? (maybe I should be more concerned with false-positives?) What's a good balance? Is one year + one month of future returns sufficient?
  4. If I still 'believe' in a factor, but p-value > 0.05, is it wrong / bad practice to test a different time series to see if it returns a p-value < 0.05? Essentially p-value hunting to 'fit' my hypothesis (did I just answer my own question there?). It's possible that a factor has predictive power during some time periods, and not during others though, correct? (non-stationarity (?))
  5. Similar to above, for 'false-positive' p-values (below 0.05) and IC, in general, is it a good idea to re-test a 'good' factor (p-value of <0.05 and high IC) during a different time period (cross-validation(?)), to ensure it wasn't just a 'false-positive' during the first test? If so, by how much does a second test reduce the likelihood of a 'false-positive' if both tests have p-value of <0.05 and high-ish IC?
  6. If a 'combined_factor' has some individual factors with p_values above 0.05 and/or negative IC during the test period, but when removing them from the 'combined_factor' results in significantly lower returns in backtests (i.e. fitted to market noise), what should I do with the 'bad' factors? Should I try to see if any combination of non-predictive factors, when combined have a p-value of < 0.05? Or just remove them altogether?
  7. Does a higher IC Kurtosis mean that there are more extreme outliers and therefore a stronger case for 'winsorizing' the factor? My factors tend to have fairly low IC Kurtosis values (well below 3, which I believe is the value for a normal distribution (?)). Is this good or bad?
  8. IC Std. - the lower the better in relation to the IC Mean?? What does IC Std. actually tell me?
  9. Risk-adjusted IC - is this 'volatility' adjusted IC? If so, is it essentially: IC / IC Std.?
  10. T-stat(IC) - what does this tell me, in relation to alpha research? Higher the better?
  11. In this Q Short video on IC and P-values, Delaney mentions at around 3:15 in the video that there's currently a lot of debate whether p-value analysis is meaningful or not. Could you expand a bit on this please? Is there a school of thought that argues that p-values should be seen as 'relative' (i.e. the lower the better) rather than binary?
  12. In the graph, IC Observed Quantile (y-axis) to Normal Distribution Quantile (x-axis), does one want to see the bottom left tail of the S-shaped plot to be above the divider line, and the upper right S-tail to be below the line? Or does it not matter as long as it's a clear S-shape?
  13. Does an IC value of 0.1 mean that the factor is predictive 10% of the time, and the other 90% of the time it's just random noise / coin-flipping? And does an IC value of 1.0 mean the factor is predictive 100% of the time?

Sorry for the many (basic) stats questions but I'd like to understand all of this better in relation to alpha research in Alphalens.

Here's the Alphalens notebook template...


Sorry, a few more:

'14. How can I check the correlation of two different alpha factors?

'15. If I combine two uncorrelated alpha factors that I've found, is there any need to run the combined factor through Alphalens (over a different training period?)?

'16. By combining two uncorrelated (and seemingly conflicting) alpha factors, for example one Momentum factor and one ST Reversal factor, wouldn't you lose alpha by combining them?

'17. Taking the above example again, one Momentum Factor and one ST Reversal factor combined into one. The 'Strategic Intent' for the Momentum factor is the Newtonian idea that 'stocks in motion tend to stay in motion' whereas the 'Strategic Intent' for the ST Reversal factor is essentially the opposite (Mr. Market overreacts in the short-term). Is it ok to have a separate Economic Rationale for each factor, or is it not a good idea to combine two factors that have conflicting rationale (as they might be negatively correlated (?))?

Hi Joakim,

An effective way to justify the use of factors is to support it with published research or working papers.

For example: "Following Jegadeesh and Titman (1993), I include a momentum risk factor."

Returns to Buying Winners and Selling Losers

Similarly, when it comes to justifying factor diversification: "Fisher, Shah, and Titman (2017) illustrate the benefits from combining value and momentum in the same portfolio."

Combining value and momentum

Thanks @OC!

Will be reading that second paper you linked to.

All,

I also just wanted to mention that I've found many of the lectures very helpful in explaining a lot of this (though I still have those above questions), especially 'P-hacking and Multiple Comparisons Bias' and 'Spearman Rank Correlation' lectures.

I wish there were videos available for the 'Hypothesis Testing' and the 'Confidence Intervals' lectures as well, though the Notebooks and Exercises are very helpful too.

It would seem that for Alphalens to be helpful, the risk model would need to be incorporated. For example, say I decide to cook up a factor, and it just happens to be similar to one of these:

from quantopian.pipeline.experimental import Momentum, ShortTermReversal, Size, Value, Volatility  

I could spend a lot of time analyzing and perfecting my factor, only to have a very unpleasant surprise when I try to use it in an algo for the contest/fund.

Any thoughts on how to address this problem?

Would there be any way to roll up all of the information spit out by Alphalens into a single figure of merit? This would allow it to be used for screening a large number of factors (e.g. factors such as the ones described here: https://www.quantopian.com/posts/the-101-alphas-project).

Thanks a ton for all the questions, I'm working internally to set up a time to get them answered. Been a bit busy with the recent news:

https://www.quantopian.com/posts/quantopian-funds-over-50-dollars-million-to-a-single-strategy-155-dollars-million-overall

Great news from Fawce! Congratulations to everyone!

A few more Alphalens and alpha research related questions from me. Sorry for the multiple posts. I hope it's ok to just post questions here for now as they come up?

'18. If I get a p-value below my cutoff (say < 0.05) and a reasonably positive IC and risk adjusted IC (say 0.01 and 0.1 respectively), what might be the reason for getting negative Annual Alpha numbers? What's a recommended action to take if IC is positive but annual alpha is negative?

'19. Is there a way to 'hide' a cell in an Alphalens Jupyter notebook, e.g. the cell containing the 'alpha factor'? People might be more comfortable sharing Alphalens notebooks if there's a way to not disclose the actual alpha factor.

'20. Am I at risk of overfitting if I 'tweak' an alpha factor (or its parameters) to either get the p-value below my cutoff, to maximize IC (or volatility adjusted IC), or to maximize Annual Alpha?

'21. Is there a recommended number of time series to cross-validate against (to minimize the risk of false positives) without risking 'multiple comparisons' bias?

Joakim,

To answer a few of your questions on the IC:

'9. Risk-adjusted IC = (Mean of the daily ICs / Standard Deviation of daily ICs)

'10. The t-stat is inversely related to the p-value: the higher the t-stat, the lower the p-value (assuming constant degrees of freedom).

'14. There is currently a feature in development on Github to measure correlations and interactions of factors in alphalens (although the development has been going slowly as I haven't had time to work on it recently). That said, you can simply use numpy's or pandas' built-in correlation function to calculate the correlation either between the factor values themselves or the returns of the factors (e.g. maybe you set the factor return = highest quantile - lowest quantile. Put these in two columns and calculate the correlation). You can also do a rolling correlation as well. Additionally, for comparing multiple factors you can run a multi-variate regression between them to evaluate the strength of any linear relation.
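If it helps, here is a minimal sketch of the pandas approach described above. It assumes you have already built two Alphalens factor_data frames with get_clean_factor_and_forward_returns; the names factor_data_a and factor_data_b are hypothetical placeholders.

import pandas as pd

def quantile_spread_returns(factor_data, period='1D'):
    # Top-minus-bottom quantile mean forward return per day for one factor.
    # Assumes factor_data comes from alphalens.utils.get_clean_factor_and_forward_returns.
    by_day_and_quantile = factor_data.groupby(
        [factor_data.index.get_level_values('date'), 'factor_quantile']
    )[period].mean().unstack('factor_quantile')
    top = by_day_and_quantile.columns.max()
    bottom = by_day_and_quantile.columns.min()
    return by_day_and_quantile[top] - by_day_and_quantile[bottom]

spread_a = quantile_spread_returns(factor_data_a)  # hypothetical factor_data frames
spread_b = quantile_spread_returns(factor_data_b)
print(spread_a.corr(spread_b))              # overall correlation of the factor 'returns'
print(spread_a.rolling(63).corr(spread_b))  # rolling ~quarterly correlation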

I noticed a lot of your questions focused on the p-value of the information coefficient. I wouldn't give much weight to the alphalens output of this value for periods greater than 1 day. If you increase the return period to greater than a day, you run into the "overlapping data problem". Essentially, what this means is that your observations of the IC are no longer independent (which is an assumption of the t-test that is used in alphalens). Therefore, you are going to overestimate the confidence level that you have in your result (i.e. you will get an artificially low p-value). The ICs are not independent because if you calculate the IC of your predictions/weights against, say, a 5-period return at time t=0, then at time t=1 you are going to be using 4 of the same time periods that were used in the prior calculation (this is why it is called the "overlapping" data problem). I would submit a fix for this, but I would need to research the best way to solve it. If anyone reading this has a suggestion, please submit it to Github!

Also, if you weren't aware already, you can take a look at how a lot of these things are calculated in the tearsheet by looking at the source code on the alphalens Github (as long as you don't mind digging through the code a little bit).

Hi Michael,

Thank you - very much appreciated! Are you one of the developers of Alphalens?

Regarding the p-value of IC and the 'overlapping data problem', thanks for making me aware of this issue. I should have made it more clear that most of my questions are for 1-day holding periods (except for Q2. essentially I believe).

For Q9, would you say that the Risk-adjusted IC value is more important than the Mean IC? In other words, should it be significantly higher than the Mean IC?

For Q10. do you know what a reasonably high t-value is for a p-value cutoff of 0.05?

Thanks again for your help!

Joakim,

For Q10: usually a t-value larger than 2 corresponds to a p-value < 0.05, and a t-value less than -2 to a negative relationship (i.e. negative correlation). T-values are easier to read when p-values get very small.

For Q16: you want uncorrelated alphas as long as both alphas are positive. For example, 12-month momentum and 1-month mean reversion (negative 1-month momentum) sometimes both have positive alpha. The holy grail is finding negatively correlated signals that both have positive alpha, which is very hard to find in the stock market. An example might be the S&P 500 and US Treasuries in times of market crashes.

Hi @IM,

Thank you! I appreciate you taking the time answering these for me!

Q10. Thanks! Makes sense and is in line with what I've seen in my research.

Q16. Also makes sense, thank you! When you say 'positive alpha' do you mean 'positive IC' or 'positive IC' in combination with 'positive Annual Alpha' (both assuming p-value(IC) is below the cutoff), or something completely different?

Thanks!!

The risk-adjusted IC is very similar to the t-stat. The t-stat is simply the risk-adjusted IC times the square root of the number of samples. You are trying to get a sense of how likely it is that the "true population IC" is different from 0. So, if you want to simplify things, use the t-stat alongside the mean IC.

As Indigo Monkey said, a t-stat greater than 2 will yield p-values less than 0.05 for large sample sizes. For smaller sample sizes, you may need a somewhat larger t-stat to get a p-value less than 0.05.
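As a quick sanity check of that relationship, here is a small sketch. The daily IC series below is simulated for illustration; in practice you would pull it from alphalens.performance.factor_information_coefficient(factor_data).

import numpy as np
from scipy import stats

daily_ic = np.random.normal(0.02, 0.10, size=252)  # stand-in for a year of daily 1D ICs

mean_ic = daily_ic.mean()
risk_adjusted_ic = mean_ic / daily_ic.std(ddof=1)
t_stat_by_hand = risk_adjusted_ic * np.sqrt(len(daily_ic))

t_stat, p_value = stats.ttest_1samp(daily_ic, 0.0)  # one-sample t-test against a mean of 0
print(t_stat_by_hand, t_stat, p_value)              # the two t-stats should match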

Regarding your question as to whether I am a developer on the alphalens project, I have made a few small contributions to the project, but they are relatively insignificant compared to the work others have done on the project.

Is there any way to see cumulative returns for a long-only portfolio? The alphalens.tears.create_full_tear_sheet(factor_data) shows me factor-weighted long/short returns with detailed axes. It also shows quantile-wise returns, but often there is only one tick mark on the vertical axis (if at all).

I'd like to see returns by quantile, but with more information than the current graph produces. Here's a picture of what it shows right now, I want to actually see the numbers on the vertical axis: https://imgur.com/a/YszS7yh
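In the meantime, something like the sketch below might pull the underlying numbers out directly (assuming the usual factor_data frame from get_clean_factor_and_forward_returns; demeaned=False should give a long-only rather than demeaned view):

import alphalens as al

mean_ret_by_q, _ = al.performance.mean_return_by_quantile(
    factor_data, by_date=True, demeaned=False)
cum_by_quantile = (1 + mean_ret_by_q['1D'].unstack(level='factor_quantile')).cumprod()
print(cum_by_quantile.tail())   # the actual numbers instead of squinting at the axis
cum_by_quantile.plot(title='Cumulative 1D return by factor quantile')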

Hi,

I'm new to Quantopian and have used Alphalens, although I think it is of no use. For instance, how do you explain a backtest showing negative or small positive returns while the Alphalens alphas are all positive, with values ranging from 0.042 to 0.069, and the ICs above 0.01 and significant? How do you choose the number of quantiles and the number of periods to use? And if Alphalens can only predict forward returns of 1, 5 and 10 days, how can this be useful in the long run?

Thanks

Please keep posting questions as they come up, I've started gathering questions and will go through and prepare answers as fast as I can. It's going to be first come first served for now.

Joakim

To answer your follow-up question: I think a positive IC and a positive alpha are both important and are related to each other when building a factor model. A high risk-adjusted IC seems to be more important when you need to describe a large group of stocks. With a smaller group of stocks, you mostly care about the upper and lower quantiles, so IC becomes less important than the alpha (spread) between the upper and lower quantiles. But both are important in either case.

Hi Delaney -

I suggest rather than providing general feedback as a series of posts to this thread, incorporate it into your original post by editing it (see Jamie's example here).

For the framework y'all have in mind, I get the impression that each Pipeline factor should be run on the entire QTradableStocksUS universe, however, one could also pick a different universe for each factor, which makes the problem considerably more complex. It would seem that for certain types of factors, simply running over the entire universe would be fine, but for others, restricting the universe would be important (at this point, this is intuitive on my part...I'll try to explain after some mental percolation). For the latter, how would one use Alphalens to figure out how to restrict the universe to the ideal one for a given factor (in an optimal fashion, versus just fiddling with the universe definition manually)?

A suggestion for illustrative purposes would be to consider how to leverage your new Self-Serve Data API to synthesize data and factors with controllable characteristics. This way, we aren't always dealing with sketchy, real-world factors, that don't do diddly, and nobody cares about, and lead to confusion, etc. Just synthesize the factors with "knobs" so that you can illustrate specific points, without all of the noise of the real-world.

For instance, how do you explain a backtest showing negative or small positive returns while the Alphalens alphas are all positive, with values ranging from 0.042 to 0.069, and the ICs above 0.01 and significant?

Most likely, because of turnover costs. One trick you can use to assess the costs is to set the slippage to zero in the cost model.
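For what it's worth, a sketch of how that zero-cost check might look in the algorithm's initialize() (these are the standard Quantopian slippage and commission models, available without imports in the IDE):

def initialize(context):
    # Turn off trading costs to see how much of the Alphalens/backtest gap
    # is explained by slippage and commissions (for diagnosis only).
    set_slippage(slippage.FixedSlippage(spread=0.0))
    set_commission(commission.PerShare(cost=0.0, min_trade_cost=0.0))
    # ... rest of the algorithm setup unchanged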

Hi,
This is a duplicate question from another thread I made but I took advice from a Cream Mongoose who told me to post it here.
Sorry for the likely very basic question as I am new to this. Nevertheless...

I am using Quantopian to make my first algorithm and have been stuck on doing some factor analysis for the past few days. In the lecture on fundamental factor models, they show how to estimate the risk premiums for your factors, using some code for regressions on returns.
I understand that the estimates for risk premia are the additional reward in your algorithm you get for exposing yourself to that fundamental factor. However, when analysing a basic 6-month momentum factor I get a negative risk premium coefficient using the code in the lecture. Despite this, when analysing the factor individually using alphalens over the same time frame, it clearly has some positive predictive power of returns (the 5th and 1st quantiles of equities have clear positive (0.59) and negative (-0.16) mean returns).

Why is this?
and...
Should you t-test the risk premiums to decide if factors are statistically significant, or the coefficients from the initial regression of your factors against returns?

Thanks for the help.

@ Joakim (or anyone who can answer) -

Above, you post "the Alphalens notebook template" but where did you get it? Did you grab it from https://www.quantopian.com/lectures/factor-analysis-with-alphalens and strip out all of the comments? Or is there a separate template somewhere?

Also, I'd like to run as far back as possible. I recall that one has to do something special in the research platform, so that Pipeline chunks data (in the backtester, it is done automatically). Does anyone know how to do it?

How far back can that OLMAR mean reversion factor run in Alphalens, Grant?

Hi Grant,

Yeah, basically a lot of cutting and pasting from the notebook in that lecture to get something close to what I wanted. You or someone else might come up with a better one. If I come up with a better one I’ll share it in the forums.

Basically I just want the table with the IC and p-values as a start. If that looks good then the Ann. Alpha with quantile spread mean returns etc. If that also looks good then all the graphs.

Think I can probably figure this out from the docs, just need to spend some time with it.

Hi Karl -

For folks unfamiliar with the "OLMAR" reference, see this paper:

https://icml.cc/2012/papers/168.pdf

Note that it is long-only.

Note also that if you write a Pipeline factor, it may not run optimally, since you'll only have access to the last closing price. I would write a factor using minutely data (smoothed), and run it through Alphalens (which is possible, as I understand...perhaps Delaney can provide an example/template).

Since it is based on price only, it can run as far back as the Q price data will support (~2002, I think).

Thanks Joakim - I guess I'll hack something together, since there appears to be no general template.

@ Delaney -

Would there be any way, either by Q or by other users, to get paid for individual Pipeline factors, in addition to compensation for the contest or the fund? My thinking is that within the framework of a multi-factor strategy, much of the value is in the factors, and the rest can kinda be reduced to practice. Basically, if I focused all of my time researching factors in Alphalens, and came up with some good ones, do you see any path for me to get paid for my efforts? Or would I need to combine them in an algo, enter the contest, wait six months, etc.? I'm thinking that out-of-the-box, without any out-of-sample data whatsoever, there ought to be a market value for raw factors. And then if there were a way, within the Q system, to have them age, as for algos, they could appreciate in value (e.g. like art, cheese, wine, and Scotch). It just seems that if Q quants were incentivized to focus on the alpha factors (the alpha1, alpha2, etc. in the flow diagram), then y'all could sort out how to do the rest (and probably do it much better than individual quants would). I recognize that this is a significant departure from your current system, but I keep getting the message that it's all about the alpha factors and their predictability, so why not reward production of good alpha factors directly? I also figure that traditional "brick-and-mortar" hedge funds approach things this way. There's incentive to individual quants for delivering specialized alpha factors (e.g. sector, style, individual company, market, alternative data, etc.), but then each alpha gets melded with the rest, into a big-honkin' portfolio. There's the problem of how to attribute which returns to which factors, but I'm sure there's some way to sort out an equitable compensation scheme.

Maybe the new Factset arrangement will facilitate this? If I come up with a nice factor, I could contact them, and sort out how it could be licensed, and then provided, for a fee, to the Quantopian Community?

@Grant,

I believe Q already has the infrastructure to do exactly what you are proposing with Self Serve Data and AlphaLens. The framework of Self Serve Data is an offshoot of what Q already uses with their Premium Dataset vendors. AlphaLens can serve as a validation mechanism of your discovered alpha. If you submit both and it passes Q's stress test then it is possible to sell your discovered alphas as a premium dataset to the general users and get compensated for your efforts. Just a thought.

@James, do you think AlphaLens can be used to discover alpha? It appears that is the intent, "as a substitute to running extensive time consuming backtests". From first impressions it appears to be a platform that can be used to validate your alpha, just like you mentioned.

@ James -

Yeah. It's a bit of a tangent for this thread, but I figure that the basic gist here is that if AlphaLens can show the goodness of factors, then perhaps that goodness could be turned directly into dollars.

@Leo,

To me, a long backtest that takes into account different market regimes and a multiyear holdout period for validation would be the preferred approach for discovering and validating alphas. While AlphaLens may appear to be a substitute for extensive backtests, it does not consider transaction costs, and therefore I prefer to look at it more as a validating mechanism for discovered alpha after running it through the extensive backtest. I always prefer to analyze alphas in a simulation as close to actual trading as possible and then verify the results with the statistics of 1-day forward returns in Alphalens.

@Grant,

Yeah, why not. If one can make money off discovered alphas it's a win-win-win for all.

@James,

Thanks. Yeah, your suggestion makes perfect sense to me to analyze/discover alpha using a process as close to actual trading as possible.

I will use AlphaLens for final validation along with holdout data.

Delaney -

Another question -

How does one know what the "default" (presumably preferred) settings are for an AlphaLens analysis? I just gave Joakim's notebook above a run, with a different factor, and got an error (see https://www.quantopian.com/posts/alphalens-maxlossexceedederror). Is there some way to set the thing up so that it runs for a typical factor (I used the risk model Momentum factor, which is nothing fancy), without barfing?

Just starting to get a feel for the AlphaLens thingy. Here, I run the style risk factors, combined:

from quantopian.pipeline.experimental import Momentum, ShortTermReversal, Size, Value, Volatility

test_factor = Momentum(mask=base_universe).zscore() + \
              ShortTermReversal(mask=base_universe).zscore() - \
              Size(mask=base_universe).zscore() + \
              Value(mask=base_universe).zscore() - \
              Volatility(mask=base_universe).zscore()

I set the sign of Size and Volatility to -1, since based on the returns, this is appropriate for the analysis period (long-term, maybe it isn't justified).

One interesting exercise would be to hear from the Q team: supposing one wanted to create the best possible algo from the result, how would one go about it? How would one translate the AlphaLens results into an actual implementation in an algo?


'22. One of my factors has the following Information and Returns Analysis. Should this factor be rebalanced daily, weekly, monthly, or at some other period? Are the increasing IC mean and risk-adjusted IC values likely due to the 'data overlapping issue' that Michael informed us of above? Or is there likely something else 'funny' going on here? Some of it looks a bit 'too good to be true' in my very limited experience with AL.

Information Analysis

                      1D       5D       10D      21D      42D      63D
IC Mean               0.015    0.029    0.035    0.034    0.044    0.054
IC Std.               0.109    0.112    0.103    0.089    0.087    0.090
Risk-Adjusted IC      0.134    0.258    0.339    0.384    0.508    0.595
t-stat(IC)            3.054    5.867    7.716    8.739    11.553   13.535
p-value(IC)           0.002    0.000    0.000    0.000    0.000    0.000
IC Skew               0.077    0.059   -0.059   -0.227    0.073    0.088
IC Kurtosis           0.029   -0.016   -0.364   -0.195   -0.437   -0.600

Returns Analysis

                                                  1D       5D       10D      21D      42D      63D
Ann. alpha                                        0.091    0.089    0.075    0.043    0.029    0.038
beta                                              0.083    0.017    0.041    0.051    0.018   -0.051
Mean Period Wise Return Top Quantile (bps)        2.920    2.470    2.090    1.169    0.944    1.153
Mean Period Wise Return Bottom Quantile (bps)    -4.282   -4.385   -3.822   -2.672   -2.197   -2.084
Mean Period Wise Spread (bps)                     7.191    6.877    5.930    3.814    3.122    3.246

'23. Is a high Autocorrelation value a good or a bad thing, and how does it relate to the 'turnover' figure? Does one want both to be as low as possible?

'24. Autocorrelation seems to be decreasing with longer holding periods (with Turnover increasing). Is this normal/expected behaviour? Good/bad?

For the periods, e.g. 1D, 3D, 5D, 10D, 21D, are the stats computed every day on a rolling basis? Or for the 10D time frame, for example, would the stats be computed every 10 days, skipping days in between?

@Grant,

I believe it is a rolling basis. Was running some tests earlier and the cumulative 'returns' calculated from a daily 63D series were absurd.

That being said I got a question for anyone with input. I have been running basic tests -- correlation, cumulative returns, etc. -- based on the Factor Risk Exposure lecture (i.e running a single combined pipe w/ the best and worst performers for each factor, and subtracting the returns). Is there any merit to using the AL 1D series to calculate factor correlations and run MLRs w/ other factors? Would this be a supplement to the original strategy or a replacement?

@ Delaney -

More questions for your list.

For the Information Coefficient (IC), there are plots of the Observed Quantile versus the Normal Distribution Quantile. Here is an example:

What is one to make of the red line? Is it a fit to the data? Or just a guide to the eye, representing an ideal normal distribution? Generally, would one expect the IC to conform to a normal distribution? What are we to make of deviations from normality (e.g. so-called "fat tails")?

In the plot, are all the time series data shown? If not, what is shown?

Since the data come from a time series, does AlphaLens provide an easy way to identify where in time the points are coming from? For the example I've provided, how can I pinpoint the origin of the fat tail (e.g. a certain period of time, perhaps related to the Great Recession period, or just distributed uniformly over the whole time period)?

There are some statistics provided:

Presumably, there is a normality assumption in some of them. If the IC data don't fall on a straight line, indicating perfect normality, how can one determine whether the stats are valid? As I recall, one has to be careful that normality is a valid assumption before applying certain statistics.

Hi Delaney -

Any guidance on what to use for price for AlphaLens? There would seem to be a lot of options here, beyond the daily open price. Particularly for the longer time frames, it would seem to make more sense to use smoothed prices, versus the open (which is derived from a single trade, as I understand).

Does alphalens take into account dividends? If so, how?

Here's AlphaLens run on:

import numpy as np
from quantopian.pipeline import CustomFactor
from quantopian.pipeline.data import USEquityPricing

class RandomFactor(CustomFactor):
    """ Random factor """
    inputs = [USEquityPricing.close]
    window_length = 1

    def compute(self, today, assets, out, prices):
        out[:] = np.random.random(size=prices[-1, :].shape)

Thought I'd show what a monkey-on-a-keyboard looks like.

I'm not sure how to interpret the sector results. I'd think that for a random factor, AlphaLens would not show systematic overall gains/losses, by sector. However, it does. So, another question for Delaney is why?
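For anyone who wants to reproduce this kind of test, here is a rough sketch of feeding a pipeline factor like the RandomFactor above into Alphalens in Research (the dates and the one-year window are arbitrary placeholders):

import alphalens as al
from quantopian.pipeline import Pipeline
from quantopian.pipeline.filters import QTradableStocksUS
from quantopian.research import run_pipeline

universe = QTradableStocksUS()
pipe = Pipeline(columns={'factor': RandomFactor(mask=universe)}, screen=universe)
results = run_pipeline(pipe, '2016-01-01', '2017-01-01')

# get_pricing is built into Research; extend the end date so forward returns exist.
prices = get_pricing(results.index.levels[1],
                     start_date='2016-01-01',
                     end_date='2017-02-01',
                     fields='open_price')

factor_data = al.utils.get_clean_factor_and_forward_returns(
    factor=results['factor'], prices=prices, quantiles=5, periods=(1, 5, 10))
al.tears.create_full_tear_sheet(factor_data)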


A small question:
Is there a way to get the correlation between n factors using Alphalens?

Would it be possible to use AL (or otherwise) to analyze a factor's predictive power of future volatility (rather than future returns) and/or future risk adjusted returns? Stupid question perhaps, I haven't really thought this through and don't know if/how this would be useful even if it's possible. Ignore if so.

I've been wondering if Pipeline can be set up, for multiple factors, to spit out a trailing window of returns per factor and the variability in returns per factor? These data could potentially be used in combining the factors (e.g. minimize the variance). Also, effectively, this would allow one to embed AlphaLens-like analysis in an algo (e.g. to automatically manage "alpha decay" across factors).

@me @Joakim
So I am trying to compute the correlation between factors. One simple way is to just compute the correlation of the Alphalens output. But I would be happy to have a more specific correlator (by sector, equity, or period).

I tried to do a selection using pandas DataFrame.filter, but the selection is extremely slow, taking around 30 seconds per filter, which makes it nearly useless!
Does someone know a way to build correlators using pandas with some filters?

Probably useless (as it's trivial), but here is the notebook.


Thanks to everybody who submitted questions. I've gone through an initial batch below in the order in which they were asked. I will work to get another batch answered in the next week or two. I'm going to try to run some webinars and record some videos on the topics below, so watch out for those.

Does a positive IC skew mean that there's more predictive information in the top quantile (for longs) and vice versa for negative skew? If not, what does the IC skew tell me?

We have found that this statistic is not super useful in practice and will likely remove it from Alphalens in the future. IC Skew refers to the distribution of information coefficients, not anything about the percentiles. The IC is computed each day, which means we have a whole set of measurements of the predictiveness of the model. Each IC alone is likely not that informative, as rarely are strategies highly predictive on one day. Often they become more and more predictive as you average together more and more predictions. Given that, we look at the distribution of IC estimates and try to decide if the IC is drawn from a distribution with a mean of zero (not predictive), or not zero (predictive). Skew is one moment of that distribution, and refers to how the mass is distributed in the histogram. The explanation of skew is a bit unintuitive, but I recommend reading up on it if you’re curious. It informs you about the distribution of times your model is and isn’t predictive.

If I set the 'null hypothesis' threshold to 5%, and if the p-value is below 5% for the 5 day holding period but above 5% for 1 day (or vice versa), do I accept or reject the 'null hypothesis?' What 'bias' am I vulnerable to if I 'cherrypick' the holding period based on what the IC p-value tells me?

Generally, models will have a predictive ‘sweet spot’ at which they work best. You can think about the days-forward as a parameter to the model, and you’re trying to decide the best value of that parameter without overfitting. You are definitely vulnerable to p-hacking (multiple comparisons bias) if you look at a ton of horizons and pick one that happens to pass your null hypothesis test. As such, you wanna think about it no differently from any other parameter choice. I would start by picking wide parameters, like 1, 5, 20, then zooming in on the one that works best to see if it’s indeed a smooth local peak, or that number of days just happened to randomly work well. For instance, if 5 works best, then try 3, 5, 7 and get a better sense of the total parameter space. Then, once you’ve chosen what you believe to be the best number of days, run an out of sample test on new data and check the ranges again to make sure that the same structure exists in that parameter space.
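A rough sketch of that coarse-then-fine scan (my_factor and prices stand in for whatever you already feed into get_clean_factor_and_forward_returns):

import alphalens as al

# Coarse pass over wide horizons first; zoom in (e.g. 3, 5, 7) only afterwards.
factor_data = al.utils.get_clean_factor_and_forward_returns(
    factor=my_factor, prices=prices, periods=(1, 5, 20))
daily_ic = al.performance.factor_information_coefficient(factor_data)
print(daily_ic.mean())  # mean IC per horizon -- look for a smooth local peak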

Does choosing a longer test period reduce the likelihood of getting a false-negative p-value? (maybe I should be more concerned with false-positives?) What's a good balance? Is one year + one month of future returns sufficient?

Yes, more data while running the same number of tests reduces the chance of a false negative and false positive by upping your certainty in general. However if you also run more tests, you’re back to the same level of risk (p-hacking). The amount of time you need to validate depends on how often your strategy trades. If it trades many securities each day, you build up confidence in the predictive power much quicker and a month might be okay. A slower strategy will need more time. In general you wanna pick a confidence level that makes sense in your situation. If this is the only strategy you are personally investing in, then you wanna be super confident. If this is one of 100 strategies being invested in by an institutional investor, you probably actually are okay with more false positives, because the flip-side is you’ll discard fewer good strategies. There’s some recent research on this here. Sorry I don’t have an easy answer, it’s very context dependent and there’s no general rule.

If I still 'believe' in a factor, but p-value > 0.05, is it wrong / bad practice to test a different time series to see if it returns a p-value < 0.05? Essentially p-value hunting to 'fit' my hypothesis (did I just answer my own question there?). It's possible that a factor has predictive power during some time periods, and not during others though, correct? (non-stationarity (?))

Generally yes, tweaking something until it works is textbook p-hacking. It’s okay to test an idea, notice it doesn’t quite work yet, and then use what you learned to improve the idea. But you definitely have to out-of-sample test it afterwards to make sure you didn’t just overfit to the in-sample data. Just make sure you aren’t repeatedly testing (a few times is probably okay) on out-of-sample data, as that just puts you back to square one.

Similar to above, for 'false-positive' p-values (below 0.05) and IC, in general, is it a good idea to re-test a 'good' factor (p-value of <0.05 and high IC) during a different time period (cross-validation(?)), to ensure it wasn't just a 'false-positive' during the first test? If so, by how much does a second test reduce the likelihood of a 'false-positive' if both tests have p-value of <0.05 and high-ish IC?

Yes, cross-validation is a specific technique, I think you’re thinking of generic out-of-sample testing. A model that holds up with similar accuracy statistics in out-of-sample data is a very strong indication that you have found a good model. Just make sure you aren’t repeatedly testing (a few times is probably okay) on out-of-sample data, as that just puts you back to square one.

If a 'combined_factor' has some individual factors with p_values above 0.05 and/or negative IC during the test period, but when removing them from the 'combined_factor' results in significantly lower returns in backtests (i.e. fitted to market noise), what should I do with the 'bad' factors? Should I try to see if any combination of non-predictive factors, when combined have a p-value of < 0.05? Or just remove them altogether?

This is a very interesting case. It’s very possible that models, not predictive alone, are predictive in combination. This is known as non-linearity. What that means is that there’s some combination in the models that’s acting as predictive, or that you just happen to be overfit. In order to investigate this more, I would look at all the possible combinations of your individual models. Watch what happens as you add and subtract different ones and notice if there’s a specific combination which is causing the predictiveness. Also look at the IC and returns as you add and subtract, not just the p-value. Some research has been done looking at alpha factor interactions. Think about the ranking of two factors, when you combine both each stock gets assigned a point in 2D space, so now instead of quintiles you have a 5x5 grid of buckets into which stocks can fit. Whereas it’s definitely true that adding complexity increases your risk of overfitting, if you start from a hypothesis that two models should interact in an interesting way, there are definitely some cool effects you can explore here. For instance, you might hypothesize that whereas price/equity ratio is not related with returns, stocks with a high price/equity ratio and high amounts of debt will have negative excess returns in the future.
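As a sketch of the 'look at all the possible combinations' step, a loop like the one below works; the candidate names are hypothetical, each factor is assumed to be a z-scored pandas Series indexed the same way (date, asset), and prices is defined as usual:

from itertools import combinations
import alphalens as al

candidates = {'value': value_z, 'debt': debt_z, 'momentum': momentum_z}  # hypothetical factors
for k in range(1, len(candidates) + 1):
    for combo in combinations(candidates, k):
        combined = sum(candidates[name] for name in combo)  # simple equal-weight combination
        fd = al.utils.get_clean_factor_and_forward_returns(
            combined, prices, periods=(1,))
        mean_ic = al.performance.factor_information_coefficient(fd)['1D'].mean()
        print(combo, round(mean_ic, 4))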

Does a higher IC Kurtosis mean that there are more extreme outliers and therefore a stronger case for 'winsorizing' the factor? My factors tend to have fairly low IC Kurtosis values (well below 3, which I believe is the value for a normal distribution (?)). Is this good or bad?

As before in the skewness question, the kurtosis here refers to the distribution of the IC values, and not to the specific values of the factor itself. If it’s particularly high, then it’s worth investigating why that’s the case. It means that the model is not uniformly predictive day over day, and that there may be structures governing the days on which it predicts better.

IC Std. - the lower the better in relation to the IC Mean?? What does IC Std. actually tell me?

The lower the better. A low IC standard deviation means that you can have higher confidence in the mean value of the IC. A large amount of variance means you can’t be as certain. This is really no different from standard concepts around confidence intervals.

Risk-adjusted IC - is this 'volatility' adjusted IC? If so, is it essentially: IC / IC Std.?

The nice part of Alphalens being open-source is one can just check, which is precisely what I did. It is the IC mean / IC standard deviation. Again coming back to notions of confidence intervals, it just describes the IC mean as a number of standard deviations away from 0. No different from a z-score. However, you need to be careful not to assume anything about the distribution of IC values, as you don’t know if they’re normal. Basically it’s a way of comparing two different models. If one has a mean IC of 0.1, but an IC std. of 0.1, then it will get a volatility adjusted IC score of 1. If another one has a mean IC of 0.05, but an IC std. of 0.01, then it will get an IC score of 5. This doesn’t mean that the second is definitely better, it just means that you have relatively more confidence that the second IC is meaningfully different from 0, and less in the first case.

T-stat(IC) - what does this tell me, in relation to alpha research? Higher the better?

This is just the raw t-stat from the t-test that checks to see whether the IC values were likely drawn from a distribution with a mean of 0 (not predictive) or not (predictive). Remember that a t-test assumes a t-distribution, so this test is certainly not perfect. More of a rule of thumb estimate. The t-stat is not particularly useful, I would mostly just look at the p-value.

In this Q Short video on IC and P-values, Delaney mentions at around 3:15 in the video that there's currently a lot of debate whether p-value analysis is meaningful or not. Could you expand a bit on this please? Is there a school of thought that argues that p-values should be seen as 'relative' (i.e. the lower the better) rather than binary?

Here is a series of blog posts discussing why the author thinks that p-values are not a great way to test hypotheses. One of the most compelling ones in my opinion is just how little they are fully understood, and how much they are consequently misused. P-values are delicate and complex things, and they are only as useful as your interpretation of them. Be careful, read up if you want, but for now it’s probably enough to just think of them as binary and greater than or less than 0.05. Make sure you check the effect size, in this case the volatility adjusted IC and mean IC.

In the graph, IC Observed Quantile (y-axis) to Normal Distribution Quantile (x-axis), does one want to see the bottom left tail of the S-shaped plot to be above the divider line, and the upper right S-tail to be below the line? Or does it not matter as long as it's a clear S-shape?

Quantile-Quantile plots just tell you how closely your data follow a baseline distribution. In our case we use the normal distribution as the baseline. If you notice a deviation, it’s because the distribution of IC values is not behaving in a normal fashion. That’s generally to be expected in real data I think, but the plots can give you clues as to how it might be deviating. A normal distribution will just be a straight line exactly, any deviation from the straight line indicates a dearth or surplus of observations in that quantile.

Does an IC value of 0.1 mean that the factor is predictive 10% of the time, and the other 90% of the time it's just random noise / coin-flipping? And does an IC value of 1.0 mean the factor is predictive 100% of the time?

Basically yes, but be careful about taking this too far without checking the actual math. A mean IC of 0.1 means that on average the correlation between your model’s predictions and real returns is 0.1. A perfect model would have a mean IC of 1.0 and a std. of 0.0. A coin flip will have a mean IC of 0.0, and I’m not actually sure what std. you’d get.
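The 'actual math' for a single day’s IC is just a Spearman rank correlation between that day’s factor values and the subsequent forward returns, e.g.:

from scipy import stats

# todays_factor_values and forward_returns_1d are hypothetical aligned arrays,
# one entry per asset in the universe on a given day.
ic_today, _ = stats.spearmanr(todays_factor_values, forward_returns_1d)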

How can I check the correlation of two different alpha factors?

I’m attaching a notebook that does this in a separate comment. Basically you construct a portfolio based on each factor, then you check the correlation of the returns. This methodology relies on your choice of portfolio, but generally choosing a portfolio that longs the top quintile and shorts the bottom is common in industry. If you’re worried you can try it for a variety of portfolio methods, or a method that fits how you actually trade better.
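If you'd rather not build the portfolios by hand, here is a sketch using Alphalens's factor-weighted portfolio returns (one reasonable portfolio construction among several); factor_data_a and factor_data_b are hypothetical factor_data frames for the two factors:

import alphalens as al

returns_a = al.performance.factor_returns(factor_data_a)['1D']
returns_b = al.performance.factor_returns(factor_data_b)['1D']
print(returns_a.corr(returns_b))              # overall correlation of the two factors
returns_a.rolling(63).corr(returns_b).plot()  # how stable that correlation is over time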

If I combine two uncorrelated alpha factors that I've found, is there any need to run the combined factor through Alphalens (over a different training period?)?

Yes absolutely. You don’t know what kind of non-linear effects are introduced by combining your models, so you want to check that the stats are at least as good as each model independently. Ideally the mean IC will be the average of the two independent mean ICs, and the IC std. will be strictly lower than the average of the independent IC stds.

By combining two uncorrelated (and seemingly conflicting) alpha factors, for example one Momentum factor and one ST Reversal factor, wouldn't you lose alpha by combining them?

Certainly possible. To the extent that the factors are based on contradictory models you will definitely lose alpha. To be clear, when we talk about combining factors, it’s factors that are independent and have no reason not to be combined. Another thing to think about is predictive time frame of factors, you probably want to combine factors with similar time frames, or at least make sure that they help each other. Momentum factors tend to be slower than short term reversal in my experience, so combining the two might not ever be a problem. However, the frequency at which you’re trading would decide which factor’s signal you’re actually using and complicate things. It’s certainly possible to combine some slightly slower (say weekly) factors with some faster ones (say daily) and get benefits. This can be especially helpful if you have a weekly model and need to increase turnover to get to within our contest criteria bounds.

Taking the above example again, one Momentum Factor and one ST Reversal factor combined into one. The 'Strategic Intent' for the Momentum factor is the Newtonian idea that 'stocks in motion tend to stay in motion' whereas the 'Strategic Intent' for the ST Reversal factor is essentially the opposite (Mr. Market overreacts in the short-term). Is it ok to have separate Economic Rational for each factor, or is it not a good idea to combine two factors that have conflicting rationale (as they might be negatively correlated (?))?

Conflicting rationale is definitely a worrying issue, but again let’s think about the time frames. Momentum will generally target weekly/monthly timeframes from what I’ve seen. Momentum says that an upwards swing is indicative of a long term upwards trend. You can still trade on the up/down noise that happens on that upwards trend. So you could even combine momentum and mean reversion models in an intelligent way by using momentum to effectively detrend the series.

It would seem that for Alphalens to be helpful, the risk model would need to be incorporated. For example, say I decide to cook up a factor, and it just happens to be similar to one of these:
from quantopian.pipeline.experimental import Momentum, ShortTermReversal, Size, Value, Volatility
I could spend a lot of time analyzing and perfecting my factor, only to have a very unpleasant surprise when I try to use it in an algo for the contest/fund.
Any thoughts on how to address this problem?

We've built an integration between Alphalens and Pyfolio that allows you to construct a basic portfolio based on your alpha factor and then run that portfolio's returns through Pyfolio's risk exposure analysis. We'd love to build a more integrated risk breakdown into Alphalens; this is what we currently have. As mentioned above, I'm attaching a notebook in a separate comment that shows how to do this.

Here is a notebook I built that allows you to check the correlation between two alpha factors, plus run an alpha factor through our Pyfolio integration to see what the risk exposures look like. You probably want to do this as you're building alpha factors so you can see if the effect you found is actually super correlated with a known risk factor, or correlated with another model you're already working on. If it is, that isn't necessarily a bad thing; two models which are 50% correlated will still be 50% uncorrelated, and adding them should help diversify your portfolio.
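For reference, the Alphalens-to-Pyfolio handoff looks roughly like the sketch below (the exact create_pyfolio_input keyword names may differ between alphalens versions):

import alphalens as al
import pyfolio as pf

pf_returns, pf_positions, pf_benchmark = al.performance.create_pyfolio_input(
    factor_data, period='1D', long_short=True)
pf.create_full_tear_sheet(pf_returns,
                          positions=pf_positions,
                          benchmark_rets=pf_benchmark)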


I've been wondering if Pipeline can be set up, for multiple factors, to spit out a trailing window of returns per factor and the variability in returns per factor? These data could potentially be used in combining the factors (e.g. minimize the variance). Also, effectively, this would allow one to embed AlphaLens-like analysis in an algo (e.g. to automatically manage "alpha decay" across factors).

Keeping it simple with overall returns per factor sounds like a good idea, but one route to start with, at the per-stock level, might be correlating yesterday's pipeline output (containing ranks for each factor per stock) to current PnLs, also ranked, in BTS (before_trading_start).
Then, for record(), how can an overall score of some sort be calculated for each factor and charted? To make sense of shorting (presumably those below the mean rank), could their PnL just be multiplied by -1 for pandas .corr()?

I'm wondering if there is a systematic way to investigate a given data set, without cooking up a specific factor. For example, we have the https://www.quantopian.com/data/psychsignal/aggregated_twitter_withretweets_stocktwits free data set. So, I could try to be clever and formulate some hypothesis (and another and another...), and cast it into a Pipeline custom factor formula and then test it using AlphaLens. However, what I'd really like to know, is in a general sense, is there any formulation of the available data that might be worthwhile trading. In mathematical terms, is there any transformation like this that would be profitable:

F(bull_scored_messages, bear_scored_messages, total_scanned_messages)  

Of course, there is an infinite number of ways of formulating the transformation, but is there a systematic way of doing it, that would answer the question once and for all that there is or isn't any way to eke any alpha out of bull_scored_messages, bear_scored_messages, total_scanned_messages?

As a ground rule, say that the transformation is run across the entire QTU, and no other data can be brought into play, in a ranking scheme (I think this is the prescribed approach, versus trying to identify a sub-universe for which the factor would work).

@Grant

I think your question doesn't quite belong in this thread; it is more about how to automatically build a factor than how to analyze its performance.
But then, I was thinking the exact same thing :-).

So let's say we have a factor F(a_i, p_j), where the a_i are values from some datasets, and the p_j are a list of parameters.
First, as you said, the parameter space of F is infinite (it can be an arbitrary function, depending on arbitrary parameters). Therefore fitting F will be really, really computationally expensive, and worse, there is a very high probability that the solution you find is completely overfit.

So basically you have two ways of constructing F.
First one: pick a simple functional form (linear or quadratic). Then you construct a fitting tool which would run as follows:
1) A function which returns alpha for a given set of parameters (run a pipeline, run Alphalens, extract alpha and the other quantities you want to maximize).
2) A fitting tool which optimizes the outputs obtained in 1). (There are plenty of ways to do it: a Markov chain (a bit as in MCMC), gradient descent, ... I think the best bet here is to look at how neural nets perform the optimization.)
Second one: you would prefer a "highly" non-linear function. Do it with a neural network: that will not be trivial, and it involves the same steps as the first solution. The only difference is that your function from point 1) is now described by a neural network (which can contain logs, exp, sinc, arctan, ... any crap you want to put in). But then the training will not be straightforward, as it has to be unsupervised (you don't know the alpha...); maybe reinforcement learning might be a way.

But I would say, a tool like that is the holy grail! If you manage to build such a tool, you can fully automate strategy building... I am sure more than one person in the world would be quite happy to buy it! Holy grails are beautiful, but in the end, if they exist, they are never trivial to find...

@ Delaney -

Thanks for your feedback and the new notebook. Hopefully y'all are finding our questions useful.

One thing I'm realizing, and attempted to articulate above, is that there should be some way to treat any time series data set that has been classified/derived in a certain fashion (e.g. "mood" of traders based on an algo that analyzes Twitter data) and characterize it in a general sense, prior to attempting to write down a model using the data. For example, if my general approach is to take price data and construct a model (i.e. write a Pipeline custom factor), then across the QTU (to be specific), there should be evidence that prices are something other than noise, and that future prices bear some relation to past prices, in a general sense. I'd want to do this before getting all worked up about developing a specific model using the data. What is your recommendation, at a gross level, to see if there is anything other than noise in a given data set, without formulating a specific model?

It is also necessary to check that the data have not been biased, e.g. over-fit, such as we see for Alpha Vertex PreCog data. I think this requires knowledge of the in-sample/out-of-sample periods (which we don't have for the Twitter data), but maybe there is a way to look for certain regime changes in the data? Given the Q interest in detecting over-fitting in algos, perhaps you have expertise in this area?

Questions are very helpful. The adage "If you have a question, someone else probably has the same question." holds true in my experience. You also prompted me to make a correlation and risk checker notebook, which I'm happy is finally out.

Like I said, I'm working on making time for the next batch. I count 8 pages of questions in my google doc I use to track my backlog, which is awesome. I wanted to thank all the other folks who've been stepping in to answer questions. If I provide an answer to the same question it's not because I think yours wasn't good, it's because I just go through the questions in order and don't keep track of which ones have been answered by other folks on this thread. Thanks a bunch to Michael Matthews in particular.

Another thing to think about is the predictive time frame of your factors: you probably want to combine factors with similar time frames, or at least make sure that they help each other. Momentum factors tend to be slower than short-term reversal in my experience, so combining the two might never be a problem. However, the frequency at which you're trading decides which factor's signal you're actually using, which complicates things. It's certainly possible to combine some slightly slower (say weekly) factors with some faster ones (say daily) and get benefits. This can be especially helpful if you have a weekly model and need to increase turnover to get within our contest criteria bounds.

This touches on the topic of factors on disparate time scales, which Joakim asked about here. Jamie took a crack at a response here, but there was no further Q engagement on what would seem to be an important topic. I'm guessing this is a "solved problem" in the professional quant community. Combining N factors, each with its own trading time scale, is basically what needs to be done. How might it be done on Quantopian? I gather that AlphaLens can be used to find the optimal trading frequency for a set of factors, but then it would seem challenging to incorporate them into a single algo if all of the time scales are not identical.

Any guidance on what I think is termed "guards against bad data"? I see custom factors with various np.nanmean, np.nanstd, etc. and other NaN guards scattered around code. And then we have non-NaN data that has errors, etc., which one has to guard against when writing factors for input to AlphaLens. Is there a lecture/tutorial/help page section on how to approach this problem systematically?
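For concreteness, the kind of guard I mean looks something like this (a minimal sketch of a hypothetical custom factor, not code from any lecture):

import numpy as np
from quantopian.pipeline import CustomFactor
from quantopian.pipeline.data import USEquityPricing

class GuardedZScore(CustomFactor):
    # Hypothetical example: z-score of the latest close vs. its trailing window,
    # with NaN-aware reductions so missing prices don't poison the output.
    inputs = [USEquityPricing.close]
    window_length = 20

    def compute(self, today, assets, out, close):
        mean = np.nanmean(close, axis=0)
        std = np.nanstd(close, axis=0)
        std = np.where(std == 0, np.nan, std)  # guard against divide-by-zero
        out[:] = (close[-1] - mean) / std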

Grant,

I was looking at your notebook evaluating the random factor. I don't see systematic gains/losses by sector as a result of the factor. That said, some sectors did perform better or worse over the sample period (as would be expected). The "systematic" out/under-performance you were referring to may simply have been due to this.

Note: There is a group_neutral option in the al.tears.create_full_tear_sheet function (see attached notebook). This may help in evaluating strategies where you want to be sector neutral.
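A minimal usage sketch (assuming the factor data is built with a sector groupby; the attached notebook has the full version, and factor, prices, and sectors are assumed to already exist):

import alphalens as al

# `factor` is your factor Series, `prices` the pricing DataFrame,
# and `sectors` maps each asset to a sector code.
factor_data = al.utils.get_clean_factor_and_forward_returns(
    factor, prices, groupby=sectors, periods=(1, 5, 10))

al.tears.create_full_tear_sheet(factor_data, group_neutral=True, by_group=True)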

[Notebook attached; preview unavailable.]

A few more from me. Grateful for the answers provided so far!

  • If a factor model has a mean IC of 0.04 from In-Sample (IS) data, but when tested OOS (same length of time series data, e.g. 1 year + 1 month returns each) it has a mean IC of 0.03, does that mean the model was likely 'overfit' by 25% (0.01/0.04) on the training data? Or would one need to take more OOS samples (to get a mean and variance of the 'overfit' IC mean(?)) rather than just rely on a single OOS test? Is there a better way of measuring how 'overfit' a model might be, when comparing IS vs OOS output from AL?

  • Is there a difference between overfitting on 'Random-Walk' noise vs. overfitting on a certain non-stationary market regime? If there is, the latter may have at least some value as the market regime may return/continue in the future? Is there any way of measuring the difference in AL?

  • How can I measure, using AL, the 'break-even' point that a model needs in order to survive Q's default trading costs (both default commission and slippage)? E.g. is there a minimum mean spread (in bps) between the upper and lower quantile that's required in order for the model to survive trading costs? (my hunch is that it may not be this simple)

Joakim,

On your first question, I don’t know that the interpretation of the IC works that way. However, in general, it is typical for out of sample performance to deteriorate somewhat. Additionally, performance may deteriorate due to other factors besides overfitting. For example, you may experience a regime shift in the out of sample data that may cause your model to be miscalibrated (and possibly underfit). I guess you could say that an IC of .03 means your model explains .09% of the variation in ranks (think R-squared). Whereas your 0.04 IC indicates that your model explains .16% of the variation in ranks, so maybe you could say you saw a 43.75% deterioration in the explanatory power of your model. (To any statisticians, please correct me if my interpretation is wrong there).
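Spelled out, that back-of-the-envelope comparison is:

$$\frac{0.04^2 - 0.03^2}{0.04^2} = \frac{0.0016 - 0.0009}{0.0016} \approx 43.75\%$$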

Regarding overfitting and regime shifts, if you can predict the regime shifts on out of sample data, then I would argue this is not overfitting at all. You are simply picking up on a signal, and not on noise. Overfitting by definition is simply fitting to random noise. Alphalens can help in this case because it shows you how the factor performed over time. It also shows seasonality plots. This is where you can use your judgement as a researcher to determine whether periods of under/out-performance of your factor were due to randomness, or if there is possibly a fundamental or structural reason for this cyclicality.

Regarding transaction costs, my impression is that Alphalens was not created to do transaction cost/implementation analysis. The goal of alphalens, as I understand it, is to determine whether a factor is predictive or not, irrespective of transaction costs. Even if a factor is weakly predictive, it can potentially be valuable when combined with other uncorrelated factors, or could even interact “multiplicatively” with other factors. In this case, while the factor would not be able to overcome transaction costs on its own, you still wouldn’t want to throw it out. (See this blog post on the quant workflow by Quantopian).

I have also attached an example notebook that illustrates the concept of overfitting. It's not well explained because I'm a bit tired, but feel free to ask questions. Quantopian also has a lecture on overfitting if you haven't seen it already.

[Notebook attached; preview unavailable.]

Hi Michael,

Wow, that is one cool notebook! Thank you for taking the time to put that together and for sharing it. I'll need to spend some time playing around with it.

Regarding my question on 'overfitting on a specific market regime' I probably didn't explain it very well.

Let's say I train my model on the full-year 2017 dataset (a fairly low-volatility market regime), and my factor therefore, inadvertently, gets fitted to a low-volatility market (or a trending market, or a market where large caps on average outperform small caps, technology stocks on average outperform utility stocks, etc.). Since financial time series are non-stationary, couldn't you say that my factor was 'overfitted' to work better during a low-volatility market (or whatever the market regime was during the period I was training my model on), especially if that was not my intention? And so when tested on 2018 data (a much more volatile market), the factor works less well due to the market regime shift. Does that make sense?

Good point on AL not being intended for transaction cost analysis, and that one may still want to keep weaker alpha factors that, on their own, wouldn't survive transaction costs but in aggregate would still be net alpha-positive when combined in the right overall model.

Thanks again, I appreciate you taking the time answering these!

Regarding trading costs: you need roughly > 10bps to make money.

I'm wondering how one can tackle the alpha combination evaluation step. At some point, one has N decent factors (on the same trading time scale, to keep things simple), and one then has to study the best approach to combining them. Q has shown sum-of-z-scores and ML techniques, and probably others, and the 101 Alphas project was meant to explore the combination step (it was never finished, for some unreported reason).

As a general note to Q and Delaney, there’s a lot of preaching that combining factors is the way to go (good advice) but the tools aren’t adequate. One really needs a version of Alphalens that takes in N factors individually and in combination and analyzes the whole thing in an integrated fashion.

Joakim,

Regarding different regimes, I think you have the right idea. If you only build and train your model on one type of market condition, then you might have a model that only works on that type of market. In some sense, the model might properly be fit to that regime (as opposed to overfit), but it may still have poor performance out of sample. This wouldn't be bad if you could identify ex ante what type of regime you are in, but that is a challenge in itself.

This also highlights the importance of sample selection. I think it absolutely makes sense to test your model over different time periods with differing market conditions to see if your model holds up. As always, there is a trade-off between sample period size and data relevance, due to the possibility of permanent structural breaks in the market (e.g. the effect of decimalization on short-term scalping strategies). This is part of where the skill and experience of the researcher come into play.

Thanks Michael,

I fully agree with that. I'm not trying to do any sort of market timing though, which I think is quite difficult, though could be very profitable if possible at all.

Basically I'd like to avoid, as much as possible, to fit my models on any specific market regime (and on noise), whether I'm aware of it or not. 'Overfit' might be the wrong word for it, if 'overfit' only means 'fitted on noise.' Perhaps 'over-exposure' is a better word?

I suppose this is where the Q Risk Model comes into play in my AL research (pennies are slowly starting to drop). If my model is 'over-exposed' to a certain sector or style, and a lot of the returns in my model are coming from being exposed to a certain style/sector, then it's likely that my model is 'over-exposed' to a certain 'market regime' that perhaps worked well during the training period, but may not hold up as well during other (OOS) periods where other market regimes are at play?

Basically, if it's possible, I'd like to find a way of measuring how much of my model's returns can be attributed to market regime 'over-exposure', how much of my model is 'overfitted' on just random walk noise, and how much is likely to be 'pure alpha.'

I'm looking for the Holy Grail essentially... Ni!

:)

Hi Delaney -

Did y'all ever pursue trying to understand this paper:

https://arxiv.org/ftp/arxiv/papers/1601/1601.00991.pdf

In the appendix, the 101 Alphas are published, and they've largely been translated into the Q API. The whole thing is a bit perplexing, since the paper seems to claim that they can somehow be combined into a profitable "mega-alpha" and asserts:

We emphasize that the 101 alphas we present here are not “toy” alphas but real-life trading
alphas used in production. In fact, 80 of these alphas are in production as of this writing.

This would say that the Q workflow should be capable of taking the alphas as an input, in an attempt to substantiate the claims.

My guess is that individually, most if not all of the alphas look pretty bad using Alphalens, yet there is the possibility that when combined, one could somehow eke out some consistent alpha. This would say that with the right alpha combination technique, factors that would normally be rejected could have some value (e.g. contain "ephemeral" alpha as the paper says). Any insights?

Joakim,

I think the Q Risk Model will help you with what you are describing. It will measure your exposures to given factors/sectors, and then give you an “attribution analysis” of what drove the algorithm’s performance. You can then assess whether the drivers of your performance are actually what is driving the logic of your algorithm. If these do not align, then that may be a sign that the results are less likely to hold up out of sample.

I think this is a slightly different problem than the volatility regime example you gave above. For this I would refer you to Quantopian’s blog post and notebook outlining a process for evaluating whether an algo performs better or worse in a calm vs. volatile market.

Hope this helps!

@Michael Matthews,

Thanks for the link on volatility regime sensitivity analysis. Dr. Jess Stauth had mentioned this Q study to me earlier in another thread, but I was never pointed to the reference. This is a great stress test for long backtests (10+ years), as it highlights the performance metrics attributable to a particular regime, in this case high/low volatility regimes as defined. One can extend this analysis to other regime conditions/definitions, for example Bull/Bear regimes, where Bull is defined as the current SPY close above its 200-day moving-average close and Bear as the current SPY close below it. It can even be further extended to multiple regime conditions such as Bull/high volatility, Bull/low volatility, Bear/high volatility and Bear/low volatility. If your long backtest survives all of that, you might just have found something close to the "Holy Grail"!
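A minimal sketch of that regime split (illustrative only; it assumes a daily series of SPY closes named spy_close):

import numpy as np
import pandas as pd

# Bull when SPY closes above its 200-day moving average, Bear otherwise.
ma200 = spy_close.rolling(200).mean()
trend_regime = pd.Series(np.where(spy_close > ma200, 'Bull', 'Bear'),
                         index=spy_close.index)

# Cross it with a simple volatility regime (trailing 20-day realized vol vs. its median).
vol20 = spy_close.pct_change().rolling(20).std()
vol_regime = pd.Series(np.where(vol20 > vol20.median(), 'HighVol', 'LowVol'),
                       index=spy_close.index)

# Labels like 'Bull/LowVol' that backtest returns can then be grouped by.
combined_regime = trend_regime + '/' + vol_regime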

Hi Delaney -

I thought this was a good thread. Are you planning to sustain it? Generally, Q has all but given up on the forum, as far as I can tell. You are the last great hope, my man!

So, I've been plugging away, using AL exclusively. It is a broad question, but how do I know that I'm done, vis-a-vis results that would suggest a shot at contest winnings and a fund allocation? I'm happy to spend 90% of my time in the research platform, as Fawce recommended, but how do I know I'm done, and can move onto the remaining 10% of my precious hobby time?

Hi Delaney -

Is there anything in the AlphaLens output that would scream "Winsorize me, baby!"? "To clip or not to clip: that is the question..."

Maybe there is a standard way of interpreting the AlphaLens output that would indicate that winsorization (or something similar) is called for. Any guidance?

A few more from me. Hopefully we'll get to these eventually. I really appreciate this thread!

  • Is it 'wrong' / 'bad practice' to have a less strict p-value cutoff for OOS vs IS? For example, a 0.01 cutoff for IS but a 0.05 cutoff for the OOS test?
  • Would you be able to provide a 'reasonable' range (or Max) for the AutoCorrelation (AC) metric? For example, anything above 0.95 is bad(?) and why a high AC is 'bad'? I understand (somewhat) what AC is but not why it's 'bad'. Do you have any general advice on how one might be able to reduce this figure?
  • Sometimes I get a positive Mean IC (and risk adjusted IC), a p-value below my cutoff, decent spread between top/bottom quantiles (seeming to spread away from each other), BUT a negative Annual Alpha and a negatively trending 'Factor weighted long/short Portfolio Cumulative Returns' graph... How is this possible, and what might be the cause(s) of the negative returns? Should I still keep this factor, or try to 'tweak' it to get positive returns, or discard it altogether?
  • Would you be able to provide a general 'reasonable' range/max for the Mean Turnover value for the top/bottom quantiles? This metric is mostly for trying to minimize unnecessary trading costs, correct? Do you have any general advice on how one might be able to reduce this figure?
  • Similar to Grant's question above, would you be able to provide examples of when to do a complete cutoff of outliers (e.g. .percentile_between(1, 99)), when to winsorize, and when one might want to do both, and hopefully still have some remaining alpha?
  • Is there any way of measuring 'Alpha decay' of a factor using AL?

Just as general feedback, I really find the below videos/lectures very helpful. Since they are all about 3 years old, and some things have changed on Q (new tools, APIs, etc) it might be quite useful to revisit them and do new webinars for a new audience?

While I'm writing a 'webinar wish-list' I think it might also be nice to have a multi-part series of 'deep-dives' into each of the steps in this webinar: Idea to Algorithm. Just cause I know you guys are not busy enough as it is with FactSet datasets and global EQ market rollouts, etc. :)

I'm learning how to use Alphalens, but I don't get the logic of the "demeaned" argument. I have two questions:

  1. When I use the method alphalens.performance.mean_return_by_quantile, I thought the demeaned argument was about simulating a long/short portfolio, but I don't understand why, with demeaning, the two factors (in this case) have the same values (in absolute value), whereas when I put demeaned=False, the values are quite different.

  2. With alphalens.performance.factor_returns, I see that the demeaned argument has no effect on the result of the function. Why is that?

I attached a notebook showing both problems. Thanks for the help!

[Notebook attached; preview unavailable.]

Hi Fabio,
As I understand it, the purpose of demeaned=False is not to simulate a long/short portfolio. Demeaning the data has the effect of reducing the sample mean to 0 (by subtracting it from each sample observation). I believe what you are looking for are the results of your factor depending on whether it is used in a long/short portfolio? If that is the case, I recommend looking into the AL documentation so you can see which functions are the most convenient for exactly what you are looking for. For example, you could try create_summary_tear_sheet with long_short=True and long_short=False to see the effect this has on your alpha signal.

Hope that answers your question.

Fabio,

For the first part of your question, to elaborate on Luke's answer, setting demeaned=True subtracts the universe mean return from the returns before calculating the mean return by quantile. In other words, it is an "excess" return over the mean universe return (similar to going long the quantile and short the entire equal-weighted universe).

For part 2, in the performance.factor_returns function, when you set equal_weighted=True, it is mathematically equivalent irrespective of whether you set demeaned=True. If you are interested in the math, I have included the proof below. Note, when equal_weighted=True, alphalens uses the median to demean instead of the mean. This makes sure you have the same number of longs and shorts and thus shorts have the opposite weight of longs (but same magnitude).

\(m=\) number of longs

\(n=\) number of shorts

\(i=\) asset \(i\) in long basket

\(j=\) asset \(j\) in short basket

\(w_i=\) weight of asset \(i\)

\(r_i=\) return of asset \(i\)

\(w_j=\) weight of asset \(j\)

\(r_j=\) return of asset \(j\)

\(r=\) median return

Note, since equal weighted demeaning uses the median, \(m = n\), so I will simply use the variable \(m\).

Without Demeaning, the return for date \(t\) is:

$$r_t = \sum_{i=1}^{m}w_ir_i + \sum_{j=1}^{m}w_jr_j$$ (Note: \(w_j\) is negative)

With Demeaning, the return for date \(t\) is:

$$r_t = \sum_{i=1}^{m}w_i(r_i-r) + \sum_{j=1}^{m}w_j(r_j-r)$$

$$r_t = \sum (w_ir_i- w_ir) + \sum (w_jr_j- w_jr)$$

$$r_t = \sum w_ir_i- \sum w_ir + \sum w_jr_j- \sum w_jr$$

$$r_t = \sum w_ir_i + \sum w_jr_j - \sum w_ir - \sum w_jr$$

$$r_t = \sum w_ir_i + \sum w_jr_j - (\sum w_ir + \sum w_jr)$$

$$r_t = \sum w_ir_i + \sum w_jr_j - (r\sum w_i + r\sum w_j)$$

Since portfolio is Equal Weighted Dollar Neutral:
$$\sum w_i = -\sum w_j$$

$$r_t = \sum w_ir_i + \sum w_jr_j - (r\sum w_i - r\sum w_i)$$

$$r_t = \sum w_ir_i + \sum w_jr_j$$

And this is the same result as without demeaning.
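A quick numerical check of the same identity, with made-up numbers:

import numpy as np

r = np.array([0.02, 0.01, -0.005, -0.03])   # toy asset returns
w = np.array([0.25, 0.25, -0.25, -0.25])    # equal-weight, dollar-neutral (weights sum to 0)
med = np.median(r)

print(np.dot(w, r))           # portfolio return without demeaning
print(np.dot(w, r - med))     # with median demeaning: identical, because sum(w) == 0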

Joakim,

Here are some of my thoughts on your questions. As always, these are just my opinions, so take it for what it is worth.

Is it 'wrong' / 'bad practice' to have a less strict p-value cutoff for OOS vs IS? For example, a 0.01 cutoff for IS but a 0.05 cutoff for the OOS test?

I don’t know whether it is wrong or right, but one approach would be to adjust your p-value for the number of tests run. https://en.wikipedia.org/wiki/Multiple_comparisons_problem#Controlling_procedures

Would you be able to provide a 'reasonable' range (or Max) for the AutoCorrelation (AC) metric? For example, anything above 0.95 is bad(?) and why a high AC is 'bad'? I understand (somewhat) what AC is but not why it's 'bad'. Do you have any general advice on how one might be able to reduce this figure?

I think this would depend on the context. Check out this video by Quantopian at 39:53. They give some rationale for how to think of it. In the example of the factor being analyzed, the factor values show a high autocorrelation, and Dr. Stauth goes through her thought process for why this might need to be investigated further. Essentially if autocorrelation of your factor is too high, you aren’t really getting a large number of “independent” bets. (Note: A point I think they miss in the videos is that the main reason for the high autocorrelation is due to how they constructed the factor (it is a 5 period moving average of sentiment). By its nature, since you are only dropping 1 day in the average and adding a new day, prior day’s values will be highly correlated with the next value.) Regarding a "reasonable range" for autocorrelation, I think it depends on the context/situation, and/or factors being analyzed.
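To see that smoothing effect in isolation (a toy example, not the sentiment factor from the video):

import numpy as np
import pandas as pd

raw = pd.Series(np.random.randn(1000))   # i.i.d. noise: lag-1 autocorrelation near 0
smoothed = raw.rolling(5).mean()         # 5-period moving average of the same noise

print(raw.autocorr(lag=1))       # close to 0
print(smoothed.autocorr(lag=1))  # close to 0.8, purely from the overlapping windows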

Sometimes I get a positive Mean IC (and risk adjusted IC), a p-value below my cutoff, decent spread between top/bottom quantiles (seeming to spread away from each-other), BUT a negative Annual Alpha and a negatively trending 'Factor weighted long/short Portfolio Cumulative Returns' graph... How is this possible, and what might be the cause(s) of the negative returns? Should I still keep this factor, or try to 'tweak' it to get positive returns, or discard altogether?

I am not really sure on this, but keep in mind that both the alpha calculation and the factor weighted long/short cumulative returns chart use all stocks in the computation and weight returns proportional to the factor value. Therefore, it would be possible to have a positive high minus low spread, a negative alpha, and negative factor-weighted returns if the “middle quantiles” provided a drag on the long-short portfolio. That said, the IC coefficient should take into account those mid-range factor values as well, so it is a little puzzling without seeing a specific example.

Would you be able to provide a general 'reasonable' range/max for the Mean Turnover value for the top/bottom quantiles? This metric is mostly for trying to minimize unnecessary trading costs, correct? Do you have any general advice on how one might be able to reduce this figure?

Again, I think this is situation specific. A signal will have to have relatively greater predictive power if it has higher turnover. But saying exactly how much turnover is okay depends on how strong the signal is.

Too low turnover may mean that you need larger amounts of data to accumulate the desired number of independent bets to prove your signal is robust. Also, if you have a large amount of intraday turnover, the slippage model in the IDE may not do a great job of modeling this (think high frequency trading), particularly since bid/ask data is not available.

In terms of how to reduce turnover, I'm not really sure in general terms, but maybe look at:

  • Increasing the time between rebalances
  • Smoothing the factor values (see the sketch below)
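A minimal sketch of that kind of smoothing in Pipeline (illustrative; MyFactor is a placeholder for whatever factor you are analyzing):

from quantopian.pipeline.factors import SimpleMovingAverage

# On Quantopian, a factor can be used as an input to another factor only if it is
# "window safe" (ranks and z-scores are; a raw CustomFactor may need window_safe = True).
# Averaging the factor over a trailing window slows the signal down, which raises
# autocorrelation and lowers turnover.
smoothed_factor = SimpleMovingAverage(inputs=[MyFactor().rank()], window_length=5)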

Also, see contest rules for their guidelines on Turnover. https://www.quantopian.com/contest/rules

Similar to Grant's question above, would you be able to provide examples of when to do a complete cutoff of outliers (e.g. .percentile_between(1, 99)), when to winsorize, and when one might want to do both, and hopefully still have some remaining alpha?

If you are weighting assets proportional to factor weights, it might make sense to winsorize, especially if you think a small handful of assets may end up being responsible for a large portion of your alpha. Note, if you use factor ranks or quantiles, winsorizing is likely unnecessary. Regarding removing outliers, I would lean toward not removing outliers unless you think it is bad data. Otherwise, you would need an economic rationale for why doing this is a good idea. (e.g. Maybe you are testing a momentum factor and you think that stocks with extremely high returns signify an event like a takeover (assuming these aren’t already screened out)).

Is there any way of measuring 'Alpha decay' of a factor using AL?

Assuming you are talking about how long of a forward period the signal is predictive for, think about using the alphalens event tearsheet. They also talk about this in the video I referenced before at 36:16.

Hope this helps.

Thanks Michael, super helpful as always! Would be great to get @Delaney's take on them as well, though your answers do make a lot of sense to me.

A few follow-up questions/comments if you don't mind:

That said, the IC coefficient should take into account those mid-range
factor values as well, so it is a little puzzling without seeing a
specific example.

I'll see if I can find a somewhat 'generic' example of this (positive alpha but negative returns) and attach the AL notebook.

Again, I think this is situation specific. A signal will have to have
relatively greater predictive power if it has higher turnover. But
saying exactly how much turnover is okay depends on how strong the
signal is.

Great answer! I'm slowly starting to get my head around this. Would a 'signal-to-turnover' ratio make sense, do you think, and if so, what would such a ratio look like?

Mean IC [or Risk Adjusted IC] / (Bottom Quantile Mean Turnover + Top Quantile Mean Turnover)?

Or:

Mean Period Wise Spread (bps) / (Bottom Quantile Mean Turnover + Top Quantile Mean Turnover)?

Or something else? (same holding period for both the numerator and denominator, e.g. 1D or 5D, etc). If any of this makes sense, is it possible to get this 'signal/turnover' ratio from AL somehow?

Really appreciate your help and insights! I'm quite new to all this and a bit of a slow learner (and a quick forgetter) so appreciate your patience and helpful explanations!

Joakim,

I think your second idea for a signal/turnover ratio makes some intuitive sense. Here is my thought process:

\(r=\) return for the period in question

\(t=\) mean turnover rate (as a decimal)

\(c=\) transaction cost for each dollar traded

\(\theta_{tc}=\) signal/transaction cost ratio

\(\theta_t=\) signal/turnover ratio

Essentially, what we are trying to measure is whether the return more than compensates us for our transaction costs. Therefore, your total transaction cost as a fraction of your portfolio will equal \(tc\). Our signal to transaction cost ratio would be:

$$\theta_{tc} = \frac{r}{tc}$$

In this case, if \(\theta_{tc}>1\), it means your return is greater than your transaction costs. Now, to make this analysis more general, let's assume we don't know the user's transaction cost, so let's just remove \(c\) from the equation to get our signal to turnover ratio:

$$\theta_t = \frac{r}{t}$$

An alternative interpretation of \(\theta_t\) is the break-even transaction cost rate \(c\). In other words, if your return was 0.01 (1%) and your turnover was 100% (or simply 1.0), your break-even transaction cost rate would be 0.01 (1%). If your transaction cost rate was less than 0.01, you would theoretically be making money.
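As a rough, illustrative sketch of how such a ratio might be pulled out of Alphalens (this is not the attached notebook; it assumes factor_data was built with get_clean_factor_and_forward_returns, and the exact column labels and period arguments depend on your Alphalens version):

import alphalens as al

def signal_to_turnover(factor_data, period_col='1D', period=1):
    # Top-minus-bottom mean return spread for the chosen holding period.
    mean_ret, _ = al.performance.mean_return_by_quantile(factor_data)
    spread = mean_ret[period_col].iloc[-1] - mean_ret[period_col].iloc[0]

    # Average daily turnover of the two extreme quantiles over the same period.
    quantiles = factor_data['factor_quantile']
    top, bottom = quantiles.max(), quantiles.min()
    turnover = (al.performance.quantile_turnover(quantiles, top, period).mean() +
                al.performance.quantile_turnover(quantiles, bottom, period).mean()) / 2.0

    # Roughly the break-even one-way cost per dollar traded.
    return spread / turnover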

I'll try and see if I can code an example using alphalens a bit later.

Here is the example I coded up using a basic mean reversion factor. The function at the end calculates the signal/turnover ratio. It could probably use a look from another pair of eyes, but it seems to be calculating correctly.

[Notebook attached; preview unavailable.]

Hey folks, as you can tell a few others have been doing an amazing job of stepping in and answering questions. The last several weeks have been super busy for me, but I'm working on another batch of answers now.

Thanks Michael, very useful notebook!

I think I understand why a high turnover (when compared to signal strength) is a bad thing (due to trading costs). However, in the (very helpful) video you linked to, Jess says that a too-low turnover value is also a bad thing... Honestly I don't understand this, but I think it's related to the idea that a low turnover value means it will take a long time for the factor 'prediction' to materialize(?). Doesn't this contradict the Mean IC value for each respective holding period though?

Take the below AL summary factor analysis (1D column) as an example:

If the Mean IC value is 0.008 for the 1-day holding period, doesn't that mean that about 0.8% of the time, the factor is predictive (up or down) after each 1-day holding period? However, from what I can tell from Jess' explanation in the video, a low turnover means that it may take a much longer time (than just 1 day) for the factor to materialize (I'm probably misunderstanding her somehow, I just don't know how).

This is basically why I asked for a 'reasonable range' for the factor mean turnover value, since a too high value appears bad (which I can understand), but a too low value is also bad (which I don't fully understand).

In the above example, if you or anyone else could explain (in a simple way) why the low factor mean turnover, and the super high factor AutoCorrelation value are both a bad thing, I'd be very grateful!

@Delaney,

Thanks for the update! I'm looking forward to the next batch!

This is basically why I asked for a 'reasonable range' for the factor mean turnover value, since a too high value appears bad (which I can understand), but a too low value is also bad (which I don't fully understand).

An interesting point. I'm sure there are lots of hand-wavy explanations, but one would wonder if, based on some basic statistical/information theoretic measures, one could back out a kind of natural turnover sweet spot for Quantopian's QTradeableUS? This would be irrespective of any specific factor or set of factors.

Joakim,

A factor with high autocorrelation (and most likely lower turnover) means that it does not change frequently. Therefore, you will have fewer independent bets. As a quant, if you have an edge, you want as many independent bets as possible so that the law of large numbers starts to kick in. Furthermore, as it relates to research and backtesting, you need more independent bets to generate confidence in your results.

Also, regarding the interpretation of the mean IC, the IC is a correlation coefficient of the factor "ranks" vs. the "rank" of returns on a daily basis (in other words, it is the spearman rank correlation coefficient of factor values vs. return values). Alphalens reports the mean of these daily ICs. This does not equate to saying the factor is predictive IC% of the time. Alternatively, if you square each IC and then take the mean, I believe you could say that on average your factor ranks explain \(IC^2\) of the variation in return ranks. However, for better intuition about the IC, I would recommend reading about the Fundamental Law of Active Management. It relates your information coefficient to your Information Ratio (or Sharpe Ratio). Basically, it says that your Information Ratio equals your IC times the number of independent bets:

$$ IR = IC * \sqrt{N} $$

In other words, we want high predictability, and we want to make a lot of independent predictions in order to obtain a higher risk adjusted return.
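For instance, with a modest daily IC of 0.01 and roughly 252 independent daily bets per year (ignoring real-world frictions), the annualized ratio works out to about:

$$ IR \approx 0.01 \times \sqrt{252} \approx 0.16 $$

With only 12 monthly bets at the same IC, it drops to roughly 0.035, which is the sense in which fewer independent bets hurt.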

To summarize, it is not that low turnover is such a bad thing (hence the popularity of passive index funds/ETFs). It is just that it makes it hard to prove statistically with confidence that you actually have an edge that works and is not just due to random chance.

P.S. - To Grant's point, it may be possible to come up with a theoretical model that takes into account turnover/autocorrelation, number of independent bets, desired statistical confidence, and signal strength to find a reasonable range for a given factor. I don't have a model for you here, but maybe it is something to think about.

Hi Michael,

Thank you! I think it's slowly starting to sink in to the old melon. Delaney appears to be saying the same thing in the Autocorrelation and AR Models lecture. I really appreciate how you are able to explain in simple terms something quite complicated (to me anyway).

When you say:

A factor with high autocorrelation (and most likely lower turnover)
means that it does not change frequently.

Do you mean that the factor value itself doesn't change frequently? For example, a 'quality' factor, such as ROIC, I would then expect to change quarterly (when companies release their 10-Qs), and therefore have a high Autocorrelation value, and a low Turnover value(?). However, a 'value' factor that has 'price' in it in the denominator, such as the Earnings Yield (inverse of P/E ratio), I would expect to change quite frequently (daily for most companies).

However, both the 'quality' and 'value' factors appear to have very high Autocorrelation, and low turnover? Since the 'value' factor should change almost daily, what's the reason for the 'value factor' having such high Autocorrelation (and low Turnover)? See attached based on your previous NB.

@Delaney,

Just a suggestion, but perhaps it might be useful to have a (curated) FAQ section after each Lecture?

[Notebook attached; preview unavailable.]

For AlphaLens, the Mean Factor Rank Autocorrelation is described in the code as:

Computes autocorrelation of mean factor ranks in specified time spans.
We must compare period to period factor ranks rather than factor values
to account for systematic shifts in the factor values of all names or names
within a group. This metric is useful for measuring the turnover of a
factor. If the value of a factor for each name changes randomly from period
to period, we'd expect an autocorrelation of 0.

Here's my interpretation:

  • It sounds like the starting point is, for each stock, to have a trailing window of raw factor values (a time series of trading days, ignoring weekends and holidays, I suppose). But then is the ranking done along the time axis, or across the stocks at each point in time?

  • Then, presumably, the autocorrelation of factor values for each stock is computed for various lag values (e.g. 1D, 5D, etc.). Once we have these data, the mean autocorrelation is computed for each lag value.

  • For the plots, the same computation is applied, but only using a rolling trailing window, not the whole data set. What is the length of the trailing window?

The author of the code states "This metric is useful for measuring the turnover of a factor" so how is this done? Can I actually compute the turnover from the autocorrelation via a relation (e.g. TURNOVER = FUNCTION(AUTOCORRELATION))?

Overall, a bit of feedback to Delaney on this thread is that in the end, we need guidance on how to accept/reject factors, either on an individual basis or in combination (and what alpha combination techniques to consider), vis-a-vis what might work for contest/fund-worthy algos. In the end, it is a go/no-go decision on a given factor (or combination of factors). It is easy to lose sight of the big picture with all of the discussion on the statistical minutiae.

The other consideration is that as the number of factors grows, the manual AlphaLens process breaks down. One really needs a machine to assess factors. It would be interesting to hear how this might be done (apparently, in the context of the 101 Alphas project, y'all started to think about this, and for the Q fund, you certainly must be noodling on it, since in theory, you could have hundreds, if not thousands, of algos to assess and combine...although with less than 200 contest participants and ~20 algos in the fund, you have a way to go).

Joakim,

It is the factor "ranks" that are used in calculating the autocorrelation. These ranks tend to change relatively slowly for a value factor. In the example you gave with the earnings yield, the earnings only change quarterly, but most stocks tend to go up and down with the market. So, for a stock to get more expensive relative to the rest of the universe, one of two things has to happen: the stock has to have a significant decline in earnings relative to everything else (which, as we mentioned, only happens quarterly), or the stock has to outperform the rest of the market. Outperformance can happen slowly or quickly.

I think it is useful to understand how the factor rank autocorrelation is calculated. First, I think it is helpful to understand the difference between pearson's correlation coefficient, and the spearman rank correlation coefficient. (This will help you understand the IC calculation better, too). Here are a couple links to familiarize yourself with the concept:

I also took the notebook you attached and pulled the code from Github line by line to illustrate and explain how Alphalens is arriving at this number so you can better understand it. It calculates the ranks of the factors grouped by day. It then calculates the rank correlation of each day's values with a prior day's values. You end up with T-p correlation coefficients (T=Number of days in sample; p = lag of number of days). Alphalens plots this timeseries and also reports the mean in the Turnover/Autocorrelation table.
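A stripped-down version of that computation for a one-day lag (illustrative; factor_data as produced by get_clean_factor_and_forward_returns, with its usual (date, asset) index):

# Rank factor values within each day, line the ranks up as a dates-by-assets table,
# then correlate each day's ranks with the previous day's ranks.
ranks = factor_data['factor'].groupby(level='date').rank()
rank_table = ranks.unstack(level='asset')
autocorr = rank_table.corrwith(rank_table.shift(1), axis=1)   # one value per date

print(autocorr.mean())   # the number reported in the Turnover/Autocorrelation table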

Regarding Grant's point on guidance on how to accept/reject factors, I think that Quantopian has done a good job putting a lot of information out there. That said, I don't think it is in the spirit of the company to provide an exact formula for when to accept or reject factors. The goal is to have a crowd-sourced fund where hopefully each person does something unique and different. In other words, hopefully people will come up with many different formulas. If Quantopian did have a formula(s) for this, I imagine they wouldn't need us. But that is just my 2 cents.

[Notebook attached; preview unavailable.]

Michael,

What can I say other than a big THANK YOU! Your above explanation, the two links, and your very helpful NB all helps me to finally (slowly) understand all of this. It will take me a while to digest it all, but I already feel I have a better understanding of the factor rank autocorrelation.

I also think the educational material (all FREE) on Q is excellent, but sometimes, no matter how many times I watch a lecture or go through the notebook, I don't fully grasp the material unless I can ask specific questions about it and have the answers explained to me, just like you've done above. So again, a big thank you for taking the time to put this together!

I still don't understand how the autocorrelation computation normalizes the alpha factor. If, for each point in time, the factor is ranked across stocks, then one could have dramatic jumps for a given stock versus time, since the number of stocks is not constrained. For example, if at one point in time, there are 100 stocks, and at another point, there are 10 stocks, then the ranks versus time will not be comparable. Is there an implicit assumption here that the number of stocks versus time will be roughly constant (e.g. only change by a few percent)? Or is there some under-the-hood normalization that makes the computations independent of the number of stocks?

Grant,

I don't think the autocorrelation computation will normalize for the change in number of stocks. It can only compute an autocorrelation for stocks that exist in both the current period and the lagged period. So in your example, the autocorrelation would only take into account stocks that were in both the lagged and current period.

However, it does normalize in the sense that it only picks up changes in factor values "relative" to all the stocks in the universe. In other words, let's say you had a mean reversion factor based on 1-day return. Let's just assume we are trading a universe that shows negative autocorrelation of time series returns (i.e. if today's universe average return is positive, tomorrow's return tends to be negative). However, this does not mean the "ranking" of individual returns of stocks are negatively autocorrelated. For example, you could still have a tendency for a stock that is stronger than the rest of the group be stronger than the rest of the group tomorrow (i.e. positive rank autocorrelation). Alphalens is designed to analyze cross-sectional factors. I believe this is what the documentation means when it says:

We must compare period to period factor ranks rather than factor values
to account for systematic shifts in the factor values of all names or names
within a group.

@ Michael -

Say I have stocks A, B, C, X, Y, Z, with ranks 1, 2, 3, 4, 5, 6. For 100 trading days, all is good, and then X drops out of the universe. Then, I'm left with A, B, C, Y, Z with ranks 1, 2, 3, 4, 5. Well, A, B, C are just fine, but Y jumped from 5 to 4, and Z jumped from 6 to 5. The jumps would seem problematic as stocks go in and out of the universe.

Another question for Delaney is why ranks and not z-scores or some other normalization? It would seem that ranking throws out information.

Grant,

I can see your point. It will make a difference, although it is quite small in the example you give (from 1.0 to 0.99). You could say it is a bug in the code. In your example, you would treat the X entry for both days as a NaN. This still messes with the correlation coefficient. However, my guess is that the effect would be immaterial. Maybe if you had large shifts in your universe, this could become an issue. That said, the fix is easy and can be done in one of two ways: drop the symbols that don't exist on both days, re-rank the two days, then calculate the correlation; or, more simply, keep everything the same and just use the Spearman rank correlation coefficient at the end of the computation. That will not be affected by the missing value. In your example, you would get an exact 1.0 correlation.
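To make that concrete (a toy check; requires scipy):

import numpy as np
from scipy.stats import pearsonr, spearmanr

prev_ranks = np.array([1, 2, 3, 5, 6])   # A, B, C, Y, Z on the day before X drops out
curr_ranks = np.array([1, 2, 3, 4, 5])   # the same names, re-ranked after X is gone

print(pearsonr(prev_ranks, curr_ranks)[0])    # ~0.99: the artificial dip from the gap
print(spearmanr(prev_ranks, curr_ranks)[0])   # 1.0: re-ranking at the end removes it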

Regarding your question about ranks vs. z-scores, here is my 2 cents if you care to hear it. Ranks tend to be more robust to outliers. They prevent your model from giving too much weight to some extreme value/factor, and thus protect a bit against overfitting. I believe this is similar to one of the reasons machine learning practitioners in finance will often set up a problem as a classification problem instead of a regression problem. Essentially, we are trying to predict a continuous variable (return), which lends itself to a regression setup. However, since data availability is a problem in finance (i.e. we can't just get more of it), labeling your data as a positive return or negative return (or some other labelling method, e.g. Prado's triple barrier method) reduces the possibility of overfitting the model. Whereas a regression model might heavily weight the features that were associated with a stock that had an outlier return due to some random noise or non-repeatable event, the classification method simply treats it as a positive label, just like all the other positive labels. In summary, using ranks and classification is one way to deal with possible overfitting (or the bias/variance trade-off).

You are correct though. You do lose information by ranking. You lose the magnitude of the difference between factor values. There may be signal in that extreme factor value that you want to capture. I think this should be up to the judgement of the analyst, and it will depend on the use case.

Also, I should note, the spearman rank correlation coefficient uses ranks to measure the monotonic relationship between variables, as opposed to just measuring the strength of a linear relationship. This can be desirable for measuring correlation between variables that may have a relatively positive monotonic relation, but might not be linear.

Hope this helps.

I understand the role of ranking a factor and stock returns in the information coefficient (IC) calculation (see the discussion of Spearman's rank correlation coefficient). However, my assumption is that ranked factor values are not used in subsequent calculations in AlphaLens. I was figuring that the returns analysis is based on treating the raw factor values as long-short portfolio weights, after the following normalization (roughly, in Python):

import numpy as np
alpha_demeaned = alpha - np.mean(alpha)
alpha_normalized = alpha_demeaned / np.sum(np.abs(alpha_demeaned))

So, why one then jumps back to ranks for the autocorrelation computations is a mystery.

One suggestion to Delaney would be to use this forum discussion thread as a springboard for improving the AlphaLens documentation. One approach would be to build it right into AlphaLens itself and have a doc mode flag, with all the detail an analyst would ever want (in words and standard mathematical/statistical symbols, not Python code...just make sure the code does what you say it does). Something like:

al.tears.create_information_tear_sheet(factor_data, doc=True)  

Then, your efforts here could take on some permanence.

What changes to AlphaLens will be required to accommodate the FactSet integration and addition of global equities? And if changes are required, when will they be done, relative to the upcoming FactSet/global equity launch?

Grant,

You have a point. That could be a possible option added to the autocorrelation functions in alphalens where you give the user the option to use a different normalization method. Why don't you submit an issue on the Github page? That way you can get some thoughts from the developers and/or submit a pull request.

Any reason you would create a doc flag as opposed to just adding to the doc strings of the alphalens functions? That way you can use Jupyter's functionality by either pressing SHIFT+TAB or using the ? after a function to look at the doc string. Alphalens generally uses the numpy-style doc strings (see here). If you look down to "Section #12 Notes" in the link, I believe this would be where you would add details using mathematical notation in LaTeX format.

Well, there's another question for Delaney--are there AlphaLens docstrings?

I tried this:

import alphalens as al  
al?  

and got:

Type:        ModuleWrapper  
String form: <module 'alphalens' from '__init__.pyc'>  
File:        /usr/local/lib/python2.7/dist-packages/alphalens/__init__.py  
Docstring:   <no docstring>  

There aren't any doc strings at the package or module level, but there are docstrings for the individual functions, as you seem to be aware of since you pulled documentation from one of them in your post above.

I went to github and did a copy-paste, but maybe the docstrings are accessible directly from within a notebook?

Grant,

Yes, see the attached notebook for examples on how to navigate modules and inspect doc strings.

[Notebook attached; preview unavailable.]

Michael -

Thanks - I cloned and will review at some point.

@ Delaney - in your Q & A spreadsheet, perhaps you should add a column to capture if any docstring addition/revision might be in order?

Another awesome NB - thank you! Helps a rookie python-coder like myself immensely!

Hi @Michael,

Regarding my below question, and your answer:

Sometimes I get a positive Mean IC (and risk adjusted IC), a p-value
below my cutoff, decent spread between top/bottom quantiles (seeming
to spread away from each-other), BUT a negative Annual Alpha and a
negatively trending 'Factor weighted long/short Portfolio Cumulative
Returns' graph... How is this possible, and what might be the cause(s)
of the negative returns? Should I still keep this factor, or try to
'tweak' it to get positive returns, or discard altogether?

I am not really sure on this, but keep in mind that both the alpha
calculation and the factor weighted long/short cumulative returns
chart use all stocks in the computation and weight returns
proportional to the factor value. Therefore, it would be possible to
have a positive high minus low spread, a negative alpha, and negative
factor-weighted returns if the “middle quantiles” provided a drag on
the long-short portfolio. That said, the IC coefficient should take
into account those mid-range factor values as well, so it is a little
puzzling without seeing a specific example.

Attached is a very simple example of what I described above. The factor, ROIC (a 'quality-type' factor in my book), appears to (in sample at least) have a decent positive mean IC, a p-value below my cutoff of 0.05, and top/bottom quantiles spreading away from each other (sometimes at least, and on average?), but quite a negative Ann. Alpha, and a pretty consistently negative return slope... Would you know what in the world is going on here?

[Notebook attached; preview unavailable.]

I think IC is just one screen. Sometimes high IC does not translate to high performance =(

A bit confused by the demeaned method used in al.performance.mean_return_by_quantile.

My understanding is that the spread between two quantiles' forward returns should be the same whether demeaned or not, but I'm confused by the test below:

I have a factor_data DataFrame. For the 5D forward return, if demeaned=False, quantile 1 has a lower return than quantile 2; if I set demeaned=True, quantile 1 has a higher return.

Any idea?

Thanks!

mean_return_by_q, std_err_by_q = al.performance.mean_return_by_quantile(factor_data,demeaned=False)
mean_return_by_q

factor_quantile 1D 5D 10D
1 0.000513 0.002653 0.005372
2 0.000484 0.002780 0.005221

mean_return_by_q, std_err_by_q = al.performance.mean_return_by_quantile(factor_data,demeaned=True)
mean_return_by_q

factor_quantile 1D 5D 10D
1 0.000029 0.00009 0.000352
2 -0.000042 -0.00013 -0.000513

Hi Joakim,

It took me a little bit of thinking, but I believe I have figured out what is causing your issue. I have attached a notebook to aid in the explanation.

Your factor values had extreme outliers. This is not a problem for the mean return spread. However, for any calculation that uses factor-weighted returns, it becomes an issue. (The two main areas in the tearsheet that come to mind are the alpha calculation and the cumulative factor-weighted returns chart). On more than half of the days, one asset had a 20+% weight! It even got as high as 45%.

You could probably solve this in a number of ways (factor winsorizing, factor ranking, etc.). It is really up to you.

See the notebook for some of the code I used to figure out the problem. Note, to simplify the problem, I only focused on the 1-period return values, but the same analysis could be done for the other return periods.

Also, Leo is correct, I believe there are possible situations where a positive IC may not lead to positive performance, in part because it is only measuring the correlation in ranks and doesn't take into account the magnitude of the top-bottom spread. In other words, you might do a good job predicting the ranks overall, but you do better predicting the ranks when the spread is narrow and a poor job predicting the ranks when the spread is wide. However, I don't think this is the case here.
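For anyone wanting to reproduce that check, the idea is roughly this (a sketch, not the exact notebook code):

# Mirror the factor weighting Alphalens uses: demean the factor within each day,
# scale so the absolute weights sum to 1, then look at the largest daily position.
grouped = factor_data['factor'].groupby(level='date')
demeaned = factor_data['factor'] - grouped.transform('mean')
weights = demeaned / demeaned.abs().groupby(level='date').transform('sum')

max_weight_per_day = weights.abs().groupby(level='date').max()
print((max_weight_per_day > 0.20).mean())   # fraction of days with a 20%+ position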

[Notebook attached; preview unavailable.]

@Michael,

Such an awesome answer and NB - Thank you!!

I now got this ROIC factor to indeed 'work' (during this 'in-sample' period at least) after removing the worst outliers (filtering by .percentile_between(1,99)) and 'winsorizing' the rest of the tails.

Regarding 'winsorizing', does results.MyFactor.clip(-30,30) do the same thing on a pandas Series as winsorize (from scipy.stats.mstats) does on an array? (notice how I'm pretending to know the difference between a Series and an array).

Also, is it at all possible to increase the number of 'bins' in the histogram graphs? I'm just barely able to 'see' the outliers on the graph, even after increasing the figsize (matplotlib.pyplot is another one of my many weaknesses - I'm just full of them!). It would be great to be able to 'visualize' the outliers better on the distribution plot I think.

import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

plt.figure(figsize=(18,12))
sns.distplot(results.MyFactorZ, kde=False)
stats.describe(results.MyFactorZ)

Hey Joakim,

I have attached a notebook that should give you some options on how to visualize the distribution. Note, I did not realize that pipeline factors have a winsorize method. In this new notebook, I used this method instead of the pandas clip method. Regarding winsorize from scipy.stats.mstats vs. pandas clip, the main difference is the inputs they take. In the mstats version, it looks like you input a lower and upper percentile (i.e. in the range of 0 to 1) at which you want to winsorize the data. In the pandas clip method, you input "values" for the limits. Let me know if this is not clear. (Note, the pipeline factor.winsorize method is similar to mstats in that it also takes percentile inputs).
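A tiny side-by-side, in case it helps (toy data; note the percentile vs. value inputs):

import pandas as pd
from scipy.stats.mstats import winsorize

s = pd.Series([-100., -5., -4., -3., -1., 1., 3., 4., 5., 100.])

clipped = s.clip(-30, 30)                            # caps at the values -30 and 30
winsorized = winsorize(s.values, limits=(0.1, 0.1))  # caps the bottom/top 10% of observations

print(clipped.tolist())    # -100 and 100 become -30 and 30
print(list(winsorized))    # -100 and 100 become -5 and 5 (the next-most-extreme values)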

[Notebook attached; preview unavailable.]

So awesome Michael, thank you!!!

Thank you also for including the Q-Q plot code and description! Is the red line on the first graph the probability of a normal distribution? I.e. what you're referring to below:

The theoretical distribution in the plot below is the normal
distribution.

Honestly I don't quite understand these Q-Q probability plots. When searching for 'alpha' I thought one would want to see these S shaped curves, as the 'alpha' can be found at the tails (and the further away from the normal distribution, the higher the alpha)? The more extreme S-shape, the better I thought?

Just looking at either the earlier histogram distribution plot, or the Q-Q probability plot, or both, is there a way to:

  1. Associate the extreme outliers with negative returns (or non-alpha?)?
  2. Make a reasonable judgement call as to 'where' one might want to either a) exclude/filter-out the extreme outliers, b) reduce the impact of the outliers (winsorizing), or c) both? Your earlier decision to winsorize at -30 and 30 appears to be a good call, but how did you come to this conclusion?

In short, how does one determine where the 'bad' outliers are, so one can make a reasonable judgement where to exclude and/or winsorize the tails, without leaving too much alpha on the table?

Again, thank you for all your help! I find all of it extremely helpful!

The red line in the q-q plot is just a regression line. (Note: sometimes a 45 degree line will be used instead if we are concerned about the differences in location and scale of the distribution). If the distribution on the y-axis matches the distribution on the x-axis, the points will lie on a straight line. So, since we are comparing to the normal distribution in our example, if the points lie on a straight line, we can infer that the distribution is normal.

In the attached notebook, I give some examples of what different types of distributions look like on the q-q plot. Basically, if the points are more dispersed/spread out than the normal distribution on a certain interval of the graph, the slope of the points will be steeper than the regression line (and vice versa). Here is a link to a youtube video that explains how to interpret these plots using an analogy of filling up a vase with water.
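If you want to generate one of these plots yourself, here is a minimal sketch using scipy (made-up data, just to show the mechanics):

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

data = np.random.standard_t(df=3, size=5000)   # fat-tailed (leptokurtic) sample
stats.probplot(data, dist='norm', plot=plt)    # ordered data vs. normal quantiles, plus the fitted red line
plt.show()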

When you talk about the desirable "S-Curve" shape, I believe you are referring to the IC Q-Q plots in alphalens. I have seen some of the videos where Quantopian says this curve is desirable. I believe the theory here is that if your information coefficients have a distribution with an s-shape, similar to the Leptokurtic example in my notebook, it means that more extreme values of the IC occur with a higher frequency than would be assumed by a normal distribution. This is good since higher IC values mean that our factor is more predictive. However, I believe we would prefer to have a longer right tail than left tail, so maybe we really only prefer a "half S". This is just my interpretation. Someone, please let me know if I am off base here.

In response to your question 1, if you are trying to analyze the information content of extreme outliers, first and foremost, think about why these outliers exist. Is there an economic rationale for why an extreme value might be informative and thus predictive of future returns out of sample? If there isn't a great reason for this, then maybe it just makes sense to use factor ranks or larger quantile buckets. To test it, you can always split your alphalens quantiles into more granular buckets (i.e. maybe use 10 quantiles instead of 5). Or instead of using quantiles, use the bins argument when creating your factor_data dataframe. That way you can specify cutoffs, so you can create bins that only contain the very extreme values.

In response to question 2, I didn't put much thought into my cutoff values for winsorizing. I just picked some values for illustration purposes (note, the factor ranking worked better than winsorizing in my example). But to be honest, I think this is something where you just have to use your judgement. Use a similar thought process to the one I described in the prior paragraph, and also look at the distribution of values in your in-sample data to see if maybe there are any "natural" dividing points.

(Notebook attached; preview unavailable.)

Mai Yang,

I'm looking into your question, but could you provide some more detail? When creating factor_data, are you using any grouping settings (e.g., calculating quantiles by group)? Also, if you are able, please provide a notebook example. It would be easier to isolate the problem that way.

Sooooooo good, thank you Michael!!

I can't claim to really understand it all yet, but I do find these NBs incredibly helpful (the code, but especially your commentary and the plots) in trying to get my head around these statistics and probability concepts. So again, a big thank you!!

@Delaney,

Regarding Michael's comment quoted below (from his post above), it would be great to hear your (or @Max's, or any other data science guru's) thoughts on this. Michael's comment does make intuitive sense to me (as we're mostly interested in positive ICs?), but really I wouldn't have a clue...

However, I believe we would prefer to have a longer right tail than
left tail, so maybe we really only prefer a "half S". This is just my
interpretation. Someone, please let me know if I am off base here.

@Michael,

Attached is my notebook. Thanks for picking it up.

(Notebook attached; preview unavailable.)

A bit technical, but might be useful:

In Alphalens, when you compute returns using "factor_returns", the weights are computed by "factor_weights". That method's docstring describes the computation as follows:

"Computes asset weights by factor values and dividing by the sum of their absolute value (achieving gross leverage of 1). Positive factor values will result in positive weights and negative values in negative weights."

Alternatively, it uses equal weights if the equal_weight argument is set to True.

In a real algorithm, however, the weights would be computed by the Optimize API, and that method has an "obscure" way of computing them (I say obscure because one cannot see the source code :-( and it is a much more complex method than the one in Alphalens).

Obviously, those two methods do not return the same weights (one can compute the weights and compare... not the same at all :-) ).

My current solution is to call the optimizer manually to compute the weights. An interesting fact: I see a significant difference in most of the metrics output by Alphalens when I change the method of computing the weights, especially when I set some constraints (OK, "some" might not be very exact; "a lot" is more accurate).
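To make the difference concrete, here is a rough sketch of the Alphalens-style weighting described in that docstring (a simplified reimplementation, not the library's exact code); the optimizer's weights would then be whatever it returns under your constraints:

```python
import pandas as pd

def simple_factor_weights(factor_values):
    """Weights proportional to factor values, scaled to gross leverage 1."""
    return factor_values / factor_values.abs().sum()

# Toy example: three assets on a single date
factor_values = pd.Series({'AAA': 2.0, 'BBB': -1.0, 'CCC': 1.0})
weights = simple_factor_weights(factor_values)
print(weights)               # AAA 0.50, BBB -0.25, CCC 0.25
print(weights.abs().sum())   # 1.0 -> gross leverage of 1
```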

Mai Yang,

I believe I figured out the cause of your issue. As you suggested, the mean return "spread" should be the same regardless of whether or not you demean quantile returns by the universe mean:

Without demeaning, the spread should be \(r_{top} - r_{bottom}\).
With demeaning, the spread should be \((r_{top} - r_{universe}) - (r_{bottom} - r_{universe}) = r_{top} - r_{bottom}\). So, mathematically, they simplify to the same expression.
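A quick way to check this numerically (a sketch assuming a standard Alphalens factor_data DataFrame with 5 quantiles and a '1D' return column; on a fixed version of the library both spreads should match):

```python
import alphalens as al

# Mean quantile returns with and without demeaning by the universe mean
mean_ret_raw, _ = al.performance.mean_return_by_quantile(factor_data, demeaned=False)
mean_ret_dm, _ = al.performance.mean_return_by_quantile(factor_data, demeaned=True)

# The top-minus-bottom spread should be identical either way
spread_raw = mean_ret_raw.loc[5] - mean_ret_raw.loc[1]
spread_dm = mean_ret_dm.loc[5] - mean_ret_dm.loc[1]
print(spread_raw['1D'], spread_dm['1D'])
```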

There is actually a minor bug in the al.performance.mean_return_by_quantile version that is used on the Quantopian platform. See the following issue on GitHub for details of the bug: https://github.com/quantopian/alphalens/issues/309. Once Quantopian updates alphalens on the platform, this should be resolved.

In the attached notebook, I pasted in the updated mean_return_by_quantile function from GitHub. You should see from the output that the issue is resolved with this function.

Let me know if you have any questions or if I missed anything.

(Notebook attached; preview unavailable.)

Hi David,

In response to your above post, there were a few things I wanted to clear up so nobody gets confused. The factor_weights method shifts the factor values by the median only when the equal_weight option is set to True. I believe this is so that the negative weights will be equal in magnitude to the positive weights. The only way to do this is to make sure there is an equal number of negative and positive weights; hence, the median must be used instead of the mean.
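A rough sketch of that median-centering idea (not the exact Alphalens implementation): shift by the median so that half the names end up long and half short, then assign equal-magnitude weights summing to gross leverage 1.

```python
import numpy as np
import pandas as pd

def equal_weight_sketch(factor_values):
    """Equal-magnitude weights: long above the median, short below it."""
    signs = np.sign(factor_values - factor_values.median())
    return signs / signs.abs().sum()

factor_values = pd.Series({'AAA': 3.0, 'BBB': 1.0, 'CCC': -2.0, 'DDD': -5.0})
print(equal_weight_sketch(factor_values))
# AAA and BBB get +0.25, CCC and DDD get -0.25
```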

The al.performance.mean_return_by_quantile function does not make use of factor_weights. It computes a simple arithmetic mean with equal weights within each quantile. If demeaned is True, it subtracts the mean of the asset universe.

As I write this, I just refreshed the browser where I was referencing your post, and it seems that your post was taken down, so forgive me if I'm just repeating what you already know.

Hi @Michael, @Dan, @Delaney, @Anyone else,

In the attached NB, I'm trying to apply (or exclude really) specific sectors for two different factors (value and quality), based on the idea that some factors may only have predictive power on stocks in certain sectors.

I then try to normalize and combine these two factors, with scores only on stocks in sectors that are applicable to each individual factor, hoping that the combined_factor will be stronger. Not having much luck though. What am I doing wrong?

I'm asking a few questions in the NB too, but the main thing I'm trying to achieve is to get a 'combined_factor' score for stocks that have a valid 'value' or 'quality' score, and to only score stocks for the individual factor if they are in the specified (factor specific) universe.

Take sector 103 (Financial Services) for example. In this 'example model' I'm claiming that only the 'value' factor is predictive on 'Financial Services' stocks, so the 'combined_factor' should only include the score from the 'value' factor. Does that make sense, and is this possible at all? What I'm doing is obviously not working... :(

I'm probably making a ton of mistakes here so I apologize in advance. Also, I'm not really sure what's going on with some of the graphs. I'm assuming the high top/bottom quantile spread (for the value factor) and the high mean IC for both are due to massive overfitting...?

(Notebook attached; preview unavailable.)

Hi Joakim,

As we discussed when sharing our experiences, there is out-of-sample evidence from live algorithms that performance can be improved by excluding/modifying sector weights, say constraining them anywhere from 0% to 100%. The trick is how to compute the weights on the fly... there may be some bright ideas from the community :)

Hi Karl,

I'm not even trying to do anything that fancy. All I'm trying to do here is limit each factor so that it only scores stocks in sectors where I think that factor has predictive power, then combine the factors with equal weight, hoping that the combined factor, in aggregate, has higher predictive power across all stocks in all sectors. Not sure I'm making sense here, but hope so... :)

@Joakim

You are computing the z-score of a z-score. Remove the .zscore() from the combined factor and then the sum will really be the sum :-)
combined_factor = value + quality

For the NaNs, you can use np.nan_to_num (it replaces NaN with 0).
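A minimal Pipeline sketch of that suggestion (my own illustration; the fundamental fields and universe below are just placeholders):

```python
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import Fundamentals
from quantopian.pipeline.filters import QTradableStocksUS

universe = QTradableStocksUS()

# Z-score each factor once within the universe (placeholder fields;
# sign/direction handling is omitted for brevity)...
value = Fundamentals.pe_ratio.latest.zscore(mask=universe)
quality = Fundamentals.roe.latest.zscore(mask=universe)

# ...then simply add them. Do NOT call .zscore() on the sum again,
# or you end up with the z-score of a z-score.
combined_factor = value + quality

pipe = Pipeline(columns={'combined_factor': combined_factor}, screen=universe)
```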

Good catch @David, thank you!

Joakim,

I didn't have a chance to look through your methodology and figure out why it wasn't working, but I have attached an alternative solution using pandas to combine the factors after getting the individual factors from Pipeline.

Basically, I created a sector mapping DataFrame that tells which sectors use which factors. Then I grouped the Pipeline results by sector. I then multiplied each group by the appropriate row in the sector mapping DataFrame. The combo factor is the average of the two columns. Let me know if this doesn't do what you are looking for or if you have any questions. If I get a chance, I'll take a look at your implementation and try to figure out what was going on.
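In the spirit of that approach, here is a stripped-down pandas sketch with made-up data and sector codes (the attached notebook has the full version):

```python
import numpy as np
import pandas as pd

# Made-up Pipeline-style output: raw factor scores plus a sector code per asset
results = pd.DataFrame(
    {'value':   [0.5, -1.2,  0.3, 0.8],
     'quality': [1.1,  0.4, -0.7, 0.2],
     'sector':  [103,  103,  311, 311]},
    index=['AAA', 'BBB', 'CCC', 'DDD'],
)

# Sector mapping: 1 = the factor applies in that sector, NaN = it does not
sector_map = pd.DataFrame(
    {'value': [1.0, np.nan], 'quality': [np.nan, 1.0]},
    index=pd.Index([103, 311], name='sector'),
)

# Mask each asset's factor scores by its sector, then average what survives
mask = sector_map.reindex(results['sector'])
mask.index = results.index
masked = results[['value', 'quality']] * mask
results['combined'] = masked.mean(axis=1, skipna=True)
print(results)
```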

(Notebook attached; preview unavailable.)

Joakim,

I just realized that inspecting the new results would be a good use case for qgrid. See the attached updated notebook.

(Notebook attached; preview unavailable.)

Hi there,

What a great and informative thread. I especially enjoyed Joakim Arvidsson's inquisitive questions and Michael Matthews' answers and notebooks.

I would like to share my two centavos' worth of thoughts on outliers in the data. I think it's best to go on a case by case basis as outliers in some cases will be nonsensical whereas in other cases they'll contain valuable information. I'd like to share four made up but hopefully illustrative examples of what I mean.

Ex.1 Price/earnings ratio: Imagine a company that has a price/earnings ratio of 10,000 with a market cap of, say, 10 million. This outlier doesn't contain much relevant information, as the earnings were only $1,000 and therefore very close to zero. Removing that outlier, as people traditionally do in the case of negative earnings, seems to make a lot of sense, since the information value of that outlier is ~null. Even if you do feel it contains some value, it doesn't merit its outsized influence on, e.g., the mean in a distribution where, let's say, 99%+ of the values are between 3 and 50 (or negative, and thus nonsensical).

Ex.2 Return on invested capital: Let's say a company has invested 10,000 dollars and turned that into a billion dollars (Mark Zuckerberg comes to mind, without knowing exactly how much he invested originally). Again, the small denominator will have an outsized effect on the mean of the distribution, and although it does contain some information about the impressiveness of having turned 10,000 dollars into a billion, it probably doesn't make much sense when you compare it to, for instance, industrial companies investing in machinery or factories.

Ex.3 Daily returns of stocks: If you take 1,000 stocks' daily returns on a specific day where 999 of the stocks are between -10% and +10% (with the vast majority right around zero), but one stock is up 200%, you may not want to exclude or winsorize that return, as it is very likely to contain relevant information, e.g. a biotech company getting a drug FDA-approved or a cannabis company being granted the right to sell its products in a big state like California. You may choose to exclude it anyway, but you'll have to be aware that you may be throwing away very valuable information.

Ex.4 Market cap: Say in an alternative universe a big stock's market cap makes up 20% of the combined market cap of the 500 biggest companies; let's call that company FANG Skynet (or maybe Samsung on the South Korean market?). The second biggest company has only 1% of the combined market capitalization, so FANG Skynet's market cap is an outlier in the data. This outlier does contain a lot of information, however. Whether that information is about increased pricing power in the markets it operates in or about an impending apocalypse, it may be relevant, so excluding or minimizing it doesn't seem to be a good solution.

The reason you may want to exclude the information and make your factor resemble a normal distribution is that the normal distribution has some nice qualities, for instance if you want to use the factor weighting as an input to the position sizing of your bets.
To summarize, you likely need to apply case-, factor-, or domain-specific knowledge to determine whether you keep outliers in or out (or diminish their influence), making sensible choices depending on the specific problem at hand and on whether the information the outlier provides is important for your specific purpose.

Hi Bjarke,

Great post. It does a good job illustrating the thought process that I think a researcher should use when looking at outliers. It also shows the value of having a fundamental understanding of what you are trying to model in addition to solid quantitative skills.

Hi everyone,

So I have done quite a lot of work on factor analysis lately. My first step was to create a DataFrame that is close to the output of "get_clean_factor_and_forward_returns", with the difference that it takes the output of a Pipeline as input (instead of a factor DataFrame) and can handle multiple factors. The factors are passed as a list of the Pipeline column names you want to include. It returns a DataFrame very similar to the usual Alphalens one, except that the factor and factor_quantile columns have different names for each factor. A small utility then lets you get back the usual data used by Alphalens.
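To illustrate the kind of layout being described (a hypothetical sketch, not David's actual code), each factor gets its own <name>_factor and <name>_factor_quantile column alongside the shared forward-return columns, and a small helper can slice the standard Alphalens columns back out for one factor at a time:

```python
import pandas as pd

# Hypothetical layout: index is a (date, asset) MultiIndex as usual, forward
# return columns ('1D', '5D', '10D') are shared, and each factor contributes
# its own '<name>_factor' and '<name>_factor_quantile' columns.

def single_factor_view(multi_factor_data, name, periods=('1D', '5D', '10D')):
    """Slice one factor back out into the usual Alphalens factor_data shape."""
    out = multi_factor_data[list(periods)].copy()
    out['factor'] = multi_factor_data[name + '_factor']
    out['factor_quantile'] = multi_factor_data[name + '_factor_quantile']
    return out

# usage (assuming `multi_factor_data` has been built as described):
# value_data = single_factor_view(multi_factor_data, 'value')
```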

Might be useful, I don't know... Sorry, I won't share the data analysis tools that take this data as input (I might create a fork of Alphalens to include those tools in the far, far future).

Happy hacking :-)

(Notebook attached; preview unavailable.)

Hey David,

Take a look at these two threads on GitHub. They may be of some interest to you, and it would be great if you had ideas to contribute!

There is a development branch called "factors_interactions" in the Alphalens GitHub repository. The idea for this branch was to add functionality to Alphalens to aid in the analysis of how multiple factors combine and interact with each other (which would be a nice complement to the current functionality which helps analyze the predictive ability of a single alpha factor). I haven't pushed any commits recently. However, I have been doing some work in the research environment to brainstorm ideas on what I would like to add. I will probably share more on this in the near future, but I showed a little bit of the work in this post.

I had seen the branch :-). It looks not so active lately :-)

Currently I am more focused on ranking the factors. My goal is to create a large factor library (I am at around 250 now) and then rank them within each group (I have grouped things by sector, but also by other metrics).

About your post on factor combination: I really like the figure "Evaluate Correlation between Factor Values". An idea to make it even better: instead of plotting the points all in the same color, one could color them with a colormap mapped to the returns... or use a 3D plot. The whole point is to figure out whether two factors are worth combining.
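For example (a quick matplotlib sketch with placeholder column names), the factor-vs-factor scatter could be colored by the forward return:

```python
import matplotlib.pyplot as plt

# Assumes a DataFrame `df` with two factor columns and a forward-return
# column; the column names below are placeholders.
sc = plt.scatter(df['factor_a'], df['factor_b'],
                 c=df['1D'], cmap='RdYlGn', s=8, alpha=0.7)
plt.colorbar(sc, label='1-day forward return')
plt.xlabel('factor_a')
plt.ylabel('factor_b')
plt.title('Factor vs. factor, colored by forward return')
plt.show()
```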

Hi,
I'm just working through the create_full_tear_sheet output, and am confused about the very first output it gives, labelled "Quantiles Statistics"... what are these the statistics of? I originally assumed they were the returns for each factor_quantile over the forward evaluation period, but they are not. The values for the means in this table are not the same as in the first graph, labelled "Mean Period Wise Returns by Factor Quantile".
I have had a good search through the docs and videos, but can't seem to find any discussion of what this first table actually represents... any ideas?
Cheers,
Bruce

Hi Bruce,

The Quantiles Statistics table describes the actual factor values themselves. For example, if you were using a price-to-earnings factor, the mean in that table would be the average of the price-to-earnings ratios of the stocks, grouped by quantile.
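For instance, you can reproduce that table (approximately) straight from factor_data; this is a sketch assuming the standard 'factor' and 'factor_quantile' column names:

```python
# Group the raw factor values by quantile to recover the per-quantile
# min/max/mean/std/count shown in the "Quantiles Statistics" table
quantile_stats = (factor_data
                  .groupby('factor_quantile')['factor']
                  .agg(['min', 'max', 'mean', 'std', 'count']))
print(quantile_stats)
```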

Mike

Is there a way to use Alphalens with factors such as economic indicators which are not assigned to each individual stock?