An updated method to analyze alpha factors

We recently released a great alphalens tutorial. While that is the perfect introduction to analyzing factors, we are also constantly evolving our thinking and analyses. In this post, I want to give people an updated, if less polished, way of analyzing factors. In addition, this notebook contains some updated thoughts on what constitutes a good factor, and tips on how to build one, that we have not shared before. Thus, if you want to increase your chances of scoring well in the contest or getting an allocation, I think this is a good resource to study.

[Notebook attached; preview unavailable.]

17 responses
import empyrical as ep
import matplotlib.pyplot as plt
ep.cum_returns_final(perf_attribution).plot.barh()
plt.xlabel('cumulative returns')

Yes, very helpful indeed. It makes it a great deal more obvious and tractable. And anything that improves speed is more than welcome.

On speed, working out of US hours seems to make a difference. The code below executed within 5 to 10 minutes this morning. Last night I finally shut it down when it had not completed after an hour.

To be honest, I felt rather depressed. Your tools are innovative and fascinating (to me at least), but often unusable in terms of the time they take to run.

It is particularly helpful for instance to see how you are calculating "specific returns".

# run_pipeline and get_pricing are built into the Quantopian research environment;
# make_pipeline() is defined earlier in the notebook.
from alphalens.utils import get_clean_factor_and_forward_returns
from alphalens.performance import mean_information_coefficient

pipeline_output = run_pipeline(
    make_pipeline(),
    start_date='2007-01-01',
    end_date='2016-11-01'  # *** NOTE *** factor data ends in November 2016
)

pricing_data = get_pricing(
    pipeline_output.index.levels[1],
    start_date='2007-10-08',
    end_date='2017-11-01',  # *** NOTE *** pricing must extend past the factor data so forward returns can be computed
    fields='open_price'
)

factor_data = get_clean_factor_and_forward_returns(
    pipeline_output['factor_to_analyze'],
    pricing_data,
    periods=range(1, 252, 20)  # use a larger step for long look-forward periods to save time
)

mean_information_coefficient(factor_data).plot()

Incidentally, with a backtest you can shut down the web page locally on your own computer and it continues to run on the Quantopian server, so you can initiate a number of backtests, go off to do some gardening, and return later to see the results.

Notebooks do not seem to work this way? I tried the same trick with your most interesting new notebook, came back and fired it up, but the calculations had not been performed. The notebook had not been shut down (the memory usage on your server was still the same), yet the calculations had not run.

Just an idea for you.

@Zenothestoic: Yes, that's a downside of research. The difficulty is that we know when a backtest is finished, whereas a kernel just keeps running, so we would potentially need to keep it running indefinitely. The issue with the tab closing is discussed here at length (without a solution, however): https://github.com/jupyter/notebook/issues/1647

FWIW, nothing in that NB was that slow for me. Which parts are you referring to specifically?

No, Thomas, not your notebook. I was referring to the standard Alphalens notebook from which my quoted code above was taken. And thank you for the extra information. You have great tools here. I am looking forward to contributing.

In the code following the paragraph "Risk Exposure", the following error needs correcting; the erroneous and corrected lines are shown below:

# erroneous:
# pos = (pos / divide(pos.abs().sum())).reindex(pricing.index).ffill().shift(delay)
# corrected:
pos = (pos / (pos.abs().sum())).reindex(pricing.index).ffill().shift(delay)

It is correct later on in "Putting it Altogether" but I just thought I should note it.

@Zenothestoic: Thanks, should be fixed now.

Very cool, Thomas! Thanks for the awesome post.

Hi @Thomas - This is very useful. One result that's currently a bit baffling to me is a plotted factor exposure range of [-15, 20], whereas I would have expected a maximum range of [-1, 1], and preferably one of [-0.2, 0.2] as shown in the example you shared. Any thoughts on how this is possible? My annual volatility is high for specific returns (0.6), but lower on individual exposures (<0.1).

Separately, the default Cumulative Returns and Annual Volatility charts shown in the lower right are for delay=0, which I didn't find relevant for what I'm researching. In case anyone else finds it useful, you can update the delay on those charts by passing the delay parameter to factor_portfolio_returns().

portfolio_returns, portfolio_pos = factor_portfolio_returns(factor, pricing, delay=2)  

@Cem: Re exposure range: Is it possible you are not equal weighting your factor?
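For reference, here is a minimal sketch of one common way to equal-weight a factor (my own code, not taken from the notebook), assuming factor is a (date, asset) MultiIndexed pandas Series as elsewhere in this thread: go long the names above each day's cross-sectional median and short the rest, all with the same absolute weight, so gross exposure sums to 1 and the plotted exposures stay bounded.

import numpy as np

# +1 for names above the daily cross-sectional median, -1 below, 0 at the median
signs = np.sign(factor - factor.groupby(level=0).transform('median'))
# normalize so the absolute weights sum to 1 on each day
weights = signs / signs.abs().groupby(level=0).transform('sum')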

Also to clarify: delay=0 means you are trading into the factor when it's available, so perhaps you compute it some time before the close and trade into it on that same day. delay=1 then means that you have one additional day to act on the signal. I agree, though, that delay=1 is probably more relevant here.
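In code terms (a sketch based on the .shift(delay) convention visible in the snippet quoted earlier, with positions assumed to be a dates-by-assets DataFrame of weights):

# delay=0: trade into the weights on the day the factor is computed;
# delay=1: trade one trading day later, and so on.
delayed_positions = positions.shift(delay)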

Thank you, Thomas, for the notebook. I definitely needed it to better determine the alpha of my factors and also to save a couple of steps in development. By the way, thumbs up on your blog post about copulas.

Glad you find it useful. Feel free to post your tearsheet here if you have an interesting factor that looks promising. After running the NB, you can just delete the cells in the NB that have your factor logic and only leave the tearsheet output to not reveal your IP.

Hi Thomas,

Thank you for this awesome notebook! I thought I'd give it a try with a simple yet effective factor: fcf_yield from Morningstar (effective up until March 2016, that is; not so much thereafter).

I wanted to first see when Mean IC and Specific Returns 'peak' for this factor, and then possibly apply an SMA to the factor based on that (as per your comments in the NB). However, as you can see in the attached, both Mean IC and Specific Returns just keep rising...

I've most likely made a mistake somewhere, though I didn't really change much from your original NB. Or could it be related to the 'data overlapping' problem in AL that Michael Matthews mentioned in another thread a while back?

[Notebook attached; preview unavailable.]

@Joakim: Great, thanks for posting this. I don't think you did anything wrong here, code-wise, and I've definitely seen this before. I don't know the exact cause but can come up with two hypotheses:
1. The factor is short volatility, which is something that has been working (sort of) well for a long time. As such, if you just keep betting on something that keeps going up, you will see this slow stacking.
2. The IC overall is quite low; as such, it does not take much to keep it modestly increasing.

I think it could be a combination of these two but happy to hear other thoughts.

Also, the factor is daily and seems to have a long alpha horizon (if we were to believe this, which I don't, but for the sake of argument). In that case you'll want to either subsample or, better, smooth the signal a bit by taking the average, like I did originally. When you're just developing the factor it's fine to do it the way you do here; I'm just saying that could be a good next step if you wanted to develop this idea further.
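As a rough sketch of what that smoothing could look like (my own code, not from the notebook), assuming pipeline_output is the usual (date, asset) MultiIndexed DataFrame with a 'factor_to_analyze' column:

# Smooth the daily factor with a ~63-day (one quarter) simple moving average
# per asset before passing it to get_clean_factor_and_forward_returns.
smoothed_factor = (
    pipeline_output['factor_to_analyze']
    .unstack()              # rows: dates, columns: assets
    .rolling(window=63)     # roughly one quarter of trading days
    .mean()
    .stack()                # back to a (date, asset) MultiIndexed Series
    .rename('factor_to_analyze')
)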

Here is a factor anomaly that shows up in several research papers: the investment-to-assets ratio.

[Notebook attached; preview unavailable.]
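For anyone who wants to experiment with this idea, here is a hedged sketch (my own assumption of the common definition from the literature, not the code in the attached notebook) of how an investment-to-assets style factor might be expressed in pipeline, using year-over-year growth in total assets:

from quantopian.pipeline import CustomFactor
from quantopian.pipeline.data import Fundamentals

class InvestmentToAssets(CustomFactor):
    # Year-over-year change in total assets, scaled by lagged total assets.
    # The anomaly literature finds that high asset growth predicts lower
    # returns, so the negative of this value would typically be the signal.
    inputs = [Fundamentals.total_assets]
    window_length = 252  # roughly one year of trading days

    def compute(self, today, assets, out, total_assets):
        out[:] = (total_assets[-1] - total_assets[0]) / total_assets[0]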

Thanks Thomas, very helpful!

"the factor is daily and seems to have a long alpha horizon"

Using your NB, I wanted to try to find the best rebalance period for the factor, and/or what window_length to use when smoothing the factor with an SMA. How would I go about doing this using your NB?

"or, better, smooth the signal a bit by taking the average, like I did originally."

Yes, this was what I was planning to do. How can I determine a reasonable period to use in the smoothing though (without risking too much overfitting)? The 'price' portion of the yield updates daily, and is noisy, so should I smooth by 63 days (FCF should update quarterly)?

I tried smoothing over 65 days in the attached, and it doesn't appear to make any difference.

Note: I'm not really pursuing this specific factor, just using it as an example when trying to learn how to use your NB. Also, the full NB didn't complete due to the memory limit and kernel restarting.

[Notebook attached; preview unavailable.]

Yeah, you probably don't need any smoothing as the factor only changes once a quarter. In that case, however, you should definitely sub-sample the factor to that frequency to avoid overlapping windows when computing the IC curve; this might change things.
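A minimal sketch of what that sub-sampling could look like (my own code, assuming factor is a (date, asset) MultiIndexed Series): keep roughly one observation per quarter so that consecutive observations no longer overlap.

# keep every ~63rd trading day (roughly quarterly)
quarterly_dates = factor.index.levels[0][::63]
factor_quarterly = factor[factor.index.get_level_values(0).isin(quarterly_dates)]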