Alphalens - Performance analysis of predictive alpha factors

A previous post showcased a pre-release version of Alphalens.

Algorithms rely on predictive factors for success. We call these factors alphas. Alphas express a predictive relationship between some given set of information and future returns. By applying this relationship to multiple stocks we can hope to generate an alpha signal and trade off of it. Alphalens is a Python package for performance analysis of alpha factors which can be used to create factor models and cross-sectional equity algos.

Analyzing your factors in Research first allows you to spend less time writing, running and analyzing backtests with Pyfolio. Consequently, this allows for faster iteration of ideas, and a final algorithm that you can be confident in. Building a rigorous workflow with Alphalens will make your strategies more robust and less prone to overfitting - things we look for when evaluating algorithms.

Alphalens is only one part of the algorithm creation process. The main function of Alphalens is to surface the most relevant statistics and plots about a single alpha factor. This information can tell you if the alpha factor you found is predictive -- whether you have found an "edge." These statistics cover:

• Returns Analysis
• Information Analysis
• Turnover Analysis
• Group Analysis
• Event-style Analysis
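To make the first two items concrete, here is a small pandas sketch of the kind of computation behind them, on made-up data. It computes the information coefficient (IC) as the per-date Spearman rank correlation between factor values and forward returns, and the mean forward return per factor quantile. This is an illustration under those assumptions, not Alphalens' own code; the data, the column name "1D" and the two-quantile split are arbitrary choices.

```python
import pandas as pd

# Made-up data: 3 dates x 5 assets, indexed by (date, asset)
dates = pd.to_datetime(["2017-01-03", "2017-01-04", "2017-01-05"])
assets = ["A", "B", "C", "D", "E"]
index = pd.MultiIndex.from_product([dates, assets], names=["date", "asset"])

df = pd.DataFrame({
    "factor": [1, 2, 3, 4, 5,    # date 1: factor ranks agree with returns
               5, 4, 3, 2, 1,    # date 2: factor ranks are reversed
               1, 2, 3, 5, 4],   # date 3: mostly agree
    "1D": [0.01, 0.02, 0.03, 0.04, 0.05] * 3,  # next-day forward returns
}, index=index)

# Information analysis: Spearman rank IC per date
# (Pearson correlation of ranks, which avoids a SciPy dependency)
ic = df.groupby(level="date").apply(
    lambda g: g["factor"].rank().corr(g["1D"].rank()))

# Returns analysis: split each date's factor values into 2 quantiles,
# then average the forward return within each quantile
df["quantile"] = df.groupby(level="date")["factor"].transform(
    lambda x: pd.qcut(x, 2, labels=False) + 1)
mean_ret_by_q = df.groupby("quantile")["1D"].mean()
spread = mean_ret_by_q.loc[2] - mean_ret_by_q.loc[1]  # top minus bottom quantile
```

Here the IC is 1.0, -1.0 and 0.9 on the three dates (mean 0.3), and the top quantile earns more than the bottom one on average, which is the sort of evidence of an "edge" the tear sheets summarize.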

Using Alphalens in Quantopian Research is pretty simple:

1. Define your pipeline alpha factors

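from quantopian.pipeline import CustomFactor
from quantopian.pipeline.data.builtin import USEquityPricing

class Momentum(CustomFactor):
    inputs = [USEquityPricing.close]
    window_length = 252
    def compute(self, today, assets, out, close):
        # close has shape (window_length, n_assets); row 0 is the oldest day.
        # Price ratio from ~1 year ago to 20 trading days ago (skips the last month).
        out[:] = close[-20] / close[0]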


2. Create and run your pipeline

from quantopian.pipeline import Pipeline
from quantopian.research import run_pipeline
alpha_pipe = Pipeline(columns={'my_factor': Momentum()})
alphas = run_pipeline(alpha_pipe, start_date=start, end_date=end)


3. Get pricing data

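import pandas as pd
assets = alphas.index.levels[1].unique()
# `end` must be a pd.Timestamp, not a string; the extra 30 days leave room for forward returns
pricing = get_pricing(assets, start, end + pd.Timedelta(days=30), fields="open_price")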


4. Run the Alphalens factor tear sheet.

import alphalens

# Ingest and format data
factor_data = alphalens.utils.get_clean_factor_and_forward_returns(alphas['my_factor'], pricing)

# Run analysis
alphalens.tears.create_full_tear_sheet(factor_data)
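Step 4 works from forward returns, which is why step 3 asks for about 30 calendar days of prices past `end`: a factor value observed on the last date needs later prices before it can be scored. The pandas sketch below shows the basic idea on made-up prices. `forward_returns` is a hypothetical helper, and Alphalens' own implementation does additional alignment with factor dates and cleaning, so treat this as an illustration rather than its code.

```python
import pandas as pd

# Hypothetical open prices in the wide layout get_pricing returns:
# one row per trading day, one column per asset
prices = pd.DataFrame(
    {"AAPL": [100.0, 101.0, 102.0, 104.0, 103.0],
     "MSFT": [50.0, 50.5, 51.0, 50.0, 52.0]},
    index=pd.date_range("2017-01-02", periods=5, freq="B"),
)

def forward_returns(prices, period):
    """Hypothetical helper: return from each day's price to the price
    `period` rows later, stored on the day the position would be opened."""
    return prices.shift(-period) / prices - 1

fwd_1 = forward_returns(prices, 1)
fwd_2 = forward_returns(prices, 2)
# The last `period` rows are NaN because no later prices exist for them.
```

Those trailing NaN rows are exactly what the extra pricing days are meant to fill in for the final factor dates.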


# Get started with the Factor Analysis Lecture

Here are some places to check out too:
- Alphalens Docs for an analysis of a professional alpha factor.
- Alphalens Github repo
- Example notebook from the repo to use anywhere
- Alphalens pre-release post

Thanks to Luca and other community members for your contributions!

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.


Looks great! Can't wait to try it out!

What were the major changes from the pre-release version?

We introduced a completely new API:

OLD (Please don't use this, it's deprecated and will be removed at some point)

alphalens.tears.create_factor_tear_sheet(alphas['my_factor'], pricing)


NEW

factor_data = alphalens.utils.get_clean_factor_and_forward_returns(alphas['my_factor'], pricing)
alphalens.tears.create_full_tear_sheet(factor_data)


This new interface is more standardized and makes it much easier for users to call the performance functions independently of the tear sheets. It is also faster, especially in groupby operations. In addition, since the full tear sheet is pretty big, we've broken it up into smaller tear sheets:

• create_returns_tear_sheet
• create_information_tear_sheet
• create_turnover_tear_sheet
• create_full_tear_sheet

including a much shortened summary tear sheet (pretty much the only one I use now)...

• create_summary_tear_sheet

We also moved the event-style analysis into its own tear sheet:

• create_event_returns_tear_sheet

Thanks! Are there any performance upgrades? Previously I would hit timeouts on more than a few years of sample data.

Thanks James for the clarification. I was digging through the code and noticed the above as well. That really helps.

@Dan

I haven't had any timeout issues with Alphalens, but I didn't before the change either...perhaps the timeouts were from something else that was running in the NB?? But there are quite a few performance upgrades: we recalculate a lot less, use pandas categoricals, joins are more efficient, etc.

Also, the strengths of the new API are really showcased in the new Factor Analysis lecture.

Hi, nice tool. I found the Alphalens code on GitHub, and there is an example folder in it with two examples. There are also .jpg pictures showing the results. How can I save the results as a .jpg file, especially the DataFrames?

The video for the Factor Analysis Lecture is no longer available. Is there somewhere else to view it?

Note: when using the example notebook, you'll need to edit the code to pass a parameter to create_information_tear_sheet, as otherwise the code won't run:

al.tears.create_information_tear_sheet(factor_data)


You can also replace:

al.tears.create_full_tear_sheet(factor_data, by_group=True)


with:

al.tears.create_summary_tear_sheet(factor_data)


...to get a simplified tear sheet.

Finally, here's some code to plug in for a different factor to evaluate. This time it's the "Low Volatility" factor, also called the "Low Volatility Anomaly":

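import numpy as np
from quantopian.pipeline import CustomFactor
from quantopian.pipeline.factors import Returns

class MyFactor(CustomFactor):
    """ Low Volatility factor """
    # One-day returns (2-day window) over the past 252 trading days
    inputs = [Returns(window_length=2)]
    window_length = 252

    def compute(self, today, assets, out, returns):
        # Negative, as we want to go long low vol and short high vol
        out[:] = -np.nanstd(returns, axis=0)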
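A quick way to see what this compute method does, outside Quantopian, is to run the same body on a plain NumPy array. The sketch assumes `returns` arrives as a (window_length x n_assets) array with one row per day, which is how CustomFactor passes its inputs; `low_vol_compute` is a hypothetical standalone copy of the method body, and the returns are simulated.

```python
import numpy as np

def low_vol_compute(out, returns):
    # Hypothetical standalone copy of MyFactor.compute's body.
    # returns: shape (window_length, n_assets); out: shape (n_assets,)
    out[:] = -np.nanstd(returns, axis=0)

rng = np.random.default_rng(0)
returns = np.column_stack([
    rng.normal(0.0, 0.005, 252),  # calm asset: ~0.5% daily volatility
    rng.normal(0.0, 0.03, 252),   # volatile asset: ~3% daily volatility
])
returns[10, 0] = np.nan           # a missing day; nanstd skips it

out = np.empty(returns.shape[1])
low_vol_compute(out, returns)
# out[0] (calm) is greater than out[1] (volatile): low volatility scores higher
```

The minus sign is what makes the calm asset land in the top quantile, so Alphalens' long-top/short-bottom view matches the low-volatility thesis.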


It's still running, but I'm getting a lot of warnings now.

Is this going to be updated?

/usr/local/lib/python2.7/dist-packages/alphalens/plotting.py:727: FutureWarning: pd.rolling_apply is deprecated for Series and will be removed in a future version, replace with Series.rolling(center=False,min_periods=1,window=5).apply(args=,func=,kwargs=)
min_periods=1, args=(period,))
/usr/local/lib/python2.7/dist-packages/alphalens/plotting.py:767: FutureWarning: pd.rolling_apply is deprecated for DataFrame and will be removed in a future version, replace with DataFrame.rolling(center=False,min_periods=1,window=5).apply(args=,func=,kwargs=)
min_periods=1, args=(period,))
/usr/local/lib/python2.7/dist-packages/alphalens/plotting.py:727: FutureWarning: pd.rolling_apply is deprecated for Series and will be removed in a future version, replace with Series.rolling(center=False,min_periods=1,window=10).apply(args=,func=,kwargs=)
min_periods=1, args=(period,))
/usr/local/lib/python2.7/dist-packages/alphalens/plotting.py:767: FutureWarning: pd.rolling_apply is deprecated for DataFrame and will be removed in a future version, replace with DataFrame.rolling(center=False,min_periods=1,window=10).apply(args=,func=,kwargs=)
min_periods=1, args=(period,))
/usr/local/lib/python2.7/dist-packages/alphalens/plotting.py:727: FutureWarning: pd.rolling_apply is deprecated for Series and will be removed in a future version, replace with Series.rolling(center=False,min_periods=1,window=20).apply(args=,func=,kwargs=)
min_periods=1, args=(period,))
/usr/local/lib/python2.7/dist-packages/alphalens/plotting.py:767: FutureWarning: pd.rolling_apply is deprecated for DataFrame and will be removed in a future version, replace with DataFrame.rolling(center=False,min_periods=1,window=20).apply(args=,func=,kwargs=)
min_periods=1, args=(period,))
/usr/local/lib/python2.7/dist-packages/alphalens/plotting.py:727: FutureWarning: pd.rolling_apply is deprecated for Series and will be removed in a future version, replace with Series.rolling(center=False,min_periods=1,window=45).apply(args=,func=,kwargs=)
min_periods=1, args=(period,))
/usr/local/lib/python2.7/dist-packages/alphalens/plotting.py:767: FutureWarning: pd.rolling_apply is deprecated for DataFrame and will be removed in a future version, replace with DataFrame.rolling(center=False,min_periods=1,window=45).apply(args=,func=,kwargs=)
min_periods=1, args=(period,))
/usr/local/lib/python2.7/dist-packages/alphalens/plotting.py:519: FutureWarning: pd.rolling_mean is deprecated for Series and will be removed in a future version, replace with Series.rolling(window=22,center=False).mean()

@PaulB those warnings are nothing to worry about; they just remind developers that the code will need updating eventually, when they decide to move to a more recent version of pandas. Anyway, I'd love to see an updated version of Alphalens on Q.
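For anyone updating their own code that triggers these warnings, the replacement they point to is pandas' method-based rolling API. A minimal sketch, assuming pandas 0.23 or later (for `raw=`); `scaled_mean` is a hypothetical function standing in for whatever was passed to `pd.rolling_apply`:

```python
import numpy as np
import pandas as pd

def scaled_mean(window, period):
    # Hypothetical stand-in for the function passed to rolling_apply
    return np.nanmean(window) * period

s = pd.Series(np.arange(30, dtype=float))

# Old: pd.rolling_apply(s, 5, scaled_mean, min_periods=1, args=(2,))
rolled = s.rolling(window=5, min_periods=1).apply(scaled_mean, raw=True, args=(2,))

# Old: pd.rolling_mean(s, 22)
mean22 = s.rolling(window=22).mean()
```

`raw=True` hands the function a NumPy array instead of a Series, which is closer to the old `rolling_apply` behaviour and usually faster.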

alphalens.utils.get_clean_factor_and_forward_returns(alphas['my_factor'], pricing)

That example doesn't specify quantiles, groupby or periods. At least at first, I would prefer none of them, to keep things simple, but when they are not specified the defaults are applied. I can set quantiles=2 and periods=(1,5) and wind up with just this:

                                                     1        5
Ann. alpha                                      -0.183   -0.277
beta                                             0.021    0.020
Mean Period Wise Return Bottom Quantile (bps)    3.626   20.283
Mean Period Wise Spread (bps)                   -7.862   -8.641


But I want a single overall alpha score, if that makes sense.
Is it reasonable or wrong to be interested in a single value as opposed to quantiles and periods?

The reason: I want to record an alpha value for all 900+ current fundamentals like this for sorting.

Does anyone know how to obtain a single alpha value?

Hi everyone,

Regarding Lecture 38: Factor Analysis, there is a mistake in how the function plot_top_bottom_quantile_turnover() is used. Please see the details here.

You are trying to add a Timedelta to a string:

pricing = get_pricing(assets, '01-01-2010', '02-01-2018' + pd.Timedelta(days=30), fields="open_price")

# This cannot work
'02-01-2018' + pd.Timedelta(days=30)



Try this:

import datetime
import math

def get_daily_price(sid_universe, start_date, end_date, extra_days_before=0, extra_days_after=0):
    """
    Returns a DataFrame of daily open prices, padded with extra trading days
    before start_date and after end_date (dates given as "YYYY-MM-DD" strings).
    """
    # Convert trading days to calendar days (~365/252), plus a 3-day safety margin
    extra_days = math.ceil(extra_days_before * 365.0/252.0) + 3
    start_date = datetime.datetime.strptime(start_date, "%Y-%m-%d") - datetime.timedelta(days=extra_days)
    start_date = start_date.strftime("%Y-%m-%d")
    extra_days = math.ceil(extra_days_after * 365.0/252.0) + 3
    end_date = datetime.datetime.strptime(end_date, "%Y-%m-%d") + datetime.timedelta(days=extra_days)
    end_date = end_date.strftime("%Y-%m-%d")
    pricing = get_pricing(sid_universe, start_date=start_date, end_date=end_date, fields='open_price')
    return pricing
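If the only goal is to pad the end date, a lighter fix is to turn the string into a `pd.Timestamp` before adding the `Timedelta`. A minimal sketch; ISO `YYYY-MM-DD` strings are used to avoid the month/day ambiguity of `'02-01-2018'`, and the `get_pricing` call is left as a comment because it only exists in Quantopian Research:

```python
import pandas as pd

start = pd.Timestamp("2010-01-01")
end = pd.Timestamp("2018-02-01")          # a Timestamp, not a string
padded_end = end + pd.Timedelta(days=30)  # Timestamp + Timedelta works

# pricing = get_pricing(assets, start, padded_end, fields="open_price")
```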


Can one use Fundamentals data in Alphalens (to define a factor)?

Yes, sure. You can certainly use the tool with price/volume information only, but there is little left to be discovered there, so running Alphalens on Fundamentals or other alternative datasets is probably the way to go. If you look at the NB, you can see that you just have to run Pipeline on any factor you like and pass the output to Alphalens. More generally, you can use Alphalens without Pipeline, but it requires a bit more work in the data preparation step.
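As a rough sketch of that data preparation step, here are the two inputs in the shape `get_clean_factor_and_forward_returns` documents: the factor as a Series indexed by a (date, asset) MultiIndex, and prices as a wide DataFrame with dates as rows and assets as columns. The values are made up, real inputs would have many more assets per date, and the Alphalens call itself is left as a comment.

```python
import pandas as pd

trading_days = pd.date_range("2017-01-02", periods=5, freq="B")
factor_days = trading_days[:3]  # factor observed on the first 3 days only
assets = ["AAPL", "MSFT"]

# Factor: one value per (date, asset); level 0 = date, level 1 = asset
factor = pd.Series(
    [0.5, -0.2, 0.4, -0.1, 0.3, 0.0],
    index=pd.MultiIndex.from_product([factor_days, assets], names=["date", "asset"]),
)

# Prices: dates as rows, assets as columns, extending past the last
# factor date so forward returns can be computed for it
prices = pd.DataFrame(
    {"AAPL": [100.0, 101.0, 102.0, 104.0, 103.0],
     "MSFT": [50.0, 50.5, 51.0, 50.0, 52.0]},
    index=trading_days,
)

# import alphalens
# factor_data = alphalens.utils.get_clean_factor_and_forward_returns(factor, prices)
```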

Thanks Luca. I have been trying to use the following factor based on fundamentals:

class Factor(CustomFactor):
    inputs = [USEquityPricing.close]
    window_length = 11
    def compute(self, today, assets, out, price):
        x = Fundamentals.pb_ratio.latest
        out[:] = np.nan_to_num(x)


but I am getting an error message that I do not know how to resolve:

out[:] = np.nan_to_num(x)
ValueError: setting an array element with a sequence.


The Fundamentals fields you want to use inside the compute method should be passed via inputs:

class Factor(CustomFactor):
    inputs = [Fundamentals.pb_ratio]
    window_length = 11
    def compute(self, today, assets, out, pb_ratio):
        out[:] = np.nan_to_num(pb_ratio)


Thank you, Luca. I think there might be one small modification required to make the code run error-free:

 out[:] = np.nan_to_num(pb_ratio[-1])
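The reason the `[-1]` is needed: inside compute, each input arrives as a 2-D array with one row per day of the window and one column per asset, while `out` holds exactly one value per asset. A NumPy sketch with made-up numbers standing in for `pb_ratio`:

```python
import numpy as np

window_length, n_assets = 11, 3
# Stand-in for the pb_ratio input: one row per day, one column per asset
pb_ratio = np.arange(window_length * n_assets, dtype=float).reshape(window_length, n_assets)
pb_ratio[-1, 1] = np.nan  # a missing latest value

out = np.empty(n_assets)  # compute() must fill exactly one value per asset

try:
    out[:] = np.nan_to_num(pb_ratio)  # (11, 3) cannot be assigned into (3,)
    full_window_fits = True
except ValueError:
    full_window_fits = False

out[:] = np.nan_to_num(pb_ratio[-1])  # latest row only; NaN becomes 0.0
```

Since only the latest row is used, `window_length = 1` would likely be enough for this factor as well.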