Once you have created an alpha factor in Research, you can analyze it using Alphalens. With Alphalens, you can explore the predictive ability of an alpha factor without having to write an algorithm or run a backtest.
When should you use Alphalens?
Alphalens was designed to be used early and often when developing a factor. Once you have defined a trading universe and constructed a factor in a pipeline, it is best to analyze the factor using Alphalens in Research before writing an algorithm and running a backtest. The primary reason to use Alphalens before backtesting is that Alphalens is fast. It is much faster to analyze a factor with Alphalens than it is to run a full backtest in Quantopian's algorithm IDE. While the Quantopian backtester focuses on simulating order fills and realistic trading conditions, Alphalens analyzes the "best case", focusing on aligning factor data to future prices to see if a theoretical predictive relationship exists. This way, you can discard ideas that don't work before spending time writing them into an algorithm.
If Alphalens reveals that a factor is predictive of forward returns, the next step is to work that factor into an algorithm and backtest it to see if the signal holds up in realistic trading conditions.
What does Alphalens do?
Alphalens provides a host of statistics and plots about an alpha factor, including:
- Returns Analysis
- Information Coefficient Analysis
- Turnover Analysis
- Grouped Analysis
These plots and analyses are designed to make it quick and visual to determine whether an alpha factor is worth adding to an algorithm and running through the backtester.
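To give a feel for one of the analyses listed above, the Information Coefficient is conventionally measured as the Spearman rank correlation between factor values and forward returns on each date. The sketch below uses made-up numbers and a hypothetical helper, `daily_ic`; it illustrates the idea and is not Alphalens's own implementation.

```python
import pandas as pd

# Toy data (made up): factor values and 1-day forward returns
# for 4 assets on 2 dates.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2015-01-02", "2015-01-05"]), ["A", "B", "C", "D"]],
    names=["date", "asset"],
)
data = pd.DataFrame(
    {
        "factor": [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0],
        "forward_return": [0.01, 0.02, 0.03, 0.04, 0.01, 0.02, 0.03, 0.04],
    },
    index=index,
)


def daily_ic(frame):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    return frame["factor"].rank().corr(frame["forward_return"].rank())


# One IC value per date.
ic_by_date = pd.Series(
    {date: daily_ic(frame) for date, frame in data.groupby(level="date")}
)
print(ic_by_date)
```

On the first date the factor ranks the assets in the same order as their returns, and on the second date in the opposite order, so the two IC values sit at opposite ends of the range.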
How do you use Alphalens?
On Quantopian, analyzing an alpha factor with Alphalens is done in two steps:
- Generate factor and returns data.
- Create a tearsheet to run an Alphalens analysis and view the results.
Generating Factor and Returns Data
Alphalens is a factor analysis tool that requires you to bring your own data to be analyzed. In order to analyze a factor on Quantopian, you need to first create a factor and run it with the Pipeline API and then get returns data using get_forward_returns().
Alphalens accepts factor data in the format of a single column pipeline output. More specifically, the factor needs to be stored in a Series indexed by timestamp (level 0) and asset (level 1). The best way to get a factor in this output format is to run a pipeline in Research, and slice into the factor column.
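The shape of that input can be illustrated with a hand-built Series. This is only a sketch: the dates, ticker strings, values, and the name `my_factor` are made up here, and on Quantopian the Series would come from a pipeline result rather than being typed in.

```python
import pandas as pd

# Hand-built stand-in for one column of a pipeline result (values are made up).
dates = pd.to_datetime(["2015-01-02", "2015-01-05"])
assets = ["AAPL", "MSFT", "XOM"]  # placeholders; a pipeline would supply asset objects
index = pd.MultiIndex.from_product([dates, assets], names=["date", "asset"])

factor_data = pd.Series(
    [0.5, -1.2, 0.3, 0.7, -0.9, 0.1],
    index=index,
    name="my_factor",
)

# Level 0 holds timestamps and level 1 holds assets.
print(factor_data.index.nlevels)
print(factor_data.loc[pd.Timestamp("2015-01-05")])
```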
Alphalens also requires returns data for the assets in your factor dataframe. The best way to get returns data for running an Alphalens tearsheet is to use get_forward_returns(). The get_forward_returns() method takes your factor data as input and returns a dataframe of forward returns data in the required format.
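To make "forward returns" concrete: for each date and asset, the N-day forward return is the percentage price change from that date to N trading days later. The sketch below computes this from a made-up price table with a hypothetical helper, `forward_returns`. It is not how get_forward_returns() is implemented, and the `1D`-style column names are an assumption about the output layout.

```python
import pandas as pd

# Made-up daily close prices: rows are dates, columns are assets.
prices = pd.DataFrame(
    {"A": [100.0, 101.0, 103.0, 102.0], "B": [50.0, 49.0, 49.5, 51.0]},
    index=pd.to_datetime(
        ["2015-01-02", "2015-01-05", "2015-01-06", "2015-01-07"]
    ),
)


def forward_returns(prices, periods):
    """Return N-row-ahead returns per (date, asset) for each N in periods."""
    columns = {}
    for n in periods:
        # Price n rows ahead divided by today's price, aligned to today's date.
        fwd = prices.shift(-n) / prices - 1.0
        # Reshape from wide (date x asset) to long, indexed by (date, asset).
        columns[f"{n}D"] = fwd.unstack().swaplevel().sort_index()
    return pd.DataFrame(columns).rename_axis(["date", "asset"])


returns = forward_returns(prices, periods=(1, 2))
print(returns)
```

Rows near the end of the price history have no future price to compare against, so their forward returns are missing. Entries like these are the kind that get dropped when factor and returns data are combined.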
Combining Factor and Returns Data
As a last step, the factor and returns data obtained from the steps above need to be combined into a single dataframe. Alphalens makes this step easy by offering a utility method: get_clean_factor(). The example below demonstrates getting factor data from a Pipeline output, getting returns data from get_forward_returns(), and then combining the data using get_clean_factor():
    from quantopian.pipeline.domain import US_EQUITIES
    from quantopian.research import run_pipeline, get_forward_returns
    import alphalens as al

    # <Pipeline code goes here>
    pipeline_result = run_pipeline(make_pipeline(), '2015-01-01', '2016-01-01')

    # Slice the factor column out of the pipeline output.
    factor_data = pipeline_result['my_factor']

    # This example assumes that the pipeline above was run on the US_EQUITIES domain.
    al_returns = get_forward_returns(
        factor=factor_data,
        periods=,  # supply the forward-return period lengths here
        domain=US_EQUITIES
    )

    # Combine the factor and returns data and add quantile labels.
    al_data = al.utils.get_clean_factor(
        factor_data,
        al_returns,
        quantiles=5,
        bins=None,
    )

    al.tears.create_full_tear_sheet(al_data)
In the example above, we also provided the argument quantiles=5 to get_clean_factor. Alphalens tearsheets include quantile analysis, and get_clean_factor adds the additional quantile data to al_data before we create the tearsheet.
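The quantile data is, in essence, a label saying which slice of that day's factor ranking each asset falls into. A minimal sketch of the idea, using `pandas.qcut` on made-up values with a hypothetical helper `assign_quantiles`, might look like this; Alphalens's own binning handles more cases than shown here.

```python
import pandas as pd

# Made-up factor values for 5 assets on 2 dates, indexed by (date, asset).
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2015-01-02", "2015-01-05"]), list("ABCDE")],
    names=["date", "asset"],
)
factor = pd.Series(
    [0.1, 0.4, 0.2, 0.9, 0.5, 1.0, 0.3, 0.8, 0.2, 0.6], index=index
)


def assign_quantiles(factor, quantiles):
    """Label each value 1..quantiles relative to the other assets on its date."""

    def bucket(day_values):
        # qcut returns 0-based bin numbers; shift them to start at 1.
        return pd.qcut(day_values, quantiles, labels=False) + 1

    return factor.groupby(level="date").transform(bucket)


factor_quantile = assign_quantiles(factor, quantiles=5)
print(factor_quantile)
```

Binning within each date, rather than across the whole sample, keeps the labels comparable from day to day even if the overall level of the factor drifts.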
Once your data is formatted for use in Alphalens, you can get key metrics about your factor using the functions in alphalens.performance and alphalens.tears. The alphalens.performance functions provide specific, standalone plots and tables, while the alphalens.tears functions provide "tearsheets" that output many plots and tables to provide a comprehensive analysis of your alpha factor.
The most commonly used function is the create_full_tear_sheet() function, which runs a full tearsheet including plots of mean returns, turnover, information coefficient, and more.
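As an illustration of what the mean returns view summarizes, the sketch below averages forward returns within each factor quantile and compares the top and bottom quantiles. The data is made up, and the column names `1D` and `factor_quantile` are assumptions about the combined dataframe's layout rather than a statement of it.

```python
import pandas as pd

# Made-up combined data: one row per (date, asset) observation, with a
# 1-day forward return and a quantile label.
al_like = pd.DataFrame(
    {
        "1D": [0.010, -0.020, 0.030, 0.000, 0.020, -0.010],
        "factor_quantile": [1, 2, 3, 1, 2, 3],
    }
)

# Mean 1-day forward return per quantile.
mean_by_quantile = al_like.groupby("factor_quantile")["1D"].mean()

# Difference between the top and bottom quantiles' mean returns.
spread = mean_by_quantile.loc[3] - mean_by_quantile.loc[1]

print(mean_by_quantile)
print(spread)
```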
Tips & Tricks
When comparing mean period-wise returns for the same factor across different tearsheets, you need to consider which periods each tearsheet was given. This is because mean period-wise returns are not simply excess returns. Rather, they reflect a rate of return: returns adjusted as if they had grown at a steady rate over time. When computing the rate of return, Alphalens uses the shortest period length as the base time. As such, the mean period-wise returns data will vary based on the periods selected.
For example, say you are running two full tearsheets for the same set of alpha factor values. For tearsheet A, you get forward returns data with periods=(1, 5, 10). For tearsheet B, you get forward returns data with periods=(10, 20, 30). The mean period-wise returns plots for the 10D period will not be the same across these two tearsheets. This is because in tearsheet A, the mean period-wise returns are normalized to daily returns (period=1D), whereas in tearsheet B, the returns are normalized to 10-day returns (period=10D). The underlying data is the same, but the units plotted on the Y axis are different between tearsheets A and B.
Why does Alphalens use the rate of returns? Returns compound over time, so it is important to normalize all of the different periods provided to a single tearsheet to a single unit rate of returns. This is why many backtests are expressed in units of "annualized returns", even if a simulation is run for multiple years. Most people want to know the rate at which a strategy produces returns, not just the total return amount.
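The arithmetic behind the two-tearsheet example can be sketched in a few lines. This assumes the conversion compounds returns at a steady rate, and `rate_of_return` is a hypothetical helper written for this illustration, not a quote of Alphalens's code.

```python
def rate_of_return(period_return, period_days, base_days):
    """Convert a return earned over period_days into the equivalent
    compounded return per base_days."""
    return (1.0 + period_return) ** (base_days / period_days) - 1.0


ten_day_return = 0.10  # made-up 10-day return of 10%

# Tearsheet A: the shortest period is 1 day, so the 10D figure is expressed per day.
rate_a = rate_of_return(ten_day_return, period_days=10, base_days=1)

# Tearsheet B: the shortest period is 10 days, so the 10D figure is left as it is.
rate_b = rate_of_return(ten_day_return, period_days=10, base_days=10)

print(round(rate_a, 6))
print(round(rate_b, 6))
```

The same 10% ten-day return shows up as a bit under 1% in tearsheet A and as 10% in tearsheet B, which is why the Y axes of the two 10D plots are not directly comparable.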
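When get_clean_factor() prepares your data, it reports how many entries it dropped, in a message like this:
Dropped 1.6% entries from factor data: 1.6% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).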
If the total percentage of entries dropped exceeds the max_loss parameter value (by default 35%), you will receive a fatal MaxLossExceededError. In order to avoid this error, make sure that you are not providing inf values as input to your tearsheet and make sure that your choice of quantiles or bins when calling get_clean_factor() is appropriate. In most cases, if a high percentage of entries are dropped during the binning phase, it means that the choice of quantiles or bins doesn't suit the distribution of the factor data. For example, if you have quantiles=3 for data that is 50% zeros, your binning loss will be extremely high. To reduce the number of dropped entries in this stage, try reducing the number of bins/quantiles or increasing the granularity of your data.
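The 50% zeros case can be reproduced outside Alphalens with `pandas.qcut`. The sketch below uses made-up values and a hypothetical helper, `try_quantiles`, to show why three quantiles fail on such data while two succeed. It only illustrates the underlying binning problem, not Alphalens's exact handling of it.

```python
import pandas as pd

# Made-up factor values for one date: half of them are exactly zero.
values = pd.Series([0, 0, 0, 0, 0, 1, 2, 3, 4, 5], dtype=float)


def try_quantiles(values, quantiles):
    """Return 1-based quantile labels, or None if the bin edges are not unique."""
    try:
        return pd.qcut(values, quantiles, labels=False) + 1
    except ValueError:
        return None


# With 3 quantiles, the lowest edge and the one-third edge are both 0,
# so the bins cannot be formed.
three = try_quantiles(values, 3)

# With 2 quantiles, the median falls between the zeros and the other values.
two = try_quantiles(values, 2)

print(three is None)
print(two.tolist())
```

When binning fails for a date in this way, none of that date's entries can be labeled, which is how a poor choice of quantiles turns into a high binning loss.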
While preparing your data for Alphalens, you might see an error message like this:
ValueError: Wrong number of items passed 2, placement implies 1