Once you have created an alpha factor in Research, you can analyze it using Alphalens. With Alphalens, you can explore the predictive ability of an alpha factor without having to write an algorithm or run a backtest.
When should you use Alphalens?
Alphalens was designed to be used early and often when developing a factor. Once you have defined a trading universe and constructed a factor in a pipeline, it is best to analyze the factor using Alphalens in Research before writing an algorithm and running a backtest. The primary reason to use Alphalens before backtesting is that Alphalens is fast. It is much faster to analyze a factor with Alphalens than it is to run a full backtest in Quantopian's algorithm IDE. While the Quantopian backtester focuses on simulating order fills and realistic trading conditions, Alphalens analyzes the "best case", focusing on aligning factor data to future prices to see if a theoretical predictive relationship exists. This way, you can discard ideas that don't work before spending time writing them into an algorithm.
If Alphalens reveals that a factor is predictive of forward returns, the next step is to work that factor into an algorithm and backtest it to see if the signal holds up in realistic trading conditions.
What does Alphalens do?
Alphalens provides a host of statistics and plots about an alpha factor, including:
- Returns Analysis
- Information Coefficient Analysis
- Turnover Analysis
- Grouped Analysis
These plots and analyses are designed to make it quick and visual to determine whether an alpha factor is worth adding to an algorithm and running through the backtester.
How do you use Alphalens?
On Quantopian, analyzing an alpha factor with Alphalens is done in two steps:
- Transform data into the correct format for Alphalens ingestion.
- Run a tearsheet to run an Alphalens analysis and view the results.
Alphalens works with data that is structured in a particular way. This section explains how to structure your data so that you can analyze a factor in Alphalens.
The best way to structure your data for use in Alphalens is to use the get_clean_factor_and_forward_returns() utility function. This function takes factor data and pricing data as input and then appropriately constructs a data structure that can be fed directly as input to Alphalens' plotting and analysis tools.
get_clean_factor_and_forward_returns() requires factor data and pricing/returns data in order to create valid input for Alphalens.
1. Factor Data. Alphalens accepts factor data in the format of a single-column pipeline output. More specifically, the factor needs to be stored in a Series indexed by timestamp (level 0) and asset (level 1). The best way to get a factor in this output format is to run a pipeline in Research and slice into the factor column.
2. Pricing or Returns Data. Alphalens also requires pricing or returns data that is stored in a wide-form DataFrame indexed by timestamp with assets in the columns. The default output format of get_pricing() produces this output structure with pricing data when multiple assets are provided.
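The two input shapes described above can be sketched with plain pandas. This is only an illustration with made-up values: the tickers, the column names, and the `pipeline_result` stand-in are hypothetical, and no Quantopian functions are called.

```python
import pandas as pd

dates = pd.date_range("2015-01-02", periods=3, freq="B")
assets = ["AAA", "BBB"]  # hypothetical tickers

# Stand-in for a pipeline result: rows indexed by (date, asset),
# one column per pipeline term. Values are made up.
index = pd.MultiIndex.from_product([dates, assets], names=["date", "asset"])
pipeline_result = pd.DataFrame(
    {
        "my_factor": [0.1, -0.2, 0.3, 0.0, -0.1, 0.4],
        "other_column": [1, 2, 3, 4, 5, 6],
    },
    index=index,
)

# 1. Factor data: slicing one column yields a Series that keeps the
#    (date, asset) MultiIndex.
alpha_factor = pipeline_result["my_factor"]

# 2. Pricing data: a wide DataFrame with one row per date and
#    one column per asset.
pricing_data = pd.DataFrame(
    {"AAA": [10.0, 10.5, 10.2], "BBB": [20.0, 19.5, 19.8]},
    index=dates,
)

print(alpha_factor)
print(pricing_data)
```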
Once you've collected your alpha factor values and pricing data as described above, use get_clean_factor_and_forward_returns() to transform them into the format required by Alphalens tearsheets. For example, assuming you've defined some function make_pipeline() that returns a Pipeline object, you can format your data like this:

    import alphalens

    # Run the pipeline and slice out the single factor column.
    pipeline_result = run_pipeline(make_pipeline(), '2015-01-01', '2016-01-01')
    alpha_factor = pipeline_result['my_factor']

    # Level 1 of the pipeline index holds the assets.
    asset_list = alpha_factor.index.levels[1].unique()

    # Pricing ends a month after the factor data so forward returns can be computed.
    pricing_data = get_pricing(asset_list, '2015-01-01', '2016-02-01', fields='close_price')

    factor_data = alphalens.utils.get_clean_factor_and_forward_returns(factor=alpha_factor, prices=pricing_data)

In the example above, the date range for pricing_data ends after the date range for alpha_factor. For an explanation, see the Max Loss section.
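To see why the pricing data should extend past the last factor date, here is a minimal sketch with made-up prices. It computes forward returns with plain pandas as an approximation for illustration; it is not Alphalens's actual implementation, and the ticker and `period` value are hypothetical.

```python
import pandas as pd

dates = pd.date_range("2015-01-02", periods=6, freq="B")
prices = pd.DataFrame(
    {"AAA": [10.0, 11.0, 12.0, 12.5, 13.0, 13.5]},  # made-up prices
    index=dates,
)

period = 2  # hypothetical forward-return horizon, in rows of the price table

# Forward return on each date: the price `period` rows later, relative to today.
forward_returns = prices.shift(-period) / prices - 1

# The last `period` dates have no future price, so their forward returns
# are missing and those rows would have to be dropped.
missing = forward_returns["AAA"].isna()
dropped_fraction = missing.mean()

print(forward_returns)
print("fraction of rows without a forward return:", dropped_fraction)
```

If the factor dates stopped two rows before the end of the price table, every factor date would have a forward return and nothing would be dropped for this reason.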
Calling get_clean_factor_and_forward_returns() with factor data and pricing data returns data structured for use with Alphalens. In the example above, factor_data will be a pd.DataFrame indexed by date (level 0) and asset (level 1). For each asset-date pair, the pd.DataFrame contains the values for a single alpha factor, forward returns for each period, the factor quantile/bin that factor value belongs to, and (optionally) the group the asset belongs to. Forward returns column names follow the format accepted by
Once your data is formatted for use in Alphalens, you can get key metrics about your factor using the functions in alphalens.performance and alphalens.tears. The alphalens.performance functions provide specific, standalone plots and tables, while the alphalens.tears functions provide "tearsheets" that output many plots and tables to provide a comprehensive analysis of your alpha factor.
The most commonly used function is create_full_tear_sheet(), which runs a full tearsheet including plots of mean returns, turnover, information coefficient, and more.
Tips & Tricks
Mean Period Wise Returns
When comparing mean period wise returns for the same factor across different tearsheets, you will need to consider the periods you are comparing.
For example, say you are running two full tearsheets for the same set of alpha factor values. To get your factor data for Tearsheet A, you run get_clean_factor_and_forward_returns() with periods=(1, 5, 10). To get your factor data for Tearsheet B, you run it with periods=(10, 20, 30). The mean period wise returns plots for the 10D period will not be the same across these two tearsheets.
This is because mean period wise returns are not simply excess returns. Rather, mean period wise returns reflect the rate of returns -- returns adjusted as if they had grown linearly over time. When computing the rate of return, Alphalens uses the shortest period length as the base time. As such, the mean period wise returns data will vary based on the periods selected.
Why does Alphalens use the rate of returns? Returns compound over time, so it can be difficult/counterintuitive to compare returns accurately across different time periods.
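The effect of the base period can be illustrated with a little arithmetic. The exact conversion Alphalens applies is not spelled out here, so the sketch below assumes a compounding conversion, (1 + r)^(base / period) - 1, purely for illustration; the helper `to_base_period_rate` is hypothetical and not part of Alphalens.

```python
def to_base_period_rate(period_return, period_days, base_days):
    """Express a return earned over `period_days` as a rate per `base_days`.

    Assumes compounding; this is an illustrative conversion only.
    """
    return (1.0 + period_return) ** (base_days / period_days) - 1.0


ten_day_return = 0.10  # made-up 10-day return

# Tearsheet A: periods=(1, 5, 10), so the base period is 1 day.
rate_a = to_base_period_rate(ten_day_return, period_days=10, base_days=1)

# Tearsheet B: periods=(10, 20, 30), so the base period is 10 days.
rate_b = to_base_period_rate(ten_day_return, period_days=10, base_days=10)

print("10D value with a 1-day base: ", rate_a)
print("10D value with a 10-day base:", rate_b)
```

The same underlying 10-day return is reported as a much smaller number when the base period is one day, which is why the 10D charts differ between the two tearsheets.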
Max Loss
When formatting your data, you may see a message like this:
Dropped 1.6% entries from factor data: 1.6% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
If the total percentage of entries dropped exceeds the max_loss parameter value (by default 35%), you will receive a fatal error.
In order to avoid this error, reduce the number of entries dropped in the forward returns computation and the binning phase:
- "Forward returns computation" refers to the calculation of forward returns based on factor and pricing data. Dropped entries in forward returns computation are likely due to a lack of pricing data -- specifically, there is insufficient data to compute forward returns over the periods specified. Some dropped entries in forward returns computation are unavoidable due to legitimately missing pricing data; however, you can usually greatly reduce the number of dropped entries in this stage by ensuring that your pricing data end date extends sufficiently past your alpha factor data end date.
- "Binning phase" refers to the division of assets into bins (as dictated by the quantiles parameter). Entries are dropped in the binning phase when there is insufficient granularity to sort them into bins. For example, if you have quantiles = 3 for data that is 50% zeros, your binning loss will be extremely high. To reduce the number of dropped entries in this stage, try reducing the number of bins/quantiles or increasing the granularity of your data.
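The binning problem can be reproduced with pandas alone. The sketch below assumes quantile binning behaves like pd.qcut, which refuses to build bins when the quantile edges are not unique; the factor values are made up and `try_binning` is a hypothetical helper, not an Alphalens function.

```python
import pandas as pd

# Made-up factor values for a single date: half of them are exactly zero.
factor_values = pd.Series([0, 0, 0, 0, 0, 1, 2, 3, 4, 5], dtype=float)


def try_binning(values, quantiles):
    """Return 1-based quantile labels, or None if the bins cannot be formed."""
    try:
        return pd.qcut(values, quantiles, labels=False) + 1
    except ValueError:
        # Duplicate bin edges: too many identical values for this many bins.
        return None


three_bins = try_binning(factor_values, 3)  # first two edges are both 0
two_bins = try_binning(factor_values, 2)    # edges 0, 0.5, 5 are distinct

print("3 quantiles:", three_bins)
print("2 quantiles:", None if two_bins is None else two_bins.tolist())
```

With three quantiles the bins cannot be formed for this date, while two quantiles split the values cleanly, which matches the advice to reduce the number of quantiles or increase the granularity of the data.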
ValueError: Wrong number of items passed 2, placement implies 1
This error is generally caused by incorrectly shaped factor values data; specifically, attempting to pass values for multiple factors to the factor parameter of get_clean_factor_and_forward_returns().
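The shape difference behind this error can be checked before formatting the data. The sketch below uses a made-up stand-in for a pipeline result with two hypothetical factor columns, `factor_a` and `factor_b`; it only inspects shapes with pandas and does not call Alphalens.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [pd.date_range("2015-01-02", periods=2, freq="B"), ["AAA", "BBB"]],
    names=["date", "asset"],
)
# Stand-in for a pipeline result containing two factor columns (made-up values).
pipeline_result = pd.DataFrame(
    {"factor_a": [0.1, 0.2, 0.3, 0.4], "factor_b": [1.0, 2.0, 3.0, 4.0]},
    index=index,
)

# Wrong shape: a DataFrame holding values for two factors at once.
two_factors = pipeline_result[["factor_a", "factor_b"]]

# Expected shape: a Series with one value per (date, asset) pair.
one_factor = pipeline_result["factor_a"]

print(type(two_factors).__name__, two_factors.shape)
print(type(one_factor).__name__, one_factor.shape)
```

Analyzing each factor separately, by slicing one column at a time, keeps the input in the single-factor shape described earlier.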