Alphalens

Once you have created an alpha factor in Research, you can analyze it using Alphalens. With Alphalens, you can explore the predictive ability of an alpha factor without having to write an algorithm or run a backtest.

When should you use Alphalens?

Alphalens was designed to be used early and often when developing a factor. Once you have defined a trading universe and constructed a factor in a pipeline, it is best to analyze the factor using Alphalens in Research before writing an algorithm and running a backtest. The primary reason to use Alphalens before backtesting is that Alphalens is fast. It is much faster to analyze a factor with Alphalens than it is to run a full backtest in Quantopian's algorithm IDE. While the Quantopian backtester focuses on simulating order fills and realistic trading conditions, Alphalens analyzes the "best case", focusing on aligning factor data to future prices to see if a theoretical predictive relationship exists. This way, you can discard ideas that don't work before spending time writing them into an algorithm.

If Alphalens reveals that a factor is predictive of forward returns, the next step is to work that factor into an algorithm and backtest it to see if the signal holds up in realistic trading conditions.

What does Alphalens do?

Alphalens provides a host of statistics and plots about an alpha factor, including:

  • Returns Analysis
  • Information Coefficient Analysis
  • Turnover Analysis
  • Grouped Analysis

These plots and analyses are designed to make it quick and visual to determine whether an alpha factor is worth adding to an algorithm and running through the backtester.

How do you use Alphalens?

On Quantopian, analyzing an alpha factor with Alphalens is done in two steps:

  1. Transform data into the correct format for Alphalens ingestion.
  2. Run a tearsheet to analyze the factor and view the results.

Transforming Data

Alphalens works with data that is structured in a particular way. This section explains how to structure your data so that you can analyze a factor in Alphalens.

The best way to structure your data for use in Alphalens is to use the get_clean_factor_and_forward_returns() utility function. This function takes factor data and pricing data as input and then appropriately constructs a data structure that can be fed directly as input to Alphalens' plotting and analysis tools.

get_clean_factor_and_forward_returns() requires two inputs in order to create valid input for Alphalens: factor data and pricing or returns data.

1. Factor Data. Alphalens accepts factor data in the format of a single column pipeline output. More specifically, the factor needs to be stored in a MultiIndex Series indexed by timestamp (level 0) and asset (level 1). The best way to get a factor in this output format is to run a pipeline in Research, and slice into the factor column.

Example of alpha factor values data.
date        asset  value
2014-01-01  AAPL   0.5
            BA     1.1
            CMG    1.7
            DAL    0.1
            LULU   2.7
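
As an illustration of this layout, the sketch below builds a small factor Series by hand with pandas. The values, the string tickers, and the name my_factor are placeholders chosen for the example; a pipeline run in Research would typically index by asset objects rather than plain strings.

```python
import pandas as pd

# Hypothetical factor values for two dates and three assets (illustrative only).
dates = pd.to_datetime(["2014-01-01", "2014-01-02"])
assets = ["AAPL", "BA", "CMG"]

# Level 0 is the timestamp, level 1 is the asset.
index = pd.MultiIndex.from_product([dates, assets], names=["date", "asset"])
factor = pd.Series([0.5, 1.1, 1.7, 0.4, 1.3, 1.5], index=index, name="my_factor")

# Cross-sections along either level: all assets on one date,
# or the full history of one asset.
one_day = factor.xs(pd.Timestamp("2014-01-01"), level="date")
one_asset = factor.xs("BA", level="asset")
```

Checking factor.index.nlevels and factor.index.names is a quick way to confirm that a Series has the two-level shape described above before passing it on.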

2. Pricing or Returns Data. Alphalens also requires pricing or returns data that is stored in a wide form DataFrame indexed by timestamp with assets in the columns. The default output format of get_pricing() produces this output structure with pricing data when multiple assets are provided.

Example of pricing data.
Date        AAPL    BA     CMG    DAL    LULU
2014-01-01  605.12  24.58  11.72  54.43  37.14
2014-01-02  604.35  22.23  12.21  52.78  33.63
2014-01-03  607.94  21.68  14.36  53.94  29.37
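
To make the wide format concrete, here is a minimal pandas sketch of a pricing frame and of a one-period forward return derived from it. The prices are made-up numbers and forward_1 is a name used only in this example; it shows the general idea of aligning each date to a future price, not the exact computation Alphalens performs.

```python
import pandas as pd

# Hypothetical closing prices (illustrative values only): one row per date,
# one column per asset.
prices = pd.DataFrame(
    {
        "AAPL": [100.0, 102.0, 101.0, 105.0],
        "BA": [50.0, 49.0, 51.0, 52.0],
    },
    index=pd.to_datetime(["2014-01-01", "2014-01-02", "2014-01-03", "2014-01-06"]),
)
prices.index.name = "date"

# 1-period forward return: the change from each row to the next row,
# stored on the earlier date.
forward_1 = prices.shift(-1) / prices - 1

# The final date has no later price, so its forward return is missing.
last_row_missing = bool(forward_1.iloc[-1].isna().all())
```

The missing values on the last row are the reason pricing data generally needs to run past the end of the factor data.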

Once you've collected your alpha factor values and pricing data as described above, use get_clean_factor_and_forward_returns() to transform it into the format required by Alphalens tearsheets. For example, assuming you've defined some function make_pipeline() that returns a Pipeline object, you can format your data like this:

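import alphalens
from quantopian.research import run_pipeline

# Run the pipeline and slice out the single factor column.
# get_pricing() is assumed to be available in the Research environment.
pipeline_result = run_pipeline(make_pipeline(), '2015-01-01', '2016-01-01')
alpha_factor = pipeline_result['my_factor']

# Collect every asset that appears in the factor data.
asset_list = alpha_factor.index.levels[1].unique()

# Pricing data ends one month after the factor data ends.
pricing_data = get_pricing(asset_list, '2015-01-01', '2016-02-01', fields='close_price')

factor_data = alphalens.utils.get_clean_factor_and_forward_returns(
    factor=alpha_factor,
    prices=pricing_data,
)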

Note

In the example above, the date range for pricing_data ends after the date range for alpha_factor. For an explanation, see the Max Loss section.

Running get_clean_factor_and_forward_returns() with factor data and pricing data returns data that is structured for use with Alphalens. In the example above, factor_data is a MultiIndex pd.DataFrame indexed by date (level 0) and asset (level 1). For each asset-date pair, the pd.DataFrame contains the value of a single alpha factor, the forward returns for each period, the factor quantile/bin that the factor value belongs to, and (optionally) the group the asset belongs to. Forward returns column names follow the format accepted by pd.Timedelta (e.g. 1D, 5D, 10D).

Example of Alphalens data.
Date        Asset  1D     5D     10D     factor  group  factor_quantile
2014-01-01  AAPL    0.09  -0.01  -0.079   0.5    G1     3
            BA      0.02   0.06   0.020  -1.1    G2     5
            CMG     0.03   0.09   0.036   1.7    G2     1
            DAL    -0.02  -0.06  -0.029  -0.1    G3     5
            LULU   -0.03   0.05  -0.009   2.7    G1     2
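
The layout above can be approximated with plain pandas. This is only a sketch of the shape of the result, not Alphalens' implementation: the forward returns are made-up numbers, to_quantile is a hypothetical helper, and two quantiles are used to keep the example small.

```python
import pandas as pd

dates = pd.to_datetime(["2014-01-01", "2014-01-02"])
assets = ["AAPL", "BA", "CMG", "DAL"]
index = pd.MultiIndex.from_product([dates, assets], names=["date", "asset"])

# Hypothetical forward returns and factor values for each (date, asset) pair.
factor_data = pd.DataFrame(
    {
        "1D": [0.01, -0.02, 0.03, 0.00, 0.02, 0.01, -0.01, 0.04],
        "factor": [0.5, 1.1, 1.7, 0.1, 0.9, 0.2, 1.4, 0.6],
    },
    index=index,
)


def to_quantile(values, q=2):
    # With labels=False, qcut returns 0-based bin codes; shift them to 1-based.
    return pd.qcut(values, q, labels=False) + 1


# Bin factor values separately on each date, so that a quantile describes
# an asset's rank relative to the other assets on that same date.
factor_data["factor_quantile"] = (
    factor_data.groupby(level="date")["factor"].transform(to_quantile)
)
```

Binning per date rather than across the whole sample is an assumption of this sketch; it keeps a quantile comparable from one day to the next even if the overall level of the factor drifts.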

See also

There are a number of other functions in alphalens.utils that are useful for data cleaning and transformation. See the full list in the API Reference.

Running Tearsheets

Once your data is formatted for use in Alphalens, you can get key metrics about your factor using the functions in alphalens.performance and alphalens.tears. The alphalens.performance functions provide specific, standalone plots and tables, while the alphalens.tears functions provide "tearsheets" that output many plots and tables to provide a comprehensive analysis of your alpha factor.

The most commonly used function is create_full_tear_sheet(), which runs a full tearsheet including plots of mean returns, turnover, information coefficient, and more.
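
On Quantopian the tearsheet itself would be produced by passing the formatted data to alphalens.tears.create_full_tear_sheet(). To give a feel for two of the quantities such a tearsheet reports, the sketch below computes a mean forward return per quantile and a daily rank information coefficient with pandas alone. The data is invented, rank_ic is a hypothetical helper, and the calculations are simplified stand-ins rather than the library's own code.

```python
import pandas as pd

dates = pd.to_datetime(["2014-01-01", "2014-01-02"])
assets = ["AAPL", "BA", "CMG", "DAL"]
index = pd.MultiIndex.from_product([dates, assets], names=["date", "asset"])

# Hypothetical data in the Alphalens layout (values are illustrative only).
factor_data = pd.DataFrame(
    {
        "1D": [0.01, -0.02, 0.03, 0.00, 0.02, 0.01, -0.01, 0.04],
        "factor": [0.5, 1.1, 1.7, 0.1, 0.9, 0.2, 1.4, 0.6],
        "factor_quantile": [1, 2, 2, 1, 2, 1, 2, 1],
    },
    index=index,
)

# Mean 1D forward return of all entries that fall in each quantile.
mean_return_by_quantile = factor_data.groupby("factor_quantile")["1D"].mean()


def rank_ic(day):
    # Spearman rank correlation, written as the Pearson correlation of ranks.
    return day["factor"].rank().corr(day["1D"].rank())


# One information coefficient per date.
daily_ic = pd.Series(
    {date: rank_ic(day) for date, day in factor_data.groupby(level="date")}
)
```

A factor with predictive value would tend to show mean returns that rise or fall steadily across quantiles and an information coefficient that stays on one side of zero; the toy numbers here show neither.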

Tips & Tricks

Mean Period Wise Returns

When comparing mean period wise returns for the same factor across different tearsheets, you will need to consider the periods you are comparing.

For example, say you are running two full tearsheets for the same set of alpha factor values. To get your factor data for Tearsheet A, you run get_clean_factor_and_forward_returns() with periods=(1, 5, 10). To get your factor data for Tearsheet B, you run get_clean_factor_and_forward_returns() with periods=(10, 20, 30). The mean period wise returns charts/plots for the 10D period will not be the same across these two tearsheets.

This is because mean period wise returns are not simply excess returns. Rather, they reflect the rate of return: returns adjusted as if they had grown at a steady rate over time. When computing the rate of return, Alphalens uses the shortest period length as the base time. As such, the mean period wise returns data will vary based on the periods selected.

Note

Why does Alphalens use the rate of return? Returns compound over time, so raw returns measured over different period lengths are difficult to compare directly.
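
The effect of the base period can be seen with a small calculation. The function below is a hypothetical helper that assumes the conversion compounds at a steady rate, i.e. (1 + r) ** (base / period) - 1; the exact formula is not spelled out in this section, so treat it as an assumption.

```python
def rate_of_return(period_return, period_length, base_length):
    """Convert a return earned over period_length days into the equivalent
    steady (compounded) return per base_length days."""
    return (1.0 + period_return) ** (base_length / period_length) - 1.0


# The same hypothetical 5% return over 10 days, viewed from two tearsheets.
ten_day_return = 0.05

# Tearsheet A uses periods=(1, 5, 10), so the base period is 1 day.
tearsheet_a = rate_of_return(ten_day_return, period_length=10, base_length=1)

# Tearsheet B uses periods=(10, 20, 30), so the base period is 10 days.
tearsheet_b = rate_of_return(ten_day_return, period_length=10, base_length=10)
```

Under this assumption Tearsheet A reports the 10D period as a per-day rate of a little under 0.5%, while Tearsheet B reports it as the full 5%, even though the underlying returns are identical.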

Max Loss

While preparing your data for Alphalens ingestion using get_clean_factor_and_forward_returns(), you will see a status message like this:

Dropped 1.6% entries from factor data: 1.6% in forward returns computation
and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).

If the total percentage of entries dropped exceeds the max_loss parameter value (by default 35%), you will receive a fatal MaxLossExceededError.

To avoid this error, reduce the number of entries dropped in the forward returns computation and in the binning phase:

"Forward returns computation" refers to the calculation of forward returns based on factor and pricing data. Dropped entries in this stage are most likely due to a lack of pricing data: specifically, there is insufficient data to compute forward returns over the periods specified. Some dropped entries are unavoidable due to legitimately missing pricing data; however, you can usually greatly reduce the number of dropped entries in this stage by ensuring that your pricing data end date extends sufficiently past your alpha factor data end date.

"Binning phase" refers to the division of assets into bins (as dictated by the bins or quantiles parameter). Entries are dropped in the binning phase when there is insufficient granularity to sort them into bins. For example, if you have quantiles=3 for data that is 50% zeros, your binning loss will be extremely high. To reduce the number of dropped entries in this stage, try reducing the number of bins/quantiles or increasing the granularity of your data.

ValueError

While preparing your data for Alphalens ingestion using get_clean_factor_and_forward_returns(), you might see an error message like this:

ValueError: Wrong number of items passed 2, placement implies 1

This error is generally caused by incorrectly shaped factor values data; specifically, attempting to pass values for multiple factors to the factor parameter of get_clean_factor_and_forward_returns().

Alphalens can only analyze one factor at a time; you must pass a Series or a DataFrame with exactly one column.
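
One way to stay within that limit is to split a multi-column result into one Series per factor before formatting the data. The sketch below uses a hand-built frame in place of real pipeline output; the column names factor_a and factor_b are hypothetical.

```python
import pandas as pd

dates = pd.to_datetime(["2015-01-02", "2015-01-05"])
index = pd.MultiIndex.from_product(
    [dates, ["AAPL", "BA"]], names=["date", "asset"]
)

# Hypothetical pipeline output with two factor columns.
pipeline_result = pd.DataFrame(
    {"factor_a": [0.1, 0.4, 0.3, 0.2], "factor_b": [1.0, 3.0, 2.0, 4.0]},
    index=index,
)

# Split the frame into one Series per factor; each Series keeps the
# (date, asset) index and can then be analyzed on its own.
single_factors = {name: pipeline_result[name] for name in pipeline_result.columns}
```

Each entry of single_factors has the single-factor shape, so the factors can be formatted and analyzed one at a time.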