Alphalens API Reference

Quick Reference

Utilities

alphalens.utils.get_clean_factor -- Formats the factor data, forward return data, and group mappings into a DataFrame that contains aligned MultiIndex indices of timestamp and asset.
alphalens.utils.get_clean_factor_and_forward_returns -- Formats the factor data, pricing data, and group mappings into a DataFrame that contains aligned MultiIndex indices of timestamp and asset.
alphalens.utils.get_forward_returns_columns -- Utility that detects and returns the columns that are forward returns.

Tearsheets

alphalens.tears.create_event_returns_tear_sheet -- Creates a tear sheet to view the average cumulative returns for a factor within a window (pre and post event).
alphalens.tears.create_event_study_tear_sheet -- Creates an event study tear sheet for analysis of a specific event.
alphalens.tears.create_full_tear_sheet -- Creates a full tear sheet for analyzing and evaluating a single return-predicting (alpha) factor.
alphalens.tears.create_information_tear_sheet -- Creates a tear sheet for information analysis of a factor.
alphalens.tears.create_returns_tear_sheet -- Creates a tear sheet for returns analysis of a factor.
alphalens.tears.create_summary_tear_sheet -- Creates a small summary tear sheet with returns, information, and turnover analysis.
alphalens.tears.create_turnover_tear_sheet -- Creates a tear sheet for analyzing the turnover properties of a factor.

Detailed Reference

Utilities

alphalens.utils.get_clean_factor(factor, forward_returns, groupby=None, binning_by_group=False, quantiles=5, bins=None, groupby_labels=None, max_loss=0.35, zero_aware=False)

Formats the factor data, forward return data, and group mappings into a DataFrame that contains aligned MultiIndex indices of timestamp and asset. The returned data will be formatted to be suitable for Alphalens functions.

It is safe to skip a call to this function and still make use of Alphalens functionalities as long as the factor data conforms to the format returned from get_clean_factor_and_forward_returns and documented here.

Parameters:
  • factor (pd.Series - MultiIndex) --

    A MultiIndex Series indexed by timestamp (level 0) and asset (level 1), containing the values for a single alpha factor.

    -----------------------------------
        date    |    asset   |
    -----------------------------------
                |   AAPL     |   0.5
                -----------------------
                |   BA       |  -1.1
                -----------------------
    2014-01-01  |   CMG      |   1.7
                -----------------------
                |   DAL      |  -0.1
                -----------------------
                |   LULU     |   2.7
                -----------------------
    
  • forward_returns (pd.DataFrame - MultiIndex) --

    A MultiIndex DataFrame indexed by timestamp (level 0) and asset (level 1), containing the forward returns for assets.

    Forward returns column names must follow the format accepted by pd.Timedelta (e.g. '1D', '30m', '3h15m', '1D1h', etc).

    The 'date' index freq property must be set to a trading calendar (pandas DateOffset); see infer_trading_calendar for more details. This information is currently used only in cumulative returns computation.

    ---------------------------------------
               |       | 1D  | 5D  | 10D
    ---------------------------------------
        date   | asset |     |     |
    ---------------------------------------
               | AAPL  | 0.09|-0.01|-0.079
               ----------------------------
               | BA    | 0.02| 0.06| 0.020
               ----------------------------
    2014-01-01 | CMG   | 0.03| 0.09| 0.036
               ----------------------------
               | DAL   |-0.02|-0.06|-0.029
               ----------------------------
               | LULU  |-0.03| 0.05|-0.009
               ----------------------------
    
  • groupby (pd.Series - MultiIndex or dict) -- Either a MultiIndex Series indexed by date and asset, containing the period-wise group codes for each asset, or a dict of asset to group mappings. If a dict is passed, it is assumed that group mappings are unchanged for the entire time period of the passed factor data.
  • binning_by_group (bool) -- If True, compute quantile buckets separately for each group. This is useful when the range of factor values varies considerably across groups, so that binning should be group relative. You should probably enable this if the factor is intended to be analyzed for a group neutral portfolio.
  • quantiles (int or sequence[float]) -- Number of equal-sized quantile buckets to use in factor bucketing. Alternatively, a sequence of quantiles, allowing non-equal-sized buckets, e.g. [0, .10, .5, .90, 1.] or [.05, .5, .95]. Only one of 'quantiles' or 'bins' can be not-None.
  • bins (int or sequence[float]) -- Number of equal-width (valuewise) bins to use in factor bucketing. Alternatively, a sequence of bin edges, allowing non-uniform bin widths, e.g. [-4, -2, -0.5, 0, 10]. Chooses the buckets to be evenly spaced according to the values themselves. Useful when the factor contains discrete values. Only one of 'quantiles' or 'bins' can be not-None.
  • groupby_labels (dict) -- A dictionary keyed by group code with values corresponding to the display name for each group.
  • max_loss (float, optional) --

    Maximum fraction (0.00 to 1.00) of factor data that may be dropped, computed by comparing the number of items in the input factor index with the number of items in the output DataFrame index.

    Factor data can be partially dropped because it is flawed itself (e.g. NaNs), because not enough price data was provided to compute forward returns for all factor values, or because it is not possible to perform binning.

    Set max_loss=0 to avoid suppressing exceptions.

  • zero_aware (bool, optional) -- If True, compute quantile buckets separately for positive and negative signal values. This is useful if your signal is centered and zero separates long signals from short signals. 'quantiles' is None.
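The difference between 'quantiles' and 'bins' can be illustrated with plain pandas. The following is a minimal sketch, assuming pd.qcut and pd.cut as stand-ins for equal-sized and equal-width bucketing. The helper name bucket_factor and the sample values are hypothetical and are not part of Alphalens.

```python
import pandas as pd


def bucket_factor(values, quantiles=None, bins=None):
    """Hypothetical helper: bucket factor values by quantiles or by bins."""
    # This sketch requires exactly one of the two arguments to be set.
    if (quantiles is None) == (bins is None):
        raise ValueError("Pass exactly one of 'quantiles' or 'bins'")
    if quantiles is not None:
        # Equal-sized buckets: each bucket holds roughly the same number of items.
        codes = pd.qcut(values, quantiles, labels=False)
    else:
        # Equal-width buckets: edges are evenly spaced over the value range.
        codes = pd.cut(values, bins, labels=False)
    return codes + 1  # 1-based bucket labels


# Made-up factor values with one large outlier.
values = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0])
by_quantile = bucket_factor(values, quantiles=2)
by_bin = bucket_factor(values, bins=2)
```

With an outlier in the data, the quantile split keeps the two buckets close in size, while the equal-width split leaves the outlier alone in the upper bucket.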
Returns:

merged_data -- A MultiIndex DataFrame indexed by date (level 0) and asset (level 1), containing the values for a single alpha factor, forward returns for each period, the factor quantile/bin to which that factor value belongs, and (optionally) the group to which the asset belongs.

Forward returns column names follow the format accepted by pd.Timedelta (e.g. '1D', '30m', '3h15m', '1D1h', etc).

'date' index freq property (merged_data.index.levels[0].freq) is the same as that of the input forward returns data. This is currently used only in cumulative returns computation.

-------------------------------------------------------------------
           |       | 1D  | 5D  | 10D  |factor|group|factor_quantile
-------------------------------------------------------------------
    date   | asset |     |     |      |      |     |
-------------------------------------------------------------------
           | AAPL  | 0.09|-0.01|-0.079|  0.5 |  G1 |      3
           --------------------------------------------------------
           | BA    | 0.02| 0.06| 0.020| -1.1 |  G2 |      5
           --------------------------------------------------------
2014-01-01 | CMG   | 0.03| 0.09| 0.036|  1.7 |  G2 |      1
           --------------------------------------------------------
           | DAL   |-0.02|-0.06|-0.029| -0.1 |  G3 |      5
           --------------------------------------------------------
           | LULU  |-0.03| 0.05|-0.009|  2.7 |  G1 |      2
           --------------------------------------------------------

Return type:

pd.DataFrame - MultiIndex
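
The input shapes described above can be built directly with pandas. This is a minimal sketch that only constructs the inputs and aligns them on the shared (date, asset) index. It does not set the 'date' index freq property, it does not perform binning, and the variable names are illustrative.

```python
import pandas as pd

# Sample values copied from the example tables above.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2014-01-01"]), ["AAPL", "BA", "CMG", "DAL", "LULU"]],
    names=["date", "asset"],
)

# factor: a MultiIndex Series of factor values.
factor = pd.Series([0.5, -1.1, 1.7, -0.1, 2.7], index=index, name="factor")

# forward_returns: a MultiIndex DataFrame, one column per period.
forward_returns = pd.DataFrame(
    {
        "1D": [0.09, 0.02, 0.03, -0.02, -0.03],
        "5D": [-0.01, 0.06, 0.09, -0.06, 0.05],
        "10D": [-0.079, 0.020, 0.036, -0.029, -0.009],
    },
    index=index,
)

# Every forward returns column name should be accepted by pd.Timedelta.
periods = [pd.Timedelta(name) for name in forward_returns.columns]

# groupby given as a dict of asset to group mappings.
groupby = {"AAPL": "G1", "BA": "G2", "CMG": "G2", "DAL": "G3", "LULU": "G1"}

# Align everything on the shared (date, asset) index.
merged = forward_returns.copy()
merged["factor"] = factor
merged["group"] = [groupby[a] for a in merged.index.get_level_values("asset")]
```

Given the signature documented above, these objects would then be passed as get_clean_factor(factor, forward_returns, groupby=groupby).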

alphalens.utils.get_clean_factor_and_forward_returns(factor, prices, groupby=None, binning_by_group=False, quantiles=5, bins=None, periods=(1, 5, 10), filter_zscore=20, groupby_labels=None, max_loss=0.35, zero_aware=False)

Formats the factor data, pricing data, and group mappings into a DataFrame that contains aligned MultiIndex indices of timestamp and asset. The returned data will be formatted to be suitable for Alphalens functions.

It is safe to skip a call to this function and still make use of Alphalens functionalities as long as the factor data conforms to the format returned from get_clean_factor_and_forward_returns and documented here.

Parameters:
  • factor (pd.Series - MultiIndex) --

    A MultiIndex Series indexed by timestamp (level 0) and asset (level 1), containing the values for a single alpha factor.

    -----------------------------------
        date    |    asset   |
    -----------------------------------
                |   AAPL     |   0.5
                -----------------------
                |   BA       |  -1.1
                -----------------------
    2014-01-01  |   CMG      |   1.7
                -----------------------
                |   DAL      |  -0.1
                -----------------------
                |   LULU     |   2.7
                -----------------------
    
  • prices (pd.DataFrame) --

    A wide form Pandas DataFrame indexed by timestamp with assets in the columns.

    Pricing data must span the factor analysis time period plus an additional buffer window that is greater than the maximum number of expected periods in the forward returns calculations. It is important to pass pricing data that matches the time at which your signal was generated, so as to avoid lookahead bias or delayed calculations.

    'Prices' must contain at least an entry for each timestamp/asset combination in 'factor'. This entry should reflect the buy price for the asset. Usually it is the next available price after the factor is computed, but it can also be a later price if the factor is meant to be traded later (e.g. if the factor is computed at market open but traded 1 hour after market open, the price information should be 1 hour after market open).

    'Prices' must also contain entries for timestamps following each timestamp/asset combination in 'factor', as many more timestamps as the maximum value in 'periods'. The asset price after 'period' timestamps will be considered the sell price for that asset when computing 'period' forward returns.

    ----------------------------------------------------
                | AAPL |  BA  |  CMG  |  DAL  |  LULU  |
    ----------------------------------------------------
       Date     |      |      |       |       |        |
    ----------------------------------------------------
    2014-01-01  |605.12| 24.58|  11.72| 54.43 |  37.14 |
    ----------------------------------------------------
    2014-01-02  |604.35| 22.23|  12.21| 52.78 |  33.63 |
    ----------------------------------------------------
    2014-01-03  |607.94| 21.68|  14.36| 53.94 |  29.37 |
    ----------------------------------------------------
    
  • groupby (pd.Series - MultiIndex or dict) -- Either a MultiIndex Series indexed by date and asset, containing the period-wise group codes for each asset, or a dict of asset to group mappings. If a dict is passed, it is assumed that group mappings are unchanged for the entire time period of the passed factor data.
  • binning_by_group (bool) -- If True, compute quantile buckets separately for each group. This is useful when the range of factor values varies considerably across groups, so that binning should be group relative. You should probably enable this if the factor is intended to be analyzed for a group neutral portfolio.
  • quantiles (int or sequence[float]) -- Number of equal-sized quantile buckets to use in factor bucketing. Alternatively, a sequence of quantiles, allowing non-equal-sized buckets, e.g. [0, .10, .5, .90, 1.] or [.05, .5, .95]. Only one of 'quantiles' or 'bins' can be not-None.
  • bins (int or sequence[float]) -- Number of equal-width (valuewise) bins to use in factor bucketing. Alternatively, a sequence of bin edges, allowing non-uniform bin widths, e.g. [-4, -2, -0.5, 0, 10]. Chooses the buckets to be evenly spaced according to the values themselves. Useful when the factor contains discrete values. Only one of 'quantiles' or 'bins' can be not-None.
  • periods (sequence[int]) -- Periods to compute forward returns on.
  • filter_zscore (int or float, optional) -- Sets forward returns greater than X standard deviations from the mean to NaN. Set it to None to avoid filtering. Caution: this outlier filtering incorporates lookahead bias.
  • groupby_labels (dict) -- A dictionary keyed by group code with values corresponding to the display name for each group.
  • max_loss (float, optional) --

    Maximum fraction (0.00 to 1.00) of factor data that may be dropped, computed by comparing the number of items in the input factor index with the number of items in the output DataFrame index.

    Factor data can be partially dropped because it is flawed itself (e.g. NaNs), because not enough price data was provided to compute forward returns for all factor values, or because it is not possible to perform binning.

    Set max_loss=0 to avoid suppressing exceptions.

  • zero_aware (bool, optional) -- If True, compute quantile buckets separately for positive and negative signal values. This is useful if your signal is centered and zero separates long signals from short signals.
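The max_loss check described above compares the size of the input factor index with the size of the output index. The following is a rough sketch of that arithmetic only. The helper name check_max_loss and the use of ValueError are assumptions made for illustration, not the library's own error handling.

```python
def check_max_loss(n_input, n_output, max_loss=0.35):
    """Hypothetical helper: return the dropped fraction, or raise if too large."""
    if n_input <= 0:
        raise ValueError("input factor index is empty")
    # Fraction of factor rows lost between the input and the output index.
    loss = 1.0 - n_output / n_input
    if loss > max_loss:
        raise ValueError(
            f"max_loss ({max_loss:.1%}) exceeded: {loss:.1%} of factor data dropped"
        )
    return loss


# 1000 factor values in, 800 rows out: within the default tolerance of 0.35.
loss = check_max_loss(1000, 800)
```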
Returns:

merged_data -- A MultiIndex DataFrame indexed by date (level 0) and asset (level 1), containing the values for a single alpha factor, forward returns for each period, the factor quantile/bin to which the factor value belongs, and (optionally) the group to which the asset belongs.

Forward returns column names follow the format accepted by pd.Timedelta (e.g. '1D', '30m', '3h15m', '1D1h', etc).

'date' index freq property (merged_data.index.levels[0].freq) will be set to a trading calendar (pandas DateOffset) inferred from the input data (see infer_trading_calendar for more details). This is currently used only in cumulative returns computation.

-------------------------------------------------------------------
           |       | 1D  | 5D  | 10D  |factor|group|factor_quantile
-------------------------------------------------------------------
    date   | asset |     |     |      |      |     |
-------------------------------------------------------------------
           | AAPL  | 0.09|-0.01|-0.079|  0.5 |  G1 |      3
           --------------------------------------------------------
           | BA    | 0.02| 0.06| 0.020| -1.1 |  G2 |      5
           --------------------------------------------------------
2014-01-01 | CMG   | 0.03| 0.09| 0.036|  1.7 |  G2 |      1
           --------------------------------------------------------
           | DAL   |-0.02|-0.06|-0.029| -0.1 |  G3 |      5
           --------------------------------------------------------
           | LULU  |-0.03| 0.05|-0.009|  2.7 |  G1 |      2
           --------------------------------------------------------

Return type:

pd.DataFrame - MultiIndex
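
The relationship between 'prices' and 'periods' can be sketched with plain pandas: the price at a timestamp is the buy price, and the price 'period' rows later is the sell price. The helper name forward_returns_wide is hypothetical, and the sketch ignores trading calendars and z-score filtering, so it should be read as an illustration rather than the library's computation.

```python
import pandas as pd

# Wide-form prices copied from the table above: timestamps as index, assets as columns.
prices = pd.DataFrame(
    {
        "AAPL": [605.12, 604.35, 607.94],
        "BA": [24.58, 22.23, 21.68],
        "CMG": [11.72, 12.21, 14.36],
        "DAL": [54.43, 52.78, 53.94],
        "LULU": [37.14, 33.63, 29.37],
    },
    index=pd.to_datetime(["2014-01-01", "2014-01-02", "2014-01-03"]),
)
prices.index.name = "date"


def forward_returns_wide(prices, period):
    """Hypothetical helper: simple return from each row to the row `period` steps later."""
    # shift(-period) brings the future (sell) price onto the current (buy) row.
    return prices.shift(-period) / prices - 1.0


fwd_1 = forward_returns_wide(prices, 1)
fwd_2 = forward_returns_wide(prices, 2)
```

The last 'period' rows of each result are NaN because no later price exists, which is why the pricing data needs a buffer beyond the factor analysis window.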

alphalens.utils.get_forward_returns_columns(columns)

Utility that detects and returns the columns that are forward returns.
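
Since forward returns columns are named in pd.Timedelta style (e.g. '1D', '30m', '3h15m'), one way to detect them is by pattern matching on the column names. The sketch below is an assumption about how such detection could work, not the library's implementation. The regular expression and the helper name looks_like_forward_returns are illustrative.

```python
import re

# Assumed pattern: one or more "<number><unit>" chunks, e.g. '1D', '30m', '3h15m'.
_PERIOD_PATTERN = re.compile(r"^(\d+(ms|us|ns|[Dhms]))+$", re.IGNORECASE)


def looks_like_forward_returns(columns):
    """Hypothetical helper: keep only the names that look like period labels."""
    return [col for col in columns if _PERIOD_PATTERN.match(str(col))]


columns = ["1D", "5D", "10D", "factor", "group", "factor_quantile"]
forward_cols = looks_like_forward_returns(columns)
```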

Tearsheets

alphalens.tears.create_event_returns_tear_sheet(*args, **kwargs)

Creates a tear sheet to view the average cumulative returns for a factor within a window (pre and post event).

Parameters:
  • factor_data (pd.DataFrame - MultiIndex) -- A MultiIndex DataFrame indexed by date (level 0) and asset (level 1), containing the values for a single alpha factor, the factor quantile/bin that factor value belongs to, and (optionally) the group the asset belongs to. - See full explanation in utils.get_clean_factor_and_forward_returns
  • prices (pd.DataFrame) -- A DataFrame indexed by date with assets in the columns containing the pricing data. - See full explanation in utils.get_clean_factor_and_forward_returns
  • avgretplot (tuple (int, int) - (before, after)) -- If not None, plot quantile average cumulative returns.
  • long_short (bool) -- Should this computation happen on a long short portfolio? If so, factor returns will be demeaned across the factor universe.
  • group_neutral (bool) -- Should this computation happen on a group neutral portfolio? If so, returns demeaning will occur on the group level.
  • std_bar (bool, optional) -- Show plots with standard deviation bars, one for each quantile.
  • by_group (bool) -- If True, display graphs separately for each group.
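
The (before, after) window in 'avgretplot' can be pictured as follows. This is a rough sketch under the assumption that cumulative returns are measured relative to the price at the event and then averaged across events. Both helper names and the price path are made up for illustration, and the library's own computation may differ.

```python
def event_window_returns(prices, event_pos, before, after):
    """Hypothetical helper: returns relative to the price at the event."""
    base = prices[event_pos]
    # Clip the window so it never runs past either end of the price list.
    start = max(event_pos - before, 0)
    stop = min(event_pos + after, len(prices) - 1)
    # Keys are offsets from the event: negative before it, positive after it.
    return {pos - event_pos: prices[pos] / base - 1.0 for pos in range(start, stop + 1)}


def average_windows(windows):
    """Average the returns of several event windows, offset by offset."""
    offsets = sorted({offset for window in windows for offset in window})
    averaged = {}
    for offset in offsets:
        values = [window[offset] for window in windows if offset in window]
        averaged[offset] = sum(values) / len(values)
    return averaged


prices = [100.0, 102.0, 101.0, 105.0, 110.0, 108.0]  # made-up price path
windows = [
    event_window_returns(prices, event_pos=2, before=1, after=2),
    event_window_returns(prices, event_pos=3, before=1, after=2),
]
average = average_windows(windows)
```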

alphalens.tears.create_event_study_tear_sheet(*args, **kwargs)

Creates an event study tear sheet for analysis of a specific event.

Parameters:
  • factor_data (pd.DataFrame - MultiIndex) -- A MultiIndex DataFrame indexed by date (level 0) and asset (level 1), containing the values for a single event, forward returns for each period, the factor quantile/bin that factor value belongs to, and (optionally) the group the asset belongs to.
  • prices (pd.DataFrame, required only if 'avgretplot' is provided) -- A DataFrame indexed by date with assets in the columns containing the pricing data. - See full explanation in utils.get_clean_factor_and_forward_returns
  • avgretplot (tuple (int, int) - (before, after), optional) -- If not None, plot event style average cumulative returns within a window (pre and post event).
  • rate_of_ret (bool, optional) -- Display rate of return instead of simple return in 'Mean Period Wise Return By Factor Quantile' and 'Period Wise Return By Factor Quantile' plots.
  • n_bars (int, optional) -- Number of bars in event distribution plot.

alphalens.tears.create_full_tear_sheet(*args, **kwargs)

Creates a full tear sheet for analyzing and evaluating a single return-predicting (alpha) factor.

Parameters:
  • factor_data (pd.DataFrame - MultiIndex) -- A MultiIndex DataFrame indexed by date (level 0) and asset (level 1), containing the values for a single alpha factor, forward returns for each period, the factor quantile/bin that factor value belongs to, and (optionally) the group the asset belongs to. - See full explanation in utils.get_clean_factor_and_forward_returns
  • long_short (bool) -- Should this computation happen on a long short portfolio? See tears.create_returns_tear_sheet for details on how this flag affects returns analysis.
  • group_neutral (bool) -- Should this computation happen on a group neutral portfolio? See tears.create_returns_tear_sheet for details on how this flag affects returns analysis, and tears.create_information_tear_sheet for details on how it affects information analysis.
  • by_group (bool) -- If True, display graphs separately for each group.

alphalens.tears.create_information_tear_sheet(*args, **kwargs)

Creates a tear sheet for information analysis of a factor.

Parameters:
  • factor_data (pd.DataFrame - MultiIndex) -- A MultiIndex DataFrame indexed by date (level 0) and asset (level 1), containing the values for a single alpha factor, forward returns for each period, the factor quantile/bin that factor value belongs to, and (optionally) the group the asset belongs to. - See full explanation in utils.get_clean_factor_and_forward_returns
  • group_neutral (bool) -- Demean forward returns by group before computing IC.
  • by_group (bool) -- If True, display graphs separately for each group.
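
To make the IC and the effect of 'group_neutral' concrete, here is a small sketch for a single date. It assumes the IC is a Spearman rank correlation between factor values and forward returns, computed here as the Pearson correlation of the ranks. The data is taken from the example tables in this reference and the helper name rank_ic is hypothetical.

```python
import pandas as pd

# Single-date sample taken from the example tables in this reference.
data = pd.DataFrame(
    {
        "factor": [0.5, -1.1, 1.7, -0.1, 2.7],
        "1D": [0.09, 0.02, 0.03, -0.02, -0.03],
        "group": ["G1", "G2", "G2", "G3", "G1"],
    },
    index=pd.Index(["AAPL", "BA", "CMG", "DAL", "LULU"], name="asset"),
)


def rank_ic(factor, returns):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    return factor.rank().corr(returns.rank())


ic = rank_ic(data["factor"], data["1D"])

# group_neutral: subtract each group's mean return before computing the IC.
demeaned = data["1D"] - data.groupby("group")["1D"].transform("mean")
ic_group_neutral = rank_ic(data["factor"], demeaned)
```

On real factor data the same calculation would be repeated for each date and each forward returns column.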

alphalens.tears.create_returns_tear_sheet(*args, **kwargs)

Creates a tear sheet for returns analysis of a factor.

Parameters:
  • factor_data (pd.DataFrame - MultiIndex) -- A MultiIndex DataFrame indexed by date (level 0) and asset (level 1), containing the values for a single alpha factor, forward returns for each period, the factor quantile/bin that factor value belongs to, and (optionally) the group the asset belongs to. - See full explanation in utils.get_clean_factor_and_forward_returns
  • long_short (bool) -- Should this computation happen on a long short portfolio? If so, mean quantile returns will be demeaned across the factor universe. Additionally, factor values will be demeaned across the factor universe when factor weighting the portfolio for cumulative returns plots.
  • group_neutral (bool) -- Should this computation happen on a group neutral portfolio? If so, returns demeaning will occur on the group level. Additionally, each group will have the same weight in cumulative returns plots.
  • by_group (bool) -- If True, display graphs separately for each group.
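
The two kinds of demeaning named by 'long_short' and 'group_neutral' can be sketched with pandas groupby operations. This is an illustration of the arithmetic on a one-date sample, assuming the factor_data layout documented above. It is not the library's own code.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2014-01-01"]), ["AAPL", "BA", "CMG", "DAL", "LULU"]],
    names=["date", "asset"],
)
factor_data = pd.DataFrame(
    {
        "1D": [0.09, 0.02, 0.03, -0.02, -0.03],
        "group": ["G1", "G2", "G2", "G3", "G1"],
    },
    index=index,
)

# long_short: subtract the mean return of the whole universe on each date.
universe_mean = factor_data.groupby(level="date")["1D"].transform("mean")
long_short_returns = factor_data["1D"] - universe_mean

# group_neutral: subtract the mean return of each group on each date.
group_mean = factor_data.groupby(["date", "group"])["1D"].transform("mean")
group_neutral_returns = factor_data["1D"] - group_mean
```

After either step the returns average to zero within the unit that was demeaned: the whole universe per date for long_short, each group per date for group_neutral.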

alphalens.tears.create_summary_tear_sheet(*args, **kwargs)

Creates a small summary tear sheet with returns, information, and turnover analysis.

Parameters:
  • factor_data (pd.DataFrame - MultiIndex) -- A MultiIndex DataFrame indexed by date (level 0) and asset (level 1), containing the values for a single alpha factor, forward returns for each period, the factor quantile/bin that factor value belongs to, and (optionally) the group the asset belongs to. - See full explanation in utils.get_clean_factor_and_forward_returns
  • long_short (bool) -- Should this computation happen on a long short portfolio? If so, mean quantile returns will be demeaned across the factor universe.
  • group_neutral (bool) -- Should this computation happen on a group neutral portfolio? If so, returns demeaning will occur on the group level.

alphalens.tears.create_turnover_tear_sheet(*args, **kwargs)

Creates a tear sheet for analyzing the turnover properties of a factor.

Parameters:
  • factor_data (pd.DataFrame - MultiIndex) -- A MultiIndex DataFrame indexed by date (level 0) and asset (level 1), containing the values for a single alpha factor, forward returns for each period, the factor quantile/bin that factor value belongs to, and (optionally) the group the asset belongs to. - See full explanation in utils.get_clean_factor_and_forward_returns
  • turnover_periods (sequence[string], optional) -- Periods to compute turnover analysis on. By default, the periods in 'factor_data' are used, but custom periods can be provided instead. This can be useful when the periods in 'factor_data' are not multiples of the frequency at which factor values are computed, e.g. if the periods are 2h and 4h and the factor is computed daily, values like ['1D', '2D'] could be used instead.
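
As a rough illustration of what a turnover figure measures, the sketch below assumes turnover for a quantile is the share of names in it that were not in the same quantile one period earlier. The helper name quantile_turnover and the tickers are made up, and the library's exact definition may differ.

```python
def quantile_turnover(previous_names, current_names):
    """Hypothetical helper: share of current names that are new to the quantile."""
    current = set(current_names)
    if not current:
        return float("nan")
    new_names = current - set(previous_names)
    return len(new_names) / len(current)


# Top-quantile membership on two consecutive factor dates (made-up tickers).
top_previous = {"AAPL", "CMG", "LULU"}
top_current = {"AAPL", "LULU", "DAL", "BA"}
turnover = quantile_turnover(top_previous, top_current)
```

Choosing 'turnover_periods' then amounts to choosing how far apart the two membership snapshots are taken.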