Intro to Alphalens
Welcome to the Alphalens Tutorial! This tutorial assumes that you have completed the Getting Started and Pipeline API tutorials.

Acknowledgement: Quantopian would like to thank community member Luca for contributing to Alphalens and for helping with this tutorial.
What is Alphalens?
Alphalens is a tool for analyzing a given alpha factor's effectiveness at predicting future returns. As a reminder, alpha factors express a predictive relationship between some given set of information and future returns.
When Should You Use Alphalens?
Alphalens was designed to be used early and often in the quant workflow. Once you have defined your universe and constructed a factor in a pipeline, it's best to analyze it using Alphalens before backtesting. In the context of the quant workflow:
1. Define your trading universe and build an alpha factor using the Pipeline API.
2. Analyze the predictiveness of your alpha factor with Alphalens.
3. Create a trading strategy based on your alpha factor in Quantopian's IDE using the Optimize API.
You learned how to define a trading universe and build an alpha factor in the Pipeline API tutorial. Alphalens allows you to inspect a factor to see how predictive it is. Steps 1 and 2 are done in a research notebook; step 3 is done in the algorithm IDE.
Why Should You Use Alphalens?
1. Alphalens is fast. It is much faster to analyze a factor with Alphalens than it is to run a full backtest in Quantopian's algo IDE.
2. Alphalens is visual. It's hard to find meaning in a wall of numbers, so Alphalens creates charts to help you visualize data.
3. Alphalens is easy. Once you create an alpha factor in a Pipeline, you're only a few steps away from analyzing it with Alphalens.
In the next lesson, you will learn how to analyze a Pipeline's output with Alphalens in a research notebook.
Creating tear sheets with Alphalens
In the previous lesson, you learned what Alphalens is. In this lesson, you will learn a four-step process for how to use it:
1. Express an alpha factor and define a trading universe by creating and running a pipeline over a certain time period.
2. Query pricing data for the assets in our universe during that same time period with get_pricing().
3. Align the alpha factor data with the pricing data with get_clean_factor_and_forward_returns().
4. Visualize how well our alpha factor predicts future price movements with create_full_tear_sheet().
Build And Run A Pipeline
The following code expresses an alpha factor based on asset growth, then runs it with run_pipeline()
```python
from quantopian.pipeline.data import factset

from quantopian.pipeline import Pipeline
from quantopian.research import run_pipeline

def make_pipeline():

    # Measures a company's asset growth rate.
    asset_growth = factset.Fundamentals.assets_gr_qf.latest

    return Pipeline(
        columns={'asset_growth': asset_growth},
    )

factor_data = run_pipeline(pipeline=make_pipeline(), start_date='2014-1-1', end_date='2016-1-1')

# Show the first 5 rows of factor data
factor_data.head(5)
```
We now have data from our alpha factor. The first five rows of that data look like this:
[Output: first five rows of factor_data]
Query Pricing Data
Now that we have factor data, let's get pricing data for the same time period. get_pricing() returns pricing data for a list of assets over a specified time period. It requires four arguments:
• A list of assets for which we want pricing.
• A start date.
• An end date.
• Whether to use open, high, low or close pricing.
```python
pricing_data = get_pricing(
    symbols=factor_data.index.levels[1], # Finds all assets that appear at least once in "factor_data"
    start_date='2014-1-1',
    end_date='2016-2-1', # must be after run_pipeline()'s end date. Explained more in lesson 4
    fields='open_price' # Generally, you should use open pricing. Explained more in lesson 4
)

# Show the first 5 rows of pricing_data
pricing_data.head(5)
```
We now have pricing data for every asset in our trading universe. The first five rows look like this:
[Output: first five rows of pricing_data]
Align Data
get_clean_factor_and_forward_returns() aligns factor data from a Pipeline with pricing data from get_pricing(), and returns an object suitable for analysis with Alphalens' charting functions. It requires two arguments:
```python
from alphalens.utils import get_clean_factor_and_forward_returns

merged_data = get_clean_factor_and_forward_returns(
    factor=factor_data,
    prices=pricing_data
)

# Show the first 5 rows of merged_data
merged_data.head(5)
```
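To make the alignment step more concrete, the sketch below computes forward returns by hand for a single asset. It only illustrates the idea and is not Alphalens' implementation; the forward_returns() helper and the prices are made up for this example.

```python
def forward_returns(prices, periods=(1, 5, 10)):
    """Return {period: [forward return per day]} for one asset's price series.

    The forward return on day t over `period` days is
    prices[t + period] / prices[t] - 1. Days too close to the end of the
    series have no future price, so they get None.
    """
    result = {}
    for period in periods:
        column = []
        for t in range(len(prices)):
            if t + period < len(prices):
                column.append(prices[t + period] / prices[t] - 1)
            else:
                column.append(None)
        result[period] = column
    return result


# Hypothetical open prices for a single asset over 12 trading days.
open_prices = [100.0, 101.0, 102.0, 100.0, 99.0, 105.0,
               106.0, 104.0, 108.0, 110.0, 111.0, 120.0]

returns = forward_returns(open_prices)
print(returns[1][0])   # 1-day forward return on the first day
print(returns[10][0])  # 10-day forward return on the first day
print(returns[10][2])  # None: no price exists 10 days after the third day
```

The None entries show why the pricing data has to extend past the end of the factor data: without later prices, the last factor dates have no forward return to compare against.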
We have now combined pricing and factor data into a format that can be analyzed to see whether our factor data affects prices at certain time periods in the future. By default, those time periods are 1 day, 5 days, and 10 days.
Visualize Results
```python
from alphalens.tears import create_full_tear_sheet

create_full_tear_sheet(merged_data)
```
A full tear sheet produces tons of useful charts. Below is the first one it produces. Clone this lesson's notebook to see the rest of them!
[Chart: first chart produced by create_full_tear_sheet()]
That's It!
In the next lesson, we will show you how to interpret the charts produced by create_full_tear_sheet().
Interpreting Alphalens tear sheets
In the previous lesson, you learned how to query and process data so that we can analyze it with Alphalens tear sheets. In this lesson, you will experience a few iterations of the alpha discovery phase of the quant workflow by analyzing those tear sheets. In this lesson, we will:
1. Analyze how well an alpha factor predicts future price movements with create_information_tear_sheet().
2. Try to improve our original alpha factor by combining it with another alpha factor.
3. Preview the profitability of our alpha factor with create_returns_tear_sheet().
Our Starting Alpha Factors
The following code expresses an alpha factor based on a company's net income and market cap, and then creates an information tear sheet for that alpha factor. We will start analyzing the alpha factor by looking at its information coefficient (IC). The IC is a number ranging from -1 to 1, which quantifies the predictiveness of an alpha factor. Any number above 0 is considered somewhat predictive.

The first number you should look at is the IC mean, which is an alpha factor's average IC over a given time period. You want your factor's IC mean to be as high as possible. Generally speaking, a factor is worth investigating if it has an IC mean over 0. If it has an IC mean close to 0.1 (or higher) over a large trading universe, that factor is probably really good.
```python
from quantopian.pipeline.data import factset

from quantopian.pipeline import Pipeline
from quantopian.research import run_pipeline
from quantopian.pipeline.factors import CustomFactor, SimpleMovingAverage
from alphalens.tears import create_information_tear_sheet
from alphalens.utils import get_clean_factor_and_forward_returns

def make_pipeline():

    # 1 year moving average of year over year net income
    net_income_moving_average = SimpleMovingAverage(
        inputs=[factset.Fundamentals.net_inc_af],
        window_length=252
    )

    # 1 year moving average of market cap
    market_cap_moving_average = SimpleMovingAverage(
        inputs=[factset.Fundamentals.mkt_val],
        window_length=252
    )

    average_market_cap_per_net_income = (market_cap_moving_average / net_income_moving_average)

    # the last quarter's net income
    net_income = factset.Fundamentals.net_inc_qf.latest

    projected_market_cap = average_market_cap_per_net_income * net_income

    return Pipeline(
        columns={'projected_market_cap': projected_market_cap},
    )

factor_data = run_pipeline(make_pipeline(), '2010-1-1', '2012-1-1')
# levels[1] holds the assets; levels[0] holds the dates
pricing_data = get_pricing(factor_data.index.levels[1], '2010-1-1', '2012-2-1', fields='open_price')
merged_data = get_clean_factor_and_forward_returns(factor_data, pricing_data)

create_information_tear_sheet(merged_data)
```
Below is the first chart produced by create_information_tear_sheet(). Notice how the IC Mean figures are all positive. That is a good sign! Alphalens is useful for identifying alpha factors that aren't predictive early in the quant workflow. This allows you to avoid wasting time running a full backtest on a factor that could have been discarded earlier in the process.
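For intuition about what the IC measures: Alphalens computes it as the Spearman rank correlation between factor values and forward returns. The sketch below works that out by hand for a single day using only the standard library. The helper names and all of the numbers are hypothetical, and this is an illustration rather than Alphalens' own code.

```python
def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def information_coefficient(factor_values, forward_returns):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(factor_values), ranks(forward_returns))


# Hypothetical factor values and forward returns for five assets on one day.
factor_values = [0.5, 1.2, -0.3, 2.0, 0.1]
next_returns = [0.01, 0.02, -0.01, 0.015, 0.00]

daily_ic = information_coefficient(factor_values, next_returns)
print(round(daily_ic, 4))  # 0.9

# The IC mean is the average of the daily ICs over the analysis period.
daily_ics = [daily_ic, 0.1, -0.2]  # made-up ICs for two more days
ic_mean = sum(daily_ics) / len(daily_ics)
print(round(ic_mean, 4))
```

Because the IC is based on ranks, it only rewards a factor for ordering assets correctly, not for predicting the size of each return.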

The following code expresses another alpha factor called price_to_book, combines it with `projected_market_cap` using z-scores, then creates another information tear sheet based on our new (and hopefully improved) alpha factor.
```python
def make_pipeline():

    # 1 year moving average of year over year net income
    net_income_moving_average = SimpleMovingAverage(
        inputs=[factset.Fundamentals.net_inc_af],
        window_length=252
    )

    # 1 year moving average of market cap
    market_cap_moving_average = SimpleMovingAverage(
        inputs=[factset.Fundamentals.mkt_val],
        window_length=252
    )

    average_market_cap_per_net_income = (market_cap_moving_average / net_income_moving_average)

    net_income = factset.Fundamentals.net_inc_qf.latest # The last quarter's net income

    projected_market_cap = average_market_cap_per_net_income * net_income

    price_to_book = factset.Fundamentals.pbk_qf.latest # The alpha factor we are adding

    factor_to_analyze = projected_market_cap.zscore() + price_to_book.zscore()

    return Pipeline(
        columns={'factor_to_analyze': factor_to_analyze},
    )

factor_data = run_pipeline(make_pipeline(), '2010-1-1', '2012-1-1')
# levels[1] holds the assets; levels[0] holds the dates
pricing_data = get_pricing(factor_data.index.levels[1], '2010-1-1', '2012-2-1', fields='open_price')
new_merged_data = get_clean_factor_and_forward_returns(factor_data, pricing_data)

create_information_tear_sheet(new_merged_data)
```
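As an aside, the z-score step in make_pipeline() puts both factors on a common scale before they are added; otherwise the factor with the larger raw numbers would dominate the sum. Here is a minimal sketch of that idea on plain Python lists. The zscore() helper, the use of the population standard deviation, and the sample values are assumptions made for illustration.

```python
from statistics import mean, pstdev


def zscore(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical raw factor values for three assets on one day.
projected_market_cap = [2e9, 5e9, 8e9]  # dollars: very large numbers
price_to_book = [1.0, 3.0, 2.0]         # ratios: small numbers

# Adding the raw values would let market cap swamp price-to-book.
# Adding z-scores gives both factors equal weight.
combined = [a + b for a, b in zip(zscore(projected_market_cap),
                                  zscore(price_to_book))]
print([round(c, 3) for c in combined])
```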
Notice how the IC figures are lower than they were in the first chart. That means the factor we added is making our predictions worse!
See If Our Alpha Factor Might Be Profitable
We found that the first iteration of our alpha factor had more predictive value than the second one. Let's see if the original alpha factor might make any money.

create_returns_tear_sheet() splits your universe into quantiles, then shows the returns generated by each quantile over different time periods. Quantile 1 is the 20% of assets with the lowest alpha factor values, and quantile 5 is the highest 20%.

This function creates six types of charts, but the two most important ones are:
• Mean period-wise returns by factor quantile: This chart shows the average return for each quantile in your universe, per time period. You want the quantiles on the right to have higher average returns than the quantiles on the left.
• Cumulative return by quantile: This chart shows you how each quantile performed over time. You want to see quantile 1 consistently performing the worst, quantile 5 consistently performing the best, and the other quantiles in the middle.
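The quantile logic behind these charts can be sketched in a few lines. The example below sorts ten hypothetical assets by factor value, splits them into five equal-sized buckets, and averages each bucket's forward return. Alphalens' own binning is more configurable than this equal-count split, and the function names and data here are made up for illustration.

```python
def assign_quantiles(factor_values, n_quantiles=5):
    """Map each asset to a quantile from 1 (lowest factor) to n_quantiles."""
    ordered = sorted(factor_values, key=factor_values.get)
    quantiles = {}
    for position, asset in enumerate(ordered):
        quantiles[asset] = position * n_quantiles // len(ordered) + 1
    return quantiles


def mean_return_by_quantile(factor_values, forward_returns, n_quantiles=5):
    """Average the forward returns of the assets in each factor quantile."""
    quantiles = assign_quantiles(factor_values, n_quantiles)
    buckets = {}
    for asset, quantile in quantiles.items():
        buckets.setdefault(quantile, []).append(forward_returns[asset])
    return {q: sum(r) / len(r) for q, r in sorted(buckets.items())}


# Hypothetical factor values and forward returns for ten assets.
factor = {"A": 0.1, "B": 0.2, "C": 0.3, "D": 0.4, "E": 0.5,
          "F": 0.6, "G": 0.7, "H": 0.8, "I": 0.9, "J": 1.0}
fwd = {"A": -0.02, "B": -0.01, "C": 0.00, "D": 0.00, "E": 0.01,
       "F": 0.01, "G": 0.02, "H": 0.02, "I": 0.03, "J": 0.04}

by_quantile = mean_return_by_quantile(factor, fwd)
for quantile, avg in by_quantile.items():
    print(quantile, round(avg, 4))
```

In this made-up data the average return rises from quantile 1 to quantile 5, which is the pattern you hope to see in the real charts.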
The following code creates a returns tear sheet.
```python
from alphalens.tears import create_returns_tear_sheet

create_returns_tear_sheet(merged_data)
```
Notice how quantile 5 doesn't have the highest returns. Ideally, you want quantile 1 to have the lowest returns, and quantile 5 to have the highest returns. Also, there is significant crossover between quantiles in the Cumulative Returns By Quantile chart. Ideally, there shouldn't be any crossover. This tear sheet is telling us we still have work to do!

In this lesson, you experienced a few cycles of the alpha discovery stage of the quant workflow. Making good alpha factors isn't easy, but Alphalens allows you to iterate through them quickly to find out if you're on the right track! You can usually improve existing alpha factors in some way by getting creative with moving averages, looking for trend reversals, or any number of other strategies.

Try looking around Quantopian's forums, or reading academic papers for inspiration. This is where you get to be creative! In the next lesson, we'll discuss advanced Alphalens concepts.
Advanced Alphalens concepts
You've learned the basics of using Alphalens. This lesson explores the following advanced Alphalens concepts:
1. Determining how far an alpha factor's predictive value stretches into the future.
2. Dealing with a common Alphalens error named MaxLossExceededError.
3. Grouping assets by sector, then analyzing each sector individually.
4. Writing group neutral strategies.
The following code creates an alpha factor in a pipeline. The rest of this lesson will discuss advanced Alphalens concepts using the data created by the pipeline.

Important note: Until this lesson, we passed the output of run_pipeline() to get_clean_factor_and_forward_returns() without any changes. This was possible because the previous lessons' pipelines only returned one column. This lesson's pipeline returns two columns, which means we need to specify the column we're passing as factor data. Look for commented code near get_clean_factor_and_forward_returns() in the following cell to see how to do this.
```python
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import factset
from quantopian.research import run_pipeline
from quantopian.pipeline.classifiers.fundamentals import Sector
from alphalens.utils import get_clean_factor_and_forward_returns

def make_pipeline():

    change_in_working_capital = factset.Fundamentals.wkcap_chg_qf.latest
    ciwc_processed = change_in_working_capital.winsorize(.2, .98).zscore()

    sales_per_working_capital = factset.Fundamentals.sales_wkcap_qf.latest
    spwc_processed = sales_per_working_capital.winsorize(.2, .98).zscore()

    factor_to_analyze = (ciwc_processed + spwc_processed).zscore()

    sector = Sector()

    return Pipeline(
        columns={
            'factor_to_analyze': factor_to_analyze,
            'sector': sector,
        },
        # Keep only assets that have both a factor value and a sector
        screen=(
            factor_to_analyze.notnull()
            & sector.notnull()
        )
    )

pipeline_output = run_pipeline(make_pipeline(), '2013-1-1', '2014-1-1')
# levels[1] holds the assets; levels[0] holds the dates
pricing_data = get_pricing(pipeline_output.index.levels[1], '2013-1-1', '2014-3-1', fields='open_price')

factor_data = get_clean_factor_and_forward_returns(
    pipeline_output['factor_to_analyze'], # How to analyze a specific pipeline column with Alphalens
    pricing_data,
    periods=range(1,32,3)
)
```
Visualizing an alpha factor's decay rate
A lot of fundamental data only comes out 4 times a year in quarterly reports. Because of this low frequency, it can be useful to increase the amount of time get_clean_factor_and_forward_returns() looks into the future to calculate returns.

Tip: A month usually has 21 trading days, a quarter usually has 63 trading days, and a year usually has 252 trading days.

Let's say you're creating a strategy that buys stock in companies with rising profits (data that is released every 63 trading days). Would you only look 10 days into the future to analyze that factor? Probably not! But how do you decide how far to look forward?
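One way to build the periods argument from the trading-day counts in the tip above is sketched here. The TRADING_DAYS table and the look_forward_periods() helper are hypothetical names for this example; the only real API involved is Python's built-in range().

```python
# Approximate number of trading days in each horizon.
TRADING_DAYS = {"week": 5, "month": 21, "quarter": 63, "year": 252}


def look_forward_periods(horizon, step):
    """Return forward-return periods (in trading days) up to `horizon`.

    range() excludes its stop value, so the longest period returned is
    always shorter than the horizon itself.
    """
    return list(range(1, TRADING_DAYS[horizon], step))


print(look_forward_periods("quarter", 5))
print(look_forward_periods("year", 20))
```

A larger step means fewer periods to compute, which matters because every extra period adds another column of forward returns.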

The following code charts our alpha factor's IC mean over time.
```python
from alphalens.performance import mean_information_coefficient

mean_information_coefficient(factor_data).plot(title="IC Decay");
```
The point where the line dips below 0 represents when our alpha factor's predictions stop being useful. What do you think the chart will look like if we calculate the IC a full year into the future?

*Hint*: This is a setup for section two of this lesson.
```python
factor_data = get_clean_factor_and_forward_returns(
    pipeline_output['factor_to_analyze'],
    pricing_data,
    periods=range(1,252,20) # The third argument to the range statement changes the "step" of the range
)

mean_information_coefficient(factor_data).plot()
```
Running the code above would produce a MaxLossExceededError.
Dealing With MaxLossExceededError
Oh no! What does "MaxLossExceededError: max_loss (35.0%) exceeded 88.4%, consider increasing it." mean?

get_clean_factor_and_forward_returns() aligns data from an alpha factor with forward looking returns data. This means we need our pricing data to go further into the future than our alpha factor data by at least as long as our forward looking period. In this case, we'll change get_pricing()'s end_date argument to be at least a year after run_pipeline()'s end_date argument.
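A back-of-the-envelope calculation shows why so many rows are lost. The sketch below assumes that the factor and pricing data start on the same trading day and counts how many factor dates still have a price the required number of days ahead. The fraction_of_rows_lost() helper and the day counts are assumptions for illustration; the sketch does not reproduce the exact 88.4% figure, which depends on the real trading calendar and on missing data.

```python
def fraction_of_rows_lost(factor_days, pricing_days, longest_period):
    """Fraction of factor dates with no price `longest_period` days ahead.

    All arguments are counts of trading days, and both series are assumed
    to start on the same day.
    """
    usable = max(0, min(factor_days, pricing_days - longest_period))
    return 1 - usable / factor_days


MAX_LOSS = 0.35  # the 35.0% threshold quoted in the error message

# One year of factor data, about two extra months of prices, and a longest
# forward period of 241 trading days (the last value of range(1, 252, 20)).
too_short = fraction_of_rows_lost(factor_days=252, pricing_days=252 + 42,
                                  longest_period=241)
print(round(too_short, 3), too_short > MAX_LOSS)

# Extending the prices to about thirteen extra months removes the loss.
long_enough = fraction_of_rows_lost(factor_days=252, pricing_days=252 + 273,
                                    longest_period=241)
print(long_enough, long_enough > MAX_LOSS)
```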

The following code shows how to make those changes.

```python
pipeline_output = run_pipeline(
    make_pipeline(),
    start_date='2013-1-1',
    end_date='2014-1-1' #  *** NOTE *** Our factor data ends in 2014
)

pricing_data = get_pricing(
    pipeline_output.index.levels[1], # levels[1] holds the assets
    start_date='2013-1-1',
    end_date='2015-2-1', # *** NOTE *** Our pricing data ends in 2015
    fields='open_price'
)

factor_data = get_clean_factor_and_forward_returns(
    pipeline_output['factor_to_analyze'],
    pricing_data,
    periods=range(1,252,20) # Change the step to 10 or more for long look forward periods to save time
)

mean_information_coefficient(factor_data).plot()
```
As you can see, this alpha factor's IC decays quickly after a few days, but comes back even stronger than before six months into the future. Interesting!

Note: MaxLossExceededError has two possible causes: forward returns computation and binning. We showed you how to fix forward returns computation here because it is much more common. You can read more about what binning is in the API docs.
Analyzing Alpha Factors By Group
Alphalens allows you to group assets using a classifier. A common use case for this is creating a classifier that specifies which sector each equity belongs to, then comparing your alpha factor's returns among sectors.

You can group assets by any classifier, but sector is most common. The pipeline in the first cell of this lesson returns a column named sector, whose values represent the corresponding Morningstar sector code. All we have to do now is pass that column to the groupby argument of get_clean_factor_and_forward_returns()

The following code shows how to make those changes.
```python
from alphalens.tears import create_returns_tear_sheet

# Map sector codes to names, and label the missing-sector code (-1) as "Unknown"
sector_labels = dict(Sector.SECTOR_NAMES)
sector_labels[-1] = "Unknown"

factor_data = get_clean_factor_and_forward_returns(
    factor=pipeline_output['factor_to_analyze'],
    prices=pricing_data,
    groupby=pipeline_output['sector'],
    groupby_labels=sector_labels,
)

create_returns_tear_sheet(factor_data=factor_data, by_group=True)
```
Once the factor is grouped by sector, you will see charts at the bottom of the tear sheet showing how our factor performs in different sectors.
Writing Group Neutral Strategies
Not only does Alphalens allow us to simulate how our alpha factor would perform in a long/short trading strategy, it also allows us to simulate how it would do if we went long/short on every group!

Grouping by sector, and going long/short on each sector, allows you to limit exposure to the overall movement of sectors. For example, you may have noticed in section three of this lesson that certain sectors had all positive returns, or all negative returns. That information isn't useful to us, because it just means the sector as a whole outperformed (or underperformed) the market; it doesn't give us any insight into how our factor performs within that sector.
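One way to picture group neutrality is to subtract each sector's average return from the returns of its members, so that only within-sector differences remain. The sketch below does that for four hypothetical assets. It is a rough illustration of the idea, not Alphalens' implementation, and the demean_by_group() helper and the data are made up.

```python
def demean_by_group(returns, groups):
    """Subtract each group's mean return from its members' returns."""
    totals, counts = {}, {}
    for asset, value in returns.items():
        group = groups[asset]
        totals[group] = totals.get(group, 0.0) + value
        counts[group] = counts.get(group, 0) + 1
    return {
        asset: value - totals[groups[asset]] / counts[groups[asset]]
        for asset, value in returns.items()
    }


# Hypothetical forward returns: every Technology stock went up and every
# Energy stock went down, regardless of factor value.
returns = {"A": 0.05, "B": 0.03, "C": -0.02, "D": -0.04}
groups = {"A": "Technology", "B": "Technology", "C": "Energy", "D": "Energy"}

neutral = demean_by_group(returns, groups)
for asset, value in neutral.items():
    print(asset, round(value, 4))
```

After demeaning, A and C both show a positive relative return even though C lost money in absolute terms, which is exactly the within-sector view described above.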

Since we grouped our assets by sector in the previous cell, going group neutral is easy; just make the following two changes:

1. Pass binning_by_group=True as an argument to get_clean_factor_and_forward_returns().
2. Pass group_neutral=True as an argument to the tear sheet function, such as create_full_tear_sheet() or create_returns_tear_sheet().

The following cell has made the appropriate changes. Try running it and notice how the results differ from the previous cell.
```python
factor_data = get_clean_factor_and_forward_returns(
    pipeline_output['factor_to_analyze'],
    prices=pricing_data,
    groupby=pipeline_output['sector'],
    groupby_labels=sector_labels,
    binning_by_group=True,
)

create_returns_tear_sheet(factor_data, by_group=True, group_neutral=True)
```
As you can see, the results are different when we go group neutral. You can sometimes get insights into why your alpha factor is behaving in a certain way by analyzing it in a group neutral fashion.
That's it!

The techniques you learned in this tutorial will help you identify good alpha factors. Use the template on the following page to create a few alpha factors, then try implementing them in the IDE to enter into the Quantopian Contest!
Quickstart Template
You've now learned what Alphalens is and how to use it. Clone this lesson's notebook to start analyzing factors of your own!

Below is an explanation of each block of code in the notebook. The notebook does not contain these explanations.
Define An Alpha Factor
The attached notebook is set up to analyze the alpha factor called factor_to_analyze in the pipeline below.

If you're confused: Don't worry, you only have to change one line of code in the entire notebook to get started!

Try altering this line of code: factor_to_analyze = (current_assets - assets_moving_average) to analyze a data field from our list of Fundamental data fields from FactSet. For example, you could change that line to factor_to_analyze = factset.Fundamentals.assets_gr_qf.latest, then run the rest of the cells in the notebook. By modifying that one line of code, you're now analyzing how a company's asset growth affects its stock price!
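```python
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import factset
from quantopian.pipeline.factors import SimpleMovingAverage
from quantopian.pipeline.classifiers.fundamentals import Sector
from quantopian.research import run_pipeline

def make_pipeline():

    assets_moving_average = SimpleMovingAverage(inputs=[factset.Fundamentals.assets], window_length=252)
    current_assets = factset.Fundamentals.assets.latest

    factor_to_analyze = (current_assets - assets_moving_average)

    sector = Sector()

    return Pipeline(
        columns={'factor_to_analyze': factor_to_analyze, 'sector': sector},
    )

factor_data = run_pipeline(make_pipeline(), '2015-1-1', '2016-1-1')
# levels[1] holds the assets; levels[0] holds the dates
pricing_data = get_pricing(factor_data.index.levels[1], '2015-1-1', '2016-6-1', fields='open_price')
```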
Determine The Decay Rate Of The Alpha Factor
The following chart shows your alpha factor's Information Coefficient over time. As a reminder, the IC is the most useful number for quantifying a given alpha factor's predictiveness.
```python
from alphalens.performance import mean_information_coefficient
from alphalens.utils import get_clean_factor_and_forward_returns

longest_look_forward_period = 63 # week = 5, month = 21, quarter = 63, year = 252
range_step = 5

merged_data = get_clean_factor_and_forward_returns(
    factor=factor_data['factor_to_analyze'],
    prices=pricing_data,
    periods=range(1, longest_look_forward_period, range_step)
)

mean_information_coefficient(merged_data).plot(title="IC Decay")
```
Create Group Neutral Tear Sheets
Run the following cells to create group neutral tear sheets for your alpha factor. If you don't know what group neutral means, please refer back to lesson #4 of this tutorial.

Ideally, you want the IC Mean line chart to constantly be above 0. You also want to see Quantile 1 consistently have the lowest returns, and quantile 5 consistently have the highest returns.

Lastly, be sure to take a look at the IC Mean and returns by sector to see if there are any major outliers in terms of performance.
```python
from alphalens.tears import create_information_tear_sheet, create_returns_tear_sheet

# Map sector codes to names, and label the missing-sector code (-1) as "Unknown"
sector_labels = dict(Sector.SECTOR_NAMES)
sector_labels[-1] = "Unknown"

merged_data = get_clean_factor_and_forward_returns(
    factor=factor_data['factor_to_analyze'],
    prices=pricing_data,
    groupby=factor_data['sector'],
    groupby_labels=sector_labels,
    binning_by_group=True,
    periods=(1,5,10)
)

create_information_tear_sheet(merged_data, by_group=True, group_neutral=True)
create_returns_tear_sheet(merged_data, by_group=True, group_neutral=True)
```
