Welcome
In this tutorial, we introduce Quantopian, the problems it aims to solve, and the tools it provides to help you solve those problems. At the end of this tutorial, you should have a high level understanding of what you can do with Quantopian.
The focus of the tutorial is to get you started, not to make you an expert Quantopian user. If you already feel comfortable with the basics of Quantopian, there are other resources to help you learn more about Quantopian's tools:
All you need to get started on this tutorial is some basic Python programming skills.
What is Quantopian?
Quantopian is a cloud-based software platform that allows you to research quantitative financial factors in developed and emerging equity markets around the world using Python. Quantopian makes it easy to iterate on ideas by supplying a fast, uniform API on top of all sorts of financial data. Additionally, Quantopian provides tools to help you upload your own financial datasets, analyze the efficacy of your factors, and share your findings with a global community of quants.
Typically, researching cross-sectional equity factors involves the following steps:
1. Define a universe of assets.
2. Define a financial factor over the universe.
3. Test the factor.
4. Share and discuss results with other quants with an eye toward improving your approach.
On Quantopian, steps 1 and 2 are achieved using the Pipeline API, step 3 is done using a tool called Alphalens, and step 4 is done in the Quantopian community forum. The rest of this tutorial will give a brief walkthrough of an end-to-end factor research workflow on Quantopian.
Research Environment
The code in this tutorial can be run in Quantopian's Research environment. Research is a hosted Jupyter notebook environment that allows you to interactively run Python code. Research comes with a mix of proprietary and open-source Python libraries pre-installed. To learn more about Research, see the documentation.
You can follow along with the code in this notebook by clicking Get Notebook at the top right corner of this lesson. Once you have the code in Research, press Shift+Enter to run each cell of code (grey boxes).
Step 1 - Define a universe of assets.
The first step to researching a cross-sectional equity factor is to select a “universe” of equities over which our factor will be defined. In this context, a universe represents the set of equities we want to consider when performing computations later. On Quantopian, defining a universe is done using the the Pipeline API. Later on, we will use the same API to compute factors over the equities in this universe.
The Pipeline API provides a uniform interface to several built-in datasets, as well as any custom datasets that we upload to our account. Pipeline makes it easy to define computations or expressions using built-in and custom data. For example, the following code snippet imports two built-in datasets, FactSet Fundamentals and FactSet Equity Metadata, and uses them to define an equity universe.
```from quantopian.pipeline.data.factset import Fundamentals, EquityMetadata

primary_shares = (is_share & is_primary)
market_cap = Fundamentals.mkt_val.latest

```
The above example defines a universe to be the top 1000 primary issue common stocks ranked by market cap. Universes can be defined using any of the data available on Quantopian. Additionally, you can upload your own data, such as index constituents or another custom universe to the platform using the Self-Serve Data tool. To learn more about uploading a custom dataset, see the Self-Serve Data documentation. For now, we will stick with the universe definition above.
Step 2 - Define a factor.
After defining a universe, the next step is to define a factor for testing. On Quantopian, a factor is a computation that produces numerical values at a regular frequency for assets in a universe. Similar to step 1, we will use the the Pipeline API to define factors. In addition to providing a fast, uniform API on top of pre-integrated and custom datasets, Pipeline also provides a set of built-in classes and methods that can be used to quickly define factors. For example, the following code snippet defines a momentum factor using fast and slow moving average computations.
```from quantopian.pipeline.data import EquityPricing
from quantopian.pipeline.factors import SimpleMovingAverage

# 1-month (21 trading day) moving average factor.
fast_ma = SimpleMovingAverage(inputs=[EquityPricing.close], window_length=21)

# 6-month (126 trading day) moving average factor.
slow_ma = SimpleMovingAverage(inputs=[EquityPricing.close], window_length=126)

# Divide fast_ma by slow_ma to get momentum factor and z-score.
momentum = fast_ma / slow_ma
momentum_factor = momentum.zscore()
```
Now that we defined a universe and a factor, we can choose a market and time period and simulate the factor. One of the defining features of the Pipeline API is that it allows us to define universes and factors using high level terms, without having to worry about common data engineering problems like adjustments, point-in-time data, symbol mapping, delistings, and data alignment. Pipeline does all of that work behind the scenes and allows us to focus our time on building and testing factors.
The code below creates a Pipeline instance that adds our factor as a column and screens down to equities in our universe. The Pipeline is then run over the US equities market from 2016 to 2019.
```from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import EquityPricing
from quantopian.pipeline.domain import US_EQUITIES
from quantopian.pipeline.factors import SimpleMovingAverage

primary_shares = (is_share & is_primary)
market_cap = Fundamentals.mkt_val.latest

# 1-month moving average factor.
fast_ma = SimpleMovingAverage(inputs=[EquityPricing.close], window_length=21)

# 6-month moving average factor.
slow_ma = SimpleMovingAverage(inputs=[EquityPricing.close], window_length=126)

# Divide fast_ma by slow_ma to get momentum factor and z-score.
momentum = fast_ma / slow_ma
momentum_factor = momentum.zscore()

# Create a US equities pipeline with our momentum factor, screening down to our universe.
pipe = Pipeline(
columns={
'momentum_factor': momentum_factor,
},
screen=universe,
domain=US_EQUITIES,
)

# Run the pipeline from 2016 to 2019 and display the first few rows of output.
from quantopian.research import run_pipeline
factor_data = run_pipeline(pipe, '2016-01-01', '2019-01-01')
print("Result contains {} rows of output.".format(len(factor_data)))

```
Running the above code in Research will produce a pandas dataframe, stored in the variable factor_data, and display the first few rows of its output. The dataframe contains a momentum factor value per equity per day, for each equity in our universe, based on the definition we provided. Now that we have a momentum value for each equity in our universe, and each day between 2016 and 2019, we can test to see if our factor is predictive.
Note: Due to licensing restrictions, some datasets on Quantopian have holdouts on the most recent year or two of data. Each dataset is documented with the length of holdout on recent data. For instance, FactSet Fundamentals has the most recent year of data held out. Holdouts to not apply to Quantopian Enterprise.
The following image shows this resulting dataframe. Step 3 - Test the factor.
The next step is to test the predictiveness of the factor we defined in step 2. To determine if our factor is predictive, we calculate the forward returns for the factor's assets over the factor's dates. We then pass the factor and the forward returns data into Alphalens. The following code cell shows how to get this returns data and send it to Alphalens.
```from quantopian.research import get_forward_returns
import alphalens as al

# Get the 1-day forward returns for the assets and dates in the factor
returns_df = get_forward_returns(
factor_data['momentum_factor'],
,
US_EQUITIES
)

# Format the factor and returns data so that we can run it through Alphalens.
al_data = al.utils.get_clean_factor(
factor_data['momentum_factor'],
returns_df,
quantiles=5,
bins=None,
)
```
Then, we can create a factor tearsheet to analyze our momentum factor.
```from alphalens.tears import create_full_tear_sheet

create_full_tear_sheet(al_data)
```   The Alphalens tearsheet offers insight into the predictive ability of a factor. To learn more about Alphalens, check out the documentation.
Step 4 - Discuss with the Quantopian Community
When we have a factor that we like, we can share the result in the Quantopian community forum and solicit feedback from community members. The ideas you come up with on Quantopian belong to you, but sometimes sharing a result can spark a discussion and create an opportunity to learn from others.
In the community forum, Research notebooks can be attached to posts. If you want to share the result of your work and the code, you can share your notebook as is. If you want to keep the code a secret, you can create a copy of your notebook, run your factor through Alphalens, delete the code cells that have your Pipeline code, and just share the Alphalens result in a community post.
If you want to share your work or your result in the community, adding an explanation or asking questions will make it more likely to have a productive conversation with other users.
Recap and Next Steps
In this tutorial, we introduced Quantopian and walked through a factor research workflow using Pipeline and Alphalens. Quantopian has a rich set of documentation which you can use to learn more about the platform. We recommend starting with the User Guide section of the documentation if you would like to grow your understanding of Quantopian or the Data Reference if you want to learn more about the data that's available to you out of the box.