# Pipeline¶

Many algorithms depend on calculations that follow a specific pattern:

Every day, for some set of data sources, fetch the last `N` days’ worth of data for a large number of assets and apply a reduction function to produce a single value per asset.

This kind of calculation is called a cross-sectional trailing-window computation: "cross-sectional" because one value is computed for each asset, and "trailing-window" because data is fetched over a trailing window. (Alpha factors are cross-sectional trailing-window computations.)

A simple example of a cross-sectional trailing-window computation is “close-to-close daily returns”, which has the form:

Every day, fetch the last two days of close prices for all assets. For each asset, calculate the percent change between the asset’s previous close price and its current close price.
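This recipe can be sketched outside of pipeline with plain pandas (the tickers and prices here are invented for illustration):

```
import pandas as pd

# Hypothetical close prices: one row per day, one column per asset.
closes = pd.DataFrame(
    {"AAA": [10.0, 11.0], "BBB": [20.0, 19.0]},
    index=pd.to_datetime(["2017-01-03", "2017-01-04"]),
)

# Fetch the last two days of closes and compute the percent change
# between the previous close and the current close, per asset.
daily_returns = (closes.iloc[-1] - closes.iloc[-2]) / closes.iloc[-2]
```

The result is a cross-section: one value per asset (10% for `AAA`, -5% for `BBB`).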

The purpose of the Pipeline API is to make it easy to define and execute cross-sectional trailing-window computations.

## Basic Usage¶

Working with pipeline is generally done in two parts: defining an object of class `Pipeline` and running that pipeline object over some period of time. You can think of defining a pipeline like defining a mathematical expression with variables, like `f = 2*x + y - 3*z`. Defining a pipeline is like defining a mathematical expression with more domain specific variables: `f = 2*close_price + earnings_yield - 3*sentiment_score`. Running a pipeline is the equivalent of plugging numbers into those variables and evaluating the result (but running a pipeline usually involves plugging in millions of values!).

Defining a pipeline is done in three steps:

1. Importing data.
2. Defining computations.
3. Instantiating a pipeline.

While each step is explored in detail further in the docs, it is often easiest to start by walking through an example.

In the example below, pipeline is used to describe a computation of 10-day and 30-day simple moving averages of daily close prices for all US equities trading on a supported exchange. The computation is then filtered down to just equities with a 10-day average price of \$5 or more.

```
# Import pipeline built-ins.
from quantopian.pipeline import Pipeline
from quantopian.pipeline.factors import SimpleMovingAverage

# Import datasets.
from quantopian.pipeline.data import EquityPricing

# Define factors.
sma_10 = SimpleMovingAverage(inputs=[EquityPricing.close], window_length=10)
sma_30 = SimpleMovingAverage(inputs=[EquityPricing.close], window_length=30)

# Define a filter.
prices_over_5 = (sma_10 > 5)

# Instantiate pipeline with two columns corresponding to our two factors, and a
# screen that filters the result down to assets where sma_10 > $5.
pipe = Pipeline(
    columns={
        'sma_10': sma_10,
        'sma_30': sma_30,
    },
    screen=prices_over_5,
)
```

The above example constructs a `Pipeline` representing the definition of the computations described earlier, not the results of those computations.

Note

Under the hood, the `Pipeline` object describes a directed acyclic graph (DAG) of data inputs and transformations that correspond to the computation definitions provided to the pipeline constructor. Pipeline knows how to structure the graph to maximize the efficiency and speed of performing the computations that it describes.

In order for the computations in a pipeline definition to be executed, the `Pipeline` object needs to be run.

Running a pipeline in Research requires explicit start and end dates to be passed. The example pipeline above can be run in Research from 01/01/2017 to 01/01/2018 with the following code:

```
from quantopian.research import run_pipeline

# Pipeline definition goes here.

my_pipeline_result = run_pipeline(pipe, '2017-01-01', '2018-01-01')
```

In the IDE, pipelines are attached to algorithms and automatically executed for each day of a backtest. The same example pipeline from above can be "attached" to an algorithm like this:

```
from quantopian.algorithm import attach_pipeline

# Pipeline definition goes here.

def initialize(context):
    attach_pipeline(pipe, 'my_pipeline')
```

This separation between defining and running a pipeline allows the Pipeline API to be the same in both Research and the IDE. Having a common API for defining a pipeline in both environments means you can research and analyze alpha factors in the Research environment and then copy your pipeline code over to the IDE for backtesting.

Generally speaking, developing a pipeline in Research is more interactive and much easier to debug than developing a pipeline in the IDE. As such, it is best to start by defining a pipeline in Research until you have one that works as expected, then copy that pipeline into an algorithm in the IDE.

## Defining a Pipeline¶

As mentioned above, defining a pipeline is done in three steps:

1. Importing data.
2. Defining computations (where the vast majority of your time will be spent).
3. Instantiating a pipeline.

The sections below explore each of these steps in more detail.

### Importing Data¶

#### DataSets¶

Before defining a pipeline, you will need to import any data that you want to use. In pipeline, datasets are imported as `DataSet` objects. Pipeline `DataSets` are collections of objects that tell the Pipeline API where and how to find the inputs to computations. Importantly, a `DataSet` does not hold actual data. Since these objects generally correspond to database columns, the attributes of a `DataSet` are referred to as "columns".

DataSets can be imported using the usual Python import syntax; for example, the following code imports the EquityPricing DataSet.

```
from quantopian.pipeline.data import EquityPricing
```

The full list of importable DataSets can be found in the Data Reference.

#### BoundColumns¶

After importing the DataSet (or DataSets) you want to use, the next step is to reference the field(s) that you want to use from that DataSet. In pipeline, each "field" of data is represented as a `BoundColumn`. A `BoundColumn` is a column of data that is concretely bound to a DataSet. Instances of BoundColumns are dynamically created upon access to attributes of DataSets. Inputs to pipeline computations are most commonly of type `BoundColumn`, so it is important to understand how to access a `BoundColumn` in pipeline. The code snippet below imports the `EquityPricing` dataset and accesses one of its attributes to get a reference to a `BoundColumn`.

```
# Import the EquityPricing DataSet.
from quantopian.pipeline.data import EquityPricing

# Access the EquityPricing close attribute to instantiate a
# BoundColumn. Note that a BoundColumn DOES NOT store data; it
# is instead used to inform the pipeline engine where to retrieve
# the data when performing computations. Printing daily_close
# will not display daily close prices.
daily_close = EquityPricing.close
```

#### dtypes¶

Each `BoundColumn` on a `DataSet` has a particular `np.dtype` (short for 'data type'). Valid `BoundColumn` dtypes include `float64`, `int64`, `bool`, `datetime64`, and `object` (representing string values). Since a BoundColumn does not actually contain data, it has a specified dtype so that pipeline knows the type of data that will populate the field when the pipeline is run. The dtype is vital to pipeline because it dictates what computations can be applied to the field. For example, you can take the sum of two float-type fields, but you cannot sum a float-type field and an object-type (string) field.
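The float-versus-string restriction is easy to demonstrate with plain numpy arrays (the values below are made up):

```
import numpy as np

floats_a = np.array([1.5, 2.5], dtype="float64")
floats_b = np.array([0.5, 0.5], dtype="float64")
labels = np.array(["NYS", "NAS"], dtype=object)

# Summing two float64 arrays is well-defined...
total = floats_a + floats_b

# ...but adding a float64 array to an object (string) array raises.
try:
    floats_a + labels
except TypeError:
    print("cannot add float64 and object (string) values")
```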

The dtype of a BoundColumn can also determine the type of a computation. In the case of the `Latest` computation, the dtype determines whether the computation is a Factor, a Filter, or a Classifier.

#### DataSetFamily¶

Some datasets are accessible as a `DataSetFamily` instead of as a `DataSet`. A `DataSetFamily` is like a collection of `DataSets`, where each member dataset has the same columns. Each member of a family is identified by a tuple of named attributes, called its coordinates. To select a member from a `DataSetFamily`, you call the family's `slice()` method, passing the coordinates of the desired member.

For example, geographic revenue exposure data is accessible via the GeoRev `DataSetFamily`. To select the `DataSet` containing data for estimated revenue exposure to the Western European Union, you would write:

```
# Import the GeoRev DataSetFamily.
from quantopian.pipeline.data.factset import GeoRev

# GeoRev is a DataSetFamily, gr_weu is a DataSet.
gr_weu = GeoRev.slice(region='WESTERN EUROPEAN UNION')

# You can also pass slice coordinates positionally.
gr_weu = GeoRev.slice('WESTERN EUROPEAN UNION')

# Once you have a DataSet, you can access a bound column like
# you would with any other DataSet (by accessing an attribute).
# est_exposure_weu is a BoundColumn.
est_exposure_weu = gr_weu.est_pct
```

Some data on Quantopian is accessible as a `DataSet` while other data is accessible as a `DataSetFamily`. Generally speaking, `DataSetFamily` integrations are reserved for data that is structured in such a way that certain variables (like a categorical label) need to be fixed in order to select a logical daily timeseries of data. To learn more about what this means and some of the background behind the design of the `DataSetFamily` object, see the explanation in this forum post.

Note

Each data integration on Quantopian has its own page in the Data Reference. To determine if a particular integration is available as a `DataSetFamily` or a `DataSet`, navigate to the Pipeline Datasets and Columns section of the reference page for the relevant data source.

#### Custom Data¶

Self Serve Data

In addition to built-in data, you can upload your own data to Quantopian using the Self Serve Data tool. Custom data uploaded via Self Serve is usable in pipeline. To learn more about using custom data in pipeline, see the Self Serve Data section of the documentation.

### Defining Computations¶

Once you've imported the data that you want to use, you will need to define the computations your pipeline should compute each day.

These transformations are referred to as pipeline terms. Pipeline terms are implemented via Factors, Filters, and Classifiers.

#### Factors¶

A `Factor` is a function from an asset and a moment in time to a number:

f(asset, timestamp) --> numerical value

An instance of the `Factor` class must be defined before it can be used in a pipeline. A set of built-in factors is available out of the box. If you want to perform a computation that doesn't exist as a built-in, you can define your own custom factor.

Every Factor stores four pieces of state:

1. `inputs`: A list of `BoundColumn` objects and/or other pipeline terms (factor, filter, or classifier) describing the inputs to the factor.
2. `window_length` : An integer describing how many rows of historical data the Factor needs to be provided each day.
3. `dtype`: A `np.dtype` object representing the type of values computed by the Factor. Most factors are of dtype `float64`, indicating that they produce numerical values represented as 64-bit floats. Factors can also be of dtype `datetime64[ns]`. Factors default to dtype `float64`.
4. A `compute` function that operates on the data described by `inputs` and `window_length`. When a factor is computed for a day on which there are `N` assets in the Quantopian database, the underlying pipeline engine provides that factor's `compute` function a two-dimensional array of shape `(window_length, N)` for each input in `inputs`. The job of the `compute` function is to produce a one-dimensional array of length `N` as an output.
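The shape contract in step 4 can be sketched with plain numpy standing in for a hypothetical compute body (the values are arbitrary):

```
import numpy as np

window_length, n_assets = 10, 4

# One (window_length, N) array per input: here, 10 days of closes
# for 4 assets.
closes = np.arange(window_length * n_assets, dtype="float64").reshape(
    window_length, n_assets
)

# A compute body reduces each column (one column per asset) to a
# single value, producing a length-N output.
out = np.empty(n_assets)
out[:] = closes.mean(axis=0)  # column-wise mean, shape (4,)
```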

The `dtype` and `compute` pieces of state should be provided when defining the `Factor` object, since those are "inherent" to the Factor. However, `inputs` and `window_length` can be provided either as defaults when defining the `Factor` class object or when the `Factor` object is called in your pipeline (since a factor might be applied over many different inputs or different window lengths).

For example, consider the built-in `SimpleMovingAverage` factor:

```
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import EquityPricing
from quantopian.pipeline.factors import SimpleMovingAverage

mean_close_10 = SimpleMovingAverage(
    inputs=[EquityPricing.close],
    window_length=10,
)
```

In this example, the `dtype` is `float64`, and the `compute` function is the simple moving average (both of which are defined in the built-in `SimpleMovingAverage` object). In the code snippet above, the `inputs` argument is defined as the `close` column of the `EquityPricing` DataSet, and the `window_length` as 10 days.
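The day-by-day behavior of a short simple moving average can be mimicked with pandas' rolling mean (prices invented):

```
import pandas as pd

closes = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Each day, average the trailing 3 closes; the first two days have
# incomplete windows and yield NaN.
sma_3 = closes.rolling(window=3).mean()
```

The final value is the mean of the last three closes, `(3 + 4 + 5) / 3 = 4.0`.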

#### Built-In Factors¶

Built-in factors are available for many common operations, such as computing a simple moving average. Built-in factors can be imported from the `quantopian.pipeline.factors` module.

For example, the built-in `Returns` factor computes close to close returns over a specified window length.

```
from quantopian.pipeline import Pipeline
from quantopian.pipeline.factors import Returns

# The default inputs argument for Returns is EquityPricing.close.
returns_2w = Returns(
    window_length=11,
)
```

For a complete list of built-in factors available on Quantopian, see the API Reference.

#### CustomFactors¶

For operations not available as built-in factors, you can build your own `CustomFactor`. As outlined above, you'll need to provide at least the `compute` function when defining your `CustomFactor`.

For example, consider this standard deviation CustomFactor:

```
import numpy

from quantopian.pipeline import CustomFactor


class StdDev(CustomFactor):
    def compute(self, today, asset_ids, out, values):
        # Calculate the column-wise standard deviation, ignoring NaNs.
        out[:] = numpy.nanstd(values, axis=0)
```

Here, the `compute` function is defined using `numpy.nanstd()` and the `dtype` defaults to `float64` (this is the default of any custom factor unless you choose to override it).
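You can check the column-wise, NaN-ignoring behavior of `numpy.nanstd` directly (toy values):

```
import numpy as np

# 3 days of made-up values for 2 assets; one NaN in the second column.
values = np.array([
    [1.0, 2.0],
    [3.0, np.nan],
    [5.0, 4.0],
])

# axis=0 reduces over the window (rows), yielding one value per asset.
# The NaN in the second column is simply skipped.
stds = np.nanstd(values, axis=0)
```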

The `inputs` and `window_length` can be defined upon instantiation of the `CustomFactor`, but it's also possible to define one or both as defaults within the `CustomFactor` class:

```
import numpy

from quantopian.pipeline import CustomFactor
from quantopian.pipeline.data import EquityPricing


class StdDev(CustomFactor):
    inputs = [EquityPricing.close]
    window_length = 5

    def compute(self, today, asset_ids, out, values):
        # Calculate the column-wise standard deviation, ignoring NaNs.
        out[:] = numpy.nanstd(values, axis=0)
```

The `CustomFactor` object can then be instantiated as follows:

```
std_dev_5 = StdDev()
```

Many other functions beyond `numpy.nanstd` can be used in `compute`; the `compute` function can be any function that reduces a window of values to a single numerical value per asset.

In this example, `dtype` defaulted to `float64`. However, it might be necessary to set `dtype` to `datetime64` if you expect the output of your factor to be a datetime (this is the only time you should override the default `dtype`). For example:

```
import numpy as np

from quantopian.pipeline import CustomFactor


class MyDateFactor(CustomFactor):
    dtype = np.dtype('datetime64[ns]')

    def compute(self, today, assets, out, inputs):
        ...
```

In most cases, CustomFactors are used to perform more complex operations on fields. If you need to combine fields using basic operations (addition, multiplication, etc.), see Combining Factors.

Note

Instances of built-in factors and custom factors are both instances of the pipeline `Factor`. They both store all four pieces of state (`inputs`, `dtype`, `window_length`, `compute`) and they all have access to the same set of factor methods.

#### Combining Factors¶

Factors can be combined, both with other factors and with scalar values, via any of the basic mathematical operators (`+`, `-`, `*`, etc). This makes it easy to write complex expressions that combine multiple factors. For example, constructing a factor that computes the average of two other factors is simply:

```
f1 = SomeFactor(...)
f2 = SomeOtherFactor(...)
average = (f1 + f2) / 2.0
```

It is generally preferable to combine factors with the basic mathematical operators rather than defining a CustomFactor to achieve the same result, since the operator form is simpler to read.

Note

Any factors can be combined using basic mathematical operators, regardless of whether they are built-in or custom factors.

#### Using Factor Methods¶

Each instance of the `Factor` class has several methods that can be used to perform transformations commonly applied to a timeseries of numerical values. Some of the more popular `Factor` methods include `zscore()`, `percentile_between()`, and `winsorize()`. The full set of available factor methods is listed in the `Factor` definition in the API Reference.

Some factor methods support transformations that result in a new factor (e.g. `zscore()` and `winsorize()`), while others support a transformation that returns a `Filter` (e.g. `percentile_between()`). Checking the return type of each transformation before using it is important so you know how to use the output properly in your pipeline.

#### Slicing Factors¶

Note

In this section, we refer to slicing a factor, which is a different operation than slicing a DataSetFamily (referenced earlier on this page).

In certain situations, you might want to use the output from a factor for one asset as the input to another. Using a technique called "slicing", it is possible to extract the values of a `Factor` for a single asset. For example, you might want to regress a particular factor against the returns of SPY (an ETF that tracks the S&P500 index). Slices are created by indexing into a factor by asset; this action creates an object of the `Slice` class. These `Slice` objects are then used as an input to a CustomFactor.

Note

Only slices of certain factors can be used as inputs. These factors include `Returns` and any factors created from `rank()` or `zscore()`. The reason for this is that these factors produce normalized values, so they are safe for use as inputs to other factors.

When a `Slice` object is used as an input to a custom factor, it always returns an N x 1 column vector of values, where `N` is the window length. For example, a slice of a `Returns` factor would output a column vector of the `N` previous returns values for a given security.

Each day, a slice only computes a value for the single asset with which it is associated, whereas ordinary factors compute a value for every asset. As such, slices cannot be added as a column to a pipeline.
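The role a `Slice` plays as a regression input can be mimicked with plain numpy: each asset's trailing returns (a `(window_length, N)` array) is regressed against a single benchmark column, the analogue of a sliced `Returns` factor. All numbers below are invented:

```
import numpy as np

# Trailing returns for 3 assets over a 3-day window.
asset_returns = np.array([
    [0.01, 0.02, -0.01],
    [0.02, 0.04, -0.02],
    [0.03, 0.06, -0.03],
])
# A single benchmark column (what a Slice of Returns would provide).
benchmark = np.array([0.01, 0.02, 0.03])

# Per-asset beta: cov(asset, benchmark) / var(benchmark).
demeaned_b = benchmark - benchmark.mean()
demeaned_a = asset_returns - asset_returns.mean(axis=0)
betas = demeaned_b @ demeaned_a / (demeaned_b @ demeaned_b)
```

Here the first asset moves one-for-one with the benchmark (beta 1), the second at twice the rate (beta 2), and the third inversely (beta -1).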

Cookbook recipe: slicing a factor.

#### Filters¶

Like a factor, a `Filter` is a transformation of input data. The difference between filters and factors is that filters are functions that produce boolean-valued outputs, whereas factors produce numerical or datetime-valued outputs:

f(asset, timestamp) --> boolean

In general, filters are used for narrowing down the set of assets included in a computation or in the final output of a pipeline.

There are two common ways to create a `Filter`: comparison operators and built-in `Factor`/`Classifier` methods.

#### Comparison Operators¶

Just like you can filter pandas DataFrames with comparison operators, you can filter pipelines with comparison operators (`>`, `<`, `==`, etc.). For example, the following code would create a filter, `close_price_filter`, that returns `True` for all equities with close prices over \$20 on a particular day:

```
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import EquityPricing

# Define a factor representing the most recent close price (yesterday's close).
last_close_price = EquityPricing.close.latest

# Define a filter that returns True each time last_close_price returns a value
# greater than 20.
close_price_filter = (last_close_price > 20)
```

#### Factor/Classifier Methods¶

Various methods of the `Factor` and `Classifier` classes return a `Filter`. For example, the `top()` method produces a Filter that returns `True` for the top `N` securities of a given factor each day. The following example produces a filter that returns `True` for 200 assets every day, indicating that those assets were in the top 200 by last close price across all known assets:

```
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import EquityPricing

last_close_price = EquityPricing.close.latest

# This is a filter.
top_close_price_filter = last_close_price.top(200)
```

The `percentile_between()` method is another example of a Factor method that produces a `Filter`. For a full list of Factor methods that return Filters, see the methods of `Factor`.

You can also use comparison operators with `Classifiers` (described further down on this page) using the `eq()` method. For example, the following code would create a Filter `nyse_filter` that returns `True` for all stocks traded on the NYSE on a particular day:

```
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import Fundamentals

# Since the underlying data of Fundamentals.exchange_id
# is of dtype 'object', .latest returns a Classifier.
exchange = Fundamentals.exchange_id.latest

# The Classifier method `eq` returns a filter that outputs True
# each time our classifier outputs 'NYS'.
nyse_filter = exchange.eq('NYS')
```
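The `eq()` comparison behaves like elementwise string equality on an array of labels (the exchange codes below are invented):

```
import numpy as np

# Hypothetical latest exchange labels, one per asset.
exchange = np.array(["NYS", "NAS", "NYS", "LON"], dtype=object)

# Like Classifier.eq('NYS'): one boolean per asset.
nyse_filter = exchange == "NYS"
```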

Classifier methods like `isnull()` and `startswith()` also produce Filters. For a full list of Classifier methods that return filters, see the methods of `Classifier`.

#### Built-In Filters¶

There are several built-in Filters that filter assets based on liquidity, SID, and more.

One notable built-in Filter is the Quantopian Tradable Universe, which screens out illiquid stocks. The Quantopian Tradable Universe is the recommended tradable universe to use when researching strategies on Quantopian. You can access the Quantopian Tradable Universe filter as `QTradableStocksUS()`. For example,

```
from quantopian.pipeline.filters import QTradableStocksUS

base_universe = QTradableStocksUS()
```

In this example, `base_universe` would be a Filter that you could add to a Pipeline in order to narrow your trading universe to the Quantopian Tradable Universe.

Note

In order to enter the contest or be eligible for a capital allocation, your algorithm must trade within the Quantopian Tradable Universe.

For a full list of built-in Filters, see the API Reference.

#### Combining Filters¶

Like factors, filters can be combined. Combining filters is done using the `&` (and) and `|` (or) operators. For example, the following code will screen for securities that are in the top 10% of average dollar volume and have a latest close price above \$20:

```
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import EquityPricing
from quantopian.pipeline.factors import AverageDollarVolume

dollar_volume = AverageDollarVolume(window_length=63)
high_dollar_volume = dollar_volume.percentile_between(90, 100)

latest_close = EquityPricing.close.latest
above_20 = latest_close > 20

# Combine the two filters with `&` (and).
tradeable_filter = high_dollar_volume & above_20
```

This filter will evaluate to `True` for securities where both `high_dollar_volume` and `above_20` are `True`. Otherwise, it will evaluate to `False`. A similar computation can be made with the `|` (or) operator.

Note

You must use `&` and `|` to combine filters. The keywords `and` and `or` are not supported when combining pipeline filters.

#### Masking¶

Sometimes, it is better to ignore certain assets when computing pipeline expressions. There are two common cases where ignoring assets is useful:

1. An expression is computationally expensive, and results are only relevant for certain assets. A common example of such an expensive expression is a factor computing the coefficients of a regression (`linear_regression()`).
2. An expression performs comparisons between assets, but comparisons should only be performed among a subset of all assets. For example, using the `top()` method to compute the top 200 assets by earnings yield, ignoring assets that don't meet some liquidity constraint.

To support these two use cases, all factors and many factor methods accept an optional `mask` argument, which must be a `Filter` indicating which assets to consider when computing.

For example, let's define a pipeline that computes the top 200 assets ranked by market cap, but let's restrict that computation to only consider assets that are in the top 50% of assets ranked by average dollar volume. To do this, begin by creating a `high_dollar_volume` filter. This filter can then be supplied to the `mask` argument of `top`.

```
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.factset import Fundamentals
from quantopian.pipeline.factors import AverageDollarVolume

dollar_volume = AverageDollarVolume(window_length=63)
high_dollar_volume = dollar_volume.percentile_between(50, 100)

mcap = Fundamentals.mkt_val.latest

# Restrict the top-200 computation to assets passing the filter.
mcap_top_200 = mcap.top(200, mask=high_dollar_volume)
```

Applying the mask to `mcap.top` restricts the `top()` method to only return the top 200 assets within the ~4000 assets passing the `high_dollar_volume` filter, as opposed to considering all ~8000 without a mask. Since `mcap_top_200` is another filter, you could then pass it as a `mask` to another computation if you wanted.
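The effect of a mask on `top()` can be sketched with plain numpy: masked-out assets are excluded before the top-N cut is taken (the values are invented, with top 2 standing in for top 200):

```
import numpy as np

# Made-up market caps for 6 assets and a liquidity mask.
mcap = np.array([50.0, 10.0, 80.0, 30.0, 90.0, 20.0])
high_dollar_volume = np.array([True, False, True, True, False, True])

# Exclude masked-out assets before ranking, then take the top 2.
masked_mcap = np.where(high_dollar_volume, mcap, -np.inf)
top_2 = np.zeros(len(mcap), dtype=bool)
top_2[np.argsort(masked_mcap)[-2:]] = True
```

Note that the asset with the largest market cap (90.0) is not selected, because it fails the liquidity mask.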

#### CustomFilters¶


For boolean-output operations that cannot be expressed using comparison operators or factor/classifier methods, you can build your own `CustomFilter`. Defining a `CustomFilter` is just like defining a `CustomFactor`, except the `dtype` is `bool` and the `compute` function must produce a boolean output for each asset. Note that the need to define a `CustomFilter` is very uncommon. Whenever possible, you should define a filter using comparison operators or built-in methods to keep your code simple.

#### Classifiers¶

A `Classifier` is a function from an asset and a moment in time to a categorical output such as a string or integer label:

f(asset, timestamp) --> category

Note

Classifiers and filters are similar in that they both return non-numeric outputs. However, filters specifically return booleans. Additionally, filters are almost always used to filter data, while Classifiers are most often used to group data.

Classifiers are most commonly created by accessing the `.latest` attribute on a `BoundColumn` of `dtype` `int64` or `object` (string). An example of a classifier producing a string output is the exchange of a security. To create this classifier, begin by importing the `EquityMetadata` dataset. Then, use the `latest` attribute to instantiate a classifier returning the latest exchange on which each asset trades:

```
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import EquityMetadata

# Since the underlying data of EquityMetadata.listing_exchange
# is of type object (string), .latest returns a classifier.
exchange = EquityMetadata.listing_exchange.latest
```

Another way to define a pipeline classifier is with a factor method. Factor methods like `quantiles()` result in a `Classifier`. For a full list of Factor methods that result in Classifiers, see the API Reference.

Note

If the underlying data of a `BoundColumn` is numeric, `latest` returns a Factor. If it is string-type or integer labels, `latest` returns a Classifier.

Note

At this time, CustomClassifiers are not available.

#### Grouping with Classifiers¶

An important application of classifiers in pipeline is grouping. For example, you might want to compute earnings yield across all known assets and then normalize the result by dividing each asset's earnings yield by the mean earnings yield for that asset's sector or industry.

In the same way that the optional `mask` parameter allows you to modify the behavior of `demean()` to ignore certain values, the `groupby` parameter allows you to specify that normalizations should be performed on subsets of rows rather than across all assets at once.

In the same way that you pass a `Filter` to define values to ignore, you can pass a `Classifier` to define how to partition up the rows of the `Factor` being normalized. Note that this is only applicable on methods that take a `groupby` argument. See the API Reference to see which functions take a `groupby` argument.
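A plain-pandas analogue of a grouped normalization, with invented sector labels standing in for the classifier:

```
import pandas as pd

# Made-up earnings yields and sector labels for four assets.
earnings_yield = pd.Series([0.02, 0.04, 0.10, 0.20],
                           index=["AAA", "BBB", "CCC", "DDD"])
sector = pd.Series(["tech", "tech", "energy", "energy"],
                   index=earnings_yield.index)

# groupby plays the role of the classifier: each asset is demeaned
# against its own sector's mean rather than the full cross-section.
demeaned = earnings_yield - earnings_yield.groupby(sector).transform("mean")
```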

### Instantiating a Pipeline¶

Once you've defined your computations using Factors, Filters, and Classifiers, you will need to instantiate your pipeline. Begin by importing the `Pipeline` class:

```
from quantopian.pipeline import Pipeline
```

After importing `Pipeline`, instantiate your pipeline. The constructor takes three optional arguments:

1. `columns`: A dictionary; keys are column names, values are pipeline terms (factor, filter, or classifier).
2. `domain`: A `Domain` that specifies the set of assets and a corresponding trading calendar over which the expressions of a pipeline should be computed.
3. `screen`: A `Filter` that gets applied as a post-processing screen on the pipeline output.

Each of the three `Pipeline` constructor arguments is described below.

#### Columns¶

The `columns` argument tells the pipeline engine which pipeline terms should be included in the output `pd.DataFrame` when the pipeline is executed. When you run a pipeline, the pipeline engine figures out the most efficient way to compute the requested results, including any intermediate terms on which the output `columns` depend (for example, pipeline might need to compute a filter before applying that filter as a `mask` on a factor).

The `columns` argument must be a dictionary where the keys are column names (string) and the values are pipeline terms (factor, filter, or classifier).

#### Domain¶

The `domain` argument informs the pipeline engine about the set of inputs that should be processed. Concretely, the domain of a pipeline controls two things:

1. The calendar to which the pipeline's input rows are aligned.
2. The set of assets to which the pipeline's input columns are aligned.

The `domain` argument must be a `Domain` object. The set of domains supported on Quantopian can be found in the Data Reference. Currently, each domain on Quantopian corresponds to a country's stock market. However, it is possible that domains corresponding to other sets of inputs might be added to Quantopian in the future.

Currently, all supported domains are importable from `quantopian.pipeline.domain`. Each country's domain is named `XX_EQUITIES`, with `XX` replaced by the country's two-letter country code. For example, you can define a pipeline to be run over the Japanese equities domain like this:

```
from quantopian.pipeline import Pipeline
from quantopian.pipeline.domain import JP_EQUITIES

pipe = Pipeline(columns={}, domain=JP_EQUITIES)
```

Note

If no domain is specified, a pipeline's domain will default to `US_EQUITIES`, representing the US equity market.

Note

For the mathematically-inclined, the name "domain" refers to the mathematical concept of the domain of a function, which is the set of potential inputs to a function. For more information about the design of domains, see the public design document on GitHub.

#### Screen¶

The `screen` argument is used to apply a post-processing filter to the output dataframe of a pipeline execution. Unlike the `columns` and `domain` arguments, `screen` is a convenience feature that doesn't actually affect the pipeline execution.

The `screen` argument must be a pipeline `Filter`. Once the pipeline has been executed successfully, any assets for which the supplied filter yields `False` will be dropped from the output dataframe. Since this is a post-processing step, it is often helpful to not supply a `screen` to a pipeline and instead do manual filtering on the pipeline result. For instance, if you have a computationally expensive pipeline and you want to test multiple filters, it would make sense to run the pipeline once, store the result, and then apply different filters to the pipeline output (usually a fast operation) so that you don't have to run it multiple times.
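That workflow can be sketched in pandas: include the filter as an ordinary boolean column, then apply it (or any alternative filter) to the stored result. The frame below is made up, mimicking pipeline's (date, asset) MultiIndex output:

```
import pandas as pd

# A made-up pipeline result: one row per (date, asset), with the
# filter included as an ordinary boolean column instead of a screen.
result = pd.DataFrame(
    {"sma_10": [4.0, 6.0, 7.0], "prices_over_5": [False, True, True]},
    index=pd.MultiIndex.from_tuples(
        [("2017-01-03", "AAA"), ("2017-01-03", "BBB"), ("2017-01-04", "BBB")]
    ),
)

# Applying the filter afterwards is equivalent to passing it as a
# screen, but lets you try several filters on one stored result.
screened = result[result["prices_over_5"]]
```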

## Running a Pipeline¶

Once you have defined a pipeline, the next step is to run it (also referred to as 'executing' a pipeline). The `Pipeline` object doesn't actually contain any data; instead, it's a "computational expression" that will be evaluated using a particular dataset or datasets. In order to access the data indicated in your pipeline, the pipeline needs to be run. Running a pipeline essentially plugs real data into all of the computations that were defined in the pipeline.

Thanks to DataSets and BoundColumns, the pipeline engine knows where to get the data to plug in to a `Pipeline` definition. In addition, the pipeline knows about the trading calendar and the set of listed assets that correspond to its domain. As a result, pipeline knows how to load the required data to evaluate its computations for the dynamic set of all active equities in the specified domain each day. Under the hood, pipeline performs these computations efficiently by pre-fetching data and chunking its computations. By default, pipelines in the IDE are run in 6-month chunks while pipelines in Research are run in 12-month chunks, allowing pipeline to pre-fetch more data and perform the required computations more quickly. Importantly, the chunk size does not affect the output of a pipeline; it only affects the speed and memory usage of the pipeline engine.

Pipeline also knows about `timestamps` in each dataset, so it can surface data in a point-in-time fashion and prevent lookahead bias. The result of a pipeline is evaluated one day at a time, and pipeline computations are only allowed to access data that was available prior to the simulation date. For example, if a pipeline is run from 01/01/2017 to 01/01/2018, a data point that was learned on 05/01/2017 (`timestamp` of 05/01/2017 6:00pm ET) will only be available when computing results for 05/02/2017 onward. The concept of point-in-time data on Quantopian is explained further in the Data Reference.

The output of running a pipeline depends on the environment in which it is run. In Research, a pipeline is run using `run_pipeline()`, which requires explicit start and end dates to be provided. In the IDE, a pipeline must be 'attached' to an algorithm, where it is evaluated on each day of a backtest. For more on pipeline outputs, refer to Running Pipelines in Research and Running Pipelines in the IDE.
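The point-in-time rule described above can be sketched in pandas: when computing results for a given day, only rows whose `timestamp` falls before that day are visible. The timestamps below are invented:

```
import pandas as pd

# Two made-up data points with the times at which they were learned.
events = pd.DataFrame({
    "value": [1.0, 2.0],
    "timestamp": pd.to_datetime(["2017-04-28 18:00", "2017-05-01 18:00"]),
})

# Computing for 2017-05-01: the point learned at 6pm on 05/01 is not
# yet visible; it first appears when computing for 05/02.
compute_date = pd.Timestamp("2017-05-01")
visible = events[events["timestamp"] < compute_date]
```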