Algorithms

On Quantopian, 'algorithms' are Python programs that define trading logic using the Algorithm API. Algorithms must be developed in the IDE (not in Research). An algorithm is the unit of work needed to run a backtest. A backtest is a simulation over historical data that shows how a particular strategy would have performed under realistic trading conditions.

Algorithms are different from pipelines in that they focus on defining ordering logic and portfolio construction instead of querying data and building factors. Usually, an algorithm will 'attach' a pipeline and then simulate order fills and portfolio construction based on factors defined in that pipeline.

This page explains the process of developing an algorithm on Quantopian using the Algorithm API.

Algorithm Structure

On Quantopian, trading algorithms are Python programs that are partitioned into four major parts:

  1. Initializing an algorithm: Initialize state, schedule functions, and register a Pipeline.
  2. Performing computations on data: Import the data your algorithm uses to build your portfolio, then perform any necessary computations with the data.
  3. Rebalancing a portfolio of assets: Buy/sell assets based on your imported data/computations.
  4. Logging and plotting: Log and plot bookkeeping variables for further analysis.

When an algorithm is backtested, the initialization step is only run once, while the rest of the algorithm is run at a regular frequency (usually once per day, week, or month of the selected backtest period).

The rest of this page walks through the four parts of a Quantopian trading algorithm in greater detail.

Note

The Algorithm API provides structured methods for building a trading algorithm. The Algorithm API is available in the IDE, not in Research.

Initialization

The initialization step in an algorithm is responsible for running code that only needs to be executed once in a trading algorithm. Most commonly, this step includes initializing state, attaching a pipeline, and scheduling functions.

On Quantopian, all initialization logic is defined in the initialize() method. initialize() is a required method that is called only once, at the beginning of a backtest.

The following sections describe the various actions that can be performed in initialize(), many of which are controlled by functions in the quantopian.algorithm module.

Note

Technically, initialize() is the only required function in an algorithm, so it is usually best to start by defining it. Of course, your algorithm won't do much without defining other parts!

Initializing State

It is often desirable to set a starting state of an algorithm. For example, you might want to set certain global parameters that are referenced later on in an algorithm definition. On Quantopian, setting state is done using the context object.

The context object is a Python dictionary of class AlgorithmContext. The context object is used to maintain state throughout your algorithm. This object is passed to initialize(), before_trading_start(), and all scheduled functions in your algorithm. You should use context instead of global variables to pass variable values between the various methods in your algorithm.

The context dictionary has been augmented so that properties can be accessed using dot notation (context.some_property) as well as the traditional bracket notation. For example, you might use context to set a parameter:

context.my_parameter = 0.5

Context variables can be defined and/or modified in any method of your algorithm, not just initialize(). However, you should define initial context variable values and context variables that persist throughout the algorithm in the initialize() method.
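
The two access styles can be illustrated outside the platform with a small stand-in. The ContextSketch class below is hypothetical and is not Quantopian's AlgorithmContext; it only assumes a dictionary whose keys are also exposed as attributes, and lookback_days is a made-up parameter name.

```python
class ContextSketch(dict):
    """Hypothetical stand-in: a dict whose keys also work as attributes."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails,
        # so built-in dict methods keep working.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        # Store attribute assignments as dictionary entries.
        self[name] = value


context = ContextSketch()
context.my_parameter = 0.5       # dot notation
context["lookback_days"] = 20    # bracket notation (hypothetical name)
```

Both entries can then be read back with either notation, for example context["my_parameter"] or context.lookback_days.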

Attaching Pipelines

Once you've defined a pipeline, you need to attach it to your algorithm in the initialize() method. To attach a pipeline, use the attach_pipeline() function.

For example, the following code would attach an empty Pipeline to your algorithm under the name my_pipeline:

from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline import Pipeline

def make_pipeline():
    # Instantiates an empty Pipeline.
    return Pipeline()

def initialize(context):
    # Creates a reference to the empty Pipeline.
    pipe = make_pipeline()
    # Registers the empty Pipeline under the name 'my_pipeline'.
    attach_pipeline(pipe, name='my_pipeline')

Attaching a pipeline registers it such that its computations are actually executed for each simulation day in your backtest. Once your pipeline is attached to your algorithm, you can access the output of that pipeline each day via pipeline_output(). Working with pipeline results in an algorithm is covered later in this document.

Note

The Pipeline API is exactly the same in both Research and the IDE, so you can copy your pipeline definition directly from a notebook to your algorithm. Just make sure to remove any reference to run_pipeline() and replace any reference to symbols() with symbol() (if you use them in your pipeline), as these functions are specific to their respective environments.

Pipeline Execution

As the backtest simulation progresses through the dates of your backtest, pipeline computations are not executed on each simulated day. For performance reasons, computations are instead run in "chunks".

At the beginning of a backtest, an attached pipeline is executed in a 1-week chunk. This first 1-week chunk pre-fetches the data that it needs, and performs computations for the first week of the backtest simulation. Each day's results for this 1 week chunk are cached and made appropriately accessible as the backtester progresses through the first week of the simulation. After the first week of the simulation, the process of precomputing pipeline results repeats itself, but subsequent pipeline executions are conducted in 6-month chunks.
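
The chunking schedule described above can be sketched in plain Python. The chunk_ranges helper below is hypothetical and only approximates the behavior: it assumes chunks are measured in calendar days (7 days, then 182 days for roughly six months), which may not match how the backtester actually picks chunk boundaries.

```python
from datetime import date, timedelta


def chunk_ranges(start, end, first_days=7, later_days=182):
    """Split [start, end] into one short first chunk, then longer chunks.

    Hypothetical helper: lengths are calendar-day approximations.
    """
    ranges = []
    chunk_start = start
    length = first_days
    while chunk_start <= end:
        chunk_end = min(chunk_start + timedelta(days=length - 1), end)
        ranges.append((chunk_start, chunk_end))
        chunk_start = chunk_end + timedelta(days=1)
        length = later_days  # every chunk after the first is the long one
    return ranges


chunks = chunk_ranges(date(2015, 1, 1), date(2015, 12, 31))
```

Under these assumptions, a one-year backtest is covered by three pipeline executions: one for the first week and two longer ones for the rest of the year.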

Note

Each pipeline 'chunk' has 10 minutes to complete, or the algorithm will raise a PipelineTimeout error.

Note

The reason that the backtester starts with a shorter 1-week chunk is so that you can quickly see results and verify that your backtest didn't raise an error. During the 1-week chunk, you should see a "Loading Pipeline. This might take a minute or two." status message in the console (if you're building) or the full backtest screen (if you're backtesting).

Scheduling Functions

Most trading algorithms require certain actions or analyses to be performed on a regular basis. During the initialization step of a Quantopian algorithm, you can tell your algorithm to execute actions at particular times by scheduling functions to run at specific times of the day, week, or month. All scheduling must be done from within the initialize() method.

The schedule_function() function allows you to specify when functions are run with date and time rules.

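For example, this algorithm would run myfunc every day, one minute before market close.

import quantopian.algorithm as algo

def initialize(context):
    # Run myfunc on every trading day, 1 minute before the market closes.
    algo.schedule_function(
        func=myfunc,
        date_rule=algo.date_rules.every_day(),
        time_rule=algo.time_rules.market_close(minutes=1),
        calendar=algo.calendars.US_EQUITIES
    )

def myfunc(context, data):
    # Scheduled functions must accept context and data.
    pass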

Scheduled functions are not asynchronous. If two scheduled functions are supposed to run at the same time, they will happen sequentially, in the order in which they were created.

Any function that is scheduled via schedule_function() must accept two arguments: context and data. context will be provided a reference to an AlgorithmContext instance while data will be provided a reference to a BarData instance. BarData is explained later on this page.

Note

Scheduled functions share a 50 second time limit with handle_data(), i.e., the total amount of time taken up by any scheduled functions and handle_data() for the same minute can't exceed 50 seconds. When this occurs, a TimeoutException will be raised.

Note

Each minute in a day is labeled by its end time. So the first minute in each trading day is usually 9:31AM ET for US markets; the last minute is usually 3:59PM ET (last minute when it would be possible for an order to fill).

Trading Calendars

Specific times for relative scheduling rules such as "market open" or "market close" will depend on the calendar that is selected before running a backtest. If you would like to schedule a function according to the time rules of a different calendar from the one selected, you can specify a calendar argument. To specify a calendar, you must import calendars from the quantopian.algorithm module.

Currently, there are two calendars supported on Quantopian algorithms:

  • The US Equity calendar runs weekdays from 9:30AM-4PM Eastern Time and respects the US stock market holiday schedule, including partial days. Futures trading is not supported on the US Equity calendar.
  • The US Futures calendar runs weekdays from 6:30AM-5PM Eastern Time and respects the futures exchange holidays observed by all US futures (US New Years Day, Good Friday, and Christmas). Trading futures and equities are both supported on the US Futures calendar. Overnight US futures data can be retrieved with a history() call. However, placing orders can only be done between 6:30AM-5PM.

Note

The US Futures calendar is no longer officially supported by Quantopian. Backtests that are run on the US futures calendar cannot be simulated past 2018, as no futures data is available beyond then.

Setting Slippage and Commissions

To make simulations as realistic as possible, trading costs are incorporated into backtests. On Quantopian, trading costs are simulated using slippage and commission models.

Slippage

In backtesting, the price impact and fill rate of trades are wrapped into a slippage model. By default, US equity orders follow the FixedBasisPointsSlippage model at 5 basis points fixed slippage and 10% volume share limit.
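
To make the default numbers concrete, the arithmetic can be sketched in plain Python. The fixed_bps_fill function below is a hypothetical illustration, not the platform's implementation. It assumes that buys fill above the bar price and sells fill below it by the basis-point amount, and that the shares filled in one bar are capped at the volume share limit.

```python
def fixed_bps_fill(price, bar_volume, order_shares,
                   basis_points=5.0, volume_limit=0.10):
    """Return (fill_price, filled_shares) for one bar under the stated assumptions.

    order_shares is positive for buys and negative for sells.
    """
    direction = 1 if order_shares > 0 else -1
    # 1 basis point = 0.01%, so 5 bps moves the price 0.05% against the trader.
    fill_price = price * (1 + direction * basis_points / 10000.0)
    # At most volume_limit of the bar's traded volume can be filled.
    max_shares = int(bar_volume * volume_limit)
    filled = direction * min(abs(order_shares), max_shares)
    return fill_price, filled


# Made-up inputs: a $100.00 bar with 2,000 shares traded.
buy_price, buy_filled = fixed_bps_fill(100.0, 2000, order_shares=500)
sell_price, sell_filled = fixed_bps_fill(100.0, 2000, order_shares=-50)
```

With these inputs, the 500-share buy is capped at 200 shares (10% of 2,000) at about $100.05, and the 50-share sell fills completely at about $99.95.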

If you want to override the default slippage model, you can do so in the initialize() method. To set slippage, use the set_slippage() method and pass in one of the built-in slippage models, or a custom slippage model that you define.

The contest and allocation process both use the default slippage model, so overriding the default is generally discouraged.

Builtin Slippage Models

There are several built-in slippage models that can be selected when backtesting a strategy including the following:

  • FixedBasisPointsSlippage (default, used in the contest)
  • VolumeShareSlippage
  • FixedSlippage

For more detail on the implementation of each model, see the API reference.

Custom Slippage Models

Note

If you are interested in entering the contest, using a custom slippage model is discouraged, since contest algorithms are tested and scored using the default FixedBasisPointsSlippage model.

If none of the built-in slippage models suit your need, you can build a custom slippage model that uses your own logic to convert a stream of orders into a stream of transactions.

Your custom model must be a class that inherits from slippage.SlippageModel and implements process_order(). The process_order() method must return a tuple of (execution_price, execution_volume), which signifies the price and volume for the transaction that your model wants to generate. The transaction is then created for you.

Your model gets passed the same data object that is passed to your other functions, letting you do any price or history lookup for any security in your model. The order object contains the rest of the information you need, such as the asset, order size, and order type.

The order object has the following properties: amount (float), asset (Asset), stop and limit (float), and stop_reached and limit_reached (boolean). The trade_bar object is the same as data[sid] in handle_data() and has open_price, close_price, high, low, volume, and sid.

The slippage.create_transaction() method takes the given order, the data object, and the price and amount calculated by your slippage model, and returns the newly constructed transaction.

Many slippage models' behavior depends on how much of the total volume traded is being captured by the algorithm. You can use self.volume_for_bar() to see how many shares of the current security have been traded so far during this bar. If your algorithm has many different orders for the same stock in the same bar, this is useful for making sure you don't take an unrealistically large fraction of the traded volume.

If your slippage model doesn't place a transaction for the full amount of the order, the order stays open with an updated amount value, and will be passed to process_order on the next bar. Orders that have limits that have not been reached will not be passed to process_order. Finally, if your transaction has 0 shares or more shares than the original order amount, an exception will be thrown.
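
The partial-fill behavior can be sketched without the platform. The code below is a hypothetical simulation, not a slippage.SlippageModel subclass: process_order_sketch, its 25% volume cap, and the bar data are illustrative assumptions, and the loop stands in for the backtester passing the remaining amount back on each bar.

```python
def process_order_sketch(remaining, bar_price, bar_volume, max_share=0.25):
    """Return (execution_price, execution_volume) for one bar, or None."""
    cap = int(bar_volume * max_share)
    volume = min(remaining, cap)
    if volume == 0:
        # A zero-share transaction is not allowed, so report no fill instead.
        return None
    return bar_price, volume


remaining = 1000                                       # shares still open (buy)
bars = [(10.00, 1200), (10.10, 2000), (10.05, 4000)]   # (price, volume), made up
fills = []
for price, volume in bars:
    result = process_order_sketch(remaining, price, volume)
    if result is None:
        continue
    fills.append(result)
    remaining -= result[1]   # the order stays open with an updated amount
    if remaining == 0:
        break
```

Here the 1,000-share order fills over three bars (300, 500, and 200 shares), and no single transaction exceeds the remaining order amount.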

Commissions

On Quantopian, trading fees are captured in a commission model. The commission model determines the fees charged on orders that fill in a backtest. The default commission model for US equity orders is PerShare, at $0.001 per share and no minimum cost per order. The first fill will incur at least the minimum commission, and subsequent fills will incur additional commission.

Commissions are taken out of the algorithm's available cash. Regardless of what commission model you use, orders that are canceled before any fills occur do not incur any commission.
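
As a rough illustration of the default numbers, the per-share arithmetic for the first fill of an order might look like the sketch below. The per_share_commission function is hypothetical and is not the PerShare class itself; it only models the first fill and ignores how a minimum is spread across later fills.

```python
def per_share_commission(shares_filled, cost_per_share=0.001, min_trade_cost=0.0):
    """Commission for the first fill of an order under a per-share model."""
    if shares_filled == 0:
        # Canceled before any fill: no commission is charged.
        return 0.0
    return max(abs(shares_filled) * cost_per_share, min_trade_cost)


default_cost = per_share_commission(1000)                      # default settings
with_minimum = per_share_commission(100, min_trade_cost=1.0)   # hypothetical minimum
cash_after = 10000.0 - default_cost                            # commission reduces cash
```

Under the default settings, a 1,000-share fill costs about $1.00. With a hypothetical $1.00 minimum, a 100-share fill would also cost $1.00 instead of $0.10.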

To override the default commission model, use the set_commission() method and pass in PerShare or PerTrade. Like the slippage model, set_commission() must be invoked in the initialize() method.

The contest and allocation process both use the default commission model, so overriding the default is generally discouraged.

Note

You can see how much commission has been associated with an order by fetching the order using get_order() and then looking at its commission field.

Manual Asset Lookup

If you want to manually reference an equity, you can use the symbol() function to look up a security by its ticker or company name.

When using symbol(), you'll need to consider your simulation dates to ensure that the ticker is referring to the correct equity -- sometimes, tickers are reused over time as companies delist and new ones begin trading. For example, G used to refer to Gillette, but now refers to Genpact. If a ticker was reused by multiple companies, use set_symbol_lookup_date() to specify what date to use when resolving conflicts. This date needs to be set before any conflicting calls to symbol().

Another option to manually reference an asset is to use the sid() function. All securities have a unique security identifier (SID) in our system. Since symbols may be reused among exchanges, this prevents any confusion and ensures that you are calling the desired asset regardless of simulation date. You can use the sid() method to look up a security by its ID, symbol, or name.

When you use the symbol() or sid() functions, the IDE will autofill SIDs and symbols.

Note

Quantopian's backtester will attempt to automatically adjust the backtest's start or end dates to accommodate the assets that are being used. For example, if you're trying to run a backtest with Tesla in 2004, the backtest will suggest you begin on June 28, 2010, the first day the security traded. This ability is significantly decreased when using symbol() instead of sid().

The symbol() and symbols() methods accept only string literals as parameters. The sid() method accepts only an integer literal as a parameter. A static analysis is run on the code to quickly retrieve data needed for the backtest.

Using Data

There are two ways to access data in an algorithm:

  1. Pipeline. The more common way to access data is to attach a pipeline to an algorithm and retrieve the output every day. Once you have factor data (generated by an attached pipeline) in your algorithm, you can use it to make trading and portfolio construction decisions.
  2. BarData Lookup. In addition to using pipeline, you can query minute-level pricing and volume data using the built-in BarData object (available in scheduled functions via the data variable). BarData provides methods that allow you to query current and historical minute-frequency data in algorithms.

These two techniques for using data in algorithms are covered in more detail in the next sections.

Pipeline in Algorithms

As discussed earlier in this guide, the purpose of pipeline is to make it easy to define and execute cross-sectional trailing-window computations. Once you have developed a pipeline in Research, you can attach it to your algorithm. Once your pipeline is attached to your algorithm, it will be executed and make results available via pipeline_output() for each day in a backtest simulation. pipeline_output() can be called from within any scheduled function.

For example, the following code gets the output from a pipeline that was attached under the name 'my_pipeline'.

def my_scheduled_function(context, data):
    # Access results using the name passed to attach_pipeline.
    pipeline_results_today = pipeline_output('my_pipeline')

The return value of pipeline_output() will be a pd.DataFrame with columns corresponding to the columns that were included in the pipeline definition and one row per asset that was listed on a supported exchange in the specified domain on that day. Additionally, any equities that do not pass the screen (if one was provided) will be omitted from the output.

Importantly, the pd.DataFrame returned by pipeline_output() is slightly different from the pd.DataFrame returned by run_pipeline() in Research. Pipelines in Research produce data frames with a pd.MultiIndex, with the first index level corresponding to the simulation date and the second index level corresponding to the equity object. Pipelines attached to algorithms have an implied simulation date equal to the current backtest simulation date, so the output dataframe has a regular pd.Index that only contains equity objects.

Note

Pipelines are computed in a special computation window and have their results cached. pipeline_output() simply reads the cached result of the pipeline, so it is a computationally inexpensive operation.

Note

You can use multiple pipelines in the same algorithm. To do this, you can define multiple pipelines in an algorithm and attach each one under a different name.

Post Processing Computations

Sometimes, there might be an operation or transformation that you want to perform on a pipeline output before using it to inform trading decisions. For example, you might have a custom computation that cannot easily be expressed as a pipeline custom factor. In this situation, you should use before_trading_start() to define the computation as a post-processing step on the pipeline output.

before_trading_start() is an optional method called once a day, before the market opens but after the current day's pipeline has been computed. before_trading_start() is a general-purpose function with a 5 minute time limit. It is a good place to perform once-per-day calculations such as a post-processing step on a pipeline output.

Like scheduled functions, before_trading_start() accepts two arguments: context and data. These arguments correspond to instances of AlgorithmContext and BarData, respectively.

BarData Lookup

In addition to accessing daily data via pipeline, you can access minute-level pricing and volume data in an algorithm. Minute level pricing and volume data is not available in pipeline, but can be retrieved via BarData methods.

Any function that is scheduled via schedule_function() must be defined to accept two arguments: context and data. The data argument is provided with an instance of BarData, which has several methods that can be used to retrieve minute-frequency pricing and volume data. This means that you can use BarData methods in any scheduled function.

In general, computations should be performed in a pipeline whenever possible (it's much faster). However, BarData provides methods that allow you to access minute-frequency data, which is not available in pipeline. With the BarData methods, you can:

  • Get open/high/low/close/volume (OHLCV) values for the current minute for any asset.
  • Get historical windows of OHLCV values for any asset.
  • Check if the last known price data of an asset is stale.

The instance of BarData provided to scheduled functions knows your algorithm's simulation time/date and uses that time for all its internal calculations.

All the methods on BarData accept a single Asset or a list of Asset objects, and the price fetching methods also accept an OHLCV (open, high, low, close, volume) field or a list of OHLCV fields. The more that your algorithm can batch up queries by passing multiple assets or multiple fields, the faster Quantopian can get that data to your algorithm.

See also

Manual asset lookup (for getting instances of Asset to pass to BarData).

Note

If you request a history of minute data that extends past the start of the day (usually 9:31AM), the history() function will get the remaining minute bars from the end of the previous day. For example, if you ask for 60 minutes of pricing data at 10:00AM on the equities calendar, the first 30 prices will be from the end of the previous trading day, and the next 30 will be from the current morning.
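
The arithmetic in this example can be checked with a short sketch. The split_history_window helper below is hypothetical and is not part of BarData. It assumes minute bars are labeled by end time, that the first bar of the session is 9:31AM, and that the window reaches back no further than the previous trading day.

```python
from datetime import datetime, timedelta


def split_history_window(now, bar_count, first_bar_hour=9, first_bar_minute=31):
    """Return (bars_from_previous_day, bars_from_today) for a minute window."""
    first_bar = now.replace(hour=first_bar_hour, minute=first_bar_minute,
                            second=0, microsecond=0)
    # Bars available today, counting the current minute.
    today_bars = int((now - first_bar) / timedelta(minutes=1)) + 1
    today_bars = max(0, min(today_bars, bar_count))
    return bar_count - today_bars, today_bars


# 60 minute bars requested at 10:00AM (the date is arbitrary).
previous, today = split_history_window(datetime(2016, 3, 1, 10, 0), 60)
```

At 10:00AM only 30 bars exist for the current session (9:31 through 10:00), so the other 30 come from the end of the previous trading day.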

Note

Quantopian's price data starts on Jan 2, 2002. Any pricing data call that extends before that date will raise an exception.

Rebalancing

Once you have the output of your algorithm's pipeline, the next step is to construct a portfolio. In this step of algorithm development, you will use your algorithm's pipeline output to define a portfolio optimization problem and construct your portfolio.

Where should this go in your algorithm? The portfolio construction logic of your algorithm should be executed in a scheduled function. This allows you to control the rebalance frequency of your strategy.

Placing Orders

The best way to place orders in your algorithm is with Optimize. Optimize allows you to move your portfolio from one state to another by defining a portfolio optimization problem. To construct a portfolio with Optimize, define an objective, a set of constraints, and pass them both to order_optimal_portfolio().

Note

Algorithms must use order_optimal_portfolio() to be eligible for a capital allocation from Quantopian. The use of order_optimal_portfolio() is also required to enter the contest.

Manual Orders

Note

In order to be considered for an allocation or enter the contest, an algorithm cannot place manual orders. This is because the internal Quantopian fund machinery depends on some of the functionality in order_optimal_portfolio().

In addition to placing orders with order_optimal_portfolio(), algorithms can place manual orders using functions listed in the API reference, the most popular of which is order_target_percent().

There are two important features of orders that you should know if you are going to use manual ordering. First, all open orders are canceled at the end of the trading day. If you want to try to fill a canceled order on the next trading day, you will have to manually open a new order for the remaining amount. Second, there is no limit to the amount of cash you can spend. Even if an order would take you into negative cash, the backtester won't stop you. It's up to the algorithm to make sure that it doesn't order more than the amount of cash it holds (assuming that is the desired behavior). This type of control is best done using the portfolio object, described below.
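
One way to implement such a cash check is sketched below. The affordable_shares function is a hypothetical helper, not part of the Algorithm API. In an algorithm, the cash value would come from context.portfolio.cash and the price from data.current(). The sketch ignores slippage and commissions, so the real cost of an order can be slightly higher.

```python
def affordable_shares(cash, price, desired_shares):
    """Cap a buy order so its estimated cost does not exceed available cash."""
    if price <= 0 or cash <= 0:
        return 0
    max_shares = int(cash // price)   # whole shares that fit in the cash balance
    return min(desired_shares, max_shares)


# Made-up numbers: $1,000 in cash, a $33 stock, and a desired order of 50 shares.
shares = affordable_shares(cash=1000.0, price=33.0, desired_shares=50)
```

With these inputs the order is reduced from 50 to 30 shares, since 30 shares cost $990 and 31 would cost more than the available cash.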

The Portfolio Object

Your portfolio represents all the assets you currently hold. Each algorithm has exactly one portfolio. You can access the algorithm's Portfolio object in your algorithm via context.portfolio. Note that Portfolio has many attributes that you can use to get information about the current state of your algorithm's portfolio.

Viewing Portfolio State

Before rebalancing, you might want to check what assets you already hold. You can view your portfolio's state using properties of Portfolio. For example, you can use context.portfolio.positions to access your algorithm's Positions which contains a dictionary of all open positions, or use context.portfolio.cash to view the current amount of cash in your portfolio.

See also

See the Portfolio reference for a full list of available attributes.

Common Rebalancing Issues

Unavailable Assets

Ordering a delisted security or ordering a security before its IPO are actions that will raise an error in a backtest. Pipeline only returns assets that were listed on a supported exchange on each simulation day, so any assets retrieved from pipeline_output() should be tradable. If you are manually referencing assets, you might need to check if the asset is still listed. To check if an equity can be traded at a given point in your algorithm, use can_trade() (which returns True if the asset is listed and has traded at least once).

Stale Prices

Quantopian forward-fills pricing data. However, your algorithm might need to know if the price for an equity is from the most recent minute before placing orders. The is_stale() method returns True if the asset is alive but the latest price is from a previous minute.

Unfilled Orders

You can get information about orders using order status functions. For example, you can see the status of a specific order by calling get_order(), or see a list of all open orders by calling get_open_orders(). For a full list of order status functions, see the API Reference.

All open orders are canceled at the end of the day, both in backtesting and live trading. This is true for orders placed by any means (using Optimize API and/or manual order methods). You can also cancel orders before the end of the day using cancel_order().

Logging and Plotting

As you run your algorithm, you may want to log or plot custom metrics beyond those that are automatically tracked by the backtester.

Logging

Your algorithm can easily generate log output by using the log.info() method. Log output appears in the right-hand panel of the IDE or under the "Activity" tab of the full backtest result page.

Logging is rate-limited (throttled) for performance reasons. The basic limit is two log messages per call of initialize() and two more log messages for each minute. Each backtest has an additional buffer of 20 extra log messages. Once the limit is exceeded, messages are discarded until the buffer has been emptied. A message explaining that some messages were discarded is shown.

Two examples:

  • Suppose in initialize() you log 22 lines. Two lines are permitted, plus the 20 extra log messages, so this works. However, a 23rd log line would be discarded.
  • Suppose you run a scheduled function every minute and in that function, you log three lines. Each time the scheduled function is called, two lines are permitted, plus one of the extra 20 is consumed. On the 21st call two lines are logged, and the last line is discarded. Subsequent log lines are also discarded until the buffer is emptied.
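
Both examples can be reproduced with a small simulation. The LogThrottleSketch class below is a hypothetical model of the limits described above (two messages per call plus a shared buffer of 20). It is not the platform's throttling code, and it does not model how or when the buffer is emptied.

```python
class LogThrottleSketch:
    """Hypothetical model: 2 messages per call plus a 20-message buffer."""

    def __init__(self, per_call=2, buffer_size=20):
        self.per_call = per_call
        self.buffer_left = buffer_size

    def call(self, message_count):
        """Return how many of message_count messages are kept for one call."""
        kept = min(message_count, self.per_call)
        extra = message_count - kept
        from_buffer = min(extra, self.buffer_left)
        self.buffer_left -= from_buffer
        return kept + from_buffer


# First example: 23 lines logged in a single call of initialize().
init_throttle = LogThrottleSketch()
kept_in_initialize = init_throttle.call(23)

# Second example: 3 lines logged in each of 21 scheduled calls.
minute_throttle = LogThrottleSketch()
kept_per_call = [minute_throttle.call(3) for _ in range(21)]
```

In this model, 22 of the 23 lines in initialize() are kept, and the scheduled function keeps all three lines for its first 20 calls but only two on the 21st.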

Additionally, there is a per-member overall log limit. When a backtest causes the overall limit to be reached, the logs for the oldest backtest are discarded.

Plotting

Where should this go in your algorithm? The best place to call record() is in a scheduled function, for example in a function dedicated to recording custom metrics or at the end of a daily rebalance function.

In an algorithm, you have some (limited) plotting functionality. Using record(), you can create time series charts by passing series names and corresponding values using keyword arguments. Up to five series can be charted in an algorithm at day-level granularity. The time series are then displayed in a chart below the performance chart if you click "Build Algorithm", or under the "Activity" tab if you run a full backtest.

Note

In backtesting, the last recorded value per day is used. Therefore, we recommend using schedule_function() to record values once per day.

This minimal example records and plots the price of MSFT and AAPL every day at market close:

def initialize(context):
    schedule_function(record_vars, date_rules.every_day(), time_rules.market_close())

def record_vars(context, data):
    # track the prices of MSFT and AAPL
    record(msft=data.current(sid(5061), 'price'), aapl=data.current(sid(24), 'price'))

Common Errors

Writing Faster Algorithms

The Quantopian backtester will run fastest when you follow best practices, such as:

Use Pipeline: Pipeline is the most efficient way to access data on Quantopian. Whenever possible, you should use pipeline to perform computations in order to have your backtests run as fast as possible.

Only access minute data when you need it: Minute frequency pricing and volume data is always accessible to you via BarData methods, and it is loaded on-demand. You pay the performance cost of accessing the data when you call for it. If your algorithm checks a price every minute, that has a performance cost. If you don't need pricing data that frequently, asking for it less frequently will speed up your algorithms.

Avoid 'for' loops: As much as possible, try to use vectorized computations instead of for loops. Most data structures on Quantopian support vectorized computations which are usually much faster than for loops. On Quantopian, numpy arrays and pandas data structures are two common entities that support vectorized computations.

Batch minute data lookups: As much as possible, you should batch any calls to BarData methods. All of the data functions (history(), current(), can_trade(), and is_stale()) accept a list of assets when requesting data. Running these once with a list of assets will be significantly more performant than looping through the list of assets and calling these functions individually per asset.

Record data daily, not minutely, in backtesting: Any data you record in your backtest will record the last data point per day. If you try to plot something every minute using record(), it will still only record one data point per day in the backtest.
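
The "last data point per day" behavior can be pictured with a short sketch. The last_value_per_day function is a hypothetical illustration, not how record() is implemented; it assumes observations arrive in chronological order, and the timestamps and values are made up.

```python
from datetime import datetime


def last_value_per_day(observations):
    """Keep only the last recorded value for each calendar day."""
    daily = {}
    for timestamp, value in observations:
        # Later values on the same day overwrite earlier ones.
        daily[timestamp.date()] = value
    return daily


observations = [
    (datetime(2016, 3, 1, 9, 31), 101.0),
    (datetime(2016, 3, 1, 15, 59), 103.5),
    (datetime(2016, 3, 2, 9, 31), 102.0),
]
daily_values = last_value_per_day(observations)
```

Only one value survives per day, so the 9:31AM value recorded on the first day is overwritten by the 3:59PM value.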

Access account and portfolio data only when needed: Account and portfolio information is calculated daily or on demand. Accessing your algorithm's Portfolio in multiple different minutes per day will force the system to calculate your entire portfolio in each of those minutes, slowing down the backtest. You should only access Portfolio when you need to use the data.