On Quantopian, algorithms are Python programs that define trading logic using the Algorithm API. Algorithms must be developed in the IDE (not in Research). An algorithm is the unit of work needed to run a backtest. A backtest is a simulation over historical data that shows how a particular strategy would have performed under realistic trading conditions.
Algorithms are different from pipelines in that they focus on defining ordering logic and portfolio construction instead of querying data and building factors. Usually, an algorithm will 'attach' a pipeline and then simulate order fills and portfolio construction based on factors defined in that pipeline.
This page explains the process of developing an algorithm on Quantopian using the Algorithm API.
On Quantopian, trading algorithms are Python programs that are partitioned into four major parts:
- Initializing an algorithm: Initialize state, schedule functions, and register a Pipeline.
- Performing computations on data: Import the data your algorithm uses to build your portfolio, then perform any necessary computations with the data.
- Rebalancing a portfolio of assets: Buy/sell assets based on your imported data/computations.
- Logging and plotting: Log and plot bookkeeping variables for further analysis.
When an algorithm is backtested, the initialization step is only run once, while the rest of the algorithm is run at a regular frequency (usually once per day, week, or month of the selected backtest period).
The rest of this page walks through the four parts of a Quantopian trading algorithm in greater detail.
The initialization step in an algorithm is responsible for running code that only needs to be executed once in a trading algorithm. Most commonly, this step includes initializing state, attaching a pipeline, and scheduling functions.
initialize() is the only required function in an algorithm, so it is usually best to start by defining it. Of course, your algorithm won't do much without defining other parts!
It is often desirable to set a starting state of an algorithm. For example, you might want to set certain global parameters that are referenced later on in an algorithm definition. On Quantopian, setting state is done using the context object.
The context object is a Python dictionary of class AlgorithmContext that is used to maintain state throughout your algorithm. This object is passed to initialize(), before_trading_start(), and all scheduled functions in your algorithm. You should use context instead of global variables to pass variable values between the various methods in your algorithm.
The context dictionary has been augmented so that properties can be accessed using dot notation (context.some_property) as well as the traditional bracket notation. For example, you might use context to set a parameter:

    context.my_parameter = 0.5

Context variables can be defined and/or modified in any method of your algorithm, not just initialize(). However, you should define initial context variable values and context variables that persist throughout the algorithm in the initialize() method.
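The dot and bracket access described above can be pictured with a small stand-in class. This is only a sketch: `ContextSketch` is a hypothetical name, not Quantopian's actual context class, and it shows just one way a dictionary can expose its keys as attributes.

```python
class ContextSketch(dict):
    """Hypothetical stand-in for an attribute-accessible dictionary."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        # Store attributes as dictionary entries.
        self[name] = value


context = ContextSketch()
context.my_parameter = 0.5         # set with dot notation
context['lookback_days'] = 20      # set with bracket notation

print(context['my_parameter'])     # read back with bracket notation
print(context.lookback_days)       # read back with dot notation
```

Either notation reads and writes the same underlying entry, which is why state set in one function is visible in every other function that receives the same object.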
For example, the following code would attach an empty Pipeline to your algorithm under the name 'my_pipeline':

    from quantopian.algorithm import attach_pipeline, pipeline_output
    from quantopian.pipeline import Pipeline

    def make_pipeline():
        # Instantiates an empty Pipeline.
        return Pipeline()

    def initialize(context):
        # Creates a reference to the empty Pipeline.
        pipe = make_pipeline()
        # Registers the empty Pipeline under the name 'my_pipeline'.
        attach_pipeline(pipe, name='my_pipeline')

Attaching a pipeline registers it such that its computations are actually executed for each simulation day in your backtest. Once your pipeline is attached to your algorithm, you can access the output of that pipeline each day via
pipeline_output(). Working with pipeline results in an algorithm is covered later in this document.
The Pipeline API is the exact same in both Research and the IDE, so you can copy your pipeline definition directly from a notebook to your algorithm. Just make sure to remove any reference to
run_pipeline() and replace any reference to
symbol() (if you use them in your pipeline), as these functions are proprietary to their respective environments.
As the backtest simulation progresses through the dates of your backtest, pipeline computations are not executed on each simulated day. For performance reasons, computations are instead run in "chunks".
At the beginning of a backtest, an attached pipeline is executed in a 1-week chunk. This first 1-week chunk pre-fetches the data that it needs, and performs computations for the first week of the backtest simulation. Each day's results for this 1 week chunk are cached and made appropriately accessible as the backtester progresses through the first week of the simulation. After the first week of the simulation, the process of precomputing pipeline results repeats itself, but subsequent pipeline executions are conducted in 6-month chunks.
Each pipeline 'chunk' has 10 minutes to complete, or the algorithm will raise a timeout exception.
The reason that the backtester starts with a shorter 1-week chunk is so that you can quickly see results and verify that your backtest didn't raise an error. During the 1-week chunk, you should see a "Loading Pipeline. This might take a minute or two." status message in the console (if you're building) or the full backtest screen (if you're backtesting).
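The chunking schedule described above can be sketched as a function that splits a backtest period into one short chunk followed by longer ones. This is an illustration under stated assumptions, not the backtester's implementation: `pipeline_chunks` is a hypothetical helper, and the chunk lengths are approximated as 7 calendar days and 182 calendar days (roughly six months).

```python
from datetime import date, timedelta


def pipeline_chunks(start, end, first_days=7, later_days=182):
    """Split [start, end] into one short chunk followed by longer ones.

    The chunk lengths are assumptions for illustration only.
    """
    chunks = []
    chunk_start = start
    length = first_days
    while chunk_start <= end:
        chunk_end = min(chunk_start + timedelta(days=length - 1), end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end + timedelta(days=1)
        length = later_days  # every chunk after the first is long
    return chunks


chunks = pipeline_chunks(date(2015, 1, 5), date(2015, 12, 31))
for chunk_start, chunk_end in chunks:
    print(chunk_start, "->", chunk_end)
```

The short first chunk is what lets an error surface quickly, before the longer chunks are computed.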
Most trading algorithms require certain actions or analyses to be performed on a regular basis. During the initialization step of a Quantopian algorithm, you can tell your algorithm to execute actions at particular times by scheduling functions to run at specific times of the day, week, or month. All scheduling must be done from within the initialize() method.
The schedule_function() function allows you to specify when functions are run with date and time rules.
For example, this algorithm would run myfunc every day, one minute after market open.

    import quantopian.algorithm as algo

    def myfunc(context, data):
        # Placeholder: logic to run at the scheduled time goes here.
        pass

    def initialize(context):
        algo.schedule_function(
            func=myfunc,
            date_rule=algo.date_rules.every_day(),
            time_rule=algo.time_rules.market_open(minutes=1),
            calendar=algo.calendars.US_EQUITIES,
        )

Scheduled functions are not asynchronous. If two scheduled functions are supposed to run at the same time, they will happen sequentially, in the order in which they were created.
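The sequential behavior described above can be pictured with a toy registry. This is a sketch only: `schedule` is a hypothetical stand-in for registering a scheduled function, and the loop simply runs functions due in the same minute one after another in registration order.

```python
scheduled = []   # functions in the order they were registered
calls = []       # record of the order in which they actually ran


def schedule(func):
    """Hypothetical stand-in for registering a scheduled function."""
    scheduled.append(func)


def rebalance(context, data):
    calls.append("rebalance")


def record_vars(context, data):
    calls.append("record_vars")


schedule(rebalance)
schedule(record_vars)

# Both functions are due in the same minute: run them one after another.
for func in scheduled:
    func(context={}, data=None)

print(calls)
```

Because the second function only starts after the first returns, it can rely on any state the first one wrote to the shared context.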
Any function that is scheduled via schedule_function() must accept two arguments: context and data. context will be provided a reference to an AlgorithmContext instance, while data will be provided a reference to a BarData instance. BarData is explained later on this page.
Scheduled functions share a 50 second time limit with
handle_data(), i.e., the total amount of time taken up by any scheduled functions and
handle_data() for the same minute can't exceed 50 seconds. When this occurs, a
TimeoutException will be raised.
Each minute in a day is labeled by its end time. So the first minute in each trading day is usually 9:31AM ET for US markets; the last minute is usually 3:59PM ET (last minute when it would be possible for an order to fill).
Specific times for relative scheduling rules such as "market open" or "market close" will depend on the calendar that is selected before running a backtest. If you would like to schedule a function according to the time rules of a different calendar from the one selected, you can specify a calendar argument. To specify a calendar, you must import calendars from the quantopian.algorithm module.
Currently, there are two calendars supported in Quantopian algorithms:
- The US Equity calendar runs weekdays from 9:30AM-4PM Eastern Time and respects the US stock market holiday schedule, including partial days. Futures trading is not supported on the US Equity calendar.
- The US Futures calendar runs weekdays from 6:30AM-5PM Eastern Time and respects the futures exchange holidays observed by all US futures (New Year's Day, Good Friday, and Christmas). Trading both futures and equities is supported on the US Futures calendar. Overnight US futures data can be retrieved with a history() call. However, placing orders can only be done between 6:30AM-5PM.
The US Futures calendar is no longer officially supported by Quantopian. Backtests that are run on the US futures calendar cannot be simulated past 2018, as no futures data is available beyond then.
Setting Slippage and Commissions
In backtesting, the price impact and fill rate of trades are wrapped into a slippage model. By default, US equity orders follow the
FixedBasisPointsSlippage model at 5 basis points fixed slippage and 10% volume share limit.
If you want to override the default slippage model, you can do so in the
initialize() method. To set slippage, use the
set_slippage() method and pass in one of the built-in slippage models, or a custom slippage model that you define.
Built-in Slippage Models
There are several built-in slippage models that can be selected when backtesting a strategy, such as the default FixedBasisPointsSlippage model.
For more detail on the implementation of each model, see the API reference.
Custom Slippage Models
If none of the built-in slippage models suit your need, you can build a custom slippage model that uses your own logic to convert a stream of orders into a stream of transactions.
Your custom model must be a class that inherits from slippage.SlippageModel and implements process_order(). The process_order() method must return a tuple of (execution_price, execution_volume), which signifies the price and volume for the transaction that your model wants to generate. The transaction is then created for you.
Your model gets passed the same data object that is passed to your other functions, letting you do any price or history lookup for any security in your model. The order object contains the rest of the information you need, such as the asset, order size, and order type.
The order object's properties include limit (float) and limit_reached (boolean). The trade_bar object is the same as the one used in handle_data().
The slippage.create_transaction() method takes the given order, the data object, and the price and amount calculated by your slippage model, and returns the newly constructed transaction.
Many slippage models' behavior depends on how much of the total volume traded is being captured by the algorithm. You can use
self.volume_for_bar() to see how many shares of the current security have been traded so far during this bar. If your algorithm has many different orders for the same stock in the same bar, this is useful for making sure you don't take an unrealistically large fraction of the traded volume.
If your slippage model doesn't place a transaction for the full amount of the order, the order stays open with an updated amount value, and will be passed to process_order on the next bar. Orders that have limits that have not been reached will not be passed to process_order. Finally, if your transaction has 0 shares or more shares than the original order amount, an exception will be thrown.
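The partial-fill behavior described above can be sketched in plain Python. This is an illustration under assumptions, not Quantopian's implementation: `VolumeCappedSlippageSketch` is a hypothetical class that does not inherit from slippage.SlippageModel, and its process_order() takes raw numbers instead of the data and order objects. It caps each fill at a fraction of the bar's volume and applies a fixed basis-point price impact.

```python
class VolumeCappedSlippageSketch:
    """Hypothetical fill logic: fill at most a fraction of each bar's volume."""

    def __init__(self, volume_limit=0.1, price_impact_bps=5):
        self.volume_limit = volume_limit
        self.price_impact_bps = price_impact_bps

    def process_order(self, bar_price, bar_volume, order_amount):
        """Return (execution_price, execution_volume), or None for no fill."""
        max_shares = int(bar_volume * self.volume_limit)
        fill = min(abs(order_amount), max_shares)
        if fill == 0:
            return None  # never generate a zero-share transaction
        direction = 1 if order_amount > 0 else -1
        # Buys fill slightly above the bar price, sells slightly below.
        price = bar_price * (1 + direction * self.price_impact_bps / 10000.0)
        return price, direction * fill


model = VolumeCappedSlippageSketch()
remaining = 2500                      # shares still to buy
for bar_volume in (10000, 10000, 10000):
    result = model.process_order(100.0, bar_volume, remaining)
    if result is not None:
        price, filled = result
        remaining -= filled           # unfilled shares carry to the next bar
        print(filled, remaining)
```

Each bar fills at most 10% of its volume, so the order is worked off over several bars, mirroring how an open order is passed back to process_order on the next bar with an updated amount.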
On Quantopian, trading fees are captured in a commission model. This commission model affects the fill rate of orders placed in backtests. The default commission model for US equity orders is
PerShare, at $0.001 per share and no minimum cost per order. The first fill will incur at least the minimum commission, and subsequent fills will incur additional commission.
Commissions are taken out of the algorithm's available cash. Regardless of what commission model you use, orders that are canceled before any fills occur do not incur any commission.
You can see how much commission has been associated with an order by fetching the order using get_order() and then looking at its commission field.
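The per-share commission arithmetic described above can be sketched as follows. This is a simplified model, not the backtester's code: `per_share_commission` is a hypothetical helper, and it assumes the total charge for an order is the larger of the per-share total and the minimum cost, with no charge when nothing fills.

```python
def per_share_commission(fill_sizes, cost_per_share=0.001, min_trade_cost=0.0):
    """Total commission for one order that filled in several pieces.

    fill_sizes holds the number of shares in each fill; an empty list
    represents an order canceled before any fill.
    """
    if not fill_sizes:
        return 0.0  # no fills, no commission
    per_share_total = sum(abs(shares) for shares in fill_sizes) * cost_per_share
    return max(per_share_total, min_trade_cost)


print(per_share_commission([400, 600]))                       # default rate
print(per_share_commission([400, 600], min_trade_cost=5.0))   # minimum applies
print(per_share_commission([]))                               # canceled order
```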
Manual Asset Lookup
If you want to manually reference an equity, you can use the symbol() function to look up a security by its ticker or company name.
When using symbol(), you'll need to consider your simulation dates to ensure that the ticker is referring to the correct equity, because tickers are sometimes reused over time as companies delist and new ones begin trading. For example, G used to refer to Gillette, but now refers to Genpact. If a ticker was reused by multiple companies, use set_symbol_lookup_date() to specify what date to use when resolving conflicts. This date needs to be set before any conflicting calls to symbol().
Another option to manually reference an asset is to use the
sid() function. All securities have a unique security identifier (SID) in our system. Since symbols may be reused among exchanges, this prevents any confusion and ensures that you are calling the desired asset regardless of simulation date. You can use the
sid() method to look up a security by its ID, symbol, or name.
Quantopian's backtester will attempt to automatically adjust the backtest's start or end dates to accommodate the assets that are being used. For example, if you're trying to run a backtest with Tesla in 2004, the backtest will suggest you begin on June 28, 2010, the first day the security traded. This ability is significantly decreased when using symbol() instead of sid().
The symbol() and symbols() methods accept only string literals as parameters. The sid() method accepts only an integer literal as a parameter. A static analysis is run on the code to quickly retrieve data needed for the backtest.
There are two ways to access data in an algorithm:
1. Pipeline. The more common way to access data is to attach a pipeline to an algorithm and retrieve the output every day. Once you have factor data (generated by an attached pipeline) in your algorithm, you can use it to make trading and portfolio construction decisions.
2. BarData Lookup. In addition to using pipeline, you can query minute-level pricing and volume data using the built-in BarData object (available in scheduled functions via the data argument). BarData provides methods that allow you to query current and historical minute-frequency data in algorithms.
These two techniques for using data in algorithms are covered in more detail in the next sections.
Pipeline in Algorithms
As discussed earlier in this guide, the purpose of pipeline is to make it easy to define and execute cross-sectional trailing-window computations. Once you have developed a pipeline in Research, you can attach it to your algorithm. Once your pipeline is attached to your algorithm, it will be executed and make results available via
pipeline_output() for each day in a backtest simulation.
pipeline_output() can be called from within any scheduled function.
For example, the following code gets the output from a pipeline that was attached under the name 'my_pipeline':

    from quantopian.algorithm import pipeline_output

    def my_scheduled_function(context, data):
        # Access results using the name passed to attach_pipeline.
        pipeline_results_today = pipeline_output('my_pipeline')

The return value of
pipeline_output() will be a
pd.DataFrame with columns corresponding to the columns that were included in the pipeline definition and one row per asset that was listed on a supported exchange in the specified domain on that day. Additionally, any equities that do not pass the
screen (if one was provided) will be omitted from the output.
The pd.DataFrame returned by pipeline_output() is slightly different from the pd.DataFrame returned by run_pipeline() in Research. Pipelines in Research produce data frames with a pd.MultiIndex, with the first index level corresponding to the simulation date and the second index level corresponding to the equity object. Pipelines attached to algorithms have an implied simulation date equal to the current backtest simulation date, so the output dataframe has a regular pd.Index that only contains equity objects.
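The difference between the two index layouts can be illustrated with plain pandas. This sketch does not call the Quantopian API: the column name `my_factor` is hypothetical, and plain ticker strings stand in for equity objects.

```python
import pandas as pd

dates = pd.to_datetime(["2016-01-04", "2016-01-05"])
assets = ["AAPL", "MSFT"]   # plain strings stand in for equity objects

# Research-style result: one row per (date, asset) pair.
research_result = pd.DataFrame(
    {"my_factor": [1.0, 2.0, 3.0, 4.0]},
    index=pd.MultiIndex.from_product([dates, assets], names=["date", "asset"]),
)

# Algorithm-style result for a single simulation day: one row per asset.
todays_result = research_result.xs(pd.Timestamp("2016-01-05"), level="date")

print(research_result)
print(todays_result)
```

Taking a single-day cross-section of the Research-style frame yields the same shape that an algorithm sees each day, which is useful when porting notebook code that indexes by date.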
Pipelines are computed in a special computation window and have their results cached.
pipeline_output() simply reads the cached result of the Pipeline, so it is a computationally inexpensive operation.
You can use multiple pipelines in the same algorithm. To do this, you can define multiple pipelines in an algorithm and attach each one under a different name.
Post-Processing Computations
Sometimes, there might be an operation or transformation that you want to perform on a pipeline output before using it to inform trading decisions. For example, you might have a custom computation that cannot easily be expressed as a pipeline custom factor. In this situation, you should use
before_trading_start() to define the computation as a post-processing step on the pipeline output.
before_trading_start() is an optional method called once a day, before the market opens but after the current day's pipeline has been computed.
before_trading_start() is a general-purpose function with a 5 minute time limit. It is a good place to perform once-per-day calculations such as a post-processing step on a pipeline output.
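A post-processing step of the kind described above might look like the following sketch. It is written against a plain pandas DataFrame rather than the Quantopian API: `select_longs_and_shorts` and the `momentum` column are hypothetical names, and the frame stands in for one day's pipeline output.

```python
import pandas as pd


def select_longs_and_shorts(pipeline_results, factor_column, n):
    """Return the n highest- and n lowest-ranked assets by one factor column."""
    ranked = pipeline_results[factor_column].dropna().sort_values()
    shorts = list(ranked.index[:n])    # lowest factor values
    longs = list(ranked.index[-n:])    # highest factor values
    return longs, shorts


# Stand-in for a pipeline output: one row per asset, one column per factor.
output = pd.DataFrame(
    {"momentum": [0.12, -0.05, 0.30, None, 0.01]},
    index=["A", "B", "C", "D", "E"],
)
longs, shorts = select_longs_and_shorts(output, "momentum", n=2)
print(longs, shorts)
```

In an algorithm, a function like this could be called once per day and its result stored on context for later use by a scheduled rebalance function.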
In addition to accessing daily data via pipeline, you can access minute-level pricing and volume data in an algorithm. Minute-level pricing and volume data is not available in pipeline, but can be retrieved via BarData.
Any function that is scheduled via schedule_function() must be defined to accept two arguments: context and data. The data argument is provided with an instance of BarData, which has several methods that can be used to retrieve minute-frequency pricing and volume data. This means that you can use BarData methods in any scheduled function.
In general, computations should be performed in a pipeline whenever possible (it's much faster). However, BarData provides methods that give you access to minute-frequency data, which is not available in pipeline. With the BarData methods, you can:
- Get open/high/low/close/volume (OHLCV) values for the current minute for any asset.
- Get historical windows of OHLCV values for any asset.
- Check if the last known price data of an asset is stale.
The instance of
BarData provided to scheduled functions knows your algorithm's simulation time/date and uses that time for all its internal calculations.
All the methods on
BarData accept a single
Asset or a list of
Asset objects, and the price fetching methods also accept an OHLCV (open, high, low, close, volume) field or a list of OHLCV fields. The more that your algorithm can batch up queries by passing multiple assets or multiple fields, the faster Quantopian can get that data to your algorithm.
If you request a history of minute data that extends past the start of the day (usually 9:31AM), the
history() function will get the remaining minute bars from the end of the previous day. For example, if you ask for 60 minutes of pricing data at 10:00AM on the equities calendar, the first 30 prices will be from the end of the previous trading day, and the next 30 will be from the current morning.
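The arithmetic in the example above can be checked with a small helper. This is a sketch, not part of the API: `split_minute_window` is a hypothetical function, and it assumes the requested window reaches back at most into the previous trading day.

```python
def split_minute_window(bar_count, minutes_since_open):
    """Split a trailing minute window into (previous-day bars, current-day bars).

    minutes_since_open is the number of minute bars completed so far today.
    """
    from_today = min(bar_count, minutes_since_open)
    from_previous_day = bar_count - from_today
    return from_previous_day, from_today


# At 10:00AM on the equities calendar, 30 minute bars have completed
# (9:31AM through 10:00AM), so a 60-bar window reaches into the prior day.
print(split_minute_window(60, 30))
```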
Quantopian's price data starts on Jan 2, 2002. Any pricing data call that extends before that date will raise an exception.
Once you have the output of your algorithm's pipeline, the next step is to construct a portfolio. In this step of algorithm development, you will use your algorithm's pipeline output to define a portfolio optimization problem and construct your portfolio.
Where should this go in your algorithm? The portfolio construction logic of your algorithm should be executed in a scheduled function. This allows you to control the rebalance frequency of your strategy.
The best way to place orders in your algorithm is with Optimize. Optimize allows you to move your portfolio from one state to another by defining a portfolio optimization problem. To construct a portfolio with Optimize, define an objective and a set of constraints, and pass them both to the Optimize API's ordering function.
There are two important features of orders that you should know if you are going to use manual ordering. First, all open orders are canceled at the end of the trading day. If you want to try to fill a canceled order on the next trading day, you will have to manually open a new order for the remaining amount. Second, there is no limit to the amount of cash you can spend. Even if an order would take you into negative cash, the backtester won't stop you. It's up to the algorithm to make sure that it doesn't order more than the amount of cash it holds (assuming that is the desired behavior). This type of control is best done using the portfolio object, described below.
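The cash check described above can be sketched as a small guard applied before placing a manual buy order. This is an illustration only: `affordable_shares` is a hypothetical helper, and it ignores commissions and slippage, which would reduce the affordable amount slightly.

```python
def affordable_shares(desired_shares, price, available_cash):
    """Cap a buy order so its cost does not exceed available cash."""
    if price <= 0 or available_cash <= 0:
        return 0
    max_affordable = int(available_cash // price)
    return min(desired_shares, max_affordable)


print(affordable_shares(100, 50.0, 2600.0))   # capped by cash
print(affordable_shares(10, 50.0, 2600.0))    # fully affordable
print(affordable_shares(10, 50.0, -5.0))      # no cash available
```

In an algorithm, the available cash would come from the portfolio state, for example context.portfolio.cash.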
The Portfolio Object
Your portfolio represents all the assets you currently hold. Each algorithm has exactly one portfolio. You can access the algorithm's
Portfolio object in your algorithm via
context.portfolio. Note that
Portfolio has many attributes that you can use to get information about the current state of your algorithm's portfolio.
Viewing Portfolio State
Before rebalancing, you might want to check what assets you already hold. You can view your portfolio's state using properties of
Portfolio. For example, you can use
context.portfolio.positions to access your algorithm's
Positions which contains a dictionary of all open positions, or use
context.portfolio.cash to view the current amount of cash in your portfolio.
See the Portfolio reference for a full list of available attributes.
Common Rebalancing Issues
Quantopian forward-fills pricing data. However, your algorithm might need to know if the price for an equity is from the most recent minute before placing orders. The
is_stale() method returns
True if the asset is alive but the latest price is from a previous minute.
You can get information about orders using order status functions. For example, you can see the status of a specific order by calling
get_order(), or see a list of all open orders by calling
get_open_orders(). For a full list of order status functions, see the API Reference.
All open orders are canceled at the end of the day, both in backtesting and live trading. This is true for orders placed by any means (using the Optimize API and/or manual order methods). You can also cancel orders manually before the end of the day.
Logging and Plotting
As you run your algorithm, you may want to log or plot custom metrics beyond those that are automatically tracked by the backtester.
Your algorithm can easily generate log output by using the
log.info() method. Log output appears in the right-hand panel of the IDE or under the "Activity" tab of the full backtest result page.
Logging is rate-limited (throttled) for performance reasons. The basic limit is two log messages per call of initialize() and two more log messages for each minute. Each backtest has an additional buffer of 20 extra log messages. Once the limit is exceeded, messages are discarded until the buffer has been emptied. A message explaining that some messages were discarded is shown.
- Suppose in initialize() you log 22 lines. Two lines are permitted, plus the 20 extra log messages, so this works. However, a 23rd log line would be discarded.
- Suppose you run a scheduled function every minute and in that function, you log three lines. Each time the scheduled function is called, two lines are permitted, plus one of the extra 20 is consumed. On the 21st call two lines are logged, and the last line is discarded. Subsequent log lines are also discarded until the buffer is emptied.
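The two examples above can be reproduced with a simplified model of the throttle. This is a sketch under assumptions, not the actual logging implementation: `LogThrottleSketch` is a hypothetical class that gives each call an allowance of two messages plus access to a shared buffer of 20 extras, and it does not model how the buffer is emptied again.

```python
class LogThrottleSketch:
    """Hypothetical model: a per-call allowance plus a shared extra buffer."""

    def __init__(self, per_call=2, buffer_size=20):
        self.per_call = per_call
        self.buffer_left = buffer_size

    def run_call(self, messages):
        """Return how many of `messages` log lines are kept for one call."""
        kept = min(messages, self.per_call)
        extra = min(messages - kept, self.buffer_left)
        self.buffer_left -= extra
        return kept + extra


# First example: 23 lines logged in a single call.
init_throttle = LogThrottleSketch()
print(init_throttle.run_call(23))

# Second example: three lines logged in each of 21 calls.
minute_throttle = LogThrottleSketch()
kept_per_call = [minute_throttle.run_call(3) for _ in range(21)]
print(kept_per_call)
```

Under this model, the single call keeps 22 of its 23 lines, and the repeated calls keep all three lines until the extra buffer runs out on the 21st call.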
Additionally, there is a per-member overall log limit. When a backtest causes the overall limit to be reached, the logs for the oldest backtest are discarded.
In an algorithm, you have some (limited) plotting functionality. Using
record(), you can create time series charts by passing series names and corresponding values using keyword arguments. Up to five series can be charted in an algorithm at day-level granularity. The time series are then displayed in a chart below the performance chart (if you click "Build Algorithm") or under the "Activity" tab (if you run a full backtest).
In backtesting, the last recorded value per day is used. Therefore, we recommend using
schedule_function() to record values once per day.
This minimal example records and plots the price of MSFT and AAPL every day at market close:

    def initialize(context):
        schedule_function(record_vars, date_rules.every_day(), time_rules.market_close())

    def record_vars(context, data):
        # Track the prices of MSFT and AAPL.
        record(msft=data.current(sid(5061), 'price'), aapl=data.current(sid(24), 'price'))

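The "last recorded value per day" behavior mentioned above can be pictured with plain pandas. This sketch does not use record() itself: the series of minute-stamped values is hypothetical data, and the grouping shows which values would survive if only the final value of each day is kept.

```python
import pandas as pd

# Hypothetical values recorded at several minutes across two days.
recorded = pd.Series(
    [10.0, 10.5, 11.0, 20.0, 21.0],
    index=pd.to_datetime([
        "2016-01-04 09:31", "2016-01-04 12:00", "2016-01-04 16:00",
        "2016-01-05 09:31", "2016-01-05 16:00",
    ]),
)

# Keep only the last value recorded on each calendar day.
daily = recorded.groupby(recorded.index.date).last()
print(daily)
```

Values recorded earlier in the day are overwritten, which is why recording once per day in a scheduled function loses nothing.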
Writing Faster Algorithms
The Quantopian backtester will run fastest when you follow best practices, such as:
Use Pipeline: Pipeline is the most efficient way to access data on Quantopian. Whenever possible, you should use pipeline to perform computations in order to have your backtests run as fast as possible.
Only access minute data when you need it: Minute frequency pricing and volume data is always accessible to you via
BarData methods, and it is loaded on-demand. You pay the performance cost of accessing the data when you call for it. If your algorithm checks a price every minute, that has a performance cost. If you don't need pricing data that frequently, asking for it less frequently will speed up your algorithms.
Avoid 'for' loops: As much as possible, try to use vectorized computations instead of
for loops. Most data structures on Quantopian support vectorized computations which are usually much faster than for loops. On Quantopian, numpy arrays and pandas data structures are two common entities that support vectorized computations.
Batch minute data lookups: As much as possible, you should batch any calls to BarData methods. All of the data functions (such as current(), history(), and is_stale()) accept a list of assets when requesting data. Running these once with a list of assets will be significantly more performant than looping through the list of assets and calling these functions individually per asset.
Record data daily, not minutely, in backtesting: Any data you record in your backtest will record the last data point per day. If you try to plot something every minute using
record(), it will still only record one data point per day in the backtest.
Access account and portfolio data only when needed: Account and portfolio information is calculated daily or on demand. Accessing your algorithm's
Portfolio in multiple different minutes per day will force the system to calculate your entire portfolio in each of those minutes, slowing down the backtest. You should only access
Portfolio when you need to use the data.
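The advice above about avoiding for loops can be illustrated with a short numpy comparison. The price values are made up for the example; both versions compute the same simple returns, but the vectorized one does it in a single array expression.

```python
import numpy as np

prices = np.array([100.0, 101.0, 99.0, 102.0])

# Loop version: compute each simple return one element at a time.
loop_returns = []
for i in range(1, len(prices)):
    loop_returns.append(prices[i] / prices[i - 1] - 1.0)

# Vectorized version: one array expression over the whole series.
vectorized_returns = prices[1:] / prices[:-1] - 1.0

print(np.allclose(loop_returns, vectorized_returns))
```

On arrays with thousands of elements, the vectorized form avoids per-element Python overhead, which is where most of the speedup comes from.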