When to use data.history vs pipeline?

Are there any guidelines around this? For example, if I were running a simple moving average crossover strategy over the whole universe of stocks, should I call data.history in my scheduled function and build the moving averages there, or should I do it in the pipeline?

Another thing: if I screen data in the pipeline and go long a stock on day 1, but that stock gets screened out over the following days, how will the algo know to sell when there is no data for it (because it has been screened out)?

For example, I screen for stocks over $5 and buy at $6, then the stock drops under $5 and gets screened out.

2 responses

The main raison d'être of pipeline is to speed up data fetches. Using data.history to fetch the past 20 days of prices every day for 100 days results in 100 separate database calls, yet most of the data fetched on any given day is the same as the day before, so one keeps fetching the same data over and over. Pipeline improves on this by 'chunking' the total timeframe (e.g. 100 days) into smaller chunks, so it may only call the database a couple of times: once for the first 50 days of data and once for the next 50. So the guideline is: if speed is important, use pipeline.
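To put rough numbers on the point above, here is a toy model that only counts simulated fetches. It does not reflect Quantopian's actual internals; the function names and the 50-day chunk size are assumptions taken from the example figures in this reply.

```python
# Toy model only: counts simulated "database calls", not real Quantopian internals.

def calls_with_history(num_days):
    """One fetch per simulated day, as when data.history is called daily."""
    return num_days

def calls_with_chunks(num_days, chunk_size):
    """One fetch per chunk of days (ceiling division)."""
    return -(-num_days // chunk_size)

def overlap_fraction(window):
    """Share of a rolling window that was already fetched the previous day."""
    return (window - 1) / window

print(calls_with_history(100))     # 100 daily fetches
print(calls_with_chunks(100, 50))  # 2 chunked fetches
print(overlap_fraction(20))        # 0.95 of each 20-day window is repeated data
```

With a 20-day window, 19 of the 20 rows fetched each day were already fetched the day before, which is the redundancy that chunking avoids.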

Stocks falling out of the pipeline is indeed a problem. I'd suggest NOT filtering any results in the pipeline. Do all the filtering and selection once the pipeline data is returned. That way, data for any stocks currently held will be available even if they no longer meet the current criteria.
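A minimal sketch of that idea, using a plain dict as a stand-in for one day of unfiltered pipeline output. The symbols, prices, and the `split_candidates` helper are made up for illustration; in a real algorithm the output would be the DataFrame returned by pipeline_output.

```python
# Stand-in for one day of unfiltered pipeline output: symbol -> column values.
# Symbols and prices are invented for illustration.
pipeline_output = {
    "AAA": {"close_price": 6.20},
    "BBB": {"close_price": 4.10},   # currently held, now under the $5 threshold
    "CCC": {"close_price": 12.50},
    "DDD": {"close_price": 3.00},
}
held = {"BBB", "CCC"}

def split_candidates(output, held, min_price=5.0):
    """Apply the price filter after the fact, keeping rows for held symbols."""
    buy_candidates = sorted(
        sym for sym, row in output.items()
        if row["close_price"] > min_price and sym not in held
    )
    # Held symbols keep their data even when they fail the filter.
    held_rows = {sym: output[sym] for sym in held if sym in output}
    return buy_candidates, held_rows

buys, held_rows = split_candidates(pipeline_output, held)
print(buys)               # new symbols that pass the $5 filter
print(sorted(held_rows))  # held symbols, including the one now under $5
```

Because the $5 test is applied after the data comes back, the held stock that has fallen to $4.10 still has a row, so a sell rule can see it.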

I can give some examples if that's not entirely clear. Good luck.

Thanks Dan, that is very clear. I understand the performance aspect of the pipeline and I think it's a great idea. I like the way zipline runs all symbols in parallel each day rather than each symbol sequentially (like quantstrat in R).

Soon I will be implementing some real strategies with factors that I can rank on, using alphalens to find alpha, and this is where the pipeline will come in really handy. For now I'm just playing around with some naive momentum strategies that select stocks from the universe, but I also want to put some constraints on the percentage of each position and on leverage.

Here is the algo I've implemented so far; feel free to comment on anything silly I've done.
The strategy just tries to hold a basket of 200 stocks with each position at 0.5%.
It looks for relative strength against the SPY (not properly implemented yet), absolute strength with SMAs, VIX under 25 and close price above $1.
It sells on absolute weakness (SMAs).
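As a quick check on the sizing, here is a small sketch of the arithmetic behind an equal-weight basket. The helper names and the target leverage of 1.0 are assumptions for illustration; the algorithm below passes 0.005 to order_percent per position.

```python
def position_weight(max_positions, target_leverage=1.0):
    """Equal weight per position so a full basket sums to the target leverage."""
    return target_leverage / max_positions

def open_slots(current_positions, max_positions):
    """How many new positions can still be opened under the cap."""
    return max(0, max_positions - current_positions)

weight = position_weight(200)
print(weight)                # weight per position for a 200-stock basket
print(open_slots(195, 200))  # slots left when 195 positions are held
print(open_slots(205, 200))  # never negative when over the cap
```

At 200 positions and a target leverage of 1.0, each position comes out at 0.005 of portfolio value, which matches the figure used in the order call.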

"""
This is a template algorithm on Quantopian for you to adapt and fill in.
"""
import quantopian.algorithm as algo
from quantopian.pipeline import Pipeline, CustomFactor
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.filters import QTradableStocksUS
from quantopian.pipeline.factors import SimpleMovingAverage
import pandas as pd
import numpy as np
from quantopian.pipeline.data.quandl import yahoo_index_vix

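def initialize(context):
    """
    Called once at the start of the algorithm.
    """
    # Rebalance every day, 1 hour after market open.
    algo.schedule_function(
        rebalance,
        algo.date_rules.every_day(),
        algo.time_rules.market_open(hours=1),
    )

    # Record tracking variables at the end of each day.
    algo.schedule_function(
        record_vars,
        algo.date_rules.every_day(),
        algo.time_rules.market_close(),
    )

    # Create our dynamic stock selector.
    algo.attach_pipeline(make_pipeline(), 'pipeline')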

# Compares each symbol's close with the SPY close (sid 8554).
#### this needs to be updated to compare returns, rather than just ratio
class Ratio(CustomFactor):
    window_safe = True

    def compute(self, today, assets, out, close):
        # `assets` is a NumPy array of sids, so locate SPY with np.where.
        market_idx = np.where(assets == 8554)[0][0]
        idx_close = close[-1, market_idx]
        out[:] = close[-1] / idx_close

# Adds the VIX as a volatility measure.
class VIXFactor(CustomFactor):
    def compute(self, today, assets, out, vix):
        out[:] = vix
# Pipeline definition
def make_pipeline():
    base_universe = QTradableStocksUS()
    ratio = Ratio(inputs=[USEquityPricing.close], window_length=1)
    #symbol vs index
    smaSymVIndexF = SimpleMovingAverage(
    smaSymVIndexS = SimpleMovingAverage(
    #adds VIX
    vix_close = VIXFactor(inputs=[yahoo_index_vix.close], window_length=1)
    close_price = USEquityPricing.close.latest

    smaF = SimpleMovingAverage(
    smaS = SimpleMovingAverage(
    #go long signal
    buy = ((smaSymVIndexF > smaSymVIndexS)
           & (smaF > smaS)
           & (close_price > 1)
           & (vix_close < 25))
    #sell any longs
    sell = smaF < smaS
    return Pipeline(
        columns={
            'close_price': close_price,
            'buy': buy,
            'sell': sell,
        },
    )

def before_trading_start(context, data):
    """
    Called every day before market open.
    """
    context.output = algo.pipeline_output('pipeline')

    # These are the securities that we are interested in trading each day.
    context.security_list = context.output.index
    #print len(context.security_list)

def rebalance(context, data):
    """
    Execute orders according to our schedule_function() timing.
    """
    # Not sure if this is the best way to get longs; copied from someone's code.
    open_rules = 'buy == True'
    open_these = context.output.query(open_rules).index.tolist()
    current_positions = context.portfolio.positions
    #naive way of tracking open positions, not sure if better way
    positions_now = len(current_positions)

    for stock in open_these:
        # Naive cap: stop opening new positions once 200 are held.
        if stock not in current_positions and data.can_trade(stock) and (positions_now < 200):
            order_percent(stock, .005)
            positions_now = positions_now + 1
            log.info("Buying %s" % (stock.symbol))
    #again this was just copied from someone else
    close_rules = 'sell == True'
    close_these = context.output.query(close_rules).index.tolist()
    for stock in close_these:
        if stock in current_positions and data.can_trade(stock):
            order_target(stock, 0)
            log.info("Selling %s" % (stock.symbol))


def record_vars(context, data):
    """
    Plot variables at the end of each day.
    """
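On the note in the Ratio factor about comparing returns rather than a raw price ratio, one possible approach is to subtract the index return from the stock return over the same window. This is a plain-Python sketch with invented prices and hypothetical helper names, not a pipeline factor; inside a CustomFactor the same arithmetic would run over the first and last rows of the close window.

```python
def period_return(prices):
    """Simple return over a price window: last / first - 1."""
    return prices[-1] / prices[0] - 1.0

def relative_strength(stock_prices, index_prices):
    """Stock return minus index return over the same window."""
    return period_return(stock_prices) - period_return(index_prices)

# Invented prices for illustration only.
stock = [10.0, 10.5, 11.0]     # +10% over the window
index = [100.0, 101.0, 105.0]  # +5% over the window

rs = relative_strength(stock, index)
print(rs > 0)  # True when the stock outperformed the index
```

Unlike the price ratio, this measure does not depend on the absolute price levels of the stock and the index, so values are comparable across symbols.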