PLEASE HELP!

How do I create a universe of every stock? I would like to initialize all stocks then narrow the universe to 25% gainers.

I am having trouble figuring out how the definitions work. I am decent with Python, but the passing and updating times don't flow logically for me here. I think there are just too many things happening "behind the scenes" for me to follow well.

I appreciate any help thank you!

4 responses

First off, welcome!

Here is a short overview of the flow of a Quantopian algorithm which may help. The three big 'behind the scenes' things to be aware of are:

  • The 'framework'. When you 'build algorithm', 'run full backtest', or launch an algorithm for paper trading or live trading, you are really handing it over to an overarching program which does a number of things for you. Those 'things' are explained below.
  • The 'pipeline' object. This is really just an object defined in an open source library (https://github.com/quantopian/zipline/tree/master/zipline/pipeline) that takes care of executing the actual daily (not minutely) data queries for you. It is also optimized for backtesting, so it 'pre-processes' the data for speed. Instead of doing any direct database or file queries, you simply define the data you want in a pipeline definition, then run the pipeline. It outputs a pandas dataframe with all your data. Maybe check this post out: https://www.quantopian.com/posts/custom-factor-calculation-over-iterating-help.
  • All the 'built in' objects for factors, filters, order functions, etc. These again are all open sourced and can easily be imported into your algorithm. Read through the documentation at https://www.quantopian.com/help.

Here's the general 'flow' of an algorithm when you run the program for either a backtest or live...

  1. Anything not in a function is run once. This should really be limited to the imports your program needs and possibly any 'constants' it uses. All of your logic should be inside the functions you define.

  2. Your initialize function is called exactly once. This is typically where the pipeline is defined and where any functions that need to run periodically are scheduled (using 'schedule_function'). Generally, don't put any trading 'logic' here. It must be called 'initialize'.

  3. Your before_trading_start function is called every trading day before markets open (and after all the Quantopian data feeds are updated). This is typically where the pipeline is run and the output is stored so the pipeline dataframe can be used throughout your algorithm. It must be called 'before_trading_start'.

  4. Your handle_data function is called every minute. Put anything you need to update every minute here. Many programs, however, do not need to check things that often and therefore do not define a 'handle_data' function at all. If you do define it, it must be called 'handle_data'.

  5. Your functions that were scheduled using schedule_function are run at their pre-defined times. This is where the bulk of your logic resides. These can in turn call other functions if needed, or simply to make your logic more readable.
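
The call order in the list above can be mimicked with a toy driver. This is only a sketch of the sequence, not Quantopian's actual framework: `ToyContext`, `schedule_function_toy`, `run_toy_backtest`, and the three-minute "trading day" are all made up for illustration, and scheduling is simplified to "run at the first minute of each day".

```python
# Toy stand-in for the backtest framework, only to show the order of calls.
# None of these helper names are Quantopian APIs; they are hypothetical.

class ToyContext(object):
    """Plays the role of the 'context' object: a bag of attributes."""
    pass


calls = []       # log of which user function ran, in order
scheduled = []   # functions registered from initialize


def schedule_function_toy(func):
    # Simplified: every scheduled function runs at the first minute of each day.
    scheduled.append(func)


# ---- user-defined algorithm functions ----
def initialize(context):
    calls.append('initialize')
    schedule_function_toy(rebalance)


def before_trading_start(context, data):
    calls.append('before_trading_start')


def handle_data(context, data):
    calls.append('handle_data')


def rebalance(context, data):
    calls.append('rebalance')


# ---- the toy 'framework' ----
def run_toy_backtest(days, minutes_per_day):
    context = ToyContext()
    initialize(context)                      # exactly once
    for _ in range(days):
        before_trading_start(context, None)  # once per day, before the open
        for minute in range(minutes_per_day):
            handle_data(context, None)       # every minute
            if minute == 0:
                for func in scheduled:       # scheduled functions at their time
                    func(context, None)


run_toy_backtest(days=2, minutes_per_day=3)
print(calls)
```

The ordering of 'handle_data' relative to a scheduled function within the same minute is an assumption of this toy; the point is only that 'initialize' runs once, 'before_trading_start' once per day, and the rest inside each day.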

So... to answer your specific questions:

How do I create a universe of every stock? This is easy. Use pipeline. The output (specifically the index) will contain ALL securities that Quantopian tracks. Note that these include common stocks, preferred stocks, ETNs, ETFs, etc. You should really filter this down to some initial subset. For instance, one of the pre-defined universe filters such as Q1500US would get you the most tradable stocks.

I would like to initialize all stocks then narrow the universe to 25% gainers. Again, use pipeline. Create an initial filter to get only stocks, create a factor for 'gainers' (i.e. returns), then use the built in method '.percentile_between' to get the top 25%.

import quantopian.pipeline.filters as Filters
import quantopian.pipeline.factors as Factors
from quantopian.pipeline.data.builtin import USEquityPricing

# Built in filter to exclude ETFs etc
is_stock = Filters.IsPrimaryShare()

# Create a factor for gains (2-day window = return since the previous close)
gains = Factors.Returns(inputs=[USEquityPricing.close], window_length=2, mask=is_stock)

# Filter to get only the top 25% of stocks with the highest gains.
top_25_percent_gainers = gains.percentile_between(75, 100, mask=is_stock)
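
To see what those two steps compute, here is a rough pandas/numpy sketch on made-up prices. It is not pipeline code: the tickers and closes are hypothetical, and treating 'percentile_between' as an inclusive band between two NaN-aware percentiles is my reading of how the filter behaves, so take it as an approximation.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices: two rows (yesterday, today), one column per stock.
closes = pd.DataFrame(
    {'AAA': [10.0, 11.0], 'BBB': [20.0, 19.0], 'CCC': [5.0, 5.2],
     'DDD': [8.0, 8.0], 'EEE': [50.0, 60.0]},
    index=['yesterday', 'today'],
)

# Returns with window_length=2: percent change from the first row to the last.
gains = closes.iloc[-1] / closes.iloc[0] - 1.0

# percentile_between(75, 100): keep values inside the 75th..100th percentile band.
lower = np.nanpercentile(gains.values, 75)
upper = np.nanpercentile(gains.values, 100)
top_25_percent_gainers = (gains >= lower) & (gains <= upper)

print(gains.round(3).to_dict())
print(top_25_percent_gainers[top_25_percent_gainers].index.tolist())
```

In a real algorithm the pipeline does this for you every day across the whole universe.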

Attached is an algorithm which does just this (though it uses Q1500US for the universe of stocks). It may help getting started. Do look at the tutorials and the help docs. You may also want to look at these other posts:

more overview on what pipeline is all about
https://www.quantopian.com/posts/screen-vs-filter

a bit more about how pipeline works and is optimized
https://www.quantopian.com/posts/custom-factor-calculation-over-iterating-help

links to some good tutorials
https://www.quantopian.com/posts/quantopian-2-dot-0-tutorial-series

'''
Basic template for stock selection and trading.
Pipeline factors are used to 'pick' the stocks
'''

# The following imports need to be included when using Pipeline
from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline import Pipeline

# Import all the built in Quantopian filters and factors (just in case)
import quantopian.pipeline.filters as Filters
import quantopian.pipeline.factors as Factors

# Import Pandas and Numpy (just in case we want to use their functionality)
import pandas as pd
import numpy as np

# Import any specialized packages here (eg scipy.optimize or scipy.stats)
pass

# Import any needed datasets
from quantopian.pipeline.data.builtin import USEquityPricing


# Set any 'constants' you will be using
TARGET_STOCKS = 20
TARGET_LEVERAGE = 1.0


def initialize(context):
    """
    Called once at the start of the algorithm.
    """   
    
    # Set commission model or omit and the default Q models will be used
    set_commission(commission.PerShare(cost=0.0, min_trade_cost=0.0))
    set_slippage(slippage.FixedSlippage(spread=0))
    

    # Attach the pipeline built by pipe_definition (named 'my_pipe') so we have data to use
    attach_pipeline(pipe_definition(context), name='my_pipe')

  
    # Schedule when to trade
    schedule_function(enter_buy_sell_orders, date_rules.week_start(), time_rules.market_open())


    # Schedule when to record any tracking data
    schedule_function(my_record_vars, date_rules.every_day(), time_rules.market_close())
     

         
def pipe_definition(context):
    '''
    Here is where the pipeline definition is set.
    Specifically it defines which columns appear in the resulting dataframe.
    Typically don't apply a screen (or filter) to the entire result or at least be
    careful if you do. If any close rules depend upon data for current positions then ensure that
    those positions are always included in the dataframe (and not filtered out).
    '''
    
    # Create a universe filter which defines our baseline set of securities
    # If no filter is used then ALL assets in the Q database will potentially be returned
    # This is not what one typically wants because 
    #    1) it includes a mix of ETFs and stocks
    #    2) it includes very low liquid and 'penny' stocks
    #
    # This filter can also be used as a mask in factors to potentially speed up some calcs
    # Built in filters (eg Q500US) can be used or filters can be made directly from datasets as below
    universe = Filters.Q1500US()
    
    # Create any basic data factors that your logic will use.
    # This is done by simply using the 'latest' method on a datacolumn object.
    # Just ensure the dataset is imported first.
    low_price = USEquityPricing.low.latest    

    # Create any built in factors you want to use (in this case Returns). 
    # Just ensure they are imported first.
    gain = Factors.Returns(inputs=[USEquityPricing.close], window_length=2, mask = universe)
    
    # Create any built in filters you want to use.
    pass

    # Create any filters based upon factors defined above.
    # These are easily made with the built in methods such as '.top' etc applied to a factor
    top_gainers = gain.percentile_between(75, 100, mask = universe)
    
    # Define the columns and any screen which we want our pipeline to return
    return Pipeline(
            columns = {
            'low_price' : low_price,
            'gain' : gain,
            },
            screen = top_gainers
            )
    
 
def before_trading_start(context, data):
    '''
    Run pipeline_output to get the latest data for each security.
    The data is returned in a 2D pandas dataframe. Rows are the security objects.
    Columns are what was defined in the pipeline definition.
    '''
    
    # Get a dataframe of our pipe data. Placed in the context object so it's available
    # to other functions and methods (quasi global)
    context.output = pipeline_output('my_pipe')

   
def enter_buy_sell_orders(context, data):
    '''
    Let's buy the 20 lowest priced stocks returned from the pipeline
    Order an equal amount of each stock
    Try to buy at yesterday's low by using a limit order
    '''
    stocks_to_buy = context.output.sort_values('low_price', ascending=True).head(TARGET_STOCKS).index
    weight = TARGET_LEVERAGE/stocks_to_buy.size

    for stock in stocks_to_buy:
        if data.can_trade(stock):
            limit_price = context.output.get_value(stock, 'low_price')
            order_target_percent(stock, weight, style=LimitOrder(limit_price))

    # Sell everything that we don't want to buy
    for stock in context.portfolio.positions:
        if stock not in stocks_to_buy and data.can_trade(stock):
            order_target_percent(stock, 0)


def my_record_vars(context, data):
    """
    Plot variables at the end of each day.
    """
    
    # Record the number of positions held each day
    qty_of_positions = len(context.portfolio.positions)
    record(positions=qty_of_positions)
 
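
A side note on the order-sizing step in 'enter_buy_sell_orders' above, sketched with a made-up dataframe standing in for 'context.output'. Two things are my own additions rather than part of the attached algorithm: the guard against an empty selection (so the weight division can't fail on a day the pipeline returns nothing), and the use of '.at' for the scalar lookup, since 'get_value' has been removed from recent pandas releases.

```python
import pandas as pd

TARGET_STOCKS = 2
TARGET_LEVERAGE = 1.0

# Hypothetical stand-in for context.output (index = securities).
output = pd.DataFrame(
    {'low_price': [12.0, 3.5, 48.0, 7.25], 'gain': [0.04, 0.10, 0.02, 0.07]},
    index=['AAA', 'BBB', 'CCC', 'DDD'],
)

# Cheapest TARGET_STOCKS names, selected the same way as in the algorithm.
stocks_to_buy = output.sort_values('low_price', ascending=True).head(TARGET_STOCKS).index

# Guard against an empty selection so the division cannot fail.
weight = TARGET_LEVERAGE / len(stocks_to_buy) if len(stocks_to_buy) else 0.0

# .at is the label-based scalar lookup that replaces get_value in newer pandas.
orders = {stock: (weight, output.at[stock, 'low_price']) for stock in stocks_to_buy}
print(orders)
```

Each entry pairs the target weight with the limit price that would be passed to 'order_target_percent'.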

How to use top, bottom, percentile_between methods on pipeline_output?

from quantopian.pipeline  import Pipeline  
from quantopian.algorithm import attach_pipeline, pipeline_output  
import quantopian.pipeline.factors as Factors  
import quantopian.pipeline.filters as Filters

def initialize(context):  
    attach_pipeline(pipeline(context), 'pipeline')  
    schedule_function(rebalance, date_rules.month_start(), time_rules.market_open(minutes = 65))

def pipeline(context):  
    universe  = Filters.QTradableStocksUS()  
    mkt_cap   = Factors.MarketCap(mask = universe)  
    my_screen = mkt_cap.top(100)  
    pipe      = Pipeline(columns = {'mkt_cap':mkt_cap, }, screen = my_screen)  
    return pipe

def rebalance(context, data):  
    mkt_cap_sorted = pipeline_output('pipeline').sort_values('mkt_cap', ascending = True)  
    longs = mkt_cap_sorted.tail(10)  
    # longs = mkt_cap_sorted.top(10)  
    # longs = mkt_cap_sorted.bottom(10)  
    # longs = mkt_cap_sorted.percentile_between(90, 100)  
    print(longs)  

Note that 'top', 'bottom', and 'percentile_between' are pipeline factor methods. They need to be applied to a factor object and then will return a pipeline filter object. Factor and filter definitions are done exactly once in an algorithm (typically in the 'initialize' method or a method called inside 'initialize').

So, the code below doesn't work

def rebalance(context, data):  
    mkt_cap_sorted = pipeline_output('pipeline').sort_values('mkt_cap', ascending = True)  
    longs = mkt_cap_sorted.tail(10)  
    longs = mkt_cap_sorted.top(10)  
    longs = mkt_cap_sorted.bottom(10)  
    longs = mkt_cap_sorted.percentile_between(90, 100) 

First, 'mkt_cap_sorted' is a pandas dataframe (not a factor object), so it doesn't have the 'top', 'bottom', and 'percentile_between' methods. Second, these filters need to be defined before 'pipeline_output' is called (typically in 'initialize').

Put these methods in the pipeline definition. Something like this.

def pipeline(context):  
    universe = Filters.QTradableStocksUS()  
    mkt_cap = Factors.MarketCap(mask = universe)  
    top_10 = mkt_cap.top(10)  
    bottom_10 = mkt_cap.bottom(10)  
    between_90_100 = mkt_cap.percentile_between(90, 100) 


    pipe = Pipeline(columns = {  
        'mkt_cap' : mkt_cap,  
        'top_10' : top_10,  
        'bottom_10' : bottom_10,  
        'between_90_100' : between_90_100,  
          })  


    return pipe

To then use these when getting the pipeline output (which is usually done in the 'before_trading_start' method), do something like this

def before_trading_start(context, data):  
    output = pipeline_output('pipeline')  
    top_10_list = output.query('top_10').index.tolist()  
    bottom_10_list = output.query('bottom_10').index.tolist()  
    between_90_100_list = output.query('between_90_100').index.tolist()
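
If the 'query' call looks odd: when a column holds booleans, querying on just the column name keeps the rows where it is True. A small sketch with a made-up dataframe standing in for the pipeline output (the tickers and values are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for pipeline_output('pipeline').
output = pd.DataFrame(
    {
        'mkt_cap': [900.0, 50.0, 400.0, 10.0],
        'top_10': [True, False, True, False],
        'bottom_10': [False, True, False, True],
    },
    index=['AAA', 'BBB', 'CCC', 'DDD'],
)

# query('top_10') keeps the rows where the boolean column is True.
top_list = output.query('top_10').index.tolist()

# Plain boolean indexing selects the same rows.
same_list = output[output['top_10']].index.tolist()

print(top_list, same_list)
```

Either form works; 'query' just reads a little cleaner when there are several flag columns.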

Now, one can also use plain old pandas dataframe methods (instead of defining pipeline filters) to get the top and bottom by using the 'nlargest' and 'nsmallest' methods (pandas doesn't have a single-call equivalent of 'percentile_between').

def before_trading_start(context, data):  
    output = pipeline_output('pipeline')  
    top_10_list = output.nlargest(10, 'mkt_cap').index.tolist()  
    bottom_10_list = output.nsmallest(10, 'mkt_cap').index.tolist()
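
For a percentile band in plain pandas, one workaround is to compute the cutoff with 'quantile' and compare against it. A rough sketch on made-up data follows; it only approximates what the pipeline filter returns, since the pipeline version is evaluated across the whole universe each day.

```python
import pandas as pd

# Hypothetical stand-in for pipeline_output('pipeline'): ten made-up securities.
output = pd.DataFrame(
    {'mkt_cap': [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]},
    index=['S%02d' % i for i in range(10)],
)

# Cutoff at the 90th percentile of the column (linear interpolation by default).
lower = output['mkt_cap'].quantile(0.90)

# Keep rows at or above the cutoff, i.e. the 90..100 percentile band.
between_90_100_list = output[output['mkt_cap'] >= lower].index.tolist()
print(between_90_100_list)
```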

Hope that helps...

Dan,

Thank you very much.