Pipeline Trading Universe - Best Practice

EDIT: The updated version of this algorithm uses Q1500US, one of the pipeline's built-in filters. Lesson 11 from the Pipeline tutorial introduces these built-in filters and provides a brief explanation on how they are used to specify a downsized base universe. More detail on the selection criteria of these filters can be found here.

About a week ago Scott Sanderson posted about Pipeline's new support for string data, and included a notebook demonstrating how to narrow down the trading universe. These filters get rid of a lot of the equities that an algorithm generally shouldn't trade, like non-primary shares.

Specifically, there are nine filters, checking that the equity:
1. is a common stock
2. doesn't have a name indicating it's a limited partnership (LP)
3. doesn't have a company reference entry indicating it's a limited partnership
4. has fundamental data associated with it in Morningstar (isn't an ETF)
5. isn't over-the-counter
6. isn't When Issued
7. isn't a depositary receipt
8. is a primary share
9. has a high dollar volume
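
As a rough illustration of how these checks combine, here is a minimal plain-Python sketch over hypothetical security records. The field names and sample rows are assumptions made up for this example; on Quantopian the same logic is expressed with Pipeline filters rather than dictionaries. The dollar-volume check (filter 9) is left out because it ranks securities against each other instead of testing one record at a time.

```python
import re

# Names ending in " LP", " L.P.", " L P" and similar suggest a limited partnership.
LP_NAME = re.compile(r'.* L[. ]?P\.?$')

def passes_static_filters(sec):
    """Apply filters 1-8 to one hypothetical security record (a dict)."""
    return (
        sec['security_type'] == 'ST00000001'          # 1. common stock
        and not LP_NAME.match(sec['standard_name'])   # 2. name does not look like an LP
        and sec['limited_partnership'] is None        # 3. no LP reference entry
        and sec['market_cap'] is not None             # 4. has fundamental data
        and not sec['exchange_id'].startswith('OTC')  # 5. not over-the-counter
        and not sec['symbol'].endswith('.WI')         # 6. not when-issued
        and not sec['is_depositary_receipt']          # 7. not a depositary receipt
        and sec['is_primary_share']                   # 8. primary share
    )

# Hypothetical records, invented for this example.
base = {
    'symbol': 'ABC',
    'standard_name': 'ABC Corp',
    'security_type': 'ST00000001',
    'limited_partnership': None,
    'market_cap': 5.0e9,
    'exchange_id': 'NYS',
    'is_depositary_receipt': False,
    'is_primary_share': True,
}
partnership = dict(base, symbol='XYZ', standard_name='XYZ Energy Partners L.P.')
etf = dict(base, symbol='ETF1', standard_name='Example Index Fund', market_cap=None)

universe = [base, partnership, etf]
tradable = [sec['symbol'] for sec in universe if passes_static_filters(sec)]
print(tradable)
```

Only the plain common stock survives: the partnership is caught by the name check and the fund by the missing fundamental data.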

Pipeline is a powerful tool that opens up access to the full range of 8000+ equities in Quantopian's database. This includes a wide variety of types of equities, including ETFs, ADRs, non-primary shares, etc. Different types of equities can behave differently or have different data available. For example, ETFs don't have fundamental data. As a result, a model can be heavily dependent on the types of equities that it's using. These filters can help select the types of equities that you want to trade.

We wanted to draw some attention to these filters, since this type of security selection is a best practice we recommend. I've attached an example mean-reversion algorithm that makes use of the filters, and selects from a generic set of liquid and common stocks. Clone this algorithm and try using these filters in your own strategy!

Utilizes the filters for a good trading universe laid out by Scott Sanderson:
from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import morningstar as mstar
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import AverageDollarVolume, SimpleMovingAverage
from quantopian.pipeline.filters.morningstar import IsPrimaryShare

def initialize(context):
    # Equity numbers for the mean reversion algorithm.
    context.num_securities = 20
    context.num_short = context.num_securities // 2
    context.num_long = context.num_securities - context.num_short
    schedule_function(my_rebalance, date_rules.week_start(), time_rules.market_open(hours=1))
    schedule_function(my_record_vars, date_rules.every_day(), time_rules.market_close())
    attach_pipeline(my_pipeline(context), 'my_pipeline')

def my_pipeline(context):
    pipe = Pipeline()
    # 9 filters:
    #     1. common stock
    #     2 & 3. not limited partnership - name and database check
    #     4. database has fundamental data
    #     5. not over the counter
    #     6. not when issued
    #     7. not depositary receipts
    #     8. primary share
    #     9. high dollar volume
    # Check Scott's notebook for more details.
    common_stock = mstar.share_class_reference.security_type.latest.eq('ST00000001')
    not_lp_name = ~mstar.company_reference.standard_name.latest.matches(r'.* L[. ]?P\.?$')
    not_lp_balance_sheet = mstar.balance_sheet.limited_partnership.latest.isnull()
    have_data = mstar.valuation.market_cap.latest.notnull()
    not_otc = ~mstar.share_class_reference.exchange_id.latest.startswith('OTC')
    not_wi = ~mstar.share_class_reference.symbol.latest.endswith('.WI')
    not_depository = ~mstar.share_class_reference.is_depositary_receipt.latest
    primary_share = IsPrimaryShare()
    # Combine the above filters.
    tradable_filter = (common_stock & not_lp_name & not_lp_balance_sheet &
                       have_data & not_otc & not_wi & not_depository & primary_share)
    high_volume_tradable = (AverageDollarVolume(window_length=21,
                                                mask=tradable_filter).percentile_between(70, 100))
    # The example algorithm - mean reversion. Note the tradable filter used as a mask.
    sma_10 = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=10,
                                 mask=high_volume_tradable)
    sma_30 = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=30,
                                 mask=high_volume_tradable)
    rel_diff = (sma_10 - sma_30) / sma_30
    top_rel_diff = rel_diff.top(context.num_short)
    pipe.add(top_rel_diff, 'top_rel_diff')
    bottom_rel_diff = rel_diff.bottom(context.num_long)
    pipe.add(bottom_rel_diff, 'bottom_rel_diff')
    return pipe

# Get the pipeline output and specify which equities to trade.
def before_trading_start(context, data):
    context.output = pipeline_output('my_pipeline')
    context.short_set = set(context.output[context.output['top_rel_diff']].index)
    context.long_set = set(context.output[context.output['bottom_rel_diff']].index)
    context.security_set = context.long_set.union(context.short_set)

# Rebalance weekly.
def my_rebalance(context, data):
    for stock in context.security_set:
        if data.can_trade(stock):
            if stock in context.long_set:
                order_target_percent(stock, 1. / context.num_securities)
            elif stock in context.short_set:
                order_target_percent(stock, -1. / context.num_securities)
    for stock in context.portfolio.positions:
        if stock not in context.security_set and data.can_trade(stock):
            order_target_percent(stock, 0)

# Record variables.
def my_record_vars(context, data):
    shorts = longs = 0
    for position in context.portfolio.positions.values():
        if position.amount < 0:
            shorts += 1
        elif position.amount > 0:
            longs += 1
    record(leverage=context.account.leverage, short_count=shorts, long_count=longs)
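
To make the ranking step of the algorithm above easier to follow outside the platform, here is a minimal plain-Python sketch of the same idea: compute the relative gap between a 10-day and a 30-day simple moving average, short the names with the largest gap, and go long the names with the smallest. The helper names and the price histories are hypothetical, and this ignores the universe filters and order handling entirely.

```python
def simple_moving_average(prices, window):
    """Mean of the last `window` prices."""
    return sum(prices[-window:]) / window

def pick_longs_and_shorts(price_history, num_long, num_short):
    """Rank by (SMA10 - SMA30) / SMA30; go long the bottom, short the top."""
    rel_diff = {}
    for symbol, prices in price_history.items():
        sma_10 = simple_moving_average(prices, 10)
        sma_30 = simple_moving_average(prices, 30)
        rel_diff[symbol] = (sma_10 - sma_30) / sma_30
    ranked = sorted(rel_diff, key=rel_diff.get)  # ascending by rel_diff
    longs = set(ranked[:num_long])
    shorts = set(ranked[max(0, len(ranked) - num_short):])
    return longs, shorts

# Hypothetical 30-day price histories, invented for this example.
history = {
    'UP': [100 + i for i in range(30)],    # recently risen, so a short candidate
    'DOWN': [129 - i for i in range(30)],  # recently fallen, so a long candidate
    'FLAT': [100] * 30,
}
longs, shorts = pick_longs_and_shorts(history, num_long=1, num_short=1)
print(longs, shorts)
```

The bet is that prices revert toward the longer average, which is why the stock trading above its 30-day average ends up on the short side.
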
We have migrated this algorithm to work with a new version of the Quantopian API. The code is different than the original version, but the investment rationale of the algorithm has not changed. We've put everything you need to know here on one page.


15 responses

Thanks Nathan,

There is some evidence that you may not have captured everything:

What was your methodology for establishing the list of exclusions? And is there a way to confirm that it is correct (e.g. maybe the research environment could be used to confirm that all of the undesirables have been removed)? There's probably a set of checks that can be run, to confirm that the universe has the desirable characteristics, and that nothing has been missed (e.g. due to coding errors by Morningstar). Do you have an alternate set of data that would allow you to check the Morningstar codings? There's no reason to think that their data are perfect, right?

Also, I'd asked Scott about the possibility of including a filter for stocks that have known bad data. Any thoughts on the feasibility of including such a list in your filter?

In the end, it'd be great to have a canonical base universe of equities, vetted by Quantopian, with clean data, to use for long-short equity strategy development. I gather that is the end goal here, but I can't tell if we are there yet.

You might also consider excluding recent IPOs. As your filter stands now, stocks could be admitted 1 day after their first trade after IPO (although maybe your high_volume_tradable implicitly handles this case?). If you go down this path, note that there are some bad data; there should be an associated internal ticket, as communicated to me privately (check with Jamie). I'd think that most long-short equity algos, unless explicitly dealing in IPOs, would want to let the dust settle a bit before including them in a portfolio.
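
A seasoning rule of this kind could look roughly like the sketch below. It is plain Python, not Pipeline code; the function name, the 90-day threshold, and the first-trade dates are all assumptions chosen for illustration.

```python
from datetime import date

def seasoned(first_trade_dates, today, min_days=90):
    """Return symbols whose first trade is at least `min_days` calendar days old."""
    return {
        symbol
        for symbol, first_trade in first_trade_dates.items()
        if (today - first_trade).days >= min_days
    }

# Hypothetical first-trade dates, invented for this example.
first_trades = {
    'OLDCO': date(2010, 1, 4),
    'NEWCO': date(2016, 3, 1),
}
eligible = seasoned(first_trades, today=date(2016, 3, 15))
print(eligible)
```

Here NEWCO is held out because it has only traded for two weeks, while OLDCO passes.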

Great, thank you.

This should be in the "Interesting" tab.

Very interesting. I think there should be a simple way, built into Quantopian, to limit the securities to the active and tradable ones. It seems silly for everyone to have to use their own set of filters just to get a good trading universe. Right now there is a very steep learning curve, in my opinion, which could be lowered considerably by more predefined Quantopian functionality like this. Those who wish to use a broader trading universe can then define their own.

By the way, is there a simple way to limit the securities to an index, like the S&P 500 or the Danish OMXC20, which is the one whose companies I follow most closely?

Hi Nathan,

Any response to my feedback above? Any thoughts on how to filter out stocks with bad data (e.g. I think Q maintains an internal list of problem children)?

Can set_do_not_order_list be used within pipeline? And would set_do_not_order_list(security_lists.leveraged_etf_list) even be necessary with your framework above?

Just wondering if you are done, and have established an authoritative list of exclusions, for writing a typical equity long-short algo? To Allan's point, could it be "released" in some fashion (e.g. put into a github repository)?

Also, I'm curious...what if one just wanted ETFs or ETF-like thingys? It would seem if you can exclude them there should also be a way to include them, and exclude everything else, no?


I'm also desperately looking for a way to include only ETFs in a Pipeline screen unless there's a way to leverage Factors outside of pipeline that I've missed. Backtests are timing out if the only screen I have on is "dollar_volume > 10**7". I know how to select the ETFs I want AFTER the pipeline returns but therein lies the problem. I imagine that performance would be exponentially better if the pipeline is never run on the unwanted stocks in the first place.

As I was looking at this problem today, I found a solution for me that seems to speed things up:

screen = (dollar_volume > 10**7) & morningstar.share_class_reference.symbol.latest.isnull()

Sorry, I am so new to coding and finance. How would I say something like

ev_ebitda = mstar.fundamentals.valuation_ratios.ev_to_ebitda < 14

I have been trying to figure out fundamentals in pipeline, but I get all sorts of errors like:

AttributeError: Neither 'UserBinaryExpression' object nor 'Comparator' object has an attribute 'ndim'

AttributeError: Neither 'UserInstrumentedAttribute' object nor 'Comparator' object associated with UserInstrumentedAttribute has an attribute 'window_safe'

From what I understand of the above code, I don't need to build CustomFactor inherited instances, right?

Thank you,

Try using ".latest" to build comparisons using the latest value:

ev_ebitda = mstar.fundamentals.valuation_ratios.ev_to_ebitda.latest < 14  

I believe that should fix your issue.

Thank you! It worked; I just had to add the .latest and take out the .fundamentals! Hahaha,

Thanks a lot!

Hi Nathan,

I was specifically wondering - as far as the pipeline goes - is there a way to filter on beta? So far I've found methods to mostly calculate beta on a selected universe (example below). Also, wouldn't it just be inefficient to apply a vectorized calculation of Beta to a large group like that? But say that I wanted to use this as criteria for screening?

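import numpy as np
import pandas as pd
from quantopian.pipeline import CustomFactor
from quantopian.pipeline.data.builtin import USEquityPricing

def _beta(ts, benchmark, benchmark_var):
    # np.cov uses the sample (n - 1) normalizer by default.
    return np.cov(ts, benchmark)[0, 1] / benchmark_var

class Beta(CustomFactor):
    inputs = [USEquityPricing.close]
    window_length = 60
    def compute(self, today, assets, out, close):
        returns = pd.DataFrame(close, columns=assets).pct_change()[1:]
        spy_returns = returns[sid(8554)]
        # ddof=1 so the variance uses the same normalizer as np.cov.
        spy_returns_var = np.var(spy_returns, ddof=1)
        out[:] = returns.apply(_beta, args=(spy_returns, spy_returns_var,))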
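
For readers who want to see the arithmetic on its own, here is a minimal plain-Python sketch of beta as covariance with the benchmark divided by the benchmark's variance, followed by a simple screen on the result. The return series and the screening bounds are hypothetical. One thing worth knowing when doing this with NumPy: `np.cov` divides by n - 1 by default while `np.var` divides by n, so mixing them without matching `ddof` scales beta by n / (n - 1).

```python
def mean(xs):
    return sum(xs) / len(xs)

def beta(asset_returns, benchmark_returns):
    """Covariance with the benchmark divided by the benchmark's variance."""
    ma, mb = mean(asset_returns), mean(benchmark_returns)
    cov = sum((a - ma) * (b - mb)
              for a, b in zip(asset_returns, benchmark_returns))
    var = sum((b - mb) ** 2 for b in benchmark_returns)
    return cov / var  # the shared (n - 1) normalizers cancel

# Hypothetical daily returns, invented for this example.
benchmark = [0.01, -0.02, 0.015, 0.005]
returns = {
    'DOUBLE': [2.0 * r for r in benchmark],  # moves twice as much as the benchmark
    'HALF': [0.5 * r for r in benchmark],    # moves half as much
}

# Screen: keep only names with beta between 0 and 1.
low_beta = {s for s, r in returns.items() if 0.0 <= beta(r, benchmark) <= 1.0}
print(low_beta)
```

This only shows the calculation and the screen; it says nothing about how efficiently the same thing runs across a full universe inside a pipeline.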

For those reading this thread: check out the built-in filters Q500US() and default_us_equity_universe_mask().

thank you for this

A question for Grant and/or anyone else who may know.

How can I exclude from consideration:
a) Just the Tobacco stocks (for example if I want an "ethical" portfolio) even if I don't know all their symbols?
b) Just a short list of very specific known stocks that I consider to be "inappropriate" for whatever reason?

thanks in advance for your help.
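
Both kinds of exclusion come down to the same pattern: drop anything whose classification is on a banned list, and drop anything whose symbol is on a banned list. A minimal plain-Python sketch of that pattern follows. The `industry` field, its labels, and the symbols are hypothetical; on the platform the industry test would presumably rely on a classification from the fundamental data, which is what lets you exclude a whole group without knowing every symbol.

```python
def exclude(universe, banned_industries=(), banned_symbols=()):
    """Drop records in a banned industry or with a banned symbol."""
    banned_industries = set(banned_industries)
    banned_symbols = set(banned_symbols)
    return [
        sec for sec in universe
        if sec['industry'] not in banned_industries
        and sec['symbol'] not in banned_symbols
    ]

# Hypothetical records, invented for this example.
universe = [
    {'symbol': 'AAA', 'industry': 'Tobacco'},
    {'symbol': 'BBB', 'industry': 'Software'},
    {'symbol': 'CCC', 'industry': 'Banks'},
]
kept = exclude(universe, banned_industries=['Tobacco'], banned_symbols=['CCC'])
print([sec['symbol'] for sec in kept])
```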