What is with these bounds/BoundColumn errors?

I've seen like 5 very different answers where to 'research' these exceptions being thrown--can someone just answer the question and tell me what I'm doing wrong with something seemingly REMEDIAL at best. Best part of trading quant futures -- pipelines serve no purpose.
I've tried commenting things out that I would like to include, and it runs -- but ideally would like to include these filters in the mask/or as columns so I can select them in pd DF. Either would be nice.

def make_pipeline():

    # Base universe set to the QTradableStocksUS

    #prices = data.history(base_universe,'price',20,'1h')

    # Factor of yesterday's close price.
    yesterday_close = USEquityPricing.close.latest

    #close = USEquityPricing.close
    sma14 = SimpleMovingAverage(inputs=[USEquityPricing.close],window_length=14)
    ubb = BollingerBands([USEquityPricing.close],window_length=14,k=1.0)
    lbb = BollingerBands([USEquityPricing.close],window_length=14,k=-1.0)

    top_close_price_filter = yesterday_close.top(200)
    lbo = yesterday_close > ubb
    sbo = yesterday_close < lbb

    #dollar_volume = AverageDollarVolume(window_length=30)
    #high_dollar_volume = dollar_volume.percentile_between(90, 100)

    #Fundamental Factors

    #fcf_so = fcf/so WHY WOULD THIS WORK , it's fucking division after all.

    #Final Fundamentals by quartile (Also try .top(100) / bottom(100)
    #top = fcf & ev_eb #.quartile(4) #Why cant .top or percentile work?
    #btm = fcf.percentile_between(0,25) & ev_eb.percentile_between(0,25)

    #Combine some screens -- High volume, FUNDAMENTAL FILTERS and technicals.

    #L_or_S = top | btm

    #is_tradeable = high_dollar_volume & base_universe & L_or_S #Of course not, that would be too logical.

    pipe = Pipeline(
        columns={
            'close': yesterday_close,
            'sma14':sma14,
            'upper_bb':ubb,
            'lower_bb':lbb,
            #'fcf':fcf,
            #'ev_eb':ev_eb
        },
        screen=base_universe #& fcf #.top(100)# & ev_eb.top(100)
    )
    return pipe

4 responses

These are actually good questions. We all have run up against pipeline errors and issues like this at times, and yes, it can be frustrating. I'll try to address each issue in order and maybe 'de-mystify' some of what's going on.

#prices = data.history(base_universe,'price',20,'1h')



History pricing requests don't really work inside a pipeline definition. Remember what we're doing is defining the pipeline at this stage. The make_pipeline function typically only runs once during the initialize phase of an algo. It just sets up the rows and columns which are to be returned when actually running the pipeline. The data.history method is a run time call to actually fetch data. Only use filter, factor, and classifier code in the pipeline definition since it only gets executed once.
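
To make the define-once / run-later split concrete, here is a minimal sketch in plain Python. It does not use the real Pipeline API: LazyColumn, make_toy_pipeline and run_toy_pipeline are hypothetical stand-ins, and the prices are made up. The point is only that the "make" step builds a recipe and reads no data, while the "run" step is where data is actually read.

```python
# Toy stand-in for a pipeline term: it records *how* to compute a value
# but holds no data until it is evaluated. Not the real Pipeline API.
class LazyColumn:
    def __init__(self, name, compute):
        self.name = name
        self.compute = compute  # callable applied to price data at run time


def make_toy_pipeline():
    # Runs once: only builds the recipe, touches no prices.
    return [
        LazyColumn("close", lambda prices: prices[-1]),
        LazyColumn("sma3", lambda prices: sum(prices[-3:]) / 3),
    ]


def run_toy_pipeline(columns, prices_by_asset):
    # Runs every day: this is where the data is actually read.
    return {
        asset: {col.name: col.compute(prices) for col in columns}
        for asset, prices in prices_by_asset.items()
    }


toy_pipeline = make_toy_pipeline()  # defined once
day_one = run_toy_pipeline(toy_pipeline, {"AAA": [10.0, 11.0, 12.0]})
day_two = run_toy_pipeline(toy_pipeline, {"AAA": [11.0, 12.0, 16.0]})
print(day_one)
print(day_two)
```

The same definition produces different results on each run because the data is only supplied at run time. That is why a run-time call like data.history has no place in the definition step.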

The BollingerBands factors weren't commented out as a problem; however, they cause problems later on.

ubb = BollingerBands([USEquityPricing.close],window_length=14,k=1.0)
lbb = BollingerBands([USEquityPricing.close],window_length=14,k=-1.0)



The correct code should be

bb = BollingerBands([USEquityPricing.close],window_length=14,k=1.0)
ubb =  bb.upper
lbb = bb.lower



The BollingerBands factor is one of a handful of built-in factors which return multiple outputs (i.e. lower, middle, upper). One needs to use dot notation to specify which output one wants. Note that k is the width of the bands in standard deviations, so a single factor with k=1.0 provides both bands; there is no need for a second factor with a negative k. Check out this forum post for more detail https://www.quantopian.com/posts/what-does-the-class-bollingerbands-return . Specifically the last response.
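
As a rough illustration of why one factor yields three outputs, here is a plain-Python sketch of the usual Bollinger Bands calculation (moving average plus or minus k standard deviations). The function name, the made-up closes, and the choice of population standard deviation are assumptions for the example, not a statement of how the built-in factor is implemented.

```python
from statistics import mean, pstdev


def bollinger_bands(closes, window_length=14, k=1.0):
    """Return lower, middle and upper bands for the last `window_length` closes."""
    window = closes[-window_length:]
    middle = mean(window)
    width = k * pstdev(window)  # assumption: population standard deviation
    return {"lower": middle - width, "middle": middle, "upper": middle + width}


closes = [float(p) for p in range(1, 15)]  # 14 made-up closing prices: 1..14
bands = bollinger_bands(closes, window_length=14, k=1.0)

upper_band = bands["upper"]  # analogous to bb.upper
lower_band = bands["lower"]  # analogous to bb.lower
print(bands)
```

All three values come out of one computation, which is why the outputs are picked with dot notation instead of building separate factors per band.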

On to the pe_ratio factors

#top_pe_stocks = pe_ratio.top(100, mask=universe)



Those lines are correct; 'pe_ratio' just needed to be defined. I picked Fundamentals.basic_eps_earnings_reports, but Morningstar actually offers several versions of EPS (check the data reference).

pe_ratio = Fundamentals.basic_eps_earnings_reports.latest



Now the lines

#dollar_volume = AverageDollarVolume(window_length=30)
#high_dollar_volume = dollar_volume.percentile_between(90, 100)



These lines are also correct. Just make sure to import the factor AverageDollarVolume before using it. Something like this

from quantopian.pipeline.factors import AverageDollarVolume



The following lines begin creating issues even though they weren't commented out

fcf = Fundamentals.fcf_yield #(mask=base_universe)



These should be

fcf = Fundamentals.fcf_yield.latest
so = Fundamentals.shares_outstanding.latest
ev_eb = Fundamentals.ev_to_ebitda.latest



Notice the latest attribute. That is what creates the actual factor. It may help to break down the statement Fundamentals.fcf_yield.latest. Fundamentals references the Morningstar dataset, which contains many different fundamental fields. The fcf_yield attribute defines a single field or column of that data (technically a BoundColumn). However, it's a whole column containing multiple dates of data for each asset. Since factors, by definition, represent a single value for each asset, one needs to specify which value from that column of fcf_yield to return. Most of the time one simply wants the most current value, so use latest. Note that latest doesn't provide for a mask. If one really wants to use a mask, then use the Latest class to create a factor, e.g. Latest(inputs=[Fundamentals.fcf_yield], mask=high_dollar_volume), but this isn't typical.

One clarification to the above paragraph. The most important thing to understand about DataSets and BoundColumns and Factors is they do not hold actual data. They are simply collections of objects that tell the Pipeline API where and how to find the inputs to computations. Remember we are simply defining the pipeline at this stage and not fetching actual data. However, it's common to speak about these objects as if they hold the data (as in the description in the previous paragraph) but do remember that's not exactly correct.
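
A toy picture of the column-versus-factor distinction, in plain Python with made-up numbers. The dict below is only a mental model (as noted above, the real objects hold no data), and latest here is a hypothetical helper, not the Pipeline attribute.

```python
# Made-up history: one "column" of fcf_yield with several dated values per asset.
fcf_yield_column = {
    "AAA": [("2018-01-02", 0.031), ("2018-01-03", 0.034), ("2018-01-04", 0.036)],
    "BBB": [("2018-01-02", 0.012), ("2018-01-03", 0.011), ("2018-01-04", 0.015)],
}


def latest(column):
    """Reduce a column (many dates per asset) to one value per asset."""
    # ISO date strings sort chronologically, so max() picks the newest row.
    return {asset: max(rows)[1] for asset, rows in column.items()}


fcf = latest(fcf_yield_column)
print(fcf)  # one number per asset, which is what a factor represents
```

Arithmetic and ranking make sense on fcf (one number per asset) but not on fcf_yield_column (a list of dated rows per asset), which mirrors why the BoundColumn fails where the factor works.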

Continuing on to the next issue

#fcf_so = fcf/so WHY WOULDN'T THIS WORK , it's division after all.
#top = fcf & ev_eb #.quartile(4) #Why cant .top or percentile work?
#btm = fcf.percentile_between(0,25) & ev_eb.percentile_between(0,25)



These lines created errors since 'fcf', 'so', and 'ev_eb' weren't defined as factors (i.e. the above issue). Division works with factors; it doesn't work with BoundColumns. Fixing the above issues fixes these, like this:

# Below will be a factor (i.e. a number associated with each asset)
fcf_so = fcf/so

# These will be filters (i.e. a boolean value associated with each asset to be used in masks and screens)
# Note: factors have no quartile(4) method; percentile_between(75, 100) selects the top quartile
top = fcf.top(100) & ev_eb.percentile_between(75, 100)
btm = fcf.percentile_between(0,25) & ev_eb.percentile_between(0,25)
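
The factor/filter distinction can be sketched without the platform. In this plain-Python toy, a "factor" is a dict of one number per asset and a "filter" is a dict of one boolean per asset. The tickers, values, and the top helper are all made up for illustration and only mimic the idea, not the real API.

```python
# Toy factors: one number per asset (all values are made up).
fcf = {"AAA": 0.08, "BBB": 0.02, "CCC": 0.05, "DDD": 0.01}
so = {"AAA": 4.0, "BBB": 2.0, "CCC": 5.0, "DDD": 1.0}
ev_eb = {"AAA": 12.0, "BBB": 6.0, "CCC": 9.0, "DDD": 20.0}

# Arithmetic on factors gives another factor (still one number per asset).
fcf_so = {asset: fcf[asset] / so[asset] for asset in fcf}


def top(factor, n):
    """Toy filter: True for the n assets with the highest factor value."""
    ranked = sorted(factor, key=factor.get, reverse=True)[:n]
    return {asset: asset in ranked for asset in factor}


top_fcf = top(fcf, 2)      # filter: one boolean per asset
top_ev_eb = top(ev_eb, 2)  # filter: one boolean per asset

# Filters combine with boolean logic, like & and | in a pipeline.
both = {a: top_fcf[a] and top_ev_eb[a] for a in fcf}
either = {a: top_fcf[a] or top_ev_eb[a] for a in fcf}

selected = [asset for asset, keep in either.items() if keep]
print([a for a, keep in both.items() if keep])
print(selected)
```

Combining with "and" keeps only assets that pass every filter, so it narrows the list quickly; "or" keeps anything that passes at least one.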



Finally the statements

#L_or_S = top | btm
#is_tradeable = high_dollar_volume & base_universe & L_or_S #Of course not, that would be too logical.



These are also correct. They just didn't work because 'top' and 'btm' weren't defined correctly. This will work.

L_or_S = top | btm
is_tradeable = high_dollar_volume & base_universe & L_or_S


Summing this all up, the main error which propagated throughout was the missing latest attribute. It's not entirely obvious because Python doesn't flag it as an error. Only when one begins using what is expected to be a factor (when it's actually a BoundColumn object) does it create problems.

Hope it helps to step through the notebook like this. Troubleshooting pipeline definitions is a bit of an art but isn't difficult with practice.

I've attached a notebook with these changes.

Good luck.

Wow, that is quite a full answer --thank you so much!
I'm still in disbelief it was all because of dot notations and a missing import, latest and upper/lower;
I looked for documentation of BB's and it didn't really have anything aside from the inputs, (this was Quantopian.com/help or something: https://www.quantopian.com/help#pipeline-title) -- is there a more detailed documentation of technical and fundamentals functions/vars/factors?

Thanks again for answering everything, but also fixing it! Much appreciated.
Pipeline remains my nemesis.

Zach

Sorry, I have a final question -- I got everything working/need to ease up on the filters, maybe add some |'s in place of &'s--but I'm curious where I specify things like the data (being hourly, vs minute) and also where the various globals go--like slippage, commission, etc?

I assumed it was make_pipeline, or do I throw them in initialize, or above any functions?

Zach

@Zach Glad this helped. As far as these last questions...

where do I specify things like the data (being hourly, vs minute)
There are two basic mechanisms for getting data into an algo: 1) pipeline and 2) the various 'data' methods such as data.history. Originally, the only way to access data was with the data methods (and the now deprecated get_fundamentals method). The pipeline mechanism was added primarily for speed but also to handle the more complex data types beyond simple OHLCV data. Take a look at this post for some background on pipeline (https://www.quantopian.com/help#sample-earnings-risk).

If one only wants daily data and not any current intra-day data, then use pipeline exclusively. No need for the various data methods. By definition the data is daily, so one doesn't specify 'minute' or 'hourly'. It's a three step process:

• Create a pipeline definition. This is typically done in a function but can also be done 'inline' with the code. A pipeline ultimately returns a pandas dataframe. This defines the columns (ie the factors) and the rows (ie the securities) in that dataframe.
• Instantiate and attach the pipeline to the algo. This needs to be done exactly once so place it in the initialize method typically like this. Note that multiple pipelines can be attached to an algo if one is so inclined.
    algo.attach_pipeline(make_pipeline(), 'my_pipeline')


• Run the pipeline to fetch the results. This is typically done in the before_trading_start method. The result is a pandas dataframe which has the defined data. It's best to put this in the before_trading_start method because it gets allocated a longer run time. If placed in a scheduled function or handle_data it will only get allocated a minute and will generate an error if it runs longer than that. Something like this
    pipe_results = algo.pipeline_output('my_pipeline')



If one wants minute level or current intra-day OHLCV data then use the data methods. The two most common are data.current and data.history which return the current data and a series of data respectively. One can specify either minute data or daily data. Look at the docs for the options (https://www.quantopian.com/help#api-data-methods). Typical code would look like this and be placed in a scheduled function or in handle_data

      stock = symbol('IBM')
      current_price = data.current(stock, 'price')
      last_30_minutes_of_prices = data.history(stock, 'price', bar_count=30, frequency="1m")



where do the various globals go--like slippage, commission, etc? I assumed it was make_pipeline, or do I throw them in initialize, or above any functions?
Place anything which needs to be executed just once in the initialize method (like setting slippage, commissions, or initializing any variables). If you choose to place all the pipeline definition logic in a function then don't place any other logic there. Additionally, if you need to use 'global' variables it's best practice to put those in the initialize method as attributes of the context object like this

      context.MAX_STOCKS = 25
      context.LEVERAGE = 1.0
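
To show how settings stored on context get used later, here is a small self-contained sketch. SimpleNamespace stands in for the context object the platform passes in, and target_weight is a hypothetical helper invented for the example; only initialize and the two attribute names come from the thread.

```python
from types import SimpleNamespace


def initialize(context):
    # Run-once setup: constants live on the context object, not as module globals.
    context.MAX_STOCKS = 25
    context.LEVERAGE = 1.0


def target_weight(context, n_candidates):
    # Hypothetical helper: equal weight across at most MAX_STOCKS names.
    n_held = min(n_candidates, context.MAX_STOCKS)
    return context.LEVERAGE / n_held if n_held else 0.0


context = SimpleNamespace()  # stand-in for the object the platform supplies
initialize(context)          # called once at startup
weight = target_weight(context, n_candidates=40)
print(weight)
```

Because every function receives the same context object, anything set in initialize is available later without resorting to module-level globals.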



Good luck.