These are actually good questions. We have all run up against pipeline errors and issues like this at times, and yes, it can be frustrating. I'll try to address each issue in order and maybe 'de-mystify' some of what's going on.
Start with the line
#prices = data.history(base_universe,'price',20,'1h')
History pricing requests don't really work inside a pipeline definition. Remember, what we're doing at this stage is defining the pipeline. The make_pipeline function typically runs only once, during the initialize phase of an algo. It just sets up the rows and columns to be returned when the pipeline actually runs. The data.history method is a run-time call that actually fetches data. Only use filter, factor, and classifier code in the pipeline definition, since that definition only gets executed once.
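To make the define-once, run-later distinction concrete, here is a toy sketch in plain Python. It does not use the Quantopian API: the Term class, the column and run_pipeline helpers, and the sample data are all hypothetical. Building the expression only records a recipe; data is touched only when the pipeline runs, and the same definition can be evaluated against new data each day.

```python
class Term:
    """Hypothetical stand-in for a pipeline term: it stores a recipe, not data."""

    def __init__(self, compute):
        # compute: function taking run-time data and returning {asset: value}
        self.evaluate = compute

    def __truediv__(self, other):
        # Dividing two terms only builds a new recipe; nothing is evaluated yet.
        def compute(data):
            num, den = self.evaluate(data), other.evaluate(data)
            return {asset: num[asset] / den[asset] for asset in num}
        return Term(compute)


def column(name):
    """Hypothetical helper: a term that reads one named field at run time."""
    return Term(lambda data: dict(data[name]))


def run_pipeline(columns, data):
    """Evaluate every defined term against the data supplied at run time."""
    return {label: term.evaluate(data) for label, term in columns.items()}


# Definition phase (like make_pipeline): runs once and fetches nothing.
pipe = {"ratio": column("fcf") / column("shares")}

# Run phase: new (hypothetical) data each time, same definition.
day1 = {"fcf": {"A": 10.0, "B": 8.0}, "shares": {"A": 5.0, "B": 4.0}}
day2 = {"fcf": {"A": 9.0, "B": 6.0}, "shares": {"A": 3.0, "B": 4.0}}
print(run_pipeline(pipe, day1))
print(run_pipeline(pipe, day2))
```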
The BollingerBands factors weren't commented out as a problem; however, they cause problems later on.
ubb = BollingerBands([USEquityPricing.close],window_length=14,k=1.0)
lbb = BollingerBands([USEquityPricing.close],window_length=14,k=-1.0)
The correct code should be
bb = BollingerBands([USEquityPricing.close],window_length=14,k=1.0)
ubb = bb.upper
lbb = bb.lower
The BollingerBands factor is one of a handful of built-in factors that return multiple outputs (i.e., lower, middle, upper). Use dot notation to specify which output you want. Check out this forum post for more detail: https://www.quantopian.com/posts/what-does-the-class-bollingerbands-return . Specifically, the last response.
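For intuition about why one factor yields three outputs, here is a plain-Python sketch of the band arithmetic for a single asset. It assumes the bands are the rolling mean plus and minus k population standard deviations; the Bands namedtuple and bollinger function are hypothetical, not part of the Quantopian API. Note that k stays positive: the lower band comes out of the same calculation rather than from a second factor with a negative k.

```python
from collections import namedtuple
from statistics import mean, pstdev

# Hypothetical container mirroring the .lower / .middle / .upper outputs.
Bands = namedtuple("Bands", ["lower", "middle", "upper"])


def bollinger(closes, window_length=14, k=1.0):
    """Bands over the last window_length closes for one asset (sketch)."""
    window = closes[-window_length:]
    middle = mean(window)
    width = k * pstdev(window)  # population standard deviation (assumption)
    return Bands(lower=middle - width, middle=middle, upper=middle + width)


bb = bollinger([1.0, 2.0, 3.0, 4.0, 5.0], window_length=5, k=1.0)
ubb, lbb = bb.upper, bb.lower  # dot notation picks one output
print(round(lbb, 4), bb.middle, round(ubb, 4))
```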
On to the pe_ratio factors
#top_pe_stocks = pe_ratio.top(100, mask=universe)
#bottom_pe_stocks = pe_ratio.bottom(100, mask=universe)
Those lines are correct; 'pe_ratio' just needed to be defined. I picked Fundamentals.basic_eps_earnings_reports, but Morningstar actually offers several versions of EPS (check the data reference here).
pe_ratio = Fundamentals.basic_eps_earnings_reports.latest
top_pe_stocks = pe_ratio.top(100, mask=base_universe)
bottom_pe_stocks = pe_ratio.bottom(100, mask=base_universe)
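Conceptually, top(N, mask=...) ranks only the assets that pass the mask and keeps the N with the largest values, while bottom keeps the N smallest. The sketch below shows that behavior with plain dictionaries; top_n and the sample values are hypothetical, and tie-breaking and missing-data handling in the real API may differ.

```python
def top_n(values, n, mask=None, largest=True):
    """Return the set of assets with the n largest (or smallest) values.

    values: {asset: number}; mask: optional set of assets allowed to compete.
    """
    # The mask is applied first, so excluded assets never enter the ranking.
    candidates = {a: v for a, v in values.items() if mask is None or a in mask}
    ranked = sorted(candidates, key=candidates.get, reverse=largest)
    return set(ranked[:n])


eps = {"A": 3.2, "B": 1.1, "C": 5.0, "D": 4.4}  # hypothetical values
base_universe = {"A", "B", "D"}                 # C is excluded before ranking
print(sorted(top_n(eps, 2, mask=base_universe)))                 # top two in mask
print(sorted(top_n(eps, 1, mask=base_universe, largest=False)))  # bottom one in mask
```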
Now the lines
#dollar_volume = AverageDollarVolume(window_length=30)
#high_dollar_volume = dollar_volume.percentile_between(90, 100)
These lines are also correct. Just make sure to import the AverageDollarVolume factor before using it, something like this:
from quantopian.pipeline.factors import AverageDollarVolume
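As a rough model of what these two lines compute: AverageDollarVolume averages close price times volume over the window, and percentile_between(90, 100) keeps assets whose value falls between the 90th and 100th percentiles across assets. The sketch below assumes linear-interpolation percentiles (numpy's default rule) and uses hypothetical helper names and data; it is not the Quantopian implementation.

```python
import math
from statistics import mean


def average_dollar_volume(closes, volumes):
    """Mean of close * volume over the window for one asset."""
    return mean(c * v for c, v in zip(closes, volumes))


def percentile(values, p):
    """Linear-interpolation percentile (assumed to match numpy's default)."""
    s = sorted(values)
    pos = (len(s) - 1) * p / 100
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def percentile_between(values, min_pct, max_pct):
    """Set of assets whose value lies within the [min_pct, max_pct] percentiles."""
    low = percentile(values.values(), min_pct)
    high = percentile(values.values(), max_pct)
    return {a for a, v in values.items() if low <= v <= high}


# Ten hypothetical assets, 3-day window: asset Si trades i * 1000 shares at $10.
adv = {f"S{i}": average_dollar_volume([10.0] * 3, [i * 1000] * 3) for i in range(1, 11)}
high_dollar_volume = percentile_between(adv, 90, 100)
print(sorted(high_dollar_volume))
```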
The following lines begin creating issues even though they weren't commented out
fcf = Fundamentals.fcf_yield #(mask=base_universe)
so = Fundamentals.shares_outstanding #(mask=base_universe) #&high_dollar_volume)
ev_eb = Fundamentals.ev_to_ebitda #(mask=base_universe)#&high_dollar_volume)
These should be
fcf = Fundamentals.fcf_yield.latest
so = Fundamentals.shares_outstanding.latest
ev_eb = Fundamentals.ev_to_ebitda.latest
Note the added .latest attribute. That is what creates the actual factor. It may help to break down the statement.
Fundamentals references the Morningstar dataset, which contains many different fundamental fields. The fcf_yield attribute defines a single field, or column, of that data (technically a BoundColumn). However, it's a whole column containing multiple dates of data for each asset. Since a factor, by definition, represents a single value for each asset, one needs to specify which value from the fcf_yield column to return. Most of the time one simply wants the most current value, so use the latest attribute. Note that latest doesn't accept a mask. If one really wants to use a mask, use the Latest class to create a factor (Latest(inputs=[Fundamentals.fcf_yield], mask=high_dollar_volume)), but this isn't typical.
One clarification to the above paragraph. The most important thing to understand about DataSets, BoundColumns, and Factors is that they do not hold actual data. They are simply objects that tell the Pipeline API where and how to find the inputs to computations. Remember, we are only defining the pipeline at this stage, not fetching actual data. It's common to speak about these objects as if they hold the data (as in the previous paragraph), but do remember that's not exactly correct.
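With that caveat in mind, the "multi-date column" picture can still help. The sketch below models a column as a hypothetical list of daily snapshots and shows latest collapsing it to one value per asset by taking the most recent row. The data and the latest helper are hypothetical and only illustrate the idea; the real objects describe the computation rather than storing values.

```python
# Hypothetical fcf_yield history: one {asset: value} snapshot per date, oldest first.
fcf_yield_window = [
    ("2017-01-03", {"A": 0.061, "B": 0.048}),
    ("2017-01-04", {"A": 0.062, "B": 0.047}),
    ("2017-01-05", {"A": 0.060, "B": 0.049}),
]


def latest(window):
    """Collapse a multi-date window to one value per asset: the most recent row."""
    _, most_recent = window[-1]
    return dict(most_recent)


print(latest(fcf_yield_window))
```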
Continuing on to the next issue
#fcf_so = fcf/so WHY WOULDN'T THIS WORK , it's #@$% division after all.
#top = fcf & ev_eb #.quartile(4) #Why cant .top or percentile work?
#btm = fcf.percentile_between(0,25) & ev_eb.percentile_between(0,25)
These lines raised errors because 'fcf', 'so', and 'ev_eb' weren't defined as factors (i.e., the issue above). Division works with factors; it doesn't work with BoundColumns. Fixing the issue above fixes these lines, like this:
# Below will be a factor (i.e., a number associated with each asset)
fcf_so = fcf/so
# These will be filters (i.e., a boolean value associated with each asset, to be used in masks and screens)
top = fcf.top(100) & ev_eb.percentile_between(75, 100)  # top quartile of ev_eb
btm = fcf.percentile_between(0,25) & ev_eb.percentile_between(0,25)
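Here is a plain-Python picture of the difference between the two kinds of result: dividing two factors gives a number per asset, while a comparison or ranking gives a True/False per asset. The dictionaries and the 2.5 threshold are hypothetical, chosen only to illustrate the shapes of the outputs.

```python
# Hypothetical per-asset values standing in for fcf and so after .latest.
fcf = {"A": 12.0, "B": 6.0, "C": 9.0}
so = {"A": 4.0, "B": 3.0, "C": 3.0}

# Factor-style result: one number per asset.
fcf_so = {asset: fcf[asset] / so[asset] for asset in fcf}

# Filter-style result: one True/False per asset (here, "ratio above 2.5").
high_ratio = {asset: value > 2.5 for asset, value in fcf_so.items()}

print(fcf_so)
print(high_ratio)
```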
Finally, the statements
#L_or_S = top | btm
#is_tradeable = high_dollar_volume & base_universe & L_or_S #Of course not, that would be too logical.
These are also correct. They just didn't work because 'top' and 'btm' weren't defined correctly. This will work:
L_or_S = top | btm
is_tradeable = high_dollar_volume & base_universe & L_or_S
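Treating each filter as the set of assets that pass it, | behaves like a union and & like an intersection. The sketch below uses Python sets with hypothetical membership to show which assets would end up in is_tradeable.

```python
# Hypothetical filter results, expressed as the set of assets that pass each one.
top = {"A", "B"}
btm = {"E"}
high_dollar_volume = {"A", "C", "E"}
base_universe = {"A", "B", "C", "D", "E"}

L_or_S = top | btm  # union: passes either filter
is_tradeable = high_dollar_volume & base_universe & L_or_S  # intersection: passes all

print(sorted(L_or_S))
print(sorted(is_tradeable))
```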
To sum it all up, the single error that propagated throughout was the missing latest attribute. It's not entirely obvious because Python doesn't flag it as an error. Only when one begins using what is expected to be a factor (but is actually a BoundColumn object) does it cause problems.
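The reason Python stays quiet is that accessing an attribute is perfectly legal; an error only appears when the object is used in a way it doesn't support. The toy classes below are hypothetical stand-ins, not the real Quantopian classes, that reproduce the pattern: the column object has no division operator, while the factor returned by its latest property does.

```python
class FakeFactor:
    """Hypothetical factor stand-in that supports division."""

    def __init__(self, name):
        self.name = name

    def __truediv__(self, other):
        return FakeFactor(f"({self.name} / {other.name})")


class FakeBoundColumn:
    """Hypothetical column stand-in: no arithmetic, only a .latest property."""

    def __init__(self, name):
        self.name = name

    @property
    def latest(self):
        return FakeFactor(f"{self.name}.latest")


fcf_col = FakeBoundColumn("fcf_yield")  # no error: just creating an object
so_col = FakeBoundColumn("shares_outstanding")

try:
    fcf_col / so_col  # the problem surfaces only here, at first use
except TypeError as exc:
    print("TypeError:", exc)

ratio = fcf_col.latest / so_col.latest  # works: both sides are factors
print(ratio.name)
```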
I hope stepping through the notebook like this helps. Troubleshooting pipeline definitions is a bit of an art, but it isn't difficult with practice.
I've attached a notebook with these changes.