Pipeline can take a while to wrap one's head around at first. The first thing to understand is that you are defining the columns of a dataframe. The columns you define are the data your algo needs for its decision making. It's no more than getting all the data in one place. By using filters, one can (and often does) implement some logic within the pipeline definition. This is ok, but first and foremost get the data. There is a forum post here which may provide some better background on pipeline.
So, data. The first question then is what data do you need. Two pieces of data have been alluded to - volume and dollar-volume. No problem, the pipeline definitions for both are very similar. Oh, almost forgot to mention: it is VERY strongly recommended to create pipeline definitions in the notebook environment. It's faster and, more importantly, the output can be easily visualized. Pipeline definitions, along with custom factors, can be copied from a notebook into the IDE and will work the same.
The 10-day SMA and latest values for volume are pretty easy:
# The datasets, the geo domain, and the built-in factor we want to use
from quantopian.pipeline.data import EquityPricing
from quantopian.pipeline.domain import US_EQUITIES
from quantopian.pipeline.factors import SimpleMovingAverage
# Use the SimpleMovingAverage factor to calculate the 10 day SMA. It will calculate for all securities
last_10_days_each_sma = SimpleMovingAverage(inputs=[EquityPricing.volume], window_length=10)
# Use the latest attribute to get just the most recent value (a single day's volume, not an SMA, despite the variable name)
yesterday_each_sma = EquityPricing.volume.latest
The one confounding detail is that we now want these values just for SPY. Typically, a factor outputs a value for every security. It's kind of like a series indexed by security. When one performs math on factors, the values are matched up security by security. To pass the value of SPY to all securities, one must first 'slice' the factor to obtain just the SPY value. Then, one must 'broadcast' that value to all securities. There's more detail in the docs here. The code uses bracket notation like this:
last_10_days_spy_sma = last_10_days_each_sma[symbols('SPY')]
yesterday_spy_sma = yesterday_each_sma[symbols('SPY')]
Now that we have our basic factors, we can create some calculated factors based on these.
# We can now find the ratio. For technical reasons the first operand must be a factor and not a slice
ratio_10_days = last_10_days_each_sma / last_10_days_spy_sma
ratio_1_days = yesterday_each_sma / yesterday_spy_sma
Generally, slices and full factors can be used together, because factors internally broadcast the slice data to all assets. However, slices are sort of second-class citizens. In this case, a factor knows what to do with a slice, but a slice doesn't know what a factor is. The order is important: put the factor first.
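If the slice-and-broadcast idea is still fuzzy, here is a rough plain-Python sketch of the arithmetic. It does not use the Quantopian API at all; the dicts, the helper function, and the volume numbers are all made up for illustration. A dict of per-security values stands in for a factor, and a single number stands in for the SPY slice.

```python
# Plain-Python stand-in for slice-and-broadcast. No Quantopian imports.
# All names and volume figures here are hypothetical, for illustration only.

def simple_moving_average(values, window_length):
    """Return the mean of the last `window_length` values."""
    window = values[-window_length:]
    return sum(window) / len(window)

# Daily volume history per security (made-up numbers, 4 days each).
daily_volume = {
    "SPY": [100.0, 120.0, 80.0, 100.0],
    "AAPL": [50.0, 60.0, 40.0, 50.0],
    "XYZ": [10.0, 10.0, 10.0, 10.0],
}

# The "factor": one value for every security.
sma_each = {
    symbol: simple_moving_average(history, window_length=4)
    for symbol, history in daily_volume.items()
}

# The "slice": the single value belonging to SPY.
sma_spy = sma_each["SPY"]

# The "broadcast": the one SPY value is reused for every security.
ratio = {symbol: value / sma_spy for symbol, value in sma_each.items()}

for symbol, value in ratio.items():
    print(symbol, value)
```

In the real pipeline the factor does this matching for you, once per day, across every security in the universe.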
A pipeline can be set up using the dollar-volume in much the same way. Check out the attached notebook.
My personal opinion is that dollar-volume is the way to go. Investors invest money, not shares. By the way, both dollar-volume and volume are adjusted for splits, so no need to worry there. Actually, my advice would be to not use SPY volume but rather the total dollar-volume of all shares in your universe. If that's something you want to pursue, it will require a small custom factor.
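To make the 'total dollar-volume of the universe' idea concrete, here is a rough plain-Python sketch of the calculation for a single day. This is only the arithmetic, not the custom factor itself: it does not use the Quantopian API, and the function name, tickers, prices, and volumes are all made up.

```python
# Plain-Python sketch of each security's share of total universe dollar-volume.
# No Quantopian imports. Names and numbers are hypothetical.

def dollar_volume_share(close, volume):
    """Return each symbol's dollar-volume divided by the universe total."""
    dollar_volume = {symbol: close[symbol] * volume[symbol] for symbol in close}
    total = sum(dollar_volume.values())
    return {symbol: value / total for symbol, value in dollar_volume.items()}

# One day of made-up closing prices and share volumes.
close = {"AAA": 10.0, "BBB": 20.0, "CCC": 5.0}
volume = {"AAA": 100.0, "BBB": 50.0, "CCC": 400.0}

share = dollar_volume_share(close, volume)

for symbol, value in share.items():
    print(symbol, value)
```

Note that BBB trades half as many shares as AAA but has the same dollar-volume, which is exactly why dollar-volume is the better yardstick.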