output of one batch transform as the input of another?

Maybe my DataFrame skills are rusty, or maybe I just don't understand what the output of a batch transform is, but I am having trouble figuring this out.

Use case is simple:

  • data has, say, three stocks
  • I want to calculate the historical 10-day volatility (using std(), for instance) of each stock
  • then I want to calculate the 5-day average of those volatilities

There must be an elegant way to do this that doesn't involve loops?


3 responses

Hi Simon,

I suppose you mean the 10-day rolling volatility?
I haven't run this but hopefully it will get the idea across:

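import numpy
import pandas as pd

# Infinite window: never clears out old events, and computes from the first event on.
@batch_transform(window_length=numpy.inf, compute_only_full=False)
def avg_volatility(datapanel):
    vol = pd.rolling_std(datapanel['price'], 10)  # 10-day rolling std per stock
    return pd.rolling_mean(vol, 5)  # 5-day rolling mean of that volatility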
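
For anyone who wants to try the rolling-std-then-rolling-mean idea outside of a batch_transform, here is a minimal standalone sketch. It assumes a recent pandas, where the rolling functions are methods (`.rolling(n).std()`) rather than `pd.rolling_std`, and it runs on synthetic prices for three made-up tickers. The function name `avg_rolling_volatility` and the ticker names are only illustrative.

```python
import numpy as np
import pandas as pd


def avg_rolling_volatility(prices, vol_window=10, avg_window=5):
    """Rolling std of each column, then the rolling mean of that std."""
    vol = prices.rolling(vol_window).std()   # 10-day volatility per stock
    return vol.rolling(avg_window).mean()    # 5-day average of the volatility


# Hypothetical data: 30 business days of random-walk prices for three stocks.
rng = np.random.default_rng(0)
dates = pd.date_range("2013-01-01", periods=30, freq="B")
prices = pd.DataFrame(
    100 + rng.normal(0, 1, size=(30, 3)).cumsum(axis=0),
    index=dates,
    columns=["AAA", "BBB", "CCC"],
)

result = avg_rolling_volatility(prices)
print(result.tail())
```

No loop over stocks is needed because the rolling methods work column by column. The first 13 rows come out as NaN: the 10-day std needs 10 prices, and the 5-day mean then needs 5 valid std values.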


Yeah I found the same thing, but if I am going to use the rolling functions, what is the point of having them in a batch_transform in the first place? Or is that just because I wouldn't normally have a datapanel...

"Or is that just because I wouldn't normally have a datapanel..."

Yes, exactly. The central predicament is that we cannot rely on an array of data being present. It was a central design choice of zipline to be streaming-based. This has many benefits: it is closer to reality, it gives out-of-the-box support for storing the historical data in a database and streaming it (as we do on Quantopian; it would require far too much RAM otherwise), and it makes it easy to jump to paper trading and eventually live trading. The downside is that the array-based style in which people normally process data doesn't work. Hence the batch_transform.

The name is actually not the best. It's really just an event accumulator. Once this PR is in, the batch_transform will also be much, much faster. Then things like TA-Lib integration start making more sense too.
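
To make the "event accumulator" idea concrete, here is a rough sketch of the pattern in plain Python. This is not zipline's implementation; `EventAccumulator`, its `handle_data` method, and the tickers are hypothetical names made up for illustration. It just collects streamed events and hands the accumulated frame to an array-style function on every call.

```python
from collections import deque

import pandas as pd


class EventAccumulator:
    """Hypothetical stand-in for the idea behind batch_transform."""

    def __init__(self, func, window_length=None):
        self.func = func
        # maxlen=None keeps every event, similar in spirit to an infinite window.
        self.dates = deque(maxlen=window_length)
        self.events = deque(maxlen=window_length)

    def handle_data(self, date, prices):
        # Store the new event, then run the array-based function on all of them.
        self.dates.append(date)
        self.events.append(prices)
        frame = pd.DataFrame(list(self.events), index=list(self.dates))
        return self.func(frame)


def avg_volatility(frame):
    # 10-day rolling std per stock, then its 5-day rolling mean.
    return frame.rolling(10).std().rolling(5).mean()


acc = EventAccumulator(avg_volatility)
dates = pd.date_range("2013-01-01", periods=20, freq="B")
latest = None
for i, date in enumerate(dates):
    # One event per day, as a streaming backtest would deliver it.
    out = acc.handle_data(date, {"AAA": 100.0 + i, "BBB": 50.0 + (i % 3)})
    latest = out.iloc[-1]

print(latest)
```

Rebuilding the whole frame on every event, as this sketch does, is wasteful, which gives a feel for why the speed of the accumulator matters.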