Why does batch_transform behave like this?

Hi everyone, I need some help with batch_transform.

Diving into the code:

import random

def initialize(context):  
    pass

@batch_transform(window_length=1, refresh_period=1)  
def batch(datapanel):  
    return random.random()  
# Will be called on every trade event for the securities you specify.  
def handle_data(context, data):  
    x = batch(data)  
    if x is not None:  
        log.info(x)  
        order(sid(24), 50)  

From what I understand, when I set refresh_period=1, the batch-transformed function should only be called once every day. [Update: okay, that's incorrect. The function is called on every bar; however, the datapanel is only updated once every day.] Hence I expect the output of the logs to be the same within a given day when run in minute mode. But here is what I got:

2014-02-20handle_data:14INFO0.0866587877595  
2014-02-21handle_data:14INFO0.260505064836  
2014-02-21handle_data:14INFO0.297122017472  
2014-02-21handle_data:14INFO0.0491141736111  
2014-02-21handle_data:14INFO0.999269810604  
2014-02-21handle_data:14INFO0.0700925961977  
2014-02-21handle_data:14INFO0.422723775352  
2014-02-21handle_data:14INFO0.256027120872  
2014-02-21handle_data:14INFO0.406170130361  
2014-02-21handle_data:14INFO0.73002239253  
2014-02-21handle_data:14INFO0.515887335473  
2014-02-21handle_data:14INFO0.74262632377  
2014-02-21handle_data:14INFO0.707867163496  
2014-02-21handle_data:14INFO0.214877704526  
2014-02-21handle_data:14INFO0.751468426835  
2014-02-21handle_data:14INFO0.142119377371  
2014-02-21handle_data:14INFO0.981545623245  
2014-02-21handle_data:14INFO0.833314020675  

It seems to be rolling, as the output is different for each minute. Even stranger, when I set refresh_period = 2, the logs for the first day turn out as expected; that is, the values logged are the same for the whole first day. However, from the second day on, the batch_transform behaves like it is rolling again.

Without digging into the zipline source code, I cannot make sense of this behavior. Any ideas?


Never mind, guys. I think I figured it out.

What's your conclusion?

Thanks
Ali

Well...I thought I did, but evidently I did not.

The code in the initial post doesn't behave as I expected because the decorated function batch still gets called on every minute bar. The effect of the decorator is to update the datapanel that is passed to the decorated function. Since the return value is randomly generated and does not depend on the datapanel, I shouldn't expect the result to be the same for every minute in a day.
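To make this concrete, here is a toy model of that behavior. This is my own sketch, not the real zipline implementation: the wrapped function runs on every bar, and only the datapanel it receives is held fixed between refreshes.

```python
import random

class ToyBatchTransform:
    def __init__(self, func, refresh_period=1):
        self.func = func
        self.refresh_period = refresh_period
        self.cached_panel = None
        self.days_since_refresh = 0

    def on_new_day(self, todays_panel):
        # Refresh the cached panel only on the refresh schedule.
        if self.cached_panel is None or \
                self.days_since_refresh + 1 >= self.refresh_period:
            self.cached_panel = todays_panel
            self.days_since_refresh = 0
        else:
            self.days_since_refresh += 1

    def __call__(self):
        # Invoked on every minute bar, with the (possibly stale) panel.
        return self.func(self.cached_panel)

batch = ToyBatchTransform(lambda panel: random.random())
batch.on_new_day({"price": [100.0]})
a = batch()  # minute bar 1
b = batch()  # minute bar 2: same panel, but random.random() runs again
assert a != b

# A panel-dependent function, by contrast, is constant within the day:
sums = ToyBatchTransform(lambda panel: sum(panel["price"]))
sums.on_new_day({"price": [100.0, 101.0]})
assert sums() == sums() == 201.0
```

This illustrates why the random example logs a different value each minute while a function of the panel would not.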

But the problem is that when I tried the code in the API documentation, I saw similar behavior again.

# Here we create a universe of securities based on the top 0.1% most liquid by  
# dollar volume traded. Approximately 7 individual securities will be used  
# by the algorithm.  
def initialize(context):  
  set_universe(universe.DollarVolumeUniverse(99.9, 100.0))

# This method expects to receive a datapanel containing dataframes for price,  
# volume, open_price, close_price, high, and low. Each dataframe will  
# have window_length rows (1 per trading day) and about 7 columns (one for  
# each stock). window_length is always in trading days.  
@batch_transform(window_length=1, refresh_period=1)  
def get_averages(datapanel):  
  # get the dataframe of prices  
  prices = datapanel['price']  
  # return a dataframe with one row showing the averages for each stock.  
  return prices.mean()

def handle_data(context, data):  
  # here is the magic part. We invoke the get_averages method using  
  # just the single data (a dictionary of daily bars indexed by security).  
  # the decorator will convert this to a datapanel spanning the trailing  
  # history that we specified with the window_length parameter in the  
  # decorator above.  
  averages = get_averages(data)  
  # add a newline to the beginning of the log line so that the column header  
  # of the dataframe is properly indented.  
  if averages is not None:  
      log.info('\n%s' % averages)  
  order(sid(67), 100)  
2014-02-03handle_data:28INFO  
39840     179.869210  
24705      37.373208  
67         49.748019  
42950      62.160294  
8554      175.444495  
21519     109.491919  
19920      84.881722  
26578    1151.228573  
24        502.595717  
dtype: float64  
2014-02-04handle_data:28INFO  
39840     179.861262  
24705      37.372644  
67         49.744122  
42950      62.157679  
8554      175.436367  
21519     109.484432  
19920      84.877850  
26578    1151.133855  
24        502.605717  
dtype: float64  
2014-02-04handle_data:28INFO  
39840     179.851517  
24705      37.372159  
67         49.740737  
42950      62.155602  
8554      175.428597  
21519     109.477150  
19920      84.874081  
26578    1151.049337  
24        502.612916  
dtype: float64  
2014-02-04handle_data:28INFO  
39840     179.843633  
24705      37.371736  
67         49.736968  
42950      62.153730  
8554      175.420621  
21519     109.469938  
19920      84.870132  
26578    1150.962749  
24        502.620506  
dtype: float64  

I ran your code (the one at the top) and it runs as it should, using both daily and minute data: the random value is generated only once for the whole day when refresh_period = 1, only once every two days when refresh_period = 2, and so on.

Hi Ali,

That's odd. Did you clone my code? If so, it is probably because I had modified the code; I have changed it back now. Can you run it again and let me know the result, please? Thanks.

Hi Louis,

I cloned your code and got the same issue you did; the batch function is called every minute. But if you first backtest in daily mode and then in minute mode, it seems to work the way it should; weird!
See below a sample of the log info I got in minute mode after backtesting in daily mode:
2014-02-20handle_data:14INFO0.325009468121
2014-02-21handle_data:14INFO0.325009468121
2014-02-21handle_data:14INFO0.325009468121
2014-02-21handle_data:14INFO0.325009468121
2014-02-21handle_data:14INFO0.325009468121
2014-02-21handle_data:14INFO0.325009468121
2014-02-21handle_data:14INFO0.325009468121
2014-02-21handle_data:14INFO0.325009468121
2014-02-21handle_data:14INFO0.325009468121
2014-02-21handle_data:14INFO0.325009468121
2014-02-21handle_data:14INFO0.325009468121
2014-02-21handle_data:14INFO0.325009468121
2014-02-21handle_data:14INFO0.325009468121
2014-02-21handle_data:14INFO0.325009468121
2014-02-21handle_data:14INFO0.325009468121
2014-02-21handle_data:14INFO0.325009468121
2014-02-21handle_data:14INFO0.325009468121
2014-02-21handle_data:14INFO0.325009468121

Re Ali:

Cannot duplicate your correct result no matter what; I ran daily mode first, then minute mode. However, I do wonder whether my topmost code should work or not. My understanding is that the @batch_transform decorator takes the data passed to handle_data and transforms it into a new datapanel, where each dataframe in the datapanel has a length (along axis 0) of window_length. It is not the case that, when we set refresh_period = 1, the batch-transformed function is only called once a day; it is still called every minute. What really happens is that the transformation of data into the datapanel only happens once a day.
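That mental model (one new row appended per day, capped at window_length rows, while the transformed function itself can be invoked on every minute bar) can be sketched as follows. This is my own toy model, not zipline's code:

```python
from collections import deque

window_length = 2
panel_rows = deque(maxlen=window_length)  # rolling daily rows

def on_end_of_day(days_bars):
    # Collapse the day's minute bars into one daily row and roll it
    # into the window; the oldest row falls off when the window is full.
    panel_rows.append({"close": days_bars[-1]})

on_end_of_day([100.0, 101.0, 102.0])
on_end_of_day([103.0, 104.0])
on_end_of_day([105.0, 106.0])  # oldest day drops out of the window

assert len(panel_rows) == window_length
assert [r["close"] for r in panel_rows] == [104.0, 106.0]
```

Between those end-of-day updates, every intraday call would see the same two rows.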

However, as I said, when I tried the example code in the API docs, the same situation still occurred. So I still have no idea what is wrong. And apparently the way batch_transform is used in zipline is quite different from Quantopian: in the former, the transformed function needs to be 'initialized' in the initialize function.

Here's a backtest that hopefully is illustrative of the batch transform behavior:

import numpy as np

# globals for batch transform decorator  
R_P = 0 # refresh period  
W_L = 1 # window length  
def initialize(context):  
    context.stocks = [sid(8554),sid(33652)] # SPY & BND

def handle_data(context, data):  
    OHLCV = get_data(data,context.stocks)  
    if OHLCV is None:  
        return  
    print get_datetime()  
    print OHLCV[0].shape  
    print OHLCV[0][-1,:]  
    O = np.array([data[context.stocks[0]].open_price,data[context.stocks[1]].open_price])  
    print O  
@batch_transform(refresh_period=R_P, window_length=W_L) # set globals R_P & W_L above  
def get_data(datapanel,sids):  
    O = datapanel['open_price'].as_matrix(sids)  
    H = datapanel['high'].as_matrix(sids)  
    L = datapanel['low'].as_matrix(sids)  
    C = datapanel['close_price'].as_matrix(sids)  
    V = datapanel['volume'].as_matrix(sids)  
    return (O,H,L,C,V)  

I set refresh_period = 0 so that the trailing window rolls on a minutely basis. I intentionally started the backtest on a market early-close day, to show that the trailing window length is not fixed for this special case. However, if the backtest is started on a normal-length day (e.g. 11/27/2013), the window length is fixed at 390 minutes, updated every minute.

Note that the batch transform will eventually be deprecated in favor of history, so you might want to just work with history (although at this point, you'd need to write your own accumulator to obtain a trailing window of minute-level data). Personally, I have been avoiding the batch transform.

Grant


Hi Grant,

Thanks for the explanation. I can see from your code how the window length is not fixed. I have to say, this is not the behavior I expected. However, the real issue I am having is with refresh_period.

Let's assume refresh_period=1 and window_length=1, and the backtest starts at 9:31:00 on 11/26/2013, a normal trading day. I expect the following to happen: from 9:31:00 to 15:59:00 on 11/26/2013, the batch-transformed function should return None. At 16:00:00, once we have accumulated enough data (390 minute bars), the batch-transformed function should start to return whatever is defined in the transformed function; for the sake of argument, let's call it X. Now move on to the next day, 11/27/2013. Since refresh_period=1, I expect the batch-transformed function to return the same result X as on 11/26/2013 for the whole duration of 9:31:00 to 15:59:00. And then at 16:00:00 on 11/28/2013, a new result should be returned, based on the data from 9:31:00 to 16:00:00 on 11/27/2013.
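That expected schedule can be expressed as a small, self-contained simulation. This is my own sketch of the expectation, not of zipline's actual behavior, and the bar count is simplified to 3 per day:

```python
BARS_PER_DAY = 3
refresh_period = 1
window_length = 1

daily_values = ["X", "Y", "Z"]  # value computed at each day's refresh

def expected_value(day, bar, daily_values):
    """Value the transform should return on (day, bar) under the
    expectation above: None until the first full window, then the most
    recent refresh's value held constant for the whole day."""
    if day < window_length:
        return None  # still warming up
    refresh_day = day - (day - window_length) % refresh_period
    return daily_values[refresh_day - 1]  # based on the prior day's data

# Day 0: warm-up, None on every bar
assert all(expected_value(0, b, daily_values) is None for b in range(BARS_PER_DAY))
# Day 1: the same value "X" on every bar (no mid-day rolling)
assert all(expected_value(1, b, daily_values) == "X" for b in range(BARS_PER_DAY))
# Day 2: refreshed exactly once, to "Y"
assert all(expected_value(2, b, daily_values) == "Y" for b in range(BARS_PER_DAY))
```

The bug reported later in this thread is precisely that the real backtester does not hold the value constant within the day.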

However, this is not what happens in the actual backtest. In fact, I got the exact same results for refresh_period=1 and refresh_period=0.

Try the backtest below and compare the results under the two conditions.

import random
from pytz import timezone

def initialize(context):  
    pass

@batch_transform(window_length=1, refresh_period=0)  
def batch(datapanel):  
    return datapanel['price'][sid(24)].sum()
# Will be called on every trade event for the securities you specify.  
def handle_data(context, data):  
    x = batch(data)  
    if x is not None: 
        print get_datetime().astimezone(timezone('US/Eastern'))
        print '\n%s' % x
        order(sid(24), 50)  

Hello Louis,

Sorry, I'm going to have to leave this one to the Quantopian support folks. If you are a whiz at Python, you could sort out the code here:

https://github.com/quantopian/zipline/blob/master/zipline/transforms/batch_transform.py

Maybe an alternative approach would be to articulate the data that you need to feed into your algorithm. In other words, what are you trying to do?

Grant

Hi Louis,

If you set refresh_period = 1, this will update your data every day. If you set refresh_period = 2, this will update your data every other day (as you correctly noted above).

Unfortunately, you ran into a bug in the batch_transform method. If you run an algo in minute mode with refresh_period = 2, the data is refreshed every minute on the 2nd day. This, of course, is a bug. The behavior is similar if you set the refresh period to any number other than 0 or 1. For example, if you set refresh_period = 3 in minute mode, it will display correct behavior for the first 2 days and then erroneously update every minute on the 3rd day.

We are deprecating batch_transform in favor of history going forward. It is unlikely that this bug will get fixed, but the improved history will work (correctly!) in minute mode.

Best,
Alisa

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Hi Alisa,

Thanks for the clarification. I had noticed this behavior but wasn't sure whether it was a bug or I had missed something. It's good to hear someone from the team confirm it. By the way, do you know whether the same bug exists in zipline as well?

Louis

Yes, the same bug is in zipline, since it's our backtesting engine. And thanks for bringing it up; we'll make sure that the improved history function fixes these issues.

Hi Grant,

Thanks again for your help :) I am trying to implement a version of a pairs trading strategy. Basically, I need to calculate the cointegration coefficient based on minute data collected from the previous day(s), and then trade on it for a day. I have an idea of how to implement this myself; I will give it a try another day.

Louis
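As an aside, the offline step Louis describes (estimating a hedge ratio from the previous day's minute prices) might be sketched with plain NumPy. The synthetic prices and the ordinary-least-squares approach below are illustrative assumptions, not Louis's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(0, 0.1, 390)) + 100.0  # stock A, 390 minute bars
beta_true = 1.5
y = beta_true * x + rng.normal(0, 0.05, 390)    # stock B, cointegrated with A

# OLS regression of y on x (with intercept) -> estimated hedge ratio
X = np.column_stack([x, np.ones_like(x)])
beta_hat, intercept = np.linalg.lstsq(X, y, rcond=None)[0]

# The spread is the series whose mean-reversion would be traded next day.
spread = y - beta_hat * x
assert abs(beta_hat - beta_true) < 0.1
```

In a live algorithm, x and y would instead come from the accumulated minute prices of the chosen pair.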

You're welcome. Accumulating data is no problem; just store it in context. And just like the batch transform, you'll need to wait until your trailing window is full before executing the algorithm. A big advantage of history is that the trailing window is available at algorithm start (no warm-up period required), so you can start the algorithm immediately.

An example:

import numpy as np

# Assumes `window` (number of bars to accumulate) and the arrays
# context.prices / context.volumes (shape: window x len(context.stocks)),
# plus context.tic_count = 0, are set up in initialize().
def accumulator(context, data):
    if context.tic_count < window:
        # still filling the window
        for i, stock in enumerate(context.stocks):
            context.prices[context.tic_count, i] = data[stock].price
            context.volumes[context.tic_count, i] = data[stock].volume
    else:
        # window full: drop the oldest row and append the newest bar
        context.prices = np.roll(context.prices, -1, axis=0)
        context.volumes = np.roll(context.volumes, -1, axis=0)
        for i, stock in enumerate(context.stocks):
            context.prices[-1, i] = data[stock].price
            context.volumes[-1, i] = data[stock].volume
    context.tic_count += 1

More accurately, context.tic_count should probably be called context.bar_count.

No time now, but I could illustrate usage of the accumulator in an algorithm, if it would be helpful.

Grant
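The rolling-buffer technique in Grant's accumulator can be demonstrated in a self-contained form (stock objects replaced with plain array columns, so this sketch runs outside the backtester):

```python
import numpy as np

window = 3
n_stocks = 2
prices = np.zeros((window, n_stocks))
tic_count = 0

def accumulate(bar):
    """bar: array of the current minute's prices, one per stock."""
    global prices, tic_count
    if tic_count < window:
        prices[tic_count, :] = bar            # still filling the window
    else:
        prices = np.roll(prices, -1, axis=0)  # drop the oldest row
        prices[-1, :] = bar                   # append the newest bar
    tic_count += 1

for t in range(5):  # five minute bars
    accumulate(np.array([100.0 + t, 50.0 + t]))

# The window holds the last three bars, oldest first.
assert prices[:, 0].tolist() == [102.0, 103.0, 104.0]
assert prices[:, 1].tolist() == [52.0, 53.0, 54.0]
```

As in the real algorithm, signals should only be generated once `tic_count >= window`, i.e. once the buffer is full.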

Re Alisa:

I will keep a close eye on this history feature. I don't know if it's just me, but I think the decorator syntax for batch_transform is not very intuitive, and the history feature seems to provide a much better interface for end users like me. In other words, I think it's a brilliant move. Good work, guys!

Louis

Re Grant:

"A big advantage of history is that the trailing window is available at the algorithm start (no warm-up period required)"

This is great. It will fix the problem of inaccuracy in the performance metrics, as the warm-up period is currently included in the performance calculation as well. Really looking forward to it.

Louis

Hi Quantopian,

Can you give us an estimated date for when the history function will be available for minute data? I understand this is one of your top priorities in the development of the API, and it would come in handy for algos that use trailing minute data for signals, as one has to wait a long time after going live in simulation before any orders get processed if the trailing window is long.

Thanks
Vincent

Hi Vincent -

I'm not ready to give a date estimate. I know that's a pain, but we're still a very agile startup, and our priority list updates pretty quickly. That makes date commitments very tough (we get more productivity this way, but at the cost of predictability).

That said, I currently have it on the board as the next big project. That bodes pretty well for delivery in early summer. Please understand that this could still change if something else comes up.

Dan
