How to get data from multiple columns in a CSV without using batch_transform?

As the title says, is there a way to get data from multiple columns in a CSV file using only fetch_csv() and history(), without batch_transform?
I ask because the help doc states: "Note: This function [batch_transform] is deprecated in favor of history and will stop working at some point in the future. Please start using history instead."

I have attached my current method (using batch_transform), which seems to be what many shared algorithms do.
Specifically, within handle_data(), get_past_prices() (decorated with batch_transform) gathers data from multiple columns in the CSV and merges it with security prices obtained from history().

import datetime
import numpy as np
import pandas as pd

RESOLUTION = '1d'
LOOK_BACK = 40

# start date shall be 2012-01-24 or later.
my_url ='https://dl.dropboxusercontent.com/u/5020910/quandl_yahoo_indices.csv'
csv_symbol = 'quandl'
csv_factors = ["YAHOO/INDEX_HUI","YAHOO/INDEX_VIX","YAHOO/INDEX_OSX",
               "YAHOO/INDEX_XAU","YAHOO/TSX_RTRE_TO","YAHOO/INDEX_WILREIT",
               "YAHOO/INDEX_SML","YAHOO/INDEX_N225","YAHOO/INDEX_W5KLCV",
               "YAHOO/INDEX_HCX"]

def preview(df):
    log.info(' %s ' % df.head())
    return df

def rename_col(df):
    df = df.fillna(method='ffill')
    df = df.tshift(1, freq='b')
    log.info(' \n %s ' % df.head())
    return df
    
# Merge historical sid prices from history() with the fetched CSV factor columns
@batch_transform(refresh_period=1, window_length=LOOK_BACK)  
def get_past_prices(data):  
    sid_history = history(LOOK_BACK,RESOLUTION,'price')
    for x in csv_factors:
        sid_history[x] = data[x]['quandl']    
    return sid_history

def my_func(context,data):        
    prices = get_past_prices(data)    
    if prices is not None:        
        expected_prices_num = len(context.factors)+1
        print('prices #',prices.shape, 'expected #',(LOOK_BACK,expected_prices_num))
        if prices.shape[0] < LOOK_BACK:
            print('@not enough rows.')
            return
        if prices.shape[1] < expected_prices_num:
            print('!not enough columns.')
            return
        print('hallelujah.')
    else:
        print('prices is None')
    

def initialize(context):
    # DO NOT REMOVE: keep at least one stock in the universe ('dummy stock').
    set_benchmark(symbol('SPY'))
        
    fetch_csv(my_url, 
              date_column='Date', 
              date_format='%Y-%m-%d',
              symbol=csv_symbol,
              pre_func=preview,
              post_func=rename_col,
              mask=False)    
        
    context.factors = csv_factors    
    
    schedule_function(
        my_func,
        date_rules.month_start(),
        time_rules.market_open(minutes=1))
    
    context.isFirstRun = True 
    
def handle_data(context, data):        
    # DO NOT REMOVE, gathering data
    prices = get_past_prices(data)
    
    # Immediately start investing (the schedule may only fire monthly).
    if context.isFirstRun:
        context.isFirstRun = False
        my_func(context, data)
    
    for f in csv_factors[:4]:
        if prices is not None:
            record(f,prices[f][-1])
        else:
            record(f,np.nan)
2 responses

Hi Ted,
Unfortunately, there is currently no way to use history to access fetcher data. The problems with batch_transform are twofold: first, it requires a warm-up period, which you have already seen; and second, it is being phased out, so compatibility with future code changes is unknown.

What you can do is use a context variable to keep a dataframe of the index data as it comes through, then join it with sid data from history. You won't solve the problem of needing a warm-up period, but you won't have to worry about your code breaking as we move further away from batch_transform.
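
For example, here is an untested sketch of that idea. It reuses the names from the attached algorithm (csv_symbol, csv_factors, LOOK_BACK, RESOLUTION) and assumes the fetcher columns are reachable as data[csv_symbol][column] inside handle_data (the slash-containing column names may need renaming in post_func first); the index alignment between history() and the stored rows may also need some massaging.

import numpy as np
import pandas as pd

def initialize(context):
    # fetch_csv(...) exactly as in the attached algorithm, plus a rolling
    # store for the fetched factor values, built up bar by bar.
    context.factor_history = pd.DataFrame(columns=csv_factors)

def handle_data(context, data):
    # Append today's fetcher values to the rolling store.
    if csv_symbol in data:
        row = {}
        for f in csv_factors:
            # The access pattern below is an assumption for this sketch.
            try:
                row[f] = data[csv_symbol][f]
            except Exception:
                row[f] = np.nan
        context.factor_history.loc[get_datetime().normalize()] = pd.Series(row)

    # Once enough rows have accumulated, join them with sid prices.
    if len(context.factor_history) >= LOOK_BACK:
        sid_history = history(LOOK_BACK, RESOLUTION, 'price')
        factors = context.factor_history.tail(LOOK_BACK)
        # combined has one column per sid plus one per csv factor.
        combined = sid_history.join(factors, how='left')

With something like that in place, my_func could build its price frame from the joined result instead of calling the batch_transform.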

A hacky way to get the initial look-back history is to encode the first day's CSV value as a string that contains a long list of previous values, then do some splits on the string to recover the original data. It's hacky, but I don't see why it wouldn't work.
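
For instance, here is a rough illustration of that trick, seeding the rolling store from the sketch above on the first bar. The history_blob column name, the '|' delimiter, and the single-factor layout are all invented for the example, and it uses the same data[csv_symbol][column] access assumption as before.

import pandas as pd

def seed_from_blob(context, data, factor='YAHOO/INDEX_VIX',
                   blob_column='history_blob', sep='|'):
    # Hypothetical layout: the csv's first row carries the factor's earlier
    # values as one delimited string, e.g. "512.3|515.1|509.8".
    blob = str(data[csv_symbol][blob_column])
    past_values = [float(v) for v in blob.split(sep)]
    # Rebuild a business-day index ending the day before "today".
    end = get_datetime().normalize() - pd.tseries.offsets.BDay(1)
    back_index = pd.bdate_range(end=end, periods=len(past_values))
    # Pre-fill the rolling store so a full look-back exists on day one.
    context.factor_history = pd.DataFrame({factor: past_values},
                                          index=back_index)

Called once from the context.isFirstRun branch, something like this would remove most of the warm-up wait, at the cost of maintaining the encoded string in the CSV.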

This post is another hacky way to get historical derived data, but it won't get you the trailing time series you are looking for. Sorry there isn't really an elegant solution to this yet; let me know if you have any questions.

Thanks for the in-depth explanation, David!