Handling data from prior to the first trading day. There's a Yahoo CSV tool here too.

Hello all,
I have been trying to figure out a way to load data from before the start date so there is no data lag during the first days of trading. I would like to load a CSV with the prior few months of prices, since I only need it once to set some initial parameters. My problem is that the fetcher re-downloads the same old data daily, which seems like a waste, and there doesn't seem to be a way around it.
Questions:

  1. Is there a way to call fetch_csv only once?
  2. Can the fetcher be given several URLs, or be called several times per cycle?
  3. Is zipline.utils.factory.load_from_yahoo supported? I get a runtime exception: TypeError: a float is required.
  4. Will there be URL support outside of the fetcher? I assume not, for security reasons.
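In the absence of a one_time_fetch flag, one workaround (a sketch, runnable outside Quantopian; the `RunOnce` wrapper is a hypothetical helper, not a platform feature) is to wrap pre_func so the expensive parameter-setting work happens only on the first call, even if the fetcher re-downloads the file every day:

```python
import pandas as pd


class RunOnce:
    """Wrap a pre_func so its side effects run only on the first call.

    The fetcher may still re-download the CSV daily, but the
    initial-parameter computation happens just once.
    """

    def __init__(self, func):
        self.func = func
        self.done = False

    def __call__(self, df):
        if not self.done:
            self.func(df)  # e.g. set initial parameters from the back data
            self.done = True
        return df  # always hand the frame back to the fetcher unchanged


# Hypothetical parameter store and setup function for illustration.
params = {}


def set_initial_params(df):
    params['mean_close'] = df['Close'].mean()
    params['calls'] = params.get('calls', 0) + 1


pre = RunOnce(set_initial_params)
frame = pd.DataFrame({'Close': [10.0, 20.0, 30.0]})
pre(frame)  # first call: computes params
pre(frame)  # later calls: pass-through only
```

In the template below, `pre_func=RunOnce(Thing(context).pre_fetcher)` would give the same effect without changing Thing itself.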

I'm following this template because I think it's a great step in the right direction.

import datetime  
import math  
import numpy as np  
import pandas as pd

# I'd like to use the fetcher in this sort of way.

class Thing:

    def __init__(self, context):
        self.context = context

    def pre_fetcher(self, df):
        # Modify df and update context here
        log.info(str(df))
        return df

    def post_fetcher(self, df):
        # And/or here
        log.info(str(df))
        return df


class YahooCSV:
    """
    A tool to build the URL for the current quote of a company
    or for historical data over a time period. They have sector and
    industry data as well.
    To extend this, there's more info here:
    http://code.google.com/p/yahoo-finance-managed/wiki/CSVAPI
    """

    def quote(self, symbol, stat='all'):
        url = "http://download.finance.yahoo.com/d/quotes.csv?s=%s&f=" % symbol
        all_stats = 'l1c1va2xj1b4j4dyekjm3m4rr5p5p6s7a0b0'
        if stat != 'all':
            url += stat
        else:
            url += all_stats
        return url

    def history(self, sym, start_date, end_date):
        #
        # Date format must be 'YYYY-MM-DD'
        # str(data[sym].datetime) works for this
        # Note: the chart API's month fields (a and d) are zero-based
        #
        url = 'http://ichart.yahoo.com/table.csv?'
        url += 's={}&a={}&b={}&c={}&d={}&e={}&f={}'.format(
            sym,
            int(start_date[5:7]) - 1,   # start month, zero-based
            int(start_date[8:10]),      # start day
            int(start_date[0:4]),       # start year
            int(end_date[5:7]) - 1,     # end month, zero-based
            int(end_date[8:10]),        # end day
            int(end_date[0:4]),         # end year
        )
        url += '&g=d&ignore=.csv'
        return url
def initialize(context):
    context.stocks = [sid(16841), sid(8229)]
    start_date = '2012-06-01'
    end_date = '2013-01-01'
    url = YahooCSV().history('AMZN', start_date, end_date)

    fetch_csv(
        url,
        date_column='Date',
        symbol='AMZN',
        pre_func=Thing(context).pre_fetcher,
        post_func=Thing(context).post_fetcher,
        # one_time_fetch=True           this would be awesome to have!!
    )
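As a sanity check on the date slicing in history() (runnable outside Quantopian; the class is condensed and repeated here so the snippet stands alone), a June 2012 to January 2013 request should carry zero-based months 5 and 0:

```python
class YahooCSV:
    """Condensed copy of the URL builder above, so this snippet runs alone."""

    def history(self, sym, start_date, end_date):
        # The old Yahoo chart API used zero-based months (a and d).
        url = 'http://ichart.yahoo.com/table.csv?'
        url += 's={}&a={}&b={}&c={}&d={}&e={}&f={}'.format(
            sym,
            int(start_date[5:7]) - 1,   # start month, zero-based
            int(start_date[8:10]),      # start day
            int(start_date[0:4]),       # start year
            int(end_date[5:7]) - 1,     # end month, zero-based
            int(end_date[8:10]),        # end day
            int(end_date[0:4]),         # end year
        )
        url += '&g=d&ignore=.csv'
        return url


url = YahooCSV().history('AMZN', '2012-06-01', '2013-01-01')
# -> http://ichart.yahoo.com/table.csv?s=AMZN&a=5&b=1&c=2012&d=0&e=1&f=2013&g=d&ignore=.csv
```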

########################################################  
#                                ALTERNATIVE  
########################################################

def initialize(context):
    context.handle_back_data = True
    context.back_data_start_date = datetime.datetime(2012, 1, 1)
    context.back_data_end_date = datetime.datetime(2012, 6, 1)

    # Build the universe: SPDR SPY and SHY
    context.SPY = sid(8554)
    context.SHY = sid(23911)

    # Set constraints on borrowing
    context.pct_invested_threshold = 0.95  # Limit on percent invested (as a decimal)
    context.init_margin = 1.50   # Initial margin requirement
    context.maint_margin = 1.25  # Maintenance margin requirement
def handle_data(context, data):
    if context.handle_back_data:
        handle_back_data(context)
        context.handle_back_data = False

    # Update new frame
    update_newFrame(context, data)

    # Apply trade logic
    trade_logic(context, data)

def handle_back_data(context):  
    start = context.back_data_start_date  
    end = context.back_data_end_date  
    back_data = load_from_yahoo(  
        stocks=['SPY', 'SHY'],  
        indexes={},  
        start=start,  
        end=end,  
        adjusted=False  
    )  
    # Mess with back data here:  

I actually like the second way more because load_from_yahoo takes a list of stocks.
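Either way, once the back data is in hand, the "# Mess with back data here" step might look something like this (a sketch with synthetic prices standing in for the DataFrame load_from_yahoo returns; the 60-day window is an arbitrary choice for illustration):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for load_from_yahoo output: one price column per symbol,
# indexed by business day over the back-data window.
dates = pd.date_range('2012-01-01', '2012-06-01', freq='B')
rng = np.random.default_rng(0)
back_data = pd.DataFrame({
    'SPY': 130 + rng.standard_normal(len(dates)).cumsum(),
    'SHY': 84 + 0.1 * rng.standard_normal(len(dates)).cumsum(),
}, index=dates)

# Seed the parameters the strategy needs on day one -- e.g. a trailing
# mean and volatility per symbol -- so there is no warm-up lag.
initial_params = {
    sym: {
        'mean': back_data[sym].tail(60).mean(),
        'vol': back_data[sym].tail(60).std(),
    }
    for sym in back_data.columns
}
```

In the algorithm itself, initial_params would be stored on context at the end of handle_back_data.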

5 responses

That's pretty cool. Great initiative! I love it.

Great minds think alike, of course. I'm also hoping we make this obsolete, and load it without dealing with Yahoo. We started the project already. Discussion here, spec here.

  1. You can call fetch_csv() I think five times, but you only have a few seconds to download the data.
  2. The fetcher only runs once per backtest, and once per day in live trading.
  3. I don't know about that one.
  4. We're open to other data-import ideas. We've built special ones for Estimize, Google Trends, and Wikipedia. General URL access, though, seems too difficult to do securely.

It would be great to have a custom data section, so it's possible to pre-upload some miscellaneous data for backtesting (e.g. an ETF's NAV).

Pithawat, have you looked at Fetcher?

Thanks for getting back to me, I checked out that spec and I'm a fan, data.history looks like a major improvement. Gotta love all that pandas power. Are there any estimates on time to production?

We run a pretty agile development process, and our priority list updates frequently. That's great for some things, but it's terrible when I'm trying to predict software delivery dates. I think it will be several weeks. Could be sooner, could be later.