Back to Community
Problem accessing my created pandas DataFrame

Hi,

Thank you for your excellent work.

I want to take advantage of the new Fetcher and upload my own signals. In testing the framework for that, I create a simple random DataFrame and try to access it in handle_data(). But in trying to access a member of the DataFrame in handle_data(), I get "Error Runtime exception: KeyError:".

Here is the relevant code (sorry, I did not figure out how to link directly to my algo). I have a few ugly diagnostics to try to figure out the problem. 3 lines are indicated as causing an error (one of which is a bonus for this post: a simple assert statement does not work).


# Cloned from "Dual Moving Averages version 0.2"  
# https://www.quantopian.com/posts/dual-moving-averages-version-0-dot-2

import pandas as pd  
import numpy as npy

START = '2012-01-01'  
END = '2012-12-31'  
index = pd.date_range(START, END, freq='B')

THRESH = 0.1

def initialize(context):  
    print pd.version.version

    # define portfolio  
    context.sids = [sid(24), sid(3766), sid(5061), sid(22802),  
                    sid(26578)]  
    context.N = len(context.sids)  
    # init dict to track whether we invested in a particular sid  
    context.invested = {}  
    context.sig = {}  
    for cur_sid in context.sids:  
        context.invested[cur_sid] = False  
    context.sig = pd.DataFrame(  
        npy.random.rand(len(index), len(context.sids)),  
        index=index, columns=context.sids)

    # This confirms that my pandas syntax is correct.  
    print context.sig[context.sids[0]][index[0]]

def handle_data(context, data):  
    sig = context.sig  
    for cur_sid in context.sids:  
        # skip non-existant sids  
        if cur_sid not in data:  
            continue  
        t = pd.Timestamp(data[context.sids[0]].datetime.date())  
        log.critical('*, %s' % str(t))  
        print type(t)  
        cash_per_sid = context.portfolio.starting_cash / context.N  
        #print sig[cur_sid]  
        i = sig[cur_sid].index == t  
        dti = sig[cur_sid].index[i]  
        print dti  
        print len(dti)                    # output is 1  
        #assert len(dti) == 1        # <----------- UNCOMMENT LINE will crash  
        print type(dti)  
        #print sig[cur_sid][t]       # <-------- UNCOMMENT LINE will crash  
        # buy if sig exceeds threshold.  
        if (sig[cur_sid].ix[t] > THRESH): # and not context.invested[cur_sid]:  #  raises error  
            order_in_cash(data, cur_sid, cash_per_sid)  
            context.invested[cur_sid] = True  

Thank you for any help.

Ryan

4 responses

Hi Ryan,

Thanks for the questions and example code. I ran through and patched up a few things, and stripped out a lot of the print statement to give you a simple example. After I finished, I realized all this alignment by date work isn't necessary at all if you use fetcher. We actually put the data into handle_data's parameter for you - that way your lookahead bias is capped by the quality of the underlying data. Since I had the code working, I figured I would include it anyway! The most important change is a guard on the current algo time being in the index. I'm using a new, not quite documented yet, function to get the algorithm's current time get_datetime(). I also set the timezone of the date range - pandas is actually good about comparing naive and non-naive datetimes/timestamps, but I did it for clarity.

thanks,
fawce

Clone Algorithm
6
Loading...
Total Returns
--
Alpha
--
Beta
--
Sharpe
--
Sortino
--
Max Drawdown
--
Benchmark Returns
--
Volatility
--
Returns 1 Month 3 Month 6 Month 12 Month
Alpha 1 Month 3 Month 6 Month 12 Month
Beta 1 Month 3 Month 6 Month 12 Month
Sharpe 1 Month 3 Month 6 Month 12 Month
Sortino 1 Month 3 Month 6 Month 12 Month
Volatility 1 Month 3 Month 6 Month 12 Month
Max Drawdown 1 Month 3 Month 6 Month 12 Month
# Cloned from "Dual Moving Averages version 0.2"  
# https://www.quantopian.com/posts/dual-moving-averages-version-0-dot-2

import pandas as pd  
import numpy as npy

START = '2008-01-01'  
END = '2013-12-31'  
index = pd.date_range(START, END, freq='B', tz='UTC')

THRESH = 0.1

def initialize(context):  
    # print pd.version.version

    # define portfolio  
    context.sids = [sid(24), sid(3766), sid(5061), sid(22802),  
                    sid(26578)]  
    context.N = len(context.sids)  
    # init dict to track whether we invested in a particular sid  
    context.invested = {}  
    context.sig = {}  
    for cur_sid in context.sids:  
        context.invested[cur_sid] = False  
    context.sig = pd.DataFrame(  
        npy.random.rand(len(index), len(context.sids)),  
        index=index, columns=context.sids)

    # This confirms that my pandas syntax is correct.  
    # print context.sig[context.sids[0]][index[0]]

def handle_data(context, data):  
    sig = context.sig  
    for cur_sid in context.sids:  
        # skip non-existant sids  
        if cur_sid not in data:  
            continue  
        
        t = get_datetime()
        # make sure we have a date that aligns with an entry in our
        # signal dataframe. The typical reason is misalignment between
        # the backtest dates and the START/END constants at the top
        # of this file.
        if t not in index:
            log.warning("{dt} not in index".format(dt=t))
            break
        

        i = sig[cur_sid].index == t 
        dti = sig[cur_sid].index[i]  
        assert len(dti) == 1        # <----------- UNCOMMENT LINE is safe! 
        assert sig[cur_sid][t] is not None     # <-------- UNCOMMENT LINE is safe!
        # buy if sig exceeds threshold.  
        if (sig[cur_sid].ix[t] > THRESH) and not context.invested[cur_sid]:  #  raises error  
             
            context.invested[cur_sid] = True 
This backtest was created using an older version of the backtester. Please re-run this backtest to see results using the latest backtester. Learn more about the recent changes.
There was a runtime error.
Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Hi Fawce,

Thanks a lot for that feedback and solution. I should have seen what was happening--I kept staring at the top of my log output, where the date was in my index, not at the bottom of it, where the problem was, as you found. Thank you for that.

I tweaked things a bit so that others can see how to play with such things and put in the skeleton of a simple exit strategy. With Fetcher, I will be able to upload things like my stop loss/profit limits, signal threshold, etc., along with my signal itself.

Within this same idea: is there an accessible lookup dictionary, such as

context.ticker_to_sid = {'AAPL': 24, 'GOOG': 26578, 'IBM': 3766, ...}  

and/or

context.sid_to_ticker = {24: 'AAPL', 26578: 'GOOG', 3766: 'IBM', ...}  

so that our code can live in both worlds (symbol and sid)? I think the answer is 'no' so here is a clarification of my question and planned workaround.

I will Fetch a signal which will be a DataFrame whose columns I would like to be tickers (AAPL, GOOG, etc.). Then I would like to take those tickers and programmatically create my portfolio (context.sids) from these tickers. From reading the other posts and docs, it seems as if this is not possible. Is that the case?

It looks like I need to be in the IDE, type in the symbols (tickers) that I have signals for so that their SIDs appear in the IDE. Then I need to use those SIDs as the names of my columns of my signal DataFrame here on my local machine before Fetch gets it. Essentially, I need to do some manual labor of typing 'sid(' over and over in the IDE, typing in my desired ticker, seeing the SID come up on the screen, and making my own local lookup table dictionary that I can use to create my DataFrames that Fetcher will eventually get. That seems to be the only way to coordinate my outside signal with Quantopian's.

Do I understand that correctly? Sorry if I've missed something obvious.

Regardless, I can deal with this and I very much appreciate the fantastic machine you have built here.

Ryan

Clone Algorithm
7
Loading...
Total Returns
--
Alpha
--
Beta
--
Sharpe
--
Sortino
--
Max Drawdown
--
Benchmark Returns
--
Volatility
--
Returns 1 Month 3 Month 6 Month 12 Month
Alpha 1 Month 3 Month 6 Month 12 Month
Beta 1 Month 3 Month 6 Month 12 Month
Sharpe 1 Month 3 Month 6 Month 12 Month
Sortino 1 Month 3 Month 6 Month 12 Month
Volatility 1 Month 3 Month 6 Month 12 Month
Max Drawdown 1 Month 3 Month 6 Month 12 Month
# Simple example to test using an externally-derived signal, acquired
# through Fetcher, to open new positions. Here, I just create a DataFrame
# of a random signal and use a random exit strategy to test the machinery.
#
# No real trading issues are checked here (like margin, etc.).
#
# Cloned from Thomas Wiecki's "Dual Moving Averages version 0.2"  
# https://www.quantopian.com/posts/dual-moving-averages-version-0-dot-2
# Then adapted by me.

import pandas as pd
import numpy as npy


# In future, 'index' will be that of the signal I upload via Fetcher.
START = '2012-01-01'
END = '2012-12-31'
index = pd.date_range(START, END, freq='B')

# When signal exceeds this value, open a new position.
THRESH = 0.1


def initialize(context):
    
    #print pd.version.version

    # Define portfolio.
    context.sids = [sid(24), sid(3766), sid(5061), sid(22802), 
                    sid(26578)]
    context.N = len(context.sids)
    
    # Init dict to track whether we invested in a particular sid.
    context.invested = {}
    for cur_sid in context.sids:
        context.invested[cur_sid] = False
    
    context.sig = pd.DataFrame(
        npy.random.rand(len(index), len(context.sids)),
        index=index, columns=context.sids)
    
    
def order_in_cash(context, data, security, cash):
    shares = cash // data[security].price 
    if shares < 0: 
        # To fix python's flooring of negative integer division. 
        shares += 1 
    #log.debug("security: %i, cash: %f, shares: %i"% (security, cash, shares))
    log.info('Ordering %i shares of sid %i' %(shares, security))
    order(security, shares)
    context.invested[security] = True
    
    
def check_exit(context, data, cur_sid):
    return npy.random.rand(1) > 0.5


def close_position(context, data, cur_sid):
    shares = context.portfolio.positions[cur_sid].amount
    log.info('Closing %i shares of sid %i' % (shares, cur_sid))
    order(cur_sid, -shares)
    context.invested[cur_sid] = False

    
def handle_data(context, data):
    
    t = get_datetime()
        
    # Compliments of fawce in answer to my help question at
    # quantopian.com/posts/problem-accessing-my-created-pandas-dataframe
    #
    # Make sure we have a date that aligns with an entry in our
    # signal dataframe. The typical reason is misalignment between
    # the backtest dates and the START/END constants at the top
    # of this file.
    #
    # If t not in signal index, we cannot use it to open any new
    # positions but we should still check our stop loss/profit limits.
    if t in index:
        NO_OPEN = False
    else:
        log.warning("{t} not in index".format(t=t))
        NO_OPEN = True
        
    sig = context.sig
    cash_per_sid = context.portfolio.starting_cash / context.N
    
    for cur_sid in context.sids:
        
        # skip non-existant sids
        if cur_sid not in data:
            continue        
        
        # Should we close an open position?
        if context.invested[cur_sid]:
            if check_exit(context, data, cur_sid):
                close_position(context, data, cur_sid)
            continue
        
        # Are we not allowed to open a new position?
        if NO_OPEN:
            continue

        # Sanity check.
        i = sig[cur_sid].index == t
        assert len(sig[cur_sid].index[i]) == 1
        
        # Should we open a new position?
        if (sig[cur_sid][t] > THRESH):
            order_in_cash(context, data, cur_sid, cash_per_sid)
                
        
This backtest was created using an older version of the backtester. Please re-run this backtest to see results using the latest backtester. Learn more about the recent changes.
There was a runtime error.

Hi Ryan,

I'm glad you used the code!

Fetcher does symbol2sid mapping automatically. You just need to have a column named "symbol" in your source csv. You also need a column that represents the date as a string, that way each row has a symbol and a date. Then Fetcher will find the sid for the symbol "as of" the date of the same row. The help doc has more details about Fetcher's symbol2sid functionality: https://www.quantopian.com/help#overview-fetcher

have fun,
fawce

Hi Fawce--thanks again. Sorry to miss the symbol2sid functionality. Exploring now.