History Function Comparing 2 Stocks

The algorithm works by comparing the ratio of prices A/B to the ratio of the 15-period average of A to the 15-period average of B.

How can I use the history function to record the historical 15 day average price for each stock and then record the Mean and Standard Deviation of this ratio?


Trying something like this but it's not working. Any idea why?

def avgstd(context, data):  
    # Get X day period data  
    price_history = history(bar_count=20, frequency='1d', field='price')  
    mean = price_history.mean()  

    a_price = mean[data[context.stocks[0]]]  
    b_price = mean[data[context.stocks[1]]]  

    ratio = a_price / b_price  
    ratio_mean = ratio.mean()  
    ratio_std = ratio.std()

Try:

    a_price = mean[context.stocks[0]]  
    b_price = mean[context.stocks[1]]

history returns a pandas DataFrame whose columns are keyed by the sids, so the Series returned by .mean() expects a sid (not data[sid]) as the key. Hope this helps.
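A standalone illustration of that point, using plain pandas with strings standing in for sids (made-up prices, not the platform's `history()` call):

```python
import pandas as pd

# Columns of the price DataFrame are keyed by security; here strings
# stand in for sids.
price_history = pd.DataFrame({
    'AAPL': [100.0, 102.0, 104.0],
    'MSFT': [40.0, 41.0, 42.0],
})

mean = price_history.mean()  # a Series, indexed by the column keys
a_price = mean['AAPL']       # index with the key itself, not data[key]
b_price = mean['MSFT']
ratio = a_price / b_price
```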

That works, but how do I record the standard deviation of the ratio?

def ratiostd(context, data):  
    # Get X day period data  
    price_history = history(bar_count=15, frequency='1d', field='price')  
    mean = price_history.mean()  

    a_price = mean[context.stocks[0]]  
    b_price = mean[context.stocks[1]]  

    ratio = a_price / b_price  

I'm probably misunderstanding your question, but price_history.mean() will give you an average price over 15 periods for each of the securities in your universe. For example, say you have context.stocks set up with 2 sids. Then price_history.mean() will give you 2 numbers, each a mean for one stock. By doing a_price/b_price you're calculating the ratio of the means of the 2 securities, and that's just a single number. If you meant to store values calculated on each handle_data run, then you can simply store the value in some context list variable or similar, e.g.

rolling_means.append(ratio)  

Similar idea if you also track stdev where stdev = price_history.std(). See https://www.quantopian.com/help#ide-history for more guidance. If you have a more complete example to share, I'm happy to go through it.

That makes sense, but when I try to get the std of the rolling_means it breaks down. I added a log to inspect the data, and I think it is only ever holding 1 number.

import numpy  
def initialize(context):  
    # this example works on Apple's data  
    context.aapl = sid(24)  
    context.msft = sid(5061)

def handle_data(context, data):  
    # use history to pull the last 10 days of price  
    price_history = history(bar_count=10, frequency='1d', field='price')  
    # calculate the standard deviation using std()  
    mean = price_history.mean()  
    # ratio calculation  
    m1=mean[context.aapl]  
    m2=mean[context.msft]  
    ratio=m1/m2  
    rolling_means = []  
    rolling_means.append(ratio)  
    std_ratio= numpy.std(rolling_means)  
    record(m1=m1)  
    record(m2=m2)  
    record(ratio=ratio)  
    record(ratio_std=std_ratio)  
    log.info(rolling_means)  

Al, I made some edits to the code with explanations that I hope will be clear. Let me know if you have any questions. Thanks

import numpy as np  
from pytz import timezone

def initialize(context):  
    # this example works on Apple's data  
    context.aapl = sid(24)  
    context.msft = sid(5061)

    '''
    Make rolling_means a context variable (instead of local to handle_data)
    so it doesn't reset every time handle_data is called.
    '''
    context.rolling_means = []

def handle_data(context, data):

    '''
    Calling history() requires running the backtest in minute mode.
    handle_data runs on every minute bar.
    For brevity of logs and faster testing, let's force handle_data to run
    through completely only once a day.
    For this, use the newDay function defined below to check for a new day.
    newDay() checks the exchange time and allows execution of the rest of
    handle_data only if it's 9:31am.
    Otherwise, handle_data just returns, doing nothing until the next 9:31am.
    There is no bar at 9:30am (the first bar is timed at its end - 9:31am;
    it has the open (at 9:30am), high, low, and close at 9:31am).
    You can change the time to suit your needs, or it can be implemented to
    simply check dates instead of times.
    '''
    context.exchange_time = get_datetime().astimezone(timezone('US/Eastern'))  
    if not newDay(context):  
        return  
    # use history to pull the last 10 days of price  
    price_history = history(bar_count=10, frequency='1d', field='price')  
    # calculate the standard deviation using std()  
    mean = price_history.mean()  
    # ratio calculation  
    m1=mean[context.aapl]  
    m2=mean[context.msft]  
    ratio=m1/m2  
    context.rolling_means.append(ratio)  
    std_ratio= np.std(context.rolling_means)  
    record(m1=m1)  
    record(m2=m2)  
    record(ratio=ratio)  
    record(ratio_std=std_ratio)  
#    log.info(context.rolling_means)  
    '''
    Edited the log messaging a bit for clarity.
    '''
    message = '{dt}: m1={m1}, m2={m2}, ratio={ratio}, std_ratio={std_ratio}'  
    message = message.format(dt=context.exchange_time,m1=m1,m2=m2,ratio=ratio,std_ratio=std_ratio)  
    log.debug(message)  


def newDay(context):  
    new_day = False  
    if context.exchange_time.hour == 9 and context.exchange_time.minute == 31:  
        new_day=True  
    return new_day

The only issue is keeping the std of the ratio to only include the last X days. I added this code and was working on a way to make it work for any time frame.

    #keep # of periods for std steady  
    if len(context.rolling_means)==10:  
        del context.rolling_means[0]  
        return  

You probably want to use Python's deque object from the collections module instead of a list. A deque is like a list, just more efficient for this, and you can specify a maximum length. Once you reach the max length, each time you append on one end, it removes an item from the opposite end. See https://docs.python.org/2/library/collections.html#collections.deque
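A minimal sketch of the deque approach, with hypothetical ratio values in place of the values computed in handle_data:

```python
from collections import deque

# A deque with maxlen keeps only the most recent N items: appending to a
# full deque silently drops the oldest entry from the other end.
window = deque(maxlen=3)
for ratio in [1.0, 1.1, 1.2, 1.3, 1.4]:
    window.append(ratio)

print(list(window))  # → [1.2, 1.3, 1.4]
```

No manual `del` is needed; the rolling window maintains itself.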

You might also want to take a look at some of the rolling stats methods provided natively by pandas. For example, it looks like rolling_mean and rolling_std might do exactly what you want with no more effort on your part. I threw together an example that shows some common idioms for working with DataFrames here.

https://www.quantopian.com/posts/working-with-history-dataframes
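A note for readers on newer pandas: the module-level pd.rolling_mean / pd.rolling_std used in this thread were later removed; the same statistics are computed with the Series .rolling() method. A standalone sketch with made-up ratio values:

```python
import pandas as pd

ratio = pd.Series([1.0, 1.2, 1.1, 1.3, 1.2, 1.4])

# Equivalent of the old pd.rolling_mean(ratio, 3) / pd.rolling_std(ratio, 3):
rolling_mean = ratio.rolling(window=3).mean()
rolling_std = ratio.rolling(window=3).std()

# The first window-1 entries are NaN, because there is no full window yet.
```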


Scott,

I have been trying to build a newer method that uses data frames, but am having trouble figuring out how to add new columns of data to the price history.

Ideally I would like to add:
- a column called ratio, with values = price of Sid[1] / price of Sid[2]
- a column called historical_ratio, with values = the average of the ratio over the period set by bar count
- a final column called ratio_std, with values = the std of the ratio

Hi Al,

It sounds like what you want is pandas.rolling_mean and pandas.rolling_std. One caveat with their usage is that if you want a rolling mean of length 10 extending back 10 days, you actually need to request a history of length 19, because you need the 9 days prior to the earliest day you care about to calculate the mean. In general, to calculate N bars of a rolling stat of length K, you'll need (N + K - 1) history bars. I've attached a backtest with a source-code example for this sort of setup (I accidentally used 20 bars instead of 19. Index arithmetic is hard...).

You can read more about pandas rolling statistics capabilities here: http://pandas.pydata.org/pandas-docs/stable/computation.html#moving-rolling-statistics-moments.
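The (N + K - 1) rule can be checked in isolation with plain pandas (synthetic prices; the modern .rolling() method stands in for the era's pd.rolling_mean):

```python
import numpy as np
import pandas as pd

# To get N valid rolling values of window length K, request N + K - 1 bars:
# the first K - 1 rolling entries are NaN for lack of a full window.
N, K = 10, 10
bars = N + K - 1  # 19
prices = pd.Series(np.linspace(100.0, 118.0, bars))
rolling = prices.rolling(K).mean().dropna()
# len(rolling) == N
```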


Thanks for explaining that Scott. The last part that I am having difficulty with is getting 1 particular value from the dataframe. I can get a particular column and the last row, but it returns the Date index with the value I want.

ratio = prices['historical_ratio'][-1:]  

returns

2014-05-01 13:31:00+00:00    1.999622  

I think you'll get what you want if you drop the colon from the second indexer. The way to think about this is by analogy with python's list slicing semantics.

If I have a list, my_list = [1, 2, 3, 4], then

my_list[N]  

means "give me the Nth element of my_list". One of the nice features of Python's list slicing is that negative indices wrap to the end of lists, so
my_list[-1] becomes syntactic sugar for "give me the last element of my_list".

Adding a colon to your indexer changes the semantics of what you're asking Python to do. When you write
my_list[N:] you're now saying "give me the sub-list of elements drawn from my_list, starting at index N." So for example,

my_list = [1, 2, 3, 4]  
sub_list = my_list[2:] # sub_list = [3,4]  

In the particular case where N = -1, you're saying "give me the sub-list of elements drawn from my_list, starting with the last element of the list. Thus we get

my_list = [1, 2, 3, 4]  
sub_list = my_list[-1:] # sub_list = [4]  

Pandas Series (1D), DataFrame (2D), and Panel (3D) objects extend these semantics in a natural way. Indexing into one of these objects with a scalar value (i.e., without a colon), returns an object of one dimension lower than the original object. Indexing with a slice (i.e. using the colon syntax) returns a filtered-down object of the same dimension as the original object.

In the example above, prices is a 2D DataFrame object. When you index into it with the string 'historical_ratio' you've indexed by a scalar, so you get back a 1D Series, because that's the pandas data type of one dimension lower than your original DataFrame. This is analogous to doing my_list[N] in the simple list case. You then index into the resulting series with the slice [-1:], and because you used a slice instead of a scalar, we get a length-1 Series instead of a value. This is analogous to doing my_list[N:] in the list case.

If instead you index with [-1], pandas will return 1.999622. You can think of this as pandas again "lowering the dimension" of our 1D Series to a "0D" object, which is just a simple scalar.
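The dimension-lowering behavior described above can be seen in a tiny standalone example (made-up dates and ratio values mirroring the thread):

```python
import pandas as pd

# A 2D DataFrame, analogous to the prices frame in the thread.
idx = pd.date_range('2014-04-29', periods=3)
prices = pd.DataFrame({'historical_ratio': [1.9, 2.0, 1.999622]}, index=idx)

col = prices['historical_ratio']  # scalar key -> 1D Series (one dimension lower)
tail = col[-1:]                   # slice -> still a Series, length 1, index attached
value = col.iloc[-1]              # scalar positional index -> a plain float
```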

Hope that helps,
-Scott

Thank you that clears it up very well. I had not been able to figure it out based on the documentation for data frames for some reason. This makes a lot of sense.

working well:

import pandas as pd  
from pytz import timezone

def initialize(context):  
    # Set Equities  
    context.sid1 = sid(24)   # Stock A  
    context.sid2 = sid(5061) # Stock B

def handle_data(context, data):  
    stockA = context.sid1  
    stockB = context.sid2  
    #Set Confidence interval  

    # Run the algorithm only if it is a new day  
    context.exchange_time = get_datetime().astimezone(timezone('US/Eastern'))  
    if not newDay(context):  
        return  
    # Request history from the last 20 days, so that we can compute  
    # a 10 day mean/std starting from 10 days ago.  
    prices = history(20, '1d', 'price').copy()

    # Each column in a DataFrame is a pandas Series, and  
    # Series / Series does the division pairwise.  
    prices['ratio'] = prices[stockA] / prices[stockB]  
    # rolling_mean(series, 10) gives us the mean over the last 10 entries for each entry  
    # in the series.  Note that the first 10 entries will be NaN, because we don't have  
    # a long enough window to compute the full mean.  
    prices['historical_ratio'] = pd.rolling_mean(prices['ratio'], 10)  
    # rolling_std is analogous to rolling_mean.  
    prices['ratio_std'] = pd.rolling_std(prices['ratio'], 10)

    # Truncate the prices dataframe to the length we actually care about,  
    # which is the last 10 entries.  
    prices = prices.iloc[-10:]  
    log.info(prices)  
    log.info(list(prices.columns.values))

    # Do stuff with our calculated DataFrame!  
    # ****Trading Logic****  
    # Get last period ratio  
    ratio=prices['ratio'][-1]  
    record(ratio=ratio)  
    # Get last period rolling ratio & standard deviation of rolling ratio  
    historical_ratio=prices['historical_ratio'][-1]  
    ratio_std=prices['ratio_std'][-1]  
    upper=historical_ratio+ratio_std*1.5  
    lower=historical_ratio+ratio_std*-1.5  
    record(historical_ratio=historical_ratio)  
    record(upper=upper)  
    record(lower=lower)  


# Determines if it is a new day  
def newDay(context):  
    new_day = False  
    if context.exchange_time.hour == 9 and context.exchange_time.minute == 31:  
        new_day=True  
    return new_day  
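The band logic at the end of handle_data (mean ± 1.5 standard deviations of the ratio) can be sanity-checked in plain pandas, with made-up prices standing in for the history() data:

```python
import pandas as pd

# Synthetic prices for two hypothetical stocks A and B.
prices = pd.DataFrame({
    'A': [100.0, 102.0, 101.0, 103.0, 104.0],
    'B': [50.0, 50.5, 50.0, 51.0, 51.5],
})

ratio = prices['A'] / prices['B']       # pairwise Series division
historical_ratio = ratio.mean()
ratio_std = ratio.std()
upper = historical_ratio + ratio_std * 1.5
lower = historical_ratio - ratio_std * 1.5
# The band is symmetric: upper - lower == 3 * ratio_std.
```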

Awesome! Glad to be of help.

It's also worth noting that another (likely more efficient) way to do this would be

prices.ix[-1, 'historical_ratio']
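A note for readers on current pandas: .ix was later removed from the library. The closest modern equivalents of that lookup, sketched with synthetic data:

```python
import pandas as pd

idx = pd.date_range('2014-04-30', periods=2)
prices = pd.DataFrame({'historical_ratio': [2.0, 1.999622]}, index=idx)

# .ix no longer exists; pick the last value positionally or by label:
last_value = prices['historical_ratio'].iloc[-1]               # positional
same_value = prices.loc[prices.index[-1], 'historical_ratio']  # label-based
```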