Algorithm takes time to warm up?

Hello,

I'm fairly new to Quantopian and taught myself Python, so I'm not the most experienced programmer.
My algorithm has hit a roadblock that I do not understand.

I have isolated the problematic code:

    closes_15m = data.history(context.spylong,'close',390,'1m').resample('15T', label = 'right').mean() #.last()
    highs_15m = data.history(context.spylong,'high',390,'1m').resample('15T', label = 'right').mean() #.last()
    lows_15m = data.history(context.spylong,'low',390,'1m').resample('15T', label = 'right').mean() #.last()

    ema_8 = talib.EMA(closes_15m, timeperiod=8)
    ema_20 = talib.EMA(closes_15m, timeperiod=20)

    ema_high = talib.EMA(highs_15m, timeperiod=14)
    ema_low = talib.EMA(lows_15m, timeperiod=14)

After some scrutinizing, I found two issues so far:

1) When I run the algorithm and have it print highs_15m, it prints highs for bars after the market close. At timestamps past 16:00, such as 16:15:00, it gives me a high value, which I guess is an after-hours bar's high? I don't understand why it is calculating after-hours highs, and I would like it not to.

2) When I run the algorithm and have it print ema_high, every trading day it waits 13 bars (warming up, I presume; each bar is 15 minutes) and starts printing values at the end of the 14th bar. Then, at some point (if I recall correctly, about 3 bars before the market close), it starts printing NaNs, and it keeps printing a lot of them, even well after the close.

Here is the log output for phenomenon 1:

2017-07-06 09:45 PRINT 2017-07-05 14:00:00+00:00 34.291923
2017-07-05 14:15:00+00:00 34.195533
2017-07-05 14:30:00+00:00 34.139867
2017-07-05 14:45:00+00:00 34.266929
2017-07-05 15:00:00+00:00 34.385571
2017-07-05 15:15:00+00:00 34.447071
2017-07-05 15:30:00+00:00 34.466000
2017-07-05 15:45:00+00:00 34.427889
2017-07-05 16:00:00+00:00 34.400000
2017-07-05 16:15:00+00:00 34.386375
2017-07-05 16:30:00+00:00 34.448100
2017-07-05 16:45:00+00:00 34.466385
2017-07-05 17:00:00+00:00 34.497083
2017-07-05 17:15:00+00:00 34.515000
2017-07-05 17:30:00+00:00 34.543727
2017-07-05 17:45:00+00:00 34.543143
2017-07-05 18:00:00+00:00 34.503667
2017-07-05 18:15:00+00:00 34.511538
2017-07-05 18:30:00+00:00 34.526667
2017-07-05 18:45:00+00:00 34.600000
2017-07-05 19:00:00+00:00 34.528273
2017-07-05 19:15:00+00:00 34.463615
2017-07-05 19:30:00+00:00 34.506727
2017-07-05 19:45:00+00:00 34.551308
2017-07-05 20:00:00+00:00 34.566000
2017-07-05 20:15:00+00:00 34.580000
2017-07-05...
2017-07-06 10:00 PRINT 2017-07-05 14:15:00+00:00 34.190929
2017-07-05 14:30:00+00:00 34.139867
2017-07-05 14:45:00+00:00 34.266929
2017-07-05 15:00:00+00:00 34.385571
2017-07-05 15:15:00+00:00 34.447071
2017-07-05 15:30:00+00:00 34.466000
2017-07-05 15:45:00+00:00 34.427889
2017-07-05 16:00:00+00:00 34.400000
2017-07-05 16:15:00+00:00 34.386375
2017-07-05 16:30:00+00:00 34.448100
2017-07-05 16:45:00+00:00 34.466385
2017-07-05 17:00:00+00:00 34.497083
2017-07-05 17:15:00+00:00 34.515000
2017-07-05 17:30:00+00:00 34.543727
2017-07-05 17:45:00+00:00 34.543143
2017-07-05 18:00:00+00:00 34.503667
2017-07-05 18:15:00+00:00 34.511538
2017-07-05 18:30:00+00:00 34.526667
2017-07-05 18:45:00+00:00 34.600000
2017-07-05 19:00:00+00:00 34.528273
2017-07-05 19:15:00+00:00 34.463615
2017-07-05 19:30:00+00:00 34.506727
2017-07-05 19:45:00+00:00 34.551308
2017-07-05 20:00:00+00:00 34.566000
2017-07-05 20:15:00+00:00 34.580000
2017-07-05 20:30:00+00:00 NaN
2017-07-05...
2017-07-06 10:15 PRINT 2017-07-05 14:30:00+00:00 34.139143
2017-07-05 14:45:00+00:00 34.266929
2017-07-05 15:00:00+00:00 34.385571

Here is the log output for phenomenon 2:

2017-07-06 09:45 PRINT [ nan nan nan nan nan
nan nan nan nan nan
nan nan nan 34.38098045 34.40268003
34.42140841 34.43237617 34.44293115 34.45409588 34.47354976
34.48084616 34.47854872 34.48230586 34.49150611 34.50143863
34.51191348 nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
nan nan ...
2017-07-06 10:00 PRINT [ nan nan nan nan nan
nan nan nan nan nan
nan nan nan 34.39863756 34.41790493
34.42933983 34.44029965 34.45181525 34.47157322 34.47913315
34.47706411 34.4810192 34.490391 34.5004722 34.51107591
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
nan nan ...
2017-07-06 10:15 PRINT [ nan nan nan nan nan
nan nan nan nan nan
nan nan nan 34.42374402 34.43440037
34.44468545 34.45561628 34.47486744 34.48198815 34.47953844
34.48316362 34.4922495 34.5020829 34.51247185 nan
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
nan nan ...

Please help as I am absolutely and utterly confused and without any inkling of why or how this is happening.
Thanks in advance!


Maybe take a look at this thread, which addresses the issues you are seeing: https://www.quantopian.com/posts/how-to-use-the-resample-correctly

Basically, you first need to make sure your 'buckets' are defined correctly. They should be defined like this:


    resample_15m_mean = prices_1m.resample('15T', label='right', closed='right').mean()

Next, remember that the times reported in the result are in UTC. US markets are open from 09:30 to 16:00 Eastern time (EDT or EST, depending on the time of year). On July 5, 2017 (the date of your data), the markets were open from 13:30 to 20:00 UTC. Also be aware that there is no bar stamped 13:30, the opening minute; the available data runs from 13:31 to 20:00 UTC.

Those two should clear up your issue #1.
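A quick pandas check makes the UTC point concrete: 16:15 UTC in the first log is 12:15 in New York, the middle of the session. This is a minimal sketch on synthetic timestamps; 'America/New_York' is used as the exchange time zone, and the 09:45-16:00 filter window is only an illustration of how one might keep regular-session 15-minute buckets.

```python
import pandas as pd

# Timestamps in the same UTC form as the log above.
utc_stamps = pd.DatetimeIndex(
    ["2017-07-05 13:31", "2017-07-05 16:15", "2017-07-05 20:00"], tz="UTC"
)

# Convert to exchange time. July 2017 is on daylight time, so New York is UTC-4.
eastern = utc_stamps.tz_convert("America/New_York")
for utc, et in zip(utc_stamps, eastern):
    print(utc, "->", et)
# 2017-07-05 13:31:00+00:00 -> 2017-07-05 09:31:00-04:00
# 2017-07-05 16:15:00+00:00 -> 2017-07-05 12:15:00-04:00
# 2017-07-05 20:00:00+00:00 -> 2017-07-05 16:00:00-04:00

# Hypothetical 15-minute series covering 12:00-22:00 UTC (08:00-18:00 ET).
bars_15m = pd.Series(
    1.0,
    index=pd.date_range("2017-07-05 12:00", "2017-07-05 22:00", freq="15min", tz="UTC"),
)

# Keep only buckets whose New York wall-clock label is 09:45-16:00 (both ends inclusive).
session = bars_15m.tz_convert("America/New_York").between_time("09:45", "16:00")
print(len(session))  # 26
```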

Issue #2 is simply a result of the resample method returning NaN for every time slot that has no data. resample builds a bucket for every 15-minute interval between the first and last timestamps, so all the time the market was closed looks like missing data and gets filled with NaN. The talib functions in particular do not deal well with NaN values: an EMA is computed recursively from its previous value, so once a NaN enters the input, every later output is NaN too. That is why your ema_high turns to NaN at the overnight gap and stays that way. You will need to drop those rows or deal with them in some other manner. This is shown in the samples in the post above.
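The overnight gap can be reproduced with synthetic data. This sketch assumes a 390-minute window requested at 09:45 ET (13:45 UTC) on 2017-07-06, so it holds the last 375 one-minute bars of 2017-07-05 plus the first 15 bars of 2017-07-06; the prices are made up. It uses '15min', the same offset as the '15T' alias in this thread (newer pandas releases deprecate 'T').

```python
import numpy as np
import pandas as pd

# Hypothetical window: 375 bars from 2017-07-05 and 15 bars from 2017-07-06 (UTC).
yesterday = pd.date_range("2017-07-05 13:46", "2017-07-05 20:00", freq="1min", tz="UTC")
today = pd.date_range("2017-07-06 13:31", "2017-07-06 13:45", freq="1min", tz="UTC")
idx = yesterday.append(today)
highs_1m = pd.Series(np.linspace(34.0, 34.6, len(idx)), index=idx)

# resample() emits one bucket for every 15-minute slot between the first and
# last timestamps, including the overnight hours that have no data at all.
highs_15m = highs_1m.resample("15min", label="right", closed="right").mean()
print(len(highs_1m), len(highs_15m), int(highs_15m.isna().sum()))  # 390 96 70

# Dropping the empty buckets leaves only real 15-minute bars.
highs_clean = highs_15m.dropna()
print(len(highs_clean))  # 26
print(highs_clean.index[0], highs_clean.index[-1])
# 2017-07-05 14:00:00+00:00 2017-07-06 13:45:00+00:00
```

Of the 96 buckets, 70 are overnight slots with no bars in them; those are the NaN rows that end up in front of talib if they are not removed.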

It's always a very good idea to work through the data in a notebook BEFORE coding, to get a clear understanding of it. Attached is a sample notebook that fetches and manipulates the data so you can see exactly what the resample method is doing.

Good luck.


Hello,

I really appreciate the speedy reply; thank you for your assistance.
I've made the changes to resample in my code but I am still running into some peculiar bugs.

2017-07-06 09:45  PRINT 2017-07-03 17:00:00+00:00    34.492667  
2017-07-03 17:15:00+00:00          NaN  
2017-07-03 17:30:00+00:00          NaN  
2017-07-03 17:45:00+00:00          NaN  
2017-07-03 18:00:00+00:00          NaN  
2017-07-03 18:15:00+00:00          NaN  
2017-07-03 18:30:00+00:00          NaN  
2017-07-03 18:45:00+00:00          NaN  
2017-07-03 19:00:00+00:00          NaN  
2017-07-03 19:15:00+00:00          NaN  
2017-07-03 19:30:00+00:00          NaN  
2017-07-03 19:45:00+00:00          NaN  
2017-07-03 20:00:00+00:00          NaN  
2017-07-03 20:15:00+00:00          NaN  
2017-07-03 20:30:00+00:00          NaN  
2017-07-03 20:45:00+00:00          NaN  
2017-07-03 21:00:00+00:00          NaN  
2017-07-03 21:15:00+00:00          NaN  
2017-07-03 21:30:00+00:00          NaN  
2017-07-03 21:45:00+00:00          NaN  
2017-07-03 22:00:00+00:00          NaN  
2017-07-03 22:15:00+00:00          NaN  
2017-07-03 22:30:00+00:00          NaN  
2017-07-03 22:45:00+00:00          NaN  
2017-07-03 23:00:00+00:00          NaN  
2017-07-03 23:15:00+00:00          NaN  
2017-07-03...  

My algorithm is still calculating after-hours 15m prices. Also, for the regular-session times of the previous trading day, it returns NaNs.
How do I stop it from retrieving after-hours prices, and what do you make of the algorithm returning NaNs for price data during market hours?

Again, I really appreciate all the help.

Please attach a notebook or algorithm. It's much easier to help debug that way.

Hello,

I believe I have attached the algorithm. Here you go:

# Backtest ID: 5962f0c221f0506b5d024e87
There was a runtime error.

When working with security data, always make sure to account for NaN values. They pop up everywhere.

First, correct the resample calls to include the closed='right' parameter.

    closes_15m = data.history(context.spylong,'close',390,'1m').resample('15T', label = 'right', closed = 'right').mean()  
    highs_15m = data.history(context.spylong,'high',390,'1m').resample('15T', label = 'right', closed = 'right').mean()  
    lows_15m = data.history(context.spylong,'low',390,'1m').resample('15T', label = 'right', closed = 'right').mean()  

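This isn't causing a particular error, but without it the buckets are shifted by one minute: the bucket labeled 14:15 holds the bars stamped 14:00 through 14:14 instead of 14:01 through 14:15, and the day's final 20:00 bar ends up alone in an extra bucket labeled 20:15 (the lone 20:15 value in the first log).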
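The one-minute shift is easy to see by counting how many bars land in each bucket. This sketch uses one synthetic session of 1-minute bars stamped 13:31 through 20:00 UTC, as described earlier in the thread; the values are irrelevant, only the timestamps matter.

```python
import pandas as pd

# One hypothetical session of 1-minute bars, 13:31-20:00 UTC on 2017-07-05.
idx = pd.date_range("2017-07-05 13:31", "2017-07-05 20:00", freq="1min", tz="UTC")
bars = pd.Series(1.0, index=idx)

# Count the 1-minute bars that fall into each 15-minute bucket.
left = bars.resample("15min", label="right").count()  # closed='left' is the default here
right = bars.resample("15min", label="right", closed="right").count()

print(len(left), left.iloc[0], left.iloc[-1], left.index[-1])
# 27 14 1 2017-07-05 20:15:00+00:00
print(len(right), right.iloc[0], right.iloc[-1], right.index[-1])
# 26 15 15 2017-07-05 20:00:00+00:00
```

With closed='right', every bucket holds exactly 15 bars and the last one is labeled 20:00; with the default, the first bucket is one bar short and a 27th bucket appears after the close.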

Second, the resample method adds a lot of NaNs for the time between sessions when the market isn't open. Because you always request 390 minutes of data, any call made before the close gets part of today's session and part of the previous one, and resample fills the overnight gap in between with NaN values. So you need to account for those somehow. The simplest option is to remove them. This may at times create other issues, but to start with, simply apply the dropna() method after the resample method.

So, change the talib functions to look like this

    ema_8 = talib.EMA(closes_15m.dropna(), timeperiod=8)  
    ema_20 = talib.EMA(closes_15m.dropna(), timeperiod=20)

    ema_high = talib.EMA(highs_15m.dropna(), timeperiod=14)  
    ema_low = talib.EMA(lows_15m.dropna(), timeperiod=14)

Your logs should now show the first 13 values of ema_high as NaN. This is normal behavior for the talib functions: any time they do not have enough data (i.e., only 13 values when a 14-period average needs 14), they return NaN.
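Both the warm-up and the way a single NaN wipes out everything after it follow from how an EMA is computed. Below is a simplified pure-Python EMA, not TA-Lib itself. It assumes TA-Lib's default seeding (the first output is the simple average of the first `period` inputs, then the recursion with alpha = 2 / (period + 1)) and ignores TA-Lib's special handling of leading NaNs. The input series is made up.

```python
import math


def simple_ema(values, period):
    """Simplified EMA for illustration (assumed to mirror TA-Lib's SMA seeding).

    Outputs before index period - 1 are NaN (the warm-up); after that,
    each value is alpha * x + (1 - alpha) * previous_ema.
    """
    alpha = 2.0 / (period + 1)
    out = [math.nan] * len(values)
    if len(values) < period:
        return out  # not enough data: everything stays NaN
    prev = sum(values[:period]) / period  # seed with the simple average
    out[period - 1] = prev
    for i in range(period, len(values)):
        # A NaN input makes prev NaN, and it stays NaN from then on.
        prev = alpha * values[i] + (1 - alpha) * prev
        out[i] = prev
    return out


# 26 hypothetical 15-minute highs.
highs = [34.0 + 0.01 * i for i in range(26)]
ema_14 = simple_ema(highs, 14)
print(sum(math.isnan(v) for v in ema_14))  # 13 warm-up NaNs, then 13 values

# Same data with one NaN bucket left in (as happens without dropna()).
gapped = highs[:20] + [math.nan] + highs[20:]
ema_gapped = simple_ema(gapped, 14)
print(sum(math.isnan(v) for v in ema_gapped))  # 20: 13 warm-up + 7 from the NaN onward
```

The warm-up costs period - 1 outputs no matter what, so the number of usable EMA values depends only on how many clean bars go in.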

One should also be careful with NaN values returned for the 'high', 'low', and 'close' fields by the data.history method. If a security didn't trade during a specific minute, the values for those fields will be NaN. This is one advantage of using the 'price' field: it returns the last closing price, forward-filled, so it never returns NaN.
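The difference between a field with gaps and a forward-filled one can be sketched in pandas. This assumes the 'price' behavior described above amounts to carrying the last traded close forward, and approximates it with Series.ffill() on made-up data; it is not the backtester's own implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical 1-minute closes; no trades at 13:33 or 13:35, so those closes are NaN.
idx = pd.date_range("2017-07-05 13:31", periods=5, freq="1min", tz="UTC")
close = pd.Series([34.50, 34.52, np.nan, 34.55, np.nan], index=idx)

# Forward-fill: each missing minute takes the last known close.
price_like = close.ffill()
print(close.isna().sum(), price_like.isna().sum())  # 2 0
print(price_like.tolist())  # [34.5, 34.52, 34.52, 34.55, 34.55]
```

One caveat of this approximation: ffill() cannot fill a NaN at the very start of a series, since there is no earlier value to carry forward.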

The algorithm should work with these changes.

Thank you very much for this; sorry for the workload.

Is there any way I could have the algo retrieve the regular-session 15m bars of the previous trading day, so that the algorithm doesn't have to "warm up" for 13 bars, generating NaNs as it does so?

For all your help, I'd be happy to tip you some dogecoin if you'll accept it.