Getting rid of talib.EMA() warm-up period?

Hello,

I am trying to abide by this guideline, which I found in a list of qualities of robust algorithms somewhere on Quantopian.

"Verify the algo will backfill any historical data needed to set initial parameters. The algorithm should begin trading immediately, without a warm-up period"

Except I'm running into a roadblock. Another member of the community has told me that it is standard for talib functions to return NaNs until they have gathered enough data to start producing valid numerical values. How can I have my algorithm backfill historical data (perhaps using the last 15m bars of the previous trading day) so that my 20-period talib.EMA produces numerical values right from the start of the trading day instead of idling for period - 1 bars?

Thank you for any and all help.

"""
This is a template algorithm on Quantopian for you to adapt and fill in.
"""
from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import AverageDollarVolume
from quantopian.pipeline.filters.morningstar import Q1500US
import talib

def initialize(context):
    context.spylong = sid(37514)
    context.spyshort = sid(38532)

    total_minutes = (6 * 60) + 30
    for i in range(1,total_minutes):
        if i % 15 == 0:
            schedule_function(my_rebalance, date_rules.every_day(), time_rules.market_open(minutes=i), True)

    schedule_function(my_record_vars, date_rules.every_day(), time_rules.market_close())

def make_pipeline():
    pass

    pass

def my_assign_weights(context, data):
    closes_15m = data.history(context.spylong,'close',390,'1m').resample('15T', label='right', closed='right').mean() #.last()
    ema_20 = talib.EMA(closes_15m.dropna(), timeperiod=20)
    return ema_20

def my_rebalance(context,data):
    x = my_assign_weights(context,data)
    print x

def my_record_vars(context, data):
    pass

def handle_data(context,data):
    pass

7 responses

    closes_15m = data.history(context.spylong,'close',390,'1m').resample('15T', label='right', closed='right').mean()



The 'data.history' method above will get the last 390 minutes of data regardless of whether that data is from today, yesterday or, in the case of a Monday, the previous Friday. There are typically 390 minutes in a trading day (except half days). So, except for the very last minute of the day, it will always get some data from today and some from the previous trading day. If that previous trading day was a half day, it could even return data spanning three trading days.
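
As a rough illustration of how that 390-minute window splits across sessions, here is a small plain-Python sketch. The helper split_window is hypothetical (it is not part of the Quantopian API), and it assumes regular 390-minute sessions with 15-minute bars aligned to the open, so half days are not handled.

```python
import math


def split_window(minutes_since_open, window=390, bar=15):
    """Return (bars_today, bars_previous_session) covered by a trailing window.

    Hypothetical helper for illustration only. Assumes regular 390-minute
    sessions whose open and close fall on 15-minute boundaries.
    """
    minutes_today = min(minutes_since_open, window)
    minutes_previous = window - minutes_today
    # A partially filled bucket still produces one resampled bar.
    bars_today = int(math.ceil(minutes_today / float(bar)))
    bars_previous = int(math.ceil(minutes_previous / float(bar)))
    return bars_today, bars_previous


print(split_window(15))   # first scheduled call of the day -> (1, 25)
print(split_window(195))  # mid-session -> (13, 13)
print(split_window(375))  # last scheduled call of the day -> (25, 1)
```

Under these assumptions every scheduled call sees 26 bars in total, which is more than the 20 bars a 20-period EMA needs before it can return a number.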

The code then goes on executing the following function

    ema_20 = talib.EMA(closes_15m.dropna(), timeperiod=20)



This talib.EMA function doesn't return values for the first 19 points because it requires 20 points of data to produce its first ema. This may be considered 'warming up' for the function, but there isn't any 'warm up' required in your algorithm. There will be valid data the very first time 'my_assign_weights' is executed (which would be 15 minutes after market open on the first day of the backtest or live launch). Remember that 'ema_20' is a series of values. You probably want the latest value, so check 'ema_20[-1]'. That will contain the ema as of the most recent 15 minute mean close. On that first call it is built from one 15 minute value from the current day, with the rest of the values in the window coming from the previous trading day(s).

Is there any particular reason this is being resampled every 15 minutes and then taking the mean value of the closes? Wouldn't simply using the minute data without resampling be appropriate? In either case, 'warm up' isn't required.

Hope that all makes sense?

Hello again Dan Whitnable,

It makes sense, but I just can't imagine that live traders who use EMAs in their strategy would settle for talib.EMA's limitation of having to wait n - 1 bars until it starts to generate usable numerical values for signals.

I guess the question I really mean to ask is: Is there any way I can make this talib.EMA function use the last 19 or so points of data (15m bars) from the previous trading day to immediately start to produce numerical values that can be used for entry triggers instead of wasting time waiting for the first 19 points?

After some thought, I figure that if there is no way to have the talib.EMA function stop waiting n - 1 bars to "warm up", my only alternative would be to create my own function to generate the EMA values (I'm fairly certain this can be done, although I am not 100% sure).

On a final note, no, there is no particular reasoning behind taking the mean; I just put it that way after seeing someone provide that code in another post here at Quantopian. I realize now that .last() is probably the right method to end the chain with, although I'm not 100% sure of that either. I cannot use 1m data; I absolutely need to resample the data into 15m bars because I am moving a strategy from QuantConnect over to here, and that strategy trades on the 15m chart.
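
To make the difference between .mean() and .last() concrete, here is a plain-Python stand-in for the resample step. It is only a sketch: the prices are made up, to_bars is a hypothetical helper rather than a pandas or Quantopian function, and it assumes the minute closes are contiguous and start on a bar boundary.

```python
def to_bars(minute_closes, bar=15):
    """Group consecutive minute closes into fixed-size buckets (hypothetical helper)."""
    return [minute_closes[i:i + bar] for i in range(0, len(minute_closes), bar)]


# Synthetic prices for illustration: a steady climb over 30 minutes.
minute_closes = [100.0 + 0.1 * i for i in range(30)]

buckets = to_bars(minute_closes)
bar_means = [sum(b) / len(b) for b in buckets]  # analogous to .mean()
bar_closes = [b[-1] for b in buckets]           # analogous to .last()

for mean_value, close_value in zip(bar_means, bar_closes):
    print(round(mean_value, 2), round(close_value, 2))
```

In a trending stretch the mean lags the final price of each bucket, so a strategy ported from a 15m chart, where the bar close is the last trade of the bar, would normally want the .last() behaviour.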

Thank you for all your help.

@ Damon A

You asked "I guess the question I really mean to ask is: Is there any way I can make this talib.EMA function use the last 19 or so points of data (15m bars) from the previous trading day to immediately start to produce numerical values that can be used for entry triggers instead of wasting time waiting for the first 19 points?"

Let's take a look at the actual data from the first trade of the algorithm. Below is the data which is being used by the talib.EMA function as of 13:45 UTC (ie 9:45 EDT) 15 minutes after the market opens on 2017-06-12 (ie the first day of the backtest).

closes_15m.dropna(): Series
Timestamp('2017-06-09 14:00:00+0000', tz='UTC'): 34.8478
Timestamp('2017-06-09 14:15:00+0000', tz='UTC'): 34.9386666667
Timestamp('2017-06-09 14:30:00+0000', tz='UTC'): 35.0237333333
Timestamp('2017-06-09 14:45:00+0000', tz='UTC'): 35.0624666667
Timestamp('2017-06-09 15:00:00+0000', tz='UTC'): 35.0512
Timestamp('2017-06-09 15:15:00+0000', tz='UTC'): 35.0779285714
Timestamp('2017-06-09 15:30:00+0000', tz='UTC'): 35.0482
Timestamp('2017-06-09 15:45:00+0000', tz='UTC'): 35.0217857143
Timestamp('2017-06-09 16:00:00+0000', tz='UTC'): 35.0204545455
Timestamp('2017-06-09 16:15:00+0000', tz='UTC'): 34.9867142857
Timestamp('2017-06-09 16:30:00+0000', tz='UTC'): 34.862
Timestamp('2017-06-09 16:45:00+0000', tz='UTC'): 34.7551428571
Timestamp('2017-06-09 17:00:00+0000', tz='UTC'): 34.7873846154
Timestamp('2017-06-09 17:15:00+0000', tz='UTC'): 34.7462142857
Timestamp('2017-06-09 17:30:00+0000', tz='UTC'): 34.6747333333
Timestamp('2017-06-09 17:45:00+0000', tz='UTC'): 34.6173333333
Timestamp('2017-06-09 18:00:00+0000', tz='UTC'): 34.4894
Timestamp('2017-06-09 18:15:00+0000', tz='UTC'): 34.4739333333
Timestamp('2017-06-09 18:30:00+0000', tz='UTC'): 34.5053076923
Timestamp('2017-06-09 18:45:00+0000', tz='UTC'): 34.2976666667
Timestamp('2017-06-09 19:00:00+0000', tz='UTC'): 34.0244666667
Timestamp('2017-06-09 19:15:00+0000', tz='UTC'): 34.2133333333
Timestamp('2017-06-09 19:30:00+0000', tz='UTC'): 34.3479333333
Timestamp('2017-06-09 19:45:00+0000', tz='UTC'): 34.2852666667
Timestamp('2017-06-09 20:00:00+0000', tz='UTC'): 34.3806666667
Timestamp('2017-06-12 13:45:00+0000', tz='UTC'): 34.3113333333



Notice that the last value (34.3113333333) is the value from the first 15 minutes of the current trading day (2017-06-12) and the other 25 values are data from the previous trading day (2017-06-09). The talib.EMA function takes the values you pass and calculates a series of exponential moving averages. The output series length of the talib functions always equals the input series length. In this case the input has a length of 26, so the output has a length of 26. Here is the 'ema_20' output from this input.


ema_20: ndarray
0: nan
1: nan
2: nan
3: nan
4: nan
5: nan
6: nan
7: nan
8: nan
9: nan
10: nan
11: nan
12: nan
13: nan
14: nan
15: nan
16: nan
17: nan
18: nan
19: 34.814403295
20: 34.7391712352
21: 34.689091435
22: 34.6566001872
23: 34.62123509
24: 34.5983238116
25: 34.5709913851



The talib.EMA function, with a timeperiod of 20, needs 20 data points before it can produce its first value. Because of this, the first 19 points in the ema_20 output series don't have enough data to calculate and are therefore returned as NaN. The 20th point (ie index 19) is the simple average of the first 20 prices, and each subsequent point updates the previous ema with the newest price using a smoothing factor of 2/(20+1). The last ema in the series (ema_20[25] or ema_20[-1]) therefore combines the first 15 minutes of the current trading day (2017-06-12) with the values from the previous day, weighting the most recent bars most heavily. This I believe is exactly what you asked for?
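
For anyone who wants to check the numbers by hand, here is a plain-Python sketch that reproduces the ema_20 values listed above from the 26 closes. The function ema_sma_seeded is a hypothetical stand-in, not talib code; it assumes the first EMA value is seeded with the simple average of the first timeperiod inputs and that the smoothing factor is 2 / (timeperiod + 1), which is consistent with the debugger output in this thread.

```python
import math


def ema_sma_seeded(values, timeperiod):
    """EMA seeded with a simple average of the first `timeperiod` values.

    Hypothetical re-implementation for illustration; the output has the same
    length as the input, with NaN where not enough data is available yet.
    """
    out = [float('nan')] * len(values)
    if len(values) < timeperiod:
        return out
    k = 2.0 / (timeperiod + 1)
    prev = sum(values[:timeperiod]) / float(timeperiod)
    out[timeperiod - 1] = prev
    for i in range(timeperiod, len(values)):
        prev = prev + k * (values[i] - prev)  # move toward the newest price
        out[i] = prev
    return out


# The 26 resampled closes quoted above: 25 from 2017-06-09, 1 from 2017-06-12.
closes_15m = [
    34.8478, 34.9386666667, 35.0237333333, 35.0624666667, 35.0512,
    35.0779285714, 35.0482, 35.0217857143, 35.0204545455, 34.9867142857,
    34.862, 34.7551428571, 34.7873846154, 34.7462142857, 34.6747333333,
    34.6173333333, 34.4894, 34.4739333333, 34.5053076923, 34.2976666667,
    34.0244666667, 34.2133333333, 34.3479333333, 34.2852666667,
    34.3806666667, 34.3113333333,
]

ema_20 = ema_sma_seeded(closes_15m, 20)
valid = [v for v in ema_20 if not math.isnan(v)]

print(len(ema_20), len(valid))  # 26 outputs, 7 of them numeric
print(ema_20[-1])               # the value to trade on at 13:45 UTC
```

The last element agrees with the 34.5709913851 shown in the debugger to within rounding of the pasted inputs, so the first scheduled call of the day already has a usable EMA.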

Not sure if you realize, but all backtest variables can be inspected real time using the debug features of the backtester. Take a look here https://www.quantopian.com/help#debugger. The above numbers were simply cut and pasted from the debugger window. It's a great way to see what's going on.

Good luck.

Hello,

I finally understand what you mean. My apologies; I've changed line 35 in the code to print x[-1], and I see now that I was mistaking the output of print ema_20 for the current ema_20 value, which is ema_20[-1].

Thank you very, very much for all this help; this will go a long way in porting my QuantConnect algo to Quantopian.
Again, if you are willing to accept, I'm more than happy to tip you some dogecoin or bitcoin.

Glad I could help. No tip required.

The talib functions take a bit of getting used to. I was surprised the first time using them that the result is typically NOT a single number but a series. Most of the time the only value of interest is the last one in the series.

You and me both, the series output has already caught me off-guard a couple of times.

I have one last question, not really related to talib.EMA:

What helped you become skilled at building algorithms on Quantopian when you first started?

I'm sure diving right into coding in a baptism-by-fire approach played a role, forcing you to learn as you went; this is my current approach.
But did you take time out of development to really sit down and study pandas/numpy documentation?
Are there any online courses / books / sites-sources that you recommend?

Thanks again!

did you take time out of development to really sit down and study pandas/numpy documentation?

Debugger for learning pandas. Jumping in here. Personally, I learn best mostly by doing rather than reading. Using my own backtests or cloned ones, I recall ramping up quickly on pandas by setting breakpoints in before_trading_start and messing around in the console with the pipeline output. One thing to know there: currently, if you type the name of the output to look at it and it is very large, you'll see an error that goes by too fast to read and the debugger will bail out. You can get around that by limiting it with something like output.head(), output.tail() or output.tail(10). More precisely, my process is often:

1. Want to do something and stuck (the specific goal seems to help a lot)
2. Google for similar using some terms, often a phrase in quotes, and site:quantopian.com
3. Make some progress
4. Become really stuck, like on something involving numpy for example