Getting rid of talib.EMA() warm-up period?

Hello,

I am trying to abide by this guideline, which comes from a list of qualities of robust algorithms that I found somewhere on Quantopian.

"Verify the algo will backfill any historical data needed to set initial parameters. The algorithm should begin trading immediately, without a warm-up period"

Except I'm running into a roadblock. Another member of the community has told me that it is standard for talib functions to return NaNs until they have gathered enough data to start producing valid numerical values. How can I have my algorithm backfill historical data (perhaps use the last 15m bars of the previous trading day) so that my 20 period talib.EMA produces numerical values right from the start of the trading day instead of idling for period - 1 bars?

Thank you for any and all help.

7 responses

Your algorithm is already pulling in data from the previous day.

    closes_15m = data.history(context.spylong,'close',390,'1m').resample('15T', label='right', closed='right').mean() 

The 'data.history' method above will get the last 390 minutes of data regardless of whether that data is from today, yesterday or, in the case of a Monday, the previous Friday. There are typically 390 minutes in a trading day (except half days). So, except for the very last minute of the day, it will always get some data from today and some from the previous trading day. If that previous trading day was a half day, it could even return data spanning three trading days.
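To make the window concrete, here is a minimal sketch that uses synthetic minute bars in place of the real 'data.history' output. The index, the made-up values and the use of .last() are assumptions for illustration only. It mimics a call made 15 minutes after the open on a Monday.

```python
import pandas as pd

# Synthetic stand-in for data.history(...): one full Friday session plus
# the first 15 minutes of Monday, labelled by end-of-minute time in UTC.
prev_day = pd.date_range("2017-06-09 13:31", "2017-06-09 20:00", freq="min", tz="UTC")
today = pd.date_range("2017-06-12 13:31", "2017-06-12 13:45", freq="min", tz="UTC")
index = prev_day.append(today)
closes_1m = pd.Series(range(len(index)), index=index, dtype="float64")

# Keep only the most recent 390 minute bars, as a 390-bar history call would.
window = closes_1m.iloc[-390:]

# Resample to 15-minute bars; the empty weekend bins become NaN and are dropped.
closes_15m = window.resample("15min", label="right", closed="right").last().dropna()

print(len(closes_15m))       # number of 15-minute bars in the window
print(closes_15m.index[0])   # first bar comes from the previous session
print(closes_15m.index[-1])  # last bar is the first bar of today
```

In this sketch the window holds 375 minutes from Friday and 15 from Monday, which gives 25 bars from the previous session and one from the current day.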

The code then goes on to execute the following function:

    ema_20 = talib.EMA(closes_15m.dropna(), timeperiod=20)

This talib.EMA function doesn't return values for the first 19 points because it requires 20 points of data to produce its first ema value. This may be considered 'warming up' for the function but there isn't any 'warm up' required in your algorithm. There will be valid data the very first time 'my_assign-weights' is executed (which would be 15 minutes after market open on the first day of the backtest or live launch). Remember that 'ema_20' is a series of values. You probably want the latest value so check 'ema_20[-1]'. That will contain the 20 period ema as of the most recent 15 minute mean close. On the first bar of the day it is built from one 15 minute value from the current day plus values from the previous trading day(s); at least 19 of those (ie 4.75 hours of data) are needed to get a valid number.

Any particular reason the data is being resampled every 15 minutes and the mean of the closes taken? Wouldn't simply using the minute data without resampling be appropriate? In either case 'warm up' isn't required though.

Hope that all makes sense?

Hello again Dan Whitnable,

It makes sense but I just can't imagine that live traders who use EMA's in their strategy would settle for the talib.EMA's limitation of having to wait n - 1 bars until it starts to generate usable numerical values for signals.

I guess the question I really mean to ask is: Is there any way I can make this talib.EMA function use the last 19 or so points of data (15m bars) from the previous trading day to immediately start to produce numerical values that can be used for entry triggers instead of wasting time waiting for the first 19 points?

After some thought, I figure that if there is no way to have the talib.EMA function stop waiting n - 1 bars to "warm up", my only alternative would be to create my own function to generate the EMA values (I'm fairly certain this can be done although I am not 100% sure).

On a final note, no, there is no particular reasoning behind taking the mean; I just put it that way after seeing someone provide that code in another post here at Quantopian. I realize now that .last() is probably the right method to put at the end, although I'm not 100% sure of that either. I cannot use 1m data; I absolutely need to resample the data into 15m bars because I am moving a strategy from QuantConnect over to here and that strategy trades on the 15m chart.
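For reference, the difference between the two endings can be seen on a single synthetic 15m bar. This is a minimal sketch with made-up prices, not data from the algorithm: .last() keeps the close of the final minute in each bar, which is what a 15m chart shows as the bar's close, while .mean() averages all the minute closes in the bar.

```python
import pandas as pd

# Fifteen synthetic one-minute closes that fall into a single 15-minute bar.
index = pd.date_range("2017-06-12 13:31", periods=15, freq="min", tz="UTC")
closes_1m = pd.Series([float(v) for v in range(100, 115)], index=index)

bars = closes_1m.resample("15min", label="right", closed="right")
bar_close = bars.last()  # close of the last minute in the bar
bar_mean = bars.mean()   # average of all minute closes in the bar

print(bar_close.iloc[0])
print(bar_mean.iloc[0])
```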

Thank you for all your help.

@ Damon A

You asked "I guess the question I really mean to ask is: Is there any way I can make this talib.EMA function use the last 19 or so points of data (15m bars) from the previous trading day to immediately start to produce numerical values that can be used for entry triggers instead of wasting time to wait for the first 19 points?"

Yes. Your algorithm already does this (as explained previously).

Let's take a look at the actual data from the first trade of the algorithm. Below is the data which is being used by the talib.EMA function as of 13:45 UTC (ie 9:45 EDT) 15 minutes after the market opens on 2017-06-12 (ie the first day of the backtest).

closes_15m.dropna(): Series  
Timestamp('2017-06-09 14:00:00+0000', tz='UTC'): 34.8478  
Timestamp('2017-06-09 14:15:00+0000', tz='UTC'): 34.9386666667  
Timestamp('2017-06-09 14:30:00+0000', tz='UTC'): 35.0237333333  
Timestamp('2017-06-09 14:45:00+0000', tz='UTC'): 35.0624666667  
Timestamp('2017-06-09 15:00:00+0000', tz='UTC'): 35.0512  
Timestamp('2017-06-09 15:15:00+0000', tz='UTC'): 35.0779285714  
Timestamp('2017-06-09 15:30:00+0000', tz='UTC'): 35.0482  
Timestamp('2017-06-09 15:45:00+0000', tz='UTC'): 35.0217857143  
Timestamp('2017-06-09 16:00:00+0000', tz='UTC'): 35.0204545455  
Timestamp('2017-06-09 16:15:00+0000', tz='UTC'): 34.9867142857  
Timestamp('2017-06-09 16:30:00+0000', tz='UTC'): 34.862  
Timestamp('2017-06-09 16:45:00+0000', tz='UTC'): 34.7551428571  
Timestamp('2017-06-09 17:00:00+0000', tz='UTC'): 34.7873846154  
Timestamp('2017-06-09 17:15:00+0000', tz='UTC'): 34.7462142857  
Timestamp('2017-06-09 17:30:00+0000', tz='UTC'): 34.6747333333  
Timestamp('2017-06-09 17:45:00+0000', tz='UTC'): 34.6173333333  
Timestamp('2017-06-09 18:00:00+0000', tz='UTC'): 34.4894  
Timestamp('2017-06-09 18:15:00+0000', tz='UTC'): 34.4739333333  
Timestamp('2017-06-09 18:30:00+0000', tz='UTC'): 34.5053076923  
Timestamp('2017-06-09 18:45:00+0000', tz='UTC'): 34.2976666667  
Timestamp('2017-06-09 19:00:00+0000', tz='UTC'): 34.0244666667  
Timestamp('2017-06-09 19:15:00+0000', tz='UTC'): 34.2133333333  
Timestamp('2017-06-09 19:30:00+0000', tz='UTC'): 34.3479333333  
Timestamp('2017-06-09 19:45:00+0000', tz='UTC'): 34.2852666667  
Timestamp('2017-06-09 20:00:00+0000', tz='UTC'): 34.3806666667  
Timestamp('2017-06-12 13:45:00+0000', tz='UTC'): 34.3113333333


Notice that the last value (34.3113333333) is the value from the first 15 minutes of the current trading day (2017-06-12) and the other 25 values are data from the previous trading day (2017-06-09). The talib.EMA function takes the values you pass and calculates a series of exponential moving averages. The output series length of the talib functions always equals the input series length. In this case the input has a length of 26 so the output has a length of 26. Here is the 'ema_20' output from this input.


ema_20: ndarray  
0: nan  
1: nan  
2: nan  
3: nan  
4: nan  
5: nan  
6: nan  
7: nan  
8: nan  
9: nan  
10: nan  
11: nan  
12: nan  
13: nan  
14: nan  
15: nan  
16: nan  
17: nan  
18: nan  
19: 34.814403295  
20: 34.7391712352  
21: 34.689091435  
22: 34.6566001872  
23: 34.62123509  
24: 34.5983238116  
25: 34.5709913851

With a timeperiod of 20, the talib.EMA function needs 20 data points before it can produce its first value. Because of this, the first 19 points in the ema_20 output series don't have enough data and are therefore NaN. The 20th point (ie index 19) is seeded with the simple average of the first 20 prices (34.814403295 is exactly that average here), and each later point updates the previous ema with the new price using a smoothing factor of 2/(20+1). The last ema in the series (ema_20[25] or ema_20[-1]) therefore reflects the first 15 minutes of the current trading day (2017-06-12) together with the 25 bars from the previous day, with the most recent bars weighted most heavily. This I believe is exactly what you asked for?
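As a cross-check, the ema_20 numbers above can be reproduced without talib. The sketch below assumes the EMA is seeded with the simple average of the first 20 inputs and then updated recursively with a smoothing factor of 2/(period + 1), which is what the output above is consistent with. The helper name 'ema_sma_seed' is hypothetical, not a talib function; the inputs are the 26 values listed earlier.

```python
import math

def ema_sma_seed(values, period):
    """EMA seeded with the simple average of the first `period` values.

    Returns a list the same length as `values`; the first period - 1
    entries are NaN, mirroring the output shown in this thread.
    """
    out = [math.nan] * len(values)
    if len(values) < period:
        return out  # not enough history for even one value
    alpha = 2.0 / (period + 1)
    ema = sum(values[:period]) / period  # seed: simple moving average
    out[period - 1] = ema
    for i in range(period, len(values)):
        ema += alpha * (values[i] - ema)  # recursive EMA update
        out[i] = ema
    return out

# The 26 resampled closes from the debugger output above.
closes_15m = [
    34.8478, 34.9386666667, 35.0237333333, 35.0624666667, 35.0512,
    35.0779285714, 35.0482, 35.0217857143, 35.0204545455, 34.9867142857,
    34.862, 34.7551428571, 34.7873846154, 34.7462142857, 34.6747333333,
    34.6173333333, 34.4894, 34.4739333333, 34.5053076923, 34.2976666667,
    34.0244666667, 34.2133333333, 34.3479333333, 34.2852666667,
    34.3806666667, 34.3113333333,
]

ema_20 = ema_sma_seed(closes_15m, 20)
print(ema_20[19])  # first valid value, at index 19
print(ema_20[-1])  # latest value, the one to trade on
```

Because each value is built on the previous one, passing more history than the bare minimum reduces the influence of the seed on ema_20[-1].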

Not sure if you realize, but all backtest variables can be inspected real time using the debug features of the backtester. Take a look here https://www.quantopian.com/help#debugger. The above numbers were simply cut and pasted from the debugger window. It's a great way to see what's going on.

Good luck.

Hello,

I finally understand what you mean. My apologies; I've changed line 35 in the code to print x[-1] and I see now that I was mistaking the results of print ema_20 for the current ema_20 value, which is ema_20[-1].

Thank you very, very much for all this help; this will go a long way in porting my QuantConnect algo to Quantopian.
Again, if you are willing to accept, I'm more than happy to tip you some dogecoin or bitcoin.

Glad I could help. No tip required.

The talib functions take a bit of getting used to. I was surprised the first time using them that the result is typically NOT a single number but a series. Most of the time the only value of interest is the last one in the series.
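A small sketch of that habit, in plain Python: take the last element of the indicator output and treat NaN as "not enough history yet". The helper name 'latest_value' is hypothetical and the sample lists are made up to mimic a 20 period output.

```python
import math

def latest_value(series):
    """Return the last element, or None if it is NaN or the series is empty."""
    if len(series) == 0:
        return None
    last = series[-1]
    return None if math.isnan(last) else last

# Output shaped like a 20-period indicator fed 21 points: 19 NaNs, then values.
enough_history = [math.nan] * 19 + [34.814403295, 34.7391712352]
# Output when fewer than 20 points were passed in: all NaN.
too_short = [math.nan] * 10

print(latest_value(enough_history))
print(latest_value(too_short))
```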

You and me both, the series output has already caught me off-guard a couple of times.

I have one last question, not really related to talib.EMA:

What helped you become skilled at building algorithms on Quantopian when you first started?

I'm sure diving right into coding in a baptism-by-fire approach played a role, forcing you to learn as you went; this is my current approach.
But did you take time out of development to really sit down and study pandas/numpy documentation?
Are there any online courses / books / sites / sources that you recommend?

Thanks again!

"did you take time out of development to really sit down and study pandas/numpy documentation?"

Debugger for learning pandas. Jumping in here. Personally, I learn best mostly by doing rather than reading. Using my own backtests or cloned ones, I recall ramping up quickly on pandas by setting breakpoints in before_trading_start and messing around in the console with the pipeline output. One thing to know: currently, if you type a variable name to look at the output and it is very large, you'll see an error that goes by too fast to read and the debugger will bail out. You can get around that by limiting it, with something like output.head(), output.tail() or output.tail(10). More precisely, my process is often:

  1. Want to do something and stuck (the specific goal seems to help a lot)
  2. Google for similar using some terms, often a phrase in quotes, and site:quantopian.com
  3. Make some progress
  4. Become really stuck, like on something involving numpy for example
  5. Google using site:stackoverflow.com <==
  6. If no answer, post a question
  7. Trying things in the debugger along the way
  8. Success