Back to Community
How to use the resample() correctly?

Hi,

I want to use the resample() to get the differemt timeframe datas for calculating the indicator. I've searched here in the forum and found some examples. But I doubt a little bit they are correct.

In my attached algo example I try to create the 5-minutes dataset of TESLA for the date of Jun.2nd 2017. What I got are as follow:
... 2017-06-02 21:31 cal_resampled_indicator:53 INFO 5T:
2017-06-01 20:00:00+00:00 340.29
2017-06-02 13:30:00+00:00 338.66
Name: Equity(39840 [TSLA]), dtype: float64
2017-06-02 21:36 cal_resampled_indicator:53 INFO 5T:
2017-06-02 13:30:00+00:00 338.480
2017-06-02 13:35:00+00:00 338.405
Name: Equity(39840 [TSLA]), dtype: float64
2017-06-02 21:41 cal_resampled_indicator:53 INFO 5T:
2017-06-02 13:35:00+00:00 337.3
2017-06-02 13:40:00+00:00 337.4
Name: Equity(39840 [TSLA]), dtype: float64
2017-06-02 21:46 cal_resampled_indicator:53 INFO 5T:
2017-06-02 13:40:00+00:00 338.415
2017-06-02 13:45:00+00:00 338.829
Name: Equity(39840 [TSLA]), dtype: float64
2017-06-02 21:51 cal_resampled_indicator:53 INFO 5T:
2017-06-02 13:45:00+00:00 338.300
2017-06-02 13:50:00+00:00 338.602
Name: Equity(39840 [TSLA]), dtype: float64
2017-06-02 21:56 cal_resampled_indicator:53 INFO 5T:
2017-06-02 13:50:00+00:00 338.55
2017-06-02 13:55:00+00:00 338.20
Name: Equity(39840 [TSLA]), dtype: float64
2017-06-02 22:01 cal_resampled_indicator:53 INFO 5T:
2017-06-02 13:55:00+00:00 338.68
2017-06-02 14:00:00+00:00 338.85
Name: Equity(39840 [TSLA]), dtype: float64
...

My opinion these are not correct.

They should be as follow:
... 2017-06-02 21:31 cal_resampled_indicator:53 INFO 5T:
2017-06-01 20:00:00+00:00 340.29
2017-06-02 13:30:00+00:00 338.66
Name: Equity(39840 [TSLA]), dtype: float64
2017-06-02 21:36 cal_resampled_indicator:53 INFO 5T:
2017-06-02 13:30:00+00:00 338.66
2017-06-02 13:35:00+00:00 338.405
Name: Equity(39840 [TSLA]), dtype: float64
2017-06-02 21:41 cal_resampled_indicator:53 INFO 5T:
2017-06-02 13:35:00+00:00 338.405
2017-06-02 13:40:00+00:00 337.4
Name: Equity(39840 [TSLA]), dtype: float64
2017-06-02 21:46 cal_resampled_indicator:53 INFO 5T:
2017-06-02 13:40:00+00:00 337.4
2017-06-02 13:45:00+00:00 338.829
Name: Equity(39840 [TSLA]), dtype: float64
2017-06-02 21:51 cal_resampled_indicator:53 INFO 5T:
2017-06-02 13:45:00+00:00 338.829
2017-06-02 13:50:00+00:00 338.602
Name: Equity(39840 [TSLA]), dtype: float64
2017-06-02 21:56 cal_resampled_indicator:53 INFO 5T:
2017-06-02 13:50:00+00:00 338.602
2017-06-02 13:55:00+00:00 338.20
Name: Equity(39840 [TSLA]), dtype: float64
2017-06-02 22:01 cal_resampled_indicator:53 INFO 5T:
2017-06-02 13:55:00+00:00 338.20
2017-06-02 14:00:00+00:00 338.85
Name: Equity(39840 [TSLA]), dtype: float64
...

Maybe someone can tell the reason?

Cheers

Clone Algorithm
15
Loading...
Backtest from to with initial capital
Total Returns
--
Alpha
--
Beta
--
Sharpe
--
Sortino
--
Max Drawdown
--
Benchmark Returns
--
Volatility
--
Returns 1 Month 3 Month 6 Month 12 Month
Alpha 1 Month 3 Month 6 Month 12 Month
Beta 1 Month 3 Month 6 Month 12 Month
Sharpe 1 Month 3 Month 6 Month 12 Month
Sortino 1 Month 3 Month 6 Month 12 Month
Volatility 1 Month 3 Month 6 Month 12 Month
Max Drawdown 1 Month 3 Month 6 Month 12 Month
# Backtest ID: 593bde7615de616e5485b3b2
There was a runtime error.
13 responses

It appears you just want to shift the 'base' index of the resampling. The default for the resample method is to start at the resample boundry (in this case 0 and 5). If you want to offset this you can set the 'base' parameter. Maybe look at the pandas documentation http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.Series.resample.html .

The following should do what you want. Also see attached algorithm with this change.

    prices = prices_1m.resample('5T', closed='left', base=-3).last().dropna()

Clone Algorithm
6
Loading...
Backtest from to with initial capital
Total Returns
--
Alpha
--
Beta
--
Sharpe
--
Sortino
--
Max Drawdown
--
Benchmark Returns
--
Volatility
--
Returns 1 Month 3 Month 6 Month 12 Month
Alpha 1 Month 3 Month 6 Month 12 Month
Beta 1 Month 3 Month 6 Month 12 Month
Sharpe 1 Month 3 Month 6 Month 12 Month
Sortino 1 Month 3 Month 6 Month 12 Month
Volatility 1 Month 3 Month 6 Month 12 Month
Max Drawdown 1 Month 3 Month 6 Month 12 Month
# Backtest ID: 593d3bd43d3fa069ea446fcc
There was a runtime error.

Hi Dan,

Many thanks!

The 'bas=-3' seems very important. But truly to say, I haven't found this in the examples I found here in the community. That's why I said "I doubt a little bit they are correct". Am I right? Or I haven't found the correct example here?

Cheers

There isn't a single 'correct' way to resample but here is my suggested method. This will return the last non-NaN price in each 5 minute bucket. The buckets start at times 01, 06, 11, 16 etc and end at times 05, 10, 15, 20 etc.

    close_1m = data.history(my_stocks, 'close', my_timeframe, '1m')  
    close_5m = close_1m.resample('5T', closed='right', label='right').last()

The "closed='right'" parameter includes the data from the end of the bucket (ie time 0, 5, 10 etc) in the calculation. The "label='right'" parameter labels each bucket with the last point in the bucket (otherwise we would have forward bias looking into the future). The 'last()' method simply returns the last non-NaN data value in the bucket.

However, if you don't want the last price but maybe the average price over the resampling period. One could do something like this

    close_1m = data.history(my_stocks, 'close', my_timeframe, '1m')  
    close_5m = close_1m.resample('5T', closed='right', label='right').mean()

Or, alternately if you want the maximum price over the bucket then this

    close_1m = data.history(my_stocks, 'close', my_timeframe, '1m')  
    close_5m = close_1m.resample('5T', closed='right', label='right').max()

Hi Dan,

Thans for the reply. The purpose of my question is to try to get the "clean" resampled datas for calculating the indicator of different timeframes.

I've tried your methed and have two questions.

  1. I test it again for TSLA on Jun.7th. I got the 5-min price as follow:
    ... ... 2017-06-07 15:36 cal_resampled_indicator:53 INFO 5T:
    2017-06-07 13:35:00+00:00 356.160
    2017-06-07 13:40:00+00:00 356.715
    Freq: 5T, Name: Equity(39840 [TSLA]), dtype: float64
    2017-06-07 15:41 cal_resampled_indicator:53 INFO 5T:
    2017-06-07 13:40:00+00:00 357.627
    2017-06-07 13:45:00+00:00 358.550
    ... ...

One can see as I mentioned in my first post, the price at 13:40 (356.715) in the first block is not the same as in the second block (357.627). Can one use such a price series as history to calculate the indicator such as RSI?

  1. I further tried to calculate the RSI. But the RSI values at the beginning of everyday are 'nan'. It seems the "yesterday datas" are not considered.

Cheers

Clone Algorithm
15
Loading...
Backtest from to with initial capital
Total Returns
--
Alpha
--
Beta
--
Sharpe
--
Sortino
--
Max Drawdown
--
Benchmark Returns
--
Volatility
--
Returns 1 Month 3 Month 6 Month 12 Month
Alpha 1 Month 3 Month 6 Month 12 Month
Beta 1 Month 3 Month 6 Month 12 Month
Sharpe 1 Month 3 Month 6 Month 12 Month
Sortino 1 Month 3 Month 6 Month 12 Month
Volatility 1 Month 3 Month 6 Month 12 Month
Max Drawdown 1 Month 3 Month 6 Month 12 Month
# Backtest ID: 593f87b6a5f3b46e6f072c9c
There was a runtime error.

Furthermore:

As I compare the values from resample() with that on the real chart I found something "interesting". In the price block lised in 1., the last 'close' (such as the 356.715 and 358.550) are exactly the 'close' at the 05, 10, 15 etc in 1-min-chart.

But the first 'close' (356.160 and 357.627) are exactly the 'close' at the 05, 10, 15 etc but in the 5-min-chart.

Here are the 'close' I wrote down from the real chart (TSLA on Jun.7th, 2017):

TIME     1m-chart    5m-chart  
-----------------------------  
09:30    357.69      356.16  
09:35    356.72      357.63  
09:40    358.55      357.50  
09:45    357.73      357.15  
10:00    358.39      357.96    

This means, the values from the resample() contains price history in two different timeframes. So if I take this as the history in talib.RSI() to calculate the indicator as in my attached algo, the results must wrong, right?

How can I split or extract the price history of 5-min-chart?

Hi Vladimir,

Thanks for the reply first. I've tried you code and have following questions:

  1. As I just print out the 1m_close (print prices_1m[-1]), it begins from 10:45. This means from 9:30 to 10:30 there is no data.
2017-06-07 16:45  PRINT 357.905  
2017-06-07 16:50  PRINT 360.411  
2017-06-07 16:55  PRINT 359.6  
2017-06-07 17:00  PRINT 358.715  
2017-06-07 17:05  PRINT 359.28  
2017-06-07 17:10  PRINT 359.248  
2017-06-07 17:15  PRINT 359.0  
...
...

Sorry for the inconvenience by the time stamp. I am in Germnay and in a different timezone.

  1. As I change the timeframe to 15 minues, it begins much later, from 11:15.
2017-06-07 19:15  PRINT 357.58  
2017-06-07 19:30  PRINT 357.16  
2017-06-07 19:45  PRINT 355.51  
2017-06-07 20:00  PRINT 357.485  
2017-06-07 20:15  PRINT 357.425  
2017-06-07 20:30  PRINT 357.622  
2017-06-07 20:45  PRINT 358.302  
...
...

  1. As I print out 5m_close (print ("5T:\n%s" %prices_5m)), which are the data history to calculate the RSI, I got these:
    ... 2017-06-07 16:45 PRINT 5T:
    2017-06-07 13:35:00+00:00 357.230
    2017-06-07 13:40:00+00:00 358.320
    2017-06-07 13:45:00+00:00 357.087
    2017-06-07 13:50:00+00:00 357.535
    2017-06-07 13:55:00+00:00 358.674
    2017-06-07 14:00:00+00:00 357.830
    2017-06-07 14:05:00+00:00 357.477
    2017-06-07 14:10:00+00:00 357.750
    2017-06-07 14:15:00+00:00 358.560
    2017-06-07 14:20:00+00:00 358.216
    2017-06-07 14:25:00+00:00 357.780
    2017-06-07 14:30:00+00:00 357.850
    2017-06-07 14:35:00+00:00 357.650
    2017-06-07 14:40:00+00:00 357.940
    2017-06-07 14:45:00+00:00 357.905
    Freq: 5T, Name: Equity(39840 [TSLA]), dtype: float64
    2017-06-07 16:50 PRINT 5T:
    2017-06-07 13:40:00+00:00 358.320
    2017-06-07 13:45:00+00:00 357.087
    2017-06-07 13:50:00+00:00 357.535
    2017-06-07 13:55:00+00:00 358.674
    2017-06-07 14:00:00+00:00 357.830
    2017-06-07 14:05:00+00:00 357.477
    2017-06-07 14:10:00+00:00 357.750
    2017-06-07 14:15:00+00:00 358.560
    2017-06-07 14:20:00+00:00 358.216
    2017-06-07 14:25:00+00:00 357.780
    2017-06-07 14:30:00+00:00 357.850
    2017-06-07 14:35:00+00:00 357.650
    2017-06-07 14:40:00+00:00 357.940
    2017-06-07 14:45:00+00:00 360.191
    2017-06-07 14:50:00+00:00 360.411
    ... ...

One can see for example, the close on '14:45:00+00:00 ' in the first block (357.905) is not equal to that in the second block (360.191).

Cheers

And when I print the RSI, it begins from 10:45. Before this there is Na. I couldn't image how to draw this on the chart?

I wonder if one should and how to consider/include the price data of "yesterday"? This also means, how to get a constantly moving price datas and therefor it's indicator value.

The '.resample' method fills in any data which it believes is missing in the original data. As long as the input data is a contiguous time series this isn't an issue. However, if input data spans multiple trading days, the '.resample' method will try to fill in all the non-trading times with data. This behavior can be seen in the output below


Timestamp('2017-06-01 19:00:00+0000', tz='UTC', offset='5T'): 8.38  
Timestamp('2017-06-01 19:05:00+0000', tz='UTC', offset='5T'): 8.385  
Timestamp('2017-06-01 19:10:00+0000', tz='UTC', offset='5T'): 8.38  
Timestamp('2017-06-01 19:15:00+0000', tz='UTC', offset='5T'): 8.38  
Timestamp('2017-06-01 19:20:00+0000', tz='UTC', offset='5T'): 8.37  
Timestamp('2017-06-01 19:25:00+0000', tz='UTC', offset='5T'): 8.37  
Timestamp('2017-06-01 19:30:00+0000', tz='UTC', offset='5T'): 8.387  
Timestamp('2017-06-01 19:35:00+0000', tz='UTC', offset='5T'): 8.37  
Timestamp('2017-06-01 19:40:00+0000', tz='UTC', offset='5T'): 8.375  
Timestamp('2017-06-01 19:45:00+0000', tz='UTC', offset='5T'): 8.37  
Timestamp('2017-06-01 19:50:00+0000', tz='UTC', offset='5T'): 8.38  
Timestamp('2017-06-01 19:55:00+0000', tz='UTC', offset='5T'): 8.395  
Timestamp('2017-06-01 20:00:00+0000', tz='UTC', offset='5T'): 8.37  
Timestamp('2017-06-01 20:05:00+0000', tz='UTC', offset='5T'): nan  
Timestamp('2017-06-01 20:10:00+0000', tz='UTC', offset='5T'): nan  
Timestamp('2017-06-01 20:15:00+0000', tz='UTC', offset='5T'): nan  
Timestamp('2017-06-01 20:20:00+0000', tz='UTC', offset='5T'): nan  
Timestamp('2017-06-01 20:25:00+0000', tz='UTC', offset='5T'): nan  
Timestamp('2017-06-01 20:30:00+0000', tz='UTC', offset='5T'): nan  
 ... a whole lot of times that I deleted...  
Timestamp('2017-06-02 13:15:00+0000', tz='UTC', offset='5T'): nan  
Timestamp('2017-06-02 13:20:00+0000', tz='UTC', offset='5T'): nan  
Timestamp('2017-06-02 13:25:00+0000', tz='UTC', offset='5T'): nan  
Timestamp('2017-06-02 13:30:00+0000', tz='UTC', offset='5T'): nan  
Timestamp('2017-06-02 13:35:00+0000', tz='UTC', offset='5T'): 8.55

Notice all the NaN values after trading hours 2017-06-01 20:05:00 and before trading hours 2017-06-02 13:30:00.

Since no data was provided to the '.resample' method for non-trading hours, it assumes NaNs for those times and outputs appropriately. In this case I used '.last()' so it outputted NaN for all the non-trading time values.

The above post stated at the beginning of everyday are 'nan'. It seems the "yesterday datas" are not considered. This isn't exactly true. Yesterdays data is considered it's just that the resample method inserted a lot more data points (for non-trading hours) into the dataset which probably were NaNs. One may want to drop these inserted rows before using the resampled data. A simple method is to use '.dropna'

trading_hours_only_data = resampled_data.dropna()

A BIG caution with this approach. It will also drop any trading data rows with NaNs too. Often this isn't a problem but it is something to consider.

I agree to what you said.

But how about the point I mentioned above ? Here again:

One can see for example, the close on '14:45:00+00:00 ' in the first block (357.905) is not equal to that in the second block (360.191).

@Thomas

I'm not sure how you are resampling and what data you are starting from. However, attached is a notebook which shows the behavior of the '.resample' method which I believe is consistent with what you are expecting.

Loading notebook preview...
Notebook previews are currently unavailable.

Many thanks!

Here is my algo again. It must be "clean" and "correct" now. :-/

Clone Algorithm
13
Loading...
Backtest from to with initial capital
Total Returns
--
Alpha
--
Beta
--
Sharpe
--
Sortino
--
Max Drawdown
--
Benchmark Returns
--
Volatility
--
Returns 1 Month 3 Month 6 Month 12 Month
Alpha 1 Month 3 Month 6 Month 12 Month
Beta 1 Month 3 Month 6 Month 12 Month
Sharpe 1 Month 3 Month 6 Month 12 Month
Sortino 1 Month 3 Month 6 Month 12 Month
Volatility 1 Month 3 Month 6 Month 12 Month
Max Drawdown 1 Month 3 Month 6 Month 12 Month
# Backtest ID: 5940eab591e19975110e6b3f
There was a runtime error.

Why are you using last? Shouldn't it be first?