Forward Fill in data.history vs pipeline using 1d vs 1m

Edited:

From Q/help: 'price' is forward filled. 'open', 'high', 'low', and 'close' return the relevant information for the current trade bar. If there is no current trade bar, NaN is returned. These fields are never forward-filled.

I've been tracking down what I thought was a bug in my code but I believe it may be a bug in pipeline output. Perhaps it is backtest only but I've distilled the issue down to this minimal backtest code.

I call data.history with '1d' and I compare it to daily pipeline output using USEquityPricing.close.latest.

If you look at AAPL (and others) in the logs, you can see that the pipeline data is shifted by 1 day from the data.history value.

e.g.
2017-08-14 09:31 data.history['close'] = $159.023 ---> 2017-08-14 09:31 pipeline['close'] = $157.49
2017-08-11 09:31 data.history['close'] = $157.49

Note the dates and how the pipeline output seems shifted.

# Backtest ID: 59937de941e63555061ca6b2
7 responses

I don't know if that is the cause, but when you call data.history at market open you get the close price at the end of the first 1-minute bar.

With the history call in before_trading_start they are the same. Someone from Quantopian covered this recently and I'm sorry I couldn't find it.

When data.history is called during the day, its last value is the previous minute's, even when using '1d' daily values.
So in the code, df['close'][-2] matching pipeline for yesterday's close is right.
We can think of data.history with '1d' as daily resolution that includes today so far: while the day isn't done yet, the last bar covers today's trading up to the previous minute rather than excluding it.
The reason that still works at a scheduled market "open" is that the market has actually been open for a full minute already, so history's [-1] is the close from that first minute.

Meanwhile 'price' is forward filled while 'close' is not; 'close' can be NaN, at least with '1m' (per-minute) history. I wonder if there is ever a time with '1d' history where a value can be NaN (not a number), that is, an entire day with no trades.

Edit: Yes.

    hst = data.history(symbol('FMBH'), ['price', 'close', 'volume'], 30, '1d')  
    log.info(' \n{}'.format(hst.tail()))

                           close  price   volume  
2018-04-19 00:00:00+00:00  35.69  35.69   1497.0  
2018-04-20 00:00:00+00:00  36.15  36.15  11715.0  
2018-04-23 00:00:00+00:00  36.46  36.46   6549.0  
2018-04-24 00:00:00+00:00  36.96  36.96  10748.0  
2018-04-25 00:00:00+00:00    NaN  36.96      0.0  

From Q/help:
"price" is always forward-filled. The other fields ("open", "high", "low", "close", "volume") are never forward-filled.
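To make the distinction concrete, here is a minimal pandas sketch run outside Quantopian. It rebuilds the FMBH rows from the log above and derives 'price' by forward-filling 'close'. Treating 'price' as exactly `close.ffill()` is my assumption about the behaviour, not a statement about how Quantopian computes it internally.

```python
import numpy as np
import pandas as pd

# Daily values for FMBH copied from the log output above;
# the last session (2018-04-25) had no trades.
dates = pd.to_datetime(
    ["2018-04-19", "2018-04-20", "2018-04-23", "2018-04-24", "2018-04-25"],
    utc=True,
)
close = pd.Series([35.69, 36.15, 36.46, 36.96, np.nan], index=dates, name="close")
volume = pd.Series([1497.0, 11715.0, 6549.0, 10748.0, 0.0], index=dates, name="volume")

# Assumption: 'price' is 'close' with the last traded value carried forward.
price = close.ffill().rename("price")

frame = pd.concat([close, price, volume], axis=1)

# Sessions where 'close' is missing but 'price' is not are the no-trade sessions.
no_trade_mask = frame["close"].isna() & frame["price"].notna()
no_trade_days = frame[no_trade_mask].index

print(frame)
print(no_trade_days)
```

If you need to know whether a daily bar really traded, checking 'close' for NaN (or 'volume' for 0) looks safer than relying on 'price'.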

I would have expected to see NaNs for the current close when requesting daily history, but it seems we do get the 1-minute close.

data.history has always returned the current minute value even at 1d frequency. I found this very convenient instead of returning NaN and having to call the function again in 1m frequency to get minute price. I believe convenience is the reason behind the api behaviour.

Other than the 1 day shift there are also differences in the split adjustment behaviour of pipeline vs history. Have a look at this NB

EDIT: the reason behind the 1 day shift is that pipeline is run every day after market close (more precisely, before the next day's market open), so the data available to pipeline is yesterday's data.
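
A small pandas sketch of that shift, using the two AAPL values from the logs in the original post. The helper `drop_partial_bar` is a hypothetical name of mine, not part of the Quantopian API; it just discards the in-progress bar so that a '1d' history requested intraday lines up with what pipeline reports.

```python
import pandas as pd


def drop_partial_bar(daily_history, now):
    """Return only completed daily bars from a '1d' history-like Series.

    daily_history: Series indexed by session date (a stand-in for the
        result of data.history(..., '1d'); hypothetical, not the real object).
    now: timestamp at which the history was requested.
    """
    session = now.normalize()  # midnight of the current session
    # A bar labelled with today's session is still in progress, so exclude it.
    return daily_history[daily_history.index < session]


# Values taken from the log lines in the original post.
hist = pd.Series(
    [157.49, 159.023],
    index=pd.to_datetime(["2017-08-11", "2017-08-14"]),
    name="close",
)
now = pd.Timestamp("2017-08-14 09:31")

completed = drop_partial_bar(hist, now)
pipeline_close = 157.49  # what pipeline reported on 2017-08-14

print(completed.iloc[-1] == pipeline_close)
```

This is the same as comparing history's [-2] to pipeline when the call is made during market hours.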


This additional backtest will hopefully help someone in the future. I ran data.current(), data.history(frequency='1m'), data.history(frequency='1d') and pipeline before and after trading starts.

Case 1: Before trading starts

Calling data.current(), data.history(frequency='1m'), data.history(frequency='1d'), and pipeline_output in before_trading_start() returns price/close as expected (e.g. close=close, price=price). Volume for '1m' = data.current() and is the last tick's volume. Volume for '1d' is the sum of the previous trading day's volume. For me this is intuitive.

Case 2: At initial market open - intuitive

data.current() = data.history(frequency='1m')[-1], and it is the current price (forward filled) and previous close. All other data.history bars are previous ticks.

Case 3: Market open plus 30 minutes - still intuitive

data.current() = data.history(frequency='1m') and is the previous tick's OHLCV and the forward-filled price.

Case 4: At market open - the beware case (daily history)

When you call data.history(frequency='1d') at market open you get the previous bar's 1m close value along with the forward filled price. So at the very moment the market opens data.history(frequency='1m') = data.history(frequency='1d') = data.current()

As time progresses data.history(frequency='1d') diverges and high marks the highest bar value prior to the current bar, low marks the lowest bar value, and volume is aggregated up to that point.
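
Here is a minimal pandas sketch of how I understand the in-progress daily bar to be built from the minute bars seen so far. The minute values are made up for illustration, and `partial_daily_bar` is a hypothetical helper, not a Quantopian function. I left 'open' out on purpose since I am less sure how it is derived.

```python
import pandas as pd

# Made-up minute bars for the first three minutes of one session (illustrative only).
minutes = pd.DataFrame(
    {
        "high": [10.2, 10.5, 10.4],
        "low": [10.0, 10.1, 10.3],
        "close": [10.1, 10.4, 10.3],
        "volume": [100, 250, 50],
    },
    index=pd.date_range("2017-08-14 09:31", periods=3, freq="min"),
)


def partial_daily_bar(minute_bars):
    """Collapse the minute bars seen so far into one in-progress daily bar."""
    return pd.Series(
        {
            "high": minute_bars["high"].max(),        # highest high so far
            "low": minute_bars["low"].min(),          # lowest low so far
            "close": minute_bars["close"].iloc[-1],   # latest minute's close
            "volume": minute_bars["volume"].sum(),    # volume aggregated so far
        }
    )


after_one_minute = partial_daily_bar(minutes.iloc[:1])
after_three_minutes = partial_daily_bar(minutes)

print(after_one_minute)
print(after_three_minutes)
```

After one minute the daily bar equals the single minute bar, which is why '1m', '1d' and data.current() agree at the open and only diverge later.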

Q's help explicitly states that "price" is always forward-filled. The other fields ("open", "high", "low", "close", "volume") are never forward-filled.

For me this is a type of forward-filling, but I do agree it can be helpful. You can probably glean this from the docs by carefully reading how they handle the volume examples. But since most of the examples use forward-filled price, it is not obvious that, at daily frequency, tick data is returned for close/open while high/low get the max/min of all previous bars up to that point.

Perhaps a better statement in help would be:
data.history() with frequency=daily doesn't honor the never forward-filled contract for OHLCV and instead returns the close/open of the previous minute bar, the max/min as high/low of all previous bars, and aggregated volume for all bars within the current trading day.

I don't think it is correct to say OHLCV is NOT forward-filled when frequency='1d'.

Case 5: At market open - Pipeline call - also beware

Now we come to the reason I thought this was a bug. If you look at the description for the US Equity Pricing database it says it contains minute data for OHLCV (not price). But still, minute level data.

So, what do you think happens if you call pipeline data for USEquityPricing during the trading day? If you guessed it would behave like data.history() you're wrong (or in my case I thought data.history should behave like pipeline data).

In fact, when you call pipeline at market open you get the previous day's close and the previous day's total volume, not tick data (i.e. this pipeline call exactly matches the before_trading_start pipeline call). For me this is the definition of not forward filled.

Why is this important?

This introduced a very subtle issue for me where two dataframes with exactly the same time-stamped index had different values for "close". With everything trading on "daily" frequency this was very hard to track down.
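
For anyone chasing the same kind of mismatch, here is a rough pandas sketch of the check I would use. All numbers are made up, and `close_mismatches` is a hypothetical helper. Frame `a` plays the role of pipeline output (each date holds the previous session's close) and frame `b` plays the role of '1d' history stamped by session.

```python
import numpy as np
import pandas as pd


def close_mismatches(a, b, tol=1e-6):
    """Return the rows where two frames disagree on 'close' for the same timestamp."""
    joined = a[["close"]].join(b[["close"]], lsuffix="_a", rsuffix="_b", how="inner")
    differs = ~np.isclose(
        joined["close_a"], joined["close_b"], atol=tol, equal_nan=True
    )
    return joined[differs]


index = pd.to_datetime(["2017-08-10", "2017-08-11", "2017-08-14"])

# Made-up values. 'a' is pipeline-like: the value on date D is the close of D-1.
a = pd.DataFrame({"close": [100.0, 101.0, 102.0]}, index=index)
# 'b' is history-like: the value on date D is the close of D itself.
b = pd.DataFrame({"close": [101.0, 102.0, 103.5]}, index=index)

raw = close_mismatches(a, b)                        # same index, different values
aligned = close_mismatches(a, b.shift(1).dropna())  # shift history by one session

print(raw)
print(aligned)
```

If the mismatches vanish after shifting one frame by a session, the two sources are most likely stamped with different conventions rather than holding different data.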

Net

USEquityPricing always returns end of previous day data no matter what time it is called from pipeline (at least as far as I can tell).

data.history(frequency='1d') is previous tick's value for O/C for the current trading day, the max/min of all previous bars for H/L within the current trading day, and the aggregated value for Volume for the current trading day.

This backtest contains the log for AAPL called 30 minutes after trading starts. It is also helpful to remove minutes=30 and compare the results.

Cheers,
John

# Backtest ID: 59948afbf17822510a74662f

@Aqua @Blue @Luca thanks for your replies. They were most helpful.

@Luca the notebook about splits in pipeline vs history is very useful and I'm sure saved me a future travail.

Thanks for that backtest @John Glossner. I started there wanting to see all of the combinations together with a focus on stock splits, and there's some extra stock split info in the code. I didn't include volume but it could be added by anyone interested.

bts - before_trading_start
open - market open
day - during the day, 30 minutes after open
pipe - pipeline_output()
currnt - data.current(s, 'price')
h_1m - data.history(s, 'price', 1, '1m')[-1]
h_1d - data.history(s, 'price', 1, '1d')[-1]
sma - SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=22, mask=m)

Example output. I think this would wrap, so I have wrapped it manually. See also the comments at the top of the code (that is NFLX 1:7), while this is AAPL on Monday 2014-06-09.

2014-06-06 07:01 _:105 INFO  
                          bts  
       sma      pipe  currnt    h_1m    h_1d  
     -------  ------------------------------  
AAPL  610.61  647.35  647.35  647.35  647.35

                          open                              day  
                pipe  currnt    h_1m    h_1d      pipe  currnt    h_1m    h_1d  
              ------------------------------    ------------------------------  
              647.35  649.39  649.39  649.39    647.35  647.86  647.86  647.86


2014-06-09 07:01 _:105 INFO  
                          bts  
       sma      pipe  currnt    h_1m    h_1d  
     -------  ------------------------------  
AAPL   87.59   92.23   92.23   92.23   92.23

                          open                              day  
                pipe  currnt    h_1m    h_1d      pipe  currnt    h_1m    h_1d  
              ------------------------------    ------------------------------  
               92.23   92.72   92.72   92.72     92.23   92.31   92.31   92.31  
# Backtest ID: 5aff920f6b5a4543369d730d