Data from Quantopian not matching other data sources

Hello,

I was writing some algos and couldn't figure out why even the most basic systems were not trading the way I thought they should.

After further examination, I noticed two things. The first is that the OHLC data I'm logging is significantly off, more than I would expect even with exchanges reporting slightly different trade data. Below I've copied the daily OHLC from my log for 9/1/17 through 9/28/17 and put it next to data I copied from Nasdaq's historical data.

Spreadsheet: Nasdaq vs Quantopian data

Nasdaq data: Nasdaq Historical Data

The Open data seems to be good and the High data is off by just a bit, but the Low and Close data are way off, in some cases by multiple points. I have also tested other sample dates and compared against other data sources, and the results are just as bad. In some cases even the Opens and Highs are off.

My second issue is that my data.history 1-minute log data is off by four hours. I thought the default time zone was Eastern, but the times I'm getting are 4 hours ahead of that. Here is an example that should log the 1m OHLC data for the last two minutes. In my log those should be 9:34 and 9:35, but it displays 13:34 and 13:35 instead. As you can see, the log entry itself is stamped 9:35 correctly, but the bar timestamps are in a different time zone.

2017-09-25 09:35 Test:48 INFO 2017-09-25 13:34:00+00:00 2017-09-25 13:35:00+00:00
Equity(24 [AAPL]) 149.88 149.89
Equity(24 [AAPL]) 150.08 150.03
Equity(24 [AAPL]) 149.78 149.79
Equity(24 [AAPL]) 149.91 149.79

Can someone help me with this issue? I am not sure if it is something wrong with my code, or an issue with the back end of Quantopian.

I've attached the simple code that I have been using to test this.

# Backtest ID: 59cec3d2b8f45e5068128efb

Hello Ryan,

Thanks for asking these questions. I believe I can help you out.

The reason your prices appear to be so different is that you're not looking at the full day's data in your algorithm. For instance, in line 36 you make your data call using

context.one_day_high = data.history(stock, 'high', day_bar_count, '1d')

and then in line 52 you set

day_high = context.one_day_high[-1]

You're running this code 5 minutes after the market opens. So, 5 minutes into the day, you're looking at "today's" price bar and recording its high, but that's only the high of the day so far. At 9:35 AM Eastern, you can't know what the high for the whole day will be. What you need to do is log "yesterday's" bar and compare that output to your other data sources; you'll see them align much more closely.
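To see why a "day so far" bar drifts from the final daily numbers, here is a small stdlib-only sketch that builds a daily bar from minute bars: the open is fixed by the first minute, the high and low can only widen as the day goes on, and the close is simply the latest minute. The prices are toy numbers made up for illustration, not real AAPL data, and the `Bar` and `aggregate` names are hypothetical. In the algorithm itself, logging yesterday's completed bar would mean reading index `[-2]` instead of `[-1]` from the 1d history, assuming `day_bar_count` is at least 2.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float


def aggregate(minute_bars):
    """Combine consecutive minute bars into one bar spanning the same period."""
    return Bar(
        open=minute_bars[0].open,                 # first trade of the span
        high=max(b.high for b in minute_bars),    # can only go up as bars are added
        low=min(b.low for b in minute_bars),      # can only go down as bars are added
        close=minute_bars[-1].close,              # latest trade of the span
    )


# Hypothetical minute bars for one session (toy numbers, not real market data).
session = [
    Bar(150.00, 150.10, 149.90, 150.05),  # 9:31
    Bar(150.05, 150.20, 150.00, 150.15),  # 9:32
    Bar(150.15, 150.25, 150.10, 150.20),  # 9:33
    Bar(150.20, 150.30, 150.15, 150.25),  # 9:34
    Bar(150.25, 150.35, 150.20, 150.30),  # 9:35
    Bar(150.30, 151.00, 148.50, 149.00),  # stand-in for the rest of the day
]

partial_day = aggregate(session[:5])  # what is knowable at 9:35
full_day = aggregate(session)         # what is knowable after the close

print(partial_day)
print(full_day)
```

With these toy numbers the opens agree, while the high, low, and close of the 9:35 snapshot all differ from the completed day, which is the same pattern as the spreadsheet comparison.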

As for the timestamps that are confusing you, those are in UTC: 13:30 UTC = 9:30 EDT. After daylight saving time ends in the US, the market open will be logged as 14:30 UTC, which is 9:30 EST.
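If you want Eastern times in your own log lines, you can convert the UTC bar timestamps yourself. Below is a minimal stdlib sketch that uses fixed EDT/EST offsets chosen by a flag; that is a simplifying assumption for illustration, and a real time zone database (for example Python's `zoneinfo` with `"America/New_York"`) would pick the correct offset automatically. The `to_eastern` helper is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, assumed for illustration only; a tz database handles the switch for you.
EDT = timezone(timedelta(hours=-4), "EDT")  # daylight saving time (summer)
EST = timezone(timedelta(hours=-5), "EST")  # standard time (winter)


def to_eastern(ts_utc, dst):
    """Convert a timezone-aware UTC timestamp to US Eastern time."""
    return ts_utc.astimezone(EDT if dst else EST)


# Bar timestamp from the log above (late September, so daylight saving time applies).
logged = datetime(2017, 9, 25, 13, 35, tzinfo=timezone.utc)
print(to_eastern(logged, dst=True).strftime("%H:%M %Z"))  # 09:35 EDT

# After daylight saving time ends (5 Nov 2017), the open shows up as 14:30 UTC.
winter_open = datetime(2017, 11, 6, 14, 30, tzinfo=timezone.utc)
print(to_eastern(winter_open, dst=False).strftime("%H:%M %Z"))  # 09:30 EST
```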

A few tips for you to consider:

  • Remember that open, high, low, close, and volume fields are not forward filled. Particularly when you look at minute data, don't be surprised when you see NaNs.
  • Rather than writing 4 lines of code to get your fields, you can request them all in one history call, which is much more efficient: context.one_min_full = data.history(stock, fields=["open", "high", "low", "close"], bar_count=minute_bar_count, frequency="1m")
  • This kind of data exploration is probably easier to do in research than in an algorithm. See the attached notebook.
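Because those fields are not forward filled, a minute with no trades comes back as NaN. If you need a value for every minute, one option is to forward-fill after the fact. Here is a minimal stdlib sketch of that idea (the `forward_fill` helper and the sample values are hypothetical); with pandas, `DataFrame.ffill()` does the same job. Keep in mind that a filled high or low describes a minute in which nothing actually traded, so whether filling is appropriate depends on what you use the numbers for.

```python
import math


def forward_fill(values):
    """Replace each NaN with the most recent non-NaN value; leading NaNs stay NaN."""
    filled = []
    last = float("nan")  # nothing seen yet
    for v in values:
        if not math.isnan(v):
            last = v
        filled.append(last)
    return filled


# Hypothetical minute closes where the 2nd and 3rd minutes had no trades.
closes = [149.88, float("nan"), float("nan"), 150.03]
print(forward_fill(closes))  # [149.88, 149.88, 149.88, 150.03]
```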
Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Thanks for the reply Dan.

I thought the daily history always returned the previous completed bar, the same way the 1m data logs the minute before it.

I moved the schedule_function call to market close and now the data matches.

Thank you

It's probably better to think about it as "what do I know right now?" as opposed to "what was the last completed bar?" In all cases the simulation will give you all of the information that you could know at that point in time, but without giving any look-ahead bias. So, at 9:35:00 you can know the 1-minute bar that ended at 9:35:00.
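The "what do I know right now?" rule can be expressed as a simple point-in-time filter: a bar is visible only once its end time has been reached. Below is a minimal stdlib sketch of that idea with hypothetical bar end times (UTC, matching the log). It illustrates the no-look-ahead principle and is not Quantopian's actual implementation.

```python
from datetime import datetime


def known_bars(bar_end_times, now):
    """Return the bar end times at or before `now`, i.e. bars that are complete (no look-ahead)."""
    return [end for end in bar_end_times if end <= now]


# Hypothetical 1-minute bars, identified by the time each bar ends (UTC).
bar_end_times = [
    datetime(2017, 9, 25, 13, 34),
    datetime(2017, 9, 25, 13, 35),
    datetime(2017, 9, 25, 13, 36),
]

now = datetime(2017, 9, 25, 13, 35)  # 9:35:00 Eastern (EDT)
visible = known_bars(bar_end_times, now)
print([end.strftime("%H:%M") for end in visible])  # ['13:34', '13:35']
```

The bar ending exactly at 9:35:00 is included, and the 9:36 bar is not, because at 9:35:00 it has not happened yet.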