OK, I have a handle on it now.
Here's what we did originally with this data from Quandl.
For any dataset, we have two segments of data: the historical data we load up front to reflect history, and the data we process on an ongoing basis.
As I mentioned above, the asof_date field is the datetime that comes from the data itself. It is typically the "date to which the record applies." So if a record has an asof_date of October 1, then for VIX it holds the volatility metrics (open, close, etc.) for the day of October 1.
We also have the timestamp field, which indicates the time at which the data is actually available to an algorithm. This field is meant to help us prevent look-ahead bias. So if you have data about October 1, but it takes 10 days for Quandl (or the CBOE, or whoever) to publish the data and for Quantopian to process it, then the timestamp field will be October 11.
For new data points that we load each day, the timestamp field is set to the time the data is actually stored and becomes available through the Quantopian API. For those, we have actual values.
But for the historical data that we loaded initially, we don't know what that value would have been back in 2002, 2010, or 2013. For that historical data, we need to assign a lag between the asof_date and the timestamp. In the case of these volatility datasets, we ran the processing for some time and then calculated the mean lag. Over that period we observed a 44-hour lag on average, so we set the difference between the provided asof_date and the timestamp to 44 hours.
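The calibration described above can be sketched roughly like this: measure the mean (timestamp - asof_date) gap over an observed sample, then backfill historical rows with asof_date plus that mean lag. The sample values below are invented so the mean comes out to 44 hours; they are not the real observations.

```python
import pandas as pd

# Invented sample of records where we observed real publication times.
observed = pd.DataFrame({
    "asof_date": pd.to_datetime(["2014-06-02", "2014-06-03", "2014-06-04"]),
    "timestamp": pd.to_datetime([
        "2014-06-03 16:00",  # 40h after asof_date
        "2014-06-04 20:00",  # 44h after asof_date
        "2014-06-06 00:00",  # 48h after asof_date
    ]),
})

# Mean observed lag: (40 + 44 + 48) / 3 = 44 hours.
mean_lag = (observed["timestamp"] - observed["asof_date"]).mean()

# Backfill: historical rows, which have no real publication time,
# get asof_date + mean_lag as a synthetic timestamp.
historical = pd.DataFrame({"asof_date": pd.to_datetime(["2010-10-01"])})
historical["timestamp"] = historical["asof_date"] + mean_lag
print(historical)
```

Using the mean is a deliberately conservative guess: it can overstate the lag for records that actually published quickly, which is exactly the behavior discussed below.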
We did this calculation for each dataset from Quandl, so the actual lag varies from set to set. Some of the macroeconomic datasets have a 7-day lag (like the ADP employment data); others are shorter (like Yahoo's VIX).
The conclusion is that the sample we took initially indicated the CBOE data from Quandl wasn't available prior to the open -- it arrived afterwards, and that is the cause of the lag in data availability in your 2010 backtests.
That said, now that we've processed more data, we can look back and reconsider the lag we assign to that historical data. You can check the latest real lag yourself by examining asof_date and timestamp for these datasets. My quick examination this morning suggests the lag in more recent records has gone down.
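The check suggested above is just a per-record subtraction plus a group-by. Here's a sketch with two invented records, one old and one recent, showing how you'd compare the average lag by year; on Quantopian you would of course pull the real dataset rather than build it by hand.

```python
import pandas as pd

# Invented records: an older row with the backfilled ~44h lag and a
# recent row that published the same morning.
df = pd.DataFrame({
    "asof_date": pd.to_datetime(["2010-10-01", "2015-10-01"]),
    "timestamp": pd.to_datetime(["2010-10-02 20:00", "2015-10-01 09:00"]),
})

# Per-record lag, then the mean lag per calendar year.
df["lag"] = df["timestamp"] - df["asof_date"]
by_year = df.groupby(df["asof_date"].dt.year)["lag"].mean()
print(by_year)
```

If the recent years show a much smaller mean lag than the backfilled 44 hours, that would support shortening the lag applied to the historical segment.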
Hope this helps explain the behavior.