Pipeline data over the weekend?

In a notebook, I ran the same pipeline on Saturday and then again today. I'm pretty sure the values it provided for Monday changed.

I assume this is because the dataset has values for each day of the weekend, but pipeline only returns the most recent day. E.g. if you run a pipeline on a Monday, the values for that Monday are based on the data from Sunday. However, if you run a pipeline on Saturday, the pipeline values for the next Monday are based on the data from the previous Friday.

It reminds me, I've seen a post about this before, but never saw a resolution.

As the market is not open on Sundays, the data that comes up on Mondays is usually pretty meaningless. Would it be possible to aggregate the weekend data so that, come Monday morning, pipeline outputs data that combines Friday+Saturday+Sunday? Or at least stick with the Friday data instead of the Sunday data?

2 responses

Hi Viridian,

Your question about weekend data is a good one. The way that pipeline bins its data is actually more like 'sessions' rather than 'days' (despite the fact that we call them 'days' everywhere). Each session goes from 45 minutes before market open on one trading day (called the 'pipeline cutoff time') to 45 minutes before market open on the next day. For example, if you run a pipeline on a Monday and you have a factor with a window length of 1 (i.e. .latest), the pipeline will retrieve data for the latest session, which is actually the previous Friday at 8:45am (assuming US_EQUITIES domain) to the Monday at 8:45am. So if your factor is using something like daily close price, it will get the most recent pricing data from the aforementioned session, which will be the close price on Friday. It's worth noting that each element in the lookback window arrays you get in a pipeline factor corresponds to one of these sessions. This is why you only see 5 elements per week.
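To make the session boundaries concrete, here is a rough sketch in plain Python. It is not how pipeline is implemented internally. `is_trading_day` and `session_close_day` are made-up helper names, timestamps are assumed to be naive US Eastern times, market holidays are ignored, and treating a timestamp that lands exactly on the cutoff as part of the next session is an assumption.

```python
from datetime import date, datetime, time, timedelta

# 45 minutes before the 9:30am US equities open.
CUTOFF = time(8, 45)


def is_trading_day(d: date) -> bool:
    # Simplifying assumption: Monday to Friday, market holidays ignored.
    return d.weekday() < 5


def session_close_day(ts: datetime) -> date:
    """Return the trading day whose 8:45am cutoff closes the session containing ts."""
    d = ts.date()
    # If today's cutoff has already passed, or today is not a trading day,
    # the session closes at the cutoff of the next trading day.
    if not is_trading_day(d) or ts.time() >= CUTOFF:
        d += timedelta(days=1)
        while not is_trading_day(d):
            d += timedelta(days=1)
    return d


# 2020-01-03 is a Friday and 2020-01-06 is the following Monday.
friday_close = session_close_day(datetime(2020, 1, 3, 16, 0))   # Monday 2020-01-06
sunday_night = session_close_day(datetime(2020, 1, 5, 20, 0))   # Monday 2020-01-06
monday_open = session_close_day(datetime(2020, 1, 6, 9, 30))    # Tuesday 2020-01-07
```

Under these assumptions, Friday's close and anything dated over the weekend all land in the one session that a Monday pipeline run sees.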

Things get a little more interesting if you use a dataset that is expected to have records dated on the weekend. There aren't many examples like that on Q, but the sentiment datasets and the Insider Transactions dataset are two examples of this. I'll start by focusing on the sentiment datasets because the Insider Transactions dataset gets a little more complicated!

The sentiment datasets sometimes get sentiment scores on the weekend. Just like with every other dataset, pipeline slots every data point into a 'session'. However, when there are multiple data points in one session, Pipeline always surfaces the one with the most recent asof_date. If an asset has 3 sentiment scores come in with asof_dates on Friday, Saturday, and Sunday, the Sunday score will be slotted into the array for the Friday @ 8:45am --> Monday @ 8:45am session.
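The "most recent asof_date wins" rule can be sketched as follows. This is only an illustration in plain Python, not pipeline code: `session_close_day` and `latest_per_session` are hypothetical names, the scores and timestamps are invented, holidays are ignored, and timestamps are assumed to be naive US Eastern times.

```python
from datetime import date, datetime, time, timedelta

CUTOFF = time(8, 45)  # pipeline cutoff, 45 minutes before the US open


def session_close_day(ts: datetime) -> date:
    """Trading day whose cutoff closes the session containing ts (Mon-Fri only)."""
    d = ts.date()
    if d.weekday() >= 5 or ts.time() >= CUTOFF:
        d += timedelta(days=1)
        while d.weekday() >= 5:
            d += timedelta(days=1)
    return d


def latest_per_session(records):
    """records: iterable of (asof_datetime, value) pairs for one asset.

    Returns {session_close_day: value}, keeping only the value with the
    most recent asof timestamp in each session.
    """
    latest = {}
    for asof, value in records:
        key = session_close_day(asof)
        if key not in latest or asof > latest[key][0]:
            latest[key] = (asof, value)
    return {day: value for day, (_, value) in latest.items()}


# Invented sentiment scores dated Friday, Saturday and Sunday.
scores = [
    (datetime(2020, 1, 3, 12, 0), 2.5),   # Friday
    (datetime(2020, 1, 4, 12, 0), 0.4),   # Saturday
    (datetime(2020, 1, 5, 12, 0), -0.1),  # Sunday
]
surfaced = latest_per_session(scores)
# Only the Sunday score survives, in the session that closes on Monday 2020-01-06.
```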

For Insider Transactions, we do things a little differently since the dataset is actually implemented as a DataSetFamily. The best way to learn about how weekend data is handled in the case of Insider Transactions is to read the bottom section of the notebook posted in this thread (titled "Calendar days vs. trading days"). The fact that there could be multiple transactions per day for a single asset meant we needed a new type of API to enable people to aggregate the data into a single value per asset per day, since that is the format that pipeline computations expect as input.

Looking ahead a bit, we'd like to add support to pipeline to make it possible to express custom aggregations over data that doesn't naturally fit into the single value per asset per day model (really, this should be defined as single value per asset per session). I don't know exactly when this will happen and, to be honest, I think it will apply to future datasets rather than those that are already integrated. Still, I figured it was worth mentioning now to show that this is a problem we're thinking about but haven't solved yet! I'm curious, is there a particular dataset you are looking at where you expect the custom aggregation of weekend data to be helpful?

Thanks for the great question!


Regarding your original observation, the pipeline cutoff times I mentioned above are when data for the session are 'locked in'. For instance, if you try to run a pipeline on Saturday, the most recent session is the Friday @ 8:45am --> Monday @ 8:45am session which is still current. It's possible (and in some cases, likely) that new data will come in for that session. After 8:45am on Monday, any new data will go into the next session. To be honest, I'm surprised that you were able to run a pipeline on Saturday with an end date on the next Monday (I'm assuming 'next' here based on your observation, let me know if my assumption is wrong). We should probably fire a warning if you run a pipeline in a session that hasn't ended yet to avoid non-deterministic results as you observed. I'll file an issue that the current behavior is confusing. Thanks for reporting it.
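A check along those lines might look like the sketch below. It is purely hypothetical and not an existing pipeline feature: `session_is_locked` and `check_end_date` are invented names, times are assumed to be naive US Eastern, and holidays are not considered.

```python
import warnings
from datetime import date, datetime, time

CUTOFF = time(8, 45)  # pipeline cutoff, 45 minutes before the US open


def session_is_locked(end_date: date, now: datetime) -> bool:
    """True once the cutoff that closes end_date's session has passed."""
    return now >= datetime.combine(end_date, CUTOFF)


def check_end_date(end_date: date, now: datetime) -> bool:
    """Warn if a run would include a session that can still receive data."""
    locked = session_is_locked(end_date, now)
    if not locked:
        warnings.warn(
            f"Session ending {end_date} at {CUTOFF} is still open; "
            "results for that date may change if the pipeline is re-run."
        )
    return locked


# Saturday run asking for Monday 2020-01-06: the session is still open.
saturday_run = session_is_locked(date(2020, 1, 6), datetime(2020, 1, 4, 12, 0))
# Monday run after the cutoff: the session is locked in.
monday_run = session_is_locked(date(2020, 1, 6), datetime(2020, 1, 6, 9, 0))
```

Here `saturday_run` is `False` and `monday_run` is `True`, which matches the difference between the two runs described in the original post.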


Let me know if any of the above was confusing or if you have any further questions. It's a tricky topic!


I'm surprised that you were able to run a pipeline on Saturday with an end date on the next Monday

I was surprised too. But I'm glad it worked, because I'm not sure I would have noticed this behavior otherwise. The pipeline output produced from Friday's session looked very good. However, when I re-ran pipeline later and it used the Sunday session's data, the output was low quality, which is typical of what I see on Mondays.

I'm curious, is there a particular dataset you are looking at where you expect the custom aggregation of weekend data to be helpful?

For PsychSignals it would make a big difference, but I suspect Sentdex would also benefit. Here's my reasoning: Sundays typically aren't a very rich day for data, because not much breaking news comes out over the weekend and not many people are actively tweeting about stocks while the market is closed. Fridays, however, are an important day for gauging sentiment, because options typically expire on Fridays, and there are a lot of hijinks/manipulation/hedging around options expirations.

Having no way to access Friday sentiment data means we're losing a very rich session in favor of a very barren session. Fridays have potentially the richest data of any day of the week.
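For illustration, the kind of aggregation being asked for could look like the sketch below, done outside of pipeline on raw records. Everything here is an assumption: `session_close_day` and `aggregate_by_session` are made-up names, the scores are invented, holidays are ignored, and timestamps are naive US Eastern times.

```python
from collections import defaultdict
from datetime import date, datetime, time, timedelta
from statistics import mean

CUTOFF = time(8, 45)  # pipeline cutoff, 45 minutes before the US open


def session_close_day(ts: datetime) -> date:
    """Trading day whose cutoff closes the session containing ts (Mon-Fri only)."""
    d = ts.date()
    if d.weekday() >= 5 or ts.time() >= CUTOFF:
        d += timedelta(days=1)
        while d.weekday() >= 5:
            d += timedelta(days=1)
    return d


def aggregate_by_session(records, agg):
    """Group (asof_datetime, value) pairs by session and reduce each group with agg.

    Values are passed to agg in chronological order.
    """
    buckets = defaultdict(list)
    for asof, value in sorted(records, key=lambda r: r[0]):
        buckets[session_close_day(asof)].append(value)
    return {day: agg(values) for day, values in buckets.items()}


# Invented sentiment scores for one asset over a weekend.
weekend = [
    (datetime(2020, 1, 5, 12, 0), -1.0),  # Sunday
    (datetime(2020, 1, 3, 12, 0), 3.0),   # Friday
    (datetime(2020, 1, 4, 12, 0), 1.0),   # Saturday
]

# Option 1: combine Friday + Saturday + Sunday into one value.
combined = aggregate_by_session(weekend, mean)
# Option 2: keep the earliest value in the session (Friday's, in this data).
earliest = aggregate_by_session(weekend, lambda values: values[0])
```

With this data, `combined` maps Monday 2020-01-06 to 1.0 and `earliest` maps it to 3.0. Note that "earliest" only means Friday when a Friday record after the cutoff actually exists.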

Are Friday and Saturday session data not included in SimpleMovingAverages either?