I am observing another anomaly with this dataset lately. Several times in the past few weeks, I noticed that some stocks in this dataset had the wrong BusinessDaysSincePreviousEarnings value. I understand that this dataset is probably collected through crowd-sourcing and that mistakes are always possible. But the frequency of mistakes has risen so dramatically recently that I am starting to believe there might be a systematic problem in the dataset. I have been working with this dataset for a year and have never noticed this many mistakes.
For example, today (Feb 09), BusinessDaysSincePreviousEarnings was set to zero for the following stocks, even though they released their reports yesterday and the value should have been 1:
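To make the expected value concrete, here is a minimal sketch. It assumes the factor counts business days from the announcement date up to, but not including, the current session; the dataset's exact convention and its holiday handling are not documented here. `business_days_since` is a hypothetical helper, and the year 2016 is used only for illustration.

```python
import numpy as np

def business_days_since(announcement_date, today):
    """Count weekdays in [announcement_date, today).

    Hypothetical helper: assumes a plain Monday-to-Friday calendar with
    no exchange holidays, which may differ from the dataset's convention.
    """
    return int(np.busday_count(announcement_date, today))

# Illustrative dates; the year 2016 is an assumption.
print(business_days_since('2016-02-08', '2016-02-09'))  # 1 (reported yesterday)
print(business_days_since('2016-02-09', '2016-02-09'))  # 0 (reported today)
print(business_days_since('2016-02-05', '2016-02-08'))  # 1 (Friday -> Monday)
```

Under this assumption, a value of 0 on Feb 09 would only be correct for a stock that reported on Feb 09 itself.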
NSIT, ZEN, CRAY, FTK, PAYC, BLKB, ALNY, TTMI, PPC, IMPV, PCMI.
Note that I have neither cherry-picked these stocks nor checked all the stocks that reported today; I found these almost by chance. That's why I suspect a bigger problem in this dataset.
Here is the code that you can run to test it. Run the backtest for two days from Feb 08 to Feb 09:
from quantopian.pipeline import Pipeline
from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline.data.eventvestor import EarningsCalendar
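from quantopian.pipeline.factors.eventvestor import (
    BusinessDaysSincePreviousEarnings,
)

def initialize(context):
    # Create and attach an empty Pipeline.
    pipe = Pipeline()
    pipe = attach_pipeline(pipe, name='pipeline')
    # Construct Factors.
    previous_earning = BusinessDaysSincePreviousEarnings()
    recent_earning_report = (previous_earning < 1)
    # Add the factor as column 'pe', which get_stocks reads.
    pipe.add(previous_earning, 'pe')
    # Remove rows for which the Filter returns False (keeps only value 0).
    pipe.set_screen(recent_earning_report)
    # Print the values once per trading day at the open.
    schedule_function(get_stocks, date_rules.every_day(), time_rules.market_open())

def get_stocks(context, data):
    results = pipeline_output('pipeline')
    # Show the factor value for each of the stocks listed above.
    print(results['pe'][symbols('NSIT', 'ZEN', 'CRAY', 'FTK', 'PAYC', 'BLKB', 'ALNY', 'TTMI', 'PPC', 'IMPV', 'PCMI')])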
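To check every stock rather than a handful found by chance, one option is to collect the `pe` values the algorithm prints and compare them offline against announcement dates from an independent source. The sketch below is a minimal illustration: `find_mismatches` is a hypothetical helper, it assumes a plain Monday-to-Friday calendar with no exchange holidays, and the year 2016 in the sample dates is an assumption. The sample values are the ones described above for two of the listed stocks.

```python
import numpy as np

def find_mismatches(reported, announcement_dates, today):
    """Return {ticker: (reported, expected)} wherever the two disagree.

    reported: dict of ticker -> BusinessDaysSincePreviousEarnings from the dataset
    announcement_dates: dict of ticker -> 'YYYY-MM-DD' of the most recent report
    Assumes a plain Mon-Fri calendar (no exchange holidays).
    """
    mismatches = {}
    for ticker, value in reported.items():
        if ticker not in announcement_dates:
            continue  # no independent date to check this ticker against
        expected = int(np.busday_count(announcement_dates[ticker], today))
        if value != expected:
            mismatches[ticker] = (value, expected)
    return mismatches

# Values as described above; the year 2016 is assumed for illustration.
reported = {'NSIT': 0, 'ZEN': 0}
dates = {'NSIT': '2016-02-08', 'ZEN': '2016-02-08'}
print(find_mismatches(reported, dates, '2016-02-09'))  # {'NSIT': (0, 1), 'ZEN': (0, 1)}
```

Tickers without an independent announcement date are skipped rather than flagged, so a clean result only covers the stocks you actually have dates for.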