First, I want to verify that you are using the EventVestor EarningsCalendar dataset. That seems to be the case.
There are four fields typically used from this dataset:
next_asof_date - datetime64[ns]
previous_asof_date - datetime64[ns]
next_announcement - datetime64[ns]
previous_announcement - datetime64[ns]
There are really just two pieces of data: previous_announcement and next_announcement. Both are dates, and each has an associated asof_date. For example, every time a new earnings announcement date is posted, it gets a new associated asof_date, which is the date it was posted. Below are the four fields for AAPL during 2018.
PIPELINE DATE              next_ann    next_asof_date  prev_ann    prev_asof_date
2018-01-02 00:00:00+00:00  NaT         NaT             2017-11-02  2017-10-04
2018-01-04 00:00:00+00:00  2018-02-01  2018-01-03      2017-11-02  2017-10-04
2018-02-01 00:00:00+00:00  2018-02-01  2018-01-03      2018-02-01  2018-01-03
2018-02-02 00:00:00+00:00  NaT         NaT             2018-02-01  2018-01-03
2018-04-04 00:00:00+00:00  2018-05-01  2018-04-03      2018-02-01  2018-01-03
2018-05-01 00:00:00+00:00  2018-05-01  2018-04-03      2018-05-01  2018-04-03
2018-05-02 00:00:00+00:00  NaT         NaT             2018-05-01  2018-04-03
2018-07-05 00:00:00+00:00  2018-07-31  2018-07-03      2018-05-01  2018-04-03
2018-07-31 00:00:00+00:00  2018-07-31  2018-07-03      2018-07-31  2018-07-03
2018-08-01 00:00:00+00:00  NaT         NaT             2018-07-31  2018-07-03
2018-10-04 00:00:00+00:00  2018-11-01  2018-10-03      2018-07-31  2018-07-03
2018-11-01 00:00:00+00:00  2018-11-01  2018-10-03      2018-11-01  2018-10-03
2018-11-02 00:00:00+00:00  NaT         NaT             2018-11-01  2018-10-03
2018-11-06 00:00:00+00:00  2019-01-31  2018-11-02      2018-11-01  2018-10-03
Since you are seeing the five dates 1/3/2018, 4/3/2018, 7/3/2018, 10/3/2018, and 11/2/2018, you must be looking at the next_asof_date field. That's probably not what you want. I would think you want the actual announcement dates (not the dates on which the company posted when it would announce). In any case, the issue and the fix are the same.
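A quick way to check this is to count distinct values in each field. The pandas sketch below assumes the two relevant columns are copied by hand from the AAPL table above (not loaded from the dataset itself), with one entry per pipeline date and None standing in for NaT:

```python
import pandas as pd

# Two columns of the AAPL table above, one entry per pipeline date (None -> NaT).
next_asof_date = [None, "2018-01-03", "2018-01-03", None, "2018-04-03",
                  "2018-04-03", None, "2018-07-03", "2018-07-03", None,
                  "2018-10-03", "2018-10-03", None, "2018-11-02"]
previous_announcement = ["2017-11-02", "2017-11-02", "2018-02-01", "2018-02-01",
                         "2018-02-01", "2018-05-01", "2018-05-01", "2018-05-01",
                         "2018-07-31", "2018-07-31", "2018-07-31", "2018-11-01",
                         "2018-11-01", "2018-11-01"]

df = pd.DataFrame({
    "next_asof_date": pd.to_datetime(next_asof_date),
    "previous_announcement": pd.to_datetime(previous_announcement),
})

# Distinct dates on which a *next* announcement date was posted.
posted = df["next_asof_date"].dropna().unique()

# Distinct dates on which AAPL actually announced earnings during 2018.
prev = df["previous_announcement"]
actual_2018 = prev[prev.dt.year == 2018].unique()

print(len(posted), len(actual_2018))  # 5 4
```

Five distinct posting dates against four actual 2018 announcement dates is the mismatch described here.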
One issue is that a company may say it is going to release earnings on one date, then change its mind and release them on a different date. Each change is a new post, which can result in more than four dates a year. Additionally, there will typically be five or more next_announcement dates posted in a single year: four for the current year and one for the following year. That's the situation with AAPL. On 11/02/2018 they stated they would announce their next earnings on 01/31/2019, which added a fifth earnings-announcement post to 2018.
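To see where that fifth 2018 post comes from, here is a small pandas sketch. It assumes only the five non-empty (next_asof_date, next_announcement) pairs from the AAPL table above, entered by hand, and compares the year each date was posted with the year the announcement actually falls in:

```python
import pandas as pd

# (next_asof_date, next_announcement) pairs posted for AAPL during 2018,
# copied from the table above with the NaT rows left out.
posts = pd.DataFrame({
    "next_asof_date": pd.to_datetime(
        ["2018-01-03", "2018-04-03", "2018-07-03", "2018-10-03", "2018-11-02"]),
    "next_announcement": pd.to_datetime(
        ["2018-02-01", "2018-05-01", "2018-07-31", "2018-11-01", "2019-01-31"]),
})

# Counting by the year the date was *posted* gives five entries for 2018 ...
n_posted_2018 = int((posts["next_asof_date"].dt.year == 2018).sum())

# ... but only four of those announcements actually fall in 2018.
n_falling_in_2018 = int((posts["next_announcement"].dt.year == 2018).sum())

print(n_posted_2018, n_falling_in_2018)  # 5 4
```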
So, what to do? If you are looking for the four most recent earnings dates, try something like this:
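# Copy the security (level 1 of the (date, security) index) into a column
pipe_output['stock'] = pipe_output.index.get_level_values(1)
# Keep only the last row for each (stock, previous_announcement) pair
last_announcements = pipe_output.drop_duplicates(['stock', 'previous_announcement'], keep='last')
# For each stock, keep the four most recent actual announcement dates
last_4_announcements = last_announcements.groupby(level=1).previous_announcement.nlargest(4)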
This first adds a new column to the data frame that is a copy of the security level of the index. That just makes it easier to use the drop_duplicates method. drop_duplicates then keeps only the last row for each combination of security and previous_announcement date. Finally, the code groups by security and takes the four largest previous_announcement dates. Those are the last four dates on which each company actually announced earnings.
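The steps above can be tried on a tiny hand-built frame. This sketch assumes the pipeline output has a (date, security) MultiIndex with the security at level 1; it uses a subset of the AAPL rows from the table above and adds a hypothetical security 'XYZ' with made-up dates purely to show the per-security grouping:

```python
import pandas as pd

# (pipeline date, security, previous_announcement). AAPL rows come from the
# table above; 'XYZ' and its dates are hypothetical filler.
records = [
    ("2018-01-02", "AAPL", "2017-11-02"),
    ("2018-02-01", "AAPL", "2018-02-01"),
    ("2018-02-02", "AAPL", "2018-02-01"),
    ("2018-05-01", "AAPL", "2018-05-01"),
    ("2018-05-02", "AAPL", "2018-05-01"),
    ("2018-07-31", "AAPL", "2018-07-31"),
    ("2018-11-01", "AAPL", "2018-11-01"),
    ("2018-11-06", "AAPL", "2018-11-01"),
    ("2018-03-01", "XYZ", "2018-02-15"),
    ("2018-06-01", "XYZ", "2018-05-15"),
]
pipe_output = pd.DataFrame(records, columns=["date", "security", "previous_announcement"])
pipe_output["date"] = pd.to_datetime(pipe_output["date"])
pipe_output["previous_announcement"] = pd.to_datetime(pipe_output["previous_announcement"])
pipe_output = pipe_output.set_index(["date", "security"])

# Same three steps as above; integer level 1 works whether or not levels are named.
pipe_output["stock"] = pipe_output.index.get_level_values(1)
last_announcements = pipe_output.drop_duplicates(
    ["stock", "previous_announcement"], keep="last")
last_4 = last_announcements.groupby(level=1).previous_announcement.nlargest(4)

# The security is always the innermost index level of the result, so select on it.
aapl_last_4 = last_4[last_4.index.get_level_values(-1) == "AAPL"]
print(aapl_last_4.dt.strftime("%Y-%m-%d").tolist())
# ['2018-11-01', '2018-07-31', '2018-05-01', '2018-02-01']
```

The groupby result may carry the security as an extra outer index level, which is why the sketch filters on the innermost level rather than assuming a particular index shape.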
Hope that helps. See the attached notebook.