Event Study Tearsheet

Updated 6/16 - The Event Study notebook has been completely revamped with help from Luca, one of our community members. It has also been modularized, documented, and cleaned up so you can tweak it to your specifications.

To run it, you call run_event_study with these parameters:

def run_event_study(event_data, date_column='asof_date',
                    start_date='2007-01-01', end_date='2014-01-01',
                    benchmark_sid='SPY', days_before=10, days_after=10,
                    top_liquid=500, use_liquid_stocks=True):
    """
    Calculates simple & cumulative returns for events and plots stock price
    movement before and after the event date.

    Parameters
    ----------
    event_data : pd.DataFrame
        DataFrame that contains the events data with date and sid columns as
        a minimum. See interactive tutorials on quantopian.com/data
    date_column : str
        Name of the date column to be used for the event, e.g. asof_date
    start_date, end_date : datetime
        Start and end dates used as the cutoff for the event study
    benchmark_sid : str, int, or zipline.assets._assets.Equity
        Security to be used as benchmark for returns calculations. See get_returns
    days_before, days_after : int
        Number of days before/after the event to calculate returns for
    top_liquid : int
        If use_liquid_stocks is True, top_liquid determines the top N stocks
        to return, ranked on liquidity
    use_liquid_stocks : bool
        If True, filter the securities found in event_data according to the
        filters found in filter_universe
    """


This event study looks at buyback announcements.



Seong,

Thank you for your NB, I used it here for evaluating technical patterns. I cleaned up and packed your code into a single function so that it can be easily reused elsewhere. Hopefully this will be useful to other people.

Also, I'd like to share with you two shortcomings I found in the NB (which I fixed):

1 - The standard deviation bars should be calculated from the event date, not from the beginning of the cumulative return series. The standard deviation shouldn't depend on how far in the past we decide to plot the cumulative series. We are interested in what happens AFTER the event, so the event date should be the date from which we start measuring how far the prices deviate from each other.
To fix that, I vertically shifted the cumulative plot so that it equals 0 at day 0 (I only modified the get_returns function to achieve this). This has the side effect of making it very easy to compare the returns and abnormal returns plots.
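The vertical shift described here amounts to subtracting the day-0 value from the whole cumulative series; a small self-contained sketch with fabricated numbers:

```python
import pandas as pd

# Cumulative returns indexed by day relative to the event (day 0 = event date).
cum_returns = pd.Series([0.05, 0.07, 0.06, 0.09, 0.12], index=[-2, -1, 0, 1, 2])

# Shift the whole series so it equals 0 at day 0; values after the event then
# read directly as gain/loss since the event date, and the post-event standard
# deviation no longer depends on how far back the plotted series starts.
rebased = cum_returns - cum_returns.loc[0]
print(rebased.loc[0])               # 0.0
print(round(rebased.loc[2], 2))     # 0.06 gain since the event date
```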

2 - I believe we should use the open price instead of the close price for the event study. This is because a hypothetical algorithm would place orders at market open (not after the market closes, anyway), so it makes more sense to study the change in open prices.

Please let me know what you think.

Note: the NB begins with an explanation of the technical patterns we want to study. The event study is at the bottom of the NB and it is independent of the initial part of the NB so that it can be moved to other NBs.


Luca,

Thank you for 'bumping' this post up into the recent category. Anyone and everyone interested in technical trading should study this notebook and use it as a platform to test the efficacy of potential technical signal ideas. Also, very good catch about using opening prices. Small change/impact, but it's all about the details. I also agree your std dev calculation is more correct. The cleaned-up code is excellent.

Seong, it's been a while since you originally posted this. I was surprised there hadn't been any replies before now. This is a gem. You deserve more than kudos!

This post should be tagged 'interesting'.

Seong, another point to add to the discussion: beta calculation. The calc_beta function uses numpy.polyfit to calculate beta, but I tried swapping that with scipy.stats.linregress (it is used in the new pipeline factor RollingLinearRegressionOfReturns to calculate beta, so I assumed it was well tested).

The results are not as similar as I hoped.

I guess Q has a lot of experience calculating beta; could you suggest the best method to use?

def calc_beta(...):
    [...]
    m, b = np.polyfit(bench_prices, stock_prices, 1)
    return m


vs

def calc_beta(...):
    [...]
    regr_results = scipy.stats.linregress(y=stock_prices, x=bench_prices)
    #alpha = regr_results[1]
    beta = regr_results[0]
    #r_value = regr_results[2]
    #p_value = regr_results[3]
    #stderr = regr_results[4]
    return beta
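For what it's worth, on clean, aligned inputs both estimators compute the same ordinary-least-squares slope, so a quick sanity check on synthetic data (not Quantopian prices; the series names below are fabricated) is a way to confirm that large discrepancies come from the inputs (NaNs, misaligned series) rather than the fitting method:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
bench_prices = rng.normal(0.0, 0.01, 500)                         # synthetic series
stock_prices = 1.5 * bench_prices + rng.normal(0.0, 0.002, 500)   # true slope ~1.5

m, b = np.polyfit(bench_prices, stock_prices, 1)                  # OLS via polyfit
beta = stats.linregress(x=bench_prices, y=stock_prices).slope     # OLS via linregress

# Both are least-squares slopes, so they agree to machine precision.
print(np.isclose(m, beta))  # True
```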


Hi Luca,

I'm not too familiar with the distinctions between the two, but I would suggest using linregress, as from my reading it provides a number of additional statistics and speed improvements over polyfit.

This work is awesome, by the way.

Seong

Ok then, here is a new version with the updated beta calculation.


Updated with a much more memory-efficient version that will run across a longer timespan.

Luca,

I've taken some of your changes along with the original framework and cleaned up the code quite a bit. I've also included a new, more efficient way of calculating price returns, using a cumulative price change instead of individual (x - y)/x calculations.

To run it, you call run_event_study with these parameters:

def run_event_study(event_data, date_column='asof_date',
                    start_date='2007-01-01', end_date='2014-01-01',
                    benchmark_sid='SPY', days_before=10, days_after=10,
                    top_liquid=500, use_liquid_stocks=True):
    """
    Calculates simple & cumulative returns for events and plots stock price
    movement before and after the event date.

    Parameters
    ----------
    event_data : pd.DataFrame
        DataFrame that contains the events data with date and sid columns as
        a minimum. See interactive tutorials on quantopian.com/data
    date_column : str
        Name of the date column to be used for the event, e.g. asof_date
    start_date, end_date : datetime
        Start and end dates used as the cutoff for the event study
    benchmark_sid : str, int, or zipline.assets._assets.Equity
        Security to be used as benchmark for returns calculations. See get_returns
    days_before, days_after : int
        Number of days before/after the event to calculate returns for
    top_liquid : int
        If use_liquid_stocks is True, top_liquid determines the top N stocks
        to return, ranked on liquidity
    use_liquid_stocks : bool
        If True, filter the securities found in event_data according to the
        filters found in filter_universe
    """


While I'll either be updating this thread or creating a new one to feature the notebook, here it is looking at Buyback Announcements.


Beautiful, clean, modular code. I am impressed. I especially like the liquid stock filtering, that is a very useful addition.

I found some serious bugs though (refactoring without unit tests is always a pain in the neck, I know). If you upload the project to GitHub, I can submit my fixes there.

Well, here is the new code anyway. It is very likely I made some mistakes too, so if someone feels like reviewing it, that would be nice.

List of changes:

get_cum_returns:

• days_before/days_after arguments should be treated as trading days, not calendar days (though you might disagree on this); otherwise each event might get a cumulative return series of a different length (depending on how many non-trading days fall within days_before/days_after)

• bug fix: when sid == benchmark the function breaks (and it also returns a DataFrame instead of a Series, breaking the caller code)

• bug fix: abnormal returns were calculated by subtracting the benchmark cumulative returns from the sid cumulative returns, but we should subtract the daily returns and build the cumulative returns after that

• bug fix: the cumulative return series calculation from the price series is wrong:

cum_returns = price.pct_change().cumsum().fillna(0)   # wrong
cum_returns = (price.pct_change().fillna(0) + 1).cumprod() - 1 # correct


get_returns:

• the start_date/end_date calculation (the date range used to load daily prices) is wrong, and we end up with fewer days of data than get_cum_returns requires

• valid_sids should be populated even when use_liquid_stocks is False; otherwise we'll discard all the events in the run_event_study function

• when calling get_price we have to avoid sid duplicates (e.g. when a sid is the benchmark); otherwise the returned DataFrame will contain duplicated columns, which would break get_cum_returns

• benchmark should be passed to get_cum_returns too

run_event_study:

• benchmark must be a sid/Equity, not a string, as it will be used to index the price DataFrame

filter_universe:

• added an option to filter by price and volume too (this is something I needed for my research, but I think it's a common requirement as well)
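The cumulative-return and abnormal-return fixes from the list above can be verified with a small sketch (fabricated prices):

```python
import pandas as pd

price = pd.Series([100.0, 110.0, 99.0, 108.9])

# Wrong: summing simple returns ignores compounding.
wrong = price.pct_change().cumsum().fillna(0)

# Correct: compound the daily returns.
correct = (price.pct_change().fillna(0) + 1).cumprod() - 1

print(round(correct.iloc[-1], 3))   # 0.089, which matches 108.9 / 100 - 1
print(round(wrong.iloc[-1], 3))     # 0.1, which does not

# Abnormal returns: subtract the *daily* returns first, then compound,
# rather than subtracting one cumulative series from the other.
bench = pd.Series([50.0, 51.0, 50.0, 52.0])
abnormal_daily = price.pct_change().fillna(0) - bench.pct_change().fillna(0)
abnormal_cum = (abnormal_daily + 1).cumprod() - 1
```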

Thanks Luca. This is great. I'll update the main post with your changes after reviewing.

Do you mind summarizing the differences between event study 1 and 2 a bit?

Event study 2 is what I was working on (the event study applied to a technical pattern). I replaced the old event study code with the one in your new NB and noticed something was wrong, so I fixed it. Finally, I added event study 1, which is your original case study, to make the comparison easier.

Gotcha, thanks for providing that.

I've been unsuccessful in trying to speed up the run-time when using the use_liquid_stocks flag. At the moment, we're running a pipeline output (filtered for security) for each event, and for any given run (10,000 max events if using interactive), it takes over an hour.

Any thoughts on a better way to approach this?

Seong

Luca,

I've tried making some speed improvements that allow you to use the use_liquid_stocks flag. Now, instead of running a pipeline output every day, you only run it once over a 252-day time window.

This looks at negative earnings surprises over 2013 to 2014 with a surprise < -50% (so really bad surprises) for the top 500 most liquid securities. The general observation is that the market seems to revert in the initial few days, but I'm planning on running this over a longer time period to see how it holds up.
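The chunking idea can be sketched independently of Quantopian: split the study range into windows of roughly 252 trading days (approximated with business days here) and issue one pipeline call per window. run_pipeline and make_pipeline in the comment are Quantopian calls; date_chunks is a hypothetical helper:

```python
import pandas as pd

def date_chunks(start, end, days=252):
    """Split [start, end] into consecutive windows of roughly `days`
    trading days (approximated here with pandas business days)."""
    all_days = pd.bdate_range(start, end)
    chunks = []
    for i in range(0, len(all_days), days):
        window = all_days[i:i + days]
        chunks.append((window[0], window[-1]))
    return chunks

chunks = date_chunks('2013-01-01', '2014-12-31')
print(len(chunks))  # 3 windows instead of one pipeline run per event date
# for start, end in chunks:
#     liquid = run_pipeline(make_pipeline(), start, end)  # Quantopian call
```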


Running pipeline once per event study (in this case once per year) seems a reasonable workaround. I wonder if run_pipeline has some warm-up delay, as happens when backtesting; that would explain the slowness. A possible solution would be to run pipeline for the whole event study date range and save the resulting DataFrame to perform the daily top-volume stock filtering.

Hi, I made the processing run about ten times faster (from 1000 seconds to 100 seconds for 2016) by grouping by date and calling get_price for all symbols on that date.
Can we set up source control for this code? Maybe in alphalens or gist.github.com.
Can notebooks be diff-ed? If not, it will be a .py file instead.
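The grouping-by-date speedup can be sketched as follows; the events are fabricated, and get_price in the comment stands in for the notebook's price-fetching call:

```python
import pandas as pd

# Fabricated events: two share a date, so they can share one price request.
events = pd.DataFrame({
    'asof_date': pd.to_datetime(['2016-03-01', '2016-03-01', '2016-03-02']),
    'sid': [24, 5061, 8554],
})

# One batched request per distinct date instead of one request per event.
for date, group in events.groupby('asof_date'):
    sids = group['sid'].tolist()
    print(date.date(), sids)
    # prices = get_price(sids, date)  # one call covering every sid on `date`
```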

Ph Quam

Would you mind posting the notebook here? I'm not aware of a code diff for ipython notebooks but can review and replace the main one in this thread with yours after some feedback

Seong

@Seong Lee
I have also implemented grouping by several weeks. Now there are 3 different implementations of get_returns() under the same name.
calc_beta is not used correctly; it has the same issue as std() above. The beta calculation should not include days_before.
I even doubt whether beta should be calculated per sid or for the whole factor's returns. My reasoning is that we compare a tested strategy's returns against the returns of a buy-and-hold strategy on the benchmark instrument. The dates on which the factor generates buying signals are part of the tested factor, and they should not affect the performance of the benchmark.
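Restricting the beta regression to the post-event window, as suggested above, could look like this sketch (synthetic returns; linregress swapped in per the earlier discussion):

```python
import numpy as np
from scipy import stats

days = np.arange(-10, 11)          # event window: days_before=10, days_after=10
rng = np.random.default_rng(1)
stock_returns = rng.normal(0.0, 0.01, days.size)   # synthetic daily returns
bench_returns = rng.normal(0.0, 0.01, days.size)

post = days >= 0                   # keep day 0 onward, excluding days_before
beta = stats.linregress(x=bench_returns[post], y=stock_returns[post]).slope
```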


There was a minor bug in my code: a global variable df was used instead of the event_data parameter.
I added the notebook to my alphalens fork https://github.com/volconst/alphalens/commits/master/alphalens/event_study_nb.py
The fix is there, and I will commit more changes.

This line in get_returns():
prices = prices.shift(-1)
looked suspicious to me before, and now I have a test that shows it is wrong.
I create a single event with 2 prices, 1 and 1.2, and expect to see a relative change of 0.2 = (1.2 - 1) / 1 at day 1. However, this line leads to a change of -0.16 on day 1. I am attaching the test based on the original notebook from this thread, but will post the fix based on the HEAD of my branch, which contains all the performance optimizations I have added.
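The expected behavior described here can be written as a tiny standalone check (the shift line quoted from get_returns() is reproduced to show what it discards):

```python
import pandas as pd

prices = pd.Series([1.0, 1.2])     # day 0 and day 1 of a single event
returns = prices.pct_change()
print(round(returns.iloc[1], 10))  # 0.2, i.e. (1.2 - 1) / 1 on day 1

# The prices.shift(-1) line moves the day-1 price onto day 0 and drops it
# from day 1 entirely, so the +0.2 change can no longer land where expected.
shifted = prices.shift(-1)         # becomes [1.2, NaN]
```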


This is without the shifting: https://github.com/volconst/alphalens/commit/1a7479d099887ecdf8bc9e357e22af0a1db26624
The good thing is that now there is more exploitable price movement at day 0, more in 2014 than in 2013. Can this be due to data being recorded and delivered faster or more accurately?


@Seong (and Luca and Ph Quam),

Thanks for this! It's really great! I have two simple questions...

1. Are days before and days after calendar days (including weekends, holidays, etc.) or only days the markets are open?
2. Which is the best notebook to clone? There seem to be several referenced here, each with a different bug or fix (open vs. closing price, global var vs. local var). I'm not sure whether I should use the first or the last notebook.

Thanks!

@Brett

1. Those are only days the markets are open. The code looks at the previous/next price information provided, not the previous/next calendar day.
2. I would go with Seong's version, as it was updated several times, but you might like to check the changes Ph Quam introduced to increase the speed.

As a side note, when Alphalens is updated on Quantopian, we will be able to use the event study embedded in there. Have a look at the example.

Hi Luca!

Why is it that all the graphs showing cumulative abnormal returns before and after are at zero on the event date? Is that because the graph shows two CARs on one graph: the CAR for before the event and the CAR for after the event?

Thanks

The returns are 0 at the event date because the plots treat the event date as the reference and compute the previous and following returns relative to that date. This way you can easily see the percentage gain/loss from the event date.

Thanks Luca. That makes sense, but I haven't seen any other event studies do that. I'm not an event study expert, though :)

Well, there might be other interesting ways of displaying that information. What would you have expected from that plot?

I think this is probably the best way of viewing the data for the purpose of finding events that result in positive CAR. Like I said, I'm new to event studies, and my initial investigation has led me to more generic types of event studies.

As an example, https://www.eventstudytools.com/event-study-methodology

These tools show the CAR for the whole event window.

That said, the display in this notebook seems pretty good for purposes here.

There's an event study built into AlphaLens. It might be helpful.


"""
create_full_tear_sheet(factor_data, long_short=False, group_neutral=False, by_group=False)
create_event_returns_tear_sheet(factor_data, prices, avgretplot=(3, 11),
long_short=False, group_neutral=False, by_group=False)
factor_data : pd.DataFrame - MultiIndex
A MultiIndex DataFrame indexed by date (level 0) and asset (level 1),
containing the values for a single alpha factor, forward returns for
each period, the factor quantile/bin that factor value belongs to,
and (optionally) the group the asset belongs to.
- See full explanation in utils.get_clean_factor_and_forward_returns
long_short : bool
Should this computation happen on a long short portfolio? if so, then
mean quantile returns will be demeaned across the factor universe.
Additionally factor values will be demeaned across the factor universe
when factor weighting the portfolio for cumulative returns plots
group_neutral : bool
Should this computation happen on a group neutral portfolio? if so,
returns demeaning will occur on the group level.
Additionally each group will weight the same in cumulative returns
plots
by_group : bool
If True, display graphs separately for each group.
"""