Event Study Tearsheet

Updated 6/16 - The Event Study notebook has been completely revamped with help from Luca, one of our community members. It's also been modularized, documented, and cleaned up so you can tweak it to your specifications.

To run it you call run_event_study with these parameters:

def run_event_study(event_data, date_column='asof_date',
                    start_date='2007-01-01', end_date='2014-01-01',
                    benchmark_sid='SPY', days_before=10, days_after=10, top_liquid=500,
                    use_liquid_stocks=True):
    """
    Calculates simple & cumulative returns for events and plots stock price movement
    before and after the event date.

    Parameters
    ----------
    event_data : pd.DataFrame
        DataFrame that contains the event data, with date and sid columns at
        a minimum. See the interactive tutorials on quantopian.com/data
    date_column : str
        Name of the column that holds the event date, e.g. `asof_date`
    start_date, end_date : datetime
        Start and end dates used as cutoffs for the event study
    benchmark_sid : string, int, or zipline.assets._assets.Equity object
        Security to be used as the benchmark for returns calculations. See `get_returns`
    days_before, days_after : int
        Number of days before/after the event to calculate returns for
    top_liquid : int
        If use_liquid_stocks is True, top_liquid determines how many of the
        most liquid stocks to keep
    use_liquid_stocks : bool
        If True, securities in `event_data` are filtered according to the
        filters found in `filter_universe`
    """

This event study looks at buyback announcements.

(Notebook attached; preview unavailable.)

21 responses

Seong,

Thank you for your NB; I used it here to evaluate technical patterns. I cleaned up your code and packed it into a single function so that it can easily be used elsewhere. Hopefully this will be useful to other people.

Also, I'd like to share with you two shortcomings I found in the NB (that I fixed):

1 - The standard deviation bars should be calculated from the event date, not from the beginning of the cumulative return series. The standard deviation shouldn't depend on how far in the past we decide to plot the cumulative series. We are interested in what happens AFTER the event, so the event date should be the point from which we start measuring how far the prices deviate from each other.
To fix that, I vertically shifted the cumulative plot so that it equals 0 at day 0 (I only modified the get_returns function to achieve that). This has the side effect of making it very easy to compare the returns and abnormal-returns plots.
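To make the re-anchoring concrete, here is a small sketch of the idea (my own illustration under assumed data shapes, not the exact notebook code): shift each cumulative series so day 0 equals zero, then measure dispersion only from day 0 forward.

import pandas as pd

def anchor_at_event(cum_returns):
    # cum_returns is indexed by relative day (e.g. -10..10);
    # subtracting the day-0 value pins the series to 0 at the event.
    return cum_returns - cum_returns.loc[0]

def event_mean_and_std(all_events):
    # all_events: one column per event, indexed by relative day.
    anchored = all_events.apply(anchor_at_event)
    mean = anchored.mean(axis=1)
    std = anchored.loc[0:].std(axis=1)  # std dev bars start at the event date
    return mean, std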

2 - I believe we should use the open price instead of the close price for the event study. This is because a hypothetical algorithm would place orders at market open (not after the market closes, anyway), so it makes more sense to study the change in open prices.

Please let me know what you think.

Note: the NB begins with an explanation of the technical patterns we want to study. The event study is at the bottom of the NB and it is independent of the initial part of the NB so that it can be moved to other NBs.

(Notebook attached; preview unavailable.)

Luca,

Thank you for 'bumping' this post up into the recent category. Anyone and everyone interested in technical trading should study this notebook and use it as a platform to test the efficacy of potential technical signal ideas. Also, very good catch about using opening prices. Small change/impact, but it's all about the details. I also agree that your std dev calculation is more correct. The cleaned-up code is excellent.

Seong, it's been a while since you originally posted this. I was surprised there hadn't been any replies before now. This is a gem. You deserve more than kudos!

This post should be tagged 'interesting'.

Seong, another point to add to the discussion: beta calculation. The calc_beta function uses numpy.polyfit to calculate beta, but I tried swapping that for scipy.stats.linregress (it is used in the new pipeline factor RollingLinearRegressionOfReturns to calculate beta, so I assumed it was well tested).

The results are not as similar as I hoped.

I guess Q has a lot of experience in calculating beta; could you suggest the best method to use?

def calc_beta(...):  
    [...]  
    m, b = np.polyfit(bench_prices, stock_prices, 1)  
    return m  

vs

def calc_beta(...):  
    [...]  
    regr_results = scipy.stats.linregress(y=stock_prices, x=bench_prices)  
    #alpha = regr_results[1]  
    beta = regr_results[0]  
    #r_value = regr_results[2]  
    #p_value = regr_results[3]  
    #stderr = regr_results[4]  
    return beta  
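For what it's worth, both routines fit an ordinary least squares line, so on identical inputs the slopes agree to numerical precision; any divergence in the notebook presumably came from the surrounding code feeding them different data. A quick standalone check on synthetic returns:

import numpy as np
import scipy.stats

rng = np.random.RandomState(0)
bench_returns = rng.normal(0.0, 0.01, 250)                         # synthetic benchmark
stock_returns = 1.2 * bench_returns + rng.normal(0.0, 0.005, 250)  # true beta ~1.2

beta_polyfit, _ = np.polyfit(bench_returns, stock_returns, 1)      # slope, intercept
beta_linregress = scipy.stats.linregress(x=bench_returns, y=stock_returns)[0]

print(beta_polyfit, beta_linregress)  # both ~1.2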

Hi Luca,

I'm not too familiar with the distinctions between the two, but I would suggest using linregress, as from my reading it provides a number of additional statistics and speed improvements over polyfit.

This work is awesome, by the way.

Seong

Ok then, here is a new version with the updated beta calculation.

(Notebook attached; preview unavailable.)

Updated with a much more memory-efficient version that will run across a longer timespan.

Luca,

I've taken some of your changes along with the original framework and cleaned up the code quite a bit. I've also included a new, more efficient way of calculating price returns that uses a cumulative price change instead of individual (x - y)/x calculations.
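As a sketch of that idea (my own illustration, with an assumed data layout rather than the notebook's exact code): if the price window is indexed by relative day, one vectorized division against the day-0 row yields every return at once.

import pandas as pd

def returns_relative_to_event(prices):
    # prices: rows indexed by relative day (-days_before..+days_after),
    # one column per security. A single division replaces per-day
    # (x - y)/x calculations.
    return prices.div(prices.loc[0]) - 1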

To run it you call run_event_study with these parameters:

def run_event_study(event_data, date_column='asof_date',
                    start_date='2007-01-01', end_date='2014-01-01',
                    benchmark_sid='SPY', days_before=10, days_after=10, top_liquid=500,
                    use_liquid_stocks=True):
    """
    Calculates simple & cumulative returns for events and plots stock price movement
    before and after the event date.

    Parameters
    ----------
    event_data : pd.DataFrame
        DataFrame that contains the event data, with date and sid columns at
        a minimum. See the interactive tutorials on quantopian.com/data
    date_column : str
        Name of the column that holds the event date, e.g. `asof_date`
    start_date, end_date : datetime
        Start and end dates used as cutoffs for the event study
    benchmark_sid : string, int, or zipline.assets._assets.Equity object
        Security to be used as the benchmark for returns calculations. See `get_returns`
    days_before, days_after : int
        Number of days before/after the event to calculate returns for
    top_liquid : int
        If use_liquid_stocks is True, top_liquid determines how many of the
        most liquid stocks to keep
    use_liquid_stocks : bool
        If True, securities in `event_data` are filtered according to the
        filters found in `filter_universe`
    """

I'll either be updating this thread or creating a new one to feature the notebook; in the meantime, here it is, looking at buyback announcements.

(Notebook attached; preview unavailable.)

Beautiful, clean, modular code. I am impressed. I especially like the liquid-stock filtering; that is a very useful addition.

I found some serious bugs though (refactoring without unit tests is always a pain in the neck, I know). If you upload the project to GitHub, I can submit my fixes there.

Well, here is the new code anyway. It is very likely I made some mistakes too, so if someone feels like reviewing it, that would be nice.

List of changes:

get_cum_returns:

  • days_before/days_after arguments should be treated as trading days, not calendar days (but you might disagree on this); otherwise each event might get a cumulative return series of a different length, depending on how many non-trading days fall inside the days_before/days_after window

  • bug fix: when sid == benchmark the function breaks (and it also returns a DataFrame instead of a Series, breaking the caller code)

  • bug fix: abnormal returns were calculated by subtracting the benchmark cumulative returns from the sid cumulative returns, but we should subtract the daily returns and build the cumulative returns after that

  • bug fix: the cumulative return series calculation from the price series is wrong (see the numeric check after this list):

cum_returns = price.pct_change().cumsum().fillna(0)   # wrong  
cum_returns = (price.pct_change().fillna(0) + 1).cumprod() - 1 # correct  

get_returns:

  • the start_date/end_date calculation (the date range used to load the daily prices) is wrong, and we end up with fewer days of data than get_cum_returns requires

  • valid_sids should be populated even when use_liquid_stocks is False; otherwise we'll discard all the events in the run_event_study function

  • when calling get_price we have to avoid sid duplicates (e.g. when a sid is the benchmark); otherwise the returned DataFrame will contain duplicated columns, and that would break get_cum_returns

  • benchmark should be passed to get_cum_returns too

run_event_study:

  • benchmark must be a sid/Equity, not a string, as it will be used to index the price DataFrame

filter_universe:

  • added an option to filter by price and volume too (this is something I needed for my research, but I think it's a common requirement as well)
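To see why the cumulative-return fix above matters, here is a self-contained numeric check (my own construction): summing daily percentage changes ignores compounding, so a +10% move followed by a -10% move incorrectly nets to zero.

import pandas as pd

price = pd.Series([100.0, 110.0, 99.0])  # +10% then -10%

wrong = price.pct_change().cumsum().fillna(0)
correct = (price.pct_change().fillna(0) + 1).cumprod() - 1

print(wrong.iloc[-1])    # 0.00  -- claims we broke even
print(correct.iloc[-1])  # -0.01 -- we actually lost 1% (99/100 - 1)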
(Notebook attached; preview unavailable.)

Thanks Luca. This is great. I'll update the main post with your changes after reviewing.

Do you mind summarizing the differences between event study 1 and event study 2?

Event study 2 is what I was working on (the event study applied to a technical pattern). I replaced the old event study code with the one in your new NB, noticed something was wrong, and fixed it. Finally, I added event study 1, which is your original case study, to make the comparison easier.

Gotcha, thanks for providing that.

I've been unsuccessful in trying to speed up the run time when using the use_liquid_stocks flag. At the moment, we're running a pipeline output (filtered for securities) for each event, and a full run (10,000 events max when using interactive data) takes over an hour.

Any thoughts on a better way to approach this?

Seong

Luca,

I've tried making some speed improvements that allow you to use the use_liquid_stocks flag. Now, instead of running a pipeline output every day, you only run it once over a 252-day time window.
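A rough sketch of the once-per-window approach (run_pipeline is the research API; liquidity_pipeline is a stand-in name for whatever pipeline filter_universe builds):

from quantopian.research import run_pipeline

# One pipeline run covers the whole study window...
results = run_pipeline(liquidity_pipeline(), '2013-01-01', '2014-01-01')

# ...and per-event filtering becomes a cheap lookup of that day's slice
# (results is indexed by (date, asset)).
def liquid_sids_on(date):
    return set(results.loc[date].index)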

This looks at negative earnings surprises over 2013-2014 with a surprise < -50% (so, really bad surprises) for the top 500 most liquid securities. The general observation is that the market seems to revert in the initial few days, but I am planning on running this over a longer time period to see how it holds up.

(Notebook attached; preview unavailable.)

Running pipeline once per event study (in this case, once per year) seems a reasonable workaround. I wonder if run_pipeline has some warm-up delay, as happens when backtesting; that would explain the slowness. A possible solution would be to run pipeline for the whole event study date range, save the resulting DataFrame, and perform the daily top-volume stock filtering against it.

Hi, I made the processing run about ten times faster (from 1000 seconds to 100 seconds for 2016) by grouping by date and calling get_price for all symbols on that date.
Can we set up source control for this code? Maybe in alphalens or on gist.github.com.
Can notebooks be diff-ed? If not, it will be a .py file instead.
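As I read it, the batching looks roughly like this (a sketch with assumed names; get_price stands in for the notebook's price-fetching helper, and its exact signature may differ):

import pandas as pd

def fetch_prices_batched(event_data, date_column='asof_date'):
    # One batched price request per event date instead of one per event;
    # get_price would fetch the days_before/days_after window around `date`.
    frames = []
    for date, group in event_data.groupby(date_column):
        sids = group['sid'].unique().tolist()
        frames.append(get_price(sids, date))  # single call for all sids that day
    return pd.concat(frames)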

Ph Quam

Would you mind posting the notebook here? I'm not aware of a code diff tool for IPython notebooks, but I can review it and replace the main one in this thread with yours after some feedback.

Seong

@Seong Lee
I have also implemented grouping by several weeks. Now there are 3 different implementations of get_returns() under the same name.
calc_beta is not used correctly - the same issue as with std() above: the beta calculation should not include days_before.
I even doubt whether beta should be calculated per sid or for the whole factor's returns. My reasoning is that we compare a tested strategy's returns against the returns of a buy-and-hold strategy on the benchmark instrument. The dates on which the factor generates buying signals are part of the tested factor, and they should not affect the performance of the benchmark.

(Notebook attached; preview unavailable.)

There was a minor bug in my code - the global variable df was used instead of the event_data parameter.
I added the notebook to my alphalens fork: https://github.com/volconst/alphalens/commits/master/alphalens/event_study_nb.py
The fix is there, and I will commit more changes.

This line in get_returns():
prices = prices.shift(-1)
looked suspicious to me before, and now I have a test that shows it is bad.
I create a single event with 2 prices, 1 and 1.2, and expect to see a relative change of 0.2 = (1.2 - 1)/1 at day 1. However, this line leads to a change of -0.16 at day 1. I am attaching the test based on the original notebook from this thread, but I will post the fix based on the HEAD of my branch, which contains all the performance optimizations I have added.
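A simplified illustration of the misalignment (my own construction, not the exact notebook test): shift(-1) moves the day-1 return onto day 0.

import pandas as pd

# Relative-day index: day 0 is the event, day 1 the next trading day.
prices = pd.Series([1.0, 1.0, 1.2], index=[-1, 0, 1])

aligned = prices.pct_change()            # day 1 -> +0.2, as expected
shifted = prices.shift(-1).pct_change()  # the +0.2 lands on day 0 instead

print(aligned.loc[1])   # 0.2
print(shifted.loc[0])   # 0.2 -- misattributed to the event day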

(Notebook attached; preview unavailable.)

This is without shifting: https://github.com/volconst/alphalens/commit/1a7479d099887ecdf8bc9e357e22af0a1db26624
The good thing is that now there is more exploitable price movement at day 0, more in 2014 than in 2013 - can this be due to data being recorded and delivered faster or more accurately?

(Notebook attached; preview unavailable.)

@Seong (and Luca and Ph Quam),

Thanks for this! It's really great! I have two simple questions...

  1. Are days before and days after calendar days (including weekends, holidays, etc.), or are they only days the markets are open?
  2. Which is the best notebook to clone? There seem to be several referenced here, each with a different bug or fix (open vs. closing price, global vs. local variable). I'm not sure whether I should use the first or the last notebook.

Thanks!

@Brett

  1. Those are only days the markets are open. The code looks at the previous/next price information provided, not the previous/next calendar day.
  2. I would go with Seong's version, as it was updated several times, but you might like to check the changes Ph Quam introduced to increase the speed.

As a side note, when Alphalens is updated on Quantopian, we will be able to use the event study embedded in there. Have a look at the example.