Building the Foundations for Hypothesis Testing

Hi Quantopian Community,

My name is Matt and I'm a research analyst here at Quantopian. My goal is to get compelling and accessible research into the hands of our community. So without further ado, here is the first of many posts. This one focuses on how to generate data for testing hypotheses.

In many cases, gathering data is the hardest part of running an analysis. Understanding the sample selection, universe, and variables you're working with sets the foundation for the rest of your research. Whether you've been following the Quantpedia Series or want to learn how to conduct your own research, this tutorial will show you how to use Pipeline to extract the data you need.

The motivation for this post came from a research paper that I'm currently analyzing here at Quantopian. This research is an out-of-sample (OOS) implementation of Milian's paper, "Overreacting to a History of Underreaction", in which Milian examines the well-known Post-Earnings Announcement Drift effect (PEAD for short). He suggests that PEAD has reversed in recent years due to overcrowding of arbitrageurs invested in PEAD strategies. He finds that the firms with the largest positive earnings announcement surprises are the ones that had significant negative returns shortly after the subsequent earnings announcement.

While the results of my OOS implementation will soon be published in the Quantpedia Series, this post walks you through the exact steps I used to gather the data required for my analysis. Specifically, I show you how to:

  1. Run and query large batches of data through Pipeline
  2. Filter the Pipeline for specific time frames (corporate actions, earnings announcements, etc.)
  3. Use the Pipeline to generate forward-looking returns
  4. Categorize securities into deciles based on the previous earnings surprise in each calendar quarter
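The four steps above can be sketched in plain pandas. This is a minimal illustration under stated assumptions, not the notebook's code: `split_date_range`, `chunked_run`, `within_window`, `forward_returns`, and `quarterly_deciles` are hypothetical helpers, `run_fn` stands in for a Pipeline runner such as `run_pipeline(pipe, start, end)`, and the input layouts (results indexed by a `(date, asset)` MultiIndex, a wide price table, a hypothetical `last_announcement` column) are assumptions.

```python
import numpy as np
import pandas as pd


def split_date_range(start, end, splits):
    """Split [start, end] into `splits` contiguous, non-overlapping (first, last) date pairs."""
    dates = pd.date_range(start, end, freq="D")
    bounds = np.linspace(0, len(dates), splits + 1).astype(int)
    return [(dates[a], dates[b - 1]) for a, b in zip(bounds[:-1], bounds[1:]) if b > a]


def chunked_run(run_fn, start, end, splits):
    """Step 1 (hypothetical helper): call run_fn(chunk_start, chunk_end) per chunk and stack results.

    run_fn is a placeholder for something like run_pipeline(pipe, s, e) and is
    assumed to return a DataFrame indexed by (date, asset).
    """
    frames = [run_fn(s, e) for s, e in split_date_range(start, end, splits)]
    return pd.concat(frames).sort_index()


def within_window(df, event_col, max_days):
    """Step 2 (hypothetical helper): keep rows dated 0..max_days calendar days after event_col.

    event_col (e.g. an assumed 'last_announcement' column) must hold datetimes.
    """
    dates = df.index.get_level_values("date")
    delta = (dates - pd.DatetimeIndex(df[event_col])).days
    return df[(delta >= 0) & (delta <= max_days)]


def forward_returns(prices, days):
    """Step 3 (hypothetical helper): simple return from each row's price to the price `days` rows later.

    prices is a wide table (index = dates, columns = assets); the last `days`
    rows are NaN because no future price is available.
    """
    return prices.shift(-days) / prices - 1


def quarterly_deciles(df, col):
    """Step 4 (hypothetical helper): rank `col` into deciles 1..10 within each calendar quarter."""
    quarter = df.index.get_level_values("date").to_period("Q")
    return df.groupby(quarter)[col].transform(
        lambda s: pd.qcut(s, 10, labels=False, duplicates="drop") + 1
    )
```

Chunking the date range is one way to keep each individual run small when querying a long period. Passing `duplicates="drop"` lets `pd.qcut` fall back to fewer bins when a quarter has many tied values instead of raising an error.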

Clone the notebook to get started.


The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

1 response


When attempting to run:

positions_data = split_run_pipeline(positions_pipeline, START, END, SPLITS)

I'm getting a ValueError that I can't seem to resolve. Any idea how I can address this error?

Here's the error in all of its glory:

ValueError: Bad response: Computation failed with message:
File "/home/databazaar/.venv/lib/python3.4/site-packages/blaze/server/", line 643, in compserver
File "/home/databazaar/.venv/lib/python3.4/site-packages/multipledispatch/", line 164, in __call__
return func(*args, **kwargs)
File "/home/databazaar/.venv/lib/python3.4/site-packages/blaze/compute/", line 412, in compute
result = top_then_bottom_then_top_again_etc(expr3, d4, **kwargs)
File "/home/databazaar/.venv/lib/python3.4/site-packages/blaze/compute/", line 189, in top_then_bottom_then_top_again_etc
return top_then_bottom_then_top_again_etc(expr3, scope4, **kwargs)
File "/home/databazaar/.venv/lib/python3.4/site-packages/blaze/compute/", line 189, in top_then_bottom_then_top_again_etc
return top_then_bottom_then_top_again_etc(expr3, scope4, **kwargs)
File "/home/databazaar/.venv/lib/python3.4/site-packages/blaze/compute/", line 153, in top_then_bottom_then_top_again_etc
return compute_down(expr, *leaf_data, **kwargs)
File "/home/databazaar/.venv/lib/python3.4/site-packages/multipledispatch/", line 164, in __call__
return func(*args, **kwargs)
File "/home/databazaar/.venv/src/databazaar/databazaar/utils/", line 95, in compute_throttler
File "/home/databazaar/.venv/lib/python3.4/site-packages/multipledispatch/", line 164, in __call__
return func(*args, **kwargs)
File "/home/databazaar/.venv/lib/python3.4/site-packages/blaze/compute/", line 412, in compute
result = top_then_bottom_then_top_again_etc(expr3, d4, **kwargs)
File "/home/databazaar/.venv/lib/python3.4/site-packages/blaze/compute/", line 158, in top_then_bottom_then_top_again_etc
expr2, scope2 = bottom_up_until_type_break(expr, scope, **kwargs)
File "/home/databazaar/.venv/lib/python3.4/site-packages/blaze/compute/", line 301, in bottom_up_until_type_break
for i in inputs])
File "/home/databazaar/.venv/lib/python3.4/site-packages/blaze/compute/", line 301, in
for i in inputs])
File "/home/databazaar/.venv/lib/python3.4/site-packages/blaze/compute/", line 325, in bottom_up_until_type_break
File "/home/databazaar/.venv/lib/python3.4/site-packages/multipledispatch/", line 164, in __call__
return func(*args, **kwargs)
File "/home/databazaar/.venv/lib/python3.4/site-packages/blaze/compute/", line 1038, in compute_up
assert names == expr._child.fields