Earnings drift with Estimize

UPDATE 08/25/2016: We've recently found a technical issue with the Estimize Consensus Estimates data feed through Pipeline and are working to resolve it. A similar strategy using the Street's Earnings Consensus by Zacks can be found here, and the strategy below has been updated with that backtest until a solution is fully implemented.

UPDATE 02/16/2016: This algorithm now uses the Estimize Consensus Estimates found through the Pipeline API and a new backtest is attached.

Earnings estimates (earnings per share, or EPS) and revenue estimates are heavily used in both quant and fundamental stock analysis as forward-looking indicators of stock performance and sources of alpha. Traditionally, estimates are given by sell-side analysts on Wall Street and are then aggregated and averaged into what's commonly referred to as "the Wall Street Consensus" or simply "the Street's" expectations. In 2011, however, the fintech startup Estimize launched a new platform allowing anyone on the web to share their own earnings and revenue estimates. Website visitors and contributors can browse the estimates submitted by other users.

At our NYC meetup, I presented some validation work on claims made in a recent Estimize whitepaper with the goal of replicating results in the new Quantopian Research Platform and providing a basis for future work. You can find slides from the event on this slideshare.

I confirmed the finding that there is potential for crowdsourced earnings data to be an interesting new source of alpha, especially given that current earnings surprise strategies are almost exclusively based off the Wall Street Consensus.

Sample Trading Strategy

Here I'm sharing a basic PEAD (Post Earnings Announcement Drift) strategy, which goes long (short) companies whose actual earnings announcements beat (miss) expectations, also known as a positive (negative) earnings "surprise".

I've tried to make the code easy to read and modify; almost everything you might want to change can be done by fiddling with the values in initialize(context). Below are some notes on the strategy, so have fun and post here with any questions or insights you'd like to share!

Strategy Notes

  • Data set: The strategy uses Estimize's Consensus Estimates and EventVestor's Earnings Calendar datasets.
  • Weights: The weight for each security is determined by the total number of longs and shorts we have on that day. So if we have 2 longs and 2 shorts, the weight for each long will be 50% (1.0/number of longs) and the weight for each short will be -50% (-1.0/number of shorts). Positions are rebalanced at the beginning of each day according to the number of securities currently held and about to be ordered; see the sketch after this list.
  • Long/Short: The current algorithm goes both long/short.
  • Hedging: [OPTIONAL] You have the ability to turn on net dollar exposure hedging with the SPY
  • Days held: Positions are currently held for 3 days but are easily changeable by modifying 'context.days_to_hold'
  • Percent threshold: Only surprises between 0% and 4% in absolute magnitude will be considered as a trading signal. These are adjustable using the minimum and maximum threshold variables in context.
  • Earnings dates: All trades are made 1 business day AFTER an earnings announcement, regardless of whether the announcement came before the market open or after the market close.
  • Jess released a similar algo about a year back, which is now outdated. Please use this algorithm and format instead!
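
To make the weighting rule concrete, here's a minimal standalone sketch of the scheme described in the Weights note above (plain Python, not tied to the backtester):

def compute_weights(signals):
    """signals maps each security to 'long' or 'short'."""
    longs = [s for s, side in signals.items() if side == 'long']
    shorts = [s for s, side in signals.items() if side == 'short']
    long_weight = 1.0 / len(longs) if longs else 0.0
    short_weight = -1.0 / len(shorts) if shorts else 0.0
    return {s: long_weight if side == 'long' else short_weight
            for s, side in signals.items()}

#: Example: 2 longs and 2 shorts -> +0.50 for each long, -0.50 for each short
print(compute_weights({'AAPL': 'long', 'MSFT': 'long',
                       'XOM': 'short', 'CVX': 'short'}))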

Thanks,
Seong

Join our webinar on March 1st, 2016 at 6PM EST with Leigh Drogen, CEO of Estimize, and Vinesh Jha, CEO of ExtractAlpha, as we discuss how to use Crowdsourced Estimates in your algorithms.

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

39 responses

Thanks for sharing Seong, interesting stuff. I am trying to understand the data fetching part. I understand that you got a snapshot from Estimize and preprocessed it.

But I can't see how this code can run without an implementation of fetch_estimize()?

I am actually able to clone the algo and run it but I have no clue how that is even possible, or what the data format might look like...

Hi JB,

Great questions! fetch_estimize is a custom method we've implemented: although we can't give you the data itself, we still wanted to share the algorithm so users could take some potential strategy ideas away from this post. As for what the data looks like, I can give you a few ways to see it for yourself.

The best way is to include a "preview" in your "pre_func" method, which in this case is defined as "def pre_func(df):". Inside it you can add "log.info(' %s ' % df.head())" to see the columns of the dataframe and the first few rows of data. This is what it looks like in code:

def pre_func(df):  
    log.info(' %s ' % df.head())  
    #: The rest of what I have in the code  

From this point on you can see which columns are there, what the data looks like (the format is consistent for the rest of the rows), and how to change your algorithm to match it. For example, once you add the line above, you'll notice that there's also a "wallstreet_eps" column present. So you can change "context.which_eps" to the string "wallstreet_eps" to base the strategy off of the Street's earnings estimates instead of the Estimize estimates.

This is generally the best way to clone/preview an algorithm where someone is fetching an inaccessible CSV file.

Give it a try and let me know what you think!

Seong

Thanks Seong, got it. So the CSV file is inaccessible but the data is cached somewhere on the quantopian servers. If anyone else is wondering, the format looks like this:

id                  4e671c877cb02d7af700002c
sector              Information Technology
industry            Software
ticker              SWI
fiscal_year         2011
fiscal_quarter      4
actual_eps          0.29
actual_rev          55.61
wallstreet_eps      0.25
wallstreet_rev      53.87
date                2/7/12 6:00
estimize_eps        0.29
estimize_rev        54.2
num_participants    1

It seems API access to Estimize requires a license but the rest of the platform is free. Will check it out.

Cheers!
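
For anyone who grabs a local snapshot in this format, a minimal pandas sketch for loading it and computing the surprise column the algo uses (the file name here is hypothetical):

import pandas as pd

#: Hypothetical local file with the columns shown in JB's sample above
df = pd.read_csv('estimize_sample.csv', parse_dates=['date'])
#: Percent surprise relative to the Estimize consensus, as the algo computes it
df['surprise'] = (df['actual_eps'] - df['estimize_eps']) / df['estimize_eps']
print(df[['ticker', 'actual_eps', 'estimize_eps', 'surprise']].head())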

JB C, I'm the founder at Estimize, glad to answer any questions regarding the API.

We do license the full data set in real time through a JSON REST API.

For Individuals: We provide a 1 month trial of the full real time API for $1,000. After that the cost for accessing the API is $1,000/month on a yearly contract.

For Firms: We provide the same 1 month $1,000 trial, but the cost of the API is between $5,000 and $10,000, roughly based on assets under management.

All of our historical data is available to backtest for free via Quantopian.

You can get a trial started by contacting us through this form.

http://www.estimize.com/api

On another note, it's great to see everyone working with the data on Quantopian. We're huge fans of Quantopian founder Fawce and his team, and love seeing all the innovative algorithms built by the community here. Please reach out if you have any questions!

Leigh

Hello Seong,

How can one tell if the algorithm performance is actually due to post-earnings-announcement drift, or other factors? It seems that you need to formulate a baseline for comparison. For example, maybe there is some sort of Monte Carlo simulation you could run to show that by buying/selling the same securities over the same period but without the Estimize information, the return would have been statistically worse.

Would there be any way to replace SPY with a suitable benchmark that captures the basket of securities you are trading?

Or maybe the Estimize data provide no advantage, and you could just detect gaps up/down in price overnight and get the same result?

Grant
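
A rough sketch of the kind of null baseline Grant describes: sample random entry dates on the same names and build a distribution of 3-day returns to compare against. Everything here is illustrative, and `prices` is assumed to be a DataFrame of daily closes indexed by date:

import numpy as np

def random_hold_returns(prices, tickers, n_trials=1000, hold_days=3, seed=0):
    """Null distribution: returns of random hold_days-long holds in the
    same securities, ignoring any earnings information."""
    rng = np.random.default_rng(seed)
    rets = []
    for _ in range(n_trials):
        t = rng.choice(tickers)
        i = int(rng.integers(0, len(prices) - hold_days))
        window = prices[t].iloc[i:i + hold_days + 1]
        rets.append(window.iloc[-1] / window.iloc[0] - 1.0)
    return np.array(rets)

#: Compare the strategy's average per-trade return against, say, the 95th
#: percentile of this distribution to gauge whether the signal adds anything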

@Grant Kiehne -- I think you make a great point. Wouldn't the following provide a similar result:

1) Detect the first trading day after earnings, and
2) if the stock is above its 20-day moving average (MAVG20), buy; if below, short.

Since you would start trading the day after earnings, it is implied that the movement is correlated to the announcement, regardless of whether earnings beat the street/consensus. Food for thought.
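
For reference, Greg's baseline could be sketched in the old Quantopian API roughly like this. `context.reported_yesterday`, the list of names that announced the prior day, is a hypothetical helper you'd have to populate from an earnings calendar:

def handle_data(context, data):
    #: Greg's baseline: on the first trading day after earnings, go long above
    #: the 20-day moving average and short below it, ignoring the surprise size
    for stock in context.reported_yesterday:
        if stock not in data:
            continue
        price = data[stock].price
        mavg20 = data[stock].mavg(20)
        if price > mavg20:
            order_target_percent(stock, 0.05)   #: arbitrary equal sizing
        elif price < mavg20:
            order_target_percent(stock, -0.05)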

@Grant

You bring up some really interesting points. The original whitepaper that this algo was based on (http://com.estimize.public.s3.amazonaws.com/papers/Estimize%20Whitepaper%20Executive%20Summary.pdf) compared Estimize against the Street's consensus (and the strategy based off Estimize did fare better), but that doesn't rule out the possibility that both strategies had returns that were affected by factors other than earnings drift. Then again, I think that possibility is present in any algorithm (e.g. randomness, as Nassim Taleb says very often). I would think you could take the basket of securities as a universe, go long with equal weights, and see if that works as a benchmark? I'm not too sure on this one. Thoughts?

@Greg

That sounds like it would make a very interesting strategy. Maybe a challenge would be to write it up and share it? :)

@Grant The white paper written by Vinesh and I does account for various potentially correlated factors such as momentum, value, growth, etc. The returns shown in the paper are residual to any other factors: not just market neutral, but residual and uncorrelated.

@Seong,

I ain't no expert, but I think that the point here is that the drift is the potential arbitrage opportunity. There's a long-term relaxation phenomenon that has to be separated from other deterministic factors and noise. Perhaps your algorithm is taking advantage of the effect, but I just can't tell from what you've presented. My intuition is that the effect will be relatively small, given that it is well-known and the market is incredibly efficient.

@Leigh,

Am I thinking about this correctly? Is there a prescribed method that could be applied to sort out if Seong's example is taking advantage of the effect? And to what extent his use of Estimize data was helpful?

I looked at the slide show, and maybe I missed something, but wasn't there a direct correlation between participant number and accuracy rate? If so, then why throw out the above-20-participant data set for companies? Is it something to do with the increased participant number creating too much noise? Simple stat analysis would tell you that few participants = inaccuracy, yet I did not see a filter throwing out companies that had a small number of participants.

Looking at the backtest, I noticed that most of the increase in performance for the algo was in April 2013, while the benchmark stayed steady in its growth. The performance from April 2013 onwards was much more confined to a range in growth. I'm less curious about the recent performance than the aforementioned April increase. Any logical reason for the timing on that? Or just the way it played out?

Christian,

In this case, we kept every instance where the number of participants was greater than or equal to 20 and threw away any instance where it was less than that. The pre_func method keeps rows where num_participants > 19 and drops all the rest:

df = df[(df['num_participants'] > 19)]  

But you're right, I did find a correlation between the number of participants and the accuracy rate. I don't know if I have a solid answer for you about the increased performance in April 2013; it may just be that the earnings drift for surprises based off the Estimize data was larger then than at other times, so maybe that's just the way it played out in this case.

Seong

OK, I must have misread that first programming bit. What you wrote makes more sense. Thanks!

Hey all,

The cut of data has been updated to the end of October 2014 so feel free to update your backtests to a more recent date.

Thanks!

[Attached backtest: Clone Algorithm (67 clones), Backtest ID: 5462498c7f087e188c09709e]

Really interesting to see the dip in performance of the algorithm during the months early in the year, when many high-beta names melted down and the market went through a little volatility. But in October, when the market went through a ton of volatility and everything melted down, there wasn't much if any poor performance in the algo, and more recently it's done very well. This would make sense: many of the names with a ton of estimates, and the companies that beat/miss their consensus by a wide margin, are going to be high-growth companies. They all got crushed together early in the year, correlations went way up, and the algo stopped working. In other words, the market didn't care for a short time what the fundamentals of the stocks were; everything got crushed together in the high-beta growth space.

I wonder whether there is a benefit to creating a switch that turns off the algo for stocks within a sector that is experiencing abnormal volatility above a certain level. The performance of this strategy very much depends on companies being rewarded/punished for beating/missing expectations, but at times when market correlations get higher due to volatility, that isn't going to happen as much and the algo will suffer.

Overall, obviously great performance out of the strategy, but I believe more fine-tuning can be done without curve fitting; an in- and out-of-sample test for volatility could make sense here.

Hi Seong & Leigh,

Maybe I'm just a cranky skeptic, but I'd like to reiterate the need to be more analytical here with respect to performance and assigning cause-and-effect relationships. I'm no financial expert, but my impression is that we're dealing with a complex system with lots of noise, so it's gonna be tricky to tease out what's actually going on. For example, if the algo is based on high growth companies, and we've been in a bull market, then the fact that the algo beats the broad market could just be due to the securities that were selected, right? How do we know that the PEAD effect is at play here? If it gets washed out by volatility, then maybe there are other underlying factors that wash it out completely, and we aren't seeing it at all in the backtest.

Grant

Grant,

Your points are extremely legitimate and, to be frank, I don't think I'm the most knowledgeable person to answer your questions. I'd like to say, though, that I wasn't trying to assign any cause-and-effect relationship, just that perhaps there was some correlation.

  • For example, if the algo is based on high growth companies, and we've been in a bull market, then the fact that the algo beats the broad market could just be due to the securities that were selected, right?
  • How do we know that the PEAD effect is at play here?
  • If it gets washed out by volatility, then maybe there are other underlying factors that wash it out completely, and we aren't seeing it at all in the backtest.

I think these questions need to be answered and the right place to do that would be in Research. I'm extremely interested in collaborating on a project with you on this and I think it would be a ton of fun for the both of us, shoot me a line if you're interested. If not, that's okay too :)

Seong

Grant,

I don't think you're quite grasping the idea of market-neutral and residual returns. The bull vs. bear market thing doesn't affect this algorithm in any way whatsoever. Residual returns means residual to a slew of different factors, growth and momentum being two of them. While this example on Quantopian only uses the SPY to produce the "market neutral" returns, the paper written by the Deutsche Bank Quant Research team, as well as our white paper, both of which look at this strategy, go into far more depth.

As well, the idea of causality is irrelevant. Good quant research is not about causality; it's about finding an inefficiency in the market that can be repeatably arbitraged. We're not attempting to explain why the post-earnings drift takes place, just that by using the Estimize data set we can capture it. Good quant research is achieved by looking at these strategies both in and out of sample, which has been done. Now, one can argue that our sample size is not large enough, or that the length of history in the data set is not long enough, but given that a number of large quant funds pay for and trade on our data, I'd say that's not an issue at this point.

My point regarding market volatility has nothing to do with bull vs bear markets, I would bet that it works the same way for volatility on the upside as well.

Thanks Leigh,

My main point is that there is an assumed predictive model being applied, but the data presented here on this thread are insufficient to validate the model (at least for me, with my limited background in this field).

Given that you sell your dataset to large quant funds, do you think that Quantopian members will still be able to use it to their advantage? Or will all of the benefit be "arbitraged away" (I guess this is the lingo)? If the strategy truly works and it is public knowledge, then it shouldn't work for long, correct?

Also, is there any evidence that the "Wall Street estimates" are coming into line with the more accurate ones of Estimize?

Grant

Here's Seong's backtest from above, but with Anony Mole's custom slippage model (https://www.quantopian.com/posts/trade-at-the-open-slippage-model). It simulates orders closer to the opening bell. My understanding is that Quantopian doesn't yet support orders at the true open (since all open orders are cancelled at the daily market close). This may be a use case to sort out how to make it work, since the return is appreciably higher. --Grant
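
The heart of the linked model is just an adjusted fill price. A stripped-down sketch of that price logic follows; this is one plausible variant rather than the model's exact code, and it omits the backtester's slippage-model plumbing:

def open_fill_price(open_px, close_px, is_buy, spread_fraction=0.1):
    """Simulate a fill near the open: start at the open price and give up
    a fraction of the open-to-close move as slippage against the order."""
    drift = (close_px - open_px) * spread_fraction
    return open_px + drift if is_buy else open_px - drift

#: e.g. open 100, close 102, buying: filled at 100.20 instead of at the close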

[Attached backtest: Clone Algorithm (94 clones), Backtest ID: 546790766a3a0708f9b70b13]

Grant,

The post earnings drift strategy using Estimize data shows strong alpha generation in and out of sample, so I'm not sure what you mean by the data presented being insufficient.

Regarding the arbitrage of this advantage: yes, over time some of the alpha-generating capability of this strategy will go away as the Estimize data set is utilized by more and more quants. But given that the Estimize Consensus is much closer to the true representation of expectations than the Wall Street consensus, there should always be some post-earnings-drift alpha to be captured; I doubt the market will ever be so efficient as to arbitrage it away within minutes. It took 20 years to arbitrage away the post earnings drift against the Wall Street numbers.

No, there is no evidence that Wall Street numbers are coming into line with Estimize; that is due to the systematic bias under which Wall Street estimates are created by those analysts.

Thanks Leigh,

I definitely have some reading to do. Also, the backtest I posted above suggests more of a monotonic upward trend in return, which is encouraging. Glad you are able to show Wall Street bias!

Grant

Hi Seong,

Should your backtest include the $1000/month cost for access to the Estimize data? Also, what would be the comparable cost for a feed of the Wall Street consensus estimates, via fetcher? I'm just wondering if a strategy like this is anywhere within reach of a typical individual retail investor who most likely won't be shelling out $12K/year for the data.

Regarding the Wall Street estimates, one interesting analysis we could do with the data would be to see if the actual earnings can be predicted via the Wall Street estimates. If the Wall Street estimates were cheaper or free, one could then just use those. Basically, if there is a consistent systematic bias by Wall Street, then maybe one can back out a model to correct for the bias.
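
That bias-correction idea could start as a simple linear fit of actuals on the Street consensus; a minimal numpy sketch (column names follow the sample data earlier in the thread):

import numpy as np

def fit_street_bias(wallstreet_eps, actual_eps):
    """Fit actual = a * street + b. If Wall Street is systematically biased,
    (a, b) drift away from (1, 0), and the fit can adjust future estimates."""
    a, b = np.polyfit(wallstreet_eps, actual_eps, 1)
    return a, b

def debiased_estimate(street_eps, a, b):
    return a * street_eps + b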

What is the game plan by Quantopian for being able to set up an algo for executing orders closer to the open? I can see that one could load fresh data via fetcher overnight, but then the algo can't access the market until just after 9:31, right?

Another thought is that there might be an advantage to timing a series of orders over the ~50 second window allowed by handle_data, so rather than putting in a single order essentially right at the whole minute, put in several orders over the minute, at different sub-minute times.

Grant
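
Grant's order-slicing idea, as plain Python (illustrative only, since the minute-bar backtester wouldn't currently let you time the slices sub-minute):

def slice_order(total_shares, n_slices):
    """Split one parent order into n roughly equal child orders."""
    base, rem = divmod(abs(total_shares), n_slices)
    sign = 1 if total_shares >= 0 else -1
    return [sign * (base + (1 if i < rem else 0)) for i in range(n_slices)]

#: slice_order(1003, 4) -> [251, 251, 251, 250]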

The notebook that this algorithm was based off of has just been released here: https://www.quantopian.com/posts/research-stepping-through-crowdsourced-earnings-data-with-estimize. Feel free to take a look!

For those looking to backtest this, please use this latest code!

"""
    This is a PEAD strategy based off Estimize's earnings estimates. Estimize (estimize.com) is a service that aggregates financial estimates from independent, buy-side, and sell-side analysts as well as students and professors. The data that we're using here will be called from our custom 'fetch_estimize' method and contains the following columns:  
    - date: the date that the company announced its earnings  
    - actual_eps: the actual earnings announcement on that date  
    - wallstreet_eps: the Wall Street consensus' estimates for that earnings announcement  
    - estimize_eps: the Estimize estimates for that earnings announcement  
    - ticker: the stock ticker

    Most of these variables are meant for you to play around with:  
        1. context.algo_type: defines whether you want a long/short strategy  
        2. context.days_to_hold: defines the number of days you want to hold before exiting a position  
        3. context.min/max_surprise: defines the min/max % surprise you want before trading on a signal  
        4. context.which_eps: defines that you're using the 'estimize_eps' rather than the 'wallstreet_eps' as your benchmark. Change it to 'wallstreet_eps' if you'd rather use that.  
"""

from pytz import timezone  
from datetime import datetime, timedelta

import numpy as np

def initialize(context):  
    #: Declares whether we're doing a short/long/both strategy  
    context.algo_type = 'both'  
    #: Declares the number of days to hold; change this to what you want  
    context.days_to_hold = 3  
    #: Declares which stocks we currently held and how many days we've held them dict[stock:days_held]  
    context.stocks_held = {}  
    #: Declares which eps estimate to benchmark against  
    context.which_eps = 'estimize_eps'  
    #: Declares the minimum magnitude of percent surprise  
    context.min_surprise = .01  
    context.max_surprise = .06  
    #: Declares the number of ticks that we've used in the current day  
    context.ticks = 0  
    #: Boolean holding the memory whether or not we've already ordered for the current day  
    context.ordered = False  
    #: Boolean holding the memory whether or not we need to retry closing out our positions  
    context.retry = False  
    #: Initialize our Hedge  
    context.spy = sid(8554)  
    #: Get our tickers and set them as our data  
    fetch_estimize(  
              pre_func=pre_func,  
              symbol_column='ticker')  
    #: Get the same tickers and set them as our universe  
    fetch_estimize(  
              pre_func=pre_func,  
              symbol_column='ticker',  
              universe_func=my_universe)

def handle_data(context, data):  
    #: Converting the time in to the Eastern Timezone  
    #: This will be made obsolete with the `schedule_function` method. Coming out soon!  
    current_time = get_datetime().astimezone(timezone('US/Eastern'))  
    """  
        Setting starting positions  
    """  
    if current_time.hour == 9 and current_time.minute == 31:  
        context.ticks = 0  
    """  
        Log current positions at 10:00 AM  
    """  
    if current_time.hour == 10 and current_time.minute == 0 and context.ordered == True:  
        #: Get all positions  
        all_positions = "Current positions for %s : " % (str(get_datetime()))  
        for pos in context.portfolio.positions:  
            all_positions += "%s at %s shares, " % (pos.symbol, context.portfolio.positions[pos].amount)  
        log.info(all_positions)  
    """  
        Main ordering conditions  
    """  
    if context.ticks <= 30 and context.ordered == False:  
        #: Create a dict of stocks to buy/sell and the position (long/short)  
        stocks_to_order = {}  
        for stock in data:  
            #: First check if data exists and we're using 'all' because it's easier to read but  
            #: it's the same as 'and'  
            if all([context.which_eps in data[stock],  
                  'reports_at_2' in data[stock],  
                  stock in data]):  
                date_format = "%Y-%m-%d"  
                trade_date = datetime.strptime(data[stock]['reports_at_2'], date_format)  
                #: Next check if it's the correct date to trade  
                if all([trade_date.year == get_datetime().year,  
                       trade_date.month == get_datetime().month,  
                       trade_date.day == get_datetime().day]):  
                    estimize_eps = data[stock][context.which_eps]  
                    actual_eps = data[stock]['actual_eps']  
                    #: Getting the percent surprise needed before we trade on that signal  
                    percent_surprise = (actual_eps - estimize_eps)/(estimize_eps + 0.0)  
                    #: Positive Surprise  
                    if (percent_surprise >= context.min_surprise and percent_surprise <= context.max_surprise) and (context.algo_type == 'both' or context.algo_type == 'long'):  
                        stocks_to_order[stock] = 'long'  
                    #: Negative Surprise  
                    elif (percent_surprise <= -context.min_surprise and percent_surprise >= -context.max_surprise) and (context.algo_type == 'both' or context.algo_type == 'short'):  
                        stocks_to_order[stock] = 'short'  
                    #: If neither a positive nor a negative surprise, do nothing  
        #: Create weights for each of our long and short positions based on the number of longs/shorts we have  
        if len(stocks_to_order) != 0:  
            #: Get total number of stocks and divide by 1.0  
            total_long = len([s for s in stocks_to_order if stocks_to_order[s] == 'long'])  
            total_short = len([s for s in stocks_to_order if stocks_to_order[s] == 'short'])  
            long_weight = 1.0/total_long if total_long != 0 else 0  
            short_weight = -1.0/total_short if total_short != 0 else 0  
        #: Go through our stocks to order and order them  
        for stock in stocks_to_order:  
            #: Check if we have data for the stock  
            if stock in data:  
                #: Check whether it's a long or a short  
                if stocks_to_order[stock] == 'long':  
                    weight = long_weight  
                elif stocks_to_order[stock] == 'short':  
                    weight = short_weight  
                log.info("Entering position on %s at %s" % (stock.symbol, str(get_datetime())))  
                order_target_percent(stock, weight)  
                #: Set the number of days held = 0  
                context.stocks_held[stock] = 0  
                context.ordered = True  
        #: Finally, we hedge our positions by getting the net dollar ordered and matching that to 0  
        if context.ordered == True:  
            #: Get the total amount ordered for the day  
            amount_ordered = 0  
            for order in get_open_orders():  
                for oo in get_open_orders()[order]:  
                    amount_ordered += oo.amount * data[oo.sid].price

            #: Order our hedge  
            order_target_value(context.spy, -amount_ordered)  
            context.stocks_held[context.spy] = 0  
            log.info("We currently have a net order of $%0.2f and will hedge with SPY by ordering $%0.2f" % (amount_ordered, -amount_ordered))  

    """  
        Exit position/days held update logic  
    """  
    #: Go through each held stock and update the number of days held and close out any positions  
    #: that have been held past context.days_to_hold  
    if (current_time.hour == 15 and current_time.minute == 45) or context.retry == True:  
        for stock in context.portfolio.positions:  
            #: Get the number of days that we've currently held this stock  
            days = context.stocks_held.get(stock)  
            #: None is the condition for a security that we don't currently hold  
            if days == None:  
                continue  
            #: If days_to_hold is set to 1, close out any position at the end of the day  
            if context.days_to_hold == 1:  
                #: If we don't have data for the stock, break and retry  
                if stock not in data:  
                    context.retry = True  
                    break  
                #: If we just placed an order for the stock, don't bother ordering again  
                if stock not in get_open_orders():  
                    log.info("Exiting position on %s at %s" % (stock.symbol, str(get_datetime())))  
                    order_target_percent(stock, 0)  
                    #: Refresh all variables  
                    context.retry = False  
                    context.ordered = False  
            #: Same order logic but for when context.days_to_hold != 1  
            elif context.days_to_hold > 1:  
                if days >= context.days_to_hold:  
                    if stock not in data:  
                        context.retry = True  
                        break  
                    elif stock not in get_open_orders():  
                        log.info("Exiting position on %s at %s" % (stock.symbol, str(get_datetime())))  
                        order_target_percent(stock, 0)  
                        context.retry = False  
                        context.ordered = False  
                        context.stocks_held[stock] = None  
                else:  
                    context.stocks_held[stock] += 1  

    context.ticks += 1  

"""
    Fetcher helper methods  
"""

def pre_func(df):  
    """  
        Takes in our dataframe and cleans up the dates, because we only want to buy the day after earnings are released  
    """  
    #: We're going to shift the dates according to when we should be trading the ticker  
    df['date'] = df['date'].apply(lambda x: shift_dates(x))  
    #: We're going to make a copy of the date column to make sure we know when to trade at the appropriate date  
    df['reports_at_2'] = df['date']  
    df = df[(df['num_participants'] > 19)]  
    df = df[(df['ticker'] != 0)]  
    df = df.drop(["Unnamed: 0"], axis=1)  
    return df

def shift_dates(row):  
    """  
        This function takes each date in the dataframe, tests whether the announcement came before or after the market open, and shifts the date appropriately.  
        1. If it's before market open, keep the date the same  
        2. If it's after market open, shift the date by 1 day  
    """  
    row_date = datetime.strptime(row, "%Y-%m-%d %H:%M:%S")  
    #: Announcements at or after the 9:30 market open trade the next day;  
    #: announcements before the open keep the same date  
    if row_date.hour > 9 or (row_date.hour == 9 and row_date.minute >= 30):  
        row_date = row_date + timedelta(days=1)  
    row = row_date.strftime("%Y-%m-%d")  
    return row

def my_universe(context, fetcher_data):  
    """  
        Method for setting our universe of stocks which we will use to determine  
        our weights for each security as well  
    """  
    #: Setting our universe of stocks  
    sids = set(fetcher_data['sid'])  
    sids = [s for s in sids if s != 0]  
    symbols = [s.symbol for s in sids]  
    log.info("Our daily universe size is %s sids" % len(symbols))  
    return sids  

And if you want a long-only version, you have to cancel the hedging part, since it shorts the SPY (it could be replaced by buying SH):

"""
    This is a PEAD strategy based off Estimize's earnings estimates. Estimize (estimize.com) is a service that aggregates financial estimates from independent, buy-side, and sell-side analysts as well as students and professors. The data that we're using here will be called from our custom 'fetch_estimize' method and contains the following columns:  
    - date: the date that the company announced its earnings  
    - actual_eps: the actual earnings announcement on that date  
    - wallstreet_eps: the Wall Street consensus' estimates for that earnings announcement  
    - estimize_eps: the Estimize estimates for that earnings announcement  
    - ticker: the stock ticker

    Most of these variables are meant for you to play around with:  
        1. context.algo_type: defines whether you want a long/short strategy  
        2. context.days_to_hold: defines the number of days you want to hold before exiting a position  
        3. context.min/max_surprise: defines the min/max % surprise you want before trading on a signal  
        4. context.which_eps: defines that you're using the 'estimize_eps' rather than the 'wallstreet_eps' as your benchmark. Change it to 'wallstreet_eps' if you'd rather use that.  
"""

from pytz import timezone  
from datetime import datetime, timedelta

import numpy as np

def initialize(context):  
    #: Declares whether we're doing a short/long/both strategy  
    context.algo_type = 'long'  
    #: Declares the number of days to hold; change this to what you want  
    context.days_to_hold = 3  
    #: Declares which stocks we currently held and how many days we've held them dict[stock:days_held]  
    context.stocks_held = {}  
    #: Declares which eps estimate to benchmark against  
    context.which_eps = 'estimize_eps'  
    #: Declares the minimum magnitude of percent surprise  
    context.min_surprise = .01  
    context.max_surprise = .06  
    #: Declares the number of ticks that we've used in the current day  
    context.ticks = 0  
    #: Boolean holding the memory whether or not we've already ordered for the current day  
    context.ordered = False  
    #: Boolean holding the memory whether or not we need to retry closing out our positions  
    context.retry = False  
    #: Initialize our Hedge  
    context.spy = sid(8554)  
    #: Get our tickers and set them as our data  
    fetch_estimize(  
              pre_func=pre_func,  
              symbol_column='ticker')  
    #: Get the same tickers and set them as our universe  
    fetch_estimize(  
              pre_func=pre_func,  
              symbol_column='ticker',  
              universe_func=my_universe)

def handle_data(context, data):  
    #: Converting the time in to the Eastern Timezone  
    #: This will be made obsolete with the `schedule_function` method. Coming out soon!  
    current_time = get_datetime().astimezone(timezone('US/Eastern'))  
    """  
        Setting starting positions  
    """  
    if current_time.hour == 9 and current_time.minute == 31:  
        context.ticks = 0  
    """  
        Log current positions at 10:00 AM  
    """  
    if current_time.hour == 10 and current_time.minute == 0 and context.ordered == True:  
        #: Get all positions  
        all_positions = "Current positions for %s : " % (str(get_datetime()))  
        for pos in context.portfolio.positions:  
            all_positions += "%s at %s shares, " % (pos.symbol, context.portfolio.positions[pos].amount)  
        log.info(all_positions)  
    """  
        Main ordering conditions  
    """  
    if context.ticks <= 30 and context.ordered == False:  
        #: Create a dict of stocks to buy/sell and the position (long/short)  
        stocks_to_order = {}  
        for stock in data:  
            #: First check if data exists and we're using 'all' because it's easier to read but  
            #: it's the same as 'and'  
            if all([context.which_eps in data[stock],  
                  'reports_at_2' in data[stock],  
                  stock in data]):  
                date_format = "%Y-%m-%d"  
                trade_date = datetime.strptime(data[stock]['reports_at_2'], date_format)  
                #: Next check if it's the correct date to trade  
                if all([trade_date.year == get_datetime().year,  
                       trade_date.month == get_datetime().month,  
                       trade_date.day == get_datetime().day]):  
                    estimize_eps = data[stock][context.which_eps]  
                    actual_eps = data[stock]['actual_eps']  
                    #: Getting the percent surprise needed before we trade on that signal  
                    percent_surprise = (actual_eps - estimize_eps)/(estimize_eps + 0.0)  
                    #: Positive Surprise  
                    if (percent_surprise >= context.min_surprise and percent_surprise <= context.max_surprise) and (context.algo_type == 'both' or context.algo_type == 'long'):  
                        stocks_to_order[stock] = 'long'  
                    #: Negative Surprise  
                    elif (percent_surprise <= -context.min_surprise and percent_surprise >= -context.max_surprise) and (context.algo_type == 'both' or context.algo_type == 'short'):  
                        stocks_to_order[stock] = 'short'  
                    #: If neither a positive nor a negative surprise, do nothing  
        #: Create weights for each of our long and short positions based on the number of longs/shorts we have  
        if len(stocks_to_order) != 0:  
            #: Get total number of stocks and divide by 1.0  
            total_long = len([s for s in stocks_to_order if stocks_to_order[s] == 'long'])  
            total_short = len([s for s in stocks_to_order if stocks_to_order[s] == 'short'])  
            long_weight = 1.0/total_long if total_long != 0 else 0  
            short_weight = -1.0/total_short if total_short != 0 else 0  
        #: Go through our stocks to order and order them  
        for stock in stocks_to_order:  
            #: Check if we have data for the stock  
            if stock in data:  
                #: Check whether it's a long or a short  
                if stocks_to_order[stock] == 'long':  
                    weight = long_weight  
                elif stocks_to_order[stock] == 'short':  
                    weight = short_weight  
                log.info("Entering position on %s at %s" % (stock.symbol, str(get_datetime())))  
                order_target_percent(stock, weight)  
                #: Set the number of days held = 0  
                context.stocks_held[stock] = 0  
                context.ordered = True  
        #: Finally, we hedge our positions by getting the net dollar ordered and matching that to 0  
        if context.ordered == True and (context.algo_type == 'both' or context.algo_type == 'short'):  
            #: Get the total amount ordered for the day  
            amount_ordered = 0  
            for order in get_open_orders():  
                for oo in get_open_orders()[order]:  
                    amount_ordered += oo.amount * data[oo.sid].price

            #: Order our hedge  
            order_target_value(context.spy, -amount_ordered)  
            context.stocks_held[context.spy] = 0  
            log.info("We currently have a net order of $%0.2f and will hedge with SPY by ordering $%0.2f" % (amount_ordered, -amount_ordered))  

    """  
        Exit position/days held update logic  
    """  
    #: Go through each held stock and update the number of days held and close out any positions  
    #: that have been held past context.days_to_hold  
    if (current_time.hour == 15 and current_time.minute == 45) or context.retry == True:  
        for stock in context.portfolio.positions:  
            #: Get the number of days that we've currently held this stock  
            days = context.stocks_held.get(stock)  
            #: None is the condition for a security that we don't currently hold  
            if days == None:  
                continue  
            #: If days_to_hold is set to 1, close out any position at the end of the day  
            if context.days_to_hold == 1:  
                #: If we don't have data for the stock, break and retry  
                if stock not in data:  
                    context.retry = True  
                    break  
                #: If we just placed an order for the stock, don't bother ordering again  
                if stock not in get_open_orders():  
                    log.info("Exiting position on %s at %s" % (stock.symbol, str(get_datetime())))  
                    order_target_percent(stock, 0)  
                    #: Refresh all variables  
                    context.retry = False  
                    context.ordered = False  
            #: Same order logic but for when context.days_to_hold != 1  
            elif context.days_to_hold > 1:  
                if days >= context.days_to_hold:  
                    if stock not in data:  
                        context.retry = True  
                        break  
                    elif stock not in get_open_orders():  
                        log.info("Exiting position on %s at %s" % (stock.symbol, str(get_datetime())))  
                        order_target_percent(stock, 0)  
                        context.retry = False  
                        context.ordered = False  
                        context.stocks_held[stock] = None  
                else:  
                    context.stocks_held[stock] += 1  

    context.ticks += 1  

"""
    Fetcher helper methods  
"""

def pre_func(df):  
    """  
        Takes in our dataframe and cleans up the dates, because we only want to buy the day after earnings are released  
    """  
    #: We're going to shift the dates according to when we should be trading the ticker  
    df['date'] = df['date'].apply(lambda x: shift_dates(x))  
    #: We're going to make a copy of the date column to make sure we know when to trade at the appropriate date  
    df['reports_at_2'] = df['date']  
    df = df[(df['num_participants'] > 19)]  
    df = df[(df['ticker'] != 0)]  
    df = df.drop(["Unnamed: 0"], axis=1)  
    return df

def shift_dates(row):  
    """  
        This function takes each date in the dataframe, tests whether the announcement came before or after the market open, and shifts the date appropriately.  
        1. If it's before market open, keep the date the same  
        2. If it's after market open, shift the date by 1 day  
    """  
    row_date = datetime.strptime(row, "%Y-%m-%d %H:%M:%S")  
    #: Announcements at or after the 9:30 market open trade the next day;  
    #: announcements before the open keep the same date  
    if row_date.hour > 9 or (row_date.hour == 9 and row_date.minute >= 30):  
        row_date = row_date + timedelta(days=1)  
    row = row_date.strftime("%Y-%m-%d")  
    return row

def my_universe(context, fetcher_data):  
    """  
        Method for setting our universe of stocks which we will use to determine  
        our weights for each security as well  
    """  
    #: Setting our universe of stocks  
    sids = set(fetcher_data['sid'])  
    sids = [s for s in sids if s != 0]  
    symbols = [s.symbol for s in sids]  
    log.info("Our daily universe size is %s sids" % len(symbols))  
    return sids  

For folks interested in the strategies discussed here (and the underlying estimate data in general), data from Estimize is now available built into Quantopian via the Quantopian Store. The data is updated daily -- check it out!


The backtest stops trading on Nov 16th, 2014... Is it because the data "fetched" from Estimize is not available for free, and that's why nothing is happening?

I actually don't get how the Estimize EPS and Wall Street EPS data are added to the algo... can anybody point this out in the algo?

That's correct. This post was originally completed with a static sample of data from Estimize last year. We have integrated fuller versions of Estimize's data into the platform - both consensus numbers and analyst-by-analyst estimates. These data sets update frequently and won't have the same problem this static data set has.

Today, these data sets are available for use in Research. We are actively working on integrating data from Estimize (and other partner data sources) into our API for use in algorithms.

Thanks
Josh

Hey guys,

I've implemented a similar algorithm using the native Pipeline API and the Estimize Consensus Estimates. The basics are essentially the same, except that in this case, I'm waiting a full business day after an earnings announcement before making a trade. A rough sketch of the surprise screen follows the notes below.

  • Long/Short securities with an Earnings Surprise of absolute magnitude between 0 and 4%
  • Keep leverage under 2
  • Optional code included if you want to hedge out your positions with the SPY.
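
Schematically, the surprise screen looks something like the Pipeline sketch below. The dataset object and its column names here are stand-ins for illustration; the attached algorithm's actual imports are authoritative:

from quantopian.pipeline import Pipeline

def make_surprise_pipeline(dataset):
    #: `dataset` stands in for the Estimize consensus dataset; the column
    #: names below are assumptions, not the real schema
    actual = dataset.actual_eps.latest
    consensus = dataset.consensus_eps.latest
    surprise = (actual - consensus) / consensus
    longs = (surprise > 0.00) & (surprise < 0.04)
    shorts = (surprise < 0.00) & (surprise > -0.04)
    return Pipeline(
        columns={'surprise': surprise, 'longs': longs, 'shorts': shorts},
        screen=(longs | shorts),
    )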

*Note: The timespan reflects the sample dataset dates available for both the Earnings Calendar and Estimize Consensus datasets. I'll follow up with the most recent version as well.

This strategy is still in rough-draft mode, so I would love to get your thoughts on this algorithm and how I could improve it.

[Attached backtest: Clone Algorithm (2015 clones), Backtest ID: 56bcf64abda44b1193b5d588]

@Seong - I am curious why the beta coefficient is a steady 0 with this algo? That implies there is no vol or risk... which would mean that Estimize is nearly always accurate... is this look-ahead bias, or am I missing something? Maybe I'm just an idiot and have no idea what I'm talking about, hah.

Daniel,

Check out the Investopedia definition for beta. The phrase that helped me the most (and I'm no expert) is "you can think of beta as the tendency of a security's returns to respond to swings in the market"

In this case, an algo whose beta is approximately 0 is an algo whose returns tend not to change with the overall change in the market. There is relative independence between the market and the algorithm.

Daniel,

Going off of what Josh said, the algorithm is able to have a near 0 beta because it is market neutral. It's able to be market neutral by holding both long and short positions rather than just long or just short. If it were the case of just having long or short positions, you'd expect the beta to be closer to 1 or -1.

Beta of 0 doesn't mean that the algorithm is perfect or that there is no volatility. In fact, if you look at the volatility of the backtest, it's 0.24 over the course of the backtest. A beta close to zero simply means that systematic risk (market risk) isn't present, but other types of risk, like idiosyncratic risk, are (e.g., AAPL has a terrible earnings announcement), whereas in a strategy with a beta of 1 you'd have both market and idiosyncratic risk.

I'd suggest checking out our beta hedging lecture on the lecture series page for more information, as well as the Investopedia entry on idiosyncratic risk (http://www.investopedia.com/terms/i/idiosyncraticrisk.asp).

As for the lookahead bias, this algorithm actually waits a full business day before trying to trade on an earnings announcement. So if anything, it's a day late in making a decision.

Happy to answer any other questions,
Seong
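
To make the beta arithmetic concrete, a quick numpy sketch: beta is the covariance of the algo's and the market's daily returns over the market's variance.

import numpy as np

def beta(algo_rets, mkt_rets):
    """beta = Cov(r_algo, r_mkt) / Var(r_mkt). Near zero means the algo's
    returns don't co-move with the market, not that they are riskless."""
    cov = np.cov(algo_rets, mkt_rets)
    return cov[0, 1] / cov[1, 1]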

Hi Seong,

I noticed that the algorithm is incredibly sensitive to the surprise cut. For instance, if I choose 0.0 and 0.1 I see a loss of ~ 100% on the timescale of the simulation.

Was this parameter hand-tuned to make the simulation work? Or is there some kind of reason why one would expect 0.0-0.04 (i.e., 0-4%) to be a good range?

I would have thought a "surprise" of <1% is pretty immaterial, probably just noise. You are therefore going long and short on a broad range of stocks that, in aggregate, probably look like the market. While I wouldn't expect it to crash to 0% equity in a short sim, I would expect lots of trading costs and no real performance. The other way to think about it is the size of the trading costs relative to the upside (presumably well less than 1%, given that many trades will not work out).

Personally, I don't like these hard coded thresholds, as I suspect they will depend on market conditions. Another technique would be to rank the current surprise with ALL the surprises for ANY stock in the last year, and treat it as a tradeable surprise if it's in the top half. In other words, is the surprise level above the mean of the last year? Granted, I've introduced two more parameters here (lookback and threshold), but they feel less "hand picked" to me. They would probably cope better with non-stationarity. You could also run simulations of different slices of the ranking, and be fairly sure that each slice has a similar number of trades.

https://en.wikipedia.org/wiki/Data_transformation_(statistics)#Transforming_to_a_uniform_distribution
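
That ranking transform might look like the following sketch: treat today's surprise as tradeable only if its absolute value sits in the top half of all absolute surprises, across all stocks, over the trailing window (lookback and threshold are the two parameters mentioned above):

import numpy as np

def is_tradeable_surprise(current_surprise, trailing_surprises, top_fraction=0.5):
    """True if |current_surprise| ranks in the top `top_fraction` of the
    absolute surprises observed over the lookback window."""
    hist = np.abs(np.asarray(trailing_surprises, dtype=float))
    cutoff = np.quantile(hist, 1.0 - top_fraction)
    return abs(current_surprise) >= cutoff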

How do you guys at Estimize get buy-side analysts to contribute estimates? Aren't they not supposed to share that type of information even if you were paying them enough to make it worth their while? My understanding is that it's difficult for non-professionals to compete in this type of estimates situation because professionals generally have a big information advantage in obtaining some of these estimates through access to management, but I'd be curious to hear if that perception is incorrect and/or how Estimize solves that issue.

@ Seong Lee

Bug in your latest implementation

The latest algorithm you have written to filter out estimates with fewer than 20 analysts has a minor bug in it. You have used num_estimates in the filter, while the value you want is num_estimates.latest. If you look at the pipeline graph you will see that num_estimates >= 20 is an always-TRUE statement. You can try something like num_estimates >= 1000 and see that it is still always TRUE, which probably means that value represents the total number of estimates.

Possible fix:
use num_estimates.latest in the pipeline screening
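
i.e., something along these lines (`EstimizeConsensus` is a stand-in name here; use the dataset import from Seong's algorithm):

from quantopian.pipeline import Pipeline
#: from quantopian.pipeline.data... import EstimizeConsensus  (stand-in name)

def make_pipeline():
    #: Screen on the realized value via .latest, not on the raw column
    #: object, which is the bug described above
    num_estimates = EstimizeConsensus.num_estimates.latest
    return Pipeline(screen=(num_estimates >= 20))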

@SeongLee

I'm just getting started with algorithmic investing. One thing I wonder about: why do you compare a strategy against SPY, a broad US index? Why not against a balanced portfolio of US / European / emerging-market stocks and bonds?

I suppose you want to see if you "beat the market", i.e., whether the algorithm can do better than the broad US market. I wonder how SPY compares to such a balanced portfolio.

Thank you!
Matt

Matt, the SPY is simply our default benchmark in the backtester. You can set it to other stocks or ETFs in your algo code, if you like. Alternatively, you can analyze the performance in more detail with pyfolio in our research environment.