event-based guidance algo?

I decided to maybe dink around with the guidance data, inspired by the recent mini-contest. I put together the attached algo, and if one views EPS guidance as a company event within the period of the mini-contest (June 1, 2015 until Oct 1, 2018), there are relatively few companies participating, and the total grows rather slowly (roughly linearly from zero to 150 companies over the contest period).

It would seem that an event-based algo would be best-suited, where, for example, one detects a release of guidance from company XYZ, determines an alpha vector weight for XYZ, and holds for N days (e.g. N = 5 days), and then closes the position (i.e. sets the alpha vector weight for XYZ to zero).
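The bookkeeping described above can be sketched outside the Quantopian API with plain pandas. This is only an illustration of the open-for-N-days-then-close idea; `update_alpha`, `HOLD_DAYS`, and the trading-day counter are hypothetical names, not part of any Q API.

```python
import pandas as pd

HOLD_DAYS = 5  # N, the post-event holding period

def update_alpha(open_events, todays_events, day):
    """open_events: dict mapping symbol -> trading-day index of its event.
    Record today's guidance events, expire ones older than HOLD_DAYS,
    and return equal weights normalized to unit gross exposure."""
    for sym in todays_events:
        open_events[sym] = day  # a new event (re-)starts the clock
    expired = [s for s, d in open_events.items() if day - d >= HOLD_DAYS]
    for s in expired:
        del open_events[s]  # close the position: weight drops to zero
    if not open_events:
        return pd.Series(dtype=float)
    w = pd.Series(1.0, index=sorted(open_events))
    return w / w.abs().sum()
```

In a real algo the returned Series would feed `opt.TargetWeights`, so names absent from it are implicitly closed out.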

Am I on the right track here? Or should I be thinking about the problem differently? It could be a bit of work to write an event-based algo framework, so before I try it, I thought I'd get some advice from the crowd.

Also, as an architectural change, I'm thinking of writing the factor outside of Pipeline--just export the data to before_trading_start and do the computation there using standard Pandas/numpy/libraries, versus the Q API. Any opinions?

[Backtest attached; the summary metrics table (total returns, alpha, beta, Sharpe, Sortino, max drawdown, benchmark returns, volatility) did not render.]
from quantopian.pipeline.filters import QTradableStocksUS
from quantopian.algorithm import attach_pipeline, pipeline_output, order_optimal_portfolio
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.factset.estimates import Guidance
from quantopian.pipeline.factors import PercentChange
import quantopian.optimize as opt
import pandas as pd
import numpy as np

def normalize(x):
    # Rank-based alternative, currently disabled:
    # r = x.rank()
    # r = r/r.max()
    # r = r - 0.5
    # return r/r.abs().sum()
    return x/x.abs().sum()

def guidance_pipeline():
    # Slice the DataSetFamily to create a DataSet.
    guid_eps_q1 = Guidance.slice('EPS', 'qf', 1)

    # Get the percent change of the upper and lower bounds on the company guidance
    # to see whether or not guidance has changed over the last two trading days.
    earnings_change_lower = PercentChange(inputs=[guid_eps_q1.low], window_length=2)
    earnings_change_upper = PercentChange(inputs=[guid_eps_q1.high], window_length=2)

    pipe = Pipeline(
        columns={
            'guid_eps_q1_low': guid_eps_q1.low.latest,
            'guid_eps_q1_high': guid_eps_q1.high.latest,
            'guid_eps_q1_asof_date': guid_eps_q1.asof_date.latest,
            'guid_eps_q1_period_label': guid_eps_q1.period_label.latest,
            'guid_eps_q1_timestamp': guid_eps_q1.timestamp.latest,
            'earnings_change_lower': earnings_change_lower,
            'earnings_change_upper': earnings_change_upper,
        },
        screen=(
            earnings_change_lower.notnull()
            & earnings_change_upper.notnull()
            # A non-zero percent change indicates a change in guidance in the
            # last 2 trading days.
            & (
                (earnings_change_lower != 0)
                | (earnings_change_upper != 0)
            )
        )
    )

    return pipe

def initialize(context):

    attach_pipeline(guidance_pipeline(), 'guidance_pipeline')

    # Schedule my rebalance function.
    schedule_function(func=rebalance,
                      date_rule=date_rules.every_day(),
                      time_rule=time_rules.market_close(),
                      half_days=True)
    # Record my portfolio variables at the end of the day.
    schedule_function(func=recording_statements,
                      date_rule=date_rules.every_day(),
                      time_rule=time_rules.market_close(),
                      half_days=True)

    context.stocks = []

def recording_statements(context, data):

    record(num_positions=len(context.portfolio.positions))
    record(leverage=context.account.leverage)
    record(num_stocks=len(context.stocks))

    guidance_pipe = pipeline_output('guidance_pipeline')

    stocks_low = guidance_pipe.guid_eps_q1_low.index.values.tolist()
    stocks_high = guidance_pipe.guid_eps_q1_high.index.values.tolist()
    context.stocks.extend(stocks_low)
    context.stocks.extend(stocks_high)
    context.stocks = list(set(context.stocks))

    weights = np.ones(len(context.stocks))
    context.alpha = normalize(pd.Series(weights, index=context.stocks))

def rebalance(context, data):

    objective = opt.TargetWeights(context.alpha)

    order_optimal_portfolio(objective=objective,
                            constraints=[])
There was a runtime error.
18 responses

Hi Grant,

Some observations on your code:

# A non-zero percent change indicates a change in guidance in the last
# 2 trading days.
& (
    (earnings_change_lower != 0)
    | (earnings_change_upper != 0)
)


This is only valid if you want to consider only those stocks that changed their guidance after their initial report. If you want to consider all stocks that have reported guidance, I would eliminate it.

stocks_low = guidance_pipe.guid_eps_q1_low.index.values.tolist()
stocks_high = guidance_pipe.guid_eps_q1_high.index.values.tolist()
context.stocks.extend(stocks_low)
context.stocks.extend(stocks_high)
context.stocks = list(set(context.stocks))


Here, by using extend for both high and low, you are double-counting the number of stocks (names). Normally, companies report both high and low guidance.

Hope this helps.

Thanks James -

My understanding is that this code will avoid the double-counting:

context.stocks = list(set(context.stocks))


How does one detect that EPS guidance has been issued by a given company? In other words, what is the flag that an event has occurred?

Grant,

Yes, set should take care of duplication since it only counts unique names. Sorry my bad.

I think a change in guid_eps_q1.asof_date.latest should flag that event, and the daily change of earnings_change_lower and/or earnings_change_upper should account for revisions.
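The asof_date-change flag can be sketched with plain pandas Series standing in for the Pipeline columns. This is an illustration only; `guidance_event_flag` is a hypothetical helper, not part of the Q API.

```python
import pandas as pd

def guidance_event_flag(asof_today, asof_yesterday):
    """Series of asof_date per asset, today vs. yesterday. True where the
    latest guidance asof_date advanced (fresh guidance or a revision),
    including names that newly appeared (NaT yesterday)."""
    aligned_prev = asof_yesterday.reindex(asof_today.index)
    return asof_today.notna() & (aligned_prev.isna() | (asof_today > aligned_prev))
```

In Pipeline terms, this is essentially what `BusinessDaysSincePreviousEvent` applied to `asof_date.latest` gives you for free (the flag is the `== 0` case).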

Thanks James -

Any thoughts on the event-based algo approach? It seems like at any point in time one would hold only a few stocks if the decay time for the effect of the event is 5-15 days. I'll have to work it out, but one has 4 quarters per year, ~253 trading days per year, and ~400 companies reporting guidance. So the average number of stocks held at any given time will be relatively low if the alpha decays quickly after the event.
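The back-of-the-envelope arithmetic works out as follows (a Little's-law style calculation; the inputs are the rough figures above, not measured from the dataset):

```python
companies = 400                    # names reporting guidance (rough figure)
events_per_year = companies * 4    # assume one guidance event per quarter
trading_days = 253

for hold_days in (5, 15):
    # average concurrent positions = event arrival rate x holding time
    avg_held = events_per_year * hold_days / trading_days
    print(f"hold {hold_days} days -> ~{avg_held:.0f} concurrent positions")
# -> roughly 32 positions at a 5-day hold, 95 at a 15-day hold
```

So "relatively low" means a few dozen names for short holding periods, under these assumptions.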

I'm not really sure an event-based approach is the way to go, because daily changes are very few: most companies that report guidance do it once and rarely revise it. I've played around with @Vladimir's NB (located at the start of that challenge thread) to see how many names are reporting and at what frequency. Earnings and sales have around 350-400 names, while dividends and cash flow are almost none (~5-10), and that's just the first quarter. As you go further down to 2Q, 3Q, and 4Q, the names dwindle to almost nothing. I think it's best to get to know the dataset in research first before settling on an approach.

I guess there are really two sequential events: first the guidance and then the actual report. I guess one could treat each guidance plus actual as a single event.

Maybe try using BusinessDaysSincePreviousEvent as a filter?

Thanks Joakim -

If I search the documentation on BusinessDaysSincePreviousEvent nothing is returned. However, it is used in a sample algo:

https://www.quantopian.com/help#sample-fundamentals

It is also here:

https://github.com/quantopian/zipline/blob/80367fbd7deb118ae0f0937e051403ed9947d345/zipline/pipeline/factors/events.py#L16

Recently, an example was provided here by Q support:

https://www.quantopian.com/posts/get-current-date-in-pipeline

Here's an example of BusinessDaysSincePreviousEvent to screen the universe:

guid_eps_q1 = Guidance.slice('EPS', 'qf', 1)
days_since_guidance = BusinessDaysSincePreviousEvent(
    inputs=[guid_eps_q1.asof_date.latest])

screen=(
    (days_since_guidance < 15)
)


So, if guidance was issued within the last 15 business days, the stock is included; otherwise, it is excluded. This imposes a post-event holding period of 15 days.

I'm wondering what Q is really after here? Take EPS for example. There are two events: EPS guidance and EPS actuals. The mini-contest rule is:

Use the guidance data set as your primary signal source. It is OK to combine guidance signals with other sources if there are predictive interactions between them.

So, what if I use EPS actuals as my primary signal source, and EPS guidance as a source that interacts with it (e.g. if actual differs from guidance, then something good/bad happens to stock price)? It would seem that the EPS actuals are what would dominate the stock price, and so incorporating them would make it my primary signal source, right?

The only thing Q can measure anyway is that the universe consists of stocks that issued guidance of some sort over June 1, 2015 until Oct 1, 2018 (or to the present, since presumably the judging will include the out-of-sample period). I'm thinking that "primary signal source" is too qualitative and, for a black-box algo, not assessable anyway--just ignore it. Is this how others are approaching the "primary signal source" requirement?

[Backtest attached; the summary metrics table (total returns, alpha, beta, Sharpe, Sortino, max drawdown, benchmark returns, volatility) did not render.]
from quantopian.pipeline.filters import QTradableStocksUS
from quantopian.algorithm import attach_pipeline, pipeline_output, order_optimal_portfolio
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.factset.estimates import Guidance
from quantopian.pipeline.factors import PercentChange
from quantopian.pipeline.factors import BusinessDaysSincePreviousEvent
import quantopian.optimize as opt
import pandas as pd
import numpy as np

def normalize(x):
    # Rank-based alternative, currently disabled:
    # r = x.rank()
    # r = r/r.max()
    # r = r - 0.5
    # return r/r.abs().sum()
    return x/x.abs().sum()

def guidance_pipeline():
    # Slice the DataSetFamily to create a DataSet.
    guid_eps_q1 = Guidance.slice('EPS', 'qf', 1)

    # Get the percent change of the upper and lower bounds on the company guidance
    # to see whether or not guidance has changed over the last two trading days.
    earnings_change_lower = PercentChange(inputs=[guid_eps_q1.low], window_length=2)
    earnings_change_upper = PercentChange(inputs=[guid_eps_q1.high], window_length=2)

    # Business days since the guidance asof_date last changed.
    days_since_guidance = BusinessDaysSincePreviousEvent(
        inputs=[guid_eps_q1.asof_date.latest])

    pipe = Pipeline(
        columns={
            'guid_eps_q1_low': guid_eps_q1.low.latest,
            'guid_eps_q1_high': guid_eps_q1.high.latest,
            'guid_eps_q1_asof_date': guid_eps_q1.asof_date.latest,
            'guid_eps_q1_period_label': guid_eps_q1.period_label.latest,
            'guid_eps_q1_timestamp': guid_eps_q1.timestamp.latest,
            'earnings_change_lower': earnings_change_lower,
            'earnings_change_upper': earnings_change_upper,
        },
        screen=(
            days_since_guidance < 15
        )
    )

    return pipe

def initialize(context):

    attach_pipeline(guidance_pipeline(), 'guidance_pipeline')

    # Schedule my rebalance function.
    schedule_function(func=rebalance,
                      date_rule=date_rules.every_day(),
                      time_rule=time_rules.market_close(),
                      half_days=True)
    # Record my portfolio variables at the end of the day.
    schedule_function(func=recording_statements,
                      date_rule=date_rules.every_day(),
                      time_rule=time_rules.market_close(),
                      half_days=True)

    context.stocks = []

def recording_statements(context, data):

    record(num_positions=len(context.portfolio.positions))
    # record(leverage=context.account.leverage)
    record(num_stocks=len(context.stocks))

    guidance_pipe = pipeline_output('guidance_pipeline')

    stocks_low = guidance_pipe.guid_eps_q1_low.index.values.tolist()
    stocks_high = guidance_pipe.guid_eps_q1_high.index.values.tolist()
    context.stocks.extend(stocks_low)
    context.stocks.extend(stocks_high)
    context.stocks = list(set(context.stocks))

    # weights = np.ones(len(context.stocks))
    # context.alpha = normalize(pd.Series(weights, index=context.stocks))

    stocks_lh = []
    stocks_lh.extend(stocks_low)
    stocks_lh.extend(stocks_high)
    stocks_lh = list(set(stocks_lh))

    weights = np.ones(len(stocks_lh))
    context.alpha = normalize(pd.Series(weights, index=stocks_lh))

def rebalance(context, data):

    objective = opt.TargetWeights(context.alpha)

    order_optimal_portfolio(objective=objective,
                            constraints=[])
There was a runtime error.

Grant,

I think you are spot on with this "primary signal source" grey area. How does one determine which signal, between two (or more) interacting factors, is dominant enough to be considered the primary source? It's a chicken-and-egg problem: which came first? I am already seeing some submissions that loosely misinterpret this "rule consideration". Look at the number of daily positions in their charts and you can spot those that have taken much liberty in their interpretation. If you analyze the Guidance data in research, most of the data is concentrated in the first quarter and covers no more than 500 names.

James -

I think it amounts to specifying the point-in-time universe, which is measurable. For example, from https://www.quantopian.com/contest/rules, we have:

Trade liquid stocks: Contest entries must have 95% or more of their invested capital in stocks in the QTradableStocksUS universe (QTU, for short). This is checked at the end of each trading day by comparing an entry’s end-of-day holdings to the constituent members of the QTradableStocksUS on that day. Contest entries are allowed to have as little as 90% of their invested capital invested in members of the QTU on up to 2% of trading days in the backtest used to check criteria. This is in place to help mitigate the effect of turnover in the QTU definition.

For the most recent guidance mini-contest, one would simply apply an additional criterion that would allow only names that reported guidance at least once prior to the point-in-time backtest (I suppose looking back as far as the data would support, 2004). The guidance universe could be a lot larger than 400.

Yeah Grant, I'm not a very good Python coder, but what I'd like to do is run the guidance dataset in research and come up with a daily number of unique names for the series, so we can put this issue to rest. The problem occurs in the data handling of factor combinations, given we are allowed interactions with non-guidance factors, etc. It is very easy to make a mistake, especially when one fills NaNs with a neutral number at the wrong stage of processing. On my first try I made this silly mistake and came up with 1800 names. You ought to give it a go; you're one of the best coders here that I know.

@ James -

One has an event (e.g. I note a car parked in the street in front of my house.), followed by the interpretation of the event (e.g. Am I expecting a visitor? Is it a police car? Was it there yesterday? etc.). In the case of a stock event, the way I'm thinking about it is I then have to score the event relative to the current scores of the other stocks with open positions (versus scoring across an entire universe of ~400 stocks, since each event has a finite lifetime, e.g. ~15 days max). The relative scoring algorithm is the interpretation of the event.
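One way to sketch "scoring relative to the other open positions" is to demean and normalize raw event scores across only the currently open names. This is an illustrative helper under that interpretation (`relative_alpha` is a hypothetical name, not a Q API function):

```python
import pandas as pd

def relative_alpha(scores):
    """scores: raw per-event scores for the currently open positions.
    Demean across peers (so each event is judged relative to the other
    live events, not the whole universe) and scale to unit gross."""
    s = scores - scores.mean()      # relative to the open-position cohort
    if s.abs().sum() == 0:
        return s                    # degenerate case: all scores equal
    return s / s.abs().sum()
```

The resulting weights are dollar-neutral over the open names, which is one plausible way to turn event interpretations into an alpha vector.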

Grant,

I get what you're driving at: an event-based algo using guidance and actuals, and scoring on that. With this approach you'd be lucky to get 3-5 names on a daily basis over a 63-day cycle.

base_universe = QTradableStocksUS()

eps_guid_q1 = Guidance.slice('EPS', 'qf', 1)
guidance_asof = eps_guid_q1.asof_date.latest
eps_guid_q1_high = eps_guid_q1.high.latest
eps_guid_q1_low = eps_guid_q1.low.latest

pipe = Pipeline(
    columns={
        'guidance_asof': guidance_asof,
        'eps_high': eps_guid_q1_high,
        'eps_low': eps_guid_q1_low,
    },
    screen=(
        base_universe
        & (BusinessDaysSincePreviousEvent(inputs=[guidance_asof]) < 15)
    )
)


This could result in a very low turnover. But I could be wrong.

@ James -

If you look at the algo I posted above (https://www.quantopian.com/posts/event-based-guidance-algo#5d99f64e777dec51e955ac5e), one would have a portfolio that cycles quarterly from a min of ~20 positions, to a max of ~250 positions, with a 15 day holding period.

For an individual idiosyncratic factor (e.g. EPS guidance) I don't think it really matters so much how many names are held at any point in time. The legacy/outdated contest rules don't apply at the individual factor level (I think Q has not done enough to make this clear, by the way, in case anyone is listening). In the limit of a large number of uncorrelated factors that in total cover the QTU, it just doesn't matter; one wants niche strategies, since the ones that apply broadly will likely have low SR (e.g. the common risk factors). Maybe this is why the Q fund has apparently struggled; trying to have each algo find alpha across the entire QTU just ends up with a low SR "me too" fund.

Thomas does list the below among what they are looking for when determining the winners. I believe this refers to the 'number of holdings' in the portfolio at any one time. See the bottom-left blue graph in the new tearsheet.

Universe size (larger is better)

Here's a first attempt. I'm using:

def guidance_pipeline():

    guid_eps_q1 = Guidance.slice('EPS', 'qf', 1)
    days_since_guidance = BusinessDaysSincePreviousEvent(
        inputs=[guid_eps_q1.asof_date.latest])

    pipe = Pipeline(
        columns={
            'guid_eps_q1_low': guid_eps_q1.low.latest,
            'guid_eps_q1_high': guid_eps_q1.high.latest,
        },
        screen=(
            (days_since_guidance < 15)
            & guid_eps_q1.low.latest.notnull()
            & guid_eps_q1.high.latest.notnull()
        )
    )
    return pipe
