An Empirical Algorithmic Evaluation of Technical Analysis

At a recent meeting of the Quantopian staff journal club, I presented a paper by Andrew Lo, Harry Mamaysky, and Jiang Wang called Foundations of Technical Analysis: Computational Algorithms, Statistical Inference, and Empirical Implementation (2000). In this paper, the authors use non-parametric kernel regression to smooth a stock's daily price time series until the local minima and maxima that a human technical analyst would find relevant can be separated from noisier short-term price fluctuations. The authors then search these denoised local minima and maxima for the patterns commonly pursued by technical analysts. Once they've identified occurrences of particular patterns, the authors test their predictive power by observing the subsequent forward return on the stock.
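As a rough illustration of the smoothing-and-extrema step (not the authors' exact implementation: they use statsmodels-style kernel regression with a data-driven bandwidth, while here the bandwidth is fixed and all names are mine):

```python
import numpy as np

def kernel_smooth(prices, bw=1.5):
    """Nadaraya-Watson kernel regression with a Gaussian kernel (fixed bandwidth)."""
    t = np.arange(len(prices), dtype=float)
    # weights[i, j] = K((t_i - t_j) / bw); each smoothed point is a
    # weighted average over the WHOLE series, including future bars
    weights = np.exp(-0.5 * ((t[:, None] - t[None, :]) / bw) ** 2)
    return (weights @ prices) / weights.sum(axis=1)

def local_extrema(smooth):
    """Indices where the smoothed series turns: strict local maxima/minima."""
    d = np.diff(smooth)
    maxima = [i for i in range(1, len(smooth) - 1) if d[i - 1] > 0 > d[i]]
    minima = [i for i in range(1, len(smooth) - 1) if d[i - 1] < 0 < d[i]]
    return maxima, minima

# a noisy hump: smoothing leaves one clear maximum near the middle
prices = np.sin(np.linspace(0, np.pi, 21)) + 0.05 * np.cos(np.linspace(0, 40, 21))
maxima, minima = local_extrema(kernel_smooth(prices))
```

The pattern templates are then matched against sequences of these extrema rather than against raw prices.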

For a fuller explanation of what is going on in this notebook, I encourage you to take a look at the original paper: https://www.cis.upenn.edu/~mkearns/teaching/cis700/lo.pdf

It is interesting to note that since this paper was written in 2000 and all the data used in my implementation is from 2003-2016, my results can be considered "out of sample" with respect to the authors' findings.

As I discuss in the notebook, one of my concerns with the authors' methodology is the introduction of lookahead bias via the kernel regression. I'm eager to see how these technical patterns perform as predictors when implemented in an actual trading algorithm. (I'd love some help getting this analysis running as an algorithm on Quantopian.) I imagine we could use Pipeline to scan for patterns on a 40-day lookback window for a large universe of stocks. I'd also be interested to see how this pattern detection works when we smooth price time series using an exponential moving average instead of kernel regression.
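A minimal sketch of what the EMA variant could look like (span=5 and the random-walk series are arbitrary choices of mine, not from the paper). Unlike the two-sided kernel regression, an EMA only looks backward, so it cannot introduce lookahead bias, at the cost of lagging turning points:

```python
import numpy as np
import pandas as pd

# hypothetical close series (seeded random walk, for reproducibility)
prices = pd.Series(100 + np.cumsum(np.random.RandomState(0).normal(0, 1, 40)))

# causal smoother: each value depends only on current and past prices
ema = prices.ewm(span=5, adjust=False).mean()
```

One could then feed `ema` into the same extrema search used on the kernel-smoothed series and compare the detected patterns.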

If you're not familiar with research notebooks: click "Clone Notebook" (you need to be logged in first) and you can then step through my analysis. You can also "run cell" from the beginning and reproduce all of my results.

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

32 responses

This may help.

Wonderful effort Q. I applaud the vast array of interesting papers you guys have published.

Despite my deep scepticism I will certainly contribute to this once I have the time.

I believe an Event Study could give us much more information regarding those technical patterns. So I added an event study at the bottom of your NB. With this information it should be easier to build an algorithm as we can better decide when to enter/exit our positions once we detect a pattern.


I was puzzling yesterday over currency conversion and backtesting on patterns.

On trend following I have decided the "correct" way to deal with a multi-currency portfolio is to convert all time series to the base currency prior to running my daily orders (and of course for any backtesting). In my view there is little point in declaring an entry into the yen version of the Nikkei if you are a dollar investor - you need to take the currency into account to check the trade still makes sense in dollars.

Many backtesting programs work with native currencies and only convert the P&L back to the base currency.

While my method is fine for long term trends, I suspect it would create havoc with other types of pattern such as those being considered here.

you need to take the currency into account to check the trade still makes sense in dollars.

This is off-topic, but you could also hedge the position on the FX market, so the index returns would basically be in USD. If you trade the yen-based Nikkei equity without hedging you are actually trading both the equity and the yen at the same time (this might not be a problem at all, but it's worth noting).
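For concreteness, a toy calculation of how the two exposures compound for an unhedged dollar investor (all numbers made up):

```python
# Toy numbers: Nikkei up 2% in yen while the yen loses 1% vs the USD.
r_local = 0.02   # equity return in JPY terms
r_fx    = -0.01  # JPY return vs USD

# An unhedged USD investor compounds the two exposures:
r_usd = (1 + r_local) * (1 + r_fx) - 1   # ~0.98%, not 2%

# A full FX hedge strips the currency leg, leaving roughly r_local.
```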

Mikko M
Agreed but in trend following that is exactly what I think one should do - trade the trend in the Nikkei as that trend is modified by the currency. There are two other alternatives - take signals just on the Nikkei trend and ignore the currency except in calculating the daily P&L, or hedge as you suggest.

There are very few hedged ETFs around - most trade in one currency (eg the USD) with an un-hedged investment in yen assets. The market ensures the USD price remains in line both with the movement of the Nikkei and the USD/JPY.

I don't have much difficulty with any of these approaches, but for other forms of pattern recognition I suspect a dollar investor would logically do best to base his signals on the unconverted JPY time series. Then again, I have done no testing whatsoever and am just raising a "note to myself" for when I come to look at this later.

If a "head and shoulders" has any predictive power, I imagine that may disappear if you convert the time series to USD.

In trend following I don't mind doing that. I'm an investor rather than a trader, and if an up-trend in the Nikkei is outweighed by a down-trend in the yen, I don't want to take the trade.

Incidentally I don't hedge. I want the diversification.

@Luca That event study is a great addition to this analysis. Thank you for sharing!

Some takeaways:

1) I'd want the algo to trade on each of these technical signals. A larger "N" is going to lead to smaller error bars on our overall forward return.

2) Most of these signals have significantly decayed after 6 days. (I'm still concerned that the kernel regression has introduced lookahead bias.) I'd be interested to see how many of these signals would have been missed if we provided only {1,2,3,4,5,6,7} days of trailing price data after the last local maxima identified with the full range of available data.

3) Inverse Head-and-Shoulders appears to be a particularly useful signal. It occurs often and has a large, slowly decaying forward return.

Do you notice anything in particular?

@Andrew, here is an algorithm that uses those patterns.

There is a pipeline factor (PatternFactor) that detects those patterns and returns a code (an integer) any time a pattern is found.

You can see the factor has a class member "indentification_lag" that is a limit on how many days ago a pattern can be recognized: ideally we would like to recognize a pattern the day after its last local minimum/maximum, but in practice we need more days to detect that last minimum/maximum. The "indentification_lag" parameter is there to let us test different configurations.

After changing "indentification_lag" we are supposed to update the number of days a position is held accordingly: the later we detect a pattern, the fewer days we can hold the position. The number of holding days per event can be configured in the before_trading_start function.

This backtest used an indentification_lag of 4 days and a holding period of 2 days (as Andrew correctly reported, "Most of these signals are significantly decayed after 6 days").

from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.data import morningstar
from quantopian.pipeline.factors import CustomFactor, SimpleMovingAverage, Latest, EWMA, EWMSTD, Returns, ExponentialWeightedMovingAverage, AverageDollarVolume

import pandas as pd
import numpy as np
import datetime

from statsmodels.nonparametric.kernel_regression import KernelReg
from scipy.signal import argrelextrema
from numpy import linspace
from collections import defaultdict

#################################################################
# Keep track of leverage and long/short exposure
#
# One Class to rule them all, One Class to define them,
# One Class to monitor them all and in the bytecode bind them
#

class ExposureMngr(object):

    def __init__(self, target_leverage = 1.0, target_long_exposure_perc = 0.50, target_short_exposure_perc = 0.50):
        self.target_leverage            = target_leverage
        self.target_long_exposure_perc  = target_long_exposure_perc
        self.target_short_exposure_perc = target_short_exposure_perc
        self.short_exposure             = 0.0
        self.long_exposure              = 0.0
        self.open_order_short_exposure  = 0.0
        self.open_order_long_exposure   = 0.0

    def get_current_leverage(self, context, consider_open_orders = True):
        curr_cash = context.portfolio.cash - (self.short_exposure * 2)
        if consider_open_orders:
            curr_cash -= self.open_order_short_exposure
            curr_cash -= self.open_order_long_exposure
        curr_leverage = (context.portfolio.portfolio_value - curr_cash) / context.portfolio.portfolio_value
        return curr_leverage

    def get_exposure(self, context, consider_open_orders = True):
        long_exposure, short_exposure = self.get_long_short_exposure(context, consider_open_orders)
        return long_exposure + short_exposure

    def get_long_short_exposure(self, context, consider_open_orders = True):
        long_exposure  = self.long_exposure
        short_exposure = self.short_exposure
        if consider_open_orders:
            long_exposure  += self.open_order_long_exposure
            short_exposure += self.open_order_short_exposure
        return (long_exposure, short_exposure)

    def get_long_short_exposure_pct(self, context, consider_open_orders = True, consider_unused_cash = True):
        long_exposure, short_exposure = self.get_long_short_exposure(context, consider_open_orders)
        total_cash = long_exposure + short_exposure
        if consider_unused_cash:
            total_cash += self.get_available_cash(context, consider_open_orders)
        long_exposure_pct  = long_exposure  / total_cash if total_cash > 0 else 0
        short_exposure_pct = short_exposure / total_cash if total_cash > 0 else 0
        return (long_exposure_pct, short_exposure_pct)

    def get_available_cash(self, context, consider_open_orders = True):
        curr_cash = context.portfolio.cash - (self.short_exposure * 2)
        if consider_open_orders:
            curr_cash -= self.open_order_short_exposure
            curr_cash -= self.open_order_long_exposure
        leverage_cash = context.portfolio.portfolio_value * (self.target_leverage - 1.0)
        return curr_cash + leverage_cash

    def get_available_cash_long_short(self, context, consider_open_orders = True):
        total_available_cash = self.get_available_cash(context, consider_open_orders)
        long_exposure  = self.long_exposure
        short_exposure = self.short_exposure
        if consider_open_orders:
            long_exposure  += self.open_order_long_exposure
            short_exposure += self.open_order_short_exposure
        current_exposure      = long_exposure + short_exposure + total_available_cash
        target_long_exposure  = current_exposure * self.target_long_exposure_perc
        target_short_exposure = current_exposure * self.target_short_exposure_perc
        long_available_cash   = target_long_exposure  - long_exposure
        short_available_cash  = target_short_exposure - short_exposure
        return (long_available_cash, short_available_cash)

    def update(self, context, data):
        #
        # calculate cash needed to complete open orders
        #
        self.open_order_short_exposure = 0.0
        self.open_order_long_exposure  = 0.0
        for stock, orders in get_open_orders().iteritems():
            price = data.current(stock, 'price')
            if np.isnan(price):  # note: 'price == np.NaN' is always False; use np.isnan
                continue
            amount = 0 if stock not in context.portfolio.positions else context.portfolio.positions[stock].amount
            for oo in orders:
                order_amount = oo.amount - oo.filled
                if order_amount < 0 and amount <= 0:
                    self.open_order_short_exposure += (price * -order_amount)
                elif order_amount > 0 and amount >= 0:
                    self.open_order_long_exposure  += (price * order_amount)

        #
        # calculate long/short positions exposure
        #
        self.short_exposure = 0.0
        self.long_exposure  = 0.0
        for stock, position in context.portfolio.positions.iteritems():
            amount = position.amount
            last_sale_price = position.last_sale_price
            if amount < 0:
                self.short_exposure += (last_sale_price * -amount)
            elif amount > 0:
                self.long_exposure  += (last_sale_price * amount)
#################################################################
#################################################################

def find_max_min(prices):
    prices_ = prices.copy()
    prices_.index = linspace(1., len(prices_), len(prices_))
    kr = KernelReg([prices_.values], [prices_.index.values], var_type='c', bw=[1.8,1])
    f = kr.fit([prices_.index.values])
    smooth_prices = pd.Series(data=f[0], index=prices.index)

    local_max = argrelextrema(smooth_prices.values, np.greater)[0]
    local_min = argrelextrema(smooth_prices.values, np.less)[0]

    price_local_max_dt = []
    for i in local_max:
        if (i > 1) and (i < len(prices) - 1):
            price_local_max_dt.append(prices.iloc[i-2:i+2].argmax())

    price_local_min_dt = []
    for i in local_min:
        if (i > 1) and (i < len(prices) - 1):
            price_local_min_dt.append(prices.iloc[i-2:i+2].argmin())

    prices.name = 'price'
    maxima = pd.DataFrame(prices.loc[price_local_max_dt])
    minima = pd.DataFrame(prices.loc[price_local_min_dt])
    max_min = pd.concat([maxima, minima]).sort_index()
    max_min.index.name = 'date'
    max_min = max_min.reset_index()
    max_min = max_min[~max_min.date.duplicated()]
    p = prices.reset_index()
    max_min['day_num'] = p[p['index'].isin(max_min.date)].index.values
    max_min = max_min.set_index('day_num').price

    return max_min

def find_patterns(max_min):
    patterns = defaultdict(list)

    for i in range(5, len(max_min) + 1):
        window = max_min.iloc[i-5:i]

        # pattern must play out in less than 36 days
        if window.index[-1] - window.index[0] > 35:
            continue

        # Using the notation from the paper to avoid mistakes
        e1 = window.iloc[0]
        e2 = window.iloc[1]
        e3 = window.iloc[2]
        e4 = window.iloc[3]
        e5 = window.iloc[4]

        rtop_g1 = np.mean([e1, e3, e5])
        rtop_g2 = np.mean([e2, e4])

        # Head-and-Shoulders
        if (e1 > e2) and (e3 > e1) and (e3 > e5) and \
           (abs(e1 - e5) <= 0.03 * np.mean([e1, e5])) and \
           (abs(e2 - e4) <= 0.03 * np.mean([e1, e5])):
            patterns['HS'].append((window.index[0], window.index[-1]))

        # Inverse Head-and-Shoulders
        elif (e1 < e2) and (e3 < e1) and (e3 < e5) and \
             (abs(e1 - e5) <= 0.03 * np.mean([e1, e5])) and \
             (abs(e2 - e4) <= 0.03 * np.mean([e1, e5])):
            patterns['IHS'].append((window.index[0], window.index[-1]))

        # Broadening Top
        elif (e1 > e2) and (e1 < e3) and (e3 < e5) and (e2 > e4):
            patterns['BTOP'].append((window.index[0], window.index[-1]))

        # Broadening Bottom
        elif (e1 < e2) and (e1 > e3) and (e3 > e5) and (e2 < e4):
            patterns['BBOT'].append((window.index[0], window.index[-1]))

        # Triangle Top
        elif (e1 > e2) and (e1 > e3) and (e3 > e5) and (e2 < e4):
            patterns['TTOP'].append((window.index[0], window.index[-1]))

        # Triangle Bottom
        elif (e1 < e2) and (e1 < e3) and (e3 < e5) and (e2 > e4):
            patterns['TBOT'].append((window.index[0], window.index[-1]))

        # Rectangle Top
        elif (e1 > e2) and (abs(e1 - rtop_g1) / rtop_g1 < 0.0075) and \
             (abs(e3 - rtop_g1) / rtop_g1 < 0.0075) and (abs(e5 - rtop_g1) / rtop_g1 < 0.0075) and \
             (abs(e2 - rtop_g2) / rtop_g2 < 0.0075) and (abs(e4 - rtop_g2) / rtop_g2 < 0.0075) and \
             (min(e1, e3, e5) > max(e2, e4)):
            patterns['RTOP'].append((window.index[0], window.index[-1]))

        # Rectangle Bottom
        elif (e1 < e2) and (abs(e1 - rtop_g1) / rtop_g1 < 0.0075) and \
             (abs(e3 - rtop_g1) / rtop_g1 < 0.0075) and (abs(e5 - rtop_g1) / rtop_g1 < 0.0075) and \
             (abs(e2 - rtop_g2) / rtop_g2 < 0.0075) and (abs(e4 - rtop_g2) / rtop_g2 < 0.0075) and \
             (max(e1, e3, e5) > min(e2, e4)):
            patterns['RBOT'].append((window.index[0], window.index[-1]))

    return patterns

def _pattern_identification(prices, indentification_lag):
    max_min = find_max_min(prices)

    # We are only interested in the last pattern (if multiple patterns are there),
    # and the last min/max must have happened less than "indentification_lag" days
    # ago; otherwise it must already have been identified, or it is too late to be useful.
    max_min_last_window = None
    for i in reversed(range(len(max_min))):
        if (prices.index[-1] - max_min.index[i]) <= indentification_lag:
            max_min_last_window = max_min.iloc[i-4:i+1]
            break

    if max_min_last_window is None:
        return 0

    # possibly identify a pattern in the selected window
    patterns = find_patterns(max_min_last_window)
    if len(patterns) != 1:
        return 0

    name, start_end_day_nums = patterns.iteritems().next()

    pattern_code = {
        'HS'   : -2,
        'IHS'  :  2,
        'BTOP' : -1,
        'BBOT' :  1,
        'TTOP' : -4,
        'TBOT' :  4,
        'RTOP' : -3,
        'RBOT' :  3,
    }

    return pattern_code[name]

class PatternFactor(CustomFactor):

    inputs = [USEquityPricing.close]
    window_length = 40
    indentification_lag = 4

    def compute(self, today, assets, out, close):
        prices = pd.DataFrame(close, columns=assets)
        out[:] = prices.apply(_pattern_identification, args=(self.indentification_lag,))

def make_pipeline(context):

    pipe = Pipeline()

    #
    # Screen out penny stocks and low liquidity securities.
    #
    price = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=22)
    dollar_volume = AverageDollarVolume(window_length=20)

    price_filter         = (price >= 1.0)
    dollar_volume_filter = dollar_volume.top(500)

    full_filter = price_filter & dollar_volume_filter
    pipe.set_screen(full_filter)

    #
    # Add the pattern detection factor; its output is read later as the 'pattern' column.
    #
    pipe.add(PatternFactor(), 'pattern')

    return pipe

# Put any initialization logic here. The context object will be passed to
# the other methods in your algorithm.
def initialize(context):

    #
    # Algo configuration
    #
    context.exposure = ExposureMngr(target_leverage = 1.0,
                                    target_long_exposure_perc = 0.50,
                                    target_short_exposure_perc = 0.50)

    # As we expect to find events every day, we want to keep some cash available
    # for trading. This variable puts a percentage limit on the cash we can use
    # each day, so that something is left available for the following days.
    context.daily_cash_limit_perc = 0.40

    #
    # Algo internal state
    #
    context.universe = []
    context.shorts = pd.Series()
    context.longs  = pd.Series()
    context.positions_to_clear = {}
    context.position_expire = {}

    #
    # Algo logic starts
    #
    attach_pipeline(make_pipeline(context), 'factors')

    schedule_function(rebalance, date_rules.every_day(), time_rules.market_open())
    schedule_function(log_stats, date_rules.every_day(), time_rules.market_close())

def before_trading_start(context, data):
    # Compute final rank and assign long and short baskets.
    results = pipeline_output('factors')
    results = results.replace([np.inf, -np.inf], np.nan)
    results = results.dropna()

    print 'Basket of stocks %d' % (len(results))

    now = get_datetime()

    #
    # Fill context.positions_to_clear with positions that we need to exit.
    # The "rebalance" method will use that information to exit those positions.
    #
    context.positions_to_clear = {}
    temporary_exclusions = []
    for sec, position in context.portfolio.positions.iteritems():
        temporary_exclusions.append(sec)
        if now >= context.position_expire.get(sec, now):
            context.positions_to_clear[sec] = position.amount

    # clear old entries
    for sec in context.position_expire.keys():
        if sec not in context.portfolio.positions:
            del context.position_expire[sec]

    # we don't want to enter positions that we already hold
    results = results.drop(temporary_exclusions, axis=0, errors='ignore')

    #
    # Now fill context.shorts and context.longs; the "rebalance" method will use
    # that information to enter the required positions.
    #
    patterns = [  # name, code, number of days to hold the positions
        ('HS'  , -2, 2),
        ('IHS' ,  2, 2),
        ('BTOP', -1, 2),
        ('BBOT',  1, 2),
        ('TTOP', -4, 2),
        ('TBOT',  4, 2),
        ('RTOP', -3, 2),
        #('RBOT',  3, 4),
    ]

    context.shorts = pd.Series()
    context.longs  = pd.Series()
    for name, code, holding_days in patterns:
        positions = results[results['pattern'] == code]['pattern']
        if len(positions) <= 0:
            continue
        if code < 0:
            context.shorts = context.shorts.append(positions)
        elif code > 0:
            context.longs  = context.longs.append(positions)
        expire_date = now + datetime.timedelta(days=holding_days)
        for sec in positions.index:
            context.position_expire[sec] = expire_date

    print 'shorts (length %d):\n' % (len(context.shorts.index)), context.shorts
    print 'longs  (length %d):\n' % (len(context.longs.index)), context.longs

    context.universe = (context.longs.index | context.shorts.index)

def rebalance(context, data):

    #
    # calculate how much money we have for rebalancing today
    #
    context.exposure.update(context, data)

    long_available_cash, short_available_cash = context.exposure.get_available_cash_long_short(context)

    log.debug('long_available_cash %f short_available_cash %f' % (long_available_cash, short_available_cash))

    #
    # As we expect to find events every day, we want to keep some cash available
    # for trading. We put a percentage limit on the cash we can use each day
    # so that something is left available for the following days.
    #
    long_available_cash  *= context.daily_cash_limit_perc
    short_available_cash *= context.daily_cash_limit_perc

    #
    # If we don't have enough money we cannot weight the stocks equally
    # or buy enough shares to cover minimum requirements.
    #
    if long_available_cash < (context.portfolio.portfolio_value * 0.03):
        long_available_cash = 0

    if short_available_cash < (context.portfolio.portfolio_value * 0.03):
        short_available_cash = 0

    log.debug('We will use long_cash %f (%d sec) short_cash %f (%d sec)' % (long_available_cash, len(context.longs.index), short_available_cash, len(context.shorts.index)))

    #
    # Enter new positions
    #
    for sec in context.longs.index:
        cash_per_long_sec = long_available_cash / len(context.longs.index)
        if cash_per_long_sec > 0 and data.can_trade(sec):
            order_value(sec, cash_per_long_sec)
            log.debug('long order %s amount %f' % (str(sec), cash_per_long_sec))

    for sec in context.shorts.index:
        cash_per_short_sec = short_available_cash / len(context.shorts.index)
        if cash_per_short_sec > 0 and data.can_trade(sec):
            order_value(sec, -cash_per_short_sec)
            log.debug('short order %s amount %f' % (str(sec), cash_per_short_sec))

    #
    # Clear positions for this day
    #
    for sec in context.positions_to_clear:
        order_target(sec, 0)
        log.debug('clear positions for %s' % (str(sec)))

def log_stats(context, data):
    context.exposure.update(context, data)
    long_exposure_pct, short_exposure_pct = context.exposure.get_long_short_exposure_pct(context)
    record(lever=context.account.leverage,
           exposure=context.account.net_leverage,
           num_pos=len(context.portfolio.positions),
           long_exposure_pct=long_exposure_pct,
           short_exposure_pct=short_exposure_pct)


I'm glad we looked at this in an algo. Identifying the last maximum/minimum in a pattern is clearly a key limitation. It'd be interesting to run these tests on various kernel regressor bandwidth parameter values to see if there is an identification lag/smoothness combination that produces superior returns.

I wonder if Kalman filter smoothing would look any different? I'll try swapping one into the notebook.
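In case it's useful, here is a minimal causal local-level Kalman filter one could swap in for the smoother (a hand-rolled sketch, not a library implementation; `q` and `r` are assumed values). Note that Kalman *filtering* is one-sided, while Kalman *smoothing*, like the kernel regression, uses future observations and would raise the same lookahead concern:

```python
import numpy as np

def kalman_level(prices, q=0.1, r=1.0):
    """Causal local-level Kalman filter; the state is a latent 'fair' price.
    q (process variance) and r (observation variance) are assumed values."""
    x, p = float(prices[0]), 1.0
    out = np.empty(len(prices))
    for i, z in enumerate(prices):
        p = p + q            # predict: uncertainty grows between observations
        k = p / (p + r)      # Kalman gain: how much to trust the new price
        x = x + k * (z - x)  # update the level estimate toward the observation
        p = (1 - k) * p      # shrink uncertainty after the update
        out[i] = x
    return out
```

The output series could then be fed to the same extrema search used on the kernel-smoothed prices.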

Here is a factor tear sheet of PatternFactor (the pipeline factor used in the previous algorithm to identify the technical patterns). There is a tear sheet for identification lags of 1, 2 and 3 days. I expect the results to be similar to what we found in the event study (even though in this NB we analyse a shorter time period, 2013-2016, due to technical limitations of the research environment).

Edit: updated NB


Summarizing what we have done up to now...

• The event study shows the technical patterns have some signal that we can exploit

• The signal decays after ~6 days, though Andrew is concerned that the kernel regression introduces lookahead bias: indeed, we cannot identify a local minimum or maximum without having collected some data (daily close prices) AFTER it. This means we cannot use the full 6-day signal but only some fewer number of days of it; we would like to know how many.

• Created a pipeline factor, PatternFactor, that detects the same patterns analysed in the event study. PatternFactor is configurable in the number of identification lag days allowed to detect a local minimum/maximum. Obviously, the fewer days we configure, the fewer local minima/maxima it finds; fewer days can also mean detecting false positives.

• Created an algorithm with PatternFactor and experimented with identification lags of 1, 2, 3, 4 and 5 days (more is pointless, as the signal decays after 6 days)

• The algorithm is not profitable; this could be due to bugs in the code :) or Andrew's concern may be well founded. To understand the problem we created a factor tear sheet for PatternFactor

• Studied PatternFactor with the factor tear sheet and discovered that if we don't give the factor enough identification lag days, it cannot detect the same patterns we studied in the event study. That is, it detects local maxima/minima that are false positives (they wouldn't have been flagged as local maxima/minima if we had provided more data)
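The false-positive mechanism can be demonstrated with any two-sided smoother; a centered moving average stands in here for the kernel regression (numbers are made up), showing how a smoothed value, and hence any extremum dated there, changes once later bars arrive:

```python
import numpy as np

def centered_ma(x, k=2):
    """Two-sided smoother: like the kernel regression, it averages future bars too."""
    return np.array([x[max(0, i - k):i + k + 1].mean() for i in range(len(x))])

x = np.array([1., 2., 3., 4., 10., 4., 3., 2., 1.])

full  = centered_ma(x)       # smoothed with the benefit of hindsight
trunc = centered_ma(x[:6])   # what we could compute standing on day 5

# full[4] == 4.8 but trunc[4] == 5.25: the smoothed value at index 4 (and any
# extremum dated there) is revised once later bars arrive
```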

The last conclusion was not clear from simply looking at the last two factor tear sheets posted above. So here is a new NB that shows PatternFactor with 1, 2, 3 and 10 days of identification lag (I had never tried a 10-day lag before). In this last NB I also filter out stocks whose price is less than 5 dollars, because the signal returns are very low and transaction costs would make those stocks untradable.

My opinion is that with an identification lag of 10 days the factor is able to detect patterns very similar to the ones seen in the event study. There are small discrepancies, but we have to keep in mind that we are not using the same universe in the tear sheet and in the event study (the pipeline recalculates its stock universe every day, while the event study has a fixed universe for the whole period examined).

If we look at the results of PatternFactor with 1, 2 or 3 days of identification lag, we can see the recognized patterns are not similar to the ones seen in the event study. In particular, the more lag days we allow, the more similar the detected patterns are to the ones in the event study. In my opinion that corroborates Andrew's fear of lookahead bias.

It would be nice to have my conclusion reviewed by someone else, as it is very likely that I missed something (and we can never exclude the presence of bugs in the code).

Another thing I noticed is that PatternFactor with an identification lag of 1 day, while detecting different patterns than the event study, generates a signal that lasts 1 day. It's not very strong, but we could try exploiting it.


The algorithm has a few shortcomings... but it's just to show the signal generated from an identification lag of 1 day.

from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.data import morningstar
from quantopian.pipeline.factors import CustomFactor, SimpleMovingAverage, Latest, EWMA, EWMSTD, Returns, ExponentialWeightedMovingAverage, AverageDollarVolume

import pandas as pd
import numpy as np
import datetime

from statsmodels.nonparametric.kernel_regression import KernelReg
from scipy.signal import argrelextrema
from numpy import linspace
from collections import defaultdict

#################################################################
# Keep track of leverage and long/short exposure
#
# One Class to rule them all, One Class to define them,
# One Class to monitor them all and in the bytecode bind them
#

class ExposureMngr(object):

def __init__(self, target_leverage = 1.0, target_long_exposure_perc = 0.50, target_short_exposure_perc = 0.50):
self.target_leverage            = target_leverage
self.target_long_exposure_perc  = target_long_exposure_perc
self.target_short_exposure_perc = target_short_exposure_perc
self.short_exposure             = 0.0
self.long_exposure              = 0.0
self.open_order_short_exposure  = 0.0
self.open_order_long_exposure   = 0.0

def get_current_leverage(self, context, consider_open_orders = True):
curr_cash = context.portfolio.cash - (self.short_exposure * 2)
if consider_open_orders:
curr_cash -= self.open_order_short_exposure
curr_cash -= self.open_order_long_exposure
curr_leverage = (context.portfolio.portfolio_value - curr_cash) / context.portfolio.portfolio_value
return curr_leverage

    def get_exposure(self, context, consider_open_orders=True):
        long_exposure, short_exposure = self.get_long_short_exposure(context, consider_open_orders)
        return long_exposure + short_exposure

    def get_long_short_exposure(self, context, consider_open_orders=True):
        long_exposure  = self.long_exposure
        short_exposure = self.short_exposure
        if consider_open_orders:
            long_exposure  += self.open_order_long_exposure
            short_exposure += self.open_order_short_exposure
        return (long_exposure, short_exposure)

    def get_long_short_exposure_pct(self, context, consider_open_orders=True, consider_unused_cash=True):
        long_exposure, short_exposure = self.get_long_short_exposure(context, consider_open_orders)
        total_cash = long_exposure + short_exposure
        if consider_unused_cash:
            total_cash += self.get_available_cash(context, consider_open_orders)
        long_exposure_pct  = long_exposure  / total_cash if total_cash > 0 else 0
        short_exposure_pct = short_exposure / total_cash if total_cash > 0 else 0
        return (long_exposure_pct, short_exposure_pct)

    def get_available_cash(self, context, consider_open_orders=True):
        curr_cash = context.portfolio.cash - (self.short_exposure * 2.0)
        if consider_open_orders:
            curr_cash -= self.open_order_short_exposure
            curr_cash -= self.open_order_long_exposure
        leverage_cash = context.portfolio.portfolio_value * (self.target_leverage - 1.0)
        return curr_cash + leverage_cash

    def get_available_cash_long_short(self, context, consider_open_orders=True):
        total_available_cash = self.get_available_cash(context, consider_open_orders)
        long_exposure  = self.long_exposure
        short_exposure = self.short_exposure
        if consider_open_orders:
            long_exposure  += self.open_order_long_exposure
            short_exposure += self.open_order_short_exposure
        current_exposure      = long_exposure + short_exposure + total_available_cash
        target_long_exposure  = current_exposure * self.target_long_exposure_perc
        target_short_exposure = current_exposure * self.target_short_exposure_perc
        long_available_cash   = target_long_exposure  - long_exposure
        short_available_cash  = target_short_exposure - short_exposure
        return (long_available_cash, short_available_cash)

    def update(self, context, data):
        #
        # calculate cash needed to complete open orders
        #
        self.open_order_short_exposure = 0.0
        self.open_order_long_exposure  = 0.0
        for stock, orders in get_open_orders().iteritems():
            price = data.current(stock, 'price')
            if np.isnan(price):  # "price == np.NaN" is always False; NaN must be tested with np.isnan
                continue
            amount = 0 if stock not in context.portfolio.positions else context.portfolio.positions[stock].amount
            for oo in orders:
                order_amount = oo.amount - oo.filled
                if order_amount < 0 and amount <= 0:
                    self.open_order_short_exposure += (price * -order_amount)
                elif order_amount > 0 and amount >= 0:
                    self.open_order_long_exposure  += (price * order_amount)

        #
        # calculate long/short positions exposure
        #
        self.short_exposure = 0.0
        self.long_exposure  = 0.0
        for stock, position in context.portfolio.positions.iteritems():
            amount = position.amount
            last_sale_price = position.last_sale_price
            if amount < 0:
                self.short_exposure += (last_sale_price * -amount)
            elif amount > 0:
                self.long_exposure  += (last_sale_price * amount)
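To make the bookkeeping above concrete, here is a worked example of the same long/short exposure arithmetic on a hypothetical portfolio (plain Python, independent of the Quantopian API; the tickers and prices are invented):

```python
# Hypothetical portfolio snapshot: (share amount, last sale price) per stock,
# standing in for Quantopian's context.portfolio.positions.
positions = {
    'AAPL': (100, 150.0),   # long 100 shares at $150
    'MSFT': (-50, 300.0),   # short 50 shares at $300
    'XOM':  (200, 90.0),    # long 200 shares at $90
}

# Same logic as ExposureMngr.update: shorts contribute positively to short exposure.
long_exposure  = sum(amt * px for amt, px in positions.values() if amt > 0)
short_exposure = sum(-amt * px for amt, px in positions.values() if amt < 0)

# Same logic as get_long_short_exposure_pct (ignoring unused cash).
total = long_exposure + short_exposure
long_pct  = long_exposure / total if total > 0 else 0
short_pct = short_exposure / total if total > 0 else 0

print(long_exposure, short_exposure)   # 33000.0 15000.0
print(long_pct, short_pct)             # 0.6875 0.3125
```

With these numbers the portfolio is roughly 69% long and 31% short, so the manager's 50/50 target would shift new cash toward the short side.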
#################################################################

def find_max_min(prices):
    prices_ = prices.copy()
    prices_.index = linspace(1., len(prices_), len(prices_))
    kr = KernelReg([prices_.values], [prices_.index.values], var_type='c', bw=[1.8, 1])
    f = kr.fit([prices_.index.values])
    smooth_prices = pd.Series(data=f[0], index=prices.index)

    local_max = argrelextrema(smooth_prices.values, np.greater)[0]
    local_min = argrelextrema(smooth_prices.values, np.less)[0]

    price_local_max_dt = []
    for i in local_max:
        if (i > 1) and (i < len(prices) - 1):
            price_local_max_dt.append(prices.iloc[i - 2:i + 2].argmax())

    price_local_min_dt = []
    for i in local_min:
        if (i > 1) and (i < len(prices) - 1):
            price_local_min_dt.append(prices.iloc[i - 2:i + 2].argmin())

    prices.name = 'price'
    maxima = pd.DataFrame(prices.loc[price_local_max_dt])
    minima = pd.DataFrame(prices.loc[price_local_min_dt])
    max_min = pd.concat([maxima, minima]).sort_index()
    max_min.index.name = 'date'
    max_min = max_min.reset_index()
    max_min = max_min[~max_min.date.duplicated()]
    p = prices.reset_index()
    max_min['day_num'] = p[p['index'].isin(max_min.date)].index.values
    max_min = max_min.set_index('day_num').price

    return max_min
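The core idea of `find_max_min` — smooth first, then look for extrema — can be illustrated without statsmodels. Below is a minimal pure-NumPy sketch: a Nadaraya-Watson kernel smoother with a Gaussian kernel standing in for `KernelReg`, and a strict-extrema finder equivalent to `argrelextrema` with `np.greater`/`np.less`. The synthetic noisy sine wave and the bandwidth are illustrative assumptions, not the notebook's actual data:

```python
import numpy as np

def nw_smooth(y, bw=1.8):
    """Minimal Nadaraya-Watson kernel regression with a Gaussian kernel --
    a pure-NumPy stand-in for the statsmodels KernelReg call above."""
    x = np.arange(len(y), dtype=float)
    # weight of observation j when estimating point i: K((x_i - x_j) / bw)
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bw) ** 2)
    return (w @ np.asarray(y, dtype=float)) / w.sum(axis=1)

def local_extrema(y):
    """Indices of strict interior local maxima/minima, like scipy's
    argrelextrema with np.greater / np.less."""
    maxima = np.flatnonzero((y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])) + 1
    minima = np.flatnonzero((y[1:-1] < y[:-2]) & (y[1:-1] < y[2:])) + 1
    return maxima, minima

# A noisy two-cycle sine wave: the raw series has many spurious extrema,
# while the smoothed one keeps roughly one max and one min per cycle.
rng = np.random.RandomState(0)
t = np.linspace(0, 4 * np.pi, 80)
noisy = np.sin(t) + 0.15 * rng.randn(80)
smooth = nw_smooth(noisy)

raw_max, raw_min = local_extrema(noisy)
sm_max, sm_min = local_extrema(smooth)
print(len(raw_max) + len(raw_min), '->', len(sm_max) + len(sm_min))
```

The point is the denoising step: the extrema of the smoothed curve are the "relevant" turning points a chartist would draw, while the raw series is littered with one-day wiggles.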

def find_patterns(max_min):
    patterns = defaultdict(list)

    for i in range(5, len(max_min) + 1):
        window = max_min.iloc[i - 5:i]

        # pattern must play out in less than 36 days
        if window.index[-1] - window.index[0] > 35:
            continue

        # Using the notation from the paper to avoid mistakes
        e1 = window.iloc[0]
        e2 = window.iloc[1]
        e3 = window.iloc[2]
        e4 = window.iloc[3]
        e5 = window.iloc[4]

        rtop_g1 = np.mean([e1, e3, e5])
        rtop_g2 = np.mean([e2, e4])

        # Head and Shoulders
        if (e1 > e2) and (e3 > e1) and (e3 > e5) and \
           (abs(e1 - e5) <= 0.03 * np.mean([e1, e5])) and \
           (abs(e2 - e4) <= 0.03 * np.mean([e1, e5])):
            patterns['HS'].append((window.index[0], window.index[-1]))

        # Inverse Head and Shoulders
        elif (e1 < e2) and (e3 < e1) and (e3 < e5) and \
             (abs(e1 - e5) <= 0.03 * np.mean([e1, e5])) and \
             (abs(e2 - e4) <= 0.03 * np.mean([e1, e5])):
            patterns['IHS'].append((window.index[0], window.index[-1]))

        # Broadening Top
        elif (e1 > e2) and (e1 < e3) and (e3 < e5) and (e2 > e4):
            patterns['BTOP'].append((window.index[0], window.index[-1]))

        # Broadening Bottom
        elif (e1 < e2) and (e1 > e3) and (e3 > e5) and (e2 < e4):
            patterns['BBOT'].append((window.index[0], window.index[-1]))

        # Triangle Top
        elif (e1 > e2) and (e1 > e3) and (e3 > e5) and (e2 < e4):
            patterns['TTOP'].append((window.index[0], window.index[-1]))

        # Triangle Bottom
        elif (e1 < e2) and (e1 < e3) and (e3 < e5) and (e2 > e4):
            patterns['TBOT'].append((window.index[0], window.index[-1]))

        # Rectangle Top
        elif (e1 > e2) and (abs(e1 - rtop_g1) / rtop_g1 < 0.0075) and \
             (abs(e3 - rtop_g1) / rtop_g1 < 0.0075) and (abs(e5 - rtop_g1) / rtop_g1 < 0.0075) and \
             (abs(e2 - rtop_g2) / rtop_g2 < 0.0075) and (abs(e4 - rtop_g2) / rtop_g2 < 0.0075) and \
             (min(e1, e3, e5) > max(e2, e4)):
            patterns['RTOP'].append((window.index[0], window.index[-1]))

        # Rectangle Bottom: the lowest top must sit above the highest bottom
        elif (e1 < e2) and (abs(e1 - rtop_g1) / rtop_g1 < 0.0075) and \
             (abs(e3 - rtop_g1) / rtop_g1 < 0.0075) and (abs(e5 - rtop_g1) / rtop_g1 < 0.0075) and \
             (abs(e2 - rtop_g2) / rtop_g2 < 0.0075) and (abs(e4 - rtop_g2) / rtop_g2 < 0.0075) and \
             (min(e2, e4) > max(e1, e3, e5)):
            patterns['RBOT'].append((window.index[0], window.index[-1]))

    return patterns
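As a sanity check of the conditions above, here is a hypothetical five-extrema window (alternating max/min; the values are invented for illustration) run through the head-and-shoulders test in plain Python:

```python
# Hypothetical alternating extrema (max, min, max, min, max): two shoulders
# near 100, a head at 110, troughs near 95 -- values invented for illustration.
e1, e2, e3, e4, e5 = 100.0, 95.0, 110.0, 95.5, 100.5

mean15 = (e1 + e5) / 2.0  # same quantity as np.mean([e1, e5]) in the code above
is_hs = (
    (e1 > e2) and (e3 > e1) and (e3 > e5)   # head strictly above both shoulders
    and abs(e1 - e5) <= 0.03 * mean15       # shoulders within 3% of each other
    and abs(e2 - e4) <= 0.03 * mean15       # troughs within 3% of each other
)
print(is_hs)  # True

# Flattening the head below the shoulders breaks the pattern:
e3_flat = 99.0
print((e1 > e2) and (e3_flat > e1) and (e3_flat > e5))  # False
```

The 3% tolerance is what makes the rule fuzzy enough to match real charts: the shoulders and troughs only need to be approximately level, not identical.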

def _pattern_identification(prices, indentification_lag):
    max_min = find_max_min(prices)

    # we are only interested in the last pattern (if multiple patterns are there)
    # and the last min/max must have happened no more than "indentification_lag"
    # days ago; otherwise it must have already been identified, or it is too late
    # to be useful
    max_min_last_window = None

    for i in reversed(range(len(max_min))):
        if (prices.index[-1] - max_min.index[i]) == indentification_lag:
            max_min_last_window = max_min.iloc[i - 4:i + 1]
            break

    if max_min_last_window is None:
        return np.nan

    # possibly identify a pattern in the selected window
    patterns = find_patterns(max_min_last_window)
    if len(patterns) != 1:
        return np.nan

    name, start_end_day_nums = patterns.iteritems().next()

    pattern_code = {
        'HS'   : -2,
        'IHS'  :  2,
        'BTOP' : -1,
        'BBOT' :  1,
        'TTOP' : -4,
        'TBOT' :  4,
        'RTOP' : -3,
        'RBOT' :  3,
    }

    return pattern_code[name]
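The sign of the code carries the trade direction (negative for bearish patterns, positive for bullish), which is how the algorithm further down sorts factor values into short and long baskets. A small standalone illustration of that convention:

```python
# Same encoding as above: negative codes = bearish (short), positive = bullish (long).
pattern_code = {'HS': -2, 'IHS': 2, 'BTOP': -1, 'BBOT': 1,
                'TTOP': -4, 'TBOT': 4, 'RTOP': -3, 'RBOT': 3}

shorts = sorted(name for name, code in pattern_code.items() if code < 0)
longs  = sorted(name for name, code in pattern_code.items() if code > 0)

print(shorts)  # ['BTOP', 'HS', 'RTOP', 'TTOP']
print(longs)   # ['BBOT', 'IHS', 'RBOT', 'TBOT']
```

Each bearish pattern and its bullish mirror share the same magnitude (HS/IHS = 2, rectangles = 3, triangles = 4), so `abs(code)` recovers the pattern family and `sign(code)` the direction.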

class PatternFactor(CustomFactor):

    params = ('indentification_lag',)
    inputs = [USEquityPricing.close]
    window_length = 40

    def compute(self, today, assets, out, close, indentification_lag):
        prices = pd.DataFrame(close, columns=assets)
        out[:] = prices.apply(_pattern_identification, args=(indentification_lag,))

#https://www.quantopian.com/posts/new-feature-multiple-output-pipeline-custom-factors
#import pandas as pd
#import numpy as np
#df = pd.DataFrame((np.random.randn(40)).reshape(10,4))
#f = lambda x: pd.Series( {'mean':x.mean(), 'std':x.std()} )
#dfa = df.apply(f)
#dfa.loc['mean']
#dfa.loc['std']

def make_pipeline(context):
    """
    Create and return our pipeline.
    """
    pipe = Pipeline()

    #
    # Screen out penny stocks and low liquidity securities.
    #
    price = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=22)
    volume = SimpleMovingAverage(inputs=[USEquityPricing.volume], window_length=22)
    price_filter  = (price >= 5.0)
    volume_filter = (volume >= 200 * 60 * 6.30)

    # assumed definition: "dollar_volume" was not defined in the original snippet
    dollar_volume = AverageDollarVolume(window_length=22)
    dollar_volume_filter = dollar_volume.top(500)

    full_filter = dollar_volume_filter
    pipe.set_screen(full_filter)

    pattern = PatternFactor(mask=full_filter, window_length=42, indentification_lag=1)
    # the factor must be added to the pipeline: later code reads results['pattern']
    pipe.add(pattern, 'pattern')

    return pipe

# Put any initialization logic here. The context object will be passed to
# the other methods in your algorithm.
def initialize(context):

    #
    # Algo configuration
    #
    context.exposure = ExposureMngr(target_leverage=1.0,
                                    target_long_exposure_perc=0.50,
                                    target_short_exposure_perc=0.50)

    # As we expect to find events every day, we want to have some cash available
    # for trading every day. This variable puts a percentage limit on the cash we
    # can use each day, so that something is left for the following days.
    context.daily_cash_limit_perc = 0.80

    #
    # Algo internal state
    #
    context.universe = []
    context.shorts = pd.Series()
    context.longs  = pd.Series()
    context.positions_to_clear = {}
    context.position_expire = {}

    #
    # Algo logic starts
    #
    attach_pipeline(make_pipeline(context), 'factors')

    schedule_function(rebalance, date_rules.every_day(), time_rules.market_open())
    schedule_function(log_stats, date_rules.every_day(), time_rules.market_close())

# pipeline_output must be consumed here, not in initialize
def before_trading_start(context, data):
    # Compute final rank and assign long and short baskets.
    results = pipeline_output('factors')
    results = results.replace([np.inf, -np.inf], np.nan)
    results = results.dropna()

    print 'Basket of stocks %d / %d' % (len(results), len(pipeline_output('factors')))

    now = get_datetime()

    #
    # Fill context.positions_to_clear with positions that we need to exit;
    # the "rebalance" method will use that information to exit required positions
    #
    context.positions_to_clear = {}
    temporary_exclusions = []
    for sec, position in context.portfolio.positions.iteritems():
        temporary_exclusions.append(sec)
        if now >= context.position_expire.get(sec, now):
            context.positions_to_clear[sec] = position.amount

    # clear old entries
    for sec in context.position_expire.keys():
        if sec not in context.portfolio.positions:
            del context.position_expire[sec]

    # we don't want to enter positions that we already hold
    results = results.drop(temporary_exclusions, axis=0, errors='ignore')

    #
    # Now fill context.shorts and context.longs; the "rebalance" method will use
    # that information to enter required positions
    #
    patterns = [  # name, code, number of days to hold the positions
        ('HS'  , -2, 1),
        ('IHS' ,  2, 1),
        ('BTOP', -1, 1),
        ('BBOT',  1, 1),
        ('TTOP', -4, 1),
        ('TBOT',  4, 1),
        ('RTOP', -3, 1),
        #('RBOT',  3, 4),
    ]

    context.shorts = pd.Series()
    context.longs  = pd.Series()
    for name, code, holding_days in patterns:
        positions = results[results['pattern'] == code]['pattern']
        if len(positions) <= 0:
            continue
        if code < 0:
            context.shorts = context.shorts.append(positions)
        elif code > 0:
            context.longs = context.longs.append(positions)
        expire_date = now + datetime.timedelta(days=holding_days)
        for sec in positions.index:
            context.position_expire[sec] = expire_date

    print 'shorts (length %d):\n' % (len(context.shorts.index)), context.shorts
    print 'longs  (length %d):\n' % (len(context.longs.index)), context.longs

    context.universe = (context.longs.index | context.shorts.index)

def rebalance(context, data):

    #
    # calculate how much money we have for rebalancing today
    #
    context.exposure.update(context, data)
    available_cash = context.exposure.get_available_cash(context)
    log.debug('available_cash %f' % (available_cash))

    #
    # As we expect to find events every day, we want to have some cash available
    # for trading: we put a percentage limit on the cash we can use every day
    # so that something will be left available for the following days.
    #
    available_cash *= context.daily_cash_limit_perc

    log.debug('We will use cash %f: long %d sec, short %d sec' % (available_cash, len(context.longs.index), len(context.shorts.index)))

    #
    # Here we decide how much cash we want to assign to each security
    #
    num_signals = len(context.longs.index) + len(context.shorts.index)
    cash_per_sec = available_cash / num_signals if num_signals > 0 else 0.0
    # no more than 2000 anyway
    cash_per_sec = min(cash_per_sec, 2000)

    #
    # Enter new positions
    #
    for sec in context.longs.index:
        if cash_per_sec > 0 and data.can_trade(sec):
            order_value(sec, cash_per_sec)
            log.debug('long order %s amount %f' % (str(sec), cash_per_sec))

    for sec in context.shorts.index:
        if cash_per_sec > 0 and data.can_trade(sec):
            order_value(sec, -cash_per_sec)
            log.debug('short order %s amount %f' % (str(sec), cash_per_sec))

    #
    # Clear positions for this day
    #
    for sec in context.positions_to_clear:
        order_target(sec, 0)
        log.debug('clear positions for %s' % (str(sec)))

def log_stats(context, data):
    context.exposure.update(context, data)
    long_exposure_pct, short_exposure_pct = context.exposure.get_long_short_exposure_pct(context)
    record(lever=context.account.leverage,
           exposure=context.account.net_leverage,
           num_pos=len(context.portfolio.positions),
           long_signals=len(context.longs.index),
           short_signals=len(context.shorts.index))


Thanks all for contributing to this post but I have one question:
What look-ahead bias is being referred to? That is, isn't the period used for the kernel regression, plus the lag to observe the signal, roughly equal to the time that a trader would take to identify the various patterns?

The look-ahead bias is in Andrew's NB and in the event study NB (not in the algorithm and the factor tear sheet), and it is due to the kernel regression being run over the full time period in one shot. In reality you would have to run the regression every day instead, like a rolling kernel regression on the past 30 days of data or so. Why does this matter? Because the kernel regression doesn't give you good smoothed values at the end of a time series; there just isn't enough data to do it. This means that in reality you cannot trust the kernel regression in the proximity of the current date, as you run a regression every day, and the last local minimum/maximum needed to identify a pattern might become evident only some days later. By the time a pattern is identified there might still be some alpha left, or not, but to understand that we would have to rewrite the NB code so that it performs the pattern identification using a rolling kernel regression to simulate live trading behaviour; otherwise we cannot trust the results, as they are biased.
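The one-shot versus rolling distinction can be sketched with a causal smoother. In this hypothetical example (random-walk prices; an EWMA stands in for the kernel regression, as the original post suggested), the causal estimate at any date never changes once future data arrives, whereas a centered smoother necessarily looks ahead:

```python
import numpy as np
import pandas as pd

# Hypothetical price series (random walk), standing in for real daily prices.
rng = np.random.RandomState(42)
prices = pd.Series(100 + np.cumsum(rng.randn(60)))

# Look-ahead version: smoothing the whole history in one shot means the value
# at day t is influenced by days t+1, t+2, ... (a centered window here).
lookahead = prices.rolling(11, center=True).mean()

# Causal version: at each day, only data up to and including that day is used
# (an EWMA, suggested in the original post as an alternative to kernel regression).
causal = prices.ewm(span=11).mean()

# Because the causal estimate never uses future data, recomputing it after a new
# day arrives leaves all historical values unchanged -- no look-ahead bias.
extended = pd.concat([prices, pd.Series([999.0])], ignore_index=True)
causal_later = extended.ewm(span=11).mean()
print(np.allclose(causal.values, causal_later.values[:60]))  # True
```

A rolling kernel regression would behave like the causal version: each day's extrema come only from the trailing window, so a pattern can only be flagged once it has actually become visible.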

Thanks Luca... I guess I assumed that the NB was replicating the paper exactly. So since the authors were using rolling regressions, they didn't have the look-ahead bias that Andrew implied at the beginning of this post:

As I discuss in the notebook, one of my concerns with the author's methodology is the introduction of lookahead bias via the kernel regression...

Well, in his NB Andrew takes into account the detection lag when he calculates the returns. He assumes a 4-day identification lag, which decreases the hypothetical returns. But the NB might still be considering patterns that wouldn't have been recognized in real trading with a rolling kernel regression.
It's not that hard to modify the code so that it uses a rolling kernel regression though.

For all the experts on this forum: is the paper by Lo on this subject dated? (No disrespect to the paper intended.) Are there any best practices or standard libraries for identifying these basic technical patterns, given that most financial investment platforms seem to offer screening for them, such as finviz.com and Interactive Brokers(?). Thanks.

Hi Andrew, I am not able to run the notebook; it has problems at cell #9. The error is:

raise ValueError('Cannot index with multidimensional key')

KeyError Traceback (most recent call last)
in ()
25
26 plot_window(msft_prices, smooth_prices, local_max_dt, local_min_dt, price_local_max_dt,
---> 27 price_local_min_dt, pd.datetime(2005,2,19), pd.datetime(2006,2,26))
28 plot_window(msft_prices, smooth_prices, local_max_dt, local_min_dt, price_local_max_dt,
29 price_local_min_dt,pd.datetime(2007,9,1), pd.datetime(2008,9,1))

in plot_window(prices, smooth_prices, smooth_maxima_dt, smooth_minima_dt, price_maxima_dt, price_minima_dt, start, end, ax)
13 smooth_prices_.plot(ax=ax)
14
---> 15 smooth_max = smooth_prices_.loc[smooth_maxima_dt]
16 smooth_min = smooth_prices_.loc[smooth_minima_dt]
17 price_max = prices_.loc[price_maxima_dt]

/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in getitem(self, key) 1294 return self._getitem_tuple(key)
1295 else:
-> 1296 return self._getitem_axis(key, axis=0)
1297
1298 def _getitem_axis(self, key, axis=0):

/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in _getitem_axis(self, key, axis) 1454 raise ValueError('Cannot index with multidimensional key')
1455
-> 1456 return self._getitem_iterable(key, axis=axis)
1457
1458 # nested tuple slicing

/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in _getitem_iterable(self, key, axis) 1020 def _getitem_iterable(self, key, axis=0):
1021 if self._should_validate_iterable(axis):
-> 1022 self._has_valid_type(key, axis)
1023
1024 labels = self.obj._get_axis(axis)

/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in _has_valid_type(self, key, axis) 1377
1378 raise KeyError("None of [%s] are in the [%s]" %
-> 1379 (key, self.obj._get_axis_name(axis)))
1380
1381 return True

KeyError: "None of [['2003-01-13T00:00:00.000000000' '2003-02-20T00:00:00.000000000'\n '2003-03-20T00:00:00.000000000' '2003-04-04T00:00:00.000000000'\n '2003-05-08T00:00:00.000000000' '2003-06-02T00:00:00.000000000'\n '2003-06-19T00:00:00.000000000' '2003-07-09T00:00:00.000000000'\n '2003-07-28T00:00:00.000000000' '2003-09-08T00:00:00.000000000'\n '2003-09-19T00:00:00.000000000' '2003-10-07T00:00:00.000000000'\n '2003-10-17T00:00:00.000000000' '2003-12-19T0

Amir Vahid,
Change the code

local_max_dt = smooth_prices.iloc[local_max].index.values
local_min_dt = smooth_prices.iloc[local_min].index.values


to

local_max_dt = smooth_prices.iloc[local_max].index
local_min_dt = smooth_prices.iloc[local_min].index


Here is the Alphalens event study NB

EDIT: cleaned up the NB a bit


Luca: This is excellent! A couple of observations:

• Really cool to also see what the stock is doing before the event which gives one a visual on what the technical analysis is latching onto
• Holy cow, some of these seem to actually do something meaningful
• Interesting to see how some that are bearish (RTOP and TTOP) seem to have the right negative short-term alpha but then long-term positive alpha (thanks for including 11D, as one can see that that's where a lot of the long-term alpha decays again).

Have you attempted any type of signal combination?

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

@Thomas, my old algorithm is a few posts above, but please note that it's not straightforward to decide how to best trade all those signals together. Depending on the detected pattern, we want to hold the securities for different numbers of days. Also, since some of the patterns perform best after more than one day, when we detect some signals we don't know whether it is better to trade all available cash on those or to keep some cash for the following days. Either we risk an underleveraged portfolio, or we don't trade all the signals because we used all the cash. The trading logic is quite complex.

Please install fastdtw on the research notebook.

Out of curiosity, I tried the TBOT signal alone

from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline import Pipeline
from quantopian.pipeline import factors, filters, classifiers
from quantopian.pipeline.factors import CustomFactor, SimpleMovingAverage, AverageDollarVolume, Returns
from quantopian.pipeline.filters import StaticAssets, Q500US, Q1500US, Q3000US, QTradableStocksUS
from quantopian.pipeline.filters.fundamentals import IsPrimaryShare
from quantopian.pipeline.classifiers.fundamentals import Sector
from quantopian.pipeline.data.builtin import USEquityPricing

import pandas as pd
import numpy as np
import datetime

import scipy.stats as stats
from statsmodels.nonparametric.kernel_regression import KernelReg
from scipy.signal import argrelextrema
from numpy import linspace
from collections import defaultdict

##################################   PatternFactor   ###################################################

def find_max_min(prices):
    prices_ = prices.copy()
    prices_.index = linspace(1., len(prices_), len(prices_))
    kr = KernelReg([prices_.values], [prices_.index.values], var_type='c', bw=[1.8, 1])
    f = kr.fit([prices_.index.values])
    smooth_prices = pd.Series(data=f[0], index=prices.index)

    local_max = argrelextrema(smooth_prices.values, np.greater)[0]
    local_min = argrelextrema(smooth_prices.values, np.less)[0]

    price_local_max_dt = []
    for i in local_max:
        if (i > 1) and (i < len(prices) - 1):
            price_local_max_dt.append(prices.iloc[i - 2:i + 2].argmax())

    price_local_min_dt = []
    for i in local_min:
        if (i > 1) and (i < len(prices) - 1):
            price_local_min_dt.append(prices.iloc[i - 2:i + 2].argmin())

    prices.name = 'price'
    maxima = pd.DataFrame(prices.loc[price_local_max_dt])
    minima = pd.DataFrame(prices.loc[price_local_min_dt])
    max_min = pd.concat([maxima, minima]).sort_index()
    max_min.index.name = 'date'
    max_min = max_min.reset_index()
    max_min = max_min[~max_min.date.duplicated()]
    p = prices.reset_index()
    max_min['day_num'] = p[p['index'].isin(max_min.date)].index.values
    max_min = max_min.set_index('day_num').price

    return max_min

def find_patterns(max_min):
    patterns = defaultdict(list)

    for i in range(5, len(max_min) + 1):
        window = max_min.iloc[i - 5:i]

        # pattern must play out in less than 36 days
        if window.index[-1] - window.index[0] > 35:
            continue

        # Using the notation from the paper to avoid mistakes
        e1 = window.iloc[0]
        e2 = window.iloc[1]
        e3 = window.iloc[2]
        e4 = window.iloc[3]
        e5 = window.iloc[4]

        rtop_g1 = np.mean([e1, e3, e5])
        rtop_g2 = np.mean([e2, e4])

        # Head and Shoulders
        if (e1 > e2) and (e3 > e1) and (e3 > e5) and \
           (abs(e1 - e5) <= 0.03 * np.mean([e1, e5])) and \
           (abs(e2 - e4) <= 0.03 * np.mean([e1, e5])):
            patterns['HS'].append((window.index[0], window.index[-1]))

        # Inverse Head and Shoulders
        elif (e1 < e2) and (e3 < e1) and (e3 < e5) and \
             (abs(e1 - e5) <= 0.03 * np.mean([e1, e5])) and \
             (abs(e2 - e4) <= 0.03 * np.mean([e1, e5])):
            patterns['IHS'].append((window.index[0], window.index[-1]))

        # Broadening Top
        elif (e1 > e2) and (e1 < e3) and (e3 < e5) and (e2 > e4):
            patterns['BTOP'].append((window.index[0], window.index[-1]))

        # Broadening Bottom
        elif (e1 < e2) and (e1 > e3) and (e3 > e5) and (e2 < e4):
            patterns['BBOT'].append((window.index[0], window.index[-1]))

        # Triangle Top
        elif (e1 > e2) and (e1 > e3) and (e3 > e5) and (e2 < e4):
            patterns['TTOP'].append((window.index[0], window.index[-1]))

        # Triangle Bottom
        elif (e1 < e2) and (e1 < e3) and (e3 < e5) and (e2 > e4):
            patterns['TBOT'].append((window.index[0], window.index[-1]))

        # Rectangle Top
        elif (e1 > e2) and (abs(e1 - rtop_g1) / rtop_g1 < 0.0075) and \
             (abs(e3 - rtop_g1) / rtop_g1 < 0.0075) and (abs(e5 - rtop_g1) / rtop_g1 < 0.0075) and \
             (abs(e2 - rtop_g2) / rtop_g2 < 0.0075) and (abs(e4 - rtop_g2) / rtop_g2 < 0.0075) and \
             (min(e1, e3, e5) > max(e2, e4)):
            patterns['RTOP'].append((window.index[0], window.index[-1]))

        # Rectangle Bottom: the lowest top must sit above the highest bottom
        elif (e1 < e2) and (abs(e1 - rtop_g1) / rtop_g1 < 0.0075) and \
             (abs(e3 - rtop_g1) / rtop_g1 < 0.0075) and (abs(e5 - rtop_g1) / rtop_g1 < 0.0075) and \
             (abs(e2 - rtop_g2) / rtop_g2 < 0.0075) and (abs(e4 - rtop_g2) / rtop_g2 < 0.0075) and \
             (min(e2, e4) > max(e1, e3, e5)):
            patterns['RBOT'].append((window.index[0], window.index[-1]))

    return patterns

def _pattern_identification(prices, indentification_lag):
    max_min = find_max_min(prices)

    # we are only interested in the last pattern (if multiple patterns are there)
    # and the last min/max must have happened no more than "indentification_lag"
    # days ago; otherwise it must have already been identified, or it is too late
    # to be useful
    max_min_last_window = None

    for i in reversed(range(len(max_min))):
        days_ago = prices.index[-1] - max_min.index[i]
        if days_ago <= indentification_lag:
            max_min_last_window = max_min.iloc[i - 4:i + 1]
            break

    if max_min_last_window is None:
        return np.nan

    # possibly identify a pattern in the selected window
    patterns = find_patterns(max_min_last_window)
    if len(patterns) != 1:
        return np.nan

    name, start_end_day_nums = patterns.iteritems().next()

    pattern_code = {
        'HS'   : 20,
        'IHS'  :  2,
        'BTOP' : 10,
        'BBOT' :  1,
        'TTOP' : 40,
        'TBOT' :  4,
        'RTOP' : 30,
        'RBOT' :  3,
    }

    return pattern_code[name]

class PatternFactor(CustomFactor):

    params = ('indentification_lag',)
    inputs = [USEquityPricing.close]
    window_length = 40

    def compute(self, today, assets, out, close, indentification_lag):
        prices = pd.DataFrame(close, columns=assets)
        out[:] = prices.apply(_pattern_identification, args=(indentification_lag,))

##################################################################################################

def make_pipeline(context):
    """
    Create and return our pipeline.
    """
    pipe = Pipeline()
    universe = Q500US()
    # assumed: the original snippet omitted the factor, but later code reads results['pattern']
    pattern = PatternFactor(mask=universe, window_length=42, indentification_lag=1)
    pipe.add(pattern, 'pattern')
    pipe.set_screen(universe)
    return pipe

# Put any initialization logic here. The context object will be passed to
# the other methods in your algorithm.
def initialize(context):
    set_slippage(slippage.VolumeShareSlippage(volume_limit=0.025, price_impact=0.0))

    attach_pipeline(make_pipeline(context), 'factors')

    schedule_function(rebalance, date_rules.every_day(), time_rules.market_open())
    schedule_function(log_stats, date_rules.every_day(), time_rules.market_close())

# pipeline_output must be consumed here, not in initialize
def before_trading_start(context, data):
    # Compute final rank and assign long and short baskets.
    results = pipeline_output('factors')
    results = results.replace([np.inf, -np.inf], np.nan)
    results = results.dropna()

    print 'Basket of stocks %d' % (len(results))

    #
    # Now fill context.shorts and context.longs; the "rebalance" method will use
    # that information to enter required positions
    #
    patterns = [  # name, code
        #('HS'  , -2),
        #('IHS' ,  2),
        #('BTOP', -1),
        #('BBOT',  1),
        #('TTOP', -4),
        ('TBOT',  4),
        #('RTOP', -3),
        #('RBOT',  3),
    ]

    context.shorts = pd.Series()
    context.longs  = pd.Series()
    for name, code in patterns:
        positions = results[results['pattern'] == code]['pattern']
        if len(positions) <= 0:
            continue
        if code < 0:
            context.shorts = context.shorts.append(positions)
        elif code > 0:
            context.longs = context.longs.append(positions)

    print 'shorts (length %d):\n' % (len(context.shorts.index)), context.shorts
    print 'longs  (length %d):\n' % (len(context.longs.index)), context.longs

def rebalance(context, data):

    tot_pos = context.longs.size + context.shorts.size
    if tot_pos > 0:
        context.longs[:]  = 1. / tot_pos
        context.shorts[:] = 1. / tot_pos

    for security in context.shorts.index:
        if get_open_orders(security):
            continue
        order_target_percent(security, -context.shorts[security])

    for security in context.longs.index:
        if get_open_orders(security):
            continue
        order_target_percent(security, context.longs[security])

    for security in context.portfolio.positions:
        if get_open_orders(security):
            continue
        if data.can_trade(security) and security not in (context.longs.index | context.shorts.index):
            order_target_percent(security, 0)

def log_stats(context, data):
    record(lever=context.account.leverage,
           exposure=context.account.net_leverage,
           num_pos=len(context.portfolio.positions),
           long_signals=len(context.longs.index),
           short_signals=len(context.shorts.index))

Guys, thank you so much for your effort.

Did you test ta-lib candlestick patterns as well?

Thank you all for so much effort. Such interesting work.

@Luca
I am trying to play around with your NB (cleaned-up version), just trying to analyze the alphalens output with 4 quantiles and change the universe to 'QTradableStocksUS'. But the algo seems to be unending!!

I ran it for more than 4 hours and tried to do so 5 times, but I can't see an end to it.
Was it taking that long when you ran it?

NB: kept the other variables identical:
factor_name = 'factor'

start_date = '2014-08-01'
end_date = '2016-08-01'
filter_universe = True # very slow, filter out untradable stocks
show_sector_plots = False # very slow to load the sector column in pipeline

# alphalens specific

periods = (1, 2, 3, 4, 5, 6, 10)
quantiles = 4
bins = None
avgretplot = (10, 25) # use None to avoid plotting or (days_before, days_after)
filter_zscore = None
long_short = False

prices_cache = None # this saves lots of time when running tear sheet multiple times

@Moussa,

You may be running into too much memory utilisation?

Try clearing memory by killing all active NBs. Then maybe try a shorter time period, and/or fewer periods (e.g. only 1, 5, 10, perhaps, before homing in?).

@Joakim

Thanks a lot for taking the time to help.

I did kill all active NBs; it was the only one running. However, I will try fewer periods, though @Luca did succeed with 10 periods, so I do not understand.
Even when I try to run the NB with no changes, the same thing happens.

Did you succeed in running it yourself?

Having the same problems. It will run for about 1% of the total backtest, and then it just loads forever and never gets anywhere. Based on the time that it took to get to 1% completion, I can only imagine that it must've taken several days for @Luca to run the algorithm.

Wondering if anyone has been able to put the patterns into Custom Factors?

Thank you all very much for your contributions. It seems like the long strategies are generally more profitable than the short ones. I wonder if this is because, in the long term, the stock market generally goes up?