Quantopian Lecture Series: This Time You're More Wrong

You’ve constructed a model and are getting significant p-values. Everything looks good, but then your algorithm starts losing money. Sometimes the ways that models are constructed can lead to a complete breakdown in the statistics used to evaluate them. We will show some common cases of this and discuss warning signs.

The lecture will be presented at this meetup. We will be releasing a video lecture as well; watch this thread for a link.

Also in this lecture:

This is part of Quantopian’s Summer Lecture Series. We are currently developing a quant finance curriculum and will be releasing clone-able notebooks and algorithms to teach key concepts. Stay tuned for more. We are also working on a permanent home for all of our notebooks.

Credit for the notebooks goes to Evgenia 'Jenny' Nitishinskaya, and credit for the algorithms goes to David Edwards.

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

26 responses

Since this week's topic is more about what not to do, I decided to create an algorithm in response to a question during the last lecture on Kalman filtering.

I was asked, "Where would you go next with this algorithm?" My response was that I would factor the entire thing into a class, so that I could create multiple instances of it and trade several pairs at once. That's what I have done here: this algorithm trades six pairs simultaneously, using a class nearly identical to the Kalman filtering algo from last time. The pairs are not special; I just selected two companies from the same industry for each of six industries.

Please "clone and tweak" this algorithm. It is only meant to be scaffolding for you to build upon; you still need to add your own twist.

"""

Author: David Edwards

This algorithm extends the Kalman Filtering pairs trading algorithm from a
previous lecture to support multiple pairs. In order to extend the idea,
the previous algorithm was factored into a class so several instances can be
created with different assets.

This algorithm was developed by David Edwards as part of
Quantopian's 2015 summer lecture series. Please direct any
questions, feedback, or corrections to [email protected]
"""

import numpy as np
import pandas as pd
from pykalman import KalmanFilter
import statsmodels.api as sm

def initialize(context):
    # Quantopian backtester specific variables
    set_symbol_lookup_date('2014-01-01')

    context.pairs = [
        # NOTE: the six constructor calls were truncated when this post was
        # archived. Each entry built a KalmanPairTrade from two same-industry
        # symbols, along the lines of:
        #   KalmanPairTrade(symbol('...'), symbol('...'),
        #                   initial_bars=300, freq='1m', delta=1e-3, maxlen=300),
    ]

    weight = 1.8 / len(context.pairs)
    for pair in context.pairs:
        pair.leverage = weight

    # Run each pair's trading logic every 90 minutes, starting 10 minutes
    # after the open. (The schedule_function call was truncated in the
    # archived post; reconstructed here.)
    for minute in range(10, 390, 90):
        for pair in context.pairs:
            schedule_function(pair.trading_logic,
                              time_rule=time_rules.market_open(minutes=minute))

class KalmanPairTrade(object):
    # The class statement was lost in the archived post; the name is
    # reconstructed.

    def __init__(self, y, x, leverage=1.0, initial_bars=10,
                 freq='1d', delta=1e-3, maxlen=3000):
        self._y = y
        self._x = x
        self.maxlen = maxlen
        self.initial_bars = initial_bars
        self.freq = freq
        self.delta = delta
        self.leverage = leverage
        self.Y = KalmanMovingAverage(self._y, maxlen=self.maxlen)
        self.X = KalmanMovingAverage(self._x, maxlen=self.maxlen)
        self.kf = None
        self.entry_dt = pd.Timestamp('1900-01-01', tz='utc')

    @property
    def name(self):
        return "{}~{}".format(self._y.symbol, self._x.symbol)

    def trading_logic(self, context, data):
        # The method signature and the z-score computation were lost in the
        # archived post; both are reconstructed here from context.
        try:
            if self.kf is None:
                self.initialize_filters(context, data)
                return
            self.update()
            if get_open_orders(sid=self._x) or get_open_orders(sid=self._y):
                return
            spreads = self.mean_spread()
            zscore = (spreads[-1] - spreads.mean()) / spreads.std()

            reference_pos = context.portfolio.positions[self._y].amount

            now = get_datetime()
            if reference_pos:
                if (now - self.entry_dt).days > 20:
                    order_target(self._y, 0.0)
                    order_target(self._x, 0.0)
                    return
                # Do a PNL check to make sure a reversion at least covered trading costs
                # I do this because parameter drift often causes trades to be exited
                # before the original spread has become profitable.
                pnl = self.get_pnl(context, data)
                if zscore > -0.0 and reference_pos > 0 and pnl > 0:
                    order_target(self._y, 0.0)
                    order_target(self._x, 0.0)
                elif zscore < 0.0 and reference_pos < 0 and pnl > 0:
                    order_target(self._y, 0.0)
                    order_target(self._x, 0.0)
            else:
                if zscore > 1.5:
                    order_target_percent(self._y, -self.leverage / 2.)
                    order_target_percent(self._x, self.leverage / 2.)
                    self.entry_dt = now
                if zscore < -1.5:
                    order_target_percent(self._y, self.leverage / 2.)
                    order_target_percent(self._x, -self.leverage / 2.)
                    self.entry_dt = now
        except Exception as e:
            log.debug("[{}] {}".format(self.name, str(e)))

    def update(self):
        prices = np.log(history(bar_count=1, frequency='1m', field='price'))
        self.X.update(prices)
        self.Y.update(prices)
        self.kf.update(self.means_frame().iloc[-1])

    def mean_spread(self):
        # Method signature lost in the archived post; name reconstructed.
        # Computes the spread between the two filtered price series using
        # the current regression state.
        means = self.means_frame()
        beta, alpha = self.kf.state_mean
        return means[self._y] - (beta * means[self._x] + alpha)

    def means_frame(self):
        mu_Y = self.Y.state_means
        mu_X = self.X.state_means
        return pd.DataFrame([mu_Y, mu_X]).T

    def initialize_filters(self, context, data):
        prices = np.log(history(self.initial_bars, self.freq, 'price'))
        self.X.update(prices)
        self.Y.update(prices)

        # Drops the initial 0 mean value from the kalman filter
        self.X.state_means = self.X.state_means.iloc[-self.initial_bars:]
        self.Y.state_means = self.Y.state_means.iloc[-self.initial_bars:]
        self.kf = KalmanRegression(self.Y.state_means, self.X.state_means,
                                   delta=self.delta, maxlen=self.maxlen)

    def get_pnl(self, context, data):
        x = self._x
        y = self._y
        prices = history(1, '1d', 'price').iloc[-1]
        positions = context.portfolio.positions
        dx = prices[x] - positions[x].cost_basis
        dy = prices[y] - positions[y].cost_basis
        return (positions[x].amount * dx +
                positions[y].amount * dy)

def handle_data(context, data):
    record(market_exposure=context.account.net_leverage,
           leverage=context.account.leverage)

class KalmanMovingAverage(object):
    """
    Estimates the moving average of a price process
    via Kalman Filtering.

    See http://pykalman.github.io/ for docs on the
    filtering process.
    """

    def __init__(self, asset, observation_covariance=1.0, initial_value=0,
                 initial_state_covariance=1.0, transition_covariance=0.05,
                 initial_window=20, maxlen=3000, freq='1d'):

        self.asset = asset
        self.freq = freq
        self.initial_window = initial_window
        self.maxlen = maxlen
        self.kf = KalmanFilter(transition_matrices=[1],
                               observation_matrices=[1],
                               initial_state_mean=initial_value,
                               initial_state_covariance=initial_state_covariance,
                               observation_covariance=observation_covariance,
                               transition_covariance=transition_covariance)
        self.state_means = pd.Series([self.kf.initial_state_mean], name=self.asset)
        self.state_vars = pd.Series([self.kf.initial_state_covariance], name=self.asset)

    def update(self, observations):
        for dt, observation in observations[self.asset].iterkv():
            self._update(dt, observation)

    def _update(self, dt, observation):
        mu, cov = self.kf.filter_update(self.state_means.iloc[-1],
                                        self.state_vars.iloc[-1],
                                        observation)
        self.state_means[dt] = mu.flatten()[0]
        self.state_vars[dt] = cov.flatten()[0]
        if self.state_means.shape[0] > self.maxlen:
            self.state_means = self.state_means.iloc[-self.maxlen:]
        if self.state_vars.shape[0] > self.maxlen:
            self.state_vars = self.state_vars.iloc[-self.maxlen:]

class KalmanRegression(object):
    """
    Uses a Kalman Filter to estimate regression parameters
    in an online fashion.

    Estimated model: y ~ beta * x + alpha
    """

    def __init__(self, initial_y, initial_x, delta=1e-5, maxlen=3000):
        self._x = initial_x.name
        self._y = initial_y.name
        self.maxlen = maxlen
        trans_cov = delta / (1 - delta) * np.eye(2)
        obs_mat = np.expand_dims(
            np.vstack([[initial_x], [np.ones(initial_x.shape[0])]]).T, axis=1)

        self.kf = KalmanFilter(n_dim_obs=1, n_dim_state=2,
                               initial_state_mean=np.zeros(2),
                               initial_state_covariance=np.ones((2, 2)),
                               transition_matrices=np.eye(2),
                               observation_matrices=obs_mat,
                               observation_covariance=1.0,
                               transition_covariance=trans_cov)
        state_means, state_covs = self.kf.filter(initial_y.values)
        self.means = pd.DataFrame(state_means,
                                  index=initial_y.index,
                                  columns=['beta', 'alpha'])
        self.state_cov = state_covs[-1]

    def update(self, observations):
        x = observations[self._x]
        y = observations[self._y]
        mu, self.state_cov = self.kf.filter_update(
            self.state_mean, self.state_cov, y,
            observation_matrix=np.array([[x, 1.0]]))
        mu = pd.Series(mu, index=['beta', 'alpha'],
                       name=observations.name)
        self.means = self.means.append(mu)
        if self.means.shape[0] > self.maxlen:
            self.means = self.means.iloc[-self.maxlen:]

    def get_spread(self, observations):
        # Method signature lost in the archived post; name reconstructed.
        x = observations[self._x]
        y = observations[self._y]
        return y - (self.means.beta[-1] * x + self.means.alpha[-1])

    @property
    def state_mean(self):
        return self.means.iloc[-1]


There was a runtime error.

Love this series, guys! I always enjoy looking at your code, David; it's very clean and well structured.

Thanks Ted, I appreciate the positive feedback!

The recordings from this lecture are ready. You can view the full lecture or check out the 3 separate topics below:


Does anyone understand the relationship between autocorrelation and non-stationarity?

Is it necessary to use both sm.tsa.stattools.acf and sm.tsa.stattools.adfuller?

Great link, Jonathan. I found it myself while researching the topic. Wikipedia, as usual, also has a good explanation of non-stationarity. The way I think about it is that non-stationarity is describing whether or not a data generating process changes over time. Autocorrelation is describing one aspect of how that change occurs. This is a very simplified and hand-wavy definition of course, so I hope I haven't confused anyone further.

One thing to note is that many series in finance are autocorrelated. Prices and returns, especially, are influenced by recent prices and returns. It's important to at least check how autocorrelated a series is before trying to model future values.

David's multiple-pairs Kalman-filter-based algo deserves more attention; this is good stuff.

I agree, Simon. I liked making this one; there are a few gems in it from an implementation perspective. The pattern of factoring a strategy into a class and creating instances of it is really useful. I find it easier to track several things when each instance is doing its own bookkeeping.

Kalman filtering is pretty cool. I hadn't really played with the idea much before the lecture series, but I've been looking for excuses to use it since.
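As a rough illustration of the pattern David describes (the names here are hypothetical, not taken from the algorithm above): each strategy instance owns its own parameters and bookkeeping, and the driver just loops over a list of instances.

```python
class PairStrategy:
    """One independently bookkept pair; a sketch of the class-per-strategy pattern."""

    def __init__(self, y, x, leverage=1.0):
        self.y, self.x = y, x
        self.leverage = leverage
        self.last = None  # per-instance state, tracked separately for each pair

    def on_bar(self, prices):
        # Per-pair trading logic would go here; this sketch just records
        # the latest prices of this instance's own two assets.
        self.last = (prices.get(self.y), prices.get(self.x))

# The driver treats every pair uniformly, regardless of how many there are:
pairs = [PairStrategy("F", "GM"), PairStrategy("XOM", "CVX")]
for p in pairs:
    p.on_bar({"F": 11.0, "GM": 31.0, "XOM": 80.0, "CVX": 90.0})
```

Because each instance keeps its own entry dates, filters, and positions, adding a seventh pair is just one more element in the list.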

Thanks for sharing that algo.

Looking over the source code, isn't it overkill to feed in filtered average prices into the Kalman filter?

Can't you replicate the same thing by just adjusting the observation_covariance?
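For intuition on that question, here is a hand-rolled scalar version of the same identity-state Kalman filter (a sketch, not the pykalman implementation): observation_covariance alone already controls how heavily the filtered mean smooths the raw prices, which is the basis of the question above.

```python
import numpy as np

def kalman_ma(prices, obs_cov, trans_cov=0.05):
    """Scalar Kalman filter with identity transition and observation,
    i.e. the same model KalmanMovingAverage sets up via pykalman."""
    mu, var = prices[0], 1.0
    out = []
    for p in prices:
        var = var + trans_cov          # predict step
        k = var / (var + obs_cov)      # Kalman gain
        mu = mu + k * (p - mu)         # update toward the new observation
        var = (1.0 - k) * var
        out.append(mu)
    return np.array(out)

prices = 100.0 + np.cumsum(np.random.default_rng(1).normal(size=200))
light = kalman_ma(prices, obs_cov=1.0)    # trusts observations more
heavy = kalman_ma(prices, obs_cov=10.0)   # larger obs_cov -> heavier smoothing
```

A larger observation covariance shrinks the gain, so the filtered mean moves less per bar; feeding pre-smoothed prices into the regression filter may therefore overlap with simply raising its observation_covariance.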

Anybody looking for a discussion purely about Kalman filters and their usage should see here. Also, the original Kalman filters lecture thread, available as part of the Quantopian Lecture Series.

www.quantopian.com/lectures

Hi Johnathan,
It could very well be overkill; I'm not sure, really. This algo was a demo of the concepts in the Kalman filtering lecture, which covered using the filter for both moving averages and regression. I wanted to make sure there was code for both of those concepts in there, so I filtered the prices as well.

David

As an example for Kalman filters, that makes sense. Thanks!

Hello, I've just cloned David's algorithm above, but it's not working anymore, I guess because of the new API. I've tried to change it so that it works again: I changed sid to asset in line 82. The algorithm now runs again, but the results are different (I got a max drawdown of 8.8%). Any idea how to fix it?

[Attached backtest: the same algorithm as above, with get_open_orders(sid=...) changed to get_open_orders(asset=...).]

I think that some strategies will have slightly different results under the new API, and it's actually more accurate. We fixed a bunch of bugs and infidelities with the upgrade, so the new results should be treated as the correct ones now. We pulled the Kalman algorithm from the lectures page for now, as we're in the middle of building a ton of other lecture content. At some point we may upload a new version, but yours seems to be working for now.

...we're in the middle of building a ton of other lecture content...

Good news! They are excellent, I enjoyed all of them. Thanks.

I really like the economic thesis behind trading pairs, so I cloned this algorithm and have been experimenting with finding additional pairs, tweaking the algorithm, etc. In doing so, I uncovered something that doesn't seem quite right to me, and I would like to get the forum's thoughts on it. I was running the algorithm with just one single pair to see results, and I realized that if I switched the order of the securities (put security A in the _x position and security B in the _y position, then run again with A in the _y position and B in the _x position), the results are different.

From my understanding of the "idea" behind this algorithm, I believe the results should be the same regardless of how the two securities in the pair are ordered. I'm not quite at the level yet where I am able to debug why something like this would be happening. Would anyone care to take a look and see if they can explain why this happens, or help change the algorithm so it no longer does? Also, despite several attempts, I have had trouble updating the algorithm to remove the three instances of deprecated code.

I will attempt to share 2 backtests that are identical except for the ordering of the pairs, and you can see that the outcomes are different.

Thanks for anyone who might be able to give this a shot,

Joseph
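One likely source of the asymmetry (a hedged guess, illustrated with plain numpy rather than the algorithm's Kalman regression): regressing y on x and x on y minimize errors in different directions, so the hedge ratio from one ordering is not the reciprocal of the other, and the resulting spreads and trade signals differ.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=500)
y = 0.8 * x + rng.normal(size=500)  # noisy linear relationship

def ols_beta(dep, indep):
    # Slope of an ordinary least squares fit: dep ~ indep.
    slope, _intercept = np.polyfit(indep, dep, 1)
    return slope

b_yx = ols_beta(y, x)   # hedge ratio with x as the "x leg"
b_xy = ols_beta(x, y)   # hedge ratio with the legs swapped

# OLS minimizes vertical errors only, so b_yx != 1 / b_xy whenever the
# fit is noisy; the two orderings give different spreads and signals.
print(b_yx, 1.0 / b_xy)
```

The Kalman regression in the algorithm has the same directional structure (it models y ~ beta * x + alpha), so swapping which security plays y and which plays x changes the estimated spread and hence the trades.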

Follow-up: I give up for now, but will try to attach the backtests again later. Take my word for it: clone the earlier backtest, pick one pair of securities, run it, then reverse the order of the securities and run it again. Sorry I can't get the backtests to attach right now!

Joseph

Here is one with F and GM...

[Attached backtest: the algorithm above reduced to a single pair, F and GM.]

And here is another one with F and GM, just in the reverse order, which I would have thought wouldn't have changed the overall results.

"""
Author: David Edwards

This algorithm extends the Kalman Filtering pairs trading algorithm from a
previous lecture to support multiple pairs. In order to extend the idea,
the previous algorithm was factored into a class so several instances can be
created with different assets.

This algorithm was developed by David Edwards as part of
Quantopian's 2015 summer lecture series. Please direct any
questions, feedback, or corrections to [email protected]
"""

import numpy as np
import pandas as pd
from pykalman import KalmanFilter
import statsmodels.api as sm


def initialize(context):
    # Quantopian backtester specific variables
    set_symbol_lookup_date('2014-01-01')

    context.pairs = [
        # The pair constructions were truncated when this code was copied
        # into the thread; each entry constructed one pair object, e.g.
        # KalmanPairTrade(symbol(...), symbol(...),
        #                 initial_bars=300, freq='1m', delta=1e-3, maxlen=300),
    ]

    weight = 1.8 / len(context.pairs)
    for pair in context.pairs:
        pair.leverage = weight

    for minute in range(10, 390, 90):
        for pair in context.pairs:
            # The first line of this call was lost in the forum copy;
            # reconstructed here.
            schedule_function(pair.trading_logic,
                              time_rule=time_rules.market_open(minutes=minute))


# NOTE: the class statement and some method signatures below were lost when
# this code was pasted into the thread; reconstructions are marked as such.
class KalmanPairTrade(object):  # reconstructed class name

    def __init__(self, y, x, leverage=1.0, initial_bars=10,
                 freq='1d', delta=1e-3, maxlen=3000):
        self._y = y
        self._x = x
        self.maxlen = maxlen
        self.initial_bars = initial_bars
        self.freq = freq
        self.delta = delta
        self.leverage = leverage
        self.Y = KalmanMovingAverage(self._y, maxlen=self.maxlen)
        self.X = KalmanMovingAverage(self._x, maxlen=self.maxlen)
        self.kf = None
        self.entry_dt = pd.Timestamp('1900-01-01', tz='utc')

    @property
    def name(self):
        return "{}~{}".format(self._y.symbol, self._x.symbol)

    def trading_logic(self, context, data):  # reconstructed signature
        try:
            if self.kf is None:
                self.initialize_filters(context, data)
                return
            self.update()
            if get_open_orders(asset=self._x) or get_open_orders(asset=self._y):
                return

            # The z-score computation was lost in the forum copy; it was
            # derived from the filtered spread, e.g.:
            # spreads = self.mean_spread()
            # zscore = (spreads[-1] - spreads.mean()) / spreads.std()

            reference_pos = context.portfolio.positions[self._y].amount

            now = get_datetime()
            if reference_pos:
                if (now - self.entry_dt).days > 20:
                    order_target(self._y, 0.0)
                    order_target(self._x, 0.0)
                    return
                # Do a PNL check to make sure a reversion at least covered
                # trading costs. I do this because parameter drift often
                # causes trades to be exited before the original spread has
                # become profitable.
                pnl = self.get_pnl(context, data)
                if zscore > -0.0 and reference_pos > 0 and pnl > 0:
                    order_target(self._y, 0.0)
                    order_target(self._x, 0.0)

                elif zscore < 0.0 and reference_pos < 0 and pnl > 0:
                    order_target(self._y, 0.0)
                    order_target(self._x, 0.0)

            else:
                if zscore > 1.5:
                    order_target_percent(self._y, -self.leverage / 2.)
                    order_target_percent(self._x, self.leverage / 2.)
                    self.entry_dt = now
                if zscore < -1.5:
                    order_target_percent(self._y, self.leverage / 2.)
                    order_target_percent(self._x, -self.leverage / 2.)
                    self.entry_dt = now
        except Exception as e:
            log.debug("[{}] {}".format(self.name, str(e)))

    def update(self):
        prices = np.log(history(bar_count=1, frequency='1m', field='price'))
        self.X.update(prices)
        self.Y.update(prices)
        self.kf.update(self.means_frame().iloc[-1])

    def mean_spread(self):  # reconstructed signature
        means = self.means_frame()
        beta, alpha = self.kf.state_mean
        return means[self._y] - (beta * means[self._x] + alpha)

    def means_frame(self):
        mu_Y = self.Y.state_means
        mu_X = self.X.state_means
        return pd.DataFrame([mu_Y, mu_X]).T

    def initialize_filters(self, context, data):
        prices = np.log(history(self.initial_bars, self.freq, 'price'))
        self.X.update(prices)
        self.Y.update(prices)

        # Drop the initial 0 mean value from the Kalman filter
        self.X.state_means = self.X.state_means.iloc[-self.initial_bars:]
        self.Y.state_means = self.Y.state_means.iloc[-self.initial_bars:]
        self.kf = KalmanRegression(self.Y.state_means, self.X.state_means,
                                   delta=self.delta, maxlen=self.maxlen)

    def get_pnl(self, context, data):
        x = self._x
        y = self._y
        prices = history(1, '1d', 'price').iloc[-1]
        positions = context.portfolio.positions
        dx = prices[x] - positions[x].cost_basis
        dy = prices[y] - positions[y].cost_basis
        return (positions[x].amount * dx +
                positions[y].amount * dy)


def handle_data(context, data):
    record(market_exposure=context.account.net_leverage,
           leverage=context.account.leverage)


class KalmanMovingAverage(object):
    """
    Estimates the moving average of a price process
    via Kalman Filtering.

    See http://pykalman.github.io/ for docs on the
    filtering process.
    """

    def __init__(self, asset, observation_covariance=1.0, initial_value=0,
                 initial_state_covariance=1.0, transition_covariance=0.05,
                 initial_window=20, maxlen=3000, freq='1d'):

        self.asset = asset
        self.freq = freq
        self.initial_window = initial_window
        self.maxlen = maxlen
        self.kf = KalmanFilter(transition_matrices=[1],
                               observation_matrices=[1],
                               initial_state_mean=initial_value,
                               initial_state_covariance=initial_state_covariance,
                               observation_covariance=observation_covariance,
                               transition_covariance=transition_covariance)
        self.state_means = pd.Series([self.kf.initial_state_mean],
                                     name=self.asset)
        self.state_vars = pd.Series([self.kf.initial_state_covariance],
                                    name=self.asset)

    def update(self, observations):
        for dt, observation in observations[self.asset].iteritems():
            self._update(dt, observation)

    def _update(self, dt, observation):
        mu, cov = self.kf.filter_update(self.state_means.iloc[-1],
                                        self.state_vars.iloc[-1],
                                        observation)
        self.state_means[dt] = mu.flatten()[0]
        self.state_vars[dt] = cov.flatten()[0]
        if self.state_means.shape[0] > self.maxlen:
            self.state_means = self.state_means.iloc[-self.maxlen:]
        if self.state_vars.shape[0] > self.maxlen:
            self.state_vars = self.state_vars.iloc[-self.maxlen:]


class KalmanRegression(object):
    """
    Uses a Kalman Filter to estimate regression parameters
    in an online fashion.

    Estimated model: y ~ beta * x + alpha
    """

    def __init__(self, initial_y, initial_x, delta=1e-5, maxlen=3000):
        self._x = initial_x.name
        self._y = initial_y.name
        self.maxlen = maxlen
        trans_cov = delta / (1 - delta) * np.eye(2)
        obs_mat = np.expand_dims(
            np.vstack([[initial_x], [np.ones(initial_x.shape[0])]]).T, axis=1)

        self.kf = KalmanFilter(n_dim_obs=1, n_dim_state=2,
                               initial_state_mean=np.zeros(2),
                               initial_state_covariance=np.ones((2, 2)),
                               transition_matrices=np.eye(2),
                               observation_matrices=obs_mat,
                               observation_covariance=1.0,
                               transition_covariance=trans_cov)
        state_means, state_covs = self.kf.filter(initial_y.values)
        self.means = pd.DataFrame(state_means,
                                  index=initial_y.index,
                                  columns=['beta', 'alpha'])
        self.state_cov = state_covs[-1]

    def update(self, observations):
        x = observations[self._x]
        y = observations[self._y]
        mu, self.state_cov = self.kf.filter_update(
            self.state_mean, self.state_cov, y,
            observation_matrix=np.array([[x, 1.0]]))
        mu = pd.Series(mu, index=['beta', 'alpha'],
                       name=observations.name)
        self.means = self.means.append(mu)
        if self.means.shape[0] > self.maxlen:
            self.means = self.means.iloc[-self.maxlen:]
        return y - (self.means.beta[-1] * x + self.means.alpha[-1])

    @property
    def state_mean(self):
        return self.means.iloc[-1]


There was a runtime error.

Hey Jason,

We're incredibly busy with QuantCon SG right now, but will take a look at this ASAP.

Thanks,
Delaney

Thank you Delaney! I really appreciate it!

Joseph

Hi Joseph,

I suspect that the issue is due to the calculation in the hedge_ratio function. Since it fits a linear regression, and the intercept term will differ depending on which way the regression is run, you end up with different (and non-symmetric) values for the beta, or "hedge ratio". This means the calculations differ depending on which equity in the pair is the independent variable and which is the dependent. Unfortunately, I haven't had much time to explore it and figure out the root cause, so this is just my suspicion.

An alternative to the linear regression might be to compute the ratio of the stocks instead of the linear combination and use that as the trading signal. Your hedge ratio would instead be Y/X or X/Y (where X and Y are the elements of your pair), and you would check the z-score of a rolling calculation of that ratio as your trading signal. Your ordering logic will probably be more annoying than in the linear combination case, because you have to calculate the correct number of shares to approximate that ratio, but I suspect the signal is more likely to be symmetric.
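As a rough illustration of what I mean, a ratio signal could be sketched like this with pandas (minimal example, not production code; the window length and the synthetic price series are arbitrary choices for demonstration):

```python
import numpy as np
import pandas as pd

def ratio_zscore(prices_y, prices_x, window=60):
    """Z-score of the rolling Y/X price ratio, usable as a trading signal.

    `window` is an arbitrary lookback chosen for illustration.
    """
    ratio = prices_y / prices_x
    mean = ratio.rolling(window).mean()
    std = ratio.rolling(window).std()
    return (ratio - mean) / std

# Synthetic example: X is a random walk near 100, Y tracks 2*X with noise.
rng = np.random.default_rng(0)
x = pd.Series(100 + np.cumsum(rng.normal(0, 0.5, 500)))
y = 2 * x + rng.normal(0, 1, 500)
z = ratio_zscore(y, x)
# Entries could then be triggered when |z| exceeds a threshold, e.g. 1.5.
```

Note that the Y/X and X/Y versions of this signal are simple reciprocals of each other, which is why I'd expect it to behave symmetrically.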

Cheers,
Max


Hi Joseph and Max,
The hedge ratio found by a linear regression will vary when the X and Y variables are switched because of how the error is calculated. Ordinary least squares is the most common method; it assumes that the error is in the output variable and optimizes only to minimize the residuals between Y and the predicted value. If total least squares is chosen for the error function, the residuals are calculated assuming that both X and Y contain errors. When using total least squares, the hedge ratio should not change (other than by inverting) when the order is swapped.
I haven't dug into pykalman deeply enough to determine whether this is why it's happening in this particular case. Intuitively I wouldn't think so, since the Kalman filter was designed to make predictions under the assumption that the inputs contain error. Maybe someone with more knowledge of Kalman filters can comment.
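The OLS/TLS asymmetry is easy to check numerically. A small NumPy sketch (synthetic data; `ols_beta` and `tls_beta` are illustrative helpers I'm defining here, with TLS implemented as the first principal component of the centered data):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(0, 1, 2000)
y = 1.5 * x + rng.normal(0, 0.5, 2000)  # noise in y only, for simplicity

def ols_beta(dep, indep):
    # Ordinary least squares slope on centered data: minimizes vertical
    # residuals, i.e. all error is attributed to the dependent variable.
    dep = dep - dep.mean()
    indep = indep - indep.mean()
    return np.dot(indep, dep) / np.dot(indep, indep)

def tls_beta(dep, indep):
    # Total least squares slope via the first principal component:
    # minimizes perpendicular distances, so errors in both variables.
    data = np.column_stack([indep - indep.mean(), dep - dep.mean()])
    _, _, vt = np.linalg.svd(data, full_matrices=False)
    direction = vt[0]  # dominant direction of the point cloud
    return direction[1] / direction[0]

b_yx = ols_beta(y, x)  # regress y on x
b_xy = ols_beta(x, y)  # regress x on y
# OLS: b_yx * b_xy equals R-squared, which is below 1 whenever there is
# noise, so the two hedge ratios are not reciprocals of each other.
t_yx = tls_beta(y, x)
t_xy = tls_beta(x, y)
# TLS: t_yx and 1 / t_xy agree (up to numerical error), so the hedge
# ratio only inverts when the pair order is swapped.
```

In other words, under OLS the two orderings genuinely disagree about the hedge ratio, while under TLS they describe the same line.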

In order to implement a Kalman filter you need to supply a model of the system in question. In this case, the system is modelled as a simple linear regression between two securities. The Kalman filter assumes that the system is noisy and that the hidden state (here, the regression coefficients, or hedge ratio) can't be observed directly. In addition, the observations, which correspond to the dependent variable in the linear regression, are assumed to contain noise.

Since we are assuming that the model of our system is a simple linear regression, the results will change depending on which security you choose as the dependent variable. It may be possible to model the system in the Kalman filter as a total least squares regression, which would remove this choice, but I have never looked into it.

Thanks, Aidan. For those of you who don't know, Aidan recently spoke at QuantCon SG and presented on precisely this topic. It was a great talk; while we didn't record it, he did share his materials here.

Nice