Kalman Filters Best Practices

## Setup

I just finished watching Quantopian's Lecture on Kalman Filters and went through the notebook.

- Concept
- Equations
- Beta Example
- Numerical Example
- A Textbook
- An IPython Textbook

The Python library being used is pykalman.

# The Code

In the Quantopian notebook, the meat of the code is here:

```python
import numpy as np
from pykalman import KalmanFilter

# get_pricing is provided by the Quantopian research environment
start = '2012-01-01'
end = '2015-01-01'
y = get_pricing('AMZN', fields='price', start_date=start, end_date=end)
x = get_pricing('SPY', fields='price', start_date=start, end_date=end)

delta = 1e-3
trans_cov = delta / (1 - delta) * np.eye(2)  # How much the random walk wiggles
# Observation matrix of shape (n_timesteps, 1, 2): row t is [x_t, 1]
obs_mat = np.expand_dims(np.vstack([[x], [np.ones(len(x))]]).T, axis=1)

# y is 1-dimensional; the state is (beta, alpha), i.e. slope on x, then intercept
kf = KalmanFilter(n_dim_obs=1, n_dim_state=2,
                  initial_state_mean=[0, 0],
                  initial_state_covariance=np.ones((2, 2)),
                  transition_matrices=np.eye(2),
                  observation_matrices=obs_mat,
                  observation_covariance=2,
                  transition_covariance=trans_cov)

# Use the observations y to get running estimates and errors for the state parameters
state_means, state_covs = kf.filter(y.values)
```


# Question 1: How to pick delta? Why delta / (1 - delta) * np.eye(2)?

Where does a delta of 1e-3 come from? And why not just do:

```python
trans_cov = delta * np.eye(2)  # How much the random walk wiggles
```


Perhaps this is something that must be optimized with cross-validation, although I'm not sure what metric to use. If anyone has any insight, I would greatly appreciate it.
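For small delta the two scalings are almost the same, since delta / (1 - delta) = delta + delta^2 + delta^3 + ...; the quick check below (plain Python, hypothetical helper name) prints the relative difference, which works out to delta / (1 - delta) itself. One possible source of the form, which I have not confirmed for the notebook, is the discount-factor convention in Bayesian dynamic linear models: a discount factor lambda inflates the state covariance by 1/lambda each step, i.e. adds (1 - lambda)/lambda times it, and with lambda = 1 - delta that factor is delta / (1 - delta).

```python
def trans_cov_forms(delta):
    """Return (delta / (1 - delta), delta) so the two scalings can be compared."""
    return delta / (1.0 - delta), delta


for delta in [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]:
    ratio_form, plain_form = trans_cov_forms(delta)
    # Relative difference: 1 / (1 - delta) - 1 = delta / (1 - delta)
    rel_diff = ratio_form / plain_form - 1.0
    print(f"delta={delta:g}  delta/(1-delta)={ratio_form:.6g}  rel. diff={rel_diff:.3%}")
```

At delta = 1e-3 the two choices differ by about 0.1%, so the distinction only starts to matter for much larger delta.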

# Question 2: How is observation_covariance = 2 decided?

I understand we're talking about prices here, and a \$2 move for a stock 'feels' like a good estimate for the variance of Amazon's price, but is there a better way to select this than gut feeling?

# Question 3: Why is this approach better than just doing some rolling beta?

At the end of the day, to compute a rolling beta you must decide on a lookback window.

I understand the appeal of the Kalman filter is that you don't choose a lookback window, but don't you still need to choose transition_covariance? Changing it seems to have the same effect as increasing or decreasing the lookback window.
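For comparison, a rolling beta is just the OLS slope over a trailing window, Cov(y, x) / Var(x). The pandas sketch below (synthetic prices, hypothetical function name) makes the window the only tuning knob, playing the role that transition_covariance plays in the filter.

```python
import numpy as np
import pandas as pd


def rolling_beta(y: pd.Series, x: pd.Series, window: int) -> pd.Series:
    """OLS slope of y on x over a trailing window: Cov(y, x) / Var(x)."""
    return y.rolling(window).cov(x) / x.rolling(window).var()


# Synthetic prices (hypothetical, not AMZN/SPY): y = 1.5 * x + 10 + noise
rng = np.random.default_rng(0)
x = pd.Series(100 + np.cumsum(rng.normal(size=500)))
y = 1.5 * x + 10 + rng.normal(scale=0.5, size=500)

for window in (20, 60, 120):
    beta = rolling_beta(y, x, window)
    # Shorter windows typically react faster but give noisier estimates
    print(window, round(beta.iloc[-1], 3), round(beta.std(), 3))
```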

# Question 4: Best Practices

Are there best practices that I should be aware of?

# Thanks

Any help would be greatly appreciated. And if anyone has questions I might be able to answer, feel free to ask me in this thread.


Awesome questions! I am starting to use Kalman filters to estimate linear models, so I'm looking forward to hearing more...

Good questions, I'll try to answer the ones I have some intuition about.

Questions 1 and 2 are related. The observation and transition covariance matrices tell the filter how much it should trust new observations versus the values it predicts from the transition model. If your observations have a large variance (maybe from a crappy sensor), you will want to trust the predicted value more than the new observation. The transition covariance works the other way around: if you believe the state changes very little from one step to the next, then when an observation implies a large jump, the filter will trust its prediction more and treat most of the jump as noise.
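To see the tradeoff concretely, here is a minimal scalar filter for a random walk observed with noise (the same structure as the moving-average example, not the two-state beta model; the helper name and values are hypothetical). Raising the transition variance q raises the gain, so the filter follows new observations more closely; raising the observation variance r lowers it.

```python
import numpy as np


def local_level_gains(z, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random walk observed with noise.

    State:       x_t = x_{t-1} + w_t,  w_t ~ N(0, q)   (transition covariance)
    Observation: z_t = x_t + v_t,      v_t ~ N(0, r)   (observation covariance)
    Returns the filtered estimates and the Kalman gain at each step.
    """
    x, p = x0, p0
    estimates, gains = [], []
    for obs in z:
        p_prior = p + q              # predict: the random walk adds variance q
        k = p_prior / (p_prior + r)  # gain: share of the innovation we accept
        x = x + k * (obs - x)        # update toward the new observation
        p = (1.0 - k) * p_prior
        estimates.append(x)
        gains.append(k)
    return np.array(estimates), np.array(gains)


rng = np.random.default_rng(1)
z = np.cumsum(rng.normal(size=300)) + rng.normal(scale=2.0, size=300)

for q, r in [(1e-3, 2.0), (1e-1, 2.0), (1e-1, 20.0)]:
    _, gains = local_level_gains(z, q, r)
    print(f"q={q:g}, r={r:g}: final gain {gains[-1]:.4f}")
```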

You're right that these assumptions basically replace the window length parameter, but there is less ambiguity about what these numbers mean. For example, in a moving average filter you can be fairly sure that your observations of the current price are spot on, while the transitions carry more uncertainty. You can use historical volatilities to estimate the parameters, but at the end of the day you are always injecting some belief about the system.

1. I'm not sure; you're right that it would be simpler to use delta * np.eye(2). The implementation was taken from the beta example in your post, which does it this way. Ernie Chan's book uses the same form in his example, but I can't recall whether he gave a justification.
2. This value was also chosen because the example we used went with it; there was no attempt to optimize it.
3. It's difficult to say one is really better than the other, but the filter does have appealing qualities. A reasonable estimate of transition variance feels better to me than choosing an arbitrary lookback window. The Bayesian online updating is also intuitively appealing to me, and it's computationally cheaper.
4. Kalman filtering is more complex because you have to give the filter a model of how you believe the underlying system works. I think most of the bread and butter lies in the model you provide. I'm not an expert, so I'm not sure of best practices. I'd say you should work through the examples in the IPython notebook/textbook; it's great to have interactive resources like that for building intuition.

I am still working through http://nbviewer.ipython.org/github/rlabbe/Kalman-and-Bayesian-Filters-in-Python/blob/master/table_of_contents.ipynb, but I agree with David: the important part is designing a good linear model for the data in the first place. Unfortunately, that's quite hard, at least for me!

I also started a thread on Reddit, which you can find here.

With pykalman, you can run a method that helps choose the parameters to maximize the likelihood:

```python
kf.em(
    X=y_training.values,
    n_iter=10,
    em_vars=[
        'initial_state_covariance',
        'transition_covariance',
        'observation_covariance',
    ],
)
```


I've been pretty satisfied with the results.

I experimented with the Kalman filter moving average implementation in the notebook from the Quantopian lecture series and came to a number of conclusions. Perhaps someone can point out errors in my logic, since I'm not a Kalman filter expert. The state space model used appears to be the simple one where the next price equals the current price plus noise, i.e., a random walk. The same state space model appears to be used for both examples in the notebook. I believe it is probably the best model in both cases unless someone can find a reference to a better one. The idea is that the Kalman filter would identify patterns in the remaining noise.

My experimentation led to further research which confirmed my suspicions.

From the informal reference above:
"A random walk + noise model can be shown to be equivalent to a EWMA (exponentially weighted moving average). The kalman gain ends up being the same as the EWMA weighting."

Note: The Kalman gain can be adjusted by changing the transition covariance value.
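The equivalence quoted above can be checked numerically: for this model the Kalman gain converges to a constant K, and once it has, the update x += K (z - x) is exactly an EWMA with weight K. The sketch below (synthetic data, hypothetical helper names) compares a time-varying scalar Kalman filter with an EWMA whose alpha is the closed-form steady-state gain.

```python
import math

import numpy as np


def steady_state_gain(q, r):
    """Limit of the Kalman gain for a random walk (variance q) plus noise (variance r)."""
    p_prior = (q + math.sqrt(q * q + 4.0 * q * r)) / 2.0
    return p_prior / (p_prior + r)


def kalman_local_level(z, q, r):
    """Time-varying scalar Kalman filter, initialized at the first observation."""
    x, p = z[0], r
    out = [x]
    for obs in z[1:]:
        p_prior = p + q
        k = p_prior / (p_prior + r)
        x += k * (obs - x)
        p = (1.0 - k) * p_prior
        out.append(x)
    return np.array(out)


def ewma(z, alpha):
    """Recursive EWMA: m_t = (1 - alpha) * m_{t-1} + alpha * z_t."""
    m = z[0]
    out = [m]
    for obs in z[1:]:
        m = (1.0 - alpha) * m + alpha * obs
        out.append(m)
    return np.array(out)


rng = np.random.default_rng(3)
z = np.cumsum(rng.normal(size=1000)) + rng.normal(scale=3.0, size=1000)
q, r = 0.01, 1.0
alpha = steady_state_gain(q, r)
gap = np.abs(kalman_local_level(z, q, r) - ewma(z, alpha))
print(f"alpha = {alpha:.4f}, max gap over last 500 steps = {gap[500:].max():.2e}")
```

The two only differ during the first few dozen steps, while the filter's gain is still settling.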

A more formal analysis of the equivalence is presented here which contains an appropriate excerpt from here.

So, I wonder if the linear regression model can be replaced by a simpler equivalent also since it appears to use the same state space model.

Yes, for that particular model they are the same; I recall reading this a while ago, but it had slipped my mind. The advantage of the Kalman filter, of course, is that you can build one for a more complex linear model, if you can find one that better describes the data.

> So, I wonder if the linear regression model can be replaced by a simpler equivalent also since it appears to use the same state space model.

Not sure what you mean by that?

@Simon
I agree that the Kalman filter would be useful for more complex linear models. I was just thinking that it might be overkill for the applications presented.

Because of the simple transition model, the Kalman filter moving average equation reduces to a simple exponential moving average. So I'm wondering whether, for the same reason, the Kalman filter implementation of the linear regression might reduce to a computationally cheaper equation for a rolling weighted least squares regression.

You are correct, @Rob: the moving average and regression examples assume random walk transitions, i.e., an identity transition matrix plus noise. As for simplifying to a less computationally expensive operation, I'm not sure. One of the Kalman filter's appealing features is that it only needs its current state to estimate the next one, so it's already fast to compute.
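I don't know of an exact reduction for the two-state regression: in general its gain depends on the changing regressor x_t and never settles to a constant. A cheaper alternative worth comparing against the filter, though not claimed to be equivalent, is an exponentially weighted least squares fit, which pandas can compute from exponentially weighted moments. The sketch assumes synthetic prices and a hypothetical helper name; the half-life plays the role of the lookback.

```python
import numpy as np
import pandas as pd


def ewm_regression(y: pd.Series, x: pd.Series, halflife: float) -> pd.DataFrame:
    """Exponentially weighted least squares fit of y = alpha + beta * x at each date.

    Weights decay with the given half-life (in observations), so recent data
    dominate. This is an alternative to the Kalman regression, not a reduction of it.
    """
    ew_y, ew_x = y.ewm(halflife=halflife), x.ewm(halflife=halflife)
    beta = ew_y.cov(x) / ew_x.var()           # weighted Cov(y, x) / Var(x)
    alpha = ew_y.mean() - beta * ew_x.mean()  # weighted intercept
    return pd.DataFrame({"beta": beta, "alpha": alpha})


rng = np.random.default_rng(7)
x = pd.Series(100 + np.cumsum(rng.normal(size=750)))  # hypothetical index prices
y = 1.3 * x + 20 + rng.normal(scale=2.0, size=750)    # hypothetical stock prices
fit = ewm_regression(y, x, halflife=60)
print(fit.tail(3))
```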

A few good articles on Kalman filters (for pairs trading) in the last year:

Thanks for everybody's useful comments here. You are rapidly approaching and exceeding my knowledge of Kalman filters, so I probably can't be that helpful on some of these topics. I'm mainly glad that the notebook has got people thinking about whether it might be a good idea to use them in their strategy, I understand that they are not appropriate for every situation.

One main point that seems to come up is why selecting a transition variance is any less vulnerable to overfitting than choosing a lookback window. To me, an estimate of transition variance is a more objective quantity with a clear and computable meaning. You can make an educated guess about it, or attempt to infer it from data. With lookback windows, however, you can simply choose the one that gives the best returns over a historical period, and we show in this lecture that this approach is very vulnerable to overfitting. Ultimately, even if the filter reduces to a moving average, it gives you a tool for determining what the lookback window should be, by choosing the transition variance intelligently.
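For the scalar random walk plus noise model discussed earlier in the thread, the link between transition variance and an effective lookback can be made explicit. In the derivation below, q is the transition variance, r the observation variance, and N_eff uses the common EWMA span convention alpha = 2 / (N + 1); this is a sketch for that scalar model only, not for the two-state regression.

```latex
\begin{aligned}
P^-_t &= P_{t-1} + q, \qquad K_t = \frac{P^-_t}{P^-_t + r}, \qquad P_t = (1 - K_t)\,P^-_t \\
\text{steady state: } P^- &= \frac{P^- r}{P^- + r} + q
  \;\Longrightarrow\; (P^-)^2 - q P^- - q r = 0
  \;\Longrightarrow\; P^- = \frac{q + \sqrt{q^2 + 4 q r}}{2} \\
K &= \frac{P^-}{P^- + r}, \qquad \hat{x}_t = (1 - K)\,\hat{x}_{t-1} + K z_t
  \quad \text{(an EWMA with } \alpha = K\text{)} \\
N_{\text{eff}} &= \frac{2}{K} - 1
  \quad \text{(the span whose EWMA weight } 2/(N_{\text{eff}} + 1) \text{ equals } K\text{)}
\end{aligned}
```

The gain depends only on the ratio q/r. For example, q/r = 0.01 gives K ≈ 0.095, roughly a 20-period span, so choosing the transition variance amounts to choosing a window.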

We will do a full lecture on robust optimization techniques, which are less vulnerable to overfitting, in the future. One should almost never use optimization without following robust optimization practices, as it will almost surely result in overfitting. As always, you can find the full lecture series here.

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

I recently gave a presentation on Kalman filters and wrote up a blog post. You can find them here.

Let me know what you think!

Hey Jonathan, your analysis is really cool. Your checks for stationarity, normality, and homoskedasticity of the errors are excellent; I've been trying to convince people to do the same in the lecture series. I'm sure people would like to see the notebook behind your analysis. Would you mind sharing it here? You can now attach notebooks to replies (a new feature), so you can reply in this thread without starting a new one.

Unfortunately, almost all of my analysis is done using custom libraries I've built. That said, the meat of the code is explained in the presentation.

If anyone has any questions, I'd be more than happy to answer.

Btw, stationarity is not a requirement for Kalman filters.

A time-varying Kalman filter can perform well even when the noise covariance is not stationary.

Source

Gotcha. Either way, I think the habit of quickly running a batch of tests against your errors is a great one to have. Worst case, the non-stationarity test comes up positive and you learn that that's actually okay for Kalman filters.
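A minimal version of such a batch, assuming SciPy is available and that the residuals are the filter's one-step-ahead prediction errors: Jarque-Bera for normality, a Levene test between the two halves as a crude check for changing variance, and the lag-1 autocorrelation. A stationarity test such as statsmodels' adfuller could be added the same way. The helper name and the split-in-half heuristic are my own choices, not a standard recipe.

```python
import numpy as np
from scipy import stats


def residual_checks(resid):
    """Quick diagnostics on filter residuals (prediction errors).

    - jarque_bera_p: null hypothesis is that the residuals are normal
    - levene_p: null is equal variance in the first and second halves
      (a crude split, not a formal ARCH test)
    - lag1_autocorr: should be near 0 if the model captured the dynamics
    """
    resid = np.asarray(resid, dtype=float)
    half = len(resid) // 2
    _, jb_p = stats.jarque_bera(resid)
    _, lev_p = stats.levene(resid[:half], resid[half:])
    lag1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
    return {"jarque_bera_p": float(jb_p), "levene_p": float(lev_p),
            "lag1_autocorr": float(lag1)}


rng = np.random.default_rng(6)
calm = rng.normal(scale=1.0, size=1000)                      # well-behaved residuals
shifted = np.concatenate([rng.normal(scale=1.0, size=500),  # variance jumps halfway
                          rng.normal(scale=5.0, size=500)])
print("calm:   ", residual_checks(calm))
print("shifted:", residual_checks(shifted))
```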

I agree.

I've been learning a lot from the lecture series. Keep up the good work!

Thanks for your support, we're definitely going to be releasing a bunch more content going forward. We're also considering the idea of running a free course and certification around the lecture series. I will be floating the idea at meetups to see how much traction it generates.

In the shower this morning, I wondered: in pairs trading, I've seen people use Kalman filters to estimate the cointegration beta as an alternative to OLS regression. Subsequently, people often fit an OU model to estimate the half-life. Could one build a Kalman filter that directly models the OU process (or something similar) of the spread and predicts the mean-reverting spread value? This would be neat for two reasons: one, the filter would learn the mean reversion parameter, and two, it would generalize very simply to multi-legged (>2) baskets...
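A sketch of the spread-as-state idea, assuming the OU process is discretized as an AR(1), s_t = mu + phi (s_{t-1} - mu) + noise, with phi, mu and both noise variances known (all names and values here are hypothetical). It also shows the usual half-life, -ln 2 / ln phi. Learning phi inside the filter, as suggested above, is harder: putting phi in the state makes the transition nonlinear (phi times s), so it would need something like an extended or unscented Kalman filter, or EM on the linear model; neither is shown.

```python
import math

import numpy as np


def half_life(phi):
    """Half-life (in steps) of an AR(1) / discretized OU process with 0 < phi < 1."""
    return -math.log(2.0) / math.log(phi)


def kalman_ar1(z, phi, mu, q, r):
    """Scalar Kalman filter for a mean-reverting spread observed with noise.

    State:       s_t = mu + phi * (s_{t-1} - mu) + w_t,  w_t ~ N(0, q)
    Observation: z_t = s_t + v_t,                        v_t ~ N(0, r)
    phi, mu, q, r are assumed known here.
    """
    s, p = mu, q / (1.0 - phi ** 2)  # start at the stationary mean and variance
    out = []
    for obs in z:
        s_prior = mu + phi * (s - mu)  # predict: pull toward the mean
        p_prior = phi ** 2 * p + q
        k = p_prior / (p_prior + r)
        s = s_prior + k * (obs - s_prior)
        p = (1.0 - k) * p_prior
        out.append(s)
    return np.array(out)


rng = np.random.default_rng(4)
phi, mu, q, r, n = 0.9, 0.0, 1.0, 4.0, 2000
spread = np.empty(n)
spread[0] = mu
for t in range(1, n):
    spread[t] = mu + phi * (spread[t - 1] - mu) + rng.normal(scale=math.sqrt(q))
z = spread + rng.normal(scale=math.sqrt(r), size=n)  # noisy observed spread

est = kalman_ar1(z, phi, mu, q, r)
print(f"half-life: {half_life(phi):.2f} steps")
print(f"MSE raw: {np.mean((z - spread) ** 2):.2f}, "
      f"MSE filtered: {np.mean((est - spread) ** 2):.2f}")
```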

Interesting idea. A quick search suggests that while there may be a little work in this direction, it is certainly not easily accessible. These two links may be helpful:

Hi everyone,

I've been trying to find papers about the theory behind using the random walk model for the state transitions. Does anyone know where I could read up more about the math behind it?

I feel like there may be some related literature in the Markov model space, as the concept is similar there.

*Bayesian Forecasting and Dynamic Models* is the best book I have read on the subject so far.

This is also worth posting I think:

Thanks for the replies. I will definitely have a look into those resources. I should have made it more clear that I was looking for the reasoning behind the selection of the random walk model for this particular strategy. Are there studies that show that alpha and beta fit the random walk model well?

Thanks again

I don't actually know of much research showing this. A lot of beta forecasting models are proprietary. I do know that GARCH models are used for volatility forecasting, and maybe those could be used for beta forecasting.

Thank you, Delaney. I'll try to fit a GARCH model for beta forecasting and let you know the results.

Very curious how it might work, actually, so let me know if you get anywhere. I might do a full notebook on beta forecasting at some point, and it would be great to have GARCH as one of the methods in a big comparison.

I guess that implies you think that the beta is conditionally heteroskedastic? Not sure about that...

That's a good point. It may be better to start by fitting a simpler autoregressive model like ARMA.
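As a simple baseline before GARCH or full ARMA, an AR(1) can be fit to a beta series by ordinary least squares. The sketch below uses a simulated beta series with hypothetical parameters; with the notebook's filter, the input would presumably be state_means[:, 0], the slope column. Fitting MA terms would need a library such as statsmodels and isn't shown.

```python
import numpy as np


def fit_ar1(series):
    """Least-squares fit of b_t = c + phi * b_{t-1} + e_t. Returns (c, phi, resid_var)."""
    b = np.asarray(series, dtype=float)
    X = np.column_stack([np.ones(len(b) - 1), b[:-1]])
    coef, *_ = np.linalg.lstsq(X, b[1:], rcond=None)
    resid = b[1:] - X @ coef
    return coef[0], coef[1], resid.var(ddof=2)


def forecast_ar1(last, c, phi, steps):
    """Iterate the fitted AR(1) forward; converges to the long-run mean c / (1 - phi)."""
    out = []
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return np.array(out)


# Simulated beta series (hypothetical): mean 1.2, persistence 0.8
rng = np.random.default_rng(8)
beta = np.empty(5000)
beta[0] = 1.2
for t in range(1, len(beta)):
    beta[t] = 1.2 + 0.8 * (beta[t - 1] - 1.2) + rng.normal(scale=0.05)

c, phi, s2 = fit_ar1(beta)
print(f"phi={phi:.3f}, long-run mean={c / (1 - phi):.3f}")
print("next 5 forecasts:", forecast_ar1(beta[-1], c, phi, 5).round(4))
```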

I will fit a few models and do a writeup comparing their performance. Thanks again Delaney.

This will be an interesting read:
http://aum.sagepub.com/content/23/1/1.full.pdf
It compares GARCH with the Kalman filter technique.

Great find.

Rather than start a new thread, I figured this would be the right spot for this question.

Does anyone know whether it is possible to run this estimation with pykalman for more than one pair, i.e., some made-up basket of cointegrated pairs? All the examples I can find only estimate a real-time hedge ratio when you have X and Y... but what if you have X, Y, Z or even more?
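In principle nothing in the notebook's setup is specific to two assets: each extra leg adds a column to the observation matrix and a dimension to the state. With pykalman that should mean building the rows with np.column_stack([X, np.ones(len(X))]) for an (n_timesteps, k) array X of legs, expanding them to shape (n_timesteps, 1, k + 1), and setting n_dim_state=k + 1 (I haven't tested this exact call). To keep the example self-contained, the sketch below writes the same random-walk-coefficient filter out in NumPy for any number of legs; the function name and defaults are hypothetical, chosen to mirror the notebook's delta, observation_covariance and initial values.

```python
import numpy as np


def kalman_regression(y, X, delta=1e-3, obs_var=2.0, init_cov=None):
    """Random-walk-coefficient regression y_t = X_t @ beta_t + alpha_t + v_t.

    Identity transitions, transition covariance delta / (1 - delta) * I and a
    scalar observation variance, as in the notebook, but for k regressors.
    Returns an array of shape (T, k + 1): one hedge ratio per leg, then the intercept.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    T, k = X.shape
    n = k + 1
    H = np.column_stack([X, np.ones(T)])  # row t is the observation matrix at t
    Q = delta / (1.0 - delta) * np.eye(n)
    m = np.zeros(n)                                       # initial state mean
    P = np.ones((n, n)) if init_cov is None else np.asarray(init_cov, dtype=float)
    means = np.empty((T, n))
    for t in range(T):
        if t > 0:
            P = P + Q                     # predict (transition matrix is I)
        h = H[t]
        s = h @ P @ h + obs_var           # innovation variance (scalar)
        K = P @ h / s                     # Kalman gain, shape (n,)
        m = m + K * (y[t] - h @ m)        # update the state mean
        P = P - np.outer(K, h @ P)        # update the covariance: (I - K h) P
        means[t] = m
    return means


rng = np.random.default_rng(5)
T = 1000
X = 50 + np.cumsum(rng.normal(size=(T, 3)), axis=0)  # three hypothetical legs
y = X @ np.array([0.4, 0.8, -0.3]) + 5.0 + rng.normal(scale=1.0, size=T)
hedge = kalman_regression(y, X)
print("final hedge ratios:", hedge[-1, :3].round(3), "intercept:", hedge[-1, 3].round(3))
```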