Quantopian Lecture Series: Autocorrelation and Autoregressive (AR) Models

Autocorrelation (the property of an autoregressive time series) is one of the most common effects in financial time series, and modeling it has been one of the biggest innovations to come out of time series analysis in the last 100 years. It describes the phenomenon of future values depending on current and past values as well as on new information. Autocorrelation also leads to fat tails and tail risk, which can sink your algorithm if you assume the underlying distributions are not autoregressive.

Because there is so much autocorrelation in finance, we should understand how to model it effectively. This notebook covers the idea behind autocorrelation, describes properties of autoregressive time series, and shows how to fit a model to the data.
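Here is a minimal NumPy sketch of "fitting a model to the data". It is not the notebook's code. It simulates an AR(p) series and recovers the coefficients by ordinary least squares on lagged values. The helper names `simulate_ar` and `fit_ar_ols` are hypothetical, and a library routine such as statsmodels' `AutoReg` would normally do this job. The coefficients 0.8, 0.1 and 0.05 come from the AR(3) polynomial 1 - .8B - .1B^2 - .05B^3 discussed in this thread.

```python
import numpy as np

def simulate_ar(phi, n, burn_in=500, seed=0):
    """Simulate x_t = phi_1 x_{t-1} + ... + phi_p x_{t-p} + e_t with N(0, 1) noise."""
    rng = np.random.default_rng(seed)
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    e = rng.standard_normal(n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(p, n + burn_in):
        # x[t - p:t][::-1] is (x_{t-1}, ..., x_{t-p})
        x[t] = phi @ x[t - p:t][::-1] + e[t]
    return x[burn_in:]  # drop the start-up transient

def fit_ar_ols(x, p):
    """Regress x_t on an intercept and p lags; return (intercept, phi_hat)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Column i holds x_{t-1-i} for t = p, ..., n-1.
    lags = np.column_stack([x[p - 1 - i:n - 1 - i] for i in range(p)])
    X = np.column_stack([np.ones(n - p), lags])
    coef, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    return coef[0], coef[1:]

x = simulate_ar([0.8, 0.1, 0.05], n=50_000)
intercept, phi_hat = fit_ar_ols(x, p=3)
print(intercept, phi_hat)  # phi_hat should land close to [0.8, 0.1, 0.05]
```

On real prices the same regression usually returns a leading coefficient near 1, which is the unit-root issue raised further down the thread.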

All lectures can be found here: https://www.quantopian.com/lectures

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

19 responses

Hi Delaney,

Thought I might try to noodle on this a bit. You say:

Autoregressive processes will tend to have more extreme values than data drawn from, say, a normal distribution. This is because the value at each time point is influenced by recent values. If the series randomly jumps up, it is more likely to stay up than a non-autoregressive series.

Does this imply a kind of 'stickiness' to prices? And would it imply that mean reversion strategies should work at an appropriate time scale? My thinking is that if the price tends to get stuck in the wings of the distribution but will eventually move back to the mean, then autoregressive processes might be ripe for mean reversion strategies. If so, maybe stocks that exhibit greater degrees of such behavior would be better suited for mean reversion strategies?

That interpretation is correct; people will refer to the behavior as 'sticky' or 'spiky'. A process that is autoregressive (and stationary) will revert to some mean, and therefore mean reversion trading should work on these series. In fact, as you mention, it may work better because there is more deviation from that mean. However, it is also important to note that the mean may not be stationary over time, and because of the spikiness you are exposing yourself to much greater volatility and tail risk.

So are there tried-and-true ways of identifying batches of stocks that would tend to mean-revert, for a long-short strategy (on a time-scale relevant to Quantopian today)? I'm just wondering if there is any practical application of this mathy stuff.

Auto-regressive series can result in what traders call "momentum", not "mean reversion".

Some random thoughts:

If the price gets stuck in a wing (|z| >> 0), either above or below the normalized mean (z=0), and stays there on a rolling basis (|z| >> 0), then there's an opportunity for momentum trading. But if it reverts back to the mean (z ~ 0), then it is mean reversion. But it seems that in order to take advantage of either situation, there has to be stickiness--enough predictability and time to take advantage of the anomaly. Seems like it is all the same, but maybe stocks tend to get stuck in one wing or the other asymmetrically?

Momentum and mean reversion behavior can be present simultaneously: imagine a series trending upwards but with noise around the trend. After de-trending the series you will be left with mean-reverting noise. In general, a series will exhibit price momentum if the returns are autocorrelated. If the prices are autocorrelated but the returns are not, then after you observe a higher price, the price is no more likely to rise further from there than to fall. I would not advise trying to momentum trade a series whose prices, but not returns, are AR. These lectures explain more.
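A small simulation shows the difference between autocorrelated prices and autocorrelated returns. This sketch uses made-up parameters (phi = 0.3 for a hypothetical "momentum" series). It compares a random walk, whose prices are autocorrelated but whose returns are not, with a series whose returns follow an AR(1). Only the second shows a positive average return right after an up move.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.dot(x[1:], x[:-1]) / np.dot(x, x)

def mean_next_return_after_up_move(returns):
    """Average return in period t+1, given that the return in period t was positive."""
    r = np.asarray(returns, dtype=float)
    return r[1:][r[:-1] > 0].mean()

rng = np.random.default_rng(1)
n = 100_000
shocks = rng.standard_normal(n)

# Random walk: prices are highly autocorrelated, returns are white noise.
rw_returns = shocks
rw_prices = np.cumsum(rw_returns)

# Hypothetical "momentum" series: returns follow an AR(1) with phi = 0.3.
phi = 0.3
mom_returns = np.empty(n)
mom_returns[0] = shocks[0]
for t in range(1, n):
    mom_returns[t] = phi * mom_returns[t - 1] + shocks[t]
mom_prices = np.cumsum(mom_returns)

print("price ACF(1): ", lag1_autocorr(rw_prices), lag1_autocorr(mom_prices))
print("return ACF(1):", lag1_autocorr(rw_returns), lag1_autocorr(mom_returns))
print("mean next return after an up move:",
      mean_next_return_after_up_move(rw_returns),
      mean_next_return_after_up_move(mom_returns))
```

Both price series have a lag-1 autocorrelation close to 1. That is why price autocorrelation alone says nothing about momentum.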

@Grant Yes in real life stocks will not exhibit textbook AR behavior, and there will be asymmetry and whatnot. In practice I think mean reversion is a safer bet because you are betting "At some point in the future this stock will return to its central value." whereas with momentum you are betting "This stock will continue to become higher or lower, and I can time the exit before the crash."

Just to point out: many financial time series exhibit serial autocorrelation, so perhaps an AR(1) model should be able to fit the data. However, if you do try to fit an AR(1) model, the coefficient phi will be very close to 1 (a so-called "unit root"). Plugging phi = 1 into the formula, it is easy to see that the model reduces to a random walk. I point this out to highlight that AR(1) is not what really explains the data (even if the ACF shows extremely high autocorrelation); what explains it is first-order differencing, i.e. a random walk model, which is not capable of predicting future values.
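Put phi = 1 and no constant into the AR(1) formula x_t = c + phi * x_{t-1} + e_t. The result is x_t - x_{t-1} = e_t: the changes are pure noise, so the best forecast of the next value is the current one. As a quick check on simulated data (not real prices), this sketch fits an AR(1) by least squares to a random walk and then to its first differences:

```python
import numpy as np

def fit_ar1(x):
    """OLS fit of x_t = c + phi * x_{t-1} + e_t; returns (c, phi)."""
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    (c, phi), *_ = np.linalg.lstsq(X, x[1:], rcond=None)
    return c, phi

rng = np.random.default_rng(2)
prices = 100 + np.cumsum(rng.standard_normal(5_000))  # simulated pure random walk

_, phi_levels = fit_ar1(prices)          # close to 1: the "unit root"
_, phi_diffs = fit_ar1(np.diff(prices))  # close to 0: the changes are just noise
print(phi_levels, phi_diffs)
```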

That's a good point: AR alone is not sufficient to make predictions about specific future values. In general, testing for AR behavior is usually done to show that the series is non-stationary. Modeling usually requires the use of some independent data sources to make predictions.

Hi Delaney,

Perhaps a topic for a different thread, but I'm curious whether there are ways to sort out characteristic time scales for price movements, point-in-time, across large collections of securities (e.g. could one analyze the Q database)? I see from https://en.wikipedia.org/wiki/Autocorrelation that the computational complexity is not friendly. For example, suppose you have 500 stocks, a trailing window of 20 days of minutely data (390 minutes per day), 252 trading days in a year (assuming the computation is done daily), and maybe 15 years of data. It starts to become a lot of data. Any thoughts?
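Some scale: one 20-day window of minute bars holds 20 × 390 = 7,800 points, and 500 stocks × 252 days × 15 years comes to 1,890,000 windows. Computing every lag directly costs on the order of n² ≈ 6.1 × 10^7 multiply-adds per window. The standard FFT route (Wiener-Khinchin theorem) costs O(n log n) instead. The NumPy sketch below is vectorized over stocks, and random numbers stand in for real returns. The names `acf_fft` and `acf_direct` are illustrative.

```python
import numpy as np

def acf_fft(x, max_lag):
    """Autocorrelation at lags 0..max_lag along the last axis, via the FFT.

    Zero-padding to at least 2n - 1 points avoids circular wrap-around,
    so the result matches the direct sum over lags.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean(axis=-1, keepdims=True)
    n = x.shape[-1]
    nfft = 1 << (2 * n - 1).bit_length()  # a power of two of at least 2n - 1 points
    f = np.fft.rfft(x, n=nfft, axis=-1)
    acov = np.fft.irfft(f * np.conj(f), n=nfft, axis=-1)[..., :max_lag + 1]
    return acov / acov[..., :1]

def acf_direct(x, max_lag):
    """Reference implementation for one 1-D series, O(n * max_lag)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    acov = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(max_lag + 1)])
    return acov / acov[0]

rng = np.random.default_rng(3)
window = 20 * 390                             # 20 days of minute bars = 7,800 points
returns = rng.standard_normal((50, window))   # placeholder data for 50 hypothetical stocks
acfs = acf_fft(returns, max_lag=390)          # one row of 391 autocorrelations per stock
print(acfs.shape)
```

If only a few hundred lags matter, the direct sum restricted to `max_lag` lags is also affordable. The FFT pays off when you want all lags.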

That sounds like a large scale research project to me, something that would take a while. It's probably better to have a specific hypothesis before diving into something like that. One thing to keep in mind is that these long term analyses fall prey to non-stationary market conditions. For instance, certain types of momentum and mean reversion used to be tradeable on a daily basis. But as more and more quants and retail traders started realizing this, the timescale of tradeability lowered and lowered until now a lot of the old technical indicators don't work or are only tradeable on a HFT basis.

In general, I think what is interesting is to subsample your data and use that as your timescale analysis. Look at data subsampled by month, week, and day. You may find that things trend upwards quite consistently when sampled on a monthly basis but have much more noise on a daily basis. These kinds of observations can help design the hypotheses behind the pricing models that ultimately underlie trading strategies.
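The subsampling idea can be sketched with pandas. The sketch keeps every k-th daily price and compares return statistics at each step. The steps k = 1, 5 and 21 as rough daily, weekly and monthly sampling are an assumption. The price series is synthetic: an upward trend plus independent daily noise. With that construction, only about half of the daily returns are positive, while most monthly returns are, even though it is the same series.

```python
import numpy as np
import pandas as pd

SAMPLING_STEPS = {"daily": 1, "weekly": 5, "monthly": 21}  # trading days per sample (assumed)

def subsample_stats(prices, steps=SAMPLING_STEPS):
    """Lag-1 autocorrelation and share of up-moves of log returns at several sampling steps."""
    rows = {}
    for name, k in steps.items():
        sampled = prices.iloc[::k]                 # keep every k-th observation
        rets = np.log(sampled).diff().dropna()
        rows[name] = {"lag1_autocorr": rets.autocorr(lag=1),
                      "frac_up": (rets > 0).mean()}
    return pd.DataFrame(rows).T

# Hypothetical data: 10 years of daily log prices = upward trend + independent daily noise.
rng = np.random.default_rng(4)
t = np.arange(2520)
log_p = np.log(100) + 0.001 * t + 0.02 * rng.standard_normal(len(t))
prices = pd.Series(np.exp(log_p), index=pd.bdate_range("2005-01-03", periods=len(t)))

stats = subsample_stats(prices)
print(stats)
```

The noise around the trend also makes daily returns look strongly mean-reverting (negative lag-1 autocorrelation). This is the de-trended, mean-reverting noise mentioned earlier in the thread.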

Autocorrelation methods tend to be better for work on futures and volatility studies, ARCH in particular is used to forecast volatility in a variety of contexts. I don't know if simply measuring the amount of AR behavior in equities will get you anywhere by itself. Again probably better to have an idea for a pricing model that you're validating by testing amount of AR behavior over certain frequencies or in a certain asset class, or after certain events or whatnot.
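A simulated ARCH(1) series shows why ARCH is about volatility rather than direction. The returns themselves are nearly uncorrelated, but the squared returns are autocorrelated, and that is what makes a variance forecast possible. The parameters omega = 0.1 and alpha = 0.3 are arbitrary choices in this sketch. A real application would estimate them rather than assume them.

```python
import numpy as np

OMEGA, ALPHA = 0.1, 0.3  # illustrative ARCH(1) parameters; alpha < 1/sqrt(3) keeps the 4th moment finite

def simulate_arch1(n, omega=OMEGA, alpha=ALPHA, seed=5):
    """Simulate r_t = sigma_t * z_t with sigma_t^2 = omega + alpha * r_{t-1}^2."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    r = np.empty(n)
    sigma2 = np.empty(n)
    sigma2[0] = omega / (1 - alpha)  # start at the unconditional variance
    r[0] = np.sqrt(sigma2[0]) * z[0]
    for t in range(1, n):
        sigma2[t] = omega + alpha * r[t - 1] ** 2
        r[t] = np.sqrt(sigma2[t]) * z[t]
    return r, sigma2

def lag1_autocorr(x):
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.dot(x[1:], x[:-1]) / np.dot(x, x)

r, sigma2 = simulate_arch1(50_000)
print("ACF(1) of returns:        ", lag1_autocorr(r))       # near 0
print("ACF(1) of squared returns:", lag1_autocorr(r ** 2))  # near alpha = 0.3
# One-step-ahead variance forecast from the latest return (parameters taken as known).
next_var = OMEGA + ALPHA * r[-1] ** 2
```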

Just adding to what Delaney has said: remember that with an analysis like that, it is crucial to split your data and build your model on a different sample from the one you backtest on. You can relatively easily find data with minor but significant autocorrelation that can be explained by something like an ARIMA(1,1,0) model (an AR term after one order of differencing), but it is very rare to find another, out-of-sample dataset that is explained by that model. In general that extra AR term has no use, and the new sample is only explainable by an ARIMA(0,1,0), i.e. random walk, model. The problem is that, for the most part, a single order of differencing is enough to make the data stationary, meaning there is no residual autocorrelation that can be predicted by an AR or MA term. So, as Delaney said, negative autocorrelation that could be explained by a simple MA term, or positive autocorrelation that could be explained by an AR term, was likely more common before traders became more "sophisticated" and their numbers increased; it has now mostly diminished, so the data mostly behave like a random walk. Such structure is possible and relatively easy to model in theory, but difficult to find in practice.
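The selection effect described here is easy to reproduce. In this hypothetical experiment every series is a pure random walk. The sketch picks the series with the largest in-sample lag-1 return autocorrelation, which roughly estimates the AR coefficient of an ARIMA(1,1,0) fit on prices. It then measures the same statistic on a held-out sample. The in-sample "signal" is an artifact of searching across 500 series, and it does not carry over.

```python
import numpy as np

def lag1_autocorr(x):
    """Lag-1 autocorrelation along the last axis (works row-wise on a 2-D array)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean(axis=-1, keepdims=True)
    return np.sum(x[..., 1:] * x[..., :-1], axis=-1) / np.sum(x * x, axis=-1)

rng = np.random.default_rng(6)
n_series, n_in, n_out = 500, 1_000, 1_000

# Hypothetical universe: every price series is a pure random walk, so all returns are white noise.
returns = rng.standard_normal((n_series, n_in + n_out))
in_sample, out_sample = returns[:, :n_in], returns[:, n_in:]

# "Discover" the series whose returns look most autocorrelated in sample.
rho_in = lag1_autocorr(in_sample)
best = np.argmax(np.abs(rho_in))

# Measure the same statistic on data the search never touched.
rho_out = lag1_autocorr(out_sample[best])
print(f"in-sample: {rho_in[best]:+.3f}  out-of-sample: {rho_out:+.3f}")
```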

Yup great point, maintaining strict statistical discipline in out of sample testing is really important.

the timescale of tradeability lowered and lowered until now a lot of the old technical indicators don't work or are only tradeable on a HFT basis

Hi Delaney -

Regarding your comment I pasted above, is there a way of characterizing the "timescale of tradeability" for stocks, either individually or as baskets? For example, y'all are enthusiastic about releasing tradeable universes (https://www.quantopian.com/posts/the-tradeable500us-is-almost-here), but there is an implicit assumption that the timescale for tradeable price movements is long enough that one could devise a profitable algo. Would there be a way to determine the timescale of tradeability for stocks since 2002, using the Quantopian database? Perhaps your statement could be backed up with a little study? Can it be tested as a hypothesis?

Out of curiosity, as a Quantopian employee, is your access to data restricted the same as for users? Or could you set up your own machine (e.g. under your desk) and crunch data in a more modern parallel processing/GPU fashion? It's curious that there haven't been any studies using high-performance computing by Quantopian employees (although this must have been done in parallel in some fashion: https://www.quantopian.com/posts/q-paper-all-that-glitters-is-not-gold-comparing-backtest-and-out-of-sample-performance-on-a-large-cohort-of-trading-algorithms). A while back, there was https://www.quantopian.com/posts/zipline-in-the-cloud-optimizing-financial-trading-algorithms, but it never went anywhere.

But as more and more quants and retail traders started realizing this, the timescale of tradeability lowered and lowered until now a lot of the old technical indicators don't work or are only tradeable on a HFT basis.

Yes, indeed. I have done many studies on the futures markets showing the slow degradation of trend "efficiency" over the past 40 years.

Two rather sad but very important lessons can be learned by looking at the histories of CTAs Dunn and JWH.

Effectively, both seem to have started out more than 30 years ago on short-term momentum strategies. And both crashed and burned in recent years, presumably through a lack of adaptation.

Hi Delaney,

I found serial auto-correlation within a basket of stocks. Basically, I used portfolio optimization techniques to find weights such that the combined basket has tons of serial auto-correlation. But I am lost after that. How do I trade it? Could you please post a sample algorithm?

First of all, you likely want to do some additional model validation, including an out of sample test, to ensure that your method for finding AR baskets is not just overfitting to noise in price data.

AR is such a core statistical building block that I'm not sure there is a go-to way to trade it; it's more about how you might build a model that takes advantage of the autocorrelation. What does autocorrelation tell us? It tells us that future values of the series are more likely to be close to the current value rather than being chosen totally at random. It also makes a mean reversion trade less likely to work, as mean reversion requires stationarity. AR also tells us that samples are not independent, and therefore naive estimates of variance, p-values, standard errors, and anything relying on the CLT for independent samples will be wrong. So we've already gained a large amount of information that may be valuable in constructing a model.
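The point about standard errors can be quantified for an AR(1) with coefficient phi. For large samples, the variance of the sample mean is inflated by about (1 + phi) / (1 - phi) relative to the i.i.d. formula. The naive standard error is therefore too small by a factor of about sqrt((1 + phi) / (1 - phi)), which is 3 for phi = 0.8. Here is a Monte Carlo sketch on simulated paths, with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(7)
phi, n, n_paths = 0.8, 500, 2_000

# Simulate many stationary AR(1) paths: x_t = phi * x_{t-1} + e_t.
x = np.empty((n_paths, n))
x[:, 0] = rng.standard_normal(n_paths) / np.sqrt(1 - phi**2)  # start in the stationary distribution
e = rng.standard_normal((n_paths, n))
for t in range(1, n):
    x[:, t] = phi * x[:, t - 1] + e[:, t]

# Standard error of the mean that an i.i.d. formula would report (averaged over paths)...
naive_se = np.mean(x.std(axis=1, ddof=1) / np.sqrt(n))
# ...versus the actual spread of the sample mean across independent paths.
actual_se = x.mean(axis=1).std(ddof=1)
# Large-sample AR(1) inflation factor for the standard error of the mean.
predicted_ratio = np.sqrt((1 + phi) / (1 - phi))  # = 3 for phi = 0.8
print(naive_se, actual_se, actual_se / naive_se, predicted_ratio)
```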

In practice I think AR behavior is more something people look to get rid of before applying other models, as the lack of independence breaks methods like linear regression and also p-values, like I mentioned. Directly trading on it may not be as feasible, but understanding how it might break other models is important.
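One common check of regression residuals for this problem is the Durbin-Watson statistic. It is near 2 for uncorrelated residuals and near 2(1 - rho) when the residuals follow an AR(1) with coefficient rho. The NumPy sketch below runs it on simulated data (statsmodels also provides `statsmodels.stats.stattools.durbin_watson`):

```python
import numpy as np

def durbin_watson(resid):
    """About 2 for uncorrelated residuals, about 2 * (1 - rho) for AR(1) residuals."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

def ols_residuals(y, X):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

rng = np.random.default_rng(8)
n, rho = 2_000, 0.7
t = np.arange(n)
X = np.column_stack([np.ones(n), t])  # intercept + linear trend

iid_noise = rng.standard_normal(n)
shocks = rng.standard_normal(n)
ar_noise = np.empty(n)
ar_noise[0] = shocks[0]
for i in range(1, n):
    ar_noise[i] = rho * ar_noise[i - 1] + shocks[i]

dw_iid = durbin_watson(ols_residuals(1 + 0.01 * t + iid_noise, X))
dw_ar = durbin_watson(ols_residuals(1 + 0.01 * t + ar_noise, X))
print(dw_iid, dw_ar)  # roughly 2 and roughly 2 * (1 - 0.7) = 0.6
```

Both regressions recover the trend, but only the first has trustworthy standard errors and p-values.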

Thanks Delaney. Here is my first attempt. A lot more can be done but I would like to keep it to myself for the moment.

Attached backtest (Backtest ID: 5829ccbe67be32106dbabd49). There was a runtime error.

Thanks for sharing your algorithm, Pravin. I haven't had time to delve into it too deeply, but I noticed your return calculation can be made into a one-liner:

np.log(prices / prices.shift(1)).dropna() or, equivalently, np.log1p(prices.pct_change()).dropna()
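Here is a runnable version of the one-liner on a made-up price series. Both expressions compute log(p_t / p_{t-1}). With the second form, `np.log1p` must be applied to the simple return from `pct_change()`, not to the price ratio.

```python
import numpy as np
import pandas as pd

prices = pd.Series([100.0, 101.0, 99.5, 102.0, 102.5])  # hypothetical closing prices

log_ret_a = np.log(prices / prices.shift(1)).dropna()
log_ret_b = np.log1p(prices.pct_change()).dropna()  # log(1 + simple return): same values

# Note: np.log1p(prices / prices.shift(1)) would compute log(1 + p_t / p_{t-1}), which is not a return.
print(log_ret_a.values)
```

The same expressions work unchanged on a DataFrame with one column per stock.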

The first AR(3) example in the notebook is a Gaussian process that is mildly non-stationary at start-up. The roots of the characteristic polynomial z^3 - .8z^2 - .1z - .05 (equivalently, the reciprocals of the roots of 1 - .8B - .1B^2 - .05B^3) lie strictly inside the unit circle (the largest modulus is about .958 < 1). After an initial burn-in of, say, 100 steps (.958^100 ≈ .014), the AR(3) process becomes effectively stationary Gaussian. With unit-variance noise, its steady-state variance is about 8.61. The time-varying state covariances of the corresponding state-space model can be computed by iterating a covariance update equation or, in steady state, by solving a discrete Lyapunov equation. If you draw the initial states from the steady-state covariance distribution, the process will not need a burn-in time.
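The figures in this comment can be checked numerically. The sketch assumes unit-variance innovations and the coefficients implied by 1 - .8B - .1B^2 - .05B^3. It does three things:

- finds the roots of the characteristic polynomial;
- computes the steady-state covariance of the companion-form state-space model, both by iterating the covariance update and by solving the discrete Lyapunov equation (`scipy.linalg.solve_discrete_lyapunov` solves the same equation; plain NumPy is used here);
- draws a start state from that covariance, so no burn-in is needed.

```python
import numpy as np

phi = np.array([0.8, 0.1, 0.05])  # AR(3) coefficients from 1 - .8B - .1B^2 - .05B^3

# Roots of z^3 - .8z^2 - .1z - .05; stationarity needs all of them strictly inside the unit circle.
char_roots = np.roots(np.concatenate(([1.0], -phi)))
max_modulus = np.abs(char_roots).max()  # about 0.9587

# Companion (state-space) form: s_t = A s_{t-1} + w_t with s_t = (x_t, x_{t-1}, x_{t-2}).
A = np.zeros((3, 3))
A[0, :] = phi
A[1:, :-1] = np.eye(2)
Q = np.zeros((3, 3))
Q[0, 0] = 1.0  # assumed unit-variance innovations

# (1) Iterate the covariance update P_k = A P_{k-1} A^T + Q from a zero initial state.
P = np.zeros((3, 3))
for _ in range(1000):
    P = A @ P @ A.T + Q

# (2) Solve the discrete Lyapunov equation P = A P A^T + Q directly,
#     using vec(P) = (I - kron(A, A))^{-1} vec(Q) with row-major vectorization.
P_ss = np.linalg.solve(np.eye(9) - np.kron(A, A), Q.ravel()).reshape(3, 3)
P_ss = (P_ss + P_ss.T) / 2  # remove round-off asymmetry

steady_var = P_ss[0, 0]  # about 8.615

# Draw the initial state from the steady-state distribution so no burn-in is needed.
rng = np.random.default_rng(9)
s0 = rng.multivariate_normal(np.zeros(3), P_ss)
print(max_modulus, steady_var, s0)
```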

IMHO an AR(3) process x is not fat-tailed; it just has a standard deviation greater than 1.
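One way to test that view is to simulate the same AR(3) over a long stretch and look at the standardized values. The excess kurtosis comes out close to 0 (Gaussian tails), while the standard deviation is close to sqrt(8.615), about 2.94. This sketch uses simulated data and assumes unit-variance noise.

```python
import numpy as np

def simulate_ar(phi, n, burn_in=500, seed=10):
    """Simulate an AR(p) with N(0, 1) innovations, discarding a burn-in period."""
    rng = np.random.default_rng(seed)
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    e = rng.standard_normal(n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(p, n + burn_in):
        x[t] = phi @ x[t - p:t][::-1] + e[t]  # phi_1 x_{t-1} + ... + phi_p x_{t-p} + e_t
    return x[burn_in:]

x = simulate_ar([0.8, 0.1, 0.05], n=100_000)
z = (x - x.mean()) / x.std()
excess_kurtosis = np.mean(z ** 4) - 3.0  # about 0 for Gaussian-like tails
std = x.std()                            # about sqrt(8.615), i.e. roughly 2.94
print(excess_kurtosis, std)
```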