UPDATE: This algorithm is now out-of-date. Please find the new version at https://www.quantopian.com/posts/fixed-version-of-ernie-chans-gold-vs-gold-miners-stat-arb.

# Due Credit

Ernie Chan is the author of Quantitative Trading, which is a terrific and accessible intro to algorithmic trading concepts. Ernie also maintains a blog, where he generously explains concepts and provides examples - one of my favorites is his post about exploring an arbitrage between Gold and gold-mining companies. Exchange Traded Funds (ETFs) exist for both gold the commodity and gold-mining companies, so it is pretty easy to explore Ernie's idea in Quantopian. Ernie also provides code with his book that demonstrates how to research this arbitrage opportunity, which helped me understand the statistical relationship the strategy exploits.

Lots of fundamental investors will take a view on commodity/company arbitrage trades like this one, so I was really interested in how an algorithmic investing approach would fare.

Ernie's book suggested GLD for the commodity, and GLX for mining companies. Both are ETFs in the Quantopian history, so I used them here.

# How it works

The relationships used in the strategy are based on the spread between the stock prices. First an Ordinary Least Squares (OLS) calculation is done between the pricing histories. I think of OLS as a tool to fit a linear model to a scatter plot of the prices of GLD vs. the prices of GLX. To create the OLS required a bit of advanced coding in Quantopian - I used our (as of yet private, but soon to be opensource) EventWindow classes to create a rolling OLS.

Once OLS is calculated, you have a beta factor that can be used to normalize the spread between the two stocks. Then we calculated a z score, which is basically the number of standard deviations the current spread is from the average spread for the trailing window. If the spread is wide, the z score will be a positive number representing the number of standard deviations above the average. If the spread is narrow, it will be a negative number. When the spread is wide we sell GLD and buy GLX, with the expectation that the spread will return to normal. When the spread is narrow, we instead buy GLD and sell GLX. You can see the symmetry in the $ value of the bets over time if you clone the algo, and run it (be sure to start *after* June 2006).

I did a number of backtests - nearly 30 - to debug and then explore parameter values for the bet size (in # of shares) and length of the trailing window (I settled on 6 months). I also played around with different lengths for the average and stddev vs. the OLS.

# Clone me!

I found it extremely fun to follow Ernie's lead, so in addition to sharing the backtest below, I wanted to throw out a few ideas for variations on this strategy that you can try by cloning my code and making small edits:

- Ernie has another post about the same kind of arbitrage between energy stocks and energy commodity futures. It would be cool to try this strategy with ETFs from those areas.
- Alcoa sells aluminum, but they are tightly coupled to natural gas prices because they use so much in the smelting of aluminum ore. I've always wondered if there is a statistical relationship that echoes the fundamental one.
- It would be fun to try this statarb approach on a merger arbitrage. In that case, you might want to add a bias for a merger to either happen or not, but it would be neat to trade the spread minute to minute on a deal.
- The airlines are always on watch when oil goes up. Same for Fedex and UPS. Maybe there is a relationship there.
- GLD and GLX seem to have maintained a high level of cointegration, but there are periods like Q4 2008 when the relationship changes suddenly. It would be challenging to modify this algo to detect a break down in the cointegration of the two stocks. Maybe something as simple as tracking the size of a standard deviation might be interesting.

The other piece of this algorithm that may be fun to tweak is the order code. I didn't do anything fancy, but I did add a condition to reduce the leverage if the spread falls within one standard deviation of the average. I think it would be really interesting to adjust the **amount** of stock bought/sold based on the width of the spread. So instead of just making the same bet when the z factor is -2 vs. -4, maybe doubling your orders at -4 would be interesting!

Have fun,

fawce