**Investment Hypothesis**

Pairs trading is a strategy in which one stock is bought (longed) while another is simultaneously shorted, according to their expected future prices. It is usually applied to pairs of stocks that share some underlying economic relationship, and it can be very profitable if the difference between them is mean-reverting. In general, if the difference rises beyond a certain level above its mean, the over-valued stock is shorted and the under-valued stock is longed, in the expectation that both stocks will revert to their 'true' values in the future. If the difference falls beyond a certain level below its mean, the positions are reversed.

The main hypothesis of my strategy, derived from the paper that was assigned to me, is that the change in a mean-reverting stock spread can be modeled as an Ornstein-Uhlenbeck (OU) process in the following way:

dXt = θ * ( μt - Xt ) * dt + σdWt

where μt is the mean of the spread at time t, θ is the mean-reversion rate, σ is the volatility, and Wt is a standard Wiener process (Brownian motion). A higher θ indicates faster convergence to the mean, so profits on pairs with higher θ can be realized sooner, and therefore more often over a given period, than on pairs with lower θ.
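To make the role of θ concrete, here is a minimal sketch that assumes a constant mean μ (a simplification; the strategy itself uses a moving average). Under that assumption, the expected gap between the spread and its mean decays as exp(−θt), so the gap halves every ln(2) / θ units of time. The function names `half_life` and `expected_gap` are illustrative and do not come from the paper.

```python
import math


def half_life(theta):
    """Time for the expected gap to the mean to halve (constant-mean OU)."""
    return math.log(2) / theta


def expected_gap(initial_gap, theta, t):
    """Expected distance from the mean after time t, starting at initial_gap."""
    return initial_gap * math.exp(-theta * t)


# A pair with a higher theta closes half of its gap sooner.
slow_pair = half_life(0.5)
fast_pair = half_life(2.0)
```

In this sketch `fast_pair` is a quarter of `slow_pair`, which is the sense in which a higher-θ pair gives a trade more opportunities to complete within a fixed backtest period.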

**Investment Algorithm**

The spread between stocks A and B at time t is defined as Xt = ln( A(t) / A(0) ) − ln( B(t) / B(0) ), t ≥ 0
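As an illustration, the spread definition can be computed as follows. This is a sketch in plain Python; the function name `log_spread` and the sample prices are made up for the example and are not taken from the backtest.

```python
import math


def log_spread(prices_a, prices_b):
    """X_t = ln(A(t) / A(0)) - ln(B(t) / B(0)) for two aligned price series."""
    a0, b0 = prices_a[0], prices_b[0]
    return [
        math.log(a / a0) - math.log(b / b0)
        for a, b in zip(prices_a, prices_b)
    ]


# Hypothetical prices: A gains 10% twice, B is flat and then gains 10%.
prices_a = [100.0, 110.0, 121.0]
prices_b = [50.0, 50.0, 55.0]
spread = log_spread(prices_a, prices_b)
```

Because both price series are normalized by their initial values, the spread always starts at zero and measures the difference in cumulative log returns since t = 0.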

Trades are made if the current value of the spread crosses the Bollinger bands. The upper and lower bands in this case are defined as *μ + 0.5 * σ* and *μ - 0.5 * σ* respectively, where μ represents the 30-day moving average and σ represents the 30-day moving standard deviation of the spread.

The following course of action is implemented if the value of the spread crosses the Bollinger bands:

if X > μ + 0.5 * σ

stock A is shorted and stock B is longed

if X < μ - 0.5 * σ

stock A is longed and stock B is shorted
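The band rule above can be sketched as a small function. This is not the backtest code. It assumes one spread observation per day, a window that includes the current observation, and the sample standard deviation; the write-up does not specify any of these. The name `band_signal` and its return convention are hypothetical.

```python
from statistics import mean, stdev


def band_signal(spread, window=30, width=0.5):
    """Return -1 (short A, long B), +1 (long A, short B), or 0 (no trade)."""
    if len(spread) < window:
        return 0  # not enough history to form the bands
    recent = spread[-window:]
    mu = mean(recent)      # moving average over the window
    sigma = stdev(recent)  # moving (sample) standard deviation
    x = spread[-1]
    if x > mu + width * sigma:
        return -1  # spread above the upper band
    if x < mu - width * sigma:
        return 1   # spread below the lower band
    return 0       # spread inside the bands


# Hypothetical spreads: flat history followed by a jump up or down.
signal_up = band_signal([0.0] * 29 + [1.0])
signal_down = band_signal([0.0] * 29 + [-1.0])
```

Here `signal_up` is -1 and `signal_down` is 1. With a band width of only 0.5 standard deviations the rule triggers often, so a real implementation would also need an exit condition, which this sketch leaves out.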

**Validating the Investment Hypothesis**

A pairs trading strategy requires an economic relationship between the two stocks. In my strategy, therefore, both stocks in a pair are always chosen from the same industry. To do this, I create a pipeline that filters stocks by industry classification and trading volume. I then use the output of this pipeline to calculate the mean-reversion rate and the augmented Dickey-Fuller (adfuller) statistic for each pair.

To estimate the mean-reversion rate, I first rewrite the OU process equation as

dXt / dt = θ * ( μt - Xt ) + ( σdWt / dt )

Then, after creating variables for ( dXt / dt ) and ( μt - Xt ), I run the following regression:

dXt / dt = α + β * ( μt - Xt ) + ε

and use β as an approximation for θ i.e. the mean-reversion rate.
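A minimal sketch of this regression, written in plain Python rather than the notebook's actual code, is shown below. It assumes forward differences with a unit time step (dt = 1) and takes the mean series μt as an input, since the write-up does not state how μt is computed at this step. The function name `estimate_theta` and the synthetic test series are illustrative only.

```python
def estimate_theta(spread, mu, dt=1.0):
    """OLS fit of dX_t / dt = alpha + beta * (mu_t - X_t); returns (alpha, beta)."""
    steps = range(len(spread) - 1)
    # Regressand: forward difference of the spread, divided by dt.
    y = [(spread[t + 1] - spread[t]) / dt for t in steps]
    # Regressor: deviation of the spread from its mean at the start of the step.
    x = [mu[t] - spread[t] for t in steps]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta


# Noise-free synthetic spread that reverts to 1.0 at a rate of 0.2 per step.
series = [0.0]
for _ in range(50):
    series.append(series[-1] + 0.2 * (1.0 - series[-1]))
alpha, beta = estimate_theta(series, [1.0] * len(series))
```

On this noise-free series the fit recovers the reversion rate of 0.2 used to generate it. On real spreads the estimate is noisy and depends on the sampling interval, so rates are only comparable across pairs when they are computed at the same frequency.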

The stock pairs with the highest mean-reversion rates and an adfuller statistic below 0.1 (indicating co-integration) are then chosen for inclusion in the backtest. The attached notebook demonstrates this process for stocks in the energy sector. The seventh cell in the notebook shows the twenty stock pairs with the highest mean-reversion rates. The eighth cell shows the adfuller statistic for all of these pairs. The final two cells arrange the stock labels into the input that is then copied into the backtest.

The backtest is run from July 2018 to July 2019 on minute-level data and includes pairs from the energy and technology industries. The strategy gives a 5.13% return and a Sharpe ratio of 2.19.

My key deviation from the paper is in the way I select the stock pairs. The paper is quite vague on this point and only briefly mentions that the stock pairs are chosen on the basis of mean-reversion rates and some other characteristics. The exact methodology is not described in any detail. My method of selecting pairs on the basis of mean-reversion rates and co-integration is therefore improvised.

A potential shortcoming of my strategy is that my method for selecting pairs has a look-ahead (forward-looking) bias, since the mean-reversion rates are calculated over the same year in which the backtest is run. The main reason for this bias is that I was unable to include the pair selection method in the backtest environment. However, my strategy does support the hypothesis that co-integrated pairs with high mean-reversion rates are likely to provide good returns. I tested this for several years, and the backtest always showed positive returns. On the other hand, including pairs with very low mean-reversion rates always resulted in losses.

**Conclusion**

Overall, my strategy shows that trading on pairs with high mean-reversion rates and a high degree of co-integration can be profitable. Although the returns are ultimately lower than the market's, that is to be expected, since pairs trading returns tend not to track the market. In fact, the backtest shows that the strategy provided positive returns even when market returns turned to losses. Therefore, this strategy may be recommended as a safe bet, but not one that is likely to result in huge returns.