pipeline pair cointegration notebook (3700 cointegrated pairs)

I combined Delaney's "Researching a Pairs Trading Strategy" notebook with a simple pipeline I found somewhere (I can't figure out where I got it right now, but shoutout to the unknown stranger for a good first-pass pipeline).

The pipeline returns about 1000 securities that, for a variety of reasons, are good for trading. Of these, I check for pairwise cointegration over the last year whenever two securities belong to the same Morningstar sector. It takes about 15 to 20 minutes to run and yields 3700 pairs with p-values below 5%.
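For anyone who can't see the notebook, the same-sector screen described above can be sketched roughly as follows. This is not the notebook's code: `same_sector_pairs`, `cointegrated_pairs`, and the `sector_by_symbol` mapping are hypothetical names, and the sketch assumes prices are already loaded as equal-length daily series. The default test is `coint` from statsmodels, whose second return value is the p-value.

```python
from collections import defaultdict
from itertools import combinations


def same_sector_pairs(sector_by_symbol):
    """Yield every unordered pair of symbols that share a sector code."""
    by_sector = defaultdict(list)
    for symbol, sector in sector_by_symbol.items():
        by_sector[sector].append(symbol)
    for members in by_sector.values():
        for a, b in combinations(sorted(members), 2):
            yield a, b


def cointegrated_pairs(prices, sector_by_symbol, alpha=0.05, test=None):
    """Return [(a, b, pvalue)] for same-sector pairs with pvalue < alpha.

    prices: mapping symbol -> sequence of daily closes (equal lengths).
    test:   callable(y0, y1) -> (score, pvalue, ...). Defaults to the
            statsmodels Engle-Granger test, imported only when needed.
    """
    if test is None:
        from statsmodels.tsa.stattools import coint as test
    hits = []
    for a, b in same_sector_pairs(sector_by_symbol):
        pvalue = test(prices[a], prices[b])[1]
        if pvalue < alpha:
            hits.append((a, b, pvalue))
    # Smallest p-values first.
    return sorted(hits, key=lambda hit: hit[2])
```

Restricting to same-sector pairs cuts the number of tests well below the all-pairs count, which is presumably why the run finishes in minutes rather than hours.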

I haven't gotten around to building a trading strategy based on this system yet, but I thought the heat map looked really cool, so it's worth posting here even if only for that reason.

6 responses

The difficulty with this is that p-values of 5% are not really significant anymore when you've drawn from so many trials.
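To make that concrete: if N independent tests are run on pairs that are not cointegrated at all, roughly 0.05 × N of them will still come in under 5% by chance. Two standard corrections are Bonferroni and Benjamini-Hochberg. Here is a minimal pure-Python sketch of both, using made-up p-values rather than anything from the notebook. Note that Benjamini-Hochberg assumes independent (or positively dependent) tests, which is questionable here because many pairs share a stock.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject only p-values at or below alpha / number_of_tests."""
    threshold = alpha / len(pvalues)
    return [p <= threshold for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls false discovery rate)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k with p_(k) <= k / m * alpha.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis ranked at or below that cutoff.
    reject = [False] * m
    for i in order[:cutoff]:
        reject[i] = True
    return reject


# Illustrative p-values only, not results from the notebook.
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
naive_hits = sum(p < 0.05 for p in example)
bonferroni_hits = sum(bonferroni(example))
bh_hits = sum(benjamini_hochberg(example))
```

With these ten values, five pass the naive 5% cut, but only two survive Benjamini-Hochberg and one survives Bonferroni.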

Well, I'm not really as clued up on significance and critical values as I would like to be, but looking at the coint() docs and the MacKinnon paper they reference for determining critical values, it appears to me that the critical value for each test is determined solely by the number of variables being cointegrated (i.e. 2 in each case) and the number of samples used for the test (i.e. the number of trading days over the year, which is 250 or so).

All of which is to say that, to quote the docs, "The Null hypothesis is that there is no cointegration, the alternative hypothesis is that there is cointegrating relationship.", and I think this hypothesis is limited to only the pair I am testing rather than to the overall 1000-stock/1,000,000-pair system.

As I said, though, this really isn't my area of expertise, so my interpretation of the results I have obtained could very well be way off.

My understanding is that if I wished to compare these results across the entire system, I would need to record the scores from each pair's test and then calculate critical values from the mean and standard deviation of the scores for the entire universe? But even that is probably not useful, as it would be entirely in-sample, so in reality I would probably have to replicate MacKinnon's simulation-based approach to determining critical values for a system of this size before I could actually interpret the scores across the entire system.
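The simulation idea can be sketched for a single pair as follows: generate many pairs of independent random walks (so the null of no cointegration is true by construction), compute the test statistic for each, and read the critical value off the empirical distribution. This is a simplified illustration, not MacKinnon's procedure and not what coint() does internally: the statistic here is a plain Dickey-Fuller t-statistic on the OLS residuals with no lag terms, and all function names are hypothetical. Extending it to the whole system would mean simulating many series at once and recording, say, the most extreme statistic across all pairs in each run.

```python
import random
from math import sqrt


def engle_granger_stat(y, x):
    """Dickey-Fuller t-statistic (no lags) on residuals of y = a + b*x."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    resid = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
    # Regress the change in the residual on its lagged level (no constant).
    lagged = resid[:-1]
    diffs = [resid[t] - resid[t - 1] for t in range(1, n)]
    denom = sum(e * e for e in lagged)
    rho = sum(e * d for e, d in zip(lagged, diffs)) / denom
    sse = sum((d - rho * e) ** 2 for e, d in zip(lagged, diffs))
    std_err = sqrt(sse / (len(diffs) - 1) / denom)
    return rho / std_err


def random_walk(rng, n):
    """Cumulative sum of n standard normal steps."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def simulated_critical_value(n_obs=250, n_sims=500, level=0.05, seed=0):
    """Empirical lower-tail quantile of the statistic under the null."""
    rng = random.Random(seed)
    stats = sorted(
        engle_granger_stat(random_walk(rng, n_obs), random_walk(rng, n_obs))
        for _ in range(n_sims)
    )
    return stats[int(level * n_sims)]
```

A pair would then count as cointegrated at the chosen level only if its statistic falls below the simulated critical value.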

Despite all that, my plan was to select random pairs from the 3700 pairs as my trading strategy needs additional pairs. I would then check for a short-term (say 3- or 6-month) cointegration using minute data and only trade the pair if I find one.

Please keep posting about this strategy; I tried something similar a few years ago, and found that most of the pairs were either garbage (spurious and/or in-sample only), or perfect (because they were ETFs on the same things) and therefore too low in variance to profit from with Quantopian. But, I am curious! I know some people do it successfully, I won't name names, but they are quiet about how they pick their pairs.

Out of curiosity I built an algorithm that trades those cointegrated pairs. While the current pair selection criterion is trivial and the results disappointing, as expected, it would be fairly easy to swap the pair selection logic for a smarter one while reusing the algorithm code. In the hope it can be useful to someone else, here is the algorithm.


I made a YouTube video about cointegration: