Pattern recognition based on zlib

Here is a simple example of a pattern recognition algorithm based on zlib, the general-purpose compression library. Thanks to James Jack for useful discussions and for coding the NCD (normalized compression distance) and CDM (compression-based dissimilarity measure) functions, and to Quantopian for enabling zlib.
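The NCD and CDM functions themselves are not reproduced in this post. As a rough sketch, assuming the standard definitions NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)) and CDM(x, y) = C(xy) / (C(x) + C(y)), where C(s) is the length of s after zlib compression and xy is the concatenation of x and y, they might look like this in Python. The helper name `compressed_size` and the choice of compression level 9 are my assumptions, not necessarily what the original code used.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of zlib-compressed data, used as the proxy C(x)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance.

    Close to 0 for near-identical inputs and close to 1 for unrelated
    inputs; real compressors can push it slightly above 1.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def cdm(x: bytes, y: bytes) -> float:
    """Compression-based dissimilarity measure, C(xy) / (C(x) + C(y)).

    Close to 0.5 for near-identical inputs and close to 1 for unrelated inputs.
    """
    return compressed_size(x + y) / (compressed_size(x) + compressed_size(y))
```

For example, `ncd(b"000000000000000011111111111111", b"111111111111111100000000000000")` scores the X and Y strings from the sample output in the reply; the exact value depends on the compression level and need not match the logged figure.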

For those interested in doing an intellectual "deep dive" into the topic, a starting point is:

M. Li, X. Chen, X. Li, B. Ma, and P. M. B. Vitanyi, "The similarity metric," IEEE Transactions on Information Theory, vol. 50, no. 12, pp. 3250-3264, Dec. 2004.
http://homepages.cwi.nl/~paulv/papers/similarity.pdf


I ran the algorithm on SPY & SH (S&P 500 ETF & S&P 500 short ETF, respectively). Sample output, with X & Y representing the coded prices (relative to their respective moving averages) over a 30-day trailing window:

2012-02-23 handle_data:53 INFO ----------------------------------
2012-02-23 handle_data:54 INFO X: 000000000000000011111111111111
2012-02-23 handle_data:55 INFO Y: 111111111111111100000000000000
2012-02-23 handle_data:56 INFO ----------------------------------
2012-02-23 handle_data:57 INFO NCD: 0.142857142857
2012-02-23 handle_data:58 INFO CDM: 0.571428571429
2012-02-23 handle_data:59 INFO ----------------------------------
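The post does not spell out the coding rule. One plausible reading of "coded prices (relative to their respective moving averages)", taken here purely as an assumption, is: write 1 when the price is above its trailing moving average and 0 otherwise. The function name `code_prices` and the default `ma_window` are hypothetical.

```python
def code_prices(prices, ma_window=10):
    """Code each price as '1' if above its trailing moving average, else '0'.

    Hypothetical rule: the original algorithm's exact coding and
    moving-average length are not given in the post. Each output
    character needs a full window, so the result has
    len(prices) - ma_window + 1 characters.
    """
    bits = []
    for i in range(ma_window - 1, len(prices)):
        window = prices[i - ma_window + 1 : i + 1]
        moving_average = sum(window) / ma_window
        bits.append("1" if prices[i] > moving_average else "0")
    return "".join(bits)
```

Keeping the last 30 characters, `code_prices(prices)[-30:]`, would give a 30-day window like the one logged above. Under this rule an exact mirror-image series (a constant minus the price) flips every bit that is not a tie; SH only approximately mirrors SPY, so its string could differ from the complement in a few places.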

One might have expected NCD & CDM both to be ~1 (indicating a high degree of dissimilarity). Instead, both indicate a relatively high similarity between SPY & SH (NCD << 1, and CDM ~0.5, the low end of its range, which is reached when the two strings are essentially redundant). The interpretation, I think, is that SPY & SH (as coded) have the same information content: in the sample above, Y is exactly the bit-by-bit complement of X. For example, if I am given the SPY time series, I can predict the SH time series, so long as I know that it moves in the opposite direction. I obtain a similar result for the pair SPY & IVV, which move in the same direction.
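As a consistency check on the logged numbers, assume C(X) = C(Y) = c (plausible here, since X and Y have the same run structure, but the compressed byte counts were not logged). With the usual definitions NCD = (C(XY) - min(C(X), C(Y))) / max(C(X), C(Y)) and CDM = C(XY) / (C(X) + C(Y)), the two reported values are consistent with each other:

```latex
\mathrm{CDM} = \frac{C(XY)}{2c} = \frac{4}{7} \;\Rightarrow\; C(XY) = \frac{8}{7}\,c,
\qquad
\mathrm{NCD} = \frac{C(XY) - c}{c} = \frac{\tfrac{8}{7}c - c}{c} = \frac{1}{7} \approx 0.142857
```

So the small NCD and the near-0.5 CDM say the same thing: compressing X and Y together costs only about c/7 more than compressing either one alone. With c = 14 bytes, for instance, C(XY) would be 16 bytes; these figures are illustrative, not measured.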


Hi Grant, I think I'd better start with R-squared, as it's something I should be able to understand easily, and it's pretty straightforward. For example, SH vs. SPY has an R-squared above 0.9, which is consistent with their being mirror images of each other.
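A minimal sketch of that R-squared calculation, assuming it is computed on daily returns rather than prices (the post doesn't say which). For a simple linear regression of one series on the other, R-squared equals the square of the Pearson correlation, so a strongly negative correlation such as SH vs. SPY still yields a value near 1. The function names are hypothetical.

```python
import math


def daily_returns(prices):
    """Simple daily returns: p[t] / p[t-1] - 1."""
    return [b / a - 1.0 for a, b in zip(prices, prices[1:])]


def r_squared(x, y):
    """R-squared of a one-variable linear fit, i.e. the squared Pearson correlation.

    Raises ZeroDivisionError if either series is constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation, in [-1, 1]
    return r * r  # sign is lost, so inverse series also score near 1
```

With hypothetical price lists `spy` and `sh`, the call would be `r_squared(daily_returns(spy), daily_returns(sh))`. Note that R-squared alone cannot tell a mirror-image pair from a same-direction pair like SPY & IVV; the sign of the correlation `r` does.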