Stationary subspace analysis

SSA is an algorithm that factorizes a multivariate time series into stationary and non-stationary components. Unfortunately, there is no Python implementation available. I ran the Java SSA toolbox on a random set of 10 XLE components using Yahoo daily prices to see whether it could find a mean-reverting stationary series. It appears to work.

You can also see the results of my experiment here.

Looking forward to comments and discussion on improvements, viability, etc.


Seems like a pretty cool trick.

The Wikipedia page is uninformative. The first citation describes the method in detail.

It's not clear from your Quora post what metrics you're measuring its success by.

This is essentially a method for finding co-integrated linear combinations of instruments, which I would expect to be substantially arbitraged out of most market prices at this stage.

Thanks for your feedback, Alex. I implemented a closed-form solution of SSA in Python, and here are the test results. It works, but it requires careful calibration of the block sizes (epoch lengths) and the number of available blocks.
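For reference, here is a minimal NumPy/SciPy sketch of such a closed-form procedure, assuming non-overlapping equal-length epochs and taking the generalized eigenvector with the smallest eigenvalue as the most stationary direction (the function name and toy data are mine, not from the toolbox):

```python
import numpy as np
from scipy import linalg

def ssa_stationary_weights(X, block_size):
    """X: (T, d) multivariate series. Returns weights whose projection
    has the most stable mean and covariance across blocks."""
    mus, sigs = [], []
    for i in range(0, X.shape[0] - block_size + 1, block_size):
        blk = X[i:i + block_size]
        mus.append(blk.mean(axis=0))
        sigs.append(np.cov(blk, rowvar=False))
    mus, sigs = np.asarray(mus), np.asarray(sigs)
    Mu, Sig = mus.mean(axis=0), sigs.mean(axis=0)
    invSig = np.linalg.inv(Sig)
    # penalize per-block deviations of mean and covariance from the average
    S = np.zeros_like(Sig)
    for m, s in zip(mus, sigs):
        S += np.outer(m, m) + 2.0 * s.dot(invSig).dot(s)
    S = S / len(mus) - np.outer(Mu, Mu) - 2.0 * Sig
    vals, vecs = linalg.eigh(S, Sig)   # generalized symmetric eigenproblem
    return vecs[:, 0]                  # eigenvalues ascending: most stationary

# toy check: two series share a random walk; the third is pure noise
rng = np.random.RandomState(1)
walk = rng.randn(2000).cumsum()
X = np.column_stack([rng.randn(2000) + walk,
                     rng.randn(2000) - walk,
                     rng.randn(2000)])
w = ssa_stationary_weights(X, 100)
proj = X.dot(w)   # the random walk should largely cancel out
```

On the toy data, the projection's variance is far smaller than that of the walk-dominated inputs, which is the behavior the trading idea relies on.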


Any suggestions as to how to create buy and sell rules from this research?

Would you guys be very patient and explain to a non-statistician (me) how you would apply this to a stock system?

I can see that a time series for a financial instrument is typically non-stationary: it contains trends, its variance increases and decreases over time, and it might contain seasonality, particularly in the commodities markets.

The above chart looks to a non-expert (me) to be stationary. It looks to be mean-reverting, and hence one could profit from an overbought/oversold type of system.

If this is the case, where have the non-stationary elements gone? Is there another chart which shows just the non-stationary elements? Can one use such a procedure on a single time series to tell one when to apply mean-reversion and when to apply momentum strategies?

Is it predictive or merely historic?

Hi Anthony,

You are absolutely right. This procedure separates the stationary and non-stationary components. If you want, you can also extract the non-stationary components from this procedure: simply use the eigenvector corresponding to the largest eigenvalue, and you get the non-stationary part.

Basically, given X non-stationary stock prices, it finds weights such that the combined portfolio is stationary.
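Concretely, an illustrative sketch (with made-up prices and hypothetical weights) of how the weighted portfolio and its z-score signal are formed:

```python
import numpy as np

rng = np.random.RandomState(4)
prices = 50 + np.cumsum(rng.randn(500, 3), axis=0)  # three fake price paths
w = np.array([0.5, -0.3, 0.2])                      # hypothetical SSA weights
portfolio = prices.dot(w)                           # combined "spread" series

# z-score of the latest portfolio value against its history
z = (portfolio[-1] - portfolio.mean()) / portfolio.std()
# in the attached algorithm, |z| above a threshold triggers a
# mean-reversion trade against the sign of z
```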

The red line in the above graph is historic (in-sample) data and the blue line is out-of-sample. Attached is a demo algorithm that trades using this research. To increase trading activity, add more baskets.

Best regards,

import math
import numpy as np
import scipy as sp
import scipy.linalg  # ensures sp.linalg is available
from sklearn.covariance import EmpiricalCovariance

def initialize(context):
    context.days = 0
    context.hasPosition = False
    context.stocks = None
    context.positionStocks = None
    context.mean = 0
    context.std = 0
    context.posSign = 0
    context.signal = 0
    context.weights = None

def before_trading_start(context, data):
    # while in a position, just count holding days
    if context.hasPosition:
        context.days += 1
        return

    # otherwise refresh the universe: top 15 energy-sector stocks by market cap
    fundamental_df = get_fundamentals(query(fundamentals.valuation.market_cap)
        .filter(fundamentals.asset_classification.morningstar_sector_code == 309)
        .filter(fundamentals.valuation.market_cap != None)
        .filter(fundamentals.company_reference.primary_exchange_id != "OTCPK")
        .filter(fundamentals.share_class_reference.security_type == 'ST00000001')
        .filter(~fundamentals.share_class_reference.symbol.contains('_WI'))
        .filter(fundamentals.share_class_reference.is_primary_share == True)
        .filter(fundamentals.share_class_reference.is_depositary_receipt == False)
        .order_by(fundamentals.valuation.market_cap.desc())).T
    context.stocks = fundamental_df[0:15].index

def handle_data(context, data):
    monitorPosition(context, data)

    if context.hasPosition:
        return

    prices = data.history(context.stocks, "price", 200, "1d")
    prices = prices.dropna(axis=1)

    # split the in-sample window into non-overlapping blocks (epochs);
    # the block length is set to the number of assets
    mu = []
    sig = []
    BlockSize = np.shape(prices.values)[1]
    insample = prices.values

    for i in range(0, np.shape(insample)[0] - BlockSize, BlockSize):
        x = insample[i:i + BlockSize, :]
        mu.append(np.mean(x, axis=0))
        sig.append(EmpiricalCovariance().fit(x).covariance_)

    mu = np.asarray(mu)
    sig = np.asarray(sig)

    # average mean and covariance across blocks
    Mu = np.mean(mu, axis=0)
    N = np.shape(sig)[0]
    Sig = np.zeros(np.shape(sig[0, :, :]))
    for i in range(0, N):
        Sig += sig[i]
    Sig /= N * 1.
    invSig = np.linalg.inv(Sig)

    # penalize per-block deviations of mean and covariance from the average
    S = np.zeros(np.shape(Sig))
    for i in range(0, N):
        S = S + np.outer(mu[i], mu[i]) + 2 * np.dot(sig[i], np.dot(invSig, sig[i]))
    S = S / N - np.outer(Mu, Mu) - 2 * Sig

    # generalized eigenproblem; the eigenvector with the smallest eigenvalue
    # gives the most stationary linear combination
    phi, lambdas = sp.linalg.eig(S, Sig)
    idx = phi.argsort()
    stat = lambdas[:, idx[0]]

    # project prices onto the stationary weights and fit an OU process
    pvalue = np.dot(prices.values, stat)
    context.mean, context.std = fitOU(pvalue)
    signal = (pvalue[-1] - context.mean) / context.std
    abspval = np.dot(prices.values[-1, :], np.abs(stat))

    if abs(signal) > 0.25:
        context.signal = signal
        print(signal)
        openPos(-np.sign(signal), prices, stat, abspval, data, context)

def openPos(side, prices, W, abspval, data, context):
    # allocate portfolio value across the basket in proportion to the SSA
    # weights, normalized by the gross (absolute) basket value
    context.positionStocks = []
    wsum = 0
    for i, sid in enumerate(prices):
        val = W[i] * data.current(sid, "price") / abspval * context.portfolio.portfolio_value * side
        order_target_value(sid, val)
        wsum += val
        context.positionStocks.append(sid)
    context.days = 0
    context.posSign = side
    context.weights = W
    context.hasPosition = True

def monitorPosition(context, data):
    if not context.hasPosition:
        return

    prices = []
    for sid in context.positionStocks:
        prices.append(data.current(sid, "price"))

    pvalue = np.dot(prices, context.weights)
    signal = (pvalue - context.mean) / context.std

    # exit when the spread reverts through the mean, blows out past
    # 6 standard deviations, or the position ages past 35 days
    if context.posSign > 0 and (signal > 0 or signal < -6):
        closeAll(context, data, signal)
    elif context.posSign < 0 and (signal < 0 or signal > 6):
        closeAll(context, data, signal)
    elif context.days > 35:
        closeAll(context, data, signal)

def closeAll(context, data, signal):
    context.positionStocks = None
    context.mean = 0
    context.std = 0
    context.posSign = 0
    context.weights = None
    context.hasPosition = False
    log.info(signal)
    for sid in context.portfolio.positions:
        order_target(sid, 0)

def fitOU(S):
    # least-squares AR(1) fit of an Ornstein-Uhlenbeck process:
    # S[t+1] = a*S[t] + b + noise, then map (a, b, sd) back to OU parameters
    n = np.shape(S)[0] - 1
    Sx  = np.sum(S[:-1])
    Sy  = np.sum(S[1:])
    Sxx = np.dot(S[:-1], S[:-1])
    Sxy = np.dot(S[:-1], S[1:])
    Syy = np.dot(S[1:], S[1:])

    a  = (n * Sxy - Sx * Sy) / (n * Sxx - math.pow(Sx, 2))
    b  = (Sy - a * Sx) / n
    sd = math.sqrt((n * Syy - math.pow(Sy, 2) - a * (n * Sxy - Sx * Sy)) / n / (n - 2))
    delta  = 1. / 252.
    mu     = b / (1 - a)                                                # long-run mean
    sigma  = sd * math.sqrt(-2 * math.log(a) / delta / (1 - math.pow(a, 2)))  # diffusion volatility
    return mu, sigma
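The fitOU routine can be sanity-checked offline. A standalone sketch (fitOU reproduced here so the snippet runs outside Quantopian; the parameter values are mine) that simulates an Ornstein-Uhlenbeck path via its exact AR(1) discretization and checks that the fit roughly recovers the true parameters:

```python
import math
import numpy as np

def fitOU(S):
    # same least-squares AR(1) fit as in the algorithm above
    n = np.shape(S)[0] - 1
    Sx, Sy = np.sum(S[:-1]), np.sum(S[1:])
    Sxx = np.dot(S[:-1], S[:-1])
    Sxy = np.dot(S[:-1], S[1:])
    Syy = np.dot(S[1:], S[1:])
    a = (n * Sxy - Sx * Sy) / (n * Sxx - Sx ** 2)
    b = (Sy - a * Sx) / n
    sd = math.sqrt((n * Syy - Sy ** 2 - a * (n * Sxy - Sx * Sy)) / n / (n - 2))
    delta = 1.0 / 252.0
    mu = b / (1 - a)
    sigma = sd * math.sqrt(-2 * math.log(a) / delta / (1 - a ** 2))
    return mu, sigma

# simulate an OU path with known parameters (exact discretization)
theta, mu_true, sigma_true, dt, n = 5.0, 10.0, 0.5, 1.0 / 252.0, 5000
a = math.exp(-theta * dt)                  # implied AR(1) coefficient
noise_sd = sigma_true * math.sqrt((1 - a * a) / (2 * theta))
rng = np.random.RandomState(0)
x = np.empty(n)
x[0] = mu_true
for t in range(1, n):
    x[t] = a * x[t - 1] + mu_true * (1 - a) + noise_sd * rng.randn()

mu_hat, sigma_hat = fitOU(x)   # should come out near (10.0, 0.5)
```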


Hugely valuable explanation, thank you. I have cloned the algo (for which more thanks) and will turn my attention to it once I have got a little further forward in my foray into machine learning (using Noddy textbooks that even I can understand).

I am still trying to answer the following question which I have been asking myself for a while now:

Does increased statistical and ML sophistication translate into better trading? Does it (can it) enhance or replace "traditional" technical analysis solutions converted to systematic strategies, such as the simple momentum system I posted here?

Will my research overcome my scepticism and lead me to withdraw my accusations of "mathturbation" (a phrase famously used by sometime physicist and trader Ed Seykota)?

Will I manage to verify the following academic paper? And if so, will it stand up to further out-of-sample training/testing? Will it achieve a high CAGR in actual trading, or will it turn out to have been overfitted to the data?

45% CAGR from ML enhanced Momentum Trading

That is an interesting paper, Anthony.