Median Reversion Strategy for On-Line Portfolio Selection

Hi all,
I have looked at this algorithm and its backtest.
Could someone give me a detailed explanation of the logic behind this algorithm, and why it is so successful?
I know my question is a bit vague, but any help would be greatly appreciated.

Many thanks all,
Best Regards,
Andrew

"""
Robust Median Reversion Strategy for On-Line Portfolio Selection,
Dingjiang Huang, Junlong Zhou, Bin Li, Steven C.H. Hoi, and Shuigeng
Zhou. International Joint Conference on Artificial Intelligence, 2013.
http://ijcai.org/papers13/Papers/IJCAI13-296.pdf
"""
import numpy as np
from pytz import timezone
from scipy.optimize import minimize_scalar
import pandas as pd

window = 15 # trailing window length in bars
trading_frequency = 5 # rebalance every N bars (not defined in the original listing; adjust as needed)

def initialize(context):
    """
    Initialize the context object.

    The context object is passed to every call of handle_data
    and is used to store state between calls.

    :param context: context object
    :returns: None
    """
    context.stocks = [sid(19662),  # XLY Consumer Discretionary SPDR Fund
                      sid(19656),  # XLF Financial SPDR Fund
                      sid(19658),  # XLK Technology SPDR Fund
                      sid(19655),  # XLE Energy SPDR Fund
                      sid(19661),  # XLV Health Care SPDR Fund
                      sid(19657),  # XLI Industrial SPDR Fund
                      sid(19659),  # XLP Consumer Staples SPDR Fund
                      sid(19654),  # XLB Materials SPDR Fund
                      sid(19660)]  # XLU Utilities SPDR Fund

    context.prices = np.zeros([window, len(context.stocks)])

    context.bar_count = 0

    context.eps = 5  # change epsilon here
    context.init = False

def handle_data(context, data):
    """
    The main processing function.

    This function is called by the Quantopian backend whenever a market
    event occurs for any of the algorithm's specified securities.

    :param context: context object
    :param data: a dictionary of market data keyed by security id,
        representing a snapshot of the algorithm's universe as of
        when this method was called
    """
    record(cash=context.portfolio.cash)

    if not context.init:
        # initialization: buy an equal amount of each security
        part = 1. / len(context.stocks)
        for stock in context.stocks:
            order_target_percent(stock, part)
        context.init = True
        accumulator(context, data)
        return

    # accumulate bars
    accumulator(context, data)

    if context.bar_count < window:
        return

    if context.bar_count % trading_frequency != 0:
        return

    if get_open_orders():
        return

    # skip the bar if any stock did not trade
    for stock in context.stocks:
        if data[stock].datetime < get_datetime():
            return

    parts = rmr_strategy(context.portfolio, context.stocks, data,
                         context.prices, context.eps)

    # rebalance the portfolio according to the new allocation
    for stock, portion in zip(context.stocks, parts):
        order_target_percent(stock, portion)

def rmr_strategy(portfolio, stocks, data, prices, eps):
    """
    Core of the Robust Median Reversion strategy implementation.

    :param portfolio: portfolio object
    :param stocks: list of sid objects used in the algo
    :param data: market event object
    :param prices: historical prices as a numpy matrix
    :param eps: epsilon value
    :returns: new allocation for the portfolio securities
    """
    # current portfolio weights
    b_t = []
    for stock in stocks:
        b_t.append(portfolio.positions[stock].amount * data[stock].price)

    b_t = np.divide(b_t, np.sum(b_t))

    # predicted price relatives: L1 median over the window / last price
    x_tilde = np.zeros(len(stocks))
    for i, stock in enumerate(stocks):
        x_tilde[i] = l1_median(prices[:, i]) / prices[-1, i]

    x_bar = x_tilde.mean()

    # calculate terms for the step size lambda (lam)
    dot_prod = np.dot(b_t, x_tilde)
    num = eps - dot_prod
    denom = np.linalg.norm(x_tilde - x_bar) ** 2

    b = b_t
    # guard against the divide-by-zero case
    if denom != 0.0:
        b = b_t + max(0, num / denom) * (x_tilde - x_bar)

    return simplex_projection(b)
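For anyone asking how the rebalancing actually works: whenever the expected growth of the current weights against the predicted price relatives falls below epsilon, the weights are pushed toward the assets predicted to revert upward, then projected back onto the simplex. Here is a standalone sketch with hypothetical weights, price relatives, and epsilon; the projection helper is a condensed version of the Duchi et al. routine used in the algo:

```python
import numpy as np

def simplex_projection(v, b=1):
    # Duchi et al. (ICML 2008) projection onto the probability simplex
    v = np.clip(np.asarray(v, dtype=float), 0.0, None)
    u = np.sort(v)[::-1]
    sv = np.cumsum(u)
    rho = np.nonzero(u > (sv - b) / np.arange(1, len(v) + 1))[0][-1]
    theta = max(0.0, (sv[rho] - b) / (rho + 1))
    return np.clip(v - theta, 0.0, None)

# Hypothetical inputs: current weights and predicted price relatives
# (x_tilde[i] would be l1_median(window) / last price in the real algo)
b_t = np.array([0.25, 0.25, 0.25, 0.25])
x_tilde = np.array([1.05, 0.95, 1.10, 0.90])
eps = 5  # same epsilon as context.eps above

x_bar = x_tilde.mean()
num = eps - np.dot(b_t, x_tilde)
denom = np.linalg.norm(x_tilde - x_bar) ** 2
lam = max(0.0, num / denom) if denom != 0.0 else 0.0

# step toward the predicted relatives, then project back onto the simplex
b_new = simplex_projection(b_t + lam * (x_tilde - x_bar))
```

With this aggressive epsilon the new allocation concentrates on the asset with the largest predicted reversion, which is why the strategy trades so heavily.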

def simplex_projection(v, b=1):
    """
    Project a vector onto the simplex domain.

    Implemented according to the paper: Efficient Projections onto the
    l1-Ball for Learning in High Dimensions, John Duchi, et al., ICML 2008.
    Implementation: 2011 June 17 by [email protected] AT pmail.ntu.edu.sg
    Optimization problem: min_{w} \| w - v \|_{2}^{2}
    s.t. \sum_{i=1}^{m} w_{i} = b, w_{i} \geq 0

    Input: a vector v \in R^{m} and a scalar b > 0 (default=1)
    Output: projection vector w

    :Example:
    >>> proj = simplex_projection([.4, .3, -.4, .5])
    >>> print proj
    array([ 0.33333333,  0.23333333,  0.        ,  0.43333333])
    >>> print proj.sum()
    1.0

    Original MATLAB implementation: John Duchi ([email protected])
    Python port: Copyright 2012 by Thomas Wiecki ([email protected]).
    """
    v = np.asarray(v)
    p = len(v)

    # zero out negative entries, then sort descending
    v = (v > 0) * v
    u = np.sort(v)[::-1]
    sv = np.cumsum(u)

    # index of the last coordinate satisfying the KKT condition
    rho = np.where(u > (sv - b) / np.arange(1, p + 1))[0][-1]
    theta = np.max([0, (sv[rho] - b) / (rho + 1)])
    w = v - theta
    w[w < 0] = 0
    return w

def l1_median(x):
    """
    Compute the L1 median (spatial median) using scipy.optimize.minimize_scalar.

    :param x: a numpy 1D ndarray (vector) of values
    :returns: scalar estimate of the L1 median of the values
    """
    a = float(np.amin(x))
    b = float(np.amax(x))

    res = minimize_scalar(dist_sum, bounds=(a, b), args=tuple(x),
                          method='bounded')

    return res.x

def dist_sum(m, *args):
    """
    1D sum of Euclidean distances.

    :param m: scalar position
    :param args: tuple of positions
    :returns: sum of absolute distances from m
    """
    s = 0
    for x in args:
        s += abs(x - m)

    return s
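In one dimension, the minimizer of the summed absolute distances is just the ordinary median, which is why this estimate is robust to price outliers. A quick standalone check with made-up prices:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def dist_sum(m, *args):
    # total absolute deviation of the positions in args from m
    return sum(abs(x - m) for x in args)

# Hypothetical price series with one large outlier
prices = np.array([10.0, 10.5, 9.8, 30.0, 10.2])

res = minimize_scalar(dist_sum, bounds=(prices.min(), prices.max()),
                      args=tuple(prices), method='bounded')

# The minimizer lands at the ordinary median (10.2),
# essentially unaffected by the 30.0 outlier
```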

def accumulator(context, data):
    """Accumulate the latest bar of prices into the rolling window."""
    if context.bar_count < window:
        for i, stock in enumerate(context.stocks):
            context.prices[context.bar_count, i] = data[stock].price
    else:
        context.prices = np.roll(context.prices, -1, axis=0)
        for i, stock in enumerate(context.stocks):
            context.prices[-1, i] = data[stock].price

    context.bar_count += 1

########################################################
# Slippage model to trade at the open or at a fraction of the open-close range.
class TradeAtTheOpenSlippageModel(slippage.SlippageModel):
    '''Slippage model that allows trading at the open
    or at a fraction of the open-to-close range.
    '''
    def __init__(self, fractionOfOpenCloseRange):
        '''Constructor: takes the fraction of the open-to-close range to add
        (subtract) from the open, to model executions more optimistically.
        '''
        # Store the fraction of the open-close range to take as the execution price
        self.fractionOfOpenCloseRange = fractionOfOpenCloseRange

    def process_order(self, trade_bar, order):
        openPrice = trade_bar.open_price
        closePrice = trade_bar.price

        ocRange = closePrice - openPrice
        ocRange = ocRange * self.fractionOfOpenCloseRange
        if ocRange != 0.0:
            targetExecutionPrice = openPrice + ocRange
        else:
            targetExecutionPrice = openPrice

        log.info('\nOrder:{0} open:{1} close:{2} exec:{3} side:{4}'.format(
            order.sid, openPrice, closePrice, targetExecutionPrice,
            'buy' if order.amount > 0 else 'sell'))

        # Create the transaction using the new price we've calculated.
        return slippage.create_transaction(
            order,
            targetExecutionPrice,
            order.amount
        )

That's very impressive. Thanks for sharing this with us. I have no input because this is frankly beyond my skill level.

What was your leverage at for this backtest?

I checked. No leverage.

I believe it has to do with slippage and commissions. I put in the following slippage and commission settings:

Slippage: assumes that your entire buy order will be filled 0.01 higher and a sell order will be filled 0.01 lower (this could still be conservative depending on order size and liquidity).

Commission: slightly more expensive than IB's 0.005 per share with a $1 minimum: set_commission(commission.PerShare(cost=0.01, min_trade_cost=1)).

With these parameters, the algorithm went to -99% returns in the two-month period between the beginning of January and the end of February this year. The algorithm is so profitable because of very unrealistic transaction costs and slippage.

As a follow-up, the original model has the following line of code (a custom slippage model): set_slippage(TradeAtTheOpenSlippageModel(0.1)). Even if we assume that this model is as accurate as possible, we still cannot ignore the commission scheme, which has been set to: set_commission(commission.PerTrade(cost=0)). Thus, if we keep their slippage model and just change the commission scheme to the one mentioned in my prior post, the algorithm, when run from Jan 4th 2015 to present on minute data, hits -99% returns by mid-March. This tells me that the original algorithm's hyperinflated results are almost entirely due to the lack of commissions built into the system.

Leverage is the ratio of the amount spent vs. initial capital. If borrowing occurs, leverage goes above 1 (borrowing, negative cash), and that condition is called margin. As soon as one's first buy occurs, leverage is no longer zero. Examples: the leverage for a first purchase of $5 (including commission) on $10 initial capital would be 0.5 (one half). Spending $15 on $10 initial capital would be 1.5.
Most of the time when we refer to "leverage" we really mean 'margin': negative cash, borrowing.
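The leverage arithmetic in the example above, spelled out (the dollar amounts are the hypothetical ones from the post):

```python
initial_capital = 10.0

# Spending $5 of $10 initial capital: no borrowing, leverage below 1
leverage = 5.0 / initial_capital            # 0.5

# Spending $15 on $10 initial capital: $5 is borrowed (negative cash),
# so leverage exceeds 1 and the account is on margin
leverage_margined = 15.0 / initial_capital  # 1.5
```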

Here's another way to keep an eye out for margin. It won't tax the logging limits too much because it only logs new highs (or lows in the second one), and can be easily dropped into an algo (it does its own initialization for storing the previous value to be able to compare):

def handle_data(context, data):
    # Log any new leverage high
    if 'leverage_high' not in context:
        context.leverage_high = 0
    if context.account.leverage > context.leverage_high:
        context.leverage_high = context.account.leverage
        log.info('leverage_high {}'.format(context.leverage_high))


Sometimes an easier way to think of it is lowest cash used instead, and this does that:

def handle_data(context, data):
    # Log any new lowest cash level
    if 'cash_low' not in context:
        context.cash_low = context.portfolio.starting_cash
    if context.portfolio.cash < context.cash_low:
        context.cash_low = context.portfolio.cash
        log.info('cash low {}'.format(int(context.cash_low)))


You used record() for cash; that's great.
Side note: one thing folks ought to keep in mind is that the custom chart necessarily smooths the data to look so nice, so mousing over it can show averaged values that never actually happened. For example, if you record just two values, 0 on some bars and 100 on others, the chart will sometimes show a value of, say, 60 due to smoothing.
Lines like those above provide precision when needed.

By the way, did you know you can click custom chart legend items to toggle them off and on? I just discovered that a few days ago, useful sometimes when two or more recorded sets of values are in widely different ranges from each other.

You'll find that the algo above uses a lot of leverage/margin. It had already spent twice the initial capital five days in (leverage of 2.0040087117), and three times the initial capital a couple of years later. I didn't have the patience to wait for the full test.

It is useful to know that, unfortunately, the chart and metrics currently ignore margin (negative cash, borrowing). On our own we can calculate profit divided by the maximum amount spent (including margin), which gives us an idea of code merit no matter how much margin happens to be used as code changes are made. And it feels great not to have to put forth so much effort to stay as close as possible to 100% utilization of initial capital. Good luck.
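The "profit divided by maximum spent" idea can be made concrete; the starting cash, lowest cash level, and ending value below are hypothetical:

```python
starting_cash = 100000.0
lowest_cash = -100000.0    # at the worst point, $100k was borrowed (negative cash)
ending_value = 250000.0

# Maximum amount actually put to work: all initial capital plus the borrowing
max_spent = starting_cash - lowest_cash   # 200000.0

profit = ending_value - starting_cash     # 150000.0

# Margin-aware merit: 0.75, versus a naive 1.5 return on initial capital alone
merit = profit / max_spent
```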