Collab request on Robust Median Reversion Strategy

I came across this algorithm write-up:

Robust Median Reversion Strategy for On-Line Portfolio Selection

http://ijcai.org/papers13/Papers/IJCAI13-296.pdf

Abstract

On-line portfolio selection has been attracting increasing interest from the artificial intelligence community in recent decades. Mean reversion, as one of the most frequent patterns in financial markets, plays an important role in some state-of-the-art strategies. Though successful on certain datasets, existing mean reversion strategies do not fully consider noise and outliers in the data, leading to estimation error and thus non-optimal portfolios, which results in poor performance in practice. To overcome this limitation, we propose to exploit the reversion phenomenon with a robust L1-median estimator, and design a novel on-line portfolio selection strategy named "Robust Median Reversion" (RMR), which makes optimal portfolios based on the improved reversion estimation. Empirical results on various real markets show that RMR can overcome the drawbacks of existing mean reversion algorithms and achieve significantly better results. Finally, RMR runs in linear time, and thus is suitable...
Any interest in collaborating to code it? I think I'd prefer to work on github, versus the new Quantopian tool (although we could try it), since we'd have version control, bug reporting, etc.

--Grant

53 responses

Hi Grant,

Great article - thanks for sharing; it would be interesting to see the strategy coded up.

While we don't (yet!) have version control, in collaboration you can see the changes being made in real time. The changes are auto-saved for all users. There is also a chat window to develop the strategy, note possible issues, and discuss iteration ideas.

Best of luck!


I am down. Send it to me. [email protected]

Hello Joshua,

I'd like to read through the paper carefully (and possibly some of the references), before jumping into coding. One piece of necessary code has already been written:

import numpy as np

def simplex_projection(v, b=1):
    """Projection of the vector v onto the simplex domain.

    Implemented according to the paper: Efficient projections onto the
    l1-ball for learning in high dimensions, John Duchi, et al. ICML 2008.
    Implementation: 2011 June 17 by [email protected] AT pmail.ntu.edu.sg
    Optimization problem: min_{w} \| w - v \|_{2}^{2}
    s.t. \sum_{i=1}^{m} w_{i} = z, w_{i} \geq 0

    Input: A vector v \in R^{m}, and a scalar z > 0 (default=1)
    Output: Projection vector w

    :Example:
    >>> proj = simplex_projection([.4, .3, -.4, .5])
    >>> print proj
    array([ 0.33333333,  0.23333333,  0.        ,  0.43333333])
    >>> print proj.sum()
    1.0

    Original matlab implementation: John Duchi ([email protected])
    Python-port: Copyright 2012 by Thomas Wiecki ([email protected]).
    """
    v = np.asarray(v)
    p = len(v)

    # Sort v into u in descending order
    v = (v > 0) * v
    u = np.sort(v)[::-1]
    sv = np.cumsum(u)

    rho = np.where(u > (sv - b) / np.arange(1, p+1))[0][-1]
    theta = np.max([0, (sv[rho] - b) / (rho+1)])
    w = (v - theta)
    w[w < 0] = 0
    return w

Perhaps I'll put it on github. --Grant
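For anyone who wants to convince themselves the projection is doing the right thing: a self-contained version of the same Duchi et al. routine, with a cheap numerical check that the output lands on the simplex and is closer to the input than a sample of random simplex points (assuming numpy; the input vector is just the docstring example):

```python
import numpy as np

def simplex_projection(v, b=1):
    """Euclidean projection of v onto {w : sum(w) = b, w >= 0} (Duchi et al. 2008)."""
    v = np.asarray(v, dtype=float)
    p = len(v)
    v = (v > 0) * v                      # drop negative entries first
    u = np.sort(v)[::-1]                 # sort descending
    sv = np.cumsum(u)
    rho = np.where(u > (sv - b) / np.arange(1, p + 1))[0][-1]
    theta = max(0.0, (sv[rho] - b) / (rho + 1))
    w = v - theta
    w[w < 0] = 0
    return w

v = np.array([.4, .3, -.4, .5])
w = simplex_projection(v)
print(w)        # approximately [0.3333, 0.2333, 0., 0.4333], summing to 1

# Cheap optimality check: the projection should be at least as close to v
# as any randomly sampled point on the simplex.
rng = np.random.RandomState(0)
d_star = np.linalg.norm(w - v)
for _ in range(1000):
    r = rng.dirichlet(np.ones(len(v)))
    assert d_star <= np.linalg.norm(r - v) + 1e-9
```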

I'm interested.

Some papers describing the "Modified Weiszfeld Algorithm" required to implement the Robust Median Reversion Strategy:

[EDIT] Another relevant reference: http://www.statistik.tuwien.ac.at/public/filz/papers/2011COST2.pdf

I set up a github file; feel free to contribute (I'm a github newbie, so just let me know if I need to set permissions, etc.):

https://github.com/gkiehne/quantopian/blob/master/l1_median.py

Grant

Josh, Ed,

I expect to read through some of the literature over the weekend. I don't have a software background, so you might consider how the program should be structured. It is basically the OLMAR algorithm shared several times on Quantopian. Although it slows down the backtesting, I think that the trailing window of data should be obtained with history at the minute level, but perhaps with trading limited to once per day.

Ed, I recall you provided a nice example of a finite state machine in Python. Might this be appropriate?

Grant

Hi Grant,

Let's implement the algo first, then make it nicer by using fsm or whatever is feasible to use. BTW, it would be good to use some chat/irc/IM for discussions. That would speed up development from my point of view. Do you have any preferences on that? gtalk/skype/irc?

Regarding working with git, I'd suggest you have a look at http://git-scm.com/book and https://github.com/erlang/otp/wiki/Writing-good-commit-messages

Regards,
Ed

Thanks Ed,

Sounds good. The first question to answer, I figure, is "Can the results reported in the paper be replicated on Quantopian?" This leads to the question of what data were used, and whether we can reconstruct them for backtesting.

Regarding chat, I came across https://gitter.im/ (still in beta), which claims to have github integration. Let me know your thoughts. Personally, I tend to be available early mornings/nights/weekends (Northeastern United States), and even then, sometimes have other competing demands on my time. So, "asynchronous mode" tends to work best for me.

Thanks for the github references.

Grant

may I suggest you try out the R&D framework I made: https://github.com/Novaleaf/QuantShim as it makes development/debugging of intraday algos a bit easier. if you do use it, let me know and i'll help out further if needed.

Thanks Jason,

I am tweaking a version of the OLMAR algorithm, and for now, using numpy.median instead of the L1 median (spatial median) called for in the paper. I'll post the backtest result here for folks to chew on.

Grant

Here's my first cut at the algorithm, using numpy.median instead of the L1 median.

I also added the algo to github:

https://github.com/gkiehne/quantopian/blob/master/rmr.py

Note that these settings were used, so the return is optimistic:

set_slippage(slippage.FixedSlippage(spread=0.00))


Grant

[Backtest result attached]
# Robust Median Reversion Strategy for On-Line Portfolio Selection,
# Dingjiang Huang, Junlong Zhou, Bin Li, Steven C.H. Hoi, and Shuigeng
# Zhou. International Joint Conference on Artificial Intelligence, 2013.
# http://ijcai.org/papers13/Papers/IJCAI13-296.pdf

import numpy as np
from pytz import timezone

def initialize(context):

    context.stocks = [ sid(19662),  # XLY Consumer Discretionary SPDR Fund
                       sid(19656),  # XLF Financial SPDR Fund
                       sid(19658),  # XLK Technology SPDR Fund
                       sid(19655),  # XLE Energy SPDR Fund
                       sid(19661),  # XLV Health Care SPDR Fund
                       sid(19657),  # XLI Industrial SPDR Fund
                       sid(19659),  # XLP Consumer Staples SPDR Fund
                       sid(19654),  # XLB Materials SPDR Fund
                       sid(19660) ] # XLU Utilities SPDR Fund

    context.m = len(context.stocks)
    context.b_t = np.ones(context.m) / context.m
    context.eps = 5  # change epsilon here
    context.init = False

def handle_data(context, data):

    prices = history(6, '1d', 'price').as_matrix(context.stocks)[0:-1, :]

    cash = context.portfolio.cash
    record(cash=cash)

    if not context.init:
        rebalance_portfolio(context, context.b_t)
        context.init = True
        return

    if not intradingwindow_check(context):
        return

    # skip bar if any orders are open or any stocks did not trade
    for stock in context.stocks:
        if bool(get_open_orders(stock)) or data[stock].datetime < get_datetime():
            return

    # update portfolio weights from current positions
    for i, stock in enumerate(context.stocks):
        context.b_t[i] = context.portfolio.positions[stock].amount * data[stock].price

    context.b_t = np.divide(context.b_t, np.sum(context.b_t))

    m = context.m
    x_tilde = np.zeros(m)
    b = np.zeros(m)

    for i, stock in enumerate(context.stocks):
        # Use numpy median until L1 median (spatial median) implemented
        median_price = np.median(prices[:, i])
        x_tilde[i] = median_price / prices[-1, i]

    ###########################
    # Inside of OLMAR (algo 2)

    x_bar = x_tilde.mean()

    # Calculate terms for lambda (lam)
    dot_prod = np.dot(context.b_t, x_tilde)
    num = context.eps - dot_prod
    denom = (np.linalg.norm((x_tilde - x_bar))) ** 2

    # test for divide-by-zero case
    if denom == 0.0:
        lam = 0  # no portfolio update
    else:
        lam = max(0, num / denom)

    b = context.b_t + lam * (x_tilde - x_bar)

    b_norm = simplex_projection(b)

    rebalance_portfolio(context, b_norm)

def rebalance_portfolio(context, desired_port):

    for i, stock in enumerate(context.stocks):
        order_target_percent(stock, desired_port[i])

def intradingwindow_check(context):
    # Convert all time-zones into US Eastern to avoid confusion
    loc_dt = get_datetime().astimezone(timezone('US/Eastern'))
    if loc_dt.hour == 10 and loc_dt.minute > 0:
        return True
    else:
        return False

def simplex_projection(v, b=1):
    """Projection of the vector v onto the simplex domain.

    Implemented according to the paper: Efficient projections onto the
    l1-ball for learning in high dimensions, John Duchi, et al. ICML 2008.
    Implementation: 2011 June 17 by [email protected] AT pmail.ntu.edu.sg
    Optimization problem: min_{w} \| w - v \|_{2}^{2}
    s.t. \sum_{i=1}^{m} w_{i} = z, w_{i} \geq 0

    Input: A vector v \in R^{m}, and a scalar z > 0 (default=1)
    Output: Projection vector w

    :Example:
    >>> proj = simplex_projection([.4, .3, -.4, .5])
    >>> print proj
    array([ 0.33333333,  0.23333333,  0.        ,  0.43333333])
    >>> print proj.sum()
    1.0

    Original matlab implementation: John Duchi ([email protected])
    Python-port: Copyright 2012 by Thomas Wiecki ([email protected]).
    """
    v = np.asarray(v)
    p = len(v)

    # Sort v into u in descending order
    v = (v > 0) * v
    u = np.sort(v)[::-1]
    sv = np.cumsum(u)

    rho = np.where(u > (sv - b) / np.arange(1, p+1))[0][-1]
    theta = np.max([0, (sv[rho] - b) / (rho+1)])
    w = (v - theta)
    w[w < 0] = 0
    return w
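For anyone following along, the update step above (x_tilde from the median, then lambda, then a simplex projection) can be traced on a toy example. A minimal sketch assuming numpy; the price window and epsilon here are made up for illustration:

```python
import numpy as np

def simplex_projection(v, b=1):
    # same Duchi et al. projection used in the algo above
    v = np.asarray(v, dtype=float)
    p = len(v)
    v = (v > 0) * v
    u = np.sort(v)[::-1]
    sv = np.cumsum(u)
    rho = np.where(u > (sv - b) / np.arange(1, p + 1))[0][-1]
    theta = max(0.0, (sv[rho] - b) / (rho + 1))
    w = v - theta
    w[w < 0] = 0
    return w

# Toy 5-day price window for 3 assets (rows = days, cols = assets)
prices = np.array([[10., 50., 20.],
                   [11., 49., 20.],
                   [10., 51., 21.],
                   [ 9., 52., 20.],
                   [10., 50., 22.]])

b_t = np.ones(3) / 3                                # current (uniform) portfolio
eps = 5                                             # reversion threshold, as above

x_tilde = np.median(prices, axis=0) / prices[-1]    # predicted price relatives
x_bar = x_tilde.mean()

num = eps - np.dot(b_t, x_tilde)
denom = np.linalg.norm(x_tilde - x_bar) ** 2
lam = 0.0 if denom == 0.0 else max(0.0, num / denom)

b_new = simplex_projection(b_t + lam * (x_tilde - x_bar))
print(b_new)   # weight shifts into the two assets trading below their median: [0.5, 0.5, 0.0]
```

The third asset closed above its median (22 vs. 20), so the update moves its weight to the two assets expected to revert upward.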


Here's an update using a brute-force method for the L1 median:

def l1_median(x):
    x_min = np.amin(x)
    x_max = np.amax(x)
    mu = np.linspace(x_min, x_max, num=50)
    sum_dist = np.zeros_like(mu)
    for k in range(len(mu)):
        mu_k = mu[k] * np.ones_like(x)
        sum_dist[k] = np.sum(np.absolute(x - mu_k))
    l_min = np.argmin(sum_dist)
    return mu[l_min]
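One observation: in one dimension the L1 median, i.e. the minimizer of the summed absolute deviations, coincides with the ordinary median, so the grid search should agree with numpy.median to within the grid spacing. A quick self-contained check, assuming numpy:

```python
import numpy as np

def l1_median(x, num=50):
    # brute-force grid minimizer of sum(|x - mu|), as in the post above
    mu = np.linspace(np.amin(x), np.amax(x), num=num)
    sum_dist = np.array([np.sum(np.abs(x - m)) for m in mu])
    return mu[np.argmin(sum_dist)]

x = np.array([10.0, 10.5, 9.8, 30.0, 10.2])   # one outlier at 30
grid = l1_median(x, num=2001)                  # fine grid
exact = np.median(x)                           # 1-D L1 median == ordinary median
assert abs(grid - exact) < 0.02                # agree up to grid spacing
print(exact)   # 10.2, insensitive to the outlier
```

So for the per-asset (scalar) version of the estimator, numpy.median is already the exact answer and the grid search only approximates it.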


As before, zero slippage was assumed, so the return is optimistic:

set_slippage(slippage.FixedSlippage(spread=0.00))


Grant

[Backtest result attached]
# Robust Median Reversion Strategy for On-Line Portfolio Selection,
# Dingjiang Huang, Junlong Zhou, Bin Li, Steven C.H. Hoi, and Shuigeng
# Zhou. International Joint Conference on Artificial Intelligence, 2013.
# http://ijcai.org/papers13/Papers/IJCAI13-296.pdf

import numpy as np
from pytz import timezone
from scipy import stats
import math

def initialize(context):

    context.stocks = [ sid(19662),  # XLY Consumer Discretionary SPDR Fund
                       sid(19656),  # XLF Financial SPDR Fund
                       sid(19658),  # XLK Technology SPDR Fund
                       sid(19655),  # XLE Energy SPDR Fund
                       sid(19661),  # XLV Health Care SPDR Fund
                       sid(19657),  # XLI Industrial SPDR Fund
                       sid(19659),  # XLP Consumer Staples SPDR Fund
                       sid(19654),  # XLB Materials SPDR Fund
                       sid(19660) ] # XLU Utilities SPDR Fund

    context.m = len(context.stocks)
    context.b_t = np.ones(context.m) / context.m
    context.eps = 5  # change epsilon here
    context.init = False

def handle_data(context, data):

    prices = history(6, '1d', 'price').as_matrix(context.stocks)[0:-1, :]

    cash = context.portfolio.cash
    record(cash=cash)

    if not context.init:
        rebalance_portfolio(context, context.b_t)
        context.init = True
        return

    if not intradingwindow_check(context):
        return

    # skip bar if any orders are open or any stocks did not trade
    for stock in context.stocks:
        if bool(get_open_orders(stock)) or data[stock].datetime < get_datetime():
            return

    # update portfolio weights from current positions
    for i, stock in enumerate(context.stocks):
        context.b_t[i] = context.portfolio.positions[stock].amount * data[stock].price

    context.b_t = np.divide(context.b_t, np.sum(context.b_t))

    m = context.m
    x_tilde = np.zeros(m)
    b = np.zeros(m)

    for i in range(context.m):
        x_tilde[i] = l1_median(prices[:, i]) / prices[-1, i]

    ###########################
    # Inside of OLMAR (algo 2)

    x_bar = x_tilde.mean()

    # Calculate terms for lambda (lam)
    dot_prod = np.dot(context.b_t, x_tilde)
    num = context.eps - dot_prod
    denom = (np.linalg.norm((x_tilde - x_bar))) ** 2

    # test for divide-by-zero case
    if denom == 0.0:
        lam = 0  # no portfolio update
    else:
        lam = max(0, num / denom)

    b = context.b_t + lam * (x_tilde - x_bar)

    b_norm = simplex_projection(b)

    rebalance_portfolio(context, b_norm)

def rebalance_portfolio(context, desired_port):

    for i, stock in enumerate(context.stocks):
        order_target_percent(stock, desired_port[i])

def intradingwindow_check(context):
    # Convert all time-zones into US Eastern to avoid confusion
    loc_dt = get_datetime().astimezone(timezone('US/Eastern'))
    if loc_dt.hour == 10 and loc_dt.minute > 0:
        return True
    else:
        return False

def simplex_projection(v, b=1):
    """Projection of the vector v onto the simplex domain.

    Implemented according to the paper: Efficient projections onto the
    l1-ball for learning in high dimensions, John Duchi, et al. ICML 2008.
    Implementation: 2011 June 17 by [email protected] AT pmail.ntu.edu.sg
    Optimization problem: min_{w} \| w - v \|_{2}^{2}
    s.t. \sum_{i=1}^{m} w_{i} = z, w_{i} \geq 0

    Input: A vector v \in R^{m}, and a scalar z > 0 (default=1)
    Output: Projection vector w

    :Example:
    >>> proj = simplex_projection([.4, .3, -.4, .5])
    >>> print proj
    array([ 0.33333333,  0.23333333,  0.        ,  0.43333333])
    >>> print proj.sum()
    1.0

    Original matlab implementation: John Duchi ([email protected])
    Python-port: Copyright 2012 by Thomas Wiecki ([email protected]).
    """
    v = np.asarray(v)
    p = len(v)

    # Sort v into u in descending order
    v = (v > 0) * v
    u = np.sort(v)[::-1]
    sv = np.cumsum(u)

    rho = np.where(u > (sv - b) / np.arange(1, p+1))[0][-1]
    theta = np.max([0, (sv[rho] - b) / (rho+1)])
    w = (v - theta)
    w[w < 0] = 0
    return w

def l1_median(x):
    # brute-force 1-D L1 median: grid search over [min, max]
    x_min = np.amin(x)
    x_max = np.amax(x)

    mu = np.linspace(x_min, x_max, num=50)

    sum_dist = np.zeros_like(mu)

    for k in range(len(mu)):
        mu_k = mu[k] * np.ones_like(x)
        sum_dist[k] = np.sum(np.absolute(x - mu_k))

    l_min = np.argmin(sum_dist)

    return mu[l_min]



What is the superscript plus notation in the Modified Weiszfeld algorithm? It's on page 4 under Proposition 1. The part with (1 - η(μ) / γ(μ)) superscript +.

The parts of the equations that use p_{t-i} != μ don't make sense to me. When I tried this on some data sets, there is never a case where this condition fails, i.e. where an n-asset x 1 price vector is equal to μ. So I'm always getting η == 0, and the second term in the equation is always zero.

Hello Brent,

Not sure about the superscript plus sign. In notation used for sets, it indicates that only positive elements are included (see http://www.cs.virginia.edu/~robins/cs3102/basics.pdf). I don't know how this applies here, however.
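For what it's worth, in the optimization literature a superscript plus on a scalar usually denotes the positive part, (z)^+ = max(0, z), so the factor would simply be clamped at zero. A minimal illustration under that reading (the eta/gamma values here are made up):

```python
def pos_part(z):
    """(z)^+ = max(0, z), the usual reading of a superscript plus on a scalar."""
    return max(0.0, z)

# The damping factor (1 - eta/gamma)^+ from Proposition 1, for a few cases:
for eta, gamma in [(0.0, 2.0), (1.0, 2.0), (3.0, 2.0)]:
    print(pos_part(1.0 - eta / gamma))   # prints 1.0, then 0.5, then 0.0 (clamped)
```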

Regarding the Pt-i != μ condition, when I skimmed over the "Modified Weiszfeld Algorithm" paper I gathered that the modification was developed to address the special case of μ landing on one of the p's and getting stuck during the iterative solution (see http://www.stat.rutgers.edu/home/cunhui/papers/43.pdf, pg. 2).

There appears to be a better discussion of the Modified Weiszfeld Algorithm in http://www.statistik.tuwien.ac.at/public/filz/papers/2011COST2.pdf. Take note of the improvement mentioned on pg. 5.

Also, note that we only need code for the 1D L1 median, as I illustrate above with the brute force method.
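In case the full multivariate (spatial) median is ever wanted, here is my reading of the modified Weiszfeld iteration from the Vardi-Zhang-style references above, sketched in numpy. This is an illustrative sketch under my interpretation, not a verified port of the paper's pseudocode; note that when the iterate never coincides with a data point, eta is zero and the update reduces to the plain Weiszfeld step, consistent with Brent's observation that the second term vanishes:

```python
import numpy as np

def l1_median_weiszfeld(X, max_iter=1000, tol=1e-9):
    """Spatial (L1) median of the rows of X via a modified Weiszfeld iteration.

    If the iterate mu lands on a data point, eta (the multiplicity of that
    point) and gamma = ||R(mu)|| damp the step so the iteration cannot get
    stuck: mu_new = (1 - eta/gamma)^+ * T(mu) + min(1, eta/gamma) * mu.
    """
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)                       # start from the centroid
    for _ in range(max_iter):
        d = np.linalg.norm(X - mu, axis=1)
        hit = d < 1e-12                       # points coinciding with mu
        w = 1.0 / np.clip(d, 1e-12, None)
        w[hit] = 0.0                          # exclude coinciding points
        if w.sum() == 0.0:
            return mu                         # all points equal mu
        T = (w[:, None] * X).sum(axis=0) / w.sum()     # plain Weiszfeld step
        eta = float(hit.sum())                # weight of the coinciding point
        R = (w[:, None] * (X - mu)).sum(axis=0)
        gamma = np.linalg.norm(R)
        if gamma == 0.0:
            return mu                         # mu is already optimal
        mu_new = max(0.0, 1.0 - eta / gamma) * T + min(1.0, eta / gamma) * mu
        if np.linalg.norm(mu_new - mu) < tol:
            return mu_new
        mu = mu_new
    return mu

# Example: three clustered points and one far outlier.
X = np.array([[0., 0.], [1., 0.], [0., 1.], [10., 10.]])
m = l1_median_weiszfeld(X)
print(m)   # converges to (0.5, 0.5); the outlier does not drag it away
```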

Grant

I did the backtest with default slippage and commission ($0.03 per share), FYI.

[Backtest result attached]

# Robust Median Reversion Strategy for On-Line Portfolio Selection,
# Dingjiang Huang, Junlong Zhou, Bin Li, Steven C.H. Hoi, and Shuigeng
# Zhou. International Joint Conference on Artificial Intelligence, 2013.
# http://ijcai.org/papers13/Papers/IJCAI13-296.pdf

import numpy as np
from pytz import timezone
from scipy import stats
import math

def initialize(context):

    context.stocks = [ sid(19662),  # XLY Consumer Discretionary SPDR Fund
                       sid(19656),  # XLF Financial SPDR Fund
                       sid(19658),  # XLK Technology SPDR Fund
                       sid(19655),  # XLE Energy SPDR Fund
                       sid(19661),  # XLV Health Care SPDR Fund
                       sid(19657),  # XLI Industrial SPDR Fund
                       sid(19659),  # XLP Consumer Staples SPDR Fund
                       sid(19654),  # XLB Materials SPDR Fund
                       sid(19660) ] # XLU Utilities SPDR Fund

    context.m = len(context.stocks)
    context.b_t = np.ones(context.m) / context.m
    context.eps = 5  # change epsilon here
    context.init = False

    # set_slippage(slippage.FixedSlippage(spread=0.00))
    # set_commission(commission.PerShare(cost=0.01))

def handle_data(context, data):

    prices = history(6, '1d', 'price').as_matrix(context.stocks)[0:-1, :]

    cash = context.portfolio.cash
    record(cash=cash)

    if not context.init:
        rebalance_portfolio(context, context.b_t)
        context.init = True
        return

    if not intradingwindow_check(context):
        return

    # skip bar if any orders are open or any stocks did not trade
    for stock in context.stocks:
        if bool(get_open_orders(stock)) or data[stock].datetime < get_datetime():
            return

    # update portfolio weights from current positions
    for i, stock in enumerate(context.stocks):
        context.b_t[i] = context.portfolio.positions[stock].amount * data[stock].price

    context.b_t = np.divide(context.b_t, np.sum(context.b_t))

    m = context.m
    x_tilde = np.zeros(m)
    b = np.zeros(m)

    for i in range(context.m):
        x_tilde[i] = l1_median(prices[:, i]) / prices[-1, i]

    ###########################
    # Inside of OLMAR (algo 2)

    x_bar = x_tilde.mean()

    # Calculate terms for lambda (lam)
    dot_prod = np.dot(context.b_t, x_tilde)
    num = context.eps - dot_prod
    denom = (np.linalg.norm((x_tilde - x_bar))) ** 2

    # test for divide-by-zero case
    if denom == 0.0:
        lam = 0  # no portfolio update
    else:
        lam = max(0, num / denom)

    b = context.b_t + lam * (x_tilde - x_bar)

    b_norm = simplex_projection(b)

    rebalance_portfolio(context, b_norm)

def rebalance_portfolio(context, desired_port):

    for i, stock in enumerate(context.stocks):
        order_target_percent(stock, desired_port[i])

def intradingwindow_check(context):
    # Convert all time-zones into US Eastern to avoid confusion
    loc_dt = get_datetime().astimezone(timezone('US/Eastern'))
    if loc_dt.hour == 10 and loc_dt.minute > 0:
        return True
    else:
        return False

def simplex_projection(v, b=1):
    """Projection of the vector v onto the simplex domain.

    Implemented according to the paper: Efficient projections onto the
    l1-ball for learning in high dimensions, John Duchi, et al. ICML 2008.
    Implementation: 2011 June 17 by [email protected] AT pmail.ntu.edu.sg
    Optimization problem: min_{w} \| w - v \|_{2}^{2}
    s.t. \sum_{i=1}^{m} w_{i} = z, w_{i} \geq 0

    Input: A vector v \in R^{m}, and a scalar z > 0 (default=1)
    Output: Projection vector w

    :Example:
    >>> proj = simplex_projection([.4, .3, -.4, .5])
    >>> print proj
    array([ 0.33333333,  0.23333333,  0.        ,  0.43333333])
    >>> print proj.sum()
    1.0

    Original matlab implementation: John Duchi ([email protected])
    Python-port: Copyright 2012 by Thomas Wiecki ([email protected]).
    """
    v = np.asarray(v)
    p = len(v)

    # Sort v into u in descending order
    v = (v > 0) * v
    u = np.sort(v)[::-1]
    sv = np.cumsum(u)

    rho = np.where(u > (sv - b) / np.arange(1, p+1))[0][-1]
    theta = np.max([0, (sv[rho] - b) / (rho+1)])
    w = (v - theta)
    w[w < 0] = 0
    return w

def l1_median(x):
    # brute-force 1-D L1 median: grid search over [min, max]
    x_min = np.amin(x)
    x_max = np.amax(x)

    mu = np.linspace(x_min, x_max, num=50)

    sum_dist = np.zeros_like(mu)

    for k in range(len(mu)):
        mu_k = mu[k] * np.ones_like(x)
        sum_dist[k] = np.sum(np.absolute(x - mu_k))

    l_min = np.argmin(sum_dist)

    return mu[l_min]

Hi Grant,

I started working on improving the code style and readability of the algo. Here is my first pull request: https://github.com/gkiehne/quantopian/pull/1

Please review my changes. We can discuss them either here or on github.

Regards,
Ed

Hi Grant, I see that you are very familiar with a lot of research papers. Why don't you put a few lines in your profile so that we know your background and areas of interest better? I think the discussions on the community are the best part of Quantopian, and better reciprocal knowledge will certainly improve the quality of the communication. In any case, thanks for all the work that you are sharing.

Hi Grant,

What do you think about using @batch_transform and switching the algo to a daily time frame? It would speed up development a lot.

Regards,
Ed

My understanding is that the batch transform will be deprecated, so I've been avoiding it. Also, daily bars are not supported in paper/live trading. I guess I'd like to hear from Quantopian regarding the fundamental problem of not being able to accelerate backtests in minute mode by restricting trading to a specific time. It seems that there is a lot of overhead...

--Grant

Here's a re-factored version, thanks to Ed Bartosh. It is available here for collaboration:

https://github.com/gkiehne/quantopian/blob/master/rmr.py

--Grant
[Backtest result attached]

"""
Robust Median Reversion Strategy for On-Line Portfolio Selection,
Dingjiang Huang, Junlong Zhou, Bin Li, Steven C.H. Hoi, and Shuigeng
Zhou. International Joint Conference on Artificial Intelligence, 2013.
http://ijcai.org/papers13/Papers/IJCAI13-296.pdf
"""
import numpy as np
from pytz import timezone

def initialize(context):
    """
    Initialize the context object.

    The context object is passed to every call of handle_data.
    It is used to store data the algo needs.

    :param context: context object
    :returns: None
    """
    context.stocks = [ sid(19662),  # XLY Consumer Discretionary SPDR Fund
                       sid(19656),  # XLF Financial SPDR Fund
                       sid(19658),  # XLK Technology SPDR Fund
                       sid(19655),  # XLE Energy SPDR Fund
                       sid(19661),  # XLV Health Care SPDR Fund
                       sid(19657),  # XLI Industrial SPDR Fund
                       sid(19659),  # XLP Consumer Staples SPDR Fund
                       sid(19654),  # XLB Materials SPDR Fund
                       sid(19660) ] # XLU Utilities SPDR Fund

    context.eps = 5  # change epsilon here
    context.init = False

    set_slippage(slippage.FixedSlippage(spread=0.00))
    set_commission(commission.PerTrade(cost=0))

def handle_data(context, data):
    """
    The main processing function.

    This function is called by the Quantopian backend whenever a market
    event occurs for any of the algorithm's specified securities.

    :param context: context object
    :param data: A dictionary containing market data keyed by security id.
                 It represents a snapshot of your algorithm's universe as of
                 when this method was called.
    """
    record(cash=context.portfolio.cash)

    if not context.init:
        # initialization: buy the same amount of each security
        part = 1. / len(context.stocks)
        for stock in context.stocks:
            order_target_percent(stock, part)
        context.init = True
        return

    # Trade only once per day, at 10:00
    loc_dt = get_datetime().astimezone(timezone('US/Eastern'))
    if loc_dt.hour != 10 or loc_dt.minute != 0:
        return

    if get_open_orders():
        return

    # skip bar if any stocks did not trade
    for stock in context.stocks:
        if data[stock].datetime < get_datetime():
            return

    prices = history(6, '1d', 'price').as_matrix(context.stocks)[0:-1, :]

    parts = rmr_strategy(context.portfolio, context.stocks, data, prices,
                         context.eps)

    # rebalance portfolio according to the new allocation
    for stock, portion in zip(context.stocks, parts):
        order_target_percent(stock, portion)

def rmr_strategy(portfolio, stocks, data, prices, eps):
    """
    Core of the Robust Median Reversion strategy implementation.

    :param portfolio: portfolio object
    :param stocks: list of sid objects used in the algo
    :param data: market event object
    :param prices: historical data in the form of a numpy matrix
    :param eps: epsilon value
    :returns: new allocation for the portfolio securities
    """
    # update portfolio weights from current positions
    b_t = []
    for stock in stocks:
        b_t.append(portfolio.positions[stock].amount * data[stock].price)
    b_t = np.divide(b_t, np.sum(b_t))

    x_tilde = np.zeros(len(stocks))
    for i, stock in enumerate(stocks):
        # Use numpy median until L1 median (spatial median) implemented
        median_price = np.median(prices[:, i])
        x_tilde[i] = median_price / prices[-1, i]

    x_bar = x_tilde.mean()

    # Calculate terms for lambda (lam)
    dot_prod = np.dot(b_t, x_tilde)
    num = eps - dot_prod
    denom = (np.linalg.norm((x_tilde - x_bar))) ** 2

    b = b_t
    # test for divide-by-zero case
    if denom != 0.0:
        b = b_t + max(0, num / denom) * (x_tilde - x_bar)

    return simplex_projection(b)

def simplex_projection(v, b=1):
    """
    Projection of the vector v onto the simplex domain.

    Implemented according to the paper: Efficient projections onto the
    l1-ball for learning in high dimensions, John Duchi, et al. ICML 2008.
    Implementation: 2011 June 17 by [email protected] AT pmail.ntu.edu.sg
    Optimization problem: min_{w} \| w - v \|_{2}^{2}
    s.t. \sum_{i=1}^{m} w_{i} = z, w_{i} \geq 0

    Input: A vector v \in R^{m}, and a scalar z > 0 (default=1)
    Output: Projection vector w

    :Example:
    >>> proj = simplex_projection([.4, .3, -.4, .5])
    >>> print proj
    array([ 0.33333333,  0.23333333,  0.        ,  0.43333333])
    >>> print proj.sum()
    1.0

    Original matlab implementation: John Duchi ([email protected])
    Python-port: Copyright 2012 by Thomas Wiecki ([email protected]).
    """
    v = np.asarray(v)
    p = len(v)

    # Sort v into u in descending order
    v = (v > 0) * v
    u = np.sort(v)[::-1]
    sv = np.cumsum(u)

    rho = np.where(u > (sv - b) / np.arange(1, p+1))[0][-1]
    theta = np.max([0, (sv[rho] - b) / (rho+1)])
    w = (v - theta)
    w[w < 0] = 0
    return w

Do you guys know any easy math primers to help understand the symbols in the paper? It's been so long since I've used real math that I can't figure any of it out :(

Which part are you trying to figure out, specifically? When I get the chance, perhaps I'll provide an outline of the parts I understand (things get kind of hairy under the hood with the optimization stuff).

--Grant

Hi Grant, well... really all of it. But I found a wikipedia page that seems to be what I need: http://en.wikipedia.org/wiki/List_of_mathematical_symbols. So now I just need to slog through the paper.

Props for getting this strategy working! I'm always very skeptical of public papers like this, but indeed it seems to be valid. I'm starting to go through the paper now, though I see that Grant's implementation of the L1 median only very slightly outperforms the standard median... @grant and @ed, are you guys using chat software at all? Please PM me or shoot me an email: [email protected]

Hi Jason,

No, we didn't use chat. I created a pull request on github and Grant accepted it. We can try to use gitter as Grant suggested. I've created a chatroom there: https://gitter.im/bartosh/quantopian

Please try to join.
PS: I share your pain in understanding the paper and matching the code to it. I think the best way to get explanations is to invite Grant to the chat and ask him for detailed explanations.

Regards,
Ed

Hi Grant,

Regarding batch_transform vs history, I propose modifying the algo to use both of them. While we're in development mode we can still use batch_transform, which is much faster.

Regards,
Ed

Hello Ed,

I understand your point regarding improving backtest efficiency by using daily bars. I'd prefer to steer clear of the batch transform; it will be deprecated, so let's not propagate it. Instead, I suggest we construct our own accumulator. A couple of approaches:

1. Deque (https://docs.python.org/2/library/collections.html). [EDIT] https://www.quantopian.com/posts/rookie-q-how-do-i-store-data-for-consecutive-ticks
2. Adopt the RollingPanel in zipline (https://github.com/quantopian/zipline/blob/master/zipline/utils/data.py).

Another, more fundamental issue is that a backtest running on daily bars executes orders at the close of the next trading day. Anony Mole developed a custom slippage model to address the problem, which he introduced here: https://www.quantopian.com/posts/trade-at-the-open-slippage-model (Jason Swearingen presents a variant as well). Basically, if we are going to use daily bars, we should make some approximations so that the results map onto live trading with minute bars.

Of course, all of this raises the question of using the online version of Quantopian for development in the first place, if we are going to test on daily bars. We could consider using zipline instead, with the idea of eventually moving to minute bars on Quantopian if everything pans out.

Grant
Unfortunately, as my mathy sage skills are weak, I haven't started reading the paper to try to decipher whether it's your implementation or issues with the algo itself. For starters, you can change your rmr_strategy to

return simplex_projection((x_tilde - x_bar)*4000)

and you'll effectively get the same results as the more complex thing you have right now. Anyway, I'm still looking at your algo; I'll try to get into the paper tomorrow.

Here's a variant that limits trading with:

if context.day_count % 5 != 0.0: return

I use the default slippage model, with commissions:

set_commission(commission.PerShare(cost=0.013, min_trade_cost=1.3))

Not super sexy, but it does suggest the potential to beat the market without taking on additional risk.

Grant

"""
Robust Median Reversion Strategy for On-Line Portfolio Selection,
Dingjiang Huang, Junlong Zhou, Bin Li, Steven C.H. Hoi, and Shuigeng
Zhou. International Joint Conference on Artificial Intelligence, 2013.
http://ijcai.org/papers13/Papers/IJCAI13-296.pdf
"""
import numpy as np
from pytz import timezone

def initialize(context):
    """
    Initialize context object.

    The context object is passed to every call of handle_data.
    It is used to store data for use in the algo.

    :param context: context object
    :returns: None
    """
    context.stocks = [ sid(19662), # XLY Consumer Discretionary SPDR Fund
                       sid(19656), # XLF Financial SPDR Fund
                       sid(19658), # XLK Technology SPDR Fund
                       sid(19655), # XLE Energy SPDR Fund
                       sid(19661), # XLV Health Care SPDR Fund
                       sid(19657), # XLI Industrial SPDR Fund
                       sid(19659), # XLP Consumer Staples SPDR Fund
                       sid(19654), # XLB Materials SPDR Fund
                       sid(19660)] # XLU Utilities SPDR Fund

    context.eps = 1 # change epsilon here
    context.init = False
    context.day_count = -1

    # set_slippage(slippage.FixedSlippage(spread=0.00))
    # set_commission(commission.PerTrade(cost=0))
    set_commission(commission.PerShare(cost=0.013, min_trade_cost=1.3))

def handle_data(context, data):
    """
    The main processing function.

    This function is called by the Quantopian backend whenever a market event
    occurs for any of the algorithm's specified securities.

    :param context: context object
    :param data: A dictionary containing market data keyed by security id.
                 It represents a snapshot of your algorithm's universe as of
                 when this method was called.
    """
    # record(cash=context.portfolio.cash)

    if not context.init:
        # initialization: buy the same amount of each security
        part = 1. / len(context.stocks)
        for stock in context.stocks:
            order_target_percent(stock, part)
        context.init = True
        return

    # Trade only once per day, at 10:00
    loc_dt = get_datetime().astimezone(timezone('US/Eastern'))
    if loc_dt.hour != 10 or loc_dt.minute != 0:
        return
    else:
        context.day_count += 1

    if context.day_count % 5 != 0.0:
        return

    if get_open_orders():
        return

    # skip bar if any stocks did not trade
    for stock in context.stocks:
        if data[stock].datetime < get_datetime():
            return

    prices = history(6, '1d', 'price').as_matrix(context.stocks)[0:-1,:]

    parts = rmr_strategy(context.portfolio, context.stocks, data,
                         prices, context.eps)

    # rebalance portfolio according to the new allocation
    for stock, portion in zip(context.stocks, parts):
        order_target_percent(stock, portion)

def rmr_strategy(portfolio, stocks, data, prices, eps):
    """
    Core of the Robust Median Reversion strategy implementation.

    :param portfolio: portfolio object
    :param stocks: list of sid objects used in the algo
    :param data: market event object
    :param prices: historical data in the form of a numpy matrix
    :param eps: epsilon value
    :returns: new allocation for the portfolio securities
    """
    # update portfolio
    b_t = []
    for stock in stocks:
        b_t.append(portfolio.positions[stock].amount * data[stock].price)

    b_t = np.divide(b_t, np.sum(b_t))

    x_tilde = np.zeros(len(stocks))
    for i, stock in enumerate(stocks):
        x_tilde[i] = l1_median(prices[:,i])/prices[-1,i]

    x_bar = x_tilde.mean()

    # Calculate terms for lambda (lam)
    dot_prod = np.dot(b_t, x_tilde)
    num = eps - dot_prod
    denom = (np.linalg.norm((x_tilde - x_bar))) ** 2

    b = b_t
    # test for divide-by-zero case
    if denom != 0.0:
        b = b_t + max(0, num/denom) * (x_tilde - x_bar)

    return simplex_projection(b)

def simplex_projection(v, b=1):
    """
    Projection of vectors onto the simplex domain.

    Implemented according to the paper: Efficient projections onto the
    l1-ball for learning in high dimensions, John Duchi, et al. ICML 2008.
    Implementation Time: 2011 June 17 by [email protected] AT pmail.ntu.edu.sg
    Optimization Problem: min_{w}\| w - v \|_{2}^{2}
    s.t. \sum_{i=1}^{m} w_{i} = z, w_{i} \geq 0

    Input: A vector v \in R^{m}, and a scalar z > 0 (default=1)
    Output: Projection vector w

    :Example:
    >>> proj = simplex_projection([.4 ,.3, -.4, .5])
    >>> print proj
    array([ 0.33333333, 0.23333333, 0. , 0.43333333])
    >>> print proj.sum()
    1.0

    Original matlab implementation: John Duchi ([email protected])
    Python-port: Copyright 2012 by Thomas Wiecki ([email protected]).
    """
    v = np.asarray(v)
    p = len(v)

    # Zero out negative entries, then sort into u in descending order
    v = (v > 0) * v
    u = np.sort(v)[::-1]
    sv = np.cumsum(u)

    rho = np.where(u > (sv - b) / np.arange(1, p+1))[0][-1]
    theta = np.max([0, (sv[rho] - b) / (rho+1)])
    w = (v - theta)
    w[w < 0] = 0
    return w

def l1_median(x):
    """
    Computes the L1 median (spatial median) using brute-force grid search.

    :param x: a numpy 1D ndarray (vector) of values
    :returns: scalar estimate of the L1 median of the values
    """
    x_min = np.amin(x)
    x_max = np.amax(x)

    mu = np.linspace(x_min, x_max, num=50)
    sum_dist = np.zeros_like(mu)

    for k in range(len(mu)):
        mu_k = mu[k]*np.ones_like(x)
        sum_dist[k] = np.sum(np.absolute(x-mu_k))

    l_min = np.argmin(sum_dist)
    return mu[l_min]

Hello Jason,

I'll set up the chat thingy on github, but as I mentioned to Ed, I'm not much of a chatter, for a variety of reasons. Asynchronous mode tends to work best for me.

Grant

Hi Grant,

I'm not a chatter either; I just figured you guys don't want to spam this thread with random questions/issues. But if so, here goes: is the algorithm a complete implementation of the paper? Or do you guys still have more to do? As it stands, I don't see anything special... it's a long-only strategy, so we are suffering from look-ahead bias right now (bull market). Also, its mean-reversion signal is simply to look at the 6-day median and to weigh positional bets on that.
I don't see how that's going to work :(

Hey Jason,

In response to your questions/comments:

Is the algorithm a complete implementation of the paper?

Yes, I believe so. With the addition of the brute-force L1 median computation, it should be complete. A more efficient L1 median algorithm would be nice, but at this point it is probably sufficient.

Regarding the viability of the approach, it all depends on the objective. I decided to pick the same plain-vanilla securities as the ones used by Quantopian in their example re-balancing algorithm:

context.stocks = [ sid(19662), # XLY Consumer Discretionary SPDR Fund
                   sid(19656), # XLF Financial SPDR Fund
                   sid(19658), # XLK Technology SPDR Fund
                   sid(19655), # XLE Energy SPDR Fund
                   sid(19661), # XLV Health Care SPDR Fund
                   sid(19657), # XLI Industrial SPDR Fund
                   sid(19659), # XLP Consumer Staples SPDR Fund
                   sid(19654), # XLB Materials SPDR Fund
                   sid(19660)] # XLU Utilities SPDR Fund

Basically the entire market, as I see it. So one question is, "Is there an active trading approach that can beat the market using these securities?" In other words, why buy and hold SPY when I can construct an active portfolio that performs better than SPY?

Generally, it seems like a good framework for mean reversion, since the mean/median/L1 median can be replaced:

x_tilde[i] = price_predictor(prices[:,i])/prices[-1,i]

A potentially fundamental problem is that the optimization problem was formulated without trading cost and slippage taken into account. One can fiddle with the parameters to get a decent return, but the algorithm should really "learn" every trading minute, and make decisions that are optimal in the context of cost and slippage.

Grant

Hi Grant,

If complete, then there's another issue I touched on in a message a little earlier:

return simplex_projection((x_tilde - x_bar)*4000)

The original basically does something like

return simplex_projection(bt + (4.xxx / 0.0xx) * (x_tilde - x_bar))

which seems rather wacky. The bt component is completely dwarfed by the other term, so much so that it's meaningless. Also, the multiplier is so large that it renders the other parameters that go into it meaningless too (what's really different about 4000* vs. 5000*?). Not much, as it all gets divided away in the simplex projection.

Other than that, I still don't really understand how this is going to be highly useful for mean reversion. If anything, I see the current sector ETF selection (and this algo's returns) as equivalent to the Quantopian live strategy, not only in portfolio; this is effectively just "long holding" them, reshuffling a little bit each day. So I really don't see any mean reversion at work here....

Hi Grant,

Can you explain the meaning of these prices[-1,i] and prices[:,i] operations? I'm not a Python newbie, but I don't understand that. pylint (code style checker) is also confused by it.

Regards, Ed

Not super sexy, but it does suggest the potential to beat the market without taking on additional risk.

Looking at the risk metrics (Sharpe ratio is almost 3, max drawdown 0.07), I'm curious what you would call super sexy :)

Ed,

Your question was:

Can you explain the meaning of these prices[-1,i] and prices[:,i] operations?

This is indexing of the numpy ndarray prices:

prices[-1,i] - the last element of the column indexed by i
prices[:,i] - all of the rows of the column indexed by i (i.e. the entire column)

Each column corresponds to a set of security prices, with time increasing downward.

Grant

Jason,

Yes, there is some mathematical voodoo going on with the simplex projection. When Thomas Wiecki and I worked on the OLMAR algorithm, I sort of convinced myself that everything was done correctly, but the underlying math was too hairy (at least for the effort I was willing to apply).

Grant

Hi Grant,

I thought I understood the simplex projection from the example, but after logging its output, I see I'm wrong.
Though empirically, the simplex projection selects the top mean-reversion candidate (for upward growth) and discards the others. This results in a lot more "mean reversion" than I originally thought it was doing, and thus is most likely the reason for the additional gains beyond standard buy-and-hold.

The simplex projection implementation you are using discards negative numbers. Is this what it's supposed to do? The line:

v = (v > 0) * v

Yes, I believe so. Otherwise, the portfolio vector would have negative elements and shorting would result.

--Grant

Here's an incremental update. I managed to get scipy.optimize.minimize_scalar to work for finding the L1 median:

def l1_median(x):
    """
    Computes the L1 median (spatial median) using scipy.optimize.minimize_scalar

    :param x: a numpy 1D ndarray (vector) of values
    :returns: scalar estimate of the L1 median of the values
    """
    a = float(np.amin(x))
    b = float(np.amax(x))

    res = minimize_scalar(dist_sum, bounds = (a,b), args = tuple(x), method='bounded')

    return res.x

def dist_sum(m,*args):
    """
    1D sum of Euclidean distances

    :param m: scalar position
    :param *args: tuple of positions
    :returns: 1D sum of Euclidean distances
    """
    s = 0
    for x in args:
        s += abs(x-m)

    return s

I updated https://github.com/gkiehne/quantopian/blob/master/l1_median.py with the new code.

Grant

"""
Robust Median Reversion Strategy for On-Line Portfolio Selection,
Dingjiang Huang, Junlong Zhou, Bin Li, Steven C.H. Hoi, and Shuigeng
Zhou. International Joint Conference on Artificial Intelligence, 2013.
http://ijcai.org/papers13/Papers/IJCAI13-296.pdf
"""
import numpy as np
from pytz import timezone
from scipy.optimize import minimize_scalar

def initialize(context):
    """
    Initialize context object.

    The context object is passed to every call of handle_data.
    It is used to store data for use in the algo.

    :param context: context object
    :returns: None
    """
    context.stocks = [ sid(19662), # XLY Consumer Discretionary SPDR Fund
                       sid(19656), # XLF Financial SPDR Fund
                       sid(19658), # XLK Technology SPDR Fund
                       sid(19655), # XLE Energy SPDR Fund
                       sid(19661), # XLV Health Care SPDR Fund
                       sid(19657), # XLI Industrial SPDR Fund
                       sid(19659), # XLP Consumer Staples SPDR Fund
                       sid(19654), # XLB Materials SPDR Fund
                       sid(19660)] # XLU Utilities SPDR Fund

    context.eps = 1 # change epsilon here
    context.init = False
    context.day_count = -1

    # set_slippage(slippage.FixedSlippage(spread=0.00))
    # set_commission(commission.PerTrade(cost=0))
    set_commission(commission.PerShare(cost=0.013, min_trade_cost=1.3))

def handle_data(context, data):
    """
    The main processing function.

    This function is called by the Quantopian backend whenever a market event
    occurs for any of the algorithm's specified securities.

    :param context: context object
    :param data: A dictionary containing market data keyed by security id.
                 It represents a snapshot of your algorithm's universe as of
                 when this method was called.
    """
    # record(cash=context.portfolio.cash)

    if not context.init:
        # initialization: buy the same amount of each security
        part = 1. / len(context.stocks)
        for stock in context.stocks:
            order_target_percent(stock, part)
        context.init = True
        return

    # Trade only once per day, at 10:00
    loc_dt = get_datetime().astimezone(timezone('US/Eastern'))
    if loc_dt.hour != 10 or loc_dt.minute != 0:
        return
    else:
        context.day_count += 1

    if context.day_count % 5 != 0.0:
        return

    if get_open_orders():
        return

    # skip bar if any stocks did not trade
    for stock in context.stocks:
        if data[stock].datetime < get_datetime():
            return

    prices = history(6, '1d', 'price').as_matrix(context.stocks)[0:-1,:]

    parts = rmr_strategy(context.portfolio, context.stocks, data,
                         prices, context.eps)

    # rebalance portfolio according to the new allocation
    for stock, portion in zip(context.stocks, parts):
        order_target_percent(stock, portion)

def rmr_strategy(portfolio, stocks, data, prices, eps):
    """
    Core of the Robust Median Reversion strategy implementation.

    :param portfolio: portfolio object
    :param stocks: list of sid objects used in the algo
    :param data: market event object
    :param prices: historical data in the form of a numpy matrix
    :param eps: epsilon value
    :returns: new allocation for the portfolio securities
    """
    # update portfolio
    b_t = []
    for stock in stocks:
        b_t.append(portfolio.positions[stock].amount * data[stock].price)

    b_t = np.divide(b_t, np.sum(b_t))

    x_tilde = np.zeros(len(stocks))
    for i, stock in enumerate(stocks):
        x_tilde[i] = l1_median(prices[:,i])/prices[-1,i]

    x_bar = x_tilde.mean()

    # Calculate terms for lambda (lam)
    dot_prod = np.dot(b_t, x_tilde)
    num = eps - dot_prod
    denom = (np.linalg.norm((x_tilde - x_bar))) ** 2

    b = b_t
    # test for divide-by-zero case
    if denom != 0.0:
        b = b_t + max(0, num/denom) * (x_tilde - x_bar)

    return simplex_projection(b)

def simplex_projection(v, b=1):
    """
    Projection of vectors onto the simplex domain.

    Implemented according to the paper: Efficient projections onto the
    l1-ball for learning in high dimensions, John Duchi, et al. ICML 2008.
    Implementation Time: 2011 June 17 by [email protected] AT pmail.ntu.edu.sg
    Optimization Problem: min_{w}\| w - v \|_{2}^{2}
    s.t. \sum_{i=1}^{m} w_{i} = z, w_{i} \geq 0

    Input: A vector v \in R^{m}, and a scalar z > 0 (default=1)
    Output: Projection vector w

    :Example:
    >>> proj = simplex_projection([.4 ,.3, -.4, .5])
    >>> print proj
    array([ 0.33333333, 0.23333333, 0. , 0.43333333])
    >>> print proj.sum()
    1.0

    Original matlab implementation: John Duchi ([email protected])
    Python-port: Copyright 2012 by Thomas Wiecki ([email protected]).
    """
    v = np.asarray(v)
    p = len(v)

    # Zero out negative entries, then sort into u in descending order
    v = (v > 0) * v
    u = np.sort(v)[::-1]
    sv = np.cumsum(u)

    rho = np.where(u > (sv - b) / np.arange(1, p+1))[0][-1]
    theta = np.max([0, (sv[rho] - b) / (rho+1)])
    w = (v - theta)
    w[w < 0] = 0
    return w

def l1_median(x):
    """
    Computes the L1 median (spatial median) using scipy.optimize.minimize_scalar

    :param x: a numpy 1D ndarray (vector) of values
    :returns: scalar estimate of the L1 median of the values
    """
    a = float(np.amin(x))
    b = float(np.amax(x))

    res = minimize_scalar(dist_sum, bounds = (a,b), args = tuple(x), method='bounded')

    return res.x

def dist_sum(m,*args):
    """
    1D sum of Euclidean distances

    :param m: scalar position
    :param *args: tuple of positions
    :returns: 1D sum of Euclidean distances
    """
    s = 0
    for x in args:
        s += abs(x-m)

    return s

I'm wondering if L1 vs. standard median is overfitting... Though tomorrow I'll try to work through the paper and the descriptions of the L1 median to try to understand the theory behind it...

I noticed the same thing about alpha in the equations. It is always a very large value. It also renders the epsilon parameter insignificant. In the paper they run the test over epsilon from 1 to 100 and it shows a large increase in total wealth achieved, but that is not the result I am getting. The epsilon parameter has almost no effect on my results. Something is not right somewhere. I implemented it in R, which has implementations of the L1 median.
They are mentioned in the third PDF about the topic that Grant posted.

Hi guys,

I had a brief look at the papers, and while I have never come across the L1 median, from this example a phase-threshold despiking algorithm or something similar to identify outliers in your data might be something to investigate. If you write off your data as too noisy and just take a modified median as a proxy for the mean, aren't you restricting your ability to use it for other statistical purposes? It basically uses a proxy for the first and second derivative and applies an ellipsoid based on various stats to identify possible outliers. After all, the only way to identify an outlier is to compare it to the surrounding data. I can provide papers on it if people are interested, and it's only about 6 lines of matlab code. We used it all the time in fluids to post-process our data to overcome exactly this problem of biased estimates.

Hi folks,

As the paper highlights, trading cost kills the performance for high epsilon (aggressive updating). Here's a version that uses Anony Mole's https://www.quantopian.com/posts/trade-at-the-open-slippage-model on daily bars. It trades every day (this can be adjusted using global trading_frequency = 1 # trading frequency in bars). You can play around with the trailing window size, epsilon, trading frequency, and cost parameters to understand the trade-offs. One issue I see is that there needs to be some way of putting a lower limit on the number of shares to be purchased when re-balancing (e.g. no point in paying $1.30 for a single share).

Grant

"""
Robust Median Reversion Strategy for On-Line Portfolio Selection,
Dingjiang Huang, Junlong Zhou, Bin Li, Steven C.H. Hoi, and Shuigeng
Zhou. International Joint Conference on Artificial Intelligence, 2013.
http://ijcai.org/papers13/Papers/IJCAI13-296.pdf
"""
import numpy as np
from pytz import timezone
from scipy.optimize import minimize_scalar
import pandas as pd

window = 15 # trailing window length in bars
trading_frequency = 1 # trading frequency in bars

def initialize(context):
    """
    Initialize context object.

    The context object is passed to every call of handle_data.
    It is used to store data for use in the algo.

    :param context: context object
    :returns: None
    """
    context.stocks = [ sid(19662), # XLY Consumer Discretionary SPDR Fund
                       sid(19656), # XLF Financial SPDR Fund
                       sid(19658), # XLK Technology SPDR Fund
                       sid(19655), # XLE Energy SPDR Fund
                       sid(19661), # XLV Health Care SPDR Fund
                       sid(19657), # XLI Industrial SPDR Fund
                       sid(19659), # XLP Consumer Staples SPDR Fund
                       sid(19654), # XLB Materials SPDR Fund
                       sid(19660)] # XLU Utilities SPDR Fund

    context.prices = np.zeros([window,len(context.stocks)])

    context.bar_count = 0

    context.eps = 5 # change epsilon here
    context.init = False

def handle_data(context, data):
    """
    The main processing function.

    This function is called by the Quantopian backend whenever a market event
    occurs for any of the algorithm's specified securities.

    :param context: context object
    :param data: A dictionary containing market data keyed by security id.
                 It represents a snapshot of your algorithm's universe as of
                 when this method was called.
    """
    record(cash=context.portfolio.cash)

    if not context.init:
        # initialization: buy the same amount of each security
        part = 1. / len(context.stocks)
        for stock in context.stocks:
            order_target_percent(stock, part)
        context.init = True
        accumulator(context,data)
        return

    # accumulate bars
    accumulator(context,data)

    if context.bar_count < window:
        return

    if context.bar_count % trading_frequency != 0.0:
        return

    if get_open_orders():
        return

    # skip bar if any stocks did not trade
    for stock in context.stocks:
        if data[stock].datetime < get_datetime():
            return

    parts = rmr_strategy(context.portfolio, context.stocks, data,
                         context.prices, context.eps)

    # rebalance portfolio according to the new allocation
    for stock, portion in zip(context.stocks, parts):
        order_target_percent(stock, portion)

def rmr_strategy(portfolio, stocks, data, prices, eps):
    """
    Core of the Robust Median Reversion strategy implementation.

    :param portfolio: portfolio object
    :param stocks: list of sid objects used in the algo
    :param data: market event object
    :param prices: historical data in the form of a numpy matrix
    :param eps: epsilon value
    :returns: new allocation for the portfolio securities
    """
    # update portfolio
    b_t = []
    for stock in stocks:
        b_t.append(portfolio.positions[stock].amount * data[stock].price)

    b_t = np.divide(b_t, np.sum(b_t))

    x_tilde = np.zeros(len(stocks))
    for i, stock in enumerate(stocks):
        x_tilde[i] = l1_median(prices[:,i])/prices[-1,i]

    x_bar = x_tilde.mean()

    # Calculate terms for lambda (lam)
    dot_prod = np.dot(b_t, x_tilde)
    num = eps - dot_prod
    denom = (np.linalg.norm((x_tilde - x_bar))) ** 2

    b = b_t
    # test for divide-by-zero case
    if denom != 0.0:
        b = b_t + max(0, num/denom) * (x_tilde - x_bar)

    return simplex_projection(b)

def simplex_projection(v, b=1):
    """
    Projection of vectors onto the simplex domain.

    Implemented according to the paper: Efficient projections onto the
    l1-ball for learning in high dimensions, John Duchi, et al. ICML 2008.
    Implementation Time: 2011 June 17 by [email protected] AT pmail.ntu.edu.sg
    Optimization Problem: min_{w}\| w - v \|_{2}^{2}
    s.t. \sum_{i=1}^{m} w_{i} = z, w_{i} \geq 0

    Input: A vector v \in R^{m}, and a scalar z > 0 (default=1)
    Output: Projection vector w

    :Example:
    >>> proj = simplex_projection([.4 ,.3, -.4, .5])
    >>> print proj
    array([ 0.33333333, 0.23333333, 0. , 0.43333333])
    >>> print proj.sum()
    1.0

    Original matlab implementation: John Duchi ([email protected])
    Python-port: Copyright 2012 by Thomas Wiecki ([email protected]).
    """
    v = np.asarray(v)
    p = len(v)

    # Zero out negative entries, then sort into u in descending order
    v = (v > 0) * v
    u = np.sort(v)[::-1]
    sv = np.cumsum(u)

    rho = np.where(u > (sv - b) / np.arange(1, p+1))[0][-1]
    theta = np.max([0, (sv[rho] - b) / (rho+1)])
    w = (v - theta)
    w[w < 0] = 0
    return w

def l1_median(x):
    """
    Computes the L1 median (spatial median) using scipy.optimize.minimize_scalar

    :param x: a numpy 1D ndarray (vector) of values
    :returns: scalar estimate of the L1 median of the values
    """
    a = float(np.amin(x))
    b = float(np.amax(x))

    res = minimize_scalar(dist_sum, bounds = (a,b), args = tuple(x), method='bounded')

    return res.x

def dist_sum(m,*args):
    """
    1D sum of Euclidean distances

    :param m: scalar position
    :param *args: tuple of positions
    :returns: 1D sum of Euclidean distances
    """
    s = 0
    for x in args:
        s += abs(x-m)

    return s

def accumulator(context,data):

    if context.bar_count < window:
        for i, stock in enumerate(context.stocks):
            context.prices[context.bar_count,i] = data[stock].price
    else:
        context.prices = np.roll(context.prices,-1,axis=0)
        for i, stock in enumerate(context.stocks):
            context.prices[-1,i] = data[stock].price

    context.bar_count += 1

########################################################
# Slippage model to trade at the open or at a fraction of the open-close range.
# (Class/method scaffolding follows Anony Mole's trade-at-the-open post
# linked above; it was mangled in this listing.)
class TradeAtTheOpenSlippageModel(slippage.SlippageModel):
    '''Class for a slippage model to allow trading at the open
    or at a fraction of the open-to-close range.
    '''
    # Constructor: fraction of the open-to-close range to add to (subtract
    # from) the open, to model executions more optimistically
    def __init__(self, fractionOfOpenCloseRange):
        # Store the fraction of the open-close range to take as the execution price
        self.fractionOfOpenCloseRange = fractionOfOpenCloseRange

    def process_order(self, trade_bar, order):
        openPrice = trade_bar.open_price   # bar fields per the linked post
        closePrice = trade_bar.price

        ocRange = closePrice - openPrice
        ocRange = ocRange * self.fractionOfOpenCloseRange
        if (ocRange != 0.0):
            targetExecutionPrice = openPrice + ocRange
        else:
            targetExecutionPrice = openPrice

        log.info('\nOrder:{0} open:{1} close:{2} exec:{3} side:{4}'.format(
            trade_bar.sid, openPrice, closePrice, targetExecutionPrice,
            'buy' if order.amount > 0 else 'sell'))

        # Create the transaction using the new price we've calculated.
        return slippage.create_transaction(
            order,
            targetExecutionPrice,
            order.amount
        )
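An aside on the accumulator in the listing above: np.roll copies the entire window on every bar, while the deque approach suggested earlier in the thread drops the oldest bar for free. A minimal sketch of that idea; the class name and interface below are mine, not from the algo:

```python
from collections import deque
import numpy as np

class PriceAccumulator(object):
    """Fixed-length rolling price window. deque(maxlen=...) discards the
    oldest bar automatically instead of shifting the whole array."""

    def __init__(self, window):
        self.bars = deque(maxlen=window)

    def append(self, bar_prices):
        # bar_prices: per-stock prices for the current bar
        self.bars.append(list(bar_prices))

    def full(self):
        return len(self.bars) == self.bars.maxlen

    def as_matrix(self):
        # rows = bars (oldest first), columns = stocks
        return np.asarray(self.bars, dtype=float)

# usage sketch: push 5 bars of 2 stocks through a 3-bar window
acc = PriceAccumulator(window=3)
for t in range(5):
    acc.append([100.0 + t, 50.0 + t])
prices = acc.as_matrix()  # shape (3, 2); rows are bars t = 2, 3, 4
```

The resulting matrix has the same (time-down, stocks-across) orientation as context.prices, so rmr_strategy could consume it unchanged.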

lol, I spent a couple of hours trying to find out why my implementation differs from yours, Grant... turns out you changed epsilon. I guess doing that makes the original b_t term more relevant to the output, though.
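For what it's worth, Jason's earlier point about the b_t term being dwarfed is easy to check numerically. With epsilon = 5 and relative-price predictions clustered near 1 (typical for daily ETF bars), lambda comes out on the order of 10^4, and the projection collapses onto the single largest-deviation asset whether the multiplier is 4000 or 5000. The numbers below are made up for illustration, and simplex_projection is repeated so the snippet runs standalone:

```python
import numpy as np

def simplex_projection(v, b=1):
    # Duchi et al. (2008) projection, same as in the algo above
    v = np.asarray(v, dtype=float)
    p = len(v)
    v = (v > 0) * v
    u = np.sort(v)[::-1]
    sv = np.cumsum(u)
    rho = np.where(u > (sv - b) / np.arange(1, p + 1))[0][-1]
    theta = np.max([0, (sv[rho] - b) / (rho + 1)])
    w = v - theta
    w[w < 0] = 0
    return w

b_t     = np.array([0.4, 0.3, 0.3])     # current portfolio (toy numbers)
x_tilde = np.array([1.01, 0.99, 1.00])  # predicted relative prices (toy numbers)
eps     = 5.0

x_bar = x_tilde.mean()
lam = max(0.0, (eps - np.dot(b_t, x_tilde)) /
          np.linalg.norm(x_tilde - x_bar) ** 2)   # ~2e4, dwarfing b_t

dev = x_tilde - x_bar
w4000 = simplex_projection(b_t + 4000 * dev)
w5000 = simplex_projection(b_t + 5000 * dev)
# both put 100% in the largest-deviation asset
```

So for aggressive epsilon the update is effectively winner-take-all, which matches Jason's "what's really different from 4000* than 5000*?" observation.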

Grant,

I spent a bunch of time yesterday looking at this strategy, here's what I got thus far:

if you want some code for reducing trivial rebalancing, you can use the order processing logic from my framework:

    def processOrder(this, data, rebalanceThreshholdPercent=0.05, stopLimitPercent=0.0):
        '''Set rebalanceThreshholdPercent to zero (0.0) to cause the position to
        readjust even if the targetPercentage doesn't change. This is useful for
        reinvesting dividends, etc., but it is set to 0.02 (2 percent) so we
        don't spam orders.'''
        this.currentCapitalSharePercent = this.targetCapitalSharePercent
        estimatedSharePrice = data[this.security.qsec].close_price
        # determine value of percent
        targetSharesValue = this.security.framework.context.portfolio.portfolio_value * this.currentCapitalSharePercent
        targetSharesTotal = int(math.copysign(math.floor(abs(targetSharesValue / estimatedSharePrice)), targetSharesValue))
        targetSharesDelta = targetSharesTotal - this.currentShares

        if targetSharesTotal != 0:
            if abs(targetSharesDelta / (targetSharesTotal * 1.0)) < rebalanceThreshholdPercent:
                logger.debug("{0} ORDER SKIPPED! {1} (change too small) : {2} + {3} => {4} shares".format(this.strategyName, this.security.symbol, this.currentShares, targetSharesDelta, targetSharesTotal))
                # our position change was too small, so we skip rebalancing
                return

        # calc stop limit, if any
        if stopLimitPercent == 0.0:
            stopLimitPrice = None
        else:
            stopLimitPrice = estimatedSharePrice + (estimatedSharePrice * -stopLimitPercent if targetSharesTotal > 0 else estimatedSharePrice * stopLimitPercent)

        # do actual order
        if abs(targetSharesDelta) >= 1:  # cannot perform an order on less than 1 share
            this.security.framework.logger.info("{0} order {1} : {2} + {3} => {4} shares".format(this.strategyName, this.security.symbol, this.currentShares, targetSharesDelta, targetSharesTotal))
            this.currentShares = targetSharesTotal

You'll have to replug it into whatever code you are using, but feel free to use this as a reference.

FYI, to check the 'pure' reversion potential of this, I added some code to do shorts in addition to longs, though the results are not promising (I run my tests from 2012 to present):

long_pos = this.simplex_projection(b);
short_pos = -this.simplex_projection(-b);
return (long_pos + short_pos)/2.0;


Another FYI: I have been trying to use the median of intraday closes over the last 6 days, and that gives pretty similar results to the L1 median. Though the more I try to "optimize" this, the worse my results get :(
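That similarity shouldn't be surprising: the price history fed to l1_median here is one-dimensional, and in one dimension the minimizer of the sum of absolute distances is just the ordinary sample median, so np.median should agree with the optimizer up to its tolerance (the spatial median only differs from the coordinate-wise median in two or more dimensions). A quick standalone check:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def dist_sum(m, *args):
    # 1-D sum of absolute distances, as in the algo above
    return sum(abs(x - m) for x in args)

def l1_median(x):
    a, b = float(np.amin(x)), float(np.amax(x))
    res = minimize_scalar(dist_sum, bounds=(a, b), args=tuple(x),
                          method='bounded')
    return res.x

x = np.array([9.8, 10.0, 10.2, 10.5, 50.0])  # odd count, one outlier
# l1_median(x) and np.median(x) should both land at ~10.2
```

If that holds on real windows too, the scipy call could be swapped for np.median with no change in behavior and a decent speedup.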

FYI, attached is a backtest for a Vanguard ETF portfolio, with a lower limit on the number of shares to be purchased when re-balancing.

The commission and slippage models are as follows:
"""
Robust Median Reversion Strategy for On-Line Portfolio Selection,
Dingjiang Huang, Junlong Zhou, Bin Li, Steven C.H. Hoi, and Shuigeng
Zhou. International Joint Conference on Artificial Intelligence, 2013.
http://ijcai.org/papers13/Papers/IJCAI13-296.pdf
"""

# revised: there needs to be some way of putting a lower limit on the number of
# shares to be purchased when re-balancing (e.g. no point in paying $1.30 for a single share).

import numpy as np
from pytz import timezone
from scipy.optimize import minimize_scalar
import pandas as pd

window = 15 # trailing window length in bars
trading_frequency = 1 # trading frequency in bars

def initialize(context):
    """
    Initialize context object.

    The context object is passed to every call of handle_data.
    It is used to store data for use in the algo.

    :param context: context object
    :returns: None
    """
    #context.stocks = [ sid(19662), # XLY Consumer Discretionary SPDR Fund
    #                   sid(19656), # XLF Financial SPDR Fund
    #                   sid(19658), # XLK Technology SPDR Fund
    #                   sid(19655), # XLE Energy SPDR Fund
    #                   sid(19661), # XLV Health Care SPDR Fund
    #                   sid(19657), # XLI Industrial SPDR Fund
    #                   sid(19659), # XLP Consumer Staples SPDR Fund
    #                   sid(19654), # XLB Materials SPDR Fund
    #                   sid(19660)] # XLU Utilities SPDR Fund
    context.stocks = [ sid(25906),  # Vanguard Health Care ETF
                       sid(35344),  # Vanguard Large-Cap Growth ETF
                       sid(25899),  # Vanguard Small-Cap Growth ETF
                       sid(32521),  # Vanguard Mid-Cap Value ETF
                       sid(26668),  # Vanguard Industrials ETF
                       sid(25908),  # Vanguard Utilities ETF
                       sid(25904),  # Vanguard Financials ETF
                       sid(26670),  # Vanguard Telecom ETF
                       sid(26667),  # Vanguard Energy ETF
                       sid(25905),  # Vanguard Information Technology ETF
                       sid(25902),  # Vanguard Consumer Discretionary ETF
                       sid(25903) ] # Vanguard Consumer Staples ETF

    context.prices = np.zeros([window,len(context.stocks)])

    context.bar_count = 0

    context.eps = 5 # change epsilon here
    context.init = False

def handle_data(context, data):
    """
    The main processing function.

    This function is called by the Quantopian backend whenever a market event
    occurs for any of the algorithm's specified securities.

    :param context: context object
    :param data: A dictionary containing market data keyed by security id.
                 It represents a snapshot of your algorithm's universe as of
                 when this method was called.
    """
    record(cash=context.portfolio.cash)

    if not context.init:
        # initialization: buy the same amount of each security
        part = 1. / len(context.stocks)
        for stock in context.stocks:
            order_target_percent(stock, part)
        context.init = True
        accumulator(context,data)
        return

    # accumulate bars
    accumulator(context,data)

    if context.bar_count < window:
        return

    if context.bar_count % trading_frequency != 0.0:
        return

    if get_open_orders():
        return

    # skip bar if any stocks did not trade
    for stock in context.stocks:
        if data[stock].datetime < get_datetime():
            return

    parts = rmr_strategy(context.portfolio, context.stocks, data,
                         context.prices, context.eps)

    # rebalance portfolio according to the new allocation, skipping
    # orders where the share change would be 10 shares or fewer
    for stock, portion in zip(context.stocks, parts):
        current_stock_position  = context.portfolio.positions[stock].amount
        current_portfolio_value = context.portfolio.portfolio_value
        target_share_difference = (current_portfolio_value * portion) / data[stock].price
        if abs(current_stock_position - target_share_difference) > 10:
            order_target_percent(stock, portion)

def rmr_strategy(portfolio, stocks, data, prices, eps):
"""
Core of Robust Median Reversion strategy implementation.

:param portfolio: portfolio object
:param stocks: list of sid objects used in the algo
:param data: market event object
:param prices: historical data in a form of numpy matrix
:param eps: epsilon value
:returns: new allocation for the portfolio securities
"""
# update portfolio
b_t = []
for stock in stocks:
b_t.append(portfolio.positions[stock].amount * data[stock].price)

b_t = np.divide(b_t, np.sum(b_t))

x_tilde = np.zeros(len(stocks))
for i, stock in enumerate(stocks):
x_tilde[i] = l1_median(prices[:,i])/prices[-1,i]

x_bar = x_tilde.mean()

# Calculate terms for lambda (lam)
dot_prod = np.dot(b_t, x_tilde)
num = eps - dot_prod
denom = (np.linalg.norm((x_tilde - x_bar))) ** 2

b = b_t
# test for divide-by-zero case
if denom != 0.0:
b = b_t + max(0, num/denom) * (x_tilde - x_bar)

return simplex_projection(b)

def simplex_projection(v, b=1):
    """
    Project a vector onto the probability simplex.

    Implemented according to the paper: Efficient projections onto the
    l1-ball for learning in high dimensions, John Duchi, et al. ICML 2008.
    Implementation Time: 2011 June 17 by [email protected] AT pmail.ntu.edu.sg
    Optimization Problem: min_{w} \| w - v \|_{2}^{2}
    s.t. \sum_{i=1}^{m} w_{i} = z, w_{i} \geq 0

    Input: A vector v \in R^{m}, and a scalar z > 0 (default=1)
    Output: Projection vector w

    :Example:
    >>> proj = simplex_projection([.4, .3, -.4, .5])
    >>> print proj
    array([ 0.33333333,  0.23333333,  0.        ,  0.43333333])
    >>> print proj.sum()
    1.0

    Original matlab implementation: John Duchi ([email protected])
    Python-port: Copyright 2012 by Thomas Wiecki ([email protected]).
    """
    v = np.asarray(v)
    p = len(v)

    # zero out negative entries, then sort v into u in descending order
    v = (v > 0) * v
    u = np.sort(v)[::-1]
    sv = np.cumsum(u)

    rho = np.where(u > (sv - b) / np.arange(1, p + 1))[0][-1]
    theta = np.max([0, (sv[rho] - b) / (rho + 1)])
    w = v - theta
    w[w < 0] = 0
    return w

def l1_median(x):
    """
    Computes the L1 median (spatial median) using
    scipy.optimize.minimize_scalar.

    :param x: a numpy 1D ndarray (vector) of values
    :returns: scalar estimate of the L1 median of the values
    """
    a = float(np.amin(x))
    b = float(np.amax(x))

    res = minimize_scalar(dist_sum, bounds=(a, b), args=tuple(x), method='bounded')

    return res.x

def dist_sum(m, *args):
    """
    1D sum of Euclidean distances.

    :param m: scalar position
    :param args: tuple of positions
    :returns: 1D sum of Euclidean distances
    """
    s = 0
    for x in args:
        s += abs(x - m)

    return s
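
Two remarks on `l1_median`. First, for a 1-D sample the minimizer of the sum of absolute distances is just the ordinary median, so the `minimize_scalar` call can be sanity-checked (or replaced) with `np.median`. Second, if I read the paper correctly, RMR's L1-median is the *multivariate* spatial median of the window of price vectors, whereas the code above computes a coordinate-wise median per asset; the multivariate version can be computed with Weiszfeld's fixed-point iteration. A hedged sketch (`weiszfeld_median` is a hypothetical helper, and this version omits the safeguard for the iterate landing exactly on a data point):

```python
import numpy as np

def weiszfeld_median(X, iters=100, tol=1e-6):
    """Multivariate L1 (spatial) median of the rows of X via
    Weiszfeld's fixed-point iteration."""
    y = X.mean(axis=0)
    for _ in range(iters):
        d = np.linalg.norm(X - y, axis=1)
        d = np.where(d < tol, tol, d)      # avoid divide-by-zero
        y_new = (X / d[:, None]).sum(axis=0) / (1.0 / d).sum()
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y

# 1-D cross-check: the L1 median reduces to the ordinary median,
# which is robust to the outlier 100.0 (unlike the mean)
x = np.array([1.0, 2.0, 3.0, 100.0])
print(np.median(x))   # 2.5
```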

def accumulator(context, data):
    """Maintain a rolling window of the last `window` price bars."""
    if context.bar_count < window:
        # still filling the buffer: write into the next empty row
        for i, stock in enumerate(context.stocks):
            context.prices[context.bar_count, i] = data[stock].price
    else:
        # buffer full: shift rows up and overwrite the last row
        context.prices = np.roll(context.prices, -1, axis=0)
        for i, stock in enumerate(context.stocks):
            context.prices[-1, i] = data[stock].price

    context.bar_count += 1
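
The `np.roll` pattern in `accumulator` keeps a fixed-length window by shifting rows up one slot and overwriting the last row. A standalone illustration of the same bookkeeping with toy data (hypothetical 3-bar window, 2 assets; not part of the backtest code):

```python
import numpy as np

window = 3                       # hypothetical window length
prices = np.zeros((window, 2))   # rows = bars, columns = assets

def push(prices, row, bar_count):
    """Fill the buffer until full, then roll and overwrite the
    last row, exactly as accumulator() does once per bar."""
    if bar_count < window:
        prices[bar_count, :] = row
    else:
        prices = np.roll(prices, -1, axis=0)
        prices[-1, :] = row
    return prices

for t, row in enumerate([[1, 10], [2, 20], [3, 30], [4, 40]]):
    prices = push(prices, row, t)
# the buffer now holds the last 3 bars: [[2, 20], [3, 30], [4, 40]]
```

One trade-off worth noting: `np.roll` copies the whole array every bar, which is fine for a small window but O(window) per bar; a circular index or `collections.deque` would avoid the copy.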

########################################################
# Slippage model to trade at the open or at a fraction of the open-close range.
# Note: the class and process_order signatures below were missing from the
# original post; they are reconstructed here following the standard Quantopian
# custom-slippage pattern, so treat them as a best guess.
class TradeAtTheOpenSlippageModel(slippage.SlippageModel):
    '''Class for slippage model to allow trading at the open
    or at a fraction of the open to close range.
    '''
    # Constructor: fraction of the open-to-close range to add (subtract)
    # from the open to model executions more optimistically
    def __init__(self, fractionOfOpenCloseRange):
        # Store the fraction of the open-close range to take as the execution price
        self.fractionOfOpenCloseRange = fractionOfOpenCloseRange

    def process_order(self, trade_bar, order):
        openPrice = trade_bar.open_price
        closePrice = trade_bar.price

        ocRange = closePrice - openPrice
        ocRange = ocRange * self.fractionOfOpenCloseRange
        if ocRange != 0.0:
            targetExecutionPrice = openPrice + ocRange
        else:
            targetExecutionPrice = openPrice

        log.info('\nOrder:{0} open:{1} close:{2} exec:{3} side:{4}'.format(
            order.sid.symbol, openPrice, closePrice, targetExecutionPrice,
            'buy' if order.amount > 0 else 'sell'))

        # Create the transaction using the new price we've calculated.
        return slippage.create_transaction(
            order,
            targetExecutionPrice,
            order.amount
        )

To determine the raw predictive power of these mean reversions, I'm trying to do 50% long, 50% short, using the code I described earlier.

long_pos = this.simplex_projection(b);
short_pos = -this.simplex_projection(-b);
return (long_pos + short_pos)/2.0;


So far, not much luck, it's always just a random walk around 0%. Any thoughts on how I might improve this?

Jason,

Not sure. It seems like the problem could be formulated in a symmetric fashion, as you suggest, so that both long and short positions could be considered. I'll have to think about that one.

Grant
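
For reference, the symmetric long/short combination Jason describes can be sketched end to end as a self-contained toy. It uses the standard Duchi et al. simplex projection (without the non-negativity pre-clamp and the `max(0, theta)` that appear in the backtest code, so it remains a valid projection for arbitrary signals), and `long_short_weights` is a hypothetical helper; by construction the resulting book has zero net exposure.

```python
import numpy as np

def simplex_projection(v, b=1.0):
    """Euclidean projection of v onto {w >= 0, sum(w) = b},
    following Duchi et al., ICML 2008."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]
    sv = np.cumsum(u)
    rho = np.nonzero(u > (sv - b) / np.arange(1, len(v) + 1))[0][-1]
    theta = (sv[rho] - b) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def long_short_weights(b):
    """Dollar-neutral combination: project the raw signal for the
    long book and its negation for the short book, then average."""
    b = np.asarray(b, dtype=float)
    long_pos = simplex_projection(b)
    short_pos = -simplex_projection(-b)
    return (long_pos + short_pos) / 2.0

w = long_short_weights([0.4, 0.3, -0.4, 0.5])
# w sums to zero (dollar-neutral); the most negative signal
# (index 2) ends up short
```

Zero net exposure explains why the equity curve hovers near 0% unless the reversion signal itself has an edge: any market drift cancels between the two books.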

I cloned the code for this strategy and backtested it with recent data. The strategy has lagged the benchmark badly since August 2014: -21% vs. 2.5% over 8/1/2014 - 9/23/2015, with a Sharpe ratio of -0.99. This is horrible performance. Does this strategy really work at all?

Also, on most dates the strategy holds only one or two ETFs, and we know that holding XLE (energy) has really hurt performance over the last year. The strategy held XLE as its only position on many dates.

Thank you all for coding it up.