Median Reversion Strategy for On-Line Portfolio Selection

Hi all,
I have looked at this algorithm and its backtest.
I was wondering if someone could give me a detailed explanation of this algorithm's logic,
and why it is so successful.
I know my question is a bit vague, but any help would be greatly appreciated.

Many thanks all,
Best Regards,

Robust Median Reversion Strategy for On-Line Portfolio Selection,
Dingjiang Huang, Junlong Zhou, Bin Li, Steven C.H. Hoi, and Shuigeng
Zhou. International Joint Conference on Artificial Intelligence, 2013.
import numpy as np
from pytz import timezone
from scipy.optimize import minimize_scalar
import pandas as pd

window = 15 # trailing window length in bars
trading_frequency = 1 # trading frequency in bars

def initialize(context):
    """Initialize the context object.

    The context object is passed to every call of handle_data;
    it is used to store state for the algorithm.

    :param context: context object
    :returns: None
    """
    context.stocks = [ sid(19662), # XLY Consumer Discretionary SPDR Fund
                       sid(19656), # XLF Financial SPDR Fund
                       sid(19658), # XLK Technology SPDR Fund
                       sid(19655), # XLE Energy SPDR Fund
                       sid(19661), # XLV Health Care SPDR Fund
                       sid(19657), # XLI Industrial SPDR Fund
                       sid(19659), # XLP Consumer Staples SPDR Fund
                       sid(19654), # XLB Materials SPDR Fund
                       sid(19660)] # XLU Utilities SPDR Fund

    context.prices = np.zeros([window,len(context.stocks)])
    context.bar_count = 0
    context.eps = 5 # change epsilon here
    context.init = False

    # set_slippage(slippage.FixedSlippage(spread=0.00))
    # set_commission(commission.PerShare(cost=0.013, min_trade_cost=1.3))

def handle_data(context, data):
    """The main processing function.

    Called by the Quantopian backend whenever a market event occurs
    for any of the algorithm's specified securities.

    :param context: context object
    :param data: a dictionary of market data keyed by security id,
        representing a snapshot of the algorithm's universe as of
        when this method was called.
    """
    if not context.init:
        # initialization: buy an equal amount of each security
        part = 1. / len(context.stocks)
        for stock in context.stocks:
            order_target_percent(stock, part)
        context.init = True
    # accumulate bars
    accumulator(context, data)
    if context.bar_count < window:
        return
    # trade only every `trading_frequency` bars
    if context.bar_count % trading_frequency != 0:
        return
    # wait for open orders to fill before rebalancing again
    if get_open_orders():
        return
    # skip this bar if any stock did not trade
    for stock in context.stocks:
        if data[stock].datetime < get_datetime():
            return
    parts = rmr_strategy(context.portfolio, context.stocks, data,
                         context.prices, context.eps)

    # rebalance portfolio according to the new allocation
    for stock, portion in zip(context.stocks, parts):
        order_target_percent(stock, portion)

def rmr_strategy(portfolio, stocks, data, prices, eps):
    """Core of the Robust Median Reversion strategy.

    :param portfolio: portfolio object
    :param stocks: list of sid objects used in the algo
    :param data: market event object
    :param prices: historical prices as a numpy matrix
    :param eps: epsilon (reversion threshold) value
    :returns: new allocation for the portfolio securities
    """
    # update portfolio
    b_t = []
    for stock in stocks:
        b_t.append(portfolio.positions[stock].amount * data[stock].price)

    b_t = np.divide(b_t, np.sum(b_t))

    x_tilde = np.zeros(len(stocks))
    for i, stock in enumerate(stocks):
        x_tilde[i] = l1_median(prices[:,i])/prices[-1,i]

    x_bar = x_tilde.mean()

    # Calculate terms for lambda (lam)
    dot_prod =, x_tilde)
    num = eps - dot_prod
    denom = (np.linalg.norm((x_tilde - x_bar))) ** 2

    b = b_t
    # test for divide-by-zero case
    if denom != 0.0:
        b = b_t + max(0, num/denom) * (x_tilde - x_bar)

    return simplex_projection(b)

def simplex_projection(v, b=1):
    """Project vector v onto the simplex domain.

    Implemented according to the paper: Efficient Projections onto the
    l1-Ball for Learning in High Dimensions, John Duchi et al., ICML 2008.
    Implementation: 17 June 2011 by [email protected]

    Optimization problem: min_{w} ||w - v||_2^2
    s.t. \sum_{i=1}^{m} w_i = b, w_i >= 0

    Input: a vector v in R^m, and a scalar b > 0 (default 1)
    Output: projection vector w

    >>> proj = simplex_projection([.4, .3, -.4, .5])
    >>> print proj
    array([ 0.33333333,  0.23333333,  0.        ,  0.43333333])
    >>> print proj.sum()
    1.0

    Original MATLAB implementation: John Duchi ([email protected])
    Python port: Copyright 2012 by Thomas Wiecki ([email protected]).
    """
    v = np.asarray(v)
    p = len(v)

    # Sort v into u in descending order
    v = (v > 0) * v
    u = np.sort(v)[::-1]
    sv = np.cumsum(u)

    rho = np.where(u > (sv - b) / np.arange(1, p+1))[0][-1]
    theta = np.max([0, (sv[rho] - b) / (rho+1)])
    w = (v - theta)
    w[w < 0] = 0
    return w
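As a standalone sanity check of the projection (restating the function so it runs outside the algorithm), the docstring example can be reproduced directly:

```python
import numpy as np

def simplex_projection(v, b=1):
    # Euclidean projection of v onto {w : sum(w) = b, w >= 0}
    v = np.asarray(v, dtype=float)
    v = (v > 0) * v
    u = np.sort(v)[::-1]
    sv = np.cumsum(u)
    rho = np.where(u > (sv - b) / np.arange(1, len(v) + 1))[0][-1]
    theta = max(0.0, (sv[rho] - b) / (rho + 1))
    return np.maximum(v - theta, 0.0)

proj = simplex_projection([.4, .3, -.4, .5])
print(proj)        # negative entry clipped to 0, rest shifted to sum to b
print(proj.sum())  # 1.0
```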

def l1_median(x):
    """Compute the L1 median (spatial median) using scipy.optimize.minimize_scalar.

    :param x: a 1-D numpy ndarray (vector) of values
    :returns: scalar estimate of the L1 median of the values
    """
    a = float(np.amin(x))
    b = float(np.amax(x))
    res = minimize_scalar(dist_sum, bounds = (a,b), args = tuple(x), method='bounded')
    return res.x

def dist_sum(m,*args):
    """1-D sum of Euclidean distances.

    :param m: scalar position
    :param args: tuple of positions
    :returns: sum of distances from m to each position in args
    """
    s = 0
    for x in args:
        s += abs(x-m)
    return s
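For 1-D data the spatial median reduces to the ordinary median, so the bounded search can be cross-checked against `np.median`. A standalone restatement:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def dist_sum(m, *args):
    # Sum of absolute distances from m to each sample point
    return sum(abs(x - m) for x in args)

def l1_median(x):
    # Minimize total absolute distance over [min(x), max(x)]
    res = minimize_scalar(dist_sum,
                          bounds=(float(np.amin(x)), float(np.amax(x))),
                          args=tuple(x), method='bounded')
    return res.x

x = np.array([9.5, 10.0, 10.1, 10.2, 10.3])
print(l1_median(x), np.median(x))  # both near 10.1 for an odd-length sample
```

For an even-length sample the minimizer is any point in the median interval, so the two can legitimately disagree; the strategy only needs a point estimate, so that ambiguity is harmless.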

def accumulator(context, data):
    """Accumulate the trailing window of prices into context.prices."""
    if context.bar_count < window:
        for i, stock in enumerate(context.stocks):
            context.prices[context.bar_count, i] = data[stock].price
    else:
        context.prices = np.roll(context.prices, -1, axis=0)
        for i, stock in enumerate(context.stocks):
            context.prices[-1, i] = data[stock].price
    context.bar_count += 1
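The buffer logic (fill until the window is full, then shift rows up and overwrite the last row) can be watched in isolation. This sketch feeds a fake two-asset price stream instead of the data object:

```python
import numpy as np

window = 3
prices = np.zeros((window, 2))
bar_count = 0

def accumulate(row):
    # Fill the buffer until it holds `window` bars, then shift rows up
    # and write the newest bar into the last row.
    global bar_count
    if bar_count < window:
        prices[bar_count, :] = row
    else:
        prices[:] = np.roll(prices, -1, axis=0)
        prices[-1, :] = row
    bar_count += 1

for row in [[1, 10], [2, 20], [3, 30], [4, 40]]:
    accumulate(row)
print(prices)  # rows are the 3 most recent bars: [[2,20],[3,30],[4,40]]
```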
# Slippage model to trade at the open or at a fraction of the open-close range.
class TradeAtTheOpenSlippageModel(slippage.SlippageModel):
    '''Slippage model to allow trading at the open,
       or at a fraction of the open-to-close range.
    '''

    def __init__(self, fractionOfOpenCloseRange):
        # Fraction of the open-close range to add to (subtract from) the
        # open, to model executions more or less optimistically.
        self.fractionOfOpenCloseRange = fractionOfOpenCloseRange

    def process_order(self, trade_bar, order):
        openPrice = trade_bar.open_price
        closePrice = trade_bar.price
        ocRange = (closePrice - openPrice) * self.fractionOfOpenCloseRange
        if ocRange != 0.0:
            targetExecutionPrice = openPrice + ocRange
        else:
            targetExecutionPrice = openPrice'\nOrder:{0} open:{1} close:{2} exec:{3} side:{4}'.format(
            trade_bar.sid.symbol, openPrice, closePrice,
            targetExecutionPrice, order.direction))

        # Create the transaction using the new price we've calculated.
        return slippage.create_transaction(
            trade_bar, order, targetExecutionPrice, order.amount)
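The execution price the model computes is simply the open plus a fraction of the open-to-close move. Outside of Quantopian that reduces to one line (the function name is mine):

```python
def fractional_open_close_fill(open_price, close_price, fraction):
    # Execute at the open plus a fraction of the open-to-close move;
    # fraction = 0 fills exactly at the open, fraction = 1 at the close.
    return open_price + fraction * (close_price - open_price)

print(fractional_open_close_fill(100.0, 102.0, 0.25))  # 100.5
```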
6 responses

That's very impressive. Thanks for sharing this with us. I have no input because this is frankly beyond my skill level.

What was your leverage at for this backtest?

I checked. No leverage.

I believe it has to do with slippage and commissions. I ran it with the following slippage and commission settings:

# Assumes that your entire buy order will be filled 0.01 higher and a sell
# order will be filled 0.01 lower (could still be conservative depending on
# order size and liquidity)

# Slightly more expensive than IB's 0.005 per share with a $1 minimum
set_commission(commission.PerShare(cost=0.01, min_trade_cost=1))

With these parameters, the algorithm went to -99% returns in the two months between the beginning of January and the end of February this year. The algorithm only looks so profitable because of very unrealistic transaction-cost and slippage assumptions.
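To get a feel for why costs matter so much, here is a rough estimate of commission drag under daily rebalancing. All the numbers below are hypothetical assumptions, not taken from the backtest; with minute-level turnover, or once slippage is added on top, the drag multiplies accordingly:

```python
# Rough commission drag: turnover per rebalance * per-share cost, over a year.
capital = 100_000.0
avg_share_price = 40.0     # assumed average ETF price
turnover = 0.5             # assume half the book trades each daily rebalance
cost_per_share = 0.01      # IB-like commission used above

shares_traded = capital * turnover / avg_share_price
daily_cost = shares_traded * cost_per_share
annual_drag = daily_cost * 252 / capital
print(round(annual_drag * 100, 2), '% per year')  # 3.15 % per year
```

Even this modest per-share cost eats several percent a year before slippage; an aggressive mean-reversion strategy that churns far more than 50% of the book per bar compounds that into catastrophic losses.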

As a follow up, the original model has the following line of code (a custom slippage model):


If we are to assume that this model is as accurate as possible, we still cannot ignore the commissions system, which has been set to:


Thus, if we keep their slippage model and just change the commission scheme to the one mentioned in my prior post, the algorithm, when run from Jan 4th 2015 to present on minute data hits -99% returns by mid-March. This tells me that the original algorithm's hyperinflated results are almost entirely because of the lack of commissions being built into the system.

Leverage is the ratio of the amount spent to initial capital. If borrowing occurs, leverage goes above 1 (borrowing, negative cash), and that condition is called margin. As soon as one's first buy occurs, leverage is no longer zero. Examples: the leverage for a first purchase of $5 (including commission) on $10 initial capital would be 0.5; spending $15 on $10 initial capital would be 1.5.
Most of the time when we refer to "leverage" we really mean 'margin': negative cash, borrowing.
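The examples in the paragraph above, as arithmetic (function name is mine):

```python
def leverage(amount_spent, initial_capital):
    # Leverage = total amount deployed / initial capital;
    # values above 1 mean borrowing (negative cash, i.e. margin).
    return amount_spent / initial_capital

print(leverage(5.0, 10.0))    # 0.5: half the capital deployed
print(leverage(15.0, 10.0))   # 1.5: $5 borrowed on $10 capital
```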

Here's another way to keep an eye out for margin. It won't tax the logging limits too much because it only logs new highs (or lows in the second one), and can be easily dropped into an algo (it does its own initialization for storing the previous value to be able to compare):

def handle_data(context, data):  
    # Log any new leverage high  
    if 'leverage_high' not in context:  
        context.leverage_high = 0  
    if context.account.leverage > context.leverage_high:  
        context.leverage_high = context.account.leverage'leverage_high {}'.format(context.leverage_high))  

Sometimes an easier way to think of it is lowest cash used instead, and this does that:

def handle_data(context, data):  
    # Log any new lowest cash level  
    if 'cash_low' not in context:  
        context.cash_low = context.portfolio.starting_cash  
    if < context.cash_low:  
        context.cash_low ='cash low {}'.format(int(context.cash_low)))  

You used record() for cash; that's great.
Side note: one thing folks ought to keep in mind is that the custom chart necessarily smooths values in order to render nicely, so mousing over it can show averaged values that never actually happened. For example, if you record just two values, 0 on some bars and 100 on others, the chart will sometimes show a value of, say, 60 due to smoothing.
Lines like those above provide precision if needed.

By the way, did you know you can click custom chart legend items to toggle them off and on? I just discovered that a few days ago, useful sometimes when two or more recorded sets of values are in widely different ranges from each other.

You'll find that the algo above uses a lot of leverage/margin. It had already spent twice the initial capital five days in (leverage of 2.0040087117), and three times that a couple of years later; I didn't have the patience to wait for the full test. It is useful to know that, unfortunately, the chart and metrics currently ignore margin (negative cash, borrowing). On our own we can calculate profit divided by the maximum amount spent (including margin); that gives an idea of the code's merit no matter how much margin happens to be used as code changes are made, and it spares the effort of trying to stay as close as possible to 100% utilization of initial capital. Good luck.
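That yardstick, profit divided by the maximum capital actually deployed (margin included), can be computed from a cash series. A sketch with hypothetical numbers (the function name is mine):

```python
def return_on_max_deployed(starting_cash, cash_series, final_value):
    # Maximum deployed = starting cash minus the lowest cash level;
    # negative cash means borrowing, which enlarges the denominator.
    max_deployed = starting_cash - min(cash_series)
    profit = final_value - starting_cash
    return profit / max_deployed

# Example: $10k start, cash dipped to -$10k (i.e. $20k deployed at the peak,
# 2x leverage), finished at $12k. The $2k profit was earned on $20k at risk.
print(return_on_max_deployed(10_000, [4_000, -10_000, 1_000], 12_000))  # 0.1
```

This is why a headline "total return" computed against initial capital alone can flatter a strategy that quietly borrows.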