Simple Monthly Relative Strength Strategy by DaveG

This is my first algorithm attempt (and I'm still learning Python)....

The algo is actually inspired by www.etfreplay.com: on the second-last trading day of each month (because I want to trade on the last day of the month and need to take the Quantopian "lag" into account) I calculate the relative strength as RS = 0.5 * 62-day performance + 0.3 * 20-day performance + 0.2 * annualized 20-day volatility, where each component is combined as a rank and lower volatility ranks better.
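
To make the formula concrete, here is a minimal standalone pandas sketch of the ranking (illustrative only - the function and variable names are mine, and the components are combined as ranks with lower volatility ranking better, just as in the full code below):

import pandas as pd

def relative_strength(prices):
    # prices: DataFrame of daily closes, one column per ETF, 63 rows (trading days)
    perf_62 = prices.iloc[-1] / prices.iloc[0] - 1                  # ~62-day performance
    perf_20 = prices.iloc[-1] / prices.iloc[-21] - 1                # ~20-day performance
    vol_20 = prices.pct_change().iloc[-20:].std() * (252 ** 0.5)    # annualized 20-day volatility

    # weighted combination of ranks; a higher score means a stronger ETF
    return perf_62.rank() * 0.5 + perf_20.rank() * 0.3 + vol_20.rank(ascending=False) * 0.2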

I did some optimization of the parameters using zipline - would be nice if we could do the same with Quantopian :)

DaveG

from zipline.utils import tradingcalendar

import pytz
import datetime as dt
from math import sqrt
import pandas

window = 63
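
# NOTE: get_trading_dates() is used in initialize() below, but its definition
# did not survive in this post. The following is a best-guess reconstruction,
# assuming tradingcalendar.trading_days is the usual DatetimeIndex of NYSE
# trading days: for each month between start and end it returns the trading
# day at `offset` from the month's last trading day, so offset=-1 gives the
# second-last trading day of each month. Depending on how get_datetime() is
# reported in your backtest, both sides of the comparison may need to be
# normalized to dates.
def get_trading_dates(start, end, offset=0):
    days = tradingcalendar.trading_days
    days = days[(days >= start) & (days <= end)]
    dates = []
    for _, month in pandas.Series(days, index=days).groupby([days.year, days.month]):
        idx = len(month) - 1 + offset
        if 0 <= idx < len(month):
            dates.append(month.index[idx])
    return dates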

def lookup(sid):
    symbol = {'Security(2174)': 'DIA', 'Security(8554)': 'SPY', 'Security(32012)': 'IHE', 'Security(32290)': 'FBT',
              'Security(25903)': 'VDC', 'Security(12915)': 'MDY', 'Security(33370)': 'UUP', 'Security(23911)': 'SHY',
              'Security(22972)': 'EFA', 'Security(14517)': 'EWC', 'Security(18387)': 'BND', 'Security(23870)': 'IEF',
              'Security(26669)': 'VNQ', 'Security(33080)': 'INP', 'Security(28054)': 'DBC', 'Security(27102)': 'VWO',
              'Security(32616)': 'EEB', 'Security(32266)': 'MYY', 'Security(23921)': 'TLT', 'Security(23134)': 'ILF',
              'Security(23118)': 'EPP', 'Security(21757)': 'EWZ', 'Security(33334)': 'FXY', 'Security(26703)': 'FXI'}

    return symbol[str(sid)]


@batch_transform(window_length=window)
def get_past_prices(data):
    # data is the trailing `window`-day history: one column per sid, one row per date
    prices = data['price']
    # handle_data expects [price, buy_prices, sell_prices]; here all three
    # are simply the close price panel
    buy_prices = data['price']
    sell_prices = data['price']
    return [prices, buy_prices, sell_prices]
def initialize(context):

    # STOCKS = ['DIA','SPY','IHE','FBT','VDC','MDY','UUP','SHY','EFA','EWC','BND','IEF','VNQ','INP','DBC','VWO','EEB','MYY','TLT','ILF','EPP','EWZ','FXY','FXI']
    context.stocks = [sid(2174), sid(8554), sid(32012), sid(32290), sid(25903), sid(12915), sid(33370), sid(23911),
                      sid(22972), sid(14517), sid(23870), sid(26669), sid(33080), sid(28054), sid(27102), sid(32616),
                      sid(32266), sid(23921), sid(23134), sid(23118), sid(21757), sid(33334), sid(26703)]

    context.top_n = 2

    # shares in portfolio at last rebalance date
    context.last = []

    context.capital_base = 1000000.
    context.notional = 1000000.

    # want to rebalance on the last trading day of the month;
    # because of the event-driven nature of zipline, this means
    # that we need to apply the ranking formula a day earlier
    start = dt.datetime(2003, 1, 1, 0, 0, 0, 0, pytz.utc)
    end = dt.datetime(2013, 7, 1, 0, 0, 0, 0, pytz.utc)
    context.event_dates = get_trading_dates(start, end, offset=-1)

def handle_data(context, data):

    # get prices for each security
    # all_prices[0] = price, [1] = buy_prices, [2] = sell_prices
    # these are set up in the get_past_prices() transform
    all_prices = get_past_prices(data)

    # circuit breaker in case the transform returns None (window not yet full)
    if all_prices is None:
        return

    buy_prices = all_prices[1]
    sell_prices = all_prices[2]
    all_prices = all_prices[0]

    # circuit breaker: only calculate on the 2nd-last trading day of the month
    if get_datetime() in context.event_dates:

        # log.debug('%s' % get_datetime())

        # daily returns
        d_returns = all_prices / all_prices.shift(1) - 1

        # calculate 20 and 62 day performance and 20 day volatility
        perf_62 = all_prices.ix[-1] / all_prices.ix[0] - 1
        perf_20 = all_prices.ix[-1] / all_prices.ix[window - 1 - 20] - 1
        vol_20 = d_returns[window - 1 - 19:].std() * sqrt(252)

        # weighted rank: higher performance and lower volatility rank better
        rs = perf_62.rank() * 0.5 + perf_20.rank() * 0.3 + vol_20.rank(ascending=False) * 0.2
        rs_modified = rs * 10 + perf_62.max() - perf_62 / perf_62.max()

        ranks = rs_modified.rank(ascending=False)

        # holdings[sym] = 1 for the top_n ranked securities, 0 otherwise
        holdings = pandas.Series(data=ranks.values, index=ranks.index)
        for symbol in holdings.index:
            if holdings[symbol] <= context.top_n:
                holdings[symbol] = 1
            else:
                holdings[symbol] = 0

        # hold shares already in the last rebalance portfolio,
        # rebalance the remainder in equal proportions

        # shares in portfolio for this rebalance date
        this = [sym for sym in context.stocks if holdings[sym] > 0]

        # only shares in 'last' and not in 'this' are liquidated, and the
        # liquidation value is used to buy shares in 'this' not in 'last'
        # in equal proportions
        for symbol in [sym for sym in context.last if sym not in this]:
            qty = context.portfolio.positions[symbol].amount
            # treat nan as 0
            if qty > 0:
                pass
            else:
                qty = 0

            order(symbol, -qty)
            context.notional += qty * sell_prices[symbol][-1]

            log.debug('  %s' % symbol + ' %s' % -qty + ' @ ' + '%s' % sell_prices[symbol][-1] + ' Cash = ' + str(context.notional))

        liquidation_value = context.notional

        if len([sym for sym in this if sym not in context.last]) > 0:
            # divide available cash equally among the new positions
            weight_per_share = 1. / len([sym for sym in this if sym not in context.last])

            for symbol in [sym for sym in this if sym not in context.last]:

                qty = int(liquidation_value * weight_per_share / buy_prices[symbol][-1])

                order(symbol, qty)
                context.notional -= qty * sell_prices[symbol][-1]
                log.debug('  %s' % symbol + ' %s' % qty + ' @ ' + '%s' % buy_prices[symbol][-1] + ' Cash = ' + str(context.notional))

        context.last = this
10 responses

I ran your algo with a pseudo-random universe of stocks instead of your optimized list. The results aren't pretty. In 2008 the algo starts borrowing heavily just before the credit crunch and never recovers.

from zipline.utils import tradingcalendar

import pytz
import datetime as dt
from math import sqrt
import pandas

window = 63

def capital_used_and_free_cash(context):
    # calculate capital used from absolute position sizes: abs(amount) * cost_basis
    pos = context.portfolio.positions
    capital_used = sum(abs(pos[s].amount) * pos[s].cost_basis for s in pos)
    record(capital_used=capital_used)
    # calculate free cash: starting_cash + pnl - capital_used
    port = context.portfolio
    free_cash = port.starting_cash + port.pnl - capital_used
    record(free_cash=free_cash)
    return


@batch_transform(window_length=window)
def get_past_prices(data):
    # data is the trailing `window`-day history: one column per sid, one row per date
    prices = data['price']
    # handle_data expects [price, buy_prices, sell_prices]; here all three
    # are simply the close price panel
    buy_prices = data['price']
    sell_prices = data['price']
    return [prices, buy_prices, sell_prices]
def initialize(context):

    # STOCKS = ['DIA','SPY','IHE','FBT','VDC','MDY','UUP','SHY','EFA','EWC','BND','IEF','VNQ','INP','DBC','VWO','EEB','MYY','TLT','ILF','EPP','EWZ','FXY','FXI']
    # context.stocks = [sid(2174), sid(8554), sid(32012), sid(32290), sid(25903), sid(12915), sid(33370), sid(23911),
    #                   sid(22972), sid(14517), sid(23870), sid(26669), sid(33080), sid(28054), sid(27102), sid(32616),
    #                   sid(32266), sid(23921), sid(23134), sid(23118), sid(21757), sid(33334), sid(26703)]
    set_universe(universe.DollarVolumeUniverse(97.0, 99.0))

    context.top_n = 2

    # shares in portfolio at last rebalance date
    context.last = []

    context.capital_base = 1000000.
    context.notional = 1000000.

    # want to rebalance on the last trading day of the month;
    # because of the event-driven nature of zipline, this means
    # that we need to apply the ranking formula a day earlier
    start = dt.datetime(2003, 1, 1, 0, 0, 0, 0, pytz.utc)
    end = dt.datetime(2013, 7, 1, 0, 0, 0, 0, pytz.utc)
    # get_trading_dates() is the same helper as in the original algo above
    context.event_dates = get_trading_dates(start, end, offset=-1)

def handle_data(context, data):

    capital_used_and_free_cash(context)

    # get prices for each security
    # all_prices[0] = price, [1] = buy_prices, [2] = sell_prices
    # these are set up in the get_past_prices() transform
    all_prices = get_past_prices(data)

    # circuit breaker in case the transform returns None (window not yet full)
    if all_prices is None:
        return

    buy_prices = all_prices[1]
    sell_prices = all_prices[2]
    all_prices = all_prices[0]

    # circuit breaker: only calculate on the 2nd-last trading day of the month
    if get_datetime() in context.event_dates:

        # log.debug('%s' % get_datetime())

        # daily returns
        d_returns = all_prices / all_prices.shift(1) - 1

        # calculate 20 and 62 day performance and 20 day volatility
        perf_62 = all_prices.ix[-1] / all_prices.ix[0] - 1
        perf_20 = all_prices.ix[-1] / all_prices.ix[window - 1 - 20] - 1
        vol_20 = d_returns[window - 1 - 19:].std() * sqrt(252)

        # weighted rank: higher performance and lower volatility rank better
        rs = perf_62.rank() * 0.5 + perf_20.rank() * 0.3 + vol_20.rank(ascending=False) * 0.2
        rs_modified = rs * 10 + perf_62.max() - perf_62 / perf_62.max()

        ranks = rs_modified.rank(ascending=False)

        # holdings[sym] = 1 for the top_n ranked securities, 0 otherwise
        holdings = pandas.Series(data=ranks.values, index=ranks.index)
        for symbol in holdings.index:
            if holdings[symbol] <= context.top_n:
                holdings[symbol] = 1
            else:
                holdings[symbol] = 0

        # hold shares already in the last rebalance portfolio,
        # rebalance the remainder in equal proportions

        # shares in portfolio for this rebalance date
        # this = [sym for sym in context.stocks if holdings[sym] > 0]
        # guard against sids in today's data that are not in the ranking panel
        # (the dollar-volume universe can change over time)
        this = [sym for sym in data.keys() if sym in holdings.index and holdings[sym] > 0]

        # only shares in 'last' and not in 'this' are liquidated, and the
        # liquidation value is used to buy shares in 'this' not in 'last'
        # in equal proportions
        for symbol in [sym for sym in context.last if sym not in this]:
            qty = context.portfolio.positions[symbol].amount
            # treat nan as 0
            if qty > 0:
                pass
            else:
                qty = 0

            order(symbol, -qty)
            context.notional += qty * sell_prices[symbol][-1]

            log.debug('  %s' % symbol + ' %s' % -qty + ' @ ' + '%s' % sell_prices[symbol][-1] + ' Cash = ' + str(context.notional))

        liquidation_value = context.notional

        if len([sym for sym in this if sym not in context.last]) > 0:
            # divide available cash equally among the new positions
            weight_per_share = 1. / len([sym for sym in this if sym not in context.last])

            for symbol in [sym for sym in this if sym not in context.last]:

                qty = int(liquidation_value * weight_per_share / buy_prices[symbol][-1])

                order(symbol, qty)
                context.notional -= qty * sell_prices[symbol][-1]
                log.debug('  %s' % symbol + ' %s' % qty + ' @ ' + '%s' % buy_prices[symbol][-1] + ' Cash = ' + str(context.notional))

        context.last = this

Hi Dennis C,

thanks for the interest. You're absolutely right about the ETF basket chosen. Choosing stocks at random will undoubtedly produce LOTS of 'baskets' that don't perform at all; that's why you need to optimize. As I mentioned, I used zipline to do this offline, as there's really no way to do it in Quantopian. The result was a combination of ETFs, RS weightings, lookback parameters and the number of ETFs to hold.

With long-term investing in mind, I would venture that portfolio optimization is a must - mean-variance, efficient frontier or whatever. Clearly, this hardly applies (if at all) to day trading. If Quantopian is also interested in attracting long-term investors (like me), then optimization support will be essential. Of course, there is always zipline offline.

rgds,
DaveG

What period did you optimize for? Was it the same years as the backtest you posted?

I'd be interested in seeing your basket optimized for 2007 and backtested on 2008. Same for 08/09, 09/10, etc.

The optimization I've done is very rudimentary, using different start dates up to the present. I deliberately included 2007-2008 in the example to show the performance before, during and after the 2008 melt-down. What you are asking can easily be achieved by porting the code to zipline and looping through all the combinations you're interested in (you could also do the same online manually, of course :) ). I'm not suggesting that the optimization process is easy - I'm doing it by stupid brute force, which is computation-heavy and time-consuming. It would be great to have someone with much better knowledge of this kind of optimization weigh in on this conversation, especially on the choice of the 'optimum' basket to use for different market conditions.
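
To illustrate what I mean by brute force, the offline loop is really just a nested parameter sweep, something like this sketch (run_backtest is only a placeholder for however you drive zipline and score a run, and the grids are examples):

import itertools

def run_backtest(weights, lookbacks, top_n):
    # placeholder: plug in a zipline run here and return a score
    # (e.g. total return or Sharpe over the period you care about)
    return 0.0

weight_grid = [(0.5, 0.3, 0.2), (0.4, 0.3, 0.3), (0.6, 0.2, 0.2)]   # (long perf, short perf, volatility)
lookback_grid = [(63, 21), (62, 20), (126, 21)]                      # (long window, short window)
top_n_grid = [1, 2, 3]                                               # number of ETFs to hold

results = []
for weights, lookbacks, top_n in itertools.product(weight_grid, lookback_grid, top_n_grid):
    score = run_backtest(weights, lookbacks, top_n)
    results.append((score, weights, lookbacks, top_n))

# best-scoring parameter set for the tested period (beware: this is exactly where overfitting creeps in)
print(max(results))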

Thanks Dave. That's totally understandable.

But the reason I was asking was because of optimization bias / overfitting. You didn't answer whether your optimization overlapped the dates you are showing above.

Edit: Sorry, you did answer: "using different start dates until the present". But that is a concerning answer since it completely overlaps.

This is an interesting thread and very similar to the one I was going to post. I've seen this issue referred to as "data mining". Now, I've heard the phrase "data mining" used in a positive sense in biomolecular engineering, but I see it used in a negative sense with regard to trading algorithms, and for good reason. Essentially, by "optimizing" performance on a given set of data and then testing on that same data set, you bias the results toward the positive.

I am having much difficulty with this issue now, and I've also tried to go about it in another way. Essentially I calculate the "ETF replay" score (which I think, by the way, may be (0.4 * 63-day return) + (0.3 * 21-day return) MINUS (0.3 * 21-day volatility)), but rather than running a backtest, I plot the value of the equation on the x-axis versus the 63-day FUTURE performance. I see no correlation. Is it wrong to assume a correlation must exist for a trading strategy to be more than dumb luck?
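
Here is roughly how I'm doing that check, as a rough sketch (the weights are my guess at the ETFreplay formula, and the prices DataFrame is whatever daily closes you load):

import pandas as pd

def score_vs_forward_return(prices):
    # prices: DataFrame of daily closes, one column per ETF
    ret_63 = prices.pct_change(63)
    ret_21 = prices.pct_change(21)
    vol_21 = prices.pct_change().rolling(21).std() * (252 ** 0.5)

    # ETFreplay-style score: reward return, penalize volatility
    score = 0.4 * ret_63 + 0.3 * ret_21 - 0.3 * vol_21

    # realized return over the NEXT 63 trading days
    forward_63 = prices.pct_change(63).shift(-63)

    # rank correlation between today's score and the forward return, per ETF
    return {etf: score[etf].corr(forward_63[etf], method='spearman')
            for etf in prices.columns}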

@Daniel, I think you are on the right track. However I am by no means an expert on this.

My take is that data mining can easily result in overfitting. And when that happens the optimization is unlikely to be useful on out of sample data (e.g. a time period outside of your optimization). And you won't notice if you test on the same time period.

In other words, if you optimize and test on the same time period, then during testing you have a "time travel problem". This is because your optimization used data from the future relative to the "current date" within the backtest. This is also known as "look-ahead bias".

To find a cure for this problem we must slay two issues: overfitting and time travel.

For overfitting, I believe the answer is to use simple models (linear models are good) to reduce the chance of curve fitting. And don't spend a lot of time tweaking. Even better, let your algorithm find "natural" ratios instead of putting in magic numbers. So an algorithm that responds to a 200-day trailing window is better than an algorithm based on "optimization" of years of historical data.

For time travel (aka look ahead bias) you should always validate your algorithm on previously unseen time periods. This includes all tinkering and development you did to choose your algo and tweaks you made to it. For instance you could do all of your testing on 2009-2010 time periods (hopefully randomizing within that) and then validate on 2011-2012.

If you have been developing on all time periods (2002-2013) and have built a great "optimized" algo, then you will have no choice but to start live paper trading to validate it, because the only "unseen" time period is strictly the future. (That is the great secret of "walk-forward" testing, since you are guaranteed to get unseen data.) The downside is that your validation test will take days or weeks!
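
As a rough sketch of what I mean by splitting the data (the dates and the metric are only examples; the point is simply that the validation window is never touched during development):

import pandas as pd

def annualized_sharpe(daily_returns):
    # naive Sharpe ratio, ignoring the risk-free rate
    return daily_returns.mean() / daily_returns.std() * (252 ** 0.5)

def in_and_out_of_sample(daily_returns):
    # daily_returns: Series of the strategy's daily returns with a DatetimeIndex
    development = daily_returns['2009':'2010']   # tweak and optimize here only
    validation = daily_returns['2011':'2012']    # look at this once, at the very end
    return annualized_sharpe(development), annualized_sharpe(validation)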

I agree on all fronts Dennis. I guess we'll just have to keep these issues in mind as we move forward. I just wish there was a more systematic way of proceeding.

Where did you learn to code like that bro? Sort the lists using functions etc?
I just discovered quantopian the other day.. made an account.. wrote my first bot this weekend. Just realized it's very similar to yours, etf replay style :)

Each month, buy the top 2 by relative strength out of this basket: VTI, VEU, TLT, VNQ, GLD (Ivy5).
It has a 16.2% CAGR since 2003 with a 12.6% drawdown (on ETF Replay).
I wonder why these RS strategies are all lagging in the 2012-2015 period though - maybe there are too many RS strategies now, so RS-killers were developed?

My coding is horrible - I'm basically keeping the "max" and "maxposition" inside a while loop, not even using sort :(
I was wondering if you had any tips on the quickest, fastest way to code better :)

thanks a lot!

Hi Ben - do you know any other programming languages? If you are new to programming, I would recommend googling a tutorial. I find other languages, like C++, more ubiquitous, so I usually recommend starting there. However, there is nothing wrong with starting with Python. If you are already familiar with programming - i.e. the concepts of loops, conditions, etc. - then I'd search for a language reference and just keep at it. Sharing parts of your code can bring helpful insight from others.
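
For your specific "top 2 without a while loop" question, something like this sketch is usually all you need (rs_scores stands in for whatever relative-strength numbers your bot computes):

# rs_scores: dict (or pandas Series) mapping symbol -> relative strength score
rs_scores = {'VTI': 0.12, 'VEU': 0.08, 'TLT': 0.03, 'VNQ': 0.15, 'GLD': 0.05}

# sort symbols by score, highest first, and keep the top two
top_2 = sorted(rs_scores, key=rs_scores.get, reverse=True)[:2]
print(top_2)   # ['VNQ', 'VTI']

# or, with pandas:
# import pandas as pd
# top_2 = pd.Series(rs_scores).nlargest(2).index.tolist()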