Mean Reversion and Momentum Switching Model

Hello All,

This is my first post, so please forgive me if it reads like a run-on sentence... If my delivery seems familiar, you might know me from my blog, MKTSTK.com. If so, you know I generally DGAF about dropping some truly valuable information on the community just for fun, so without further ado...

Today I wanted to present a little strategy that's been causing some buzz amongst the prop traders and hedge fundos I've been talking to. In case you were wondering whether Quantopian gives you an edge, this should be proof positive of their commitment to attracting top-notch datasets. Please note that although this strat uses the fetcher API, you should soon be able to use the PsychSignal datasets I've developed directly in Quantopian, in addition to the PsychSignal Trader Mood API, which is already available for free for backtesting and live trading.

The strategy is driven by a daily datafeed I created called the HIVE-MIND. The Hive, as it's called for brevity's sake, is made up of two distinct components: 1) the Hive-Bot and 2) the Hive-Net.

The Hive-Bot measures the activity of a symbol with respect to the social media landscape. The Hive-Bot transforms the multidimensional social message flow into a simple scale between 0 and 1.0, called the Social Anomaly Score (SAS). At the high end, 1.0 represents a frenzy level of activity related to a symbol. In the middle, 0.5 is meant to signify a normal social pattern (i.e. what is expected given the historical profile of the symbol over time). At the other extreme, a reading near 0.0 represents a low amount of interest.
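To make the scale concrete, here is a tiny, purely illustrative bucketing of an SAS reading. The cutoffs at thirds follow the rough low/medium/high partition mentioned later in this thread; they are not part of the Hive-Bot spec.

```python
def classify_sas(sas):
    """Bucket a Social Anomaly Score into a rough activity regime.

    The cutoffs at thirds are an illustrative heuristic, not part of
    the Hive-Bot specification.
    """
    if not 0.0 <= sas <= 1.0:
        raise ValueError("SAS is defined on [0, 1]")
    if sas < 1.0 / 3.0:
        return "low"      # little social interest in the symbol
    elif sas < 2.0 / 3.0:
        return "normal"   # activity in line with the historical profile
    return "frenzy"       # anomalously high social activity

print(classify_sas(0.05))  # low
print(classify_sas(0.50))  # normal
print(classify_sas(0.90))  # frenzy
```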

My research has shown that the Hive-Bot's SAS is predictive of volatility and correlation, so it makes sense to use it as a market-timing mechanism. There are many possible forms this could take within a real trading strategy. One conceivable usage is to switch between mean-reverting and momentum strategies. Despite many idiosyncrasies, trading strategies often break down into the simplistic categories of being levered to momentum or to mean reversion. If one could differentiate, a priori, between mean-reverting and momentum periods in the market, one could make a fortune... but how might you construct such a strategy?

The following presents a model which combines both mean-reverting and momentum-based strategies. While it is not levered by default, the strategy can choose to employ leverage when it is advantageous. It uses the Hive-Bot's SAS to sense when to switch between mean-reverting and momentum regimes, gauges momentum with two different look-back windows, and trades around 50 equity ETFs.

Since I'm new to Q, any feedback would be much appreciated. Happy trading!

# Import the libraries we will use here.
from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline import Pipeline, CustomFilter
from quantopian.pipeline.factors import AverageDollarVolume, Returns
from quantopian.pipeline import CustomFactor
from quantopian.pipeline.data.psychsignal import aggregated_twitter_withretweets_stocktwits as psychsignal

import pandas as pd
import numpy as np

# Universe of ~50 liquid equity ETFs traded by the strategy.
trading_symbols = (sid(8554),   # SPY
                   sid(2174),   # DIA
                   sid(19920),  # QQQ
                   sid(21519),  # IWM
                   sid(21513),  # IVV
                   sid(21520),  # IWV
                   sid(34385),  # VEA
                   sid(28073),  # XBI
                   sid(19658),  # XLK
                   sid(19661),  # XLV
                   sid(19659),  # XLP
                   sid(19660),  # XLU
                   sid(19655),  # XLE
                   sid(19657),  # XLI
                   sid(22972),  # EFA
                   sid(14522),  # EWL
                   sid(14520),  # EWJ
                   sid(14529),  # EWU
                   sid(27102),  # VWO
                   sid(33655),  # HYG
                   sid(19654),  # XLB
                   sid(19662),  # XLY
                   sid(32275),  # XRT
                   sid(26432),  # FEZ
                   sid(14516),  # EWA
                   sid(14519),  # EWH
                   sid(32270),  # SSO
                   sid(39214),  # TQQQ
                   sid(22739),  # VTI
                   sid(38533),  # UPRO
                   sid(21512),  # IVE
                   sid(26669),  # VNQ
                   sid(21518),  # IWF
                   sid(21507),  # IJH
                   sid(21517),  # IWD
                   sid(25909),  # VTV
                   sid(28364),  # VIG
                   sid(25910),  # VUG
                   sid(21508),  # IJR
                   sid(32620),  # PFF
                   sid(12915),  # MDY
                   sid(25647),  # DVY
                   sid(21516),  # IWB
                   sid(32888),  # VYM
                   sid(25907),  # VO
                   sid(40107),  # VOO
                   sid(21514),  # IVW
                   sid(25899),  # VB
                   sid(22908),  # IWR
                   sid(21786))  # IWO

# SPY drives both the SAS regime signal and the momentum signals.
anchor_symbol = sid(8554)

class SidInList(CustomFilter):
    """Pipeline filter that passes only the sids in sid_list."""
    inputs = []
    window_length = 1
    params = ('sid_list',)

    def compute(self, today, assets, out, sid_list):
        out[:] = np.in1d(assets, sid_list)

def initialize(context):
    my_sid_filter = SidInList(sid_list=(anchor_symbol,))
    pipe = Pipeline(screen=my_sid_filter)
    attach_pipeline(pipe, 'my_pipeline')

    # Daily Hive-Bot SAS series for SPY, pulled in via fetcher.
    url = 'http://hive.psychsignal.com/public/historical/hivebot/SPY'
    fetch_csv(url,
              date_column='date',
              symbol='spy_sas',
              usecols=['SAS'],
              date_format='%Y-%m-%d')

    schedule_function(rebalance,
                      date_rules.every_day(),
                      time_rules.market_open(hours=0, minutes=30))

def handle_data(context, data):
    context.SAS = data.current('spy_sas', 'SAS')

def before_trading_start(context, data):
    record(leverage_ratio=context.account.leverage)
    context.output = pipeline_output('my_pipeline')

def rebalance(context, data):
    sas_thresh = 0.66                            # SAS above this => momentum regime
    sas = context.SAS
    bull_weight = 1.0 / len(trading_symbols)
    bear_weight = -0.8 / len(trading_symbols)    # smaller short weight gives a slight net-long tilt
    momo_thresh = 0.
    revs_thresh = 0.

    # Two momentum signals on the anchor symbol: 40-day and 1-day returns.
    price_history = data.history(anchor_symbol, "price", 40, "1d")
    momo1 = (price_history.iloc[-1] - price_history.iloc[0]) / price_history.iloc[0]
    momo2 = (price_history.iloc[-1] - price_history.iloc[-2]) / price_history.iloc[-2]

    for trading_symbol in trading_symbols:
        # (a per-symbol variant would compute momo1/momo2 from each
        # trading_symbol's own price history instead of the anchor's)
        pos_weight1 = 0
        pos_weight2 = 0
        if sas > sas_thresh:
            # Momentum mode: trade in the direction of each signal.
            if momo1 > momo_thresh:
                pos_weight1 = bull_weight
            elif momo1 < -momo_thresh:
                pos_weight1 = bear_weight
            if momo2 > momo_thresh:
                pos_weight2 = bull_weight
            elif momo2 < -momo_thresh:
                pos_weight2 = bear_weight
        else:
            # Mean-reversion mode: fade the long-horizon move at double size.
            if momo1 > revs_thresh:
                pos_weight1 = 2. * bear_weight
            elif momo1 < -revs_thresh:
                pos_weight1 = 2. * bull_weight

        pos_weight = pos_weight1 + pos_weight2
        order_target_percent(trading_symbol, pos_weight)

26 responses

Why don't you use the SAS for each symbol rather than just the one for SPY? It would be helpful if the usefulness of the signal could be validated in isolation, outside of a backtest with a bunch of free parameters, perhaps using the Alphalens framework they just announced for Research.

Good question. My research into the Hive-Bot has shown that SPY's SAS exerts a powerful influence on the dependency structure of the market, so there was good reason to think that SPY's SAS could be used as a global indicator/signal with value across a number of trading instruments. Moreover, I found that adding SAS to absolute momentum strategies improved their risk/return profiles. Thus, I viewed this strategy as an extension of the above lines of research. You can get a deeper look at some of the published results here.

That being said, this is no doubt just the tip of the iceberg; my plan is to post many, many more tests as we go forward. My hope is that the community will start to explore the possibilities in the Hive dataset as well. The idea here was to provide one of many possible examples of how to use the Hive-Mind in a trading algorithm. I think things will get truly interesting once we get the Hive-Net integrated as well and can start connecting the dots... using the network graphs directly in trading strategies on Quantopian.

Do you have a direct link to those whitepapers? I seem to run into a content wall.

Sorry about that; we are tweaking the website flow at the moment. We'll send you the papers directly.

I found there are relevant non-firewalled whitepapers available here:
https://psychsignal.com/internalwhitepapers/

Hello Tynan,
Is there any chance this algo is leaking SAS data from the future? You say in the text that your SAS is a daily datafeed, so I assume that the daily value for SAS would be calculated based on the social media from that day, rather than from the previous day. In your algo, you use data.current('spy_sas','SAS') pulled from the historical SAS records, suggesting that you're getting the current day's SAS value (day[0]) at 10am and trading on that knowledge, when what would really be available at 10am is the SAS value for day[-1]. This might explain some of the exceptional performance seen here... when SPY is going up well, the social media reflects this intraday, and hence a purchase around 10am based on the full day's social insights might be quite a good idea.
On a side note, I find that this algo performs considerably better by reducing the universe of ETFs substantially, to a handful or so. Also, of course, these returns are remarkably fictitious for any retail trader given the very large number of trades made. It's quite a fun algo to play with though, thanks for publishing.

No, the production Hive is updated intraday, so this daily version is generated well in advance of the open to avoid exactly that bias.

re: transactions

I'm using the default impact/commission models, which I think default to $0.0075 a share. That is more than double IB's US equity commissions, so if anything it's pretty conservative in that regard, although with some higher-rate brokers daily rebalancing could get expensive. You could always trade a super-liquid subset if commissions/liquidity are an issue, or just trade the e-mini using SPY's signal (it will be very cool when futures are up and running on Quantopian!).

Thanks Tynan, I am new here and missed the important default commission and slippage calculations. But I'm still confused about historical SAS data. Correct me if I'm wrong, because this seems awkward. When I get a price (say the close) for a ticker for a given date, that will be the close recorded for that specific date. But if what you say about the Hive data is correct, the historical SAS value you're pulling gives you the SAS value (i.e., the social media indicator value) for the previous day's social media, not the value calculated for the date specified. That would make your data arrangement different from most: to get the SAS social media value for SPY as recorded for 2015-11-23, I would have to use the value for 2015-11-24!

@Paul Have a look at the attached notebook, specifically the Trader Mood data set columns. You'll notice there is an "asof_date: The date to which this data applies." and a second "timestamp", which is the date the data is available to trade. Quantopian adds this "timestamp" column to every data set in order to prevent future snooping and to standardize non-standard data sets. With the Hive data, Tynan is following the exact same convention Quantopian uses to timestamp the underlying Trader Mood data. The data is available after 4am UTC for trading the same day.
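A toy pandas frame makes that convention concrete. The dates and SAS values below are made up for illustration; only the column names follow the Quantopian convention just described.

```python
import pandas as pd

# Toy records: the SAS reading is *for* asof_date, but only becomes
# available to a strategy at `timestamp` (after 4am UTC the next day).
records = pd.DataFrame({
    "asof_date": pd.to_datetime(["2015-11-23", "2015-11-24"]),
    "timestamp": pd.to_datetime(["2015-11-24 04:00", "2015-11-25 04:00"]),
    "SAS": [0.71, 0.42],
})

# What a backtest running at 10:00 on 2015-11-24 is allowed to see:
now = pd.Timestamp("2015-11-24 10:00")
visible = records[records["timestamp"] <= now]
print(visible["SAS"].tolist())  # [0.71], only the reading for 11-23
```

Filtering on timestamp rather than asof_date is exactly what prevents the look-ahead Paul was worried about.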
Thanks to you both for your patience with me; I have much to learn coming from another platform, but I'm liking what I'm learning very much.

Hi Tynan, thanks for this. It looks interesting. Any idea when this dataset will be available for full use in Quantopian?

@Hayden Until Quantopian starts pulling HIVE data into the data store we will continue to make the data available via the fetcher API. Feel free to use it as much as you like. Let me check with Seong for an estimate as to when the data will make it into the store.

Hi James, thanks for that. The fetcher data in the example only runs up until last week, so I assume there will be no current data until it gets integrated? I assume it will be a premium data source? I've quickly integrated the sample data above into an algorithm I've been getting ready to live trade and it's had positive results both in terms of return and drawdown, so I'm keen to have more of a play around with it. But obviously it's ready when it's ready :) One thing I'm not too sure about: this data indicates the level of social media activity, not the mood, correct? I've had a brief play with the Trader Mood data and couldn't get consistent results from it, but this activity data seemed to give good results without much effort. That seems counterintuitive; wouldn't it be possible that a high SAS score could equally indicate a high level of negative social discussion and therefore indicate a downturn? That doesn't seem to be the case in backtesting. I imagine there's more info in your whitepaper so I'll have a read on the weekend. Thanks!

Thanks for contributing this really interesting strategy (great blog btw). My question is how the free parameters introduced in the rebalancing function are chosen. How did you arrive at the SAS threshold of .66? Why different weights for the long side (1.0) vs the short side (-0.8)? How sensitive is the portfolio to the threshold level and long/short weights?

Hi Brian C.
I'm guessing, but for this particular universe and weights, the choice of a 40-day lookback and SAS threshold of .66 seems fairly optimized, not based on any particular rationale. Clone it yourself and try several nearby variants and I think you'll find it so (I did). My own playing with this algorithm works best with a smaller universe of ETFs (e.g., try removing all but SPY and TQQQ for an extreme example with double the returns, but concurrently high max drawdown and volatility). Personally, I'm bending this algorithm in other ways, like removing the short side (long only) and keeping it unleveraged. I'm more partial to finding parameters that are both not so optimized (i.e., more of a plateau of decent returns in a 2D space of lookback and SAS threshold) and that also display a better equity curve over the past couple of flat years. What I'd like best to find is a decent standalone momentum strategy where the SAS data improves it, rather than what is shown here as an example, which is a reasonably poor strategy without the SAS data (try setting sas_thresh to -1 and see the momentum strategy fail on its own). In some cases, I've found what appear to be decent momentum strategies (with sas_thresh set to -1) worsened by the addition of a positive sas_thresh. These are intriguing data I'm continuing to explore.

With regards to the SAS threshold: since the SAS is bounded by 0 and 1, initially I looked at a simple partitioning: 0 to 1/3 was defined as "low", 1/3 to 2/3 as "medium", and anything above 2/3 as "high". This rough classification system proved useful, so the 0.66 follows from that, and thus I doubt it's optimized globally, although it seems like a good enough heuristic to start with. An extension of this strategy would be to make the SAS threshold adaptive, maybe using some kind of EMA or something more elaborate to vary the threshold in a sensible manner.
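As a quick sketch of what that adaptive threshold might look like: an EMA of recent SAS readings plus a volatility cushion. The span and k values here are arbitrary illustrative choices, not tested parameters.

```python
import numpy as np

def adaptive_sas_threshold(sas_history, span=20, k=1.0):
    """Illustrative adaptive threshold: an EMA of recent SAS readings
    plus k standard deviations, capped at the SAS upper bound of 1.0.

    span and k are arbitrary choices for illustration only.
    """
    sas = np.asarray(sas_history, dtype=float)
    alpha = 2.0 / (span + 1.0)           # standard EMA smoothing factor
    ema = sas[0]
    for x in sas[1:]:
        ema = alpha * x + (1.0 - alpha) * ema
    return min(1.0, ema + k * sas.std())
```

The rebalance logic would then compare the day's SAS to this moving threshold instead of the fixed 0.66.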
It's also conceivable that the appropriate thresholds could vary with the particular strategic use case, e.g. different thresholds for momentum and mean reversion. Also, you could definitely run this strategy with equal long/short weightings, but I know in practice it can be advantageous to tilt these strategies slightly long because of the persistent upward bias in the stock market, which has imposed a sort of penalty on shorts; hence the slight bias in weights, 1.0/-0.8.

Firstly, an update on data: we are currently onboarding the SAS dataset with Quantopian and hope to have it live mid-September. In the meantime, after a slight learning curve (is it possible to get fetch_csv to work in a notebook?), I wanted to post a notebook that dives a little deeper into the SAS data. Sometimes when people hear that there is an intimate link between SAS and volatility, inquiring minds want to know if they can get the same value by replicating the SAS with the VIX (or the VIX futures curve). The thing about the VIX is that it is biased towards downside volatility. Nobody really wants or needs to hedge upside volatility (those shocks are good), so using the VIX only captures part of the picture. If prices are ripping, realized volatility can be very high while the VIX is actually falling. Thus, this notebook looks at the contemporaneous correlation between the SAS (smoothed and unsmoothed) and the VIX futures curve.

I cloned the original algo and did not change anything. The results are different. Why?
Hi Terry,

Even though the code is the same, you did run a different backtest. Your algo trades with a much smaller capital base: $10k vs my algo at $1MM. My guess is that this illustrates the effect of commissions on different-sized portfolios. This strategy uses the default Q commission model, so there is a $1 minimum trade cost (or $0.0075 per share) associated with each trade.

This algo rebalances daily, so there's a potential $50 a day fixed commission cost. For a $1MM portfolio, $50 represents a 0.005% drag on returns. For a $10k portfolio, $50 is 0.5%!
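A back-of-the-envelope check of those numbers, assuming a flat $50/day in commissions (roughly 50 orders at the $1 minimum) purely for illustration:

```python
# Daily commission drag as a percentage of capital:
# ~50 orders a day at the $1 minimum is about $50/day.
daily_commission = 50.0

for capital in (1_000_000, 10_000):
    drag_pct = 100.0 * daily_commission / capital
    print("$%s capital -> %.3f%% daily drag" % (format(capital, ","), drag_pct))
# $1,000,000 capital -> 0.005% daily drag
# $10,000 capital -> 0.500% daily drag
```

The same dollar cost is two orders of magnitude more painful on the smaller book, which is enough to flip the backtest results.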

Tynan,

Thanks for clarifying. My newb brain did not catch that.

@Elle

What seems to be the confusion?

FYI, the historical and live-updating HIVE SAS scores should be available natively in the Quantopian data pipeline within the next few weeks.

Hey Tynan, I am pretty new to the Quantopian community. I have cloned your model, but unfortunately I get the following error message:
Exception: Could not connect to http://hive.psychsignal.com/public/historical/hivebot/SPY
There was a runtime error on line 84.

Let me know if you find a way to resolve it.
Thanks

Hi Tynan,

Hope you're still following this thread. I was also pretty interested in the predictive power of social media data, so I've done some research. As you linked earlier, there are 4 PsychSignal datasets on Q right now.

stocktwits (same as stocktwits_free, which is used in the tutorial notebook)

To kind of figure the differences out, I put them all into Alphalens. There are quite a few questions about how to perform the Alphalens test (which time frame? which variables from the datasets? which values from the tear sheet to compare?).

My conclusions from the attached Notebook:
- I think bull_minus_bear is the field to choose (multiplied by -1, because it's a negative correlation).
- The predictive power is strongest for short-term forecasts (like one day), which kind of makes sense.
- I like stocktwits most. It's mostly green, it has a high IR, and at least for the chosen period it isn't negative in its returns.
- To not rely only on stocktwits data, I think tweets without retweets is also valuable (also a reasonably good correlation).

One risk that you can't see in the notebook: if you shift the timeframe a bit, the results are definitely worse.
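For what it's worth, the sign flip in the first conclusion above can be sanity-checked on synthetic data (standing in for bull_minus_bear and next-day returns, since the real feed isn't reproduced here): a negative rank correlation becomes a positive one after multiplying the factor by -1.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
# Synthetic factor and forward returns with a built-in negative relation,
# mimicking the bull_minus_bear finding described above.
factor = pd.Series(rng.normal(size=n))
fwd_returns = pd.Series(-0.3 * factor.values + rng.normal(size=n))

# Rank information coefficient (Spearman via ranks) before and after the flip.
ic_raw = factor.rank().corr(fwd_returns.rank())
ic_flipped = (-factor).rank().corr(fwd_returns.rank())

print(ic_raw < 0 < ic_flipped)
```

Flipping the factor's sign simply negates the IC, which is why Alphalens tear sheets on -1 * bull_minus_bear show positive returns in the top quantiles.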

Tynan, you said they wanted to include the SAS dataset mid-September; any updates on this?


Hi everyone, and thanks for this interesting post.

Is anyone still running this algo on Q? I've just tried to run it and I get this error:
Exception: Could not connect to http://hive.psychsignal.com/public/historical/hivebot/SPY

Wondering if the Hive data is still available...

Thanks for your help.

The hive.psychsignal.com subdomain is no longer live, and their website seems a bit moribund, with no blog posts since Sept 2016, which is when this thread was started. I have no insider knowledge, but there have been no tweets from the company since June and no clear signs of activity. Their CEO has a LinkedIn presence, but the only thing of interest seems to be a move toward using this sort of data for trading crypto (see decryptz.com).

@Jai Otmane

Hey, sorry, I'm not with PsychSignal anymore, but if you have interest in this methodology contact [email protected] or visit www.slicematrix.com. Thanks.