Stock prices reflect the trading decisions of many individuals. For the most part, quantitative finance has developed sophisticated methods that try to predict future trading decisions (and thus prices) based on past ones. But what about the information-gathering phase that precedes a trading decision? Two recent papers in Nature's Scientific Reports suggest that Google search and Wikipedia usage patterns contain signal about this information-gathering phase that can be exploited in a trading algorithm. As is (unfortunately) very common, the papers come with no published code we could use to easily replicate the results. The algorithms are very simple, though, so I coded both of them on Quantopian. As you can see below, they indeed seem to perform quite favourably and thus roughly replicate the results of the papers. The original simulations did not model transaction costs or slippage, which we include here; in that regard, we can show that these strategies still seem to work under more realistic assumptions.
This algorithm looks at the Google Trends data for the word ‘debt’. According to the paper, that word has the most predictive power.
This data is not easy to fetch automatically from within Quantopian, but it's relatively easy to do manually: I downloaded the csv file, edited it into the right format, and uploaded the resulting file here.
If you want to use my data on 'debt', feel free to do so. If you want to use the Google Trends data for a different word, you can download the CSV, edit it to look like mine, and place it in a public Dropbox folder or on some other webserver.
If there is enough interest, we can make this data more accessible (if you want to help me with this, an automated Python script that converts the csv returned by Google Trends into the format I posted would be much appreciated).
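In case it helps, here is a rough sketch of such a converter. It assumes a Google Trends export with a few metadata lines followed by week/value rows; the number of header lines, the column layout, and the file names are assumptions that may need adjusting for your download:

```python
import pandas as pd

def convert_trends_csv(in_path, out_path, query='debt'):
    """Convert a raw Google Trends export into the two-column
    (date, query) csv format that fetch_csv expects."""
    # Assumption: the export starts with a few metadata/header lines and
    # then has rows like "2011-01-02 - 2011-01-08,54". Adjust skiprows
    # and the column names to match the file you actually downloaded.
    raw = pd.read_csv(in_path, skiprows=3, header=None, names=['week', query])
    # If the first column is a week range, keep only the start date.
    raw['date'] = raw['week'].astype(str).str.split(' - ').str[0]
    raw[['date', query]].dropna().to_csv(out_path, index=False)

# Hypothetical usage:
# convert_trends_csv('report.csv', 'debt_google_trend.csv')
```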
For this algorithm, once the current week's search volume is below the moving average over the previous delta_t weeks (here delta_t == 5), we buy the S&P 500 and hold it for one week. If the weekly value is above that moving average, we instead short the S&P 500 and cover the position after one week. The original paper uses the Dow Jones Industrial Average; however, the S&P 500 is highly correlated with it.
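To make the rule concrete, here is a minimal pandas sketch of the signal computation (separate from the full Quantopian algorithm further down). It assumes the converted csv described above, with a date column and a 'debt' column:

```python
import pandas as pd

def trend_signal(weekly, delta_t=5):
    """+1 (long) when this week's value is below the trailing delta_t-week
    average, -1 (short) when it is above, NaN during the warm-up period."""
    # Trailing mean over the *previous* delta_t weeks (excludes the current week).
    trailing_mean = weekly.shift(1).rolling(delta_t).mean()
    signal = (weekly < trailing_mean).astype(int) * 2 - 1
    return signal.where(trailing_mean.notna())

# Hypothetical usage, assuming the converted csv from above:
# weekly = pd.read_csv('debt_google_trend.csv', index_col=0,
#                      parse_dates=True)['debt']
# signal = trend_signal(weekly)  # each position is held for one week
```

Each signal is acted on at the start of the following week and held for one week, mirroring the Monday rebalance in the algorithm below.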
Suggestions for improvement (please post improvements as replies to this thread):
- The authors used many different search queries, listed here. If you upload different queries in the same csv format as mine, we can explore those as well.
- delta_t == 3 is what the authors of the paper used. It would be interesting to see how the algorithm performs when this is varied; a rough offline sweep is sketched after this list.
- The underlying algorithm is a very basic moving-average crossover. A more clever strategy could certainly do a much better job.
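Regarding the delta_t suggestion, a crude offline sweep could look like the sketch below. It ignores transaction costs and slippage, and it assumes you already have the weekly Google Trends series and weekly index returns aligned on the same date index (the variable names here are placeholders):

```python
import pandas as pd

def sweep_delta_t(weekly, weekly_returns, delta_ts=range(2, 9)):
    """Cumulative strategy return for each delta_t, ignoring costs."""
    results = {}
    for dt in delta_ts:
        trailing = weekly.shift(1).rolling(dt).mean()
        # Long (+1) when search volume fell below its trailing mean,
        # short (-1) when it rose above; flat (0) during the warm-up period.
        signal = ((weekly < trailing).astype(int) * 2 - 1).where(trailing.notna(), 0)
        # The signal from week t is traded over week t+1.
        strategy_returns = signal.shift(1) * weekly_returns
        results[dt] = (1 + strategy_returns.fillna(0)).prod() - 1
    return pd.Series(results, name='cumulative_return')

# weekly_returns would be weekly S&P 500 (or DJIA) returns on the same
# weekly index as the Google Trends series.
```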
[Backtest metrics table from the Quantopian widget: Returns, Alpha, Beta, Sharpe, Sortino, Volatility, and Max Drawdown over 1, 3, 6, and 12 month windows; the values are not preserved here.]
```python
# This algorithm recreates the algorithm presented in
# "Quantifying Trading Behavior in Financial Markets Using Google Trends"
# Preis, Moat & Stanley (2013), Scientific Reports
# (c) 2013 Thomas Wiecki, Quantopian Inc.
import numpy as np
import datetime

# Average over 5 weeks, free parameter.
delta_t = 5

def initialize(context):
    # This is the search query we are using, this is tied to the csv file.
    context.query = 'debt'
    # Use fetcher to get data. I uploaded this csv file manually, feel free to use it.
    # Note that this data is already weekly averages.
    fetch_csv('https://gist.githubusercontent.com/twiecki/5629198/raw/6247da04bacebcd6334a4b91ed21f14483c6d4d0/debt_google_trend',
              date_format='%Y-%m-%d',
              symbol='debt',
    )
    context.order_size = 10000
    context.sec_id = 8554
    context.security = sid(8554)  # S&P 500

def handle_data(context, data):
    c = context
    if c.query not in data[c.query]:
        return

    # Extract weekly average of search query.
    indicator = data[c.query][c.query]

    # Buy and hold strategy that enters on the first day of the week
    # and exits after one week.
    if data[c.security].dt.weekday() == 0:  # Monday
        # Compute average over weeks in range [t-delta_t-1, t[
        mean_indicator = mean_past_queries(data, c.query)
        if mean_indicator is None:
            return

        # Exit positions
        amount = c.portfolio['positions'][c.sec_id].amount
        order(c.security, -amount)

        # Long or short depending on whether debt search frequency
        # went down or up, respectively.
        if indicator > mean_indicator:
            order(c.security, -c.order_size)
        else:
            order(c.security, c.order_size)

# If we want the average over 5 weeks, we'll have to use a 6
# week window as the newest element will be the current event.
@batch_transform(window_length=delta_t + 1, refresh_period=0)
def mean_past_queries(data, query):
    # Compute mean over all events except the most current one.
    return data[query][query][:-1].mean()
```