Analyzing the relationship between investor attention and the predictability of arbitrage strategies for the US market

Recent research points to a possible relationship between investor attention and the predictability of arbitrage strategies. Inspired by the astonishing work in this field, this research tried to replicate the results for the US market. An investment strategy for a given stock was defined as predictable if the best-performing hyper-parameter set in month t was also among the top 20 percent of best-performing hyper-parameter sets in month t+1. The strategy tested was a Simple Moving Average (SMA) crossover, run with a range of values for the short-period SMA and the long-period SMA. As a proxy for investor attention, the aggregated post volume on Twitter and StockTwits for each stock was used. The post volume then served as the input variable for several machine learning algorithms that classified whether a strategy was predictable or not. Two modeling approaches were tested:
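To make the predictability definition concrete, here is a minimal sketch in plain Python. It is not the notebook's code: the function names, the long-only trading rule, the use of cumulative return as the performance measure, and the rounding of the 20 percent cutoff are all assumptions made for illustration.

```python
from statistics import mean


def sma(prices, window):
    """Simple moving average; None until `window` prices are available."""
    return [
        mean(prices[i - window + 1 : i + 1]) if i >= window - 1 else None
        for i in range(len(prices))
    ]


def crossover_return(prices, short, long):
    """Cumulative return of a long-only SMA crossover (assumed rule):
    hold the stock over day i -> i+1 whenever short SMA > long SMA on day i."""
    s, l = sma(prices, short), sma(prices, long)
    total = 1.0
    for i in range(len(prices) - 1):
        if s[i] is not None and l[i] is not None and s[i] > l[i]:
            total *= prices[i + 1] / prices[i]
    return total - 1.0


def is_predictable(returns_t, returns_t1, top_share_pct=20):
    """returns_t / returns_t1 map (short, long) -> performance in month t / t+1.
    True if the best set of month t is in the top `top_share_pct` percent of
    month t+1. Keeping at least one set in the top group is an assumption."""
    best = max(returns_t, key=returns_t.get)
    ranked = sorted(returns_t1, key=returns_t1.get, reverse=True)
    cutoff = max(1, len(ranked) * top_share_pct // 100)
    return best in ranked[:cutoff]
```

With, say, 50 hyper-parameter sets, the top group holds 10 sets, and a stock-month is labeled predictable only if last month's winner lands in that group.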

  1. A total of one year of data was taken (e.g. 2015, 2016, etc.) and then split into 70 percent for training the models and 30 percent for evaluating the machine learning models.
  2. A monthly rolling prediction was performed, i.e. the models were trained on only one month of data and their performance was evaluated on the following month. The models were then retrained on the data of the next month.
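The two evaluation schemes can be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, and the post does not say whether the 70/30 split was chronological or random, so a chronological cut is assumed here.

```python
def yearly_split(samples, train_pct=70):
    """Approach 1: split one year of samples into 70 percent training and
    30 percent evaluation data. A chronological cut is an assumption."""
    cut = len(samples) * train_pct // 100
    return samples[:cut], samples[cut:]


def monthly_rolling(samples_by_month):
    """Approach 2: yield (train_month, test_month, train, test) for each pair
    of consecutive months. Keys must sort chronologically, e.g. '2015-01'."""
    months = sorted(samples_by_month)
    for train_m, test_m in zip(months, months[1:]):
        yield train_m, test_m, samples_by_month[train_m], samples_by_month[test_m]
```

In the rolling scheme a fresh model would be fitted inside the loop on each `train` set and scored on the matching `test` set, so no model ever sees data from the month it is evaluated on.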

Finally, the models were evaluated by analyzing their ROC-AUC scores and by comparing their precision scores with a baseline.
Unfortunately, the evaluation metrics did not support a relationship between investor attention and the predictability of arbitrage strategies for the US market. In both approaches, the precision scores of the models almost matched the baseline. The Gaussian Support Vector Machine (SVM) performed best in both approaches. Additionally, the ROC-AUC scores of all models were around the 50 percent mark, which is the level of random guessing and indicates that the models have no predictive power.
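For readers who want to see how these two metrics behave, here is a small self-contained sketch. The post does not define its baseline, so the one below (always predicting "predictable", which yields a precision equal to the share of positive labels) is an assumption, as are the function names.

```python
def precision(y_true, y_pred):
    """Share of predicted positives that are actually positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


def baseline_precision(y_true):
    """Assumed baseline: always predict 'predictable', so precision equals
    the share of positive labels."""
    return sum(y_true) / len(y_true)


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative,
    with ties counted as one half. Needs at least one label of each class."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A model whose scores carry no information about the labels ends up near an AUC of 0.5 and a precision close to the baseline, which is the pattern reported above.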

The promising results of prior work in this field could thus not be reproduced for the US market. There are several possible reasons. First, a different market was analyzed. Second, prior research used post volume from dedicated stock media platforms, whereas this work used post volume from StockTwits as well as from the general social media platform Twitter. Future research should test different proxies for investor attention to clarify whether the proposed thesis holds for the US market. Finally, it should be tested whether the proposed relationship holds only for certain sectors.

Introduction into Systematic Investment Strategies Seminar, Uni Freiburg

1 response

Well done!

I think, as you say, that the proxy from StockTwits is a bit too indirect. You should try something more direct, such as trade data. Attention in the media is often correlation, not causation, so you frequently need a more direct measure. But finding the right proxy is always hard.