Hi all,

So I've noticed that some of the most popular algorithms shared here are takes on momentum trading. As was mentioned in one of the threads, it is unclear whether momentum trading actually works. Academics have generally argued that it does not. So I decided to test whether the past t days' price movements carry any useful information for a trading decision made today.

For instance, assume that we are trying to determine whether SPY (the ETF that tracks the S&P 500) will go up or down tomorrow. Is knowing the direction it moved today (or yesterday, or the day before that, etc.) useful in making our decision?

Obviously, there are many parameters you can experiment with - I've shared here one specific implementation. Specifically, I'm using Bernoulli NaiveBayes (http://scikit-learn.org/dev/modules/naive_bayes.html). If I am trying to predict date t, I use dates t-1, t-2, t-3, t-4, and t-5 as predictors.

Further, I define t (and t-1...etc) as binary variables (hence using Bernoulli as opposed to some other Naive Bayes).

The value 1 denotes upward movement, the value 0 denotes downward movement. So, for instance, if the price of SPY was higher at close today than at close yesterday, t-1 is marked as '1'. If the price of SPY was lower yesterday at close than the day before that, then t-2 is marked as '0'. etc.
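To make the encoding concrete, here is a minimal sketch of how the binary movements and the five lagged predictors could be built from a list of closing prices. The function names and the sample prices are made up for illustration, and treating an unchanged close as '0' is my assumption (the description above only covers up and down).

```python
import numpy as np

def encode_moves(closes):
    """Return 1 where the close rose versus the previous close, else 0.

    Assumption: an unchanged close is lumped in with 'down' (0).
    """
    closes = np.asarray(closes, dtype=float)
    return (np.diff(closes) > 0).astype(int)

def lagged_features(moves, n_lags=5):
    """Build rows [t-1, t-2, ..., t-n_lags], with the move on day t as the label."""
    moves = np.asarray(moves)
    X = np.array([moves[i - n_lags:i][::-1] for i in range(n_lags, len(moves))])
    y = moves[n_lags:]
    return X, y

# Hypothetical closing prices, purely for illustration.
closes = [100, 101, 100.5, 102, 103, 102, 104, 105]
moves = encode_moves(closes)             # one fewer entry than closes
X, y = lagged_features(moves, n_lags=5)  # each row of X lines up with one entry of y
print(moves)
print(X)
print(y)
```

Each row of X holds the five movements preceding the day in y, most recent first.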

I use a moving training window of 60 days (and the 5 previous movements for each of those days) to fit the Bernoulli NB model. I then use the latest 5 day price movements to predict what the price movement will be tomorrow.
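As a rough sketch of the moving-window step (not the exact script attached to this post), the fit-and-predict logic could look like the following. It uses scikit-learn's BernoulliNB with default settings; the function name `predict_next_move` and the random test series are assumptions for illustration only.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

def predict_next_move(moves, n_lags=5, window=60):
    """Fit BernoulliNB on the last `window` labelled days and predict the next move.

    `moves` is a sequence of 0/1 daily movements, oldest first.
    Returns 1 (up), 0 (down), or None if there is not enough history yet.
    """
    moves = np.asarray(moves)
    # Each of the `window` training days needs n_lags earlier movements.
    if len(moves) < window + n_lags:
        return None
    recent = moves[-(window + n_lags):]
    X = np.array([recent[i - n_lags:i][::-1] for i in range(n_lags, len(recent))])
    y = recent[n_lags:]
    model = BernoulliNB()
    model.fit(X, y)
    # The latest n_lags movements, most recent first, are the predictors for tomorrow.
    latest = moves[-n_lags:][::-1].reshape(1, -1)
    return int(model.predict(latest)[0])

# Random 0/1 series standing in for real SPY movements (illustration only).
rng = np.random.default_rng(0)
fake_moves = rng.integers(0, 2, size=200)
prediction = predict_next_move(fake_moves)
print(prediction)
```

Calling this once per day with the history up to that day gives the moving window: the oldest day drops out as the newest one comes in.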

If you run the script, you will see that I am logging output each day. I'm also keeping track of the PNL (simply by calculating the price difference you would have captured based on your previous predicted decision, buy or sell).
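For anyone who wants to see the bookkeeping spelled out, here is a simplified sketch of how the PNL and the running accuracy could be tallied. It assumes one share per trade, going long on an 'up' call and short on a 'down' call, with no commissions or slippage; those are my simplifications, and the function name and prices are hypothetical.

```python
def score_predictions(closes, predictions):
    """Tally PNL and accuracy for a series of daily calls.

    predictions[i] is the call (1 = up, 0 = down) made at close i about close i+1,
    so there must be at least one more close than there are predictions.
    """
    pnl = 0.0
    hits = 0
    for i, call in enumerate(predictions):
        change = closes[i + 1] - closes[i]
        position = 1 if call == 1 else -1  # long one share on 'up', short one on 'down'
        pnl += position * change
        actual = 1 if change > 0 else 0
        hits += int(call == actual)
    accuracy = hits / len(predictions) if predictions else float("nan")
    return pnl, accuracy

# Hypothetical prices and calls, purely for illustration.
closes = [100.0, 101.0, 100.0, 102.0]
predictions = [1, 1, 0]
pnl, accuracy = score_predictions(closes, predictions)
print(pnl, accuracy)
```

Here the first call is right (+1), the second is wrong (-1), and the third is wrong (-2), so the PNL is -2 with one hit out of three.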

Most importantly, you will see that the rate of accuracy converges on around 50% as the program runs, which is roughly what you would expect from guessing at random. This is a rather poor result and suggests this particular method of prediction isn't all that good. Why might this be?

- The Naive part of Naive Bayes! The premise of naive Bayes is that each feature (e.g. t-1 or t-2) influences the probability that the predicted value, t, takes a particular value, and that it does so independently of the other features. That independence could be the problem. For instance, the influence of t-1 on t might also depend on the value of t-2.

- Are binary variables the way to go? Is an upward price movement the same as any other upward price movement regardless of magnitude? Probably not...
- Are we missing predictors? Should volume be accounted for too?
- Can our time frames be optimized? Should our training window be shorter (implying the behaviour of the market changes more frequently) or longer? Is 5 days as predictors too many?

Feel free to experiment and share if you can! One improvement that I've tried, and that appears to help noticeably, is to shorten the training period to 30 days and use a Decision Tree Classifier instead (a tree does not assume the features act independently, so it can pick up interactions between them).
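To illustrate why the independence assumption can matter, here is a small experiment on synthetic data (not market data, so it says nothing about whether the tree actually helps on SPY). I generate five random binary features and define the label as t-1 XOR t-2, a deliberately extreme made-up case where the effect of t-1 depends entirely on t-2. BernoulliNB and DecisionTreeClassifier are used with default settings, and the sample sizes are arbitrary.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)

# Synthetic features: columns stand in for t-1 ... t-5 (random 0/1 values).
X = rng.integers(0, 2, size=(2000, 5))
# Made-up rule: the label is t-1 XOR t-2, i.e. a pure interaction between two features.
y = X[:, 0] ^ X[:, 1]

# Simple holdout split: first 1500 rows for training, last 500 for testing.
X_train, X_test = X[:1500], X[1500:]
y_train, y_test = y[:1500], y[1500:]

nb = BernoulliNB().fit(X_train, y_train)
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

nb_acc = float((nb.predict(X_test) == y_test).mean())
tree_acc = float((tree.predict(X_test) == y_test).mean())
print("Naive Bayes accuracy:  ", nb_acc)
print("Decision tree accuracy:", tree_acc)
```

Naive Bayes weighs each feature on its own, and here neither t-1 nor t-2 tells you anything in isolation, so it should score well below the tree, which can split on t-1 and then on t-2. Real price data is nowhere near this clean, so treat this as a toy demonstration of the mechanism only.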

Finally, this is my second post here on machine learning and I'm wondering: am I contributing? I don't really have a sense of what most users' experience with machine learning is, so if it would be more helpful for me to explain more, please let me know! I've done my best to comment throughout my code as well.