Applying Deep Learning to Enhance Momentum Trading Strategies in Stocks: 45.93% annual return

Thomas Wiecki mentioned this a couple of years ago (he omitted the spaces, so look for "ApplyingdeepLearningtoEnhanceMomentumTradingStrategiesinStocks") in the thread on trading ideas.

Takeuchi, L., Lee, Y. (2013). Applying Deep Learning to Enhance Momentum Trading Strategies in Stocks.

We use an autoencoder composed of stacked restricted Boltzmann machines to extract features from the history of individual stock prices. Our model is able to discover an enhanced version of the momentum effect in stocks without extensive hand-engineering of input features and deliver an annualized return of 45.93% over the 1990-2009 test period versus 10.53% for basic momentum.

Can anyone with a data science head create a Q version of this?


Utterly fascinating. Quite impossible for me to comment at this stage but there are a great number of questions I have to which I will need to search for answers. For me the most interesting sentence is as follows:

our model is not merely rediscovering known patterns in stock prices, but going beyond what humans have been able to achieve

Is it REALLY possible for deep learning to take a simple set of returns and improve on the "forecasts" made by the application of a simple momentum strategy? This paper seems to indicate that that is the case.

Having read the paper two or three times I am still unclear exactly what each "stack" actually does, but no doubt I will eventually stumble upon some sort of conclusion.

Happily, this paper comes at a time when I had decided to retire from the incredibly boring research I have done to date. I have decided to "learn" AI and deep learning. Or at least to attempt to.

I am far from certain that it has any application to the long term prediction of stock prices but this article seems to suggest otherwise. I look forward to finding out whether this research has indeed discovered Eldorado or whether other factors are in play which will make this line of research as fruitless as most others in the financial markets.

Training a deep neural net on quantopian data would be challenging unless you could run the notebooks / algorithms on machines with powerful GPUs attached.

If you have offline access to relevant trading data, you could train a net from that on non-quantopian machines and then translate the resulting net to scipy for execution in the quantopian framework.
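
For example, a minimal sketch of the execution side, assuming the trained weights were exported offline (e.g. with numpy.savez) and bundled with the algorithm; the file and parameter names here are hypothetical:

import numpy as np

# Load weights that were exported from the offline training run
params = np.load('net_weights.npz')

def relu(x):
    return np.maximum(x, 0.0)

def predict(features):
    # Forward pass of a small two-layer net using numpy only
    hidden = relu(np.dot(features, params['W1']) + params['b1'])
    logits = np.dot(hidden, params['W2']) + params['b2']
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid output in (0, 1)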

Very interesting to read some of the other papers from Stanford on deep learning applied to markets. The referenced paper claims just over a 50% classification accuracy as to whether trades will end up winners or losers the following month. Just using price as the input.

The model is correct 53.84% of the time when it predicts class 1 and a somewhat lower 53.01% of the time when it predicts class 2.

Consider that a typical un-adorned old fashioned trend following strategy typically provides 40% winning trades and profits by running winners and cutting losers.

If it did work in 2013 would it work any more? I would think banks and brokerage houses would have armies of PhDs writing code like that.

Greg
Many people think that way. And I know what you mean. But if it is true then you may as well give up altogether. As may Quantopian. I have no idea whether it still works but I intend to replicate the study. All I am sure of is my own ignorance.

There was a thread a while back where someone tried this using one of the machine learning libraries on a single stock:

Predicting Price Movements via Regimes and Machine Learning

It may be a good place to start.

It runs quite slowly. To speed things up you may want to download price data from EOData (or another site) and work from that on your own machine.

Anthony, I found this Python machine learning code (and associated MOOC course) and thought you might find it useful: http://www.johnwittenauer.net/machine-learning-exercises-in-python-part-1/

Another group has posted even better accuracy numbers (82% vs 53%). Not sure on the quality though.
http://link.springer.com/chapter/10.1007/978-3-319-42297-8_40

You probably could just reach out to the authors about their implementation.

RBMs can be done in R with deepnet
https://cran.r-project.org/web/packages/deepnet/
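
In Python, a similar stacked-RBM pipeline can be sketched with scikit-learn's BernoulliRBM (a rough sketch; X_train and y_train are assumed, with inputs scaled to [0, 1]):

from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Two stacked RBMs act as an unsupervised feature extractor;
# a simple classifier is then trained on the learned features
model = Pipeline([
    ('rbm1', BernoulliRBM(n_components=40, learning_rate=0.05, n_iter=20)),
    ('rbm2', BernoulliRBM(n_components=4, learning_rate=0.05, n_iter=20)),
    ('clf', LogisticRegression()),
])
model.fit(X_train, y_train)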

Interesting. The methodology in the Springer link is also based on price only as the input, although perhaps one should not be surprised by the vastly greater accuracy: this is forecasting 1 minute ahead whereas the Lee project forecasts one month ahead.

I'm concentrating on Python, Keras and Theano...as well as skLearn.

Is the paper freely available somewhere?

@Anthony - Yeah, some differences in implementation. Python can make calls to R if needed. Have you tried PDNN for Python?

@Alex - Try this http://sci-hub.cc/10.1007/978-3-319-42297-8_40

Patrick
My current knowledge is infantile. I'm beginning from scratch on the whole topic and building ANNs from scratch for the experience using some Noddy textbooks. I'm interested in the whole field so looking at whatever ML techniques could be useful including RBMs.

My suspicion is that so far as longer term investment is concerned this will all turn out to be a waste of time. Or rather that it will not provide me with risk adjusted returns better than the simple 50/50 system I outlined on my website.

But we will see. I am as keen to shoot the lights out as anyone else but know from experience that these ventures usually turn out rather differently than one might have hoped!

When I am a bit further down the line I will contact Takeuchi and Lee and see what they did further (if anything) with the strategy outlined. I wonder if they actually traded it? Either for themselves or their employers.

@Patrick: Thank you.

Patrick
Gosh, I just noticed this in the referenced paper:

The data used for training and testing are the AAPL tick-by-tick transactions from September to November of 2008.

1 stock tested for 3 months! I'm surprised they did not take it a bit further than that, but who knows, perhaps the results would have been the same for different stocks and periods?

Hi Anthony and group. Two issues:

  1. How many trials were involved in achieving this outperformance? It is not clear. Did they adjust parameters of the RBM until they got the desired result? Besides look-ahead bias, which they claim is not an issue, there is also data-snooping and selection bias. In fact, selection bias can be quite large.

  2. The paper was published at the end of 2013 but the test sample ended in 2009. There is no reason for that except in the case that the outperformance came from short-selling during the 2000 and 2008 bear markets, in which case it vanished after 2009.

Claims of outperformance of momentum by Glabadanidis were recently debunked by Prof. Zakamulin after he showed there was look-ahead bias in calculations. More about this and other issues, also regarding special market conditions that give rise to high t-stats, in my recent paper, http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2810170

Best.

I found a download link for the paper referenced by Patrick: (Arevalo et al. July 2016) High-Frequency Trading Strategy Based on Deep Neural Networks

Has anyone examined the technique proposed by Lee et al.? I'm having a go (using free Quandl data) but I'm finding it difficult to follow. I can handle the ML aspects. But I'm not quite sure how they are packaging the data.

I think it's something like this:

For a given moment in time for a particular stock we can construct a (labelled) training item by using the previous 13 months' worth (and the subsequent 1 month's worth) of daily data for that stock.

We use this data to construct 12 monthly cumulative returns ending a month short of our moment. So I'm guessing you just add up daily Adj_Close prices & spit out the value every 30 or so passes. Now it gets interesting. They do the same thing for every other stock at this moment, and get a z-value for our stock over this set (i.e. # of standard deviations from the mean). So the movement of this z-value shows the growth of this particular stock relative to the whole market. Since the algorithm is going to have a certain amount of money invested in the market, and just shift it between stocks, this is what you want!

Looks like they do this for each of the 12 monthly cumrets.

And then they do the same process for the previous 30 days.

That actually makes a lot of sense because you want to be feeding data with mean 0, roughly in the (-1, +1) range, into your NN.

So that covers the input data (there is one extra input that is a 'start of year' flag). But a complete supervised training item also requires an associated output value. It looks as though they are just using whether that particular stock went up in the ensuing month. Although I don't understand their language: they talk about 'above the median'. The median of what??? It seems a really weird way of doing it. Why not just look at whether the price one month later is higher or lower than the price at this particular moment & output 1 or 0 accordingly? I think that's what I will do as I don't understand what they're saying.

Then I can only assume that everything is shunted forwards by a single day and the algorithm is repeated to generate another sample.

It seems strange to me that they don't make use of daily volume.

π

Hi, Pi,

I actually had a go at implementing it on my local machine in TensorFlow using yahoo data I downloaded. "Above the median" just means "above the median of percentage returns for every stock for that month". Just looking at whether the price was higher or lower in absolute terms for the month (rather than whether it was higher or lower relative to all other stock movements) would probably be less effective. They're consistent in using this relative approach, as all return data features are z-scored for each month time-step.
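
A rough pandas sketch of the monthly part of that construction (the daily cumulative returns work the same way; monthly_prices is an assumed DataFrame of month-end prices, rows = dates, columns = tickers):

import pandas as pd

monthly_ret = monthly_prices.pct_change()

# 12 cumulative-return features ending one month before "now",
# each z-scored across all stocks at every time step
features = {}
for k in range(1, 13):
    cumret = monthly_prices.pct_change(periods=k).shift(1)
    features['cumret_%dm' % k] = cumret.sub(cumret.mean(axis=1), axis=0) \
                                       .div(cumret.std(axis=1), axis=0)

# Label: 1 if the stock's next-month return beats that month's
# cross-sectional median, else 0
next_ret = monthly_ret.shift(-1)
labels = next_ret.gt(next_ret.median(axis=1), axis=0).astype(int)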

I've backtested in zipline, and so far I've not been able to duplicate their stellar results, but I'm still hopeful, as currently my code doesn't use an RBM-based auto-encoder (I'll re-code this part when I get time) and I'm also not training either the auto-encoder or the full network for very many cycles on my single-GPU machine. Also, I think I could add historical data for now-defunct stocks (rather than just the currently tradable ones I'm using as my "universe"), which would get better feature-extraction results in the auto-encoder phase. It wasn't clear to me whether they did this or not. Of course, this old historical data would have to come from another source (not free yahoo data). They're training with data from 1965-1989, which just isn't a lot of data for a deep neural net (and is probably way too old for the resultant model to have any practical value for trading in the present).

By the way, these guys seemed to be able to reproduce the white-paper results with the same input features and a slightly different machine learning model: http://www.math.kth.se/matstat/seminarier/reports/M-exjobb15/150612a.pdf

So roughly 53% correct prediction on the test? In 53% of cases the network predicted stocks which ended up in the top half of returns in the following period? So very similar...

No backtest provided, but... as I say, much better than many long-term TF systems.

Yeah, supposedly 52.89% in the paper I referenced, though I'm not getting these results in my own code (yet). Yeah, it's too bad there's no backtest data provided. This algorithm is definitely long-term, low frequency (you run it once per month, and hold your positions for the entire month), though it could certainly be modified to be shorter term. I intend to play around with it using minute data too, eventually, and different trade frequencies on the monthly/daily data.

The Takeuchi paper didn't mention vol and drawdown either. Likely to be pretty high, I imagine. Also all sorts of other problems, like a bias depending on the reallocation date and god knows what else. But interesting stuff. Personally a month's holding period would not worry me if the return was really that good. But, to be honest, after years of fooling myself I am pretty jaundiced about backtesting, whatever system is used.

@Justin Weeks,

In my humble experience, annualized returns above 15% are either due to look-ahead bias or over-fitting. The market does not allow these high returns because a leveraged trader would own it longer-term. So these academic researchers are being fooled by backtesting and its most serious caveat, which is the "no impact" on prices.

If you take a look at my Aug 124 post above, there is a mention of the papers by Glabadanidis on price series momentum that were declared seminal, with returns in the order of 15%, only to be refuted recently by Zakamulin as the result of look-ahead bias. We are talking about simple algos here, yet the implemented code in Excel has look-ahead bias. Imagine what can go wrong with complicated ML algos in that domain. I use the geometric equity curve test. If it holds, then the probability of a flaw in the backtest is > 95%.

@Michael Harris, you may be right, and thanks for trying to save me from myself and the impractical academics, but I decided I'd be happy reproducing these returns even if flawed. At that point if they seemed too-good-to-be-true, I'd try to pick them apart to find bias/snooping/over-fitting. The main point for me was really an exercise in learning TensorFlow and applying deep learning techniques to financial time-series data. I do believe there are patterns to be teased out of this kind of data using the deep learning approach, though maybe the momentum-based model that this particular ML algorithm produces isn't ultimately going to be profitable. The great thing about deep neural networks is that once you have the basic data flow down and have the network structure declared it's easy to feed it different data that you think might be predictive and produce a model with completely different behavior. It's also relatively easy to modify the network structure, and very easy to tweak params to see if they yield better test results, though as you mentioned, if done improperly I understand there is risk of over-fitting. I still have a lot to learn about the gotchas, so thanks for the words of warning.

@Justin Weeks, maybe you misunderstood me. I did not comment on your work and efforts but on academic papers with results that cannot be replicated, that contain serious errors and assumptions, and that demonstrate a lack of understanding of markets and trading.

If you pay close attention to the results of that paper, the following issues are present:

  1. Repeated trials until the authors get a good result. This introduces data-snooping bias. They do not adjust their t-stat for that, which shows a lack of understanding of the perils of data-mining.

  2. The bulk of the gains are between 1990 and 2001, probably a long over-fit during the strongest uptrend in stock market history and then a short over-fit during the dot-com crash.

  3. Authors do not report important metrics, such as max drawdown, Sharpe ratio and payoff ratio.

Unfortunately, the academic environment knows how to trick company executives with promises of high returns, and the authors of similar papers get high-paying positions; before they are fired they accumulate good wealth at the expense of honest analysts who will never report unrealistic annual return figures and will apply a reality check to reduce data-mining bias. These honest people have no impressive results to show, only reality, and they will never pass through the door of a large investment bank or hedge fund.

The whole paper was a demonstration of how one can use ML to over-fit data and generate unrealistic returns while obscuring the facts.

Oh god, you really should listen to Michael. He is so damn right. I'm sitting here penning another book - my foolish publishers came back for more. I wanted to have an entire first section on what NOT to do and had written quite a few chapters on the foolishness of relying on back testing in probabilistic trading.

The publishers asked me not to: readers only want to hear what DOES work, apparently.

I am actually convinced that ML is a suitable tool for trend following but have absolutely no doubt that a 45% annual return is a fool's quest. Contrary to Michael, I do believe in trends (in stocks at least), although even there I have been tricked and misled in the past by over fitting.

After 30 years in the markets, 15 of those spent largely on systematic trading in one way or another, I am deeply cynical. The hedge fund world mostly makes money for the fund managers, who walk off with huge fees after their funds collapse. They then start another.

It seems we have two sides to the argument: machine learning experts that know little of real-world trading, and real-world traders lacking expertise in machine learning.

I've got addicted to ML. If I can develop profitable trading algorithms, great! If not, never mind; there are plenty of decent-looking fallback options.

I don't see any suggestion of intellectual dishonesty in that Lee paper, however I do agree it is annoying that papers are allowed to publish results without supporting code.

If anyone is interested in chatting ML, do pop into ##machinelearning on IRC Freenode.

Justin -- thanks for that reply, and the link!

π

PS I looked through the paper Patrick linked (http://link.springer.com/chapter/10.1007/978-3-319-42297-8_40), and it looks very sketchy. However, the original paper appears robust as far as I can see. I will continue to attempt to replicate it.

@PI,

"machine learning experts that know little of real-world trading, and real-world traders lacking expertise in machine learning."

This alludes to a false dichotomy. Real world trading can be accomplished via a variety of methods including ML. Lacking ML experience may not be a disadvantage in many cases as it can save many exercises in futility.

"I don't see any suggestion of intellectual dishonesty in that Lee paper"

One would expect university researchers to be familiar with data-mining and data-snooping bias. The paper was about p-hacking with ML, which is disturbing for an academic paper. The exact number of trials taken to get to the final result should have been reported. But this does not equate to intellectual dishonesty, rather to a naive application of ML.

Good point about the code, but I suspect that even if you had the exact code you would still not be able to replicate the results due to stochasticity.

you would still not be able to replicate the results due to stochasticity.

You ought to be able to get close. In deep learning it seems random numbers are usually only used for generating the initial weights... although I am one of those with much experience in markets and little in ML!

@Anthony,

These are error rates from five successive runs of a multilayer perceptron with the exact same parameters on the exact same data, from a project I am working on for a client:

0.4924
0.4762
0.4932
0.4933
0.4837
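
If run-to-run reproducibility matters, the usual mitigation is to pin every random seed before training; a minimal sketch for a numpy/TensorFlow 1.x setup (GPU ops can still introduce some non-determinism):

import random
import numpy as np
import tensorflow as tf

# Pin all sources of randomness so successive runs start from
# identical initial weights and identical data shuffles
random.seed(42)
np.random.seed(42)
tf.set_random_seed(42)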

Yes, but I wonder how those differences translate into CAGR in a trading system? I wonder whether it makes so much difference that some runs predict say 51% of stocks correctly each month and others 52.4%? Knowing the vagaries of back testing, I suspect not.

ML is just fitting a non-linear equation with tens to thousands of undetermined coefficients to data. It seems like it would be impossible to avoid overfitting. In an upward or downward trending market, I suspect the ML algorithms would just learn momentum rules.

If ML is going to work, I think you will need to apply it to multiple stocks at once and throw in fundamental data, economic factors etc. Then perhaps it can discover a pattern in a set of data too big for a human to look at.

The human brain is really good at recognizing patterns. If there were a pattern in the price history of a single stock, I think you would see it.

Hi All,

Just observing that a key part of the Takeuchi/Lee paper is to "stationarize" the data by turning it into a cross-sectional format.

In section 2.2:

"we compute a series of 12 cumulative returns using the monthly returns and 20 cumulative returns using the daily returns. We note that price momentum is a cross-sectional phenomenon with winners having high past returns and losers having low past returns relative to other stocks. Thus we normalize each of the cumulative returns by calculating the z-score relative to the cross-section of all stocks for each month or day"

If the statistics aren't stationary, the model will not converge; or, if it manages to converge (mathematically), it's not going to be very useful.

@David, I think they just kept on trying things until they got an impressive result. This is the definition of data-mining bias, mainly driven by data-snooping. Nowhere in their paper is there a reference to data-mining bias.

@Michael, data snooping is definitely a possibility. However, the set-up seems quite plausible for fairly good results... maybe not 50% returns, but maybe up to 20% in a "normal" year. I know you spoke of 15%, but I am optimistic, perhaps naively.

Normal year: the paper didn't talk about more interesting things such as (macro) regime switching that might affect the test results. For instance, momentum behaviour can be wildly different in the last quarter of 2008 vs. the second through fourth quarters of 2009. Whether your test covers or misses 2008 could change the results.

A likely place data-snooping gets into the set-up (unless the authors actually kept trying with different setups) is the hold-out cross-validation portion. In my experience this is where "leakage" can inadvertently be introduced into the system. By leakage I mean leakage of future data. The authors never provided the details of the hold-out x-val, but if they were not careful with how they created the test set or sets for the hold-out cross-validation, they probably committed the same mistakes in training the finished product...

Here's a Kaggle page on leakage:

https://www.kaggle.com/wiki/Leakage

From another platform's CEO:

[Many of those algorithms were developed by students using sophisticated machine learning methods like neural networks. “I’m impressed by the quality and stability of the trading algorithms"...]

Deep learning appears very important to stay competitive.

Alex,

"If you have offline access to relevant trading data, you could train a net from that on non-quantopian machines and then translate the resulting net to scipy for execution in the quantopian framework."

Is it the case that I could execute in the quantopian framework but would be unable to join the contest? I have the relevant data. I am looking for ways to build up a paper-trade track record. I could use Interactive Brokers' paper trading but it is costly to have many IB accounts.

Greg - thanks for the info.

"It runs quite slow. To speed things up you may want to download price data from EOData (or other site) and work from that on your own machine."

After working with outside data on my own machine, is there any shortcut for converting the code back so it uploads smoothly to quantopian?

"[Many of those algorithms were developed by students using sophisticated machine learning methods like neural networks. “I’m impressed by the quality and stability of the trading algorithms"...]"

But the assumption is that he does not know the algorithms...or am I missing something...

Maybe the next market regime change will sort things out...

@Michael

[deleted - see below post from Antony]

"Next market regime change", you mean, when some platforms cannot survive? Very interesting anyway.

@Alpha

I mean when market dynamics change, all the over-fitted ML systems will fail.

More information about the significance problem can be found in my paper: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2810170

@Alpha

For now the impact of these competitions is small. Market regime changes are driven by structural changes (algo trading in the late 1990s, decimalization, then HFT, etc.). In my opinion ensemble results are random http://www.priceactionlab.com/Blog/2016/09/data-science/

There is no way of distinguishing a low log loss due to multiple trials from a statistically significant one. These competitions are doomed in my opinion, as more entrants mean further convergence of the sample mean to the true mean of 0. Plus they have a short-term risk of ruin which is uncontrollable, although small. The key to profits is identifying one or two robust features for the current regime and using those in a simple algo. All else translates to more bias, more noise, more risk.

I think this thread has drifted off topic. If that is the case, could those responsible please create new threads & migrate accordingly? I would like to remain subscribed to this thread but only receive email notifications that pertain to the original subject.

I can implement that. I work on the Indian market, and my interest is more in minute or five-minute data. Also, there could be far better use of deepnet if you combine this with self-learned patterns.

Does anybody have experience with putting a trained network into production? To be more specific, how do you save the trained model and use it in a real-time trading environment? Thanks.

Just doing that with my own machine learning algos on the VIX futures contracts. I will report back when done. But I won't be using it on Q or drafting it in Q since I use daily prices, futures contracts and a different python back testing engine.
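
For what it's worth, in Keras one simple pattern is to persist the whole trained model and reload it once in the live process; a sketch (file and variable names are illustrative):

from keras.models import load_model

# At the end of offline training: save architecture + weights together
model.save('momentum_net.h5')

# At start-up of the live trading process: load once, then reuse
model = load_model('momentum_net.h5')
signals = model.predict(latest_features)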

I've been looking to back-test this for a long time. Finally, I took a stab at it. Here are my results (and settings):

Total no. of tickers: 2,585
Exchange: NYSE and NASDAQ
Date range: 2012-02-21 to 2016-11-29
Business days: 1,203
Train data: From start until 2015-12-31
Test data: From 2016-01-01 to end

Neural Network (Encoder-decoder)
• Architecture
  o #nodes in each hidden layer: (33 i/p)-40-4-40-(33 o/p)
  o Activation function for hidden layers: ReLU
  o Activation function for output: Linear
• Optimization
  o batch_size: 100,000
  o Epochs: 100
  o Optimizer: Adam (learning rate: 0.001)
  o Loss function: mse
• Performance (on training set)
  o Loss after 100 epochs = 0.1505

Neural Network (Classifier)
• Architecture
  o #nodes in each hidden layer: (4 i/p)->20->(1 o/p)
  o Activation function for hidden layer: ReLU
  o Activation function for output: Sigmoid
• Optimization
  o batch_size: 100,000
  o Epochs: 100
  o Optimizer: Adam (learning rate: 0.01)
  o Loss function: binary_crossentropy
  o Regularization: 40% dropout in hidden layer
• Performance (on training set)
  o Loss after 100 epochs = 0.6926
  o Accuracy (classification rate): 0.5141
• Performance (on test set)
  o Accuracy (classification rate): 0.4844
• Return (long top decile and short bottom decile): -1.66% (annualized)
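
In Keras terms, the two networks look roughly like this (a sketch only; variable names and optimizer settings are illustrative):

from keras.models import Model, Sequential
from keras.layers import Input, Dense, Dropout

# Encoder-decoder: (33 i/p)-40-4-40-(33 o/p), trained to reconstruct its input
inp = Input(shape=(33,))
enc = Dense(40, activation='relu')(inp)
code = Dense(4, activation='relu')(enc)
dec = Dense(40, activation='relu')(code)
out = Dense(33, activation='linear')(dec)

autoencoder = Model(inp, out)
autoencoder.compile(optimizer='adam', loss='mse')

# Classifier: the 4-unit code feeds a (4 i/p)->20->(1 o/p) net with 40% dropout
encoder = Model(inp, code)
classifier = Sequential([
    Dense(20, activation='relu', input_shape=(4,)),
    Dropout(0.4),
    Dense(1, activation='sigmoid'),
])
classifier.compile(optimizer='adam', loss='binary_crossentropy',
                   metrics=['accuracy'])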

I used Quandl data (EOD dataset) to construct the 13 features as suggested in the paper.

I used different learning rates and regularization approaches but the results do not differ drastically. Interestingly, a naive approach of going long (on every stock) in the given period yields a +19.34% return. This is not surprising since the test period is 2016, and the market grew at an equivalent rate.

Looking forward to your thoughts.

@Michael Harris
I like your blogs but I think you are missing something about ML algos now. They can be adaptive if you use a rolling window with weights to retrain. That is the same process we human beings use when relearning a new environment. A DNN may need more data, but other ML algos might still be useful. The method in the paper might have "overfitted" the strategy in picking the network architecture, but as they are not directly optimizing on the final PnL, I think the "overfitting" problem would be less severe than with normal trading-system optimization on the final PnL/Sharpe/Sortino.

@Rajat Kathuria
I have carried out similar experiments on US stocks and I think your training size is a little too small. Nevertheless, the system has not been doing very well since 2016 in my setups, even though I used cross-validation to tune the NN/ML structure. The best period in my test period (2000-2017) was right after the tech bubble, which corroborates figure 4 in the Stanford paper. Post-2000, my monthly return is much lower (~20% CAGR, 1.6 Sharpe, 16% MaxDD) than the numbers reported in the paper, partly because of using only post-2000 data in the test sample.

@Qi Chen

Adding more data may not help, since

  1. Currently, the training data has close to 3 million obs.
  2. Training on data too far back in time may result in a NN mistuned to the current scenario.

@Rajat Kathuria

I see your point, but I think the original paper was forecasting monthly returns instead of daily returns, so you would only have 2500*12*5 = 150K data points. With ~half for training, you "only" have ~75K data points for a deep NN, which might be too small?

I guess your use of daily-return forecasts versus monthly ones might explain why your test resulted in a negative CAGR while mine is still positive, albeit much smaller than in the paper.

@Qi Chen

I, too, forecast monthly returns, but I do not constrain the feature construction to just the 1st day of every month; I construct the features for every day. This way I have 2,515*1,203 = ~3M obs. When computing PnL, however, I choose a particular day of the month to invest/close a position.

I acknowledge that this way consecutive days will not have much variation in input features/outcome.

Nonetheless, I'll try training on more isolated dates (one per month) as you suggested.

Does anyone know where to find a stacked auto-encoder algorithm that is already coded up in Python?

Or any other algorithm (already coded up) that does unsupervised learning of features that can be used to select or weight factors.

Hi John Strong,

Not sure if this is exactly what you're looking for, but here's some sample code courtesy of Keras:

from keras.layers import Input, Dense
from keras.models import Model

input_img = Input(shape=(784,))
encoded = Dense(128, activation='relu')(input_img)  
encoded = Dense(64, activation='relu')(encoded)  
encoded = Dense(32, activation='relu')(encoded)

decoded = Dense(64, activation='relu')(encoded)  
decoded = Dense(128, activation='relu')(decoded)  
decoded = Dense(784, activation='sigmoid')(decoded)

autoencoder = Model(input_img, decoded)  
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

# x_train / x_test are your own (flattened, scaled) training and test arrays
autoencoder.fit(x_train, x_train,
                epochs=100,  
                batch_size=256,  
                shuffle=True,  
                validation_data=(x_test, x_test))