Simple Machine Learning Example Mk II

My original machine learning example was a popular post, and I figure it's about time for an update.

Although machine learning usually seems complicated at first, it's actually easy to work with.

Here, a model is created based on past events and their outcomes. The algorithm considers 3 input variables, or previous events: the price changes over each of the previous 3 days. The outcome is whether the price increased or decreased in the following bar. Many of these events and their outcomes are used to fit a regression model with scikit-learn. The model is then used to try to predict future changes in price.
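As a rough illustration of how those events and outcomes become training data, here is a minimal sketch, assuming daily closing prices and a lookback of 3. The helper name `make_training_set` is hypothetical and this is not the attached algorithm's code. The target is the next day's price change, and its sign gives the up/down outcome.

```python
import numpy as np


def make_training_set(prices, lookback=3):
    """Turn a daily price series into (X, y) training samples.

    Each row of X holds `lookback` consecutive daily price changes; the
    matching entry of y is the change on the day that follows them.
    """
    changes = np.diff(np.asarray(prices, dtype=float))  # first differences
    X, y = [], []
    for i in range(len(changes) - lookback):
        X.append(changes[i:i + lookback])
        y.append(changes[i + lookback])
    return np.array(X), np.array(y)


prices = [100.0, 101.0, 100.5, 102.0, 101.0, 103.0]
X, y = make_training_set(prices)
# changes are [1.0, -0.5, 1.5, -1.0, 2.0], so
# X == [[1.0, -0.5, 1.5], [-0.5, 1.5, -1.0]] and y == [-1.0, 2.0]
```

These arrays can then be passed to a scikit-learn regressor's `fit(X, y)`; a positive prediction reads as "up" and a negative one as "down".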

Note that this is just an example, and should be improved before real use. Clone the algorithm, and let me know if you have any questions!

[Attached backtest, ID 55b1026d5c1b8b0c61ad2d2a]
We have migrated this algorithm to work with a new version of the Quantopian API. The code is different than the original version, but the investment rationale of the algorithm has not changed. We've put everything you need to know here on one page.
Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

36 responses

Cheers for posting!

The code is so clean and simple compared to the models I have seen written entirely in Python.

Very good concept, and the framework given is also neat.

I tried my hand at it and found that every time I backtest over the same date range without changing the code, the results are different; i.e., a model built from the same input and output data set behaves differently each time.

Hi all,

Can anyone recommend a good trading algorithm for me? I want to buy a good one with a monthly ROI of 10%.

Thank you

Why are you generating the model at the end of each week instead of each day? And have you had much experience with the other ensembles, like the classifiers?

Yagnesh, that's just because of the model I am using: a random forest, which builds its trees from random bootstrap samples of the data. A different, non-random model could be chosen as well.

d36, that's a mostly arbitrary choice I made for the example, partly in the hope of slightly speeding up the backtest. The model shouldn't change much from day to day when it is trained on 400 days of history, but it will usually change a nontrivial amount from week to week.
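On Quantopian the weekly retrain is handled by the platform's scheduler. As a platform-independent sketch of the same "retrain weekly, predict daily" idea, the hypothetical helper below refits on the first bar of each new ISO calendar week; that exact timing is an assumption, not the original schedule.

```python
import datetime as dt


def needs_retrain(last_trained, today):
    """Return True if the model should be refit on `today`.

    `last_trained` is the date of the previous fit, or None before the first
    fit. A refit happens once per ISO calendar week (year, week number).
    """
    if last_trained is None:
        return True
    return tuple(today.isocalendar())[:2] > tuple(last_trained.isocalendar())[:2]


monday = dt.date(2015, 7, 6)
print(needs_retrain(None, monday))                  # True: no model yet
print(needs_retrain(monday, dt.date(2015, 7, 10)))  # False: same week
print(needs_retrain(monday, dt.date(2015, 7, 13)))  # True: new week
```

Comparing (year, week) tuples also handles year boundaries, where the week number resets.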

I thought so. I've had a similar experience with neural nets: they learn different things from the same set of data every time.

I am not an expert on machine learning, but I went through sklearn and think that 'polynomial interpolation' and decision tree regression with AdaBoost show some promise.
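For anyone who wants to try the AdaBoost suggestion, here is a sketch of boosted decision-tree regression in scikit-learn on the same three-lag features, using synthetic data. `max_depth=3` and 50 estimators are arbitrary, untuned choices.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(3)
changes = rng.normal(size=300)  # synthetic daily price changes
# Three lagged changes per row, next change as the target.
X = np.column_stack([changes[i:297 + i] for i in range(3)])
y = changes[3:]

# The weak learner is passed positionally because the keyword was renamed
# from base_estimator to estimator in recent scikit-learn releases.
model = AdaBoostRegressor(DecisionTreeRegressor(max_depth=3), n_estimators=50, random_state=0)
model.fit(X, y)
prediction = model.predict(changes[-3:].reshape(1, -1))
```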

Yeah, there is definitely a ton of background on selecting different models for different situations. It is very easy to use machine learning to solve a problem decently well, but much more difficult to be sure how it applies to the situation and to select exactly the right model. Many models can be applied generally, though, and work fine in most situations.

In the trade method, should the prediction be based on recent_prices or the first difference of recent_prices, like in create_model?

You're right! My mistake. I was mucking around and forgot to add that line back in. I've updated the post; thanks for pointing that out.
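The fix discussed here is to feed the model at prediction time the same first-difference transform used in training. A minimal sketch, assuming the lookback of 3; the helper name is hypothetical.

```python
import numpy as np


def latest_features(recent_prices, lookback=3):
    """Build the prediction input with the same transform used for training:
    the last `lookback` first differences, shaped as a single 2-D row
    because scikit-learn's predict() expects (n_samples, n_features)."""
    changes = np.diff(np.asarray(recent_prices, dtype=float))
    if len(changes) < lookback:
        raise ValueError("need at least lookback + 1 prices")
    return changes[-lookback:].reshape(1, -1)


row = latest_features([10.0, 11.0, 13.0, 12.0])
# row == [[1.0, 2.0, -1.0]]; pass it to model.predict(row)
```

Sharing one helper between training and prediction makes this kind of mismatch harder to reintroduce.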

Hi, author. Could you supply the code in minute mode? I am curious about the performance with minute data. Many thanks!

Actually, the code already runs in minute mode! This is indicated above the backtest. Again, this is just a code example, and you should not expect to make money with it.

Hello Gus!
I was wondering what this 'fit' method really does.
In simple terms, can we say that it looks at the last 3 days, searches back in history for something similar, and then looks at the day after each similar 3-day pattern to see whether history repeats itself, in order to make a prediction?
If so, I have a question.
(Sorry for my bad English :) ) I was doing something that looks similar, but with a 'manual' approach. Do you think this machine learning method is more efficient than a simple correlation analysis when it comes to finding similar patterns in history?
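The 'manual' approach isn't spelled out in the thread, so the following is only one hypothetical way to do correlation-based pattern matching, for comparison with a fitted model. It finds the past windows most correlated with the latest one and averages what followed them (a nearest-neighbour style "analog" forecast). A random forest does not search history this way; it learns split rules over all training samples.

```python
import numpy as np


def analog_forecast(changes, lookback=3, top_k=5):
    """Correlation-based pattern matching on a series of price changes.

    Compares the latest `lookback` changes with every earlier window, keeps
    the `top_k` windows with the highest Pearson correlation, and returns the
    average of the change that followed each of those windows.
    """
    changes = np.asarray(changes, dtype=float)
    target = changes[-lookback:]
    if target.std() == 0:
        raise ValueError("latest window is flat; correlation is undefined")
    candidates = []
    # A window starting at `start` is usable only if the change after it is known.
    for start in range(len(changes) - lookback):
        window = changes[start:start + lookback]
        if window.std() == 0:
            continue  # skip flat windows, whose correlation is undefined
        corr = np.corrcoef(window, target)[0, 1]
        candidates.append((corr, changes[start + lookback]))
    if not candidates:
        raise ValueError("no usable history windows")
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return float(np.mean([following for _, following in candidates[:top_k]]))


history = [1, 2, 3, 10, -1, -2, -3, -10, 1, 2, 3]
print(analog_forecast(history, top_k=1))  # 10.0: the earlier [1, 2, 3] was followed by 10
```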

[Attached notebook]

I have updated my notebook with some comments (I cannot edit the previous notebook attachment).

[Attached notebook]

Thanks for the explanation of your work, cool notebook!

That's what it does to an extent; however, this depends on the model used by sklearn. For example, the RandomForestRegressor I'm using makes use of decision trees. It's not an easy subject, and I'm not an expert myself, but you can read about it here: http://scikit-learn.org/stable/modules/classes.html#module-sklearn.ensemble

It looks to me like you made your own sort of model that you are using to look at the price time series. You can simply compare models to see what works best, but beware that you may be overfitting. Also, it's generally difficult to make signals based purely on price time series work! It may be wise to bring in other streams of information, like volume data.
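One guard against overfitting when comparing models is to score each one on data it never saw, while keeping time order. A minimal sketch follows. The 25% holdout and the sign-agreement ("hit rate") metric are assumptions, not something prescribed in this thread.

```python
import numpy as np


def chronological_split(X, y, test_fraction=0.25):
    """Hold out the most recent `test_fraction` of time-ordered samples.

    No shuffling: every test sample comes after every training sample.
    """
    n_test = max(1, int(len(X) * test_fraction))
    return X[:-n_test], X[-n_test:], y[:-n_test], y[-n_test:]


def direction_hit_rate(predicted, actual):
    """Fraction of samples where predicted and actual changes share a sign."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean(np.sign(predicted) == np.sign(actual)))


X = np.arange(16, dtype=float).reshape(8, 2)
y = np.arange(8, dtype=float)
X_train, X_test, y_train, y_test = chronological_split(X, y)
print(len(X_train), len(X_test))                                  # 6 2
print(direction_hit_rate([1, -1, 2, -3], [0.5, 0.5, 1.0, -1.0]))  # 0.75
```

Fit each candidate only on the training part and compare scores on the held-out part. A model that looks much better in-sample than out-of-sample is likely overfitting.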

I haven't been able to replicate your backtest results so far, Gus. I take it the algorithm is stochastic, since I get different results each time I run it. I have yet to see it produce positive returns, though. This is what I get running the code as-is on the same time period. Any idea why the stochastic effects might be giving such drastically different outcomes? Is there a good way to make this deterministic?

[Attached backtest, ID 568c6b1bb91075116b982f2c]

It's stochastic because this uses a random forest model. I just selected this as an example because it's a common model. You can read about alternative models here. As I noted, this is just an example, and you should not expect to get consistent positive returns out of something like this. To improve the likelihood of success, you should broaden the scope of the input data (for example, include trade volume, earnings, etc.). It's generally very difficult to get an algorithm that yields consistently positive returns. Good luck!

Just a small FYI for anyone interested in ensemble models: the default number of individual trees in sklearn is very (too) small. A random forest should have at least hundreds (thousands is better) of trees. The default in sklearn is 10 (!). As these are [almost] random decision trees, it's no wonder that you get wildly different results each time.
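Two knobs address the run-to-run differences discussed above. `random_state` fixes the seed, so repeated backtests fit the same forest. A larger `n_estimators` shrinks seed-to-seed variance. (scikit-learn 0.22 raised the default `n_estimators` from 10 to 100.) A sketch on synthetic data; the 300 trees and seed 42 are arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
changes = rng.normal(size=400)  # synthetic daily price changes
# Three lagged changes per row, next change as the target.
X = np.column_stack([changes[i:len(changes) - 3 + i] for i in range(3)])
y = changes[3:]
latest = changes[-3:].reshape(1, -1)


def fit_forest(seed, n_trees=300):
    """Fit a random forest whose randomness is fixed by `seed`."""
    return RandomForestRegressor(n_estimators=n_trees, random_state=seed).fit(X, y)


first = fit_forest(seed=42).predict(latest)
second = fit_forest(seed=42).predict(latest)
# Same seed and same data, so the two runs agree exactly.
print(first, second)
```

Fixing the seed makes runs repeatable but does not make the model better. Comparing predictions across several seeds shows how much of a signal is just tree-sampling noise.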

So, I saw this and was interested in the algo. When I ran it, I noticed something rather curious (not the random forest part; that I understand). When running the algo, it would place a buy order, and then the next day place a sell order for twice the amount held. Then it gets more bizarre. The algo would buy at 140.57 and sell at 139.00; then magic occurs, and your cash value goes up a few grand.

I ran this backtest over 1/1/2008-1/1/2009 as a stress test, but I think I broke something.

Ideas??

[Attached backtest, ID 56b7b909014e0512a0351075]

Corey, that is to be expected because the algorithm is always either long or short. To go from long to short, we have to sell twice the amount held: once to close the long position and once more to open the short. For your second point, if we short at $140.57 and then reduce our position to zero at $139.00, then we have just made $1.57 per share. Are you sure you're looking at a long and not a short? Other unexpected cash fluctuations may occur for a variety of reasons: slippage, commissions, only being able to buy integer numbers of shares, etc. These are all designed to simulate market conditions as closely as possible. Cash fluctuations should not be too surprising, although a few thousand dollars is a lot in this case, so that sounds like profit showing up in cash later because we are trading in daily mode. If you want to dig deeper, you can log parameters of interest or run in minute mode. Hope that helps.
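The arithmetic behind both points, as a small sketch. The 100-share position is hypothetical, and the helper names are illustrative only.

```python
def order_size(current_shares, target_shares):
    """Signed number of shares to trade to move from the current position to
    the target (negative means sell)."""
    return target_shares - current_shares


def short_profit(entry_price, exit_price, shares):
    """Profit on a short opened at entry_price and covered at exit_price."""
    return (entry_price - exit_price) * shares


print(order_size(100, -100))            # -200: sell 100 to close, 100 more to go short
print(short_profit(140.57, 139.00, 1))  # roughly 1.57 (floating point)
```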

I ran this with a couple of changes:
1. Changed the stock symbol to BDCL.
2. Changed the lookback to 10 days.

It lost about 100% of the money, so I changed the logic to sell short when the prediction was up, and it still lost about 100%.

Is it just frequent trading and commissions that are killing it?

[Attached backtest, ID 56bc26352abf4512c273e559]

It looks like this is mostly due to slippage. This can be mitigated by limiting the trade size or the trading frequency. Here is your algorithm with slippage and commission turned off:

[Attached backtest, ID 56be629b6d5da912ce2635db]
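A back-of-the-envelope sketch shows why flipping a position every day can eat an account even when the signal is decent. The commission and slippage rates below are hypothetical placeholders, not Quantopian's actual models.

```python
def round_trip_cost(price, shares, commission_per_share, slippage_fraction):
    """Rough cost of one buy plus one sell of `shares` at `price`.

    Commission is charged per share on both legs; slippage is modelled as a
    fixed fraction of traded value on each leg (a simplification).
    """
    legs = 2
    commission = legs * shares * commission_per_share
    slippage = legs * shares * price * slippage_fraction
    return commission + slippage


# Hypothetical numbers: 1,000 shares at $10, $0.005/share commission, 0.1% slippage.
per_trip = round_trip_cost(10.0, 1000, 0.005, 0.001)
yearly = per_trip * 252  # one round trip per trading day
print(per_trip, yearly)  # about 30 and 7560, against a $10,000 position
```

Trading less often, or in smaller size relative to volume, is what shrinks these terms.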

Gus,

I've been playing around with this model, and I've been trying to add in another independent variable, but keep getting errors. Any chance you could post an updated backtest for us that has something like volume as an additional independent variable?

Many thanks,

Joseph

The best way to do this, I think, is to simply combine the inputs into the same list for each sample. Here's an example with volume as an input. I also changed how often the model is generated and when we trade. Remember, this is just an example of how to use some sklearn models. There are also better ways to implement this now, like with the Pipeline API.

[Attached backtest, ID 580664cf834129105c8e9a94]
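Combining inputs into the same list here means concatenating them into one feature row per sample, not summing them. The sketch below shows that layout. It assumes each volume is aligned with the day of the price change it accompanies, and it is not the code in the attached backtest.

```python
import numpy as np


def make_samples(prices, volumes, lookback=3):
    """Concatenate the last `lookback` price changes and the matching volumes
    into one feature row per sample; the target is the next price change."""
    changes = np.diff(np.asarray(prices, dtype=float))
    volumes = np.asarray(volumes, dtype=float)[1:]  # align volume with each change
    X, y = [], []
    for i in range(len(changes) - lookback):
        X.append(np.concatenate([changes[i:i + lookback], volumes[i:i + lookback]]))
        y.append(changes[i + lookback])
    return np.array(X), np.array(y)


X, y = make_samples([10, 11, 12, 11, 13], [100, 200, 300, 400, 500])
# X == [[1, 1, -1, 200, 300, 400]], y == [2]
```

Tree-based models like the random forest are not bothered that volume is on a much larger scale than price changes. Linear or distance-based models generally are, so scale the features for those.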

Gotcha. Thanks for taking the time to post this, Gus!

@Gus Gordon
Hello Gus, I am not able to train any other model using your ML implementation. I used your code as a template and applied QDA(), LogisticRegression() and SVC() to features like daily_returns, multiple_day_returns, rolling_mean and time_lagged, but every time, except with RandomForest, it returns an error that the model is not fitted yet.

Here is the code:
https://gist.github.com/arshpreetsingh/7ac097ae9097a7a859976342db8bbe93

Is there something wrong in the implementation? On the other hand, in my local IPython/Jupyter notebook I am able to train the models.

[Attached backtest, ID 58bfff57307b7361f11755f4]
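The gist isn't reproduced in this thread, so this is only one plausible cause worth ruling out, not a diagnosis. scikit-learn classifiers such as LogisticRegression, SVC and QDA require discrete class labels, and fitting them on a continuous price-change target (as the regressor version uses) raises a ValueError. If that failure goes unnoticed, a later predict() call would complain that the model was never fitted. Synthetic data below.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
changes = rng.normal(size=200)  # synthetic daily price changes
X = np.column_stack([changes[i:197 + i] for i in range(3)])
next_change = changes[3:]       # continuous target, as in the regressor version

model = LogisticRegression()
try:
    model.fit(X, next_change)
    fit_failed = False
except ValueError as err:       # scikit-learn rejects a continuous target here
    fit_failed = True
    print("fit failed:", err)

# Classifiers need discrete classes: 1 = next change up, 0 = down or flat.
labels = (next_change > 0).astype(int)
model.fit(X, labels)
direction = model.predict(changes[-3:].reshape(1, -1))[0]
```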

Gus, I wonder: can the framework be easily modified to take multiple stocks as input?

Giuseppe, your notebook is very cool, do you have an algorithm framework similar to this one for it?

I modified the code to include multiple stocks as input to the Random Forest Regressor. A new model is trained for each stock. I parameterized the regressor with 64 trees and a max depth of 6. I suppose the parameters can be tuned using cross-validation. A good alternative to Random Forest is XGBoost (https://github.com/dmlc/xgboost). However, it's not available yet in the Python IDE.

[Attached backtest, ID 593638d503b3c26dec68ea4b]
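A sketch of the per-stock setup described above: a dictionary holding one independent RandomForestRegressor per symbol, with the 64-tree, depth-6 settings from the post. The tickers and data are synthetic placeholders, and this is not the attached algorithm's code.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

LOOKBACK = 3


def lagged_samples(changes, lookback=LOOKBACK):
    """Rows of `lookback` consecutive changes, each paired with the next change."""
    n = len(changes) - lookback
    X = np.column_stack([changes[i:n + i] for i in range(lookback)])
    return X, changes[lookback:]


rng = np.random.RandomState(1)
# Hypothetical tickers with synthetic daily price changes.
changes_by_symbol = {sym: rng.normal(size=300) for sym in ["AAA", "BBB", "CCC"]}

# One independent model per stock, using the settings from the post.
models = {}
for sym, changes in changes_by_symbol.items():
    X, y = lagged_samples(changes)
    models[sym] = RandomForestRegressor(n_estimators=64, max_depth=6, random_state=0).fit(X, y)

# Long (+1) or short (-1) signal per stock from its own latest changes.
signals = {
    sym: 1 if models[sym].predict(changes[-LOOKBACK:].reshape(1, -1))[0] > 0 else -1
    for sym, changes in changes_by_symbol.items()
}
print(signals)
```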

How does something like scikit-learn get added as a supported package on the Quantopian platform?

I tested it with volume alone and a different lookback period and history range. Seems like it can still generate alpha.

[Attached backtest, ID 59723c59ee43454e17f6bc2e]

Thanks for sharing; it is an amazing concept, and the framework is really clean. We need more people like you in the world!

When people first explained machine learning to me it sounded like rocket science, but as you can see from this explanation, it's really not that hard to work with. Thanks Gus! Really great post.

I ran a backtest using volume to predict SPY moves but then had the algo trade XIV instead. Huge drawdowns so this strategy needs more work, but recent results seem promising.

[Attached backtest, ID 5979c87520de9451f599133f]

Gus, any idea how to add a second variable to the random forest prediction? For example, if I wanted to add the MACD from TA-Lib as a variable, how would I do that?
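TA-Lib isn't strictly needed to try this. The sketch below computes a MACD line (12-period EMA minus 26-period EMA) with NumPy and appends it to the price-change features as one extra column. It assumes EMAs seeded with the first price, like pandas' `ewm(adjust=False)`. TA-Lib seeds its EMAs differently, so early values may not match its output, and the 9-period signal line is omitted.

```python
import numpy as np


def ema(values, span):
    """Exponential moving average with alpha = 2 / (span + 1), seeded with
    the first value (like pandas' ewm(span=span, adjust=False).mean())."""
    values = np.asarray(values, dtype=float)
    alpha = 2.0 / (span + 1)
    out = np.empty_like(values)
    out[0] = values[0]
    for i in range(1, len(values)):
        out[i] = alpha * values[i] + (1 - alpha) * out[i - 1]
    return out


def macd_line(prices, fast=12, slow=26):
    """MACD line: fast EMA minus slow EMA of the prices."""
    return ema(prices, fast) - ema(prices, slow)


prices = np.linspace(100.0, 110.0, 60)  # synthetic rising prices
changes = np.diff(prices)
# Feature row: the last 3 price changes plus today's MACD value.
row = np.append(changes[-3:], macd_line(prices)[-1])
print(row.shape)  # (4,)
```

For the training set, compute the MACD once over the full history and append the value aligned with each sample's last day.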

I need help using this algo with order_optimal_portfolio. Can someone recreate it?

@Vadim Smolyakov I have tried using this algo with Gradient Boosting Regression from sklearn. It's similar to XGBoost in that both use gradient-boosted trees, a form of ensemble learning. Does anyone know if Quantopian has LightGBM support? It is much faster and more efficient due to its low-level architecture.

Recreated using two regression models: prediction = 0.5 * random forest + 0.5 * gradient boosting. The idea is that we don't want to base our decision purely on one regression technique. It still managed to generate alpha using MSFT instead of SPY. I'll attach a visual pipeline image explaining the process soon. The backtest still assumes slippage and commissions.

[Attached backtest, ID 5bf2ef70d7117d42b70b3e71]
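A sketch of the equal-weight blend described above, on synthetic data and not the attached code. It fits a random forest and a gradient boosting regressor on the same lagged features and averages their predictions. scikit-learn 0.21 and later also provide `VotingRegressor`, which averages its estimators' predictions in the same way (optionally weighted).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

rng = np.random.RandomState(2)
changes = rng.normal(size=300)  # synthetic daily price changes
X = np.column_stack([changes[i:297 + i] for i in range(3)])
y = changes[3:]
latest = changes[-3:].reshape(1, -1)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
boosted = GradientBoostingRegressor(random_state=0).fit(X, y)

# Equal-weight blend: 0.5 * random forest + 0.5 * gradient boosting.
forest_pred = forest.predict(latest)[0]
boosted_pred = boosted.predict(latest)[0]
prediction = 0.5 * forest_pred + 0.5 * boosted_pred
signal = 1 if prediction > 0 else -1  # long if the blend predicts a rise
```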

Adding some more complexity to the "simple" program, just for fun:

  1. Applying a scaler to the training data using sklearn's MinMaxScaler, so that the data is now scaled from -1 to 1.
  2. Adding the ability to use multiple regression models to create a prediction.
  3. Healthcare + tech are a good combo, since they naturally diversify each other.
  4. Decreasing the frequency of shorting to once a week (previously, there was a chance it would sell every day).
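A sketch of the scaling step in item 1, on a small made-up array. Note that `feature_range=(-1, 1)` must be set explicitly, since MinMaxScaler's default range is (0, 1). Fitting the scaler on the training rows only, then reusing it on new data, keeps later data from leaking into the scale.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[1.0, 200.0], [3.0, 400.0], [5.0, 600.0]])
X_new = np.array([[3.0, 700.0]])

# Learn each column's min/max from the training rows only.
scaler = MinMaxScaler(feature_range=(-1, 1)).fit(X_train)
X_train_scaled = scaler.transform(X_train)  # [[-1, -1], [0, 0], [1, 1]]
X_new_scaled = scaler.transform(X_new)      # [[0, 1.5]]: unseen extremes can exceed 1
```

Tree ensembles are largely insensitive to this kind of per-feature monotonic rescaling, so it matters most for linear or distance-based models.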

P.S. I wouldn't use gradient boosting or any ensemble methods in the real world. In my opinion, Bayesian classifiers should perform better.
@Vadim Smolyakov, wouldn't there be a flaw in using cross-validation with time-series data, since we are dealing with sequential movements?

[Attached backtest, ID 5c0b3fae4cc6f14ab3c9c658]
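On the cross-validation question: shuffled k-fold lets a model train on data from after the period it is scored on, which a live strategy could never do. An expanding-window (walk-forward) split avoids that. Below is a hand-rolled sketch. scikit-learn's `TimeSeriesSplit` in `sklearn.model_selection` implements a similar scheme.

```python
def walk_forward_splits(n_samples, n_splits=3):
    """Yield (train_indices, test_indices) for expanding-window validation.

    Each test block lies strictly after its training window, so the model is
    never trained on data from the future of the period it is scored on.
    """
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("not enough samples for the requested number of splits")
    for k in range(1, n_splits + 1):
        train_end = k * test_size
        yield list(range(train_end)), list(range(train_end, train_end + test_size))


for train, test in walk_forward_splits(8, n_splits=3):
    print(train, test)
# [0, 1] [2, 3]
# [0, 1, 2, 3] [4, 5]
# [0, 1, 2, 3, 4, 5] [6, 7]
```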