First Attempt at Machine Learning - Perceptron Algorithm Using the Past 30 Days of Trading Data

ML is never far away from us, and the idea of the perceptron algorithm is pretty straightforward. In this experiment I divided the past 30 days of trading data into 6 groups, and classified a group as "positive" if the average price of its first four days is less than the fifth day's price (a trend of increasing prices), and "negative" otherwise. Then I train on these data, starting with a zero vector and using iterative correction, so that the resulting vector can predict whether the next day's price will increase or decrease, and I decide to buy or sell accordingly. The performance is not bad - even though the max drawdown is almost 100% (not suitable for risk-averse people).
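The grouping and labeling described above can be sketched like this (a toy illustration; the function and variable names are my own, not from the attached algo):

```python
import numpy as np

def label_windows(prices):
    """Split 30 daily prices into 6 groups of 5 days each.

    For each group, the feature vector is the first four prices and the
    label is +1 if the fifth price exceeds their average (uptrend),
    else -1.
    """
    X, y = [], []
    for j in range(0, 30, 5):
        first_four = prices[j:j + 4]
        fifth = prices[j + 4]
        X.append(first_four)
        y.append(1 if fifth > np.mean(first_four) else -1)
    return np.array(X), np.array(y)

prices = np.linspace(100.0, 105.8, 30)   # toy, steadily rising series
X, y = label_windows(prices)
# a strictly rising series should label every group +1
```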

Also note that it takes quite some time to backtest, which suggests that ML algorithms are generally computation-intensive.

For some background, Wikipedia or the lecture notes from the MIT EECS department, (1) and (2), can be very helpful.

Some ideas to tweak it:
1) Training periods: currently I set this at 5, meaning the training data are each visited five times.
2) Max/min notional: I split 1,000,000 bucks evenly across the seven stocks in my portfolio.
3) Instead of buying/selling 1000 shares each time, maybe as the indicator becomes larger (a stronger increase/decrease signal) we could buy/sell more shares.
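The "iterative correction" mentioned above is the classic perceptron update: start from the zero vector and, whenever a training sample is misclassified, nudge the weights toward it. A minimal sketch (my own illustration, not the attached code):

```python
import numpy as np

def perceptron(X, y, num_epochs):
    """Classic perceptron: visit the training set num_epochs times,
    correcting theta whenever a sample is misclassified."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    theta = np.zeros(X.shape[1])              # start from the zero vector
    for _ in range(num_epochs):
        for xi, yi in zip(X, y):
            if yi * np.dot(theta, xi) <= 0:   # wrong (or zero) sign
                theta += yi * xi              # correct toward the sample
    return theta

# linearly separable toy data (positive vs. negative quadrant)
X = [[1.0, 1.0], [2.0, 0.5], [-1.0, -1.0], [-2.0, -0.5]]
y = [1, 1, -1, -1]
theta = perceptron(X, y, num_epochs=5)
```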

Any comments and suggestions are welcome!

Please note there is a typo in the code (thanks to Chris). A corrected algo is attached in a later post.
I'm also trying to use set_universe and believe that's gonna be a better approach for choosing stocks :)

[Backtest results omitted - Backtest ID: 51316450d55c0b086be5b5f1; created with an older version of the backtester.]

Taibo,

Very cool - you should try using set_universe and see how this approach does without hand-picked stocks.

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Thanks Taibo,

I'll have to read a bit about the perceptron algorithm. For me, the MIT EECS links above don't work. I get:

```
This XML file does not appear to have any style information associated with it. The document tree is shown below.
```

Also, Thomas Wiecki alerted me to a potential problem with calling the batch transform more than once per handle_data() call (see https://www.quantopian.com/posts/olmar-implementation-fixed-bug). He states:

The way we were using the batch_transform is not valid since one can only call it once in every handle_data() call. The reason is that calling the batch_transform also appends the data frame to the window, so if you call it multiple times it will get confused.

I had been using code similar to yours, iterating over sids and associated calls to a batch transform. This, apparently, can corrupt data without an error message.

Fawce, is this correct?

I expect that your code can be re-written so that you make only one call per batch transform per call to handle_data() (might help your execution speed problem, too). For example, this will return a numpy ndarray of all of the sid prices with a single call:

```
@batch_transform(refresh_period=R_P, window_length=W_L)  # set globals R_P & W_L above
def get_prices(datapanel, sids):
    return datapanel['price'].values

# inside handle_data:
prices = get_prices(data, context)
```

Grant

As much as I hate to say this, there are numerous bugs in the code. The result is that you use FUTURE price values during the training phase of your perceptron, so no wonder you are getting this stellar performance.

Rule no. 1 of strategy backtesting: if it looks too good to be true, it probably is too good to be true!

Hint: look at what you are REALLY calculating for testY here:

```
def caltheta(datapanel, sid, num):
    prices = datapanel['price'][sid]
    for i in range(len(prices)):
        if prices[i] is None:
            return None
    testX = [[prices[i] for i in range(j, j + 4)] for j in range(0, 30, 5)]
    avg = [np.average(testX[k]) for k in range(len(testX))]
    testY = [np.sign(prices[5 * i + 4] - avg[i]) for i in range(len(testX))]
    theta = perceptron(testX, testY, num)
    return theta
```

cheers

Hello Censix - I agree that the result is too good to be true, but it's not because there is any data leaking in from the future. Zipline, the Quantopian backtester, reads in data one event at a time. This backtest is run in daily mode, so that's one event per day. Orders are placed at the end of the day, and are filled the following day. There is no future data available to the algorithm; it is completely insulated from future knowledge.

The reason this algo has such a huge return is that it is immensely leveraged - by the end it has borrowed $138,000,000 in cash! It also has a max drawdown of 99.85%. I agree that it's not a practical algo at this point, but the concept that Taibo has put in play here is interesting. I cloned it and I'll try to make it truer to reality later.


Isn't this buying every day? Even when the indicator is < 0, the order parameter is positive (line 37).

Dan, I think we agree that nobody would use the algo as it is for trading, unless you have a strong desire to go bankrupt very fast. My reason for claiming that it uses future data is as follows; maybe you can clarify where I am wrong, if I'm wrong.

```
def handle_data(context, data):
    ....
    # <-- to me it looks like the predictor theta is trained on all the
    # available 'data', not on the 30-day rolling historical prices
    # calculated on the next line -- and this is done on every iteration,
    # i.e. every day in the dataset
    theta = caltheta(data, stock, 5)
    historicalPrices = get_prices(data, stock)
    indicator = np.dot(theta, historicalPrices)
```

So the backtest run here is entirely 'in-sample'. What is needed is a backtest on out-of-sample data to see how it performs, since in-sample performance only shows that you have fit a model reasonably well.

As for "in-sample performance", this is a central topic of ML: how well training errors and generalization errors are related. The very fact that ML exists shows that we generally believe a good training error is enough to predict future performance. Of course you can't use future data to train your model - you can only use it to test the model.

@Chris: the indicator predicts whether the price will be greater than the historical average. This is a very rough binary linear classification model, so I'll need to see if it can be improved.
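In other words, each day's signal is just the sign of a dot product between the trained weights and the recent price window, something like this (the weight values here are hypothetical, chosen only to illustrate the "last price vs. average" idea):

```python
import numpy as np

def signal(theta, recent_prices):
    """Binary linear classifier: a positive dot product means a
    predicted up-move (buy); non-positive means sell."""
    indicator = np.dot(theta, recent_prices)
    return "buy" if indicator > 0 else "sell"

# hypothetical weights: indicator = last price - mean of first four
theta = np.array([-0.25, -0.25, -0.25, -0.25, 1.0])
rising = np.array([100.0, 101.0, 102.0, 103.0, 105.0])
falling = np.array([105.0, 103.0, 102.0, 101.0, 100.0])
# rising window -> "buy", falling window -> "sell"
```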

Taibo, most of my earlier comments resulted from a misunderstanding about the way Quantopian uses its 'batch_transform' within the strategy code. I recognize your strategy now for what it is: training AND predicting on rolling windows. In that sense the in-sample/out-of-sample argument does not really apply either. Well, let's hope people can improve the performance by minimizing the drawdowns. Interesting work.

```
if indicator <= 0 and notional[i] > context.min_notional:
    order(stock, 1000)
    log.info("1000 shares of %s sold." % stock)
```

How is this order a sale? The order function takes negative values to mean a sale, am I correct?

Oops, thanks Chris for pointing that out... that explains why the return is so high and the drawdown is also so high. My bad, I forgot to add the minus sign there.
So I tweaked the algo by linking the orders to the indicator, and finally it comes back to a normal level.
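Linking the order size to the indicator could look roughly like this (a sketch of the idea only; the base size and clipping bound are my own guesses, not values from the attached backtest):

```python
import numpy as np

def shares_for(indicator, base=1000, max_shares=5000):
    """Scale the order with the signal strength: the sign of the
    indicator gives the direction, its (clipped) magnitude the size.
    A negative return value means a sell order."""
    size = int(np.clip(abs(indicator) * base, base, max_shares))
    return size if indicator > 0 else -size

# then inside handle_data:  order(stock, shares_for(indicator))
```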

[Backtest results omitted - Backtest ID: 5136576b6dfbd40a68be450e; created with an older version of the backtester.]

Doesn't this method benefit from being trained for a while before you give it a whirl? By the way, TradingAlgorithm should have a feature for this (a grace period). It should also be constantly adjusting based on the new classifications...