ML is never too far away from us, and the idea of the perceptron algorithm is pretty straightforward. In this experiment I divided the past 30 days of trading data into 6 groups, and classified a group as "positive" if the average price of its first four days is less than the fifth day's price (a rising trend), and "negative" otherwise. I then trained on these data, starting with a zero vector and applying iterative corrections until the vector can predict whether the next day's price will rise or fall, and decided to buy or sell accordingly. The performance is not bad - even though the max drawdown is almost 100% (not suitable for risk-averse people).
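A minimal sketch of the setup described above (function and variable names are illustrative, not taken from the attached algo):

```python
import numpy as np

def make_training_set(prices):
    """Split 30 daily prices into 6 groups of 5 days.
    Features: the first four prices of each group.
    Label: +1 if the fifth price exceeds the average of the
    first four (rising trend), -1 otherwise."""
    X = [prices[j:j + 4] for j in range(0, 30, 5)]
    y = [np.sign(prices[j + 4] - np.mean(prices[j:j + 4]))
         for j in range(0, 30, 5)]
    return np.array(X, dtype=float), np.array(y)

def perceptron(X, y, passes=5):
    """Classic perceptron: start from a zero vector and apply the
    iterative correction rule over several passes through the data."""
    theta = np.zeros(X.shape[1])
    for _ in range(passes):
        for xi, yi in zip(X, y):
            if yi * np.dot(theta, xi) <= 0:   # misclassified
                theta += yi * xi              # correction step
    return theta
```

The trained vector is then dotted with a fresh price window, and the sign of the result becomes the buy/sell signal.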

Also note that it takes quite some time to backtest, which suggests that ML algorithms are generally computation-intensive.

For some background, Wikipedia or the lecture notes from the MIT EECS department, (1) and (2), can be very helpful.

Some ideas to tweak it:
1) training passes: currently I set it at 5, meaning each training example is visited five times
2) max/min notional: I split 1,000,000 bucks evenly across the seven stocks in my portfolio
3) instead of buying/selling 1000 shares each time, maybe we could buy/sell more shares as the indicator grows larger (a stronger increase/decrease signal)
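Idea 3) might look roughly like this (a sketch; the base size, scale factor, and cap are made-up numbers, not from the attached algo):

```python
import numpy as np

def shares_from_indicator(indicator, base=1000, scale=0.05, cap=5000):
    """Scale the order size with the strength of the signal:
    a weak signal trades near `base` shares, a strong one up to
    `cap`. The sign of the indicator decides buy vs. sell."""
    size = base * (1 + abs(indicator) * scale)
    return int(np.sign(indicator) * min(size, cap))
```

The cap also doubles as a crude leverage limit, which this algo clearly needs.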

Any comments and suggestions are welcome!

Please note there is a typo in the code (thanks to Chris). The corrected algo is attached as a later post.
I'm also trying to use set_universe and believe that's gonna be a better approach for choosing stocks :)

[Backtest attached: Clone Algorithm]

Taibo,

Very cool - you should try using set_universe and see how this approach does without hand-picked stocks.

Thanks Taibo,

I'll have to read a bit about the perceptron algorithm. For me, the MIT EECS links above don't work. I get:

This XML file does not appear to have any style information associated with it. The document tree is shown below.  
      <Error><Code>AccessDenied</Code><Message>Request has expired</Message><RequestId>FB95668E31FC936E</RequestId><Expires>2013-03-02T05:31:40Z</Expires><HostId>adltiryJ5gFj/VC9YScNwD+hWCUP/LQWQncEnCzrzgLLB2DHCPkfa1hkJUkYq16D</HostId><ServerTime>2013-03-02T12:38:49Z</ServerTime></Error>  

Also, Thomas Wiecki alerted me to a potential problem with calling the batch transform more than once per handle_data() call (see https://www.quantopian.com/posts/olmar-implementation-fixed-bug). He states:

The way we were using the batch_transform is not valid since one can only call it once in every handle_data() call. The reason is that calling the batch_transform also appends the data frame to the window, so if you call it multiple times it will get confused.

I had been using code similar to yours, iterating over sids and associated calls to a batch transform. This, apparently, can corrupt data without an error message.

Fawce, is this correct?

I expect that your code can be re-written so that you make only one call per batch transform per call to handle_data() (which might help your execution-speed problem, too). For example, this will return a numpy ndarray of all of the sid prices with a single call:

prices = get_prices(data,context)

@batch_transform(refresh_period=R_P, window_length=W_L) # set globals R_P & W_L above  
def get_prices(datapanel, context):  
    return datapanel['price'].values  
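Once you have that single array, every sid's indicator can be computed in one pass instead of one batch-transform call per sid (a sketch; `thetas` is assumed to hold one trained vector per sid, in the same column order as the price panel):

```python
import numpy as np

def indicators_from_prices(prices, thetas):
    """Given one (window_length x n_sids) price array from a single
    batch_transform call, compute every sid's indicator at once.
    prices[:, i] is the rolling price window for sid i."""
    return np.array([np.dot(theta, prices[:, i])
                     for i, theta in enumerate(thetas)])
```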

Grant

As much as I hate to say this, there are numerous bugs in the code. The result is that you use FUTURE price values during the training phase of your perceptron, so no wonder you are getting this stellar performance.

Rule no. 1 of strategy backtesting: if it looks too good to be true, it probably is too good to be true!

Hint: look at what you are REALLY calculating for testY here:

def caltheta(datapanel, sid, num):  
    prices=datapanel['price'][sid]  
    for i in range(len(prices)):  
        if prices[i] is None:  
            return None  
    testX=[[prices[i] for i in range(j,j+4)] for j in range(0,30,5)]  
    avg=[np.average(testX[k]) for k in range(len(testX))]  
    testY=[np.sign(prices[5*i+4]-avg[i]) for i in range(len(testX))]  
    theta=perceptron(testX,testY,num)  
    return theta  

cheers

http://censix.com

Hello Censix - I agree that the result is too good to be true, but it's not because there is any data leaking in from the future. Zipline, the Quantopian backtester, reads in data one event at a time. This backtest is run in daily mode, so that's one event per day. Orders are placed at the end of the day, and are filled the following day. There is no future data available to the algorithm; it is completely insulated from future knowledge.

The reason this algo has such a huge return is that it is immensely leveraged - by the end it has borrowed $138,000,000 in cash! It also has a max drawdown of 99.85%. I agree that it's not a practical algo at this point, but the concept that Taibo has put in play here is interesting. I cloned it and I'll try to make it truer to reality later.

Isn't this buying every day? Even when the indicator is negative, the order parameter is positive on line 37.

Dan, I think we agree that nobody would use the algo as it is for trading, unless you have a strong desire to go bankrupt very fast. My reason for claiming that it uses future data is as follows; maybe you can clarify where I am wrong, if I'm wrong.

def handle_data(context, data):  
    ....  
        theta=caltheta(data,stock,5)                 # <--- theta is trained here  
        historicalPrices=get_prices(data,stock)  
        indicator=np.dot(theta,historicalPrices)

To me it looks like the predictor theta is trained on all the available 'data', not on the 30-day rolling historical prices ('historicalPrices') calculated on the next line, and this is done for every iteration, i.e. every day in the dataset.

So the backtest run here is entirely 'in-sample'. What is needed is a backtest on out-of-sample data to see how it performs, since in-sample performance only shows that you have fit a model reasonably well.
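To illustrate what I mean, here is a sketch (not Taibo's code; names are illustrative): fit the perceptron only on the earliest windows and score it on the later, unseen ones.

```python
import numpy as np

def out_of_sample_accuracy(X, y, train_frac=0.7, passes=5):
    """Train a perceptron on the earliest `train_frac` of the
    windows, then report accuracy on the later, held-out ones."""
    cut = int(len(X) * train_frac)
    theta = np.zeros(X.shape[1])
    for _ in range(passes):                  # fit on the past only
        for xi, yi in zip(X[:cut], y[:cut]):
            if yi * np.dot(theta, xi) <= 0:  # misclassified
                theta += yi * xi             # correction step
    preds = np.sign(X[cut:] @ theta)         # score on the "future"
    return np.mean(preds == y[cut:])
```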

Thanks for all the replies! Sorry about the file link. I'll try to upload it somewhere else.

As for "in-sample performance", this is a central topic of ML - how training error and generalization error are related. The very fact that ML exists shows that we generally believe a good training error is enough to predict future performance. Obviously you can't use future data to train your model, only to test it.

@Chris: the indicator predicts whether the price is greater than the historical average. This is a very rough binary linear classification model, so I'll need to see if it can be improved.

Taibo, most of my earlier comments resulted from a misunderstanding about the way Quantopian uses its 'batch_transform' within the strategy code. I recognize your strategy now for what it is: training AND predicting on rolling windows. In that sense the in-sample/out-of-sample argument does not really apply either. Well, let's hope people can improve the performance by minimizing the drawdowns. Interesting work.

if indicator<=0 and notional[i]>context.min_notional:  
    order(stock,1000)  
    log.info("1000 shares of %s sold." %stock)  

How is this order a sale? The order function takes negative values to mean a sale, am I correct?

Oops, thanks Chris for pointing that out... that explains why the return is so high and the drawdown is also so high. My bad, I forgot to add the minus sign there.
So I tweaked the algo by linking the orders to the indicator, and finally it comes back to a normal level.
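For reference, the sign convention boils down to something like this (an illustrative helper, not the actual algo code):

```python
def order_size(indicator, shares=1000):
    """Return a signed share count for order(): a positive indicator
    buys, a non-positive indicator sells. The minus sign on the sell
    branch is what was missing in the original code."""
    return shares if indicator > 0 else -shares
```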

[Corrected backtest attached: Clone Algorithm]

Doesn't this method benefit from being trained for a while before you give it a whirl? Btw, TradingAlgorithm should have a feature for this... (a grace period). It should be constantly adjusting based on the new classifications...
