Random Forest for a Universe of Stocks - Extension of "Simple Machine Learning Example"

Hi,

This is an extension of "Simple Machine Learning Example" by Gus Gordon. The code trains a random forest classifier separately for each stock in the whole universe.

Let me start by saying that this code is very inefficient and is not suitable for testing with minute data. The reason is that I fetch the historical price data (the last 500 or so days) every time handle_data is called. That is clearly wasteful: I should store the last 500 days of prices once and append only the newest value each day.

Since the set of stocks being tested changes constantly, I am thinking of creating a dictionary of deques, one per stock. Please let me know if you think there is a better way.
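A minimal sketch of the dictionary-of-deques idea, assuming one new price per stock per day. The names `update_windows` and `WINDOW` are hypothetical, and plain strings stand in for Quantopian security objects:

```python
from collections import deque

WINDOW = 511  # assumption: same length as the bar_count used in the algorithm


def update_windows(windows, todays_prices, window=WINDOW):
    """Append today's price for each stock and drop stocks that left the universe.

    windows:       dict mapping stock -> deque of recent prices (mutated in place)
    todays_prices: dict mapping stock -> today's price
    """
    # Forget stocks that are no longer in today's universe.
    for stock in list(windows):
        if stock not in todays_prices:
            del windows[stock]

    # deque(maxlen=...) discards the oldest price automatically once it is full.
    for stock, price in todays_prices.items():
        if stock not in windows:
            windows[stock] = deque(maxlen=window)
        windows[stock].append(price)

    return windows
```

With this layout a stock can only be trained on once its deque is full, so a stock that newly enters the universe would either have to wait or be seeded with one call to the history function.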
Currently, I don't understand enough about the structure of the "data" object that is passed to "handle_data(context, data)". I tried printing it; the output is shown further down. I cannot see the whole object because there is a logging limit.

I also cannot inspect the type of "data", since Quantopian prohibits the use of the "type" function. So I would really appreciate it if someone could tell me more about this object, so that I can access and manipulate each part of it.
From a strategy perspective, it appears that feeding each stock only its own historical price series is not enough to build a profitable machine learning algorithm (at least with this random forest). I will try other classification algorithms later on.

BarData({Security(14848, symbol='YHOO', security_name='YAHOO INC', exchange='NASDAQ GLOBAL MARKET', start_date=datetime.datetime(1996, 4, 12, 0, 0, tzinfo=), end_date=datetime.datetime(2014, 10, 29, 0, 0, tzinfo=), first_traded=None): SIDData({'high': 34.47, 'price': 33.875, 'volume': 17874901, 'open_price': 33.99, 'low': 33.67, 'sid': Security(14848, symbol='YHOO', security_name='YAHOO INC', exchange='NASDAQ GLOBAL MARKET', start_date=datetime.datetime(1996, 4, 12, 0, 0, tzinfo=), end_date=datetime.datetime(2014, 10, 29, 0, 0, tzinfo=), first_traded=None), 'source_id': 'DynamicTradeSource9ff47b00706a65c8c7907de35ef40e9e', 'close_price': 33.875, 'dt': datetime.datetime(2014, 5, 20, 0, 0, tzinfo=), 'type': 4}), Security(5121, symbol='MU', security_name='MICRON TECHNOLOGY INC', exchange='NASDAQ GLOBAL SELECT MARKET', start_date=datetime.datetime(1993, 1, 4, 0, 0, tzinfo=), end_date=datetime.datetime(2014, 10, 29, 0, 0, tzinfo=), first_traded=None): SIDData({'high': 27.16, 'pri...
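Judging only from the printout above, "data" behaves like a mapping from each security to a record with fields such as 'price', 'high', 'low' and 'volume'. The sketch below reads those fields from a plain dict that stands in for the real object; the string key, the dict layout and the `summarize` helper are assumptions made for illustration, not the actual Quantopian classes:

```python
# Stand-in for the printed BarData: a plain dict keyed by symbol instead of
# a Security object, holding only fields visible in the output above.
bar_data = {
    'YHOO': {
        'high': 34.47,
        'low': 33.67,
        'open_price': 33.99,
        'close_price': 33.875,
        'price': 33.875,
        'volume': 17874901,
    },
}


def summarize(bar_data):
    """Return {symbol: (price, daily range, volume)} for every security."""
    summary = {}
    for symbol, bar in bar_data.items():
        daily_range = round(bar['high'] - bar['low'], 4)
        summary[symbol] = (bar['price'], daily_range, bar['volume'])
    return summary
```

Inside handle_data the same loop would run over "data" itself. Whether the fields are read as data[stock]['price'] or data[stock].price on Quantopian should be checked against the platform documentation.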

from pytz import timezone
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from collections import deque


def initialize(context):
    # Alternative fixed universes, kept for reference:
    # context.stocks = [sid(698), sid(12691), sid(7883), sid(3136), sid(6583), sid(18738), sid(5387), sid(3735), sid(3766), sid(25555), sid(1898), sid(14848), sid(5061), sid(5692), sid(42950), sid(38989)]
    # context.stocks = [sid(698)]
    # context.other_stocks = sid(698)

    # Trade the top 2% of stocks by dollar volume.
    set_universe(universe.DollarVolumeUniverse(floor_percentile=98.0, ceiling_percentile=100.0))
    context.classifier = RandomForestClassifier()  # Use a random forest classifier
    context.prediction = 0

def handle_data(context, data):

    # bar_count is quite important: it essentially sets the number of
    # training examples built on each call.
    price_history = history(bar_count=511, frequency='1d', field='price')

    print(data)

    for stock in data:
        print(stock)

        # Skip securities with no price column or with gaps in the window.
        if stock not in price_history or price_history[stock].isnull().any():
            continue

        X, Y, Z = getting_X_and_Y(price_history[stock])

        context.classifier.fit(X, Y)  # Train the model on this stock's history
        # predict() takes a 2-D array (one row per sample) and returns an
        # array, so wrap Z in a list and take the single result.
        context.prediction = int(context.classifier.predict([Z])[0])
        # 0.00625 = 1/160: there are about 160 stocks in the universe, so each
        # stock predicted to rise gets an equal share; the others get 0.
        order_target_percent(stock, context.prediction * 0.00625)




            
def getting_X_and_Y(price_history_data):
    # x and y are the sample inputs and outputs used for training.
    # z is the most recent window, for which we want a prediction;
    # it has the same form as one row of x.
    x = []
    y = []
    price_series = list(price_history_data)

    length = len(price_series)
    # True where the price rose from one day to the next (length - 1 entries)
    changes = np.diff(price_series) > 0
    for i in range(0, length - 11):
        x.append(changes[i:i+10])  # ten consecutive up/down flags
        y.append(changes[i+10])    # the flag for the following day

    z = changes[-10:]

    return x, y, z
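To see what the windowing in getting_X_and_Y produces, here is a sketch of the same logic on a short price list, with the window shortened from 10 days to 3. The name `make_windows` is hypothetical and plain lists replace the NumPy arrays:

```python
def make_windows(prices, window=3):
    """Split a price series into (x, y, z) up/down data.

    x: rows of `window` consecutive up/down flags
    y: the flag that followed each row of x
    z: the most recent `window` flags, used for the next prediction
    """
    # True where the price rose relative to the previous day.
    changes = [b > a for a, b in zip(prices, prices[1:])]
    x, y = [], []
    for i in range(len(changes) - window):
        x.append(changes[i:i + window])
        y.append(changes[i + window])
    z = changes[-window:]
    return x, y, z


x, y, z = make_windows([10, 11, 12, 11, 12, 13, 12])
# x == [[True, True, False], [True, False, True], [False, True, True]]
# y == [True, True, False]
# z == [True, True, False]
```

Seven prices give six up/down flags, which yield three training rows. The last row's target is the most recent known move, and z is the window whose following move is still unknown.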


         