I have seen Gus Gordon's algorithm "Simple Machine Learning Example", which uses a machine learning method, Random Forest.
I have tried some other machine learning methods, including SVM and AdaBoost. It turns out that SVM performs best in the same case of a single security, Boeing. I also tried adjusting the parameters of the algo, including the number of training samples and the length of the feature window.
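For anyone who wants to reproduce the comparison offline, here is a minimal standalone sketch (outside Quantopian) of scoring the three classifiers with cross-validation. The data here is purely synthetic random binary features standing in for the real price/volume-change indicators, so only the mechanics carry over, not the accuracies; it also uses the modern `sklearn.model_selection` module rather than whatever version Quantopian ships.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn import svm
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real data: 500 samples of 20 boolean
# price/volume-change indicators, plus a binary up/down label.
rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(500, 20))
y = rng.randint(0, 2, size=500)

classifiers = {
    "RandomForest": RandomForestClassifier(n_estimators=20, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=10, random_state=0),
    "SVM": svm.SVC(),
}

# 5-fold cross-validated accuracy for each candidate model
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)
    print("%s: %.3f +/- %.3f" % (name, scores.mean(), scores.std()))
```

On real price data the ranking can come out differently, which is the point of running the comparison on the security you actually trade.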
I think using only prices as features may not be enough, so I added the volumes to the feature vector.
Inspired by the idea of ensemble learning, which enhances the performance of weak classifiers, I combined the three classifiers and gave them different voting weights. The voting weights themselves are parameters that could be learned as well, but that would be a bit more complicated.
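To make the "learnable weights" idea concrete, here is one simple way it could be done (this is my own illustration, not part of the algo below): score each model on a held-out slice of the history and use the normalized accuracies as vote weights. The data and split sizes here are assumptions for the sketch.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn import svm
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real feature/label history
rng = np.random.RandomState(1)
X = rng.randint(0, 2, size=(500, 20))
y = rng.randint(0, 2, size=500)

# Hold out part of the history to score each classifier on unseen data
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=1)

models = [RandomForestClassifier(n_estimators=20, random_state=1),
          AdaBoostClassifier(n_estimators=10, random_state=1),
          svm.SVC()]

# Each model's validation accuracy becomes its (unnormalized) vote weight
accuracies = np.array([m.fit(X_train, y_train).score(X_val, y_val)
                       for m in models])
weights = accuracies / accuracies.sum()  # Normalize so weights sum to 1

# Weighted vote on one sample: the position is the total weight
# of the models that predict "up" (a value between 0 and 1)
sample = X_val[:1]
votes = np.array([m.predict(sample)[0] for m in models])
position = float(np.dot(weights, votes))
print(weights, position)
```

This replaces the hand-picked 0.1/0.1/0.8 split with weights driven by recent out-of-sample accuracy; one could refit the weights on a rolling window as the backtest progresses.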
Fortunately, the algo performs somewhat better than before. However, in my experiments the performance depends heavily on which security I choose, which means it's not as stable as the benchmark. Maybe a portfolio of more securities would do better.
This is the first time I've posted an idea here. If anything is wrong, please just tell me. Thanks!
# Use three machine learning methods. More here:
# http://scikit-learn.org/stable/user_guide.html
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn import svm
from collections import deque
import numpy as np

def initialize(context):
    context.security = sid(698)  # Boeing
    context.window_length = 10   # Amount of prior bars to study

    context.clf1 = RandomForestClassifier(n_estimators=20)
    context.clf2 = AdaBoostClassifier(n_estimators=10)
    context.clf3 = svm.SVC()

    # deques are lists with a maximum length where old entries are shifted out
    context.recent_prices = deque(maxlen=context.window_length + 2)   # Stores recent prices
    context.recent_volumes = deque(maxlen=context.window_length + 2)  # Stores recent volumes

    context.X = deque(maxlen=500)  # Independent, or input, variables
    context.Y = deque(maxlen=500)  # Dependent, or output, variable

    context.prediction1 = 0
    context.prediction2 = 0
    context.prediction3 = 0

def handle_data(context, data):
    context.recent_prices.append(data[context.security].price)    # Update the recent prices
    context.recent_volumes.append(data[context.security].volume)  # Update the recent volumes

    if len(context.recent_prices) == context.window_length + 2:  # If there's enough recent price data
        # Make a list of 1's and 0's, 1 when the price increased from the prior bar
        price_changes = np.diff(context.recent_prices) > 0
        volume_changes = np.diff(context.recent_volumes) > 0
        feature = np.append(price_changes[:-1], volume_changes[:-1])

        context.X.append(feature)            # Add independent variables, the prior changes
        context.Y.append(price_changes[-1])  # Add dependent variable, the final change

        if len(context.Y) >= 50:  # There needs to be enough data points to make a good model
            context.clf1.fit(context.X, context.Y)  # Generate model1
            context.clf2.fit(context.X, context.Y)  # Generate model2
            context.clf3.fit(context.X, context.Y)  # Generate model3

            target_feature = np.append(price_changes[1:], volume_changes[1:])
            context.prediction1 = context.clf1.predict(target_feature)
            context.prediction2 = context.clf2.predict(target_feature)
            context.prediction3 = context.clf3.predict(target_feature)

            # Use weighted voting to get the position percentage
            position = 0
            if context.prediction1:
                position += 0.1
            if context.prediction2:
                position += 0.1
            if context.prediction3:
                position += 0.8

            log.info(position)
            order_target_percent(context.security, position)
            record(prediction1=int(context.prediction1),
                   prediction2=int(context.prediction2),
                   prediction3=int(context.prediction3))