Place of Contemplation: Machine Learning @ Quantopian

Machine Learning on Quantopian Part 3: Building an Algorithm Thomas Wiecki : Nov 9, 2017

Machine Learning on Quantopian Part 2: ML as a Factor Thomas Wiecki : Jun 23, 2017

Multiple Model Machine Learning Eric Novinson : Aug 5, 2017

Machine Learning With Multiple Random Forest Models Eric Novinson : Aug 12, 2017

Market/Security Prediction using Machine Learning Classifier and Google Trend Luc Prieur : Aug 22, 2017

Uses fetch_csv to bring the factors, computed with scikit-learn in a research notebook, into a scheduled function:

data.current(data.fetcher_assets, ['probability', 'predictiveness', 'prediction'])  

Support Vector Machine in Pipeline Luc Prieur : Sep 29, 2017

Algorithm template: uses scikit-learn SVM & GaussianNB in Pipeline to train an ML model per stock and return predictions of stock movement for long/short positions.

Trading with Sentiment Machine Learning Hefei YU : Dec 7, 2017

Applies an NTU paper using ScikitLearn LogisticRegression, RandomForestClassifier & SVCs for Sentiment Analysis and Machine Learning to predict stock price movements.

Machine Learning Stock Selection + Mean Variance Portfolio Optimization Jun Ouyang : Dec 13, 2017

Predicts using ScikitLearn RandomForestClassifier & DecisionTreeClassifier with technical indicators: RSI MACD EMA SMA & ADX
Optimises using SciPy opt for Markowitz Mean Variance Optimization to construct a one-week portfolio.
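
As a rough illustration of the optimisation step only (not the code from the linked post), a mean-variance allocation with SciPy might look like the sketch below. The function name `mean_variance_weights`, the `risk_aversion` parameter, the long-only bounds and the sample inputs are all assumptions made for this example.

```python
import numpy as np
from scipy.optimize import minimize


def mean_variance_weights(mu, cov, risk_aversion=3.0):
    """Long-only weights maximising w.mu - 0.5 * risk_aversion * w' cov w.

    Hypothetical helper for illustration; mu holds expected returns and
    cov the return covariance matrix, both estimated elsewhere.
    """
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    n = mu.size

    def neg_utility(w):
        # Negated because scipy minimises.
        return -(w @ mu - 0.5 * risk_aversion * (w @ cov @ w))

    result = minimize(
        neg_utility,
        x0=np.full(n, 1.0 / n),                 # start from equal weights
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n,                # assumption: no shorting
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
        options={"ftol": 1e-12},
    )
    if not result.success:
        raise RuntimeError(result.message)
    return result.x


# Made-up inputs: three assets with uncorrelated returns.
weights = mean_variance_weights(
    mu=[0.08, 0.05, 0.12],
    cov=np.diag([0.04, 0.01, 0.09]),
)
print(weights.round(3))
```

In practice `mu` and `cov` would come from the classifier's predictions and a trailing return window; how the linked post estimates them is not shown here.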

Neural Network Tests for Mean Reversion or Momentum Trending Seong Lee : Oct 3, 2013

Uses the Hurst exponent and Sharpe ratio as inputs, training on a sample of days before being applied to stock data.
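
For readers unfamiliar with the two inputs, here is a minimal sketch of how they could be computed. It is not taken from the linked post: the Hurst estimate uses a simple lagged-difference scaling fit (one of several estimators), the Sharpe ratio assumes a zero risk-free rate and 252 trading days, and both function names are made up for this example.

```python
import numpy as np


def hurst_exponent(series, max_lag=20):
    """Estimate H from how the std of lagged differences scales with the lag.

    Roughly: H < 0.5 suggests mean reversion, H near 0.5 a random walk,
    H > 0.5 trending behaviour.
    """
    series = np.asarray(series, dtype=float)
    lags = np.arange(2, max_lag)
    # std(x[t + lag] - x[t]) grows roughly like lag ** H.
    tau = [np.std(series[lag:] - series[:-lag]) for lag in lags]
    slope, _intercept = np.polyfit(np.log(lags), np.log(tau), 1)
    return slope


def sharpe_ratio(returns, periods_per_year=252):
    """Annualised Sharpe ratio, assuming a zero risk-free rate."""
    returns = np.asarray(returns, dtype=float)
    return np.sqrt(periods_per_year) * returns.mean() / returns.std(ddof=1)


rng = np.random.default_rng(0)
daily_returns = rng.normal(0.0005, 0.01, size=2000)   # synthetic returns
log_prices = np.cumsum(daily_returns)                  # random-walk log prices

print("Hurst:", hurst_exponent(log_prices))
print("Sharpe:", sharpe_ratio(daily_returns))
```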

Alphalens Boilerplate to test ML Factor Luca : Mar 11, 2018

  1. ML factor is run with a single predictive factor as input.
  2. A slightly predictive factor is added to the mix.
  3. Analyze the performance of ML with the predictive factor as input.

Text Mining Corporate Filings by Yin Luo (QuantCon NYC 2017) Phoebe Jordan : Mar 8, 2018

Applies web scraping, distributed cloud computing, NLP and machine learning techniques to systematically analyze corporate filings from the EDGAR database. Yin Luo's NLP-based stock selection signals have strong and consistent performance, with low turnover and slow decay, and are uncorrelated with traditional factors.

Machine Learning Part 1 : Intro to Advances in Financial Machine Learning by Dr Marcos Lopez de Prado Anthony Cavallaro : Aug 23, 2018

Machine Learning Part 2 : Data Structures for Financial Machine Learning Anthony Cavallaro : Oct 1, 2018

Machine Learning Part 3 : Labeling Data for Financial Machine Learning Anthony Cavallaro : Jan 18, 2019

Wikipedia: Reinforcement Learning


Reinforcement Learning API Raphael Meudec : Aug 1, 2018

Deep Q-Learning with Keras and Gym keon : Feb 6, 2017


Reinforcement Learning Denny Britz : May 29, 2018

Reinforcement Learning for Stock Prediction Siraj Raval from Edward Lu : Sep 7, 2017

Deep Reinforcement Learning in Trading Saeed Rahman : May 11, 2018

Course: Deep Reinforcement Learning at Berkeley

RL Toolkit:

OpenAI Gym for Reinforcement Learning


Time Series Forecasting using Statistical and Machine Learning Models Jeffrey Yau : Dec 21, 2017

Deep Reinforcement Learning - MDP, Q-Value, Q-Learning, policy gradients, Actor-Critic Stanford University : Aug 11, 2017

Learn How to Build a Model in Python to Analyze Sentiment from Twitter Data (NLP & LSTM in Keras) Max Margenot : Sep 7, 2018

Use of Deep Learning in Tactical Multi-Asset Strategies Calvin Yu : Feb 22, 2019


OpenAI Gym

DeepMind Control Suite

The 10 Reasons Most Machine Learning Funds Fail Marcos Lopez de Prado : Jan 27, 2018

Deep Reinforcement Learning in Trading Ashwini Patil, Saeed Rahman, Padmanabha Guddeti : Jul 30, 2018

Deep Learning for Global Tactical Asset Allocation Gaurav Chakravorty, Ankit Awasthi, Brandon Da Silva : Oct 19, 2018


Reinforcement Learning - Q Learning Arthur Juliani : Aug 26, 2016 | Part 0 - Part 8

Deep Reinforcement Learning in Trading Saeed Rahman : Jul 30, 2018 | LinkedIn + Quantopian strategy workflow

Multivariate Time Series Forecasting with LSTMs in Keras Jason Brownlee : Aug 14, 2017

Linear Algebra for Machine Learning Jason Brownlee : Feb 21, 2018

Linear Algebra Cheat Sheet for Machine Learning Jason Brownlee : Feb 23, 2018

How to Develop Convolutional Neural Network Models for Time Series Forecasting Jason Brownlee : Nov 12, 2018

How to Develop LSTM Models for Time Series Forecasting Jason Brownlee : Nov 14, 2018

Using Machine Learning to Predict Stock Prices Vivek Palaniappan : Oct 30, 2018


  • Reinforcement Learning: Richard S. Sutton and Andrew G. Barto.
  • Artificial Intelligence - A Modern Approach: Stuart J. Russell and Peter Norvig.
  • Deep Learning: Ian Goodfellow, Yoshua Bengio, and Aaron Courville.

Not sure this is the place to talk about this; I can take it to another thread if that's not cool.

The current example of ML in the Q pipeline is @Wiecki's above. I've tried using it quite a bit, yet have never gotten any compelling results out of it, even though it is embedded in all my contest algos. I'd like to discuss the underlying algorithm, with the goal of understanding both the fundamentals and the limitations of such a classifier, and how to move on to better solutions.

For the purposes of this discussion, I'll talk colloquially about the algorithm, in the hope that together we can gain some traction in understanding how to use AI/ML techniques in computational finance.

Let's outline how I use the ML method above, so that we can see if I'm understanding it correctly.

  1. Take a collection of time series that represent asset values.
  2. Discretize by sampling the time series, all at the same interval (e.g. daily closes).
  3. Take the resulting time-indexed mapping [Prices: (time x assets) ---> Reals] and compute a collection of features f1, f2, ..., fk from the Prices array. In our case these are alpha factors, each a function from Prices ---> Reals (like macd10, fcf, value, trendline, ...). Note: this is just a set of new columns in the Prices array.
  4. Create a Returns/LogReturns function (column) and a set of functions (columns) that hold the classification labels (e.g. +1/-1 for returns above the 70th percentile or below the 30th), shifting the data back to prevent "look-ahead bias".
  5. Now use an ML model (e.g. from scikit-learn), with a particular kernel that carves the search space into regions separating points with different classification values, to train and validate the classifiers.
  6. Finally, use the "hold-out" data set to score your results for accuracy and precision.
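
To make the steps above concrete, here is a minimal sketch on synthetic data. It is not the code from the Part 2/Part 3 notebooks: the helper names, the two trailing-return features, the one-day forward horizon and the 80/20 chronological split are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, n_assets = 60, 10
# Steps 1-2: synthetic daily closes, one random walk per asset.
prices = 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, (n_days, n_assets)), axis=0))


def trailing_return(prices, window):
    """Feature known at day t: return over the previous `window` days."""
    out = np.full(prices.shape, np.nan)
    out[window:] = prices[window:] / prices[:-window] - 1.0
    return out


def forward_return(prices, horizon):
    """Target realised after day t: return over the next `horizon` days."""
    out = np.full(prices.shape, np.nan)
    out[:-horizon] = prices[horizon:] / prices[:-horizon] - 1.0
    return out


def percentile_labels(fwd, lower=30, upper=70):
    """+1 above the upper percentile of that day's cross-section, -1 below the lower, else 0."""
    labels = np.zeros(fwd.shape, dtype=int)
    for t in range(fwd.shape[0]):
        if np.isnan(fwd[t]).any():
            continue  # forward return not known yet, leave unlabeled
        lo, hi = np.percentile(fwd[t], [lower, upper])
        labels[t, fwd[t] > hi] = 1
        labels[t, fwd[t] < lo] = -1
    return labels


# Step 3: features use only past prices.
f1 = trailing_return(prices, 5)
f2 = trailing_return(prices, 20)
# Step 4: labels come from the *next* day's return, aligned to day t.
labels = percentile_labels(forward_return(prices, 1))

# Flatten (day, asset) cells into rows, keeping only fully observed, labeled ones.
valid = ~np.isnan(f1) & ~np.isnan(f2) & (labels != 0)
days = np.repeat(np.arange(n_days)[:, None], n_assets, axis=1)
X = np.column_stack([f1[valid], f2[valid]])
y = labels[valid]
row_day = days[valid]

# Steps 5-6: split chronologically so the hold-out set is strictly later in time.
split_day = int(n_days * 0.8)
train = row_day < split_day
X_train, y_train = X[train], y[train]
X_test, y_test = X[~train], y[~train]
print(X_train.shape, X_test.shape)
```

`X_train, y_train` can then be passed to the `fit` method of any scikit-learn classifier, and `X_test, y_test` to its `score` method. Note that once the rows are flattened, the model sees each (day, asset) row independently, which is the limitation discussed next.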

The main problem I see with the above method is that it does not take any sequential information into account, so one has to use potentially large time windows and large numbers of assets and factors, all of unknown size, to fracture the search space enough to learn anything useful.

I liken it to taking the set of points [i, returns(i), f1(i), ..., fk(i), classifier(i)], where i is time, throwing them up in the air, and having them land in the data-array list totally randomly, and then learning the surfaces needed for the classification...with no regard for the fact that all the asset time series are stochastically-continuous traces that are not just random numbers, but have sequential and spatial cross relations that exhibit periodicity and continuity constraints.

AI models that take sequence into account include RNN and LSTM networks. I'm still not sure that this works better in the environment we're talking about, but the current approach, without sequential information, seems inefficient, and I can't seem to get it to be effective.
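
As a small illustration of what "taking sequence into account" means for the input data, the sketch below turns a time-ordered feature array into overlapping windows of shape (samples, timesteps, features), which is the layout recurrent layers such as the Keras LSTM expect. The helper name `make_windows` and the toy arrays are assumptions for this example; no network is trained here.

```python
import numpy as np


def make_windows(features, targets, lookback):
    """Build ordered windows so each sample carries its own recent history.

    X[i] = features[i : i + lookback]       (rows stay in time order)
    y[i] = targets[i + lookback - 1]        (label of the window's last row)
    """
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets)
    n_samples = features.shape[0] - lookback + 1
    X = np.stack([features[i:i + lookback] for i in range(n_samples)])
    y = targets[lookback - 1:]
    return X, y


# Toy data: 6 time steps, 2 features per step, one label per step.
features = np.arange(12, dtype=float).reshape(6, 2)
targets = np.array([0, 1, 0, 1, 1, 0])

X, y = make_windows(features, targets, lookback=3)
print(X.shape)  # (4, 3, 2)
print(y)
```

Shuffling the rows of `features` before windowing would change every window, whereas a row-wise classifier sees the same training set either way. That is the difference between the two approaches in a nutshell.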


Here are some links to resources that I believe can help us move forward. I haven't read all of them yet, but I'm willing to discuss them from a learning point of view.

  1. Deep Learning the Stock Market
  2. Computational Intelligence and Financial Markets: A Survey and Future Directions
  3. A deep learning framework for financial time series using stacked autoencoders and long-short term memory
  4. Deep Learning in Finance
  5. Deep Learning in Finance-Re-work Deep Learning Summit, Singapore

Thanks @Karl for your detailed response!
Your CAPM comments cement what I've seen experimentally wrt ML, and in retrospect...what did I expect to see? ....
Do you consider Morningstar fundamentals as curated data? It's not ex-post trade data though...what is an example of that (showing my non-finance background) ?

I have yet to look at RNN for finance, yet expect to do that soon.
Have been working on a cloud platform to allow long and deep analysis offline from Q, which is what I feel is needed to make any progress using ML for trading.
Thanks for the links above.
I really appreciate the openness of the Q forum and, in particular, this thread.

Machine learning and artificial intelligence are related to each other. Machine learning applications are used almost everywhere. Leading companies like Facebook use artificial intelligence for their advancement.