

8 responses

Derek: Thanks for sharing this, very interesting analysis. I love the focus on visualization as well as the comparison of multiple techniques.

I do have a suggestion. Predicting prices can lead to inflated R² values, because simply predicting the previous value will already be reasonably close. Your R² values, and what the regression methods actually predict, suggest that's happening here.
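To make the R² point concrete, here is a minimal sketch on made-up data. It assumes a simulated random walk (not the data from the notebook), so by construction there is nothing to forecast, and the "model" is just the previous value. The names `r_squared`, `prices` and `changes` are only for illustration.

```python
import random

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

random.seed(0)

# Hypothetical minute prices: a pure random walk (assumption, not real data).
prices = [100.0]
for _ in range(2000):
    prices.append(prices[-1] + random.gauss(0, 0.05))

# Naive forecast on price levels: next price = current price.
r2_prices = r_squared(prices[1:], prices[:-1])

# Same naive forecast on price changes: next change = current change.
changes = [b - a for a, b in zip(prices, prices[1:])]
r2_changes = r_squared(changes[1:], changes[:-1])

print(f"R^2 on price levels:  {r2_prices:.3f}")
print(f"R^2 on price changes: {r2_changes:.3f}")
```

The level-based R² comes out very high even though the series is unpredictable, while the same forecast scored on changes does worse than predicting the mean. That gap is the inflation described above.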
Another approach, which would let you test more classifiers, is to predict up or down movement for the rest of the day. This turns the problem into a binary one and gives a direct measure of predictability in terms of % misclassification. Moreover, it would directly tell you whether to go long or short a stock, which you could build a backtest around.
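A rough sketch of that binary setup, using toy numbers rather than real prices. The helper names (`direction_label`, `misclassification_rate`), the cutoff index and the momentum rule standing in for a classifier are all assumptions for illustration.

```python
def direction_label(day_prices, cutoff):
    """1 if the close is above the price at the cutoff index, else 0."""
    return 1 if day_prices[-1] > day_prices[cutoff] else 0

def misclassification_rate(y_true, y_pred):
    """Fraction of days where the predicted direction was wrong."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)

# Hypothetical toy data: three days, cutoff at the second observation.
days = [
    [10.0, 10.2, 10.1, 10.5],  # closes above the cutoff price -> label 1
    [10.0, 9.9, 9.8, 9.7],     # closes below the cutoff price -> label 0
    [10.0, 10.1, 10.3, 10.0],  # closes below the cutoff price -> label 0
]
y_true = [direction_label(d, cutoff=1) for d in days]

# Placeholder for a real classifier: predict "up" if the price rose before the cutoff.
y_pred = [1 if d[1] > d[0] else 0 for d in days]

# Predicted direction maps straight to a position: 1 -> long (+1), 0 -> short (-1).
positions = [1 if p == 1 else -1 for p in y_pred]

error = misclassification_rate(y_true, y_pred)
print(y_true, y_pred, positions, round(error, 3))
```

Any classifier that outputs 0/1 labels can be dropped in where the momentum rule is, and the error rate stays directly comparable across them.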



Derek: That's great, glad you did that analysis. Certainly shows how difficult it is to make these types of forecasts. Then again, you only have to be right 52% of the time to make money (or make more money when you're right).

There are a couple of extensions that come to mind. Trying it on different stocks and adding other features such as volume are fairly obvious ones. Another idea is to use the predicted probability (some sklearn classifiers have .predict_proba(), which gives a probability instead of a 0/1 label; see here for example). Then you might only want to trade if the probability is very high or very low (assuming the classifier is indeed right more often when it's certain).
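Something like the following could turn those probabilities into trade signals. It is only a sketch: `confident_signals` and the 0.1/0.9 thresholds are made up, and the probabilities are hard-coded so the snippet runs on its own. With sklearn they would typically come from `clf.predict_proba(X)[:, 1]` on a fitted binary classifier with classes 0 and 1.

```python
def confident_signals(proba_up, lower=0.4, upper=0.6):
    """Map P(up) to +1 (long), -1 (short) or 0 (no trade) using two thresholds."""
    signals = []
    for p in proba_up:
        if p >= upper:
            signals.append(1)    # confident "up": go long
        elif p <= lower:
            signals.append(-1)   # confident "down": go short
        else:
            signals.append(0)    # not confident enough: stay out
    return signals

# Hypothetical P(up) values, standing in for clf.predict_proba(X)[:, 1].
proba_up = [0.55, 0.91, 0.08, 0.47, 0.62]
signals = confident_signals(proba_up, lower=0.1, upper=0.9)
print(signals)
```

Whether the confident predictions really are more accurate is something to check on held-out data, for example by comparing the hit rate of the traded days against all days.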


Hi Derek, I'm relatively new here on Quantopian. Can you discuss how these models would help us with forecasting in the near term, or a year ahead? Thank you. I'm very interested in whether it could help my trading.

Derek: why not use 80% of the day for training? The prediction should then be better for the remaining 20%.

One more thing: minute prices are much more random than daily prices, so finding dependencies seems difficult.
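For reference, the 80/20 idea could look roughly like this. The function name `chronological_split` is made up, and the 390-bar day assumes a regular US session of one-minute bars (9:30 to 16:00). The important part is that the split is by time and not shuffled, so the test bars always come after the training bars.

```python
def chronological_split(minute_bars, train_fraction=0.8):
    """Split one day's bars in time order: the first part trains, the rest tests."""
    cut = int(len(minute_bars) * train_fraction)
    return minute_bars[:cut], minute_bars[cut:]

# Stand-in for one day of minute bars (assumed 390 bars); the values are just indices.
bars = list(range(390))
train, test = chronological_split(bars)
print(len(train), len(test))
```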

Derek: Just came across this list which is relevant to your first exploration. Instead of R², one of these measures would be better to evaluate the forecast.

Mikhail: Good point. I think instead it could be interesting to use weekdays Mon-Thurs to predict closing on Fri.

Derek: This reminds me of the Kaggle Winton Stock Market Challenge, and maybe you developed this notebook with that competition in mind. Judging by the leaderboard, it doesn't look like many got above the benchmark of predicting nothing, and those who did didn't get very far from it. But I'm sure there is something to learn from that community in the Forum section. Their choice of metric was MAE.
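As a small illustration of scoring with MAE against a do-nothing benchmark: the sketch below reads "predicting nothing" as forecasting a zero return every time, which is my assumption, and all the numbers are invented.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical returns and model forecasts (made-up numbers).
actual = [0.004, -0.002, 0.001, -0.003]
model = [0.001, 0.001, 0.000, -0.001]

# Benchmark: always forecast a zero return.
zeros = [0.0] * len(actual)

mae_model = mean_absolute_error(actual, model)
mae_zero = mean_absolute_error(actual, zeros)
print(f"model MAE: {mae_model:.5f}, zero-forecast MAE: {mae_zero:.5f}")
```

A model is only interesting if its MAE is clearly below the zero-forecast MAE on data it has not seen.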

Hi Derek,

A very interesting attempt, thanks for sharing. Maybe one could consider adding some longer-term context to the data, e.g. some features from the daily chart, like the position of the day relative to some moving averages, previous day bars, etc. The idea is that an intraday pattern may unfold differently depending on whether the day's session happens to be far from the long-term mean, follows some daily pattern formation, etc.
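One possible version of such a daily-context feature, as a sketch only. The name `distance_from_sma`, the 5-day window and the closing prices are all made up for illustration.

```python
def distance_from_sma(daily_closes, window):
    """Relative distance of the latest close from its simple moving average."""
    recent = daily_closes[-window:]
    sma = sum(recent) / len(recent)
    return daily_closes[-1] / sma - 1.0

# Hypothetical daily closes up to and including the previous day.
# To avoid lookahead, the day being predicted must not be in this list.
closes = [100.0, 102.0, 101.0, 105.0, 110.0]
feature = distance_from_sma(closes, window=5)
print(round(feature, 4))
```

The value is the same for every minute of the day being predicted, so it would be attached as an extra column next to the intraday features.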