Portfolio Structure and Overfitting

Recently, we’ve updated you on our progress allocating capital to community-authored investment strategies. We have learned quite a bit about what Quantopian needs to do to continue growing, as well as what community members need to do to be successful quants. We are dedicated to spreading opportunity and giving everyone the tools and guidance to create the next $50M algo. In this post, I’d like to share two things you can work on in your own algorithms: portfolio construction and how to avoid overfitting.

First, some background
Quantopian is a crowd-sourced asset manager. We partner with community members like you to find and exploit market inefficiencies (alpha). We provide all the data, technology, and training for you to research and test your ideas. You do the creative work and keep ownership of whatever you create here. You give us permission to run simulations and evaluate your work. When we find algorithms that we think produce alpha, we offer a licensing/royalty agreement. If you accept our offer, we’ll use your algorithm to direct capital (the largest to date is a single algorithm that has been allocated $50M), and pay you a percentage of the returns.

With over 200,000 members, the Quantopian community is the largest quantitative research group in the world. At a traditional fund, the primary research challenge is sourcing enough strategies. At Quantopian, we have the complement of that problem: filtering through a huge number of ideas to find the best ones. Quantopian members have been contributing investment algorithms for years, and we are now the custodian of the world’s largest database of investment algorithms.

Our Learnings
As a crowd-sourced asset manager, our job at Quantopian is to carefully evaluate the performance of each algorithm in our database, while maintaining the privacy of your intellectual property. What did we learn from evaluating such a large number of investment algorithms? We found we need to teach and then guide the community in two areas:

• Portfolio Construction
• Avoiding Overfitting

Portfolio Construction
Evaluating a large set of strategies in our database, we learned that the Quantopian Community needed more guidance on how to construct and maintain a market-neutral portfolio. The most common faults we found were structural (e.g. trading illiquid stocks, high position concentration). Our new contest rules grew out of this work. Now you can use the contest rules as a clear guideline for creating a structurally sound algorithm.

Turns out, these structural properties are actually easy to check on in-sample data because they generally don’t change out-of-sample. In practice, this means we can automatically check these criteria as soon as you’re done coding. In fact, we do - the full backtest screen now reports whether your algorithm’s backtest satisfies all the criteria.

Overfitting
Of course, portfolio construction is just the beginning. Creating a structurally sound algorithm positions you to tackle the next challenge, which is quite a bit deeper: building an algorithm that performs well on new data (out-of-sample performance). If you’ve heard any talks by our Managing Director of Portfolio Management and Research, Jess Stauth, you know that a major pitfall for out-of-sample performance is overfitting. Overfitting means your model is brittle, and that it will fail when it encounters new data.

Competing in the contest is a great way to test your model for overfitting. Our scoring function emphasizes consistent performance over time, and is calculated on a daily basis against new data.

Overfitting can happen in myriad ways, and as you explore potential strategies, you need to keep asking yourself if overfitting could be creeping in.

What does overfitting look like?

• Bad data hygiene -- repeated backtests over your entire available data history without reserving any data for out of sample testing.
• Excessive precision without accuracy -- parameters tuned to 5 decimal places but tested on just a few thousand data points.
• Similarly, enormous parameter space -- your model is tuned with 100s of parameters, but you have only 10,000 data points to test.
• Rare event exploitation -- your regime detection model triggers once in a ten year simulation, perfectly timing the reversal of your two ranking factors.
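
One way to guard against the first item is to carve off the most recent slice of history before running any backtest. The following is a minimal sketch, not a Quantopian feature: the function name `split_holdout` and the 20% default are assumptions chosen for illustration.

```python
from datetime import date, timedelta


def split_holdout(start: date, end: date, holdout_fraction: float = 0.2):
    """Split [start, end] into an in-sample range and a reserved holdout range.

    The most recent `holdout_fraction` of calendar days is reserved and
    should stay untouched until the model is considered final.
    Hypothetical helper for illustration only.
    """
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be between 0 and 1")
    total_days = (end - start).days
    if total_days < 2:
        raise ValueError("date range is too short to split")
    holdout_days = max(1, round(total_days * holdout_fraction))
    # The holdout ends on `end` and spans `holdout_days` days inclusive.
    hold_start = end - timedelta(days=holdout_days - 1)
    in_sample_end = hold_start - timedelta(days=1)
    return (start, in_sample_end), (hold_start, end)


in_sample, holdout = split_holdout(date(2008, 1, 1), date(2017, 12, 31))
print("develop on:", in_sample)
print("validate once on:", holdout)
```

Backtests during development would then use only the first range, with the second range consulted once at the end.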

Overfitting isn’t specific to investment models. Check out this story of a Kaggler Dropping 50 Spots in 1 Minute. The Public Leader Board in Kaggle competitions is very similar to backtest results on Quantopian. Not only are the results not necessarily indicative of future performance, but the more effort you put into optimizing the results in-sample, the more likely you are to be overfit and due for a catastrophe when your model is released into the wild.

Our goals are totally aligned with yours here. When we talk about overfitting, it’s because it’s the single biggest problem faced by our community members. For a more in-depth look, the Quantopian Lecture series has an entire lecture devoted to overfitting in quantitative finance. We hope you’ll come away with a greater understanding of what it is and how you can avoid it, but if you have questions we’d love to hear them on the Quantopian forums.

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

29 responses

@fawce,

Thanks for the clarity and guidance!

Congratulations Fawce. What you and your team at Q have done is great and is getting better & better all the time! Thank you.

One of the things you mentioned when I first started becoming interested in Quantopian was that you are seeking to encourage as wide as possible diversity of algos. People keep asking about styles other than EquityLongShort, but I know this all takes time. However even within the current EquityLS space, and without giving away any confidential info, could you find ways to guide us with regard to:

• the diversity of EquityLS algos that you have accumulated and, more importantly from an algo developer's viewpoint

• whether a good algo is likely to be deemed "would be good enough... if Q didn't have any like it already", but is now unlikely to gain an allocation because it is too close (by whatever metrics are appropriate) to existing algos already in Q's portfolio?

Cheers, best regards, TonyM.

Hi Fawce -

I'll try to provide my usual list of feedback when I get the chance, but one thing to noodle on is the Strategic Intent requirement on Get Funded. I discuss it at length here. It may be a double-edged sword, in the context of your over-fitting problem (it is more of a "your" than an "our" problem, since there is just opportunity cost to the Community, whereas you are trying to run a business). The problem is that there's a certain amount of out-of-sample data required to achieve a required level of statistical confidence that returns and their variability (combined into a Sharpe ratio, or whatever metric you chose) will persist. The risk is that you may be using the Strategic Intent description from the quant to fudge the time scale required for statistical confidence. If you also bring into play the identity of the quant, and potentially other meta-data, then you could end up short-circuiting the statistics even further, and making the wrong decision. It seems that there might be more downside than upside to having a "story" about the strategy versus just waiting the required period of time to get to whatever statistical confidence level you (and perhaps your customer(s)) require.

Of course, the other side of the sword is that you can apply a kind of "sniff test" to the algo, based on the Strategic Intent statement, and discard the really smelly ones. If the algo exhaust just doesn't jibe with the described intent, then maybe more percolation time is required. Although, if you've hit the 95% confidence level out-of-sample, it would still seem to be unnecessary, since the over-fitting risk would have been mitigated.

Can this be automated? Is it possible to extend the platform to support a mode where a user can never see some data (say, the last 2 years) in research or backtest? Perhaps it could be made optional via account settings or at sign-up.

As a quant I only need a yes or no answer on whether a model backtest is overfit (if Quantopian can somehow run a backtest on hidden data in the background and just update the backtest with a yes/no overfit).

I'd be happy if the Quantopian platform helped me enforce a basic holdout policy: say, a rolling window covering the last N months of data (a blackout period in research and backtest) that stays in force for the next M days/weeks/months, which the user sets based on their own estimate of model development time. This would prevent human error in self-enforcing holdouts, considering that the backtest end date always seems to default to the current day.

Hi @Grant,
With the title of Fawce's original post "Portfolio Structure & Overfitting", perhaps you might regard my response to it as being off-topic. Certainly I was not attempting to address the issue of overfitting at all, but I was attempting to take a perspective on the issue of the composition of Q's total portfolio of algos. I'm not quite sure whether your post is a reply just to Fawce or are you at least partly addressing the issue raised in my post as well, which is basically that of "similarity of algos"?

Personally I had not even considered author's statements of "strategic intent" at all. Even assuming all authors are genuinely trying to express what they INTEND to do or think they are doing, there may potentially be at least some degree of discrepancy for all sorts of reasons including (foreign) language skills etc, between a statement of intent / "story" regarding an algo and the reality of what the algo is actually doing. I agree with you that any such written statement is likely to be a double-edged sword. It may or may not have utility at all. While it is reasonable for Q to enquire as to the author's "intent", for practical purposes the only real ways of determining similarities & differences between algos (at least as I see it) are in their relative performance statistics. I think we are agreeing on this.

Regarding @Leo's question "can this be automated", I'm not quite sure what the "this" refers to. If you (Leo) are talking about backtesting / avoiding the problems of over-fitting, then please feel free to just ignore the following comment. However, if you are talking about the portfolio topic of similarity of algos, then I would think this should indeed be very easily automatable. It's basically just a case of measuring the similarity of, or "distance between", members of a set (in this case the set of algos) given their respective performances over time. All the usual tools, like correlations from standard stats, mutual entropy from information theory, distance between nearest neighbors in clustering, and so on, could be used & automated to give a set of completely objective metrics for "similarity" of algos. Automating that would 1) help Q to ensure "not too many algos that are too similar" in an objective way that is completely independent of any statements of "intent" that may or may not correspond to reality, and 2) provide algo authors with feedback on "closeness of nearest neighbor" algos, which would be very valuable to them in their algo design so as not to waste time working on algos very similar to what Q may already have an abundance of.

Cheers, best regards, TonyM.

@Tony, in my prior post, in the phrase "can this be automated", the word "this" refers to a specific overfitting guard, namely holdout data. It appeared to me that this can be supported by extending the platform with a holdout mode (perhaps adding roles to users and assigning permissions to specific roles). Not only would it help the user develop best practices, but it could also give Quantopian a data point: which data was held out when a specific backtest was run.

@Leo, not a good idea. Even if it might be. For one, I want the flexibility to do what I want without preset constraints other than those that I decide to impose on my trading strategies. You want a hold-out period. Do it in your trading strategies. Do not force it on mine...

@Guy, I wrote in my post "Perhaps making it optional via account settings". Maybe you missed that part?

One caution for the Q team is that you need to more carefully scrutinize certain vendor-supplied data sets for over-fitting. There was considerable evidence put forth by the community that the Alpha Vertex data set had been over-fit (e.g. see https://www.quantopian.com/posts/alpha-vertex-precog-test). It is interesting that the data set is still supported; perhaps the Q team could check it, to see if it conclusively exhibits evidence of over-fitting and make a decision if it should be still offered? We probably have sufficient out-of-sample data at this point to make a statistically firm assessment. Or alternatively, maybe the historical portion of the data should be dropped, if it is deemed biased? For the Factset data introduction, it would be great if the Q team could pre-screen data sets, and not allow the ones that have been over-fit (or maybe Factset does this as part of their service?).

Hi @Leo, I tend to agree with Guy's viewpoint, mainly because there are many ways to address the general problem of over-fitting. Almost everyone has their own different approach to it, including the notion of what actually constitutes "best practice". Even if do-able, my guess is that the level of (non-)acceptance by users would probably be outweighed by the effort involved in creating an automated scheme to satisfy everyone's diverse ideas. I do acknowledge the points you make, but personally I think Q does a good job in guiding but NOT explicitly prescribing to users how to design, build or test their algos.

@Tony, why should it be of concern to anyone if it is an optional account setting that a user needs to explicitly turn on? If one doesn't want the platform to hold out data, no action is needed: continue as before. If nothing changes for a user with regard to their use of the Quantopian platform, I don't see where the concern is. I think platform-supported holdout data is itself a good optional feature. Whether Quantopian wants to find ways to help users in that regard is their decision and choice.

@Leo, why even consider it an “option” when you already have the ability to do it with any setting you want?

@Guy, personally I find it useful to have access restrictions strictly enforced so that some policy that a user wants doesn't get violated accidentally due to human error, and also I think it can help Quantopian as well by regressing overfit observations against this account setting.

I don't want to add any more to this thread and won't be responding further; this will be my last post here.

Thanks for the feedback everyone!

@Tony - Overfitting is by far the more pressing challenge for the community. Still, we owe you more feedback on the correlation between strategies. We look at the correlation between strategies when we evaluate them, first to eliminate duplicates (and near duplicates), and then again in weight assignment for the algorithms chosen for the live portfolio. I can't give you a description of the feedback method we will use or a delivery expectation yet, but we are actively talking about it. Higher on the priority list is delivering more markets and more data via our FactSet partnership. This will allow the community to fan out even more.

@Leo - We like your idea. Riffing on it, we came up with applying a holdout to the data by default and allowing the user to optionally override when they want to check out-of-sample performance.

@fawce, I did go through all lectures mid 2017 when I joined Quantopian but the information overload was so much that it didn't occur to me from the overfitting lecture that it was the most pressing concern on your end. I would have certainly gotten the message if it was made a little more prominent in Quantopian and a warning dialog thrown if user is operating in a mode where holdout data is being used for anything other than OOS validation.

If some form of data with-holding is provided by Q as an optional feature then @Leo is of course correct that there is no real reason for anyone to object to something that they are not obliged to use if they don't want to. My hesitancy was really only related to how much effort I thought might be involved in implementing this and also, for me personally at least, overfitting is much less of an issue/concern than nearness or similarity between strategies. However if Q can address BOTH of these issues, then great.

@Fawce, yes, I'm not surprised that, from an overall community perspective, overfitting would be a bigger problem than similarity between strategies. No doubt you already have many people commenting on the former, so I will confine my comments to the latter issue. I think correlation is not the only useful measure of "similarity", but it will certainly go a long way towards helping, and so it would be a good place to start.

Presumably all strategies/algos could be considered in a pairwise manner, Q could calculate the correlation coefficients for the key metrics, e.g. at least the set {TotalRtn, SpecificRtn, Sharpe, Drawdown, as well as other factors as per the Q Risk model} and then from this generate some form of weighted inverse distance measure, and then report that result in terms of something like "distance to nearest neighbor algo" and also "average distance to (e.g.) 5 or 10 closest neighbors". Then, from a portfolio perspective, a "good" algo would be one that:
a) satisfies all of Q's general constraints,
b) performs well according to Q's chosen performance metrics, and also
c) has a "sufficiently" large distance (measured in inverse correlation distance or whatever other appropriate metrics/units) between itself and its nearest neighbor(s).
I'm happy to continue to provide comments / feedback on this issue if you want.
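
The "distance to nearest neighbor" idea above can be sketched in a few lines. This is only an illustration under simplifying assumptions: it correlates daily return series rather than the fuller set of metrics listed, it uses `1 - correlation` as the distance, and the names `pearson` and `nearest_neighbor` are hypothetical, not anything Q has described using.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length return series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def nearest_neighbor(candidate, existing):
    """Return (name, distance) of the existing algo closest to `candidate`.

    Distance is 1 - correlation: 0 means identical behaviour,
    1 means uncorrelated, 2 means perfectly opposite.
    """
    distances = {name: 1.0 - pearson(candidate, series)
                 for name, series in existing.items()}
    closest = min(distances, key=distances.get)
    return closest, distances[closest]


# Made-up daily returns, purely for demonstration.
candidate = [0.010, -0.020, 0.015, 0.000, -0.005]
existing = {
    "algo_a": [0.020, -0.040, 0.030, 0.000, -0.010],   # scaled copy
    "algo_b": [-0.010, 0.020, -0.015, 0.000, 0.005],   # mirror image
}
print(nearest_neighbor(candidate, existing))
```

A candidate whose nearest-neighbor distance is close to zero would be a near duplicate of something already in the set.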

Cheers, best regards, TonyM.

I have a copy of Rob Carver's book, Systematic Trading. Perhaps there is a copy kicking around Q headquarters. Chapter Three. Fitting. See TABLE 5: IT TAKES DECADES OF DATA TO SEE IF MOST STRATEGIES ARE LIKELY TO BE PROFITABLE. For a true SR of 1.0, Average years to pass profit T-Test is 6 years, and for a true SR of 2.0, it is 1.4 years.

I'm wondering if the Q six-month rule-of-thumb is just not long enough to know one way or another the true SRs of algos? Maybe you are mistakenly calling the ones that don't hold up "over-fit" when there is just natural variation in their SRs?
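
A back-of-envelope version of this question can be worked out with the textbook approximation that the t-statistic of an annualized Sharpe ratio is roughly SR × sqrt(years), assuming independent, normally distributed returns. This is not Carver's calculation and it gives smaller numbers than the table quoted above; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def years_to_significance(true_sharpe, confidence=0.95):
    """Years of data needed for the expected t-stat to reach the threshold.

    Uses t ~= SR * sqrt(years) with a two-sided normal critical value.
    Rough approximation only; assumes i.i.d. normal returns.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return (z / true_sharpe) ** 2


def min_detectable_sharpe(years, confidence=0.95):
    """Smallest annualized Sharpe distinguishable from zero in `years`."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z / math.sqrt(years)


for sr in (0.5, 1.0, 2.0):
    print(f"SR {sr}: about {years_to_significance(sr):.1f} years")
print(f"6 months: SR of about {min_detectable_sharpe(0.5):.2f} or more")
```

Under these assumptions, six months of out-of-sample data can only separate a Sharpe ratio well above 2 from zero, which is consistent with the concern that natural variation may be mistaken for over-fitting.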

The other thing I recall from the book is that a long-term SR of 0.5-1.0 is about all one might expect. So, maybe you should reject backtests that are "too good to be true" as the saying goes?

Overall, what data and analysis support the statement "Overfitting is by far the more pressing challenge for the community." Did you repeat something like what Thomas W. and Co. did with https://www.quantopian.com/posts/q-paper-all-that-glitters-is-not-gold-comparing-backtest-and-out-of-sample-performance-on-a-large-cohort-of-trading-algorithms? Maybe when the contest 6-month point is up, you'll publish a study to quantify the degree to which over-fitting is at play in the algos?

It will help if overfitting is defined in a scientific/mathematical way.

I think what we probably want is to define an overfit in terms of some type of logic like this:

SPECIALCASE[i]

if X then do Y

where X is defined as F[e[1],e[2],...e[n]] where e[j] is some calculated event that a user arrives at using some type of market behavior.

Then it boils down to how large i and j each are and what is the probability of market throwing an event in the future that is not covered by i,j

More special casing => higher i,j => likely higher in sample sharpe and shorter the time it takes market to present a new situation in out of sample not previously seen in sample.

"The other thing I recall from the book is that a long-term SR of 0.5-1.0 is about all one might expect. So, maybe you should reject backtests that are "too good to be true" as the saying goes?"

@Grant, in my opinion one cannot categorically reject backtests because they are "too good to be true". One can assign a high probability that the algorithm is highly overfit, and a smaller probability that someone really came along and was able to write such an algorithm. A persistently higher Sharpe possibly shortens the OOS period required, as it is unexpected that a highly overfit algorithm would keep performing OOS. But if the Sharpe varies quite a bit, then a longer OOS period will be needed. The Bayesian cone needs to account for that.

It will help if overfitting is defined in a scientific/mathematical way.

Well, we have the Bayesian cone thingy (see for example the algo tearsheet I posted here). The out-of-sample returns stay in the inner cone, so I guess it was "not over-fit." And I suppose if the returns had veered off into the next zone (higher or lower), we'd say "maybe over-fit" and for the next zone (higher or lower), "probably over-fit" and then not in a cone at all "over-fit." I'm thinking to be rigorous, Q should reject algos that go up out of the cone and perform consistently better out-of-sample, and have a backtest that does not support the out-of-sample trend? Not the classic over-fitting problem, but still indicative of lack of model fit out-of-sample.
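
The zone idea can be illustrated with a much simpler, non-Bayesian cone. The sketch below assumes additive daily returns with a mean and standard deviation estimated in-sample, and band widths of 1, 1.5 and 2 standard deviations; those widths and the function name `cone_zone` are assumptions, and this is not the tearsheet's actual computation.

```python
import math


def cone_zone(cum_return, days_oos, daily_mean, daily_std,
              bands=(1.0, 1.5, 2.0)):
    """Classify out-of-sample cumulative return against a simple cone.

    Returns (zone, z) where zone 0 is the inner cone and len(bands)
    means outside every band. z is signed: positive means better than
    the in-sample trend, negative means worse.
    """
    expected = daily_mean * days_oos
    spread = daily_std * math.sqrt(days_oos)
    z = (cum_return - expected) / spread
    for zone, band in enumerate(bands):
        if abs(z) <= band:
            return zone, z
    return len(bands), z


labels = ["not over-fit", "maybe over-fit", "probably over-fit", "over-fit"]
zone, z = cone_zone(cum_return=-0.10, days_oos=100,
                    daily_mean=0.0005, daily_std=0.005)
print(labels[zone], round(z, 2))
```

Because `z` keeps its sign, the same check flags algos that leave the cone on the upside as well as the downside.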

One can assign a high probability that it is highly overfit or there is also a smaller probability that someone really came along and was able to write such an algorithm.

On Q, as a function of SR, I'd bet that at some break point, the probability of over-fitting monotonically and quickly approaches 1. For example, say a backtest that meets all other criteria for the contest/fund has a 10-year consistent SR = 3.5. The next ten years are not likely to show a SR = 3.5.

Having a holdout period to check against overfitting can be effective only if it is not possible for the Users to test against that holdout period. Otherwise what could happen is that we develop an idea and it works fine in the in-sample backtesting period. So then we test against the holdout period and find that it does not work so well. So what do we do? We modify the idea a bit and test it again over the in-sample data to see if it still works. Then to validate it, we check it over the holdout period again. Effectively, the holdout period becomes in-sample since in a 2 step process, we are indirectly fitting over it! So, it can work only if the holdout data is not available to the Users at all. Which may be too restrictive.
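
The indirect-fitting effect described here can be reproduced with pure noise. In the sketch below every strategy variant has zero true edge; the "developer" keeps whichever variant scored best on the holdout, and that variant is then measured on fresh data. All parameter values and the function name are assumptions chosen for illustration.

```python
import random


def holdout_selection_bias(n_variants=50, n_days=120, n_trials=20, seed=0):
    """Average holdout score of the chosen variant vs. its fresh-data score.

    Each variant's daily returns are drawn from a zero-mean normal
    distribution, so any apparent edge on the holdout is selection luck.
    """
    rng = random.Random(seed)

    def mean_noise_return():
        return sum(rng.gauss(0.0, 0.01) for _ in range(n_days)) / n_days

    chosen_scores, fresh_scores = [], []
    for _ in range(n_trials):
        # Repeatedly checking variants against the same holdout period.
        holdout_scores = [mean_noise_return() for _ in range(n_variants)]
        chosen_scores.append(max(holdout_scores))
        # The chosen variant has no real edge, so new data is just noise.
        fresh_scores.append(mean_noise_return())
    return (sum(chosen_scores) / n_trials, sum(fresh_scores) / n_trials)


chosen, fresh = holdout_selection_bias()
print(f"mean daily return on holdout (after selection): {chosen:.5f}")
print(f"mean daily return on fresh data:                {fresh:.5f}")
```

The selected variant looks profitable on the holdout purely because it was picked there, which is why a holdout that can be consulted repeatedly stops being out-of-sample.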

Another question to ponder is how much holdout data we need given the confidence level of the strategy. Assume we are targeting a strategy with a Sharpe of at least 0.7 to 1.0, where the recommended in-sample backtesting period is between 6 and 10 years to pass the profit T-test. Let's say we test over an in-sample period of 8 years. Now even with a 95% confidence strategy, it will still make a loss about once in every 10 years. If the strategy was consistently positive in the 8 in-sample years, and we have 2 years of holdout data where in one of the years it made a loss, which may be natural, do we conclude that the strategy is overfit?

The lecture on Overfitting (https://www.quantopian.com/lectures/the-dangers-of-overfitting) is great and offers some excellent suggestions to tackle overfitting at the strategy design and implementation level itself. Here are my comments on some of the suggestions mentioned in this lecture:

1. Choosing simple models: Every time we try to complicate rules, we are probably trying to handle too many specific cases in the data patterns and hence probably overfitting.
2. Having fewer parameters: Is there really any need for having more than 1 parameter per alpha factor? For technical factors, the parameter could be the rolling window length for the look back, for fundamental factors it could be the frequency of the earnings results you look at: either quarterly or annual.
3. Taking larger samples: We can take larger samples across both time and space so that we have a greater number of data points. Across time, by having a longer in-sample backtesting period, and across space, by trading at least 100 stocks on average at any given time, as recommended by Quantopian. Trying to overfit over this large sample size given just a few parameters is extremely difficult. But of course, requiring longer in-sample backtesting periods could increase the load on Quantopian's infrastructure if Users are not already doing it by themselves. Not sure how feasible that would be.

@Rainbow Parrot's comments are good. No matter what holdout is initially used, people will invariably iterate back & forth to try to get "better" results. Unseen holdout data that is subsequently seen (even if only once) is no longer "unseen" data at all. It is easy to deceive oneself by creating the illusion of OOS whereas the reality has become IS.

With regard to variable parameters, generally any more than 1 is probably too many, and the best case is none. And yes it is actually possible to do it at least in some cases with a bit of careful re-design.

Following @Leo M's comment about defining over-fitting in a scientific / mathematical way, the best way (that I know of) regarding how to do this is in terms of specification of the number of degrees of freedom of the system and the data. The first good published work that I know of which includes this topic is Bob Pardo's old but excellent book: "The Evaluation & Optimization of Trading Strategies" (Robert Pardo, Wiley, 2008). Bob's rule of thumb is to ensure that the number of remaining (unused) degrees of freedom exceeds 90% of the total number after considering all the rules & conditions imposed by the trading system. This also helps with determining the minimum data requirement for any given complexity of trading system.

Over time, I have tried to move completely away from anything that looks like "data mining", which may give great looking results over any given sample period, and may even work for a while going forward, but can never give us any assurance that it will actually continue to work longer term. Conversely systems based on known fundamental effects will continue to work in future no matter how many other people "discover" the system.

Cheers, all the best, TonyM.

There has been a lot of good discussion here on some guidelines and ideas about portfolio structure and overfitting. This has got me thinking about the structure of the Q Optimize API as it relates to overfitting. In ML/AI algorithms for financial predictions, the objective function is in most cases some future variable that needs to be maximized or minimized by some optimization scheme. Under a 60% training / 20% testing / 20% validation breakdown of a sequential time series, with the validation OOS data being the latest available, the weights are normally locked in after the training and testing routine of your chosen ML algo (Neural Nets, Random Forests, SVM, etc.). The locked weights are then run against the OOS data to determine whether the model generalizes well to data it has not seen before or whether it is overfitted (generalization is bad).

In the context of the Q Optimize API, the frequency of optimization depends on what you state in the schedule function (daily, weekly, etc.). The optimization concept under the Optimize API, if I understand it correctly, is structured to fit the design of long/short equity market-neutral strategies against all the constraints Q specifies. In short, given your inputs (alpha factors), the output is some kind of weight distribution attributable to these factors that produces the resulting cumulative returns of the strategy. I believe Blue Seahawk made a study showing that the MaximizeAlpha construct predominantly optimizes position concentration. I guess the TargetWeight construct optimizes the weights assigned to each selected stock. So if you set the schedule function to rebalance daily, optimization is done and weights are updated every day, even for your OOS validation. Unlike ML/AI algorithms, where weights obtained from training and testing are locked in to validate OOS data, the Q Optimize API does not lock in these weights; on the contrary, it just keeps updating them. I wonder if the Q design of the Optimize API is susceptible to natural overfitting because of the frequency of optimization? Is it even possible to have a meaningful assessment of OOS or holdout data if it is being optimized too? What do you guys think? I'd like to get some feedback or comments. Thanks.

@ James -

My first thought is that the Q Optimize API does not contribute to the over-fitting problem highlighted by Fawce. It has no way to look-ahead, effectively, to boost performance in-sample.

The Q Optimize API may be contributing to unnecessary turnover and volatility, however. The output should really be smoothed over N days if it is run every day (assuming that this doesn't kill the alpha). I've played around with trying to smooth it, but never got to the point where I was happy with the solution. I should probably revisit it at some point, and post an example.
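
One simple form such smoothing could take is an equal-weighted average of the last N daily weight vectors. The sketch below is platform-independent and does not call the Optimize API; the class name `WeightSmoother` and the plain-dict weight format are assumptions, and it is not the approach the poster tried.

```python
from collections import deque


class WeightSmoother:
    """Average the most recent `window` target-weight dictionaries."""

    def __init__(self, window):
        # Old entries fall off automatically once the window is full.
        self.history = deque(maxlen=window)

    def update(self, target_weights):
        """Record today's weights and return the smoothed weights."""
        self.history.append(dict(target_weights))
        assets = set().union(*self.history)
        n = len(self.history)
        # An asset missing on a given day counts as a zero weight.
        return {asset: sum(day.get(asset, 0.0) for day in self.history) / n
                for asset in assets}


smoother = WeightSmoother(window=3)
for todays_weights in ({"AAA": 0.5, "BBB": -0.5},
                       {"AAA": 0.3, "CCC": -0.3},
                       {"BBB": 0.4, "CCC": -0.4}):
    print(smoother.update(todays_weights))
```

Averaging reduces day-to-day turnover at the cost of reacting more slowly to changes in the underlying signal, which is the trade-off mentioned above.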

@fawce,

In light of the active discussions on the shortcomings of your third party fundamental data provider, Morningstar, as described by community members Constantino and Blue Seahawk here, my question is: since Q does not see code and therefore does not see what alpha factors are being used by the algo author, how does Q safeguard against what I would describe as "bad data hygiene", given the inconsistencies in frequency, timing and non-reporting of fundamental data, which I believe is the foundation of stock selection and filtering? While the integrity of the data provided by Morningstar is beyond your control, isn't it incumbent upon Q to do some data integrity tests and filter out the stocks that do not meet some threshold criteria, like what you did in filtering QTU? This actually poses serious risks for algos in your portfolio that depend on fundamental factors being scored across QTU.

Thanks @James, well said.

I posted a question regarding partner data over-fitting here. Perhaps folks have additional questions/concerns. I'm thinking that for any kind of derived data set, Q will need to let it "age" out-of-sample, to exclude the possibility of over-fitting. Then, analysis can be provided to the community to show that the in-sample data can be trusted, at a certain level of statistical significance.

The notion of over-fitting a trading strategy might technically turn out to be quite a misnomer. It might require so many compromises that the very notion becomes almost irrelevant.

I see the problem of strategy over-fitting as two separate, sequenced and interacting problems: first, a stock selection process, and then a trading process.

The selection process is like a stock screener of some kind: a way of selecting, for some reason or other, some stocks to trade. In my post on gaming multiple strategies, it was said that selecting 500 stocks out of some 2,000, as in the QTU stock universe, had $$5.6 \times 10^{486}$$ possible combinations. Every day, every week, or every month you would have such a selection to make from this huge stock universe. This number is so large that even a slight shift in criteria could yield quite a different list of stocks, each with its own idiosyncrasies. You could schedule your trading procedure 30 minutes later and get different answers, even though global activity on the planet would not have changed that much during that time.

On the other hand, the specifications could be so specific that a single list might be the only thing that comes out.
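The size of the selection space quoted above can be checked directly. The number of ways to pick 500 stocks out of 2,000, ignoring order, is the binomial coefficient C(2000, 500), which comes out at roughly 5.6 × 10^486. This is a small sketch using only the Python standard library; the variable names are mine.

```python
import math

n_universe = 2000   # approximate size of the tradable universe
n_selected = 500    # stocks picked at each rebalance

# Exact count of distinct 500-stock selections, ignoring order.
combos = math.comb(n_universe, n_selected)

digits = str(combos)
exponent = len(digits) - 1               # power of ten of the leading digit
mantissa = int(digits[:3]) / 100         # first three significant digits

print(f"C({n_universe}, {n_selected}) is about {mantissa} x 10^{exponent}")
```

Python integers are arbitrary precision, so the count is exact; only the printed mantissa is truncated.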

The Selection Process

The outcome of the selection process is what will be fed to your dynamic trading process. It is therefore critical to understand which stocks are being picked and for what reasons.

Is it over-fitting if you change the selection process resulting in a new selection that might not be comparable to the previous one? We could all argue that that would be sufficient to get different answers even if none of the trading procedures or program logic were changed an iota.

Over-fitting implies that you have something “fitted” to compare it to (from where you derived the expression over-fitted) or that you have taken enough samples from the available selectable universe to reach the notion of averages. How many samples would that be? One might not be enough!

The over-fit notion might possibly apply if we had averages where we could say: above or below those averages. But, there, the problem is that we have no means to locate or identify what the averages might be. The selectable universe is simply too large. You take a million samples (which requires running a million simulations and tallying all the results) and all you have is $$10^6$$ out of $$5.6 \times 10^{486}$$, roughly one part in $$5.6 \times 10^{480}$$, leaving essentially the whole space untested.

Changing the portfolio selection process does not make it over-fit, just different.

It is one combination out of $$5.6 \times 10^{486}$$. And, a few days later, the whole selection process will need to be done again, and again be 1 in $$5.6 \times 10^{486}$$. It would be sufficient to change a single factor to get a different selection.

To grasp the magnitude of $$5.6 \times 10^{486}$$, take $$10^{10}$$ supercomputers (that's 10 billion machines) $$\times 10^{16}$$ operations per second $$\times 10^{14}$$ seconds (some 3 million years of trading). This multiplies out to $$10^{40}$$ operations.

NO Exhaustive Search

So yes, nobody has yet made an exhaustive search, Monte Carlo style or otherwise. Not even close. Using a million times as many of those supercomputers raises the total to $$10^{46}$$. That is still an infinitesimal fraction of the total (roughly 0.0 followed by 439 zeros, and then a 1). I think you see the point. The selections we make could be pretty unique and come with no information about the world they come from (mean, standard deviation, and what have you).
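The compute budget described above can be worked through explicitly. This sketch only redoes the arithmetic in the post with the standard library, taking the size of the selection space as C(2000, 500); the variable names are mine.

```python
import math

# Size of the selection space: 500 stocks chosen from 2,000.
total_selections = math.comb(2000, 500)

# Compute budget as described in the post.
machines = 10**10          # 10 billion supercomputers
ops_per_second = 10**16    # operations per second, per machine
seconds = 10**14           # total running time in seconds
operations = machines * ops_per_second * seconds   # 10**40

# Convert the running time to years for scale.
years = seconds / (365.25 * 24 * 3600)

# A million times as many machines.
scaled_operations = operations * 10**6             # 10**46

# Fraction of the space covered if each operation tested one selection.
log10_fraction = math.log10(scaled_operations) - math.log10(total_selections)

print(f"{seconds:.0e} seconds is about {years / 1e6:.1f} million years")
print(f"fraction explored is about 10^{log10_fraction:.1f}")
```

Even under the generous assumption that one operation evaluates one full selection, the explored fraction sits near 10^-441.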

How could we declare something over-fit if there is nothing that we can compare it to or somehow declare as a representative average of this huge set of possible scenarios? Even the indices we use are just 1 in $$5.6 \times 10^{486}$$ possible choices. So, we make compromises and simply declare one thing or another to be the average of what is out there.

Whatever you select as your trading universe will most probably be different from everyone else's, unless they use the exact same trading program you do, on the same data from the same data provider, and run those simulations under the same conditions, using the same trade times over the same time interval.

Survivorship Bias

The best you could do is bias your selection toward survivors. The financial literature recommends: avoid survivorship biases as much as you can. But, that is exactly what you should do: seek out survivors by all means. You want to trade the best that is out there, not just anything that comes out of this huge universe of possibilities.

Your selectable and tradable stocks usually should not be penny stocks or those on the verge of bankruptcy, even though you could specialize in those. Going forward, you will want to play only survivors anyway. Those that have positive future prospects of doing more business. None of the nearly bankrupt candidates should ever be able to stay on your tradable stock list. If it is a bias to want to play the best stock candidates you can find, then so be it, do it. It is just plain common sense. And it is not because you are programming something that you should lose it.

However, the caveat is: everybody else is trying to do the same thing (the millions of them). And guess what, a lot of them navigate toward the same solution. For instance, all the traders that sort their tradable universe by highest market cap should all get the same answer, all the time.

There was only one possible past universe to select from and it is of historical record.

Therefore, whatever their respective trading strategies, they will all be a variation on the same theme which is: how can you take advantage of large caps sorted by size?

Just by putting this observation on the table, the problem changed.

The Problem Changed

Now we have traders (he/she) playing a variation on a theme. Whatever trading strategy they design, it will deal with an ordered list of the highest caps, the exact same selection set as the other guy. For whatever reason, when you analyze a lot of strategies, that is what you find. And this reduces the problem to a single set in this huge stock universe... which furthermore becomes the same for everyone on this theme. Is this an over-fit? Again, I don't think so. It is just a selection process.

I presume, except for published strategies, that everyone has a different selection process making their tradable stock universe unique. And from day to day, that universe could change. It is not from the stock selection process that we could declare a trading strategy in some over-fitted territory. All you have is a selection, whatever it is. And whatever criteria served in that selection process can also be applied going forward.

The problem is different going forward since you will not know what your selection process will choose. It is not a question of over-fitting, nor is it a notion of random-like selection. Going forward, your strategy will select its stocks, but you will not be able to know which factor dominated, or why next year at this date a particular stock made the list. You cannot even know if it will be there or not. However, some do have much higher probabilities of being there than others. Does your trading strategy suddenly become over-fitted because your stock was there or not there? Again, it is not from the selection process itself that we could speak of over-fitting. For the trader, that stuff did not make any money. Or did it?

A long-term investor would argue that it did and that the selection process was the most important of all. It should be observed that you do not see traders in the Fortune 500, only bag-holders that have held on to their shares for years and years, which in a way supports their argument.

For the traders, their respective trading strategies are what will differentiate them.

Hi Fawce -

Inspired quants have written
Over 9,000,000 backtests on Quantopian.

I suggest changing this, since ideally, if one has done a proper job of researching and constructing the algo, relatively few backtests should be required. It is also a path to over-fitting, since for a relatively short backtest duration, one can manually iterate and fiddle with parameters until the results appear optimal (your platform doesn't support automatically creating a backtest response surface over parameter space, so I assume manual parameter tweaking). You might recall also that you published a paper that showed that lots of backtesting may suggest over-fitting (see https://www.quantopian.com/posts/q-paper-all-that-glitters-is-not-gold-comparing-backtest-and-out-of-sample-performance-on-a-large-cohort-of-trading-algorithms for links and discussion).
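The link between many backtests and over-fitting can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis of Quantopian data: every "backtest" below is pure noise (zero-mean Gaussian daily returns with 1% volatility), and `sharpe` and `best_in_sample_sharpe` are hypothetical helpers. Picking the best of N such runs still produces an impressive in-sample Sharpe ratio.

```python
import random


def sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio of a list of daily returns (zero risk-free rate)."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean / var ** 0.5 * periods_per_year ** 0.5


def best_in_sample_sharpe(n_backtests, n_days=252, seed=0):
    """Best Sharpe among n_backtests strategies that have no edge at all."""
    rng = random.Random(seed)
    best = float("-inf")
    for _ in range(n_backtests):
        # Each "parameter tweak" is just another draw of pure noise.
        daily = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        best = max(best, sharpe(daily))
    return best


for n in (1, 10, 100, 1000):
    print(f"best of {n:>4} backtests: Sharpe {best_in_sample_sharpe(n):.2f}")
```

Because the seed is fixed, each larger run contains the smaller ones, so the best Sharpe can only rise with the number of tries. None of these strategies has any out-of-sample expectation, which is the sense in which a large backtest count can be a warning sign rather than a selling point.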

In some sense, you may be advertising how much of your precious funding from venture capitalists has been wasted in paying for backtests that don't pay off. You could revise the metric to capture how many of those 9,000,000 backtests have produced returns for the Quantopian investors. It would be interesting if you could back out the scaling. For every N backtests, how much revenue has Quantopian generated? Does it scale linearly? With 10X more backtests, would you make 10X more money? My hunch is that if you study the relationship between the number of backtests and the success of your business, you would find a weak relationship, if any (in fact, it may be a drag on your business, both in direct costs and indirectly, since users get bogged down in backtest tweaking).

Make sense?