contest entry guidance - estimates-based strategy?

I just skimmed over this announcement:

https://www.quantopian.com/posts/new-data-factset-estimates

Buried within it:

Contest & Allocations

Consensus Estimates and Actuals are both available in backtesting, so you are able to use them in the contest. FactSet estimates data is the first of its kind on Quantopian. Strategies written using this dataset are likely to be uncorrelated to strategies that are already running in the contest. For this reason, we heavily encourage you to familiarize yourself with this dataset and try to enter estimates-based strategies in the contest. To further incentivize that, we are increasing the limit on the number of contest entries to 5 per person so that you can make a new entry with FactSet Estimates without having to withdraw one of your existing entries. Algorithms that use FactSet Estimates are eligible to be considered for an allocation.

All good. My question is: how viable would a strategy be if it were based solely on the new FactSet estimates data set? One important point of the excellent architectural piece by Jonathan Larkin, A Professional Quant Equity Workflow, is that quants should be thinking in terms of combining a set of alpha factors ("A successful strategy usually includes many individual alphas"). Since the contest has been running for a while, and we have a large field of over 300 entrants, I would think that a significant number have followed Jonathan's advice and constructed multi-factor algos. Those entrants would simply add one or more estimates-based alpha factors to their already proven algos, rather than submit new algos built exclusively on estimates-based alpha factors. If this assumption is true, then winning money in the contest would require more than a single-factor algo (or one based on multiple factors that are all derived from the same data set). The contest scoring does not include any weighting for being uncorrelated with the other algos in the field, so I don't understand the argument that an algo based solely on the new FactSet estimates data would be successful.

Or is this all wrong-headed thinking, and I should write a contest algo based exclusively on the new FactSet estimates data set? If so, please explain.

As a footnote, I think the statement "Strategies written using this dataset are likely to be uncorrelated to strategies that are already running in the contest" needs justification (and "uncorrelated" is too strong, I'd bet, suggesting a complete absence of correlation...highly unlikely). I would consider the FactSet estimates data to be effectively public-domain within the trading industry, and so it is quite possible that the information is already largely incorporated into other data sets that are already being used in the contest (and the Q fund). There's no exclusivity to the estimates data, so why would it contain new information? But maybe my assumption that the existing data sets already contain estimates information is wrong, and we could actually see totally uncorrelated estimates-based alpha factors. Any insights?
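To make the "uncorrelated" question concrete, here is a minimal sketch of how one might measure it, assuming you have daily returns for two algos in hand. The function name and the return numbers are made up for illustration; nothing here comes from Quantopian's contest scoring.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation of two equal-length return series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily returns, invented for illustration only.
estimates_algo = [0.001, -0.002, 0.003, 0.000, -0.001]
existing_algo = [0.002, -0.001, 0.001, 0.001, -0.003]
print(pearson_correlation(estimates_algo, existing_algo))
```

A value near zero over a long window would support "uncorrelated"; anything materially above zero would suggest the estimates information overlaps with what the existing algo already uses.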

10 responses

I think what Q is implying, perhaps, is that it may be possible for the community to add new alpha to the Q fund by researching the FactSet estimates data and then incorporating any promising factors into their multi-factor frameworks. One thing that would help would be to add a "Q fund" style risk factor to the existing five style factors: momentum, market cap, value, mean reversion, and volatility. This way, algos that are too much of what Q already has in abundance would not qualify for the contest and the fund. The problem, I think, is that Q probably doesn't want to publish what is in their fund via a new risk factor (e.g. it would be too easy for competitors to back out information about the fund). So the crux of the problem is that Q is asking users to make incremental, accretive contributions to the fund, but as far as I can tell, there's no way to get feedback, as one develops a strategy, on whether it will be just more of the same or new alpha for the fund. And it would seem that the contest incentivizes more of the same alpha, rather than unique alpha, vis-a-vis the Q fund. So it is hard to see this through to the end as an incentivized, measurable task. What am I missing?

Hi Grant,

I think, at a high level, what Q is now doing is presenting us with new datasets to explore and exploit, in the hope that we authors will come up with different or unique alphas based on them, either as standalone strategies or as add-on enhancements to existing factor combinations. The new driver is uniqueness, or differentiation from what they already know or have. So instead of what you call a "Q fund" style risk factor, what is needed as better guidance is a metric that measures uniqueness. I think they're working on this.

Thanks James - Is the uniqueness metric something that was described on the Q site? Why do you think that they’re working on something? Seems like adding it as another risk factor would be the way to go, but it would basically need to be the Q fund, if the goal is to limit exposure to already-funded alpha. Then the contest would automatically adjust via an update to the risk model (and everyone with a non-conforming algo would be disqualified and have to focus on new alpha streams for the Q fund).

Grant, this is just my version of palm reading, guessing based on direction. Intuitively, having differentiated and diverse approaches and sources of alpha is key to efficiently managing a portfolio of uncorrelated funds, each focused on a specific trading strategy. Introducing new datasets that Q authors haven't seen or used before is one way Q ensures they are getting new and/or differentiated sources of alpha. Of course, they would want to measure the impact of these new unique alphas vis-a-vis their attribution to the Q fund as they are added. So it's probably a score rather than a risk factor.

@ James - I think it needs to be feedback one can use in developing strategies, which is what the risk factors provide. I'm not sure that a contest uniqueness score (comparing each contest algo to all of the other contest algos) will cut it. At this point, with presumably ~30 or so decent "uncorrelated" strategies in the Q fund, the goal in my mind would be to add an itsy-bitsy bit of new alpha based on new data sets, and not futz around with alpha that is already represented in the Q fund. I just don't see the incentive and feedback mechanism to work on idiosyncratic sources of alpha, assuming that the Q fund has been wildly successful, and just needs to round things out with some variety.

Grant,

There is also the matter of handling and managing the point of diminishing risk-adjusted returns, or alpha decay, in these ~30 or so uncorrelated strategies. Adding fresh, differentiated, and unique alphas may compensate for the decay. You want to be one step ahead of the inevitable, the so-called curse of non-stationarity.

A relevant announcement:

https://www.quantopian.com/posts/quantopian-business-update

Therein, he comments:

we are exploring ways to update the contest to include some measure of an algorithm’s originality or uniqueness. This is an early stage research project and we will be relatively conservative with any changes to the contest. We will, as always, be totally transparent and solicit your feedback before we make any rule changes.

This would seem to be straightforward: add the Q fund as a style risk factor in the risk model. The problem, of course, is that this ends up revealing details about the construction and performance of the fund, which probably wouldn't be copacetic with the regulatory powers-that-be, Q fund investors, and Q management.
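As a rough sketch of what "the Q fund as a style risk factor" would mean in practice: regress an algo's returns on the fund's returns and look at the slope (exposure) and what is left over. This assumes a fund return series were available, which it is not; `fund_returns` below is purely hypothetical, and this single-factor OLS is my simplification, not Quantopian's actual risk model.

```python
def factor_exposure(algo_returns, factor_returns):
    """Return (beta, alpha) from a one-factor OLS fit of algo on factor returns."""
    n = len(algo_returns)
    if n != len(factor_returns) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_a = sum(algo_returns) / n
    mean_f = sum(factor_returns) / n
    cov = sum((f - mean_f) * (a - mean_a)
              for a, f in zip(algo_returns, factor_returns))
    var_f = sum((f - mean_f) ** 2 for f in factor_returns)
    if var_f == 0:
        raise ValueError("factor returns are constant; exposure is undefined")
    beta = cov / var_f                # exposure to the factor
    alpha = mean_a - beta * mean_f    # average return not explained by it
    return beta, alpha


def residual_returns(algo_returns, factor_returns):
    """Algo returns with the factor-driven part (beta * factor) removed."""
    beta, _ = factor_exposure(algo_returns, factor_returns)
    return [a - beta * f for a, f in zip(algo_returns, factor_returns)]


# Hypothetical series, invented for illustration only.
fund_returns = [0.010, -0.020, 0.030, 0.000]
algo_returns = [0.006, -0.009, 0.016, 0.001]
beta, alpha = factor_exposure(algo_returns, fund_returns)
print(beta, alpha)
```

A large beta would flag "more of the same"; the residual series is the part that could count as new alpha. It also shows the disclosure problem: anyone given the exposure numbers for enough algos could start backing out the fund's returns.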

@Grant,

The problem with your concept of the Q fund as a style risk factor in the risk model is precisely that: it can be reverse engineered, thereby exposing the fund's performance, something that Q and their investors would probably not want to divulge. A uniqueness score is not something new, and it has advantages aside from masking performance. Numerai has open sourced its code for this kind of measure.

description here

@ James -

Yep. It’s a problem in my mind for Q. Numerai may be different because, as I recall, they use a kind of encryption (called ‘homomorphic’) so that they can push out data to the masses without revealing anything useful about where it comes from. It basically turns it into a pure data science problem.

Q could develop a black box evaluation scheme relative to the fund (e.g. submit your backtest ID and get a report back, thumbs up or down if it has any hope at all of contributing to the fund).

Originality
For checking originality, Numerai chose the two-sample Kolmogorov–Smirnov test. It is intended to tell if two samples come from the same distribution. The idea is quite simple: one takes two sets of predictions and computes a cumulative distribution function for each. The biggest difference between the distributions becomes the KS statistic.
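The KS statistic described above can be sketched in a few lines of plain Python. This is only an illustration of the statistic itself, not Numerai's actual code; the function name is made up, and it does not compute a p-value or apply any originality threshold.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest absolute gap between empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    max_gap = 0.0
    # Empirical CDFs only change at observed values, so it is enough
    # to compare them at every point that appears in either sample.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Hypothetical prediction sets, invented for illustration only.
predictions_1 = [0.1, 0.2, 0.3, 0.4]
predictions_2 = [0.3, 0.4, 0.5, 0.6]
print(ks_statistic(predictions_1, predictions_2))
```

A statistic near 0 means the two sets of predictions look like draws from the same distribution (i.e. not very original relative to each other); a value near 1 means the distributions barely overlap.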

The Q fund's operations and portfolio management have now also become a data science problem, from the perspective that each individual fund algo is treated as a signal that is combined with other signals to form the portfolio, its execution, and its aggregate performance.