Risk-Focused Algo

I designed this strategy this morning after attending Dr. Stauth's excellent webinar last night.

The algo exploits just one idea: humans fixate on levels (anchoring and adjustment).

The main point of the exercise is to try to build something that captures the highlights of the webinar, namely:

1) The algo holds > 1000 positions.
2) Daily turnover is 11%.
3) Sector and Style exposures have means close to zero.
4) The style exposures are dynamic and tightly constrained.
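Points 2) and 3) can be made concrete: daily turnover is commonly measured as half the sum of absolute day-over-day weight changes, and the exposure means are just column averages of the weight (or factor-exposure) matrix. A minimal sketch on made-up weights, not the algo's actual holdings:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy daily portfolio weights: 5 days x 4 stocks. These are synthetic
# numbers purely to illustrate the calculation.
weights = pd.DataFrame(rng.normal(0.0, 0.01, size=(5, 4)),
                       columns=["A", "B", "C", "D"])

# Daily turnover: half the sum of absolute day-over-day weight changes
# (the first day has no previous day, so it is excluded).
turnover = weights.diff().abs().sum(axis=1).iloc[1:] / 2

# Mean exposure per column over the period; the webinar asks for sector
# and style exposure means close to zero, and the idea is the same.
mean_exposures = weights.mean()

print(turnover.round(4).tolist())
print(mean_exposures.round(4).tolist())
```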

The algo is set up with $10M and the Quantopian default cost model. It passed all the Contest constraints, so I've entered it in the contest!

Thanks for joining us for the webinar! I'm so glad you found it useful - and awesome work on your contest submission -- it looks great!

Disclaimer: The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action, as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

I love this type of usage of Pyfolio and the forums.
It allows you to share your results for feedback without disclosing the logic of your strategy at all.

When and where was the webinar announced? I must have missed it. Can it still be watched? Thanks.

Hi Tim, We announced the webinar in a few places, including the forums and a marketing email. Sorry that we weren't able to notify you about it ahead of time. The good news is that it was recorded and you can see it here.
In this follow-up post, I add an extra factor to the original single-factor model. The goal is to investigate the benefits of factor diversification in 'pure alpha' strategies. To achieve as much diversification as possible, I chose a fundamental factor that should be uncorrelated with Size and Value.

The main effects are that daily turnover falls from 11.1% to 8.6%, overall strategy volatility falls slightly, and the SPY beta is closer to zero.
As an endnote to the exercise, it has to be said that the performance side of the equation almost looks too good to be true for the second model. The rolling 6-month Sharpe Ratio is consistently above 1, dipping below it only once during the simulation.

I'm happy that this exercise has shown it is relatively straightforward to mitigate the common risk components of a strategy. However, a substantial part of the webinar was devoted to the problem of overfitting and various strategies for holding out data. That's the next step!

Loving this exercise! How are you thinking about approaching hold-out data and overfitting guards?

Hi Jess, I already have the following safeguards in place:

1) The risk allocation is fixed and reflects the structure of the stock market.
2) I trade hundreds of stocks without placing more weight on individual stocks based on their historical 'alpha'.

After the webinar, and the driver of this exercise:

3) If the constrained solution survives the purging of common risk factors, then that is probably a good sign.

A couple of ideas going forward:

4) Restrict models to having a couple of ideas / factors. Then have the discipline to say that those ideas have now been 'used up'. This also ensures a nice conveyor belt of fresh ideas.
5) Run the Alphalens analysis on a long period up until the 2-year simulation period, but not including the simulation period. This avoids lengthy and tedious backtests and is closer to what someone can actually do in real life.

@Olive Coyote, Very impressive, especially the second one! Please stop now. ;)

Hi Leo, My goal is to design a Modeling Process where the only discretionary aspects are:

1) The choice of factors (based on sound economic principles and research papers).
2) The directions of the factor sorts, or equivalently, to which side of the market the risk premium accrues --- longs or shorts?
For me, OOS testing will hopefully allow the Modeling Process to be adapted (most likely, simplified) until perhaps 6 out of 10 models have live trading results that fit towards the center of the Bayesian Cone. But perhaps that's way too optimistic!

Leo, I respect your opinion and of course you are free to take any approach you want. But the dangers of in-sample fitting of multi-factor models are particularly subtle. The following is a good analogy: You toss 10 coins and notice 4 have landed 'Heads'. You decide that the 4 coins showing 'Heads' must be biased coins, so you discard the other 6 and build a strategy around the 4 coins showing 'Heads' again in the future. Suppose those coins are instead fair coins. To have a chance of ending up in the center of the Bayesian Cone out of sample, you're looking at a 1-in-8 or 1-in-16 chance (something like that, because 3 Heads and 1 Tail would be pretty good). This is my interpretation of one of the comments Dr. Stauth made in her talk. It paints a pretty solid case for some kind of holdout data or walk-forward analysis.

Oh, now I see what you are trying to do. Basically you are going to let OOS results dictate which factors survived, which need replacement, and what new factors need OOS validation. By repeatedly doing this you are hoping to have a conveyor belt of factors (ones that survived by placing at the center of the Bayesian Cone in the subsequent OOS period). Those you will eventually mold into the final algo. That's a nice approach. I guess one has to have a lot of patience for it.

I am surveying what you and Joakim are doing to get a feel for what will work for me. I am inclined towards choosing a holdout strategy that fits my personality. I have little patience and somewhat of an overconfidence in my abilities to just figure things out.
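As an aside, the coin-toss odds in the analogy a few posts up can be checked exactly: assuming the 4 retained coins are fair, the chance that they all show heads again is 1 in 16, and the chance of at least 3 heads (the "pretty good" outcome) is 5 in 16:

```python
from math import comb

# Probability that 4 fair coins all land heads again: (1/2)**4.
p_all_heads = 0.5 ** 4

# Probability of at least 3 heads out of 4 fair coins:
# C(4,3)/16 + C(4,4)/16 = 5/16.
p_at_least_3 = sum(comb(4, k) * 0.5 ** 4 for k in (3, 4))

print(p_all_heads)   # 0.0625  (1 in 16)
print(p_at_least_3)  # 0.3125  (5 in 16)
```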
I'll probably go with this twist on holdout data:

• Use the research platform for time periods until 2017-06-30 for alpha factor discovery (this should include Alphalens analysis).
• Occasionally combine the alpha factors and test until 2017-12-31 (once a month).
• At the end, test one last time and validate until 2018-06-30 (once in 6 months).
• Data beyond 2018-07-01 is never used.

That way I will have holdout data from two different six-month periods, and by the time I am finished at the end of 2018 there will still be one 6-month period that the algorithm has never seen. This also aligns with my concern that the first time I use data until 2017-12-31, that data will lose its importance in subsequent testing; but I will still have the last six-month period up to 2018-06-30, which I plan to save until the finale!

I like both approaches, but for now, I'm sticking to my much simpler (and possibly not as effective at minimizing overfitting) approach. In this post, I have increased the size of the sample period with a view to using a 60 / 20 training set / validation set exercise later.

I am belatedly performing an Alphalens analysis on the 1st factor. It only takes 5-10 minutes to perform. The things I look for in an Alphalens analysis are as follows:

1) An 'economically meaningful' amount of basis points per period.
2) Returns being generated in both the top and bottom quantiles.
3) An attractive Factor-Weighted Cumulative Return.

If the factor passes those criteria, I then look for an IC in the order of 0.01 and a p-value / t-statistic of < 0.05 and > 2, respectively. Hard limit of < 0.10 for the p-value, if other statistics are compelling. Factor 1 passes all those tests.

This post illustrates the danger of overfitting. I use my Contest 38 algo, which has some strengths, but a key weakness. The algo finished in the Top 5 of the competition. The Sharpe Ratio for the live contest period was a modest 0.26.
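For reference, the IC mentioned above is conventionally the Spearman rank correlation between factor values and forward returns (this is what Alphalens reports), and the t-statistic follows from it. A stand-alone sketch on simulated data, not a real factor:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# Synthetic factor values and forward returns with a small amount of
# signal mixed in, so the IC is positive but weak.
factor = rng.normal(size=n)
fwd_returns = 0.02 * factor + rng.normal(size=n)

# Information Coefficient: Spearman rank correlation, i.e. the Pearson
# correlation of the ranks of the two series (no ties here, so a double
# argsort gives the ranks directly).
ranks_f = factor.argsort().argsort()
ranks_r = fwd_returns.argsort().argsort()
ic = np.corrcoef(ranks_f, ranks_r)[0, 1]

# t-statistic for the correlation; a p-value would follow from the
# t distribution with n - 2 degrees of freedom (e.g. via scipy.stats.t).
t_stat = ic * np.sqrt((n - 2) / (1 - ic ** 2))

print(round(ic, 3), round(t_stat, 2))
```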
The strengths of the algo are that the risk structure is pretty advanced for the time it was submitted (January 2018). It fails the new contest criteria only with respect to using the Q1500 universe and for turnover being too low. The weakness, however, is clear from looking at the Bayesian Cone --- overfitting! At the time, I allocated more capital to stocks with higher historic 'alpha'. I've since moved on to using 1/N naive diversification as a more robust alternative.

Antony, I have 2 questions: Do you consider, as an achievement, an algo with an "out of sample" 0.17% annual return in a 1.6%-2.0% CD rate environment, and where do you find some strength? Do you think an algo with an "out of sample" 0.17% annual return in a 1.6%-2.0% CD rate environment should be in the best 5 of the Quantopian Open?

That is quite a compelling example of EXACTLY the overfitting issue we talked about in the webinar - thanks for sharing it! Did you use a trailing, rolling window to determine your weights in this version? Or some other technique? Would you be willing to share a tearsheet of the version where you drop back to a flat 1/N weighting, with no other modification? EDIT: Thought of another question - have you previously run backtests over prior periods? Is there another range of year(s) of past data you haven't tested yet that you might be able to preserve as holdout?

Vladimir - Unless I'm misunderstanding, the point of this post is to illustrate an overfitting fail - not to propose this algo would be a wise investment as is.

Jessica, The algo finished in the Top 5 of the competition. Can you answer my second question?

@Jess, If I may try to rephrase Vladimir's queries as they relate to the above example that you find quite a compelling example of overfitting: If this algo is a compelling case of overfitting, why did it place 5th in the contest?
Is the Contest 38 scoring system out of sync vis-à-vis the objective Q is trying to achieve?

This algorithm has been running for close to 4 months out of sample, and is the main contributor to the $900 I have won in the new contest.

It has a far more robust asset allocation based on Bayesian principles, and is sitting fairly well in the Cone.

However, I think the Contest 38 algorithm is much closer to meeting Quantopian's overall allocation criteria. It's just very challenging to earn significant returns when you have zero sector and style betas across the board.


"If this algo is a compelling case of overfitting, why did it place 5th in the contest? Is the contest 38 scoring system out of sync vis a vis the objective Q is trying to achieve?"

I think there are two elements that are unsatisfying in the case of this January submission Olive Coyote was generous enough to share:

(1) With the benefit of hindsight, it appears that some aspect of the algo development process was likely overfit to in-sample data. The in-sample Sharpe of over 3.0 drops out of sample to ~0.26. There is no aspect of the contest scoring (for the 6-month contests or the current daily contests) that explicitly penalizes apparent overfitting based on a mismatch between in- and out-of-sample stats. We can certainly debate the merits of that design, and if and how scoring could be modified to penalize overfitting - but for contest 38 we can only look to the out-of-sample results to sanity-check an algo's ranked position in the contest.

(2) Intuitively, we don't find the algo's out-of-sample Sharpe compelling enough to justify a top 5 finish. I took a quick look back at the top 10 algos from contest 38: the top 3 look pretty reasonable, but it falls off quite steeply after that. Here I think it highlights the improved SIMPLIFIED scoring mechanism for the daily contest, where if you qualify you're just ranked on risk-adjusted returns. I think that will avoid this effect going forward, where we see counter-intuitive rankings on a Sharpe basis.

Contest 38 top 10 finishing Sharpe ratios:

2. salmon zebra - 0.9
3. red gazelle - 1.5
4. scarlet dove - 0.23
5. olive coyote - 0.10
6. pear panther - 0.27
7. olive coyote - (-0.09)
8. pink owl - (-0.4)
9. violet pig - 0.50
10. green ape - (-0.45)

I'm curious if others here agree that the daily contest scoring ranking methodology is doing a better job of sorting the best algos to the top of the heap?

@Jess,

Thanks for your response. I look at Contest 38 as an experimental segue to the new daily contests. The design didn't completely capture the desired results. I'll leave it at that.

The new contest scoring system is definitely more robust and a step in the right direction. However, I think there is room for improvement in mitigating the luck factor and gaming, penalizing overfitting, including holdout validation data, and measuring metrics over the same duration. I think the scoring system will be a continuing work in progress, with a feedback loop, until all the nuances are factored in.

Hi Jess -

"I'm curious if others here agree that the daily contest scoring ranking methodology is doing a better job of sorting the best algos to the top of the heap?"

"The proof of the pudding is in the eating." Very soon, we'll be at the 6-month mark of the present contest, which is the minimum out-of-sample period for fund allocation decisions. It will be up to Q to decide if proof will be provided, by providing data showing the effectiveness of the present contest in producing fund-worthy algos (e.g. plot of dollars allocated versus contest score).

Presumably, Q could perform preliminary analysis now, and assign a "probability of funding" score for each contest algo, and then publish the aggregate statistics (dropping traceability to individual quants). Aside from the qualitative, self-reported Strategic Intent statement, you should have all the data per the Get Funded page to make a decent assessment, prior to completing the 6-month mark for the current contest. For example, what would a plot of probability of funding versus contest score for all current algos look like?

Personally, I'm generally happy with the direction things have headed for the contest/fund (although I'm skeptical, specifically, about the Strategic Intent requirement and the style risk factors).

"I'm curious if others here agree that the daily contest scoring ranking methodology is doing a better job of sorting the best algos to the top of the heap?"

A classification of "best" assumes outperformance with everything else held constant across algorithms: for instance, that all algorithms took the same risk, or had the market regime in their favor to the same degree in the just-concluded 6-month period.

Risk in our contest is bounded, not constant. Market regimes are never constant either, so we can't say that today's best algorithms will continue to be the "best" (top of the heap) in the subsequent 6-month period, or when the next market regime switch happens.

I think the backtest statistics can largely be taken with a pinch of salt. It's straightforward to find 'risk factors' that perform well for 7 or 8 quarters in a 2-year period, but it quickly becomes difficult for larger sample sizes such as 3 years.

There's likely to be a negative correlation between the Sharpe Ratio of a backtest and the length of its sample.

An algo should instead be judged against the systematic risk exposures / budget it consumes. You can measure this by adding the absolute values of the style factor betas and the CAPM beta.

So my Contest 38 algo's 0.26 Sharpe Ratio no longer looks that bad, given it only consumed 0.00 + 0.08 + 0.03 + 0.01 + 0.04 + 0.00 = 0.16 of 'beta'.

Whereas the one that has done well in the new contest consumed 0.06 + 0.12 + 0.30 + 0.05 + 0.11 + 0.21 = 0.85 of beta.
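The 'beta budget' arithmetic above is just a sum of absolute exposures. As a sketch (the helper name is mine, not from the thread):

```python
# 'Beta budget' consumed by an algo: the sum of the absolute values of
# its CAPM beta and style factor betas.
def beta_budget(betas):
    return sum(abs(b) for b in betas)

# The two sets of betas quoted in the thread.
contest_38_algo = [0.00, 0.08, 0.03, 0.01, 0.04, 0.00]
new_contest_algo = [0.06, 0.12, 0.30, 0.05, 0.11, 0.21]

print(round(beta_budget(contest_38_algo), 2))   # 0.16
print(round(beta_budget(new_contest_algo), 2))  # 0.85
```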

What happens in a large pool of algos that take significant systematic risk exposures is that some -- by chance -- will be on the side that receives the corresponding risk premiums. That was the big insight I got from the webinar.

Thanks, Leo. That is definitely the way forward.

At a minimum you should have one downturn, one bull market and one consolidation. That means starting from 2007 or before. Going back further would be better, but it isn't possible to go much further in Quantopian.

Definitely agree on having a meaningful backtest that encompasses different market regimes. I would go even further and propose a 2-3 year holdback period, say from the start of 2016 to the present, as an OOS validation period that would carry some scoring weight together with the 6-month live OOS, to account for consistency. As I have advocated in previous posts, accurate measurement of metrics should use the same time period: the same OOS start date and end date!

This thread is about how to correctly structure these types of strategies.

The rules for Contest 38 were designed to encourage contestants to structure their algorithms correctly. The 5 bps slippage model was introduced, and a new leverage = 1 rule was added.

It was known in advance that the ranking for that contest was not going to be decided by Sharpe Ratios.

The new contest aims to solve many problems by requiring algorithms to pass the constraints first. Only then is it fair to judge the overall quality of algorithms by their risk-adjusted returns. But now it seems contestants have learned that the best way to do well in the contest is to load up on common risk factors.

This is my last post on the subject. I shall be retreating to a gentler life away from the forums!

Good luck, everyone.

Per the requirements on Get Funded, the overall return just needs to be positive:

Positive returns

Strategies should have positive total returns. The return used for the
Positive Returns constraint is defined as the portfolio value at the
end of the backtest used to check criteria divided by the starting
capital ($10M). There is no concept of "excess return" in evaluating algos (which might be o.k....see below). By the way, this requirement needs some minimum durations (e.g. a minimum 2.5-year backtest, with a minimum 0.5-year out-of-sample period). Given that Q is cobbling together a "crowd-sourced" fund of 30 or more algos (one would hope hundreds), incremental alpha accretion may be more important than locking in a specific return minimum (e.g. the risk-free rate).

It is also worth noting that the Get Funded requirements don't incorporate a volatility metric. So, presumably it is not taken into account in judging algos for funding? This is a big disconnect between the contest ranking and the Get Funded requirements. It is implicit in the top-level goal "Create a trading strategy on our platform which will continue to make money in the future," but there's actually no explicit guidance on volatility relative to returns.

Hi Grant, Interesting that you brought up the issue of a volatility metric. While not highlighted on the Get Funded page, it is very much reflected in the scoring system of the new daily contests, where the 63-day rolling volatility of the algo is used to adjust cumulative returns and is floored at 2%. A contest score of 1 can be achieved, at the very minimum, by having 6-month OOS live returns of 2% with a 63-day rolling volatility of 2% or less. On the other hand, a score of 1 can also be achieved by having 6-month OOS live returns of 50% with a 63-day rolling volatility of 50%. Given these two equal contest scores with extreme volatilities, it's not clear how they will be ranked. Will they be ranked equally, or should the ranking favor the lower volatility? The latter is more logical, since at the fund execution level they plan to leverage the algo many times. The current contest rules do not explicitly address this issue, but I suspect they are particularly biased toward low-volatility algos.
For these reasons, like Olive Coyote, I have refocused on containing risks to achieve low volatility, after neutralizing beta, style and sector risk factors, with the expectation of lower returns.

@James Villa - "I have refocused on containing risks to achieve low volatility after neutralizing beta, style and sector risks factors with the expectation of lower returns." Same basic approach here, using multiple pipeline factors and combining them (sum of z-scores). Main take-aways are to get to ~100 stocks minimum and keep daily turnover to ~0.1. And then waiting 6 months+, which as Jess pointed out is not the ideal cycle time for development. Ugh...

OC, In my personal opinion, the factors that affect performance out of sample vs. in sample go beyond whether the algorithm itself is overfit. It is possible the algorithm latched onto some property of the market that has been persistent in the short term and continued into the out-of-sample period (but favorable conditions may not continue forever). In that case out-of-sample performance could still be good, yet not predictable over the very long term.

It is also possible the six-month out-of-sample period is simply a different market regime, like the current 6-month period where there is only sideways movement. Algorithms that performed well in this period (like in the new contest) could be the ones that thrive on volatility (say, mean-reversion algorithms). Whether mean reversion will continue to perform at the same level going forward is unknown; nor is out-of-sample underperformance in such a period indicative of overfitting in a strategy that was developed when markets were rising.

In my personal opinion it is important to get a handle on what drove the in-sample performance when we analyze whether the out-of-sample result reflects overfitting. I also consider it important to tally performance against the same market regime going backward.
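James's description of the contest score above (cumulative return adjusted by 63-day rolling volatility, floored at 2%) can be sketched as follows. This is a simplified illustration of the idea as described in the thread, not Quantopian's exact formula:

```python
def contest_score(cumulative_return, rolling_vol, vol_floor=0.02):
    """Volatility-adjusted return, with volatility floored at 2%.

    A simplified sketch of the scoring idea discussed above; the
    function name and exact form are illustrative assumptions.
    """
    return cumulative_return / max(rolling_vol, vol_floor)

# Both of James's examples produce the same score of 1.0:
print(contest_score(0.02, 0.02))  # 1.0 (2% return, 2% vol)
print(contest_score(0.50, 0.50))  # 1.0 (50% return, 50% vol)

# The floor only bites below 2% volatility:
print(contest_score(0.02, 0.01))  # 1.0 (vol floored at 0.02)
```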
@Leo, @Jess, I followed Jess's advice earlier in the thread to re-run the analysis with the 1/N naive diversification approach. Interestingly, the picture is similar, but this time the out-of-sample period fits within the Bayesian Cone. At the bottom, admittedly, but still within it. Conclusion: the technique of holding out data and using the Bayesian Cone won't necessarily force you to discard strategies just because they have underperformed recently.

Hi Olive, I can relate well to your idea of a couple of primary factors per model: "4) Restrict models to having a couple of ideas / factors. Then have the discipline to say that those ideas have now been 'used up'. This also ensures a nice conveyor belt of fresh ideas." I have hitherto been focused on primary + supporting factors in my contest and live algorithms - I treat them as OOS validation of the factors - still some distance from actually combining all the factors into one single algorithm. PS: As for overfitting, the situation seems different if each trade is treated as if it were a game in reinforcement learning, according to Tom Starke.

Hi Karl, That's good to hear! Intuitively, at least, it seems to guard against accidentally cherry-picking a selection of factors, but we do lose the benefits of diversification, I suppose. Thanks for the reference on reinforcement learning. Perhaps we need the ability to 'stitch' backtests together in Quantopian, so that we can design strategies where the parameters evolve.

@Leo, I agree with your thoughts... the current use of the word "overfit" is overloaded and under-defined... in my opinion! alan

I think the two main contributors to overfitting are:

1) Over-parameterization.
2) Reliance on in-sample data.

In this post, I address problem 1) by imposing economic constraints on the parameters. Practically, this means avoiding the temptation, for example, to exclude sectors. I address problem 2) by using Bayesian portfolio construction techniques.
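One generic way such Bayesian portfolio construction is often realized is linear shrinkage of data-fitted weights toward a prior (e.g. 1/N). This is an illustrative sketch, not the author's actual model; the fitted weights and the shrinkage intensity are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5

# Prior stage: equal weighting (1/N).
prior = np.full(n, 1.0 / n)

# 'Data stage': weights fitted to in-sample data (synthetic here).
raw = rng.random(n)
fitted = raw / raw.sum()

# Posterior stage: shrink the fitted weights toward the prior.
# The intensity 0.5 is an arbitrary modelling choice for illustration.
shrinkage = 0.5
posterior = shrinkage * prior + (1 - shrinkage) * fitted

print(posterior.round(3), round(posterior.sum(), 6))
```

The shrinkage intensity controls how much the in-sample fit is trusted: at 1.0 the posterior is pure 1/N, at 0.0 it is the fully fitted (and potentially overfit) allocation.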
By definition, Bayesian techniques place some weight on a Prior distribution. When it comes to attaching weights to stocks, two sensible Priors are:

1) Equal weighting (1/N).
2) Weighting by market capitalization.

The Data Stage then 'fits' the algo to in-sample data, and the Posterior Stage combines the two in some way. I return to the two factors used in my Contest 38 algorithm, and use the period from 1st February 2016 to 31st January 2018 for the Data Stage. I have used the period from 1st February 2018 to 31st July 2018 as a Validation Set. It plays no role in the portfolio construction process. I think the bottom-line results are far more realistic, with a more plausible in-sample Sharpe, and less degradation out of sample.

One concern I do have is with the floor for turnover (currently 5%). It seems that one needs to use signals such as those generated by StockTwits, or technical trading signals, to generate 5% turnover. Reducing the floor to, say, 4% would allow more purely fundamental-based strategies.

Hi Antony, In my experience, adding an additional factor, if it's different enough from your first factor, will usually bring average daily turnover above 5%, even if both are based on the fundamentals dataset.

Thanks, Joakim. Perhaps I'm reading too much into one particular factor (Dividend Yield). Cheers.

Folks might find some of the discussion in this book relevant: Systematic Trading: A unique new method for designing trading and investing systems by Robert Carver. Link: http://a.co/1t6CNJn The author also presented at QuantCon 2017: "Trading Strategies that are Designed, Not Fitted" by Robert Carver, QuantCon NYC 2017: https://youtu.be/-aT55uRJI8Q

Hi @Jess, Just a comment and quick question regarding your comments on the new scoring system, as per part 2) of your post from 4 days ago, in which you state: "...
it highlights the improved SIMPLIFIED scoring mechanism for the daily contest where if you qualify you're just ranked on risk adjusted returns. I think that will avoid this effect going forward where we see counter-intuitive rankings from a Sharpe basis."

My thoughts are that ALL the different risk-adjusted return metrics one could think of, whether Sharpe, Calmar, Sortino, etc. or anything else, use SOME sort of ratio of the form (function of Return) / (function of Risk), AND so does the new simplified scoring formula as well. I certainly have no problem at all with that per se, and if, in the opinion of Q, the new formula works "better", then that's absolutely fine by me.

My question (in several parts), however, is this: Suppose there are two different algos that score identically based on the new formula (or whatever other risk-adjusted return metric might be used in future); how do you proceed from there in determining and ranking which algo is preferable in practice (e.g. for allocation)? If 2 algos had the SAME risk-adjusted return score, then, all other things being equal, surely the one with the higher absolute return would be preferable? Is that right? Please could you comment.

The next part of my question is then: if we have 2 algos that are NEARLY the same in terms of risk-adjusted return, with algo "A" being just SLIGHTLY better than algo "B", then my understanding is that algo "A" comes out ahead of "B" in the daily contest, irrespective of the UN-adjusted returns of either algo. But now, if algo "B" is WAY AHEAD of "A" in terms of cumulative return (not risk-adjusted), then, in practice for allocation, do we have a situation where Q would prefer algo "B" (with the much better absolute return) over algo "A" (with an ALMOST identical risk-adjusted return and essentially identical in other respects)?
Putting this another way, is there potential for a situation in which the rankings for allocation would NOT correspond to the rankings in the contest? I assume that the answer is probably yes. Please could you comment. Cheers, TonyM.

Hi @Tony, I raised this point in my post above: "A contest score of 1 can be achieved at the very minimum to have a 6 month OOS live returns of 2% with a 63 day rolling volatility of 2% or less. On the other hand, a score of 1 can also be achieved by having a 6 month OOS live returns of 50% with a 63 day rolling volatility of 50%."

At the execution level of the Q hedge fund, an algo chosen for allocation will be subjected to a separate analysis by Q as to how many times it will be levered; the process is opaque, maybe because it is proprietary, but it is consistent with Steve Cohen's Point72 trading strategy, which is to leverage it up to 8 times. Having said that, I believe Q would prefer to go with the algo that has lower volatility (risk) rather than higher unadjusted returns, assuming the scores are equal or almost equal. The new contest format and scoring system uses one unit of leverage; however, at the execution level it is leveraged many times over, depending on Q's analysis of the algo.

I raised this question to Dr. Jess Stauth, and below is her answer: "@james you asked a question about why we want to evaluate strategies at unit (1.0) leverage. Specifically 'This is what is throwing me off, is this the intended usage of Q's market neutral strategy? If so, then why not design the contest to reflect that (x times leverage)?'
The answer is that it makes our task of evaluating strategies at scale that much simpler if we can assume a fairly consistent leverage profile across all candidate strategies. In our investment process we apply leverage at the portfolio level, and we assign a weight in the portfolio to each individual algorithm. So we think about weights and leverage separately in our process. While it's certainly true that we could try to back out leverage applied at different levels by different users, that can get complicated if people use widely varying leverage over time in their strategies. There's nothing wrong with that approach in principle - but it not only makes our evaluation problem harder, it makes combining such a strategy into a portfolio of strategies more challenging as well. Under the current contest design the way I think about it is that we're creating a level playing field of max leverage = 1 and allowing people to compete to achieve the best results possible given that (and several other) constraint(s).

So, bottom line: even if your algo is chosen for allocation, the total amount of allocation, and thus your potential earnings, depends on how many times your algo will be levered, which is a totally separate analysis and solely a Q decision.

Update on Bayesian methodology:

Step 1) Go back as far as the data allows, and split the data into a 75% training set and a 25% validation set.
Step 2) Does the performance of the posterior in the validation set sit within the Bayesian cone? If yes, proceed to Step 3. If no, discard the factor.
Step 3) Calculate a new posterior based on the full data set. This is the final submission.

An alternative yes/no test in Step 2 is to use a threshold Sharpe ratio such as 0.50.

Hi @Antony, firstly my apologies to you; it was not my intention to sidetrack your discussion of your methodology. Hi @James, nice to chat with you again, and thanks for your comments.
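A minimal sketch of the threshold variant of Step 2 (this is not the author's code; the return stream and numbers are invented, and a full Bayesian-cone check would test the posterior predictive rather than a raw Sharpe ratio):

```python
import numpy as np
import pandas as pd

def sharpe_gate(daily_returns, train_frac=0.75, threshold=0.50):
    """Steps 1-2, threshold variant: chronological 75/25 split, then keep
    the factor only if the annualized Sharpe ratio of the 25% validation
    slice clears the threshold."""
    split = int(len(daily_returns) * train_frac)
    validation = daily_returns[split:]
    # Annualized Sharpe ratio, assuming 252 trading days per year.
    sharpe = np.sqrt(252) * validation.mean() / validation.std()
    return sharpe, sharpe >= threshold

# Hypothetical factor return stream with a steady positive drift.
t = np.arange(1000)
rets = pd.Series(0.001 + 0.01 * np.sin(t))
sharpe, keep = sharpe_gate(rets)
print(keep)  # True -> Step 3: refit on the full sample
```

If the gate fails, the factor is discarded rather than tweaked, which is what keeps the validation slice honest.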
Although I have been away from Q for 6 months, evidently we are both still continuing to think about very similar issues. The combination of your and Jess's comments in your post above clarifies the following points:

• For evaluation and comparison purposes, Q wants a level playing field with leverage = 1, as per the contest. I think this is clear, as is Jess's explanation of it.

• For investment purposes, if Q selects an algo then Q will subsequently apply some leverage factor. It is not clear (to us) what that leverage factor will be, nor how exactly Q arrives at it. Personally I am quite OK with simply accepting that as being "Q-confidential" … unless Jess or someone else at Q would care to clarify further.

• I believe, James, that you are correct in saying that "the total amount of allocation and thus your potential earnings is dependent on how many times your algo will be levered". From the previous point, we don't (currently) know what that leverage factor would be, but does it really matter to the algo design process? Both Q and the algo author share a vested interest in maximizing (risk-adjusted) returns, so, at least from my perspective, I'm quite happy to leave whatever leverage Q finally chooses as not really a "need to know" from an algo author's perspective.

• Conversely, however, what I think is VERY relevant to serious algo developers is obtaining more clarity on Q's selection process, especially in cases where the risk-adjusted returns of several algos are almost the same (and would therefore score similarly in the daily contest). Evidently, achieving good daily contest results constitutes part of a "necessary but not sufficient" set of conditions for Q's selection of an algo for real live trading. It is the set of "additional" conditions, in the form of objectives or objective functions, that is important to the algo designer.
In particular, James, you have highlighted the fact that currently there is very little, if any, transparency on this issue. For my part, I assumed that, given EQUAL risk-adjusted returns, Q would probably put priority on higher absolute returns (which is what I do in algos for my own personal use, as a secondary objective after the primary one of maximizing risk-adjusted returns). You have suggested quite the contrary, namely that you believe: "… Q would prefer to go with the algo that has a lower volatility (risks) rather than higher unadjusted returns assuming [risk-adjusted return] scores to be equal or almost equal". And you may well be right, but the situation is that we really just don't know.

As part of a rational algo design process, it makes sense to have both primary and secondary objectives. I think it has been made clear that the primary one is the maximization of risk-adjusted return in the well-defined form stated by Jess and specified precisely in @Rene's white paper on Q's risk model. However, what should we be using as secondary objective(s)? Please, @Jess, @Rene, @Delaney and others, can you give us some specific feedback from Q on the relative importance of multiple objectives beyond the primary one of maximizing risk-adjusted return? Regards, TonyM.

Hi Tony, pretty much the whole point of the thread was my attempt to distil into a couple of examples my interpretation of Dr. Stauth's webinar on what they look for when allocating capital. If you follow the evolution, you can see that the first two notebooks are well received, and share the theme of almost completely eliminating systematic risk. This is a big plus for a portfolio manager of multiple algos, because systematic risks are additive at the portfolio level (not diversifiable). You then see later a comment that thanks me for giving an example of terrible practice! In that contest, I submitted 3 entries with backtest Sharpe ratios of 3.1, 3.4, and 3.6. Overfitting!
What I have since learned (and is missing from the first two notebooks) is that they would like us to use as much data as possible, but in an efficient way, so that some of it is held back for cross-validation / holdout purposes, whatever you want to call it. I am taking a Bayesian approach to this using a 75% / 25% data split, but here are a couple of other methods discussed in the webinar:

1) Split the data in half. Train on the first half, test on the second half.
2) Alternate the data, so that you train on 2010, 2012, 2014, 2016 and test on 2011, 2013, 2015, 2017.

Cheers.

Hi Antony, thanks for your comments, and I think you have done a great job in your preceding posts, both in answering some questions and also in raising some interesting new ones! I will now try to link my interjection to your previous train of thought.

In your first notebooks you have certainly done exceptionally well in eliminating systematic risk (or the risk associated with "common returns", as it is called in Q's model). As you say, it was very well received by Q, and this implicitly answers a question that I raised in a separate post asking whether the goal is intended to be the maximization of "specific (i.e. non-systematic) returns". Although Q has not responded to my question (yet), apparently the answer seems to be yes, as you have effectively demonstrated, and I agree with your comment about systematic risk and its impact for a fund running multiple portfolios.

With regard to high Sharpe ratios (or any other quality metrics), I guess all we can say is that high values are "probably a necessary but almost certainly not a sufficient" condition for a good system. What we get matters less than how we got there! As algo developers, Q has given us a precise set of constraints, a single objective function to maximize (namely one specific function of risk-adjusted return), some good general guidelines, but then Q seems consistently to avoid answering various other questions.
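The two alternative holdout schemes above can be sketched in a few lines of Python (illustrative data only; the year choices follow the example in the post):

```python
import numpy as np
import pandas as pd

# Hypothetical daily factor returns over 2010-2017 (illustrative only).
idx = pd.date_range('2010-01-01', '2017-12-31', freq='B')
rets = pd.Series(np.random.default_rng(42).normal(0.0005, 0.01, len(idx)),
                 index=idx)

# Method 1: split the data in half chronologically.
half = len(rets) // 2
train_a, test_a = rets.iloc[:half], rets.iloc[half:]

# Method 2: alternate whole years -- train on even years, test on odd years.
train_b = rets[rets.index.year % 2 == 0]   # 2010, 2012, 2014, 2016
test_b = rets[rets.index.year % 2 == 1]    # 2011, 2013, 2015, 2017

print(np.unique(train_b.index.year).tolist())  # [2010, 2012, 2014, 2016]
```

The alternating-year scheme is the one that guards against a regime change happening to coincide with the train/test boundary.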
Sometimes I find that Q's lack of direct answers to direct questions seems frustrating, but maybe there is an underlying reasoning behind it. Although there are other possible explanations, presumably Q's intention is simply to try to encourage as much algo diversity as possible, even though it can be frustrating to so frequently have to "tease out" information by inference. For example, your comment: "What I have since learned ... is that they would like us to use as much data as possible". Just FYI, at the QuantCons in Singapore for the last 2 years, Delaney offered (approximately) the following comment to people who aspire to win an allocation, saying that he had THREE hints for them, namely:

- Don't only consider price data,
- Look at Alternative data, and
- Use non-pricing data. ;-))

With regard to data hold-back, splitting data sets, and avoiding over-fitting, there are lots of different ways to do it (e.g. your Bayesian approach, the webinar & elsewhere), and also in acknowledgement of @LeoM's excellent comment: "I wanted to elaborate on that because I think that goes to the core of strategy development. Are we accounting for risk exposures in a way that the strategy is balanced in all market regimes without knowing which market regime it is currently operating in". Sometimes we don't even know what regime the market is currently in. And of course we don't know what the market will throw at us in future. Will it be anything like what we have already seen at some time in the past? Will it be nothing like what we have ever seen before in THIS market, but nevertheless maybe something similar to what has been seen in some other, completely different market? Is data from other unrelated markets just a useless diversion, or is it in fact a plausible analogue for what MIGHT happen in a possible future market regime in our market, even if never seen before in our specific data set(s)?
And of course, on the other hand, we always have the question of how we can really avoid, or at least minimize, the adverse impact of over-fitting or data-mining bias, or whatever else we call it. I have 2 ideas that I come back to in my own personal system development outside of the Q context, namely:

1) Consider possible use of ALL data, from all real markets, everywhere, over all timescales and all time periods. Why? Because this is the only way to get a look at the full spectrum of possible market regimes that our system might need to be prepared for in an always-unknown future.

2) Use NO actual historical data series at all. Why? Because this is the only way that our system can avoid all the various forms of over-fitting. Base the system entirely on logic only, and avoid anything that looks like data mining in any way.

Neither of these is a "conventional" approach, and perhaps you might be skeptical, but personally I have benefited from at least considering the key aspects of both of these rather extreme ideas. Cheers, all the best from TonyM. Looking forward to more interesting & practical discussions with you.

On the theme of real markets everywhere... Value and Momentum Everywhere

Yes, good example, you got it. Some great ROBUST systems can be designed based only on thinking carefully about different aspects of known archetypal market behaviors and what therefore "should" work, without using specific data at all!

I'll quickly step in here and say that I agree. The best approach is simply borrowing from what we've learned is the best approach in the natural sciences. Use your understanding of the world (markets) to come up with an idea for something that is true, or should be true and probably isn't due to inefficiency. Then create a model which predicts future states based on this hypothesis.
Certainly you should let the data inform your world view as you constantly update your prior, but remember that it's prior -> hypothesis -> model -> test -> updated priors. I did a long-form webinar about this a while ago if anybody is interested.

Model name: Portable Alpha 1
Data sample considered: 8 years
Backtest: 4 years
Typical holdings: 900 long, 900 short
All Quantopian Contest constraints satisfied? YES

SPY beta: 0.00
Jensen's Alpha: 2%
Sharpe Ratio: 1.10
Momentum beta: 0.00
Size beta: 0.01
Value beta: 0.00
Short-term reversal beta: 0.00
Volatility beta: -0.01

Model name: Portable Alpha 2
Commentary: Adds an extra factor to Portable Alpha 1. Improved Sharpe Ratio and exposure profile.
Data sample considered: 8 years
Backtest: 4 years
Typical holdings: 900 long, 900 short
All Quantopian Contest constraints satisfied? YES

SPY beta: 0.00
Jensen's Alpha: 2%
Sharpe Ratio: 1.49
Momentum beta: 0.00
Size beta: -0.01
Value beta: 0.00
Short-term reversal beta: 0.00
Volatility beta: -0.01

@Antony Jackson, very impressive indeed! Out of curiosity, are these two alpha factors, or either one, derived from fundamental data supplied by Morningstar? The reason I ask is because of active discussions on the shortcomings of fundamental data by Morningstar, and my post here.

Hi James, yes, 3 fundamental factors, I'm afraid!

Hi Antony, I have been testing 1 up to 5 fundamental factor combinations, scored across QTU over 10 years, and sometimes the shortcomings of the fundamental factor(s) used manifest themselves in backtest performance in areas like low turnover, inconsistent returns / Sharpe, etc. Don't be afraid, if you do due diligence on the fundamental factors that you use.
Things like checking and verifying that the factors you're using all have 4 quarters of data, if that is the frequency of reporting, and throwing away the NaNs or stocks with insufficient data. My point being that data used as inputs should have veracity and consistency; otherwise, garbage in, garbage out. I would rather that Q perform these data integrity checks and standardization, much like they did when they came up with the QTU universe, filtered and processed to their specs.

Model name: Portable Alpha 3

I'll make this my last post on the subject, as otherwise it gets monotonous! I've added an event-driven risk factor to this strategy, which you can see comes through on a regular basis in the turnover chart. Most of the exposures are zero or close to zero, with a large, well-diversified portfolio. The key performance statistics are:

Sharpe Ratio: 1.73
Alpha: 3%

The three models Portable Alpha 1, Portable Alpha 2, and Portable Alpha 3 are now my three contest entries, so it will be interesting to see how they perform.

Hi Antony, great work! I'd be interested to know how correlated they are to each other. Have you tried to run the combined factors through Delaney's alpha correlation check notebook? If they are not very correlated, why not combine the factors into a single algo?

Hi Joakim, it is the same algo each time, adding an extra factor in each version. Cheers.

Gotcha, thanks! It's an impressive one. Have you tested it OOS at all to see if/how overfit it is? Did you test the factors in AL?

Yes, at the risk factor selection stage, I look for consistency in the periods 2010-2014 and 2014-2018. Then I use a Bayesian method for the 2014-2018 period. This involves a 'prior' model unrelated to the data, which helps mitigate data-snooping risk.
To be honest, I only use Alphalens to check the factor data looks sensible for complicated factors, and for analyzing factors that generate large turnover in the backtester.

I see. Have you checked the correlation between the factors' return streams? Are they equally weighted, or do you do something fancier? Mine are all equal-weighted currently. I may look at giving more weight to stronger factors in the future though.

Just equally-weighted at the moment. Seems like a potential minefield in terms of overfitting to model the factor covariance matrix as well! If I were to put different weights on individual factors, I think I would run them as separate strategies and use (1 / Vol) for the weights.

Fair enough. Equal weighting of two highly correlated factors effectively means a stronger weight on those factors than on other, less correlated factors though, right? Not an easy thing, not to me anyway, and I'm very worried about overfitting too.

Yes, you are right, Joakim. I haven't addressed correlation between the factors yet, and sidestepped it by using equal weights, but it can't be efficient to include things that are close proxies for each other.

Haven't posted for a while. This is a balance sheet-based strategy with 2 underlying risk factors. I have changed my portfolio construction technique such that all exposures lie in the range -0.05 < beta < +0.05. The intuition behind this informal rule is that too much constraining may kill the original alpha. Usual set-up: $10 million notional, default cost model.
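The (1 / Vol) weighting idea mentioned above can be sketched as follows (the sleeve names and return streams are invented):

```python
import numpy as np
import pandas as pd

# Hypothetical daily return streams for three factor sleeves with very
# different volatilities (all names and numbers are made up).
rng = np.random.default_rng(7)
sleeves = pd.DataFrame({
    'value':    rng.normal(0.0004, 0.008, 500),
    'momentum': rng.normal(0.0004, 0.016, 500),
    'quality':  rng.normal(0.0004, 0.004, 500),
})

# Equal weights ignore the scale differences between sleeves; (1 / vol)
# weights, normalized to sum to one, roughly equalize each sleeve's risk
# contribution (correlations between sleeves are ignored here).
inv_vol = 1.0 / sleeves.std()
weights = inv_vol / inv_vol.sum()
print(weights.round(2))
```

As the thread notes, this sidesteps rather than solves the correlation problem: two highly correlated sleeves still end up effectively double-weighted.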


NB: I now perform the in-sample / out-of-sample testing at the AlphaLens stage. I split an 8-year period in two, and look for a statistically significant IC of > 0.10 in both samples.
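A rough sketch of the split-sample IC check described in the note (hypothetical data; on the platform AlphaLens computes the ICs for you, this just illustrates the pass/fail logic):

```python
import numpy as np
import pandas as pd

def spearman_ic(factor, fwd):
    # Spearman rank IC = Pearson correlation of the ranks.
    return factor.rank().corr(fwd.rank())

def passes_both_halves(factor, fwd, threshold=0.10):
    """Keep a factor only if its rank IC clears the threshold in BOTH
    the first and the second half of the sample."""
    half = len(factor) // 2
    ic1 = spearman_ic(factor.iloc[:half], fwd.iloc[:half])
    ic2 = spearman_ic(factor.iloc[half:], fwd.iloc[half:])
    return (ic1, ic2), bool(ic1 > threshold and ic2 > threshold)

# Hypothetical factor with genuine but noisy predictive power.
rng = np.random.default_rng(1)
f = pd.Series(rng.normal(size=2000))
fwd = 0.3 * f + pd.Series(rng.normal(size=2000))  # forward returns
(ic1, ic2), keep = passes_both_halves(f, fwd)
print(keep)
```

Requiring the hurdle in both halves is what makes it "surprisingly high": a factor fitted to one regime usually fails the other half.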

Good call I reckon. I try to do something similar. Don't think I ever got a mean IC of > 0.1 though. If I get a mean IC of around 0.01 and a risk-adjusted IC of around 0.1, I'm happy. :)

Possibly looking to assign more weight to factors with historically higher risk adjusted IC.
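One simple way to weight factors by historical risk-adjusted IC, sketched with made-up IC histories (mean IC divided by its standard deviation, floored at zero, then normalized; this is one plausible reading of the idea, not the author's actual scheme):

```python
import numpy as np
import pandas as pd

# Hypothetical histories of daily ICs for three factors.
rng = np.random.default_rng(3)
ic_history = pd.DataFrame({
    'f1': rng.normal(0.02, 0.10, 252),
    'f2': rng.normal(0.01, 0.05, 252),
    'f3': rng.normal(0.01, 0.20, 252),
})

# Risk-adjusted IC (mean IC / std of IC), floored at zero and normalized
# into factor weights, so noisier ICs earn a smaller allocation.
risk_adj_ic = (ic_history.mean() / ic_history.std()).clip(lower=0)
weights = risk_adj_ic / risk_adj_ic.sum()
print(weights.round(2))
```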

For this next model, I wanted to find out if it's possible to develop a model only using feedback from the 'new' backtest screen. It turns out that the graphics on the style exposures are particularly clear.

I've found that it's possible to run long backtests, but difficult to run the tearsheets to completion, so this has been a very useful exercise.

This model only has a 2-year backtest, just to get used to the visual feedback from the new screen.

Model Name: Robust
Jensen's Alpha: 4%


Hi Antony,

I saw this issue come up a few times now where the backtests run but the tearsheets hang. Did you know you can modify the notebook to run other subsets of analyses vs. the full tearsheet option? e.g. create_simple_tear_sheet will return just the summary stats table and a subset of the basic plots.

I'm copying a nice answer Cal gave someone in helpscout below, in case it's useful to have in more detail.

Cheers, Jess

... all create_full_tear_sheet() does is make calls to other tear sheet generators, which you can run individually. Analyzing your backtest in this fashion will allow you to create tearsheets on larger backtests than would be possible otherwise.

Here are the functions that create_full_tear_sheet() calls:
create_returns_tear_sheet()
create_interesting_times_tear_sheet()
create_round_trip_tear_sheet()
create_capacity_tear_sheet()
create_risk_tear_sheet()
create_bayesian_tear_sheet()
create_perf_attrib_tear_sheet() (this one is usually the most memory hungry)
create_position_tear_sheet()
create_txn_tear_sheet()

Also, create_simple_tear_sheet() is a smaller version of create_full_tear_sheet(). You might be able to get away with running that one.

For example, try running the following code in a research notebook:

bt = get_backtest('<your_backtest_id>')  # pass your backtest's ID string
bt.create_returns_tear_sheet()

Lastly, a few of the functions that create_full_tear_sheet() calls are hidden behind if statements. For example, create_txn_tear_sheet() will only run if your backtest passes the following test:

if transactions is not None:
    create_txn_tear_sheet()

You can view all of the functions, and the various attributes that an algorithm must have in order to create that particular tear sheet here:
https://github.com/quantopian/pyfolio/blob/master/pyfolio/tears.py#L67

Thanks very much, Jess. That's very helpful.

I just about managed to get the tearsheet to run for a 4-year backtest.

The results seem to confirm that the 'new' backtest screen is sufficient to construct an adequate portfolio.

Model Name: Robust II

Sharpe Ratio: 2.38
Calmar Ratio: 4.19


This is a note to myself about the current state of my methodology. I've revisited an algorithm from a while back that had the title 'Portable Alpha 3' and made the following changes:

1. Replaced an ad hoc method, with which I was never entirely happy, with a more robust technique that makes more efficient use of the data.

2. Replaced some stale statistics based on the old Q1500 universe.

I now have a template for a 4-factor model that applies equal weights to each factor. The emphasis is on finding new factors and researching them in AlphaLens. The rest is now a mechanical procedure.
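An equal-weight multi-factor template of the kind described can be sketched as follows (hypothetical factor names and scores; the cross-sectional z-scoring step is a common convention, not necessarily the author's exact mechanics):

```python
import numpy as np
import pandas as pd

# Hypothetical cross-section: raw scores for a 4-factor model over 8 stocks.
rng = np.random.default_rng(5)
raw = pd.DataFrame(rng.normal(size=(8, 4)),
                   index=[f'stock_{i}' for i in range(8)],
                   columns=['f1', 'f2', 'f3', 'f4'])

# Z-score each factor cross-sectionally so scale differences don't matter,
# then combine with equal (1/4) weights.
z = (raw - raw.mean()) / raw.std()
combined = z.mean(axis=1)

# Demean and scale into dollar-neutral weights with unit gross exposure.
weights = combined - combined.mean()
weights = weights / weights.abs().sum()
print(weights)
```

With the combination step fixed like this, the research effort really does reduce to finding and vetting new factors.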


My final entry of 2018. Looking forward to using the platform for teaching over the next couple of months.

First, a summary of the methodology:

1) In the research phase, identify fundamental factors that are profitable in both the 2010-14 and 2014-18 periods. This turns out to be a surprisingly high hurdle to overcome.

2) Due to the large number of positions held, for the 2014-18 period construct a portfolio that puts approximately the same weight on each factor while stripping out exposures to common risk factors.
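Step 2's 'stripping out exposures to common risk factors' can be illustrated with a least-squares projection (a sketch only; on the platform this is normally handled by constraints in the optimizer rather than by hand):

```python
import numpy as np

# Hypothetical: candidate weights w for 100 stocks, and a loadings matrix B
# for 5 common risk factors (sizes and values are illustrative).
rng = np.random.default_rng(11)
w = rng.normal(size=100)
B = rng.normal(size=(100, 5))

# Regress w on the loadings and keep the residual: the residual of a
# least-squares fit is orthogonal to every column of B, i.e. the adjusted
# weights have (numerically) zero exposure to each common risk factor.
coef, *_ = np.linalg.lstsq(B, w, rcond=None)
w_neutral = w - B @ coef

print(np.abs(B.T @ w_neutral).max())  # ~ 0, at machine precision
```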

The next model is a five-factor fundamental model. The performance statistics are as follows:

Model Name: Blend II

Sharpe ratio: 1.90
Calmar ratio: 2.53
Jensen's alpha: 3.0%
SPY beta: 0.00

Momentum beta: 0.00
Size beta: 0.00
Value beta: 0.00
Short-term reversal beta: 0.00
Volatility beta: 0.00

This is the 'purest' alpha strategy I have written so far; graphically the total return line and the specific return line are tight to each other, and the common return line is virtually horizontal.

Looking forward to 2019, I have one hyperparameter in this methodology which at the moment I do not vary. I plan to tune this parameter by training algorithms in the 2012-16 period, and using the 2017-18 period as a validation set.
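The planned train/validation tuning loop might look roughly like this (entirely synthetic returns; in practice each candidate value would be a separate backtest, and the 2012-16 portion is where the model itself is fitted):

```python
import numpy as np
import pandas as pd

def annual_sharpe(returns):
    # Annualized Sharpe ratio, assuming 252 trading days per year.
    return np.sqrt(252) * returns.mean() / returns.std()

# Hypothetical: daily strategy returns for each candidate hyperparameter
# value (the streams improve with h by construction, purely to illustrate).
idx = pd.date_range('2012-01-01', '2018-12-31', freq='B')
rng = np.random.default_rng(2)
candidates = [0.1, 0.5, 1.0]
results = {h: pd.Series(rng.normal(0.002 * h, 0.001, len(idx)), index=idx)
           for h in candidates}

# Pick the value with the best Sharpe over the held-out 2017-18 window.
best = max(candidates,
           key=lambda h: annual_sharpe(results[h].loc['2017':'2018']))
print(best)  # 1.0
```

The key discipline is that 2017-18 is touched only once, to select among candidates, never to refit.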


Inspired by the great work in these two threads,

Alternative Test For Overfitting

and

Tackling Overfitting

I revisited my latest model, Blend II, and ran it from 2010-2018.

My portfolio typically holds approximately 1800 positions, so there is insufficient memory to run an 8-year tearsheet, but the 8-year backtest runs to completion without any problems.

I was pleased to see that the model survives the entire period, particularly with regard to the catalog of risk constraints. The maximum drawdown only slightly deteriorates to -1.5% over the 8-year period.

There is some degradation of the Sharpe Ratio 'out-of-sample', but deep down I suspected that my Bayesian technique has been updating the 'prior' too rapidly with respect to the data.

Rather than rebuilding this model, however, I am going to adjust my overall technique ready for the next model. To rebuild the current model would only leak the current out-of-sample period back into the training set.

Overall, very pleased. This approach makes the memory limitation on tearsheets a plus for robust model development.

Hi Anthony,

There is some degradation of the Sharpe Ratio 'out-of-sample', but deep down I suspected that my Bayesian technique has been updating the 'prior' too rapidly with respect to the data.

What do you mean by "...updating the 'prior' too rapidly with respect to the data."? Can you please elaborate on that?

Hi James,

I don't want to elaborate too much, but I'm thinking in general of techniques that shrink estimated parameters towards zero.

I thought the concept was explained well in one of the latest webinars (in the context of Ridge Regression):

Home Runs and Strike Outs

Edit: the direction I'm heading long-term is in line with the recent asset pricing literature --- is the market pricing a risk? A conservative prior is to say that the value of the risk premium parameter in a Fama-MacBeth two-pass regression is zero, and then (possibly) update that opinion in light of the data.
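A minimal illustration of shrinkage toward a zero prior with a parametric intensity (invented numbers; a real Bayes-Stein estimator would set the intensity from the data rather than by hand):

```python
import numpy as np

def shrink_toward_zero(estimates, intensity):
    """Parametric shrinkage toward a zero prior: intensity = 0 keeps the
    sample estimates, intensity = 1 collapses them onto the prior."""
    return (1.0 - intensity) * np.asarray(estimates, dtype=float)

# Hypothetical risk premia estimated in a Fama-MacBeth second pass.
premia = np.array([0.08, -0.02, 0.05])

# A family of estimators indexed by the shrinkage intensity.
for lam in (0.0, 0.5, 1.0):
    print(lam, shrink_toward_zero(premia, lam))
```

"Updating the prior too rapidly" then corresponds to an intensity that decays toward zero too quickly as data accumulates.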

Hi @Anthony, regarding "This approach makes the memory limitation on tearsheets a plus for robust model development." I have had a similar experience. Sort of a blessing in disguise actually.

I agree Leo. Forced holdouts of various types definitely help.

Hi Anthony,

Thanks for the reference. I'll have a look at it when I get a chance.

Note to self:

Today successfully completed a methodology for generating a family of Bayesian shrinkage estimators, with parametric intensities.

Just cataloguing a new algorithm;

Model Name: Bayes-Stein

Backtest Period: 12-2014 to 12-2018

Annual Return: 4.2%
Annual Volatility: 1.6%

Sharpe Ratio: 2.56
Calmar Ratio: 3.76

Max Drawdown: 1.1%


Not too shabby either I reckon. Well done!! I guess you’re not using FactSet for this one?

Is this OOS? If not, how does its OOS stats compare?

I do the in-sample / out-of-sample analysis at the factor level (4-year / 4-year split).

Then it's a mechanical Bayesian construction method.

I haven't got round to FactSet, yet!

The real headache for me has been getting tearsheets to run for more than 4 years. I think I've found a way round that problem, though, and am working on something that includes social sentiment data, starting back in 2011 when the message volume started to be decent.

Cool! Very impressive strategies!

Killing all notebooks and starting with fresh memory at around 5% usually works for me for up to 7-8 year long backtests. You may be trading more and holding more positions though so might use more memory perhaps.

This is the first version of the final model I've written before moving on to FactSet data.

The underlying model is based on five fundamental factors and social sentiment. This version just applies equal weight to each factor.

The sample period is 2014-01-01 to 2018-12-28.

The strategy has a vol of 1.5% and a Sharpe of 1.7.


I intend this to be the final post I make in this thread, the reason being I've just made the finishing touches to an overall methodology that started way back in July with the 1st tearsheet webinar.

Before posting my final model, I wanted to share some general findings. They may not be to everyone's taste, but they seem to be working for me.

• Out-of-sample testing needs to occur at the alpha research stage. I currently split the sample into two 4-year periods and throw the signal away if it fails in either period. A similar, more sophisticated approach has been put forward by Dr. Thomas Wiecki that essentially still splits the sample in two, but guards against regime change by alternating in-sample and out-of-sample periods across all years.

• The backtester is best used to test if the alpha signals survive real-world conditions and --- crucially --- to mitigate common risks. If out-of-sample testing hasn't been addressed at the research stage, it's likely to be too late at the completed algorithm stage. The data snooping has already occurred.

• Risk mitigation acts as a final check against overfitting. I have found that, if you start with the same alphas but use different portfolio construction techniques, once common risks are hedged away the models pretty much arrive at the same place. I tested this several times in the contest by running models in parallel and noting the similarities in the equity curves.