VALUE + GROWTH + QUALITY : Composite Combination

With a little help from my friends (shout-out to @Viridian Hawk), I've been trying different ways of combining my various factor composites into a single strategy. It's been interesting to see how they interact with each other. I've also transformed the factors/composites a bit and tried to manage the final exposures, but I'm probably not doing this entirely 'correctly' (what can I say, I'm still learning).

Are they better off as individual strategies, or more robust when combined (or just more overfit)? I have no idea, but I'd be interested to hear what others think.

Since my VALUE composite failed the profitability requirement in the contest, I'll submit this one instead, which has about a third 'Value' baked into it. Since both the GROWTH and QUALITY composites were still profitable during the last 2 years, I think this combined one should be ok as well. However, since this one has such low volatility (around 1.3% or so), I plan to submit a more concentrated version (with volatility > 2%) to the contest, so it doesn't get penalized by the volatility floor in the score calculation. A bit unfortunate if you ask me.

Any feedback welcomed.


Alpha decay & Risk exposures NB:


Thanks Antony. Yes, the backtest is trading at the open but the notebook is using close prices. All factors were also developed in Research (not the IDE), and only the ones that held up on held out quarters were kept and included in the composites. I also have a 'hypothesis' for each factor that I came up with before looking at the statistics (i.e. I'm not fitting my hypothesis based on what the stats are saying), and in each 'factor composite' they all have a similar economic rationale, e.g. 'Value' in the Value Composite. The combination piece, and the Style risk regressions for this strategy could definitely be overfit though, as I did these in the IDE, which may not be best?

But yes, I agree, these results are most likely quite a bit too good to be true. I tend to learn best from making mistakes though, and that's my main aim here, it's not trying to impress anyone.

PS: In case anyone is wondering, Cliff's note referred to above I believe is from his Liquid Alt Ragnarök paper, especially page 6. Model robustness is indeed key, and preferred over a high in-sample Sharpe Ratio model that doesn't generalize on future data.

Are they better off as individual strategies, or more robust when combined

For the contest, combined is likely stronger. At the fund level, combined signals are also advantageous due to the higher Sharpe, stability, and lower volatility. However, I would think under the new structure for the Q Fund, Quantopian would want to combine the individual algos themselves.

penalized by the volatility floor in the score calculation

I must have missed this -- I didn't realize the contest has a volatility floor. This further supports my critique that the contest favors algorithms with intermittent winning streaks over algorithms like yours with stability approaching 1.0.

I wonder whether it's better to increase your position concentrations at the final combination stage (via dropping the positions with the lowest weights and then re-normalizing the weights) as opposed to individually on each style composite. It seems like it should be, no?

Better to build simple models with sustainable SRs in the range 1.25 - 1.50 that generalize, in my opinion.

I agree it's better to build robust signals and avoid overfitting. However, in these forums, and in lectures and books, experts often claim that you should be exceedingly skeptical of strategies with high Sharpe Ratios. I don't buy it. If you combine enough "simple models with sustainable SRs in the range of 1.25 - 1.50", you'll soon enough arrive at SRs of 4.0 - 5.0+. I think one should be equally skeptical, or rather even more skeptical, of low Sharpe Ratios. Happening upon a weak spurious correlation is orders of magnitude easier than arriving at a strong spurious correlation. I don't see how a Sharpe Ratio of 1.25 in and of itself will be any more sustainable than one of 4.5.

So what I'm getting at is, I don't think Joakim's 4.0+ Sharpe ratio in this super-composite should raise eyebrows any more than the 2.5 Sharpe Ratio algorithms it is comprised of. Each of those style composites is probably made up of three or four simple factors, each closer to the 1.25 - 1.5 SR range. Ultimately the question hinges on whether those individual factors are robust or overfit. It sounds like Joakim followed a pretty good process, but as you point out he may also be inclined to juice his in-sample SR.
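For back-of-envelope intuition on that combination argument: for N equal-volatility sleeves with pairwise return correlation rho, the blended Sharpe scales roughly as SR * sqrt(N / (1 + (N-1)*rho)). A quick illustrative sketch (numbers made up, not anyone's actual strategy):

```python
import numpy as np

def combined_sharpe(sr, n, rho):
    """Approximate Sharpe of an equal-weighted blend of n sleeves, each
    with Sharpe `sr` and pairwise return correlation `rho` (standard
    result for equal-mean, equal-volatility return streams)."""
    return sr * np.sqrt(n / (1.0 + (n - 1) * rho))

# Ten nearly-uncorrelated SR-1.3 sleeves blend to a much higher SR...
print(round(combined_sharpe(1.3, 10, 0.05), 2))  # -> 3.41
# ...but the benefit collapses as correlation rises:
print(round(combined_sharpe(1.3, 10, 0.50), 2))  # -> 1.75
```

The point being that a 4.0+ composite SR is exactly what you'd expect from stacking several modest, mostly-independent sleeves.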

Somewhere there's a thin line between juicing in-sample Sharpe ratio and making improvements that are somewhat predictive. If out-of-sample performance improves at all, is it not worth it?

Changing rebalance to close is a good start. What other wrenches can he throw in the gears to test robustness?

@Viridian and @Antony,

Thanks for all the feedback - I find it extremely valuable! Here are my thoughts on some of your comments:

I must have missed this -- I didn't realize the contest has a
volatility floor. This further supports my critique that the contest
favors algorithms with intermittent winning streaks over algorithms
like yours with stability approaching 1.0.

I tend to agree. I think they do it to lessen the impact of anyone trying to 'game' the score by creating an artificially low volatility portfolio 63 days from today, which I can understand. My suggestion would be to drop the volatility floor for any strategies running for 63 days or more in the contest, as those would be using 'real' volatility then.

I wonder whether it's better to increase your position concentrations
at the final combination stage (via dropping the positions with the
lowest weights and then re-normalizing the weights) as opposed to
individually on each style composite. It seems like it should be, no?

This is basically what I'm doing. Not necessarily because I think it would perform better (I honestly don't know), but more because I found it easier to implement. I'm way below average in this forum when it comes to coding ability.
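For what it's worth, that concentration step can be sketched in a few lines of pandas: drop all but the strongest names on each side of the combined weight vector, then re-normalize each side. The function name and the +/-0.5 per-side target below are my own illustrative assumptions, not Joakim's actual code:

```python
import pandas as pd

def concentrate(weights, n_per_side):
    """Keep only the n largest longs and n most-negative shorts from a
    combined weight Series, then re-normalize to +0.5 / -0.5 per side
    (dollar-neutral, gross leverage 1.0)."""
    longs = weights[weights > 0].nlargest(n_per_side)
    shorts = weights[weights < 0].nsmallest(n_per_side)
    longs = 0.5 * longs / longs.sum()
    shorts = 0.5 * shorts / shorts.abs().sum()
    return pd.concat([longs, shorts])

# Toy example: six names concentrated down to two per side.
w = pd.Series({'A': 0.30, 'B': 0.10, 'C': 0.05,
               'D': -0.25, 'E': -0.10, 'F': -0.08})
cw = concentrate(w, 2)  # keeps A, B, D, E; each side sums to +/-0.5
```

Doing this once, at the final combination stage, also has the nice property that a name only survives if the *combined* signal is strong, rather than being strong in any single composite.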

Regarding the high in-sample Sharpe Ratio, I agree that it's pretty useless if it doesn't work on future data. For this one, I'd be happy if I get a Sharpe Ratio of 1 or so (during a 'normal' market regime) on future data, or even just higher than the S&P's historical Sharpe Ratio of about 0.4. Maybe even that's too optimistic? So yeah, it might be better to try to identify the factors with the highest economic merit, and place more conviction on those instead?

A lower in-sample Sharpe is completely fine with me if it means a higher likelihood of a higher Sharpe on future data. In general though, I struggle to distinguish whether a factor is performing poorly OOS because the factor is crap, or because the OOS period is an unfavorable one for that particular factor. This is a big reason why I like to use Thomas' odd/even quarters cross-validation over an extended period of time.
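For anyone unfamiliar with it, the odd/even-quarters split can be sketched along these lines (a minimal stand-alone version of the idea, not Thomas' actual implementation):

```python
import pandas as pd

def odd_even_quarter_split(returns):
    """Interleaved cross-validation split: number calendar quarters
    sequentially, then send odd-numbered quarters to one fold and
    even-numbered quarters to the other, so both folds span roughly
    the same market regimes. `returns` needs a DatetimeIndex."""
    qnum = returns.index.year * 4 + returns.index.quarter
    return returns[qnum % 2 == 1], returns[qnum % 2 == 0]

# Fit factors on one fold, validate on the other, then swap.
daily = pd.Series(0.0, index=pd.date_range('2013-01-01', '2014-12-31'))
odd_fold, even_fold = odd_even_quarter_split(daily)
```

Because the folds interleave rather than split chronologically, a factor that only "worked" in one regime tends to fail one of the folds.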

Just FYI here's the same strategy rebalancing an hour before close, and with slippage turned off, both in line with the new guidelines.

I only rebalance at the open because I believe I'll get a slightly higher score in the contest that way with this type of strategy (I could be wrong), and I also like to keep default trading costs as that's what the contest uses. Until/unless the contest changes to be better aligned with the new guidelines, I'm unlikely to change these.


I normally don't test/validate (or train) on this period for two reasons:

  1. Not all datasets are available this far back, and
  2. The GFC period is so unusual, I don't think it's very well suited for either training or testing.

Basically y'all have to trust me on this, but I haven't trained on anything before Jan 2010 (so the first period is the OOS period in this case). This is a single backtest of these combined factor composites, rebalancing at the open with default friction. Holds up pretty well if you ask me. :)

I had to remove two of the factors as I don't have access to data for them this far back. I'm quite confident in those factors, however, so I'm not too concerned about that. Honestly, I was quite skeptical before seeing this, but now I'm starting to perk up. :)

Not sure what to make of the high kurtosis -- could this be because of the Quant Quake in mid-August 2007?

Trying to poke holes in this, the only thing I can think of now is that there's some sort of look-ahead bias in the datasets. I seriously doubt that, but it's all I can think of. Thoughts?


Here's the corresponding Alpha decay / Risk exposure NB.



Thanks @Antony,

I’m doing exactly this, just waiting for the performance data from the contest submission. I don’t have high hopes for this period, however, for the reason you mentioned. I don’t think it’s reasonable to expect quant factors to do well at all times though, and the last year or so seems to be one of those periods. Maybe I’m fooling myself though?

Here's the tearsheet from the contest and the hold out period. Quite disappointing to me, but I kinda expected this. The VALUE composite drags it down a bit I reckon. Is the poor performance in the last year due to it being overfit, or because of the recent difficult period for traditional quant factors, or something else? I honestly don't know but I'd welcome any thoughts on this.

Note: This one is quite a bit more concentrated, about 150 long and 150 short, so it doesn't get affected by the contest volatility floor.


Interesting that your specific returns dip in December 2018 when everybody (CNBC) thought the next major market crash had begun.

If you're going to take the composite concept all the way, it'd benefit from an additional factor that performs its best during market panics. (I just checked your Quality factor, and there appears to be no "flight to Quality" during this period.) So maybe mean reversion is the next project?

Also, have you given any thought to dynamically shifting the weights between the different factors depending on market regime? I would think during times of low volatility you might want to tilt more towards Growth, for example. Or during periods of relatively high valuations you might want to tilt away from Value or Quality.


Mean reversion is not really my thing. I can't seem to get much working unless I turn off trading costs. It would also make this strategy way too complex (it probably already is), I reckon. Buying quality growth companies at reasonable prices (and shorting expensive negative-growth junk stocks) makes good intuitive sense to me, and introducing short-term reversal signals will likely just result in fitting on noise.

I did think about creating a BAB (Betting Against Beta) / low-volatility 'composite' and though I had some success, all the Returns turned out to be 'Common' with negative 'Specific' returns, so I decided against that. In addition to making the combined composite more complex, it would also likely dilute any alpha.

Volatility timing is not really my thing either. Not using historical/realized volatility anyway (though Yulia Malitskaia seems to have had great success with this), and I don't think we have access to implied volatility unless we use our own sources via self-serve.

I would rather sin a little by using the composites' relative strength (relative to each other) to slightly tilt the weights, as per my earlier post. One could then do either a 'momentum' tilt (more weight to the composite that recently outperformed the others), or a 'reversal' tilt (more weight to the composite that recently under-performed). I think a 'reversal' tilt would make the most sense (to me anyway). Unfortunately, I don't really know how to set up that framework. Maybe you could help?

For now, I'm just gonna leave them running in the contest to see how they perform on live data the next 3 months. I'm starting to think that @Antony is right though, and that my factor composite models are too complex in their current form. I am quite happy that it held up well during 2005-2010, which you can say is my 'validation' period. I don't think I have looked at this period since my attempt to 'rescue' my Warren Buffett on the Move strategy from being overfit, back in January. There are a few common factors from that strategy in some of my composites however.

I might revisit these ones in 3 months to see if I can 'simplify' them a bit. Plenty of stuff to work on in the meantime. :)

Just a quick update on my different factor composites in the Q Contest. The GROWTH and QUALITY ones have been running for almost a month now, and neither of them have limited position concentration. VGQ 150 is the combined VALUE + GROWTH + QUALITY composites, equal weighted portfolio blend, and limited to 150 long and 150 short positions.

Too short a period to make anything out of this yet of course. I'll be interested to see if the concentrated ones (VALUE and VGQ) will (in general) perform better than the non-concentrated ones (GROWTH and QUALITY).

@Joakim, thanks for your update. Have you considered doing a fifth algo that will do factor timing?

Thanks @James! What kind of factor timing were you thinking of? I have thought of factor tilting of these 3 composites, based on the factors’ recent relative strength. I don’t really know how to do that though, do you?

Factor timing and factor tilting are similar. The main idea behind factor timing is to establish conditions, based on historical data, that score the weight of each factor using some metric(s) or indicators. So it's like discerning which factor is in favor or out of favor over your timeframe (lookback). In your example, the relative strength of the factors can be coupled with their momentum and volatility to form your timing/tilting allocation indicator.
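A rough sketch of what such a relative-strength tilt could look like, hedged heavily: the function name, lookback, and tilt cap below are all illustrative assumptions, not a tested timing model:

```python
import pandas as pd

def relative_strength_tilts(composite_returns, lookback=63,
                            mode='reversal', tilt=0.10):
    """Tilt equal weights across factor composites by trailing relative
    strength. `composite_returns` is a DataFrame of daily returns, one
    column per composite. 'momentum' overweights recent winners;
    'reversal' overweights recent laggards. `tilt` caps the deviation
    at +/-10% of the equal weight."""
    base = 1.0 / composite_returns.shape[1]
    trailing = composite_returns.tail(lookback).sum()  # trailing return per composite
    rank = trailing.rank()                             # 1 = weakest performer
    if rank.nunique() == 1:                            # all tied: stay equal-weight
        return pd.Series(base, index=composite_returns.columns)
    score = 2.0 * (rank - rank.mean()) / (rank.max() - rank.min())  # in [-1, +1]
    if mode == 'reversal':
        score = -score
    weights = base * (1.0 + tilt * score)
    return weights / weights.sum()

# Toy example: QUALITY lagged recently, so a reversal tilt overweights it.
rets = pd.DataFrame({'VALUE': [0.001] * 63,
                     'GROWTH': [0.000] * 63,
                     'QUALITY': [-0.001] * 63})
w = relative_strength_tilts(rets, mode='reversal')
```

Keeping the tilt small (here 10% of the base weight) limits the damage if the timing signal turns out to be noise, which is the "sin a little" spirit of the earlier post.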

Thanks James. Honestly I would love to try that, I just don't know how to do it. It would be great to have a template algo for this, where we could just plug in our factors.