This post delves into what is commonly referred to in the academic literature as “quality” factors. In contrast to value factors, quality factors may be less understood by people outside the finance/accounting fields. I’ll divide this post into five sections:
- What are quality factors
- What are some nice features of quality factors
- Sector neutralizing quality factors
- Backtesting the Novy-Marx quality factor (gross profitability)
- Factor timing on quality factors
I will demonstrate these ideas with a notebook (and an algorithm to follow). Two excellent papers on the subject, which I will refer to numerous times, are Asness, Frazzini, and Pedersen, “Quality Minus Junk”, and Novy-Marx, “The Other Side of Value: The Gross Profitability Premium”.
What are Quality Factors?
In very general terms, “quality” stocks are defined as stocks with features that should command higher prices. Examples would be stocks with high profitability (for example, return on equity, return on assets, gross profitability, gross margin), high growth (for example, the five-year growth of the above items), conservative accounting (for example, low non-cash accruals), and high payouts (for example, low net equity and debt issuance, high dividend payout ratios). This list of quality factors is by no means comprehensive. For example, Piotroski’s F-Score paper, written by an accounting professor and widely noted among both traders and academics, uses nine accounting variables to pick long and short stock candidates. A key feature of quality factors is that they are based solely on accounting variables, not market values, unlike value factors (for example, Book/Market, Price/Cash Flow, Price/Earnings, Dividend/Price,…) that use market prices to determine whether stocks are overvalued or undervalued.
These quality factors should lead to higher valuations, and in the shared notebook that’s attached, I show that this is indeed the case. The way I measure the valuation premium for high quality stocks is to run a cross-sectional regression of market-to-book (M/B) ratios on a measure of quality, the gross profitability-to-asset ratio (GP/A):
(M/B)_i = a + b (GP/A)_i + e_i
The regression coefficient, b, measures the relationship between quality and valuation premium, and the coefficient is significantly positive - higher quality stocks have higher market-to-book ratios. However, and this is where the potential inefficiency comes in, the market may not be valuing these quality stocks high enough, which implies future expected returns should be larger. In other words, although high quality stocks command a valuation premium, empirical evidence suggests the premium may be too modest.
Nice Features of Quality Stocks
I’m going to focus on the measure of quality used by Novy-Marx, namely gross profitability, which is defined as (revenues – cost of goods sold)/(book value of assets). He argues that gross profitability is a better measure than those based on earnings or cash flow. For example, capital expenditures and R&D (which are not part of cost of goods sold) reduce cash flow and earnings but may improve future operations.
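Computing GP/A itself is straightforward. A minimal sketch, assuming a fundamentals DataFrame with hypothetical column names 'revenue', 'cogs', and 'total_assets':

```python
import pandas as pd

def gross_profitability(fundamentals):
    """Novy-Marx GP/A: (revenues - cost of goods sold) / total assets.

    `fundamentals` is assumed to be a DataFrame with one row per stock
    and columns 'revenue', 'cogs', and 'total_assets'.
    """
    gross_profit = fundamentals['revenue'] - fundamentals['cogs']
    return gross_profit / fundamentals['total_assets']

firms = pd.DataFrame({'revenue': [100.0, 80.0],
                      'cogs': [60.0, 70.0],
                      'total_assets': [200.0, 100.0]},
                     index=['A', 'B'])
print(gross_profitability(firms).tolist())  # [0.2, 0.1]
```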
As a stand-alone factor, gross profitability performs well, but the strength in quality factors like this lies in its interaction with value factors. First of all, quality is usually negatively correlated with value: high quality stocks tend to have higher market-to-book ratios, as we discussed. And quality seems to do well when value doesn’t, and vice versa, so quality is a good hedge for value. But most importantly, stocks that have both value and quality perform better than the sum of the individual factors. Buying value stocks that also have high quality scores apparently avoids the “value trap” – stocks that are cheap but never recover. This interaction between value and quality can be captured by trading the “corner boxes” in a double sort – going long stocks that have both high quality and high book-to-market ratios and shorting stocks that have both low quality and low book-to-market ratios.
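The corner-box construction can be sketched as an independent double sort with pandas `qcut`; the column names 'gpa' and 'bm' below are hypothetical:

```python
import numpy as np
import pandas as pd

def corner_boxes(df, n=5):
    """Independent double sort on quality ('gpa') and value ('bm').

    Long the corner with high quality AND high book-to-market; short
    the corner with low quality AND low book-to-market. `df` holds one
    row per stock.
    """
    q = pd.qcut(df['gpa'], n, labels=False)  # 0 = lowest-quality bucket
    v = pd.qcut(df['bm'], n, labels=False)   # 0 = most-expensive bucket
    longs = df.index[(q == n - 1) & (v == n - 1)]
    shorts = df.index[(q == 0) & (v == 0)]
    return longs, shorts

rng = np.random.default_rng(1)
df = pd.DataFrame({'gpa': rng.normal(size=1000), 'bm': rng.normal(size=1000)})
longs, shorts = corner_boxes(df)
print(len(longs), len(shorts))  # roughly 1000/25 = 40 names per corner
```

Note that because quality and value are negatively correlated in practice, the corner boxes of real data will typically hold fewer names than the independent-sort arithmetic suggests.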
Novy-Marx was certainly not the first to notice the positive interaction between value and quality factors. For example, Joel Greenblatt’s “The Little Book that Beats the Market” uses two factors, a quality factor (ROC) and a value factor (P/E), both measured with respect to the enterprise value of the firm (debt plus equity) rather than equity alone (which, he argues, makes it easier to compare companies with different capital structures).
This strategy has very low turnover (and therefore very low transaction costs and very high capacity). In my backtests, I rebalanced once a month, and even at that frequency the signals are very persistent. Since the factors involve pure accounting variables, they don’t change much over time.
Also, the outperformance of quality portfolios is even larger if we look at the three-factor Fama-French risk-adjusted returns rather than raw returns, since quality stocks tend to have negative exposures to value. Novy-Marx also claims that when he sorts stocks into quality deciles, high quality stocks tend to have larger market capitalizations. The negative exposure to size would further enhance the three-factor Fama-French risk-adjusted returns. However, I did not find any correlation between quality stocks and market capitalization in the recent period I looked at.
Sector Neutralizing Quality Factors
Before presenting the backtesting results, we should step back and discuss how to deal with sectors and industries. When working with accounting data, there can be large differences in various accounting measures across sectors. For example, profit margin, a quality factor, may differ considerably among financial, utility, technology, and consumer discretionary stocks. The same applies to other fundamental factors: dividend yield, profit margins, patents, change in employees, R&D normalized by assets or sales, employee utilization, accruals, stock option expensing, and many others all vary widely across sectors. When selecting stocks, it may be more relevant to compare companies with their peers.
There are numerous ways of dealing with sector differences:
1. Demean the factors by sector
2. Standardize the factors by z-scoring within sectors
3. Force strict sector neutrality (for example, computing quantiles within each sector)
4. Eliminate certain sectors (financial stocks and utilities, for example)
5. Selectively eliminate sectors based on performance
6. All of the above but for industry groups rather than sectors
I focus on demeaning because that’s what Novy-Marx did. In almost all backtests I ran, demeaning reduced the volatility of the strategy and increased the Sharpe Ratio, and in many cases it actually increased the average returns also.
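Sector demeaning (option 1 above) is essentially a one-liner with a pandas groupby. A minimal sketch, assuming index-aligned factor and sector Series:

```python
import pandas as pd

def demean_by_sector(factor, sector):
    """Subtract each sector's cross-sectional mean from the raw factor.

    `factor` and `sector` are index-aligned Series, one entry per stock.
    The demeaned factor compares each company with its sector peers.
    """
    return factor - factor.groupby(sector).transform('mean')

gpa = pd.Series([0.40, 0.20, 0.10, 0.30], index=['A', 'B', 'C', 'D'])
sector = pd.Series(['tech', 'tech', 'util', 'util'], index=gpa.index)
print(demean_by_sector(gpa, sector).round(2).tolist())  # [0.1, -0.1, -0.1, 0.1]
```

Z-scoring within sectors (option 2) is the same idea, additionally dividing by `factor.groupby(sector).transform('std')`.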
Most academic papers demean by industries rather than sectors. Morningstar has 69 industry groups, more than the 24 GICS industry groups or the 47 Fama-French industries, and with our smaller Q1500 universe, it seems too granular to demean by Morningstar’s industry groups.
Let me make a few comments on selectively eliminating sectors. I know it is tempting, especially with tools like Alphalens, where it is easy to examine performance by sector. Indeed, I’ve seen this approach employed in practice numerous times. Lehman Brothers published their Quantitative Stock Selection Model, and one of the features they touted was that they employ a large set of 27 factors but apply them differently to each sector. (I have a hard copy of their report, but don’t have a link I can supply.) For example, in the health care sector, some factors they use are EBITDA to EV, ROE, incremental net margins, intangibles to assets, and an earnings revisions ratio. In contrast, in the technology sector, they use P/E, change in shares outstanding, change in employees, change in debt to assets, earnings revisions, and earnings surprise. Both sectors share a few common factors, like momentum and change in accruals. The weights of the factors also vary across sectors, based on regression results and some subjectivity on their part.
I would caution about the huge potential for data mining here. Even for a single factor, there are 2^11 ways to selectively include it in the 11 sectors (2^11 – 1, if you dismiss the most likely scenario, that it doesn’t work for any sector). There are certainly signals that you might expect, a priori rather than after the fact, to work poorly for certain sectors. For oil companies, ratios involving proven reserves rather than revenues might be useful. Or insider trades may be a stronger signal in sectors where there is more asymmetric information between managers and investors about future products, like technology and pharmaceuticals. But I would be very cautious about overfitting.
Testing the Novy-Marx Paper
Sorting stocks into 10 deciles based on only GP/A results in a Sharpe Ratio of 0.46 and a total return over 14 years of 53%, which is comparable to the Sharpe Ratio of sorting stocks only on B/M, where the total return is 64%. Because quality is a hedge against value, a simple 50-50 mix of GP/A and B/M portfolios has about the same average returns but a much higher Sharpe Ratio than either factor by itself.
The 50-50 mix of GP/A and B/M simply adds the longs and the shorts of the two separate factors, but doesn’t take advantage of any interaction between the two factors. Trading the corner boxes of a double sort, or ranking stocks on each factor and summing the ranks, resulted in significantly higher returns than those of the individual factors.
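The rank-sum combination can be sketched as follows, again with hypothetical column names 'gpa' and 'bm':

```python
import numpy as np
import pandas as pd

def rank_sum_portfolio(df, frac=0.1):
    """Combine quality and value by summing cross-sectional ranks.

    Rank stocks on 'gpa' and on 'bm', add the two ranks, then go long
    the top `frac` and short the bottom `frac` of the combined score.
    Summing ranks rewards stocks that score well on both dimensions.
    """
    score = df['gpa'].rank() + df['bm'].rank()
    n = max(1, int(len(df) * frac))
    ordered = score.sort_values()
    return ordered.index[-n:], ordered.index[:n]  # (longs, shorts)

rng = np.random.default_rng(2)
df = pd.DataFrame({'gpa': rng.normal(size=200), 'bm': rng.normal(size=200)})
longs, shorts = rank_sum_portfolio(df)
print(len(longs), len(shorts))  # 20 20
```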
Book/Market is a relatively weak value signal, and quality factors work even better in conjunction with value factors like P/S or P/CF. However, the goal here is not to come up with an optimized multifactor model but merely to demonstrate the nice interaction between quality and a simple value factor.
Factor Timing on Quality Factors
Because fundamental factors are so noisy and inconsistent, factor timing has become the holy grail of fundamental factor models. There have been numerous attempts to improve performance using factor timing, just as people try to time the overall stock market. For example, the momentum factor is known for infrequent but severe crashes (most recently after the financial crisis) despite performing well overall, and a recent paper, “Momentum Crashes”, argues that a bear market indicator and a volatility forecasting signal can be used to double the Sharpe Ratio of a static momentum strategy.
With quality factors, Asness et al. argue that if there are periods when high quality stocks have higher valuations (measured by high Market/Book ratios), then the market is already pricing in high valuations for quality stocks, and high values now will lead to lower returns in the future. Conversely, when quality is not priced in, quality should have higher future returns. Although Asness et al. don’t directly test a trading strategy for timing quality factors, they run regressions of future factor returns on ex ante measures of quality factor valuations and find a significant relationship.
I backtested several timing strategies that were suggested but not tested in Asness et al. Every month, I performed the cross-sectional regression described earlier, regressing Market/Book value of each stock on the gross profitability of each stock in the universe (actually, I regressed Book/Market, not Market/Book, and reversed the sign later: stocks could have negative Book Values, and it’s not a good idea to look at factor ratios where the denominator can be negative). I first standardized the Book/Market and Gross Profitability, which reduces the influence of outliers and also allows for easier interpretation of the regression coefficient. The slope of the monthly regressions is the signal used for factor timing.
I got very mixed results when I tried factor timing in this way. A few backtests I ran marginally improved performance, but a majority did worse than the static strategy. The timing was not able to avoid the Quant crash of August 2007, when value severely underperformed but quality held up well: the slope coefficient was leaning a little more toward quality than value, but was nowhere near the extreme ranges. Asness et al. argue that timing would have done very well during the tech crash of 2000, but that is before our data starts. And even Asness, in a separate piece in Institutional Investor (here), expresses some skepticism about the ability to time factors.
I recorded the slope coefficient in my backtesting to show how it varies over time. For most of the backtesting period, it ranged from about 0.20 to 0.33. The rule I used was: if the slope of the cross-sectional regression went above a threshold, I traded only the value factor, and if the slope went below another threshold, I traded only the quality factor (which of course involves some look-ahead bias in finding reasonable ranges). I should mention that there is an endless set of ways to dynamically adjust the mix of quality and value, so the potential for data mining is huge.
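The threshold rule can be sketched as a small function; the default thresholds here are only illustrative, taken from the range quoted above and, as noted, chosen with some hindsight:

```python
def timing_weights(slope, low=0.20, high=0.33):
    """Map the monthly slope signal to (quality_weight, value_weight).

    Above `high`, quality looks fully priced, so trade only value;
    below `low`, trade only quality; otherwise hold the 50/50 mix.
    Thresholds are illustrative, not calibrated out of sample.
    """
    if slope > high:
        return 0.0, 1.0
    if slope < low:
        return 1.0, 0.0
    return 0.5, 0.5

print(timing_weights(0.40), timing_weights(0.10), timing_weights(0.25))
# (0.0, 1.0) (1.0, 0.0) (0.5, 0.5)
```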
This is by no means the only way to do factor timing. For example, Hua, Kantsyrev, and Qian, “Factor-Timing Model” propose more of a pure statistical approach to factor timing. They start with a set of potential conditioning variables that are used for timing another set of factors (the conditioning variables they use are the VIX, the Fed Funds rate, U.S. Consumer Confidence index, the calendar month, the debt-to-market capitalization ratio spread, and the book-to-price ratio spread). They employ a sequential approach to figure out which conditioning variables to use and how many to use, guided by the Akaike Information Criterion (AIC) (see the Quantopian lecture “Autocorrelation and AR Models”). I plan to play around with their timing technique and see how it does out of sample. My suspicion is that this leads to overfitting and won’t perform well out of sample, but it would be an interesting technique to try out. I have a healthy bit of skepticism on factor timing, but I haven’t given up completely on the idea that it could improve factor performance.