Quality Factors

This post delves into what is commonly referred to in the academic literature as “quality” factors. In contrast to value factors, quality factors may be less understood by people outside the finance/accounting fields. I’ll divide this post into five sections:

  1. What are quality factors
  2. What are some nice features of quality factors
  3. Sector neutralizing quality factors
  4. Backtesting the Novy-Marx quality factor (gross profitability)
  5. Factor timing on quality factors

I will demonstrate these ideas with a notebook (and an algorithm to follow). Two excellent papers on the subject, which I will refer to numerous times, are Asness, Frazzini, and Pedersen, “Quality Minus Junk”, and Novy-Marx, “The Other Side of Value: The Gross Profitability Premium”.

What are Quality Factors?

In very general terms, “quality” stocks are stocks with features that should command higher prices. Examples include stocks with high profitability (for example, return on equity, return on assets, gross profitability, or gross margin), high growth (for example, the five-year growth of those same items), conservative accounting (for example, low non-cash accruals), and high payouts (for example, low net equity and debt issuance, or high dividend payout ratios). This list of quality factors is by no means comprehensive. For example, Piotroski’s F-Score paper, written by an accounting professor and widely followed by traders and academics, uses nine accounting variables to pick long and short stock candidates. A key feature of quality factors is that they are based solely on accounting variables, not market values. Value factors (for example, Book/Market, Price/Cash Flow, Price/Earnings, Dividend/Price), by contrast, use market prices to determine whether stocks are overvalued or undervalued.

These quality factors should lead to higher valuations, and in the attached notebook I show that this is indeed the case. I measure the valuation premium for high quality stocks by running a cross-sectional regression of market-to-book (M/B) ratios on a measure of quality, the gross profitability-to-assets ratio (GP/A):
$$(M/B)_i = a + b\,(GP/A)_i + e_i$$
The regression coefficient b measures the relationship between quality and the valuation premium, and it is significantly positive: higher quality stocks have higher market-to-book ratios. However, and this is where the potential inefficiency comes in, the market may not value these quality stocks highly enough, which implies that their future expected returns should be higher. In other words, although high quality stocks command a valuation premium, empirical evidence suggests the premium may be too modest.
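To make the regression concrete, here is a minimal NumPy sketch of a single cross-sectional OLS fit of M/B on GP/A that also reports the t-statistic of the slope and the R-squared. The function name `cross_sectional_regression` is hypothetical, and the inputs are synthetic: the intercept 1.5, slope 2.0, and noise level are invented for illustration and are not estimates from the notebook.

```python
import numpy as np

def cross_sectional_regression(mb, gpa):
    """OLS of (M/B)_i = a + b (GP/A)_i + e_i across stocks on one date.

    Returns the intercept a, the slope b, the t-statistic of b, and R-squared.
    """
    mb = np.asarray(mb, dtype=float)
    gpa = np.asarray(gpa, dtype=float)
    X = np.column_stack([np.ones_like(gpa), gpa])    # design matrix [1, GP/A]
    coef, _, _, _ = np.linalg.lstsq(X, mb, rcond=None)
    resid = mb - X @ coef
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)                 # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)            # OLS coefficient covariance
    t_b = coef[1] / np.sqrt(cov[1, 1])
    r2 = 1.0 - resid @ resid / np.sum((mb - mb.mean()) ** 2)
    return coef[0], coef[1], t_b, r2

# Synthetic cross-section (invented numbers, not real data): valuation rises
# with gross profitability, buried in a lot of noise.
rng = np.random.default_rng(0)
gpa = rng.normal(0.3, 0.2, size=1500)
mb = 1.5 + 2.0 * gpa + rng.normal(0.0, 1.5, size=1500)
a, b, t_b, r2 = cross_sectional_regression(mb, gpa)
```

With this much noise the slope comes out clearly positive and significant while the R-squared stays small, which is the typical shape of such cross-sectional fits.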

Nice Features of Quality Stocks

I’m going to focus on the measure of quality used by Novy-Marx, namely gross profitability, which is defined as (revenues – cost of goods sold)/(book value of assets). He argues that gross profitability is a better measure than those based on earnings or cash flow. For example, capital expenditures and R&D (which are not part of cost of goods sold) reduce cash flow and earnings but may improve future operations.
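The definition translates directly into code. A minimal pandas sketch, assuming hypothetical column names (`revenue`, `cogs`, `total_assets`) and made-up tickers; masking out non-positive assets is an assumption of this sketch, not something specified in the paper:

```python
import pandas as pd

def gross_profitability(revenue, cogs, total_assets):
    """Novy-Marx gross profitability: (revenues - COGS) / book value of total assets."""
    gp_to_a = (revenue - cogs) / total_assets
    # A zero or negative asset base makes the ratio meaningless, so mask it out.
    return gp_to_a.where(total_assets > 0)

# Hypothetical tickers and figures, for illustration only.
fundamentals = pd.DataFrame(
    {"revenue": [100.0, 250.0, 80.0], "cogs": [60.0, 100.0, 20.0], "total_assets": [200.0, 500.0, 0.0]},
    index=["AAA", "BBB", "CCC"],
)
gp_to_a = gross_profitability(fundamentals["revenue"], fundamentals["cogs"], fundamentals["total_assets"])
# AAA: 0.2, BBB: 0.3, CCC: NaN
```

Note that R&D and SG&A never enter the numerator, which is exactly the point of using gross rather than net profitability.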

As a stand-alone factor, gross profitability performs well, but the strength of quality factors like this one lies in their interaction with value factors. First of all, quality is usually negatively correlated with value: as we discussed, high quality stocks tend to have higher market-to-book ratios. Second, quality tends to do well when value doesn’t, and vice versa, so quality is a good hedge for value. But most importantly, stocks that have both value and quality perform better than the sum of the individual factors. Buying value stocks that also have high quality scores apparently avoids the “value trap”: stocks that are cheap but never recover. This interaction between value and quality can be captured by trading the “corner boxes” of a double sort: going long stocks that have both high quality and high book-to-market ratios, and shorting stocks that have both low quality and low book-to-market ratios.
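The corner-box trade can be sketched with two quantile sorts. This is a minimal pandas illustration, assuming independent quintile sorts and equal weighting within each corner; the number of groups, the weighting, and the name `corner_boxes` are assumptions of this sketch, not Novy-Marx's exact construction:

```python
import numpy as np
import pandas as pd

def corner_boxes(quality, value, n_groups=5):
    """Double sort: long the high-quality/high-B/M corner, short the low/low corner.

    quality, value: Series indexed by stock. The two sorts are independent.
    Returns weights of +1/n_long for longs, -1/n_short for shorts, 0 otherwise.
    """
    q_bin = pd.qcut(quality.rank(method="first"), n_groups, labels=False)
    v_bin = pd.qcut(value.rank(method="first"), n_groups, labels=False)
    long_mask = (q_bin == n_groups - 1) & (v_bin == n_groups - 1)
    short_mask = (q_bin == 0) & (v_bin == 0)
    weights = pd.Series(0.0, index=quality.index)
    if long_mask.any():
        weights[long_mask] = 1.0 / long_mask.sum()
    if short_mask.any():
        weights[short_mask] = -1.0 / short_mask.sum()
    return weights

# Synthetic scores for 500 hypothetical stocks.
rng = np.random.default_rng(1)
idx = [f"S{i}" for i in range(500)]
quality = pd.Series(rng.normal(size=500), index=idx)
value = pd.Series(rng.normal(size=500), index=idx)   # e.g. book-to-market
w = corner_boxes(quality, value)
```

Because quality and value are negatively correlated in real data, the corners hold fewer stocks than the independent random inputs here would suggest.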

Novy-Marx was certainly not the first to notice the positive interaction between value and quality factors. For example, Joel Greenblatt’s “The Little Book That Beats the Market” uses two factors, a quality factor (return on capital, ROC) and a value factor (earnings yield, essentially an inverted P/E), both based on operating earnings relative to the whole firm (debt plus equity) rather than to equity alone, which, he argues, makes it easier to compare companies with different capital structures.

This strategy has very low turnover (and therefore very low transaction costs and very high capacity). In backtesting, I rebalanced once a month, and even at a monthly frequency the signals are very persistent. Since the factors are built purely from accounting variables, they don’t change much over time.
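One way to quantify "very persistent" is the average month-over-month rank correlation of the signal across stocks. A minimal sketch with synthetic data: the slowly updating signal is modeled as an AR(1) with coefficient 0.98, which is an assumption chosen only to mimic accounting data that changes little between rebalances, and `rank_autocorrelation` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def rank_autocorrelation(signal):
    """Average month-over-month Spearman correlation of cross-sectional ranks.

    signal: DataFrame with one row per rebalance date and one column per stock.
    Values near 1 mean the ranking (and hence the portfolio) barely changes.
    """
    ranks = signal.rank(axis=1)
    corrs = [
        ranks.iloc[t].corr(ranks.iloc[t - 1])   # Pearson on ranks = Spearman
        for t in range(1, len(ranks))
    ]
    return float(np.mean(corrs))

rng = np.random.default_rng(2)
n_months, n_stocks, phi = 24, 300, 0.98
slow = np.empty((n_months, n_stocks))
slow[0] = rng.normal(size=n_stocks)
for t in range(1, n_months):
    # Each month keeps most of last month's value (accounting-like signal).
    slow[t] = phi * slow[t - 1] + np.sqrt(1 - phi**2) * rng.normal(size=n_stocks)
fast = rng.normal(size=(n_months, n_stocks))   # memoryless signal, for contrast

persistence_slow = rank_autocorrelation(pd.DataFrame(slow))
persistence_fast = rank_autocorrelation(pd.DataFrame(fast))
```

A rank correlation near 1 means only a small fraction of names cross the long/short cutoffs each month, which is why turnover stays low.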

Also, the outperformance of quality portfolios is even larger if we look at Fama-French three-factor risk-adjusted returns rather than raw returns, since quality stocks tend to have negative exposure to value. Novy-Marx also reports that when he sorts stocks into quality deciles, high quality stocks tend to have larger market capitalizations. That negative exposure to size would further enhance the three-factor risk-adjusted returns. However, I did not find any correlation between quality and market capitalization in the recent period I looked at.
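The mechanism is visible in the time-series regression used to compute three-factor alphas: since mean return = alpha + sum of (beta × factor mean), negative loadings on HML and SMB push alpha above the raw mean whenever those factor premia are positive. A minimal NumPy sketch on synthetic monthly data; the factor moments and the strategy's true loadings (alpha 0.4% per month, HML beta -0.4, SMB beta -0.2) are invented for illustration.

```python
import numpy as np

def three_factor_alpha(strategy_excess, mkt_rf, smb, hml):
    """Regress strategy excess returns on Fama-French MKT-RF, SMB and HML.

    Returns (alpha, betas) with betas = [b_mkt, b_smb, b_hml].
    """
    X = np.column_stack([np.ones(len(mkt_rf)), mkt_rf, smb, hml])
    coef, _, _, _ = np.linalg.lstsq(X, np.asarray(strategy_excess, dtype=float), rcond=None)
    return coef[0], coef[1:]

rng = np.random.default_rng(3)
T = 168   # 14 years of monthly observations
mkt_rf, smb, hml = (
    rng.normal(0.006, 0.04, T),
    rng.normal(0.002, 0.03, T),
    rng.normal(0.003, 0.03, T),
)
# Invented quality long-short returns with negative value and size exposures.
quality_ls = 0.004 - 0.4 * hml - 0.2 * smb + rng.normal(0.0, 0.01, T)
alpha, betas = three_factor_alpha(quality_ls, mkt_rf, smb, hml)
```
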

Sector Neutralizing

Before presenting the backtesting results, we should step back and discuss how to deal with sectors and industries. When working with accounting data, there could be large differences between various accounting measures across different sectors. For example, profit margin, a quality factor, may differ considerably among financial, utility, technology, and consumer discretionary stocks. The same applies to other fundamental factors: dividend yield, profit margins, patents, change in employees, R&D normalized by assets or sales, employee utilization, accruals, stock option expensing (and many others) all vary widely across sectors. When selecting stocks, it may be more relevant to compare companies with their peers.

There are numerous ways of dealing with sector differences:
1. Demean the factors by sector
2. Standardize the factors by z-scoring within sectors
3. Force strict sector neutrality (for example, by computing quantiles within each sector)
4. Eliminate certain sectors (financial stocks and utilities, for example)
5. Selectively eliminate sectors based on performance
6. All of the above but for industry groups rather than sectors
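The first three treatments are one-liners with a pandas `groupby`. A minimal sketch, assuming a DataFrame with hypothetical `sector` and `gp_to_a` columns and made-up tickers and values:

```python
import pandas as pd

def sector_adjust(df, factor="gp_to_a", sector="sector"):
    """Demean, z-score, and percentile-rank a factor within each sector."""
    grouped = df.groupby(sector)[factor]
    out = pd.DataFrame(index=df.index)
    out["demeaned"] = df[factor] - grouped.transform("mean")        # 1. demean by sector
    out["zscored"] = out["demeaned"] / grouped.transform("std")     # 2. z-score within sector
    out["sector_pct_rank"] = grouped.rank(pct=True)                 # 3. quantiles within sector
    return out

# Hypothetical data: tech margins are structurally higher than utility margins.
stocks = pd.DataFrame(
    {"sector": ["Tech", "Tech", "Utilities", "Utilities"], "gp_to_a": [0.50, 0.30, 0.10, 0.06]},
    index=["T1", "T2", "U1", "U2"],
)
adjusted = sector_adjust(stocks)
```

On raw GP/A both tech names outrank both utilities; after demeaning, the better utility (U1) ranks above the weaker tech name (T2), so each company is judged against its peers.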

I focus on demeaning because that’s what Novy-Marx did. In almost all backtests I ran, demeaning reduced the volatility of the strategy and increased the Sharpe Ratio, and in many cases it actually increased the average returns also.

Most academic papers demean by industry rather than by sector. Morningstar has 69 industry groups, more than the 24 GICS industry groups or the 47 Fama-French industries, and with our smaller Q1500 universe, demeaning by Morningstar’s industry groups seems too granular.

Let me make a few comments on selectively eliminating sectors. I know it is tempting to do this, especially with tools like Alphalens, which make it easy to examine performance by sector. Indeed, I’ve seen this approach employed in practice numerous times. Lehman Brothers published their Quantitative Stock Selection Model, and one of the features they touted was that they employed a large set of 27 factors but applied them differently in each sector. (I have a hard copy of their report, but don’t have a link I can supply.) For example, in the Health Care sector, some of the factors they used were EBITDA to EV, ROE, Incremental Net Margins, Intangibles to Assets, and an Earnings Revisions Ratio. In contrast, in the Technology sector, they used P/E, change in Shares Outstanding, change in Employees, change in Debt to Assets, Earnings Revisions, and Earnings Surprise. Both sectors also shared a few common factors, like Momentum and change in Accruals. The factor weights also varied across sectors, based on regression results and some subjectivity on their part.

I would caution about the huge potential for data mining involved. Even for a single factor, there are 2^11 = 2,048 ways to selectively include it in the 11 sectors (2^11 - 1 = 2,047 if you dismiss the most likely scenario, that it doesn’t work for any sector). I certainly think there are signals that you may expect (a priori, not after the fact) to work poorly for certain sectors. For oil companies, ratios involving proven reserves rather than revenues might be useful. Or insider trades may be a stronger signal in sectors where there is more asymmetric information between managers and investors about future products, like technology and pharmaceuticals. But I would be very cautious about overfitting.
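A small simulation shows why picking sectors after the fact is dangerous. In this sketch the factor is pure noise in all 11 sectors, yet an exhaustive search over all 2,047 non-empty sector subsets still finds an attractive in-sample Sharpe Ratio; the winner is then re-evaluated on fresh data. All returns are synthetic, and the 2% monthly volatility and 10-year samples are arbitrary assumptions.

```python
import itertools
import numpy as np

def annualized_sharpe(monthly_returns):
    return np.sqrt(12) * monthly_returns.mean() / monthly_returns.std(ddof=1)

rng = np.random.default_rng(4)
n_sectors, n_months = 11, 120
# Pure noise: by construction the factor has zero expected return in every sector.
in_sample = rng.normal(0.0, 0.02, size=(n_months, n_sectors))
out_sample = rng.normal(0.0, 0.02, size=(n_months, n_sectors))

# Every non-empty subset of sectors: 2**11 - 1 = 2047 candidates.
subsets = [
    s for r in range(1, n_sectors + 1) for s in itertools.combinations(range(n_sectors), r)
]
best = max(subsets, key=lambda s: annualized_sharpe(in_sample[:, list(s)].mean(axis=1)))

sharpe_all_in = annualized_sharpe(in_sample.mean(axis=1))
sharpe_best_in = annualized_sharpe(in_sample[:, list(best)].mean(axis=1))
sharpe_best_out = annualized_sharpe(out_sample[:, list(best)].mean(axis=1))
```

Because the full 11-sector portfolio is one of the candidates, the selected subset's in-sample Sharpe Ratio can never be lower than the full set's, even though every subset's true Sharpe Ratio is zero; the out-of-sample figure carries no such guarantee.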

Testing the Novy-Marx Paper

Sorting stocks into 10 deciles based only on GP/A results in a Sharpe Ratio of 0.46 and a total return over 14 years of 53%, a Sharpe Ratio comparable to that of sorting stocks only on B/M, which has a total return of 64%. Because quality is a hedge against value, a simple 50-50 mix of the GP/A and B/M portfolios has about the same average return but a much higher Sharpe Ratio than either factor by itself.
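The arithmetic behind the 50-50 mix: averaging two strategies with similar returns leaves the mean unchanged, while the mix's volatility depends on their correlation. A small worked sketch with hypothetical annual inputs (5% return and 10% volatility each, so each has a stand-alone Sharpe Ratio of 0.5):

```python
import math

def mix_sharpe(mu1, sigma1, mu2, sigma2, rho, w=0.5):
    """Sharpe ratio of w*A + (1-w)*B for two strategies with correlation rho."""
    mu = w * mu1 + (1 - w) * mu2
    var = (w * sigma1) ** 2 + ((1 - w) * sigma2) ** 2 + 2 * w * (1 - w) * rho * sigma1 * sigma2
    return mu / math.sqrt(var)

# Hypothetical inputs, not the backtest figures.
same = mix_sharpe(0.05, 0.10, 0.05, 0.10, rho=1.0)           # ≈ 0.5: no diversification
uncorrelated = mix_sharpe(0.05, 0.10, 0.05, 0.10, rho=0.0)   # ≈ 0.71
hedged = mix_sharpe(0.05, 0.10, 0.05, 0.10, rho=-0.5)        # ≈ 1.0
```

With a correlation of -0.5 the mix keeps the same 5% average return while its volatility halves, so the Sharpe Ratio doubles.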

The 50-50 mix of GP/A and B/M simply adds the longs and the shorts of the two separate factors, but doesn’t take advantage of any interaction between them. Trading the corner boxes of a double sort, or ranking stocks on each factor and summing the ranks, resulted in significantly higher returns than either individual factor.
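The rank-sum alternative combines the two factors into one composite score before sorting. A minimal pandas sketch, assuming equal weighting within the top and bottom deciles of the composite; the helper names and the synthetic data (20 hypothetical stocks, with S0 forced to be best on both factors) are illustrative only:

```python
import numpy as np
import pandas as pd

def rank_sum_composite(quality, value):
    """Sum of cross-sectional ranks on each factor (higher = better on both)."""
    return quality.rank() + value.rank()

def long_short_deciles(score, n_groups=10):
    """Equal-weight long the top decile of the score, short the bottom decile."""
    bins = pd.qcut(score.rank(method="first"), n_groups, labels=False)
    weights = pd.Series(0.0, index=score.index)
    weights[bins == n_groups - 1] = 1.0 / (bins == n_groups - 1).sum()
    weights[bins == 0] = -1.0 / (bins == 0).sum()
    return weights

rng = np.random.default_rng(5)
tickers = [f"S{i}" for i in range(20)]
quality = pd.Series(rng.normal(size=20), index=tickers)   # e.g. GP/A
value = pd.Series(rng.normal(size=20), index=tickers)     # e.g. B/M
quality["S0"], value["S0"] = 10.0, 10.0                   # S0 is best on both factors
weights = long_short_deciles(rank_sum_composite(quality, value))
```

Unlike the 50-50 mix, a stock only reaches the long book by scoring well on both factors at once, which is what lets the composite exploit the interaction.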

Book/Market is a relatively weak value signal, and quality factors work even better in conjunction with value factors like P/S or P/CF. However, the goal here is not to come up with an optimized multifactor model but merely to demonstrate the nice interaction between quality and a simple value factor.

Factor Timing

Because fundamental factors are so noisy and inconsistent, factor timing has become the holy grail of fundamental factor models. There have been numerous attempts to improve performance through factor timing, just as people try to time the overall stock market. For example, the momentum factor is known for infrequent but severe crashes (most recently after the financial crisis) despite performing well overall, and a recent paper, “Momentum Crashes”, argues that a bear market indicator and a volatility forecast can be used to double the Sharpe Ratio of a static momentum strategy.

With quality factors, Asness et al. argue that if there are periods when high quality stocks carry an especially large valuation premium (much higher Market/Book ratios than low quality stocks), then the market is already pricing quality richly, and high prices for quality now should lead to lower returns to quality in the future. Conversely, when quality is not priced in, quality should earn higher future returns. Although Asness et al. don’t directly test a trading strategy for timing quality factors, they regress future factor returns on ex ante measures of the valuation of quality and find a significant relationship.

I backtested several timing strategies that were suggested but not tested in Asness et al. Every month, I ran the cross-sectional regression described earlier, regressing the Market/Book value of each stock in the universe on its gross profitability. (Actually, I regressed Book/Market rather than Market/Book and reversed the sign afterwards: stocks can have negative book values, and it’s not a good idea to use ratios whose denominator can be negative.) I first standardized Book/Market and gross profitability, which reduces the influence of outliers and also makes the regression coefficient easier to interpret. The slope of the monthly regressions is the signal used for factor timing.
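The monthly timing signal can be sketched as a grouped regression. This is a minimal pandas/NumPy illustration, assuming a long-format panel with hypothetical columns `date`, `bm`, and `gp_to_a`; the synthetic data makes the price of quality rise over three months (the premia 0.1, 0.3, 0.5 are invented). Since both sides are z-scored, the OLS slope equals their correlation.

```python
import numpy as np
import pandas as pd

def zscore(x):
    return (x - x.mean()) / x.std(ddof=1)

def quality_price_slope(panel):
    """Monthly slope of z(B/M) on z(GP/A), sign flipped to read like M/B on GP/A."""
    def one_month(g):
        y = zscore(g["bm"])
        x = zscore(g["gp_to_a"])
        slope = np.polyfit(x, y, 1)[0]   # OLS slope (= correlation after z-scoring)
        return -slope                    # higher value: quality is priced more richly
    return panel.groupby("date")[["bm", "gp_to_a"]].apply(one_month)

# Synthetic panel: B/M falls with GP/A more steeply each month.
rng = np.random.default_rng(6)
rows = []
for month, premium in zip(["2010-01", "2010-02", "2010-03"], [0.1, 0.3, 0.5]):
    gp = rng.normal(0.3, 0.1, 1000)
    bm = 0.8 - premium * gp + rng.normal(0.0, 0.05, 1000)
    rows.append(pd.DataFrame({"date": month, "bm": bm, "gp_to_a": gp}))
panel = pd.concat(rows, ignore_index=True)
signal = quality_price_slope(panel)
```
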

I got very mixed results when I tried factor timing this way. A few backtests marginally improved performance, but a majority did worse than the static strategy. The timing was not able to avoid the quant crash of August 2007, when value severely underperformed but quality held up well: the slope coefficient was pointing a little more toward quality than value, but was nowhere near the extreme ranges. Asness et al. argue that timing would have done very well during the tech crash of 2000, but that is before our data starts. And even Asness, in a separate piece in Institutional Investor, expresses some skepticism about the ability to time factors.

I record the slope coefficient in my backtest to show how it varies over time. For most of the backtesting period, it ranged from about 0.20 to 0.33. The rule I used was: if the slope of the cross-sectional regression went above one threshold, I traded only the value factor, and if it went below another threshold, I traded only the quality factor. (Finding reasonable thresholds, of course, involves some look-ahead bias, since they were chosen after seeing the range of the slope.) I should mention that there is an endless set of ways to dynamically adjust the mix of quality and value, so the potential for data mining is huge.
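The switching rule can be written in a few lines. A minimal sketch: the text does not say what happens between the two thresholds, so this sketch assumes the static 50-50 mix there, and the thresholds 0.30 and 0.22 in the usage are hypothetical values inside the observed 0.20 to 0.33 range. The signal is lagged one month so that each month's return is traded on the previous month's slope.

```python
import pandas as pd

def timing_weights(slope, upper, lower):
    """Map the slope to (value_weight, quality_weight)."""
    if slope > upper:
        return 1.0, 0.0   # quality looks expensive: trade value only
    if slope < lower:
        return 0.0, 1.0   # quality looks cheap: trade quality only
    return 0.5, 0.5       # assumption: static 50-50 mix in between

def timed_returns(slopes, value_ret, quality_ret, upper, lower):
    """Combine factor returns using last month's slope (no same-month look-ahead)."""
    w_value = slopes.map(lambda s: timing_weights(s, upper, lower)[0]).shift(1)
    return w_value * value_ret + (1.0 - w_value) * quality_ret

# Hypothetical slopes and factor returns.
slopes = pd.Series([0.35, 0.25, 0.18, 0.30])
value_ret = pd.Series([0.01] * 4)
quality_ret = pd.Series([0.02] * 4)
combined = timed_returns(slopes, value_ret, quality_ret, upper=0.30, lower=0.22)
```

Even this tiny rule has three free parameters (two thresholds and the in-between mix), which illustrates how quickly the search space for timing rules grows.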

This is by no means the only way to do factor timing. For example, Hua, Kantsyrev, and Qian, “Factor-Timing Model”, propose a more purely statistical approach. They start with a set of potential conditioning variables that are used to time another set of factors (the conditioning variables they use are the VIX, the Fed Funds rate, the U.S. Consumer Confidence index, the calendar month, the debt-to-market-capitalization ratio spread, and the book-to-price ratio spread). They use a sequential approach, guided by the Akaike Information Criterion (AIC), to decide which conditioning variables to use and how many (see the Quantopian lecture “Autocorrelation and AR Models”). I plan to experiment with their timing technique and see how it does out of sample. My suspicion is that it leads to overfitting and won’t perform well out of sample, but it would be an interesting technique to try. I have a healthy skepticism about factor timing, but I haven’t completely given up on the idea that it could improve factor performance.
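For readers unfamiliar with AIC-guided selection, here is a generic greedy forward-selection sketch in NumPy. It is in the spirit of the sequential approach described above but is not Hua, Kantsyrev, and Qian's exact procedure. The synthetic data (five hypothetical lagged conditioning variables, only the first of which matters) is invented for illustration.

```python
import numpy as np

def ols_aic(y, X):
    """Gaussian AIC of an OLS fit, up to an additive constant: n*ln(RSS/n) + 2k."""
    n, k = X.shape
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    return n * np.log(rss / n) + 2 * k

def forward_select(y, candidates):
    """Add the conditioning variable that lowers AIC most; stop when none does.

    candidates: (n, p) array of lagged conditioning variables.
    Returns selected column indices in the order they were added.
    """
    n, p = candidates.shape
    selected = []
    X = np.ones((n, 1))                  # start from the intercept-only model
    best_aic = ols_aic(y, X)
    while True:
        trials = [
            (ols_aic(y, np.column_stack([X, candidates[:, j]])), j)
            for j in range(p) if j not in selected
        ]
        if not trials:
            break
        aic, j = min(trials)
        if aic >= best_aic:              # no candidate improves AIC: stop
            break
        selected.append(j)
        X = np.column_stack([X, candidates[:, j]])
        best_aic = aic
    return selected

rng = np.random.default_rng(7)
T = 240
Z = rng.normal(size=(T, 5))                          # hypothetical conditioning variables
factor_ret = 0.5 * Z[:, 0] + rng.normal(size=T)      # only variable 0 truly matters
chosen = forward_select(factor_ret, Z)
```

The AIC penalty of 2 per parameter is mild, so pure-noise variables still get added fairly often; that is one concrete route to the overfitting worry raised above.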


The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

3 responses

Great work! But I want to draw attention again to one of the biggest limitations of using fundamentals data with the Pipeline API: the lack of annual and TTM data.

Novy-Marx uses annual gross profitability, as do Piotroski and most other studies.

The community has been asking for this for years; there are a lot of posts on this topic (just search the forum for "historical fundamental data" or "fundamentals timeframe"), and I hope to see this feature in the new year, 2017 :-)
Quantopian is the best platform out there, but this gap is becoming very frustrating.
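As a stopgap when only quarterly numbers are available, a trailing-twelve-month (TTM) figure for a flow item like gross profit can be approximated by summing the last four fiscal quarters. A minimal pandas sketch, assuming a hypothetical quarterly series already in fiscal-quarter order (this is not a Pipeline API feature):

```python
import pandas as pd

def trailing_twelve_months(quarterly):
    """TTM value of a flow item: the sum of the latest four fiscal quarters.

    Returns NaN until four quarters are available. Balance-sheet items such as
    total assets are stocks, not flows, and should not be summed this way.
    """
    return quarterly.rolling(window=4, min_periods=4).sum()

# Hypothetical quarterly gross profit figures.
gross_profit = pd.Series(
    [10.0, 12.0, 11.0, 13.0, 14.0],
    index=pd.period_range("2015Q1", periods=5, freq="Q"),
)
ttm = trailing_twelve_months(gross_profit)
# NaN, NaN, NaN, 46.0, 50.0
```
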

Thanks for your detailed and easy-to-follow notebook. Great graph presentations! I appreciate your sharing references to numerous sources to check out!

The get_backtest function doesn’t seem to work after I clone your notebook. Not sure if anybody else has the same issue. However, I can still see the backtest performance graph without any problem.

It is interesting to note that the strategy works well out of sample on the Q1500 but not on the Q500. Could it be that Q500 securities are priced much more efficiently with respect to the GP/A quality factor? Does that imply the market could become more efficient at pricing small caps in the future?

Notice that even after you demean by sector, sectors like Energy, Materials, Real Estate and Telecom have a much smaller number of companies. Could you rank stocks by “GP_to_A_demean” within each sector? Then, for each sector, you could better match the number of longs and shorts. Correct me if I am wrong, but I see that you rank the whole universe based on “GP_to_A_demean” in the notebook.

In the boxplot grouped by sector_name, different sectors have different GP/A means and variances. For a similar mean level across Financials, Health Care, Industrials and Materials, the variance is quite different. Does that tilt your preference when matching factors with sectors?

One general question regarding the cross-sectional regression: in the regression of “bm_z” on “gp_z”, the t-statistic for “gp_z” is very significant and the R-squared is 5.6%. From your experience, what R-squared would you consider a great number?

Hi Chao,

Thanks for your comments and questions.

  • Since I didn’t share all the backtesting runs, unfortunately you can’t rerun any cells in the notebook that use the get_backtest function.
  • I do find that a vast majority of the strategies I've looked at in the past work better for small cap universes than large cap ones (ignoring transaction costs), but you're correct that there is a particularly big difference in this case. Interestingly, over his much longer backtesting period, Novy-Marx argues that quality factors work particularly well on large cap stocks (see Table 7 in his paper for a backtest on the 500 largest stocks), but I did not find that to be the case with recent data. I don't know whether small caps will get more efficient in the future, but it seems like this difference in inefficiency between small caps and large caps has existed for a while.
  • You could rank stocks separately for each sector as you say, and you could also z-score rather than de-mean stocks within each sector. Both methods produce not only longs and shorts that are close to sector neutral, but also the number of longs and shorts would closely match the sector weights (by just de-meaning like I did, you get close to sector neutrality but some sectors are over- or under-represented). It’s really an empirical question on which method is best.
  • Similar to the last question, de-meaning does over-represent sectors that have higher variance in factor scores. And for other factors, there may be a completely different set of sectors that are over-represented. Again, it's an empirical question whether tilting towards sectors where there is more differentiation in scores is better.
  • The R-squared we get is about the same as Asness gets over his 1986-2012 backtesting period (and half of what he gets in the 1956-2012 period). There are obviously many other omitted factors that can help explain the variation in B/M (for example, profit growth). Of course, the magnitude of this R-squared from a cross-sectional regression of B/M on GP/A cannot be compared to the lower R-squared you would typically see from traditional Fama-MacBeth cross-sectional regressions of (noisier) monthly returns on GP/A.
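For contrast with the valuation regression, a Fama-MacBeth regression runs one cross-sectional regression of returns on the signal per period, then tests whether the average slope differs from zero using the time series of slopes. A minimal NumPy sketch on synthetic data (a premium of 0.005 per unit of signal and 10% monthly idiosyncratic noise, both invented for illustration) shows how a slope can be highly significant while the average per-month R-squared stays tiny.

```python
import numpy as np

def fama_macbeth(returns, signal):
    """Per-period cross-sectional OLS of returns on a signal, then average the slopes.

    returns, signal: (T, N) arrays (T periods, N stocks); the signal must be
    known before the returns it is paired with.
    Returns (mean_slope, t_stat, mean_r2).
    """
    T = returns.shape[0]
    slopes, r2s = np.empty(T), np.empty(T)
    for t in range(T):
        x, y = signal[t], returns[t]
        X = np.column_stack([np.ones_like(x), x])
        coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        slopes[t] = coef[1]
        r2s[t] = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    t_stat = slopes.mean() / (slopes.std(ddof=1) / np.sqrt(T))
    return slopes.mean(), t_stat, r2s.mean()

rng = np.random.default_rng(8)
T, N = 120, 1000
signal = rng.normal(size=(T, N))                            # standardized GP/A-like score
returns = 0.005 * signal + rng.normal(0.0, 0.10, size=(T, N))
mean_slope, t_stat, mean_r2 = fama_macbeth(returns, signal)
```
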