Industry Concentration Strategy

I came across this recent paper, “Are US Industries Becoming More Concentrated?”, which contains some interesting findings. The authors convincingly argue that industries in the US have become significantly more concentrated over the last two decades, as fewer companies have gone public and mergers have increased (Credit Suisse has also put out a similar research note documenting the drop in the number of listed securities). They show that more than 75% of industries have become more concentrated over that period, leading to larger companies that face less competition. In fact, there are fewer public companies now than there were in the early 1970s, when GDP was a fraction of what it is now. They further show that the decline in public companies has not been offset by private or foreign firms.

The main measure they use to gauge industry concentration is the Herfindahl-Hirschman Index (HHI), the same measure regulators use to determine whether a potential merger would be anti-competitive. The HHI is defined as the sum of the squared market shares in an industry, where each firm's market share is its sales divided by total industry sales. For example, if there were three firms in an industry, one with a market share of 50% and the other two with market shares of 25% each, then the HHI = (0.5)^2 + (0.25)^2 + (0.25)^2 = 0.375. The highest possible HHI, when there is only one firm in an industry, is 1.0.
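As a quick check of the arithmetic above, here is a minimal sketch of the HHI calculation in plain Python. The function name `hhi` and the sales figures are made up for illustration and are not taken from the paper or from the attached algorithm.

```python
def hhi(sales):
    """Herfindahl-Hirschman Index for one industry.

    `sales` is a list with one sales figure per firm (hypothetical input).
    Each firm's market share is its sales divided by total industry sales.
    """
    total = float(sum(sales))
    return sum((s / total) ** 2 for s in sales)

# Three firms with market shares of 50%, 25% and 25%
three_firms = hhi([50, 25, 25])

# A single firm holds 100% of the market, the most concentrated case
monopoly = hhi([100])

print(three_firms, monopoly)
```

Regulators usually quote the index on a 0 to 10,000 scale (shares expressed in percentage points), which is just the value above multiplied by 10,000.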

Even though the focus of the paper is to examine and explain recent trends in industry concentration, the authors briefly describe a trading strategy. They suggest buying the stocks in the ten industries with the largest yearly increase in HHI, shorting the stocks in the ten industries with the largest yearly decrease in HHI, and holding for one year. They form equally weighted portfolios of the ten industries, and use the NAICS industry classification (which is available on Quantopian from Morningstar). They claim to achieve an annual alpha of around 9% from 2001 to 2014. These results suggest that investors do not fully appreciate the benefits of operating in an environment of less competition.
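The selection and weighting step described above can be sketched with pandas. This is only an illustration under stated assumptions, not the authors' code: the function name `industry_weights`, the toy HHI values, and the tickers are hypothetical, and the example uses one industry per side (`n=1`) instead of ten so the result is easy to check by hand. Each side of the book gets half of the capital, split equally across industries and then equally across the stocks within each industry.

```python
import pandas as pd

def industry_weights(hhi_old, hhi_new, members, n=10):
    """Long the n industries with the largest HHI increase, short the n
    with the largest decrease (hypothetical helper for illustration).

    hhi_old, hhi_new: Series indexed by industry code.
    members: dict mapping industry code -> list of tradable tickers.
    Assumes at least 2*n industries, so longs and shorts do not overlap.
    """
    change = (hhi_new - hhi_old).dropna().sort_values()
    shorts = change.index[:n]    # largest decreases in concentration
    longs = change.index[-n:]    # largest increases in concentration
    weights = {}
    for sign, industries in ((1.0, longs), (-1.0, shorts)):
        for industry in industries:
            stocks = members[industry]
            for stock in stocks:
                # 50% of capital per side, equal across industries and stocks
                weights[stock] = sign * (0.5 / n) / len(stocks)
    return pd.Series(weights)

# Toy inputs: four industries, one long and one short industry
hhi_old = pd.Series({'A': 0.10, 'B': 0.20, 'C': 0.30, 'D': 0.25})
hhi_new = pd.Series({'A': 0.18, 'B': 0.19, 'C': 0.22, 'D': 0.26})
members = {'A': ['a1', 'a2'], 'B': ['b1'], 'C': ['c1'], 'D': ['d1']}

w = industry_weights(hhi_old, hhi_new, members, n=1)
print(w)
```

In this toy case industry A (largest increase) is bought and industry C (largest decrease) is shorted, so the long and short weights sum to zero and the portfolio is dollar neutral.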

Unfortunately, when I tried to reproduce the results of their paper, I achieved less than half of the returns they reported, and even that required a little data mining. I tried a second measure of industry concentration, and I also measured HHI using the Q500, the Q1500, and even the entire universe of stocks in existence (I had to save the sales from prior quarters in context variables to avoid timeout errors). Counterintuitively, the backtesting results were worse when I measured HHI more accurately using the full universe of stocks. While there are several small differences in methodology, it’s unclear whether they would account for the gap in results.
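To see why the choice of estimation universe matters, here is a small pandas sketch that computes HHI per industry twice, once on all firms and once on a narrower universe. The DataFrame, the column name `in_small_universe`, and the helper `hhi_by_industry` are hypothetical and only mimic the groupby pattern used in the attached algorithm; they are not Quantopian data.

```python
import pandas as pd

def hhi_by_industry(df):
    """HHI per industry from a frame with 'industry' and 'sales' columns."""
    # Market share of each firm within its own industry
    share = df.groupby('industry')['sales'].transform(lambda x: x / x.sum())
    # Sum of squared shares within each industry
    return (share ** 2).groupby(df['industry']).sum()

# Toy data: the third firm in industry 111 is outside the narrower universe
firms = pd.DataFrame({
    'industry': [111, 111, 111, 222, 222],
    'sales': [50.0, 25.0, 25.0, 80.0, 20.0],
    'in_small_universe': [True, True, False, True, True],
})

full = hhi_by_industry(firms)
small = hhi_by_industry(firms[firms['in_small_universe']])
print(full)
print(small)
```

In this toy case, dropping one firm from industry 111 raises its measured HHI from 0.375 to about 0.556, while industry 222 is unchanged, so a narrow universe can make an industry look more concentrated than it is.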

The attached algorithm is very simple – holding stocks for a full year and making no attempt to pick stocks within an industry – so there are certainly areas for potential enhancement. For example, since firms that complete stock mergers tend to underperform, perhaps the subset of stocks not involved in mergers benefits the most from consolidation in an industry. It is also possible that other signals could be used to sort stocks within the long and short industries.

Even though the backtesting results were not spectacular, I thought the paper offered a nice example of using the industry classification data in a novel way. Industry classifications are typically used to reduce the industry exposure of a portfolio, but here they are used as a signal in their own right. Incidentally, another recently published paper uses the same industry concentration measure to enhance a separate anomaly. The authors create a signal based on corporate governance, and find that stocks with strong governance (for example, no poison pills or staggered boards) outperform, but only for firms in industries with a high HHI. They argue that good governance matters more in less competitive industries, where the market imposes less pressure on firms.

"""
Trading strategy related to "Are US Industries Becoming More Concentrated" (Grullon, Larkin, Michaely)
    At the end of June each year, get trailing 12 month sales for each stock
    Compute the Herfindahl-Hirschman Index (HHI) for each industry (using NAICS industry classification)
    Sort industries by change in HHI from previous year, and buy top 10 industries and short bottom 10
    Form equally weighted portfolio of industries, and equally weight stocks within each industry
    Rebalance once/year
"""


import numpy as np
import pandas as pd
from quantopian.pipeline import Pipeline
from quantopian.pipeline import CustomFactor
from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.data import morningstar
from quantopian.pipeline.filters.morningstar import IsPrimaryShare
from quantopian.pipeline.filters import Q500US, Q1500US
# These last two imports are for using Quantopian's Optimize API
from quantopian.algorithm import order_optimal_portfolio
import quantopian.experimental.optimize as opt

def initialize(context):
    # Set benchmark to short-term Treasury note ETF (SHY) since strategy is dollar neutral
    # We run rebalance every month, but only rebalance in June
    schedule_function(my_rebalance, date_rules.month_end(), time_rules.market_open())

    # Record variables at the end of each day.
    schedule_function(my_record_vars, date_rules.every_day(), time_rules.market_close())
    # Set commissions and slippage to 0 to determine pure alpha
    set_commission(commission.PerShare(cost=0, min_trade_cost=0))

    # context.conc is a DataFrame, indexed by industry, of the old HHI, the new HHI, and 
    #   number of stocks in each industry

    # Create our pipeline and attach it to our algorithm.
    my_pipe = make_pipeline()
    attach_pipeline(my_pipe, 'my_pipeline')
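
class Sales(CustomFactor):
    # Trailing 12 month sales: sum of four quarterly revenue values,
    #   sampled roughly 63 trading days apart within the 189-day window
    inputs = [morningstar.income_statement.operating_revenue]
    window_length = 189
    def compute(self, today, assets, out, sales):
        out[:] = sales[0]+sales[-1]+sales[-64]+sales[-127]

class Industry(CustomFactor):
    # Most recent industry code for each stock
    def compute(self, today, assets, out, industry):
        out[:] = industry[-1]

def make_pipeline():
    """Create our pipeline."""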
    # To make the program more flexible, we allow for the possibility of having two universes,
    #   one for estimating HHI and one for trading stocks after sorting by changes in HHI
    # Note that if we don't filter for primary share class, we double count sales for dual class stocks
    primary_share = IsPrimaryShare(mask=estimation_universe)
    universe = (
          & (pricing > 5)

    return Pipeline(
        columns= {

def before_trading_start(context, data):
    # Gets our pipeline output every day.
    context.output = pipeline_output('my_pipeline')

def my_rebalance(context, data):

    # We only want to rebalance once/year, at the end of June, so we check if the month is June
    backtest_month = get_datetime().month
    if backtest_month != 6:
        return
    # There are a few stocks that have an industry code of -1, which we'll drop
    # We divide by 1000 to convert from 6 digit NAICS to 3 digit NAICS industry classification
    context.output['industry']=context.output['industry'] // 1000
    log.info('Number of unique industries: %d' %(context.output['industry'].nunique()))

    # 'share' is the  market share of total sales for each company
    context.output['share']=context.output.groupby('industry')['sales'].transform(lambda x: (x/x.sum()))
    # We square market share for HHI calculation below

    # 'new' is the current HHI for each industry, computed as the sum of squared market shares
    # Check whether it's first time running by looking at whether the 'old' HHI hasn't been created yet
    if pd.isnull(context.conc['old']).all():
    # 'count' is the number of stocks in each industry (that are in the trading universe)
    # If the trading universe is smaller than the estimation universe, there may be some industries with
    #    no stocks to trade.  We eliminate those industries.
    context.conc=context.conc[context.conc['count'] != 0]
    # Compute change in HHI
    # Sort industries by change in HHI and go long top 10 industries and short bottom 10

    # Equally weight 10 industries, and equally weight stocks within each industry
    context.output['weights']=context.output.groupby('industry')['sales'].transform(lambda x: (.5/10)/x.count())

    # Copy the 'new' HHI into 'old' HHI for comparison next year
    ### Optimize code. ###
    # Makes the weights negative for our shorts
    shorts_weights_dict = {k: -v for k, v in shorts_weights_dict.items()}
    # Combine the weights for our longs and shorts in one dictionary
    target_weights = longs_weights_dict.copy()
    # Place orders according to weights.
    # Note that with the Optimize API, you don't have to explicitly set the weight to zero for an unwind.
    #  If you have an existing position and no weight is given, it assumes the weight is zero.

    # For comparison, this is the same ordering code, but not using the Optimize API
    # for security in context.portfolio.positions:
    #     # Unwinds
    #     if (security not in context.longs) and (security not in context.shorts) and data.can_trade(security): 
    #         order_target_percent(security, 0)
    # for security in context.longs:
    #     # New longs
    #     if data.can_trade(security):
    #         order_target_percent(security, longs_weights_dict[security])

    # for security in context.shorts:
    #     # New shorts
    #     if data.can_trade(security):
    #         order_target_percent(security, -shorts_weights_dict[security])

def my_record_vars(context, data):
    """Record variables at the end of each day."""
    if len(context.longs)==0:
        longs = shorts = 0
        for position in context.portfolio.positions.itervalues():
            if position.amount > 0:
                longs += 1
            elif position.amount < 0:
                shorts += 1
        # Record our variables.
        record(leverage=context.account.leverage, long_count=longs, short_count=shorts)
        #"Today's shorts: "  +", ".join([short_.symbol for short_ in context.shorts]))
        #"Today's longs: "  +", ".join([long_.symbol for long_ in context.longs]))


4 responses

Hi Rob,
Thanks so much for this post! I have learned a lot from it. I have a question about it. When I run the program, the output shows that the number of unique industries is 65, but it does not print all 65 industries. If I want to see all 65 industries, could you tell me how to do that? Thank you.

If the theory is right, could it work better cap-weighting the stocks (or buying/selling cap-weighted industry ETFs), in that the top player(s) may benefit more from concentration?

Fantastic post Rob! I just joined this community and already learning so much. Keep up the good work!

Hi Rob,
Thanks for the post. However, I ran into a problem running this code: the algorithm doesn't work. How can I fix it? Thank you.
Here is the result