Equity Valuation: The Comparables Approach Using K-means Clustering

Hi guys, this is a follow-up to my "K-means Clustering Help" post.

One of the simplest ways to find an undervalued company is through the comparables approach. Here is a detailed explanation from Investopedia: https://www.investopedia.com/articles/investing/080913/equity-valuation-comparables-approach.asp

"The basic premise of the comparables approach is that an equity’s value should bear some resemblance to other equities in a similar class. For a stock, this can simply be determined by comparing a firm to its key rivals, or at least those rivals that operate similar businesses. Discrepancies in the value between similar firms could spell opportunity. The hope is that it means the equity being valued is undervalued and can be bought and held until the value increases. The opposite could hold true, which could present opportunity for shorting the stock, or positioning one’s portfolio to profit from a decline in its price.

There are two primary comparable approaches. The first is the most common and looks at market comparables for a firm and its peers. Common market multiples include the following: enterprise value to sales (EV/S), enterprise multiple, price to earnings (P/E), price to book (P/B) and price to free cash flow (P/FCF)..."

However, to find a set of equities in a similar class, we often have to make many assumptions. So, in this algorithm, I used k-means clustering to attempt to quantitatively cluster similar firms into comparable groups (more on k-means: https://en.wikipedia.org/wiki/K-means_clustering). K-means groups the firms based on variables such as market cap, ROE, and ROA. We can then compute the EV/EBITDA multiple of each company in each cluster; according to the theory, the firms in the bottom 10-25% of EV/EBITDA within each group should be undervalued, and those in the top 10-25% should be overvalued.
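For readers outside Quantopian, here is a minimal sketch of that cluster-then-rank idea using plain pandas and scikit-learn. The data is synthetic (random numbers standing in for real fundamentals), and the column names and 25%/75% cutoffs just mirror the description above:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# synthetic stand-ins for real fundamentals data
rng = np.random.default_rng(0)
n = 200
fundamentals = pd.DataFrame({
    'market_cap': rng.lognormal(10, 1, n),
    'ROA': rng.normal(0.05, 0.02, n),
    'ROE': rng.normal(0.10, 0.05, n),
})
ev_ebitda = pd.Series(rng.normal(12, 4, n), name='EV/EBITDA')

# cluster on the fundamentals only; EV/EBITDA is held out for ranking
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(fundamentals)

longs, shorts = [], []
for k in range(5):
    group = ev_ebitda[labels == k]
    if len(group) <= 3:
        continue  # skip clusters too small to rank meaningfully
    # cheapest quartile of each cluster -> long candidates
    longs += group[group < group.quantile(0.25)].index.tolist()
    # richest quartile of each cluster -> short candidates
    shorts += group[group > group.quantile(0.75)].index.tolist()
```

The real algorithm below does the same thing, but sources its columns from the Quantopian pipeline and uses 50 clusters.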

However, k-means clustering cannot be applied directly to discrete variables such as industry code, sector ID, or credit rating. So my algorithm only experimented with companies from the financial industry and did not include any discrete variables.
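One common workaround (not used in the algorithm below, just a sketch) is to one-hot encode each discrete variable into numeric indicator columns, e.g. with pandas:

```python
import pandas as pd

# toy example: one numeric feature and one discrete variable
firms = pd.DataFrame({
    'ROE': [0.12, 0.08, 0.15],
    'sector': ['banks', 'insurance', 'banks'],
})

# replace the 'sector' column with one indicator column per category
encoded = pd.get_dummies(firms, columns=['sector'])
# encoded now has numeric columns: ROE, sector_banks, sector_insurance
```

Whether the resulting 0/1 columns mix well with continuous features in a Euclidean-distance clustering is debatable, so this is only one possible approach.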

I have attached the result. I am brand new to Quantopian so please give me some advice on how to cluster firms more accurately and get around the discrete variables problem.

Thank you very much,

Thanh

from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline.classifiers.fundamentals import Sector 
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import Fundamentals
from quantopian.pipeline.filters import Q1500US, Q500US
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans

def initialize(context):
    context.long_leverage = 0.5
    context.short_leverage = -0.5
  
    # Rebalance on the first trading day of each month at 11AM.
    schedule_function(rebalance,
                      date_rules.month_start(days_offset=0),
                      time_rules.market_open(hours=1, minutes=30))

    # Create and attach our pipeline (dynamic stock selector), defined below.
    attach_pipeline(make_pipeline(context), 'kmeans')


def make_pipeline(context):
    sector_filter = Sector()
    financial_sector_filter = sector_filter.eq(103)

    universe = Q1500US()
    finance_universe = financial_sector_filter & universe

    market_cap = Fundamentals.market_cap.latest
    enterprise_value = Fundamentals.enterprise_value.latest
    sustain_growth = Fundamentals.sustainable_growth_rate.latest
    ROA = Fundamentals.roa.latest
    ROE = Fundamentals.roe.latest
    ROIC = Fundamentals.roic.latest
    EV_EBITDA = Fundamentals.ev_to_ebitda.latest

    #industry = Fundamentals.morningstar_industry_code.latest

    result = Pipeline(
        columns={
            'EV/EBITDA': EV_EBITDA,
            #'industry': industry,
            'enterprise value': enterprise_value,
            'market_cap': market_cap,
            'sustain growth': sustain_growth,
            'ROA': ROA,
            'ROE': ROE,
            'ROIC': ROIC,
        },
        screen=finance_universe,
    )
    return result

def before_trading_start(context, data):
    context.output = pipeline_output('kmeans')
    result = context.output.dropna(axis=0)
    # Drop EV/EBITDA by name (safer than relying on column position),
    # then convert the DataFrame to an array for the k-means library.
    result_array = result.drop('EV/EBITDA', axis=1).values
    kmeans = KMeans(n_clusters=50).fit(result_array)  # fit into 50 groups
    result['Cluster'] = kmeans.labels_  # each row gets a cluster ID from 0-49
    result = result.sort_values(by=['EV/EBITDA'])

    # collect a list of long stocks and a list of short stocks per cluster
    context.long_groups = []
    context.short_groups = []
    for x in range(50):
        group = result[result['Cluster'] == x]
        if 3 < len(group) < 15:  # eliminate clusters of three stocks or fewer

            # get bottom 25% EV/EBITDA, i.e. the undervalued firms
            group_bottom = group[group['EV/EBITDA'] < group['EV/EBITDA'].quantile(0.25)]

            # get top 25% EV/EBITDA, i.e. the overvalued firms
            group_top = group[group['EV/EBITDA'] > group['EV/EBITDA'].quantile(0.75)]
            context.long_groups.append(group_bottom.index.tolist())
            context.short_groups.append(group_top.index.tolist())

        # for clusters of 15 or larger, only take the smallest and
        # largest EV/EBITDA deciles
        elif len(group) >= 15:
            group_bottom = group[group['EV/EBITDA'] < group['EV/EBITDA'].quantile(0.1)]
            group_top = group[group['EV/EBITDA'] > group['EV/EBITDA'].quantile(0.9)]
            context.long_groups.append(group_bottom.index.tolist())
            context.short_groups.append(group_top.index.tolist())

    # flatten the per-cluster lists into single long and short lists
    context.long_groups = [val for sublist in context.long_groups for val in sublist]
    context.short_groups = [val for sublist in context.short_groups for val in sublist]
    context.groups = context.short_groups + context.long_groups

def rebalance(context, data):
    # exit positions that dropped out of this month's selection
    for stock in context.portfolio.positions:
        if stock not in context.groups and data.can_trade(stock):
            order_target_percent(stock, 0)
    if context.long_groups:  # guard against division by zero
        for stock in context.long_groups:
            if data.can_trade(stock):
                order_target_percent(stock, context.long_leverage / len(context.long_groups))
    if context.short_groups:
        for stock in context.short_groups:
            if data.can_trade(stock):
                order_target_percent(stock, context.short_leverage / len(context.short_groups))
1 response

I'm wondering if standardization of the data is needed for k-means?
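(K-means minimizes Euclidean distances, so without scaling, a large-magnitude feature like market cap will dominate ratio features like ROE. A quick sketch of scaling before clustering with scikit-learn's StandardScaler, using toy numbers rather than the data from the algorithm above:)

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# columns: market cap (huge scale) and ROE (tiny scale)
X = np.array([[1.0e9, 0.05],
              [2.0e9, 0.30],
              [1.5e9, 0.10],
              [5.0e8, 0.25]])

# rescale each column to zero mean and unit variance
X_scaled = StandardScaler().fit_transform(X)

# cluster on the scaled features so both columns contribute comparably
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
```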