Equity Valuation: The Comparables Approach Using K-means Clustering

Hi guys, this is a follow-up to my "K-means Clustering Help" post.

One of the simplest ways to find an undervalued company is through the comparables approach. Here is a detailed explanation from Investopedia: https://www.investopedia.com/articles/investing/080913/equity-valuation-comparables-approach.asp

"The basic premise of the comparables approach is that an equity’s value should bear some resemblance to other equities in a similar class. For a stock, this can simply be determined by comparing a firm to its key rivals, or at least those rivals that operate similar businesses. Discrepancies in the value between similar firms could spell opportunity. The hope is that it means the equity being valued is undervalued and can be bought and held until the value increases. The opposite could hold true, which could present opportunity for shorting the stock, or positioning one’s portfolio to profit from a decline in its price.

There are two primary comparable approaches. The first is the most common and looks at market comparables for a firm and its peers. Common market multiples include the following: enterprise value to sales (EV/S), enterprise multiple, price to earnings (P/E), price to book (P/B) and price to free cash flow (P/FCF)..."

However, to find a set of equities in a similar class, we often have to make many assumptions. So, in this algorithm, I used k-means clustering to try to group similar firms into comparable sets quantitatively (background on k-means: https://en.wikipedia.org/wiki/K-means_clustering). K-means groups the firms based on variables such as market cap, ROE, and ROA. We can then compute the EV/EBITDA multiple of each company in each group. According to the theory, the firms in the bottom 10-25% of each group by EV/EBITDA should be undervalued, and the firms in the top 10-25% should be overvalued.
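As a rough sketch of that selection rule, assuming the cluster labels are already known: the tickers, cluster IDs, and multiples below are made up for illustration, and `pick_tails` is a hypothetical helper, not part of the algorithm.

```python
import pandas as pd

# Hypothetical data: cluster labels and EV/EBITDA multiples are invented.
firms = pd.DataFrame(
    {
        "cluster": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
        "ev_ebitda": [4.0, 6.0, 8.0, 10.0, 12.0, 5.0, 7.0, 9.0, 11.0, 30.0],
    },
    index=["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
)

def pick_tails(df, lower=0.25, upper=0.75):
    """Return (longs, shorts): tickers strictly below / above the
    per-cluster EV/EBITDA quantile cutoffs."""
    by_cluster = df.groupby("cluster")["ev_ebitda"]
    # Broadcast each cluster's cutoff back onto its member rows.
    low_cut = by_cluster.transform(lambda s: s.quantile(lower))
    high_cut = by_cluster.transform(lambda s: s.quantile(upper))
    longs = df.index[df["ev_ebitda"] < low_cut].tolist()
    shorts = df.index[df["ev_ebitda"] > high_cut].tolist()
    return longs, shorts

longs, shorts = pick_tails(firms)
print(longs)   # ['A', 'F']
print(shorts)  # ['E', 'J']
```

Each firm is compared only against the cutoffs of its own cluster, so "J" is flagged as expensive relative to its peers rather than relative to the whole universe.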

However, k-means clustering cannot be applied directly to discrete variables, such as industry code, sector ID, or credit rating, because it relies on distances and averages, which have no meaning for category labels. So my algorithm only uses companies from the financial industry and does not include any discrete variables.
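A small illustration of the problem, assuming raw sector codes were fed to k-means as a numeric feature. Only the code 103 (financial) comes from the algorithm; the other two codes and the firm names are hypothetical placeholders.

```python
# Sector codes are labels, not quantities. Only 103 (financial) is taken
# from the algorithm; 101 and 311 are hypothetical placeholder codes.
sector_codes = {"bank": 103, "miner": 101, "software": 311}

def code_distance(a, b):
    """Distance k-means would see if the raw code were used as a feature."""
    return abs(sector_codes[a] - sector_codes[b])

d_bank_miner = code_distance("bank", "miner")
d_bank_software = code_distance("bank", "software")
print(d_bank_miner)     # 2
print(d_bank_software)  # 208
```

The bank looks about a hundred times closer to the miner than to the software firm, purely because of how the codes happen to be numbered.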

I have attached the result. I am brand new to Quantopian so please give me some advice on how to cluster firms more accurately and get around the discrete variables problem.

Thank you very much,

Thanh

from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline.classifiers.fundamentals import Sector
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import Fundamentals
from quantopian.pipeline.filters import Q1500US, Q500US
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans

def initialize(context):
    context.long_leverage = 0.5
    context.short_leverage = -0.5

    # Rebalance on the first trading day of each month at 11AM.
    schedule_function(rebalance,
                      date_rules.month_start(days_offset=0),
                      time_rules.market_open(hours=1, minutes=30))

    # Create and attach our pipeline (dynamic stock selector), defined below.
    attach_pipeline(make_pipeline(context), 'kmeans')

def make_pipeline(context):
    # Restrict the universe to financial-sector firms (sector code 103).
    sector_filter = Sector()
    financial_sector_filter = sector_filter.eq(103)
    universe = Q1500US()
    finance_universe = financial_sector_filter & universe

    # Continuous fundamentals used as clustering features.
    market_cap = Fundamentals.market_cap.latest
    enterprise_value = Fundamentals.enterprise_value.latest
    sustain_growth = Fundamentals.sustainable_growth_rate.latest
    ROA = Fundamentals.roa.latest
    ROE = Fundamentals.roe.latest
    ROIC = Fundamentals.roic.latest

    # Valuation multiple used for ranking within each cluster.
    EV_EBITDA = Fundamentals.ev_to_ebitda.latest

    #industry = Fundamentals.morningstar_industry_code.latest

    result = Pipeline(
        columns={
            'EV/EBITDA': EV_EBITDA,
            #'industry': industry,
            'enterprise value': enterprise_value,
            'market_cap': market_cap,
            'sustain growth': sustain_growth,
            'ROA': ROA,
            'ROE': ROE,
            'ROIC': ROIC
        },
        screen=finance_universe
    )
    return result

def before_trading_start(context, data):
    # Called before each trading day: cluster the firms, then pick longs and shorts.
    context.output = pipeline_output('kmeans')
    result = context.output.dropna(axis=0)

    # Keep EV/EBITDA out of the clustering features; it is only used for ranking.
    features = result.drop('EV/EBITDA', axis=1).values
    kmeans = KMeans(n_clusters=50).fit(features)  # fit into 50 groups
    result['Cluster'] = kmeans.labels_  # attach cluster IDs (0-49) to the pipeline output
    result = result.sort_values(by=['EV/EBITDA'])

    context.long_groups = []   # one list of long candidates per cluster
    context.short_groups = []  # one list of short candidates per cluster
    for x in range(50):
        group = result[result['Cluster'] == x]
        if 3 < len(group) < 15:  # skip clusters of three or fewer firms
            # bottom 25% EV/EBITDA, i.e. the undervalued firms
            group_bottom = group[group['EV/EBITDA'] < group['EV/EBITDA'].quantile(0.25)]
            # top 25% EV/EBITDA, i.e. the overvalued firms
            group_top = group[group['EV/EBITDA'] > group['EV/EBITDA'].quantile(0.75)]
            context.long_groups.append(group_bottom.index.tolist())
            context.short_groups.append(group_top.index.tolist())
        elif len(group) >= 15:
            # for clusters of 15 or more, only take the bottom and top EV/EBITDA deciles
            group_bottom = group[group['EV/EBITDA'] < group['EV/EBITDA'].quantile(0.1)]
            group_top = group[group['EV/EBITDA'] > group['EV/EBITDA'].quantile(0.9)]
            context.long_groups.append(group_bottom.index.tolist())
            context.short_groups.append(group_top.index.tolist())

    # flatten the per-cluster lists into single long and short lists
    context.long_groups = [val for sublist in context.long_groups for val in sublist]
    context.short_groups = [val for sublist in context.short_groups for val in sublist]
    context.groups = context.short_groups + context.long_groups

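def rebalance(context, data):
    # Close positions that are no longer in either list.
    for stock in context.portfolio.positions:
        if stock not in context.groups and data.can_trade(stock):
            order_target_percent(stock, 0)
    # Equal-weight the longs and the shorts.
    for stock in context.long_groups:
        order_target_percent(stock, context.long_leverage / len(context.long_groups))
    for stock in context.short_groups:
        order_target_percent(stock, context.short_leverage / len(context.short_groups))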
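
For anyone who wants to try the clustering step outside the backtester, here is a minimal sketch on synthetic data. The firm names and feature values are randomly generated, not real fundamentals, and the 3 clusters (instead of 50) are chosen only to keep the example small.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic fundamentals for 30 hypothetical firms (not real data).
rng = np.random.RandomState(0)
features = pd.DataFrame(
    rng.normal(size=(30, 3)),
    columns=["ROA", "ROE", "ROIC"],
    index=["firm_%d" % i for i in range(30)],
)

# Fit k-means on the feature matrix and attach one cluster ID per firm.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(features.values)
features["Cluster"] = kmeans.labels_

# Number of firms that landed in each cluster.
cluster_sizes = features["Cluster"].value_counts().sort_index()
print(cluster_sizes)
```

Fixing `random_state` makes the labels repeatable between runs, which helps when comparing one clustering against another.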