Alpha and Beta in Finance

In this short video, Max Margenot gives an overview of alpha and beta in finance. Max gives an intuitive description of market beta, explains how alpha is calculated, and shows how the two are used in finance and in algorithms on Quantopian.
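For readers who want to see the calculation concretely before watching, here is a minimal sketch (not taken from the video) of estimating beta and alpha by regressing an asset's daily returns on the market's daily returns; the data below is synthetic:

```python
import numpy as np

def alpha_beta(asset_returns, market_returns):
    """Estimate alpha (intercept) and beta (slope) by ordinary least squares."""
    # Design matrix: one column of market returns, one column of ones
    A = np.vstack([market_returns, np.ones(len(market_returns))]).T
    beta, alpha = np.linalg.lstsq(A, asset_returns, rcond=None)[0]
    return alpha, beta

# Synthetic daily returns: asset = 0.001 + 1.5 * market + noise
rng = np.random.default_rng(0)
market = rng.normal(0.0005, 0.01, 500)
asset = 0.001 + 1.5 * market + rng.normal(0, 0.002, 500)

alpha, beta = alpha_beta(asset, market)
print(beta)   # close to the true slope of 1.5
print(alpha)  # small positive number: return beyond market exposure
```

Beta measures how much of the asset's return is explained by market moves; alpha is what is left over after that exposure is stripped out, which is exactly the regression used in the code later in this thread.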

Learn more by subscribing to our Quantopian Channel to access all of our videos.

As always, if there are any topics you would like us to focus on for future videos, please comment below or send us a quick note at [email protected].

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

6 responses

Excellent, thanks Phoebe and Max!

I find these very helpful and informative as well. Keep'em coming! :)

I was playing around with a somewhat convoluted way of doing a momentum strategy. It filters the stock universe down to the stocks with the least volatility in their alpha, then goes long the stocks with the strongest alpha and shorts the ones with the weakest alpha. The hypothesis is that stocks that have exhibited the most consistent alpha may continue to exhibit consistent alpha.

The strategy produces a tiny bit of alpha, but not enough to overcome slippage.

I'm wondering if this line of thought is a dead end or if it is simply too naive as is. It currently uses the most naive version of alpha (it does not factor in Fama-French factors, etc.).

@Phoebe Foy, is there some simple way within a Quantopian CustomFactor to calculate alpha that factors in SMB, momentum, sectors, etc.?
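I'm not aware of a built-in for this, but the regression inside the Alpha factor generalizes directly: stack each additional factor's return series (SMB, momentum, etc.) as an extra column in the design matrix, and the intercept becomes the multi-factor alpha. A hedged sketch with synthetic series (the factor names here are placeholders, not Quantopian datasets):

```python
import numpy as np

def multifactor_alpha(asset_returns, factor_returns):
    """Intercept of a regression of asset returns on several factor return
    series. factor_returns has shape (n_days, n_factors)."""
    n = len(asset_returns)
    A = np.column_stack([factor_returns, np.ones(n)])  # factors + intercept
    coefs = np.linalg.lstsq(A, asset_returns, rcond=None)[0]
    return coefs[-1]  # last coefficient is the intercept, i.e. alpha

# Synthetic stand-ins for market, size, and momentum factor returns
rng = np.random.default_rng(1)
mkt = rng.normal(0, 0.01, 500)
smb = rng.normal(0, 0.005, 500)
mom = rng.normal(0, 0.005, 500)
asset = 0.0008 + 1.2 * mkt + 0.5 * smb - 0.3 * mom + rng.normal(0, 0.002, 500)

alpha = multifactor_alpha(asset, np.column_stack([mkt, smb, mom]))
```

Inside a CustomFactor you would build the factor-return columns from whatever windowed inputs you pass in; the linear algebra is the same.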

(Backtest attached; metrics not captured.)
"""
Alpha, selected for low alpha volatility
by Viridian Hawk

"""
import quantopian.algorithm as algo
import quantopian.optimize as opt
import math
import numpy as np
import pandas as pd
from quantopian.pipeline import Pipeline, CustomFactor
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import SimpleBeta, SimpleMovingAverage, AverageDollarVolume
from quantopian.pipeline.filters import StaticAssets

class Alpha(CustomFactor):
inputs = [USEquityPricing.close]
def compute(self, today, assets, out, close):
returns = pd.DataFrame(close, columns=assets).pct_change()[1:]
spy_returns = returns[symbol('SPY')]

# get beta and alpha by running linear regression
A = np.vstack([spy_returns, np.ones(len(spy_returns))]).T
m, p = np.linalg.lstsq(A, returns)[0]

out[:] = p

class Volatility(CustomFactor):
window_safe = True
def compute(self, today, assets, out, returns):
# [0:-1] is needed to remove last close since diff is one element shorter
daily_returns = np.diff(returns, axis = 0) / returns[0:-1]
out[:] = daily_returns.std(axis = 0) * math.sqrt(252)

def initialize(context):
"""
Called once at the start of the algorithm.
"""

# Rebalance every day, 1 hour after market open.
algo.schedule_function(
rebalance,
algo.date_rules.month_start(),
algo.time_rules.market_open(hours=1))
algo.schedule_function(
record_vars,
algo.date_rules.every_day(),
algo.time_rules.market_close(),
)

# Create our dynamic stock selector.
algo.attach_pipeline(make_pipeline(), 'pipeline')

def make_pipeline():

window_length = 120

beta = SimpleBeta(target=symbol('SPY'), regression_length=120)
m = beta.notnull()

m &= alpha.notnull()

m &= alphaStd.notnull()
m &= alphaStd.bottom(200)

pipe = Pipeline(
columns={
'alpha'   : alpha.zscore(),
'beta'    : beta,
},
screen=m
)
return pipe

context.output = algo.pipeline_output('pipeline')

def rebalance(context, data):
algo.order_optimal_portfolio(
objective=opt.MaximizeAlpha( context.output.alpha ),
constraints=[
opt.MaxGrossExposure( 1.00 ),
opt.NetExposure( -0.05, 0.05 ),
opt.PositionConcentration.with_equal_bounds( -0.005, 0.005 ),
opt.FactorExposure(
context.output[['beta']],
min_exposures={'beta': -0.00},
max_exposures={'beta':  0.00} ),
#opt.experimental.RiskModelExposure(
##opt.MaxTurnover( 0.25 if context.account.leverage > 0.85 and context.account.leverage <= 1.07  else 1.5 )
],
)

def record_vars(context, data):
longs = shorts = 0
for stock in context.portfolio.positions:
if context.portfolio.positions[stock].amount > 0:
longs += 1
elif  context.portfolio.positions[stock].amount < 0:
shorts += 1
record(longs = longs)
record(shorts = shorts)
record(l = context.account.leverage)

High of 63, ending at 38 vs. 30. See the notes in the code for the various experiments.
The dramatic increase from screening out certain stocks at the bottom end of alphaStd on this line suggests a few longs and/or shorts are dragging things down a lot and might be filterable in some other way; therein lies our challenge:
m &= alphaStd.percentile_between( 5, 80, mask=m)
It filters more than I expected.
Anyway, I hope someone finds something of interest here.
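The selection logic in the strategy can be sketched outside of pipeline with plain pandas; the numbers and ticker names below are made up purely for illustration:

```python
import pandas as pd

# Per-stock rolling alpha and alpha volatility (synthetic illustration)
df = pd.DataFrame({
    'alpha':    [ 0.8, -0.9, 0.1, 0.5, -0.4, 0.05],
    'alphaStd': [ 0.2,  0.3, 0.9, 0.4,  0.1, 1.2 ],
}, index=['AAA', 'BBB', 'CCC', 'DDD', 'EEE', 'FFF'])

# Keep the names with the most consistent (lowest-volatility) alpha,
# mirroring the alphaStd.bottom(...) screen
universe = df.nsmallest(4, 'alphaStd')

# Within that universe, long the strongest alpha and short the weakest
longs  = universe.nlargest(2, 'alpha').index.tolist()
shorts = universe.nsmallest(2, 'alpha').index.tolist()

print(longs)   # ['AAA', 'DDD']
print(shorts)  # ['BBB', 'EEE']
```

The pipeline version does the same thing with CustomFactors and a screen, and hands the surviving alphas to the optimizer instead of picking fixed-size long/short lists.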

(Backtest attached; metrics not captured.)
'''
Alpha, selected for low alpha volatility
by Viridian Hawk

modified, blue seahawk ...

Tweaking is considered overfitting, yet it can be educational at times.
Some surprising differences:
    window_length = 120   # 9.94
    window_length = 126   # 11.74  half year vs 120

    m &= alphaStd.bottom( c.num_stocks )                # 15.60

    # These two together
    m &= alphaStd.percentile_between( 5, 80, mask=m)    # 27.51    huh?
    m &= alphaStd.bottom( c.num_stocks )

Returns in comments are through 1-5-2012 only.
'''
import math
import numpy  as np
import pandas as pd
import quantopian.algorithm as algo
import quantopian.optimize  as opt
from quantopian.pipeline import Pipeline, CustomFactor
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import SimpleBeta


class Alpha(CustomFactor):
    inputs = [USEquityPricing.close]
    window_safe = True

    def compute(self, today, assets, out, close):
        close = nanfill(close)
        returns = pd.DataFrame(close, columns=assets).pct_change()[1:]
        spy_returns = returns[symbol('SPY')]
        # get beta and alpha by running linear regression
        A = np.vstack([spy_returns, np.ones(len(spy_returns))]).T
        m, p = np.linalg.lstsq(A, returns)[0]
        out[:] = p


class Volatility(CustomFactor):
    window_safe = True

    def compute(self, today, assets, out, returns):
        returns = nanfill(returns)
        # [0:-1] is needed to drop the last value since diff is one element shorter
        daily_returns = np.diff(returns, axis=0) / returns[0:-1]
        out[:] = daily_returns.std(axis=0) * math.sqrt(252)


def initialize(context):
    context.num_stocks = 200   # 11.74
    context.num_stocks = 100   # 13.89
    context.num_stocks =  80   # 11.08
    context.num_stocks = 140   # 14.61
    context.num_stocks = 125   # 15.30
    context.num_stocks = 120   # 15.37    (last assignment wins)
    #                                                  to 1-5-2012
    #                                       original:   9.06

    # Rebalance monthly, 1 hour after market open.
    algo.schedule_function(
        rebalance,
        algo.date_rules.month_start(),
        algo.time_rules.market_open(hours=1))

    algo.attach_pipeline(make_pipeline(context), 'pipeline')


def make_pipeline(c):
    # Here, regression_length=window_length
    window_length = 252   # 6.35
    window_length = 120   # 9.94
    window_length = 90    # 8.89
    window_length = 60    # 4.03
    window_length = 126   # 11.74  half year vs 120   (last assignment wins)

    # At this point I re-hard-coded regression_length=126.
    # Any testing you want to do now with window_length will differ from before.

    beta = SimpleBeta(target=symbol('SPY'), regression_length=126)          # 15.37
    # a rather illogical test. whatever works.
    beta = SimpleBeta(target=symbol('SPY'), regression_length=window_length).demean() # 15.60
    m = beta.notnull()

    alpha = Alpha(window_length=window_length, mask=m)              # 15.60
    #alpha = Alpha(window_length=window_length, mask=m).demean()    # 15.60
    m &= alpha.notnull()

    # volatility of the rolling alpha estimates
    # Moving this up above alpha = Alpha(w..., there's an error in Alpha(), so why not here?
    alphaStd = Volatility(inputs=[alpha], window_length=window_length, mask=m)
    m &= alphaStd.notnull()
    #m &= alphaStd.bottom( c.num_stocks )   # 15.60
    #m &= alphaStd.   top( c.num_stocks )   #  2.71

    # These two together
    #m &= alphaStd.percentile_between(10, 80, mask=m)   # 12.88
    m &= alphaStd.percentile_between( 5, 80, mask=m)    # 27.51    huh?
    #m &= alphaStd.percentile_between( 4, 80, mask=m)   # 24.95
    #m &= alphaStd.percentile_between( 6, 80, mask=m)   # 21.13
    m &= alphaStd.bottom( c.num_stocks )

    return Pipeline(
        columns = {
            'alpha'   : alpha.zscore(),
            'beta'    : beta,
            'alphaStd': alphaStd,
        },
        screen = m
    )


def rebalance(context, data):
    context.output = algo.pipeline_output('pipeline').dropna()

    longs = shorts = 0
    for stock in context.portfolio.positions:
        if context.portfolio.positions[stock].amount > 0:
            longs += 1
        elif context.portfolio.positions[stock].amount < 0:
            shorts += 1
    record(longs=longs)
    record(shorts=shorts)
    record(l=context.account.leverage)

    if 'log_data_done' not in context:    # show values once
        log_data(context, data, context.output, 4)

    a = context.output.alpha
    #a -= a.mean()        # centering, not necessary with demean() on it in pipeline
    conc = 1.0 / len(a)

    algo.order_optimal_portfolio(
        objective=opt.MaximizeAlpha( a ),
        constraints=[
            opt.MaxGrossExposure( 1.00 ),
            opt.NetExposure( -0.05, 0.05 ),
            opt.PositionConcentration.with_equal_bounds( -conc, conc ),
            opt.FactorExposure(
                context.output[['beta']],
                min_exposures={'beta': -0.00},
                max_exposures={'beta':  0.00} ),
            #opt.experimental.RiskModelExposure(
            ##opt.MaxTurnover( 0.25 if context.account.leverage > 0.85 and context.account.leverage <= 1.07 else 1.5 )
        ],
    )


def nanfill(_in):
    # Forward-fill nans row-wise.
    # From https://stackoverflow.com/questions/41190852/most-efficient-way-to-forward-fill-nan-values-in-numpy-array
    # Includes a way to count nans on webpage at
    #   https://www.quantopian.com/posts/forward-filling-nans-in-pipeline
    #return _in            # uncomment to not run the code below
    mask = np.isnan(_in)
    idx  = np.where(~mask, np.arange(mask.shape[1]), 0)
    np.maximum.accumulate(idx, axis=1, out=idx)
    _in = _in[np.arange(idx.shape[0])[:, None], idx]
    return _in


def log_data(context, data, z, num, fields=None):
    ''' Log pipeline output (DataFrame) or a Series: min/mid/max per column,
        nan counts, and the [num] highest and lowest values per column.
    '''
    if 'log_init_done' not in context:  # {:,} magic for adding commas
        log.info('${:,}    {} to {}'.format(int(context.portfolio.starting_cash),
                get_environment('start').date(), get_environment('end').date()))
        context.log_init_done = 1
    context.log_data_done = 1

    if not len(z):
        log.info('Empty')
        return

    # Options
    log_nan_only = 0          # Only log if nans are present
    show_sectors = 0          # If sectors, do you want to see them or not
    show_sorted_details = 1   # [num] high & low securities sorted, each column
    padmax = 6                # num characters for each field, starting point

    # Series ......
    if 'Series' in str(type(z)):    # is Series, not DataFrame
        nan_count = len(z[z != z])
        nan_count = 'NaNs {}/{}'.format(nan_count, len(z)) if nan_count else ''
        if (log_nan_only and nan_count) or not log_nan_only:
            if nan_count:
                log.info(nan_count)
            log.info('High\n{}'.format(z.sort_values(ascending=False).head(num)))
            log.info('Low\n{}' .format(z.sort_values(ascending=False).tail(num)))
        return

    # DataFrame ......
    content_min_max = [ ['', 'min', 'mid', 'max', ''] ] ; content = ''
    for col in z.columns:
        if col == 'sector' and not show_sectors: continue
        nan_count = len(z[col][z[col] != z[col]])
        nan_count = 'NaNs {}/{}'.format(nan_count, len(z)) if nan_count else ''
        # known bug, not always sorting strings alphabetically ...
        srt = z[col].sort_values() if type(z[col][0]) != str else z.iloc[z[col].str.lower().argsort()]
        content_min_max.append([col, str(srt[0]), str(srt[len(srt)//2]), str(srt[-1]), nan_count])
    if (log_nan_only and nan_count) or not log_nan_only:
        if len(z.columns) == 1: content = 'Stocks: {}'.format(z.shape[0])
        if len(z.columns)  > 1: content = 'Stocks: {}  Columns: {}'.format(z.shape[0], z.shape[1])
        if len(z.columns):
            paddings = [padmax for i in range(4)]
            for lst in content_min_max:            # set max lengths
                for i, val in enumerate(lst[:4]):  # value in each sub-list
                    paddings[i] = max(paddings[i], len(str(val)))
            for lst in content_min_max:            # populate content using max lengths
                content += ('\n{}{}{}{}     {}'.format(
                    str(lst[0]).rjust(paddings[0] + 2),
                    str(lst[1]).rjust(paddings[1] + 2),
                    str(lst[2]).rjust(paddings[2] + 2),
                    str(lst[3]).rjust(paddings[3] + 2),
                    lst[4],
                ))
        log.info(content)

    if not show_sorted_details: return
    if len(z.columns) == 1:     return     # skip detail if only 1 column
    details = z.columns if fields is None else fields
    for detail in details:
        if detail == 'sector' and not show_sectors: continue
        hi = z[details].sort_values(by=detail, ascending=False).head(num)
        lo = z[details].sort_values(by=detail, ascending=False).tail(num)
        if log_nan_only and not len(lo[lo[detail] != lo[detail]]):
            continue  # skip if no nans
        content  = ''
        content += ('_ _ _   {}   _ _ _'  .format(detail))
        content += ('\n\t... {} highs\n{}'.format(detail, str(hi)))
        content += ('\n\t... {} lows \n{}'.format(detail, str(lo)))
        log.info(content)
        #for i in range(0, len(content), 1000):  # chunk if too long for one log line
        #    log.info(content[i:i+1000])

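The forward-fill trick used by nanfill above (from the linked Stack Overflow answer) can be demonstrated standalone: for each position it records the column index of the most recent non-NaN value, then gathers from those indices.

```python
import numpy as np

def ffill_rows(arr):
    """Forward-fill NaNs along each row using the most recent non-NaN value."""
    mask = np.isnan(arr)
    # Column index of the latest non-NaN seen so far, per position
    idx = np.where(~mask, np.arange(arr.shape[1]), 0)
    np.maximum.accumulate(idx, axis=1, out=idx)
    # Gather each row's values at those indices
    return arr[np.arange(arr.shape[0])[:, None], idx]

a = np.array([[1.0, np.nan, np.nan, 4.0],
              [np.nan, 2.0, np.nan, np.nan]])
print(ffill_rows(a))
# Row 0 becomes [1, 1, 1, 4]; row 1's leading NaN stays NaN
# because there is no earlier value to copy forward.
```

Note it fills along axis 1, so whether rows should be dates or assets depends on how the windowed input is oriented; a leading NaN in a row is left in place.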

Since alpha is unknown, it is found by using beta to strip away the returns attributed to the market, and it can be based on a lot of things such as common factor risk, Quantopian risk-model risk, etc. Does this also imply that because we can quantify and isolate different market returns via beta, we are isolating the risk associated with those returns as well?

@Viridian Hawk @Blue Seahawk Can one of you explain how the &= operator is used in the algorithm code? Are you using it to do a set calculation to make m a big screen?

In &=, the & can be thought of as 'and'. Adding to the mask, m &= something says to restrict the mask further as the pipeline is processed.
So the collection of stocks in the mask (m here) becomes progressively smaller with each operation on it.
I have the impression that this route is important to avoid accidentally letting operations chew on larger sets than intended, which can then produce NaNs in the inputs to factors or gaps in zscore, rank, percentile_between, top, etc. So I always use mask=m with those, even when it isn't strictly necessary at the moment, because in future edits it could play a role.
Also, any m &= something line can be commented out in a comparison test to see what effect it has.
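The same progressive-narrowing idea can be seen outside of pipeline with plain numpy boolean arrays, where each &= step can only shrink the set of True entries (a sketch with made-up numbers, not pipeline code):

```python
import numpy as np

vol   = np.array([0.10, 0.30, 0.15, 0.50, 0.20])
alpha = np.array([0.02, 0.01, np.nan, 0.03, 0.015])

m = vol < 0.25            # start: low-volatility names   -> [T, F, T, F, T]
m &= ~np.isnan(alpha)     # and: alpha is defined         -> [T, F, F, F, T]
m &= alpha > 0.016        # and: alpha above a threshold  -> [T, F, F, F, F]

print(m.sum())  # 1 -- each &= can only remove candidates, never add them
```

In pipeline, m plays the same role over the whole universe of assets, and passing mask=m into each factor keeps the factor from computing (and possibly producing NaNs) on names the screen has already excluded.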