Possible Implementation Bug With Beta?

There is a possible inconsistency and/or vulnerability in code from the Q model algo cloned from here: Quantopian Risk Model In Algorithms. I would rather call it a possible implementation bug. This snippet of code pertaining to beta (also recently changed to SimpleBeta) is exposed for users to change:

# Alpha Combination
# -----------------
# Assign every asset a combined rank and center the values at 0.
# For UNIVERSE_SIZE=500, the range of values should be roughly -250 to 250.
combined_alpha = (fcf_zscore + yield_zscore + sentiment_zscore).rank().demean()

beta = 0.66*RollingLinearRegressionOfReturns(
    target=sid(8554),
    returns_length=5,
    regression_length=260,
).beta + 0.33*1.0
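For context, this expression shrinks a raw regression beta toward 1.0, in the spirit of the classic Blume adjustment (note the weights 0.66 and 0.33 sum to 0.99 rather than 1.0 as written). A minimal standalone sketch of the adjustment, with hypothetical raw beta values:

```python
def shrink_beta(raw_beta, w=0.66, prior=1.0, w_prior=0.33):
    """Blume-style shrinkage: pull a raw regression beta toward a prior of 1.0."""
    return w * raw_beta + w_prior * prior

# Extreme raw estimates get pulled toward 1; values near 1 barely move.
print(shrink_beta(2.0))   # 0.66*2.0 + 0.33 = 1.65
print(shrink_beta(1.0))   # 0.99
print(shrink_beta(0.2))   # 0.462
```

The design intent of such shrinkage is to damp estimation noise in the trailing regression, at the cost of biasing every estimate toward the market beta of 1.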


This beta calculation is used to constrain beta to minimum and maximum levels per user setting in Optimize API:

# Constrain beta-to-SPY to remain under the contest criteria.
beta_neutral = opt.FactorExposure(
    pipeline_data[['beta']],
    min_exposures={'beta': -0.05},
    max_exposures={'beta': 0.05},
)


The contest threshold for beta is +-0.30. If the user has the option to change the window length in the beta calculation, (1) doesn't this create inconsistencies with the +-0.30 beta threshold, because the beta distribution differs across window lengths, and (2) doesn't this create a vulnerability, in the sense that it could be used to "game" the algo if the beta constraint in the Optimize API actually worked? I think the beta calculation should be hardwired to standardized one-year daily returns and hidden from the user, perhaps in the risk loadings pipeline. The beta calculation should have a fixed window length that is not user configurable, to be standardized; otherwise the contest threshold for beta of +-0.30 becomes relative. So let's test this and see what the impact is against the baseline code. The only change we will make is regression_length=260 to regression_length=63. Below is first the baseline and next the revision:
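To illustrate why the window length matters, here is a small standalone sketch (plain numpy with synthetic data, not Quantopian code; ols_beta is a hypothetical helper) estimating beta against a market series with a 260-day and a 63-day regression window. When the stock's true beta drifts, the two windows produce noticeably different estimates, so a fixed +-0.30 threshold means different things depending on the window:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily returns: a market series, plus a stock whose true beta
# drifts from ~1.5 in the older history to ~0.5 in the most recent quarter.
n = 260
mkt = rng.normal(0.0, 0.01, n)
true_beta = np.linspace(1.5, 0.5, n)
stock = true_beta * mkt + rng.normal(0.0, 0.005, n)

def ols_beta(stock_ret, mkt_ret, window):
    """OLS regression beta over the trailing `window` observations."""
    s, m = stock_ret[-window:], mkt_ret[-window:]
    cov = np.cov(s, m, ddof=1)
    return cov[0, 1] / cov[1, 1]

beta_260 = ols_beta(stock, mkt, 260)  # "one year" window
beta_63 = ols_beta(stock, mkt, 63)    # "one quarter" window

# With a drifting true beta, the two window lengths give different
# estimates of the "current" beta.
print(beta_260, beta_63)
```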

[Backtest attachment: baseline algorithm. Metrics widget (Total Returns, Alpha, Beta, Sharpe, Sortino, Max Drawdown, Benchmark Returns, Volatility) not captured.]
import pandas as pd

import quantopian.algorithm as algo
import quantopian.optimize as opt

from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import builtin, Fundamentals, psychsignal
from quantopian.pipeline.factors import AverageDollarVolume, RollingLinearRegressionOfReturns
from quantopian.pipeline.factors.fundamentals import MarketCap
from quantopian.pipeline.classifiers.fundamentals import Sector

# Algorithm Parameters
# --------------------
UNIVERSE_SIZE = 1000
LIQUIDITY_LOOKBACK_LENGTH = 100

MAX_GROSS_LEVERAGE = 1.0
MAX_SHORT_POSITION_SIZE = 0.01  # 1%
MAX_LONG_POSITION_SIZE = 0.01   # 1%

def initialize(context):
    # Universe Selection
    # ------------------

    # From what remains, each month, take the top UNIVERSE_SIZE stocks by average dollar volume
    monthly_top_volume = (
        AverageDollarVolume(window_length=LIQUIDITY_LOOKBACK_LENGTH)
        .downsample('week_start')
    )
    # The final universe is the monthly top volume &-ed with the original base universe.
    # &-ing these is necessary because the top volume universe is calculated at the start
    # of each month, and an asset might fall out of the base universe during that month.
    universe = monthly_top_volume & base_universe

    # Alpha Generation
    # ----------------
    # Compute Z-scores of free cash flow yield and earnings yield.
    # Both of these are fundamental value measures.

    # Alpha Combination
    # -----------------
    # Assign every asset a combined rank and center the values at 0.
    # For UNIVERSE_SIZE=500, the range of values should be roughly -250 to 250.
    combined_alpha = (fcf_zscore + yield_zscore + sentiment_zscore).rank().demean()

    beta = 0.66*RollingLinearRegressionOfReturns(
        target=sid(8554),
        returns_length=5,
        regression_length=260,
    ).beta + 0.33*1.0

    # --------------
    # Create and register a pipeline computing our combined alpha and a sector
    # code for every stock in our universe. We'll use these values in our
    # optimization below.
    pipe = Pipeline(
        columns={
            'alpha': combined_alpha,
            'sector': Sector(),
            'sentiment': sentiment_zscore,
            'beta': beta,
        },
        # combined_alpha will be NaN for all stocks not in our universe,
        # but we also want to make sure that we have a sector code for everything
        screen=combined_alpha.notnull() & Sector().notnull() & beta.notnull(),
    )

    # Multiple pipelines can be used in a single algorithm.
    algo.attach_pipeline(pipe, 'pipe')

    # Schedule a function, 'do_portfolio_construction', to run once a week
    # ten minutes after market open.
    algo.schedule_function(
        do_portfolio_construction,
        date_rule=algo.date_rules.week_start(),
        time_rule=algo.time_rules.market_open(minutes=10),
        half_days=False,
    )

def before_trading_start(context, data):
    # Call pipeline_output in before_trading_start so that pipeline
    # computations happen in the 5 minute timeout of BTS instead of the 1
    # minute timeout of handle_data/scheduled functions.
    context.pipeline_data = algo.pipeline_output('pipe')

# Portfolio Construction
# ----------------------
def do_portfolio_construction(context, data):
    pipeline_data = context.pipeline_data

    # Objective
    # ---------
    # For our objective, we simply use our naive ranks as an alpha coefficient
    # and try to maximize that alpha.
    #
    # This is a **very** naive model. Since our alphas are so widely spread out,
    # we should expect to always allocate the maximum amount of long/short
    # capital to assets with high/low ranks.
    #
    # A more sophisticated model would apply some re-scaling here to try to generate
    # more meaningful predictions of future returns.
    objective = opt.MaximizeAlpha(pipeline_data.alpha)

    # Constraints
    # -----------
    # Constrain our gross leverage to 1.0 or less. This means that the absolute
    # value of our long and short positions should not exceed the value of our
    # portfolio.
    constrain_gross_leverage = opt.MaxGrossExposure(MAX_GROSS_LEVERAGE)

    # Constrain individual position size to no more than a fixed percentage
    # of our portfolio. Because our alphas are so widely distributed, we
    # should expect to end up hitting this max for every stock in our universe.
    constrain_pos_size = opt.PositionConcentration.with_equal_bounds(
        -MAX_SHORT_POSITION_SIZE,
        MAX_LONG_POSITION_SIZE,
    )

    # Constrain ourselves to allocate the same amount of capital to
    # long and short positions.
    market_neutral = opt.DollarNeutral()

    # Constrain beta-to-SPY to remain under the contest criteria.
    beta_neutral = opt.FactorExposure(
        pipeline_data[['beta']],
        min_exposures={'beta': -0.05},
        max_exposures={'beta': 0.05},
    )

    # Constrain exposure to common sector and style risk
    # factors, using the latest default values. At the time
    # of writing, those are +-0.18 for sector and +-0.36 for
    # style.
    constrain_sector_style_risk = opt.experimental.RiskModelExposure(
    )

    # Run the optimization. This will calculate new portfolio weights and
    # manage moving our portfolio toward the target.
    algo.order_optimal_portfolio(
        objective=objective,
        constraints=[
            constrain_gross_leverage,
            constrain_pos_size,
            market_neutral,
            constrain_sector_style_risk,
            beta_neutral,
        ],
    )

And here is the revised algo, which changes regression_length from 260 to 63. The differences from baseline are:
Total Returns +0.29%
Alpha - None, both the same
Beta -0.01
Sharpe -0.03
Sortino -0.05
Volatility - None, both the same
Max Drawdown +0.13

[Backtest attachment: revision (1) with regression_length=63. Metrics widget not captured.]
import pandas as pd

import quantopian.algorithm as algo
import quantopian.optimize as opt

from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import builtin, Fundamentals, psychsignal
from quantopian.pipeline.factors import AverageDollarVolume, RollingLinearRegressionOfReturns
from quantopian.pipeline.factors.fundamentals import MarketCap
from quantopian.pipeline.classifiers.fundamentals import Sector

# Algorithm Parameters
# --------------------
UNIVERSE_SIZE = 1000
LIQUIDITY_LOOKBACK_LENGTH = 100

MAX_GROSS_LEVERAGE = 1.0
MAX_SHORT_POSITION_SIZE = 0.01  # 1%
MAX_LONG_POSITION_SIZE = 0.01   # 1%

def initialize(context):
    # Universe Selection
    # ------------------

    # From what remains, each month, take the top UNIVERSE_SIZE stocks by average dollar volume
    monthly_top_volume = (
        AverageDollarVolume(window_length=LIQUIDITY_LOOKBACK_LENGTH)
        .downsample('week_start')
    )
    # The final universe is the monthly top volume &-ed with the original base universe.
    # &-ing these is necessary because the top volume universe is calculated at the start
    # of each month, and an asset might fall out of the base universe during that month.
    universe = monthly_top_volume & base_universe

    # Alpha Generation
    # ----------------
    # Compute Z-scores of free cash flow yield and earnings yield.
    # Both of these are fundamental value measures.

    # Alpha Combination
    # -----------------
    # Assign every asset a combined rank and center the values at 0.
    # For UNIVERSE_SIZE=500, the range of values should be roughly -250 to 250.
    combined_alpha = (fcf_zscore + yield_zscore + sentiment_zscore).rank().demean()

    beta = 0.66*RollingLinearRegressionOfReturns(
        target=sid(8554),
        returns_length=5,
        regression_length=63,  # was 260
    ).beta + 0.33*1.0

    # --------------
    # Create and register a pipeline computing our combined alpha and a sector
    # code for every stock in our universe. We'll use these values in our
    # optimization below.
    pipe = Pipeline(
        columns={
            'alpha': combined_alpha,
            'sector': Sector(),
            'sentiment': sentiment_zscore,
            'beta': beta,
        },
        # combined_alpha will be NaN for all stocks not in our universe,
        # but we also want to make sure that we have a sector code for everything
        screen=combined_alpha.notnull() & Sector().notnull() & beta.notnull(),
    )

    # Multiple pipelines can be used in a single algorithm.
    algo.attach_pipeline(pipe, 'pipe')

    # Schedule a function, 'do_portfolio_construction', to run once a week
    # ten minutes after market open.
    algo.schedule_function(
        do_portfolio_construction,
        date_rule=algo.date_rules.week_start(),
        time_rule=algo.time_rules.market_open(minutes=10),
        half_days=False,
    )

def before_trading_start(context, data):
    # Call pipeline_output in before_trading_start so that pipeline
    # computations happen in the 5 minute timeout of BTS instead of the 1
    # minute timeout of handle_data/scheduled functions.
    context.pipeline_data = algo.pipeline_output('pipe')

# Portfolio Construction
# ----------------------
def do_portfolio_construction(context, data):
    pipeline_data = context.pipeline_data

    # Objective
    # ---------
    # For our objective, we simply use our naive ranks as an alpha coefficient
    # and try to maximize that alpha.
    #
    # This is a **very** naive model. Since our alphas are so widely spread out,
    # we should expect to always allocate the maximum amount of long/short
    # capital to assets with high/low ranks.
    #
    # A more sophisticated model would apply some re-scaling here to try to generate
    # more meaningful predictions of future returns.
    objective = opt.MaximizeAlpha(pipeline_data.alpha)

    # Constraints
    # -----------
    # Constrain our gross leverage to 1.0 or less. This means that the absolute
    # value of our long and short positions should not exceed the value of our
    # portfolio.
    constrain_gross_leverage = opt.MaxGrossExposure(MAX_GROSS_LEVERAGE)

    # Constrain individual position size to no more than a fixed percentage
    # of our portfolio. Because our alphas are so widely distributed, we
    # should expect to end up hitting this max for every stock in our universe.
    constrain_pos_size = opt.PositionConcentration.with_equal_bounds(
        -MAX_SHORT_POSITION_SIZE,
        MAX_LONG_POSITION_SIZE,
    )

    # Constrain ourselves to allocate the same amount of capital to
    # long and short positions.
    market_neutral = opt.DollarNeutral()

    # Constrain beta-to-SPY to remain under the contest criteria.
    beta_neutral = opt.FactorExposure(
        pipeline_data[['beta']],
        min_exposures={'beta': -0.05},
        max_exposures={'beta': 0.05},
    )

    # Constrain exposure to common sector and style risk
    # factors, using the latest default values. At the time
    # of writing, those are +-0.18 for sector and +-0.36 for
    # style.
    constrain_sector_style_risk = opt.experimental.RiskModelExposure(
    )

    # Run the optimization. This will calculate new portfolio weights and
    # manage moving our portfolio toward the target.
    algo.order_optimal_portfolio(
        objective=objective,
        constraints=[
            constrain_gross_leverage,
            constrain_pos_size,
            market_neutral,
            constrain_sector_style_risk,
            # beta_neutral,
        ],
    )

It is also important to note that when I ran different scenarios by changing or tweaking user-configurable parameters, I got the same exact result as with revision (1): basically an improvement in total returns and maximum drawdown, a degradation in Sharpe and Sortino, and no effect on the others. Revision (2) uses SimpleBeta instead of the older RollingLinearRegressionOfReturns method and changes the window length from 260 to 63, and revision (3) eliminates all references to and calculations of beta, in line with my hypothesis in this thread: beta-constraint-in-risk-model-totally-unnecessary

Now, when I tried Grant's proposed tweak in the above-mentioned thread:

A brief note, in case folks are dinking around with trying to null out beta. I've found that a bias is required by the Optimize API constraints to see a dramatic effect:

MIN_BETA_EXPOSURE = -1
MAX_BETA_EXPOSURE = -0.5
I think what's going on is that for a volatile universe, the trailing beta factor is not predictive for stocks that are tending to run up with the bull market. So, a biased Optimize API constraint on beta is required to compensate for the lack of predictability.

So, in this Q model algo under examination, for revision (4) we change MIN_BETA_EXPOSURE = -1 and MAX_BETA_EXPOSURE = -0.5 per Grant, and I get this error:

Rows/Columns with NaNs:
row=Equity(8817 [GGP]) col='beta'
row=Equity(21284 [ITMN]) col='beta'
row=Equity(22269 [SLXP]) col='beta'
row=Equity(30464 [VHC]) col='beta'
Rows/Columns with Infs:
None
There was a runtime error on line 147.

I suspected it had something to do with the old beta calculation using the rolling-regression method, so I replaced it with SimpleBeta with default settings, which eliminated the error and also gave exactly the same results as revision (1). Here it is, with Grant's "wacky" tweak:

[Backtest attachment: revision (4) using SimpleBeta with Grant's biased beta constraint. Metrics widget not captured.]
import pandas as pd

import quantopian.algorithm as algo
import quantopian.optimize as opt

from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import builtin, Fundamentals, psychsignal
from quantopian.pipeline.factors import AverageDollarVolume, RollingLinearRegressionOfReturns, SimpleBeta
from quantopian.pipeline.factors.fundamentals import MarketCap
from quantopian.pipeline.classifiers.fundamentals import Sector

# Algorithm Parameters
# --------------------
UNIVERSE_SIZE = 1000
LIQUIDITY_LOOKBACK_LENGTH = 100

MAX_GROSS_LEVERAGE = 1.0
MAX_SHORT_POSITION_SIZE = 0.01  # 1%
MAX_LONG_POSITION_SIZE = 0.01   # 1%

def initialize(context):
    # Universe Selection
    # ------------------

    # From what remains, each month, take the top UNIVERSE_SIZE stocks by average dollar volume
    monthly_top_volume = (
        AverageDollarVolume(window_length=LIQUIDITY_LOOKBACK_LENGTH)
        .downsample('week_start')
    )
    # The final universe is the monthly top volume &-ed with the original base universe.
    # &-ing these is necessary because the top volume universe is calculated at the start
    # of each month, and an asset might fall out of the base universe during that month.
    universe = monthly_top_volume & base_universe

    # Alpha Generation
    # ----------------
    # Compute Z-scores of free cash flow yield and earnings yield.
    # Both of these are fundamental value measures.

    # Alpha Combination
    # -----------------
    # Assign every asset a combined rank and center the values at 0.
    # For UNIVERSE_SIZE=500, the range of values should be roughly -250 to 250.
    combined_alpha = (fcf_zscore + yield_zscore + sentiment_zscore).rank().demean()

    '''
    beta = 0.66*RollingLinearRegressionOfReturns(
        target=sid(8554),
        returns_length=5,
        regression_length=260,
    ).beta + 0.33*1.0
    '''
    beta = SimpleBeta(target=sid(8554), regression_length=260)

    # --------------
    # Create and register a pipeline computing our combined alpha and a sector
    # code for every stock in our universe. We'll use these values in our
    # optimization below.
    pipe = Pipeline(
        columns={
            'alpha': combined_alpha,
            'sector': Sector(),
            'sentiment': sentiment_zscore,
            'beta': beta,
        },
        # combined_alpha will be NaN for all stocks not in our universe,
        # but we also want to make sure that we have a sector code for everything
        screen=combined_alpha.notnull() & Sector().notnull() & beta.notnull(),
    )

    # Multiple pipelines can be used in a single algorithm.
    algo.attach_pipeline(pipe, 'pipe')

    # Schedule a function, 'do_portfolio_construction', to run once a week
    # ten minutes after market open.
    algo.schedule_function(
        do_portfolio_construction,
        date_rule=algo.date_rules.week_start(),
        time_rule=algo.time_rules.market_open(minutes=10),
        half_days=False,
    )

def before_trading_start(context, data):
    # Call pipeline_output in before_trading_start so that pipeline
    # computations happen in the 5 minute timeout of BTS instead of the 1
    # minute timeout of handle_data/scheduled functions.
    context.pipeline_data = algo.pipeline_output('pipe')

# Portfolio Construction
# ----------------------
def do_portfolio_construction(context, data):
    pipeline_data = context.pipeline_data

    # Objective
    # ---------
    # For our objective, we simply use our naive ranks as an alpha coefficient
    # and try to maximize that alpha.
    #
    # This is a **very** naive model. Since our alphas are so widely spread out,
    # we should expect to always allocate the maximum amount of long/short
    # capital to assets with high/low ranks.
    #
    # A more sophisticated model would apply some re-scaling here to try to generate
    # more meaningful predictions of future returns.
    objective = opt.MaximizeAlpha(pipeline_data.alpha)

    # Constraints
    # -----------
    # Constrain our gross leverage to 1.0 or less. This means that the absolute
    # value of our long and short positions should not exceed the value of our
    # portfolio.
    constrain_gross_leverage = opt.MaxGrossExposure(MAX_GROSS_LEVERAGE)

    # Constrain individual position size to no more than a fixed percentage
    # of our portfolio. Because our alphas are so widely distributed, we
    # should expect to end up hitting this max for every stock in our universe.
    constrain_pos_size = opt.PositionConcentration.with_equal_bounds(
        -MAX_SHORT_POSITION_SIZE,
        MAX_LONG_POSITION_SIZE,
    )

    # Constrain ourselves to allocate the same amount of capital to
    # long and short positions.
    market_neutral = opt.DollarNeutral()

    # Constrain beta-to-SPY to remain under the contest criteria.
    beta_neutral = opt.FactorExposure(
        pipeline_data[['beta']],
        min_exposures={'beta': -1.0},  # was -0.05
        max_exposures={'beta': -0.5},  # was 0.05
    )

    # Constrain exposure to common sector and style risk
    # factors, using the latest default values. At the time
    # of writing, those are +-0.18 for sector and +-0.36 for
    # style.
    constrain_sector_style_risk = opt.experimental.RiskModelExposure(
    )

    # Run the optimization. This will calculate new portfolio weights and
    # manage moving our portfolio toward the target.
    algo.order_optimal_portfolio(
        objective=objective,
        constraints=[
            constrain_gross_leverage,
            constrain_pos_size,
            market_neutral,
            constrain_sector_style_risk,
            # beta_neutral,
        ],
    )

Demeaning beta as a test, no difference.

context.pipeline_data = algo.pipeline_output('pipe')
context.pipeline_data.beta -= context.pipeline_data.beta.mean()
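One possible explanation for "no difference" here: under a dollar-neutral constraint the portfolio weights sum to (approximately) zero, so subtracting a constant from every beta loading cannot change any candidate portfolio's net beta exposure. A minimal sketch of that cancellation, in plain numpy with made-up numbers (not Quantopian code):

```python
import numpy as np

# Hypothetical dollar-neutral weights (they sum to zero) and beta loadings.
weights = np.array([0.3, 0.2, -0.3, -0.2])
betas = np.array([1.5, 0.9, 1.1, 0.4])

exposure = weights @ betas
exposure_demeaned = weights @ (betas - betas.mean())

# The mean shift contributes mean * sum(weights) == 0, so the two match.
print(exposure, exposure_demeaned)
```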


@Blue, when you say "no difference", is it no difference to baseline or revisions(1-4)?

Hi James -

I think the main issue here may be that in order for the Optimize API to be effective in keeping beta in check, it needs a beta forecast (just like an alpha factor needs to be predictive). The whacky thing I did, biasing the beta constraint, is probably plain old look-ahead bias (over-fitting...call it what you will). What's needed is a way to predict beta in such a way that the Optimize API can do a better job going forward, while still being able to trade volatile stocks that don't move in lock-step with the market. At least that's my intuition at this point.

Observations here, and simultaneously in Grant's thread beta-constraint-study, lead to several findings that confirm my hypothesis in the thread beta-constraint-in-risk-model-totally-unnecessary: beta is not only totally unnecessary, but also (1) the beta constraint in Optimize does not do a good job of constraining beta to the configured levels, (2) there is an implementation bug, since exposing the beta calculation as user configurable leads to inconsistencies and vulnerabilities, and (3) doing away with beta entirely has positive effects in terms of increased returns and improved drawdowns. In short, it does not work properly.

I suspect that nothing is broken. The fundamental issue is that if one picks stocks that are not volatile and have common betas (e.g. are all strongly correlated with SPY), then using a simple trailing window regression beta estimate is fine. However, it may be impossible to do much better than the risk-free rate trading them long-short. So, one needs to look to another universe of stocks, which is less predictable, and this is where things fall apart.

So, I think to move this forward, it would be of interest to hear of ways of predicting beta better as an input to the optimization.

@Grant, it is broken, simply does not work properly as intended. You can try making it work, good luck with that.

I think it is a question of what is broken. For example, the Optimize API seems o.k.:

class FactorExposure(loadings, min_exposures, max_exposures)

Constraint requiring bounded net exposure to a set of risk factors.

Factor loadings are specified as a DataFrame of floats whose columns are factor labels and whose index contains Assets. Minimum and maximum factor exposures are specified as maps from factor label to min/max net exposure.

Parameters:

loadings (pd.DataFrame) -- An (assets x labels) frame of weights for each (asset, factor) pair.
min_exposures (dict or pd.Series) -- Minimum net exposure values for each factor.
max_exposures (dict or pd.Series) -- Maximum net exposure values for each factor.
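To make the documented semantics concrete, here is a small pandas sketch (synthetic loadings and weights, not the Optimize internals) of the check the constraint expresses: for each factor column, the net exposure, i.e. the portfolio weights dotted with that column's loadings, must fall between the min and max bounds:

```python
import pandas as pd

# Hypothetical loadings frame: one 'beta' factor for four assets.
loadings = pd.DataFrame({'beta': [1.2, 0.8, 1.0, 0.7]},
                        index=['AAA', 'BBB', 'CCC', 'DDD'])
weights = pd.Series([0.25, 0.25, -0.25, -0.25], index=loadings.index)

def satisfies_factor_exposure(weights, loadings, min_exposures, max_exposures):
    """Check FactorExposure-style bounds: min <= w . loadings <= max per factor."""
    net = loadings.mul(weights, axis=0).sum()
    return all(min_exposures[f] <= net[f] <= max_exposures[f]
               for f in loadings.columns)

ok = satisfies_factor_exposure(weights, loadings,
                               {'beta': -0.05}, {'beta': 0.05})
print(ok)  # net beta exposure here is 0.075, outside [-0.05, 0.05]
```

Note that this is a constraint on the *net* (weighted sum) exposure, not on any individual asset's loading, which is why the overall mix of longs and shorts determines whether it binds.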


It does work, as I showed, by biasing the beta constraint (offsetting its mean from zero). So this would suggest that if one had a way of predicting the required offset better point-in-time, then it would work.

@Grant, one symptom that it is broken is that its purpose does not attain the desired objective. The purpose of constraining beta in the Optimize API is to comply with the beta threshold, a fixed range of +-0.30. Numerous examples that both you and I ran show that it is not doing its job of constraining beta to what the user set it to be, i.e. +-0.05. You can only validly test the effectiveness of constraining beta to desired levels as long as those levels are set within the bounds of the beta thresholds. Under your scenario, you have gone out of bounds of the beta thresholds and it gave you funky results. You have in effect changed the purpose of the beta constraint to something else! It is no longer meant to constrain beta to comply with the thresholds; you are now asking beta to "explore where it has never gone before", and that does not mean you will find the holy grail. It simply means it is broken.

one symptom that it is broken is when its purpose does not attain the desired objective

Agreed. The problem potentially, though, is that forecasting beta with complete accuracy is probably not so easy. I'm wondering if Alphalens could be adapted to look at beta as a factor, not for forecasting returns but forecasting itself?

@Grant, you are one tough nut to crack, LOL! Generally, if one can forecast a future variable with a statistically significant degree of accuracy, one is creating alpha. Empirical studies already show that the stock universe generally moves with the market, the underlying foundation of my hypothesis in the other thread. In beta-to-SPY terms computed using industry standards (one year of daily returns), this translates to a stock's beta almost always having a positive correlation with SPY and, on average, being highly correlated. Above-two-sigma or black swan scenarios do occur, e.g. the Facebook IPO, whose initial price movements may have been totally uncorrelated with SPY movements point-in-time. So if you can accurately predict beta and improve your returns, then yes, I would have to say you are creating alpha, because you have found a way to arbitrage price inefficiencies caused by beta variations. Good luck with that!

Yes, the idea that one is going to null out beta perfectly with a simple trailing beta indicator won't work so well (and I agree that it is a mirage that it is working, when dealing with a basket of stocks that are all highly correlated to SPY and the correlation persists...in this case, the dollar-neutral constraint suffices and additionally, it will be really hard to make money because of all of the CAPM/EMH arguments). Hence, I started exploring the extremes of the QTradableStocksUS. The hard part is getting beta ~ 0, since the Q recipe doesn't work (unless one is dealing with SPY-like baskets, which won't have much alpha in the first place). At least that's what I'm gathering from all of this...

To answer, there are three backtests above my demean test and I used the first one, baseline. The other two have beta_neutral commented out, no difference would be expected. I'd suggest maybe try flipping beta values and see if there is a difference. And then without market_neutral.

context.pipeline_data = algo.pipeline_output('pipe')
context.pipeline_data.beta *= -1


@Blue, forgive me for not understanding your answer (maybe my low IQ is kicking in!). Am I interpreting this correctly: you took the baseline code, demeaned the beta in the pipeline, commented out the beta_neutral constraint in the Optimize API, and got the same results as the baseline? This would not be surprising, because once you comment out the beta constraint in the Optimize API, no amount of data transformation (in your case, demeaning) of beta can or should have any effect on the algo's performance, since you have already eliminated the beta constraint from the Optimize API. It will still calculate the algo's ex post beta. Beta can only affect the algo's performance if you incorporate it as an alpha factor.

Whew, those were some impressive misunderstandings of what I said. Welp, we all have tough days so I can understand.
I'll just present more detail since I have more results to add now.
The idea is to throw bogus beta values at it to see how much difference they make.
This is a test, this is only a test. If this had been a competing algorithm the metrics would be better.

In the following, just like in my stuff above, beta_neutral is always active ...

Original baseline (the first backtest at the top of this page)
Ret    Bench   Alpha   Beta  Shrp  Srtno  Vltly    DD
37.21%  74.36%   0.09  -0.06  1.75   2.67   0.05  -4.22%

Original baseline with beta demeaned (all shifted downward by the average so-to-speak)
Ret    Bench   Alpha   Beta  Shrp  Srtno  Vltly    DD
37.23%  74.36%   0.09  -0.06  1.75   2.67   0.05  -4.22%

Original baseline with beta * -1 (positive made negative and negative made positive)
Ret    Bench   Alpha   Beta  Shrp  Srtno  Vltly    DD
37.21%  74.36%   0.09  -0.06  1.75   2.67   0.05  -4.22%

Original baseline with market_neutral deactivated, no other changes
Ret    Bench   Alpha   Beta  Shrp  Srtno  Vltly    DD
40.4%   74.36%   0.10  -0.04  1.90   2.92   0.05  -4.32%

Original baseline with market_neutral deactivated and beta * -1
Ret    Bench   Alpha   Beta  Shrp  Srtno  Vltly    DD
40.4%   74.36%   0.10  -0.04  1.90   2.92   0.05  -4.32%


Conclusion: beta_neutral appears to be having no effect, and there isn't enough information to draw much more of a conclusion, except that a theory could be that alpha and sentiment (incorporated into combined_alpha) are significant and beta is pretty much nothing by comparison. I imagine that can be the case when algo beta is already in fairly good shape.


Below is the code I used to multiply beta values by -1 to flip them upside-down. Now, someone might think: why on earth would you do that? Because when Optimize sees low beta (for example, below zero), I'm assuming it would apply a little more weight to stocks with higher beta values. Part of the overall complaint on this page, as I understand it, is that this may not be working right. So if Opt is in fact doing that, it would now be giving more weight to stocks that actually have lower beta values, and one could expect the algorithm's overall beta to move even further away from zero. Again, it is a test; the point is to learn something, and what's learned is that while I expected an effect, there is no effect.

def before_trading_start(context, data):
    context.pipeline_data = algo.pipeline_output('pipe')

    # Multiplying beta values by -1.
    # If beta_neutral is doing much, this should show quite a difference.
    context.pipeline_data['beta'] = context.pipeline_data['beta'].copy() * -1

    return  # Comment this to deactivate it if you want to verify that the changes stuck
    log.info('{}  {}'.format(
        '%.1f' % ori.beta,
        ori.index
    ))



A more effective test would be to check any allocation differences with and without beta_neutral along with their beta values. This was just quicker.

Hi James -

I don't have the financial modeling background that some here have, but I gather that there are several dimensions of this problem to consider:

1. The combination of the two constraints and their interplay: dollar-neutral and beta-neutral. Point-in-time, dollar neutrality means that if one closes out all of the long positions and ends up with some cash, and does the same with the short positions, there's nothing left. It can be implemented with an order_target_percent approach such that the portfolio weights sum to zero (although this doesn't guarantee dollar-neutrality, since one is relying on the orders being filled completely and at exactly the anticipated (i.e. forecast) prices). Under the constraints of the new contest (mirroring requirements for a hedge-fund-suitable algo), it appears to be reasonable. The expectation is that algos will be holding 100-150 stocks from the QTradableStocksUS at any point in time, with equal weights long and short. The beta-neutral constraint is an overlay on top of the dollar-neutral one, which says that point-in-time, the trailing beta with respect to SPY needs to be low (ideally beta ~ 0, but |beta| ~ 0.3 is acceptable). It is independent of the dollar-neutral constraint. For example, one could go long on the high-beta stocks in the QTradableStocksUS and go short on the low-beta ones, inducing a skew. Both dollar-neutrality and beta-neutrality are considered "risks" that need to be managed; Quantopian has provided constraints within the Optimize API to manage the risks point-in-time.
2. Managing the dollar-neutrality and beta-neutrality risks both require seeing into the future. For the former, one only needs to see a small delta_t into the future, and on a portfolio of 100-150 stocks, the discrepancies average out. The baseline approach to managing beta-neutrality is to use a 1-year historical measure of beta for each stock, and constrain orders under the assumption that the trailing beta measure will persist ("the calculation is now simply the 6 month rolling beta"). It would seem that achieving beta-neutrality is quite a different ballgame from that of dollar-neutrality, at least as it is implemented on Quantopian.
3. It is worth noting that there is no hard contest requirement to use the Optimize API in all of its glory; one just needs to use order_optimal_portfolio for all orders. So, for a given strategy, the recommended risk-management tools need not be applied; "roll your own" if you want (open-sourcing the Optimize API and risk model would help in this regard...still waiting for a response from Quantopian on this topic).
4. As illustrated on a different thread, the Optimize API standard beta-constraint approach can be used in a kludgey way to achieve beta neutrality, when it cannot be achieved with lower and upper beta constraints centered about beta = 0. Assuming that there is no actual "bug" in either the Optimize API implementation or the computation of the beta in the algo returns, then the example I provided highlights some room for improvement in the Quantopian tools for managing the beta risk. I do think that Quantopian should dig into this phenomenon (particularly if it really needs beta ~ 0 algos).
5. A separate, apparent argument on Quantopian is whether beta (at least in small doses) is a risk at all. The guidance I've gotten from Quantopian is that they and their potential customers value alpha not beta. The payout to algo authors might look something like:
(payout) = (percentage)(algo return)(1-|beta|)


If the payout relation above is at least roughly qualitatively correct, then trying to achieve beta ~ 0 is worthwhile for Quantopian quants (although perhaps if the algo return is a lot higher with |beta| > 0, then maybe beta ~ 0 would not be optimal for the quant).
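To make the trade-off in that speculative payout relation concrete, here is a toy calculation. Note that the formula is the guess stated above, not any published Quantopian payout terms, and the percentage, return, and beta values are invented:

```python
def payout(percentage, algo_return, beta):
    """Hypothetical author payout: (percentage)(algo return)(1 - |beta|).
    Mirrors the speculative relation above, not an actual payout formula."""
    return percentage * algo_return * (1.0 - abs(beta))

# A beta-neutral algo keeps the full return-based payout...
neutral = payout(0.10, 0.20, 0.0)
# ...while an algo at the |beta| = 0.3 contest limit gives up 30% of it.
at_limit = payout(0.10, 0.20, 0.3)
print(neutral, at_limit)
```

Under this (hypothetical) relation, a higher-beta algo would need its extra return to more than offset the (1 - |beta|) haircut before carrying the beta becomes worthwhile.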

Hi Grant, yes there are several dimensions to the problem. Let's take it up point by point.

1. Beta-neutral and dollar-neutral constraints are two separate and distinct means to achieve a similar objective: returns uncorrelated with the market returns (SPY), designed to mitigate/hedge market risk exposures. In the context of the new contest rules above, the beta constraint is given a leeway of +-0.3, pretty much industry standard for this type of strategy, and net dollar exposure of <= 0.10. The operation of these constraints is hidden under the hood of the Optimize API. We cannot see it, at least not yet, so it is hard to tell (a) whether it is programmatically accurate and serves its purpose, which is to constrain to its configured levels so as to stay within the desired thresholds for contest compliance, (b) whether there is precedence between these two constraints inside the operation of Optimize, (c) what exactly the net effect of the interaction of these two variables with very similar objectives is, and (d) probably other things. Fortunately, even without the benefit of seeing what is under the hood of Optimize, we can test the efficacy of the algorithm by asking a simple question: does it work as intended? In my many years of trading systems development, I often subscribe to the Occam's razor principle, which states that among competing hypotheses, the one with the fewest assumptions should be selected. The preference for simplicity in the scientific method is based on the falsifiability criterion. So I attack this problem with some knowledge of empirical facts, namely that the stock universe generally moves with the market as measured by beta to SPY, which almost always gives a positive and, on average, significant correlation. On the other hand, dollar neutrality assures that one has a fairly equal amount of longs and shorts to neutralize the exposure to market risk.
And this is the important part of my hypothesis: since there is significant correlation between an individual stock's movement and that of the market, the dollar-neutral operation may have already achieved beta neutrality (at least within the context of acceptable contest thresholds) as an unintentional side effect, because the betas of the longs cancel the betas of the shorts if the correlation persists and holds. While it is not a 100% guarantee, considering above-two-sigma or black swan scenarios, the beta constraint can be considered redundant if the effect of dollar neutrality alone satisfies both threshold conditions of beta and dollar neutrality in one operation, giving it algorithmic efficiency. This is how Occam's razor is applied: "shaving off" that which is unnecessary. We've both tested this by commenting out all references to beta, and at least for my own algos I even get a favorable effect in terms of increased returns and improved drawdowns. The other thing to note is that even if you comment out all references to beta in the exposed code, beta is still calculated internally in the Optimize API, and this could well be where the implementation bug resides. One beta calculation exposed to the user that is configurable and another beta calculation hidden in the Optimize API: how they interact with each other may be where the problem is. To me, this operation is broken.
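The cancellation argument can be sketched numerically: portfolio beta is just the weighted sum of individual stock betas, so if the longs and shorts have similar beta distributions, a dollar-neutral book nets out close to zero beta as a side effect. The betas below are invented for illustration; in practice they would come from a trailing regression against SPY:

```python
# Toy illustration of the hypothesis: portfolio beta = sum(weight_i * beta_i).
long_betas  = [1.10, 0.95, 1.05, 1.00]   # invented betas of long positions
short_betas = [1.00, 1.05, 0.90, 1.08]   # invented betas of short positions

n = len(long_betas) + len(short_betas)
w = 1.0 / n  # equal gross weight per position

# Dollar-neutral: long weights +w, short weights -w, summing to zero.
net_dollar = w * len(long_betas) - w * len(short_betas)
portfolio_beta = (sum(w * b for b in long_betas)
                  - sum(w * b for b in short_betas))

print(net_dollar)       # exactly zero: dollar-neutral
print(portfolio_beta)   # small residual, well inside the +-0.3 contest band
```

The residual beta here is only small because the long and short betas are drawn from similar ranges; a deliberate long-high-beta / short-low-beta skew (as in Grant's point 1) would break the cancellation, which is why it is a hypothesis to test rather than a guarantee.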

2. Beta and dollar neutrality are not risks per se but rather means to manage the risk of market exposure, which is inherent to this style of strategy.

4. I think the results you are getting are more an effect of the broken system, for which tests show the following symptoms, among others: (1) the beta constraint in Optimize does not do a good job of constraining beta to the configured levels, (2) an implementation bug, in that exposing the beta calculation to user configuration leads to inconsistencies and vulnerabilities, (3) doing away with beta entirely has positive effects in terms of increased returns and improved drawdowns, and (4) the effects of the interplay between the exposed beta calculation and the hidden one are unknown.

5. No comment.

Thanks James -

Regarding transparency (i.e. open-sourced code), I'm not sure what's going on with Quantopian these days. To my knowledge, none of these critical modules have been open-sourced:

1. Optimize API
2. Risk model pipeline factor
3. SimpleBeta pipeline factor

Maybe they are still catching up with sorting out how to publish the code? Or it won't be published? In any case, I agree that there is some unnecessary black-box code we are dealing with.

Along these lines, frankly I'm not exactly sure what goes on inside order_optimal_portfolio. I think that this should be kinda-sorta equivalent to order_target_percent:

objective = opt.TargetWeights(weights)

order_optimal_portfolio(
    objective=objective,
    constraints=[],
)


where weights is a Pandas series (or Python dict) of the stocks and their weights. I think it still engages the optimization apparatus; it doesn't bypass it and pass through to order_target_percent (or whatever is there in its place). I suppose one could set up a test. Frankly, however, it is kinda silly trying to reverse-engineer what might be going on under the hood, when the code could simply be published.
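One such sanity check, outside the platform: if TargetWeights with constraints=[] really does run an optimization, the unconstrained problem min ||w - w_target||^2 should still return w_target itself, making it behaviorally equivalent to order_target_percent. A pure-Python stand-in for that equivalence (not the actual Optimize API internals, which are not public; the tickers and weights are made up):

```python
def unconstrained_target_weights(target, steps=500, lr=0.1):
    """Minimize sum((w - target)^2) by gradient descent with no constraints.
    A stand-in for what opt.TargetWeights with constraints=[] ought to
    reduce to; the real Optimize API internals are not published."""
    w = {sid: 0.0 for sid in target}  # start from an empty book
    for _ in range(steps):
        for sid in w:
            w[sid] -= lr * 2.0 * (w[sid] - target[sid])  # gradient step
    return w

target = {'AAPL': 0.10, 'XOM': -0.10, 'MSFT': 0.05}  # hypothetical weights
solved = unconstrained_target_weights(target)
# With no constraints, the optimum is the target itself, i.e. the same
# orders an order_target_percent loop over `target` would produce.
print(solved)
```

The point of the sketch is that any daylight between solved and target on the platform would have to come from constraints or other machinery inside the black box, which is exactly what one would be testing for.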

Regarding controlling beta with the Optimize API, your Occam's Razor approach is probably spot on. Generally, if the optimizer is doing a lot of shuffling around, one probably isn't close enough to a solution in the first place and should sort out how to input an alpha factor (set of weights) that is closer to the desired output, across the various dimensions of contest/fund requirements.
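One way to make "closer to the desired output" concrete for the beta dimension: before handing weights to the optimizer, check the exposure the beta constraint will enforce, i.e. whether sum(w_i * beta_i) already lies within the configured band (here +-0.05, matching the FactorExposure snippet at the top of the thread; the weights and betas below are invented):

```python
def beta_exposure(weights, betas):
    """The dot product that a FactorExposure-style beta constraint bounds,
    sketched in plain Python."""
    return sum(weights[sid] * betas[sid] for sid in weights)

def within_band(exposure, lo=-0.05, hi=0.05):
    # Same band as the min_exposures/max_exposures settings quoted above.
    return lo <= exposure <= hi

weights = {'A': 0.25, 'B': 0.25, 'C': -0.25, 'D': -0.25}  # dollar-neutral book
betas   = {'A': 1.05, 'B': 0.95, 'C': 1.00, 'D': 1.02}    # invented trailing betas

exposure = beta_exposure(weights, betas)
print(exposure, within_band(exposure))
```

If the input weights already pass this check, the optimizer has little shuffling to do for that constraint, which is the "close enough to a solution in the first place" situation described above.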

@Grant, IMHO and from a bigger-picture perspective, I think in Quantopian's quest to find their true alpha under this long/short market-neutral strategy framework, they have loaded it with too many constraints without thoroughly understanding the effects of these different constraints on their ultimate objective(s). Here is where Occam's razor can come in handy: eliminate what is unnecessary. Less is more, if only on the basis that it is more testable and verifiable. In real-world applications by major hedge-fund players of this type of strategy, most that I know of limit themselves to 2 or 3 constraints to achieve their desired objective. Simply put, there is too much overkill.