(Partial) implementation of Quantitative Value algorithm

Here's my implementation of the algorithm in Quantitative Value, following the discussion in this post. There are a few significant differences between this algorithm and the one in the book, owing to differences between the Morningstar data and the data available to the authors. Also, Quantopian's data set extends back only 12 years, so putting in the full 8-year margin and franchise power computations didn't make a lot of sense. I also have not made any explicit effort to restrict the stock universe, save for excluding one stock that for some reason was breaking the backtest.

For this post I have removed several variations that have boosted returns to 750%, partly because they would distract from the core algorithm in the text, and partly because the core algorithm seems a more sensible starting point for exploring other ideas. Note that there is no leverage and no short positions in this algorithm.

import datetime as dt
import numpy as np
import pandas as pd

from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline import CustomFactor, Pipeline
from quantopian.pipeline.data import morningstar
from quantopian.pipeline.factors import Returns, AverageDollarVolume
from zipline.api import get_environment
from zipline.finance.commission import PerShare

###----------------------------------------
### Key Constants

benchmark = symbol('SPY')
initial_leverage = 0.95

stocks_to_avoid = [symbol('SXCL')]

###----------------------------------------
### Custom Factors

class MedianValue(CustomFactor):
    def compute(self, today, assets, out, data):
        out[:] = np.nanmedian(data, axis=0)

###----------------------------------------
### Utility

# Best guess as to checking type of object in Quantopian
def is_equity(stock):
    return type(stock).__name__.find('Equity') != -1

def is_position(stock):
    return type(stock).__name__.find('Position') != -1

def get_symbol(stock):
    try:
        return stock.symbol
    except:
        log.info('Stock does not have attribute symbol: {0}'.format(stock))
        return str(stock)

def log_positions_and_sizes(prefix, positions):
    pns = {}
    for stock in positions:
        if is_equity(stock):
            pns[get_symbol(stock)] = positions[stock].amount
#    log.info(prefix + ' ' + ' Positions: ' + str(pns))

def is_valid_position(p):
    today = get_datetime()
    return (is_position(p) and (p.amount > 0) and ((today - p.sid.end_date).days < 2))

def valid_positions(positions):
    return dict([[stock, positions[stock]] for stock in positions
                 if (is_equity(stock) and is_valid_position(positions[stock]))])

def count_positions(positions):
    return len(valid_positions(positions))

# Geometric mean of (1 + v) over the list of values, minus 1
def geometric_means(values):
    result = 1.0
    for v in values:
        result = result * (v + 1.0)
    return result**(1.0/len(values)) - 1.0

def ranked_geometric_means(values, ascending=False):
    return geometric_means(values).dropna().rank(method='first', ascending=ascending)

def ranked_sum(values, ascending=False):
    vsum = 0.0
    for v in values:
        vsum = vsum + v
    return vsum.dropna().rank(method='first', ascending=ascending)

# series is a list of pandas.Series
# What's the plural of series?
def df_from_series(series, prefix):
    result = None
    for i in range(0, len(series)):
        dfi = series[i].to_frame(name=(prefix + str(i)))
        if result is None:
            result = dfi
        else:
            result = result.join(dfi)
    return result

def df_for_column(dfs, column_name):
    return df_from_series([df.loc[column_name] for df in dfs], column_name)

def n_years_ago(n):
    today = get_datetime()
    day = 28 if today.day == 29 and today.month == 2 else today.day
    return today.replace(year=(today.year - n), day=day)

def positive_to_one(x):
    return 1.0 if x > 0 else 0.0

###----------------------------------------
### Tasks

### COMBOACCRUAL

# STA

def compute_sta(today_df, last_year_df):
    diff_df = today_df - last_year_df
    ca = diff_df.loc['current_assets'] - diff_df.loc['cash_cash_equivalents_and_marketable_securities']
    cl = diff_df.loc['current_liabilities'] - diff_df.loc['current_debt']
    return ((ca - cl - today_df.loc['depreciation_and_amortization'])/today_df.loc['total_assets'])

# SNOA

def compute_snoa(today_df):
    total_assets = today_df.loc['total_assets']
    cash = today_df.loc['cash_cash_equivalents_and_marketable_securities']
    total_liabilities = today_df.loc['total_liabilities']
    payables = today_df.loc['payables_and_accrued_expenses']
    return ((total_assets - cash) - (total_liabilities - payables))/total_assets

# Comboaccrual

def query_comboaccrual_metrics(date):
    return get_fundamentals(
        query(fundamentals.balance_sheet.current_assets,
              fundamentals.balance_sheet.current_liabilities,
              fundamentals.balance_sheet.current_debt,
              # Commented out because proportion of companies
              # reporting this figure is small
              # fundamentals.balance_sheet.income_tax_payable,
              fundamentals.cash_flow_statement.depreciation_and_amortization,
              fundamentals.balance_sheet.total_assets,
              fundamentals.balance_sheet.cash_cash_equivalents_and_marketable_securities,
              fundamentals.balance_sheet.total_liabilities,
              fundamentals.balance_sheet.payables_and_accrued_expenses),
        date.strftime('%Y-%m-%d'))

def compute_comboaccrual_factors(context, data):
    ca_df = query_comboaccrual_metrics(get_datetime())
    ca_df_1 = query_comboaccrual_metrics(n_years_ago(1))
    context.sta = compute_sta(ca_df, ca_df_1)
    context.snoa = compute_snoa(ca_df)

### Franchise Power

# Quantitative Value suggests 8 year geometric mean for ROA, ROC, FCFA, MG,
# MS, MM. But that's expensive, and leaves very few years when Quantopian
# can run the backtest. I will implement n year, with n = 4.

def query_fp_metrics(date):
    return get_fundamentals(
        query(fundamentals.income_statement.net_income_continuous_operations,
              fundamentals.balance_sheet.total_assets,
              fundamentals.balance_sheet.net_ppe,
              fundamentals.income_statement.ebit,
              fundamentals.balance_sheet.working_capital,
              fundamentals.operation_ratios.gross_margin,
              fundamentals.cash_flow_statement.free_cash_flow),
        date.strftime('%Y-%m-%d'))

## ROA

def roa(df):
    return (df.loc['net_income_continuous_operations']/df.loc['total_assets'])

## ROC

def roc(df):
    capital = df.loc['net_ppe'] + df.loc['working_capital']
    return (df.loc['ebit']/capital)

## FCFA

def fcf_on_assets(df):
    return (df.loc['free_cash_flow']/df.loc['total_assets'])

## MG

def mg(df, df_1):
    return (df.loc['gross_margin']/df_1.loc['gross_margin'])

def ranked_margin_growth(dfs):
    mgs = []
    for i in range(0, len(dfs) - 1):
        mgs.append(mg(dfs[i], dfs[i + 1]).fillna(value=0))
    return ranked_geometric_means(mgs)

## MS

def ranked_margin_stability(dfs):
    gms_df = df_for_column(dfs, 'gross_margin').fillna(value=0.0)
    mean_gm = gms_df.mean(axis=1)
    sd_gm = gms_df.std(axis=1)
    ms = (mean_gm/sd_gm).dropna()
    return ms.rank(method='first', ascending=False)

## Net FP

def rank_franchise_power(context, data):
    df_years = [query_fp_metrics(get_datetime()),
                query_fp_metrics(n_years_ago(1)),
                query_fp_metrics(n_years_ago(2)),
                query_fp_metrics(n_years_ago(3))]
    roas = {'rroa': ranked_geometric_means([roa(df) for df in df_years]),
            'rroc': ranked_geometric_means([roc(df) for df in df_years]),
            'cfoa': ranked_sum([fcf_on_assets(df) for df in df_years])}
    roas_df = pd.DataFrame(roas)
    mroas = roas_df[['rroa', 'rroc', 'cfoa']].mean(axis=1)
    rmg = ranked_margin_growth(df_years)
    rms = ranked_margin_stability(df_years)
    rmm = df_from_series([rmg, rms], 'mm').max(axis=1)
    context.fp = ((mroas + rmm)/2).dropna()

### Financial Strength

def query_fs_metrics(date):
    return get_fundamentals(
        query(fundamentals.income_statement.net_income_continuous_operations,
              fundamentals.balance_sheet.total_assets,
              fundamentals.operation_ratios.gross_margin,
              fundamentals.cash_flow_statement.free_cash_flow,
              fundamentals.balance_sheet.long_term_debt,
              fundamentals.operation_ratios.current_ratio,
              fundamentals.operation_ratios.assets_turnover),
        date.strftime('%Y-%m-%d'))

## FS_ROA, FS_FCFTA, FS_LEVER

def fcfta(df):
    return df.loc['free_cash_flow']/df.loc['total_assets']

def fs_profitability(df0):
    roa0 = df0.loc['net_income_continuous_operations']/df0.loc['total_assets']
    fcfta0 = fcfta(df0)
    accrual0 = fcfta0 - roa0
    fs_roa = roa0.dropna().apply(positive_to_one)
    fs_fcfta = fcfta0.dropna().apply(positive_to_one)
    fs_accrual = accrual0.dropna().apply(positive_to_one)
    return fs_roa, fs_fcfta, fs_accrual

## STABILITY

def fs_lever(df0, df1):
    lever = (df1.loc['long_term_debt']/df1.loc['total_assets'] -
             df0.loc['long_term_debt']/df0.loc['total_assets'])
    return lever.dropna().apply(positive_to_one)

def fs_liquid(df0, df1):
    liquid = df0.loc['current_ratio'] - df1.loc['current_ratio']
    return liquid.dropna().apply(positive_to_one)

## Operational Improvements

def fs_droa(df0, df1):
    return (roa(df0) - roa(df1)).dropna().apply(positive_to_one)

def fs_dfcfta(df0, df1):
    return (fcfta(df0) - fcfta(df1)).dropna().apply(positive_to_one)

def fs_dmargin(df0, df1):
    return (df0.loc['gross_margin'] - df1.loc['gross_margin']).dropna().apply(positive_to_one)

def fs_dturn(df0, df1):
    return (df0.loc['assets_turnover'] - df1.loc['assets_turnover']).dropna().apply(positive_to_one)

## Rank FS

def rank_financial_strength(context, data):
    df0 = query_fs_metrics(get_datetime())
    df1 = query_fs_metrics(n_years_ago(1))
    fs_roa, fs_fcfta, fs_accrual = fs_profitability(df0)
    fs_total = (fs_roa + fs_fcfta + fs_accrual + fs_lever(df0, df1) + fs_liquid(df0, df1) +
                fs_droa(df0, df1) + fs_dfcfta(df0, df1) + fs_dmargin(df0, df1) + fs_dturn(df0, df1))
    context.fs = (fs_total/9.0).rank(method='first', ascending=False)

### Pipeline Aggregation

def process_pipeline(context, data):
    price_data_df = pipeline_output('quant_val_pipeline').dropna()
    sta = context.sta.fillna(value=context.sta.mean())
    snoa = context.snoa.fillna(value=context.snoa.mean())
    comboaccrual_df = sta.to_frame(name='sta').join(snoa.to_frame('snoa'))
    aggregate_df = (price_data_df
                    .join(comboaccrual_df)
                    .join(context.fp.to_frame('fp_rank'))
                    .join(context.fs.to_frame('fs_rank')))
    aggregate_df['sta_rank'] = aggregate_df['sta'].rank(method='first', ascending=True)
    aggregate_df['snoa_rank'] = aggregate_df['snoa'].rank(method='first', ascending=True)
    aggregate_df['comboaccrual'] = aggregate_df[['sta_rank', 'snoa_rank']].max(axis=1)
    accrual_p95 = aggregate_df['comboaccrual'].quantile(0.95)
    ebit_ev_p95 = aggregate_df['ebit_ev'].quantile(0.95)
    accrual_filtered = (aggregate_df
                        [aggregate_df['comboaccrual'] < accrual_p95]
                        [aggregate_df['ebit_ev'] > ebit_ev_p95]).copy()
    accrual_filtered['re_fp_rank'] = accrual_filtered['fp_rank'].rank(method='first', ascending=True)
    accrual_filtered['re_fs_rank'] = accrual_filtered['fs_rank'].rank(method='first', ascending=True)
    accrual_filtered['combined_rank'] = (accrual_filtered['re_fp_rank']*0.5 +
                                         accrual_filtered['re_fs_rank']*0.5)
    best = accrual_filtered.sort('combined_rank', ascending=True)
    best_name_filtered = [stock for stock in best.index
                          if (('_' not in stock.symbol) and
                              (stock not in stocks_to_avoid))]
    context.stocks = best_name_filtered[:context.positions_considered]
    log.info('Selection stats: ' + str(
        {'price_count': len(price_data_df),
         'sta_count': len(sta),
         'snoa_count': len(snoa),
         'comboaccrual_count': len(comboaccrual_df),
         'aggregate_count': len(aggregate_df),
         'considered_count': context.positions_considered,
         'fp_rank': len(context.fp),
         'fs_rank': len(context.fs),
         'best_count': len(best)}))
    context.sta_rank = None
    context.snoa_rank = None
    context.fp = None
    context.fs = None

def run_stock_selection(context, data):
    log.info('Scheduling stock selection')
    context.stocks = []
    compute_comboaccrual_factors(context, data)
    rank_franchise_power(context, data)
    rank_financial_strength(context, data)
    process_pipeline(context, data)
    log.info('Stock selection completed')
    context.enable_trading = True

# Control

def select_stocks_if_needed(context, data):
    today = get_datetime()
    if not context.last_rebalanced or (today - context.last_rebalanced).days > 365:
        run_stock_selection(context, data)
        context.run_sell = True
        context.run_buy = True
        context.last_rebalanced = today

###----------------------------------------
### Initialize

def update_portfolio_parameters(context):
    # Positions to maintain
    new_target = int((context.portfolio.portfolio_value + 100000)**0.25)
    curr_count = count_positions(context.portfolio.positions)
    context.position_target = new_target
    # Don't reduce position count by more than one
    if new_target < curr_count:
        context.position_target = curr_count - 1
    context.positions_considered = int(context.position_target*context.positions_considered_factor)

def set_context_parameters(context):
    # Proportion of the portfolio value we want as cash
    context.target_leverage = initial_leverage
    # Positions considered factor
    context.positions_considered_factor = 1.5
    context.stocks = []
    context.enable_trading = False
    # We'll assume we aren't starting with any untradable positions,
    # or with a leveraged portfolio
    update_portfolio_parameters(context)
    context.loss = 0.0
    context.stocks_to_close = set()
    context.stocks_to_open = set()
    context.last_rebalanced = None
    context.run_sell = False
    context.run_buy = False

def setup_pipeline(context):
    pipe = Pipeline()
    pipe = attach_pipeline(pipe, name='quant_val_pipeline')
    mkt = MedianValue([morningstar.valuation.market_cap], window_length=10)
    dv = AverageDollarVolume(window_length=20)
    ebit_ev = morningstar.income_statement.ebit.latest/morningstar.valuation.enterprise_value.latest
    net_filter = (mkt.percentile_between(20.0, 100.0) &
                  dv.percentile_between(40.0, 100.0))
    pipe.add(ebit_ev, 'ebit_ev')
    pipe.set_screen(net_filter)

###----------------------------------------
### Rebalance

### Both

# For each stock in positions, check where it lands in the ordered list of
# desired stocks. That position is the stock rank.
def positions_ranks(positions, desired):
    return [desired.index(stock) for stock in positions if stock in desired]

def compute_position_value(context):
    # position_value = cash_to_invest/target_position_count
    cash_to_invest = context.portfolio.portfolio_value*context.target_leverage
    return cash_to_invest/context.position_target

### Closing

def close_deselected_positions(context, close_position_handler):
    this_closed = []
    for stock in context.portfolio.positions:
        if (stock not in context.stocks) and close_position_handler(stock):
            this_closed.append(stock.symbol)
    log.info('Closing deselected positions ' + str(this_closed))

def close_excess_positions(context, close_position_handler, to_close, ranks):
    sorted_ranks = list(ranks)
    this_closed = []
    sorted_ranks.sort(reverse=True)
    for rank in sorted_ranks:
        if to_close == 0:
            break
        stock = context.stocks[rank]
        if close_position_handler(stock):
            this_closed.append(stock.symbol)
            to_close -= 1
    log.info('Closing excess positions ' + str(this_closed))

def rebalance_close(context, data):
    closed_stocks = set()
    def close_position_handler(stock):
        if ((stock not in context.portfolio.positions) or
            (context.portfolio.positions[stock].amount == 0) or
            (stock in closed_stocks)):
            return False
        else:
            closed_stocks.add(stock)
            return True
    # 1. close deselected positions
    close_deselected_positions(context, close_position_handler)
    # 2. rank positions
    ranks = positions_ranks(set(context.portfolio.positions) - set(closed_stocks), context.stocks)
    # 3. compute to_open = target_position_count - #all_positions + #closed
    to_open = (context.position_target - len(context.portfolio.positions) + len(closed_stocks))
    # 4. if to_open < 0: close highest ranked positions
    if to_open < 0:
        close_excess_positions(context, close_position_handler, -to_open, ranks)
    return closed_stocks

def manage_rebalance_close(context, data):
    log_positions_and_sizes('Starting Rebalance Close', context.portfolio.positions)
    def really_close():
        all_can_trade = data.can_trade(context.portfolio.positions.keys())
        new_to_close = context.stocks_to_close.copy()
        for stock in context.stocks_to_close:
            p = stock in context.portfolio.positions and context.portfolio.positions[stock]
            if not p or p.amount == 0:
                new_to_close.remove(stock)
            elif stock in all_can_trade and all_can_trade[stock]:
                order_target_value(stock, 0.0)
        context.stocks_to_close = new_to_close
        if new_to_close:
            log.info('Not confirmed closed: ' + str([s.symbol for s in new_to_close]))
        else:
            context.run_sell = False
            log.info('All confirmed closed')
    if context.stocks_to_close:
        really_close()
    else:
        context.stocks_to_close = rebalance_close(context, data)
        really_close()
    log_positions_and_sizes('Finishing Rebalance Close', context.portfolio.positions)

### Opening

def open_new_positions(context, to_open, value):
    this_opened = []
    opened = set()
    for stock in context.stocks:
        if to_open == 0:
            break
        if (stock not in context.portfolio.positions) and is_equity(stock):
            to_open -= 1
            this_opened.append(stock.symbol)
            opened.add(stock)
    log.info('Opening positions ' + str(this_opened))
    return opened

def rebalance_open(context, data, closed_stocks):
    # 1. compute to_open = target_position_count - #all_positions
    to_open = (context.position_target + len(closed_stocks) - len(context.portfolio.positions))
    if to_open > 0:
        # 2. if to_open > 0: open best new positions
        return open_new_positions(context, to_open, compute_position_value(context))
    return set()

def manage_rebalance_open(context, data):
    log_positions_and_sizes('Starting Rebalance Open', context.portfolio.positions)
    def really_open():
        new_to_open = context.stocks_to_open.copy()
        all_can_trade = data.can_trade(context.stocks)
        all_prices = data.current(context.stocks, 'price')
        target_value = compute_position_value(context)
        available_cash = context.portfolio.cash - (context.portfolio.portfolio_value * (1 - context.target_leverage))
        for stock in context.stocks_to_open:
            if available_cash <= 0:
                break
            if stock not in all_can_trade or not all_can_trade[stock]:
                new_to_open.remove(stock)
                continue
            stock_price = all_prices[stock]
            p = stock in context.portfolio.positions and context.portfolio.positions[stock]
            position_value = p.amount * stock_price if stock_price and p else 0.0
            if target_value - position_value > stock_price:
                log.debug('Ordering {} {}'.format(stock, target_value))
                order_target_value(stock, target_value)
                available_cash -= (target_value - position_value)
            else:
                new_to_open.remove(stock)
        context.stocks_to_open = new_to_open
        if new_to_open:
            log.info('Not confirmed open: ' + str([s.symbol for s in new_to_open]))
        else:
            log.info('All confirmed open')
            context.run_buy = context.run_sell
    if context.stocks_to_open:
        really_open()
    else:
        context.stocks_to_open = rebalance_open(context, data, context.stocks_to_close)
        if context.stocks_to_open:
            really_open()
    log_positions_and_sizes('Finishing Rebalance Open', context.portfolio.positions)

### Main

def rebalance(context, data):
    # Initialize
    if context.enable_trading:
        if context.run_sell:
            manage_rebalance_close(context, data)
        if context.run_buy:
            manage_rebalance_open(context, data)
    record(positions_value=context.portfolio.positions_value,
           portfolio_value=context.portfolio.portfolio_value,
           cash=context.portfolio.cash,
           net_liquidation=context.account.net_liquidation)

###----------------------------------------
### Overall Algorithm

def initialize(context):
    log.info("Initializing")
    set_context_parameters(context)
    setup_pipeline(context)
    set_long_only()
    schedule_function(rebalance,
                      date_rule=date_rules.every_day(),
                      time_rule=time_rules.market_open(hours=1, minutes=30))

def before_trading_start(context, data):
    update_portfolio_parameters(context)
    select_stocks_if_needed(context, data)
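
For reference, the quantity that geometric_means is meant to compute over n yearly values r_1..r_n is ((1 + r_1) * ... * (1 + r_n))^(1/n) - 1. Below is a minimal standalone sketch on plain floats; the function name geometric_mean_return is only illustrative and is not part of the algorithm above, which applies the same formula element-wise to pandas Series.

```python
def geometric_mean_return(returns):
    """Geometric mean of per-period returns: (prod(1 + r)) ** (1/n) - 1."""
    if not returns:
        raise ValueError("need at least one return")
    product = 1.0
    for r in returns:
        # Compound each period's return into the running product
        product *= (1.0 + r)
    # Take the n-th root of the compounded growth, then convert back to a rate
    return product ** (1.0 / len(returns)) - 1.0


# Two years at +10% each average out to +10% per year
two_good_years = geometric_mean_return([0.10, 0.10])
# +100% followed by -50% leaves you where you started
round_trip = geometric_mean_return([1.0, -0.5])
```

Note that the arithmetic mean of the second example would be +25%, which is why the book's long-horizon factors use the geometric version.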

28 responses

Great work! I was also working on the same topic but stopped because of Quantopian's limitations with fundamentals data.
Where did you find that get_fundamentals also supports a date in the backtest environment, and not only in the research notebook? Is it an undocumented feature?

Here are some links to my work on QVAL:
Campbell, Hilscher, Szilagyi (CHS) Model - Probability of corporate failure
https://www.quantopian.com/posts/campbell-hilscher-szilagyi-chs-model-probability-of-corporate-failure

Hey Costantino,

You've just found a major bug in my implementation: get_fundamentals indeed does not take a date argument in the backtest. So I've been playing with an extremely partial implementation of the Quantitative Value algorithm, managing to squeeze out good performance from the few factors that are correctly implemented... This is beyond embarrassing, and quite frankly it is also really impressive that the broken algorithm produced this performance.

So the options I have now are either to save up state in the context object, which will not translate to the live trading environment, or to use pipeline, which at least used to perform horribly with fundamental data, to the point where the algorithm had to be broken up into phases that ran across multiple days. Ugh. I honestly don't understand some of the design choices that the Quantopian platform has made.

Well, back to the drawing board.

Sunil

Hi Sunil,

I also wanted to check, because I suspected as much... Anyway, your work is a very good basis for the QVAL algorithm. It only needs to be updated once Quantopian makes it easier to work with historical fundamental data.

I have been complaining about the lack of this feature for a long time... if you search for my username, you will find many posts on this subject.
There is a workaround, storing the data in a pandas Panel, but then it is not possible to trade live (you have to wait a year or more before the panel is filled with data).

Q2 remains a great platform, and maybe if we unite our forces we can convince the Q-Team to add this feature as soon as possible ;-)... a get_fundamentals with a date, like in the Research environment, would be great!
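
The workaround mentioned above (keeping your own dated snapshots as the backtest advances) can be sketched roughly as follows. This is plain Python rather than a pandas Panel, and SnapshotStore, record and as_of are hypothetical names, not Quantopian API. It also shows the drawback: the store starts empty, so a year-ago lookup returns nothing until a year of snapshots has accumulated.

```python
import datetime as dt


class SnapshotStore(object):
    """Keeps dated snapshots of fundamentals (hypothetical helper)."""

    def __init__(self, max_age_days=400):
        self.max_age_days = max_age_days
        self.snapshots = {}  # date -> {symbol: value}

    def record(self, date, values):
        """Store today's values and drop snapshots older than max_age_days."""
        self.snapshots[date] = dict(values)
        cutoff = date - dt.timedelta(days=self.max_age_days)
        for old in [d for d in self.snapshots if d < cutoff]:
            del self.snapshots[old]

    def as_of(self, date):
        """Latest snapshot recorded on or before `date`, or None if none exists."""
        eligible = [d for d in self.snapshots if d <= date]
        return self.snapshots[max(eligible)] if eligible else None


store = SnapshotStore()
today = dt.date(2016, 1, 4)
year_ago = today - dt.timedelta(days=365)

# Nothing recorded yet: the year-ago lookup fails, as it would at the start of live trading
before_any_data = store.as_of(year_ago)

store.record(dt.date(2015, 1, 2), {'AAA': 10.0})
store.record(today, {'AAA': 12.0})
last_year_values = store.as_of(year_ago)
```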

Well, I'm slowly converting the algorithm over to use pipelines. So far I've only converted the STA and SNOA computations, and I suspect I'll soon run into timeout issues, especially when I start implementing the franchise power and margin factors. The pipeline API as set up doesn't tell us where it is spending compute cycles, so any sort of optimization is virtually impossible.

I considered storing data in the context object, but as you said, that just doesn't work for live trading.

Another alternative to adding dates to get_fundamentals would be to allow us to specify the particular times for which we want to retrieve data for custom factors. Right now you have a very coarse window_length argument; it would be good to be able to generalize that, even with something as simple as taking a vector of indices rather than a single window length.
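
To make the request concrete, here is a small sketch of what "a vector of indices" would mean, written against a plain Python list standing in for the daily window. select_rows is a hypothetical helper; the pipeline API does not offer this, which is the point of the request. Today a custom factor receives the whole window and has to discard most of it.

```python
def select_rows(window, offsets):
    """Pick rows from a trailing window by offset from the end (1 = most recent)."""
    return [window[-k] for k in offsets]


# Stand-in for 252 daily rows of one fundamental field, oldest first
window = list(range(252))

# Roughly one row per quarter instead of all 252
quarterly_rows = select_rows(window, [1, 64, 127, 190])
```

With a single window length, memory grows with the number of days in the window, while only the four selected rows are actually used.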

The pipeline API is a really nice feature, but it is currently very inefficient especially when you want pipeline outputs infrequently (rather than daily), and does not come with a particularly rich set of operations that can be run on pipeline factors.

Sunil

You're right! The Pipeline API is nice but very memory-hungry when it comes to the many data points required for historical fundamental data.

I posted a feature request exactly as you're now proposing (vector of indices rather than a single window size):
https://www.quantopian.com/posts/pipeline-api-feature-request-retrive-single-data-points-instead-of-a-full-array-of-data-window-length
but so far it has gone unanswered.

A kind request to the Quantopian team: get_fundamentals already supports a timestamp and timeframe in the Research environment. It would be great to enable the same in backtesting!

I am for now giving up on fixing my implementation. I have posted a last cry for help here...

I've also been working on a QV screen algo, and running into the same issues as Sunil and Constantino when trying to run backtests.

However, I have put together a notebook (using pipeline) showing my working and would love thoughts / inputs from others. I already know there are a few things that need working on, for example:

• some of Constantino's examples above seem to have more accurate ways of determining window sizes and indexing for fundamentals - I need to look at that
• I still have to implement the 'probability of manipulation' part of the forensic screen
• my franchise power factors only go back a maximum of 2 years, as opposed to 8 - notebook might be able to handle more than 2 years, I need to look at that

If you spot any other bugs or have any other suggestions for improvement, please let me know. Otherwise I'm just going to be waiting for Constantino's feature request above to be implemented :)

After working with the QV algorithm for a few months, here's what I've learned:

1. The algorithm has been developed and optimized for long term investing with annual rebalancing. The outcome is quite tax efficient for an individual investor. It isn't necessarily the kind of thing that fits into the Quantopian hedge fund. However, you can improve performance a bit if you add in momentum and technical factors. I've used regression lines with some success.
2. Large parts of the algorithm can't be implemented using Quantopian's current pipeline implementation. Getting over a year of data (that's two data points per factor) is infeasible: you end up consuming too much memory, and it kills the notebook.
3. There are large holes in the morningstar data. Many factors are defined for surprisingly few companies. EBIT for instance is defined for about 1000 fewer companies than EBITDA. Look at the fundamental data you're using carefully. I've developed notebooks to get a sense for the data cardinality. I'd post a link for the notebook, but the forum refuses to attach them to a message. I'll see if I can share them via bitbucket.
4. On a related note, most fundamental factors seem to be updated on a quarterly basis. The QV text I believe prefers annual numbers. Quantopian's infrastructure can't accommodate annual aggregation particularly well due to memory issues noted above.
5. There are outright errors lurking in the fundamental data. These can be really hard to spot.
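
The coverage check described in point 3 can be sketched in a few lines. This is not the notebook mentioned above, just a plain-Python illustration on made-up rows; coverage is a hypothetical helper and the field names are only examples.

```python
import math


def coverage(rows, fields):
    """Count, per field, how many rows have a usable (non-missing, non-NaN) value."""
    counts = dict((f, 0) for f in fields)
    for row in rows:
        for f in fields:
            v = row.get(f)
            if v is None:
                continue
            if isinstance(v, float) and math.isnan(v):
                continue
            counts[f] += 1
    return counts


# Made-up data: one dict per company
rows = [
    {'ebit': 1.0, 'ebitda': 2.0},
    {'ebit': float('nan'), 'ebitda': 3.0},
    {'ebitda': 1.0},
]
counts = coverage(rows, ['ebit', 'ebitda'])
```

Running something like this on each field before using it in a ratio makes it obvious when a factor silently shrinks the universe.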

I might try running some experiments again in the future if Quantopian fixes issues surrounding fundamental data access.

Sunil

thanks Sunil

Thanks Sunil, yeah, I found many NaN values in the fundamental data too. I think this algorithm may be better used as a screener rather than a trading algorithm, at least with the current data availability.

Some updates:
- Use today's date.
- Remove stocks with NaN total_assets (e.g. the fundamental data of some companies such as CVE and UBS is not listed in the Quantopian dataset because they file as foreign issuers).
- Get all companies instead of only ones with high AverageDollarVolume.
- Use 8-year factors for growth. The code looks incorrect but I cannot confirm at the moment. Will look more into this later.

Jay Teguh Wijaya - amazing notebook!!! thanks... you saved me weeks of work

Hello friends, I am currently trying to implement the allocation system from Wes Gray and Tobias Carlisle's "Quantitative Value". Thank you for the amazing notebooks you provided! You also saved me weeks of work. We are truly standing on the shoulders of giants :-)

I noticed that something appears to be off with the STA score. This is what I found:
If I understood the forensic screen correctly, STA is supposed to take YoY changes of net income and operating cash-flow as inputs,
but "morningstar.cash_flow_statement.operating_cash_flow" seems to give us operating cash-flow from the most recent quarter.
Does anybody know how we can maybe sum up the last 4 quarters?

For net income we are using "morningstar.cash_flow_statement.net_income". For many companies I get very strange values.
I compared these numbers with a few 10-Qs and 10-Ks and could not find where they are coming from.
I'm not sure. Maybe I just accidentally picked companies with faulty datasets.
But if I use "Fundamentals.net_income_income_statement.latest" instead, I do get at least a number that matches the last quarter net income from the 10-Qs.

In addition to that, if I check "Fundamentals.net_income_income_statement_asof_date.latest" I do get the correct date of the most recent quarter, like "2018-09-30". Unfortunately, if I do the same with "morningstar.cash_flow_statement.net_income_as_of.latest" I only get gibberish like "1969-12-31 23:59:54.033000000".

Hope somebody has an idea how this can be improved.

Hi @Sebastian,

That Wes Gray & Tobias Carlisle book is absolutely one of my favorites (out of a large trading library of hundreds of books) and has also been a great inspiration to me. As you say, shoulders of giants; indeed we are fortunate.

I have also had some strange results from time to time with regard to cash flow vs income statement data in the Morningstar DB, although i have not done any sort of thorough investigation like checking back with 10-K's & -Q's as you have. I am about to start looking at some of these specific items more closely soon, and i will be happy to communicate & share with you regarding whatever i/you find. Have you made any comparisons between Morningstar and the new FactSet DB items?
Cheers, best regards, TonyM.

Hello Tony,

I'd be happy to work on this thing with a fellow enthusiast.

After a little experimenting I think this should work for building a 12-month trailing net income:

class NetIncome12M(CustomFactor):
    inputs = [Fundamentals.net_income_income_statement]
    window_length = 252
    def compute(self, today, assets, out, nico):
        out[:] = sum([nico[-1], nico[-94], nico[-157], nico[-220]])


I looked at the values for the four lookback periods and it does seem to match
the filing nicely. With ADRs it's more complicated. Some seem to work. Some have the
exact same value in each of the 4 periods.

Cheers,
Sebastian
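
A possible alternative to fixed offsets, assuming an as-of date is available alongside each daily value (as with net_income_income_statement_asof_date mentioned earlier), is to keep one value per distinct as-of date and sum the latest four. The sketch below uses plain Python lists for a single company; trailing_four_quarters is a hypothetical helper, not a pipeline factor.

```python
def trailing_four_quarters(values, asof_dates):
    """Sum the most recent four distinct quarterly values.

    `values` and `asof_dates` are parallel daily lists, oldest first.
    Returns None when fewer than four distinct quarters are present.
    """
    seen = set()
    quarterly = []
    # Walk backwards from the newest day, keeping the first value seen per as-of date
    for value, asof in zip(reversed(values), reversed(asof_dates)):
        if asof in seen:
            continue
        seen.add(asof)
        quarterly.append(value)
        if len(quarterly) == 4:
            break
    if len(quarterly) < 4:
        return None
    return sum(quarterly)


# Five quarters of made-up data, three "days" each
values = [1.0] * 3 + [2.0] * 3 + [3.0] * 3 + [4.0] * 3 + [5.0] * 3
dates = (['2017-09-30'] * 3 + ['2017-12-31'] * 3 + ['2018-03-31'] * 3 +
         ['2018-06-30'] * 3 + ['2018-09-30'] * 3)
ttm = trailing_four_quarters(values, dates)
```

This would also flag the ADR case described above: if the same value repeats under a single as-of date, fewer than four quarters are found and the result is None rather than a misleading sum.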

Hi Sebastian,
What sort of look-back periods are you currently working with or interested in working with?

At the moment i'm putting most of my efforts into finding what factors seem to work best even BEFORE doing a careful job of cleaning up the data nicely.
Some of "the usual suspect" factors as per the book do indeed work fine, others not so well (although of course that might be because of bad data and may improve later after cleanup, as you are focusing on), and also i'm finding some very interesting new ideas that are not in the book. Cheers, Tony.

In addition to Morningstar, you might like to take a look at FactSet | Income Statement | Net Income items, specifically the following annual items:

net_inc_af = Net Income
net_inc_aft_xord_af = Net Income After Extraordinary Items
net_inc_basic_af = Net Income Available to Common, Basic
net_inc_basic_aft_xord_af = Net Income Available to Common, Basic, After Extraordinary Items
net_inc_basic_beft_xord_af = Net Income Before Extraordinary Items
net_inc_dil_af = Net Income Available to Common, Fully Diluted
net_inc_dil_aft_xord_af = Net Income Available to Common, Fully Diluted, After Extraordinaries

So far i'm finding this to be a good DB. I think you may find that some of the "cleanup" work required with Morningstar is already done, as well as giving a wider data coverage without unnecessary additional calculation. Potentially may save quite a bit of work! Cheers, Tony

Hello Tony,
This might be an excellent idea. I will put the FactSet values side by side with the morningstar/Fundamentals items and see how they compare.

Cheers,
Sebastian

About your first post:
I must admit that I am not good at backtesting yet. Probably have to watch a few more lectures. Still I feel confident to apply the metrics that Wes and Tobias (and Jack Vogel) used in their books. After all they tested their hypotheses over a much longer timeframe than we have available here.

I am positive that there are other factors that carry alpha and can be integrated into a quantitive value system. I am thinking about signals from insider dealings or short interest but the most promising approach is probably to find a way to combine value and momentum.

Alpha Architect wrote that combining a simple value screen with a momentum metric can improve returns significantly. The only reason why they did not add it to Quantitative Value was that their value system in combination with their multistep "quality on steroids" factors still provided slightly better returns than value plus momentum.

You might find this blog post interesting:
https://alphaarchitect.com/2015/03/26/the-best-way-to-combine-value-and-momentum-investing-strategies/

By the way, I would love to be able to also apply the quantitative screener on stocks from outside the US. There is no logical reason why it should not work. Therefore I am trying to keep at least as many ADRs inside the screen as possible (in contrast to Wes and Tobias).
And I can probably consider stocks well below the 40th market cap percentile on NYSE without the transaction costs killing my returns.
At least that's one of the advantages of not belonging to the top 0.1% yet ;-)

Cheers,
Sebastian

Hi @Sebastian,

Thanks for the alpha-architect link. I did look at it a few years ago but had forgotten about it.

"I would love to be able to also apply the quantitative screener on stocks from outside the US. There is no logical reason why it should not work".
I have been trading stocks outside the US for many years and came up with ideas similar to those of Wes & Tobias long before I read the book, but it certainly helped me to consolidate some of those ideas. I have been using them very successfully in my own US and non-US stock selection & trading outside of Quantopian, and I can assure you that the ideas definitely DO work in other markets. In fact, in some cases they work even better in other countries where these ideas are less well-known & less widely applied.

ADRs are a slightly different beast, with a mixture of characteristics of both the original foreign stocks and of US stocks, but again the ideas do generally work there too.

As to Market Cap, although I certainly use it in some ratio calculations, it is actually not something I consider very much in trading. Generally of more importance from a practical trading perspective is liquidity, which can reasonably be defined as price multiplied by trading volume. As a trader, what one does NOT want is to get caught holding illiquid (i.e. un-sellable) stocks when the market starts to turn down, because if they don't have decent trading volume then maybe you just can't get out and end up with big losses. (Yeah, I did learn that lesson the hard way years ago.) Often Liquidity & MarketCap are closely related, but not necessarily. Good trading stocks do generally need Volume, and this is closely related to transaction costs, especially if you are considering options trading outside of the context of Q.

With me it's the same. Back at university I read about return anomalies, the value premium, the small-cap premium and things like that, and I know it should be possible to utilize that. I could punch myself that I did not dig more into quantitative investing back then. We had Thomson Financial Datastream terminals at our disposal.

Later I read a lot about behavioral finance and the more I read, the more convinced I became that I should have no discretion whatsoever about when to buy and when to sell a stock. As humans we are hardwired to buy and sell at the worst possible time. This is the reason why I want to develop a system with a statistical edge that makes the investment decision on my behalf. I did some swing trading when I was younger but now I am more interested in holding a portfolio of equities over a period of 6 or 12 months. Maybe that's still called trading ;-)

Like Wes and Tobias I also stumbled upon Joel Greenblatt's 'Magic Formula' and fell in love with the idea of ranking a universe and staying invested the whole time instead of trying to time the market based on some absolute metric of cheapness.

I think what you say about liquidity is very true. There are some high market cap stocks with a minuscule free float; on the other hand, there are quite a number of stocks with a market cap of USD 80 million that are rather liquid. If I look at the average daily trading volume over the last 30 days and my order would be 1% of this daily volume or less, I do not worry about moving the price too much, regardless of the actual market cap.
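
As a rough illustration of the 1% rule of thumb above, here is a minimal sketch in plain NumPy (not Quantopian pipeline code). It measures liquidity as price times volume, as suggested earlier in the thread; the function names, the 30-day window default and the sample numbers are assumptions made up for the example.

```python
import numpy as np

def average_dollar_volume(prices, volumes, window=30):
    """Mean of price * share volume over the last `window` trading days."""
    prices = np.asarray(prices, dtype=float)[-window:]
    volumes = np.asarray(volumes, dtype=float)[-window:]
    return float(np.mean(prices * volumes))

def order_is_small_enough(order_value, prices, volumes,
                          window=30, max_fraction=0.01):
    """True if the order is at most `max_fraction` of average dollar volume."""
    adv = average_dollar_volume(prices, volumes, window)
    return bool(adv > 0 and order_value <= max_fraction * adv)

# Hypothetical stock: 30 days at $10 with 100,000 shares traded per day,
# i.e. an average dollar volume of $1,000,000.
prices = [10.0] * 30
volumes = [100_000] * 30

small_order_ok = order_is_small_enough(5_000, prices, volumes)   # 0.5% of ADV
large_order_ok = order_is_small_enough(20_000, prices, volumes)  # 2% of ADV
```

The check deliberately ignores market cap, which matches the point made above: a small company can pass and a large one with a tiny free float can fail.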

By the way, I have a challenge in cleaning up the data. When I look at

Fundamentals.net_income_income_statement_asof_date.latest


...I get some data from outdated filings: 2007, 2010, 2014 and so on. So far I haven't found the correct syntax to exclude equities with a filing date before YYYY.MM.DD.
Maybe you have an idea.

Cheers,
Sebastian

Behavioral Finance ideas make a lot more sense than the EMH crap that I got dished out!! ;-))
" ... the more convinced I became that I should have no discretion whatsoever about when to buy and when to sell a stock".

My own take on this is that "discretion" is generally something best used when OUT of the market and DESIGNING a system, rather than when IN the market trading it.

"... interested in holding a portfolio of equities over a period of 6 or 12 months...." I have done a lot of trading systems design (before and still currently outside of Q) and with regard to equities, for the most part I don't even try to design the timeframe. I just let the logical rules (partly fundamental, partly technical) tell me when to hold & when not to for each stock, and the holding periods can turn out to be anything from a few days right up to more-or-less indefinite in some cases.

Staying invested the whole time, but changing the investments as the rules advise, makes good sense to me too, with the caveat that just because we haven't seen a bear market for > 10 years doesn't mean one can't happen. What do you do then? Get out or try holding inverse equity ETFs?

Yes, Greenblatt's ideas are good too, but the more you look at fundamentals, the more you can find other ideas that work but no-one has really written about. That's mostly what I am playing with right now here on Q.

Regarding illiquid stocks (e.g. speculative mining, biotech, etc.), they can sometimes be fantastic "investments", but I just don't want to be holding them if/when the market as a whole goes down. Yes, I agree with the "my order size < 1% of avg daily volume" type of idea too.

Excluding stocks with old (i.e. no recent) data:
My analysis for my own trading is not here in Q and honestly I am not a good Python programmer, but I will think about this and, if I can't figure it out, I will try asking around a bit. Certainly the idea makes good sense and is worth doing.

All the best, Tony.

Hello Tony,

You definitely have a point about the possibility of a >10 year bear market.
I will tell you more about my personal asset allocation strategy by PM.

If I'm not mistaken, there seems to be a major flaw in the STA calculation, even if I manage to use trailing 12 month net income and operating cash flow instead of quarterly values.

This is the code for the custom factor:

class STA(CustomFactor):
    inputs = [morningstar.cash_flow_statement.operating_cash_flow,
              morningstar.cash_flow_statement.net_income,
              morningstar.balance_sheet.total_assets]
    window_length = 1

    def compute(self, today, assets, out, ocf, ni, ta):
        ta = np.where(np.isnan(ta), 0, ta)
        ocf = np.where(np.isnan(ocf), 0, ocf)
        ni = np.where(np.isnan(ni), 0, ni)
        out[:] = abs(ni[-1] - ocf[-1])/ ta[-1]


If we look at the absolute difference between net income and operating cash flow, doesn't it mean we are also punishing companies with high cash flows in comparison to net earnings? Basically this means a company with 15 billion in net income and 5 billion in OCF is just as bad as a company with 15 billion in OCF and 5 billion in net earnings. This doesn't seem right. Maybe I missed something, but I think it should just be:

 out[:] = (ni[-1] - ocf[-1])/ ta[-1]
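
To make the difference concrete, here is a small stand-alone NumPy sketch (not the pipeline factor itself) using the two companies from the example above. The total assets figure of 100 billion and the helper name `signed_sta` are assumptions for illustration only. The helper also leaves the result as NaN where total assets are missing or zero, rather than replacing NaN with 0 and then dividing by it.

```python
import numpy as np

def signed_sta(ni, ocf, ta):
    """(net income - operating cash flow) / total assets, NaN if assets unusable."""
    ni, ocf, ta = (np.asarray(x, dtype=float) for x in (ni, ocf, ta))
    out = np.full(ta.shape, np.nan)
    valid = ta > 0  # NaN compares as False, so missing assets stay NaN
    out[valid] = (ni[valid] - ocf[valid]) / ta[valid]
    return out

# In billions. Company A: NI 15, OCF 5. Company B: NI 5, OCF 15.
# Third entry: hypothetical company with missing total assets.
ni = np.array([15.0, 5.0, 1.0])
ocf = np.array([5.0, 15.0, 0.0])
ta = np.array([100.0, 100.0, np.nan])

sta_abs = np.abs(ni[:2] - ocf[:2]) / ta[:2]  # both companies score 0.10
sta_signed = signed_sta(ni, ocf, ta)         # +0.10, -0.10, NaN
```

With the absolute value both companies get the same score; with the signed version the company whose cash flow exceeds its earnings gets a negative (better) accrual score.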


All the best,
Sebastian

Finally got time to tinker around with the quantitative value screen a little more.
Here is how you can check filings for "freshness". I just looked at net income from the most recent income statement to keep things simple.

# Has to be imported first
from quantopian.pipeline.factors import BusinessDaysSincePreviousEvent

#That's what I use for freshness when I filter the universe
ni1current = (BusinessDaysSincePreviousEvent(inputs=[Fundamentals.net_income_income_statement_asof_date]) <= 190)
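
For anyone who wants to sanity-check the idea outside the pipeline, the same kind of test can be sketched with plain NumPy. This is only an approximation under stated assumptions: `np.busday_count` counts Monday to Friday without holidays, the helper name `is_fresh` is made up, and the exact counting convention of `BusinessDaysSincePreviousEvent` may differ by a day or so. Missing as-of dates would need separate handling.

```python
import numpy as np

def is_fresh(asof_dates, today, max_business_days=190):
    """True where the as-of date is at most `max_business_days` weekdays old."""
    asof = np.asarray(asof_dates, dtype="datetime64[D]")
    today = np.datetime64(today, "D")
    age = np.busday_count(asof, today)  # weekdays in [asof, today)
    return age <= max_business_days

# Hypothetical as-of dates: one recent filing, one from 2014.
fresh = is_fresh(["2019-06-03", "2014-01-02"], "2019-06-10")
```

The 190 business day threshold is the one used in the filter above; it is roughly nine months, so a company would have to miss more than two quarterly filings to be dropped.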


Cheers,
Sebastian

Hi Sebastian, that "BusinessDaysSince..." looks like just the thing. Cheers.

Hello Tony,
It took me some time but I dug deeper into the algo. One important takeaway:
Pay attention to what you are looking at!
Many of the Morningstar metrics mean something different than you would expect.

If I am not totally mistaken, things like Gross Margin or ROA at a certain point in time are actually calculated using only the income statement information from the most recent quarter, not the last 4 quarters and not even the most recent fiscal year end. This means yesterday's ROA is actually net income from the most recent quarter divided by most recent total assets. That's no problem as long as your fundamental only contains balance sheet items, like the YoY long term debt growth inside the F-Score, but as soon as you need to analyze anything from the income or cash flow statement, you are probably looking at only one quarter's worth of data.
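
A quick sketch of how much this matters, using made-up numbers and a hypothetical helper `trailing_sum` (plain NumPy, not tied to any Morningstar field): ROA from a single quarter versus ROA from the sum of the last four quarters.

```python
import numpy as np

def trailing_sum(quarterly_values, quarters=4):
    """Sum of the last `quarters` values; NaN if fewer are available."""
    q = np.asarray(quarterly_values, dtype=float)
    if q.size < quarters:
        return float("nan")
    return float(q[-quarters:].sum())

# Made-up quarterly net income, oldest quarter first, and latest total assets.
net_income_by_quarter = [10.0, 12.0, 8.0, 15.0]
latest_total_assets = 500.0

# One quarter only: 15 / 500
roa_single_quarter = net_income_by_quarter[-1] / latest_total_assets
# Trailing four quarters: (10 + 12 + 8 + 15) / 500
roa_trailing_year = trailing_sum(net_income_by_quarter) / latest_total_assets
```

In this example the single-quarter figure is a third of the trailing-year figure, and it would also swing with seasonality from one quarter to the next.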

Cheers,
Sebastian

Small correction:
I noticed there was still something off when I compared the ROA with the actual 10-Qs.

morningstar.operation_ratios.roa.latest


... is actually the most recent quarter's net income, divided NOT by the most recent quarter's total assets but by the average of the most recent quarter's and the previous quarter's total assets.
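
In other words, the two candidate definitions differ only in the denominator. A tiny sketch with made-up numbers (the function names are hypothetical, and this is a reading of the observation above rather than Morningstar's documented formula):

```python
def roa_latest_assets(net_income, assets_now):
    """Quarterly net income over the latest quarter's total assets."""
    return net_income / assets_now

def roa_average_assets(net_income, assets_now, assets_prev):
    """Quarterly net income over the average of the last two quarters' assets."""
    return net_income / ((assets_now + assets_prev) / 2.0)

# Made-up quarter: net income 15, total assets 520 now and 480 a quarter ago.
roa_latest = roa_latest_assets(15.0, 520.0)            # 15 / 520
roa_average = roa_average_assets(15.0, 520.0, 480.0)   # 15 / 500
```

When comparing against a 10-Q by hand, the second version is the one to reproduce.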

Cheers