New Video: Learn from the Experts Ep 2: Fast Iterative Factor Development with Kyle

In our latest video, Quantopian community member Kyle McEntush walks through his algorithm creation process with Quantopian’s Dr. Thomas Wiecki. This video starts with a short interview about Kyle’s background in chemical engineering and continues with Kyle walking through an example algorithm he created on Quantopian.

As a chemical engineer himself, Kyle shows how others can use their engineering and science backgrounds to create challenge-ready factors. Kyle generously shares his workflow, which focuses on a fast iteration cycle across many factors.

Check out our latest challenge here, where you can test out your skills and submit for a chance to win cash prizes or an opportunity to get your factor licensed.

You can watch it at this link, or down below:

Learn more by subscribing to our YouTube channel to access all of our videos and be notified when a new one is posted.

As always, if there are any topics you would like us to focus on for future videos, please comment below or send us a quick note at [email protected].

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

24 responses

Research notebook attached.


And the example algo using the "TrendyEPS" factor

import quantopian.algorithm as algo
from quantopian.algorithm import attach_pipeline, pipeline_output

import quantopian.optimize as opt
from quantopian.pipeline import Pipeline
from quantopian.pipeline.factors import CustomFactor
from quantopian.pipeline.data.builtin import EquityPricing
from quantopian.pipeline.data.factset import RBICSFocus
from quantopian.pipeline.data.factset.estimates import Actuals, PeriodicConsensus

from quantopian.pipeline.filters import QTradableStocksUS
from zipline.utils.tradingcalendar import trading_day

import numpy as np
import pandas as pd

# Pipeline parameters
USE_SECTORS = True
PIPE_NORMALIZE = False

# Algo parameters
NO_COST = True   # disable trading costs and slippage

def clip(data, threshold=0.025, drop=False):
    data = pd.Series(data)
    data_notnull = data[data.notnull()]
    if data_notnull.shape[0] > 0:
        low_cutoff = data_notnull.quantile(threshold)
        high_cutoff = data_notnull.quantile(1 - threshold)
        if not drop:
            data = data.clip(lower=low_cutoff, upper=high_cutoff).values
        else:
            data = data[(data < low_cutoff) | (data > high_cutoff)]

    return data

def standardize(data, winsorize=True, sectors=None, threshold=0.025):
    data = pd.Series(data)
    if winsorize:
        data = clip(data, threshold=threshold)

    # Prepare the data
    dfData = pd.DataFrame({'data': data})
    if USE_SECTORS and sectors is not None:
        dfData['sector'] = sectors
    else:
        dfData['sector'] = ''

    # Standardize the data
    zscore = lambda x: (x - x.mean()) / (x.std() == 0 and 1 or x.std())
    data = dfData.groupby(['sector'])['data'].transform(zscore)

    return data

def normalize(data, demean=False):
    data = pd.Series(data)
    if demean:
        data = data - data.mean()

    denom = data.abs().sum()
    if denom == 0:
        denom = 1

    return data / denom

class TrendyEPS(CustomFactor):
    # Get EPS values
    qn3_eps = PeriodicConsensus.slice('EPS', 'qf', -3)
    qn2_eps = PeriodicConsensus.slice('EPS', 'qf', -2)
    qn1_eps = PeriodicConsensus.slice('EPS', 'qf', -1)
    q0_eps = PeriodicConsensus.slice('EPS', 'qf', 0)
    an3_eps = Actuals.slice('EPS', 'qf', -3)
    an2_eps = Actuals.slice('EPS', 'qf', -2)
    an1_eps = Actuals.slice('EPS', 'qf', -1)
    a0_eps = Actuals.slice('EPS', 'qf', 0)

    inputs = [
        qn3_eps.mean, qn2_eps.mean, qn1_eps.mean, q0_eps.mean,
        an3_eps.actual_value, an2_eps.actual_value, an1_eps.actual_value, a0_eps.actual_value,
        RBICSFocus().l2_name,
    ]
    window_length = 1
    window_safe = True

    def compute(self, today, assets, out, qn3_eps, qn2_eps, qn1_eps, q0_eps,
                an3_eps, an2_eps, an1_eps, a0_eps, sectors):
        # Calculate surprise
        surprise_n3 = (an3_eps[-1, :] - qn3_eps[-1, :]) / np.abs(qn3_eps[-1, :])
        surprise_n2 = (an2_eps[-1, :] - qn2_eps[-1, :]) / np.abs(qn2_eps[-1, :])
        surprise_n1 = (an1_eps[-1, :] - qn1_eps[-1, :]) / np.abs(qn1_eps[-1, :])
        surprise_0 = (a0_eps[-1, :] - q0_eps[-1, :]) / np.abs(q0_eps[-1, :])

        # Add all surprises
        surprise = np.nan_to_num(surprise_n3) + np.nan_to_num(surprise_n2) + np.nan_to_num(surprise_n1) + np.nan_to_num(surprise_0)

        # Replace inf w/ NaN
        surprise[np.isinf(surprise)] = np.nan

        # Standardize the data
        surprise = standardize(surprise, sectors=sectors.as_string_array()[-1, :])

        # Normalize the data (NOTE: only include if looking at factor individually)
        if PIPE_NORMALIZE:
            surprise = normalize(surprise)

        out[:] = surprise

def make_factors():
    factors = {}

    factors['TrendyEPS'] = TrendyEPS

    return factors

# Define the universe
universe = QTradableStocksUS()

def factor_pipeline(universe):
    all_factors = make_factors()

    factors = {a: all_factors[a]() for a in all_factors}

    pipe = Pipeline(columns=factors, screen=universe)

    return pipe

def initialize(context):
    # Rebalance every day, one hour before market close
    algo.schedule_function(
        rebalance,
        algo.date_rules.every_day(),
        algo.time_rules.market_close(hours=1)
    )

    if NO_COST:
        set_commission(commission.PerShare(cost=0, min_trade_cost=0))
        set_slippage(slippage.FixedBasisPointsSlippage(basis_points=0, volume_limit=0.1))

    # Record tracking variables at the end of each day.
    algo.schedule_function(
        record_vars,
        algo.date_rules.every_day(),
        algo.time_rules.market_close(),
    )

    # Create our dynamic stock selector.
    attach_pipeline(factor_pipeline(universe), 'pipeline')

def rebalance(context, data):
    # Get the date's alphas
    date_alphas = (pipeline_output('pipeline')).astype('float64')

    # Get useful info
    stocks = date_alphas.index.unique()

    # Combine the alpha factors
    combined_alpha = date_alphas.sum(axis=1)

    # Normalize the alpha factors
    combined_alpha = normalize(combined_alpha)

    # Define the objective
    objective = opt.TargetWeights(combined_alpha)

    # Add our constraints
    constraints = []

    # Calculate the optimal portfolio
    try:
        combined_alpha = opt.calculate_optimal_portfolio(objective=objective, constraints=constraints)
    except:
        pass

    # Drop expired securities (i.e. that aren't in the tradeable universe on that date)
    combined_alpha = combined_alpha[combined_alpha.index.isin(pipeline_output('pipeline').index)]

    # Do a final null filter and normalization
    combined_alpha = combined_alpha[pd.notnull(combined_alpha)]
    combined_alpha = normalize(combined_alpha)

    # Define the objective
    objective = opt.TargetWeights(combined_alpha)

    # Order the optimal portfolio
    order_optimal_portfolio(objective=objective, constraints=constraints)

def record_vars(context, data):
    record(num_positions=len(context.portfolio.positions))

def handle_data(context, data):
    pass
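
To make the arithmetic inside TrendyEPS easier to follow, here is a minimal sketch that traces the surprise calculation on made-up numbers. It only mirrors the "relative surprise per quarter, missing quarters count as zero, then sum" steps from compute(); the EPS figures and the three stocks are invented for illustration, and the standardization step is left out.

```python
import numpy as np

# Made-up EPS figures: rows are fiscal quarters -3, -2, -1, 0; columns are stocks A, B, C.
consensus = np.array([
    [1.00, 0.50, 0.20],
    [1.10, 0.50, 0.20],
    [1.20, 0.40, np.nan],   # stock C has no consensus for quarter -1
    [1.25, 0.50, 0.25],
])
actual = np.array([
    [1.10, 0.45, 0.22],
    [1.21, 0.50, 0.20],
    [1.26, 0.30, 0.20],
    [1.50, 0.40, 0.30],
])

# Relative surprise per quarter: (actual - consensus) / |consensus|
per_quarter = (actual - consensus) / np.abs(consensus)

# Missing quarters count as zero surprise, then the four quarters are summed.
summed = np.nan_to_num(per_quarter).sum(axis=0)

for name, value in zip("ABC", summed):
    print(name, round(float(value), 2))
```

Stock A beats consensus in all four quarters and ends up with the largest summed surprise, stock B misses repeatedly and ends up negative, and stock C is scored on the three quarters it has data for.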

Great, especially the work you do to "clean" the data of NaNs, zeros, etc. I wonder if Q could, in the future, add an option when we import data to have it already "cleaned" somehow.

Good post. I reran it and see that specific returns are very low or negative. Is the risk model treating TrendyEPS as value and hence subtracting it?

I added constraints in order_optimal_portfolio, but ran into a timeout error. Any idea how to fix it?

Thanks Kyle for sharing

Good stuff! I hope these 'Learn from the Experts' threads will earn their own folder soon.

I'm also curious why this backtests so slowly. I tried adding a DollarNeutral constraint and it times out.

I don't really know why adding constraints leads to slow backtests. Perhaps someone at Q might know? Otherwise, I don't see any explicitly slow code anywhere in the algo...

That issue should be fixed now, please let me know if that's not the case.

@Kyle

Do you also have a zipline version?
I just found a zipline command in the tear sheet.
It would be nice to use it on zipline.

Thanks
Carsten

Wow, this is extremely educational. Thank you so much for sharing the video, notebook, and algorithm. I have been having a hard time connecting the report from Alphalens and the result of the full backtest and this interview answers that question.

I generalized the EPSGrowth class a little bit:

class FactorGrowth(CustomFactor):
    inputs = [None, None, None]
    window_length = 1
    window_safe = True

    def compute(self, today, assets, out, prev_q, curr_q, sectors):
        # Calculate surprise
        prev_q = prev_q[-1, :]
        curr_q = curr_q[-1, :]
        surprise = (curr_q - prev_q) / np.abs(prev_q)

        # Replace inf w/ NaN
        surprise[np.isinf(surprise)] = np.nan

        # Standardize the data
        surprise = standardize(surprise, sectors=sectors.as_string_array()[-1, :])

        # Normalize the data (NOTE: only include if looking at factor individually)
        if PIPE_NORMALIZE:
            surprise = normalize(surprise)

        out[:] = surprise


So the make_factors() function can be written as:

def make_factors():
    factors = {}
    # Quarterly
    qn1_q_eps = Actuals.slice('EPS', 'qf', -1).actual_value
    q0_q_eps = Actuals.slice('EPS', 'qf', 0).actual_value
    # Annual
    qn1_a_eps = Actuals.slice('EPS', 'af', -1).actual_value
    q0_a_eps = Actuals.slice('EPS', 'af', 0).actual_value
    focus = RBICSFocus().l2_name

    factors['EPSGrowth'] = FactorGrowth(inputs=[qn1_q_eps, q0_q_eps, focus], mask=universe)
    factors['EPSGrowthYr'] = FactorGrowth(inputs=[qn1_a_eps, q0_a_eps, focus], mask=universe)
    factors['TrendyEPS'] = TrendyEPS(mask=universe)

    return factors


The pro is that I can experiment with additional factors a bit faster, since I do not need to write a CustomFactor for each of them. The con is that I can't just change the operations for one specific factor I'm experimenting with. This is also probably an unneeded abstraction, as it makes the code a little harder to read.

If I wanted to make changes to a specific factor, I'd have to make a copy of the FactorGrowth class and rename it for that factor.

May the Alpha be with you.

Thank you very much, Kyle and Thomas, for this very educational and helpful video. I have a question, if someone could please help. In the following code, how would one go about combining two factors?

def make_factors():
    factors = {}

    factors['EPSGrowth'] = EPSGrowth
    factors['EPSGrowthYr'] = EPSGrowthYr
    factors['CombinedFactor'] = EPSGrowth*EPSGrowthYr

    return factors

# Define the universe
universe = QTradableStocksUS()

def factor_pipeline(universe):
    all_factors = make_factors()
    factors = {a: all_factors[a]() for a in all_factors}
    pipe = Pipeline(columns=factors, screen=universe)
    return pipe


I am trying the code above to combine two factors, but it gives me the following error while running the pipeline:

unsupported operand type(s) for *: 'ABCMeta' and 'ABCMeta'

Could someone please post code that works?

Nadeem, I typically do the factor combination outside of the pipeline. In other words, I'd run the pipeline as normal, and in the results just multiply the two pandas columns. It's also possible that the factors have a .latest or some other attribute that changes them from ABCMeta classes to BoundColumns, although I'm not sure.
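
A minimal sketch of the "combine outside the pipeline" approach described above, using a hand-made DataFrame as a stand-in for the pipeline output. The tickers and values are invented; only the column names come from the question. As an aside, the error message suggests that `*` was applied to the factor classes themselves rather than to instances, which is why doing the multiplication on the result columns sidesteps it.

```python
import pandas as pd

# Stand-in for the DataFrame returned by running the pipeline on one date
# (values are made up; column names follow the question above).
results = pd.DataFrame(
    {
        "EPSGrowth": [1.2, -0.4, 0.3, -1.1],
        "EPSGrowthYr": [0.8, -0.9, -0.2, 0.5],
    },
    index=["AAA", "BBB", "CCC", "DDD"],
)

# Combine after the pipeline has run by multiplying the two columns.
results["CombinedFactor"] = results["EPSGrowth"] * results["EPSGrowthYr"]
print(results)
```

One thing to watch with a plain product: two negative scores multiply to a positive one, so BBB (weak on both factors) ends up with a positive combined score here. Whether that is acceptable depends on how the factors are scaled.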

Hi,

Thanks Kyle for the useful research notebook. For one of the factors, I am trying to have multiple outputs, but I encountered a RecarrayField error. The idea is to see the various intermediate variables in the computation of each factor in the class, which I thought would be useful when we want to reconstruct our own fundamental variable (e.g. PE ratio). I'm stuck with the error under make_factors(). How do I go about doing this?

Sorry for the newbie question, I just started learning to code a month ago.

Thanks.


Thanks Kyle!! Your video is very useful!!
Do you also have a zipline version?

@Thomas Wiecki

Is there a way to get the factor loadings and returns for zipline?
How are they designed? I'm running zipline with fundamentals, so I could construct them myself if I knew the formulas you are using.
Should they be similar to Fama and French?

# Load risk factor loadings and returns

factor_loadings = get_factor_loadings(assets, start_date, new_end_date)
factor_returns = get_factor_returns(start_date, new_end_date)

@Carsten: The model is only available on Quantopian. You can find the details here: https://www.quantopian.com/posts/risk-model-white-paper-released-and-available-for-your-reading

How long did it take to do all the research, from zero to the end?

@Alejandro, I start with a hypothesis. Ex: I think that companies who beat earnings repeatedly are good buys. Then, I code a bunch of different factors that attempt to capture that hypothesis. Next, I plug them into my notebook and run it. I iterate over this for a few hours until I've found something statistically significant. Only then do I run a backtest to confirm that the factor translates well to a trading environment.

All in all, I've spent many hours in research. It really is up to you.
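
For readers wondering what "statistically significant" could look like in practice, here is a minimal sketch of one common check: the daily rank correlation (information coefficient) between a factor and forward returns, followed by a t-statistic on its mean. This is an assumption about the kind of test involved, not a description of what Kyle's notebook computes, and the data is synthetic with a small signal planted on purpose.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic data: 250 days x 50 stocks. Forward returns are built to contain
# a small dose of the factor, so some signal exists by construction.
dates = pd.date_range("2020-01-01", periods=250, freq="B")
stocks = ["S%02d" % i for i in range(50)]
factor = pd.DataFrame(rng.standard_normal((250, 50)), index=dates, columns=stocks)
noise = pd.DataFrame(rng.standard_normal((250, 50)), index=dates, columns=stocks)
forward_returns = 0.1 * factor + noise

def daily_rank_ic(factor, forward_returns):
    """Per-day Spearman correlation, computed as Pearson correlation of ranks."""
    factor_ranks = factor.rank(axis=1)
    return_ranks = forward_returns.rank(axis=1)
    return pd.Series(
        {day: factor_ranks.loc[day].corr(return_ranks.loc[day]) for day in factor.index}
    )

ic = daily_rank_ic(factor, forward_returns)

# t-statistic of the mean IC against zero.
t_stat = ic.mean() / (ic.std() / np.sqrt(len(ic)))
print("mean IC: %.3f, t-stat: %.1f" % (ic.mean(), t_stat))
```

Swapping in a few candidate factors and comparing their mean IC and t-statistic is one cheap way to iterate before committing to a full backtest.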

Hi @Kyle
This is an excellent demo that helped me a lot in learning how to analyze factors.
It might be a rookie question, but I am having trouble understanding how to research my hypothesis...

I am trying to use the current ratio as a factor.
I want a current ratio of 1-5 to be long, and <1 to be short.

1. I am not sure whether the math I did is actually correct (full notebook is provided):
match = match - 1 # current ratio > 1 is welcomed
class Current(CustomFactor):
    factor = mf.current_ratio
    inputs = [factor, RBICSFocus().l2_name]
    window_length = 1
    window_safe = True

    def compute(self, today, assets, out, factor, sectors):
        # Calculate surprise
        factor = factor[-1, :]
        match = factor
        match[factor > 5] = np.nan
        match = match - 1 # current ratio > 1 is welcomed

        # Replace inf w/ NaN
        match[np.isinf(match)] = np.nan

        # Standardize the data
        match = standardize(match, sectors=sectors.as_string_array()[-1, :])

        # Normalize the data (NOTE: only include if looking at factor individually)
        if PIPE_NORMALIZE:
            match = normalize(match)

        out[:] = match


2. In the tear sheet, the specific returns are positive but the total returns are negative.
What does this behavior mean?

I would really appreciate the help here to continue
Thanks!
Idan
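
One thing that may be worth checking in the code above: a z-score subtracts the cross-sectional mean, so a constant shift such as `match - 1` applied before standardizing cancels out. The sketch below uses a simplified z-score (no sectors, no winsorizing), which is an assumption relative to the notebook's standardize(), and the current ratios are made up.

```python
import numpy as np
import pandas as pd

def zscore(values):
    """Plain cross-sectional z-score (no sectors, no winsorizing)."""
    values = pd.Series(values, dtype="float64")
    return (values - values.mean()) / values.std()

# Made-up current ratios for five companies.
current_ratio = pd.Series([0.6, 0.9, 1.4, 2.5, 4.0])

raw = zscore(current_ratio)
shifted = zscore(current_ratio - 1)   # same data with the "- 1" applied first

print(np.allclose(raw, shifted))
print((raw < 0).tolist())
```

In this toy data the company with a current ratio of 1.4 still gets a negative score, because it sits below the cross-sectional mean of 1.88. If the intent is "long above 1, short below 1", the cutoff would need to be applied after, or instead of, the standardization step.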


Hi @Kyle, many thanks for this very good educational video and the sources you shared. A question on factor combination: several posts earlier you mentioned that, to analyze the combined factor, you multiply the two pandas columns holding the separate factors. Why do you multiply rather than take a linear combination of them?

Hi Vyacheslav,

I typically do linear combination, as you're describing. However, there may be cases where you actually want to scale one factor's impact based on another's. For example, maybe your strategy succeeds when there is both positive sentiment and good momentum. In that case, multiplicative combination would allow you to take varying position weights in many companies, as opposed to a binary cutoff with fewer companies and thus slightly larger weights.

Your research can confirm whether this multiplicative aggregation is useful. Again, to clarify, I typically just do linear combination.
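
To compare the two approaches side by side, here is a minimal sketch on made-up, already-standardized scores. The factor names follow the sentiment and momentum example above. The equal 0.5 weights and the "both scores positive" mask are illustrative assumptions, not something taken from the thread.

```python
import pandas as pd

def normalize(weights):
    """Scale so absolute weights sum to 1 (same idea as normalize() in the algo)."""
    denom = weights.abs().sum()
    return weights / (denom if denom != 0 else 1)

# Made-up, already-standardized scores for four stocks.
scores = pd.DataFrame(
    {"sentiment": [1.5, 1.0, 0.2, -1.2], "momentum": [1.2, 0.3, 0.9, -1.0]},
    index=["AAA", "BBB", "CCC", "DDD"],
)

# Linear combination: equal-weighted sum of the two factors.
linear = normalize(0.5 * scores["sentiment"] + 0.5 * scores["momentum"])

# Multiplicative combination: one factor scales the other. The product is kept
# only where both scores are positive (an illustrative long-only choice).
both_positive = (scores["sentiment"] > 0) & (scores["momentum"] > 0)
product = (scores["sentiment"] * scores["momentum"]).where(both_positive, 0.0)
multiplicative = normalize(product)

print(pd.DataFrame({"linear": linear, "multiplicative": multiplicative}))
```

In this toy data the multiplicative version concentrates weight in AAA, the one stock that is strong on both factors, while still giving BBB and CCC smaller graded positions rather than cutting them off entirely.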

@Kyle, thanks!