New Video: Learn from the Experts Ep 2: Fast Iterative Factor Development with Kyle

In our latest video, Quantopian community member Kyle McEntush walks through his algorithm creation process with Quantopian’s Dr. Thomas Wiecki. This video starts with a short interview about Kyle’s background in chemical engineering and continues with Kyle walking through an example algorithm he created on Quantopian.

As a chemical engineer himself, Kyle shows how others can use their engineering and science backgrounds to create challenge-ready factors. He generously shares his workflow, which centers on a fast iteration cycle across many factors.

Check out our latest challenge here, where you can test out your skills and submit for a chance to win cash prizes or an opportunity to get your factor licensed.

You can watch it at this link, or down below:

Learn more by subscribing to our YouTube channel to access all of our videos and be notified when a new one is posted.

As always, if there are any topics you would like us to focus on for future videos, please comment below or send us a quick note at [email protected].

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

16 responses

Research notebook attached.


And here is the example algo using the "TrendyEPS" factor:

import quantopian.algorithm as algo
from quantopian.algorithm import attach_pipeline, pipeline_output

import quantopian.optimize as opt
from quantopian.pipeline import Pipeline
from quantopian.pipeline.factors import CustomFactor
from quantopian.pipeline.data.builtin import EquityPricing
from quantopian.pipeline.data.factset import RBICSFocus
from quantopian.pipeline.data.factset.estimates import Actuals, PeriodicConsensus

from quantopian.pipeline.filters import QTradableStocksUS
from zipline.utils.tradingcalendar import trading_day

import numpy as np
import pandas as pd

# Pipeline parameters
USE_SECTORS = True
PIPE_NORMALIZE = False

# Algo parameters
NO_COST = True   # disable trading costs and slippage


def clip(data, threshold=0.025, drop=False):
    data = pd.Series(data)
    data_notnull = data[data.notnull()]
    if data_notnull.shape[0] > 0:
        low_cutoff = data_notnull.quantile(threshold)
        high_cutoff = data_notnull.quantile(1 - threshold)
        if not drop:
            data = data.clip(lower=low_cutoff, upper=high_cutoff).values
        else:
            data = data[(data < low_cutoff) | (data > high_cutoff)]

    return data


def standardize(data, winsorize=True, sectors=None, threshold=0.025):
    data = pd.Series(data)
    if winsorize:
        data = clip(data, threshold=threshold)

    # Prepare the data
    dfData = pd.DataFrame({'data': data})
    if USE_SECTORS and sectors is not None:
        dfData['sector'] = sectors
    else:
        dfData['sector'] = ''

    # Standardize the data
    zscore = lambda x: (x - x.mean()) / (x.std() if x.std() != 0 else 1)  # guard against zero std
    data = dfData.groupby(['sector'])['data'].transform(zscore)

    return data

def normalize(data, demean=False):
    data = pd.Series(data)
    if demean:
        data = data - data.mean()

    denom = data.abs().sum()
    if denom == 0:
        denom = 1

    return data / denom

class TrendyEPS(CustomFactor):
    # Get EPS values
    qn3_eps = PeriodicConsensus.slice('EPS', 'qf', -3)
    qn2_eps = PeriodicConsensus.slice('EPS', 'qf', -2)
    qn1_eps = PeriodicConsensus.slice('EPS', 'qf', -1)
    q0_eps = PeriodicConsensus.slice('EPS', 'qf', 0)
    an3_eps = Actuals.slice('EPS', 'qf', -3)
    an2_eps = Actuals.slice('EPS', 'qf', -2)
    an1_eps = Actuals.slice('EPS', 'qf', -1)
    a0_eps = Actuals.slice('EPS', 'qf', 0)

    inputs = [
        qn3_eps.mean, qn2_eps.mean, qn1_eps.mean, q0_eps.mean,
        an3_eps.actual_value, an2_eps.actual_value, an1_eps.actual_value, a0_eps.actual_value,
        RBICSFocus().l2_name,
    ]
    window_length = 1
    window_safe = True

    def compute(self, today, assets, out, qn3_eps, qn2_eps, qn1_eps, q0_eps, an3_eps, an2_eps, an1_eps, a0_eps, sectors):
        # Calculate surprise
        surprise_n3 = (an3_eps[-1, :] - qn3_eps[-1, :]) / np.abs(qn3_eps[-1, :])
        surprise_n2 = (an2_eps[-1, :] - qn2_eps[-1, :]) / np.abs(qn2_eps[-1, :])
        surprise_n1 = (an1_eps[-1, :] - qn1_eps[-1, :]) / np.abs(qn1_eps[-1, :])
        surprise_0 = (a0_eps[-1, :] - q0_eps[-1, :]) / np.abs(q0_eps[-1, :])
        
        # Add all surprises
        surprise = np.nan_to_num(surprise_n3) + np.nan_to_num(surprise_n2) + np.nan_to_num(surprise_n1) + np.nan_to_num(surprise_0)

        # Replace inf w/ NaN
        surprise[np.isinf(surprise)] = np.nan

        # Standardize the data
        surprise = standardize(surprise, sectors=sectors.as_string_array()[-1, :])

        # Normalize the data (NOTE: only include if looking at factor individually)
        if PIPE_NORMALIZE:
            surprise = normalize(surprise)

        out[:] = surprise


def make_factors():
    factors = {}

    factors['TrendyEPS'] = TrendyEPS

    return factors

# Define the universe
universe = QTradableStocksUS()


def factor_pipeline(universe):
    all_factors = make_factors()

    factors = {a: all_factors[a]() for a in all_factors}

    pipe = Pipeline(columns=factors, screen=universe)

    return pipe


def initialize(context):
    # Rebalance every day, after market close
    algo.schedule_function(
        rebalance,
        algo.date_rules.every_day(),
        algo.time_rules.market_close(hours=1)
    )

    if NO_COST:
        set_commission(commission.PerShare(cost=0, min_trade_cost=0))
        set_slippage(slippage.FixedBasisPointsSlippage(basis_points=0, volume_limit=0.1))

    # Record tracking variables at the end of each day.
    algo.schedule_function(
        record_vars,
        algo.date_rules.every_day(),
        algo.time_rules.market_close(),
    )

    # Create our dynamic stock selector.
    attach_pipeline(factor_pipeline(universe), 'pipeline')


def rebalance(context, data):
    # Get the date's alphas
    date_alphas = (pipeline_output('pipeline')).astype('float64')

    # Get useful info
    stocks = date_alphas.index.unique()

    # Combine the alpha factors
    combined_alpha = date_alphas.sum(axis=1)

    # Normalize the alpha factors
    combined_alpha = normalize(combined_alpha)

    # Define the objective
    objective = opt.TargetWeights(combined_alpha)

    # Add our constraints
    constraints = []

    # Calculate the optimal portfolio
    try:
        combined_alpha = opt.calculate_optimal_portfolio(objective=objective, constraints=constraints)
    except Exception:
        # Fall back to the unoptimized target weights if the optimizer fails
        pass

    # Drop expired securities (i.e., those no longer in the tradable universe on that date)
    combined_alpha = combined_alpha[combined_alpha.index.isin(pipeline_output('pipeline').index)]

    # Do a final null filter and normalization
    combined_alpha = combined_alpha[pd.notnull(combined_alpha)]
    combined_alpha = normalize(combined_alpha)

    # Define the objective
    objective = opt.TargetWeights(combined_alpha)

    # Order the optimal portfolio
    order_optimal_portfolio(objective=objective, constraints=constraints)


def record_vars(context, data):
    record(num_positions=len(context.portfolio.positions))


def handle_data(context, data):
    pass

Great, especially the work you do to "clean" the data of NaNs, zeros, etc. I wonder if Q could, in the future, add an option to have imported data arrive already "cleaned" somehow.
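In the meantime, here is a minimal sketch of the kind of pre-cleaning such an option could do in pandas (the helper name pre_clean is hypothetical; the steps mirror what the algo above does inline):

import numpy as np
import pandas as pd

def pre_clean(series, threshold=0.025):
    # Treat +/-inf (e.g. from dividing by zero) as missing
    series = pd.Series(series).replace([np.inf, -np.inf], np.nan)
    # Winsorize the tails, mirroring the clip() helper above
    lo = series.quantile(threshold)
    hi = series.quantile(1 - threshold)
    return series.clip(lower=lo, upper=hi)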

Good post. I re-ran it and see that specific returns are very low or negative. Is the risk model treating TrendyEPS as value, and hence subtracting it?

I added constraints in order_optimal_portfolio, but ran into a timeout error. Any idea how to fix it?

Thanks Kyle for sharing

Good stuff! I hope these 'Learn from the Experts' threads will earn their own folder soon.

I'm also curious why this backtests so slowly. I tried adding a DollarNeutral constraint and it times out.
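For reference, roughly the setup that triggers it (the bounds are illustrative, not tuned):

constraints = [
    opt.MaxGrossExposure(1.0),  # cap total leverage at 1x
    opt.DollarNeutral(),        # balance long and short exposure
    opt.PositionConcentration.with_equal_bounds(-0.01, 0.01),  # per-name limits
]

order_optimal_portfolio(objective=objective, constraints=constraints)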

I don't really know why adding constraints leads to slow backtests. Perhaps someone at Q might know? Otherwise, I don't see any explicitly slow code anywhere in the algo...

That issue should be fixed now, please let me know if that's not the case.

@Kyle

Do you also have a zipline version? I just found a zipline command in the tear sheet. It would be nice to use it on zipline.

Thanks,
Carsten

Wow, this is extremely educational. Thank you so much for sharing the video, notebook, and algorithm. I have been having a hard time connecting the report from Alphalens to the result of the full backtest, and this interview answers that question.
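For anyone else bridging the two, the notebook-side Alphalens analysis typically looks roughly like this (a sketch; results and pricing stand in for the output of run_pipeline() and get_pricing()):

import alphalens as al

# factor: MultiIndex (date, asset) Series; prices: dates x assets DataFrame
factor_data = al.utils.get_clean_factor_and_forward_returns(
    factor=results['TrendyEPS'],
    prices=pricing,
    quantiles=5,
    periods=(1, 5, 10),
)
al.tears.create_full_tear_sheet(factor_data)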

I generalized the EPSGrowth class a little bit:

class FactorGrowth(CustomFactor):  
    inputs = [None, None, None]  
    window_length = 1  
    window_safe = True

    def compute(self, today, assets, out, prev_q, curr_q, sectors):
        # Calculate period-over-period growth
        prev_q = prev_q[-1, :]
        curr_q = curr_q[-1, :]
        growth = (curr_q - prev_q) / np.abs(prev_q)

        # Replace inf (from zero denominators) w/ NaN
        growth[np.isinf(growth)] = np.nan

        # Standardize the data within sectors
        growth = standardize(growth, sectors=sectors.as_string_array()[-1, :])

        # Normalize the data (NOTE: only include if looking at factor individually)
        if PIPE_NORMALIZE:
            growth = normalize(growth)

        out[:] = growth

So the make_factors() function can be written as:

def make_factors():  
    factors = {}  
    # Quarterly  
    qn1_q_eps = Actuals.slice('EPS', 'qf', -1).actual_value  
    q0_q_eps = Actuals.slice('EPS', 'qf', 0).actual_value  
    # Annual  
    qn1_a_eps = Actuals.slice('EPS', 'af', -1).actual_value  
    q0_a_eps = Actuals.slice('EPS', 'af', 0).actual_value  
    focus = RBICSFocus().l2_name

    factors['EPSGrowth'] = FactorGrowth(inputs=[qn1_q_eps, q0_q_eps, focus], mask=universe)  
    factors['EPSGrowthYr'] = FactorGrowth(inputs=[qn1_a_eps, q0_a_eps, focus], mask=universe)  
    factors['TrendyEPS'] = TrendyEPS(mask=universe)

    return factors  

The pro is that I can experiment with additional factors a bit faster, since I do not need to write a CustomFactor for each of them. The con is that I can't change the operations of one specific factor I experimented with; this is also probably an unneeded abstraction, as it makes the code a little harder to read.

If I wanted to do some changes to a specific factor I'd have to make a copy of the FactorGrowth class and rename it for that specific factor.
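For example, such a one-off copy might look like this (hypothetical; EPSGrowthClipped is not part of the algo above):

class EPSGrowthClipped(FactorGrowth):
    def compute(self, today, assets, out, prev_q, curr_q, sectors):
        # Reuse the generic growth computation...
        super(EPSGrowthClipped, self).compute(today, assets, out, prev_q, curr_q, sectors)
        # ...then add the factor-specific tweak, e.g. clamping extreme z-scores
        out[:] = np.clip(out, -3, 3)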

May the Alpha be with you.

Thank you very much, Kyle and Thomas, for this very educational and very helpful video. I have a question on this; if someone could please help. In the following code, how would one go about combining two factors?

def make_factors():  
    factors = {}

    factors['EPSGrowth'] = EPSGrowth  
    factors['EPSGrowthYr'] = EPSGrowthYr  
    factors['CombinedFactor'] = EPSGrowth*EPSGrowthYr

    return factors

# Define the universe  
universe = QTradableStocksUS()

def factor_pipeline(universe):  
    all_factors = make_factors()  
    factors = {a: all_factors[a]() for a in all_factors}  
    pipe = Pipeline(columns=factors, screen=universe)  
    return pipe  

I am trying the above code to combine two factors, but it gives me the following error while running the pipeline:

unsupported operand type(s) for *: 'ABCMeta' and 'ABCMeta'

Could someone please post code that works?

Nadeem, I typically do the factor combination outside of the pipeline. In other words, I'd run the pipeline as normal and, in the results, just multiply the two pandas columns. It's also possible that the factors have a .latest or some other attribute that changes them from ABCMeta classes to BoundColumns, although I'm not sure.
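Something like this, for instance (a sketch, not tested):

# Option 1: combine outside the pipeline, in pandas, inside rebalance()
results = pipeline_output('pipeline')
results['CombinedFactor'] = results['EPSGrowth'] * results['EPSGrowthYr']

# Option 2: combine inside the pipeline. The ABCMeta error comes from
# multiplying the classes themselves; factor instances support arithmetic.
# Note that factor_pipeline() would then have to use these instances
# directly instead of calling each dictionary entry again.
factors['CombinedFactor'] = EPSGrowth(mask=universe) * EPSGrowthYr(mask=universe)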

Hi,

Thanks, Kyle, for the useful research notebook. For one of the factors, I am trying to have multiple outputs, but I encountered a RecarrayField error. The idea is to see the various intermediate variables in the computation of each factor, which I thought would be useful when reconstructing our own fundamental variables (e.g., a P/E ratio). I'm stuck with the error under make_factors(). How do I go about doing this?

Sorry for the newbie question; I just started learning to code a month ago.

Thanks.
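Multiple outputs are supported via CustomFactor's outputs attribute: each output becomes an attribute of the factor instance and has to be added to the pipeline as its own column (passing such a field where a factor class is expected may be what raises the RecarrayField error). A rough sketch with illustrative names, not tested:

class PERatio(CustomFactor):
    inputs = [EquityPricing.close, Actuals.slice('EPS', 'qf', 0).actual_value]
    outputs = ['pe', 'eps']  # each becomes an attribute of the instance
    window_length = 1

    def compute(self, today, assets, out, close, eps):
        # Expose the intermediate variable alongside the final ratio
        out.eps[:] = eps[-1, :]
        out.pe[:] = close[-1, :] / eps[-1, :]

def make_factors():
    f = PERatio(mask=universe)
    return {'PE': f.pe, 'EPS': f.eps}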


Thanks, Kyle!! Your video is very useful!! Do you also have a zipline version?