multi-factor algo template

I've attempted to replicate the Pipeline factors used in the ML algo posted here:

https://www.quantopian.com/posts/machine-learning-on-quantopian-part-3-building-an-algorithm

Note also that the ML algo is now on Github:

https://github.com/quantopian/research_public/blob/92b32ccd61f25fdfbccfc67a82217c64ae3173e3/research/ml_algo.py

The end goal is to refactor the ML algo so that various alpha-combination techniques can be applied and compared easily. This is a first step in that direction.

One specific snag: I have been unable to use the built-in MACDSignal inside a custom factor. If anyone knows how to do it, it would be much appreciated (my attempt is commented out; if you run it, you'll see the error).
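For what it's worth, one possible workaround (pending a proper answer) is to compute the signal line by hand, since built-in factors like MACDSignal generally can't be listed in a CustomFactor's `inputs`. Here is a rough platform-free sketch of the EMA arithmetic; the function names are my own, and inside a CustomFactor this would run per asset column over a close-price window:

```python
import numpy as np

def ema(series, span):
    """Exponential moving average with smoothing 2/(span+1), seeded at the first value."""
    alpha = 2.0 / (span + 1)
    out = np.empty(len(series), dtype=float)
    out[0] = series[0]
    for i in range(1, len(series)):
        out[i] = alpha * series[i] + (1 - alpha) * out[i - 1]
    return out

def macd_signal(close, fast=12, slow=26, signal=9):
    """MACD line = EMA(fast) - EMA(slow); signal line = EMA(signal) of the MACD line."""
    macd_line = ema(close, fast) - ema(close, slow)
    return ema(macd_line, signal)

closes = np.linspace(100.0, 110.0, 60)  # toy uptrending price series
sig = macd_signal(closes)               # positive for a steady uptrend
```

A window_length of 60 or so leaves the slow EMA enough history to settle before the last row is read.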

from quantopian.algorithm import attach_pipeline, pipeline_output, order_optimal_portfolio
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.data import Fundamentals, psychsignal
from quantopian.pipeline.factors import AnnualizedVolatility, SimpleBeta, Returns, MACDSignal, CustomFactor
import quantopian.optimize as opt
from quantopian.pipeline.experimental import risk_loading_pipeline
from quantopian.pipeline.filters import QTradableStocksUS
from sklearn import preprocessing
from scipy.stats.mstats import winsorize

from zipline.utils.numpy_utils import (
    repeat_first_axis,
    repeat_last_axis,
)

import numpy as np

MAX_GROSS_EXPOSURE = 1.0
NUM_POSITIONS = 400 # even number
MAX_POSITION_SIZE = 2.0/NUM_POSITIONS

MIN_BETA_EXPOSURE = -0.3
MAX_BETA_EXPOSURE = 0.3

# Factor preprocessing settings
WIN_LIMIT = 0.025 # factor preprocess winsorize limit

def preprocess(a):
    
    # find inf and -inf and replace with nan
    inds = np.where(np.isinf(a))
    a[inds] = np.nan
    
    # demean and replace nans with 0
    a = np.nan_to_num((a-np.nanmean(a)))
    
    a = winsorize(a,limits=(WIN_LIMIT,WIN_LIMIT))
    
    return preprocessing.scale(a)
        
def make_features():
   
    class MeanReversion1M(CustomFactor):
        inputs = (Returns(window_length=21),)
        window_length = 252

        def compute(self, today, assets, out, monthly_rets):
            out[:] = preprocess(np.divide(
                monthly_rets[-1] - np.nanmean(monthly_rets, axis=0),
                np.nanstd(monthly_rets, axis=0)))


    class MoneyflowVolume5d(CustomFactor):
        inputs = (USEquityPricing.close, USEquityPricing.volume)

        # we need one more day to get the direction of the price on the first
        # day of our desired window of 5 days
        window_length = 6

        def compute(self, today, assets, out, close_extra, volume_extra):
            # slice off the extra row used to get the direction of the close
            # on the first day
            close = close_extra[1:]
            volume = volume_extra[1:]

            dollar_volume = close * volume
            denominator = dollar_volume.sum(axis=0)

            difference = np.diff(close_extra, axis=0)
            direction = np.where(difference > 0, 1, -1)
            numerator = (direction * dollar_volume).sum(axis=0)

            out[:] = preprocess(np.divide(numerator, denominator))


    class PriceOscillator(CustomFactor):
        inputs = (USEquityPricing.close,)
        window_length = 252

        def compute(self, today, assets, out, close):
            four_week_period = close[-20:]
            out[:] = preprocess(np.divide(
                np.nanmean(four_week_period, axis=0),
                np.nanmean(close, axis=0))-1)

    class Trendline(CustomFactor):
        inputs = [USEquityPricing.close]
        window_length = 252

        _x = np.arange(window_length)
        _x_var = np.var(_x)

        def compute(self, today, assets, out, close):
            x_matrix = repeat_last_axis(
                (self.window_length - 1) / 2 - self._x,
                len(assets),
            )

            y_bar = np.nanmean(close, axis=0)
            y_bars = repeat_first_axis(y_bar, self.window_length)
            y_matrix = close - y_bars

            out[:] = preprocess(np.divide(
                (x_matrix * y_matrix).sum(axis=0) / self._x_var,
                self.window_length)
            )


    class Volatility3M(CustomFactor):
        inputs = [Returns(window_length=2)]
        window_length = 63

        def compute(self, today, assets, out, rets):
            out[:] = preprocess(np.nanstd(rets, axis=0))


    class AdvancedMomentum(CustomFactor):
        inputs = [USEquityPricing.close, Returns(window_length=126)]
        window_length = 252

        def compute(self, today, assets, out, prices, returns):
            # 12-month momentum (skipping the most recent month) minus the
            # 1-month return, scaled by the volatility of 6-month returns
            out[:] = preprocess(np.divide(
                (prices[-21] - prices[-252]) / prices[-252] -
                (prices[-1] - prices[-21]) / prices[-21],
                np.nanstd(returns, axis=0)
            ))
            
    class asset_growth_3m(CustomFactor):
        inputs = [Returns(
        window_length=63,
    )]
        window_length = 1
        
        def compute(self, today, assets, out, returns):
            out[:] = preprocess(returns[-1,:])
    
    class asset_to_equity_ratio(CustomFactor):
        inputs = [Fundamentals.total_assets, Fundamentals.common_stock_equity]
        window_length = 1
        
        def compute(self, today, assets, out, total_assets, common_stock_equity):
            out[:] = preprocess(total_assets[-1,:] / common_stock_equity[-1,:])

    class capex_to_cashflows(CustomFactor):
        inputs = [Fundamentals.capital_expenditure, Fundamentals.free_cash_flow]
        window_length = 1
        
        def compute(self, today, assets, out, capital_expenditure, free_cash_flow):
            out[:] = preprocess(capital_expenditure[-1,:] / free_cash_flow[-1,:])
    
    class ebitda_yield(CustomFactor):
        inputs = [Fundamentals.ebitda, USEquityPricing.close]
        window_length = 1
        
        def compute(self, today, assets, out, ebitda, close):
            out[:] = preprocess((ebitda[-1,:] * 4) / close[-1,:])
    
    class ebita_to_assets(CustomFactor):
        inputs = [Fundamentals.ebit, Fundamentals.total_assets]
        window_length = 1
        
        def compute(self, today, assets, out, ebit, total_assets):
            out[:] = preprocess((ebit[-1,:] * 4) / total_assets[-1,:])
    
    class return_on_total_invest_capital(CustomFactor):
        inputs = [Fundamentals.roic]
        window_length = 1
        
        def compute(self, today, assets, out, roic):
            out[:] = preprocess(roic[-1,:])
    
    class net_income_margin(CustomFactor):
        inputs = [Fundamentals.net_margin]
        window_length = 1
        
        def compute(self, today, assets, out, net_margin):
            out[:] = preprocess(net_margin[-1,:])
    
    class operating_cashflows_to_assets(CustomFactor):
        inputs = [Fundamentals.operating_cash_flow, Fundamentals.total_assets]
        window_length = 1
        
        def compute(self, today, assets, out, operating_cash_flow, total_assets):
            out[:] = preprocess((operating_cash_flow[-1,:] * 4) / total_assets[-1,:])
    
    class price_momentum_3m(CustomFactor):
        inputs = [Returns(window_length=63)]
        window_length = 1
        
        def compute(self, today, assets, out, returns):
            out[:] = preprocess(returns[-1,:])
    
    class returns_39w(CustomFactor):
        inputs = [Returns(window_length=215)]
        window_length = 1
        
        def compute(self, today, assets, out, returns):
            out[:] = preprocess(returns[-1,:])
            
    # class MACD_Signal(CustomFactor):
    #     inputs = [MACDSignal]
    #     window_length = 1
        
    #     def compute(self, today, assets, out, macdsignal):
    #         out[:] = preprocess(macdsignal[-1,:])

    return {
        'Asset Growth 3M': asset_growth_3m,
        'Asset to Equity Ratio': asset_to_equity_ratio,
        'Capex to Cashflows': capex_to_cashflows,
        'EBIT to Assets': ebita_to_assets,
        'EBITDA Yield': ebitda_yield,
        'Mean Reversion 1M': MeanReversion1M,
        'Moneyflow Volume 5D': MoneyflowVolume5d,
        'Net Income Margin': net_income_margin,
        'Operating Cashflows to Assets': operating_cashflows_to_assets,
        'Price Momentum 3M': price_momentum_3m,
        'Price Oscillator': PriceOscillator,
        'Return on Invest Capital': return_on_total_invest_capital,
        '39 Week Returns': returns_39w,
        'Trendline': Trendline,
        'Volatility 3m': Volatility3M,
        'Advanced Momentum': AdvancedMomentum,
        # 'MACD Signal Line': MACD_Signal,
        }

def make_pipeline():
    
    universe = QTradableStocksUS()
    
    beta = SimpleBeta(target=sid(8554),regression_length=260,
                      allowed_missing_percentage=1.0
                     )
    
    features = make_features()
    
    combined_alpha = None
    for name, f in features.iteritems():
        if combined_alpha is None:
            combined_alpha = f(mask=universe)
        else:
            combined_alpha += f(mask=universe)

    longs = combined_alpha.top(NUM_POSITIONS/2)
    shorts = combined_alpha.bottom(NUM_POSITIONS/2)

    long_short_screen = (longs | shorts)

    pipe = Pipeline(columns = {
        'combined_alpha':combined_alpha,
        'beta':beta,
    },
    screen = long_short_screen
                   )
    return pipe

def initialize(context):

    attach_pipeline(make_pipeline(), 'long_short_equity_template')
    attach_pipeline(risk_loading_pipeline(), 'risk_loading_pipeline')

    # Schedule my rebalance function
    schedule_function(func=rebalance,
                      date_rule=date_rules.every_day(),
                      time_rule=time_rules.market_open(minutes=60),
                      half_days=True)
    # record my portfolio variables at the end of day
    schedule_function(func=recording_statements,
                      date_rule=date_rules.every_day(),
                      time_rule=time_rules.market_close(),
                      half_days=True)
    
    # comment out lines below for realistic backtesting
    set_commission(commission.PerShare(cost=0, min_trade_cost=0))
    set_slippage(slippage.FixedSlippage(spread=0))
    
def before_trading_start(context, data):

    context.pipeline_data = pipeline_output('long_short_equity_template')
    context.risk_loading_pipeline = pipeline_output('risk_loading_pipeline')

def recording_statements(context, data):

    record(num_positions=len(context.portfolio.positions))
    record(leverage=context.account.leverage)

def rebalance(context, data):
    
    pipeline_data = context.pipeline_data

    # demean and normalize
    combined_alpha = pipeline_data.combined_alpha
    combined_alpha = combined_alpha - combined_alpha.mean()
    combined_alpha = combined_alpha/combined_alpha.abs().sum()
    
    objective = opt.MaximizeAlpha(combined_alpha)
    
    constraints = []
    
    constraints.append(opt.MaxGrossExposure(MAX_GROSS_EXPOSURE))
    
    constraints.append(opt.DollarNeutral())
    
    constraints.append(
        opt.PositionConcentration.with_equal_bounds(
            min=-MAX_POSITION_SIZE,
            max=MAX_POSITION_SIZE
        ))
    
    beta_neutral = opt.FactorExposure(
        loadings=pipeline_data[['beta']],
        min_exposures={'beta':MIN_BETA_EXPOSURE},
        max_exposures={'beta':MAX_BETA_EXPOSURE}
        )
    constraints.append(beta_neutral)
    
    risk_model_exposure = opt.experimental.RiskModelExposure(
        context.risk_loading_pipeline.dropna(),
        version=opt.Newest,
    )
      
    constraints.append(risk_model_exposure)
    
    order_optimal_portfolio(
                objective=objective,
                constraints=constraints,
                )
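For anyone curious what preprocess() actually does to each factor: it maps inf to NaN, demeans, zeroes the NaNs, winsorizes both tails at WIN_LIMIT, then z-scores. A toy run outside the platform (winsorize and preprocessing.scale are re-implemented with plain numpy here, so percentile clipping is only an approximation of scipy's count-based winsorize):

```python
import numpy as np

WIN_LIMIT = 0.025  # winsorize limit, as in the algo

def preprocess(a):
    """Numpy-only mirror of the algo's preprocess()."""
    a = np.array(a, dtype=float)                  # copy so the caller's array survives
    a[np.isinf(a)] = np.nan                       # inf / -inf -> nan
    a = np.nan_to_num(a - np.nanmean(a))          # demean; nan -> 0
    lo, hi = np.percentile(a, [100 * WIN_LIMIT, 100 * (1 - WIN_LIMIT)])
    a = np.clip(a, lo, hi)                        # clip both tails
    return (a - a.mean()) / a.std()               # z-score (preprocessing.scale)

raw = np.array([1.0, 2.0, np.nan, 1000.0, 3.0, np.inf, 2.0, 1.0])
z = preprocess(raw)  # outlier and inf survive only as clipped, standardized values
```

Because every factor comes out zero-mean with unit variance, summing them in make_pipeline() amounts to an equal-weight alpha combination.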
13 responses

Some contributions: logging a preview of the pipeline data (which also counts NaNs), a slightly different normalization, nanfill(), MACD, and show_opt_weights(). Currently on TargetWeights instead of MaximizeAlpha, just seeing what happens. Only backtested to '08 here.
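The "different normalization" (norm() in the code below) scales the long and short sides separately so each sums to +1 and -1, i.e. equal dollars on each side and net zero. The idea in isolation, with plain numpy and without the truncation to NUM_POSITIONS/2 per side:

```python
import numpy as np

def norm_sides(alpha):
    """Demean only if all values share one sign, then scale positives to sum
    to +1 and negatives to sum to -1 (dollar neutral by construction)."""
    alpha = alpha[~np.isnan(alpha)]               # drop nans
    if alpha.min() >= 0 or alpha.max() <= 0:
        alpha = alpha - alpha.mean()              # force both signs to exist
    pos = alpha[alpha > 0]
    neg = alpha[alpha < 0]
    return pos / pos.sum(), -(neg / neg.sum())

longs, shorts = norm_sides(np.array([0.5, -0.2, 1.5, np.nan, -0.8]))
```

Feeding weights like these to opt.TargetWeights makes the target itself dollar neutral, rather than relying only on the DollarNeutral constraint.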

2005-01-04 05:45 log_pipe:277 INFO Rows: 400  Columns: 7  
                                   min                 mean                max  
 Capex to Cashflows     -3.17832165546     -0.0609523660925      3.08108326485  
     EBIT to Assets     -3.18221183985       -0.14835647347      2.39685051613  
               MACD     -4.13792045044     -0.0941518487729     0.762728988302  
Moneyflow Volume 5D     -1.79092275867       -0.12176130157      2.17061364146  
      Volatility 3m     -1.30869878045       0.436766827585      2.85606751559  
               beta     0.236772337938        1.41296277251      3.28745561795  
     combined_alpha     -19.5841117298       0.264473410066      21.3321623585  
2005-01-04 05:45 log_pipe:292 INFO _ _ _   Capex to Cashflows   _ _ _  
    ... Capex to Cashflows highs  
                      Capex to Cashflows  EBIT to Assets      MACD  \  
Equity(21110 [APCS])            3.081083       -0.195858 -0.042299  
Equity(16511 [KMX])             3.081083        0.741732  0.061259  
Equity(15129 [FDS])             3.081083        2.396851  0.569937  
Equity(3706 [HTLD])             3.081083        0.953252  0.081986

                      Moneyflow Volume 5D  Volatility 3m      beta  \  
Equity(21110 [APCS])             0.683083       0.804737  1.375408  
Equity(16511 [KMX])              0.165734       0.367713  1.493836  
Equity(15129 [FDS])             -0.057370      -0.319507  1.081853  
Equity(3706 [HTLD])              0.185720       0.234778  1.422592

......

show_opt_weights() can be filtered for sids.

2005-07-11 07:30 show_opt_weights:322 INFO Close  
2005-07-11 07:30 show_opt_weights:329 INFO 0.00140 => 0  ACI  
2005-07-11 07:30 show_opt_weights:329 INFO 0.00178 => 0  TVTY  
2005-07-11 07:30 show_opt_weights:329 INFO -0.00144 => 0  BCR  
2005-07-11 07:30 show_opt_weights:329 INFO 0.00270 => 0  BPOP  
2005-07-11 07:30 show_opt_weights:331 INFO     40 more  
2005-07-11 07:30 show_opt_weights:340 INFO Open  
2005-07-11 07:30 show_opt_weights:347 INFO 0 => 0.00155  AES  
2005-07-11 07:30 show_opt_weights:347 INFO 0 => 0.00277  AIG  
2005-07-11 07:30 show_opt_weights:347 INFO 0 => -0.00206  ARRO  
2005-07-11 07:30 show_opt_weights:347 INFO 0 => -0.00275  ATW  
2005-07-11 07:30 show_opt_weights:349 INFO     49 more  
2005-07-11 07:30 show_opt_weights:360 INFO Change  
2005-07-11 07:30 show_opt_weights:362 INFO -0.005070 => -0.005000  VRX  
2005-07-11 07:30 show_opt_weights:362 INFO -0.005060 => -0.005000  PRGO  
2005-07-11 07:30 show_opt_weights:362 INFO -0.005148 => -0.005000  KERX  
2005-07-11 07:30 show_opt_weights:362 INFO -0.005272 => -0.005000  RCPI  
2005-07-11 07:30 show_opt_weights:362 INFO -0.005399 => -0.005000  ENCY  
2005-07-11 07:30 show_opt_weights:362 INFO -0.005351 => -0.005000  WYNN  
2005-07-11 07:30 show_opt_weights:362 INFO -0.005147 => -0.005000  CYBX  
2005-07-11 07:30 show_opt_weights:362 INFO -0.005059 => -0.005000  SSRM  
2005-07-11 07:30 show_opt_weights:362 INFO -0.005142 => -0.005000  ICOS  
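The Close/Open/Change sections above are just a three-way diff of the old vs. new weight vectors. A minimal platform-free sketch of that categorization (the helper name and the dict-of-tickers representation are my own; the real function walks pandas Series indexed by Equity objects):

```python
def diff_weights(old, new):
    """Split a weight transition into closes (w -> 0), opens (0 -> w),
    and changes (w1 -> w2, both nonzero)."""
    closes, opens, changes = {}, {}, {}
    for k in set(old) | set(new):
        o, n = old.get(k, 0.0), new.get(k, 0.0)
        if o and not n:
            closes[k] = o
        elif n and not o:
            opens[k] = n
        elif o != n:
            changes[k] = (o, n)
    return closes, opens, changes

closes, opens, changes = diff_weights(
    {'ACI': 0.0014, 'VRX': -0.00507, 'AES': 0.0},
    {'AES': 0.00155, 'VRX': -0.005, 'ACI': 0.0},
)
```

Run against result.old_weights and result.new_weights from opt.run_optimization(), this reproduces the grouping in the log.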
from quantopian.algorithm import attach_pipeline, pipeline_output, order_optimal_portfolio
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.data import Fundamentals, psychsignal
from quantopian.pipeline.factors import AnnualizedVolatility, SimpleBeta, Returns, MACDSignal, CustomFactor
import quantopian.optimize as opt
from quantopian.pipeline.experimental import risk_loading_pipeline
from quantopian.pipeline.filters import QTradableStocksUS
from sklearn import preprocessing
from scipy.stats.mstats import winsorize
import numpy as np

from zipline.utils.numpy_utils import (
    repeat_first_axis,
    repeat_last_axis,
)

MAX_GROSS_EXPOSURE = 1.0
NUM_POSITIONS      = 400 # even number
MAX_POSITION_SIZE  = 2.0 / NUM_POSITIONS
MIN_BETA_EXPOSURE  = -0.3
MAX_BETA_EXPOSURE  =  0.3
WIN_LIMIT          = 0.025 # factor preprocess winsorize limit

def make_pipeline():
    universe = QTradableStocksUS()
    
    beta = SimpleBeta(target=sid(8554),regression_length=260,
                      allowed_missing_percentage=1.0
                     )
    
    features = make_features()
    logview  = {}
    watch    = [
        'Volatility 3m',
        'Moneyflow Volume 5D',
        'EBIT to Assets',
        'Capex to Cashflows',
        'MACD',
    ]
    '''
        'Asset Growth 3M'               : asset_growth_3m,
        'Asset to Equity Ratio'         : asset_to_equity_ratio,
        'Capex to Cashflows'            : capex_to_cashflows,
        'EBIT to Assets'                : ebita_to_assets,
        'EBITDA Yield'                  : ebitda_yield,
        'Mean Reversion 1M'             : MeanReversion1M,
        'Moneyflow Volume 5D'           : MoneyflowVolume5d,
        'Net Income Margin'             : net_income_margin,
        'Operating Cashflows to Assets' : operating_cashflows_to_assets,
        'Price Momentum 3M'             : price_momentum_3m,
        'Price Oscillator'              : PriceOscillator,
        'Return on Invest Capital'      : return_on_total_invest_capital,
        '39 Week Returns'               : returns_39w,
        'Trendline'                     : Trendline,
        'Volatility 3m'                 : Volatility3M,
        'Advanced Momentum'             : AdvancedMomentum,
        'MACD'                          : MACD,
        #'MACD Signal Line'              : MACD_Signal,
    '''
    
    combined_alpha = None
    for name, f in features.iteritems():
        results = f(mask=universe)
        if combined_alpha is None:
            combined_alpha = results
        else:
            combined_alpha += results
        if name in watch:
            logview[name] = results
        
    longs  = combined_alpha   .top(NUM_POSITIONS/2)
    shorts = combined_alpha.bottom(NUM_POSITIONS/2)

    columns = {
        'combined_alpha': combined_alpha,
        'beta'          : beta,
    }
    columns.update(logview)
    
    return Pipeline(
        screen = (longs | shorts), columns = columns
    )

def trade(context, data):
    
    pipeline_data = context.pipeline_data

    # demean and normalize
    combined_alpha = norm(context, pipeline_data.combined_alpha)
    '''
    combined_alpha = pipeline_data.combined_alpha
    combined_alpha = combined_alpha - combined_alpha.mean()
    combined_alpha = combined_alpha/combined_alpha.abs().sum()
    '''
    
    objective = opt.TargetWeights(combined_alpha)
    #objective = opt.MaximizeAlpha(combined_alpha)
    
    constraints = []
    
    constraints.append(opt.MaxGrossExposure(MAX_GROSS_EXPOSURE))
    
    constraints.append(opt.DollarNeutral())
    
    constraints.append(
        opt.PositionConcentration.with_equal_bounds(
            min = -MAX_POSITION_SIZE,
            max =  MAX_POSITION_SIZE
        ))
    
    beta_neutral = opt.FactorExposure(
        loadings=pipeline_data[['beta']],
        min_exposures={'beta':MIN_BETA_EXPOSURE},
        max_exposures={'beta':MAX_BETA_EXPOSURE}
        )
    constraints.append(beta_neutral)
    
    risk_model_exposure = opt.experimental.RiskModelExposure(
        context.risk_loading_pipeline.dropna(),
        version=opt.Newest,
    )
      
    constraints.append(risk_model_exposure)
    
    
    '''
    show_opt_weights() 
        Not yet right maybe, haven't figured out what's what yet.
        How can this be done without running twice?
        Currently have to do both run_optimization() and order_optimal_portfolio()
        https://www.quantopian.com/help#running-optimizations
        quantopian.optimize.run_optimization() performs the same optimization as calculate_optimal_portfolio() but returns an OptimizationResult with additional information.
        Find result.new_weights
    '''
    show_weights = 0    # Set to nonzero.
    if show_weights:
        result = opt.run_optimization(
            objective=objective,
            constraints=constraints,
        )
        show_opt_weights(result.old_weights, result.new_weights)    # Show some changes
    
    order_optimal_portfolio(
        objective=objective,
        constraints=constraints,
    )
    
def initialize(context):

    attach_pipeline(make_pipeline(), 'long_short_equity_template')
    attach_pipeline(risk_loading_pipeline(), 'risk_loading_pipeline')

    # Schedule my rebalance function
    schedule_function(func=trade,
                      date_rule=date_rules.every_day(),
                      time_rule=time_rules.market_open(minutes=60),
                      half_days=True)
    # record my portfolio variables at the end of day
    schedule_function(func=recording_statements,
                      date_rule=date_rules.every_day(),
                      time_rule=time_rules.market_close(),
                      half_days=True)
    
    # comment out lines below for realistic backtesting
    set_commission(commission.PerShare(cost=0, min_trade_cost=0))
    set_slippage(slippage.FixedSlippage(spread=0))
    
def before_trading_start(context, data):

    context.pipeline_data = pipeline_output('long_short_equity_template')
    context.risk_loading_pipeline = pipeline_output('risk_loading_pipeline')

    if 'log_pipe_done' not in context:       # show pipe info, also count of any nans
        log_pipe(context, data, context.pipeline_data, 4) #, details=['zoo', 'alpha'])

def recording_statements(context, data):

    record(num_positions=len(context.portfolio.positions))
    record(leverage=context.account.leverage)

def preprocess(a):
    
    # find inf and -inf and replace with nan
    inds = np.where(np.isinf(a))
    a[inds] = np.nan
    
    # demean and replace nans with 0
    a = np.nan_to_num((a-np.nanmean(a)))
    
    a = winsorize(a, limits = (WIN_LIMIT, WIN_LIMIT))
    
    return preprocessing.scale(a)

def nanfill(_in):
    # Alternative to replacing nans with zero above, maybe forward fill them.
    # Call like in MACD() or integrate with preprocess() maybe
    
    # https://www.quantopian.com/posts/forward-filling-nans-in-pipeline
    # From https://stackoverflow.com/questions/41190852/most-efficient-way-to-forward-fill-nan-values-in-numpy-array

    #return _in            # uncomment to not run the code below

    '''
    nan_num = np.count_nonzero(np.isnan(_in))
    if nan_num:
        log.info(nan_num)
        #log.info(str(_in))
    '''

    mask = np.isnan(_in)
    idx = np.where(~mask,np.arange(mask.shape[1]),0)
    np.maximum.accumulate(idx,axis=1, out=idx)
    _in[mask] = _in[np.nonzero(mask)[0], idx[mask]]
    return _in

def norm(c, d):    # d data, it's a series, normalize it pos, neg separately
    d = d[ d == d ]    # ensure no nans (NaN != NaN)
    if d.min() >= 0 or d.max() <= 0:
        d -= d.mean()
    #d -= d.mean()
    pos  = d[ d > 0 ]
    neg  = d[ d < 0 ]
    num  = min(len(pos), len(neg), NUM_POSITIONS / 2)
    pos  = pos.sort_values(ascending=False).head(num)
    neg  = neg.sort_values(ascending=False).tail(num)
    pos /=   pos.sum()
    neg  = -(neg / neg.sum())
    return pos.append(neg)
    
def log_pipe(context, data, df, num, details=None):
    ''' Log info about pipeline output or any DataFrame or Series (df)
    See https://www.quantopian.com/posts/overview-of-pipeline-content-easy-to-add-to-your-backtest
    '''

    # Options
    log_nan_only = 0
    show_sectors = 0
    show_sorted_details = 1

    if not len(df):
        log.info('Empty')
        return

    # Series ......
    context.log_pipe_done = 1 ; padmax = 6 ; content = ''
    if 'Series' in str(type(df)):    # is Series, not DataFrame
        nan_count = len(df[df != df])
        nan_count = 'NaNs {}/{}'.format(nan_count, len(df)) if nan_count else ''
        if (log_nan_only and nan_count) or not log_nan_only:
            pad = max(6, len(str(df.max())))
            log.info('{}{}{}   Series {}  len {}'.format('min' .rjust(pad+5),
                'mean'.rjust(pad+5), 'max' .rjust(pad+5),  df.name, len(df)))
            log.info('{}{}{} {}'.format(str(df.min()) .rjust(pad+5),
                str(df.mean()).rjust(pad+5), str(df.max()) .rjust(pad+5), nan_count
            ))
        return

    # DataFrame ......
    content_min_max = [ ['','min','mean','max',''] ]
    for col in df.columns:
        if col == 'sector' and not show_sectors: continue
        nan_count = len(df[col][df[col] != df[col]])
        nan_count = 'NaNs {}/{}'.format(nan_count, len(df)) if nan_count else ''
        padmax    = max( padmax, max(6, len(str(df[col].max()))) )
        content_min_max.append([col, str(df[col] .min()), str(df[col].mean()), str(df[col] .max()), nan_count])
    if log_nan_only and nan_count or not log_nan_only:
        content = 'Rows: {}  Columns: {}'.format(df.shape[0], df.shape[1])
        if len(df.columns) == 1: content = 'Rows: {}'.format(df.shape[0])

        paddings = [6 for i in range(4)]
        for lst in content_min_max:    # set max lengths
            i = 0
            for val in lst[:4]:    # value in each sub-list
                paddings[i] = max(paddings[i], len(str(val)))
                i += 1
        headr = content_min_max[0]
        content += ('\n{}{}{}{}{}'.format(
             headr[0] .rjust(paddings[0]),
            (headr[1]).rjust(paddings[1]+5),
            (headr[2]).rjust(paddings[2]+5),
            (headr[3]).rjust(paddings[3]+5),
            ''
        ))
        for lst in content_min_max[1:]:    # populate content using max lengths
            content += ('\n{}{}{}{}     {}'.format(
                lst[0].rjust(paddings[0]),
                lst[1].rjust(paddings[1]+5),
                lst[2].rjust(paddings[2]+5),
                lst[3].rjust(paddings[3]+5),
                lst[4],
            ))
        log.info(content)

    if not show_sorted_details: return
    if len(df.columns) == 1:    return    # skip detail if only 1 column
    if details == None: details = df.columns
    for detail in details:
        if detail == 'sector': continue
        hi = df.sort_values(by=detail, ascending=False).head(num)
        lo = df.sort_values(by=detail, ascending=False).tail(num)
        content  = ''
        content += ('_ _ _   {}   _ _ _'  .format(detail))
        content += ('\n\t... {} highs\n{}'.format(detail, str(hi)))
        content += ('\n\t... {} lows \n{}'.format(detail, str(lo)))
        if log_nan_only and not len(lo[lo[detail] != lo[detail]]):
            continue  # skip if no nans
        log.info(content)

def show_opt_weights(old, new):  # under development
    # Show some optimization changes
    
    # list of security id's to track exclusively
    sids      = []    
    max_lines = 4    # for Open & Close. Change is allowed to run wild till limit.
    
    closes = []
    for s in old.index:
        if sids and s.sid not in sids: continue
        if old.min() == 0.0 and old.max() == 0.0: break
        if old[s] == 0.0: continue
        if new[s] != 0.0: continue  # maybe better if not *near* zero
        closes.append('{} => 0  {}'.format('%.5f' % old[s], s.symbol))
    if closes:
        log.info('Close')
        count = 0 ; exceeded = 0
        for c in closes:
            count += 1
            if count > max_lines: 
                exceeded = 1
                break
            log.info(c)
        if exceeded:
            log.info('    {} more'.format(len(closes) - max_lines))
            
    opens = []
    for s in new.index:
        if sids and s.sid not in sids: continue
        if old[s] != 0.0: continue
        if new[s] == 0.0: continue
        opens.append('0 => {}  {}'.format('%.5f' % new[s], s.symbol))
    if opens:
        log.info('Open')
        count = 0 ; exceeded = 0
        for c in opens:
            count += 1
            if count > max_lines: 
                exceeded = 1
                break
            log.info(c)
        if exceeded:
            log.info('    {} more'.format(len(opens) - max_lines))
            
    changes = []
    for s in new.sort_values().index:
        if sids and s.sid not in sids: continue
        if old[s] == 0.0: continue
        if new[s] == 0.0: continue

        if old[s] == new[s]: continue
        changes.append('{} => {}  {}'.format('%.6f' % old[s], '%.6f' % new[s], s.symbol))
    if changes:        
        log.info('Change')
        for c in changes:
            log.info(c)
    
def make_features():
   
    class MeanReversion1M(CustomFactor):
        inputs = (Returns(window_length=21),)
        window_length = 252

        def compute(self, today, assets, out, monthly_rets):
            out[:] = preprocess(np.divide(
                monthly_rets[-1] - np.nanmean(monthly_rets, axis=0),
                np.nanstd(monthly_rets, axis=0)))


    class MoneyflowVolume5d(CustomFactor):
        inputs = (USEquityPricing.close, USEquityPricing.volume)

        # we need one more day to get the direction of the price on the first
        # day of our desired window of 5 days
        window_length = 6

        def compute(self, today, assets, out, close_extra, volume_extra):
            # slice off the extra row used to get the direction of the close
            # on the first day
            close = close_extra[1:]
            volume = volume_extra[1:]

            dollar_volume = close * volume
            denominator = dollar_volume.sum(axis=0)

            difference = np.diff(close_extra, axis=0)
            direction = np.where(difference > 0, 1, -1)
            numerator = (direction * dollar_volume).sum(axis=0)

            out[:] = preprocess(np.divide(numerator, denominator))


    class PriceOscillator(CustomFactor):
        inputs = (USEquityPricing.close,)
        window_length = 252

        def compute(self, today, assets, out, close):
            four_week_period = close[-20:]
            out[:] = preprocess(np.divide(
                np.nanmean(four_week_period, axis=0),
                np.nanmean(close, axis=0))-1)

    class Trendline(CustomFactor):
        inputs = [USEquityPricing.close]
        window_length = 252

        _x = np.arange(window_length)
        _x_var = np.var(_x)

        def compute(self, today, assets, out, close):
            x_matrix = repeat_last_axis(
                (self.window_length - 1) / 2 - self._x,
                len(assets),
            )

            y_bar = np.nanmean(close, axis=0)
            y_bars = repeat_first_axis(y_bar, self.window_length)
            y_matrix = close - y_bars

            out[:] = preprocess(np.divide(
                (x_matrix * y_matrix).sum(axis=0) / self._x_var,
                self.window_length)
            )


    class Volatility3M(CustomFactor):
        inputs = [Returns(window_length=2)]
        window_length = 63

        def compute(self, today, assets, out, rets):
            out[:] = preprocess(np.nanstd(rets, axis=0))


    class AdvancedMomentum(CustomFactor):
        inputs = [USEquityPricing.close, Returns(window_length=126)]
        window_length = 252

        def compute(self, today, assets, out, prices, returns):
            # (11-month return measured one month ago) minus (last month's
            # return), scaled by 6-month return volatility
            out[:] = preprocess(np.divide(
                (prices[-21] - prices[-252]) / prices[-252] -
                (prices[-1] - prices[-21]) / prices[-21],
                np.nanstd(returns, axis=0)
            ))
            
    class asset_growth_3m(CustomFactor):
        inputs = [Returns(window_length=63)]
        window_length = 1

        def compute(self, today, assets, out, returns):
            out[:] = preprocess(returns[-1,:])
    
    class asset_to_equity_ratio(CustomFactor):
        inputs = [Fundamentals.total_assets, Fundamentals.common_stock_equity]
        window_length = 1
        
        def compute(self, today, assets, out, total_assets, common_stock_equity):
            out[:] = preprocess(total_assets[-1,:] / common_stock_equity[-1,:])

    class capex_to_cashflows(CustomFactor):
        inputs = [Fundamentals.capital_expenditure, Fundamentals.free_cash_flow]
        window_length = 1
        
        def compute(self, today, assets, out, capital_expenditure, free_cash_flow):
            out[:] = preprocess(capital_expenditure[-1,:] / free_cash_flow[-1,:])
    
    class ebitda_yield(CustomFactor):
        inputs = [Fundamentals.ebitda, USEquityPricing.close]
        window_length = 1
        
        def compute(self, today, assets, out, ebitda, close):
            out[:] = preprocess((ebitda[-1,:] * 4) / close[-1,:])
    
    class ebita_to_assets(CustomFactor):
        inputs = [Fundamentals.ebit, Fundamentals.total_assets]
        window_length = 1
        
        def compute(self, today, assets, out, ebit, total_assets):
            out[:] = preprocess((ebit[-1,:] * 4) / total_assets[-1,:])
    
    class return_on_total_invest_capital(CustomFactor):
        inputs = [Fundamentals.roic]
        window_length = 1
        
        def compute(self, today, assets, out, roic):
            out[:] = preprocess(roic[-1,:])
    
    class net_income_margin(CustomFactor):
        inputs = [Fundamentals.net_margin]
        window_length = 1
        
        def compute(self, today, assets, out, net_margin):
            out[:] = preprocess(net_margin[-1,:])
    
    class operating_cashflows_to_assets(CustomFactor):
        inputs = [Fundamentals.operating_cash_flow, Fundamentals.total_assets]
        window_length = 1
        
        def compute(self, today, assets, out, operating_cash_flow, total_assets):
            out[:] = preprocess((operating_cash_flow[-1,:] * 4) / total_assets[-1,:])
    
    class price_momentum_3m(CustomFactor):
        inputs = [Returns(window_length=63)]
        window_length = 1
        
        def compute(self, today, assets, out, returns):
            out[:] = preprocess(returns[-1,:])
    
    class returns_39w(CustomFactor):
        inputs = [Returns(window_length=215)]
        window_length = 1
        
        def compute(self, today, assets, out, returns):
            out[:] = preprocess(returns[-1,:])
            
    # class MACD_Signal(CustomFactor):
    #     inputs = [MACDSignal]
    #     window_length = 1
        
    #     def compute(self, today, assets, out, macdsignal):
    #         out[:] = preprocess(macdsignal[-1,:])

    class MACD(CustomFactor):
        inputs = [USEquityPricing.close]
        window_length = 60

        def ema(self, data, window):
            # seed the EMA with the trailing SMA, then iterate forward
            c = 2.0 / (window + 1)
            ema = np.mean(data[-(2*window)+1:-window+1], axis=0)
            for value in data[-window+1:]:
                ema = (c * value) + ((1 - c) * ema)
            return ema

        def compute(self, today, assets, out, close):
            close = nanfill(close)  # nanfill is a helper defined elsewhere in the algo
            macd_line = self.ema(close, 12) - self.ema(close, 26)  # fast EMA minus slow EMA
            # rebuild the last 15 values of the MACD line, oldest first,
            # so the signal line (9-period EMA of the MACD line) can be computed
            macd = [macd_line]
            for i in range(1, 15):
                macd.insert(0, self.ema(close[:-i], 12) - self.ema(close[:-i], 26))
            signal = self.ema(macd, 9)
            out[:] = macd_line - signal
        
    return {
        'Asset Growth 3M'               : asset_growth_3m,
        'Asset to Equity Ratio'         : asset_to_equity_ratio,
        'Capex to Cashflows'            : capex_to_cashflows,
        'EBIT to Assets'                : ebita_to_assets,
        'EBITDA Yield'                  : ebitda_yield,
        'Mean Reversion 1M'             : MeanReversion1M,
        'Moneyflow Volume 5D'           : MoneyflowVolume5d,
        'Net Income Margin'             : net_income_margin,
        'Operating Cashflows to Assets' : operating_cashflows_to_assets,
        'Price Momentum 3M'             : price_momentum_3m,
        'Price Oscillator'              : PriceOscillator,
        'Return on Invest Capital'      : return_on_total_invest_capital,
        '39 Week Returns'               : returns_39w,
        'Trendline'                     : Trendline,
        'Volatility 3m'                 : Volatility3M,
        'Advanced Momentum'             : AdvancedMomentum,
        'MACD'                          : MACD,
        #'MACD Signal Line'              : MACD_Signal,
    }
    
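For reference, the hand-rolled `ema` helper used by the MACD factor above can be exercised on its own: it seeds the EMA with a trailing SMA and then folds in the most recent window − 1 observations. On a constant price series the EMA should equal that constant (standalone sketch; the 60-day, 3-asset array is made up):

```python
import numpy as np

def ema(data, window):
    # seed the EMA with the SMA of the preceding `window` values,
    # then fold in the most recent `window - 1` observations
    c = 2.0 / (window + 1)
    ema = np.mean(data[-(2 * window) + 1:-window + 1], axis=0)
    for value in data[-window + 1:]:
        ema = (c * value) + ((1 - c) * ema)
    return ema

prices = np.full((60, 3), 100.0)  # 60 days, 3 assets, constant price
print(ema(prices, 12))            # a constant series has an EMA equal to that constant
```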

Thanks, I'll take a look when I get the chance.

@Grant Catching up a bit, what is the purpose of preprocess? Can you give an example?

I think you pasted the wrong link? :)

Sorry about that. I corrected the link above, and here it is:

https://www.quantopian.com/posts/alpha-combination-via-clustering#5d0c93ccdcf6b7004165d874

Thanks.

Regarding the framework itself, it seems it would be beneficial to have:

  • the ability to assign different weights to different factors
  • the ability to use different factors for the long and short portions

I have not seen frameworks generally do this.

Thanks Vladimir,

Having different weights for different factors is straightforward, and I've done it. One way is to return a tuple in the make_features() function above, which contains both the factor and its weight.
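For illustration, a minimal numpy sketch of that kind of weighted combination, with each entry holding an (alpha vector, weight) pair (the factor names, alpha values, and three-asset universe are made up):

```python
import numpy as np

# hypothetical normalized alpha vectors for three assets
factors = {
    'momentum':  (np.array([ 1.0, -0.5, -0.5]), 2.0),  # (alpha, weight)
    'valuation': (np.array([-1.0,  0.5,  0.5]), 1.0),
}

# weighted sum of the alpha vectors, renormalized to unit gross exposure
combined = sum(w * a for a, w in factors.values())
combined /= np.abs(combined).sum()
print(combined)
```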

Long-only and short-only (or biased long or short) factors would take some thinking on how to do the normalization prior to combination. If you know how to do this, please share your technique.

I've implemented it outside the Quantopian contest algo structure (which is pretty easy), but not within it, which is why I was asking.

Doing a simple .rank() for the long factor and a negative .rank() for the short factor, and then combining them, might work?

Kind of an interesting topic that I hadn't considered. I suppose that if one could find enough long and short factors, they could be combined, and would net out to no exposure, without de-meaning the alpha vectors prior to combination.
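A minimal pandas sketch of that rank-based combination (the tickers and factor values are hypothetical); de-meaning the sum nets the combined alpha out to roughly zero exposure:

```python
import pandas as pd

# hypothetical per-asset factor values
long_factor  = pd.Series({'AAA': 0.9, 'BBB': 0.2, 'CCC': 0.5, 'DDD': 0.7})
short_factor = pd.Series({'AAA': 0.1, 'BBB': 0.8, 'CCC': 0.3, 'DDD': 0.6})

# rank each factor, negate the short side, then demean the sum
combined = long_factor.rank() - short_factor.rank()
combined -= combined.mean()
print(combined)
```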

hi Grant,

in the unlikely event (though quite likely for fundamental data) that there is an outlier, say 70000 (units) while the true mean is 1 (unit), then in "a = np.nan_to_num((a - np.nanmean(a)))" the np.nanmean(a) term might come out at, say, 25 (units) rather than the true mean of 1, so "a - 25" pushes most of the values down to around -24. That is fine so far (we rescale at the end), but we then replace NaN with 0 via np.nan_to_num, so the NaN names land well to the right of the distribution and look like good alpha names. These NaN names are more likely than not to stay as good alpha numbers through the rest of the preprocessing, despite the subsequent winsorization and final scaling.

i guess ideally the winsorization should be done before the first normalization, so that the first normalized numbers are more likely centred around zero; then it is fine to assign zero to the NaN names. Note, though, that scipy's winsorize method doesn't work well with NaNs in the array, so your current approach of replacing NaN with zero is right in the sense that it ensures the subsequent winsorize call will work.

actually the best way to do the whole preprocessing is to ignore the NaNs, then winsorize, then normalize, then throw the NaNs back into the distribution as zeros.
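A standalone sketch of that ordering (winsorize and z-score the finite values only, then re-insert NaNs as zero at the very end); the helper name and the 5% winsorization limit are illustrative choices, not code from the algo:

```python
import numpy as np
from scipy.stats.mstats import winsorize

def preprocess_nan_aware(a, limits=0.05):
    # winsorize and z-score only the finite values, then put NaNs
    # back as zero (factor-neutral) as the last step
    a = np.asarray(a, dtype=np.float64).copy()
    finite = np.isfinite(a)
    vals = winsorize(a[finite], limits=[limits, limits]).data
    vals = (vals - vals.mean()) / vals.std()
    a[finite] = vals
    a[~finite] = 0.0
    return a

# a toy vector with one huge outlier and one missing value
raw = np.concatenate([np.arange(20.0), [70000.0, np.nan]])
clean = preprocess_nan_aware(raw)
print(clean[-1])   # the NaN name lands at exactly 0, not in the right tail
```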

Thanks Zicai Feng -

You raise a good point! In the limit of large outliers that skew the distribution, the NaNs don't end up where one would want them: at zero, where zero means the factor predicts neither long nor short. As you point out, with the code I shared above, the NaNs effectively inherit the skew, which is not desirable.

Should I get back into coding on Q, I'll have to fix this little problem.