Question about a CustomFactor look-back period

Hey all, I've cannibalized the following code to suit my desire to test the alpha surrounding the SMI factor, but I am unsure of how exactly to analyze the look-back period of the custom factor or of the individual metrics themselves.

class SMI(CustomFactor):
    inputs = [USEquityPricing.close, USEquityPricing.high, USEquityPricing.low]
    def compute(self, today, assets, out, close, high, low):
        maxi = talib.MAX(high, timeperiod = 8)
        mini = talib.MIN(low, timeperiod = 8)
        center = (maxi+mini)*.5
        c = (close[-1] - center)
        H1 = talib.EMA(c, timeperiod = 3)
        H2 = talib.EMA(H1, timeperiod = 3)
        D = talib.EMA((maxi-mini), timeperiod = 8)
        D1 = talib.EMA(D, timeperiod = 3)
        D2 = .5*talib.EMA(D1, timeperiod = 3)
        SMI = (H2/D2)
        SMI_signal = talib.EMA(SMI, timeperiod = 3)


What I'd like to do is to follow the general guidelines posted here: https://www.youtube.com/watch?v=dDFewKqNDfU
with the initial look-back period to be 8 days, followed by two 3-day EMA smoothing periods for each of D and H, and then an 8-day EMA smoothing period of the SMI as a signal. All that being said, if the data set were to start on, say, day 1, there shouldn't be any signal until day 19 (7 days until the first H, 2 days until the first H1, 2 days until the first H2, repeat the process for D so no new days there, then lastly 8 days until the first SMI signal).

I'm not confident my current code is doing this, however. Rather, I think each time period acts independently and pulls its own data, i.e. instead of starting on day 1 (current day -19), I'm just starting on current day -8. That would make the whole calculation incorrect. On that note, is there a way I could give this code a look-back period of 100+ days and just pull the most recent SMI and SMI signal? The way this algebra works, the more data available in the look-back, the higher the resolution, i.e. 100 days of data >>> 19 days of data.
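To make the warm-up arithmetic concrete, here is a minimal NumPy sketch. It does not call talib: `rolling_max`, `ema`, and `first_valid_bar` are hypothetical stand-ins, and the sketch assumes each stage emits its first value as soon as it has `n` valid inputs (the EMA here is seeded with a simple average, which is an assumption about the implementation). The prices are made up.

```python
import numpy as np

def rolling_max(x, n):
    """Rolling maximum; the first n-1 outputs are NaN."""
    out = np.full(len(x), np.nan)
    for i in range(n - 1, len(x)):
        out[i] = x[i - n + 1:i + 1].max()
    return out

def ema(x, n):
    """EMA seeded with the mean of the first n valid inputs (leading NaNs only)."""
    out = np.full(len(x), np.nan)
    valid = np.flatnonzero(~np.isnan(x))
    if len(valid) < n:
        return out                      # not enough data: all NaN
    first = valid[0]
    seed = first + n - 1                # index of the first EMA value
    out[seed] = x[first:first + n].mean()
    k = 2.0 / (n + 1)
    for i in range(seed + 1, len(x)):
        out[i] = k * x[i] + (1 - k) * out[i - 1]
    return out

def first_valid_bar(series):
    """1-based position of the first non-NaN value, or None if all NaN."""
    valid = np.flatnonzero(~np.isnan(series))
    return int(valid[0]) + 1 if len(valid) else None

rng = np.random.default_rng(0)
high = 100 + rng.standard_normal(100).cumsum()   # 100 bars of made-up prices

stage = rolling_max(high, 8)      # 8-bar look-back
stage = ema(ema(stage, 3), 3)     # two 3-bar smoothings
signal = ema(stage, 8)            # 8-bar signal line

print(first_valid_bar(signal))    # 7 + 2 + 2 + 7 NaNs, so the first value is on bar 19
```

Under these assumptions the windows chain rather than act independently: each stage can only start once the previous one has produced enough values. With 100 bars in the window, only the last element of each array is needed for the most recent reading, and the extra history gives the recursive EMA room to settle instead of leaning on its seed.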

Thanks in advance! Let me know if you need any additional info.

17 responses

Anyone? Let me know if more clarification is needed.

# Stochastic Momentum Index (SMI) Indicator
# The Stochastic Momentum Index (SMI) was introduced by William Blau in 1993

import talib
# -------------------------------------------------
stock, period, smooth, sig = symbol('SPY'), 8, 3, 8
# -------------------------------------------------
def initialize(context):
    schedule_function(Indicator, date_rules.every_day(), time_rules.market_open(minutes = 65))

def Indicator(context, data):
    bars = period + smooth + sig

    H = data.history(stock, 'high', bars, '1d')
    L = data.history(stock, 'low', bars, '1d')
    C = data.history(stock, 'close', bars, '1d')

    HH = talib.MAX(H, period)
    LL = talib.MIN(L, period)
    M = (HH + LL)*0.5
    D = (C - M)
    HL = HH - LL
    Dema_D = talib.DEMA(D, smooth)
    Dema_HL = talib.DEMA(HL, smooth)
    SMI = 2*Dema_D/Dema_HL
    SMI_signal = talib.EMA(SMI, sig)

    record(SMI = SMI[-1], SMI_signal = SMI_signal[-1])


Vladimir, thanks a ton! Very helpful. I didn't have to edit your version of the code much to get it to what I wanted, but I attached below nonetheless!

# Stochastic Momentum Index (SMI) Indicator
# The Stochastic Momentum Index (SMI) was introduced by William Blau in 1993

import talib
# -----------------------------------------
stock, period, smooth = symbol('UPRO'), 8, 3
# -----------------------------------------
def initialize(context):
    schedule_function(Indicator, date_rules.every_day(), time_rules.market_open(minutes = 65))

def Indicator(context, data):
    bars = 2*period + 2*smooth

    H = data.history(stock, 'high', bars, '1d')
    L = data.history(stock, 'low', bars, '1d')
    C = data.history(stock, 'close', bars, '1d')

    hh = talib.MAX(H, period)
    ll = talib.MIN(L, period)
    m = (hh + ll)*0.5
    center = (C - m)
    H1 = talib.DEMA(center, smooth) # DEMA = double exp
    D = (hh - ll)
    D1 = talib.DEMA(D, smooth) # DEMA = double exp
    SMI = (H1/D1)
    SMI_signal = talib.EMA(SMI, period)

    record(SMI = SMI[-1], SMI_signal = SMI_signal[-1])


Currently trying to make this into a custom factor, this is what I've got so far but it isn't working...

class SMI(CustomFactor):

    # Pre-declare inputs and window_length
    inputs = [USEquityPricing.close, USEquityPricing.high, USEquityPricing.low]
    window_length = 100
    def compute(self, today, assets, out, close, high, low):
        table = pd.DataFrame(index=assets)
        H = high.history(assets, 'high', 50, '1d')
        L = low.history(assets, 'low', 50, '1d')
        C = close.history(assets, 'close', 50, '1d')
        hh = talib.MAX(H, 8)
        ll = talib.MIN(L, 8)
        m = (hh + ll)*0.5
        center = (C - m)
        H1 = talib.DEMA(center, 3) # DEMA = double exp
        D = (hh - ll)
        D1 = talib.DEMA(D, 3) # DEMA = double exp
        SMI_signal = (H1/D1)
        SMI = talib.EMA(SMI_signal, 8)
        SMI_diff = (SMI_signal-SMI)
        table ["SMI_diff"] = SMI_diff[-1]
        out[:] = table.fillna(table.max()).mean(axis=1)


Error message is something along the lines of numpy not allowing the use of .history. What'd be the easiest way to work around the data.history deprecation and this error?

This is the error code that I get. Updated custom factor is in attached notebook.

AttributeError: 'numpy.ndarray' object has no attribute 'history'


Any and all help is greatly appreciated!


Still having trouble getting this set up as a custom factor.

I will attach the algorithm and backtest tonight to make it easier to get assistance. Until then, and as always, all help is greatly appreciated!

This is the working independent code as well as the custom factor which is commented out. The custom factor is what I'm having trouble getting running correctly.

If you can possibly help I would greatly appreciate it! Thanks in advance!

# Stochastic Momentum Index (SMI) Indicator
# The Stochastic Momentum Index (SMI) was introduced by William Blau in 1993
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import CustomFactor
import talib
# -----------------------------------------
stock, period, smooth = symbol('UPRO'), 8, 3
# -----------------------------------------
def initialize(context):

    schedule_function(Indicator, date_rules.every_day(), time_rules.market_open(minutes = 65))

    context.stocks = symbols('UPRO', 'TMF')

def Indicator(context, data):
    bars = 2*period + 2*smooth

    H = data.history(stock, 'high', bars, '1d')
    L = data.history(stock, 'low', bars, '1d')
    C = data.history(stock, 'close', bars, '1d')

    hh = talib.MAX(H, period)
    ll = talib.MIN(L, period)

    m = (hh + ll)*0.5
    center = (C - m)
    H1 = talib.DEMA(center, smooth) # DEMA = double exp

    D = (hh - ll)
    D1 = talib.DEMA(D, smooth) # DEMA = double exp

    SMI_signal = (H1/D1)
    SMI = talib.EMA(SMI_signal, period)
    SMI_diff = (SMI_signal - SMI)

    record(SMI = SMI[-1], SMI_signal = SMI_signal[-1])
    record(SMI_diff = SMI_diff[-1])

#class SMI(CustomFactor):

# Pre-declare inputs and window_length
#    inputs = [USEquityPricing.close, USEquityPricing.high, USEquityPricing.low]
#    window_length = 100

#    def compute(self, today, assets, out, close, high, low):
#        table = pd.DataFrame(index=assets)
#        H = high.history(assets, 'high', 50, '1d')
#        L = low.history(assets, 'low', 50, '1d')
#        C = close.history(assets, 'close', 50, '1d')
#        hh = talib.MAX(H, 8)
#        ll = talib.MIN(L, 8)
#        m = (hh + ll)*0.5
#        center = (C - m)
#        H1 = talib.DEMA(center, 3) # DEMA = double exp
#        D = (hh - ll)
#        D1 = talib.DEMA(D, 3) # DEMA = double exp
#        SMI_signal = (H1/D1)
#        SMI = talib.EMA(SMI_signal, 8)
#        SMI_diff = (SMI_signal-SMI)
#        table ["SMI_diff"] = SMI_diff[-1]
#        out[:] = table.fillna(table.max()).mean(axis=1)
There was a runtime error.

There are two issues with the custom factor.

First, there is no reason to fetch any data with the 'history' method. That data is already passed to the compute function in the parameters high, low, and close. The columns are the securities and the rows are dates (the last row is yesterday's data).

# no need for this
H = high.history(assets, 'high', 22, '1d')
L = low.history(assets, 'low', 22, '1d')
C = close.history(assets, 'close', 22, '1d')

# high, low, and close are 2D numpy arrays already with the data
H = high
L = low
C = close



Second, the talib functions generally expect 1D arrays with data for a single security. The high, low, and close (or H, L, C) are 2D arrays with data for multiple securities. Unfortunately, one needs to iterate over each column of these arrays and then pass the column to the talib function. Take a look at these posts for incorporating talib methods into custom factors. https://www.quantopian.com/posts/using-ta-lib-functions-in-pipeline or https://www.quantopian.com/posts/having-difficulty-with-macd-and-custom-factors-in-the-notebook-pipeline.
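The column loop described above might look like the following sketch. `last_rolling_max` is a hypothetical 1D stand-in for a talib call, and `columnwise_last` is an illustrative helper name, not part of Quantopian or talib; the data is made up.

```python
import numpy as np

def last_rolling_max(x, n):
    """Stand-in for a 1D indicator: max of the last n values of one security."""
    return x[-n:].max()

def columnwise_last(window, func, *args):
    """Apply a 1D function to each column (security) of a 2D window."""
    n_assets = window.shape[1]
    out = np.full(n_assets, np.nan)
    for col_ix in range(n_assets):
        column = window[:, col_ix]          # 1D history for one security
        if np.isnan(column).any():
            continue                        # leave NaN for incomplete data
        out[col_ix] = func(column, *args)
    return out

# rows are dates (oldest first), columns are securities
high = np.array([[10.0, 50.0, 7.0],
                 [11.0, 49.0, np.nan],
                 [12.0, 48.0, 9.0],
                 [ 9.0, 51.0, 8.0]])

result = columnwise_last(high, last_rolling_max, 3)
print(result)
```

Inside `compute`, the same per-column results would be written into `out` instead of a freshly allocated array.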

Dan, first off I appreciate your feedback and direction!

I fiddled with the code quite a bit and I think I have the structure somewhat lined out but I must admit, I'm not exactly sure what needs '[:, col_ix]'-ing and what doesn't. I've attached an updated notebook with the associated error:

IndexError                                Traceback (most recent call last)
<ipython-input-8-5e1955e5efa1> in <module>()
10 )
11
---> 12 result = run_pipeline(p, '2014', '2014-03')

/build/src/qexec_repo/qexec/research/api.py in run_pipeline(pipeline, start_date, end_date, chunksize)
479             start_date,
480             end_date,
--> 481             chunksize
482         )
483

/build/src/qexec_repo/qexec/research/_api.pyc in inner_run_pipeline(engine, equity_trading_days, pipeline, start_date, end_date, chunksize)
--> 864         chunksize=chunksize,
865     )
866

/build/src/qexec_repo/zipline_repo/zipline/pipeline/engine.pyc in run_chunked_pipeline(self, pipeline, start_date, end_date, chunksize)
328             chunksize,
329         )
--> 330         chunks = [self.run_pipeline(pipeline, s, e) for s, e in ranges]
331
332         if len(chunks) == 1:

/build/src/qexec_repo/zipline_repo/zipline/pipeline/engine.pyc in run_pipeline(self, pipeline, start_date, end_date)
309             dates,
310             assets,
--> 311             initial_workspace,
312         )
313

/build/src/qexec_repo/zipline_repo/zipline/pipeline/engine.pyc in compute_chunk(self, graph, dates, assets, initial_workspace)
536                     assets,
538                 )
539                 if term.ndim == 2:

/build/src/qexec_repo/zipline_repo/zipline/pipeline/mixins.pyc in _compute(self, windows, dates, assets, mask)
213
--> 214                 compute(date, masked_assets, out_row, *inputs, **params)
216         return out

<ipython-input-7-8dcf380e20fd> in compute(self, today, assets, out, high, low, close)
47             C = close[:, col_ix]
48
---> 49             hh = talib.MAX(H[:, col_ix], 8)
50             ll = talib.MIN(L[:, col_ix], 8)
51

IndexError: too many indices for array


Can you advise as to where I'm still going wrong?


Hi Joe,

You are correctly indexing high/low/close to get the data you need for each loop. Once you unpack the columns you need into H/L/C (using [:, col_ix] gives you a column from high/low/close), you have 1D arrays.

The error you are getting happens because you are using two indices with a 1D array. For example:

hh = talib.MAX(H[:, col_ix], 8)


Should be:

hh = talib.MAX(H, 8)


The same applies to L and C, and all results you get from talib functions (hh, ll, center, etc).
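A small NumPy example, with made-up numbers, reproduces the error outside of the pipeline. The variable names mirror the ones in the traceback; the `failed` flag is only there for illustration.

```python
import numpy as np

high = np.arange(12.0).reshape(4, 3)   # 4 dates x 3 securities (made-up data)
col_ix = 1

H = high[:, col_ix]                    # one column -> 1D array of shape (4,)
print(H.ndim, H.shape)

try:
    H[:, col_ix]                       # a second, column-style index on a 1D array
    failed = False
except IndexError as err:
    failed = True
    print("IndexError:", err)
```

Once a column has been pulled out of the 2D input, every later step works on 1D arrays and needs no further column indexing.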


Ernesto,

Thanks for the advice! I've made the necessary changes, but was having trouble trying to get multiple results to output (SMI, SMI diff, SMI signal) so I tried to output just SMI and there appears to be nothing coming through. I'm sure it has something to do with the out[..] as that's not typically how I output information but 1) the whole col_ix throws a wrench into how I know to output stuff from a custom factor and 2) I typically only output one factor from a custom factor.


Can anyone assist with this? Can't quite seem to figure it out.

Increase the window_length from 14 to 20. Another fix is to change the line in the custom factor


SMI = talib.EMA(SMI_signal, 8)



to be

     SMI = talib.EMA(SMI_signal, 3)



There are only 3 non-NaN data points in SMI_signal to average. When it tries to average 8 (3 plus 5 NaNs) it always returns NaN. The fix is either to input more data points (i.e. increase the window_length) or to decrease the number of points being averaged (i.e. decrease the EMA length). I'm not sure whether this changes the intent of your factor, but it at least returns an output.
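The count of usable points can be checked with a few lines of arithmetic. This sketch assumes the usual TA-Lib look-backs with default settings (`MAX(n)` and `EMA(n)` lose `n - 1` points, `DEMA(n)` loses `2 * (n - 1)`); `valid_points` is a hypothetical helper, not a library function.

```python
def valid_points(window_length, lookbacks):
    """Number of non-NaN values left after chaining stages with the given look-backs."""
    return max(window_length - sum(lookbacks), 0)

max_lb = 8 - 1           # talib.MAX(H, 8)
dema_lb = 2 * (3 - 1)    # talib.DEMA(x, 3)
ema_len = 8              # talib.EMA(SMI_signal, 8) needs 8 valid inputs

for window_length in (14, 19, 20):
    n = valid_points(window_length, [max_lb, dema_lb])
    # columns: window_length, valid SMI_signal points, enough for an 8-period EMA?
    print(window_length, n, n >= ema_len)
```

Under these assumptions a window of 14 leaves 3 valid points, and 19 is the smallest window that feeds 8 valid points into the final EMA, which is consistent with the suggestion to raise the window_length to 20.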

Also try forward filling the nans.

That would look like this

def compute(self, today, assets, out, high, low, close):
    high  = nanfill(high)
    low   = nanfill(low)
    close = nanfill(close)


If I change the 8 to a 3 it will not compute correctly, as I'm attempting to get an 8,3,3,8 EMA Stochastic MTM Oscillator as seen on Yahoo Finance Indicators.

Would that just be because I'm only calling out one date, but if I called for a range it would only return nans for the first 8 or so days, then every day after that the dataset would be populated?

Joe,

Try this 8-3-8

from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline import factors, filters, classifiers
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import CustomFactor
from quantopian.pipeline import Pipeline
import pandas as pd
import numpy as np
from numpy import isnan, nan
import talib
# -----------------------------------------------------------------------------------
stocks, period, smooth, sig  = filters.StaticAssets(symbols('QQQ', 'TLT')), 8, 3, 8
# -----------------------------------------------------------------------------------
bars = period + smooth + sig

def initialize(context):
    schedule_function(Indicator, date_rules.every_day(), time_rules.market_close(minutes = 1))
    m = stocks
    smi = SMI_factor(mask = m)  # compute the factor only for the static asset list
    pipe = Pipeline(columns = {'smi': smi}, screen = m & smi.notnull())
    attach_pipeline(pipe, 'smi_set')

def Indicator(context, data):
    stocks = pipeline_output('smi_set').index
    smi_pipe = pipeline_output('smi_set').smi
    smi_reg = np.zeros(len(stocks))

    for i, stock in enumerate(stocks):
        H = data.history(stock, 'high',  bars + 1, '1d')
        L = data.history(stock, 'low',   bars + 1, '1d')
        C = data.history(stock, 'close', bars + 1, '1d')

        HH = talib.MAX(H, period)
        LL = talib.MIN(L, period)
        M = (HH + LL)*0.5
        D = (C - M)
        HL = HH - LL
        Numer = talib.DEMA(D, smooth)
        Denom = talib.DEMA(HL, smooth)

        SMI = 2*Numer/Denom
        SMI_sig = talib.EMA(SMI, sig)
        SMI_diff = (SMI - SMI_sig)[-2]
        smi_reg[i] = SMI_diff

    for i, stock in enumerate(stocks):
        record(**{stocks[i].symbol + '_pipe': smi_pipe[i]})
        record(**{stocks[i].symbol + '_reg': smi_reg[i]})

def columnwise_anynan(array2d):
    return isnan(array2d).any(axis = 0)

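def nanfill(arr):
    # Forward-fill NaNs along axis 1 (left to right within each row)
    mask = np.isnan(arr)
    idx = np.where(~mask, np.arange(mask.shape[1]), 0)
    np.maximum.accumulate(idx,axis=1, out=idx)
    arr[mask] = arr[np.nonzero(mask)[0], idx[mask]]
    return arr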

class SMI_factor(CustomFactor):
    inputs = [USEquityPricing.high, USEquityPricing.low, USEquityPricing.close]
    window_length = bars
    def compute(self, today, assets, out, high, low, close):
        anynan = columnwise_anynan(high)
        for col_ix, have_nans in enumerate(anynan):
            if have_nans:
                out[col_ix] = nan
                continue
            H = high[:, col_ix]
            L = low[:, col_ix]
            C = close[:, col_ix]
            HH = talib.MAX(H, period)
            LL = talib.MIN(L, period)
            M = (HH + LL)*0.5
            D = (C - M)
            HL = HH - LL
            Numer = talib.DEMA(D, smooth)
            Denom = talib.DEMA(HL, smooth)
            SMI = 2*(Numer/Denom)
            SMI_sig = talib.EMA(SMI, sig)
            SMI_diff = (SMI - SMI_sig)
            results = SMI_diff[-1]     # SMI[-1], SMI_sig[-1], SMI_diff[-1]
            out[col_ix] = results


Vladimir, it works flawlessly, thank you so very much!

There are some discrepancies between the Quantopian numbers and the numbers from Yahoo, but that is to be expected with differing look-back values.

Again, thank you so much!