Log-Normal Returns in Pipeline

Maybe this has already been solved, and I am fairly new to this, but I notice that the built-in factor Returns() actually returns the percent-change version of a return, (close - close[1])/close[1]. From general math and videos like this one (https://www.youtube.com/watch?v=PtoUlt3V0CI), I understand that simple returns are a poor fit for quant work because they do not add across time periods: multi-period simple returns have to be compounded (a cumulative product), whereas log-returns simply sum. I want to forecast the expected value and confidence interval of a future price from the drift rate and standard deviation of the log-returns, so any inaccuracy in the returns compounds significantly.
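
To make the difference concrete, here is a small sketch using made-up closing prices (the numbers are purely illustrative, not market data). It shows that log-returns sum to the total log-return over the period, while simple returns only match the total when compounded:

```python
import math

# Hypothetical closing prices, purely illustrative.
closes = [100.0, 110.0, 99.0, 108.9]

# Per-period simple (percent-change) returns and log-returns.
simple_returns = [(b - a) / a for a, b in zip(closes, closes[1:])]
log_returns = [math.log(b / a) for a, b in zip(closes, closes[1:])]

# Total return over the whole period, in each convention.
total_simple = closes[-1] / closes[0] - 1.0
total_log = math.log(closes[-1] / closes[0])

# Log-returns are additive across time.
sum_log = sum(log_returns)

# Simple returns are not additive; they have to be compounded.
sum_simple = sum(simple_returns)
compounded_simple = 1.0
for r in simple_returns:
    compounded_simple *= 1.0 + r
compounded_simple -= 1.0

print("sum of log-returns   :", sum_log, "vs total", total_log)
print("sum of simple returns:", sum_simple, "vs total", total_simple)
print("compounded simple    :", compounded_simple)
```

Averaging percent changes over a window therefore measures something different from averaging log-returns, which is why the choice matters for the drift estimate.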

I am trying to use a custom log-return factor (log_return = ln(close/close[1])) in my pipeline definition and pass its output into another custom factor. However, I keep getting "window safe" errors whenever I pass it through, and I haven't found a way around it yet. It just does not like my custom factor for the log-return. Perhaps it is a simple datatype issue and I'm just displaying my confusion, but I saw some documentation saying that Returns(), Zscore(), and a few other built-in factors are the only window-safe factors available. I would have thought a built-in log-return factor would already exist.

I have included my pipeline script below to illustrate the problem. I want to use the LogReturn() custom factor as an input (inputs=[today_log_return]) to all the custom factors that currently receive inputs=[today_return]. The custom factors choke on inputs=[today_log_return] but work just fine with inputs=[today_return]; that is pct_change, though, which is not what I want.
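
One possible workaround, sketched here in plain NumPy so it can be run outside the platform: instead of feeding one custom factor into another, a downstream factor could take USEquityPricing.close directly with window_length = t + 1 and derive the log-returns inside compute. The helper name log_returns_from_closes and the sample prices are hypothetical, and whether this suits the pipeline is an assumption rather than something the documentation confirms.

```python
import numpy as np

def log_returns_from_closes(close):
    """Daily log-returns from a (days x assets) array of closing prices.

    The result has one fewer row than `close`, so a window of t + 1
    closes yields t log-returns per asset.
    """
    return np.diff(np.log(close), axis=0)

# Hypothetical window: 4 days of closes for 2 assets, one price missing.
close = np.array([
    [100.0, 50.0],
    [110.0, np.nan],
    [99.0, 52.0],
    [108.9, 53.0],
])

log_ret = log_returns_from_closes(close)      # shape (3, 2)

# NaN-aware statistics per asset, as the factors in the script use.
mean_log_ret = np.nanmean(log_ret, axis=0)
std_log_ret = np.nanstd(log_ret, axis=0)

print(log_ret)
print(mean_log_ret, std_log_ret)
```

Inside a custom factor, the body of compute would call the same helper on its close argument before computing the mean and standard deviation.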

I appreciate any help you can give. Thanks!

"""This algorithm trades using the Student's t-test method."""

from quantopian.algorithm import attach_pipeline, pipeline_output

from quantopian.pipeline import Pipeline
from quantopian.research import run_pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import CustomFactor, Returns, SimpleMovingAverage, AverageDollarVolume
from quantopian.pipeline.filters.morningstar import Q1500US, Q500US
import numpy as np
import pandas as pd
import scipy.stats as stats
import math

t = 13 # Data aggregation period in days
T = 45 # Trade window in days for price expected value and confidence interval estimation
tc = 1.782 # T-statistic critical value for v = n-1 degrees of freedom, n = 13 days @ 95% Confidence Interval from http://www.itl.nist.gov/div898/handbook/eda/section3/eda3672.htm

class LogReturn(CustomFactor):
    # Default inputs
    inputs = [USEquityPricing.close]
    window_length = 2
    # Compute log return
    def compute(self, today, assets, out, close):
        out[:] = np.log(close[-1] / close[0])

class StdDev(CustomFactor):
    def compute(self, today, asset_ids, out, values):
        # Calculates the column-wise standard deviation, ignoring NaNs
        out[:] = np.nanstd(values, axis=0)

class Drift(CustomFactor):
    # Default window length
    window_length = t
    # Compute the drift rate
    def compute(self, today, asset_ids, out, values):
        mean_return = np.nanmean(values, axis=0)
        std_return = np.nanstd(values, axis=0)
        drift = mean_return/t + (std_return/np.sqrt(t))**2/2
        out[:] = drift

class Tstat(CustomFactor):
    # Default window length
    window_length = t
    # Compute the t-statistic of the drift rate
    def compute(self, today, asset_ids, out, values):
        mean_return = np.nanmean(values, axis=0)
        std_return = np.nanstd(values, axis=0)
        drift = mean_return/t + (std_return/np.sqrt(t))**2/2
        std_drift = np.nanstd(drift, axis=0)
        tstat = (drift-0)/(std_drift/np.sqrt(t))
        out[:] = tstat

class MeanDrift(CustomFactor):
    # Default window length
    window_length = t
    # Compute the drift rate
    def compute(self, today, asset_ids, out, values):
        mean_return = np.nanmean(values, axis=0)
        std_return = np.nanstd(values, axis=0)
        drift = mean_return/t + (std_return/np.sqrt(t))**2/2
        std_drift = np.nanstd(drift, axis=0)
        mean_drift = np.nanmean(values, axis=0)
        out[:] = mean_drift

class StdDrift(CustomFactor):
    # Default window length
    window_length = t
    # Compute the standard deviation of the drift rate
    def compute(self, today, asset_ids, out, values):
        mean_return = np.nanmean(values, axis=0)
        std_return = np.nanstd(values, axis=0)
        drift = mean_return/t + (std_return/np.sqrt(t))**2/2
        std_drift = np.nanstd(drift, axis=0)
        out[:] = std_drift

class ExpectedValue(CustomFactor):
    # Default inputs
    inputs = [USEquityPricing.close]
    window_length = 2
    # Compute the expected value of the price after T days
    def compute(self, today, asset_ids, out, close):
        logret = np.log(close[-1] / close[0])
        mean_return = np.nanmean(logret, axis=0)
        std_return = np.nanstd(logret, axis=0)
        drift = mean_return/t + (std_return/np.sqrt(t))**2/2
        std_drift = np.nanstd(drift, axis=0)
        expected_value = close[0]*np.exp((drift+std_drift**2/2)*T)
        out[:] = expected_value

def make_pipeline():

    # Base universe set to the Q500US
    base_universe = Q500US(minimum_market_cap=5000000000000000000)

    # Factor of latest close price.
    today_close = USEquityPricing.close.latest

    # Factor of today's return.
    today_return = Returns(inputs = [USEquityPricing.close], window_length=2)

    # Factor of today's log-return.
    today_log_return = LogReturn(inputs = [USEquityPricing.close], window_length=2)

    # Factor of the moving average return over time t.
    mean_return_t = SimpleMovingAverage(inputs=[today_return], window_length=t)

    # Factor of the standard deviation of the return over time t.
    std_return_t = StdDev(inputs=[today_return], window_length=t)

    # Factor of the drift rate over time t.
    today_drift_t = Drift(inputs=[today_return], window_length=t)

    # Factor of the student t-statistic of the drift over time t.
    today_tstat = Tstat(inputs=[today_return], window_length=t)

    mean_drift_t = MeanDrift(inputs=[today_return], window_length=t)

    std_drift_t = StdDrift(inputs=[today_return], window_length=t)

    #ev_T = ExpectedValue(inputs=[USEquityPricing.close], window_length=T)


    return Pipeline(
        columns = {
            'close': today_close,
            'return': today_return,
            'log_return': today_log_return,
            'mean_return': mean_return_t,
            'std_return': std_return_t,
            'drift': today_drift_t,
            'tstat': today_tstat,
            'mean_drift': mean_drift_t,
            'std_drift': std_drift_t,
            #'s_T': ev_T,
        },
        screen=(today_tstat > tc*3) & (today_close > 100)
    )

result = run_pipeline(make_pipeline(), '2017-05-25', '2017-08-25').dropna()
print(len(result))
result.tail(50)