Linear regression without targets possible in pipeline?

I would like to run a linear regression, in the plain mathematical sense, on equity returns in a pipeline, similar to what scipy.stats.linregress does: get the slope, intercept, p-value, r-value, and standard error for each equity in the pipeline over the lookback window. In particular, I don't want to specify a target asset, which RollingLinearRegressionOfReturns and linear_regression seem to require. Note: I am not interested in the beta of a stock; I want the pure mathematical definition of linear regression on a sequence of values. Is that possible in a Quantopian pipeline?

scipy.stats.linregress(x, y=None)
Calculate a linear least-squares regression for two sets of measurements.

Parameters:
x, y : array_like
Two sets of measurements. Both arrays should have the same length. If only x is given (and y=None), then it must be a two-dimensional array where one dimension has length 2. The two sets of measurements are then found by splitting the array along the length-2 dimension.
Returns:
slope : float
slope of the regression line
intercept : float
intercept of the regression line
rvalue : float
correlation coefficient
pvalue : float
two-sided p-value for a hypothesis test whose null hypothesis is that the slope is zero.
stderr : float
Standard error of the estimated gradient.

9 responses

You still need two arrays: either you pass them as x and y, or you pass only x, in which case it has to be two-dimensional (so it amounts to the same thing).

Anyway I believe this is what you want:

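import numpy as np
from scipy import stats
from quantopian.pipeline import CustomFactor
from quantopian.pipeline.data.builtin import USEquityPricing

def _slope(ts, x=None):
    """
    Input: one asset's time series (1-D array).
    Output: slope of the least-squares line of ts against x.
    """
    if x is None:
        # Default regressor: the time index 0, 1, ..., len(ts) - 1
        x = np.arange(len(ts))
    slope, intercept, r_value, p_value, std_err = stats.linregress(x, ts)
    return slope

class Linregress(CustomFactor):
    inputs = [USEquityPricing.close]
    params = ('exclude_days',)

    def compute(self, today, assets, out, close, exclude_days):
        # close[:-0] is an empty slice, so only trim when exclude_days > 0
        ts = close[:-exclude_days] if exclude_days else close
        x = np.arange(len(ts))
        # Fit each column (one asset) against the shared time index
        out[:] = np.apply_along_axis(_slope, 0, ts, x)

slope = Linregress(mask=universe, window_length=30, exclude_days=0)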

EDIT: The default input is the close price, but you can pass any input factor you like:

from quantopian.pipeline.factors import Returns

returns = Returns(window_length=2, mask=universe)
slope = Linregress(inputs=[returns], mask=universe, window_length=30, exclude_days=0)
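
The factor above keeps only the slope, while the original question also asked for the intercept, r-value, and standard error. As a rough sketch of where those numbers come from when the regressor is just the time index, here is a plain-Python version for a single series. The helper name linregress_vs_time is made up for illustration and is not part of scipy or the Quantopian API; the p-value is left out because it needs a t-distribution, which scipy.stats.linregress takes care of.

```python
import math

def linregress_vs_time(values):
    """Least-squares fit of values against their index 0, 1, ..., n - 1.

    Returns (slope, intercept, r_value, std_err).
    Hypothetical helper for illustration only.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 points for a standard error")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    # Centered sums of squares and cross-products
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # A constant series has no variance, so report r = 0
    r_value = sxy / math.sqrt(sxx * syy) if syy > 0 else 0.0
    # Standard error of the slope, with n - 2 degrees of freedom
    std_err = math.sqrt(max(0.0, 1 - r_value ** 2) * syy / sxx / (n - 2))
    return slope, intercept, r_value, std_err

slope, intercept, r_value, std_err = linregress_vs_time([1.0, 3.0, 2.0, 5.0, 4.0])
print(slope, intercept, r_value, std_err)
```

For real work it is probably simpler to keep calling stats.linregress and return whichever fields you need.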

Thanks Luca. Exactly what I wanted.

I am trying to use this code in research, but it raises ValueError: Inputs must not be empty. I am probably missing something obvious, but would really appreciate a pointer.
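
One possible cause, assuming the factor was instantiated with exclude_days=0 and slices its input as close[:-exclude_days]: in Python, -0 is just 0, so the slice [:-0] returns nothing, and linregress then receives empty inputs. The sketch below uses plain lists in place of the NumPy window (NumPy slices behave the same way here); trim_tail is a hypothetical helper name.

```python
# A stand-in for the (window_length x n_assets) input: three days, two assets
window = [[10.0, 20.0], [11.0, 19.0], [12.0, 18.0]]

def trim_tail(rows, exclude_days):
    """Drop the most recent exclude_days rows; keep everything when it is 0."""
    return rows[:-exclude_days] if exclude_days else rows

naive = window[:-0]             # same as window[:0]: an empty list
safe = trim_tail(window, 0)     # all three rows
trimmed = trim_tail(window, 1)  # the first two rows

print(len(naive), len(safe), len(trimmed))
```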

I have been working on getting a regression calculation working in Pipeline and have also been having some difficulty. I was also having the empty inputs issue mentioned above, but have been able to overcome the issue in the attached implementation. Now the problem I am having is that it is not returning separate values for each individual asset. It works when running individual assets, but repeats values when multiple assets are used. Any thoughts?

The code I added to your notebook returns values; you'll want to verify that the slopes are accurate.

Here's another option using NumPy's apply_along_axis method. I'm not sure what the reasoning behind 'annualising' the slope is, so I omitted it. The slopes seem right. Multiplying the slope by 252 should give the annual price increase: for SPY that's 0.15 * 252 = 37.8, which is in line with the increase from a year ago.

Thanks for your suggestions on this, which are very helpful. The reason for annualizing the slope is simply that an annualized return figure is easier for me to interpret. In this version I converted prices to log values to normalize the slope, rather than getting the slope of the price in dollars. I also added back the annualization and the r_value penalty, which is intended to penalize volatility. This now seems to be working as intended.
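
The notebook itself is not reproduced here, so the following is only a guess at the kind of calculation described: fit a line to log prices, turn the daily slope into an annual return, and scale it by the fit quality. Multiplying by r squared is one common way to apply such a penalty and is an assumption, as are the 252-day year and the function name annualized_log_slope.

```python
import math

TRADING_DAYS = 252  # assumed number of trading days per year

def annualized_log_slope(prices):
    """Return (annualized_return, r_squared, score) for one price series."""
    n = len(prices)
    logs = [math.log(p) for p in prices]
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in logs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, logs))

    slope = sxy / sxx  # average daily log return implied by the fit
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    annualized = math.exp(slope * TRADING_DAYS) - 1
    # Assumed penalty: a noisy series has a lower r_squared, hence a lower score
    return annualized, r_squared, annualized * r_squared

# Steady 0.1% daily growth: a perfect fit, so the penalty leaves the score unchanged
steady = [100 * 1.001 ** i for i in range(90)]
annualized, r_squared, score = annualized_log_slope(steady)
print(annualized, r_squared, score)
```

Because the slope is taken on log prices, it is a growth rate rather than a dollar amount, which makes it comparable across assets with very different price levels.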

In my efforts to get this working last week I came across a custom factor at: https://github.com/quantopian/algorithm-component-library/blob/master/factors_project/factors_all.py

I have tested it against other versions and found that it runs quite a bit faster. May be useful to someone...

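import numpy as np
from quantopian.pipeline import CustomFactor
from quantopian.pipeline.data.builtin import USEquityPricing

class Trendline(CustomFactor):
    inputs = [USEquityPricing.close]
    window_length = 252

    def compute(self, today, assets, out, close):
        # Days elapsed: the regressor shared by every security
        days = np.arange(self.window_length)

        # Slope for each security
        secs = []
        for col in close.T:
            # Least-squares slope = cov(price, days) / var(days)
            col_cov = np.cov(col, days)
            secs.append(col_cov[0, 1] / col_cov[1, 1])
        out[:] = secs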
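
The loop above relies on the identity slope = cov(price, days) / var(days). As a sketch of the same idea without the Python loop, assuming a window with no NaNs (real pipeline windows can contain them), the slopes for all columns can be computed in one matrix product. slopes_vs_time is a hypothetical name, not part of the component library.

```python
import numpy as np

def slopes_vs_time(window):
    """Least-squares slope of each column of window against 0, 1, ..., n - 1."""
    window = np.asarray(window, dtype=float)
    days = np.arange(window.shape[0], dtype=float)
    days_centered = days - days.mean()
    # cov(col, days) / var(days); the 1 / (n - 1) factors cancel
    numerator = days_centered @ (window - window.mean(axis=0))
    return numerator / (days_centered @ days_centered)

# Two toy "assets" over five days: one rising by 2 a day, one falling by 0.5
t = np.arange(5)
window = np.column_stack([2.0 * t + 10.0, -0.5 * t + 3.0])
slopes = slopes_vs_time(window)

# Same result from the per-column covariance loop
loop_slopes = []
for col in window.T:
    c = np.cov(col, t)
    loop_slopes.append(c[0, 1] / c[1, 1])

print(slopes, loop_slopes)
```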

Thank you for the solutions, I was having similar issues.