New Correlation and Linear Regression Factors

Three new built-in factors have just been added to pipeline: RollingPearsonOfReturns, RollingSpearmanOfReturns, and RollingLinearRegressionOfReturns. These factors are a first pass at offering correlation and regression capabilities that are fast enough to use via pipeline. We have plans to offer a more generic implementation of these factors in the future, but would love your feedback on them. For more detailed information on how to use these factors, check out the documentation.
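
As a rough illustration of how the two correlation factors differ, here is a plain-Python sketch (not the pipeline implementation). Pearson measures linear association between two return series, while Spearman is Pearson applied to their ranks. The helper names and the data are made up for this example, and the rank helper assumes there are no tied values.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank of each value (1 = smallest). Assumes no ties, to keep the sketch short."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical returns: y moves with x monotonically but not linearly.
x = [0.01, 0.02, 0.03, 0.04, 0.05]
y = [v ** 2 for v in x]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
```

Because y rises whenever x rises, the rank correlation is perfect, while the Pearson value falls slightly short of 1 since the relationship is curved.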

The example algorithm below utilizes the "beta" output of RollingLinearRegressionOfReturns. RollingLinearRegressionOfReturns is a factor that computes multiple outputs, a feature that was released for custom factors a couple of weeks ago. This algorithm looks at the beta of high dollar-volume stocks with SPY, then longs the low-beta stocks and shorts the high-beta stocks. It also records/plots the rolling alpha, beta and correlation of AAPL with SPY to help better visualize what each output might look like.
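
To help picture the "alpha" and "beta" outputs, here is a minimal plain-Python sketch of a rolling ordinary least-squares fit of one asset's returns on a target's returns. It is an illustration under stated assumptions, not the factor's actual code: the function names, the window handling, and the data are all hypothetical.

```python
def regress(xs, ys):
    """OLS fit of ys = alpha + beta * xs; returns (alpha, beta)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    beta = cov / var
    alpha = my - beta * mx
    return alpha, beta

def rolling_regress(target, asset, regression_length):
    """One (alpha, beta) pair per day once a full window is available."""
    out = []
    for end in range(regression_length, len(target) + 1):
        window = slice(end - regression_length, end)
        out.append(regress(target[window], asset[window]))
    return out

# Made-up daily returns: the asset is built as 0.001 + 1.5 * target,
# so every window should recover alpha ~ 0.001 and beta ~ 1.5.
target_returns = [0.010, -0.004, 0.007, -0.012, 0.003, 0.008, -0.006]
asset_returns = [0.001 + 1.5 * r for r in target_returns]

results = rolling_regress(target_returns, asset_returns, regression_length=5)
```

With seven days of returns and a five-day window, this yields three (alpha, beta) pairs, one for each day on which a full window exists.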

[Attached backtest, ID 574876da7da3920f879333a8]

21 responses

This post should be marked as interesting.

This is incredibly good progress in the last few weeks on Q!

Can these be used to compute correlation to the cross-sectional mean return? So, rather than including SPY in the universe, you would just compare each stock's individual returns to the cross-sectional mean return. This could be market-cap weighted, but equal weighting seems to be common in papers.

I can see on GitHub that RollingLinearRegressionOfReturns uses Returns(window_length=returns_length) as its input. Does this mean we can pass factors as inputs to other factors at last?

The "rebalance" function in the example doesn't seem to ever close orphan positions from previous months, so not quite rebalancing.
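
One way to avoid orphan positions is to build the target weights from the union of what is currently held and what is newly selected, giving everything that dropped out an explicit zero target. The sketch below is plain Python with hypothetical names and tickers, and the equal 50/50 long/short split is an assumption for illustration. In an algorithm, each weight would typically be passed to something like order_target_percent.

```python
def target_weights(held, longs, shorts):
    """Map every asset that is held or wanted to its target portfolio weight."""
    weights = {}
    for asset in longs:
        weights[asset] = 0.5 / len(longs)
    for asset in shorts:
        weights[asset] = -0.5 / len(shorts)
    # Anything still held but no longer selected gets an explicit zero target,
    # so it is closed instead of lingering and inflating leverage.
    for asset in held:
        weights.setdefault(asset, 0.0)
    return weights

held = ["AAA", "BBB", "CCC"]  # hypothetical positions left over from last month
weights = target_weights(held, longs=["AAA", "DDD"], shorts=["EEE"])
gross_leverage = sum(abs(w) for w in weights.values())
```

Here BBB and CCC are no longer selected, so they receive a weight of zero and gross leverage stays at 1.0 after the rebalance.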

Here's a version of the above with a fixed rebalancing function, producing a more realistic backtest.

Quantopian: what about having a default maximum leverage (e.g. 3x, or the same as the contest condition) to catch gross bugs like this earlier, with an API routine to change the default for people who really want extreme leverage?

[Attached backtest, ID 574db5363ba6760f890a156b]

Awesome! Thank you, Q. This makes Pipeline much more appealing.

@Norbert, thanks for the mod as well. Saves a little time!

@Dan Unfortunately, what you are suggesting is not currently possible with these factors; they are specifically limited to computing correlations between the returns of individual stocks. As I mentioned above, however, we intend to implement a more generic version of these factors that would allow for more interesting computations such as what you have in mind, so stay tuned, and thanks for the feedback!

@Luca In general we still do not support the use of factors as inputs, but with the release of these new correlation/regression factors we have deemed that a select few factors can safely be used as inputs (such as Returns). The main difficulty of using other factors as inputs here is accounting for splits. For example, if we try to use SimpleMovingAverage as the input to our regression factor, and our target asset undergoes a split, the regression output would be distorted for any regressions computed over a window containing the date of the split. Factors that are comparable across splits, such as zscore and rank, are other examples (besides Returns) of factors that could be used as inputs.
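
The split problem can be pictured with a small plain-Python sketch. It assumes, for illustration only, that each day's factor value is computed from prices adjusted for the splits known as of that day; the helper names and the price series are hypothetical. A price-level factor such as a moving average then changes scale across the split, while a return does not.

```python
def view_as_of(raw_closes, split_day, ratio, day):
    """Closes for days 0..day as seen on `day`: prices before the split are
    divided by `ratio` only once the split has actually happened."""
    window = raw_closes[: day + 1]
    if day < split_day:
        return list(window)
    return [p / ratio if i < split_day else p for i, p in enumerate(window)]

def sma2(view):
    """Two-day simple moving average of the latest closes."""
    return (view[-1] + view[-2]) / 2.0

def daily_return(view):
    """One-day return from the two latest closes."""
    return view[-1] / view[-2] - 1.0

# Hypothetical stock with a 2-for-1 split taking effect on day 3.
raw = [100.0, 102.0, 104.0, 52.0, 53.0]
days = range(1, len(raw))
sma_series = {d: sma2(view_as_of(raw, 3, 2.0, d)) for d in days}
ret_series = {d: daily_return(view_as_of(raw, 3, 2.0, d)) for d in days}
```

The moving average goes from 103 on day 2 to 52 on day 3 even though nothing changed economically, so a regression over a window of those values would see a spurious jump. The day-3 return is zero in both views, which is why Returns is safe to use as an input.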

@Norbert Thanks for catching that. We have actually implemented a set_max_leverage method, but have yet to release it for use in the IDE. I made an internal feature request to try to get that out.

Thank you David, your explanation makes sense. I appreciate that.

@David,

I am trying to pinpoint which prices are used to calculate daily returns in this factor. This is probably obvious, but I just want to confirm that daily returns are calculated from the same open and close prices that I see in the backtester's data?

Thanks

Frank,

These built-in factors use close price to calculate Returns; this is consistent in both research and the backtester. For example, this can be seen here, where the Returns factor is created. Note that Returns uses a default input of close price. The built-ins are somewhat limited in scope as they restrict you from using, say, open price for Returns. In order to facilitate more generic computations, we implemented new correlation/regression methods for Factors, which will allow you to customize your Returns factor and even compute against different factors besides Returns. The post for that can be found here.

Thanks David.

So close[0] and close[-1] from USEquityPricing.close are what is used; that is what I was trying to understand.
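
In other words, the return over a window is the percent change from the first close in the window to the last. A tiny plain-Python sketch with made-up closes (the function name is hypothetical):

```python
def window_return(closes):
    """Percent change from the first to the last close in the window."""
    return (closes[-1] - closes[0]) / closes[0]

closes = [100.0, 101.0, 99.0, 103.0, 104.0]  # made-up daily closes

one_day = window_return(closes[-2:])   # window of 2 rows -> 1-day return
four_day = window_return(closes[-5:])  # window of 5 rows -> 4-day return
```

A window of N rows therefore spans N - 1 days of price change, which is why a window length of 2 corresponds to ordinary daily returns.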

Is it possible to perform a multiple regression?

Is it possible to use these factors in a research Notebook? I can't seem to get them to work in the research environment. Is there a trick for passing the target= variable in research?

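This worked for me:

    from quantopian.pipeline.factors import RollingLinearRegressionOfReturns

    # symbols() is the research-notebook helper for looking up an asset by ticker.
    regression_factor = RollingLinearRegressionOfReturns(
        target=symbols('SPY'),
        returns_length=10,
        regression_length=90,
    )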

Emerging from my shell to let all of you know that this function is only accurate when returns_length = 2. regression_length can be anything you wish, except that it must be 1 fewer than the number of trading days you are targeting.

A returns_length > 2 can make sense, but it depends on what you are looking for. Why do you say it's not accurate? Is there a bug in the factor code?

Code to clarify the apparent requirement of returns_length=2 for stock Beta values.

[Attached backtest, ID 580f0021cdc1ca12f38e47fe]

@blue, I didn't read the code carefully, but there are at least two things I noticed:
- You should replace .pct_change() with .pct_change(context.returns_length - 1) to get the same returns as the pipeline factor
- data.history should fetch enough days of data to leave context.regression_length returns after .pct_change, that is, context.regression_length + context.returns_length - 1 rows (then you have to drop the NaNs)
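
The arithmetic behind those two points can be sketched in plain Python. The names and values here are illustrative only; the list comprehension stands in for pct_change(returns_length - 1) followed by dropping the leading NaN rows.

```python
def multi_day_returns(closes, returns_length):
    """Return over (returns_length - 1) days, like pct_change(returns_length - 1)
    with the leading NaN rows already dropped."""
    lag = returns_length - 1
    return [closes[i] / closes[i - lag] - 1.0 for i in range(lag, len(closes))]

def rows_needed(regression_length, returns_length):
    """Closes to request so that `regression_length` returns survive."""
    return regression_length + returns_length - 1

returns_length, regression_length = 3, 4  # illustrative values
closes = [100.0, 101.0, 103.0, 102.0, 105.0, 107.0]
assert len(closes) == rows_needed(regression_length, returns_length)

returns = multi_day_returns(closes, returns_length)
```

Six closes with returns_length = 3 leave exactly four two-day returns, matching regression_length = 4.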

This should work with any returns_length value

[Attached backtest, ID 5814758993247a0f2b5c48c8]

Good. Now, since this is a linear regression, what can be done to make the algo's OLS slope match as well?

Beta using pandas:

    import pandas as pd

    def beta_sids(context, data):
        # Beta vs. SPY from 252 daily closes: cov(stock, SPY) / var(SPY).
        spy = sid(8554)
        closes = data.history(list(context.stocks) + [spy], 'close', 252, '1d')
        changes = closes.ffill().bfill().pct_change()
        return pd.Series({s: changes[s].cov(changes[spy]) / changes[spy].var() for s in changes})

Edit 2017-06-18
It has dawned on me since then that my point above about linear regression beta being purely a slope covers just one case, where one of the two sets being compared is a fixed sequence, like time or a counter.

When two sets of varying data are used as inputs to a linear regression (the implementation provided here by Quantopian), the beta value is precisely the beta we are familiar with around here: the stock market's beta.
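
A small plain-Python sketch of the two cases, using made-up numbers and a hypothetical helper name: the same least-squares slope is a trend when x is a counter, and a market beta when x is the market's returns.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys on xs, i.e. cov(xs, ys) / var(xs)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

# Case 1: x is a plain counter, so the slope is a trend (price units per day).
prices = [10.0, 10.5, 11.0, 11.5, 12.0]
trend = ols_slope(list(range(len(prices))), prices)

# Case 2: x is the market's returns, so the slope is the familiar market beta.
market = [0.010, -0.005, 0.004, -0.008, 0.006]
stock = [0.8 * r for r in market]  # made-up stock with beta 0.8
beta = ols_slope(market, stock)
```

The first slope comes out at 0.5 price units per day and the second at 0.8, the stock's sensitivity to the market.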