Multiple Pipelines Available In Algorithms

As you might've noticed in the new risk API announcement, we've added the ability to use multiple pipelines in algorithms. One example is using the risk loading pipeline alongside a pipeline that you define yourself, as in the attached algorithm.

Using multiple pipelines can easily slow your algorithm down: the pipeline machinery optimizes data fetching within a single pipeline, but not across separate pipelines. In general, it's better to use a single pipeline. Two anti-patterns to avoid are putting each of your terms into its own pipeline and using the same terms in several pipelines.
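To make the single-versus-multiple distinction concrete, here is a toy model in plain Python. It is not Quantopian's engine; it only assumes what the paragraph above states: within one pipeline, each distinct input column is loaded once no matter how many terms read it, while separate pipelines never share loads. The term names are hypothetical.

```python
# Toy model (hypothetical, not Quantopian's engine): each "pipeline" loads
# every distinct input column it needs exactly once, but separate pipelines
# cannot reuse what another pipeline already loaded.

def loads_for(pipelines):
    """Return how many column loads a list of pipelines would trigger.

    Each pipeline is a list of terms; each term is the set of input
    columns it reads.
    """
    total = 0
    for terms in pipelines:
        needed = set()
        for inputs in terms:
            needed |= inputs  # inputs shared within one pipeline are merged
        total += len(needed)
    return total

# Four hypothetical terms that overlap in the columns they read.
momentum = {"close"}
mean_reversion = {"close"}
dollar_volume = {"close", "volume"}
liquidity = {"volume"}
terms = [momentum, mean_reversion, dollar_volume, liquidity]

single = loads_for([terms])              # one pipeline holding every term
split = loads_for([[t] for t in terms])  # anti-pattern: one term per pipeline

print(single, split)  # 2 5
```

With all four terms in one pipeline, only two column loads are needed; splitting the same terms into four pipelines triggers five.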

However, there are a few use cases where multiple pipelines make sense, such as when you have disjoint sets of computations that you'd like to run and reason about separately (for example, one pipeline for your risk loadings and another for your alpha factors).

from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.experimental import risk_loading_pipeline
from quantopian.pipeline.filters import QTradableStocksUS

def initialize(context):
    # Multiple pipelines can be attached in initialize.
    attach_pipeline(make_pipeline(), 'price_pipeline')
    attach_pipeline(risk_loading_pipeline(), 'risk_pipeline')

def make_pipeline():
    # Restrict the price pipeline to the tradable universe.
    base_universe = QTradableStocksUS()
    yesterday_close = USEquityPricing.close.latest

    pipe = Pipeline(
        screen=base_universe,
        columns={
            'close': yesterday_close,
        }
    )
    return pipe

def before_trading_start(context, data):
    # Multiple pipelines can be accessed from pipeline_output.
    context.price_pipeline = pipeline_output('price_pipeline')
    context.risk_pipeline = pipeline_output('risk_pipeline')
    print(context.risk_pipeline.head(1))
Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

17 responses

Hi @Abhijeet,

This is potentially a huge improvement. Can we have a Pipeline that executes only once, on the first day, when the algorithm is initialized? This would be very helpful for CustomFactors that have a multi-year window_length and only need to be run once. Further, it would be nice to use schedule_function to schedule a Pipeline run for those same long-running CustomFactors, so the factor could be updated weekly or monthly.

Best Regards,
Doug

** crickets **

Doug, what type of long-running factor are you looking to run? Often, the speed bottleneck is loading data into pipeline. Pipeline is efficient in that it only loads each data point once, even if it's needed on multiple days. That means if you have a factor with a 2-year lookback that you run once per month, pipeline will still load those 2 years of data once, just as it would if you ran the factor daily, so computing the pipeline less frequently won't actually save loading time. Now, if your computation is the bottleneck, then you can downsample your term. For example, you can do something like

my_factor = MyCustomFactor().downsample('month_start')


Does this help?
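Jamie's two points, that loaded data is reused across days and that downsampling only cuts the compute step, can be mimicked in plain Python. This is a hypothetical sketch, not how Pipeline is implemented: the price list is "loaded" once and reused for every day, a 20-day trailing mean stands in for an expensive CustomFactor, and the downsampled variant assumes the term is recomputed on the first trading day of each month and carried forward in between. Weekends are skipped, holidays are ignored, and computing on the very first output day is our own simplification.

```python
# Hypothetical sketch: one "load" of daily prices, a trailing-window factor
# computed from that single copy, and an optional month-start-only mode that
# recomputes on the first trading day of each month and forward-fills.
from datetime import date, timedelta

def trading_days(start, n):
    """Return the first n weekdays on or after start (holidays ignored)."""
    days, d = [], start
    while len(days) < n:
        if d.weekday() < 5:
            days.append(d)
        d += timedelta(days=1)
    return days

def trailing_mean(values, i, window):
    """Mean of the `window` values ending at index i (inclusive)."""
    chunk = values[i - window + 1 : i + 1]
    return sum(chunk) / len(chunk)

def compute_factor(days, prices, window, month_start_only=False):
    """Return ({day: factor value}, number of times the factor was computed)."""
    out, calls, last = {}, 0, None
    for i in range(window - 1, len(days)):
        first_output_day = i == window - 1
        new_month = first_output_day or days[i].month != days[i - 1].month
        if not month_start_only or new_month:
            last = trailing_mean(prices, i, window)  # the "expensive" step
            calls += 1
        out[days[i]] = last  # forward-filled when not recomputed
    return out, calls

days = trading_days(date(2018, 1, 1), 120)  # about six months of weekdays
prices = list(range(1, len(days) + 1))      # loaded once, reused every day
daily, daily_calls = compute_factor(days, prices, window=20)
monthly, monthly_calls = compute_factor(days, prices, window=20, month_start_only=True)
print(daily_calls, monthly_calls)  # 101 6
```

Both variants read the same single copy of the data; only the number of factor computations drops, which is why downsampling helps when the computation, not the loading, is the bottleneck.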

Here's a simple example of the close price of AAPL being downsampled.

Hi @Jamie,

One representative long-running CustomFactor pulls 15 years of Fundamentals data, digests it, and generates 365 values for each asset. I can easily do this in a Research Notebook, output a CSV, and local_csv() it into an algo. Contest rules, however, prohibit local_csv().

I've tested this CustomFactor in the algo, generating the one value needed for each asset each day, but the algo times out after the first 10 or 20 days of backtesting. Besides, I have no need to repeat this calculation daily. Weekly or monthly would perhaps be useful, but not absolutely necessary.

So no, the suggestion you offer, while appreciated, is unhelpful.

Best Regards,
Doug

This is super useful, thank you! I have code in some algorithms that uses calculations on some ETFs to determine market conditions, and I don't want the ETFs to be in my universe. Now it's easy to put my universe into one pipeline and the market-condition calculations into a separate pipeline. It looks like this is what you're doing with your risk pipeline.

I was able to do everything I needed to do in a single pipeline, but this makes the code much simpler.
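Once both pipelines have run, the market-condition output and the universe output meet in ordinary Python. A minimal sketch, assuming each output has been reduced to a dict keyed by symbol (in an algorithm, pipeline_output returns a pandas DataFrame indexed by asset); the SPY-above-its-200-day-average rule, the column names, and all numbers are hypothetical.

```python
# Hypothetical stand-ins for two pipeline outputs on one day.
# market_pipe: ETFs only, used for the market-condition check.
market_pipe = {"SPY": {"close": 270.0, "sma_200": 255.0}}
# universe_pipe: tradable stocks only, no ETFs.
universe_pipe = {
    "AAPL": {"alpha": 0.8},
    "MSFT": {"alpha": 0.5},
    "XOM": {"alpha": -0.2},
}

def market_is_bullish(market):
    """Call the market bullish when SPY closes above its 200-day average."""
    spy = market["SPY"]
    return spy["close"] > spy["sma_200"]

def pick_longs(universe, bullish, n=2):
    """Go long the top-n alpha names in a bullish market, otherwise hold nothing."""
    if not bullish:
        return []
    ranked = sorted(universe, key=lambda sym: universe[sym]["alpha"], reverse=True)
    return ranked[:n]

longs = pick_longs(universe_pipe, market_is_bullish(market_pipe))
print(longs)  # ['AAPL', 'MSFT']
```

Because the ETFs live only in market_pipe, they never compete for a slot in the ranking.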

Hi @Jamie,

Does Q agree that scheduling a separate Pipeline, either only at initialization or periodically, would be useful? Would this feature request be easy to implement?

Thank you,
Doug

Hi Doug,

Can you provide a bit more information on what you're trying to do? If I had to guess, it sounds like you're building up training data for some sort of model (based on the 15 years of data).

In general, it's not an easy task to schedule a pipeline to run on certain days only. That said, I'm still not sure I understand why the pipeline is timing out. Do you know roughly how long the pipeline takes to compute in research? Without knowing more details about the code, it's tough for me to pinpoint the issue. Based on the fact that your backtest was able to run for the first 10-20 days, my hunch is that it's the computation that's expensive, and I'm wondering if there's a way to make it more efficient so that it can run in a backtest. Would you be willing to share your code either here, or privately with our support team (Help -> Contact Support)?

Hi Jamie,

Thank you. If I understand correctly, pulling 15 years of the same data on each cycle is not a resource constraint. I'll rewrite an efficient daily calculation instead of processing all 365 days in one pass as I had done in research.

Best Regards,
Doug

Jamie, thanks a lot.

Thank you for this!

Hi Jamie,

I am trying to create two monthly pipelines. One pipeline gets Citi's EV/EBITDA, market cap, and industry. Then I will use that info to create another pipeline that filters for every stock in the same industry whose market cap is within 10% above or below Citi's market cap. However, I don't know how to do this. Would you mind suggesting how to approach it?

Thanks,

Thanh

Best to use a single pipeline with a factor for the Citibank EV/EBITDA. Then base a filter off that. Not sure why you would want two separate pipelines. Keep it simple. (As a bonus, one pipeline will typically run faster, too.)

Hi Dan, I tried to get all of Citibank's info using this code:
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import Fundamentals

def make_pipeline(context):
    # Citi's ticker is 'C'; the screen keeps only that one stock.
    symbol = Fundamentals.primary_symbol.latest
    symbol_filter = symbol.eq('C')

    market_cap = Fundamentals.market_cap.latest

    industry = Fundamentals.morningstar_industry_code.latest

    EV_EBITDA = Fundamentals.ev_to_ebitda.latest

    pipe = Pipeline(
        columns={
            'EV/EBITDA': EV_EBITDA,
            'market_cap': market_cap,
            'industry': industry
        }, screen=symbol_filter)
    return pipe


I tried to find a way to use Citibank's EV/EBITDA as a factor but just couldn't. Can you suggest a way to do that? Thank you very much!
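Following Dan's suggestion, one option is to drop the symbol_filter screen so the single pipeline returns every stock's market cap, industry, and EV/EBITDA, then pick Citi's row out of that day's output and filter relative to it. The sketch below is a swapped-in post-processing approach, not a pipeline Filter: it works on plain dicts (in an algorithm the output would be a DataFrame), reads "within 10%" as at most 10% above or below Citi's market cap, and every ticker other than 'C', every industry label, and every number is a placeholder.

```python
# Hypothetical stand-in for one day of a single, unscreened pipeline output.
# Market caps are in $bn; all values and placeholder tickers are made up.
rows = {
    "C":     {"industry": "banks",    "market_cap": 200.0, "ev_ebitda": 9.0},
    "BANK1": {"industry": "banks",    "market_cap": 215.0, "ev_ebitda": 10.5},
    "BANK2": {"industry": "banks",    "market_cap": 150.0, "ev_ebitda": 8.0},
    "BANK3": {"industry": "banks",    "market_cap": 185.0, "ev_ebitda": 7.5},
    "SOFT1": {"industry": "software", "market_cap": 205.0, "ev_ebitda": 20.0},
}

def peers_of(rows, target="C", band=0.10):
    """Symbols in target's industry with market cap within +/- band of target's."""
    ref = rows[target]
    lo = ref["market_cap"] * (1 - band)
    hi = ref["market_cap"] * (1 + band)
    return sorted(
        sym for sym, r in rows.items()
        if sym != target
        and r["industry"] == ref["industry"]
        and lo <= r["market_cap"] <= hi
    )

def cheaper_than_target(rows, peers, target="C"):
    """Peers whose EV/EBITDA is below the target's EV/EBITDA."""
    ref_multiple = rows[target]["ev_ebitda"]
    return [sym for sym in peers if rows[sym]["ev_ebitda"] < ref_multiple]

peers = peers_of(rows)
print(peers)                             # ['BANK1', 'BANK3']
print(cheaper_than_target(rows, peers))  # ['BANK3']
```

Citi's EV/EBITDA is used here only as a reference value read from its own row, which is what the symbol screen prevented: with the screen in place, no other stocks are left to compare against.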

Abhijeet and Jamie,

I have a few contest entries that are using the risk pipeline and have been working fine in backtests and in the contest. I started them as live trading with paper money to be able to periodically check on their performance. I've found previously that I learn a lot more when live trading an algorithm (especially when it's my own money, but live trading support has ended...). Unfortunately, I received the following error on February 21st.

ValueError: Request for risk model data ending with 2018-02-21 could not be processed. Data is available up to 2018-02-16.
There was a runtime error on line 230.


Is it a known issue that the risk pipeline can only be used in backtesting? I'll try to make live versions again and see what happens. But as I mentioned, I learn a lot more by looking at the semi-real-time output of an algorithm than from a backtest. So I would think it would be in Q's best interest to allow developers to live trade their contest entries (with paper money).

Hi @Stephen,

This is a current known limitation, where live servers launch before the risk model data is available. I've made a note of it in our bug tracker, and it's on our list to get to.

Hi Everyone,

Thank you for implementing this (and for everything else on Q - truly grateful!). I'm quite new to both Q and Python and would really appreciate any help with the two problems below:

1. Is it possible to plot data from the Risk Pipeline in the Custom Data graph? If so, I would like to plot rolling Alpha, Beta, Sharpe, and Volatility. Would you be able to provide some sample code, or point me to the relevant lesson or lecture, please?

2. I'm trying to write an algo that filters stocks based on Fundamental data (fundamentals_pipe), then buys/sells AND HOLDS those positions based ONLY on Technical indicators (technical_pipe), even if they are no longer part of the fundamentals_pipe. Essentially a Value + Momentum strategy.

For example, identify high and low P/E stocks, then buy and hold the low P/E stocks that are trading above a certain moving average (e.g. SMA50), even if they are no longer part of the 'low P/E stocks' filter.

What's the most efficient way to implement this: using two pipes as described above, or using a 'for' or 'while' loop in my Rebalance function? Any sample code would be really appreciated. I'm happy to share the code I'm struggling with. I'm getting a lot of timeouts, likely because my code is not very efficient.
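Whether the two conditions come from one pipeline or two, the entry/hold rule Joakim describes reduces to a few set operations. A minimal sketch, assuming each condition has already been turned into a set of assets; the tickers are placeholders. Note that the moving-average condition has to be evaluated for held names too, so it can't be screened down to the P/E filter.

```python
# Hypothetical sketch of the rule: open a position only when an asset passes
# the value screen AND the trend rule, but keep an existing position for as
# long as the trend rule alone still holds.

def target_positions(held, value_ok, above_sma):
    """Return the set of assets to hold after today's rebalance.

    held      -- assets currently in the portfolio
    value_ok  -- assets passing today's fundamentals screen (e.g. low P/E)
    above_sma -- assets trading above their moving average (e.g. SMA50)
    """
    new_entries = value_ok & above_sma  # both conditions needed to enter
    keep = held & above_sma             # technicals alone decide exits
    return new_entries | keep

held = {"AAA", "BBB"}
value_ok = {"BBB", "CCC", "DDD"}
above_sma = {"AAA", "CCC", "EEE"}
print(sorted(target_positions(held, value_ok, above_sma)))  # ['AAA', 'CCC']
```

AAA stays even though it no longer passes the value screen, BBB is sold because it fell below its average, and CCC is the only new entry.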

All the best,
Joakim