feed current portfolio weights into Pipeline custom factor?

Is there any way to feed the current portfolio weights into Pipeline custom factor?

27 responses

Any guidance on my question?

@Grant. I don't think so. Pipelines are quasi-asynchronous. By design, one first defines the data; then, each time the 'pipeline_output' method is called, the pipeline objects go on their merry way pre-fetching and pre-calculating data in 'chunks'. The first time 'pipeline_output' is called, for example, the pipeline gets 6 days' worth of data by default. This means it has already fetched the data and calculated any custom factors BEFORE any trades have been made. There's no going back and changing that pipeline data with subsequent portfolio weights.

That's my understanding.
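A toy model makes this concrete (pure Python, hypothetical names; this is an illustration of the chunking idea, not actual zipline internals). The engine evaluates the factor for a whole chunk of days up front, so mutating shared state between days has no effect until the next chunk boundary:

```python
# Toy model of pipeline "chunking" (illustrative only; not zipline internals).
# The engine precomputes a factor for a whole chunk of days at once, so any
# state mutated between days is invisible until the next chunk is computed.

GLOBAL_STATE = {"value": 1.0}

def factor():
    # A "custom factor" that reads shared mutable state.
    return GLOBAL_STATE["value"]

def run_backtest(n_days, chunk_size):
    outputs = []
    precomputed = []          # acts like the pipeline's cached chunk
    for day in range(n_days):
        if not precomputed:   # chunk exhausted: recompute for the next block
            precomputed = [factor() for _ in range(chunk_size)]
        outputs.append(precomputed.pop(0))     # today's pipeline_output
        GLOBAL_STATE["value"] += 1.0           # algo mutates state after the fact
    return outputs

print(run_backtest(8, chunk_size=6))
# → [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 7.0, 7.0]
```

Note the output reproduces the pattern seen later in this thread's logs: the factor reports 1.0 for the whole first 6-day chunk, then jumps to 7.0 when the next chunk is built.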

Thanks Dan... I suspected as much, but I thought maybe a hack with a global might work. For live trading, it can't work this way, though, right? For example, doesn't this imply a call every day:

def before_trading_start(context, data):
    context.pipeline_data = pipeline_output('long_short_equity_template')
    context.risk_loading_pipeline = pipeline_output('risk_loading_pipeline')

If I put a value in a global and change it every day, what value will Pipeline use?

EDIT: I have to say, code that appears to do one thing but does another (e.g. chunking) seems like bad practice. The chunking should be explicit and perhaps controllable.

Grant, you got me thinking (that is a feat in itself)....

I did a test, and yes, it looks like a custom factor can access global data. If that global data is changed then the custom factor will change. Theoretically, one could put portfolio weights into a global and the pipeline could act on it. This is problematic in backtesting, but yes, in live trading it should work. Pipeline can only fetch current data and doesn't 'pre-fetch' anything in live trading. Maybe try running the attached pipeline in live paper trading and see?

This approach may not be practical in an algorithm, though. Since the pipeline behavior in backtesting is different from live trading, one couldn't backtest the algorithm and expect comparable results.

Attached is a test algo which increments a global variable. Run it and look at the log. That global variable is output as a custom factor using pipeline. Both the actual global variable and the custom factor output are logged. One can see the 6-day 'prefetch' chunk cycle of the pipeline and then much longer chunks (about 130 days).

Regarding your comment about 'chunking should be explicit and perhaps controllable' I agree. It's possible in zipline but not the Q platform for some reason. See this post https://www.quantopian.com/posts/feature-request-expose-chunksize-in-pipeline-api-to-allow-predictable-backtesting-of-compute-intensive-pipelines-without-timeout

# Backtest ID: 5a75e73d1c14604611bd1c73

@Grant, @Dan, hope you see the door you are opening. It can give you control over the equity line, over the pipeline, and even over the Optimizer API. And thereby force the portfolio as a whole to go in the direction you want. In a way, overriding objectives as well as constraints based on whatever you want.

I want to see more of this.

Thanks to you both.

Thanks Dan-Will have a look later.

Guy-No real advantage expected. It would be more of a convenience. I’d like to keep a factor in Pipeline that might benefit from the current portfolio weights. Presently I am setting to equal weight as a workaround.

@Grant, there is a lot more in there than meets the eye.

Look at it this way. It's like gaining from the outside some control over the general behavior of your trading strategy. Except, you will have to do it from the inside. Meaning you will need to plan where you want your trading strategy to go. Like giving it some general directives to follow: due to current market environment do a little bit more of this or that.

This is more than a feature. It gives you the ability to provide your portfolio with guiding equations to direct its overall behavior in ways that were constrained to constants otherwise.

@Grant, since all trading strategies can be represented by their respective payoff matrix: Σ(H∙ΔP), what is proposed is: Σ(H∙f(t)∙ΔP). A time function you control, giving a hand to your inventory management procedures. And since you can put it there, why not go for: f(t) = (1+g(t))^t? That makes it a game changer. I've used similar techniques in some of my programs to enhance performance.

Another hack would be to have a try-except in Pipeline and force an error. Then get data from a global and resume. Maybe the error would pause the chunking so that the global would be current?

Also I wonder if Fetcher is compatible with Pipeline? Are data input via Fetcher available to Pipeline custom factors? If so then this would be a backdoor for the current portfolio state to be made available in Pipeline.

@Grant, since Fetcher works somewhere, its content could be transferred to the context, use the same technique @Dan proposed and give control to the outside. This would be good for going forward.

In a backtest, you would have to provide a file that contained your directives from start to finish, which should be considered a blatant case of peeking ahead, since you would be providing controlling functions after the fact, possibly with knowledge you would not have had otherwise.

Whereas, providing guiding equations from the inside, well, they are just that, equations. They do not peek, but they do plan the next step.

I’d heard that Q was working on a new version of Fetcher...I wonder how it will integrate with Pipeline?

@ Dan -

I just ran your algo above (# Backtest ID: 5a75e73d1c14604611bd1c73), and captured the log output:

1969-12-31 19:00 define_pipeline:38 INFO running defining pipeline method  
2011-01-04 09:31  PRINT global pipeline value: 1.0  
2011-01-04 09:31  PRINT global variable value: 1.0  
2011-01-04 09:31  PRINT setting global variable value to: 2.0  
2011-01-05 09:31  PRINT global pipeline value: 1.0  
2011-01-05 09:31  PRINT global variable value: 2.0  
2011-01-05 09:31  PRINT setting global variable value to: 3.0  
2011-01-06 09:31  PRINT global pipeline value: 1.0  
2011-01-06 09:31  PRINT global variable value: 3.0  
2011-01-06 09:31  PRINT setting global variable value to: 4.0  
2011-01-07 09:31  PRINT global pipeline value: 1.0  
2011-01-07 09:31  PRINT global variable value: 4.0  
2011-01-07 09:31  PRINT setting global variable value to: 5.0  
2011-01-10 09:31  PRINT global pipeline value: 1.0  
2011-01-10 09:31  PRINT global variable value: 5.0  
2011-01-10 09:31  PRINT setting global variable value to: 6.0  
2011-01-11 09:31  PRINT global pipeline value: 1.0  
2011-01-11 09:31  PRINT global variable value: 6.0  
2011-01-11 09:31  PRINT setting global variable value to: 7.0  
2011-01-12 09:31  PRINT global pipeline value: 7.0  
2011-01-12 09:31  PRINT global variable value: 7.0  

It is potentially a real pitfall for a programmer who doesn't understand what Pipeline is doing under the hood with its chunking. If I'm reading things correctly, static globals are compatible with Pipeline, but any global that changes will not be current (although I suppose if the global is changed within a Pipeline computation, it might be OK).

Your algo does show that if one knew when the chunking would occur, a global could be kept in sync, but clearly incrementing a global daily will cause a periodic offset.

I launched the algo into live trading. We'll see what happens.

The simplest solution would be to have the option of turning off the Pipeline chunking and take the performance hit in backtesting. This might have broader implications for the platform, since I'm guessing one benefit of chunking is that it lightens the load on their infrastructure. I'd be interested in hearing from Quantopian why we have the chunking in the first place, since it breaks the one-to-one correspondence between backtesting and live trading.

An added advantage of toggling off chunking would be to free up time in before_trading_start for other computations that take a while. Presently, the chunking consumes a variable amount of time in before_trading_start, unrealistically chewing up the 5-minute daily compute window (I've tripped up on this API ugliness, as well).

Algo corresponding to Backtest ID: 5a75e73d1c14604611bd1c73 (posted above), after 1 day of live trading:

2018-02-05 09:31  PRINT setting global variable value to: 2.0  
2018-02-05 09:31  PRINT global variable value: 1.0  
2018-02-05 09:31  PRINT global pipeline value: 1.0  
1969-12-31 19:00 define_pipeline:38 INFO running defining pipeline method  

We'll see what happens today.

Another update:

2018-02-06 09:31  PRINT setting global variable value to: 3.0  
2018-02-06 09:31  PRINT global variable value: 2.0  
2018-02-06 09:31  PRINT global pipeline value: 1.0  
2018-02-05 09:31  PRINT setting global variable value to: 2.0  
2018-02-05 09:31  PRINT global variable value: 1.0  
2018-02-05 09:31  PRINT global pipeline value: 1.0  
1969-12-31 19:00 define_pipeline:38 INFO running defining pipeline method  

If I'm not mistaken, Pipeline is not picking up the change in the global value from the prior day.

Add a print statement inside the compute method and you'll see what's going on:

    def compute(self, today, assets, out, high):
        # Just output the global value
        print("inside compute global variable value: {}".format(GLOBAL_TEST_VALUE))
        out[:] = GLOBAL_TEST_VALUE

Pipeline is computed in chunks and the results are given back to the algorithm one day at a time. When there is no more data in the current chunk, the pipeline is run again. Every time the pipeline computes a new chunk, the new value of GLOBAL_TEST_VALUE is used for all the days contained in that chunk.

1970-01-01 01:00 define_pipeline:39 INFO running defining pipeline method  
2011-01-04 14:45  PRINT inside compute global variable value: 1.0  
2011-01-04 14:45  PRINT inside compute global variable value: 1.0  
2011-01-04 14:45  PRINT inside compute global variable value: 1.0  
2011-01-04 14:45  PRINT inside compute global variable value: 1.0  
2011-01-04 14:45  PRINT inside compute global variable value: 1.0  
2011-01-04 14:45  PRINT inside compute global variable value: 1.0  
2011-01-04 15:31  PRINT global pipeline value: 1.0  
2011-01-04 15:31  PRINT global variable value: 1.0  
2011-01-04 15:31  PRINT setting global variable value to: 2.0  
2011-01-05 15:31  PRINT global pipeline value: 1.0  
2011-01-05 15:31  PRINT global variable value: 2.0  
2011-01-05 15:31  PRINT setting global variable value to: 3.0  
2011-01-06 15:31  PRINT global pipeline value: 1.0  
2011-01-06 15:31  PRINT global variable value: 3.0  
2011-01-06 15:31  PRINT setting global variable value to: 4.0  
2011-01-07 15:31  PRINT global pipeline value: 1.0  
2011-01-07 15:31  PRINT global variable value: 4.0  
2011-01-07 15:31  PRINT setting global variable value to: 5.0  
2011-01-10 15:31  PRINT global pipeline value: 1.0  
2011-01-10 15:31  PRINT global variable value: 5.0  
2011-01-10 15:31  PRINT setting global variable value to: 6.0  
2011-01-11 15:31  PRINT global pipeline value: 1.0  
2011-01-11 15:31  PRINT global variable value: 6.0  
2011-01-11 15:31  PRINT setting global variable value to: 7.0  
2011-01-12 14:45  PRINT inside compute global variable value: 7.0  
2011-01-12 14:45  PRINT inside compute global variable value: 7.0  
2011-01-12 14:45  PRINT inside compute global variable value: 7.0  
2011-01-12 14:45  PRINT inside compute global variable value: 7.0  
2011-01-12 14:45  PRINT inside compute global variable value: 7.0  
2011-01-12 14:45  PRINT inside compute global variable value: 7.0  
2011-01-12 14:45  PRINT inside compute global variable value: 7.0  
2011-01-12 14:45  PRINT inside compute global variable value: 7.0  
2011-01-12 14:45  PRINT inside compute global variable value: 7.0  
2011-01-12 14:45  PRINT inside compute global variable value: 7.0  
2011-01-12 14:45  PRINT inside compute global variable value: 7.0  
2011-01-12 14:45  PRINT inside compute global variable value: 7.0  
2011-01-12 14:45  PRINT inside compute global variable value: 7.0  
2011-01-12 14:45  PRINT inside compute global variable value: 7.0  
2011-01-12 14:45  PRINT inside compute global variable value: 7.0  
2011-01-12 14:45  PRINT inside compute global variable value: 7.0  
2011-01-12 14:45  PRINT inside compute global variable value: 7.0  
2011-01-12 14:45  PRINT inside compute global variable value: 7.0  
2011-01-12 14:45  PRINT inside compute global variable value: 7.0  
2011-01-12 14:45  PRINT inside compute global variable value: 7.0  
2011-01-12 14:45  PRINT inside compute global variable value: 7.0  
2011-01-12 14:45 WARN Logging limit exceeded; some messages discarded  
2011-01-20 15:31  PRINT global pipeline value: 7.0  
2011-01-20 15:31  PRINT global variable value: 12.0  
2011-01-20 15:31  PRINT setting global variable value to: 13.0  
2011-01-21 15:31  PRINT global pipeline value: 7.0  
2011-01-21 15:31  PRINT global variable value: 13.0

[...]

@ Luca -

So under live trading, should one be able to feed data into Pipeline via a global and have it be current? I understand the chunking problem with backtesting (more or less), but for live trading, there is no forward window of data to chunk, right?

I don't know how live trading works, but from what I understood reading the Q people's comments in the forum, it might be that in live trading the algorithm is restarted every day from the first day of submission. That's very different from keeping an algorithm "alive" and updating it every day. If that's the case, then you'll have the same behaviour in live trading as you have when backtesting.

Thanks Luca -

Yes, I'd forgotten about the overnight re-start jazz for live trading. The live trading results above suggest that Pipeline sees the initial value of the global every day, which would be consistent with a re-start.

What's confusing, though, is that the global value incremented by one overnight. I thought only context persisted overnight? Hmm?

I contacted Q support on this confusion. I'm hoping someone knowledgeable about the machinery can shed some light on this weirdness.

What's confusing, though, is that the global value incremented by one
overnight. I thought only context persisted overnight? Hmm?

This is how I believe it works:
1. before trading starts, a new backtest is run starting from the algorithm submission date. The algorithm's logs, statistics and plots are replaced with this backtest's results.
2. that backtest stays alive during trading hours
3. at the end of the trading day the algorithm is stopped
4. the next day, everything starts again from point 1.

This isn't a workflow that we intended to support. Can you tell me more about why you're trying to do it this way?

In general the workflow is to 1) use pipeline to define a target portfolio and then 2) use the optimizer to migrate from the current portfolio to the new target portfolio. It's not obvious why you'd use your current portfolio to define a target portfolio - that's effectively pre-empting the optimizer without any of the benefits of optimization.
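That two-step workflow can be sketched in plain Python (hypothetical helper names; a crude turnover cap stands in for the Optimize API's constraints - this is not the actual Optimize implementation):

```python
# Sketch of the intended workflow: step 1 produces a target portfolio from
# factor values; step 2 moves the current portfolio toward it under a simple
# turnover budget (a stand-in for optimize-style constraints).

def target_from_factor(factor_values):
    """Equal-weight long the top half of assets by factor, short the bottom half."""
    ranked = sorted(factor_values, key=factor_values.get)
    n = len(ranked) // 2
    shorts, longs = ranked[:n], ranked[-n:]
    weights = {asset: 0.0 for asset in factor_values}
    for asset in longs:
        weights[asset] = 1.0 / n
    for asset in shorts:
        weights[asset] = -1.0 / n
    return weights

def migrate(current, target, max_turnover=0.5):
    """Move from current toward target, scaling the trade down if the total
    one-way turnover would exceed the budget."""
    trades = {a: target.get(a, 0.0) - current.get(a, 0.0)
              for a in set(current) | set(target)}
    turnover = sum(abs(t) for t in trades.values())
    scale = min(1.0, max_turnover / turnover) if turnover else 0.0
    return {a: current.get(a, 0.0) + scale * t for a, t in trades.items()}

factor_values = {"A": 0.9, "B": 0.1, "C": -0.3, "D": 0.5}
target = target_from_factor(factor_values)          # A, D long; B, C short
new_weights = migrate({"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}, target)
```

The point Dan is making is that the current weights only enter at step 2, as the starting point of the migration, never at step 1 where the target is defined.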

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

It's really just convenience and elegance that sent me down this rabbit hole. The idea behind one of my factors is to run an optimization, to find alpha factors (weights) that would be close to the current portfolio weights, but result in a net positive return upon rebalancing the portfolio. Being able to do everything in Pipeline would be nice, but is not absolutely necessary.
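For what it's worth, the core of that idea can be sketched outside of Pipeline (illustrative names and a closed-form dollar-neutral tilt are my assumptions, not Grant's actual factor): stay as close as possible to the current weights while tilting along an alpha vector until the expected return clears a threshold.

```python
def tilt_toward_alpha(current, alpha, min_expected_return):
    """Smallest dollar-neutral tilt of `current` along `alpha` such that the
    portfolio's expected return reaches `min_expected_return` (a sketch, not
    Grant's actual optimization)."""
    assets = list(current)
    expected = sum(current[a] * alpha[a] for a in assets)
    if expected >= min_expected_return:
        return dict(current)                  # already good: no trade needed
    # Demean alpha so the tilt leaves the sum of weights unchanged.
    mean_alpha = sum(alpha[a] for a in assets) / len(assets)
    d = {a: alpha[a] - mean_alpha for a in assets}
    denom = sum(d[a] * alpha[a] for a in assets)  # proportional to var(alpha)
    if denom == 0:                            # flat alpha: no tilt can help
        return dict(current)
    lam = (min_expected_return - expected) / denom
    return {a: current[a] + lam * d[a] for a in assets}

w = tilt_toward_alpha({"A": 0.5, "B": 0.5}, {"A": 0.1, "B": -0.1}, 0.01)
# w stays near {"A": 0.5, "B": 0.5} but now satisfies the return threshold
```

Because the tilt direction is alpha with its mean removed, the adjustment preserves the total weight and is the minimum-norm change that hits the return target, which matches the "close to the current portfolio" intent.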

I would note to Q support that the behavior I'm seeing here is kinda unexpected. So, if this "gotcha" is not described in your docs, I recommend adding a few sentences.

Here's another update for the live-running algo (Backtest ID: 5a75e73d1c14604611bd1c73):

2018-02-07 09:31  PRINT setting global variable value to: 4.0  
2018-02-07 09:31  PRINT global variable value: 3.0  
2018-02-07 09:31  PRINT global pipeline value: 3.0  
2018-02-06 09:31  PRINT setting global variable value to: 3.0  
2018-02-06 09:31  PRINT global variable value: 2.0  
2018-02-06 09:31  PRINT global pipeline value: 1.0  
2018-02-05 09:31  PRINT setting global variable value to: 2.0  
2018-02-05 09:31  PRINT global variable value: 1.0  
2018-02-05 09:31  PRINT global pipeline value: 1.0  
1969-12-31 19:00 define_pipeline:38 INFO running defining pipeline method  

It is very odd that the global value from pipeline is correct the first day, then goes out of sync, and then comes back into sync on the third day.

The other weird behavior is that I thought that context is the only object that persists. However, it appears that globals do, as well.

Latest live algo output:

2018-02-08 09:31  PRINT setting global variable value to: 5.0  
2018-02-08 09:31  PRINT global variable value: 4.0  
2018-02-08 09:31  PRINT global pipeline value: 4.0  
2018-02-07 09:31  PRINT setting global variable value to: 4.0  
2018-02-07 09:31  PRINT global variable value: 3.0  
2018-02-07 09:31  PRINT global pipeline value: 3.0  
2018-02-06 09:31  PRINT setting global variable value to: 3.0  
2018-02-06 09:31  PRINT global variable value: 2.0  
2018-02-06 09:31  PRINT global pipeline value: 1.0  
2018-02-05 09:31  PRINT setting global variable value to: 2.0  
2018-02-05 09:31  PRINT global variable value: 1.0  
2018-02-05 09:31  PRINT global pipeline value: 1.0  
1969-12-31 19:00 define_pipeline:38 INFO running defining pipeline method  

Aside from Dan's response above of "This isn't a workflow that we intended to support" I did receive an e-mail from Q support. I'm still kinda confused what exactly is supported and what is not. Generally, I think globals are supported, and apparently they persist, as does context (which I believe is officially supported). So, should I assume that the persistence of globals is supported? And presumably the use of unchanging globals and Pipeline is supported? But the use of changing globals in combination with Pipeline is not supported?

Also, assuming that the persistence of globals is supported, is there a reason I would use context over globals for storing data that needs to persist?

I'll ask Q support if they can clarify.

Grant, the help docs suggest that you should use context instead of global variables:

context should be used instead of global variables in the algorithm.

This is under the Overview section.

As Dan mentioned, this isn't a behavior that we intended to support. It might work in some cases, but I don't have an explanation for the odd behavior that you're seeing.


Thanks Jamie-

Static globals are supported, I gather, and are compatible with Pipeline, correct? And anything dynamic should be stored in context if persistence is required, correct?

Hi Grant,

That's the correct interpretation, yes.

Tangent: If I remember right, we were being advised against globals until optimize came along, then all of the examples use them.
I prefer to have the numbers present where optimize is being run.

So then, if it is useful to Q in some way for us to use global variables with order_optimal_portfolio and its kin (like calculate_optimal_portfolio), I would just like to know. Thanks.