accessing positions held within pipeline?

Is it possible to access the positions held in the current portfolio within pipeline? Or is this something I would need to capture within before_trading_start prior to running pipeline, and feed it in (via a global?)?

Any guidance?


I think the pipeline is a tool to get data into your notebook, isn't it? In principle the pipeline doesn't know anything about your positions... or am I misunderstanding your question?

@ Cpt Morgan -

That is the use case I've seen: apply a computation across a broad universe of stocks and output a single value per stock. I'm interested in the same thing, but incorporating the current portfolio weight vector into the computation.

A use case of potential interest would be to run the Optimize API from within a Pipeline Custom Factor. The Optimize API has access to current positions, so this might do the trick.

You can pass context to make_pipeline and access positions via that object.

Thanks Leo -

That sounds like the right direction to head. It basically amounts to making the context object accessible to pipeline. If I use make_pipeline(context), that just makes context accessible from within the make_pipeline function, but I'd like it accessible from within some of my pipeline custom factors. Do you have an example?

Coincidentally I was doing the same thing yesterday, here's how:

# Attach the algorithm's context as an attribute on the factor instance;
# compute() can then read it via self.context.
my_factor = MyFactor()
my_factor.context = context

You can now access the context within MyFactor through self.context. An even cleaner way would be to pass context to the MyFactor constructor, but it turns out it's hard to define your own constructor in custom factors due to the way custom factors are pre-processed; you can read more here: https://www.quantopian.com/posts/python-noob-question-how-can-i-create-a-parameterized-customfactor-in-pipeline

(Another reasonably elegant solution from that thread is to make a helper custom factor constructor function that closes over the variables you'd like to pass to the custom factor, context in this case)
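A minimal sketch of that closure approach (make_positions_factor is an illustrative name, not from the thread, and the imports assume the standard Quantopian pipeline modules):

from quantopian.pipeline import CustomFactor
from quantopian.pipeline.data import USEquityPricing

def make_positions_factor(context):
    # The nested class closes over `context`, so no custom __init__ is needed
    class MyFactor(CustomFactor):
        inputs = [USEquityPricing.close]
        window_length = 1

        def compute(self, today, assets, out, close):
            # `context` is visible here via the closure (but see the caveat
            # in the answer below about using algorithm state in compute)
            out[:] = close[-1]

    return MyFactor()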

Thanks Ivory Ant. I'll give that a try.

There isn't any supported way to access your current positions from inside a Pipeline computation. This is by design, since the intended purpose of the Pipeline API is to be the place where you put daily-frequency calculations that don't depend on the state of your algorithm. Separating these calculations out from the rest of your algorithm is valuable for two reasons:

  1. Since Pipeline calculations can't depend on algorithm state, they can be run offline in the research environment without any code changes (see the sketch after this list).
  2. Since Pipeline calculations can't depend on algorithm state, they can be pre-calculated and cached, which provides substantial performance benefits.
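As an illustration of point 1, the very same pipeline function can be evaluated offline in research with run_pipeline (a sketch, assuming make_pipeline is your usual state-free pipeline constructor; the dates are arbitrary):

from quantopian.research import run_pipeline

# make_pipeline() depends only on data, never on algorithm state,
# so the identical function runs unchanged in a research notebook
results = run_pipeline(make_pipeline(), '2016-01-04', '2016-06-30')
results.head()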

When you attach a pipeline to an algorithm, under the hood we pre-fetch all the inputs your pipeline will need for the next six months, run your pipeline over that entire period, and cache the results. Each day, when your algorithm calls pipeline_output, we slice off the pre-calculated values for the current day and pass them back to your algorithm. You can see that logic in the Zipline source.
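Conceptually, the chunked execution looks something like this (a simplified sketch, not the actual Zipline code; the cache layout is illustrative):

cache = {}  # trading day -> DataFrame of that day's pipeline results

def precompute_chunk(engine, pipeline, chunk_start, chunk_end):
    # One pass computes every day in [chunk_start, chunk_end] at once...
    results = engine.run_pipeline(pipeline, chunk_start, chunk_end)
    # ...and each day's rows are cached for cheap lookup later
    for day in results.index.get_level_values(0).unique():
        cache[day] = results.loc[day]

def pipeline_output(day):
    # The daily call just slices the precomputed chunk
    return cache[day]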

In summary, while it is possible to get a reference to context from a pipeline term, using anything stored on context in your compute function almost certainly won't do what you want, because your compute function gets called in six-month blocks on data that's "in the future" relative to the rest of your algorithm.
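To make the pitfall concrete, here's a hypothetical factor that reads positions off an attached context (as in the snippet earlier in the thread); every date in a precomputed chunk would see the same positions snapshot, not the positions as of that date:

import numpy as np
from quantopian.pipeline import CustomFactor
from quantopian.pipeline.data import USEquityPricing

class HeldPositionsFactor(CustomFactor):
    # Hypothetical: flag assets the portfolio "currently" holds
    inputs = [USEquityPricing.close]
    window_length = 1

    def compute(self, today, assets, out, close):
        # self.context was attached via my_factor.context = context;
        # these positions reflect the algorithm's state when the chunk
        # was precomputed, NOT the state on `today`
        held = np.array([p.sid for p in self.context.portfolio.positions])
        out[:] = np.in1d(assets, held).astype(float)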

Yeah, I was just gonna say be careful; I found this out myself yesterday: https://www.quantopian.com/posts/timestamps-in-log-seem-wrong

I guess we need to move some more sophisticated logic outside the pipeline.

Thanks Scott -

How does pipeline handle globals? When pipeline runs over the forward six months, would it just use the global values as of the start of the six-month period? It sounds like if I update the value of a global every day (from within the algo proper), pipeline will only pick up the value of the global once every six months, correct?

And what if I write to a global from within pipeline? Presumably then the result of every daily pipeline computation could be stored in the global over the six-month chunk, right?

When does pipeline run? It's a computational pig: every six months, it consumes an entire before_trading_start window (I think). Most of the time, though, there is plenty of time for other computations in before_trading_start. But as far as I know, there is no way to tell, within the code, whether a given call to before_trading_start is one where the pipeline pig will consume the window. A suggestion would be to provide a flag, so that if pipeline will not be run, the user could do something else with the time (5 minutes, right?).

Why the chunking in the first place? Is it basically database I/O overhead that makes it more efficient to run pipeline in chunks rather than call it every day?