Why so many columns when calling "get_pricing" from a pipeline result?

Hey guys, I have a quick question.

See the attached notebook. All I do here is grab the momentum factor using the Q500 US universe.

When I inspect the resulting pipeline DataFrame (pipeline_df) grabbing one day of data, I see there are 500 rows for the 500 companies in the universe, so far so good.

When I call the get_pricing function, however, over 9,000 securities are returned?

Why is this the case? How can I make it so I only return the 500 stocks that were returned from my pipeline call?

Again the notebook is attached for your reference.

Thank you so much in advance for any help on this issue.

5 responses

The key is to use the get_level_values method rather than reading the securities off the index object directly. For a multi-index dataframe, the index's levels are a separate object from the actual rows. They can still list securities whose rows have been filtered out, so they don't always reflect the current row count. There are several posts on this topic. Check out this one. There are also some good links in there with more info.
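To see the difference outside of Quantopian, here is a minimal sketch using plain pandas. The tickers, dates, and the 'momentum' column are made up for illustration, and the filter only stands in for whatever a pipeline screen does internally.

```python
import pandas as pd

# Toy (date, asset) multi-index frame standing in for a pipeline result.
dates = pd.to_datetime(["2020-01-02", "2020-01-03"])
assets = ["AAA", "BBB", "CCC", "DDD"]
index = pd.MultiIndex.from_product([dates, assets], names=["date", "asset"])
full_df = pd.DataFrame({"momentum": range(len(index))}, index=index)

# Keep only two of the four assets, similar in spirit to applying a screen.
keep = full_df.index.get_level_values("asset").isin(["AAA", "CCC"])
filtered_df = full_df[keep]

# The levels still remember every asset from the unfiltered frame.
stale_assets = filtered_df.index.levels[1]

# get_level_values reads the labels of the rows that are actually present.
present_assets = filtered_df.index.get_level_values(1).unique()

# Alternatively, drop the unused entries from the levels first.
cleaned_assets = filtered_df.index.remove_unused_levels().levels[1]

print(len(stale_assets), len(present_assets), len(cleaned_assets))
```

Passing the stale levels to get_pricing would request every asset the index has ever seen, which is presumably where the 9,000+ columns come from.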

So, change your code to be something like this:

# Use the `get_level_values` method to get the securities from pipeline_df  
prices = get_pricing(pipeline_df.index.get_level_values(level=1).unique(),  
                     start_date, end_date,  

Attached is a notebook showing this in action as well as how to get the 'prices' dataframe to be in the same format as the 'pipeline_df' dataframe. Also, remember that the pipeline dates are always 1 trading day after the associated pricing date.
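As a rough sketch of that reshaping (not the code from the attached notebook), the snippet below takes a wide prices frame with one column per asset, shifts it by one row so each date carries the previous session's price, and stacks it into the same (date, asset) layout as a pipeline result. The tickers, prices, and the 'price' column name are invented for illustration, and it assumes the price index contains every trading session.

```python
import pandas as pd

# Toy wide price frame: one row per trading day, one column per asset.
dates = pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"])
prices = pd.DataFrame(
    {"AAA": [10.0, 11.0, 12.0], "BBB": [20.0, 21.0, 22.0]},
    index=dates,
)
prices.index.name = "date"
prices.columns.name = "asset"

# Pipeline rows dated D hold data from the previous trading day, so move
# each price down one row and drop the first row, which has no prior session.
aligned = prices.shift(1).iloc[1:]

# Stack the asset columns into a second index level: (date, asset) -> price.
long_prices = aligned.stack().rename("price").to_frame()

print(long_prices)
```

A frame in this shape can then be joined to the pipeline output on its index, since both use (date, asset) pairs.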


Works like a charm!

Thank you very much, Dan.

Hi guys, I tried to use Dan's lines of code, changing a couple of bits so that they use my df, but I got this error message:

"AttributeError: 'Pipeline' object has no attribute 'index'"

I have attached my notebook. I am just getting started with Python for finance and am still trying to learn all the relevant code and rules.

Simple mistake. You need to assign the output of run_pipeline to a variable such as 'pipe' (or any name you wish, just be consistent in the code). Otherwise the output of the pipeline is never saved and cannot be referenced later on.

# original code  
run_pipeline(make_pipeline(), start_date, end_date).head(5) 

# should be  
pipe = run_pipeline(make_pipeline(), start_date, end_date)