Chunksize for Pipeline in Research

I am running into a problem with chunksize when running a pipeline. My understanding is that if I pass 30 for chunksize, the dataframe returned from the run_pipeline function will only show data for securities every 30 days. But whenever I pass 30 for chunksize, the returned dataframe still shows data for the selected stocks for every single day from start_date to end_date. Does anyone know why this is?

Thank you for your help
Jake

7 responses

Hi Jake,

The keyword chunksize is used in run_pipeline to split up pipeline computations and make computing over a long period of time easier. It specifies the number of days in each chunk of output but is unrelated to the number of stocks output each day. That depends on the rest of your pipeline logic.
Learn more about run_pipeline here and please let me know if I can help with anything else!

Cheers,
Robert

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Thank you Robert I appreciate the response!

I reviewed the notebook "Research API Improvements". After spending some time attempting to figure out how 'chunksize' works, I still don't understand it. This is my interpretation. If I were to execute this code:

stocks = run_pipeline(pipe,'05-01-2017','07-01-2017',chunksize=30)
stocks

then the above line of code would return a dataframe looking somewhat like this

                                          Percent change over the last 200 days  
                                NVDA                      .657  
05-01-2017                      AAPL                      .789  
                                SQ                        .456

                                AAPL                      .892  
05-31-2017                      NVDA                      .765  
                                SQ                        .528

                                NVDA                      .826  
06-30-2017                      AAPL                      .942  
                                SQ                        .581

Note that the stock selection and values above are made up and not from an actual pipeline. I simply made this 'dataframe' as an example.

Is my interpretation of how 'chunksize' works correct?

But when I run

stocks = run_pipeline(pipe,'05-01-2017','07-01-2017',chunksize=30)
stocks

this dataframe is returned

                                          Percent change over the last 200 days  
                                NVDA                      .657  
05-01-2017                      AAPL                      .789  
                                SQ                        .456

                                AAPL                      .812  
05-02-2017                      NVDA                      .725  
                                SQ                        .478

                                NVDA                      .756  
05-03-2017                      AAPL                      .832  
                                SQ                        .521

...

So instead of my expectation that chunksize causes the pipeline to only return a list of stocks every X days, it returns a list of stocks for every single day since the start date.

Am I doing something wrong? How can I get a pipeline to return a dataframe just like the one in my previous reply?

Thank you for your help
Jake

If I'm not mistaken chunksize is just a hint to the backend to help manage resources while running the query, it shouldn't influence the results in any way. It allows you to balance the CPU/memory trade-off to make sure the pipeline run can actually finish without running out of memory.
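
To make that concrete, here is a minimal sketch in plain Python of what "computing in chunks" means. It does not use the Quantopian API: compute_day and run_in_chunks are hypothetical stand-ins, and the sketch uses calendar days rather than trading days to stay short. The point is only that splitting the date range into chunks changes how the work is batched, not which rows come back.

```python
from datetime import date, timedelta


def compute_day(day):
    """Hypothetical stand-in for one day of pipeline output (made-up values)."""
    return {"AAPL": day.toordinal() % 7, "NVDA": day.toordinal() % 5}


def run_in_chunks(days, chunksize=None):
    """Compute a row for every day, processing `chunksize` days at a time."""
    if chunksize is None:
        # No chunking requested: treat the whole range as a single chunk.
        chunksize = len(days) or 1
    rows = []
    for start in range(0, len(days), chunksize):
        chunk = days[start:start + chunksize]
        # Each chunk is computed on its own, then appended to the result.
        rows.extend((day, compute_day(day)) for day in chunk)
    return rows


# 05-01-2017 through 07-01-2017 inclusive, as calendar days.
days = [date(2017, 5, 1) + timedelta(days=i) for i in range(62)]

whole = run_in_chunks(days)
chunked = run_in_chunks(days, chunksize=30)

print(whole == chunked)  # True: same rows either way
print(len(chunked))      # 62: still one row per day
```

With chunksize=30 the loop runs three times (30, 30, and 2 days) instead of once, but the combined result is identical.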

@Ivory you are correct. Changing chunksize should never affect the final output of a pipeline; it's just an optimization hint for the pipeline engine.
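
If the goal is the every-30-days view Jake described, one option is to run the pipeline as usual and thin out the result afterwards. The sketch below shows the idea in plain Python with placeholder values; every_nth_day and daily_rows are hypothetical names, not part of run_pipeline, and it again counts calendar days for simplicity. On a real pipeline result you would presumably apply an equivalent filter to the date level of the dataframe's index.

```python
from datetime import date, timedelta

# One row per day from 05-01-2017 through 07-01-2017 (placeholder values).
days = [date(2017, 5, 1) + timedelta(days=i) for i in range(62)]
daily_rows = [(day, {"AAPL": 0.0, "NVDA": 0.0, "SQ": 0.0}) for day in days]


def every_nth_day(rows, step):
    """Keep the first row and every `step`-th row after it."""
    return rows[::step]


sampled = every_nth_day(daily_rows, 30)
print([day.isoformat() for day, _ in sampled])
# ['2017-05-01', '2017-05-31', '2017-06-30']
```

This keeps the sampling step separate from chunksize, which only controls how the computation is batched.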

Thank you for clearing this up. I understand now!

Jake