Getting started - Struggling with pipelines


I have worked with Python and R, so I know pandas, numpy, and other packages for data analysis. However, I just can't understand lines like from import EquityPricing, factset. I can't assign the result to an object like:

df = USEquities  

It gives me an error. I have been looking for tutorials on quantopian.pipelines and found none. The closest documentation I could find was the sklearn.pipeline.Pipeline documentation:

And the quantopian Pipelines tutorial:

Still, I just can't get past it, and it's very frustrating because I can't find any more documentation or examples. Do you have any examples or ideas I could practice with to get over it?

3 responses

The pipeline concept can be a bit difficult to wrap one's head around at first. But, once over the initial 'hump', it's pretty straightforward. Have no fear.

Also, it doesn't help that the word 'pipeline' is used in other areas where it doesn't completely match the Quantopian definition. The scikit-learn link listed above is an example. It's conceptually similar (i.e. a pipeline is a 'thing' which performs sequential tasks), but beyond the name it isn't really the same.

Let's start with the imports. When one imports the following

from import EquityPricing, factset

the two classes EquityPricing and factset are DataSets. They are definitions of how to connect to the correct field in the database. They aren't the data itself, just the connection info. So, doing something like this

df = USEquities  

doesn't assign any data to 'df', just info on how to get the data. To get the data itself (more precisely, a dataframe), one needs to execute the pipeline using the run_pipeline method. The result of that call is the data. It may also help to read through the docs for the definition of BoundColumn.
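To make the "connection info versus data" distinction concrete, here is a minimal sketch in plain Python. It is only an analogy: ToyColumn, ToyPipeline, toy_run_pipeline, and TOY_DATABASE are hypothetical names made up for this illustration, not the Quantopian API, and the tickers and prices are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyColumn:
    """Toy stand-in: describes where data lives, holds no data itself."""
    dataset: str
    field: str


class ToyPipeline:
    """Toy stand-in: a named collection of column descriptions."""

    def __init__(self, columns):
        self.columns = dict(columns)


# Hypothetical in-memory "database" keyed by (dataset, field).
# Tickers and values are made up for the example.
TOY_DATABASE = {
    ("EquityPricing", "close"): {"AAA": 10.0, "BBB": 20.0},
}


def toy_run_pipeline(pipeline):
    """Resolve each column description into actual values."""
    return {
        name: TOY_DATABASE[(col.dataset, col.field)]
        for name, col in pipeline.columns.items()
    }


close = ToyColumn("EquityPricing", "close")  # still just a description
pipe = ToyPipeline({"close": close})         # still no data
result = toy_run_pipeline(pipe)              # data only appears here

print(close)
print(result)
```

Printing close shows only the description, while result holds the values. That is the same split as between assigning a DataSet to a variable and calling run_pipeline.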

All that said, there is another post which may make it clearer. Take a look at this ( ).

Hope that helps. Good luck.



It really helped, thanks! Another question then: if pipelines are "Quantopian's own definition", what else is just "Quantopian defined"? From my reading, I know that the factors used in pipelines are also "Quantopian defined". What else? I don't want to switch to Python in my job and then be surprised that something I learned here doesn't work anywhere else. I would appreciate knowing beforehand.

The pipeline uses auxiliary objects such as factors, filters, and classifiers, as well as datasets, all of which are part of the Quantopian API. The API also uses Equity objects, most commonly as index values in the pandas MultiIndex of DataFrame and Series objects, along with futures and continuous futures objects for futures-based strategies. Overall, these are nothing more than summary classes that gather the necessary information for the respective asset class. Refer to the API documentation; it explains them very well. On another note, the output of a Quantopian pipeline is a pandas DataFrame, which makes it easy to manipulate in both the research and the backtest environment. The idea behind the pipeline is to generalize the operations needed in an algorithmic strategy, so if you have the data yourself you should be able to achieve approximately the same results manually. It would presumably take more effort than using the pipeline provided here, though.
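As a rough illustration of doing "the same thing manually", here is a small pandas-only sketch. It assumes a table indexed by (date, asset), uses plain strings in place of Equity objects, and uses made-up prices. The moving average plays the role of a factor and the boolean mask plays the role of a filter. None of this is Quantopian code.

```python
import pandas as pd

# Made-up closing prices for two hypothetical tickers over four days.
dates = pd.date_range("2020-01-01", periods=4, freq="D")
index = pd.MultiIndex.from_product(
    [dates, ["AAA", "BBB"]], names=["date", "asset"]
)
data = pd.DataFrame(
    {"close": [10.0, 20.0, 11.0, 19.0, 12.0, 18.0, 13.0, 17.0]},
    index=index,
)

# "Factor": 3-day simple moving average, computed separately per asset.
data["sma3"] = data.groupby(level="asset")["close"].transform(
    lambda s: s.rolling(window=3).mean()
)

# "Filter": keep rows where the close is above its 3-day average.
# Rows without a full 3-day window have NaN averages and drop out.
screened = data[data["close"] > data["sma3"]]

print(screened)
```

Everything here is ordinary pandas (MultiIndex, groupby, rolling), so it carries over to any other Python environment. Only the factor, filter, and dataset classes themselves are specific to the platform.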