How do you debug pipelines?

I've got a pipeline with about 20 factors and lots of computations between them. The pipeline failed with an unhelpful message:

Something went wrong. Sorry for the inconvenience. Try using the built-in debugger to analyze your code. If you would like help, send us an email.  
ValueError: could not broadcast input array from shape (4521) into shape (8560)  
There was a runtime error on line 1509.  

Even the Blue Screen of Death produced more useful output than this.

Generally, how do you go about tracking a bug in a substantial pipeline? You can't trace through factors or operations on factors. You don't get a backtrace. The error messages are unhelpful. The graph draws a pretty static picture without any sense of what's happening in the pipeline during execution. When will Quantopian get to improving their developer tools? You can't put together a DSL that you expect everyone to use without building better tooling around it!!!

S

10 responses

In the source code editor, go to line 1509, and click on the "1509" number to the left of the text. Re-run your backtest. Execution will stop at that line, and a shell will open in the source code, allowing you to evaluate expressions in that context and step through the code from there. See the gif in this thread.

Also, it sounds like you have a bit of a ball of mud which will be painful to transfer over, but it's worth figuring out how to pull your pipeline into the research environment. While it doesn't provide a debugger, it does let you iterate much faster than the backtester.

Sunil

I second Alex's suggestion to start in the research environment and then copy over to your algorithm. Remember that a pipeline simply returns a pandas DataFrame. It is extremely helpful to print the DataFrame at various stages of the pipeline to ensure each column is returning the expected data. More than once or twice I have found NaNs or zeros being returned which need to be accounted for and wouldn't be obvious in the algorithm IDE. Filters and masks can also be seen in action more easily in the research environment.
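As a rough sketch of that kind of check, the snippet below counts NaNs and exact zeros per column. The DataFrame is a hand-built stand-in, not real pipeline output, and the names summarize_columns, value_factor and quality_factor are made up for illustration.

```python
import numpy as np
import pandas as pd


def summarize_columns(df):
    """Return per-column counts of NaNs and exact zeros."""
    return pd.DataFrame({
        "nan_count": df.isna().sum(),
        "zero_count": (df == 0).sum(),
    })


# Hypothetical stand-in for the DataFrame a pipeline run would return.
result = pd.DataFrame(
    {
        "value_factor": [1.2, np.nan, 0.0, 3.4],
        "quality_factor": [0.0, 0.0, 2.5, np.nan],
    },
    index=["AAA", "BBB", "CCC", "DDD"],
)

summary = summarize_columns(result)
print(summary)
```

Running the same summary after adding each new column makes it easier to spot the stage at which unexpected NaNs or zeros first appear.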

Dan

The debugger will go to the pipeline_output call, and no further. Have you ever thought about how the pipeline's implemented?

Here's how pipeline execution seems to work: when the program runs, it builds an execution graph that describes a dataflow program. None of the code in the custom factors, nor any of the actual operations on factors, is executed until you ask for output. At that point the bound columns in the dataflow start receiving inputs from the underlying factors, and the actual code executes. The code you write is effectively not the code that executes. And Quantopian doesn't give sufficient tools to dig into the pipeline dataflow. Taking it into the research environment will not help this situation in any way. Nor will having good functional composition or object orientation or any other code organization technique.
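To make the deferred-execution idea concrete, here is a toy dataflow graph in plain Python. It is only an illustration of the general technique; the Term class and run function are invented for this sketch and are not how Quantopian's Pipeline is actually written.

```python
class Term:
    """A node in a toy dataflow graph (hypothetical, not the Quantopian API)."""

    def __init__(self, name, compute, inputs=()):
        self.name = name
        self.compute = compute
        self.inputs = tuple(inputs)

    def __add__(self, other):
        # Adding two terms only records a new node; no arithmetic happens here.
        name = "(%s + %s)" % (self.name, other.name)
        return Term(name, lambda a, b: a + b, (self, other))


def run(term, log):
    """Evaluate a term depth-first, recording the order in which nodes execute."""
    args = [run(t, log) for t in term.inputs]
    log.append(term.name)
    return term.compute(*args)


log = []
a = Term("a", lambda: 2)
b = Term("b", lambda: 3)
total = a + b             # builds the graph only
graph_built_log = list(log)

value = run(total, log)   # the compute functions execute only now
print(graph_built_log, value, log)
```

An exception raised inside one of the compute functions would surface from run, not from the line where the graph was defined, which is the same disconnect described above.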

The only way I can think of debugging this mess is to turn off parts of the pipeline, try running the code repeatedly, and see whether my program runs or fails. A binary search, if you will. It is about as inefficient as it sounds, especially since fundamental data access is so slow. My pipeline is large because I'm manipulating historical data over fundamental factors. Quantopian does not make available any other mechanism for accessing historical fundamental data.
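The binary search can at least be automated. The sketch below assumes exactly one column is responsible for the failure and that columns can be run independently of each other; find_failing_column and runs_ok are hypothetical names, and runs_ok stands in for whatever actually builds and runs the pipeline with a given subset of columns.

```python
def find_failing_column(columns, runs_ok):
    """Bisect `columns` to find the one column whose presence makes a run fail.

    Assumes a single culprit and that any subset of columns can be run alone.
    Returns None if the full set runs cleanly.
    """
    candidates = list(columns)
    if runs_ok(candidates):
        return None
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        if runs_ok(first_half):
            # The first half is clean, so the culprit is in the second half.
            candidates = candidates[mid:]
        else:
            candidates = first_half
    return candidates[0]


# Stand-in for a real pipeline run: it "fails" whenever column f13 is included.
calls = []


def fake_run(cols):
    calls.append(list(cols))
    return "f13" not in cols


columns = ["f%d" % i for i in range(20)]
culprit = find_failing_column(columns, fake_run)
print(culprit, len(calls))
```

With 20 columns this takes a handful of runs rather than 20, though each run is still as slow as the underlying data access.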

Unfortunately, Quantopian seems more interested in adding features than in fixing the developer experience.

S

Edit: Forgot to add, I have taken bits of my pipeline into the research environment when needed. It still doesn't help you see what's going on inside the pipeline. But it might help reconstitute parts of the pipeline more efficiently for doing my testing-by-parts.

Hi Sunil,

Based on the error message it looks like the problem is occurring in a custom Factor. (Specifically, the out[:] = line, since it's trying to broadcast to 8560, which is roughly the number of equities on the platform.) I would advise printing some debug text at the beginning of each compute function, with the text indicating which custom Factor it is. Then the last printed debug text will tell you the custom Factor you need to look more closely at. You could then use more debug text inside that custom Factor to pinpoint the problem.
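The same class of error can be reproduced with plain NumPy, outside any pipeline. In the sketch below the array sizes are taken from the error message, and the boolean filter is just one hypothetical way of ending up with an array shorter than out.

```python
import numpy as np

out = np.empty(8560)                    # one slot per asset, like a factor's output
values = np.arange(8560, dtype=float)   # made-up input data
filtered = values[values < 4521]        # boolean indexing drops rows: length 4521

try:
    out[:] = filtered                   # lengths differ, so NumPy cannot broadcast
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Printing the shape of whatever is about to be assigned to out, next to out.shape, is usually enough to confirm this kind of mismatch.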

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Nathan, thanks for the response. I tracked down the problem to a custom factor where I was filtering out values before setting the output. Your diagnosis is spot on.
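For anyone hitting the same thing, one common way around it is to keep the array at full length and mark excluded entries as NaN instead of dropping them. This is a sketch of that general pattern, not the actual code from my pipeline; masked_output is a made-up helper name.

```python
import numpy as np


def masked_output(values, keep):
    """Return an array the same length as `values`, with NaN where `keep` is False."""
    return np.where(keep, values, np.nan)


values = np.array([1.0, -2.0, 3.0, -4.0])
out = np.empty(4)

# Dropping the negatives with values[values > 0] would give length 2 and fail.
# Masking keeps the length at 4, so the assignment succeeds.
out[:] = masked_output(values, values > 0)
print(out)
```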

Sunil

Hi Sunil,

You're right that debugging a pipeline is difficult in the IDE. I agree with Alex and Dan that you might have an easier time developing your pipeline in Research. In general, it runs faster (since it only needs to run the pipeline computations) and it gives better error messages that can help you debug the problem. They're not always perfect, but they're usually more informative. You're correct about the implementation of Pipeline, by the way.

Improving the development experience is on our to-do list; it just hasn't reached the top of the priority list yet.

Working in the research environment during pipeline development also gives you access to the very helpful pipe.show_graph() function. I posted a notebook that uses it that you might want to take a look at. It is very good for getting an understanding of what your pipeline is actually doing.

Jamie,

I took Alex's and Dan's suggestion and put the pipeline into the research environment. It was easier to manipulate and run the pipeline in parts there, which made debugging at least possible if not easy. It took a bit over a day to do this. The results are posted in a different thread:

https://www.quantopian.com/posts/the-quantitative-value-algorithm-in-pipelines-dreams-and-nightmares

Stephan, the thread above includes a link to the pipeline graph. I hope you can see why I found the graph itself less than useful, and quite unwieldy. Going through your notebook was on my todo list, for other reasons. Looks like fantastic work, hope you're able to make good use of it.

I'm not going to be working on this particular project until Quantopian is in a position to backtest my algorithm.

Thanks,

Sunil

Well, yeah, that is an impressively complex graph. No wonder you had trouble debugging it. My biggest concern is that it appears to have a lot of free parameters, which can be a sign of data snooping. Pity you couldn't get the whole thing to work over the recommended interval, though; otherwise it would have been cool to see an algorithm running using this.