Which pipeline is wrong?

This is a simplified version of the question in my previous post; I do not know how to delete the earlier one.

I am following the YouTube Python tutorial given here.
I copied the notebook from the webpage as-is to build and run a pipeline (pipeline1), which is shown in the second table.
Then I removed the one line that ranks the factor to build another pipeline (pipeline), which is shown in the first table.
The two tables give entirely different values in the factor column for the same security.
Could anybody explain why the two are different, and which one is correct?

3 responses

Well, they are two different factor values because they are two different factors. Unless I'm misunderstanding your question, in the first cell your 'testing_factor' is

testing_factor = operation_ratios.operation_margin.latest

While in the second cell 'testing_factor' is

testing_factor = testing_factor.rank(mask=universe, method='average')

One is the actual operation_margin value and the other is the rank of that value.
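To make the difference concrete, here is a minimal pure-Python sketch of what an "average" ranking does to a list of values. The function name average_rank and the sample margins are made up for illustration; this is not the pipeline's own implementation, and it ignores masks and missing values.

```python
def average_rank(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    # Indices of the values, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend 'end' over the run of values tied with the one at 'pos'.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Positions pos..end (0-based) get the same averaged 1-based rank.
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


# Hypothetical operating-margin values for four securities.
margins = [0.12, 0.35, 0.12, -0.05]
ranks = average_rank(margins)
print(ranks)  # [2.5, 4.0, 2.5, 1.0]
```

The output has the same length and order as the input, but every number is now a position in the ordering rather than a margin, which is why the two factor columns look unrelated.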

A couple of things. In general, don't use the same variable name for two different things. You could rename them 'testing_factor' and 'testing_factor_rank'. This becomes especially important in notebooks. It may not be what is happening here, but since notebook cells can be executed in any order, reusing a variable name means its value depends on the order in which the cells were run.
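Here is a rough sketch of that ordering problem. Each function stands in for one notebook cell, and the dictionary stands in for the notebook's shared namespace; cell_a, cell_b, run_cells and the sample values are all hypothetical.

```python
def cell_a(ns):
    # "Cell A": bind the name to the raw factor values.
    ns["testing_factor"] = [0.12, 0.35, -0.05]


def cell_b(ns):
    # "Cell B": rebind the SAME name to the ranks of whatever it holds now.
    values = ns["testing_factor"]
    ordered = sorted(values)
    ns["testing_factor"] = [ordered.index(v) + 1 for v in values]


def run_cells(cells):
    """Run the given cells in order and return the final 'testing_factor'."""
    ns = {}
    for cell in cells:
        cell(ns)
    return ns["testing_factor"]


after_a_then_b = run_cells([cell_a, cell_b])             # name holds ranks
after_rerunning_a = run_cells([cell_a, cell_b, cell_a])  # name holds raw values again

print(after_a_then_b)     # [2, 3, 1]
print(after_rerunning_a)  # [0.12, 0.35, -0.05]
```

With two separate names, one for the values and one for the ranks, both results stay available no matter which cell ran last.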

Anyway, I've changed your notebook to show the two different factors and their two different values. Does that clarify it?


@Dan, Yes, your reply clarifies my question. I originally thought rank() reshuffles the order. Anyway, it's clear to me now and thanks.

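Ah yes, you may have been thinking of 'sort_values', which just reorders the rows and leaves the values unchanged. '.rank' keeps the rows where they are and replaces each value with its position in the ordering, which makes it a very powerful way to normalize data that may have different ranges.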
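A small pandas sketch of the contrast, assuming plain pandas Series rather than pipeline factors (the pipeline's rank is a separate method, so treat this only as an analogy). The tickers and numbers are invented.

```python
import pandas as pd

# Two hypothetical factors on very different scales.
margin = pd.Series([0.12, 0.35, -0.05], index=["AAA", "BBB", "CCC"])
revenue = pd.Series([5.0e9, 2.0e6, 7.5e8], index=["AAA", "BBB", "CCC"])

# sort_values: same numbers, rows reordered from smallest to largest.
sorted_margin = margin.sort_values()

# rank: same row order, each number replaced by its 1-based position.
margin_rank = margin.rank(method="average")
revenue_rank = revenue.rank(method="average")

# Ranks share one scale (1..N), so they can be combined directly,
# whereas adding raw margin to raw revenue would be dominated by revenue.
combined = margin_rank + revenue_rank

print(sorted_margin)
print(margin_rank)
print(combined)
```

Here sorted_margin lists CCC, AAA, BBB with their original margins, while margin_rank still lists AAA, BBB, CCC but holds 2, 3 and 1.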