Ranking factors, but normalizing ranks based on NaNs

So let's say I have two factors, A and B. I want to do:

(A.rank() + B.rank()).zscore()


The problem is that because there are a lot more NaNs in B than in A, the ranks for B are always smaller. How do I make it so the ranks are comparable, i.e., the top rank for both factors will always be the same? Is there some way I can filter out NaNs, maybe by keeping only the stocks that have no NaNs for every factor and then ranking? Thanks!

1 response

Certainly the 'cleanest' way to ensure you are ranking equally is to do as you suggested and filter out any securities with a NaN value in either A or B. This creates the same 'pool' of securities for each factor, so the ranks are more comparable. This could be done something like this:

a_and_b_are_finite = A.isfinite() & B.isfinite()
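
The line above only builds the filter; it still has to be handed to the ranking step. Here is a minimal sketch of how that might look, assuming A and B are Pipeline factors. The function name combined_rank_zscore is made up for illustration, and passing the mask to zscore() as well is an extra precaution rather than something strictly required.

```python
def combined_rank_zscore(A, B):
    """Hypothetical helper: rank A and B over one shared pool, then z-score the sum.

    Assumes A and B are Pipeline factors exposing isfinite(), rank() and zscore().
    """
    # Keep only securities where both factors are finite (no NaN, inf or -inf).
    a_and_b_are_finite = A.isfinite() & B.isfinite()

    # Rank each factor over the same pool so both top out at the same rank.
    # method='max' gives tied values the same (highest) rank.
    a_rank = A.rank(mask=a_and_b_are_finite, method='max')
    b_rank = B.rank(mask=a_and_b_are_finite, method='max')

    # Normalize the combined rank over that same pool.
    return (a_rank + b_rank).zscore(mask=a_and_b_are_finite)
```

Because both rank() calls see exactly the same securities, the largest possible rank is the size of the shared pool for A and for B alike.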

It's good practice to use the isfinite() method rather than notnan() with zscores and ranks. Infinite values don't always play nicely with these methods, so it's best to filter those out too. Passing a filter like this as the mask parameter of rank() means only those securities that have 'rankable', or finite, values get ranked.

Also note the method parameter of rank(). This determines how tied values are handled. The default is 'ordinal', where each security is given a distinct rank corresponding to the order in which the values occur. That order is typically just by SID, which isn't very meaningful. Moreover, the values are probably independent, so if two are the same they should arguably be assigned the same rank. My preference is to use the 'max' method. For info on the various options check out the scipy rankdata docs.
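
To make the difference between 'ordinal' and 'max' concrete, here is a small plain-Python illustration. It is not the Pipeline or scipy implementation, just a sketch of the two tie rules; rank_values and the sample scores are made up for the example.

```python
import math


def rank_values(values, method="max"):
    """Rank finite values ascending (1 = smallest); non-finite entries stay NaN."""
    # Pair each finite value with its position; NaN and +/-inf are skipped.
    finite = sorted((v, i) for i, v in enumerate(values) if math.isfinite(v))
    ranks = [math.nan] * len(values)

    if method == "ordinal":
        # Every value gets a distinct rank; ties are broken by position.
        for pos, (_, i) in enumerate(finite, start=1):
            ranks[i] = float(pos)
    elif method == "max":
        # Tied values all get the highest rank in their tie group,
        # i.e. the count of finite values less than or equal to them.
        for v, i in finite:
            ranks[i] = float(sum(1 for w, _ in finite if w <= v))
    else:
        raise ValueError("unsupported method: %r" % method)
    return ranks


scores = [0.5, 0.2, 0.5, math.nan, 0.9]
ordinal_ranks = rank_values(scores, "ordinal")  # [2.0, 1.0, 3.0, nan, 4.0]
max_ranks = rank_values(scores, "max")          # [3.0, 1.0, 3.0, nan, 4.0]
```

With 'ordinal' the two 0.5 scores get ranks 2 and 3 purely because of where they sit in the list, while 'max' gives both of them rank 3.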