Help with Factor Scaling

Hi Everyone,

I'm trying to scale factors (from within Pipeline) to values between 0 and 1, using sklearn's MinMaxScaler. Unfortunately, I'm running into errors that I can't figure out. If anyone could help I'd be very grateful.

Attached is a Notebook with the errors I'm getting. Not sure if I need to use a CustomFactor or not, but I can't get it to work either way. Many thanks in advance.

3 responses

Joakim, the error you are seeing is UserWarning: MinMaxScaler assumes floating point values as input, got bool. That's the clue: you are passing boolean values to the custom factor, probably unintentionally. Here's the offending line of code:

factor1_scaled = FactorScaler(inputs=[factor1.notnull()], mask=universe)

The input you are passing, factor1.notnull(), is a filter (i.e. boolean values), not the factor stripped of nulls. What you want is this:

factor1_scaled = FactorScaler(inputs=[factor1], mask=universe & factor1.isfinite())

Pass the factor itself as the input, then filter it with the mask. Notice too that it's probably wiser to use the isfinite() method rather than notnull() in the mask. It catches not only the NaNs but also the infinite values. Infinite values don't play well with MinMaxScaler, so it's best to exclude those too.
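To make the point about infinite values concrete, here is a rough plain-Python sketch of the min-max formula. It is not the Pipeline API and not sklearn's implementation; naive_min_max and min_max_scale_finite are made-up helper names used only for illustration. The second function mimics what the isfinite() mask achieves: only finite values take part in the scaling.

```python
import math

def naive_min_max(values):
    """Plain (x - min) / (max - min) with no filtering (illustrative only)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def min_max_scale_finite(values):
    """Scale the finite entries to [0, 1]; non-finite entries come back as NaN.

    Hypothetical stand-in for masking with isfinite() before scaling.
    """
    finite = [v for v in values if math.isfinite(v)]
    if not finite:
        return [math.nan] * len(values)
    lo, hi = min(finite), max(finite)
    span = hi - lo
    scaled = []
    for v in values:
        if not math.isfinite(v):
            scaled.append(math.nan)      # excluded, like a masked-out asset
        elif span == 0:
            scaled.append(0.0)           # all finite values identical
        else:
            scaled.append((v - lo) / span)
    return scaled

# One infinite value makes the range infinite, so every finite value
# collapses to 0.0 and the infinite one becomes NaN.
broken = naive_min_max([1.0, 2.0, math.inf])

# Filtering first keeps the finite values spread across [0, 1].
scaled = min_max_scale_finite([2.0, math.nan, 4.0, math.inf, 6.0])
print(broken)
print(scaled)
```

In the naive version a single inf drags max to infinity, which is why the mask needs to drop infinite values and not just NaNs.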

You may also want to set copy=False (rather than True) to avoid the additional step of copying the input array.

This is actually a nice way to normalize factors and an alternative to the 'rank' or 'demean' approaches.
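For comparison, here is a small plain-Python sketch of how these normalizations treat the same cross-section of factor values. The function names are just illustrative and are not the Pipeline methods; the zscore here assumes a population standard deviation, and the rank ignores ties.

```python
import statistics

def demean(values):
    """Subtract the cross-sectional mean."""
    mean = statistics.fmean(values)
    return [v - mean for v in values]

def zscore(values):
    """Demean, then divide by the population standard deviation (assumption)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def rank(values):
    """Ordinal rank starting at 1; ties are not handled specially."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

def min_max(values):
    """Map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

factor = [1.0, 2.0, 3.0, 10.0]   # made-up values with one outlier

print(demean(factor))
print(zscore(factor))
print(rank(factor))
print(min_max(factor))
```

Ranking discards the size of the gap to the outlier, while min-max scaling keeps the relative distances and squeezes the other values toward 0.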

Good post.

Hi Joakim, similar to what Dan mentioned above, here's a notebook I've used when exploring alternative scaling options aside from the usual ranking / demean / zscore.

Hi Dan and Daniel,

Super helpful, thanks so much!!