Conditional Z-Score

Hi,
I am looking for a way to create a conditional z-score in the following context:

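from collections import OrderedDict
from quantopian.pipeline import Pipeline

def make_ml_pipeline(factors, universe):
    factors_pipe = OrderedDict()
    # items() works on both Python 2 and Python 3 dicts
    for name, f in factors.items():
        factors_pipe[name] = f().zscore()
    pipe = Pipeline(screen=universe, columns=factors_pipe)
    return pipe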

Now the z-score subtracts the mean and divides by the standard deviation. When the standard deviation is 0 for a particular date, the z-score fills the column with np.NaN for all assets on that date.
I would like to have the condition:
If standard deviation == 0 then use original value, else calculate the zscore across all assets for that particular date.
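
To make the condition concrete, here is a minimal sketch in plain NumPy, outside of Pipeline. The function name conditional_zscore is hypothetical, and the input is assumed to be one date's values across all assets. Inside a Pipeline, logic like this would presumably have to live in a CustomFactor's compute method; that wiring is not shown here.

```python
import numpy as np

def conditional_zscore(values):
    """Z-score one date's cross-section; return the raw values when std is 0.

    Hypothetical helper: `values` is assumed to hold a single date's
    values for all assets.
    """
    values = np.asarray(values, dtype=float)
    std = np.nanstd(values)
    if std == 0:
        # Constant cross-section: keep the original values instead of NaN
        return values.copy()
    return (values - np.nanmean(values)) / std

# An indicator column with no missing data is all zeros and stays all zeros
constant_day = conditional_zscore([0.0, 0.0, 0.0])
# A varying column is z-scored as usual
varying_day = conditional_zscore([1.0, 2.0, 3.0])
```

For an all-zero indicator column the original value is already 0, so the fallback produces zeros rather than np.NaN.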

For the people who want to know the reason why one would need that... If you do any automated machine learning where you create a null-handling indicator column of zeros and ones, you occasionally get a column where no data is missing (i.e. the whole column is filled with zeros). For that column the standard deviation will be 0 as well, and therefore the zscore is np.NaN. Since algorithms cannot calculate with np.NaNs, I want it to write 0 instead for that particular situation - in an automated way.

Any ideas?

3 responses

I have now tried to use sklearn's StandardScaler() and MinMaxScaler() inside a CustomFactor so that I can impose if conditions.

However, as can be seen in the attached notebook, I cannot get the correct scaling with either StandardScaler() or MinMaxScaler().

I would appreciate it if somebody could tell me where my formula is wrong. Again I am trying to scale across assets per timestamp.


The issue is that the final screen is not supplied as a mask to the factors. Without a mask, ALL securities are passed to a factor. The enterprise_value_minmaxscaled factor therefore scales across all securities; then, when the pipeline is run, some are excluded from the results by the QTradableStocksUS screen. These excluded securities happen to be the min values.
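
The effect can be reproduced with a small NumPy sketch that does not use Pipeline at all. The numbers and the in_universe array are made up for illustration, and minmax_scale is a hypothetical helper rather than the code from the notebook.

```python
import numpy as np

def minmax_scale(values):
    """Scale values linearly so that the minimum maps to 0 and the maximum to 1."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo)

# Made-up values for five securities; the two smallest are outside the universe
all_values = np.array([1.0, 2.0, 10.0, 20.0, 50.0])
in_universe = np.array([False, False, True, True, True])

# No mask: scale across everything, then screen the results
unmasked = minmax_scale(all_values)[in_universe]

# With mask: restrict to the universe first, then scale
masked = minmax_scale(all_values[in_universe])

print(unmasked.min(), masked.min())
```

In the unmasked case the smallest surviving value is above zero because the true minimum was screened out after scaling; in the masked case the minimum is exactly zero.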

# Add the same factor mask as the screen to ensure the same 'universe' for both

def enterprise_value_original():
    return Fundamentals.enterprise_value.latest
def enterprise_value_minmaxscaled():
def enterprise_value_standardscaled():
def enterprise_value_nulls():
def enterprise_value_nulls_minmaxscaled():

def enterprise_value_nulls_standardscaled():

all_factors = {
    'enterprise_value_original': enterprise_value_original,
    'enterprise_value_minmaxscaled': enterprise_value_minmaxscaled,
    'enterprise_value_standardscaled': enterprise_value_standardscaled,
    'enterprise_value_nulls': enterprise_value_nulls,
    'enterprise_value_nulls_minmaxscaled': enterprise_value_nulls_minmaxscaled,
    'enterprise_value_nulls_standardscaled': enterprise_value_nulls_standardscaled
}
return all_factors



See attached notebook. The enterprise_value_minmaxscaled min value is now zero after applying the mask.

Hope that helps.
