Fill z-scored factor NaNs with 0

When I try to combine multiple "input factors" into one "final factor" (e.g., inputs = "sales", "roa", and "cfinc"; final = "final" in the attached NB), I am left with a NaN if any of the "input factors" was NaN to begin with. I would like to convert any NaN inside an "input factor" (which I have z-scored) to 0 prior to combining them into my final factor.

In the attached notebook, you can see the undesirable outcome with ticker 'BNS'. Two of the three "input factors" are NaN, so the final output factor is NaN as well. Of course I can call .dropna() on the pipeline output (final cell in the NB), but I would prefer to keep the ticker in my results, with the "final" factor simply a combination of the input factors that are not NaN.

Any ideas? Thank you and apologies if this is obvious...


Hi Tom, that part is not so obvious. You would use a CustomFactor:

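import numpy as np
from quantopian.pipeline import CustomFactor

class CombineFactorsNanFill(CustomFactor):
    def compute(self, today, assets, out, *factors):
        # Fill NaNs in each input in place, then sum the inputs element-wise.
        for f in factors:
            nanfill(f)
        out[:] = np.sum(factors, axis=0)

def nanfill(_in):   # Forward Fill NaNs
    '''
    From https://stackoverflow.com/questions/41190852/most-efficient-way-to-forward-fill-nan-values-in-numpy-array
    https://www.quantopian.com/posts/forward-filling-nans-in-pipeline
    '''

    #return _in            # uncomment to not run the code below

    '''
    nan_num = np.count_nonzero(np.isnan(_in))   # count nans
    if nan_num:
        log.info(nan_num)
        #log.info(str(_in))
    '''
    # Forward-fill along axis 1, following the Stack Overflow answer linked above:
    # idx holds, for each cell, the column of the last non-NaN value in its row.
    mask = np.isnan(_in)
    idx = np.where(~mask, np.arange(mask.shape[1]), 0)
    np.maximum.accumulate(idx, axis=1, out=idx)
    _in[mask] = _in[np.nonzero(mask)[0], idx[mask]]
    return _in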
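
To see what this kind of forward fill does to an array, here is a small standalone NumPy sketch. The function body follows the Stack Overflow answer linked in the docstring; the name forward_fill_rows and the sample values are made up for illustration.

```python
import numpy as np

def forward_fill_rows(arr):
    """Forward-fill NaNs along axis 1, in place (hypothetical helper name)."""
    mask = np.isnan(arr)
    # For each cell, the column index of the most recent non-NaN cell in its row.
    idx = np.where(~mask, np.arange(mask.shape[1]), 0)
    np.maximum.accumulate(idx, axis=1, out=idx)
    # Copy the value found at that column into every NaN cell.
    arr[mask] = arr[np.nonzero(mask)[0], idx[mask]]
    return arr

# Made-up sample: two rows, four columns.
sample = np.array([[1.0, np.nan, np.nan, 4.0],
                   [np.nan, 2.0, np.nan, np.nan]])
filled = forward_fill_rows(sample.copy())  # copy so the sample itself is untouched
print(filled)
```

Two things to notice: a NaN at the start of a row has nothing to copy from and stays NaN, and the fill runs along axis 1. Pipeline hands compute arrays shaped (window_length, assets), so it is worth checking that axis 1 is the direction you actually want to fill.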


And use it this way in the pipeline definition:

factors = [sales, roa, cfinc]
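
If what you want is literally a 0 in place of each missing z-score, a zero-fill version of the combining step is short. The sketch below uses plain NumPy arrays shaped like the (window_length, assets) inputs that compute receives, so it runs outside a pipeline. The tickers other than BNS and all the numbers are made up, and the class name in the closing comment is hypothetical.

```python
import numpy as np

# Made-up z-scores for one day; each array is shaped (window_length=1, n_assets=4).
tickers = ["AAA", "BNS", "CCC", "DDD"]   # only BNS comes from the thread
sales = np.array([[0.5, np.nan, -1.2, 0.7]])
roa = np.array([[1.0, np.nan, 0.3, -1.3]])
cfinc = np.array([[-0.4, 0.9, 0.1, -0.6]])
factors = [sales, roa, cfinc]

# A plain sum propagates NaN: BNS ends up NaN because two of its inputs are missing.
plain = np.sum(factors, axis=0)[-1]

# Replace NaN with 0 in a copy of each input, then sum the latest row.
zero_filled = [np.where(np.isnan(f), 0.0, f) for f in factors]
combined = np.sum(zero_filled, axis=0)[-1]

for ticker, p, c in zip(tickers, plain, combined):
    print(ticker, p, c)

# Inside a CustomFactor the same two steps would go in compute(), ending with
#     out[:] = np.sum(zero_filled, axis=0)[-1]
# and the factor would be built with something like
#     final = CombineFactorsZeroFill(inputs=factors, window_length=1)
# where the class name, `final`, and window_length=1 are all assumptions.
```

np.nansum(factors, axis=0) gives the same numbers in one call, since it treats NaN as 0 while summing.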


Hope that helps!

Just keep in mind that filling with zeros will have an impact on your scoring (i.e., I'm not sure you can rely on the stddev when there are lots of fake 0 values in your factor). Filling with the mean would be better, I think. Something like this would work but is kinda slow:

class CombineFactorsNanFill(CustomFactor):
    def compute(self, today, assets, out, *factors):
        for f in factors:
            # Replace NaNs in this input with the mean of its non-NaN values.
            mean = np.nanmean(f)
            f[np.isnan(f)] = mean
        out[:] = np.sum(factors, axis=0)
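
To make the point about scoring concrete, here is a small NumPy sketch with made-up numbers. It fills a raw (not yet z-scored) factor both ways and compares the resulting mean and standard deviation with the values you get when NaNs are simply ignored.

```python
import numpy as np

raw = np.array([1.0, 2.0, 3.0, 4.0, np.nan, np.nan])  # made-up raw factor values
missing = np.isnan(raw)

# Fill the missing entries with 0 and with the NaN-aware mean, respectively.
zero_filled = np.where(missing, 0.0, raw)
mean_filled = np.where(missing, np.nanmean(raw), raw)

print("ignoring NaNs:", np.nanmean(raw), np.nanstd(raw))
print("zero-filled:  ", zero_filled.mean(), zero_filled.std())
print("mean-filled:  ", mean_filled.mean(), mean_filled.std())
```

In this example zero-filling moves the mean and changes the spread, while mean-filling keeps the mean but still shrinks the standard deviation, because the filled values sit exactly at the center. If the inputs were already z-scored with NaNs ignored, their mean is about 0, so filling with 0 and filling with the mean should come out nearly the same.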


Cheers

Thanks Charles!