Forward filling nans in pipeline custom factors

A way to forward fill NaNs in pipeline custom factors, adapted from a Stack Overflow answer.

Example:

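import numpy as np
from quantopian.pipeline import CustomFactor
from quantopian.pipeline.data import Fundamentals

class Quality(CustomFactor):
    inputs = [Fundamentals.total_revenue]
    window_length = 24
    def compute(self, today, assets, out, total_revenue):
        # Inputs are shaped (window_length, number of assets), so transpose
        # to forward fill each asset through time rather than across assets.
        total_revenue = nanfill(total_revenue.T).T
        # out holds one value per asset: use the most recent (filled) row.
        out[:] = total_revenue[-1]

def nanfill(arr):
    # Forward fill NaNs along axis 1 (within each row), in place.
    mask = np.isnan(arr)
    # Column index of each valid entry, 0 where the entry is NaN.
    idx = np.where(~mask, np.arange(mask.shape[1]), 0)
    # Running maximum gives the column of the most recent valid entry.
    np.maximum.accumulate(idx, axis=1, out=idx)
    arr[mask] = arr[np.nonzero(mask)[0], idx[mask]]
    return arr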
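
To see what the index trick does, here is a minimal standalone sketch on toy data (the arrays `demo` and `window` and their values are made up for illustration). It fills along axis 1, left to right within each row, and a NaN with no earlier valid value in its row stays NaN. Pipeline windows are shaped (window_length, number of assets), so filling through time there means passing the transpose.

```python
import numpy as np

def nanfill(arr):
    # Same function as above: forward fill NaNs within each row, in place.
    mask = np.isnan(arr)
    # Column index of each valid entry, 0 where the entry is NaN.
    idx = np.where(~mask, np.arange(mask.shape[1]), 0)
    # Running maximum = column index of the most recent valid entry.
    np.maximum.accumulate(idx, axis=1, out=idx)
    arr[mask] = arr[np.nonzero(mask)[0], idx[mask]]
    return arr

demo = np.array([[1.0, np.nan, np.nan, 4.0],
                 [np.nan, 2.0, np.nan, 3.0]])
# Work on a copy so the toy input is left untouched.
filled = nanfill(demo.copy())
# Row 0 becomes 1, 1, 1, 4; row 1 keeps its leading NaN: nan, 2, 2, 3.
print(filled)

# Toy window shaped like a pipeline input: rows are days, columns are assets.
window = np.array([[10.0, np.nan],
                   [np.nan, 5.0],
                   [np.nan, np.nan]])
filled_window = nanfill(window.copy().T).T
# Most recent row after filling each asset through time.
print(filled_window[-1])
```

Because the function writes into the array it is given, pass a copy if the original values are still needed.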
2 responses

Why would Fundamentals.total_revenue require forward filling? Are there companies that do not report total_revenue? Or are these errors in the Fundamentals database?

Fundamental reporting is not consistent across companies.

A version that also counts the NaNs and, if there are any, logs the count and the array before and after filling:

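def nanfill(arr):
    mask = np.isnan(arr)
    nan_num = np.count_nonzero(mask)
    if nan_num:
        # log is the logger Quantopian provides inside algorithms.
        log.info(nan_num)
        log.info(str(arr))  # array before filling
    # Column index of each valid entry, 0 where the entry is NaN.
    idx = np.where(~mask, np.arange(mask.shape[1]), 0)
    # Running maximum gives the column of the most recent valid entry.
    np.maximum.accumulate(idx, axis=1, out=idx)
    arr[mask] = arr[np.nonzero(mask)[0], idx[mask]]
    if nan_num:
        log.info(str(arr))  # array after filling
    return arr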
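
Outside Quantopian there is no built-in `log`, so the following is only a rough sketch of the same idea using Python's standard `logging` module. The name `nanfill_counted` and the sample array are hypothetical. It also reports how many NaNs remain after filling, which makes leading NaNs (those with no earlier value to copy) visible.

```python
import logging
import numpy as np

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("nanfill")  # stand-in for Quantopian's log

def nanfill_counted(arr):
    # Hypothetical variant: fill in place, return NaN counts before and after.
    mask = np.isnan(arr)
    before = int(np.count_nonzero(mask))
    if before:
        idx = np.where(~mask, np.arange(mask.shape[1]), 0)
        np.maximum.accumulate(idx, axis=1, out=idx)
        arr[mask] = arr[np.nonzero(mask)[0], idx[mask]]
    after = int(np.count_nonzero(np.isnan(arr)))
    if before:
        logger.info("NaNs before fill: %d, remaining after fill: %d", before, after)
    return arr, before, after

sample = np.array([[np.nan, 1.0, np.nan],
                   [2.0, np.nan, np.nan]])
filled, before, after = nanfill_counted(sample)
```

In this sample only the leading NaN in the first row has nothing to copy from, so it is the one that remains.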

In my experience, backtests with NaNs forward filled this way have shown some improved performance.
Try it, for example, with the factors in the notebook at https://www.quantopian.com/posts/faster-fundamental-data