Pipeline: check for inf and na values

I calculate an "Earning Yield" factor, but I cannot figure out how to check it for NaN/Inf values.

import pandas as pd  
import numpy as np  
import math  
from quantopian.algorithm import attach_pipeline, pipeline_output  
from quantopian.pipeline import Pipeline  
from quantopian.pipeline import CustomFactor  
from quantopian.pipeline.data.builtin import USEquityPricing  
from quantopian.pipeline.data import morningstar  
from quantopian.pipeline.factors import SimpleMovingAverage


# Create a custom factor subclass that divides enterprise value by EBIT,
# using the most recent fundamentals
class EarningYield(CustomFactor):

    # Pre-declare inputs and window_length  
    inputs = [morningstar.valuation.enterprise_value, morningstar.income_statement.ebit]  
    window_length = 1

    # Compute EV / EBIT from the latest row of each input
    def compute(self, today, assets, out, ev, ebit):  
        out[:] = ev[-1] / ebit[-1]

# Put any initialization logic here.  The context object will be passed to  
# the other methods in your algorithm.  
def initialize(context):  
    pipe = Pipeline()  
    attach_pipeline(pipe, name='my_pipeline')  
    # Construct the custom factor  
    earning_yield = EarningYield()  
    pipe.add(earning_yield, 'earning_yield')  
    remove_nan_inf = np.isfinite(earning_yield)  
    # Use multiple screens to narrow the universe  
    pipe.set_screen(remove_nan_inf)  


def before_trading_start(context, data):  
    # Access results using the name passed to `attach_pipeline`.  
    results = pipeline_output('my_pipeline')  
    print results.sort('earning_yield', ascending=False, na_position='last').head(100)  
    # Define a universe with the results of a Pipeline.  
    # Take the ten assets with the lowest earning_yield.
    update_universe(results.sort('earning_yield').index[:10])

There was a runtime error.
11 responses

At this point, the best way to remove the NaNs is to set a filter. There are two different filters I have used.

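  1. You can combine two filters with | to keep every value that is either less than or greater than 0. This would look like (earning_yield < 0) | (earning_yield > 0). NaN fails both comparisons, so it is dropped. Note that values of exactly 0 are dropped as well.
  2. Or, since NaN != NaN, you can set up a filter that compares the factor with itself. Conceptually this is (earning_yield == earning_yield); the correct way to write it is earning_yield.eq(earning_yield).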

We want to add a notnull filter to make this easier, but these two approaches work now.
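
To see what each of these two filters keeps and drops, here is a minimal sketch on a plain NumPy array rather than on a Pipeline factor. The sample values are made up for illustration. Both filters remove NaN, but neither removes inf.

```python
import numpy as np

# Hypothetical factor values: a normal value, zero, NaN, and both infinities.
values = np.array([2.5, 0.0, np.nan, np.inf, -np.inf])

# Self-equality: NaN is the only value for which x == x is False.
not_nan = values == values

# Sign test: NaN compares False against everything, so it drops out,
# but an exact 0.0 is dropped as well.
nonzero_not_nan = (values < 0) | (values > 0)

print(not_nan)
print(nonzero_not_nan)
```

In both masks the two infinite entries are still True, so infinities need a separate check.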

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

OK, it works for NaN. But what about Inf?
I cannot use np.isfinite(earning_yield) because it doesn't work.

@Marco

np.isfinite(earning_yield) doesn't work because isfinite expects to be passed a numpy array, and earning_yield is an instance of your EarningYield class. The important thing to understand about the Pipeline API Factor objects is that they aren't arrays: they're objects that know how to produce arrays when plugged into the pipeline infrastructure. This means that if you want to do an isfinite check, you should either do it in your factor's compute method, or you should do it in before_trading_start when looking at the pipeline output.

In this particular case, if you're seeing infs, it probably means that there are assets for which ebit is 0 for some entries in our database, so you're getting inf when you divide by that zero. The right place to filter out those entries is probably in your compute function, which would look something like this:

def compute(self, today, assets, out, ev, ebit):  
    results = ev[-1] / ebit[-1]  
    out[:] = np.where(np.isfinite(results), results, FILLVALUE)  

where FILLVALUE would be replaced with whatever value you want to use to replace inf.

You can read more about how np.where works in the NumPy documentation.
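
The same replacement logic can be tried outside the platform on plain arrays. This is only a sketch: safe_ratio and FILL_VALUE are names invented for this example, and using NaN as the fill value is an assumption, not a recommendation from the post above.

```python
import numpy as np

FILL_VALUE = np.nan  # assumption: treat undefined ratios as missing


def safe_ratio(ev_row, ebit_row, fill_value=FILL_VALUE):
    """Divide ev_row by ebit_row, replacing inf and NaN results with fill_value."""
    # Silence the divide-by-zero warnings; the bad entries are replaced below.
    with np.errstate(divide="ignore", invalid="ignore"):
        results = ev_row / ebit_row
    return np.where(np.isfinite(results), results, fill_value)


# Made-up sample data: the second asset has EBIT of 0, the third has 0 / 0.
ev = np.array([100.0, 50.0, 0.0, -20.0])
ebit = np.array([10.0, 0.0, 0.0, 5.0])

ratio = safe_ratio(ev, ebit)
print(ratio)
```

Inside a CustomFactor, the body of safe_ratio would go in compute, with the result assigned to out[:].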

Hi Marco,

Above you are calculating the EV/EBIT ratio. If you reverse the ratio to EBIT/EV, you'll have the EBIT yield on the enterprise value. Because EV is then the denominator, you'll be far less likely to encounter a divide by zero.

I've never understood why the ratios like P/E and EV/EBIT are so popular, when the inverse is a simple percentage yield that you can annualize and compare to bond yields, interest rates, dividend yields, etc. All such ratios have the potential for a divide by zero error.

thanks,
fawce

Fair warning: when calculating EBIT/EV, you need to handle the case of negative EVs, which are technically great bargains, yet will rank improperly compared to the bulk of stocks. In the past, I have floored all EVs at $1 to avoid this problem (and that of zero denominators).

For the curious, a convenient and efficient way to do this is to use np.clip:

In [27]: values = np.random.randn(5, 5) * 2

In [28]: values[0, 0] = np.nan  # Add a nan to show that clip preserves nans.

In [29]: values  
Out[29]:  
array([[        nan,  2.66427821,  1.04620424,  3.10569609, -0.07394394],  
       [-1.19532404, -0.1025992 , -1.85837569, -1.08679584, -2.70198082],  
       [ 1.83385781, -0.08093196, -1.42395295, -1.46883062,  1.55708417],  
       [-5.49540581, -2.1154967 , -1.1082758 ,  0.26625189,  1.28044892],  
       [ 0.88452115,  1.71184365,  2.03239484, -4.04879794,  2.16011577]])

In [30]: np.clip(values, a_min=1.0, a_max=None)  # Clip lower bound of the array to 1.0  
Out[30]:  
array([[        nan,  2.66427821,  1.04620424,  3.10569609,  1.        ],  
       [ 1.        ,  1.        ,  1.        ,  1.        ,  1.        ],  
       [ 1.83385781,  1.        ,  1.        ,  1.        ,  1.55708417],  
       [ 1.        ,  1.        ,  1.        ,  1.        ,  1.28044892],  
       [ 1.        ,  1.71184365,  2.03239484,  1.        ,  2.16011577]])  

Note that you still need to handle NaN, either before or after clipping.
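
As a rough sketch of how the floor and the NaN handling fit together for EBIT/EV, assuming made-up EV and EBIT values and a floor of 1.0:

```python
import numpy as np

# Made-up sample data: one negative EV, one zero EV, and one missing EV.
ev = np.array([200.0, -50.0, 0.0, np.nan])
ebit = np.array([20.0, 5.0, 1.0, 3.0])

# Floor EV at 1.0; NaN passes through the clip unchanged.
floored_ev = np.clip(ev, 1.0, None)

# With the floor in place the denominator is never zero or negative.
ebit_yield = ebit / floored_ev

# The missing EV is still NaN afterwards, so mask it out explicitly.
valid = ~np.isnan(ebit_yield)

print(ebit_yield)
print(valid)
```

Here the mask is built after clipping; dropping or filling the NaN entries before the clip would work just as well.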

I've opened a PR in Zipline that adds built-in support for isnan, notnan, and isfinite methods to all Factor instances. It's a pretty straightforward change, so I expect it to land in Zipline master within a day or two. It should make it downstream to the Quantopian platform shortly after that.

The code changes can be seen here.

Hope that helps,
-Scott

I continue to be impressed by the potential of this platform.

I am trying to do two things with pipeline factors that I think would also be useful to others. This is particularly relevant when using multiple factors: I am attempting to re-create O'Shaughnessy's Trending Value Portfolio in Pipeline, which adds up the deciles of 6 different factors and then ranks the totals.

The first is to allocate a decile rank to the factor, as opposed to a simple rank. The second is to set the nan to the median of the selected stock universe.

I can see how to set the nan above, but I am not sure whether calculating the median in the factor class works.
Any suggestions?

Hi Paul,
We don't have an easy way to do the decile rank today, but it's on the list of things to build. I think at this point you would have to do a simple rank and then do pandas wizardry, but someone else might be able to suggest a nicer approach.
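
One possible shape for the pandas step, applied to a factor column after it comes out of the pipeline. This is a sketch under assumptions: median_filled_decile is a name invented here, the data is made up, and filling NaN with the median before ranking is just one reading of the request above.

```python
import numpy as np
import pandas as pd


def median_filled_decile(factor, n_bins=10):
    """Fill NaN with the cross-sectional median, then assign bins 1..n_bins."""
    filled = factor.fillna(factor.median())
    # Rank first so ties get distinct positions and the bin edges stay unique.
    ranks = filled.rank(method="first")
    return pd.qcut(ranks, n_bins, labels=False) + 1


# Made-up factor values for 20 hypothetical assets, two of them missing.
factor = pd.Series(
    [0.5, np.nan, 1.2, 3.4, 0.1, 2.2, np.nan, 0.9, 4.1, 1.7,
     2.9, 0.3, 3.8, 1.1, 2.5, 0.7, 3.1, 1.9, 4.5, 2.0],
    index=["S%02d" % i for i in range(20)],
)

deciles = median_filled_decile(factor)
print(deciles.sort_values())
```

The decile columns of several factors could then be summed and the total ranked, which is the combination step described in the question.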

Paul--were you ever able to successfully implement the strategy?

I know that it is nowadays possible to check for inf and na values of a factor within the pipeline, but apparently you cannot replace na values within pipeline. Will replacing na values of a factor within the pipeline be implemented anytime soon? Without the replacement feature, it is painful to merge two factors that have plenty of na values into a single combined factor.

A simple example of when this would be useful:

universe = Q500US()  
returns = Returns(mask=universe)  
positive_returns = returns > 0  
negative_returns = returns < 0  
factor_a = sd(returns[positive_returns])  
factor_b = sd(returns[negative_returns])  
combined_factor = factor_a.fillna(0.0) - factor_b.fillna(0.0)  

Currently you cannot do something like this example within the pipeline.
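
Until something like that is available, one workaround is to do the fill on the pipeline output instead. The sketch below uses a hand-built DataFrame as a stand-in for that output. The column names and values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Stand-in for a pipeline output with two factor columns that contain NaN.
results = pd.DataFrame(
    {
        "factor_a": [0.02, np.nan, 0.05],
        "factor_b": [np.nan, 0.03, 0.01],
    },
    index=["AAA", "BBB", "CCC"],
)

# Replace missing values with 0.0 before combining, so one missing
# factor does not turn the combined value into NaN.
results["combined_factor"] = (
    results["factor_a"].fillna(0.0) - results["factor_b"].fillna(0.0)
)

print(results)
```

The drawback is that the combined column only exists after the pipeline has run, so it cannot be used in a pipeline screen.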