TypeError: MaximizeAlpha() expected a value with dtype 'float64' or 'int64' for argument 'alphas', but got 'object' instead.

Hi All,

I sometimes get this error message in the IDE when running a backtest. Does anyone know the reason for it, and how to fix it? A backtest can run fine for a while and then suddenly throw this error. Does anyone else get this? I tried searching the forum but couldn't find anything.

TypeError: MaximizeAlpha() expected a value with dtype 'float64' or 'int64' for argument 'alphas', but got 'object' instead.

Is the Optimize API receiving a 'string' instead of a 'float' or 'integer'? If so, would this be a 'data input' error, where a data field was entered incorrectly in the dataset as a string rather than a float? Or something else?

Assuming it's a 'string issue', how can I either convert the string to a float or screen out any 'string values'? Here is an example of the 'screen' I'm using in Pipeline (to filter out non-relevant values from my 'alpha_factor' that's passed to MaximizeAlpha()), but it still produces the above error message.

screen = universe & alpha_factor.notnan() & alpha_factor.notnull() & alpha_factor.isfinite()  

I'd appreciate any help on this. Thanks.


Hi Joakim,

One reason why this could be happening is that MaximizeAlpha is receiving an empty Series as input (which defaults to object dtype). An easy way to verify this is to log the size of your input to MaximizeAlpha right before calling it.

You can guard against this error by checking if your pipeline output is empty (using Series.empty) before invoking MaximizeAlpha:

def rebalance(context, data):  
    # Retrieve alpha from pipeline output  
    alpha = context.pipeline_data.sentiment_score

    if not alpha.empty:  
        # Create MaximizeAlpha objective  
        objective = opt.MaximizeAlpha(alpha)  
    ...  

I got this code sample from the Getting Started tutorial.
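
For what it's worth, the mechanism is easy to see in plain pandas: a DataFrame created with column labels but no data (which is essentially what a pipeline output looks like when the screen passes nothing) carries object-dtype columns. A minimal illustration, not Quantopian-specific:

    import pandas as pd

    # An empty DataFrame built from column labels alone has object-dtype
    # columns, so selecting a column yields Series([], dtype: object),
    # exactly the dtype MaximizeAlpha rejects.
    df = pd.DataFrame(columns=['combined_factor'])
    print(df['combined_factor'].dtype)  # object
    print(pd.Series([1.0]).dtype)       # float64, what MaximizeAlpha expects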


Hi Ernesto,

Thanks for your help, I really appreciate it. Unfortunately your suggestion didn't fix my problem in this case. I'll see if I can reproduce it in a 'dummy' sample algo that I'll share either here or with Q Support, to see if that will help with troubleshooting.

@Joakim I'd be happy to help if you could share an algo (dummy or otherwise).

Hi Dan,

Great, thank you! Here's one based on the not-so-dummy L/S equity template algo. I've basically just tried to replace the Morningstar fields with the FactSet fields for the 'value' factor (it may not be entirely accurate, hence 'dummy' :)).

In this case, I suspect it's due to dividing by 0, which may then default to an 'object' dtype? The reason I think so is that my 'hack' of adding 0.0000001 to enterprise value works in this case. If you (or anyone else) know of a good way to avoid the risk of dividing by 0 (which is oftentimes a perfectly valid value; 0 long-term debt, for example, is great in my view), I'd be all ears. Maybe winsorizing the final factor would help, but I don't want to winsorize too much.

Also, how do you troubleshoot this stuff? Do you print these values out to the logs? I'm hoping to learn to troubleshoot errors myself so I don't have to ask for help in the forums each time.
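
(One low-tech way to do exactly that, sketched below with names taken from the attached template algo, is to log the size and dtype of the alpha series each morning; an empty, object-dtype series then shows up in the logs before MaximizeAlpha chokes on it:)

    # Sketch only; 'combined_factor' and the pipeline name are assumed
    # from the attached template algo.
    def before_trading_start(context, data):
        context.pipeline_data = algo.pipeline_output('long_short_equity_template')
        context.risk_loadings = algo.pipeline_output('risk_factors')
        alpha = context.pipeline_data.combined_factor
        log.info('alpha rows: {}, dtype: {}'.format(len(alpha), alpha.dtype))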

(Backtest attached. Clone the algorithm to view the full results.)
import quantopian.algorithm as algo
import quantopian.optimize as opt
from quantopian.pipeline import Pipeline
from quantopian.pipeline.factors import SimpleMovingAverage

from quantopian.pipeline.filters import QTradableStocksUS
from quantopian.pipeline.experimental import risk_loading_pipeline

from quantopian.pipeline.data import Fundamentals as msf
from quantopian.pipeline.data.factset import Fundamentals as fsf


# Constraint Parameters
MAX_GROSS_LEVERAGE = 1.0
TOTAL_POSITIONS = 2000

# Here we define the maximum position size that can be held for any
# given stock. If you have a different idea of what these maximum
# sizes should be, feel free to change them. Keep in mind that the
# optimizer needs some leeway in order to operate. Namely, if your
# maximum is too small, the optimizer may be overly-constrained.
MAX_SHORT_POSITION_SIZE = 2.0 / TOTAL_POSITIONS
MAX_LONG_POSITION_SIZE = 2.0 / TOTAL_POSITIONS


def initialize(context):
    """
    A core function called automatically once at the beginning of a backtest.

    Use this function for initializing state or other bookkeeping.

    Parameters
    ----------
    context : AlgorithmContext
        An object that can be used to store state that you want to maintain in 
        your algorithm. context is automatically passed to initialize, 
        before_trading_start, handle_data, and any functions run via schedule_function.
        context provides the portfolio attribute, which can be used to retrieve information 
        about current positions.
    """
    
    algo.attach_pipeline(make_pipeline(), 'long_short_equity_template')

    # Attach the pipeline for the risk model factors that we
    # want to neutralize in the optimization step. The 'risk_factors' string is 
    # used to retrieve the output of the pipeline in before_trading_start below.
    algo.attach_pipeline(risk_loading_pipeline(), 'risk_factors')

    # Schedule our rebalance function
    algo.schedule_function(func=rebalance,
                           date_rule=algo.date_rules.every_day(), #week_start(),
                           time_rule=algo.time_rules.market_open(hours=0, minutes=30),
                           half_days=True)

    # Record our portfolio variables at the end of day
    algo.schedule_function(func=record_vars,
                           date_rule=algo.date_rules.every_day(),
                           time_rule=algo.time_rules.market_close(),
                           half_days=True)


def make_pipeline():
    """
    A function that creates and returns our pipeline.

    We break this piece of logic out into its own function to make it easier to
    test and modify in isolation. In particular, this function can be
    copy/pasted into research and run by itself.

    Returns
    -------
    pipe : Pipeline
        Represents computation we would like to perform on the assets that make
        it through the pipeline screen.
    """
    # The factors we create here are based on fundamentals data and a moving
    # average of sentiment data
    # value = msf.ebit.latest / msf.enterprise_value.latest
    # quality = msf.roe.latest
    # sentiment_score = SimpleMovingAverage(
    #     inputs=[stocktwits.bull_minus_bear],
    #     window_length=3,
    # )
    universe = QTradableStocksUS()

    
    ebit_oper_ltm = fsf.ebit_oper_ltm.latest
    ebit_oper_ltm_win = ebit_oper_ltm.winsorize(min_percentile=0.01, max_percentile=0.99)
    
    entrpr_val_qf = fsf.entrpr_val_qf.latest #+ 0.00000001
    entrpr_val_qf_win = entrpr_val_qf.winsorize(min_percentile=0.01, max_percentile=0.99)

    
    fsf_value = ebit_oper_ltm_win / entrpr_val_qf_win

    # Here we combine our winsorized factors, z-scoring them to equalize their influence
    combined_factor = (
        # value_winsorized.zscore() + 
        # quality_winsorized.zscore() + 
        # sentiment_score_winsorized.zscore()
        fsf_value.zscore(mask=universe)
        
    )

    # Build Filters representing the top and bottom baskets of stocks by our
    # combined ranking system. We'll use these as our tradeable universe each
    # day.
    longs = combined_factor.top(TOTAL_POSITIONS//2, mask=universe)
    shorts = combined_factor.bottom(TOTAL_POSITIONS//2, mask=universe)

    # The final output of our pipeline should only include
    # the top/bottom TOTAL_POSITIONS/2 (here 1000) stocks by our criteria
    long_short_screen = (longs | shorts)

    # Create pipeline
    pipe = Pipeline(
        columns={
            'longs': longs,
            'shorts': shorts,
            'combined_factor': combined_factor
        },
        screen=long_short_screen
    )
    return pipe


def before_trading_start(context, data):
    """
    Optional core function called automatically before the open of each market day.

    Parameters
    ----------
    context : AlgorithmContext
        See description above.
    data : BarData
        An object that provides methods to get price and volume data, check
        whether a security exists, and check the last time a security traded.
    """
    # Call algo.pipeline_output to get the output
    # Note: this is a dataframe where the index is the SIDs for all
    # securities to pass my screen and the columns are the factors
    # added to the pipeline object above
    context.pipeline_data = algo.pipeline_output('long_short_equity_template')

    # This dataframe will contain all of our risk loadings
    context.risk_loadings = algo.pipeline_output('risk_factors')


def record_vars(context, data):
    """
    A function scheduled to run every day at market close in order to record
    strategy information.

    Parameters
    ----------
    context : AlgorithmContext
        See description above.
    data : BarData
        See description above.
    """
    # Plot the number of positions over time.
    algo.record(num_positions=len(context.portfolio.positions))


# Called every trading day in order to rebalance
# the longs and shorts lists
def rebalance(context, data):
    """
    A function scheduled to run every trading day at 10AM ET in order to
    rebalance the longs and shorts lists.

    Parameters
    ----------
    context : AlgorithmContext
        See description above.
    data : BarData
        See description above.
    """
    # Retrieve pipeline output
    pipeline_data = context.pipeline_data

    risk_loadings = context.risk_loadings

    # Here we define our objective for the Optimize API. We have
    # selected MaximizeAlpha because we believe our combined factor
    # ranking to be proportional to expected returns. This routine
    # will optimize the expected return of our algorithm, going
    # long on the highest expected return and short on the lowest.
    objective = opt.MaximizeAlpha(pipeline_data.combined_factor)

    # Define the list of constraints
    constraints = []
    # Constrain our maximum gross leverage
    constraints.append(opt.MaxGrossExposure(MAX_GROSS_LEVERAGE))

    # Require our algorithm to remain dollar neutral
    constraints.append(opt.DollarNeutral())

    # Add the RiskModelExposure constraint to make use of the
    # default risk model constraints
    neutralize_risk_factors = opt.experimental.RiskModelExposure(
        risk_model_loadings=risk_loadings,
        version=0
    )
    constraints.append(neutralize_risk_factors)

    # With this constraint we enforce that no position can make up
    # greater than MAX_SHORT_POSITION_SIZE on the short side and
    # no greater than MAX_LONG_POSITION_SIZE on the long side. This
    # ensures that we do not overly concentrate our portfolio in
    # one security or a small subset of securities.
    constraints.append(
        opt.PositionConcentration.with_equal_bounds(
            min=-MAX_SHORT_POSITION_SIZE,
            max=MAX_LONG_POSITION_SIZE
        ))

    # Put together all the pieces we defined above by passing
    # them into the algo.order_optimal_portfolio function. This handles
    # all of our ordering logic, assigning appropriate weights
    # to the securities in our universe to maximize our alpha with
    # respect to the given constraints.
    algo.order_optimal_portfolio(
        objective=objective,
        constraints=constraints
    )

@Joakim You were correct in tracing the problem back to dividing by zero. One way to avoid dividing by zero is to use a mask. In this case maybe something like this:

    entrpr_val_qf = fsf.entrpr_val_qf.latest
    non_zero_entrpr_val_qf = entrpr_val_qf != 0

    entrpr_val_qf_win = entrpr_val_qf.winsorize(min_percentile=0.01, max_percentile=0.99, mask=non_zero_entrpr_val_qf)

That excludes any 0 values from the calculation, and the algo runs without an error. I don't generally do this masking, but it seems like good programming practice to include something like this whenever dividing factors. Good catch!
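
(Generalizing that pattern into a small helper; this is only a sketch, and safe_ratio is a hypothetical name, not part of the Quantopian API:)

    # Sketch of a reusable ratio helper built on Dan's masking pattern.
    # Masking the winsorize with a non-zero filter turns zero denominators
    # into NaN, so the division yields NaN rather than inf or an error.
    def safe_ratio(numerator, denominator):
        non_zero = denominator != 0
        return numerator / denominator.winsorize(
            min_percentile=0.01, max_percentile=0.99, mask=non_zero)

    fsf_value = safe_ratio(fsf.ebit_oper_ltm.latest, fsf.entrpr_val_qf.latest)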

(Backtest attached. Clone the algorithm to view the full results.)
import quantopian.algorithm as algo
import quantopian.optimize as opt
from quantopian.pipeline import Pipeline
from quantopian.pipeline.factors import SimpleMovingAverage

from quantopian.pipeline.filters import QTradableStocksUS
from quantopian.pipeline.experimental import risk_loading_pipeline

from quantopian.pipeline.data import Fundamentals as msf
from quantopian.pipeline.data.factset import Fundamentals as fsf


# Constraint Parameters
MAX_GROSS_LEVERAGE = 1.0
TOTAL_POSITIONS = 2000

# Here we define the maximum position size that can be held for any
# given stock. If you have a different idea of what these maximum
# sizes should be, feel free to change them. Keep in mind that the
# optimizer needs some leeway in order to operate. Namely, if your
# maximum is too small, the optimizer may be overly-constrained.
MAX_SHORT_POSITION_SIZE = 2.0 / TOTAL_POSITIONS
MAX_LONG_POSITION_SIZE = 2.0 / TOTAL_POSITIONS


def initialize(context):
    """
    A core function called automatically once at the beginning of a backtest.

    Use this function for initializing state or other bookkeeping.

    Parameters
    ----------
    context : AlgorithmContext
        An object that can be used to store state that you want to maintain in 
        your algorithm. context is automatically passed to initialize, 
        before_trading_start, handle_data, and any functions run via schedule_function.
        context provides the portfolio attribute, which can be used to retrieve information 
        about current positions.
    """
    
    algo.attach_pipeline(make_pipeline(), 'long_short_equity_template')

    # Attach the pipeline for the risk model factors that we
    # want to neutralize in the optimization step. The 'risk_factors' string is 
    # used to retrieve the output of the pipeline in before_trading_start below.
    algo.attach_pipeline(risk_loading_pipeline(), 'risk_factors')

    # Schedule our rebalance function
    algo.schedule_function(func=rebalance,
                           date_rule=algo.date_rules.every_day(), #week_start(),
                           time_rule=algo.time_rules.market_open(hours=0, minutes=30),
                           half_days=True)

    # Record our portfolio variables at the end of day
    algo.schedule_function(func=record_vars,
                           date_rule=algo.date_rules.every_day(),
                           time_rule=algo.time_rules.market_close(),
                           half_days=True)


def make_pipeline():
    """
    A function that creates and returns our pipeline.

    We break this piece of logic out into its own function to make it easier to
    test and modify in isolation. In particular, this function can be
    copy/pasted into research and run by itself.

    Returns
    -------
    pipe : Pipeline
        Represents computation we would like to perform on the assets that make
        it through the pipeline screen.
    """
    # The factors we create here are based on fundamentals data and a moving
    # average of sentiment data
    # value = msf.ebit.latest / msf.enterprise_value.latest
    # quality = msf.roe.latest
    # sentiment_score = SimpleMovingAverage(
    #     inputs=[stocktwits.bull_minus_bear],
    #     window_length=3,
    # )
    universe = QTradableStocksUS()

    
    ebit_oper_ltm = fsf.ebit_oper_ltm.latest
    ebit_oper_ltm_win = ebit_oper_ltm.winsorize(min_percentile=0.01, max_percentile=0.99)
    
    entrpr_val_qf = fsf.entrpr_val_qf.latest #+ 0.00000001
    non_zero_entrpr_val = (entrpr_val_qf > 0) | (entrpr_val_qf < 0)  # True only for non-zero, non-NaN values

    entrpr_val_qf_win = entrpr_val_qf.winsorize(min_percentile=0.01, max_percentile=0.99, mask=non_zero_entrpr_val)

    
    fsf_value = ebit_oper_ltm_win / entrpr_val_qf_win

    # Here we combine our winsorized factors, z-scoring them to equalize their influence
    combined_factor = (
        # value_winsorized.zscore() + 
        # quality_winsorized.zscore() + 
        # sentiment_score_winsorized.zscore()
        fsf_value.zscore(mask=universe)
        
    )

    # Build Filters representing the top and bottom baskets of stocks by our
    # combined ranking system. We'll use these as our tradeable universe each
    # day.
    longs = combined_factor.top(TOTAL_POSITIONS//2, mask=universe)
    shorts = combined_factor.bottom(TOTAL_POSITIONS//2, mask=universe)

    # The final output of our pipeline should only include
    # the top/bottom TOTAL_POSITIONS/2 (here 1000) stocks by our criteria
    long_short_screen = (longs | shorts)

    # Create pipeline
    pipe = Pipeline(
        columns={
            'longs': longs,
            'shorts': shorts,
            'combined_factor': combined_factor
        },
        screen=long_short_screen
    )
    return pipe


def before_trading_start(context, data):
    """
    Optional core function called automatically before the open of each market day.

    Parameters
    ----------
    context : AlgorithmContext
        See description above.
    data : BarData
        An object that provides methods to get price and volume data, check
        whether a security exists, and check the last time a security traded.
    """
    # Call algo.pipeline_output to get the output
    # Note: this is a dataframe where the index is the SIDs for all
    # securities to pass my screen and the columns are the factors
    # added to the pipeline object above
    context.pipeline_data = algo.pipeline_output('long_short_equity_template')

    # This dataframe will contain all of our risk loadings
    context.risk_loadings = algo.pipeline_output('risk_factors')


def record_vars(context, data):
    """
    A function scheduled to run every day at market close in order to record
    strategy information.

    Parameters
    ----------
    context : AlgorithmContext
        See description above.
    data : BarData
        See description above.
    """
    # Plot the number of positions over time.
    algo.record(num_positions=len(context.portfolio.positions))


# Called every trading day in order to rebalance
# the longs and shorts lists
def rebalance(context, data):
    """
    A function scheduled to run every trading day at 10AM ET in order to
    rebalance the longs and shorts lists.

    Parameters
    ----------
    context : AlgorithmContext
        See description above.
    data : BarData
        See description above.
    """
    # Retrieve pipeline output
    pipeline_data = context.pipeline_data

    risk_loadings = context.risk_loadings

    # Here we define our objective for the Optimize API. We have
    # selected MaximizeAlpha because we believe our combined factor
    # ranking to be proportional to expected returns. This routine
    # will optimize the expected return of our algorithm, going
    # long on the highest expected return and short on the lowest.
    objective = opt.MaximizeAlpha(pipeline_data.combined_factor)

    # Define the list of constraints
    constraints = []
    # Constrain our maximum gross leverage
    constraints.append(opt.MaxGrossExposure(MAX_GROSS_LEVERAGE))

    # Require our algorithm to remain dollar neutral
    constraints.append(opt.DollarNeutral())

    # Add the RiskModelExposure constraint to make use of the
    # default risk model constraints
    neutralize_risk_factors = opt.experimental.RiskModelExposure(
        risk_model_loadings=risk_loadings,
        version=0
    )
    constraints.append(neutralize_risk_factors)

    # With this constraint we enforce that no position can make up
    # greater than MAX_SHORT_POSITION_SIZE on the short side and
    # no greater than MAX_LONG_POSITION_SIZE on the long side. This
    # ensures that we do not overly concentrate our portfolio in
    # one security or a small subset of securities.
    constraints.append(
        opt.PositionConcentration.with_equal_bounds(
            min=-MAX_SHORT_POSITION_SIZE,
            max=MAX_LONG_POSITION_SIZE
        ))

    # Put together all the pieces we defined above by passing
    # them into the algo.order_optimal_portfolio function. This handles
    # all of our ordering logic, assigning appropriate weights
    # to the securities in our universe to maximize our alpha with
    # respect to the given constraints.
    algo.order_optimal_portfolio(
        objective=objective,
        constraints=constraints
    )

Thanks Dan, super helpful as always!

Do you know if there's any way to use a mask based on dtype != 'object' as well? The reason I'm asking is that I sometimes get this error even for factors where I'm not at risk of dividing by zero. Maybe in the Pipeline 'screen', as more of a 'catchall' filter?

The error

TypeError: MaximizeAlpha() expected a value with dtype 'float64' or 'int64' for argument 'alphas', but got 'object' instead.

is typically raised because the series passed to 'MaximizeAlpha' is empty. Unless one is doing something fancy, like changing what is being passed based on certain conditions, it will always be the same factor, and that factor will always have the same dtype UNLESS it has no values at all (i.e. it's empty). That empty series being presented to 'MaximizeAlpha' is what causes the error.

One approach is to figure out why it would be empty (in this case, a divide by zero) and fix that. A brute-force approach (which I often use), however, is simply to wrap the call in a 'try/except' statement. This doesn't fix anything, but it at least keeps the code going. Something like this:

    try:  
        objective = opt.MaximizeAlpha(pipeline_data.combined_factor)  
    except:  
        log.info('MaximizeAlpha exception')  
        log.info(pipeline_data.combined_factor)  
        # can't really do the rest so just return  
        return

Take a look at the logs in the attached backtest. Notice the logged value of 'pipeline_data.combined_factor' when there is an exception. It shows:

INFO Series([], Name: combined_factor, dtype: object)

There's the culprit. The series passed to the 'MaximizeAlpha' method is empty.

Again, that doesn't fix it. It's best to now figure out why it would be empty when one is probably always expecting values.
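
(As a catchall along the lines Joakim asked about, one could also guard on emptiness and dtype explicitly instead of using a bare except; a sketch, reusing the template's names:)

    # Sketch: skip the day's rebalance when the alpha series is empty
    # or non-numeric, instead of letting MaximizeAlpha raise.
    alpha = pipeline_data.combined_factor
    if alpha.empty or alpha.dtype.kind not in 'if':  # not int/float
        log.info('Skipping rebalance: {} rows, dtype {}'.format(
            len(alpha), alpha.dtype))
        return
    objective = opt.MaximizeAlpha(alpha)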

Hope that helps Joakim.

(Backtest attached. Clone the algorithm to view the full results.)
import quantopian.algorithm as algo
import quantopian.optimize as opt
from quantopian.pipeline import Pipeline
from quantopian.pipeline.factors import SimpleMovingAverage

from quantopian.pipeline.filters import QTradableStocksUS
from quantopian.pipeline.experimental import risk_loading_pipeline

from quantopian.pipeline.data import Fundamentals as msf
from quantopian.pipeline.data.factset import Fundamentals as fsf


# Constraint Parameters
MAX_GROSS_LEVERAGE = 1.0
TOTAL_POSITIONS = 2000

# Here we define the maximum position size that can be held for any
# given stock. If you have a different idea of what these maximum
# sizes should be, feel free to change them. Keep in mind that the
# optimizer needs some leeway in order to operate. Namely, if your
# maximum is too small, the optimizer may be overly-constrained.
MAX_SHORT_POSITION_SIZE = 2.0 / TOTAL_POSITIONS
MAX_LONG_POSITION_SIZE = 2.0 / TOTAL_POSITIONS


def initialize(context):
    """
    A core function called automatically once at the beginning of a backtest.

    Use this function for initializing state or other bookkeeping.

    Parameters
    ----------
    context : AlgorithmContext
        An object that can be used to store state that you want to maintain in 
        your algorithm. context is automatically passed to initialize, 
        before_trading_start, handle_data, and any functions run via schedule_function.
        context provides the portfolio attribute, which can be used to retrieve information 
        about current positions.
    """
    
    algo.attach_pipeline(make_pipeline(), 'long_short_equity_template')

    # Attach the pipeline for the risk model factors that we
    # want to neutralize in the optimization step. The 'risk_factors' string is 
    # used to retrieve the output of the pipeline in before_trading_start below.
    algo.attach_pipeline(risk_loading_pipeline(), 'risk_factors')

    # Schedule our rebalance function
    algo.schedule_function(func=rebalance,
                           date_rule=algo.date_rules.every_day(), #week_start(),
                           time_rule=algo.time_rules.market_open(hours=0, minutes=30),
                           half_days=True)

    # Record our portfolio variables at the end of day
    algo.schedule_function(func=record_vars,
                           date_rule=algo.date_rules.every_day(),
                           time_rule=algo.time_rules.market_close(),
                           half_days=True)


def make_pipeline():
    """
    A function that creates and returns our pipeline.

    We break this piece of logic out into its own function to make it easier to
    test and modify in isolation. In particular, this function can be
    copy/pasted into research and run by itself.

    Returns
    -------
    pipe : Pipeline
        Represents computation we would like to perform on the assets that make
        it through the pipeline screen.
    """
    # The factors we create here are based on fundamentals data and a moving
    # average of sentiment data
    # value = msf.ebit.latest / msf.enterprise_value.latest
    # quality = msf.roe.latest
    # sentiment_score = SimpleMovingAverage(
    #     inputs=[stocktwits.bull_minus_bear],
    #     window_length=3,
    # )
    universe = QTradableStocksUS()

    
    ebit_oper_ltm = fsf.ebit_oper_ltm.latest
    ebit_oper_ltm_win = ebit_oper_ltm.winsorize(min_percentile=0.01, max_percentile=0.99)
    
    entrpr_val_qf = fsf.entrpr_val_qf.latest #+ 0.00000001
    non_zero_entrpr_val = entrpr_val_qf != 0

    # mask left commented out on purpose here, to reproduce the empty-output days
    entrpr_val_qf_win = entrpr_val_qf.winsorize(min_percentile=0.01, max_percentile=0.99) #, mask=non_zero_entrpr_val)

    
    fsf_value = ebit_oper_ltm_win / entrpr_val_qf_win

    # Here we combine our winsorized factors, z-scoring them to equalize their influence
    combined_factor = (
        # value_winsorized.zscore() + 
        # quality_winsorized.zscore() + 
        # sentiment_score_winsorized.zscore()
        fsf_value.zscore(mask=universe)
        
    )

    # Build Filters representing the top and bottom baskets of stocks by our
    # combined ranking system. We'll use these as our tradeable universe each
    # day.
    longs = combined_factor.top(TOTAL_POSITIONS//2, mask=universe)
    shorts = combined_factor.bottom(TOTAL_POSITIONS//2, mask=universe)

    # The final output of our pipeline should only include
    # the top/bottom TOTAL_POSITIONS/2 (here 1000) stocks by our criteria
    long_short_screen = (longs | shorts)

    # Create pipeline
    pipe = Pipeline(
        columns={
            'longs': longs,
            'shorts': shorts,
            'combined_factor': combined_factor
        },
        screen=long_short_screen
    )
    return pipe


def before_trading_start(context, data):
    """
    Optional core function called automatically before the open of each market day.

    Parameters
    ----------
    context : AlgorithmContext
        See description above.
    data : BarData
        An object that provides methods to get price and volume data, check
        whether a security exists, and check the last time a security traded.
    """
    # Call algo.pipeline_output to get the output
    # Note: this is a dataframe where the index is the SIDs for all
    # securities to pass my screen and the columns are the factors
    # added to the pipeline object above
    context.pipeline_data = algo.pipeline_output('long_short_equity_template')

    # This dataframe will contain all of our risk loadings
    context.risk_loadings = algo.pipeline_output('risk_factors')


def record_vars(context, data):
    """
    A function scheduled to run every day at market close in order to record
    strategy information.

    Parameters
    ----------
    context : AlgorithmContext
        See description above.
    data : BarData
        See description above.
    """
    # Plot the number of positions over time.
    algo.record(num_positions=len(context.portfolio.positions))


# Called every trading day in order to rebalance
# the longs and shorts lists
def rebalance(context, data):
    """
    A function scheduled to run every trading day at 10AM ET in order to
    rebalance the longs and shorts lists.

    Parameters
    ----------
    context : AlgorithmContext
        See description above.
    data : BarData
        See description above.
    """
    # Retrieve pipeline output
    pipeline_data = context.pipeline_data

    risk_loadings = context.risk_loadings

    # Here we define our objective for the Optimize API. We have
    # selected MaximizeAlpha because we believe our combined factor
    # ranking to be proportional to expected returns. This routine
    # will optimize the expected return of our algorithm, going
    # long on the highest expected return and short on the lowest.
    try:
        objective = opt.MaximizeAlpha(pipeline_data.combined_factor)
    except:
        log.info('MaximizeAlpha exception')
        log.info(pipeline_data.combined_factor)
        # can't really do the rest so just return
        return

    # Define the list of constraints
    constraints = []
    # Constrain our maximum gross leverage
    constraints.append(opt.MaxGrossExposure(MAX_GROSS_LEVERAGE))

    # Require our algorithm to remain dollar neutral
    constraints.append(opt.DollarNeutral())

    # Add the RiskModelExposure constraint to make use of the
    # default risk model constraints
    neutralize_risk_factors = opt.experimental.RiskModelExposure(
        risk_model_loadings=risk_loadings,
        version=0
    )
    constraints.append(neutralize_risk_factors)

    # With this constraint we enforce that no position can make up
    # greater than MAX_SHORT_POSITION_SIZE on the short side and
    # no greater than MAX_LONG_POSITION_SIZE on the long side. This
    # ensures that we do not overly concentrate our portfolio in
    # one security or a small subset of securities.
    constraints.append(
        opt.PositionConcentration.with_equal_bounds(
            min=-MAX_SHORT_POSITION_SIZE,
            max=MAX_LONG_POSITION_SIZE
        ))

    # Put together all the pieces we defined above by passing
    # them into the algo.order_optimal_portfolio function. This handles
    # all of our ordering logic, assigning appropriate weights
    # to the securities in our universe to maximize our alpha with
    # respect to the given constraints.
    algo.order_optimal_portfolio(
            objective=objective,
            constraints=constraints
        )

So awesome, thank you Dan!!

One way I've been thinking about avoiding division by zero (since zero can oftentimes be a perfectly valid value, I don't want to filter it out unless I really have to) is to 'move' a normalized z-score from a range of roughly -3 to 3 (winsorized by 1% on each side) to a range of, say, 1 to 3. That way there's no risk of dividing by zero. I could also more easily multiply two factors together, which doesn't really work if I just z-score them (e.g. two negative z-scores multiplied with each other equal a positive number, which is clearly not what I'd want).

Would you know how to do this, or if there's a better way of multiplying/dividing normalized factors, and/or if my above thinking is flawed in some way?
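
(For reference, a minimal sketch of that shift, assuming the winsorized z-score really is roughly bounded in [-3, 3]; fsf_value and universe are the names from the attached algo:)

    # Linearly map a z-score from roughly [-3, 3] onto [1, 3] so it is
    # strictly positive: safe to divide by, and multiplying two such
    # factors no longer flips sign when both are negative.
    z = fsf_value.zscore(mask=universe)   # roughly in [-3, 3]
    shifted = (z + 3.0) / 3.0 + 1.0       # roughly in [1, 3]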

The mask is necessary on these two lines (shown below). It resolves the root problem: longs and shorts were both empty, so there was no pipeline output and no alpha for MaximizeAlpha to operate on. This backtest is narrowed down to the dates where the problem was occurring and shows that, with the mask added, it continues through them without error.

    ebit_oper_ltm_win = ebit_oper_ltm.winsorize(min_percentile=0.01, max_percentile=0.99, mask=universe)  
    entrpr_val_qf_win = entrpr_val_qf.winsorize(min_percentile=0.01, max_percentile=0.99, mask=universe)  
(Backtest attached. Clone the algorithm to view the full results.)
import quantopian.algorithm as algo
import quantopian.optimize as opt
from quantopian.pipeline import Pipeline
from quantopian.pipeline.factors import SimpleMovingAverage

from quantopian.pipeline.filters import QTradableStocksUS
from quantopian.pipeline.experimental import risk_loading_pipeline

from quantopian.pipeline.data import Fundamentals as msf
from quantopian.pipeline.data.factset import Fundamentals as fsf


# Constraint Parameters
MAX_GROSS_LEVERAGE = 1.0
TOTAL_POSITIONS = 2000

# Here we define the maximum position size that can be held for any
# given stock. If you have a different idea of what these maximum
# sizes should be, feel free to change them. Keep in mind that the
# optimizer needs some leeway in order to operate. Namely, if your
# maximum is too small, the optimizer may be overly-constrained.
MAX_SHORT_POSITION_SIZE = 2.0 / TOTAL_POSITIONS
MAX_LONG_POSITION_SIZE = 2.0 / TOTAL_POSITIONS


def initialize(context):
    """
    A core function called automatically once at the beginning of a backtest.

    Use this function for initializing state or other bookkeeping.

    Parameters
    ----------
    context : AlgorithmContext
        An object that can be used to store state that you want to maintain in 
        your algorithm. context is automatically passed to initialize, 
        before_trading_start, handle_data, and any functions run via schedule_function.
        context provides the portfolio attribute, which can be used to retrieve information 
        about current positions.
    """
    
    algo.attach_pipeline(make_pipeline(), 'long_short_equity_template')

    # Attach the pipeline for the risk model factors that we
    # want to neutralize in the optimization step. The 'risk_factors' string is 
    # used to retrieve the output of the pipeline in before_trading_start below.
    algo.attach_pipeline(risk_loading_pipeline(), 'risk_factors')

    # Schedule rebalance function
    algo.schedule_function(func=rebalance,
                           date_rule=algo.date_rules.every_day(), #week_start(),
                           time_rule=algo.time_rules.market_open(hours=0, minutes=30),
                           half_days=True)

    # Record portfolio variables at the end of day
    algo.schedule_function(func=record_vars,
                           date_rule=algo.date_rules.every_day(),
                           time_rule=algo.time_rules.market_close(),
                           half_days=True)


def make_pipeline():
    """
    A function that creates and returns the pipeline.

    We break this piece of logic out into its own function to make it easier to
    test and modify in isolation. In particular, this function can be
    copy/pasted into research and run by itself.

    Returns
    -------
    pipe : Pipeline
        Represents computation we would like to perform on the assets that make
        it through the pipeline screen.
    """
    # The factors we create here are based on fundamentals data and a moving
    # average of sentiment data
    # value = msf.ebit.latest / msf.enterprise_value.latest
    # quality = msf.roe.latest
    # sentiment_score = SimpleMovingAverage(
    #     inputs=[stocktwits.bull_minus_bear],
    #     window_length=3,
    # )
    
    universe = QTradableStocksUS()
    
    ebit_oper_ltm = fsf.ebit_oper_ltm.latest
    ebit_oper_ltm_win = ebit_oper_ltm.winsorize(min_percentile=0.01, max_percentile=0.99, mask=universe)
    
    entrpr_val_qf = fsf.entrpr_val_qf.latest #+ 0.00000001
    entrpr_val_qf_win = entrpr_val_qf.winsorize(min_percentile=0.01, max_percentile=0.99, mask=universe)
    
    fsf_value = ebit_oper_ltm_win / entrpr_val_qf_win

    # Here we combine winsorized factors, z-scoring them to equalize their influence
    combined_factor = (
        # value_winsorized.zscore() + 
        # quality_winsorized.zscore() + 
        # sentiment_score_winsorized.zscore()
        fsf_value.zscore(mask=universe)
    )

    # Build Filters representing the top and bottom baskets of stocks by our
    # combined ranking system. We'll use these as tradeable universe each
    # day.
    longs  = combined_factor.top   (TOTAL_POSITIONS//2, mask=universe)
    shorts = combined_factor.bottom(TOTAL_POSITIONS//2, mask=universe)

    # The final output of the pipeline should only include
    # the top/bottom TOTAL_POSITIONS/2 (here 1000) stocks by our criteria
    long_short_screen = (longs | shorts)

    # Create pipeline
    pipe = Pipeline(
        columns={
            'ebt_ltm'        : ebit_oper_ltm,
            'ebt_win'        : ebit_oper_ltm_win,
            'qf'             : entrpr_val_qf, 
            'qf_win'         : entrpr_val_qf_win, 
            'fsf'            : fsf_value,
            'longs'          : longs,
            'shorts'         : shorts,
            'combined_factor': combined_factor,
        },
        screen=long_short_screen
    )
    return pipe


def before_trading_start(context, data):
    """
    Optional core function called automatically before the open of each market day.

    Parameters
    ----------
    context : AlgorithmContext
        See description above.
    data : BarData
        An object that provides methods to get price and volume data, check
        whether a security exists, and check the last time a security traded.
    """
    # Call algo.pipeline_output to get the output
    # Note: this is a dataframe where the index is the SIDs for all
    # securities to pass screen and the columns are the factors
    # added to the pipeline object above
    context.pipeline_data = algo.pipeline_output('long_short_equity_template')

    # This dataframe will contain all of risk loadings
    context.risk_loadings = algo.pipeline_output('risk_factors')

    if 'log_data_done' not in context:    # show values once
        log_data(context, data, context.pipeline_data, 4)  # all fields (columns) if unspecified

def record_vars(context, data):
    """
    A function scheduled to run every day at market close in order to record
    strategy information.

    Parameters
    ----------
    context : AlgorithmContext
        See description above.
    data : BarData
        See description above.
    """
    # Plot the number of positions over time.
    algo.record(num_positions=len(context.portfolio.positions))


# Called every trading day in order to rebalance
# the longs and shorts lists
def rebalance(context, data):
    """
    A function scheduled to run every trading day at 10AM ET in order to
    rebalance the longs and shorts lists.

    Parameters
    ----------
    context : AlgorithmContext
        See description above.
    data : BarData
        See description above.
    """
    # Retrieve pipeline output
    pipeline_data = context.pipeline_data

    risk_loadings = context.risk_loadings

    # Here we define objective for the Optimize API. We have
    # selected MaximizeAlpha because we believe combined factor
    # ranking to be proportional to expected returns. This routine
    # will optimize the expected return of algorithm, going
    # long on the highest expected return and short on the lowest.
    try:
        objective = opt.MaximizeAlpha(pipeline_data.combined_factor)
    except:
        log_data(context, data, context.pipeline_data, 4)
        print algo.pipeline_output('long_short_equity_template')
        return

    # Define the list of constraints
    constraints = []
    # Constrain maximum gross leverage
    constraints.append(opt.MaxGrossExposure(MAX_GROSS_LEVERAGE))

    # Require algorithm to remain dollar neutral
    constraints.append(opt.DollarNeutral())

    # Add the RiskModelExposure constraint to make use of the
    # default risk model constraints
    neutralize_risk_factors = opt.experimental.RiskModelExposure(
        risk_model_loadings=risk_loadings,
        version=0
    )
    constraints.append(neutralize_risk_factors)

    # With this constraint we enforce that no position can make up
    # greater than MAX_SHORT_POSITION_SIZE on the short side and
    # no greater than MAX_LONG_POSITION_SIZE on the long side. This
    # ensures that we do not overly concentrate portfolio in
    # one security or a small subset of securities.
    constraints.append(
        opt.PositionConcentration.with_equal_bounds(
            min=-MAX_SHORT_POSITION_SIZE,
            max=MAX_LONG_POSITION_SIZE
        ))

    # Put together all the pieces we defined above by passing
    # them into the algo.order_optimal_portfolio function. This handles
    # all of ordering logic, assigning appropriate weights
    # to the securities in universe to maximize alpha with
    # respect to the given constraints.
    algo.order_optimal_portfolio(
        objective=objective,
        constraints=constraints
    )
    
def log_data(context, data, z, num, fields=None):
    ''' Log info about pipeline output or, z can be any DataFrame or Series
    https://www.quantopian.com/posts/overview-of-pipeline-content-easy-to-add-to-your-backtest
    '''
    if 'log_init_done' not in context:  # {:,} magic for adding commas
        log.info('${:,}    {} to {}'.format(int(context.portfolio.starting_cash),
                get_environment('start').date(), get_environment('end').date()))
        context.log_init_done = 1  # flag checked above, so the header logs once
        context.log_data_done = 1  # flag checked by the caller in before_trading_start

    if not len(z):
        log.info('Empty')
        return

    # Options
    log_nan_only = 0          # Only log if nans are present
    show_sectors = 0          # If sectors, do you want to see them or not
    show_sorted_details = 1   # [num] high & low securities sorted, each column
    padmax = 6                # num characters for each field, starting point

    # Series ......
    if 'Series' in str(type(z)):    # is Series, not DataFrame
        nan_count = len(z[z != z])
        nan_count = 'NaNs {}/{}'.format(nan_count, len(z)) if nan_count else ''
        if (log_nan_only and nan_count) or not log_nan_only:
            pad = max( padmax, len('%.5f' % z.max()) )
            log.info('{}{}{}   Series  len {}'.format('min'.rjust(pad+5),
                'mean'.rjust(pad+5), 'max'.rjust(pad+5), len(z)))
            log.info('{}{}{} {}'.format(
                ('%.5f' % z.min()) .rjust(pad+5),
                ('%.5f' % z.mean()).rjust(pad+5),
                ('%.5f' % z.max()) .rjust(pad+5),
                nan_count
            ))
            log.info('High\n{}'.format(z.sort_values(ascending=False).head(num)))
            log.info('Low\n{}' .format(z.sort_values(ascending=False).tail(num)))
        return

    # DataFrame ......
    content_min_max = [ ['','min','mean','max',''] ] ; content = ''
    for col in z.columns:
        try: z[col].max()
        except: continue   # skip non-numeric
        if col == 'sector' and not show_sectors: continue
        nan_count = len(z[col][z[col] != z[col]])
        nan_count = 'NaNs {}/{}'.format(nan_count, len(z)) if nan_count else ''
        padmax    = max( padmax, len(str(z[col].max())) )
        content_min_max.append([col, str(z[col] .min()), str(z[col].mean()), str(z[col] .max()), nan_count])
    if log_nan_only and nan_count or not log_nan_only:
        content = 'Rows: {}  Columns: {}'.format(z.shape[0], z.shape[1])
        if len(z.columns) == 1: content = 'Rows: {}'.format(z.shape[0])

        paddings = [6 for i in range(4)]
        for lst in content_min_max:    # set max lengths
            i = 0
            for val in lst[:4]:    # value in each sub-list
                paddings[i] = max(paddings[i], len(str(val)))
                i += 1
        headr = content_min_max[0]
        content += ('\n{}{}{}{}{}'.format(
             headr[0] .rjust(paddings[0]),
            (headr[1]).rjust(paddings[1]+5),
            (headr[2]).rjust(paddings[2]+5),
            (headr[3]).rjust(paddings[3]+5),
            ''
        ))
        for lst in content_min_max[1:]:    # populate content using max lengths
            content += ('\n{}{}{}{}     {}'.format(
                lst[0].rjust(paddings[0]),
                lst[1].rjust(paddings[1]+5),
                lst[2].rjust(paddings[2]+5),
                lst[3].rjust(paddings[3]+5),
                lst[4],
            ))
        log.info(content)

    if not show_sorted_details: return
    if len(z.columns) == 1:     return     # skip detail if only 1 column
    details = z.columns if fields is None else fields
    for detail in details:
        if detail == 'sector' and not show_sectors: continue
        hi = z[details].sort_values(by=detail, ascending=False).head(num)
        lo = z[details].sort_values(by=detail, ascending=False).tail(num)
        content  = ''
        content += ('_ _ _   {}   _ _ _'  .format(detail))
        content += ('\n\t... {} highs\n{}'.format(detail, str(hi)))
        content += ('\n\t... {} lows \n{}'.format(detail, str(lo)))
        if log_nan_only and not len(lo[lo[detail] != lo[detail]]):
            continue  # skip if no nans
        log.info(content)
    

Perhaps try this, Joakim, and see if it stabilises:

try:
    # assumes 'import numpy as np' at the top of the algorithm
    context.pipeline_data = algo.pipeline_output('long_short_equity_template').replace([np.inf, -np.inf], np.nan)
    context.pipeline_data.dropna(inplace=True)
except Exception as message:
    log.warn("pipeline_data error: {}".format(message))
    return

First, it replaces any inf values (undesirable values rather than errors from Pipeline) with NaN so they can be dropped from the set.
Second, it wraps the call in try/except so the algorithm skips the session but continues to run, while reporting the error for diagnosis.

PS: you may also add np.nan to the list of values to be replaced, and use any replacement value you like, for example:

df.replace([np.inf, -np.inf, np.nan], 0.999)  
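
(A toy pandas example of the replace/dropna combination, nothing Quantopian-specific:)

    import numpy as np
    import pandas as pd

    # inf/-inf become NaN, then dropna removes those rows.
    df = pd.DataFrame({'alpha': [1.5, np.inf, -np.inf, np.nan]})
    print(df.replace([np.inf, -np.inf], np.nan).dropna())
    #    alpha
    # 0    1.5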

Hope it helps.

PPS: I'm also thinking you may want to check this:

screen = universe  & alpha_factor.isfinite()  & ~(alpha_factor.notnan() | alpha_factor.notnull())  

Thanks guys, really appreciate your help!

@Karl,

Why is there essentially a double negative in your screen?

~(alpha_factor.notnan() | alpha_factor.notnull())

~ ('tilde' or 'not' right?) and then .notnan() or .notnull(), so essentially 'not' notnan or 'not' notnull?

I'm 99.9% sure you're correct, I'm just trying to understand the 'logic'. :)

Hi Joakim,

My assumption is that alpha_factor.notnan() may not coincide with alpha_factor.notnull(), so negating ( alpha_factor.notnan() | alpha_factor.notnull() ) would preclude either or both from slipping through your alpha_factor.notnan() & alpha_factor.notnull() Boolean logic.

Hope it makes sense, though I may have misinterpreted your intent :)
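
(On the Boolean logic itself: by De Morgan's law, negating an OR is the AND of the negations, so ~(notnan | notnull) is the same as isnan & isnull. A quick numpy illustration:)

    import numpy as np

    a = np.array([True, True, False, False])
    b = np.array([True, False, True, False])
    # ~(a | b) is elementwise identical to (~a) & (~b)
    print(np.array_equal(~(a | b), ~a & ~b))  # True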

Try this and see if it helps you see what's going on:

    #long_short_screen = (longs | shorts)  
    long_short_screen = ~fsf_value.percentile_between(10, 90, mask=universe)  # just some lows & highs, ~ means not  

With that change, there is output on 8/9.

Then toggle use_the_mask_that_works on line 87 off and on. Partial diff below...

In this first section, notice there are no longs or shorts. That means that with the original long_short_screen, the dataframe is of course empty.

use_the_mask_that_works OFF

2011-08-09 07:00 log_data:327 INFO Rows: 6587  Columns: 8  
                               min              mean                max  
combined_factor                nan               nan                nan     NaNs 6587/6587  
        ebt_ltm      -2800180000.0     443824822.892      35156900000.0     NaNs 3886/6587  
        ebt_win       -121594000.0     339883491.163      11939300000.0     NaNs 3886/6587  
            fsf     -252.072043633               inf                inf     NaNs 3962/6587  
          longs              False               0.0              False  
             qf      -5274280000.0     7049287416.09     822583000000.0     NaNs 3212/6587  
         qf_win         -2280840.0     4940088680.93     167153000000.0     NaNs 3212/6587  
         shorts              False               0.0              False  
2011-08-09 07:00 log_data:342 INFO _ _ _   combined_factor   _ _ _  
    ... combined_factor highs  
                      combined_factor     ebt_ltm     ebt_win       fsf  \  
Equity(21 [AAME])                 NaN         NaN         NaN       NaN  
Equity(25 [ARNC_PR])              NaN         NaN         NaN       NaN  
Equity(31 [ABAX])                 NaN  19721000.0  19721000.0  0.034702  
Equity(37 [ABCW])                 NaN         NaN         NaN       NaN  

Notice the difference in fsf here ...

use_the_mask_that_works ON

2011-08-09 07:00 log_data:327 INFO Rows: 6587  Columns: 8  
                                min                mean                max  
combined_factor      -5.80677784964      0.119662267913      20.4170142151     NaNs 6249/6587  
        ebt_ltm       -2800180000.0       448632896.568      35156900000.0     NaNs 3886/6587  
        ebt_win        -145421000.0       550752342.018      12728000000.0     NaNs 6245/6587  
            fsf     -0.396482834994     0.0768264835087      1.69785455888     NaNs 6249/6587  
          longs               False     0.0256565963261               True  
             qf       -5274280000.0       7083996590.61     822583000000.0     NaNs 3212/6587  
         qf_win         338625000.0       11157241586.6     196765000000.0     NaNs 6079/6587  
         shorts               False     0.0256565963261               True  
2011-08-09 07:00 log_data:342 INFO _ _ _   combined_factor   _ _ _  
    ... combined_factor highs  
                      combined_factor      ebt_ltm      ebt_win       fsf  \  
Equity(33807 [IBKR])        20.417014  574936000.0  574936000.0  1.697855  
Equity(36243 [AGNC])        18.013224  509928000.0  509928000.0  1.505878  
Equity(6330 [RAD])           7.835089  234671000.0  234671000.0  0.693011  
Equity(32627 [GTU])          4.957366  156846000.0  156846000.0  0.463185  

Part of it is that without the mask, winsorize is operating on all 8000+ stocks.
Also, with the line below, the output is down to 540 stocks instead of the 6587 above. I had thought this would not be necessary:

    screen = universe & long_short_screen

Not a final answer, just hopefully some info and a test that can help.

(Backtest attached. Clone the algorithm to view the full results.)
import quantopian.algorithm as algo
import quantopian.optimize as opt
from quantopian.pipeline import Pipeline
from quantopian.pipeline.factors import SimpleMovingAverage

from quantopian.pipeline.filters import QTradableStocksUS
from quantopian.pipeline.experimental import risk_loading_pipeline

from quantopian.pipeline.data import Fundamentals as msf
from quantopian.pipeline.data.factset import Fundamentals as fsf


# Constraint Parameters
MAX_GROSS_LEVERAGE = 1.0
TOTAL_POSITIONS = 2000

# Here we define the maximum position size that can be held for any
# given stock. If you have a different idea of what these maximum
# sizes should be, feel free to change them. Keep in mind that the
# optimizer needs some leeway in order to operate. Namely, if your
# maximum is too small, the optimizer may be overly-constrained.
MAX_SHORT_POSITION_SIZE = 2.0 / TOTAL_POSITIONS
MAX_LONG_POSITION_SIZE = 2.0 / TOTAL_POSITIONS


def initialize(context):
    """
    A core function called automatically once at the beginning of a backtest.

    Use this function for initializing state or other bookkeeping.

    Parameters
    ----------
    context : AlgorithmContext
        An object that can be used to store state that you want to maintain in 
        your algorithm. context is automatically passed to initialize, 
        before_trading_start, handle_data, and any functions run via schedule_function.
        context provides the portfolio attribute, which can be used to retrieve information 
        about current positions.
    """
    
    algo.attach_pipeline(make_pipeline(), 'long_short_equity_template')

    # Attach the pipeline for the risk model factors that we
    # want to neutralize in the optimization step. The 'risk_factors' string is 
    # used to retrieve the output of the pipeline in before_trading_start below.
    algo.attach_pipeline(risk_loading_pipeline(), 'risk_factors')

    # Schedule rebalance function
    algo.schedule_function(func=rebalance,
                           date_rule=algo.date_rules.every_day(), #week_start(),
                           time_rule=algo.time_rules.market_open(hours=0, minutes=30),
                           half_days=True)

    # Record portfolio variables at the end of day
    algo.schedule_function(func=record_vars,
                           date_rule=algo.date_rules.every_day(),
                           time_rule=algo.time_rules.market_close(),
                           half_days=True)


def make_pipeline():
    """
    A function that creates and returns the pipeline.

    We break this piece of logic out into its own function to make it easier to
    test and modify in isolation. In particular, this function can be
    copy/pasted into research and run by itself.

    Returns
    -------
    pipe : Pipeline
        Represents computation we would like to perform on the assets that make
        it through the pipeline screen.
    """
    # The factors we create here are based on fundamentals data and a moving
    # average of sentiment data
    # value = msf.ebit.latest / msf.enterprise_value.latest
    # quality = msf.roe.latest
    # sentiment_score = SimpleMovingAverage(
    #     inputs=[stocktwits.bull_minus_bear],
    #     window_length=3,
    # )
    
    
    
    use_the_mask_that_works = 1   # 0 = original version (fails), 1 = masked version (works)

    universe = QTradableStocksUS()

    ebit_oper_ltm = fsf.ebit_oper_ltm.latest
    entrpr_val_qf = fsf.entrpr_val_qf.latest #+ 0.00000001

    if use_the_mask_that_works:
        # Works: the winsorize percentile cutoffs are computed only over the universe
        ebit_oper_ltm_win = ebit_oper_ltm.winsorize(min_percentile=0.01, max_percentile=0.99, mask=universe)
        entrpr_val_qf_win = entrpr_val_qf.winsorize(min_percentile=0.01, max_percentile=0.99, mask=universe)
    else:
        # Original, unmasked version, which produces the MaximizeAlpha TypeError
        ebit_oper_ltm_win = ebit_oper_ltm.winsorize(min_percentile=0.01, max_percentile=0.99)
        entrpr_val_qf_win = entrpr_val_qf.winsorize(min_percentile=0.01, max_percentile=0.99)

    
    
    # EBIT / enterprise value: an earnings-yield style value factor from FactSet fields
    fsf_value = ebit_oper_ltm_win / entrpr_val_qf_win

    # Here we combine winsorized factors, z-scoring them to equalize their influence
    combined_factor = (
        # value_winsorized.zscore() + 
        # quality_winsorized.zscore() + 
        # sentiment_score_winsorized.zscore()
        # fsf_value.zscore()  # test: also fails if this is the only change
        fsf_value.zscore(mask=universe)
    )

    # Build Filters representing the top and bottom baskets of stocks by our
    # combined ranking system. We'll use these as tradeable universe each
    # day.
    longs  = combined_factor.top(TOTAL_POSITIONS // 2, mask=universe)
    shorts = combined_factor.bottom(TOTAL_POSITIONS // 2, mask=universe)

    # The final output of the pipeline keeps only the extremes of fsf_value:
    # everything outside its 10th-90th percentile band within the universe
    #long_short_screen = (longs | shorts)
    long_short_screen = ~fsf_value.percentile_between(10, 90, mask=universe)
    

    # Create pipeline
    pipe = Pipeline(
        columns={
            'ebt_ltm'        : ebit_oper_ltm,
            'ebt_win'        : ebit_oper_ltm_win,
            'qf'             : entrpr_val_qf, 
            'qf_win'         : entrpr_val_qf_win, 
            'fsf'            : fsf_value,
            'longs'          : longs,
            'shorts'         : shorts,
            'combined_factor': combined_factor,
        },
        screen=long_short_screen
    )
    return pipe
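# As the docstring above notes, make_pipeline can be pasted into Research and
# run by itself; there, something like the following (hypothetical dates)
# surfaces the dtype problem directly:
#     from quantopian.research import run_pipeline
#     result = run_pipeline(make_pipeline(), '2018-09-04', '2018-09-04')
#     print(result['combined_factor'].dtype, len(result))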


def before_trading_start(context, data):
    """
    Optional core function called automatically before the open of each market day.

    Parameters
    ----------
    context : AlgorithmContext
        See description above.
    data : BarData
        An object that provides methods to get price and volume data, check
        whether a security exists, and check the last time a security traded.
    """
    # Call algo.pipeline_output to get the output
    # Note: this is a dataframe where the index is the SIDs for all
    # securities to pass screen and the columns are the factors
    # added to the pipeline object above
    context.pipeline_data = algo.pipeline_output('long_short_equity_template')

    # This dataframe will contain all of risk loadings
    context.risk_loadings = algo.pipeline_output('risk_factors')

    return   # early return: delete this line to enable the one-time logging below
    if 'log_data_done' not in context:    # show values once
        log_data(context, data, context.pipeline_data, 4)  # all fields (columns) if unspecified

def record_vars(context, data):
    """
    A function scheduled to run every day at market close in order to record
    strategy information.

    Parameters
    ----------
    context : AlgorithmContext
        See description above.
    data : BarData
        See description above.
    """
    # Plot the number of positions over time.
    algo.record(num_positions=len(context.portfolio.positions))


# Scheduled in initialize to rebalance the longs and shorts lists
def rebalance(context, data):
    """
    A function scheduled to run every day, 30 minutes after the open,
    to rebalance the longs and shorts lists.

    Parameters
    ----------
    context : AlgorithmContext
        See description above.
    data : BarData
        See description above.
    """
    # Retrieve pipeline output
    pipeline_data = context.pipeline_data

    risk_loadings = context.risk_loadings

    # Here we define objective for the Optimize API. We have
    # selected MaximizeAlpha because we believe combined factor
    # ranking to be proportional to expected returns. This routine
    # will optimize the expected return of algorithm, going
    # long on the highest expected return and short on the lowest.
    try:
        objective = opt.MaximizeAlpha(pipeline_data.combined_factor)
        log.info('ok, pipeline_data.combined_factor len {}'.format(len(pipeline_data.combined_factor)))
    except Exception:
        # On failure, dump the pipeline output for diagnosis and skip this rebalance
        log_data(context, data, context.pipeline_data, 4)
        log.info( algo.pipeline_output('long_short_equity_template') )
        return

    # Define the list of constraints
    constraints = []
    # Constrain maximum gross leverage
    constraints.append(opt.MaxGrossExposure(MAX_GROSS_LEVERAGE))

    # Require algorithm to remain dollar neutral
    constraints.append(opt.DollarNeutral())

    # Add the RiskModelExposure constraint to make use of the
    # default risk model constraints
    neutralize_risk_factors = opt.experimental.RiskModelExposure(
        risk_model_loadings=risk_loadings,
        version=0
    )
    constraints.append(neutralize_risk_factors)

    # With this constraint we enforce that no position can make up
    # greater than MAX_SHORT_POSITION_SIZE on the short side and
    # no greater than MAX_LONG_POSITION_SIZE on the long side. This
    # ensures that we do not overly concentrate portfolio in
    # one security or a small subset of securities.
    constraints.append(
        opt.PositionConcentration.with_equal_bounds(
            min=-MAX_SHORT_POSITION_SIZE,
            max=MAX_LONG_POSITION_SIZE
        ))

    # Put together all the pieces we defined above by passing
    # them into the algo.order_optimal_portfolio function. This handles
    # all of ordering logic, assigning appropriate weights
    # to the securities in universe to maximize alpha with
    # respect to the given constraints.
    algo.order_optimal_portfolio(
        objective=objective,
        constraints=constraints
    )
    
def log_data(context, data, z, num, fields=None):
    ''' Log info about pipeline output; z can be any DataFrame or Series.
    https://www.quantopian.com/posts/overview-of-pipeline-content-easy-to-add-to-your-backtest
    '''
    # Header logging is switched off (the leading 0) for clarity here;
    # {:,} adds thousands separators
    if 0 and 'log_data_done' not in context:
        log.info('${:,}    {} to {}'.format(int(context.portfolio.starting_cash),
                get_environment('start').date(), get_environment('end').date()))
        context.log_data_done = 1

    if not len(z):
        log.info('Empty')
        return

    # Options
    log_nan_only = 0          # Only log if nans are present
    show_sectors = 0          # If sectors, do you want to see them or not
    show_sorted_details = 1   # [num] high & low securities sorted, each column
    padmax = 6                # num characters for each field, starting point

    # Series ......
    if 'Series' in str(type(z)):    # is Series, not DataFrame
        nan_count = len(z[z != z])    # NaN != NaN, so this counts NaNs
        nan_count = 'NaNs {}/{}'.format(nan_count, len(z)) if nan_count else ''
        if (log_nan_only and nan_count) or not log_nan_only:
            pad = max( padmax, len('%.5f' % z.max()) )
            log.info('{}{}{}   Series  len {}'.format('min'.rjust(pad+5),
                'mean'.rjust(pad+5), 'max'.rjust(pad+5), len(z)))
            log.info('{}{}{} {}'.format(
                ('%.5f' % z.min()) .rjust(pad+5),
                ('%.5f' % z.mean()).rjust(pad+5),
                ('%.5f' % z.max()) .rjust(pad+5),
                nan_count
            ))
            log.info('High\n{}'.format(z.sort_values(ascending=False).head(num)))
            log.info('Low\n{}'.format(z.sort_values(ascending=False).tail(num)))
        return

    # DataFrame ......
    content_min_max = [ ['','min','mean','max',''] ] ; content = ''
    for col in z.columns:
        try: z[col].max()
        except Exception: continue   # skip non-numeric columns
        if col == 'sector' and not show_sectors: continue
        nan_count = len(z[col][z[col] != z[col]])
        nan_count = 'NaNs {}/{}'.format(nan_count, len(z)) if nan_count else ''
        padmax    = max( padmax, len(str(z[col].max())) )
        content_min_max.append([col, str(z[col] .min()), str(z[col].mean()), str(z[col] .max()), nan_count])
    if (log_nan_only and nan_count) or not log_nan_only:
        content = 'Rows: {}  Columns: {}'.format(z.shape[0], z.shape[1])
        if len(z.columns) == 1: content = 'Rows: {}'.format(z.shape[0])

        paddings = [6 for i in range(4)]
        for lst in content_min_max:    # set max lengths
            i = 0
            for val in lst[:4]:    # value in each sub-list
                paddings[i] = max(paddings[i], len(str(val)))
                i += 1
        headr = content_min_max[0]
        content += ('\n{}{}{}{}{}'.format(
             headr[0] .rjust(paddings[0]),
            (headr[1]).rjust(paddings[1]+5),
            (headr[2]).rjust(paddings[2]+5),
            (headr[3]).rjust(paddings[3]+5),
            ''
        ))
        for lst in content_min_max[1:]:    # populate content using max lengths
            content += ('\n{}{}{}{}     {}'.format(
                lst[0].rjust(paddings[0]),
                lst[1].rjust(paddings[1]+5),
                lst[2].rjust(paddings[2]+5),
                lst[3].rjust(paddings[3]+5),
                lst[4],
            ))
        log.info(content)

    if not show_sorted_details: return
    if len(z.columns) == 1:     return     # skip detail if only 1 column
    details = z.columns if fields is None else fields
    for detail in details:
        if detail == 'sector' and not show_sectors: continue
        hi = z[details].sort_values(by=detail, ascending=False).head(num)
        lo = z[details].sort_values(by=detail, ascending=False).tail(num)
        content  = ''
        content += ('_ _ _   {}   _ _ _'  .format(detail))
        content += ('\n\t... {} highs\n{}'.format(detail, str(hi)))
        content += ('\n\t... {} lows \n{}'.format(detail, str(lo)))
        if log_nan_only and not len(lo[lo[detail] != lo[detail]]):
            continue  # skip if no nans
        log.info(content)
There was a runtime error.

Or a different route, which ends up with 1678 stocks:

def make_pipeline():  
    m = QTradableStocksUS()

    ebit_oper_ltm = fsf.ebit_oper_ltm.latest  
    entrpr_val_qf = fsf.entrpr_val_qf.latest #+ 0.00000001

    m &= ebit_oper_ltm.notnull()  
    m &= entrpr_val_qf.notnull()
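    # (These two notnull() filters are the main difference from the first
    # version: assets with missing fundamentals are dropped from the mask
    # before any winsorizing or z-scoring happens.)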

    ebit_oper_ltm_win = ebit_oper_ltm.winsorize(min_percentile=0.01, max_percentile=0.99, mask=m)  
    entrpr_val_qf_win = entrpr_val_qf.winsorize(min_percentile=0.01, max_percentile=0.99, mask=m)

    fsf_value = ebit_oper_ltm_win / entrpr_val_qf_win

    combined_factor = (  
        # value_winsorized.zscore(mask=m) +  
        # quality_winsorized.zscore(mask=m) +  
        # sentiment_score_winsorized.zscore(mask=m)  
        fsf_value.zscore(mask=m)  
    )

    longs  = combined_factor.top   (TOTAL_POSITIONS//2, mask=m)  
    shorts = combined_factor.bottom(TOTAL_POSITIONS//2, mask=m)

    long_short_screen = m & (longs | shorts)

    pipe = Pipeline(  
        columns={  
            'ebt_ltm'        : ebit_oper_ltm,  
            'ebt_win'        : ebit_oper_ltm_win,  
            'qf'             : entrpr_val_qf,  
            'qf_win'         : entrpr_val_qf_win,  
            'fsf'            : fsf_value,  
            'longs'          : longs,  
            'shorts'         : shorts,  
            'combined_factor': combined_factor,  
        },  
        screen = long_short_screen  
    )  
    return pipe
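Finally, whichever pipeline variant is used, a defensive guard in rebalance keeps one bad day from killing the whole backtest. A minimal sketch (my own variant, not part of the template): drop NaNs, coerce the dtype explicitly, and skip the rebalance when nothing is left.

def rebalance(context, data):
    # Drop NaNs and coerce so MaximizeAlpha always sees float64
    alpha = context.pipeline_data.combined_factor.dropna().astype('float64')

    if len(alpha) == 0:
        return   # nothing usable today; skip this rebalance

    objective = opt.MaximizeAlpha(alpha)
    ...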