Magic Formula

Magic formula has been discussed in this forum before, but few backtest results have been shared so far. Here I've implemented Greenblatt's strategy with minor modifications, such as filtering out mining and pharmaceutical companies. I ran backtests in segments with different market-cap ranges, which showed that eliminating small-cap stocks under a billion-dollar cap improves the overall return: small-cap baskets tend to get destroyed by a handful of companies losing more than 30 percent of their value. As Greenblatt notes, the strategy has periods of underperformance relative to the S&P, but in the long run it does seem to come out slightly ahead.

I'm now interested in comparing the predictive power of the two fundamental ratios. The original formula weights return on capital (ROC) and earnings yield (EY) equally, but I've seen arguments that EY should be weighted more heavily. Ideally I'd run a regression analysis on these two ratios along with other fundamental metrics, but it's difficult to find free historical fundamental data to test this thesis, so I'm wondering if someone here with some experience can chime in.
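For anyone who wants to experiment with unequal weighting before committing to a data source, here is a minimal pandas sketch (toy numbers and hypothetical tickers; the `ey_weight` knob is my own addition, not part of Greenblatt's formula):

```python
import pandas as pd

# Hypothetical fundamentals for five tickers; real values would come from a data vendor.
df = pd.DataFrame({
    'earnings_yield': [0.12, 0.08, 0.15, 0.05, 0.10],
    'roc':            [0.30, 0.45, 0.10, 0.25, 0.35],
}, index=['AAA', 'BBB', 'CCC', 'DDD', 'EEE'])

def magic_formula_rank(df, ey_weight=0.5):
    """Combine earnings-yield and return-on-capital ranks.

    ey_weight=0.5 is proportional to Greenblatt's equal-weighted rank sum;
    raising it tilts the composite toward earnings yield.
    """
    ey_rank  = df['earnings_yield'].rank(ascending=False)  # 1 = best
    roc_rank = df['roc'].rank(ascending=False)
    return (ey_weight * ey_rank + (1 - ey_weight) * roc_rank).sort_values()

print(magic_formula_rank(df).head(3))
```

With ey_weight=0.5 this reduces to the original equal weighting; sweeping it from 0 to 1 and backtesting each composite would be a crude first pass before any regression work.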

Clone Algorithm
194
"""
This is a template algorithm on Quantopian for you to adapt and fill in.
"""
from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import AverageDollarVolume
from quantopian.pipeline import CustomFactor  
from quantopian.pipeline.data import morningstar
import pandas as pd
import numpy as np


def initialize(context):
    """
    Called once at the start of the algorithm.
    """   
    # Rebalance every day, 1 hour after market open.
    #schedule_function(my_rebalance, date_rules.every_day(), time_rules.market_open(hours=1))
     
    # Record tracking variables at the end of each day.
    #schedule_function(my_record_vars, date_rules.every_day(), time_rules.market_close())
     
    # Create our dynamic stock selector.
    context.capacity = 25.0
    context.weight = 1.0/context.capacity
    context.buy = True
    set_slippage(slippage.FixedSlippage(spread=0.02))
    set_long_only()
    
    #schedule for buying a week after the year start
    schedule_function(func=schedule_task_a,
                      date_rule=date_rules.month_start(4),
                      time_rule=time_rules.market_open())
    #schedule for selling losers a week before the year start
    schedule_function(func=schedule_task_b,
                      date_rule=date_rules.month_end(4),
                      time_rule=time_rules.market_open())
    #schedule for selling winners on the 7th day of year start
    schedule_function(func=schedule_task_c,
                      date_rule=date_rules.month_start(3),
                      time_rule=time_rules.market_close())
                      
    
    
def schedule_task_a(context, data):
    today = get_datetime('US/Eastern')
    if today.month == 1:
        for stock in context.portfolio.positions:
            print stock
        for stock in context.stocks.index:
            order_target_percent(stock, context.weight)
            
#selling losers
def schedule_task_b(context, data):
    today = get_datetime('US/Eastern')
    if today.month == 12 and context.portfolio.positions_value != 0:
        for stock in context.portfolio.positions:
            if context.portfolio.positions[stock].cost_basis > data[stock].price:
                order_target_percent(stock, 0)            
        print today, 'losers sold'

#selling winners
def schedule_task_c(context, data):
    today = get_datetime('US/Eastern')
    if today.month == 1:
        for stock in context.portfolio.positions:
            order_target_percent(stock, 0)
    
    
def before_trading_start(context, data):
    """
    A function to create our dynamic stock selector (pipeline). Documentation on
    pipeline can be found here: https://www.quantopian.com/help#pipeline-title
    """
    
    fundamental_df = get_fundamentals(
        
        query(
            #min market cap at 50 mil, finance and foreign stocks excluded, 
            #earnings yield, return on capital
            fundamentals.asset_classification.morningstar_sector_code,
            fundamentals.income_statement.ebit,
            fundamentals.valuation.enterprise_value, 
            fundamentals.operation_ratios.roic, 
            fundamentals.income_statement.ebitda
            )
    
#    .filter(fundamentals.valuation.market_cap > 50000000)    
#    .filter(fundamentals.valuation.market_cap < 500000000)   

    .filter(fundamentals.valuation.market_cap > 1000000000)    
    .filter(fundamentals.valuation.market_cap < 10000000000)   
    #.filter(fundamentals.valuation_ratios.ev_to_ebitda > 0)
    .filter(fundamentals.asset_classification.morningstar_sector_code != 103)
    .filter(fundamentals.asset_classification.morningstar_sector_code != 207)
    .filter(fundamentals.asset_classification.morningstar_sector_code != 206)
    .filter(fundamentals.asset_classification.morningstar_sector_code != 309)
    .filter(fundamentals.asset_classification.morningstar_industry_code != 20533080)
    .filter(fundamentals.asset_classification.morningstar_industry_code != 10217033)    
    .filter(fundamentals.asset_classification.morningstar_industry_group_code != 10106)
    .filter(fundamentals.asset_classification.morningstar_industry_group_code != 10104)
    .filter(fundamentals.valuation.shares_outstanding != None)
    .filter(fundamentals.valuation.market_cap != None)
    .filter(fundamentals.company_reference.primary_exchange_id != "OTCPK") # no pink sheets
    .filter(fundamentals.company_reference.primary_exchange_id != "OTCBB") # no OTC bulletin board
    .filter(fundamentals.company_reference.country_id == "USA")
    .filter(fundamentals.asset_classification.morningstar_sector_code != None) # require sector
    .filter(fundamentals.share_class_reference.is_primary_share == True) # remove ancillary classes
    .filter(((fundamentals.valuation.market_cap*1.0) / (fundamentals.valuation.shares_outstanding*1.0)) > 10.0)  # stock price > $10
    .filter(fundamentals.share_class_reference.is_depositary_receipt == False) # !ADR/GDR
    .filter(~fundamentals.company_reference.standard_name.contains(' LP')) # exclude LPs
    .filter(~fundamentals.company_reference.standard_name.contains(' L P'))
    .filter(~fundamentals.company_reference.standard_name.contains(' L.P'))
    .filter(fundamentals.balance_sheet.limited_partnership == None) # exclude LPs

        
    #.order_by(fundamentals.valuation_ratios.ev_to_ebitda.asc())
    )
    fundamental_df.loc['earnings_yield'] = fundamental_df.loc['ebit']/fundamental_df.loc['enterprise_value']
    #print fundamental_df.loc['ebit'], fundamental_df.loc['enterprise_value'], fundamental_df.loc['earnings_yield']                                                                                         
    #rank the companies based on their earnings yield
    ey = fundamental_df.loc['earnings_yield']
    rank_ey = ey.rank(ascending=False)
    #rank the companies based on their return on capital
    rank_roic = fundamental_df.loc['roic'].rank(ascending=False)
    total_rank = rank_ey + rank_roic
    sorted_rank = total_rank.sort_values()
    
    
    print ey, rank_ey
    print rank_roic, total_rank
    print sorted_rank
    #get the top context.capacity (25) stocks
    context.stocks = sorted_rank[0:int(context.capacity)]
    
There was a runtime error.
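The ranking step above is plain pandas once get_fundamentals has returned its frame; note that the frame comes back with metrics as rows and securities as columns, which is why rows are accessed via .loc. A stripped-down sketch with made-up numbers:

```python
import pandas as pd

# get_fundamentals returned metrics as rows and securities as columns,
# hence the .loc['ebit'] style row access. Toy numbers, not real data.
fundamental_df = pd.DataFrame(
    {'X': [10.0, 100.0, 0.20],
     'Y': [ 5.0,  40.0, 0.35],
     'Z': [ 8.0,  90.0, 0.15]},
    index=['ebit', 'enterprise_value', 'roic'])

# Earnings yield = EBIT / enterprise value, added as a new row.
fundamental_df.loc['earnings_yield'] = (
    fundamental_df.loc['ebit'] / fundamental_df.loc['enterprise_value'])

# Rank each factor so that 1 = best, then sum the two ranks.
rank_ey   = fundamental_df.loc['earnings_yield'].rank(ascending=False)
rank_roic = fundamental_df.loc['roic'].rank(ascending=False)
sorted_rank = (rank_ey + rank_roic).sort_values()

top = sorted_rank.head(2)   # best two securities by combined rank
```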
14 responses

Hi, John.
Thank you for sharing your code. It was a great starting point for me in learning Quantopian.
It seems, though, that get_fundamentals(), which your code uses, can no longer be used.
I modified your code to use pipeline and backtested it over the same period.

I modified just three lines and added make_pipeline() to your code.
Modified:

  1. for stock in context.stocks.index: to for stock in context.output.index: (line 52 in your code, to use the pipeline's output)
  2. data[stock].price to data.current(stock, 'price') (line 60 in your code; data[stock].price appears to be deprecated)
  3. context.output=pipeline_output('my_pipeline').sort_values(by='MF_rank', ascending=True).head(int(context.capacity)) (in before_trading_start() in your code)

Added:

  1. my_pipe = make_pipeline()
  2. attach_pipeline(my_pipe, 'my_pipeline')
    (initialize() in your code)
  3. make_pipeline(): ...

But the results were quite different: my version got lower returns than the benchmark and a higher max drawdown.

Could you please look at my code and help me?

Clone Algorithm
28
"""
This is a template algorithm on Quantopian for you to adapt and fill in.
"""
from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import AverageDollarVolume
from quantopian.pipeline import CustomFactor  
from quantopian.pipeline.data import morningstar, Fundamentals
from quantopian.pipeline.filters.morningstar import IsPrimaryShare, IsDepositaryReceipt
import pandas as pd
import numpy as np


def initialize(context):
    """
    Called once at the start of the algorithm.
    """   
    # Rebalance every day, 1 hour after market open.
    #schedule_function(my_rebalance, date_rules.every_day(), time_rules.market_open(hours=1))
     
    # Record tracking variables at the end of each day.
    #schedule_function(my_record_vars, date_rules.every_day(), time_rules.market_close())
     
    # Create our dynamic stock selector.
    context.capacity = 25.0
    context.weight = 1.0/context.capacity
    context.buy = True
    
    my_pipe = make_pipeline()
    attach_pipeline(my_pipe, 'my_pipeline')
    
    set_slippage(slippage.FixedSlippage(spread=0.02))
    set_long_only()
    
    #schedule for buying a week after the year start
    schedule_function(func=schedule_task_a,
                      date_rule=date_rules.month_start(4),
                      time_rule=time_rules.market_open())
    #schedule for selling losers a week before the year start
    schedule_function(func=schedule_task_b,
                      date_rule=date_rules.month_end(4),
                      time_rule=time_rules.market_open())
    #schedule for selling winners on the 7th day of year start
    schedule_function(func=schedule_task_c,
                      date_rule=date_rules.month_start(3),
                      time_rule=time_rules.market_close())
    
def schedule_task_a(context, data):
    today = get_datetime('US/Eastern')
    if today.month == 1:
        for stock in context.portfolio.positions:
            print stock
        for stock in context.output.index:
            order_target_percent(stock, context.weight)
            
#selling losers
def schedule_task_b(context, data):
    today = get_datetime('US/Eastern')
    if today.month == 12 and context.portfolio.positions_value != 0:
        for stock in context.portfolio.positions:
            if context.portfolio.positions[stock].cost_basis > data.current(stock, 'price'):
                order_target_percent(stock, 0)            
        print today, 'losers sold'

#selling winners
def schedule_task_c(context, data):
    today = get_datetime('US/Eastern')
    if today.month == 1:
        for stock in context.portfolio.positions:
            order_target_percent(stock, 0)

def make_pipeline():
    not_lp_name = ~Fundamentals.standard_name.latest.matches('.* L[. ]?P.?$')
    is_primary_share = IsPrimaryShare()
    is_not_depositary_receipt = ~IsDepositaryReceipt()
    
    filter_market_cap = (Fundamentals.market_cap.latest > 1000000000) & (Fundamentals.market_cap.latest < 10000000000)
    
    filter_sectors = (
        (Fundamentals.morningstar_sector_code.latest != 103) &
        (Fundamentals.morningstar_sector_code.latest != 207) &
        (Fundamentals.morningstar_sector_code.latest != 206) &
        (Fundamentals.morningstar_sector_code.latest != 309) &
        (Fundamentals.morningstar_industry_code.latest != 20533080) &
        (Fundamentals.morningstar_industry_code.latest != 10217033) &
        (Fundamentals.morningstar_industry_group_code.latest != 10106) &
        (Fundamentals.morningstar_industry_group_code.latest != 10104)
    ) & (filter_market_cap)
    
    filter_plus = (
        Fundamentals.shares_outstanding.latest.notnull() &
        Fundamentals.market_cap.latest.notnull() &
        (Fundamentals.primary_exchange_id.latest != "OTCPK") &
        (Fundamentals.primary_exchange_id.latest != "OTCBB") &
        Fundamentals.country_id.latest.matches("USA") &
        Fundamentals.morningstar_sector_code.latest.notnull() &
        (USEquityPricing.close.latest > 10.0) & 
        not_lp_name & 
        is_primary_share & 
        is_not_depositary_receipt
    ) & (filter_sectors)
    
    earnings_yield = Fundamentals.ebit.latest/Fundamentals.enterprise_value.latest
    EY_rank = earnings_yield.rank(ascending=False)
    roic = Fundamentals.roic.latest
    roic_rank = roic.rank(ascending=False)
    MF_rank = EY_rank + roic_rank
    
    pipe = Pipeline(columns = {
        'earnings_yield': earnings_yield,
        'roic': roic,
        'MF_rank': MF_rank,
    } ,screen = filter_plus )
    return pipe
    
def before_trading_start(context, data):
    """
    A function to create our dynamic stock selector (pipeline). Documentation on
    pipeline can be found here: https://www.quantopian.com/help#pipeline-title
    """
    context.output=pipeline_output('my_pipeline').sort_values(by='MF_rank', ascending=True).head(int(context.capacity))
    # context.stocks = sorted_rank[0:int(context.capacity)]
There was a runtime error.

Best to rank using the mask:

EY_rank = earnings_yield.rank(ascending=False, mask=filter_plus)  
roic_rank = roic.rank(ascending=False, mask=filter_plus)

Also remove the close.latest price filter; it wasn't in the original.
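The reason the mask matters: rank() without one ranks the entire universe, so names that later fail the screen still consume rank slots and shift everyone else's numbers. A pure-pandas emulation of the idea (hypothetical scores):

```python
import pandas as pd

scores = pd.Series({'A': 0.9, 'B': 0.7, 'C': 0.5, 'D': 0.3})
in_universe = pd.Series({'A': False, 'B': True, 'C': True, 'D': True})

# Without a mask, A takes rank 1 even though it fails the screen,
# pushing every tradable name down one slot.
rank_all = scores.rank(ascending=False)

# Emulating rank(mask=...): rank only the names that pass the screen.
rank_masked = scores[in_universe].rank(ascending=False)
```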

This is a little different.

Clone Algorithm
173
from quantopian.pipeline.classifiers.fundamentals import Sector
from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import AverageDollarVolume
from quantopian.pipeline import CustomFactor  
from quantopian.pipeline.filters import Q500US, Q1500US, Q3000US, QTradableStocksUS
from quantopian.pipeline.data import morningstar, Fundamentals
from quantopian.pipeline.filters.morningstar import IsPrimaryShare, IsDepositaryReceipt
import pandas as pd
import numpy as np

def initialize(context):
    # Rebalance every day, 1 hour after market open.
    #schedule_function(my_rebalance, date_rules.every_day(), time_rules.market_open(hours=1))
     
    # Record tracking variables at the end of each day.
    #schedule_function(my_record_vars, date_rules.every_day(), time_rules.market_close())
     
    # Create our dynamic stock selector.
    context.capacity = 25
    context.buy = True
    
    my_pipe = make_pipeline()
    attach_pipeline(my_pipe, 'my_pipeline')
    
    set_slippage(slippage.FixedSlippage(spread=0.02))
    set_long_only()
    
    #schedule for buying a week after the year start
    schedule_function(func=schedule_task_a,
                      date_rule=date_rules.month_start(4),
                      time_rule=time_rules.market_open())
    #schedule for selling losers a week before the year start
    schedule_function(func=schedule_task_b,
                      date_rule=date_rules.month_end(4),
                      time_rule=time_rules.market_open())
    #schedule for selling winners on the 7th day of year start
    schedule_function(func=schedule_task_c,
                      date_rule=date_rules.month_start(3),
                      time_rule=time_rules.market_close())
    
def schedule_task_a(context, data):
    today = get_datetime('US/Eastern')
    if today.month == 1:
        for stock in context.portfolio.positions:
            print stock
        for stock in context.output.index:
            order_target_percent(stock, context.output.loc[stock, 'weight']) # .loc avoids transposing the frame with .T
            #order_target_percent(stock, context.weight)
            
#selling losers
def schedule_task_b(context, data):
    today = get_datetime('US/Eastern')
    if today.month == 12 and context.portfolio.positions_value != 0:
        for stock in context.portfolio.positions:
            if context.portfolio.positions[stock].cost_basis > data.current(stock, 'price'):
                order_target_percent(stock, 0)            
        print today, 'losers sold'

#selling winners
def schedule_task_c(context, data):
    today = get_datetime('US/Eastern')
    if today.month == 1:
        for stock in context.portfolio.positions:
            order_target_percent(stock, 0)

def make_pipeline():
    m  = QTradableStocksUS() & Sector().notnull()  # m for mask
    m &= (Fundamentals.market_cap.latest > 1000000000) 
    m &= (Fundamentals.market_cap.latest < 10000000000)
    m &= (
        (Fundamentals.morningstar_sector_code.latest != 103) &
        (Fundamentals.morningstar_sector_code.latest != 207) &
        (Fundamentals.morningstar_sector_code.latest != 206) &
        (Fundamentals.morningstar_sector_code.latest != 309) &
        (Fundamentals.morningstar_industry_code.latest != 20533080) &
        (Fundamentals.morningstar_industry_code.latest != 10217033) &
        (Fundamentals.morningstar_industry_group_code.latest != 10106) &
        (Fundamentals.morningstar_industry_group_code.latest != 10104)
    )
    
    earnings_yield = Fundamentals.ebit.latest/Fundamentals.enterprise_value.latest
    roic      = Fundamentals.roic.latest
    EY_rank   = earnings_yield.rank(ascending=False, mask=m)
    roic_rank = roic          .rank(ascending=False, mask=m)
    MF_rank   = EY_rank + roic_rank
    
    pipe = Pipeline(columns = {
        'earnings_yield': earnings_yield,
        'roic'          : roic,
        'MF_rank'       : MF_rank,
    }, screen = m )
    return pipe
    
def before_trading_start(context, data):
    context.output=pipeline_output('my_pipeline').sort_values(by='MF_rank', ascending=True).head(int(context.capacity)).dropna()
    # context.stocks = sorted_rank[0:int(context.capacity)]
    
    # weight as rank normalize 0 to 1
    context.output['weight'] = context.output['MF_rank'] / context.output['MF_rank'].sum()
    
    #context.weight = 1.0 / len(context.output)
    
    
There was a runtime error.
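One thing to watch in the weight normalization above: MF_rank is a "lower is better" score, so dividing each rank by the sum hands the largest weight to the worst-ranked name in the basket. A small sketch of the effect, with inverse-rank weighting as one possible alternative (toy numbers, hypothetical tickers):

```python
import pandas as pd

mf_rank = pd.Series({'A': 2.0, 'B': 5.0, 'C': 9.0})  # lower = better

# Dividing by the sum gives the worst-ranked name (C) the largest weight:
w_raw = mf_rank / mf_rank.sum()

# Inverting the ranks first puts the most weight on the best name (A):
inv = 1.0 / mf_rank
w_inv = inv / inv.sum()
```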

Thanks, Blue Seahawk.
Your answer is really helpful for me!

@K, thank you for mentioning it; here are some more possibilities too.
Mainly run this and take a look at the log window, for visibility into the pipeline values.
This is just a start toward adding shorting if you wish; it would need some work.

The focus here is to provide options, tools, and flexibility:

  - a Wild() class for quickly trying other fundamentals during development,
  - normalization of positive and negative weights separately, to support adding shorting (that's where things went south with the last-minute addition of norm()),
  - logging of pipeline min, mean, max, and some highs and lows,
  - forward-filling of NaNs (adding a CustomFactor class was necessary there, to have a window to work with rather than just latest),
  - an example of percentile_between you might want to try some numbers in,
  - examples of zscore and demean (one way to obtain negative values for short shares),
  - a slightly more efficient route for 'today',
  - a PnL determination that handles long and short simultaneously, which makes adding shorting easier if you're interested in, say, qualifying for the contest.

Returns were not the point, so this backtest is only a few days long. Rather than adopting this algo wholesale, you could copy and paste various bits of it into yours to try things out.
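For readers puzzling over norm() in the code below: the idea is to demean the scores so there are positive and negative values, then scale each side separately so the longs sum to +1 and the shorts to -1. A simplified, standalone sketch of that idea (it skips the head/tail trimming the full helper does):

```python
import pandas as pd

def norm_long_short(ranks):
    # Center the scores so there are positive (long) and negative (short) values.
    d = ranks - ranks.mean()
    pos = d[d > 0]
    neg = d[d < 0]
    pos = pos / pos.sum()        # long weights sum to +1
    neg = -(neg / neg.sum())     # short weights sum to -1
    return pd.concat([pos, neg])

w = norm_long_short(pd.Series({'A': 4.0, 'B': 3.0, 'C': 2.0, 'D': 1.0}))
```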

Clone Algorithm
173
''' https://www.quantopian.com/posts/magic-formula
    Bit of a mess made this time and yet some things to think about, raw materials to work with.
'''

from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline  import CustomFactor
from quantopian.pipeline  import Pipeline
from quantopian.pipeline.data         import Fundamentals
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.filters      import Q500US, Q1500US, Q3000US, QTradableStocksUS
from quantopian.pipeline.classifiers.fundamentals import Sector
import numpy  as np
import pandas as pd

def initialize(context):
    # Trade every day, 1 hour after market open.
    #schedule_function(trade, date_rules.every_day(), time_rules.market_open(hours=1))

    # Record tracking variables at the end of each day.
    #schedule_function(records, date_rules.every_day(), time_rules.market_close())

    # Create dynamic stock selector.
    context.capacity = 125
    context.buy = True

    pipe = make_pipeline()
    attach_pipeline(pipe, 'pipeline')

    #set_slippage(slippage.FixedSlippage(spread=0.02))
    #set_long_only()

    # Buying a week after the year start
    schedule_function(opens,         date_rules.month_start(4), time_rules.market_open())
    # Selling losers a week before the year start
    schedule_function(close_losers,  date_rules.  month_end(4), time_rules.market_open())
    # Selling winners on the nth day of year start
    schedule_function(close_winners, date_rules.month_start(3), time_rules.market_close())

def opens(context, data):            # opening positions
    if context.today.month != 1: return
    
    log.info('current: {}'.format([s.symbol for s in context.portfolio.positions]))
    opening = []
    for s in context.output.index:
        if not data.can_trade(s): continue
            
        # with norm() .....
        # Don't know how these can ever be missing; uncomment and investigate with the debugger.
        if s not in context.weights.index: continue    
        order_target_percent(s, context.weights[s])
        
        #order_target_percent(s, context.output.T[s]['weight']) # T is Transform, there is a better way.  .iloc or .ix or something
        #order_target_percent(s, context.weight)
        opening.append(s.symbol)
    log.info('opening: {}'.format(str(opening)))

def close_losers(context, data):     # selling losers
    if context.today.month != 12 or context.portfolio.positions_value == 0:
        return

    '''
    A way to handle cost basic vs price relationship in both short & long at the same time
        pos = context.portfolio.positions
        amt = pos[s].amount
        pnl = amt * (data.current(s, 'price') - pos[s].cost_basis)
    '''
    
    pos = context.portfolio.positions
    
    for s in pos:
        #if pos[s].cost_basis > data.current(s, 'price'):
        #    order_target_percent(s, 0)
        pnl = pos[s].amount * (data.current(s, 'price') - pos[s].cost_basis)
        if pnl < 0:
            order_target(s, 0)
    log.info('{} losers sold'.format(context.today))

def close_winners(context, data):    # selling winners
    if context.today.month != 1: return
    
    #for s in context.portfolio.positions:
    #    order_target_percent(s, 0)
    
    pos = context.portfolio.positions
    
    for s in pos:
        #if pos[s].cost_basis > data.current(s, 'price'):
        #    order_target_percent(s, 0)
        pnl = pos[s].amount * (data.current(s, 'price') - pos[s].cost_basis)
        if pnl > 0:
            order_target(s, 0)
    
def make_pipeline():
    m  = QTradableStocksUS() & Sector().notnull()  # m for mask
    m &= (Fundamentals.market_cap.latest > 1000000000)
    m &= (Fundamentals.market_cap.latest < 10000000000)
    m &= (
        (Fundamentals.morningstar_sector_code.latest   != 103) &
        (Fundamentals.morningstar_sector_code.latest   != 207) &
        (Fundamentals.morningstar_sector_code.latest   != 206) &
        (Fundamentals.morningstar_sector_code.latest   != 309) &
        (Fundamentals.morningstar_industry_code.latest != 20533080) &
        (Fundamentals.morningstar_industry_code.latest != 10217033) &
        (Fundamentals.morningstar_industry_group_code.latest  != 10106) &
        (Fundamentals.morningstar_industry_group_code.latest  != 10104)
    )

    #earnings_yield = Fundamentals.ebit.latest / Fundamentals.enterprise_value.latest
    #earnings_yield = EBITPerEV(mask=m) ; m &= (earnings_yield > 0)                        # 124
    earnings_yield = EBITPerEV(mask=m) ; #m &= (earnings_yield.percentile_between(70, 95)) # 141
    #earnings_yield = Fundamentals.earning_yield.latest.zscore(mask=m) ; m &= (earnings_yield.percentile_between(70, 95))  # 114
    #roic      = Fundamentals.roic.latest
    roic      = ROIC(mask=m)
    EY_rank   = earnings_yield       .rank(ascending=False, mask=m)
    roic_rank = roic                 .rank(ascending=False, mask=m)
    #MF_rank   = (EY_rank + roic_rank).rank(ascending=False, mask=m)
    MF_rank   = (EY_rank + roic_rank).rank(ascending=False, mask=m).demean() # use with norm()
    
    ''' Original before MF_rank re-rank ...
                                min                mean                 max
       MF_rank                  7.0               79.76               129.0     
earnings_yield      0.0291078522234     0.0569397883902      0.108915379496     
          roic             0.051941          0.12467516            0.414196     
        weight     0.00351053159478                0.04     0.0646940822467     
    '''

    pipe = Pipeline(columns = {
        'earnings_yield': earnings_yield,
        'roic'          : roic,
        'MF_rank'       : MF_rank,
    }, screen = m )
    return pipe

def before_trading_start(context, data):
    context.today = get_datetime('US/Eastern')

    context.output = pipeline_output('pipeline').sort_values(by='MF_rank', ascending=True).head(int(context.capacity)).dropna()
    # context.stocks = sorted_rank[0:int(context.capacity)]

    # weight as rank normalize 0 to 1
    #context.output['weight'] = context.output['MF_rank'] / context.output['MF_rank'].sum()
    
    # Adding norm() was a last minute thing with disastrous consequences.
    #   You might want to go back to see if you can rescue it or
    #     go back to context.output['weight'] = ... above
    # demean moves down so middle is around zero, for norm to have pos & neg to chew on.
    context.weights = norm(context, context.output['MF_rank'])

    #context.weight = 1.0 / len(context.output)

    if 'log_pipe_done' not in context:    # show pipe info once
        log_pipe(context, data, context.output, 4)
        #log_pipe(context, data, context.output, 4, filter=['alpha', 'beta', ... or what-have-you])

def norm(c, d):    # d data, it's a series, normalize it pos & neg
    # A different normalization method that handles pos, neg separately for long, short weights
    if d.min() >= 0:
        d -= d.mean()        
    pos = d[ d > 0 ]
    neg = d[ d < 0 ]    
    if   not len(pos) and len(neg):
        d = neg - neg.mean()
    elif not len(neg) and len(pos):
        d = pos - pos.mean()
    pos  = d[ d > 0 ]
    neg  = d[ d < 0 ]    
    num  = min(len(pos), len(neg))
    pos  = pos.sort_values(ascending=False).head(num)
    neg  = neg.sort_values(ascending=False).tail(num)    
    pos /=  pos.sum()
    neg  = -(neg / neg.sum())    
    return pos.append(neg)

class ROIC(CustomFactor): 
    inputs = [Fundamentals.roic] ; window_length = 252
    def compute(self, today, assets, out, roic):
        roic = nanfill(roic)
        out[:] = np.mean(roic, axis=0)

class Wild(CustomFactor):
    # Intended for the default input (roic) to be overridden with any other fundamental like
    #  fcf = Wild(inputs=[Fundamentals.fcf_yield], mask=m)
    #    or
    #  fcf = Wild(inputs=[Fundamentals.fcf_yield], window_length=88, mask=m)
    inputs = [Fundamentals.roic] ; window_length = 252
    def compute(self, today, assets, out, z):
        out[:] = np.mean(nanfill(z), axis=0)        # mean, avg

class EBITPerEV(CustomFactor):
    inputs = [Fundamentals.ebit, Fundamentals.enterprise_value]; window_length = 144
    def compute(self, today, assets, out, ebit, ev):
        ebit = nanfill(ebit)
        ev   = nanfill(ev)
        out[:] = np.mean(ebit, axis=0) / np.mean(ev, axis=0)
        #out[:] = ebit[-1] / ev[-1]
        
def nanfill(_in):    # https://stackoverflow.com/questions/41190852/most-efficient-way-to-forward-fill-nan-values-in-numpy-array
    do_nanfill = 1        # set to 0 for an interesting test of the difference or no diff.
    if not do_nanfill:
        return _in
    # Forward-fill missing values
    mask = np.isnan(_in)
    idx = np.where(~mask,np.arange(mask.shape[1]),0)
    np.maximum.accumulate(idx,axis=1, out=idx)
    _in[mask] = _in[np.nonzero(mask)[0], idx[mask]]
    return _in

def log_pipe(context, data, z, num, filter=None):
    ''' Log info about pipeline output or, z can be any DataFrame or Series
    https://www.quantopian.com/posts/overview-of-pipeline-content-easy-to-add-to-your-backtest
    '''
    # Options
    log_nan_only = 0          # Only log if nans are present
    show_sectors = 0          # If sectors, do you want to see them or not
    show_sorted_details = 1   # [num] high & low securities sorted, each column

    if 'log_init_done' not in context:
        log.info('${}    {} to {}'.format('%.0e' % (context.portfolio.starting_cash),
                get_environment('start').date(), get_environment('end').date()))
    context.log_init_done = 1

    if not len(z):
        log.info('Empty')
        return

    # Series ......
    context.log_pipe_done = 1 ; padmax = 6
    if 'Series' in str(type(z)):    # is Series, not DataFrame
        nan_count = len(z[z != z])
        nan_count = 'NaNs {}/{}'.format(nan_count, len(z)) if nan_count else ''
        if (log_nan_only and nan_count) or not log_nan_only:
            pad = max(6, len(str(z.max())))
            log.info('{}{}{}   Series {}  len {}'.format('min' .rjust(pad+5),
                'mean'.rjust(pad+5), 'max' .rjust(pad+5),  z.name, len(z)))
            log.info('{}{}{} {}'.format(str(z.min()) .rjust(pad+5),
                str(z.mean()).rjust(pad+5), str(z.max()) .rjust(pad+5), nan_count
            ))
        return

    # DataFrame ......
    content_min_max = [ ['','min','mean','max',''] ] ; content = ''
    for col in z.columns:
        if col == 'sector' and not show_sectors: continue
        nan_count = len(z[col][z[col] != z[col]])
        nan_count = 'NaNs {}/{}'.format(nan_count, len(z)) if nan_count else ''
        padmax    = max( padmax, 6, len(str(z[col].max())) )
        content_min_max.append([col, str(z[col] .min()), str(z[col].mean()), str(z[col] .max()), nan_count])
    if (log_nan_only and nan_count) or not log_nan_only:
        content = 'Rows: {}  Columns: {}'.format(z.shape[0], z.shape[1])
        if len(z.columns) == 1: content = 'Rows: {}'.format(z.shape[0])

        paddings = [6 for i in range(4)]
        for lst in content_min_max:    # set max lengths
            i = 0
            for val in lst[:4]:    # value in each sub-list
                paddings[i] = max(paddings[i], len(str(val)))
                i += 1
        headr = content_min_max[0]
        content += ('\n{}{}{}{}{}'.format(
             headr[0] .rjust(paddings[0]),
            (headr[1]).rjust(paddings[1]+5),
            (headr[2]).rjust(paddings[2]+5),
            (headr[3]).rjust(paddings[3]+5),
            ''
        ))
        for lst in content_min_max[1:]:    # populate content using max lengths
            content += ('\n{}{}{}{}     {}'.format(
                lst[0].rjust(paddings[0]),
                lst[1].rjust(paddings[1]+5),
                lst[2].rjust(paddings[2]+5),
                lst[3].rjust(paddings[3]+5),
                lst[4],
            ))
        log.info(content)

    if not show_sorted_details: return
    if len(z.columns) == 1:     return     # skip detail if only 1 column
    details = z.columns if filter is None else filter   # without the else branch, details would be undefined when a filter is passed
    for detail in details:
        if detail == 'sector': continue
        hi = z[details].sort_values(by=detail, ascending=False).head(num)
        lo = z[details].sort_values(by=detail, ascending=False).tail(num)
        content  = ''
        content += ('_ _ _   {}   _ _ _'  .format(detail))
        content += ('\n\t... {} highs\n{}'.format(detail, str(hi)))
        content += ('\n\t... {} lows \n{}'.format(detail, str(lo)))
        if log_nan_only and not len(lo[lo[detail] != lo[detail]]):
            continue  # skip if no nans
        log.info(content)

Why are you using ascending=False?
EY_rank = earnings_yield.rank(ascending=False, mask=filter_plus)
Wouldn't a higher earnings yield be better value?
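For what it's worth, the rank semantics can be checked with plain pandas (a sketch with made-up tickers; pipeline's Factor.rank takes the same ascending flag): with ascending=False the highest value gets rank 1, so a low combined MF_rank marks a better company, and the head of an ascending sort picks the best ones.

```python
import pandas as pd

# Made-up earnings yields: with ascending=False the best (highest) yield ranks 1.
ey = pd.Series({'AAA': 0.15, 'BBB': 0.08, 'CCC': 0.02})
ey_rank = ey.rank(ascending=False)
print(ey_rank)   # AAA 1.0, BBB 2.0, CCC 3.0
```

So a higher earnings yield is indeed better value; ranking it descending just encodes "best = rank 1".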

This strategy is so weird. If you change the "ascending" setting on line 98 of Blue's code as below, you are selecting the worst companies in the output list (~750 companies) to go long. The performance is still pretty good. However, if you go short instead, it performs terribly.

context.output=pipeline_output('my_pipeline').sort_values(by='MF_rank', ascending=True).head(int(context.capacity)).dropna()  

to

context.output=pipeline_output('my_pipeline').sort_values(by='MF_rank', ascending=False).head(int(context.capacity)).dropna()  
from quantopian.pipeline.classifiers.fundamentals import Sector
from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import AverageDollarVolume
from quantopian.pipeline import CustomFactor  
from quantopian.pipeline.filters import Q500US, Q1500US, Q3000US, QTradableStocksUS
from quantopian.pipeline.data import morningstar, Fundamentals
from quantopian.pipeline.filters.morningstar import IsPrimaryShare, IsDepositaryReceipt
import pandas as pd
import numpy as np
 
def initialize(context):
    # Rebalance every day, 1 hour after market open.
    #schedule_function(my_rebalance, date_rules.every_day(), time_rules.market_open(hours=1))
     
    # Record tracking variables at the end of each day.
    #schedule_function(my_record_vars, date_rules.every_day(), time_rules.market_close())
     
    # Create our dynamic stock selector.
    context.capacity = 25
    context.buy = True
    
    my_pipe = make_pipeline()
    attach_pipeline(my_pipe, 'my_pipeline')
    
    set_slippage(slippage.FixedSlippage(spread=0.02))
    set_long_only()
    
    #schedule for buying a week after the year start
    schedule_function(func=schedule_task_a,
                      date_rule=date_rules.month_start(4),
                      time_rule=time_rules.market_open())
    #schedule for selling losers a week before the year start
    schedule_function(func=schedule_task_b,
                      date_rule=date_rules.month_end(4),
                      time_rule=time_rules.market_open())
    #schedule for selling winners on the 7th day of year start
    schedule_function(func=schedule_task_c,
                      date_rule=date_rules.month_start(3),
                      time_rule=time_rules.market_close())
    
def schedule_task_a(context, data):
    today = get_datetime('US/Eastern')
    if today.month == 1:
        for stock in context.portfolio.positions:
            print(stock)
        for stock in context.output.index:
            order_target_percent(stock, context.output.loc[stock, 'weight'])  # .loc replaces the transpose-indexing workaround
            #order_target_percent(stock, context.weight)
            
#selling losers
def schedule_task_b(context, data):
    today = get_datetime('US/Eastern')
    if today.month == 12 and context.portfolio.positions_value != 0:
        for stock in context.portfolio.positions:
            if context.portfolio.positions[stock].cost_basis > data.current(stock, 'price'):
                order_target_percent(stock, 0)            
        print today, 'losers sold'
 
#selling winners
def schedule_task_c(context, data):
    today = get_datetime('US/Eastern')
    if today.month == 1:
        for stock in context.portfolio.positions:
            order_target_percent(stock, 0)
 
def make_pipeline():
    m  = QTradableStocksUS() & Sector().notnull()  # m for mask
    m &= (Fundamentals.market_cap.latest > 1000000000) 
    m &= (Fundamentals.market_cap.latest < 10000000000)
    m &= (
        (Fundamentals.morningstar_sector_code.latest != 103) &
        (Fundamentals.morningstar_sector_code.latest != 207) &
        (Fundamentals.morningstar_sector_code.latest != 206) &
        (Fundamentals.morningstar_sector_code.latest != 309) &
        (Fundamentals.morningstar_industry_code.latest != 20533080) &
        (Fundamentals.morningstar_industry_code.latest != 10217033) &
        (Fundamentals.morningstar_industry_group_code.latest != 10106) &
        (Fundamentals.morningstar_industry_group_code.latest != 10104)
    )
    
    earnings_yield = Fundamentals.ebit.latest/Fundamentals.enterprise_value.latest
    roic      = Fundamentals.roic.latest
    EY_rank   = earnings_yield.rank(ascending=False, mask=m)
    roic_rank = roic          .rank(ascending=False, mask=m)
    MF_rank   = EY_rank + roic_rank
    
    pipe = Pipeline(columns = {
        'earnings_yield': earnings_yield,
        'roic'          : roic,
        'MF_rank'       : MF_rank,
    }, screen = m )
    return pipe
    
def before_trading_start(context, data):
    context.output=pipeline_output('my_pipeline').sort_values(by='MF_rank', ascending=False).head(int(context.capacity)).dropna()
    # context.stocks = sorted_rank[0:int(context.capacity)]
    
    # weight as rank normalize 0 to 1
    context.output['weight'] = context.output['MF_rank'] / context.output['MF_rank'].sum()
    
    #context.weight = 1.0 / len(context.output)
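The rank-proportional weighting above can be traced with a toy Series (tickers and numbers are made up). Note that with weights built this way, the largest MF_rank value gets the largest weight:

```python
import pandas as pd

# Hypothetical MF_rank values for three selected stocks.
mf_rank = pd.Series({'AAA': 10.0, 'BBB': 30.0, 'CCC': 60.0})
weight = mf_rank / mf_rank.sum()   # normalize so the weights sum to 1
print(weight)   # AAA 0.1, BBB 0.3, CCC 0.6
```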

It outperforms here mostly due to higher beta. An alpha of <=1% is something, but I imagine it can be improved with a few tweaks.

If you need free historical fundamentals data, I run TenQuant.io. Feel free to check it out; it retrieves data in real time from EDGAR.

Hi guys,
I worked for some days on the magic formula starting from this post (really appreciated).
After many modifications and backtests, I found a really weird behaviour based on the month in which the stocks are rotated:
the strategy's results change heavily if the buys/sells are made in a month other than January (as proposed in the examples of this discussion).
I took the code of Blue Seahawk and backtested it using all 12 months; here are the results:
https://drive.google.com/file/d/1Gzzrpw8ygGy7d9gFyo4tz3IT9XYpWatb/view?usp=sharing

Moreover, I have made many changes to the algorithm (using the interval 2003-2020 instead of 2011-2017, using FCF yield instead of EBIT/EV, using a custom formula to extract ROCE instead of ROIC, and other minor changes), but the results show the same behaviour: really good in January, good in the final months of the year, and really bad if the rotation happens in spring/summer:
https://drive.google.com/file/d/1u6AnsqsqSLXHuZ5EkGwnyBKuNgSTB5FI/view?usp=sharing

I can't find an explanation for this behaviour (it doesn't seem random). How can this happen?

@Luca Wiegand Low frequency trading is very susceptible to initial state values. Think of trying to fly a plane from New York to Los Angeles. If the pilot adjusts his course every minute then chances are the flight path will be quite straight with a lot of tiny corrections. On the other hand, if the pilot only adjusts his course every hour, the flight path will be much more erratic and potentially quite far off a 'straight line' at times. Moreover, if the direction calculations are off a bit, they will have a much greater impact when adjusting only hourly.

Conventional wisdom is that the markets don't perform well in the spring and summer; hence the adage "sell in May and go away". Trading in spring and summer isn't reflective of the other parts of the year. So, similar to the flight example above, if one bases trading 'direction decisions' only on data from those months, then results may be significantly different from a 'straight line'.


Hi Dan, thanks for the response.
I can agree that low-frequency trading can be more susceptible, but in that case the results should not be biased towards one specific month each year; they should be random.
If you check the year-by-year results for choosing the stocks in January versus July:

https://drive.google.com/file/d/1b_BLQc2XhBd6RuGvZXk8df2x6jO7Lh2O/view?usp=sharing

The January strategy beats July 13 times out of 16.

Regarding the underperformance in spring and summer, even if it holds only in some years, all the data used in the strategy are annual (LTM), except for what comes from the balance sheet. So I don't see why selecting stocks in an underperforming period should lead to underperformance for the whole year.
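As a rough significance check (a sketch that treats each year as an independent fair coin flip, which is a simplification): 13 or more January wins out of 16 would happen by chance only about 1% of the time.

```python
from math import comb

# One-sided binomial tail: P(X >= 13) for X ~ Binomial(16, 0.5).
p = sum(comb(16, k) for k in range(13, 17)) / 2 ** 16
print(round(p, 4))   # 0.0106
```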

I attach a backtest if someone wants to have a look.

from quantopian.pipeline.classifiers.fundamentals import Sector
from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import AverageDollarVolume
from quantopian.pipeline import CustomFactor
from quantopian.pipeline.filters import Q500US, Q1500US, Q3000US, QTradableStocksUS
from quantopian.pipeline.data.morningstar import Fundamentals
from quantopian.pipeline.filters.morningstar import IsPrimaryShare, IsDepositaryReceipt
import pandas as pd
import numpy as np

# Trailing-twelve-month sum of a quarterly fundamental: keep values whose
# as-of dates fall within 52 weeks of the window end, de-duplicated by date.
class TrailingTwelveMonths(CustomFactor):  
    window_length=315  
    window_safe = True  
 
    def compute(self, today, assets, out, values, dates):  
        out[:] = [  
            (v[d + np.timedelta64(52, 'W') > d[-1]])[  
                np.unique(  
                    d[d + np.timedelta64(52, 'W') > d[-1]],  
                    return_index=True  
                )[1]  
            ].sum()  
            for v, d in zip(values.T, dates.T)  
        ]  
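The list comprehension above is dense, so here is the same per-column computation unrolled on made-up quarterly data (dates and values are hypothetical): keep entries whose as-of date is within 52 weeks of the window end, drop duplicate dates, and sum the remaining quarters.

```python
import numpy as np

# One column of the factor's (values, dates) inputs: quarterly EBIT repeated
# on consecutive days, tagged with its as-of date.
d = np.array(['2019-03-31', '2019-03-31', '2019-06-30', '2019-09-30',
              '2019-12-31', '2019-12-31'], dtype='datetime64[D]')
v = np.array([10.0, 10.0, 12.0, 11.0, 13.0, 13.0])

recent = d + np.timedelta64(52, 'W') > d[-1]                   # within the trailing 52 weeks
ttm = v[recent][np.unique(d[recent], return_index=True)[1]].sum()
print(ttm)   # 10 + 12 + 11 + 13 = 46.0
```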


def initialize(context):
    # Rebalance every day, 1 hour after market open.
    #schedule_function(my_rebalance, date_rules.every_day(), time_rules.market_open(hours=1))
     
    # Record tracking variables at the end of each day.
    #schedule_function(my_record_vars, date_rules.every_day(), time_rules.market_close())
     
    # Create our dynamic stock selector.
    context.capacity = 30
    context.weight = 1.0/context.capacity
    context.buy = True
    
    my_pipe = make_pipeline()
    attach_pipeline(my_pipe, 'my_pipeline')
    
    set_slippage(slippage.FixedSlippage(spread=0))
    set_commission(commission.PerShare(cost=0))
    set_long_only()
    
    
    # schedule for buying on the 2nd trading day of each month (acts only in July, see schedule_task_a)
    schedule_function(func=schedule_task_a,
                      date_rule=date_rules.month_start(2),
                      time_rule=time_rules.market_open())
    # schedule for selling all positions on the 1st trading day of each month (acts only in July, see schedule_task_c)
    schedule_function(func=schedule_task_c,
                      date_rule=date_rules.month_start(1),
                      time_rule=time_rules.market_close())
        
# buying schedule
def schedule_task_a(context, data):
    today = get_datetime('US/Eastern')
    str_stock = ''
    if today.month == 7:
        for stock in context.output.index:
            str_stock += '- {0} ({1})'.format(stock.asset_name, stock.symbol)
            order_target_percent(stock, context.weight)
        print(('buying stocks: {0}'.format(str_stock)))
        
        
#selling winners
def schedule_task_c(context, data):
    str_stock = ''
    today = get_datetime('US/Eastern')
    if today.month == 7:
        for stock in context.portfolio.positions:
            str_stock += '- {0} ({1})'.format(stock.asset_name, stock.symbol)
            order_target_percent(stock, 0)
        print(('selling winners: {0}'.format(str_stock)))

        
def make_pipeline():
    filter_base = Q3000US() & Sector().notnull()
    
    filter_market_cap = (Fundamentals.market_cap.latest > 1000000000) & (filter_base) #& (Fundamentals.market_cap.latest < 10000000000)
    
    filter_sectors = (
        (Fundamentals.morningstar_sector_code.latest != 103) & # Financial Services
        (Fundamentals.morningstar_sector_code.latest != 207) & # Utilities
        (Fundamentals.morningstar_sector_code.latest != 104) & # REIT
        (Fundamentals.morningstar_industry_code.latest != 30910060) # Oil & Gas
        #(Fundamentals.morningstar_sector_code.latest != 206) & # Healthcare
        #(Fundamentals.morningstar_industry_code.latest != 20533080) & # Pharmaceutical Retailers
        #(Fundamentals.morningstar_industry_code.latest != 10217033) & # Apparel Stores
        #(Fundamentals.morningstar_industry_group_code != 10106) & # Metals & Mining
        #(Fundamentals.morningstar_industry_group_code != 10104) # Coal
    ) & (filter_market_cap)
    
    filter_mf = (
        (Fundamentals.net_margin.latest > 0) &
        (Fundamentals.tangible_book_value_per_share.latest > 0) &
        (Fundamentals.book_value_per_share.latest > 0)
    ) & (filter_sectors)
    
    
    # New EBIT: more accurate, and available earlier
    ebit = TrailingTwelveMonths(inputs=[Fundamentals.ebit, Fundamentals.ebit_asof_date], mask=filter_mf )
    earnings_yield =ebit/Fundamentals.enterprise_value.latest
    
    total_assets = Fundamentals.total_assets.latest
    current_liabilities = Fundamentals.current_liabilities.latest
    current_assets = Fundamentals.current_assets.latest - Fundamentals.cash_cash_equivalents_and_marketable_securities.latest
    intang = Fundamentals.goodwill_and_other_intangible_assets.latest
    cash = Fundamentals.cash_cash_equivalents_and_marketable_securities.latest
    
    pipe = Pipeline(columns = {
        'earnings_yield': earnings_yield,
        'EBIT': ebit,
        'total_assets': total_assets,
        'current_liabilities': current_liabilities,
        'current_assets': current_assets,
        'intang': intang,
        'cash': cash,
    } ,screen = filter_mf )
    return pipe
    
    
def before_trading_start(context, data):
    """
    A function to create our dynamic stock selector (pipeline). Documentation on
    pipeline can be found here: https://www.quantopian.com/help#pipeline-title
    """
    df = pipeline_output('my_pipeline')  # renamed from `data` to avoid shadowing the function parameter
    df['current_liabilities_mod'] = df[['current_liabilities','current_assets']].min(axis=1)
    df['intang'] = df['intang'].fillna(0)
    df['CapitalEmployed'] = df['total_assets']-df['current_liabilities_mod']-df['intang']-df['cash']
    df['ROCE'] = df['EBIT'] / df['CapitalEmployed']
    df['EY_rank'] = df['earnings_yield'].rank(ascending=False)
    df['ROCE_rank'] = df['ROCE'].rank(ascending=False)
    df['MF_rank'] = df['EY_rank'] + df['ROCE_rank']
    
    context.output=df.sort_values(by='MF_rank', ascending=True).head(int(context.capacity)).dropna()
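The capital-employed arithmetic in before_trading_start can be traced on one hypothetical row (all numbers made up):

```python
import pandas as pd

# One made-up company, mirroring the columns the pipeline produces.
df = pd.DataFrame({
    'EBIT':                [50.0],
    'total_assets':        [500.0],
    'current_liabilities': [120.0],
    'current_assets':      [80.0],    # already net of cash in the pipeline
    'intang':              [40.0],
    'cash':                [60.0],
}, index=['AAA'])

df['current_liabilities_mod'] = df[['current_liabilities', 'current_assets']].min(axis=1)
df['CapitalEmployed'] = (df['total_assets'] - df['current_liabilities_mod']
                         - df['intang'] - df['cash'])
df['ROCE'] = df['EBIT'] / df['CapitalEmployed']
print(df['ROCE'])   # 50 / (500 - 80 - 40 - 60) = 50 / 320 = 0.15625
```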

The conventional wisdom of "sell in May and go away" is partly premised on the fact that most companies' fiscal years end in December. Results come out in January, and the markets digest those results in February. Companies that did well are rewarded with higher share prices, while those that didn't get lower share prices. Conventional wisdom is that good stocks are often at their peak by May while poorer-performing stocks are at their lows. Hence, 'sell in May' assumes you had some good stocks, so sell at their peak.

Consider what this strategy does. It buys 'good stocks' and sells 'bad stocks'. After earnings season, during the months of May through the summer, the 'good stocks' are at their highs while 'bad stocks' are at their lows. By rebalancing during these months the algo effectively buys 'high' and sells 'low'. Not the formula for a winning strategy.

However, consider moving that rebalancing forward before earnings season. The 'good stocks' haven't been run up and the 'bad stocks' haven't been dragged down. The algo, as noted, uses annual fundamental data for the most part. The picks for 'good' and 'bad' stocks therefore won't change a lot whether before or after earnings season. The algo will probably be trading about the same stocks. That hasn't changed. What has changed? The price. By trading before earnings season one has a fighting chance at buying low and selling high. That is a much better formula for a winning strategy.

All of this is, of course, a generalization, and there are many exceptions. But stock valuations, on average, are not random, with the highest and lowest valuations occurring December through February. Check out the attached notebook showing the number of companies reaching their min and max PE by month over the 10 years 2010-2019.
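The notebook itself isn't shown here, but the per-month tally it describes can be sketched with pandas on hypothetical PE data (random numbers, made-up tickers):

```python
import numpy as np
import pandas as pd

# Hypothetical month-end PE ratios for three tickers over 2010-2019.
rng = np.random.default_rng(0)
idx = pd.date_range('2010-01-01', periods=120, freq='MS')
pe = pd.DataFrame(rng.uniform(5, 40, (len(idx), 3)),
                  index=idx, columns=['AAA', 'BBB', 'CCC'])

# Date of each ticker's yearly minimum PE, then a tally by calendar month.
yearly_min = pe.groupby(pe.index.year).idxmin()
counts = (yearly_min.stack()
          .dt.month.value_counts()
          .reindex(range(1, 13), fill_value=0))
print(counts)   # 10 years x 3 tickers = 30 yearly minima spread over 12 months
```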


@Dan,

Why is there only 10 bars on the histogram?

@Vladimir Good catch regarding the number of bars. I got lazy and used the default bins=10 for the hist method. Setting bins=12 is more appropriate. Doing that also changes the graph so it no longer shows such pronounced peaks in January; it still shows definite variation, just less marked. Again, good catch.

Attached is an updated notebook.


@Dan,

I created "Seasonality of min and max monthly return" using your notebook.
Hope it will be useful.

Please respond to my request.
