Ranking and Trading on "Days to Cover"

At our meetups in NYC and Boston this month, Jess Stauth explained how short interest can be used as a signal for investing. She has developed a very refined model at Thomson Reuters, and we collaborated on a simplified version here.

First I built a CSV file of "days to cover" data from the Nasdaq site, and threw it into S3. "Days to cover" is the number of average-volume days it would take to cover the total short interest in a security.
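For the curious, the computation behind the metric is simple; here's a minimal sketch (the function and the example numbers are mine, not from the Nasdaq file):

def days_to_cover(short_interest, avg_daily_volume):
    """Days of average volume needed to buy back all shorted shares."""
    dtoc = float(short_interest) / avg_daily_volume
    # NASDAQ reports a floor of 1, so anything under a day shows up as 1
    return max(dtoc, 1.0)

# e.g. 12M shares short against 3M shares/day of volume -> 4 days to cover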

I coded the attached algorithm, which chooses a universe of stocks and ranks them by the days to cover data. The algorithm looks for changes in the top and bottom deciles of the days to cover ranking, and rebalances the portfolio to an equal dollar investment long in the top decile and short in the bottom decile.

In addition to the algorithm returns and benchmark returns, I recorded four intermediate values:

  • decile_count -- the number of stocks in a decile, to gauge the number of positions we're building in the portfolio
  • top_turnover -- the percentage change in the top decile; only updated when the ranking changes.
  • bottom_turnover -- the percentage change in the bottom decile; only updated when the ranking changes.
  • dtoc_is_one_count -- NASDAQ doesn't report less than 1 day to cover, so this is a check to see how many stocks in the ranking are hitting the floor.

I also chose to rebalance whenever the deciles change, rather than on a monthly or bi-monthly schedule. The rebalance is incremental, so the transaction costs shouldn't be too onerous.

I'd love to see this ported to run on minutely data, to use the Quandl API, or to rebalance on a monthly schedule rather than in an "event-driven" mode. Feedback is appreciated!
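For the monthly variant, the scheduling check could be as simple as this sketch (mine, untested) inside handle_data:

def handle_data(context, data):
    # sketch only: trade on the first bar of each new month instead of
    # on every ranking change. context.last_month is a name I made up.
    today = data[data.keys()[0]].datetime
    if getattr(context, 'last_month', None) != today.month:
        context.last_month = today.month
        rebalance(context, data)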

thanks,
fawce

aside: I mentioned my script to the guys at Quandl and they offered to put it into their mill, so now you can get short interest through their lovely API and never worry about updating the results. However, I haven't yet figured out how to get the data from Quandl for a wide swath of stocks.
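If you want to experiment in the meantime, pulling a single ticker looks roughly like this; the dataset code and column names below are my guesses, so check Quandl's listings for the real ones:

import pandas as pd

def fetch_short_interest(ticker, auth_token):
    # hypothetical dataset code -- verify on quandl.com before using
    url = ('http://www.quandl.com/api/v1/datasets/'
           'SI/{t}_SI.csv?auth_token={k}'.format(t=ticker, k=auth_token))
    return pd.read_csv(url, parse_dates=['Date'], index_col='Date')

# fetching a wide swath of stocks would mean looping this over a symbol
# list, one request per ticker -- exactly the gap mentioned above.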

def time_lag(df):
    # data from the exchanges is delayed by 8 business days
    # using pandas' tshift to lag the index by 8 business days
    df = df.tshift(8, freq='b')
    return df

def initialize(context):
    # TODO: this is pulling a static scrape of the data,
    # need to figure out how to pull a large universe from
    # quandl
    fetch_csv(
        'https://s3.amazonaws.com/quantopian-production/data/dtoc_history.csv',
        date_column='date',
        date_format='%m/%d/%Y',
        post_func=time_lag,
        mask=True,
    )
    # set the second decile of stocks by dollar volume traded
    # as our universe. This universe updates quarterly.
    set_universe(universe.DollarVolumeUniverse(80, 90))
    # assume a $0.005 per-share commission
    set_commission(commission.PerShare(cost=0.005))
    # we're using pretty liquid stocks, so volume share will impose a 
    # fairly small impact.
    set_slippage(slippage.VolumeShareSlippage(volume_limit=0.25, price_impact=0.1))
    # we run with a 50k initial cash balance, and we expect about 50 stocks in each
    # decile because the 10% universe above has about 500 members.
    # To limit total leverage to about 2x, we limit individual positions to
    # $1k notional value.
    context.position_limit = 1000
    context.top = None
    context.bottom = None
    
def handle_data(context, data):
    # get a map of sid -> days to cover. get_dtoc is defined below
    # it has a few guards on accessing the days to cover for the stock
    # codenote: this is a python dictionary comprehension.
    dtoc_map = {int(sid): get_dtoc(value) for sid, value in data.iteritems()}
    
    # cull the sids to the stocks in data AND with days to cover data.
    # codenote: python's builtin filter keeps only the items that pass the test.
    context.sids = data.keys()
    filtered = filter(lambda x: dtoc_map[x] is not None, context.sids)
    
    # calculate the count of items in a decile and plot it with record
    decile = len(filtered)/10
    record(decile_count=decile)
    
    # only proceed if we have a non-zero decile count
    if decile > 0:
        
        # rank the sids by days to cover, lowest to highest
        ranking = sorted(filtered, key=lambda x: dtoc_map[x])
        
        # find all the sids with a days to cover of one. this is interesting
        # because the NASDAQ reports the data with a floor of 1.
        ones = filter(lambda x: data[x]['days_to_cover'] == 1, ranking)
        record(dtoc_is_one_count=len(ones))
        
        # record the maximum days to cover (i.e. greatest short interest)
        record(highest_dtoc=data[ranking[-1]]['days_to_cover'])
        
        # slice the ranking to get the top and the bottom
        # codenote: python supports negative indexes, which count back 
        # from the end of the list.
        top = set(ranking[:decile])
        bottom = set(ranking[-1*decile:])
        
        if context.top != top or context.bottom != bottom:
            # turnover is defined as the number of stocks entering or leaving the
            # decile. it is useful as a way to gauge the stability of our ranking.
            # turnover for top and bottom is recorded into a time-series plot
            record(bottom_turnover=calculate_turnover(bottom, context.bottom))
            record(top_turnover=calculate_turnover(top, context.top))
            
            log.info("Rebalancing on {d}".format(d=data[data.keys()[0]].datetime))
            context.top = top
            context.bottom = bottom
            rebalance(context, data)

def calculate_turnover(curr, orig):
    """
    turnover reflects the churn in the decile from orig to curr.
    """
    if not curr or not orig:
        return 0.0
    
    # python sets overload the - operator as set difference
    adds = curr - orig
    drops = orig - curr
    change = float(len(adds) + len(drops)) / len(orig) * 100.0
    return change
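# codenote (illustrative example, mine): with orig = {A, B, C, D} and
# curr = {A, B, C, E}, adds = {E} and drops = {D}, so the turnover is
# (1 + 1) / 4 * 100 = 50.0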
    
def get_dtoc(siddata):
    """
    The source data for days to cover has holes in the coverage. 
    Some stocks are not covered, and others have missing days. Here
    we have a convenience function to guard against missing days_to_cover properties.
    """
    if 'days_to_cover' in siddata:
        return siddata.days_to_cover
    else:
        return None    
 
def rebalance(context, data):
    """
    Rebalance the portfolio based on contents of the current portfolio,
    the top decile, and the bottom decile. top, bottom, and portfolio 
    are all expected to be in context. 

    The rebalance strategy is to calculate a difference between the
    current position and a desired position.
    It then places an order to make that adjustment.
    This should serve to avoid needless portfolio churn and 
    transaction fees.
    """
    for sid in context.sids:
        if sid not in context.top and sid not in context.bottom:
            pos = context.portfolio.positions[sid]  
            if pos.amount != 0:
                order(sid, -1 * pos.amount)
        
        if sid in context.top:
            order_position(context, sid, data[sid], 1)
        if sid in context.bottom:
            order_position(context, sid, data[sid], -1)
        
       
def order_position(context, sid, event, direction):
    """
    Helper function to calculate the order size needed.
    """
    if event.price > 0:
        cur_pos = context.portfolio.positions[sid]
        desired_amount = context.position_limit / event.price * direction
        order_amount = desired_amount - cur_pos.amount
        order(sid, order_amount)
            

5 responses

Hey John:

Quandl is a few weeks away from being able to take a list of tickers in the API, so you can download an entire exchange if you want to. End of month, maybe…

T.

Thanks Tammer! I'll keep an eye out for it.
thanks,
fawce

I'm keeping an eye out for selecting the universe using a ranking :)

Just a quick update, I converted this algo to use the new order_target_value method. The new method is a bit more savvy with building the positions than my original, so the orders placed and the resulting performance differ slightly.

thanks,
fawce

def time_lag(df):
    # data from the exchanges is delayed by 8 business days
    # using pandas' tshift to lag the index by 8 business days
    df = df.tshift(8, freq='b')
    return df

def initialize(context):
    # TODO: this is pulling a static scrape of the data,
    # need to figure out how to pull a large universe from
    # quandl
    fetch_csv(
        'https://s3.amazonaws.com/quantopian-production/data/dtoc_history.csv',
        date_column='date',
        date_format='%m/%d/%Y',
        post_func=time_lag,
        mask=True,
    )
    # set the second decile of stocks by dollar volume traded
    # as our universe. This universe updates quarterly.
    set_universe(universe.DollarVolumeUniverse(80, 90))
    # assume a $0.005 per-share commission
    set_commission(commission.PerShare(cost=0.005))
    # we're using pretty liquid stocks, so volume share will impose a 
    # fairly small impact.
    set_slippage(slippage.VolumeShareSlippage(volume_limit=0.25, price_impact=0.1))

    # we run with a 50k initial cash balance, and we expect about 50 stocks in each
    # decile because the 10% universe above has about 500 members.
    # To limit total leverage to about 2x, we limit individual positions to
    # $1k notional value.
    context.position_limit = 1000
    context.top = None
    context.bottom = None
    
def handle_data(context, data):
    # get a map of sid -> days to cover. get_dtoc is defined below
    # it has a few guards on accessing the days to cover for the stock
    # codenote: this is a python dictionary comprehension.
    dtoc_map = {int(sid): get_dtoc(value) for sid, value in data.iteritems()}
    
    # cull the sids to the stocks in data AND with days to cover data.
    # codenote: python's builtin filter keeps only the items that pass the test.
    context.sids = data.keys()
    filtered = filter(lambda x: dtoc_map[x] is not None, context.sids)
    
    # calculate the count of items in a decile and plot it with record
    decile = len(filtered)/10
    record(decile_count=decile)
    
    # only proceed if we have a non-zero decile count
    if decile > 0:
        
        # rank the sids by days to cover, lowest to highest
        ranking = sorted(filtered, key=lambda x: dtoc_map[x])
        
        # find all the sids with a days to cover of one. this is interesting
        # because the NASDAQ reports the data with a floor of 1.
        ones = filter(lambda x: data[x]['days_to_cover'] == 1, ranking)
        record(dtoc_is_one_count=len(ones))
        
        # record the maximum days to cover (i.e. greatest short interest)
        record(highest_dtoc=data[ranking[-1]]['days_to_cover'])
        
        # slice the ranking to get the top and the bottom
        # codenote: python supports negative indexes, which count back 
        # from the end of the list.
        top = set(ranking[:decile])
        bottom = set(ranking[-1*decile:])
        
        if context.top != top or context.bottom != bottom:
            # turnover is defined as the number of stocks entering or leaving the
            # decile. it is useful as a way to gauge the stability of our ranking.
            # turnover for top and bottom is recorded into a time-series plot
            record(bottom_turnover=calculate_turnover(bottom, context.bottom))
            record(top_turnover=calculate_turnover(top, context.top))
            
            log.info("Rebalancing on {d}".format(d=data[data.keys()[0]].datetime))
            context.top = top
            context.bottom = bottom
            rebalance(context, data)

def calculate_turnover(curr, orig):
    """
    turnover reflects the churn in the decile from orig to curr.
    """
    if not curr or not orig:
        return 0.0
    
    # python sets overload the - operator as set difference
    adds = curr - orig
    drops = orig - curr
    change = float(len(adds) + len(drops)) / len(orig) * 100.0
    return change
    
def get_dtoc(siddata):
    """
    The source data for days to cover has holes in the coverage. 
    Some stocks are not covered, and others have missing days. Here
    we have a convenience function to guard against missing days_to_cover properties.
    """
    if 'days_to_cover' in siddata:
        return siddata.days_to_cover
    else:
        return None    
 
def rebalance(context, data):
    """
    Rebalance the portfolio based on contents of the current portfolio,
    the top decile, and the bottom decile. top, bottom, and portfolio 
    are all expected to be in context. 

    The rebalance strategy is to calculate a difference between the
    current position and a desired position.
    It then places an order to make that adjustment.
    This should serve to avoid needless portfolio churn and 
    transaction fees.
    """
    
    for sid in context.sids:
        if sid not in context.top and sid not in context.bottom:
            order_target_value(sid, 0)
        
        if sid in context.top:
            order_target_value(sid, context.position_limit)
        if sid in context.bottom:
            order_target_value(sid, -1*context.position_limit)
            

Re minutely data: how would that be useful, since short data is only updated once every two weeks (http://www.nasdaqtrader.com/trader.aspx?id=shortintpubsch)? Or is minutely just for comparison to the index?

Also, an interesting extension would be to include shorting costs. Many academic papers have pointed to the omission of shorting costs as a distortion in backtested returns, though you have chosen fairly liquid securities for this strategy.
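A back-of-the-envelope deduction could look like this sketch (my own, assuming a flat annualized borrow rate; hard-to-borrow names cost far more):

def daily_borrow_cost(short_notionals, annual_rate=0.02, trading_days=252):
    # short_notionals: absolute dollar values of the short positions
    # the 2% flat rate is purely illustrative
    return sum(n * annual_rate / trading_days for n in short_notionals)

# e.g. fifty $1k shorts at 2% borrow: 50000 * 0.02 / 252, about $3.97/day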

Finally, am I correct in assuming that you are going long the stocks that have the highest days to cover (i.e. the most shorted stocks) and short the ones that have the lowest days to cover (lines 123 to 126)? The blog post that you refer to (and my knowledge of academic papers on this topic) indicates that we should be doing the opposite?

Quote: Aggregated open short interest level provides a profitable, low turnover signal rooted in buy-side sentiment, aka “the smart money.”

Perhaps I'm reading the code wrong -- I'm new to Python.