Backtest with Accern's ML-Driven DS2 Dataset to Generate a Daily Strategy with a Sharpe Ratio of 3, Including Trading Costs

The strategy is built upon Accern's proprietary DS2 dataset. DS2 is a predictive analytics dataset with daily frequency and S&P 500 universe coverage, generated by the second version of our predictive models.

Accern's proprietary data analytics pipeline (TITAN) monitors and processes millions of online stories every day, quantifying the most relevant and impactful stories into 60+ metrics. These low-latency, real-time analytics are made available in Accern's News Analytics Firehose. Accern's powerful AI engine (XYME) then undertakes the difficult task of analyzing the firehose and generating predictive analytics for a wide range of use cases, ranging from quantitative trading to fundamental analysis and market intelligence. The predictive analytics dataset (DS2) is also generated through this same process.

Strategy Setup

Backtest period: October 30, 2014 to March 28, 2018
Universe Components: S&P 500 stocks
Benchmark: SPDR S&P 500 Trust ETF (SPY)
Rebalance Frequency: Daily
Direction Style: Long/Short
Commissions and slippage costs: 1 bps, volume limit = 10%
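
In the Quantopian backtester, these cost assumptions correspond to the slippage and commission settings used in the attached algorithm; a quick sketch of that setup:

def initialize(context):
    set_benchmark(sid(8554))  # SPY
    # 1 bps of slippage per fill, with fills capped at 10% of each bar's volume
    set_slippage(slippage.FixedBasisPointsSlippage(basis_points=1, volume_limit=0.1))
    # Commissions are set to zero, so the 1 bps slippage carries the total trading cost
    set_commission(commission.PerShare(cost=0, min_trade_cost=0))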

Signal-Building

The core strategy logic is to integrate two layers of signals to select a set of stocks each day and determine the long or short positions to execute.

The first layer of signals is built as follows:

Step 1: Calculate rolling 21-day mean values of the DS2 score for each stock, as S_21;
Step 2: Calculate rolling 63-day mean values of the DS2 score for each stock, as S_63;
Step 3: Calculate values of S_21 - S_63 for each stock, as S_crossover;
Step 4: If a stock’s S_crossover > 0, we put it into the long position list, as long_1;
Step 5: If a stock’s S_crossover < 0, we put it into the short position list, as short_1.

The second layer of signals is built as follows:

Step 1: Rank stocks' DS2 scores on a daily basis;
Step 2: Select the 100 stocks with the highest scores and put them into long_2;
Step 3: Select the 100 stocks with the lowest scores and put them into short_2.

Eventually, we take the intersection of long_1 and long_2 as the finalized set of stocks to go long that day, marked as final_long. We then take the intersection of short_1 and short_2 as the finalized set of stocks to go short that day, marked as final_short.
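
A minimal pandas sketch of the whole signal-building step is shown below, assuming a hypothetical DataFrame of daily DS2 scores indexed by date with one column per ticker (names and layout are illustrative, not the fetcher-based implementation in the algorithm further down):

import pandas as pd

def build_signals(scores, top_n=100):
    # scores: DataFrame of daily DS2 scores, indexed by date, one column per ticker (assumed layout).
    # Assumes at least 2*top_n names have scores on the most recent day.
    s_21 = scores.rolling(21).mean()          # Step 1: 21-day rolling mean, S_21
    s_63 = scores.rolling(63).mean()          # Step 2: 63-day rolling mean, S_63
    s_crossover = s_21 - s_63                 # Step 3: S_crossover = S_21 - S_63

    today = scores.index[-1]
    today_cross = s_crossover.loc[today]
    long_1 = set(today_cross[today_cross > 0].index)     # Step 4: positive crossover
    short_1 = set(today_cross[today_cross < 0].index)    # Step 5: negative crossover

    # Second layer: rank today's raw scores and take the top/bottom top_n names
    ranked = scores.loc[today].dropna().sort_values(ascending=False)
    long_2 = set(ranked.index[:top_n])
    short_2 = set(ranked.index[-top_n:])

    # Final lists: intersection of the two layers
    final_long = long_1 & long_2
    final_short = short_1 & short_2
    return final_long, final_short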

Execution:

Execution/Rebalance Time: Prior to the market open on each trading day, we finish the signal-building process outlined above and have the final long and short stock lists prepared. Within five minutes of the market open, we start executing/rebalancing trades and positions.

Position Control:

Before every rebalance, we calculate the target single-position size for that trading day and adjust/rebalance according to the calculated long_size and short_size:

portfolio_leverage = 1
total_position_size = length of (final_long + final_short)
long_size = short_size = portfolio_leverage / total_position_size

Here, long_size and short_size are percentages of the current portfolio's net asset value. The purpose of this position-control process is to keep the strategy's gross leverage at around 1, while keeping the net dollar exposure to the long or short side as small as possible.
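
As a small sketch of that sizing rule (final_long and final_short being the lists from the signal-building step):

def position_sizes(final_long, final_short, portfolio_leverage=1.0):
    # Every position, long or short, gets an equal slice of the target gross leverage
    total_position_size = len(final_long) + len(final_short)
    long_size = short_size = portfolio_leverage / total_position_size
    return long_size, short_size

For example, with 50 longs and 50 shorts each position targets 1% of the portfolio's NAV, so gross leverage stays near 1 and the long and short dollar exposures roughly cancel.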

Rebalance:

Upon each rebalance, there are three execution rules (a condensed code sketch follows the list):

Enter new positions: Based on final_long and final_short signals, we enter stocks if they are not in the current portfolio.
Exit existing positions: We exit an existing position if the stock is in neither final_long nor final_short, provided we have DS2 signal data for that stock on that day. If we don't have DS2 signal data for the stock on that day and it is still in our portfolio, we leave the position unchanged.
Reverse long/short position directions: If a stock is in the portfolio and the latest signal indicates the opposite direction, we reverse the long position to a short position, or vice versa.
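
Here is the condensed sketch of these three rules (order_target_percent is a Quantopian built-in; final_long, final_short, has_signal_today, long_size and short_size are assumed inputs from the steps above; the full implementation is in the attached algorithm):

def rebalance_sketch(context, final_long, final_short, has_signal_today, long_size, short_size):
    # Rules for names already held
    for stock, pos in context.portfolio.positions.items():
        if stock not in has_signal_today:
            continue                                   # no DS2 data today: leave the position unchanged
        if stock in final_short and pos.amount > 0:
            order_target_percent(stock, -short_size)   # reverse long -> short
        elif stock in final_long and pos.amount < 0:
            order_target_percent(stock, long_size)     # reverse short -> long
        elif stock not in final_long and stock not in final_short:
            order_target_percent(stock, 0)             # exit: no longer signaled either way
    # Enter brand-new positions
    for stock in final_long:
        if context.portfolio.positions[stock].amount == 0:
            order_target_percent(stock, long_size)
    for stock in final_short:
        if context.portfolio.positions[stock].amount == 0:
            order_target_percent(stock, -short_size)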

About Accern

Accern is a data design startup that provides predictive news analytics solutions to companies to assist with intelligent data-driven decisions. Each day, we monitor billions of websites, extract numerous insights per story, and create tailored predictive news analytics solutions for our clients with flexible delivery options. Accern recently made Forbes 30 Under 30 for 2018 under Enterprise Technologies.

For more information, please visit www.accern.com

Clone Algorithm (79)
# Uniqueness for this template:

# Allows us to reverse longs to shorts directly
# Ignore stocks with no signals on some days which are already in our portfolio




import pandas as pd
import numpy as np

# data_file_sample = 'https://dl.dropboxusercontent.com/s/yzihbscejxr5fvy/6_5_DS2_daily_agg_mean_SMA_21_63_part_1_sample_shorter_for_QA.csv?dl=0'

data_file_part_1 = 'https://dl.dropboxusercontent.com/s/bg3po7nk8nsez7d/6_4_DS2_daily_agg_mean_SMA_21_63_part_1.csv?dl=0'

data_file_part_2 = 'https://dl.dropboxusercontent.com/s/nbppgxkr59q4lz7/6_4_DS2_daily_agg_mean_SMA_21_63_part_2.csv?dl=0'


num_stocks_to_trade = 100
MAX_GROSS_EXPOSURE = 1.0
MAX_SHORT_POSITION_SIZE = 0.1
MAX_LONG_POSITION_SIZE = 0.1
single_metric_string = 'rm_signal'
score_string = 'score'
long_signal = 'long'
short_signal = 'short'
long_half= 0.5
short_half = 0.5


data_take=0
k=0
def merge_data(df):
    global data_take
    global k
    if k==0:
        data_take=df
        k=1
    else:
        frames=[data_take,df]
        data_take=pd.concat(frames)
    return data_take



def initialize(context):
    context.num_stocks_to_trade = num_stocks_to_trade
    set_benchmark(sid(8554))
    set_slippage(slippage.FixedBasisPointsSlippage(basis_points=1, volume_limit=0.1))
    set_commission(commission.PerShare(cost=0, min_trade_cost=0))
    
    schedule_function(my_rebalance,
                      date_rules.every_day(),
                      time_rules.market_open(hours = 0, minutes = 5))

    schedule_function(record_vars,
                      date_rules.every_day(),
                      time_rules.market_close(hours = 0, minutes = 1))   

    # fetch_csv(data_file_sample,
    #       date_column='date',  # Assigning the column label
    #       date_format='%m/%y/%d %H:%M:',  # Using date format from CSV file
    #       mask=False,
    #       # post_func = merge_data,
    #       timezone='EST')   
    
    fetch_csv(data_file_part_1,
          date_column='date',  # Assigning the column label
          date_format='%m/%y/%d %H:%M:',  # Using date format from CSV file
          mask=False,
          post_func = merge_data,
          timezone='EST')

    fetch_csv(data_file_part_2,
          date_column='date',  # Assigning the column label
          date_format='%m/%y/%d %H:%M:',  # Using date format from CSV file
          mask=False,
          post_func = merge_data,
          timezone='EST')
     


def my_rebalance(context, data):  
   
    
    rm_signals = []
    current_date = get_datetime().date()
    for stock in data.fetcher_assets:
        enter_date = data.current(stock, 'enter_date')
        enter_date = pd.to_datetime(enter_date).date()
        if current_date.strftime('%Y-%m-%d %H:%M:%S') == enter_date.strftime('%Y-%m-%d %H:%M:%S'):
            rm_signal = data.current(stock, single_metric_string)
            score = data.current(stock, score_string)
            rm_signals.append([stock, score, rm_signal])
            
    df = pd.DataFrame(rm_signals, columns = ['stock', score_string, single_metric_string])
    df['rank'] = df[score_string].rank(ascending = False)
    df = df.sort_values('rank')
    
    # Firstly, whether go long or short depends on the string signal: 'long' or 'short' on the 'rm_signal' column: 
    longs_1 = df[df[single_metric_string]==long_signal]['stock'].tolist()
    shorts_1 = df[df[single_metric_string]==short_signal]['stock'].tolist()
    
    # Secondly, filter out the highest scores for going long and lowest scores for going short
    num_stocks_to_trade = context.num_stocks_to_trade
    if len(df) < 2.0*num_stocks_to_trade:
        num_stocks_to_trade = int(len(df)/2.0)
        # print ('df capacity is smaller than num_stocks_to_trade, opps, actual trade num is {}'.format(num_stocks_to_trade))
        longs_2  = df[:num_stocks_to_trade]['stock'].tolist()
        shorts_2 = df[-num_stocks_to_trade:]['stock'].tolist()
    else:
        longs_2  = df[:num_stocks_to_trade]['stock'].tolist()
        shorts_2 = df[-num_stocks_to_trade:]['stock'].tolist()    
        # print ('df capacity is larger than num_stocks_to_trade, cool, actual trade num is {}'.format(num_stocks_to_trade))    
    
    # Lastly, take the intersection of the two sets of long and short positions: 
    longs = list(set(longs_1) & set(longs_2))
    shorts = list(set(shorts_1) & set(shorts_2))

    print ('longs_all_length: ', len(longs))
    print ('shorts_all_length: ', len(shorts))   
    # print ('Portfolio Value: ', context.portfolio.portfolio_value)
   
    
    
    # Define long_size and short_size in advance:
    # if context.account.leverage >1:
    
    total_potential_positions = list(set(longs + shorts))
    total_potential_size = len(total_potential_positions)
    print ('Current position size: ', len(context.portfolio.positions))
    print ('Long/Short Signal Size: ', len(longs + shorts))
    print ('Total_potential_size: ', total_potential_size)
    long_size = 1.0/total_potential_size
    short_size = 1.0/total_potential_size
        

    
    # print ('Long half for today: ', long_half)
    # print ('Short half for today: ', short_half)

    # try:
    #     long_size = long_half/len(longs)
    #     if long_size > MAX_LONG_POSITION_SIZE:
    #         long_size = MAX_LONG_POSITION_SIZE
            
        
    #     short_size = short_half/len(shorts)
    #     if short_size > MAX_SHORT_POSITION_SIZE:
    #         short_size = MAX_SHORT_POSITION_SIZE    
    # except:
    #     pass
       
    print ('Long size for today: ', long_size)
    print ('Short size for today: ', short_size)
    ################ Execution Logic 1 : Flipping/reversing long/short positions: ################
    try:
        for position in context.portfolio.positions:
            if context.portfolio.positions[position].amount>0 and position in shorts:
                order_target_percent(position, -short_size)
        
            if context.portfolio.positions[position].amount<0 and position in longs:
                order_target_percent(position, long_size)
                
    ################ Execution logic 2: Exiting ################
            if position not in longs and position not in shorts and position in df['stock'].tolist():
                order_target_percent(position, 0)             

    ################ Execution Logic 3: Ignoring stocks with no signals which are already in portfolio: ################
            if position not in df['stock'].tolist():
                # No signal today for this stock; keep the existing position unchanged
                print ('This stock is still in our position, but no signals today, do nothing', position)
            
    except:
        pass

    ################ Execution Logic 4: Entering new long/short stocks: ################
    
    # Enter new longs:
    for stock in longs:
            # if not get_open_orders(stock):
        if data.can_trade(stock) and \
        context.portfolio.positions[stock].amount == 0:
            order_target_percent(stock, long_size)     
              
    # Enter new shorts:
    for stock in shorts:
            # if not get_open_orders(stock):
        if data.can_trade(stock) and \
        context.portfolio.positions[stock].amount == 0:
            order_target_percent(stock, -short_size)

           
def record_vars(context, data):
    long_count = 0
    short_count = 0
    for position in context.portfolio.positions.itervalues():
        if position.amount > 0:
            long_count += 1
        elif position.amount < 0:
            short_count += 1
    record(gross_leverage=context.account.leverage)  # Plot leverage to chart
    record(net_leverage=context.account.net_leverage)
    record(num_longs = long_count)
    record(num_shorts = short_count)
    # record(portfolio_cash = context.portfolio.cash)
    # record(capital_used = context.portfolio.capital_used)
    # record(portfolio_value = context.portfolio.portfolio_value)
    # record(positions_value = context.portfolio.positions_value)
    # record(position_portfolio_ratio = context.portfolio.positions_value/context.portfolio.portfolio_value)
    record(portfolio_size = len(context.portfolio.positions))

Here is the tearsheet of the strategy, to open up further discussion.


@Brad,

Can you please re-run the notebook tearsheet using bt.create_full_tear_sheet(round_trips=True)? This way we can see other important stats such as the percentage of profitable short and long trades. Note that gross leverage and daily turnover rates are above the Q contest thresholds, but I think this could be adjusted. Otherwise, great algo with fairly low volatility and decent returns.
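
(For reference, producing that tearsheet in a research notebook looks roughly like the sketch below; the backtest ID is a placeholder, not a real one.)

# Research-notebook sketch: load a backtest and render the full Pyfolio tearsheet,
# including round-trip statistics. Replace the ID with your own backtest's ID.
bt = get_backtest('<your_backtest_id>')
bt.create_full_tear_sheet(round_trips=True)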

@James,

Thanks for the feedback. Attached is the tearsheet with the round_trips = True.


Brad, I enjoy following your work with the Accern data. You could investigate using the new self-serve data feature. This will provide a few benefits to you: the data will be stored in a point-in-time manner, and the signal will be available directly in the Pipeline API, so you can easily run analyses on the data using alphalens.

Again, nice work!

Thanks,
Josh

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

@Josh,

Appreciate it! This is a great feature that I've always wanted on Quantopian.

I will definitely try it, and I believe it will make building and testing strategies driven by external data more efficient.

Thanks.
Brad

@Brad,

Thanks for the new tearsheet. Your algo has 52% profitable trades (49% on shorts and 54% on longs), overall better than random chance, so there is definitely some alpha there.

To try and fit your sentiment data into the Q contest framework, the first thing you should do is make your stock universe filterable/maskable to the Q universe, which is QTradableStocksUS. This may (or may not) have some impact on your results, since QTradableStocksUS has some limitations, as articulated here. A concrete example of these limitations is the exclusion of M&A target stocks, a source of good returns if you're on the right side. Your universe probably includes some of these M&A stocks, so excluding them from your universe will have some effect.

Secondly, you have to subject your ordering system to Q's Optimize API, which handles its specific risk constraints and thresholds.
Once you do both, you can then parametrically adjust the high turnover rate, dollar neutrality, gross leverage and other specifics to conform to Q's framework.

I encourage you to try to upload Accern sentiment data to the new self-serve data feature as Josh suggested. I tried it with my predictive SPY signal here and it worked fine. Accern could be a premium data provider for Q if it passes all the requirements, I would say.

@James,

Appreciate such detailed and helpful comments and suggestions. In fact, I have previously used Accern's DS2 dataset along with Quantopian's optimization package to generate some pretty promising strategies that meet all of the requirements except for the tradable universe. By the way, DS2's universe is S&P 500 companies.

I've attached the strategies here. Feel free to test and run them on your side. In the meantime, I will try to figure out how to meet the last requirement using the existing strategy template and universe, but any help with it would be much appreciated!

This is a strategy built on a weekly-mean aggregation of the DS2 dataset (which is originally daily). I didn't utilize the self-serve data feature when I was developing this strategy, but going forward I will integrate that feature with Accern datasets/files.

Clone Algorithm (7)
import quantopian.optimize as opt
import quantopian.algorithm as algo
import pandas as pd

# data_file = 'https://dl.dropboxusercontent.com/s/ti33mv9icd8m1td/4_27_ds2_small_sample_for_DS2.csv?dl=0'
# data_file = 'https://dl.dropboxusercontent.com/s/cuas96rydjgao2l/4_25_trimmed_gtm_official_DS2_for_Q.csv?dl=0'
data_file = 'https://dl.dropboxusercontent.com/s/57d3j23kzdgut3g/5_1_weekly_mean_score_on_DS2_bmonday_fixed_v2_for_Q.csv?dl=0'

#Set Maximum portfolio Leverage
MAX_GROSS_EXPOSURE = 1
#Set Maximum Position sizes for individual longs and shorts
MAX_SHORT_POSITION_SIZE = 0.02
MAX_LONG_POSITION_SIZE = 0.02
single_metric_string = 'score'



def initialize(context):
    
    #Slippage and Commission model
    # set_slippage(slippage.FixedSlippage(spread=0))
    set_slippage(slippage.FixedBasisPointsSlippage(basis_points=1, volume_limit=0.1))
    # set_commission(commission.PerShare(cost=0.001, min_trade_cost=1.00))
    
    schedule_function(morning_execution,
                      date_rules.week_start(),
                      time_rules.market_open(hours = 0, minutes = 5))

    
    schedule_function(record_vars,
                      date_rules.every_day(),
                      time_rules.market_close()) 
    
    #Fetch Accern CSV data
    fetch_csv(data_file,
          date_column='next_week_start',  # Assigning the column label
          date_format='%m/%y/%d %H:%M:',  # Using date format from CSV file
          mask=False, 
          timezone='EST')
    
def before_trading_start(context, data): 
    
    current_bar = get_datetime()
    scores = []
    stocks = []
    # print(current_bar)
    for stock in data.fetcher_assets:
        if data.current(stock, 'year')  == current_bar.year and \
           data.current(stock, 'month') == current_bar.month and \
           data.current(stock, 'day')   == current_bar.day:
            stocks.append(stock)
            score = data.current(stock, single_metric_string)
            scores.append(score)
            
    df = pd.DataFrame(scores, columns=[single_metric_string], index = stocks)
    # df['ranked'] = df[single_metric_string].rank(ascending = False)
    # df = df.sort_values('ranked')


    print ('Showing df:', df)
  
    
    context.stock_list = df
    
    #Record Stocks in today's list
    # log.info(context.stock_list.index)
    print ('stockranking: ', context.stock_list.index, df[single_metric_string])
    # print ('stock lowest ranking: ', context.stock_list.index, df['combined'])
def morning_execution(context, data):
    #Set objective for our Optimizer
    context.stock_list[single_metric_string] = context.stock_list[single_metric_string].astype(float)
    objective = opt.MaximizeAlpha(context.stock_list[single_metric_string])
    #Set Constraints
    constrain_gross_leverage = opt.MaxGrossExposure(MAX_GROSS_EXPOSURE)
    market_neutral = opt.DollarNeutral(tolerance=0.08)
    
    constrain_pos_size = opt.PositionConcentration.with_equal_bounds(
        -MAX_SHORT_POSITION_SIZE,
        MAX_LONG_POSITION_SIZE,
    )
    
    #Place Orders based on our objective and constraints
    try:
        algo.order_optimal_portfolio(
            objective=objective,
            constraints=[
                constrain_gross_leverage,
                constrain_pos_size,
                market_neutral,
            ],
        )
        
    except:
        pass
    
def record_vars(context, data):
    long_count = 0
    short_count = 0
    for position in context.portfolio.positions.itervalues():
        if position.amount > 0:
            long_count += 1
        elif position.amount < 0:
            short_count += 1
    record(gross_leverage=context.account.leverage)  # Plot leverage to chart
    record(net_leverage=context.account.net_leverage)
    record(num_longs = long_count)
    record(num_shorts = short_count)
    record(portfolio_size = len(context.portfolio.positions))

This is another strategy result, built on DS2's weekly-sum aggregated dataset, which also meets all of the requirements except for the tradable universe.

Clone Algorithm (24)
import quantopian.optimize as opt
import quantopian.algorithm as algo
import pandas as pd

# data_file = 'https://dl.dropboxusercontent.com/s/ti33mv9icd8m1td/4_27_ds2_small_sample_for_DS2.csv?dl=0'
# data_file = 'https://dl.dropboxusercontent.com/s/cuas96rydjgao2l/4_25_trimmed_gtm_official_DS2_for_Q.csv?dl=0'
data_file = 'https://dl.dropboxusercontent.com/s/wh7nk5ajzietrr4/5_1_weekly_sum_score_on_DS2_bmonday_fixed_v2_for_Q.csv?dl=0'

#Set Maximum portfolio Leverage
MAX_GROSS_EXPOSURE = 1
#Set Maximum Position sizes for individual longs and shorts
MAX_SHORT_POSITION_SIZE = 0.02
MAX_LONG_POSITION_SIZE = 0.02
single_metric_string = 'score'



def initialize(context):
    
    #Slippage and Commission model
    # set_slippage(slippage.FixedSlippage(spread=0))
    set_slippage(slippage.FixedBasisPointsSlippage(basis_points=1, volume_limit=0.1))
    # set_commission(commission.PerShare(cost=0.001, min_trade_cost=1.00))
    
    schedule_function(morning_execution,
                      date_rules.week_start(),
                      time_rules.market_open(hours = 0, minutes = 5))

    
    schedule_function(record_vars,
                      date_rules.every_day(),
                      time_rules.market_close()) 
    
    #Fetch Accern CSV data
    fetch_csv(data_file,
          date_column='next_week_start',  # Assigning the column label
          date_format='%m/%y/%d %H:%M:',  # Using date format from CSV file
          mask=False, 
          timezone='EST')
    
def before_trading_start(context, data): 
    
    current_bar = get_datetime()
    scores = []
    stocks = []
    # print(current_bar)
    for stock in data.fetcher_assets:
        if data.current(stock, 'year')  == current_bar.year and \
           data.current(stock, 'month') == current_bar.month and \
           data.current(stock, 'day')   == current_bar.day:
            stocks.append(stock)
            score = data.current(stock, single_metric_string)
            scores.append(score)
            
    df = pd.DataFrame(scores, columns=[single_metric_string], index = stocks)
    # df['ranked'] = df[single_metric_string].rank(ascending = False)
    # df = df.sort_values('ranked')


    print ('Showing df:', df)
  
    
    context.stock_list = df
    
    #Record Stocks in today's list
    # log.info(context.stock_list.index)
    print ('stockranking: ', context.stock_list.index, df[single_metric_string])
    print ('testing', type(context.stock_list.score))
    # print ('stock lowest ranking: ', context.stock_list.index, df['combined'])
def morning_execution(context, data):
    #Set objective for our Optimizer
    context.stock_list[single_metric_string] = \
    context.stock_list[single_metric_string].astype(float)
    objective = opt.MaximizeAlpha(context.stock_list[single_metric_string])
    #Set Constraints
    constrain_gross_leverage = opt.MaxGrossExposure(MAX_GROSS_EXPOSURE)
    market_neutral = opt.DollarNeutral(tolerance=0.08)
    constrain_pos_size = opt.PositionConcentration.with_equal_bounds(
        -MAX_SHORT_POSITION_SIZE,
        MAX_LONG_POSITION_SIZE,
    )
    # constrain_sector_exposure = opt.NetGroupExposure.with_equal_bounds(labels, min, max)
    
    #Place Orders based on our objective and constraints
    try:
        
        algo.order_optimal_portfolio(
            objective=objective,
            constraints=[
                constrain_gross_leverage,
                constrain_pos_size,
                market_neutral,
        
            ],
        )
        
    except:
        pass
    
def record_vars(context, data):
    long_count = 0
    short_count = 0
    for position in context.portfolio.positions.itervalues():
        if position.amount > 0:
            long_count += 1
        elif position.amount < 0:
            short_count += 1
    record(gross_leverage=context.account.leverage)  # Plot leverage to chart
    record(net_leverage=context.account.net_leverage)
    record(num_longs = long_count)
    record(num_shorts = short_count)
    record(portfolio_size = len(context.portfolio.positions))

@Brad,

Thanks for the above backtests of Accern's DS2 dataset. I am very impressed with the results. In my many years of trading systems development, while I always knew that news is a significant factor in stock price movements, I often wondered how it could be combined with traditional price-action predictive models in a meaningful and compact way. Also, being an early adopter of AI in finance, I understand the AI techniques Accern is deploying: NLP for news aggregation and analytics, processed in a deep neural network for compact scoring, a novel combination. The main differentiator, then and now, is more digitized and forward-looking news data, and it will only get better as more of this news data becomes readily accessible. Although there are also challenges ahead, like "Fake News", hahaha!

I tried many different ways to filter your stock universe to conform to QTradableStocksUS programmatically, but it doesn't seem to work. I guess it may be a technical issue in mapping your stock universe to that of QTradableStocksUS, so perhaps if you upload your dataset to the Self-Serve Data feature it will automatically map your stocks. Since your stock universe is the S&P 500, the majority of which should be in QTradableStocksUS, your backtests above should hold.

@Brad: I downloaded the file used in the last backtest you shared and added it as a self-serve dataset so I could run it through a notebook and backtest. Attached is a notebook that inspects the data and runs it through a Pipeline + Alphalens (Including a QTradableStocksUS filter).

Note that you'll need to upload the data as a self-serve dataset on your own account and change the import statement in the notebook to include your user ID instead of mine.


And here's a backtest that uses the default slippage and commissions. I'm not sure if it's the exact same logic as your example, but it seems to perform well over the simulation period. It also passes the contest criteria! Anyone who wants to clone and tweak this algo will need to make their own version of the self-serve dataset and update the import statement before running it since all self-serve datasets are currently private to the account that created them.

Clone Algorithm (14)
import quantopian.algorithm as algo
import quantopian.optimize as opt

from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.filters import QTradableStocksUS
from quantopian.pipeline.data.user_55eefc2af269c592d80008a9 import accern_ds2_weekly_sum_hist

#Set Maximum portfolio Leverage
MAX_GROSS_EXPOSURE = 1
#Set Maximum Position sizes for individual longs and shorts
MAX_SHORT_POSITION_SIZE = 0.02
MAX_LONG_POSITION_SIZE = 0.02

def initialize(context):
    schedule_function(morning_execution,
                      date_rules.week_start(),
                      time_rules.market_open(hours = 0, minutes = 5))
    
    # Create our dynamic stock selector.
    algo.attach_pipeline(make_pipeline(), 'pipeline')


def make_pipeline():
    # Base universe set to the QTradableStocksUS
    base_universe = QTradableStocksUS()

    pipe = Pipeline(
        columns={
            'score': accern_ds2_weekly_sum_hist.score.latest,
        },
        screen=(base_universe & accern_ds2_weekly_sum_hist.score.latest.notnull())
    )

    return pipe


def before_trading_start(context, data):
    context.output = algo.pipeline_output('pipeline')


def morning_execution(context, data):
    # Set objective for our Optimizer
    objective = opt.MaximizeAlpha(context.output['score'])
    
    # Set Constraints
    constrain_gross_leverage = opt.MaxGrossExposure(MAX_GROSS_EXPOSURE)
    dollar_neutral = opt.DollarNeutral(tolerance=0.02)
    constrain_pos_size = opt.PositionConcentration.with_equal_bounds(
        -MAX_SHORT_POSITION_SIZE,
        MAX_LONG_POSITION_SIZE,
    )
    
    algo.order_optimal_portfolio(
        objective=objective,
        constraints=[
            constrain_gross_leverage,
            dollar_neutral,
            constrain_pos_size,
        ],
    )

Here's a version of the backtest with the 1bps slippage model you used in the examples above (this is a closer comparison to the version you shared using Fetcher).

Clone Algorithm (14)
import quantopian.algorithm as algo
import quantopian.optimize as opt

from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.filters import QTradableStocksUS
from quantopian.pipeline.data.user_55eefc2af269c592d80008a9 import accern_ds2_weekly_sum_hist

#Set Maximum portfolio Leverage
MAX_GROSS_EXPOSURE = 1
#Set Maximum Position sizes for individual longs and shorts
MAX_SHORT_POSITION_SIZE = 0.02
MAX_LONG_POSITION_SIZE = 0.02

def initialize(context):
    set_slippage(slippage.FixedBasisPointsSlippage(basis_points=1, volume_limit=0.1))
    
    schedule_function(morning_execution,
                      date_rules.week_start(),
                      time_rules.market_open(hours = 0, minutes = 5))
    
    # Create our dynamic stock selector.
    algo.attach_pipeline(make_pipeline(), 'pipeline')


def make_pipeline():
    # Base universe set to the QTradableStocksUS
    base_universe = QTradableStocksUS()

    pipe = Pipeline(
        columns={
            'score': accern_ds2_weekly_sum_hist.score.latest,
        },
        screen=(base_universe & accern_ds2_weekly_sum_hist.score.latest.notnull())
    )

    return pipe


def before_trading_start(context, data):
    context.output = algo.pipeline_output('pipeline')


def morning_execution(context, data):
    # Set objective for our Optimizer
    objective = opt.MaximizeAlpha(context.output['score'])
    
    # Set Constraints
    constrain_gross_leverage = opt.MaxGrossExposure(MAX_GROSS_EXPOSURE)
    dollar_neutral = opt.DollarNeutral(tolerance=0.02)
    constrain_pos_size = opt.PositionConcentration.with_equal_bounds(
        -MAX_SHORT_POSITION_SIZE,
        MAX_LONG_POSITION_SIZE,
    )
    
    algo.order_optimal_portfolio(
        objective=objective,
        constraints=[
            constrain_gross_leverage,
            dollar_neutral,
            constrain_pos_size,
        ],
    )

One thing to note about self-serve data - the scores will be lagged by 1 day from the date provided as the Primary Date when you upload a dataset using the Self-Serve Data tool. If you want your algorithm to be able to trade on the date that you use in this column, you should move all of the dates back by 1 day before uploading. It's on our list to make the 1 day lag configurable when uploading a dataset. I'm thinking this might be one of the reasons that there's a bit of difference between the Fetcher and Self-Serve versions.

Lastly, here are the Pyfolio tearsheets. This is the tearsheet for the first backtest that I shared with default slippage (the slippage that is used in the contest).


And this is the tearsheet from the version using 1bps slippage.


@Jamie,

One thing to note about self-serve data - the scores will be lagged by 1 day from the date provided as the Primary Date when you upload a dataset using the Self-Serve Data tool. If you want your algorithm to be able to trade on the date that you use in this column, you should move all of the dates back by 1 day before uploading. It's on our list to make the 1 day lag configurable when uploading a dataset. I'm thinking this might be one of the reasons that there's a bit of difference between the Fetcher and Self-Serve versions.

This info should be highlighted because it does make a difference. Please correct me if I'm wrong; my understanding is that a signal calculated at the end of the day on, say, January 2 should be dated January 3, the day it is supposed to be traded?

I made a small change to the original algorithm to turn off signal layer 1 (which only allowed trades that favored positive momentum of the Accern news signal) to see how the results would change. The change is at line 104; everything else should be the same. @Brad, I'm curious whether you got a similar result, and what the intuition is behind the RM signal.

Clone Algorithm (11)
# Uniqueness for this template:

# Allows us to reverse longs to shorts directly
# Ignore stocks with no signals on some days which are already in our portfilio




import pandas as pd
import numpy as np

# data_file_sample = 'https://dl.dropboxusercontent.com/s/yzihbscejxr5fvy/6_5_DS2_daily_agg_mean_SMA_21_63_part_1_sample_shorter_for_QA.csv?dl=0'

data_file_part_1 = 'https://dl.dropboxusercontent.com/s/bg3po7nk8nsez7d/6_4_DS2_daily_agg_mean_SMA_21_63_part_1.csv?dl=0'

data_file_part_2 = 'https://dl.dropboxusercontent.com/s/nbppgxkr59q4lz7/6_4_DS2_daily_agg_mean_SMA_21_63_part_2.csv?dl=0'


num_stocks_to_trade = 100
MAX_GROSS_EXPOSURE = 1.0
MAX_SHORT_POSITION_SIZE = 0.1
MAX_LONG_POSITION_SIZE = 0.1
single_metric_string = 'rm_signal'
score_string = 'score'
long_signal = 'long'
short_signal = 'short'
long_half= 0.5
short_half = 0.5


data_take=0
k=0
def merge_data(df):
    global data_take
    global k
    if k==0:
        data_take=df
        k=1
    else:
        frames=[data_take,df]
        data_take=pd.concat(frames)
    return data_take



def initialize(context):
    context.num_stocks_to_trade = num_stocks_to_trade
    set_benchmark(sid(8554))
    set_slippage(slippage.FixedBasisPointsSlippage(basis_points=1, volume_limit=0.1))
    set_commission(commission.PerShare(cost=0, min_trade_cost=0))
    
    schedule_function(my_rebalance,
                      date_rules.every_day(),
                      time_rules.market_open(hours = 0, minutes = 5))

    schedule_function(record_vars,
                      date_rules.every_day(),
                      time_rules.market_close(hours = 0, minutes = 1))   

    # fetch_csv(data_file_sample,
    #       date_column='date',  # Assigning the column label
    #       date_format='%m/%y/%d %H:%M:',  # Using date format from CSV file
    #       mask=False,
    #       # post_func = merge_data,
    #       timezone='EST')   
    
    fetch_csv(data_file_part_1,
          date_column='date',  # Assigning the column label
          date_format='%m/%y/%d %H:%M:',  # Using date format from CSV file
          mask=False,
          post_func = merge_data,
          timezone='EST')

    fetch_csv(data_file_part_2,
          date_column='date',  # Assigning the column label
          date_format='%m/%y/%d %H:%M:',  # Using date format from CSV file
          mask=False,
          post_func = merge_data,
          timezone='EST')
     


def my_rebalance(context, data):  
   
    
    rm_signals = []
    current_date = get_datetime().date()
    for stock in data.fetcher_assets:
        enter_date = data.current(stock, 'enter_date')
        enter_date = pd.to_datetime(enter_date).date()
        if current_date.strftime('%Y-%m-%d %H:%M:%S') == enter_date.strftime('%Y-%m-%d %H:%M:%S'):
            rm_signal = data.current(stock, single_metric_string)
            score = data.current(stock, score_string)
            rm_signals.append([stock, score, rm_signal])
            
    df = pd.DataFrame(rm_signals, columns = ['stock', score_string, single_metric_string])
    df['rank'] = df[score_string].rank(ascending = False)
    df = df.sort_values('rank')
    
    # Firstly, whether go long or short depends on the string signal: 'long' or 'short' on the 'rm_signal' column: 
    #longs_1 = df[df[single_metric_string]==long_signal]['stock'].tolist()
    #shorts_1 = df[df[single_metric_string]==short_signal]['stock'].tolist()
    
    #kay star adjustment to kill signal 1, and rely only on signal 2
    
    longs_1 = df[df[single_metric_string]!='2']['stock'].tolist()
    shorts_1 = df[df[single_metric_string]!='2']['stock'].tolist()
    
    # Secondly, filter out the highest scores for going long and lowest scores for going short
    num_stocks_to_trade = context.num_stocks_to_trade
    if len(df) < 2.0*num_stocks_to_trade:
        num_stocks_to_trade = int(len(df)/2.0)
        # print ('df capacity is smaller than num_stocks_to_trade, opps, actual trade num is {}'.format(num_stocks_to_trade))
        longs_2  = df[:num_stocks_to_trade]['stock'].tolist()
        shorts_2 = df[-num_stocks_to_trade:]['stock'].tolist()
    else:
        longs_2  = df[:num_stocks_to_trade]['stock'].tolist()
        shorts_2 = df[-num_stocks_to_trade:]['stock'].tolist()    
        # print ('df capacity is larger than num_stocks_to_trade, cool, actual trade num is {}'.format(num_stocks_to_trade))    
    
    # Lastly, take the intercept between the two sets of long and short positions: 
    longs = list(set(longs_1) & set(longs_2))
    shorts = list(set(shorts_1) & set(shorts_2))

    print ('longs_all_length: ', len(longs))
    print ('shorts_all_length: ', len(shorts))   
    # print ('Portfolio Value: ', context.portfolio.portfolio_value)
   
    
    
    # Define long_size and short_size in advance:
    # if context.account.leverage >1:
    
    total_potential_positions = list(set(longs + shorts))
    total_potential_size = len(total_potential_positions)
    print ('Current position size: ', len(context.portfolio.positions))
    print ('Long/Short Signal Size: ', len(longs + shorts))
    print ('Total_potential_size: ', total_potential_size)
    long_size = 1.0/total_potential_size
    short_size = 1.0/total_potential_size
        

    
    # print ('Long half for today: ', long_half)
    # print ('Short half for today: ', short_half)

    # try:
    #     long_size = long_half/len(longs)
    #     if long_size > MAX_LONG_POSITION_SIZE:
    #         long_size = MAX_LONG_POSITION_SIZE
            
        
    #     short_size = short_half/len(shorts)
    #     if short_size > MAX_SHORT_POSITION_SIZE:
    #         short_size = MAX_SHORT_POSITION_SIZE    
    # except:
    #     pass
       
    print ('Long size for today: ', long_size)
    print ('Short size for today: ', short_size)
    ################ Execution Logic 1 : Flipping/reversing long/short positions: ################
    try:
        for position in context.portfolio.positions:
            if context.portfolio.positions[position].amount>0 and position in shorts:
                order_target_percent(position, -short_size)
        
            if context.portfolio.positions[position].amount<0 and position in longs:
                order_target_percent(position, long_size)
                
    ################ Execution logic 2: Exiting ################
            if position not in longs and position not in shorts and position in df['stock'].tolist():
                order_target_percent(position, 0)             

    ################ Execution Logic 3: Ignoring stocks with no signals which are already in portfolio: ################
            if position not in df['stock'].tolist():
                pass
                print ('This stock is still in our position, but no signals today, do nothing', position)
            
    except:
        pass

    ################ Execution Logic 4: Entering new long/short stocks: ################
    
    # Enter new longs:
    for stock in longs:
            # if not get_open_orders(stock):
        if data.can_trade(stock) and \
        context.portfolio.positions[stock].amount == 0:
            order_target_percent(stock, long_size)     
              
    # Enter new shorts:
    for stock in shorts:
            # if not get_open_orders(stock):
        if data.can_trade(stock) and \
        context.portfolio.positions[stock].amount == 0:
            order_target_percent(stock, -short_size)

           
def record_vars(context, data):
    long_count = 0
    short_count = 0
    for position in context.portfolio.positions.itervalues():
        if position.amount > 0:
            long_count += 1
        elif position.amount < 0:
            short_count += 1
    record(gross_leverage=context.account.leverage)  # Plot leverage to chart
    record(net_leverage=context.account.net_leverage)
    record(num_longs = long_count)
    record(num_shorts = short_count)
    # record(portfolio_cash = context.portfolio.cash)
    # record(capital_used = context.portfolio.capital_used)
    # record(portfolio_value = context.portfolio.portfolio_value)
    # record(positions_value = context.portfolio.positions_value)
    # record(position_portfolio_ratio = context.portfolio.positions_value/context.portfolio.portfolio_value)
    record(portfolio_size = len(context.portfolio.positions))

@James: You're right, we could probably make this fact more apparent when you actually upload your dataset. We'll revisit the UI when we add the ability to customize the lag on the dataset. For now, you can read more about how datasets are lagged in the Self-Serve Data - How Does It Work notebook (section 2, Considerations When Creating Your Dataset) and in the help documentation.

Please correct me if I'm wrong; my understanding is that a signal calculated at the end of the day on, say, January 2 should be dated January 3, the day it is supposed to be traded?

I did a poor job explaining what I meant. When you upload a dataset using Self-Serve, each row has a Primary Date, which serves as the asof_date in Pipeline. The Self-Serve upload process constructs a timestamp for that row on the next trading day. For example, if I have a row with a Primary Date of 2/14/2015, the asof_date will be 2/14/2015 while the timestamp will be 2/15/2015. Pipeline can only surface a data point once its timestamp has passed, so data points with a Primary Date of day N in the raw file will be surfaced in a Pipeline on day N+1.

My suggestion to Brad was to submit the raw file with day N-1 in the Primary Date column so that the timestamp added by Self-Serve is on day N (and the data is surfaced in Pipeline on day N). Going forward, we plan to make it possible to customize the lag to be something other than 1 day.
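
As a rough illustration of that preprocessing step, the shift could be done with a couple of lines of pandas before uploading (a sketch; the file and column names are assumptions, not the actual dataset):

import pandas as pd

# Sketch: move every Primary Date back by one day before uploading, so the
# timestamp added by Self-Serve (Primary Date + 1 trading day) lands on the
# intended trade date. File and column names here are hypothetical.
df = pd.read_csv('accern_ds2_weekly.csv', parse_dates=['date'])
df['date'] = df['date'] - pd.Timedelta(days=1)
df.to_csv('accern_ds2_weekly_lagged.csv', index=False)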

Does this help?

Hi Jamie,

I uploaded Brad's data from his last backtest to Self-Serve to compare with your results. Starting from Brad's original data file, I sorted it by date and then lagged the dates by one week. I think it's now more in line with Brad's results. The QTU universe definitely degraded Brad's S&P 500 universe. Here's the backtest with default slippage:

Clone Algorithm (1)
import quantopian.algorithm as algo
import quantopian.optimize as opt

from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.filters import QTradableStocksUS
from quantopian.pipeline.data.user_50842732cb5e1a02000000d1 import accern_weekly_sum_lag

#Set Maximum portfolio Leverage
MAX_GROSS_EXPOSURE = 1
#Set Maximum Position sizes for individual longs and shorts
MAX_SHORT_POSITION_SIZE = 0.02
MAX_LONG_POSITION_SIZE = 0.02

def initialize(context):
    #set_slippage(slippage.FixedBasisPointsSlippage(basis_points=1, volume_limit=0.1))
    
    schedule_function(morning_execution,
                      date_rules.week_start(),
                      time_rules.market_open(hours = 0, minutes = 5))
    
    # Create our dynamic stock selector.
    algo.attach_pipeline(make_pipeline(), 'pipeline')


def make_pipeline():
    # Base universe set to the QTradableStocksUS
    base_universe = QTradableStocksUS()

    pipe = Pipeline(
        columns={
            'score': accern_weekly_sum_lag.score.latest,
        },
        screen=(base_universe & accern_weekly_sum_lag.score.latest.notnull())
    )

    return pipe


def before_trading_start(context, data):
    context.output = algo.pipeline_output('pipeline')


def morning_execution(context, data):
    # Set objective for our Optimizer
    objective = opt.MaximizeAlpha(context.output['score'])
    
    # Set Constraints
    constrain_gross_leverage = opt.MaxGrossExposure(MAX_GROSS_EXPOSURE)
    dollar_neutral = opt.DollarNeutral(tolerance=0.02)
    constrain_pos_size = opt.PositionConcentration.with_equal_bounds(
        -MAX_SHORT_POSITION_SIZE,
        MAX_LONG_POSITION_SIZE,
    )
    
    algo.order_optimal_portfolio(
        objective=objective,
        constraints=[
            constrain_gross_leverage,
            dollar_neutral,
            constrain_pos_size,
        ],
    )

Here's the backtest with Brad's slippage of 1 bps:

Clone Algorithm (1)
import quantopian.algorithm as algo
import quantopian.optimize as opt

from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.filters import QTradableStocksUS
from quantopian.pipeline.data.user_50842732cb5e1a02000000d1 import accern_weekly_sum_lag

#Set Maximum portfolio Leverage
MAX_GROSS_EXPOSURE = 1
#Set Maximum Position sizes for individual longs and shorts
MAX_SHORT_POSITION_SIZE = 0.02
MAX_LONG_POSITION_SIZE = 0.02

def initialize(context):
    set_slippage(slippage.FixedBasisPointsSlippage(basis_points=1, volume_limit=0.1))
    
    schedule_function(morning_execution,
                      date_rules.week_start(),
                      time_rules.market_open(hours = 0, minutes = 5))
    
    # Create our dynamic stock selector.
    algo.attach_pipeline(make_pipeline(), 'pipeline')


def make_pipeline():
    # Base universe set to the QTradableStocksUS
    base_universe = QTradableStocksUS()

    pipe = Pipeline(
        columns={
            'score': accern_weekly_sum_lag.score.latest,
        },
        screen=(base_universe & accern_weekly_sum_lag.score.latest.notnull())
    )

    return pipe


def before_trading_start(context, data):
    context.output = algo.pipeline_output('pipeline')


def morning_execution(context, data):
    # Set objective for our Optimizer
    objective = opt.MaximizeAlpha(context.output['score'])
    
    # Set Constraints
    constrain_gross_leverage = opt.MaxGrossExposure(MAX_GROSS_EXPOSURE)
    dollar_neutral = opt.DollarNeutral(tolerance=0.02)
    constrain_pos_size = opt.PositionConcentration.with_equal_bounds(
        -MAX_SHORT_POSITION_SIZE,
        MAX_LONG_POSITION_SIZE,
    )
    
    algo.order_optimal_portfolio(
        objective=objective,
        constraints=[
            constrain_gross_leverage,
            dollar_neutral,
            constrain_pos_size,
        ],
    )

Jamie, Brad,

I think the question of which one is right depends on Brad's interpretation of the dates in his original data file. If the dates refer to the date to trade, then lagging them by one week like I did would be correct. However, if these dates are as-of dates, then Jamie's version is correct.

From his label, 'next_week_start', it seems like it refers to the trade date, but I could be wrong.

I used @key star's version of the program (Backtest ID: 5b2d410e2c36d2426b131c6c) and commented out the set_commission and set_slippage lines.

It generated the attached results. I will put out its tearsheet shortly.

Clone Algorithm (0)
# Uniqueness for this template:

# Allows us to reverse longs to shorts directly
# Ignore stocks with no signals on some days which are already in our portfilio




import pandas as pd
import numpy as np

# data_file_sample = 'https://dl.dropboxusercontent.com/s/yzihbscejxr5fvy/6_5_DS2_daily_agg_mean_SMA_21_63_part_1_sample_shorter_for_QA.csv?dl=0'

data_file_part_1 = 'https://dl.dropboxusercontent.com/s/bg3po7nk8nsez7d/6_4_DS2_daily_agg_mean_SMA_21_63_part_1.csv?dl=0'

data_file_part_2 = 'https://dl.dropboxusercontent.com/s/nbppgxkr59q4lz7/6_4_DS2_daily_agg_mean_SMA_21_63_part_2.csv?dl=0'


num_stocks_to_trade = 100
MAX_GROSS_EXPOSURE = 1.0
MAX_SHORT_POSITION_SIZE = 0.1
MAX_LONG_POSITION_SIZE = 0.1
single_metric_string = 'rm_signal'
score_string = 'score'
long_signal = 'long'
short_signal = 'short'
long_half= 0.5
short_half = 0.5


data_take=0
k=0
def merge_data(df):
    global data_take
    global k
    if k==0:
        data_take=df
        k=1
    else:
        frames=[data_take,df]
        data_take=pd.concat(frames)
    return data_take



def initialize(context):
    context.num_stocks_to_trade = num_stocks_to_trade
    set_benchmark(sid(8554))
    #set_slippage(slippage.FixedBasisPointsSlippage(basis_points=5, volume_limit=0.1))
    #set_commission(commission.PerShare(cost=0.001, min_trade_cost=0))
    
    schedule_function(my_rebalance,
                      date_rules.every_day(),
                      time_rules.market_open(hours = 0, minutes = 5))

    schedule_function(record_vars,
                      date_rules.every_day(),
                      time_rules.market_close(hours = 0, minutes = 1))   

    # fetch_csv(data_file_sample,
    #       date_column='date',  # Assigning the column label
    #       date_format='%m/%y/%d %H:%M:',  # Using date format from CSV file
    #       mask=False,
    #       # post_func = merge_data,
    #       timezone='EST')   
    
    fetch_csv(data_file_part_1,
          date_column='date',  # Assigning the column label
          date_format='%m/%y/%d %H:%M:',  # Using date format from CSV file
          mask=False,
          post_func = merge_data,
          timezone='EST')

    fetch_csv(data_file_part_2,
          date_column='date',  # Assigning the column label
          date_format='%m/%y/%d %H:%M:',  # Using date format from CSV file
          mask=False,
          post_func = merge_data,
          timezone='EST')
     


def my_rebalance(context, data):  
   
    
    rm_signals = []
    current_date = get_datetime().date()
    for stock in data.fetcher_assets:
        enter_date = data.current(stock, 'enter_date')
        enter_date = pd.to_datetime(enter_date).date()
        if current_date.strftime('%Y-%m-%d %H:%M:%S') == enter_date.strftime('%Y-%m-%d %H:%M:%S'):
            rm_signal = data.current(stock, single_metric_string)
            score = data.current(stock, score_string)
            rm_signals.append([stock, score, rm_signal])
            
    df = pd.DataFrame(rm_signals, columns = ['stock', score_string, single_metric_string])
    df['rank'] = df[score_string].rank(ascending = False)
    df = df.sort_values('rank')
    
    # Firstly, whether go long or short depends on the string signal: 'long' or 'short' on the 'rm_signal' column: 
    #longs_1 = df[df[single_metric_string]==long_signal]['stock'].tolist()
    #shorts_1 = df[df[single_metric_string]==short_signal]['stock'].tolist()
    
    #kay star adjustment to kill signal 1, and rely only on signal 2
    
    longs_1 = df[df[single_metric_string]!='2']['stock'].tolist()
    shorts_1 = df[df[single_metric_string]!='2']['stock'].tolist()
    
    # Secondly, filter out the highest scores for going long and lowest scores for going short
    num_stocks_to_trade = context.num_stocks_to_trade
    if len(df) < 2.0*num_stocks_to_trade:
        num_stocks_to_trade = int(len(df)/2.0)
        # print ('df capacity is smaller than num_stocks_to_trade, opps, actual trade num is {}'.format(num_stocks_to_trade))
        longs_2  = df[:num_stocks_to_trade]['stock'].tolist()
        shorts_2 = df[-num_stocks_to_trade:]['stock'].tolist()
    else:
        longs_2  = df[:num_stocks_to_trade]['stock'].tolist()
        shorts_2 = df[-num_stocks_to_trade:]['stock'].tolist()    
        # print ('df capacity is larger than num_stocks_to_trade, cool, actual trade num is {}'.format(num_stocks_to_trade))    
    
    # Lastly, take the intercept between the two sets of long and short positions: 
    longs = list(set(longs_1) & set(longs_2))
    shorts = list(set(shorts_1) & set(shorts_2))

    print ('longs_all_length: ', len(longs))
    print ('shorts_all_length: ', len(shorts))   
    # print ('Portfolio Value: ', context.portfolio.portfolio_value)
   
    
    
    # Define long_size and short_size in advance:
    # if context.account.leverage >1:
    
    total_potential_positions = list(set(longs + shorts))
    total_potential_size = len(total_potential_positions)
    print ('Current position size: ', len(context.portfolio.positions))
    print ('Long/Short Signal Size: ', len(longs + shorts))
    print ('Total_potential_size: ', total_potential_size)
    long_size = 1.0/total_potential_size
    short_size = 1.0/total_potential_size
        

    
    # print ('Long half for today: ', long_half)
    # print ('Short half for today: ', short_half)

    # try:
    #     long_size = long_half/len(longs)
    #     if long_size > MAX_LONG_POSITION_SIZE:
    #         long_size = MAX_LONG_POSITION_SIZE
            
        
    #     short_size = short_half/len(shorts)
    #     if short_size > MAX_SHORT_POSITION_SIZE:
    #         short_size = MAX_SHORT_POSITION_SIZE    
    # except:
    #     pass
       
    print ('Long size for today: ', long_size)
    print ('Short size for today: ', short_size)
    ################ Execution Logic 1 : Flipping/reversing long/short positions: ################
    try:
        for position in context.portfolio.positions:
            if context.portfolio.positions[position].amount>0 and position in shorts:
                order_target_percent(position, -short_size)
        
            if context.portfolio.positions[position].amount<0 and position in longs:
                order_target_percent(position, long_size)
                
    ################ Execution logic 2: Exiting ################
            if position not in longs and position not in shorts and position in df['stock'].tolist():
                order_target_percent(position, 0)             

    ################ Execution Logic 3: Ignoring stocks with no signals which are already in portfolio: ################
            if position not in df['stock'].tolist():
                pass
                print ('This stock is still in our position, but no signals today, do nothing', position)
            
    except:
        pass

    ################ Execution Logic 4: Entering new long/short stocks: ################
    
    # Enter new longs:
    for stock in longs:
            # if not get_open_orders(stock):
        if data.can_trade(stock) and \
        context.portfolio.positions[stock].amount == 0:
            order_target_percent(stock, long_size)     
              
    # Enter new shorts:
    for stock in shorts:
            # if not get_open_orders(stock):
        if data.can_trade(stock) and \
        context.portfolio.positions[stock].amount == 0:
            order_target_percent(stock, -short_size)

           
def record_vars(context, data):
    long_count = 0
    short_count = 0
    for position in context.portfolio.positions.itervalues():
        if position.amount > 0:
            long_count += 1
        elif position.amount < 0:
            short_count += 1
    record(gross_leverage=context.account.leverage)  # Plot leverage to chart
    record(net_leverage=context.account.net_leverage)
    record(num_longs = long_count)
    record(num_shorts = short_count)
    # record(portfolio_cash = context.portfolio.cash)
    # record(capital_used = context.portfolio.capital_used)
    # record(portfolio_value = context.portfolio.portfolio_value)
    # record(positions_value = context.portfolio.positions_value)
    # record(position_portfolio_ratio = context.portfolio.positions_value/context.portfolio.portfolio_value)
    record(portfolio_size = len(context.portfolio.positions))

Here is the accompanying tearsheet for program with Backtest ID: 5b2d410e2c36d2426b131c6c.

Observations: if you can't distinguish a predictive trading strategy from a possible random occurrence, does it stay predictive?

From the tearsheet, we can observe that the win ratio is 0.49, meaning the program's price predictions fail 51% of the time. This, in a game where just randomly playing long might have given a probable hit rate of about 52%, and that just by flipping a fair coin.

With default commissions and slippage settings, the average net profit per trade was -$0.41, in contradiction with the backtest, which states that the strategy actually made money. This has been asked before: which are the real numbers we can rely on? Is it the $56k win or the -$37k loss? This, for me, paints a bad picture where I have little confidence in the numbers presented. I think it warrants a fix so that both the backtest and the tearsheet give the same numbers.

This strategy is simply trading on noise. Call it market noise, call it factor noise, call it random-walk noise, call it tweet noise, it appears as if it is all noise. If you want to design something that would appear as good as coin tossing, this is getting close.

Note that I did increase the initial stake to $10 million, which makes the picture even worse.

Is there a justification to make 90,716 trades, have a win rate of 0.49 with a false-positive rate of 0.51 and an average net profit of -$0.41, and call it something that might be good enough for the contest, whatever the underlying objective was?

Here, I suspect the outcome of the payoff matrix, given the tearsheet numbers, is: Σ(H ∙ ΔP) = n ∙ x_bar = 90,716 ∙ -0.41 ≈ -$37,049.95. It is in the tearsheet that we can get the total number of round_trips.


There are some NaNs being ordered in all of the above (see the NaN guard sketch after these notes).
The star version does != '2' for both long & short, so the same stocks end up in both lists; that is not valid, and I recommend deleting it.
Blue version: since the original was equally weighted, this was my start of an experiment with weights based on score. Suggest changing the 1.2 threshold.
It would be interesting to correlate PnL with the weights previously assigned. Start of next day would be simplest.
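(A minimal sketch of the NaN guard I have in mind, reusing the longs/shrts Series from the code above; this is a suggestion, not part of the original:)

    # Drop NaN scores before normalizing, so order_target_percent is never
    # handed a NaN weight.
    longs = longs.dropna()
    shrts = shrts.dropna()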

Edit: My ordering section can be simplified further like this

    for s in context.portfolio.positions:  
        if not data.can_trade(s): continue  
        if s in longs.index or s in shrts.index: continue  
        order_target(s, 0)

    for s in longs.index:  
        order_target_percent(s, longs[s])

    for s in shrts.index:  
        order_target_percent(s, shrts[s])  
''' https://www.quantopian.com/posts/backtest-with-accerns-ml-driven-ds2-dataset-to-generate-daily-strategy-with-sharpe-ratio-of-3-with-trading-costs

Weighted by score, however current positions would also need to be considered
  to target leverage accurately & best.
'''

import pandas as pd

# data_file_sample = 'https://dl.dropboxusercontent.com/s/yzihbscejxr5fvy/6_5_DS2_daily_agg_mean_SMA_21_63_part_1_sample_shorter_for_QA.csv?dl=0'

data_file_part_1 = 'https://dl.dropboxusercontent.com/s/bg3po7nk8nsez7d/6_4_DS2_daily_agg_mean_SMA_21_63_part_1.csv?dl=0'

data_file_part_2 = 'https://dl.dropboxusercontent.com/s/nbppgxkr59q4lz7/6_4_DS2_daily_agg_mean_SMA_21_63_part_2.csv?dl=0'

data_take = 0
k = 0

def merge_data(df):
    global data_take
    global k
    if k == 0:
        data_take = df
        k = 1
    else:
        frames = [data_take, df]
        data_take = pd.concat(frames)
    return data_take

def initialize(context):
    # unchanged from original
    set_slippage(slippage.FixedBasisPointsSlippage(basis_points=1, volume_limit=0.1))
    set_commission(commission.PerShare(cost=0, min_trade_cost=0))

    fetch_csv(data_file_part_1,
          date_column='date',  # Assigning the column label
          date_format='%m/%y/%d %H:%M:',  # Using date format from CSV file
          mask=False,
          post_func = merge_data,
          timezone='EST')

    fetch_csv(data_file_part_2,
          date_column='date',  # Assigning the column label
          date_format='%m/%y/%d %H:%M:',  # Using date format from CSV file
          mask=False,
          post_func = merge_data,
          timezone='EST')

    schedule_function(trade, date_rules.every_day(), time_rules.market_open(minutes=5))

def trade(context, data):
    longs = pd.Series({})
    shrts = pd.Series({})

    current_date = str(get_datetime().date())
    for s in data.fetcher_assets:
        if not data.can_trade(s): continue
        if current_date == data.current(s, 'enter_date'):
            rm_signal = data.current(s, 'rm_signal')
            if   rm_signal == 'long' : longs[s] = data.current(s, 'score')
            elif rm_signal == 'short': shrts[s] = data.current(s, 'score')

    # Normalize 0 to 1 positive (long) & 0 to -1 negative (short)
    longs += abs(longs.min())   # shift to insure all positive
    longs += .1                 # prevent lowest from being zero
    shrts -= abs(shrts.max())
    shrts -= .1
    
    longs = longs[longs > longs.mean() * 1.2]    # strongest scores
    shrts = shrts[shrts < shrts.mean() * 1.2]               
                
    longs /= longs.sum()  # 0 to 1 totaling 1.0
    shrts /= shrts.sum()  # these became positive
    longs *=  .5          # for leverage, to total .5
    shrts *= -.5          # back to negative

    if 'log_data_done' not in context:    # show values once
        log.info('Long')
        log_data(context, data, longs, 8)
        log.info('Short')
        log_data(context, data, shrts, 8)

    for s in context.portfolio.positions:
        if not data.can_trade(s): continue

        if s not in longs.index and s not in shrts.index:
            order_target(s, 0)

        elif s in longs.index:
            order_target_percent(s, longs[s])

        elif s in shrts.index:
            order_target_percent(s, shrts[s])

    for s in longs.index:
        if s in context.portfolio.positions: continue
        if get_open_orders(s): continue
        order_target_percent(s, longs[s])

    for s in shrts.index:
        if s in context.portfolio.positions: continue
        if get_open_orders(s): continue
        order_target_percent(s, shrts[s])

def before_trading_start(context, data):
    
    # Records
    long_count  = 0
    short_count = 0
    for s in context.portfolio.positions:
        if   context.portfolio.positions[s].amount > 0: long_count  += 1
        elif context.portfolio.positions[s].amount < 0: short_count += 1
    #record(net_lvrg = context.account.net_leverage)
    record(lvrg     = context.account.leverage)
    record(num_long = long_count)
    record(num_shrt = short_count)
    record(num_pos  = len(context.portfolio.positions))
    
def log_data(context, data, z, num, filter=None):
    ''' Log info about pipeline output or, z can be any DataFrame or Series
    https://www.quantopian.com/posts/overview-of-pipeline-content-easy-to-add-to-your-backtest
    Modified a bit here from original.
    '''
    # Options
    log_nan_only = 0          # Only log if nans are present
    show_sectors = 0          # If sectors, do you want to see them or not
    show_sorted_details = 1   # [num] high & low securities sorted, each column

    if 'log_init_done' not in context:
        log.info('${}    {} to {}'.format('%.0e' % (context.portfolio.starting_cash),
                get_environment('start').date(), get_environment('end').date()))
    context.log_init_done = 1

    if not len(z):
        log.info('Empty')
        return

    # Series ......
    context.log_data_done = 1 ; padmax = 6
    if 'Series' in str(type(z)):    # is Series, not DataFrame
        nan_count = len(z[z != z])
        nan_count = 'NaNs {}/{}'.format(nan_count, len(z)) if nan_count else ''
        if (log_nan_only and nan_count) or not log_nan_only:
            pad = max(6, len('%.5f' % z.max()))
            log.info('{}{}{}   Series  len {}'.format('min'.rjust(pad+5),
                'mean'.rjust(pad+5), 'max'.rjust(pad+5), len(z)))
            log.info('{}{}{} {}'.format(
                ('%.5f' % z.min()) .rjust(pad+5),
                ('%.5f' % z.mean()).rjust(pad+5),
                ('%.5f' % z.max()) .rjust(pad+5),
                nan_count
            ))
        return

    # DataFrame ......
    content_min_max = [ ['','min','mean','max',''] ] ; content = ''
    for col in z.columns:
        try: z[col].max()
        except: continue        # skip non-numeric
        #if col == 'stock': continue
        #if col == 'rm_signal': continue
        if col == 'sector' and not show_sectors: continue
        nan_count = len(z[col][z[col] != z[col]])
        nan_count = 'NaNs {}/{}'.format(nan_count, len(z)) if nan_count else ''
        padmax    = max( padmax, 6, len(str(z[col].max())) )
        content_min_max.append([col, str(z[col] .min()), str(z[col].mean()), str(z[col] .max()), nan_count])
    if log_nan_only and nan_count or not log_nan_only:
        content = 'Rows: {}  Columns: {}'.format(z.shape[0], z.shape[1])
        if len(z.columns) == 1: content = 'Rows: {}'.format(z.shape[0])

        paddings = [6 for i in range(4)]
        for lst in content_min_max:    # set max lengths
            i = 0
            for val in lst[:4]:    # value in each sub-list
                paddings[i] = max(paddings[i], len(str(val)))
                i += 1
        headr = content_min_max[0]
        content += ('\n{}{}{}{}{}'.format(
             headr[0] .rjust(paddings[0]),
            (headr[1]).rjust(paddings[1]+5),
            (headr[2]).rjust(paddings[2]+5),
            (headr[3]).rjust(paddings[3]+5),
            ''
        ))
        for lst in content_min_max[1:]:    # populate content using max lengths
            content += ('\n{}{}{}{}     {}'.format(
                lst[0].rjust(paddings[0]),
                lst[1].rjust(paddings[1]+5),
                lst[2].rjust(paddings[2]+5),
                lst[3].rjust(paddings[3]+5),
                lst[4],
            ))
        log.info(content)

    if not show_sorted_details: return
    if len(z.columns) == 1:     return     # skip detail if only 1 column
    if filter == None: details = z.columns
    for detail in details:
        if detail == 'sector': continue
        hi = z[details].sort_values(by=detail, ascending=False).head(num)
        lo = z[details].sort_values(by=detail, ascending=False).tail(num)
        content  = ''
        content += ('_ _ _   {}   _ _ _'  .format(detail))
        content += ('\n\t... {} highs\n{}'.format(detail, str(hi)))
        content += ('\n\t... {} lows \n{}'.format(detail, str(lo)))
        if log_nan_only and not len(lo[lo[detail] != lo[detail]]):
            continue  # skip if no nans
        log.info(content)


@Blue, if you had used default commissions and slippage, you would get something like the attached.

Doing stuff without frictional costs is like a waste of time. They won't go away just because they are not properly accounted for in a backtest.

These simulations should try to be as realistic as possible. That is why we “simulate” this stuff in the first place. It is to know if they could be worthwhile in real life. And in real life, a strategy doing 89,424 trades will see both commissions and slippage.

This one's payoff matrix gives: Σ(H ∙ ΔP) = n ∙ x_bar = 89,424 ∙ -11.16 = -$997,971.84.

''' https://www.quantopian.com/posts/backtest-with-accerns-ml-driven-ds2-dataset-to-generate-daily-strategy-with-sharpe-ratio-of-3-with-trading-costs

Weighted by score, however current positions would also need to be considered
  to target leverage accurately & best.
'''

import pandas as pd

# data_file_sample = 'https://dl.dropboxusercontent.com/s/yzihbscejxr5fvy/6_5_DS2_daily_agg_mean_SMA_21_63_part_1_sample_shorter_for_QA.csv?dl=0'

data_file_part_1 = 'https://dl.dropboxusercontent.com/s/bg3po7nk8nsez7d/6_4_DS2_daily_agg_mean_SMA_21_63_part_1.csv?dl=0'

data_file_part_2 = 'https://dl.dropboxusercontent.com/s/nbppgxkr59q4lz7/6_4_DS2_daily_agg_mean_SMA_21_63_part_2.csv?dl=0'

data_take = 0
k = 0

def merge_data(df):
    global data_take
    global k
    if k == 0:
        data_take = df
        k = 1
    else:
        frames = [data_take, df]
        data_take = pd.concat(frames)
    return data_take

def initialize(context):
    # unchanged from original
    #set_slippage(slippage.FixedBasisPointsSlippage(basis_points=1, volume_limit=0.1))
    #set_commission(commission.PerShare(cost=0, min_trade_cost=0))

    fetch_csv(data_file_part_1,
          date_column='date',  # Assigning the column label
          date_format='%m/%y/%d %H:%M:',  # Using date format from CSV file
          mask=False,
          post_func = merge_data,
          timezone='EST')

    fetch_csv(data_file_part_2,
          date_column='date',  # Assigning the column label
          date_format='%m/%y/%d %H:%M:',  # Using date format from CSV file
          mask=False,
          post_func = merge_data,
          timezone='EST')

    schedule_function(trade, date_rules.every_day(), time_rules.market_open(minutes=5))

def trade(context, data):
    longs = pd.Series({})
    shrts = pd.Series({})

    current_date = str(get_datetime().date())
    for s in data.fetcher_assets:
        if not data.can_trade(s): continue
        if current_date == data.current(s, 'enter_date'):
            rm_signal = data.current(s, 'rm_signal')
            if   rm_signal == 'long' : longs[s] = data.current(s, 'score')
            elif rm_signal == 'short': shrts[s] = data.current(s, 'score')

    # Normalize 0 to 1 positive (long) & 0 to -1 negative (short)
    longs += abs(longs.min())   # shift to insure all positive
    longs += .1                 # prevent lowest from being zero
    shrts -= abs(shrts.max())
    shrts -= .1
    
    longs = longs[longs > longs.mean() * 1.2]    # strongest scores
    shrts = shrts[shrts < shrts.mean() * 1.2]               
                
    longs /= longs.sum()  # 0 to 1 totaling 1.0
    shrts /= shrts.sum()  # these became positive
    longs *=  .5          # for leverage, to total .5
    shrts *= -.5          # back to negative

    if 'log_data_done' not in context:    # show values once
        log.info('Long')
        log_data(context, data, longs, 8)
        log.info('Short')
        log_data(context, data, shrts, 8)

    for s in context.portfolio.positions:
        if not data.can_trade(s): continue

        if s not in longs.index and s not in shrts.index:
            order_target(s, 0)

        elif s in longs.index:
            order_target_percent(s, longs[s])

        elif s in shrts.index:
            order_target_percent(s, shrts[s])

    for s in longs.index:
        if s in context.portfolio.positions: continue
        if get_open_orders(s): continue
        order_target_percent(s, longs[s])

    for s in shrts.index:
        if s in context.portfolio.positions: continue
        if get_open_orders(s): continue
        order_target_percent(s, shrts[s])

def before_trading_start(context, data):
    
    # Records
    long_count  = 0
    short_count = 0
    for s in context.portfolio.positions:
        if   context.portfolio.positions[s].amount > 0: long_count  += 1
        elif context.portfolio.positions[s].amount < 0: short_count += 1
    #record(net_lvrg = context.account.net_leverage)
    record(lvrg     = context.account.leverage)
    record(num_long = long_count)
    record(num_shrt = short_count)
    record(num_pos  = len(context.portfolio.positions))
    
def log_data(context, data, z, num, filter=None):
    ''' Log info about pipeline output or, z can be any DataFrame or Series
    https://www.quantopian.com/posts/overview-of-pipeline-content-easy-to-add-to-your-backtest
    Modified a bit here from original.
    '''
    # Options
    log_nan_only = 0          # Only log if nans are present
    show_sectors = 0          # If sectors, do you want to see them or not
    show_sorted_details = 1   # [num] high & low securities sorted, each column

    if 'log_init_done' not in context:
        log.info('${}    {} to {}'.format('%.0e' % (context.portfolio.starting_cash),
                get_environment('start').date(), get_environment('end').date()))
    context.log_init_done = 1

    if not len(z):
        log.info('Empty')
        return

    # Series ......
    context.log_data_done = 1 ; padmax = 6
    if 'Series' in str(type(z)):    # is Series, not DataFrame
        nan_count = len(z[z != z])
        nan_count = 'NaNs {}/{}'.format(nan_count, len(z)) if nan_count else ''
        if (log_nan_only and nan_count) or not log_nan_only:
            pad = max(6, len('%.5f' % z.max()))
            log.info('{}{}{}   Series  len {}'.format('min'.rjust(pad+5),
                'mean'.rjust(pad+5), 'max'.rjust(pad+5), len(z)))
            log.info('{}{}{} {}'.format(
                ('%.5f' % z.min()) .rjust(pad+5),
                ('%.5f' % z.mean()).rjust(pad+5),
                ('%.5f' % z.max()) .rjust(pad+5),
                nan_count
            ))
        return

    # DataFrame ......
    content_min_max = [ ['','min','mean','max',''] ] ; content = ''
    for col in z.columns:
        try: z[col].max()
        except: continue        # skip non-numeric
        #if col == 'stock': continue
        #if col == 'rm_signal': continue
        if col == 'sector' and not show_sectors: continue
        nan_count = len(z[col][z[col] != z[col]])
        nan_count = 'NaNs {}/{}'.format(nan_count, len(z)) if nan_count else ''
        padmax    = max( padmax, 6, len(str(z[col].max())) )
        content_min_max.append([col, str(z[col] .min()), str(z[col].mean()), str(z[col] .max()), nan_count])
    if log_nan_only and nan_count or not log_nan_only:
        content = 'Rows: {}  Columns: {}'.format(z.shape[0], z.shape[1])
        if len(z.columns) == 1: content = 'Rows: {}'.format(z.shape[0])

        paddings = [6 for i in range(4)]
        for lst in content_min_max:    # set max lengths
            i = 0
            for val in lst[:4]:    # value in each sub-list
                paddings[i] = max(paddings[i], len(str(val)))
                i += 1
        headr = content_min_max[0]
        content += ('\n{}{}{}{}{}'.format(
             headr[0] .rjust(paddings[0]),
            (headr[1]).rjust(paddings[1]+5),
            (headr[2]).rjust(paddings[2]+5),
            (headr[3]).rjust(paddings[3]+5),
            ''
        ))
        for lst in content_min_max[1:]:    # populate content using max lengths
            content += ('\n{}{}{}{}     {}'.format(
                lst[0].rjust(paddings[0]),
                lst[1].rjust(paddings[1]+5),
                lst[2].rjust(paddings[2]+5),
                lst[3].rjust(paddings[3]+5),
                lst[4],
            ))
        log.info(content)

    if not show_sorted_details: return
    if len(z.columns) == 1:     return     # skip detail if only 1 column
    if filter == None: details = z.columns
    for detail in details:
        if detail == 'sector': continue
        hi = z[details].sort_values(by=detail, ascending=False).head(num)
        lo = z[details].sort_values(by=detail, ascending=False).tail(num)
        content  = ''
        content += ('_ _ _   {}   _ _ _'  .format(detail))
        content += ('\n\t... {} highs\n{}'.format(detail, str(hi)))
        content += ('\n\t... {} lows \n{}'.format(detail, str(lo)))
        if log_nan_only and not len(lo[lo[detail] != lo[detail]]):
            continue  # skip if no nans
        log.info(content)

I'm not inspired by any sentiment data I have seen yet but I'm a tools guy and figured I was handing them some tools they might be able to use.

Doing stuff without frictional costs is like a waste of time

... except when concerns other than returns are pretty important.
I always use defaults in my own algos except when I make them ^more^ stringent, not less. I pointed out in the code that the original was unchanged.
My code above is otherwise very different from the original because it has several suggestions. I should have stated that my point was not returns here. The 1.2 thing may have implied that, but in reality it was probably a mechanism I was using to intrigue people into digging into the code and finding some other stuff they can like, a lot. I would pay people to think differently if I could. Overall it was intended to be instructional; I consider it cleaner, easier to understand, and easier to modify. Maybe you ignored my code and just ran it with default slippage & commissions. I should have been more clear. I can do that now.

Others may not agree, but it also has that main point: utilizing scores as weights. Why would anyone do equal weighting on signals that are not equal? Answer: if you look at that score normalization, you can see it isn't exactly easy. There might be a better way that someone can point out, but I think it ought to be considered a gift to have a start laid out there. Jamie's code makes that about as simple as can be, and yet folks ought to know I saw scores as low as -100 that were marked as long in the original. I didn't read how the scoring is done. But it's good to know these things, and to see them, use log_data() on df. Look at what's there, try some filtering, and correlate PnL with the weights previously assigned to see whether any patterns show up that can lead to improvement.

The changes are not presented as the holy grail but rather to shine a light on some alternate paths that some might like as a possibly smoother way to eventually get there, since I considered the originals to be in the weeds. I thought it was already obvious to everyone in the thread that the original does poorly with defaults. Sentiment has important potential, so I want to encourage the pioneers who are trying to move toward it. My algo above also provides visibility into the data one is working with, log_data(); a usage sketch follows below. You should all try it with your own code. Some might even improve on it.
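(A hedged usage sketch, assuming you have copied the log_data() helper from the algo above into your own code and built some DataFrame df inside a scheduled function:)

    def trade(context, data):
        # ... build your signal DataFrame `df` here ...
        log_data(context, data, df, 10)   # logs min/mean/max, NaN counts, and the 10 highs/lows per column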

@Blue, my views had nothing to do with your code. I admire the work you do and read all your posts.

If there is value in that script, it is not seen in the code. At the very least, it was not demonstrated. However, the set of premises might be at fault.

First, the program is structured as a fixed-ratio betting system, which right off will suffer return degradation as time increases.

Second, it tries to interpret, separate, and average positive and negative tweets, as if that could be done reliably. It does such a marvelous job at it that its hit rate is 49%.

The attached notebook states that they were right 43,800 times and wrong 45,624 times on 89,424 trades. It is a sufficiently large sample size to say that the obtained averages are pretty good estimates of what is.

Now, for me, something trying to predict should be right more than half the time. Otherwise, I could flip a coin and do as good a job (notice that coin flipping is a one-liner in Python).
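(For instance, a quick sketch of that coin-flip baseline over the same number of trades quoted above:)

    import random

    n = 89424                                    # number of trades from the tearsheet above
    hits = sum(random.random() < 0.5 for _ in range(n))
    print(float(hits) / n)                       # typically ~0.50, about the same as the strategy's 0.49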

If something that is supposed to predict market sentiment gets only half the trades right, we are left with two explanations. One: the data series appears random-like and is not amenable to releasing any alpha. Two: the trade-triggering mechanism is acting random-like and produces the same results. This strategy is trading on noise, not only that, it will churn an account to oblivion.

My conclusion is that there is nothing in those tweets. I find it amazing to see people not ready to accept some advice from Mr. Buffett, for instance, but then stand ready and willing to listen to any Tom, Dick, and Harry who can type a few words on Twitter. How much confidence can I have in an anonymous tweet? Won't some work and others not? And this, without knowing which one will or will not until after the fact? Should I follow the tipsters and pink sheets of old?

Maybe some should re-read Fooled by Randomness.


@Guy,

In your last post you write:

"43,800 times and wrong 45,624 times on 89,424 trades. It is a sufficiently large sample size to say that the obtained averages are pretty good estimates of what is. Now, for me, something trying to predict should be right more than half the time."

Shouldn't one weight this ratio with the associated gains/losses? So, an algo could have difficulty calling the correct shot when the security doesn't change much, but have better success calling more significant movement?

/Luc

Another question:

@Brad

Why did you use the slippage: set_slippage(slippage.FixedBasisPointsSlippage(basis_points=1, volume_limit=0.1))

Any reason behind that choice?

/Luc

@Luc, to your question: yes. In this case, it painted an even worse picture. So, I refrained.

A basic formula for a portfolio's trading strategy is: F(t) = F(0) + Σ(H∙ΔP) − Σ(exp.), where the payoff matrix Σ(H∙ΔP) = n∙x_bar, n is the number of trades, and x_bar is the average net profit per trade. That holds whether there are 1,000 or 90,000 trades. And if the payoff matrix is negative, it is not very good.
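(A short numeric sketch of that identity, reusing the per-trade figures quoted earlier in this thread:)

    n = 89424              # number of trades (from the tearsheet above)
    x_bar = -11.16         # average net profit per trade with default frictional costs
    payoff = n * x_bar     # the payoff matrix: n * x_bar
    print(payoff)          # -997,971.84: negative, no matter how many trades were taken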

While some are quick to pass judgement on the merits of news sentiment as pure noise, I believe Brad from Accern is basically just illustrating how their proprietary news sentiment data can be used within the Q framework. It should also be noted that while the results of this single factor fall apart under default commissions and slippage, I believe it is best evaluated when combined with other possible alpha factors where hidden relationships might exist and augment results.

@James, such an argument sounds a lot like a Parrondo Paradox, where combining losing strategies in a special way could produce a winning one. Good luck.

@Guy,

a Parrondo Paradox, where combining losing strategies in a special way could produce a winning one

I hope you had your reading glasses on when I said:

I believe it is best evaluated when combined with other possible alpha factors where hidden relationships might exist and augment results

Where in that statement did I say or imply that I was "combining losing strategies"? You are making wrong assumptions on my behalf, thank you for that!
It is a well-known technique in AI/ML algos for financial applications, called ensembling, which combines different factors or strategies to extract hidden non-linear (or linear) relationships that optimize your given objective function. If a factor or strategy is deemed to be noise, then it is minimally or zero weighted, and that is the beauty of AI algos in processing relative inputs, something you know very little about.

I noticed you have the habit of taking other people's algos, tweaking them, and commenting on them. Nothing wrong with that, just an observation. I believe you are a published author of investment strategies, but I haven't seen any original algo from you posted here in the Q forum to share. Please share and educate us further. This is what I call the Fleury Paradox, no pun intended, just a term of endearment. Cheers, Guy!

@James, well, for one, I am not that good in Python, still learning.

When you design trading strategies, there comes a level at which you are absolutely not interested in sharing your code. You may agree to explain what it does and show some backtest analysis results, since you might estimate that no one will be able to reverse engineer your code.

Note that the Parrondo Paradox (which in reality is not one) made its mark in the late '90s. It combined two apparently losing strategies to generate an assured winning one if certain properties were met. One was that one of the three strategies had a 70%+ hit rate! And if you had a strategy like that, then why in the world would you want to play the other two losing ones?

Here is a point to underscore.

Any strategy has for payoff matrix: Σ(H∙ΔP). We would both agree on that. If I want to compare strategies, all I need to ask is: is Σ(H(a)∙ΔP) > Σ(H(b)∙ΔP)? If the answer is yes, then any combination of those two will produce less than strategy (a): (1 − λ)∙Σ(H(a)∙ΔP) + λ∙Σ(H(b)∙ΔP) ≤ Σ(H(a)∙ΔP).
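(A tiny numeric check of that blending inequality, with made-up payoffs just to make the point concrete:)

    payoff_a, payoff_b = 100.0, 40.0                 # hypothetical, with payoff_a > payoff_b
    for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
        blended = (1 - lam) * payoff_a + lam * payoff_b
        assert blended <= payoff_a                   # any mix does no better than strategy (a) alone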

If you design a trading strategy that has a 50% hit rate, how do you distinguish it from what would appear as a random walk down somewhere? If you have a time series that is acting like the flip of a coin and apply some betting "strategy" on it, it won't matter what it is, it will generate a 50/50 scenario. You need a positive predictable edge to make the betting strategy worthwhile, and that is what that strategy failed to demonstrate using its data, making ML, DL, or coin tossing irrelevant. Well, actually, coin flipping would give you the 50/50 win/loss ratio, but it would not bring any profits with it either.

If the trading strategy presented wanted to show that its data was worthwhile, then it failed to show that it was significantly different from random since flipping a coin could have provided about the same results. And that is the point I raised. It is independent of the value of the trading method used. It only says that that particular trading strategy is completely worthless, and even more so if you include default frictional costs. As said before: “this strategy is trading on noise, not only that, it will churn an account to oblivion”.

Now, the demonstration that it is not so is not mine to make.

I am of the binary type when looking at trading strategies, and I have looked at thousands: they work or they don't. They outperform the benchmarks over the long term or they don't.

If you want to show me a special way of losing, I will look at it. I not only collect stuff that appears to work, I also collect what does not as examples of what not to do.

I have no bad intentions. But I do tend to say it as I see it. Sometimes it is not flattering for a strategy. It has nothing to do with the individual, only the concepts, data, and strategy presented.

Hi Guy,
You say:

I am of the binary type when looking at trading strategies

Obviously, you are stuck in the linear world.

If you have a time series that is acting like the flip of a coin and apply some betting “strategy” on it, it won't matter what it is, it will generate a 50/50 scenario

Coin flipping has only two possible outcomes: heads or tails. Trading has three possible outcomes: profit, loss, or break-even, so the probability distributions are different.
It is very common, especially in HFT, to have a very profitable strategy with a below-50% hit rate. One can have a 35% hit rate and still be profitable by having high returns on true positives and small losses on false positives. Everything is relative when analyzing trade performance.
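(A back-of-the-envelope sketch of that point, with made-up numbers:)

    p_win, p_loss = 0.35, 0.65                           # hypothetical hit rate
    avg_win, avg_loss = 3.0, 1.0                         # hypothetical: wins pay 3x what losses cost
    expectancy = p_win * avg_win - p_loss * avg_loss     # expected net profit per unit risked
    print(expectancy)                                    # 0.40 > 0, profitable despite a 35% hit rate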

@ All

Didn't expect this much feedback and so many comments from you guys! Here is the weekly-sum-aggregated strategy's performance using Quantopian's self-serve data pipeline with the previous 1 bps model.

@ Jamie & James:
Thanks for your feedback and strategies! However, they are a little different from what I produced. The reason is actually what you both have mentioned, i.e. the date-shifting issue. In my understanding, instead of shifting a whole week back, I think I only need to shift all dates in the dataset back by 1 day. The original DS2 dataset I uploaded to build the strategy assumes I make the trades at the market open on THAT day. But the self-serve data pipeline shifts all the dates in the dataset forward by 1 day while running backtests.

So I shifted all dates in the DS2 dataset back by 1 day in order to adapt to the pipeline’s mechanism.
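(A minimal pandas sketch of that adjustment; the file name and the 'date' column name are placeholders, not the actual upload:)

    import pandas as pd

    ds2 = pd.read_csv('ds2_weekly_data_sum.csv')                         # hypothetical file name
    ds2['date'] = pd.to_datetime(ds2['date']) - pd.Timedelta(days=1)     # shift every date back by 1 day
    ds2.to_csv('ds2_weekly_data_sum_day_shifted_back.csv', index=False)  # re-upload via Self-Serve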

import quantopian.algorithm as algo
import quantopian.optimize as opt
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.filters import QTradableStocksUS
from quantopian.pipeline.data.user_575f0e61946dff77e500080f import ds2_weekly_data_sum_day_shifted_back

#Set Maximum portfolio Leverage
MAX_GROSS_EXPOSURE = 1
#Set Maximum Position sizes for individual longs and shorts
MAX_SHORT_POSITION_SIZE = 0.02
MAX_LONG_POSITION_SIZE = 0.02

def initialize(context):
    set_slippage(slippage.FixedBasisPointsSlippage(basis_points=1, volume_limit=0.1))
    
    schedule_function(morning_execution,
                      date_rules.week_start(),
                      time_rules.market_open(hours = 0, minutes = 5))
    
    # Create our dynamic stock selector.
    algo.attach_pipeline(make_pipeline(), 'pipeline')


def make_pipeline():
    # Base universe set to the QTradableStocksUS
    base_universe = QTradableStocksUS()

    pipe = Pipeline(
        columns={
            'score': ds2_weekly_data_sum_day_shifted_back.score.latest,
        },
        screen=(base_universe & ds2_weekly_data_sum_day_shifted_back.score.latest.notnull())
    )

    return pipe


def before_trading_start(context, data):
    context.output = algo.pipeline_output('pipeline')


def morning_execution(context, data):
    # Set objective for our Optimizer
    objective = opt.MaximizeAlpha(context.output['score'])
    
    # Set Constraints
    constrain_gross_leverage = opt.MaxGrossExposure(MAX_GROSS_EXPOSURE)
    dollar_neutral = opt.DollarNeutral(tolerance=0.02)
    constrain_pos_size = opt.PositionConcentration.with_equal_bounds(
        -MAX_SHORT_POSITION_SIZE,
        MAX_LONG_POSITION_SIZE,
    )
    
    algo.order_optimal_portfolio(
        objective=objective,
        constraints=[
            constrain_gross_leverage,
            dollar_neutral,
            constrain_pos_size,
        ],
    )

Here is the weekly-mean-aggregated strategy’s performance using Quantopian’s self-serve data pipeline with the previous 1 bps model.

As can be seen, both the weekly-mean and weekly-sum strategies' performances are actually reduced compared to the two weekly strategies I originally uploaded, mainly because of the base_universe = QTradableStocksUS() I used in the code in order to meet all of Quantopian's contest requirements. So in the process, I assume those stocks that do not qualify under the standard of QTradableStocksUS() cannot be traded.

# This time, let me switch the commissions model to the default version.

import quantopian.algorithm as algo
import quantopian.optimize as opt
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.filters import QTradableStocksUS
from quantopian.pipeline.data.user_575f0e61946dff77e500080f import ds2_weekly_data_mean_day_shifted_back

#Set Maximum portfolio Leverage
MAX_GROSS_EXPOSURE = 1
#Set Maximum Position sizes for individual longs and shorts
MAX_SHORT_POSITION_SIZE = 0.02
MAX_LONG_POSITION_SIZE = 0.02

def initialize(context):
    set_slippage(slippage.FixedBasisPointsSlippage(basis_points=1, volume_limit=0.1))
    
    schedule_function(morning_execution,
                      date_rules.week_start(),
                      time_rules.market_open(hours = 0, minutes = 5))
    
    # Create our dynamic stock selector.
    algo.attach_pipeline(make_pipeline(), 'pipeline')


def make_pipeline():
    # Base universe set to the QTradableStocksUS
    base_universe = QTradableStocksUS()

    pipe = Pipeline(
        columns={
            'score': ds2_weekly_data_mean_day_shifted_back.score.latest,
        },
        screen=(base_universe & ds2_weekly_data_mean_day_shifted_back.score.latest.notnull())
    )

    return pipe


def before_trading_start(context, data):
    context.output = algo.pipeline_output('pipeline')


def morning_execution(context, data):
    # Set objective for our Optimizer
    objective = opt.MaximizeAlpha(context.output['score'])
    
    # Set Constraints
    constrain_gross_leverage = opt.MaxGrossExposure(MAX_GROSS_EXPOSURE)
    dollar_neutral = opt.DollarNeutral(tolerance=0.02)
    constrain_pos_size = opt.PositionConcentration.with_equal_bounds(
        -MAX_SHORT_POSITION_SIZE,
        MAX_LONG_POSITION_SIZE,
    )
    
    algo.order_optimal_portfolio(
        objective=objective,
        constraints=[
            constrain_gross_leverage,
            dollar_neutral,
            constrain_pos_size,
        ],
    )

@Brad,

We have to be extra careful about the dates and data frequencies here when we upload to Self-Serve Data; it can be tricky. In Self-Serve Data, the Primary Date is the as-of date, and the Trade date is marked as that date plus 1 day. In your case, your data frequency is weekly and your algo is scheduled to trade on week_start.

Let's do a concrete example: you aggregated news and scored them as weekly sums from Monday, June 18, 2018 to Friday, June 22, 2018 for trading date Monday, June 25, 2018. Since your scheduled function to trade (morning_execution) is week_start, you trade every Monday, and so your as-of date (Primary Date) should always be the prior Friday. If this is what you just did, then you are correct. In your statement you said, "I think I only need to shift back 1 day for all dates in the datasets". I'm not sure if this is correct. Let's have Jamie or some other Q team member weigh in on this.
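(A small sketch of that date arithmetic in pandas, just to make the mapping explicit:)

    import pandas as pd

    trade_date = pd.Timestamp('2018-06-25')          # a Monday, the week_start trade date
    as_of_date = trade_date - pd.offsets.BDay(1)     # prior business day
    print(as_of_date.date())                         # 2018-06-22, the expected Primary Date (a Friday)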

Here is the weekly-sum-aggregated strategy’s performance using Quantopian’s self-serve data pipeline with the default commissions model.

# This time, let me switch the commissions model to the default version.

import quantopian.algorithm as algo
import quantopian.optimize as opt
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.filters import QTradableStocksUS
from quantopian.pipeline.data.user_575f0e61946dff77e500080f import ds2_weekly_data_sum_day_shifted_back

#Set Maximum portfolio Leverage
MAX_GROSS_EXPOSURE = 1
#Set Maximum Position sizes for individual longs and shorts
MAX_SHORT_POSITION_SIZE = 0.02
MAX_LONG_POSITION_SIZE = 0.02

def initialize(context):
    # set_slippage(slippage.FixedBasisPointsSlippage(basis_points=1, volume_limit=0.1))
    
    schedule_function(morning_execution,
                      date_rules.week_start(),
                      time_rules.market_open(hours = 0, minutes = 5))
    
    # Create our dynamic stock selector.
    algo.attach_pipeline(make_pipeline(), 'pipeline')


def make_pipeline():
    # Base universe set to the QTradableStocksUS
    base_universe = QTradableStocksUS()

    pipe = Pipeline(
        columns={
            'score': ds2_weekly_data_sum_day_shifted_back.score.latest,
        },
        screen=(base_universe & ds2_weekly_data_sum_day_shifted_back.score.latest.notnull())
    )

    return pipe


def before_trading_start(context, data):
    context.output = algo.pipeline_output('pipeline')


def morning_execution(context, data):
    # Set objective for our Optimizer
    objective = opt.MaximizeAlpha(context.output['score'])
    
    # Set Constraints
    constrain_gross_leverage = opt.MaxGrossExposure(MAX_GROSS_EXPOSURE)
    dollar_neutral = opt.DollarNeutral(tolerance=0.02)
    constrain_pos_size = opt.PositionConcentration.with_equal_bounds(
        -MAX_SHORT_POSITION_SIZE,
        MAX_LONG_POSITION_SIZE,
    )
    
    algo.order_optimal_portfolio(
        objective=objective,
        constraints=[
            constrain_gross_leverage,
            dollar_neutral,
            constrain_pos_size,
        ],
    )

Here is the weekly-mean-aggregated strategy’s performance using Quantopian’s self-serve data pipeline with the default commissions model.

# This time, let me switch the commissions model to the default version.

import quantopian.algorithm as algo
import quantopian.optimize as opt
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.filters import QTradableStocksUS
from quantopian.pipeline.data.user_575f0e61946dff77e500080f import ds2_weekly_data_mean_day_shifted_back

#Set Maximum portfolio Leverage
MAX_GROSS_EXPOSURE = 1
#Set Maximum Position sizes for individual longs and shorts
MAX_SHORT_POSITION_SIZE = 0.02
MAX_LONG_POSITION_SIZE = 0.02

def initialize(context):
    # set_slippage(slippage.FixedBasisPointsSlippage(basis_points=1, volume_limit=0.1))
    
    schedule_function(morning_execution,
                      date_rules.week_start(),
                      time_rules.market_open(hours = 0, minutes = 5))
    
    # Create our dynamic stock selector.
    algo.attach_pipeline(make_pipeline(), 'pipeline')


def make_pipeline():
    # Base universe set to the QTradableStocksUS
    base_universe = QTradableStocksUS()

    pipe = Pipeline(
        columns={
            'score': ds2_weekly_data_mean_day_shifted_back.score.latest,
        },
        screen=(base_universe & ds2_weekly_data_mean_day_shifted_back.score.latest.notnull())
    )

    return pipe


def before_trading_start(context, data):
    context.output = algo.pipeline_output('pipeline')


def morning_execution(context, data):
    # Set objective for our Optimizer
    objective = opt.MaximizeAlpha(context.output['score'])
    
    # Set Constraints
    constrain_gross_leverage = opt.MaxGrossExposure(MAX_GROSS_EXPOSURE)
    dollar_neutral = opt.DollarNeutral(tolerance=0.02)
    constrain_pos_size = opt.PositionConcentration.with_equal_bounds(
        -MAX_SHORT_POSITION_SIZE,
        MAX_LONG_POSITION_SIZE,
    )
    
    algo.order_optimal_portfolio(
        objective=objective,
        constraints=[
            constrain_gross_leverage,
            dollar_neutral,
            constrain_pos_size,
        ],
    )

@ James:

Thanks for the important reminder. Yes, we should pay close attention here.

I actually confirmed it by going into the backtest's transaction details in the old view (by adding 'old' at the end of the backtest result's URL) to check which stocks were traded on which dates. It turns out my interpretation is correct when I compare my most updated results with the original results I posted. They are mostly the same (a very small part can differ because in the most recent strategies I used QTradableStocksUS).

@Brad & James: Subtracting one day from the Primary Date column in the raw file should work. If a Monday data point is backed up to Sunday in the raw .csv file, it should be surfaced when the Pipeline is computed on Monday after it is uploaded via Self-Serve. Of course, the answer could depend on what sort of Pipeline you use. My suggestion would be to inspect the Pipeline you got when you shifted the data back by a day and make sure you're getting 5 fresh data points per week. We're hoping to make this process easier with an upcoming customizable lag feature.
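(A hedged research-notebook sketch of that check, counting non-null scores per day over one trading week; the dataset name reuses one of the Self-Serve imports above and the date range is arbitrary:)

    # Run in a research notebook, not inside the algorithm.
    from quantopian.research import run_pipeline
    from quantopian.pipeline import Pipeline
    from quantopian.pipeline.data.user_575f0e61946dff77e500080f import ds2_weekly_data_sum_day_shifted_back

    pipe = Pipeline(columns={'score': ds2_weekly_data_sum_day_shifted_back.score.latest})
    result = run_pipeline(pipe, '2018-03-05', '2018-03-09')       # one Monday-to-Friday window
    print(result['score'].groupby(level=0).count())               # scores should surface on each of the 5 days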

Thanks, Jamie!

Brad, do you happen to have a daily frequency datafile that I can test drive in one of my ML algos that currently uses sentiment data from another provider?
I would like to replace and compare them.

Thanks Jamie and James!

Sure James, feel free to shoot me an email at [email protected]. I can send you the original daily DS2 dataset for you to test.

Attached is the updated DS2 daily strategy, built on the same logic and template as the original one at the very beginning of this post. The difference, as can be seen in the code, is the integration with Quantopian's QTradableStocksUS and self-serve pipelines. The trading-cost assumptions are still no commissions and only 1 bps slippage with volume_limit = 10%, just to compare with the original version.

# Uniqueness for this template:

# Allows us to reverse longs to shorts directly
# Ignore stocks with no signals on some days which are already in our portfolio



import quantopian.algorithm as algo
import quantopian.optimize as opt
from quantopian.pipeline import Pipeline
from quantopian.algorithm import attach_pipeline, pipeline_output
import pandas as pd
import numpy as np
from quantopian.pipeline.filters import QTradableStocksUS
# from quantopian.pipeline.data.user_575f0e61946dff77e500080f import ds2_sample_official
from quantopian.pipeline.data.user_575f0e61946dff77e500080f import ds2_21_minus_63_all_self_serve_official
# from quantopian.pipeline.data.user_50842732cb5e1a02000000d1 import ds2_21_minus_63_all_data_for_self_serve_pipeline

# data_file_sample = 'https://dl.dropboxusercontent.com/s/yzihbscejxr5fvy/6_5_DS2_daily_agg_mean_SMA_21_63_part_1_sample_shorter_for_QA.csv?dl=0'

# data_file_part_1 = 'https://dl.dropboxusercontent.com/s/bg3po7nk8nsez7d/6_4_DS2_daily_agg_mean_SMA_21_63_part_1.csv?dl=0'

# data_file_part_2 = 'https://dl.dropboxusercontent.com/s/nbppgxkr59q4lz7/6_4_DS2_daily_agg_mean_SMA_21_63_part_2.csv?dl=0'


# num_stocks_to_trade = 100
MAX_GROSS_EXPOSURE = 1.0
MAX_SHORT_POSITION_SIZE = 0.1
MAX_LONG_POSITION_SIZE = 0.1
single_metric_string = 'rm_signal'
score_string = 'score'
long_signal = 'long'
short_signal = 'short'
long_half= 0.5
short_half = 0.5

def initialize(context):
    set_benchmark(sid(8554))
    set_slippage(slippage.FixedBasisPointsSlippage(basis_points=1, volume_limit=0.1))
    # set_commission(commission.PerShare(cost=0.001, min_trade_cost=0))
    
    schedule_function(my_rebalance,
                      date_rules.every_day(),
                      time_rules.market_open(hours = 0, minutes = 5))

    schedule_function(record_vars,
                      date_rules.every_day(),
                      time_rules.market_close(hours = 0, minutes = 1))   

    attach_pipeline(make_pipeline(), 'pipeline')


     

def make_pipeline():
    # Base universe set to the QTradableStocksUS
    base_universe = QTradableStocksUS()
    
    scores = ds2_21_minus_63_all_self_serve_official.score.latest
    rm_signal = ds2_21_minus_63_all_self_serve_official.rm_signal.latest
    tickers = ds2_21_minus_63_all_self_serve_official.symbol.latest
    
    
    pipe = Pipeline(
        columns={
            'scores': scores,
            'rm_signal': rm_signal,
            'tickers': tickers,
        },
        screen=(base_universe & scores.notnull())
    )
    return pipe        
    
        
# def before_trading_start(context, data):
def my_rebalance(context, data):
    """
    Get pipeline results.
    """

    # Gets our pipeline output every day.
    pipe_results = pipeline_output('pipeline')
    # print (pipe_results)
    df = pd.DataFrame(pipe_results)
    
    df['rank'] = df['scores'].rank(ascending = False)
    df = df.sort_values('rank')
    print (df.index)
    
    #############################  Now we create the two layers of long/short signals:  #############################  
    # Firstly, whether go long or short depends on the string signal: 'long' or 'short' on the 'rm_signal' column: 
    longs_1 = df[df['rm_signal']=='long'].index.tolist()
    shorts_1 = df[df['rm_signal']=='short'].index.tolist()
    
    # Secondly, filter out the highest scores for going long and lowest scores for going short
    num_stocks_to_trade = 100
    if len(df) < 2.0*num_stocks_to_trade:
        num_stocks_to_trade = int(len(df)/2.0)
        # print ('df capacity is smaller than num_stocks_to_trade, opps, actual trade num is {}'.format(num_stocks_to_trade))
        longs_2  = df[:num_stocks_to_trade].index.tolist()
        shorts_2 = df[-num_stocks_to_trade:].index.tolist()
    else:
        longs_2  = df[:num_stocks_to_trade].index.tolist()
        shorts_2 = df[-num_stocks_to_trade:].index.tolist()    
        # print ('df capacity is larger than num_stocks_to_trade, cool, actual trade num is {}'.format(num_stocks_to_trade))    
    
    # Lastly, take the intercept between the two sets of long and short positions: 
    longs = list(set(longs_1) & set(longs_2))
    shorts = list(set(shorts_1) & set(shorts_2))

    print ('longs_all_length: ', len(longs))
    print ('shorts_all_length: ', len(shorts))   

    total_potential_positions = list(set(longs + shorts))
    total_potential_size = len(total_potential_positions)
    print ('Current position size: ', len(context.portfolio.positions))
    print ('Long/Short Signal Size: ', len(longs + shorts))
    print ('Total_potential_size: ', total_potential_size)
    # Guard against days with no overlapping long/short signals to avoid a divide-by-zero.
    if total_potential_size == 0:
        return
    long_size = 1.0/total_potential_size
    short_size = 1.0/total_potential_size

       
    print ('Long size for today: ', long_size)
    print ('Short size for today: ', short_size)
    ################ Execution Logic 1 : Flipping/reversing long/short positions: ################
    try:
        for position in context.portfolio.positions:
            if context.portfolio.positions[position].amount>0 and position in shorts:
                order_target_percent(position, -short_size)
        
            if context.portfolio.positions[position].amount<0 and position in longs:
                order_target_percent(position, long_size)
                
    ################ Execution logic 2: Exiting ################
            if position not in longs and position not in shorts and position in df.index.tolist():
                order_target_percent(position, 0)             

    ################ Execution Logic 3: Ignoring stocks with no signals which are already in portfolio: ################
            # Compare against the pipeline index (equities) rather than the 'tickers'
            # string column, so the membership test is meaningful for position objects.
            if position not in df.index.tolist():
                print ('This stock is still in our position, but no signals today, do nothing', position)
            
    except:
        pass

    ################ Execution Logic 4: Entering new long/short stocks: ################
    
    # Enter new longs:
    for stock in longs:
        print ('stock to long: ', stock)
            # if not get_open_orders(stock):
        if data.can_trade(stock) and \
        context.portfolio.positions[stock].amount == 0:
            order_target_percent(stock, long_size)     
              
    # Enter new shorts:
    for stock in shorts:
        print ('stock to short: ', stock)
            # if not get_open_orders(stock):
        if data.can_trade(stock) and \
        context.portfolio.positions[stock].amount == 0:
            order_target_percent(stock, -short_size)


        
           
def record_vars(context, data):
    long_count = 0
    short_count = 0
    for position in context.portfolio.positions.itervalues():
        if position.amount > 0:
            long_count += 1
        elif position.amount < 0:
            short_count += 1
    record(gross_leverage=context.account.leverage)  # Plot leverage to chart
    record(net_leverage=context.account.net_leverage)
    record(num_longs = long_count)
    record(num_shorts = short_count)
    record(portfolio_size = len(context.portfolio.positions))

Also, thanks to everyone who has raised insights, concerns, and questions about trading costs in this post. Commissions and slippage are always a complicated and tricky issue in this field.

@ Luc

The post here is mostly a showcase of DS2's predictive capability. The relatively high-turnover daily DS2 strategy in this post is more suitable for larger trading institutions (judging by the feedback we have already received from a variety of large institutions and clients in the industry), which usually have much better deals with brokers and market makers and more specialized infrastructure and teams to bring down commissions and slippage costs, compared with Quantopian's default trading-cost model, which I assume reflects more of a retail/individual trading situation (correct me if I'm wrong).

Under this assumption, I created another version with an estimated slippage model of set_slippage(slippage.FixedBasisPointsSlippage(basis_points=2.5, volume_limit=0.1)) and a modified commission model of set_commission(commission.PerShare(cost=0.001, min_trade_cost=0.1)).

# Uniqueness for this template:
 
# Allows us to reverse longs to shorts directly
# Ignore stocks with no signals on some days which are already in our portfolio
# Include modified slippage and commission costs
 
 
import quantopian.algorithm as algo
import quantopian.optimize as opt
from quantopian.pipeline import Pipeline
from quantopian.algorithm import attach_pipeline, pipeline_output
import pandas as pd
import numpy as np
from quantopian.pipeline.filters import QTradableStocksUS
# from quantopian.pipeline.data.user_575f0e61946dff77e500080f import ds2_sample_official
from quantopian.pipeline.data.user_575f0e61946dff77e500080f import ds2_21_minus_63_all_self_serve_official
 
 
def initialize(context):
    set_benchmark(sid(8554))
    set_slippage(slippage.FixedBasisPointsSlippage(basis_points=2.5, volume_limit=0.1))
    set_commission(commission.PerShare(cost=0.001, min_trade_cost=0.1))
    
    schedule_function(my_rebalance,
                      date_rules.every_day(),
                      time_rules.market_open(hours = 0, minutes = 5))
 
    schedule_function(record_vars,
                      date_rules.every_day(),
                      time_rules.market_close(hours = 0, minutes = 1))   
 
    attach_pipeline(make_pipeline(), 'pipeline')
 
 
     
 
def make_pipeline():
    # Base universe set to the QTradableStocksUS
    base_universe = QTradableStocksUS()
    
    scores = ds2_21_minus_63_all_self_serve_official.score.latest
    rm_signal = ds2_21_minus_63_all_self_serve_official.rm_signal.latest
    tickers = ds2_21_minus_63_all_self_serve_official.symbol.latest
    
    
    pipe = Pipeline(
        columns={
            'scores': scores,
            'rm_signal': rm_signal,
            'tickers': tickers,
        },
        screen=(base_universe & scores.notnull())
    )
    return pipe        
    
        
# def before_trading_start(context, data):
def my_rebalance(context, data):
    """
    Get pipeline results.
    """
 
    # Gets our pipeline output every day.
    pipe_results = pipeline_output('pipeline')
    # print (pipe_results)
    df = pd.DataFrame(pipe_results)
    df['rank'] = df['scores'].rank(ascending = False)
    df = df.sort_values('rank')
    print (df.index)
    
    #############################  Now we create the two layers of long/short signals:  #############################  
    # Firstly, whether go long or short depends on the string signal: 'long' or 'short' on the 'rm_signal' column: 
    longs_1 = df[df['rm_signal']=='long'].index.tolist()
    shorts_1 = df[df['rm_signal']=='short'].index.tolist()
    
    # Secondly, pick the highest scores for going long and lowest scores for going short
    num_stocks_to_trade = 100
    if len(df) < 2.0*num_stocks_to_trade:
        num_stocks_to_trade = int(len(df)/2.0)
    longs_2  = df[:num_stocks_to_trade].index.tolist()
    shorts_2 = df[-num_stocks_to_trade:].index.tolist()
    
    # Lastly, take the intersection of the two sets of long and short positions: 
    longs = list(set(longs_1) & set(longs_2))
    shorts = list(set(shorts_1) & set(shorts_2))
 
    total_potential_positions = list(set(longs + shorts))
    total_potential_size = len(total_potential_positions)
 
    # Guard against days with no overlapping long/short signals to avoid a divide-by-zero.
    if total_potential_size == 0:
        return
    long_size = 1.0/total_potential_size
    short_size = 1.0/total_potential_size
 
       
 
    ################ Execution Logic 1 : Flipping/reversing long/short positions: ################
    try:
        for position in context.portfolio.positions:
            if context.portfolio.positions[position].amount>0 and position in shorts:
                order_target_percent(position, -short_size)
        
            if context.portfolio.positions[position].amount<0 and position in longs:
                order_target_percent(position, long_size)
                
    ################ Execution logic 2: Exiting ################
            if position not in longs and position not in shorts and position in df.index.tolist():
                order_target_percent(position, 0)             
 
    ################ Execution Logic 3: Ignoring stocks with no signals which are already in portfolio: ################
            # Compare against the pipeline index (equities) rather than the 'tickers'
            # string column, so the membership test is meaningful for position objects.
            if position not in df.index.tolist():
                print ('This stock is still in our position, but no signals today, do nothing', position)
            
    except:
        pass
 
    ################ Execution Logic 4: Entering new long/short stocks: ################
    
    # Enter new longs:
    for stock in longs:
        print ('stock to long: ', stock)
            # if not get_open_orders(stock):
        if data.can_trade(stock) and \
        context.portfolio.positions[stock].amount == 0:
            order_target_percent(stock, long_size)     
              
    # Enter new shorts:
    for stock in shorts:
        print ('stock to short: ', stock)
            # if not get_open_orders(stock):
        if data.can_trade(stock) and \
        context.portfolio.positions[stock].amount == 0:
            order_target_percent(stock, -short_size)
 
 
 
def record_vars(context, data):
    long_count = 0
    short_count = 0
    for position in context.portfolio.positions.itervalues():
        if position.amount > 0:
            long_count += 1
        elif position.amount < 0:
            short_count += 1
    record(gross_leverage=context.account.leverage)  # Plot leverage to chart
    record(net_leverage=context.account.net_leverage)
    record(num_longs = long_count)
    record(num_shorts = short_count)
    record(portfolio_size = len(context.portfolio.positions))

@ Kay star:

Great findings, thanks. The reason I set up the first layer of signals alongside the second was to bring down the number of stocks traded each day and thus reduce the average daily turnover rate. But in my tests, the difference in average rolling 63-day daily turnover is only about 10%.
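For anyone who wants to reproduce that turnover comparison, here is a minimal sketch in plain pandas (outside the Quantopian API) of how an average rolling 63-day daily turnover can be computed from a history of daily portfolio weights; the weights DataFrame below is a made-up illustration, not output from either strategy:

import pandas as pd
import numpy as np

def avg_rolling_turnover(weights, window=63):
    # Daily turnover is taken here as half the sum of absolute weight changes,
    # so replacing the entire portfolio counts as 100% turnover for that day.
    daily_turnover = weights.diff().abs().sum(axis=1) / 2.0
    return daily_turnover.rolling(window).mean()

# Tiny illustrative example with made-up weights for three assets:
dates = pd.date_range('2017-01-02', periods=5, freq='B')
weights = pd.DataFrame(np.array([[0.5, 0.5, 0.0],
                                 [0.5, 0.0, 0.5],
                                 [0.0, 0.5, 0.5],
                                 [0.5, 0.5, 0.0],
                                 [0.5, 0.0, 0.5]]),
                       index=dates, columns=['A', 'B', 'C'])
print(avg_rolling_turnover(weights, window=3))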

The single-signal version shows better performance, e.g. Sharpe ratio and max drawdown, than the two-signal version of the strategy. Attached is the one-signal version you suggested, i.e. picking only the top and bottom companies based on DS2 scores. In my code I simply treat the previous second-layer signals as the final signals, which is a slight difference but almost the same. Performance metrics, including max drawdown and Sharpe, are improved.
Unfortunately, as we raise transaction costs and slippage, the performance doesn't hold up well because of the high turnover rate.

# Uniqueness for this template:

# Allows us to reverse longs to shorts directly
# Ignore stocks with no signals on some days which are already in our portfolio




import pandas as pd
import numpy as np

# data_file_sample = 'https://dl.dropboxusercontent.com/s/yzihbscejxr5fvy/6_5_DS2_daily_agg_mean_SMA_21_63_part_1_sample_shorter_for_QA.csv?dl=0'

data_file_part_1 = 'https://dl.dropboxusercontent.com/s/bg3po7nk8nsez7d/6_4_DS2_daily_agg_mean_SMA_21_63_part_1.csv?dl=0'

data_file_part_2 = 'https://dl.dropboxusercontent.com/s/nbppgxkr59q4lz7/6_4_DS2_daily_agg_mean_SMA_21_63_part_2.csv?dl=0'


num_stocks_to_trade = 100
MAX_GROSS_EXPOSURE = 1.0
MAX_SHORT_POSITION_SIZE = 0.1
MAX_LONG_POSITION_SIZE = 0.1
single_metric_string = 'rm_signal'
score_string = 'score'
long_signal = 'long'
short_signal = 'short'
long_half= 0.5
short_half = 0.5


data_take=0
k=0
def merge_data(df):
    global data_take
    global k
    if k==0:
        data_take=df
        k=1
    else:
        frames=[data_take,df]
        data_take=pd.concat(frames)
    return data_take



def initialize(context):
    context.num_stocks_to_trade = num_stocks_to_trade
    set_benchmark(sid(8554))
    set_slippage(slippage.FixedBasisPointsSlippage(basis_points=1, volume_limit=0.1))
    set_commission(commission.PerShare(cost=0.000, min_trade_cost=0))
    
    schedule_function(my_rebalance,
                      date_rules.every_day(),
                      time_rules.market_open(hours = 0, minutes = 5))

    schedule_function(record_vars,
                      date_rules.every_day(),
                      time_rules.market_close(hours = 0, minutes = 1))   

    # fetch_csv(data_file_sample,
    #       date_column='date',  # Assigning the column label
    #       date_format='%m/%y/%d %H:%M:',  # Using date format from CSV file
    #       mask=False,
    #       # post_func = merge_data,
    #       timezone='EST')   
    
    fetch_csv(data_file_part_1,
          date_column='date',  # Assigning the column label
          date_format='%m/%y/%d %H:%M:',  # Using date format from CSV file
          mask=False,
          post_func = merge_data,
          timezone='EST')

    fetch_csv(data_file_part_2,
          date_column='date',  # Assigning the column label
          date_format='%m/%y/%d %H:%M:',  # Using date format from CSV file
          mask=False,
          post_func = merge_data,
          timezone='EST')
     


def my_rebalance(context, data):  
   
    
    rm_signals = []
    current_date = get_datetime().date()
    for stock in data.fetcher_assets:
        enter_date = data.current(stock, 'enter_date')
        enter_date = pd.to_datetime(enter_date).date()
        if current_date.strftime('%Y-%m-%d %H:%M:%S') == enter_date.strftime('%Y-%m-%d %H:%M:%S'):
            rm_signal = data.current(stock, single_metric_string)
            score = data.current(stock, score_string)
            rm_signals.append([stock, score, rm_signal])
            
    df = pd.DataFrame(rm_signals, columns = ['stock', score_string, single_metric_string])
    df['rank'] = df[score_string].rank(ascending = False)
    df = df.sort_values('rank')
    
    #kay star adjustment to kill signal 1, and rely only on signal 2
    
    
    # Secondly, select the highest scores for going long and the lowest scores for going short
    num_stocks_to_trade = context.num_stocks_to_trade
    if len(df) < 2.0*num_stocks_to_trade:
        num_stocks_to_trade = int(len(df)/2.0)
    longs  = df[:num_stocks_to_trade]['stock'].tolist()
    shorts = df[-num_stocks_to_trade:]['stock'].tolist()


    print ('longs_all_length: ', len(longs))
    print ('shorts_all_length: ', len(shorts))   

    
    total_potential_positions = list(set(longs + shorts))
    total_potential_size = len(total_potential_positions)
    print ('Current position size: ', len(context.portfolio.positions))
    print ('Long/Short Signal Size: ', len(longs + shorts))
    print ('Total_potential_size: ', total_potential_size)
    # Guard against days with no long/short signals to avoid a divide-by-zero.
    if total_potential_size == 0:
        return
    long_size = 1.0/total_potential_size
    short_size = 1.0/total_potential_size
       
       
    print ('Long size for today: ', long_size)
    print ('Short size for today: ', short_size)
    ################ Execution Logic 1 : Flipping/reversing long/short positions: ################
    try:
        for position in context.portfolio.positions:
            if context.portfolio.positions[position].amount>0 and position in shorts:
                order_target_percent(position, -short_size)
        
            if context.portfolio.positions[position].amount<0 and position in longs:
                order_target_percent(position, long_size)
                
    ################ Execution logic 2: Exiting ################
            if position not in longs and position not in shorts and position in df['stock'].tolist():
                order_target_percent(position, 0)             

    ################ Execution Logic 3: Ignoring stocks with no signals which are already in portfolio: ################
            if position not in df['stock'].tolist():
                print ('This stock is still in our position, but no signals today, do nothing', position)
            
    except:
        pass

    ################ Execution Logic 4: Entering new long/short stocks: ################
    
    # Enter new longs:
    for stock in longs:
            # if not get_open_orders(stock):
        if data.can_trade(stock) and \
        context.portfolio.positions[stock].amount == 0:
            order_target_percent(stock, long_size)     
              
    # Enter new shorts:
    for stock in shorts:
            # if not get_open_orders(stock):
        if data.can_trade(stock) and \
        context.portfolio.positions[stock].amount == 0:
            order_target_percent(stock, -short_size)

           
def record_vars(context, data):
    long_count = 0
    short_count = 0
    for position in context.portfolio.positions.itervalues():
        if position.amount > 0:
            long_count += 1
        elif position.amount < 0:
            short_count += 1
    record(gross_leverage=context.account.leverage)  # Plot leverage to chart
    record(net_leverage=context.account.net_leverage)
    record(num_longs = long_count)
    record(num_shorts = short_count)
    record(portfolio_size = len(context.portfolio.positions))

@All:

Some great work and results from @James Villa, integrating the DS2 dataset into an existing strategy:

https://www.quantopian.com/posts/sentiment-data-comparing-accern-and-psychsignal.

James makes a very good point about the impact of trading costs on DS2 strategies, and he suggests some effective, realistic ways of using DS2 or any other Accern dataset. Running the standalone strategies posted here as-is is certainly not the best solution, especially for traders who don't have better transaction-cost deals with their brokers. Instead, combining DS2 as a strong alpha factor with other alpha factors is one practical way to increase overall performance, even under strict transaction-cost models.
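To make that idea concrete, here is a minimal pipeline sketch of combining DS2 with another factor. The self-serve import is the same one used in the algorithms above; the second factor (a simple 5-day reversal) and the equal 50/50 weighting are purely illustrative placeholders, not the factor mix James actually used:

from quantopian.pipeline import Pipeline
from quantopian.pipeline.factors import Returns
from quantopian.pipeline.filters import QTradableStocksUS
# Same self-serve DS2 upload used elsewhere in this thread:
from quantopian.pipeline.data.user_575f0e61946dff77e500080f import ds2_21_minus_63_all_self_serve_official

def make_combined_pipeline():
    universe = QTradableStocksUS()

    # DS2 score as one alpha factor, z-scored over the tradable universe.
    ds2_factor = ds2_21_minus_63_all_self_serve_official.score.latest.zscore(mask=universe)

    # A second, purely illustrative factor: short-term reversal (negative 5-day return).
    reversal = (-Returns(window_length=5, mask=universe)).zscore(mask=universe)

    # Equal-weighted combination; in practice the weights would be tuned.
    combined_alpha = 0.5 * ds2_factor + 0.5 * reversal

    return Pipeline(
        columns={'combined_alpha': combined_alpha},
        screen=universe & combined_alpha.notnull(),
    )

In a full algorithm this pipeline would replace the make_pipeline() above, and the rebalance logic would rank on combined_alpha instead of the raw DS2 score.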

@Brad, am I missing something? You said

@ Kay star:
Great findings!

In the Kay Star version, longs_1 & shorts_1 are both set to the same stocks, i.e. all stocks.
Then in longs there are many where 'rm_signal' was 'short', and vice versa.

From the Star changes:

    longs_1 = df[df[single_metric_string]!='2']['stock'].tolist()  
    shorts_1 = df[df[single_metric_string]!='2']['stock'].tolist()  

There is no '2', so the filter is effectively random and no longer uses the signal. Why would that be considered good?

If Kay Star would like to delete that backtest, I'll be happy to delete this message.

@ Blue Seahawk:

Thanks for pointing that out. I think what Kay Star wanted to implement is simply to ignore the first layer of signals. If you take the intersection of all stocks with the stocks filtered by the second layer of signals, you can treat the result as the final set of stocks to go long or short, and I don't see a problem with doing that.

But as I noted, to keep things cleaner and less confusing, especially around the !='2' you mentioned, I prefer to delete those parts, as you can see in my updated version. I simply made the second layer of signals the final signals for picking long/short positions.

Also, apologies for the late reply on your test of score-weighted position allocation, which is definitely an improvement worth exploring over my original version. I will look into it soon and get back with feedback. Again, I appreciate the comments and suggestions.

@ Guy,

As others in this post have already pointed out, the results of flipping coins are not the same as the results of a trading strategy. I can have only a 40% hit rate and still consistently make money, as long as the money I make on the 40% winning trades regularly exceeds what I lose on the 60% losing trades.

Take short-volatility strategies as an example. Most of the time, prior to a crisis, we can consistently make money by shorting volatility with options or by shorting volatility ETNs/ETFs (consider how those products are structured around VIX futures). However, when a big market crash comes, just one or two losses can immediately wipe out all the gains accumulated over 10 or 100 winning trades.
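As a back-of-the-envelope illustration of the two cases above (all numbers here are made up, not taken from any backtest), the expectancy per trade is hit_rate * avg_win - (1 - hit_rate) * avg_loss:

def expectancy(hit_rate, avg_win, avg_loss):
    # Expected net profit per trade; avg_win and avg_loss are positive magnitudes.
    return hit_rate * avg_win - (1.0 - hit_rate) * avg_loss

# 40% hit rate, but winners are twice the size of losers: positive expectancy.
print(expectancy(0.40, avg_win=2.0, avg_loss=1.0))    # 0.2 per unit risked

# A short-volatility-like profile: 95% hit rate, but the rare loss is 30x the typical gain.
print(expectancy(0.95, avg_win=1.0, avg_loss=30.0))   # -0.55 per unit risked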

Still, I have no problem with your concern about transaction costs, which can easily turn a winning simulated strategy into a losing one in real-life trading. That's why, in previous replies, I emphasized caution in using high-turnover strategies like the one in this post.

@Brad, if a trading strategy has something close to a 50/50 hit rate, it behaves about the same as if flipping a coin. And until someone can demonstrate that it is NOT behaving as if tossing a coin, then I will consider the trades, and the outcome of a trading strategy as if quasi-randomly generated since the two would not be distinguishable no matter how much skill or spin we would like to put on it.

Yes, some strategies can have a 40% hit rate and make up for it with larger profits on their winning trades. But that is not the case with your trading strategy.

@Brad, I think you should look at your numbers. If your trading strategy does not generate them, then they do not apply to your trading strategy. And that is what counts. Give examples of what your trading strategy does, that is what is under analysis, not the other guy's strategy, but yours. And there, the math is pretty explicit.

In my last backtest of your latest iteration, I set commissions and slippage to default, then reduced leverage so that it would be closer to 1.0, and got the attached analysis result (backtest id '5b34637a08f1c0448808070a'). I made no other changes.

On 90,715 trades (tosses), a 3-standard-deviation band is about 450 on each side. Your strategy came in at 44,841 wins and 45,874 losses. Flipping a coin would have given, in probability, 44,548 on the low side and 45,452 on the high side. The average net win was $709.90 per trade and the average loss $694.46. Your overall hit rate was 0.49, technically quite close to 0.50. The average net profit per trade was -$0.28. The strategy lost, on average, about a quarter on every trade it made.
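For anyone who wants to check that arithmetic, here is a small sketch using only the figures quoted above; it reproduces the roughly 450-toss 3-sigma band, the 0.49 hit rate, and the -$0.28 average net profit per trade:

import math

# Figures quoted above.
n = 90715                      # total trades
wins, losses = 44841, 45874
avg_win, avg_loss = 709.90, 694.46

# Half-width of a 3-standard-deviation band for a fair coin over n tosses.
three_sigma = 3 * math.sqrt(n * 0.5 * 0.5)
print('3-sigma half-width: %.0f tosses' % three_sigma)       # ~452, i.e. "about 450"

hit_rate = wins / float(n)
avg_net = (wins * avg_win - losses * avg_loss) / n
print('hit rate: %.2f' % hit_rate)                           # ~0.49
print('average net profit per trade: $%.2f' % avg_net)       # ~ -$0.28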

Therefore, if there is any predictive power in there, it is faint, very faint indeed.

I was hoping to find something worthwhile, or at least find something interesting that maybe I could use somewhere else.

I understand that your strategy is not operating on some random stuff. It is thought out, or at least you have good reasons for having programmed it that way. But that is not the point. The point is the outcome, what you get in the end. And there, I maintain that the strategy behaved about the same as if its trades had been randomly generated. And I cannot beat a heads-or-tails game.

For whatever it is worth, your strategy did provide a low-volatility low-beta equity curve.

(Backtest analysis notebook attached.)

@ Guy,

I agree with what you said; the numbers are there and need no debate: if we use the default commissions and slippage on the daily DS2 strategy, it just falls apart.

I understand that you are looking for a full-fledged, solid, ready-to-use strategy for real-life trading. In that case, I admit the strategy shown here, under Quantopian's default commission and slippage model, is not what you are looking for.

As I pointed out before, not all traders, especially large institutions, face exactly the same costs as Quantopian's default commissions model. Many have better deals with brokers and market makers, which is easy to understand. So I still don't think it's fair to dismiss this strategy as unrealistic just because of the fixed default commission and slippage costs. After all, the commission and slippage model is one of the tweakable parameters in a backtest: if, in a real-life scenario, I have cheaper trading costs than the default setup here, I will backtest with those costs and come up with a different, possibly much better, result.

Again, my point here is to get others interested in Accern and our DS2 datasets by showing a strategy example, so they can test the data and find value in their own way. In fact, some major Wall Street players we have been working with have already given us very positive feedback on their alpha-finding process using our datasets. It all depends on how we apply our creativity and strategy-development experience to the data.

Still, I appreciate your detailed comments and concerns about the strategies posted here. In this tricky financial market, any skeptical voice should be listened to and taken into consideration, something I, as both a quant and a discretionary trader, personally believe in.

@Brad, I only look at the numbers. My interest is in a strategy's payoff matrix final outcome, its endgame. Will a strategy produce a reasonable profit over its trading interval? I appreciate it more when the holding time is longer than 40 months. Even more so, when the 40 months under consideration is within a 9-year+ general market uptrend.

If a trading strategy can be “derailed”, turned unprofitable by adding low commissions and some slippage due to market impact and share availability, then it definitely is too weak to face its future alone. Already, Q has set its default commissions to 0.001, one tenth of what IB would charge. Under those conditions, buying 100 shares of AAPL will cost 10 cents instead of a dollar. Not something that should have a major impact on a trading strategy starting with a $10 M stake.

A portfolio's payoff matrix equation would look like the following: F(t) = F(0) + Σ(H∙ΔP) – Σ(exp.). If the sum of all the trading activity minus expenses approaches zero, Σ(H∙ΔP) – Σ(exp.) → 0, it implies that the generated profits came out to about the same as the trading expenses, Σ(H∙ΔP) ≈ Σ(exp.), and therefore the strategy could barely cover its frictional costs.
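For readers less used to this notation, here is a tiny numerical sketch of the bookkeeping: H is a days-by-assets matrix of share holdings, ΔP the matching matrix of price changes, and the final equity is the initial stake plus the sum of every H∙ΔP entry minus total expenses. The numbers are made up purely for illustration.

import numpy as np

F0 = 10e6                                  # initial capital
H = np.array([[100.,  50.],                # shares held each day (2 days x 2 assets)
              [120.,  40.]])
dP = np.array([[ 0.5, -0.3],               # price change per share over each day
               [-0.2,  0.4]])
expenses = 25.0                            # total commissions + slippage

trading_pnl = (H * dP).sum()               # sum of the payoff matrix H . dP
F_t = F0 + trading_pnl - expenses
print('trading P&L: %.1f, final equity: %.1f' % (trading_pnl, F_t))   # 27.0, 10000002.0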

For me, what others can say about how successful they have been using your data is almost totally irrelevant without proof or some kind of demonstration. Even a backtest analysis would be sufficient since it would not reveal how their strategy was constructed. Until I see one of those, I will refrain from making a judgment on one side or other.

I should thank you for having given me the opportunity to analyze your trading strategy.

Your strategy leveraged a “small” edge, and only without frictional costs considered. By removing part of the leverage (not all of it) and adding these minimal trading costs, the strategy failed to show any significant predictive ability. Part of the reason should be put on the data itself; that is, the data failed to show that there was indeed a built-in predictive edge.

You started this thread praising the predictive powers of your dataset, and ended your last post declaring that you were a “discretionary trader”, thereby implying that you might not even use the dataset yourself, since by definition you do not need a program to trade discretionarily. I too only used your dataset to run simulations, and I would not go any further than that, meaning doing more than simulations.

Even with your understanding of the dataset you provided, all you could produce was a trading strategy that technically did not deliver any tangible benefit, that is, any profit once frictional costs were considered. Well, let's say any worthwhile profit, and, I would add, probably at any scale (though I cannot verify that).

Why not tell me what you think your dataset was supposed to provide? What is the fundamental basis to say that there was an edge in that dataset?

If I was not so eager to learn a lot of stuff, I might not have wasted so much time. Well, it was not all wasted, you learn something all the time.

@Guy,

You said:

For me, what others can say about how successful they have been using your data is almost totally irrelevant without proof or some kind of demonstration. Even a backtest analysis would be sufficient since it would not reveal how their strategy was constructed. Until I see one of those, I will refrain from making a judgment on one side or other.

Have a crack at this. It uses the Accern DS2 daily dataset as a sentiment factor along with six other factors.

(Tearsheet notebook attached.)

@Guy Fluery,

You said:

if a trading strategy has something close to a 50/50 hit rate, it behaves about the same as if flipping a coin. And until someone can demonstrate that it is NOT behaving as if tossing a coin, then I will consider the trades, and the outcome of a trading strategy as if quasi-randomly generated since the two would not be distinguishable no matter how much skill or spin we would like to put on it.

In the context of financial trading, let me debunk your coin-tossing analogy. In coin tossing, there are only two possible outcomes: heads or tails. Done 10,000 or a million times, you arrive at close to a 50/50 hit rate, the classic random walk. In financial trading, there are three possible outcomes: profit, loss, or even. There is also the inclusion of frictional costs, commissions and slippage, which coin tossing doesn't have. More importantly, financial trading has a magnitude dimension, the amount of profit or loss on each trade, which a coin toss doesn't have. So overall, we can conclude that they have different probability distributions. When you dismiss trading results with a 50/50 hit rate as pure noise or random behavior based on your coin-toss analogy, you make me cringe. It is an incorrect analysis of trading performance, based on incorrect premises about outcomes and probabilities.
My tearsheet above demonstrates this clearly on the short trades, where the hit rate is 50% but the trades are quite profitable and therefore cannot simply be dismissed as pure noise or a random walk.

Summary stats                 All trades      Short trades    Long trades
Total number of round_trips   24650.00        14739.00        9911.00
Percent profitable            0.54            0.50            0.60
Winning round_trips           13370.00        7375.00         5995.00
Losing round_trips            11280.00        7364.00         3916.00
Even round_trips              0.00            0.00            0.00

PnL stats                     All trades      Short trades    Long trades
Total profit                  $4418696.39     $2346003.52     $2072692.87
Gross profit                  $32249881.54    $16890816.68    $15359064.86
Gross loss                    $-27831185.15   $-14544813.17   $-13286371.99
Profit factor                 1.16            1.16            1.16
Avg. trade net profit         $179.26         $159.17         $209.13
Avg. winning trade            $2412.11        $2290.28        $2561.98
Avg. losing trade             $-2467.30       $-1975.12       $-3392.84
Ratio Avg. Win:Avg. Loss      0.98            1.16            0.76
Largest winning trade         $283397.43      $283397.43      $172126.34
Largest losing trade          $-191143.29     $-178729.30     $-191143.29

Duration stats                All trades                 Short trades               Long trades
Avg duration                  34 days 02:27:10.183691    30 days 12:34:59.452947    39 days 10:09:09.469377
Median duration               23 days 00:00:00           20 days 23:01:00           28 days 01:00:00
Longest duration              205 days 23:00:00          179 days 23:00:00          205 days 23:00:00
Shortest duration             0 days 14:33:59            0 days 14:33:59            0 days 14:33:59
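As a quick consistency check of the short-trade claim above, the short columns of the table can be recombined to show how a roughly 50% hit rate still yields the reported average net profit per round trip (all inputs are taken directly from the table):

# Short-trade figures from the tearsheet table above.
n_short = 14739
wins, losses = 7375, 7364
avg_win, avg_loss = 2290.28, -1975.12

hit_rate = wins / float(n_short)
net_profit = wins * avg_win + losses * avg_loss
avg_net_per_trade = net_profit / n_short

print('hit rate: %.4f' % hit_rate)                            # ~0.5004
print('net profit: $%.0f' % net_profit)                       # ~ $2.35 M, close to the reported $2,346,003
print('avg net per round trip: $%.2f' % avg_net_per_trade)    # ~ $159, matching the reported $159.17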

If you need further proof, let me know.

@James, I will only use the equation presented in my last post and some of the numbers you provided.

For instance, you have a 0.54 hit rate and want to assume that it is better than average when the US market's historical long-term average has been in the vicinity of 0.52 – 0.54. It is like flipping a biased coin (0.52 – 0.54). You will get it right 52 to 54% of the time, on average.

Is there any merit in flipping such a coin and getting 0.54? I do not see any. On the contrary, that is the most expected value. So I certainly am not surprised by your 0.54 hit rate. Had you shown less, your trading methods would have been detrimental to your portfolio, since randomly picking trades with a 0.54 bias (e.g. random() > 0.46) would have done a better job. If your system were predictive, it should have exceeded 0.54, which is available to anyone playing the game and throwing darts at the financial section of their newspaper.

Your portfolio payoff matrix equation: F(t) = F(0) + Σ(H∙ΔP), with Σ(H∙ΔP) = n∙x_bar = (n - λ)∙AW + λ∙AL, where n is the number of trades, x_bar the average net profit per trade, λ the number of losing trades, AW the average winning trade, and AL the average (negative) losing trade.

when filled with your numbers gives:

24,650 ∙ $179.26 = $4,418,759 ≈ (24,650 – 11,280) ∙ $2,412.11 + 11,280 ∙ (-$2,467.30) = $4,418,767, with λ = 11,280 losing round trips from the tearsheet; the small difference is rounding in the averages.
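The same identity can be verified directly from the tearsheet's all-trades column, with λ taken as the 11,280 losing round trips:

n = 24650                   # all round trips
lam = 11280                 # losing round trips (lambda)
AW, AL = 2412.11, -2467.30  # average winning / losing trade
x_bar = 179.26              # reported average net profit per trade

lhs = n * x_bar
rhs = (n - lam) * AW + lam * AL
print('n * x_bar = %.0f, (n - lam)*AW + lam*AL = %.0f' % (lhs, rhs))   # ~4,418,759 vs ~4,418,767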

The average profit per trade is about 0.16% on your average bet. And you want to tell me that you are not playing on market noise. You would need more than that; that is not even a 1% profit per trade. But then again, money is money. There is nothing wrong with playing on market noise and making money. So, go for it.

Your winning trades averaged $2,412.11 while your losing trades averaged -$2,467.30, and you do not see the similarity with flipping a coin?

What I see in your tearsheet is a degrading CAGR, and a trading method that can take over 2 hours to fill market orders on liquid stocks yet profits from the market's upward drift. That's OK; the strategy does make money. But x_bar, at $179.26, will keep decreasing as you go forward. Note that $179.26 per trade, on average, is not that big; it could disappear with just a slight increase in frictional costs. Also, your gross-leverage chart shows that leverage is sometimes used but not accounted for.

What you provided is no proof, it is what was expected...

If your strategy were predictive in some way, it failed to show it; its hit rate would have been higher than 0.54, and that is not what the tearsheet says. But, like I said, you can still profit on market noise; you just cannot expect it to remain the same. Sometimes it will work and sometimes it won't.

There is no need for me to make any suggestions; it would be totally useless. You think that your stuff is predictive when in reality it might not be. Furthermore, you seem to refuse to even investigate the “possibility”. So, may I wish you continued good luck.

I cannot make the blind see.

Added:

You stated: “...In financial trading, there are three possible outcomes: profit, loss or even.”

Yet, looking at your numbers, there is not a single trade that came out even out of 24,650.

@Guy,

You seem to be fixated on averages. You introduce the concept of the Law of Large Numbers with your coin-tossing analogy, yet you analyze the phenomenon with the Law of Averages, which leads to what is called the Gambler's Fallacy. I suggest you research them so that you may see the light. The rigidity of your coin-tossing analysis is too one-dimensional and binary; at the very least, you are missing the magnitude dimension. Also, your clairvoyant tendencies never cease to amaze me.

@James, here is some clairvoyance for you. Your trading strategy is structurally designed to degrade with time. The more time you give it, the worse it will get.

There is a simple acid test you can do, and it requires no change in code: simply add more time to your simulation. Start your test on January 4, 2010, for instance, and end it on June 29, 2018; it will still be an up-market for the duration. Then report back with the tearsheet backtest analysis. You should see a lot more trades, a lower average net profit per trade, and a declining CAGR.

Your trading strategy is playing on market noise. As I said, it is OK. You do not need to be predictive to win at this game. You might think that what you do is predictive, but in reality, it is just coincidental. There is no advantage in taking a coin to guess the outcome of another. On the other hand, even if you did, the expected outcome would be the same.

@Guy,

I would have obliged your crystal-ball request, but unfortunately the Accern dataset only covers 7/31/2014 to 3/27/2018.

Secondly, I really appreciate this healthy debate between two different schools of thought.

Here are some quotes from you:

@Brad, if a trading strategy has something close to a 50/50 hit rate, it behaves about the same as if flipping a coin. And until someone can demonstrate that it is NOT behaving as if tossing a coin, then I will consider the trades, and the outcome of a trading strategy as if quasi-randomly generated since the two would not be distinguishable no matter how much skill or spin we would like to put on it.
...the US market's historical long-term average has been in the vicinity of 0.52 – 0.54. It is like flipping a biased coin (0.52 – 0.54). You will get it right 52 to 54% of the time, on average.
If your system was predictive, it should have exceeded 0.54 which is available to anyone playing the game and throwing darts at the financial section of their newspaper.

From your statements, I gather that you are a follower of the Efficient Market Hypothesis and belong to the school of thought that the stock market is a random walk and pure white noise, so no amount of skill or spin, fundamental or technical analysis, can outperform the market without taking on additional risk. You must be a proponent of passive investment, buying and holding the diversified market because it is a biased random walk with positive drift. Your thoughts mirror those of Burton Malkiel, the author of the 1973 book "A Random Walk Down Wall Street," in which Malkiel suggests that stock-market professionals cannot outperform dart-throwing monkeys, something you referred to. The Wall Street Journal put this hypothesis to the test, and here is an excerpt of their findings:

In 1988, the Wall Street Journal created a contest to test Malkiel's random walk theory by creating the annual Wall Street Journal Dartboard Contest, pitting professional investors against darts for stock-picking supremacy. Wall Street Journal staff members played the role of the dart-throwing monkeys. After 100 contests, the Wall Street Journal presented the results, which showed the experts won 61 of the contests and the dart throwers won 39. However, the experts were only able to beat the Dow Jones Industrial Average (DJIA) in 51 contests.

I belong to the school of thought that the market is chaotic, irrational and, at times, downright inefficient, along the lines of Prof. Eugene Fama of the Univ. of Chicago. Markets are highly irrational, and the predominant emotions, greed and fear, drive prices up too high on good news and down too low on bad news. I lean more towards the Fractal Market Hypothesis, formalized by Edgar Peters in 1991 within the framework of chaos theory to explain the heterogeneity of investors with respect to their investment horizons. In short, I believe there are packets of predictability and packets of randomness in the stock market, which could be described as a semi-chaotic time series. The packets of predictability can be extracted from the behavior of market participants as manifested in greed and fear. The packets of randomness come from uncertainty resulting from structural and other changes. Having said that, I believe one can find order in chaos, or in what is seemingly a random phenomenon.

Lastly, I just want to comment on your Addendum above:

Added:

You stated: “...In financial trading, there are three possible outcomes: profit, loss or even.”

Yet, looking at your numbers, there is not a single trade that came out even out of 24,650.

I find it a little amateurish on your part to dispute this fact using the small sample from my algo. You could do better than that!

@James, I have read all those people too and a lot more. I do not adhere to any school of thought, only to numbers, and I would add winning numbers.

Portfolio investment management and trading systems are different things. When trading, what you want is to extract a profit from all the seemingly random-like price movements. Call it whatever you want: volatility, chaotic, semi-chaotic, variance, random-like, quasi-unpredictable, quasi-random walk, slightly predictive, positive alpha-streams, mostly efficient, irrational or whatever. It does not really matter to the trading account balance at the end of the day. Only the net results are accounted for.

I go for the practical side of things, and there I have only three questions. Does the trading strategy work? That is answered with a simulation. Will it continue to work in the future? The answer comes from the architecture, structure, and behavior of the program itself, often by extrapolating its metrics, provided the backtest covers a long enough period, trades a sufficiently large number of stocks, and generates a large number of trades. The third question, and maybe the most important: how can I improve a particular trading strategy design; can I deliberately make it better?

@James, yes. I am definitely concerned about averages. You were given a portfolio's equity equation which read:

F(t) = F(0) + Σ(H∙ΔP) – Σ(Exp.), where Σ(H∙ΔP) – Σ(Exp.) = n∙x_bar = (n - λ)∙AW + λ∙AL

There is an equal sign in all this. It is a statement. Note that I separated accounting for trading expenses since x_bar is the average net profit per trade.

You have a trading strategy that made an average profit of $179.26 per trade on 24,650 trades, holding on average 90 stocks over the simulated period. And it can all be summed up in two numbers: n∙x_bar. Whatever you do in trading, no matter how simple or complex, it all ends up in those two numbers. One is just the trade counter, so no predictability there, except that it could be big and that you could estimate it from your simulated scenarios to get something like: on average, n tends to be of this size.

x_bar, the average net profit per trade, is the heart of your trading strategy.

If it is negative, you lose.
If it is zero, you lose.
If it is small or does not cover trading expenses, you lose.
If it generates less than market averages, you lose.
If you generate about the same as market averages, then any incentive to use your trading strategy needs to be justified by its other characteristics, since there are plenty of better alternatives out there. If x_bar degrades over time, then you have presented me with your best effort, and I should expect your strategy to go downhill from here, which, I think we would both agree, is not that enviable. That was the reason for requesting a longer trading interval: to see the rate of CAGR descent. For sure, I would not go forward without doing such a test; I would need that answer before doing anything else, even if it meant putting aside a factor.

Note that your trading strategy is structured in such a way that it cannot generate an even trade, and this is by your design. Pointing it out might be trivial, since it has absolutely no impact on your trading strategy. However, a sample of 24,650 trades is more than large enough to take averages and make generalizations about a strategy's trading behavior.

If everything ends up in n∙x_bar, and all the outperformance is directly related to it, then there is only one question: how can I increase the value of n∙x_bar? How can I improve the strategy's payoff matrix Σ(H∙ΔP)? Increasing n would do the job, so find ways to increase n. Increasing x_bar would do the job too, so find ways to increase x_bar. What I suggest is: do both at the same time! That will be far more productive, provided your trading strategy can be improved in those two departments. But for sure, if you cannot increase n, or cannot increase x_bar, then you are left with what you already have, and it will not get any better due to the very nature of your strategy design.
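To put rough numbers on that argument (the doubling scenarios below are purely hypothetical), the end result scales linearly in both n and x_bar:

def final_equity(F0, n, x_bar):
    # End-of-interval equity when total trading profit equals n * x_bar.
    return F0 + n * x_bar

F0 = 10e6
print('base:         %.0f' % final_equity(F0, n=24650, x_bar=179.26))
print('double n:     %.0f' % final_equity(F0, n=2 * 24650, x_bar=179.26))
print('double x_bar: %.0f' % final_equity(F0, n=24650, x_bar=2 * 179.26))
print('double both:  %.0f' % final_equity(F0, n=2 * 24650, x_bar=2 * 179.26))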

I lost interest in your trading strategy, so I will not be the one to modify it or test scenarios that might increase n∙x_bar. I wish you good luck.

@Guy,

Thanks for wishing me luck. I would wish you good luck too, but since you are just learning Python and have no backtests or tearsheets to show, it would be for nothing. I would be curious, though, if you can point me to some backtests or tearsheets, here on Q or elsewhere, that blew your mind and met all your expectations.

Since you're in the mood to scrutinize tearsheets, here's another one of mine that I posted here. Please feel free to comment.