LogitBot: Using Big Data and Machine Learning to Forecast Stock Returns

FORECASTING STOCK RETURNS WITH BIG DATA AND MACHINE LEARNING

OVERVIEW:

LogitBot Inc. uses machine learning and other advanced technologies to deliver investment insights and predictive analytics to investors across multiple time horizons. We combine massive amounts of data into a graph of the world's financial information and employ sophisticated models to uncover hidden relationships in order to understand and predict how markets are likely to behave.

SUMMARY

We are excited to demonstrate to the Quantopian community how they can leverage our predicted stock return model outputs as a valuable signal in systematic trading strategies. The predictions we are making available on Quantopian consider a wide variety of predictors, ranging from fundamental factors and historical returns to credit and interest-rate spreads, commodity prices, and FX fluctuations, to forecast log returns at a 5-day horizon.
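For readers unfamiliar with the target variable: a 5-day log return is simply log(P[t+5] / P[t]). A minimal sketch of how such a target would be computed, using made-up prices (these numbers are illustrative only, not LogitBot data):

```python
import numpy as np
import pandas as pd

# Hypothetical daily closing prices for one stock (illustrative values only).
prices = pd.Series([100.0, 101.2, 100.5, 102.0, 103.1, 102.4, 104.0])

# 5-day log return: log(P[t+5] / P[t]), the quantity the feed forecasts.
log_ret_5d = np.log(prices.shift(-5) / prices)

# Only the first two days have a price 5 days ahead in this short sample.
print(log_ret_5d.dropna())
```

Note that the last 5 rows of any price history have no forward return yet; a live feed can only publish those forecasts, not the realized values.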

SAMPLE STRATEGY

We have shared a trading strategy that uses the model forecasts to generate attractive returns. A 5-year backtest of the strategy on Quantopian's platform results in:
Total Returns: 239.42% vs. SPY 112.4%
Alpha: 0.33
Beta: 0.18
Sharpe: 3.20
Sortino: 4.06

We have made a sample dataset with 1 year's worth of data available here.

Please take a look at our research notebook for more detail about the model data and the process.

from datetime import datetime
import pandas as pd
import numpy as np
import pytz
from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import AverageDollarVolume, SimpleMovingAverage



def preview_data(df):

    df['date'] = pd.to_datetime(df['date'])
    # df.sort is deprecated in pandas; sort_values is the supported API
    df = df.sort_values(['symbol', 'date'], ascending=[True, True])
    df = df.replace([np.nan, -np.inf, np.inf], 0)

    log.info('Loaded data for {} stocks from Logitbot'.format(len(df['symbol'].unique())))

    log.info('Preview of Data \n{}'.format(df.head()))
             
    return df


def initialize(context):
    """
    Called once at the start of the program. Initialize trading parameters.
    """

    # Define context variables that can be accessed in other methods of
    # the algorithm.

    context.std_thresh       = 0.1
    context.target_weight    = 0.03
    context.model            = 'model_h2'
    context.min_hold_period  = 3
    context.rebal_dict       = {}
    context.use_short        = False
    context.avg_longs        = 0
    context.avg_shorts       = 0
    context.total_longs      = 0
    context.total_shorts     = 0
    context.acc_thresh       = 0.55
    
    # avoid illiquid stocks
    context.min_dollar_volume = 5e6
    context.min_volume        = 1e6
    context.max_short_port    = 0.3

    # if context.use_short is False:
    #     set_long_only()
    
    # fetch the Logitbot predictions from Google Sheets
        
    csv_file ="https://docs.google.com/spreadsheets/d/12ToiE68v0Q6obkcZToO_O2NRn0A-qMe0x3VqPeevDKs/pub?gid=1612788013&single=true&output=csv"

    fetch_csv( csv_file, pre_func=preview_data, symbol_column = 'symbol', date_column='date', date_format = '%m/%d/%Y', delimiter = '|')



    # Trade daily at 10am
    schedule_function(func= ml_trades,
                      date_rule=date_rules.every_day(),
                      time_rule=time_rules.market_open(hours=0, minutes=30))

    # Record tracking variables at the end of each day.
    schedule_function(plot_variables,
                      date_rules.every_day(),
                      time_rules.market_close(minutes=1))

    # Create and attach our pipeline (dynamic stock selector), defined below.
    attach_pipeline(make_pipeline(context), 'ml_example')




def ml_trades(context,data):
    """
    Execute trades based on the Machine learning forecasts
    This function is called according to the settings in schedule_functions
    
    """
    pred_ret_col    = context.model

    for stock in data.fetcher_assets:

        if stock in context.security_list:
    
            if data.can_trade(stock):

                if stock not in context.rebal_dict:
                    context.rebal_dict[stock] = datetime(1901, 1, 1, tzinfo=pytz.utc)

                # Trading logic
                pred_return  = data.current(stock, pred_ret_col)
                mean_log_ret = data.current(stock, 'mean_log_ret')
                vol_log_ret  = data.current(stock, 'vol_log_ret')
                curr_pos     = context.portfolio.positions[stock].amount
                today_dt     = pd.to_datetime(get_datetime())
                accuracy     = data.current(stock, 'accuracy_{}'.format(context.model))

                # hold positions for at least min_hold_period days
                if (today_dt - context.rebal_dict[stock]).days > context.min_hold_period:

                    # BUY if the predicted return is above the buy threshold
                    buy_thresh = (mean_log_ret + vol_log_ret) * context.std_thresh

                    if pred_return > buy_thresh and curr_pos <= 0 and accuracy >= context.acc_thresh:
                        order_target_percent(stock, context.target_weight)
                        # update rebal_dict with the last trade date
                        context.rebal_dict[stock] = today_dt

                    # SELL if the predicted return is below the sell threshold
                    sell_thresh = (mean_log_ret - vol_log_ret) * context.std_thresh

                    if pred_return < sell_thresh and curr_pos >= 0 and accuracy >= context.acc_thresh:
                        order_target_percent(stock, -context.target_weight / 2)
                        # update rebal_dict with the last trade date
                        context.rebal_dict[stock] = today_dt

            else:
                log.warn("cannot trade stock {}".format(stock))

    
                

def plot_variables(context, data):
    """
    This function is called at the end of each day and plots certain variables.
    Currently plots the strategy's leverage, exposure, and net long exposure.

    """
    # Check how many long and short positions we have.
    longs = shorts = 0
    for position in context.portfolio.positions.itervalues():
        if position.amount > 0:
            longs += 1
        if position.amount < 0:
            shorts += 1
    if context.use_short is False:
        shorts = 1

    record(leverage=context.account.leverage,
           exposure=context.account.net_leverage,
           net_long = (longs / float(shorts) )/ len(data.fetcher_assets))



def make_pipeline(context):
    """
    A function to create a dynamic stock selector. We use this pipeline to filter stocks
    based on their average volume AND average dollar volume (proxies for liquidity).
    
    """
    # Create a pipeline object.
    pipe = Pipeline()

    # Create a liquidity filter as a combination of dollar volume and average volume
    dollar_volume = AverageDollarVolume(window_length=30)
    avg_volume_10 = SimpleMovingAverage(inputs=[USEquityPricing.volume], window_length=10)

    # Define high dollar-volume filter to be stocks with a dollar volume of at least $5M
    high_dollar_volume = dollar_volume > context.min_dollar_volume
    high_volume = avg_volume_10 > context.min_volume

    pipe.add(high_dollar_volume, 'high_dollar_volume')
    pipe.add(high_volume, 'high_avg_volume')
    
    return pipe



def before_trading_start(context, data):
    """
    Called every day before market open. This is where we get the securities
    that made it through the pipeline.
    """

    # Pipeline_output returns a pandas DataFrame with the results of our factors
    # and filters.
    screen_df = pipeline_output('ml_example')

    # Sets the list of securities we want to trade as the securities with a 'True'
    # value in the high_dollar_volume column
    screen_df = screen_df[screen_df['high_dollar_volume']]
    context.universe = screen_df[screen_df['high_avg_volume']]


    # A list of the securities that pass the filter  today.
    context.security_list = context.universe.index.tolist()
    # A set of the same securities, sets have faster lookup.
    context.security_set = set(context.security_list)
    
15 responses

so how do we get the live feed?

I opened the CSV that this code referenced. It actually has parameters for individual dates, meaning the code is going date by date and is highly overfit. You know, even I could create a strategy that never loses by telling it exactly what to do on any given date through the lens of retrospection.

Does Quantopian not screen for these kinds of scams? Seems highly suspicious, not to mention the website link is sparse and looks very unprofessional. Maybe this is the "real deal", but if it were, I doubt he'd be sharing it and would probably be on his own private island, not trying to rope a few greedy, sad souls in. Pardon my skepticism. Maybe (I hope) I'm wrong.

symbol|date|act_log_ret|model_o|model_g|model_b|model_h|model_h2|mean_log_ret|vol_log_ret|accuracy_model_o|accuracy_model_g|accuracy_model_b|accuracy_model_h|accuracy_model_h2
AAPL|2006-01-03|0.019|0.017|-0.001|-0.056|-0.013|0.000|0.053|0.016||||0.000|0.500
AAPL|2006-01-04|0.010|0.030|-0.019|-0.064|-0.018|0.000|0.053|0.017||||0.000|0.500
AAPL|2006-01-05|0.011|0.023|0.005|-0.055|-0.009|0.000|0.053|0.016||||0.000|0.500
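For anyone who wants to inspect the sample feed locally before wiring it into fetch_csv, the pipe-delimited rows above parse directly with pandas. The snippet below uses two of the sample rows copied from this thread (nothing from the live feed), assuming the same column layout:

```python
import io
import pandas as pd

# Two sample rows from the feed, in its pipe-delimited layout.
sample = """symbol|date|act_log_ret|model_o|model_g|model_b|model_h|model_h2|mean_log_ret|vol_log_ret|accuracy_model_o|accuracy_model_g|accuracy_model_b|accuracy_model_h|accuracy_model_h2
AAPL|2006-01-03|0.019|0.017|-0.001|-0.056|-0.013|0.000|0.053|0.016||||0.000|0.500
AAPL|2006-01-04|0.010|0.030|-0.019|-0.064|-0.018|0.000|0.053|0.017||||0.000|0.500
"""

# Empty accuracy fields become NaN automatically.
df = pd.read_csv(io.StringIO(sample), sep='|', parse_dates=['date'])
print(df[['symbol', 'date', 'model_h2', 'accuracy_model_h2']])
```

This makes it easy to verify, row by row, which dates and model columns the algorithm above actually consumes.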

Hi Jeffrey,

We are a fintech company that is building AI tools and looking to provide machine learning models as a service. Our models provide a rolling prediction of 5-day returns: on Monday we provide a forecast for Friday, on Tuesday we provide a forecast for the following Monday, and so on. The CSV we shared is a sample of the feed we plan to make available on Quantopian on an end-of-day basis.

Please take a look at our Research Notebook : https://www.quantopian.com/posts/forecasting-stock-returns-with-big-data-and-machine-learning

Nice

Gosh, if only there was a way for you to prove that this wasn't just a bunch of trumped up backward looking picks.

Have you considered proving that this works on out of sample data in some way? I mean Jeffrey's right. Any one of us could create that CSV file. Not sure if entering the Quantopian contest breaks their TOS or not, but you could certainly use fundseeder to prove what you're doing.

Hi Tar, Jake, and Jeffrey,

Thanks for your feedback. We follow a rigorous process to ensure our predictions do not contain lookahead bias. As an example, it is often useful to scale machine learning features to [-1, 1] by applying a standard scaler (i.e., standardizing features by removing the mean and scaling to unit variance). While this is common in many statistical studies, it can introduce a bias, as the mean and standard deviation contain information about both the past and the future.
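To make the leakage point concrete, here is a toy sketch with purely synthetic data (nothing here comes from LogitBot's pipeline), contrasting the biased full-sample scaling with the correct train-only scaling:

```python
import numpy as np

rng = np.random.default_rng(0)
feature = rng.normal(size=200)  # synthetic feature series, for illustration only

train, test = feature[:150], feature[150:]

# Leaky: mean/std computed over the FULL series, so the scaled training
# inputs implicitly contain information about the test period.
leaky_train = (train - feature.mean()) / feature.std()

# Correct: statistics estimated on the training window only, then reused
# unchanged out of sample.
mu, sigma = train.mean(), train.std()
clean_train = (train - mu) / sigma
clean_test = (test - mu) / sigma
```

The two scaled training sets differ, and in a backtest that difference is exactly the future information the model should never have seen.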

We look forward to providing predictions for unseen data through our feed to allow users to evaluate the efficacy of our models on true out of sample data. This notebook was posted in close coordination with Quantopian and is the first in a series meant to introduce machine learning signals and catalyze a discussion on new use cases for the models we produce.

We are working with Quantopian to provide forward looking sample data in the near future for the community to evaluate, and we are excited to get your feedback once you have tested the signals in your own strategies.

Is it possible to get an update on this algo? Looks interesting.

Steve

We have finalized our data agreement with Quantopian and the data is ready. Quantopian is in the process of loading the data into their systems to make it available to the entire community.

I have rerun it with some adjustments:
- Set the benchmark to AAPL, since the strategy only trades Apple.
- Decreased leverage to keep it close to 1, to make the benchmark comparison fair.
It would be interesting to have some sample data for SPY or some commodities, like gold and oil, to see how it behaves in those scenarios.

from datetime import datetime
import pandas as pd
import numpy as np
import pytz
from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import AverageDollarVolume, SimpleMovingAverage



def preview_data(df):

    df['date'] = pd.to_datetime(df['date'])
    # df.sort is deprecated in pandas; sort_values is the supported API
    df = df.sort_values(['symbol', 'date'], ascending=[True, True])
    df = df.replace([np.nan, -np.inf, np.inf], 0)

    log.info('Loaded data for {} stocks from Logitbot'.format(len(df['symbol'].unique())))

    log.info('Preview of Data \n{}'.format(df.head()))
             
    return df


def initialize(context):
    """
    Called once at the start of the program. Initialize trading parameters.
    """

    # Define context variables that can be accessed in other methods of
    # the algorithm.

    context.std_thresh       = 0.1
    context.target_weight    = 0.018
    context.model            = 'model_h2'
    context.min_hold_period  = 3
    context.rebal_dict       = {}
    context.use_short        = False
    context.avg_longs        = 0
    context.avg_shorts       = 0
    context.total_longs      = 0
    context.total_shorts     = 0
    context.acc_thresh       = 0.55
    
    # avoid illiquid stocks
    context.min_dollar_volume = 5e6
    context.min_volume        = 1e6
    context.max_short_port    = 0.3

    # if context.use_short is False:
    #     set_long_only()
    set_benchmark(sid(24))  # benchmark against AAPL (sid 24)
    
    # fetch the Logitbot predictions from Google Sheets
        
    csv_file ="https://docs.google.com/spreadsheets/d/12ToiE68v0Q6obkcZToO_O2NRn0A-qMe0x3VqPeevDKs/pub?gid=1612788013&single=true&output=csv"

    fetch_csv( csv_file, pre_func=preview_data, symbol_column = 'symbol', date_column='date', date_format = '%m/%d/%Y', delimiter = '|')



    # Trade daily at 10am
    schedule_function(func= ml_trades,
                      date_rule=date_rules.every_day(),
                      time_rule=time_rules.market_open(hours=0, minutes=30))

    # Record tracking variables at the end of each day.
    schedule_function(plot_variables,
                      date_rules.every_day(),
                      time_rules.market_close(minutes=1))

    # Create and attach our pipeline (dynamic stock selector), defined below.
    attach_pipeline(make_pipeline(context), 'ml_example')




def ml_trades(context,data):
    """
    Execute trades based on the Machine learning forecasts
    This function is called according to the settings in schedule_functions
    
    """
    pred_ret_col    = context.model

    for stock in data.fetcher_assets:

        if stock in context.security_list:
    
            if data.can_trade(stock):

                if stock not in context.rebal_dict:
                    context.rebal_dict[stock] = datetime(1901, 1, 1, tzinfo=pytz.utc)

                # Trading logic
                pred_return  = data.current(stock, pred_ret_col)
                mean_log_ret = data.current(stock, 'mean_log_ret')
                vol_log_ret  = data.current(stock, 'vol_log_ret')
                curr_pos     = context.portfolio.positions[stock].amount
                today_dt     = pd.to_datetime(get_datetime())
                accuracy     = data.current(stock, 'accuracy_{}'.format(context.model))

                # hold positions for at least min_hold_period days
                if (today_dt - context.rebal_dict[stock]).days > context.min_hold_period:

                    # BUY if the predicted return is above the buy threshold
                    buy_thresh = (mean_log_ret + vol_log_ret) * context.std_thresh

                    if pred_return > buy_thresh and curr_pos <= 0 and accuracy >= context.acc_thresh:
                        order_target_percent(stock, context.target_weight)
                        # update rebal_dict with the last trade date
                        context.rebal_dict[stock] = today_dt

                    # SELL if the predicted return is below the sell threshold
                    sell_thresh = (mean_log_ret - vol_log_ret) * context.std_thresh

                    if pred_return < sell_thresh and curr_pos >= 0 and accuracy >= context.acc_thresh:
                        order_target_percent(stock, -context.target_weight / 2)
                        # update rebal_dict with the last trade date
                        context.rebal_dict[stock] = today_dt

            else:
                log.warn("cannot trade stock {}".format(stock))

    
                

def plot_variables(context, data):
    """
    This function is called at the end of each day and plots certain variables.
    Currently plots the strategy's leverage, exposure, and net long exposure.

    """
    # Check how many long and short positions we have.
    longs = shorts = 0
    for position in context.portfolio.positions.itervalues():
        if position.amount > 0:
            longs += 1
        if position.amount < 0:
            shorts += 1
    if context.use_short is False:
        shorts = 1

    record(leverage=context.account.leverage,
           exposure=context.account.net_leverage,
           net_long = (longs / float(shorts) )/ len(data.fetcher_assets))



def make_pipeline(context):
    """
    A function to create a dynamic stock selector. We use this pipeline to filter stocks
    based on their average volume AND average dollar volume (proxies for liquidity).
    
    """
    # Create a pipeline object.
    pipe = Pipeline()

    # Create a liquidity filter as a combination of dollar volume and average volume
    dollar_volume = AverageDollarVolume(window_length=30)
    avg_volume_10 = SimpleMovingAverage(inputs=[USEquityPricing.volume], window_length=10)

    # Define high dollar-volume filter to be stocks with a dollar volume of at least $5M
    high_dollar_volume = dollar_volume > context.min_dollar_volume
    high_volume = avg_volume_10 > context.min_volume

    pipe.add(high_dollar_volume, 'high_dollar_volume')
    pipe.add(high_volume, 'high_avg_volume')
    
    return pipe



def before_trading_start(context, data):
    """
    Called every day before market open. This is where we get the securities
    that made it through the pipeline.
    """

    # Pipeline_output returns a pandas DataFrame with the results of our factors
    # and filters.
    screen_df = pipeline_output('ml_example')

    # Sets the list of securities we want to trade as the securities with a 'True'
    # value in the high_dollar_volume column
    screen_df = screen_df[screen_df['high_dollar_volume']]
    context.universe = screen_df[screen_df['high_avg_volume']]


    # A list of the securities that pass the filter  today.
    context.security_list = context.universe.index.tolist()
    # A set of the same securities, sets have faster lookup.
    context.security_set = set(context.security_list)
    

Had a look at the LogitBot website and I am a little puzzled about what the product is.

Am I right in thinking that LogitBot "just" offers the output of its algos, i.e. predictions and/or classifications? Or are you offering software with the algos built in, for a client to optimise, fiddle with, make his own predictions, and input his own data?

If the latter, then what ML algos are on offer? And is the software GUI-based, using an underlying menu of algos for people to try out? And what about the data?

What data do you provide? Is it all pre-formatted, normalised, and so on, to run seamlessly on the pre-supplied algos? Apart from instrument prices, what other data are you providing? Corporate, economic... news?

Hi Lucas,

We have models for SPY, currencies, commodities, etc., but our initial offering with Quantopian only covers predictions on single-name stocks. Quantopian is still finalizing the integration and we'll post a note on this board once it's published.

Running a strategy on a single stock is not optimal. It's generally better to trade a basket of stocks to minimize idiosyncratic risks and maximize returns.
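The diversification point can be illustrated with a toy one-factor model (purely synthetic numbers, not LogitBot output): an equal-weight basket averages away the independent stock-specific noise, leaving mostly the common market risk:

```python
import numpy as np

rng = np.random.default_rng(1)
n_stocks, n_days = 20, 1000

# Toy model: daily return = common market factor + independent stock noise.
market = rng.normal(0, 0.01, n_days)
idio = rng.normal(0, 0.02, (n_stocks, n_days))
returns = market + idio  # shape (n_stocks, n_days) via broadcasting

single_vol = returns[0].std()            # volatility of one stock
basket_vol = returns.mean(axis=0).std()  # volatility of the equal-weight basket

# The idiosyncratic component shrinks roughly as 1/sqrt(n_stocks),
# so the basket is much less volatile than any single name.
print(single_vol, basket_vol)
```

The same logic is why a single-stock backtest of this signal says little about how it would perform across a screened universe.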

Thanks for looking at the data and would love to have you use it once it's available.

Hi Anthony,

Our solutions are designed to make the process of researching and finding good investment opportunities easier. To that end, we provide resultant model outputs via APIs, web, or via our AI which understands natural language. The whole premise of LogitBot is that a user does not need deep domain experience in Machine Learning to use the models or algorithms.

We currently cater to institutional investors who generally make API calls to our models or pull down information via feeds. Quantopian is handling the data ingestion process for its users. I am happy to chat and learn more about your specific use cases in or outside of Quantopian. Please email me at [email protected].

Thanks!

When will the LogitBot datasets be back and available from Quantopian? What was the problem with the datasets that caused them to be removed?

Hi Nick,

Sorry for the mix up and thanks for reaching out.

There is nothing wrong with the dataset. There was an engineering issue with respect to loading the data, which has been resolved. It should be available to the community in the next week.

Please email me at [email protected] so that we can schedule a time to chat, and also share a really interesting strategy that we've developed on Quantopian.

Thanks.

This is complete garbage. If you have a Ph.D. in AI, mathematics, physics, statistics, HFT, or algo trading, you will quickly see that this is nonsense. Any experienced data analyst or AI engineer can pick this algorithm apart very quickly. I suspect extreme overfitting, if it is a regression/classification model with feature engineering. Try this on out-of-sample (i.e., unseen) data with this LogitBot feature engineering or stock and you will be surprised. Feature engineering does help, but it's not a silver bullet coated with diamonds. No, sir.