Learn from the Experts Ep 4: Avoiding Overfitting via Cross-Validation with Joakim

In this video, Quantopian community member and challenge winner, Joakim Arvidsson, walks through his algorithm creation process with Quantopian’s Thomas Wiecki. This video starts with a short interview about Joakim’s background in investment banking and continues with Joakim walking through an example algorithm he created on Quantopian.

As a long-time community member and a winner of the daily contest and multiple challenges, Joakim shows how he puts his skills to use on the Quantopian platform. He also explains the reasoning behind his decisions, so you can walk away with a better understanding of how financial algorithms work and a starting point for creating your own.

Submit to our latest challenge here to test your skills and for a chance to win cash prizes.

You can watch it at this link, or below:

Learn more by subscribing to our YouTube channel to access all of our videos and be notified when a new one is posted.

As always, if there are any topics you would like us to focus on for future videos, please comment below or send us a quick note at [email protected].

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

8 responses

Thanks for your generosity! A great video!

Thanks @Emiliano!

@All, here's the research notebook used in the video. As mentioned, it is based on Thomas' original notebook: "Tackling overfitting via cross-validation over quarters."

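If you just want the core of the "cross-validation over quarters" idea without loading the notebook: compute the factor's information coefficient (IC) day by day, group the daily ICs by calendar quarter, and check that the mean IC keeps its sign and a reasonable t-stat across most quarters rather than being carried by one lucky stretch. Here's a minimal sketch with pandas/SciPy, assuming you already have a daily IC series (e.g. from Alphalens); the function and variable names are just illustrative, not what the notebook itself does:

import pandas as pd
from scipy import stats

def quarterly_ic_report(daily_ic, min_obs=20):
    """Summarize a daily IC series quarter by quarter.

    daily_ic : pd.Series of daily information coefficients, indexed by date
               (e.g. one column of Alphalens' IC output).
    """
    rows = []
    for quarter_end, ic in daily_ic.dropna().groupby(pd.Grouper(freq='Q')):
        if len(ic) < min_obs:  # skip quarters with too few observations
            continue
        t_stat, p_value = stats.ttest_1samp(ic, 0.0)
        rows.append({'quarter': quarter_end.to_period('Q'),
                     'mean_ic': ic.mean(),
                     't_stat': t_stat,
                     'p_value': p_value})
    # Overfitting red flag: a mean IC that flips sign (or only clears the
    # t-stat bar) in a couple of in-sample quarters.
    return pd.DataFrame(rows).set_index('quarter')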

And here's the backtest, including two years of out-of-sample testing from Jan 2017 to Jan 2019.

from quantopian.pipeline import Pipeline, CustomFactor
import quantopian.pipeline.data.factset.estimates as fe
from quantopian.pipeline.domain import US_EQUITIES
# from quantopian.research import run_pipeline
from quantopian.pipeline.filters import QTradableStocksUS 
# import alphalens as al
from quantopian.pipeline.experimental import risk_loading_pipeline

import numpy as np
import pandas as pd
import quantopian.algorithm as algo
import quantopian.optimize as opt

from quantopian.pipeline.data.factset import RBICSFocus
from quantopian.pipeline.classifiers.morningstar import Sector

######################################################################

WMIN = 0.02
WMAX = 0.98

def initialize(context):
    algo.attach_pipeline(make_pipeline(), 'long_short_equity_pipe')
    algo.attach_pipeline(risk_loading_pipeline(), 'risk_factors')
    
    # Idealized fills: no commission or slippage (for factor research, not live trading).
    set_commission(commission.PerShare(cost=0.000, min_trade_cost=0))
    set_slippage(slippage.FixedSlippage(spread=0))

    # Schedule my rebalance function
    schedule_function(func=rebalance,
                      date_rule=date_rules.every_day(), 
                      time_rule=time_rules.market_close(hours=1,minutes=30),
                      half_days=True)
    
######################################################################
    
class NanToNum(CustomFactor):
    """Replace NaNs in the input factor with 0 so names with missing
    estimates don't drop out of the combined factor."""
    window_length = 1
#     window_safe = True
    def compute(self, today, assets, out, factor):
        out[:] = np.nan_to_num(factor[-1])
        
######################################################################
    
def make_pipeline():

    # Base universe set to the QTradableStocksUS
    universe = QTradableStocksUS()

    # ms_sector = Sector()

    # economy = RBICSFocus.l1_name.latest
    sector = RBICSFocus.l2_name.latest
    # subsector = RBICSFocus.l3_name.latest

    # FactSet consensus estimates: next-fiscal-quarter (FQ1) EPS and long-term price target.
    fq1_eps_cons = fe.PeriodicConsensus.slice('EPS', 'qf', 1)
    price_tgt_cons = fe.LongTermConsensus.slice('PRICE_TGT')

    fq1_eps_cons_up = fq1_eps_cons.up.latest
    fq1_eps_cons_down = fq1_eps_cons.down.latest   
    fq1_eps_cons_total = fq1_eps_cons.num_est.latest   

    price_tgt_up = price_tgt_cons.up.latest
    price_tgt_down = price_tgt_cons.down.latest
    price_tgt_total = price_tgt_cons.num_est.latest

    # Alpha 1: net upward FQ1 EPS estimate revisions (up minus down).
    alpha1 = (fq1_eps_cons_up - fq1_eps_cons_down).winsorize(WMIN, WMAX)#.zscore(mask=universe)
    alpha1nf = NanToNum(inputs=[alpha1.zscore()])#.winsorize(WMIN,WMAX)

    # Alpha 2: net upward price-target revisions, scaled by the number of estimates.
    alpha2 = ((price_tgt_up - price_tgt_down) / price_tgt_total).winsorize(WMIN, WMAX).zscore(mask=universe)
    alpha2nf = NanToNum(inputs=[alpha2.zscore()])#.winsorize(WMIN,WMAX)

    
    # Equal-weight the cleaned, z-scored alphas, then z-score the sum within
    # each RBICS sector so the combined signal is sector-demeaned.
    combined_factor = (

        alpha1nf +  
        alpha2nf +

        0

    ).zscore(mask=universe, groupby=sector)
    
    # combined_factor = SimpleMovingAverage(
    #     inputs=[combined_factor.zscore()], 
    #     window_length = 6, 
    #     mask=universe, 
    # )
    
    
    pipe = Pipeline(
        columns={
            'combined_factor': combined_factor,
            'alpha1': alpha1,
            'alpha2': alpha2,
            # 'alpha3': alpha3,
            
        },
        screen=universe 
            # & combined_factor.notnan() 
            # & combined_factor.notnull() 
    )
    return pipe
    
def before_trading_start(context, data):
    
    context.pipeline_data = algo.pipeline_output('long_short_equity_pipe')#.dropna()
    context.risk_loadings = algo.pipeline_output('risk_factors').dropna()
    
    
def rebalance(context, data):

    pipeline_data = context.pipeline_data
    
    # risk_loadings = context.risk_loadings
    
    combined_factor = pipeline_data['combined_factor'].fillna(0)   
    
    # alpha1 = pipeline_data['alpha1'].fillna(0) 
    # alpha2 = pipeline_data['alpha2'].fillna(0) 
    # # alpha3 = pipeline_data['alpha3'].fillna(0) 
    
    # combined_factor = (
    #     alpha1 + 
    #     alpha2 + 
    #     # alpha3 + 
    # )
    
    # Scale to target weights: divide by the sum of absolute scores so
    # gross exposure (sum of |weights|) is 1.
    alpha_weight_norm = combined_factor / combined_factor.abs().sum()
    
    objective = opt.TargetWeights(alpha_weight_norm)    
    constraints = []
    
    # neutralize_risk_factors = opt.experimental.RiskModelExposure(
    #     risk_model_loadings=risk_loadings,
    #     version=opt.Newest,

    # )
    # constraints.append(neutralize_risk_factors)
    
    algo.order_optimal_portfolio(
        objective=objective,
        constraints=constraints,
    )
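A quick note on the sizing in rebalance() above: the combined factor is divided by the sum of its absolute values, so opt.TargetWeights receives a vector whose gross exposure (sum of |weights|) is exactly 1; it is only roughly dollar-neutral to the extent the scores are centered around zero. A toy example of that scaling outside the algorithm (the tickers and numbers are made up):

import pandas as pd

# Hypothetical combined-factor scores for a handful of names.
scores = pd.Series({'AAA': 1.2, 'BBB': 0.4, 'CCC': -0.1, 'DDD': -1.5})

# Same scaling as in rebalance(): divide by the sum of absolute values.
weights = scores / scores.abs().sum()

print(weights)
print('gross exposure:', weights.abs().sum())  # always 1.0
print('net exposure:  ', weights.sum())        # 0 only if the scores sum to zero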

The Pyfolio tearsheet with in-sample and out-of-sample analysis.


And the new (alpha-decay) tearsheet.


Hi,
You mention the t-stat (IC) as something you look at in Alphalens. Could you please elaborate on what this information adds?

Thanks!

Hi Idan,

I use it to determine whether the factor is statistically significant, both in training and when cross-validating. I find it easier to work with than checking whether p-values fall below a certain threshold when deciding to reject the null hypothesis (and accept the alternative), especially for really robust factors (and factor combinations). This page explains it pretty well, in my opinion.
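For anyone else reading along: as far as I understand it, the t-stat(IC) in the Alphalens information table is essentially a one-sample t-test of the daily IC series against a mean of zero, so |t| above roughly 2 corresponds to significance at about the 5% level. A minimal sketch (the function name is mine, not Alphalens'):

import numpy as np
from scipy import stats

def ic_t_stat(daily_ic):
    """One-sample t-test of a daily IC series against a mean of zero."""
    ic = daily_ic.dropna()
    t_stat, p_value = stats.ttest_1samp(ic, 0.0)
    # Same statistic by hand: t = mean(IC) / (std(IC) / sqrt(N))
    t_by_hand = ic.mean() / (ic.std(ddof=1) / np.sqrt(len(ic)))
    assert np.isclose(t_stat, t_by_hand)
    return t_stat, p_value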

Hi Joakim,

Thanks for sharing your process. I started two weeks ago, and your notebook is really helpful for getting a quick start!

When going through your process and testing two simple factors (3-month return and 1-year return), I ran into some problems with how to iterate through the cells. In the attached notebook I build the Alphalens forward returns for the simple 3-month and 1-year share price return factors. When I then try to tweak the factors by taking the risk-adjusted share price return for the two periods, I get quite a few infinite values. Am I making a mistake when iterating through the cells after re-running the changed factors?
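To show what I mean by infinite values, here is a stripped-down example of where I suspect they come from: dividing a return by a volatility that happens to be zero over the window (e.g. a stale or forward-filled price series). The tickers and numbers are made up, and this may not be exactly what's happening in my notebook:

import numpy as np
import pandas as pd

# Toy daily returns. 'STALE' has identical returns over the window, so its
# standard deviation is zero and the risk-adjusted return becomes +inf.
returns = pd.DataFrame({
    'AAA':   [0.010, -0.020,  0.015, 0.005],
    'BBB':   [0.030,  0.010, -0.020, 0.000],
    'STALE': [0.010,  0.010,  0.010, 0.010],
})

total_return = (1 + returns).prod() - 1
volatility = returns.std()

risk_adjusted = total_return / volatility
print(risk_adjusted)  # STALE -> inf

# Replacing infinities with NaN lets those names be dropped instead of
# distorting the z-scores downstream.
cleaned = risk_adjusted.replace([np.inf, -np.inf], np.nan)
print(cleaned)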

Thanks for your help!
