NO Price Data At All !

Hi Quantopians, especially Delaney,
[I come in peace and I hope you are friendly]

Although I’m still “new around here”, if you will allow me to be so bold, I would like to start my very first thread here on the Quantopian Forum.
It is in the form of a challenge to anyone who wants to play, and I hope it will be fun and educational for everyone. You can do your own scoring for this one. There is no prize, at least certainly not from me anyway. In this particular challenge you can even cheat if you want to, although you will only be cheating yourself, because the aim here is to help everyone win something very special in terms of your own knowledge.

I created this for two reasons. Firstly because Delaney at Quantopian has been strongly urging us to look at “alternative data” and use inputs other than just price. Secondly, I know there are some people who don’t believe in “fundamentals”. They put forward various reasons about why they think “fundamentals are BS”, but I think they are wrong, and I would like to demonstrate that to you now.

Please take a look at the results below, from an algo that I wrote. Those results look rather unspectacular, don’t they? Yes, I think they are rather unspectacular too ….. except for one small thing……. The algo used NO PRICE DATA AT ALL!!

The only inputs are from the Morningstar Fundamentals data freely available to all of us, and excluding any ratios that involve the stock prices in any way, for example PE ratio, Price to Book ratio, Earnings yield, etc.

To anyone who thinks that Buffett & Munger’s consistent performance over decades is just a “statistical anomaly” (i.e. lucky ...yeah, sure, just like the idea that, given enough monkeys with typewriters, one of the monkeys will surely write the entire works of Shakespeare), I would say OK, continue to believe whatever you want but, IMHO, fundamentals really DO work and I believe Buffett & Munger are excellent proof of that. So is the algo output shown here….. Unless of course the only reason that it works is because the entire period from September 2009 to September 2017 is just a big bull market, and everyone knows that absolutely ANY fool whatsoever can make tons of money very easily in a bull market, right? ;-))

So here is the challenge for you:
Design an algo to beat the results shown, over the 8 year period from 1st September 2009 to 1st September 2017 (as an equal basis of comparison for everyone), using ONLY Morningstar Fundamentals data, EXCLUDING the price-related ratios. All other constraints are exactly the same as per the real Quantopian Open Contest, including "competition transaction costs" etc, and especially leverage <= 1.

You can score yourself however you want to really, but my personal “scoring system” for this little exercise is as follows. (Please note: this is not intended to have any particular relationship to the way in which Quantopian might calculate the Quantopian Open Contest scores and, as far as I know, it doesn’t, but it is still useful, at least to me).

Uncle Tony’s Score = 100 * Sharpe * (Returns% / 8 years) * (1 + 10*(alpha - abs(beta))) / (1 + Drawdown%)

On that basis, the example shown would have
Uncle Tony’s Score = 100 * 0.93 * (24.35 / 8) * (1 + 10*(0.03 - abs(-0.04))) / (1 + abs(5.70)) = 54.2
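
If it helps, here is the same formula as a small Python helper (a minimal sketch; the function and argument names are my own, not part of the algo), so you can score your own backtests from the summary statistics:

def uncle_tony_score(sharpe, total_return_pct, alpha, beta, max_drawdown_pct, years=8.0):
    # Returns% and Drawdown% are in percent; alpha and beta as reported by the backtester.
    return (100.0 * sharpe
            * (total_return_pct / years)
            * (1.0 + 10.0 * (alpha - abs(beta)))
            / (1.0 + abs(max_drawdown_pct)))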

Can you improve on that? If so, please share with us how you did it.

Remember, NO price data or ratios that involve stock price data in any way. Do your own scoring. This is designed as a learning experience. Have fun!!!

After playing a few times, I hope you will be asking yourself why you or anyone else would ever even consider throwing away a whole lot of perfectly good alpha by NOT using available fundamentals data.

Delaney, just imagine what we could do if we actually added PRICE data as well!! ;-))

Cheers, best wishes, Tony M.

#=================================================================
#    TonyM_NO_price_data
#    --------------------
#	Created 30 Oct 2017, Updated 6 Nov 2017

#=================================================================

from quantopian.algorithm import attach_pipeline, pipeline_output, order_optimal_portfolio 
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.filters import Q1500US
from quantopian.pipeline.factors import CustomFactor, SimpleMovingAverage, AverageDollarVolume, Returns, RollingLinearRegressionOfReturns

import numpy as np
import pandas as pd

import quantopian.optimize as opt

# FUNDAMENTALS DATA
# Improved version of Morningstar fundamental data in Pipeline. The new implementation
# (Aug 2017) is faster and corrects many data issues that existed in the old system.
# Fundamental queries in Pipeline are 2-20x faster. The biggest improvements will be
# noticed in fields that update less often (monthly, quarterly, etc.) and in queries
# that use many different fields.

new_way = True
#New way: Fundamentals.my_field
if new_way:
    #from quantopian.pipeline.data import Fundamentals
    from quantopian.pipeline.data import Fundamentals as income_statement
    from quantopian.pipeline.data import Fundamentals as balance_sheet
    from quantopian.pipeline.data import Fundamentals as operation_ratios  
    from quantopian.pipeline.data import Fundamentals as valuation_ratios  
    from quantopian.pipeline.classifiers.fundamentals import Sector

    
sector = Sector()
#operating_income = Fundamentals.operating_income.latest


#=================================================================

# Define Constraint Parameter values
#-----------------------------------
# General constraints, as per rules of the Quantcon Singapore 2017 Hackathon & Quantopian Open, whichever is the more stringent.

# Risk Exposures
# --------------
MAX_GROSS_EXPOSURE = 0.90   #NOMINAL leverage = 1.00, but must limit to < 1.10
MAX_BETA_EXPOSURE = 0.05
MAX_SECTOR_EXPOSURE = 0.05
#Dollar Neutral .05
#Position Concentration .10

# Set the Number of positions used
# --------------------------------
NUM_LONG_POSITIONS = 300
NUM_SHORT_POSITIONS = 300

# Maximum position size held for any given stock
# ----------------------------------------------
# Note: the optimizer needs some leeway to operate. If the maximum is too small, the optimizer may be overly-constrained.
MAX_SHORT_POSITION_SIZE = 2*1.0/(NUM_LONG_POSITIONS + NUM_SHORT_POSITIONS)
MAX_LONG_POSITION_SIZE = 2*1.0/(NUM_LONG_POSITIONS + NUM_SHORT_POSITIONS)

#=================================================================

#Fundamentals:
#--------------
class Piotroski9(CustomFactor):
    #Profitability:
    # ROA = NetIncome/Total Assets > 0, i.e. NetIncome> 0. score +1
    # OpCF> 0, score +1.
    # ROA > ROAprevYr. score +1
    # CFOps > Net Inc. score +1
    # Leverage & Liquidity:
    # LTdebtRatio < prevYr, score +1
    # CR > prevYr, score +1
    # No new shares issued, score +1    
    # Op Eff:
    # Gross margin > prevYr
    # AssetTurn > prevYr, score +1
    inputs = [
        operation_ratios.roa,
        operation_ratios.cash_flow_from_continuing_operating_activities,
        income_statement.net_income_income_statement,
        operation_ratios.long_term_debt_equity_ratio,
        operation_ratios.current_ratio,
        valuation_ratios.shares_outstanding,
        operation_ratios.gross_margin,
        operation_ratios.assets_turnover,
    ]
    window_length = 260

    def compute(self, today, assets, out, roa,
                cash_flow_from_continuing_operating_activities,
                net_income_income_statement, long_term_debt_equity_ratio,
                current_ratio, shares_outstanding, gross_margin, assets_turnover):
        out[:] = (
            np.sign(roa[-1])
            + np.sign(cash_flow_from_continuing_operating_activities[-1])
            + np.sign(roa[-1] - roa[-260])
            + np.sign(cash_flow_from_continuing_operating_activities[-1]
                      - net_income_income_statement[-1])
            + np.sign(long_term_debt_equity_ratio[-260] - long_term_debt_equity_ratio[-1])
            + np.sign(shares_outstanding[-260] - shares_outstanding[-1])
            + np.sign(gross_margin[-1] - gross_margin[-260])
            + np.sign(assets_turnover[-1] - assets_turnover[-260])
        )
# Yes, sure I know this is not the real Piotroski score!


class AltmanZ(CustomFactor):  
    # alt_A = WC / Total Assets 
    # alt_B = Retained earnings / Total Assets 
    # alt_C = EBIT  / Total Assets 
    # alt_D = MktVal of Equity / Total Liabilities
    # alt_E = Sales / Total Assets
    # AltmanZ = 1.2*A + 1.4*B +3.3*C +0.6*D +1.0*E
    inputs = [
        income_statement.total_assets,
        income_statement.working_capital,
        income_statement.retained_earnings,
        income_statement.ebit,
        valuation_ratios.market_cap,
        income_statement.total_liabilities,
        income_statement.total_revenue,
    ]
    window_length = 252

    def compute(self, today, assets, out, total_assets, working_capital,
                retained_earnings, ebit, market_cap, total_liabilities, total_revenue):
        out[:] = (
            1.2 * (working_capital[-1] / total_assets[-1])
            + 1.4 * (retained_earnings[-1] / total_assets[-1])
            + 3.3 * (ebit[-1] / total_assets[-1])
            + 0.6 * (market_cap[-1] / total_liabilities[-1])
            + 1.0 * (total_revenue[-1] / total_assets[-1])
        )
        
#Here's some info about the more modern successor to the Altman Z-score:
#https://alphaarchitect.com/2011/07/23/stop-using-altman-z-score/
#=================================================================

def make_pipeline():
    """
    Create and return our pipeline (dynamic stock selector). The pipeline is used to
    rank stocks based on different factors, including built-in factors or custom
    factors. Documentation on pipeline is at:
    https://www.quantopian.com/help#pipeline-title

    Breaking this logic out into its own function makes it easier to test and modify
    in isolation. In particular, this function can be copy/pasted into research and
    run by itself.
    """

    altman_z = AltmanZ()
    piotroski = Piotroski9()

    # Classify securities by sector to enforce sector neutrality later
    sector = Sector()
    
    # Define universe of securities
    # -----------------------------
   
    universe = Q1500US()
    
    # Combined Rank
    # -------------
    # Construct a Factor representing the rank of each asset by our momentum, quality,
    # value, and any other metrics, and aggregate them together here using simple
    # addition. By applying a mask to the rank computations, we remove any stocks that
    # failed to meet our initial criteria BEFORE computing ranks. This means that the
    # stock with rank 10.0 is the 10th-lowest stock that was included in the Q1500US.
    
    combined_rank = (
        0.90*piotroski.rank(mask=universe).zscore() +
        0.10*altman_z.rank(mask=universe).zscore()
    )

    #Build Filters representing the top & bottom stocks by our combined ranking system. Use these as our tradeable universe each day.
    longs = combined_rank.top(NUM_LONG_POSITIONS)
    shorts = combined_rank.bottom(NUM_SHORT_POSITIONS)

    # Final output of pipeline should only include the top/bottom subset of stocks by our criteria.
    long_short_screen = (longs | shorts)

    # Define any risk factors that we will want to neutralize. We are chiefly
    # interested in market beta as a risk factor, defined here using Bloomberg's
    # beta calculation. Ref:
    # https://www.lib.uwo.ca/business/betasbydatabasebloombergdefinitionofbeta.html
    beta = 0.66*RollingLinearRegressionOfReturns(
                    target=sid(8554),
                    returns_length=5,
                    regression_length=260,
                    mask=long_short_screen
                    ).beta + 0.33*1.0
    
    # Create pipeline
    #----------------
    pipe = Pipeline(columns = {
        'longs':longs,
        'shorts':shorts,
        'combined_rank':combined_rank,
        'piotroski':piotroski,
        'altman_z':altman_z,
        'sector':sector,
        'market_beta':beta
    },
    screen = long_short_screen)
    return pipe

#=================================================================

# Initialization
# --------------
def initialize(context):
#Called once at the start of the algorithm.
    
    # Nominal Leverage = Maximum Gross Exposure = 1.00, but re-set this to 0.90 to avoid risk of exceeding hard leverage limit of 1.10
    context.leverage_buffer = 0.90
    
    # Set slippage & commission as per Quantopian Open rules.
    # For competition use, assume $0.001/share
    # Can take up to 2.5% of 1 minute's trade volume.
    set_commission(commission.PerShare(cost=0.001, min_trade_cost=0))
    set_slippage(slippage.VolumeShareSlippage(volume_limit=0.025, price_impact=0.1))
    
    context.spy = sid(8554)
    
    attach_pipeline(make_pipeline(), 'long_short_equity_template')

    # Schedule my rebalance function.
    #-------------------------------
    #Rebalance every day, 1 hour after market open.
    #schedule_function(my_rebalance, date_rules.every_day(), time_rules.market_open(hours=1))
    #Changed from monthly to weekly rebal.
    schedule_function(func=rebalance,
                      date_rule=date_rules.week_start(days_offset=0),
                      time_rule=time_rules.market_open(hours=0,minutes=30),
                      half_days=True)
    

    # Record tracking variables at the end of each day.
    #-------------------------------------------------
    #schedule_function(my_record_vars, date_rules.every_day(), time_rules.market_close())
    schedule_function(func=recording_statements,
                      date_rule=date_rules.every_day(),
                      time_rule=time_rules.market_close(),
                      half_days=True)    
    
#=================================================================

# Control & Monitor Leverage
#---------------------------
def handle_data(context, data):
    # Called every 1 minute bar for securities specified
    pass

# Record & output my portfolio variables at End of Day only
#----------------------------------------------------------
 
def recording_statements(context, data):    

    # Track the algorithm's leverage, to put on custom graph.
    leverage = context.account.leverage
    record(leverage=leverage)
    
#=================================================================        

def before_trading_start(context, data):
    """
    Called every day before market open. This is where we get the securities that
    made it through the pipeline.
    """
    # pipeline_output returns a dataframe whose index is the SIDs of all securities
    # that passed the screen, and whose columns are the factors added to the
    # pipeline object above.
    context.pipeline_data = pipeline_output('long_short_equity_template')


def rebalance(context, data):

    pipeline_data = context.pipeline_data

    # Extract from pipeline any specific risk factors to neutralize that have already been calculated 
    risk_factor_exposures = pd.DataFrame({
            'market_beta':pipeline_data.market_beta.fillna(1.0)
        })
    # Fill in any missing factor values with a market beta of 1.0.
    # Do this rather than simply dropping the values because we want to err on the
    # side of caution: don't exclude a security just because it is missing a
    # calculated market beta value, so assume any missing values have full exposure
    # to the market.
    # Define objective for the Optimize API. 
    # Here we use MaximizeAlpha because we believe our combined factor ranking to be proportional to expected returns. This routine will optimize the expected return of the algorithm, going long on the highest expected return and short on the lowest.
    
    objective = opt.MaximizeAlpha(pipeline_data.combined_rank)
    
    # Define the list of constraints
    constraints = []
    
    # Constrain maximum gross leverage
    constraints.append(opt.MaxGrossExposure(MAX_GROSS_EXPOSURE))
    
    # Require algorithm to remain dollar-neutral
    constraints.append(opt.DollarNeutral())    # default tolerance = 0.0001
    
    # Add sector neutrality constraint using the sector classifier included in the pipeline
    constraints.append(
        opt.NetGroupExposure.with_equal_bounds(
            labels=pipeline_data.sector,
            min=-MAX_SECTOR_EXPOSURE,
            max=MAX_SECTOR_EXPOSURE,
        ))
    
    # Take the risk factors extracted above and list desired max/min exposures to them. 
    neutralize_risk_factors = opt.FactorExposure(
        loadings=risk_factor_exposures,
        min_exposures={'market_beta':-MAX_BETA_EXPOSURE},
        max_exposures={'market_beta':MAX_BETA_EXPOSURE}
    )
    constraints.append(neutralize_risk_factors)
    
    # With this constraint, we enforce that no position can make up greater than MAX_SHORT_POSITION_SIZE on the short side and no greater than MAX_LONG_POSITION_SIZE on the long side. This ensures we don't overly concentrate the portfolio in one security or a small subset of securities.
    constraints.append(
        opt.PositionConcentration.with_equal_bounds(
            min=-MAX_SHORT_POSITION_SIZE,
            max=MAX_LONG_POSITION_SIZE
        ))

    # Put together all the pieces defined above by passing them into the order_optimal_portfolio function. This handles all ordering logic, assigning appropriate weights to the securities in our universe to maximize alpha with respect to the given constraints.
    order_optimal_portfolio(
        objective=objective,
        constraints=constraints,
    )
    
#=================================================================

This is a great example of starting to move towards more real, fundamental, economic hypotheses by using fundamental data about a company to make your predictions. In general, pricing data can certainly be helpful, but in many cases it should really be thought of as the outcome variable that we are trying to predict.

I'm very interested to see what other directions people can take this algorithm. Obviously it's also totally possible to overfit models that use fundamental data, but as long as you're coming to it with a valid line of reasoning for why the model should work, that's a big plus.

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

I wanted to encourage people to also look at these factors in a research setting, as you get a ton more visibility plus faster turnaround time on analysis. I cloned the factor analysis lecture as a baseline and swapped in Tony's factors. Looks like the first produces a lot of discrete values, which currently doesn't work well with our alphalens library, but the second one works just fine and gives a whole readout. You can see that at least over a shorter predictive window of 1, 5, or 10 days, the factor is pretty inconsistent. This makes sense as fundamental factors look at real properties of a company that may take months to result in actual price actions.

Some next steps for curious folks:

  1. Which factor is actually contributing to performance?
  2. Run it on a longer time horizon, maybe look at 20, 40, 60 day windows (see the sketch below).
  3. Are the factors covarying a lot?
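
Here is a minimal sketch of step 2 in a research notebook, assuming you have already loaded the factor into a (date, asset) MultiIndexed series (e.g. the altman_z column from run_pipeline) and matching daily prices from get_pricing; the variable names factor_values and prices are placeholders, not part of the posted notebook:

import alphalens

factor_data = alphalens.utils.get_clean_factor_and_forward_returns(
    factor=factor_values,   # (date, asset) MultiIndex Series of factor values
    prices=prices,          # wide DataFrame of daily closes for the same assets
    quantiles=5,
    periods=(20, 40, 60),   # roughly 1, 2 and 3 month horizons
)
alphalens.tears.create_full_tear_sheet(factor_data)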

the first produces a lot of discrete values which currently doesn't
work well with our alphalens library

It might be tedious to write, but Alphalens supports discrete values: just use the 'bins' option with custom intervals so that the discrete values fall in those bins. Here's an example of Alphalens run on the sentdex factor, which has only discrete values.
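
For instance, continuing the placeholder setup from the sketch above, the discrete Piotroski-style score could be passed with explicit bin edges instead of quantiles (the edges below are purely illustrative):

import alphalens

factor_data = alphalens.utils.get_clean_factor_and_forward_returns(
    factor=factor_values,                    # the discrete, integer-valued factor
    prices=prices,
    quantiles=None,                          # quantiles must be None when bins are given
    bins=[-9.5, -5.5, -1.5, 1.5, 5.5, 9.5],  # illustrative edges for a sign-based score
    periods=(20, 40, 60),
)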

Delaney, hi! & thanks.
I hope a lot of other people will pick up on this too.

The inputs I used were nothing special really, and there was definitely no attempt whatsoever at over-fitting or even any sort of fitting at all!
The basis of the algo is very simple, no secrets, and you can find it all in lots of places in the literature, in text books, and even on Wikipedia or Investopedia.

The "Altman Z score" part was published way back in 1968, so its not exactly new. It was designed by a Professor of Finance at New York University to predict the probability of a company going bankrupt within the next 2 years. If a company is financially sound then its Altman Z score should be high. If a company is financially shaky then its score will be low. Companies with negative Altman Z scores should generally make good candidates for shorting ....at least until they get de-listed and disappear! The Altman Z score was never meant as a short term indicator. I use it in my own personal, very small-time, long-only trading account to screen out stocks that have too much risk of bankruptcy for my liking.

The other component of this algo is the well-known Piotroski F-score, which you can also find just about anywhere (text books, Wikipedia etc). It was designed by Professor Joseph Piotroski (you can look him up on LinkedIn if you want) as a measure of a company's general financial strength, rather than specifically its risk of bankruptcy. The only minor problem with Piotroski's 9-point score is that each item gets scored +1 or 0 (actually I didn't even do that, I just took the sign +/-1), so it is a discrete, integer scale.

I made no attempt to optimize or even modify anything at all. I wanted to use both Piotroski-9point and Altman-Z because I think they complement each other nicely. I decided to add them together as factors, with some weight given to each. Piotroski is widely applicable, so I gave it a significantly higher weight than AltmanZ which is really only intended to estimate risk of bankruptcy, rather than likely financial performance. And that's all. Very easy really.

The weaknesses of the algo as presented here are the following:

a) It is obviously not a practical stand-alone system as it is, but that is quite deliberate. I wanted to completely separate out fundamentals from absolutely ANYTHING else at all that you might want to put into a "real" algo.

b) Piotroski 9-point is a discrete, integer scale. Obviously you need to convert it to a continuous scale which is more suited to our application, although there are some interesting & different ways you might do that and plenty of scope for experimentation (see the sketch after this list). I didn't bother with that here, because I just wanted to show that, without any modification at all, there is lots of stuff out there that is very accessible and can so easily be used.

c) The output from the algo is quite "lumpy" in the time-domain sense. That's because the fundamental data only gets updated at the reporting periods of each company. You have to live with that. Don't try to interpolate in any way, because that would introduce a look-ahead bias.

d) I haven't tested it, but I expect that the algo will probably work better if you allow it to use a larger number of companies than I did. There is probably some optimum number that gives the best balance between reward and the transaction costs of too many tiny trades. I don't know. Try it!
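
Regarding point (b), one crude sketch of such a conversion (my own illustration, not part of the posted algo) is to add a small continuous tie-breaker so that stocks with equal integer scores still rank apart:

from quantopian.pipeline.data import Fundamentals
from quantopian.pipeline.filters import Q1500US

universe = Q1500US()
piotroski = Piotroski9()   # the discrete, sign-based factor defined in the algo above

# Blend the integer score with a small z-scored ROA term to break ties.
score_smoothed = (piotroski.zscore(mask=universe)
                  + 0.01 * Fundamentals.roa.latest.zscore(mask=universe))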

There is lots more fundamental data in Morningstar, just waiting to be experimented with in real algos, as compared to this little demo one.
I very deliberately didn't do anything more to improve it, because I wanted to put a challenge out there that I knew could be beaten!

Please, I invite anyone who is interested to try to do better than I have in this example. It shouldn't be too hard really, and basically it's "free alpha", out there and just waiting to be harvested!

Best wishes, Tony

Hi Luca,
I think being able to use discrete values is very useful in general, even if not actually essential in this algo. Thanks for your post.

To everyone,
If you have cloned the algo as I wrote it, you will see in line 96 my comment: # Yes, sure I know this is not the real Piotroski score!
I wrote this with reference to the rather long line 95 above it. In the real Piotroski score, he gives +1 for each item that passes the required criterion, and zero otherwise. What I have done is to just use sign(...) which gives +1 or -1 for each item rather than 1 or 0, so the number that comes out at the end will not be identical to Piotroski's actual numbers, but the ranking of the numbers is identical. If you want to get the actual Piotroski score numbers, then you need to insert something of the form MAX(0, my calculation value) in front of each of the 9 individual terms.

That difference from Piotroski's actual method was deliberate on my part, and it makes no difference to the functioning of the algo because the ranking for the individual stocks will be the same.

There are however some small typo / copy & paste errors in the out[:] = ........ part of the Piotroski9 custom factor on line 95 of the code as I wrote it.
In Piotroski's 9 point scheme, he subdivides the 9 terms into three parts, namely: Profitability (4 items), Leverage & Liquidity (3 items), and Operating Efficiency (2 items), and each of these items is correctly defined in the COMMENTS on lines 80-91. If you want to get correct Piotroski results, you will need to make some small corrections to code line 95 to ensure that it is actually consistent with comment lines 80-91.
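
For anyone who wants the genuine 0/1 scoring, including the current_ratio (liquidity) term described in the comments, a corrected custom factor could look something like the sketch below. This is my own rewrite for illustration, not the posted code; it uses the newer Fundamentals fields directly and keeps the same one-year (260 bar) comparison window:

import numpy as np
from quantopian.pipeline.factors import CustomFactor
from quantopian.pipeline.data import Fundamentals

class Piotroski9TrueScore(CustomFactor):
    # Sketch of a 0/1-scored, 9-term Piotroski F-score.
    inputs = [
        Fundamentals.roa,
        Fundamentals.cash_flow_from_continuing_operating_activities,
        Fundamentals.net_income_income_statement,
        Fundamentals.long_term_debt_equity_ratio,
        Fundamentals.current_ratio,
        Fundamentals.shares_outstanding,
        Fundamentals.gross_margin,
        Fundamentals.assets_turnover,
    ]
    window_length = 260

    def compute(self, today, assets, out, roa, cfo, net_income, lt_debt_eq,
                current_ratio, shares_out, gross_margin, assets_turnover):
        out[:] = (
            (roa[-1] > 0)                                    # ROA positive
            + (cfo[-1] > 0)                                  # operating cash flow positive
            + (roa[-1] > roa[-260])                          # ROA improving
            + (cfo[-1] > net_income[-1])                     # CFO exceeds net income
            + (lt_debt_eq[-1] < lt_debt_eq[-260])            # leverage falling
            + (current_ratio[-1] > current_ratio[-260])      # current ratio improving
            + (shares_out[-1] <= shares_out[-260])           # no new shares issued
            + (gross_margin[-1] > gross_margin[-260])        # gross margin improving
            + (assets_turnover[-1] > assets_turnover[-260])  # asset turnover improving
        )

Each True counts as 1 and each False as 0, so the output runs from 0 to 9 like the original F-score, while the ranking behaviour stays broadly similar to the sign-based version above.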

Some suggestions:

a) Pull the Piotroski & Altman formulas apart and examine each term individually (9 terms in Piotroski and 5 in Altman) as possibly useful factors (see the sketch after this list).

b) Some of these individual terms are well-known ratios from the Balance Sheet, Income Statement/P&L or Cash Flow Statement (e.g. ROA = Net Income / Total Assets), but they are already reported exactly as required in Morningstar (e.g. roa).

c) All the terms in the Altman formula are ratios of 2 different financial statement items and, in all cases except Altman item D, the denominator is Total Assets, so as to normalize each of the terms and make them dimensionless ratios before adding them together as Altman did.

d) For anyone who actually wants to treat this as a competitive exercise and really wants to use NO price data at all, you will have to leave out Altman's term D = Market Value of Equity / Total Liabilities, and also avoid using anything else from Morningstar that implicitly contains price, for example any type of yield, PE, PEG and so on, and also Market Cap and Enterprise Value. Obviously those are all useful things to look at too, but they do contain price data, just by the way they are defined.
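
As a tiny illustration of suggestion (a), any single term can be built directly from the .latest fundamentals columns and examined on its own in a pipeline or with Alphalens. This is my own example, not part of the posted algo:

from quantopian.pipeline.data import Fundamentals
from quantopian.pipeline.filters import Q1500US

# One Altman term in isolation: retained earnings scaled by total assets (term B).
universe = Q1500US()
retained_to_assets = (Fundamentals.retained_earnings.latest /
                      Fundamentals.total_assets.latest)
retained_to_assets_rank = retained_to_assets.rank(mask=universe).zscore()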

I expect that if enough people at Quantopian play around with the Morningstar data, both on a stand-alone item-by-item basis, and also using some sensible combinations of the different items, we will probably come up with some good innovative alphas. Although the academic literature is full of studies by people looking over and over at the same old well-known factors (like Price-to-Book Value, etc), there is a lot of scope for innovative thinking and new ideas, as long as they are based on an understanding of the meanings & relationships between financial statement items. We want to make sure we don't come up with apparent but nonsensical correlations. I'm not sure if Delaney already told the story, but there was one infamous study in which the researchers found that, at least over their original test period, out of a very large number of different possible factors, the factor with the highest correlation to S&P returns was the price of butter in Bangladesh! [hmmmm, how interesting! .... now why would that be ?] ;-))

One thing to keep in mind is that for allocations, Quantopian is looking for strategies that trade frequently enough to develop good statistical confidence. Purely fundamental-based strategies tend to have long predictive horizons on the factors. As such, any trading more frequently than every 1-3 months is just paying unnecessary costs. Their infrequent nature makes them hard to evaluate as you'd have to wait years to develop enough sample points. Instead good approaches involve using fundamental data alongside other sources. You can sort and bucket by fundamental values, or use it as part of a larger overall model. One example might be finding that only certain types of companies were affected by sentiment, and using fundamental data to select for those. Then using the actual sentiment and pricing data to decide if they're currently under or overpriced.
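
For this particular algo, that would just mean swapping the weekly schedule in initialize() for a monthly one, e.g. (a sketch of the one-line change, keeping the same 30-minutes-after-open timing):

# Rebalance once a month instead of weekly, 30 minutes after the open.
schedule_function(func=rebalance,
                  date_rule=date_rules.month_start(days_offset=0),
                  time_rule=time_rules.market_open(hours=0, minutes=30),
                  half_days=True)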

Hi Delaney -

I haven't had much time lately to dink around on Quantopian, but in the back of my mind, I'm wondering how to approach the kind of multi-dimensional problem you describe:

good approaches involve using fundamental data alongside other sources

We have data coming out the wazoo:

https://www.quantopian.com/help#overview-datasources
https://www.quantopian.com/help/fundamentals
https://www.quantopian.com/data
Q500/Q1500 universes
Fetcher
Time of day/week/month
Etc.

The number of dimensions is huge. It seems like a problem for a computer, versus an individual formulating hypotheses and testing them one-by-one (using the research platform and Alphalens, for example). The problem needs to be reduced down to salient dimensions to be tractable. For example, one could consider attempting to see if fundamentals could be used to improve one or more of the 101 Alphas (https://www.quantopian.com/posts/alpha-compiler), across the Q1500US. Any idea how to do this on the research platform? We have the fundamentals, the alphas, and the universe, so what next?

Hi Grant, you are of course correct that, with lots of data, the number of dimensions is large and so part of the problem is simply dimensionality reduction. There are however two more-or-less diametrically opposed schools of thought about how to attack the problem. These are generally called the "Data First" approach and the "Ideas First" approach. Each has some advantages and some disadvantages. The former approach, Data First, is basically the Data-Mining type approach, which it appears that you are implicitly advocating as you write: "The number of dimensions is huge. It seems like a problem for a computer ......" The advantage of this approach is that it can sometimes uncover subtle relationships that are hard to spot manually. The disadvantage is that it can also uncover apparent relationships that are actually not there at all (for example the famous "S&P500 vs Butter Production in Bangladesh" phenomenon), or alternatively relationships that are real but which have no underlying or enduring basis and so get arbitraged away & disappear very quickly as people find them. The other approach, "Ideas First", starts with examining ideas & concepts that actually make sense from some deeper perspective and which are therefore far more likely to endure and produce robust trading systems / algos.

References, for your amusement:

"Butter in Bangladesh Predicts the Stock Market" https://www.fool.com/investing/general/2007/09/20/butter-in-bangladesh-predicts-the-stock-market.aspx

"The Bangladeshi butter-production theory of asset prices" http://business.time.com/2009/04/16/the-bangladeshi-butter-production-theory-of-asset-prices/

More BS for "Butter in Bangladesh" Fans
https://www.forbes.com/sites/davidleinweber/2012/12/31/more-bs-for-butter-in-bangladesh-fans/#6944115f451f

Nerds on Wall Street / Stupid Data Miner Tricks
http://nerdsonwallstreet.typepad.com/my_weblog/2007/04/stupid_data_min.html

Now, coming back to reality, in particular all of the fundamental data available in Morningstar are from one of three standard financial statements: the Balance Sheet, the Income Statement (P&L) and the Cash Flow Statement, as well as some other miscellaneous bits of data like company address, etc. All the data from the 3 financial statements fit together in a coherent way (or at least they should; if they don't, then maybe that means the company is "cooking its books", and that can be valuable info to uncover too). The point is that, at least as far as the "fundamentals" data are concerned, it makes sense to think carefully about what each of the numbers actually MEANS and how they are derived, and how they relate to a company's operations. They aren't just "signals" in some abstract sense. So I would suggest that a good starting point for "dimensionality reduction" with regard to Morningstar-type fundamentals data is to develop a good understanding of the actual meanings of the numbers and of the fundamentals of corporate accounting and financial statements, rather than to just number crunch & see what comes out.

@Tony Morland when I attempt to run your sample algorithm, I receive an error on line 108 claiming that Fundamentals has no attribute "net_income". Yet, for some reason, the algorithm does execute occasionally after multiple attempts. Do you know why this may be happening or is there something wrong on Q's side?

Hi Mustafa .... yes, I see the same strange error msg as you do. It wasn't there a few days ago, so I think it must be a problem on the Q side. I will follow up with Ernesto at Q help/support. Hope it gets fixed soon. Sorry for any inconvenience. Best regards, Tony

@Tony Morland no it’s cool, just want to make sure it gets fixed on Q’s side. Thank you for the Altman z score as well, I was wondering how I could translate it into python. Your code really helped a good amount!

@Tony, Sorry about that error. This weekend, we shipped a change that disambiguates some field names which we learned are being used to represent multiple data points by our fundamental data provider. net_income is one such field. In this case, we get a net_income from two different reports: income_statement and cash_flow_statement. Sometimes, the data points differ depending on which report they come from. The two versions of net_income can be referenced with net_income_income_statement and net_income_cash_flow_statement, respectively.
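
In pipeline code, the two fields can be referenced along these lines:

from quantopian.pipeline.data import Fundamentals

# Net income as reported on each of the two statements (disambiguated field names).
ni_from_income_statement = Fundamentals.net_income_income_statement.latest
ni_from_cash_flow = Fundamentals.net_income_cash_flow_statement.latest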

I apologize for the confusion.


Thanks Jamie :-)

Mustafa & others: Algo is now revised & re-posted

Also, FYI, here's some info about the more modern successor to the Altman Z-score
https://alphaarchitect.com/2011/07/23/stop-using-altman-z-score/

@ Tony -

Regarding the "data first" versus "ideas first" approaches, it may be more synergistic. For example, my understanding is that for chess, the best approach is to pair powerful computers with expert players. I've yet to understand, at a basic level, what the 160,000 Quantopian mostly non-expert users will do (and my understanding is that Quantopian aspires to 1 M users). I'm one of the non-experts, and so some dimensionality reduction would help. Circling back to Delaney's suggestion:

One example might be finding that only certain types of companies were affected by sentiment, and using fundamental data to select for those. Then using the actual sentiment and pricing data to decide if they're currently under or overpriced.

So, where should one start? It seems the right algorithm could provide some clues, and then specific cause-effect hypotheses could be formulated and tested.

@Grant
Hi. You raise a number of interesting issues, ALL of which I think are worth picking up on, and some of which may just lead a long way..........

  • Chess:
    Firstly I love metaphors & analogies in general. They are great ways to look at things as "picture" or "story" or "... well, it's sort of like, but not quite, so what if we just tried .....", and thereby gain additional fresh insights into problems from unexpected angles. Personally, I think Chess is a GREAT metaphor for trading the markets, especially if you play against a chess computer, as I do. I almost never win, not only because I'm not a very good player, but also because every time I do (occasionally) win, I raise the level one more notch. The computer never gets tired, never misses anything, never makes "silly" mistakes in the ways that I do. I have learned a lot about trading while playing chess like this. I'm not sure I could actually verbalize some of those learnings, but that doesn't make them less real, and I know my trading improves the more I play.

  • Data First vs. Ideas First:
    There's plenty of room for both and yes, they can reinforce each other. My only concern with "data first" is the risk of loss of robustness. I know some people say: "as long as it works, that's enough", but I don't quite buy it. Maybe it's not essential to understand why something works, but I think it helps, especially when things start to go wrong. It's also one of the problems with Neural Networks, some kinds of ML, "black boxes" in general, and abstracting things too far from the context in which they belong.

  • Context:
    I think it's always important to consider context in trading, and I made some comments on that in a post about the need for long-term data. I think it also ties in with what Delaney has said about sentiment and trying to figure out when it works and when it doesn't (just like some TA "indicators"?).

  • Quantopian users:
    " .... I've yet to understand, at a basic level, what the 160,000 Quantopian mostly non-expert users will do "
    No, I have no idea what they will do either ...but we will see.
    Although you write "I'm one of the non-experts", actually I think that you are probably just being modest. I'm sure you didn't write all those 623 algos without some great programming skills, and you are right about how "some dimensionality reduction would help". The question is how to go about it. My suggestion is that at least some level of relevant background knowledge always helps. Anyone who knows nothing whatsoever about accounting or corporate financial statements, or what all those names in Morningstar mean, or how the underlying companies can "cheat" & "massage" some (but not all) of those numbers to try to make things look better than they really are, is at a disadvantage. I'm not necessarily suggesting reading lots & lots of boring accounting books, but at least some background knowledge will help a lot. I will post a short "reading list" for anyone interested.

On the topic of sentiment, I tried playing around with it a bit and so far it has been very disappointing. Please see my post entitled: "Alternative Data: The Good, The Bad and the Useless". Maybe you can help set me on the right track .... perhaps I'm just not on it because of some silly mistake on my part.

  • Cause & Effect Hypotheses:
    With regard to (Morningstar) Fundamental data, there's no shortage of such things. Would a "reading list" help?

Cheers, best wishes.

@Tony Morland I would greatly appreciate a reading list; the accounting course I’m in currently drives me mad.

@Grant, @Delaney, please see posts on "Alternative data: Good / Bad / Useless". Now I like where this is going .......
........ finger pointing at the moon (for @Karl ). :-)

@Karl, could you expand a little? I don't follow what you would like to do.

With the release of the new risk model, we can see what's producing these returns much better. I made a notebook that demonstrates that this algorithm has high exposure to momentum, size, value, and volatility. Not surprising for a fundamental value based strategy, but still interesting imho.


Many thanks @Delaney. I look forward to playing around with it and seeking further improvement .... still without using any price data ;-)

@Mustafa, & others
I will put together a good reading list for you, but my suggested starting point is:
a) "The Little Book of Valuation" by Aswath Damodaran (Wiley, 2011). Small book but big on content. Inexpensive. Very easy reading.
b) Any other books by Damodaran e.g. "Damodaran on Valuation", "The Dark Side of Valuation", etc.
c) His website -- lots of excellent free material there. http://pages.stern.nyu.edu/~adamodar/

@Karl,
Re Damodaran: You're welcome :-)

Re ERP: I have some other good info on this which I will share, but I just need to go and get it, so maybe tomorrow.

Re your dialog with @Luca: I'm not sure exactly where you are going with this, but I start to see how you might take it a little further: construct a "synthetic" or "shadow" portfolio of stocks .... not the ones you are actually holding, but the ones you might potentially be holding, see how they perform, and then on the next day take the best-of-the-best, both long & short (assuming you are working a balanced equity long-short strategy), from the synthetic/shadow portfolio and put them into, or use them to adjust, the real portfolio, which then becomes a sort of "best possible portfolio" with minimum (i.e. 1 day) lag. I don't have the Python skill to code it, but I think it's a neat idea.

@Karl, I like it ;-) Cheers!

@Tony thanks for the list; gonna get on it asap

@Grant, I appreciate your input to this discussion, thanks.
@Delaney, FYI.

I think part of the solution to the "...where to even start?" problem with applying Fundamentals data is to have some appropriate visualization tools.

Although Q is developing good tools for Risk evaluation, there is not as yet much apparent interest in tools for examining the INPUT data & inter-relationships of Fundamentals data in detail. I think this would be very worthwhile, and I have made what I think is a useful start. Please see the thread entitled: "Fundamentals - python ..... help .....".

The problem is that I am hampered by my very limited, beginner-level Python skills, so what I have done is clunky, clumsy and not very flexible or user-friendly. I hope that someone else can pick up on this. Would you or any of your python-savvy friends & colleagues be interested in taking this on further than I can? Cheers, TonyM.
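
As a very rough starting point (my own sketch, not the notebook from that thread), the research environment already lets you pull raw fundamentals through a small pipeline and inspect them with pandas:

from quantopian.research import run_pipeline
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import Fundamentals
from quantopian.pipeline.filters import Q1500US

# A few raw Morningstar inputs across the Q1500US on a single day.
pipe = Pipeline(
    columns={
        'roa': Fundamentals.roa.latest,
        'current_ratio': Fundamentals.current_ratio.latest,
        'gross_margin': Fundamentals.gross_margin.latest,
    },
    screen=Q1500US(),
)
result = run_pipeline(pipe, '2016-01-04', '2016-01-04')

result.describe()              # summary statistics for each raw input
result.corr()                  # how strongly the inputs co-vary across the universe
result['roa'].hist(bins=50)    # distribution of a single field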