Fundamental data integrity check including FactSet

Update: I'd suggest scrolling down to the May 4, 2019 backtest, which compares FactSet.
(I removed the earlier backtests.)

How often are various fundamentals actually updated?
The price-earnings ratio, for example, might seem to update each day, but if earnings are only updated quarterly, it would merely *appear* to change daily because close prices change every day.

Is everything actually essentially quarterly report data? For example, would fcf (free cash flow) yield simply be yesterday's close compared to the last quarterly reported fcf?
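If so, the construction would presumably look something like this (a hypothetical sketch of the presumed mechanics, not the vendor's documented formula):

def fcf_yield(yesterdays_close, last_reported_fcf_per_share):
    # The numerator only changes when a new quarterly report lands;
    # the daily close in the denominator makes the ratio *look* daily.
    return last_reported_fcf_per_share / yesterdays_close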

The attached code can be used to find how often fundamental values are actually updated. Below are examples from the logging output and from my spreadsheet.
When writing pipeline factors, be smart about this. Some open questions: Why a maximum of 240 changes rather than the ~252 trading days in a year? For fundamentals with a Mean of 4 (quarterly), what's going on with the ones that changed 8 times? What about NaNs? By the way, Min is not always 1; some are 2, some 4. I haven't run through all 1,000 or so fundamentals.

Average trading days in a year: 252
fcf_yield: Across 2366 stocks seen, the stock whose value changed the most changed 240 times in a year. Average was 187.
ebit: Maximum of 8, average of 4 (quarterly)

         Max  
Stocks   Days   Min    Mean   Median  Score  Some fundamentals  
1979     239     1     195     234     67     pe_ratio  
2366     240     1     187     213     77     fcf_yield  
2366     238     1     185     209     77     earning_yield  
1910     235     1     175     213     59     two_years_forward_pe_ratio  
2041     233     1     164     191     60     two_years_forward_earning_yield  
2247     237     1     144     176     57     total_yield  
2188     233     1      96      86     37     buy_back_yield  
2375     239     1     208     234     87     market_cap  
2164     239     1     208     235     79     enterprise_value  
2276     220     1       8       4      3     cfo_per_share  
2276     220     1       8       4      3     fcf_per_share  
2253     220     1       8       4      3     sales_per_share  
2264      23     1       5       5     20     share_class_level_shares_outstanding  
2278      11     1      10      11     87     size_score  
2223      10     1       4       4     37     normalized_basic_eps_earnings_reports  
2223      10     1       4       4     37     normalized_diluted_eps_earnings_reports  
2228       8     1       4       4     46     operating_income  
2030       8     1       4       4     42     ebit  
1968       8     1       4       4     41     roic  
1963       8     1       4       4     41     normalized_ebitda  
1961       8     1       4       4     41     ebitda  
1427       8     1       3       4     22     interest_coverage  
1389       8     1       3       4     21     interest_expense_non_operating  
2266       7     1       4       4     54     roa

The Score column was just a rough first cut, computed in the spreadsheet as:
 = INT(100*(E3/C3) * B3/2375)
B: Stocks
C: Max Days
E: Mean
2375: Max stocks
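To double-check the arithmetic, the same formula in Python (a quick sketch; the arguments map to the table columns above):

def score(stocks, max_days, mean, max_stocks=2375):
    # = INT(100*(E/C) * B/2375): the mean change count as a fraction of
    # the max observed, weighted by how much of the universe is covered.
    return int(100 * (float(mean) / max_days) * (float(stocks) / max_stocks))

print(score(2366, 240, 187))   # fcf_yield row -> 77
print(score(1979, 239, 195))   # pe_ratio row  -> 67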

Logging

1969-12-31 16:00 initialize:26 INFO fcf_yield   2017-03-06 to 2018-03-02  
2017-03-29 13:00 log_counts:60 INFO 2081 stocks counts: max 20 min 1 mean 17 median 18  
2017-04-28 13:00 log_counts:60 INFO 2123 stocks counts: max 39 min 1 mean 33 median 36  
2017-05-30 13:00 log_counts:60 INFO 2152 stocks counts: max 58 min 1 mean 49 median 53  
2017-06-28 13:00 log_counts:60 INFO 2181 stocks counts: max 79 min 1 mean 66 median 72  
2017-07-28 13:00 log_counts:60 INFO 2215 stocks counts: max 98 min 1 mean 81 median 89  
2017-08-28 13:00 log_counts:60 INFO 2230 stocks counts: max 118 min 1 mean 97 median 107  
2017-09-27 13:00 log_counts:60 INFO 2252 stocks counts: max 139 min 1 mean 113 median 125  
2017-10-26 13:00 log_counts:60 INFO 2275 stocks counts: max 160 min 1 mean 128 median 142  
2017-11-27 13:00 log_counts:60 INFO 2296 stocks counts: max 181 min 1 mean 144 median 161  
2017-12-27 13:00 log_counts:60 INFO 2317 stocks counts: max 199 min 1 mean 157 median 177  
2018-01-29 13:00 log_counts:60 INFO 2343 stocks counts: max 219 min 1 mean 171 median 194  
2018-02-28 13:00 log_counts:60 INFO 2366 stocks counts: max 240 min 1 mean 187 median 213

1969-12-31 16:00 initialize:21 INFO ebit   2017-03-06 to 2018-03-02  
2017-04-03 13:00 log_counts:54 INFO 305 stocks  counts: max 2  min 1  mean 1  median 1  
2017-05-03 13:00 log_counts:54 INFO 775 stocks  counts: max 4  min 1  mean 1  median 1  
2017-06-02 13:00 log_counts:54 INFO 1849 stocks  counts: max 4  min 1  mean 1  median 1  
2017-08-02 13:00 log_counts:54 INFO 1880 stocks  counts: max 4  min 1  mean 1  median 1  
2017-08-31 13:00 log_counts:54 INFO 1947 stocks  counts: max 6  min 1  mean 2  median 2  
2017-10-02 13:00 log_counts:54 INFO 1950 stocks  counts: max 6  min 1  mean 2  median 2  
2017-10-31 13:00 log_counts:54 INFO 1953 stocks  counts: max 6  min 1  mean 2  median 2  
2017-11-30 13:00 log_counts:54 INFO 2005 stocks  counts: max 8  min 1  mean 3  median 3  
2018-01-02 13:00 log_counts:54 INFO 2010 stocks  counts: max 8  min 1  mean 3  median 3  
2018-02-01 13:00 log_counts:54 INFO 2011 stocks  counts: max 8  min 1  mean 3  median 3  
2018-03-02 13:00 log_counts:54 INFO 2030 stocks  counts: max 8  min 1  mean 4  median 4  

A reply from Morningstar verifies this: everything is based on quarterly or yearly reports. My edits, capturing the essence of their email:

Daily
Fundamentals that are related to price, such as Price/Earnings, Price/Sales, Price/Book, Price/Cash, etc., show a daily change simply due to price movement.

Annual and Quarterly reports
The naturally expected lag time between the release of a company's financial report and Morningstar's database update is influenced by each company's priority level, which is tied to market capitalization.
Timelines in business hours: P1 - 36, P2 - 60, P3 - 84, P4 - 108, P5 - 156, P6 - 204, P7 - 252.
Some companies do not report quarterly data.
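For a rough sense of scale, those priority timelines converted to business days (assuming an 8-hour business day; the hours are from the list above):

# Morningstar priority-level update timelines, in business hours.
timelines = {'P1': 36, 'P2': 60, 'P3': 84, 'P4': 108,
             'P5': 156, 'P6': 204, 'P7': 252}
for level in sorted(timelines):
    print('{}: {:.1f} business days'.format(level, timelines[level] / 8.0))
# P1 is about 4.5 business days; P7 is roughly 31.5, over six working weeks.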

The last few cells of the attached Notebook accomplish the same for all Fundamental quantitative factors.


Thanks for posting that Notebook.

Guys, I know this fundamentals-updating-frequency issue affects a lot of people. This is maybe not the best thread to post it in, but for anyone interested, below is a way to use a current-and-previous-fundamental custom factor without having to guess the updating frequency or date of the fundamental. Hope it helps.

import numpy as np
from quantopian.pipeline.data import Fundamentals
from quantopian.pipeline.factors import CustomFactor

class DaysSalesInInventoryGrowth(CustomFactor):
    window_length = 100
    inputs = [Fundamentals.days_in_inventory]

    def compute(self, today, assets, out, days_in_inventory):
        for col_ix in range(days_in_inventory.shape[1]):   # one column per asset
            dsi = days_in_inventory[:, col_ix]
            dsi = dsi[~np.isnan(dsi)]
            if dsi.size:
                # Drop consecutive repeats but keep time order; np.unique
                # would sort the values and make the comparison below
                # always true whenever two distinct values exist.
                dsi = dsi[np.insert(np.diff(dsi) != 0, 0, True)]
            if dsi.size < 2:
                out[col_ix] = np.nan
                continue
            out[col_ix] = float(dsi[-1] > dsi[-2])   # 1 if grew vs prior value
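For anyone unsure how to wire that in, a minimal sketch (same standard pipeline imports used in the backtest further down; make_pipeline is just an illustrative name):

from quantopian.pipeline import Pipeline
from quantopian.pipeline.filters import QTradableStocksUS

def make_pipeline():
    universe = QTradableStocksUS()
    # NaN where fewer than two distinct values exist in the window.
    return Pipeline(
        columns={'dsi_growth': DaysSalesInInventoryGrowth(mask=universe)},
        screen=universe,
    )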

Adding FactSet.

Nice, clean output from this backtest, and it's easy to use.

---------------------------------------------------  
Example out, using factset  
---------------------------------------------------

2017-05-01       FactsetFundamentals.pe_af  
  Changes  
  per 252         Number of  
trading days      companies  
----------------------------  
     2                 1566  
     1                  524

---------------------------------------------------  
Example out, using Fundamentals (morningstar)  
---------------------------------------------------

2017-05-01       Fundamentals<US>.pe_ratio  
  Changes  
  per 252         Number of  
trading days      companies  
----------------------------  
   245                  233  
   244                  296  
   243                  255  
   242                  188  
   241                  122  
   240                   91  
   239                   66  
   238                   47  
   237                   40  
   236                   21  
   235                   19  
   234                   14  
   233                   14  
   232                   12  
   231                   10  
   230                    4  
   229                    2  
   228                    4  
   227                    3  
   225                    3  
   224                    3  
   223                    2  
   222                    1  
   221                    3  
   220                    1  
   219                    1  
   217                    1  
   215                    1  
   214                    2  
   213                    2  
   211                    1  
'''
https://www.quantopian.com/posts/fundamental-data-integrity-check-including-factset

Count the number of times a fundamental changes in one year [or a set number of `days`].
In the output, the left column is the number of changes and the right column is the number of companies whose value changed that many times.

Look for this area to edit/try different fundamentals

    f = Fundamentals.           pe_ratio           <-- morningstar
    f = factset.Fundamentals.   pe_af              <-- factset

---------------------------------------------------
Example out, using factset
---------------------------------------------------

2017-05-01       FactsetFundamentals.pe_af
  Changes
  per 252         Number of
trading days      companies
----------------------------
     2                 1566
     1                  524

---------------------------------------------------
Example out, using Fundamentals (morningstar)
---------------------------------------------------

2017-05-01       Fundamentals<US>.pe_ratio
  Changes
  per 252         Number of
trading days      companies
----------------------------
   245                  233
   244                  296
   243                  255
   242                  188
   241                  122
   240                   91
   239                   66
   238                   47
   237                   40
   236                   21
   235                   19
   234                   14
   233                   14
   232                   12
   231                   10
   230                    4
   229                    2
   228                    4
   227                    3
   225                    3
   224                    3
   223                    2
   222                    1
   221                    3
   220                    1
   219                    1
   217                    1
   215                    1
   214                    2
   213                    2
   211                    1
'''
from quantopian.pipeline import Pipeline
from quantopian.algorithm             import attach_pipeline, pipeline_output
from quantopian.pipeline.filters      import QTradableStocksUS
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.data         import Fundamentals, factset
from quantopian.pipeline.factors      import CustomFactor
import numpy  as np
import pandas as pd

def initialize(context):
    context.days = 1 * 252       # 252 trading days per year
    m = QTradableStocksUS()


    '''  Set the fundamental to be checked  '''

    #f = Fundamentals.           pe_ratio
    f = factset.Fundamentals.   pe_af

    #f = Fundamentals.           roa
    #f = factset.Fundamentals.   roa_af


    attach_pipeline(Pipeline(
        screen  = m,
        columns = {
            'counts': FCount(inputs=[ f ], window_length=context.days, mask=m),
        }
     ), 'p')
    log.info('{} to {}\n'.format(
            get_environment('start').date(), get_environment('end').date()))
    context.fname = f.qualname

def before_trading_start(context, data):
    c = context
    out = pipeline_output('p') .astype(int)

    content = [
        '{}       {}   '.format(str(get_datetime().date()), context.fname),
        '  Changes',
        '  per {}         Number of'.format(str(context.days).rjust(3)),
        'trading days      companies',
        '----------------------------',
    ]
    for k, v in out.counts.value_counts().sort_index(ascending=False).iteritems():
        content.append('{}               {}'.format(str(k).rjust(6), str(v).rjust(6)))
    lg(content)

    record(_min = out.counts.min())
    record(mean = out.counts.mean())
    record(_max = out.counts.max())

    data_detail = 0
    if data_detail:
        try:    context.log_data_done
        except: log_data(c, out, 9)        # show pipe info in detail once

class FCount(CustomFactor):
    def compute(self, today, assets, out, z):
        df      = pd.DataFrame(z, columns=assets)      # input window as df, one column per asset
        changes = (df.fillna(0).diff() != 0).sum()     # day-over-day changes per column, NaN treated as 0
        # Note: diff()'s first row is NaN, and NaN != 0 is True, so every
        # column counts at least one "change" -- hence Min is never 0.
        out[:]  = changes.values                       # back to ndarray for pipeline output

def lg(lines):  # log lines of output efficiently
    if not lines:
        return

    buffer_len = 1024   # each group
    chunk = ':'
    for line in lines:
        if line is None or not len(line):
            continue  # skip if empty string for example
        if len(chunk) + len(line) < buffer_len:
            # Add to chunk if will still be under buffer_len
            chunk += '\n{}'.format(line)
        else:  # Or log chunk and start over with new line.
            log.info(chunk)
            chunk = ':\n{}'.format(line)

    if len(chunk) > 2:       # if anything remaining
        log.info(chunk)

def log_data(context, z, num, fields=None):
    ''' Log info about pipeline output or, z can be any DataFrame or Series
    https://quantopian.com/posts/overview-of-pipeline-content-easy-to-add-to-your-backtest
    '''
    if not len(z):
        log.info('Empty pipe')
        return

    try:
        context.log_data_done
        return
    except:
        log.info('starting_cash ${:,}   portfolio ${:,}     {} positions ...'.format(
            int(context.portfolio.cash),
            int(context.portfolio.portfolio_value),
            len(context.portfolio.positions)
        ))
        context.log_data_done = 1

    # Options
    log_nan_only = 0          # Only log if nans are present.
    show_sectors = 0          # If sectors, see them or not.
    show_sorted_details = 1   # [num] high & low securities sorted, each column.
    padmax = 6                # num characters for each field, starting point.

    def out(lines):  # log data lines of output efficiently
        buffer_len = 1024   # each group
        chunk = ':'
        for line in lines:
            if line is None or not len(line):
                continue    # skip if empty string for example
            if len(chunk) + len(line) < buffer_len:
                # Add to chunk if will still be under buffer_len
                chunk += '\n{}'.format(line)
            else:  # Or log chunk and start over with new line.
                log.info(chunk)
                chunk = ':\n{}'.format(line)
        if len(chunk) > 2:       # if anything remaining
            log.info(chunk)

    if 'dict' in str(type(z)):
        log.info('Not set up to handle a dictionary, only dataframe & series, bailing out of log_data()')
        return
    elif 'MultiIndex' in str(type(z.index)):
        log.info('Found MultiIndex, not set up to handle it, bailing out of log_data()')
        return
    # Change index to just symbols for readability, meanwhile, right-aligned
    z = z.rename(index=dict(zip(z.index.tolist(), [i.symbol.rjust(6) for i in z.index.tolist()])))

    # Series ......
    if 'Series' in str(type(z)):    # is Series, not DataFrame
        nan_count = len(z[z != z])
        nan_count = 'NaNs {}/{}'.format(nan_count, len(z)) if nan_count else ''
        if (log_nan_only and nan_count) or not log_nan_only:
            pad = max( padmax, len('%.5f' % z.max()) )
            log.info('{}{}{}   Series  len {}'.format('min'.rjust(pad+5),
                'mean'.rjust(pad+5), 'max'.rjust(pad+5), len(z)))
            log.info('{}{}{} {}'.format(
                ('%.5f' % z.round(6). min()).rjust(pad+5),
                ('%.5f' % z.round(6).mean()).rjust(pad+5),
                ('%.5f' % z.round(6). max()).rjust(pad+5),
                nan_count
            ))
            log.info('High\n{}'.format(z.sort_values(ascending=False).head(num)))
            log.info('Low\n{}' .format(z.sort_values(ascending=False).tail(num)))
        return

    # DataFrame ......
    content_min_max = [ ['','min','mean','max',''] ] ; content = []
    for col in z.columns:
        try: z[col].max()
        except: continue   # skip non-numeric
        if col == 'sector' and not show_sectors: continue
        nan_count = len(z[col][z[col] != z[col]])
        nan_count = 'NaNs {}/{}'.format(nan_count, len(z)) if nan_count else ''
        padmax    = max( padmax, len(str(z[col].max())) ) ; mean_ = ''
        if len(str(z[col].max())) > 8 and 'float' in str(z[col].dtype):
            z[col] = z[col].round(6)   # Reduce number of decimal places for floating point values
        if 'float' in str(z[col].dtype): mean_ = str(round(z[col].mean(), 6))
        elif 'int' in str(z[col].dtype): mean_ = str(round(z[col].mean(), 1))
        content_min_max.append([col, str(z[col] .min()), mean_, str(z[col] .max()), nan_count])
    if log_nan_only and nan_count or not log_nan_only:
        log.info('Rows: {}  Columns: {}'.format(z.shape[0], z.shape[1]))
        if len(z.columns) == 1: content.append('Rows: {}'.format(z.shape[0]))

        paddings = [6 for i in range(4)]
        for lst in content_min_max:    # set max lengths
            i = 0
            for val in lst[:4]:    # value in each sub-list
                paddings[i] = max(paddings[i], len(str(val)))
                i += 1
        headr = content_min_max[0]
        content.append(('{}{}{}{}{}'.format(
             headr[0] .rjust(paddings[0]),
            (headr[1]).rjust(paddings[1]+5),
            (headr[2]).rjust(paddings[2]+5),
            (headr[3]).rjust(paddings[3]+5),
            ''
        )))
        for lst in content_min_max[1:]:    # populate content using max lengths
            content.append(('{}{}{}{}     {}'.format(
                lst[0].rjust(paddings[0]),
                lst[1].rjust(paddings[1]+5),
                lst[2].rjust(paddings[2]+5),
                lst[3].rjust(paddings[3]+5),
                lst[4],
            )))
    out(content)

    if not show_sorted_details: return
    #if len(z.columns) == 1:     return     # skip detail if only 1 column
    details = z.columns if fields is None else fields
    content = []
    for detail in details:
        if detail == 'sector' and not show_sectors: continue
        hi = z[details].sort_values(by=detail, ascending=False).head(num)
        lo = z[details].sort_values(by=detail, ascending=False).tail(num)
        if log_nan_only and not len(lo[lo[detail] != lo[detail]]):
            continue  # skip this column if no nans
        content.append(('_ _ _   {}   _ _ _'  .format(detail)))
        content.append(('{} highs ...\n{}'.format(detail, str(hi))))
        content.append(('{} lows  ...\n{}'.format(detail, str(lo))))
    out(content)
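To see concretely what FCount counts, here's a standalone toy example outside the algorithm (pure pandas, made-up values):

import numpy as np
import pandas as pd

z = pd.DataFrame({'A': [np.nan, 5.0, 5.0, 6.0, 6.0],
                  'B': [1.0,    1.0, 2.0, 2.0, 3.0]})
changes = (z.fillna(0).diff() != 0).sum()
# diff()'s first row is NaN and NaN != 0 is True, so each column starts
# with a count of 1 -- one reason Min is 1 in the tables above.
print(changes)   # A: 3 (first row, 0->5, 5->6); B: 3 (first row, 1->2, 2->3)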

Maybe worth a try: using FactsetFundamentals.pe_qf instead of _af.

The code is unusual, so I think people don't realize how easy it is.

Four quick steps:
1. Click Clone
2. Change a to q on line 82 (see the snippet below)
3. Click Build
4. Wait 3 seconds for the result. (OK, maybe 7 seconds)
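Concretely, the one-line edit in step 2 (line 82 of the cloned algorithm is where the FactSet field is selected):

    #f = factset.Fundamentals.   pe_af    # annual (_af)
    f = factset.Fundamentals.   pe_qf     # quarterly (_qf)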

Naturally some questions crop up.
Why were there fewer than 4 quarterly changes for that many companies? Did they all drop out of QTU?
I can understand a count of 5 for companies that just reported, since reporting dates aren't strictly aligned to the calendar year, but why are such a high percentage of them at 5 changes in one year? On another note, if the mask is removed, over half of the stocks (more than 5,000) show only 1 quarterly value for the last year. Why?

The point: know what we are dealing with in any data set and account for any irregularities, or backtest results can be regarded as random.
There's an expression overseas that might apply; I'll omit it here.

Is it possible to spot an out-of-date fundamental value in Quantopian's pipeline latest by cross-referencing some standard quote/fundamentals site (MSN Money, Yahoo Finance, etc.) that uses Morningstar fundamentals? I usually find the reference to the data provider in the page footer.