Upcoming Changes to Morningstar Fundamental Data

Morningstar is planning to release a set of changes to its fundamental dataset starting on April 1st, 2018. Three categories of changes may impact your algorithm:

(1) Some fields are being entirely removed by the vendor. These fields are no longer being supported because they had low usage by Morningstar customers.

How this might affect you: If your strategy references these fields, it will no longer be able to run after April 1st. We will soon start surfacing deprecation warnings in the IDE to let you know, and you can adjust your strategy.

(2) Some fields are being removed by the vendor because there is a duplicate field that surfaces similar data.

How this might affect you: We will map these fields on our backend to the active, duplicate field. Your algorithm will continue to run. You will see a deprecation warning in the IDE, explaining that the field you're accessing is no longer active with a suggestion to convert to the active sibling field.

(3) Some fields are being recalculated because more granular data is available or the formula is being adjusted to be more aligned with industry standards.

How this might affect you: Most fields should see similar values, but this may differ from field to field. Depending on how your algorithm accesses the data, using the last known value or a historical lookback window, it may see a regime change around April 1st when the new data starts to surface. In the IDE you will also see a message soon about the upcoming adjustment.


Attached to this post is a notebook with a list of fields per category. You can check if your strategies reference any of the identified datapoints. From our research, most strategies don't use the fields slated for removal and should continue to run. The QTU and the risk model won't be affected by this update.
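
One way to run that check is to scan your algorithm's source for the affected field names. The sketch below is only an illustration: `AFFECTED_FIELDS` is a hypothetical sample built from field names mentioned in this thread, not the authoritative list (that lives in the attached notebook), and `find_affected_fields` is a made-up helper, not part of the Quantopian API.

```python
import re

# Hypothetical sample only; the real per-category list is in the attached notebook.
AFFECTED_FIELDS = {
    "proceeds_from_issuance_of_warrants": "duplicate",
    "net_income_cash_flow_statement": "duplicate",
    "enterprise_value": "recalculated",
    "total_revenue": "recalculated",
}


def find_affected_fields(source, affected=AFFECTED_FIELDS):
    """Return {field: category} for every affected field named in the source text."""
    hits = {}
    for field, category in affected.items():
        # Word boundaries stop a short field name from matching inside a longer one.
        if re.search(r"\b" + re.escape(field) + r"\b", source):
            hits[field] = category
    return hits


# Example algorithm text; it is only scanned as a string, never executed.
algo_source = """
ev = Fundamentals.enterprise_value.latest
warrants = Fundamentals.proceeds_from_issuance_of_warrants.latest
"""

for name, category in sorted(find_affected_fields(algo_source).items()):
    print(name, "->", category)
```

A plain text scan like this can report false positives (for example a field name that only appears in a comment), so treat the result as a list of places to look rather than a verdict.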

We generally prefer to give you ways to test your strategies in advance when changes like this happen. Unfortunately, we won't be able to do so here. The old and new data won't be available concurrently for testing, and the data will be cut over by the vendor on April 1st. We are still working to assess category #3, the differences in magnitude for fields being recalculated, to confirm the expectation that the old and new values will be similar. We will certainly keep you updated as we learn more.

We will be emailing people in the contest if they might be impacted by the upcoming transition, and we are always available for any questions.

Thanks,
Alisa

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

17 responses

A related backtest here can be used for gathering information on fundamentals.

Alisa,

Quantopian's obsession with high data quality notwithstanding, I find these statements discomforting:

"...may see a regime change around April 1st when the new data starts to surface..."
"We are still working to assess...the differences in magnitude..."

As someone who is very reliant on fundamental data, I will provide you here an outline of a specification for what I would like the Morningstar Data Dictionary to tell us:

For each and every data element:
1. How frequently it is expected to be updated (daily, quarterly, etc.).
2. The events that trigger the update (preceding day's closing price, 10K/Q filed with SEC + 3 days, stock split date +__ days, etc.).
3. If it has a regime change on April 1, why so, including a formula or map from/to (more than just a "difference in magnitude").
4. The percentage of QTradableStocksUS that are expected to have current data for this data element.

Rationale for the above is that currently, as a user, I must expend an inordinate amount of time to confirm how frequently data elements are updated and if those elements are available for a large proportion of the stocks in my universe. If I were to take on faith that all elements were current and available for all stocks in my universe, and I then regress historical values to make a prediction, I'd potentially have garbage in/garbage out, even if the fundamental reasoning is sound.

Thank you for taking my comments under consideration,
Doug Baldwin

@Blue, thanks for sharing!

@David, I fully share your frustration. This is a change that impacts not only the community but also us internally at Q. We worked hard with the vendor to see if there was any wiggle room, and unfortunately this is outside of our control. Our style is generally to ease into changes: assess the impact, run both environments in parallel for a period, and then cut over. We weren’t able to provide that in this case. I have shared the details we know and can certainly update here as more information surfaces.

From what I have gathered, the update frequency for the fundamentals is not expected to change. It will remain on the same schedule as it has historically. The same goes for the trigger events: the majority of these should remain unchanged. It is the formula used to calculate the ratio that will be adjusted in category 3. For example, if the formula to calculate a ratio includes a field that Morningstar is removing (per category 1), the calculation will now be slightly different.

I’m working on an analysis to estimate the impact of category 3, to better understand the expected magnitude of the difference between the old and new data. I’m hopeful the original info was correct and that the differences are minor. I will share here as we uncover more in the next few days.

Hi Alisa,

I am not intending to express frustration. I am explaining what we need.

Fundamentals update frequency is an imprecise unknown. I simply have asked that you correct for this confusion while implementing the April 1 update. To be clear in making my point about Q's imprecise definition of (factor)_asof_date, I have developed for you the attached notebook. Please address our concerns with precise definitions and an explicit data dictionary.

Thank you,
Doug

@Doug - thank you for that notebook -- it is very well put together. We're combing through it and will soon share our thoughts about fixed vs changing values related to the asof_date.

@all - A few updates to share about the upcoming Morningstar update as we learn more:

  • Deprecation warnings for the affected fields are now in the IDE. If you reference one of the fields slated for removal, you'll see an update in the IDE that the field will no longer receive updating values from Morningstar after March 31st.
  • For the fields in category 2 that were suggested to be mapped to their active duplicate field, after an investigation we saw inconsistencies. We saw that some fields were indeed similar in behavior while others were orders of magnitude different.

For example, looking at the field proceeds_from_issuance_of_warrants, Morningstar suggested the duplicate field was proceeds_from_stock_option_exercised. From our analysis, these fields are quite different from one another, and mapping to the suggested active field is not the answer here:

Here is the comparison of count of values:

And here is the median range of the values per field:

Other fields, like net_income_cash_statement, closely resembled the suggested active field:

Based on this, we are doing the following in order to minimize the impact on the community:

  • Keeping the fields slated for removal in our database. This means your algorithm will continue to be able to run after April 1, but the field will no longer receive updated values from Morningstar. The last known value will be forward-filled. You will see deprecation warnings in the IDE to flag this behavior and suggest you explore other active fields.
  • A sub-category of these fields is heavily used in the contest. After speaking with some users and examining the data, we will forward-fill 5 fields with NaNs until the old 6 month contest finishes later this year.
  • Once the 6 month contest runs to a close in August, we will begin to remove these stale fields from the database. This will give you time to transition your strategies, and algos in the contest will not break.
  • To better support strategies in the contest, there are a few shims we are building on our backend. Two fields that are heavily in use in the contest will be mapped to their active duplicate field. After investigation, these appear to be similar in value. The fields being shimmed are net_income_cash_flow_statement and total_liabilities. If you run a backtest with these fields soon (later this week) you will see the appropriate IDE warning message.
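
To make the two forward-fill behaviours concrete, here is a minimal sketch in plain Python. It reflects one reading of the bullets above, namely that after March 31 a stale field either repeats its last known value or surfaces NaN; `CUTOFF`, `values_seen` and the sample numbers are hypothetical and not taken from the Quantopian backend.

```python
import math
from datetime import date

CUTOFF = date(2018, 3, 31)  # assumed last day a removed field receives vendor updates


def values_seen(observations, dates, fill_with_nan=False):
    """Return the value an algorithm would see on each date.

    observations: {date: value} as delivered by the vendor up to the cutoff.
    fill_with_nan: if True, dates after the cutoff surface NaN instead of
    the last known value.
    """
    seen = []
    last = float("nan")  # nothing known until the first observation arrives
    for d in dates:
        if d <= CUTOFF:
            last = observations.get(d, last)  # carry the previous value on gaps
            seen.append(last)
        else:
            seen.append(float("nan") if fill_with_nan else last)
    return seen


dates = [date(2018, 3, 29), date(2018, 3, 30), date(2018, 4, 2), date(2018, 4, 3)]
observations = {date(2018, 3, 29): 10.0, date(2018, 3, 30): 11.0}

print(values_seen(observations, dates))                      # last value repeats
print(values_seen(observations, dates, fill_with_nan=True))  # NaN after the cutoff
```

The practical difference is that a repeated last value keeps a strategy running on silently stale data, while NaN makes the staleness visible to any code that filters out missing values.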

The specific fields and impact are laid out in the updated version of the notebook below.

TLDR -- Most fields will have the last known value forward-filled after March 31. Five fields heavily in use in the contest will have NaNs forward-filled after this date. Two fields will be mapped to their active sibling field. If you reference any of these fields, you will see the appropriate deprecation warning in the IDE. At a later date, these stale fields will be sunset.

Thanks for the update!

Two things:
1. Are these changes going to be reflected in https://www.quantopian.com/help/fundamentals?
2. For the fields that have "Regime Changes", will Morningstar tell us the specific changes made between the old methodology of calculation and the new methodology?

@Doug: First off, thanks for your notebook. It's very well written and illustrates your point clearly.

Just to make sure we are on the same page, let me restate the problems you described as I understand them:
- It is not clear, without looking at the data, what the update frequency of a field should be in expectation.
- Looking at just the asof_date of a field doesn't always tell the whole story. There are some fields that see their asof_date updated daily, but the actual value is updated less frequently.

Currently, we get fundamental data from Morningstar in two types of files: daily files and monthly files. Daily files are delivered to us on a daily basis, while monthly files arrive at the beginning of each month. A monthly file arriving at the beginning of month N typically contains data relevant to month N-1 (similarly, daily files delivered on day N typically have data from day N-1). Whenever we get a data point in either file, we create a record in our database. The record is created with an asof_date (this is a date that Morningstar provides), a timestamp (this is the datetime that we learned about the data point), and a value (the value of the field).

As you noted, some fields come in the daily file, despite the fact that their value isn't actually getting updated. In this case, we still process the data point and create a new record, because we received a new data point.

Unfortunately, some fields do not consistently come in just the daily file or just the monthly file. There are some fields that usually come in one file, but sometimes arrive in the other. For example, some fields will be regularly updated in the monthly file, but receive restatements in the daily file mid-month. As a result, we do not pre-define an expected update frequency for each field. Instead, we surface the data in the format that it comes and leave the analysis up to the individual analyzing the data. For our fundamental data integration, we felt that this was the best solution given some of the uncertainty around the update frequency.

What does this mean for you? Well, to be frank, more work. To best understand the expected update frequency of a field, you'll need to analyze the data much like you did in your above report. There are some fields (the mostly static ones) where it's best to look at the value to see if there has been a change, but it still might be helpful to look at the asof_date to know that even if the value hasn't changed, Morningstar is telling us that they know it hasn't changed, not just that they haven't provided an update.
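
As a rough illustration of separating "the asof_date moved" from "the value moved", the sketch below models the three-part record described above and keeps only the records where the value actually changed. The `Record` tuple and `value_changes` helper are hypothetical names for this example, not the schema or API of the Quantopian database.

```python
from collections import namedtuple
from datetime import date, datetime

# Hypothetical stand-in for a stored data point: vendor asof_date,
# the datetime the point was learned about, and the field value.
Record = namedtuple("Record", ["asof_date", "timestamp", "value"])


def value_changes(records):
    """Return the records whose value differs from the record before it.

    Records are ordered by timestamp, i.e. the order in which they were learned.
    NaN values are not handled specially here (NaN != NaN), so filter them first.
    """
    changes = []
    previous = None
    for rec in sorted(records, key=lambda r: r.timestamp):
        if previous is None or rec.value != previous.value:
            changes.append(rec)
        previous = rec
    return changes


records = [
    Record(date(2018, 3, 1), datetime(2018, 3, 2, 6, 0), 5.0),
    Record(date(2018, 3, 2), datetime(2018, 3, 3, 6, 0), 5.0),  # asof_date moved, value did not
    Record(date(2018, 3, 5), datetime(2018, 3, 6, 6, 0), 5.0),
    Record(date(2018, 3, 6), datetime(2018, 3, 7, 6, 0), 5.5),  # genuine change
]

print([r.asof_date for r in value_changes(records)])
```

Comparing the number of records with the number of value changes over a long history gives an empirical estimate of how often a field really updates.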

For daily fields, a stale asof_date typically means that a value was not updated in the last batch of data that we received from Morningstar. Sometimes, it could mean that just the field or even just the data point for that asset was not updated in the last file, but other times it could mean that we didn't get the file from Morningstar on time, or our system that processes that data had some type of problem that day. If the reason for the stale asof_date matters to you, there are different techniques you can use to figure out the scale of the problem. For example, you could check the latest asof_date of the field you're looking at to figure out if the entire field was not updated, or if it's just the asset in question. You could also determine if it's the entire fundamental dataset that didn't update by looking at something like AAPL's market cap (or another example of something that is expected to update daily).
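
The checks described above can be arranged as a small decision procedure. This is a sketch under assumptions: `diagnose_staleness` is a hypothetical helper, the inputs are plain dictionaries of latest asof_dates that you would have to build yourself, and `expected` is whatever asof_date you consider fresh for the day in question.

```python
from datetime import date


def diagnose_staleness(field_asof, asset, reference_asof, expected):
    """Classify why an asset's asof_date for a daily field looks stale.

    field_asof: {asset: latest asof_date} for the field being examined.
    reference_asof: latest asof_date of a field expected to update daily
        (for example AAPL's market cap).
    expected: the asof_date a fresh update would carry.
    """
    if field_asof[asset] >= expected:
        return "fresh"
    if reference_asof < expected:
        # Even the reliable daily field is behind: likely a late or failed file.
        return "whole dataset stale"
    if max(field_asof.values()) < expected:
        # No asset has a fresh value for this field.
        return "whole field stale"
    return "only this asset stale"


expected = date(2018, 3, 29)
field_asof = {"AAPL": date(2018, 3, 29), "XYZ": date(2018, 3, 27)}

print(diagnose_staleness(field_asof, "XYZ", date(2018, 3, 29), expected))
```

The order of the checks matters: the dataset-wide test comes before the field-wide test so that a missed file is not misreported as a problem with one field.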

For quarterly values, it gets trickier. In my experience, the asof_date of a quarterly field reflects the end of the quarter for which the data point applies, and it is not updated in the same way as the static fields mentioned earlier. For these, the way to think about it will depend on the questions you want to answer as well as how you want to use the data. I gave an example here of getting the last 4 quarters of data. Notably, the factor in that notebook has a limitation where it does not handle missing updates well, so you might be looking at 4 quarters of data for most companies, and 3 for a small number of companies that are missing a data point. Ultimately, it's up to you to decide how to handle these cases. We'd certainly like to make it easier to query quarterly data - most of the effort would be in aligning data so that you have consistent definitions of a quarter across different assets - but it's not something that we're currently working on.
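
To illustrate the missing-quarter problem, here is a minimal sketch that is not the factor from the notebook mentioned above. It assumes you already have one asset's daily observations as (asof_date, value) pairs, collapses them to one value per distinct asof_date, and flags histories where a quarter appears to be missing. Both helper names and the 100-day gap threshold are assumptions made for this example.

```python
from datetime import date


def last_n_quarters(rows, n=4):
    """Collapse daily (asof_date, value) rows to the n most recent quarters.

    Rows are assumed to be in the order they were learned, so a later row
    with the same asof_date (a restatement) overwrites the earlier one.
    """
    by_quarter = {}
    for asof, value in rows:
        by_quarter[asof] = value
    recent = sorted(by_quarter)[-n:]
    return [(q, by_quarter[q]) for q in recent]


def is_complete(quarters, n=4, max_gap_days=100):
    """True if there are n quarters and no gap longer than roughly one quarter."""
    if len(quarters) < n:
        return False
    dates = [q for q, _ in quarters]
    return all((b - a).days <= max_gap_days for a, b in zip(dates, dates[1:]))


full = [
    (date(2017, 3, 31), 1.0), (date(2017, 3, 31), 1.0),
    (date(2017, 6, 30), 2.0), (date(2017, 9, 30), 3.0), (date(2017, 12, 31), 4.0),
]
# Same company history but with the Q3 data point never delivered.
gappy = [
    (date(2016, 12, 31), 0.5), (date(2017, 3, 31), 1.0),
    (date(2017, 6, 30), 2.0), (date(2017, 12, 31), 4.0),
]

print(is_complete(last_n_quarters(full)), is_complete(last_n_quarters(gappy)))
```

Note that the second history still yields four data points, which is why counting alone is not enough: the window silently reaches back an extra quarter. Companies with unusual fiscal calendars may need a different gap threshold.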

Apologies for the long answer, I'm hoping it helps explain some of the decisions we've made, or maybe helps you in your research.

@Cheng -

  1. Yes, we'll update the documentation to remove fields that will be stale in the system to discourage usage as they'll likely be removed sometime in the future. We'll also check that the sample algos, tutorials, and lectures all reference the active fields.

  2. Unfortunately, no. We received generic descriptions of how the definition is being adjusted but not clear detail. For example, a field might have its definition adjusted because it "Includes removed items". Or for a "Balancing effect". Beyond that high-level description we don't have any more color.

@all -

The deprecation warnings in the IDE will be updated with the latest known information this evening. Instead of a generic message saying "We are investigating the similarity of these two fields to determine the impact of the change", we have updated the detailed message for each specific field. If your backtest references a field that will no longer be updating and will have stale data, that will now be clear in the IDE. And if a field is marked to be recalculated, it also has the appropriate informational message.

Here is the expected timeline of events in April. This is particularly relevant to fields that are marked as recalculated.

  • Until March 31 - Data will continue to arrive in the old format. After this date fields scheduled for removal will no longer receive any fresh updated values.
  • April 2 - Daily fields will arrive in the new recalculated format (this is the first trading day of the month). The field will live update in the new format each day.
  • early April - Monthly fields are updated in the old format (for the trailing month of March)
  • mid April - Daily fields will receive their historical values in the recalculated format. This means historical lookback windows will now have the full history in the new format.
  • early May - Monthly fields are updated in the new format (for the trailing month of April)

Once all of the files arrive in mid-April, we plan to do an analysis to understand how much the values changed when they were recalculated (category 3). During those few weeks, depending on how your strategy uses the field, your backtest may see the following.

If you look at the latest value of the field day over day:

If your algorithm uses a lookback window, what it sees will depend on the length of the window. Before March 31 it will have data in the old format. Between April 1 and mid-April (until the historical files are delivered) it will have a mix of data in the old format and data in the new format. After the historical files are delivered in mid-April, the lookback window will see the full history in the new format.
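
The three phases can be sketched for a daily field as follows. This is a simplified model under stated assumptions: `CUTOVER` and `window_formats` are hypothetical names, and the `history_backfilled` flag stands in for "the mid-April historical files have been delivered", whose exact date is not given here.

```python
from datetime import date

CUTOVER = date(2018, 4, 1)  # first day values arrive in the recalculated format


def window_formats(window_dates, history_backfilled):
    """Label each observation in a lookback window as 'old' or 'new' format.

    history_backfilled: False between the cutover and the delivery of the
    historical files, True once the full history has been restated.
    """
    labels = []
    for d in window_dates:
        if history_backfilled or d >= CUTOVER:
            labels.append("new")
        else:
            labels.append("old")
    return labels


window = [date(2018, 3, 28), date(2018, 3, 29), date(2018, 4, 2), date(2018, 4, 3)]

print(window_formats(window, history_backfilled=False))  # mixed old and new
print(window_formats(window, history_backfilled=True))   # all new
```

A longer window stays mixed for more of the interim period, so strategies that difference or regress over the window are the ones most exposed to a step change at the cutover.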

From what we've heard, we are not expecting a big change in value like the extreme case illustrated in the drawings above. But we can't promise what will happen, and will evaluate the files when they arrive.

A few issues in the notebook were caught in this thread - thanks! The notebook has now been updated, and here's the latest copy with what we know. Thanks again for your patience as we work through this.

@Jamie: Thank you for your clear and thorough explanation.

The fundamentals reference page has been updated to reflect only the active fields that will be receiving fresh values from Morningstar going forward: https://www.quantopian.com/help/fundamentals

Will the Morningstar set be replaced by the FactSet data set, if not now, then eventually? Personally I find the Morningstar data a bit frustrating, the example above being one clear case in point.

Hi Dan, good question. This is definitely something we have been thinking about. We have not yet made a decision on exactly how we will handle the specific case of fundamentals since it is a case of overlapping data. I can envision a future where you have access to both and are able to fall back from a primary to a secondary provider. But we will need to analyze the best approach given additional considerations such as ease of use and backwards compatibility.

Thanks

@Quantopian: Is the migration complete? The warning statement, quoted below, is now dated, and as a result unclear.

":12: QuantopianWarning: Morningstar adjusted the calculation for the field starting on April 1, 2018. The historical values for this field will surface from the vendor in mid-April 2018.
Until the historical data is updated in mid-April, your algorithm will see old (before April 1) and new (after April 1) definitions of the field in the same backtest."

Morningstar delivered the files to us mid-April and we've recently started to look at the differences for the fields being recalculated before/after the April 2018 period. We are hoping to share the results soon, and will then remove (or appropriately update) the warning information that's present in the IDE. Stay tuned.

This is now complete. The IDE warning has been removed for the fields that were marked as recalculated. Our analyses showed that on average for the Quantopian Tradable Universe, the values of the fields did not change significantly. Thanks everyone for your patience while we worked through this to wrap up.

The following warnings were raised today in a research notebook. Are these warnings still valid?

<string>:12: QuantopianWarning: Morningstar adjusted the calculation for the field 'net_income_income_statement' starting on April 1, 2018.

The historical values for this field will surface from the vendor in mid-April 2018.

Until the historical data is updated in mid-April, your algorithm will see old (before April 1) and new (after April 1) definitions of the field in the same backtest.  
<string>:12: QuantopianWarning: Morningstar adjusted the calculation for the field 'impairment_of_capital_assets' starting on April 1, 2018.

The historical values for this field will surface from the vendor in mid-April 2018.

Until the historical data is updated in mid-April, your algorithm will see old (before April 1) and new (after April 1) definitions of the field in the same backtest.  
<string>:12: QuantopianWarning: Morningstar adjusted the calculation for the field 'operating_income' starting on April 1, 2018.

The historical values for this field will surface from the vendor in mid-April 2018.

Until the historical data is updated in mid-April, your algorithm will see old (before April 1) and new (after April 1) definitions of the field in the same backtest.  
<string>:12: QuantopianWarning: Morningstar adjusted the calculation for the field 'total_revenue' starting on April 1, 2018.

The historical values for this field will surface from the vendor in mid-April 2018.

Until the historical data is updated in mid-April, your algorithm will see old (before April 1) and new (after April 1) definitions of the field in the same backtest.  
<string>:12: QuantopianWarning: Morningstar adjusted the calculation for the field 'payables_and_accrued_expenses' starting on April 1, 2018.

The historical values for this field will surface from the vendor in mid-April 2018.

Until the historical data is updated in mid-April, your algorithm will see old (before April 1) and new (after April 1) definitions of the field in the same backtest.  
<string>:12: QuantopianWarning: Morningstar adjusted the calculation for the field 'net_ppe' starting on April 1, 2018.

The historical values for this field will surface from the vendor in mid-April 2018.

Until the historical data is updated in mid-April, your algorithm will see old (before April 1) and new (after April 1) definitions of the field in the same backtest.  
<string>:12: QuantopianWarning: Morningstar adjusted the calculation for the field 'receivables' starting on April 1, 2018.

The historical values for this field will surface from the vendor in mid-April 2018.

Until the historical data is updated in mid-April, your algorithm will see old (before April 1) and new (after April 1) definitions of the field in the same backtest.  
<string>:12: QuantopianWarning: Morningstar adjusted the calculation for the field 'cash_and_cash_equivalents' starting on April 1, 2018.

The historical values for this field will surface from the vendor in mid-April 2018.

Until the historical data is updated in mid-April, your algorithm will see old (before April 1) and new (after April 1) definitions of the field in the same backtest.  
<string>:12: QuantopianWarning: Morningstar adjusted the calculation for the field 'enterprise_value' starting on April 1, 2018.

The historical values for this field will surface from the vendor in mid-April 2018.

Until the historical data is updated in mid-April, your algorithm will see old (before April 1) and new (after April 1) definitions of the field in the same backtest.  
<string>:12: QuantopianWarning: Morningstar adjusted the calculation for the field 'goodwill' starting on April 1, 2018.

The historical values for this field will surface from the vendor in mid-April 2018.

Until the historical data is updated in mid-April, your algorithm will see old (before April 1) and new (after April 1) definitions of the field in the same backtest.  

@Doug, thanks for pointing that out and it should now be fixed. The warnings have now been removed in the research environment as well.