How does Quantopian merge stock price data and Morningstar fundamental data?

In academic factor-investing research, we usually combine CRSP (historical pricing data) and Compustat (fundamental data) using the so-called CCM link table on WRDS. Some researchers simply join the two datasets on CUSIP, but that causes problems, as this website shows.
It is quite easy to introduce errors when combining pricing data and fundamental data, especially when the datasets come from different vendors.
So I'm curious: how does Quantopian merge stock price data and Morningstar fundamental data?


5 responses

All data on Quantopian is 'normalized' or tied together with our internal Security ID (SID). We get data from various vendors with various identifiers. Quantopian uses several sources to associate all this data to our internal SID. Moreover, at times the stock ticker or CUSIP or other identifier may change as a result of a merger, acquisition, or other corporate action. The article linked above highlights some of the issues. The Quantopian specifics for tracking changes are beyond the scope of this post. However, do note the associations are generally in line with FactSet and other major data providers.

This does highlight an important consideration when querying data. Quantopian stores all data 'point-in-time'. If a stock ticker is reassigned over time, ticker XYZ may refer to one company at one point and to a completely different company at another. Quantopian data recognizes this but needs to know the 'reference date' to resolve which company one is referring to. This can be specified as the symbol_reference_date parameter in the symbols and symbol methods. The most precise way to specify a security is to use the SID.
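To illustrate why the reference date matters, here is a minimal sketch of a point-in-time ticker lookup. This is not Quantopian's implementation; the `TickerAssignment` table, the `resolve_symbol` helper, and the SIDs and dates in it are all made up for the example.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class TickerAssignment(NamedTuple):
    """One period during which a ticker referred to one security (hypothetical schema)."""
    ticker: str
    sid: int
    start: date  # first day the ticker referred to this security
    end: date    # last day (inclusive) the ticker referred to this security


# Made-up history: the ticker "XYZ" is reused by two different companies.
HISTORY: List[TickerAssignment] = [
    TickerAssignment("XYZ", 101, date(2000, 1, 3), date(2009, 6, 30)),
    TickerAssignment("XYZ", 202, date(2012, 3, 1), date(9999, 12, 31)),
]


def resolve_symbol(ticker: str, reference_date: date,
                   history: List[TickerAssignment] = HISTORY) -> Optional[int]:
    """Return the SID the ticker referred to on reference_date, or None if unassigned."""
    for row in history:
        if row.ticker == ticker and row.start <= reference_date <= row.end:
            return row.sid
    return None


sid_2005 = resolve_symbol("XYZ", date(2005, 1, 1))  # first company
sid_2015 = resolve_symbol("XYZ", date(2015, 1, 1))  # second company
sid_2010 = resolve_symbol("XYZ", date(2010, 1, 1))  # ticker not in use
```

The same ticker resolves to a different SID depending on the date, which is why the SID itself is the safer key once you have it.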


The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

CRSP and Compustat each have their own permanent identifier, PERMNO and GVKEY respectively, which play a role similar to Quantopian's SID.
Researchers have to merge the two datasets through another dataset, the CCM link table, so I'm still curious how Quantopian tracks corporate actions when it uses the SID to join with the Morningstar fundamental dataset, since Morningstar must use different identifiers from Quantopian's.
Could you share more details?


In the Morningstar files which Quantopian loads each day, there are two fields company_ID and security_ID which are used to tie to the Quantopian SID. Morningstar provides secondary information to translate from these numbers to CUSIP. We internally first translate to CUSIP and then to the SID.

Morningstar data is a bit unique in that some of the data is company specific (e.g. number of employees, total revenue, income) while other data is security specific (e.g. price to book ratio, price earnings ratio). Depending on the type of data, Morningstar will provide either the company_ID or the security_ID. If a company_ID is specified, the data is broadcast to all of that company's CUSIPs before being tied to the SID.
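As a rough sketch of that flow (ID to CUSIP to SID, with company-level data broadcast to every security), something like the following captures the idea. The lookup tables, the `sids_for_record` and `broadcast` helpers, and all IDs and values are hypothetical; the actual Morningstar files and Quantopian's internal tables are not shown in this thread.

```python
# Hypothetical lookup tables, invented for illustration only.
COMPANY_TO_SECURITIES = {"C1": ["S1", "S2"]}   # company_ID -> its security_IDs
SECURITY_TO_CUSIP = {"S1": "111111111", "S2": "222222222"}
CUSIP_TO_SID = {"111111111": 1001, "222222222": 1002}

ID_KEYS = ("company_ID", "security_ID")


def sids_for_record(record):
    """Translate a record's company_ID or security_ID to SIDs via CUSIP."""
    if "security_ID" in record:
        security_ids = [record["security_ID"]]
    else:
        # Company-level data applies to every security the company has issued.
        security_ids = COMPANY_TO_SECURITIES.get(record["company_ID"], [])
    sids = []
    for security_id in security_ids:
        cusip = SECURITY_TO_CUSIP.get(security_id)
        sid = CUSIP_TO_SID.get(cusip)
        if sid is not None:  # skip anything that cannot be mapped
            sids.append(sid)
    return sids


def broadcast(records):
    """Collect the data fields of each record under every SID it maps to."""
    by_sid = {}
    for record in records:
        fields = {k: v for k, v in record.items() if k not in ID_KEYS}
        for sid in sids_for_record(record):
            by_sid.setdefault(sid, {}).update(fields)
    return by_sid


records = [
    {"company_ID": "C1", "total_revenue": 500},  # company specific
    {"security_ID": "S1", "pe_ratio": 12.5},     # security specific
]
result = broadcast(records)
```

Here `total_revenue` lands on both SIDs because it is keyed by company, while `pe_ratio` lands only on the one security it was reported for.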

Hope that helps.

May I inquire? What is your use case for asking?

Thank you, Dan. That's helpful.
Actually, I'm considering deploying some money from my brokerage account to an algorithm I developed on Quantopian. It's a quite concentrated portfolio strategy, which means any inaccuracy in the dataset could have a big effect on the backtest results.
If I understand correctly, Quantopian gets the company_ID or security_ID from Morningstar and converts it to a CUSIP based on information provided by Morningstar. Because CUSIPs change over time, Quantopian also records each security's CUSIP and SID in its internal database. So unless Quantopian accidentally fails to update the CUSIP used for mapping to the SID, or something goes wrong when converting the company_ID or security_ID to a CUSIP (e.g. Morningstar doesn't update the information for a company_ID or security_ID in time), using CUSIP to combine pricing data and fundamental data will be fine. Right?

Combining is done using CUSIP (which uniquely identifies a security) and corporate actions (which tie CUSIPs together in cases of mergers, acquisitions, spinoffs, and other corporate actions). The Morningstar mapping is quite robust and typically matches other sources such as FactSet.
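For intuition only, here is a simplified sketch of how corporate actions can tie an old CUSIP to the security's current one. It assumes the simplest case, where each action replaces one CUSIP with exactly one successor; real mergers and spinoffs can be one-to-many and are date dependent. The tables, the helper names, and the CUSIP-like strings are made up and do not describe Quantopian's actual process.

```python
# Hypothetical corporate-action table: old CUSIP -> CUSIP that replaced it.
CUSIP_CHANGES = {
    "AAA111111": "BBB222222",  # e.g. a name change issued a new CUSIP
    "BBB222222": "CCC333333",  # e.g. a later reorganization
}

# Hypothetical table keyed by the security's current CUSIP.
SID_BY_CURRENT_CUSIP = {"CCC333333": 555}


def current_cusip(cusip, changes=CUSIP_CHANGES):
    """Follow the chain of CUSIP changes until no further successor exists."""
    seen = set()
    while cusip in changes:
        if cusip in seen:
            raise ValueError("cycle in CUSIP change table at " + cusip)
        seen.add(cusip)
        cusip = changes[cusip]
    return cusip


def sid_for_cusip(cusip):
    """Map any historical CUSIP of a security to its SID, or None if unknown."""
    return SID_BY_CURRENT_CUSIP.get(current_cusip(cusip))


old_sid = sid_for_cusip("AAA111111")  # resolved through two changes
new_sid = sid_for_cusip("CCC333333")  # already the current CUSIP
```

Fundamental data reported under any of the three CUSIPs ends up on the same SID, so the price and fundamental histories stay joined across the changes.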

The Morningstar data is "point-in-time" so one can look at company info and ticker over time. Run a pipeline with the fields legal_name and primary_symbol. Those will be the company name and ticker as they may have changed over time.