New Dataset: RBICS Focus (Global Sector Data)

The datasets described herein are proprietary to FactSet Research Systems, Inc. ("Factset") and may not be copied or distributed. The datasets made available to Quantopian by FactSet are not exhaustive of FactSet's data, products, software, and/or services.

RBICS Focus: Revenue-Based Sector Classification Data

FactSet’s Revere Business Industry Classification System (RBICS) is a comprehensive, structured taxonomy designed to classify global companies precisely. RBICS Focus is a dataset that maps each of thousands of the most liquid, publicly traded companies worldwide to a single sector based on its primary line of business. Revenue is the key factor in determining a company’s primary line of business. On Quantopian, RBICS Focus sectors are available at three levels of granularity:

  • Level 1: Economy
  • Level 2: Sector
  • Level 3: Subsector
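
As a rough illustration of what a revenue-based, single-sector mapping means, the sketch below picks the segment with the largest share of revenue. This is only an assumption about the general idea: the post does not describe FactSet's actual rules, and the function name, segment labels, and revenue figures are all hypothetical.

```python
def primary_sector(revenue_by_sector):
    """Return (sector, revenue share) for the largest revenue segment.

    Illustrative only; not FactSet's methodology.
    """
    if not revenue_by_sector:
        raise ValueError("revenue_by_sector must not be empty")
    total = sum(revenue_by_sector.values())
    if total <= 0:
        raise ValueError("total revenue must be positive")
    # The segment with the most revenue is treated as the company's focus.
    sector = max(revenue_by_sector, key=revenue_by_sector.get)
    return sector, revenue_by_sector[sector] / total


# Hypothetical company with two made-up business segments.
example_revenue = {"Retail": 70.0, "Cloud Services": 30.0}
focus, share = primary_sector(example_revenue)
print(focus, round(share, 2))  # Retail 0.7
```

The dataset exposes only the resulting single mapping per company, not the underlying revenue split.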

RBICS Focus data is available via the Pipeline API, which means it can be accessed in Research and the IDE.


Properties

  • Coverage: All supported countries on Quantopian
  • Data Frequency: Daily
  • Update Frequency: Daily (updated overnight after each trading day)
  • Timespan: North America - 2004 to present. Start dates for other regions can be found here
  • Point-in-time start: November 2018
  • Holdout: 1 year

Data Holdout Period

Like other FactSet-sourced datasets, RBICS Focus has a holdout period. In this case, RBICS Focus has a trailing 1-year holdout. This means that the most recent year of data is not accessible in Research and the IDE. However, submitting an algorithm to the contest that uses RBICS Focus data is allowed and all contest scoring is done using the entire, up-to-date dataset. Similarly, algorithms using RBICS Focus data will be evaluated by Quantopian for funding using the full dataset.
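
To make the holdout concrete, here is a minimal sketch of how you might work out the most recent date you can expect to query in Research. It assumes the cutoff is simply "today minus one calendar year"; the exact cutoff rule is not stated in this post, and the helper names are hypothetical.

```python
from datetime import date

HOLDOUT_YEARS = 1  # from the post: trailing 1-year holdout


def holdout_cutoff(today, years=HOLDOUT_YEARS):
    """Latest date assumed to be visible in Research and the IDE."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; fall back to Feb 28.
        return today.replace(year=today.year - years, day=28)


def clip_end_date(requested_end, today):
    """Clip a requested end date so it does not fall inside the holdout."""
    return min(requested_end, holdout_cutoff(today))


print(clip_end_date(date(2019, 12, 31), today=date(2020, 5, 5)))  # 2019-05-05
```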

Point-In-Time

RBICS Focus data has been collected and stored in a point-in-time fashion on Quantopian since November 2018. This corresponds to when Quantopian started downloading and storing the data on a nightly basis. RBICS Focus data prior to November 2018 is timestamped with a 1-day delay to emulate the delay that is expected in the point-in-time segment of the data.
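
One way to picture the emulated delay is sketched below: a record dated before the point-in-time start is treated as usable only from the day after its date. This is an assumption about how the delay behaves, modelled here as one calendar day (it may well be one trading day in practice), and the helper names are hypothetical.

```python
from datetime import date, timedelta

# Assumption: the 1-day delay is modelled as one calendar day.
EMULATED_DELAY = timedelta(days=1)


def first_usable_date(record_date):
    """Earliest simulation date on which a pre-PIT record is assumed usable."""
    return record_date + EMULATED_DELAY


def is_usable(record_date, simulation_date):
    """True if the record would already be known on simulation_date."""
    return simulation_date >= first_usable_date(record_date)


record_date = date(2015, 5, 4)
print(is_usable(record_date, date(2015, 5, 4)))  # False
print(is_usable(record_date, date(2015, 5, 5)))  # True
```

The point of the delay is to avoid lookahead bias: a backtest should not see a classification on the same day it takes effect.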


Methodology

To overcome disparate and non-standardized company disclosure, FactSet created a normalized global industry classification structure. Standardized industry definitions are applied to companies globally. Only primary sources of information disclosed directly by companies via regulatory filings, investor reports, and company press releases are used. FactSet Analysts are trained to interpret information in a consistent manner and input them into a system with built-in quality and error-checking features.

Data quality is monitored using a combination of system and human quality controls. FactSet utilizes technology such as an internally developed document reader with customizable searching and translation tools to augment the data collection and review efficacy. Ultimately, the information quality results from the patented taxonomy and the well-trained analysts following the methodology yet exercising judgement to ensure collection of material data and proper assimilation of the information.


Usage

The RBICSFocus dataset is a pipeline DataSet. The columns of the RBICSFocus dataset can be used like any other BoundColumn in a pipeline.

Example

This code snippet constructs and runs a pipeline that computes the difference between an asset's 1-week return and the 1-week mean return of all assets with the same economy classification. Note that this example uses Factor.demean to group assets by economy classification and subtract the mean return of the group. Make sure to run it in Research.

from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.factset import RBICSFocus
from quantopian.pipeline.domain import US_EQUITIES
from quantopian.pipeline.factors import Returns
from quantopian.research import run_pipeline

# Most recent economy (level 1) classification code for each asset.
economy_focus = RBICSFocus.l1_id.latest

# 1-week return: a 6-day window spans 5 trading-day price changes.
returns_1w = Returns(window_length=6)

# Subtract the mean 1-week return of all assets in the same economy.
returns_1w_less_economy_mean = returns_1w.demean(groupby=economy_focus)

pipe = Pipeline(
    columns={
        'economy_focus': economy_focus,
        'returns_1w_less_economy_mean': returns_1w_less_economy_mean,
    },
    domain=US_EQUITIES,
)

# Run the pipeline over one year and preview the resulting DataFrame.
df = run_pipeline(pipe, '2015-05-05', '2016-05-05')
print(df.head())

The attached notebook provides a similar example as well as some analysis of the RBICS Focus dataset.
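
If the group-wise demeaning in the example is unfamiliar, the plain-Python sketch below shows the arithmetic for a single day: each asset's return minus the mean return of the assets sharing its group label. It is a simplified stand-in, not the Pipeline implementation (it ignores missing values and masks), and the tickers, returns, and economy codes are made up.

```python
from collections import defaultdict


def demean_by_group(values, groups):
    """Subtract each group's mean from the values of its members."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for asset, value in values.items():
        group = groups[asset]
        totals[group] += value
        counts[group] += 1
    return {
        asset: value - totals[groups[asset]] / counts[groups[asset]]
        for asset, value in values.items()
    }


# Hypothetical 1-week returns and economy codes for four made-up assets.
returns = {"AAA": 0.04, "BBB": 0.02, "CCC": -0.01, "DDD": 0.03}
economy = {"AAA": "10", "BBB": "10", "CCC": "20", "DDD": "20"}

demeaned = demean_by_group(returns, economy)
for asset in sorted(demeaned):
    print(asset, round(demeaned[asset], 4))
```

After demeaning, the values within each group sum to zero, so the result measures performance relative to peers in the same economy rather than relative to the whole market.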


Pipeline Datasets and Columns

Dataset

RBICSFocus - The RBICSFocus dataset is a pipeline dataset that provides access to revenue-based sector classifications.

Fields

The RBICSFocus dataset has 7 fields (accessible as BoundColumn attributes):

  • l1_id (dtype str) - Economy classification code based on business focus.
  • l1_name (dtype str) - Economy classification name based on business focus.
  • l2_id (dtype str) - Sector classification code based on business focus.
  • l2_name (dtype str) - Sector classification name based on business focus.
  • l3_id (dtype str) - Subsector classification code based on business focus.
  • l3_name (dtype str) - Subsector classification name based on business focus.
  • asof_date (dtype datetime64[ns]) - The start date (date when the record first applies) of the classification.
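
To show how asof_date relates to the classification columns, here is a minimal sketch of an "as of" lookup: for a given date, take the record with the latest asof_date that is not after it. This is only a conceptual picture of what reading the most recent value of a column amounts to. It ignores the timestamp delay and holdout described earlier, and the records and function name are hypothetical.

```python
from bisect import bisect_right
from datetime import date


def classification_as_of(records, query_date):
    """Return the code in effect on query_date, or None if none applies.

    records: list of (asof_date, code) tuples sorted ascending by asof_date.
    """
    dates = [asof for asof, _ in records]
    # Index of the first record whose asof_date is after query_date.
    idx = bisect_right(dates, query_date)
    if idx == 0:
        return None  # no classification had started yet
    return records[idx - 1][1]


# Hypothetical history for one company: reclassified in June 2016.
history = [(date(2010, 1, 1), "10"), (date(2016, 6, 1), "20")]

print(classification_as_of(history, date(2015, 5, 5)))  # 10
print(classification_as_of(history, date(2016, 6, 1)))  # 20
print(classification_as_of(history, date(2009, 1, 1)))  # None
```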

Other Notes

  • Currently, non-US data is only available in Pipeline in Research. The IDE only has access to US equity data at this time.
  • This is currently the best documentation for the RBICS Focus dataset. We are working on a new set of documentation that will cover all data integrations in one place.
Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

3 responses

Thanks Jamie, and well done to you and everyone else involved in rolling this out. I'm looking forward to exploring this new dataset!

Is it possible to get the breakdown of all the classifications a company belongs to? For example, Amazon might be 70% consumer and 30% technology based on its revenue.

Data is the most important thing, and it is important to maintain the data behind any work. From this post we can see how the data sector is contributing to that progress. I once lost data from my email account due to the Outlook 0x8004010f error.