Back to Community
Feature request: security universe with constantly updated S&P500

Given that the universe size limit has been raised to 500 securities, is Quantopian working at an easy way to have a constantly updated universe with all the 500 securities from the S&P500?

In the Fundamental Data Reference I could not find any data stating whether a security belongs to an Index (e.g. S&P500).

Maybe Q could easily provide a variant of the set_universe tool, with e.g. universe.S&P500Universe as parameter.

9 responses

would b awesome. The primary exchange id works for the nas 100 but not for snp

Hi guys,

Unfortunately, historical index constituent lists for indices like the S&P500 are licensed from the originating data vendor (in this case S&P) and we don't have an agreement in place to offer this data via the backtester. It's been a while since I looked into the costs for historical constituents, I think it would be prohibitive for an individual to license them at the going institutional rate, but if we could find a way to get a better scale deal it might be doable.

If you aren't required by investors to use an index like S&P 500 directly you can always create a proxy for large cap US listed stocks using the Morningstar fundamentals data. Granted that's not perfect and probably also requires a bit of handling edge cases and so on.

Best wishes, Jess

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

I've read that many times about the licensing and prohibitive costs, but couldn't one fairly easily create the historical constituents from scraping the wiki page like: https://en.wikipedia.org/wiki/List_of_S%26P_500_companies

This gives you (of course you have to trust the source) the current, and all changes back to 2000.
Most of the changes seem to come from this source: http://sandp500changes.whw1.com

(I wonder if this question originated after reading Andreas' latest book out this past weekend, where basically you need this historical constituents list to do the ranking, etc.)

So, what am I missing, that it wouldn't be easy enough to scrape the site with a script and create time indexed constituents listings? At least, that was my plan.

@Ken, seems doable, if you need help, please holler. I'm not a king developer but I have time

Hi Guys,

I wrote a parser for the historical constituents of the S&P 100 list on Wikipedia. I haven't tested it in a while so I'm not sure if it still works.

The first thing you need to do is export the XML from the Special Exports page. Then parse the XML with the code below to pull out the historical ticker list.

import xml.etree.ElementTree as ET  
import wikimarkup  
import pandas as pd  
from pyquery import PyQuery  
from datetime import timedelta

def get_tables(wiki):  
    html = PyQuery(wikimarkup.parse(wiki))  
    frames = []  
    for table in html('table'):  
        data = [[x.text.strip() for x in row]  
                for row in table.getchildren()]  
        df = pd.DataFrame(data[1:], columns=data[0])  
        frames.append(df)  
    return frames

tree = ET.parse('/path/to/Wikipedia-20141121081732.xml')  
root = tree.getroot()

sp100_hist = dict()  
components = list()  
for child in root[1]:  
    for table in child.findall('{http://www.mediawiki.org/xml/export-0.10/}text'):  
        components = get_tables(table.text)  
    if not components:  
        continue  
    for ts in child.findall('{http://www.mediawiki.org/xml/export-0.10/}timestamp'):  
        try:  
            sp100_hist[ts.text] = components[0]['Symbol']  
        except KeyError:  
            continue

symb = pd.DataFrame.from_dict(sp100_hist)

# find unique columns of symbols and drop the rest  
symb_unique = symb.copy()  
for i, col in enumerate(symb):  
    if i > 0:  
        if np.array_equal(symb[col], symb[prev_ts]):  
            symb_unique.drop(col, inplace=True, axis=1)  
    prev_ts = col  

@Peter it would be great if you could invest in it a little of your time!!!!

Concerning the sources I have found also
http://faq.library.upenn.edu/recordDetail?id=36133&library=manchester_business&institution=manchester

A CSV file with the list in the format required by the fetcher could be made available to everybody interested, e.g. in Dropbox.
Then we could use the fetcher (fetch_csv) to have a constantly updated S&P500 Universe in our algos.

In the CSV file the full list of S&P stocks should be given only for the days in which there was a change in the index constituents.
If there are no stocks for a date in the file, the stocks from the previous day are forward-filled in the Universe.
The CSV file could be backward filled at least until 2007 according to the WIKI page.

It would be great then to have a software running somewhere once per day that checks if there are changes to the index constituents and if so it automatically adds at the end of the CSV file the new 500 constituents with the current date.

@Nicola, yes WRDS is the database folks commonly point to to answer this question, but this is probably more of what Jessica was referring to perhaps. You have to be in an organization that has access to this, etc.
Hence my suggestion to use a public site like wikipedia to start with. True, it may not stay up to date, but not a bad start.

see my answer here: https://www.quantopian.com/posts/need-list-of-s-and-p-500-stocks, based on the holding of IVV

Clone Algorithm
44
Loading...
Backtest from to with initial capital
Total Returns
--
Alpha
--
Beta
--
Sharpe
--
Sortino
--
Max Drawdown
--
Benchmark Returns
--
Volatility
--
Returns 1 Month 3 Month 6 Month 12 Month
Alpha 1 Month 3 Month 6 Month 12 Month
Beta 1 Month 3 Month 6 Month 12 Month
Sharpe 1 Month 3 Month 6 Month 12 Month
Sortino 1 Month 3 Month 6 Month 12 Month
Volatility 1 Month 3 Month 6 Month 12 Month
Max Drawdown 1 Month 3 Month 6 Month 12 Month
import pandas as pd
import numpy as np
import datetime
import pytz

base_universe_df = pd.DataFrame()
symbols_blacklisted =[]
today = None

def initialize(context):
    global today
    context.count =0
    context.start = get_environment("start")
    context.end = get_environment("end")
    today = context.start
    context.universechanges = None
    # import the custom CSV data file
    #possible dates, take the last one
    posDates = pd.date_range(start=str(context.start), end=str(context.end), freq='BM', tz="UTC")
    dateToUse = posDates[0]
    strDateToUse = '%s/%s/%s' % (dateToUse.year, dateToUse.month, dateToUse.day)
    url = "https://www.ishares.com/us/products/239726/ishares-core-sp-500-etf/1395165510754.ajax?fileType=csv&fileName=IVV_holdings&dataType=fund&asOfDate="+strDateToUse
    #Below temporary solution until I found a way to make the dynamic solution work (generates a 500, probably has to do with headers): valid dates are  in the bottome part of this algo in a comment
    url ="https://copy.com/zbszbonNR1FXuwWg"
    
    fetch_csv(url,
              pre_func=add_date,
              post_func=postviewbase,
              skiprows=10,
              universe_func=my_universe_start)

    
def add_date(df):
    global today 
    if today is None:
        date = get_environment("start")
    else:    
        date = today
        
    df['date']=date
    df  = df.rename(columns={'Ticker': 'symbol'})
    df  = df.rename(columns={'Weight (%)': 'Weight'})
    df  = df.rename(columns={'Market Value': 'MarketValue'})
    df = df[df.symbol.notnull()]
    for stock in df.symbol:
        for blacklisted in symbols_blacklisted:
            df = df[df.symbol != blacklisted]

    df.fillna(method='ffill') 
    df = df[['date', 'symbol','ISIN','Name','Weight','Sector','MarketValue','Shares','Price']]
    return df
    
def postviewbase(df):
    global base_universe_df
    base_universe_df = df
    return df

# my_universe returns a set of securities that define your universe.
def my_universe_start(context, fetcher_data):
    global base_universe_df
                
    log.info('total universe size:'+str(len(base_universe_df)))
    # return the securities we identified earlier
    return base_universe_df['sid']



def handle_data(context,data):
    #do stuff
    pass

'''
valid dates for download from:

"https://www.ishares.com/us/products/239726/ishares-core-sp-500-etf/1395165510754.ajax?fileType=csv&fileName=IVV_holdings&dataType=fund&asOfDate="+

2015/06/19, 2015/05/29, 2015/04/30, 2015/03/31, 2015/02/27, 2015/01/30, 2014/12/31, 2014/11/28, 2014/10/31, 2014/09/30, 2014/08/29, 2014/07/31, 2014/06/30, 2014/05/30, 2014/04/30, 2014/03/31, 2014/02/28, 2014/01/31, 2013/12/31, 2013/11/29, 2013/10/31, 2013/09/30, 2013/08/30, 2013/07/31, 2013/06/28, 2013/05/31, 2013/04/30, 2013/03/28, 2013/02/28, 2013/01/31, 2012/12/31, 2012/11/30, 2012/10/31, 2012/09/28, 2012/08/31, 2012/07/31, 2012/06/29, 2012/05/31, 2012/04/30, 2012/03/30, 2012/02/29, 2012/01/31, 2011/12/30, 2011/11/30, 2011/10/31, 2011/09/30, 2011/08/31, 2011/07/29, 2011/06/30, 2011/05/31, 2011/04/29, 2011/03/31, 2011/02/28, 2011/01/31, 2010/12/31, 2010/11/30, 2010/10/29, 2010/09/30, 2010/08/31, 2010/07/30, 2010/06/30, 2010/05/28, 2010/04/30, 2010/03/31, 2010/02/26, 2010/01/29, 2009/12/31, 2009/11/30, 2009/10/30, 2009/09/30, 2009/08/31, 2009/07/31, 2009/06/30, 2009/05/29, 2009/04/30, 2009/03/31, 2009/02/27, 2009/01/30, 2008/12/31, 2008/11/28, 2008/10/31, 2008/09/30, 2008/08/29, 2008/07/31, 2008/06/30, 2008/05/30, 2008/04/30, 2008/03/31, 2008/02/29, 2008/01/31, 2007/12/31, 2007/11/30, 2007/10/31, 2007/09/28, 2007/08/31, 2007/07/31, 2007/06/29, 2007/05/31, 2007/04/30, 2007/03/30, 2007/02/28, 2007/01/31, 2006/12/29, 2006/11/30, 2006/10/31, 2006/09/29, 2006/08/31, 2006/07/31, 2006/06/30, 2006/05/31, 2006/04/28, 2006/03/31, 2006/02/28, 2006/01/31, 2005/12/30, 2005/11/30, 2005/10/31, 2005/09/30, 2005/08/31, 2005/07/29, 2005/06/30, 2005/05/31, 2005/04/29, 2005/03/31, 2005/02/28, 2005/01/31, 2004/12/31, 2004/11/30, 2004/10/29, 2004/09/30, 2004/08/31, 2004/07/30, 2004/06/30, 2004/05/28, 2004/04/30, 2004/03/31, 2004/02/27, 2004/01/30, 2003/12/31, 2003/11/28, 2003/10/31, 2003/09/30, 2003/08/29, 2003/07/31, 2003/06/30, 2003/05/30, 2003/04/30, 2003/03/31, 2003/02/28, 2003/01/31, 2002/12/31, 2002/11/29, 2002/10/31, 2002/09/30, 2002/08/30, 2002/07/31, 2002/06/28, 2002/05/31, 2002/04/30, 2002/03/28, 2002/02/28, 2002/01/31, 2001/12/31, 2001/11/30, 2001/10/31, 2001/09/28, 2001/08/31, 2001/07/31, 2001/06/29, 2001/05/31, 2001/04/30, 2001/03/30, 2001/02/28, 2001/01/31, 2000/12/29, 2000/11/30, 2000/10/31, 2000/09/29, 2000/08/31, 2000/07/31, 2000/06/30, 2000/05/31


'''
There was a runtime error.