Back to Community
Cant Create a columns for SID & Symbols Name

Hi Fellow Quants,

I am trying to build a data set that creates a Data Frame with the following columns:

SID, Symbol Name, Price, Volume, Market Cap, Super Sector Name, Super Sector Code, Sector Name, Sector Code, Industry Group, Industry Group Code, Industry Name, Industry Code

I took the sector & industry classifications based on the MorningStar classification on the fundamentals page and inserted them into a CSV that I am pulling using the local_csv function.

https://www.quantopian.com/help/fundamentals

The challenge is that when i merged the 2 datasets and deduplicated on column names, I lost the column created by default for pipelines that shows SID & Symbol it looks like "Equity(24 [AAPL]) ".

Can someone look at my code and show me how my merge between the 2 data frames is removing this column. This or how to pull SID & Symbol name as a column would be greatly appreciated.

Loading notebook preview...
Notebook previews are currently unavailable.
1 response

The issue is that the 'SID & Symbol which looks like "Equity(24 [AAPL])' is the asset object. This is the index of the pipeline dataframe. The 'merge' method ignores this index when joining on columns (see http://pandas.pydata.org/pandas-docs/version/0.18/generated/pandas.merge.html ). To get around that, first move the index to a column, then do the merge, then set the index back to the index column. This can be done in one line.

result10.reset_index().merge(test, on=['Industry Code', 'Super Sector Code', 'Sector Code', 'Industry Group Code']).set_index('index')

This assumes the pipeline output is 'result10'. I didn't actually try this and it may get confused with the pipeline output having a multi-index of both assets AND dates, but this is a general approach. You may also need to add a 'how' parameter to the 'join' method? One subtle point in the above code is that, by default, the 'reset_index' method sets the name for the new column it makes to 'index'. That is why 'set_index('index')' works.

I didn't have the CSV file to test this with, so again my apologies if it doesn't work exactly. The general approach however should work.

Good luck.