OneHot encoding Sectors within pipeline factor call?

I'd like to convert the sector information from Sector() into a one-hot encoded array and return those columns rather than the standard Sector column. I have a function that takes the sector values and turns them into a binary array, but I'm not sure how to override the Sector method so that it returns the encoded values.

MORNINGSTAR_SECTOR_CODE = {  
     -1: 'Misc',  
    101: 'Basic Materials',  
    102: 'Consumer Cyclical',  
    103: 'Financial Services',  
    104: 'Real Estate',  
    205: 'Consumer Defensive',  
    206: 'Healthcare',  
    207: 'Utilities',  
    308: 'Communication Services',  
    309: 'Energy',  
    310: 'Industrials',  
    311: 'Technology',
}

import pandas as pd
from sklearn import preprocessing

def oneHot_sectors(sector_keys):
    ##- Convert the Sectors column into binary labels
    sector_keys = list(sector_keys)  # materialize so dict.keys() can be reused as the index
    sector_binarizer = preprocessing.LabelBinarizer()
    # LabelBinarizer didn't like float values, so convert to strings.
    # A list (not a bare map object) so it can be consumed more than once.
    strlbls = [str(k) for k in sector_keys]
    sector_labels_bin = sector_binarizer.fit_transform(strlbls)  # this is now 12 binary columns from 1 categorical

    ##- Create a pandas DataFrame from the new binary labels
    colNames = ["S_Label_" + str(i) for i in range(sector_labels_bin.shape[1])]
    sLabels = pd.DataFrame(data=sector_labels_bin, index=sector_keys, columns=colNames)
    return sLabels

Now, oneHot_sectorLabels = oneHot_sectors(MORNINGSTAR_SECTOR_CODE.keys()) creates a pandas DataFrame with one row per sector code and 12 columns holding a binary encoding of the labels.

What I'd like to do is create a CustomFactor which will map the Sector to the correct row and return those values. Is this possible?
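
To make the goal concrete, here is a minimal pandas-only sketch of the row lookup I have in mind, run outside of pipeline. It does not use CustomFactor at all. The helper names build_lookup and encode_sectors, the asset labels, and the choice to send unknown codes to -1 ('Misc') are all made up for illustration. The lookup table is built directly as an identity layout rather than with LabelBinarizer, which gives the same column order for these codes.

```python
import pandas as pd

# Sector codes from MORNINGSTAR_SECTOR_CODE, in sorted order.
SECTOR_CODES = [-1, 101, 102, 103, 104, 205, 206, 207, 308, 309, 310, 311]

def build_lookup(codes):
    """Hypothetical helper: one row per sector code, one binary column per code."""
    n = len(codes)
    cols = ["S_Label_" + str(i) for i in range(n)]
    data = [[int(i == j) for j in range(n)] for i in range(n)]
    return pd.DataFrame(data, index=codes, columns=cols)

def encode_sectors(sector_values, lookup):
    """Hypothetical helper: map each asset's sector code to its one-hot row.

    Codes missing from the lookup fall back to -1 ('Misc'); that fallback
    is an assumption, not something Sector() is known to require.
    """
    known = [c if c in lookup.index else -1 for c in sector_values]
    encoded = lookup.loc[known]
    encoded.index = sector_values.index  # re-label rows by asset
    return encoded

# Made-up per-asset sector codes, standing in for one day of Sector() output.
sectors = pd.Series([311, 206, 101], index=["ASSET_A", "ASSET_B", "ASSET_C"])
encoded = encode_sectors(sectors, build_lookup(SECTOR_CODES))
```

Here encoded has one row per asset and 12 binary columns, which is the shape I'd like the pipeline to return in place of the single Sector column.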