Help: Pipeline results dataframe has extra (hidden) entries in index

I'm obviously doing something wrong here (or expecting the wrong thing) but I can't see what it is (yes I've done the searching!)

The code below (condensed into a single cell for brevity) creates a pipeline which filters to the bottom 10 by AverageDollarVolume. I call run_pipeline() with a date range that spans 17 calendar days, 11 trading days. The resulting dataframe has 11 values in index.levels[0], as expected. Each day has 10 rows, as expected. The dataframe has 110 rows in total, as expected.

I want to extract the list of assets, so I use results.index.levels[1].unique(), a technique that is used in the documentation. However, this returns an array of 8930 (in my case) assets, and this number is the same however wide or narrow I make the filter. I expected this to be the list of assets referenced in the index, so between 10 and a maximum of 110 (but probably more like 15) as run here.

On one hand this feels like a pandas problem, because run_pipeline() returns a pandas DataFrame and I am then only using pandas methods on it. On the other hand it feels like a Quantopian problem, because I have never seen this behaviour in a DataFrame produced by any other means. Help!

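def make_pipeline(filterWidth=10):

    # Dollar volume factor
    # (the window_length argument was lost from the post; 10 is an assumed value)
    dollar_volume = AverageDollarVolume(inputs=[USEquityPricing.close, USEquityPricing.volume],
                                        window_length=10)
    # 10-day close price average
    mean_10 = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=10)

    # Keep only the filterWidth assets with the lowest dollar volume
    filter_dollar_volume = dollar_volume.bottom(filterWidth)
    return Pipeline(
        columns={
            'meanclose': mean_10,
            'dolvol': dollar_volume
        },
        screen=filter_dollar_volume
    )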

filterWidth = 10  
results = run_pipeline(make_pipeline(filterWidth), '2020-05-15', '2020-06-01')  
dateCount = len(results.index.levels[0])  
asset_list = list(results.index.levels[1].unique())  
print('''Number of dates in index.levels[0]: {0}
Number of rows in dataframe: {1}
Product of dates and filter width: {2}
Length of results.index.levels[1].unique(): {3}'''.format(
           dateCount,
           len(results),
           dateCount * filterWidth,
           len(asset_list)))
print('First 5 of asset list:\n', asset_list[:5])  

If you want to run it, my imports are these:

from quantopian.pipeline import Pipeline  
from quantopian.research import run_pipeline  
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import SimpleMovingAverage, AverageDollarVolume  
import pandas as pd  

Thanks in anticipation...

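Yes, you are correct that this behavior comes from pandas. The levels attribute keeps every label stored in the index, including labels that no remaining row uses, whereas get_level_values returns only the labels attached to actual rows. The simple answer is therefore to use the get_level_values method and not levels. So, this will get you what you expected: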

# Don't use 'levels'  
asset_list = list(results.index.levels[1].unique())  

# Use 'get_level_values' instead  
asset_list = list(results.index.get_level_values(level=1).unique())
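
If you want to see the difference without running a pipeline, here is a minimal sketch in plain pandas. The dates, tickers and the 'dolvol' column are made up for illustration, and the boolean filter only stands in for what a pipeline screen does; it is not Quantopian's actual implementation.

```python
import pandas as pd

# Hypothetical stand-in for pipeline output: 3 dates x 4 made-up tickers.
dates = pd.to_datetime(['2020-05-15', '2020-05-18', '2020-05-19'])
tickers = ['AAA', 'BBB', 'CCC', 'DDD']
index = pd.MultiIndex.from_product([dates, tickers], names=['date', 'asset'])
full = pd.DataFrame({'dolvol': list(range(len(index)))}, index=index)

# Keep only two tickers, roughly as a screen keeps only some assets.
keep = full.index.get_level_values('asset').isin(['AAA', 'CCC'])
results = full[keep]

# 'levels' still holds every ticker stored in the index.
all_labels = list(results.index.levels[1])

# 'get_level_values' holds only tickers that appear in the remaining rows.
used_labels = list(results.index.get_level_values(1).unique())

print('rows:', len(results))
print('levels[1]:', all_labels)
print('get_level_values(1).unique():', used_labels)
```

In this sketch all_labels still contains all four tickers after filtering, while used_labels contains only the two that were kept, which is the same mismatch as the 8930 assets in the question.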

The less simple answer, along with a notebook, can be found in this post.
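
A related option in plain pandas, which may or may not be what the linked post covers, is MultiIndex.remove_unused_levels(). It returns a new index whose levels contain only the labels still in use. It was added in pandas 0.20, so it may not be available on every platform. The sketch below uses made-up tickers and a hypothetical 'meanclose' column.

```python
import pandas as pd

# Hypothetical index: 2 dates x 3 made-up tickers.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(['2020-05-15', '2020-05-18']), ['AAA', 'BBB', 'CCC']],
    names=['date', 'asset'])
frame = pd.DataFrame({'meanclose': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=index)

# Drop every row for one ticker; its label stays behind in 'levels'.
subset = frame[frame.index.get_level_values('asset') != 'BBB']
before = list(subset.index.levels[1])

# remove_unused_levels() returns a new MultiIndex and leaves the original alone.
pruned = subset.index.remove_unused_levels()
after = list(pruned.levels[1])

print('before:', before)
print('after:', after)
```

The pruned index can be assigned back to a copy of the dataframe if you want levels itself to reflect only the assets in use.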

Good luck.


Dan, thanks very much for your prompt and helpful reply. The github discussion seems to have been going on for a long time and for many versions of pandas. It's a pity they chose to spend so much time infighting - they could have fixed the issue by now ;-)

Anyway, you've given me an insight which will be useful beyond Quantopian, and a practical workaround. Cheers!

@Peter Cahill Always glad to help when I can.

(and I've checked it out and it works. Thanks again!)