Analyzing Pipeline Data from Research notebook - How to get data using Equity object?

Hi there,

Say I have data that I got from running a pipeline, i.e.:

my_pipe = make_pipeline()  
data = run_pipeline(my_pipe, '2014-05-05', '2015-05-05')  

How do I get, for example, the data for AAPL only? In the pipeline result, the equities appear to be zipline.assets._assets.Equity objects, and running from zipline.assets._assets import Equity is not allowed. How else can I create an Equity object?

11 responses

You have two index levels that are not columns. Convert at least one of them into a column, then access the data as normal. Here is how I did it:

result.reset_index(inplace=True)
result.rename(columns={'level_0': 'Current Day',
                       'level_1': 'sid'}, inplace=True)

where 'Ps' is the price-to-sales ratio column:

for i in range(len(result)):
    if result['sid'][i].symbol == 'AAPL':
        print('AAPL', result['Ps'][i])
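If the loop is too slow, the same selection can be written as a vectorized boolean mask on the 'sid' column. A minimal sketch, assuming a hypothetical stand-in `Equity` class (real zipline assets can't be constructed outside Quantopian; it is assumed to compare and hash by sid) and illustrative 'Ps' values:

```python
import pandas as pd


class Equity:
    """Hypothetical stand-in for zipline's Equity: compared, hashed and ordered by sid."""
    def __init__(self, sid, symbol):
        self.sid, self.symbol = sid, symbol
    def __eq__(self, other):
        return isinstance(other, Equity) and self.sid == other.sid
    def __lt__(self, other):
        return self.sid < other.sid
    def __hash__(self):
        return hash(self.sid)


dates = pd.to_datetime(['2014-05-05', '2014-05-06'])
aapl, msft = Equity(1, 'AAPL'), Equity(2, 'MSFT')
# Illustrative pipeline-style output: unnamed (date, asset) MultiIndex, one 'Ps' column.
result = pd.DataFrame({'Ps': [3.1, 4.2, 3.2, 4.1]},
                      index=pd.MultiIndex.from_product([dates, [aapl, msft]]))

# Same conversion as in the post: unnamed levels become 'level_0'/'level_1' columns.
result.reset_index(inplace=True)
result.rename(columns={'level_0': 'Current Day', 'level_1': 'sid'}, inplace=True)

# Vectorized replacement for the loop: mask rows by each asset's .symbol.
is_aapl = result['sid'].map(lambda asset: asset.symbol) == 'AAPL'
aapl_ps = result.loc[is_aapl, ['Current Day', 'Ps']]
print(aapl_ps)
```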

Thank you for replying to this thread.

Unfortunately, I can't use a for loop or the == operator here. I need to select the data directly with pandas. See the following code:

eq1 = ...  # How do I create an Equity object here?
df = pd.DataFrame(data.loc[:, 'close_price'].xs(eq1, level=1))

Any idea?
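One way to avoid constructing an Equity by hand is to pull the object out of the pipeline output's own asset level and pass that to `.xs`. A sketch on a small hypothetical frame; the `Equity` class is a stand-in assumed to compare and hash by sid, and the prices are made up:

```python
import pandas as pd


class Equity:
    """Hypothetical stand-in for zipline's Equity: compared, hashed and ordered by sid."""
    def __init__(self, sid, symbol):
        self.sid, self.symbol = sid, symbol
    def __eq__(self, other):
        return isinstance(other, Equity) and self.sid == other.sid
    def __lt__(self, other):
        return self.sid < other.sid
    def __hash__(self):
        return hash(self.sid)


dates = pd.to_datetime(['2014-05-05', '2014-05-06'])
# (date, asset) MultiIndex, the layout run_pipeline returns.
index = pd.MultiIndex.from_product([dates, [Equity(1, 'AAPL'), Equity(2, 'MSFT')]])
data = pd.DataFrame({'close_price': [100.0, 50.0, 101.0, 51.0]}, index=index)

# Find the Equity by symbol among the objects already stored in level 1.
eq1 = next(asset for asset in data.index.levels[1] if asset.symbol == 'AAPL')

# Cross-section on the asset level: one row per date for that equity.
df = data['close_price'].xs(eq1, level=1).to_frame()
print(df)
```

Inside Quantopian research, the symbols function shown further down the thread is the more direct way to obtain the object.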


This will give you the Equity object for SPY:

p = get_pricing(['SPY','TLT'])  
SPY = p.axes[2][0]  

But I can't use that Equity object for selection, e.g. result[result.index == SPY]. Any idea?
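A likely reason result[result.index == SPY] fails on pipeline output is that each element of a (date, asset) MultiIndex is a tuple, never a bare Equity. Comparing against the asset level alone builds a usable mask. A sketch with a hypothetical stand-in `Equity` class (assumed to compare by sid) and made-up values:

```python
import pandas as pd


class Equity:
    """Hypothetical stand-in for zipline's Equity: compared, hashed and ordered by sid."""
    def __init__(self, sid, symbol):
        self.sid, self.symbol = sid, symbol
    def __eq__(self, other):
        return isinstance(other, Equity) and self.sid == other.sid
    def __lt__(self, other):
        return self.sid < other.sid
    def __hash__(self):
        return hash(self.sid)


dates = pd.to_datetime(['2017-01-03', '2017-01-04'])
spy, tlt = Equity(1, 'SPY'), Equity(2, 'TLT')
result = pd.DataFrame({'close': [10.0, 20.0, 11.0, 21.0]},
                      index=pd.MultiIndex.from_product([dates, [spy, tlt]]))

# Each index entry is a (Timestamp, Equity) tuple, so it never equals the bare Equity.
first = result.index[0]

# Compare against level 1 (the assets) only, then select with the boolean mask.
spy_rows = result[result.index.get_level_values(1) == spy]
print(spy_rows)
```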

@Jay Teguh Wijaya
You should be able to select and compare against an Equity object in the manner posted above. There are better ways, but that should work.

The problem you may be having is that both the 'get_pricing' and 'run_pipeline' methods return DataFrames (or Panels), sometimes with a MultiIndex and sometimes not.

The key is understanding what type of object 'result' is, how it's indexed, and how many dimensions it has. This varies depending on whether you use 'get_pricing' or 'run_pipeline', and also on which parameters are passed (see the documentation for each).
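A quick way to answer those questions is to inspect the object directly. A minimal sketch; `describe_frame` is a hypothetical helper, and the two frames only mimic the assumed shapes (pipeline output with a (date, asset) MultiIndex, and single-field pricing with dates as rows and assets as columns), using strings in place of Equity objects:

```python
import pandas as pd


def describe_frame(obj):
    """Hypothetical helper: summarize container type, index depth and level names."""
    return {
        'type': type(obj).__name__,
        'nlevels': obj.index.nlevels,
        'names': list(obj.index.names),
        'ndim': obj.ndim,
    }


days = pd.to_datetime(['2017-01-03', '2017-01-04'])
# Mimics pipeline output: two index levels (date, asset).
pipe_like = pd.DataFrame({'close_price': [1.0, 2.0, 3.0, 4.0]},
                         index=pd.MultiIndex.from_product([days, ['SPY', 'TLT']]))
# Mimics single-field pricing output: dates as rows, assets as columns.
pricing_like = pd.DataFrame({'SPY': [1.0, 3.0], 'TLT': [2.0, 4.0]}, index=days)

print(describe_frame(pipe_like))
print(describe_frame(pricing_like))
```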

Could you be more explicit about what 'result' and 'SPY' are? Maybe post a notebook?

Thanks for replying, @Dan Whitnable

Please see the attached example notebook.


There are a lot of ways to get the data for a specific security. The output of run_pipeline is a MultiIndexed DataFrame: the outer (first) index level is the date and the inner level is the asset. Again, there are many ways to do this, but I like the '.xs' method for extracting a single day's slice:

result_today = result.xs(today)
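Because the date is the outer level, xs(today) returns a frame indexed by asset alone. A sketch on a hypothetical frame, with a stand-in `Equity` class assumed to hash and compare by sid, a made-up 'XYZ' ticker, and illustrative values:

```python
import pandas as pd


class Equity:
    """Hypothetical stand-in for zipline's Equity: compared, hashed and ordered by sid."""
    def __init__(self, sid, symbol):
        self.sid, self.symbol = sid, symbol
    def __eq__(self, other):
        return isinstance(other, Equity) and self.sid == other.sid
    def __lt__(self, other):
        return self.sid < other.sid
    def __hash__(self):
        return hash(self.sid)


today = pd.Timestamp('2017-01-03')
arnc, other = Equity(1, 'ARNC'), Equity(2, 'XYZ')
result = pd.DataFrame(
    {'Ps': [0.5, 1.5, 0.6, 1.6]},
    index=pd.MultiIndex.from_product([[today, pd.Timestamp('2017-01-04')], [arnc, other]]),
)

# xs on the outer (date) level drops that level: rows are now keyed by Equity only.
result_today = result.xs(today)
print(result_today)
```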

Now, select a particular security and data column (remember that the pipeline can return many columns of data). I like using the 'at' method. The 'loc' method is perhaps more common, though.

# Use the symbols function to get a security object when the symbol is known
arnc_object = symbols('ARNC')

# Use the 'at' method to get a single value from a dataframe.
# The first parameter is the index label, the second is the column name
result_today.at[arnc_object, 'exchange_id']
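The difference between 'at' and 'loc' can be seen on a one-day slice. A sketch with a hypothetical stand-in `Equity` class (assumed to hash and compare by sid) and placeholder 'exchange_id' strings, not real exchange codes:

```python
import pandas as pd


class Equity:
    """Hypothetical stand-in for zipline's Equity: compared, hashed and ordered by sid."""
    def __init__(self, sid, symbol):
        self.sid, self.symbol = sid, symbol
    def __eq__(self, other):
        return isinstance(other, Equity) and self.sid == other.sid
    def __lt__(self, other):
        return self.sid < other.sid
    def __hash__(self):
        return hash(self.sid)


arnc, other = Equity(1, 'ARNC'), Equity(2, 'XYZ')
# Stand-in for a one-day slice of pipeline output, indexed by Equity objects.
result_today = pd.DataFrame({'exchange_id': ['EXCH_A', 'EXCH_B'], 'Ps': [0.5, 1.5]},
                            index=pd.Index([arnc, other]))

# .at takes exactly one row label and one column label and returns a scalar.
exchange = result_today.at[arnc, 'exchange_id']
# .loc does the same lookup but also accepts lists or slices of labels.
row = result_today.loc[arnc, ['exchange_id', 'Ps']]
print(exchange, row.tolist())
```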

See the attached notebook.


That's awesome! Thank you, @Dan! I haven't used MultiIndex DataFrames much, so I didn't know the xs trick.

I ran into the same issue. When running a pipeline that returns several equities over multiple days (a MultiIndex DataFrame), I perform the following steps:

# get a df from my pipeline  
result= run_pipeline(make_pipeline(), '2017-01-03','2017-04-03')

# name the two index levels for easier identification
result.index.names = ['dates','equities']

# retrieve the slice of data for a specific equity over the entire date range

# alternatively, retrieve all equities for a specific date
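Once the levels are named 'dates' and 'equities', both commented steps could be written with .xs and the level name. A sketch on a small hypothetical frame; the `Equity` class is a stand-in assumed to hash and compare by sid, and the values are illustrative:

```python
import pandas as pd


class Equity:
    """Hypothetical stand-in for zipline's Equity: compared, hashed and ordered by sid."""
    def __init__(self, sid, symbol):
        self.sid, self.symbol = sid, symbol
    def __eq__(self, other):
        return isinstance(other, Equity) and self.sid == other.sid
    def __lt__(self, other):
        return self.sid < other.sid
    def __hash__(self):
        return hash(self.sid)


dates = pd.to_datetime(['2017-01-03', '2017-01-04', '2017-01-05'])
aapl, msft = Equity(1, 'AAPL'), Equity(2, 'MSFT')
result = pd.DataFrame({'Ps': [1.0, 2.0, 1.1, 2.1, 1.2, 2.2]},
                      index=pd.MultiIndex.from_product([dates, [aapl, msft]]))
result.index.names = ['dates', 'equities']

# One equity over the whole date range: the remaining index is 'dates'.
aapl_history = result.xs(aapl, level='equities')
# Alternatively, every equity on one date: the remaining index is 'equities'.
one_day = result.xs(pd.Timestamp('2017-01-04'), level='dates')
print(aapl_history)
print(one_day)
```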

I hope this is useful to others.

Thanks Mitch - this helped!