Access Data in Pipeline Output Pandas Dataframe

I apologize in advance if this is a basic question, but I am having trouble accessing the data in the dataframe that is returned from my Pipeline. As I understand, the dataframe that is returned by the Pipeline will have the assets that pass my screen as the rows, and any factors I applied as columns. How would I access one cell in that dataframe? I thought it would be something along the lines of:

val = context.output.iloc[etf]['sma_10']

where the etf object comes from iterating over the context.portfolio.positions dictionary in a for-each loop. However, when I call it like this I get this error:

TypeError: cannot do positional indexing on class 'pandas.indexes.base.Index' with these indexers [Equity(9458 [SGY])] of type 'zipline.assets._assets.Equity'

I can do some hack workaround by figuring out the position of the asset I am looking for in the frame, and then using that, but I feel like something like this must surely be built in to the pandas library. Please let me know if this question needs any clarification. Thanks!

7 responses

Take a look at this post https://www.quantopian.com/posts/keyerror-when-i-try-to-get-column-data-from-pipeline-output

Generally the fastest way to read a single value from a dataframe is the '.get_value' method (http://pandas.pydata.org/pandas-docs/version/0.19.2/generated/pandas.DataFrame.get_value.html). Assuming the dataframe has a column labeled 'price', something like this should work:

aapl = symbols('AAPL')  
aapl_price = pipeline_output_df.get_value(aapl, 'price')
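A note for readers on newer pandas: 'get_value' was deprecated in pandas 0.21 and removed in 1.0, and '.at' is the label-based accessor for a single cell. Below is a minimal sketch of the same lookup. It assumes plain ticker strings as stand-ins for the Equity objects in the index, and the prices are made up for illustration.

```python
import pandas as pd

# Stand-in for a pipeline output: ticker strings play the role of the
# Equity objects in the index, and the values are made up.
pipeline_output_df = pd.DataFrame(
    {"price": [150.0, 42.5], "sma_10": [148.2, 41.9]},
    index=["AAPL", "SGY"],
)

# .at reads one cell by row label and column label.
aapl_price = pipeline_output_df.at["AAPL", "price"]

# .loc is also label-based and handles scalars, rows, and slices.
sgy_sma = pipeline_output_df.loc["SGY", "sma_10"]

# .iloc is strictly positional: it needs an integer position, which is
# why passing an asset object to it raises a TypeError.
first_row_price = pipeline_output_df.iloc[0]["price"]
```

The difference from the original attempt is only the accessor: '.iloc' expects integer positions, while '.at' and '.loc' expect the index labels themselves.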

Here's a notebook showing this in action...


Hi Dan,

Thanks for the quick response, worked like a charm! There is one slight problem using '.get_value' when iterating through the portfolio positions dictionary, however. It seems that the asset keys in this dictionary are formatted differently than in the pipeline output:

That is, in the pipeline output the asset looks like:

Equity(42247 [MEMP])

and the positions dictionary looks like:

Equity(42247, symbol=u'MEMP', asset_name=u'AMPLIFY ENERGY CORP', exchange=u'NASDAQ', start_date=Timestamp('2011-12-09 00:00:00+0000', tz='UTC'), end_date=Timestamp('2017-05-05 00:00:00+0000', tz='UTC'), first_traded=None, auto_close_date=Timestamp('2017-05-10 00:00:00+0000', tz='UTC'), exchange_full=u'NASDAQ GLOBAL MARKET')

So it is unable to key using the second. Is there any quick conversion that lets me use the second key like the first one? Apologies if this doesn't make sense.
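One possible explanation, stated here as an assumption rather than something this thread confirms: the two printouts may be the short (str) and long (repr) forms of the same object, in which case how the key prints has no bearing on whether a lookup succeeds. The sketch below uses a hypothetical FakeEquity class, not the real zipline class, to show that equality and hashing, not the printed form, decide whether a key matches.

```python
class FakeEquity:
    """Hypothetical stand-in for an asset object; not the zipline class."""

    def __init__(self, sid, symbol):
        self.sid = sid
        self.symbol = symbol

    def __str__(self):
        # Short form, similar to what the pipeline index shows.
        return "Equity(%d [%s])" % (self.sid, self.symbol)

    def __repr__(self):
        # Longer form, similar to what printing a dictionary key shows.
        return "Equity(%d, symbol=%r)" % (self.sid, self.symbol)

    def __eq__(self, other):
        # Assumption: two assets are the same if their ids match.
        return isinstance(other, FakeEquity) and self.sid == other.sid

    def __hash__(self):
        return hash(self.sid)


memp = FakeEquity(42247, "MEMP")
short_form = str(memp)
long_form = repr(memp)

# A separately constructed object with the same id finds the same entry,
# regardless of which printed form the caller happened to see.
values = {memp: 1.23}
found = values[FakeEquity(42247, "MEMP")]
```

If the lookup still fails with objects that compare equal, the more likely cause is that the asset is simply absent from the dataframe's index.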

Hmm, not sure why there is a problem. It would help if you attached a backtest. The following should both work...

    for stock in context.portfolio.positions:  
        price = context.output.get_value(stock, 'latest_close')  


    for stock in context.output.index:  
        price = context.output.get_value(stock, 'latest_close')  

Attached is some code with this in action.


Hi Dan,

Attached is a backtest. This one succeeded because I commented out the lines that were causing the error (75-80). The line specifically in question is line 77 (and, in turn, line 79, since it does the same operation). Thanks for your patience and all your help!


The problem is you are setting a screen in your pipeline.

    pipe = Pipeline(
        screen = (wr<=-80),
        columns = {
            'W%R': wr,
            'R1': r1,
            'S1': s1,
        }
    )

The pipeline therefore only returns securities each day where wr<=-80. It turns out that some of the currently held positions (i.e. those in context.portfolio.positions) don't pass that filter and are therefore not in the current pipeline output. As a result, the '.get_value' call below fails:

    for etf in context.portfolio.positions:  
        curr_price = data.current(etf, 'price')  
        if curr_price >= context.output.get_value(etf, 'R1'):

because 'etf' is not in the current index of pipeline (presumably because wr is greater than -80).

The 'get_value(stock, column)' method returns a value from the pipeline dataframe only if 'stock' is in the index.
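One way to handle held positions that are missing from the pipeline output is to check membership in the index before reading the cell. The following is a minimal sketch under stated assumptions: ticker strings stand in for Equity objects, the 'R1' values are made up, and '.at' is used in place of 'get_value' so the example also runs on newer pandas.

```python
import pandas as pd

# Stand-in for context.output: only assets that passed the screen appear.
output = pd.DataFrame({"R1": [10.5, 20.0]}, index=["AAA", "BBB"])

# Stand-in for context.portfolio.positions: "CCC" is held but was
# screened out of today's pipeline output.
positions = ["AAA", "CCC"]

r1_by_position = {}
skipped = []
for etf in positions:
    if etf in output.index:
        # Safe: the label exists, so the scalar lookup succeeds.
        r1_by_position[etf] = output.at[etf, "R1"]
    else:
        # Not in today's output; record it instead of raising.
        skipped.append(etf)

# Alternative: align the output to the held positions. Assets missing
# from the output get NaN rows instead of causing an error.
aligned = output.reindex(positions)
```

Whether to skip, log, or close such positions is a strategy decision; another option is to drop the screen from the pipeline and apply the wr filter when selecting new entries, so held assets always have a row.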

Hmm, so there must be an error somewhere else in the algorithm. It is supposed to buy and sell on the same day, so nothing should be held overnight, which means the portfolio should always be a subset of the pipeline output. Thanks a lot, Dan!