Soon: Upgrade to pandas 0.18

We are doing the final testing on upgrading the Quantopian platform to use pandas 0.18, NumPy 1.11.1, and SciPy 0.17.1. When that testing is complete we will make the changes on our production servers. We are planning on upgrading the backtest servers first, followed by the live trading servers the following day. The exact date is pending the completion of testing, but we're aiming for Monday/Tuesday.

The vast majority of algorithms are unaffected by this change.

pandas 0.18 and NumPy 1.11 contain many useful additions and improvements. These releases also contain a small number of breaking changes that may affect Quantopian algorithms. Some algorithms will no longer run. Community members with a live algorithm that will be affected (broker-backed, contest, or zipline) have already received an email from us.

We will update this thread as the upgrade is performed.

Here are some of the common breakages that we've seen, and how to work around them. They might be useful if you run across an older algorithm that no longer runs.

Timezone-aware Datetime Columns

The most important change for Quantopian users is the addition, in pandas 0.17, of timezone-aware datetime columns to pandas DataFrames. Timestamps read from a DataFrame that were previously tz-naive may now be tz-aware, which leads to errors when they are compared with tz-naive values using operators like == or <. Very few Quantopian APIs are directly affected by this change, but algorithms that construct DataFrames containing datetime columns may behave differently after this update.
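A minimal sketch of the failure and one way around it. The variable names are illustrative, and the choice of UTC is an assumption; use whichever timezone your tz-aware values carry.

```python
import pandas as pd

# A tz-aware value, like one read from a tz-aware DataFrame column.
aware = pd.Timestamp("2016-06-01 13:30", tz="UTC")
# A tz-naive value, like one an older algorithm might have built by hand.
naive = pd.Timestamp("2016-06-01 13:30")

# Ordering comparisons between tz-aware and tz-naive Timestamps raise TypeError.
try:
    aware < naive
    mixed_comparison_failed = False
except TypeError:
    mixed_comparison_failed = True

# One fix: attach a timezone to the naive value before comparing.
naive_as_utc = naive.tz_localize("UTC")
same_instant = aware == naive_as_utc
```

Going the other way and stripping the timezone from the tz-aware side also works, as long as both sides of the comparison end up either tz-aware or tz-naive.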

Changes in Broadcasting Behavior Between DataFrame and Series

NOTE: In this section, all the examples that refer to +/add also apply to the other binary arithmetic operators (e.g. -/sub, */mul, etc.).

In pandas, when you add a DataFrame and a Series, the Series is interpreted as a row and broadcast to every row in the DataFrame:

In [8]: df  
   a  b  
x  0  1  
y  1  2  
z  2  3

In [9]: ab_series  
a     10  
b    100  
dtype: int64

In [10]: df + ab_series  
    a    b  
x  10  101  
y  11  102  
z  12  103  

This can lead to unexpected behavior when we want pandas to interpret a Series as a column instead of a row:

In [13]: xyz_series  
x      10  
y     100  
z    1000  
dtype: int64

In [14]: df + xyz_series  
    a   b   x   y   z  
x NaN NaN NaN NaN NaN  
y NaN NaN NaN NaN NaN  
z NaN NaN NaN NaN NaN  

The right way to add a Series to a DataFrame column-wise is to use DataFrame.add with axis=0:

In [16]: df.add(xyz_series, axis=0)  
      a     b  
x    10    11  
y   101   102  
z  1002  1003  

A particularly common case where you often want to add a Series to a DataFrame as a column is when working with timeseries data. This is so common that older versions of pandas used to special-case the behavior of adding a Series and a DataFrame when both objects had DatetimeIndexes:

Old Behavior:

In [4]: returns  
            AAPL  MSFT  
2014-01-01  0.10 -0.02  
2014-01-02  0.05  0.10  
2014-01-03 -0.01 -0.04

In [5]: benchmark  
2014-01-01    0.10  
2014-01-02    0.05  
2014-01-03    0.06  
Freq: D, dtype: float64

In [6]: returns + benchmark  
FutureWarning: TimeSeries broadcasting along DataFrame index by default is deprecated. Please use DataFrame.<op> to explicitly broadcast arithmetic operations along the index  
            AAPL  MSFT  
2014-01-01  0.20  0.08  
2014-01-02  0.10  0.15  
2014-01-03  0.05  0.02  

While often convenient, this special case made it harder for users to understand the rules for broadcasting and led to confusing behavior when an operation that worked with datetimes stopped working with differently-indexed data. For these reasons, the pandas team deprecated the datetime special case in pandas 0.8.0 and finally removed the behavior in pandas 0.17.0. Consequently, trying to add a datetime-indexed DataFrame to a like-indexed Series will no longer implicitly use column-wise addition:

New Behavior:

In [7]: returns + benchmark  
            2014-01-01 00:00:00  2014-01-02 00:00:00  2014-01-03 00:00:00  AAPL  MSFT  
2014-01-01                  NaN                  NaN                  NaN   NaN   NaN  
2014-01-02                  NaN                  NaN                  NaN   NaN   NaN  
2014-01-03                  NaN                  NaN                  NaN   NaN   NaN  

Users whose algorithms do columnwise arithmetic between Series and DataFrame should update their code to use the corresponding explicit methods. See the pandas docs for full details.
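As a rough sketch of that update, the returns/benchmark example above can be written with the explicit methods and axis=0. The data is rebuilt here so the snippet runs on its own; the names total and excess are illustrative.

```python
import pandas as pd

dates = pd.date_range("2014-01-01", periods=3, freq="D")
returns = pd.DataFrame(
    {"AAPL": [0.10, 0.05, -0.01], "MSFT": [-0.02, 0.10, -0.04]},
    index=dates,
)
benchmark = pd.Series([0.10, 0.05, 0.06], index=dates)

# axis=0 aligns the Series with the DataFrame's index (the dates),
# so the benchmark is applied to every column, row by row.
total = returns.add(benchmark, axis=0)
excess = returns.sub(benchmark, axis=0)
```

The same pattern applies to mul and div. Because the axis is spelled out, the result no longer depends on what kind of index the two objects share.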

Stricter Int/Float Type Checking

Several APIs in pandas and numpy used to warn and coerce floats to integers. Many of these APIs now raise errors when they receive floats.

Most notably, using a float key with DataFrame.iloc will now raise an error:

Old Behavior:

In [14]: df  
   d  e  f  
a  1  0  0  
b  0  1  0  
c  0  0  1

In [15]: df.iloc[1.0]  
FutureWarning: scalar indexers for index type Index should be integers and not floating point  
d    0  
e    1  
f    0  
Name: b, dtype: float64  

New Behavior:

In [15]: df.iloc[1.0]  
TypeError: cannot do positional indexing on <class 'pandas.indexes.base.Index'> with these indexers [1.0] of <type 'float'>
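Float positions usually come from arithmetic such as division. A minimal sketch of the fix, assuming truncating to an integer is the behavior you want (the position variable is hypothetical):

```python
import pandas as pd

df = pd.DataFrame(
    {"d": [1, 0, 0], "e": [0, 1, 0], "f": [0, 0, 1]},
    index=["a", "b", "c"],
)

# Division produces a float even when the intent is a row position.
position = len(df) / 2.0  # 1.5

# Passing the float straight to iloc raises TypeError.
try:
    df.iloc[position]
    float_key_failed = False
except TypeError:
    float_key_failed = True

# Convert to int explicitly before positional indexing.
row = df.iloc[int(position)]
```

Integer division (//) on integer operands avoids creating the float in the first place.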

Rolling and Expanding Functions

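In pandas 0.18, the rolling_* and expanding_* families of functions (e.g. rolling_mean and expanding_mean) were deprecated in favor of .rolling() and .expanding() methods, which behave more like groupby: you first create the window object, then call an aggregation on it.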

Old Style:

In [13]: df  
          a         b  c  
0  1.788628  0.436510  5  
1 -1.863493 -0.277388  5  
2 -0.082741 -0.627001  5  
3 -0.477218 -1.313865  5  
4  0.881318  1.709573  5

In [14]: pd.rolling_mean(df, 3)  
          a         b   c  
0       NaN       NaN NaN  
1       NaN       NaN NaN  
2 -0.052535 -0.155960   5  
3 -0.807817 -0.739418   5  
4  0.107120 -0.077097   5

In [15]: pd.rolling_min(df, 3)  
          a         b   c  
0       NaN       NaN NaN  
1       NaN       NaN NaN  
2 -1.863493 -0.627001   5  
3 -1.863493 -1.313865   5  
4 -0.477218 -1.313865   5  

New Style:

In [6]: df.rolling(3).mean()  
          a         b    c  
0       NaN       NaN  NaN  
1       NaN       NaN  NaN  
2 -0.052535 -0.155960  5.0  
3 -0.807817 -0.739418  5.0  
4  0.107120 -0.077097  5.0

In [7]: df.rolling(3).min()  
          a         b    c  
0       NaN       NaN  NaN  
1       NaN       NaN  NaN  
2 -1.863493 -0.627001  5.0  
3 -1.863493 -1.313865  5.0  
4 -0.477218 -1.313865  5.0  

For more details on this change, see the pandas changelog.
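A short sketch of the translation, with the old call shown in a comment next to each new one. The prices frame is made-up data for illustration.

```python
import pandas as pd

prices = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0, 5.0], "c": [5, 5, 5, 5, 5]})

# Old: pd.rolling_mean(prices, 3)
rolling_mean = prices.rolling(3).mean()

# Old: pd.rolling_min(prices, 3)
rolling_min = prices.rolling(window=3).min()

# Old: pd.expanding_mean(prices)
expanding_mean = prices.expanding().mean()
```

The first two rows of each rolling result are NaN because a window of 3 needs three observations before it produces a value.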


At least one bug introduced in pandas 0.17 is known to affect a small number of algorithms. DataFrame.transpose is broken when the frame being transposed contains a column of the new tz-aware datetime dtype. This issue manifests as an AssertionError with the message: Number of Block dimensions (1) must equal number of axes (2).

Usage of the DataFrame.transpose() method is, in general, discouraged. transpose is very rarely necessary, and it can impose a significant performance penalty when applied to DataFrames containing multiple data types. In cases where transpose() is necessary, users may have to ensure that their frames do not contain tz-aware datetimes.
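One possible way to do that, sketched here under the assumption that losing the timezone information is acceptable, is to convert tz-aware columns to tz-naive UTC before transposing. The frame and column names are illustrative.

```python
import pandas as pd

frame = pd.DataFrame({
    "when": pd.date_range("2016-06-01", periods=2, tz="UTC"),
    "price": [10.0, 11.0],
})

# Work on a copy so the original tz-aware column is left untouched.
safe = frame.copy()

# tz_convert(None) converts to UTC and drops the timezone,
# leaving an ordinary tz-naive datetime column.
safe["when"] = safe["when"].dt.tz_convert(None)

transposed = safe.T
```

Dropping the datetime column before transposing is another option when the timestamps are not needed in the result.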



2 responses

The upgrade is complete. Any backtests you start will be running on the new libraries. On Monday, the live trading servers will come up with the new libraries in use.

In my logs, I am getting some deprecation warnings about pandas.ewm_std. Is there any way yet to update my code to fix this without losing all my live trading history?