We are doing the final testing on upgrading the Quantopian platform to use pandas 0.18, NumPy 1.11.1, and SciPy 0.17.1. When that testing is complete we will make the changes on our production servers. We are planning on upgrading the backtest servers first, followed by the live trading servers the following day. The exact date is pending the completion of testing, but we're aiming for Monday/Tuesday.
The vast majority of algorithms are unaffected by this change.
pandas 0.18 and NumPy 1.11 contain many useful additions and improvements. These releases also contain a small number of breaking changes that may affect Quantopian algorithms. Some algorithms will no longer run. Community members with a live algorithm that will be affected (broker-backed, contest, or zipline) have already received an email from us.
We will update this thread as the upgrade is performed.
Here are some of the common breakages that we've seen, and how to work around them. They might be useful if you run across an older algorithm that no longer runs.
Timezone-aware Datetime Columns
The most important change for Quantopian users is the pandas 0.17 addition of timezone-aware datetime columns to pandas DataFrames. This can cause Timestamps read from DataFrames that were previously tz-naive to now be tz-aware, leading to errors in comparison operators like <. Very few Quantopian APIs are directly affected by this change, but algorithms that construct DataFrames containing datetime columns may have changed behavior because of this update.
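As a rough illustration of the failure mode described above, the sketch below builds a small DataFrame with a tz-aware column, shows that comparing one of its values against a tz-naive Timestamp raises a TypeError, and then localizes the naive value so the comparison succeeds. The column name `when`, the dates, and the choice of UTC are assumptions made for the example, not taken from any Quantopian API.

```python
import pandas as pd

# Hypothetical frame with a tz-aware datetime column (names and dates are illustrative).
df = pd.DataFrame({
    "when": pd.to_datetime(["2016-01-04", "2016-01-05"]).tz_localize("UTC"),
})

aware = df["when"].iloc[0]                 # tz-aware Timestamp read back from the frame
naive_cutoff = pd.Timestamp("2016-01-05")  # tz-naive Timestamp, as older code might build

# Ordering comparisons between tz-aware and tz-naive Timestamps raise TypeError.
try:
    aware < naive_cutoff
    comparison_failed = False
except TypeError:
    comparison_failed = True

# One possible fix: give the naive value an explicit timezone before comparing.
aware_cutoff = naive_cutoff.tz_localize("UTC")
is_before = aware < aware_cutoff
```

Whether to localize the naive side or strip the timezone from the aware side depends on what the algorithm's datetimes are meant to represent; localizing to UTC is only one option.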
Changes in Broadcasting Behavior Between DataFrame and Series
NOTE: All of the examples in this section that refer to add also apply to the other binary arithmetic operators (e.g. mul).
In pandas, when you add a DataFrame and a Series, the Series is interpreted as a row and broadcast to every row in the DataFrame:
    In : df
    Out:
       a  b
    x  0  1
    y  1  2
    z  2  3

    In : ab_series
    Out:
    a     10
    b    100
    dtype: int64

    In : df + ab_series
    Out:
        a    b
    x  10  101
    y  11  102
    z  12  103
This can lead to unexpected behavior when we want pandas to interpret a Series as a column instead of a row:
    In : xyz_series
    Out:
    x      10
    y     100
    z    1000
    dtype: int64

    In : df + xyz_series
    Out:
        a   b   x   y   z
    x NaN NaN NaN NaN NaN
    y NaN NaN NaN NaN NaN
    z NaN NaN NaN NaN NaN
The right way to add a Series to a DataFrame column-wise is to use DataFrame.add with axis=0:
    In : df.add(xyz_series, axis=0)
    Out:
          a     b
    x    10    11
    y   101   102
    z  1002  1003
A particularly common case where you often want to add a Series to a DataFrame as a column is when working with timeseries data. This is so common that older versions of pandas used to special-case the behavior of adding a Series and a DataFrame when both objects had DatetimeIndexes:
    In : returns
    Out:
                AAPL  MSFT
    2014-01-01  0.10 -0.02
    2014-01-02  0.05  0.10
    2014-01-03 -0.01 -0.04

    In : benchmark
    Out:
    2014-01-01    0.10
    2014-01-02    0.05
    2014-01-03    0.06
    Freq: D, dtype: float64

    In : returns + benchmark
    frame.py:3194: FutureWarning: TimeSeries broadcasting along DataFrame index by default is deprecated. Please use DataFrame.<op> to explicitly broadcast arithmetic operations along the index
      FutureWarning)
    Out:
                AAPL  MSFT
    2014-01-01  0.20  0.08
    2014-01-02  0.10  0.15
    2014-01-03  0.05  0.02
While often convenient, this special case made it harder for users to understand the rules for broadcasting and led to confusing behavior when an operation that worked with datetimes stopped working with differently-indexed data. For these reasons, the pandas team deprecated the datetime special case in pandas 0.8.0 and finally removed the behavior in pandas 0.17.0. Consequently, trying to add a datetime-indexed DataFrame to a like-indexed Series will no longer implicitly use column-wise addition:
    In : returns + benchmark
    Out:
                2014-01-01 00:00:00  2014-01-02 00:00:00  2014-01-03 00:00:00  AAPL  MSFT
    2014-01-01                  NaN                  NaN                  NaN   NaN   NaN
    2014-01-02                  NaN                  NaN                  NaN   NaN   NaN
    2014-01-03                  NaN                  NaN                  NaN   NaN   NaN
Users whose algorithms do column-wise arithmetic between a Series and a DataFrame should update their code to use the corresponding explicit methods. See the pandas docs for full details.
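For the timeseries case specifically, a minimal sketch of the explicit form might look like the following. It rebuilds the `returns` and `benchmark` objects from the example above and uses `DataFrame.add` and `DataFrame.sub` with `axis=0`; the names `total` and `excess` are made up for illustration.

```python
import pandas as pd

# Rebuild the example data from this section.
dates = pd.date_range("2014-01-01", periods=3, freq="D")
returns = pd.DataFrame(
    {"AAPL": [0.10, 0.05, -0.01], "MSFT": [-0.02, 0.10, -0.04]},
    index=dates,
)
benchmark = pd.Series([0.10, 0.05, 0.06], index=dates)

# axis=0 aligns the Series with the DataFrame's index (the dates),
# so the benchmark is applied to each column, row by row.
total = returns.add(benchmark, axis=0)

# The same pattern works for the other arithmetic methods, e.g. excess returns.
excess = returns.sub(benchmark, axis=0)
```

Being explicit about the axis works the same way regardless of the index type, which is the behavior the pandas change is meant to encourage.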
Stricter Int/Float Type Checking
Several APIs in pandas and NumPy used to warn and coerce floats to integers. Many of these APIs now raise errors when they receive floats.
Most notably, using a float key with DataFrame.iloc will now raise an error:
Before the upgrade, a float key only produced a warning:

    In : df
    Out:
       d  e  f
    a  1  0  0
    b  0  1  0
    c  0  0  1

    In : df.iloc[1.0]
    FutureWarning: scalar indexers for index type Index should be integers and not floating point
      type(self).__name__),FutureWarning)
    Out:
    d    0
    e    1
    f    0
    Name: b, dtype: float64

After the upgrade, the same expression raises a TypeError:

    In : df.iloc[1.0]
    ...
    TypeError: cannot do positional indexing on <class 'pandas.indexes.base.Index'> with these indexers [1.0] of <type 'float'>
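Float positions often come from arithmetic such as dividing a length by two. One way to handle this, sketched below, is to convert the computed position to an integer explicitly before indexing. The variable names and the midpoint calculation are hypothetical, and truncating with `int()` is an assumption about the intended rounding.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    index=list("abc"),
    columns=list("def"),
)

# A position computed with float arithmetic (illustrative): 3 / 2.0 == 1.5
position = len(df) / 2.0

# Passing the float straight to iloc raises TypeError.
try:
    df.iloc[position]
    raised = False
except TypeError:
    raised = True

# Convert explicitly; int() truncates toward zero, so 1.5 becomes 1.
row = df.iloc[int(position)]
```

If the algorithm needs rounding rather than truncation, `int(round(position))` would be the analogous explicit conversion.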
Rolling and Expanding Functions
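In pandas 0.18, the rolling_* and expanding_* families of functions (e.g. rolling_mean and expanding_mean) were changed to behave more like groupby: top-level calls such as pd.rolling_mean(df, 3) are replaced by method chains such as df.rolling(3).mean().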
    In : df
    Out:
              a         b  c
    0  1.788628  0.436510  5
    1 -1.863493 -0.277388  5
    2 -0.082741 -0.627001  5
    3 -0.477218 -1.313865  5
    4  0.881318  1.709573  5

    In : pd.rolling_mean(df, 3)
    Out:
              a         b   c
    0       NaN       NaN NaN
    1       NaN       NaN NaN
    2 -0.052535 -0.155960   5
    3 -0.807817 -0.739418   5
    4  0.107120 -0.077097   5

    In : pd.rolling_min(df, 3)
    Out:
              a         b   c
    0       NaN       NaN NaN
    1       NaN       NaN NaN
    2 -1.863493 -0.627001   5
    3 -1.863493 -1.313865   5
    4 -0.477218 -1.313865   5
    In : df.rolling(3).mean()
    Out:
              a         b    c
    0       NaN       NaN  NaN
    1       NaN       NaN  NaN
    2 -0.052535 -0.155960  5.0
    3 -0.807817 -0.739418  5.0
    4  0.107120 -0.077097  5.0

    In : df.rolling(3).min()
    Out:
              a         b    c
    0       NaN       NaN  NaN
    1       NaN       NaN  NaN
    2 -1.863493 -0.627001  5.0
    3 -1.863493 -1.313865  5.0
    4 -0.477218 -1.313865  5.0
For more details on this change, see the pandas changelog.
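As a small sketch of the method-based style, the example below applies `rolling` and `expanding` to a toy Series. The data and variable names are invented for illustration; the `min_periods` argument is shown only as one option that older code may have passed to the top-level functions.

```python
import pandas as pd

prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Method-based equivalent of a 3-period rolling mean.
rolling_mean = prices.rolling(3).mean()

# Expanding mean over all observations seen so far.
expanding_mean = prices.expanding().mean()

# min_periods=1 emits a value as soon as one observation is available.
rolling_min = prices.rolling(3, min_periods=1).min()
```

The window object returned by `rolling(3)` can be reused for several aggregations, much like the object returned by `groupby`.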
At least one bug introduced in pandas 0.17 is known to affect a small number of algorithms. DataFrame.transpose is broken when the frame being transposed contains a column of the new tz-aware datetime dtype. This issue manifests as an AssertionError with the following message:

    AssertionError: Number of Block dimensions (1) must equal number of axes (2).
Usage of the DataFrame.transpose() method is, in general, discouraged. transpose is very rarely necessary, and it can impose a significant performance penalty when applied to DataFrames containing multiple data types. In cases where transpose() is necessary, users may have to ensure that their frames do not contain tz-aware datetimes.
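One possible way to make sure a frame contains no tz-aware datetimes before transposing is sketched below: convert the tz-aware column to tz-naive UTC values on a copy, then transpose the copy. The frame, the column names, and the choice to keep UTC wall times are all assumptions for the example; dropping the datetime column entirely is another option.

```python
import pandas as pd

# Hypothetical frame mixing a numeric column with a tz-aware datetime column.
frame = pd.DataFrame({
    "price": [10.0, 11.0],
    "when": pd.to_datetime(["2016-01-04", "2016-01-05"]).tz_localize("UTC"),
})

# Work on a copy so the original tz-aware column is left untouched.
naive = frame.copy()

# tz_convert(None) converts to UTC and then removes the timezone information.
naive["when"] = naive["when"].dt.tz_convert(None)

transposed = naive.T
```

Because the transposed frame mixes floats and datetimes in each column, its columns end up with object dtype, which is part of the performance cost mentioned above.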