Quantopian Lecture Series: Introduction to pandas

A quantitative workflow is all about testing hypotheses on data. Before you can test hypotheses or do anything else with your data, the data needs to be in a format that is easy to access and work with. pandas is a Python package specifically designed to make management and analysis of your data part of the same intuitive workflow. It provides data structures that let you organize time series and cross-sectional data and perform efficient calculations on them. It underlies most of the computations done in the lecture series and is used by many cutting-edge firms. In this lecture we will walk you through some basic use cases and make sure you’re familiar with all the components you need to get started.

All of our lectures are available at:


The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

5 responses

When computing the mean with prices.mean(axis=0), why do we compute across rows (axis=0) instead of columns (axis=1)?

Each column is a different equity, so the rows in each column are the price values for that equity. We therefore want the mean down all the rows of a column, not across each row. Setting axis=0 tells pandas to apply the function to the set of rows in each column. This way we get one mean for each equity in our DataFrame.

I realize this is not clear. I will update the explanation in the lecture to better show this idea.

In the example, the rows (axis 0) are the dates and the columns (axis 1) are the securities. To get the mean price of each security over all days, compute the mean along the rows (axis=0). To get the mean for each day over all securities, compute the mean along the columns (axis=1). It depends on what you want to do, but typically we want the average price of each security (axis=0) and not the average price on each day (axis=1). The axis specifies which dimension the values are taken from (in this case the dates), not how the data is grouped (in this case by column).
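To make the two directions concrete, here is a minimal sketch. The tickers AAA and BBB, the dates, and the prices are made up for illustration and are not the lecture's data; only the layout (dates as rows, securities as columns) follows the example being discussed.

```python
import pandas as pd

# Hypothetical prices: rows are dates (axis 0), columns are securities (axis 1).
dates = pd.date_range("2016-01-04", periods=3, freq="D")
prices = pd.DataFrame(
    {"AAA": [10.0, 11.0, 12.0], "BBB": [20.0, 22.0, 24.0]},
    index=dates,
)

# axis=0: values are taken down the rows (dates), giving one mean per security.
mean_per_security = prices.mean(axis=0)

# axis=1: values are taken across the columns (securities), giving one mean per date.
mean_per_date = prices.mean(axis=1)

print(mean_per_security)  # indexed by security name
print(mean_per_date)      # indexed by date
```

The result of axis=0 is indexed by the column labels, and the result of axis=1 is indexed by the dates, which is a quick way to check which direction was collapsed.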

If you are referring to the notebook cells...

The same statistical functions from our interactions with Series
resurface here with the addition of the axis parameter. By specifying
the axis, we tell pandas to calculate the desired function along
either the rows (axis=0) or the columns (axis=1). We can easily
calculate the mean of each column like so:


It may be confusing but it's correct. To get the mean of each column, along the rows, use axis=0.
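One way to convince yourself is to compare the axis=0 result with the mean of each column computed separately as a Series, and with NumPy's mean over axis 0 of the underlying array. This is only a sketch on a made-up DataFrame (the names AAA and BBB and the values are assumptions, not the notebook's data).

```python
import numpy as np
import pandas as pd

# Hypothetical prices: dates as rows, securities as columns.
prices = pd.DataFrame(
    {"AAA": [10.0, 11.0, 12.0], "BBB": [20.0, 22.0, 24.0]},
    index=pd.date_range("2016-01-04", periods=3, freq="D"),
)

# Mean of each column, computed along the rows.
column_means = prices.mean(axis=0)

# The same thing done by hand: take each column as a Series and average it.
manual_means = pd.Series({name: prices[name].mean() for name in prices.columns})

# NumPy uses the same convention: axis=0 collapses the row dimension.
numpy_means = prices.to_numpy().mean(axis=0)

print(column_means.shape)  # one entry per column, since the rows were collapsed
print(np.allclose(column_means.to_numpy(), manual_means.to_numpy()))
print(np.allclose(column_means.to_numpy(), numpy_means))
```

The axis argument names the dimension that disappears from the result, which is why axis=0 leaves one value per column.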

That's a very good way to put it, Dan.

Thank you all for the clarification and quick response!