Research Recipes

The following examples are code snippets that can be run in the Research environment.

Data Visualization

Plotting with matplotlib

This example retrieves market cap data from the FactSet Fundamentals dataset and uses matplotlib.pyplot to plot the market cap for two different equities (T and VZ).

Note the use of a for loop with plt.plot() to plot the market cap for each stock in stocks_of_interest. Also note the use of data.index.levels[0] to access the "dates" index of the Pipeline output and the use of data.xs(stock, level=1) to get the timeseries of market cap data for stock in each iteration of the for loop.

from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.factset import Fundamentals

from quantopian.research import run_pipeline

import matplotlib.pyplot as plt

# Get data to plot using pipeline.
pipe = Pipeline(
    columns={
        'market_cap': Fundamentals.mkt_val.latest,
    }
)

data = run_pipeline(pipe, '2002-01-01', '2018-06-27')

# Plotting data.
stocks_of_interest = symbols(['T', 'VZ'])

# Define columns to be a list of all columns that were on our pipeline output.
columns = list(data.columns)

# Define dates to be the dates from our pipeline output index.
dates = data.index.levels[0]

# Plot data for each stock in stocks_of_interest.
for stock in stocks_of_interest:
    pipeline_data = data.xs(stock, level=1)

    # Iterate through data fields and plot each one as a line graph. In this
    # example, we only have one field to plot, but we can add more columns
    # to the pipeline and they will get plotted here.
    for column in columns:
        plt.plot(dates, pipeline_data[column], label=str(stock.symbol)+' '+str(column))

plt.legend(loc='upper left')
plt.title('Historical T and VZ Market Cap')
plt.show()
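To see what data.index.levels[0] and data.xs(stock, level=1) return without running a pipeline, the following minimal sketch builds a small DataFrame with the same (date, asset) MultiIndex shape as a pipeline output. The tickers here are plain strings and the market cap values are invented for illustration; real pipeline output uses Equity objects in the second index level.

```python
import pandas as pd

# Hypothetical stand-in for pipeline output: a (date, asset) MultiIndex
# with a single 'market_cap' column. All values are made up.
dates = pd.date_range('2018-01-02', periods=3, freq='D')
assets = ['T', 'VZ']
index = pd.MultiIndex.from_product([dates, assets], names=['date', 'asset'])
data = pd.DataFrame(
    {'market_cap': [10.0, 20.0, 11.0, 21.0, 12.0, 22.0]},
    index=index,
)

# Level 0 of the index holds the unique dates.
unique_dates = data.index.levels[0]

# xs() selects every row whose level-1 value matches and drops that level,
# leaving a date-indexed timeseries for a single asset.
t_series = data.xs('T', level=1)

print(list(unique_dates))
print(t_series)
```

One caveat worth keeping in mind: index.levels can retain dates that no longer appear once rows are filtered out, so for an asset that is missing on some days, t_series.index is the safer x-axis because it lists only the dates actually present for that asset.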

Rendering a pipeline result in an interactive table

This example uses qgrid to render an interactive grid view of a pipeline output DataFrame. Running it in Research renders a view of the DataFrame in which you can filter any column by clicking the filter icon in its header and edit a cell value by double-clicking the cell. This tool is typically helpful for users who are less familiar with the pandas library.

Note the use of qgrid.show_grid(), and of the grid_options and show_toolbar arguments in the commented-out last line.

from quantopian.pipeline import Pipeline
from quantopian.research import run_pipeline
from quantopian.pipeline.data import EquityPricing
from quantopian.pipeline.factors import (
    DailyReturns,
    MaxDrawdown,
)

# Import qgrid, a library for building an interactive view of a pandas
# DataFrame.
import qgrid

# Retrieve data with pipeline.
pipe = Pipeline(
    columns={
        'returns': DailyReturns(),
        'max_drawdown': MaxDrawdown([EquityPricing.close], window_length=6),
    },
)
df = run_pipeline(pipe, '2015-01-01', '2015-04-01')

# Set the default max number of rows to 12 so the DataFrame we render
# with qgrid isn't too tall.
qgrid.set_grid_option('maxVisibleRows', 12)

# Render DataFrame with QGrid
qgrid.show_grid(df)

# Uncomment and run this line to allow columns to overflow the cell window
# and show a toolbar to add/remove rows and view in fullscreen
# qgrid.show_grid(df, grid_options={'forceFitColumns': False}, show_toolbar=True)

Plotting a histogram

This example uses a pipeline to retrieve daily returns and plots the relative frequency of daily returns for all US equities in 2014.

Note the use of matplotlib's pyplot.hist function in which we specify several arguments like bins, alpha (to set the transparency of the bars), weights (to convert frequencies to relative frequencies), range, and label. Also note the use of plt.xlim to set the display range of the x axis in the plot.

from quantopian.pipeline import Pipeline
from quantopian.pipeline.factors import DailyReturns

from quantopian.research import run_pipeline

import matplotlib.pyplot as plt
import numpy as np

# Use pipeline to get daily returns.
pipe = Pipeline(
    columns={
        'returns_1d': DailyReturns(),
    },
    screen=DailyReturns().notnull()
)

data = run_pipeline(pipe, '2014-01-01', '2015-01-01')

# Get all returns values as a numpy array.
all_returns = data.values

# Create an array consisting entirely of 1/N so we can transform each observation
# into a weight when plotting relative frequencies below.
weight_factor = np.zeros_like(all_returns) + 1. / all_returns.size

# Plot the relative frequency of US equity daily returns in 2014.
plt.hist(
    all_returns,
    bins=np.linspace(-0.1, 0.1, num=21),
    alpha=0.5,
    weights=weight_factor,
    range=(-0.1, 0.1),
    label='US Equities Daily Returns Distribution',
)

plt.legend()
plt.xlabel('Daily Return')
plt.ylabel('Relative Frequency')
plt.xlim(-0.1, 0.1)
plt.title('Frequency Distribution of US Equity Daily Returns, 2014');
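The effect of the weights argument can be checked numerically. The sketch below uses numpy.histogram, which accepts the same bins and weights arguments, on synthetic returns (random numbers invented for illustration, not market data). With one weight of 1/N per observation, each bar height becomes the fraction of observations in that bin.

```python
import numpy as np

# Synthetic daily returns, invented for illustration only.
rng = np.random.RandomState(0)
returns = rng.normal(loc=0.0, scale=0.02, size=1000)

# One weight of 1/N per observation, as in the example above.
weights = np.zeros_like(returns) + 1.0 / returns.size

bins = np.linspace(-0.1, 0.1, num=21)
rel_freq, edges = np.histogram(returns, bins=bins, weights=weights)

# Each bar height is the fraction of observations in that bin, so the
# heights sum to the fraction of observations that fall inside the bins.
inside = np.mean((returns >= bins[0]) & (returns <= bins[-1]))
print(rel_freq.sum(), inside)
```

Observations outside the bin edges are dropped from the histogram but still count toward N, so the bars sum to less than 1 whenever some returns fall outside the plotted range.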

Advanced plotting with Seaborn

Seaborn is a statistical plotting library for Python that can be thought of as a higher-level interface to matplotlib. While matplotlib is very powerful and feature rich, it does not have the most user-friendly interface, and its plots often require a lot of tweaking to look presentable. Seaborn also has the added benefit of being able to take meta-information extracted from pandas DataFrames into account. The examples below use Seaborn to plot various financial data in Research.

Distribution Plot

This example uses returns() to get AAPL's daily returns from 2013 and then uses distplot to plot the distribution of those daily returns.

import seaborn as sns

from quantopian.research import returns

aapl = symbols('AAPL')
df_rets = returns(
    assets=aapl,
    start='2013-01-01',
    end='2014-01-01',
    frequency='daily',
).dropna()

sns.distplot(df_rets);
Seaborn distribution plot

Violin Plot

This example uses returns() to get the daily returns of a handful of stocks from 2013. It then uses violinplot to plot each of those distributions side-by-side.

import seaborn as sns

from quantopian.research import returns

syms = symbols(['AAPL', 'IBM', 'MSFT', 'SBUX', 'SPY'])
df_rets = returns(
    assets=syms,
    start='2013-01-01',
    end='2014-01-01',
    frequency='daily',
).dropna()

sns.violinplot(df_rets);
Seaborn violin plot

Pair Plot

This example uses returns() to get the daily returns of a handful of stocks from 2013. It then uses pairplot to plot pairwise relationships. The kernel density estimates (KDE) of the returns are plotted along the diagonal.

import seaborn as sns

from quantopian.research import returns

syms = symbols(['AAPL', 'IBM', 'MSFT', 'SBUX', 'SPY'])
df_rets = returns(
    assets=syms,
    start='2013-01-01',
    end='2014-01-01',
    frequency='daily',
).dropna()

sns.pairplot(df_rets, diag_kind='kde', size=2.4);
Seaborn pair plot

Heat Map

This example uses returns() to get the daily returns of a handful of stocks from 2013. It then uses heatmap to plot pairwise correlations.

import seaborn as sns

from quantopian.research import returns

syms = symbols(['AAPL', 'IBM', 'MSFT', 'SBUX', 'SPY'])
df_rets = returns(
    assets=syms,
    start='2013-01-01',
    end='2014-01-01',
    frequency='daily',
).dropna()

sns.heatmap(df_rets.corr());
Seaborn heat map plot
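The matrix passed to heatmap comes from DataFrame.corr(), which computes pairwise Pearson correlations between columns by default. As a minimal sketch of what that matrix looks like, the snippet below uses synthetic returns for three hypothetical tickers (AAA, BBB, and CCC are placeholders, not real symbols).

```python
import numpy as np
import pandas as pd

# Synthetic daily returns for three hypothetical tickers. AAA and BBB
# share a common 'market' component; CCC is generated independently.
rng = np.random.RandomState(42)
market = rng.normal(0.0, 0.01, size=250)
df_rets = pd.DataFrame({
    'AAA': market + rng.normal(0.0, 0.005, size=250),
    'BBB': market + rng.normal(0.0, 0.02, size=250),
    'CCC': rng.normal(0.0, 0.01, size=250),
})

# Square, symmetric matrix with one row and one column per ticker.
corr = df_rets.corr()
print(corr.round(2))
```

The result is symmetric with ones on the diagonal, which is why the heat map shows a mirrored pattern around a uniform diagonal.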

Joint Distribution

This example uses returns() to get the daily returns of SBUX and SPY from 2013. It then uses jointplot to plot the joint distribution with bivariate and univariate graphs. The kind='reg' keyword argument runs a linear regression and plots the best-fitting line, the confidence interval (shaded region), and the Pearson correlation coefficient along with its p-value.

import seaborn as sns

from quantopian.research import returns

sbux, spy = symbols(['SBUX', 'SPY'])
df_rets = returns(
    assets=[sbux, spy],
    start='2013-01-01',
    end='2014-01-01',
    frequency='daily',
).dropna()

sns.jointplot(sbux, spy, df_rets, kind='reg');
Seaborn joint distribution plot
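The quantities shown on a regression-style joint plot can also be computed directly. The following is a sketch using numpy on synthetic data (the numbers are invented, and this is not necessarily how seaborn computes them internally): it fits a least-squares line and computes the Pearson correlation coefficient between two return series.

```python
import numpy as np

# Synthetic returns: y depends linearly on x plus noise (made-up numbers).
rng = np.random.RandomState(1)
x = rng.normal(0.0, 0.01, size=250)
y = 0.8 * x + rng.normal(0.0, 0.005, size=250)

# Least-squares line y = slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)

# Pearson correlation coefficient between x and y.
r = np.corrcoef(x, y)[0, 1]

print(slope, intercept, r)
```

For a simple linear regression, the slope equals r multiplied by the ratio of the standard deviations of y and x, so the fitted line and the correlation coefficient describe the same linear relationship at different scales.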