Quarter on Quarter growth factors

Hi All,

I've tried to create two quarter-on-quarter (QonQ) growth/change factors, based on a TTM (trailing twelve month) factor someone else had created and posted in the forum. If I've made a mistake in either of them, please do let me know, as I'm pretty new to Python and Quantopian.

The first one is meant to calculate the growth/change between the current (most recently reported) quarter vs the same quarter last year, in order to remove any seasonality effect, so mostly for cumulative time-series items on the income and cashflow statements.

The second one is meant to calculate the growth/change between the current quarter and the direct previous (consecutive) quarter, and is meant mostly for balance sheet/point-in-time related items.

Note that I added a few extra trading days to the standard 63 trading days / quarter, as perhaps not all companies issue their 10Qs exactly 63 trading days after the previous one. Maybe it's better to use a few more extra days for quarter length just to be safe? Or is there a better way of doing this?

import numpy as np
from quantopian.pipeline.factors import CustomFactor
from quantopian.pipeline.data import Fundamentals

class QonQ_Growth(CustomFactor):
    """
    Quarter on Quarter growth (current vs same quarter last year).
    Call from within the Pipeline function, for example:
        ebit_growth = QonQ_Growth(inputs=[Fundamentals.ebit])
    """
    # padded quarter length = 66 (63 trading days plus a small buffer)
    window_length = 3 * 66

    def compute(self, today, assets, out, data):
        # Compare the latest value with the first value in the window
        q_on_q = np.array([-1, -3 * 66])
        out[:] = (data[q_on_q[0]] - data[q_on_q[1]]) / data[q_on_q[1]]

class QonCQ_Growth(CustomFactor):
    """
    Quarter on Consecutive Quarter growth.
    Call from within the Pipeline function, for example:
        equity_growth = QonCQ_Growth(inputs=[Fundamentals.book_value_per_share])
    """
    # padded quarter length = 66 (63 trading days plus a small buffer)
    window_length = 1 * 66

    def compute(self, today, assets, out, data):
        # Compare the latest value with the first value in the window
        q_on_cq = np.array([-1, -66])
        out[:] = (data[q_on_cq[0]] - data[q_on_cq[1]]) / data[q_on_cq[1]]
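To see what the fixed-offset arithmetic does outside of pipeline, here is a minimal numpy sketch on a synthetic forward-filled series (all numbers are made up; pipeline repeats the latest reported value on every trading day, and this example assumes the newest report landed partway through the window):

```python
import numpy as np

# One asset's forward-filled series over a 66-day window:
# the new quarterly figure (121) was reported partway through,
# so the start of the window still holds the previous quarter (110).
data = np.concatenate([np.full(10, 110.0), np.full(56, 121.0)])

# Quarter-on-consecutive-quarter growth, as in QonCQ_Growth:
qoncq = (data[-1] - data[-66]) / data[-66]
print(round(qoncq, 4))  # 0.1
```

If instead the report had arrived more than 66 trading days after the previous one, `data[-66]` would already hold the current quarter's value and the growth would come out as zero, which is exactly the fragility discussed in the responses below.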



The issue deals with the fact that when you pull in trailing fundamental data using Pipeline, it uses the asof_date as opposed to the timestamp (i.e. the date that the data became available).

For example, take a stock like AAPL that reported Q1 earnings today after the bell (i.e. the asof_date would be 3/31/2018 and the timestamp would be 5/2/2018). So, if you pull the latest data using pipeline, you will not see this Q1 datapoint until 5/2/2018. However, if you try to pull trailing data using a CustomFactor, pipeline populates the input array (which you called data) using the asof_date as opposed to the timestamp.

What would this mean for the required window_length?
- For a pipeline calculation run on 4/30/2018, you would need a window_length of approximately 272 (83 + 63*3, where 83 is the number of trading days from the beginning of the year to 4/30/2018).
- The problem is that this required window_length will vary over time depending on the company and date in question, so a different method is needed.

I am not sure of the best solution at the moment. I need to think through it to see if there is an efficient way to do it. If anyone else has any suggestions, please feel free to post.

(The notebook I used to debug the problem is attached. It's not super organized, as it's late. Let me know if you have any questions.)


Thanks Michael. Hopefully you or someone else smarter than me can figure out how to do this.

Here's also hoping that FactSet can provide better and cleaner fundamental datasets, so we can focus on research and strategy development instead of wasting time figuring out how to get the correct data in the first place...

A while back I shared a CustomFactor - LastFourQuarters (in the attached notebook on that thread) - for getting the last 4 quarters of data. You might be able to use it to build QonQ factors more easily.


Joakim,

Jamie's solution works well. Try this (see the notebook for an example):

class QonQ_Growth(CustomFactor):
    # Expects two inputs, e.g.:
    # QonQ_Growth(inputs=[Fundamentals.ebit_asof_date, Fundamentals.ebit])
    window_length = 252 + 63

    def compute(self, today, assets, out, asof_date, values):
        for column_ix in range(asof_date.shape[1]):
            # Keep one value per reported quarter (the first occurrence
            # of each asof_date in the forward-filled column)
            _, unique_indices = np.unique(asof_date[:, column_ix],
                                          return_index=True)
            quarterly_values = values[unique_indices, column_ix]

            # Pad with NaNs if fewer than 5 quarters are available
            if len(quarterly_values) < 5:
                quarterly_values = np.hstack([
                    np.repeat([np.nan], 5 - len(quarterly_values)),
                    quarterly_values,
                ])
            quarterly_values = quarterly_values[-5:]

            # Latest quarter vs the same quarter one year ago
            out[column_ix] = quarterly_values[-1] / quarterly_values[-5] - 1
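The per-column np.unique trick can be sanity-checked outside pipeline with synthetic arrays (the dates and values below are made up):

```python
import numpy as np

# Forward-filled asof_date and values for one asset: five quarterly
# reports, each repeated for a different number of trading days.
asof_date = np.repeat([1, 2, 3, 4, 5], [70, 60, 66, 64, 55])
values = np.repeat([100.0, 104.0, 99.0, 110.0, 120.0], [70, 60, 66, 64, 55])

# np.unique returns the index of the first occurrence of each date,
# which yields one value per reported quarter regardless of spacing.
_, unique_indices = np.unique(asof_date, return_index=True)
quarterly_values = values[unique_indices]
print(quarterly_values)  # [100. 104.  99. 110. 120.]

# Latest quarter vs the same quarter one year ago:
print(round(quarterly_values[-1] / quarterly_values[-5] - 1, 4))  # 0.2
```

Because the extraction keys on reporting dates rather than fixed day offsets, irregular gaps between 10-Q filings no longer matter.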


Awesome!! Works like a charm, thank you both!

Perhaps you can help me with another factor as well, but I'll start a new thread for that one (days since 52 week hi/lo)

Again, thank you so much! Any tips on how I can get better with classes, numpy, and CustomFactor?

Regarding Numpy (and other Scientific Python Packages), I have found the SciPy conference tutorials posted on YouTube to be helpful. See this link for the 2016 talk on Numpy https://www.youtube.com/watch?v=gtejJ3RCddE&t=3s.

Regarding classes, I used a couple courses from Udacity on Object Oriented Programming.

1. Programming Foundations with Python
2. Object Oriented Programming in Java (Note: while this course is in Java, it goes into a bit more detail on the concepts of object-oriented program design.)

Regarding the CustomFactor class, I would recommend:

1. The Quantopian API Documentation
2. Quantopian's Pipeline Tutorial (Lesson 10 deals with Custom Factors, but the whole tutorial is worth going through to get more familiar.)

@Joakim: One of our senior engineers, Scott Sanderson, pointed me to this video a couple of years ago: https://www.youtube.com/watch?v=EEUXKG97YRw. I remember finding it extremely helpful in terms of understanding vectorized computing in Python at the time.

Awesome! Thanks again! I'll check all of those out. Should keep me busy for a while.

Hi everybody,

Is anyone else getting pipeline time-outs using @Jamie's LastFourQuarters code?

I suspect the loading and processing cost is too high when using such a function.

Thank you
L

Hi Leo,

Yes, I have that problem.

This part of the code takes too much time:

for column_ix in range(asof_date.shape[1]):


The only alternative I know of at the moment is passing the axis parameter to numpy.unique, but that needs at least Numpy 1.13.0, while Quantopian only supports Numpy 1.11.1.

Have you solved the time-out issue?

Ignore the part of my previous message where I suggested an alternative; the axis parameter does not work as I had expected.
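For what it's worth, even on newer Numpy the axis parameter wouldn't have helped here: np.unique(a, axis=0) deduplicates whole rows, not each column independently. A tiny demo (requires Numpy >= 1.13):

```python
import numpy as np

asof = np.array([[1, 1],
                 [1, 1],
                 [2, 1]])

# axis=0 removes duplicate ROWS; it does not collapse each column
# to its own unique values.
print(np.unique(asof, axis=0).tolist())  # [[1, 1], [2, 1]]

# Per-column uniques have different lengths here ([1, 2] vs [1]),
# so they could not be returned as one rectangular array anyway.
```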

For now, I've tried my best to avoid looping over the assets. I retrieve the data by first creating an array of indexes and then extracting the data at those indexes, like so:

def get_periodical(data, t=251, num_periods=None):
    """ Get data on a periodical basis.
    For example, for annual data, sample every 251 trading days;
    num_periods is the number of years back from the latest date.
    """
    data = data[::t, :]
    if num_periods is not None:
        data = data[-num_periods:]
    return data

class CFOA(AnnualScale):
    inputs = [Fundamentals.free_cash_flow_asof_date,
              Fundamentals.free_cash_flow, Fundamentals.total_assets,
              Fundamentals.morningstar_sector_code]

    def compute(self, today, assets, out, asof_date, fcf, ta, groupby,
                num_years, clip_outliers, clip_threshold, scale_by):
        # Sum annual free cash flow over the last num_years years
        fcf_ann = get_periodical(fcf, num_periods=num_years)
        fcf_sum = np.sum(fcf_ann, axis=0)
        # Scale by the latest total assets (guard against division by zero)
        ta = ta[-1, :]
        ta[ta == 0] = 1
        out[:] = fcf_sum / ta


AnnualScale inherits from CustomFactor.

It will run into problems when:
1. The number of trading days per year is not exactly 251, and the algorithm samples dates right at the boundary between different asof_dates.
2. A company may have asof_dates with varying spacing. This may cause us to pick up incorrect data, especially from earlier years.

As you can see, this method is less robust than the one Jamie proposed above. The larger num_periods is, the larger the drift will be.
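As a quick sanity check of the slicing in get_periodical (redefined below so the snippet runs standalone; the numbers are synthetic):

```python
import numpy as np

def get_periodical(data, t=251, num_periods=None):
    """Sample every t-th row, keeping the last num_periods samples."""
    data = data[::t, :]
    if num_periods is not None:
        data = data[-num_periods:]
    return data

# 502 trading days x 2 assets of forward-filled data
data = np.arange(502 * 2, dtype=float).reshape(502, 2)
annual = get_periodical(data, t=251, num_periods=2)
print(annual.shape)  # (2, 2)
print(annual[:, 0])  # rows 0 and 251 were sampled: [0. 502.]

# Note the stride anchors at the FIRST row of the window, not the
# latest one, which is one source of the drift described above.
```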

Solved it. I used numpy.roll as a replacement for calling np.unique on each asset. The running time is stable now, even for longer time windows and the QTradableStocksUS filter.

def get_periodical(asof_date, values, period_skip=4, num_periods=2):
    # Prepare the mask; all True cells are unique values column-wise.
    # (asof_date_diff is computed here but not used further below.)
    # -------------
    asof_date_shifted = np.roll(asof_date, -1, axis=0)
    asof_date_diff = asof_date - asof_date_shifted
    # -------------
    # Delete rows whose values are all NaN
    values = values[~np.isnan(values).all(axis=1)]

    # Reorder step
    # -------------
    # At this point we may have values like so:
    # [[ 0.52974        nan  0.616352  0.804157]
    #  [      nan       nan  0.601933  0.821267]
    #  [ 0.532081       nan  0.618076  0.834409]]
    #
    # We want to reorder the values so we have:
    # [[      nan       nan  0.616352  0.804157]
    #  [ 0.52974        nan  0.601933  0.821267]
    #  [ 0.532081       nan  0.618076  0.834409]]
    values_bool = values.astype(bool)
    values_bool[np.isnan(values)] = False
    # Mergesort is stable, so it retains the ordering of equal elements
    sidx = values_bool.argsort(axis=0, kind='mergesort')
    values = values[sidx, np.arange(sidx.shape[1])]

    # Gather the correct values based on period_skip and num_periods
    # -------------
    # Say we have the following values:
    # [[      nan       nan  0.653645  0.834693]
    #  [ 0.522313       nan  0.575657  0.813225]
    #  [ 0.52974        nan  0.616352  0.804157]
    #  [ 0.534637       nan  0.601933  0.821267]
    #  [ 0.532081       nan  0.618076  0.834409]]
    #
    # For period_skip=2 and num_periods=2, we want the following result:
    # [[ 0.52974        nan  0.616352  0.804157]
    #  [ 0.532081       nan  0.618076  0.834409]]
    #
    # The line below does just that.
    values = np.flipud(values[:-num_periods * period_skip:-period_skip, :])
    return values
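The reorder step above is the subtle part; here is a standalone sketch of just that step (made-up numbers):

```python
import numpy as np

values = np.array([[0.52974,  np.nan, 0.616352],
                   [np.nan,   np.nan, 0.601933],
                   [0.532081, np.nan, 0.618076]])

# Cast to bool (NaN casts to True, so explicitly mark NaNs False),
# then stable-sort so NaNs float to the top of each column while
# the real values keep their original relative order.
values_bool = values.astype(bool)
values_bool[np.isnan(values)] = False
sidx = values_bool.argsort(axis=0, kind='mergesort')
reordered = values[sidx, np.arange(sidx.shape[1])]
print(reordered[:, 0])  # nan first, then 0.52974, 0.532081

# Caveat: a genuine value of exactly 0.0 also casts to False and
# would be pushed to the top along with the NaNs.
```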

def get_changes(data, t=1):
    """ Ratio of each row to the row t periods earlier (uses pandas as pd). """
    data = pd.DataFrame(data)
    prev_data = data.shift(t, axis=0)
    return data / prev_data

class GMean(CustomFactor):
    window_length = 252
    params = {
        'period_skip': 1,  # 4 for annual
        'num_periods': 2,  # 2 years minimum
    }

    def compute(self, today, assets, out,
                asof_date, values, period_skip, num_periods):
        periodic_values = get_periodical(asof_date, values,
                                         period_skip=period_skip,
                                         num_periods=num_periods)
        out[:] = gmean(periodic_values + 1, axis=0) - 1

class GMeanChanges(CustomFactor):
    window_length = 252
    params = {
        'period_skip': 1,  # 4 for annual
        'num_periods': 2,  # 2 years minimum
    }

    def compute(self, today, assets, out,
                asof_date, values, period_skip, num_periods):
        periodic_values = get_periodical(asof_date, values,
                                         period_skip=period_skip,
                                         num_periods=num_periods)
        # Drop the first row of NaNs produced by the shift
        changes = get_changes(periodic_values)[1:]
        out[:] = gmean(changes + 1, axis=0) - 1


GMean calculates the geometric mean based on x quarters/years of data, and GMeanChanges calculates the geometric mean of the growth rates of that data. (gmean here comes from scipy.stats.)
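To make the geometric-mean arithmetic concrete (the growth numbers below are made up):

```python
import numpy as np
from scipy.stats import gmean

# Two periods of growth rates (current/previous - 1) for one asset:
changes = np.array([0.10, 0.21])  # +10%, then +21%

# Compound via (1 + r), take the geometric mean, convert back:
g = gmean(changes + 1) - 1
print(round(g, 4))  # 0.1537, i.e. sqrt(1.10 * 1.21) - 1
```

Compounding at g for both periods reproduces the same total growth as the original sequence, which is why gmean is used instead of the arithmetic mean.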
