It's always a good idea to look at the raw data and the associated asof_dates before jumping into coding. This is true for fundamental data, and especially true for macro data such as the unemployment rate. That way one can better understand how often the data updates and what the lag time is.
The Quandl unemployment rate data from FRED is only updated monthly. The daily values over a given month are all the same and only change when a new monthly report is made public. It's not really accurate to average 3 months of daily data to get the 3 month 'monthly' average. There are two issues. First, if one month has 20 trading days and another has 21, the latter would be weighted more heavily in the average. Second, and this is the bigger issue, using an approximation of 63 trading days for a 3 month average may not always capture 3 complete months of data. More often, it will capture a portion of the latest month and a portion of the month 4 months back, which is not what is really intended.
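To see the issue concretely, here is a small standalone pandas sketch (the unemployment numbers are made up and the dates are arbitrary, chosen only to illustrate the averaging problem; publication lag is ignored). It compares the naive mean of the last 63 trading days of the forward-filled daily series against the mean of the actual last 3 monthly values:

import pandas as pd

# Made-up monthly unemployment prints, indexed by the first of each month
monthly = pd.Series(
    [4.0, 4.1, 4.3, 4.6, 5.0],
    index=pd.to_datetime(['2019-12-01', '2020-01-01', '2020-02-01',
                          '2020-03-01', '2020-04-01']),
)

# Forward-fill onto business days to mimic how the daily feed looks
trading_days = pd.bdate_range('2019-12-02', '2020-04-15')
daily = monthly.reindex(trading_days, method='ffill')

naive_3mo = daily.tail(63).mean()   # "3 months" approximated as 63 trading days
true_3mo = monthly.tail(3).mean()   # mean of the actual last 3 monthly values

# The two disagree: the 63 day window weights months by their trading day counts
# and, ending mid-month, reaches back into a fourth month
print(naive_3mo, true_3mo)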
What's the fix? Look at the asof_date for the data and only take a single value for each date. I typically use the last value in case there was an update. The last 3 unique values of asof_date will be the last 3 available months of data. Find the mean of those 3 values to get the 3 month average. To get the 3 month average for each month over the past year, one needs a 'rolling mean'. Fortunately, pandas dataframes have a method just for this.
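For anyone unfamiliar with it, a tiny illustration of the rolling method (toy numbers only). Note the first two entries are NaN because a full 3 value window isn't available yet:

import pandas as pd

# One value per month; rolling(3).mean() gives the 3 month average ending at each month
monthly_values = pd.Series([4.0, 4.1, 4.3, 4.6, 5.0])
print(monthly_values.rolling(3).mean())
# 0         NaN
# 1         NaN
# 2    4.133333
# 3    4.333333
# 4    4.633333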
Here is a custom factor to get both the latest 3 month average and the minimum 3 month average over the past 12 months. A single factor with two outputs saves time over two separate factors since much of the calculation is shared.
import pandas as pd

# Imports as they existed on Quantopian research (adjust paths if needed)
from quantopian.pipeline import CustomFactor
from quantopian.pipeline.data.quandl import fred_unrate


class UnemploymentRate3MoAvg(CustomFactor):
    inputs = [fred_unrate.value, fred_unrate.asof_date]

    # Ensure we have enough data for a year's worth of 3 month averages (plus a little more)
    window_length = 252 + (21*3) + (21*3)

    # Define the two outputs
    outputs = ['latest', 'lowest']

    def compute(self, today, asset_ids, out, values, asof_dates):
        # The unemployment rate is macro data, so every column (asset) holds the
        # same series. Work with the first column only.
        # Start by getting everything into a single dataframe.
        values_df = pd.DataFrame(values[:, 0], columns=['value'])
        dates_df = pd.DataFrame(asof_dates[:, 0], columns=['asof_date'])
        df = pd.concat([values_df, dates_df], axis=1)

        # Keep a single row per asof_date (the last, in case there was an update)
        df.drop_duplicates('asof_date', keep='last', inplace=True)

        # Get 3 month rolling averages of the monthly values
        rolling_means = df.value.rolling(3).mean()

        # Take only the most recent 12 months, then remove the latest month
        rolling_means_12mo = rolling_means.tail(12)
        rolling_means_ex_latest = rolling_means_12mo.head(-1)

        # Finally find the lowest mean value
        lowest_mean_value = rolling_means_ex_latest.min()

        # The latest value is simply the last rolling mean. The lowest is lowest_mean_value.
        out.latest[:] = rolling_means.tail(1)
        out.lowest[:] = lowest_mean_value
That is the bulk of what's needed to calculate the Sahm Index. There is a bit more explanation, as well as the rest of the calculations, all in pipeline, in the attached notebook.
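For reference, a rough sketch (not taken from the notebook) of how the factor above might be dropped into a pipeline and turned into the Sahm-style signal. The class name is the one used in the code above, the import paths, run_pipeline call, and date range are assumptions based on how Quantopian research worked, and the 0.50 percentage point trigger is the published Sahm rule threshold:

from quantopian.pipeline import Pipeline
from quantopian.research import run_pipeline

# Instantiate the two-output factor defined above
unrate_3mo = UnemploymentRate3MoAvg()

pipe = Pipeline(columns={
    'latest_3mo_avg': unrate_3mo.latest,
    'lowest_3mo_avg_12mo': unrate_3mo.lowest,
    # Sahm value: latest 3 month average minus the 12 month low of the 3 month average
    'sahm_value': unrate_3mo.latest - unrate_3mo.lowest,
    # Flag when the rise is at least 0.50 percentage points
    'recession_signal': (unrate_3mo.latest - unrate_3mo.lowest) >= 0.50,
})

result = run_pipeline(pipe, '2019-01-01', '2020-06-01')

Since the unemployment data is macro level, every asset gets the same values on a given date, so any single row per date carries the signal.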
Interesting direction trying to forecast the potential of a recession. Good luck!