Need Help: Pipeline, Quandl Data & Custom Factors

Hi everyone,

I posted this a few months ago and didn't get any responses, I think because I originally posted incomplete code that I needed help with. I've since done some digging, worked on it more, and gotten farther along. Any thoughts?

A little more background:
I am trying to recreate a version of the Sahm Index from FRED that can be referred to in the pipeline, to possibly change the portfolio weights based on if there is a potential recession. I created a version of this in a research notebook that pulled the data from elsewhere, however, I realized that not only does Quandl work differently in the IDE than in research but pipelines work very differently with outside data like Quandl than in the notebooks. So below is my effort to find the middle ground.

It's a jumping-off point for some recession data getting in the mix of some future algos, which I find interesting to test against historical data as well as the very recent economic climate.

For reference:
Sahm Index FRED

The first place I heard about it:
NPR – The Eponymous Economist

I'm stoked I got this far with the help of the following forums:

However, I'm sure this is not the cleanest code, and there may be better methods out there for achieving the same result, so I'm open to suggestions. Also, I'm pretty new, so if I made a huge mistake somewhere, some critiquing would be awesome as well.


I think this should do it. Please try.

import numpy as np

# Assumes the FRED unemployment-rate dataset from Quandl, e.g.:
# from quantopian.pipeline.data.quandl import fred_unrate

class SahmIndex(CustomFactor):
    inputs = [fred_unrate.value]
    window_length = 252

    def compute(self, today, asset_ids, out, unrate):
        # 3-month average approximated by the last ~90 rows,
        # minus the minimum over the full 12-month window
        ma_3m = np.nanmean(unrate[-90:], axis=0)
        min_12m = np.nanmin(unrate, axis=0)
        out[:] = ma_3m - min_12m


@nadeem did you just write 6 lines of code for what took this gentleman months of work?

And I found his code complex.

Thank you so much for helping me clean up the custom factor. I was having trouble applying basic computations inside the CustomFactor, which led to a lot of workarounds, so this is super helpful. The only piece that's missing: the 12-month low should be the low of the 3-month averages, not the 12-month low in general, and it also has to exclude the current month's data point. I was having trouble calculating that within the custom factor, which is why in the original I opted to move that calculation into the pipeline itself. I'm going to mess around with the code you sent and see if I can add onto it; if you find another way before me, please let me know. Thanks!

@Octavian N

Yes, he pulled a lot of my calculation into the Custom Factor instead of relying on the pipeline. A lot of the code is extraneous as well, it just exists so you can see all of the data. I'm VERY new so I'll be slow and clunky at this. Nadeem's cleanup was very helpful. Could you let me know what was overly complex in my code so I can try to clean it up more? Thanks!

I'm newer than you, my friend. I'm still getting surprised at a lot of the things I learn. Thank you for sharing.

It's always a good idea to look at the raw data, and the associated asof_dates, before jumping into coding. This is true for fundamental data, and especially true for macro data such as the unemployment rate. That way one can better understand how often the data updates and what the lag time is.

The Quandl unemployment rate data from FRED is only updated monthly. The daily values over a given month are all the same and only change when a new monthly report is made public. It's not really accurate to average 3 months of daily data to get the 3-month "monthly" average. There are two issues. First, if one month has 20 trading days and another 21, then the latter is weighted more when averaging. Second, and this is the bigger issue, using an approximation of 63 trading days for a 3-month average may not always get 3 complete months of data. More often, it will get a portion of the latest month and a portion of the month 4 months back. Not what is really intended.
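To see those two issues concretely, here's a toy illustration with made-up numbers: four months of forward-filled daily data where a fixed 63-day window reaches back past the third month and weights the months unevenly.

```python
import pandas as pd

# Simulate forward-filled daily data: four months with 20, 21, 22,
# and 21 trading days, each month repeating one constant value
months = [("A", 20, 4.0), ("B", 21, 4.1), ("C", 22, 4.2), ("D", 21, 4.3)]
daily = pd.Series(
    [val for _, ndays, val in months for _ in range(ndays)]
)

# Naive approach: average the last 63 daily observations.
# The last three months total 64 trading days, so this window
# drops a day of month B and weights C (22 days) more than B (20)
naive_3mo = daily.tail(63).mean()

# Correct approach: one value per month, then average the last three
monthly = pd.Series([4.1, 4.2, 4.3])
true_3mo = monthly.mean()
```

The naive average comes out slightly high because the longer, later months dominate the window; with real data the error direction depends on which months get over-weighted.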

What's the fix? Look at the asof_date for the data and only take a single value for each date. I typically use the last value in case there was an update. The last 3 unique values of asof_date will be the last 3 available months of data. Find the mean of these 3 values to get the 3 month average. To get the 3 month average each month for the past year, one needs to implement a 'rolling mean'. Fortunately, pandas dataframes have a method just for this.

Here is a custom factor to get both the latest 3 month average and the minimum 3 month average over the past 12 months. A single factor with two outputs saves time over two separate factors since much of the calculations for each are the same.

import pandas as pd

# Assumes fred_unrate is the Quandl FRED dataset, e.g.:
# from quantopian.pipeline.data.quandl import fred_unrate

class Average_Unemployment_Rate(CustomFactor):
    inputs = [fred_unrate.value, fred_unrate.asof_date]

    # Ensure we have enough data for a year's worth of 3-month averages (plus a little more)
    window_length = 252 + (21*3) + (21*3)

    # Define the two outputs
    outputs = ['latest', 'lowest']

    def compute(self, today, asset_ids, out, values, asof_dates):
        # Start by getting everything into a single dataframe
        values_df = pd.DataFrame(values, columns=['value'])
        dates_df = pd.DataFrame(asof_dates, columns=['asof_date'])
        df = pd.concat([values_df, dates_df], axis=1)

        # Remove duplicates to get one row per unique asof_date
        df.drop_duplicates('asof_date', keep='last', inplace=True)

        # Get 3-month averages of the values
        rolling_means = df.rolling(3).value.mean()

        # Take the most recent 12 months, excluding the latest month
        rolling_means_ex_latest = rolling_means.tail(13).head(12)

        # Finally find the lowest mean value
        lowest_mean_value = rolling_means_ex_latest.min()

        # The latest value is simply the last rolling mean. The lowest is lowest_mean_value
        out.latest[:] = rolling_means.iloc[-1]
        out.lowest[:] = lowest_mean_value



That is the bulk of what's needed to calculate the Sahm Index. There is a bit more explanation, as well as the rest of the calculations, all in pipeline, in the attached notebook.

Interesting direction trying to forecast the potential of a recession. Good luck!
