Need Help: Pipeline, Quandl Data & Custom Factors

Hi everyone,

I posted this a few months ago and didn't get any responses, I think because I originally posted incomplete code that I needed help with. I've since done some digging, worked on it more, and gotten further along. Any thoughts?

A little more background:
I am trying to recreate a version of the Sahm Index from FRED that can be referenced in the pipeline, so I can potentially adjust portfolio weights when a recession looks likely. I created a version of this in a research notebook that pulled the data from elsewhere; however, I realized that not only does Quandl work differently in the IDE than in research, but pipelines also handle outside data like Quandl very differently than notebooks do. So below is my effort to find the middle ground.

It's a jumping-off point for some recession data getting in the mix of some future algos, which I find interesting to test against historical data as well as the very recent economic climate.
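For anyone new to the indicator: the Sahm rule fires when the 3-month average unemployment rate rises 0.50 percentage points or more above its low over the previous 12 months. A minimal plain-pandas sketch of that arithmetic (synthetic numbers standing in for the FRED UNRATE series):

```python
import pandas as pd

# Hypothetical monthly unemployment rates (percent); the real series
# would come from FRED's UNRATE.
unrate = pd.Series(
    [3.6, 3.5, 3.5, 3.6, 3.5, 3.6, 3.7, 3.5, 3.6, 3.6, 3.5, 4.4, 14.7],
    index=pd.date_range("2019-04-01", periods=13, freq="MS"),
)

# 3-month moving average of the unemployment rate
ma_3m = unrate.rolling(3).mean()

# Sahm Index: latest 3-month average minus the low of the 3-month
# averages over the prior 12 months (current month excluded)
sahm = ma_3m.iloc[-1] - ma_3m.iloc[:-1].tail(12).min()

# A reading of 0.50 percentage points or more signals a likely recession onset
recession_signal = sahm >= 0.50
```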

For reference:
Sahm Index FRED

The first place I heard about it:
NPR – The Eponymous Economist

I'm stoked I got this far with the help of the following forums:

Spy on FRED Pipeline Syntax Help
How to Implement the Moving Average of a Pipeline Factor
How to Utilize FRED Datasets in Pipeline
Custom Factor with Boolean Values

However, I'm sure this is not the cleanest code, and there may be better methods of achieving the same result, so I'm open to suggestions. Also, I'm pretty new, so if I made a huge mistake somewhere, some critiquing would be awesome as well.


I think this should do it. Please try.

import numpy as np

class SahmIndex(CustomFactor):  
    # ~252 trading days approximates 12 months of daily data  
    window_length = 252  
    def compute(self, today, asset_ids, out, unrate):  
        # ~63 trading days approximates the latest 3 months  
        ma_3m = np.nanmean(unrate[-63:], axis=0)  
        min_12m = np.min(unrate, axis=0)  
        out[:] = ma_3m - min_12m  
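For readers outside the Quantopian IDE, the factor's arithmetic can be sanity-checked on a synthetic daily array (the shapes are an assumption about what pipeline passes to `compute`):

```python
import numpy as np

# Pipeline hands compute() a (window_length x n_assets) array; macro data
# repeats the same value across every asset column.
unrate = np.full((252, 2), 3.5)
unrate[-63:] = 4.0  # the rate jumps to 4.0% over the last ~3 months

# Same arithmetic as the factor above
ma_3m = np.nanmean(unrate[-63:], axis=0)  # ~3-month daily average
min_12m = np.min(unrate, axis=0)          # 12-month daily low
sahm = ma_3m - min_12m                    # 0.5 for every asset column
```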

@Nadeem did you just write 6 lines of code for what took this gentleman months of work?

And I found his code complex

@Nadeem Ahmed

Thank you so much for helping me clean up the Custom Factor. I was having trouble applying basic computations inside the CustomFactor, which led to a lot of workarounds, so this is super helpful. The only piece that is missing: the 12-month low should be the low of the 3-month averages, not the 12-month low of the raw series, and it also has to exclude the current month's data point. I was having trouble calculating that within the custom factor, which is why in the original I opted to move that calculation into the pipeline itself. I'm going to mess around with the code you sent and see if I can add onto it; if you find another way before me, please let me know. Thanks!
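One way to see why this distinction matters: a single-month dip pulls the raw 12-month low below the low of the smoothed series, which overstates the index. A quick sketch with made-up monthly numbers:

```python
import pandas as pd

# Made-up monthly rates with a one-month dip to 3.0%
rates = pd.Series([3.6, 3.5, 3.0, 3.6, 3.5, 3.6, 3.7,
                   3.5, 3.6, 3.6, 3.5, 3.6, 3.8])

ma_3m = rates.rolling(3).mean()

# Low of the raw series vs. low of the 3-month averages,
# excluding the current month in both cases
raw_low = rates.iloc[:-1].tail(12).min()       # hits the 3.0 dip
smoothed_low = ma_3m.iloc[:-1].tail(12).min()  # the dip is averaged out
```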

@Octavian N

Yes, he pulled a lot of my calculation into the Custom Factor instead of relying on the pipeline. A lot of the code is extraneous as well; it just exists so you can see all of the data. I'm VERY new, so I'll be slow and clunky at this. Nadeem's cleanup was very helpful. Could you let me know what was overly complex in my code so I can try to clean it up more? Thanks!

I'm newer than you, my friend. I'm still getting surprised at a lot of the things I learn. Thank you for sharing.

It's always a good idea to look at the raw data, and associated asof_dates, before jumping into coding. This is true for fundamental data, and especially true for macro data such as the unemployment rate. That way one can better understand how often the data updates and what the lag time is.

The quandl unemployment rate data from FRED is only updated monthly. The daily values over a given month are all the same and only change when a new monthly report is made public. It's not really accurate to average 3 months of daily data to get the 3-month 'monthly' average. There are two issues. First, if one month has 20 trading days and another 21, the latter would be weighted more when averaging. Second, and this is the bigger issue, using an approximation of 63 trading days for a 3-month average may not always capture 3 complete months of data. More often, it will get a portion of the latest month and a portion of the month 4 months ago. Not what is really intended.

What's the fix? Look at the asof_date for the data and only take a single value for each date. I typically use the last value in case there was an update. The last 3 unique values of asof_date will be the last 3 available months of data. Find the mean of these 3 values to get the 3 month average. To get the 3 month average each month for the past year, one needs to implement a 'rolling mean'. Fortunately, pandas dataframes have a method just for this.
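The dedup-then-roll approach described above can be sketched in plain pandas (synthetic daily data standing in for the Quandl feed):

```python
import numpy as np
import pandas as pd

# Each month's rate repeats for ~21 trading days; asof_date marks the month
months = pd.date_range("2019-01-01", periods=14, freq="MS")
rates = [4.0, 3.8, 3.8, 3.6, 3.6, 3.7, 3.7,
         3.7, 3.5, 3.6, 3.5, 3.5, 3.6, 3.5]
df = pd.DataFrame({
    "asof_date": np.repeat(months, 21),
    "value": np.repeat(rates, 21),
})

# One row per month: keep the last daily value for each asof_date
monthly = df.drop_duplicates("asof_date", keep="last")

# Rolling 3-month mean over the monthly values
rolling_means = monthly["value"].rolling(3).mean()

latest = rolling_means.iloc[-1]                  # latest 3-month average
lowest = rolling_means.iloc[:-1].tail(12).min()  # lowest over the prior 12 months
```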

Here is a custom factor to get both the latest 3 month average and the minimum 3 month average over the past 12 months. A single factor with two outputs saves time over two separate factors since much of the calculations for each are the same.

class Average_Unemployment_Rate(CustomFactor):  
    inputs = [fred_unrate.value, fred_unrate.asof_date]

    # Ensure we have enough data for a year's worth of 3 month averages (plus a little more)  
    window_length = 252 + (21*3) + (21*3)

    # Define the two outputs  
    outputs = ['latest', 'lowest']

    def compute(self, today, asset_ids, out, values, asof_dates):  
        # Start by getting everything into a single dataframe  
        values_df = pd.DataFrame(values, columns=['value'])  
        dates_df = pd.DataFrame(asof_dates, columns=['asof_date'])  
        df = pd.concat([values_df, dates_df], axis=1)

        # Remove duplicates to get unique dates  
        df.drop_duplicates('asof_date', keep='last', inplace=True)

        # Get 3 month averages of the values  
        rolling_means = df.rolling(3).value.mean()

        # Take only the most recent 12 months then remove the last month  
        rolling_means_12mo = rolling_means.tail(12)  
        rolling_means_ex_latest = rolling_means_12mo.head(-1)

        # Finally find the lowest mean value  
        lowest_mean_value = rolling_means_ex_latest.min()

        # The latest value is simply the last rolling mean. The lowest is lowest_mean_value  
        out.latest[:] = rolling_means.tail(1)  
        out.lowest[:] = lowest_mean_value

That is the bulk of what's needed to calculate the Sahm Index. There is a bit more explanation, as well as the rest of the calculations, all in pipeline, in the attached notebook.

Interesting direction trying to forecast the potential of a recession. Good luck!

