Custom Data Source for use in Pipeline

I am looking to add a custom data source as an input to a custom factor, creating an additional pipeline column in Research. I am using custom-made data at this point; I am mainly trying to get the process of adding custom data as a pipeline column right, so I can replicate it with different data. I have reviewed the forums and have not found anyone else with this use case, but I expect it is something lots of people could leverage.

Attached notebook for review.

Thank you!
Adam

Hi,
Could you please be more specific about what exactly you are trying to do, which data you are trying to add from the CSV file, and what results you expect to get out of the pipeline?

Hi Has,

I am looking to add the 'Value' column from the CSV as a column in the pipeline (I realize the data would be the same for every symbol). The data is housing index data from Quandl. In the end, I want to use the housing index data in a predictive model alongside other data, to see whether the housing data is predictive of stock returns.
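For reference, the loading step looks roughly like this. The file name and the 'Date'/'Value' column names follow my Quandl export, so treat them as placeholders:

```python
import pandas as pd

# Load the Quandl housing index export and index it by date.
housing = pd.read_csv('housing_index.csv', parse_dates=['Date'])

# 'Value' is the housing index level I want as a pipeline column.
housing_series = housing.set_index('Date').sort_index()['Value']
print(housing_series.head())
```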

The code below is what I am using; however, I am having trouble using the housing data as an input to a pipeline custom factor.

As I think this through, perhaps a custom factor is not the best approach for getting the housing data into the pipeline; maybe it would be better to simply add the column via pipe.add. When I do that, though, I get the following error: TypeError: 'Series' object is not callable. I am unsure whether the data has to be a pandas multi-index DataFrame to be compatible with adding to a pipeline.
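For illustration, this is the shape of what I tried (a simplified sketch, not my exact notebook code; housing_series is the date-indexed 'Value' column from the CSV):

```python
from quantopian.pipeline import Pipeline

pipe = Pipeline()

# Passing a raw pandas Series where pipeline expects a term
# (Factor/Filter/Classifier) -- this is the line that fails for me:
pipe.add(housing_series, 'housing')
# TypeError: 'Series' object is not callable
```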

What are your thoughts?

Thank you!
Adam

Hi,
I think you are headed in the right direction. Pipeline output is a multi-indexed pandas DataFrame; perhaps after you have your pipeline table you can add the column from the CSV file. Here is a nice overview of how to merge different pandas DataFrames: http://pandas.pydata.org/pandas-docs/stable/merging.html
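As a toy example of the kind of merge described there (the frames and column names are made up for illustration):

```python
import pandas as pd

left = pd.DataFrame({'Date': ['2016-01-04', '2016-01-05'],
                     'returns': [0.011, -0.020]})
right = pd.DataFrame({'Date': ['2016-01-04', '2016-01-05'],
                      'Value': [181.3, 181.3]})

# Combine the two frames on their shared 'Date' column.
result = pd.merge(left, right, on='Date')
```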

Hi Has,

I tried to concat the DataFrame I created from the housing data with the pipeline output. It worked to some degree, but since I am joining an ordinary two-dimensional DataFrame to a hierarchical DataFrame that lists the equity object's name next to the date in each row, the rows were simply appended to the bottom of the DataFrame.

I thought about generating my own multi-indexed DataFrame, but I doubt that would help, as I would run into the same issue.

I think the ideal situation would be to add the dataset via a custom factor before the pipeline runs. Unfortunately, I have yet to find documentation on how to do this (what format does the data need to be in for the pipeline to run?). In the IDE we can simply add data via fetch_csv, but local_csv does not attach a stock symbol to the data object when there is no corresponding SID.

How do you think I could re-approach this via a custom factor?
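Something along these lines is what I am picturing, though I have not gotten it to work yet. I am assuming compute() can simply close over a date-indexed pandas Series (housing_series from my earlier snippet) and broadcast its latest value across all assets:

```python
import numpy as np
import pandas as pd
from quantopian.pipeline import CustomFactor, Pipeline

class HousingIndex(CustomFactor):
    # No pipeline inputs -- the data comes from the external Series.
    inputs = []
    window_length = 1

    def compute(self, today, assets, out):
        # `today` is a tz-aware UTC timestamp in pipeline, while my CSV
        # index is naive, so strip the timezone before the lookup.
        day = today.tz_localize(None)
        # Most recent housing value on or before `today`.
        idx = housing_series.index.asof(day)
        out[:] = np.nan if pd.isnull(idx) else housing_series[idx]

pipe = Pipeline()
pipe.add(HousingIndex(), 'housing')
```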

Thanks
Adam

Hi,
OK, this is happening because the index of your pipeline DataFrame doesn't match the CSV. I think a very easy solution is: after you have your DataFrame from the pipeline, reset its index using name_of_your_pipeline_dataframe.reset_index(inplace=True). After this you can use the pandas join function to combine the two DataFrames on a common column: result = left.join(right, how='outer'). Here left and right are the two separate DataFrames, and how='outer' keeps the rows from both sides of the join. (Note: it is important that the values in the common column match; for example, if the common column holds dates, make sure both DataFrames use exactly the same date format.) You can see more information at the link I posted above.
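A minimal sketch of that recipe, assuming results is the frame returned by run_pipeline and housing_df is the frame loaded from the CSV with a 'Date' column:

```python
import pandas as pd

# Flatten the pipeline's (date, asset) multi-index into columns. The
# levels are unnamed, so they come out as 'level_0' and 'level_1'.
results.reset_index(inplace=True)
results = results.rename(columns={'level_0': 'Date', 'level_1': 'Asset'})

# Pipeline dates are tz-aware (UTC) while a CSV is usually naive --
# align the types so the join keys actually match.
results['Date'] = results['Date'].dt.tz_localize(None)
housing_df['Date'] = pd.to_datetime(housing_df['Date'])

# Attach the housing value to every (date, asset) row; 'outer' would
# also keep CSV dates that have no pipeline rows.
combined = results.join(housing_df.set_index('Date'), on='Date', how='left')
```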