
This feature is a big one! I can't tell you how many times someone has asked me: "I have a set of prices on my computer, can I use it in Quantopian?" or "Do you have this specific other data source in Quantopian?" Until today, the answer was no!

Fetcher lets you import any time-series data in CSV format into Quantopian. The data can be additional information about a stock already in Quantopian, such as the short interest for a list of stocks, or it can be completely stand-alone data, like the price of palladium.

Fetcher imports the data into a pandas DataFrame, where it is sorted by datetime. Fetcher lets you manipulate and format the imported DataFrame both before and after that time sort. Once the formatting is done, the DataFrame is used as a data source just like the price data Quantopian already provides.

This is incredibly powerful. Until today, you've only been able to code your algorithms against signals that you find in the price and volume stock data. Now, you can code your algorithm against any data source you can get your hands on! I particularly want to point out http://www.quandl.com/. They've put together 4 million datasets that you can use for free and access easily through Quantopian. The sample algo below loads the prices of palladium and gold from Quandl, as an example.

For the full explanation of the feature, you can read the help here and here.

Here is a quick primer. The generic code that you use for Fetcher:

fetch_csv(url, pre_func=None, post_func=None, date_column='date',  
           date_format='%m/%d/%y', timezone='UTC', symbol=None, **kwargs)  

Here is a simple implementation for a file shared on Dropbox:
fetch_csv('https://dl.dropbox.com/u/1/history.csv', date_column='date', date_format='%m/%d/%Y')

Here are the first few lines of that CSV, so you can see how the formatting works:

date,symbol,days_to_cover
2/28/2013,MTH,3.43719
2/15/2013,MTH,2.841933
1/31/2013,MTH,3.197315
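To show what happens to those rows, here's the gist in plain pandas (a sketch of the parse-and-sort step only, not Quantopian's actual implementation):

```python
import io
import pandas as pd

# The three sample rows above, parsed the way fetch_csv would
csv_text = """date,symbol,days_to_cover
2/28/2013,MTH,3.43719
2/15/2013,MTH,2.841933
1/31/2013,MTH,3.197315
"""

df = pd.read_csv(io.StringIO(csv_text))
# date_format='%m/%d/%Y' corresponds to dates like 2/28/2013
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')
# Fetcher sorts the frame by datetime before serving it to your algo
df = df.sort_values('date').set_index('date')
```

After the sort, the earliest row (1/31/2013) is the first one your algorithm sees.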


This is a truly excellent addition in functionality, well done guys.

Excellent, this is a wonderful addition, I will use it to pull VIX and SKEW prices, at least!

Just a reminder to everyone who is fetching economic data series -- as Jessica Stauth pointed out in NYC a week or two ago, these series are often time-stamped earlier than the date on which the data actually became available.

In her case, the NYSE short interest data time stamps were 8 days prior to when those figures were released.

In order to prevent look-ahead bias, you must adjust those time stamps!

Simon, that is a very good point. We work very hard to keep look-ahead bias out of the system, but Fetcher does permit it to creep back in if you're not careful.

One example: say you import unemployment as your signal, and you enter November's unemployment as Nov 1, 2012 - that's look-ahead bias, because you didn't know the unemployment figure then! You have to import it keyed on the data's release date - the first date you could have traded on the information.
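In pandas terms, a post_func that pushes the timestamps forward by the reporting lag might look like this (a sketch in plain pandas, not Quantopian's implementation; the 8-day lag matches Simon's short-interest example above):

```python
import pandas as pd

def shift_to_release_date(df, lag_days=8):
    # Push each timestamp forward by the reporting lag so the
    # simulation only sees a value once it was actually published
    df = df.copy()
    df.index = df.index + pd.Timedelta(days=lag_days)
    return df

# Short-interest figures stamped 8 days before they were released
raw = pd.DataFrame(
    {'short_interest': [3.2, 2.8]},
    index=pd.to_datetime(['2013-01-31', '2013-02-15']),
)
adjusted = shift_to_release_date(raw)
```

A figure stamped Jan 31 only becomes visible to the algorithm on Feb 8, the first date it was actually tradable.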

I'm going to make a point to cover this new feature in some detail in today's webinar. If you're looking for a deeper run-through, sign up here.

Yeah, it can be more subtle than that: most economic release series carry their own dates, which correspond to the dates of the data themselves rather than to the dates of release. Worse, these data are often revised, with past values updated in the future, so if you backtest over the series your algo might work great against the revised figures - but those revisions were future data.

EDIT: still an awesome feature!

This is one of the best and most necessary updates I've seen. Last I checked, there were only certain approved Python libraries that we could reference. Is there a list somewhere?

Hi guys,

I completely agree about this great new feature--thank you.

In testing, I cloned this algorithm and get the following when I build:

Something went wrong on our end and we've been notified about it. Sorry for the inconvenience.  
LookupError: unknown encoding: idna  
File test_algorithm_sycheck.py:17, in initialize  
File algoproxy.py:882, in fetch_csv  
File requests_csv.py:168, in __init__  
File requests_csv.py:203, in load_df  
File requests_csv.py:194, in fetch_data  
File api.py:55, in get  
File api.py:44, in request  
File sessions.py:276, in request  
File models.py:221, in prepare  
File models.py:290, in prepare_url  

I know you know about it (via that message) so I should probably not be redundant here. Is this a short or long fix? No pressure.

@gary, you can see the whitelist here: https://www.quantopian.com/help#ide-module-import

Oh wow perfect. statsmodels was the main one that I was looking for.

@ryan, thanks for the heads up, checking...

@ryan, the problem is fixed and Fetcher works again. Sorry for the inconvenience!

This looks very interesting... at a glance it looks like you're sticking with US equities, but if you let us fetch OHLC data from someone like Dukascopy we could run algorithms on FX too :)

Great new feature! Any possibility of adding the Quandl python package to the whitelist?

Hey Fawce and Dan--thanks for the fix. It's working for me and I am exploring the capabilities (like pre_func, post_func and how to use multiple columns in a remote DataFrame). I'll update my other post/thread with my working examples once I get to a good point.

@Ryan, thanks. Please let me know if you have questions as you kick the tires.

@George, quandl did a lot of great work on their API, and it fits in perfectly with Fetcher. We were just talking about adding it, and have a question for you. To guard against lookahead bias in the simulation, we'd restrict quandl to the initialize method, and provide a way for you to push the dataframe you receive into the algo's data sources. That way you'll get all the quandl goodness plus our data alignment, batching, transforms, and so on for the data. Would you be happy with that?

thanks,
fawce

Not only is this awesome, but I've been looking for something like Quandl for ages!

fawce,

Glad to hear you are thinking about it - I think what you are proposing would work. To be honest, I just thought it was nice and simple to use the quandl get function:

import Quandl as ql
vxv = ql.get('CBOEFE/INDEX_VXV')

But now that I think about it more, I wonder if Quandl is going to be happy with all the Quantopians adjusting parameters of their algos all day, reloading the data for every backtest. Maybe Quantopian can get a mega-authorization key or something like that. (Hope I'm not opening a can of worms...)

I love what you guys are doing!

thanks,
george

I've tried a lot but will not post my mess of non-working code.

Instead, would it be possible to see an example of using Fetcher without the 'symbol' keyword, so that it can get a .csv file containing derived signals, something like:

date,GOOG,IBM  
2008-05-14, 0.012, 0.3  
2008-05-15, 0.011, 0.21  
2008-05-16, 0.017, 0.4  
2008-05-19, 0.011, 0.1  
...

using a call like

fetch_csv('http://someurl.org/sig.csv', date_column='date', date_format='%Y-%m-%d')  

All the examples I have seen use the 'symbol' keyword in fetch_csv().

Then, in handle_data(), I want to use a stock's price together with its associated signal: data[cur_sid].price (the normal thing, with price from sid()) and data[cur_sid].sig (from my uploaded .csv file). I realize that I need to introduce the 'sig' column name somewhere, a la the 'post_func=rename_col' in the Fetcher example, but I just cannot seem to figure it out.
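For concreteness, my rough understanding of what such a post_func does, in plain pandas (the column names here are just guesses based on the example files):

```python
import pandas as pd

def rename_col(df):
    # Hypothetical post_func: expose the signal column under the
    # attribute name the algorithm will read, e.g. data[sid].sig
    return df.rename(columns={'days_to_cover': 'sig'})

fetched = pd.DataFrame({'days_to_cover': [3.4, 2.8]})
renamed = rename_col(fetched)
```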

A separate issue: it would be ideal to have the ability to use the column names in my .csv file (GOOG, IBM in my example above) as a way to populate my portfolio in initialize() with all the appropriate sids.

This way, my single uploaded sig.csv file carries all the ticker info so I do not have to actually type sid() or their numbers. I can set up the code on Quantopian and control which stocks are tested simply by changing the .csv file on my server (or specifying a different .csv file).

Are such things possible?

Thanks,
Ryan

George,

Quandl actually gave us just such a mega auth key! Whenever you run Fetcher with a quandl.com URL, we auth it with Quantopian's key. The Q+Q combo has struck a chord, so maybe it would be cool to add a new builtin 'fetch_quandl' so you could just do:

fetch_quandl('CBOEFE/INDEX_VXV')  

instead of

fetch_csv('http://www.quandl.com/api/v1/datasets/CBOEFE/INDEX_VXV.csv')  

and in the background, we'll use the quandl api.

thanks,
fawce

Hi Ryan,

Here you are: https://www.quantopian.com/posts/ranking-and-trading-on-days-to-cover
You should be able to grab the csv I'm using to see the structure of the file and so forth.

thanks,
fawce

Thanks for the big example, Fawce. That is very helpful, and I'm looking forward to adapting it. I see the big difference: instead of a single .csv file shaped like an N+1-column DataFrame, it is a stack of three-column (date, symbol, value) rows covering however many dates you want. Got it.
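In pandas terms, reshaping my wide file into that stacked layout would be something like this (plain pandas, done outside Quantopian before uploading the .csv; the column names come from my example above):

```python
import io
import pandas as pd

# The wide, per-ticker layout from my earlier post
wide_csv = """date,GOOG,IBM
2008-05-14,0.012,0.3
2008-05-15,0.011,0.21
"""

wide = pd.read_csv(io.StringIO(wide_csv))
# Stack the per-ticker columns into (date, symbol, sig) rows,
# the format the symbol-keyed fetch_csv examples use
stacked = wide.melt(id_vars='date', var_name='symbol', value_name='sig')
```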

Do I understand correctly that there is a limit of 10 securities in a portfolio?

Thanks again,
Ryan

Ryan,

My algo limits the portfolio to 20% of the universe (top and bottom deciles). That works out to be about 100 securities for the universe I chose.

thanks,
fawce

Great, thanks for your patience and explanations.

Ryan

Is it possible to import a time series of minute data using fetch_csv? The examples above all appear to be daily data. If so, what format should the dates and times be given in?

Thanks,
David

Yes David, you can. Within your fetch_csv you'd need to specify it. Something like this:

        date_format='%Y-%m-%d %H:%M:%S'  
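For instance, a row stamped 2013-04-01 09:31:00 parses with that format string (a quick check in plain pandas; the fetch_csv call itself would just pass date_format as shown above):

```python
import pandas as pd

# A minute bar carries date and time together; this format string
# parses both into a single timestamp
stamp = pd.to_datetime('2013-04-01 09:31:00', format='%Y-%m-%d %H:%M:%S')
```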

Hi Dan,

Thanks for building out this new feature, and it was a pleasure meeting you in person at the last Quantopian meetup (you may remember me as the annoying person who asked a few questions about how you guys secure your site against malicious Python attacks). I was wondering: is it on the project timeline for Quantopian to open source Fetcher?

The reason being that a lot of people (meaning at least me) are trying to use Zipline as a way to backtest with external data sources (in SQL/NoSQL), e.g., historical options/forex data. Fetcher would be an excellent tutorial for folks like me who are trying to build a customized data source. I promise that if a good example of how a custom source is built becomes available, I'll upload my own custom data source to Github. Thanks for your team's excellent effort in building a great backtester both online (Quantopian) and offline (Zipline). Until then, I'll try to figure it out via PyDev and trial-and-error.


Hi,
I found that this snippet can be used to set headers in case a file has none. But how do you define a different delimiter, like ';' instead of ','?

def set_headers(df):  
    header_row = ['Date','Open','High','Low', 'Settle', 'Volume', 'Prev. Day Open Interest']  
    df.columns = header_row  
    return df  

It's passed as pre_func=set_headers, similar to the post_func=rename_col usage above.
J.
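In plain pandas the answer would be the sep= argument; whether fetch_csv forwards its extra keyword arguments to pandas.read_csv (as the **kwargs in its signature suggests it might) is my assumption:

```python
import io
import pandas as pd

semicolon_csv = "Date;Open;High;Low\n2013-04-01;10;11;9\n"

# pandas reads an alternate delimiter via sep=. If fetch_csv forwards
# its **kwargs to pandas.read_csv, sep=';' would be the argument to pass.
df_semi = pd.read_csv(io.StringIO(semicolon_csv), sep=';')
```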
