Different kinds of data sources?

Hi folks,

At Quantopian world headquarters, we've been having some interesting discussions about the different kinds of data sources it might be useful to make available to algorithms, and what the use of those data sources would look like within an algorithm's code. We'd love to hear other people's thoughts about this.

Right now, we've only got one data source, which gives you the closing price for each time interval (minute, for full backtests, or day, for quick backtests run in the IDE).

One of our users suggested that it would be interesting to have additional data sources representing commonly used algorithm factors (a.k.a. signals). We think that's a neat idea. What factors would you like to see available within your algorithm as a data source?

The next question is, assuming that additional data sources such as factors are available in your algorithm, how would you like to get them?

Would you like all data sources to continue to go through handle_data, with a new field added to every event to show what kind of source it is? Different kinds of events would have different fields, so you'd have to write handle_data to look at fields conditionally based on the type of source.

Or would you like to be able to write different handle_* functions for the different data sources, with function decorators indicating what source should be fed into each function?

Or is there some other way you think it would be more useful to get the data?
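
To make the first two options concrete, here are rough sketches in Python. The 'source' field, the handler names, and the 'handles' decorator are all hypothetical; nothing here is an existing API, it's just to illustrate the shape of each approach.

    # Sketch of option 1: everything still flows through handle_data, and each
    # event carries a hypothetical 'source' field saying where it came from.
    def handle_data(context, data):
        for event in data.values():
            if event.get('source') == 'price':
                pass  # act on a closing-price bar
            elif event.get('source') == 'factor':
                pass  # act on a factor/signal event

    # Sketch of option 2: one handler per source, wired up by a decorator.
    # The decorator below is a toy stand-in that just tags the function.
    def handles(source):
        """Toy decorator that tags a handler with the source it wants."""
        def wrap(fn):
            fn.source = source
            return fn
        return wrap

    @handles('price')
    def handle_price(context, event):
        pass  # would receive only price events

    @handles('factor')
    def handle_factor(context, event):
        pass  # would receive only factor/signal events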


10 responses

Hello Jonathan,

Is the idea that Quantopian would supply the additional data? Or would users be able to upload additional data sets and provide links to external data sources/feeds?

Seems to me that at a high level, the additional data would just need to be time-stamped appropriately for compatibility with the present backtester (and future live trading and any flexible research tools you have in mind). Beyond that, the data could be anything...minutely/daily tick data from other markets, news & weather feeds, government agency data, images, video, audio, etc.

Certain external events (e.g. news) can be backtested simply by giving the user an efficient tool to submit orders of specified SIDs at specified times, based on a list supplied by the user. The user would just provide the list of orders; the analysis to determine the list would be done offline by the user. Here's a somewhat dated news article on trade-on-news: http://www.nytimes.com/2010/12/23/business/23trading.html.
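
Just to illustrate what I mean (all names and numbers are made up), the replay could be as simple as:

    from datetime import datetime

    # Hypothetical user-supplied order list built offline from news analysis:
    # (timestamp, sid, shares). The algorithm just replays it.
    ORDER_LIST = [
        (datetime(2012, 11, 5, 14, 30), 24, 100),    # buy 100 shares of sid 24
        (datetime(2012, 11, 6, 15, 0), 24, -100),    # sell them the next day
    ]

    def handle_data(context, data):
        now = get_datetime()          # assumed helper returning simulation time
        for when, sid, shares in ORDER_LIST:
            if when == now:
                order(sid, shares)    # assumed order API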

Grant

Hi Jonathan,
When you say "signals", are you talking about technical indicators? If that's the case, I think we can calculate those ourselves.

It would be interesting to have fundamental data (to make accounting quantitative?). I am not sure whether that would cost you guys a fortune, though.

Thanks,
Tim

@Grant The data sources we will enable you to use in the future will fall into three categories:

  1. data we provide and pay for;
  2. interfaces to third-party providers for which you will need to have your own account (e.g., you tell Quantopian your username and password for the third-party service, and we log in and fetch data on your behalf); and
  3. the ability to import arbitrary data provided by you, e.g., a CSV upload to create a static data source, a data source in CSV or JSON format streamed from a URL of your choice, etc. (see the sketch below).
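
For a rough sense of what (3) could look like from the algorithm's side, imagine pulling a user-hosted CSV into a pandas DataFrame; the URL and column names below are made up:

    import pandas as pd

    SIGNAL_URL = 'https://example.com/my_signal.csv'   # made-up URL

    # Read a daily signal series, indexed by date, so it can be lined up
    # with price events by timestamp.
    signals = pd.read_csv(SIGNAL_URL, parse_dates=['date'], index_col='date')

    def signal_on(day):
        """Return the signal row for a given trading day, or None if absent."""
        try:
            return signals.loc[day]
        except KeyError:
            return None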

I've read a bit about trade-on-news and I'd already seen that article you forwarded. It's my understanding that since then, a number of data providers have actually started selling various data feeds that attempt to quantify news streams, and I think it would be great if we could find a way to pull some of those into Quantopian, though cost will definitely be an issue.

@Tim Sorry for the delay in responding to your comment!

You're right that technical indicators derived from the performance of the individual stocks being traded by your algorithm can be calculated by the algorithm itself. But aren't there also market indicators and the like which aren't linked to specific stocks and yet still might be useful in people's algorithms? Forgive me if this is a silly question; I'm an Operations guy, not a quant.

We've had a number of requests for fundamental data, and yes, a big barrier is that it is extremely expensive. We're continuing to explore how we might be able to provide it in a cost-effective way and/or enable users who already have access to fundamental data to use that data from within their algorithms (cf. my comment to Grant above).

Hi Jonathan,
Let me start off by saying that you guys at Quantopian are doing an excellent job.
I think this is a very important question. At the moment, one of my big problems with Quantopian is that there is no futures or FX data available. That's fine, as I understand it is very difficult/expensive to provide good-quality data, but I think a good way to mitigate that in the short run would be to provide functionality for users to import their own data from a CSV file. I am currently researching a lot of strategies across futures (of all asset classes) and FX, and I am writing my own backtester from scratch in MATLAB, or even in Python (using pandas). If this functionality were available in Quantopian, I would use it a lot more.
As far as signals are concerned (I assume you are talking about indicators, not factors tied to expected-return models like Arbitrage Pricing Theory, such as inflation, yield curve, GNP, corporate bond premium, etc.), I think there should be a big library of these indicators built in, such as moving averages, stochastics, RSI, average true range, etc., and you should be able to import them as functions where you can define the parameters, such as whether to use closing prices or some kind of average of close, open, high, and low.
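
For example, the kind of functions I have in mind, hand-rolled with pandas just to illustrate (a built-in library would presumably be more careful about the exact smoothing conventions):

    import pandas as pd

    def sma(prices, window=20):
        """Simple moving average of a closing-price series."""
        return prices.rolling(window).mean()

    def rsi(prices, window=14):
        """Relative Strength Index (simple-average variant)."""
        delta = prices.diff()
        gain = delta.clip(lower=0).rolling(window).mean()
        loss = (-delta.clip(upper=0)).rolling(window).mean()
        return 100 - 100 / (1 + gain / loss)

    def true_range(high, low, close):
        """True range, the building block of average true range (ATR)."""
        prev_close = close.shift(1)
        return pd.concat([high - low,
                          (high - prev_close).abs(),
                          (low - prev_close).abs()], axis=1).max(axis=1)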

Hello Michael,

Thank you for the kind words. I'm glad you like what we're doing so far.

We've thought about implementing CSV import. We've considered a few different ways of doing it: a) you import the file into Quantopian and we store it in S3; b) you put the file into Dropbox and we read it from there; c) you put the file in some other web service that we read. The downside to a) is that some members have secret data they don't want to store in S3. The other two are more complex. It sounds like you'd be happy with the first one, a), right?

We very much understand that we need to get other data into the system. That feature is competing with some other big ones, and right now it feels like it's several months out. But if we can figure out a quick win, we always take it.

Thanks,

Dan


Folks,

One interim approach would be simply to cut-and-paste the data into the algorithm. Is there a limit to the size of an algorithm? Also, can users create their own Pandas DataFrames? My sense is that this might be the best approach (but I'm still getting up the learning curve).
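
For instance (purely illustrative), a small dataset could be pasted straight into the algorithm source as a pandas DataFrame:

    import pandas as pd

    # Small dataset embedded directly in the algorithm (values are made up).
    FRAME = pd.DataFrame(
        {'signal': [0.12, -0.05, 0.30]},
        index=pd.to_datetime(['2012-01-03', '2012-01-04', '2012-01-05']))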

Grant

Hi Grant,

As far as I know, we don't enforce any limits on how long an algorithm can be, aside from the maximum size of the MongoDB document in which we store the algorithm in our database, which is 16MB. However, we haven't done substantial testing of very large algorithms, and I can imagine that there might be performance issues with large algorithms... the algorithm editor might not perform well with a very large algorithm, and the entire algorithm would have to be transmitted back and forth between the browser and our application servers whenever it is edited or when a backtest is viewed.

You're right that it might be a workable short-term solution for relatively small datasets, but I don't think it's a viable long-term solution.

Jonathan Kamens

Just a quick note regarding a CSV file format. A format native to Python might be preferred, especially if it is compact. In MATLAB, data can be stored in a .mat file and loaded into the workspace in a script. Does Python have anything similar?
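
(I gather numpy's .npz format is a rough analogue of .mat, along these lines, though I may be missing a better option:)

    import numpy as np

    # Rough analogue of MATLAB's save/load: write arrays to a compact binary
    # .npz file, then read them back (values here are made up).
    prices = np.array([10.1, 10.4, 10.2])
    volume = np.array([1000, 1200, 900])
    np.savez('workspace.npz', prices=prices, volume=volume)

    data = np.load('workspace.npz')
    print(data['prices'], data['volume'])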

Hi Dunn,
Option a) might be a good interim solution; it gives you the flexibility to use the platform with any data you want. Will the files stored in S3 be visible to everyone, or do you mean some members have concerns regarding the security of data stored in S3?
Grant, I think that, given it is pretty standard to have this kind of data in CSV files, the format uploaded by members should be CSV, and then the Quantopian team can transform it and save it in any format they see fit.