Can we have more meaningful datasets please?

Since Quantopian advocated using their datasets to generate alpha for allocations, I began to study the available datasets. Only a handful of them are meaningful. Of the 50 datasets mentioned, most are irrelevant ones from Quandl, etc. Even among the relevant ones, many use Blaze and are not supported in algorithms or Pipeline. Can you please add more datasets so that we can succeed in our quest for alpha?

9 responses

Yes. We agree with the philosophy that datasets are the key to success in the quest for alpha. Adding datasets for you to work with is a top priority for us. The reason you haven't seen many new datasets added recently is that we are improving our data system so that we can add new datasets much more quickly. Our goal is to have thousands of datasets for you to choose from, and the changes we are making will make that possible.

All that being said, we're always looking for requests for particular types of datasets. Are there any particular datasets that you'd like to see on the platform? If you don't feel comfortable sharing them in a public setting, feel free to email us through our support channel ([email protected]).

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Real-time indices and commodity spot prices :)

Hi Jamie,

Thanks for your email. I will drop a note to feedback with a list of the data sources I'm interested in.

Best regards,
Pravin

Hi Jamie -

Thousands of data sets? I'd be interested in your motivation, and in how you'd expect individual users to use that many data sets productively. It ends up being roughly one data set per listed company. Say each data set has a teeny-tiny, transient, uncorrelated alpha. What would be the path to writing an algo that could get a $50M allocation? I can see how this might fit with the framework provided on https://blog.quantopian.com/a-professional-quant-equity-workflow/ and with Pipeline, assuming there is enough alpha in daily data. Do you think a naive equal-weight alpha combination will work? Or will something more sophisticated be required to do the combination?

On a related note, it sounds pretty daunting for a single Q user to sift through thousands of data sets, combining the good ones into a comprehensive, scalable algo. But say I picked one and showed that there was a little bit of alpha there. How could I get paid, so that you could license my little gold nugget for the fund and I could buy a sandwich?

One suggestion for a data set would be Internet health (e.g. https://www.akamai.com/us/en/solutions/intelligent-platform/visualizing-akamai/real-time-web-monitor.jsp). Daily data should be pretty easy to come by, and deriving a real-time minutely feed, I'd think, would be fairly straightforward. Even down to individual companies, it should be possible to get the data by writing a script to query site availability (e.g. Amazon, Facebook, etc.). At some level, there must be alpha in such data sets, but if someone is already doing it (almost certainly), then minutely data may not be fast enough. You might run this by Fawce, though, given his do-good tone on https://www.quantopian.com/posts/phasing-out-brokerage-integrations. In all likelihood, you could be profiting off of criminal activity (but then, you are hooked up with Point72, which has a very sketchy history, as portrayed in the book Black Edge).
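For what it's worth, the "naive equal-weight alpha combination" asked about above can be sketched in plain numpy. This is just an illustration of the idea, not Quantopian's Pipeline API; the function name and the toy factor scores are hypothetical:

```python
import numpy as np

def equal_weight_combine(factor_scores):
    """Naively combine several alpha factors: z-score each factor's
    cross-section so scales are comparable, then average the z-scores
    with equal weights.

    factor_scores: 2-D array-like, shape (n_factors, n_assets).
    Returns a 1-D combined score, one value per asset.
    """
    scores = np.asarray(factor_scores, dtype=float)
    mu = scores.mean(axis=1, keepdims=True)
    sigma = scores.std(axis=1, keepdims=True)
    # guard against a constant factor (zero std)
    z = (scores - mu) / np.where(sigma == 0, 1.0, sigma)
    # equal-weight average across factors
    return z.mean(axis=0)

# two hypothetical factors scored over three assets
combined = equal_weight_combine([[1.0, 2.0, 3.0],
                                 [0.5, 0.5, 2.0]])
```

Since each z-scored cross-section sums to zero, the equal-weight average does too, which keeps the combined signal roughly dollar-neutral before optimization.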
By the way, if you do end up using my Internet data idea, my one-time licensing fee is $500 cash ($20 bills would be nice).
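The "script to query site availability" mentioned above could start as crudely as the stdlib-only probe below. This is a naive sketch of the idea, not a real data pipeline; a production version would need proper scheduling, per-company URL lists, and rate limiting:

```python
import urllib.request

def is_reachable(url, timeout=5):
    """Crude availability probe: True if the site answers an HTTP HEAD
    request within `timeout` seconds with a non-error status, False on
    any network error, DNS failure, or timeout."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except Exception:
        return False
```

Polling a fixed list of company sites once a minute and logging the booleans would yield exactly the kind of point-in-time availability series the post describes.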

For the Q fund, would there be any way to publish data that would allow users to do the kind of algo viability analyses that presumably you can do? For example, say I'm working on writing a new algo. I'd like to know the degree to which it might be accretive, so that I know I'm not wasting my time (and your platform resources). Presently, it is a total open-loop time suck. Building a crowd-sourced fund without giving the crowd access to the fund as it is built would seem to be counter-productive. But alas, ironically, the whole crowd-sourced, collective, "we are all in this together" concept seems to be totally lost on you guys, in my opinion. What is your sense from the inside (we can start another thread, if you want)?

There are a lot of great comments here, but I'll focus on the area that I have expertise in!

Our allocation process attaches high value to algorithms that use alternative datasets. We evaluate all algorithms that use alternative data, including strategies that use either free datasets or premium datasets.

Along with that, we're working to add new and meaningful datasets; as Jamie mentioned, there is some product work being done to make that possible.

Seong


@Seong -

Thanks. I knew all of that. If one writes a multi-factor algo, how will you assess the contribution of each factor after the alpha-combination and optimization steps? If the algo uses a mix of traditional and alternative factors point-in-time, how will you rank the algo for its use of alternative data sets? It is easy enough to mix in lots of factors, but the alpha could still be dominated by the traditional ones. It almost seems like you are interested in single-factor algos, which would make the whole assessment problem easier. Even then, how will you know the actual source of the alpha? I guess I don't quite follow (without unraveling the strategy in detail, which goes beyond your terms of use, I think). If the algo is accretive to the Q fund, the source of the alpha is irrelevant.

Hi Jamie -

Point-in-time data sets germane to pairs trading might be interesting. For example, a reference database of all pairs back to 2002, found by brute force, might be useful.

Also, you could grab a paper such as the one posted here and create a database to overcome platform limitations (see Pravin's Jan 23, 2016 comment: "If you increase number of stocks to 40+ either the kernel dies or it runs out of memory or it takes hours to complete.")

Generally, any point-in-time look-up tables to overcome platform limitations would be helpful.
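To make the brute-force pair search concrete, here is one minimal way it could be sketched offline to build such a lookup table. This is an assumed approach, not anything Quantopian provides: it pre-filters symbol pairs by log-price correlation, which a real study would follow with a proper cointegration test; the function name and toy inputs are hypothetical:

```python
import itertools
import numpy as np

def brute_force_pairs(prices, min_corr=0.95):
    """Scan all symbol pairs and keep those whose log-price series are
    highly correlated -- a crude pre-filter before a real cointegration
    test (e.g. Engle-Granger) on the survivors.

    prices: dict mapping symbol -> 1-D sequence of prices on aligned dates.
    Returns a list of (sym_a, sym_b, corr) sorted by descending corr.
    """
    logs = {s: np.log(np.asarray(p, dtype=float)) for s, p in prices.items()}
    hits = []
    for a, b in itertools.combinations(sorted(logs), 2):
        corr = np.corrcoef(logs[a], logs[b])[0, 1]
        if corr >= min_corr:
            hits.append((a, b, float(corr)))
    return sorted(hits, key=lambda t: -t[2])

# toy example: B is a scaled copy of A, C is unrelated
prices = {"A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
          "B": [2, 4, 6, 8, 10, 12, 14, 16, 18, 20],
          "C": [5, 1, 9, 2, 8, 3, 7, 4, 6, 10]}
hits = brute_force_pairs(prices)
```

Run once per historical date range and serialized to disk, the output is exactly the sort of point-in-time lookup table suggested above, sparing the research platform the O(n²) scan at run time.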

Daily VWAP
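In case "Daily VWAP" here means a derived field rather than a vendor feed, the standard definition is the volume-weighted average price over the day's bars, VWAP = Σ(pᵢ·vᵢ) / Σvᵢ. A minimal sketch (the function name and sample bars are illustrative, not any platform API):

```python
import numpy as np

def daily_vwap(prices, volumes):
    """Volume-weighted average price over one day of intraday bars:
    VWAP = sum(price_i * volume_i) / sum(volume_i)."""
    p = np.asarray(prices, dtype=float)
    v = np.asarray(volumes, dtype=float)
    total = v.sum()
    if total == 0:
        raise ValueError("zero total volume")
    return float((p * v).sum() / total)

# three hypothetical minute bars: (10*100 + 10.5*200 + 11*100) / 400
vwap = daily_vwap([10.0, 10.5, 11.0], [100, 200, 100])  # -> 10.5
```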