I was hoping to play in Research with the datasets from https://atlas.media.mit.edu, the Observatory of Economic Complexity, and to combine that data with the data Quantopian makes available. I converted the compressed TSV files they provide into CSV files, but they come out at about 180 MB each, and one is 2.2 GB, which I would guess is beyond courtesy, if not capacity, to upload. What are the file size limits for read_csv in Research and fetch_csv in algorithms, and what is the limit on the data directory itself? Would it be OK to just break the files into bite-size chunks and use those?
For something like the Economic Complexity dataset, a lot of other people would probably be interested in using it too. We already have the fundamentals dataset from Morningstar; would you consider setting up a few more shared databases if there were enough interest in a particular dataset? Maybe establish a 'commons' for datasets that people have uploaded and wouldn't mind sharing? I'm looking at https://github.com/quantopian/zipline/wiki/How-To-code-a-data-source and wondering whether this data could be formatted that way, but the sid requirement makes it a poor fit: the commodity and country series have no natural sid mapping. Detecting correlations between the commodities/countries and the sids would be the goal of the analysis, not something known at input time.
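To make the shape problem concrete, here is a sketch of the kind of reshaping step I imagine would be needed, whether as a pre-processing script or as a fetch_csv pre_func: pivot the long-format country/commodity rows into one date-indexed row with a column per series, since (as I understand it) that is the shape a loader without per-row sids can work with. The column names are my guess at the OEC export layout, not the actual schema.

```python
from io import StringIO

import pandas as pd

# Illustrative long-format OEC-style export (column names are assumed).
raw = StringIO(
    "year,origin,hs_code,export_val\n"
    "2014,usa,8703,1000\n"
    "2014,deu,8703,2000\n"
    "2014,usa,3004,500\n"
)


def reshape_trade(df):
    """Pivot country/commodity rows into one column per (origin, hs_code)
    pair, leaving a single date-like index column ('year')."""
    wide = df.pivot_table(index="year",
                          columns=["origin", "hs_code"],
                          values="export_val")
    # Flatten the (origin, hs_code) MultiIndex into plain column names.
    wide.columns = ["{0}_{1}".format(o, h) for o, h in wide.columns]
    return wide.reset_index()


wide = reshape_trade(pd.read_csv(raw))
```

Each resulting column (e.g. `usa_8703`) is then a standalone time series that could be correlated against securities, without having to pretend the source rows carry sids.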
For more personal datasets, it would be nice to be able to point fetch_csv at files within one's Research data directory. Is that functionality being considered?