I've been playing around on the site for a week or so now, and the one thing I would like is to be able to export my trades and some of the metrics I'm capturing with log.info to a .csv file. This would allow me to mine the data further and gain better insight into what is driving the algo's performance. Otherwise I'm stuck writing algorithms, seeing how they perform, and then trying to improve them with pretty poor visibility into what is really going on. Ideally, I'd be able to export some data to a .csv each frame, with the full .csv available for download after the backtest has completed.
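To make the request concrete, here is a rough sketch of the kind of per-frame recorder I have in mind. It is plain Python and not a Quantopian API: `MetricsRecorder`, `record`, `to_csv`, and the column names are all made up for illustration, and it would only run somewhere you control the process, such as a local backtest.

```python
import csv
import io


class MetricsRecorder:
    """Hypothetical helper: collect one row of metrics per frame, export as CSV."""

    def __init__(self, fieldnames):
        self.fieldnames = list(fieldnames)
        self.rows = []

    def record(self, **metrics):
        # Keep only the declared columns; missing values become empty cells.
        self.rows.append({name: metrics.get(name, "") for name in self.fieldnames})

    def to_csv(self):
        # Build the whole file in memory so it can be offered for download later.
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=self.fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(self.rows)
        return buffer.getvalue()


# Example usage with invented values, one call per frame.
recorder = MetricsRecorder(["frame", "cash", "position"])
recorder.record(frame=1, cash=1000.0, position=0)
recorder.record(frame=2, cash=900.0, position=10)
csv_text = recorder.to_csv()
```

The idea is that `record` is called once per frame from the algo and `to_csv` is called once after the backtest finishes.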

I couldn't find this in the API docs, so I was just wondering if this is possible or if it is planned?

We don't have a pending plan for the problem you're talking about. I understand what you're trying to do, but I don't have a fix yet. One of the constraints is our data source. Our minute-level bar data is provided to us with the agreement that we won't enable anyone to copy it. It's OK for our members to run algos against the data but it's not OK for our members to take a copy of the data. Therein lies the constraint: how do we enable logging and downloading, but still prevent the bars themselves from being logged and downloaded? Given how broadly we support Python, it's a non-trivial challenge.

I know that's not the answer you were looking for, but perhaps this conversation will help us find a creative solution.

Have you tried finding trading firms that are now shadows of their former selves and offering to buy their data sets and the machines that record them every day? A lot of people who used to need really good tick data aren't doing so well anymore.

Hi Brad,

Some folks have run zipline (Quantopian's open-source backtesting engine) on their own PCs (and Thomas Wiecki provided an example of distributed computing). You'd need to obtain the data on your own, but I gather that there are some free sources of limited datasets.

Grant

Perhaps you could allow users to add data to a minimongo collection. Those collections would be associated with an algorithm, and would be rewritten with each backtest. They could be queried from an online shell like the one at mongodb.org. That would allow users to query their custom datasets and do data manipulation from the Quantopian site itself.

Brad, that type of thing we definitely plan on making available. Taking a step back, we think of algo trading as going in three loose phases.

  1. Get a dataset that you think might have something useful. Explore it. Slice, dice, hack, fold, spindle, mutilate until you find something that makes you go "ah ha, there's something."
  2. Write the algo that exploits the data discovery. Backtest the algo and further assess that you really have a viable idea.
  3. Trade the algo with real money.

When Quantopian started, we built #2 first. We're working hard on #3. When we're ready, we're going to circle back and do #1. When we do that, we'll be able to do something like you describe - we still have the data, but you have the analysis.

The thing that I haven't figured out how to do is how to make logging easier/more powerful in #2.

Would the data constraint permit exporting binned data for histograms?
For example, a histogram of the daily returns of your algo, though many other useful histograms could be made.
Binning would remove any time dependence from the exported data, making it impossible to log and download the price data.

Some function like "recordhistogram( data, NumberOfbins )" would be great for this.

Even if this isn't possible with the data constraint, it would be nice to see a plot of the histogram of the daily returns of your algo in the backtest section.
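As a rough sketch of what such a function might do (this is not an existing Quantopian function; `record_histogram`, its equal-width binning, and the sample returns are my own assumptions):

```python
def record_histogram(data, number_of_bins):
    """Hypothetical sketch: return (bin_edges, counts) using equal-width bins.

    Only the counts per bin are kept, so the order of the input values
    (and therefore their position in time) is not part of the output.
    """
    values = list(data)
    if not values or number_of_bins < 1:
        raise ValueError("need at least one value and at least one bin")

    lo, hi = min(values), max(values)
    if lo == hi:
        # All values identical: widen the range so the bin width is non-zero.
        lo, hi = lo - 0.5, hi + 0.5

    width = (hi - lo) / number_of_bins
    edges = [lo + i * width for i in range(number_of_bins + 1)]
    counts = [0] * number_of_bins
    for value in values:
        # The maximum value would land one past the end, so clamp it into the last bin.
        index = min(int((value - lo) / width), number_of_bins - 1)
        counts[index] += 1
    return edges, counts


# Example with invented daily returns.
daily_returns = [0.01, -0.02, 0.005, 0.03, -0.01, 0.0]
edges, counts = record_histogram(daily_returns, 4)
```

Feeding the same returns in a different order gives the same counts, which is the property I'm hoping would satisfy the data constraint.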

Hello all,

One approach to this problem would be for Quantopian to create a separate data set that has all of the salient features of the real-world data, but does not carry the distribution restriction. This would require transforming the real-world data in such a way to make it useless for developing specific trading strategies, but would still maintain its utility for exploring market characteristics and developing algorithms. Basically, it would be the unrestricted "virtual reality" version of the Quantopian data set.

I realize that the transformation might be kinda tricky technically. Just anonymizing the sids would probably not be sufficient. For example, I could use SPY daily data (e.g. from Yahoo) with pattern recognition code to find the sid data associated with SPY in the database.
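A toy illustration of that kind of matching, using plain correlation of daily returns as a stand-in for "pattern recognition". The function names, the anonymized labels, and every number below are invented for the example:

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def best_match(reference_returns, anonymized_returns):
    """Return the anonymized label whose returns correlate best with the reference."""
    return max(
        anonymized_returns,
        key=lambda label: pearson(reference_returns, anonymized_returns[label]),
    )


# Invented data: a public reference series and three relabeled series.
reference = [0.01, -0.02, 0.005, 0.03, -0.01]
anonymized = {
    "A": [0.002, 0.001, -0.03, 0.01, 0.02],
    "B": [0.0101, -0.0199, 0.0049, 0.0302, -0.0098],  # nearly the same series
    "C": [-0.01, 0.02, -0.005, -0.03, 0.01],
}
match = best_match(reference, anonymized)
```

With real data the search would be over far more series and far longer histories, but the point is the same: relabeling alone leaves the series recognizable.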

EDIT: Looks like http://www.grid-tools.com does this sort of thing.

Grant
