Does Quantopian have Bid/Ask spread data? If so, how does one access it? If not, does anyone know where I could find historic Bid/Ask spread data for individual securities?

17 responses

I'm not familiar with a historical source for the information, but if you find one, you can use Fetcher to import the data into your algo. We've heard the request before and we don't have any plans to add it at the moment, but it's something we're open to in the future!

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Hey James, while Quantopian does not have data on historical bid/ask prices, you can model the impact of transaction costs and losses due to the bid/ask spread with a slippage model in Quantopian (see help). You may want to specify a fixed slippage model with a spread that you think is a typical bid/ask spread in the securities you are trading. One option would be to use Fetcher, as Alisa suggested, to pull historical bid/ask prices to your algo, estimate an average bid/ask using that data, and pass that estimate to the slippage model.
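
To make the last suggestion concrete, here is a minimal sketch of the estimation step. It assumes you already have historical quotes in a CSV with `timestamp`, `bid` and `ask` columns; the column names and sample rows are made up for illustration, and `average_spread` is a hypothetical helper, not a Quantopian function.

```python
import csv
import io

# Hypothetical sample of historical quotes; in practice this would come from
# whatever external source you load (for example via Fetcher).
SAMPLE_QUOTES = """timestamp,bid,ask
2014-01-02 09:31,36.90,36.92
2014-01-02 09:32,36.91,36.92
2014-01-02 09:33,36.89,36.92
"""


def average_spread(csv_text):
    """Return the mean (ask - bid) over all rows with a valid two-sided quote."""
    spreads = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        bid, ask = float(row["bid"]), float(row["ask"])
        if ask >= bid > 0:  # skip crossed or empty quotes
            spreads.append(ask - bid)
    if not spreads:
        raise ValueError("no valid quotes")
    return sum(spreads) / len(spreads)


spread_estimate = average_spread(SAMPLE_QUOTES)
print(round(spread_estimate, 4))
```

Inside an algorithm, the resulting number is what you would hand to the fixed slippage model in `initialize`, along the lines of `set_slippage(slippage.FixedSlippage(spread=spread_estimate))`. Check the help page for the exact signature before relying on it.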

Best,
Ryan

I'd like to support the idea of including Bid/Ask prices in your minute data. I would very much like to make use of that information when designing an algorithm. It would also be a huge help to know the number of open contracts for any given security on a minute basis.

Adding $0.02 from a Quantopian newbie (who has developed on many other platforms). The ability to access bid/ask info in live execution is critical for many strategies, and not only for what you might consider high frequency. My personal opinion is that it's critical to execute with current market quotes, but not as critical to backtest with bid/ask data, though some may scoff at the idea. My rationale is that (1) bid/ask data doesn't really make sense over a bar, since it's implicitly a point-in-time snapshot measure; (2) the slippage modeling feature gives you a good way to backtest on OHLC bars and, over the long run, get realistic results as long as your strategy doesn't depend on spread scalping; and (3) backtesting with bid/ask really means you need to build an event-based algorithm that responds to each change in quote, which is a huge pain and much slower because the number of events may increase by a factor of 100.

However, for live trading, you will or won't get filled based on the quote, and being able to save that penny or avoid that unfilled order matters a lot in the long run. For instance, if I were running a gap-on-open algorithm, it would be critically important to know what the ask is, since it both lets me decide if I want to bid and lets me bid on the open instead of waiting for a market to form. If I were running a pair trading algo on ETFs, I'd need to know that a spread had been created even including the spreads I'd need to pay on both legs of the pair.

NBBO is easily available from the broker (I assume IB is providing real-time consolidated quotes to Quantopian?), and having that info available within a strategy would allow us to include it in algorithm logic. NBBO could even be updated at the same frequency as the current handle_data events (i.e. there would be no need to increase event frequency).

To the earlier question, there are many, many sources of historical tick data. You should be able to access historical tick data from IB (for a fairly short time window, if memory serves). It's snapshotted (every 200 or 250 ms) rather than true tick arrival, but more than adequate for purposes of modeling spreads. I also recommend IQFeed, which offers relatively affordable paid data packages (you're looking at $100+/mo for this) with more and better NBBO data. I'd recommend QCollector as a good client to manage your IQFeed data downloads and store them in CSVs that could be accessed by Fetcher. Or look on bigmiketrading.com for something called BABar, an open-source piece of software that serves the same purpose.

If I were running a pair trading algo on ETFs, I'd need to know that a spread had been created even including the spreads I'd need to pay on both legs of the pair.

Yeah, this has been my experience: many of the pairs and baskets that look great with bar data turn out, upon backtesting with real bid-ask spreads, to have aggregate spreads that span all the mean reversion you thought you had discovered. I have been using ActiveTick for historical tick data. They are very cheap, but not without their problems, and validating the tick data has become a major enterprise.
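
As a rough illustration of that check, the sketch below compares the reversion a bar-based backtest predicts against the cost of crossing the spread on every leg. All symbols, quotes and dollar figures are made up, and `round_trip_spread_cost` is a hypothetical helper; it also assumes spreads stay constant between entry and exit.

```python
def round_trip_spread_cost(quotes):
    """Total cost of crossing the spread to enter and exit every leg once.

    quotes maps symbol -> (bid, ask, shares). Entering and exiting at the
    touch costs one full spread per share per leg (half on the way in,
    half on the way out), assuming the spread stays constant.
    """
    return sum((ask - bid) * shares for bid, ask, shares in quotes.values())


# Hypothetical two-leg ETF pair; all numbers are made up for illustration.
pair = {
    "ETF_A": (50.00, 50.03, 100),
    "ETF_B": (25.00, 25.02, 200),
}
expected_reversion = 5.00  # dollars of P&L the bar-based backtest predicts

cost = round_trip_spread_cost(pair)
net_edge = expected_reversion - cost
print(round(cost, 2), round(net_edge, 2))
```

With these made-up numbers the aggregate spread cost exceeds the expected reversion, so the apparent edge disappears once quotes are taken into account.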

1. Per the FAQs (https://www.quantopian.com/faq#data), "For paper trading and real-money trading, we get a realtime feed of trades from Nanex's NxCore product. Those trades are bundled into one-minute bars and fed to the trading algorithms." So it is not a correct assumption that "IB is providing real-time consolidated quotes to Quantopian."
2. Presumably, the data for backtesting are from the same source, but I don't recall Quantopian ever revealing their source for the backtest data. For example, if I run a backtest on last week's data, are those data last week's live feed?
3. Regarding the use of fetcher, my understanding is that it could work for exposing external bid/ask spread data to the algorithm for backtesting, but it is not possible to set up a real-time feed for live trading.
4. For a given security, might there be a relationship between the OHLCV data and the estimated bid/ask spread at the end of the bar? Or are they completely unrelated?
5. Although perhaps not conventional, I've thought that Quantopian is throwing away information by not providing the timestamps of the OHLC trade data. For example, when did the high occur during the minute?
6. There are some unspecified latencies in the Quantopian/IB system, so even if algorithms had more market information, could it be used? Or would the advantage get washed out anyway?
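
On item 4, one well-known heuristic that nobody in this thread has mentioned is Roll's (1984) estimator, which infers an effective spread from the negative serial covariance of successive trade-price changes (the "bid/ask bounce"). The sketch below is a plain-Python version under that model's assumptions (buys and sells arrive in random order, and the underlying value moves independently of trade direction). The function name and sample prices are hypothetical, and it is meant for tick-level trade prices rather than as a tested recipe for one-minute bars.

```python
import math


def roll_spread(prices):
    """Roll-style estimate of the effective spread from a trade-price series.

    Uses 2 * sqrt(-cov(dp_t, dp_{t-1})). Returns 0.0 when the serial
    covariance is non-negative, where the estimator is undefined.
    """
    changes = [b - a for a, b in zip(prices, prices[1:])]
    x, y = changes[1:], changes[:-1]  # each change paired with the one before it
    if len(x) < 2:
        raise ValueError("need at least four prices")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / len(x)
    return 2.0 * math.sqrt(-cov) if cov < 0 else 0.0


# Made-up trade prices bouncing between a 10.00 bid and a 10.02 ask.
trades = [10.00, 10.02, 10.02, 10.00, 10.02, 10.00, 10.00, 10.02, 10.00, 10.02]
print(roll_spread(trades))
```

The estimate is noisy on short samples and falls back to zero whenever prices trend instead of bouncing, so treat it as a sanity check and not as a substitute for real quote data.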

Take a look at Reuters. It's all there!

Simon,

Makes sense. If the spread is zero, then the OHLC values will equal the identical bid/ask values, i.e. there is only one price. But as the spread goes up, the OHLC trade values are somewhere in between the bid/ask values, and the spread could be arbitrarily large.

Grant

Grant, good point on the NxCore data - I overlooked that point, though I think the underlying need is the same. And I'm quite sure it'd be trivial for NxCore to transmit the latest NBBO with each rolled 1m bar (both real time and historical) if that were in scope for their agreement.

I figured out an inelegant way to get what I proposed - real-time access to bid/ask data, using undocumented URL tags for Yahoo Finance. It is very easy to access and parse the values needed. The code below could be called on each handle_data event to give the NBBO state as a decision-making input. I'm not sure whether the bid size and ask size are real time, but the bid price and ask price should be. I'm sure this is not an institutional-grade data feed, so I wouldn't ever rely on it blindly, but it's better than nothing.

I know urllib is not accessible within Quantopian. Does anyone know of a good method for getting equivalent functionality?

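import urllib.request  # Python 3; the original post used Python 2's urllib.urlopen

# Undocumented Yahoo Finance CSV tags: b6 = bid size, b3 = real-time bid price.
# The endpoint is unofficial and may no longer respond.
url_bidSize = "http://finance.yahoo.com/d/quotes.csv?s=MSFT&f=b6"
url_bid = "http://finance.yahoo.com/d/quotes.csv?s=MSFT&f=b3"

# Each request returns a single CSV field; read and decode it to a string.
bidSize = urllib.request.urlopen(url_bidSize).read().decode().strip()
bid = urllib.request.urlopen(url_bid).read().decode().strip()

print("B_size " + bidSize)
print("Bid " + bid)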

As far as I know, fetch_csv is the only method allowed in Quantopian to fetch external data.

You are right, Mete. I've attempted a number of routes, including urllib and pandas read_csv, to access external files, and none of them were permitted. I'd like to add my vote to request that Quantopian provide bid/ask data, both for backtesting and for live trading.

Hi Robert,

I haven't done it myself, but you just have to put the data for fetcher in a file. Under live trading, the file gets read in before market open, but to my knowledge, there's no way to define your own intraday live data stream. One issue that I'm just now starting to appreciate is how one would synchronize such a stream with the bar data from Nanex which Quantopian uses. And what to do if the stream glitches? And if 10,000 Quantopian algorithms decide to use the same stream? Etc. Seems like it's gonna need to come from Nanex, or be managed somehow by Quantopian versus a 'maker' approach.

Grant

Hi Grant,

I've used Fetcher and had a few reservations. One is that I just wanted to read in a symbol-by-symbol parameter file to create my universe and specify security-specific precalculated parameters. Although not elegant, it seems to be better to hard-code those into the context to be read during initialize. The algorithm could be stopped and edited to change the data. The other issue that I faced was that reading in a Fetcher file seems to cause data for all symbols contained in the file to be included in 'data', which is passed to the event handler. So, when an event occurs in another symbol, data is present for all symbols contained in the Fetcher file. As a result, I placed orders on securities which had not traded yet that day and the algo crashed. After that, I was told that you cannot place orders on symbols which have not yet traded that day. That is VERY unfortunate for me also. I had to eliminate my Fetcher CSV file to prevent data from appearing for symbols which have not yet traded. I have some data files which I might want to use also. Unfortunately, they would have to be updated daily. However, I will avoid that for the same reason, along with the logistical issue. I suppose that my existing trading strategy is not optimal for Quantopian. I am trying to develop new strategies which involve more liquid securities. If I succeed, these issues may not be relevant.

Robert,

There's a way to test if a security traded in a given minute. See Fawce's reply to my original post on:

I wouldn't necessarily give up on dealing with thinly traded securities; it just takes a bit of coding to avoid problems.
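
The link to that reply is missing above, so the following is not Fawce's method. It is only a generic sketch of the idea, assuming you can get the timestamp of each security's most recent trade. The `last_trade_times` dict, the symbols and the helper name are all hypothetical.

```python
from datetime import datetime


def traded_this_bar(last_trade_times, bar_time):
    """Return the symbols whose most recent trade falls in the current bar.

    last_trade_times maps symbol -> datetime of its latest trade (hypothetical
    structure); bar_time is the timestamp of the bar being processed.
    """
    return sorted(sym for sym, t in last_trade_times.items() if t == bar_time)


bar_time = datetime(2014, 1, 2, 10, 15)
last_trade_times = {
    "LIQUID": datetime(2014, 1, 2, 10, 15),    # traded in this minute
    "THIN": datetime(2014, 1, 2, 9, 47),       # forward-filled from earlier
    "NOTYET": datetime(2013, 12, 31, 15, 59),  # has not traded today
}
tradable = traded_this_bar(last_trade_times, bar_time)
print(tradable)  # ['LIQUID']
```

Restricting orders to the returned symbols is one way to avoid ordering securities that have not traded yet.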

Grant

@MarketMeat, I get the limit order prescription - that's solid advice for all situations (marketable LMT orders vs. actual MKT orders) but stops? If my strategy has no way of knowing of the bid/ask conditions, I absolutely want nothing to do with a stop loss order, which is not in any way guaranteed to fill at the stop price. Am I missing your point on stops?

My specific need for bid/asks is that with "thinly traded" (which is not the same as thinly offered...) stocks, the last trade price may not be the same as the current quote. If the last trade happened 5 seconds or 5 minutes ago (more common than you think during the midday, even for stocks that trade 100K shares per day) the "last" price is somewhat irrelevant. Despite the possibility that prices can change a millisecond before I transmit my order, in my experience they don't for small volumes, so knowing the market state and setting a limit order at that price or a penny through it is a great way to get high fill rates and price certainty.

This becomes even more true in pairs trading, where you need to know that a pair spread actually existed and wasn't an artifact of the way the bars were rolled or forward-filled. If I know that, as of a few milliseconds ago (or a few seconds ago or whatever) that you could lift an ask of 20.10 on PEP and at the same point in time could have hit a bid of 50.25 on KO, and if you believed that price pair was an arbitrage with a reasonable margin of safety, then I'm not crazy for believing that the market is out of whack and I could reasonably submit a pair of limit orders on each side of the trade with a margin of safety for moving prices. However if the arbitrage is only to do with a last price from PEP that happened a few seconds ago and the last price from KO that happened a few minutes ago (not likely the case with this particular pair, of course) then the "arbitrage" is very likely an illusion.
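
The limit-order approach described in this post could be sketched roughly as follows, using the PEP and KO quotes from the example. The helper name and the one-cent default are illustrative assumptions, as are the other side of each quote (the PEP bid and the KO ask). This is plain Python, not a Quantopian or broker API.

```python
def marketable_limit_price(side, bid, ask, through=0.01):
    """Limit price at the touch, moved `through` dollars past it.

    A buy is priced at ask + through, a sell at bid - through, which trades
    a small worst-case concession for a higher chance of being filled.
    """
    if side == "buy":
        return round(ask + through, 2)
    if side == "sell":
        return round(bid - through, 2)
    raise ValueError("side must be 'buy' or 'sell'")


# Lift the 20.10 ask on PEP and hit the 50.25 bid on KO, a penny through each.
print(marketable_limit_price("buy", bid=20.08, ask=20.10))   # 20.11
print(marketable_limit_price("sell", bid=50.25, ask=50.27))  # 50.24
```

Setting `through=0` prices the order exactly at the quote. Either way, the worst-case fill price is known in advance, which is the point of using a limit order over a market order here.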

However if the arbitrage is only to do with a last price from PEP that happened a few seconds ago and the last price from KO that happened a few minutes ago (not likely the case with this particular pair, of course) then the "arbitrage" is very likely an illusion.

+1