When running an algorithm, have you ever seen the log message “Your order has been partially filled”? If this is the case then you may not have controlled for liquidity risk. Liquidity risk arises when your algorithm is not able to buy and/or sell the securities dictated by a strategy, and it can completely alter the composition of your trading portfolio.

Our answer to the problem posed by liquidity risk is two universes of equities: the Tradeable500US and the Tradeable1500US. By limiting an algorithm’s universe to equities in one or both of these baskets, you can minimize liquidity risk and almost guarantee that your order will be filled.

These stock universes combine the tradability filters proposed here and here with a sector exposure limit that prevents the universe from being biased towards any one sector, and they do so in a robust and fast way. We envisage this tool existing as an importable set of equities in the Quantopian backtester that will remove any algorithmic anomalies caused by liquidity risk. To use this tool in a trading strategy, one could set a pipeline screen for the Tradeable500US, and only equities in this universe would be considered by the algorithm. The Tradeable500US could also be used in the research environment to speed up complex computation by limiting the number of equities queried by Pipeline. It would also allow you to limit your interaction to real, traded stocks, as opposed to recent IPOs and depositary receipts, which can confound good analysis.

Before we set this process in stone, we want your feedback in order to make this methodology as robust as possible. Are there any corner cases where an illiquid stock might slip through our filters? The more effective our filter is, the stronger your algorithms will be. Feel free to experiment with universe sizes, sector exposure limits and let us know what works best for your code.

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

64 responses

Thanks Gil -

Would it be possible to select only tradeable from Nasdaq? And from that, the top N stocks, by market cap?

Grant

Hey Grant,

This is indeed a possibility and I think this is how that filter would look.

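from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import morningstar
from quantopian.research import run_pipeline

N = 500

def make_pipeline():
    nasdaq_screen = morningstar.share_class_reference.exchange_id.latest.eq('NAS')
    # Assumed definition (missing from the post as captured): top N Nasdaq names by market cap.
    mkt_cap_screen = morningstar.valuation.market_cap.latest.top(N, mask=nasdaq_screen)
    return Pipeline(screen=(nasdaq_screen & mkt_cap_screen))

run_pipeline(make_pipeline(), '2016-01-01', '2016-01-01')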

I definitely thought about using market cap to rank the securities, but decided against it in the end. My thought process is that if we are aiming to measure liquidity, why not actually rank on dollar volume traded as opposed to market cap, which many use as a proxy for liquidity (larger companies generally have the most shares traded, as you rightly point out).
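To make the distinction concrete, here is a minimal sketch in plain Python (not Pipeline code) of how a dollar-volume ranking and a market-cap ranking can disagree. The tickers, prices, volumes and market caps are made up for illustration, and `average_dollar_volume` and `top_n` are hypothetical helpers, not Quantopian functions.

```python
from statistics import mean

def average_dollar_volume(closes, volumes):
    """Mean of close * volume over the window (one value per trading day)."""
    return mean(c * v for c, v in zip(closes, volumes))

def top_n(scores, n):
    """Return the n keys with the largest scores, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Hypothetical three-day windows: (closes, volumes, market_cap)
stocks = {
    "BIG_QUIET": ([100.0, 101.0, 99.0], [1_000, 1_200, 800], 50e9),
    "MID_BUSY": ([20.0, 21.0, 19.0], [90_000, 110_000, 100_000], 5e9),
    "SMALL_BUSY": ([5.0, 5.5, 4.5], [200_000, 220_000, 180_000], 1e9),
}

adv = {s: average_dollar_volume(c, v) for s, (c, v, _) in stocks.items()}
cap = {s: m for s, (_, _, m) in stocks.items()}

by_adv = top_n(adv, 2)  # ranked on what actually trades
by_cap = top_n(cap, 2)  # ranked on company size
```

In this toy data the largest company trades the least, so it makes the market-cap list but not the dollar-volume list.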

As for the Nasdaq, that is also a good idea but my thought is that it is not fully scalable. You'll notice in the documentation there is a placeholder filter called domicile. When the Quantopian platform eventually expands into global markets it will be a lot easier to filter with respect to headquarter_country rather than specifying specific exchanges.

Both are things that I definitely considered, and thanks for bringing them up!

Best,

Gil

Gil,

Looks awesome, but I hope there is a means of permutation between all the different filters described above:

EDIT: Basically hoping each filter can be commented at will in whatever code makes its way to the back tester.

I'm all for this, as it helps create a level playing field for algos to compare their strengths.

For me, a good feature would be to remove companies that are current acquisition targets. These tend to have very low volatility and very high event risk (if and when acquisition doesn't go through), so confound any conventional risk parity weighting scheme.

Gil -

• You have several fixed values in your code, including:

# Equities with an average daily volume greater than 750000.
high_volume = (AverageDollarVolume(window_length=252) > 750000)

# Equities for which morningstar's most recent Market Cap value is above $300m.
have_market_cap = mstar.valuation.market_cap.latest > 300000000

# S&P Criterion
liquid = ADV_adj() > 250000

It would seem that these should be normalized to the current market. For example, if I run a backtest starting in 2002, won't they result in a bias, since presumably you are using current values? In other words, your filter criteria need to be point-in-time; they need to be relative to the current market.
• I don't understand your definition of "sector neutral" and how you are applying it. Please elaborate. Also, you state that you are wanting to limit liquidity risk, so how does sector neutrality relate to that objective? Or are you trying to accomplish something more than just limiting liquidity risk?
• I'm wondering if your IPO filter is too simple. For example, how does it handle GOOG/GOOGL? Would GOOGL be excluded, as an IPO?
• I suggest having a look at all of my comments on https://www.quantopian.com/posts/pipeline-trading-universe-best-practice, to see if you missed anything (e.g. stocks with verified bad data that has not yet been fixed by Quantopian).
• How do you propose to check the list point-in-time, to make sure it is correct? Seems kinda tricky.

@Frank I am not quite sure what you mean by "permutation", but one thing I will say is that we are aiming for this code to not be visible in the backtester. A possible implementation could be (and this is essentially pseudo-code, as the engineering team will have their own ideas about this):

from quantopian.universes import Tradeable500US
...
def make_pipeline():
    return Pipeline(
        screen = Tradeable500US
    )

@Dan H Acquisition targets are definitely something we talked about, as we had the S&P500 methodology in mind when considering constituents of the universe. One thing I would flag up about your statement about volatility (which is entirely correct) is the idea of weighting. The Tradeable500US is a universe and can be thought of as a basket of stocks without weights; if we were going to risk-weight these securities, we would definitely consider the corner case of acquisition targets.

@Grant
• Definitely take the point that these values have different meanings in 2002 and 2016. Will definitely investigate to see if this creates an issue. My thought is that these numbers are not restrictive even in 2002 (the turnover graph I give starts at 2003, so this year would be interesting to dig into).
• So the idea behind sector neutrality is that this code aims to provide a universe of stocks. If it was just to return technology stocks, this would hardwire sector bias into any algorithm using the universe. By capping sector exposure, the code ensures that the basket of stocks returned is indeed representative of the entire universe of equities. You are right to point out that this sector neutrality constraint does come at the price of having slightly less liquid stocks in the universe; however, this is a tradeoff I am willing to make to ensure algorithms are able to select from all industry groups.
• As for the IPO filter, there are two checks and balances in place. Firstly, a stock when it is first issued has the suffix '.WI', which is filtered out. Secondly, my implementation of AverageDollarVolume (ADV_adj) sets close price np.nan values equal to 0 and then calculates the true mean. This more rigorously removes recent IPOs. (Regarding GOOG_L vs GOOG, GOOG is removed as a result of the IsPrimaryShare() filter.)
• I am not quite sure about the "verified bad data" but I have not had issues with any data series that have been included in the Tradeable500US or Tradeable1500US. I definitely looked through your comments on that thread and they were very useful in thinking about the construction of the methodology. One thing I would say is that we will do our best to remove any "bad data" from these universes.
• For point-in-time checks for the list, see my reply to Frank. We imagine that checking inclusion in the Tradeable500US and Tradeable1500US should be as easy as setting a pipeline screen.

@Gil Instead of fixed numbers for high volume, wouldn't something like what is in https://www.quantopian.com/posts/pipeline-mean-reversion-example work as a screen or mask?

...
# Define high dollar-volume filter to be the top 5% of securities by dollar volume.
high_dollar_volume = dollar_volume.percentile_between(95, 100)

@Alan definitely a good idea. One possible hitch to that approach is that the percentile_between method makes the filtered universe dependent on the wider universe. So the top 5% of a super-universe of 1000 is 50, but the top 5% of a super-universe of 10000 is 500. My thought is that the Quantopian super-universe will change over time, especially as global equities are added, but I definitely take the point that the universe_filter cutoff of 750000 for AverageDollarVolume seems rather arbitrary and a dynamic threshold might be more robust.

Gil -

See https://www.quantopian.com/posts/missing-split-adjustments-for-lbty-b-and-lbty-k for a recent report of stocks that have bad data (missing splits for LBTY_A and LBTY_B). So, the idea is that as soon as problems are found, the offenders would be added to a database, which you could include in your filtration/filtering/filter process (take your pick). This has been one of my gripes for a while. The list should be available at least for copy-and-paste into an algo, from a downloadable spreadsheet or something. End of rant.

Cheers,

Grant

@Grant Thanks for bringing this up. I gather from the thread you posted that Nathan has raised this issue already, but I will make sure to chase it up.

Hi Gil,

I'm still not understanding your definition of "sector neutrality." Perhaps you could outline the steps, in words, so that the definition is crisp. You are doing more than just outputting an overall dollar volume universe, e.g. ranking by dollar volume and taking the top 500. Are you effectively dividing the full universe into sectors, ranking each sector by dollar volume, and then combining, to get the top 500, so that within the 500, the fraction of companies in each sector is similar to that of the full universe?

I would also consider how to make this tool more easily configurable, via high-level user settings, versus needing to drill down and unravel code (even if it is generously commented). For example, say I want AverageDollarVolume(window_length=60) instead of AverageDollarVolume(window_length=21). I'd have to create my own version, and edit the low-level code. Or what if I don't want sector neutrality, but like everything else? It should be released as a call-able function-like thingy with settings, versus code with a bunch of magic numbers and choices embedded in it.

Grant

Could you also make a version for use with get_fundamentals, please?

@Grant Maybe it's the word "neutrality" that is confusing. Let me define two terms, neither of which is canonical, but which will hopefully be illustrative. Let 'strong sector neutrality' occur if each sector made up an equal proportion of the universe (e.g. if there were 4 industry groups and each represented 25% of the universe). In contrast to this, 'weak sector neutrality' prevents any one sector from being a larger proportion than a desired threshold (here, given as the sector_exposure_limit). My algorithm attempts to preserve weak sector neutrality, albeit at the expense of very slightly lower average liquidity of the universe. This was chosen because this is a universe, and should therefore be representative of the collection of all equities available on the Quantopian platform.

I'll write out my process for create_tradeable in pseudo-code:

1. Find the threshold number of securities, given by the sector_exposure_limit multiplied by the tradeable_count (if a fraction, round up).
2. For each sector, filter so that the top threshold securities by AverageDollarVolume are returned. By limiting each sector to threshold securities you ensure that the sector_exposure_limit will not be surpassed (with the exception of a slight breach caused by a rounding error).
3. Apply this filter to the whole Quantopian universe, in addition to the universe_filter, and take the top N securities here by AverageDollarVolume.

This methodology is attractive because it is fast, limits sector exposures and only returns very liquid securities.

With respect to creating a final product that allows parameterization such as screen = Tradeable500US(AverageDollarVolumeWindowLength=60), I think Dan H on this thread put it best when he said that this tool aims to create a "level playing field for algos". By creating standard universes, we can truly minimize liquidity risk for all algorithms and still compare them on the same set of securities. However, I will definitely run some tests on longer windows of AverageDollarVolume to see if this methodology has better attributes.

@Tim I think the goal is to migrate algos to Pipeline, but I will definitely consult with the engineering team to see if there is a good way to do this for get_fundamentals. If there is a particular use case you foresee where get_fundamentals would be better for this, definitely let us know.

@Gil, I have to admit that the only reason I am asking for this option is that I have never been able to learn how to work with the Pipeline framework -- but then again, it could be that I am not the only one ... Many thanks for considering an implementation.

Did you mean to exclude the filter sector_check = Sector() != -1. from the final filter combination?
universe_filter = (high_volume & primary_share & have_market_cap & not_depositary & common_stock & not_otc & not_wi & not_lp_name & not_lp_balance_sheet & liquid & domicile)

@Tim If you're interested in the Pipeline framework, I would definitely check out Jamie's awesome Pipeline tutorial. I found it really useful even as a frequent user of Pipeline. This can be found at https://www.quantopian.com/tutorials/pipeline

@Bryce, that is definitely a good spot. The current filter should remove anything with a Misc. (-1) sector code, but I will make sure this is explicit in the final implementation of this methodology. Thanks for flagging this up.

@Gil By "permutation", I meant the ability to sift between each and all of the filters that generate the "Tradeable 500". Based on the sample code above it seems like that will not be the case. EDIT: But no big deal.

How do you use this in a normal backtest?

"With respect to creating a final product that allows parameterization such as screen = Tradeable500US(AverageDollarVolumeWindowLength=60), I think Dan H on this thread put it best when he said that this tool aims to create a "level playing field for algos". By creating standard universes, we can truly minimize liquidity risk for all algorithms and still compare them on the same set of securities. However, I will definitely run some tests on longer windows of AverageDollarVolume to see if this methodology has better attributes."

Not sure I follow the logic here. Presumably, you are heading toward releasing the code in some form (on github?), so that it is controlled and users can plug it in (copy and paste...ughh!). You can set it up so that there are default settings (perhaps the ones you chose), which would be the recommended values, somehow vetted/biased by Quantopian. But if you bury all of the parameters within the code, then it is kinda unconventional for a module. Parameter adjustment then requires delving into the code, making changes, and releasing a new version. Also, it should be easy to study the effect of each parameter (e.g. in a loop, to map out a multi-dimensional response surface). If the parameters are not accessible in a call-able form, then again, the code has to be re-factored.

For example, have a look at http://docs.scipy.org/doc/scipy-0.17.0/reference/generated/scipy.optimize.minimize.html#scipy.optimize.minimize :

scipy.optimize.minimize(fun, x0, args=(), method=None, jac=None, hess=None, hessp=None, bounds=None, constraints=(), tol=None, callback=None, options=None)

It would seem that your tool should conform to a call-able function or equivalent, right? You can still release code that would deliver a "standard universe" by controlling the defaults, but changing the defaults shouldn't require a new version of the code.

@Frank I think you're right, I believe these filters are not permuted over but rather applied in one batch as a pipeline screen.

@Suminda at the moment this requires some tinkering, but we are planning on releasing a very user-friendly version in the near future (see my reply to Grant below for more details).

@Grant definitely take your point on board. This is how I envisage the Tradeable500US's final implementation:

from quantopian.universes import Tradeable500US
...
def make_pipeline():
    return Pipeline(
        screen = Tradeable500US
    )

So, not parametrized at all. The purpose of this post is to expose the inner workings of this filter so that users can play with the implementation and see why certain decisions were made.

Also make a 50, 100, 1000, 1500, 2000 version available through import. Perhaps 50 may be too small; if not, include it as well.

Gil -

So why the reluctance to make it parameterized? Is it tricky to code? It just seems that you are assuming that the choices you've made will yield the best results, and that the crowd will not be able to do better by making adjustments. Just trying to understand where y'all are headed with this. Don't get me wrong: it is a great idea to support clean universes with a few lines of code, but you shouldn't fix the criteria. In the end, will it be possible for users to grab the code from github, and paste it into their algos, and tweak it, should they not want to use your few-line code above?

Any more thought on adding a has_bad_data filter? To me, this makes a lot of sense. Why would you want users to be writing algos if there are errors in the data?

Grant

@Gil It's all good. The code in the links you provided in the first post of this thread provides the access I was seeking.

Gil,

Regarding your sector neutrality stuff, I see:

tradeable_pipe.set_screen(tradeable_filter & sector_filter)

So, it seems that in effect, you are attempting to create a universe with equal numbers of securities in each sector with sector_filter that are also deemed to be tradeable, with tradeable_filter. You can end up with unequal numbers of securities in each sector because the sector is smaller than threshold to begin with, or it just doesn't have enough tradeable securities. Correct?

One potential problem I see is that you are unhinged from using the more fundamental market cap measure as a means to slice and dice, as is typically done. By using average dollar volume, at least in theory, the market could go berserko, and you could end up having a universe defined by market conditions, instead of fundamental value. For example, what if everyone just decided to stop trading AAPL--you'd have to drop it, even though it has a huge market cap. And conversely, low market cap stocks could end up in the universe, just because they are trading like crazy. I guess I'd wonder if the selection criteria might be assuming certain steady-state market conditions. What if there's an extreme historical event that causes a lot of turmoil? Does the universe get skewed in some whacky, risky way because of the dollar-volume ranking? Or maybe for hedge fund style trading, market cap is irrelevant, and you want to be dealing in high dollar-volume stocks?

It would be interesting to see how an investment in a hypothetical Quantopian Tradeable500US ETF would compare to one in SPY or RSP. Are they at all similar?

Regarding my GOOG/GOOG_L question, so you exclude GOOG? It has a market cap of $484.46B and is publicly traded, right? So why exclude it? Also, I'm wondering if you end up excluding GOOG_L for 1 year, based on your code (I think that's what window_length = 252 ends up doing, but I'm not sure)?
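The three create_tradeable steps that Gil lists earlier in the thread (compute a per-sector threshold, keep the top threshold names in each sector by average dollar volume, then take the top N overall) can be sketched in plain Python. This is only an illustration of those steps as described in words: the real notebook code is not reproduced here, the universe_filter stage is omitted, and the function signature and toy data are assumptions.

```python
import math
from collections import defaultdict

def create_tradeable(adv, sector, tradeable_count, sector_exposure_limit):
    """adv: {symbol: average dollar volume}; sector: {symbol: sector code}."""
    # Step 1: per-sector cap, rounded up when fractional.
    threshold = math.ceil(sector_exposure_limit * tradeable_count)

    # Step 2: keep only the top `threshold` names in each sector by ADV.
    by_sector = defaultdict(list)
    for symbol in adv:
        by_sector[sector[symbol]].append(symbol)
    survivors = []
    for members in by_sector.values():
        members.sort(key=adv.get, reverse=True)
        survivors.extend(members[:threshold])

    # Step 3: take the top N survivors overall by ADV.
    survivors.sort(key=adv.get, reverse=True)
    return survivors[:tradeable_count]

# Hypothetical universe: tech dominates a plain dollar-volume ranking.
adv = {"T1": 900, "T2": 800, "T3": 700, "T4": 600, "E1": 500, "E2": 400, "H1": 300}
sector = {"T1": "tech", "T2": "tech", "T3": "tech", "T4": "tech",
          "E1": "energy", "E2": "energy", "H1": "health"}

capped = create_tradeable(adv, sector, tradeable_count=5, sector_exposure_limit=0.4)
uncapped = sorted(adv, key=adv.get, reverse=True)[:5]
```

With a 40% limit and N = 5, each sector contributes at most two names, so T3 and T4 give way to the less liquid E2 and H1. That is the liquidity-for-breadth tradeoff discussed above, and it also shows why sector counts end up unequal when a sector has fewer than threshold members.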

@Suminda we will make a few versions of this universe available. We are thinking about 500, 1500 and 2000, but could add more.

@Grant:

• RE parametrized function: we want to create a level playing field for all algos. Allowing users to adjust these universes distorts the pool of securities from which they can select and makes the universes idiosyncratic to each user rather than general. Use of these universes will be optional, so if you do not want to use them, you are free not to, but they will help clean your underlying data.
• RE has_bad_data: I have put in a request for this.
• RE sector neutrality: these corner cases are dealt with in the code. Also, as I said before, we are aiming for 'weak sector neutrality', so a certain amount of inequality between the sectors is expected.
• RE market cap: the point of this universe is to give you tradeable stocks to mitigate liquidity risk. If a company like AAPL stops trading, we want it to drop out of the universe, as you cannot buy these equities, and this failure to fill an order can undermine complex and brilliant techniques. To guard against spikes in trading volume, we use the AverageDollarVolume over one month. The length of this window smooths volatile conditions.
• RE ETF: one thought we have about this is to assign weightings and create an index from this data. We will continue to work on this idea, as it would provide a superior benchmark for certain algorithms.
• RE GOOG: as I mentioned above, GOOG is not a primary share and is therefore removed from the universe.
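As a rough illustration of the smoothing point in the market cap bullet, the sketch below applies a trailing 21-day mean (roughly one trading month) to a made-up dollar-volume series containing a single spike. The series and the `rolling_mean` helper are hypothetical; this is plain Python, not the AverageDollarVolume factor itself.

```python
def rolling_mean(values, window):
    """Trailing mean over `window` observations, one output per full window."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

# Hypothetical daily dollar volume: flat at 1.0 (in $m) with one 20x spike.
daily = [1.0] * 30
daily[25] = 20.0

smoothed = rolling_mean(daily, 21)
peak = max(smoothed)  # largest value the 21-day average ever reaches
```

Here a one-day, twentyfold spike lifts the 21-day average to only about 1.9 times its baseline, so a single frantic day moves a stock's ranking far less than it would under a one-day measure.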

Are IPOs excluded for 1 year? Am I correct that GOOG_L would not be included for 1 year after its IPO? Just wondering if there might be important corner cases such as this that are not handled correctly, since GOOG_L was not a new company, but another event that one would not term an "IPO".
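One way to reason about this question is to sketch the behaviour Gil describes for ADV_adj: missing close prices count as 0 and the mean is taken over the full window. The code below is an assumption-laden illustration in plain Python, not the notebook's implementation. The 252-day window, the 63 days of history and the price and volume figures are all hypothetical; only the 250000 cutoff comes from the `liquid` filter quoted earlier in the thread.

```python
import math

def adv_adj(closes, volumes):
    """Mean dollar volume over the full window, counting missing closes as 0."""
    total = 0.0
    for close, volume in zip(closes, volumes):
        if math.isnan(close):
            close = 0.0  # no price yet: the day contributes nothing
        total += close * volume
    return total / len(closes)

def adv_ignoring_missing(closes, volumes):
    """For contrast: mean over only the days that have a price."""
    days = [c * v for c, v in zip(closes, volumes) if not math.isnan(c)]
    return sum(days) / len(days)

window = 252
listed_days = 63  # hypothetical: stock has traded for roughly one quarter
closes = [math.nan] * (window - listed_days) + [10.0] * listed_days
volumes = [0.0] * (window - listed_days) + [100_000.0] * listed_days

diluted = adv_adj(closes, volumes)
undiluted = adv_ignoring_missing(closes, volumes)
passes_liquid = diluted > 250000
```

Under these assumptions a newly listed stock is not excluded for a fixed period. Its average is diluted in proportion to its missing history, so a stock trading $1m a day would need more than 63 days of prices to clear a 250000 cutoff, while a heavier-trading name would clear it sooner.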

Gil -

A few more thoughts:

• I kinda get the idea of Quantopian defining and offering, with a simple API, tradeable stock universes that the masses can use in developing algorithms (presumably for the contest/fund). The final implementation you illustrated above, with a few lines of code, is elegant. But I'm wondering how it fits with the Quantopian paradigm of transparency, open source, and "leverage the collective intelligence of the crowd" which make Quantopian unique. To this end, how will users have access to the underlying code, should they want to review it in detail? Or modify it, to create their own "My_Tradeable500US" for example?
• Any possibility of releasing this API so that users could import their own filter? For example:

...

def make_pipeline():
    return Pipeline(
    )

Or would it be a matter of a copy & paste? And from where? Github? And example posted in the forum? Other?

• One concern is that there will be changes to Tradeable500US and the Tradeable1500US by Quantopian without notice to users. Inevitably, you'll find a need to make adjustments, so how will you do revision control (public github repository)? For example, will users be able to control the revision, when importing?
• I have the vague sense that using dollar-volume may bias toward price-volatile stocks, since stocks that have a lot of volume may also be seeing large price movements. So, I'm wondering if one ends up with a more volatile universe than would be desired? Under times of market stress, does the universe end up tilted toward more volatile stocks?

• One general risk is that by attempting to mitigate liquidity risk, you'll create another risk in that you are herding users to try to compete head-to-head in areas of the market that are dominated by big money. It might actually make sense to compete at low or intermediate levels of liquidity, where the big boys are not going to be able to play, since they have to slosh around gazillions with every trade. Your screen is highlighted by Dan D. on https://www.quantopian.com/posts/contest-20-rules-changes-$10m-capital-base-new-entry-required as consistent with some overall strategy at Quantopian to "think big!" but does the strategy make sense? Dan doesn't really flesh it out there, discussing the rewards as well as the potential strategic risks. Quantopian has a bias toward high-capacity strategies, but why? And what are the risks? I would discourage you from the "set this process in stone" mentality since it is not even obvious that users should be encouraged to write high-capacity strategies. Maybe the niche is in lots of lower capacity strategies and dealing in lower liquidity stocks will win the day.

Hey Grant, with regards to GOOG vs. GOOG_L, I have looked into this. GOOG would be discounted from this methodology as it is not a primary share, while GOOG_L's stock split is applied throughout the entire pipeline window, as is the case with all stock splits. GOOG_L has close prices and volumes back from years before the stock split, so this is not an issue.

To address your point about price volatility, notice that the candidates for exclusion from the universe are only those stocks that are deemed untradeable by the universe_filter. Therefore, price volatile stocks will not be excluded because of large swings in AverageDollarVolume if they are already part of the universe.

To answer your question about transparency, that is why we posted this notebook: to allow users to understand and critique the methodology fully before implementation.

If you have some ideas about how to trade low and medium liquidity stocks, I would definitely encourage you to give those a go! However, I would definitely add some lines of code which handle the case of an order not being filled.

Regarding price volatility, I'm wondering if there is an implicit bias to be trading more price-volatile stocks with the proposed approach. It would seem that if there is market stress, there would be a tilt toward stocks with higher trading volumes, under the assumption that there is a correlation between trading volume and price volatility (a qualitative description is given on http://www.investopedia.com/ask/answers/09/daily-volume-volatility.asp).

Is there any academic or other precedent for this universe construction method? Or did someone at Quantopian just wake up one day and make it up? It still could be the best way to go, but normally there would be a bibliography, putting the approach into context, and justifying it. Just seems like it could end up having a pretty big impact on the algos you get for the fund, so I'm wondering if it is a proven best-practice, or more ad hoc.

"If you have some ideas about how to trade low and medium liquidity stocks, I would definitely encourage you to give those a go!"

The point is that it would be great to give them a go, using a standard configurable API like the one you could develop, but unfortunately it apparently won't be configurable at all. The logic in this design choice is still lost on me.

Regarding orders not being filled, my understanding is that they tend to get filled in real-money trading. The Quantopian simulator doesn't necessarily handle things accurately, and then they get cancelled at the end of the day automatically.

Hi Gil -

I think this is a great idea, but I'm just concerned about the inflexibility and lack of transparency inherent in the ultimate implementation, based on your responses. Any feedback from other users or opinions around the office on the idea of a release that would simply be along the lines of:

from quantopian.universes import Tradeable500US
...
def make_pipeline():
    return Pipeline(
        screen = Tradeable500US
    )

And potentially without access to the underlying code? I guess I'd wonder if it might back-fire, since more skilled programmers will "roll their own" anyway based on your example (and perhaps share their customizable code on the forum, undoing your effort to define a standard).

If it will not be required to use a canned universe, then I don't understand your comment:

"By creating standard universes, we can truly minimize liquidity risk for all algorithms and still compare them on the same set of securities."

But then somewhere else you say that users would not be required to use a canned universe.

Will this be in a public github repository: quantopian.universes

Has this been ported over to be run in the backtester? Your request includes:

"Feel free to experiment with universe sizes, sector exposure limits and let us know what works best for your code."

Do you plan to provide an example? Then I could plunk it into my code and give it a try.

Also, how long will the "comment period" last? Do you have an anticipated go-live date?

@Grant, I am currently talking to the engineering team about possible implementations and discussing your thoughts on the matter. The Tradeable500US is by no means a mandatory universe; however, it can be thought of as an easy and fast way to reduce liquidity risk for an algorithmic strategy. Users should definitely experiment with different universe selection methods; however, having a few standard, robust, useful universes will definitely be a tool in an algo writer's toolkit, for a jumping-off point if nothing else. If they want to create their own methodology based on that of the Tradeable500US, they are more than welcome to use this notebook to begin that process.
As for comments and go-live, I will update this thread as deadlines and plans are made/change. Gil First off, very good thread. This is exactly the kind of topical lucid discussions which keep me coming back to Quantopian. Kudos to Grant, Frank and the others for the substantive feedback. So, my two cents... Provide a built in TradeableXXXUS filter, ideally with parameters to make it configurable. Great idea! Make the use of this filter somehow mandatory (like the leveraged ETFs exclusion rule). Bad idea! There are a number of reasons I feel a mandatory universe is bad, but here are my top ones: 1) Why the fixation on liquid stocks? I'd imagine there are some very good strategies out there which focus on these securities. Opening and closing positions over time in smaller trades. If Quantopian truly wants a diverse set of funds then excluding these doesn't make sense. 2) Why the arbitrary exclusion of "recent IPOs" and "depository receipts (ADRs)". Again, I'd imagine some very viable strategies which include these and their exclusion would limit the diversity of strategies which Quantopian is presumptively striving. 3) As a practical matter, it would be virtually impossible to create a filter which doesn't generate securities which move in and out at the "fringes". As a security drops out of the list, presumably every algorithm holding that security would then sell. This isn't a very healthy state for the Quantopian fund if a number of algos suddenly sell a single security (or potentially buy as a stock moves into the universe). Reading through all the posts above, it seams like the consensus is towards an optional built in "quick start" filter which I'd support. Dan @Dan I think you've hit the nail on the head when you say "optional built in 'quick start' filter". It effectively provides a very clean dataset to prototype and trade algorithms on, although I take your point in (2) that really interesting strategies can be made by trading these types of securities. 
The fixation with liquid stocks, your point in (1), is due to the type of strategy this methodology could be used for: cross-sectional factor algorithms. With these, you are effectively ranking attributes of stocks, as opposed to considering the individual equities themselves. Illiquid or non-traditional securities like ADRs often have attributes that skew these rankings.

The security removal issue, (3), is definitely something I considered; in the notebook, I call this "turnover". This methodology attempts to keep turnover to a minimum to guard against this.

Thanks for the feedback!

Gil - Thanks for the update. Looking forward to your feedback.

While you are talking with your engineering colleagues, it would also be interesting to hear about the vision for quantopian.universes in general. For example, would it be in a public archive? Would it be possible to propose changes there? Fork the code, and then be able to import it. Etc. Basically, how would quantopian.universes work, beyond being a Quantopian internal repository? Or would it work just like zipline (which I guess is still importable)?

This seems like bending over backwards to avoid fixing the broken slippage model, no? Why not just calibrate a new slippage model to the realized slippage you've observed from all of your realized IB fills for live traders and your prop accounts, and fix the backtester so that it can fill on bars where historically there were no trades?

Slippage isn't squared, it's square-root-ish (3/5? fun project), and for small lots, one usually gets filled very quickly, even if the stock hasn't traded all day. If it's an ETP, you very often can get filled for arbitrary size very quickly. All of which makes the default squared slippage model with a 2.5% of historical volume per bar terribly misleading.

@ Simon, Love the idea.
Specifically using live fills as the target data in an ML algo, and then using backtester fills and other well-thought-out feature data to calibrate the slippage model.

"When running an algorithm, have you ever seen the log message 'Your order has been partially filled'? If this is the case then you may not have controlled for liquidity risk. Liquidity risk arises when your algorithm is not able to buy and/or sell the securities dictated by a strategy, and it can completely alter the composition of your trading portfolio."

So, we have a problem statement that needs to be revised. Are we talking about backtesting, Q simulated/paper trading, paper trading at IB, or trading with real money at IB? And the problem statement says nothing about price impact; it only discusses an inability to buy/sell, regardless of price.

For simulated and real-money trading, we also have automatic cancellation of orders at the close. So, if the problem is partial fills of orders, then this feature of Q trading should be included in the discussion. It is not obvious that spreading orders out over a couple of days would necessarily be a bad thing for the kind of intermediate-time-scale algos that would be suitable for the Q fund (which presumably is also relevant, but not mentioned explicitly in the problem statement).

"The fixation with liquid stocks, your point in (1), is due to the type of strategy this methodology could be used for: cross-sectional factor algorithms."

The problem discussion should probably say something about the intended primary use, which I gather is to support scalable factor algos for the Q fund.

The problem, as stated, is seriously entangled with a backtester that is inaccurate (meaning it doesn't do an adequate job of simulating real-money trading). If in reality orders are pretty much always filled (perhaps with a price impact), then the Tradeable500US is solving a problem that doesn't exist. The problem is that the backtester is inaccurate.
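A minimal sketch of the contrast raised here between a squared slippage model and a sub-linear one. The quadratic form and the 2.5% volume cap follow the description in this thread; the 0.1 coefficients, the function names, and the use of a pure power law with the 3/5 exponent are illustrative assumptions, not Quantopian's or IB's actual fill logic.

```python
def quadratic_impact(volume_share, price_impact=0.1):
    """Fractional price impact that grows with the square of the volume share."""
    return price_impact * volume_share ** 2


def power_law_impact(volume_share, coefficient=0.1, exponent=0.6):
    """Fractional price impact that grows sub-linearly (assumed 3/5 exponent)."""
    return coefficient * volume_share ** exponent


def fillable_shares(order_shares, bar_volume, volume_limit=0.025):
    """Shares one bar can absorb when fills are capped at a share of bar volume."""
    return min(order_shares, int(bar_volume * volume_limit))


if __name__ == "__main__":
    # Compare the two impact curves at a few participation rates.
    for share in (0.001, 0.01, 0.025):
        print(f"{share:.3f}  quadratic={quadratic_impact(share):.7f}  "
              f"power_law={power_law_impact(share):.7f}")
    # Under a volume cap, a bar with no historical trades fills nothing.
    print(fillable_shares(100, 0))
```

With these assumed coefficients the power-law cost is far larger than the quadratic cost at small participation rates, and the volume cap leaves an order unfilled on any bar that had no historical trades. Calibrating either curve would require real fill data, which this sketch does not have.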
It seems like the backtester should be reconciled with reality first, and then you can circle back and sort out how best to define trade-ability.

Regarding using real-world data to develop a more accurate slippage model, I have to think that IB would have lots of data. You could give them a call, to see if they'd provide it (perhaps for a fee). As Simon mentioned, it could be an interesting research project.

Hi Gil -

Any thoughts on Simon's recent comment and the other feedback in this thread? It seems there are some open questions to be addressed before you march ahead with a release. Any feedback from the rest of the Q team that could be summarized here? Above, it sounded like you'd be running some of this by other Q employees. What have they said?

Grant

@Grant There has definitely been a lot of action over here regarding this universe selection methodology, but I am hesitant to talk about specifics as we are still working out these details and I would hate to promise anything that we do not eventually deliver.

Regarding the slippage model, I think, @Simon, you are getting this 3/5 figure from the paper "Direct Estimation of Equity Market Impact" by Almgren et al. This paper was actually just presented at an internal academic symposium, so transaction cost models and slippage models are definitely on our minds here.

Gil - Thanks for the update. As a general direction, I think this is a great idea. The thing is, practicing engineering is hard, and a classic mistake is not to spend enough time understanding the problem/need, getting the requirements correct (or not even writing them down, even in outline form!), finding the gaps/risks, revising the requirements, etc. The other mistake is not to sort out ahead of time how each requirement will be verified and confirmed valid for the end use. For example, see: Customer Centered Products: Creating Successful Products Through Smart Requirements Management by Ivy F. Hooks et al.
Link: https://amzn.com/0814405681

Whole books on the topic. I commend Q for throwing an intern like you into the deep end of the pool. Keep at it, and don't assume your managers know diddly, since it is possible that their brains have been taken over by stress-induced chemicals (no criticism, just human nature). It is your job to think for them.

Is this going to limit all algos to a universe of only the top 500? What if your strategy trades small-cap stocks?

As I understand, it will not be mandatory. I'd imagine, though, that the code could be adapted to handle small-cap stocks (e.g. first select all small caps, and then select the most tradeable).

Also think about the calculation overhead. If I use the 500, any calculation for stocks outside it is a waste. Also keep in mind what comes into the 500 and leaves the 500.

@ Gil -

How's this coming along? Do you have an update to your straw-man proposal above? Perhaps you could summarize where things stand.

Regarding Rafael's comment above, I think it is an interesting point. Say I wanted a universe to be similar to the Russell 2000 (e.g. IWM). What would be the path for a Quantopian user to set such a thing up? Or the Nasdaq 100 (QQQ) as another example? I would think that any universe creation tool developed by Quantopian would want to make it easy for users to emulate major stock market indices (and their ETFs). Is this an objective?

Another consideration would be for Quantopian to offer universes that a user could not implement in his code, either due to computational limitations and/or data licensing restrictions (e.g. baskets of stocks that have potentially interesting statistical characteristics).

Say I just wanted ETFs, point-in-time. How would you support that?

Thanks,

Grant

@Grant, Just working on implementation now.
I don't want to promise anything that will not definitely be in the final product, but I can say we are aiming to provide a parameterized implementation in addition to the Tradeable500US and Tradeable1500US for users who want to design a universe bespoke to their algo.

@Gil, I agree with Grant's comments above. A parameterized implementation sounds great, but it seems to make sense to simplify the process for replicating a range of popular indices (based on user demand) rather than simply the 500 and 1500 you have mentioned. Also, presumably the off-the-shelf universes would be pre-calculated across the full history for very quick retrieval, rather than running optimized code to generate the universe each time?

Gil - Thanks for the update. I have an algo to plug it right into; it'll be interesting to try.

Grant

Gil, I've been thinking about using your top 500 code with my own factor ranking in place of the 21 day dollar volume ranking. My factor ranking relates to my trading strategy, rather than a universe filter per se. This would give me a convenient mechanism to (weakly) limit sector concentration.

Obviously I can go ahead and do this myself with your code, but it got me thinking, how are you defining liquidity? Is it absolute (the two hard-coded annual average dollar volume thresholds) or relative (the 21 day dollar volume ranking)? Why have both? How do they interact?

My other question relates to lookback periods. I generally use a year or a period that relates to the hold of my strategy. For example, short-term mean reversion has a 1-2 week hold, so that seems a sensible lookback for establishing a mean. This is not so much scientific; rather, it's trying to cut the number of parameters. I wondered why 21 days? I would guess a high volume over 21 days means likely high volatility means likely price decline means likely mean reversion upwards.

I have to wonder if this approach is implicitly biased toward more volatile stocks.
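One hedged way to probe the volatility question is to compare a plain 21-day average dollar volume with the same figure scaled by realized volatility. The sketch below uses only the standard library; the function names, the use of simple daily returns, and the choice of a population standard deviation are assumptions for illustration and are not taken from the notebook.

```python
from statistics import mean, pstdev


def average_dollar_volume(closes, volumes, window=21):
    """Mean of close * volume over the trailing `window` bars."""
    pairs = list(zip(closes, volumes))[-window:]
    return mean([close * volume for close, volume in pairs])


def daily_volatility(closes, window=21):
    """Standard deviation of simple daily returns over the trailing window.

    Needs at least two closes, otherwise there are no returns to measure.
    """
    recent = closes[-(window + 1):]
    returns = [later / earlier - 1.0 for earlier, later in zip(recent, recent[1:])]
    return pstdev(returns)


def volatility_adjusted_liquidity(closes, volumes, window=21):
    """Average dollar volume divided by daily volatility."""
    vol = daily_volatility(closes, window)
    if vol == 0:
        return float("inf")
    return average_dollar_volume(closes, volumes, window) / vol


if __name__ == "__main__":
    # Two synthetic series with the same share volume but different swings.
    calm = [100 + (i % 2) for i in range(22)]
    wild = [100 + 10 * (i % 2) for i in range(22)]
    shares = [1000] * 22
    print(average_dollar_volume(calm, shares), average_dollar_volume(wild, shares))
    print(volatility_adjusted_liquidity(calm, shares) > volatility_adjusted_liquidity(wild, shares))
```

On this synthetic data the more volatile series has the slightly higher raw dollar volume but the lower volatility-adjusted score, which is the kind of reordering the adjustment is meant to produce.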
Would there be some way to factor out volatility, so that the filter is just on trade-ability (which, in this case, is the ability to get orders filled without order cancellation at day's end, when backtesting/simulated trading on the Quantopian platform)?

A common thing I've seen is to divide dollar volume (or turnover) by volatility (or a similar measure). This is like saying volume imbalances cause price impact, which causes volatility. We want stocks that don't move much when volumes increase.

I thought about suggesting this, but it seems incompatible with the slippage / price impact model, which is a monotonic function of dollar volume only.

@ Gil -

In one of my algos, I'm seeing:

    2002-07-12 WARN Your order for 142 shares of PG_WI failed to fill by the end of day and was canceled.

In your code above, you have:

    not_wi = ~mstar.share_class_reference.symbol.latest.endswith('.WI')

So, it is not clear that your code will filter out all of the when-issued stocks.

EDIT: Tried your .WI filter, and it removes _WI. Why the different symbols between Morningstar and your main database? Could lead to confusion/mistakes.

Hi Gil,

I'm using the code below, but still getting:

    2002-07-03 WARN Your order for 155 shares of PG_WI failed to fill by the end of day and was canceled.

Shouldn't

    not_wi = ~mstar.share_class_reference.symbol.latest.endswith('.WI')

remove all of the _WI stocks?

    def make_pipeline(n_stocks):

        ### Factors

        # Create a factor for the market cap.
        market_cap = mstar.valuation.market_cap.latest

        # Create a factor for the exchange ID.
        exchange_id = mstar.company_reference.primary_exchange_id.latest

        ### Filters

        # Create a filter returning true for securities with a non-nan market cap.
        has_market_cap = market_cap.notnan()

        # Create a filter returning true for securities with a non-null exchange ID.
        has_exchange_id = exchange_id.notnull()
        # has_exchange_id = exchange_id.eq('NAS')

        # Equities not listed as depositary receipts by morningstar.
        # Note the inversion operator, ~, at the start of the expression.
        not_depositary = ~mstar.share_class_reference.is_depositary_receipt.latest

        # Equities that are listed as common stock (as opposed to, say, preferred stock).
        # This is our first string column. The .eq method used here produces a Filter returning
        # True for all asset/date pairs where security_type produced a value of 'ST00000001'.
        common_stock = mstar.share_class_reference.security_type.latest.eq(COMMON_STOCK)

        # Equities whose exchange id does not start with OTC (Over The Counter).
        # startswith() is a new method available only on string-dtype Classifiers.
        # It returns a Filter.
        not_otc = ~mstar.share_class_reference.exchange_id.latest.startswith('OTC')

        # Equities whose symbol (according to morningstar) does not end with .WI
        # A .WI suffix generally indicates a "When Issued" offering.
        # endswith() works similarly to startswith().
        not_wi = ~mstar.share_class_reference.symbol.latest.endswith('.WI')

        # Equities whose company name does not end with 'LP' or a similar string.
        # The .matches() method uses the standard library re module to match
        # against a regular expression.
        not_lp_name = ~mstar.company_reference.standard_name.latest.matches('.* L[\\. ]?P\.?$')

        # Equities with a null entry for the balance_sheet.limited_partnership field.
        # This is an alternative way of checking for LPs.
        not_lp_balance_sheet = mstar.balance_sheet.limited_partnership.latest.isnull()

        # Get the top n_stocks securities by market cap (for securities that have a market cap).
        top_market_cap = market_cap.top(
            n_stocks,
            mask=(has_market_cap & has_exchange_id & not_depositary & common_stock
                  & not_otc & not_wi & not_lp_name & not_lp_balance_sheet),
        )

        # Combine all of our filters.
        tradeable_filter = has_market_cap & has_exchange_id & top_market_cap

        return Pipeline(
            columns={
                'market_cap': market_cap
            },
        )
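The symbol mismatch can be illustrated outside Pipeline with plain string handling. This is only a sketch: the regular expression and the helper name are assumptions, and it does not establish how Morningstar or Quantopian actually spell when-issued symbols. It shows that a literal `.WI` suffix check does not match a symbol written as `PG_WI`, while a pattern that accepts either separator does.

```python
import re

# Matches a trailing when-issued marker written with either separator,
# e.g. "PG.WI" or "PG_WI". The pattern is an assumption for illustration.
WHEN_ISSUED = re.compile(r"[._]WI$")


def is_when_issued(symbol):
    """Return True when the symbol ends in .WI or _WI."""
    return WHEN_ISSUED.search(symbol) is not None


if __name__ == "__main__":
    symbols = ["PG", "PG_WI", "PG.WI", "KIWI"]
    # A literal suffix check on one spelling misses the other spelling.
    print([s for s in symbols if not s.endswith(".WI")])
    # The two-separator pattern removes both spellings and keeps "KIWI".
    print([s for s in symbols if not is_when_issued(s)])
```

If the suffix differs between data sources, a check like this is only as good as the symbol column it is applied to.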

Hi Gil,

Have a look at:

If at all possible, there should be some way to filter out stocks with known bad data, based on a database maintained by Quantopian. I certainly wouldn't consider LBTY_A and LBTY_B "trade-able" until their data are fixed.

I suppose a case could be made that users just need to take it on the chin, under the assumption that data errors will pop up and be fixed promptly. But as the LBTY_A and LBTY_B case illustrates, many weeks (months?) can pass before a fix is released.

Hello Gil,

Any update? You've expressed concern about promising things that you might not deliver. I guess I'd simply put out a revised proposal for comment, based on your current thinking, to get more feedback. For example, what about Simon's point above? It sounds like you are mulling over changes to the slippage model, which would seemingly be a better way to improve the simulation accuracy of order fills.

Regarding my comment above, "How do you propose to check the list point-in-time, to make sure it is correct?", say I wanted to confirm that the "when-issued" filter actually works. Other than Morningstar, does Quantopian have access to a list of all stocks classified as "when-issued" that could be used to check the filter? Similarly for the other filters? In other words, how can one tell if all of the junk and only the junk has been removed without an independent data source for comparison (and even then, there could be errors)?

If you haven't already, you might also run this by your new CIO, who has a lot of industry experience and might be able to share industry best practices in this area (perhaps not covered by any NDA he's under, since it is pretty general stuff).

Hey Grant,

So the project is definitely coming along well and we hope to have something out soon. We would rather be thorough and create a truly useful, robust product than put something weak together quickly.

With regards to errors, I have not had time to look into this particular edge case, but I will look this over before release.

Also, our CIO and VP of Quant Strategy have been very involved with the final product so that we can provide something that is really industry-grade.

Hi Gil,

Another split missed:

https://www.quantopian.com/posts/missing-split-nke

Also, it appears that the missing splits reported (1 month ago) have not been fixed:

Any discussion about how to handle the situation on your end? Seems you could include tools for users to skip stocks with data problems.

Personally, I'm just keeping a running list in my algo:

All good, except that I probably don't have the full list (and may stumble on more), and once the data are fixed, I'll want to update the list. Guess I'll get notified by "listening" to my posts and assume that Q will follow up there. Perhaps dealing with data errors is a separate project that could be integrated with your tradeable universe project at some point? Or maybe it'd be easy to incorporate the worst offenders (e.g. missing splits), since it is fairly painful from a user standpoint to catch them, verify, report, track, etc.?
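A user-maintained exclusion list of this kind can be as simple as a set that is checked before ordering. The sketch below is hypothetical and is not the list referred to above; the set name and helper are invented for illustration, and the symbols are only those named in this thread as having data problems.

```python
# Hypothetical exclusion list; the symbols are those named in this thread.
KNOWN_BAD_DATA = {"LBTY_A", "LBTY_B", "NKE"}


def drop_known_bad(symbols, exclusions=KNOWN_BAD_DATA):
    """Return symbols in their original order, minus any on the exclusion list."""
    return [symbol for symbol in symbols if symbol not in exclusions]


if __name__ == "__main__":
    candidates = ["AAPL", "LBTY_A", "NKE", "PG"]
    print(drop_known_bad(candidates))  # ['AAPL', 'PG']
```

Keeping the list in one place makes it easy to remove a symbol once its data are fixed.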

Grant

Hi Gil,

How's the "(almost)" part coming along? Any update?

Grant

Hey Grant,

We are in the last stages of testing the product and constructing material to teach the community how to use it. Not too long a wait now!

Gil

How is this going? When will it be out? Also, make sure weights are balanced across Super Sector, Sector, Group (include >= 500), and Industry (include >= 1,000), not just Sector.
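As a rough illustration of a single-level exposure limit, the sketch below picks stocks by descending dollar volume while capping how many slots any one sector may take. The greedy approach, the function name, the tuple layout, and the 30% default are assumptions for illustration and are not the notebook's method; balancing several classification levels at once would need a count per level.

```python
from collections import Counter


def select_with_sector_cap(stocks, n, max_sector_share=0.3):
    """Pick up to n symbols by descending dollar volume, capping each sector.

    stocks: iterable of (symbol, sector, dollar_volume) tuples.
    max_sector_share: largest fraction of the n slots one sector may hold.
    """
    # The small epsilon guards against float products landing just below an integer.
    cap = max(1, int(n * max_sector_share + 1e-9))
    counts = Counter()
    chosen = []
    for symbol, sector, _ in sorted(stocks, key=lambda s: s[2], reverse=True):
        if len(chosen) == n:
            break
        if counts[sector] < cap:
            counts[sector] += 1
            chosen.append(symbol)
    return chosen


if __name__ == "__main__":
    sample = [
        ("A", "Tech", 100), ("B", "Tech", 90), ("C", "Tech", 80),
        ("D", "Energy", 70), ("E", "Health", 60), ("F", "Energy", 50),
    ]
    # With 4 slots and a 50% cap, at most 2 names per sector are allowed.
    print(select_with_sector_cap(sample, n=4, max_sector_share=0.5))
```

A greedy pass like this can return fewer than n names when the cap is tight relative to the number of sectors available.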

Any update? Will there be an alpha/beta release, for review and comment? Or are we going straight to production with this baby?

Hi Gil,

Sounds like you are locked in on the sector diversification idea, but I'm wondering if clustering might be an alternative approach? For example:

http://scikit-learn.org/stable/auto_examples/applications/plot_stock_market.html

Sectors seem sorta artificial and unjustified, at least to someone not in the field. Maybe clustering doesn't buy you anything, and you end up in the same place?

Grant

Hey everyone, here is the Tradeable500US, now dubbed the Q500US: https://www.quantopian.com/posts/the-q500us-and-q1500us. From this thread it definitely looked like the community wanted a parameterized universe creation function, so we made sure to include that in the release. Hope you have a good time with it!

@Grant. With make_us_equity_universe you can slice the dataset into whatever groups you fancy, but I have a feeling that cluster analysis might be better suited to an implementation in a risk model; could be a very cool project.

This will make it easy to quickly test out a new factor model without having to double check for the noise. Thanks Gil!