One of the biggest remaining holes in Pipeline since the launch of Classifiers has been the lack of support for string-typed data. Support for strings was merged into Zipline about a week ago, and as of today Quantopian supports loading string data in Pipelines.
There are two major use-cases for strings:
- Converting them into booleans via string-matching predicates (e.g. "startswith").
- Using them as grouping keys to transform numerical expressions (e.g. z-scoring asset returns within each country code).
The groupby use-case works for strings exactly the way it does for integer-valued columns.
The Classifier announcement post provides an overview of grouping operations, and there's a new Working with Strings section in the Pipeline docs that walks through another example using a string column.
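To make the grouping semantics concrete, here is a minimal pure-Python sketch of what a grouped z-score computes for a single day: each asset's value is normalized using only the mean and standard deviation of the assets that share its string label. This is not the Pipeline implementation. The `zscore_by_group` helper, symbols, and country codes are hypothetical, and the sketch assumes population standard deviation and that assets with a missing label are excluded (returned as `None`). In Pipeline itself, the same idea is expressed by passing a string classifier as the `groupby` argument of `Factor.zscore`.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def zscore_by_group(values, groups):
    """Z-score each asset's value against the other assets sharing its group label.

    values: dict mapping asset -> float (e.g. trailing returns)
    groups: dict mapping asset -> str label (e.g. a country code), or None if missing
    Returns a dict mapping asset -> z-score, or None for assets with no label.
    (Hypothetical helper for illustration only.)
    """
    # Bucket assets by their string label, skipping missing labels.
    members = defaultdict(list)
    for asset in values:
        label = groups.get(asset)
        if label is not None:
            members[label].append(asset)

    result = {asset: None for asset in values}
    for assets in members.values():
        vals = [values[a] for a in assets]
        mu = mean(vals)
        sigma = pstdev(vals)  # population standard deviation (ddof=0)
        for a in assets:
            # A group whose members are all equal has zero spread; report NaN.
            result[a] = (values[a] - mu) / sigma if sigma > 0 else math.nan
    return result


# Hypothetical returns (in percent) and country codes.
returns = {"AAPL": 2.0, "MSFT": 4.0, "SAP": -1.0, "SIE": 3.0, "XYZ": 50.0}
countries = {"AAPL": "US", "MSFT": "US", "SAP": "DE", "SIE": "DE", "XYZ": None}
print(zscore_by_group(returns, countries))
# {'AAPL': -1.0, 'MSFT': 1.0, 'SAP': -1.0, 'SIE': 1.0, 'XYZ': None}
```

Note that XYZ's outsized return has no effect on the other assets' scores, because it never joins a group.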
Building filters from string data is supported by a suite of new methods on string-typed classifiers.
More information on each of these methods is available in the Classifier API Reference.
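As a rough illustration of what these filters compute, the sketch below applies string predicates to a dict of per-asset labels using only the standard library. The predicate builders are named after the Classifier string methods (`startswith`, `endswith`, `has_substring`, `matches`, `element_of`), but they are stand-ins, not the Pipeline API. Two assumptions are baked in: missing labels (`None`) always evaluate to False, and `matches` uses `re.match`, which anchors only at the start of the string. Check the Classifier API Reference for Pipeline's exact behavior. The exchange names are hypothetical sample data.

```python
import re


def string_filter(labels, predicate):
    """Apply a str -> bool predicate to every asset's label.

    labels: dict mapping asset -> str, or None when the value is missing.
    Missing labels always evaluate to False (an assumption of this sketch).
    """
    return {asset: label is not None and bool(predicate(label))
            for asset, label in labels.items()}


# Predicate builders named after the Classifier string methods (stand-ins only).
def startswith(prefix):
    return lambda s: s.startswith(prefix)


def endswith(suffix):
    return lambda s: s.endswith(suffix)


def has_substring(substring):
    return lambda s: substring in s


def matches(pattern):
    # re.match anchors at the start of the string, not the end.
    compiled = re.compile(pattern)
    return lambda s: compiled.match(s) is not None


def element_of(choices):
    allowed = frozenset(choices)
    return lambda s: s in allowed


exchanges = {"A": "NYSE", "B": "NASDAQ", "C": "OTCQX", "D": None}
print(string_filter(exchanges, startswith("OTC")))
# {'A': False, 'B': False, 'C': True, 'D': False}
```

Each predicate turns a column of strings into a column of booleans, which is exactly the shape a Pipeline filter needs.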
To demonstrate the kinds of string-based filters one might want to build, I've attached a notebook that implements 9 common universe selection criteria in Pipeline and analyzes their outputs.
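The notebook isn't reproduced here, but the general pattern of combining several string-based criteria into a single universe, and counting how many assets pass each one, can be sketched in plain Python. The asset data, field names, and the four criteria below are hypothetical stand-ins, not the nine criteria from the notebook. In Pipeline, the corresponding filters would be combined with `&`, `|`, and `~`. One detail the sketch highlights: if a missing value makes a predicate False, then negating that predicate (e.g. "not on an OTC exchange") lets assets with missing data through, so an explicit non-missing check may be needed.

```python
import re

# Hypothetical per-asset string data; field names and values are illustrative only.
ASSETS = {
    "AAA": {"symbol": "AAA", "exchange": "NYSE", "name": "Alpha Corp"},
    "BBB.WI": {"symbol": "BBB.WI", "exchange": "NASDAQ", "name": "Beta Inc"},
    "CCC": {"symbol": "CCC", "exchange": "OTCPK", "name": "Gamma Holdings"},
    "DDD": {"symbol": "DDD", "exchange": "NYSE", "name": "Delta Partners LP"},
    "EEE": {"symbol": "EEE", "exchange": None, "name": "Epsilon Co"},
}


def holds(value, predicate):
    """String predicate where a missing value (None) counts as False."""
    return value is not None and bool(predicate(value))


# Hypothetical criteria; each returns True if the asset should be kept.
CRITERIA = {
    "not_otc": lambda a: not holds(a["exchange"], lambda s: s.startswith("OTC")),
    "not_when_issued": lambda a: not holds(a["symbol"], lambda s: s.endswith(".WI")),
    "not_lp": lambda a: not holds(a["name"], lambda s: re.search(r"\bL\.?P\.?$", s)),
    # Negated predicates keep missing values, so require a known exchange explicitly.
    "has_exchange": lambda a: a["exchange"] is not None,
}


def select_universe(assets, criteria):
    """Return (assets passing every criterion, per-criterion pass counts)."""
    passed = {name: {sym for sym, a in assets.items() if keep(a)}
              for name, keep in criteria.items()}
    universe = set(assets).intersection(*passed.values())
    return universe, {name: len(syms) for name, syms in passed.items()}


universe, counts = select_universe(ASSETS, CRITERIA)
print(sorted(universe))  # ['AAA']
print(counts)  # {'not_otc': 4, 'not_when_issued': 4, 'not_lp': 4, 'has_exchange': 4}
```

Reporting per-criterion counts alongside the final intersection makes it easy to see which criteria actually drive the size of the universe.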
This analysis is a step toward eventually providing recommended synthetic trading universes (e.g. a "Quanto 500" or "Quanto 3000") as efficient Pipeline built-ins, so I'd like to hear about any other filtering criteria worth including in the analysis.