Community members who followed the original announcement posts for the Pipeline API may recall hints about a third expression type in the original system design.
Classifiers have been on our roadmap from the very beginning, since they enable a number of important operations that involve grouping expressions on Factor outputs. Classifiers were cut from the original launch in order to make the rest of Pipeline available sooner, but we'd always planned on adding them eventually.
Today, I'm excited to announce that Pipeline's third major expression type is finally available. Attached to this post is a notebook with several detailed examples of working with Classifiers.
Some highlights from the notebook:
- There's now a new base expression type: Classifier. In the same way that Factors are expressions producing numerical-valued results, and Filters are expressions producing boolean-valued results, Classifiers are pipeline expressions producing categorical-valued results. Another way of thinking about classifiers is that they're computations that produce labels for assets. Canonical examples of classifiers are sector codes and quartiles/quintiles/deciles of another factor (e.g. deciles of stocks by market cap).
- There are two new Factor methods, including zscore(), that take an optional groupby argument, which can be passed a classifier. These methods produce new Factors that apply normalizations to the daily output of the original Factor. A detailed example of how this process works can be found in the new Normalizing Results section of the help docs.
- There are two new builtin classifiers, and several more in the works.
- There are several new Factor methods that produce classifiers by computing quantile labels. The most general of these is Factor.quantiles, which accepts a bin count as an argument. Convenience aliases are also available.
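
To make the grouping idea in the highlights above concrete, here is a minimal plain-Python sketch of what "quantile labels" and a "grouped z-score" mean for a single day of data. It is not the Pipeline implementation and does not call the Pipeline API: the helper names `quantile_labels` and `grouped_zscore`, the sample market caps, the equal-count binning rule, and the use of population standard deviation are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def quantile_labels(values, bins):
    """Label each asset with a bin from 0 (lowest values) to bins - 1 (highest).

    Hypothetical helper: assets are sorted by value and split into
    roughly equal-sized bins. Real quantile logic may treat ties and
    missing data differently.
    """
    ranked = sorted(values, key=values.get)
    n = len(ranked)
    return {asset: (position * bins) // n for position, asset in enumerate(ranked)}


def grouped_zscore(values, labels):
    """Z-score each asset's value against the assets sharing its label.

    Hypothetical helper: uses the population standard deviation, which
    is an assumption, and returns NaN for groups with zero spread.
    """
    groups = defaultdict(list)
    for asset, label in labels.items():
        groups[label].append(values[asset])

    result = {}
    for asset, label in labels.items():
        members = groups[label]
        mu, sigma = mean(members), pstdev(members)
        result[asset] = (values[asset] - mu) / sigma if sigma > 0 else float("nan")
    return result


# Made-up market caps for one day (illustrative data only).
market_cap = {"A": 10.0, "B": 20.0, "C": 30.0, "D": 40.0}

halves = quantile_labels(market_cap, bins=2)    # categorical labels per asset
zscores = grouped_zscore(market_cap, halves)    # normalized within each label
```

Here the labels play the role of a classifier's output (one category per asset), and the grouped z-score compares each asset only with the others in its category, so the smaller asset in each half ends up below zero and the larger one above.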
I think having the ability to perform grouped aggregations and normalizations opens the door to many more sophisticated quant workflows, so I'm excited to see what the community builds with this new functionality. As always, I'm also interested to hear feedback on how these features could be made more useful. Are there other natural candidates for built-in classifiers (e.g. exchange id or country code) that could enable better algorithms? Are there other normalizations like zscore that could be made Factor methods (one that I have my eye on right now is the existing rank() method)? Are there other interesting possible additions to the Classifier algebra? Feedback from users on these (or other) topics would be greatly appreciated.