Community members who followed the original announcement posts for the Pipeline API may recall hints about a third expression type in the original system design. Classifiers have been on our roadmap from the very beginning, since they enable a number of important operations that involve grouping expressions on Factor outputs. Classifiers were cut from the original launch in order to make the rest of Pipeline available sooner, but we'd always planned on adding them eventually.
Today, I'm excited to announce that Pipeline's third major expression type is finally available. Attached to this post is a notebook with several detailed examples of working with Classifiers.
Some highlights from the notebook:
- There's now a new base expression type: Classifier. In the same way that Factors are expressions producing numerical-valued results, and Filters are expressions producing boolean-valued results, Classifiers are pipeline expressions producing categorical-valued results. Another way of thinking about classifiers is that they're computations that produce labels for assets. Canonical examples of classifiers are sector codes and quartiles/quintiles/deciles of another factor (e.g. deciles of stocks by market cap).
- There are two new Factor methods, demean() and zscore(), that take an optional groupby argument, which can be passed a classifier. These methods produce new Factors that apply normalizations to the daily output of the original Factor. A detailed example of how this process works can be found in the new Normalizing Results section of the help docs.
- There are two new builtin classifiers, and several more in the works.
- There are several new Factor methods that produce classifiers by computing quantile labels. The most general of these is Factor.quantiles, which accepts a bin count as an argument. Convenience aliases are available for quartiles (quantiles(4)), quintiles (quantiles(5)), and deciles (quantiles(10)).
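
To make the grouped normalizations and quantile labels in the list above a bit more concrete, here is a minimal plain-Python sketch of the idea. This is not Pipeline's implementation and it does not use the Pipeline API: the helper names (quantile_labels, grouped_demean, grouped_zscore), the asset names, and all of the numbers are made up for illustration. It also assumes a population standard deviation and simple rank-based, equal-count bins, which may differ from what Pipeline does internally.

```python
from collections import defaultdict
from statistics import mean, pstdev


def quantile_labels(values, bins):
    """Label each asset 0..bins-1 by the rank of its value.

    Assumption: equal-count bins based on sort order (hypothetical helper).
    """
    ranked = sorted(values, key=values.get)
    n = len(ranked)
    return {asset: rank * bins // n for rank, asset in enumerate(ranked)}


def _group_values(values, labels):
    """Collect one day's factor values into lists keyed by classifier label."""
    groups = defaultdict(list)
    for asset, value in values.items():
        groups[labels[asset]].append(value)
    return groups


def grouped_demean(values, labels):
    """Subtract each group's mean from the values of its members."""
    means = {lbl: mean(vs) for lbl, vs in _group_values(values, labels).items()}
    return {a: v - means[labels[a]] for a, v in values.items()}


def grouped_zscore(values, labels):
    """Demean within each group, then divide by the group's standard deviation.

    Assumption: population standard deviation; groups with zero spread
    produce NaN instead of raising.
    """
    stats = {
        lbl: (mean(vs), pstdev(vs))
        for lbl, vs in _group_values(values, labels).items()
    }
    result = {}
    for asset, value in values.items():
        mu, sigma = stats[labels[asset]]
        result[asset] = (value - mu) / sigma if sigma > 0 else float("nan")
    return result


# One day of made-up factor output for four hypothetical assets.
returns = {"A": 0.04, "B": 0.02, "C": -0.01, "D": -0.03}
# A made-up classifier output: one label per asset.
sector = {"A": "tech", "B": "tech", "C": "energy", "D": "energy"}
# Made-up market caps, used to build a quantile-based classifier.
market_cap = {"A": 10, "B": 40, "C": 20, "D": 30}

halves = quantile_labels(market_cap, bins=2)
print(halves)
print(grouped_demean(returns, sector))
print(grouped_zscore(returns, halves))
```

The point of the sketch is that a classifier is just a label per asset, and passing one as groupby means the normalization is computed separately within each label rather than across the whole universe.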
I think having the ability to perform grouped aggregations and normalizations opens the door to many more sophisticated quant workflows, so I'm excited to see what the community builds with this new functionality. As always, I'm also interested to hear feedback on how these features could be made more useful. Are there other natural candidates for built-in classifiers (e.g. exchange id or country code) that could enable better algorithms? Are there other normalizations like demean and zscore that could be made Factor methods (one that I have my eye on right now is the existing rank() method)? Are there other interesting possible additions to the Filter/Factor/Classifier algebra? Feedback from users on these (or other) topics would be greatly appreciated.
Happy coding,
-Scott