paper - "A Dissimilarity Measure for Comparing Subsets of Data: Application to Multivariate Time Series"

I just came across a paper that appears to be relevant. The authors apply their algorithm to market data:

Similarity is a central concept in data mining. Many techniques, such as clustering and classification, use similarity or distance measures to compare various subsets of multivariate data. However, most of these measures are only designed to find the distances between a pair of records or attributes in a data set, and not for comparing whole data sets against one another. In this paper we present a novel dissimilarity measure based on principal component analysis for doing such comparisons between such data sets, and in particular time series data sets. Our measure accounts for the correlation structure of the data, and can be tuned by the user to account for domain knowledge. Our measure is useful in such applications as change point detection, anomaly detection, and clustering in fields such as intrusion detection, clinical trial data analysis, and stock analysis.
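
The abstract doesn't give the actual formula, so to make the idea concrete, here is a rough sketch of one way to compare two whole data sets through their principal components. To be clear, this is not the measure from the paper: it is a simple subspace-overlap score (one minus the average squared cosine of the angles between the leading principal directions), and the names `principal_axes`, `pca_subspace_dissimilarity`, and the parameter `k` are my own, picked for illustration. It assumes NumPy and data laid out as rows = observations, columns = variables.

```python
import numpy as np


def principal_axes(data, k):
    """Top-k principal directions (as columns) of an (n_samples, n_features) array."""
    cov = np.cov(data, rowvar=False)          # np.cov subtracts the column means itself
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]     # indices of the k largest eigenvalues
    return eigvecs[:, order]


def pca_subspace_dissimilarity(a, b, k=2):
    """Illustrative score in [0, 1]: 0 when the top-k principal subspaces of
    a and b coincide, 1 when they are orthogonal. NOT the paper's measure."""
    ua = principal_axes(a, k)
    ub = principal_axes(b, k)
    # Sum of squared cosines of the principal angles between the two subspaces.
    overlap = np.sum((ua.T @ ub) ** 2)
    return 1.0 - overlap / k


# Synthetic example (made-up data): two 3-variable sets whose dominant
# direction of variation lies along different axes.
rng = np.random.default_rng(0)
a = rng.normal(size=(500, 3)) * np.array([5.0, 1.0, 0.1])
b = rng.normal(size=(500, 3)) * np.array([0.1, 1.0, 5.0])

d_same = pca_subspace_dissimilarity(a, a, k=1)   # a set compared with itself
d_diff = pca_subspace_dissimilarity(a, b, k=1)   # different correlation structure
print(d_same, d_diff)
```

Note that this sketch only looks at the orientation of the principal directions, so it ignores differences in mean and in the amount of variance; shifting every row of a data set by a constant leaves the score unchanged. For market data you would presumably feed in windows of returns for a set of instruments, one window per data set.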