You'd think this would be better known. In Fourier analysis, or other frequency/wavelet decomposition techniques, the data you get out depends heavily on the windowing function you use.
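To make that concrete, here's a minimal sketch (numpy assumed; the tone frequency and window choices are just illustrative) of how much the window changes the spectrum you get from the same signal:

```python
import numpy as np

# A tone whose frequency falls between FFT bins, so leakage is visible.
fs, n = 1000, 1024
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 123.4 * t)

freqs = np.fft.rfftfreq(n, d=1 / fs)
rect = np.abs(np.fft.rfft(x))                  # implicit rectangular window
hann = np.abs(np.fft.rfft(x * np.hanning(n)))  # Hann window applied first

# Energy far from the tone: the rectangular window's sinc-shaped sidelobes
# leak much more than the Hann window's do.
far = freqs > 300
print(rect[far].max() / rect.max(), hann[far].max() / hann.max())
```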
Of further potential interest: This paper cites an earlier paper by Keogh and Lin with the provocative title "Clustering of time-series subsequences is meaningless", available online at https://www.cs.ucr.edu/~eamonn/meaningless.pdf
> Given the recent explosion of interest in streaming data and online algorithms, clustering of time series subsequences, extracted via a sliding window, has received much attention. In this work we make a surprising claim. Clustering of time series subsequences is meaningless. More concretely, clusters extracted from these time series are forced to obey a certain constraint that is pathologically unlikely to be satisfied by any dataset, and because of this, the clusters extracted by any clustering algorithm are essentially random. While this constraint can be intuitively demonstrated with a simple illustration and is simple to prove, it has never appeared in the literature. We can justify calling our claim surprising, since it invalidates the contribution of dozens of previously published papers. We will justify our claim with a theorem, illustrative examples, and a comprehensive set of experiments on reimplementations of previous work. Although the primary contribution of our work is to draw attention to the fact that an apparent solution to an important problem is incorrect and should no longer be used, we also introduce a novel method which, based on the concept of time series motifs, is able to meaningfully cluster subsequences on some time series datasets.
Several commenters here seem to ask "okay, so then what's the right way to cluster windows of time series?" Perhaps the final sentence of this abstract suggests a solution in that direction?
The suggested solution is to look at motifs: windows that are highly similar once the trivial matches caused by window overlap are excluded. If you take this to its logical conclusion, you end up in the Matrix Profile rabbit hole. https://www.cs.ucr.edu/~eamonn/MatrixProfile.html
If you were to compute it the naive way, that would be slow, but with increasingly sophisticated algorithms developed over a series of twenty papers, you can get massive speedups. Lots of clever tricks to enjoy! Though I guess you can skip the scenic route if you want and just read the first and last papers.
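To get a feel for what the matrix profile is, here's a deliberately naive sketch of just the definition (nothing like the fast algorithms on that page): for every z-normalized window, find the distance to its nearest neighbor while excluding the trivial matches that come from overlapping windows. Numpy assumed; window length and data are made up.

```python
import numpy as np

def naive_matrix_profile(ts, m):
    """Naive O(n^2 * m) matrix profile with an exclusion zone for trivial matches."""
    n = len(ts) - m + 1
    # z-normalize every length-m window so comparisons are shape-based
    windows = np.lib.stride_tricks.sliding_window_view(ts, m).astype(float)
    windows = (windows - windows.mean(axis=1, keepdims=True)) / windows.std(axis=1, keepdims=True)

    profile = np.full(n, np.inf)
    excl = m // 2  # ignore neighbors that overlap too much (trivial matches)
    for i in range(n):
        d = np.linalg.norm(windows - windows[i], axis=1)
        d[max(0, i - excl):i + excl + 1] = np.inf  # mask the exclusion zone
        profile[i] = d.min()
    return profile  # low values = motif candidates, high values = discords

ts = np.sin(np.linspace(0, 20 * np.pi, 2000)) + 0.1 * np.random.randn(2000)
mp = naive_matrix_profile(ts, m=100)
print(np.argmin(mp))  # location of one half of the best motif pair
```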
The problematic projection into vectors is done with a window that is a fixed number of samples long? Isn't that very weird?
Unless the samples are nicely periodic, that makes the contents of a vector, and the position of a value within it, depend only on the window length and the ordering of the samples, not on the time values themselves. Since the ordering within the vector, and thus the meaning of each dimension in vector space, is likely to be effectively random, why would we expect clustering in that vector space to mean anything? It's a weird thing to do on the face of it.
Imagine you know that there are some repeating patterns in your dataset, e.g. heartbeats in an EKG. If you take two windows that happen to contain a heartbeat at the same position, the corresponding vectors will be close, and if the positions are different, the vectors should become more dissimilar the more the alignment is off.
Then if you create a number of clusters equal to the window size, you might expect that each cluster will correspond to one of the possible positions of the heartbeat within the window. But somehow that's not what happens...
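That's essentially the experiment the paper describes, and it's easy to try for yourself. A rough sketch (numpy and scikit-learn assumed; the "EKG" is a made-up spike train with slightly irregular beat spacing), clustering sliding windows with one cluster per possible beat position:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy "EKG": a spiky beat repeated with slightly irregular spacing, plus noise.
rng = np.random.default_rng(0)
ts = 0.05 * rng.standard_normal(5000)
beat = np.exp(-0.5 * ((np.arange(20) - 10) / 2.0) ** 2)
pos = 0
while pos + len(beat) < len(ts):
    ts[pos:pos + len(beat)] += beat
    pos += 50 + rng.integers(-5, 6)          # roughly one beat every 50 samples

m = 50                                        # window length ~ one beat period
windows = np.lib.stride_tricks.sliding_window_view(ts, m)

# The experiment from the comment above: one cluster per possible beat position.
km = KMeans(n_clusters=m, n_init=5, random_state=0).fit(windows)

# Inspect a few centroids: are they the beat shifted to different offsets,
# or something else entirely?
for c in km.cluster_centers_[:5]:
    print(np.round(c, 2))
```

In the paper's experiments the cluster centers come out as smooth, phase-shifted sine-like shapes rather than the beat template sitting at each offset, which is exactly the puzzle this comment is pointing at.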
The paper ignored a big honking chunk of the field, one used often in finance: you take the delta between adjacent time points, something like a discrete derivative, and now you have something centered on zero that goes up and down "randomly". (PCA also typically uses centering, although that's not exactly the same thing.) I would have loved to see an investigation of window size and the like on differenced time series (usually either x2 - x1 or x2 / x1). In this sense the paper is just recapitulating "if you don't difference your time series, bad things happen".
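For anyone unfamiliar with the convention, a minimal sketch of the two common "deltas" the parent mentions (numpy assumed, made-up prices):

```python
import numpy as np

x = np.array([100.0, 102.0, 101.0, 105.0, 104.0])  # e.g. daily prices

diffs = np.diff(x)                 # x2 - x1: absolute changes, roughly centered on zero
ratios = x[1:] / x[:-1]            # x2 / x1: relative changes, centered on one
log_returns = np.diff(np.log(x))   # common in finance; ~ ratios - 1 for small moves

print(diffs)                       # [ 2. -1.  4. -1.]
print(np.round(ratios, 4))
```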
In defense of sliding windows over time series, if not of clustering them: principal component analysis of the matrix on page 1 is one robust way to extract a linear model of the system producing the time series. It's useful for finding the frequencies of closely spaced oscillators when least-squares fitting would be arduous to parameterize and convergence would be a battle.
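A rough sketch of that use (this is essentially singular spectrum analysis; numpy assumed, and the frequencies and window length here are made up): stack the sliding windows into a matrix, take its SVD, and read the oscillator frequencies off the dominant components.

```python
import numpy as np

# Two fairly closely spaced oscillators plus noise.
fs, n = 100.0, 2000
t = np.arange(n) / fs
x = (np.sin(2 * np.pi * 2.0 * t)
     + 0.5 * np.sin(2 * np.pi * 2.4 * t)
     + 0.1 * np.random.default_rng(0).standard_normal(n))

m = 500                                              # window length (embedding dimension)
X = np.lib.stride_tricks.sliding_window_view(x, m)   # the sliding-window ("page 1") matrix
X = X - X.mean(axis=0)                               # center before PCA

# Principal directions of the window matrix: each oscillator should show up
# as a sine/cosine pair among the leading right singular vectors.
_, s, Vt = np.linalg.svd(X, full_matrices=False)

# Crude per-component frequency estimate from its FFT peak (limited to bin spacing fs/m).
freqs = np.fft.rfftfreq(m, d=1 / fs)
for v in Vt[:4]:
    print(round(float(freqs[np.argmax(np.abs(np.fft.rfft(v)))]), 2), "Hz")
```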
Interesting proofs, though I would love more examples that show the dataset and the unexpected results, as well as a description of what leads to these traps and how to avoid them.
... if I get it right and the whole idea of clustering sliding windows is wrong, then the question of "what should you do instead?" is an interesting one.
I'd imagine a few answers are: (1) for time series that are somewhat periodic, you might cut out individual days or weeks and try to cluster them against each other; (2) for time series that are intermittent, you might create some definition of an "event" (an earthquake, or a particle passing through a detector) and then cluster the events; and maybe (3) for something episodic, such as "heart rate during a workout", you would cluster episodes.
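For (1), a minimal sketch of what that could look like (hypothetical hourly data with two made-up day shapes; numpy and scikit-learn assumed). The key difference from the paper's setting is that the segments are non-overlapping and phase-aligned, not arbitrary sliding windows:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical hourly series: 90 days, with a different shape on "weekdays" vs "weekends".
rng = np.random.default_rng(1)
hours = np.arange(24)
weekday = np.sin(2 * np.pi * hours / 24) + (hours > 8) * 0.5   # made-up "workday" bump
weekend = np.sin(2 * np.pi * hours / 24)
ts = np.concatenate([(weekday if d % 7 < 5 else weekend) + 0.1 * rng.standard_normal(24)
                     for d in range(90)])

# Non-overlapping, phase-aligned segments: one row per day, not a sliding window.
days = ts.reshape(-1, 24)
days = (days - days.mean(axis=1, keepdims=True)) / days.std(axis=1, keepdims=True)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(days)
print(np.bincount(labels))  # how many days land in each "type of day"
```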
I don't see how the authors can call a sinusoidal centroid meaningless while observing that it's the dominant component. That is the meaning of the centroid: a 1-means cluster is just the average, and the unique features are averaged out.
For example:
https://en.wikipedia.org/wiki/Sinc_function
https://en.wikipedia.org/wiki/Window_function#Examples_of_wi...