
Don’t make this mistake when clustering time series data - NicoJuicy
https://towardsdatascience.com/dont-make-this-mistake-when-clustering-time-series-data-d9403f39bbb2?source
======
ken
That title seems awfully clickbaity to me.

How about something with some meat on its bones, like "Clustering sliding-window
subsequences of a time series yields meaningless clusters"?

~~~
allwynpfr
This is one of the reasons why I hate browsing other news/article aggregation
websites and rather enjoy HN. _The title should be a concise excerpt of the
article it holds or, better still, tell me exactly what I stand to gain/learn
if I read the article, not what I will lose!_

------
jrauser
Keogh has produced a bunch of interesting work. "Towards Parameter-Free Data
Mining" is a gem:
[https://www.cs.ucr.edu/~eamonn/SIGKDD_2004_long.pdf](https://www.cs.ucr.edu/~eamonn/SIGKDD_2004_long.pdf)
Of course SAX is also super interesting:
[https://www.cs.ucr.edu/~eamonn/SAX.pdf](https://www.cs.ucr.edu/~eamonn/SAX.pdf)

~~~
bmiller2
Not only is it great work, it's greatly named work. He's brought such gems to
the community as "Experiencing SAX:...", "Hot SAX", and "Group SAX".

~~~
eamonn
Many thanks for your kind words. I have to confess, it was Jessica Lin who
came up with the name "SAX". But I did run with it ;-)

------
vslira
Just to give some context, the paper referenced by the post[0] is a bit old.
Eamonn Keogh is (to my knowledge) the current head of the research group that
developed the Matrix Profile, which purports to be useful in clustering[1].
Actually, in their main introduction to the MP[2] they use a chart to explain
motif discovery and chains that is very similar to the one in section 6 of
[0].

To be fair, I don’t remember any of the papers of the group explicitly
discussing subsequence clustering, but one hopes they have advanced on that
front.

[0]
[https://www.cs.ucr.edu/~eamonn/meaningless.pdf](https://www.cs.ucr.edu/~eamonn/meaningless.pdf)

[1]
[https://www.cs.ucr.edu/~eamonn/MatrixProfile.html](https://www.cs.ucr.edu/~eamonn/MatrixProfile.html)

[2]
[https://www.cs.ucr.edu/~eamonn/Matrix_Profile_Tutorial_Part1...](https://www.cs.ucr.edu/~eamonn/Matrix_Profile_Tutorial_Part1.pdf)
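
For readers who haven't met the Matrix Profile, here is a rough brute-force sketch of the idea as I understand it: for every subsequence, record the z-normalized Euclidean distance to its nearest non-overlapping neighbor. This is not the group's code. The function name `naive_matrix_profile`, the exclusion zone of `m // 2`, and the planted-pattern data are assumptions for illustration, and real implementations use much faster algorithms.

```python
import numpy as np

def naive_matrix_profile(series, m):
    """Brute-force matrix profile: nearest-neighbor distance per length-m window."""
    x = np.asarray(series, dtype=float)
    # All length-m windows, one per row (copy so we can normalize in place).
    subs = np.lib.stride_tricks.sliding_window_view(x, m).copy()
    subs -= subs.mean(axis=1, keepdims=True)
    std = subs.std(axis=1, keepdims=True)
    std[std == 0] = 1.0            # leave constant windows as all zeros
    subs /= std

    n = len(subs)
    excl = max(1, m // 2)          # assumed exclusion zone against trivial matches
    profile = np.full(n, np.inf)
    index = np.full(n, -1)
    for i in range(n):
        d = np.sqrt(((subs - subs[i]) ** 2).sum(axis=1))
        lo, hi = max(0, i - excl), min(n, i + excl + 1)
        d[lo:hi] = np.inf          # ignore the window itself and its close overlaps
        j = int(d.argmin())
        profile[i], index[i] = d[j], j
    return profile, index

# Hypothetical data: noise with the same shape planted at two positions.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
pattern = 5 * np.sin(np.linspace(0, 2 * np.pi, 20))
x[30:50] = pattern
x[130:150] = pattern

profile, index = naive_matrix_profile(x, m=20)
best = int(profile.argmin())
print(best, int(index[best]))      # positions of the closest pair (the motif)
```

Low values in `profile` mark repeated shapes (motifs), and high values mark subsequences with no close match anywhere else in the series.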

------
jerkstate
My intuition is that clustering subsequences of time series data should easily
find diurnal/seasonal patterns - am I thinking about this wrong?
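
One way to probe this intuition is to try it on synthetic data. The sketch below is not from the article or the paper. It extracts z-normalized sliding windows from a made-up hourly series with a 24-sample cycle and runs a plain Lloyd's k-means on them. The names `sliding_subsequences` and `kmeans`, the window length, `k`, and the series itself are all assumptions for illustration.

```python
import numpy as np

def sliding_subsequences(series, w):
    """Return every length-w window of a 1-D series, z-normalized per window."""
    x = np.asarray(series, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(x, w).copy()
    windows -= windows.mean(axis=1, keepdims=True)
    std = windows.std(axis=1, keepdims=True)
    std[std == 0] = 1.0            # leave constant windows as all zeros
    return windows / std

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means with squared Euclidean distance."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):       # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers, labels

# Hypothetical hourly series: 30 "days" of a daily cycle plus noise.
rng = np.random.default_rng(0)
t = np.arange(24 * 30)
series = np.sin(2 * np.pi * t / 24) + 0.3 * rng.normal(size=t.size)

subs = sliding_subsequences(series, w=24)
centers, labels = kmeans(subs, k=3)
print(centers.shape, np.bincount(labels, minlength=3))
```

The paper linked upthread argues that centers obtained from overlapping windows can be largely independent of the input. A reasonable check is therefore to repeat the run on a random walk, e.g. `np.cumsum(rng.normal(size=24 * 30))`, and compare the two sets of centers before reading them as diurnal patterns.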

