
Show HN: Matrixprofile-ts – A Python library for timeseries motifs and discords - gdpq11
https://github.com/target/matrixprofile-ts
======
cwal37
This blog post[1] from the developer helped me to understand this better. It's
linked on the GitHub page, but I wanted to provide the link for more direct
clarity.

 _Astonishingly, we can process 20 years’ worth of data, sampled every five
minutes, in less than 20 seconds._

That sounds quite promising.

[1] [https://tech.target.com/2018/12/11/matrix-profile.html](https://tech.target.com/2018/12/11/matrix-profile.html)

------
kiv6
Thanks for publishing!

I'd love to hear more details about how this is used in production. For
example, if you have an anomaly that occurs only twice in a long dataset, the
two anomalies would match each other and would have a low matrix profile value
and be considered equally normal as a pattern that recurs thousands of times,
correct?

I would normally think of an anomaly as a point in a low density region of
space, but Matrix Profile seems to have a more strict definition as a point
that has a large distance to its nearest neighbour - is that fair?

I'm also interested in your process for setting the parameter of the subquery
length. Do you have to already know something about the expected length of an
anomaly/motif, or do you sweep over multiple values?

How does this tie into alerting? Do you set a threshold on the matrix profile
value that would fire an automated alert? Or is this used more as an offline
tool to explore the dataset?

Minor nitpick: on the Target blog post, Prof. Keogh's name is spelled wrong
(as Keough)

~~~
gdpq11
The nice thing about how the Matrix Profile is built is that you can slice up
different regions of time to focus on your use case. To build the MP you start
with an NxN matrix that lists the distance between every pair of points (or
technically N-m+1 x N-m+1, where m is the subsequence length), then find the
closest distance in each row. However, we've found that first "updating" the
NxN matrix allows you to do analyses like your two-anomaly example.

In that case, you'd create a parameter "w" that specifies the boundary between
when two matching points are a pattern and when enough time has elapsed that
they should be considered two anomalies. In the NxN matrix, for the ith row
you'd then set every value outside the columns i-w through i+w to infinity.
That way, the resulting Matrix Profile would account for your situation.
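
A minimal sketch of the banding idea described above, in plain Python. It is not the matrixprofile-ts implementation: the function name `banded_matrix_profile`, the `exclusion` argument, and the use of plain (not z-normalized) Euclidean distance are assumptions made to keep the example short, and the brute-force loop is only practical for small inputs.

```python
import math

def banded_matrix_profile(ts, m, w, exclusion=None):
    """Brute-force profile where row i may only match columns j with |i - j| <= w.

    Columns outside the band are skipped, which has the same effect as
    setting those entries of the distance matrix to infinity.
    """
    if exclusion is None:
        exclusion = m // 2  # assumed default: skip near-identical (trivial) matches
    n = len(ts) - m + 1
    profile = [math.inf] * n
    for i in range(n):
        for j in range(n):
            gap = abs(i - j)
            if gap <= exclusion or gap > w:
                continue  # trivial match, or outside the i-w .. i+w band
            d = math.dist(ts[i:i + m], ts[j:j + m])
            if d < profile[i]:
                profile[i] = d
    return profile

# Hypothetical data: a sawtooth with two identical anomalies 50 points apart.
series = [k % 4 for k in range(80)]
series[10:14] = [9, 9, 9, 9]
series[60:64] = [9, 9, 9, 9]

unbanded = banded_matrix_profile(series, m=4, w=len(series))
banded = banded_matrix_profile(series, m=4, w=20)
print(unbanded[10], banded[10])
```

With `w` covering the whole series, the two planted anomalies match each other and the profile value at index 10 drops to zero. With `w=20` they fall outside each other's band, so the value stays high.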

Due to the algorithm's speed we do often sweep over multiple values, but try
to use domain knowledge where we can. And for alerting, we sometimes have
labeled data that we can calibrate the threshold to, but oftentimes that's a
matter of customer trial and error.

~~~
jamesb93
You say that the algorithm is fast, and the literature certainly points to
this too, but I tried the Python implementation (linked here) on some audio
data sets. One second of audio at a reasonable quality is 44.1k data points,
and it was taking minutes to process this data.

I tried an R implementation which was multi-threaded and a lot faster, but
still the algorithm took ages to test lots of different window sizes and data
sets.

~~~
gdpq11
That's odd, we've definitely processed larger datasets much quicker than that.
Feel free to raise an issue on Github and we can take a look.

~~~
jamesb93
Just to advance my previous comment: right now the most interesting feature to
me is the motif detection; however, the motifs are always incredibly short
(or a fixed size?). Please excuse my ignorance in this regard, I'm not too
versed in the algorithm. Are there any use cases where you have looked at
longer motifs?

~~~
gdpq11
There's some interesting work around multi-month seasonality that covers what
you're talking about, but Eamonn Keogh would probably have a better answer
than I :). Also, with regard to your performance issue, did you use the pip
version or the code directly from GitHub?

~~~
jamesb93
I used the pip version.

------
jmmcd
See also the tutorial slides available here
[https://www.cs.ucr.edu/~eamonn/MatrixProfile.html](https://www.cs.ucr.edu/~eamonn/MatrixProfile.html)

------
boltzmannbrain
Interested folks may also want to check out the Python library nupic for
streaming time-series analytics [1] and anomaly detection [2].

[1] [https://github.com/numenta/nupic](https://github.com/numenta/nupic)

[2]
[https://www.sciencedirect.com/science/article/pii/S092523121...](https://www.sciencedirect.com/science/article/pii/S0925231217309864?via%3Dihub)

------
marmaduke
It seems like the method assumes stationarity; can anyone comment on how this
might be useful if one has a brown spectrum?

It also seems like it requires some normalization of the data; this should be
counted as an effective parameter of the method.

In any case, it’d be useful to fit a GLM with these profiles against bug or
outage reports, log rates, etc.

------
eamonnkeogh
It is possible that two occurrences of the same motif can overlap. And it is
possible that two different motifs can overlap. Let's see both cases, in string
analogs. We will start with the second case, using an example from John
Cleese…

“…itself…and hence the very meaning of life itselfish bastard, I'll kick him…
selfish…” Here there is a motif “itself” and there is a motif “selfish”. Note
that one occurrence of each motif appears overlapping in “itselfish”.

Now for the first case: “….soihsehihrhewCOMICOMICireoqiwwherhqwe…”

Here we have a motif “COMIC”, but its two occurrences share a letter, the
central ‘C’. We can allow motifs to share more letters, but they cannot share
ALL letters; that would be a trivial match.

The matrix profile has a simple parameter (the exclusion zone) that lets you
control how much overlap you want to allow.
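
A small sketch of the string analogy above, in plain Python. The function `closest_pair`, the use of Hamming distance, and the convention that `exclusion` is the largest start-offset that is still forbidden are all assumptions for illustration; the library's own exclusion-zone parameter may be defined differently.

```python
def closest_pair(text, m, exclusion):
    """Return (distance, i, j) for the closest pair of length-m substrings
    whose start positions differ by more than `exclusion`."""
    n = len(text) - m + 1
    best = (m + 1, -1, -1)  # larger than any possible Hamming distance
    for i in range(n):
        for j in range(i + exclusion + 1, n):
            # Hamming distance: number of positions where the letters differ
            d = sum(a != b for a, b in zip(text[i:i + m], text[j:j + m]))
            if d < best[0]:
                best = (d, i, j)
    return best

text = "soihsehihrhewCOMICOMICireoqiwwherhqwe"

# Offsets of 4 or more allowed: occurrences may share one letter.
allow_one_shared = closest_pair(text, m=5, exclusion=3)

# Offsets of 5 or more required: occurrences may not share any letter.
no_shared = closest_pair(text, m=5, exclusion=4)

print(allow_one_shared, no_shared)
```

With the smaller exclusion zone the two overlapping “COMIC” occurrences are found as an exact match. With the larger one they are ruled out, and the best remaining pair is no longer exact.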

~~~
HMH
Assuming this to be the reply to my question:

I probably was a bit imprecise, but what I want to know is if there is a way to
apply this to data that are possibly in a superposition and overlapping,
meaning that you only see the sum of the events. For example, if one wants to
analyze a changing electric or magnetic field.

Nevertheless, the points you mentioned are something I did not think about at
first, interesting once again.

------
Topolomancer
Thanks for making this available! As someone who also has some TS analysis to
do, I appreciate the fact that the code is available and in Python!

I am curious: are you affiliated with the UCR people? What's your opinion on
Keogh's claims of the matrix profile making many TS problems easy or trivial?

~~~
gdpq11
I'm not affiliated with UCR, though I am a product of the UC system :)

I agree with Keogh that Matrix Profile can help solve a very wide range of
problems, but you usually have to go a little bit deeper than just calculating
the topline Matrix Profile. A good example of this is that if you calculate
the Matrix Profile for something with daily seasonality (say, in-store retail
sales), you'll see the same daily pattern in the Matrix Profile. The
straightforward fix for this is to normalize by time window (say, only compare
the Matrix Profile at the same time each day).
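
One possible reading of "normalize by time window", sketched in plain Python: score each Matrix Profile value only against the values observed at the same position in the seasonal cycle. The function name `phase_zscores`, the `period` argument, and the choice of a z-score are assumptions for illustration, not something taken from matrixprofile-ts.

```python
import statistics

def phase_zscores(profile, period):
    """Z-score each value against the other values at the same phase of the cycle."""
    groups = {}
    for i, value in enumerate(profile):
        groups.setdefault(i % period, []).append(value)

    # Mean and population standard deviation for each phase of the cycle
    stats = {
        phase: (statistics.mean(values), statistics.pstdev(values))
        for phase, values in groups.items()
    }

    scores = []
    for i, value in enumerate(profile):
        mean, std = stats[i % period]
        scores.append(0.0 if std == 0 else (value - mean) / std)
    return scores

# Hypothetical profile with a period of 2: the odd positions are always
# higher, so only the final 9.0 is unusual for its phase.
profile = [1.0, 5.0, 1.0, 5.0, 1.0, 5.0, 1.0, 9.0]
scores = phase_zscores(profile, period=2)
print(scores)
```

In this toy input the recurring high values at odd positions are no longer flagged on their own; only the last value stands out relative to its own phase.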

------
bra-ket
Good tutorial:
[https://www.cs.ucr.edu/~eamonn/Matrix_Profile_Tutorial_Part1...](https://www.cs.ucr.edu/~eamonn/Matrix_Profile_Tutorial_Part1.pdf)

------
thenaturalist
Thanks for making this available.

Slight beginner question pertaining to the anomaly detection with STAMPI
example: How exactly do the graphs showcase a "detection" by the Matrix
Profile?

While the signal graph is clearly out of bounds (100% above last upper bound),
the relative Matrix Profile's "spike in value" fits perfectly within the bounds
of that graph.

~~~
gdpq11
Yeah, this is actually a good example of why it's important to add a bit to
the raw Matrix Profile. The point is anomalous with respect to the pattern
preceding it (the "sawtooth"), so in this case one needs to consider the whole
Matrix Profile. It's a good callout in that the graph isn't a complete anomaly
detection system; it more demonstrates how a single anomalous point can impact
the Matrix Profile value.

~~~
thenaturalist
Great, thanks for the context!

------
slamstacken
Can this be used for multivariate time series? If so, do you have an example?

~~~
aouyang2
Would this be along those lines?
[https://www.cs.ucr.edu/%7Eeamonn/Motif_Discovery_ICDM.pdf](https://www.cs.ucr.edu/%7Eeamonn/Motif_Discovery_ICDM.pdf)

~~~
gdpq11
Exactly

------
HMH
Thanks, very interesting!

So what about superpositions of events, i.e. two motifs overlapping? Does
anyone have any thoughts on that?

I guess this just means a lot more sweeping of observed patterns over the
timeseries and thus being somewhat slow.

------
filleokus
Should the URL maybe be changed to
[https://github.com/target/matrixprofile-ts](https://github.com/target/matrixprofile-ts)?
I'm interested in the code, not in the "stargazers".

~~~
gdpq11
Shoot, sorry about that! Definitely a typo.

~~~
gota
Also, it seems you've misspelled Eamonn Keogh's name and got 'Keough' instead.

~~~
gdpq11
Thanks for the heads up! We've made the correction.

