
Show HN: Midas, a Streaming Anomaly Detector. Now Implemented in Go - steve0hh
https://github.com/steve0hh/midas
======
MauranKilom
Neither title, comments here nor Github repo indicated to an outsider like me
what this is or what field it's even from. The abstract of the linked paper
helps though:

"Given a stream of graph edges from a dynamic graph, how can we assign anomaly
scores to edges in an online manner, for the purpose of detecting unusual
behavior, using constant time and memory? Existing approaches aim to detect
individually surprising edges. In this work, we propose MIDAS, which focuses
on detecting microcluster anomalies, or suddenly arriving groups of
suspiciously similar edges, such as lockstep behavior, including denial of
service attacks in network traffic data. MIDAS has the following properties:
(a) it detects microcluster anomalies while providing theoretical guarantees
about its false positive probability; (b) it is online, thus processing each
edge in constant time and constant memory, and also processes the data 162−644
times faster than state-of-the-art approaches; (c) it provides 42%-48% higher
accuracy (in terms of AUC) than state-of-the-art approaches."

~~~
specialist
Thanks. I'd love some ELI5s on this.

My last gig had some newly minted CS graduates who proposed complimenting our
monitoring with some anomaly detection.

Having done a bit of both data mining and optimization, a lifetime ago, I
could kinda follow the big data and machine learning stuff the kids were
doing.

Whereas anomaly detection and fault prediction seem magical to me.

~~~
siddhartb_
There is a recorded presentation of the paper at
[https://youtu.be/Bd4PyLCHrto](https://youtu.be/Bd4PyLCHrto) The first 5-10
minutes or so should be quite explanatory. Please feel free to let me know if
you have any specific doubts. Thanks.

------
dabei
What are some interesting applications of edge anomaly detection?

~~~
siddhartb_
Detecting Intrusions, Denial of Service (DoS) attacks, Distributed Denial of
Service (DDoS) attacks. It can also be used to detect fake profiles in Social
Networks like Twitter, Facebook, Amazon reviews, and Financial Frauds.
Basically any suspicious similar group of edges in time-evolving/dynamic
graphs.

------
siddhartb_
Code is quite neat! What are the changes it will need for including fit and
predict API?

~~~
steve0hh
I've just added the fit/predict api for Midas. Will be doing MidasR next when
I've time. :)

The change needed is basically to create a MirasR struct and re-purpose the
MidasR function into fit and predict.

------
pmfgpmfg
it's not clear to me how this API supports streaming edges. Looks like it
takes lists of sources and destinations and loads them into memory.

~~~
steve0hh
Thanks for your comments. I'll be updating the readme to better reflect it..

I've just added the midas struct, in which you can do the following:

```

m := midas.NewMidasModel(2, 769, 9460)

m.FitPredict(2,3,1) // inputs will be your src,dst and time (from your stream)

m.FitPredict(2,3,1) // same here

```

The MidasR model is next. :)

~~~
floren
Could you go into the selection criteria for those 3 numbers? I see 2 and 769
on the original MIDAS repo as default number of hash functions and buckets,
but without explanation of why they're chosen. The third argument, "m", I
don't see explained anywhere.

~~~
steve0hh
I adapted the code from the original paper from
[https://github.com/bhatiasiddharth/MIDAS/](https://github.com/bhatiasiddharth/MIDAS/)
.

And from one of the
issues([https://github.com/bhatiasiddharth/MIDAS/issues/7#issuecomme...](https://github.com/bhatiasiddharth/MIDAS/issues/7#issuecomment-597185695)),
M can be any reasonable value because it's just used to compute the edge hash
([https://github.com/steve0hh/midas/blob/master/edgehash.go#L4...](https://github.com/steve0hh/midas/blob/master/edgehash.go#L42)).

Hope it clarifies.

