
Show HN: Fast Anomaly Detection in Graphs [pdf] - siddhartb_
https://www.comp.nus.edu.sg/~sbhatia/assets/pdf/midas.pdf
======
dmos62
Git repo
[https://github.com/bhatiasiddharth/MIDAS](https://github.com/bhatiasiddharth/MIDAS)

> Up to 48% more accurate and 644 times faster than the state of the art
> approaches

~~~
siddhartb_
Thanks. We give theoretical guarantees on the False Positive Probability which
can be useful to decide the parameters. Some use cases of the project include
detecting: 1\. Intrusions 2\. Fake Ratings 3\. Financial Fraud

------
lmeyerov
If anyone would be interested in trying to apply these techniques to our COVID
behavior change & anti-misinformation effort ProjectDomino.org, we'd be happy
to share data - this may be quite helpful! Just jump into the Slack (open
invite) and we can start getting you situated.

~~~
siddhartb_
Sounds interesting. Can you elaborate on what the data is like?

------
ronittaleti
How do you think this can help change the landscape of security, judging by
the fast speeds, as low as .13s on DARPA, I imagine it will help block larger
numbers of suspicious activities.

------
debangsha
As a researcher in anomaly detection myself, I found the premise of the paper
very intriguing.

------
fudged71
Gephi has a realtime stream importer for Twitter. Would it be possible for
this tool to be a Gephi plugin that could be used in realtime on the same
graph?

~~~
siddhartb_
Definitely. It will need only very small changes to the code. I would love to
add it as a plugin. Can you point to some resources that can help in
incorporating MIDAS into Gephi.

~~~
fudged71
Awesome! It seems like this is the best place to start:
[https://github.com/gephi/gephi-plugins](https://github.com/gephi/gephi-
plugins)

------
nimish_17
This will propel important research into anomaly detection using dynamic
graphs. Existing static graph methods have huge flaws; this would fix some of
them

------
shivin9
Can you share the implementation?

~~~
siddhartb_
Code and Datasets we used are available at
[https://github.com/bhatiasiddharth/MIDAS](https://github.com/bhatiasiddharth/MIDAS)

~~~
rathel
Nice! I like there's a ready-to-use command line utility.

I think it'd benefit from a refactor to actually allow real-time streaming
from stdin.

Do you have any hints how to choose timestamp units and how that affects
parameters I should choose?

~~~
siddhartb_
Nice suggestion. Will definitely try to refactor. Thanks!

In most of the cases, timestamps should be with the data itself (assuming its
a dynamic graph). If timestamps are to be chosen, one can select in a way
seeing how many edges usually come in one time tick (second/minute etc.)

Timestamps don't affect any parameters other than alpha (temporal decay
factor). You may want to check out how to decay the contribution of the past
edges in the anomalousness of the current edge. If there is lot of granularity
in the timestamps, a smaller alpha should be chosen. Hope it helps.

~~~
rathel
Thank you for the explanation.

I'm looking forward to M-Stream for multi-dimensional data - but I have one
question for that. Is there some preferred approach for selecting features in
multi-dimensional anomaly detection?

Because I wonder if given enough dimensions, everything would be anomalous.
Kind of like p-hacking works (at p=0.05 one of twenty hypotheses is falsely
accepted just by sheer luck).

~~~
siddhartb_
Interesting question. With an increase in dimensions, we consider the
correlation between the features in addition to considering them individually.
The work is currently under review. Feel free to get in touch and I can update
you once we release the MStream work.

------
RocketSyntax
How are anomalous edges (triplet pairings) different from anomalous
communities?

~~~
siddhartb_
We detect suddenly appearing bursts of activity which share many repeated
nodes or edges, which we refer to as microclusters. E.g. denial of service
(DoS) attacks in network traffic data and lockstep behavior.

Also, we detect scenarios where an individual edge may not be anomalous but
along with other edges it acts as an anomalous community. For example, in the
animation at
[https://github.com/bhatiasiddharth/MIDAS/](https://github.com/bhatiasiddharth/MIDAS/)
it may be possible that an individual edge is not anomalous but together the
three malicious entities do a coordinated DoS attack.

------
gbasin
This is very cool! I wonder where else this can be applied...

~~~
siddhartb_
Thanks, MIDAS can be used to detect intrusions, fake ratings, frauds.
Basically finding anomalous and suspicious behavior in a dynamic (time-
evolving) graph.

We have also extended MIDAS to detect group anomalies in higher-dimensional
records e.g. event-log data or multi-attributed graphs. We will release it
soon.

------
shera
Will it work if anomalies are more in number than normal?

~~~
siddhartb_
We assume (like any anomaly detection algorithm) that the majority is normal
sample. In your context, the normal samples will be considered as outliers and
therefore caught by the algorithm. One way to mitigate this is to either swap
the labels. Another way is to sample a subset of the anomalies and then try.

~~~
shera
Thank you. Is there a Java implementation available?

~~~
siddhartb_
Currently MIDAS is available in Rust, Python, Ruby and R at
[https://github.com/bhatiasiddharth/MIDAS](https://github.com/bhatiasiddharth/MIDAS).
If someone is interested to convert MIDAS to other languages, please feel free
to do so and let me know so that I can add a link in the repository.

------
eclee
Great job. Thanks for sharing.

------
srivastavag
amazing work

------
kk58
Excellent

------
kk58
Excei

