Show HN: Fast Anomaly Detection in Graphs [pdf]

dmos62 · on April 7, 2020

Git repo https://github.com/bhatiasiddharth/MIDAS

> Up to 48% more accurate and 644 times faster than the state of the art approaches

siddhartb_ · on April 7, 2020

Thanks. We give theoretical guarantees on the false positive probability, which can be useful when choosing the parameters. Some use cases of the project include detecting: 1. Intrusions 2. Fake Ratings 3. Financial Fraud

lmeyerov · on April 7, 2020

If anyone would be interested in trying to apply these techniques to our COVID behavior change & anti-misinformation effort ProjectDomino.org, we'd be happy to share data - this may be quite helpful! Just jump into the Slack (open invite) and we can start getting you situated.

siddhartb_ · on April 7, 2020

Sounds interesting. Can you elaborate on what the data is like?

MintChocoisEw · on April 8, 2020

I'm currently curating an article about the best COVID datasets and resources, so I'd be interested in this.

ronittaleti · on April 11, 2020

How do you think this can help change the landscape of security? Judging by the fast speeds, as low as 0.13s on DARPA, I imagine it will help block larger numbers of suspicious activities.

debangsha · on April 7, 2020

As a researcher in anomaly detection myself, I found the premise of the paper very intriguing.

fudged71 · on April 7, 2020

Gephi has a realtime stream importer for Twitter. Would it be possible for this tool to be a Gephi plugin that could be used in realtime on the same graph?

siddhartb_ · on April 7, 2020

Definitely. It will need only very small changes to the code. I would love to add it as a plugin. Can you point to some resources that can help in incorporating MIDAS into Gephi?

fudged71 · on April 8, 2020

Awesome! It seems like this is the best place to start: https://github.com/gephi/gephi-plugins

nimish_17 · on April 7, 2020

This will propel important research into anomaly detection using dynamic graphs. Existing static graph methods have huge flaws; this would fix some of them.

shivin9 · on April 7, 2020

Can you share the implementation?

siddhartb_ · on April 7, 2020

Code and Datasets we used are available at https://github.com/bhatiasiddharth/MIDAS

rathel · on April 7, 2020

Nice! I like that there's a ready-to-use command line utility.

I think it'd benefit from a refactor to actually allow real-time streaming from stdin.

Do you have any hints on how to choose timestamp units, and how that choice affects the parameters I should pick?

siddhartb_ · on April 7, 2020

Nice suggestion. Will definitely try to refactor. Thanks!

In most cases, timestamps come with the data itself (assuming it's a dynamic graph). If you have to choose the timestamps yourself, pick a unit based on how many edges usually arrive in one time tick (second, minute, etc.).
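One way to look at "how many edges usually come in one time tick" is to bucket the raw arrival times at a few candidate tick widths and compare the counts. This is only a rough sketch: the `edges_per_tick` helper and the sample arrival times are made up for illustration and are not part of MIDAS.

```python
from collections import Counter
from statistics import median


def edges_per_tick(timestamps, tick_seconds):
    """Count how many edges fall into each tick of width `tick_seconds`.

    `timestamps` are raw arrival times in seconds (hypothetical input format).
    """
    if tick_seconds <= 0:
        raise ValueError("tick_seconds must be positive")
    return Counter(int(ts // tick_seconds) for ts in timestamps)


# Made-up edge arrival times, in seconds since the start of the stream.
arrivals = [0, 10, 59, 60, 125, 126, 127, 300]

# Compare a few candidate tick widths (1 second, 1 minute, 5 minutes).
for width in (1, 60, 300):
    counts = edges_per_tick(arrivals, width)
    print(
        f"tick={width}s",
        f"ticks_with_edges={len(counts)}",
        f"median_edges_per_tick={median(counts.values())}",
    )
```

If almost every tick holds a single edge, the unit is probably finer than the data needs; if one tick swallows most of the stream, it is probably too coarse.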

Timestamps don't affect any parameters other than alpha (the temporal decay factor). You may want to look at how the contribution of past edges to the anomalousness of the current edge is decayed. If there is a lot of granularity in the timestamps, a smaller alpha should be chosen. Hope it helps.
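To get a feel for what alpha does, here is a small sketch of geometric decay. It assumes (my assumption, not something stated in this thread) that a past edge's contribution is multiplied by alpha once per time tick, so after k ticks a fraction alpha^k of it is left. The function name `remaining_weight` is hypothetical.

```python
def remaining_weight(alpha: float, ticks: int) -> float:
    """Fraction of an edge's original contribution left after `ticks` ticks.

    Assumes the contribution is multiplied by `alpha` once per tick
    (an illustrative assumption about how the decay is applied).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if ticks < 0:
        raise ValueError("ticks must be non-negative")
    return alpha ** ticks


# Show how quickly past edges fade for a few candidate alpha values.
for alpha in (0.9, 0.5, 0.1):
    weights = [round(remaining_weight(alpha, k), 4) for k in range(5)]
    print(f"alpha={alpha}: {weights}")
```

Under this assumption, the same alpha forgets the past faster in wall-clock terms when ticks are finer, since more ticks elapse in the same period.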

rathel · on April 7, 2020

Thank you for the explanation.

I'm looking forward to M-Stream for multi-dimensional data - but I have one question for that. Is there some preferred approach for selecting features in multi-dimensional anomaly detection?

Because I wonder if given enough dimensions, everything would be anomalous. Kind of like p-hacking works (at p=0.05 one of twenty hypotheses is falsely accepted just by sheer luck).
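The arithmetic behind that worry, as a quick sketch: if each of n dimensions is tested separately at level 0.05 and the tests are independent (a simplifying assumption), the expected number of false alarms is n × 0.05 and the chance of at least one is 1 − 0.95^n. The function names here are made up for illustration.

```python
def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of false alarms across `n_tests` tests at level `alpha`."""
    return alpha * n_tests


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false alarm, assuming independent tests."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** n_tests


# How the chance of at least one spurious "anomaly" grows with dimensions.
for n in (1, 5, 20, 100):
    print(
        f"n={n}",
        f"expected={expected_false_positives(0.05, n):.2f}",
        f"at_least_one={familywise_error_rate(0.05, n):.3f}",
    )
```

With 20 independent tests the expected count is one false alarm, which matches the "one of twenty" intuition.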

siddhartb_ · on April 7, 2020

Interesting question. With an increase in dimensions, we consider the correlation between the features in addition to considering them individually. The work is currently under review. Feel free to get in touch and I can update you once we release the MStream work.

RocketSyntax · on April 7, 2020

How are anomalous edges (triplet pairings) different from anomalous communities?

siddhartb_ · on April 7, 2020

We detect suddenly appearing bursts of activity which share many repeated nodes or edges, which we refer to as microclusters. E.g. denial of service (DoS) attacks in network traffic data and lockstep behavior.

Also, we detect scenarios where an individual edge may not be anomalous but along with other edges it acts as an anomalous community. For example, in the animation at https://github.com/bhatiasiddharth/MIDAS/ it may be possible that an individual edge is not anomalous but together the three malicious entities do a coordinated DoS attack.
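A toy illustration of the "sudden burst" idea, for readers who want something concrete. This is a simplified sketch, not the code in the repository: it keeps exact per-edge counts in dictionaries and scores each arriving edge with a chi-squared-style comparison between its count in the current tick and its average count per tick so far. The `BurstScorer` class and the node names are hypothetical.

```python
from collections import defaultdict


class BurstScorer:
    """Toy per-edge burst detector using exact counts (illustration only)."""

    def __init__(self):
        self.total = defaultdict(int)    # edge -> count over all ticks so far
        self.current = defaultdict(int)  # edge -> count in the current tick
        self.tick = None

    def score(self, src, dst, tick):
        """Record one edge arrival at `tick` (1-based, non-decreasing) and score it."""
        if tick < 1:
            raise ValueError("ticks start at 1")
        if self.tick is not None and tick < self.tick:
            raise ValueError("ticks must be non-decreasing")
        if tick != self.tick:
            self.current.clear()  # a new tick starts: reset current-tick counts
            self.tick = tick
        edge = (src, dst)
        self.current[edge] += 1
        self.total[edge] += 1
        a, s, t = self.current[edge], self.total[edge], tick
        if t == 1:
            return 0.0  # no history to compare against yet
        # Chi-squared-style gap between the current-tick count and the mean per tick.
        return (a - s / t) ** 2 * t * t / (s * (t - 1))


scorer = BurstScorer()
# Steady traffic: one edge per tick for nine ticks.
steady = [scorer.score("A", "B", t) for t in range(1, 10)]
# Burst: the same edge arrives twenty times within tick 10.
burst = [scorer.score("A", "B", 10) for _ in range(20)]
print("max steady score:", max(steady))
print("last burst score:", round(burst[-1], 2))
```

The steady edge keeps a score of zero because its count in each tick equals its historical mean, while the score climbs as the burst repeats within a single tick.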

gbasin · on April 7, 2020

This is very cool! I wonder where else this can be applied...

siddhartb_ · on April 7, 2020

Thanks. MIDAS can be used to detect intrusions, fake ratings, and fraud: basically, finding anomalous and suspicious behavior in a dynamic (time-evolving) graph.

We have also extended MIDAS to detect group anomalies in higher-dimensional records e.g. event-log data or multi-attributed graphs. We will release it soon.

shera · on April 7, 2020

Will it work if anomalies are more in number than normal?

siddhartb_ · on April 7, 2020

We assume (like any anomaly detection algorithm) that the majority of samples are normal. In your context, the normal samples will be considered outliers and therefore caught by the algorithm. One way to mitigate this is to swap the labels. Another is to sample a subset of the anomalies and then try.

shera · on April 7, 2020

Thank you. Is there a Java implementation available?

siddhartb_ · on April 7, 2020

Currently MIDAS is available in Rust, Python, Ruby and R at https://github.com/bhatiasiddharth/MIDAS. If anyone is interested in porting MIDAS to other languages, please feel free to do so and let me know so that I can add a link in the repository.

eclee · on April 9, 2020

Great job. Thanks for sharing.

srivastavag · on April 7, 2020

amazing work

kk58 · on April 7, 2020

Excellent
