> Up to 48% more accurate and 644 times faster than the state of the art approaches
I think it'd benefit from a refactor to actually allow real-time streaming from stdin.
Do you have any hints on how to choose timestamp units, and how that choice affects the parameters I should pick?
In most cases, the timestamps come with the data itself (assuming it's a dynamic graph). If you do have to choose them, pick a unit by looking at how many edges usually arrive in one time tick (second, minute, etc.).
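One way to eyeball that is to bucket the raw timestamps at a few candidate tick sizes and compare how many edges land in a typical tick. This is only a rough sketch: `edges_per_tick`, `summarize`, and the candidate tick sizes are made-up names and values for illustration, not part of MIDAS, and it assumes the raw timestamps are in seconds.

```python
from collections import Counter
from statistics import median


def edges_per_tick(timestamps, tick_seconds):
    """Count how many edges fall into each tick of the given width.

    Assumes `timestamps` holds one raw timestamp (in seconds) per edge.
    """
    return Counter(int(t // tick_seconds) for t in timestamps)


def summarize(timestamps, candidate_ticks=(1, 60, 3600)):
    """Median number of edges per non-empty tick, for each candidate tick size."""
    return {
        tick: median(edges_per_tick(timestamps, tick).values())
        for tick in candidate_ticks
    }


if __name__ == "__main__":
    # Hypothetical edge arrival times, in seconds.
    sample = [0, 0.5, 1, 61, 62, 3599]
    for tick, edges in summarize(sample).items():
        print(f"tick = {tick:>4}s: median edges per tick = {edges}")
```

A tick size where most ticks hold only a single edge, or where nearly all edges collapse into one tick, is probably a poor fit for the data.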
Timestamps don't affect any parameter other than alpha (the temporal decay factor). You may want to look at how the contribution of past edges to the anomalousness of the current edge is decayed. If there is a lot of granularity in the timestamps, a smaller alpha should be chosen. Hope it helps.
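To make the role of alpha concrete, here is a toy sketch of temporal decay on per-edge counts. It is not the MIDAS implementation: it uses a plain dict instead of a sketch data structure, the class name `DecayedEdgeCounter` is invented for illustration, and applying the decay once per elapsed tick is an assumption.

```python
class DecayedEdgeCounter:
    """Toy per-edge counter whose past counts fade by `alpha` per elapsed tick.

    Illustrative only: a dict stands in for a real sketch structure, and
    decaying once per elapsed tick is an assumption of this example.
    """

    def __init__(self, alpha):
        if not 0.0 <= alpha <= 1.0:
            raise ValueError("alpha must be in [0, 1]")
        self.alpha = alpha
        self.counts = {}
        self.current_tick = None

    def add(self, src, dst, tick):
        """Record one edge at `tick` and return its decayed count so far."""
        if self.current_tick is None:
            self.current_tick = tick
        elif tick > self.current_tick:
            # Time moved forward: shrink every stored count before adding.
            factor = self.alpha ** (tick - self.current_tick)
            for key in self.counts:
                self.counts[key] *= factor
            self.current_tick = tick
        key = (src, dst)
        self.counts[key] = self.counts.get(key, 0.0) + 1.0
        return self.counts[key]


if __name__ == "__main__":
    counter = DecayedEdgeCounter(alpha=0.5)
    for tick in (1, 1, 2, 4):
        print(tick, counter.add("a", "b", tick))
```

With alpha close to 1 the past edges keep almost their full weight; with alpha close to 0 they are forgotten almost immediately when the tick changes.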
I'm looking forward to M-Stream for multi-dimensional data - but I have one question for that. Is there some preferred approach for selecting features in multi-dimensional anomaly detection?
Because I wonder whether, given enough dimensions, everything would look anomalous. Kind of like how p-hacking works (at p=0.05, roughly one in twenty hypotheses with no real effect is falsely accepted by sheer luck).
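The arithmetic behind that worry is easy to check. The sketch below assumes the tests (or dimensions) are independent, which real features usually are not, and the function name is made up for illustration. It says nothing about how M-Stream handles this.

```python
def prob_any_false_positive(num_tests, alpha=0.05):
    """Chance of at least one false positive across `num_tests` independent
    tests, each run at significance level `alpha`, when no real effect exists.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    for k in (1, 5, 20, 100):
        print(f"{k:>3} tests: {prob_any_false_positive(k):.3f}")
```

At 20 independent tests the chance of at least one spurious hit is already well above one half, which is why the number of dimensions matters.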
Also, we detect scenarios where an individual edge may not be anomalous but along with other edges it acts as an anomalous community. For example, in the animation at https://github.com/bhatiasiddharth/MIDAS/ it may be possible that an individual edge is not anomalous but together the three malicious entities do a coordinated DoS attack.
We have also extended MIDAS to detect group anomalies in higher-dimensional records, e.g. event-log data or multi-attributed graphs. We will release it soon.