
This is very interesting, and it touches on a frustration I have with New Relic and the other alerting services we use. New Relic uses a 5-minute rolling average for error rates and alerts when that average goes above some threshold. That means it takes ~5 minutes from a spike occurring to an alert being created, even if the error rate has jumped to 50%.

It would be much better if it did this sort of outlier detection: a gradual increase in error rate to 3% should not trigger a critical alert, whereas a big jump in error rates should trigger one quickly.
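
To make the idea concrete, here is a rough sketch of the kind of check I mean. It is not how New Relic or any other service works; the class name, the window size, and the `k` and `min_jump` defaults are all made up for illustration. It compares each new error-rate sample against the median of recent samples, using the median absolute deviation (MAD) as the measure of normal variation, so a slow drift moves the baseline along with it while a sudden jump stands out on the first sample.

```python
from collections import deque
from statistics import median


class SpikeDetector:
    """Flags a sample that jumps far above the recent baseline.

    Hypothetical sketch: names and default values are assumptions.
    """

    def __init__(self, window=30, k=6.0, min_jump=0.05):
        self.history = deque(maxlen=window)  # recent per-interval error rates
        self.k = k                # MADs above the median that count as a spike
        self.min_jump = min_jump  # absolute floor so small wobbles never alert

    def observe(self, rate):
        """Return True if `rate` is an outlier relative to the history so far."""
        is_spike = False
        if len(self.history) >= 5:  # wait for a minimal baseline
            base = median(self.history)
            mad = median(abs(x - base) for x in self.history)
            threshold = base + max(self.k * mad, self.min_jump)
            is_spike = rate > threshold
        if not is_spike:
            # Keep spikes out of the baseline so they don't become "normal".
            self.history.append(rate)
        return is_spike


# A slow climb from 0.5% to 3% never alerts.
slow = SpikeDetector()
gradual_alerts = [slow.observe(0.005 + i * 0.001) for i in range(26)]

# A jump from 1% to 50% alerts on the first bad sample.
fast = SpikeDetector()
for _ in range(10):
    fast.observe(0.01)
jump_alert = fast.observe(0.5)
```

Because spiking samples are left out of the baseline, a sustained outage keeps alerting until the rate comes back down; a real system would need deduplication and some way to accept a new normal.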

Has anyone implemented a system like this?
