

Are We Ready to Kill Thresholds? - obfuscurity_
http://obfuscurity.com/2013/06/Are-We-Ready-to-Kill-Thresholds

======
Pewpewarrows
Forgive me if this is a dumb comment to make, as I'm just barely starting to
get into monitoring and the statistics knowledge that goes along with it, but
adaptive fault detection does tend to scare me a bit. In the event that a
problem isn't a spike, and instead gradually builds up over hours/days/weeks,
I wouldn't be confident in something picking a dynamic threshold for me. I'd
be afraid of it treating the ever-rising resource usage as normal behavior if
it rises slowly enough, and of not being alerted before it's too late (servers
becoming unresponsive).
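
To make that worry concrete, here is a minimal sketch in Python. It is not
taken from Kale or from any product mentioned in this thread. It assumes a
deliberately naive detector (rolling mean plus k standard deviations) and a
made-up usage series that climbs by 0.01 per sample. The names
adaptive_alerts, static_alerts, window, k and the limit of 90.0 are all
hypothetical.

```python
import statistics


def adaptive_alerts(series, window=60, k=3.0):
    """Indices where a point exceeds rolling mean + k * rolling stdev.

    A naive adaptive threshold, assumed here for illustration only.
    """
    alerts = []
    for i in range(window, len(series)):
        recent = series[i - window:i]  # the `window` points before i
        limit = statistics.fmean(recent) + k * statistics.pstdev(recent)
        if series[i] > limit:
            alerts.append(i)
    return alerts


def static_alerts(series, limit):
    """Indices where a point exceeds a fixed, hand-picked threshold."""
    return [i for i, value in enumerate(series) if value > limit]


# Hypothetical resource usage: starts at 20 and creeps up by 0.01 per sample.
ramp = [20.0 + 0.01 * i for i in range(10_000)]

# The rolling baseline follows the ramp, so the adaptive check stays silent.
print(adaptive_alerts(ramp))

# A fixed limit is eventually crossed and keeps firing.
print(len(static_alerts(ramp, 90.0)) > 0)

# The same adaptive check does catch a sudden spike on top of the ramp.
spiky = list(ramp)
spiky[5000] += 30.0
print(5000 in adaptive_alerts(spiky))
```

On a perfectly linear ramp, the newest point sits only about 1.76 rolling
standard deviations above the rolling mean at this window size, so any k
above that never fires no matter how high usage climbs. Detectors that also
model trend may behave differently; this sketch only covers the simple case.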

~~~
obfuscurity_
That's not at all a dumb comment. As I alluded to in the post, I think it's
important that we understand how these systems determine what is - or isn't -
an abnormality or fault. Unfortunately, that often means vendors revealing
their "secret sauce" and risking their product differentiation. It's going to
be interesting to see how these products earn our trust.

~~~
jonlives
Absolutely - this is one of the reasons that we made Kale open source, so that
people can see what we consider an anomaly, and adapt it for their own use cases
if needed. If your anomaly detection contains secret sauce, it'll be very hard
for people to have confidence in it.

