

IMVU: Detect failure with statistics - sstone
http://www.evanmiller.org/poisson.pdf

======
gjm11
So, here's what this is about.

One-sentence summary: If you're monitoring something to look for problems, you
shouldn't treat each observation independently; multiple somewhat-low or
somewhat-high observations may be a sign of trouble even if each on its own
isn't enough to worry about. In more detail:

Suppose you have some number you're monitoring. It might be network latency,
number of customer signups, temperature, fraction of your email that's spam,
whatever. You would like to be notified if it starts behaving unexpectedly --
maybe your network is down, someone just trashed your company in the media, a
fan has failed, or your spam filter has gone nuts.

There's a technique called Holt-Winters forecasting, which looks at historical
data and assumes it's made up of something constant, something periodic (e.g.,
daily variation), and noise; it generates predictions, which include a measure
of uncertainty as well as a predicted value. Then some guy called Brutlag
developed a way to compare observations with Holt-Winters predictions from the
past, and determine whether each new observation is suspect.
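In code, a minimal version of that scheme might look like this (the update equations are the standard additive Holt-Winters ones plus Brutlag's smoothed-deviation band; the initialization and parameter values are my own simplifications, not anything from the paper):

```python
# Minimal additive Holt-Winters with Brutlag confidence bands.
# A sketch only: alpha/beta/gamma are the usual smoothing parameters,
# delta scales the band width; real implementations (e.g. RRDtool's)
# differ in initialization and bookkeeping details.

def holt_winters_brutlag(series, period, alpha=0.5, beta=0.1,
                         gamma=0.1, delta=3.0):
    """Yield (prediction, lower_band, upper_band) one step ahead."""
    # Naive initialization from the first season.
    level = series[0]
    trend = 0.0
    season = [series[i] - level for i in range(period)]
    dev = [1.0] * period           # smoothed absolute deviation per phase

    out = []
    for t in range(period, len(series)):
        p = t % period
        predicted = level + trend + season[p]
        out.append((predicted,
                    predicted - delta * dev[p],
                    predicted + delta * dev[p]))

        y = series[t]
        # Brutlag: exponentially smoothed deviation, per seasonal phase.
        dev[p] = gamma * abs(y - predicted) + (1 - gamma) * dev[p]
        last_level = level
        level = alpha * (y - season[p]) + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        season[p] = gamma * (y - level) + (1 - gamma) * season[p]
    return out
```

An observation falling outside (lower_band, upper_band) is what Brutlag's method would flag as suspect.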

However, Brutlag's analysis basically treats each new measurement
independently. So, e.g., suppose you have a number that's always non-negative
(number of customer signups, say), and suppose the H-W prediction says that a
value of 0 isn't too improbable. Then Brutlag's approach will not complain
even if from some point onward _every single measurement is 0_ -- because
each one, on its own, is reasonably plausible.

Evan Miller has a more sophisticated way of looking for anomalies. Each time a
new observation comes in, he looks at the plausibility of that observation,
just like Brutlag does; but he also tries adding up the last N observations
and comparing them with expectations for the sum of N consecutive
observations, for N=2,3,...T (for some suitably chosen limit T). So if you
get, say, a lot of zeros, they may not be very implausible on their own, but
getting five zeros in a row might be enough to trigger a warning.
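A sketch of that windowed check, assuming each observation is a Poisson count with a known expected rate (the p-value threshold and window cap here are illustrative choices of mine, not Miller's actual values):

```python
import math

def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), by direct summation."""
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total

def windowed_alarm(observations, rate, max_window=10, p_threshold=1e-3):
    """Flag if the sum of the last N observations is improbably low or
    high for any N up to max_window, assuming each observation is
    Poisson(rate), so the sum of N of them is Poisson(N * rate)."""
    for n in range(1, min(max_window, len(observations)) + 1):
        s = sum(observations[-n:])
        mu = n * rate
        low_tail = poisson_cdf(s, mu)                           # P(X <= s)
        high_tail = 1.0 - poisson_cdf(s - 1, mu) if s > 0 else 1.0
        if low_tail < p_threshold or high_tail < p_threshold:
            return True, n
    return False, None
```

With rate 3, a single zero has P(X <= 0) = e^-3, about 0.05, and doesn't trip the alarm; but three zeros in a row sum to 0 against an expectation of 9, with tail probability e^-9, which does.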

Miller gives an example where IMVU caught a network problem using this
technique -- they were watching the number of customers who invited contacts
to open an account -- which wouldn't have been caught by the Brutlag method,
for exactly the reason given above: they had a run of quite-low measurements,
but none of them on its own was low enough for the Brutlag method to complain,
because Brutlag's lower confidence limit was zero.

------
noelwelsh
Caveat: I only skimmed the paper.

The combination of two things set off alarm bells: first, the problem
observed is that a continuous sequence of zero readings is erroneously treated
as an "ok". Second, the variance is modelled as a normal distribution (Section
3.1). Since the normal is symmetric, if the mean is sufficiently close to
zero, readings below zero will be within one standard deviation. You can't
ever have readings below zero in the type of systems under consideration.
is the flaw in the original work, and furthermore this assumption is carried
through to the fix (with some ad-hoc modifications [remember, I only skimmed
the paper]). It seems to me that a cleaner model would drop this assumption of
normality and use an asymmetric distribution (say, the Poisson) in its place.
I would be interested in any comments from those who read the paper in more
depth.
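A quick back-of-envelope check of the point (the mean here is invented for illustration):

```python
import math

# Illustrative numbers only: suppose the forecast says "about 3 events
# per interval". A symmetric normal band with sd = sqrt(3) puts its
# 2-sigma lower limit below zero, so a reading of 0 is never "low":
mean, sd = 3.0, math.sqrt(3.0)
lower_band = mean - 2 * sd          # negative: zero sits inside the band

# Under a Poisson(3) model the same reading is rare-ish on its own, and
# a run of k consecutive zeros has probability exp(-3k):
p_zero_once = math.exp(-3)          # roughly 0.05
p_zero_run5 = math.exp(-15)         # well under one in a million
```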

~~~
nethergoat
This is precisely the approach they take. Section 3.4 ("The Model") begins,
"The heart of the model is to treat incoming events as a Poisson process..."

------
zdw
This is somewhat tangential, but might help if other people are attempting to
do similar things in the future...

A relation of mine is a pharmacist at a hospital. At his workplace, they have
automated drug dispensing machines that can be used by employees to obtain
medication for patients - saves a lot of time and work for dispensing normal
stuff like painkillers, etc.

These machines use a statistical method to flag when an employee is pulling
out more than the usual amount of medicine, as there are infrequent cases of
employees selling/using it themselves.

The machines were programmed to use standard deviation for this - if what you
draw is within 2 standard deviations of the mean of all users, you're fine.

The problem is that, on one occasion, an employee in a small section was a
junkie and pulled out so much of a certain drug that the mean was skewed to
the point that they were still within 2 std dev of it, and it wasn't noticed
for a few months.

So, to wrap up the story, you probably don't want to purely use statistics to
test for failure. You want some basic sanity checks in there.
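A toy version of that failure mode (all numbers invented): when the anomalous draws are included in the data used to set the threshold, they inflate both the mean and the standard deviation, so the 2-sigma rule clears them. A robust statistic such as median/MAD is one standard alternative that isn't pulled by the bad points:

```python
import statistics

# Invented numbers: six normal draws of about 10 doses, plus four
# large draws of 60 by one employee over the same period.
draws = [9, 10, 11, 10, 9, 10, 60, 60, 60, 60]

mean = statistics.mean(draws)       # 29.9, dragged up by the large draws
sd = statistics.pstdev(draws)       # ~24.6, inflated the same way
# The two-sigma rule, computed over data that includes the anomalies,
# flags nothing -- even 60 is within 2 sd of the skewed mean:
flagged_sd = [x for x in draws if abs(x - mean) > 2 * sd]

# Median and median absolute deviation are barely moved by the outliers,
# so a (loosely chosen) 5-MAD cutoff catches them:
med = statistics.median(draws)      # 10.5
mad = statistics.median([abs(x - med) for x in draws])
flagged_mad = [x for x in draws if abs(x - med) > 5 * mad]
```

The robust version is still pure statistics, of course; the parent's point stands that a hard sanity limit ("no one needs this many doses per shift") belongs in there too.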

~~~
hle
"All models are wrong, but some are useful". I think that in this case the
model was just wrong... or a bit simplistic.

