
Making Sense of Standard Deviation - motxilo
http://amarsagoo.blogspot.com/2007/09/making-sense-of-standard-deviation.html
======
jasonlotito
A story of my most recent use of standard deviation.

My day-to-day job deals with credit card processing. A lot of it involves
ensuring that transactions occur securely and reliably. It's a tedious
job, it's not exciting, and involves a lot of testing, but I love the sense of
knowing that actual money is flowing through a system I built.

Anyway, one of the things I wanted to do was build an automated alert system
that would notify me of problems with processing transactions. Looked at from
high up, the system is fairly stable, and while it would be easy to notice if
transactions suddenly stopped across the entire system, this rarely, if ever,
actually happens (and hasn't happened except for planned < 1 minute outages).

However, since the system serves many small individual sites, looking at all
transactions together is fairly useless. Instead, I wanted a warning to
notify me when any specific account was suffering. Each account is different,
and accounts for a variable number of transactions each day. Some accounts do
more, some less. There are other variables: some accounts do well at different
times of the day because of where in the world they are promoted. Weekends
generally see an uptick, but this again is variable.

So, I developed a system (using standard deviation) that essentially looks at
an account's history over the past X time period, for certain windows
throughout the day. Some accounts are inspected by looking at the numbers in
the past hour (accounts with steady transactions), others over the last few
hours, and others over the whole day.

Obviously, we don't alert ourselves to certain cases that fall outside the
standard deviation, and we've adjusted the numbers to look at other areas, but
using standard deviation this way suddenly opened up a new way of looking at
our numbers and evaluating the current status of our system, as well as the
accounts using it. Even if a problem doesn't exist on our end, we can alert
the people who handle these accounts that a problem might exist, allowing them
to take the necessary action.

Understanding standard deviation, and understanding how it can be used (along
with other mathematical tools), makes for some really interesting things you
can do to improve your system as a whole.
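For concreteness, a minimal sketch of this kind of per-account check in Python
(the function name, numbers, and threshold here are all hypothetical, not the
actual system):

```python
import statistics

def should_alert(history, current, threshold=2.0):
    """Flag an account whose current transaction count falls more than
    `threshold` standard deviations below its historical mean.

    history: per-window transaction counts from this account's past X period
    current: the count for the window just completed
    """
    mean = statistics.mean(history)
    sd = statistics.stdev(history)
    # Only alert on drops; a surge in transactions is good news.
    return current < mean - threshold * sd

# A steady account doing roughly 20 transactions per hour:
history = [19, 21, 20, 22, 18, 20, 21, 19]
should_alert(history, 20)  # a normal hour, no alert
should_alert(history, 0)   # a sudden stop, alert
```

The threshold and window size would need the kind of per-account tuning the
comment describes.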

~~~
tricky
A similar system was put in place by a pharmacy system to detect employee
narcotics abuse. They'd throw an alarm when a pharmacist started dispensing
pills at a rate outside their SD.

The system worked great except in the case where a pharmacist slowly ramped up
usage as his addiction worsened. Seems like that's a feature that works well
in your case. As a business sells more stuff, their transactions will ramp up
and your model will still hold.

nice.

~~~
jasonlotito
That's one of the cases we don't alert ourselves to: getting a lot of
transactions is a good thing. =)

The reverse, however, is another issue entirely. A site that is slowly
getting less traffic won't trigger an alert. But this is where we have to
draw the line: we aren't looking for long-term trends, but for immediate
issues. Basically, if a site that normally has 20 attempted transactions in
an hour suddenly has 0, is that a problem?

There is still a lot of tweaking that needs to go into the system. We're also
exploring how to use this in other areas as well. Since all these accounts are
internal, it's easy for me to get feedback.

The end goal, however, is to be notified as soon as possible when something
occurs that will negatively affect our bottom line.

------
tel
A small but vital correction is that the "biased" estimator will approach a
population's std dev as the sample size increases. This is clear since the
adjustment is N/(N-1) which tends to 1 with large N.

It's not often a huge deal, but I'm unconvinced that biased estimators deserve
such a derogatory name. Using an unbiased estimate does not actually mean
you've got a better estimator. I think this is part of why there seems to be
no intuitive description of why the N/(N-1) factor removes bias. You instead
need to invoke the powerful, abstract ideas of sufficiency and completeness.
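A quick simulation of the bias at small N (a sketch, drawing samples of size 5
from a normal population with true variance 1):

```python
import random

random.seed(0)
N, trials = 5, 20000
biased, unbiased = 0.0, 0.0
for _ in range(trials):
    xs = [random.gauss(0, 1) for _ in range(N)]  # true variance is 1
    m = sum(xs) / N
    ss = sum((x - m) ** 2 for x in xs)
    biased += ss / N          # divide by N
    unbiased += ss / (N - 1)  # divide by N-1 (Bessel's correction)
biased /= trials
unbiased /= trials
# biased lands near (N-1)/N = 0.8; unbiased lands near 1.0
```

With larger N both divisors give nearly the same answer, which is the point
about N/(N-1) tending to 1.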

~~~
lmkg
The "intuitive" explanation I've heard is that (N-1) is the number of degrees
of freedom of a sample with N data points, so when you divide by (N-1) you're
normalizing some valuation of the "spread" of the sample against a measure of
the _potential_ for the sample to "spread."

I think how "intuitive" this idea is depends on how well you understand
thermodynamics. I, clearly, do not.

~~~
tel
I've heard that description before and find it more ad hoc than intuitive. Is
there some valid mathematical object K such that oftentimes you will
normalize unbiased estimators by N/(N-K)? Is this K sometimes equal to some
measure of the number of equations/constraints you have? Does this seem to
hold a lot of force in ANOVA especially?

Sure, sure, sure. I still don't see why this normalization is required to make
a _better_ estimate (except when you define "better" as meaning unbiased, as
with the MVUE).

------
klenwell
I don't have a lot of practical day-to-day use for standard deviation and
possess a vague understanding of it. Like the author, I resort to Wikipedia
periodically and go away with no stronger an intuitive sense than when I
began. I think the author offers a great explanation for the advantages of a
more intuitive understanding. This article and its simple graphics definitely
provided a more intuitive sense of the concept.

I'll wait a while for the statistics experts to weigh in here before I write
it to my cognitive hard disk. :)

------
crikli
I explain it like this:

It's 3rd and 4. You have two running backs. Running back A averages 8 yards a
carry with a standard deviation of 6. Running back B averages 5 yards a carry
with a standard deviation of half a yard.

Which guy do you choose? If you just look at averages, you probably would
choose A. But if you think of the standard deviation as an indication of
consistency you choose B, because his 5 yard gains are very consistent,
whereas A might bust one for big yards, but he's also likely to get dropped
for a loss.
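To put numbers on it, here's a sketch assuming carries are normally
distributed (real carries aren't, but it makes the point):

```python
from math import erf, sqrt

def p_at_least(needed, mean, sd):
    """P(gain >= needed) under a normal model, via the normal CDF."""
    z = (needed - mean) / sd
    return 1 - 0.5 * (1 + erf(z / sqrt(2)))

p_a = p_at_least(4, mean=8, sd=6)    # back A: roughly a 75% chance of converting
p_b = p_at_least(4, mean=5, sd=0.5)  # back B: roughly a 98% chance of converting
```

Despite the lower average, the low-variance back is the much safer bet for
picking up exactly four yards.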

~~~
jasonlotito
Ah yes, but it's not that simple. You can also account for the plays these
carries come on: some people crack under pressure. If you just add up the
numbers without accounting for other conditions, the results can change
dramatically.

This isn't to say your example is wrong, but simply to say that all data is
not equal and to make sure you use the right data for whatever you're trying
to accomplish.

------
amalcon
The main reason to use standard deviation instead of mean absolute deviation
is that

    (f(x)-g(x))^2

is differentiable, while

    |f(x)-g(x)|

is not. Though exaggerating the larger differences is often a desirable
property, the ability to do calculus is nearly always a desirable property.
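A tiny numerical check of that point (a sketch): the squared deviation's slope
is well defined at zero, while the absolute deviation's one-sided slopes
disagree there.

```python
h = 1e-6  # step size for one-sided difference quotients

def sq(d):
    return d * d

# Squared deviation: left and right slopes at d = 0 both tend to 0.
sq_left  = (sq(0) - sq(-h)) / h   # ~ -1e-6, vanishing as h -> 0
sq_right = (sq(h) - sq(0)) / h    # ~ +1e-6, vanishing as h -> 0

# Absolute deviation: the slope jumps from -1 to +1 at d = 0.
ab_left  = (abs(0) - abs(-h)) / h  # -1
ab_right = (abs(h) - abs(0)) / h   # +1
```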

~~~
mturmon
You point out a real problem with the article. The given rationale for (.)^2
vs. |.| is just wacko ("there is obviously more variation in the second data
set than the first").

The reason it's mistaken is that there are other examples, typically with just
one or a handful of outliers, in which the SD is "obviously" too high compared
to the mean absolute deviation.

Furthermore, these types of examples are common in practice, so the SD is
problematic to use if you have outliers.

Often, in robust statistics, the _Median_ Absolute Deviation (which is
conventionally labeled MAD,
<http://en.wikipedia.org/wiki/Median_absolute_deviation>) is used. As an
estimate of scale, it's much more robust to outliers. But of course, you can't
differentiate it!
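A quick illustration of the difference in outlier sensitivity (a sketch;
Python's `statistics` module has no built-in MAD, so it's computed by hand
here):

```python
import statistics

def mad(xs):
    """Median absolute deviation from the median."""
    med = statistics.median(xs)
    return statistics.median(abs(x - med) for x in xs)

data = [1, 2, 3, 4, 100]  # one wild outlier

statistics.stdev(data)  # dominated by the outlier (over 40)
mad(data)               # 1, barely notices the outlier
```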

------
yummyfajitas
The author makes a minor mistake.

You can make an unbiased estimate of the _variance_ using the formula s^2 =
(sum of squared deviations) / (N-1). However, taking the square root of this
gives you a biased estimate of the _standard deviation_ due to the concavity
of sqrt.

<http://en.wikipedia.org/wiki/Bessels_correction>
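A simulation makes the effect visible (a sketch: samples of size 5 from a
normal with sigma = 1; note that `statistics.stdev` already applies the N-1
correction to the variance before taking the root):

```python
import random
import statistics

random.seed(1)
trials = 20000
total = 0.0
for _ in range(trials):
    xs = [random.gauss(0, 1) for _ in range(5)]
    total += statistics.stdev(xs)  # sqrt of the unbiased variance estimate
mean_s = total / trials
# mean_s comes out near 0.94: systematically below the true sigma of 1,
# because sqrt is concave (Jensen's inequality pulls the average down)
```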

~~~
wnoise
Heh. Of course, since the square root is a non-linear transformation, bias
will creep in if you just naively take the root of the unbiased value, but
people rarely think about that. Thanks for pointing this out.

(BTW, the link needs an apostrophe:
<http://en.wikipedia.org/wiki/Bessel%27s_correction> )

------
NY_Entrepreneur
You can think of standard deviation much like a _distance_.

So, if we have random variables X and Y, if they have finite means and standard
deviations, and if we denote standard deviation by Std(X) and variance by
Var(X) and set Z = X + Y, then Var(X) = Std(X)^2 and we get the Pythagorean
theorem

Var(Z) = Var(X + Y) = Var(X) + Var(Y)

if we can assume that X and Y are _uncorrelated_, which is true if X and Y are
independent. So, uncorrelated is analogous to perpendicular in geometry.
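A numeric check of the analogy (a sketch with two independent simulated
variables; Std 3 and Std 4 give Std 5 for the sum, like a 3-4-5 right
triangle):

```python
import random
import statistics

random.seed(2)
n = 100000
xs = [random.gauss(0, 3) for _ in range(n)]  # Std(X) = 3, so Var(X) = 9
ys = [random.gauss(0, 4) for _ in range(n)]  # Std(Y) = 4, so Var(Y) = 16
zs = [x + y for x, y in zip(xs, ys)]

statistics.variance(zs)  # close to 25 = 9 + 16, i.e. Std(Z) close to 5
```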

