
Standard deviation is ready for retirement: Nassim Taleb (2014) - chollida1
https://www.edge.org/response-detail/25401
======
CorvusCrypto
Great writeup. I think this should go hand in hand with articles explaining
why p <= 0.05 is not an end-all be-all confirmation of your
hypotheses/conclusions. Before I jumped to the software engineering world, I
did biostats and was essentially a "bioinformatician". You quickly realize how
many experts in the field misuse statistical tools entirely while using their
results to prove a point, or worse, draw conclusions incorrectly from their
results. A big faux pas I saw a lot was using normality-assumed parametric
tests on non-normal data where the skew was clearly significant (i.e. you
couldn't get away with it like some non-normal data dists.). Seriously, go
take a look at some bioinfo papers (or any biology papers for that matter),
it's getting pretty bad. However, when we learn about these mathematical tools
(even in master's or Ph.D. programs), we are often not taught what the math
means. I'm sure I'm not the only one to have been taught formulas as
a means to prove your research rather than the theory/intuition behind them.
Luckily there are articles such as these that force you to step back and
consult the maths again to learn what is really going on.

As an aside, I'm sure this sort of thing happens in all walks of life, not
just maths/data science. Some programmers don't understand the intuition of
certain things that they code and when it is time to explain they will likely
freeze because they know how to code it, but don't really know why the code
works fundamentally.

~~~
stdbrouw
> A big faux pas I saw a lot was using normality-assumed parametric tests on
> non-normal data where the skew was clearly significant (i.e. you couldn't
> get away with it like some non-normal data dists.)

Depends on how much data you have. With a couple hundred observations, you can
have as much skew as you like and the normal approximation will still be
pretty good. I only mention this because there are a lot of misconceptions
about how statistics relies on normal data, when really it mostly just relies
on the distribution of the mean being normal, which is pretty much a given
because of the central limit theorem. There are much worse sins and abuses --
which yes, unfortunately you do see all the time in scientific papers.
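For the curious, this is easy to see in a quick numpy sketch (exponential data is an arbitrary choice of a strongly skewed distribution):

```python
import numpy as np

rng = np.random.default_rng(0)

# Exponential data is strongly right-skewed (skewness = 2), yet the
# sampling distribution of its mean is close to normal already at
# n = 200: draw 10,000 samples of size 200 and inspect the means.
means = rng.exponential(scale=1.0, size=(10_000, 200)).mean(axis=1)

print(means.mean())  # close to the true mean, 1.0
print(means.std())   # close to the CLT prediction, 1/sqrt(200) ~ 0.071
```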

~~~
NarcolepticFrog
In general it is a misconception that real data always follows a normal
distribution. It is true that if you sum many /independent/ random quantities,
then the result is approximately normal (e.g., the central limit theorem and
generalizations). But real data tends not to be independent. Many real world
quantities follow extremely skewed distributions, e.g. Zipf's law, Korcak's
law, and the Pareto laws.

For a concrete example, if you look at the distribution of the number of
friends users have on social networks, you might expect that 95% of people
have the mean number of friends +/- a few standard deviations (since this
would be the case for a normal distribution). It would be virtually impossible
for someone with a number of friends that is say thousands of standard
deviations away to exist, yet there will be many such users in social networks
(celebrities, bot networks, etc). In reality, the empirical distribution in
this case follows an extremely skewed distribution.

~~~
stdbrouw
And yet a Pareto distribution still has a mean, and the sampling distribution
of that mean is approximately normally distributed.

Of course I'm not claiming that you can just pretend that a Pareto
distribution is a normal distribution, but statistical tests are generally
concerned with differences in means (group A does on average 25% better than
group B) so it's the sampling distribution we're interested in, not the parent
distribution.

You make a good point about autocorrelation and dependent data, but that's a
very different issue. To riff on your example about social networks, you'd
have dependent data if you're trying to see what kind of news articles people
like to read, if those preferences turn out to be mostly guided by what
friends are reading.

~~~
hrzn
"And yet a Pareto distribution still has a mean, and the sampling distribution
of that mean is approximately normally distributed."

This is often wrong. The central limit theorem requires finite variance, and
some quite common Pareto distributions have infinite variance.
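A quick numpy sketch of that failure (the tail index 1.5 is an arbitrary choice inside the infinite-variance range 1 < alpha < 2):

```python
import numpy as np

rng = np.random.default_rng(1)

# Standard Pareto with tail index alpha = 1.5: the mean is finite
# (alpha / (alpha - 1) = 3) but the variance is infinite, so the
# classical CLT gives no normal approximation for the sample mean.
alpha = 1.5
stds = {}
for n in (1_000, 100_000, 10_000_000):
    x = 1 + rng.pareto(alpha, size=n)   # standard Pareto on [1, inf)
    stds[n] = x.std()
    print(n, stds[n])  # the sample std tends to keep growing with n

print(x.mean())  # the sample mean does converge (slowly) to 3
```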

------
hyperpape
I confess that I found this article unhelpful. There are interesting tidbits
in there, but I don't think it helped me identify any specific errors you'd
reach by using a standard deviation rather than mean average deviation. The
closest it came was:

"1) MAD is more accurate in sample measurements, and less volatile than STD
since it is a natural weight whereas standard deviation uses the observation
itself as its own weight, imparting large weights to large observations, thus
overweighing tail events."

More accurate how? Less volatile, not overweighing tail events: what inference
would I make incorrectly by using the standard deviation?

To be clear, I'm not arguing "for" standard deviation, I'm just saying that I
wish this article had said more about why it's potentially misleading/less
powerful.

~~~
OliverJones
> specific errors you'd reach by using a standard deviation

> ..."overweighing tail events."

That's it. Sigma puts too much weight on outliers, giving them the power to
distort summary stats. MAD doesn't.
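A toy numpy example of the weighting difference (the numbers are made up): one outlier in ten points.

```python
import numpy as np

# Nine well-behaved points and one outlier.
x = np.array([1.0] * 9 + [100.0])
d = x - x.mean()                 # deviations from the mean (10.9)

mad = np.mean(np.abs(d))         # each deviation weighted equally
std = np.sqrt(np.mean(d ** 2))   # each deviation weighted by itself

print(mad)   # 17.82
print(std)   # 29.7 -- the single outlier dominates
```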

~~~
ori_b
I don't think that answer helps. How are you assigning "too much" weight on
outliers, and the process behind deciding the right amount of weight? Can you
think of any concrete examples?

~~~
reso
In this case what Taleb is concerned with is decision making. The right amount
of weight is what allows human beings to make good decisions. He believes that
MAD is much more intuitive to humans and therefore leads to better decisions.

~~~
ori_b
In that case, I'm looking for a concrete example.

------
lordnacho
I met Nassim Taleb a few years ago. He was doing due diligence on a fund I was
working at.

He's an incredibly colourful character. We chatted about various authors in
the statistical space, and chastised them all! "What about (this guy)? Idiot!"
It was a rant worthy of a Hitler parody. "Everyone who believes in standard
deviation, leave the room!"

It was hilarious. He had a point, too, about our statistical methods. A few
months later the fund blew up in a textbook way. He'd have made a packet if he
followed his own advice. No idea if he did.

~~~
oneloop
"Everyone who believes in standard deviation (...)"

Sounds like a moron.

I would love to hear more about what happened with your fund though.

------
s_q_b
_" What is worse, Goldstein and I found that a high number of data scientists
(many with PhDs) also get confused in real life."_

Wow. Just wow.

Some people are calling themselves "Data Scientists" who don't know the
difference between σ and MAD?

I don't care how many letters are after your name. If you don't know the
absolute most basic types of summary statistics, you have no business calling
yourself a Data Scientist.

---

To the phonies, please stop. Most of us work _hard_ to stay at the bleeding
edge, lest we fall behind, as I'm sure all HN devs strive to do. You're
essentially defrauding people, and if you're working on anything important,
the cascading effects can hurt a lot of real live actual human beings.

It's one thing to play around with data science to satiate your curiosity.
It's an entirely different matter to declare it your profession. For example,
I play around with KSP. That does not make me an Aerospace Engineer.

---

Edit in reply @tgb: Actually I think you and I are on the same page about what
Taleb means. In fact, MAD is probably the more intuitive summary statistic for
most people.

I just find the number amongst Data Scientists surprisingly high, since half
the job is spotting these kinds of misinterpretations.

~~~
Homunculiheaded
Even worse there are people (utter frauds of course) who confuse MAD and MAD!
In all seriousness, there is much confusion between Median Absolute Deviation
and Mean Absolute Deviation out there. Ironically the MAD in this article is
still not a robust measure of variation in data as it will break for the many
distributions that have undefined/infinite mean (Cauchy and Levy as examples).

Even then many summary statistics rely on a well-defined PDF which is also not
true for many real life cases. I think most data scientists out there are very
familiar with quantiles, which are often more useful as all random variables
have a CDF (and the quantile is just the inverse CDF).
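A numpy sketch of that point about quantiles, with the Cauchy as the pathological case (sample size chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(2)

# Standard Cauchy: undefined mean, infinite variance -- but it still
# has a CDF, so its quantiles are well defined and easy to estimate.
x = rng.standard_cauchy(size=100_000)

print(np.median(x))          # stable, close to the true median 0
print(np.quantile(x, 0.75))  # stable, close to the true value tan(pi/4) = 1
print(x.mean())              # not meaningful: never settles as n grows
```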

I quite enjoy Taleb's writing (I tend to find his ego a bit amusing) but I
think even he is guilty of Jaynes' "Mind Projection Fallacy"[0] in regard to
searching for more meaning than exists in fat-tailed distributions. When we
model our data with infinite/undefined mean and variance distributions we're
just saying "I don't know". No amount of cleverness with summary statistics,
or understanding of pathological distributions will create information where
there is none.

The overall point being: there are many, many ways of viewing statistics and
it's pretty trivial to find a perspective that allows you to call someone a
"fraud". Sure there are actual frauds in data science, but one of the biggest
strengths in this trend is bringing quantitative people from a wide range of
backgrounds to gain refreshing insights. It is much more useful to encourage
cross-discipline exploration than to simply say "you don't belong here".

[0]
[https://en.wikipedia.org/wiki/Mind_projection_fallacy](https://en.wikipedia.org/wiki/Mind_projection_fallacy)

~~~
eruditely
Doesn't seem like he's seeing more meaning than exists; check out this book
and this field:

[http://www.amazon.com/Modelling-Extremal-Events-Stochastic-Probability/dp/3540609318](http://www.amazon.com/Modelling-Extremal-Events-Stochastic-Probability/dp/3540609318)

[https://fernandonogueiracosta.files.wordpress.com/2014/07/taleb-nassim-silent-risk.pdf](https://fernandonogueiracosta.files.wordpress.com/2014/07/taleb-nassim-silent-risk.pdf)

Also, Jaynes made the mistake of thinking only Gaussians matter (the mistake
he criticizes).

------
chollida1
I submitted this for Nassim's article but this whole site is pretty great.
Check out their "big questions" on the left hand side where they get people to
weigh in on them:

[https://www.edge.org/contributors/what-do-you-think-about-machines-that-think](https://www.edge.org/contributors/what-do-you-think-about-machines-that-think)

[https://www.edge.org/contributors/what-scientific-idea-is-ready-for-retirement](https://www.edge.org/contributors/what-scientific-idea-is-ready-for-retirement)

[https://www.edge.org/contributors/what-should-we-be-worried-about](https://www.edge.org/contributors/what-should-we-be-worried-about)

[https://www.edge.org/contributors/what-is-your-favorite-deep-elegant-or-beautiful-explanation](https://www.edge.org/contributors/what-is-your-favorite-deep-elegant-or-beautiful-explanation)

And to be honest, where else can you find Nassim Taleb weighing in on a
subject while keeping himself to only 500 words? :)

------
jmount
There are some reasons to prefer variance/stddev, such as getting averages
correct: [http://www.win-vector.com/blog/2014/01/use-standard-deviation-not-mad-about-mad/](http://www.win-vector.com/blog/2014/01/use-standard-deviation-not-mad-about-mad/)

~~~
stdbrouw
That's not a valid reason. You use descriptive statistics when you describe
something and you use inferential statistics when you're doing inference.

~~~
jmount
I understand we disagree, so I'll try to clarify my point of view a bit which
hopefully makes things better not worse.

One is often forced (despite promises not to) to optimize what one measures or
shares. That may or may not be relevant to a particular project, but I doubt
the point is completely invalid (despite the claimed clean-room separation of
descriptive and inferential procedures).

The idea is: if one believes absolute deviation is the one true measure then
it would not make sense to optimize over a different measure (variance).

I in fact like quantile regression, but it has its own caveats.

~~~
stdbrouw
So, let's say we want to make an industrial process more reliable by reducing
the variability of its output. We repeatedly measure the variability by means
of MAD. We'd like to know what causes the variability, so we regress MAD on
various predictors to see what causes variable performance. The regression
allows us to optimize MAD but the regression itself is fit using ordinary
least squares. I don't think anyone would object to that?

I guess you're thinking more along the lines of describing the performance of
a model in terms of the MAD, and then optimizing using MAD / L1, which isn't
always a wise choice? Then I guess we don't disagree at all. I do like MAD as
an easy to communicate loss statistic in many cases (as well as % of cases
with predictions further than a domain specific distance from the truth), but
I don't think many people would consider loss to be a descriptive statistic at
all – it describes not the world but a model of the world.

------
psoy
This is silly in my view, and bordering on pseudoscience-level writing. It's
not a random historical accident that we use these quantities.

Variance / stdev appear all over probability and statistics. For example, in
this famous Stats 101 theorem:
[https://en.wikipedia.org/wiki/Chebyshev%27s_inequality](https://en.wikipedia.org/wiki/Chebyshev%27s_inequality)

~~~
jerf
You are mistaking him for someone who is ignorant and slagging the best
practices; in fact, he's incredibly knowledgeable and experienced and slagging
best practices. I'm not saying he's guaranteed to be right about everything,
but he's made rather a lot of money by turning his criticisms into actions,
and that's a pretty tall bar to leap.

I'll say it again because I can already hear the reply buttons clicking... I'm
_not_ saying he's right about everything or that money is the only measure of
rightness. I'm just saying that it's definitely a measure worth paying
attention to, and definitely puts you on the experienced side rather than the
ignorant side. If it was just chance, it's still interesting.

~~~
psoy
Sounds like Appeal to Authority to me:
[https://en.wikipedia.org/wiki/Argument_from_authority](https://en.wikipedia.org/wiki/Argument_from_authority)

FWIW I am also someone who knows what he's talking about, as I am a seasoned
professional in the same line of work.

This guy's accomplishments don't make the Chebyshev inequality any less true
(nor all the other theorems involving variance), so I don't see how he can
claim something like this and be taken seriously by people in the field.

~~~
jerf
Because it's not about the theorems being wrong, it's about people using them
in impractical ways.

The Central Limit Theorem is true. Full stop. It can't be wrong. However, in
the real world, a lot fewer things are truly Gaussian than may initially meet
the eye. It doesn't make the CLT "false", it just means that people who apply
it too carelessly are making a mistake. Standard deviation is a thing, but
that doesn't make it the _right_ thing for a given task.

A lot of people apply statistics inappropriately. It's hardly their fault,
it's basically what they are taught. I remember seeing my wife take her
biology statistics courses, which at times seemed to be a course in which you
would repeatedly calculate p-values. Just that, over and over; calculate this
p-value. Calculate that p-value. Calculate this other p-value. Say "Yes" if
it's less than 0.05 and "No" if it's greater. Now do it again. And again. And
again. Certainly numbers went in one end of the calculator and came out the
other, but did they _mean_ anything? If not, it's not because the p-value
isn't "true", just not even remotely as useful as the course was implicitly
teaching.

(Yes, _words_ were said about how it wasn't the only useful thing, but the
actions spoke loud and clear. Compute p-value. Say yes if below 0.05. Say no
if above. Repeat. The current problems all the fields are having with
statistics aren't that surprising if you look back to the beginning.)

------
auggierose
I remember that when I learnt Probability Theory at Uni around 1996, I got a
3- in the exams (german grades go from 6 (the worst) to 1 (the best), a -
indicating a tendency towards the worse grade, and + indicating a tendency
towards the better grade). Everyone else in that year got a worse grade ...
And we were all (aspiring) mathematicians. The thing is, if you want to teach
that stuff in a way that is both rigorous with respect to theoretical
underpinnings, while at the same time making sure that the student can
actually apply in practice what they learnt ... that's a pretty difficult
task, even if your students are all mathematicians. Now, I have no idea what a
good way to teach these things to non-mathematicians would be!

Edit: I was always quite astonished at how easily biologists etc. seemed to
have understood quite complicated probabilistic mathematics, but now I
understand that mostly they are just cargo-culting stuff they don't really
understand.

------
arafa
I have replaced uses of the Standard Deviation with the Mean Absolute
Deviation at work on several occasions, for just the reasons described here.
It often leads to substantial improvement in predictive validity, in some
cases fixing a broken process.

~~~
Bromskloss
What kind of processes are we talking about?

------
chuckcode
Might be a little premature to call for retirement of sigma. The mathematical
concept of standard deviation is super useful but I agree that the name is
confusing and that we need to improve the naming and ideally the notation and
teaching of statistics. Ability to deal with uncertainty and variance is
becoming more and more important in all sorts of fields as data volumes get
larger so I'd hate to see us give up just because it is hard to understand.

~~~
stuxnet79
I've never really been a fan of the notation. But I don't know how you can
conceivably enforce this because just like programming, people will
pontificate about 'design', 'clean code', 'maintainability' and other similar
cargo-cult buzzwords but they will go ahead and do whatever they want with
their code.

~~~
chuckcode
Completely agree that standardizing notation takes decades, as a lot of it is
how textbooks and tutorials from experts are written. I, for one, am really
glad that we standardized on Leibniz's notation rather than Newton's for
calculus, and that we aren't using Roman numerals anymore.

------
nerotulip
The technical (real) article is here:
[https://dl.dropboxusercontent.com/u/50282823/standarddev.pdf](https://dl.dropboxusercontent.com/u/50282823/standarddev.pdf)

------
bluecalm
Previous discussion about Taleb's view on this:
[https://news.ycombinator.com/item?id=7064435](https://news.ycombinator.com/item?id=7064435)

------
georgewsinger
Using MAD instead of standard deviation in the formula for correlation would
make it possible for random values to have a "correlation" exceeding 1 in
magnitude (the proof that a correlation coefficient is bounded by 1 relies on
the Cauchy-Schwarz inequality [1], which could no longer be appealed to).

This doesn't detract from Taleb's points at all, but it does show a
mathematically "nice" property that motivates the use of standard deviation.

[1]
[https://en.wikipedia.org/wiki/Cauchy%E2%80%93Schwarz_inequal...](https://en.wikipedia.org/wiki/Cauchy%E2%80%93Schwarz_inequality#Probability_theory)

------
hendzen
I usually prefer median absolute deviation on real world data. It's more
robust than either SD or mean absolute deviation.

------
omginternets
Where can I read more about the concept of infinite variance?

A gentle introduction would be much appreciated.

~~~
moultano
[https://en.m.wikipedia.org/wiki/Cauchy_distribution](https://en.m.wikipedia.org/wiki/Cauchy_distribution)

~~~
omginternets
Thanks! How does this relate to real-world statistics? In what context(s) does
this pop up?

Edit: what's wrong with my question? I _did_ mention a gentle introduction was
needed, so if the answer is obvious to some, please forgive my ignorance and
help me fix it.

~~~
rcthompson
Imagine an infinite line and a spinner[1] a short distance away from it. Spin
the spinner, wait for it to stop, and then mark the point on the line that the
spinner is pointing directly at (or away from). Repeat lots of times. The
resulting points have a Cauchy distribution. If you tried to figure out where
the spinner was along the line by taking the arithmetic mean of the points,
you would fail miserably. Taking the median is much more likely to give you a
good answer.
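If numpy is handy, the spinner is a two-liner to simulate (positions and sample size made up):

```python
import numpy as np

rng = np.random.default_rng(3)

# Spinner sitting one unit away from the line, directly over point 0.
# A uniform angle theta in (-pi/2, pi/2) marks the point tan(theta) on
# the line -- which is exactly a draw from the standard Cauchy.
theta = rng.uniform(-np.pi / 2, np.pi / 2, size=100_000)
points = np.tan(theta)

print(np.median(points))  # recovers the spinner's position (0) well
print(points.mean())      # dominated by a few near-parallel spins
```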

That was still a somewhat contrived example to demonstrate the point, but if
you replace the spinner's pointing with photons, you realize that a Cauchy
distribution describes the intensity of light shining on a flat surface from a
point light source[2].

[1] That is, one of these things: [http://www.ontrack-media.net/math8/m8m5l1image13.jpg](http://www.ontrack-media.net/math8/m8m5l1image13.jpg)

[2]
[http://stats.stackexchange.com/a/36037/5736](http://stats.stackexchange.com/a/36037/5736)

------
kazinator
If we regard taking the absolute value as a squaring followed by taking the
positive square root, then basically we have "root mean square" (STD) versus
"root square mean" (MAD), that is all. The one calculation takes the square
root _after_ the mean, the other moves it _before_.

If we extend MAD to vectors, then we average the vector norms.

What is the norm? It is the root mean square of the vector components. So MAD
is then the "root square mean" of "root mean squares".
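In numpy terms, the two orderings look like this (Gaussian deviations chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(4)
d = rng.normal(size=10_000)     # deviations around a mean of 0

# STD: square, then mean, then root ("root mean square").
std = np.sqrt(np.mean(d ** 2))

# MAD: root of the square first (i.e. the absolute value), then mean.
mad = np.mean(np.sqrt(d ** 2))  # identical to np.mean(np.abs(d))

print(std, mad)  # for Gaussian data mad/std is about sqrt(2/pi) ~ 0.8
```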

------
pepematth
The difference between MAD and STD is the use of the mean instead of the
quadratic mean. Sometimes the quadratic mean is better, for example: if you
have n some particles with velocity vi then the quadratic mean allows you to
replace your system with another where each particle has the same velocity,
the quadratic mean, this way the total kinetic energy is the same as in the
original system, and this conservation of energy is a very important property
in physics.

------
aws_ls
To understand better, let's take the following absolute deviations from the
mean in series A and B.

A: 2, 1, 3, 1, 2

B: 4, 1, 2, 1, 1

MAD(A) = 9/5 = 1.8

STD(A) = sqrt(19/5) = sqrt(3.8) = 1.95

MAD(B) = 9/5 = 1.8

STD(B) = sqrt(23/5) = 2.14
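These numbers are easy to check with numpy (the deviations are given directly above, so MAD is just the mean of the absolute values and STD the root mean square):

```python
import numpy as np

A = np.array([2, 1, 3, 1, 2], dtype=float)  # deviations from the mean
B = np.array([4, 1, 2, 1, 1], dtype=float)

for name, d in (("A", A), ("B", B)):
    mad = np.mean(np.abs(d))
    std = np.sqrt(np.mean(d ** 2))
    print(name, mad, std)
# A: MAD = 1.8, STD = sqrt(19/5) ~ 1.95
# B: MAD = 1.8, STD = sqrt(23/5) ~ 2.14 -- same MAD, but the single
#    deviation of 4 pulls the STD up because it is weighted by itself
```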

So it clearly shows that STD penalizes any significant deviation: it weights
each deviation by itself (squaring it). The message is that all the values in
a series should be in a close range. If you look at its applications:

[https://en.wikipedia.org/wiki/Standard_deviation#Interpretat...](https://en.wikipedia.org/wiki/Standard_deviation#Interpretation_and_application)

"a small standard deviation indicates that they are clustered closely around
the mean"

The answers in this Stack Exchange question (regarding RMSE, the same thing,
as Taleb also notes) also point to this: [http://stats.stackexchange.com/questions/118/why-square-the-difference-instead-of-taking-the-absolute-value-in-standard-devia](http://stats.stackexchange.com/questions/118/why-square-the-difference-instead-of-taking-the-absolute-value-in-standard-devia)

Also, as per this wiki, it's mainly used to compare models against each other:
[https://en.wikipedia.org/wiki/Root-mean-square_deviation#Applications](https://en.wikipedia.org/wiki/Root-mean-square_deviation#Applications)

One comment on HN says that the M in MAD is for Median. Taleb's note itself
makes it clear, via the temperature example, that it is the Mean absolute
deviation.

Also, what's with the Taleb bashing on HN in these past two articles? The man
may be arrogant or he may not be. So what? Bringing that aspect of someone's
personality into technical discussions drags them to a very low level, apart
from being ad hominem. I think that even in this post he is speaking from the
point of view of a practitioner who has seen a tool abused a lot. And he may
not have the time to spell out his full knowledge of the subject. Just by
that mention of Karl Pearson, it is obvious he knows completely what he is
talking about, as Pearson seems to be the first who gave the name 'standard
deviation' to the 'root mean square error' (as per the wiki on STD).

------
fpoling
A similar thing is using least squares for linear regression rather than
minimizing the MAD. In the past the argument was that the least-squares sum
has a closed-form solution, but with computers even that advantage is
eliminated.

The nice thing about minimizing MAD is that in typical settings the linear
regression line/plane/hyperplane goes through measurement points. As such
there is no interpolation, and outliers are nicely cut off, making the result
very robust to measurement errors.
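A minimal sketch of the contrast, assuming numpy and scipy are available (the data and the outlier size are made up for illustration):

```python
import numpy as np
from scipy.optimize import minimize

# Nine clean points on y = 2x, plus one gross outlier.
x = np.arange(10, dtype=float)
y = 2 * x
y[9] += 100.0

# Ordinary least squares: closed form, via polyfit.
a_ols, b_ols = np.polyfit(x, y, 1)

# Least absolute deviations: minimize sum |y - (a x + b)| numerically.
loss = lambda p: np.sum(np.abs(y - (p[0] * x + p[1])))
a_l1, b_l1 = minimize(loss, x0=[a_ols, b_ols], method="Nelder-Mead").x

print(a_ols)  # pulled far from the true slope 2 by the outlier
print(a_l1)   # close to 2: the fitted line goes through the clean points
```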

------
OliverJones
One of the barriers to adopting MAD() is the two passes over a dataset needed
to compute it.

As it happens, I made a MySQL feature request for a MAD() aggregate function
when Dr. Taleb's article first appeared. Any upvotes on that request would be
welcome.

[http://bugs.mysql.com/bug.php?id=71391](http://bugs.mysql.com/bug.php?id=71391)

------
vasili111
I think this link should be here: [http://www.win-vector.com/blog/2014/01/use-standard-deviation-not-mad-about-mad/](http://www.win-vector.com/blog/2014/01/use-standard-deviation-not-mad-about-mad/)

------
nerotulip
[https://dl.dropboxusercontent.com/u/50282823/standarddev.pdf](https://dl.dropboxusercontent.com/u/50282823/standarddev.pdf)

------
ris
While we're at it, can we also retire "linear" mean in favour of geometric
mean in what is commonly understood as "average"?

I find it far more representative in most real-life situations.

~~~
Tloewald
Can you give some examples where it's more representative? Average height?
Average income? There are a lot of arguments in favor of median as a more
intuitive replacement (and it is frequently used thus) but consider that:

a) Some people have ZERO income. b) Either an odd or even number of people
have NEGATIVE income.

Every day I deal in execution times, rounded to the nearest integral number of
milliseconds. Lots of zeros.

A measure that is (a) not actually intuitively correct, (b) very hard to
calculate in your head, and (c) useless for many, many non-trivial cases is
NOT a good replacement for the "linear mean", which is (a) often intuitively
correct, (b) pretty easy to calculate in your head, and (c) always works.

~~~
saalweachter
Geometric mean also interacts badly with unit conversions that use a
different zero point. For instance, if you take the geometric mean of daily
temperatures, you will get different means depending on whether you work in F
or C or K.
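A small numpy check of this, with two arbitrary temperatures:

```python
import numpy as np

def geo_mean(v):
    return np.exp(np.mean(np.log(v)))

temps_c = np.array([10.0, 20.0])
temps_f = temps_c * 9 / 5 + 32      # the same temperatures in Fahrenheit

# The geometric mean is not invariant under a shift of the zero point:
print(geo_mean(temps_c) * 9 / 5 + 32)  # ~57.46 F (Celsius mean, converted)
print(geo_mean(temps_f))               # ~58.31 F (computed directly in F)

# The arithmetic mean has no such problem: 15 C converts to exactly 59 F.
```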

------
calebm
I never understood why "root mean square deviation" is called standard
deviation instead of the MAD. That always annoyed me.

------
vacri
It'll be good to see MBAs bragging about their "6 MAD" courses instead of
their "6 Sigma" ones...

------
forgotpwtomain
I'm rather surprised how exceptionally naive articles are upvoted on HN when
the subject matter is specialized. If you're going to talk about retiring
standard deviation, you should have some pretty detailed mathematical
arguments; this is a waste of time.

~~~
LionessLover
You didn't read it? Or you overlooked the part where he said WHO should
"retire" it - and who should keep it!

This is about how people _interpret_ statistical results. Which is not a
mathematical process - it is what comes _after_ the math.

~~~
forgotpwtomain
> This is about how people interpret statistical results. Which is not a
> mathematical process - it is what comes after the math.

If the latter is not a process with a scientific method that can be detailed
and substantiated, then this is mysticism and my interest stops there.

That being said, you really don't need to do math to practice mysticism, but
I'm sure having numbers and equations helps with the hand-waving when selling
it to non-technical people.

------
graycat
Taleb is partly correct in his

> Standard deviation, STD, should be left to mathematicians, physicists and
> mathematical statisticians deriving limit theorems.

E.g., for positive integer n and a sequence of n random variables with the
same expectation and with finite variance and, as n grows to infinity, with
the variance converging to zero, the random variables, actually points in the
Hilbert space commonly called L^2, converge in the norm of that space, and
then a subsequence must converge almost surely, that is, the strongest case of
convergence. Of course, this is a very old result, standard when considering
convergence of random variables.

But standard deviation still has an important role in common applications of
statistics without "deriving limit theorems". And, with some irony, we don't
derive a limit theorem but use one, indeed, likely the most important one, the
central limit theorem (CLT).

With the CLT, under mild assumptions, for positive integer n, as n grows to
infinity, the probability distribution of the mean of n independent and
identically distributed (the _i.i.d._ case) converges to a Gaussian. Likely
the mildest assumptions are from the Lindeberg-Feller case (don't ask but look
it up if you wish, and to read the proof set aside much of an afternoon).

Now, when we have convergence to a Gaussian and have the standard deviation of
that Gaussian, we can calculate any and all confidence intervals we want on
our estimate of the mean of that Gaussian. So, THAT'S one case of where and
why, even in just common work, we still want the standard deviation.
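For example, a CLT-based confidence interval in numpy (the exponential data and the sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)

# 400 i.i.d. draws from a (skewed) exponential with true mean 2.0.
n = 400
x = rng.exponential(scale=2.0, size=n)

# CLT-based 95% confidence interval for the mean: the sample standard
# deviation is exactly what sets the interval's width.
se = x.std(ddof=1) / np.sqrt(n)
lo, hi = x.mean() - 1.96 * se, x.mean() + 1.96 * se
print(lo, hi)  # an interval that covers 2.0 about 95% of the time
```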

Yes, how fast the convergence to a Gaussian is can be relevant in protecting
against Taleb's "black swans" and avoiding, say, the disaster of Long Term
Capital Management (LTCM) in their estimates of _volatility_.

That is, suppose we want to estimate the standard deviation of an average (as
above). Suppose the random variables we are averaging have a distribution with
a bump in its probability density function way, way, way out in a tail. Being
way out in the tail means that if we get such a value, then it is really large
(in absolute value, and in practice really far from the expectation of that
random variable). So, if we get a value in that bump, then we can have a
"black swan". But the probability of the bump is quite small. So, we can take
samples from the distribution of that random variable and average them for
weeks before we ever get a sample from the bump, before we ever see a _black
swan_.

So, doing this, never seeing a black swan in our sampling, we can have an
estimate of the standard deviation that is significantly too small. And with
that small standard deviation, we can believe that some highly leveraged
financial positions are relatively safe, that is, that they also have low
volatility.

Then, bad day, the Russians default on something, we get a "black swan", and
suddenly lose some billions of dollars where before we were really sure that
wouldn't happen for millennia. Sorry 'bout that.

Roughly, that is what happened in the famous, expensive crash of LTCM.
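A toy numpy version of the story (the crash size and probability are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)

# Daily "returns": N(0, 1) noise, plus a crash of -50 with probability
# 1/2000 -- the bump way out in the tail. The true std is about
# sqrt(1 + 2500/2000) ~ 1.5.
def sample(n):
    x = rng.normal(size=n)
    x[rng.random(n) < 1 / 2000] -= 50.0
    return x

short = sample(250)     # ~ one trading year: usually no crash at all
print(short.std())      # typically ~1.0 -- the black swan is invisible

big = sample(1_000_000)  # long enough to see ~500 crashes
print(big.std())         # ~1.5, the honest answer
```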

------
kenjackson
What he says seems to apply to volatility, but does it apply to other uses of
standard deviation?

------
a3n
I don't find "nassim" anywhere on the page or in the source.

~~~
evanpw
It looks like they're expecting you to click through from this page:
[https://www.edge.org/contributors/what-scientific-idea-is-
re...](https://www.edge.org/contributors/what-scientific-idea-is-ready-for-
retirement), which lists the actual authors.

