
Why isn't everything normally distributed? - tambourine_man
https://www.johndcook.com/blog/2015/03/09/why-isnt-everything-normally-distributed/
======
antognini
As a (former) astronomer, I've never understood why people assume normal
distributions for everything. I understand that there are good theoretical
motivations for this --- namely, the normal distribution is the distribution
that maximizes entropy for a given mean and variance. But in astronomy,
nothing is normally distributed. (At least, nothing comes to mind.) Instead,
everything is a power law. The reason for this is that most astrophysical
processes are scale free over many orders of magnitude, and if you want a
scale free process, it must be distributed as a power law.

There's actually a joke in the field that when you get a new dataset, the
first thing you do is fit it to a power law. If that doesn't work, you fit it
to a broken power law.

~~~
Kenji
> As a (former) astronomer, I've never understood why people assume normal
> distributions for everything.

It shouldn't be puzzling, because there is a big reason to lean towards the
normal distribution, and that is the central limit theorem. If you sum samples
from a variety of different distributions, that sum will tend towards a normal
distribution. That is why the normal distribution pops up in so many places.
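A quick illustrative sketch of this (my own, not from the thread): sum many
uniform draws and the totals already look Gaussian, with the mean and spread
the CLT predicts.

```python
import random
import statistics

random.seed(0)

# Sum 100 uniform(0, 1) draws. By the CLT the sum is approximately
# normal with mean 100 * 0.5 = 50 and stdev sqrt(100 / 12) ~ 2.89.
sums = [sum(random.random() for _ in range(100)) for _ in range(10_000)]

mean = statistics.mean(sums)
stdev = statistics.stdev(sums)
print(mean, stdev)
```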

~~~
smallnamespace
That sounds nice, but justifying why you expect your quantity to be the _sum_
of other random effects, rather than the result of another operation like
multiplication, takes a lot more work; in some fields (e.g. finance), the
additive assumption is completely wrong.

~~~
Houshalter
If you multiply random numbers together, you get a log-normal distribution,
which means the logarithm of the product is normally distributed
(multiplication is just adding logarithms, after all). However, multiplicative
noise is a lot less common than additive noise.
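A minimal sketch of that (my own illustration): multiply many positive random
factors, and the products come out right-skewed while their logarithms look
roughly symmetric, as a log-normal should.

```python
import math
import random
import statistics

random.seed(1)

# Product of 50 positive random factors; taking logs turns the product
# into a sum, so by the CLT log(product) is approximately normal.
products = [
    math.prod(random.uniform(0.5, 1.5) for _ in range(50))
    for _ in range(10_000)
]
logs = [math.log(p) for p in products]

# The products themselves are right-skewed (mean well above median),
# while their logs are roughly symmetric (mean close to median).
print(statistics.mean(products), statistics.median(products))
print(statistics.mean(logs), statistics.median(logs))
```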

------
technofire
Perhaps a better question is "Why is anything normally distributed?" It
appears originally to have been a simplification to make the math more
convenient:

As Rand Wilcox reports, "Why did Gauss assume that a plot of many observations
would be symmetric around some point? Again, the answer does not stem from any
empirical argument, but rather a convenient assumption that was in vogue at
the time. This assumption can be traced back to the first half of the 18th
century and is due to Thomas Simpson. Circa 1755, Thomas Bayes argued that
there is no particular reason for assuming symmetry, Simpson recognized and
acknowledged the merit of Bayes's argument, but it was unclear how to make any
mathematical progress if asymmetry is allowed." (Wilcox, p. 4)

Wilcox, R. (2010). Fundamentals of modern statistical methods: Substantially
improving power and accuracy (2nd ed.). New York, New York: Springer.[1]

[1] [http://amzn.to/2tkMRoI](http://amzn.to/2tkMRoI)

------
abetusk
As others have pointed out, power laws are more "normal" than the normal
distribution.

The reason for this is that if you have a sum of independent, identically
distributed (i.i.d.) random variables (r.v.s), and the normalized sum
converges to a distribution, that distribution is Levy stable [1], which is
power law in its tails. The Gaussian is a special case in the family of Levy
stable distributions.

The article states that "the sum of many independent, additive effects is
approximately normally distributed", which is false in general. The sum of
many independent random variables _with finite variance_ is approximately
normally distributed. Once you relax finite variance (and, in more extreme
cases, finite mean), power laws result.
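A hedged sketch of the infinite-variance case (my own illustration): Cauchy
draws have no finite variance or mean, and averaging does not tame them, so
the classical CLT never kicks in.

```python
import math
import random

random.seed(2)

# A standard Cauchy draw via the inverse CDF. Cauchy variables have no
# finite variance (or even mean), so the classical CLT does not apply:
# the average of n i.i.d. Cauchy draws is itself standard Cauchy, with
# the same scale, no matter how large n is.
def cauchy() -> float:
    return math.tan(math.pi * (random.random() - 0.5))

n = 1_000
averages = [sum(cauchy() for _ in range(n)) / n for _ in range(2_000)]

# Averaging has not concentrated the distribution: wild outliers remain
# even after averaging 1000 draws at a time.
extreme = max(abs(a) for a in averages)
print(extreme)
```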

There are other ways to generate power laws, including killed exponential
processes [2]. There are many other references that discuss the rediscovery
of power laws [3] and give many ways to "naturally" create power laws [3] [4]
[5].

The article claims that multiplicative processes lead to log normal
distributions. I've heard that this is actually false but unfortunately I
don't have enough familiarity to see how this is not true. If anyone has more
insight into this I would appreciate a link to an article or other
explanation.

[1]
[https://en.wikipedia.org/wiki/Stable_distribution](https://en.wikipedia.org/wiki/Stable_distribution)

[2]
[http://www.angelfire.com/nv/telka/transfer/powerlaw_expl.pdf](http://www.angelfire.com/nv/telka/transfer/powerlaw_expl.pdf)

[3]
[https://arxiv.org/pdf/physics/0601192v3.pdf](https://arxiv.org/pdf/physics/0601192v3.pdf)

[4]
[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.122...](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.122.3769&rep=rep1&type=pdf)

[5]
[http://www.angelfire.com/nv/telka/transfer/powerlaw_expl.pdf](http://www.angelfire.com/nv/telka/transfer/powerlaw_expl.pdf)

------
leephillips
An example from a book I used to own, called, I think, _Treatment of
Experimental Data_: imagine a factory manufacturing ball bearings. They want
them all to have the same radius, but because of random errors, the radii will
be normally distributed about some mean. If this is true, other random
variables, such as the mass, cannot be normally distributed.

~~~
prashnts
Sorry, could you explain why mass can't be normally distributed? If, say, the
mass of the bearing is related to its radius, then shouldn't it follow similar
distribution?

~~~
Sharlin
Mass is proportional to radius cubed. Cubing a normal distribution makes it
not normal.

~~~
tgb
Though it should be noted that "cubing" doesn't refer to just taking the cube
of the density; i.e., if f(x) = e^{-x^2/2} is proportional to the standard
normal density, then the cube of f(x), namely f * f * f, is not the
distribution you're considering here. Rather, it's the "pushforward measure"
[1], which in this case is (1/3) y^{-2/3} e^{-y^{2/3}/2}, where y = x^3 is
the mass and x is the radius (up to constants).

[1]
[https://en.wikipedia.org/wiki/Pushforward_measure](https://en.wikipedia.org/wiki/Pushforward_measure)
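A small numerical check of this (my own sketch, with made-up mu and sigma):
cube normally distributed "radii" and the resulting "masses" are visibly
right-skewed, so they cannot be normal.

```python
import random
import statistics

random.seed(3)

# Hypothetical bearing radii, normally distributed (arbitrary units).
mu, sigma = 10.0, 0.5
radii = [random.gauss(mu, sigma) for _ in range(100_000)]

# Mass is proportional to radius cubed (constant density assumed).
masses = [r ** 3 for r in radii]

# A normal distribution has mean == median. Cubing breaks the
# symmetry: the mean of the masses sits above the median.
mean_mass = statistics.mean(masses)
median_mass = statistics.median(masses)
print(mean_mass, median_mass)
```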

~~~
xyzzyz
Cubing does refer to just taking the cube, but the cube of the random
variable, not of its probability density function -- the density after cubing
is indeed the pushforward.

------
nerdponx
From the comments:

 _Personally, I don’t find it surprising that not everything is normally
distributed. Why should any real phenomenon follow a theoretical limiting
distribution anyway, never mind a symmetric, infinite-tailed distribution that
is exact only in an unachievable limit? The surprise is that so many things
*are* sufficiently near normality for it to be useful!_

------
ronald_raygun
Interestingly enough, you don't need random variables to be independent or
identically distributed for a CLT to apply.

See

[https://en.wikipedia.org/wiki/Central_limit_theorem#CLT_unde...](https://en.wikipedia.org/wiki/Central_limit_theorem#CLT_under_weak_dependence)

and

[https://en.wikipedia.org/wiki/Central_limit_theorem#Lyapunov...](https://en.wikipedia.org/wiki/Central_limit_theorem#Lyapunov_CLT)

~~~
felippee
But the variance needs to be bounded, and with power laws it is often
unbounded. The trap many fall into is that short samples of power laws
typically look regular: variance estimates will always be finite on a finite
sample, even when the true variance is infinite. Nassim Taleb goes into more
detail on this.

~~~
thanatropism
Then again, Taleb is a jerk when someone tries to pin him down on one of his
wild exaggerations.

Even as a quant, Taleb was prone to smushing over details to push a narrative.
Already in the 90s, Derman was enabling Taleb in claiming that the
Black-Scholes formula was an interpolation algorithm already known to option
traders, and that it basically served to justify using the risk-free rate as a
drift (price-trend) parameter in accordance with the "economics
establishment". But (as noted by more than one response in the literature), if
you make a different assumption about the drift (or even the distribution) --
i.e., leave the "Black-Scholes world" and merely interpolate between two
world-states -- you're left with calibrating a stochastic discount rate that
gives put-call parity. But hey, not that technicalities should get in the way
of a good story!

------
graycat
Because for a random variable X, you sometimes need to care about X^2, and
when X is normally distributed, X^2 is chi-squared distributed, not normally
distributed.

Because, under mild assumptions, arrivals -- say, of visitors to a Web site,
gas station, or hospital -- form a Poisson process: the times between arrivals
are independent and identically exponentially distributed, and that's not
normally distributed.
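A quick sketch of that property (my own illustration): exponential
interarrival times have mean equal to standard deviation, which already rules
out a good normal fit.

```python
import random
import statistics

random.seed(4)

# Interarrival times of a Poisson process with rate 2 arrivals per
# unit time are exponentially distributed with mean 1 / rate.
rate = 2.0
gaps = [random.expovariate(rate) for _ in range(100_000)]

mean_gap = statistics.mean(gaps)
stdev_gap = statistics.stdev(gaps)

# For an exponential, mean == stdev (both 1 / rate = 0.5); a normal
# with that mean and stdev would put ~16% of its mass below zero,
# while every actual gap is nonnegative.
print(mean_gap, stdev_gap, min(gaps))
```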

Now at nearly any server farm, it is easy to get wide, deep, rapidly flowing
oceans of data on the performance of the server farm, and, as U. Grenander
once explained to me in his office at Brown, the data is wildly different from
what statistics was used to, e.g., in medical data. In that ocean of data,
finding anything normally distributed will be very rare.

The claims in the OP about many effects are nothing like good evidence for
the central limit theorem or for normal distributions. E.g., by the renewal
theorem, many examples of Poisson processes arise from the results of many
independent effects.

E.g., the usual computer-based random number generators return what look like
independent, identically distributed random variables uniform on [0, 1], and
that is not normally distributed.

The question in the OP about why not normally distributed is, in one word,
just absurd.

------
eanzenberg
What's more surprising is how weakly the central limit theorem holds in
practice, yet it is still used all the time to justify poor analysis, usually
in a/b testing. When the underlying distribution has high variance, as with
many metrics I've come across that have extreme long-tailed behavior, the
aggregates need a large N before they adhere to a Gaussian.
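A rough sketch of that slow convergence (my own, with an arbitrary Pareto
tail index): sample means of a long-tailed metric stay visibly skewed at
small n, and the skew fades only gradually as n grows.

```python
import random
import statistics

random.seed(5)

# Heavy-tailed "metric": Pareto with tail index 2.5 (finite variance,
# but a long right tail and an infinite skewness coefficient).
def metric() -> float:
    return random.paretovariate(2.5)

def skew_gap(n: int, reps: int = 10_000) -> float:
    """Mean minus median of the sampling distribution of the mean:
    a crude measure of how skewed (non-normal) sample means still are."""
    means = [statistics.fmean(metric() for _ in range(n)) for _ in range(reps)]
    return statistics.mean(means) - statistics.median(means)

gap_small = skew_gap(10)    # small-sample a/b test
gap_large = skew_gap(400)   # much larger N
print(gap_small, gap_large)
```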

------
jsnk
We see Pareto distribution in nature and society a lot.

[https://en.wikipedia.org/wiki/Pareto_distribution#Applicatio...](https://en.wikipedia.org/wiki/Pareto_distribution#Applications)

------
smcgraw
Someone poke Taleb.

------
vanderZwan
> _Height is influenced by environmental effects as well as genetic effects,
> such as nutrition, and these environmental effects may be more additive or
> independent than genetic effects._

This makes me wonder if in countries where these environmental effects are
mostly optimised (everyone having access to good nutrition), or at least
nearly identical for everyone, the normal distribution of height breaks down.

Is height normally distributed in the tallest countries in the world? What
about the shortest?

edit: I'll just copy this question to the comments under his blog, maybe the
author has some idea about that.

edit2: just noticed the blog post is from 2015... oh well, it was worth a
shot.

------
SophosQ
I came across this CMU presentation on why the purported ubiquity of power
laws must be taken with a grain of salt:

[https://goo.gl/23PP7v](https://goo.gl/23PP7v)

Check from slide 32.

As someone who doesn't have significant experience in statistics, I'd be
grateful for an expert's opinion on the arguments presented in this
presentation.

------
jtolmar
I'd take the CLT as meaning everything complicated is normally distributed by
default, but with fairly common exceptions:

1 - If the problem isn't actually that complicated, the CLT doesn't do much.

2 - If the problem is dominated by one component, the result will still
mostly look like that component.

3 - Most ways of slicing a normal distribution lead to other distributions,
for example the Rician distribution.

------
evanwarfel
Because not everything has finite variance.

------
BjoernKW
Because many phenomena aren't in fact representations of arbitrary random
variables.

Take word distribution in any human language for instance. Word frequencies
follow a Zipf distribution because it decreases entropy and hence is more
efficient.

~~~
oh_sigh
It's not arbitrary random variables; it's sums of independent additive
variables that frequently lead you to Gaussian distributions.

~~~
pmiller2
The surprising thing about the CLT is that it applies whenever the means and
variances of the summed random variables exist. This is really a very mild
condition, but the surprise to me is that the result is independent of the
actual distributions of the variables being summed!

~~~
pishpash
The convergence rates are different though, so at finite sample sizes, and as
a practical matter, the underlying distributions do matter.

------
acscott
Now, having read some comments, do not get lost in your assumptions (which
implies you should know your assumptions). It's really that simple.

------
tictacttoe
There are a lot of quantities which are strictly positive. If a quantity is
bounded from below, it can't be exactly Gaussian.

~~~
kgwgk
Strictly speaking nothing physical is Gaussian, because everything is bounded.
There are a finite number of particles in the observable universe.

------
725686
Nassim Nicholas Taleb doesn't have many nice things to say about the bell
curve in his book _The Black Swan_.

------
acscott
From the question alone, without even reading the article: why would any set
of events follow a Gaussian?

------
ruste
Can anyone quickly explain to me why some things _are_ normally distributed?

------
clentaminator
Is the distribution of distributions itself normal?

------
agentofoblivion
All models are wrong, some are useful.

