

An Introduction to the Central Limit Theorem - StylifyYourBlog
http://spin.atomicobject.com/2015/02/12/central-limit-theorem-intro/

======
jordigh
> Often referred to as the cornerstone of statistics

Well... often referred to as the central theorem of statistics. Each time you
say its name. What's central is the theorem, not the limit. It was Pólya who
first called it that, "zentraler Grenzwertsatz".

> Why the Central Limit Theorem Works

Well... I don't think that's really an explanation at all of why e^(-x^2/2) is
such a privileged function. Why would any distribution converge to a normal
distribution?

It essentially boils down to the Fourier transform. When you take the Fourier
transform of the distribution of the sample means and ignore all but the
quadratic term (there is no linear term if you standardise to mean 0 and
variance 1), you get (1 - t^2/2n)^n, which converges to e^(-t^2/2) as n grows.
That's the Gaussian, which is its own Fourier transform.

[https://en.wikipedia.org/wiki/Central_limit_theorem#Proof_of...](https://en.wikipedia.org/wiki/Central_limit_theorem#Proof_of_classical_CLT)

In other words, because the Gaussian is its own Fourier transform, sample
means converge to the Gaussian.
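The limit is easy to sanity-check numerically. A quick sketch (purely illustrative, not part of the proof): evaluate (1 - t^2/2n)^n for growing n and compare it with e^(-t^2/2).

```python
import math

# Sanity check: (1 - t^2/(2n))^n approaches e^(-t^2/2), the characteristic
# function of the standard Gaussian, as n grows.
t = 1.5
for n in (10, 100, 10000):
    approx = (1 - t**2 / (2 * n)) ** n
    print(n, approx, math.exp(-t**2 / 2))
```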

~~~
jamessb
> _Well... I don't think that's really an explanation at all of why e^(-x^2/2)
> is such a privileged function. Why would any distribution converge to a
> normal distribution?

> It essentially boils down to the Fourier transform_

Precisely.

One motivation for taking a Fourier transform is that the p.d.f. of the sum
of several random variables is the convolution of the corresponding p.d.f.s,
and convolution in the original domain is equivalent to multiplication in the
Fourier domain (by the convolution theorem).
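As a concrete sketch (assuming Uniform(0, 1) variables, chosen only for illustration): the density of the sum of two independent uniforms is the triangle on [0, 2], and numerically convolving the two uniform densities reproduces it.

```python
import numpy as np

# The p.d.f. of the sum of two independent Uniform(0, 1) variables is the
# triangular density on [0, 2], peaking at 1. Convolving the two uniform
# densities numerically reproduces it (convolution in the original domain
# corresponds to multiplication in the Fourier domain).
dx = 0.001
x = np.arange(0, 1, dx)
pdf = np.ones_like(x)              # density of Uniform(0, 1)
conv = np.convolve(pdf, pdf) * dx  # density of the sum, on [0, 2)
print(conv.max())                  # peak of the triangle, ~1.0
```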

~~~
shas3
...and p.d.f.s convolve when you derive the distribution of a random variable
defined as the sum of other independent random variables.

------
pspencer
I always liked this visual representation of the central limit theorem:
[http://blog.vctr.me/posts/central-limit-theorem.html](http://blog.vctr.me/posts/central-limit-theorem.html).
There is a faster one here (I think written in R):
[http://vis.supstat.com/2013/04/bean-machine/](http://vis.supstat.com/2013/04/bean-machine/)

These are computer simulations of Galton boxes:
[http://en.wikipedia.org/wiki/Bean_machine](http://en.wikipedia.org/wiki/Bean_machine)

~~~
stdbrouw
As the comments to that page mention, though, it's not a representation of the
central limit theorem at all. It shows how a binomial distribution "becomes" a
normal distribution in the long run, but it doesn't show why this would apply
to the distribution of errors of a series of sample means.

------
bkcooper
I think the first half of the article showing how this works with a given
sample distribution is pretty good. I don't think it's really doing much to
build intuition at the end, though.

It's also worth pointing out that there are distributions for which the
central limit theorem doesn't hold (e.g. the sum of samples from a Lorentzian
distribution will again be Lorentzian, not Gaussian.)
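A quick illustration of this failure (a sketch only; a standard Cauchy draw is generated here as the ratio of two independent normals): sample means of Cauchy draws never settle down, no matter how large n gets, because the distribution has no finite mean.

```python
import random
import statistics

# Sample means of Cauchy (Lorentzian) draws do not converge as n grows:
# the distribution has no finite mean, the CLT assumptions fail, and the
# mean of n standard Cauchy samples is itself standard Cauchy.
rng = random.Random(42)

def cauchy_mean(n):
    # A standard Cauchy variate is the ratio of two independent normals.
    return statistics.fmean(rng.gauss(0, 1) / rng.gauss(0, 1)
                            for _ in range(n))

for n in (10, 1000, 100000):
    print(n, cauchy_mean(n))  # no convergence, unlike a Gaussian sample
```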

~~~
jordigh
> the sum of samples from a Lorentzian distribution will again be Lorentzian,
> not Gaussian

Lorentzian? I had to look that up. Oh. Cauchy distribution. Right, because it
doesn't have any finite moments, because the tails are too heavy.

~~~
cozzyd
Physicists like to say Lorentzian (or sometimes Breit-Wigner) instead of
Cauchy.

------
rm999
I have a series of basic questions I include in any data science interview,
and one is "please describe what the central limit theorem says in simple,
high-level terms". It's absolutely amazing how many people who have great
credentials can't do this. I get a lot of "any distribution becomes normal
when you sample it enough". This is nonsensical and shows a lack of
understanding of the theorem.

Please, if you claim to know stats, understand what the central limit theorem
says. It's a pretty incredible and useful theorem.

~~~
whorleater
Out of curiosity, mind sharing some of those questions? I'm a computer science
major looking into getting a data science internship, and I'd love for a
chance to see some real interview questions.

~~~
rm999
Yep! Here are a few, I customize the questions a bit based on seniority and
background.

1. In a 1-d Gaussian with mean=0 and variance=1, what is p(x=0)?

2. If I give you a csv file with a header, how would you return a tsv file
with columns 6 and 2? Just high-level, not actual implementation.

3. If I have some dataset (I change it up a lot, a common one is subway
ridership by station and date), how would you create a visualization to give
some insight to a fairly non-technical audience, e.g. the distribution of
ridership by station?

4. If I gave you a housing dataset with a bunch of attributes (e.g. year built,
number of sq feet, presence of pool, zip code), how would you go about
building a machine learning model that predicts whether the house is being
rented or is owned (assuming you are provided with tagged data)? The selling
price? How many times it has been sold?

5. What's a technology that you're excited about (could be a computing tool,
a machine learning algorithm in a library, etc.)? Tell me some of the pros and
cons of it and competing technologies.

We also give out a 4-6 hour task as homework, and evaluate the code quality,
tools used, answers provided, etc.
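For what it's worth, a minimal sketch of an answer to question 2 (the helper name is hypothetical, and column numbers are assumed to be 1-based):

```python
import csv
import io

# Hypothetical sketch: read CSV text with a header and return TSV text
# containing columns 6 and 2 (1-based), header row included.
def csv_to_tsv_cols(csv_text, cols=(6, 2)):
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t")
    for row in reader:
        writer.writerow(row[c - 1] for c in cols)
    return out.getvalue()

sample = "a,b,c,d,e,f\n1,2,3,4,5,6\n"
print(csv_to_tsv_cols(sample))
```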

------
jhallenworld
My introduction to the central limit theorem was that chained independent
random processes tend to result in a Gaussian distribution. This is so general
that one is surprised when one finds non-Gaussian distributions (canonical
example: the stock market).

I attended a lecture by Mandelbrot (shortly before he died) where he spoke at
length about this- take a look at stable distributions and the generalized
central limit theorem.

~~~
dnautics
I think that Mandelbrot's point is actually quite the opposite: that
non-Gaussian stable distributions are the norm, and that we are trained to
see Gaussians, which is why we're surprised. When alpha is very close to 2
(at 1.9 and above it really makes little difference) things look Gaussian,
but even a bit lower, like 1.8, can have some very scary consequences for
modelling. In his book he has an example of an insurance market and
modelling the failure of insurance companies... which was especially
poignant considering what happened after he died (and what is happening now).

------
giarc
Is it not tradition to use n for sample size rather than N? N is typically
population size (in my experience).

~~~
jordigh
I don't think any tradition is too entrenched on this particular point.

~~~
giarc
You might be right. In my stats courses, we used lower case letters for all
sample estimates and upper case for all population values.

------
eliwjones
To me... the core idea is that (given one chooses, over and over, from a
bunch of independent and identically distributed events):

There are more ways for everything to happen than there are ways for one thing
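One way to see this concretely is to count (a small counting sketch, just for illustration): with 20 fair coin flips there is exactly one sequence giving all heads, but C(20, 10) sequences giving half heads, so "mixed" outcomes dominate purely because there are vastly more ways to reach them.

```python
from math import comb

# For 20 fair coin flips: one sequence yields 20 heads, but C(20, 10)
# sequences yield exactly 10 heads. The middle of the distribution wins
# because far more ways lead there.
n = 20
ways = [comb(n, k) for k in range(n + 1)]
print(ways[0], ways[n // 2])  # 1 vs 184756
```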

------
willvarfar
An interesting bit of trivia for computer history buffs:

Alan Turing independently discovered the Central Limit Theorem while still an
undergrad in 1934.

