
The Central Limit Theorem Visualized with D3 - vicapow
http://blog.vctr.me/posts/central-limit-theorem.html
======
Jabbles
This isn't the central limit theorem. This is a binomial distribution. Nice
animation though.

~~~
nphrk
It is the distribution of the average of n independent ±1 steps†, which
according to the CLT converges to a Gaussian as n→∞.

[†] x is where it is centered.

~~~
btilly
You only get the Gaussian if you take the limit correctly.

In the visualization, as you increase the number of bins, the Gaussian
approximation becomes more and more squeezed, and by the weak law of large
numbers the limit is a Dirac delta.

In order to get the Gaussian you'd need to be looking at a window whose width
is proportional to the square root of the number of bins.
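
Here is a quick numerical sketch of the two scalings (a Python/numpy
illustration of the point, assumptions mine, not anything from the post):

    import numpy as np

    # Sum n fair +/-1 steps, then rescale two ways: dividing by sqrt(n)
    # keeps the spread fixed (the Gaussian window), while dividing by n
    # collapses everything toward a spike at 0 (the weak law).
    rng = np.random.default_rng(0)
    pm1 = np.array([-1, 1], dtype=np.int8)
    for n in (16, 256, 4096):
        s = rng.choice(pm1, size=(20_000, n)).sum(axis=1)
        print(f"n={n:4d}  std(S/sqrt(n)) = {np.std(s / np.sqrt(n)):.3f}"
              f"   std(S/n) = {np.std(s / n):.4f}")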

------
calinet6
A great example of convergence in the natural world I once saw was drops of
water falling off a gutter from about 10 feet in the air. There was very
little wind, and the drops fell on a line right under the gutter. It was
immediately clear that the drops landed in a roughly normal distribution with
their random fall through the air, and the pattern of wet pavement they
produced was a perfect little compressed bell curve. It was quite beautiful.

~~~
dietrichepp
Well, it wasn't perfectly normal. But it was close. The Central Limit Theorem
is about closeness, not perfection.

------
vanni
Video of a binomial distribution [1] physically visualized using a Galton box
[2], a kind of Pachinko [3]: <http://youtu.be/AjI_LcQOOs4>

[1]: <http://en.wikipedia.org/wiki/Binomial_distribution>

[2]: <http://en.wikipedia.org/wiki/Bean_machine>

[3]: <http://en.wikipedia.org/wiki/Pachinko>

------
hanzq
Here's a Galton box. (<http://www.youtube.com/watch?v=AUSKTk9ENzg>)

It's a meatspace demonstration of the CLT.

~~~
yid
This was one of the most "a-ha moment" videos I ever watched back in the day.
Anyone with even a functional knowledge of statistics should watch this one.

------
drostie
I liked this, but as other commenters have pointed out, it's purely
_binomial_ at this stage. The great thing about the central limit theorem is
that it is _more general_ than just the limiting binomial case.

So, there's this thing called the cumulant-generating function. It's pretty
much defined for any random variable X. If you want to get technical it is the
logarithm of the Fourier transform of a probability density function f(x).
You're on HN so you probably know the first two jargon words, "logarithm" and
"Fourier transform". A "probability density" just means that f(x) dx is the
probability for X to be in the interval (x, x + dx). The Fourier transform
puts us into a "frequency space" indexed by some variable k, so we can write
the CGF as some function c(k), or in other words:

    c(k) = ln[ ∫ dx f(x) exp(i k x) ]

So for the "step left/right" variable which is -1 with probability 1/2 and +1
with probability 1/2, c(k) = ln[cos(k)]. (It can get a little messy when you
ask what happens when cos(k) crosses 0, etc., but this function is infinitely
differentiable on a disc centered at 0, which is all that we need.)

It also turns out that, since the total probability ∫ dx f(x) = 1, evaluating
at k=0 gives c(0) = ln[ 1 ] = 0. The cumulants are the derivatives evaluated
at k=0:

    c'(0)   = i E[X] = i μ                 (where μ is the "expectation")
    c''(0)  = −(E[X²] − E[X]²) = −σ²       (where σ² is the "variance")
    c'''(0) = −i E[(X − μ)³] = −i σ³ γ     (where γ is the "skewness")

It is not very hard to prove that if I have a sum of two _independent_ random
variables X + Y = Z, then their CGFs add like cz(k) = cx(k) + cy(k). This is
the Fundamental Awesomeness of Cumulants: cumulants are pseudo-linear; they're
linear in _independent_ random variables.
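
That additivity is easy to sanity-check numerically; here is a small Python
sketch (numpy assumed, the two distributions chosen arbitrarily):

    import numpy as np

    def cgf(values, probs, k):
        # c(k) = ln E[exp(i k X)] for a discrete random variable
        phases = np.exp(1j * k * np.asarray(values))
        return np.log(np.sum(np.asarray(probs) * phases))

    k = 0.3
    cx = cgf([-1, 1], [0.5, 0.5], k)         # the fair +/-1 step
    cy = cgf([0, 1, 5], [0.6, 0.3, 0.1], k)  # an arbitrary skewed variable

    # Z = X + Y with X, Y independent: enumerate the joint outcomes
    z_vals  = [x + y for x in (-1, 1) for y in (0, 1, 5)]
    z_probs = [px * py for px in (0.5, 0.5) for py in (0.6, 0.3, 0.1)]
    cz = cgf(z_vals, z_probs, k)

    print(np.allclose(cz, cx + cy))  # True: cz(k) = cx(k) + cy(k)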

It is also not hard to prove that if I have a scalar multiplication U = X / n,
then cu(k) = cx(k / n). Combining these together, the mean M = (X1 + X2 + ...
+ Xn) / n of n identical and independent random variables looks like:

    cm(k) = n c(k / n)

Now if you know calculus, you know the rest. Taylor-expand around k = 0 to
find:

    cm(k) = i μ k − (σ²/n) k²/2 − i (σ³ γ / n²) k³/6 + ...

We see a _geometric attenuation_ in this series: each new term picks up
another factor of 1/n. If we drop every term from c'' on, we just get the
constant M = μ. That's boring, so we keep one more term, to find:

    cm(k) ≈ i E[X] k − (Var[X] / n) k² / 2

This is extra-good if f(x) is symmetric about the mean (so that the skewness
vanishes), but it is also pretty good even if the distribution is skewed.
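
For the ±1 step all of this can be checked symbolically; a sketch assuming
sympy:

    import sympy as sp

    k = sp.symbols('k')
    n = sp.symbols('n', positive=True)
    c = sp.ln(sp.cos(k))  # CGF of the fair +/-1 step

    # cumulants: c'(0) = i*mu = 0, c''(0) = -sigma**2 = -1, c'''(0) = 0
    print([sp.diff(c, k, m).subs(k, 0) for m in (1, 2, 3)])  # [0, -1, 0]

    # cm(k) = n*c(k/n): only -k**2/(2*n) survives at low order, since
    # mu = 0 and the distribution is symmetric (no skewness term)
    print(sp.series(n * c.subs(k, k / n), k, 0, 5))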

We can now invert all of the steps to return to a probability density: you
exponentiate, so you get exp(i μ k) exp(−k² σ²/2n), then you inverse-Fourier-
transform, which turns the exp(i μ k) factor into a shift x → x − μ and which
transforms the Gaussian into another Gaussian, so you get a Gaussian centered
at x = μ.

In other words you get approximately a normal distribution with mean E[X] and
variance Var[X] / n, with the error given by convolutions of higher-order
terms, and the first correction disappearing when the distribution is
symmetric about μ (or otherwise non-skew).

~~~
mturmon
Some of the explanatory power of the graphical demo seems to have been lost.

~~~
drostie
Maybe. But the proof also gives a lot of cute ideas for how to go further with
this sort of demo. Since adding random variables amounts to doing convolutions
with their distribution functions, if you had a Fourier transform package in
JS you could probably show people some nice visualizations of adding together
various random variables, and what it does to their probability distributions.
This might give more insight into the generality of the CLT: you take a
strange curve, convolve it with itself a bunch of times, and in the
convolutions and rescaling it becomes first a skewed distribution, then a
normal one.
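
In Python rather than JS, a rough numpy sketch of that demo idea (the starting
density is my arbitrary choice): pick a lopsided density, convolve it with
itself a few times, and compare with the matching Gaussian:

    import numpy as np

    x = np.linspace(-40, 40, 8001)
    dx = x[1] - x[0]
    f = np.where((x >= 0) & (x <= 3), (3 - x) ** 2, 0.0)  # lopsided density
    f /= f.sum() * dx

    g = f.copy()
    for _ in range(5):  # density of the sum of 6 i.i.d. copies
        g = np.convolve(g, f, mode="same") * dx

    mean = (x * g).sum() * dx
    var = ((x - mean) ** 2 * g).sum() * dx
    gauss = np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    print(np.abs(g - gauss).max())  # already small; shrinks with more passes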

------
graycat
> In probability theory, the central limit theorem (CLT) states that, given
> certain conditions, the mean of a sufficiently large number of independent
> random variables, each with a well-defined mean and well-defined variance,
> will be approximately normally distributed (i.i.d.).

No, that's just wrong. The context is some positive integer n and the values
of n real valued random variables, independent and identically distributed
(i.i.d.).

If for some fixed n you do what the quote says, you are not guaranteed to get
a Gaussian.

Moreover, if you do what the quote says and take the limit as n goes to
positive infinity, then, say, in the case where the random variables have
finite expectation and finite variance, you will get just the common
expectation of those random variables -- this is the law of large numbers, not
the central limit theorem. That the distribution of the mean converges to a
point mass at the expectation is the 'weak' law of large numbers. That the
mean converges to the expectation with probability 1 is the strong law of
large numbers.

For the central limit theorem, add up the n i.i.d. values, subtract n times
the common mean, and divide by the square root of n. Then under mild
assumptions (i.i.d. with finite variance suffices; for independent but not
identically distributed summands the Lindeberg-Feller conditions are about the
weakest known), as n goes to infinity, the distribution of the result will
converge to a Gaussian ('normal') distribution. Authors with the details
include J. Neveu, L. Breiman, M. Loeve, and K. Chung.

It's essential to divide by the square root of n, not just n.

------
pseut
Kind of a misleading animation, since n is the number of switches (4) and not
the number of balls (which diverges to infinity).

Setting the bins to 500 froze the browser, unfortunately.

The physical machines that do this are undeniably cool, especially when they
have the curve painted on beforehand; this is the best I could find right now,
but I remember seeing a pretty big one as part of a traveling mathematics
museum exhibition:

<http://www.youtube.com/watch?v=AUSKTk9ENzg>

~~~
vicapow
what browser are you using? the bins <input> tag has a max value set to 25

~~~
sp332
I'm using Firefox 22.0 and I have no problem setting it higher than that.

------
baby
I have to learn this theorem for my exams (which are in a week). All I know
is that

(X1 + X2 + ... + Xn − n·mean) / sqrt(n·variance) → N(0,1)

I don't see how that is a visualization of this theorem.

~~~
dietrichepp
X1, X2, ..., XN are Bernoulli(0.5) variables. You can increase N to show that
the distribution approaches a Gaussian distribution, as long as you normalize
the variance correctly (otherwise you get either no distribution at all, or
you get a delta distribution).
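
A small Python sketch of exactly that normalization (numpy assumed):

    import numpy as np

    # (X1 + ... + Xn - n*mean) / sqrt(n*variance) with Xi ~ Bernoulli(0.5),
    # so mean = 1/2 and variance = 1/4; each binomial draw is one such sum
    rng = np.random.default_rng(1)
    n = 1000
    sums = rng.binomial(n, 0.5, size=200_000)
    z = (sums - n * 0.5) / np.sqrt(n * 0.25)
    print(z.mean(), z.std())  # ~0 and ~1; a histogram of z traces N(0, 1)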

------
goronbjorn
Plinko!

([http://en.wikipedia.org/wiki/List_of_The_Price_Is_Right_pric...](http://en.wikipedia.org/wiki/List_of_The_Price_Is_Right_pricing_games#Plinko))

------
vicapow
"The Bean Machine, also known as the quincunx or Galton box, is a device
invented by Sir Francis Galton[1] to demonstrate the central limit theorem, in
particular that the normal distribution is approximate to the binomial
distribution." - <http://en.wikipedia.org/wiki/Bean_machine>

~~~
mrmagooey
I don't understand: they take a distribution known to look like a normal
distribution, then... yep, it looks like a normal distribution, central limit
theorem proved.

------
dangoldin
This is pretty great. I've just started using D3 and it's pretty powerful
stuff.

For those interested in data visualization - also take a look at Vega
(<http://trifacta.github.io/vega/>) - it makes creating the simpler charts
easier than using D3.

------
graycat
Here's what's going on, what will illustrate the central limit theorem (CLT)
and what will not:

For positive integer n and for i = 1, 2, ..., n, let real valued random
variable X(i) be so that

P(X(i) = -1 ) = P(X(i) = 1 ) = 1/2

Assume that {X(1), ..., X(n)} is an independent set of random variables.

Easily the expectation E[X(i)] = 0.

Let real valued random variable

S(n) = X(1) + X(2) + ... + X(n)

Easily E[S(n)] = 0.

For n = 4, S(n) can take on the values -4, -2, 0, 2, 4 (a sum of four ±1
steps is always even).

The animation in the OP should converge to the distribution of S(n) where n =
4.

By the CLT, as n approaches infinity, the distribution of

( 1/sqrt(n) ) S(n)

converges to Gaussian with expectation 0.

By the strong law of large numbers, the probability that the random variable

(1/n) S(n)

converges to 0 as n approaches infinity is 1.

For increasing n, S(n) is a martingale and is a discrete approximation to
Brownian motion. As n approaches infinity, the probability of S(n) taking any
single value approaches 0, so the limit is not a probability distribution.
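
For reference, the exact n = 4 distribution that the animation's five bins
should converge to, in a few lines of Python:

    from math import comb

    # S(4) = (# of +1 steps) - (# of -1 steps) = 2h - 4 for h up-steps
    for h in range(5):
        s = 2 * h - 4
        print(f"P(S(4) = {s:+d}) = {comb(4, h)}/16 = {comb(4, h) / 16:.4f}")
    # 1/16, 4/16, 6/16, 4/16, 1/16 over the values -4, -2, 0, 2, 4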

~~~
graycat
For more, let v(i) be the variance of X(i). Then, since E[X(i)] = 0,

    v(i) = E[ (X(i) - E[X(i)])^2 ]
         = E[ (X(i))^2 ]
         = (1/2) 1 + (1/2) 1
         = 1

Since in the independent case, and, hence, also in the i.i.d. case, the
variance of a sum is the sum of the variances, the variance of S(n) is n. Then
the standard deviation of S(n) is sqrt(n), so that the standard deviation of

(1/sqrt(n)) S(n)

is 1.

So, for large n, the distribution of

(1/sqrt(n)) S(n)

converges to Gaussian with expectation 0, standard deviation 1, and variance
1.


------
sengstrom
Very cool. Could I suggest that you display the total balls dropped somewhere
as well?

Edit: just two niggles - (a) 13 bins would be better numbered 0-12 or 1-13.
(b) I would round the percentages to the nearest value (not down).

~~~
vicapow
good idea! i'll try and add this when i have time.

------
Sammi
I had to finish the number examples on that page to prove this to myself:

    000 -> 0   001 -> 1   010 -> 1   011 -> 2
    100 -> 1   101 -> 2   110 -> 2   111 -> 3

    0 -> 1: 12.5%   1 -> 3: 37.5%   2 -> 3: 37.5%   3 -> 1: 12.5%

    0000 -> 0   0001 -> 1   0010 -> 1   0011 -> 2
    0100 -> 1   0101 -> 2   0110 -> 2   0111 -> 3
    1000 -> 1   1001 -> 2   1010 -> 2   1011 -> 3
    1100 -> 2   1101 -> 3   1110 -> 3   1111 -> 4

    0 -> 1: 6.25%   1 -> 4: 25%   2 -> 6: 37.5%   3 -> 4: 25%   4 -> 1: 6.25%
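
The same tally can be reproduced by machine; a short Python sketch:

    # Count how many n-bit strings contain each possible number of 1s;
    # the counts are the binomial coefficients, reproducing the lists above.
    for n in (3, 4):
        counts = {}
        for b in range(2 ** n):
            ones = bin(b).count("1")
            counts[ones] = counts.get(ones, 0) + 1
        print(n, {k: f"{v} ({100 * v / 2 ** n:.2f}%)"
                  for k, v in sorted(counts.items())})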


------
cypher543
You may want to do a bit of proofreading. It's difficult for me to take the
content of this article seriously with so many spelling mistakes, especially
since you misspelled "probability", which is the focus of the entire thing.

~~~
vicapow
thanks for pointing this out. I updated the document.

------
sharkweek
This is great -- I love math visualized. It's fun to set the delay to zero
and just watch the bar graph normalize really quickly:

<http://i.imgur.com/E8yppGG.png>

------
Fuzzwah
Turn that into something I can gamble on and I'd play it all day.

------
vicapow
while i have the attention of HN, does anyone have suggestions on any other
types of statistics visualizations?

~~~
btilly
Brownian motion remains Brownian motion at all scales if you scale correctly.

More specifically, if you take a rectangle of width 1/nth of your
visualization box and height 1/sqrt(n)th of your visualization box, centered
on the point of the walk in the middle of the box, and then zoom in, you get
another random walk with the same statistical properties as the original.
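
A Python sketch of that rescaling (numpy assumed): compare the endpoint of
the full walk with the endpoint of its first 1/n-th, stretched by sqrt(n):

    import numpy as np

    # Endpoint of the full walk vs. the endpoint of its first 1/n-th,
    # rescaled by sqrt(n): the spreads should match.
    rng = np.random.default_rng(2)
    trials, N, n = 20_000, 2_500, 25
    steps = rng.choice(np.array([-1, 1], dtype=np.int8), size=(trials, N))
    walks = np.cumsum(steps, axis=1, dtype=np.int32)

    full = walks[:, -1] / np.sqrt(N)
    zoomed = np.sqrt(n) * walks[:, N // n - 1] / np.sqrt(N)
    print(full.std(), zoomed.std())  # both ~1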

~~~
mturmon
There is a gif of this at

<http://en.wikipedia.org/wiki/Wiener_process#Self-similarity>

which is rather nice.

If you get some infrastructure for making Brownian motion sample paths,
another good one is that a path crosses zero infinitely often in any
neighborhood of the origin. Keep zooming in, and see more and more crossings.
This is another instance of the scaling rule you mention, but it is neat to
see.
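
You can watch the crossings pile up in simulation too; a Python sketch (numpy
assumed, the window near t = 0 chosen arbitrarily):

    import numpy as np

    # Brownian motion on [0, 1] at finer and finer resolution: count sign
    # changes in a fixed small window near the origin; the count keeps
    # growing as the resolution increases.
    rng = np.random.default_rng(3)
    for steps in (10**3, 10**5, 10**7):
        path = np.cumsum(rng.normal(0.0, np.sqrt(1.0 / steps), size=steps))
        window = path[: steps // 100]  # the first 1% of the time axis
        print(steps, np.count_nonzero(np.diff(np.sign(window))))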

------
mathattack
It does get a lot more visually interesting with a very high # of bins and a
very small delay.

------
gmriggs
Also known as the Plinko Theorem

------
coherentpony
This is not the Central Limit Theorem.

~~~
dietrichepp
Really? I see a collection X1, X2, ..., XN of Bernoulli(0.5) variables and the
demonstration that the distribution of their sum, normalized for variance,
approaches the Gaussian distribution as N increases. Is that not a direct
consequence of the CLT?

~~~
zmmmmm
> the distribution of their sum

Exactly. But the page defines the CLT as referring to:

"the _mean_ of a sufficiently large number of independent random variables
..."

So it talks about the _mean_ and then illustrates the _sum_, just as you are,
which is confusing. I spent several minutes puzzling over how the mean of
variables between zero and 1 could be 5, for example.

Now, I suppose you will say that the distribution of the mean is just the
distribution of the sum, normalized, and that this ought to be obvious to
anybody with half a brain. But if this page is meant to help people trying to
learn this stuff understand it, it would be better if it were accurate and not
directly confusing with its language.

------
rypit
nice!

~~~
vicapow
thank you!

