

Normal vs. Fat-tailed Distributions - lewis500
http://vudlab.com/fat-tails.html
======
jessaustin
This is great, but I have a slight complaint.

Sliding the kurtosis indicator changes the left distribution, which makes
sense. However, it also changes the _appearance_ of the right, normal
distribution, which is misleading. Normal distributions have [EDIT: constant]
kurtosis. I realize that the appearance is changing because the scale is
changing so that the max is always pegged. However, it might be less confusing
if the scale remained static and the height of the left distribution simply
changed, since that would be a more accurate representation of what's actually
happening. That would obscure some details, but I don't think the precise
details of the heights of each bar in the distribution are really the point of
this page.

~~~
lewis500
yeah i gave myself some grief about that aspect but ultimately decided
different scales was more misleading. could go either way. also, the kurtosis
of a normal distribution is 3, not 0.

~~~
Fomite
Many statistical packages (SAS IIRC) subtract 3 from the kurtosis to set a
normal distribution at 0. This is an _infuriating_ tendency.

~~~
madcaptenor
There are some good reasons for this. For example, if Y is the sum of n
independent and identically distributed random variables with the distribution
of X, then the kurtosis of Y is 1/n times the kurtosis of X. This doesn't hold
without subtracting 3.
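A concrete check of that 1/n scaling (my own sketch, not from the thread), using the exponential distribution, whose excess kurtosis is 6:

```python
import random

random.seed(0)

def excess_kurtosis(xs):
    # Fourth central moment over the squared variance, minus 3.
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3

N = 200_000
# A single Exponential(1) draw has excess kurtosis 6.
single = [random.expovariate(1.0) for _ in range(N)]
# The sum of n = 4 iid exponentials: excess kurtosis should drop to 6/4 = 1.5.
summed = [sum(random.expovariate(1.0) for _ in range(4)) for _ in range(N)]

print(excess_kurtosis(single))   # roughly 6
print(excess_kurtosis(summed))   # roughly 1.5
```

With the raw (unshifted) kurtosis the numbers would be 9 and 4.5, and 4.5 is not 9/4 — which is exactly madcaptenor's point about subtracting 3 first.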

------
lambdasquirrel
I worked with an exponential distribution at work a while back. At first, it seemed like we could model it as a Gaussian, because the part of the data we were interested in was "close enough" to a Gaussian, and we had already written code that worked for our other data, which was Gaussian. As it turns out, I was wrong.

The thing that can't easily be seen in pictures is that exponential
distributions move differently than Gaussians. When the variance of a Gaussian
increases, it flattens out. When the variance of an exponential increases, the
whole thing spikes out to the right, and the area under the tail actually
increases. It really screwed things up until I started treating the data for
what it really was.
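A quick sketch of that contrast (my own illustration, not the commenter's code): compare how much probability sits beyond a fixed threshold as the variance grows. The threshold c = 3 is an arbitrary choice.

```python
import math

c = 3.0  # fixed threshold, an arbitrary choice

def normal_tail(sigma, c):
    # P(X > c) for X ~ N(0, sigma^2): the Gaussian stays centered at 0
    # and just flattens symmetrically as sigma grows.
    return 0.5 * math.erfc(c / (sigma * math.sqrt(2)))

def exponential_tail(sigma, c):
    # An exponential with standard deviation sigma has rate 1/sigma and
    # mean sigma, so raising the variance also drags the whole bulk to
    # the right; P(X > c) = exp(-c/sigma).
    return math.exp(-c / sigma)

for sigma in (1.0, 2.0, 3.0):
    print(sigma, normal_tail(sigma, c), exponential_tail(sigma, c))
```

Even at sigma = 1, where both distributions have unit variance, the exponential puts dozens of times more mass past c = 3 than the zero-mean Gaussian does, and the gap in behavior grows with sigma because the exponential's mean moves right along with its spread.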

~~~
aet
Also, it is important to note that a Gaussian is symmetric about its mean,
whereas an exponential distribution takes values only between 0 and infinity.

~~~
unlikelymordant
I think OP is talking about the two-sided exponential, i.e. the Laplace
distribution, which is symmetric about its mean.

------
karl42
From the article: "Both distributions below have standard deviations of 1"

I thought fat-tailed distributions don't have a variance. But apparently I'm
using the term in a stricter sense than other people. See
[http://en.wikipedia.org/wiki/Fat-tailed_distribution#Definition](http://en.wikipedia.org/wiki/Fat-tailed_distribution#Definition)
for details if you were wondering, too.

~~~
ltjohnson
Most statisticians have their own definition of fat-tailed. I like the
definition you linked to (polynomial decay), but it is not universally
accepted/known. These fat-tailed distributions will have a variance as long as
their densities decay faster than x^(-3).
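That x^(-3) boundary is easy to check by hand (my own sketch, using a Pareto density on [1, inf) as the example): the truncated second moment either converges or blows up as the truncation point grows.

```python
import math

def truncated_second_moment(alpha, upper):
    # E[X^2 * 1{X <= upper}] for the Pareto density alpha * x**(-alpha - 1)
    # on [1, inf): integrate alpha * x**(1 - alpha) from 1 to upper.
    if alpha == 2:
        return 2 * math.log(upper)
    return alpha / (2 - alpha) * (upper ** (2 - alpha) - 1)

for alpha in (1.5, 2.5):
    print(alpha, [truncated_second_moment(alpha, 10.0**k) for k in (2, 4, 6)])
```

With alpha = 1.5 the density decays like x^(-2.5), slower than x^(-3), and the partial integrals grow without bound (infinite variance); with alpha = 2.5 it decays like x^(-3.5) and they converge to alpha/(alpha - 2) = 5 (finite variance).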

------
andrewcooke
the [edit: _outer envelope of the_ ] walk should slowly deviate from zero,
shouldn't it (the variance of the sum increases with time / a random walk is a
_walk_ )? what i am seeing is returning to zero much more strongly than i
would have expected. what is the prng that you are using? i suspect it's not
that great.

~~~
benmaraschino
Symmetric random walks, including those with step sizes drawn from a Gaussian
with mean zero, have expectation 0 at any time t. Since there's no drift term,
neither walk should be expected to slowly deviate from zero.

~~~
thetwiceler
This is false. Think about it this way. A random walk is a martingale: your
expectation of where you'll be in the future is where you are now. So you're
right that, before you start the walk, it has expectation 0 at any time in the
future.

But the variance of the distribution of where you'll be at time _t_ is linear
in _t_. So say your variance is _v(t)_ = _t_. Then at _t_ = 1, there is a 32%
chance that you'll be outside the range (-1, 1). As you can see, as _t_
increases, you're expected to drift further and further.

So while the expectation of _x(t)_ may be 0 for all time, the expectation of
|_x(t)_| scales like _sqrt(t)_ (the standard deviation of the distribution).
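That sqrt(t) scaling is easy to check numerically; here is a quick simulation of simple +/-1 walks (my own sketch, not from the thread):

```python
import math
import random

random.seed(0)

T = 400       # steps per walk
WALKS = 5000  # independent walks

finals = []
for _ in range(WALKS):
    pos = 0
    for _ in range(T):
        pos += 1 if random.random() < 0.5 else -1
    finals.append(pos)

mean_pos = sum(finals) / WALKS
mean_abs = sum(abs(p) for p in finals) / WALKS
outside = sum(1 for p in finals if abs(p) > math.sqrt(T)) / WALKS

print(mean_pos)   # near 0: no drift, as the parent comment says
print(mean_abs)   # near sqrt(2*T/pi) ~ 16: typical distance grows like sqrt(t)
print(outside)    # roughly a third of walks end beyond one standard deviation
```

The mean stays at 0, but the mean *absolute* position comes out near sqrt(2T/pi), and roughly a third of walks end more than sqrt(T) = 20 steps from the origin, matching the 32% figure above.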

~~~
benmaraschino
Thanks for the clarification! I haven't studied stochastic processes in a
while, so the distinction was lost on me; it definitely makes more sense to
think of it as a martingale.

------
rypit
It would be awesome if these could be embedded in Wikipedia somehow.

~~~
ableal
My laptop's fan says that may not be advisable.

P.S. sometimes people overlook how a well-drawn illustration may be superior
to photos or video; think anatomical drawings, for instance. In this case I
feel the lack of two simple curves superimposed to illustrate the point about
the variance.

------
flatfilefan
I did not quite get the example with the bets. It is not clear what kind of
problem fat tails bring compared to a normal distribution, as long as the
distribution is symmetrical.

There is a term for dealing with actual distributions that differ from the one
you take as the basis of your theory:
[http://en.wikipedia.org/wiki/Robust_statistics](http://en.wikipedia.org/wiki/Robust_statistics)

Regression analysis, and particularly ANOVA, are sensitive to
[http://en.wikipedia.org/wiki/Heteroscedasticity](http://en.wikipedia.org/wiki/Heteroscedasticity)
but may be less sensitive to fat tails as long as the standard deviation is
independent of the mean.

~~~
eridius
I could be completely off here, but my understanding of the bet example was
that if you make bets assuming a normal distribution, the worst-case situation
you're prepared to handle is going to be different from the worst-case
situation that actually happens.
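That reading can be illustrated with a toy simulation (mine, not the article's): draw "bet outcomes" from a normal and from a fat-tailed distribution with the same standard deviation, and compare the worst outcome in each sample. The Student-t with 3 degrees of freedom is just one convenient fat-tailed stand-in.

```python
import math
import random

random.seed(1)
N = 100_000

def t3_unit_var():
    # Student-t with 3 degrees of freedom, rescaled to variance 1
    # (t_3 itself has variance 3): z / sqrt(chi2_3 / 3), divided by sqrt(3).
    z = random.gauss(0, 1)
    chi2 = sum(random.gauss(0, 1) ** 2 for _ in range(3))
    return (z / math.sqrt(chi2 / 3)) / math.sqrt(3)

normal_outcomes = [abs(random.gauss(0, 1)) for _ in range(N)]
fat_outcomes = [abs(t3_unit_var()) for _ in range(N)]

# Same standard deviation, very different worst cases:
print(max(normal_outcomes))  # worst normal outcome, typically around 4-5
print(max(fat_outcomes))     # worst fat-tailed outcome, typically far larger
```

If you budget reserves for the normal worst case, the fat-tailed worst case blows straight through them, even though both distributions "look the same" by mean and variance.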

------
gtani
[http://blogs.sas.com/content/iml/2013/06/10/compare-data-distributions/](http://blogs.sas.com/content/iml/2013/06/10/compare-data-distributions/)
is pretty good about box plots and histograms (static) and mentions Chambers'
visualization book (besides Cleveland's classic text).

------
RobinL
Nice use of d3 that really adds value. I'll use this as an example next time I
do time series training at work.

Could I ask what you've used to make the animation repeat? From a quick search
I noticed you aren't using setInterval() or d3.timer().

Are you just calling redraw as quickly as the CPU runs the code or am I
missing something?

~~~
lewis500

              .transition()
                .duration(dur)
                .ease("linear")
                .attr("transform", "translate(" + x2(-1) + ",0)")
                .each('end', plot1Anim); // re-run plot1Anim when done
    

plot1Anim gets called again when the transition ends, so the animation loops
itself without setInterval() or d3.timer().

~~~
RobinL
Ah yes, thank you!

------
johnwatson11218
I think the biggest fat tail distribution of all is the things that developers
are asked to do on a day to day basis. The fat tail is where frameworks break
down, where development processes and procedures break down. Most of the stuff
I do is stuff that I have never done before and will never do again.

~~~
johnwatson11218
oops - I was thinking of the "long tail" not the fat tail.

------
graycat
Sure, given a real-valued random variable, it has a cumulative distribution.
If that distribution is differentiable, then the random variable also has a
density. Fine. Alas, that doesn't mean that in practice we should try to find
what the distribution is!

Actually, in practice, it's generally difficult to know with much accuracy
what a distribution is. Then, for the OP, in practice it's much more difficult
to know much about the tails: whether they are _fat_ or not, and if fat, how
fat.

The OP wants to claim that the normal distribution applies to heights of
people. I can believe that this is only roughly true!

Actually, the usual way we come to a normal distribution is from the central
limit theorem (CLT); in practice we want something like the _mechanism_ of the
CLT to apply.

When do we get the CLT? Sure: if, for some positive integer n, our random
variable Y is the sum, divided by the square root of n, of n independent,
identically distributed (i.i.d.) _samples_ of some distribution with, say, a
mean and a finite variance.

If n is 12, then we can start to entertain normality, provided we don't want
accuracy in the tails. If we want high accuracy in the tails of the normal
distribution from the CLT, then I'd recommend some careful work, and otherwise
not trusting the accuracy.
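The n = 12 case is the old "add 12 uniforms and subtract 6" trick for generating approximately normal variates; a quick simulation (my own sketch) shows the bulk is fine while the tails are not:

```python
import random

random.seed(2)
N = 200_000

# Sum of 12 uniforms on (0, 1), minus 6: mean 0, variance 12 * (1/12) = 1,
# and approximately normal in the middle of the distribution.
samples = [sum(random.random() for _ in range(12)) - 6 for _ in range(N)]

inside_1sd = sum(1 for s in samples if abs(s) <= 1) / N
beyond_4sd = sum(1 for s in samples if abs(s) > 4) / N

print(inside_1sd)   # close to 0.683, as for a true normal
print(beyond_4sd)   # a true normal gives ~6.3e-5; this sum gives far less,
                    # and |s| > 6 is outright impossible
```

The central 68% region matches the normal well, but the 4-sigma tail is badly underweighted and the distribution is bounded at +/-6, so the tails from this CLT-style construction can't be trusted.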

A place where we have a better shot at getting accuracy in a tail is the
exponential distribution. The leading case: suppose we have a Geiger counter
that goes "click" when it detects a radioactive decay. If the click rate is
low, so that the chances of two or more decays producing only one click are
small, and the real random variable T is the time until the next click, then
in the usual situations in practice T will have quite an accurately
exponential distribution.
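That Geiger-counter setup is easy to simulate (my own sketch, under the stated low-rate assumption): detect a click in each tiny time slice with small probability, and check that the gaps between clicks show the exponential signature of mean equal to standard deviation.

```python
import random

random.seed(3)

# Low-rate detector: in each tiny time slice dt, a click occurs with small
# probability p, independently of all other slices.
dt, p = 0.01, 0.002          # rate ~ p/dt = 0.2 clicks per unit time
SLICES = 2_000_000

gaps, since_last = [], 0.0
for _ in range(SLICES):
    since_last += dt
    if random.random() < p:
        gaps.append(since_last)
        since_last = 0.0

mean_gap = sum(gaps) / len(gaps)
sd_gap = (sum((g - mean_gap) ** 2 for g in gaps) / len(gaps)) ** 0.5

# For an exponential distribution, mean and standard deviation are equal.
print(mean_gap, sd_gap)   # both roughly 1 / 0.2 = 5
```

The inter-click gaps are really geometric in the number of slices, but at low p per slice that is an excellent approximation to the exponential — which is the "usual situation in practice" being described.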

More generally, the _stochastic process_ of such clicks is a Poisson process,
an example of an _arrival_ process. With weak assumptions, for positive
integer n, the sum of n independent arrival processes approaches a Poisson
process as n approaches infinity. This result is the _renewal_ theorem, with a
proof in W. Feller's volume II.

An example of a use of the renewal theorem is arrivals at a Web site. For each
person on the planet there is an arrival process of that person at that site.
If we assume that the people are independent, then the Web site sees the sum
of those arrivals and should see, over intervals of time much shorter than one
day, a good approximation to a Poisson process.
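A rough numerical illustration of that superposition claim (my own sketch; the per-person visit schedules are invented): merge many individually non-Poisson arrival streams and look for the Poisson signature that the variance of counts per time window equals their mean.

```python
import random

random.seed(4)

PEOPLE, HORIZON = 500, 1000.0

# Each "person" visits on a rigid schedule with a random phase --
# individually about as non-Poisson as an arrival process can be.
arrivals = []
for _ in range(PEOPLE):
    period = random.uniform(50, 150)   # one visit every ~100 time units
    t = random.uniform(0, period)      # random phase
    while t < HORIZON:
        arrivals.append(t)
        t += period

# Count the merged arrivals in unit-length windows. For a Poisson process
# the counts are Poisson distributed, so their variance equals their mean.
counts = [0] * int(HORIZON)
for t in arrivals:
    counts[int(t)] += 1

mean_c = sum(counts) / len(counts)
var_c = sum((c - mean_c) ** 2 for c in counts) / len(counts)
print(mean_c, var_c)   # roughly equal
```

No single stream looks anything like Poisson, yet the merged counts already show variance close to the mean with only 500 streams — the renewal-theorem mechanism at work.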

So, that's some 'applied probability' where we might work with tails.

Mostly in applied probability, just f'get about accuracy in the tails!

------
cjfont
> Lewis Lehe is a PhD student in Transportation Engineering at UC Berkeley,
> and Victor Powell is a freelance developer/teacher.js.

Is this a typo, or is teacher.js a cute way of saying JavaScript teacher?

~~~
accountoftheday
This is a javascript based demo which teaches you something. Think about it.

------
Loadin
Any chance you feel like sharing the d3 code? Would love to see how this was
done.

~~~
sebg
right click and view source.

