
Stats on volcanic eruptions show pattern called Benford's Law - llambda
http://arstechnica.com/science/news/2012/04/fun-with-numbers-volcanoes-obey-benfords-law.ars
======
BenoitEssiambre
Benford's law is not odd if you are a believer in the Bayesian interpretation of
probability theory.

The shape of Benford's distribution actually follows the shape of the maximum
entropy, most uninformative distribution (the "flattest" distribution in an
information-theoretic sense) for a _magnitude_ value (a value that can only be
positive).

Examples are sizes, lengths and volumes: these quantities can't go negative.
Positions, by contrast, can go negative; the ignorance distribution for a
position is a flat horizontal line, and numbers representing a position
generally do not follow Benford's law.

It is not that unintuitive. Take street lengths. Assuming that they follow
Benford's law (a log prior) simply means that, for any street length L, if
you pick another random street, you are as likely to pick a street within the
length range (L/2 to L) as within (L to 2*L).
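The equal-likelihood claim can be written out as a short calculation. This is a sketch under the assumption that "log prior" means a density proportional to 1/x, and, for the digit formula, that the range covers a whole number of decades:

```latex
\[
P\!\left(\tfrac{L}{2} < x \le L\right) \;\propto\; \int_{L/2}^{L} \frac{dx}{x}
  \;=\; \ln 2 \;=\; \int_{L}^{2L} \frac{dx}{x}
  \;\propto\; P\!\left(L < x \le 2L\right)
\]
\[
P(\text{first digit} = d)
  \;=\; \frac{\int_{d}^{d+1} dx/x}{\int_{1}^{10} dx/x}
  \;=\; \log_{10}\!\left(1 + \frac{1}{d}\right),
  \qquad d = 1, \dots, 9
\]
```

For d = 1 this gives log10(2), about 30%, and for d = 9 about 4.6%.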

From the original street length, to get a street half as long, you need to
subtract much less than you would have to add to get a street twice as long.
That is why this distribution is not flat. Or rather, it is flat under
multiplication and division, not under addition and subtraction.
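One way to check "flat under multiplication" numerically is to draw values that are uniform in log space and compare the two ranges. This is only a sketch: the six-decade range, the seed, and the helper names (`log_uniform_samples`, `share_between`, `leading_digit`) are my assumptions, not anything from the article.

```python
import random
from collections import Counter
from math import log10


def leading_digit(x: float) -> int:
    """Return the most significant decimal digit of a positive, finite number."""
    if not (0 < x < float("inf")):
        raise ValueError("x must be positive and finite")
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)


def log_uniform_samples(n: int, decades: int = 6, seed: int = 0) -> list:
    """Draw n values between 1 and 10**decades, uniform in log10(x) rather than in x."""
    rng = random.Random(seed)
    return [10 ** rng.uniform(0, decades) for _ in range(n)]


def share_between(samples: list, lo: float, hi: float) -> float:
    """Fraction of samples falling in the interval (lo, hi]."""
    return sum(lo < s <= hi for s in samples) / len(samples)


if __name__ == "__main__":
    lengths = log_uniform_samples(200_000)
    L = 1000.0  # hypothetical reference street length
    print("share in (L/2, L]:", share_between(lengths, L / 2, L))
    print("share in (L, 2L]: ", share_between(lengths, L, 2 * L))

    counts = Counter(leading_digit(s) for s in lengths)
    for d in range(1, 10):
        # observed share next to the Benford value log10(1 + 1/d)
        print(d, round(counts[d] / len(lengths), 4), round(log10(1 + 1 / d), 4))
```

The two shares should come out nearly equal, and the first-digit shares should sit close to log10(1 + 1/d), up to sampling noise.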

If instead you assumed that a street X meters longer is as likely as a street
X meters shorter, you would end up with impossible probabilities. For example,
when picking a random street relative to a street of 1 km, a (1 to 3)
kilometer street would be as likely as a (-1 to 1) km street. And if you assumed
probabilities were equal for all lengths between 0 and infinity, that would
mean you think there are likely as many streets measuring a tredecillion
billion km long as there are streets 5 km long. This is simply not how things
are sized in the universe. Smaller things are in greater numbers. Log priors
are one of those areas where the math predicts the universe logically and the
universe mirrors the math beautifully, iff you do your calculations
properly (using Bayesian reasoning).

------
Lazare
Of course they do. Benford's Law is just an artifact of how counting works.

Let's say we are measuring widget production. Right now we're making 1 widget
per period. Each period we increase production by 10%. After a random number
of periods, what do we expect the MSD (most significant or left-most digit) to
be? After a random (large) number of periods, if we add up how frequently
different numbers show up as the MSD, what will the distribution be?

Well, let's think about it. It takes 8 periods before our production number
gets over two (and thus stops having an MSD of 1), and another 4 periods
before our production number gets over three (and thus stops having an MSD of
2). If we keep going, the results look like this:

    
    
      1 -> 8 periods
      2 -> 4 periods
      3 -> 3 periods
      4 -> 2 periods
      5 -> 2 periods
      6 -> 2 periods
      7 -> 1 period
      8 -> 2 periods
      9 -> 1 period
      1 -> 7 periods
      2 -> 4 periods
      3 -> 3 periods
      4 -> 3 periods
    

And so on. If we stop there, we have counted 42 periods. The most
common MSD is 1 (taking place 36% of the time), followed by 2 (19%), 3 (14%),
4, and so on. Benford's Law! Of course, you might say, that just shows up
because I chose to stop there.
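The widget tally is easy to reproduce. A minimal sketch, assuming production starts at exactly 1 and is multiplied by 1.10 each period; `msd_counts` and `leading_digit` are hypothetical helper names:

```python
from collections import Counter


def leading_digit(x: float) -> int:
    """Return the most significant decimal digit of a positive, finite number."""
    if not (0 < x < float("inf")):
        raise ValueError("x must be positive and finite")
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)


def msd_counts(periods: int, growth: float = 1.10, start: float = 1.0) -> Counter:
    """Count how many periods each leading digit is seen while production compounds."""
    counts = Counter()
    value = start
    for _ in range(periods):
        counts[leading_digit(value)] += 1
        value *= growth  # assumed model: 10% growth per period
    return counts


if __name__ == "__main__":
    for periods in (42, 5000):
        counts = msd_counts(periods)
        shares = {d: round(counts[d] / periods, 3) for d in range(1, 10)}
        print(periods, shares)
```

Over a few thousand periods the share of 1s settles near log10(2), about 30%, whatever the stopping point. Keep `periods` well below about 7000 with these defaults, since the float value eventually overflows.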

But actually, it works to a greater or lesser extent, pretty much regardless
of when we stop (or where we start). It works regardless of the units being
used. It also works if we're adding a flat amount instead of a percentage
increase, or if we're adding a random amount, or if the number is not
monotonically increasing. It also works in other bases. (In fact, thinking
about the example of binary may be illuminating. The MSD is _always_ 1 for any
number greater than 0.)
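For the point about other bases, the usual generalisation of Benford's law is P(d) = log_b(1 + 1/d) for d from 1 to b - 1. The sketch below just evaluates that formula; `benford_probabilities` is a hypothetical helper name:

```python
from math import log


def benford_probabilities(base: int = 10) -> dict:
    """Benford first-digit probabilities log_base(1 + 1/d) for d = 1 .. base - 1."""
    if base < 2:
        raise ValueError("base must be at least 2")
    return {d: log(1 + 1 / d, base) for d in range(1, base)}


if __name__ == "__main__":
    for base in (2, 8, 10, 16):
        probs = benford_probabilities(base)
        print(base, {d: round(p, 3) for d, p in probs.items()})
```

In base 2 the only possible leading digit is 1, so the formula gives probability 1 for it, which matches the binary remark above.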

It also works if we're generating random samples between 1 and some BIG_NUM.
If you generate 1000 numbers between 1 and, say, 2^24, you might find that
the MSD of 48% of those samples is 1, while no other digit is found as
the MSD in more than 7% of samples. Pick a different BIG_NUM, and you'll see a
different distribution. 2^20 gives you a much flatter distribution; in the test
I just ran I saw 16% with an MSD of 1, versus 9-12% for all other digits. But
the point is, lots of potential BIG_NUMs produce samples which skew heavily
towards having an MSD of 1, and thus a random BIG_NUM has a highly skewed
expected distribution.
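The BIG_NUM effect can also be computed exactly instead of sampled. A sketch, assuming the samples are integers drawn uniformly from 1 to BIG_NUM inclusive; `msd_shares` is a hypothetical helper name:

```python
def msd_shares(big_num: int) -> dict:
    """Exact share of integers in 1..big_num whose leading decimal digit is d."""
    counts = {d: 0 for d in range(1, 10)}
    power = 1
    while power <= big_num:
        for d in range(1, 10):
            lo = d * power                           # smallest number with this digit and length
            hi = min((d + 1) * power - 1, big_num)   # largest one, clipped at big_num
            if hi >= lo:
                counts[d] += hi - lo + 1
        power *= 10
    return {d: c / big_num for d, c in counts.items()}


if __name__ == "__main__":
    for exponent in (20, 24):
        shares = msd_shares(2 ** exponent)
        print(f"2^{exponent}", {d: round(s, 3) for d, s in shares.items()})
```

For 2^24 this gives a share of about 0.470 for a leading 1 and about 0.066 for each other digit; for 2^20 it gives about 0.152 and about 0.106. A 1000-sample run will scatter a point or two around those values.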

And so on. :) It's actually kind of fun to play with. But at the end of the
day, this is just a product of using a positional number system.

------
thebigshane
An excellent visual tour of Benford's Law:
<http://www.datagenetics.com/blog/march52012/index.html>

My understanding of this law is that any collection of real-world data/metrics
should follow this distribution pattern. There may be exceptions in which
there are other factors influencing the data, but _most_ should follow this
"law". In that case, I don't see what is surprising about volcanic eruptions
following this law. I would have assumed it already did (well, I suppose you
have to already know about this law first to assume that).

------
RockofStrength
Simple explanation for Benford's Law: When numbers expand out into an
additional digit through doublings, a "1" is always present, "2" is usually
present, and so on down the line. The law holds to the degree that the
sample set is logarithmically distributed.

The distribution doesn't have to be limited to doublings. The law also applies
to triplings, quadruplings, halvings, etc. The key is
logarithmic/exponential/geometric distribution. Try it for yourself on the
calculator and see how rarely 9s come up as the first digit for every x^n.
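Instead of a calculator, a few lines of Python can tally the first digits of x^n. A sketch using exact integer powers; `first_digit_counts` is a hypothetical helper name, and x should not be a power of 10 (those always lead with 1):

```python
from collections import Counter


def first_digit_counts(x: int, n_max: int) -> Counter:
    """Count the leading decimal digit of x**n for n = 1 .. n_max, using exact integers."""
    counts = Counter()
    value = 1
    for _ in range(n_max):
        value *= x
        counts[int(str(value)[0])] += 1
    return counts


if __name__ == "__main__":
    for x in (2, 3, 7):
        counts = first_digit_counts(x, 1000)
        print(x, [counts[d] for d in range(1, 10)])
```

With 1000 powers, roughly 30% lead with 1 and fewer than 6% lead with 9.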

A good real world example would be the distribution of frequencies of musical
pitch in equal temperament. In this case, the exponential multiplier (x) for
x^n is 2^(1/12), or about 1.059. Here's
(<http://en.wikipedia.org/wiki/Piano_key_frequencies>) a list of the
frequencies; notice how the first digits have tons of ones and almost no
nines.
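The piano example can be tallied directly. A sketch, assuming a standard 88-key piano tuned to A4 = 440 Hz, where key n has frequency 440 * 2^((n - 49) / 12); the function names are hypothetical:

```python
from collections import Counter


def leading_digit(x: float) -> int:
    """Return the most significant decimal digit of a positive, finite number."""
    if not (0 < x < float("inf")):
        raise ValueError("x must be positive and finite")
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)


def piano_frequencies(keys: int = 88, a4_key: int = 49, a4_hz: float = 440.0) -> list:
    """Equal-temperament frequencies in Hz for keys 1..keys (assumed A4 = 440 Hz)."""
    return [a4_hz * 2 ** ((n - a4_key) / 12) for n in range(1, keys + 1)]


if __name__ == "__main__":
    freqs = piano_frequencies()
    counts = Counter(leading_digit(f) for f in freqs)
    for d in range(1, 10):
        print(d, counts[d])
```

The range only spans a bit over two decades (27.5 Hz to about 4186 Hz), so the match to Benford's law is rough, but the excess of 1s over 9s is clear.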

------
waivej
Benford's Law seems pretty obvious once you think about it. Though I love its
application to "forensic accounting".

Do you think it could be used to judge the "truthiness" of media websites
online? I don't think it works with rounded numbers that are more likely to be
found in the news.

~~~
chimeracoder
> Benford's Law seems pretty obvious once you think about it.

Nate Silver actually did a bit of analysis on it around the time of the 2009
Iranian elections and the Al Franken-Norm Coleman recount. In short, he
explained that while it does hold in certain circumstances, it's not really a
'law' that can be applied very broadly, unfortunately, even if the idea makes
sense intuitively. For example, I believe he said that the Iranian ballots did
_not_ follow Benford's law, but that this was not necessarily very
problematic, because election results in general have no reason to follow
Benford's law.

(I don't have the link, but that should be enough to find it in the 538
archives for anyone who's interested).

~~~
waivej
Thanks for the tip! I think this is the link you were referring to:
<http://www.fivethirtyeight.com/2009/06/karroubis-unlucky-7s.html>

------
rollypolly
As a side note, the main photo in the article is stunning! I need to find a
desktop-size version.

~~~
celias
<http://earthobservatory.nasa.gov/IOTD/view.php?id=6592>

