
Why 30.1% of numbers start with 1 - yannis
http://www.dspguide.com/ch34/1.htm
======
jacquesm
Benford's law is instrumental in fraud detection; the Enron guys were wise to
it, but Bernie Madoff was not:

[http://falkenblog.blogspot.com/2008/12/benfords-law-
catches-...](http://falkenblog.blogspot.com/2008/12/benfords-law-catches-
madoff.html)

~~~
wooby
As a friend of mine pointed out, how is this possible? Shouldn't fraudulent
numbers also follow Benford's law?

~~~
schammy
Random data doesn't follow the law. Fraudulent tax returns are essentially
random numbers, and hence detectable.

~~~
shrughes
Uniformly distributed random data doesn't follow the law. Data whose logarithm
is uniformly distributed does.
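A quick sketch to see this (hypothetical Python with synthetic data): draw samples whose log10 is uniform over six orders of magnitude, then tally the leading digits against Benford's prediction.

```python
import math
import random
from collections import Counter

random.seed(1)

def leading_digit(x):
    """First significant digit of a positive number."""
    return int(x / 10 ** math.floor(math.log10(x)))

# Data whose logarithm is uniform: spread evenly over six orders of magnitude.
N = 100_000
samples = [10 ** random.uniform(0, 6) for _ in range(N)]

counts = Counter(leading_digit(x) for x in samples)
for d in range(1, 10):
    print(d, round(counts[d] / N, 3),
          "vs Benford", round(math.log10(1 + 1 / d), 3))
```

The observed frequencies land close to Benford's log10(1 + 1/d), about 30.1% for a leading 1.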

------
imurray
Suggested exercise:

Predict and then create a histogram of the leading digits of the file sizes of
the non-zero-length files on your computer.

[ _SPOILER:_ when I did this once I found sharp peaks around digits that
weren't 1. You are likely to see this if you have a large number of files
around a particular size, examples: a) whatever your digital camera typically
produces; b) whatever size your software encodes a typical song into. These
files violate the assumption that you are sampling sizes over a wide range of
sizes. After excluding these files I observed Benford's law quite closely on
the remainder.]
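One way to try the exercise (a rough Python sketch; walking the home directory is just one possible setup):

```python
import os

def leading_digit(n):
    """First digit of a positive integer."""
    while n >= 10:
        n //= 10
    return n

def size_histogram(root):
    """Counts of leading digits of non-empty file sizes under root."""
    counts = [0] * 10
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # broken symlinks and the like
            if size > 0:
                counts[leading_digit(size)] += 1
    return counts

# Print the histogram for your home directory as ASCII bars.
counts = size_histogram(os.path.expanduser("~"))
total = sum(counts) or 1
for d in range(1, 10):
    print(d, "*" * round(60 * counts[d] / total))
```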

~~~
spc476
Curious. I did so:

    
    
        1       ****************************
        2       ***************
        3       *********
        4       *****************
        5       *******
        6       ******
        7       *****
        8       ****
        9       ***
    

That spike for 4 is due to the default directory size of 4096 (my experiment
included directories as well as files). The information was pulled from
503,444 files and directories.

------
cruise02
Not "30.1% of numbers" start with 1, but 30.1% of numbers from distributions
that cover several orders of magnitude might.

<http://www.billthelizard.com/2009/04/benfords-law.html>

</pedantry>

~~~
palish
"Explaining Benford's Law" may have been a more accurate title, but "Why 30.1%
of numbers start with 1" brought this neat article to a larger audience. Taken
too far, though, that kind of thing can ruin social news sites. Headlines have
always been a gray area.

~~~
Confusion
Unfortunately, "Explaining Benford's Law", which is the actual title of the
article, is a complete misnomer, as the article doesn't actually explain the
first thing about Benford's law. It just restates Benford's law and gives
examples. There is no explanation or derivation at all.

~~~
FalcorTheDog
Click the "Next Section" link at the bottom to get way more
explanation/derivation than you could ever want.

~~~
Confusion
Ah, thanks. I mistakenly thought that linked to the next chapter (which
actually isn't there; this is the final chapter in the book).

------
gjm11
The supposed explanation here is not very good. It amounts to this: "To get
those first digits, you took the actual numbers and scaled them all by powers
of 10 to get values between 1 and 10. That's kinda logarithmic, and Benford's
law is kinda logarithmic, so it's no surprise that the results end up obeying
Benford's law. All you really need, kinda, is for the probability distribution
to span several powers of 10." This is hand-wavy and, not to put too fine a
point on it, wrong. For instance, suppose we generate random numbers uniformly
distributed between 1 and 1000000; they will not obey Benford's law.
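That counterexample is easy to check numerically (a sketch with synthetic data):

```python
import math
import random

random.seed(5)

def leading_digit(x):
    """First significant digit of a positive number."""
    return int(x / 10 ** math.floor(math.log10(x)))

# Uniform on [1, 1000000): spans six powers of 10, yet not Benford.
N = 100_000
samples = [random.uniform(1, 10 ** 6) for _ in range(N)]
ones = sum(1 for x in samples if leading_digit(x) == 1) / N
print(ones)  # ~0.111 (= 1/9), nowhere near Benford's 0.301
```

Each leading digit comes out near 1/9, because a uniform distribution puts equal mass on [100000, 200000) and [800000, 900000); spanning several powers of 10 is not by itself enough.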

The author also claims that looking at the Fourier transform of the
probability distribution is key to understanding what's going on. But the full
extent of his Fourier-based analysis is this: Consider the probability
distribution function for log_10(data). Then Benford's law holds if this is
constant (editorial note: it cannot in fact be constant) and holds roughly if
it's roughly constant. That happens, kinda, when the probability distribution
is very broad (editorial note: no, not really; see the example above). What,
you didn't see anything about Fourier transforms there? Well, that's because
the Fourier stuff is really almost all window-dressing.

For a brief account of Benford's law and related matters written by someone
with a better grasp of what's going on, you could turn to
[http://terrytao.wordpress.com/2009/07/03/benfords-law-
zipfs-...](http://terrytao.wordpress.com/2009/07/03/benfords-law-zipfs-law-
and-the-pareto-distribution/) whose author is one of the best mathematicians
currently living and also a very good expositor.

~~~
adamc
Thanks much for the link.

------
jgrahamc
Benford's Law and the Iranian election:
[http://www.jgc.org/blog/2009/06/benfords-law-and-iranian-
ele...](http://www.jgc.org/blog/2009/06/benfords-law-and-iranian-
election.html)

And the British MPs' expenses: [http://www.jgc.org/blog/2009/06/its-probably-
worth-testing-m...](http://www.jgc.org/blog/2009/06/its-probably-worth-
testing-mps.html)

And BBC executives' expenses: [http://www.jgc.org/blog/2009/06/running-
numbers-on-bbc-execu...](http://www.jgc.org/blog/2009/06/running-numbers-on-
bbc-executives.html)

~~~
joss82
But as the article says, if you add a certain percentage of votes to one
candidate, you don't actually break Benford's law.

I wonder why this law is used for detecting fraud...
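Indeed, multiplying every count by a constant factor leaves the leading-digit distribution intact, because it only shifts all the logarithms by the same amount. A sketch with synthetic, Benford-like "vote counts" (purely illustrative):

```python
import math
import random

random.seed(2)

def leading_digit(x):
    """First significant digit of a positive number."""
    return int(x / 10 ** math.floor(math.log10(x)))

def frac_leading_one(xs):
    return sum(1 for x in xs if leading_digit(x) == 1) / len(xs)

# Synthetic, roughly Benford "vote counts": log-uniform over six decades.
votes = [10 ** random.uniform(0, 6) for _ in range(100_000)]
inflated = [v * 1.23 for v in votes]  # pad every count by 23%

print(frac_leading_one(votes))     # ~0.301
print(frac_leading_one(inflated))  # still ~0.301: scaling preserves the law
```

The tests in the linked posts therefore target fabricated or hand-edited figures, which tend not to be Benford in the first place, rather than proportional padding.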

~~~
jgrahamc
It's not clear that it should be used for elections. See for example,
[http://www.jgc.org/blog/2009/06/does-benfords-law-apply-
to-e...](http://www.jgc.org/blog/2009/06/does-benfords-law-apply-to-
election.html)

------
DaniFong
Benford's law is not universal, in the same sense that the Pareto and Gaussian
distributions are not universal: blatantly so, while at the same time people
believe in them and treat them as universal -- often several different
"universal" distributions by the same people!

~~~
tel
Maybe they're not universal, but each is a strong attractor of models given a
relatively small number of commonly satisfied assumptions.

In particular: dimensioned measurements must represent the same relationships
between data under many different scales, thus leading to logarithmic sampling.

Without further information to suggest other trends, these laws are great
starting points.

~~~
DaniFong
If by must you mean often tend to, and by relationships you mean ratios, then
I agree; it's a very useful model in the absence of other information.
Wonderful, in fact.

Still, sociologically, working scientists tend to believe in these models as
if they were hard rules, to the extent of constructing bridges, rockets,
nuclear reactors, nationwide health recommendations and global financial
systems without fundamentally understanding why each distribution might arise,
and why it might fail to explain real phenomena.

------
jparise
There's a really good segment on Benford's Law (and how it's been applied to
business fraud detection) in a recent Radio Lab episode called "Numbers":

<http://blogs.wnyc.org/radiolab/2009/11/30/numbers/>

------
Fixnum
Without taking the time to read the rest of the book, I am suspicious of the
author's claims that he has really solved the problem. Consider:

\- Wolfram Mathworld references this topic and claims Benford's Law was put on
a rigorous footing in 1998 (<http://mathworld.wolfram.com/BenfordsLaw.html>),
but this is _not even mentioned_ in the book.

\- the author makes "straw-man" type claims that (unnamed) prominent
mathematicians view Benford's law as "paranormal". Also see the last two
paragraphs on the first page of the original article, where the author
dismisses the idea of a "universal distribution", which is used at Mathworld
to give a heuristic derivation (suppose some rule governs this distribution ->
apply scale invariance -> derive needed properties) - it seems like he
misunderstood this.

\- the author claims that he is the first to have solved this mystery, but
doesn't reference any literature since 1976.

\- he claims on a blog (<http://www.dsprelated.com/showarticle/55.php>) that
he tried to publish in journals, but was rejected because mathematicians
weren't interested. He then published in a textbook, not even in some sort of
paper. His "proof" is really long, uses very elementary mathematics and
unnecessary computer programs, and refers back to other parts of his book (so
that I don't want to actually try to parse the whole thing and see if I
believe it).

Can anyone confirm or deny my suspicions?

That said, Benford's Law is really cool.

~~~
brg
After reading the chapter and skimming Hill's paper, I think I can deny
your suspicions.

Ted Hill's papers on Benford's law are from 1995, and this chapter is from
1997. Wolfram's date is incorrect, although the secondary phenomenon (that
samples taken from distributions which are themselves chosen at random follow
the logarithmic distribution) was proved at that later time.

You misunderstood the 'straw-man'. The author points out in the conclusion of
that paragraph that all of these pseudo-scientific or grandiose explanations
were nonsense.

The 'proof' is really an explanation that the phenomenon lends itself to
easier analysis when viewed in terms of Fourier transforms. The computer
program is there to show the reader the 'repeatedly divide by ten' action
which is implicitly going on when we map from the unbounded domain to a small
bounded domain.

It's a nice explanation, really, but since the problem was solved years
earlier and this analysis isn't presented with a new application, I can see
why it wasn't accepted for publication in a pure math journal.

~~~
Fixnum
Thanks!

------
orblivion
OK, so let's say you have a data set consisting of items that tend to cap at
2000. Already, about half of all possible numbers begin with 1.

I think 1 is special, in any base, because it's the first digit used when a
new digit gets added. If you're talking about quantities that vary easily by,
say, thousands, once it crosses the 10,000 threshold the first digit changes
much more slowly.

There's more to think about here for sure, though.
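The cap-at-2000 intuition is easy to check by brute force (a small sketch):

```python
def leading_digit(n):
    """First digit of a positive integer."""
    while n >= 10:
        n //= 10
    return n

# Count how many integers in 1..2000 start with a 1:
# 1, 10-19, 100-199, and 1000-1999, i.e. 1 + 10 + 100 + 1000 = 1111.
count = sum(1 for n in range(1, 2001) if leading_digit(n) == 1)
print(count, count / 2000)  # 1111, about 55.6%
```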

~~~
jacquesm
Think of binary: Any number will start with a '1', except for '0'!

~~~
eagleal
Only if you're using decimal notation for binary systems (think of the 1s and
0s of computers).

[Not regarding your comment or this response]

As I have understood, the Law applies only to a logarithmic scale [1 to 2, 2
to 3, 3 to 4, ... to ...]. Look at this pattern graph:

<http://en.wikipedia.org/wiki/File:BenfordDensities.png>

~~~
jacquesm
It works in any base; you can extend Benford's law to other bases easily.

Try it on a set of numbers in base 10, then convert them to base 16 and check
the percentages you get. They still follow the same pattern, but of course
there are more slots and the individual percentages are lower because of that.
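A sketch of that experiment with synthetic data (illustrative only): generate roughly Benford base-10 data, then compare leading-digit frequencies in bases 10 and 16 against the generalized prediction log_b(1 + 1/d).

```python
import math
import random

random.seed(3)

def leading_digit(x, base=10):
    """First digit of x when written in the given base."""
    exp = math.floor(math.log(x) / math.log(base))
    return int(x / base ** exp)

# Roughly Benford in base 10: log-uniform over six decades.
N = 100_000
data = [10 ** random.uniform(0, 6) for _ in range(N)]

results = {}
for base in (10, 16):
    ones = sum(1 for x in data if leading_digit(x, base) == 1) / N
    results[base] = ones
    print(f"base {base}: {ones:.3f} of samples lead with 1 "
          f"(Benford predicts {math.log(2, base):.3f})")
```

In base 16 the leading-1 share drops to about log16(2) = 0.25: the same pattern, spread over more slots.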

------
roundsquare
_For example, the histogram in Fig. 34-2b was generated by taking a large
number of samples from a computer random number generator. These particular
numbers follow a normal distribution with a mean of five and a standard
deviation of three_

Eh? Shouldn't this be a uniform distribution?

~~~
jgrahamc
Yes. It's amusing how this is a prime example of a person 'seeing' a normal
distribution where there is none.

~~~
wjy
The author refers to the distribution of _values_ in the text. That graph is
the distribution of _leading digits_ of those values. It doesn't follow that
the first digits would also be a normal distribution.

------
csallen
34% of the numbers in this article start with 1.

------
crocowhile
Some time ago someone posted this link on HN: <http://www.arandomnumber.com/>

I wonder whether the number choices will follow Benford's Law.

~~~
ahlatimer
I have a feeling that would actually show a Gaussian distribution, perhaps
with a peak in the lower digits (1-10 or so). So you would see more numbers
start with a 4 or 5 than a 1, or it would at least lack the logarithmic
distribution that Benford's law has.

------
jkincaid
This is awesome. Also, I can't help but wonder if this could be used in some
sort of bar trick to impress members of the opposite sex.

~~~
fleaflicker
Probably not, but the birthday problem is always a crowd pleaser:

<http://en.wikipedia.org/wiki/Birthday_problem>

------
scythe
Simple trick: come up with "random" numbers, from your own head. Make a list
of them. They're not random, but part of the charm here is noticing that even
you're affected by Benford's Law.

Now, pair them off, in order, and write down the products: you'll notice a
large number of them start with 1!

(of course you'll actively avoid this if you're expecting it)
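Human-picked numbers are hard to simulate, but the effect of multiplying shows up even with plain uniform factors (a hedged sketch; the uniform draws here stand in for the numbers from your head):

```python
import math
import random

random.seed(4)

def leading_digit(x):
    """First significant digit of a positive number."""
    return int(x / 10 ** math.floor(math.log10(x)))

def frac_leading_one(n_factors, trials=50_000):
    """Fraction of products of n_factors uniform draws that start with 1."""
    ones = 0
    for _ in range(trials):
        p = 1.0
        for _ in range(n_factors):
            p *= random.uniform(1, 10)  # the factors themselves are not Benford
        if leading_digit(p) == 1:
            ones += 1
    return ones / trials

for k in (1, 2, 5, 10):
    print(k, "factors:", frac_leading_one(k))
```

A single uniform factor gives about 1/9; as you multiply more factors together, the leading-1 share converges toward Benford's 30.1%, because the log of the product spreads out and becomes nearly uniform mod 1.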

------
anigbrowl
Though a fun article, this post serves better as an introduction to the book
from which it is an extract (already posted today:
<http://news.ycombinator.com/item?id=1076122>)

------
leif
There's a Radiolab show (called "Numbers") where they talk about this. Quite
cool, though they don't go into the reason behind it, which is a little
unsatisfying.

------
papaf
Apparently, Benford rediscovered the law that Simon Newcomb wrote about in
1881:

<http://www.jstor.org/pss/2369148>

~~~
lmkg
It happens.

<http://en.wikipedia.org/wiki/Stiglers_law_of_eponymy>

~~~
gjm11
You're missing an apostrophe. Will HN linkify this correctly?
<http://en.wikipedia.org/wiki/Stiglers_law_of_eponymy> [EDIT: no, it won't,
and I suspect the parent included the apostrophe and HN removed it. So let's
try a different way: <http://en.wikipedia.org/wiki/Stigler%27s_law_of_eponymy>
... yup, that works.]

It's pleasing to note that the law applies to itself; Stigler was not its
discoverer, and of course he was well aware of this when naming it.

------
vlisivka
Linear data can be measured on a logarithmic scale (..., 0.1, 1, 10, 100, ...).

If you have seen a logarithmic scale ( <http://www.ieer.org/log.gif> ), you
know that the distance between 1 and 2 on it is much wider than the distance
between 2 and 3, and so on. So it's no surprise that if you feed linear data
through a logarithmic (non-linear) filter, the numbers will follow the pattern
of the logarithmic scale.
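Those widths on the log scale are exactly the Benford probabilities; a one-line check:

```python
import math

# Width of the [d, d+1) segment on a log10 scale equals Benford's P(d).
for d in range(1, 10):
    print(d, round(math.log10(d + 1) - math.log10(d), 3))
# d=1 gets 0.301 of each decade of the scale; d=9 gets only 0.046.
```

The nine widths sum to 1, since together they cover one full decade.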

------
TotlolRon
"If the tool you have is a hammer, make the problem look like a nail."

------
zephjc
silt settles at the bottom of a pond.

