
Benford’s law, Zipf’s law, and the Pareto distribution (2009) - dpflan
https://terrytao.wordpress.com/2009/07/03/benfords-law-zipfs-law-and-the-pareto-distribution/
======
ccvannorman
>>Being empirically observed phenomena rather than abstract mathematical
facts, Benford’s law, Zipf’s law, and the Pareto distribution cannot be
“proved” the same way a mathematical theorem can be proved. However, one can
still support these laws mathematically in a number of ways, for instance
showing how these laws are compatible with each other, and with other
plausible hypotheses on the source of the data.

Interesting that some of the most reliable and interconnected emergent
behaviors observed in mathematics do not have an actual hard-and-fast proof,
but merely a lot of "evidence".

Is there such a proof? If so, would it be enormous, like the "Classification
of Finite Simple Groups" or the "Erdős discrepancy problem"?

Or are such things purely outside the range of mathematical language, but can
only be talked about in terms of observational data?

~~~
Terr_
I'm pretty sure someone could (and has) proven Benford's law as an outgrowth
of (A) random probability and (B) the design of our number systems.

I mean, it's basically like throwing darts at logarithmic graph paper: You're
more likely to hit a "ones" box than a "nines" box.
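The darts analogy can be checked with a quick simulation (a sketch of my own, not from the thread): sample numbers whose log10 is uniform over several orders of magnitude, then count leading digits and compare to Benford's predicted frequencies.

```python
import math
import random

# Illustrative sketch: "throwing darts at logarithmic graph paper".
# If log10(x) is uniform, the leading digit d of x should follow
# Benford's law: P(d) = log10(1 + 1/d).
random.seed(0)
N = 100_000
counts = [0] * 10
for _ in range(N):
    x = 10 ** random.uniform(0, 6)       # log10(x) uniform on [0, 6)
    d = int(10 ** (math.log10(x) % 1))   # leading digit = integer part of mantissa
    counts[d] += 1

for d in range(1, 10):
    print(f"{d}: observed {counts[d] / N:.3f}  Benford {math.log10(1 + 1 / d):.3f}")
```

About 30% of the samples lead with a 1 and under 5% with a 9, matching log10(2) ≈ 0.301 and log10(10/9) ≈ 0.046.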

~~~
BenoitEssiambre
Exactly. A reasonable uninformative prior for scale parameters is the Jeffreys
prior 1/x, equivalent to log(x) being flat. These patterns arise from this
fundamental prior.

In the same way that lack of knowledge about some phenomena can be described
by a flat probability distribution, making every point as likely as any other,
other phenomena are better described by log(x) being flat.

For the first case of flat x, it means that adding a constant to the variable
or subtracting one from it doesn't affect the probability.

For flat log(x), it means multiplying or dividing by a constant always has the
same effect: something twice as big is always twice as unlikely, something
half as big twice as likely, no matter your starting point.
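This scale invariance is easy to verify numerically (again my own sketch): under a density proportional to 1/x, the probability mass between a and 2a works out to ln 2 for every a, so "doubling" is equally surprising at any scale.

```python
import math

def mass(a, b, n=100_000):
    """Trapezoidal integration of 1/x over [a, b]."""
    h = (b - a) / n
    interior = sum(1.0 / (a + i * h) for i in range(1, n))
    return h * (interior + (1.0 / a + 1.0 / b) / 2)

# The mass on [a, 2a] is ln(2) ~ 0.693 regardless of a: the 1/x
# density assigns the same weight to every doubling of scale.
for a in (0.001, 1.0, 1_000_000.0):
    print(f"a = {a:g}: mass on [a, 2a] = {mass(a, 2 * a):.6f}")
```

Each line prints approximately 0.693147, which is exactly what the closed form gives: the integral of dx/x from a to 2a is ln(2a) − ln(a) = ln 2.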

For "scale" parameters that can't go into the negatives (you can't have a
negative size) the flat log(x) probability distribution is very natural to
represent lack of knowledge. I think it is the maximum entropy distribution
under reasonable assumptions. For "location" parameters that go from -∞ to ∞,
a flat x "Uniform distribution" is more natural. Few people question the flat
x prior. It seems somehow much more intuitive.

Note that these are "improper" distributions that don't integrate to 1 but to
infinity.
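The improperness follows directly from the integral (my own illustration): the total mass of 1/x on [eps, M] is ln(M/eps), which grows without bound as the interval widens, so no normalizing constant can make it sum to 1.

```python
import math

# Sketch: the total mass of p(x) = 1/x on [eps, M] is
# ln(M) - ln(eps) = ln(M / eps). It diverges as eps -> 0 or
# M -> infinity, which is why the prior is called "improper".
for eps, M in ((1e-3, 1e3), (1e-6, 1e6), (1e-12, 1e12)):
    print(f"[{eps:g}, {M:g}]: mass = {math.log(M / eps):.3f}")
```

Each widening of the interval by a factor of 10 on both ends adds another 2·ln(10) of mass, with no limit.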

