Benford’s law, Zipf’s law, and the Pareto distribution (2009) (terrytao.wordpress.com)
30 points by dpflan on Jan 9, 2016 | hide | past | favorite | 4 comments



>>Being empirically observed phenomena rather than abstract mathematical facts, Benford’s law, Zipf’s law, and the Pareto distribution cannot be “proved” the same way a mathematical theorem can be proved. However, one can still support these laws mathematically in a number of ways, for instance showing how these laws are compatible with each other, and with other plausible hypotheses on the source of the data.

Interesting that some of the most reliable and interconnected emergent behaviors observed in mathematics do not have an actual hard-and-fast proof, but merely a lot of "evidence".

Is there such a proof? If so, would it be enormous, like the classification of finite simple groups or the Erdős discrepancy problem?

Or are such things purely outside the range of mathematical language, so that they can only be talked about in terms of observational data?


I'm pretty sure someone could prove (and someone has proven) Benford's law as an outgrowth of (A) random probability and (B) the design of our number system.

I mean, it's basically like throwing darts at logarithmic graph paper: You're more likely to hit a "ones" box than a "nines" box.
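A quick simulation of that dartboard picture (a sketch, assuming samples spread uniformly in log space over several orders of magnitude): tally the leading digits and compare against Benford's prediction log10(1 + 1/d).

```python
import math
import random

def leading_digit(x: float) -> int:
    """Return the most significant decimal digit of a positive number."""
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

random.seed(0)
# Uniform in log space over 6 decades -- the "darts on logarithmic
# graph paper" picture from the comment above.
samples = [10 ** random.uniform(0, 6) for _ in range(100_000)]

counts = [0] * 10
for x in samples:
    counts[leading_digit(x)] += 1

for d in range(1, 10):
    observed = counts[d] / len(samples)
    benford = math.log10(1 + 1 / d)
    print(f"digit {d}: observed {observed:.3f}, Benford predicts {benford:.3f}")
```

The observed frequencies land within sampling noise of Benford's values: roughly 30% of darts hit a "ones" box, under 5% a "nines" box.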


Exactly. A reasonable uninformative prior for scale parameters is the Jeffreys prior 1/x, which is equivalent to log(x) being flat. These patterns arise from this fundamental prior.

Just as lack of knowledge about some phenomena can be described by a flat probability distribution, making every point as likely as any other, other phenomena are better described by log(x) being flat.

For the first case, flat in x, it means that adding or subtracting a constant to the variable doesn't affect the probability.

For flat log(x), it means multiplying or dividing by a constant always has the same effect. For example, something twice as big is always half as likely, and something half as big twice as likely, no matter your starting point.
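That scale invariance can be checked directly (a minimal sketch, assuming density p(x) ∝ 1/x): the mass between a and b is ∫ dx/x = log(b/a), which depends only on the ratio b/a, so every doubling interval carries the same weight.

```python
import math

def log_mass(a: float, b: float) -> float:
    # Mass of the 1/x density on [a, b]: integral of dx/x = log(b/a).
    return math.log(b / a)

# Every interval [a, 2a] gets the same mass, log(2), whatever a is.
for a in (0.001, 1.0, 1000.0):
    print(f"mass on [{a}, {2 * a}] = {log_mass(a, 2 * a):.6f}")
```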

For "scale" parameters that can't go into the negatives (you can't have a negative size) the flat log(x) probability distribution is very natural to represent lack of knowledge. I think it is the maximum entropy distribution under reasonable assumptions. For "location" parameters that go from -∞ to ∞, a flat x "Uniform distribution" is more natural. Few people question the flat x prior. It seems somehow much more intuitive.

Note that these are "improper" distributions that don't integrate to 1 but to infinity.
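The improperness is easy to see numerically (a sketch under the same p(x) ∝ 1/x assumption): each extra decade of range adds the same log(10) of mass, so the total grows without bound instead of converging to 1.

```python
import math

def mass_up_to(decades: int) -> float:
    # Integral of dx/x from 1 to 10**decades = log(10**decades)
    # = decades * log(10): linear in the number of decades covered.
    return decades * math.log(10)

for d in (1, 10, 100, 1000):
    print(f"mass of 1/x from 1 to 10^{d}: {mass_up_to(d):.2f}")
```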


You can't actually get a causal proof though, right? I mean, we can prove that probability should produce a phenomenon identical to Benford's law, but maybe its actually happening is just a fluke! A few trillion pieces of evidence isn't proof, after all.



