
Ok, so let's say you have a data set whose values tend to cap at 2000. Already, just over half of all possible numbers (1,111 of the integers from 1 to 2000) begin with 1.
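Here is a rough sketch for checking that count by brute force. The helper name `leading_one_count` and the extra caps 999 and 9999 are my own picks for illustration; they show how much the share of leading 1s depends on where the data happens to cap.

```python
def leading_one_count(cap):
    """Count the integers in 1..cap whose decimal form starts with '1'."""
    return sum(1 for n in range(1, cap + 1) if str(n)[0] == "1")

# Caps chosen for illustration only: just below a power of ten,
# the 2000 cap from the comment, and just below the next power of ten.
for cap in (999, 2000, 9999):
    hits = leading_one_count(cap)
    print(f"1..{cap}: {hits} of {cap} start with 1")
```

With a cap of 2000 the leading 1s dominate, while a cap just below a power of ten brings the share back down to about one in nine.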

I think 1 is special, in any base, because it's the first digit used when a new digit gets added. If you're talking about quantities that vary easily by, say, thousands, then once a value crosses the 10,000 threshold its first digit changes much more slowly.

There's more to think about here for sure, though.

Suppose you vary the unit of measure. If the unit of measure is itself drawn from a Benford's law distribution, then the leading digit of any value you give me will follow Benford's law.
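A small simulation can make the unit-of-measure argument concrete. This is only a sketch under my own assumptions: I model "a unit picked from the Benford distribution" as a scale factor `10 ** U` with `U` uniform on [0, 1], and the fixed value 7.3, the trial count, and the seed are arbitrary.

```python
import math
import random

def leading_digit(x):
    """First significant decimal digit of a positive float."""
    # Scientific notation always starts with the leading digit, e.g. '7.3e+01'.
    return int(f"{x:.15e}"[0])

def digit_shares(value, trials=100_000, seed=0):
    """Share of each leading digit when `value` is rescaled by a random unit."""
    rng = random.Random(seed)
    counts = [0] * 10
    for _ in range(trials):
        # Assumption: a Benford-distributed unit is a log-uniform scale factor.
        scale = 10 ** rng.uniform(0.0, 1.0)
        counts[leading_digit(value * scale)] += 1
    return [c / trials for c in counts]

observed = digit_shares(7.3)  # 7.3 is an arbitrary fixed measurement
for d in range(1, 10):
    expected = math.log10(1 + 1 / d)  # Benford's law for digit d
    print(d, round(observed[d], 3), round(expected, 3))
```

The fixed value never changes; only the unit does, yet the observed shares should land close to the Benford column.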

Think of binary: Any number will start with a '1', except for '0'!

Only if you're using a decimal notation for binary systems (think of 1 and 0 of computers).

[Not regarding your comment or this response]

As I understand it, the law applies only on a logarithmic scale [1 to 2, 2 to 3, 3 to 4, ... to ...]. Look at this pattern graph:


It works in any base; you can extend Benford's law to other bases easily.

Try it on a set of numbers in base 10, then convert them to base 16 and check the percentages you get. They still follow the same pattern, but of course there are more slots and the individual percentages are lower because of that.
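If you want to try that experiment, here is a minimal sketch. The data set is an assumption on my part: I use the first 1000 powers of 3 as a stand-in for "a set of numbers", and I compare against the base-b form of Benford's law, log_b(1 + 1/d). The function names are made up for this example.

```python
import math

def leading_digit(n, base):
    """Most significant digit of a positive integer n written in `base`."""
    while n >= base:
        n //= base
    return n

def shares(numbers, base):
    """Share of each leading digit (index = digit) in the given base."""
    counts = [0] * base
    for n in numbers:
        counts[leading_digit(n, base)] += 1
    return [c / len(numbers) for c in counts]

# Stand-in data set (assumption): powers of 3, which spread over many magnitudes.
numbers = [3 ** k for k in range(1, 1001)]

for base in (10, 16):
    observed = shares(numbers, base)
    print(f"base {base}")
    for d in range(1, base):
        expected = math.log(1 + 1 / d, base)  # Benford's law in this base
        print(f"  digit {d:x}: observed {observed[d]:.3f}, expected {expected:.3f}")
```

I picked powers of 3 rather than powers of 2 on purpose: 16 is itself a power of 2, so powers of 2 only ever start with 1, 2, 4, or 8 in base 16 and would not show the pattern.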

That doesn't explain base invariance.

Definitely does. Whether I'm switching from 9 to 10 or 7 to 10, there's a border. From 0 to 17 in base 8, about half of the numbers (9 of 16) still begin with 1.
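The same border can be counted in a few bases at once. This is just a sketch: the helper names are hypothetical, and the base 10 and base 16 rows are my own additions next to the base 8 case from the comment. Each range runs from 0 up to the largest two-digit number that starts with 1.

```python
def leading_digit(n, base):
    """Most significant digit of a non-negative integer n written in `base`."""
    while n >= base:
        n //= base
    return n

def count_leading_one(upper, base):
    """How many integers in 0..upper start with the digit 1 in `base`."""
    return sum(1 for n in range(upper + 1) if leading_digit(n, base) == 1)

# Upper bounds written in their own base: 17 (octal), 19 (decimal), 1F (hex).
for text, base in (("17", 8), ("19", 10), ("1F", 16)):
    upper = int(text, base)
    hits = count_leading_one(upper, base)
    print(f"base {base}: {hits} of {upper + 1} numbers from 0 to {text} start with 1")
```

In each base the count comes out to one more than half of the range, since the single-digit 1 joins the whole block of two-digit numbers that begin with 1.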
