The context is randomly generated passwords, so dictionary attacks (or other attacks that look at the plaintext from a Huffman encoding perspective) aren't really relevant.
A 10 character password (if randomly chosen from the same character set) has about 10^17 possible combinations (roughly 4,000x more) and 59.4 bits of entropy, 11.8 bits more; 2^11.8 ≈ 3,600.
In the context of randomly generated passwords, it's absolutely ok to think about it in terms of the logarithmic relationship between 1) entropy per symbol times number of symbols and 2) strength of the password.
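For anyone who wants to sanity-check those numbers, here's a quick sketch (assuming a 62-character alphanumeric set, which is roughly what the figures above imply):

```python
from math import log2

CHARSET = 62  # assumption: lower + upper + digits, roughly what the numbers above imply
bits_per_char = log2(CHARSET)  # ~5.95 bits per symbol

for length in (8, 10):
    print(f"{length} chars: {CHARSET**length:.2e} combinations, "
          f"{length * bits_per_char:.1f} bits")

# Two extra characters add ~11.9 bits, i.e. 62^2 = 3,844 times as many
# combinations -- the "about 4,000x" figure.
print(f"ratio: {CHARSET**10 // CHARSET**8:,}")
```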
He said 10% stronger (which I took to mean 10% more entropy), not 10% more time to crack.
> He said 10% stronger (which I took to mean 10% more entropy), not 10% more time to crack.
Hence the problem?
Yes, measuring "strength" by "bits of entropy" is technically correct (the best kind of correct...).
It's also exponentially misleading... possibly the worst kind of misleading?
Just look at the question: "Is there even a reason to include special characters in passwords? They add 10% more to security...". I don't know about you, but to me that doesn't really portray an understanding of the fact that, at merely 8 characters, such a password takes about twenty-five times longer to crack, not merely 10% longer.
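To make that concrete: with an assumed baseline of 62 alphanumerics and an extended set of ~94 printable ASCII characters, the entropy gain is about 10% while the brute-force work grows by a factor in the high twenties, the same ballpark as the "twenty-five times" above.

```python
from math import log2

LENGTH = 8
BASE, EXTENDED = 62, 94  # assumed character sets: alphanumerics vs. printable ASCII

base_bits = LENGTH * log2(BASE)      # ~47.6 bits
ext_bits = LENGTH * log2(EXTENDED)   # ~52.4 bits

print(f"entropy gain: {ext_bits / base_bits - 1:.1%}")      # ~10% more bits
print(f"work factor:  {(EXTENDED / BASE) ** LENGTH:.0f}x")  # ~28x more guesses
```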
I mean, counting in entropy, with the knowledge that the relationship to cracking effort is logarithmic, is the standard way of discussing such matters. It's sort of the basis for the information theory that underlies this type of work.
Edit: And the point of his argument is that more symbols drawn from a smaller alphabet can be equivalent if the total entropy is equivalent.
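A rough sketch of that equivalence (the 94-character printable set as the target and the candidate alphabets are just illustrative assumptions):

```python
from math import ceil, log2

target_bits = 8 * log2(94)  # an 8-char password over ~94 printable characters
for alphabet in (26, 36, 62):  # lowercase; lowercase+digits; alphanumerics
    needed = ceil(target_bits / log2(alphabet))
    print(f"{alphabet}-symbol alphabet: {needed} characters reach {target_bits:.1f} bits")
```

So a 12-character lowercase-only password already carries more entropy than an 8-character password drawn from the full printable set.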
I randomly generated an 8 character alphabetical (all lower case) password "jraxxhwr". According to keepass it has 32 bits of entropy, but the entropy should be log2(26^8) ≈ 37.6 bits, because the search space is all 8 character lowercase strings. There's no way you can reduce the search space from 37.6 bits to 32 bits unless you have an oracle that says which characters I used.
It does make sense, because the keepass entropy estimate presumably (like the excellent zxcvbn) tries to approximate the empirical distribution, not the theoretical uniform one.
In theory, "68703649" and "12345678" are equally likely to be pulled from the hat, but in practice one is a much better password than the other. You can reduce the search space by trying the passwords with higher (empirical) probability first.
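A toy illustration of "higher-probability candidates first" (the candidate list here is made up; real estimators like zxcvbn use much richer models with dictionaries, keyboard patterns, dates, and so on):

```python
from itertools import product

def guesses(length=8):
    """Yield candidates: a tiny 'common passwords' list first, then plain brute force."""
    common = ["12345678", "11111111", "87654321", "12341234"]  # assumed toy dictionary
    yield from common
    for tup in product("0123456789", repeat=length):
        pw = "".join(tup)
        if pw not in common:
            yield pw

for rank, pw in enumerate(guesses(), start=1):
    if pw == "12345678":
        print("12345678 cracked at guess", rank)  # guess 1
        break
# "68703649" only shows up deep in the brute-force tail (around guess 68.7 million).
```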
Thanks. I've looked at the code, and it does not seem to try to estimate the empirical distribution (it doesn't appear to be using dictionaries, for example).
Then maybe the discrepancy comes from the number of glyphs within certain categories, or from their repetition?
That's not how this works. By your logic having a password consisting of 1,2,3,4 is only twice as secure as having just 1,2.
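For what it's worth, a quick check of that example (assuming each position is one of the 10 digits): doubling the length doubles the bits but squares the number of possibilities.

```python
from math import log2

for length in (2, 4):
    print(f"{length} digits: {10**length:>6} possibilities, {length * log2(10):.1f} bits")

# 4 digits has exactly twice the bits of 2 digits, but 100x the possibilities --
# which is why "bits of entropy" and "how long it takes to crack" scale so differently.
```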