
I support the conclusion of this XKCD comic, but the math always seemed off to me.

Let's say your dictionary has 100,000 words[0], and your attacker has access to the same list. If the attacker knows that you have chosen four distinct words from that list, he still has only a 1/99,994,000,109,999,400,000 chance of guessing the right permutation (not combination!). That's more than 2^66, which is certainly very secure.

However, the entropy calculation in the XKCD comic assumes that the characters are uncorrelated with each other, the way they would be if you used a random sequence of characters as your password.

(Of course, this assumes that you choose the words truly (pseudo-)randomly rather than cherry-picking permutations that are easy to remember.)

[0] Not unreasonable - /usr/share/dict/words on Ubuntu has over twice as many.
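A quick way to check the counts in the comment above, assuming four distinct words drawn from a 100,000-word list (the footnote's figure). Python's `math.perm` and `math.comb` (3.8+) give the ordered and unordered counts directly; the with-repetition case is added for comparison, since many passphrase generators allow a word to repeat.

```python
import math

DICT_SIZE = 100_000  # assumed list size from the comment; the attacker has the same list
N_WORDS = 4

# Ordered choices of 4 distinct words: order matters in a passphrase.
permutations = math.perm(DICT_SIZE, N_WORDS)
# Unordered choices of 4 distinct words, for comparison.
combinations = math.comb(DICT_SIZE, N_WORDS)
# Ordered choices with repetition allowed (each word drawn independently).
with_repetition = DICT_SIZE ** N_WORDS

for label, count in [
    ("permutations", permutations),
    ("combinations", combinations),
    ("with repetition", with_repetition),
]:
    print(f"{label:16} {count:,}  ~2^{math.log2(count):.1f}")
```

The unordered count, 4,166,416,671,249,975,000 (about 2^61.9), is exactly 4! = 24 times smaller than the ordered count (about 2^66.4), which is why it matters whether order counts.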




> However, the entropy calculation in the XKCD comic assumes that the characters are uncorrelated with each other, the way they would be if you used a totally random password.

No. A word picked at random from a 2048-word dictionary has 11 bits of entropy; that is where XKCD gets its 44 bits of entropy for four words.
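The same arithmetic as a short sketch, assuming each word is chosen uniformly at random and independently (repeats allowed), so the entropy is simply the number of words times log2 of the list size. The 10,000 and 100,000 list sizes are the ones discussed elsewhere in this thread.

```python
import math


def passphrase_entropy_bits(dict_size: int, n_words: int) -> float:
    """Bits of entropy for n_words picked uniformly and independently
    (repeats allowed) from a list of dict_size words.
    Word length does not enter the calculation at all."""
    return n_words * math.log2(dict_size)


for size in (2048, 10_000, 100_000):
    print(f"{size:>7,}-word list, 4 words: {passphrase_entropy_bits(size, 4):.2f} bits")
```

For the comic's 2048-word list this gives exactly 44 bits; a 10,000-word list gives about 53.2 bits and a 100,000-word list about 66.4 bits.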


What ordinary user do you know who has a 100,000-word vocabulary? My point is that a list of the 10,000 most commonly used words would most likely be enough to cover most of the passphrases people actually choose.


It would seem that word length matters too: if the words have to be between 5 and 8 characters long, wouldn't that cut the size of the vocabulary in half?


No, word length has (almost) nothing to do with it.

XKCD assumes each word has 11 bits of entropy, regardless of its length. That implies you are picking each word at random from a dictionary of the 2^11 (2048) most common non-trivial[1] words. And indeed, the example words are common: correct, horse, battery, staple.
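To put a number on the length-filter question above: restricting the random choice to a subset of the list (say, only 5 to 8 letter words) costs log2(full/filtered) bits per word, no matter how long the surviving words are. A minimal sketch; `bits_lost_by_filtering` is a hypothetical helper, and the 4096-to-2048 halving is an illustrative example, not a measured figure.

```python
import math


def bits_lost_by_filtering(full_size: int, filtered_size: int, n_words: int) -> float:
    """Entropy lost when each random word must come from a filtered subset
    (e.g. only 5-8 letter words) instead of the full list."""
    return n_words * (math.log2(full_size) - math.log2(filtered_size))


# Hypothetical example: a length filter that keeps half of a 4096-word list.
loss = bits_lost_by_filtering(4096, 2048, 4)
print(f"bits lost across 4 words: {loss}")
```

Halving the list costs exactly 1 bit per word, so four words lose 4 bits; it is the size of the list, not the number of letters, that sets the entropy.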

Contrast this with the "classic" example. You pick a single base word from a larger list (16 bits of entropy, i.e. a 64K-word dictionary) of longer, more complex words (troubadour, 10 letters long), and then subject it to a number of transformations that pump its entropy up by another 12 bits.

The key insight of this piece is that attackers have moved to techniques that make a password's length a poor estimator of its entropy. The rarity of the base word provides the lion's share of a password's entropy; length adds only marginal improvement, mostly by leaving room for more transformations.

This gets lost in the discussion of the comic's main, less subtle insight: that it is easier to add entropy by increasing the number of base words than by adding transformations to a single base word.
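A rough comparison of the two schemes, using the bit counts from this comment (16 bits for the base word, about 12 for the transformations, 11 per common word). The guessing rate is a hypothetical figure chosen only to turn bits into time; a real attacker may be far faster.

```python
import math

# Bit counts from the comment above (estimates, not measurements).
BASE_WORD_BITS = 16.0        # one word from a ~64K-word list
TRANSFORM_BITS = 12.0        # capitalization, substitutions, appended digit/symbol, ...
WORD_BITS = math.log2(2048)  # 11 bits per word from a 2048-word list

GUESSES_PER_SECOND = 1_000   # hypothetical attacker speed, for illustration only
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_exhaust(bits: float) -> float:
    """Worst-case time to try every candidate at GUESSES_PER_SECOND."""
    return 2 ** bits / GUESSES_PER_SECOND / SECONDS_PER_YEAR


classic_bits = BASE_WORD_BITS + TRANSFORM_BITS  # one transformed base word
schemes = {
    "classic (1 word + transforms)": classic_bits,
    "3 common words": 3 * WORD_BITS,
    "4 common words": 4 * WORD_BITS,
    "5 common words": 5 * WORD_BITS,
}
for name, bits in schemes.items():
    print(f"{name:30} {bits:5.1f} bits  {years_to_exhaust(bits):>14,.2f} years")
```

Each extra random word multiplies the search space by 2048 (11 bits), while all the transformations together added only about 2^12.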

[1] I exclude trivial words shorter than 4 letters because if you choose from them, you may end up with a password only 4 to 12 characters long, which could be brute-forced without any dictionary attack, now or in the near future. The shortest word in the provided example is "horse", which is weak evidence in favor of this hypothesis.
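To see why the footnote excludes short words, here is the size of the plain brute-force space for all-lowercase passwords, assuming the attacker simply tries every string from length 4 up to a given length (26 letters, no dictionary). The lowercase-only alphabet is an assumption about a passphrase built from short words.

```python
import math

ALPHABET = 26  # lowercase a-z only; an assumption, not a measured charset

cumulative = 0
for length in range(4, 13):
    # Every string of exactly this length, added to all shorter lengths tried so far.
    cumulative += ALPHABET ** length
    print(f"up to {length:2} chars: {cumulative:>26,} candidates (~2^{math.log2(cumulative):.1f})")
```

Everything up to 8 characters stays below 2^38, trivially searchable; only strings of 10 or more characters exceed the 44 bits the comic assigns to four common words.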


If you compare the number of possibilities per word with the number of possibilities per character (94 printable ASCII characters on a standard US keyboard), the benefits are clear.

That's not to say that it's impenetrable. It just makes it less convenient to crack, which seems to be the name of the security game.
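The per-word versus per-character comparison above, as a sketch assuming 94 printable ASCII characters (excluding space) and the comic's 2048-word list.

```python
import math

CHARSET_SIZE = 94     # printable ASCII on a US keyboard, excluding space
WORDLIST_SIZE = 2048  # the comic's assumed list of common words

bits_per_char = math.log2(CHARSET_SIZE)   # about 6.55 bits
bits_per_word = math.log2(WORDLIST_SIZE)  # exactly 11 bits

# How many truly random characters match four random words?
target_bits = 4 * bits_per_word
chars_needed = math.ceil(target_bits / bits_per_char)

print(f"{bits_per_char:.2f} bits/char, {bits_per_word:.2f} bits/word")
print(f"{chars_needed} random characters ~ {target_bits:.0f} bits")
```

Roughly seven truly random printable characters carry as much entropy as four random common words, but they are far harder to remember.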



