This is simply incorrect. If you really did have 100 000 arbitrary "characters" in your alphabet, the math would hold. However, your alphabet follows a certain pattern: it's English text.

At that point it's easier to brute force the individual characters. English text has about 1 to 2 bits of entropy per character. Let's assume 1.5 bits per character on average. That means that to really get 52 bits of entropy for a 3-word phrase you need at least 35 characters in it, or about 12 per word.
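
A quick back-of-the-envelope check of that arithmetic (a sketch, assuming the 1.5 bits per character figure above):

    import math

    # Assumed figures from the argument above, not measured values.
    target_bits = 52       # desired entropy for a 3-word phrase
    bits_per_char = 1.5    # rough entropy rate of English text

    chars_needed = math.ceil(target_bits / bits_per_char)
    print(chars_needed)                  # 35 characters in total
    print(math.ceil(chars_needed / 3))   # about 12 characters per word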

Your password is only as strong as the weakest link. In this case English is easier to brute force than your 100 000 character alphabet.

It's simply false to assume that "yes no one three" has 44 bits of entropy because it got randomly selected out of a 2048 word dictionary.



Yeah, no. If I have a dictionary of 100000 words, then each word represents about 17 bits of entropy. If I have three words, that makes 3 x 17 = 51 bits of entropy.
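
For reference, that's just the base-2 log of the dictionary size times the number of words; a minimal check in Python (the exact per-word figure is about 16.6 bits, rounded up to 17 above):

    import math

    dictionary_size = 100_000
    words = 3
    bits_per_word = math.log2(dictionary_size)   # ~16.6 bits per word
    print(bits_per_word * words)                 # ~49.8 bits for three words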


You deny the fact that English text can be attacked separately from your dictionary. English text is very predictable: for example, 'e' is much more common than most letters, and 'q' is almost certainly followed by 'u'.

I'm not making this up on my own either. Please check out http://en.wikipedia.org/wiki/Entropy_%28information_theory%2.... Let me quote the important part: "The entropy rate of English text is between 1.0 and 1.5 bits per letter,[1] or as low as 0.6 to 1.3 bits per letter, according to estimates by Shannon based on human experiments.[2]"

Consider the following example: you have a wordlist of 100 000 words. It seems reasonable that log(100 000)/log(2) equals 16.6 bits of entropy per word. Now suppose you take three words out of that list completely at random and get "a no we". Assuming 16.6 bits of entropy per word, you do indeed have to search through a space of 49.8 bits, but only if you attack it via the dictionary.

It is clear that in this case you can do a different attack. Instead of brute forcing the words, you can brute force the characters on their own, with a search space of a-z and space. This equals log(27^7)/log(2), or 33.2 bits: a lot less than the 49.8 bits estimated when only considering a dictionary approach. In reality English text is so predictable that you don't have to search anywhere near 33.2 bits of entropy if you brute force it with an algorithm that is aware of English text. Assuming Shannon's 1.3 bits per character estimate, this password has 9.1 bits of entropy.
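
The two search-space figures in that example can be reproduced directly (a sketch of the arithmetic above, using the 27-symbol alphabet of a-z plus space):

    import math

    # "a no we" is 7 characters long, spaces included.
    dictionary_bits = 3 * math.log2(100_000)   # ~49.8 bits via the word list
    character_bits = 7 * math.log2(27)         # ~33.2 bits via raw characters
    print(dictionary_bits, character_bits)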

I understand that this is an edge case with very short words, but I chose it to show that there are other ways to attack the password, here by using a 27-character "dictionary". This is cold hard math and therefore much easier to accept than the magic entropy estimation of English text. Once you see that this approach can reduce your entropy calculation, it's not that hard to accept that there might be more ways to reduce the entropy even further.


1) The predictability of the distribution of characters in the English language has nothing to do with this type of password - the symbols aren't characters, but words. 2) That figure of 1.3 bits of entropy per character only applies to English text, and it is low because there are a bunch of small words, like "and" and "the", that are regularly repeated. The entropy per character for words of 6 letters or more, not arranged in sentences, is a lot higher - about double, if I recall correctly. So sure, just as I can expect to get brute-forced if I choose a PIN of 0000, I can get brute-forced if I choose a passphrase of 'and the in'. Good luck forcing "queens examine faulty charges" though.
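
To put a number on that last phrase: treating the words themselves as the symbols, and assuming a 5000-word list of 6+ letter words (the dictionary size used later in this thread, not a figure from this comment), the four-word phrase works out roughly as follows:

    import math

    assumed_dictionary_size = 5_000   # assumption borrowed from later in the thread
    words_in_phrase = 4               # "queens examine faulty charges"
    print(words_in_phrase * math.log2(assumed_dictionary_size))   # ~49.1 bits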


I am merely suggesting that the entropy can be less than what is estimated by looking only at the dictionary.

Re 1: words still consist of characters. Re 2: Certainly correct, but ignoring the possibility that English words have less entropy than they appear to at first is odd, given the patterns English words often follow.

I'm interested in reading more about those entropy estimations; can you recall where you read about them? According to Applied Cryptography, Shannon states that entropy per letter decreases as the text grows: he estimates 2.3 bits per letter for chunks of 8 letters, but it drops to between 1.3 and 1.5 bits per character for 16-character chunks.

Applied Cryptography cites a paper by Shannon called "Prediction and Entropy of Printed English" in the Bell System Technical Journal from 1951. I have not personally read it yet but will try to find it in the near future.


The mistake you are making is that you keep wanting to treat a passphrase made up of randomly chosen words as a 'text' in the sense that Shannon was using, but it is not a text in the Shannon sense of the word.

Shannon was analysing real world messages to arrive at that figure, not a string of random words. Here's a way of thinking it through that should help you see the problem clearly.

Let's imagine that Alice has just used a dictionary to generate a passphrase. Furthermore, let's imagine that the dictionary in question is a collection of 6 character or longer words pulled from "Pride and Prejudice". I'm pretty sure Jane Austen tops the 5000 words needed for such a dictionary.

Now, in the simple example, let's imagine that the passphrase is one word long. Bob is an attacker who knows the dictionary. He will guess the word in a maximum of 5000 tries; after 2500 he will have a 50% chance of having found it.

Charlie, a second attacker, isn't going to use words as symbols; he's going to try to brute-force the word just by throwing random characters at the problem. He doesn't know the length of the word, so he's going to have to try every length, starting from one letter and working up. I trust that you can see that Charlie is going to need a lot more than 5000 guesses to find the passphrase.
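
A rough sense of how much worse Charlie's position is (a sketch, assuming he works through every string over a 26-letter alphabet, one length at a time):

    # Candidate strings Charlie must be prepared to try, cumulatively, by length.
    alphabet = 26
    for max_length in range(1, 8):
        total = sum(alphabet ** k for k in range(1, max_length + 1))
        print(max_length, total)
    # By length 6 that is already about 3.2 * 10^8 candidates, against Bob's 5000.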

David is a bit smarter than Charlie. He decides to use a Markov chain with a 3-character context to generate his guesses, so that the generated passphrases start resembling English words. The Markov chain was trained on text from the Sydney Morning Herald. David is going to do a lot better than Charlie, but his Markov chain has to be capable of generating all of the words in my original dictionary, plus a bunch of other words, plus a bunch of gibberish non-words. He's clearly going to need more than 5000 attempts to find Alice's passphrase.

Evan is smarter still. He trains his Markov chain using only words that are 6 characters or longer, and furthermore he increases the context length to 6 characters. He also teaches his Markov chain that word separators exist, and the generator is reset when a new word is started. Now we're talking - Evan's system will produce very few strings that are not actual words. But it will also generate a bunch of words that were in the SMH and not in "Pride and Prejudice", so he's still going to need more than 5000 guesses to be sure he's found Alice's passphrase, which means he still hasn't done better than treating the words as symbols.

There is one Markov chain attack that gets nearly equal performance. Take Alice's dictionary and use it to train a Markov chain generator that knows about word separators, and that knows the words don't have any probability links between them. But now your Markov chain generator has just become a fancy way of picking words out of the dictionary. In other words, it has degenerated into a dictionary attack, not a brute-force attack, and a dictionary attack is just another way of saying "treat the words as symbols", which means we're back to my original entropy calculation as the optimal way of determining the entropy of the passphrase.
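
A minimal sketch of that degenerate case: a word-level "chain" trained on Alice's dictionary, with separators and no probability links between words, is just uniform sampling from the word list (the tiny list here is a stand-in for the real 5000-word dictionary):

    import random

    # Stand-in for Alice's real dictionary of 6+ letter words.
    dictionary = ["abruptly", "absence", "absolute"]

    def generate_guess(num_words):
        # No state is carried between words, so every "transition" is just a
        # uniform draw from the dictionary, i.e. an ordinary dictionary attack.
        return " ".join(random.choice(dictionary) for _ in range(num_words))

    print(generate_guess(3))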

As I said at the start, your error is that you're trying to use randomly picked dictionary words as an English text in the Shannon sense of the word. Hopefully the example scenario that I just ran through will help you see why the distinction is important.


I understand randomly selected words are not the same as the English text Shannon is talking about. However, my point is that the entropy may be lower than it appears to be. I'm not saying it is 0.6 bits per character (or any other number). I'm saying that unless your words are long enough, it is very likely that the entropy is lower than simply taking log(dictionary_size^words_per_passphrase)/log(2). The references to Shannon and Applied Cryptography were only made as supporting evidence that the entropy of English text is lower than it appears. I'm not claiming their exact numbers apply in this situation.

If we can't agree on that, then we must simply agree to disagree.


*I'm not claiming their exact numbers apply in this situation.*

You see, the thing is, you kind of are making that claim. If the entropy turns out to be 3 bits per character, the one feeble point that you did have gets blown out of the water. The fact that you don't seem to understand that indicates that you really need to go back and reread a few books on information theory.

Look, for your own edification, try to come up with a scheme that will reliably beat a dictionary attack in terms of the number of attempts needed before finding a password taken from the dictionary. You could even write a simple program to test the idea. Take a dictionary of 5000 words that are 6 characters or longer (much shorter words than what your theory suggests should be secure). Select 100 words from the dictionary at random, and then run any attack of your devising against those words. If you can reliably get a better average number of attempts than 2500, I'll concede the point.
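
A rough harness for that experiment might look like this (a sketch; "words.txt" is a placeholder for whatever list of 6+ letter words you use):

    import random

    # Load a dictionary of ~5000 words of 6 characters or more.
    with open("words.txt") as f:                      # placeholder word list
        dictionary = [w.strip() for w in f if len(w.strip()) >= 6][:5000]

    targets = random.sample(dictionary, 100)          # the 100 random test words

    def attempts_until_found(target, guesses):
        # Count guesses until the target word comes up.
        for i, guess in enumerate(guesses, start=1):
            if guess == target:
                return i
        return None

    # Baseline: a plain dictionary attack in random order averages ~2500 attempts.
    order = dictionary[:]
    random.shuffle(order)
    results = [attempts_until_found(t, order) for t in targets]
    print(sum(results) / len(results))

    # Swap `order` for the output of any character-level or Markov-chain guesser
    # to see whether it can reliably beat that average.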

Until then, I'm here to tell you that you don't understand information theory as well as you seem to think you do.



