I do exactly this. About 4-5 characters in the support person interrupts me with "yeah, whatever".
The entire security question situation makes me incredibly pessimistic that we will ever get good security. The idea of security questions is so mind numbingly stupid to me yet it's widely used. One would have thought that after the Sarah Palin hack years ago everyone would have realised that but it seems like nobody did. The support agent didn't see my security question and go "oh that's clever". That's despite him being a person who deals with these all day they should realise the overwhelming stupidity.
In a sane world companies who tell their users to use special characters etc. in their passwords and rotate them but then encourage them to mess it all up by storing information from their Facebook page ad a replacement for the password should have to pay massive fines. Yet hardly anybody is even seeing a problem with this.
This situation to me is so demotivating because it makes me think that whatever security mechanism we come up with well meaning people will undermine it.
Four to five characters is probably enough for their threat model though?
The only way I can think of that somebody could steal only the first few characters of your security answer is by looking over your shoulder at a very unfortunate time. That seems unlikely, and most of the questions they use are predictable from the first few characters when answered genuinely anyway (surnames, car names, streets and towns).
It's not about what you say, it's about what an attacker can get away with saying. And they can almost certainly get away with "I just mash the keyboard."
Ah, I see what you mean. Perhaps instead of grabbing a handful of characters from /dev/urandom, you generate a passphrase (a few random dictionary words)?
Been doing this for several years and prefer this method. I also try to reduce the number of times I use a particular security question. However, I don't think the problem comes from what questions you use or what answers you provide. It becomes like others have pointed out, a problem of what a hacker can get away with answering when asked by a phone representative. Although, I do think this approach provides a little more security than just answering the "what city were you born in" question with the correct answer on every site.
I would definitely be weary of using the same answer in multiple places. Even more so than with passwords. These stupid answers clearly get stored unhashed (how else would they be verified via phone?). Do if the system gets compromised the attacker now has your security question response for multiple targets.
Other than being pronounceable I see the exact same requirements for security questions as for passwords. If anything they need to be stronger.
I like the appeal (and the book) but I recall, when researching diceware, reading that this is a terrible idea in practice since the entropy is lowered dramatically by using natural language that's already in the public record. Even if they can't put every printed phrase into a lookup table, the probability of certain words following others wrecks the entropy.
Indeed, but for the attack discussed here (someone calls support and pretends they're you) you don't need that much entropy, as you can't test different phrases quickly.
You just need a larger number of random words to reach the same entropy as random passwords. It's not like your random password is made up from secret alphabets!
You seem to be misunderstanding how diceware works. You randomly generate numbers by throwing dice. Every five rolls indexes exactly one "diceware" word. So even if an attacker knew we were using diceware, each word contains
log2(6^5) = log2(7776) ≈ 12.9 bits
of entropy. If you want 128 bits of etropy in your security question field, then just randomly generate 10 diceware words. This is comparable to choosing 20 random printable ascii characters or so.
Since we pick the words by literally throwing dice, English grammar has nothing to do with it.
Median novel has some 65k words. Take all (consecutive) quotes of 2 to 24 words, and you have some 1.5m phrases. Take the top 666k books (apparently there've been about 130m titles been published in total, about 5m in the Amazon Kindle store), and you're at about 1e12 phrases, or 40 bits of entropy, or worse than a password with 7 random letters/digits/symbols.
You could probably improve on it considerably by selecting fewer books, and only taking quotes starting at some punctuation mark.
For a naturally throttled attack like here (on the phone) that's fine, but for an offline attack (where the attacker has access to the password hash) that can be cracked within days.
Necronomicon quote? Nice. This has me thinking about what I can do to make my security answers to security questions untethered from PII. A book quote is a really good idea.
I'm guessing that having every book loaded into a password cracking database, subdivided and indexed by each leading phrase word, is still computationally infeasible for non-government actors.
If I walk into a library, pick a floor, aisle, shelf, book, and page at random (just walk, don't think about it), and use a phrase that is a minimum of 12 words long -- is that more random than what I presume happened here, where someone knew that their target liked that style of poetry and was able to concentrate their search on that genre? ( a "crib" in Bletchley Park terms)
The comments about English grammar are correct - classes of words (nouns, verbs, adverbs, etc) do fall in certain positional order and frequency analysis becomes important. A brute-force attacker would have to work through four types of passwords - the commonly used passwords like "12345" and "letmein", language-based phrases (like my not-great idea), language-based phrases with letter substitution (leet-speak, etc), and then truly random letter sequences.
What's happening is that people collect endless phrases and alter them with a ton of standard manipulation schemes, compute the corresponding private and public keys & addresses for all the variations, create a lookup table for the addresses and private keys, and as soon as they see a known keypair in use then they use the corresponding private key to swipe the funds.
See my comment above - unless I'm mistaken, taking all 2 to 24 word quotes from the most popular 1 million novels gives you about 40 bits of entropy (less than a password of length 7), and can easily be stored on one hard drive. In other words, feasible even for some script kiddie in mom's basement.
No need to have every book loaded, only the top 50000~ read by people who would use that method of passphrase generation should work fine (and be feasible for almost everyone). Cryptonomicon would probably be in that list.
Nope. If a phrase from literature is “memorable”, it’s guessable.
The logic of passwords is simple, once you realize that all humans are terrible random number generators.
When you allow any part of your password to be chosen by a human, i.e. yourself, you have to assume that the human-chosen part is known to an attacker. The solution is to generate passwords with enough random bits to satisfy current demands. And by “generate” I of course mean to allow a real number generator (either a computer, or dice, or anything really random; i.e. something a casino would accept) to choose the password for you. Without any restrictions except a desire to minimize length, you get the classic unmemorable 0vT2GVlncZ4pZ0Ps-style passwords. If you add the restriction “must be a sequence of english words”, you get xkcd-style “correct horse battery staple” passwords. Both are fine, since they contain enough randomness not generated by a human.
But if you yourself choose, either old-style “Tr0ub4dor&3” or passphrase “now is the time for all good men”-style, you have utterly lost, since nothing has been randomly chosen, and “What one man can invent, another can discover.”.
Note: this also applies if you run a password generator and choose a generated one that you like. Since you have introduced choice, you have tainted the process, and your password now follows an unknown number of intuitive rules (for instance, there was a story here on HN some time ago about how people prefer the letters in their own name over other letters of the alphabet), and these rules can be exploited by an attacker.
> this also applies if you run a password generator and choose a generated one that you like.
I'm sure there's some math that could be applied here to determine how much a user selecting from one of n generated passwords. Human intuition in cases like this can often be wrong as human psychology hasn't evolved to solve problems like this, so please correct me if I'm wrong, but mine tells me that a user choosing a password from whole cloth has much less entropy when the user is taken into account than a user choosing a password from a small set of those generated with high entropy.
While the latter is less than leaving it up to be chosen purely at random, I think it's much closer to pure random than it is than from the one that's created by the human. It's likely not your intent, but your note comes across as not acknowledging this. Am I reading it wrong? Or are my intuitions wrong? If one were to choose between (a) human generated or (b) human chosen from a set of non-human generated, how much stronger do you think (b) is than (a), and how much weaker is (b) compared to (c) randomly chosen from non-human generated?
That’s easy to calculate. If you generate, say 4 password of 32 bits of randomness each, and you pick one of them, you must assume that the 32-bit password you chose has 30 bits of randomness, since your choice between 4 options has 2 bits of information in it.
Detecting the randomness of a user-generated password is like detecting randomness in general; it can’t be done¹. Is a number like 392872956 random, or is it derived by using some obscure but guessable procedure? You can’t know just by looking at the number. Even if a user thinks they are choosing randomly, subconscious biases are very powerful. The same principle applies to word and character based passwords, so the only safe course is to assume that anything chosen by a user directly is not random at all.
Sure. So is there nothing to my intuition above? If you were to have users choose between (a) and (b) above, is (b) generally safer than (a)? Much safer? Only marginally so? When using a password manager that presents 10 passwords, should I always choose the first one to remove my choice from the equation? Are those few bits I've removed that important, given that the entire set is random?
I'm not trying to catch you out here. I'm trying to see how far my intuition works in this case and how to read you note in the context of the rest of what you've said.
A user-chosen password have exactly 0 bits of guaranteed randomness. A randomly generated password has X bits of randomness, and a list of Y passwords of X bits each, where the user is allowed to choose exactly one of the passwords, has exactly X−(log2(Y)) bits.
So, to answer your questions: Your intuition is correct – since user-chosen passwords do not contain any guaranteed randomness, generated passwords are better. How much better depends on the values of X and Y in the formula above. The value of X can only strictly speaking be said to depend on the generating algorithm for the passwords, and not any specific value like length or presence of special characters, etc. Yes, I try to always force myself to choose the first one of generated passwords if many are available. The importance of doing that, i.e. preserving those bits, depends of the size of X; a large value of X might stand to lose log2(Y) bits without any real downside.
The default pwgen(1) password algorithm appears to generate a display of 8 columns by 20 lines of passwords, each 8 characters long, like so:
All the characters in each password are lower case letters a through z, except one, which is always a digit, and one other, which is an upper case character, A through Z.
These assumptions give us all the information we need to calculate the actual number of guaranteed random bits in a password chosen from this output. There are 7 letters in a password, each a-z, which gives 26⁷ combinations. Then one of the 7 characters is made upper case, which multiplies the number of possible passwords by 7. Then a random digit (0-9) is inserted in a random place (1-8), which multiplies it again with 10 and 8, respectively. The resulting number is
26⁷×7×10×8 = 4497813698560
Now, 4497813698560 possible passwords is equal to log2(4497813698560) bits; i.e. 42.03236104393261 bits.
The number of password choices is 8×20; i.e. 160 different passwords. Our formula above thus gives us
log2(26⁷×7×10×8)−log2(8×20) = 34.71043294904525 bits of randomness if the default options for pwgen(1) is used, and one of the displayed passwords is chosen by a user.
Now, whether 34.7 bits or 42 bits is to be considered high or low is not my area of expertise, and I am given to understand that this changes rapidly over time as computing technology advances.
You’re right. Looking at the source code (https://github.com/tytso/pwgen/blob/master/pw_phonemes.c#L59), the algorithm seems to be rather complicated, so I can’t say what the exact number of bits is. But we could certainly calculate an upper bound:
7 letters a-z which are either upper or lower case, plus an unknown digit at an unknown location, gives:
(26+26)⁷×10×8 = 82245736202240 possible passwords, giving log2(82245736202240) = 46.225006121875005 bits. Subtracting the bits for the 8×20 choices of passwords gives
log2((26+26)⁷×10×8)−log2(8×20) = 38.90307802698764 bits as an upper bound of the security of a password chosen by a user from the default output of pwgen(1). This is a bit more than the 34.7 bits I first thought it was, but not much more. And this is an upper bound; since I can see that the source code does not choose each character completely randomly and does, as you say, seem to prefer lower case letters, the correct number of bits is guaranteed to be lower than 38.9.
I have no idea who is downvoting you; this is perfectly correct. In fact, one of the (minor) plot points in the quoted book is a cyphertext getting broken because the person generating one time pad keys looks at the letters!
Most likely from a "helpful" CS agent offering up the hint above. "It's really weird" or "I've never seen that one before" or just an odd chuckle. Anything an attacker could use to gain an advantage will be used to compromise you eventually.