You don't have to say "oh I just mash the keyboard for those", you can say "it's...

ajmurmann · on Sept 23, 2017

I do exactly this. About 4-5 characters in the support person interrupts me with "yeah, whatever".

The entire security question situation makes me incredibly pessimistic that we will ever get good security. The idea of security questions is so mind-numbingly stupid to me, yet it's widely used. One would have thought that after the Sarah Palin hack years ago everyone would have realised that, but it seems like nobody did. The support agent didn't see my security question and go "oh that's clever". That's despite him being a person who deals with these all day; people like that should realise the overwhelming stupidity. In a sane world, companies who tell their users to use special characters etc. in their passwords and rotate them, but then encourage them to mess it all up by storing information from their Facebook page as a replacement for the password, should have to pay massive fines. Yet hardly anybody even sees a problem with this.

This situation is so demotivating to me because it makes me think that whatever security mechanism we come up with, well-meaning people will undermine it.

smelendez · on Sept 23, 2017

Four to five characters is probably enough for their threat model though?

The only way I can think of that somebody could steal only the first few characters of your security answer is by looking over your shoulder at a very unfortunate time. That seems unlikely, and most of the questions they use are predictable from the first few characters when answered genuinely anyway (surnames, car names, streets and towns).

maxerickson · on Sept 23, 2017

The quote is an attacker attempting to bypass the check.

wyager · on Sept 23, 2017

It's not about what you say, it's about what an attacker can get away with saying. And they can almost certainly get away with "I just mash the keyboard."

ddevault · on Sept 23, 2017

Ah, I see what you mean. Perhaps instead of grabbing a handful of characters from /dev/urandom, you generate a passphrase (a few random dictionary words)?
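The passphrase idea above can be sketched in a few lines. This is only a minimal illustration: the `WORDS` list and the `make_passphrase` helper are hypothetical names, and the eight-word list is a stand-in (a real list would have thousands of entries, since eight words give only 3 bits per word).

```python
import secrets

# Hypothetical stand-in list; a real word list would hold thousands of words.
WORDS = ["correct", "horse", "battery", "staple",
         "anchor", "violet", "marble", "tundra"]


def make_passphrase(words, count=4, sep=" "):
    """Pick `count` words independently and uniformly with the OS CSPRNG."""
    return sep.join(secrets.choice(words) for _ in range(count))


passphrase = make_passphrase(WORDS)
print(passphrase)
```

The words are drawn with `secrets` rather than `random` because the latter is not intended for security purposes.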

jeffmould · on Sept 23, 2017

Been doing this for several years and prefer this method. I also try to reduce the number of times I use a particular security question. However, I don't think the problem comes from what questions you use or what answers you provide. It becomes, as others have pointed out, a problem of what a hacker can get away with answering when asked by a phone representative. Although, I do think this approach provides a little more security than just answering the "what city were you born in" question with the correct answer on every site.

np_tedious · on Sept 23, 2017

Sounds like a "correct horse battery staple" would fit the bill

chiph · on Sept 23, 2017

Or use a memorable phrase from literature.

> This was not the last encounter between Bobby Shaftoe and Goto Dengo

ajmurmann · on Sept 23, 2017

I would definitely be wary of using the same answer in multiple places. Even more so than with passwords. These stupid answers clearly get stored unhashed (how else would they be verified via phone?). So if the system gets compromised the attacker now has your security question response for multiple targets.

Other than being pronounceable I see the exact same requirements for security questions as for passwords. If anything they need to be stronger.

pault · on Sept 23, 2017

I like the appeal (and the book) but I recall, when researching diceware, reading that this is a terrible idea in practice since the entropy is lowered dramatically by using natural language that's already in the public record. Even if they can't put every printed phrase into a lookup table, the probability of certain words following others wrecks the entropy.

pxndx · on Sept 23, 2017

Indeed, but for the attack discussed here (someone calls support and pretends they're you) you don't need that much entropy, as you can't test different phrases quickly.

Natanael_L · on Sept 23, 2017

You just need a larger number of random words to reach the same entropy as random passwords. It's not like your random password is made up from secret alphabets!

pault · on Sept 23, 2017

Sentences aren't random.

xelxebar · on Sept 25, 2017

You seem to be misunderstanding how diceware works. You randomly generate numbers by throwing dice. Every five rolls indexes exactly one "diceware" word. So even if an attacker knew we were using diceware, each word contains

    log2(6^5) = log2(7776) ≈ 12.9 bits

of entropy. If you want 128 bits of entropy in your security question field, then just randomly generate 10 diceware words. This is comparable to choosing 20 random printable ASCII characters or so.

Since we pick the words by literally throwing dice, English grammar has nothing to do with it.
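The arithmetic in this comment can be checked with a short sketch. The function names are made up for illustration, and the 95-symbol alphabet is an assumption about what "printable ASCII" means here (space through tilde).

```python
import math

DICE_SIDES = 6
ROLLS_PER_WORD = 5


def bits_per_word():
    """Entropy of one diceware word: log2(6^5) = log2(7776)."""
    return math.log2(DICE_SIDES ** ROLLS_PER_WORD)


def words_needed(target_bits):
    """Smallest number of diceware words reaching at least target_bits."""
    return math.ceil(target_bits / bits_per_word())


def ascii_chars_needed(target_bits, alphabet_size=95):
    """Smallest number of uniformly random characters reaching target_bits."""
    return math.ceil(target_bits / math.log2(alphabet_size))


print(round(bits_per_word(), 1))
print(words_needed(128), ascii_chars_needed(128))
```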

pault · on Sept 25, 2017

I was responding to someone who was recommending using sentences from books as passwords. Hence the comment about grammar.

FabHK · on Sept 23, 2017

Median novel has some 65k words. Take all (consecutive) quotes of 2 to 24 words, and you have some 1.5m phrases. Take the top 666k books (apparently about 130m titles have been published in total, about 5m of them in the Amazon Kindle store), and you're at about 1e12 phrases, or 40 bits of entropy, or worse than a password with 7 random letters/digits/symbols.

You could probably improve on it considerably by selecting fewer books, and only taking quotes starting at some punctuation mark.

For a naturally throttled attack like here (on the phone) that's fine, but for an offline attack (where the attacker has access to the password hash) that can be cracked within days.
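A rough check of the estimate above, taking the comment's figures (65k words per novel, quotes of 2 to 24 words, 666k books) as given. The 95-symbol alphabet for the random password is an assumption, and the count ignores duplicate phrases across books, so it is an upper bound.

```python
import math

WORDS_PER_NOVEL = 65_000
MIN_LEN, MAX_LEN = 2, 24
BOOKS = 666_000

# A text of W words contains W - n + 1 consecutive phrases of length n.
phrases_per_book = sum(WORDS_PER_NOVEL - n + 1
                       for n in range(MIN_LEN, MAX_LEN + 1))
total_phrases = phrases_per_book * BOOKS
phrase_bits = math.log2(total_phrases)

# Seven characters drawn uniformly from 95 printable ASCII symbols (assumed).
random_password_bits = 7 * math.log2(95)

print(phrases_per_book, total_phrases)
print(round(phrase_bits, 1), round(random_password_bits, 1))
```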

LutfuRadu · on Sept 25, 2017

I am pretty confident that some phrases would repeat.

FabHK · on Sept 25, 2017

True, so even less entropy.

mighty_atomic_c · on Sept 23, 2017

Necronomicon quote? Nice. This has me thinking about what I can do to make my security answers to security questions untethered from PII. A book quote is a really good idea.

chiph · on Sept 23, 2017

Close! Cryptonomicon.

I'm guessing that having every book loaded into a password cracking database, subdivided and indexed by each leading phrase word, is still computationally infeasible for non-government actors.

Natanael_L · on Sept 23, 2017

Bitcoin brain wallets based on obscure African poems have been successfully cracked. Don't trust your choice of obscure books to be sufficient.

chiph · on Sept 23, 2017

I need to look into that some.

If I walk into a library, pick a floor, aisle, shelf, book, and page at random (just walk, don't think about it), and use a phrase that is a minimum of 12 words long -- is that more random than what I presume happened here, where someone knew that their target liked that style of poetry and was able to concentrate their search on that genre? (A "crib" in Bletchley Park terms.)

The comments about English grammar are correct - classes of words (nouns, verbs, adverbs, etc) do fall in certain positional order and frequency analysis becomes important. A brute-force attacker would have to work through four types of passwords - the commonly used passwords like "12345" and "letmein", language-based phrases (like my not-great idea), language-based phrases with letter substitution (leet-speak, etc), and then truly random letter sequences.

Natanael_L · on Sept 23, 2017

What's happening is that people collect endless phrases and alter them with a ton of standard manipulation schemes, compute the corresponding private and public keys & addresses for all the variations, create a lookup table for the addresses and private keys, and as soon as they see a known keypair in use then they use the corresponding private key to swipe the funds.

FabHK · on Sept 23, 2017

See my comment above - unless I'm mistaken, taking all 2 to 24 word quotes from the most popular 1 million novels gives you about 40 bits of entropy (less than a password of length 7), and can easily be stored on one hard drive. In other words, feasible even for some script kiddie in mom's basement.

luiscarloscb · on Sept 23, 2017

No need to have every book loaded, only the top 50000~ read by people who would use that method of passphrase generation should work fine (and be feasible for almost everyone). Cryptonomicon would probably be in that list.

teddyh · on Sept 23, 2017

Nope. If a phrase from literature is “memorable”, it’s guessable.

The logic of passwords is simple, once you realize that all humans are terrible random number generators.

When you allow any part of your password to be chosen by a human, i.e. yourself, you have to assume that the human-chosen part is known to an attacker. The solution is to generate passwords with enough random bits to satisfy current demands. And by “generate” I of course mean to allow a real random number generator (either a computer, or dice, or anything really random; i.e. something a casino would accept) to choose the password for you. Without any restrictions except a desire to minimize length, you get the classic unmemorable 0vT2GVlncZ4pZ0Ps-style passwords. If you add the restriction “must be a sequence of English words”, you get xkcd-style “correct horse battery staple” passwords. Both are fine, since they contain enough randomness not generated by a human.

But if you yourself choose, either old-style “Tr0ub4dor&3” or passphrase “now is the time for all good men”-style, you have utterly lost, since nothing has been randomly chosen, and “What one man can invent, another can discover.”

Note: this also applies if you run a password generator and choose a generated one that you like. Since you have introduced choice, you have tainted the process, and your password now follows an unknown number of intuitive rules (for instance, there was a story here on HN some time ago about how people prefer the letters in their own name over other letters of the alphabet), and these rules can be exploited by an attacker.
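The "let a real random number generator choose" approach described in this comment might look like the sketch below. The 62-symbol alphabet and the 16-character default are assumptions picked to resemble the example password in the comment; `generate_password` and `entropy_bits` are illustrative names.

```python
import math
import secrets
import string

# Assumed alphabet: upper case, lower case and digits (62 symbols).
ALPHABET = string.ascii_letters + string.digits


def generate_password(length=16):
    """Draw every character independently and uniformly from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def entropy_bits(length, alphabet_size=len(ALPHABET)):
    """Entropy of a password whose characters are all chosen uniformly."""
    return length * math.log2(alphabet_size)


# Take the first result as-is; regenerating until one "looks nice"
# reintroduces the human choice the comment warns about.
password = generate_password()
print(password, round(entropy_bits(16), 1))
```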

tony101 · on Sept 23, 2017

Diceware is memorable but not guessable.

Source:

https://en.wikipedia.org/wiki/Diceware

https://www.rempe.us/diceware/#eff

teddyh · on Sept 23, 2017

Agreed, but the context was using memorable phrases from literature, in which case they are guessable. Post edited to clarify.

tony101 · on Sept 24, 2017

Gotcha. Thanks for the clarification.

grzm · on Sept 23, 2017

> this also applies if you run a password generator and choose a generated one that you like.

I'm sure there's some math that could be applied here to determine how much entropy is lost by a user selecting one of n generated passwords. Human intuition in cases like this can often be wrong, as human psychology hasn't evolved to solve problems like this, so please correct me if I'm wrong, but mine tells me that a user choosing a password from whole cloth has much less entropy, when the user is taken into account, than a user choosing a password from a small set of those generated with high entropy.

While the latter is weaker than leaving the choice purely to chance, I think it's much closer to pure random than it is to a password created by the human. It's likely not your intent, but your note comes across as not acknowledging this. Am I reading it wrong? Or are my intuitions wrong? If one were to choose between (a) human generated or (b) human chosen from a set of non-human generated, how much stronger do you think (b) is than (a), and how much weaker is (b) compared to (c) randomly chosen from non-human generated?

teddyh · on Sept 23, 2017

That’s easy to calculate. If you generate, say, 4 passwords of 32 bits of randomness each, and you pick one of them, you must assume that the 32-bit password you chose has 30 bits of randomness, since your choice between 4 options has 2 bits of information in it.

grzm · on Sept 23, 2017

Cheers :) See? I knew there was some math. So how do you feel that compares to a user-generated password? That's the question I was getting at.

teddyh · on Sept 23, 2017

Detecting the randomness of a user-generated password is like detecting randomness in general; it can’t be done¹. Is a number like 392872956 random, or is it derived by using some obscure but guessable procedure? You can’t know just by looking at the number. Even if a user thinks they are choosing randomly, subconscious biases are very powerful. The same principle applies to word and character based passwords, so the only safe course is to assume that anything chosen by a user directly is not random at all.

1. http://dilbert.com/strip/2001-10-25

grzm · on Sept 23, 2017

Sure. So is there nothing to my intuition above? If you were to have users choose between (a) and (b) above, is (b) generally safer than (a)? Much safer? Only marginally so? When using a password manager that presents 10 passwords, should I always choose the first one to remove my choice from the equation? Are those few bits I've removed that important, given that the entire set is random?

I'm not trying to catch you out here. I'm trying to see how far my intuition works in this case and how to read your note in the context of the rest of what you've said.

teddyh · on Sept 23, 2017

A user-chosen password has exactly 0 bits of guaranteed randomness. A randomly generated password has X bits of randomness, and a list of Y passwords of X bits each, where the user is allowed to choose exactly one of the passwords, has exactly X−(log2(Y)) bits.

So, to answer your questions: Your intuition is correct – since user-chosen passwords do not contain any guaranteed randomness, generated passwords are better. How much better depends on the values of X and Y in the formula above. The value of X can only strictly speaking be said to depend on the generating algorithm for the passwords, and not any specific value like length or presence of special characters, etc. Yes, I try to always force myself to choose the first one of generated passwords if many are available. The importance of doing that, i.e. preserving those bits, depends on the size of X; a large value of X might stand to lose log2(Y) bits without any real downside.
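The X−log2(Y) rule is easy to turn into a small helper. The name `guaranteed_bits` is made up for this sketch, and flooring the result at zero is an added assumption for the degenerate case where the list is larger than the password space.

```python
import math


def guaranteed_bits(x_bits, y_choices):
    """Worst-case randomness left when a user picks one of y_choices
    passwords that were each generated with x_bits of randomness."""
    if y_choices < 1:
        raise ValueError("y_choices must be at least 1")
    # Assumption: floor at zero rather than report negative bits.
    return max(0.0, x_bits - math.log2(y_choices))


# The earlier example in the thread: 4 passwords of 32 bits each.
print(guaranteed_bits(32, 4))
# Always taking the first (or only) password loses nothing.
print(guaranteed_bits(32, 1))
```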

The default pwgen(1) password algorithm appears to generate a display of 8 columns by 20 lines of passwords, each 8 characters long, like so:

    Uvee5exo aiXae6mi OoR5eiph thoo1Mo3 Ac0quiep woo5Ing7 uh2AiXei poh1Aigh
    ab1Mayai aeHaing4 eip0Wae1 Ho0jaeku Ahxah4Ec Kei4daez Gohmaib6 Chisaib3
    eiphim5U jiepai8C aeXohN3u SeiDahy2 cee9oiVu kei1Eel2 foht6iuY Kievei6o
    Eequ6Aeb eeng9wuS Kog6cie3 sapi7ooP ek9Aitie ohX6eese Eez5oth8 evaeL3oo
    gae1caeF io8EiNga ceaxaY6t eiZ1Lee1 Wagh2Bee maPh0een zoBi0Pee Kou8iel9
    ahj7Ooph eB9beGhe MieV6pe1 loGhae0F ughueTh1 eBohHae2 Eiv1aaQu ahRohv7b
    Iehoo7qu Ga6Buwuh We0UK9Ee gu8ahSoh Ahn2ash8 pee7Airo ey1Faish aeFaiQu1
    Einge6ai vi6uWeir eine8ooK Bae0lugh hewu5Hol hohd1nuH ohn2aeVa nei3oo4L
    Oob6aira Aij4Gila hieNgih7 Ax5iej7O lohLood6 thoo2ahG Thie6aeh Cee7Aajo
    zoot0Ief VaeN4uL5 SaiLa6ie Fii8Xeer uPhoo7os Iew7roh8 Kootu6ei Ohngue7e
    xah4aiPh OVeiT0th Ca3ohjae uiCohs0N Quei9eet Xoh5oobo eicaRae2 ahp1Joom
    Eequeer5 deiZ5uZa ApooSah4 Ca2wuale Xei1aifa qua1jooR oo9haiJo ie2rei2K
    sah4Kai7 Aiphoos3 Di7naip5 uo4sooG3 Aiw7luph ooL6xir0 seo2ooBo shib8eeL
    aem7kieJ aphei9Ie uo1ohF9A choh4Noo EijuF5Uy DohmieJ8 op5cieSh Barauk1o
    EePhi2el oFabee9i AiGhoP8G yaeZa6ah ca6ooTh8 Houc2ro4 Pi9phee5 Ahng1ief
    Eew2Eewu Vu3Wahm6 niep7Wei Gezai2no loR7noh5 aiph0aeT eiW2ap7o aiD6MeSu
    ahgh5Uaf ahse4Aid Yaenei5t ooV4mooc HauYey3r pho1uSah uZuy8fie aiTiek8B
    osh8Chae ee1Ju2Uo eet4Xo4U cheaw6Ee Ri2eoyei eesooh7X du3Pee0a hi8chohV
    ung6Ju7u thahMai1 Cho5ahs0 beipam6A ooSeich0 pohx5Eiy Iene0me8 eBo7aegi
    ohn6uaT7 iami8Aef Nooh6yai vaPhae7u aipai6Oe yaiPh0ue apohSh7i aiNgu8zo

All the characters in each password are lower case letters a through z, except one, which is always a digit, and one other, which is an upper case character, A through Z.

These assumptions give us all the information we need to calculate the actual number of guaranteed random bits in a password chosen from this output. There are 7 letters in a password, each a-z, which gives 26⁷ combinations. Then one of the 7 characters is made upper case, which multiplies the number of possible passwords by 7. Then a random digit (0-9) is inserted in a random place (1-8), which multiplies it again by 10 and 8, respectively. The resulting number is

26⁷×7×10×8 = 4497813698560

Now, 4497813698560 possible passwords is equal to log2(4497813698560) bits; i.e. 42.03236104393261 bits.

The number of password choices is 8×20; i.e. 160 different passwords. Our formula above thus gives us

log2(26⁷×7×10×8)−log2(8×20) = 34.71043294904525 bits of randomness if the default options for pwgen(1) are used, and one of the displayed passwords is chosen by a user.

Now, whether 34.7 bits or 42 bits is to be considered high or low is not my area of expertise, and I am given to understand that this changes rapidly over time as computing technology advances.
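The numbers above can be reproduced with a few lines. This follows the simplified model stated in the comment (7 lowercase letters, one of them upper-cased, one digit inserted at one of 8 positions), which is an assumption about pwgen's output rather than a description of its actual algorithm.

```python
import math

LETTER_COMBINATIONS = 26 ** 7   # 7 letters, each a-z
UPPERCASE_POSITIONS = 7         # which letter is made upper case
DIGIT_VALUES = 10               # the inserted digit, 0-9
DIGIT_POSITIONS = 8             # where the digit is inserted

possible = (LETTER_COMBINATIONS * UPPERCASE_POSITIONS
            * DIGIT_VALUES * DIGIT_POSITIONS)
total_bits = math.log2(possible)

# The default display is 8 columns by 20 lines of passwords.
displayed = 8 * 20
remaining_bits = total_bits - math.log2(displayed)

print(possible)
print(round(total_bits, 2), round(remaining_bits, 2))
```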

quesera · on Sept 23, 2017

FWIW, I see several examples with two numbers and up to four uppercase letters. There's a clear bias toward lowercase letters though.

teddyh · on Sept 24, 2017

You’re right. Looking at the source code (https://github.com/tytso/pwgen/blob/master/pw_phonemes.c#L59), the algorithm seems to be rather complicated, so I can’t say what the exact number of bits is. But we could certainly calculate an upper bound:

7 letters a-z which are either upper or lower case, plus an unknown digit at an unknown location, gives:

(26+26)⁷×10×8 = 82245736202240 possible passwords, giving log2(82245736202240) = 46.225006121875005 bits. Subtracting the bits for the 8×20 choices of passwords gives

log2((26+26)⁷×10×8)−log2(8×20) = 38.90307802698764 bits as an upper bound of the security of a password chosen by a user from the default output of pwgen(1). This is a bit more than the 34.7 bits I first thought it was, but not much more. And this is an upper bound; since I can see that the source code does not choose each character completely randomly and does, as you say, seem to prefer lower case letters, the correct number of bits is guaranteed to be lower than 38.9.

grzm · on Sept 23, 2017

That's consistent with what I was thinking. Thanks!

pault · on Sept 23, 2017

I have no idea who is downvoting you; this is perfectly correct. In fact, one of the (minor) plot points in the quoted book is a cyphertext getting broken because the person generating one time pad keys looks at the letters!

oh_sigh · on Sept 23, 2017

How would the attacker know that you mashed the keyboard when answering 'What high school did you go to?'?

brewdad · on Sept 23, 2017

Most likely from a "helpful" CS agent offering up the hint above. "It's really weird" or "I've never seen that one before" or just an odd chuckle. Anything an attacker could use to gain an advantage will be used to compromise you eventually.

dllthomas · on Sept 23, 2017

Or because you posted about it on HN...

macintux · on Sept 23, 2017

How hard do you think it is to get a bored call center employee to give you enough of a hint to know that it’s random characters?