Cheers :) See? I knew there was some math. So how do you feel that compares to a...

teddyh · on Sept 23, 2017

Detecting the randomness of a user-generated password is like detecting randomness in general; it can’t be done¹. Is a number like 392872956 random, or is it derived by using some obscure but guessable procedure? You can’t know just by looking at the number. Even if a user thinks they are choosing randomly, subconscious biases are very powerful. The same principle applies to word and character based passwords, so the only safe course is to assume that anything chosen by a user directly is not random at all.

1. http://dilbert.com/strip/2001-10-25

grzm · on Sept 23, 2017

Sure. So is there nothing to my intuition above? If you were to have users choose between (a) and (b) above, is (b) generally safer than (a)? Much safer? Only marginally so? When using a password manager that presents 10 passwords, should I always choose the first one to remove my choice from the equation? Are those few bits I've removed that important, given that the entire set is random?

I'm not trying to catch you out here. I'm trying to see how far my intuition works in this case and how to read you note in the context of the rest of what you've said.

teddyh · on Sept 23, 2017

A user-chosen password have exactly 0 bits of guaranteed randomness. A randomly generated password has X bits of randomness, and a list of Y passwords of X bits each, where the user is allowed to choose exactly one of the passwords, has exactly X−(log2(Y)) bits.

So, to answer your questions: Your intuition is correct – since user-chosen passwords do not contain any guaranteed randomness, generated passwords are better. How much better depends on the values of X and Y in the formula above. The value of X can only strictly speaking be said to depend on the generating algorithm for the passwords, and not any specific value like length or presence of special characters, etc. Yes, I try to always force myself to choose the first one of generated passwords if many are available. The importance of doing that, i.e. preserving those bits, depends of the size of X; a large value of X might stand to lose log2(Y) bits without any real downside.

The default pwgen(1) password algorithm appears to generate a display of 8 columns by 20 lines of passwords, each 8 characters long, like so:

    Uvee5exo aiXae6mi OoR5eiph thoo1Mo3 Ac0quiep woo5Ing7 uh2AiXei poh1Aigh
    ab1Mayai aeHaing4 eip0Wae1 Ho0jaeku Ahxah4Ec Kei4daez Gohmaib6 Chisaib3
    eiphim5U jiepai8C aeXohN3u SeiDahy2 cee9oiVu kei1Eel2 foht6iuY Kievei6o
    Eequ6Aeb eeng9wuS Kog6cie3 sapi7ooP ek9Aitie ohX6eese Eez5oth8 evaeL3oo
    gae1caeF io8EiNga ceaxaY6t eiZ1Lee1 Wagh2Bee maPh0een zoBi0Pee Kou8iel9
    ahj7Ooph eB9beGhe MieV6pe1 loGhae0F ughueTh1 eBohHae2 Eiv1aaQu ahRohv7b
    Iehoo7qu Ga6Buwuh We0UK9Ee gu8ahSoh Ahn2ash8 pee7Airo ey1Faish aeFaiQu1
    Einge6ai vi6uWeir eine8ooK Bae0lugh hewu5Hol hohd1nuH ohn2aeVa nei3oo4L
    Oob6aira Aij4Gila hieNgih7 Ax5iej7O lohLood6 thoo2ahG Thie6aeh Cee7Aajo
    zoot0Ief VaeN4uL5 SaiLa6ie Fii8Xeer uPhoo7os Iew7roh8 Kootu6ei Ohngue7e
    xah4aiPh OVeiT0th Ca3ohjae uiCohs0N Quei9eet Xoh5oobo eicaRae2 ahp1Joom
    Eequeer5 deiZ5uZa ApooSah4 Ca2wuale Xei1aifa qua1jooR oo9haiJo ie2rei2K
    sah4Kai7 Aiphoos3 Di7naip5 uo4sooG3 Aiw7luph ooL6xir0 seo2ooBo shib8eeL
    aem7kieJ aphei9Ie uo1ohF9A choh4Noo EijuF5Uy DohmieJ8 op5cieSh Barauk1o
    EePhi2el oFabee9i AiGhoP8G yaeZa6ah ca6ooTh8 Houc2ro4 Pi9phee5 Ahng1ief
    Eew2Eewu Vu3Wahm6 niep7Wei Gezai2no loR7noh5 aiph0aeT eiW2ap7o aiD6MeSu
    ahgh5Uaf ahse4Aid Yaenei5t ooV4mooc HauYey3r pho1uSah uZuy8fie aiTiek8B
    osh8Chae ee1Ju2Uo eet4Xo4U cheaw6Ee Ri2eoyei eesooh7X du3Pee0a hi8chohV
    ung6Ju7u thahMai1 Cho5ahs0 beipam6A ooSeich0 pohx5Eiy Iene0me8 eBo7aegi
    ohn6uaT7 iami8Aef Nooh6yai vaPhae7u aipai6Oe yaiPh0ue apohSh7i aiNgu8zo

All the characters in each password are lower case letters a through z, except one, which is always a digit, and one other, which is an upper case character, A through Z.

These assumptions give us all the information we need to calculate the actual number of guaranteed random bits in a password chosen from this output. There are 7 letters in a password, each a-z, which gives 26⁷ combinations. Then one of the 7 characters is made upper case, which multiplies the number of possible passwords by 7. Then a random digit (0-9) is inserted in a random place (1-8), which multiplies it again with 10 and 8, respectively. The resulting number is

26⁷×7×10×8 = 4497813698560

Now, 4497813698560 possible passwords is equal to log2(4497813698560) bits; i.e. 42.03236104393261 bits.

The number of password choices is 8×20; i.e. 160 different passwords. Our formula above thus gives us

log2(26⁷×7×10×8)−log2(8×20) = 34.71043294904525 bits of randomness if the default options for pwgen(1) is used, and one of the displayed passwords is chosen by a user.

Now, whether 34.7 bits or 42 bits is to be considered high or low is not my area of expertise, and I am given to understand that this changes rapidly over time as computing technology advances.

quesera · on Sept 23, 2017

FWIW, I see several examples with two numbers and up to four uppercase letters. There's a clear bias toward lowercase letters though.

teddyh · on Sept 24, 2017

You’re right. Looking at the source code (https://github.com/tytso/pwgen/blob/master/pw_phonemes.c#L59), the algorithm seems to be rather complicated, so I can’t say what the exact number of bits is. But we could certainly calculate an upper bound:

7 letters a-z which are either upper or lower case, plus an unknown digit at an unknown location, gives:

(26+26)⁷×10×8 = 82245736202240 possible passwords, giving log2(82245736202240) = 46.225006121875005 bits. Subtracting the bits for the 8×20 choices of passwords gives

log2((26+26)⁷×10×8)−log2(8×20) = 38.90307802698764 bits as an upper bound of the security of a password chosen by a user from the default output of pwgen(1). This is a bit more than the 34.7 bits I first thought it was, but not much more. And this is an upper bound; since I can see that the source code does not choose each character completely randomly and does, as you say, seem to prefer lower case letters, the correct number of bits is guaranteed to be lower than 38.9.

grzm · on Sept 23, 2017

That's consistent with what I was thinking. Thanks!