

Towards Reliable Storage of 56-bit Secrets in Human Memory [pdf] - gwern
http://www.usenix.org/system/files/conference/usenixsecurity14/sec14-paper-bonneau.pdf

======
twotwotwo
Humans being the error-prone creatures we are, I wonder whether it could work
better to use a longer word-based code but allow for certain kinds of errors:
for example, 1) word order doesn't matter, 2) it corrects a word not in the
built-in word list to the "closest" word in it, 3) maybe you only need, say,
seven of the eight words to be right to get in (then it might remind you of
the real code). You can do a little calculation to see how many words you need
to reach the desired number of bits.

Pretty different from passwords as we know them (#3 means you can't simply
store a bcrypt/scrypt'd code--you could build complicated tricks to square
"allow one word wrong" with not storing plain PWs: each word is 10 bits, what
you tell the user is really their secret code plus a parity/ECC word, and
correction happens before a conventional password check--that rabbit hole goes
deep, though). But if all this gets average folks remembering more entropy
than before, that makes it kind of interesting.

On the study (which I admit just skimming), a survey in 2010 suggested about
half of Mechanical Turk users were located in the US
([http://www.behind-the-enemy-lines.com/2010/03/new-demographics-of-mechanical-turk.html](http://www.behind-the-enemy-lines.com/2010/03/new-demographics-of-mechanical-turk.html))--folks
who learned English at school would probably do better at memorizing a
password in their native language. (I studied French for many years in school,
but I doubt I'd memorize a set of random French words as easily as English
words.)

~~~
harshreality
1 and 2 are easy. Sort the words and use Levenshtein distance to correct
unrecognized words to the nearest valid word, prior to doing the password
hashing. 3 is slightly more difficult; for 7-of-8 you'd have to store 9
hashes: one for the full list, and 8 more resulting from removing one of the 8
words. What you gain in user friendliness costs you: you have to store several
times as many hashes, and you have to run multiple hash computations for each
attempted login.
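A minimal sketch of 1 and 2 (the tiny word list and plain SHA-256 here are
placeholders; a real system would use a list of ~1000+ words and a slow,
salted hash like bcrypt or scrypt):

```python
import hashlib

# Hypothetical built-in word list; a real system would use ~1000+ words.
WORDLIST = ["apple", "brick", "cloud", "drift", "ember", "flint", "grove", "haven"]

def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def normalize(words):
    """Correct each word to its nearest wordlist entry, then sort
    so that word order doesn't matter."""
    corrected = [min(WORDLIST, key=lambda w: levenshtein(word, w))
                 for word in words]
    return sorted(corrected)

def hash_passphrase(words):
    # Stand-in for bcrypt/scrypt of the normalized passphrase.
    return hashlib.sha256(" ".join(normalize(words)).encode()).hexdigest()
```

Since normalization happens before hashing, "cloud ember aple" and "ember
apple cloud" hash identically, so both typos and reordering are tolerated
without storing anything beyond the one hash.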

It should be noted that 1 and 3 both reduce entropy significantly.

A system that lets users omit one of the words in their passphrase will end up
with users omitting one of the words all the time (why would you type more
than you have to, particularly for passphrases of 20-40 characters?); the
effective security of an 8-word passphrase with that system is worse than that
of a 7-word passphrase with no leeway for forgetting words. With a strict
comparison but a passphrase of one fewer word, the attacker has to guess the
right 7 words. By allowing a word to be forgotten, the attacker can guess any
of 8 different 7-word passphrases and get in.

Any system that reminds you of the full passphrase has to store the passphrase
unhashed (or the equivalent--for example, storing with each sub-hash a hint
for the missing word), which is unacceptable.

~~~
twotwotwo
Yeah, I did some rough calcs (after my anti-procrastination timer expired and
I couldn't waste more time here, haha). If you have ~1000 words, ~10
bits/word, seven words is 70 bits; you lose about 12 of those by allowing
reordering, and obviously 10 if you let them flub a word (via a parity word or
storing 8 hashes). Seven words could be 48 bits with both kinds of variation
allowed, or 58 or 60 with only one kind allowed. It might also help users
remember if you give them a limited choice of code (pick one of 4 or 8 codes,
say), which could cost you up to 2-3 more bits (on the paranoid assumption
that users will pick according to predictable patterns).
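The rough calcs above can be checked directly (using 1024 words so each word
is exactly 10 bits):

```python
from math import factorial, log2

words = 7
wordlist_size = 1024  # "~1000 words": exactly 10 bits per word

raw = words * log2(wordlist_size)   # 70.0 bits for an ordered 7-word code
reorder = log2(factorial(words))    # ~12.3 bits lost if any order is accepted
flub = log2(wordlist_size)          # 10 bits lost if one word may be wrong

print(round(raw))                   # 70
print(round(raw - reorder))         # 58
print(round(raw - flub))            # 60
print(round(raw - reorder - flub))  # 48
```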

How much entropy you target depends on the system. If it needs to work as a
crypto key, then you really need entropy, and good luck to you. (Maybe you
have folks remember a piece and write a piece down, and use an expensive
key-stretching scheme to stretch the concatenated result out.) If it's for a
login system where you can rate-limit, then, hey, even 32 bits is a lot if the
attacker has to wait one second between tries.

Just realized, if you use a parity word to recover from the user flubbing one
word in a login system, you don't need to store the cleartext in order to
correct the word they missed--you just redo the parity calculation on the
remaining words to recover it. Yay. Also, to recover after a parity error in a
seven-word phrase, you have to check seven different potential passphrases
(assume word 1 was erased, apply parity to recover it, check, repeat), so I
guess it takes another three bits off your security on top of the 10 of losing
a word.
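A sketch of that parity-word scheme (hypothetical 10-bit word values, XOR as
the parity operation, and plain SHA-256 standing in for a slow password
hash):

```python
import hashlib
from functools import reduce

def parity(values):
    """XOR of 10-bit word values; the 'parity word' given to the user."""
    return reduce(lambda a, b: a ^ b, values)

def issue_code(secret_words):
    """What the user memorizes: their secret words plus one parity word."""
    return secret_words + [parity(secret_words)]

def stored_hash(full_code):
    # Stand-in for bcrypt/scrypt of the full (secret + parity) code.
    return hashlib.sha256(repr(full_code).encode()).hexdigest()

def try_login(entered, target_hash):
    """Accept exact matches; otherwise treat each position in turn as an
    erasure, recompute it from the parity of the rest, and re-check.
    Checking n extra hypotheses is what costs ~log2(n) bits of security."""
    if stored_hash(entered) == target_hash:
        return True
    for i in range(len(entered)):
        rest = entered[:i] + entered[i + 1:]
        candidate = entered[:i] + [parity(rest)] + entered[i + 1:]
        if stored_hash(candidate) == target_hash:
            return True
    return False
```

Because the XOR of the full code (secret words plus parity word) is zero,
erasing any single position and XORing the rest reconstructs it, so no
cleartext needs to be stored.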

On users just omitting a word all the time if the system lets them, I think 1)
you require them to enter the full number of words every time, 2) if you see
an error recoverable using parity, first you tell them which word was wrong
and ask them to re-enter it, 3) if they still can't get it, you tell them what
the word was (shoulder-surfer issues obv) and they have to type it. Then you
minimize how much you show, you (try to) keep the user from slowly forgetting
the passphrase one word at a time, and you make sure that entering your whole
passphrase is the fastest way to log in.

Though it's nice what weird stuff you can do while keeping a hashed cred DB:
if it were a login code (not a crypto key), and keeping the clear version
around brought big wins to other aspects of the system (for example, it let
you do some significantly better error-tolerance strategy), I'm not sure you
have quite the same level of obligation around a random token that you give
out specifically for your app as you do around a password the user entrusts
you with that they also use elsewhere.

~~~
darkmighty
One of the greatest problems with login systems currently is not the entropy
but the min-entropy [1] of the passwords. This random-keyword approach greatly
reduces that main threat, so I'd say it's an acceptable tradeoff to get
entropy for your average user _and_ have a reliable system as you suggested.
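For a concrete picture of the distinction (a made-up toy distribution, not
real password data): Shannon entropy averages over the whole distribution,
while min-entropy is set entirely by the single most likely password, the one
an attacker tries first.

```python
from math import log2

# Toy distribution (invented for illustration): one hugely popular
# password with 10% probability, plus 2^20 equally rare ones.
n = 2 ** 20
p_top = 0.10
p_rest = 0.90 / n

# Shannon entropy averages over everything; min-entropy only sees
# the single best guess an attacker can make.
shannon = -(p_top * log2(p_top) + n * p_rest * log2(p_rest))
min_entropy = -log2(p_top)

print(round(shannon, 1))      # 18.5
print(round(min_entropy, 1))  # 3.3
```

The distribution looks respectable on average, yet the first guess succeeds
10% of the time, which is what matters for online guessing attacks.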

I don't think your parity method works for error recovery (it would probably
work if it were a lot more complicated). If words are stored as 10-bit values,
each bit is independent of the others. Let's assume that if a user makes a
mistake, the mistaken word value is uncorrelated with the true one. Then
parity will only allow error _detection_ , and your method of assuming
erasures will simply generate N different correct-word hypotheses.

Interestingly, if you don't need permutation tolerance (I don't think you'd
need it anyway; people create meaning by associating the words in the given
order), you could use an ECC to achieve an M-out-of-N passphrase securely,
allowing both erasure and error correction!

For example, with 4096 words, you could do 6-out-of-8 words with two errors
(or 5-out-of-8 with erasures) by splitting each word into four 3-bit groups
and using a GF(8) Reed-Solomon [2] code (an [8,5,4] code). The resulting
system would have 5*12 = 60 bits of min-entropy. There's plenty of room to
present some 8 choices of combinations to users and let them pick one, leaving
57 bits.
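The erasure case (5-out-of-8 over GF(8)) can be sketched with Lagrange
interpolation: encode 5 message symbols as a degree-4 polynomial and hand out
its values at 8 points, so any 5 surviving values determine the other 3. This
is a single 3-bit "slice"; a real 4096-word scheme would run four such codes
in parallel, and correcting errors (rather than erasures) needs a full RS
decoder, so treat this as illustration only.

```python
# GF(8) arithmetic with primitive polynomial x^3 + x + 1
EXP, LOG = [0] * 14, [0] * 8
v = 1
for i in range(7):
    EXP[i], LOG[v] = v, i
    v <<= 1
    if v & 0b1000:
        v ^= 0b1011
for i in range(7, 14):
    EXP[i] = EXP[i - 7]

def gmul(a, b):
    return 0 if 0 in (a, b) else EXP[LOG[a] + LOG[b]]

def gdiv(a, b):
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 7]

def encode(msg):
    """5 symbols (0..7) -> 8 codeword symbols: evaluate the degree-4
    polynomial with coefficients msg at the points 0..7."""
    def ev(pt):
        acc = 0
        for c in reversed(msg):          # Horner's rule
            acc = gmul(acc, pt) ^ c
        return acc
    return [ev(pt) for pt in range(8)]

def recover(known):
    """Rebuild the full codeword from any 5 surviving positions
    (known: {position: symbol}) by Lagrange interpolation."""
    pts = list(known)[:5]
    def interp(at):
        total = 0
        for i in pts:
            num = den = 1
            for j in pts:
                if j != i:
                    num = gmul(num, at ^ j)
                    den = gmul(den, i ^ j)
            total ^= gmul(known[i], gdiv(num, den))
        return total
    return [interp(pt) for pt in range(8)]
```

With three positions erased, the remaining five recover the codeword exactly;
a 12-bit word would contribute one 3-bit symbol to each of four such
codewords.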

[1] [http://en.wikipedia.org/wiki/R%C3%A9nyi_entropy#Min-entropy](http://en.wikipedia.org/wiki/R%C3%A9nyi_entropy#Min-entropy)

[2] [http://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction](http://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction)

~~~
twotwotwo
Getting to this late, but if I hear you right, generating a set of hypotheses
was exactly what I was going for with the parity. If you see a, say, 6-word
input where the parity doesn't work out, you "erase" each of the six words in
turn, replace it with the parity of the other words, and check whether the
result is the user's password. It'd mean the user (/attacker) could make you
do 6 hash checks each time they tried to log in, so it'd cost you ~13 bits of
security, not just 10.

And, yes, min-entropy is a really helpful concept here. It captures well that
nobody's password is a good crypto key, while also addressing the attacks that
try the really common overused passwords on a lot of accounts.

Very cool that you could tolerate 2 errors/3 erasures and still have 60 bits
with ECC if you trade off ordering. ECC and coding theory is one of those
areas where I wish I had slightly more CS education, so I could do more than
just gawk at its power.

------
minopret
For the sake of comparison, and as a voice of experience: I actually do use
fifty-six-bit secrets for my various needs, as follows. The following secrets
are at least fifty-six bits in size:

* Nine "C isgraph" characters chosen uniformly at random. Those are lowercase, uppercase, digits, and 32 more typewriter symbols. This is the kind of password that I actually use for most systems where I frequently need to type the password myself. I am not making any claims about which kind of password everyone should use.

* Ten alphanumeric characters chosen uniformly at random. Those are lowercase, uppercase, and digits. This is the kind of password that I use for systems where I may need to read someone the password over wired telephone service. That service has the advantage, as I understand, that eavesdropping on it is a Federal crime in the USA.

* Five words chosen uniformly at random from the Diceware list of 7,776 words. This is the kind of password I use to lock each of my AES-encrypted master password files.
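A quick check that each of the three formats above clears 56 bits (94
`isgraph` characters, 62 alphanumerics, 7,776 Diceware words):

```python
from math import log2

print(round(9 * log2(94), 1))    # 59.0 bits: nine isgraph characters
print(round(10 * log2(62), 1))   # 59.5 bits: ten alphanumerics
print(round(5 * log2(7776), 1))  # 64.6 bits: five Diceware words
```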

In practice I have presumably degraded the security of my passwords despite my
efforts. Here are two obvious ways to degrade the bit size of these secrets:

* Failing to choose the elements in an entirely unpredictable way. Rolling fair dice is OK. Using something built in a standard way for the purpose of generating passwords, such as KeePass, is OK. Mentally choosing combinations of characters is not OK, no matter how quirky you think your mind is.

* Failing to keep every part of the secret completely concealed from everyone who does not own it. Letting people know that a password is a particular one of the above types is OK. Letting people see over your shoulder is not OK.

------
tunesmith
I wish I could use these kinds of passwords more often, but too many sites
enforce the "must contain one number, one capital letter, and one punctuation
symbol", oftentimes with a _maximum_ password length.

~~~
aftbit
Absolutely insane. I'd like to see a browser plugin that will detect these
sites and give you a form letter asking them to fix their security standards
as well as the appropriate mailto link.

~~~
avolcano
Something like [https://twofactorauth.org/](https://twofactorauth.org/) for
password limitations would be great.

------
Scaevolus
My attempt at reliable storage of large secrets:
[http://rmmh.github.io/abbrase/](http://rmmh.github.io/abbrase/)

An example 60-bit secret: rolmangrionepolemp (mnemonic: "role man grinding one
political employment"). A bit of a mouthful, but memorizable.

~~~
sillysaurus3
Would you help me understand why this isn't closer to the security of a random
6 character password? It uses the first three letters from each word, and the
first three letters of English words aren't random.

I'm trying to read through the source code to understand it, but the Python
version seems to be performing (slightly?) different steps than the JS
version. Is the Python version an accurate description of your technique?

At first glance, it _seems_ like it could be a 60-bit secret, because a random
18 letter a-z password would be an 84-bit secret, so clearly you're taking the
non-random nature into account somehow. But it's not clear to me exactly how
it's 60-bit.

The page you linked to does describe the high-level process, sort of, but it
doesn't give an example, or explain "This is secure because..." so I figure I
should ask you.

~~~
colmmacc
> Would you help me understand why this isn't closer to the security of a
> random 6 character password?

There are about 80 characters you can type, so a 6-character secret has
262,144,000,000 combinations (80^6), which is about 38 bits (2^38 is
274,877,906,944). 2^60 is much larger: 1,152,921,504,606,846,976, and its
sixth root is 1024. So 1,024 memorable "characters" are needed.
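A quick sanity check of the 80^6 and 2^60 figures:

```python
from math import log2

print(80 ** 6)                      # 262144000000
print(round(log2(80 ** 6), 1))      # 37.9
print(round((2 ** 60) ** (1 / 6)))  # 1024
```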

There are about 4000 unique word-based three-letter prefixes:

    $ cut -c 1-3 /usr/share/dict/words | tr 'A-Z' 'a-z' | uniq | wc -l
    4051

So 1,024 of those being associated with memorable words seems credible.

~~~
dTal
Bear in mind "uniq" only removes _adjacent_ duplicates. On my machine:

    $ cut -c 1-3 /usr/share/dict/words | tr 'A-Z' 'a-z' | uniq | wc -l
    5089

    $ cut -c 1-3 /usr/share/dict/words | tr 'A-Z' 'a-z' | sort | uniq | wc -l
    3434

p.s. How do you do newlines on HN?

------
deepblueocean
Video. Excellent presentation, for those who are interested:
[https://www.usenix.org/conference/usenixsecurity14/technical-sessions/presentation/bonneau](https://www.usenix.org/conference/usenixsecurity14/technical-sessions/presentation/bonneau)

------
bayesianhorse
Or you could use the method of loci and learn a 10 character password in 5
minutes. I wrote a simple program which creates a random password and suggests
keyword images...

~~~
eksith
Would you be willing to share the program or source? That sounds like an
interesting project.

~~~
bayesianhorse
It was a rough Kivy app, which I'm not committed to publishing, currently.

You can easily write your own, once you define a system for memorizing the
characters.

The idea is to use animal names for upper case characters, inanimate objects
for lower case, and special mnemonics for other characters you want to use. A
doormat for underscore, a bomb for exclamation mark etc.

Then it's just a dictionary lookup.
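A sketch of that lookup (the mnemonic tables here are invented examples
matching the doormat/bomb idea; a real app would cover the full character
set):

```python
# Hypothetical mnemonic tables: animals for uppercase, inanimate objects
# for lowercase, special images for everything else.
UPPER = {"A": "antelope", "B": "badger", "K": "koala"}
LOWER = {"a": "anvil", "b": "bucket", "k": "kettle"}
SYMBOL = {"_": "doormat", "!": "bomb", "7": "sickle"}

def password_to_images(password):
    """Turn each character into a keyword image to place along a route."""
    images = []
    for ch in password:
        table = UPPER if ch.isupper() else LOWER if ch.islower() else SYMBOL
        images.append(table[ch])
    return images

print(password_to_images("Kab_!"))
# ['koala', 'anvil', 'bucket', 'doormat', 'bomb']
```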

The tricky thing is to make the users understand how the method of loci works
(Minecraft is a good medium for this), and that you need to repeat the
password at least once per day until it is completely memorized.

------
hyperbovine
Of course the human brain routinely stores memories which are vastly larger
than 56 bits. The trick is in mapping those to passwords. For instance, if you
went to grade school in the US I claim you can easily remember the following
password: ipattfotusoaattrfwisonugiwlajfa. ("I pledge allegiance to the
flag...") Song lyrics tend to work particularly well.
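The mapping is just first letters of the phrase, which a few lines can
reproduce (illustration only, not a recommendation):

```python
def phrase_to_password(phrase):
    """Keep the first letter of each word of a memorized phrase."""
    return "".join(word[0].lower() for word in phrase.split())

print(phrase_to_password("I pledge allegiance to the flag"))  # ipattf
```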

~~~
WhitneyLand
This is an insecure approach. Dictionary-style attacks can now account for
phrases from vast libraries of literature, pop culture, or anything else
you're likely to remember.

~~~
indubitably
But the suggestion is to carry out a simple hash on the input phrase--one
could use the second or third letter, for instance. (Or the first, second,
third, first, second, third…)

How could a dictionary-style attack predict all possible such arrangements?

~~~
WhitneyLand
Why not? The only limit on password cracking techniques is combinatorial
complexity. This gives an overview:
[http://arstechnica.com/security/2013/10/how-the-bible-and-youtube-are-fueling-the-next-frontier-of-password-cracking/](http://arstechnica.com/security/2013/10/how-the-bible-and-youtube-are-fueling-the-next-frontier-of-password-cracking/)

------
rsync
I think it's much more interesting to explore hash algorithms optimized for
memory.

That is, an md5- or sha256-style checksum program that you could write from
memory.

That would make it possible to bootstrap a lot of use cases and OS loading
and hardening scenarios--even disconnected from the network.

------
deegles
I wish there were a method to do PGP in human memory...

