
Xkcd Password Generator - preshing
http://preshing.com/20110811/xkcd-password-generator
======
Dove
I find the discussion surrounding the XKCD strip alarming for the superstition
it reveals about password generation. The particular theme I am alarmed by is
that people seem to think that if a password _looks_ alien, or was difficult
for _them_ to come up with, it will be hard for a machine to guess.

Look, we're working with big numbers here. You need to do the math.

In this thread alone, I've seen suggestions to use a common dictionary word
translated into another language, or written in l33tsp34k with some
permutations. From a probabilistic perspective, _these are still dictionary
words_ , even though they look like gibberish. The same is true of the common
method of typing a word with one's fingers displaced on the keyboard.

Conversely, I see a lot of argument that these XKCD passphrases would be easy
to guess because they are made up of dictionary words. This misunderstands the
math behind the situation. Even if an attacker knows that your password was
generated via this method, and _even if they know the word list you used_ ,
the password is still hard to guess. The difficulty grows exponentially with
each word in the phrase, and that's pretty fast.

The key with passwords is not to create something that _looks_ random --
something that if you showed it to another human being, they'd have a hard
time deciphering. It's to create something that _is_ random; literally a
result of a throw of the dice for every new password.

Human beings are really bad at creating randomness. There's a demonstration
done in an early statistics class in which the professor divides the class
into two groups. He tells one to toss a coin a hundred times and record the
sequence of heads and tails, while the others are to write down a sequence
they think is random using their imagination. The papers are completed and
mixed and then -- magically! -- he is able to sort them into the two types,
easily and with high accuracy.

The lesson is this: even when you think you're being random, you probably
aren't. You're probably using the same tricks everyone else is, and making the
same mistakes.

I would trust passwords that come out of a script like this to be _far_ more
secure than passwords anyone (myself included) made up, no matter how random
they're trying to be.

~~~
Cushman
This should be higher up. It's scary to see people — intelligent people, I'm
sure — saying things like "And that goes even higher when you add
punctuation!"

No, it doesn't. All of the reasonable punctuation you could add to a sentence
adds only a few bits of entropy at best. It also makes the sentence harder to
remember— was there a comma or not? Adding unreasonable punctuation or symbols
is even worse— you get slightly more entropy at the cost of a password that is
way harder to remember.

The crucial point here is that four random words, separated by spaces,
selected at random only from the 2000 most common English words — EVEN IF your
attacker knows that your password is four random English words from the 2000
most common separated by spaces — _already is a very long random string_. If
it's not random enough, each common English word you add adds 11 bits, and is only
marginally harder for most English speakers to remember. Conversely, choosing
"random" extra characters to add in makes it slightly longer, very slightly
more random, and way, _way_ harder to remember.
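The "11 bits per word" figure can be checked with a short sketch. The function name `passphrase_bits` is only illustrative, and the calculation assumes every word is drawn uniformly and independently from the list, which is exactly the scheme described above.

```python
import math

def passphrase_bits(wordlist_size: int, num_words: int) -> float:
    """Entropy in bits of num_words drawn uniformly and independently."""
    return num_words * math.log2(wordlist_size)

# Each extra word from a 2000-word list adds log2(2000), just under 11 bits.
for n in range(1, 7):
    print(n, "words:", round(passphrase_bits(2000, n), 1), "bits")
```

The attacker is assumed to know both the word list and the number of words; the bits counted here are only the ones that come from the random draws.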

~~~
a3camero
It's certainly a "very long random string" without context but as people have
pointed out above, it's actually not a very good password if people adopted
this pattern widely (and you said the attacker knows this).

2000^4 = 16000000000000 possible passwords = 1.6E13 = ([A-Z] + [a-z] + [0-9] +
[!@#$%^&*()])^7.1ish. So, your four words from the 2000 word list are equal
to a 7ish character password that looks like "Av#12*GH". I'm not sure if you
meant that seven characters was "very long" but I wouldn't say it is. Still a
very strong password but maybe not as random as it appears to be when the
pattern is known.
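The "7.1ish" exponent can be reproduced as follows. This is a sketch that assumes the symbol class is the ten shifted-digit characters `!@#$%^&*()`, giving a 72-character alphabet; `equivalent_length` is a made-up helper name.

```python
import math

# Assumed alphabet: 26 upper + 26 lower + 10 digits + the 10 symbols !@#$%^&*()
ALPHABET_SIZE = 26 + 26 + 10 + 10

def equivalent_length(search_space: int, alphabet_size: int) -> float:
    """Length of a uniformly random string with the same number of possibilities."""
    return math.log(search_space) / math.log(alphabet_size)

# Four words from a 2000-word list, expressed as random characters.
print(round(equivalent_length(2000 ** 4, ALPHABET_SIZE), 2))
```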

~~~
j_baker
And yet the point is moot because no one is going to use a password like
"Av#12GH".

~~~
bigiain
I have a password file here with several hundred passwords just like that
(actually, they're all 12 chars with upper/lower case, digits, and "special
chars", as chosen and stored by 1Password...)

Joe Public is unlikely to use passwords like that, but I'm 100% sure I'm not
the only hackernews reader who does.

------
ddlatham
A lot of comments here seem to be missing the point.

The main point is to use passwords that give you the most "bang for the buck"
in the sense of adding the most bits of entropy for the least difficulty of
remembering. Adding an extra number, or punctuation, or certain numbers of
repetitions generally adds only a little bit of entropy for a significant cost
in additional challenge to your memory.

Our minds are well suited to remembering combinations of common words, and by
stringing a few such words together, you can generate a larger search space
than using a single word with a few substitutions. Even if the attacker knows
the scheme you're using, he still must search through the space of
combinations of common words, which XKCD is pointing out is quite large.

~~~
eLod
i think you are missing the point: passwords should be hard to guess first and
should be easy to remember second. the former is the stronger need.

let's say there are 500.000 english words you are choosing from and you use 4
words. that gives you 500000^4 possibilities. let's assume the words average
about 5 characters, so we will compare this to a 20(=4 words * 5 characters)
character long password made of 26 types of character (english alphabet, not
using numbers and other special characters), that gives you 26^20
possibilities. and 26^20 - 500000^4 ~= 2x10^28, or put it this way: (26^20) /
(500 000^4) = 318 850.382..

i know a random sequence of 20 characters is very hard to remember, but
500.000 is an overestimation too. let's say we use special symbols too (50
characters) and the word dictionary has 100.000 words. (50^12) / (100 000^4) =
2.44 so we can say it is better to have a 12 character long password (made of
alphanums + symbols) than 4 random word concatenated (i think 12 is somewhat a
'standard' for 'sensitive' passwords). and i would argue that on the long term
multiple concatenated passwords are very hard to remember. i'm not saying this
is a terrible approach, just not the silver bullet to the 'password problem'
(which xkcd never claimed of course, and for 'non sensitive',
'reused'/'throwaway' passwords it may be a viable option).

edit: and i forgot about case sensitivity too.
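Both ratios quoted above can be verified directly, and converting them to bits makes the comparison easier to read. A small sketch; `space_ratio` and `bits` are illustrative names, and the alphabet and dictionary sizes are the ones assumed in the comment.

```python
import math

def space_ratio(alphabet: int, length: int, dictionary: int, words: int) -> float:
    """How many times larger the random-character space is than the word space."""
    return alphabet ** length / dictionary ** words

def bits(choices: int, picks: int) -> float:
    return picks * math.log2(choices)

# 20 lowercase letters vs 4 words from a 500,000-word dictionary
print(space_ratio(26, 20, 500_000, 4), bits(26, 20), bits(500_000, 4))

# 12 characters from a 50-symbol alphabet vs 4 words from 100,000
print(space_ratio(50, 12, 100_000, 4), bits(50, 12), bits(100_000, 4))
```

A ratio of 2.44 is a difference of a little over one bit, so under these assumptions the 12-character password and the four-word phrase are in the same class.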

~~~
eykanal
Don't forget spaces. And Poland.

Another point is that letter placement within words is significantly non-
random. By intelligently choosing which letters to try in each position, the
hacker could at the very least minimize the number of tries by an order of
magnitude for the first word.

~~~
artmageddon
I probably shouldn't announce, in a forum, that using Don't Forget About
Poland! as a passphrase seems awfully tempting for someone like me :)

(American by birth, Polish by heritage)

Speaking of the example I just presented, how much more effective would it be
to include special characters within these long passphrases? Obviously the
goal is to be able to remember them, but surely most if not all of us, are
already using special characters for our passwords.

~~~
pyre
When counting the entropy you would probably count each word as a single
entry, and each special character as an entry (and disregard spaces).

* By capitalizing the words you've doubled the search space for words (assuming that the search space starts with all words lowercased)

* You could increase the search space for each word by 200% (from the space of all lowercase words) by including the possibility of words in all caps (it's unlikely for people to start using alternating case in the middle of words).

* The ' in "Don't" doesn't increase the search space that much because there are a small number of (common) contractions like that, and each of them would only break down into 3 permutations:
    
    
      don't
      dont
      don t
    

(though the last one is highly unlikely). So you're adding maybe 30 more words
to a search space much larger than that.

* As far as the special character is concerned, it probably doesn't add too much to the search space. You can break down your phrase like so:
    
    
      Don't Forget About Poland!
                  ||
                  \/
      {item} {item} {item} {item}{item}
                  ||
                  ||  Disregard whitespace (acquire entropy!)
                  \/
      {item}{item}{item}{item}{item}
    

So now you've got 5 items. Each item could be either a word or punctuation.
The search space for words is _huge_. The search space for punctuation is
small. Your algorithm just has to realize that if it chooses punctuation for
one of the items, then it doesn't bother to use whitespace to separate it from
the preceding word ("word," vs "word ,").

* You can also further reduce the effects of punctuation on the search space by realizing that punctuation will almost always follow a word, and not other punctuation. This also discounts punctuation as the first item in the passphrase too.

Edit:

Upon further thought, if the attacker uses a simplified algorithm to account for
upper-/lowercase, then it may not have that much of an effect on the search of
each individual item (i.e. n! * 4 instead of (n+4)!). An attacker could break
the common instances of case down into:

    
    
      * All words lowercased  "don't forget about poland!"
      * All words uppercased  "DON'T FORGET ABOUT POLAND!"
      * All words titlecased  "Don't Forget About Poland!"
      * First word titlecased "Don't forget about poland!"
    

This discounts the possibility of people alternating titlecase across words,
because that's probably as likely to happen as people alternating case within
words (e.g. WoRdS lIkE ThIs). Granted, this also discounts proper nouns in the
middle of the passphrase (things that don't require extra effort for people to
remember to capitalize).
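The four casing patterns listed above are cheap for an attacker to enumerate. The sketch below is one way it could be done; `case_variants` is a hypothetical helper, not something from a real cracking tool.

```python
def case_variants(phrase: str) -> list:
    """Return the common casings of a candidate passphrase, without duplicates."""
    lower = phrase.lower()
    variants = [
        lower,                                               # all lowercase
        phrase.upper(),                                      # all uppercase
        " ".join(w.capitalize() for w in lower.split(" ")),  # each word title-cased
        lower[:1].upper() + lower[1:],                       # only the first word
    ]
    # dict.fromkeys keeps the first occurrence of each variant, in order
    return list(dict.fromkeys(variants))

for candidate in case_variants("Don't Forget About Poland!"):
    print(candidate)
```

Trying all four multiplies the attacker's work by at most four, which is two bits of entropy for the whole phrase rather than a doubling per word.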

------
nmcfarl
I've been using phrases and sentences as passwords for a while, and I've found
that there are 2 main problems:

1) A lot of sites, still in this day and age, have max password lengths, so I
still have a lot of short passwords. Usually this is bank sites and the like.

2) Password entry fields are often very short visually, and with a long
password getting lost is much easier. I find I have to type them over A LOT.

The second is actually the more annoying problem.

~~~
colanderman
Don't forget sites that require: "your password MUST contain at least one
number, one uppercase letter, and one of the following characters: !, @, #, or
$, but not %, ^, &, or *". I slap my forehead at how counterproductive these
requirements are.

~~~
Simucal
What could the reasoning behind those requirements possibly be?

~~~
Lexarius
Usually the symbols involved are used by SQL or some other layer, and the
programmers insert the password directly into the query string because they
don't know any better. This leads to SQL injection and other issues.

So rather than discovering the correct way to do things, they try to prevent
you from using any characters that might be involved in an SQL injection.

In some cases the guys on the backend know what they're doing, but the
requirement can still be passed down from on high from some manager who
absorbed the practice from another project.

~~~
enduser
If anyone knew what they were doing the unencrypted password would be nowhere
near a SQL statement.
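A minimal sketch of what that looks like in practice, using only the Python standard library: the password is salted and hashed before it goes anywhere near the database, and every value reaches SQL as a bound parameter. The table layout, function names and iteration count are assumptions for illustration, and PBKDF2 stands in here only because it ships with Python.

```python
import hashlib
import hmac
import os
import sqlite3

ITERATIONS = 100_000  # assumed work factor; tune it for your own hardware

def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

def create_user(db, name: str, password: str) -> None:
    salt = os.urandom(16)
    # Placeholders (?) keep user input out of the SQL text entirely.
    db.execute("INSERT INTO users (name, salt, pw_hash) VALUES (?, ?, ?)",
               (name, salt, hash_password(password, salt)))

def check_login(db, name: str, password: str) -> bool:
    row = db.execute("SELECT salt, pw_hash FROM users WHERE name = ?",
                     (name,)).fetchone()
    if row is None:
        return False
    salt, stored = row
    # Constant-time comparison of the stored and recomputed hashes.
    return hmac.compare_digest(stored, hash_password(password, salt))

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT PRIMARY KEY, salt BLOB, pw_hash BLOB)")
create_user(db, "alice", "correct horse'; DROP TABLE users; --")
print(check_login(db, "alice", "correct horse'; DROP TABLE users; --"))
```

With this shape there is no reason to ban quotes, semicolons or any other character from passwords.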

------
wisty
How about (NOT SECURE YET, IT NEEDS MORE ENTROPY):

    
    
        import random

        from nltk.corpus import wordnet as wn

        # Collect the name of every WordNet hyponym of "animal".
        all_animals = set()
        def add_to_set(animal):
            # Synset.name is a method in NLTK 3+ (it was an attribute in older releases).
            all_animals.add(animal.name().split('.')[0].replace('_', ' '))
            for child in animal.hyponyms():
                add_to_set(child)

        add_to_set(wn.synset('animal.n.01'))
        all_animals = list(all_animals)

        actions = ['ate','chased','killed','fought','kissed',
                   'talked to','hated','loved','ambushed','fled'] # can add more

        # SystemRandom draws from the OS entropy source (os.urandom).
        rng = random.SystemRandom()

        def make_password():
            choice = rng.choice
            return 'the %s %s the %s' % (choice(all_animals), choice(actions), choice(all_animals))
    

If you pruned out 90% of the animals (i.e. the obscure, hard to spell, or
scientific names), this is still about 20 bits. And the passwords are kind of
memorable (I've gotten such gems as "the dodo chased the guppy" or "the
tigress killed the king charles spaniel").

You could also add a humorous adjective ("rabid", "talking", "magic",
"invisible", "evil" ...) or adverb ("roughly", "quickly", "quietly",
"secretly" ...).

You could also add a place name.

~~~
Periodic
Completely random strings of words can be hard for me to remember, but
something like, "the {adjective1} {animal1} {verb} the {adjective2} {animal2}"
would be much easier for me to remember because the words relate to each other
in ways I already understand.

I expect we can get some fairly high entropy from just simple schemes like
this.

However, the length of the password can be a real pain if you have to type it
often, even once a day.

~~~
wisty
You could get about 8 bits per animal, and 5 bits per hand-written verb /
adjective / place (32 choices per category). So that's about 7-10 words you
need in the frame.

You could get decent entropy with: the {adj} {adj} {animal} {verbed} the {adj}
{adj} {animal} in {place}. That's 5+5+8+5+5+5+8+5 = 46 bits.

I'm just wondering if it's worth it.

------
drcode
One slight addition to the xkcd password scheme that would add another order
of magnitude of security would be to have your own personal "salt" that you
add to all your passphrases. In this case, the salt would be a short,
traditional, hard to remember password that you re-use with every xkcd style
password. It would be hard to remember, but you'd only need to memorize it
once.

So if your personal salt is "@T#23a" you would use "@T#23a correct horse
battery staple" on one website and "@T#23a giant bug transistor leech" on
another website.

~~~
cdavidcash
You might want to read the cartoon again to see why this is useless,
counterproductive advice.

~~~
dsmithn
If this kind of thing takes off, it will become easier for dictionary based
password attacks. Using this advice would go a long way towards preventing
this.

~~~
AdamTReineke
Easier, yes, but not easy. A dictionary attack on 4 words is the same as brute
forcing 4 letters except now instead of just 26 letters there are thousands.
2000^4 vs 26^4: about 35,000,000 times more to check.

------
alanh
Careful! This is only using `Math.random` and does not attempt to use
`window.crypto.random` (though most browsers do not support it yet:
<http://jsfiddle.net/alanhogan/trUYu/>) or anything that would attempt to
bring real entropy into the process.

I don’t mean to fault the creator of this page, but at the same time, I would
not trust this generator for important passwords, simply because you cannot
know if others are getting the same 'random' results as you are.

More info on SO:
<http://stackoverflow.com/questions/5651789/is-math-random-cryptographically-secure>

PDF on the topic:
<http://www.trusteer.com/sites/default/files/Temporary_User_Tracking_in_Major_Browsers.pdf>

> _In the Javascript engines of IE (Trident), Firefox (Gecko), Safari (WebKit)
> and Chrome (V8), the output of Math.random() can be used to reconstruct the
> random seed, and thus provide both this seed and the current “JS mileage”
> (i.e. the number of times Math.random() was invoked)._
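For comparison, here is what drawing the words from a cryptographically secure source might look like. The sketch is in Python rather than JavaScript, to match the other code in this thread, and uses the standard `secrets` module, which draws from the operating system's CSPRNG. The short `WORDS` list is a stand-in; a real generator would load a list of a couple of thousand words.

```python
import secrets

# Stand-in word list for illustration only; far too short for real use.
WORDS = ["correct", "horse", "battery", "staple", "giant", "bug",
         "transistor", "leech", "married", "greatly", "snake", "battle"]

def make_passphrase(words, count: int = 4, sep: str = " ") -> str:
    """Pick count words uniformly and independently using the OS CSPRNG."""
    return sep.join(secrets.choice(words) for _ in range(count))

print(make_passphrase(WORDS))
```

Unlike a seeded generator such as `Math.random`, the output here cannot be reconstructed from earlier outputs, so two users do not end up sharing the same "random" sequence.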

~~~
kragen
I wouldn't use a JS program served from somebody else's website to generate my
password anyway. How do I know it's not sending them a copy of the passwords
it generates?

~~~
alanh
Well, I watched network connections and saw none. Do that + use Incognito mode
= you're probably good.

~~~
kragen
He recently changed it to use a random seed sent from the server instead of
the client-side RNG. Over, I believe, unencrypted HTTP. Your suggested
countermeasure would not have detected that attack; indeed, perhaps it was
already in place before you reported no evidence of attacks.

It would, however, have made it harder for him (or your ISP) to tell whose
password they'd stolen.

------
IgorPartola
Put this in your .bashrc:

    
    
      function rpass() {
          strings /dev/urandom | grep -o '[[:alnum:]\/!@#$%^&*()<>,.,{}]' | head -n $1 | tr -d '\n'; echo
      }
    

Then run $ rpass 16 and get a 16 character random password with a fairly high
entropy. Then just use a service like LastPass or a solution like KeePassX or
even a single GPG-encrypted file to store your passwords. Problem solved.

Passwords are evil. Most of them should be treated the way you'd treat your
private SSH or SSL key. Whenever you can eliminate a password and get the user
to authenticate using a third-party identity provider, you are doing them a
favor.

Edit: with 80 possible characters, you get 80^16 possible passwords: 10^19
years at 1000 guesses/second.
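A rough Python equivalent of `rpass`, for anyone who would rather not pipe `/dev/urandom` through `grep`. It assumes the backslash in the bracket expression is taken literally, as POSIX specifies, which gives the same 80-character alphabet; the function names are illustrative.

```python
import math
import secrets
import string

# Alphanumerics plus \ / ! @ # $ % ^ & * ( ) < > , . { } (80 characters in total)
ALPHABET = string.ascii_letters + string.digits + "\\/!@#$%^&*()<>,.{}"

def rpass(length: int = 16) -> str:
    """Random password with each character drawn uniformly from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def bits(length: int) -> float:
    return length * math.log2(len(ALPHABET))

print(rpass(16), round(bits(16), 1), "bits")
```

Sixteen characters from this alphabet come to a little over 100 bits, which is where the astronomically long guessing times come from.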

~~~
yuvadam
Actually LastPass has this option built-in. It can generate a strong password
in-form and directly save it to your password vault.

Very useful.

~~~
IgorPartola
Yes, but I prefer to generate the passwords on my own. I also use this to
generate random passwords for root accounts (sudo FTW), etc.

------
jsulak
I prefer using a program like Password Safe
(<http://passwordsafe.sourceforge.net/>), and use a safe password that's a
long sentence (with punctuation). Then I can use arbitrarily long and complex
passwords for all my accounts, and not have to worry about memorizing them
individually. The password safe can even be synced across computers using
Dropbox.

~~~
nollidge
I prefer KeePass simply because it's got implementations on multiple OSs, as
does Dropbox (to sync the password database file). So I've got it on my iMac,
Android phone, Windows laptop, and Windows work PC.

~~~
mtogo
If you have an iPhone or don't want to use keepassx, you can use an online
password manager like Passpack or Lastpass.

The downside is that you need to really trust the password manager, as they
have all of your usernames and passwords.

------
ajross
I can't help but think that this is a solution to the wrong problem. The big
problem with password security in the modern world really isn't that they're
easy to break, but that they're pervasively reused between sites. So breaking
them (for example, by reading them in plain text out of a dumb database!) in
one place opens up attacks on higher value accounts.

The fix, of course, is to get users to stop re-using passwords between sites.

How does making passwords _more_ memorable fix this? If anything, forcing
users to use random base64 strings strikes me as more secure as they will be
forced into some sort of password locker implementation by their inability to
remember them.

~~~
crizCraig
Right, maybe if you use the first letter of the words in a sentence, like "Hey
Jude, don't make it bad, take a sad song, and make it better." ->
"HJ,dmib,tass,amib." Then you can add in some characters that make it
different for each site without it being obvious which characters you added. I
wrote a blog post on how to create different passwords for sites that are easy
to remember:
<http://craigquiter.com/post/8668237043/creating-and-remembering-good-passwords-plus-some>

------
GFischer
The link posted on the article merits a submission by itself:

"The science of password selection" (a breakdown of common passwords by
selection practices, as taken from public leaks)

<http://www.troyhunt.com/2011/07/science-of-password-selection.html>

In short, passwords are chosen from:

 _People names: this includes a list of about 26,000 common first and last
names._

 _Place names: this is everything from towns to states to countries and
includes about 32,000 entries._

 _English dictionary_

The most common passwords by group:

Name:

    
    
       1. maggie
       2. michael
       3. jennifer
    

Place:

    
    
       1. dallas
       2. canada
       3. boston
    

Dictionary Words:

    
    
       1. password (oh dear)
       2. monkey
       3. dragon
    

Numbers:

    
    
       1. 123456
       2. 12345678
       3. 123456789

~~~
kragen
Is it possible that the breached Sony passwords he was analyzing may have been
cracked with dictionary attacks? Maybe the reason only 1% of the passwords had
a non-alphanumeric character was that the crackers mostly didn't crack the
passwords that had any non-alphanumeric characters.

------
nrbafna
"For those of us pedantic enough to want a rule, here it is: The preferred
form is "xkcd", all lower-case. In formal contexts where a lowercase word
shouldn't start a sentence, 'XKCD' is an okay alternative. 'Xkcd' is frowned
upon."

------
wcoenen
Note that 44 bits of entropy is still nothing if you want protection from off-
line attacks on password hashes. A couple of GPUs together can calculate a
billion hashes per second, which eats through 2^44 possible passwords in only
a few hours.

This was recently demonstrated when the mtgox password database was
compromised.

_edit_: but this shouldn't be a problem if the password is properly hashed
with bcrypt or some other scheme with a work factor.
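The "few hours" estimate, and the effect of a work factor, can be checked with a couple of lines. The billion-per-second rate comes from the comment above; the 10 guesses per second for a slow hash is purely an assumed figure for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def seconds_to_exhaust(bits: float, guesses_per_second: float) -> float:
    """Worst-case time to try every password in a space of 2**bits."""
    return 2 ** bits / guesses_per_second

# Fast unsalted hash on GPUs: a billion guesses per second.
fast_hours = seconds_to_exhaust(44, 1e9) / 3600

# Slow hash with a work factor: assume 10 guesses per second.
slow_years = seconds_to_exhaust(44, 10) / SECONDS_PER_YEAR

print(round(fast_hours, 1), "hours vs", round(slow_years), "years")
```

On average an attacker finds the password after searching half the space, so typical times are half the worst case.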

~~~
salvadors
But this approach scales at a much faster rate. Simply adding a fifth word
pushes even a billion-per-second attack out to roughly a year, and a sixth
takes it into thousands-of-years territory.

------
billybob
Example generated phrase: "married greatly snake battle"

These phrases would be easier to remember if they made grammatical sense. Like
Chomsky's famous "colorless green ideas sleep furiously" - the words relate to
each other grammatically, even though it makes no sense.

Imagine memorizing "married greatly snake battle" vs "married snakes battle
greatly." I think the latter is easier.

~~~
burgerbrain
Entropy would take a serious hit if you did that.

~~~
kragen
Not necessarily. If only one-fourth of all English words are grammatical after
an average prefix, then you lose two bits of entropy off each word after the
first. I suspect that the actual situation is not as bad as that. You might
end up using "uncommon" words like "deceased", "advent", "fearful", and "ram"
to compensate, instead of more common words like "strongly", "contains",
"afterwards", and "corporate", but that doesn't seem like a major loss to me.

~~~
burgerbrain
Any narrowing of the search space will most definitely reduce entropy. By how
much is calculable, but I don't have the time or the language statistics right
now to do it.
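Under kragen's assumption above (only one-fourth of the words are grammatical after an average prefix), the loss can be worked out without any language statistics. The sketch below just encodes that assumption; the one-fourth fraction is a guess from the thread, not a measured value.

```python
import math

def bits_with_grammar(bits_per_word: float, num_words: int,
                      allowed_fraction: float) -> float:
    """Entropy when every word after the first is limited to a fraction of the list."""
    loss_per_word = -math.log2(allowed_fraction)
    return bits_per_word + (num_words - 1) * (bits_per_word - loss_per_word)

# Four words at 11 bits each, with one-fourth of the list allowed after the first word.
print(bits_with_grammar(11, 4, 0.25))
```

That is 6 bits lost out of 44, which one extra bit per word from a larger word list would more than halfway recover.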

~~~
billybob
I'm not sure the technical meaning of entropy in this context, but personally,
I would offset the narrowing effect of "restrict to grammatical phrases" by
adding uncommon words. "Besotted ophthalmoscopes gambol indicatively" forms a
coherent, if silly, word picture for me, so I think I can remember it.

As far as possible combinations, my vague memories of linguistics 1001 include
the idea that this is one of the essential properties of language: it has so
many possible combinations, that every speaker is continually creating
sentences that have never before been uttered. Unlike, say, honey bee dances,
which are often repeated.

~~~
kragen
> I'm not sure the technical meaning of entropy in this context

Roughly, it's the logarithm (base 2) of the number of guesses that an optimal
password guesser would have to guess in order to guess your password. It's a
measure of how _unknown_ your password is.

> it has so many possible combinations, that every speaker is continually
> creating sentences that have never before been uttered.

Yes, this is why all of the suggested alternatives like "choose a line from a
popular song" are so much less secure.

------
ZoFreX
I would actually advise going against this advice. While it isn't a best
practice, password sharing can and does happen, as does shoulder-surfing. It
would take a LOT of effort to memorise my password, but a simple four word
password will probably be remembered by accident. In a year's time if I piss a
friend off, I don't want my Facebook password to be readily accessible in
their memory.

I think more people need to learn to remember arbitrary strings. There really
is no way around that problem if you want a decently secure password, and it's
rare someone has a "good memory" - in most cases they've just learnt how to
remember things well.

(Note: This doesn't really apply to me or most of us here in most cases, but
for example my WiFi password is of the form "Mycatsname9" and yet my neighbour
still has to ask me for it whenever her phone forgets it)

~~~
darklajid
How do you share your preferred password? Because I guess everything but
sending it per text/mail would be tedious, while it would work better with a
couple of words.

Shoulder surfing: It's certainly a risk, but I'd say that prolonged shoulder
surfing shouldn't be possible. If I type fast, it will be very hard to make
out the phrase. If I type slow, you cannot stand around that long.

And - I'm not a security expert, but how much do you gain if you saw a couple
of chars here? My intuition (yeah, shouldn't trust that) says that it's worse
if I watch you and know the _first_ character of your password than you seeing
the first 1-3 characters of the first word of my passphrase?

(We don't know the name of your cat, so judging the quality of the password or
your neighbo(u)r's ability to remember it is hard)

~~~
ZoFreX
> And - I'm not a security expert, but how much do you gain if you saw a
> couple of chars here? My intuition (yeah, shouldn't trust that) says that
> it's worse if I watch you and know the _first_ character of your password
> than you seeing the first 1-3 characters of the first word of my passphrase?

Novel thought and possibly worth pursuing, I hadn't thought of that. I want to
re-iterate this isn't something I broadly apply across all my passwords or
even many of them, just that for some users password sharing is a use-case.

------
nakkiel
This might come in handy:

shuf -n4 /usr/share/dict/words | tr '\n' ' '

~~~
eru
If you allow multiple occurrences of the same word, you can get slightly
higher entropy while making the passwords potentially even easier to remember.

    
    
        echo $(for i in 1 2 3 4; do shuf -n1 /usr/share/dict/words; done)
    

(Sorry, I'm not very good at bash, so this loop is probably not idiomatic.)
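How much is "slightly higher"? `shuf -n4` never repeats a line, while the loop can, so the two schemes differ by the ratio of n^4 to n(n-1)(n-2)(n-3). A quick sketch, with the dictionary sizes chosen only for illustration:

```python
import math

def bits_with_repeats(n: int, k: int) -> float:
    """k words drawn independently, repeats allowed."""
    return k * math.log2(n)

def bits_without_repeats(n: int, k: int) -> float:
    """k distinct words in order, as produced by a single shuf -n k."""
    return math.log2(math.perm(n, k))

for n in (2000, 100_000):
    gain = bits_with_repeats(n, 4) - bits_without_repeats(n, 4)
    print(n, "words:", gain, "extra bits")
```

The gain is a small fraction of a bit even for a 2000-word list, so either command is fine.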

~~~
bnegreve
for i in `seq 1 4`; :)

~~~
bronson
If this is golf, you took two more strokes than he did.

------
scythe
You could probably get a few more bits of entropy kind of easily if you use
words from other languages. This doesn't help the monolingual among us but
it's great for me.

~~~
eru
Yes, though the number of additional bits you get from increasing the size of
the dictionary decreases fast. E.g. suppose English and German have the same
number of words, then using both only gives you one more bit per word.

(Actually, slightly less since some words exist in both languages. Like
`hell'.)
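The diminishing return is easy to see numerically: doubling the dictionary adds one bit per word, so even a much larger dictionary saves at most a word or so. A sketch with made-up dictionary sizes; `words_needed` is an illustrative helper.

```python
import math

def bits_per_word(dictionary_size: int) -> float:
    return math.log2(dictionary_size)

def words_needed(target_bits: float, dictionary_size: int) -> int:
    """Smallest number of words that reaches target_bits of entropy."""
    return math.ceil(target_bits / bits_per_word(dictionary_size))

# Merging two equally sized, non-overlapping dictionaries adds exactly one bit per word.
print(bits_per_word(200_000) - bits_per_word(100_000))

for size in (2048, 4096, 65536):
    print(size, "words in dictionary ->", words_needed(44, size), "words for 44 bits")
```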

~~~
scythe
> Yes, though the number of additional bits you get from increasing the size of
> the dictionary decreases fast.

Well, sure -- but once you're at around two or three languages, you get to
imagine that the attacker doesn't know what languages you're using. If I use
English, Japanese, and Spanish, I can figure on the attacker needing to check
the Germanic (English, Dutch, German), Romance (Spanish, French, Italian), and
Asian (Japanese, Chinese, Korean) languages at a minimum.

Jargon helps too, and proper names. "dijkstra bicycle entonces boojum
daihinmin"

~~~
eru
Always assume the attacker knows your scheme, but not your random bits.

------
numeromancer
What a great article! I'm changing all my passwords to "correct horse battery
staple" today!

------
mrspeaker
He he, reminds me of the password generator I made concatenating 3 words from
the list of the "500 most common passwords":
<http://www.mrspeaker.net/2009/01/09/make-secure-passwords/>

The top 500 list has an awful lot of naughty words - so the phrases are pretty
easy to remember ;)

------
marze
Isn't this discussion premised on a server configured to allow fast password
guessing indefinitely?

This is 2011, shouldn't every server be configured to allow a guess every two
seconds for 20 guesses, then every 10 minutes, or something similar?

I'm not familiar with common practices in this area, but why wouldn't all such
services be configured to limit the incorrect guesses?

~~~
ry0ohki
Most really strong systems lock an account after a couple of incorrect
guesses. I assume this is all for systems that may not be secured to prevent
brute force.

~~~
mtogo
Locking the account is the wrong way to go about it since it makes DoS on
known accounts trivial.

Blocking the IP or an increasing time between tries is, afaik, the "right
way".
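One way to sketch the "increasing time between tries" idea is an exponential backoff keyed on the client IP (or on IP plus account). Everything below is illustrative: the class name, the 2-second base delay and the 10-minute cap are assumptions loosely based on the numbers suggested earlier in this subthread.

```python
import time

class LoginThrottle:
    """Per-key exponential backoff after failed login attempts."""

    def __init__(self, base_delay=2.0, max_delay=600.0, clock=time.monotonic):
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.clock = clock
        self._state = {}  # key -> (consecutive failures, time of last failure)

    def retry_after(self, key) -> float:
        """Seconds the caller must still wait before the next attempt is allowed."""
        failures, last = self._state.get(key, (0, 0.0))
        if failures == 0:
            return 0.0
        delay = min(self.base_delay * 2 ** (failures - 1), self.max_delay)
        return max(0.0, last + delay - self.clock())

    def record_failure(self, key) -> None:
        failures, _ = self._state.get(key, (0, 0.0))
        self._state[key] = (failures + 1, self.clock())

    def record_success(self, key) -> None:
        self._state.pop(key, None)
```

A login handler would call `retry_after` first and reject the attempt if it returns a positive number. Because the delay is tied to the client rather than the account, a third party cannot lock a known user out just by guessing badly.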

------
ck2
I've been doing this for years on sites that allow long passwords - "pass
sentences" - but I also throw in a number or two.

~~~
buro9
I've also been doing this for years, but with bits of my post code thrown in
to fulfil those edqe cases where complexity requirements are needed.

~~~
petenixey
... random combinations of bike bits plus a greater London post code. Give me
enough monkeys and typewriters and I could take you ;)

~~~
buro9
nah, my typos will laways protect me

------
abecedarius
"It's a novel idea."

No, I posted about my own generator in 2005:
<http://darius.livejournal.com/38591.html> (getting the words from Beowulf).
Then Zooko or Kragen pointed out some even older system in response (I forget
the name).

~~~
ajross
Compuserve was generating automatic account passwords in the early 1980's from
two dictionary words and a non-alpha character in between them. Mine was
"sleeve;coast". No doubt they didn't invent the trick either.

------
adnam
Such a password scheme provides much less than 44 "bits" of entropy.
Considering the use of 4 randomly chosen words from the c.170000 english words
in general use, means we can guess the passphrase in around 2^22 tries - even
less than "Tr0ub4d0r3&".

EDIT: I'm totally wrong, it's more like 8*10^20 (about 2^69) ... oops!

~~~
TetOn
Wouldn't you first have to know that the passphrase consists of four randomly
chosen words (eg not three, five, or eight)? To me, that's the underlying
strength of the approach that the comic (!) is trying to highlight.

~~~
burgerbrain
The entropy is actually calculated with the assumption that the attacker
already knows those things. If they don't, then it is higher.

------
mathattack
The beauty of this discussion is not just "How to create memorable but hard to
break password?" but "How much deep insight can a 4-6 frame cartoon contain?"

The signal to noise ratio of xkcd is fantastic! They've again zipped a great
discussion in just a few frames.

------
hm2k
What about sites that don't allow spaces?

I know hotukdeals.com only allows [a-zA-Z0-9] which sucks.

~~~
pavel_lishin
Leave the spaces out..?

~~~
duck
Exactly, or if the site requires numbers/symbols those three spots are the
perfect place to put those instead of spaces.

------
dkokelley
I still sense a problem. While these passwords ARE easier to remember, there
is still the security flaw that most people reuse passwords. A key-logger or
shoulder-surfer could snag this (or a website could store your password in
plaintext and be compromised) and then it's game over. Password managers are
the future. They can memorize unique passwords of any length and complexity
for every website you use, and they can store the passwords with very strong
encryption with 1 key that is memorized. That key is where a password like
'correcthorsebatterstaple' could be effectively used.

------
Khao
I remember wanting to sign up on a website that had the worst password
"feature" ever: you typed your password in a plain text field, and once you
clicked away it was changed to a password field. Seeing as this "feature"
was on the main page, I decided never to use the service and sent the website
an e-mail saying that their password field is not clever but is instead a big
fat counter-security measure.

Edit: I managed to track down which website it was: <http://www.advirtus.com/>.
When you register, it shows the password as you type it.

~~~
CWuestefeld
I think that's fantastic.

1: what purpose do the stupid asterisks serve, anyway? I understand them on an
ATM machine, but not on my desktop PC or phone.

2: _Very_ frequently (like, maybe 50% of the time) when trying to type a
password on my phone, I miss the little "key" and mistype, but can't see that
I did. I have to make multiple tries at entering the password. This feature
would prevent that.

So it looks like all upside, with no cost (when used only in appropriate
contexts).

~~~
Khao
I agree that this feature is good while working with a smartphone, but I'm
pretty sure Android has a setting somewhere to always show the last letter
you typed in every password field. I would be surprised if there wasn't a
setting for that also on iOS.

The thing is, I think it makes perfect sense to implement this in certain
situations, but at an OS or browser level, not in the website or inside an
application. Passwords are something we have grown used to and we always
expect them to behave the same way! If we were to change the way passwords are
handled, it should be consistent across everything.

For example, browsers could implement password fields with a checkbox next to
them that lets you show/hide the password at will. The fact that this website
has only one setting (always show when in focus) is scaring me.

~~~
roc
> _"I would be surprised if there wasn't a setting for that also on iOS."_

That's the default behavior for password fields in iOS. Trick is, when you
have a long password it takes far too long to shift focus from the keyboard to
the text field to verify each character before moving on.

I'd very much like to have a client-side show/hide button for password fields.

------
Aloisius
Personally I think password entry should be done in madlib form. Each user
would have a unique madlib prompt like:

Username: ___________

Password: Twelve ___(pl. noun)___ jumped over a ___(adjective)___ ___(noun)___
named ___(proper noun)___.

------
y0ghur7_xxx
Reminds me of this previous discussion:
<http://news.ycombinator.com/item?id=2450972>

Maybe Randall was inspired by that post.

------
tantalor
Sorry, but this isn't novel. I can't find it now, but I read a blog post that
described this technique recently (~6 months ago).

Edit: y0ghur7_xxx (<http://news.ycombinator.com/item?id=2872827>) found it:
<http://www.baekdal.com/tips/password-security-usability>

------
Shenglong
Does anyone else here not really remember their main password? Mine's all in
muscle memory and I can't write it out unless I imagine a keyboard.

~~~
shinratdr
I wish that was why I didn't know my password. The real reason is that
1Password manages that part of my life for me so all my passwords are long
randomly generated strings that I don't know.

------
presto8
Four English words selected randomly from a large dictionary are certainly
secure. But it's unwieldy to type 20+ character passwords. I prefer
10-character random alphanumeric passwords, although these are hard to
remember and type. The best compromise, in my opinion, is to use a hashing
function with a moderately difficult passphrase, e.g.,
Site_Password = Hash(Domain_Name || Passphrase).
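
A minimal Python sketch of that Hash(Domain_Name || Passphrase) idea follows.
The choice of SHA-256, the NUL separator, the Base64 encoding, and the 16
character truncation are all assumptions made for illustration; the comment
does not specify any of them, and `site_password` is a hypothetical name.

```python
import base64
import hashlib


def site_password(domain_name: str, passphrase: str, length: int = 16) -> str:
    """Derive a per-site password as Hash(Domain_Name || Passphrase)."""
    # A separator byte keeps ("ab", "c") and ("a", "bc") from hashing alike.
    material = domain_name.encode("utf-8") + b"\x00" + passphrase.encode("utf-8")
    digest = hashlib.sha256(material).digest()
    # URL-safe Base64 keeps the result typeable; the truncation is arbitrary.
    return base64.urlsafe_b64encode(digest).decode("ascii")[:length]
```

The same inputs always give the same output, so nothing has to be stored. A
single fast hash is cheap to brute-force if one derived password leaks, so a
deliberately slow function such as `hashlib.pbkdf2_hmac` would probably be a
safer swap-in.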

------
tgrass
This still creates a false sense of security since it seems (and I stress
seems) to implicitly suggest you can use the same password on every site (I
assume this since the argument for its use is the ease of remembering). If one
site you visit handles passwords in plain text and it has your email, upon a
breach all your accounts are effectively compromised.

------
tucosan
As embarrassing as it may seem, despite having read a good part of this
thread, I still don't understand why a four-word password can be more random
than something I would get like this:

    
    
        ~$ pwgen -s 8
        C0olz5KM
    

Would anybody care to explain this to me, or at least point me to a good place
where I can read up on this?

~~~
kragen
Assuming pwgen isn't actually _defective_ (I haven't looked), you can get more
entropy in a shorter password with pwgen. The above looks like 6 bits per
character to me, so 48 bits in all. That's 4 bits greater --- 16 times better
--- than Randall's estimate for "correct horse battery staple", which is much
longer. But "correct horse battery staple office" gets you up to 55 bits. Is
it going to be easier to remember "correct horse battery staple office" or
"C0olz5KM"? And how about typing them without making errors?
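
The comparison can be sketched in a few lines of Python. Two assumptions that
are not stated in the comment: the random string uses a 62-symbol alphabet
(a-z, A-Z, 0-9), and the passphrase words come from a 2048-word list, which is
what 11 bits per word implies.

```python
import math


def random_string_bits(alphabet_size: int, length: int) -> float:
    """Entropy of a string whose characters are chosen uniformly at random."""
    return length * math.log2(alphabet_size)


def passphrase_bits(wordlist_size: int, words: int) -> float:
    """Entropy of a phrase whose words are chosen uniformly at random."""
    return words * math.log2(wordlist_size)


print(random_string_bits(62, 8))   # 8 random alphanumerics, a bit under 48
print(passphrase_bits(2048, 4))    # four words at 11 bits each
print(passphrase_bits(2048, 5))    # five words at 11 bits each
```

With 62 symbols each character carries slightly less than the 6 bits used in
the estimate above, but the conclusion is the same: one extra word overtakes
the 8-character string.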

------
kingsley_20
If you're bilingual in a non-European language, transliterating obscure
phrases from the other language could work well. For example, the poetic title
திரிகூடராசப்பகவிராயர் would transliterate to thirikUdarAsappaKavirAyar. Add
some substitutions & punctuation and I'm done - very memorable (at least for
me) :).

~~~
pedro_a
Assuming there are 1024 languages in the world, you added 10 bits of entropy.
That can be achieved by adding one extra common word.

------
aidenn0
Watch out for sites that only use the first 8 characters of your password (no
matter how long it is).

~~~
raleec
Or those that ignore case... I've come across a few popular ones that do that!

~~~
pedro_a
Case is only 8 more bits assuming you randomly assign case to all of your
letters. More likely it will be only one more bit. Then requiring an extra
character is equally good.

------
ryanelkins
I just hate sites that won't accept spaces and restrict how long a password
can be (usually to something relatively short like 8 or 12 characters). Also,
many require you to use mixed case and/or numbers or "special" characters. I
usually just use complete sentences.

------
codebot
Funny comic as usual, but the 20 years thing is probably invalid. How long
would cracking tr0ub4dor&3 on a 486 take? Also I remember some systems didn't
allow pass phrases back then. Windows NT in particular had a max password
limit of 14 characters, iirc.

~~~
kragen
Faster CPUs don't necessarily mean you get more than 1000 auth requests per
second against the web site you're trying to brute force.

------
iaskwhy
I always thought using two password fields with simple words would be much
harder to break than one field only (which can be used for really strange
passwords but also for simple ones, as we all know). Would someone care to
calculate how long it would take to break it?

~~~
corin_
Well it depends how it's stored, but assuming a fairly standard setup it
wouldn't particularly help.

The main issue with website security isn't people brute forcing the website
login box, it's people cracking the hashes after stealing them. So if you had
two easy to crack hashes stored in the database, you crack them both and off
you go.

~~~
iaskwhy
Oh, I was (like the article) assuming you would concatenate both words (add a
space or something else in between if you want) and it would be all stored in
just one field. What about it?

------
redslazer
I tried to generate all possible permutations of 4 of the 2000 most popular
words in the English language.

My computer failed miserably after about 2,000,000 permutations, and
considering there are more than 10^13 of them, I won't be making a rainbow
table for this new type of password.
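
A rough Python sketch of why this runs out of memory, and of how to walk the
space lazily instead. The 25 bytes per stored phrase is an assumed average,
and `candidate_phrases` is a hypothetical helper name.

```python
import itertools
import math

WORDLIST_SIZE = 2000
WORDS_PER_PHRASE = 4
ASSUMED_BYTES_PER_PHRASE = 25   # assumption: average stored phrase length

# Ordered selections with repeats allowed, and without repeats.
total = WORDLIST_SIZE ** WORDS_PER_PHRASE
without_repeats = math.perm(WORDLIST_SIZE, WORDS_PER_PHRASE)

print(f"{total:.2e} phrases ({without_repeats:.2e} without repeated words)")
print(f"about {total * ASSUMED_BYTES_PER_PHRASE / 1e12:.0f} TB to store them")


def candidate_phrases(words, count):
    """Yield phrases one at a time instead of building them all in memory."""
    return (" ".join(combo) for combo in itertools.product(words, repeat=count))
```

Iterating lazily avoids the memory problem, but the number of candidates stays
the same, so the table is still impractical to build or store.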

------
Symmetry
I generally use gpw to generate long random but pronounceable passwords.
Something like 'armsdaynistoppo' is fairly entropic, easy enough to remember,
and when I'm used to it I can type it much faster than 4 random words.

------
ErikRogneby
Amazon has implemented this. It's called payPhrase:
<https://www.amazon.com/gp/payphrase/claim/select-phrase.html>

------
shin_lao
Don't forget to add "correcthorsebatterystaple" to your dictionaries kids!

------
juanefren
I have just created a Spanish version that is easier to fork to other
languages. <http://dl.dropbox.com/u/1990697/pw_gen.html>

------
pguzmang
This article is mathematically true. However, hackers no longer rely on brute
force attacks; the most popular method is to attack a weak website, for
example a not very popular blog. If they successfully break it, they have a
password and an email address of yours, and if they are very lucky you use the
same password for the email account, so they've got you. Therefore, nowadays
it is safer to have different passwords for every site. Personally, I love to
use LastPass for my personal use and KeePass for the office to store and
manage passwords. Obviously, the weakest link of the chain is my password for
the password manager application. Do any of you use a different password
manager?

~~~
Adaptive
You can add a variety of two factor authentication options to lastpass (phys
OTP, yubikey).

You can also allow/disallow "offline" access to your lastpass account when
using these two factor options (force second factor at all times or allow
single factor if offline).

------
absentbird
This is how I come up with passwords; I find a phrase that I can remember
without too much trouble then I use the first letter of each word to make a
password.

Phrase: Three Rings for the Elven-kings under the sky, Seven for the Dwarf-
lords in their halls of stone

Password: 3RftE-kuts,7ftD-lithos

Easy to remember and highly secure. I have been using this method for years.

Bonus example: Four score and seven years ago our fathers brought forth on
this continent, a new nation, conceived in Liberty

4sasyaofbfotc,ann,ciL

Less secure than the last example but still strong, especially if you use
uncommon strings like the words to a song by a local band or a phrase from the
newspaper or an unpopular book. That way even an attack targeting this method
will take a long, long time.

~~~
kragen
This is probably not as secure as the xkcd scheme if you don't make up the
phrase yourself. See my comment above with calculations about a variant of
this scheme. I suspect that both of your example phrases are among the million
most quoted phrases in the English language, giving them entropy of under 20
bits.

------
pilif
Assuming that this method for generating passwords gets popular enough, brute
force tools will begin to create an optimized attack for these passwords.

As there are so few words available, if I were to write a brute-forcing
tool, I would try combinations of four words in my wordlist once I failed with
my one-word-dictionary attack, before I start trying out all characters.

But all is not lost: Either use more words or vary the number of spaces you
put between words. This way the dumb optimization "try four words delimited by
space" wouldn't work on your password and they would have to go over to plain
old brute forcing, at which point, I agree, the longer, the better.

~~~
burgerbrain
<https://secure.wikimedia.org/wikipedia/en/wiki/Diceware>

Properly executed, this _will_ protect you against brute force attacks. No
need to do nonsense like adding more spaces.

Of course XKCD botched it and gave an inadequate minimum length...

Notable quote from the article: _"This level of unpredictability assumes that
a potential attacker knows both that Diceware has been used to generate the
passphrase, the particular word list used, and exactly how many words make up
the passphrase."_

~~~
darklajid
Was just about to quote that to you. Don't you think that you should take this
quote into account when you say xkcd botched it?

~~~
burgerbrain
That's not how people in crypto roll :)

He calculated the entropy assuming that knowledge was had, I think he botched
it in saying that 44 bits were enough. Of course if he were to recalculate it
without those assumptions, but make it clear that he were doing that, then it
would be better.

------
epscylonb
This kind of misses the point.

A password doesn't necessarily need to be something that is easy to remember.
It just needs to be a unique token that is easy for you personally to present
when needed.

I currently use a keepass file stored in my dropbox folder. I am not certain
what the silver bullet to online authentication will look like. However I
suspect it may not require you to remember more than one secure password,
perhaps not even that.

Trust online is hard though, looking at the problems establishing trust online
reminds me how clever human beings are, we sometimes make mistakes but we are
pretty good at evaluating trustworthiness in the real world.

------
frosas
How does one calculate password entropy? I deduced this one:

entropy = log2(symbols^chars)

But using 63 symbols ([a-zA-Z0-9&]) I get 65 bits for Tr0ub4d0r&3, not 28.

~~~
wcoenen
You are making the same error that led to the popularity of passwords based on
a common word and some substitutions.

Your calculation is for 11 characters, each chosen completely at random out of
63 symbols. People don't choose a password that way - we typically can't
generate or remember random symbols.

XKCD's calculation is for a common word + common symbol substitutions and
additions: log2(#words) + log2(#capitalization options) + log2(#substitution
options) + ...
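
The two ways of counting can be put side by side in a short Python sketch. The
per-component bit counts below are my recollection of the comic's breakdown and
should be treated as an assumption, not a quotation.

```python
import math

# Naive model: 11 characters, each chosen uniformly from 63 symbols.
naive_bits = 11 * math.log2(63)

# Scheme model: sum the bits of each choice the user actually made.
# Assumption: component sizes as recalled from the comic.
scheme_components = {
    "uncommon base word": 16,
    "capitalization": 1,
    "common substitutions": 3,
    "punctuation": 4,
    "numeral": 3,
    "order of the trailing symbols": 1,
}
scheme_bits = sum(scheme_components.values())

print(f"naive: {naive_bits:.1f} bits, scheme-aware: {scheme_bits} bits")
```

The gap between the two totals is the point: an attacker who models the scheme
only has to search the much smaller space.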

------
xkcdentropy
The XKCD comic is only partially correct. Depending on what source you
believe, English text has about 0.6 to 2.3 bits of entropy per character. This
means you need somewhere between 4.8 and 18.3 characters in each word to reach
11 bits of entropy per word. Assuming entropy is closer to 2 bits per character
this is a realistic situation. However, when you assume entropy is closer to 1
bit per character the words have to be too long to be realistic.
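
The word lengths quoted here follow from a single division, sketched below in
Python. `chars_needed` is a hypothetical helper, and the per-character rates
are the ones given in the comment.

```python
def chars_needed(target_bits: float, bits_per_char: float) -> float:
    """Characters of text needed to accumulate target_bits of entropy."""
    return target_bits / bits_per_char


for rate in (0.6, 1.0, 2.0, 2.3):
    print(f"{rate} bits/char -> {chars_needed(11, rate):.1f} chars per word")
```

Note that these rates describe running English text. A word drawn uniformly at
random from a 2048-word list carries log2(2048) = 11 bits regardless of how
many letters it has.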

------
ssapkota
I blogged on the same issue almost 2 months back: <http://goo.gl/4Sxf6>

------
harshpotatoes
It's a good idea, and would work well, if only websites would let me choose
passwords longer than 12 characters!

~~~
pedro_a
Randall is saying exactly that: "Through 20 years of effort, we've
successfully trained everyone to use passwords that are hard for humans to
remember, but easy for computers to guess."

------
vidyesh
Great, so now we'll soon find a _xkcd-brute-force-attempt-list.txt_

------
dendory
If you look at the source, their word list contains around 1600 words. That is
just nowhere near enough. Using this would give you a very easy to crack
password. You need to make up your own passwords with words you come up with.

~~~
kragen
1600 words is 10.6 bits per word. If you want to reach 70 bits (safe from
offline attacks with custom hardware), that means you need 7 words, which is
within most people's capacity to memorize. If you increase your wordlist to
65536 words, you can get 16 bits per word, but you have to include words like
"lefeuvre", "aarau", and "aubagne". Then you can reach 70 bits in only 5
words. That's not worth it.
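
The word counts above can be reproduced with a small Python sketch, assuming
words are drawn uniformly at random; `words_needed` is a hypothetical helper.

```python
import math


def words_needed(wordlist_size: int, target_bits: float) -> int:
    """Smallest number of uniformly chosen words reaching target_bits."""
    bits_per_word = math.log2(wordlist_size)
    return math.ceil(target_bits / bits_per_word)


print(words_needed(1600, 70))    # small list of common words
print(words_needed(65536, 70))   # much larger list with obscure words
```

Growing the list by a factor of about 40 saves only two words, which is the
trade-off being described.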

Inventing your own words is unlikely to produce very random words. You'll
probably mostly invent the same few hundred nonsense words that any other
speaker of your native language would invent.

In other words, you have no idea what you are talking about and should not
have posted.

------
kahawe
Original xkcd link: <http://xkcd.com/936/>

Oh and there are unfortunately way too many systems limiting your password to
only 6 or 8 characters still in use today.

------
nikcub
As a bash function:

    
    
      word_pass() {
          cat /usr/share/dict/words | awk 'BEGIN{srand();}{print rand()"\t"$0}' | sort -k1 -n | cut -f2 | head -n 4 | tr "\\n" " " && echo 
      }
    

then:

    
    
      $ word_pass
      corticifugally tetraploidy democrat vibrionic
    

(if you notice how this works, you can see that it isn't super-efficient, but
it works)

~~~
gjm11
DANGER: This gives no more entropy than what srand() uses, which (at least for
GNU awk) is simply the current UNIX time, which (if we assume that when you
generated the password is known to within one year) means only about 25 bits
of entropy.
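
The 25-bit figure can be checked with a short Python sketch. It assumes the
seed is the UNIX time in whole seconds and that the attacker knows the
generation time to within one year; the function names are hypothetical.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def seed_entropy_bits(possible_seeds: int) -> float:
    """Entropy of a seed chosen from possible_seeds equally likely values."""
    return math.log2(possible_seeds)


def effective_bits(wordlist_size: int, words: int, possible_seeds: int) -> float:
    """A seeded generator cannot output more entropy than its seed holds."""
    phrase_bits = words * math.log2(wordlist_size)
    return min(phrase_bits, seed_entropy_bits(possible_seeds))


print(f"{seed_entropy_bits(SECONDS_PER_YEAR):.1f} bits in the seed")
print(f"{effective_bits(250_000, 4, SECONDS_PER_YEAR):.1f} bits in the phrase")
```

However large the word list, the phrase inherits the seed's entropy as a
ceiling.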

~~~
nikcub
srand in awk is platform-specific. On most recent versions it isn't a straight
call to srand().

I have another version that I use that stuffs srand, but in the end I figured
srand from a 250k dictionary is still better than picking words out of your
head from a ~1k dictionary.

~~~
gjm11
I looked at the trunk code for gawk on Savannah when I wrote the above. It
passes the output of time(0) straight into srand().

The size of the dictionary doesn't matter (given that it's more than about 70
words); the limiting factor is the entropy in the RNG seeding.
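
One way to sidestep the seeding problem is to draw words from the operating
system's cryptographic random source. Below is a minimal Python sketch using
the standard `secrets` module. It is a different technique from the awk
one-liner, not a fix to it, and the dictionary path is only an assumption
about where a word list might live.

```python
import secrets


def load_words(path: str = "/usr/share/dict/words") -> list:
    """Read a word list, one word per line, dropping blanks and duplicates."""
    with open(path, encoding="utf-8") as handle:
        return sorted({line.strip() for line in handle if line.strip()})


def word_pass(words: list, count: int = 4) -> str:
    """Join `count` words chosen with the OS CSPRNG, so no seed is involved."""
    return " ".join(secrets.choice(words) for _ in range(count))
```

Calling `word_pass(load_words())` would print a four-word phrase. With this
approach the entropy is bounded by the word list size and word count rather
than by a timestamp.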

------
Gullanian
Once hackers realise people are using ~4 random words for a password the
entropy will decrease hugely.

~~~
wcoenen
I think you've got it backwards: the entropy calculation here assumes that the
attacker already knows the scheme. The 2^44 possible passwords are therefore a
lower boundary for the entropy.

In practice the attacker must cast a wider net because he doesn't know exactly
which word list you use, or if you are using a completely different password
scheme. This increases the difficulty.

------
redxaxder
When picking a password, you don't just care about the entropy. You also care
how far down the password guessing order it is.

People who want to guess a password don't just brute force at random. They use
a guessing order that goes through more common classes of password first. So
if _correct horse battery staple_ becomes a popular password scheme, these
will end up attacked before other password schemes. (See
<http://www.schneier.com/essay-148.html>)

Unless you're going to use a password safe full of nasty passwords, you should
pick your passwords using an unpopular method.

~~~
salvadors
The point is that this approach pushes brute-force guesses out into territory
that makes it unlikely anyone will crack it _even if they know exactly what
scheme you're using_.

People seem to be massively underestimating just how long it would take to
brute-force four dictionary words in a row.

------
abalone
This scheme could be easily guessed by a dictionary attack that simply ran
through combinations of dictionary words instead of individual characters.

If this became a popular scheme, the whole entropy argument goes out the door.
It only has more entropy if we compare the two schemes on a character-by-
character basis (~10 vs. ~25). Of course the longer string will appear to have
more entropy.

But if a password guesser expects the _pattern_ of the "four common words"
scheme, as they might if it became popular, it's not nearly as entropic. A
better comparison would be to consider each word as a single "character" from
a 180,000 sized alphabet (for an English dictionary).

Calculate the entropy of that and you'll find it's in the same ballpark.

~~~
kragen
If you took the suggestion in your last sentence instead of offering it to the
rest of us, you would see that the entire rest of your comment is incorrect.
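
For reference, the suggested calculation is sketched below in Python: four
"characters" drawn uniformly from a 180,000-symbol alphabet, next to the
comic's two figures of 28 and 44 bits. The variable names are hypothetical.

```python
import math

ALPHABET_OF_WORDS = 180_000   # dictionary size proposed in the comment above
WORDS_PER_PHRASE = 4

word_scheme_bits = WORDS_PER_PHRASE * math.log2(ALPHABET_OF_WORDS)
COMIC_SUBSTITUTION_BITS = 28  # comic's estimate for the Tr0ub4dor&3 style
COMIC_FOUR_WORD_BITS = 44     # comic's estimate with a 2048-word list

print(f"four words from 180,000: {word_scheme_bits:.1f} bits")
print(f"extra bits over the substitution scheme: "
      f"{word_scheme_bits - COMIC_SUBSTITUTION_BITS:.1f}")
```

Treating each word as one symbol from a large alphabet gives more entropy than
the comic claims, not less.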

------
julianpid
I find the idea incredibly stupid. If I know someone who used that precise
generator to produce his password, then I know that the generator has fewer
than 2000 words in the dictionary. It then takes me only a few minutes to
guess his password, rather than 550 years.

Conclusion: Don't ever use this password generator, write your own, and tell
no-one about it.

~~~
zokier

      4b02d9f6353a8f36fbb092f040d5a31cdf6841f2
    
    

You up for a challenge? I just generated a pass phrase with this generator,
and hashed it with SHA-1 (echo -n ... | sha1sum), no salting or anything else
special. Feel free to brute force it.

~~~
julianpid
I wrote this piece of code: <https://gist.github.com/1149417>

It's currently running at 3200000 tries per second on my Xeon machine. I am
probably going to get bored before I find the right combination because I
calculated it could take up to 52 days. :)

But anyways, it is still a lot less time than trying to bruteforce something
like Tr0ub4dor&3 in my opinion.

It seems you like challenges, if I gave you a SHA1 hash of something similar
to "Tr0ub4dor&3", would you be able to crack it (without rainbow table) under
52 days ? I don't think so.
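
The time estimate can be reproduced with a short Python sketch. It assumes an
exhaustive search over every ordered 4-word phrase at the quoted rate; the
2000-word size is the upper bound mentioned earlier in the thread, not the
generator's exact list size, and `worst_case_days` is a hypothetical helper.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def worst_case_days(wordlist_size: int, words: int, tries_per_second: float) -> float:
    """Days needed to try every phrase once at a fixed guessing rate."""
    candidates = wordlist_size ** words
    return candidates / tries_per_second / SECONDS_PER_DAY


RATE = 3_200_000   # tries per second, as reported above
print(f"{worst_case_days(2000, 4, RATE):.1f} days for a 2000-word list")
print(f"{worst_case_days(1600, 4, RATE):.1f} days for a 1600-word list")
```

On average the search ends about halfway through, and every additional word
multiplies the time by the size of the list.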

------
ErrantX
Not a good idea, sadly. In fact I'd go so far as to say this is a _really bad
suggestion_, because it gives a false sense of security.

There is potentially a lot less entropy in this password than "Tr0ub4d0r&3",
assuming the hacker is smart enough to realise he can trivially test
combinations of dictionary words in very short amount of time.

(EDIT: I'm way out of touch with this; it's not as trivial as perhaps I
figured. See lower in the thread)

However; it is in the right direction - introducing some sort of extra entropy
can invalidate that form of attack and make this as secure as XKCD suggests.

What do I currently do? I take a reasonable length common word, do a
string/number replacement like so:

H4ck3r N3ws

And then repeat it 3 or 4 times:

H4ck3r N3ws H4ck3r N3ws H4ck3r N3ws H4ck3r N3ws

For extra entropy mix it up:

H4ck3r N3ws H4ck3r News H4cker News Hacker News

That's a simple example - so long as you have a reasonably random scheme then
it is not easy to test against, but is fairly simple to remember.

Bingo :)

(EDIT: for the down voter(s) note: XKCD specifically says _random common
words_ - obscure words are another matter)

~~~
darklajid
> he can trivially test combinations of dictionary words in very short amount
> of time.

Explain the reasoning behind this, please.

Start with: You don't know the dictionary I used, but have to use one that
seems 'good enough' (i.e. a superset of mine, if possible).

How many words are in there?

How many combinations can you create for 'two word phrases'? (You don't know
the length of my phrase)

How many for three?

How many for four words?

~~~
ErrantX
> Start with: You don't know the dictionary I used, but have to use one that
> seems 'good enough' (i.e. a superset of mine, if possible).

People are likely to use a standard English dictionary. In my experience
(which is exactly within this field) people use a fairly tight subset of the
English vocabulary.

So I would be quite happy to test for a dictionary of, say, 100,000 words and
be hopeful of a good hit rate (note that XKCD says _common_ words, which is
easily missed)

Our software has a test (which runs about third in its list of tests) which
does dictionary combinations up to three words (two words is quite a commonly
used password based on our statistics) with a dictionary size of 17,500.
(Edit: sorry, I originally wrote 175,000, but apparently it is an order of
magnitude smaller; I checked with one of the engineers :)) This includes
English words plus a few commonly used foreign/slang terms. The hit rate on
this is fairly high.

(we crack document/windows passwords mainly)

You could of course choose deliberately obscure words to invalidate this - but
they aren't so easy to remember (so people will tend not to).

If someone is going out of their way to secure a password, sure, you're going
to hit a brick wall. But what every password scheme tends to forget is the
"human factor" whereby people not concentrating on being secure will introduce
attack vectors.

~~~
darklajid
I actually didn't ignore the 'common' limitation (and didn't downvote you -
I'm actually interested how you come up with that).

Follow-up questions:

- What are the first tests, before this 3rd that tests for words? I assume
tests for passwords of the first/left variety in the comic? Aren't they
cheaper?

- 'Up to three words' is reducing the exponent of possible combinations by
one. Length/number of words is relevant.

Edit: Another issue. You say 'people forget the human factor', while you,
yourself, propose something like '4 times Hacker News with substitutions' as
better. How is that including the 'human factor'?
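
To put numbers on the "exponent" point, here is a minimal Python sketch. It
assumes the 17,500-word dictionary mentioned above and uniformly chosen words;
`guesses_up_to` is a hypothetical helper.

```python
DICTIONARY_SIZE = 17_500


def guesses_up_to(dictionary_size: int, max_words: int) -> int:
    """Candidates for all phrases of 1 to max_words dictionary words."""
    return sum(dictionary_size ** k for k in range(1, max_words + 1))


up_to_three = guesses_up_to(DICTIONARY_SIZE, 3)
exactly_four = DICTIONARY_SIZE ** 4

print(f"up to three words: {up_to_three:.2e} candidates")
print(f"exactly four words: {exactly_four:.2e} candidates")
print(f"four words is about {exactly_four / up_to_three:,.0f} times more work")
```

Going from three words to four multiplies the search by roughly the size of
the dictionary.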

~~~
ErrantX
<scrubbed answer>

You know what; it's been so long since I played around with this stuff (it's
even a separate company now, that we just consult for) that I'm way out of
touch with my thought process :)

You're right; there is nothing particularly wrong with the suggestion that
makes it intrinsically very weak for most uses.

I'd best stop commenting before I make a total mess :)

Sorry.

