Visual, multi-language XKCD-style password generator (optionfactory.net)
65 points by alanfranz on Jan 19, 2016 | 50 comments

From the previous HN thread on this:

To have fun on April Fools' Day we wanted to build something useful and funny: we created a new XKCD-like password generator that can use dictionaries in different languages and show a picture for each generated term by searching Google Images. We had a lot of laughs playing around with Italian and English passwords and we hope you'll have as much fun with this as we did! https://news.ycombinator.com/item?id=9304688

It should be just for fun and education, anyone foolish enough to use these generated passwords as real passwords is well ... foolish enough. Flaw #1: no HTTPS.

HTTPS does nothing for you in this case. The passwords are generated locally with JavaScript, and there is an option you can check to prevent them being sent back to the server. But even if they actually were generated on the server, HTTPS would be of limited use, since you'd still have to trust the server itself and the people operating it.

Good point. However, the dictionaries also need to be transferred to the user, and a MITM could easily replace them with their own (flawed) dictionary.

Same point: JavaScript injection/modification.

HTTPS doesn't guarantee that the dictionary isn't flawed or the JS is secure, but it does guarantee that the received data/scripts/dictionaries are exactly the ones the server intended to send.


> This is why the oft-cited XKCD scheme for generating passwords -- string together individual words like "correcthorsebatterystaple" -- is no longer good advice. The password crackers are on to this trick.

4 true random choices from a dictionary of 5000 words is 50 bits of entropy. All things being equal that is just as good as any other password scheme with 50 bits of entropy. I think that's about the same as a 10 character upper/lower/numbers/symbols password.
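The arithmetic above can be checked with a short sketch. It assumes words and characters are picked uniformly at random; `entropy_bits` is a name chosen here for illustration, and the 94-symbol alphabet (printable ASCII without the space) is an assumption, not something stated in the comment.

```python
import math

def entropy_bits(choices: int, picks: int) -> float:
    """Entropy in bits of `picks` independent uniform draws from `choices` options."""
    return picks * math.log2(choices)

# Four words drawn uniformly from a 5000-word list.
passphrase_bits = entropy_bits(5000, 4)

# Assumed alphabet: 94 printable ASCII characters (no space).
bits_per_char = entropy_bits(94, 1)

# Shortest random-character password with at least the same entropy.
equivalent_length = math.ceil(passphrase_bits / bits_per_char)

print(f"{passphrase_bits:.2f} bits, matched by {equivalent_length} random characters")
```

Under these assumptions the passphrase comes out at just over 49 bits, which is matched by about 8 fully random characters rather than 10.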

That's your worst-case scenario. Add to that the fact that:

a) the attacker doesn't have your dictionary
b) they can't be sure you're using this method
c) you might add a minor extra step of your choosing

then the attack becomes significantly harder than attacking a 'normal' password with 50 bits of entropy.

> a) the attacker doesn't have your dictionary

How many truly unique sources for dictionaries are there?

My own password generator simply uses `/usr/share/dict/words`, which is not that hard to guess (and documented in the generator's source code).[0]

Add to this that not all languages lend themselves to this algorithm – while my native language is German, you'd need a word stem dictionary, as opposed to a regular dictionary, for this, so I stick to English dictionaries.[1]

[0] I actually made it configurable, but lacking a better source, it'll probably remain at this default: https://github.com/creshal/yspave/blob/master/config.json.ex...

[1] cf. http://creshal.de/News/?id=59 , where I rambled on it a bit more.

> My own password generator simply uses /usr/share/dict/words

On my system /usr/share/dict/words has ~45k words. That's 15 bits of entropy per word, significantly more than XKCD (which assumes 11 bits).

So I'd agree that there probably aren't many unique dictionaries. The strength of this password scheme is that it works even assuming the attacker knows your dictionary.

In practice, I guess that an attacker will use a concatenation of all popular dictionaries. This will make things slightly harder for them (e.g. if they have 50k words vs the 30k in your dictionary) but not by a huge factor.
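To put a number on "not by a huge factor", here is a rough sketch. It assumes a four-word passphrase (the XKCD default, not stated in this comment) and that the attacker's merged list fully contains the real dictionary; `extra_work` is an illustrative name.

```python
import math

def extra_work(attacker_words: int, actual_words: int, picks: int = 4) -> tuple[float, float]:
    """Return (multiplicative factor, extra bits) an attacker pays for
    searching a larger merged dictionary instead of the exact one."""
    factor = (attacker_words / actual_words) ** picks
    return factor, math.log2(factor)

# 50k-word merged list versus the 30k-word dictionary actually used.
factor, bits = extra_work(50_000, 30_000)
print(f"search space grows {factor:.1f}x, i.e. about {bits:.1f} extra bits")
```

With these figures the attacker's search space grows by less than a factor of eight, or roughly three bits.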

> The strength of this password scheme is that it works even assuming the attacker knows your dictionary.

Indeed. Arguing about whether or not your dictionary is "unique" enough is a red herring. Whatever generator you use should account for the dictionary's word entropy, not for the theoretical per-byte entropy.

I just looked at some of the hashcat benchmarks and they are clocking MD5 on one of their test machines at 115840 Mh/s.

(2^50)/(115840*10^6) = 9719.43 seconds, or about 2-3 hours to break one correct horse style password. I think the point is that 50 bits of entropy is not enough when you consider the speed of GPU hashing.

Obviously you shouldn't use MD5 for hashing passwords, but since you often have no control over what algo a site uses (or even information about which one they did pick), assuming they will do something common and bad like unsalted MD5 isn't a bad starting point for this kind of analysis.
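The estimate above can be reproduced with a few lines. This is only a sketch of the same arithmetic: it assumes a single machine at the quoted MD5 rate and an exhaustive search of the full 2^50 space; `seconds_to_exhaust` and `MD5_RATE` are names invented here.

```python
def seconds_to_exhaust(entropy_bits: float, hashes_per_second: float) -> float:
    """Worst-case time to try every candidate in a 2**entropy_bits search space."""
    return 2 ** entropy_bits / hashes_per_second

MD5_RATE = 115_840e6  # hashes/second, the benchmark figure quoted above

worst_case = seconds_to_exhaust(50, MD5_RATE)
print(f"{worst_case:.0f} s worst case, {worst_case / 3600:.1f} h; "
      f"{worst_case / 2 / 3600:.1f} h on average")
```

On average an attacker finds the password after searching half the space, so the expected time is half the worst case.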

IMO, there are two kinds of passwords relevant nowadays:

• Password used to encrypt your password database, whichever it is
• Everything else

Since the idea of a password database is that you can give each website a unique password without having to memorize it, you can go full printable-ASCII range, 20 characters, for site passwords. Those will not be brute-forced any time soon (at roughly 128 bits of entropy), and the damage of having them leaked will be minimal.

Your password database, on the other hand, should use a painfully slow algorithm for its PBKDF. If that's not using at least scrypt, you indeed have a problem. But assuming a good password database design, 50 bits shouldn't be too bad.

Maybe so, but then that doesn't say much about the usual alternative which has even less entropy.

EDIT: Actually I was going by xkcd's claims, which start with a dictionary word and add random tweaks to it. I don't know offhand how long a random password would have to be to match 50 bits. But ultimately, that is the question: is this method at least as good as the alternative?

Assuming 80 possible characters (A-Z, a-z, 0-9 + 18 punctuation symbols), log2(80^8) ~= 50. So a correct horse password would be about the same as a random 8-character password drawn from that set, entropy-wise.

The XKCD argument is that correct horse style is way easier to remember. It's possible Schneier thought Munroe was arguing that a correct horse password had as much entropy as a 20 character random password, even though he isn't. This is the first I've heard of this argument though so I'm not even sure how much of a controversy it is.

The attacker will naturally start with a smaller dictionary, most common words, which reduces the search space greatly. So unless you're using some of those SAT words, it's even less secure.

5000 words is pretty small for a dictionary. Of course if you're using really simple words to make your passwords then you're out of luck, but choosing random words from a 5000 word dictionary is reasonably secure.

Language contains a surprising amount of entropy, not as much as random characters, but still enough that I can be reasonably sure that this sentence is unique.

Far be it from me to contradict Bruce Schneier, but he doesn't actually give a good reason why the XKCD or diceware scheme is bad. He seems to be saying that it's not as good as a 25 character random string, which is true, but that's not what is claimed. Even if the attacker knows your word list, and knows that you've chosen four of them, if you've chosen them randomly then there's still easily enough entropy to make it unrealistic for them to crack it.

And that seems to be an ongoing issue with -sec debates. That the people "inside" are so focused on "perfect" protection, that they see anything else as worthless.

There was someone here on HN just the other day that basically decried a scheme for using multiple user accounts to "sandbox" programs as useless because it would be ineffective against "state actors" (aka NSA and their equivalent).

This even though it would likely foil, or at least reduce the impact of "drive by" attacks aimed at accessing personal information for identity theft etc.

Well, he does explain it:

"Modern password crackers combine different words from their dictionaries"

And then he goes on to give his advice (of using a password manager or a piece of paper, etc). Is it that advice that you're contradicting?

You also said "..but that's not what is claimed". Are you referring to the implied claim in the original XKCD cartoon (https://xkcd.com/936) or something else? I'm just trying to understand.

The whole point of this scheme is that it doesn't matter that password crackers know about it. It would matter if we were comparing them purely on the number of characters, but that's not the basis that either XKCD or this site calculates the entropy. Four words from a list of 5000 gives 49 bits of entropy. That means there are 2^49 possible passwords (~500 trillion) even if you know the wordlist.

> Modern password crackers combine different words from their dictionaries

But the XKCD cartoon is working on the assumption that this is exactly what the attacker is doing. Despite that, they would have to do more work than to crack the "Tr0ub4dor&3" type password.

I think Schneier just got the wrong end of the stick here. The XKCD reasoning seems valid to me.

I don't necessarily want insolence being trivially linked to my real life account, hence the throwaway.

There are some issues:

First, what he says is that "password crackers are on to this trick". Knowing the trick doesn't automatically render a scheme ineffectual. If it did, public key crypto wouldn't work (and the proposed scheme for generating passwords wouldn't be any better, anyway).

Second, the article he's citing is an Ars one about the state of the art in password cracking in 2013, which gave a flawed synthesis about the XKCD/diceware scheme based on a misunderstanding of it. Take a look at the list of cracked passwords given as examples, and see if you can find a single one that would have been generated using that scheme.

One of the main takeaways of the Ars article was that the XKCD/diceware scheme is broken. A reference to the XKCD strip is featured prominently in the main graphic, but the mention of it only shows up on page 3. Examining their methods shows the link from their methods to their conclusion to be a tenuous one. Calling it tenuous is actually being generous, because it's more like a false link. Here's where it gets called out:

    Early in the process, Steube couldn't help
    remarking when he noticed one of the plains he had
    recovered was "momof3g8kids."

    [...] Other times, they combine
    words from one big dictionary with words from a
    smaller one. Steube was able to crack
    "momof3g8kids" because he had "momof3g" in his 111
    million dict and "8kids" in a smaller dict.

    "The combinator attack got it! It's cool," he
    said. Then referring to the oft-cited xkcd comic,
    he added: "This is an answer to the
    batteryhorsestaple thing."

This suggests that the cracker doesn't understand what the comic strip is advising readers to do when generating passwords.

Now some closing remarks about this research in general. When the Ars article was published I waited for someone to call it out, but didn't find anyone doing that. I waited some time to look again and turned up nothing. That is disappointing, but fine, I suppose.

However, the particular way that the security community builds up reverence for individuals and discourages re-examining past results is troubling. When I see how big of an effect the conclusions of one guy and the journalist that wrote about him in a popsci news source have, and when I see that some people have an itch to challenge those conclusions but don't put them out there, I'm reminded respectively of Feynman's motivation to write about cargo cult science and to warn us against the kinds of things that happened with Millikan's results and the oil drop experiment. People shouldn't feel they need to turn away from an argument just because they see those on the other side citing Schneier, and the community needs to do better about not encouraging the growth of unassailable celebrity or creating sacred cows.

Yes, the key point there is that those words aren't random, they're a relatively common sentence. That, incidentally, is a potential weakness in Schneier's own system, which is really just a more complicated (and harder-to-remember) version of the "momof3g8kids" type of password.

That doesn't really address the XKCD. The XKCD was comparing schemes where the attacker has perfect information about the password scheme each user is using. Password crackers therefore cannot be "on to this trick" as it is assumed they already know in the initial calculations of entropy.

Given that Bruce Schneier is Bruce Schneier, it seems likely that I am wrong somehow, but at present I cannot see in what way.

Can anyone explain how removing letters from a sentence increases its (total) entropy?

Edit: Okay, I can sort of see how it can increase entropy. Replacing characters with 'nothing' randomly does give some extra entropy, although you lose some as well (especially if you overdo it). Seems like it would be best to randomly remove about half of them, although that's also difficult to remember, and only adds 1 extra bit per character (assuming no entropy is otherwise lost). Overall I'm still not convinced.

Schneier misunderstood the XKCD scheme and ranted.

There's no shame in that, per se, but it was pointed out to him repeatedly in the comments, and he never corrected his mistake.

Just one incident, where he lost a lot of credibility with me. I feel he should have stuck to the technical side of software security and cryptography instead of playing politics.

    $ shuf /usr/share/dict/words | head -n4

    $ wc -l /usr/share/dict/words

    $ # No need to use head. Also, I prefer output on one line.
    $ shuf -n4 /usr/share/dict/words | xargs
    limper mastoplastia highfaluting Bobbinite
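A rough Python counterpart to the `shuf -n4` pipeline, as a sketch: it draws distinct words using the standard library's `secrets` module, which is intended for security-sensitive randomness. The function name `pick_words` and the tiny inline word list are illustrative assumptions; in practice the list would be read from a file such as `/usr/share/dict/words`.

```python
import secrets

def pick_words(words: list[str], count: int = 4) -> str:
    """Pick `count` distinct words using the OS's CSPRNG and join them with spaces."""
    rng = secrets.SystemRandom()
    return " ".join(rng.sample(words, count))

# Illustrative stand-in for a real word list read from disk.
sample_words = ["correct", "horse", "battery", "staple", "limper", "taxonomy"]
print(pick_words(sample_words))
```

Like `shuf -n4`, `sample` never repeats a word within one passphrase; with a realistically large list the effect on entropy is negligible.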

The problem with that dictionary is that it doesn't contain easy to remember words.

Was my password "hospitium taxonomy pellagra" or "hospitalism taxonomy pellagra"?

Looks like the image search doesn't work?

Looks like it does not work anymore.

Looking at the network inspector, calls to the Google Image API return:

/* callback */google.search.ImageSearch.RawCompletion('1', null, 403, 'This API is no longer available.', 200)

Also, a breakpoint in setSearchCompleteCallback shows that the imageResults array is always empty.

From the source it looks like the "feature" isn't implemented yet. The only image reference is:

$('#xkcd-image-'+(index)).html(" ");

This should be a "Show HN"

Uhh. Why post it then? Where's the innovation here?

- It would be good to document the entropy evaluation for readers to check if the assumptions are correct.

- Which random number generator is involved? Ideally, the user should be able to feed their own entropy into a locally run "nice password" maker whose source code they have also checked. Anything else, IMHO, is no more than a game inspired by the real problem.

Edit: itcrowd clears it up: it's made for April 1st, that explains the problems.

- Worth knowing, a simple solution to real user-generated entropy:


The page could actually be useful if it ran fully locally (e.g. on an air-gapped computer) and took as input the values of dice thrown by the user.
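The dice-driven variant suggested above could look roughly like this, in the style of Diceware: five six-sided dice select one word from a list of 6^5 = 7776 entries. The function name `rolls_to_index` and the placeholder word list are assumptions for illustration; nothing here reflects how the site itself works.

```python
def rolls_to_index(rolls: list[int]) -> int:
    """Map d6 rolls (each 1-6) to a zero-based index by reading them as base-6 digits."""
    index = 0
    for roll in rolls:
        if not 1 <= roll <= 6:
            raise ValueError(f"not a valid d6 roll: {roll}")
        index = index * 6 + (roll - 1)
    return index

# Placeholder list; a real one would hold 6**5 = 7776 actual words.
word_list = [f"word{i}" for i in range(6 ** 5)]

user_rolls = [3, 1, 4, 6, 2]  # values typed in by the user after throwing dice
print(word_list[rolls_to_index(user_rolls)])
```

Because the randomness comes entirely from the physical dice, the program needs no random number generator at all, which makes it easy to audit and to run offline.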

Nice, but I think I'll be sticking with Preshing's version. I even have my mom using it plus KeePass.

My usage is to generate four words, replace a/e/i/o in one of them with 4/3/1/0, capitalize, and throw on some punctuation that makes sense.

Sites with forced password limits and those that don't accept special characters are still a pain, but using dashes, capitalization, or three of the four words usually helps.


Unless the test will be performed using dictionary words only. Starting with 1 word 1st capital letter followed by 1st lower case. I think the calculation time would drop significantly.

Compared to brute forcing all alphanumeric passwords of the same length, true. However, even if you know the complete dictionary of 5k words this uses, 5000^4 = 625,000,000,000,000 > 2^49. Not trivial to brute force; comparable to a completely random 8-character password that uses letters, numbers, and symbols.

For me, at least, I'd rather use a line of poetry that is clearly already memorised. An old router password was "It profits little an idle king, etc". (Thanks Frasier). I imagine the real winners would be older work with non-current English, or perhaps some good nonsense?

Obviously when someone knows that's my thing it's easier to crack though... I shouldn't have said anything :(

this sounds like a good idea for those folk who are walking poetry references

Serving passwords and copying them over HTTP is actually a very bad idea.

Moreover, my language has some non-Latin symbols. I'm not sure every service can be trusted to handle passwords with non-Latin characters. And this tool generates words with umlauts that we don't even have in Latvian. Some words are already transliterated, some are not, which doesn't help.

The passwords are generated on the client.

What if someone is listening to your traffic and injects a script which sends generated passwords to a server? HTTP-only is a bad idea in this case.

Several mostly grammatical examples (English common): "catholic conversation served laughter" "zoo bearing child useless" "hitting burning psychiatrist much" "justice except critical vacation" "assigned fantastic shower interests"

This site http://correcthorsebatterystaple.net takes its inspiration from the same XKCD cartoon

It lacks the language choice but other options are more useful to me. Also the domain is easily memorable.

"schatzmeister anschlägen vormarsch stolzen" Yeah. Pretty german.

"admire fucking ali beautiful"

I've already memorized it.

Does this word list purposely choose insulting words? Out of 4 tries I was given both p&#$y and c@#t.

Here's the 5K english wordlist file:


Looking at the JS it doesn't appear to do it on purpose, although I did experience a similar situation.

Also to add, this is probably one of those situations where a JSON file isn't really that useful. It's a wordlist that could be split by newlines or similar. There's 12 bytes needlessly added to each word when using JSON formatting (the "'s and ,'s).

I wonder where they got the commonly used word lists.

It would be nice if it allowed combining words from multiple languages.

That's my technique on the websites that allow long passwords.

"sexy corpse guys grunting"

Hmmm ...

A good password could be created by using a sentence with more than 14 words and taking the first letter of each word. For example, "This is a sunny day" becomes: Tiasd. Now change any s/S to a $ and any a/A to a @. After that, add the last two digits of your birth year in front of the result and the first two digits at the end.

So now you would have something like: 57Ti@$d19
