To have fun on April Fools' Day we wanted to build something useful and funny: we created a new XKCD-like password generator that can use dictionaries in different languages and shows a picture for each generated term by searching Google Images. We had a lot of laughs playing around with Italian and English passwords, and we hope you'll have as much fun with this as we did!
It should be just for fun and education; anyone foolish enough to use these generated passwords as real passwords is, well... foolish enough. Flaw #1: no HTTPS.
HTTPS doesn't guarantee that the dictionary isn't flawed or the JS is secure, but it does guarantee that the received data/scripts/dictionaries are exactly the ones the server intended to send.
> This is why the oft-cited XKCD scheme for generating passwords -- string together individual words like "correcthorsebatterystaple" -- is no longer good advice. The password crackers are on to this trick.
That's your worst-case scenario. Add to that the fact that:
a) the attacker doesn't have your dictionary
b) they can't be sure you're using this method
c) you might add a minor extra step of your choosing
then the attack becomes significantly harder than attacking a 'normal' password with 50 bits of entropy.
How many truly unique sources for dictionaries are there?
My own password generator simply uses `/usr/share/dict/words`, which is not that hard to guess (and documented in the generator's source code).
Add to this that not all languages lend themselves to this algorithm – while my native language is German, you'd need a word stem dictionary, as opposed to a regular dictionary, for this, so I stick to English dictionaries.
 I actually made it configurable, but lacking a better source, it'll probably remain at this default: https://github.com/creshal/yspave/blob/master/config.json.ex...
 cf. http://creshal.de/News/?id=59 , where I rambled on it a bit more.
On my system /usr/share/dict/words has ~45k words.
That's 15 bits of entropy per word, significantly more than XKCD (which assumes 11 bits).
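For anyone who wants to check the per-word numbers, here is a minimal sketch. It assumes every word is drawn uniformly at random from the dictionary; the function names are mine, and the sizes are the 2048-word list that 11 bits implies and the ~45k figure above.

```python
import math


def bits_per_word(dictionary_size: int) -> float:
    """Entropy in bits of one word picked uniformly at random."""
    return math.log2(dictionary_size)


def passphrase_bits(dictionary_size: int, word_count: int) -> float:
    """Entropy of a passphrase made of independently chosen words."""
    return word_count * bits_per_word(dictionary_size)


if __name__ == "__main__":
    for size in (2048, 45_000):
        print(
            f"{size:>6} words: {bits_per_word(size):.2f} bits/word, "
            f"{passphrase_bits(size, 4):.1f} bits for 4 words"
        )
```

The count only holds if the words really are picked at random; a phrase you choose yourself has far less entropy than the formula suggests.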
So I'd agree that there probably aren't many unique dictionaries.
The strength of this password scheme is that it works even assuming the attacker knows your dictionary.
In practice, I guess that an attacker will use a concatenation of all popular dictionaries. This will make things slightly harder for them (e.g. if they have 50k words vs the 30k in your dictionary) but not by a huge factor.
Indeed. Arguing about whether or not your dictionary is "unique" enough is a red herring. Whatever generator you use should account for the dictionary's word entropy, not for the theoretical per-byte entropy.
(2^50)/(115840*10^6) = 9719.43 seconds, or about 2-3 hours to break one correct horse style password. I think the point is that 50 bits of entropy is not enough when you consider the speed of GPU hashing.
Obviously you shouldn't use MD5 for hashing passwords, but since you often have no control over what algo a site uses (or even information about which one they did pick), assuming they will do something common and bad like unsalted MD5 isn't a bad starting point for this kind of analysis.
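The estimate above is easy to reproduce. This is a rough sketch that takes the quoted rate (115840 * 10^6 guesses per second) at face value and assumes the attacker knows the scheme and simply walks the whole 2^50 space; the function name is made up for illustration.

```python
def seconds_to_exhaust(entropy_bits: int, guesses_per_second: float) -> float:
    """Time to try every candidate in a keyspace of 2**entropy_bits."""
    return 2 ** entropy_bits / guesses_per_second


# Rate quoted in the comment above; treat it as an assumption, not a benchmark.
RATE = 115_840 * 10**6

if __name__ == "__main__":
    worst_case = seconds_to_exhaust(50, RATE)
    print(f"full search:    {worst_case:.0f} s ({worst_case / 3600:.2f} h)")
    # On average the password is found after searching half the space.
    print(f"expected crack: {worst_case / 2:.0f} s ({worst_case / 7200:.2f} h)")
```

Plugging in a slower hash only changes `RATE`, which is why the choice of algorithm on the server matters so much for this kind of analysis.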
• Password used to encrypt your password database, whichever it is
• Everything else
Since the idea of a password database is that you can give each website a unique password without having to memorize them, you can go full printable-ASCII range, 20 characters, for site passwords. Those will not be brute-forced any time soon (at roughly 128 bits of entropy), and the damage of having one leaked will be minimal.
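A minimal sketch of such a site-password generator, using Python's `secrets` module. One assumption of mine: I leave the space character out of the alphabet (94 symbols instead of the full 95) because some sites trim whitespace; the names here are illustrative only.

```python
import math
import secrets
import string

# Printable ASCII without the space character: 52 + 10 + 32 = 94 symbols.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def site_password(length: int = 20) -> str:
    """Return a password of `length` characters chosen with a CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def entropy_bits(length: int, alphabet_size: int = len(ALPHABET)) -> float:
    """Entropy of a uniformly random string over the given alphabet."""
    return length * math.log2(alphabet_size)


if __name__ == "__main__":
    print(site_password())
    print(f"{entropy_bits(20):.1f} bits")
```

With 94 symbols, 20 characters work out to a bit over 131 bits, so the "128 bit" ballpark holds.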
Your password database, on the other hand, should use a painfully slow algorithm for its PBKDF. If that's not using at least scrypt, you indeed have a problem. But assuming a good password database design, 50 bits shouldn't be too bad.
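To illustrate what "painfully slow" looks like in practice, here is a sketch using `hashlib.scrypt` from the Python standard library (it needs a Python built against an OpenSSL with scrypt support). The cost parameters are common illustrative values, not a recommendation, and `derive_key` is a hypothetical helper, not any particular password manager's code.

```python
import hashlib
import secrets


def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Derive a 32-byte database key from the master passphrase with scrypt."""
    return hashlib.scrypt(
        passphrase.encode("utf-8"),
        salt=salt,
        n=2**14,  # CPU/memory cost; raise it as far as your hardware tolerates
        r=8,
        p=1,
        maxmem=64 * 1024 * 1024,  # allow up to 64 MiB of working memory
        dklen=32,
    )


if __name__ == "__main__":
    salt = secrets.token_bytes(16)  # stored next to the database, not secret
    key = derive_key("correct horse battery staple", salt)
    print(key.hex())
```

Every guess against the database then costs the attacker the same memory and time, which is what makes a 50-bit passphrase more tolerable there than behind unsalted MD5.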
EDIT: Actually I was going by xkcd's claims, which start with a dictionary word and add random tweaks to it. I don't know offhand how long a random password would have to be to match 50 bits. But ultimately, that is the question: is this method at least as good as the alternative?
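The "how long would a random password have to be" question can be worked out directly. A small sketch, assuming each character is chosen uniformly and independently; the alphabet sizes are my own examples (94 printable ASCII symbols without space, 62 alphanumerics, 26 lowercase letters).

```python
import math


def chars_needed(target_bits: float, alphabet_size: int) -> int:
    """Smallest length whose uniform-random entropy reaches target_bits."""
    return math.ceil(target_bits / math.log2(alphabet_size))


if __name__ == "__main__":
    for name, size in (("printable", 94), ("alphanumeric", 62), ("lowercase", 26)):
        print(f"{name:>12}: {chars_needed(50, size)} characters for 50 bits")
```

So the comparison is roughly four or five random common words against eight to eleven random characters, depending on the alphabet.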
The XKCD argument is that correct horse style is way easier to remember. It's possible Schneier thought Munroe was arguing that a correct horse password has as much entropy as a 20-character random password, even though he isn't. This is the first I've heard of this argument though, so I'm not even sure how much of a controversy it is.
Language contains a surprising amount of entropy, not as much as random characters, but still enough that I can be reasonably sure that this sentence is unique.
There was someone here on HN just the other day that basically decried a scheme for using multiple user accounts to "sandbox" programs as useless because it would be ineffective against "state actors" (aka NSA and their equivalent).
This even though it would likely foil, or at least reduce the impact of "drive by" attacks aimed at accessing personal information for identity theft etc.
"Modern password crackers combine different words from their dictionaries"
And then he goes on to give his advice (of using a password manager or a piece of paper, etc). Is it that advice that you're contradicting?
You also said "..but that's not what is claimed". Are you referring to the implied claim in the original XKCD cartoon (https://xkcd.com/936) or something else? I'm just trying to understand.
But the XKCD cartoon is working on the assumption that this is exactly what the attacker is doing. Despite that, they would have to do more work than to crack the "Tr0ub4dor&3" type password.
I think Schneier just got the wrong end of the stick here. The XKCD reasoning seems valid to me.
There are some issues:
First, what he says is that "password crackers are on to this trick". Knowing the trick doesn't automatically render a scheme ineffectual. If it did, public key crypto wouldn't work (and the proposed scheme for generating passwords wouldn't be any better, anyway).
Second, the article he's citing is an Ars one about the state of the art in password cracking in 2013, which gave a flawed synthesis about the XKCD/diceware scheme based on a misunderstanding of it. Take a look at the list of cracked passwords given as examples, and see if you can find a single one that would have been generated using that scheme.
One of the main takeaways of the Ars article was that the XKCD/diceware scheme is broken. A reference to the XKCD strip is featured prominently in the main graphic, but the mention of it only shows up on page 3. Examining their methods shows the link from their methods to their conclusion to be a tenuous one. Calling it tenuous is actually being generous, because it's more like a false link. Here's where it gets called out:
Early in the process, Steube couldn't help
remarking when he noticed one of the plains he had
recovered was "momof3g8kids."
[...] Other times, they combine
words from one big dictionary with words from a
smaller one. Steube was able to crack
"momof3g8kids" because he had "momof3g" in his 111
million dict and "8kids" in a smaller dict.
"The combinator attack got it! It's cool," he
said. Then referring to the oft-cited xkcd comic,
he added: "This is an answer to the
Now some closing remarks about this research in general. When the Ars article was published I waited for someone to call it out, but didn't find anyone doing that. I waited some time to look again and turned up nothing. That is disappointing, but fine, I suppose.
However, the particular way that the security community builds up reverence for individuals and discourages re-examining past results is troubling. When I see how big of an effect the conclusions of one guy and the journalist that wrote about him in a popsci news source have, and when I see that some people have an itch to challenge those conclusions but don't put them out there, I'm reminded respectively of Feynman's motivation to write about cargo cult science and to warn us against the kinds of things that happened with Millikan's results and the oil drop experiment. People shouldn't feel they need to turn away from an argument just because they see those on the other side citing Schneier, and the community needs to do better about not encouraging the growth of unassailable celebrity or creating sacred cows.
Given that Bruce Schneier is Bruce Schneier, it seems likely that I am wrong somehow, but at present I cannot see in what way.
Edit: Okay, I can sort of see how it can increase entropy. Replacing characters with 'nothing' randomly does give some extra entropy, although you lose some as well (especially if you overdo it). Seems like it would be best to randomly remove about half of them, although that's also difficult to remember, and only adds 1 extra bit per character (assuming no entropy is otherwise lost). Overall I'm still not convinced.
There's no shame in that, per se, but it was pointed out to him repeatedly in the comments, and he never corrected his mistake.
Just one incident, where he lost a lot of credibility with me. I feel he should have stuck to the technical side of software security and cryptography instead of playing politics.
$ shuf /usr/share/dict/words | head -n4
$ wc -l /usr/share/dict/words
$ # No need to use head. Also, I prefer output on one line.
$ shuf -n4 /usr/share/dict/words | xargs
limper mastoplastia highfaluting Bobbinite
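For those without GNU `shuf`, a rough Python equivalent of the one-liner above. It is a sketch under a couple of assumptions: the word list lives at `/usr/share/dict/words` (as in the commands above) and one word per line; the helper names are mine. It draws from the OS CSPRNG via `secrets.SystemRandom`.

```python
import secrets
from pathlib import Path

WORDS_PATH = Path("/usr/share/dict/words")  # same file as the shell examples


def load_words(path: Path = WORDS_PATH) -> list[str]:
    """Read one word per line, dropping blanks and duplicates."""
    text = path.read_text(encoding="utf-8", errors="replace")
    return list(dict.fromkeys(line.strip() for line in text.splitlines() if line.strip()))


def pick_words(words: list[str], count: int = 4) -> list[str]:
    """Pick `count` distinct words without replacement, like `shuf -n`."""
    return secrets.SystemRandom().sample(words, count)


if __name__ == "__main__":
    if WORDS_PATH.exists():
        print(" ".join(pick_words(load_words())))
    else:
        print(f"{WORDS_PATH} not found; pass your own word list to pick_words()")
```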
Was my password "hospitium taxonomy pellagra" or "hospitalism taxonomy pellagra"?
Looking at the network inspector, calls to the Google Image API return:
/* callback */google.search.ImageSearch.RawCompletion('1', null, 403, 'This API is no longer available.', 200)
Also, a breakpoint in setSearchCompleteCallback shows that the imageResults array is always empty.
This should be a "Show HN"
- Which random number generator is involved? Ideally, the user should be able to feed their own entropy into a locally run "nice password" maker whose source code they have also checked. Anything else, IMHO, is no more than a game inspired by the real problem.
itcrowd clears it up: it's made for April 1st, that explains the problems.
- Worth knowing, a simple solution to real user-generated entropy:
The page could actually be useful if it ran fully locally (e.g. on an air-gapped computer) and took as input the values of dice thrown by the user.
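A sketch of what taking dice as input could look like. I'm assuming the Diceware convention here (five six-sided dice per word, so a list of exactly 6^5 = 7776 words), which the comment above doesn't specify; the function names and the placeholder word list are hypothetical.

```python
WORDS_PER_LIST = 6**5  # 7776 entries, one per possible five-dice outcome


def rolls_to_index(rolls: list[int]) -> int:
    """Map five dice values (1-6 each) to an index in 0..7775 (base 6)."""
    if len(rolls) != 5 or any(r not in range(1, 7) for r in rolls):
        raise ValueError("expected exactly five values between 1 and 6")
    index = 0
    for roll in rolls:
        index = index * 6 + (roll - 1)
    return index


def words_from_rolls(all_rolls: list[list[int]], wordlist: list[str]) -> list[str]:
    """Turn groups of five dice rolls into words from a 7776-word list."""
    if len(wordlist) != WORDS_PER_LIST:
        raise ValueError("word list must contain exactly 7776 entries")
    return [wordlist[rolls_to_index(rolls)] for rolls in all_rolls]


if __name__ == "__main__":
    # Placeholder list; a real one would be loaded from a local file.
    placeholder = [f"word{i}" for i in range(WORDS_PER_LIST)]
    print(words_from_rolls([[1, 1, 1, 1, 1], [6, 6, 6, 6, 6]], placeholder))
```

Nothing here touches a random number generator at all: the dice are the only entropy source, which is the point.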
My usage is to generate four words, replace a/e/i/o in one of them with 4/3/1/0, capitalize, and throw on some punctuation that makes sense.
Sites with forced password limits and those that don't accept special characters are still a pain, but using dashes, capitalization, or three of the four words usually helps.
Obviously when someone knows that's my thing it's easier to crack though... I shouldn't have said anything :(
Moreover, my language has some non-Latin symbols. I'm not sure every service can be trusted to handle passwords with non-Latin characters. And this tool generates words with umlauts that we don't even have in Latvian. Some words are already transliterated, some not, and that doesn't help.
It lacks the language choice but other options are more useful to me. Also the domain is easily memorable.
I've already memorized it.
Looking at the JS it doesn't appear to do it on purpose, although I did experience a similar situation.
Also to add, this is probably one of those situations where a JSON file isn't really that useful. It's a wordlist that could be split by newlines or similar. There's 12 bytes needlessly added to each word when using JSON formatting (the "'s and ,'s).
That's my technique on the websites that allow long passwords.
So now you would have something like: