Cracking the 12+ Character Password Barrier, Literally (netmux.com)
113 points by sply on Jan 14, 2017 | 73 comments

Because remembering the password "horsebattery123" is way easier than "GFj27ef8%k$39"

One way I like to remember long yet high-entropy passwords is to memorise a long, somewhat nonsensical phrase and use characters from it. The reverse is also possible. E.g. that one could become "Gordon Freeman joins 27 electric fences 8% kills $39"
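The phrase-to-password trick above can be sketched in a few lines. This is only an illustration of one possible rule (keep the first letter of each word, keep tokens that start with a digit or symbol whole); the function name `abbreviate` is made up for the example.

```python
def abbreviate(phrase: str) -> str:
    """Collapse a mnemonic phrase into a short password.

    Assumed rule: words starting with a letter contribute only that letter;
    tokens starting with a digit or symbol (e.g. "27", "8%", "$39") are kept whole.
    """
    parts = []
    for token in phrase.split():
        parts.append(token[0] if token[0].isalpha() else token)
    return "".join(parts)


print(abbreviate("Gordon Freeman joins 27 electric fences 8% kills $39"))
# prints GFj27ef8%k$39
```

The abbreviation is deterministic, so it adds no entropy of its own; the strength still comes entirely from how the phrase was chosen.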

I am pretty suspicious of my meat-based random phrase generator. A lot of the analysis of the entropy of correcthorsebatterystaple-type passwords assumes the words are drawn uniformly from a vocabulary of however many thousand. But if you have a large enough dataset of passwords, I bet you can find the true (non-uniform) distribution of words people actually draw from, and the entropy will be a bit lower. And if you further restrict it to almost grammatically correct phrases, it will be lower still[1].

Am I just paranoid?

[1] RNNs can learn the distribution of your grammar easily: http://karpathy.github.io/2015/05/21/rnn-effectiveness/. The worry is that human generators will condition their random words too much, e.g. "correct", ooohh, brain just did an adjective, let's throw in a noun next: "horse".

This is where diceware comes in. You are supposed to pick whatever word you roll on the first try, no exceptions, for this very reason. Rolling until you get a word you "like" reduces randomness significantly. They even say you should only use real, meatspace dice to generate passwords, so that they are truly random, not pseudorandom.


>There are some obscure words in both lists. If your passphrase includes a word you don't know, look it up in a good dictionary. Learning the word's meaning will aid your memory and your vocabulary.

Of course there are exceptions when you should start from scratch.


>Because some words on the diceware list are two characters or less, you can get a very short passphrase. If your passphrase, including the spaces between the words, is less than 17 characters long, we recommend that you start over and create a new passphrase. You should also start over if your passphrase is a recognizable English sentence or phrase. (These situations are extremely rare.)

(If it were me I'd just keep adding more and more words if my password was 17 chars or less)
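For reference, the Diceware procedure discussed above maps five die rolls to one of 6^5 = 7776 words. Below is a minimal sketch of that mapping. It assumes a 7776-entry word list sorted in roll order ("11111" first, "66666" last) and substitutes the OS random generator (Python's `secrets` module) for physical dice, which is exactly the substitution Diceware advises against. The placeholder word list is made up.

```python
import secrets


def roll_die() -> int:
    """Simulate one six-sided die with the OS CSPRNG (not a physical die)."""
    return secrets.randbelow(6) + 1


def rolls_to_index(rolls) -> int:
    """Read five rolls (each 1-6) as a base-6 number in the range 0..7775."""
    index = 0
    for r in rolls:
        index = index * 6 + (r - 1)
    return index


def pick_words(wordlist, count):
    """Pick `count` words, accepting the first roll for each word (no rerolls)."""
    if len(wordlist) != 6 ** 5:
        raise ValueError("expected a 7776-entry word list")
    return [wordlist[rolls_to_index([roll_die() for _ in range(5)])]
            for _ in range(count)]


# Placeholder list; a real Diceware list would be loaded from a file instead.
placeholder = [f"word{i}" for i in range(6 ** 5)]
print(" ".join(pick_words(placeholder, 6)))
```

With real dice you would type the five digits in yourself and call `rolls_to_index` on them.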

> Rolling until you get a word you "like" reduces randomness significantly.

Rerolling 32 times loses 5 bits of security... not a big deal.

A 5 word diceware password has ~50 bits of security.

5 bits is a significant part of 50 bits. Alternatively, it matters whether an attack takes a month or 2.67 years.
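The "a month or 2.67 years" comparison is just the factor 2^5 = 32 applied to attack time. A quick check, taking the one-month baseline from the comment above as given:

```python
def work_factor(bits: int) -> int:
    """Each bit of entropy doubles the attacker's expected work."""
    return 2 ** bits


weakened_months = 1                                 # attack time with 5 bits lost
full_months = weakened_months * work_factor(5)      # attack time with them intact
print(full_months, round(full_months / 12, 2))
# prints 32 2.67
```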

When I use diceware I use random.org for the dice rolls. They claim true randomness generated from atmospheric noise.

Is that because Random.org is really random, or because your threat model doesn't include state level actors taking over random.org?

If state level actors really are interested in what you have to say, then either you're a high enough level spook that you aren't using a web service for your entropy source, or your crypto is not going to be the weak point in your defense.

If every hacker uses random.org, it's more cost-effective to compromise the service than to take a wrench to every single hacker's kneecaps. It wouldn't be a targeted attack, but it might be very useful if a sufficient subset of 'interesting' people use random.org.

Because I don't often have physical dice handy. So it's that or pseudorandom numbers I generate on my local machine.

Getting your entropy for crypto purposes from a 3rd party is not a great idea.

'3rd party' is rather vague in this context, at least to me. Perhaps it's clearer to say that anything requiring networking is out of the question. E.g. you should be able to use a diceware application on a laptop which is completely off the grid.

Pseudorandom bits are statistically indistinguishable from random bits if seeded with true randomness. And if your OS is properly configured it should be seeded with true randomness.

If you prefer to use dice, that's fine, but you're only going to get about 2.6 bits of entropy per roll. Enjoy rolling your dice about 50 times to get 128 bits of entropy. ;)
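The per-roll arithmetic can be checked directly. This sketch assumes fair six-sided dice, so each roll carries log2(6) bits; the function name is made up.

```python
import math


def rolls_for(target_bits: float, sides: int = 6) -> int:
    """Number of fair die rolls needed to collect at least `target_bits` of entropy."""
    return math.ceil(target_bits / math.log2(sides))


print(round(math.log2(6), 3))   # bits per roll of a fair d6
print(rolls_for(128))           # rolls needed for 128 bits
```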

You don't roll them one by one.

Do you roll 10 of them in one hand and manually input the results back? For what purpose?

I'd love to see a study done on this.

I have a feeling that a "phrase" password has significantly more entropy than a "character" password of double or even triple the length (comparing a word equal to a character length wise).

Even taking into account that a real sentence would need to follow a lot of rules, there are still a LOT of adjectives, a LOT of nouns, etc... I'm sure your "meat-based" generator is more open to targeted attacks if someone knows your interests or something, but I have a feeling that it's still such a large pool that it's safe.

And if you start to include somewhat nonsensical phrases like "correct horse battery staple" that even opens things up more.

Including other things like spacing, capitalization, misspellings, made-up words, or even prepending or appending a "traditional" password gets you even more still.

If you pick completely random words from the dictionary, you get about 17 bits per word. That's worth between two and three random characters.

If you take random words and arrange most of them into a sentence then you drop slightly but you're still okay. If you add any common words like "the" or "was", don't include them in your word count.

While you could boost it with spacing, capitalization, misspellings, etc. you gain very few bits for each modification you have to remember. You're better off tossing a random character or two onto your phrase, or simply making it a word longer.
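The "17 bits per word, worth two to three random characters" estimate can be reproduced as follows. The dictionary size of 2^17 words and the 95-character printable ASCII alphabet are assumptions chosen to match the figures above.

```python
import math


def bits_per_word(dictionary_size: int) -> float:
    """Entropy of one word drawn uniformly at random from the dictionary."""
    return math.log2(dictionary_size)


def equivalent_random_chars(bits: float, alphabet_size: int = 95) -> float:
    """How many uniformly random characters carry the same entropy."""
    return bits / math.log2(alphabet_size)


word_bits = bits_per_word(2 ** 17)   # 131,072 words, an assumed dictionary size
print(word_bits, round(equivalent_random_chars(word_bits), 2))
```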

I don't know how to answer the other parts of your comment, but your "double or even triple length" estimate might be off:

From https://en.wikipedia.org/wiki/Entropy_(information_theory)

> English text has between 0.6 and 1.3 bits of entropy for each character of message.

For comparison, if you used a random string of alphanumeric characters, it would have lg(26 + 26 + 10) ≈ 5.95 bits per character.

So if your password is drawn from an English corpus and the low end of the estimate is correct, it's only about as strong as a random password 10 times shorter (or about 4.6 times shorter on the high end).
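A short check of the per-character comparison, using the 0.6 and 1.3 bits-per-character range quoted from Wikipedia and a 62-character alphanumeric alphabet:

```python
import math

ALNUM = 26 + 26 + 10                      # lower case, upper case, digits
bits_per_random_char = math.log2(ALNUM)


def length_ratio(english_bits_per_char: float) -> float:
    """How many English characters match one random alphanumeric character."""
    return bits_per_random_char / english_bits_per_char


print(round(bits_per_random_char, 3))
for english_bits in (0.6, 1.3):
    print(english_bits, round(length_ratio(english_bits), 1))
```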

But of course we don't want a grammatical English password. The question is how much entropy our meat-based random generator actually loses to language bias, compared to random word selection from an English dictionary (an analysis I don't disagree with, as long as the selection is machine generated).

I wonder how much extra entropy you can add by introducing an extra language or two?

For instance, having a password like 'unterwasserboot-sparkle-mocidade-yogurt'.

It seems like multilingual folks would be at a distinct advantage here ... at least until you forget which of the words in your password was in which language, and you end up with 'submarine-faisca-jugend-yogurt' instead :)

Beyond grammar, the addition of intentional errors or symbols or Finnegans Wake can help increase the entropy drastically too.

BIP39 is a spec for generating secure mnemonics (passphrases) with 128-bit entropy. I've written a version in Rust, have a look. github.com/leshow/rust_mnemonic

BIP39 generates 12 word or 24 word mnemonics usually.

This was just a small project I did, and hasn't been checked for correctness. However, it should give you an idea of how you can generate word sequences.
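The 12-word and 24-word lengths follow from how BIP39 packs bits: the entropy (ENT bits) gets a checksum of ENT/32 bits appended, and the result is split into 11-bit groups, each indexing a 2048-word list. A small sketch of just that arithmetic (not of the linked Rust project, and without the word list or hashing):

```python
def bip39_word_count(entropy_bits: int) -> int:
    """Mnemonic length for a given entropy size, per the BIP39 bit layout."""
    if entropy_bits not in (128, 160, 192, 224, 256):
        raise ValueError("BIP39 allows 128 to 256 bits in steps of 32")
    checksum_bits = entropy_bits // 32
    return (entropy_bits + checksum_bits) // 11   # 11 bits per word (2048-word list)


for ent in (128, 256):
    print(ent, bip39_word_count(ent))
# prints 128 12, then 256 24
```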

I do the same, but I skip the "and use characters from it". By using the full multi-word passphrase I get more entropy than by using just some of the letters, at the same memorization effort. The only exception is sites with a low maximum password length.

Does it not occur to you that, if you are going to go that route, you could just use the password GordonFreemanjoins27electricfences8%kills$39

which would be even more secure and just as easy, if not more so, to remember?

I'm all for mnemonic devices and I can understand how one could remember a password like "GFj27ef8%k$39". But can people remember 30 of those, assuming no password reuse?

You can use passphrases with a substitution or shift cipher and get a high entropy password that would also be resilient to combination dictionary attacks.

I use a German word for my WiFi password. 33 characters, and everybody I know enters it correctly on the first try :D

Just use diceware.

Practically, if one doesn't use a password manager, they probably have a much more serious problem than weak passwords, i.e. password re-use.

As opposed to having your manager DB lost or compromised by a trojan.

Using zx2c4 pass with a Yubikey 4. Passwords are GPG encrypted. The private key is on the Yubikey and cannot be read out. The Yubikey 4 is set to require a touch per password unlock. The only passwords at risk are the ones unlocked. At that point, the trojan could install a keylogger and have the same amount of success.

Losing the password store isn't a problem either. It has a git remote on a USB stick. There's a backup if it's ever lost.

This is interesting. I am not well versed on yubikey, but does it allow you to have a similar setup with other password managers, like keepass? (Meaning, one press per one password) Or is it just a substitute for typing a master password?

There are several integrations, but I don't have Google Play Services on my phone, so I only use what's available via F-Droid. See my other comment in this thread.


The touch setting is specific to OpenPGP keys. If you set it, it works that way for all uses of your OpenPGP key. You can turn it on to see if you like it. If you do, you can also set it to 'fix.' Once fixed, it can never be turned off again without deleting the private key and starting over.


In case you want to use the same passwords on iDevices, what would you recommend for the password DB?

EDIT: irremediable posed the same question at the same time ;-)

iOS doesn't have an NFC API. Thanks Apple.


Works okay on Mac OS with GPGTools and QTPass.

In that setup, how do you handle needing passwords on your phone?

There's an app for that. Android Password Store[1]. You can use a Yubikey Neo with NFC in combination with Open Keychain[2]. Both are available on F-Droid.



Which phone? If it is Android, you could root it and use the same commands. Else, if you don't want to root or are using iOS there is software available for LastPass and 1Password.

So what happens when you lose the yubikey?

You can encrypt a folder to multiple public keys. Good for having a backup yubikey, sharing a group of passwords with a spouse or other vip, etc.

If you have a trojan, then it does not matter what form of password storage you use (meat or manager). Any password you enter into the computer with the trojan is at risk of compromise.

If you have a keylogger malware, you only reveal the passwords you type in until you get rid of the malware. Also, you will likely notice something being off if you have malware before you type in every single one of your passwords. PW managers - especially those written by others and widely used - are the most attractive targets for an attacker.

I really like Password Chart[1] as my "password encryption" scheme, since it gives me nice long wacky phrases out of something memorable, per-site.

[1] http://passwordchart.com

That works only until a couple of sites leak their plaintext passwords and the attacker can start piecing information together. Bob@foomail.com's PW on Yahoo.com was ZuBrDfPgPg, and on Patreon.com SkBrOxPdGqPgLu. Any chance his PW for Therapy.com might be OxDfGqPdBrSkZu?

Such an attack becomes more attractive and more effective the more people use this specific method.

Here you are using a fixed passphrase and the site name as the password, resulting in a monoalphabetic substitution cipher, which is where the insecurity comes from.

If you use the site name as the passphrase and a fixed password ("password" below) instead, you end up with:

  yahoo.com:   dXgvRMjjHQvRQFvSa
  patreon.com: UdKiiXVxrCRjYe
  therapy.com: KFXUMnMnSrJLIjaB
That seems quite a bit better (though you do lose the ability to print the chart for offline use). If an attacker knows your 3 plaintext passwords and suspects that you use passwordchart.com, you are still in trouble.

True. However, the article here is talking about bulk encryption of thousands of passwords, and how "ordinary" word sets are easy to break.

I think the "usual" case is that hackers want to get as many passwords as possible, and so singling out an individual for analysis is probably not worth the time, unless you're an "individual of interest" for some reason.

I highly, highly doubt that anybody is going to do that as a general attack vector. Computers change, but people are people and are not likely, in great numbers, to suddenly start having good passwords, just as they are not likely to suddenly start exercising.

You are essentially relying on security through obscurity. It might work, but it fails to add security when viewed through Kerckhoffs's principle.

>until auto-generating password managers gain mass adoption, this vulnerability will always be around.

When auto-generating password managers gain mass adoption, there won't be much point to cracking password hashes. Presumably, one would use a different password for everything in that scenario, which makes the clear text password basically useless anyway.

I think there is a relatively simple rule of thumb: if you can reliably memorize the password with moderate effort, it is very likely not safe. The approaches described in the linked article are clever, but they are far from exhausting the possibilities. One big issue is this: when you try to come up with random words to compose a password, the words you choose are going to be very, very non-random. For example, tell a person to name a random musical instrument. Most people will say violin or piano; other instruments will rarely be mentioned. Likewise with tools: most people will say hammer or screwdriver. This has to do with how words are represented in our mental lexicon. There is a lot of research on that, which you could easily leverage in software for cracking passwords, at least the kind of password that uses supposedly random words.

> When you try to come up with random words to compose a password

That's why you don't. Give a 64 Ki-word (65,536-word) dictionary in your native tongue to your computer and let it choose four words uniformly at random from it. This gives you a password from a distribution with 64 bits of entropy, and it is reasonably easy to memorize with moderate effort.

This means an attacker is expected to compute 2^63 hashes to crack such a password. It would take almost 4 years to crack its MD5 digest on the rig used in the demonstration. If you are not using a password manager for external sites (which might not use proper KDFs), you can throw in a fifth word and be safe for the foreseeable future.

Yeah, I get it. The thing is that many password guidelines do not emphasize how important it is to draw words randomly, and that makes all the difference as I tried to explain.

Considering the ease with which "correcthorsebatterystaple" type passwords can be cracked, I hope Randall Munroe updates that page and recommends people use a password manager.

Munroe uses 4 words as an example, which is a shame because 4 words are very weak.

If you pick 6 words, even from a limited set such as Diceware, your phrase is good enough.

I'd be interested to read about any successful attacks against 6- or 7- word diceware phrases.


> Even a GPU cluster from December 2012 could, depending on the cryptographic hashing algorithm used to protect plain-text passwords, cycle through 350 billion guesses per second. Referring to that project, Reinhold wrote, "They claim they can crack a random 8-character password in under six hours. At that speed, attacking a 5-word Diceware passphrase would take on average of 7,300 hours or 10 months to find the correct passphrase, assuming they knew you were using Diceware and developed equally efficient software designed to try only valid Diceware words."

> Further, he noted that "Criminal gangs have built botnets from thousands of computers infected with their malware. Marshaling large numbers of these computers they control might allow them to crack a five word passphrase in a reasonable amount of time." (Gosney's 25-GPU cluster attacked the NTLM cryptographic algorithm that Microsoft has included in every version of Windows since Server 2003. It's known to be much more vulnerable to cracking than other algorithms. Gosney's machine wouldn't perform as fast against PBKDF2, for instance.)


> UPDATE: In a followup e-mail to Ars, Gosney noted that "The figures are based on a brute-force attack that targets a single hash. Due to the nature of GPU computing, attacks that combined multiple words are potentially much slower." At the moment, "Since there are no tools that currently combine three or more words, we don't really know for sure how much slower it would be."

"I agree that XKCD's password strength cartoon of four random words is sound but only for non-fast hashing algorithms like bcrypt"

Nobody competent will use MD5, or no hash at all, to store passwords. And even if you are not competent, most frameworks providing auth have sane defaults today.

So "correcthorsebatterystaple" is still a very good practice:

- if the auth is correctly implemented, it's still the best ratio for price/safety.
- if the auth is not, you are fucked in so many ways that your password size is the least of your concerns.

Only if you're limiting your words to those in a small dictionary - or regenerating until they're "common" words, which is the same thing (fwiw, "staple" isn't in the Google list of top 10,000 English words that they're using).

> Only if you're limiting your words to those in a small dictionary

Diceware uses a 7776 word dictionary. How insecure is a 6 word diceware passphrase? That should give 77 bits of entropy.
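The 77-bit figure is 6 × log2(7776). A small table for other passphrase lengths, assuming every word is picked uniformly from the 7776-word list:

```python
import math

DICEWARE_WORDS = 6 ** 5   # 7776 entries, one per five-dice outcome


def passphrase_bits(words: int, list_size: int = DICEWARE_WORDS) -> float:
    """Entropy of a passphrase of `words` uniformly chosen words."""
    return words * math.log2(list_size)


for n in range(4, 8):
    print(n, round(passphrase_bits(n), 1))
```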

In the last paragraph he concludes:

If you are really smart you will begin using a password manager like 1Password or Keepass to generate and database your passwords across devices.

This is why most of my important passwords are at least 30 characters

30's probably overkill because it works out to 178 bits of entropy. You'd probably be fine with 128.
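The 178-bit figure matches 30 characters drawn uniformly from a 62-character alphanumeric alphabet; that alphabet is an assumption, since the commenter did not say which characters are used. The same formula gives the length needed for 128 bits:

```python
import math


def random_string_bits(length: int, alphabet_size: int) -> float:
    """Entropy of `length` characters chosen uniformly and independently."""
    return length * math.log2(alphabet_size)


def length_for_bits(target_bits: float, alphabet_size: int) -> int:
    """Shortest uniformly random string reaching `target_bits` of entropy."""
    return math.ceil(target_bits / math.log2(alphabet_size))


print(round(random_string_bits(30, 62), 1))
print(length_for_bits(128, 62))
```

This only holds when every character really is chosen at random, which is the objection raised in the replies.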

How are you making such a specific assessment without any knowledge of what the characters are?

It doesn't matter what the characters are; only what characters are potentially valid.

Would you say the password "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" has high entropy?

People have gotten their Bitcoin wallets owned with 60-character passphrases, because they used phrases that appeared in a Web crawl. Number of characters is not the important thing.

It would be reasonable to assume that someone who reads HN will understand that "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" would not be good, even if it is 30 characters long. Donatj probably considered this.

That said, your point is valid and it's possible you're right.

That's obvious. What's not obvious is whether donatj's 30 characters are individually random or composed of words. And 'individually random' could mean anything between the 4 bits of hex and the >7.7 bits of codepage 1252.

What matters is what frequency distribution the characters were drawn from.
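The dependence on the frequency distribution can be made concrete with Shannon entropy, H = -Σ p·log2(p). The two distributions below are made-up examples: a uniform choice among four symbols, and a skewed one where one symbol is heavily favoured.

```python
import math


def shannon_entropy(probabilities) -> float:
    """Entropy in bits of a discrete distribution (probabilities must sum to 1)."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]       # hypothetical human bias toward one option

print(shannon_entropy(uniform))     # 2.0 bits, the maximum for four symbols
print(shannon_entropy(skewed))      # noticeably less than 2 bits
print(shannon_entropy([1.0]))       # a fixed choice, as in "aaaa...", carries no entropy
```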

If my password was typed in Chinese would that make it hard to crack?? Or would it just result in collisions?

If the attacker had zero knowledge of this, they would have to rely on random collisions. On the other hand, there are about 5000 commonly used Chinese characters (estimates will vary), so entropy is going to be fairly high even if they knew.

The real issue is how different OS and applications handle character encoding and KDF. A good example is WinRAR which does not allow full width CJK characters to be typed in the password field of its UI but one can copy and paste in anything and it will be accepted. Decryption will work on some OS/version combinations but not all.

There are a lot of hackers who know Chinese, so it probably isn't gaining you any ground. See diceware.com for how to generate strong passwords (in many different languages). Also, use a password manager with a password generator for less sensitive accounts and so that you aren't reusing passwords.

I already use 1Password with min 20 characters, unless the site has a length limit ><

I was more curious about extending the characters past alpha numeric / special characters, to include Chinese characters as well. That would open up the number of characters required to brute force and probably result in relying on a collision. Until technology catches up.

I've been wishing to set my passwords in Japanese for a very long time. Too many things reject them, or don't even let you type them.

I've had many other devs link me to that XKCD post about how any standard password is 'safe', yet fail to acknowledge the major differences in how those passwords are hashed and stored. So many people just quote the math and say: "Math can't lie". It can't, but the very complex math behind the actual hashing is different from just multiplying charsets.

Limited reposts are allowed on HN, so please only flag and link to previous discussions if they have more than just a few upvotes and/or comments, or if something has been submitted more than a few times already. Otherwise such comments just add noise.


> Are reposts ok?

> If a story has had significant attention in the last year or so, we kill reposts as duplicates. If not, a small number of reposts is ok.
