
Zxcvbn: realistic password strength estimation - lowe
http://tech.dropbox.com/?p=165
======
16s
Many sites won't accept my passwords (SHA1_Pass). They say that they are too
long or have inappropriate chars or that they are not complex enough. Here's
an example of inappropriate chars:

UTP+NnhabgHKx6

So I make a different password and the sites say it is too weak as it has no
special chars or uppercase chars:

5133fe36785a6e01cac7a68c9c111afff5bb4821

So I give up and type Password1 which is normally accepted.

~~~
aidenn0
My solution so far is:

    
    
      cat /dev/urandom|base64|tr -d '/+'|head -c10
    

Nearly every site supports a-z,A-Z,0-9 at 10 characters
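
A rough Python equivalent of the pipeline above, as a sketch only: it draws 10 characters from a-z, A-Z, 0-9 using the standard `secrets` module. The name `random_alnum` is made up for illustration.

```python
import secrets
import string

# a-z, A-Z, 0-9: the 62 symbols left after stripping '/' and '+' from base64
ALPHABET = string.ascii_letters + string.digits

def random_alnum(length: int = 10) -> str:
    """Return `length` characters drawn uniformly at random from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

if __name__ == "__main__":
    print(random_alnum())
```

With 62 symbols, each character carries a little under 6 bits, so 10 characters come to roughly 59.5 bits.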

~~~
jstalin
A 10-character alphanumeric password is crackable in a matter of days:
<http://whitepixel.zorinaq.com/>

~~~
Game_Ender
Does that break more than md5? I thought it was well known that md5 was a bad
password hash algorithm.

~~~
jstalin
I don't know why an application couldn't also attack other hashing algorithms.
It's just about brute force creating lots of hashes.

This app also uses GPUs to brute force TrueCrypt:
<http://www.golubev.com/igprs/>

~~~
Johngibb
Because algorithms like bcrypt have a computational difficulty parameter. You
can dial it up so that every check takes something ridiculous. Now instead of
brute forcing all of those possibilities in 10 days, it's 1000 centuries.
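
To make the "difficulty parameter" idea concrete, here is a minimal sketch. bcrypt is not in Python's standard library, so this swaps in PBKDF2-HMAC-SHA256 from `hashlib`, whose iteration count plays a similar role to bcrypt's cost setting. The function names and the default of 100,000 iterations are illustrative assumptions, not a recommendation.

```python
import hashlib
import hmac
import os

def hash_password(password: str, iterations: int = 100_000):
    """Hash with a fresh random salt; more iterations means slower guessing."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest

def verify_password(password: str, salt: bytes, iterations: int, digest: bytes) -> bool:
    """Recompute the hash with the stored salt and cost, then compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, digest)
```

Every guess an attacker makes costs the same number of iterations as a legitimate check, so raising the count scales the brute-force time by the same factor.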

------
impendia
> One in nine people had a password in this top 500 list. These passwords
> include some real stumpers: password1, compaq, 7777777, merlin, rosebud.

Looks unbelievable at first. How could people be so stupid?

But I use such passwords all the time. I use a variety of websites where I
have no need or desire for security. Want to post burrito reviews on
burritophile.com as me? I picked something simple and easy to guess, a couple
hours and you'll be going to town! (Just promise not to badmouth the Cosmic
Cantina.)

My bank accounts? Oops, didn't use the same password.

~~~
furyofantares
It always bothers me a bit when I see analysis of password strength for
compromised sites without any mention of the possibility that the account
might just not be important to users.

But there is a caveat. If the account is somehow identifiable as yours (say,
because your friends know it's your account) then suddenly it's a possible
social attack vector. Perhaps a weak one, but probably not something to be
ignored, either.

~~~
elithrar
> It always bothers me a bit when I see analysis of password strength for
> compromised sites without any mention of the possibility that the account
> might just not be important to users.

I actually use that as a factor when considering a password. If I think the
site isn't going to be the most secure (a phpBB forum, or hand-rolled web-
app), then I'm more likely to use a simple (but still relatively decent)
password.

~~~
idupree
Recently I made a new password for some random site (and keep an encrypted
record of it). Then I was relieved I did, because the site turned around and
emailed the password right back to me. Unencrypted. In plaintext.

Hmm, that is wrong enough that I'll call them out by name...
<https://www.nbotickets.com/> (Is it polite and useful to email them how I
feel about that? I feel like I'd just be
"someone-is-wrong-on-the-internet"-ing. Advice?)

~~~
vog
Many mailing lists are doing that by default, too.

------
wh-uws
I have waited for this for so long. I'm glad someone finally took it up and,
more importantly, that it's on a site as popular as Dropbox. (This way
hopefully the thinking will gain some traction.)

Every time I'm forced to have a password with 3 or 4 character classes I sigh
and think of that xkcd comic

 _Edit:_ also try typing the password from the xkcd comic here
<https://www.dropbox.com/register>

nice touch

~~~
eblume
For the lazy, if you enter "correcthorsebatterystaple" the password strength
gets set to "lol" with an info-box that reads something along the lines of
"Don't take the webcomic too seriously. :)"

~~~
vog
I find that message misleading. The xkcd comic does have a point, and thus
should be taken seriously. (Despite the obvious downside that those passwords
take longer to type, which is why I still prefer short, cryptic passwords.)

So a better message might be:

"Don't follow the webcomic too closely. :)"

~~~
wh-uws
It's saying don't use that particular password. Any attacker of this script
would know what it was inspired by and attempt that password in a dictionary.

Rendering it about as useless as 123456789 in this instance.

------
Lagged2Death
I'm surprised to see that "correct horse battery staple" type pass-phrases
really have to be _quite_ long to score well, but that even comically short
email addresses ("dlk3@mit.edu") score very highly. In fact, it looks like my
ever-so-clever words-and-numbers web passwords ("Happy314Day") are all
terrible, but all my email addresses all make maximum strength 4-point
passwords.

I wonder if that's because email addresses are really hard to crack or if it's
because the rules of this scoring system weren't designed to account for such
a practice. Not a practice of using your real email address as a password, but
the practice of using a fictional email address as a password.

~~~
tomp
In general, email addresses should make quite good passwords
(two.words@domain.tld). However, limiting yourself to
yourname.yoursurname@yahoo/google.com reduces the entropy a lot.

Also, the idea of passwords is easy-to-remember & hard-to-guess. The only
emails easy to remember are the ones you're using currently, which shouldn't
be too hard for an attacker to figure out (in general).

~~~
rplnt
What about:

simplepassword@domainimregistering.on

Easy to remember yet hard to guess. (Unless you read this comment)

------
ashishgandhi
The article mentions non-English language support as a future improvement.
Since the article is long, it's easy to miss this point, so to put in
perspective how important it is, here's an example:

    
    
      yehtohaasanhaiguesskarna
    

That means "This is easy to guess" in Hindi transliteration. Only English
support would say it will take "centuries" to guess.
(<http://dl.dropbox.com/u/209/zxcvbn/test/index.html>)

~~~
Tossrock
Did Hindi take "guess" as a loanword or is that just a massive coincidence?

~~~
ashishgandhi
Loan word. I can't remember the Hindi word for "guess" right now.

PS: I don't remember the exact words, but there are some that are
strikingly similar in both languages. I found this for you:
[http://en.wikipedia.org/wiki/List_of_English_words_of_Hindi_...](http://en.wikipedia.org/wiki/List_of_English_words_of_Hindi_or_Urdu_origin)

~~~
Garbage
Guess = अनुमान , अंदाज़ , अटकल

~~~
jerf
I'd love to be able to use Unicode in general in my passwords. I've already
mapped an interrobang on to my keyboard because I was using it so much I
needed a key for it. But who would take it?

(Since someone will ask, yes, there are some accounts I'm willing to limit
myself to using one of my personal computers to access, or jumping through
significant hoops to get there, like my bank account.)

~~~
astine
Holy crap. The interrobang is awesome! Why have I never heard of it before‽

------
shabble
If these sorts of 'strength checkers' become ubiquitous across enough places,
I wonder how much value there will be in using reverse-engineered (most of
these are in JS for UX latency reasons, right?) models of their strength
testing as another parameter to your brute-forcing module.

Then you can automatically skip any password you know is _too_ simple, because
the site won't have allowed the user to set it in the first place. You could
also de-weight any constructions your generator is using (keyboard locality,
l33t, ..), rather than positively weighting them as is done now.

Intuitively, it seems like the more restrictions placed on a password (must
have 1 _x_ char, no more than 20 total chars, ...), the smaller the entire
search space. But where is the inflexion point at which these rules stop
producing stronger passwords and start assisting the attacker?

Then again, if you're doing your hashing and storage right, brute force ain't
gonna help.

~~~
jamesaguilar
When the space of passwords denied by a rule is much, much smaller than the
minimal search space, it doesn't matter all that much.

~~~
lotharbot
Right. I think people make the mistake of thinking that, if you have 40 bits
of entropy and then you delete some 20-bit entropy passwords, you only have 20
bits left. That's not how it works.

40 bits of entropy means 2^40; 20 bits means 2^20. 2^40 - 2^20 gives you
something very, very close to 2^40 (39.9999986 bits of entropy.)
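
The arithmetic above can be checked in a couple of lines. This is just a sketch of the calculation; the variable names are illustrative.

```python
import math

total = 2**40     # size of the full 40-bit password space
removed = 2**20   # passwords ruled out by the rule (a 20-bit subset)

# Entropy of what is left after the weak subset is excluded
remaining_bits = math.log2(total - removed)
print(f"{remaining_bits:.7f}")  # 39.9999986
```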

------
Splines
I think it's interesting that "correcthorsebatterystapl" is more secure than
"correcthorsebatterystaple".

Makes sense, but it's amusing to see the time _drop_ as you add letters.

~~~
nosignal
I noticed this too. It seems advice to "pick random words" should be extended
to "pick random words and leave the last letter off".

~~~
mokus
That only doubles the size of the attacker's dictionary, though. Instead, I'd
say "pick random words and add a few random typos". As long as there aren't
too many the typos will be as memorable as the words themselves (more so if
you're a spelling pedant like me), and using a variety of typos instead of
just one simple transformation increases the search space a lot more.

------
landr0id
> Bank of America doesn’t allow passwords over 20 characters, disallowing
> correcthorsebatterystaple. Passwords can contain some symbols, but not & or
> !, disallowing the other two passwords

Can anyone elaborate why "&" or "!" wouldn't be allowed?

~~~
_delirium
A not-very-great but traditional way to avoid some kinds of security holes is
to sanitize your input by blacklisting anything that could be a
shell/scripting/SQL metacharacter. Seems restrictions like that are still
pretty common, either because it's actually still needed for security
(alarming if true at BoA), or because it's now a sort of cargo-cult thing.

~~~
lowe
sounds about right. screenshot of BofA's policy here:
<http://dl.dropbox.com/u/209/bofa_password_constraints.png>

the forbidden list is: $ < > ^ ! [ ]

~~~
hbhanu
Huh, thanks! I suspected it was a security thing, but I've seen some sites
where other non-alphanumeric characters were disallowed as well. :/ At least
this makes some sense.

------
varenc
The demo at <http://dl.dropbox.com/u/209/zxcvbn/test/index.html> shows what's
happening behind the scenes.

The one usability problem I see is users complaining that zxcvbn is calling
their 'secure' password they use on everything insecure. :-)

~~~
drostie
I actually had a problem with this at a web dev job I did, where I wanted a JS
entropy estimator and coded one in maybe half an hour (though it was a bit of
unexpected time to debug it). Mine was considerably simpler than the above and
would basically use lg(character class size) * length, but would notice when
you switched character classes, too. So it was expecting, for example, numbers
at the end of the file and would only reward you entropy(letters) +
entropy(numbers).

It was at least a disaster when it hit the management who were doubling at the
time as user testing -- "this should be a secure password and it's not!"
applied to passwords which didn't sound very secure at the time. This was
fixed by reducing the entropy bounds to be regarded as "safe" or not. (The
result was that "password1" became a "strong" password, if memory serves me
correctly.)

Before that, I got another interesting gripe from one of my dev colleagues:
"'aaaaaaaaaa' [10 a's] is not secure, but 'aaaaaaaaaaa' [11 a's] is, wtf?!". I
was reluctant to do anything more complicated as a waste of my time but there
is a reasonable expectation that if you do something like this, you do it very
well.
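
One possible reading of the simple estimator described above, as a sketch: split the password into runs of the same character class and reward each run with lg(class size) * run length. The class sizes and the 50-bit "safe" threshold are assumptions chosen for illustration; they are not the original code.

```python
import math
import string
from itertools import groupby

# Assumed class sizes: 26 lowercase, 26 uppercase, 10 digits, 33 other printable ASCII
CLASS_SIZES = {"lower": 26, "upper": 26, "digit": 10, "other": 33}

def char_class(ch: str) -> str:
    """Classify one character into the assumed classes."""
    if ch in string.ascii_lowercase:
        return "lower"
    if ch in string.ascii_uppercase:
        return "upper"
    if ch in string.digits:
        return "digit"
    return "other"

def naive_entropy_bits(password: str) -> float:
    """Sum lg(class size) * run length over runs of same-class characters."""
    return sum(
        len(list(run)) * math.log2(CLASS_SIZES[cls])
        for cls, run in groupby(password, key=char_class)
    )

SAFE_THRESHOLD_BITS = 50  # hypothetical cutoff

for candidate in ("a" * 10, "a" * 11, "password1"):
    print(candidate, round(naive_entropy_bits(candidate), 1))
```

Under these assumptions ten a's score about 47 bits and eleven a's about 51.7, so any cutoff near 50 reproduces the "10 a's bad, 11 a's good" complaint. A length-times-class estimate has no notion of repetition.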

------
bo1024
Any password strength estimator worthy of the name ought to hardcode a list of
those 10,000 passwords and disallow any of them. Add in standard algorithms
and you're probably doing pretty well.

~~~
masonhensley
But would that decrease your conversion rate for your site/ application?

There is a line between gracefully telling your user their password could be
better for their own security and irritating them enough to leave your site.

To be honest, while most everyone here knows more about password security,
most of us who do not use a password manager probably have a couple of simple
go-to passwords that we use to try out all the apps HN users dream up and share.
(...and more secure passwords when needed)

~~~
Aloisius
Instead of banning them, it should simply inform the user:

"Your password is used by 1.2 million other people and can be easily guessed"

------
jjcm
I created something similar a while back to demonstrate what makes a password
secure. It's drastically less sophisticated than this (I wrote it in an hour
or so), but it has the same approach - evaluating a password by entropy, not
random requirements. <http://files.jjcm.org/jspass/>

The important thing I found while testing this was that it was important to
tell users _why_ their password sucked. Often times, they'll just keep adding
1's to the end of their password until it's good enough. Let people know,
"Your password is in a known list of passwords", rather than, "The entropy of
your password is 0."

~~~
cellis
I just tried my password in your service and here's what I got [0]

    
    
      one quintillion ,
      three hundred ninety four quadrillion ,
      seven hundred seven trillion ,
      thirty six billion ,
      eight hundred fifty one million ,
      four hundred thirty five thousand
    

years to crack.

Good god.

[0] - 1394707036851435000 translated by <http://www.webmath.com/_answer.php>

~~~
kingatomic
Apparently one of my throwaways is secure nearly until the heat death of the
universe.

2.1123066418521704e+73 years to crack

------
CGamesPlay
It certainly needs a rule for putting spaces between the words. "correct horse
battery staple" and "correcthorsebatterystaple" should be treated as being
approximately equal in strength.

~~~
DanielStraight
Not to mention:

    
    
      horsebattery -- 3 minutes
      h orsebattery -- 8 years
      ho rsebattery -- centuries
      horseb attery -- 85 years
      horsebat tery -- 54 years
    

Which at the very least is a little odd, even if the reason (breaking up the
words into less word-like structures) is clear.

Also:

    
    
      abcde -- instant
      a b c d e -- centuries

~~~
allbutlost
Also,

    
    
      pas sw ord
    

Will apparently take centuries to crack. I see the reasoning, but can this be
correct?

~~~
zmj
Combinatorics. Yes.

~~~
drostie
Well, no. In "password" you have one common word -- let's say 1,000 options,
2^10. You have two spaces which can go in any of 9 places, for 9 * 8 / 2 = 36
different places, plus I suppose the 1 and 9 for zero and one places.

46,000 options for 15.5 bits of entropy, only. Even if we assume that there
are thousands of different "strategies" by which passwords might be chosen,
that only adds ~10ish bits or so, and doesn't bring it under the useful
thresholds.
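
The count above can be reproduced directly. This sketch takes the comment's own assumptions as given: about 1,000 candidate common words and 9 positions where a space could go.

```python
import math

word_choices = 1000  # assumed size of the "common word" list (about 2^10)

# Two, one, or zero spaces placed among 9 possible positions: 36 + 9 + 1
placements = math.comb(9, 2) + math.comb(9, 1) + math.comb(9, 0)

options = word_choices * placements
bits = math.log2(options)
print(placements, options, round(bits, 2))
```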

------
blake8086
This seems like a great step forward, but it's still a bunch of ad-hoc rules.
While the ruleset is definitely well-put-together and fairly comprehensive, it
still doesn't seem like the most accurate measure.

It seems like password strength basically boils down to:

1) imagine the space of all possible passwords

2) put them in order from most to least likely (123456 would be at the top,
some giant 64 character random monster at the bottom)

3a) if you're malicious, use this list to begin cracking

3b) if you're securing something, use this list to measure strength

An ideal password strength measurer would simply return the approximate rank
of your password.
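
A toy sketch of the rank idea: look the password up in a list ordered from most to least likely and report log2 of its rank as a strength figure. The list here is a tiny placeholder, not real frequency data (only "123456" being first comes from this thread), and `rank_bits` is a made-up name.

```python
import math

# Placeholder ordering for illustration only; a real list would be far longer
RANKED_PASSWORDS = ["123456", "password", "12345678", "qwerty"]
RANK = {pw: position + 1 for position, pw in enumerate(RANKED_PASSWORDS)}

def rank_bits(password: str):
    """Return log2(rank) if the password is in the list, else None (rank unknown)."""
    rank = RANK.get(password)
    return None if rank is None else math.log2(rank)

print(rank_bits("123456"))  # 0.0, i.e. the very first guess
```

The hard part is everything the lookup table cannot hold: ranking passwords that appear in no list needs some model of how likely they are.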

~~~
drostie
That's probably the wrong way to think about it, and might -- as it does in
this case -- lead to a ridiculously oversized password-guessing implementation
which tries to do too much fancy business.

The most obvious way to do password strength checking would not (I don't
think) let you "use this list to begin cracking", but would instead estimate
the Kolmogorov complexity of the password, as a proxy for its entropy.

That sounds daunting, but it's actually pretty simple in principle: append the
password to a couple concatenated dictionaries plus popular password files,
and see how much it compresses with your favorite zipping algorithm. Compare
it to how much 'password1' zips, because you know that's the first one that
they try and therefore it has complexity 2^0. If the zipping algorithm is
good, it will automatically figure out most of these tricks directly from the
'bad password lists'.

I would say a little more: it is probably the case that you can "steal" the
dictionary from one zipping and force a zipping algorithm to use that
dictionary. If this is the case, the dictionary needed reduces to 64 KiB (I
believe) rather than the 600 KB that the above script requires. I don't know
how much effort it takes to get zlib-with-a-static-preset running in JS but
then again, I don't know how much time it took zxcvbn to reach its final form
either.

Using /usr/share/dict/american-english for my dictionary is a bit crap because
it does not yet speak l33t, but my dictionary can be used for
"correcthorsebatterystaple". XKCD estimates 44 bits = 5.5 bytes; gzip --best
estimates 7 bytes, maybe more if we had more sequences ending in '1' to better
compress 'password1'. (Some extra bits are to be expected purely due to the
diverse number of password-guessing algorithms; 'switch one character to l33t,
switch two characters to l33t, end with a number' offer a couple extra bits
which XKCD ignores in order to establish a lower bound.)
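
A minimal sketch of the preset-dictionary variant, using Python's `zlib` rather than JS. The tiny `WORDLIST` stands in for the concatenated dictionaries and password lists and is an assumption; compressed size is used here only for relative comparison, since the zlib header and checksum add a fixed overhead.

```python
import zlib

# Stand-in for real dictionaries plus popular password lists (illustrative only)
WORDLIST = b"password\ncorrect\nhorse\nbattery\nstaple\n123456\nqwerty\nletmein\n"

def compressed_size(password: str, preset: bytes = WORDLIST) -> int:
    """Bytes needed to deflate the password when the compressor is primed with `preset`."""
    compressor = zlib.compressobj(level=9, zdict=preset)
    data = compressor.compress(password.encode("utf-8")) + compressor.flush()
    return len(data)

for candidate in ("correcthorsebatterystaple", "xq7Zk2vB9mWp4LrT8nYc3HdJf"):
    print(candidate, compressed_size(candidate))
```

Both candidates are 25 characters, but the one built from words in the preset dictionary compresses to fewer bytes, because each word becomes a back-reference instead of a run of literals.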

~~~
blake8086
I assume since you mentioned Kolmogorov complexity, that you probably have an
understanding of how compression works.

Why would you add the layer of indirection of running a compression algorithm
over something, when it has to measure the same thing you're trying to get at?

Given a character '1' at the beginning of a password, how likely is it that
'2' is the next character? Compression tools answer this question, but then
they go a different direction with the application of their answer.

Also, if your password guesser guesses anything but '123456' as its first
guess, it's suboptimal, since that really is the most frequent password.

~~~
drostie
The only reason I'd add the layer of indirection is "they've already done it
for me, but they didn't expose the API." I would also be interested in
training a Bayes classifier or a neural network, but again, those aren't as
easy to do as just appending two passwords to the end of a dictionary copies
and feeding them to gzip.

------
ig1
It's not unlikely that "correcthorsebatterystaple" is in several password
attack dictionaries now, so sites may be legitimately ranking it as a weak
password.

But more importantly password strength meters don't result in stronger
passwords. I saw an analysis a couple of months ago (unfortunately I didn't
save the link) where they found showing password strength to the user had no
impact on the strength of the password used. People would pick a password and
then stick with it regardless of strength advice.

~~~
fromhet
May be true, but they are not ranking it as a weak password because it exists
in crackers' databases but because it doesn't contain numbers and special chars.

------
onions
FJFJFJFJ takes "centuries"? Probably needs a little more tweaking.

~~~
lowe
It certainly needs more tweaking. FJ, FJFJ, etc isn't in any of the 10k
passwords people commonly use, isn't a sequence, isn't a single repeated
character, etc, so zxcvbn recognizes it as bruteforce.

A fun extension would be to recognize repeated chunks in addition to single
characters.
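
One way the repeated-chunk idea could be sketched (this is not zxcvbn's code; `find_repeat` is a hypothetical helper): a backreference regex finds the shortest chunk that, repeated, makes up the whole password.

```python
import re

# Shortest chunk (lazy group) followed by one or more copies of itself
_REPEAT = re.compile(r"(.+?)\1+")

def find_repeat(password: str):
    """Return (chunk, count) if the whole password is one chunk repeated, else None."""
    match = _REPEAT.fullmatch(password)
    if match is None:
        return None
    chunk = match.group(1)
    return chunk, len(password) // len(chunk)

print(find_repeat("FJFJFJFJ"))  # ('FJ', 4)
```

A matcher like this could then score the password as the chunk's own entropy plus a few bits for the repeat count, instead of brute force over the full length.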

~~~
onions
One thing would be to try to measure entropy in a different way, e.g. run gzip
on it. Right now FJFJFJFJ has the same entropy as FJGJFJGJ.

~~~
lowe
That's a great idea. More generally, whatever the approach, I agree zxcvbn
would be better with a more conservative rating for non-pattern-matched
regions.

------
brownbat
I use a password locker. The only downside is that it makes setting up new
accounts or changing passwords on an existing account slightly harder, which
decreases usability and security a bit respectively.

Someone should RFC a common password API, so password lockers can query the
password rules and set up a new account or change the password on an existing
account in the background while I browse.

You might worry that this would increase the attack surface, or push people
towards a single point of failure, but I think ending password reuse and
simple passwords could make for a healthy net gain if you carefully designed
the protocol with security in mind. (Throttling and preventing account
enumeration would be two key issues, but they could be overcome.)

------
Shank
The real problem with humans is that passwords are still hard to remember for
multiple services. Doesn't matter if you have a secure password and it's used
everywhere.

Likewise, if it's used with LastPass or 1Password style services, you face the
problem of dealing with entering it. Though a desktop PC is fine for this, the
best counter-examples are mobile devices.

LastPass on mobile:

1. Use app that needs a password.
2. Realize password is in LastPass. Exit app, find LastPass.
3. Open LastPass, and login.
4. Copy password.
5. Switch back to the other app.
6. Enter password.

This is so tedious that people are going to re-use some password just for the
sake of not having to do the above every time.

~~~
bad_user
The rule of thumb I'm using for password management ... if losing everything
means you'll lose your passwords, then that's not good password management.
But you also need unique passwords for each service.

My passwords are generated using HMAC_SHA256( global password, domain_name,
salt ). My global password is a 7-word phrase with capitalization and 2
words that are not in the dictionary. Each password generated is unique for
each website and reasonably long (settled on 32 chars).

This is not perfect but works well.
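
A sketch of what such a scheme might look like in Python. HMAC takes one key and one message, so how the domain and salt are combined is an assumption here (joined with ":"), as is encoding the digest with URL-safe base64 and truncating to 32 characters. `site_password` is a made-up name.

```python
import base64
import hashlib
import hmac

def site_password(master: str, domain: str, salt: str, length: int = 32) -> str:
    """Derive a per-site password from a master phrase using HMAC-SHA256."""
    message = f"{domain}:{salt}".encode("utf-8")  # assumed way of combining the inputs
    digest = hmac.new(master.encode("utf-8"), message, hashlib.sha256).digest()
    # 32 digest bytes encode to 44 base64 characters; keep the first `length`
    return base64.urlsafe_b64encode(digest).decode("ascii")[:length]

print(site_password("example master phrase", "example.com", "2012"))
```

Nothing is stored: the same inputs always regenerate the same password, which is what makes the scheme survive losing every device.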

Related to your problem of usability ... I use Firefox on my mobile and
through Firefox Sync I get all cookies synchronized from my laptop. Meaning
that I am rarely required to enter passwords on my mobile.

------
mkjones
Did anyone look at the linked site
<http://xato.net/passwords/more-top-worst-passwords>? I pulled his top 10k
list, but it doesn't add up with his
analysis. I get that the top 100 passwords only cover 14% of the accounts, not
40%. And the top 1000 passwords only cover 44%, not 91%. These numbers don't
change his argument all that much, but I'm curious what I'm missing about the
way he calculated his.

------
RandallBrown
I hate when they won't let me use a password that's not "strong" enough. I
picked my password, let me use it. I know the consequences of using an easy
password.

~~~
aqme28
I hate when they won't let me use a password that's too strong. Nothing makes
less sense to me than rejecting a password because it contains '!' or '#'.

~~~
hbhanu
I cringe every time I see this happen... and always with websites where you
-want- stronger passwords.

------
pclark
I feel like this is a stupid question, but what is wrong with having your
password be something like "p4ssw0rd"? eg: a dictionary word where a few of
the letters are switched for numbers, and maybe even a symbol at the end
("p4ssw0rd$") are these terrible passwords for some reason?

~~~
epmatsw
Password cracking tools will try variants of dictionary words with common
substitutions like that. In this case, a/4, o/0, and s/$ would be swapped out
in passwords, and your password would be guessed in a few minutes. John the
Ripper is an interesting tool for messing around with this.
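
To show how cheap these substitutions are for an attacker, here is a small sketch (not John the Ripper's actual rule engine) that expands one dictionary word into every combination of the three swaps mentioned above. `leet_variants` and the `SUBS` table are illustrative names.

```python
from itertools import product

# For each letter, the forms a cracker might try: the original or its swap
SUBS = {"a": "a4", "o": "o0", "s": "s$"}

def leet_variants(word: str) -> set:
    """Return every spelling of `word` under the a/4, o/0, s/$ substitutions."""
    options = [SUBS.get(ch, ch) for ch in word]
    return {"".join(combo) for combo in product(*options)}

variants = leet_variants("password")
print(len(variants), "p4ssw0rd" in variants)
```

"password" has four substitutable letters, so it expands to only 2^4 = 16 candidates. The substitutions add about 4 bits on top of the dictionary word.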

------
sjwright
$^$^$^_ = crack time 26 days

$^$^$^i = crack time 3 months

$^$^$^z = crack time 5 years

Should the result vary so widely given the arguably minor variation?

~~~
lowe
$^$^$^_ and $^$^$^z are both recognized by zxcvbn as bruteforce regions. it
reports the entropy as:

n log (c)

for a length-n password with symbol space c. the huge difference in crack time
is because zxcvbn is using c==33 (symbols only) for $^$^$^_ and c==59 (symbols
+ a-z) for $^$^$^z

$^$^$^i is in the middle -- 'i' is considered a dictionary match, the rest is
c==59 bruteforce.

the bigger problem is $^$^$^ isn't recognized as a pattern, but i'm working on
ways to improve bruteforce estimation too. good example!
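
The n log(c) figure can be worked through for the two 7-character examples. This sketch assumes the logarithm is base 2 (entropy in bits); `bruteforce_bits` is an illustrative name, not a zxcvbn function.

```python
import math

def bruteforce_bits(n: int, c: int) -> float:
    """Entropy in bits of a length-n string over a symbol space of size c."""
    return n * math.log2(c)

symbols_only = bruteforce_bits(7, 33)        # $^$^$^_
symbols_and_lower = bruteforce_bits(7, 59)   # $^$^$^z

# Each extra bit doubles the search, so the time ratio is 2^(difference) = (59/33)^7
ratio = 2 ** (symbols_and_lower - symbols_only)
print(round(symbols_only, 2), round(symbols_and_lower, 2), round(ratio, 1))
```

That is about 35.3 bits versus about 41.2 bits, a factor of roughly 58 in guesses, which is in the same ballpark as the jump from 26 days to several years.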

------
tnash
This is a really great step forward for password strength estimation. If the
OSC could get going on it and add a bunch more patterns it could be a great
solution. Perhaps I'll have time to work on some patterns.

------
colanderman
Adding a digram or trigram model would be interesting, as having one of these
could greatly reduce the crack time of an English sentence as compared to
random English words.

------
Sidnicious
FWIW, it looks like Google specifically rates correcthorsebatterystaple lower
than similar passwords. I wonder if any of these websites have it in a
dictionary?

------
georgeott
Length beats entropy every time. Steve Gibson has covered this before.

<https://www.grc.com/haystack.htm>

~~~
DanBC
Steve Gibson is pretty much clueless.
(<http://attrition.org/errata/charlatan/steve_gibson/>)

SpinRite was great. Spinrite stopped being any use sometime in the early 90s.

------
brownbat
This is great. I love how:

thisisastrongpasswor should take 20 hours to crack.

But just add a 'd' to the end, and crack time drops to: "instant"

Magnificent!

------
jksmith
Try some of Reinhold's Diceware phrases. They hold up quite well with this.

------
FootballMuse


~~~
FootballMuse
Sorry. Didn't know this would be a problem. I can't edit the original comment
for some reason.

Yahoooo (with 139 o's)

