"Pwned Passwords" V2 With Half a Billion Passwords (troyhunt.com)
842 points by explodingcamera 5 months ago | 355 comments



I think it would be interesting to do an art project with this data - some of these passwords are funny and/or revealing. Some examples:

  pooplasagna - 3 times
  eggsarebad - 3 times
  eggsaregood - 25 times
  myhusbandcheats - 4 times
  icheatonmywife - 1 time
  ihatemyneighbors - 2 times
  iamanalcoholic - 6 times
  1yearsober - 31 times
  imissmykids - 51 times
  imissmyparents - 6 times


6618 people love life, 1367 want to die.

893 people like turtles, 170 love turtles, 155 love everyone (363 people hate everyone though and 428 hate us all).

4301 people love their dog, 3 fuck their dog, 3 killed their dog (only one person killed their cat)

110 people are killers, 4 kill for money, 1 is a murderer. 68 want to kill, 24 kill for fun :-/

2781 love their wife, 552 love their husband. 68 people hate their wife, 38 hate their husband. 3823 people love their son, 17 hate their son. 212 love their daughter, 3 hate their daughter. 11 siblings are having sex with one another. 3 have sex with their dad, 5 have sex with their mom.

6 people save the cheerleader, 11 people save the world. 1 person is the president. 4 people are obama, only one person is trump. The password donaldtrump is almost 5 times more popular than barrackobama.

339 people love sun, 559 love rain.

114 people are healthy, 216 are sick. 108 people are old, 3 people are too old. 15 people sleep well. 80 people run fast, 6 people run faster. 1 person is the fastest (obviously).

All this is quite fascinating and fun .. and partly disquieting.

jesus - 123279 times, lordjesus - 8187, satan - 11662 times, iamgod - 8834, iamjesus - 345 times, jesussucks - 100 times


> The password donaldtrump is almost 5 times more popular than barrackobama.

This is a little late, but...

Right, but "donaldtrump" is only seen 40% as often as "barackobama". "donnaldtrump" is apparently perfectly safe though.


> 1 person is the president

Don't tell me he actually made that his password...


i just checked, at least not his twitter password..


    ilovemymom 20141
    ilovemydad  7850


The number is under-represented because US English is not used worldwide:

ilovemymum: 7928 ilovemymam: 926


Pa, Pop, Dada, Daddy, Mommy, etc. There's work to be done here, folks!


Mam..?


Northern English


And Northern Irish.


Personally I hear 'ma' more in Northern Ireland, never heard 'mam' (although maybe I am mishearing).

mam - https://en.wiktionary.org/wiki/mam#English

ma - https://en.wiktionary.org/wiki/ma#English


I heard `mammy` on Derry Girls which is supposed to be an Irish show.


Yeah, my Irish family say 'mammy'.


Ah, cool


Mammogram. Radiographic procedure that is painful and potentially life saving.


moms are loved almost 3 times more.


Or people who love their moms are 3 times more likely to choose a bad password.


People who love their moms are less intelligent than those who love their dads? There is a dissertation in psychoanalytics just waiting to be written.


> There is a dissertation in psychoanalytics just waiting to be written.

Just don't wear a Hawaiian shirt while giving interview about it on TV.


What's wrong with them? I love those shirts! O.o


Some actual expert was on TV talking about space, a European space probe I think, wearing his Hawaiian shirt with ludicrous SciFi babes on it, and a certain type of person kicked off declaring that somehow SciFi babe Hawaiian shirt is misogyny incarnate or something.

The main outcome for me was that now two different friends own sofa cushions with the identical fabric pattern, they're a bit... garish? Both these friends are women, one of them is a bona fide scientist with a PhD and everything, so evidently the "No to SciFi babes on Hawaiian shirts" misogyny claims were less "mainstream feminism" and more "I'm just looking for reasons to be angry", but whatever, that's culture for you.


I remember that story. The shirt was actually made for the guy by a female friend as a gift


I think it lines up well with existing theories, that of the oedipal complex combined with the idea that women are smarter.


correcthorsebatterystaple - 103 times


this one is quite amazing, I expected it to be one of the weakest around today, I certainly would include it in one of my first attempts if I tried to brute force any password.


There’s no guarantee those accounts hold any value, though. A throwaway account is the perfect chance for such a password — you won’t forget it and won’t care if you lose it.


Good god.

Also I wonder if Randall Munroe knows this somewhat ironic fact? It would make a nice addendum to the alt text on https://xkcd.com/936/ (which BTW is the origin of correcthorsebatterystaple for those wondering).


  hackernews - 3 times
  forgotmypassword - 155 times
  letmein - 184.274 times


>184.274

How is it a real number? Shouldn't it be a whole number?


In Europe, 184.274 means 184,274.


In _parts_ of Europe.


Which parts specifically and why?



hunter7 - 7935 times


Wasn't it "hunter2"? The search turns up 16,092 hits so if I am mistaken I'm not the only one.


People are choosing "*" as a password?



How did you make that "reply" italic?


Markup bug in HN: it doesn't close the tag properly, so the italics leak past the post content if there is an asterisk at the end of a post.


"*"


I'm doing. I'll update here soon


do not skip the section on "Cloudflare, Privacy and k-Anonymity" ... it is a great summary of an elegant privacy solution.

And check out Cloudflare's detail post too:

https://blog.cloudflare.com/validating-leaked-passwords-with...


Why does 00000 have the largest number of hashes? Does SHA-1 not distribute hash values evenly?


It's indeed weird that "00000" would be the hash prefix with the highest number of entries. I think it must be a hidden variable. Like some sources put an all-zeroed-out hash in the database for testing or in case of a registration error or for deleted users, and these show up here.


Great thought, but it doesn't seem to be the case - as the number of unique suffixes is the large number here -- in fact, none of the values in the range are simply all zeroes.

https://api.pwnedpasswords.com/range/00000

I wonder if the hidden variable is something to do with how the passwords are leaked. First, let's suppose that a very commonly used broken password hash is plain SHA-1 (I think that's a valid assumption-- unfortunately!). Then, let's figure that amongst the many data dumps / extracts done by hackers, some of them are only able to extract part of the database, or save part of the database, or whatever....and they are fetched / saved / uploaded in lexical order?

Can't think of anything else.

EDIT: Ooops. The other thing is, that these actually are sha-1 hashes of real plaintext passwords. So it's definitely not a test-row in that sense.


Maybe crypto people who have brute-forced up some typeable passwords that hash to low numbers on the first SHA-1 pass, for a fun-and-games equivalent to a Proof of Work? (It'd only show up in actual DB dumps for backends that use "SHA-1 with no salting" for password hashing, which might also serve as a useful canary value.)


Great idea! I ran a quick hashcat against the range00000 list on my laptop. In 1 minute I cracked 79 of them, and not too many of them look very odd - that is, they look sorta like normal cracked passwords.

I'm asking my friend to run a more thorough crack on his dedicated GPU, especially for hash value 000DD7F2A1C68A35673713783CA390C9E93:630 which does stick out to me!


00000000DD7F2A1C68A35673713783CA390C9E93: 89378305686


Non-uniformity in SHA-1 would be major news.
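For anyone who wants to poke at this themselves, here is a rough empirical check rather than a proof. It hashes a batch of made-up strings (the `password{i}` inputs are purely illustrative, not taken from the dump) and counts how often each leading hex digit shows up. If SHA-1 had a gross bias toward low prefixes, it should be visible even in a sample this small.

```python
import hashlib
from collections import Counter


def first_hex_digit_counts(n_inputs: int) -> Counter:
    """Count how often each leading hex digit occurs across SHA-1 digests."""
    counts = Counter()
    for i in range(n_inputs):
        # Illustrative inputs only; any set of distinct strings would do.
        digest = hashlib.sha1(f"password{i}".encode()).hexdigest()
        counts[digest[0]] += 1
    return counts


N_INPUTS = 160_000
counts = first_hex_digit_counts(N_INPUTS)
expected = N_INPUTS / 16  # 10,000 per digit if the output is uniform

for digit in "0123456789abcdef":
    print(digit, counts[digit], f"{counts[digit] / expected:.3f}")
```

Each ratio should hover close to 1.000; the small deviations are ordinary sampling noise.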

Note that in the description below, I refer to any keyed involution as a block cipher. One may make a semantic distinction, but any keyed involution could be used as a block cipher (though, of course, most involutions would contain trivial cryptographic weaknesses).

SHA-1 is based around a 160-bit unbalanced Feistel block cipher. The input is broken into blocks, where the final block contains padding and a final count of the amount of data processed. A copy of the 160-bit state is made, the 160 bit state is encrypted using a block of the input as a key, and the initial copy is added back (without carries between 32-bit words) to the original copy. This is repeated for each input block in turn. This is called a Davie-Meyer construction for making a hash function out of a block cipher.

For any Davies-Meyer hash function, the block cipher is invertible and therefore unbiased. The addition is invertible and unbiased. Any bias would therefore have to come from non-zero correlation between addition and encryption. For any moderately complex block cipher, these correlations would be very complex. Real world design of Davies-Meyer hash functions focuses on absolutely minimizing any patterns present, and cryptanalysis focuses on characterizing and approximating any and all minute patterns that escape the design process.

There are some patterns (weaknesses) in SHA-1, but all known weaknesses are way more complex (and minuscule) than could explain the sort of bias seen in this data set, so the bias must be coming from a higher-level source than SHA-1 itself.

On a side note, the addition in Davies-Meyer is to intentionally make the round function non-invertible. If the round function were invertible, there's a trivial birthday attack on the intermediate state between rounds that square roots the strength of the hash function. MD4, MD5, SHA-224, SHA-256, SHA-384, and SHA-512 are all Davies-Meyer constructions using unbalanced Feistel ciphers. RIPEMD-160 is a parallel application of two Davie-Meyers hashes with different initial values, followed by XORing the two outputs to obtain the final output. SHA-3 is the most notable cryptographic hash function that's not a Davie-Meyer construction.
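To make the feed-forward step concrete, here is a toy sketch. The two-word "cipher" below is invented for illustration and is in no way SHA-1's real round function; the only point is the shape: an invertible keyed permutation, followed by word-wise addition of the previous state, which is what stops you from running the step backwards.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit word left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def toy_encrypt(key_words, state):
    """Toy keyed permutation on two 32-bit words (hypothetical, NOT SHA-1)."""
    a, b = state
    for k in key_words:
        a, b = b, (a + (rotl32(b, 5) ^ k)) & MASK32
    return a, b


def toy_decrypt(key_words, state):
    """Exact inverse of toy_encrypt, showing the cipher itself is invertible."""
    a, b = state
    for k in reversed(key_words):
        a, b = (b - (rotl32(a, 5) ^ k)) & MASK32, a
    return a, b


def davies_meyer_step(state, block_words):
    """Encrypt the state under the message block, then add the old state back.

    The addition is per 32-bit word, with no carries between words.
    """
    encrypted = toy_encrypt(block_words, state)
    return tuple((e + s) & MASK32 for e, s in zip(encrypted, state))


iv = (0x67452301, 0xEFCDAB89)  # arbitrary starting state for the demo
block = [0x01234567, 0x89ABCDEF, 0xDEADBEEF, 0x0BADF00D]

assert toy_decrypt(block, toy_encrypt(block, iv)) == iv  # cipher is invertible
print([hex(w) for w in davies_meyer_step(iv, block)])
```

Given only the output of `davies_meyer_step` and the block, you can't simply call `toy_decrypt`, because the state you would need to subtract first is the very thing you are trying to recover.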

In case you're wondering, one could make a Davies-Meyer hash function using AES. The designers of AES took AES, doubled the word size, doubled the number of words, and fixed a deficiency discovered in the nonlinear byte substitution. The resulting hash function is called Whirlpool, and the underlying block cipher is called Anubis. I'm not aware of any use of Anubis outside of Whirlpool.

The Salsa/ChaCha families of stream ciphers and the Blake family of hash functions are all very similar to each other. They all use a very similar family of (unnamed) block ciphers internally that are twice the size of the desired output. They achieve non-invertibility by breaking the block cipher output into two halves and XORing the two halves together.

Before MD5 was broken, I did read briefly about an attempt (not by Ron Rivest) to use the inner block cipher from MD5 for encryption, but the performance wasn't competitive. Now we've characterized the hidden patterns in the block cipher well enough to break it relatively easily. I forget the name the authors retroactively gave to Rivest's inner block cipher from MD5.


Salsa/ChaCha does not halve the output and XOR it together; it adds the input block to the output to get non-invertibility.

Salsa/ChaCha also does not have a block cipher, just an unkeyed permutation function which is applied to the key plus a constant and counter.


Thanks for the correction. Note that a couple of times I spelled Davies as Davie. Also note that after editing, my description of the Davies-Meyer addition step got mangled. The original copy is added to the result of the encryption.


Both 00000 and 4A4E8 contain the largest number of hashes so it could just be coincidental that the former looks recognisable.
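A back-of-the-envelope sketch of what "coincidental" would look like, assuming the hashes are spread uniformly over the 5-hex-character prefixes. The total below is my assumption for the approximate size of the V2 list; swap in the real figure if you have it.

```python
import math

TOTAL_HASHES = 501_636_842  # assumption: approximate size of the V2 corpus
PREFIX_HEX_CHARS = 5

buckets = 16 ** PREFIX_HEX_CHARS  # number of distinct 5-character prefixes
mean = TOTAL_HASHES / buckets     # expected hashes per prefix

# Under uniform hashing each bucket size is binomial, so the spread is
# roughly the square root of the mean.
std_dev = math.sqrt(mean * (1 - 1 / buckets))

print(f"{buckets:,} buckets, mean {mean:.1f}, std dev {std_dev:.1f}")
```

With about a million buckets and a spread in the low twenties, some bucket has to come out on top by several standard deviations, and nothing in that arithmetic favours 00000 over any other prefix.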


I'm a bit confused - why not distribute a serialized Bloom filter representing these passwords? That would seem to enable a compact representation (low Azure bill) and client-side querying (maximally preserving privacy).


There are half a billion passwords in the list. A bloom filter with even a 1 in 10 false positive rate would still be 286.59 MB.
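For reference, a sketch of where a number like that comes from, using the standard optimal-size formula m = -n ln(p) / (ln 2)^2. The item count is my assumption for the list size, and "MB" is read as MiB here.

```python
import math


def bloom_bits(n_items: int, false_positive_rate: float) -> float:
    """Optimal Bloom filter size in bits for n items at the given error rate."""
    return -n_items * math.log(false_positive_rate) / (math.log(2) ** 2)


def to_mib(bits: float) -> float:
    """Convert a bit count to mebibytes."""
    return bits / 8 / 2**20


N_ITEMS = 501_636_842  # assumption: approximate number of hashes in the list

for p in (0.1, 0.01):
    print(f"p = {p}: {to_mib(bloom_bits(N_ITEMS, p)):.2f} MiB")
```

Each extra factor of 10 in accuracy costs the same additional amount of space, so 1 in 100 is twice the size of 1 in 10.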


You could do a Bloom filter on each bucket, each of which has about 500 items. This would reduce the size of the response from about 16k to < 1k. But it would be a lot harder to use since all clients would have to use the Bloom filter code correctly.


A Bloom filter with >500M items, even when allowing for a comparatively high rate of false positives such as 1 in 100, is still in the hundreds of MBs, which would not be that much more accessible than the actual dump files.


The compressed archive here is over 8 GB. An uncompressed 2 GB Bloom filter with 24 hash functions and half a billion entries has a false positive rate of less than 1 in 14 million.

75% space savings, with no decompression necessary for use, and a 1 in 14 million false positive rate is nothing to sneeze at.
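A quick sketch to check that claim with the usual approximation p = (1 - e^(-kn/m))^k. It assumes "2 GB" means 2 GiB and rounds the list to exactly 500 million entries; with decimal gigabytes the rate comes out noticeably worse.

```python
import math


def bloom_false_positive_rate(m_bits: int, n_items: int, k_hashes: int) -> float:
    """Approximate false positive rate of a Bloom filter."""
    return (1 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


M_BITS = 2 * 2**30 * 8   # assumption: 2 GiB expressed in bits
N_ITEMS = 500_000_000    # "half a billion" entries
K_HASHES = 24

p = bloom_false_positive_rate(M_BITS, N_ITEMS, K_HASHES)
print(f"false positive rate: about 1 in {1 / p:,.0f}")
```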


But no count of how often the hash is used. Counting bloom filters are still a bit harder to implement.


Counting bloom filters are only marginally more difficult to implement. To increment a key, find the minimum value stored in all of the slots for the key, and then increment all of the stored values for that key that are equal to the minimum value. To read, return the minimum value for all of the values stored in slots for the key.
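A minimal sketch of the increment and read rules described above. The class name, the slot count, and the way slot indices are derived from SHA-256 are all my own illustrative choices, not anything from the dump or the API.

```python
import hashlib


class CountingBloomFilter:
    """Counting Bloom filter that only bumps the slots holding the minimum."""

    def __init__(self, n_slots: int = 1 << 16, n_hashes: int = 4):
        self.n_slots = n_slots
        self.n_hashes = n_hashes
        self.slots = [0] * n_slots

    def _indices(self, key: str) -> set:
        """Derive the slot indices for a key (hypothetical hashing scheme)."""
        indices = set()
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            indices.add(int.from_bytes(digest[:8], "big") % self.n_slots)
        return indices

    def increment(self, key: str) -> None:
        """Find the minimum over the key's slots, then bump only those slots."""
        indices = self._indices(key)
        lowest = min(self.slots[i] for i in indices)
        for i in indices:
            if self.slots[i] == lowest:
                self.slots[i] += 1

    def count(self, key: str) -> int:
        """Return the minimum over the key's slots (never an underestimate)."""
        return min(self.slots[i] for i in self._indices(key))


cbf = CountingBloomFilter()
for _ in range(3):
    cbf.increment("letmein")
cbf.increment("hunter2")
print(cbf.count("letmein"), cbf.count("hunter2"))
```

Collisions can only push a reported count up, never down, which is the right direction of error for a "has this been seen too often" check.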

For these purposes, however, you probably instead want to store just separate Bloom filters for counts above different thresholds, since the common use case would be accept/reject decisions based upon a single threshold.


I agree. Just need to set some bits and test them. This is too big really for a tree or a hash table.


Just added an extra line to the bash wrapper to print how many times the given password appears in the dump:

https://gist.github.com/mino98/8aa240fa55a8182198fba58fb810b...


If you prefer a one-liner like me, the following line works for me:

VARPWD=P@ssw0rd; HASH=`echo -n $VARPWD | sha1sum`; curl --silent https://api.pwnedpasswords.com/range/`cut -b 1-5 <(echo $HASH)` --stderr - | grep -i `cut -b 6- <(echo $HASH) | cut -d ' ' -f 1`

If it doesn't return anything, then your password isn't in the list. You should probably start your line with a space so that it isn't recorded in your bash_history.

If someone else can make it better or shorter, be my guest.


Does anyone else get no results when searching for 'asdf', 'hunter2', and 'lauragpe' (which appears in the article) using the shell script provided?

edit: Ok, so my `openssl sha1` (version 1.0.x) outputs '(stdin) <hash>', whereas the script expects just <hash>. Add ' | cut -f2 -d" "' after the 'openssl sha1' call to fix this if you have the same problem.


Here's how I tested:

    echo -n 'hunter2' | sha1sum

    f3bbbd66a63d4bf1747940578ec3d0103530e21d -
https://api.pwnedpasswords.com/range/f3bbb

C-f d66a6 finds

    D66A63D4BF1747940578EC3D0103530E21D:16092
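The same manual check, sketched in Python. It only does the local half: split the SHA-1 into the 5-character prefix you would send and the suffix you keep, then look the suffix up in the response text. Fetching https://api.pwnedpasswords.com/range/<prefix> is left to whatever HTTP client you like; the function names are mine, and the sample body is just the one line quoted above.

```python
import hashlib


def split_sha1(password: str, prefix_len: int = 5):
    """Return (prefix, suffix) of the uppercase SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:prefix_len], digest[prefix_len:]


def count_in_range(range_body: str, suffix: str) -> int:
    """Look up a hash suffix in a 'SUFFIX:COUNT' range response; 0 if absent."""
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix.upper():
            return int(count)
    return 0


prefix, suffix = split_sha1("hunter2")
sample_body = "D66A63D4BF1747940578EC3D0103530E21D:16092\n"
print(prefix, count_in_range(sample_body, suffix))
```

Only `prefix` ever needs to leave your machine; the comparison against `suffix` happens locally.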


Can you clarify what problem this solves?


You don't submit either a full password or a full hash to the API (which, since Troy produced the hashes, amounts to the same thing). A hash is pretty much perfect for k-anonymity because if you use a prefix like this then it's extremely likely your data will be spread across the buckets, so no 5-character prefix is close to uniquely identifying a password.


As stated in the post, it's a simple solution to help with anonymity.

"The password has been hashed client side and just the first 5 characters passed to the API As mentioned earlier, there are 475 hashes beginning with "21BD1", but only 1 which matches the remainder of the hash for "P@ssw0rd" and that record indicates that the password has previously been seen 47,205 times."


But Troy could still very easily guess the complete hash. It's the one with the 47,205 hits.


But it's not always that hash. The password you're checking may not be on the list. This is just a quick check to see if the password in question is on the list, in which case it may be a poor choice depending on how often it's seen.

For example, say I want to check "gSAey27tgGsaEG". That hashes to c2e5dfb023cd42df94751581cba33b24bc011027. https://api.pwnedpasswords.com/range/c2e5d has no entry for fb023cd42df94751581cba33b24bc011027, so it's not even in the list of passwords.

Put another way, it averages a few hundred hashes per prefix based on the total password list size (~500M), but there are 2^140 possible hash suffixes per prefix (160 bits of SHA-1 minus the 20 bits in the 5-character prefix). There's no point in guessing that.


Yes but Troy doesn't learn the hashes of uncompromised passwords


Forgive my ignorance but why is submitting a hash a problem? Because Troy knows which passwords have been checked? Why should I care about that? I get that it’s like submitting your password in the clear if it’s in the DB, but in that case surely you have bigger problems.


One way that sites can use this service is to check whether a password has been leaked when users sign up. By handing over the SHA-1 hash of the password you're effectively trusting this service (and anyone who might have compromised it) with all your users' clear text passwords. Connecting the right password with the right user can be trivial in some circumstances, say because a site has a publicly visible sign-up date on profiles, or even if it just hands out sequential IDs to users.


A warning about Cloudflare:

You cannot access their support in any way without logging in. Trying to contact them via their contact/sales page won't work. They won't respond.

This means that if you lose your phone (2FA) and can't log in, you're royally screwed and will have to go to your registrar to recover access to your domains/DNS.


All of that is a good thing in my book. I've been the victim of the "customer service backdoor" on Amazon multiple times. It's ridiculous that someone can just about credentialize as you without even having to log in. They made off with whatever sensitive data the customer service rep had in front of them just from chatting to someone on that anonymous support chat widget.

Meanwhile, all you have to do is back up your 2FA secrets. Why not make it a part of your regular computer backup routine?


You should never use only 2FA for something you don't want to be locked out of. You need a 3rd authentication method to replace the 2nd when you lose it, such as backup codes, as well as a 4th one to recover a lost password.


> You should never use only 2FA for something you don't want to be locked out of.

Tell that to... everyone.

> You need a 3rd authentication method to replace the 2nd when you lose it, such as backup codes, that as well as a 4th one to recover a lost password.

That's on Cloudflare. If they don't offer backup codes, what can an end user do about that?


Manually record the seed key when you set up 2FA (usually this is contained in a QR code). Keep it somewhere safe and offline. It can be used to recreate your 2FA setup.
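To see why the seed is all you need, here is a minimal sketch of the standard TOTP calculation (RFC 6238 on top of HOTP), assuming the site uses the common defaults of SHA-1, 30-second steps, and a base32-encoded seed. The example seed is the RFC's published test key, not anything real.

```python
import base64
import hashlib
import hmac
import struct


def totp(secret_b32: str, unix_time: int, digits: int = 6, step: int = 30) -> str:
    """Compute a TOTP code from a base32 seed and a Unix timestamp."""
    key = base64.b32decode(secret_b32.upper())
    counter = struct.pack(">Q", unix_time // step)      # 8-byte time counter
    mac = hmac.new(key, counter, hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                             # dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)


# RFC 6238 test key, base32-encoded the way a QR code would carry it.
example_seed = base64.b32encode(b"12345678901234567890").decode()
print(totp(example_seed, 59, digits=8))  # RFC 6238 test vector: 94287082
```

Anyone holding the seed can generate the same codes on any device, which is exactly why it works as a backup and exactly why it has to be stored offline.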


I've never looked into that possibility. Thanks.


Pretty sure backup codes are just a part of 2FA.


If support can bypass 2FA why even have it?


What a silly question. One can prove who they are with documents, but nobody can prove who they are with 2FA.

It goes like this: If you can prove who you are, you get access to your account. That's what this is all about.

The more offline, human touch we go, the greater the security.


It's way more likely that a hacker can convince a customer support rep that he's me than that hacker can steal my 2FA codes.

This isn't a hypothetical, this happens all the time including to people I know personally: https://www.forbes.com/sites/laurashin/2016/12/20/hackers-ha...


That's not an inherent problem, that's poor implementation.

Procedures like this could work:

Person contacts support requesting a bypass of the 2FA due to whatever reason.

1. Cloudflare sends an email to the person's account notifying them of the request.
2. The person is required to upload photographic proof of two government-issued IDs.
3. Cloudflare calls the person (phone number on file from 2FA or account setup).
4. A 30-day delay is initiated.
5. 30 days later, Cloudflare emails and calls the person to confirm they requested the 2FA bypass.
6. Access is granted.

With procedures like this, it's no longer about convincing a support rep.


Can't you just change the nameservers for the domain in the registrar's control panel from CF's to somebody else's?


That moment when you test an old, but still highly valued and securely used, password that you think isn't super obscure but not likely to be used much and see a 4000+ count...


And that's exactly why using this entire corpus - or even more than the first few tens of thousands - as a blacklist would be an extremely user-hostile choice [1].

Password psychology is remarkably consistent across a given demographic, because most people start by modifying one or more base tokens that are already stored in memory because of their personal significance.

So unless the implementation gives specific, real-time UX feedback to teach users how to pick a password that is very unlikely to be in this (and future, ever-growing) versions of this corpus, using a large blacklist is "gotcha infosec". It creates UX where the user cannot possibly come up with a "good" password using most of their previous strategies. It's the worst kind of "gotcha infosec".

1. https://news.ycombinator.com/item?id=16434266


That moment when you test a very unique password you used to use and it's been pwned once.. GULP! Glad I stopped using that one


Same experience here, except mine was current until shortly after I checked it.

The weird part is that I only used it on internal systems at work. With an overly paranoid security department. Either they’re paranoid in the wrong ways, or I have an evil twin somewhere.

At least, I hope they’re the evil twin...


time for some soul-searching


Am I the only one here who thinks typing your password into a stranger's website is a risk? How do you know he does not log it? How do you know he was not hacked and someone is not logging all passwords that are not on the list YET?


If you don't trust Troy Hunt / haveibeenpwned.com you can always download the data and analyze your password yourself. But if this is the case you should not trust any website with your password anywhere ever, and should not create accounts anywhere. Troy Hunt has shown himself a responsible security professional, and I trust him more to create a secure password query than some other security organizations.


"But if this is the case you should not trust any website with your password anywhere ever".

That is why you should use a unique password for each site.


Yes, with unique passwords for each service, you narrow the attack surface to compromise other accounts. But you still have to trust the operator to store and process this one - unique - password on this one service/website. It does not make any difference for the argument, if one or many accounts are potentially compromised. And you have to trust your password-manager software, since it is next to impossible to remember all the different passwords for all the different services you use.


This is absurd and impossible to remember. You should instead have at least 3 levels of password strength: one high-strength password for base services that are used to recover other accounts, like Facebook and e-mail, another for important services, and another for crap.


You're not expected to remember them all, you're expected to either write them down or use a password manager. That way you only really need to remember one very strong password.


Except that my 3-level system failed ages ago. Originally I had one, then with more sites coming - several with bogus or reckless implementations - it was extended to the aforementioned 3-tier one, just to get f*cked up by 'knowing it better' god-complex but stupid enforcers requiring or forbidding (!! how stupid is that!) characters. Not to mention leaks forcing me to introduce new ones, eventually ending up with 5 layers with variations on each level because of the highly arbitrary rules of enforcers blocking my well-thought-out secure passwords.

All led to the situation that I have an encoded file on my computer with passwords (most just referrals/reminders/instructions not the actual password characters).

How stupid is that! Writing down passwords!

Even in secured files, that's still an increased level of risk. A method with doubtful protection when someone is targeted personally for his/her secrets. Stupid, but that is reality. Made necessary by reckless developers.

The whole password infrastructure is dead as a means of protection. It does not work against serious attackers, only against random wanderers. And more and more against rightful users!

And the most damage was done by those who forced users to solve, on the user side, a problem that in fact lies on the system side.

Passwords fail at their task if they:

- allow parties without permission to enter
- lock out rightful parties

Very strict enforcers corrupt the system through the second point. Narrow-mindedly focusing on not letting in unwanted elements causes the whole system to cease working as intended, locking out users and preventing them from using it, defying its very purpose.

Encouraging users not to use any password that was ever used by someone else is just an extremely radical level of enforcement, and again tries to make users fix the inadequacy of the system developers.....

This is not solving systematic problems just conserving a bad habit plus making a bad situation even worse.


Use a password manager. It's so ridiculously easy to setup and makes it so much easier to log into sites.


No, you shouldn't trust this site with your password, like you shouldn't trust any site with your password.

Choose a new password for every site folks, and if you want to use a site like this, make sure the original places you used the password have been updated to a new one.


The article explains in the "Cloudflare, Privacy and k-Anonymity" section how you can use k-Anonymity to query the API with (at least some) privacy, by just submitting the first 5 characters of your password's hash.


As long as you use unique passwords for every site, the risk should be small.


the website does not send your password to their api but the first 5 chars of an sha1 of your password


An old password (12 char numbers and letters) I've since stopped using (but used to use everywhere) appears as pwned in this list (3 times!). I'd love to know who exposed it. Any chance I can find out?


You could check for your email on https://haveibeenpwned.com/


As a policy Troy Hunt won't reveal which breach he found your data in. I considered setting up a series of 'canary' emails so that I could track who's selling what but ... well never got round to it.


You used to be able to adjust your email address to check. For example, if your email was bill@gmail.com, you could sign up for HN with bill+hackernews@gmail.com. Gmail ignores the part after the + sign. Therefore if you noticed emails coming to that address, you would know that HN sold their list. However, I've found that most forms reject that as a non-valid email address now.


I use a catchall address on my own domain name, and use sitename@domain.com to sign up for everything.

You should be using your own domain for email anyway


> You should be using your own domain for email anyway

Why?

I trust Google to secure gmail.com better than I can secure my own domain.


People have had their Google accounts suspended (especially if it's associated with, say, a YouTube or AdWords account), and now you have no access to your address anymore, and Google's free tier support won't help you.

With domains there's an ICANN process to get your domain back if it gets hijacked.


(a) Using your own domain doesn't require running your own mail server: you can point your MX records at Google Apps if you're comfortable with that, or your registrar probably supports mail forwarding if you don't want to pay for G Suite. (b) It means you can keep your email address if you ever leave Gmail.


I understand that, but there's still the possibility of the domain name itself being hijacked, or even just forgetting to renew it.

Maybe I'm paranoid :)


I achieve the same thing by just accepting wildcard addresses at my domain and using sitename@mydomain.com whenever I sign up for things.


This is why I configured my dovecot instance to use - instead of + for the mailbox delimiter


Don't any half decent sites strip that out anyways? Some ecommerce sites have actually failed to accept that string, inadvertently thinking it's invalid. The rest, or any marketing CMS, would simply remove it.


Why would you modify valid user input?


Yes. Those forms are also ignoring relevant RFCs.


Can you be more specific?


Many sites reject valid email addresses. One character it is common for forms to reject is a "+" in the left hand side of an email address. The email RFCs allow this character, so denying it is bogus. Nevertheless, they do.

https://tools.ietf.org/html/rfc2822#section-3.4

    atext           =   ALPHA / DIGIT / ; Any character except controls,
                        "!" / "#" /     ;  SP, and specials.
                        "$" / "%" /     ;  Used for atoms
                        "&" / "'" /
                        "*" / "+" /
                        "-" / "/" /
                        "=" / "?" /
                        "^" / "_" /
                        "`" / "{" /
                        "|" / "}" /
                        "~"

    atom            =   [CFWS] 1*atext [CFWS]

    dot-atom        =   [CFWS] dot-atom-text [CFWS]

    dot-atom-text   =   1*atext *("." 1*atext)

    ...

    addr-spec       =   local-part "@" domain

    local-part      =   dot-atom / quoted-string / obs-local-part
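As a rough illustration of how little it takes to accept "+", here is a sketch that checks a local part against just the dot-atom-text rule quoted above. It deliberately ignores CFWS, quoted strings, and the obsolete forms, so it is a partial check and the function name is my own.

```python
import re

# Character class transcribed from the atext rule in the RFC grammar.
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"

# dot-atom-text = 1*atext *("." 1*atext)
DOT_ATOM_TEXT = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def is_dot_atom_local_part(local_part: str) -> bool:
    """True if the local part matches dot-atom-text (no CFWS, no quoting)."""
    return DOT_ATOM_TEXT.fullmatch(local_part) is not None


for candidate in ("bill+hackernews", "first.last", "a..b", "has space"):
    print(candidate, is_dot_atom_local_part(candidate))
```

A form that rejects `bill+hackernews` is stricter than this, not more correct.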


While Troy doesn't directly reveal the breaches, it is pretty easy to deduce which public leaks were part of v1 - especially once they're cracked:

https://cynosureprime.blogspot.com/2017/08/320-million-hashe...

I anticipate similar results for v2.


How does he get hold of the data in the first place?


A combination of public leaks and private contributions. He does a pretty good job of vetting them and determining what era they're from (by polling some users to see when that password was used/changed on that service, etc.)


not sure what his methods are, but you can find torrents of most of the dumps


The UX of a blacklist with a half billion entries would be so crippling that it would cause a user revolt.

Most people's password-selection strategies are similar enough to other people's (like kbenson's 4000+ hit) that they could spend hours trying to come up with a password that has never been leaked before.

I tried to encourage Troy to suggest to implementors that blacklisting all passwords was a Bad Idea. Instead, he doubled down:

https://twitter.com/TychoTithonus/status/966400790221930496

Please don't use the entire list for blacklisting unless you actively also guide the user in how to generate a random passphrase (if a human must remember it) or a random password (if it will be stored in a password manager).

Instead:

1. Use a high-level password-strength assessment widget like zxcvbn:

https://github.com/dropbox/zxcvbn

2. Configure a blacklist with, say, 10K or 20K of the most common passwords.

3. Hash passwords with bcrypt cost 12 (adjusted to your platform's hashrate capabilities), scrypt, or the appropriate method from the Argon2 family.

But for all that is holy, please don't use Troy's entire corpus - or even the first million - as a blacklist. To quote the old NANOG saw, I encourage all of my competitors to use it. ;)

Edit: And since Troy's API at this writing does not support only querying the top X passwords, there's no way to use the API while avoiding the UX nightmare. So if you want to use this data in a professional manner, but don't want to download the entire corpus, here are the first 20K from his list (of which I've only personally cracked 19965 so far, interestingly; gist will be updated once I get all 20K):

https://gist.github.com/roycewilliams/281ce539915a947a23db17...

Edit 2: Preliminary results indicate that this data may be dirty. The 273rd most common password, according to Troy, is '$HEX'. This is almost certainly an import/conversion artifact, since the '$HEX' prefix is how most cracking suites escape non-ASCII or passwords that contain colons. I expect that there will be more artifacts. Use the data with caution.


> I tried to encourage Troy to suggest to implementors that blacklisting all passwords was a Bad Idea. Instead, he doubled down

> Please don't use the entire list for blacklisting unless you actively also guide the user in how to generate a random passphrase (if a human must remember it) or a random password (if it will be stored in a password manager).

I think he did the right thing, and I think you are correct as well. We have the best of both worlds here, in that the API includes the count, so API users can determine what the correct cut-off is for them. A count in the thousands (or maybe less) might be a good indicator that your password is not only relatively common, but also likely to be on (and maybe even fairly high on) many dictionary lists. More secure services that cater to more technically savvy users (or security conscious companies) may decide to blacklist any password on the list, period, and that may be okay because those sites either trust their users to deal with it or can dictate conditions for a captive audience.


As a corpus to download for password research, this is indeed useful. But for providing a blacklist -- his stated purpose - it is not.

The crucial tell: his API does not allow the implementor to specify a frequency threshold (by top X in the list, or by Y number of unique uses of the password or higher).

By both API and explicit language in the announcement, he is promulgating the idea that checking the entire blacklist is useful, and "the larger the blacklist, the better." This is exactly what I'm arguing against.


> his API does not allow the implementor to specify a frequency threshold

Yes it does. The output contains the number of matching passwords. It's just client side instead of server side. The reason for not doing so on the server is also obvious taking into account his explanation of cost and caching, which informed much of the API design itself.
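A rough Python sketch of what that client-side thresholding could look like, assuming you already have the count for a password in hand. The names `Verdict` and `judge` and the two cut-off parameters are made up for illustration; picking the actual numbers is up to the implementor.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    WARN = "warn"
    REJECT = "reject"


def judge(count: int, reject_at: int, warn_at: int = 1) -> Verdict:
    """Map a corpus count to a decision using caller-chosen cut-offs."""
    if reject_at < warn_at:
        raise ValueError("reject_at must be >= warn_at")
    if count >= reject_at:
        return Verdict.REJECT
    if count >= warn_at:
        return Verdict.WARN
    return Verdict.ALLOW
```

A strict site could call `judge(count, reject_at=1)` to refuse anything on the list, while a more lenient one could reject only very common passwords and warn on the rest.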

> By both API and explicit language in the announcement, he is promulgating the idea that checking the entire blacklist is useful

Because the entire blacklist is useful. He's given all the relevant information to the client to do with as they may. It's up to them to choose how to utilize it. I'm not sure why you seem to think some narrower use case is necessarily better, using end use cases as arguments, given it's an API and needs a client implementation before being usable anyway.


It's a fair point that raw password count is available.

But that value is an absolute number, without any in-API context of the total size of the corpus. This makes expressing relative rarity only possible by hard-coding the total size of the corpus into a calculation.

Put another way: the 20,000th position has a frequency value of "7889". But what does that mean? Where is that in the distribution of password frequency? It's impossible to tell, without manually constructed context that will change over time as the total number of passwords in his corpus expands.

But more crucially, there is no way to tell relative rank ("is this password in the top 20k?") using the API that I can see. That would make using the top X much easier. But with the K-anonymity "feature", there's no way to do that that I can see.


I don't follow - how is the relative rarity better than absolute frequency? What really matters is how common your password is - not how highly it's ranked in a compromised password list, which has no relevance to how common it may be.

You want to filter on users choosing a password that's been re-used across all compromised accounts more than N times.

Filtering users on choosing a password that ranks N of M on a list of compromised passwords doesn't tell the user how bad that password is.

In fact, once you get to the tail, the ranking is basically based on sort order and becomes irrelevant?


The ranking in Troy's list is based entirely on how common the words are. Here are the top 10, with their relative frequency:

  7c4a8d09ca3762af61e59520943dc26494f8941b:123456 (20760336)
  f7c3bc1d808e04732adf679965ccc34ca7ae3441:123456789 (7016669)
  b1b3773a05c0ed0176787a4f1574ff0075f7521e:qwerty (3599486)
  5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8:password (3303003)
  3d4f2bf07dc1be38b20cd6e46949a1071f9d0e3d:111111 (2900049)
  7c222fb2927d828af22f592134e8932480637c0d:12345678 (2680521)
  6367c48dd193d56ea7b0baad25b19455e529f5ee:abc123 (2670319)
  e38ad214943daad1d64c102faec29de4afe9da3d:password1 (2310111)
  20eabe5d64b0e216796e834f52d61fd0b70332fc:1234567 (2298084)
  8cb2237d0679ca88db6464eac60da96345513964:12345 (2088998)
So ... what is the "right" threshold for N?

  $ for topx in 1 100 1000 5000 10000 20000 50000 100000 200000 500000 1000000 2000000 10000000; do \
    echo -n "$topx: "; head -n ${topx} pwned-passwords-2.0.txt | tail -1;
  done

  1: 7C4A8D09CA3762AF61E59520943DC26494F8941B:20760336
  100: 482FA19D5C487CB69ACDA19EEE861CC69D82CC94:272371
  1000: 5B9FE558F673D63309BEB13BFA5DA6C30A3CA1BF:64912
  5000: FE648FC459A6F6EF6CD347BEE3D494766239BBB5:19860
  10000: 2682A3DBA7A1452EE7EE9980F195C6A768055DA6:11055
  20000: 53490A3C8567342B57B6A4FF24908DF73182B357:6309
  50000: 7517CD23A308BBCD05E5AD24AA6AD054237ED470:3153
  100000: BA6D6A41B9548C523833627A8B0E5170558BE1EA:1752
  200000: E50E6893264519636E90E95B6B1A85D0A691E0B1:931
  500000: AF8DF653177BBB3FEE2DA68D314B94CB5281B4F3:381
  1000000: BDD57A4CAA691A3441C1190C6F087B58B2EE3EF6:186
  2000000: C824AF24AA8F2FD99AD6842DC0E4B49100D96161:93
  10000000: 352DB7177AB7848DF1C102234401097FE40EB87D:22
The final field indicates how common the password is in the corpus (for example, the single most common password - "123456" - appears in the corpus 20,760,336 times).

So ... based on this data ... what is a reasonable value for that count, such that if the value is exceeded, the user should be disallowed from using the password? How much real-world online or offline resistance is provided by disallowing, say, passwords used at least 186 times in the corpus (roughly a million passwords, though 5201 passwords are at the 186 mark)? (The answer should be self-evident; if it isn't, I can provide more background).

Put another way ... if the corpus was only 1M in size, those right-hand values would be much smaller. How could you determine the threshold then? What I'm trying to illustrate here is that it's not the absolute value of that commonality number that matters; it's the relative rank. But that relative rank can't be determined via the API; you must analyze the entire corpus directly - and then discard the vast majority of it for blacklisting purposes.
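If you do go the "analyze the corpus, keep the top X" route, a Python sketch might look like the following. It assumes the downloaded file has one `HASH:COUNT` entry per line and is already ordered by count, most common first (which is what the `head | tail` output above relies on). The function names are made up.

```python
from itertools import islice
from typing import Iterable


def top_n_hashes(lines: Iterable[str], n: int) -> set[str]:
    """Collect the first n hashes from 'HASH:COUNT' lines ordered by count."""
    return {line.split(":", 1)[0].strip().upper() for line in islice(lines, n)}


def count_at_rank(lines: Iterable[str], rank: int) -> int:
    """Return the count stored on the rank-th line (1-based)."""
    line = next(islice(lines, rank - 1, rank))
    return int(line.rsplit(":", 1)[1])
```

Something like `with open("pwned-passwords-2.0.txt") as f: blacklist = top_n_hashes(f, 20000)` would then give a set small enough to keep in memory, and the rest of the file can be discarded for blacklisting purposes.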

I totally get that the threshold might vary per implementation. But it varies much less once the hash is slow enough, and the authentication service is suitably rate-limited. In other words, any system that would get real benefit from a 1-million-word blacklist is one that needs to be improved elsewhere instead.

But Troy didn't provide any guidance about that, or even how to judge for yourself what the threshold might be. He just provided an API to blacklist a corpus of passwords that is three orders of magnitude larger than a properly designed system would ever need.

1. https://blogs.dropbox.com/tech/2012/04/zxcvbn-realistic-pass...


Why would a password that occurs in this list, but rarely, be safer? Attackers aren't going to skip the rare ones.


Because:

1) In an online attack, against a properly-configured service, even if password spraying is used, only the first few thousand passwords can be tried before rate-limiting, CAPTCHAs, etc. kick in.

Would a user with a known leaked password at a different site be vulnerable to an online correlation attack? Yes. And that's why some big services supplement their approach by proactively searching for those leaks and forcing a password reset for those specific users.

2) In an offline attack, when the passwords are properly hashed with a modern slow hash, even an expensive GPU or FPGA cluster would take weeks to exhaust a 10,000 word dictionary against a large user corpus, and a significant amount of time even when a single user is targeted.

Would users with '123456' get cracked pretty quickly? Yes. And that's why the top X are forbidden - to make offline attackers have to dig deeper into their wordlists (and thereby also their pocketbooks) to crack a password in a useful amount of time.


Eh, "We've got lots of users so it will take a long time to crack them all" isn't much of a defence.

I mean, if you've got Obama or Snowden or Taylor Swift or Logan Paul or whoever as a user, you think hackers wouldn't spend 2 hours of GPU time per account to crack their passwords?


I'm quite familiar with password attack scenarios.

If high-value targets are selecting passwords that would be vulnerable to a targeted cracking attack, the solution isn't to blacklist a half-billion passwords (when they could just as easily come up with literally trillions of other passwords that would also be bad, yet are not included in the blacklist). The solution is to show them how to manage their specialized threat model - 2FA, creating strong passphrases, using a password manager, etc.


I find it hard to believe that you could set a cutoff of passwords that have been leaked but that you could rely on an attacker not to try. These passwords are more useful guesses than anything a password cracker would make up out of components.

XKCD considers a password that's one of 2^28 possibilities "easy" to guess, and provides a well-regarded strategy [1] for coming up with a password that's one of 2^44. Passwords in this list are one of 2^29.

[1] https://xkcd.com/936/


Random passphrases are indeed a good idea. XKCD #936 advocates for 4 words randomly selected from a 6000-word dictionary, which is 6000^4, or ~1.296 × 10^15, which isn't actually that strong if the service in question has chosen a weak password hashing algorithm. When using pure bruteforce or masks (not a dictionary or hybrid attack) against a large-ish corpus of passwords (say, a few million) a system with 6 GTX 1080s can realistically try 8.2 billion SHA1-hashed passwords per second, which would exhaust the entire XKCD 936 keyspace in about 45 hours. (If you bump it up to five words from a 20,000 word dictionary, you get ~3.2x10^21 possibilities, which is better.). And if you focus on a single hash, that SHA1 rate jumps to ~32 billion hashes per second, which would take less than 15 hours.

At that speed, processing the entire Pwned Passwords list would almost take longer to read from disk and into memory than it would take to exhaust against a single password. Password cracking specialists would of course try raw wordlists first (which are therefore "more useful", in a way) ... but they have many other tools in their arsenal that generate far more than a half a billion candidate passwords. And at that rate, you can exhaust all 8-character passwords made up of printable ASCII - 95^8, ~6.6x10^15 - in a couple of days. Other techniques (mask, hybrid, rules) can achieve similar rates, and combinator attacks are slower but still pretty efficient.

By contrast, attacking bcrypt cost 12 on the same system can only try ~660 hashes per second - against a single hash. At that rate, if you just tell the attacker "it's somewhere in the Pwned Passwords list", it would take about 210 hours to exhaust the raw list, and 36 years to exhaust all 6-character passwords made up of printable ASCII.
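The back-of-the-envelope arithmetic is easy to reproduce. A short Python sketch, using the guess rates quoted above and treating the list as a round 500 million entries (my rounding, not an exact corpus size):

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_YEAR = 365.25 * 24 * SECONDS_PER_HOUR


def seconds_to_exhaust(keyspace: int, guesses_per_second: float) -> float:
    """Worst-case time to try every candidate at a fixed guess rate."""
    return keyspace / guesses_per_second


xkcd_keyspace = 6000 ** 4   # four words from a 6000-word dictionary
pwned_list = 500_000_000    # "half a billion" entries, rounded
six_char_ascii = 95 ** 6    # all 6-character printable-ASCII strings

sha1_many = seconds_to_exhaust(xkcd_keyspace, 8.2e9) / SECONDS_PER_HOUR
sha1_single = seconds_to_exhaust(xkcd_keyspace, 32e9) / SECONDS_PER_HOUR
bcrypt_list = seconds_to_exhaust(pwned_list, 660) / SECONDS_PER_HOUR
bcrypt_six = seconds_to_exhaust(six_char_ascii, 660) / SECONDS_PER_YEAR

print(f"SHA1, large corpus, XKCD keyspace: {sha1_many:.1f} hours")
print(f"SHA1, single hash, XKCD keyspace:  {sha1_single:.1f} hours")
print(f"bcrypt cost 12, whole list:        {bcrypt_list:.1f} hours")
print(f"bcrypt cost 12, 6-char ASCII:      {bcrypt_six:.1f} years")
```

The results land within rounding distance of the figures quoted above; the gap between the SHA1 rows and the bcrypt rows is what matters.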

In other words, if a service is storing passwords poorly, that service should be fixing that long before they should be trying to blacklist a half billion passwords. The purpose of blacklisting up front in the password-changing UI isn't to forbid a half-billion passwords. It's a way to reduce risk of online attack - and an opportunity to guide users towards better selection methods. There's a reason why Dropbox only blacklists the top 30K.


> which isn't actually that strong if the service in question has chosen a weak password hashing algorithm.

That only matters if you re-use the password in multiple sites.

If an attacker has access to the hash, that means they cracked the site already at the admin level and got into its user database.

They don't need to crack your password to gain any more access to that same site. (And they already have all the plain text personal info from your account.)

Your only additional problem now is if that password gives them access to your account on other sites that they haven't broken into yet.

The ultimate protection against that is not to have reused that password. That beats the stupidity of "password strength".

If a password is not reused, it has to be only strong enough to survive the five guesses before an account is locked out.

Password strength matters when hashes are public (like in classic Unix non-shadowed /etc/passwd files). Well, that's a bad idea, which is why we have shadowed password files. Shadowed password files may as well store passwords in cleartext; if those passwords are not reused anywhere, the situation is safe. Anyone who can see the cleartext is already root. If those cleartext passwords don't work on any other system, they are worthless to the attacker.

Thus password strength --- all the fussing with how we properly store passwords with a decently strong hashing function and salting --- is just a fallback strategy to protect password re-users.


> Shadowed password files may as well store passwords in cleartext; if those passwords are not reused anywhere, the situation is safe

Wait, what?

If they were randomly generated and of sufficient length, yes.

If they weren't randomly generated, even if not exactly reused, they are very likely to reveal the psychology of that user's password selection habits. This is of definite value to a focused attacker. Not only could it inform guessing passwords on other systems, it could also inform guessing that user's _next_ password on _this_ system.

> They don't need to crack your password to gain any more access to that same site.

Just because they have the hashes doesn't mean that they have other access. Hash lists are bought, sold, traded, and stolen all the time. Someone who possesses that particular hash may be multiple hops away from the group that originally acquired them.

Also, just because the database layer that the passwords are stored in is owned, does not mean that a particular target level of access has been acquired. Password storage can be abstracted into an entirely standalone subsystem, for which knowing, say, an admin of that system's password would be quite valuable.


It means that suppose the attacker can look in /etc/shadow (due to having root privs) and sees, in plain text, that the password of user "bob" is "correct-horse" (not anything fancy like "correct-battery-horse-staple"). But Bob doesn't use that password anywhere else. So what good is that piece of information to the attacker? On this system, attacker can just "su bob". On systems where attacker is not root, "correct-horse" doesn't get into bob's account.


> If they were randomly generated and of sufficient length, yes.

What does that buy you, if they are in plain text?

(Well, randomness quasi-guarantees that they are not re-used; I covered that.)

If we have passwords in plain text, issues about length related to cracking hashes are moot; the cracking that still matters is someone guessing at the login prompt, where we can lock out accounts after N attempts.


> What does that buy you, if they are in plain text?

Nothing. That's why I was agreeing with you for that subset.

But N may be smaller than you might think, when frequency data is also supplied by the API.

https://gist.github.com/roycewilliams/60b77640a962125b04ae67...


What about the other case - when they're not random, but also not reused ... such that the psychology of the user's password-selection methodology might be exposed?


If you have a password selection methodology that you do not change when hashed passwords are compromised, then it doesn't help you. The methodology will be uncovered once the password is cracked, even if that specific password doesn't itself work anywhere anymore. It's somewhat better if the methodology is discovered later than earlier, I suppose.


> Hash lists are bought, sold, traded, ...

All only possible after the horse has escaped the barn.

> Someone who possesses that particular hash may be multiple hops away from the group that originally acquired them.

But if the hash is for a password that was only used on the original compromised system, it is useless, even if the password is recovered.


Just because the horse is out of the barn doesn't mean that the owner of the barn knows about it yet.


Right! So (from the perspective of the password alone) the owner doesn't have to care if that password is used only on that site where the horse has left the barn.

If the password is used on other sites, then of course all that protects them is its strength relative to the compute resources thrown at it, relative to the time between the breach and discovery.

(From other perspectives, the user does care: like their credit card number was stolen and is being misused.)


This list is small for the purposes of password-cracking. Enumerating 500 million things is something a computer can do very quickly.

Consider this: if you store the hash of one of these passwords in your login database, you have stored something that can quickly be turned back into the plaintext password, just by enumerating the list.

Passwords that have been leaked can't become good passwords again.


> Passwords that have been leaked can't become good passwords again.

This doesn't make any sense. We already know all possible passwords: the set of all the permutations of the set of legal password characters of a given length. Your argument applies equally well to these. You can't just use these passwords and check the hash, since the database (hopefully) at least salted their hash to prevent rainbow attacks like that. But even if they didn't salt their passwords, nothing prevents you from hashing all possible passwords and checking. I think I read somewhere that making a rainbow table for all possible 8 character passwords took little time and space; thus all 8 character passwords are already broken. Need to crack a password? Generate the rainbow table and just look it up.

However, your assumption that computers can check passwords quickly is not necessarily correct: that's why bcrypt exists.


No. Let me illustrate with a different security situation where we use blacklisting.

Many years ago Debian mistakenly shipped a version of openssl that didn't use good entropy to pick RSA keys. As a result, everybody with that Debian would get one from a relatively small pool of keys when they asked for a new one. There's nothing special about these RSA keys, other than the fact that Debian systems from a particular era would always pick them.

A good CA (e.g. Let's Encrypt) blacklists the public halves of those key pairs. Again, there's nothing special about them, no reason they're worse than any other random key _except_ Debian always picked those, and since it did bad guys can trivially find out the corresponding _private_ key for each value and so they're useless.

If you propose to use one of these blacklisted public keys, there is a near certainty that it's because you have a broken Debian system making the keys, and so refusing you keeps you safe. Even though there's nothing special about these keys.

Now, if I have a system that generates RSA keys in a known secure way, I needn't check for Debian weak keys myself. Why not? Because there is statistically no chance I'd ever pick one at random, it's a total waste of engineering effort to check. But if I ask somebody _else_ to make a key pair and send me the public half, I should check against the Debian weak keys, because I shouldn't trust that they're smart enough not to use the broken code.

These passwords are crap. They wouldn't necessarily be crap if nobody had ever known what they were, but now they do, so they're crap now. Pick a different password.


Okay, I may be overstating just how quick it is -- you might have to spend a few days of CPU time to go through the whole list, based on estimates I'm seeing. (Parallelize it however you want.)

But don't act like I don't know there are a finite number of passwords. The number of possible passwords grows exponentially with length, and the number of leaked passwords grows linearly with leaks.

It's about the number of possible passwords you have to check. There's a huge difference in magnitude between having to hash "all 10-character passwords" and "all 10-character passwords that are definitely someone's actual password".


Dropbox's zxcvbn password-strength estimator already incorporates a list of 250k+ common passwords and words. No need to make a separate blacklist check unless you want to check more than what zxcvbn already does. (and you could enforce it more strongly on the server instead of locally in JS like zxcvbn)

For reference, these are the passwords and common words zxcvbn already checks against: https://github.com/dropbox/zxcvbn/tree/master/data

And with zxcvbn it'll still flag a password as low entropy/low security if the password it's checking is just a simple modification of something on the common password list.


Then add NIST to the list of people you should be reaching out to (report linked from the homepage of pwnedpasswords):

https://www.nist.gov/itl/tig/projects/special-publication-80...


Indeed. One of the authors of 800-63B is actively involved in the password-research community, and is already aware that the guidance places no restrictions on blacklist size.


Keep in mind a lot of these passwords are associated with email addresses in the actual dumps. By allowing a user to use one of these passwords, there's a non-negligible chance you're knowingly allowing them to use a username/password combo that is publicly available, and that any hacker who wanted to compromise their account could do so _on their first try_.

Besides, when it comes to passwords... half a billion really isn't all that much. There are (26*2+10)^8 = ~218 _trillion_ possible 8-character alphanumeric passwords. So even if every single one of these half a billion passwords were 8-characters long, you'd still only be disallowing ~0.0002% of them.


Some large services do use the actual dumps, and correlate them with the email address associated with the current user, in order to give users a personalized warning that they're reusing a leaked password that's already associated with that specific email address. This is a much different proposition from forbidding that specific user from using a half a billion passwords.

A full 80% of the v1 corpus can be avoided by simply requiring a minimum password length of 12. As Troy has pointed out elsewhere, this wouldn't be great UX, either. While it would dramatically increase the chances that they came up with a word that would be A) not in the existing blacklist, and B) harder to attack offline ... it would still be significantly bad UX compared to the best-practice alternative that I lay out in a separate thread branch.

But it would still be much better UX than use of the full blacklist.


All depends on the threat model.

Reusing username/email/password can cost your users hundreds per day on a gambling site. And users don't care about a password-gen guide. For example, in that case, you'd want to consider just generating passwords for them.

But of course this would be silly for the run of the mill website.


That’s funny because coral.co.uk posts its login over HTTP. In fact if you try to log in via https it redirects you to http. It would be fun to set up a hotspot called _The_Cloud outside a Coral and see what you find on the wire!


Rather than "I am right and Troy is wrong", this seems to be a case of "reasonable people may disagree".


> The UX of a blacklist with a half billion entries would be so crippling that it would cause a user revolt.

The situation doesn't need to get as bad as you think. If you suggest XKCD's four random common word method[1] to your users as part of the user interface, you'll be fine. As a test, I tried putting 2 random, common, unrelated words together:

- yak elephant -> yakelephant

- crowd brown -> crowdbrown

- plastic envy -> plasticenvy

- colon spanish -> colonspanish

- jogging adhesive -> joggingadhesive

Even with two words, none of the above appear in the Pwned Passwords database of a half a billion passwords. It's not difficult to choose memorable passwords and avoid entries from Pwned Passwords if you suggest this method to your users (preferably recommending four words, but three might be OK depending on the threat model).

[1] https://xkcd.com/936/
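If you'd rather generate the suggestion for the user than explain the method, a minimal Python sketch follows. It assumes you supply your own list of common words; `random_passphrase` is a made-up name. The important part is that the words are picked by a CSPRNG (`secrets`), not by the user.

```python
import secrets


def random_passphrase(words: list[str], count: int = 4, sep: str = " ") -> str:
    """Join `count` words drawn uniformly at random from `words`."""
    if len(words) < 2:
        raise ValueError("word list is too small to be useful")
    return sep.join(secrets.choice(words) for _ in range(count))
```

The strength comes from the size of the word list and the number of words, so a short demo list like the ten words above is only good for trying the function out.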


https://mostsecure.pw/

No pwnage found. Still confirmed for most secure password.


...this is a joke, right?


How do I submit a pull request on Github asking that "hunter2" be removed from his list?


For those who missed the hunter2 reference:

http://bash.org/?244321


I think the other half of his joke is this: https://github.com/danielmiessler/SecLists/pull/155


It was used just 16,092 times?

I thought the number would be much bigger.


I got 4,882 times. You're searching * * * * * * * right?


Adding a space after it reduces the number of times to 0. Quick fix!


Can someone please just provide the exact shell commands to generate a compatible sha-1 of a password to grep against the database?

The article seems to ramble forever about how to perform online checks without discussing the basic offline secure option.


  echo -n 'P@ssw0rd' | shasum -a 1 -
gives me the same value (21bd12dc183f740ee76f27b78eb39c8ad972a757) as appears in the article.


echo -n "password" | openssl sha1 | tr '[:lower:]' '[:upper:]'


You may also want to consider running `unset HISTFILE` before that to ensure that the line containing your password doesn't end up sitting around in your bash history.


Another way is to prefix the command with whitespace.


Only if..

    export HISTCONTROL=ignorespace
..is set (either by default or explicitly)


  python3 -c 'import getpass, hashlib; print(hashlib.sha1(getpass.getpass().encode("utf-8")).hexdigest())'
Avoids history, doesn't echo to the terminal.

In fact, you should be able to just make a rudimentary CLI into Troy's API simply with:

  #!/bin/bash
  HASH="$(python3 -c 'import getpass, hashlib; print(hashlib.sha1(getpass.getpass().encode("utf-8")).hexdigest().upper())')"
  curl -sS "https://api.pwnedpasswords.com/range/${HASH:0:5}" | grep "${HASH:5}"
(It'll emit the line from the API response matching your pass; if it does, then that password was compromised. Bash isn't real good at error handling though, so my biggest concern would be what this might do if an HTTP/TCP error happened. I've attempted to throw -S there to catch that, but use with your head screwed on.)
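If the error handling worries you, the same lookup can be sketched in plain Python, where a failed request can be reported as "unknown" instead of being mistaken for "not found". This is a sketch, not a vetted client: it assumes the response body is one `SUFFIX:COUNT` pair per line, the `User-Agent` value is arbitrary, and the exit codes are my own choice.

```python
import getpass
import hashlib
import sys
import urllib.request

RANGE_URL = "https://api.pwnedpasswords.com/range/"  # same endpoint as the script above


def fetch_range(prefix: str) -> str:
    """Fetch the response body for a 5-character hash prefix."""
    request = urllib.request.Request(
        RANGE_URL + prefix, headers={"User-Agent": "pwned-check-sketch"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


def pwned_count(password: str, fetch=fetch_range) -> int:
    """Return how often the password appears, or 0 if it is not listed."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    prefix, suffix = digest[:5], digest[5:]
    for line in fetch(prefix).splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


def main() -> int:
    """Exit status: 0 = not found, 1 = found, 2 = lookup failed."""
    try:
        count = pwned_count(getpass.getpass())
    except (OSError, ValueError) as exc:  # urllib's URLError is an OSError
        print(f"lookup failed, result unknown: {exc}", file=sys.stderr)
        return 2
    print(f"seen {count} times" if count else "not found")
    return 1 if count else 0
```

Run it with `sys.exit(main())`. Only the first five hex characters of the hash leave the machine, and the `fetch` parameter exists so the parsing can be exercised without touching the network.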


On my machine this produces

  (STDIN)= 5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8
Which is obviously not what you want. So I changed it to:

  echo -n "$password" | openssl sha1 -binary | xxd -p -u


This worked. Wow an old, old password that is fairly unique was seen 36 times.


echo -n "password"

echo adds a new line that is likely not in your password.


No it doesn't:

       -n     do not output the trailing newline


He's saying echo without an argument emits a newline, so use -n to suppress it.


Oops. Thanks.


You could skip the "tr" part, as the API to query by hash prefix is not case sensitive, and once you have its results, you can use "grep -i" with the hash.


echo -n "password" | sha1sum

then to remove the sha1sum's trailing spaces and dash: | sed 's/\s.*$//'

and then to uppercase it, as ianlevesque showed: | tr '[:lower:]' '[:upper:]'
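For the fully offline case, the hash-then-grep steps can also be done in a few lines of Python, which sidesteps the echo/newline and output-format differences between tools. A sketch, assuming the downloaded file contains one uppercase `HASH:COUNT` entry per line; the function names are made up.

```python
import hashlib
from typing import Iterable


def sha1_upper(password: str) -> str:
    """Uppercase hex SHA-1 of the UTF-8 password, with no trailing newline."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


def offline_count(password: str, lines: Iterable[str]) -> int:
    """Scan 'HASH:COUNT' lines and return the count for this password, or 0."""
    target = sha1_upper(password) + ":"
    for line in lines:
        if line.upper().startswith(target):
            return int(line[len(target):])
    return 0
```

Passing an open file object as `lines` streams the file instead of loading it, at the cost of a linear scan. Combine it with `getpass.getpass()` to keep the password out of your shell history.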


http://onlinemd5.com/

Just uses JavaScript in the browser


I wouldn't recommend using this. Even if you were to read the page source and see that it is in fact not sending data back to a third party, either a) the site owner could change this in the future, or b) you could be man-in-the-middle'd since they're using http, or c) one of the third party scripts they run on that page could either accidentally or intentionally take your password. Don't put your password in there.


Split brain your password storage.

Another table, another database or another storage system in general.

If an attacker SQL-injects your database, don’t let it go spilling every hashed or unhashed password you’ve got.

I tend to store passwords in a separate keyvalue store from where my authentication identifier is (email, “username”).

If someone gets into my network they need to get into my servers with the email addresses and then get into a secondary system where the passwords are stored.

I like my password systems to be a k/v store because there is no need to “query” it. I usually store the password under a key that isn’t the identifier, using something like a database surrogate key instead.

Have a secondary system (microservice, private subnet) that simply returns a boolean representing if the provided non-email (key) and password (value) match.

Have that secondary system take the plain text password so it can do the hashing without letting the dependent service know what algorithm, salt or stretches you’re doing. This will also allow you to easily roll over to new hashing algorithms over time without affecting the service that is doing the authenticating.
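A toy Python sketch of that secondary system, to make the shape concrete. Everything here is an assumption for illustration: an in-memory dict stands in for the real k/v store, scrypt with these parameters stands in for whatever hashing you choose, and `CredentialStore` is a made-up name. The caller only ever sees a boolean.

```python
import hashlib
import hmac
import os


class CredentialStore:
    """Toy in-memory stand-in for a separate key/value password service."""

    # Illustrative scrypt parameters (assumption). They are stored with each
    # record so they can be changed later without touching callers.
    CURRENT_PARAMS = {"n": 2**14, "r": 8, "p": 1}

    def __init__(self) -> None:
        self._records: dict[str, dict] = {}

    @staticmethod
    def _derive(password: str, salt: bytes, params: dict) -> bytes:
        return hashlib.scrypt(password.encode("utf-8"), salt=salt, dklen=32, **params)

    def set_password(self, surrogate_key: str, password: str) -> None:
        salt = os.urandom(16)
        params = dict(self.CURRENT_PARAMS)
        self._records[surrogate_key] = {
            "salt": salt,
            "params": params,
            "hash": self._derive(password, salt, params),
        }

    def verify(self, surrogate_key: str, password: str) -> bool:
        """Return only True/False; re-hash on success if params are outdated."""
        record = self._records.get(surrogate_key)
        if record is None:
            return False
        candidate = self._derive(password, record["salt"], record["params"])
        if not hmac.compare_digest(candidate, record["hash"]):
            return False
        if record["params"] != self.CURRENT_PARAMS:
            self.set_password(surrogate_key, password)
        return True
```

Because the parameters travel with each record, the rollover happens quietly the next time a user logs in successfully, and the authenticating service never learns which algorithm is in use.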

Edit: I’m not trying to be a know it all or a crabby old tinfoil hat a-hole. But it’s passwords, man. When you leak them you ruin people’s days/year/life. Building that system above takes a middle of the road engineer a day or two. Put the effort in. Every password leak makes all of our jobs harder. It’s your company’s responsibility to keep that safe. If you know that already, be the annoying guy that brings it up in every stand up. Make that debt known.


> don’t go spilling every hashed or unhashed password you’ve got.

Please don't store unhashed passwords. By now even PHP gives us the tools to do this right. You truly don't need secondary systems, or separate tables, or separate anything. Just hash the passwords.



I would argue that the benefit of putting all the hashes into a separate table is not really worth it. A separate service just to verify passwords sounds an awful lot like reinventing LDAP or AD/Kerberos with fewer features.

It should be good enough to simply encrypt the password hashes with an application-side key, a simple database dump won't leak passwords anymore.

If your passwords are properly hashed and stretched with appropriate and modern functions (SHA2/3 and Argon2) and encrypted-at-rest (Chacha20 or AES256) then you should be sufficiently equipped to secure your customers' passwords against most attacks.


The point is not putting them in a system that can leak them right alongside the login identifier.

If someone wants to go exercise 9384828388 GPUs on your list, fantastic. At least the other piece of their auth (username, etc.) isn’t sitting right next to it.

Putting them in a separate system can be a simple REST server that sits in front of another database, LDAP or even something like Vault.

Don’t set them right alongside your ecommerce app you’ve got 30 juniors hacking on trying to get something out ASAP.


Why are your juniors hacking in production? They get a testing environment, if lucky they can play with staging but production should not be hacked upon.

I still don't see the benefit of a separate system; if you really want one, LDAP already does all this, as does AD. Why reinvent the wheel and build a REST service for it?

I also don't know why it's harmful if a hash sits next to the username, if the database has been breached, password hashes are probably the least valuable information in such a leak. Mail addresses are more valuable.


They aren’t “hacking on production.” They’re occasionally under pressure to release features. If nobody catches a SQL injection or something else silly in a PR, it’s nice to know we can’t possibly bleed our password hashes.

Do you jam all your tables into one public/default schema? Or do you break them up by domain across schemas?

A very simple implementation of this in Postgres is an auth schema. Only one role can read from the table with the credentials, no app has access to that role. You write a function to verify a credential and use a security definer to give a role that can use that function access to read a single result.

In this case you’ve mitigated two “attacks”:

- a sql injection selecting all from users

- some random engineer taking a schema dump for whatever purpose, accidentally grabbing auth details and then having that get leaked somehow down the line.

It literally takes a few lines of SQL to set that up. It’s not an extra server to manage. Just a different schema with higher security constraints.

Your app connects with a role that is only allowed to call that compare function. Instead of doing a SELECT, you call the function with the credentials and just accept Postgres telling you yes or no on whether the credentials matched.

PG has a suite of hashing functions. Take your pick. That function can do your salting or you could add it as a parameter to the call.
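
A rough Python sketch of the shape of that idea, not the Postgres setup itself. The `CredentialStore` class is hypothetical and stands in for the auth schema plus its verify function: the caller can only ask yes or no and never gets a hash back. PBKDF2 from the standard library stands in for whichever hashing function you pick, and the iteration count is an assumption. Python attributes are not a real security boundary; in the Postgres version the enforcement comes from roles.

```python
import hashlib
import hmac
import os


class CredentialStore:
    """Hypothetical stand-in for a locked-down auth schema.

    Callers get a yes/no answer from verify(); there is no method
    that returns a stored hash or salt.
    """

    _ITERATIONS = 100_000  # assumed work factor, tune for your hardware

    def __init__(self):
        self.__records = {}  # username -> (salt, derived key)

    def _derive(self, password, salt):
        return hashlib.pbkdf2_hmac(
            "sha256", password.encode("utf-8"), salt, self._ITERATIONS
        )

    def set_password(self, username, password):
        salt = os.urandom(16)  # fresh random salt per user
        self.__records[username] = (salt, self._derive(password, salt))

    def verify(self, username, password):
        record = self.__records.get(username)
        if record is None:
            return False
        salt, expected = record
        # Constant-time comparison of the stored and recomputed keys.
        return hmac.compare_digest(expected, self._derive(password, salt))
```

The app code would hold a reference to something like this (or a DB role that can only call the function) and never run its own SELECT against the credentials table.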


SQL Injection is to my knowledge not a problem if you use literally any query building method but string concatenation.

Any sane SQL driver offers queries with parameters and at work and in private I enforce the usage of parameters in queries entirely. If I find myself needing string concat anyway then I will do it very carefully and isolated from user input.

That mitigates SQL injection.
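
To make the difference concrete, here is a small sketch using Python's built-in `sqlite3` module and an in-memory database. The table and the malicious input are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "alice@example.com"), ("bob", "bob@example.com")],
)

malicious = "nobody' OR '1'='1"

# String concatenation: the input becomes part of the SQL text,
# so the OR clause matches every row.
unsafe_sql = "SELECT email FROM users WHERE name = '" + malicious + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Parameterized query: the input is passed as a value and is never
# parsed as SQL, so it matches no user.
safe_rows = conn.execute(
    "SELECT email FROM users WHERE name = ?", (malicious,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The concatenated query returns every row in the table, while the parameterized one returns none.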

I also see no benefit in breaking it up into multiple schemas or pushing authentication entirely into SQL. At best, that means you complicated the database definition unnecessarily; at worst, it means your database gets raw password strings, which is contrary to what I learned about password security, namely that the raw password does not touch the DB or the DB driver or any DB-related code. Ever.

I also don't see how engineers dumping the DB and then accidentally leaking it is a tangible risk. I'd put that on the bottom of my list of things I have to worry about when I verify passwords.

Not everyone works with PG either or even MySQL or there is no possibility of setting up functions or triggers in the database.

>If they don’t get someone that catches a SQL injection or something else silly in a PR it’s nice to know we cant possibly bleed our password hashes.

Which is totally irrelevant if you properly secure the application, do proper code reviews and don't put juniors under pressure to push out code at all costs.

Pushing crappy code because of a deadline is asking for trouble. End of story.


The fantasy world you work in must be great. I’m jealous.

I don’t know how many times I’ve seen someone jam a gigantic pile of strings into a query builder’s select method or do some whack string interpolation trying to make a clever search form instead of using a tool like ransack or the equivalent.

Like I said, that was a for instance using Postgres, you can do it plenty of ways. Hashi Vault has been an awesome tool for this recently.

Getting the password near your database isn’t the problem. It’s persisting it unhashed that is. I’d conjecture a fair number of sites terminate SSL at the edge of their network (nginx, ALB) and they have a password floating through the stack in plain text until it hits a compare method. What does it matter if that method is in your app code versus pg or vault or any tool you’d use for credential storage? Are you sure your app is scrubbing all the json keys or http field names that could contain a password to make sure it’s not going into your rails/node/whatever logs or into your nginx access logs?

Having multiple schemas isn’t complication, it’s organization.

As far as people doing dumb shit with files they’ve dumped or downloaded. It happens.

When you’re designing a system for storing credentials you don’t know who will work on this system or what practices will be in place around code quality, reviews and deadlines when you’re gone. You can’t literally watch every PR or guarantee that every code review has someone as smart as Zaarn presiding over it.

You should be designing something people can’t shoot themselves in the foot with.

An example of a totally reasonable mistake to make, which I saw recently in a PR at a client I work with:

They had designed a pretty solid API using the JSONAPI spec. JSONAPI if you aren’t familiar allows an API client to request what fields they need from the endpoint instead of just getting all the fields the API returns. This is nice for mobile clients.

They had the forethought to exclude allowing people to ask for “password” through the /users API. Nice. Nailed it.

A year passes.

They add a feature for changing your password that required you to confirm by email that the password change was you, and then it swapped the password from the “new_password” column to the “password” column. While this PR was in staging you could do /users?include=new_password and it’d return their new password (hashed, but still).

Password resetting was a feature they were adding to their registration and auth code.

The people that worked on that front end didn’t necessarily work on the user API and weren’t familiar with how it worked. But by adding this flow and this field they created a vulnerability in a tangentially related system.

Now I’m sure you could crap on JSONAPI but it serves a purpose. If they hadn’t been using that someone just as lazily could have designed the regular JSON endpoint to return all fields but password and they’d have ended up in the same boat.
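
A small Python sketch of why that happens. Both serializers and the field names are hypothetical (modelled on the story above, not on any particular JSONAPI library): one excludes known-bad fields, the other only returns fields that were explicitly allowed.

```python
def serialize_blocklist(user, requested, blocked=frozenset({"password"})):
    """Return any requested field unless it is explicitly blocked."""
    return {f: user[f] for f in requested if f in user and f not in blocked}


def serialize_allowlist(user, requested, allowed=frozenset({"id", "name", "email"})):
    """Return a requested field only if it was explicitly allowed."""
    return {f: user[f] for f in requested if f in user and f in allowed}


user = {
    "id": 7,
    "name": "alice",
    "email": "alice@example.com",
    "password": "<hash>",
    "new_password": "<hash of pending password>",  # column added a year later
}

requested = ["name", "new_password"]
leaky = serialize_blocklist(user, requested)  # includes new_password
safe = serialize_allowlist(user, requested)   # only name survives
```

With the blocklist, every new sensitive column is exposed until someone remembers to block it. With the allowlist, a new column stays hidden until someone deliberately exposes it.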


The takeaway from that (which has been known for a decade at least) is that your passwords need to be hashed and salted, and you need to use a hashing algorithm designed for security (meaning it is slow). Not that you should complicate your system architecture and create more potential points of attack.


Absolutely, I’m not stating otherwise. I’m saying you shouldn’t have your authentication inside your business logic. It should be a separate service. Limited access, and transparent to the application that is calling on it.

If you want that to be a REST service, great.

If you want it to be another table/schema/database/server, great.

If you want it to be something else, great.

Just stop putting it right inside the application that is taking user input.

Don't roll your own hashing, use something solid, like Argon2, salt your stuff. Store your salt someplace secure, not in your frigging rails config.yml or inside env on your webservers or anything else that is publicly addressable.


> I’m saying you shouldn’t have your authentication inside your business logic. It should be a separate service.

And I am saying this is a wrong, counterproductive idea. It complicates your authentication for no substantial gain, and will only result in additional vulnerabilities.

> Store your salt someplace secure, not in your frigging rails config.yml or inside env on your webservers or anything else that is publicly addressable.

So you don't even understand how a salt works. Why should we take security advice from you again?


Wow you’re aggressive.

Using something like Hashi Vault, Gluu, or Shiro does not overly complicate things. They’re stable, trusted solutions.

It can also greatly simplify things. In the common scenario of having a web app for customers and a web app for admins, instead of them each having their own authentication baked in, you can choose an open source solution and deploy it twice, once for each service.

I know how salt works. Don't belittle me.

If someone gets into your web servers they’ve pretty much got carte blanche at your database the web server has access to. They also now have your code, which in a lot of cases is probably plainly readable. So they can see your salt, see how you salt and see the hashing algorithm you've chosen.

I’m saying don’t store the salt in your apps config right along with the credentials to access all of your login identifiers and hashed passwords.

You gave them a piece of the puzzle. If someone wants to grind through and crack all the salted/hashed passwords, I'd prefer them not to have the salt to help in that endeavor.


He's talking about how each password should use a separate salt. This is normally stored next to the password hash. Many hashing algo implementations will even do this for you.
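
For anyone following along, a minimal sketch of what "a separate salt per password, stored next to the hash" looks like. This uses PBKDF2-HMAC-SHA256 from Python's standard library as a stand-in (the thread recommends Argon2, which needs a third-party package); the `iterations$salt$hash` layout and the iteration count are assumptions made for this example.

```python
import base64
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor


def hash_password(password):
    """Return 'iterations$salt$hash' with a fresh random salt per password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return "$".join(
        [
            str(ITERATIONS),
            base64.b64encode(salt).decode("ascii"),
            base64.b64encode(digest).decode("ascii"),
        ]
    )


def check_password(password, stored):
    """Recompute the hash using the salt embedded in the stored string."""
    iterations, salt_b64, digest_b64 = stored.split("$")
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(digest_b64)
    actual = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, int(iterations)
    )
    return hmac.compare_digest(actual, expected)
```

Two users with the same password end up with different stored strings, so an attacker has to attack each hash separately. The salt is not a secret; it only has to be unique.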


Cool. So if it’s in the same table, it’s just as secure as one salt sitting in a config file. When the table gets leaked, so does the salt.

Also, is he? More than one salt sounds too complicated for him.


All security decisions boil down to your threat model, don't they? At some point there are decreasing marginal returns with increasing security. What threat models are you satisfying with the strategies you're proposing? Do you think these apply to everyone? Or even everyone who stores hashed passwords? I personally don't think so, but I'm interested in hearing about things I haven't come across yet.


A day of engineering effort to make exposing passwords much more difficult. Yes. If you’re a company that stores someone’s password. You know they probably use it elsewhere so it’s your responsibility to help keep it safe.

Also I left the link off above ^


Sorry, but this is convoluted nonsense that can only achieve one thing: make yourself more vulnerable.

You want your security system to be as simple as possible, and to involve as little custom code as possible. Because you can and will fuck it up if you try to be clever.

Hash and salt your passwords using a library designed exactly for that purpose (which means it will use a slow hash). That's it, end of story.


^ above is pretty damn simple.

I also never said “write your own hashing algorithm” I said abstract it so it’s not sitting around in your ecommerce app code.

That is a simple security system. It’s just not baked into your flagship ecommerce, blog or whatever else you’re storing the credentials to protect.


Agree.


Rule n1: don't roll your own security. Rule n2: goto 1

You are overcomplicating your authentication system by oversimplifying security problems and the result is that you have solved nothing.

Security always seems very easy to solve, and usually non-security engineers tend towards solutions like yours that don't provide extra security; they just add a few extra steps for a hacker to obtain your database, and as a result you need to maintain extra databases, there are more error points... Do you remember that thing about "each extra system exponentiates complexity"?


You don’t have to “roll your own security.”

You can easily put any open source security system behind a secondary system. Hell - it would already be a secondary system.

Not putting your passwords right next to the identifiers is a simple way to lower the impact of an email or password leak.

Also, that quote is bullshit.


Meh... I won't bother. Discuss your solution with a security guy you trust.


Go a step further and you will get into key management devices like HSMs and/or Amazon KMS. KMS costs next to nothing and it is pretty neat since it's a web service, especially coming from the world of $40k+ Thales/Safenet HSM devices, which are a pain to deal with (backups, rehash, redundancy).


Mind sharing how you currently use Amazon KMS in practice?


It was to meet the PCI-DSS Level-1 security standards for banking compliance. We'd store encrypted cards in one place and store the master keys in AWS KMS to later decrypt them. But to retrieve the master keys, it goes through another layer of encryption.


[Pasting an old comment of mine on password managers, since I see people talking about starting to use Keepass. I hope this helps someone]

----

If you're just starting, here's some guidance on setting up a password manager.

First of all: Don't be afraid of using one. It's not just more secure, it's super convenient. Never again will you ask yourself: Did I make an account for this website/service? What email did I use? Never again will you have to remember a password. Using a password manager is a quality of life improvement.

KeepassXC is what I recommend to people at this point. It's free and you own your data (your passwords). They live wherever you want them to live. There are plenty of online services that are supposedly more convenient but I have to say I trust them less -- YMMV (1Password is the best I'm aware of).

https://keepassxc.org

If you do use keepassxc, you get the added benefit of being able to store 2FA settings in it as well (if you store them in the same database as your passwords, be aware that you lose the security benefit of a second factor, however it is still more secure than not having 2FA enabled due to the One-time password component).

Put every account you ever made and ever make into keepass. Enable 2fa wherever you don't have it enabled. Add login URLs and notes. Generate your passwords from keepass itself; the password generator is really powerful and lets you very easily deal with site-specific shitty password limitations. I'm telling you this because, seriously, it's incredibly convenient to have this stuff as long as you're rigorous about maintaining it.

Oh, also, keepass has the full history of all your passwords. Need to look up an old password? Go into details and look at "History". You can also attach files to items (items don't have to be accounts at all, you can use keepassxc as a simple encrypted storage db).

Mobile support: Keepass2Android. Best android client, with google drive support. iOS I have no idea, suggestions welcome.

IMPORTANT: BE STUPIDLY PARANOID AND RIGOROUSLY CAREFUL ABOUT YOUR MASTER PASSWORD. That thing, together with your keepass database, unlocks all your accounts ever. Use a really long passphrase that you will never have to write down (if you do decide to write it down because you don't trust yourself, store it in a safety deposit box, don't put it in a bloody drawer). Make sure the device you unlock the database on is malware-free.

PS: Wondering what's up with Keepass vs. KeepassX vs. KeepassXC? Keepass is the original app, written in .NET but with poor multi-platform support. KeepassX is a rewrite in Qt and is a fantastic password manager, but has gone unmaintained recently. The open source community picked up the slack in the KeepassXC fork (after countless attempts to upstream the patches) and has implemented lots of powerful features. I've switched to it and at this point I strongly believe it's the better client.


I've just switched from 1password to keepassxc in the past few weeks. The only reason I did so was because 1password was trying to force me into their subscription service as I switched from macOS to linux mint. I looked at a few work-arounds on github, but eventually just decided to move over to keepassXC.

The export / import and overall setup was pretty painless. I am still able to sync through dropbox just like with 1password. There is also an ios app called keepasstouch which was a breeze to get going and syncs with my dropbox password vault. Finally, the browser extension works in a very similar way to 1password's.

Overall, I definitely recommend it. I haven't lost any functionality or security to my knowledge. I was happy to pay up front for 1password (probably dropped 60-80$ or so for their apps), but, after doing so, just couldn't stomach being forced into their SaaS model. Especially when a similar free, open-source alternative exists.


Just out of curiosity, aren’t you worried that keepasstouch app on the iOS may be compromised?


I have to thank you for bringing this to my attention!

Although I have not seen any reports of keepasstouch being compromised, it is true that it is not open source ... I had meant to download minikeepass, which is. I've remedied the problem. 2FA keeps me pretty safe on most of my important accounts, but I have still changed pw's on many accounts thanks to your comment. Much appreciated!


> you get the added benefit of being able to store 2FA settings

Don't do this. If you use a password manager with all the benefits this entails (long, random passwords, each only used for a single site), the only benefit 2FA really gives you is if your password manager is compromised somehow. If your second factor is in your password manager, you're screwed.

I use Authy with a long, secure password printed on a piece of paper. Yes, it is cloud and third party and everything, but it's on a completely orthogonal chain from my Keepass DB, so dual compromises are significantly more difficult.


> Don't do this.

On the other hand, do do this, but be aware of the tradeoffs.

I hate telling people not to do something. Most people just end up not turning 2FA on at all. My approach has converted many people from "one password reused everywhere, at best with variations" to KeepassXC unique passwords everywhere + 2FA and I classify that as a big win.

The biggest benefit of TOTP 2FA isn't the "second factor" part, it's the OTP part. This removes many forms of phishing, keylogging and database leaks as a threat to your account. You do not lose these benefits when you have it all in one factor.

If you read my comment, you'll see I address this concern. If this is a real threat for you, then you can always simply use a separate Keepass database for your OTP settings.
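
For context on what the "OTP part" actually is, here is a minimal sketch of TOTP as specified in RFC 6238 (HMAC-SHA1 variant), using only the Python standard library. The seed below is the RFC's published test secret; a real seed is whatever the site hands you at enrollment. This is an illustration, not a replacement for your authenticator app.

```python
import hashlib
import hmac
import struct
import time


def totp(secret, at=None, step=30, digits=6):
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) for the given Unix time."""
    if at is None:
        at = time.time()
    counter = int(at) // step  # number of 30-second windows since the epoch
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation, as in RFC 4226
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


# RFC 6238 test secret; a real seed comes from the site when you enroll.
seed = b"12345678901234567890"
print(totp(seed, at=59, digits=8))  # RFC 6238 test vector: 94287082
```

The code depends on the seed and the current 30-second window, so a code captured by a phishing page or keylogger stops being useful shortly afterwards. Whoever holds the seed can generate codes, which is the whole argument about where to store it.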


I grant that it protects against phishing, but I would cautiously suggest that sites that are smart enough to enable 2FA are smart enough to salt/hash/bcrypt/whatever best practice their passwords, so leaks are neutered. It doesn't not protect, so to speak, but the protection is likely to be redundant.

But it emphatically does not protect against keylogging, anyone who can install a keylogger on your computer can grab your password DB and your master password. This is exactly the scenario where you need actual 2FA.

Anyway, broader point: yes, it's a tradeoff, but the kind of people who need an explanation of why a password manager is a good idea do not understand enough to make an informed decision about these tradeoffs. And so, the responsible advice is to not use it.

I do know enough to understand these tradeoffs, and my conclusion is to keep password management and 2FA strictly separate.


> But it emphatically does not protect against keylogging

1. A keylogger on your password db is useless if it does not also upload the db (at which point you're looking at a targeted attack, and you have far bigger problems than that).

2. Keyloggers are more and more often browser-based. KeepassXC is immune to those.

3. KeepassXC supports 2FA for the database encryption itself. If you're that paranoid, use that. There's always more you can do.

> And so, the responsible advice is to not use it.

No.

Just as you see in the article where Troy has to make the difficult decision not to include a "Do not put your password anywhere not even here" disclaimer, the same holds in my message: I weigh the pros of someone turning 2FA on as far more important than the cons that come with the less-than-ideal security 2FA adds.

Your advice keeps people from turning 2FA on. 2FA is a pain in the ass for most people.

You are one of the lucky few who understands the tradeoffs involved, as you yourself said. So use that knowledge of yours to actually get people to secure their accounts.

My goal isn't to keep Edward Snowden's accounts secure. It's to keep the bored HN user's account secure. The average HN user has medium-to-high technical literacy and low-to-medium security literacy. A lot of people on here reuse passwords, I'm sure. This is what I'm trying to fix, and I won't advise Ed to keep his TOTP seeds in the same database.


I'm not sure that 2FA is going to give the average person as much protection as you assume. You have to keep your 2FA key somewhere. So instead of needing your master password and password DB, you need the master password, password DB and 2FA key. But the question is how hard is it really to get that key? Certainly harder than having it in the password DB, but in practice, not really any harder than getting the password DB in the first place.

There are lots of options: using memorable passwords/passphrases, using random passwords, using cloud based password managers, using password managers on your own devices, putting the 2FA on the same device, putting it on a different device, putting the 2FA on an air-gapped device.

IMHO, only putting your 2FA on an air-gapped separate device gives you dramatically better protection in the areas you are concerned about. The rest of the conversation is really about where "good enough" lies -- and that depends entirely on what you are doing.


> If you use a password manager with all the benefits this entails (long, random passwords, each only used for a single site), the only benefit 2FA really gives you is if your password manager is compromised somehow.

I don't think this is true. Your long, random password can still be compromised, for example if you ever type it on an insecure device (a friend's computer? public or shared computer? etc). 2FA keeps that single compromised password from being useful.

It is undeniably true that storing 2FA secrets in the same vault as your passwords reduces security. However, it's still way, way better than not using 2FA at all.


I use KeePass 2.x (can't be arsed switching to KeepassXC), but I absolutely second Keepass2Android. It supports a whole host of backends, which includes SFTP - great if you want to securely keep your password DB online without using a 3rd party service (e.g. Dropbox).


I use keepass, and I whipped up a quick script to look up all the passwords in my keepass file against the hashes in the list. You have to download and unpack the file first.

Here is my script: https://gist.github.com/martinhansdk/de8b27934adf9580aebf2e4...

I got a bunch of hits, so time to change some passwords...


I would also recommend KeepassXC. I personally store the password file in Dropbox allowing it to be used on all of my devices. I use a second KeePass file with a different password for storing 2FA backup codes. I open that so rarely that I have to sometimes test that I still can!

On iOS I would recommend the free MiniKeePass app. It isn't as easy to use as some KeePass app(s) I used on Android years ago, but it gets the job done.

I have an old article[0] on how to use KeePass effectively which makes the workflow more seamless. Ugh - really old, I really should update that!

[0]: http://iamqasimk.com/2013/10/01/using-keepass-effectively/


That’s good info with one exception. You are probably ok with Keepass, but for most people I’d MUCH sooner recommend a solution that cannot be done wrong.

LastPass is my go-to for most people. It simply cannot be set up wrong or in a confusing way. The same can’t really be said for keepass.


As a sysadmin who is pretty critical of password managers, I can vouch for keepassxc as well. I started out with keepass, then keepassx, and ended up on keepassxc. I refuse to use an online pw manager.


iOS has https://github.com/MiniKeePass/MiniKeePass, available on the app store.


Thanks for the heads up! Have you used it? What do you know about it?


I use it regularly. The only real drawback is that the database has to be updated manually (e.g. from DropBox or wherever). Outside of that issue it has lots of features and config options and is generally easy to use.


I just tried an 11-character password without special chars that I’ve used on over 50 sites over the last decade. It’s my password for throwaway websites. Some pretty dodgy.

Not in the database.

Makes me feel pretty good about password security overall!


I have a password that’s a pair of words in two languages with some number substitution that I use a lot on websites I don’t care about and it’s not in the DB. And I’m sure I’ve used it on sites that have been hacked, so I dunno.


Unless the site was also storing in plaintext they’d have to actually crack the password hash too, which for some passwords is very hard to do.


I think many websites that got hacked leaked email addresses, cc info, etc., but most do not store the passwords themselves, so your password wasn't leaked in plaintext.


I just checked my very obscure 9 character password I use only for financial websites, and it appears 3 times.


insert the mathematical joke about the half-black sheep


There's something odd about a website that urges you to test your security by typing in your password.


He specifically tells you that you shouldn't do that. But as the winning tool shows, perhaps people who have passwords in HIBP will change them after finding this out?


I'm going to have to disagree with the premise that sites should stop users from choosing a password which happens to have been cracked offline at some point in the past -- to the tune of blacklisting half a billion potential secrets.

What exactly is the end goal, and at what cost? Well, there are 3 ways to steal a password. You can steal it from the user -- either by phishing or with malware -- in which case it matters not a bit how complex the password is. You could attempt to crack it online, that is, by attempting to login as the user through the front door. In this case, a simple counter should limit the number of attempts before a second factor is required, such as clicking a link in an email, and a list of half a billion candidates isn't going to help here either. Finally, you can steal the password verifier database and attempt to crack the password offline.
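
The "simple counter" for the online case could look roughly like the Python sketch below. The `LoginThrottle` class, its method names and the threshold are all hypothetical; a real deployment would persist the counts and probably let them decay over time.

```python
from collections import defaultdict

MAX_ATTEMPTS = 5  # assumed threshold before a second factor is demanded


class LoginThrottle:
    """Count consecutive failed logins per account."""

    def __init__(self, max_attempts=MAX_ATTEMPTS):
        self.max_attempts = max_attempts
        self.failures = defaultdict(int)

    def needs_second_factor(self, username):
        """True once the account has hit the failure threshold."""
        return self.failures.get(username, 0) >= self.max_attempts

    def record(self, username, success):
        """Reset the counter on success, bump it on failure."""
        if success:
            self.failures.pop(username, None)
        else:
            self.failures[username] += 1
```

With a limit like this in place, an online attacker gets a handful of guesses per account, so the size of their candidate list stops mattering.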

So, the theory must be that passwords in a known attacker's dictionary are more likely to be used as candidates in an offline attack. This is likely true. But once the verifier database is stolen, if an attacker is able to run the password hashing function, then every password which is not raw entropy already must be assumed to be cracked. Regardless of how hostile your password policy is. So what exactly is this policy saving you?

On the flip side, it's reasonable to ask, what would such a policy cost you? It's hard to say without actual data, but I'd love to see some data on what percentage of candidate passwords offered by a user trying to signup on their mobile device would be rejected under this policy? How many attempts on average would it take a user to find a password which was not rejected? And what's the increase in bounce rate, and therefore lost signups, that would result? How many increased password resets would be required from users choosing passwords that they inevitably don't remember? How many additional lock-outs which require customer support capital to resolve?

Password policies on average are pretty horrendous. But password policies which are arbitrary black boxes to the end user are about the worst you can find. Sitting on my mobile device, not knowing if a chosen password will be accepted, having to type it twice each time, is a wretched user experience which would need extremely lofty benefits to outweigh the cost. I fail to see any benefits to this approach which couldn't be solved with better hashing which wouldn't impact the user experience whatsoever.

I'll say one more thing on the idea of blacklists. Users just trivially work around them. Password quality (entropy, guessability) does not generally increase. Bad password policies often decrease password quality, particularly in the case of password expiry. But a frustrating opaque blacklist could be just as bad. (I'm not aware of any studies on this).

An attacker who knows a particular blacklist was in place will use munging rules to find derivatives from the master list which are not on the blacklist. How much will their crack rate (percentage of clears recovered from an offline attack of a given magnitude) be affected? But more importantly, potentially driving down the crack rate through user hostile password policies is a game which has huge dividends at first (getting a password which can't be attacked online) and very few dividends after that point.

Disclaimer: Founder of BlindHash, which is the "better hashing" I refer to above.


> I'm going to have to disagree with the premise that sites should stop users from choosing a password which happens to have been cracked offline at some point in the past

Well, NIST, NCSC and Microsoft all seem to be on the same page:

https://www.ncsc.gov.uk/guidance/password-guidance-simplifyi...

https://pages.nist.gov/800-63-3/sp800-63b.html#5111-memorize...

https://www.microsoft.com/en-us/research/wp-content/uploads/...

> What exactly is the end goal

The end goal is to prevent password spraying attacks. If the attacker can get in on the first try because the user has re-used credentials in between two services, account lockout policies don't help and neither does BlindHash (if the compromised service wasn't using it). For it to bring any significant security benefit you would need everyone to use it (which will obviously never happen).


NIST has unfortunately been the source of a lot of bad advice which has actively harmed password security the last decade. (e.g. [1])

Cargo culting is generally a good thing in crypto because, you know, don’t roll your own. But in this case we’re talking about policy. And this policy is as user hostile as (if not worse than) the prior NIST advice on password expiry.

If you want to stop password spraying, protect your hashes. There’s no proof that blacklisting half a billion specific secrets will make cracking any more difficult. Making it nigh impossible for users to register with your service, well, I guess if you have no users you have no passwords to lose.

But the point is a blacklist this extensive is just as likely to make passwords easier to crack, not harder, and will come with a direct cost to the company implementing it. I understand well the goal, I’m entirely unconvinced this helps achieve it.

I would be interested to hear Gosney’s (cracker extraordinaire) and Cormac’s (Microsoft Research) take on this.

[1] - https://www.engadget.com/amp/2017/08/08/nist-new-password-gu...


> If you want to stop password spraying, protect your hashes.

Again, it's not about your hashes, it's about the attacker having access to your users' credentials.

Users re-use credentials across services and you have no control on how (in)securely they are stored there.

Blacklisting (I don't have an opinion on how big the blacklist should be) what is known to be widely used across services sounds sensible... and there is definitely an argument to be made about blacklisting what is known to be widely available/effective for attackers.


> How many attempts on average would it take a user to find a password which was not rejected? [...] I'll say one more thing on the idea of blacklists. Users just trivially work around them. Password quality (entropy, guessability) does not generally increase. Bad password policies often decrease password quality, particularly in the case of password expiry.

That was my concern - if this blacklist is used widely, could it _encourage_ password reuse? If it takes multiple attempts to find a valid password, will the user just find one that is accepted and then use it everywhere?


Hmm this is a pretty great list to use for any service that has user signups -- disallow signup when using a password that is known to have been "pwned"! :)


Here is a quick PowerShell script - it supports the pipeline so you can automate, e.g. if you use a command line password manager.

https://gist.github.com/lzybkr/85b4dbd6536ea5351e8d8e492a432...


Thank you


I love that simple API! Here's a bash one-liner that checks if 'hello' is compromised:

curl -s https://api.pwnedpasswords.com/range/$(echo -n hello | shasum | cut -b 1-5) | grep $(echo -n hello | shasum | cut -b 6-40 | tr /a-f/ /A-F/)

Edit: Improved one-liner that only requires typing the password once and avoids storing it in the bash history:

(echo -n "Password: "; read -r pw; curl -s "https://api.pwnedpasswords.com/range/$(printf '%s' "$pw" | shasum | cut -b 1-5)" | grep "$(printf '%s' "$pw" | shasum | cut -b 6-40 | tr a-f A-F)")


You can use `read -s` to avoid risking having someone behind you read your password on your screen as you type it.

(echo -n "Password: "; read -rs pw; curl -s "https://api.pwnedpasswords.com/range/$(printf '%s' "$pw" | shasum | cut -b 1-5)" | grep "$(printf '%s' "$pw" | shasum | cut -b 6-40 | tr a-f A-F)")


A quick Ruby script to check if a password has been compromised using the Pwned Passwords V2 API: https://gist.github.com/schmich/aeaffac922271a11b70e9a79a5fe...


Guess my password is safe, it seems it was skipped in the list:

hunter1 - 3 times

* * * * * * * - 28 times

hunter3 - 2 times

/s


This is really great and I will use this API.

Wrote a simple method in PHP using 10 lines: https://gist.github.com/JimWestergren/a4baf4716bfad6da989417...

Feel free to use.


A quick python script to hit the API, for those that don't want to use the webform (rightly so):

https://gist.github.com/ShakataGaNai/cb786a2c64abc83d4dbe0db...


Or, for those that don't want to use Python (in case it isn't installed, or requires a non-core module, I dunno) but have access to a Linux box:

    # echo -n "password" | sha1sum
    5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8  -

Take the first 5 characters, in this case "5baa6", and append them to the API endpoint in your browser, e.g.

    https://api.pwnedpasswords.com/range/5baa6

Then take the rest of the hash after the first 5 characters, in this case "1e4c9b93f3f0682250b6cf8331b7ee68fd8", and ctrl-f search the results page for it.
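
If you'd rather not do the ctrl-f step by hand, here is a rough sketch of the same steps as shell functions. The function names are made up for illustration, it assumes `sha1sum` is available (swap in `shasum` on macOS), and it assumes each line of the range response looks like `SUFFIX:COUNT` with Windows line endings, so treat it as a starting point rather than a finished tool.

```shell
#!/usr/bin/env bash
# Sketch only: these function names are hypothetical, not part of any tool.

# Print the SHA-1 hex digest of the first argument, in uppercase.
sha1_upper() {
    printf '%s' "$1" | sha1sum | cut -d' ' -f1 | tr 'a-f' 'A-F'
}

# Print the 5-character prefix that goes at the end of the range URL.
range_prefix() {
    sha1_upper "$1" | cut -c 1-5
}

# Print the remaining 35 characters to look for in the response.
range_suffix() {
    sha1_upper "$1" | cut -c 6-40
}

# Read a range response on stdin (assumed format: SUFFIX:COUNT per line)
# and print the count for the given password, or 0 if it is not listed.
pwned_count() {
    tr -d '\r' | awk -F: -v s="$(range_suffix "$1")" \
        'toupper($1) == s { n = $2 } END { print n + 0 }'
}
```

With those defined, the whole check is one pipeline, and only the 5-character prefix ever leaves your machine:

    curl -s "https://api.pwnedpasswords.com/range/$(range_prefix password)" | pwned_count password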


For ease:

    # echo -n "password" | sha1sum | cut -b 1-5
    5baa6


Yeah, but you would need two commands then, because you need to search for bytes 6-40 in the resulting output. I made a one-liner farther down the comments.[1]

1: https://news.ycombinator.com/item?id=16434244
