Prevent users registering with passwords from data breaches (jordanhall.co.uk)
80 points by DivineOmega 35 days ago | 123 comments

I need password fields to:

1) not silently fail when I try a 64 character (or 32 character) password

2) not fail and say my password is "too short" when it is 32 characters and you have an unrevealed maximum password length of fewer characters than that.

3) just all-around quit failing when my password is totally fine, it's a quasi-random string of letters, numbers, and symbols and I'll never type it...

oh yeah

4) don't disable pasting in the password field

4b) IF you do, freaking let me see what I've typed. I promise to not enter the password in a place someone can shoulder surf. Trust me that's WAY down on the threat list for my life.

Then, and only then, if you really want, keep me from registering with a breached password. But you had better tell me that is what is going on.

And not complain it doesn’t have enough entropy because it lacks special characters when it is the hex encoding of a 128-bit key!

What I find most annoying is that developers enjoy getting creative with password validation. Some require special characters, but only from 2 or 3 allowed special characters (what good does that even do in terms of entropy?). Some limit the number of times a character can appear in a password, and I understand that it is meant to prevent people using “aaabbbccc” as a password, but it also makes long passwords impossible. Many have an artificially low length limit (why would they even do that?).

I've implemented multiple such systems and I can say it's never been a case of the developer enjoying it. These dumb requirements always come from an exec that knows better. At one point a marketing manager informed me I "don't understand password security" while handing me a non-negotiable list of dumb mechanisms like you described.

Yup. When I get tasked to do this I will usually end up building a system where you get a choice at the software configuration level:

Do you want actual password security? We can do that. Otherwise, here's whatever nonsense is currently favoured by people who know nothing but set policy. Sometimes obeying policy _is_ negotiable but only for some users, so I could set the former configuration where I was able to negotiate and the latter for everybody else.

For example "passwords are case insensitive" or "no spaces" is likely to be in the latter because someone in customer support finds it easier. "Use any halfway decent salted, pessimised hash" may be in the former, whereas "Use this specific named hash that you'll have to implement by hand because it's not offered in the language and libraries used by the system" is more likely to be in the latter.

I will say it's not necessarily "an exec that knows better". You're often playing a game of telephone with a third party such as a penetration test service, some government bureau or a "Being a CTO for Dummies" type text book.

When it's a game of telephone in a large organisation it's unlikely you'll be able to fix it. It only takes one person with political cover between you and the person who originally said something like "I don't know, 16 characters?" to ensure that's a hard requirement you're now being tested against.

My previous employer had systems that made it impossible for QA to get a "normal" customer experience, supposedly for "Network Security". A fraction of all page loads, picked at random, were redirected to / on the correct site. All page loads: an image request, the page right after logging in, a post to a forum, anything might be turned into a request for the home page. Clearly it was a trivial config bug, and I even met people who knew which individual had made the configuration error, their name, where they worked, etc. But that person had political cover from VP level and so the bug was simply never fixed in the years I worked there. I've actually promised my entire ex-team lunch out if it's fixed while my retainer is still in place. I expect towards the end of the retainer I'll just go back to that city and take them out to commiserate, because the chances of it ever being fixed are so slim.

> why would they even do that?

Because they aren’t good at their job. Because they have managers that use conversation comfort zones as a criterion for decision making.

Pointy haired bosses often deliver solutions based on what sounds like an admirable position to take, or a point of view that makes them look good politically. One that sounds good in a rapid fire round table discussion, like a daily stand up, where the boss’ boss needs two sound bites from them because he’s VERY busy, and doesn’t have the time to think deeply before cocktails at lunch, and an afternoon of golf.

  Q. Give me your update on the password business.

  A. Very good sir, we’re making users pick strong passwords.

  Q. What defines the strength of the password?

  A. Well, we assign 10 requirements, to ensure the user picks something sufficiently random. More requirements is better than no requirements.
Raises and promotions for all.

I like to call those sites out and have submitted some to https://github.com/duffn/dumb-password-rules

Put Microsoft at the top. Office 365 has a maximum password length!!! No more than 16 characters, thank you very much.

If they cannot get it right, where does that leave the rest of us?

There’s something that I don’t really understand regarding password managers which maybe you could explain. How is using a password manager to manage multiple passwords more secure than using a single password everywhere?

In the case of a password manager, if your computer is breached, then all passwords are breached. If your passwords are hosted encrypted on a website, then if that website is breached, the master password you send it will be visible to the attacker, and thus all passwords are breached.

In the case of using the same password everywhere, if one website is breached, then the password to all sites is breached.

Am I missing something?

If your computer is "breached" then it doesn't matter where your passwords come from. They could just run a keylogger and get them that way.

It really depends on your threat model. No system is perfect and we're mostly balancing security and convenience most of the time.

The same password everywhere is super convenient, but only one site needs to fail and it's all over. A cloud password manager with unique passwords protects against that. A local password manager eliminates the threat of that cloud service being compromised, but is less convenient if you have multiple devices. Storing your passwords offline on paper in a safe protects you from someone getting read access to your PC or outright stealing it, but is annoying and gets more inconvenient the more entropy your passwords have.

When using a single password for all sites, any malicious or breached website exposes access to all your accounts. When using a password manager, only a breach of your computer or the password manager itself (or your email) will totally pwn you.

Breaches online are very, very common, and I would expect any password used on every single website you use to get leaked quickly.

Think of it this way -- your password manager's password never gets passed to untrustworthy websites, so there is a very small chance it would ever get breached.

You now only have to trust one website/program, rather than every website (and we have evidence that many major websites can't be trusted with our passwords).

The password for the password manager is only in my brain; the only way to get it is to install a keylogger on my machine. If instead I used it on all 30+ websites that require a login, all you would need to do is hack one of these sites, or wait for one of them to get hacked, take the password from there, and then log in with it on all the other websites.

I used one password on most websites years ago (I did not know better back then, but I did use a different password for my important email account). I think my credentials were leaked in 5 data leaks, and hackers are still attempting to log in with my old credentials to this day. The only time this affected me was when someone hacked and stole my Minecraft account, but I got it back.

I've probably made accounts with 100+ or so different websites or services. Eventually this turns into a game of statistics: https://haveibeenpwned.com/ lists no fewer than 6 mass breaches my account details have ended up in through no fault of my own, all of them including at least email addresses and password hashes (if not plaintext passwords).

(edit: Actually, I've recalled at least a 7th service that has been compromised that HIBP doesn't list)

That's either 6 times my bank account password gets leaked... or 0 times if I use unique passwords. I strongly suspect there are more breaches that HIBP simply isn't aware of with my information in them. Additionally, I need to rotate either 600 (!) or merely 6 different passwords to re-secure my accounts. So, obviously, unique passwords are a big win here.

> If your passwords are hosted encrypted on a website, then if that website is breached, the master password you send it will be visible to the attacker, and thus all passwords are breached.

I use an offline password manager specifically because of this reason. Still, needing to pwn any of 5 targets to get my bank account (me, my bank, my password service, my email provider, or my phone provider) isn't much worse than needing to pwn any of 4 targets (me, my bank, my email provider, or my phone provider), and both are much better than needing to pwn any of 100+ targets, some of which have terrible security and are probably already pwned without even knowing it.

So, options for unique passwords:

Memorize 100+ strong unique passwords: I "can't" do this.

Memorize 100+ weak unique passwords: Too weak against brute force attacks for my tastes.

Use a memorizable unique password generation scheme: Not unique enough for my tastes.

Memorize a few very strong unique passwords and use a password manager. This does have the weakness that capturing my master passwords via keylogger will get you all my accounts, instead of just all the important ones that I regularly use - but since "all the important ones that I regularly use" includes my email, that's just as good as pwning all of my accounts in my book.

>if your computer is breached

>if that [one] website is breached

>if one [any] website is breached

Not one of those things is like the others.

64? How much entropy is in the passwords you're pasting? While holistically I like to see maximum lengths of 200+, personally I'm satisfied with 20 characters holding 119 bits of entropy.
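For anyone checking the arithmetic: a string of n characters drawn uniformly at random from an alphabet of k symbols carries n * log2(k) bits. Below is a minimal Python sketch. It assumes the 119-bit figure above comes from a 62-symbol alphabet (a-z, A-Z, 0-9), which the comment does not actually state, and `entropy_bits` is just a name made up for illustration.

```python
import math


def entropy_bits(length: int, alphabet_size: int) -> float:
    """Entropy in bits of `length` symbols chosen uniformly at random
    from an alphabet of `alphabet_size` symbols."""
    return length * math.log2(alphabet_size)


# Assumption: 62 symbols (a-z, A-Z, 0-9) is what yields the "119 bits" figure.
print(f"{entropy_bits(20, 62):.1f}")  # about 119 bits
# 32 hex digits encode a 128-bit key, special characters or not.
print(entropy_bits(32, 16))           # 128.0
```

The formula only holds for generated passwords. Anything a human picked has far less entropy than its length suggests.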

I find a passphrase (e.g., half a dozen words from the diceware list) easier to memorize than random gibberish and symbols, so it's disappointing that they're so rarely allowed.

I understand that entirely, but I'm asking someone that specifically said they don't type their 64 characters.

hey, I'm fine with 20 if you tell me ahead of time that is the maximum. My generator just goes up to 64, so that's where I start. Though I usually find through trial and error that 16 is the maximum in a lot of places...

and allow spaces

But then the support workers might cut-and-paste it wrongly!

All of those rely on Troy Hunt's API, which may be fine for some people, yet others may be uncomfortable introducing an external dependency. I generally recommend avoiding external API dependencies if you can.

Here's a python implementation using bloom filters which avoids storing the whole list (need to store ~1 gig), yet still gives you very good accuracy: https://gist.github.com/marcan/23e1ec416bf884dcd7f0e635ce5f2...

It's more what I think this should look like.
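To make the Bloom filter idea concrete, here is a small self-contained sketch. It is not the code from the linked gist; the class name, the sizing, and the double-hashing scheme built on SHA-256 are all assumptions chosen for illustration. A real deployment would size the bit array for hundreds of millions of entries and a target false-positive rate.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter over strings (hypothetical, not the gist's code)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive all bit positions from two 64-bit
        # slices of a single SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


breached = BloomFilter(num_bits=8 * 1024, num_hashes=5)
for pw in ("123456", "password", "iluvlucy"):
    breached.add(pw)

print("password" in breached)  # True: a Bloom filter has no false negatives
print("xlvghydm" in breached)  # almost certainly False; false positives are possible by design
```

The trade-off is that a clean password is occasionally rejected as breached, which is usually acceptable for a registration check.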

Tend to agree on third party dependencies. They're a useful leg up if you're just prototyping, testing an idea, or generally groping around for product/market fit, but I'm not in love with them as a long term solution - certainly not for anything that's core to your product or service.

Too many potential pitfalls:

- If their service goes down, you may go down too, or you have the added complexity of gracefully handling the situation when they're down

- You have no control over their roadmap... which means you might suddenly get a bunch of non-value-adding but totally essential work dropped into your backlog, and perhaps at short notice, because of changes they make

- Perhaps they go out of business or, for whatever reason shut down their service: again, congratulations, you've just got a load of work you didn't bank on getting in the way of delivering your own roadmap

I think most of us have probably seen multiple examples of the above, if not at first hand, then posted on HN or elsewhere on the web.

I'm not an NIH kind of guy but I do tend to prefer libraries, or co-located installs for long-term dependencies. That way at least you can manage migrations and updates according to your own agenda rather than somebody else's.

Similarly I also wrote a locally hostable Golang-based REST web service that you can use to check plain passwords or hashes against the HIBP database (and other dbs). It’s based on an optimized Bloom filter library and pretty fast. It also provides a CLI tool and libraries for Python and Go:


There’s no reason you can’t do both... simply build your internal code to check against multiple sources. You can asynchronously hit your bloom filter, HIBP, DeHashed, etc. and cut short whenever there is a hit.

In this way you get the best of all worlds; speed, highest degree of accuracy, and reduced dependency on a single external API.
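A rough sketch of that "ask every source, stop at the first hit" approach using Python's asyncio. The two checker functions are hypothetical stand-ins (one for a fast local lookup, one for a slower remote call), not real client code for HIBP, DeHashed, or any other service.

```python
import asyncio
from typing import Awaitable, Callable, Iterable

Checker = Callable[[str], Awaitable[bool]]


async def is_breached(password: str, checkers: Iterable[Checker]) -> bool:
    """Run all checkers concurrently; return True as soon as any reports a hit."""
    tasks = [asyncio.ensure_future(check(password)) for check in checkers]
    try:
        for finished in asyncio.as_completed(tasks):
            if await finished:
                return True
        return False
    finally:
        # Cut short: cancel whatever is still running once we have an answer.
        for task in tasks:
            task.cancel()


async def in_local_filter(password: str) -> bool:
    # Hypothetical stand-in for a fast local lookup (e.g. a Bloom filter).
    return password in {"123456", "password"}


async def in_remote_service(password: str) -> bool:
    # Hypothetical stand-in for a slower network call to an external service.
    await asyncio.sleep(0.05)
    return password == "iluvlucy"
```

Usage would look like `asyncio.run(is_breached(candidate, [in_local_filter, in_remote_service]))`. As written, an exception in any checker propagates to the caller; a production version would probably treat a failing source as "no answer" rather than failing the whole check.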

Ha, I was just about to ask for a bloom filter version of this. Thanks!

This is over the top. Even enforcing password complexity is over-rated.

For online attacks, an attacker can't even try the top 1000 passwords on an account at any major website in reasonable time without triggering the alarm, as they all(?) have rate limiting (usually in the form of account lockout after single-digit failed attempts).

For offline attacks, there first needs to be a breach. While they undoubtedly happen, they are very infrequent events. But once they happen, you should assume all passwords would be cracked very fast. Hackers can get their hands on a lot of computing power, and the brute-forcing attempts are not alphabetical, but rather clever how-humans-chose-passwords models. You're relatively safe because of the low frequency of breaches, not because a hacker trying trillions of passwords a second will be frustrated by your password choosing policy. I'm sure most of the passwords from the breaches would be attempted anyway.

Credential stuffing[0] is the real issue. If there was an API to test that a user isn't using this password on other websites, that would be very useful.

[0] https://www.owasp.org/index.php/Credential_stuffing

> For online attacks, an attacker can't even try the top 1000 passwords on an account at any major website in reasonable time without triggering the alarm, as they all(?) have rate limiting (usually in the form of account lockout after single-digit failed attempts).

This is empirically a practical attack: attackers successfully executed a common password brute force attack against GitHub in late 2013 by using a botnet with 40,000 distinct remote addresses:


Our solution for a bitcoin casino was to generate passwords for users. But you can imagine how few sites can get away with such a thing. Our create-password input was a disabled textfield with a reroll button.

Before that, attackers would just wait for new usernames to appear on the scoreboard/chat and check them against password dumps. The easy come, easy go nature of bitcoin made it particularly lucrative.

Password reuse is a massive issue. At a glance, one might wonder why most sites need to care so much since they don't deal with money. Who cares about a forum like HN? But consider that it's easier to audit/impede new accounts with anti-spam measures, so there's value in taking over old accounts. And you don't want people with moderation tools getting attacked either. Ideally, it's nice to be able to trust an account with 1,000 posts more than one with 0 posts, but that evaporates when accounts are easily stolen.

Aside, how do you implement account-locking without making it trivial for users to DoS each other that way?

> Aside, how do you implement account-locking without making it trivial for users to DoS each other that way?

By adding an (increasing) time delay after each failed attempt. However, many sites, banking in particular, just lock your account and you have to call them on the phone to reopen it. Totally open to massive DoS, but the world still stands.

Well, that introduces trivial DoS, so it doesn't satisfy my question.

Notice that banking websites can get away with it, yet surely you weren't only talking about banking when you said "online attacks".

Sorry, I tried to say 2 things: 1. Your simple solution: add a 5 second delay after each failed attempt per username OR IP address. 2. It seems that sites that do account lock-outs don't suffer from DoS attacks, for some reason.
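A minimal sketch of the "increasing delay per username OR IP address" idea, for illustration only. The class name, the doubling schedule, and the key format (`"user:..."`, `"ip:..."`) are assumptions; only the 5 second base delay comes from the comment above. State is kept in memory, so a real service would need shared storage.

```python
import time


class LoginThrottle:
    """Hypothetical per-key backoff: 5s, 10s, 20s, ... capped at max_delay."""

    def __init__(self, base_delay=5.0, max_delay=3600.0, clock=time.monotonic):
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.clock = clock
        self._state = {}  # key -> (consecutive failures, blocked-until timestamp)

    def retry_after(self, *keys: str) -> float:
        """Seconds the caller must still wait; 0.0 means the attempt may proceed."""
        now = self.clock()
        waits = [self._state[k][1] - now for k in keys if k in self._state]
        return max([0.0] + waits)

    def record_failure(self, *keys: str) -> None:
        now = self.clock()
        for key in keys:
            failures = self._state.get(key, (0, 0.0))[0] + 1
            delay = min(self.base_delay * 2 ** (failures - 1), self.max_delay)
            self._state[key] = (failures, now + delay)

    def record_success(self, username_key: str) -> None:
        # Only the username is reset; resetting the IP key too would let an
        # attacker clear their own counter by logging in to their own account.
        self._state.pop(username_key, None)
```

A login handler would call `retry_after("user:alice", "ip:203.0.113.7")` before checking the password and `record_failure(...)` with the same keys when it is wrong. This does not answer the DoS objection raised above: anyone can still keep a given username throttled by failing on purpose.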

I'm not so sure about that. Hire time on a botnet so you have lots of IPs and start trying mixed usernames with mixed passwords. You won't have any choice over which username you eventually break in with, but I don't think you'd trip any alarms either.

Making credential stuffing harder is the main reason to do this. Credential stuffing works because users reuse credentials across sites. If someone attempts to use a password from the HIBP database, the two most likely cases are that it's extremely common or the same person is reusing it. Extremely common passwords are bad for all sorts of reasons and the same person reusing a breached password makes the account vulnerable to credential stuffing.

You're right. I don't know why it's still not common knowledge that nobody bruteforces the login pages in this day and age. But I guess too many people, who have products to sell, benefit from the status quo, so things are not going to change.

No, login pages are definitely "still" brute forced. Why wouldn't they be? It's easier than ever.

>Why wouldn't they be?

Because it's trivial to implement a decent rate limiter which stops attackers, but still allows normal users to log in.

That you must mitigate the attack is a direct point in my favor and a contradiction to yours: "nobody bruteforces the login pages in this day and age".

It's also not a trivial issue. It's cheaper and cheaper to attack a website with unlimited IP addresses. Which dimension are you going to rate-limit?

What do you mean by "which dimension"? Login attempts per second. How do you do it? It depends, but the basic answer is captcha. How do you make sure normal users are not affected? You monitor the login attempts and use the metadata, like the IP address of the last successful login and valid/expired cookies, to assign a "level of trust" to that particular attempt. The lower the "level of trust", the longer it should take. If you have evidence to suspect that an account is under attack (like multiple low-trust attempts from random IPs), you limit the rate even more, with something like "no more than 10 low trust attempts per hour". If you allow someone to bruteforce thousand-entry password dictionaries in a reasonable time frame, it doesn't mean the "bruteforce problem" is not solved, it just means that your system is defective.

There is no lock that cannot be lockpicked. The only difference between a good lock and a bad lock is amount of time it takes to lockpick it.

You're starting to build a hand-wavy levitation machine though ("monitoring metadata", "level of trust"), and confirming that it's not so trivial, which is the assertion that started this thread.

And captcha only makes brute-force somewhat more expensive. You likely have to use a 3rd party captcha service (like Recaptcha) which incurs network volume amplification since you need a req/res to Google just to render GET /login. It also shows how hard of a problem captcha is that you can't just roll it yourself.

You're still not addressing the problem since IP addresses are so cheap. I only need 1,000 IP addresses to try the top 1,000 passwords in parallel no matter what your rate-limit scheme is unless you plan on letting me lock the authentic user out of their own account.

You've wandered from your original claim that bruteforce doesn't even happen these days, and I'm certainly not saying we are helpless against it. In the end, I'd simply suggest that this problem is harder than you originally gave it credit for, and maybe that's something we can agree on.

I didn't change my argument. "Just bruteforce the login page" is not a go-to method of the competent attacker. Why? Because it's trivial to stop it.

You want a simple concrete example of how to stop all that "unlimited amount of ip addresses with captcha solver service" stuff? No more than 20 attempts per day from an IP different from the IP of the last successful login. There, you just solved the "bruteforce problem". That's all you have to do. Other things are just quality of life improvements.
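For illustration, the "no more than 20 attempts per day from an IP other than the last successful one" rule could look roughly like the sketch below. The class and method names are made up, and the in-memory sliding window is an assumption about how one might implement it, not something described in the comment.

```python
import time
from collections import defaultdict, deque


class UntrustedAttemptLimiter:
    """Hypothetical limiter: attempts from the last known-good IP are always
    allowed; all other attempts share a per-account budget per time window."""

    def __init__(self, limit=20, window=86400.0, clock=time.time):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._last_good_ip = {}                # username -> IP of last successful login
        self._attempts = defaultdict(deque)    # username -> timestamps of untrusted attempts

    def allow(self, username: str, ip: str) -> bool:
        if self._last_good_ip.get(username) == ip:
            return True
        now = self.clock()
        attempts = self._attempts[username]
        # Drop attempts that have aged out of the window.
        while attempts and now - attempts[0] >= self.window:
            attempts.popleft()
        if len(attempts) >= self.limit:
            return False
        attempts.append(now)
        return True

    def record_success(self, username: str, ip: str) -> None:
        self._last_good_ip[username] = ip
```

This caps an attacker at `limit` guesses per account per day no matter how many IPs they have. It also shows the objection from the other side of the thread: once the budget is spent, the real user is locked out too if they happen to be on a new IP that day.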

Just because it's trivial doesn't mean all websites are doing it.

Some sites store passwords as plain text. And?

But if you care about security, you might as well solve the problem properly instead of wasting your time implementing "your password must have a capital letter and a number" snake-oil features which only annoy your users.

And I thought we were getting away from arcane rules for passwords. Now you have to avoid every compromised password from any unrelated account? I may use random passwords, but I don't expect the typical consumer to do the same. Sometimes I simply don't care about security for a one off account on a free service where I'll happily use the simplest permutation of "password" for the password.

I think every entity which holds a password should adopt the same policy: from the moment the password is created, the entity will undertake to do everything it can to try and break it. Whatever it takes - bot farms, 3rd-party white hat outfits, social engineering in the canteen, whatever. Anything (legal) goes.

Once the password is broken, the account will immediately be placed in suspense until the owner creates a new password. Which, of course, immediately gets fed back into the machine and life goes on.

This would eliminate password rules. If you want to create a password which consists of, say, 100 consecutive zeros, go for it. But you might only get to use it for a fraction of a second if the network can break it quickly.

I can see a few problems with that approach:

- It's already fairly straightforward to predict how easy a password will be to crack, e.g. if you're going to bruteforce dictionary passwords or those with predictable combinations of characters, why not just include those in your password blacklist?

- Randomly suspending users' accounts and telling them to change their passwords is going to annoy them (especially if it happens repeatedly) and consume support resources

- It's overkill unless the service is security critical

- How easy a password is to crack is partly a function of the hash algorithm and salting the entity uses so it may not be the user's fault their password gets broken

Most people don't know how to make a good password. So don't let them make a bad password, and make that easy by providing a button that generates good ones.

"It has to be new" is about as elegant as you can get for password rules. You can use any secure method you want. Whatever characters you want, just don't do it wrong. If you want it to be easier, press the button.

Even with a one-off account, a complex password you don't bother to remember is better than a weak password, because the service won't have to deal with tons of misused accounts.
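The "button that generates good ones" is only a few lines server-side. A minimal sketch using Python's standard `secrets` module; the function name, the 20 character default, and the letters-plus-digits alphabet are assumptions, not anything the comment specifies.

```python
import secrets
import string

# Assumption: letters and digits only, to sidestep sites that choke on symbols.
ALPHABET = string.ascii_letters + string.digits


def generate_password(length: int = 20) -> str:
    """Return a password of `length` characters chosen with a CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(generate_password())
```

`secrets` draws from the operating system's cryptographic random source, which is the important part; `random.choice` would not be appropriate here.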

Better for the service, theoretically, but not better for me.

I say better for the service theoretically because I probably won't sign up (less of an issue since I moved to LastPass, but still annoying because so many sites don't work with autofill.)

(FYI lastpass is a bit of a shitshow, I'd recommend using a different service)

Can you expand on it, and which choices are better?

Starting point would be https://en.wikipedia.org/wiki/LastPass#Security_issues, but there's a variety of other things, like automatic password filling making you vulnerable to XSS-based password stealing, the general disrepair of the extension, it being owned by LogMeIn, the "two factor auth" being entirely clientside (the extension will actually fully start up and autofill whatever page you're on before prompting you for a code) and the fact that they're still trying to charge me for a premium subscription I cancelled ages ago.

Personally I use 1Password, but there are loads of other ones that don't have a list of security breaches on their Wikipedia page.

The existence of password breach databases means that attackers are already attempting other people's passwords against your accounts.

We should honestly move to the world where typical consumers are using password managers that generate passwords randomly. I think it is pretty reasonable to expect the typical consumer to install and use a password manager; I think it's pretty unreasonable to expect the typical consumer to generate memorable, strong, and unique passwords for each website and store them in their head.

Realistically even a dictionary attack for a login service should have some sort of flag after repeated incorrect passwords or at least be rate limited to some degree. It would really only be bad if they gained access to your hashed passwords database table because then the dictionary attack would immediately hit for some weak passwords. I argue that access to the hashes shouldn’t be allowed in the first place.

Even putting online attacks aside, breached password lists make fantastic dictionaries for cracking hashes.

If the service doesn't care about security at all then you could just not have a password, log on using your email.

If it cares a little bit then you have to characterize exactly how bad it would be if the account was compromised. The right solution might be to just accept any non-blank password that's less than 32 chars or something.

In theory blank passwords aren't a problem for some sites, because usernames shouldn't be the same as display names, and usernames can carry enough entropy to deter simple attacks from outside the organisation?

You don't even need to call out to an external API to get good coverage for this rule, in case you're averse to doing such a thing.

You can just do a case-insensitive match against this file that I compiled a while back: https://github.com/robsheldon/bad-passwords-index

It includes the most commonly reused passwords according to in-the-wild breaches.

I'm a bit embarrassed to see that it's been 2 years since the last update. I was thinking recently about updating this again. I think I'll do that.
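A case-insensitive match against a local list might look like the sketch below. The one-password-per-line layout is an assumption made for illustration, not a description of the linked repository's actual format, and the function names and the `bad-passwords.txt` path are hypothetical.

```python
from typing import Iterable, Set


def load_blocklist(lines: Iterable[str]) -> Set[str]:
    """Build a case-folded set from lines, assuming one password per line."""
    cleaned = (line.rstrip("\r\n") for line in lines)
    return {pw.casefold() for pw in cleaned if pw}


def is_blocked(password: str, blocklist: Set[str]) -> bool:
    """Case-insensitive membership test."""
    return password.casefold() in blocklist


# In practice the lines would come from a file, e.g.:
#   with open("bad-passwords.txt", encoding="utf-8") as f:  # hypothetical path
#       blocklist = load_blocklist(f)
blocklist = load_blocklist(["123456\n", "Password\n", "qwerty\n", "\n"])

print(is_blocked("PASSWORD", blocklist))       # True
print(is_blocked("correct horse", blocklist))  # False
```

A set lookup is constant time, so this adds nothing measurable to a registration request even for a list with millions of entries, memory permitting.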

This is a neat solution! My coworker documented how we mimic the PwnedPasswords API serving (lots of) static files: https://blog.benhaney.com/2018/11/04/recreating-pwnedpasswor...
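For context on what is being mimicked: the Pwned Passwords range API takes the first 5 hex characters of a password's SHA-1 hash and returns every matching hash suffix with a count, one `SUFFIX:COUNT` per line, so the full hash never leaves the client. The sketch below builds such buckets in memory and queries them. It is not taken from the linked post; the function names and the `fetch_bucket` callback (which would read a static file in a real setup) are assumptions.

```python
import hashlib
from collections import defaultdict
from typing import Callable, Dict


def sha1_hex(password: str) -> str:
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


def build_buckets(counts: Dict[str, int]) -> Dict[str, str]:
    """Group passwords by 5-character SHA-1 prefix into 'SUFFIX:COUNT' lines.
    Each value is what one static file would contain."""
    buckets = defaultdict(list)
    for password, count in counts.items():
        digest = sha1_hex(password)
        buckets[digest[:5]].append(f"{digest[5:]}:{count}")
    return {prefix: "\n".join(sorted(lines)) for prefix, lines in buckets.items()}


def breach_count(password: str, fetch_bucket: Callable[[str], str]) -> int:
    """Look up a password; only the 5-character prefix is passed to fetch_bucket."""
    digest = sha1_hex(password)
    for line in fetch_bucket(digest[:5]).splitlines():
        suffix, _, count = line.partition(":")
        if suffix == digest[5:]:
            return int(count)
    return 0


# Made-up counts, purely for the demonstration.
buckets = build_buckets({"password": 3, "123456": 5})
fetch = lambda prefix: buckets.get(prefix, "")

print(breach_count("password", fetch))  # 3
print(breach_count("xlvghydm", fetch))  # 0
```

With 5 hex characters there are 16^5 = 1,048,576 possible prefixes, so the whole data set maps onto about a million static files that any plain web server can serve.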

So what happens when a significantly large percentage of 'standard' passwords are disallowed and your average Joe just can't be bothered to create another one?

Seems like eventually you'll end up driving away everyone who isn't tech savvy/doesn't use a password manager/randomly generated password, which seems like something that'll significantly limit your site or app's audience.

I get the logic behind it, and it's a neat idea on a security level, but it seems like a guaranteed way to drive your userbase to your competitors by making it annoying to sign up.

If you need your users to have a secure password then that's a good thing. We need to make password managers simple enough, and default, for all users.

> So what happens when a significantly large percentage of 'standard' passwords are disallowed and your average Joe just can't be bothered to create another one?

This type of question should always be asked with an "XKCD What If?" or "Back of the envelope" calculation attached so that when responding people can respond to your numbers rather than a vague intuition like "large percentage of standard passwords".

That way, doing the calculation sometimes makes you go "Oh, I see now", and you don't have to post the question at all because you learned something on your own. It's like wondering what a "password" is: check a dictionary and you don't have to ask "What's a password?"

What's a "significantly large percentage?" Let's suppose it's 10%. If 10% of these "standard passwords" are disallowed, and a user is only willing to try twice before giving up then that means 1% of users picking random "standard passwords" can't sign up, which we could argue is a noticeable problem for your site.

Next we need to ask ourselves what's a "standard password"? Do we just mean "a password from the top 100 most common passwords list?". If so game over, we've decided by policy to have users with bad passwords that will be guessed, their accounts will constantly get broken into, and I guess now we need to figure out why our service is "compelling" enough that you want to pay for it anyway despite the constant break-ins.

So let's suppose we have a broader definition of "standard password". Maybe it's 8 characters chosen from A-Z. That sounds like a "standard" password my mother would choose. That's over 200 billion "standard passwords". Troy's service blacklists, wait for it... 500 million passwords.

Let's guesstimate that the list grows by half that amount every single year - meaning 250 million new unique passwords are revealed by idiots every single year. We should hit that concerning 10% rate no sooner than... 78 years from now. If in that time we cannot come up with anything better than "try to memorise a unique password eight letters long" then that's a far more serious technical failure.

The real problem is that people think they must be the only person in the entire world to have chosen "iluvlucy". That's a "standard password" by our definition above, but it's also ludicrously obvious. Pwned Passwords lets us distinguish "iluvlucy" from "xlvghydm" not by some crappy heuristic that would also catch other things but by the fact that lots of people already used it and had the password revealed.
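The back-of-the-envelope estimate in the comment above can be reproduced in a few lines. The sketch makes the same simplifying assumptions the comment does (linear growth, and every blacklisted password counted as if it fell inside the 8-letter A-Z space); the function name is made up.

```python
def years_until_fraction_blocked(space_size, blocked_now, growth_per_year, fraction=0.10):
    """Years until `fraction` of the password space is blocked, assuming linear growth."""
    return (fraction * space_size - blocked_now) / growth_per_year


exact_space = 26 ** 8             # 208,827,064,576 eight-letter A-Z passwords
rounded_space = 200_000_000_000   # the "over 200 billion" figure from the comment

# 500 million blocked today, growing by 250 million per year.
print(years_until_fraction_blocked(rounded_space, 500_000_000, 250_000_000))  # about 78
print(years_until_fraction_blocked(exact_space, 500_000_000, 250_000_000))    # about 81.5
```

So the 78-year figure follows from rounding the space down to 200 billion; with the exact 26^8 it comes out a little over 81 years. Either way the conclusion is the same.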

What is the latest and greatest in password creation rules anyway? It is hard to define a password policy which covers both users' practical needs and still makes ERP systems secure and follows best practice of the major players.

Microsoft research has an interesting paper on it. Are there more like this out there? https://www.microsoft.com/en-us/research/wp-content/uploads/...

Hints welcome!

This is a terrible idea which will backfire.

Many users have a "universal weak password" for sites that don't really matter; now you will be forcing them to jump through hoops just because.

I suspect there's a relatively small window of folks who have a universal weak password. There are lots of folks who have a universal password for everything; there are some folks who have a unique password for everything, because once you use a password manager or even a password scheme you might as well customize your password for every site.

And I think forcing people who are in a position to use unique passwords easily, but too lazy to do so, to get around to using unique passwords is a good thing to do. I include myself in this category - I was sloppy at password hygiene until very recently and I should have gotten on it a long time ago.

(Note that I'm not endorsing password schemes because they're very vulnerable to targeted attacks, but they are popular and arguably easier than password managers and they do technically count as letting you use a unique password on each site - if you use one, the HIBP API will not block you from logging in to any sites other than the one that got breached.)

this shouldn't be used on all sites, only ones where security matters.

It might be frustrating for users to have their typed password rejected without an explanation. Since there’s no authoritative list of compromised passwords they would have to take your word for it.

Wouldn’t a better solution be to increase password requirements until users are forced to generate one using a password manager? If you can memorize a password it’s probably not secure.

I feel like if they have already managed to gain access to your hashed passwords, 9 times out of 10 it's already game over.

Please don't use passwords at all. They are wrong for so many reasons. Use emailed sign-in links.

Which, in general, rely on your users' email password ... which may not be as secure as their bank password, because "email doesn't handle money".

So, to access your bank now crackers just need to get in to your email (which may be true anyway, of course 2FA helps in both cases).

If somebody gets into your email you are doomed anyway, because with passwords all they have to do is reset them through that hacked email.

The benefits of sign-in links are: a) you don't have to remember a gazillion passwords b) you don't have to use password managers c) you can set up a really strong passphrase for your email and actually remember it because it's the only one you have to. And of course set up strong MFA or whatever d) you don't have to set up MFA on services and give them more personal info (phone number) than they really need e) for me as a developer it's also easier to implement, actually

I think the GP was suggesting automated sign in links for random web services. Using them for a bank account would be silly.

If you say to the user, "sorry but that password is too common - please try again" then the user will simply add a 1 to the end of the password and press submit. That doesn't offer much improvement in security.

If that happens commonly, the original password with a 1 appended will probably eventually appear in a future HIBP database. In fact, the user could continue adding 1s until they either give up and try something different, or until it becomes uncommon enough.

Here's an example of that user experience:

User: Set my password to 'monkey'

Website: Sorry that's a common password

User: OK, set my password to 'monkey1'

Website: Sorry that's a common password

User: What?! OK, set my password to 'monkey123'

Website: Sorry that's a common password

User: Grr! Set my password to 'monkey123fuckyou!!'

Website: Sorry that's a common password

User: Screw this, I'll just sign up to your competitor's website instead

But with Pwned Passwords after that last attempt it just works. Because your scenario is imaginary and you haven't actually checked these were all blacklisted. Whether the user will remember they picked "monkey123fuckyou!!" is a good question, but it's a markedly better password than "monkey".

That doesn’t seem too far off from what I would expect, if your site actually cares about the data your users store there.

I wouldn’t recommend it, but you could try a slightly scarier explanation. (“Sorry, hackers have cracked this password on other sites. If you use this password elsewhere, you should consider immediately changing it.”)

This is not a good idea, as implemented. You shouldn't disallow a user from using an otherwise strong password just because it's detected in a breach unless you can definitively see that it's already associated with their email address or username.

The logical conclusion of a password checking system like this is that this password:

can no longer be used by anyone, because I've just "breached it" by posting it on Hacker News.

If the password actually has a lot of entropy but it appears in a breach then that's some fairly strong evidence that the user is reusing it.

Specifically if it appears n times in the HIBP database you should assign at least roughly 1/n probability that the user is reusing it.

So if you assign disutility -V to letting a user have a known username + password combo and utility U to letting a user sign up with a known password but unknown username, the utility is (n-1)/n×U - 1/n×V

Reasonable values of U and V for a given site will be different depending on the application, but for online banking -V would be maybe -20 and U might be negative as well. You wanna bank with a public password lol? For something like gmail or Facebook it would be the same story.
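
A rough sketch of that expected-utility calculation, in case it helps. The function name is made up, and the 1/n reuse probability is the estimate from the comment above, not a measured figure.

```python
def expected_utility(n: int, u: float, v: float) -> float:
    """Expected utility of accepting a password seen n times in the breach list.

    Assumes, as in the comment above, a probability of about 1/n that the
    person signing up is the same user whose credentials were breached.
    u: utility of accepting a known password with an unknown username
    v: disutility (entered as a positive number) of accepting a known
       username + password combination
    """
    if n < 1:
        raise ValueError("n must be at least 1 for a password found in the list")
    p_reuse = 1.0 / n
    return (1.0 - p_reuse) * u - p_reuse * v


# Example: with u = 1 and v = 20, a password seen once is clearly bad,
# while one seen 50 times comes out slightly positive.
print(expected_utility(1, 1.0, 20.0))
print(expected_utility(50, 1.0, 20.0))
```

A site would reject the password whenever the result is negative. That happens for every n when U is itself negative, as suggested for banking.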

On the other hand if the password is quite weak then it's vulnerable to credential stuffing. If it appears, say, 10,000 times in the HIBP database then most likely it's as good as public whether or not the user account name is known.

Maybe there's a sweet spot around 50 instances where you can't really credential stuff it, and you also aren't that sure that it's a reuse.

In terms of usability you could tell the user to change it up a bit, add some words.

For example, r0bbiewilliams appears 5 times in the database. luvrobbiewilliams appears 0 times AND IS PROBABLY EASIER TO REMEMBER!

You can almost always get away from a breached password by adding a small amount of text.

> If the password actually has a lot of entropy but it appears in a breach then that's some fairly strong evidence that the user is reusing it.

I'm not talking about scenarios where you can associate the password with a specific user.

You can in fact associate the password with a specific user - the fact that that exact password is being reused is, by itself, strong mathematical evidence that it's the same user or someone they told the password to, because it is basically mathematically impossible that anyone else could generate the same password by coincidence (unless they're both using a password generator that doesn't have good random seeds or is otherwise deterministic, in which case you should be banning the password anyway).

> exact password is being reused is, by itself, strong mathematical evidence that it's the same user

yes, exactly.

I thought of another way of putting this - a 20-character alphanumeric password is a random 114-bit value. A UUIDv4 is a random 122-bit value (the remaining bits are specified by the UUID spec). If you generate UUIDs for your users, and you don't expect two users to end up with the same UUID, it would be confusing if you somehow expected two users with 20-character alphanumeric passwords to potentially collide. The probabilities are just a factor of 256 from each other.

(119 not 114, there are 62 alphanumerics)
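
The arithmetic is easy to check. A quick sketch, assuming each character is drawn uniformly and independently from the 62 alphanumerics (the function name is just for illustration):

```python
import math


def random_password_bits(length: int, alphabet_size: int) -> float:
    """Entropy in bits of a uniformly random string over the given alphabet."""
    return length * math.log2(alphabet_size)


alnum_bits = random_password_bits(20, 62)  # 20 alphanumeric characters
uuid4_bits = 122                           # random bits in a UUIDv4

print(f"20-char alphanumeric: {alnum_bits:.1f} bits")
print(f"UUIDv4: {uuid4_bits} bits")
```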

Well, when the password got breached it was associated with a particular user.

And HIBP will tell you how many times a given password appears, but not which account it appears with. See:


Many people use the known passwords list with offline cracking tools.

The odds of a user randomly selecting that password, rounded to 20 decimal places, are 0.

That's a fair point. My original problem with this is that you'd likely not have the entire copy of breached passwords available for local comparisons. But Troy Hunt provides that freely so you don't need to use his API. That wouldn't have much latency at all...that fact in combination with the obvious probabilities involved here, yeah I'll concede the point.

If they use an already compromised password, they're prone to a dictionary attack.

edit: I did try logging into your account with the password you posted. :P

That said, I know of one event where someone got into a bunch of accounts on a site and did some real damage by using known username and password combinations. Preemptively forcing specifically those accounts to change their passwords, and blocking those passwords from being used, would have prevented the actual outcome, which was that the site invalidated literally every user's password instead.

Any known password is no longer a particularly strong one.

That doesn't make sense. If I publish a list of 20 trillion alphanumeric passwords, each of which is 20 characters long, your thesis is that no one should ever use any of those passwords again?

Yes, that is the correct conclusion. There are about 0.7 trillion trillion trillion alphanumeric passwords 20 characters long (62^20). Banning 20 trillion of them is a drop in the ocean, and nobody using a password generator is statistically likely to generate one of them within the expected lifetime of the universe, let alone of any given website. So, if you see one of those passwords, it is overwhelmingly likely that someone took a shortcut and used your list.

No, it's not the correct conclusion. Virtually no passwords are safe under an offline brute force attempt if they haven't been protected by a robust key derivation function and a randomized salt. If the password has been protected like that, you not only need to try all 20 trillion of those hypothetical passwords; you also need to try them with the correct salt.

And this is aside from the fact that you won't even rip through those 20 trillion in an online brute force attempt. Eventually you are only constrained by computational resources. If the passwords are cryptographically secured, it's fine if any one of them is published on the internet if you cannot associate it with any given user. If they're not cryptographically secured, this won't meaningfully reduce your already poor security anyway.

If you have a password manager you can and should create passwords that are completely safe against offline brute force. 20 characters works for that. Nothing has 2^119 computation power to break it.

Blocking a list of 20 trillion passwords is probably overkill if you have a slow hash. But with a fast hash it's the difference between "impossible" and "less than one GPU-week".

And if it's easy to block, you might as well do it. There's no upside to letting people use already-posted 20 character strings.

But what is the harm in blocking all 20 trillion such passwords? Do you expect to have any false positives?

The harm is the principle of it. You should not design a system that greps through e.g. every single breach dump for arbitrary passwords every single time someone tries to sign up for your service. That's maddeningly inefficient.

Bloom filters! B-trees!

There are MUCH smarter ways to implement this than grepping a list line by line. I mean seriously who the hell would do that... If your bio isn't complete BS then surely you understand that proposed implementation is BS.
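
To make that concrete, here is a toy Bloom filter. It is a minimal sketch, not how HIBP or anyone in this thread actually implements it. The class name, the sizes, and the double-hashing trick built on SHA-256 are all my own choices for illustration.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k bit positions from two 64-bit halves of one SHA-256 digest
        # (double hashing); the second value is forced odd so it is never zero.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


breached = BloomFilter(num_bits=10_000, num_hashes=7)
for password in ("monkey", "monkey1", "iluvlucy"):
    breached.add(password)

print("monkey" in breached)  # True: anything added is always found
```

A hit only means "probably in the list", so a real deployment would size the filter for an acceptable false-positive rate or confirm hits against the full sorted hash file.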

To clarify the context I was talking about, I mean online checking of a hashed user password against Troy Hunt's HIBP database of hashed passwords. The original context of this discussion was an API, not a local copy of the database.

That being said I'm actually going to concede this argument, because on further investigation Troy Hunt provides the entire copy of his database freely for local lookups. Once you hash the user password to match the database hashes you can introduce the optimizations you mentioned, and in any case it shouldn't introduce intolerable latency to do that lookup locally.

Why do you think Troy Hunt is not doing those same optimizations on his end?

Why is it inefficient? Is the HIBP API too slow? How slow is too slow?

My principle is that you should not let people sign up with breached passwords at all - don't make judgment calls about which breaches matter, and whether you think it's the same user or not, or the password is strong enough or not. Just ban the passwords. (Remember that no actual data breach contains 20 trillion passwords.)

I'm actually withdrawing my argument, as I mentioned in the sibling comment to this one. The HIBP API does introduce a lot of latency, but you can do this checking with Troy Hunt's entire password database locally. He provides it for free.

You still haven't answered or conceded my question: is the online API too much latency? You only need to check it when changing a password or registering a new account, not when validating a password.

The world has decided that making me go through a Google CAPTCHA when registering a new account - a process that takes me several seconds of active mental effort - is fine. If the check takes even 1 second of server time, is that noticeable?

Editing since I misunderstood you:

If you publish one trillion passwords from a large space then each one of them gets a probability boost (of approximately 1/1 trillion), though not enough to ban them, especially if they are not actually associated with accounts.

The danger with using a rare but breached password is that there is actually quite a high chance that it was breached from your account elsewhere.

I think we're reading the question differently - I'm responding to the question of, what if you publish a tiny subset of the passwords, 20 trillion out of 0.7 trillion trillion trillion. That does change the probabilities.

I do agree that if the entire space of possible passwords is only 20 trillion, that doesn't change the probabilities. But there are over 20 trillion eight-character alphanumeric passwords. So, I would actually say you should ban them all, because you should insist your passwords are at least eight characters long. :-)

Edit: yes, agree, in practice the probability boost is not very much. I'm just saying you may as well ban them on the assumption that the HIBP API will do so at its current level of performance. (20 trillion is a ridiculous number, because it's much larger than any possible breach and yet much smaller than any meaningful password space, so any arguments about it are going to be inherently silly in some fashion. My current silly assumption is that the HIBP API is capable of ingesting 20 trillion breached passwords with no performance hit.)

Well there is one important factor about these 20 trillion passwords: are they associated with real user accounts?

If not then it really doesn't matter that they got published. They're useless to hackers without knowing what email to type in. The attack model is that the attacker actually has to log into a website and you don't get 20 trillion attempts.

Never underestimate the ingenuity of the user. They might search for a list of good passwords, find it, and pick one.

> The attack model is that the attacker actually has to log into a website

Not to find the password. If it was then nobody would get upset about plaintext password storage.

Well if we're talking about the possibility of an offline attack against a password database that's a bit different. The standards for a good password are higher for that attack.

But anyway if you pick a password from a list of 20 trillion where the offline attacker knows the list, it doesn't actually help them much because a single selection from 20 trillion options has 44 bits of entropy.

Passwords that users choose typically have less entropy than that afaik

Most passwords are worse, yeah, but 44 bits isn't great. With a fast hash that's less than a GPU-week. It's basically enough if you use bcrypt, but even then it's not protected from an attacker with a lot of money to throw at it. (8 GPUs per server, 10 servers per rack, 50 racks, suddenly you're hashing work-factor-10 bcrypt passwords at about 2 million per second and average cracking time is 50 days.)
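
Spelling that estimate out, using only the figures from the comment above (a 20 trillion entry list, 8 x 10 x 50 GPUs, about 2 million bcrypt hashes per second in total). The throughput is the commenter's guess, not a benchmark.

```python
import math

LIST_SIZE = 20 * 10**12          # passwords in the hypothetical published list
GPUS = 8 * 10 * 50               # GPUs per server x servers per rack x racks
HASHES_PER_SECOND = 2_000_000    # assumed total bcrypt (work factor 10) rate
SECONDS_PER_DAY = 86_400


def average_crack_days(candidates: int, hashes_per_second: float) -> float:
    """Average days to hit one password, i.e. after trying half the list."""
    return candidates / 2 / hashes_per_second / SECONDS_PER_DAY


entropy_bits = math.log2(LIST_SIZE)
days = average_crack_days(LIST_SIZE, HASHES_PER_SECOND)

print(f"{GPUS} GPUs, {entropy_bits:.1f} bits, about {days:.0f} days on average")
```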

Right - "attacker gets an old database backup, and wants to escalate to access to the live website" (or perhaps "attacker breaches QA", or something) is a realistic attack model. Take all the known passwords, hash them, match the hashes against the database, look at the next column over to see whose accounts you compromised.

If it's published as a list of known passwords, yes. That's roughly 1.50463276905253e-21 percent of the potential passwords for that character space (assuming 64 possible characters). If it's known that those are passwords then they're much much easier to test against than the 1329227995784915872903807060280344576 possibilities.

First of all, in an online brute forcing scenario attackers will never get through the entire 20 trillion. Even if they knew for certain that the target victim's password satisfied the precise constraints of the passwords published in that list, they'd need more than 600 years (at one attempt per millisecond) of constant, 24/7 attempts to run through them all. This is assuming there is no rate limiting.

In an offline brute forcing scenario, either the passwords have been hashed with a strong key derivation function and a randomized salt or they haven't. If they have, it doesn't matter if the password is in that list. If they haven't, you're most likely screwed either way, because attackers can get up to 1 trillion password attempts per second in real world cracking setups now.

No, you're not screwed either way, because there are trillions of trillions of trillions of possible 20-character passwords. So even if they can get one trillion attempts per second, it will still take an attacker trillions of trillions of seconds to brute-force all possible 20-character passwords, which is longer than the lifetime of the universe.
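
A back-of-the-envelope check of that claim. It assumes 62 alphanumeric characters, the one trillion guesses per second quoted above, and an age of the universe of roughly 13.8 billion years.

```python
SECONDS_PER_YEAR = 365.25 * 86_400
AGE_OF_UNIVERSE_SECONDS = 13.8e9 * SECONDS_PER_YEAR  # rough figure

search_space = 62**20               # all 20-character alphanumeric passwords
guesses_per_second = 10**12         # the "1 trillion per second" cracking rig

seconds_to_exhaust = search_space / guesses_per_second
universe_ages = seconds_to_exhaust / AGE_OF_UNIVERSE_SECONDS

print(f"{seconds_to_exhaust:.2e} seconds, or {universe_ages:.1e} universe ages")
```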

Now keep in mind that 1) it is possible to choose a strong password with fewer than 20 characters, and 2) most people will not choose a password that long.

Extrapolating further, this "policy" would disallow people from choosing passwords below a certain length, because it's theoretically plausible someone has published the set of all possible strings fewer than n characters long.

> Extrapolating further, this "policy" would disallow people from choosing passwords below a certain length, because it's theoretically plausible someone has published the set of all possible strings fewer than n characters long.

Yes, that seems like a good password policy. A list of possible alphanumeric strings that is actually reasonable to physically publish (i.e., not 20 trillion) is a list of extremely short alphanumeric strings. 5 alphanumeric characters is about 916 million possible strings. 6 is about 57 billion. You should absolutely ban passwords that are 6 characters or shorter!

In fact, I would go so far as to say that the questions of "Is this password too short because someone could brute-force all the possibilities, even if we're using a good password hash and a previously-unknown salt" and "Can someone physically enumerate all passwords of this size and put them on Pastebin or otherwise get them in the HIBP database" are equivalent.

Salts do not need to be previously unknown. This goes back to what I was saying - they're either securely protected with a key derivation function, or they're not.

Right. That was just to give the most generous possible circumstances to a password when arguing that it's still too weak.

Yes, it's just safer. We do this _all the time_ in the Web PKI.

Remember the "Debian weak keys"? What's weak about those particular keys?

Nothing. Nothing whatsoever. Those keys aren't special in any way. Except, Debian shipped releases that always picked one of these key pairs. So anyone with a mind to can go find the list of private keys that corresponds to these particular public keys and thus we don't let you use those public keys any more.

(You can go try this if you don't believe me, submit a CSR to your preferred public CA asking for a certificate for one of the Debian Weak Keys, it will be rejected and there may or may not be an explanation attached saying your keys are crap and to get new ones)

Whole swathes of keys are blacklisted. ROCA is another example, somebody took one mathematical short-cut too many in their optimised design for RSA key generation, and so the resulting keys all have this very obvious structure that's exploitable (not easily, but enough that a sovereign entity could definitely break them). So we just blacklisted all those keys.

If you pick truly random keys you'll never notice this in a lifetime because of statistics, it's just some code on the issuer's systems that you never need to care about.

> publish a list of 20 trillion alphanumeric passwords, each of which is 20 characters long, your thesis is that no one should ever use any of those passwords again?

No because they each still have a very low probability.

You have to be Bayesian about this: a list of one trillion passwords that have no further distinguishing information about each one of them cannot be assigned a probability of > 1/(1 trillion)

In a data breach, a given password appears next to a particular username or email, which means it has a very high probability of being the password for that account.

Just a quick point that is worth considering: If a user types in a password and that password appears once in the HIBP breach list, then it is extremely likely that the source of the password in the breach list IS that user.

If it appears 2-3 times, then there is still a significant chance that that user is the source of the password getting into the HIBP database.

And if that user is the source, then the bad guys most likely know that user's email and password, and their account is wide open.

Exactly. I think of the HIBP password list as having three types of passwords (this is an oversimplification, but bear with me):

1) Extremely weak ones that lots of people use (e.g. 'password1')

2) Somewhat unique ones (their pet's name and birthday)

3) Truly strong ones (random, long strings)

I don't want users on my site using type 1 passwords at all. If a password is really type 3, the odds say that no user will ever try to use it again, so there's no collateral damage in blocking it. The person signing up with a type 2 is almost certainly the same user whose credentials are in the breach. I don't want them to reuse that password on my site because it makes their account vulnerable to credential stuffing.

So they'd have to be testing in the browser. Seems unlikely.

That is going to be insanely annoying.

Could you explain? Unless you're using a really weak password or reusing passwords, how would it affect you?

I do, in fact, use really weak passwords on sites where it doesn't matter, and I reuse them whenever I feel like it.

Passwords already suck. Please don't make them suck even more.
