Hacker News new | past | comments | ask | show | jobs | submit login
A password hash storage scheme that prevents efficient password cracking (github.com/polypasswordhasher)
91 points by monort on Sept 16, 2015 | hide | past | favorite | 37 comments

I'm curious what threat model this protects against that isn't also trivially covered by bcrypting hashes and then encrypting them with a symmetric key that has to be provided to the system at boot time by a human or keywhiz or similar. It seems they're trying to protect against an attacker who has DB access but not RAM access, which is a very important vector to consider but this seems to be an overly complex way to handle that.

> boot time by a human

From the paper:

> The server will restart periodically. All data kept in memory is lost at this point and the server must restart using only the data on disk – which is attacker visible.

So I guess your solution is correct and trivial but the author's solution doesn't use data from outside, not even a user provided symmetric key.

It does use data provided from outside: you either need to provide enough valid passwords for it to recreate the secret, or leak data from the stored secrets, as described here:


They leak data but their threat model involves that the attacker can get this leaked data too (they leak it to the hard disk?).

>provide enough passwords

Why don't they just store some dummy users' (with zero privileges) credentials directly on the hard drive? They simply can provide valid passwords from it after reboot and your attacker can have these passwords since they can't do anything with them. Maybe their leak strategy involves something like this?

The paper linked from the article mentions the threat model as disk access, may already know an arbitrary number of passwords, but cannot read RAM, and may need to be rebooted.

The alternative solutions does not mention doing what you suggest, but reboots would be problematic. Of course you could do the same thing for bootstrapping as they do (leak a portion of the salted hash to disk and use that for verification), but that I'm pretty sure that allows probabalistic cracking of weak passwords (assuming most passwords are weak, the weakest password that matches is likely to be the password).

It seems this system uses your scheme, except that instead of getting the key from administrator input, it computes it from the first K passwords it gets using a secret sharing scheme.

The advantage is that the administrator doesn't need to manually intervene, the disadvantage is that you can't verify passwords (in a way that is more secure than just verifying hashes) until you get at least K correct login attempts.

If the goal is only to defend against SQL injection, a much simpler solution is to just put the key outside the database, for instance in the configuration or source of the application.

Please don't ever put keys in the source of an application.

Why don't they encrypt entire databases like this, and thus avoid the problems from all these hacks?

While I'm not a security expert, but I guess most of the attacks don't copy the database from the disk directly but use the SQL server for that through injection. If the SQL server can read your database (which is mandatory if it wants to work with it) then the underlying disk encryption doesn't help.

The reason I had difficulty using scrypt is because I didn't want to DoS my own server. Script asks you to choose a "work factor," and the higher the work faster, the harder it is to reverse the hash. Trouble is, if you set it too high, it's a bit painful to change it later. If you make a computationally difficult hashing scheme, then a lot of simultaneous logins can therefore force your server to use all of its CPU.

Does this suffer from the same issue?

It isn't necessarily an important issue. Maybe nobody will figure out they can take down your service that way. Maybe rate limiting is enough. Maybe you can solve it by having servers dedicated to hashing.

I was just curious about the performance characteristics:

Suppose that three people have passwords that are each randomly chosen and 6 characters long. A typical laptop can crack those passwords in about 1 hour. If you take the same passwords and protect them with PolyPasswordHasher, every computer on the planet working together cannot crack the password in 1 hour. In fact, to search the key space, it would take every computer on the planet longer than the universe is estimated to have existed.

The contradiction is that if users can log in to your service quickly, then they can try a lot of passwords quickly. So is this more expensive than scrypt?

If an attacker has read-only disk access, they can swipe any unencrypted data. Scrypt protects you from this, and it's straightforward about the requirements and computation costs.

In practice, what kind of CPU usage would a PolyPasswordHasher scheme need?

One way to prevent DoSing your own server while using expensive hashing algorithms is to offload some or all of the computation to client-side JavaScript (if it's a website) or client-side native code (if it's an app). This is called "server relief" [1].

Experts have said that scrypt can be safely used with server relief (see link below). But I'm not sure whether PolyPasswordHasher can be safely used with server relief (i.e. without leaking secrets to the client). Perhaps someone more knowledgeable will pitch in with a better answer.

[1] https://news.ycombinator.com/item?id=9305504

Online DoS shouldn't really be a problem, since you should be rate limiting logins on a per-IP basis to prevent online brute-force anyway. Of course, typically nobody does that.

'Server relief' is dumb... correctly implementing a 'hashcash'-like scheme to prevent excessive CPU resource commitment is non-trivial. It will require either persistent pre-login server state, and/or cryptographic IP/session binding, both of which open up a second DoS vector.

Implementing Scrypt etc in the browser doesn't completely prevent DoS, and sacrifices your ability as an operator to ensure passwords are being appropriately salted and munged (because of third party clients, unauthorised API users etc). Plus, attackers can just start using malicious webpages or botnets to crowdsource their hashing efforts which, because it scales horizontally, could actually speed things up for an attacker if you're server isnt doing any rate-limiting on the final hash+authentication stage.

Limiting logins per IP address is a luxury that cannot be afforded by developers of applications targetting some Eastern countries or which are expected to be primarily used by mobile devices, due to pervasive usage of NAT and shared IP addresses. You are simply trading one form of DoS for another :/.

Seems like it should be possible to rate limit failed logins, and of course failed logins on a per-account basis.

Failed logins, yes. However What the original comment was getting at scrypt would in effect, reduce the number of simultaneous login requests a server can handle, so legitimate usage could bog the server down.

Either way, how are you going to rate-limit failed logins while at the same time not allowing a DDOS of a user login? If I am using a botnet to keep trying passwords for Sarah Palin, how are you going to know when the real Sarah Palin logs in from a new computer? Sarah Palin will never be able to log in again from a new computer, unless she uses a key from her old one.

I would assume that you'd simply do (increasing) timed lockout periods by user/ip combination.

At some point you have to accept that administrators will need to do some work, and if 200 IPs are trying to log into the same account 5 times every 15 minutes you should probably email the user and lock the account.

>per account

Nobody stops an attacker to use a fixed password and cycle the username. If your site has lots of users this could work pretty well. Your site could even "leak" (like forums) usernames making this approach more efficient.

I haven't read the full paper, so I don't have a complete understanding of this scheme...

But what stops a malicious attacker from registering a bunch of dummy accounts, and then using the known passwords to reconstruct the Shamir Secret?

Similarly what do you do for the first couple of users that need to legitimately login but you don't know the secret yet?

(I also haven't read the paper, but the concept is intriguing).

There is the option to "leak" a portion of the users' salted hashes, so that the users can be verified upon reboot. This leaked information harms overall security but, even with this option, the scheme is still better than regular salted-hashes.

Hmm, hard coded bootstrap dummy users? (Evil)

Maybe the probability that your dummy accounts share the same "line" (their terminology) will be low if the number of users is large enough? (and the number of "lines" is also large enough) Although this conflicts somewhat (but not assymptotically) with the need to have enough users at any "line" so it can be easily computed by the server from scratch when it needs to restart.

They have the concept of "shielded" users, which are users that are shielded by the shared secret, but don't contribute to it. Though, they don't really specify when to put people into the shared-secret group and when to leave them as shielded. I suppose it's an exercise for the user.

This is a very solid point. We already had an open cryptographic competition to select a password hashing standard, and we have a winner: Argon2 is being developed as a new standardised password hash (i.e. slow-hash).

If you're going to introduce a new one, it should go through at least the same level of cryptanalytic scrutiny before anyone uses it.

This is not a password hashing algorithm, but storage scheme. It's purpose is orthogonal to Argon2.


This seems useless to me because of the well-known tendency for people to use the same username and password on multiple systems. A hacker need simply try a series of known username and password combinations until they have generated enough successes to crack the system.

Are incorrectly stored passwords really the problem with a lot of the last few major security breaches? For instance, I read that Sony hack was an act of social engineering. Anthem was hacked via phishing attack. Ashley Madison hackers got in via SQL injection. I guess is all of this complexity really worth it for an average company when it doesn't even seem to be the root of the problem.

It's not really about keeping people from breaking into your site, but rather keeping them from using the data they get from your site to break into other sites.

Basically, a lot of people will use the same credentials on both (for example) Ashley Madison and their bank. If AM does a bad job storing their passwords, then an SQL injection there can be used to reconstruct the users' passwords and then log into many of their banks.

HSM are freely around for Ashley Madison to buy with their multi millions that are optimized for hi speed crypto functions with physical keys that can't be extracted by a remote attacker stealing the db. Since they don't bother to use easily found current solutions don't see why any of these corps with massive breaches would bother to use this beta software solution either.

There are a few separate problems at play. Security is never one thing.

The root of the password problem is that people don't use a password database. I didn't until recently, even though I was aware of the risks. It was more than I could be bothered to do. Apparently Rayiner wasn't using one until recently as well. It seems pretty likely that most people won't use one unless they're forced to.

The effect of this is that if you use the same password on many services, you're at a pretty big risk. What do most users do, then? "Ah, I'm clever; I'll put the name of the service in the password."

Trouble is, this is exactly the same as the previous situation. You're sharing passwords across services, whether or not you thought so. The first thing that an attacker will do when they reverse your password is scan it for instances of the service name. If your password is "DropboxFoobar", it's likely your Twitter's is "TwitterFoobar". In fact, passwords of this type are the most valuable to an attacker, because it's a surefire flag of "You can get a ton of data from this person if they use the same scheme everywhere!" (When you reverse other types of passwords, it's unclear whether it's worth the effort to try it against a bunch of services.)

That said, anything to improve this situation is worthwhile effort. Whether it's a new hashing scheme or a public service video about the dangers of not using a password database, security improves incrementally. Shielding a modern company is extremely hard, as it requires all employees to use the same security measures. Anything that can be automated would be a win.

Are there other automatable things besides hashing? Phishing attacks can't be detected because they're crafted so well. SQL injection can be, though: Throw your server requests through a bayesian filter and watch it decide to put requests containing SELECT into their own class automatically. Defenses to social engineering are close to impossible: if someone wants to get into your building, they'll pose as a maintenance crew. If they're discovered, they'll say "Sorry, my work order must be for a different floor."

All in all, this is the general essence of why Sam Altman's desire to fund new security companies is good, but might not produce many results.

So are incorrectly stored passwords really the problem? Well... Not if you're using bcrypt or scrypt. Possibly this scheme; time will tell. The research is valuable for other reasons too: it's a whitepaper with source code, which is extremely rare. Might be good to encourage more of that.

It seems to me that after a new reboot it's impossible to register new accounts, or change any passwords because the server does not yet know the "line".

You would need enough logins for it to calculate the true line before it could create any new passwords.

What's wrong with PBKDF2?

Plenty, PBKDF2 is ASIC friendly and not memory hard, things addressed by better schemes...

It doesn't look like an improvement to me.

During normal operation it has a black box that validates passwords quickly. The PloyPasswordHash is the key to how it does it - but really it doesn't matter. All that matters is a password (or a hash of it) goes in, the blackbox maybe reads some data from the (effectively public) password database, and a yes/no comes out at the cost of stuff all CPU time.

The point is - if you can reproduce the blackbox once per GPU node on a password cracker it's no better than any other scheme.

So the security of the scheme rests on the attacker not being able to reproduce the blackbox. If it were a real blackbox sitting somewhere secure then maybe - but it's not. They make a large point of saying this scheme has no secret hardware sauce - is it's just software, in reality just another process running on the web server. If you managed to get a copy of the memory image of that process it's game over. But we can't use that weakness to attack it because it is acknowledged in the paper.

During normal operation the blackbox creates it's secrets by effectively combining a _lot_ of passwords (say N) from users. Thus you need N valid passwords before you can crack one. I think they achieved that, and it does indeed make the it near impossible to recreate the blackbox by brute force - if all you have is the public password data base and a copy of the code.

But it's not that simple. Firstly they acknowledge a naive implementation doesn't work because an attacker can just create N accounts and feed the passwords to the blackbox so it can initialise itself. In addition there is a bootstrap problem - no user can validate their password until N password have been gathered by having N attempted logins.

They get around that by having N (or close to N) special accounts which they call protector accounts. These accounts are presumably created by the server admin and thus can be trusted. They are used for both bootstrapping (implying they are available at boot time) and preventing the flood attack (by insisting some of the special accounts must be part of the N).

And so we get to the real issue: the passwords for those accounts must be entered when the machine boots. Effectively they are the secret the blackbox needs to do it's stuff. Replacing the PolyPasswordHash with a process that demands a single secure password be entered when the machine boots would work just as well. Eg, that password is an extra salt fed to a cryptographic hash along with the password word and on disk salt, and the result must match the hash stored on disk. Without knowing that salt there is no way to validate the passwords.

Sure entering N passwords isn't hard, but would you enter N passwords when there is a method offering equivalent security when you enter just one? And even that isn't the real issue - the real issue is fuck all corporations on the planet have employees willing to be woken up a 2 AM in the morning when a server boots to enter just one password.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact