It's straightforward and effective when password hashing happens client-side (like local encryption for a bitcoin wallet), but using scrypt or bcrypt server-side at effective difficulty levels will either destroy your L1/L2 cache or exceed your CPU budget.
In practice, from what I've seen, this means scrypt is never used, and bcrypt is used with a work factor that's tuned low. I think this is just a fundamental asymmetry, and that at the end of the day people use bcrypt so that they can say "we used bcrypt!" and get a nod when their entire user database is compromised, even though it provides very little value the way it's deployed.
Actually the Password Hashing Competition has shown this asymmetry can be exploited to the server's advantage: one of the ideas being studied is called server relief.
For example, instead of storing the scrypt hash on the server, you store the SHA256 hash of the scrypt hash (and the salt alongside), and ask the client to send the scrypt hash to authenticate. This way the per-authentication cost to the client is expensive (one scrypt hash) but the cost to the server is cheap (only one SHA256 hash).
Attackers cannot take shortcuts. They cannot send random scrypt hashes to the server while skipping the scrypt computation step, because the space of possible scrypt hashes is just too large to brute force.
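A minimal sketch of server relief in Python, assuming hashlib.scrypt (stdlib since 3.6); all names and parameters are illustrative, and the comments mark which side pays which cost:

```python
import hashlib
import os

def register(password: bytes):
    salt = os.urandom(16)
    # Client side: the expensive step.
    k = hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1)
    # Server side: store only a cheap hash of the client's scrypt output.
    return salt, hashlib.sha256(k).hexdigest()

def login(password: bytes, salt: bytes, stored: str) -> bool:
    # Client recomputes the expensive scrypt hash...
    k = hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1)
    # ...and the server's per-login cost is a single SHA256.
    return hashlib.sha256(k).hexdigest() == stored
```

A stolen database of these SHA256 digests is useless for logging in, because the attacker would have to invert SHA256 over the full scrypt output space to recover something the server will accept.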
Edit: @__david__: yes, you could SHA256() only the hash part of the scrypt output, as opposed to the hash & salt parts. It is an implementation detail, not really important. And regardless of how you do it, if the DB gets stolen the SHA256 hashes are useless for authenticating.
Edit: @espadrine: no, breaking a SHA256(scrypt()) hash is no easier than breaking a scrypt() hash. Server relief with SHA256 introduces no weaknesses. It is strictly more secure than standard scrypt hashes.
Edit: @plantbased: I recommend against MD5(scrypt()), because it is likely preimage attacks will be found on MD5 in the near future, so someone stealing your database of MD5(scrypt()) hashes will be capable of computing an arbitrary scrypt hash that authenticates to a given user. But you could certainly do MD5-crypt(scrypt()) with MD5-crypt being the salted & iterated version by PHK.
Edit: Thinking about it more, I'm guessing it is for the case where your DB gets stolen. With the salted hashes, attackers can't just send the direct scrypt hash down to the server to get authenticated.
In this case you'd need to precompute an SHA256 of a dictionary of scrypt hashes, which would have been computed from a dictionary of English words.
In practical terms, if you steal the DB and want to reverse passwords, you'd need to scrypt, let's say, 100 million words, then SHA256 them, then compare them to what's in the DB.
The original blog entry argues (or maybe it was the post on the same site that linked to it) that we should use more compute-intensive hashing algorithms, rather than relying on salts. scrypt is definitely more compute-intensive.
So if you didn't use a salt and wanted to throw a 100M-word dictionary at this, how long would it take on decent hardware to create the rainbow table of 100M words, which could be used repeatedly against stolen databases? On a decent GPU you can do around 1.2 billion SHA256 hashes per second, so the SHA256 step is trivial.
ASIC bitcoin hardware can do around 4.5 million scrypt hashes per second: https://bitcoinmagazine.com/13000/new-benchmark-scrypt-minin...
So you could create your rainbow table to attack the non-salted version of this in around 25 seconds.
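Sanity-checking those figures:

```python
# Numbers from the benchmarks cited above (assumed, not re-measured):
words = 100_000_000          # 100M-word dictionary
scrypt_rate = 4_500_000      # scrypt hashes/sec on ASIC hardware
sha256_rate = 1_200_000_000  # SHA256 hashes/sec on a decent GPU

scrypt_seconds = words / scrypt_rate   # ~22.2 s: this step dominates
sha256_seconds = words / sha256_rate   # ~0.083 s: effectively free
```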
A salted version of this same system would need the scrypt to be salted, because SHA256 is so fast. If you just salted the SHA256, you'd be able to use a precomputed scrypt rainbow table and then only have to compute the salted SHA256s, which, as you can tell from my benchmark above, is very fast.
So clearly the salt is needed, and you need to compute scrypt(salt + password) on the client, send that to the server, and then store SHA256(scryptHash). Note that you don't need to add a salt on the SHA256 side, because the initial salting has already defeated rainbow-table attacks.
So how fast could you attack a salted system like this, using GPUs for the SHA256 and ASICs for the scrypt, based on the benchmarks above?
You can pretty much ignore the SHA256 step because it's so fast, so you'd get around 4.5 million guesses per second using scrypt ASICs.
Compared to 1.2 billion guesses using a GPU for SHA256, that's not too bad.
An interesting thing about GPU password cracking that many people don't realize is that the GPU is responsible for not only hashing candidate passwords, but also generating those candidate passwords. The bus bandwidth isn't enough and the CPU isn't fast enough to keep it fed with candidate passwords otherwise.
You could, of course, build password cracking ASICs that have internal processors for generating candidate passwords, but I'm not convinced that it doesn't make more sense to stick with GPUs or maybe FPGAs there.
You may be able to detect this with testing, but not necessarily. And yes, it'll probably happen sooner or later, given the nature of JS compilers.
I don't know if there's a method that avoids this problem that still has the advantages that you mention.
IMHO, while MD5 has a fraction of the output space of SHA-2, reversing it is still a very hard problem, and intuitively it seems this might provide a huge improvement over salted and stretched (multiple-round) MD5 on the server.
I'm curious - what kind of infrastructure would constrain you to md5?
That, and the major reason we do hashes even over TLS: if the server gets cracked and the attacker has access to the stored hashes, breaking a SHA256 is trivial (i.e., brute-forcing b when knowing s = SHA256(b)). Then, the attacker can assume the identity of any individual whose SHA256 s got broken, by sending the bcrypt b directly from a crafted client, without having to find p where b = bcrypt(p).
Even MD5 is still (AFAIK) effectively unbreakable in this use case (finding MD5 collisions is possible; finding the input given an MD5 hash is not). According to http://en.wikipedia.org/wiki/MD5#Preimage_vulnerability finding input to generate a given MD5 has a complexity of 2^123.4 - i.e. never going to happen. And that's MD5, an obsolete hash!
SHA-1, also considered obsolete, has a bad rep because there are scary theoretical collision attacks. This isn't good news, certainly, but for a sense of perspective: nobody, not even with lots of distributed computing power, has ever actually managed to find a collision, since that would take around 2^60 attempts. I don't think anyone has ever found even a theoretical weakness in its preimage resistance.
And you're talking about SHA-2 - there have been no published reports of any attacks, even theoretical, on the full SHA-2 (they have found weaknesses in reduced versions, suggesting room for improvement - but that's largely of academic interest).
Collision attacks are another matter. If you're using a hash to sign potentially hostile content rather than recognize friendly content, md5 is clearly dangerous, and sha1 may well become so. But that's relevant to things like certificate authorities, not message authentication.
I don't think that's feasible. "Expensive" is an understatement. To compute the rainbow table you need to do the work of actually calculating the scrypt output for each password, even though you don't have to store it. The space is just too big, therefore the time cost.
But yeah, it's not likely to be a major threat. In any case, salting makes it irrelevant.
1. Steal the database
2. Find a collision
3. Authenticate with the input that hashes to the same value
What stops that from happening?
MD5 is (currently) vulnerable to collisions, but not to preimages. So you can find two inputs that have the same hash, but not an input that hashes to some particular value in a database.
More specifically: if you just hash a bunch of arbitrary strings that aren't specially-constructed for the purpose of colliding, then collisions are basically random, and extremely improbable. But you can fairly easily generate two files, differing only in a small number of bits, with the same MD5 hash by taking advantage of the structure of the algorithm.
Examples here: http://www.mscs.dal.ca/~selinger/md5collision/
Clearly doable in specific instances, but it's not going to be an addition to the script-kiddie attack handbook anytime soon.
Ah, got you, I'm wrong. Thanks!
In the case of sending bcrypt output wouldn't the "password" be long and essentially random and therefore not easy to enumerate?
Or is SHA also reversible?
2. Even with a lower work factor, bcrypt is drastically better than salted SHA.
3. Companies with auth loads so high that they really need to tune bcrypt should also be using dedicated auth servers, which (a) improve security in other ways and (b) allow for a lot of flexibility in handling work factors.
Isn't that a security risk of its own though? Wouldn't creating a rainbow table for (non-salted) passwords hashed with the default bcrypt settings then make sense again?
Also; if you don't feel like writing a useful response feel free not to respond at all.
I thought hashes were deterministic by definition so I was wondering where the required randomness came from.
As far as I'm concerned bcrypt took the term for a concept that is generally understood to be deterministic (hashing) and redefined it by adding the randomness in by default.
That was probably a better call than trying to introduce a new term and a new hashing algorithm at the same time; but it's also asking for the exact long-term confusion you're complaining about here.
> That was probably a better call than trying to introduce a new term and a new hashing algorithm at the same time; but it's also asking for the exact long-term confusion you're complaining about here
To be fair, besides a basic search for information on the subject, this was also covered in the linked article...
P.S. The XKCD wasn't directed at tptacek but at tedunangst, who apparently felt my comment made this thread 'astonishingly tedious.'
From the article, even 74kh/s is going to tear through a lot of passwords, but with a secure element in the mix, offline attacks go nowhere.
I'm not even saying that keyed hashes are a bad idea. I'm saying they're orthogonal to the question being discussed on the thread.
Facebook has 900M monthly users, 30M per day, and probably most of those are cookied in. Even if not, that's only ~350 authentications/sec...
You need to make sure your login system can handle malicious clients, brute-force attempts from individual IPs or against particular accounts, dropping back to CAPTCHAs or some other means of resource control when things look bad. This isn't a particularly easy problem to solve, and if you do it incorrectly you open yourself and your users up to a denial of service.
IMHO, some of the new fancy password-based authentication protocols, like AugPAKE, which do hashing client-side, might ease this problem... but they're still stateful (SYN floods anyone?) and therefore still depend upon good DoS protection.
Edit: Oh poop, you're djb.
How about this: save two hashes, one low-cost, one high-cost; both must pass for login. Don't run the high-cost one unless the low-cost one passes first.
Note: not a security guy, so I'm probably overlooking something silly.
Edit: the two responses below (thus far) are reasons you should not do this. Duh on my part.
You can, however, do the high-cost hash on the client, then do a low-cost hash comparison of the result on the server.
So you do a cheap SHA256 hash comparison on the server, but because the input to that comes out of, say scrypt, it has 256 bits of entropy (or near), making cracking infeasible. (scrypt has variable length output, so you could use other hash sizes).
This is only done when checking; when setting the password, both the slow and fast hashes are done on the server (to stop someone just hashing 'password123' with SHA256 and sending that).
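A minimal sketch of that split (function names and parameters are illustrative; hashlib.scrypt is in the Python stdlib as of 3.6):

```python
import hashlib
import hmac
import os

def set_password(password: bytes):
    # At enrollment the server runs the slow hash itself, so a client
    # can't just submit SHA256("password123") and skip the scrypt step.
    salt = os.urandom(16)
    slow = hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1)
    return salt, hashlib.sha256(slow).digest()

def check_password(client_scrypt_hash: bytes, stored: bytes) -> bool:
    # Per login the server pays one SHA256 plus a constant-time compare;
    # the client bears the scrypt cost.
    candidate = hashlib.sha256(client_scrypt_hash).digest()
    return hmac.compare_digest(candidate, stored)
```

At login the client would compute hashlib.scrypt(password, salt=salt, ...) with the salt the server hands it, and send only the result.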
This is called server relief. There's a discussion of it elsewhere in this comment thread; it is well known, though it doesn't google particularly well.
1) You copy the whole password, and use the whole password for both. Then an adversary can just attack the weaker, faster one.
2) You split the password; half goes in one and half in the other. Now user passwords have doubled in length---or they haven't, and you have 4-10 character half-passwords instead of 8-20 character passwords.
When there is a choice between making denial-of-service mitigation more difficult and making it easier for user passwords to be discovered after a breach, you should definitely choose the harder DoS mitigation. You might disagree with that sentiment, but I would have very little sympathy for a company making excuses like that to justify increasing the risk of user passwords being exposed.
The systems involved don't actually tend to handle this many in reality (the ISP in question restricts it to some 20-30k/s to our system). These systems are starting to be better at keeping state through restarts, but there can still be pretty impressive spikes of authentication traffic if the wrong system is rebooted (or crashes!).
In a perfect world, quality web apps have rate limiting built into their auth schemes. But it's important to acknowledge these two algorithms will put a much heavier burden on your CPU.
Or because it expires, the duration means it doesn't matter if it's compromised?
I see that token used - sometimes in the http request header - but most of the time as a param in the GET request, over plain http.
Does it even matter if my auth was secure? I just need to get hold of some access logs and I can impersonate everyone?
edit: I'm specifically talking about http basic auth with a precomputed "Authentication: base64($username + $passwd)" header, not a GET of "/foobar?api_key=12345abcd". The latter is obvious in its failures and is not related to http basic auth.
> Authenticate once, generate a token, and use the token for auth from that point.
If that token is passed back and forth in the http url, it ends up at places where it's easy to find/intercept.
You can use a gazillion bcrypt rounds to store the password: they still send me a link to a page, including their auth-token.
Sorry I think you already explained this, I'm just swimming. I probably should have basic CS understanding to be in this conversation ....
There is no reason to hash a secure cookie, because they aren't used across several machines. The compromise of a stored session key implies that every system where that cookie is valuable is also compromised.
This also has the benefit of defending against timing attacks, for example, if you don't completely trust your code or the compiler / interpreter to not optimize your "constant time" comparison.
> In most cases that's a low-cost operation
But resetting API tokens is not necessarily low cost, as it's literally pulling the plug on all your persistent connections, and requiring manual intervention to bring them back online. There are many cases where you can't simply pull the plug on all your clients.
Also, as a byproduct it enforces all sorts of best practices, like ensuring the token is just a token, and doesn't have data piggybacking on it.
If the backend database is compromised/leaked whatever, the attacker sees the result of the Hash/HMAC, but cannot generate a preimage, so your tokens stored by clients do not need to be reset. Otherwise, a leak of the backend token database grants full access to the attacker.
This is particularly useful for long-lived tokens, like API tokens, but can also be used for short-lived sessions alike.
This is not currently considered a must-have / expected best practice, but I believe it will evolve into one, because it's hard to get wrong, and it protects against a reasonably common attack vector. Guaranteed timing attack resistance is a bonus add-on which is valuable in its own right.
There is no point to hashing cookies and session keys. Don't bother. It's not just not a best practice. It's something a smart auditor would file a sev:info bug against.
What I like most about 'token = sha256(token)' is that it expresses a significant part of the design in a single, obvious line of code: we don't store tokens, we can't provide tokens after the fact, we can't display tokens on our god-mode UI, if you lose your token a new one must be regenerated because we simply don't have it anymore, the token must not encode data beyond being a unique random value, we cannot be socially engineered into sharing your token with an attacker, the token will never be disclosed through a timing side-channel, insider threats cannot access your token, etc.
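For what it's worth, the whole scheme is small enough to sketch; this is a hedged illustration with made-up names, using only the Python stdlib:

```python
import hashlib
import secrets

# Maps sha256(token) -> user id; raw tokens are never persisted.
TOKENS = {}

def issue_token(user_id: str) -> str:
    token = secrets.token_urlsafe(32)  # shown to the client exactly once
    TOKENS[hashlib.sha256(token.encode()).hexdigest()] = user_id
    return token

def authenticate(token: str):
    # Hashing before lookup means a stolen TOKENS table contains nothing
    # an attacker can replay, and the secret never drives a byte-by-byte
    # comparison against stored data (blunting timing side-channels).
    return TOKENS.get(hashlib.sha256(token.encode()).hexdigest())
```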
Simply doing a hash of the token won't get you the desired "noise" in the system. That is, you need to do something else besides what you have said here. For example, writing a new token, whether produced by an attacker or a legitimate user, will produce the same "noise", no? It's unclear how, when the attacker owns your database and any triggers, you are going to tell if it is an attacker.
Similarly, the single obvious elegant line of code is not going to deter an attacker. Oh, I imagine that there are attackers that enjoy the beauty of code in a hacked system and put it on pastebin, possibly slowing them down.
And what are the chances if the attacker is that deep into your system, that they aren't reading the tokens before they are hashed?
The downside that I see, and I would flag this scheme, is that it doesn't provide the protection that the developer is led to believe.
Nothing, literally, nothing, can provide protection against the attack vector you're saying this does not help against. Sure, and of course I agree. But obviously we don't give up and do nothing because sometimes exploits grant root. We add layers of defense and expect that some will hold, and some will not. Some will act to contain a breach across certain vectors, and some will be run over. Some will simply slow an attacker down, or prevent them from walking away with 100% of your data, and only get a percentage based on the time they were able to maintain the compromise.
I don't know what you mean that the line is "not going to deter an attacker". Surely you understand the difference between a short-lived attack which can only target persisted data, and a persistent attack with full memory access, and surely your threat model allows for one without the other in many cases? So green boxes in those cases for this technique, and N/A for the other boxes. What am I missing?
There are attacks it does not withstand, the same attacks that make password hashing useless by-the-way, and there are attacks that it certainly does withstand. There are also several attack vectors which it eliminates completely, and various follow-on benefits which I listed as well.
On the whole it is a large positive in a simple obvious package with no possible downside (except perhaps auditors flagging it as a bug). I never said it was a magic bullet which protects you from full persistent system compromise. I understand exactly what protections it provides and beneficial policy restrictions it imposes.
I find the resistance to such a simple defense-in-depth perplexing. It not being perfect is not a reason not to do it. Neither is password hashing, but we still do that. I really am shocked that token hashing would be at all controversial. I do it, and I think everyone else should too.
"It doesn't provide the protection that the developer is led to believe." Well this is much more true of bcrypt than it is of token hashing! What percentage of passwords does bcrypt protect, and for how many minutes after the database is popped? It's impossible to answer. What percentage of tokens does token hashing protect? If you can answer a) describe the access level granted by the compromise, and b) describe the time when the compromise began, then I can definitively answer that question.
The answer cannot be, assume the attacker had full root access to all systems. Because we are under attack every single day, and some of those attacks succeed in some limited measure. There are laws on the books today, and new ones coming, which require rapid disclosure of the actual extent of data breaches, so this isn't academic.
Of course the exact extent of a compromise is not always known, but likewise there are cases where the extent of a compromise is precisely known, and in a subset of those cases, a hashed token prevents an attacker from impersonating your clients.
I mean, sha256(token) is child's play; to be arguing about it is just odd. I'd like to see homomorphic encryption stop even a memory-sniffing attacker from ever intercepting my tokens. Let's get serious! What, you'd rather just trust TLS? :-/
To the larger point, I do agree that some of this starts to look like a bit of security theater. If a user picks a laughably weak password and the server is completely compromised, it doesn't much matter what password storage scheme you used; on the other hand, if the user picks a great password, especially one he never reuses, the consequences of losing that hash are limited.
And using scrypt with a decent work factor isn't 100 times harder, it's tens or hundreds of thousands of times harder.
Additionally, I think one of the benefits of scrypt is that it is specifically designed to deter GPU or FPGA attacks, because it is memory-intensive instead of just CPU-intensive.
Hardware is cheaper than trust, or litigation.
(And there's always rate-limiting)
In the time it takes for your users to give up on the hourglass, assume their login request is hung, and start double-clicking submit, you have now DDoS'd yourself. That's about 750ms I think...
Also, try this sometime -- clear your cookies and try to login to Google, or Twitter, or Facebook. Time how long it takes before you see the first byte of a session cookie. Hint: Not Long Enough.
Password hashing is embarrassingly parallel. Botnets cost the attackers nothing, letting them hash your password on 100,000 CPUs all at once after they've decided to target you. Moore's law? Also not helping.
We need something better. I'm working on just that. I'm sure it will have more than its share of naysayers, but I'm giving it my best.
I can probably count on one hand the number of times I've seen an application use bcrypt or scrypt and define their work factors. The common case seems to be sticking to the defaults that their chosen library ships with. While these may be somewhat less than ideal, it's universally true that these schemes are stronger than salted passwords fed to a fast cryptographic hash.
Perhaps this is the real issue: lack of sufficient budget for CPUs that can reliably get the job done is a reflection on the priority organizations place on information security.
Comparatively, if they're hit with 100 req/s for normal front-end data, caches can easily handle it. So really, adding these kinds of login checks to any publicly available login page ends up creating an on-demand DDoS target.
From the password-cracking side, the asymmetry comes from the fact that an organization has to have X amount of server resources to service Y number of requests, but an attacker can utilize the same X amount of server resources to service just one thing: breaking the exact password they are looking for. This is further exacerbated by the fact that, for an organization, those X server resources generally need to do more than just service logins (like actually serve content or API data).
I specifically set a request number that would be below most thresholds.
That said, a one-time request from 100 hosts would still use 100 CPU-seconds of work. Other than preemptively blocking hosts (such as all of AWS), there is no way that a "per-ip attempt throttle" is going to catch a single request from 100 different hosts.
If you had a server maxed out with 8 Intel E7-8870V2 CPUs (15 cores x 8 CPUs = 120; $32k in CPUs alone), set for a normal "1 second" work factor, then someone just DoS'd you for most of a second. On a more reasonable 8-core server, that DoS would last 12.5 seconds (actual CPU time, not accounting for the requests stacking on top of each other). And on a dual-core system that's almost 2 minutes.
If there was any other part of a website that could be DOS'd for so long with so few requests, then most people would also suggest the application had "far more gaping holes in it." Moxie's point above is that the advice presented is to just put bcrypt/scrypt/PBKDF2 in place, and no advice is given at all on how to deal with these issues that come up, or even that they exist at all, and thus systems end up being misconfigured or work factors relaxed to a point where only a false sense of security is gained.
No legitimate user will be logging in once per second. If you're specifically throttling logins and don't set it higher, you're so incompetent you shouldn't be writing production code in the first place.
> That said, a one-time request from 100 hosts would still use 100 CPU-seconds of work. Other than preemptively blocking hosts (such as all of AWS), there is no way that a "per-ip attempt throttle" is going to catch a single request from 100 different hosts.
So you've successfully DDoS'd an application for the few seconds that it takes a couple dozen cores to chew through 100 CPU seconds. Um, congratulations? I don't think there's a reward for "most pathetic DDoS attempt in history", but it should go to this.
> And on a dual core system that's almost 2 minutes.
You're very confused. When we say "1 CPU second", we're speaking of a single CPU core. 100 CPU seconds on a dual-core system takes ~50 real seconds. Your hypothetical 120-core server would be theoretically capable of processing nearly 120 logins per second at a 1-second work factor. bcrypt et al. are not multi-threaded.
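Spelling out that arithmetic:

```python
# 100 simultaneous login requests at a 1-second work factor = 100 CPU-seconds.
# Each bcrypt hash runs on one core, but independent logins spread across cores.
cpu_seconds = 100
for cores in (2, 8, 120):
    wall_time = cpu_seconds / cores  # 50.0, 12.5, ~0.83 seconds respectively
    print(f"{cores} cores -> ~{wall_time:.2f}s of wall time")
```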
Which have a wide array of other mechanisms by which to DDoS your application. If that level of force is being directed at you, you need professional DDoS mitigation assistance. The CPU time cost of your login mechanism is immaterial.
> There are proxies and NATs with hundreds or thousands of users behind them.
You do not need to throttle an IP if it is not the source of an attack. But it remains inevitable that a serious DDoS will sometimes break legitimate users, even with intelligent mitigation strategies. Welcome to the real world, it ain't pretty!
Why use any of those other mechanisms, which might require a few thousand hosts, when a method exists where a hundred hosts can do just as much damage by getting the server to punch itself in the face?
A few downsides that immediately come to mind:
- The client now has to generate the salt. So you'll want the client to be using a modern browser which supports window.crypto.getRandomValues(); (so IE 11 or newer)
- Low-power devices (either electrically or performance-wise) might struggle with the work factor, and that might degrade the user experience (the browser might become unresponsive)
- Changing the crypto function is now quite annoying/work intensive. Effectively the client will now need to generate the old hash and the new hash, and send both. You'll need to check the old, check the new, and store the new if either are correct.
- There are some cross-site-scripting and forged request concerns. But those exist for all login pages frankly.
- You'll still need to do SOME work on the server to convert the raw client hash into a storable version, so that (in case of compromise) an attacker cannot just re-send what is in the database to log in (the stored hash would essentially be a plaintext password for all intents and purposes). You could reduce the work factor, but not remove it entirely.
Essentially, trusting a browser is a bad bet because too many browsers suck. I'd prefer to put my trust in a more known quantity, the web server, even if it costs me a few cents in usage, than do the extra dev work, extra testing, and extra tech support required to get browsers to do it.
Unfortunately there are still a lot of IE 6-8, Android 2.xx, and so on users who drag everyone down.
Of course, that doesn't really undermine your conclusion in any way - as you say, enough browsers "suck" that doing this client side may well negatively impact quite a few users.
The KDF (bcrypt, scrypt) certainly needs a suitable CSPRNG though, which you aren't able to get across all browsers/clients right now.
If you generate the salt on the server and pass it to the client, then the client only needs to be able to implement the deterministic part of bcrypt.
The Original OpenBSD implementation http://ftp3.usa.openbsd.org/pub/OpenBSD/src/lib/libc/crypt/b... only generates randomness in bcrypt_initsalt
The mindrot Java implementation only uses [Secure]Random in gensalt.
This fact should also be an intuitive deduction: after you've performed and stored the initial bcrypt derivation (with a fresh salt), all future executions need to be deterministic, because they need to produce the exact same output (for comparison purposes). bcrypt wouldn't be much use to you if it generated new hash output each time you ran it (with the same password+salt).
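Illustrating that property with hashlib.scrypt (stdlib, used here only because bcrypt would need a third-party package; the determinism argument is the same): randomness enters only when the salt is generated, and the derivation itself is a pure function of password + salt.

```python
import hashlib
import os

salt = os.urandom(16)  # the only random step, done once at enrollment

def kdf(password: bytes) -> bytes:
    # Deterministic given (password, salt): required for comparisons to work.
    return hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1)

assert kdf(b"hunter2") == kdf(b"hunter2")  # same inputs, same output
assert kdf(b"hunter2") != kdf(b"letmein")  # different password, different output
```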
Moxie seemed to hint at this problem by mentioning local bitcoin wallets--client-side hashing only makes sense if you're not sending the hash anywhere.
- The user generates a password. That password can be assumed to have relatively low randomness (entropy). It's normally going to be 6-12 ASCII characters, which isn't a particularly large input space. Expecting users to remember something substantially larger is not realistic.
- Because the input space is so small, good password hashing solutions need to have a high work factor, so that is it computationally infeasible to do a dictionary attack on the database.
- If the input space was larger [but everything else stayed the same] then we could reduce the work factor, because the size of the dictionary would increase, automatically making dictionary attacks harder to do.
- That "everything else stayed the same" is important. We still need our hash to be one-way, but it means (in overly simple terms) if we increase our input space by 1000 times, then we can decrease our work factor by the same magnitude.
- If you can increase the input space enough, then eventually you get to the point where a standard hash like SHA2 has a high enough "work factor" (CPU/GPU load)
And the missing piece is that it is argued (and is potentially true, but I haven't done the analysis, so don't take this as a recommendation or endorsement) that scrypt (used appropriately, with random salting) is an effective means of converting that small input space into a larger one.
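The tradeoff in those bullet points can be made concrete: the attacker's total cost is roughly (size of the candidate space) x (work per guess), so growing the space 1000x lets you shrink the work factor 1000x at constant attack cost. A toy calculation (numbers are illustrative, not a recommendation for any particular work factor):

```python
def attack_cost(candidates: int, seconds_per_guess: float) -> float:
    """Rough total attacker cost for an exhaustive dictionary attack."""
    return candidates * seconds_per_guess

# 100M weak passwords at a 1-second work factor...
base = attack_cost(100_000_000, 1.0)
# ...costs the attacker the same as 1000x the input space at 1/1000 the work.
bigger_space = attack_cost(100_000_000 * 1000, 0.001)
assert base == bigger_space
```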
To prevent this, you still need some transformation on the server between the login form and the database, which means you're back where you started.
Whatever work-factor downtuning you need to do to conserve server resources may well be dwarfed by the downtuning you need to do to get bcrypt to run in a reasonable amount of time on a mid-range smartphone.
It doesn't seem like it ought to be true. Running bcrypt in Ruby with default settings takes small fractions of a second on a few-years-old Macbook Pro. Even if the smartphone is 100x slower, it should be within the realm of reason to run a process that might take 1-2s once every thirty minutes or so.
Now, I'll buy that if your target device is not a mid-range smartphone, but "the very bottom of the smartphone market," bcrypt might be too expensive.
I think salting offers pretty clear benefits against other attacks if you have more than one user.
Edit: The point I'm trying to make here is that it is N_USERS times better to have a unique salt per user, regardless of the hash. One should use bcrypt/scrypt/pbkdf2 with unique per-user salts.
If you knew that a hash was used many times over in a system, then you'd try cracking that password first to get access to the most accounts.
With salting you don't get to prioritize, or compare to previous information trivially.
Your argument is about protecting trivial passwords. It's quite common for people to independently come up with the same crappy passwords - the Adobe "crossword puzzle" leak was a particularly interesting example of that.
My argument is about protecting moderately good passwords. If you use, for example, PBKDF2 with a fixed salt and have a million users, an attacker can try close to a million times as many passwords vs random salts.
You can still prioritize cracking with salted passwords in many cases. Say you had a dump of hashes from Hacker News, and they were bcrypted with random salts. You'd go after the admin accounts first, then the well-known accounts with a lot of influence, then everybody else.
Of course salting is better than no salting, but it's orders of magnitude less important than choosing a good hashing mechanism. Luckily, the good hashing mechanisms do the salting for you, so fixating on salting really just serves as a proxy for letting us know that you don't understand the attack vector.
Yes, having a good hash in the first place is more important, but salting is right up there.
The only thing that matters is whether or not your password database ends up on a message board somewhere, not how you feel about hashing.
Since DDG does it quick, I'll use MD5 as an example, but don't ever use MD5 as your hash!
An attacker already knows a user's plaintext password is "pass123" and no salting has occurred. He can search for "32250170a0dca92d53ec9624f336ca24" in the database and find 5 other users that have the same hash. He now knows that those 5 people have the same password.
Even if you're using a "hard" hashing algorithm, you're significantly reducing the amount of computations an attacker would have to go through by forgoing salts. Especially considering that salting has very, very little overhead.
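To illustrate the attack described above, here's a sketch in Python (usernames and passwords are made up; MD5 is used only because the comment does - don't use it):

```python
import hashlib

# Hypothetical unsalted user table: identical passwords yield identical
# hashes, so knowing one user's plaintext exposes everyone who shares it.
users = {
    "alice": hashlib.md5(b"pass123").hexdigest(),
    "bob":   hashlib.md5(b"hunter2").hexdigest(),
    "carol": hashlib.md5(b"pass123").hexdigest(),
}

# The attacker already knows alice's password is "pass123"; one hash
# computation reveals every other account sharing it.
known_hash = hashlib.md5(b"pass123").hexdigest()
same_password = [name for name, h in users.items() if h == known_hash]
print(same_password)
```

With per-user salts, each of those rows would store a different digest and the single lookup above would no longer work.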
How can it "salt for you" without you providing the salt?
This question is operating on the presumption of a unique salt per user.
bcrypt for instance generates a random salt on the input and stores it in the output.
It's a different form, but it's pre-calculation and lookup nevertheless.
Are you somehow interpreting the stolen database as the rainbow table?
Are you saying that, when the attacker calculates the hash, it's analogous to looking that hash up in a rainbow table?
My understanding was that ryan-c was describing a database that has hash(global_salt + password). The concept of the global salt was introduced specifically to defeat rainbow tables that attacked databases of hash(password).
ryan-c was describing how, if you have a database of unsalted (or globally salted) hashes, the attacker can check every entry in that database against each attempt, multiplying his efficacy by the number of people in that database.
To me, these are different threat vectors, rainbow-tables attacking hash(password), ryan-c describes attacking hash(global_salt+password), and to mitigate ryan-c's attack you must do hash(unique_user_salt+password)
Can you explain to me how ryan-c's attack is analogous to a rainbow table?
> Say you have a password database with 10,000 users in it. You want to see if any of them used "Passw0rd$". Without salt, you compute the hash once and check it against all the users (perhaps with a bloom filter).
That is basically the same as a rainbow table attack. A conventional rainbow table attack looks like this:
    for user in users:
        for hash in rainbowTable:
            if hash == user.hash: ...
The attack that ryan-c is describing looks like this:
    for hash in rainbowTable:
        for user in users:
            if hash == user.hash: ...
They are still fundamentally the same concept.
This is dangerously incomplete reasoning. Salts' real value is that if multiple password hashes are leaked simultaneously, then each must be cracked individually. In the Adobe leak, password hints were included, which allowed for very easy guessing of passwords, and because there were no salts, even if you had no hint, you were cracked if someone else had the same password and an obvious hint.
If a database is popped and people know John123's password, and his hash is the same as Worklogin456's hash, then they know my password.
username | sha256 hash
john123 | ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f
joe456 | ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f
username | salt | sha256 hash
john123 | ponies | 98d4c01710b2b775e41b0bf4984f0cec7621ae1dfc4304047b196af46ed589b2
joe456 | beards | 0f7c74279db5daa1f4de41cc7f6acc94651da754c41675dcde8dcf151d14f5bf
their passwords are all `password123`
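The tables above can be reproduced with a few lines of Python (the salted construction assumes simple salt-then-password concatenation, which is only one of several possible layouts):

```python
import hashlib

def sha256_hex(s: str) -> str:
    return hashlib.sha256(s.encode()).hexdigest()

# Unsalted: both users' rows collide (this is the ef92b778... hash above).
unsalted = {user: sha256_hex("password123") for user in ("john123", "joe456")}
assert unsalted["john123"] == unsalted["joe456"]

# Per-user salts: the same password now stores as two unrelated digests.
salts = {"john123": "ponies", "joe456": "beards"}
salted = {user: sha256_hex(salts[user] + "password123") for user in salts}
assert salted["john123"] != salted["joe456"]
```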
Some subset of your users will inevitably use the same terrible password as another user. Or one user will create multiple accounts using the same strong password. These passwords should appear unique in storage.
You still need salt, and that salt must be unique for each password. That's the peril of PBKDF2 password storage: I've seen applications completely botch salting by using one for all users. Bcrypt and scrypt avoid this common error by salting internally.
At least that's my understanding... I am not a security expert! :)
In the "user with multiple accounts but one strong password" scenario, for instance, grouping by password would highlight which users are inclined to use the same strong password with multiple accounts, perhaps on other web sites. Multiple accounts would give an adversary more identifiers to use when cross-referencing other leaked databases or the web.
For instance in the first case where hash = md5('deliciously-salty-' + password)
I'm not sure how easy it is for an attacker to reverse the hash if they don't know the salt. (I understand you need to assume the attacker has the salt, but just posing a constrained hypothetical here)
Attacking md5 is not really that hard, nowadays - the article mentions 180 billion tries/second.
Yes, it requires a rig with 25 GPUs, but this is almost the perfect use for the "cloud" - you can rent those for a few bucks just for a specific attack. There are even dedicated services nowadays, with no programming knowledge required - you just submit the hashes, and out come the found passwords.
If all you have is a hash and know the password, you still don't know if it's:

    md5sum('salt:' + password) or
    md5sum(password + '-salt') or
    md5sum('salt+' + password + '=salt')
I suppose you'd need to brute force a large key space and look for any results which contain your known password. Does that sound right?
So, the attacker knows which of those formats in your comment is being used. With no salts, the attacker can just start running md5 against every character string, and look for matches in your database. Adding salts means he has to do that for every character string appended to the specific salt, so he has way more work to do. But, md5 is still really fast, so that extra work is still feasible.
Adding scrypt/bcrypt/PBKDF2 means he has to use a slow hash function multiple times for each possible password he's trying. That means it takes significantly longer for the attacker to try a potential match, which is the goal.
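A rough way to see the cost gap, using only the stdlib (the iteration count is illustrative, and the wall-clock ratio will vary by machine):

```python
import hashlib
import os
import time

password = b"correct horse battery staple"
salt = os.urandom(16)

# Fast hash: a single MD5 computation - exactly what an attacker wants.
t0 = time.perf_counter()
hashlib.md5(salt + password).digest()
fast = time.perf_counter() - t0

# Deliberately slow: PBKDF2 with a high iteration count (value illustrative).
t0 = time.perf_counter()
hashlib.pbkdf2_hmac("sha256", password, salt, 200_000)
slow = time.perf_counter() - t0

print(f"PBKDF2 took roughly {slow / fast:.0f}x longer per guess")
```

The attacker pays that same multiplier on every single guess, which is the whole point of slow hashes.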
Think about it: adding a known global salt to each password in the same manner before hashing is effectively the same (in the long run) as not salting. Once the attacker figures out the pattern you used, he can proceed to crack specific passwords, then search for the resultant hashes in the database.
If the assumption is that the random number is known, per both guidelines that only the user-given secret can be assumed secret and that the article says it could conceivably be stored plaintext, isn't it effectively equivalent to just another username?
You'd just need to generate (0 or more chars) + password + (0 or more chars).
Consider an alternative: randomly generate salts with a constant password. It's the same amount of information, and it is obvious that it does no good to ask for a password if there is only one password that everyone shares. All you need is the dynamic part, and you have access. The dynamic part is the security and the data, and everything else is infrastructure that can be known simply by probing.
This is just how information processing works. You have a static 'language' and a dynamic 'meaning', and if you don't know the language, you can't possibly get the meaning -- but this means you can't even use the site at all (even with a password.) So obviously the 'language' has to be public so it can be known by the people who wish to communicate with the site legitimately.
This is absolute drivel.
Salts prevent checking an attempt against all the hashes in parallel.
That is to say if the attacker has a dump of 1 million salted hashes, they have 1 million times the work to do compared to 1 million unsalted hashes.
Just from the added password length alone it has to be "way more" secure, right? Would love some professional crypto opinion on this.
The salt is generally known to the attacker. The point of the salt is that it’s unique for each password hash, making it impossible to run crack attempts on every password at the same time; for a password hash to be checked, the salt has to be retrieved. This means they’re usually stored together and therefore also compromised together; keeping them apart is rarely useful (kind of like peppers, which is what you might be thinking of; they’re per-application rather than per-hash, and their purpose is to be unknown to the attacker, which is equivalent to encrypting things with a per-application key).
If N = 1000000 and it's just SHA("somesalt65634734" + password) then the attacker is just about guaranteed to be rolling in tons of cracked passwords in no time. Whereas with bcrypt or a unique salt per password you don't get that massive speedup by a factor of N.
Lets assume the hash function being used is SHA2-256.
The input block size there is 512 bits (64 bytes).
So if adding the salt pushes the total length of the hash input past 512 bits then it doubles the runtime.
I do not see any particular advantage there over simply running the hash function twice.
The term people are using for a secret like this is "pepper".
H(pepper | salt | password) type of thing.
My term is "vinegar", which is IMHO better!
Edit: That said, they do still provide value, by making password collisions less obvious.
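As a sketch of the H(pepper | salt | password) idea - here done as an HMAC with a server-side key feeding a slow KDF, which avoids hand-rolled concatenation; the key handling and parameters are illustrative, not a vetted design:

```python
import hashlib
import hmac
import os

# Server-side secret, kept outside the database (e.g. config or HSM).
PEPPER = os.urandom(32)

def hash_password(password: bytes) -> tuple[bytes, bytes]:
    salt = os.urandom(16)  # per-user salt, stored alongside the digest
    # HMAC the password with the pepper first, then apply a slow KDF.
    peppered = hmac.new(PEPPER, password, hashlib.sha256).digest()
    digest = hashlib.pbkdf2_hmac("sha256", peppered, salt, 100_000)
    return salt, digest

def verify(password: bytes, salt: bytes, digest: bytes) -> bool:
    peppered = hmac.new(PEPPER, password, hashlib.sha256).digest()
    candidate = hashlib.pbkdf2_hmac("sha256", peppered, salt, 100_000)
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password(b"s3cret")
assert verify(b"s3cret", salt, digest)
assert not verify(b"wrong", salt, digest)
```

A stolen database alone is then useless without also stealing the pepper from the application server.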
Iterative hashing with per-user salts has been a known pattern. Practical Cryptography covered it in the first edition. The relevant parts should be derivable for most engineers. There's no reason there should be any confusion or crazy blog posts.
Imagine a service which just stores hashes of every password that it can find from everywhere. Any site that wants can check whether a given password is in use..somewhere..and reject it if it is. Any site that wants can add more passwords to the list.
Internally the service would work from a Bloom filter designed for a fixed false positive rate. (You can actually layer Bloom filters in front of Bloom filters with parameters to make this possible.) So even if someone compromises that site, there is no direct record of the actual hashes for passwords.
Basically let sites say, "You have to choose a unique password, and do not reuse a password that anyone has used elsewhere!"
If the service and your site were both compromised, then it would be possible to match up incoming hashed passwords to users by timestamp. But even so, the fact that no passwords were reused will hopefully make cracking the hashes harder.
So..probably a bad idea. :-(
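Bad idea or not, the core data structure is easy to sketch. A minimal Bloom filter in Python (sizes, hash counts, and the double-hashing-by-prefix trick are all illustrative, not tuned for a real service):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item: bytes):
        # Derive k independent positions by prefixing a counter (illustrative).
        for i in range(self.num_hashes):
            h = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(h[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

seen = BloomFilter()
seen.add(b"password123")
assert b"password123" in seen  # reused passwords are always caught...
# ...while unseen passwords are rejected only with small, tunable probability.
```

Because lookups only touch bit positions derived from a hash, the service never needs to store the submitted hashes themselves.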
Also, there's a large industry trend of, "let's 'pepper' our hashes too!" whereby they have a sitewide salt. This is a mistake. If you want to add a site-specific layer of security to your passwords, encrypting them makes far better sense than adding to the salt.
For example, I'm writing an app that requires a password setup in Python so I was investigating this issue:
1. Passlib - https://pythonhosted.org/passlib/ - "implementations of over 30 password hashing algorithms" - What? I'm not qualified to judge what's secure so I don't even care, I just want something secure!
2. python bcrypt - https://pypi.python.org/pypi/bcrypt/1.0.1 - OK standard bcrypt, supposedly that's secure - "Author: Donald Stufft" - who is Donald Stufft? Is the code good? I don't know C and I'm not qualified to judge!
3. python scrypt - https://pypi.python.org/pypi/scrypt/ - OK it's Colin Percival's code, it's probably good, "Author: Magnus Hallin" - did Magnus Hallin screw it up though? "Burstaholic on Bitbucket provided the necessary changes to make the library build on Windows." Eh. Did "Burstaholic" ruin the crypto? Who knows.
Also, in my case I needed something cross-platform, primarily for Windows. bcrypt worked out of the box. The scrypt library required building on Windows with OpenSSL development headers or whatever and all that - I just said "pass".
NaCl is fantastic and simple to use; it can't get much easier than this: https://libnacl.readthedocs.org/en/latest/topics/public.html
Something similar should exist for passwords - I just want to call compare(password, storedhash) or whatever and be done with it, I'm not qualified to judge the crypto. And it should be cross-platform.
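Something in that spirit can be sketched with nothing but the stdlib - a hypothetical store()/compare() pair around hashlib.scrypt, with the salt embedded in the stored string (names and cost parameters are illustrative, not a vetted library):

```python
import hashlib
import hmac
import os

def store(password: str) -> str:
    """Hash a password; returns one opaque string holding salt + digest."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt.hex() + "$" + digest.hex()

def compare(password: str, stored: str) -> bool:
    """Re-derive with the embedded salt and compare in constant time."""
    salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.scrypt(password.encode(),
                               salt=bytes.fromhex(salt_hex), n=2**14, r=8, p=1)
    return hmac.compare_digest(candidate.hex(), digest_hex)

record = store("hunter2")
assert compare("hunter2", record)
assert not compare("hunter3", record)
```

This is roughly what bcrypt libraries do internally: the caller never handles the salt, so there's no salt to botch.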
Correct me if this is mentioned in the article, but based on several mentions of how salts were effectively useless, I think this was overlooked.
He's simply saying that a salt isn't enough to keep your passwords secure - you should also be using a slow hashing / memory intensive hashing function.
ps. for those that use bcrypt and want to ensure a constant user experience on their server see https://www.npmjs.com/package/bcrypt-speed
Besides, it makes rotation easier, and does away with having to remember multiple passwords, run a password manager, etc.
Oh wait ;)
Google and others are actually attempting to make this work with U2F:
https://support.google.com/accounts/answer/6103523?hl=en (yes it works with gmail. Right. Now.)
While it recommends using bcrypt, if you augment that with "or scrypt or PBKDF2" the same principles apply.
The user group really benefiting from strong password hashing schemes is the one between the two mentioned extremes. So if you really want to protect all your users and use simple password authentication, you have to enforce strong password policies, which is itself not trivial and will definitely badly annoy some users.
Sure. This is something I touched on very briefly near the end of the post. Password re-use is a huge problem and could easily be a blog post of its own.
> On the other hand strong passwords will withstand an attack even if they are hashed with plain MD5 or SHA1 - at least right now, there are already some serious collision attacks.
For some value of 'strong', sure. If your password is long enough, random enough, and drawn from a broad character set, then yes, an md5() of it may well be beyond the reach of most attackers. But that's not the sort of password most people, even security-conscious people, choose. A preimage attack against MD5 would quickly change this too, and that's a risky gamble.
> The user group really benefiting from strong password hashing schemes is the one between the two mentioned extremes.
The best we can do is make attacker costs a function of the password that the user has chosen. If someone makes their password 'password' there's no degree of magic that's going to help. Mostly, the benefit comes if/when disclosure occurs and there's an active incident response effort.
> So if you really want to protect all your users and use simple password authentication you have to enforce strong password policies which is itself not trivial and will definitely badly annoy some users.
It's not a zero sum game. When a breach occurs, the rough points in time are something like:
1) Attacker compromises digests (let's just assume all are obtained in an instant, rather than exfiltrated over time)
2) Attacker recovers 1 password
3) Attacker recovers n passwords
4) Attacker recovers all passwords
Our goal isn't to eliminate these steps except hopefully the extreme of 4. Rather, it's to increase the time between 2 and 4, and make sure 3 is closer to 2 than 4.
On the other side of the coin, there's a few points in time:
a) Victim org. realizes digests have been compromised.
b) Some credentials valid at 1 are made invalid.
c) All credentials valid at 1 are made invalid.
If you accept my previous point that at some extreme there's nothing that can be done to prevent brute force, then you should also see that storing passwords in this fashion isn't meant to be a perfect defense. By minimizing the number of compromised accounts at a given point in time, we can maximize the benefits of our response. We can notify users and force resets, invalidating some set of credentials before they're recovered. We want b and c to be closer in time to 2 than 4, and minimize the value of n when we get there.
You don't need SQL injection if you can directly read the database file or operate as the web server user; conversely, if you're resorting to SQL injection, you don't have those privileges.
If you can HMAC or encrypt passwords (prior to hashing them) in the database with a key only on the web server, then that's an extra level of protection. You can't get that key with SQL Injection alone.
password_hash(base64_encode(hash_hmac("sha512", $password, $key, true)), PASSWORD_BCRYPT)
No we can't.
- Rolled my own password library (linked elsewhere in this thread)
- Rewritten an existing AES-CBC + HMAC-SHA-256 library to use a different underlying driver (mcrypt is basically a mummy at this point)
- Did both of these things in PHP, which a lot of people assume is inherently incompatible with security
However many people have reviewed your code, far more people have reviewed libraries like bcrypt!
When I said I wrote my own crypto code, I don't mean like something that belonged in a PHC entry.
var hash = new System.Security.Cryptography.Rfc2898DeriveBytes(password, salt, iterations);
Wound up having to implement my own version of PBKDF2 that lets you specify the HMAC algorithm.
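For comparison, Python's stdlib PBKDF2 does let you pick the underlying HMAC hash directly, so no reimplementation is needed there:

```python
import hashlib
import os

salt = os.urandom(16)
# hashlib.pbkdf2_hmac takes the hash name as its first argument, so switching
# from HMAC-SHA-1 to HMAC-SHA-512 is a one-string change.
dk_sha1   = hashlib.pbkdf2_hmac("sha1",   b"password", salt, 100_000)
dk_sha512 = hashlib.pbkdf2_hmac("sha512", b"password", salt, 100_000)
assert len(dk_sha1) == 20 and len(dk_sha512) == 64  # default dklen = hash size
```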
For anybody who has critical security needs, just use an HSM already. Sophos/Utimaco makes a nice one, and it is a nice foundation for password security.
Rolling your own login system, dealing with hash functions and checking passwords yourself is far too low level and dangerous for most apps (as this thread demonstrates).
Or should we still include salts in addition to scrypt?
If yes, one global salt, per-user salts, or both?