How To Safely Store A Password (codahale.com)
298 points by IgorPartola on Dec 14, 2010 | 210 comments



bcrypt and scrypt are great libraries to use to solve this problem. However, the poor man's approach is as follows, with HASH being your favorite hash function:

    h = HASH.new()
    h.update(password)
    h.update(salt)
    for x in xrange(X):
        h.update(h.digest())
    return h.digest()
This approach "strengthens" the hash by forcing you to calculate it over and over again. You should set X to the number of rounds you want to conduct, i.e. how slowly you want your server to respond to an individual request. It is always a trade-off between server slowness for individual requests and "security" of the hash function. The goal is to make dictionary attacks take longer than is feasible for your attackers to conduct.

[Note: you should absolutely have a different salt for each password with this approach.]
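For concreteness, here is a runnable sketch of the scheme above using Python's hashlib (SHA-256, the 16-byte salt, and the round count are my illustrative choices, not anything prescribed):

```python
import hashlib
import os

def stretch(password: bytes, salt: bytes, rounds: int) -> bytes:
    # Hash password and salt once, then feed the digest back into
    # the hash `rounds` times to slow the computation down.
    h = hashlib.sha256()
    h.update(password)
    h.update(salt)
    for _ in range(rounds):
        h.update(h.digest())
    return h.digest()

salt = os.urandom(16)  # a fresh random salt for each password
digest = stretch(b"hunter2", salt, 100_000)
```

Note the salt is generated per password, per the note above, and must be stored alongside the digest so the hash can be recomputed at login.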


For what it's worth: this is essentially PBKDF2.


Actually, it's essentially PBKDF1, but that's close enough for non-cryptographers.


Shit, you're right. (PBKDF2 generates arbitrary-length output, which is what you want from a key derivation function, but maybe not something you care about for a password hash).

Here's a version that's much easier to read than the spec:

https://github.com/emerose/pbkdf2-ruby/blob/master/lib/pbkdf...

I should stop saying PBKDF2 and just go back to saying PBKDF.
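For what it's worth, modern Python ships PBKDF2 in the standard library as hashlib.pbkdf2_hmac, so there's no need to hand-roll the loop at all (the salt, iteration count, and dklen below are illustrative):

```python
import hashlib
import os

salt = os.urandom(16)
# PBKDF2-HMAC-SHA256. The dklen parameter gives arbitrary-length
# output, which is the property distinguishing PBKDF2 from PBKDF1.
dk = hashlib.pbkdf2_hmac("sha256", b"correct horse", salt, 100_000, dklen=32)
```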


PBKDF2 also protects against entropy loss from repeated iteration, although that doesn't really matter unless you're using an unreasonably short hash.


Why would you use this "poor man's approach" over bcrypt or scrypt? My understanding is that these two work on a very similar concept (work factor) and are free to use.


Some projects require FIPS 140-2 compliance. I've not been able to find that blowfish or bcrypt are certified. See http://csrc.nist.gov/publications/fips/fips140-2/fips1402ann...


If a randomly cobbled-together and unvetted system is compliant, but bcrypt isn't, that just goes to show how little FIPS 140-2 compliance actually means. (As if everybody didn't already know it's worthless.)


Nonetheless, some projects mandate the use of FIPS 140-2 hashing algorithms, and as far as I know bcrypt is not among them. So if you find yourself on such a project, bcrypt is not an option on the table.

I'd be happy to find out I'm wrong.


I can think of one reason: lack of available libraries on a particular platform.


This is why I mentioned it. It provides improved timing guarantees for your users without needing another library.


bcrypt is open source and ported to a range of operating systems. I cannot imagine getting it up and running on any system would be too difficult. It would probably be wiser to spend the time to get bcrypt (or another standard scheme) working rather than coming up with some custom scheme which probably hasn't had the same level of thought put into it.


Because building your own square-shaped wheel is more fun than taking a walk over to your friendly neighborhood wheel store and buying a round one?


Because we all know rolling your own in the crypto world instead of deferring to the experts never leads to catastrophe!


You assume those are the only two choices. When I asked the question, I was considering that in some environments these might be the only two choices.


Why not also add

  HASH.update(salt)
on each iteration? Then again, if things were this simple, someone would have already said "just do this", it would have been peer-reviewed, and it would have become widely used. Anyone know why this hasn't caught on yet in web frameworks like Django/Rails?


If you're talking about why you don't do hash.update(salt) on every round, well it turns out that H(salt || H(H(H(H(H(H(password))))))) is just as strong as H(salt || H(salt || H(salt || H(salt || H(salt || H(salt || H(salt || password))))))). So you don't add any security by doing that.

When working with cryptography, "why not also add" is really dangerous, because sometimes you get a less secure system by doing so. And other times you get no benefit from doing so.


Could you explain a bit more about this?


(Aside: I'm sure that tptacek could give a significantly better answer here than I can, but I'll give it a shot. (And then whenever he answers trust him more than me.))

Assume you have a perfect cryptographic hash function H(X). No matter how many bits N of X you change (0 < N <= len(X)), on average 50% of the bits of H(X) will change. So, let's consider the case of just H(salt || H(salt || password)) versus H(salt || H(password)).

Let A = H(salt||password), and let B = H(password). A and B are now, for all practical purposes, two different random integers. Each of which has the same entropy. This is because adding the salt should make no impact on the quality of these random numbers. It should now be fairly easy to see that there is no difference between H(salt||A) and H(salt||B), other than the fact that they produce different outputs.

This is all based on the assumption that the hash function is a perfect one -- however this assumption is reasonable for strong hash functions.
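The 50%-of-bits claim above is easy to check empirically; a quick sketch with SHA-256 (the inputs are arbitrary):

```python
import hashlib

def bit_diff(a: bytes, b: bytes) -> int:
    # Hamming distance between two equal-length byte strings.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

h1 = hashlib.sha256(b"salt" + b"password").digest()
h2 = hashlib.sha256(b"password").digest()
# Roughly half of the 256 output bits differ despite the small input change.
d = bit_diff(h1, h2)
```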


What value of X would be good for a web app? 10? 100? 1000?


1000 is the minimum value.


As I said, it depends on how slowly you want your server to respond to a given request. For instance, you could set X so high that it takes a minimum of 1 second to respond to a request; that is probably not necessary. The real answer is to measure the estimated minimum time to calculate one hash using the function you have chosen, then use that number as the basis to compute the minimum number of rounds that assures reasonable security for a single password. Reasonable security could mean that a full alphanumeric-only dictionary attack would take 1 year, or perhaps 1 month or 1 week. The time you choose for brute-forcing one password is the level of protection you are providing your users.

EDIT: also, what Thomas said. The standard says that 1000 should be the minimum value. I would use higher based on the level of protection you want to give your users. Note that different hash functions have different costs, so this will also impact the choice of X.
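One way to pick X in practice is to time the hash on your own hardware and scale to a target per-login delay; a rough sketch (the probe size and 1000-round floor are illustrative choices):

```python
import hashlib
import time

def calibrate_rounds(target_seconds: float, probe: int = 50_000) -> int:
    # Time `probe` rounds of iterated SHA-256 on this machine, then
    # scale linearly to hit the target per-login delay.
    h = hashlib.sha256(b"calibration")
    start = time.perf_counter()
    for _ in range(probe):
        h.update(h.digest())
    elapsed = time.perf_counter() - start
    return max(1000, int(probe * target_seconds / elapsed))

rounds = calibrate_rounds(0.2)  # aim for roughly 200 ms per login
```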


I wonder, is it easy to use bcrypt with a variable work factor per-password?

I'm thinking you could take your entropy analysis of the user's password and set it so that "weaker" passwords use a higher work factor. This analysis could be easily done before hashing every time the password is input, so an attacker wouldn't be able to single out weak passwords from the hash file.

Theoretically, you should be able to tailor the numbers so that cracking a weak password isn't any faster than cracking a strong one, right? Plus it will encourage your users to use a stronger password in the first place, since it'd make login slightly faster.

Any obvious problems with this plan?


It's not hard to do this (with Ruby bcrypt, it's 2 lines of code using the public interface), but I think you're overthinking it. Most users will use crappy passwords. Set the work factors uniformly high.


Definitely a good point.

I feel like it might still help you to avoid wasting time exhaustively hashing already-strong passwords. I mean, how high does that uniform work factor have to be?

The guy whose password is "password" or "qwerty12" is gonna get cracked no matter what, sure. But what about people whose passwords are a couple of dictionary words? If the work factor means each hash takes a second or two, even a slightly complex dictionary attack becomes fruitless (for most crackers...)

So the user can still log in, but it takes a second or two, and there's that little message, "To ensure the safety of your password, your login has been slowed down. If you use a more secure password, like <generate password>, you will log in much faster!"

And the young lady using "6ab$TRa?" isn't being punished for the fact that most users use crappy passwords, nor is your server.


You're overthinking both the security aspect of this and the performance aspect of it. User-imperceptible hash times are adequate to make most conceivable brute force attacks intractable.


That's true of passwords over a certain strength, for sure. And it always has been; "6ab$TRa?" has never been counting on the difficulty of hashing for security, and probably won't for some time.

But as computing power increases, that minimum strength is being pushed out. bcrypt lets us hold the line by keeping pace with computing power— couldn't it also give us the ability to push back?

How many users are using the 100,000 most common passwords? 1,000,000? As it stands today, anyone using one of those is instantly compromised the moment the hash file is accessed. With a truly variable work factor, you could theoretically ensure the safety of any password, regardless of strength.

It obviously gets silly toward the far end (make "password" take three months to hash?) And maybe it gets silly a lot sooner than I'm thinking. But surely it could be pushed back a little, yes? Make one of those 1,000,000 passwords intractable, and you're protecting thousands of users from attack.


"6ab$TRa?" has never been counting on the difficulty of hashing for security, and probably won't for some time.

False.


Elaborate. Obviously its security depends on the hashing scheme (if it's CRC32, you could find a collision pretty easily), but educate us -- is that all you meant?


To nitpick, the topic at hand is pre-image attacks, not collision attacks. Pre-image is where you know the hash and want the plaintext, collision is where you create two plaintexts with the same hash but don't care about the actual hash value. The former is recovering information, the latter is falsifying trust and almost always involves signatures.

Collision attacks don't apply to many situations but are much easier to execute, for example a MD5 pre-image attack requires approximately 2^128 steps but a collision attack requires only about 2^64 steps. This is why MD5 is totally unsuitable for collision resistance, and in fact has already been successfully exploited to fabricate a real-world CA certificate, but still puts up mild resistance to password cracking. Not that I'm recommending you use it or anything -- do what the nice gentleman says and just use bcrypt already!


Wrong. Collisions can be found in MD5 in 2^21 time due to an attack by Xie and Feng. 2^64 is a very respectable number and is not practical for people to do on their home machines. 2^21 is.


You are right, of course. I wrote that as 'ideal digest' instead of MD5 then rewrote it. Specific digests always lose a few bits in real life, or in MD5's case, most of the bits...


Plus you only need a collision to break into a system that uses hashed passwords, not a preimage.


Clarify? Collision attacks by definition do not feature an existing digest as input so they are not useful for breaking into a system secured with a digest.


Ah, I misunderstood. By "collision attack" you meant "find two plaintexts that hash to the same digest", I interpreted it as "find one plaintext that hashes to a specific digest", and "preimage attack" as "find the plaintext that was hashed to this digest".

Please disregard my comment above.


I guess he could mean that you could find a plaintext that had the same hash value through use of a collision ... but that's just finding a preimage.


That's all I meant. 8 typeable characters is pretty easy to break if you have a couple of grand to invest in GPUs.


> As it stands today, anyone using one of those is instantly compromised the moment the hash file is accessed.

How? If hashing one password takes one second, and you have a dump of a thousand users, it will take you a million seconds to try just 1,000 common passwords on that list.

Are you thinking of unsalted hashes, perhaps?


As noted in the OP, it takes far, far less than one second to hash one password using standard algorithms.


It seemed to me like you were talking about bcrypt, weren't you?


Well, the argument was that user-imperceptible hash times should be sufficient for all passwords. I was trying to make the case that even the class of extremely weak passwords can be protected with perceptible hash times.

So really I think you were demonstrating my point :)


I think it would be enough to get hold of the 1,000,000 most common passwords and tell the user it's not allowed.


That still leaves you open to side-channel attacks, yes? It's easy for an attacker to find which passwords are prohibited, so by restricting them you remove them from the search space. But your users aren't going to start choosing fundamentally secure passwords, the attack just shifts to the next 1,000,000 common passwords.


Maybe it just shifts to the secure password you generate and suggest to them?


Dumb idea:

Have the work factor also be a function of the password itself... this would cause even more difficulty brute-forcing the password, as it means intermediate steps would also have to be tested - breaking one weak password doesn't give you any information about other passwords.


I'm not sure about the bcrypt implementation, but if the work factor is public (i.e. you have to know it before calculating the hash), this gives you information about the password, which is bad.


This exposes information about the password: namely, its estimated entropy. The bet you're making is that the increased work factor overshadows any advantage an attacker may gain knowing that information.

How could someone use this? Well, I could decide to only target the rows with a low work factor. Since your entropy estimate is high for these rows, I can know that it's more likely they'll be 8 characters or longer and use a wider range of characters. I can likely ignore all candidate passwords that are shorter or that do not include non-alpha characters.

How useful is this? Let's assume 2 choices of work factor. Also let's assume strong passwords of length 8 have 96^8 ~= 53 bits of entropy and weak passwords of length 8 or less have 27^8 ~= 38 bits of entropy.

You just let me cut the search space for strong passwords of length 8 to ~15 bits, and in a double whammy, I get to use bcrypt with a low work factor when brute-forcing these rows.

I'm not a cryptographer, and so it's entirely likely I've made some mistake here. But as a general rule, I think it is an _extremely_ bad idea to use cryptography in any way that exposes additional information about individual rows in a database.

Cryptography is not a place for innovative thinking. Even cryptographers need their cleverness to undergo exhaustive review.


If I’m remembering my discrete math correctly, your claim isn’t correct:

> … Let's assume 2 choices of work factor. Also let's assume strong passwords of length 8 have 96^8 ~= 53 bits of entropy and weak passwords of length 8 or less have 27^8 ~= 38 bits of entropy.

> You just let me cut the search space for strong passwords of length 8 to ~15 bits…

You can’t subtract bits of entropy like that.

Here’s something I hope will convince you this reasoning is faulty.

Imagine a universe of 3-digit passwords, and there are two kinds of passwords: Strong ones use a mix of digits 0–7, and weak ones only use the digits '0' or '1'.

You can see that the strong passwords could be any of 8^3 = 512 different combinations (~9 bits of entropy), minus the 2^3 = 8 combinations (3 bits of entropy) that contain only ones and/or zeros. So while the worst case for brute-forcing a known-weak password is trying 8 strings, the worst case for brute-forcing a known-strong password is trying 504 strings. That is still the same order of magnitude, and still approximately 9 bits of entropy! You removed such an incredibly small sliver of passwords that an attacker really isn't any better off than before.

Another way to think of this is: just because the user didn't use only lowercase letters doesn't mean that none of the characters are lowercase!

Back to your example, with a strong password search space of 96^8. Now if you know a password is strong, that means it isn’t one of the 27^8 possible weak passwords. By how much does this reduce our search space?

    7,213,895,789,838,336 possible strings of length 8
  - 0,000,282,429,536,481 possible 'weak' 8-char passwords
  = 7,213,613,360,301,855 possible 'strong' 8-char passwords

We've reduced our search space by only 0.0039%.

That said, rolling your own crypto — which the grandparent post isn’t really quite doing — is something you should run away from, fast, unless you really are a cryptographer!
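The arithmetic above is easy to verify; a few lines of Python just redoing the parent's numbers:

```python
import math

total = 96 ** 8   # all 8-character strings over a 96-symbol alphabet
weak = 27 ** 8    # 8-character strings over the 27-symbol "weak" alphabet
strong = total - weak

# Excluding the weak strings barely moves the entropy at all:
# log2(total) and log2(strong) agree to within a ten-thousandth of a bit.
entropy_lost = math.log2(total) - math.log2(strong)
```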


You are correct: 2^10 - 2^5 != 2^(10-5); which is what he is doing by subtracting the entropy.


Thanks for the correction!


If I understand the grandparent correctly, there would be no way to determine which rows have a higher work factor. The work factor would be determined when the password is given, based on the password entropy - the correct password will always have the same entropy, therefore the same work factor. If the work factor is calculated on the wrong password (typo, etc.) it will not generate the correct hash anyway. Therefore an attacker has no way of determining which password hashes have a high work factor, and which ones have a low work factor.
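To make the grandparent's idea concrete, here's a sketch of what that might look like (the entropy estimator and all constants are made up purely for illustration; as argued elsewhere in the thread, a uniformly high work factor is the safer choice):

```python
import hashlib
import math

def estimate_entropy_bits(password: str) -> int:
    # Crude, illustrative estimate: alphabet size raised to the length.
    alphabet = 0
    if any(c.islower() for c in password): alphabet += 26
    if any(c.isupper() for c in password): alphabet += 26
    if any(c.isdigit() for c in password): alphabet += 10
    if any(not c.isalnum() for c in password): alphabet += 33
    return int(len(password) * math.log2(max(alphabet, 2)))

def rounds_for(password: str) -> int:
    # Weaker passwords get more rounds. Because the work factor is
    # recomputed from the password itself at login time, it never
    # needs to be stored, so the hash file doesn't leak it.
    return 10_000 * max(1, 64 - estimate_entropy_bits(password))

def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt,
                               rounds_for(password))
```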


Ah, yes, I'd missed that. Since you have the password at login time, you don't need to store the work factor. There are some practical things about making sure you can version the entropy estimate code without breaking existing logins, but it's certainly doable.


The definition of a "weak" password is so loose that it doesn't make sense to make your work factor a function of it. The most common attack on this kind of scheme is a side-channel attack, where you single out a subset of the password alphabet with a relatively small "work factor" and crack passwords which use that alphabet quickly.

BCrypt is already based on the premise that there are passwords which are easy to compute, and tries to avoid that differentiation. Why would you introduce it back in?


I see it as a kind of side-channel defense; take the set of passwords which it is possible to crack quickly and use the work factor to even it out so that the total time to crack a password remains relatively constant.

Of course it's a good point that the challenge then becomes determining what that set of passwords is. And not having run the numbers, I can't be sure how much room for improvement there is.


Hm. I don't know about bcrypt at all -- is it possible, given the ciphertext, to know roughly how much work is required to test a password?

E.g. an attacker can go "Oh! this password will take FIVE SECONDS to test, so I know it must be a simple password." or "Hey, check this out; this password can be tested in 0.1 seconds. It must be pretty complex."

In general, I'd guess that these kinds of information leaks are pretty bad because if an attacker can see how hard a password is to test, he now knows something about the password.

It may be better if, given a single unchanging hash, if it takes a variable amount of time to test a given password against this hash, though that might have its own can of worms.


> is it possible, given the ciphertext, to know roughly how much work is required to test a password?

The work factor is an input to the digest function, both when creating and when validating the password. Normally it should be stored alongside the digest itself so you can increase the work factor over time without disrupting existing passwords. So you are correct. It might theoretically be possible to correctly balance the work factor to counter variation in password info entropy so that all passwords take about the same time to crack, and this would be very cool and impress members of the opposite sex, but it would not improve security at all.

Making a probabilistic password checker is also a superficially interesting idea. Maybe my mind is too small to explore it completely, but it seems that at best it would be no better than just increasing the work factor.


Yeah, obviously if you can tell from the hash which class the password came from you lose the benefit.


Doesn't matter. 5 seconds or 0.1 seconds per test means that you cannot brute-force. Even for 5-letter lowercase English passwords your search space is 11,881,376 combinations. At 0.1 sec/try it will take you 6.8 days on average to find just one password.
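The numbers check out; redoing the arithmetic:

```python
space = 26 ** 5            # 5-letter lowercase passwords
seconds_per_try = 0.1
avg_seconds = space * seconds_per_try / 2   # expect to search half the space
avg_days = avg_seconds / 86_400             # just under 7 days per password
```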


"It’s important to note that salts are useless for preventing dictionary attacks or brute force attacks. You can use huge salts or many salts or hand-harvested, shade-grown, organic Himalayan pink salt. It doesn’t affect how fast an attacker can try a candidate password, given the hash and the salt from your database."

Why are you storing the whole salt in your database? Isn't it much more common to keep half of it in a configuration file? I know Django has a SECRET_KEY parameter for this sort of thing, and hopefully other frameworks do also.

For that matter, why is authentication being handled by the web server? If you've got data worth stealing (billing, emails, medical), you can afford to spring the extra few hundred for a proper authentication server.

Gawker's password handling (7-bit salt, in the database, digested with crypt) seems like the worst possible implementation of secure password storage.


That salt is a public value. The security of salted password schemes is meant not to depend on the secrecy of the salt.

Every time this topic comes up, 15 people chime in with various schemes in which some of the "salt" is derived from the hostname and some of it is stored in an encrypted vault and some of it is inferred from the color of the user's eyes. This is why Coda is making fun of "Himalayan pink salt".

To understand how irrelevant these details are, consider AES encryption. In addition to hiding an AES key, you can also hide portions of the AES CBC IV (a public value). You could use a random number of rounds. You could mix in tweaks with the round key. All of these things are possible, but (a) nobody analyzes AES based on those random hacks, because they don't fundamentally alter the properties of AES, and (b) nobody does those things, because they are silly.

Gawker could use the best conceivable practices in cryptography to obscure passwords; they could be using Colin Percival's scrypt function (which nobody uses yet) to be (in some way) provably resilient to hardware-assisted cracking. You could still level this criticism at them, for "not doing something to further obscure the encryption they used".

This is not a new argument; what I am re-explaining here is Kerckhoffs' principle.


Chrome OS is using scrypt!


> That salt is a public value. The security of salted password schemes is meant not to depend on the secrecy of the salt.

I don't understand why you'd add extra content to the password unless you keep it secret. The whole point of a salt/nonce is to prevent attackers from attacking the digest, right? You need some per-user data, to defend against rainbow tables, and some per-site data, to protect weak passwords.

My fundamental objection to schemes like bcrypt/scrypt is that they impose a heavy performance penalty on authentication to avoid a relatively rare case; besides, any theoretical entity capable of reversing a typical salted-password implementation is also capable of reversing bcrypt/scrypt.


"Reversing" bcrypt? "Reversing" salted hashes?

You're exactly the guy I'm talking about. "Oh, I use AES, but I don't just use AES; I store the secret IV for AES in a cookie so even my server can't decrypt it unless the client comes back with the IV so it's like two guys in the silo with the missile keys". Seriously, I just found that piece of code yesterday. Did you write it? Stop writing that stuff.


I can probably top you: some many years ago, I discovered that Network Solutions (the original domain name provider) was using the first two characters of the password as the salt for their user accounts, presumably so that the salt could also be secret. Of course, this had the side-effect of making the first two characters of the password visible in plain text if you looked at the hashes. I reported this as a bug and never heard back. I've also never done business with Network Solutions since then.


So the salt was just added to the hash without hashing the concatenated hash + salt, i.e. was this the method they chose?: saltedhash(password) = hash(password).salt

Apart from the obvious flaw of including part of the password in plain text, how secure is this method compared to the following method, where the salt is not a secret and where the concatenated hash + salt is hashed?: saltedhash(password) = hash(hash(password).salt)


Using two-character salts might be as big of a wtf here.


Perhaps in retrospect, but this was standard for the Unix crypt() function and passwd files of the time: http://linux.die.net/man/3/crypt The dangers of rainbow table attacks weren't well considered at the time.

The real error was misreading the man page and using the first two characters as the salt, which is then published as the first two characters of the hash. It's sort of an easy error to make, because to decrypt, you do use the first two characters of the hash. Understandable for a beginner working on a school project, but pretty ludicrous for a large company holding control over most of the domains on the internet at the time.


No; I use standard, time-proven, secure designs. SHA1(salt + password) is sufficient in almost all cases, and if anybody is capable of deriving the original input from the digest, they can do so no matter what digest algo is used.

I know there's lots of ways to screw up security, but most of them derive from lazy people taking shortcuts. They run the httpd, database, and authentication all off the same server so a vulnerability in one compromises all. They store secrets in the database because figuring out secure storage would take half an hour of research.

Replacing a poorly-implemented SHA1-based system with a poorly-implemented bcrypt-based one won't help security.


SHA1(salt || password) is an incompetent design that is debatably even easier to crack than the Gawker hashes. The insecurity of that construction is why we have PBKDF2.

If the entire knowledge you have of cryptography comes from _Applied Cryptography_ --- wait; let me extend that: if you even feel the need to cite _Applied Cryptography_ --- you should be careful debating crypto constructions. You're not going to end up happy.


I'm curious what the attack is that makes that easier to crack than the Gawker way. (I'm sure you're right, I just didn't know that it would be easier.)


SHA1 might (I think it is, but I'm not sure) be faster than DES; among other differences, DES crypt(3) has to run the DES key schedule before producing a hash. Data slips through SHA1 like a greased seal.


Looks like you're right.. these benchmarks show SHA1 about 5x faster than DES: http://www.cryptopp.com/benchmarks.html


Ah, alright. I was thinking there was some kind of length-extension weakness (which doesn't apply here, which is why I was confused) or some other attack in the cryptographic sense. Thanks.


SHA1(salt || password) does have a length-extension property; you can easily compute SHA1(salt || password || junk || arbitrary-data) for any given hash. That's not very useful for password hashes, but is devastating for the kinds of SHA1-based authentication schemes that people who write their own password hashes seem to come up with.
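This is exactly the problem HMAC was designed to solve; when you need a keyed or salted hash for an authentication scheme, use it instead of a bare H(secret || message) (the key and message below are placeholders):

```python
import hashlib
import hmac

key = b"server-secret"
msg = b"user=alice&role=user"
# HMAC's nested construction, H(key ^ opad || H(key ^ ipad || msg)),
# is not vulnerable to length extension the way H(key || msg) is.
tag = hmac.new(key, msg, hashlib.sha256).hexdigest()
```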


Yeah, by "doesn't apply" I meant "that's not very useful for password hashes".


A machine with a few GPUs can compute hundreds of millions of SHA1 hashes per second.

http://www.win.tue.nl/cccc/sha-1-challenge.html



I don't have a dog in this race either way, but I'm curious what you dislike about Schneier's book.


It's not deep enough to provide true knowledge of cryptography and cryptographic attacks while it also doesn't give practical advice on what to actually do in situations that require cryptography (read: always use high-level primitives). Applied Cryptography is pretty good (if outdated), I think, if you're seeking to gain a beginner-level knowledge of cryptography. Practical Cryptography, on the other hand, is a far better choice for what to do when actually using cryptography (although even that is outdated now).


If you think _Practical_ is outdated (and I'm not saying it isn't), you should come over and let us buy you coffee sometime.


Haha I'll take you up on that when I get back to Northwestern. I'll admit that the only reason I think it's outdated is because I was just teaching basic cryptography to the network security students and ran across a few things that made me think "Hmm, Niels/Schneier should really include this in their next printing." Some things I'm thinking of are EAX/GCM instead of the conventional CTR.


Quoting me: "Lots of random facts about crypto trivia. Not a lot of context. Even less information about how to actually safely use crypto primitives. You'll come out of it knowing how to get CAST or IDEA into your code --- two ciphers nobody uses anymore --- but not how to properly choose an IV for CBC mode."


> No; I use standard, time-proven, secure designs. SHA1(salt + password) is sufficient in almost all cases, and if anybody is capable of deriving the original input from the digest, they can do so no matter what digest algo is used.

Anyone can create an input that hashes to a given value. The relevant factor is how long it takes to create that input. I hope you can see the difference between that process taking 3 seconds vs 40 years.


> they impose a heavy performance penalty on authentication to avoid a relatively rare case

You're authenticating over the internet. What is 1/10th of a second to authenticate the first time you want to log in relative to everything else? It's like complaining you have to put the key into your car before starting a five hundred mile road trip. Yes, it takes a few seconds. But worth it? Most definitely.


If you have 10 people logging in per second, a 1-second hash puts a delay on future requests which is certainly noticeable.

Maybe the correct thing to do is to make the key derivation execute on the client side, but then this would erode the experience for mobile phone users.


There is no way to securely do this clientside. It's hard to imagine a situation in which login overhead is painful where scaling in general isn't already a huge concern; presumably, anything you do after login is going to be more painful than bcrypt.


I would imagine some kind of zero-knowledge proof would work here, but that would require more server interaction than just doing the hashes in the first place.


If you have ten people logging in per second, you've got to have more than one server. Distribute the login requests.

And if it really kills you to make it take a full second, then make it take 1/10th of a second: there, now your hashing is faster than the time it takes to dynamically generate a page.

People really need to learn that security doesn't come free, and sometimes you just need to bite down and say "You know what? I never plan on getting broken into, but just in case I do, I'll take the tenth of a second extra computation in exchange for doing the right thing."


Basic version: The salt prevents an attacker from being able to pre-compute a mapping from password <-> hash value. It's most effective when it changes per-password and it doesn't matter if it's public.

http://en.wikipedia.org/wiki/Salt_(cryptography)


To make it even more obvious, it's to prevent this (I haven't done SQL in a long while so this may be broken):

  SELECT username FROM users WHERE password=HASH('secret');
I've seen systems where that statement will give a list of all usernames with the password "secret".


If you use a salt you could still do this 'attack':

    SELECT username FROM users WHERE password = HASH(salt||'secret');
This is academic because you already know the password ('secret').

Salts make rainbow tables (essentially precomputed hash values of, say, all English words) infeasible.


The point of the salt is not to add extra complexity to the secret information; it is to make your transformation of the password into a hash different from everyone else's such transformation. This prevents attackers from getting a supercomputer or two, calculating a mondo rainbow table, and using it to crack every password hash in existence. The salt forces them to perform this computational feat separately each time they want to crack a password.
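In code, the per-password part looks something like this (illustrative names; a plain hash is used only to show the salt's role, and a real system should still use an adaptive hash like bcrypt):

```python
import hashlib
import os

def hash_password(password, salt=None):
    # Per-user random salt: the same password maps to a different hash
    # for every user, so one precomputed table can't cover them all.
    if salt is None:
        salt = os.urandom(16)
    return salt, hashlib.sha256(salt + password.encode()).hexdigest()

# Two users with the same password get unrelated hashes.
salt_a, hash_a = hash_password("hunter2")
salt_b, hash_b = hash_password("hunter2")
```

Verification just re-runs the hash with the stored salt, which is why the salt can be public.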


I may be mistaken, as I was only browsing. I was looking at Django's auth system last night (as all of this had me curious about the backend). I don't think that SECRET_KEY is used for part of the salt at all, I think it's primarily used for validation of site-generated data, signing requests, and cookie encoding/decoding.


You are correct, Django's hashes are stored as salt$digest, or maybe salt$function$digest nowadays. It wouldn't be very good if all your password hashes became invalid because you changed the secret key.


You're probably correct; when I used Django I had to reimplement the auth, since their built-in doesn't support roles, and I didn't look too closely at their implementation.


If you want to crack passwords ultrafast, you make a look-up table of the hashes of all probable passwords. Without a salt, this is rather a smaller table. A 7-bit salt makes the table larger, but not hugely larger, and not troublesomely larger.

The 128-bit salt used by bcrypt makes the table intractably hugely large. You cannot precompute it.

Of course, you know the salt (because it's stored right there in /etc/shadow), so you can still run through dictionary words and try them all. But bcrypt is designed to take arbitrarily long amounts of real time to do this.

So in the case of bcrypt, it's not really an issue that the salt is stored right there alongside the hashed passwords.


Those password crackers that are belting through billions and billions of passwords an hour with just a couple of video cards aren't using rainbow tables, are they? You could have zero bit salt: you're still boned.

Bcrypt is not better because it has a better salt. It's better because one iteration of bcrypt takes a long time, and millions of iterations take an intractably long time.


Bcrypt is better for BOTH those reasons working together; removing either the salt or the variable-cost key scheduler would be Bad.

Obviously, the variable-cost key scheduler is the central notion to the thing, but not having a large salt completely nerfs it.

bcrypt uses a 128-bit salt, and it uses it for good reason. See the paper, sections 6.2.1 and 6.2.2.


You're not wrong. But a 128 bit salt on SHA1 would do zilch to slow down an attacker. You see my point. I see yours.


Yes... the point I see you making is something like this:

Putting the 128-bit salt on there prevents a precomputed dictionary attack, a constant time operation.

But SHA1 itself is a constant time operation, so having a salt only slows down an attacker in a wall time sense, but not the more important time-complexity sense.

We all agree the salt is not particularly important for the constant time hash. (It's only practically important when the hash takes a lot of wall time relative to the wall time of a precomputed dictionary lookup, and constant time hashes gradually lose this edge due to Moore's Law.)

The point you see me making is that bcrypt is not a constant time operation (due to the variable-cost key schedule--2^cost, actually), and allowing people to use a constant time precomputed dictionary lookup by not having a large salt would make it as bad as no-salt SHA1.

So we all agree that the large salt is vitally important for the non-constant time bcrypt.

Not that either of these points are relevant to my initial assertion that giving the salt to an attacker is not something people worry about. The salt is there to prevent a precomputed dictionary attack, and a large salt does this no matter how well-known it is.


Why are you storing the whole salt in your database? Isn't it much more common to keep half of it in a configuration file? I know Django has a SECRET_KEY parameter for this sort of thing, and hopefully other frameworks do also.

How is that really any better? You should assume that if an attacker can get to your database, they can get to your web servers and take all of that as well. Such a scheme certainly wouldn't have saved Gawker.


In Gawker's case this is exactly right. The database AND the source code was accessed so the config file defense would not have worked.


Config files are not typically world-readable; the httpd reads them before dropping permissions. Otherwise, a vulnerability in the httpd would allow access to everything in the config (remote server passwords, signing keys, etc).

There's no reason why compromising the database would allow attackers into the web server, unless you've configured SSH to allow signing in from arbitrary remote systems.


If you lose code execution on your server to an attacker, you're done. 100% fucked. Everything in your environment needs to get stripped down and rebuilt from trusted sources. Do not be one of those people who rationalizes "oh, I just lost uid=4294967294". Gawker lost root. So will you.


And if you lose root, the disk must be reimaged. Rootkits are too advanced these days to ever make the assumption that you have removed them.


It's possible that sometime not too far in the future that advice might need to be upgraded to "If you lose root, unplug and destroy the hardware and install a completely new machine from trusted sources".

http://www.theregister.co.uk/2010/11/23/network_card_rootkit...

(Why yes, I _did_ have that problem weighing on my mind while I investigated several machines that had a weekend's worth of exposure to the recent Exim remote root exploit...)


Direct DB access isn't typically world-accessible either. It's not like I can do an XML request to get my hashed password on any site.

The point is, if someone has gotten enough access that they can actually get a raw copy of the database, it's just as likely they can get a raw copy of the config files, or /etc/shadow, or whatever else is on the host system.


> Config files are not typically world-readable; the httpd reads them before dropping permissions. Otherwise, a vulnerability in the httpd would allow access to everything in the config (remote server passwords, signing keys, etc).

I think you give people way too much credit. Maybe config files aren't "typically" world-readable but they ARE typically httpd-readable.

It's a mistake to assume that, because you might follow best practices, the rest of the people out there do. You said yourself in a comment on this same post that many people make mistakes because they're too lazy to take that "half an hour of research." Let's make the assumption that that is typical.

> There's no reason why compromising the database would allow attackers into the web server, unless you've configured SSH to allow signing in from arbitrary remote systems.

Oh, you mean the default behavior? See above.


Fun fact: this has been posted to Hacker News before, when it first came out. I think most would agree this is a great example of why re-posting should be allowed.

http://news.ycombinator.com/item?id=1091104


I admit I gamed the system a bit to re-post this link, but it seemed very relevant. I learned a ton from this discussion, so my selfish reasons were satisfied.


+1

Thanks to codahale for writing this, r11t the HN user who originally posted this early this year, and the discussion by everyone on that HN thread which convinced me of the correctness of the bcrypt approach.

I've already shipped one project, and I'm sure part of the reason we pitched it successfully was our demonstrated in-depth understanding of password security requirements.

(I also realised in retrospect that a project I'd designed and specced before reading that article/discussion was going to be wrong in its password handling. I'm disappointed we never got to build that product, but I'm kinda glad I'm not sitting here thinking "Fuck, what am I gonna do if $project's database ever gets compromised? It'll be just as bad as Gawker...")


If you have a 5-year old database using a given bcrypt work factor, how difficult is it to transition to a new, higher work factor?


This is a good practical question. My read of the algorithm is that you must force each user to enter a new password, and encrypt that at the new higher cost.

If you wanted to "upgrade" the passwords to the higher cost key schedule, you'd just continue the key schedule where it left off--but this would require knowing the original password! So that's not really an option.


You can upgrade passwords when the user logs in. The sanest thing to do seems to be to store all the public variables (work factor, salt) alongside the digest so each password can be handled separately.
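A minimal sketch of that storage scheme (names and format are made up, and iterated SHA-256 stands in for bcrypt just to keep it self-contained):

```python
import hashlib
import os

def make_hash(password, work_factor):
    # All public parameters (work factor, salt) travel with the digest,
    # so every record can be verified and upgraded on its own.
    salt = os.urandom(16).hex()
    digest = (salt + password).encode()
    for _ in range(2 ** work_factor):
        digest = hashlib.sha256(digest).digest()
    return "%d$%s$%s" % (work_factor, salt, digest.hex())

def verify_and_upgrade(password, stored, current_factor):
    factor_s, salt, old_digest = stored.split("$")
    digest = (salt + password).encode()
    for _ in range(2 ** int(factor_s)):
        digest = hashlib.sha256(digest).digest()
    ok = digest.hex() == old_digest
    if ok and int(factor_s) < current_factor:
        # Successful login with a stale work factor: rehash at the new cost.
        stored = make_hash(password, current_factor)
    return ok, stored
```

The caller writes the returned record back on every successful login, so the database converges to the new work factor over time.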


What about performing the hashing again, on the existing hash? (Of course it would be simplest to hash from scratch when the user logs in again, but just for discussion's sake.) Let's say we have a legacy db with hashes created with a small work factor. We could simply perform the hashing on these previous hashes (increasing work factors as appropriate) and annotate the fact that one has to perform the hashing two times.

Of course we're 'overthinking' it again, but is the above solution viable?


I'm no expert, but I think you could just convert a database via function composition:

    new_hash = (bcrypt new_work_factor) . old_hash -- new hashing function

    new_hashed_passwords = map (bcrypt new_work_factor) old_hashed_passwords -- convert the old hashed passwords to new
Of course, this will fail horribly if (bcrypt new_work_factor) is somehow an inverse (or partial inverse) of old_hash. It could also fail horribly if (bcrypt new_work_factor) maps its input into a low "rank" (sorry, I'm a mathematician, not a crypto expert) region of old_hash's domain.


But if one of those two properties where true, that would probably give you some hints into how to attack bcrypt.


Imagine this Python code (I'm using SHA1 iterated multiple times, basically PBKDF2):

    import hashlib
    hashed = password
    for i in range(50000):
        hashed = hashlib.sha1(salt + hashed).digest()
Provided we also store the number of iterations (along with the salt), and provided I didn't do anything stupid above, we could simply add more iterations after these 5 years and update hash and number of iterations field. Would it be a viable solution?
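A quick sanity check of that idea (iteration counts are arbitrary): since each step depends only on the salt and the previous digest, the chain can be resumed from the stored hash without the password.

```python
import hashlib

def stretch(start, salt, n):
    # n further iterations of hash(salt + previous digest)
    h = start
    for _ in range(n):
        h = hashlib.sha1(salt + h).digest()
    return h

salt, pw = b"12345", b"secret"
full = stretch(pw, salt, 80000)          # a fresh 80k-iteration hash
stored = stretch(pw, salt, 50000)        # the legacy 50k-iteration hash on disk
upgraded = stretch(stored, salt, 30000)  # add 30k iterations, password not needed
assert upgraded == full
```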


Yes.


So the original question remains, is it possible with bcrypt?


I don't know, but Oliver Hunt suggested just validating the password on the next login and upgrading it on the fly, which, to be honest, is what I'd probably do.


  import bcrypt
  hashed = bcrypt.hashpw(password, bcrypt.gensalt(log_rounds=13))
Increasing log_rounds by one increases the work factor exponentially (2^log_rounds).


I rigged up a small test on my Macbook. Do you think 50,000 iterations would be enough for a general website (such as HN)?

    import timeit
    
    t = timeit.Timer(stmt="""\
    def test(pwd, n_iter):
        for i in range(n_iter):
            pwd = hashlib.sha1(pwd).hexdigest()
    test('hello', 50000)
    """, setup='import hashlib')

    print t.timeit(100) / 100
    
    >>> 0.126629960537


If a user logs in, and their work factor is the low one, authenticate them, and then calculate the new hash with the higher work factor, and save the new hash and work factor.


Follow-up question: I assume this also means that it would result in new hashes for the same passwords?


Yes. But from what I understand, that's even the case if you bcrypt() the same password with the same work factor multiple times, as it uses a random salt.


My issue is asking a lot of people to change their password because I've decided to change my encryption algorithm. Is there a best practice for upgrading encryption without forcing users to do that? Something like when the user logs in, hold onto their plaintext password for a bit, confirm it's correct against your current algorithm, and then re-encrypt it with bcrypt?


Why not do it behind the scenes?

Simply use the existing MD5/SHA1 hash as input to bcrypt and update all password hashes in your database in one go. Then, whenever the user logs in you first apply the old hash function followed by bcrypt before comparing with what you have in the DB.
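A sketch of that migration (hypothetical names; stdlib PBKDF2 plays the role of bcrypt here so the example stays self-contained, and the iteration count is only illustrative):

```python
import hashlib
import os

def legacy_hash(password):
    # The old scheme being migrated away from.
    return hashlib.md5(password.encode()).hexdigest()

def wrapped(old_digest, salt):
    # Slow outer layer applied to the old digest, not the password.
    return hashlib.pbkdf2_hmac("sha256", old_digest.encode(), salt, 50000)

# One-time migration: wrap every stored MD5 digest, no passwords needed.
salt = os.urandom(16)
stored = wrapped(legacy_hash("hunter2"), salt)

# At login: old hash first, then the slow layer, then compare.
assert wrapped(legacy_hash("hunter2"), salt) == stored
```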


Nice, thanks. This is why I asked. That approach had not occured to me.


Good question.

My gut instinct would be to do just that but I wonder if there's a better way. You'd probably also want to track the encryption of each user so you know when you can make the final switch.

Another alternative is to just send "update your password" emails to everyone framing it as an improvement to your site's security.


>Another alternative is to just send "update your password" emails to everyone framing it as an improvement to your site's security.

I guarantee everyone with a decent spamfilter will miss those e-mails - that looks exactly like a standard phishing attempt.


HN, is this the consensus? I thought certain hashes worked fine when properly salted?


The problem is that it's relatively easy to brute-force attack even a salted hash today.

On the first day, people stored passwords in plain text. If someone got access to your database, they had all the passwords.

On the second day, people decided to hash the passwords so that attackers couldn't recover them. Then the attackers created rainbow tables that correlated each hash with its associated password, since a given password would always hash to the same value (so you have "109BCA" in your database, but they have a table showing that "12345" hashes to that value).

On the third day, people decided to salt the hashes to render the rainbow tables ineffective. Now, each password would hash to a different value so they couldn't just look up the password for a given hash. However, as computing power increased it became easy to just brute-force the password. You have the hash and you have the salt so you can just try every password with the hash until you get a match.

  require 'digest'

  hashed_password = "ACBDEF1234567890"
  salt = "12345"
  possible_passwords = ["password1", "ilikesheep", "micro$oft"]
  possible_passwords.each do |pass|
    if Digest::SHA1.hexdigest("#{pass}#{salt}") == hashed_password
      real_password = pass
    end
  end
The problem is that code like that has gotten really cheap to run and it's incredibly parallel (you can have loads of machines each take a piece of the workload - oh, and hopefully no one will make a joke that you'd never write that in Ruby; I just felt that would be easy pseudo-code to demonstrate). You can just try combinations of passwords at random, but there are lists of more common passwords that you can try first making it go even faster. Hashing algorithms are meant to be fast because they're meant to be usable on a large piece of data. As such, it also becomes fast to brute force check all sorts of combinations.

On the fourth day, people started using things like bcrypt because bcrypt was made to be slow. The fact that bcrypt is slow means that if someone wants to brute force it, it will be very slow going for them. If one can check 10,000 SHA1 hashes in a second, but only 10 bcrypt hashes in a second, it will take 1,000x longer to crack the bcrypt stored password (those are just made-up numbers as an example).

Salting is better than not salting because they have to brute force the password. However, as computing power increases it isn't so much better because brute forcing is becoming easy. One needs to use a slow algorithm to make sure that cracking it will also be slow (prohibitively slow). Bcrypt also allows you to specify how much work you want it to have to do. This way, as computing power increases, you can increase the amount of computing power needed to compute the bcrypt. By contrast, hashes are meant to go fast and so every day they're getting less secure.


Salting was introduced long before rainbow tables were invented.


Salting was introduced in the 1970's to combat the underlying class of attacks that rainbow tables optimize. Think of rainbow tables as a compression scheme; the attack is precomputation.


That said, I don't know why, with today's computing power, people don't make rainbow table chains longer (or shorter, it's been a while since I read the paper) to save on disk space.


And with GPUs rainbow tables are gone - it's now quicker to calculate an MD5 than read it from disk!


> And with GPUs rainbow tables are gone

Not exactly. I could have a rainbow table for every possible password eight characters or fewer, and find it in a rainbow table in log(table_size) but to brute force it might take several days. GPUs make it so you can generate bigger rainbow tables faster too.


Yes, this is consensus. "Salting" isn't actually a real security practice. See: http://chargen.matasano.com/chargen/2007/9/7/enough-with-the...


That article actually states that salting is a real security practice, is a necessity, and that it's been in UNIX since 1976.


He may be saying that because I've been making fun of people who use the word "salt" (in most other situations, crypto people would call that a "nonce"); I have in the past made the point that real crypto people never say "salt".

But then Eli Biham used the term extensively in his HAIFA framework and I lost the argument. I'll still make fun of the word (salt! smoked salt! peanut salt! hah!), but I can't say no crypto people use it. I wish they wouldn't, though.


That argument sounds like the German balloon fliers, who insist that they "fahren" (drive) the balloon and don't "fliegen" (fly) it.


I was more talking about bcrypt being consensus with that post. Salting slows down cracking attempts by a factor of N, where N is the number of distinct salts in the database; md5 and sha1 are so fast that you shouldn't be banking on salting to protect you in 2010. Also, you should assume that if an attacker has your database (with salts), they also have your code, which tells them exactly how the salt was applied. In that case, salting doesn't slow you down at all.


But you're talking about defending against brute force attacks? Salting's strength is in defending against rainbow tables. It seems the best policy at this point is to use both as they defend against different things and are not mutually exclusive.


I think the point is that you should use bcrypt and salt at the same time.


You don't need to do anything like that; bcrypt does all this stuff for you. Don't salt bcrypt.


I think people would be a lot less confused if you said this first. Say "Don't salt bcrypt, bcrypt salts for you" and you'll have to explain yourself less.


The salt does exactly one thing: it makes the attacker pay to get each password, as opposed to making them pay once and get all of them. When the cost is a fraction of a second on a commodity CUDA setup, it's not really that helpful to make an attacker pay it multiple times.


In a case like the Gawker incident, it isn't safe. If the source code to the password hashing algo is compromised (it was) then the salt becomes useless. In short, just use bcrypt.


Salting and choice of hashing algorithm are orthogonal. If it takes X amount of time to brute force a password of strength Y and you do salt, then an attacker can crack N passwords of strength Y in time NX. If you don't salt, then an attacker can crack all passwords of strength Y in time X, which is a huge difference.
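A toy demonstration of the difference (illustrative only; usernames stand in for salts just to keep it short):

```python
import hashlib

def h(pw, salt):
    return hashlib.sha256(salt + pw).hexdigest()

dictionary = [b"password", b"123456", b"letmein"]
users = [(b"alice", b"123456"), (b"bob", b"123456")]

# Unsalted: one dictionary pass (cost X) cracks every matching user at once.
table = {h(w, b""): w for w in dictionary}
unsalted = {u: h(pw, b"") for u, pw in users}
cracked = {u: table[d] for u, d in unsalted.items() if d in table}
assert cracked == {b"alice": b"123456", b"bob": b"123456"}

# Salted: the same dictionary pass must be redone per user, so N users
# cost roughly N*X work.
for user, pw in users:
    per_user = {h(w, user): w for w in dictionary}  # recomputed every time
    assert per_user[h(pw, user)] == b"123456"
```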


Alas, I stand corrected. Thanks. At the same time, the author does have a good point that very slow algorithms are fundamentally better for security, but I'm glad you pointed this difference out. Classic case of me not thinking things all the way through.


No. No no no.

The source code to the hashing algorithm means nothing. It is already open source!

The reason that the salt is there is to prevent against rainbow tables.

The salting did NOT become useless. If they had not salted passwords, then many many more passwords would have been broken because instead of having to brute force each and every one, you'd just look it up in a massively large hash table.


No, he thinks the salt is stored secretly in the source code. Of course, it's not; you store it right next to the hash so that you don't have to use the same salt for every password.


Microsoft Active Directory stores what it calls an NT Hash in NTDS.DIT files on domain controllers.

These are unsalted, md4 hashed, unicode strings.

They are fast and easy to crack if you ever do get your hands on them. The point is that many big companies don't do passwords right, so why expect Gawker to do so?

Edit: md4 is not a typo. They use md4 not md5... OK.


And that's not the only place they're stored (assuming the default cached credentials setting is still in place). Also, with the advent of pass-the-hash, you don't need to crack Windows AD passwords to use them anymore. Just having the hash is enough for loads of fun.


Worse, Windows up to Server 2003 by default stores LAN Manager hashes as well. To describe the cracking of these as 'trivial' would trivialise the word 'trivial'.


The part that I don't quite understand about bcrypt and the "scales with hardware speed" claim is that, as I understand it, validating a password requires 3 parameters: password string, salt, and cost factor.

If you start out now with some given cost factor that is infeasible to break with modern hardware, that factor will remain stored somewhere, presumably in the database. Once computing hardware speeds up to the point that your factor is practical to break, you can increase the cost factor for new passwords, but older ones remain susceptible to cracking. The only option would be to re-encode them periodically with the new factor to keep them secure.

Can anyone clarify if I've understood this correctly, or if I'm missing something fundamental about bcrypt? I've looked over the usenix paper, but I can't see anything obvious to confirm one way or the other.
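For what it's worth, part of the answer is visible in bcrypt's output format itself: each stored value carries its own version, cost, and salt, so records hashed under an old cost stay verifiable after you raise it for new passwords. A rough parser over a synthetic, bcrypt-shaped string (the example value below is made up):

```python
def bcrypt_params(stored):
    # bcrypt output looks like $2a$<cost>$<22-char salt><31-char digest>;
    # the cost factor rides along with every record, which is why raising
    # it for new passwords doesn't break old ones.
    _, version, cost, salt_and_digest = stored.split("$")
    return version, int(cost), salt_and_digest[:22]

# A synthetic example value, shaped like real bcrypt output:
example = "$2a$12$" + "N9qo8uLOickgx2ZMRZoMye" + "IjZAgcfl7p92ldGxad68LJZdL17lhWy"
version, cost, salt = bcrypt_params(example)
```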


If you're using bcrypt then the client has transmitted the password to you. You can store a higher cost hash after verifying the password against the old hash.


If an attacker steals a DB of passwords, it is only a matter of time before computing power catches up and is able to crack the list, regardless of the methods used to hash the values. Advances in cryptography are rare; advances in processor capabilities are not. bcrypt may be the best we can do at this point to delay this inevitability, but programmers shouldn't come away from this thinking that using bcrypt "solves" this problem. An interesting question related to this is how long a period is considered "safe" enough to protect a stolen list. If the list is protected from brute-force cracking for 3 years after theft, is that enough time to render the passwords unusable? 5 years? It seems like the answer to this question would be used to calculate an appropriate value for the bcrypt "speed".


I made an attempt to implement bcrypt on the last web app I built. The problem I found was that there was no robust implementation of the algorithm for the tech stack I was using (J2EE); I'm not sure whether that is the case outside of Java. jBCrypt was the only thing I could find, and if you look at the source, it really is a poor implementation. I could have gone ahead and rolled my own implementation, however I'm no security expert, and much prefer to rely on something more robust than my own implementation, which in all likelihood would have bugs in it.


I googled [bcrypt api] to see how hard it would be to wrap with JNA (my guess: not very), but the first hit was "jBCrypt - strong password hashing for Java." So I stopped looking.


What did you end up using? What are good solutions on a java platform?


The Java implementation linked from the article, jBCrypt, http://www.mindrot.org/files/jBCrypt/jBCrypt-0.2-doc/ uses java.lang.String's compareTo() function (lexicographical comparison) to validate the hashes of passwords. Is there cause for concern about this in practice, considering it will probably have varying runtimes for different match lengths? (I realise that running in a JITted VM means I can't predict the assembly language produced anyway)


What the article tries to say is that your password hashing function needs three attributes:

* it has to be dog slow (to make brute forcing hard)

* it should be complicated enough to avoid collisions (this really applies to most hash functions)

* it should be suitably salted, to avoid rainbow tables

-> bcrypt is designed like this, if in doubt, use it


So everyone is saying how easy it is to crack md5. Say I have an md5 hash and I know it was created from a 1000-byte string. How long would it take to enumerate all 1000-byte strings that could create that hash?


Actually you can (almost surely) stop cracking at 16 bytes, because that's how long MD5 digests are and any more bits than that are going to give you hashes you've already seen. You won't get back the original string but you don't need the original string.


Actually, you have to go a bit beyond 16 bytes to be sure.


I think sooner or later computing power is going to increase to the point of making bcrypt look like md5 or weaker. I think what matters is protecting the hash.


It's really simple: AS LONG AS the user uses a weak password, using bcrypt or not won't protect him. Why? Well, instead of brute forcing the hashed password, I'll directly try to brute force using the normal login method of your site (even if you rate limit my login attempts it won't take that much time... (see proxies) (if you are thinking about rate limiting per username etc. you suck)).

If you need your users' accounts to be safe, just force them to use a strong enough password.

hashed(password + salt) = epic win


I think rate limiting per user is perfect. And if the real person wants to log in while someone else used up their attempts, do a quick email confirmation.


"do a quick email confirmation". And what happens if one or more of your users gets targeted for a long period of time ? You will force them to open there inbox every time they want to log in your site ? And this gets even better if they target your site generaly, it will be a lot of fun for the majority of your userbase to have to do that "open inbox" step, bet users will love it :)

Sorry mate but your method sounds easily exploitable ... heck using reCaptcha would be less punishing for the user than your approach.


please explain what you mean by rate limiting and why that's no good.


In some period of time a user is only able to make a certain number of login attempts; after that he has to wait before he can try to log in again. You can do that per IP or per username. Doing it per username is not good because someone can abuse that to block the genuine user from logging in. Doing it per IP is the best option you have, and I didn't say it's not good; what I said is that if the user uses a weak password, even with a rate limit they will be able to find the password soon enough.
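A minimal sketch of the per-IP version (constants and names are made up):

```python
import time
from collections import defaultdict, deque

WINDOW, MAX_ATTEMPTS = 300.0, 5   # e.g. 5 attempts per 5 minutes
attempts = defaultdict(deque)

def allow_login(ip, now=None):
    # Sliding window per source IP: drop timestamps older than WINDOW,
    # then check whether this source still has attempts left.
    now = time.time() if now is None else now
    q = attempts[ip]
    while q and now - q[0] > WINDOW:
        q.popleft()
    if len(q) >= MAX_ATTEMPTS:
        return False
    q.append(now)
    return True
```

A sixth attempt from the same IP inside the window is refused, while other IPs (or the same IP after the window passes) are unaffected.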


When seeing the title, I was expecting an article about CMAC or HMAC or even PBKDF2-like function but it's the "old" bcrypt.


You say that as if it's a bad thing.

EDIT: Oh, you mean it was similar to other submissions. Fair enough.


Not really. It's just that the fixed cost function in bcrypt was already mentioned some time ago:

http://news.ycombinator.com/item?id=762708


I use a different password for every website. It is easy to remember, I just have a portion of my password that changes for every website. For example:

dg76fb23S for Facebook, dg76fb23S for hacker news

Been working great for years :)


>dg76fb23S for Facebook, dg76fb23S for hacker news

So you use the same password for everything.


Your Facebook and HN passwords are the same? ;)

And if someone ever sees these in plaintext, it's trivially extrapolate-able to other sites. Dunno.


What if there are no passwords to start with? Suppose users do not enter a password; instead, a hashed key is generated for the device used by the user, which then encrypts everything before it is stored on the server.

This key could be generated in real time, would not be displayed anywhere on the form, and would be transferred in stealth mode to the server.

With no passwords to enter or transmit, there will be nothing to hack... If the key generated at the origin is itself a strong key, decrypting information stored on the server will first require cracking the key, which, if it is not stored on the server in the first place, will make life hell for hackers, as they will need access to the individual devices as well.

Cheers, gurudatt


I'm not so sure. The other side of the coin is that ``{MD5, SHA1, SHA256, SHA512, SHA-3, etc}'' have had extensive peer review in the cryptography world.

There's a long history of ``clever new ways'' to use existing crypto algorithms turning out to have serious flaws -- a good example is early attempts to improve the strength of (56-bit key) DES by encrypting three times with three keys. This turns out to introduce enough non-randomness to make the result much weaker than one might expect; standard 3DES works by encrypting with the first key, decrypting with the second, and encrypting with the third, which results in very different properties of the output cyphertext.

I'm not saying BCrypt has the same sort of issues, but I'd like to see some cryptanalysis of this before trusting my users' data with it. Notably, there seems to be no links to such analysis on the BCrypt home page -- not even an argument from the author as to why this code should be cryptographically sound.


No. Use Bcrypt. Always.

Bcrypt is backed by Blowfish, designed by Bruce Schneier. Go look it/him up. It's secure.

MD5/SHA1/etc. are not bad choices because they are cryptographically weak (though some are); they are bad because they are fast. SHA3, when it is picked, will still be a very bad choice because it too will be fast.

So, why Bcrypt? Well, it uses Blowfish. Blowfish has a very slow key scheduling algorithm which basically involves a lot of hashing to get the round subkeys. Bcrypt makes this even slower. So what? Well, with Bcrypt you could set it up to take .3 seconds to verify a password. Try bruteforcing on that.


Indeed. Blowfish is also standardized and has withstood peer review, cryptanalysis and has otherwise been baptized by fire. Plus, if it's good enough for people as picky as Theo de Raadt, it's probably good enough for you.


Bcrypt is backed by Blowfish, designed by Bruce Schneier. Go look it/him up. It's secure.

Well, yes, I trust the Blowfish cipher (and GP probably does too). The question is how we can be sure it isn't being somehow misapplied, i.e. whether the way bcrypt uses it opens some other hole. That's what I (and probably GP) would like to see an expert weigh in on.


Bcrypt is recommended by all the relevant experts who haven't heard of scrypt(1). Those who have(2), use scrypt because it's got a better built-in Moore's Law-defeater than bcrypt.

(1) http://news.ycombinator.com/item?id=601408

(2) http://www.chromium.org/chromium-os/chromiumos-design-docs/p...


(a) Virtually nobody has heard of scrypt; contrary to HN conventional wisdom, Colin is not yet a world-famous cryptographer. Give him time.

(b) There are operational reasons not to use scrypt, one of them being that there is no reference implementation with broad language bindings.

(c) The specific improvement scrypt makes over bcrypt is not yet relevant; nobody has ever hardware-optimized a bcrypt cracker, and the project that successfully does so and publishes their results will have made a contribution to cryptography literature.

(d) Even when bcrypt starts to face down hardware crackers, it doesn't "lose"; you simply have to increase work factors to compensate.

(e) You don't even have to use bcrypt; you can use PBKDF2, which simply iterates SHA1 a tuneable number of times. Bcrypt is better than PBKDF2, but every adaptive hash, PBKDF2 included, is in a different and better league than "salted hashes".
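A minimal sketch of point (e), using PBKDF2 from Python's standard library — the iteration count is the tuneable work factor (the parameter choices here are illustrative, not a recommendation):

```python
import hashlib
import hmac
import os

# PBKDF2-HMAC-SHA1: iterate HMAC-SHA1 a tuneable number of times.
salt = os.urandom(16)
dk = hashlib.pbkdf2_hmac("sha1", b"correct horse", salt, 100_000)

# Verifying a login just recomputes with the stored salt and iteration
# count, then compares in constant time.
candidate = hashlib.pbkdf2_hmac("sha1", b"correct horse", salt, 100_000)
assert hmac.compare_digest(dk, candidate)
```

Raising the iteration count later (and rehashing on next login) is how you keep pace with faster hardware.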


The specific improvement scrypt makes over bcrypt is not yet relevant; nobody has ever hardware-optimized a bcrypt cracker, and the project that successfully does so and publishes their results will have made a contribution to cryptography literature.

Sure they have. A hardware-optimized bcrypt cracker is called a GPU. I can buy a 480-core GPU on Newegg for $350, but it doesn't come with any more RAM than a low-end PC does.


URL to the source code, please.


I am looking right now at Schneier's reference implementation of the Blowfish key setup function, and there is nothing in it that would not be straightforward to parallelize across a CUDA warp (read: no conditional branches). Is it your assertion that getting from here to a working bcrypt cracker would be sufficiently difficult as to constitute a significant "contribution to cryptography literature"?


You win this time, Franke. This time.


Are you earnestly conceding, or are you just bogged down in enough other arguments that you don't want to deal with this one?


I'm earnestly conceding.


True. But this article was about Bcrypt, so I'm writing that instead of Scrypt.

Edit: I defer to tptacek.


What it means to "open a hole" or what a "hole" would even be are hard to imagine in the case of bcrypt.

First of all, none of this security even comes into play until after your password file is stolen, but your and jim's comments imply you believe that using bcrypt would somehow make an existing site less secure.


Question: I use SHA512 for my passwords; is that also "fast" in comparison to bcrypt?


If I can believe this ( http://www.cryptopp.com/benchmarks.html ), SHA-512 can still hash data at tens to hundreds of megabytes per second on a single couple-year-old core. Bcrypt lets you configure the slowness of its key generation with a "work factor"; with a work factor of 1 it'd probably be pretty fast too, but with a work factor of 10 it takes me roughly 100ms to hash one 10-byte password.

So yeah, SHA-512 gives you a much larger keyspace, which is awesome, but it isn't demonstrably slower than MD5, SHA-1, etc. Being fast was one of its design goals (and is also a goal of all entrants in the ongoing SHA-3 competition).
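Those throughput claims are easy to sanity-check yourself — a quick sketch (the numbers are machine-dependent; this is not a rigorous benchmark):

```python
import hashlib
import time

# Hash 10 MB with SHA-512 and see how long a single pass takes.
data = b"x" * (10 * 1024 * 1024)
t0 = time.perf_counter()
digest = hashlib.sha512(data).digest()
elapsed = time.perf_counter() - t0
mb_per_s = 10 / elapsed  # typically well into the hundreds on modern CPUs
```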


SHA512 is very, very fast compared to Bcrypt. It's only slightly slower than SHA1 or SHA256.

Source: http://www.cryptopp.com/benchmarks.html


Thanks for the answer and the link. My salting algorithm is very strong but I hadn't considered the idea of renting a bunch of EC2 instances and brute-forcing hashes because the hashing algorithms are so fast.


You shouldn't get downmodded for a comment that demonstrates exactly the moment of enlightenment Coda was trying to get people to have about password hashing. Let me fix that for you.


What do you mean by "my salting algorithm is very strong"?


Instead of just concatenating a salt to the password string, I use a dispersion method. I first concat the salt to the beginning of the password and SHA512 that. I then have a globally configured list in my app (it's different for every app I produce) that defines at which index, in the hashed salt+password digest, chunks of the (same) salt are sliced and interspersed.

Given the same list of indexes, I can then "find" the salt of a stored password hash and run a given plain text password through that algorithm.

But, as has been stated, that effort is completely null if the SHA512 algorithm is fast and brute-forcing it only takes a handful of rented GPU instances...

[EDIT] Now that I think about it, if the Gawker attack were to happen to me, then the attackers would also have the source code and can get the salt dispersion list... So this is, in hindsight, kind of pointless.
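For concreteness, the "salt dispersion" scheme described above might look something like this (a hypothetical reconstruction — SALT_INDEXES, CHUNK, and the function names are my assumptions, not the commenter's actual code):

```python
import hashlib

# App-wide secret configuration, per the comment above. Values here are
# made up for illustration.
SALT_INDEXES = [7, 29, 61]   # insertion points for salt chunks
CHUNK = 4                    # size of each salt slice

def disperse(password: str, salt: str) -> str:
    # SHA-512 the salt-prefixed password, then splice salt chunks into
    # the hex digest at the configured indexes.
    out = hashlib.sha512((salt + password).encode()).hexdigest()
    for i, idx in enumerate(SALT_INDEXES):
        out = out[:idx] + salt[i * CHUNK:(i + 1) * CHUNK] + out[idx:]
    return out

def verify(password: str, stored: str) -> bool:
    # Undo the insertions in reverse order to recover the salt, then
    # recompute and compare.
    rest, chunks = stored, []
    for idx in reversed(SALT_INDEXES):
        chunks.append(rest[idx:idx + CHUNK])
        rest = rest[:idx] + rest[idx + CHUNK:]
    salt = "".join(reversed(chunks))
    return disperse(password, salt) == stored
```

As the [EDIT] notes, anyone who steals the source along with the database recovers SALT_INDEXES, and the underlying SHA-512 is still fast, so this adds obscurity rather than work.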


Poor man's solution here is to iterate your fast hash a number of times.

This is essentially what bcrypt does for you anyway (in a really crude sense).
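A sketch of that poor-man's iteration (names and the round count are illustrative; prefer bcrypt/scrypt/PBKDF2 in practice):

```python
import hashlib

def slow_hash(password: bytes, salt: bytes, rounds: int = 200_000) -> bytes:
    # Feed each digest back into the hash so an attacker must pay for
    # every round per guess, just like the legitimate server does.
    h = hashlib.sha512(salt + password).digest()
    for _ in range(rounds):
        h = hashlib.sha512(h).digest()
    return h
```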


The question is not whether BCrypt is backed by Blowfish, the question is whether BCrypt uses Blowfish in a way which is cryptographically sound. If it does not, then an attacker may not need to use brute force. Assuming the author has read more of Mr. Schneier's book than the quoted preamble, he should know this -- Schneier discusses this at length in both editions of Applied Cryptography.

Again, an example of this is the 3DES encrypt/decrypt/encrypt process vs. a more naive encrypt/encrypt/encrypt process. One is substantially stronger than the other. One is a secure way to use DES, and one is not.


Don't use DES-EDE.

Schneier all but disavowed _Applied Cryptography_ in _Practical Cryptography_.

Schneier's reputation as a cryptanalyst is, as even he might concede at this point, somewhat outstripping his actual career.

Bcrypt is part of the academic literature; the people who wrote it are both renowned.

You can make your same critique about any other crypto construction; maybe the OCB block cipher mode is unsafe! After all, Bruce Schneier didn't write it!


Bcrypt has been around for ten years. No one has broken it yet. Is this perfect? No. But then again, we don't know that the implementations you would pick for MD5, SHA-1, etc. are perfect either. You take the best you can get.


You're speculating about the existence of a flaw that there is no evidence to believe could exist. Just because some crypto constructions are not as sound as naive theory suggests does not mean all constructions are flawed, or even capable of being flawed.

bcrypt is not an encryption algorithm. It doesn't "protect" your users' data, so there's no reason to trust or not trust it with their data.


Yes there is. A hash function does protect a user's password.

If you don't believe me, consider this hash function

    H(A) = (A >> 1) & 0xFFFFFFFF

There. Hash function. It sends any input to a 32-bit value. Would you use it for your password though? No. I certainly would not.

Granted, that is an incredibly weak example, but it's one that's easy to see why it's weak, and thus why a hash function does protect a user's data.
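To make the weakness concrete, collisions fall out of that toy function immediately (a quick sketch):

```python
def weak_hash(a: int) -> int:
    # The toy "hash" from above: drop the low bit, keep 32 bits.
    return (a >> 1) & 0xFFFFFFFF

# Every even/odd pair collides, so finding a second preimage is trivial.
assert weak_hash(2) == weak_hash(3) == 1
```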


ok, that's a fair point. i guess i just believe bcrypt does a better job than that. :)


"A Future-Adaptable Password Scheme" by Provos and Mazieres. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.80....

An HTML version appears to be here: http://www.usenix.org/event/usenix99/provos/provos_html/node... Some of the links, particularly the main page, appear to be broken.


As with the website, that paper discusses only the speed of BCrypt. There is no cryptanalysis at all, and no indication that BCrypt was submitted anywhere for peer review of its cryptographic soundness.

This doesn't mean BCrypt is unsound. It does mean that I would want to see such analysis before using it.



