"Use an accepted key derivation function, such as PBKDF2, bcrypt or scrypt".
All the work described is encapsulated into the concept of a key derivation function. Telling people to go into the details is like telling people how to do encryption by describing how to implement AES and Twofish. Don't go there. Find a library that implements a key derivation function, and use it. PBKDF2, bcrypt and scrypt are all fine.
"Use an accepted key derivation function, such as PBKDF2, bcrypt or scrypt, with accepted parameters".
What is "accepted" changes over time, of course.
Search for the word "weaker"
Basically, bcrypt generates a random 128-bit salt for every hashed password, and the salt is stored inside the output string itself: a hash like "$2a$10$&lt;22-char salt&gt;&lt;31-char hash&gt;" carries the algorithm version, the cost factor, and the salt in fixed positions. To check an incoming password against the stored hash, bcrypt reads the cost and salt back out of those fixed positions, hashes the incoming password with them, and compares the result against the trailing 31 characters.
This means that you just whack the stored hash and the password the user entered into password_verify(...) and it will tell you whether you have a match; you don't have to keep track of the algorithm, cost, salt, and hash as four separate values.
The beauty of that is that when a stronger algorithm comes out, or you want to increase the cost factor of your passwords, you just change the hashing code, and you don't have to do anything fancy to avoid breaking all your old hashes.
You can also use the password_needs_rehash(...) function after a successful verification to see whether you should rehash the password to bring it up to the new level.
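For anyone outside PHP, the same pattern is easy to sketch with the Python standard library. This is an illustration of the idea, not PHP's actual internals; the function names, iteration count, and string layout are mine:

```python
import base64
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed cost; tune for your hardware


def hash_password(password, iterations=ITERATIONS):
    """Store algorithm, cost, and salt inside the hash string itself."""
    salt = os.urandom(16)
    dk = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return "pbkdf2_sha256$%d$%s$%s" % (
        iterations,
        base64.b64encode(salt).decode(),
        base64.b64encode(dk).decode(),
    )


def verify_password(password, stored):
    """Re-derive using the parameters baked into the stored string."""
    _, iters, salt_b64, dk_b64 = stored.split("$")
    salt = base64.b64decode(salt_b64)
    dk = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, int(iters))
    return hmac.compare_digest(dk, base64.b64decode(dk_b64))


def needs_rehash(stored, iterations=ITERATIONS):
    """Like password_needs_rehash(): true if stored cost is below policy."""
    return int(stored.split("$")[1]) < iterations


stored = hash_password("hunter2")
```

Because every parameter travels with the hash, raising the iteration count later only affects newly stored passwords, and old ones keep verifying until you rehash them.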
The compatibility library to use for PHP <5.5 is Anthony Ferrara's password_compat: https://github.com/ircmaxell/password_compat.
The most recent version of the ASP.NET Membership provider offers this hashing algorithm as a built-in option.
I can see that scrypt has an obvious advantage over both, since it is memory-intensive unlike the others. But when comparing bcrypt with pbkdf2, I can't seem to find any reason to prefer one over the other.
bcrypt is everyone's darling at the moment, but why so little love for pbkdf2? PHP 5.5, for example, bundles a very convenient password_hash() function that defaults to bcrypt, whereas hash_pbkdf2(), which is also new to PHP 5.5, won't even generate a salt for you.
Is pbkdf2 broken in some way, or do people simply not trust it because it's just a bunch of iterations of sha256?
On the other hand, has anyone discovered a weakness in bcrypt and/or the underlying blowfish algorithm?
On my computer, pbkdf2 with 20,000 rounds of sha256 takes about as much time as bcrypt with a cost factor of 10.
The reality is, bcrypt, scrypt, PBKDF2: throw a dart at them and you'll be fine no matter which you hit.
I'm also unclear on which "assumptions" you believed bcrypt depended. Could you be a little more specific, please?
So if you say bcrypt is as well vetted as PBKDF2, I'm going to defer to your judgment and update accordingly.
You could also feed the output of bcrypt into PBKDF2, but as far as I'm aware, nobody is recommending against straight PBKDF2/SHA256 for key derivation.
As I said in another post, use an adaptive hash function (bcrypt, scrypt, or pbkdf2). And if you do it with limited computing power, my preference is to use PBKDF2 over bcrypt (scrypt is of course out of the question).
The costs are based on what was considered 'modern hardware' in 2009.
Others, like SHA-256 and SHA-512, are designed for fast, cryptographically strong hashing. At least that's how I view it.
> If you have special security needs, enforce a minimum length of 12 characters and require at least two letters, two digits, and two symbols.
I hate when passwords require certain patterns. See http://i.imgur.com/hvyziAx.png from USCIS.
And I think there is one point missing: limit the maximum number of characters for the password. As shown by Django's password DoS vulnerability, don't let a 1 MB password reach your hash function.
I think people started doing this when passwords were typically short, to increase the search space. Although one unwanted side effect is that it also reduces the search space, since if you know that pattern requirement then you know not to brute force anything that doesn't have two letters, two digits etc.
But assuming a required pattern of a certain length does the job (whatever the system administrators think "the job" is), I wonder how much longer a password needs to be, with no pattern requirement, and be as difficult or more to crack than the shorter patterned password.
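A rough way to put numbers on that question, as a back-of-the-envelope estimate of my own (the policy of two letters, two digits, and two symbols is assumed for illustration):

```python
import math
import random
import string

# Each printable-ASCII character adds log2(94) ~ 6.55 bits of keyspace,
# while a pattern policy on an 8-char password only shaves off a few bits,
# so one or two extra unconstrained characters more than compensate.
CHARS = string.printable[:94]  # the 94 printable non-whitespace ASCII chars


def keyspace_bits(length):
    return length * math.log2(len(CHARS))


def satisfies(pw):
    """Assumed policy: at least 2 letters, 2 digits, and 2 symbols."""
    return (sum(c.isalpha() for c in pw) >= 2
            and sum(c.isdigit() for c in pw) >= 2
            and sum(not c.isalnum() for c in pw) >= 2)


random.seed(0)
hits = sum(satisfies("".join(random.choices(CHARS, k=8)))
           for _ in range(100_000))
fraction = hits / 100_000         # share of 8-char strings the policy allows
lost_bits = -math.log2(fraction)  # bits an informed attacker can skip
print("8 chars unconstrained: %.1f bits" % keyspace_bits(8))
print("policy keeps %.0f%% of that space (~%.1f bits lost)"
      % (fraction * 100, lost_bits))
```

Under these assumptions the constraint costs only a few bits, so an unconstrained password one character longer is already harder to brute force than the patterned one.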
The worst I have seen is from USCIS. See this image: http://i.imgur.com/hvyziAx.png
Good luck with that!
What annoys me most is when systems have a maximum password length which is something on the order of 10-12 characters. It's needlessly reducing the search space.
This may help: http://xkcd.com/936/
"In Django 1.5.5, we’ve reverted this change and instead improved the speed of our PBKDF2 algorithm by not rehashing the key on every iteration."
Incidentally, bcrypt does have a 56-byte limit, which I understood to be a Blowfish limitation (you can use longer bcrypt passwords by doing a SHA256 pass first)
This is incorrect; bcrypt has a 72-character limit. Try this in python:
import bcrypt, sys

# bcrypt silently truncates its input at 72 bytes, so inputs that agree
# on their first 72 bytes produce identical hashes.
salt = b"$2a$10$2TmO7iAhRfimvNwvpBn.7e"
print(bcrypt.hashpw(sys.argv[1].encode(), salt))

$ python pass.py 12345678901234567890123456789012345678901234567890123456
$ python pass.py 123456789012345678901234567890123456789012345678901234567
$ python pass.py 123456789012345678901234567890123456789012345678901234567890123456789012
$ python pass.py 1234567890123456789012345678901234567890123456789012345678901234567890123

The first two inputs (56 and 57 characters) produce different hashes, so there is no 56-byte cutoff; the last two (72 and 73 characters) produce identical hashes.
True, but: Beware! Imagine you're doing bcrypt in C, and the first sha256 output byte is a 0-byte. bcrypt() sees the NUL / empty password and produces a hash trivially breakable by anyone aware of this issue. :-)
So I'd recommend doing Base64 on the sha256 hash, if you're using C and you want such long passwords with bcrypt. But plain bcrypt will give the best results.
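The sha256-then-Base64 workaround can be sketched as follows (a hedged illustration; `bcrypt_prehash` is my own name, not a standard API):

```python
import base64
import hashlib


def bcrypt_prehash(password: str) -> bytes:
    """Pre-hash with SHA-256, then Base64-encode so the intermediate
    value can never contain a NUL byte that a C bcrypt() would treat
    as end-of-string."""
    digest = hashlib.sha256(password.encode("utf-8")).digest()
    # 44 ASCII bytes, comfortably under bcrypt's 72-byte truncation limit
    return base64.b64encode(digest)


prehashed = bcrypt_prehash("a" * 200)  # arbitrarily long passwords are now safe
# feed `prehashed` to your bcrypt library, e.g. bcrypt.hashpw(prehashed, salt)
```

Base64 output is drawn from a 64-character alphabet that excludes NUL, which is exactly why it sidesteps the C string pitfall described above.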
Hopefully the above illustrates the general wisdom of avoiding DIY crypto of any kind.
Maybe it's changed, but the paper that defined bcrypt specifically states:
"Finally, the key argument is a secret encryption key, which can be a user-chosen password of up to 56 bytes (including a terminating zero byte when the key is an ASCII string)."
"Note that much confusion on the web about key lengths with bcrypt ('72' vs. '56' bytes) comes from the fact that there are TWO algorithms called 'bcrypt' which both happen to use Blowfish. One is an encryption algorithm, and the other is a hash algorithm. While they share a common core component, the purposes are thereby entirely different. For the sake of the bcrypt HASHING/key-strengthening algorithm (which we care about now), the 72-byte input parameter is in no way a theoretical problem at all, even for non-Latin UTF-8 based passphrases which eat up 3 bytes per unicode point. The reason is because the text-based passphrase itself needs to 'somehow' be converted into 18 words of 32-bits each anyway. (If we were _encrypting_ then the 56 bytes limit for THAT algorithm would come into play.) Even if you restrict your attention to ASCII it would not be ideal to simply convert ASCII code points to a (zero-padded) juxtaposed stream of 32-bit chunks anyway, because, well for one thing, you would be throwing away entropy due to not using the upper 1-bit range in each char, not to mention the range of unavailable non-printable ASCII characters."
I think this is actually a great idea. Imagine someone steals your database with 100,000 user accounts and wants to find a few users who used the password "password123". They could easily take each salt and calculate password hashes in parallel, and would probably find at least one user account using that password. However, if the hashes were also encrypted with an unknown key, then none of your users' passwords could ever be cracked.
I'm actually surprised this is the first time I read about encrypting password hashes. I think the Rails gem "devise" should do this by default.
if you store the user_salt and the output of bcrypt(code_pepper + known_pass + user_salt) in your db, guessing the pepper requires comparing the bcrypt() output of every random pepper. you are as unlikely to brute force it as the plaintext pass of other users.
this type of pepper is essentially a mechanism for key stretching 
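For concreteness, a minimal sketch of that kind of pepper (names and parameters are mine; PBKDF2 from the Python standard library stands in for bcrypt so the example is self-contained):

```python
import hashlib
import hmac
import os

# The pepper lives in application config or an env var, never in the
# database, so a DB-only breach doesn't reveal it.
PEPPER = os.environ.get("APP_PEPPER", "demo-pepper-do-not-use").encode()


def peppered_hash(password: str, user_salt: bytes,
                  iterations: int = 100_000) -> bytes:
    # Mix the pepper in via HMAC before the slow per-user hash.
    peppered = hmac.new(PEPPER, password.encode(), hashlib.sha256).digest()
    # Stand-in for bcrypt: PBKDF2 keeps this sketch runnable from the stdlib.
    return hashlib.pbkdf2_hmac("sha256", peppered, user_salt, iterations)


salt = os.urandom(16)
stored = peppered_hash("hunter2", salt)
```

As the comment above says, an attacker with only the database has to guess the pepper through the slow hash, which is the key-stretching effect being described.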
Seriously, this is a terrible mindset when dealing with security. Stop it.
If it's just added (but not counted), then it'd be like asking for a password (XXX) and always prepending something to it before hashing (saltyXXX) -- at which point -- what value are you getting out of it?
I don't believe an /uncounted/ common salt is harmful - but I don't see any use on its face.
Private User Data should not be changed by many things. That is "PRIVATE" and therefore should be under tighter control than the rest of your data.
Don't try to outsmart the likes of cpercival and tptacek by inventing your own goofy hashing scheme. Just use bcrypt or scrypt as they suggest, the reasons for which they have explained repeatedly on this site for years.
My "Goofy" scheme is based on the practices which Microsoft, LinkedIn, and FaceBook all use. You know which of those use Bcrypt? Zero.
Maybe you shouldn't blindly follow things people say.
Source? The typical benchmark is 8ms for a generation of a bcrypt hash on a modern machine.
> My "Goofy" scheme is based on the practices which Microsoft, LinkedIn, and FaceBook all use. You know which of those use Bcrypt? Zero.
Again, source? PBKDF2 and scrypt would be acceptable alternatives and both can take just as much time to compute as bcrypt (or more!) depending on how many rounds are set.
The rest of your advice in these comments (esp. your salt 'mechanism') is wildly off-base and in many cases, actively promotes worse security without any evidence to back up your claims.
The gold standard if you are worried about DoS and need security vs. stolen db right now is to encapsulate everything inside an HSM and have a symmetric algorithm. Unfortunately HSMs today are shit -- $30k, slow, a hassle, etc. I'm testing some cheap stuff to do this cheaply and sanely (USB connected, etc.).
If you can scale out your frontends, use bcrypt/scrypt/pbkdf2 with a reasonable work factor.
Thankfully, rate-limiting login/registration forms is pretty straightforward (although the metric choice, not so much) and is good practice. If you can HTTP 429 them before they can actually submit a password to your service you're mitigating a large part of the problem.
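A minimal token-bucket sketch of that kind of rate limiting (all parameters are assumptions; a real deployment would persist state in something like Redis rather than a dict):

```python
import time


class LoginRateLimiter:
    """Refuse (HTTP 429) before doing any expensive hashing."""

    def __init__(self, max_attempts=5, window_seconds=60.0):
        self.capacity = max_attempts
        self.refill_rate = max_attempts / window_seconds  # tokens per second
        self.buckets = {}  # key (e.g. client IP) -> (tokens, last_timestamp)

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        tokens, last = self.buckets.get(key, (self.capacity, now))
        # Refill proportionally to elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_rate)
        if tokens < 1:
            self.buckets[key] = (tokens, now)
            return False  # caller responds 429 Too Many Requests
        self.buckets[key] = (tokens - 1, now)
        return True


limiter = LoginRateLimiter()
```

The point from the comment above carries through: the limiter check is a dictionary lookup and a couple of multiplications, so it runs long before bcrypt ever gets invoked.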
>My "Goofy" scheme is based on the practices which Microsoft, LinkedIn, and FaceBook all use.
I don't want to be sarcastic here, but you're seriously going to cite LinkedIn? Really?
And your excuse is "people who disagree with me in an unrelated argument cite LinkedIn, so I'm doing it too"?
These two statements contradict each other.
One problem with this guide, though: some experts think it makes sense to add extra computation to make the hash more expensive to compute, for example a hash loop that rehashes 10,000 times or something similar.
In the past, I've found the following php tutorial really well written, and comprehensive:
Lesson learned, never reuse passwords.
The only one that I can think of is that an eventual MITM can't sniff your password and then try to reuse it with other sites (as he only has a hash specific to one site, and not your password). Is that all?
The main reason we encourage people to use key stretching algorithms on the server is that, if an attacker gets access to a database of password digests that aren't very strong, they can trivially be reversed.
Doing this key stretching or "password scrambling" on the client side simply moves the processing burden from the server to the client. There is nothing less secure or less useful about it.
Until such time as these things are readily available, recommending that people do client-side hashing is absolutely going to result in trivially poor implementations.
You might want to consider that if these problems were as trivial as you seem to believe, there would already exist a library vetted by cryptographers to do exactly that.
Any way you look at it, there's no good reason to use a fresh SecureRandom instance on every call.
Anyone fancy taking a look to see if I understood correctly?
EDIT: Specifically, should the PBKDF2 "key", not be used for something (like HMAC) rather than being used as the hash itself?
hash, err := bcrypt.GenerateFromPassword([]byte(password), 10)
err = bcrypt.CompareHashAndPassword(hash, []byte(password))
It seems like a red flag that it took more than 50 lines of code to do this simple thing, though.
Using the output of 1,000 rounds of PBKDF2 as an input to HMAC wouldn't get you any extra security.
As a matter of fact, the article you reference even says so: " “Don’t Hash Secrets” is not always entirely necessary. In the password example, you can hash a password as long as you salt it correctly".
Using a HMAC may be useful since you can add an extra secret that's only known to a separate server or a HSM. The original article describes that in the section titled "Impossible-to-crack Hashes: Keyed Hashes and Password Hashing Hardware".
I'd like to find the same thing for encrypting all the data of users in a DB.
If you have a password recovery page, couldn't attackers use that to test usernames?
I've seen tons of sites that tell you "invalid username or password" but will happily tell you if the username is registered if you try to register it. It adds no security and just inconveniences users.
Some sites will tell you "Email not on account."
I've seen a few that say "We've emailed you instructions on resetting your password."
A. Your Salt should ALWAYS include a feature you have to store anyway about the user that would be unique to your service. Use multiple if you need to. This applies no matter what Hash mechanism you use. Also include a calculated value that isn't stored.
UserName + PassWord + User ID + CreationDate + <Numerals from a Base64 encode of the users first and last name>
Don't put all of this in the same database, so that a Data Breach won't give an attacker all the data they need.
No Rainbows for that combination. If a user changes their user name update their hash.
This is why in good services you have to provide your password when changing your username even when you are logged in.
B. To Validate the User Name:
Step 1 is always: check that the password meets your password criteria. If you require that passwords are 6-24 characters and contain one upper, one lower, and one digit, then make sure the password has those before you do anything else. No database lookups for bad passwords. No running bcrypt on a 4-megabyte section of post data. This is how you keep denial-of-service attacks from crushing your service.
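The "cheap checks first" ordering might look like this (a sketch; the policy and the byte cap are assumed for illustration, not prescribed):

```python
MAX_PASSWORD_BYTES = 1024  # assumed cap; blocks the 1 MB-password DoS


def precheck_password(password: str):
    """Return an error string, or None if the password is worth hashing."""
    # Cheapest guard first: never feed huge input to a slow hash.
    if len(password.encode("utf-8")) > MAX_PASSWORD_BYTES:
        return "password too long"
    if not (6 <= len(password) <= 24):
        return "length must be 6-24 characters"
    if not any(c.isupper() for c in password):
        return "need an uppercase letter"
    if not any(c.islower() for c in password):
        return "need a lowercase letter"
    if not any(c.isdigit() for c in password):
        return "need a digit"
    return None  # only now is it worth running bcrypt/PBKDF2
```

Every branch here costs microseconds, so an attacker spamming malformed passwords never reaches the expensive hashing step or the database.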
C. MD5 and SHA-1 are only bad because there are rainbow tables for them, and because you can calculate every possible combination of values for a user/pass quickly. If you know a single user/pass in the system, you can also work out the salt methodology, provided you have access to the source code.
I won't tell you to use either of these, but if you lose your source code, and two databases worth of data so they have user, pass, and user data, you did something else horribly wrong, or it was an inside job and nothing would have saved you. (bcrypt is expensive if you have millions of users a day logging in and not any more secure if you did the other things above.)
No sane system follows the "store the data in multiple places" design; you'd need to pull it all back together to validate the password anyway, and a breach is extremely likely to compromise a system that can do exactly that.
Similarly, creating a salt from a calculation rather than, say, taking 128 random bits, is not going to help, nor is depending on the secrecy of the source code ever a prudent choice. It will certainly produce less entropy than the latter, will not produce a different salt across password changes, and is just very, very hacky.
(Over generalization, but if people can get your source code basically there is zero you can do to keep them out)
Storing data in more than one store is a best practice.
Your random 128 bits have to be stored somewhere. That means an attacker needs only the database, not the source. In any large scale environment you likely have multiple machines. When a machine gets taken out of service at your hosting provider your database might still be on it when it hits the dumpster. If your source code is on another machine it doesn't matter.
The same is true of the Private User Data and the passwords. two locations means you need two breaches to do anything.
As I mentioned in my other post, it really doesn't. Using a calculated value means that the entropy of your salt is only as high as the values used to calculate it.
>Only PHP and JS/Node scripters assume that their source isn't safe in an attack.
So, the vast majority of websites.
>The rest of us assume that our compiled code is safe.
Why would you assume that?
>if people can get your source code basically there is zero you can do to keep them out
If people have your source code, and your passwords are stored properly, they aren't going to have any easier of a time cracking them.
>Your random 128 bits have to be stored somewhere. That means an attacker needs only the database, not the source.
The whole point of a salt is that it doesn't have to be secret.
A breach of your Data should not be a breach of your code.
Always remember: a secure system is secure even if the attacker can see your source code. Any work you do based on assuming the opposite is a waste of time.
At that point, your attacker has not only the ability to derive the salt and attack each password individually, but potentially the ability to generate rainbow tables for subsets of users with identical (or largely identical) derived salts.
The point is that at best, you're adding a term to the overall hardness of the attack. When you use something like bcrypt, you are multiplying the hardness of the attack. There is no comparison between what you've proposed with MD5 or SHA-1 and simply using bcrypt on a random salt, even if the attacker has your source code in the latter case but not the former.
What is your rationale for this? Is your concern that a random salt will accidentally collide? With a 128-bit salt, there is a one in a million chance of a collision occurring in a set of 26 quadrillion.
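The arithmetic behind that figure is just the birthday bound, which is easy to check:

```python
import math


def birthday_collision_prob(n, bits=128):
    """Approximate P(any collision) ~ n^2 / (2 * 2^bits) for n random
    values of the given bit length (valid while the result is small)."""
    return n * n / (2 * 2.0 ** bits)


n = 26e15  # 26 quadrillion salts
p = birthday_collision_prob(n)
print("collision probability: %.2g" % p)  # on the order of one in a million
```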
>UserName + PassWord + User ID + CreationDate + <Numerals from a Base64 encode of the users first and last name>
That is just incredibly low entropy compared to a random string of bits. If you do this and use (as you suggest is just fine further down) MD5 or SHA1, you are begging for disaster if your site is worth attacking.
>MD5 and Sha1 are only bad because there are rainbows for them
Wrong. MD5 and SHA-1 both have significant vulnerabilities.
>bcrypt is ... not any more secure if you did the other things above
Insanely, completely, ridiculously wrong. MD5 and SHA-1 are stupid fast. A GPU-based cracker can try 5.6 billion MD5 hashes per second, and 2.3 billion SHA-1 hashes per second. If you're using a long random salt, you might hold out for a while after a data breach before all of your passwords are cracked. If you derive a low-entropy salt the way you explained above, you'll be completely boned.
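To put those rates in perspective, here is the raw arithmetic. The MD5 and SHA-1 rates are the figures quoted above; the bcrypt rate is my own assumption for a moderate cost factor, included only to show the orders-of-magnitude gap:

```python
MD5_RATE = 5.6e9     # hashes/second, figure quoted above
SHA1_RATE = 2.3e9    # hashes/second, figure quoted above
BCRYPT_RATE = 1e5    # assumed rate at a moderate cost factor

keyspace = 36 ** 8   # every 8-character lowercase+digit password

for name, rate in [("MD5", MD5_RATE), ("SHA-1", SHA1_RATE),
                   ("bcrypt", BCRYPT_RATE)]:
    seconds = keyspace / rate
    print("%-6s %14.0f seconds (%.1f days)" % (name, seconds, seconds / 86400))
```

Under these assumptions the whole keyspace falls to MD5 in minutes, while bcrypt stretches the same search into months per salt, which is the multiplication-of-hardness point being made above.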
Although, the XOR comparison implementation seems flawed, as it will only compare min(supplied password length, existing password length), which would allow the attacker to identify the existing password's length by providing a sufficiently long password. Instead the existing password should be padded to the length of the supplied password in order to hide the length of the existing password.
My issue with the XOR comparison implementation is irrelevant as password hashing should use stretching (ex. PBKDF2, BCrypt, SCrypt) which means the hash of the supplied password and that of the existing password would be the same length.
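In practice you shouldn't hand-roll the comparison at all; Python, for example, ships a constant-time comparison in the standard library:

```python
import hmac


def hashes_equal(a: bytes, b: bytes) -> bool:
    """Constant-time equality check for fixed-length password digests.
    With stretched hashes (PBKDF2/bcrypt/scrypt) both inputs are the
    same length, so no length information leaks."""
    return hmac.compare_digest(a, b)
```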
The implementation of the PBKDF2 is also flawed.
Iterations: The recommended PBKDF2 iteration count has long surpassed 10,000 (it's closer to 100,000 now). See here: http://security.stackexchange.com/questions/3959/recommended...
The rule of thumb is that, given a sufficient-length salt, the number of iterations should take about 8 ms on the hardware it is running on.
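That rule of thumb is easy to turn into a calibration step (a sketch; the probe size and target duration are assumptions):

```python
import hashlib
import time


def calibrate_iterations(target_ms=8.0, probe_iterations=20_000):
    """Time one PBKDF2 run and scale the iteration count to hit the
    target duration on the machine this actually runs on."""
    salt = b"\x00" * 16
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", b"probe-password", salt, probe_iterations)
    elapsed_ms = (time.perf_counter() - start) * 1000
    # Never tune below the probe count, whatever the timing says.
    return max(probe_iterations, int(probe_iterations * target_ms / elapsed_ms))


print(calibrate_iterations())
```

Re-running this periodically (or at deploy time) keeps the work factor tracking hardware improvements instead of freezing it at whatever was "modern" when the code was written.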
Salt Size: The recommended salt size is 128-bits/16 bytes (not 24). See here: http://security.stackexchange.com/questions/17994/with-pbkdf...
That stackexchange question also recommends using SHA512, as it requires 64-bit arithmetic operations, which GPUs are supposedly not great at.
I believe I read that Stack Overflow and most big websites store about 24 bytes of the hash. The salt is generally prefixed to the hash and the combination is stored (e.g. 16-byte salt + 24-byte password hash = 40 bytes).
If I wanted to version a stored password, I'd simply use the first byte as an indexer to select a password hashing function instead of prefixing the hash with the number of iterations which seems non-portable.
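That version-byte idea can be sketched like so (the scheme names, iteration counts, and byte layout are mine, purely illustrative):

```python
import hashlib
import hmac
import os


def pbkdf2_v1(password, salt):
    return hashlib.pbkdf2_hmac("sha256", password, salt, 10_000)


def pbkdf2_v2(password, salt):
    return hashlib.pbkdf2_hmac("sha512", password, salt, 100_000)


# The leading byte indexes into this table, so old hashes keep
# verifying while new ones are written with the current scheme.
SCHEMES = {1: pbkdf2_v1, 2: pbkdf2_v2}
CURRENT_VERSION = 2


def hash_with_version(password: str) -> bytes:
    salt = os.urandom(16)
    digest = SCHEMES[CURRENT_VERSION](password.encode(), salt)
    return bytes([CURRENT_VERSION]) + salt + digest


def verify_any_version(password: str, stored: bytes) -> bool:
    version, salt, digest = stored[0], stored[1:17], stored[17:]
    candidate = SCHEMES[version](password.encode(), salt)
    return hmac.compare_digest(candidate, digest)
```

On a successful login against an old version you would rehash with the current scheme, which is the same upgrade path PHP exposes via password_needs_rehash().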
Even if you do everything right concerning the hashing of passwords, account security extends beyond passwords - such as alternative methods of authenticating (forgot password, secret questions, authentication tokens). OWASP is a great authority in regards to this: https://www.owasp.org/index.php/Password_Storage_Cheat_Sheet
The OWASP site's password storage guidance is particularly bad, and I recommend that developers avoid that page altogether. The page began as a litany of very bad advice (reversible encryption, salted hashes), and when it was pointed out that secure password hashes like scrypt were the right answer, the authors decided instead to "teach the controversy". OWASP is very political, and not managed by computer scientists.
† The post we're commenting on inaccurately says that the hypothetical timing attack it describes has been done before, but links to Boneh's TLS timing attack paper. My guess is that the password hash attack the author was considering has never been tried; among other things, it stipulates a password hash oracle for the attacker.