For the people not understanding why this discussion is a problem, in synopsis:
In order for a server to use encrypted passwords for authentication, the server must have access to the decryption key -- which means an attacker has access to your decryption key if they can compromise your box. Furthermore, if they don't get your decryption key, but they get your database dump -- they can then just brute your db and decrypt the passwords. They can do this because encryption is, by design, reversible. Encrypting passwords is almost as bad as storing in plain text.
Cryptographically hashing a password is the only way to go. When you cryptographically hash a password, using a known library that has been tested and written by actual cryptographers with passwords/authentication in mind (like bcrypt, etc -- no, sha's and md5's are not safe to use) - you get a one-direction algorithm in that a password goes in, is digested, and a fixed-length hash output comes out. There is no way to reverse this hash back into the original input. Adding salt(s) combats people making rainbow tables against your implementation, as they then would need to know not only which algorithm you are using, how many iterations it does, but also what your salt(s) are (sometimes even varied every so many iterations). This makes it almost impossible for someone to brute your leaked password database.
I'm a bit confused. This article says SHA-256 is a general purpose hash algorithm, but wikipedia says it's a cryptographic hash function. Is the wiki article wrong?
Both are correct. A cryptographic hash function is a hash function that is not invertible, resists tampering and collisions, and has other properties. Using cryptographic hash functions in certain cryptographic settings is appropriate. SHA256("My Super Secret Password") is not one of those settings.
SHA256 can be the basis of a scheme to securely store passwords. There are many different names for that:
* key derivation function
* secure password storage scheme
* password scrambler (PHK's term)
* (...)
A secure password storage scheme uses a cryptographic primitive like a cryptographic hash or a cipher to increase the time it takes to decrypt passwords. That's a good thing, because when it takes your user 0.5 seconds to go from "right password -> on disk hash", it'll take a prohibitively long time for an attacker to guess through passwords.
Okay, that makes sense. SHA-256 for 50 iterations, with salting: is that equivalent to bcrypt in the matter of slowing things down....is it good enough?
I'm not sure what their bcrypt difficulty is, but whatever it is: bcrypt makes it five or six orders of magnitude harder. And remember, bcrypt (like good KDFs) is tunable, so you could make that as hard as you want.
> As a result, the new cluster, even with its four-fold increase in speed, can make only 71,000 guesses against Bcrypt and 364,000 guesses against SHA512crypt.
If I bump up your password scheme by 4 orders of magnitude it's still under <.06 seconds on my machine. And that's with a ridiculously inefficient password generator I banged up in ten seconds.
Use bcrypt. Use scrypt. Use PBKDF2. Don't roll your own password stretching function, it's a recipe for disaster.
Thanks very much, all very clear now. I'll try to persuade the people who have a say in the matter...it's tricky when you're not really an expert yourself so you can't fully explain the matter to a perfect extent, and you're stuck saying "but people on the internet said..."
Half the problem is that quite a few standards/compliance groups don't impose the right requirements. PCI audits used to just check for "encrypted passwords."
> PCI audits used to just check for "encrypted passwords.
PCI is a joke. The self-fill audit: "Are you secure?" -- Uh... sure?
The various scanning companies are of differing quality too, adding to the difficulty in getting any real security improvements from the standard across all ecommerce shops.
If you mention Thomas Ptacek of Matasano Security (that's who tptacek is), they might be more inclined to listen. Sadly, my name is of little value and if they had heard of it, they'd probably assume malicious intent.
The problem with md5 and sha and such is that they are fast. bcrypt is useful because it takes a fixed amount of complexity which can be specified by the programmer (and later increased). It also uses salts to thwart rainbow tables. The complexity bit is important, though, because as computers get better, you can make it harder to reverse your passwords.
Also, the biggest reason any of this is a problem is because password reuse is rampant. You are being entrusted with the password to your user's life - their email, their bank account, their facebook account... treat it with respect.
Is there any reason most programmers should know any of this ?
The code should look more like this(and should use a standard library):
import Password_protector
P = Password_protector(date_of_first_use) #library upgrades algorithms every once in a while, only a single algorithm available per date. If there's a need for automatically upgraded hashing library should handle covertly.
protected_password = P.protect_password(password)
if P.compare_password(password, protected_password) == True ...
Or something similar. That at least should cover the basics - safely.
Quite simply, the reason programmers should know about this (if they're implementing a password system) is because the library you propose often doesn't exist, and when it does exist it's often implemented incorrectly.
In an ideal world, programmers shouldn't have to worry about this stuff. But this is not an ideal world.
I was under the impression SHA256 is valid...it's a cryptographic hash function that hasn't had weaknesses identified yet. 40 iterations of that + salting would suffice, no?
the SHA algorithms were designed to digest data as fast as possible -- which things like bcrypt purposefully slow it down plus can perform multiple iterations with the output of one iteration feeding as the input to another.
Because it has no work factor. It's uniformly fast, which makes it very easy to brute force. The general-purpose strong cryptographic hashes --- your SHA2s, SHA3s, and BLAKE2s --- are all amenable to fast implementation on GPU.
What you need is a password hash, not merely a cryptographic hash.
As noted above, this discussion prompted me to investigate my company's methods, a fairly major US website, and I found they're using SHA-256 for 50 iterations with salting. Does that slow things down sufficiently, or is that not comparable to the internal mechanism of bcrypt?
If you change 50 to (say) 1000 you are getting most of the benefits.
The big problem you are trying to solve is "our password database just leaked; how fast can attackers figure out the passwords?" Say you can run SHA in 1 microsecond, and 1000 iterations in 1 millisecond. Even though it's 1000x slower, that's probably an unnoticeable difference as far as your performance is concerned. (YMMV)
But, anyone trying to brute-force the passwords out of your database will now have to do 1000x the work.
BTW, none of this will help you much if your users choose passwords like "abc" or "password".
(IIRC scrypt lets you blow up in size as well as CPU, which makes GPUs impractical for attacking.)
Because a lot of passwords aren't particularly strong, and thus are feasibly crackable when you use something like SHA-256. You can test millions or billions of candidates per second on commodity hardware, and you'll be able to recover a lot of passwords.
The purpose of systems like PBKDF2, bcrypt, scrypt, etc. is to intentionally slow the whole thing down so that it takes on the order of one second to test a candidate. This is a minor burden for normal use, since you don't check passwords very often, but it makes it vastly harder for an attacker to crack a password.
Edit: just to put some numbers on this, it looks like high-end GPUs can brute-force about 2 billion SHA-256 hashes per second. That means that an eight-character alphanumeric password could be recovered from a SHA-256 hash in less than a day, on average. Add 20 non-alphanumeric symbols to your character set and you're still only looking at about six days. Eight alpha+num+symbol characters isn't a terribly great password anymore but it still shouldn't be recoverable if your password database leaks.
In order for a server to use encrypted passwords for authentication, the server must have access to the decryption key -- which means an attacker has access to your decryption key if they can compromise your box. Furthermore, if they don't get your decryption key, but they get your database dump -- they can then just brute your db and decrypt the passwords. They can do this because encryption is, by design, reversible. Encrypting passwords is almost as bad as storing in plain text.
Cryptographically hashing a password is the only way to go. When you cryptographically hash a password, using a known library that has been tested and written by actual cryptographers with passwords/authentication in mind (like bcrypt, etc -- no, sha's and md5's are not safe to use) - you get a one-direction algorithm in that a password goes in, is digested, and a fixed-length hash output comes out. There is no way to reverse this hash back into the original input. Adding salt(s) combats people making rainbow tables against your implementation, as they then would need to know not only which algorithm you are using, how many iterations it does, but also what your salt(s) are (sometimes even varied every so many iterations). This makes it almost impossible for someone to brute your leaked password database.