This is a great example of a bona fide awful trade press article: some guy with a column-inch quota sees a piece by another trade press hack who is in turn digesting a blog post by someone who is "allergic to command line utilities" writing up someone else's GPU password cracking tool. And so, strong passwords are useless!
Except that this has nothing at all to do with why passwords are bad. The problem GPU crackers exploit was solved over a decade ago with adaptive hashing. GPU crackers cannot crack scrypted passwords of any real complexity in tenable amounts of time.
I really hated that phrase 'allergic to command line utilities'. It reminds me of this odd attitude, which used to be a bit more common, that ignorance about computers is virtuous or amusing.
Bcrypt seems to have a big design oversight. Given a password stored with cost factor C1 that you would like to increase to cost factor C2, it seems to me that you have to have the plaintext password to do this. Thus, you have to wait until that user next provides the password, verify it, and then re-store it with cost factor C2.
That's just not true. To oversimplify a bit, let's say BCrypt is a function b(data, cost) that returns a hash.
Right now, I have b(password, c1) = hash_a stored in my database. To increase the cost to c1+c2, I just replace that with b(hash_a, c2) = hash_b in my database. To verify a user's login, I just compute b(b(password, c1), c2) and compare it to the hash_b I have stored.
Or you can wait until the user signs in again and create hash_b at login with the plaintext password after you validate it against hash_a. I believe that is how the bcrypt documentation recommends you do it.
I've read the usenix paper, and I'm reasonably sure I'm right. Care to elaborate on what I'm mistaken about? I (and a few other people I've talked to about schemes isomorphic to this about) may learn something new :-D
OK, so you can chain bcrypt in theory, but if you look at the API and how it stores rounds/salt, I don't think you'll enjoy implementing the code to do so. You can't just store hash_b, you'll need to store hash_a too. It's not worth it.
Basically, the assumption that bcrypt is just a function that returns a hash oversimplifies the detail of how salt is stored in practice.
I wrote a comment to the effect of "you can just chain bcrypt" but deleted it because I didn't want to wade into exactly this tarpit. If a client asked me how to upgrade the cost of their password hashes, I'd probably tell them to validate incoming plaintext and then create a fresh hash.
I agree: this is not a big deal in practice. bcrypt with a cost factor of 10 is so much slower than SHA1 that this is just not going to be an issue.
Seriously though, this is a design decision that cannot be easily solved with any non-reversible encryption. If you were able to increase the cost factor, the you'd open the algorithm to attacks that reverse that process.
Err, no, that's actually not true. Take the case of PBKDF2 -- if you apply 1000 rounds of PBKDF2 to password X to get X' then apply 1000 rounds to PBKDF2 to password X' to get X'', applying 2000 rounds of PBKDF2 to password X == X''. You could increase the cost factor (number of rounds) every time the password is used, for instance.
The finish step is irreversible, so you can't unwind it and add more work later. In practice though, even low cost factors for bcrypt are tons more work than MD5 or SHA. It's not a big deal.
I would say that, given how 'cheap', you could slowly increment the cost-factor over time. You could also ask a user to replace their password after a certain time period, or after the cost factor gets too 'low' - you can even allow the same password to be entered. Having a different cost-factor for every password is a bit of a logistical nightmare though.
The fact is that any significantly high cost factor is much better than md5, sha1 or any other normal hashing function, and should prevent password cracking for 5-10 years. This is a reasonable period of time.
Codahale's article is misleading with salts. Bcrypt can't also protect against brute-forcing or dictionary attacks. Please don't spread that link as an example of good password storage security.
Edit: Since I can't reply to poster. I don't see bcrypt using a unique salt for each password. It's the equivalent of exchanging a SHA256 + Same Salt for Bcyrpt + Built-in-Salt.
Notice how they aren't the same hash? And, how I didn't have to go mine pink salts from the Himalayas?
By spreading the notion that there is something wrong with Coda's article when you really aren't familiar with bcrypt, you're doing people a disservice. They are going to go use SHA1 with a big random salt instead, because they've heard of SHA1 before and it therefore sounds more secure. And they will thus be terribly vulnerable to brute-force attacks.
A preset list of salts would still show a different hash for 2 consecutive bcrypt executions. Your example proves nothing.
I'm not saying you're wrong. I'm saying you didn't demonstrate proof, so you shouldn't act like you did.
I'm extremely interested why bcrypt is much more secure than SHA1 + extremely long hash, so perhaps you could shed some photons on that?
Also, in order for bcrypt to generate a unique salt on two different computers, it needs a unique source of randomness. Obviously seeding by "current time" is insufficient, as this would generate the same salt for two different computers at the same time. So what does it use? Mac address of network card + current time?
I don't need to prove that bcrypt has a "preset list of salts". Read the paper. Or even just Coda's summary of the paper. What an extremely silly objection to raise. Why would any cryptosystem embed a preset list of nonces?
How frustrating to have to argue about a system with someone who clearly hasn't done any reading on it at all. It's a Usenix paper. It's not a EUROCRYPT paper full of math. It was written by systems programmers, then from the OpenBSD project. You think the OpenBSD programmers devised a password scheme with a "preset list of nonces"?
"MAC address of the network card plus the current time"? To get a random number?
[edit: palish deleted his comment; excerpts below for what I was responding to:
> Except random numbers aren't random. They only appear to be. If an online casino seeds its random number generator with current time, then you can do the same, and predict which cards it will deal.
> So here you are saying that bcrypt generates a random seed, and I'm thinking "how does bcrypt get around the aforementioned problem?"
> If you don't want to answer my specific question ("why is bcrypt stronger..."), then don't.
]
I'm neither tptacek nor a security expert, but I think I know bcrypt well enough to answer your questions. Apologies if anything I say isn't correct; someone correct me if that's the case.
Starting with the problem: SHA1+salts are well and good. The password isn't in plaintext, so I can't just read it. SHA1 has no real cryptographic attacks on it. The salts prevent rainbow table attacks.
Except there's a major flaw. SHA1 is fast. SHA1 is very fast.
When you hash your password with SHA1+salt (even if you use a 1024 bit salt) an attacker will get a GPU cluster and run billions of hashes per second until they bruteforce your password. Anything under eight characters will fall in under a day, easily.
Now, let's talk bcrypt. Based off of blowfish (which has a reasonably long key-schedule algorithm), bcrypt is designed to be very slow. Very very slow.
If you want to think of a very naive way of doing this, imagine my hashing function is simply
salt||SHA1(SHA1(SHA1(SHA1(...(SHA1(password||salt))))))
It is easy to see how this would take significantly longer than simply salt||SHA1(password||salt). Now there's some other details that you don't seem to be interested in, but that's the main idea. If you care to read more, see the paper.
This means that if an attacker would try to bruteforce even the simplest passwords, it would take significantly longer. Look at the chart on page 11, for example. The nice thing about this is that you can, over time, adjust it to take longer and longer as computers get faster and faster.
So, bcrypt isn't some fancy thing which picks salts in some random way (although it does get salts; as for how, I don't know, but /dev/random would work just fine).
If you care about it and want to look in to it more, scrypt is also interesting -- it also uses as much memory as it can to stop simply throwing just CPU time at a problem.
The algorithm requires (cost, salt, password) as input, and the output hash includes the cost and salt. When generating a hash for a new password you should of course be choosing this salt randomly, and when checking a password be using the stored cost and salt.
In particular see the Implementation section:
An important requirement of any bcrypt implementation is that it exploit the full 128-bit salt space. OpenBSD generates the 128-bit bcrypt salt from an arcfour key stream, seeded with random data the kernel collects from device timings.
This article is missing a bunch of facts, namely that NTLM passwords have been broken for 2 to 3 years long before the invention of cheap GPUs. Rainbow tables are a far more effective way to attack NTLM hashes. Cracking passwords is trivially parallelizable, if you let your hashes fall into the hands of the enemy then it's only a matter of money.
The fundamental issue is that of salting combined with not disclosing your password database. If you're using Windows you should enable the options for SSHA and disable the NTLM / LM hashes. Instead of using a password use a passphrase. I use 3 levels of password and at the top level you're looking at ~22 characters which contains characters outside the normal ASCII range so an attack is looking at ~65536 combinations instead of ~256. I doubt many attackers include the snowman character. :) (and no that char isnt in my password)
Running the math quickly yields a time to crack in excess of the lifetime of the universe, it would probably be easier to generate a hash collision than crack the password.
GPUs are rendering MD5 and other weak hashing methods useless when a database is compromised. Using bcrypt [properly] can reduce the attempts the fast computer on the planet could make to a handful per second.
His recommendation of not using salts is very misleading and should not be quoted due to poor advice as Bcrypt is also useless against dictionary and brute force attacks.
The aim of password storage security is to prevent pre-computed hashing reasonably enough. No amount of password hashing even if it takes 1000 years to hash a password if your users use '12345678' as a password.
Salting protects against pre-computed hashes (rainbow tables [0]). Otherwise I can pre-compute many bcrypt hashes and then quickly look up the password.
All secure password storage schemes are randomized. If you insist on using the word "salt", then just know that bcrypt, scrypt, and the PBKDFs build the salt in. You cannot "pre-compute many bcrypt hashes and quickly look up passwords". So: if you're using a scheme that requires you to think about salts, you are virtually certain to be using an insecure storage scheme.
And, dictionary and brute force attacks are the entire problem bcrypt is designed to address. Go back to that article that you didn't really read, follow the first Usenix link, and read the Usenix paper it cites.
Bcrypt uses a 128 bit salt. The code I provided above didn't show it, because the library simply generated it for me.
The bcrypt libraries that require users to explicitly provide a salt are themselves doing their users a (slight) disservice, by making it seem as if there was a reasonable option for salt generation other than simply using a CSPRNG.
> The code I provided above didn't show it, because the library simply generated it for me.
... and this is another reason why people should use one of the bcrypt() implementations for their language* - the API makes doing the wrong thing difficult or impossible.
The problem is, I can onlybchoose the password hashing algorithm when I control the server side code. 99.9% of the time I don't, and I don't even get to find out how the people controlling the server side code are storing by password.
Quick quiz: how does HN store passwords? How about apple.com? Facebook? Twitter? Gawker? Sony Pictures? Perlmonks?
A suitably paranoid person would assume any password you've given to somebody else's website is compromised.
Do not reuse passwords ever. Don't even think "I'm just trying this new web service out, I'll use the same password I always use when trying new things out", 'cause you'll end up forgetting to upgrade that password when something gradually changes from "some new and maybe interesting website" to "somewhere that is an important part of my online reputation" or "somewhere I've given authority to charge my credit card".
(and, even more importantly, don't ever fall into that trap when developing server side code "Oh, I'll just do a quick login method for testing that just stores cleartext passwords, I'll fix that bit up before we go live..." Because one day that code _might_ end up live...)
Kind of proving my point. You'd hope somewhere that calls itself "hackernews" and which may as well have an echo-bot set up quoting that codahale article about bcrypt any time someone writes "password" - might be handling passwords "properly". SHA1 isn't really all that different from Gawkers MD5 password ballsup...
If HN ever gets 0wn3d, anybody with a (password) dictionary word or 8-9 char or less password will almost certainly be exposed. (even if the database doesn't get broken into, anybody who gets enough shell on the webserver to read /tmp/shash will be able to see cleartext passwords passing through...)
There isn't a database per se - usernames and password hashes are stored in a file. If someone managed to read that file, it would not be difficult for them to obtain most of the passwords.
It may store it as plaintext as well, without HTTPS it does not really matter. I mean it is easier to obtain passwords with tools like firesheep instead of trying to break into DB.
On the other hand, it is not like you have something secret there. The worst thing that could happen is someone pretending to be you.
That said I'd love to have https option.
There are at least a few obvious solutions. One is to use long passphrases. Another is to stop trying to memorize your passwords, write them down on a postit, encrypted file, etc. The main threat to your security is not someone digging through the postits in your wallet, it is the hash being stolen from someone's insecure database.
The bad part is that some sites (even banks!) force you to limit the number or type of characters. In one ridiculous case, I was forced to use an 8 character pwd with only lowercased letters.
The solution is to use tools like KeePass or 1Password to allow maximum length passwords (randomly generated also) wherever possible.
I recently signed up for the new payment system for my local water and gas company...
you must use your phone number as your user name, and the password, to my suprise, can only be digits, up to 8 characters.
Pretty horrible. I wonder what level of idiot designs these systems.
That "new payment system" is quite possibly interfacing with, say, mainframe code written in the 70s. That's often the source of banks' ridiculous account restrictions.
I can't see any legitimate technical reason to restrict passwords.
In my example, they could put a sensible account system on the front end, and do whatever they need to in the back end to link it to my phone number and a random 8 digit number, if that's what they need.
Same with banks. They can set up a real user name and password from you, store it in a modern system, and link it to the older system behind the scenes. Perhaps there's some arcane legislation that prevents this, but it seems unlikely.
It's likely that they are storing passwords in plain text or encrypted. If my website can accept forum posts in UTF-8 of 256+ characters, then people can log in with UTF-8 256+ char passwords.
Since this has attracted the attention of some very knowledgeable people, can anyone suggest a good textbook/site/paper discussing password security and encryption? Obviously I can google 'computer security' and find something, I just wondered if there was a well regarded standard reference (ala K&R or Art of Electronics). I don't understand the majority of the techniques being discussed here but would quite like to.
Sorry, key derivation functions are kind of a back alley in crypto research. Your best bets are Colin's scrypt paper and the original bcrypt paper; the bcrypt paper is on the link at Coda Hale's "How To Store Passwords" article, which you can find plastered all over this thread, and Colin's paper is at:
Implementers should be aware that NTLM does not support any recent cryptographic methods, such as AES or SHA-256. It uses cyclic redundancy check (CRC) or message digest algorithms ([RFC1321]) for integrity, and it uses RC4 for encryption. Deriving a key from a password is as specified in [RFC1320] and [FIPS46-2]. Therefore, applications are generally advised not to use NTLM.<68>
> Yes, but once you are in to the point where you have the db...what else are you going after?
There are many conceivable attacks that would result in a read-only dump of password hashes. An administrator's account would be useful in such an administration, turning your read-only access to read-write access.
I would hope the db didn't store credit card info or at least encrypted it somehow. Granted that's a pretty big hope. But if you can crack the passwords in the db you have, chances are those passwords will work on other sites. People often use one password for everything.
SQL injection, social engineering a password, somebody's crappy "secret question," insecure PHP, and hundreds of other ways. Closing off known avenues of attack and still mitigating against a leak with peer-reviewed math is what you have to do.
Hopefully, the hashes are physically secure and accessed in a way that would require a great deal of human effort to steal. If that is the case, then yes, you can control the login process.
The technique in the article is relevant when one has the hashes and wants the plaintext (and according to some here, still easy to mitigate then) - if you're guessing a web login, different game.
NTLM passwords? Meh. An awful benchmark for evaluating modern passwords. Who uses NTLM besides Windows, honestly? (Apparently the person who seemed to have not liked this comment, very quickly, heh)
I'd be curious about the timeliness of GPU cracking a 8 character, alphanumerical mixed case password hashed with SHA1?
Like MD5, SHA1 is designed to be fast. For that matter, so is SHA256, and all the SHA-3 candidate functions. The right way to think about this is that all the strong crypto hash functions are unsuitable for use by themselves as password hashes.
Cryptographers have a mechanism for securely turning human-readable text into secret material: key derivation functions. All the modern KDFs are adaptive, meaning that they come with a "pain" dial that you can turn to force attackers to spend more time executing crypto code paths and less time comparing results.
You can build decent KDFs out of functions like SHA1, but if hardware GPU cracking is the future, you're best off with something like Colin Percival's "scrypt", which was explicitly designed to be memory-hard (requiring lots of state to compute) and hard to parallelize:
I know you're a domain expert and in this thread you've suggested both bcrypt and scrypt. Is there any advantage to one over the other? From my POV it seems that bcrypt has more implementations and seems better-known (in a good way).
This is a great example of a bona fide awful trade press article: some guy with a column-inch quota sees a piece by another trade press hack who is in turn digesting a blog post by someone who is "allergic to command line utilities" writing up someone else's GPU password cracking tool. And so, strong passwords are useless!
Except that this has nothing at all to do with why passwords are bad. The problem GPU crackers exploit was solved over a decade ago with adaptive hashing. GPU crackers cannot crack scrypted passwords of any real complexity in tenable amounts of time.
Once again: http://codahale.com/how-to-safely-store-a-password/