Hacker News
Salted Password Hashing – Doing it Right (crackstation.net)
249 points by axelfontaine on Feb 23, 2014 | 148 comments



This article looks great for anybody who wants to know the details. But "doing it right" is much simpler, and doesn't really need an article:

"Use an accepted key derivation function, such as PBKDF2, bcrypt or scrypt".

All the work described is encapsulated into the concept of a key derivation function. Telling people to go into the details is like telling people how to do encryption by describing how to implement AES and Twofish. Don't go there. Find a library that implements a key derivation function, and use it. PBKDF2, bcrypt and scrypt are all fine.


You might want to add "with a high number of iterations or work factor," at least. PBKDF2-HMAC-SHA-256 is just as useless as 2xSHA-256 if you just do one iteration.
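With a real KDF the iteration count is an explicit, tunable parameter. A minimal sketch with Python's stdlib PBKDF2 (the counts here are illustrative; tune them so one hash costs tens of milliseconds on your hardware):

```python
import hashlib
import os

salt = os.urandom(16)  # random per-password salt

# One iteration is barely better than a single SHA-256.
weak = hashlib.pbkdf2_hmac("sha256", b"hunter2", salt, 1)

# A high iteration count is what actually slows attackers down.
strong = hashlib.pbkdf2_hmac("sha256", b"hunter2", salt, 100_000)

print(len(strong))  # 32-byte derived key
```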


Yes - this is reasonable. I'd generalise this to:

"Use an accepted key derivation function, such as PBKDF2, bcrypt or scrypt, with accepted parameters".

What is "accepted" changes over time, of course.


There is some use in this article though, because it gives some good examples of terrible crypto. Oddly enough the "combine different hash functions!" meme isn't just limited to amateurs: "An Secure Hash Algorithim (SHA) hash of the received challenge string, the peer challenge string, the session identifier, and the MD4-hashed version of the user's password."

http://technet.microsoft.com/en-us/library/cc957983.aspx


I worked with a group once that stored both MD5 and SHA1 hashes of passwords. Their reasoning was that it was WAY harder to get a collision with TWO hash functions.


It's a little premature to be recommending scrypt. There have been some posts on openwall suggesting it may be weaker than bcrypt, although it is also still a work-in-progress. I'd hold off until it is more battle-hardened before either recommending it or using it.


Link for that? All I find is mentions of its weakness when used with very small amounts of memory (like Litecoin's silly decision to go with 128KB).


Sorry for the late reply, the post I was thinking of (and didn't revise before posting) is http://www.openwall.com/lists/crypt-dev/2012/09/02/1

Search for the word "weaker"


Thanks - that's actually the post I was thinking of. Considering on most modern machines scrypt will likely tune itself to use 512MB, a 1MB buffer's pretty small, but it would be interesting to know where the cutoff for significantly-better-than-bcrypt might be. I expect most sites could throw 16MB at logins without much difficulty (as suggested further in that thread).


Indeed. This article is much longer than the docs for some bcrypt library in $LANGUAGE.


That's a lot of code in the PHP example! To use bcrypt you can just do this:

  password_hash($password, PASSWORD_BCRYPT)
From version 5.5 that is, but you can get a compatibility library for older versions.


I was wondering how bcrypt can be secure while not requiring a salt to be passed in, since the article devotes so much time to salts. I found the answer here: http://stackoverflow.com/questions/6832445/how-can-bcrypt-ha....

Basically, bcrypt randomly generates a long, probably-unique salt for every hashed password, and stores it inside the output string along with the hash (the format is "$2a$" + cost + "$" + a fixed-length 22-character salt followed immediately by the 31-character hash). To check an incoming password against the stored string, bcrypt reads the cost and the fixed-length salt back out, hashes the incoming password with them, and compares the result against the hash portion.
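That store-the-salt-with-the-hash pattern is easy to sketch with Python's stdlib PBKDF2 (the record format and parameters below are my own invention, not bcrypt's actual encoding):

```python
import hashlib
import hmac
import os

def hash_password(password: str, iterations: int = 100_000) -> str:
    salt = os.urandom(16)
    dk = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    # Everything needed for verification lives in one string.
    return f"pbkdf2_sha256${iterations}${salt.hex()}${dk.hex()}"

def verify_password(password: str, stored: str) -> bool:
    _algo, iterations, salt_hex, dk_hex = stored.split("$")
    dk = hashlib.pbkdf2_hmac("sha256", password.encode(),
                             bytes.fromhex(salt_hex), int(iterations))
    # Constant-time comparison guards against timing side channels.
    return hmac.compare_digest(dk.hex(), dk_hex)

record = hash_password("correct horse")
print(verify_password("correct horse", record))  # True
print(verify_password("wrong guess", record))    # False
```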


The PHP function produces output compatible with the crypt() function, whose output includes not only the salt and the hash, but also the type of algorithm and the bcrypt cost.

This means that you just whack the hash and the password the user entered into password_verify(...) and it will tell you if you have a match - you don't have to keep track of the four parameters.

The beauty of that is that when a stronger algorithm comes out, or you want to increase the cost factor of your passwords, you just change the hashing code and you don't have to do anything fancy to avoid breaking all your old hashes.

You can also use the password_needs_rehash(...) function once you verify to see if you should rehash the password to bring it up to the new level.


In general, using the defaults (mcrypt_create_iv() using /dev/urandom for the salt, and bcrypt with a work factor of 10, as specified by PASSWORD_DEFAULT in the current implementation) along with password_needs_rehash() will keep you up to date with no "retouch" necessary. When the default changes, password_needs_rehash() returns true, and you then rehash with the new defaults. All you need to do is make the db column wider than necessary for the current defaults to put maintenance off into the future. (The only real problem is that the hash function truncates passwords at 72 characters at the moment, which will force you to pre-hash the user value with something like SHA256 for very long passphrases.)
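Outside PHP you can replicate the password_needs_rehash() check in a few lines; a hedged sketch assuming a self-describing record format analogous to crypt()'s (the field layout here is hypothetical):

```python
def needs_rehash(stored: str, current_iterations: int = 200_000) -> bool:
    # Stored format assumed: "algo$iterations$salt$hash" -- a made-up
    # layout, analogous to crypt()-style self-describing hashes.
    algo, iterations, _salt, _hash = stored.split("$")
    return algo != "pbkdf2_sha256" or int(iterations) < current_iterations

print(needs_rehash("pbkdf2_sha256$100000$aa11$bb22"))  # True
print(needs_rehash("pbkdf2_sha256$200000$aa11$bb22"))  # False
```

On a successful login, a True result means you have the plaintext password in hand and can transparently re-store it under the new parameters.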


I don't think it's bcrypt itself that does it. The underlying implementation of bcrypt chooses to do it for the user. Or does the actual bcrypt standard say the implementation has to generate one for the user?


It seems that bcrypt does not specify that salts are randomly generated, but it does specify that the salt is stored with the hash. The “Bcrypt Algorithm” section of the paper at https://www.usenix.org/legacy/publications/library/proceedin... shows salts to be an input to the algorithm, and also describes them being concatenated with the hash. Random generation of the salt must be just a de-facto standard for bcrypt APIs. It’s true that that feature is orthogonal to the hashing function, and so libraries for other hashing algorithms could easily copy it.


I think concatenating the salt with the hash is required, but generating it FOR the user is orthogonal to the hashing function. I would be surprised if the paper actually says the implementation is required to generate one - that would show the cleverness of the inventors, but also a potential weakness (an implementation may use a weak random source, though the counter-argument is that a user-supplied salt can be weak too).


Great summary, thanks! I've looked at a few blog posts on bcrypt and they were so convoluted I left more confused than when I began.


This is bang on. PHP devs need to stop rolling their own hashing methods.

The compatibility library to use for PHP <5.5 is Anthony Ferrara's password_compat: https://github.com/ircmaxell/password_compat.


Agree - same applies for the other languages at the bottom of the page. He should remove those code samples and just link to the bcrypt function/library for each language.


Disagree. Windows .net devs shouldn't use Bcrypt. The .net bcrypt port hasn't been verified for .net, so they should be using pbkdf2 instead, as per the article.

The most recent version of the ASP.net Membership provider offers this hashing algorithm as a built in option.

http://stackoverflow.com/questions/481160/is-bcrypt-a-good-h...


Has the BCrypt code for all other languages been formally verified?


I assume this is that .NET library function Rfc2898DeriveBytes?


Question: Is there any reason to prefer bcrypt over pbkdf2, or vice versa?

I can see that scrypt has an obvious advantage over both, since it is memory-intensive unlike the others. But when comparing bcrypt with pbkdf2, I can't seem to find any reason to prefer one over the other.

bcrypt is everyone's darling at the moment, but why so little love for pbkdf2? PHP 5.5, for example, bundles a very convenient password_hash() function that defaults to bcrypt, whereas hash_pbkdf2(), which is also new to PHP 5.5, won't even generate a salt for you.

Is pbkdf2 broken in some way, or do people simply not trust it because it's just a bunch of iterations of sha256?

On the other hand, has anyone discovered a weakness in bcrypt and/or the underlying blowfish algorithm?

On my computer, pbkdf2 with 20,000 rounds of sha256 takes about as much time as bcrypt with a cost factor of 10.


Yes, all other things being equal, you should prefer bcrypt, which forces more effort onto attackers than PBKDF2 given similar parameters.

The reality is, bcrypt, scrypt, PBKDF2: throw a dart at them and you'll be fine no matter which you hit.


The long and short of it is that PBKDF2's assumptions are better vetted, but bcrypt is stronger if its assumptions are correct.


Exactly what do you mean by this?


That PBKDF2 has been more extensively studied, but it has lower memory requirements. So the chances of a surprising vulnerability being found in bcrypt are higher, but failing that, it is the better option.


In what way has PBKDF2 been more extensively studied? It's the product of a standards effort, not academic research. bcrypt was introduced in a refereed Usenix paper. I'm not suggesting bcrypt is particularly rigorous either, but I'm unclear on what "study" you're referring to for PBKDF2.

I'm also unclear on which "assumptions" you believed bcrypt depended. Could you be a little more specific, please?


I'm just going by what I've read elsewhere, and I don't have citations on hand. I'm not a security expert, I just try to listen to security experts.

So if you say bcrypt is as well vetted as PBKDF2, I'm going to defer to your judgment and update accordingly.


What's maybe more relevant is that PBKDF2 is explicitly called out by name as the Proper Thing in various standards, so it's less work to use it in environments where people care about those standards.


bcrypt has a fixed output size, which makes it less than perfect for generating encryption keys from passwords. That's the primary function of PBKDF2, while bcrypt is mostly used for simply comparing hashes.

You could also feed the output of bcrypt into PBKDF2, but as far as I'm aware, nobody is recommending against straight PBKDF2/SHA256 for key derivation.
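That difference shows up directly in PBKDF2 APIs as an explicit output-length parameter; e.g. in Python's stdlib:

```python
import hashlib
import os

salt = os.urandom(16)
# dklen asks PBKDF2 for a key of whatever size the cipher needs
# (32 bytes for AES-256 here); bcrypt's output size is fixed.
key = hashlib.pbkdf2_hmac("sha256", b"passphrase", salt, 100_000, dklen=32)
print(len(key))  # 32
```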


This is what I use to justify my need.

http://security.stackexchange.com/a/33684/9897

As I said in another post, use an adaptive hash function (bcrypt, scrypt or pbkdf2). And if you're doing it with limited computing power, my preference is PBKDF2 over bcrypt (scrypt is of course out of the question).


Do you know where that table came from originally? I'd be interested to read a fuller report so if/when I present those figures to anyone else I can answer any pertinent questions that come up.


That table can be found in the paper "Stronger Key Derivation via Sequential Memory-Hard Functions"[0] by Colin Percival[1]. This is the paper that introduced scrypt.

The costs are based on what was considered 'modern hardware' in 2009.

[0]: https://www.tarsnap.com/scrypt/scrypt.pdf [1]: https://news.ycombinator.com/user?id=cperciva


Which cryptographically strong hashing algorithm should you use? It depends on the situation. I think an adaptive one (PBKDF2, bcrypt, or scrypt) should be used all the time: PBKDF2 if you need to hash on devices with limited computing power, bcrypt or scrypt on standard servers.

Others, like SHA-256 and SHA-512, are meant for fast, cryptographically strong hashing. At least that's how I view it.

> If you have special security needs, enforce a minimum length of 12 characters and require at least two letters, two digits, and two symbols.

I hate when passwords require certain patterns. See http://i.imgur.com/hvyziAx.png from USCIS.

And I think there is one point missing: limit the max number of characters for the password. As shown by the Django password DoS issue, don't let a single password take up 1MB.


> I hate when passwords require certain patterns.

I think people started doing this when passwords were typically short, to increase the search space. Although one unwanted side effect is that it also reduces the search space, since if you know that pattern requirement then you know not to brute force anything that doesn't have two letters, two digits etc.

But assuming a required pattern of a certain length does the job (whatever the system administrators think "the job" is), I wonder how much longer a password needs to be, with no pattern requirement, and be as difficult or more to crack than the shorter patterned password.
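The back-of-envelope arithmetic is simple: a password of a given length over a given character set has length * log2(charset) bits of entropy, so a modest increase in length beats a larger character set. For example:

```python
import math

def entropy_bits(charset_size: int, length: int) -> float:
    # Search space is charset_size ** length; entropy is its log2.
    return length * math.log2(charset_size)

# 8 chars over ~95 printable ASCII vs. 12 lowercase-only chars:
print(round(entropy_bits(95, 8), 1))   # 52.6
print(round(entropy_bits(26, 12), 1))  # 56.4
```

So a 12-character lowercase-only password already out-searches an 8-character password drawn from the full printable set, pattern requirements or not.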


I think a good password would be something like, instead of writing the letter "a" in ascii, you use "a" in another encoding (I forget what that's called). I supposed no one ever thought of cracking such password yet.

The worst I have seen is from USCIS. See this image: http://i.imgur.com/hvyziAx.png


> I supposed no one ever thought of cracking such password yet.

Good luck with that!


I wonder where the no $ comes from.


>I think people started doing this when passwords were typically short, to increase the search space.

What annoys me most is when systems have a maximum password length which is something on the order of 10-12 characters. It's needlessly reducing the search space.


Similarly, when they disallow ordinary ascii characters like punctuation and spaces.


Well, then they're harder to format when printing. :)


"I wonder how much longer..."

This may help: http://xkcd.com/936/


Django reverted the password limit change:

"In Django 1.5.5, we’ve reverted this change and instead improved the speed of our PBKDF2 algorithm by not rehashing the key on every iteration."

Incidentally Bcrypt does have a 56 byte limit which I understood to be a Blowfish limitation (you can use longer bcrypt passwords by doing a SHA256 pass first)

edit: typo


Incidentally Bcrypt does have a 56 byte limit

This is incorrect; bcrypt has a 72-character limit. Try this in python:

  import bcrypt, sys

  salt = "$2a$10$2TmO7iAhRfimvNwvpBn.7e"
  print bcrypt.hashpw(sys.argv[1], salt)
Nearly identical 56 and 57-char inputs produce different hashes:

  $ python pass.py 12345678901234567890123456789012345678901234567890123456                                                                     
  $2a$10$2TmO7iAhRfimvNwvpBn.7e701F3mp2Z7fpuY/4lu6vUUkrlMEmMou
  $ python pass.py 123456789012345678901234567890123456789012345678901234567
  $2a$10$2TmO7iAhRfimvNwvpBn.7ejJV9jpN0Ahp2RmNAdUGaRxRZndRAs9y
Nearly identical 72 and 73-char inputs produce identical hashes:

  $ python pass.py 123456789012345678901234567890123456789012345678901234567890123456789012                                                     
  $2a$10$2TmO7iAhRfimvNwvpBn.7e9w2f6/fKQ2QBu1eaIXp4A1WheruxtGK
  $ python pass.py 1234567890123456789012345678901234567890123456789012345678901234567890123
  $2a$10$2TmO7iAhRfimvNwvpBn.7e9w2f6/fKQ2QBu1eaIXp4A1WheruxtGK
you can use longer bcrypt passwords by doing a SHA256 pass

True, but: Beware! Imagine you're doing bcrypt in C, and the first sha256 output byte is a 0-byte. bcrypt() sees the NUL / empty password and produces a hash trivially breakable by anyone aware of this issue. :-)

So I'd recommend doing Base64 on the sha256 hash, if you're using C and you want such long passwords with bcrypt. But plain bcrypt will give the best results.
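A sketch of that workaround in Python (bcrypt itself omitted; the point is the NUL-safe pre-hash):

```python
import base64
import hashlib

password = b"a very long passphrase, " * 5  # 120 bytes, over bcrypt's 72

# Raw SHA-256 output is 32 arbitrary bytes; roughly 1 digest in 8
# contains a NUL, which a C string API would treat as end-of-password.
digest = hashlib.sha256(password).digest()

# Base64 output is NUL-free and, at 44 bytes, fits under the 72-byte limit.
prehashed = base64.b64encode(digest)
print(len(prehashed))  # 44
```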

Hopefully the above illustrates the general wisdom of avoiding DIY crypto of any kind.

edit: code formatting


EDIT: I was attributing the 56 byte limit incorrectly to hashing, see the reply I made to this comment for what the limit actually seems to be.

Maybe it's changed but the paper which defined Bcrypt specifically states:

"Finally, the key argument is a secret encryption key, which can be a user-chosen password of up to 56 bytes (including a terminating zero byte when the key is an ASCII string)."

https://www.usenix.org/legacy/events/usenix99/provos/provos_...

which is a subpage of:

https://www.usenix.org/legacy/events/usenix99/provos/provos_...


To clear up my own confusion, the 56 bytes only comes in when using bcrypt for encrypting vs hashing, however the "72 character" password limit for hashing is actually 18 words of 32-bits.

http://www.gossamer-threads.com/lists/wiki/wikitech/430719#4...

"Note that much confusion on the web about key lengths with bcrypt ('72' vs. '56' bytes) comes from the fact that there are TWO algorithms called 'bcrypt' which both happen to use Blowfish. One is an encryption algorithm, and the other is a hash algorithm. While they share a common core component, the purposes are thereby entirely different. For the sake of the bcrypt HASHING/key-strengthening algorithm (which we care about now), the 72-byte input parameter is in no way a theoretical problem at all, even for non-Latin UTF-8 based passphrases which eat up 3 bytes per unicode point. The reason is because the text-based passphrase itself needs to 'somehow' be converted into 18 words of 32-bits each anyway. (If we were _encrypting_ then the 56 bytes limit for THAT algorithm would come into play.) Even if you restrict your attention to ASCII it would not be ideal to simply convert ASCII code points to a (zero-padded) juxtaposed stream of 32-bit chunks anyway, because, well for one thing, you would be throwing away entropy due to not using the upper 1-bit range in each char, not to mention the range of unavailable non-printable ASCII characters."


I have a question. Is a salting solution like hash_function(salt_hardcoded_in_app + password_string + user_unique_salt) bad? Does adding a hardcoded salt reduce the security of the app? Why? Thanks for any info on that.


A hardcoded value like that is actually known as a 'pepper' -- cute right? If you google 'salt and pepper cryptography' you'll get your answers on why pepper is, at best, a good addition to salt, at worst useless. Sorry, Sunday morning, feeling lazy, no link to give :-D


Thank you, I looked at a few pages. Looks like pepper is useless most of the time. According to this article http://blog.ircmaxell.com/2012/04/properly-salting-passwords... a better solution is encrypt(hash(salt + password))


What is the encrypt() for? Anyone who recovers your DB will recover the key.


Not necessarily, if the database server is separated from the app server. If only the db is compromised, an attacker would first have to crack the encryption key, before being able to crack the password hash. But apart from the encryption key being really long, I believe that there's also no easy way of knowing if you cracked it, because all output would just look like a hash.

I think this is actually a great idea. Imagine someone steals your database with 100,000 user accounts, and wants to find a few users who used the password: "password123". They could easily take each salt and calculate password hashes in parallel, and would probably find at least one user account using that password. However, if the hashes were also encrypted with an unknown key, then none of your user's passwords could ever be cracked.

I'm actually surprised this is the first time I read about encrypting password hashes. I think the Rails gem "devise" should do this by default.


I had the same thoughts about that encryption idea. In theory, getting passwords is just a matter of time; encrypted hashes buy you more time and pose an additional challenge for the attacker.


In the article, AES is the encryption used to protect the resulting hash.


The article doesn't really justify this. What benefit does a block cipher with a fixed key offer over not using one at all?


i usually add pepper like you're describing prior to bcrypting. my thoughts being that it adds a bit of security in case your db gets compromised but not your server/code. the chance that one happens without the other may be slim.


They don't need to compromise your server/code. They just need to be able to create an account (with a password they know) before they dump your database. Then it's just a matter of brute-forcing the 'pepper.'


yes, but the pepper still must be brute forced, and to do so requires that each attempt is run through bcrypt.

if you store the user_salt and the output of bcrypt(code_pepper + known_pass + user_salt) in your db, guessing the pepper requires comparing the bcrypt() output of every random pepper. you are as unlikely to brute force it as the plaintext pass of other users.

this type of pepper is essentially a mechanism for key stretching [1]

[1] http://en.m.wikipedia.org/wiki/Key_stretching
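For what it's worth, a common way to mix in a pepper without relying on plain string concatenation is to HMAC the password with the pepper before the slow hash runs; a stdlib sketch with PBKDF2 standing in for bcrypt (names and parameters are illustrative):

```python
import hashlib
import hmac
import os

# Application-wide secret kept in code/config, not in the database.
PEPPER = b"example-pepper-keep-me-out-of-the-db"

def peppered_hash(password: str, salt: bytes,
                  iterations: int = 100_000) -> bytes:
    # HMAC binds the pepper to the password before the slow KDF runs,
    # avoiding the ambiguities of plain concatenation.
    keyed = hmac.new(PEPPER, password.encode(), hashlib.sha256).digest()
    return hashlib.pbkdf2_hmac("sha256", keyed, salt, iterations)

salt = os.urandom(16)
print(len(peppered_hash("hunter2", salt)))  # 32
```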


Any time you, as a non-cryptographer, deviate from established cryptographic solutions based on "your thoughts being", you are at best achieving no extra security, and at worst seriously weakening your security.

Seriously, this is a terrible mindset when dealing with security. Stop it.


If the 'common-salt' length is counted as part of the total salt length, then an attacker who knows the common part of the salt could more easily generate a rainbow table (as half the salt is predictable).

If it's just added (but not counted), then it'd be like asking for a password (XXX) and always prepending something to it before hashing (saltyXXX) -- at which point -- what value are you getting out of it?

I don't believe an /uncounted/ common salt is harmful - but I don't see any use on its face.


I just added a comment basically saying the same thing. But you should use data that you would have that an attacker would not. First and Last Name as stored in your system, User ID (like 128912) and User Creation date are good choices. These should be in a separate database from the hashes.


Making the password comparison depend subtly on other fields seems like a bad idea from a maintainability perspective. Lots of reasons to update name, etc., which might not happen anywhere near the password code, and might be done automatically.


If the fact that a field might change some where else in your code is a "maintainability" issue, you have bigger issues and shouldn't be doing anything with passwords.

Private User Data should not be changed by many things. That is "PRIVATE" and therefore should be under tighter control than the rest of your data.


You misunderstand, it seems. The user's password shouldn't need to be re-entered and rehashed every time a non-password field is changed. Suppose you have a new developer on your team who writes a new user profile page that can edit user first and last name without asking for the user's password again. Suppose you want to create an admin interface that allows your customer support team to change a user's name or e-mail address without seeing or knowing the user's password.

Don't try to outsmart the likes of cperciva and tptacek by inventing your own goofy hashing scheme. Just use bcrypt or scrypt as they suggest, the reasons for which they have explained repeatedly on this site for years.


Bcrypt offers no increase in security, and can add huge delays in your sign-on, user creation, and open you up to computational attacks by doing repeated sign-ins.

My "Goofy" scheme is based on the practices which Microsoft, LinkedIn, and FaceBook all use. You know which of those use Bcrypt? Zero.

Maybe you shouldn't blindly follow things people say.


> ... and can add huge delays in your sign-on

Source? The typical benchmark is 8ms for a generation of a bcrypt hash on a modern machine.

> My "Goofy" scheme is based on the practices which Microsoft, LinkedIn, and FaceBook all use. You know which of those use Bcrypt? Zero.

Again, source? PBKDF2 and scrypt would be acceptable alternatives and both can take just as much time to compute as bcrypt (or more!) depending on how many rounds are set.

The rest of your advice in these comments (esp. your salt 'mechanism') is wildly off-base and in many cases, actively promotes worse security without any evidence to back up your claims.


DoS of the high-work-factor algorithms actually is a thing. The bad part is the attacker doesn't have to wait for a response, so he can actually initiate a bunch easily.

The gold standard if you are worried about DoS and need security vs. stolen db right now is to encapsulate everything inside an HSM and have a symmetric algorithm. Unfortunately HSMs today are shit -- $30k, slow, a hassle, etc. I'm testing some cheap stuff to do this cheaply and sanely (USB connected, etc.).

If you can scale out your frontends, use bcrypt/scrypt/pbkdf2 with a reasonable work factor.


> DoS of the high-work-factor algorithms actually is a thing. The bad part is the attacker doesn't have to wait for a response, so he can actually initiate a bunch easily.

Thankfully, rate-limiting login/registration forms is pretty straightforward (although the metric choice, not so much) and is good practice. If you can HTTP 429 them before they can actually submit a password to your service you're mitigating a large part of the problem.


>Bcrypt offers no increase in security

[citation needed]

>My "Goofy" scheme is based on the practices which Microsoft, LinkedIn, and FaceBook all use.

I don't want to be sarcastic here, but you're seriously going to cite LinkedIn? Really?


I don't like to, but every time I point out why node sucks, the node guys cite LinkedIn, so now I include it.


That makes absolutely no sense. You're already arguing from authority while citing less reliable authorities than the security experts who recommend bcrypt, and then you're going one step further by including a site that hurts your argument by having already had a massive data breach that resulted in leaked passwords.

And your excuse is "people who disagree with me in an unrelated argument cite LinkedIn, so I'm doing it too"?


You do recall that LinkedIn had its passwords stolen semi-recently.


You misunderstand. You shouldn't have the password in memory. The password should basically never exist beyond the fractional second that you are validating it.


I know that. Which is why the hash should include just the password, and not a bunch of other data that would require having the password in plain text again just to let the user change something in their profile.


> But you should use data that you would have that an attacker would not. First and Last Name as stored in your system, User ID (like 128912) and User Creation date are good choices.

These two statements contradict each other.


I personally think it's a good idea to use both. As someone else stated, at worst it's useless. For me, the best case scenario is, someone hacks the DB, now they have the salt for every user, but they still don't know the common salt because it lives in the script and not the DB. Just my 2 cents.


Seems to be down, here's the google cached version: http://webcache.googleusercontent.com/search?q=cache:S76Yg0P...


It's always shocking to me how many people just assume that a login/password system should store everything in plain text. This was a pretty good explanation of the right way to hash.

One problem with this guide, though: some experts think it makes sense to add extra computation to make the hash more difficult to compute, for example a hash loop which rehashes 10,000 times or something similar.

In the past, I've found the following php tutorial really well written, and comprehensive:

http://forums.devshed.com/php-faqs-and-stickies-167/how-to-p...


That's what key derivation functions and hashing algorithms like bcrypt do: they have a number of rounds, or a cost factor, which adds a non-trivial amount of computation time. Fast enough that it doesn't impact "human time" but slow enough that "computer time" is impacted.
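A toy illustration of what such a loop looks like. This only shows the idea of key stretching; a real application should use PBKDF2/bcrypt/scrypt, which do this (and more) correctly:

```python
import hashlib

def toy_stretch(password: bytes, salt: bytes, rounds: int = 10_000) -> bytes:
    # Iterated hashing: each round forces an attacker to redo the work.
    state = salt + password
    for _ in range(rounds):
        state = hashlib.sha256(state + password).digest()
    return state

print(len(toy_stretch(b"hunter2", b"16-byte-salt....")))  # 32
```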


I remember the incident when the League of Legends server got hacked and the attackers got all the passwords in plaintext. After that my google account almost got hijacked because I reused passwords - lesson learned :D (To be honest, I didn't expect one of the largest online games to have such incompetent programmers) This guide is a must for everyone who handles user passwords.


That has happened to me as well, when Dropbox leaked passwords (they claimed only emails leaked... bullshit). I reset the Dropbox password right away, but someone had access to my Steam account with the same password and purchased a game. I was able to get a refund, but my password probably got onto some forum; every other week or so someone attempts to log in to my Steam account (Steam now requires a token sent by email to log in from a different computer).

Lesson learned, never reuse passwords.


Node.js password hash module, node version >= 0.11.11. https://github.com/fundon/pbkdf2


How is this better than the built-in pbkdf2 in node?

http://nodejs.org/api/crypto.html#crypto_crypto_pbkdf2_passw...


The Node.js stable version (0.10.x) cannot change the digest algorithm. http://nodejs.org/api/crypto.html#crypto_crypto_pbkdf2_passw...


Not many people are going to be running the unstable branch of node in production.


Question: the article suggests hashing passwords client-side too. Does this have any security advantages?

The only one I can think of is that a potential MITM can't sniff your password and then try to reuse it on other sites (as he only has a hash specific to one site, not your password). Is that all?

"If you are worried about the computational burden, but still want to use key stretching in a web application, consider running the key stretching algorithm in the user's browser with JavaScript." makes it sound like I can use a lighter hashing solution server-side if I use key stretching client-side (at least to me; non-native speaker here, so probably I'm just reading that wrong).


It gives some level of protection against buggy server-side code leaking the plain-text password. I remember a case where a site logged all request information (including the plain-text password) directly to a log file. Oops!


IMO Hashing on the client side is a terrible idea. There is now no way for the application to check the complexity of the password on the server side. Relying on client side controls to do the same is almost useless.


Why do you think it's useless? Not transmitting the password in plain text is clearly a win, and you can do whatever checks you need on the client side just as easily. (If a user wants to fool whatever checks you have, they can do that whether or not you have server-side checks--it happens all the time.)


It's fairly useless: If an attacker can sniff your connection he can just submit the hashed password he sniffed to the server. The only win you have is that the attacker can't see the password itself, a potential bonus when people reuse passwords.

However, if you can sniff the connection you can probably alter it and inject javascript that submits the clear-text password to the attacker.


We are just talking about scrambling the password. You would still be using e.g. SSL/TLS or TLS-SRP for the transport. There's nothing "useless" about it.


Well, useless as in "added implementation complexity and no gain". It won't hurt much either, but you could be spending your time better elsewhere, for example in tuning your SSL/TLS-setup.


No. There's absolutely a gain. A big one, too. The main reason it isn't done today is not that it isn't useful, but that it's too slow to run e.g. bcrypt on the client without native browser APIs. WebCrypto will change that by adding PBKDF2 as a module accessible from JavaScript.

The main reason we encourage people to use key stretching algorithms on the server is that, if an attacker gets access to a database of password digests that aren't very strong, they can trivially be reversed.

Doing this key stretching or "password scrambling" on the client side simply moves the processing burden from the server to the client. There is nothing less secure or less useful about it.


There is, if you don't again hash the passwords on the server. This is why new password-hashing proposals like [Catena](http://eprint.iacr.org/2013/525.pdf) include an official "server relief" mode where the majority of the hashing is done on the client side, but there's still a final server-side transformation step.
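A rough sketch of that split (an illustration of the idea only, not Catena's actual construction):

```python
import hashlib
import os

def client_stretch(password: str, salt: bytes) -> bytes:
    # The expensive key stretching runs on the client.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)

def server_finalize(client_digest: bytes) -> bytes:
    # Cheap final server-side step: the DB stores this value, so a
    # leaked row cannot simply be replayed as the login credential.
    return hashlib.sha256(client_digest).digest()

salt = os.urandom(16)
stored = server_finalize(client_stretch("hunter2", salt))
print(len(stored))  # 32
```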

Until such time as these things are readily available, recommending that people do client-side hashing is absolutely going to result in trivially poor implementations.

You might want to consider that if these problems were as trivial as you seem to believe, there would already exist a library vetted by cryptographers to do exactly that.


I agree with you about SSL, client-side hashing seems like a solution for a very mature website with clientèle who strongly value their privacy. Something like LavaBit comes to mind (RIP). But "useless" is too strong a word. It is really trivial to use stolen credentials on other services, and effective to boot. But by hashing on the client side in addition to the server, you "double" the amount of effort required to perform the attack.


Password complexity is not a critical validation. The client-side code could easily perform the same validation before hashing. If someone hacked the client-side code and intentionally disabled the validation, then they could submit a weaker password, and that's not a big deal. It's a huge win if the plaintext password doesn't leave the browser.


The java version instantiates a new SecureRandom instance every time it creates a new salt. While I don't think this hurts the randomness of the salt, it's an unnecessary performance hit.


As far as I know, that behavior is best practice. An instance of SecureRandom does not automatically receive new entropy over time. If you desperately need to keep using that same instance, you have to keep seeding it.


You can use a SecureRandom implementation that uses the underlying entropy pool for every call, or you can use an ordinary implementation of SecureRandom and call getSeed(numBits) instead of next(numBits), or, as you mentioned, you can call random.setSeed(random.getSeed(NUM_BITS)) every once in a while. Finally, if you are on Linux, the default SecureRandom provider will mix urandom with the PRNG output on every call to nextBytes, so there's no need to reseed at all.

Any way you look at it, there's no good reason to instantiate a fresh SecureRandom instance on every call.


As this is a tutorial, the snippet will likely be copy-pasted into production code running on God knows what OS by someone who can't imagine why there could be a problem with seeding from the system clock. Using a fresh instance allows the code to be portable, noob-friendly, and concise, which are important traits for its context.


...or you can consider going password-less: https://news.ycombinator.com/item?id=7283650


SRP [1] is a very nice alternative for storing password hashes in the database. As an added bonus, it prevents MITM attacks which steal user passwords before they reach your servers. Meteor [2] JS framework successfully uses SRP for authentication.

[1] http://srp.stanford.edu

[2] http://docs.meteor.com/#meteor_loginwithpassword


Browsers don't support SRP, which means that if you want to use it, you'll need to deliver it via Javascript. SRP is particularly tricky to get right; a substantial fraction of the SRP implementations we've looked at have bugs that cough up authentication bypass. SRP verifiers are easier to crack than bcrypt hashes (though harder than trivial salted hashes). And finally, it's not much of a win; you get the same or better protections simply by running your login over HTTPS.


Would you mind expanding on that a bit? I mean, what kind of authentication bypasses did you find? I am using SRP (for non-web stuff) and it would be good to know/check whether I am affected by those problems.


The part about having to use HMAC for keyed hashing is (hopefully) going to be obsolete soon. SHA-3 is explicitly designed to be safe under the H(key+message) usage.


Just had a quick port of his example to Go.

https://bitbucket.org/pkg/passwd/src/27f970fbcc05c0c6c258600...

Anyone fancy taking a look to see if I understood correctly?

EDIT: Specifically, should the PBKDF2 "key", not be used for something (like HMAC) rather than being used as the hash itself?


Alternatively, Go's bcrypt implementation has a very simple API:

    hash, err := bcrypt.GenerateFromPassword([]byte(password), 10)

    err = bcrypt.CompareHashAndPassword(hash, []byte(password))

There's also a bcrypt.Cost(hash) function that you can leverage to re-hash passwords with a higher cost.



1,000 rounds of PBKDF2 is less secure than just using scrypt, and probably less secure than just using bcrypt. But the code looks correct as written.

It seems like a red flag that it took more than 50 lines of code to do this simple thing, though.

> EDIT: Specifically, should the PBKDF2 "key", not be used for something (like HMAC) rather than being used as the hash itself?

Using the output of 1,000 rounds of PBKDF2 as an input to HMAC wouldn't get you any extra security.


There are many things wrong with this article. You can't hash secrets like this because hash functions are iterative. You should be using HMAC. Here's a good explanation http://benlog.com/2008/06/19/dont-hash-secrets/


The article you're referencing speaks about length-extension attacks. The attack allows you to extend a plaintext and calculate the appropriate hash if you know the original hash and the length of the original plaintext. These properties do not apply to passwords: you could extend the original password and calculate the appropriate hash in theory, but that doesn't help the attacker, since the password hash is already known and in the database.

As a matter of fact, the article you reference even says so: " “Don’t Hash Secrets” is not always entirely necessary. In the password example, you can hash a password as long as you salt it correctly".

Using an HMAC may be useful, since you can add an extra secret that's only known to a separate server or an HSM. The original article describes that in the section titled "Impossible-to-crack Hashes: Keyed Hashes and Password Hashing Hardware".


This is a great article though I knew most of what is said.

I'd like to find the same thing for encrypting all the data of users in a DB.


If an attacker already has access to your user's table (as is implied by most of the situations in this article), then they probably have your whole database. What else is there to lose? Why would an attacker bother to use any method to match passwords to hashes? Don't they already have everything they could want?


No, they do not. Maybe an important person's password hash gets leaked and they want to extract the password to check other sites for reuse to get into other systems. Now you say reuse is dumb, and I agree, yet a lot of people do it because they're lazy. This is just an additional stepping stone to make a malicious hacker's life harder.


They have everything they could want from your site, but presumably your site isn't a financial institution. The goal of most hackers is to crack relatively insecure or low-value databases in order to leverage the results for hacking user accounts in better-protected or higher-value databases.


For anybody who doesn't need this much detail, there's a great intro to this topic in the Web Dev course on Udacity: https://www.udacity.com/course/viewer#!/c-cs253/l-48666069/m...


>Always display a generic message like "Invalid username or password." This prevents attackers from enumerating valid usernames without knowing their passwords.

If you have a password recovery page, couldn't attackers use that to test usernames?


Not really. If you mean actual password recovery, then I'd consider the site broken anyhow. If you mean password reset - any site worth its salt in security displays the same message regardless. Usually something like "If the account was found, an email will be sent shortly." I ran into one recently that simply said "An email has been sent" regardless of what was entered, which threw me off a bit as I was entering the wrong email address for that site and wondering why I wasn't getting their email....


How about a registration page instead? Are you going to not tell the user whether the registration worked or not? I've never seen a site that does that.

I've seen tons of sites that tell you "invalid username or password" but will happily tell you if the username is registered if you try to register it. It adds no security and just inconveniences users.


Respectfully, just because hackers can drive a truck through your app doesn't mean they should get to fly a plane, too. It's also easier on your users if you put a CAPTCHA in your registration form rather than your login form. This can mitigate large-scale fishing for usernames. Granted, that hacker who is going after a single poor soul who uses your service won't even feel the speedbump. But s/he is the toughest adversary, and the least common.


Or just trying to register an account? Are you just going to say "that registration may or may not have worked, check your email". I think you should just accept that usernames are public unless your specific instance really needs to hide them.


Possibly.

Some sites will tell you "Email not on account."

I've seen a few that say "We've emailed you instructions on resetting your password."


This isn't "Doing it right".

A. Your Salt should ALWAYS include a feature you have to store anyway about the user that would be unique to your service. Use multiple if you need to. This applies no matter what Hash mechanism you use. Also include a calculated value that isn't stored.

Example:

UserName + PassWord + User ID + CreationDate + <Numerals from a Base64 encode of the users first and last name>

Don't put all of this in the same database, so that a Data Breach won't give an attacker all the data they need.

No Rainbows for that combination. If a user changes their user name update their hash.

This is why in good services you have to provide your password when changing your username even when you are logged in.

B. To Validate the User Name:

Step 1 is always: check that the password meets your password criteria. If you have a requirement that passwords are 6-24 characters and contain one upper, one lower, and one digit, then make sure the password has those before you do anything else. No database lookups for bad passwords. No running bcrypt on a 4-megabyte section of post data. This is how you keep denial-of-service attacks from crushing your service.

C. MD5 and SHA-1 are only bad because there are rainbow tables for them, and because you can calculate every possible combination of values for a user/pass quickly. If you know a single user/pass in the system, you can also work out the salt methodology, provided you have access to the source code.

I won't tell you to use either of these, but if you lose your source code and two databases' worth of data, so they have user, pass, and user data, then you did something else horribly wrong, or it was an inside job and nothing would have saved you. (bcrypt is expensive if you have millions of users a day logging in, and not any more secure if you did the other things above.)


IMHO this is all unsound advice.

No sane system follows the "store the data in multiple places" design; you'd need to pull it all back together to validate the password anyway, and a breach is extremely likely to compromise a system that can do exactly that.

Similarly, creating a salt from a calculation rather than, say, taking 128 random bits, is not going to help, nor is depending on the secrecy of the source code ever a prudent choice. It will certainly produce less entropy than the latter, will not produce a different salt across password changes, and is just very, very hacky.


I agree with sk5t. You can't count on the source to stay secure in a breach. It only adds unnecessary complexity without adding security. You are simply adding length to the password with your own key lengthening algorithm using values that are still in the database or source. You might think your clever (UserName + PassWord + User ID + CreationDate + ...) function adds strength because the current cracking tool doesn't have your clever function. This clever function can be easily added to the cracking tools thus rendering your clever function useless. What you failed to do was actually add a secret that you can guarantee to keep secret even when the database and source are breached. This is why a secure hardware device like the YubiHSM is recommended as the secret is never revealed even in a breach.


A calculated value means you need source to figure it out. Only PHP and JS/Node scripters assume that their source isn't safe in an attack. The rest of us assume that our compiled code is safe.

(Over generalization, but if people can get your source code basically there is zero you can do to keep them out)

Storing data in more than one store is a best practice.

Your random 128 bits have to be stored somewhere. That means an attacker needs only the database, not the source. In any large scale environment you likely have multiple machines. When a machine gets taken out of service at your hosting provider your database might still be on it when it hits the dumpster. If your source code is on another machine it doesn't matter.

The same is true of the Private User Data and the passwords. two locations means you need two breaches to do anything.


>A calculated value means you need source to figure it out.

As I mentioned in my other post, it really doesn't. Using a calculated value means that the entropy of your salt is only as high as the values used to calculate it.

>Only PHP and JS/Node scripters assume that their source isn't safe in an attack.

So, the vast majority of websites.

>The rest of us assume that our compiled code is safe.

Why would you assume that?

>if people can get your source code basically there is zero you can do to keep them out

If people have your source code, and your passwords are stored properly, they aren't going to have any easier of a time cracking them.

>Your random 128 bits have to be stored somewhere. That means an attacker needs only the database, not the source.

The whole point of a salt is that it doesn't have to be secret.


The cracking community, unlike the bulk of the mainstream programming community, has not lost the art of disassembling executable code. Hiding source code provides negligible additional security.


"Source" or "Execution". Your Database isn't on the machine executing the code.

A breach of your Data should not be a breach of your code.


It still confers a negligible advantage. If the attackers have gotten into your data server, why shouldn't they be able to get into your application server just as easily? All you're doing is slightly inconveniencing the attacker before they can derive your salt wholesale.

Always remember: a secure system is secure even if the attacker can see your source code. Any work you do based on assuming the opposite is a waste of time.


Really? How about because my database server could be something like DynamoDB that I don't own. Why would the execution server and the data server have the same vulnerabilities? The whole reason to have two servers is so that you can say "The only thing that can talk to the database is a server on the 'inside'", but injection attacks mean you could possibly exploit the data server with no access to the execution server.


Even so, you're proposing a very low entropy salt. If the attacker can crack one or two weak passwords, they can start limiting their search space immediately, and crack more passwords. As they gain more examples to work with, they will be able to work out how your salt is derived, and then they can start looking for salt collisions based only on the data that is stored in the clear.

At that point, your attacker has not only the ability to derive the salt and attack each password individually, but potentially the ability to generate rainbow tables for subsets of users with identical (or largely identical) derived salts.

The point is that at best, you're adding a term to the overall hardness of the attack. When you use something like bcrypt, you are multiplying the hardness of the attack. There is no comparison between what you've proposed with MD5 or SHA-1 and simply using bcrypt on a random salt, even if the attacker has your source code in the latter case but not the former.


>Your Salt should ALWAYS include a feature you have to store anyway about the user that would be unique to your service.

What is your rationale for this? Is your concern that a random salt will accidentally collide? With a 128-bit salt, there is a one in a million chance of a collision occurring in a set of 26 quadrillion.

>UserName + PassWord + User ID + CreationDate + <Numerals from a Base64 encode of the users first and last name>

That is just incredibly low entropy compared to a random string of bits. If you do this and use (as you suggest is just fine further down) MD5 or SHA1, you are begging for disaster if your site is worth attacking.

>MD5 and Sha1 are only bad because there are rainbows for them

Wrong. MD5 and SHA-1 both have significant vulnerabilities.

>bcrypt is ... not any more secure if you did the other things above

Insanely, completely, ridiculously wrong. MD5 and SHA-1 are stupid fast. A GPU-based cracker can try 5.6 billion MD5 hashes per second, and 2.3 billion SHA-1 hashes per second. If you're using a long random salt, you might hold out for a while after a data breach before all of your passwords are cracked. If you derive a low-entropy salt the way you explained above, you'll be completely boned.


This is all terrible advice. Please stop giving it.


Pretty good article, I especially like the idea behind doing linear time comparison using XOR.

Although, the XOR comparison implementation seems flawed as it will only compare min(supplied password length, existing password length); which would allow the attacker to identify the existing password length by providing a sufficiently long password. Instead the existing password should be padded to the length of the supplied password in order to hide the length of the existing password.

My issue with the XOR comparison implementation is irrelevant as password hashing should use stretching (ex. PBKDF2, BCrypt, SCrypt) which means the hash of the supplied password and that of the existing password would be the same length.

The implementation of the PBKDF2 is also flawed.

Iterations: The recommended PBKDF2 iteration count has long surpassed 10,000 (it's closer to 100,000 now). See here: http://security.stackexchange.com/questions/3959/recommended...

The rule of thumb is that, given a sufficient-length salt, the number of iterations should take about 8ms on the hardware it is running on.

Salt Size: The recommended salt size is 128-bits/16 bytes (not 24). See here: http://security.stackexchange.com/questions/17994/with-pbkdf...

That Stack Exchange question also recommends using SHA-512, as it requires 64-bit arithmetic operations, which GPUs are supposedly not great at.

I believe I read that Stack Overflow and most big websites store about 24 bytes of the hash. The salt is generally prefixed to the hash and that is stored (e.g. 16-byte salt + 24-byte password hash = 40 bytes).

If I wanted to version a stored password, I'd simply use the first byte as an indexer to select a password hashing function instead of prefixing the hash with the number of iterations which seems non-portable.

Even if you do everything right concerning the hashing of passwords, account security extends beyond passwords - such as alternative methods of authenticating (forgot password, secret questions, authentication tokens). OWASP is a great authority in regards to this: https://www.owasp.org/index.php/Password_Storage_Cheat_Sheet


Constant-time comparisons of password hashes are a regular feature of password hashing discussions, mostly because timing attacks are one of the few crypto attacks that developers can get their heads around immediately. But they're practically irrelevant to password hashing†. It does not matter if you use a constant time comparison for your password hashes; to see why, try to design the actual timing attack on the password hash compare (if you've never done this before, start by demonstrating to yourself that you can design a timing attack against an HMAC comparison).

The OWASP site's password storage guidance is particularly bad, and I recommend that developers avoid that page altogether. The page began as a litany of very bad advice (reversible encryption, salted hashes), and when it was pointed out that secure password hashes like scrypt were the right answer, the authors decided instead to "teach the controversy". OWASP is very political, and not managed by computer scientists.

The post we're commenting on inaccurately says that the hypothetical timing attack it describes has been done before, but links to Boneh's TLS timing attack paper. My guess is that the password hash attack the author was considering has never been tried; among other things, it stipulates a password hash oracle for the attacker.


Simply: performing a timing attack against a hash function implies performing a second-preimage attack against the hash function.


Password hashing with salt was one of the things I first implemented when switching to Node.js 3 years ago. It's an essential part of local password storage.


Nice reading, bookmarked.


Very helpful and well written article.


No Perl example? Sob :(


Dry your tears! A Perl example can be found in the synopsis of this fine POD file:

http://search.cpan.org/~gray/Crypt-Scrypt-0.05/lib/Crypt/Scr...


Site down. Yet again, a static HTML file is served from a single private server.


Excuse my ignorance, can you please elaborate. What's the alternative?


He is probably suggesting that it should be served from large CDNs so it wont go down


hn could do that on behalf of the sites it links: http://www.coralcdn.org


nice article. password hashing - doing it right && cracking it right

http://www.linuxx.eu/2014/01/how-to-hash-and-crack-unix-pass...



