PBKDF2 is not a cipher. It's a KDF, and it's almost always used with an HMAC or a cryptographic hash rather than a cipher. The thesis of this article seems to be "PBKDF2 is well understood, where bcrypt is not." In fact, the opposite is probably true.
bcrypt uses a block cipher (blowfish) to create its underlying compression function. Block ciphers are extremely well understood, have been studied to death for years, and are modeled on extremely well understood constructs. They can be used to create cryptographic hash functions, but usually aren't, because they're slow (which we don't care about in this case).
Cryptographic hash functions, by contrast, are not well understood at all. They are "magic" in many ways, and aren't modeled after anything. Many more "bad things" happen in this space than in the block cipher space. The only reason people mess with them at all is because they're faster than block ciphers, which again, we don't care about in this case.
The other appeal of PBKDF2 is that it "comes from RSA." This doesn't feel like an extremely compelling argument, but if we were going to believe it, then why not use the PKCS#12 KDF? PBKDF2 was proposed in PKCS#5, and "12" is a larger number than "5", so if we're going to do what RSA tells us we should do, they're essentially saying we shouldn't actually use PBKDF2.
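To make the first point concrete: PBKDF2 is parameterized by a PRF (in practice almost always HMAC over a hash), a salt, and an iteration count; no cipher appears anywhere in the construction. A minimal sketch using Python's standard library (all parameter values here are illustrative, not recommendations):

```python
import hashlib

# PBKDF2 takes a PRF (here HMAC-SHA256), a per-user salt, and an
# iteration count that sets the work factor.
password = b"correct horse battery staple"
salt = b"per-user-random-salt"   # in practice: os.urandom(16)
iterations = 100_000             # illustrative, not a recommendation

key = hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=32)
print(key.hex())
```

The `dklen` parameter is what makes PBKDF2 usable as a proper key derivation function rather than just a password hash.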
Is this a common opinion amongst practitioners? The opposite philosophy (e.g., that a random oracle is a "weaker object" than an ideal cipher) underlies some lines of work in the theoretical cryptography literature.
This of course says nothing about how to go about building an actual cipher/hash that withstands all kinds of cryptanalysis.
Edit: apparently not so clear, I'm told: http://arxiv.org/abs/1011.1264
Also, that paper was subsequently shown to be fatally flawed. http://arxiv.org/abs/1011.1264
Anyway, yeah, a theoretical diversion.
No. This is incorrect. This is exactly the right attitude for most developers to have about cryptography, because on a subject as complex as cryptography most developers (including me!) are nowhere near smart enough to understand the ins and outs.
Encouraging people to make their own decisions on subjects they aren't equipped to understand fully is dangerous advice. It leads to all sorts of bad outcomes. People who have to choose between options they don't really understand end up choosing things at random, or on the basis of incomplete or misleading information, or getting seized up by the need to make a choice and choosing nothing at all.
This makes cryptography one of the very few cases where a cargo-cult approach is better than the alternatives. A simple message that "this is the approach people smarter than you agree is correct, use it," repeated consistently, will help more people more completely than dumping them into the deep end of the crypto pool ever will.
I hear that a lot, and it always reminds me of Jante Law. It's a disservice to keep telling people that they are too stupid to understand something. Too ignorant, perhaps - that can be remedied - but everyone is not too stupid to understand crypto. It's simply another field, mostly mathematical, and goof-ups are easy to make and often very costly.
edit: I should also make another point. Cryptography is exactingly and excruciatingly hard to do at industrial strength. I am not recommending people go out and roll their own crypto for production systems. It's possible for the initiated to do right; it's possible to get initiated. The uninitiated almost certainly will goof. I'd also point to the discussion and break of a homebrew crypto scheme, for a taste of the difficulties and mathematical sophistication needed.
That's not the point. Specialization - developing a deep understanding requires time and effort. As a developer choosing a way to store passwords securely is one task out of 1,000 I'm responsible for. So I acquire a general understanding of cryptography to make a decision - but that decision relies heavily on experts and consensus. The worst thing I can do as a developer is to go down the rabbit hole of cryptography and spend 10 hours researching an optimal password hashing scheme for my app. Sure, I've begun the journey of having a deep understanding of crypto - but I still have 999 things to do!
Which reminds me I need to get off HN and get some work done .....
But this is precisely what the cult priests don't want you to do. There are only two words you need to know: "use bcrypt". Any attempt to learn more is heresy.
However I can see the problem with such a culture if your boss happens to be a bcrypt-tard and is closed off to discussion/learning.
Related: Which is why I'm against any and all forms of electronic voting. I've done a fair share of crypto (as a user of crypto libraries) and I barely understand how it works. There's ZERO hope for the layperson to understand crypto-based voting systems.
One of the central tenets of American style voting is a public vote count. Using crypto puts the "public vote count" into the hands of a high priesthood (a few adepts), which is a really bad idea.
Um, yea. That's what "public vote count" means.
As for the technical competence of our nation's election administrators, have you not been paying attention? I've met many, many. Great at elections. Terrible at computers. Completely and utterly reliant on the vendors. Who've manifestly demonstrated their complete inability to code their way out of a wet paper bag, much less be entrusted with the foundations of our democracy.
Edit: I'm not sure why I'm getting downvoted - am I misinterpreting what specialist is saying? For another example, look at HTTPS - it secures your communication without any need for knowledge of how it's doing so, you just need to know that if you see the green lock icon you're "safe" (though I'm aware of all the usability issues, like how everyone just ignores warning messages when something goes wrong). Is there something fundamentally different about public voting systems in this regard?
The only electronic voting system I'd probably be OK with is one that is totally open-source, software and hardware. The implementation must be completely transparent.
Even then, fraud will happen. Always has, always will.
By all means, there are different shades of grey. It should not be all black or white.
They all rely on one's vote being lost within a herd of votes. So if your ballot is one of a million, and the ballots are simple with a few races/issues, it's easy to have a secure one-way hash with collisions. (The collisions make it impossible to work backwards to infer how each person voted.)
Alas, real-world elections in the USA are administered at the precinct level. In my state, that's between 0 and 1000 registered voters.
Further, a general election (November) ballot will have dozens of issues and races.
So it's more than likely that any single ballot will be utterly unique. So with the voting systems I studied, it's trivial to infer how everyone voted.
I could imagine crypto-based voting systems working for certain applications. Like corporate shareholder meetings. Or maybe Australian and British style parliamentary elections. (Don't hold me to the last guess, I only have cursory knowledge of their election systems.)
Edit: I discussed these practical issues with one of the grad student authors of a crypto-based voting system. Being completely ignorant of real elections, he had NO IDEA what I was talking about. He denigrated my input; saying our elections should be easily tailored to accommodate crypto. (Good luck with that.)
Maybe it's just me, but I'm of the opinion that people setting out to solve a problem should probably make some token effort to first understand the problem. YMMV.
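A toy illustration of the uniqueness problem described above (the ballot data is hypothetical and much smaller than any real election): when the space of possible ballots is enumerable, an observer can hash every possibility and match against the published list, so the hoped-for collisions never materialize.

```python
import hashlib
from itertools import product

# Hypothetical toy ballot: 3 races, 2 choices each, so only 8 possible
# ballots. An observer precomputes the hash of every possibility and
# matches it against the published hash of a cast ballot.
published = hashlib.sha256(b"A|B|A").hexdigest()  # one voter's ballot hash

recovered = None
for ballot in product("AB", repeat=3):
    if hashlib.sha256("|".join(ballot).encode()).hexdigest() == published:
        recovered = ballot

print(recovered)  # ('A', 'B', 'A')
```

With dozens of races and a small precinct, the real ballot space is far larger, but any single cast ballot is still likely unique, which is exactly the inference problem described above.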
Option 1: Use a standard, widely-reviewed crypto stack. What happens if that method is flawed? You hear about it as soon as it's discovered and it gets fixed quickly.
Option 2: Roll your own stack based on a personal understanding of cryptography.
What happens if that method is flawed? Perhaps only you and an attacker could possibly know such a thing. You have to be ever vigilant and you have to acquire an incredible amount of crypto knowledge. If you ever leave the company they are pretty much fucked from then on until whatever you wrote is replaced.
Don't roll your own system.
But it's not necessary to understand the depths of cryptography beyond that there are various black boxes that have certain properties.
So, by all means, subscribe to a cargo cult for crypto. But pick the cult carefully.
However, the point of the bcrypt argument is not that bcrypt is the best algorithm for certain things, but that it's (at a minimum) about four orders of magnitude better than most people's "secure" password storage algorithm: sha1(password). Because it requires both a salt and a work factor, even a dictionary attack is wildly impractical unless there's a massive flaw discovered in the algorithm.
If developers are going to be trained to pick a specific algorithm for password storage, I'd much prefer bcrypt (no known flaws, many benefits) over sha1 or md5 (designed to be fast for checksumming, salt not required). Might PBKDF2 be a better choice still? Very possibly; I haven't done enough research to intelligently answer - and since this is crypto, I will not best-guess it.
My real point here? The article attacks bcrypt as a key derivation algorithm, but I've never seen someone suggest it to be used in such an application. Even the post that started what you may call the bcrypt movement (http://codahale.com/how-to-safely-store-a-password/) is linked in the article, and it's titled "How to safely store a password". It is NOT titled "How to safely derive encryption keys".
So yeah, I'm calling linkbait.
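The earlier point about salts and work factors can be made concrete with the standard library (PBKDF2 here stands in for any salted adaptive scheme, and the iteration count is illustrative):

```python
import hashlib, os

# Plain sha1(password) is identical for every user with that password,
# so one precomputed dictionary cracks all of them at once:
print(hashlib.sha1(b"hunter2").hexdigest())

# A salted, iterated hash gives every user a distinct digest and a
# tunable work factor, so each account must be attacked separately:
salt_a, salt_b = os.urandom(16), os.urandom(16)
h_a = hashlib.pbkdf2_hmac("sha256", b"hunter2", salt_a, 100_000)
h_b = hashlib.pbkdf2_hmac("sha256", b"hunter2", salt_b, 100_000)
print(h_a != h_b)  # True: same password, different digests
```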
This is a case where unanimity in the message is important. If all the experts say "use A," people will take that to mean there's no debate about the merits of A over B and C, and use A. If some say "use A" while others say "use B" or "use C", some fraction of listeners will give up and use nothing at all.
Especially with linear congruential generators, it's easy for people who don't know what they're doing to add "extra randomization" that makes the resulting numbers worse than the originals.
It's better for most people to not get fancy and use the standard algorithm.
A modern CSPRNG like the SHA-3 candidate Skein can produce random data at a steady rate of only a few cycles per byte. I wonder if at some point something like that will dethrone MT for non-crypto PRNGs.
But it will be a long time before an alternative is better in a compelling way. And take a look at the list of implementations for the MT:
That's not a single throne, it's a multifaceted empire.
A pithier way to say the same thing is, "you're right, except for the words 'without understanding it'".
That kind of copying is clearly inferior to either gaining the knowledge to make an informed decision or hiring an expert to do it for you, but for most projects that need crypto neither one of those is practical. So, copying the experts is the best practical approach in most cases.
Expecting every programmer who needs to build an application that stores passwords or personal information etc to have a rigorous mathematical understanding of crypto (this probably means having a PhD or similar in the subject) is just plain unrealistic.
It's sort of akin to suggesting you shouldn't allow a mechanic to change the brakes on your car without them having a detailed understanding of physics.
I'm not saying you should build your own cryptography, but a good hacker (or a good engineer, as we called them in the early 90s) should understand the difference between bcrypt, PBKDF2, and scrypt, at least enough to understand why bcrypt is better than salt+SHA-1, along with some other aspects of security.
However, if somebody has no clue about cryptography and security then she/he should go with bcrypt - but I'm not sure if that person should be responsible for or in the business of storing somebody's critical data at all.
Why not just say, "people who spent a lot of their time studying cryptography strongly recommend this approach. They feel the approach that you are considering is simply insecure."?
Framing things in terms of intelligence isn't going to win anyone over, if that's your goal. And it probably isn't accurate, either.
I think "smarter" in this context means more well read about the particular subject of crypto, although people good at that are likely to be very intelligent all round too.
A different approach is not necessarily "less secure"; it's just that it may have had fewer people banging on it trying to figure out ways to break it.
scrypt is better than bcrypt.
bcrypt has the advantage of being both very good, and also broadly available on web platforms. scrypt does not yet have that advantage; when it does, I will start saying "just use scrypt".
But the simple fact is: all three of these functions are fine. ANY of them is a huge step forward from what people do without them.
"Just use bcrypt" is 1000x more effective as a meme than "just use adaptive hashing" (which is what all these constructions are).
So, while I have few specific technical qualms with this article (e.g. why do I care whether something is a PKCS standard or not?), the overall message is a bit hyperbolic.
By roughly a factor of 5.
By roughly a factor of 4000.
why do I care whether something is a PKCS standard or not?
You don't care, and I don't care, but I'm sure you know lots of companies which do care (especially since PBKDF2 is a NIST standard too).
Was I wrong?
it reminds me of vulnerability issues. When apps have no known vulnerabilities, all is fine. When a new "instant root compromise of any system" comes out, it's omgomgomg.
Then it's fixed, and all is fine again.
Except that vulnerability was always there. And other ones that are yet to be public are there too. And many of them are "omgomgomg" material.
Well, crypto is the same. We don't have public data on which algorithms are broken. We just know they will be eventually, by logic or by brute force.
So, make wise decisions, and don't forget you might eventually need to update them.
That's not to say that all implementations are secure, or that there are not undiscovered mathematical flaws in common algorithms, but the idea that all encryption is brute-forcable given enough AWS instances is just plain incorrect.
To expand on sibling comments: Cryptography essentially depends on the assumption that P≠NP (well, not exactly, but...). It's possible, though unlikely, that mathematical discoveries could undermine all possible conventional cryptographic schemes.
As for brute-force, that's a tricky one as well. If you allow a strengthening of Moore's law that says that operations per second per dollar increase exponentially, then you can construct the following "polynomial time" algorithm for any cryptographic problem:
Wait n*k years, where
- n is the problem size in bits, and
- k is a scaling factor to get the exponents to align
Buy a computer
Run the brute force algorithm on your new computer
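The "wait, then compute" argument above can be made arithmetic. All figures here are purely illustrative: assume ops/second/dollar doubles every 1.5 years and a fixed budget buys 2^60 operations today.

```python
import math

# Illustrative only: hardware cost-efficiency doubles every 1.5 years.
key_bits = 128
ops_needed = 2 ** key_bits
budget_ops_today = 2 ** 60   # hypothetical: what a fixed budget buys now

# Number of doublings until the fixed budget covers the whole keyspace,
# hence the "polynomial" wait time that is linear in the key size.
doublings = math.log2(ops_needed / budget_ops_today)
years = doublings * 1.5
print(f"wait ~{years:.0f} years, then brute-force on one budget")  # ~102
```

The catch, of course, is that the defender can add key bits far faster than the attacker can wait out doublings.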
mathematically-secure cryptography was invented in 1882/1917: http://en.wikipedia.org/wiki/One-time_pad
Fun further reading: http://en.wikipedia.org/wiki/Venona_project
Also, fun reading: The Code Book by Simon Singh, and Spycatcher by Peter Wright
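The one-time pad's security is easy to demonstrate mechanically: XOR with a truly random, same-length, never-reused key. (This sketch uses os.urandom, which is a CSPRNG rather than the physically random source a true pad requires.)

```python
import os

message = b"ATTACK AT DAWN"
pad = os.urandom(len(message))   # must be truly random and used exactly once

ciphertext = bytes(m ^ p for m, p in zip(message, pad))
recovered  = bytes(c ^ p for c, p in zip(ciphertext, pad))

print(recovered == message)  # True
# Without the pad, every same-length plaintext is equally consistent
# with the ciphertext: that is the perfect-secrecy property.
```

Reusing a pad destroys this property completely, which is exactly what the Venona project exploited.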
Do you have this written up anywhere? Link?
Here's my opinion on this:
* These numbers are more appropriately expressed exponentially. I.e. "factor of 5" = 2 bits of security.
* 2 bits of security is not significant at all.
* The Scrypt x5000 seems to only apply when your attacker has access to a chip foundry and it buys you maybe about 12 bits of security against such an attacker.
* For comparison, 128 bits of security is often considered a minimum for resisting offline bruteforce attacks (e.g. AES has a 128 bit key). Of course passwords usually don't come anywhere near close to 128 bits of effective security, so 12 bits might make the difference for some of them if you're lucky.
* In a sense, table 1 in the Scrypt paper shows that the relative difference between the functions is less significant than even relatively small variations in password strength. When the defender has 100ms of CPU to spend authenticating each password, all three functions compared cost less than $200 to break an 8 letter password and more than $150M to break a 10 character password.
* Very few attackers are going to actually spend $M++ trying to break your password (cue XKCD strip). A relevant exception might be botnets, the operators of which don't pay the power bill for their computations.
Therefore, I conclude that for most purposes these functions are mostly equivalent and quality-of-implementation and password strength issues dominate in practice.
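The bit conversions in the list above are just base-2 logarithms of the cost multipliers:

```python
import math

# Converting a brute-force cost multiplier into "bits of security".
for factor in (5, 4000, 5000):
    print(f"factor of {factor} ~= {math.log2(factor):.1f} bits")
# factor of 5    ~= 2.3 bits
# factor of 4000 ~= 12.0 bits
# factor of 5000 ~= 12.3 bits
```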
If your attacker is using GPUs, scrypt probably gives you an even bigger win, due to the compute/memory balance they use.
Note that the defender pays a cost for this too, though. Where he could be happily running PBKDF2 or Bcrypt in multiple Apache processes on his multicore servers, Scrypt is going to completely trash the L2/L3 caches and saturate the memory bus and make everything else on the server run like a dog.
Scrypt is operating as designed, of course, but it raises the question of whether or not a defender with a busy website on a farm of multicore servers would be able to configure his work factor as high (in terms of single thread benchmark ms) with Scrypt as he would with Bcrypt or PBKDF2.
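For reference, scrypt's working set follows directly from its parameters: roughly 128 * N * r bytes, which is exactly the memory that trashes shared caches when many logins run concurrently. A sketch with Python's hashlib.scrypt (parameter values illustrative, not recommendations):

```python
import hashlib

N, r, p = 2**14, 8, 1   # illustrative cost parameters
print(f"approx. memory per hash: {128 * N * r / 2**20:.0f} MiB")  # ~16 MiB

# maxmem must cover the working set or OpenSSL rejects the parameters.
key = hashlib.scrypt(b"hunter2", salt=b"per-user-salt",
                     n=N, r=r, p=p, maxmem=64 * 2**20, dklen=32)
print(len(key))  # 32
```

Multiply that per-hash footprint by the number of concurrent authentications and the cache/bus contention concern above falls straight out of the arithmetic.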
For instance, right now I'm implementing support for the modern SHA-512 crypt() variation. It doesn't translate well into GPU space at all, which will end up meaning that I can only offer dictionaries that are half a trillion words smaller than formats which are fast on GPUs.
So far the data I'm seeing indicates that differences of that scale in dictionary size really do make a difference on the success rate of the job. So for what it's worth, from that perspective, it is a factor.
Granted, an attacker could use your slow hash for a DoS attack, but most websites I've seen so far always had some sort of slow operation that was easily exploitable without authentication.
> but most websites I've seen so far always had some sort of slow operation that was easily exploitable without authentication
Yes, but those things are typically easy to optimize or temporarily disable in a hurry once they come under attack. Not so much with authentication.
Then I'd hope that you make extra double sure that your users' passwords are secure. Let's call it the cost of business.
> or a database server where the attacker-facing app doesn't use connection pooling?
We're talking about how to store user passwords here, aren't we? If your database server passwords get lost or cracked, change them. Use randomly generated passwords, and don't reuse them.
>> but most websites I've seen so far always had some sort of slow operation that was easily exploitable without authentication
> Yes, but those things are typically easy to optimize or temporarily disable in a hurry once they come under attack. Not so much with authentication.
That depends on the website. If your major content is available unauthenticated, then you might as well go and disable authentication. My major point was that you can't defend against a DoS attack by using a weaker password hash; the attacker will just throw more requests at you. In a DoS, the attacker has the advantage of needing fewer computational resources.
In practice, the user gets to choose the password and the website at best gets to veto it or accept it without knowing how many other places it's re-used. There aren't too many sites assigning randomly generated passwords right now, I wish there were more.
Yes, some DoS attackers may be able to throw more and more resources at you until you go down. But some don't and you don't have to make it easy for them by preemptively DoSing yourself with too much password hashing! Alternatively: for some fixed amount of attacker DoS resources, your system can support a certain amount of password hash cracking resistance. Cracking resistance is thus a tradeoff with DoS resistance. The root cause of this situation is the poor entropy present in many users' choice of passwords.
Turning off authentication is generally not an option if your site has any data worth securing. If it were, an attacker could bypass your access controls by simply DoSing you until you disabled authentication.
That's right. So I don't see where database connection pooling is part of the issue, and that's what my remark about randomly generated passwords was aimed at. You should use secure passwords to authenticate your app at the database. You should probably use md5 or similar as the password hashing scheme for the database credentials, since that is where a slow hash really would let an attacker bog down your database, making the performance/security tradeoff a valid one. But this issue is outside the scope of this discussion.
It's certainly a valid technical concern not to increase your password hashing work factor to the point where it becomes a viable DoS attack vector; my point is simply that you can go quite far in terms of work factor before you reach that limit. Increasing the work factor to the point where authentication takes 10ms will in many cases still leave other parts of your application more vulnerable.
And well, turning off authentication should imply denying access to data that requires authentication. This certainly is an option if the majority of your data is available for unauthenticated users. It certainly is not if you only store data worth securing.
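The tradeoff in this sub-thread is straightforward arithmetic (all numbers illustrative): at a fixed per-hash cost, login throughput per machine is bounded, and an attacker only has to submit bogus login attempts faster than that bound.

```python
# Illustrative capacity math for the work-factor-vs-DoS tradeoff.
hash_cost_ms = 10    # chosen work factor: 10 ms per verification
cores = 8            # cores you can dedicate to authentication

logins_per_sec = cores * (1000 / hash_cost_ms)
print(f"max sustainable logins/sec: {logins_per_sec:.0f}")  # 800

# An attacker sending more than ~800 bogus logins/sec saturates auth;
# doubling the work factor halves that ceiling.
```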
> as postgres and mysql both do
The primary advantage of Scrypt over the others is that it enters a completely pathological memory locality access pattern and stays there for almost the whole function. This works to neutralize the advantage of an attacker who has a custom CPU because he probably can't also develop a custom RAM subsystem to feed it with (at least not one that's many times more efficient than what the defender has in his server).
But if you've done any performance tuning on multithreaded code, you know that cache effects caused by memory access patterns very quickly begin to dominate as multiple cores and threads are added. Things that look great in single-threaded benchmarks almost never scale linearly and there's probably nothing that will scale worse on our shared-memory multiprocessors than Scrypt. It's a feature.
So the defender (say, a busy website with commodity multicore servers) with Scrypt is likely not going to be able to take as good an advantage of his hardware. He won't be able to crank up the work factor quite as high as he could with Bcrypt or PBKDF2.
This may represent an advantage to the attacker, who doesn't have the additional constraint of keeping the response time up on a busy webserver. This attacker's advantage is probably not significant by cryptographic standards (maybe 2 or 3 bits of security lost), but pathological multithreading could represent a big issue operationally.
I'm honestly not trying to cast FUD on Scrypt here, I think it's the best function. I'm just saying like everything else multithreaded you really need to benchmark it under real-world conditions.
The key issue isn't the speed of the RAM subsystem but the total amount of memory it has to provide; in particular, making sure the attacker can't "cheat" by using a smaller circuit than the defender.
(Also, 80 bits is usually enough to withstand attack, no?)
A solid 80 bits of security out of any of these functions might turn out to be safe forever. But, in practice, most password databases are going to have some fraction of users choosing passwords straight out of the cracker's dictionary, some fraction that will never ever be cracked, and the smallest fraction being crackable according to the defender's choice of work factor.
(If you're buying in bulk, ASICs are cheaper, but few will be willing to pay for that much cracking power.)
precis: exhaustive search on DES for $1000 in FPGAs from eBay, 2 years ago
DES is in the range of 'costs less than a new phone/iPad' to do an exhaustive search at this point
If I were a bad guy, I would prefer to not have a password-cracking special-purpose supercomputer in my possession. (But I'm not a bad guy, and in fact I would love to have a few around the house. :-)
You seem to be speaking only to its compute time -- what do you say to the article's claim that bcrypt has a higher probability of having an unexpected attack that mitigates its computational complexity?
Also, since all of these algorithms have adjustable work factors, what does it even mean to say that one is stronger than another? Couldn't you just calibrate the work factors so that they are equivalently strong? Though naturally scrypt has strength in another dimension also (memory).
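Calibrating work factors to "equivalent strength" on the defender's own hardware is exactly how these functions are meant to be tuned. A rough sketch of the usual approach, using stdlib PBKDF2 as the example (the 100 ms target and starting count are illustrative):

```python
import hashlib, time

# Double the iteration count until one hash costs ~100 ms on this box.
target_seconds = 0.1
iterations = 1000
while True:
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", b"benchmark", b"salt", iterations)
    if time.perf_counter() - start >= target_seconds:
        break
    iterations *= 2

print(f"calibrated iteration count: {iterations}")
```

The same doubling search works for bcrypt's log2 cost parameter or scrypt's N; scrypt additionally fixes a memory cost that the others can't express at all.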
But that's not my issue with the article. My issue with the article is that it takes a simple security issue with no "real" wrong answers and turns it into a tribal conflict, which has the net effect of reducing the number of people who will use adaptive hashing at all.
I'm familiar with Tony Arcieri's work and generally think highly of him; this article, though, is inexplicable and smacks of hipsterism. "I liked Nirvana but then they got popular and sold out, so now I listen to Sleater Kinney". Well, that's going to sound dumb in 10 years.
Let me say yet again that if you use PBKDF2, bcrypt, or scrypt, you are going to look smarter than the average webdev, no matter which one you pick. Do whichever is easiest.
At the very least, I'd rather the meme be: Use bcrypt, scrypt, or PBKDF2.
* It has marginally worse library support and is built out of universally available primitives, which increases the odds that generalist devs will DIY it.
* It is actually faster than bcrypt (see Colin's paper); in other words, even without waiting for a hypothetical research result against bcrypt, PBKDF2 is already "vulnerable".
* PBKDF2 deployments virtually all use SHA2 as their PRF, and PBKDF2/SHA2 is a construction that depends entirely on the security of hash functions; hash functions are more poorly studied than block ciphers.
* Attacker tools are (mostly, but not entirely) built out of preexisting infrastructure and not by cryptographers; of the three functions, the best accelerated brute force support is available for PBKDF2/SHA2. For instance, is there a widely-available GPU implementation of bcrypt?
* The standards process that ran for PBKDF2 did not include the extensive peer review that (say) AES went through, and isn't a significant asset for PBKDF2. Meanwhile, bcrypt had broad deployment long before PBKDF2 was widely deployed, and on higher-value target systems.
You'd rather the meme be "Use bcrypt, scrypt, or PBKDF2". I'm fine with that meme! But that's not what you said. You said "please don't use bcrypt".
PBKDF2 isn't bad. It has one significant asset: you can point to a PKCS standard to convince pointy-haired product managers to accept it into systems. But given the choice between an HN cargo cult and "technology made palatable to enterprise-grade engineering managers", I'll take the cargo cult in this instance.
(Another strength of PBKDF2 that it shares with scrypt but not bcrypt: you can use it as a proper KDF for your AES keys... but note that if you need to generate your own AES keys, you're very likely in trouble for other reasons).
> PBKDF2 deployments virtually all use SHA2 as their PRF, and PBKDF2/SHA2 is a construction that depends entirely on the security of hash functions
A quick look at Wikipedia (and my own recollection) suggests that it may be more commonly used with HMAC-SHA2. Although the HMAC construction is not provably secure and is poorly understood, it seems to be fairly resistant to attack (e.g. HMAC-MD5 is not known to be broken, AFAIK). Also, iterated hashes are much harder to break than single hashes.
Bellare has a result based on a nonstandard but not entirely implausible assumption about the underlying hash function, IIRC. But, as you point out, hash functions are poorly understood...
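For readers following the PRF discussion: HMAC wraps the hash in two nested keyed invocations, which is the source of its resilience even over weakened hashes like MD5. The construction spelled out by hand, checked against the stdlib's own hmac module:

```python
import hashlib, hmac

def hmac_sha256(key: bytes, msg: bytes) -> bytes:
    """HMAC(K, m) = H((K ^ opad) || H((K ^ ipad) || m))  (RFC 2104)."""
    block = 64                               # SHA-256 block size in bytes
    if len(key) > block:
        key = hashlib.sha256(key).digest()   # long keys are hashed first
    key = key.ljust(block, b"\x00")          # then zero-padded to the block
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(ipad + msg).digest()
    return hashlib.sha256(opad + inner).digest()

k, m = b"secret-key", b"message"
print(hmac_sha256(k, m) == hmac.new(k, m, hashlib.sha256).digest())  # True
```

PBKDF2 then iterates this PRF thousands of times per derived block, which is why its security reduces entirely to properties of the underlying hash.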
I was imprecise, but my point is just, PBKDF2/xSHAx is a construction that relies entirely on the properties of cryptographic hash algorithms; scrypt and bcrypt rely instead on properties of ciphers.
"SHACAL". Look it up! ;-)
Can you elaborate a little bit on this? Or point to a reference. Wikipedia mentions that PBKDF2 is easier to implement with ASICs or GPUs. Is that the primary reason?
All you have to do is not design your own password hashes with SHA-2 (or Whirlpool or CubeHash or whatever some random Stack Overflow answer says you should use). You can safely keep PBKDF2, bcrypt, and scrypt in your bag of tools --- of those, only scrypt is recent --- and reach for whichever one is easiest.
$ gem install scrypt
Building native extensions. This could take a while...
Successfully installed scrypt-1.0.3
1 gem installed
Still, it's certainly good to keep in mind that there are alternatives, usage profiles and requirements differ and so do the solutions.
But then, unless you give up and don't use any of the three, you can't go wrong with any of these constructions.
* It's got a reliable Gem for Ruby
* It's got a reliable easy_install package for Python
* It's got a good reliable CPAN entry for Perl
* It's got a Java jar file from a reputable source
* It's got a .NET assembly from a reputable source
And what I'm saying is not that scrypt will be "safe to use" when that happens. scrypt is safe to use now; safer, marginally, than bcrypt. What I'm saying is that when that set of things happens, I will personally stop recommending bcrypt and start recommending scrypt. And I only point that out because I always feel a little bad about not recommending scrypt, which is strictly speaking better than bcrypt.
It would be wonderful if someone with more knowledge of the subject could throw up a 1-page site with an appropriate security choice (or a few choices with situations in which each would be more acceptable) for a given range of situations, to establish a 'sane default', taking into account their availability on a number of platforms and programming languages.
Need to sign a message? Use HMAC-SHA1
Need to checksum a file? Use SHA-1
Need to hash a password? Use bcrypt
Need to transmit data over a network? Use SSH2
Need to secure HTTP? Use TLS 1.2 with (these ciphers in order of preference)
Need to secure home WiFi? Use WPA2-PSK
Need to encrypt files? Use GnuPG
Need to do (this type) of encryption? Use CBC. For (this type), use ECB
Need to create a TrueCrypt volume? Use (this cipher) with (this many) bits.
Need to sign a message? Use S/MIME or PGP.
Need to checksum a file? Use SHA256.
Need to hash a password? Use bcrypt, scrypt, or PBKDF2.
Need to transmit data over a network? Use HTTPS/TLS
Need to secure HTTP? Use HTTPS/TLS, preferring AES in CTR and then CBC.
Need to secure home WiFi? Use WPA2PSK.
Need to encrypt files? Use any implementation of PGP.
Need to do (this type) of encryption? Use PGP. Never use AES directly. Never use ECB for anything.
Need to create a TrueCrypt volume? Can't help you; we use PGP.
In this case, the answer probably is OWASP which is a great and often overlooked resource, contributed to by a lot of experts in the area. They have a lot of pages in their wiki that address crypto concerns...
Casting my eye over the recommendations in the pages I linked - yes some of it seems to be a little behind the times (for instance, adaptive hashing isn't mentioned once in terms of securing passwords), but none of it seems outright terrible.
Is this something that the crypto community/experts can come together and improve the same way as the vuln/exploit security community have made OWASP what it is?
Or is the real truth that executing proper crypto techniques is simply too difficult to boil down into a pile of cheatsheets?
I think we may have different notions of what "poorly researched" means.
> […] with an academic pedigree from RSA Labs, you know, the guys who invented much of the cryptographic ecosystem we use today.
Appeal to authority fails a little bit when RSA opens random Excel attachments from unknown, untrusted sources - attached to an email that had to be retrieved from the junk mail folder.
EDIT: As other commentators point out, I am wrong to suggest that anything coming from RSA Labs is somehow "weak" because someone at RSA fell for a phishing attack. I do find it odd that a security article suggests "These people are good; they did 'this thing' which everyone uses". That's not a great way to approach choosing crypto components. Even experts make mistakes.
Edit: realized I should probably explain. What an administrative assistant does in a social engineering attack has nothing to do with the quality of cryptographic research at RSA. Humans are ALWAYS the weak point in cryptography.
That sounds pretty much like what happened above. The claim at issue is whether RSA's cipher chops can be respected. The negative characteristic is an HR person succumbing to a social engineering attack. Ad hominem isn't just calling names.
Let me put it differently: in what way does the HR rep's mistake reduce the quality of ciphers that have come from RSA?
The argument "X is safe because its creator, Y, has a good reputation" can certainly be refuted by attacking the claim that Y has a good reputation. It's not an ad hominem.
The fact that the error by the HR person doesn't actually invalidate RSA's reputation as cryptographers is more a Non sequitur.
Are you sure you know what you're talking about here? I'm not saying you don't, but most HN'ers who would write a comment like yours don't.
(Me just saying it won't be nearly as impactful, because apparently I'm in the tank for bcrypt).
Don't know? This will make a worthwhile 30 minute Googling project. That's what I do when I get in over my head, and I promise, learning about adaptive hashing is going to be more useful than reading court decisions about what does or doesn't constitute a breach of the duty of loyalty for a company director (to cite my last Google dive from HN).
It is possible to point out a logical fallacy while having no beliefs (or, in fact, deep knowledge [or, perhaps, religion?]) regarding what the fallacy pertains to.
So: up for it? Is a couple minutes of Google time and the direct attention from several software security experts to learn lots and lots about key derivation functions and password hashes worth it to you?
Unless, that is, you're some blogger who needs to generate page views; in that case, pick some obscure topic like how to store password hashes and rake muck.
multiple things come to mind:
- Millions Of Flies Can't Be Wrong
- Citation Needed
- JUST USE SCRYPT
That's the way to go.
(and i bet many won't even notice the sarcasm)
Freaking hell: do criticise, do research, and that applies to everything, not just crypto. It's not rocket science.
Several people working on small projects have already come to me and asked me "what's this bcrypt thing? should I use it here?" I guarantee these folks would have just stored it plaintext otherwise. So I've directly observed the mantra making stuff safer. Win.
Edit: I only post this to add to the examples of who uses PBKDF2 in addition to what the article lists.
Seeing an article where someone disagrees with the buzzing hive mind is always refreshing for me and made me actually consider, for the first time, that an algorithm aside from sha1(app_salt + user_salt + password) would be a good idea.
I did some research and decided that for PHP, bcrypt is the absolute easiest option to implement. scrypt is too new, and PBKDF2, while administratively accepted, has much less info on using it in PHP than bcrypt does.
So while I ultimately ended up disagreeing with the author, the article was invaluable in the end.
I'm confused. Why would I pick bcrypt as a key derivation function when there are nice key derivation functions out there that are widely documented?
The whole point of this article is to say that, in fact, there are other options.
It irritates me that despite going through the effort of vouching for PBKDF2 and scrypt every time this f'ing topic comes up on HN, people still manage to reduce this issue to another tribal conflict.
And who are these "crypto noobs" you speak of?
It's important to know what you don't know. And I know enough about crypto to know that I don't know anything and should listen to people who do.
I didn't make it a tribal conflict. If anything, the article did... I was summarizing. I must have missed the part where scrypt was mentioned here, but I have seen it called out on SE.
Then maybe it shouldn't have such an inflammatory title as "Don't use bcrypt". "Alternatives to bcrypt", perhaps.
EDIT: The context was comparing 1. converting passwords into keys for cryptographic purposes and 2. hashing passwords to be used for logins.
PBKDF2 isn't a password hash: the specification doesn't define a storage format for iterated, salted password hashes. It's not that hard to invent one if you already specialize in writing cryptographic software, but most programmers still shouldn't be doing that. It's just too easy to make mistakes that go unnoticed until it's too late.
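To make the "no storage format" point concrete: everything needed to verify later (algorithm, iteration count, salt) has to be carried alongside the derived key, and inventing that record is exactly where mistakes creep in. A minimal sketch in Python, where the `pbkdf2-sha256$...` layout is hypothetical and not part of any spec:

```python
import base64
import hashlib
import os

# Hypothetical self-describing record, for illustration only: PKCS#5
# defines the PBKDF2 function but no storage format like this, which
# is exactly the gap described above. Field layout is invented.
def pbkdf2_record(password: bytes, iterations: int = 100_000) -> str:
    salt = os.urandom(16)
    dk = hashlib.pbkdf2_hmac("sha256", password, salt, iterations)
    return "$".join([
        "pbkdf2-sha256",
        str(iterations),
        base64.b64encode(salt).decode("ascii"),
        base64.b64encode(dk).decode("ascii"),
    ])
```

Every one of those fields (and a constant-time comparison at verify time) is a decision the PBKDF2 spec leaves to you, which is the point.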
If you insist on using PBKDF2, then I suggest using my PBKDF2.crypt() implementation at https://github.com/dlitz/python-pbkdf2. I'm not a cryptologist, but I'm the maintainer of PyCrypto, so presumably I can be trusted to do a better-than-average job of this sort of thing. If people want, I'll write a proper spec and add SHA256 support with a different algorithm identifier (the current implementation still uses SHA1).
But really, if you need a password hash, just use bcrypt and get back to writing the code that actually provides value to your user base and differentiates you from your competitors. Bcrypt is good enough for now. This advice might change in the future, so do pay attention, but for now, just use bcrypt.
The crypto space can be intimidating to your average dev, but almost every app needs some sort of protection (at least for user information). I think the author is fair in wanting to push for the "default" to be PBKDF2 instead of bcrypt, but should he really be advocating a less-tested function in the same article?
As for the "branding" question, would you recommend XTEA or MARS or Threefish over AES for someone in need of a block cipher? Of course not. Standards are not always perfect (and many a flaw has been found in standards), but they are generally beneficial.
PBKDF2 also has the advantage of being modular. It takes an arbitrary PRF (although HMAC-SHA1 is the usual); if HMAC-SHA1 turns out to be poor for the job, just plug in a better PRF (hell, you can plug in a provable PRF that reduces to integer factorization or the elliptic curve discrete log). bcrypt is just bcrypt --- a seemingly not-too-peer-reviewed modification of an ancient cipher that is no longer even recommended.
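That modularity is visible even in stdlib APIs: Python's `hashlib.pbkdf2_hmac`, for instance, takes the underlying hash as a parameter, so swapping the PRF is a one-argument change (a sketch; the iteration count is illustrative):

```python
import hashlib
import os

password, salt = b"correct horse", os.urandom(16)

# PBKDF2 is parameterized by its PRF: moving from HMAC-SHA1 to
# HMAC-SHA512 changes one argument, not the construction itself.
dk_sha1 = hashlib.pbkdf2_hmac("sha1", password, salt, 100_000)
dk_sha512 = hashlib.pbkdf2_hmac("sha512", password, salt, 100_000)
```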
scrypt is better than both, of course. Provable time-memory hardness is great, and should be made standard.
Also, for lay developers, choosing a crypto construct for its modularity is like choosing a smoke detector because it allows you to use different radiological bits in it. You're not supposed to be messing with those bits. The whole point of the package is not to have random developers changing them.
Also, I want you to note something:
You pick AES not because it's a standard but because it's the product of a contest in which many of the world's best cryptographers competed to design the replacement to DES. That's not what PKCS standards are. A PKCS standard is simply something that survived a standards group discussion.
I didn't mean to say that Joe the webdev would be choosing the PRF; that's insane. But whoever provides the (library) implementation would have their life facilitated by having modular primitives, instead of having to code another construct from the ground up.
Edit: True, the AES competition was a much higher-profile event. There is an important difference (this is not a rebuttal, but a remark): AES is a cipher, PBKDF is a construction. AES has no proof of security, nor hope of one: it's a purely heuristic security argument. Through models like the random oracle, however, we could show that the PBKDF construction is secure, if H is secure. In such a case, there is not as much need for a competition, unless you're competing for performance or the like. That said, I would love proofs of security (or show the lack thereof) for PBKDF2.
Even though hash functions are used in almost every system, we know far less about hash functions than we do about block ciphers. This is one of the failures of the cryptographic community. Compared to block ciphers, very little research has been done on hash functions, and there are not many practical proposals to choose from.†
It is also the reason why we are sponsoring contests to replace SHA2, because the research horizon for the current generation of cryptographic hash functions is... ominous.
You're making a noncontroversial statement (ciphers are better studied than hashes) sound like a controversy. It's not really a controversy. And please note: I didn't bring "conservatism" into this discussion; the blog post we're responding to did. If you put it to me directly, I'll say bcrypt is more conservative than PBKDF2/SHA2 (which is what every current PBKDF2 system is going to end up using). But I didn't write a blog post that says "don't use PBKDF2".
† I say this only to make the point that it's not an argument pulled from thin air or from Moxie comments; something that Schneier commits to writing is, I mean to say, very likely to represent conventional wisdom.
I tried to be careful with the wording, precisely to avoid making a common-sense statement into a controversy. Yes, block ciphers are more well-studied (they date to the 1970s; serious hash function research only to the 1980s). My point was that the gap is much smaller now than (say) when AES was standardized (I wonder, was that passage also in the 2003 "Practical Cryptography" book?).
As a personal note, I doubt 5-10 years from now SHA-2 will have been compromised for password hashing (or HMAC), though. It has been remarked several times during the process that SHA-2 would have made a great SHA-3 candidate, its only major flaw being length extension attacks. The main fear was that SHA-2 would succumb to the same techniques as SHA-1 and MD5; so far, that has not been the case (perhaps because everyone is fighting it off with the SHA-3 candidates).
Of course, I didn't write a blog post either with anything. I don't blog. Given the choice, I'd give priority to PBKDF2; that's about it. Of course, I'm not a customer-facing developer, so my worries are different.
Trusting blockciphers over hashes based on this high-level argument is suspect, at least in principle. (To be clear, I'm not arguing with your practical recommendations as they seem pragmatically justified.) It could well be the case that the ways we build password hashes from blockciphers appear to be more secure simply because we haven't sufficiently cryptanalyzed blockciphers for the necessary collision-resistance-type properties. Perhaps collision resistance is just too much to hope from any highly-efficient function, and it is only a matter of investing the effort to find collisions in the functions derived from blockciphers. Adding to the suspicion is the fact that blockciphers give us good hashes by coincidence, not design (And formal evidence doesn't help explain the situation - although I see hand-wavy claims about related-key attack resistance being sufficient, in my work I've only found reverse connections).
So are you aware of any deeper reasoning behind the blockciphers-over-hashes argument? Could trusting blockciphers for collision-resistance just be a good usage of security-through-obscurity, because a tremendous amount of effort has been invested in finding collisions in the hash functions but not in the blockciphers?
The good news is that SHA-2 and HMAC remain in pretty decent shape (known limitations notwithstanding).
Bcrypt and Scrypt do not. We could debate the relative quality of the cryptographers behind them all day, but realistically Bcrypt has had at least one widely-deployed implementation bug that caused a real decrease in strength which can be traced back to a lack of proper test vectors.
It's not just that it's endorsed by RSA; it's actually the NIST recommendation for password hashing, and I find it rather unfair that people in this thread turn that against it!
It's the same argument as for why we generally recommend AES over, say, Twofish or Serpent. We all agree here that in crypto it's a good thing to be mainstream, and being recommended by NIST makes you the Justin Bieber of algorithms, no? Standard algorithms may be poor, true, but being a standard has one important advantage: most of the public scrutiny goes into the standard. Much more money and fame there. So it's much more likely that the public will get to know about a flaw in the standard faster than it will get to know about flaws in non-standard algorithms. And that's why I follow standards: even if it's a crappy algorithm, I will know immediately when it's broken and can react by replacing it right away. The time between an algorithm getting broken and the fact becoming public knowledge is potentially longer the less common the algorithm is, and that window between being broken and being public knowledge is the most dangerous, in my opinion.
I'd like to point out that bcrypt is not equal to Blowfish. It piggybacks on Blowfish's key setup, and on top of that it further extends the original key setup. Blowfish's key setup was never designed to do what bcrypt does now, and the last 30 minutes of googling have not brought up any papers on bcrypt cryptanalysis. Compare that to HMAC. Compare that to using PBKDF2 with HMAC-SHA-3 when it's out. I'm not saying that Blowfish or any of its parts are bad, but if not PBKDF2 itself, then most certainly its building blocks have received a lot more analysis than bcrypt or scrypt. With SHA-3 on the horizon, the research community knows a lot more about hash algorithms, and there is a lot of research going into these topics. That's why I personally feel safer with a construction that maybe has not itself received more research than the other two alternatives, but whose building blocks almost certainly have, unless somebody proves me wrong. And when that happens, I'll stand happily corrected and will use the next standard.
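The "well-analyzed building blocks" point can be made concrete: HMAC is a short, fully specified construction over a plain hash (RFC 2104). A sketch built straight from its definition, checked against Python's stdlib `hmac` module:

```python
import hashlib

def hmac_sha256(key: bytes, msg: bytes) -> bytes:
    # RFC 2104: HMAC(K, m) = H((K xor opad) || H((K xor ipad) || m))
    block = 64  # SHA-256 block size in bytes
    if len(key) > block:
        key = hashlib.sha256(key).digest()  # long keys are hashed first
    key = key.ljust(block, b"\x00")         # then zero-padded to the block size
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(ipad + msg).digest()
    return hashlib.sha256(opad + inner).digest()
```

A dozen lines against a cleanly specified primitive; there is no comparably compact public specification of what bcrypt does to Blowfish's key schedule.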
AND you lose control of your database. Even if I had a magic instant bcrypt reverser, it does me no good if I don't have the hashes. You cannot be compromised by a bcrypt mistake, it would only make your already existing compromise slightly worse.
You could use a similar argument to recommend Elliptic Curve over RSA, or RSA over Elliptic Curve.
I have seen several people, whenever this comes up, make the mistake of thinking that your whole app could somehow be compromised by the wrong choice of password hash.
I only got religion about it after reading 100 threads from people bragging about how they'd designed their custom password hashes with Whirlpool and AES-256; in other words, as a nerd tic.
What you need out of an encryption package comes up in the event of being tested for PCI compliance or any legal liability investigation into a breach. You need to be able to say "all of our encryption is done with bcrypt; it's the industry standard and complies with X, Y, and Z".
The reason that it's important (although as others here have noted, less important than primary application security concerns) is what areas of attack are opened up by using insecure password storage "after" an initial compromise.
This could be something as simple as being a nuisance to users of the system (having to send out those "our password database was compromised, and we didn't do a good job of storing them securely, so you should probably change all of your associated passwords" emails), to something much more serious (using said insecurely stored passwords to attack your other systems, for example).
PCI doesn't really care how you're encrypting your data at rest. I cracked the password storage from an application once which was literally just a simple substitution cipher (which was positionally dependent...it was for all intents and purposes as secure as a newspaper cryptogram puzzle). That application was PCI compliant.
While "what you need out of an encryption package" might just be the bare minimum of "cover your ass", that's no reason to settle for insecure password storage.
"Site A" may be less than careful about security since they perceive their data as being of low value (e.g. "register for a chance to win free movie tickets"). But when they get hacked and their users' passwords cracked, it will likely expose plenty of Facebook and online banking credentials.
Is there then any need to do more than a simple salted hash? (Remember, the hypothesis is that all your users are using strong passwords).
PBKDF2 has had longer public exposure, and also features an adjustable CPU work factor (though with a lower theoretical safety-to-compute-time ratio than bcrypt).
scrypt is newer, but features both a CPU and memory work factor (memory-hard algorithm), and is algorithmically superior to both.
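In stdlib terms (assumes Python 3.6+ built against OpenSSL 1.1+; the parameters here are illustrative, not recommendations), the difference in tunable knobs looks like this:

```python
import hashlib
import os

salt = os.urandom(16)

# PBKDF2's only work factor is CPU time (the iteration count)...
pbkdf2_key = hashlib.pbkdf2_hmac("sha256", b"password", salt, 200_000)

# ...while scrypt adds a memory cost: n sets the CPU/memory cost,
# r the block size, p the parallelization factor.
scrypt_key = hashlib.scrypt(b"password", salt=salt, n=2**14, r=8, p=1)
```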
I don't know whether or not it's had more public exposure, but I don't think it's had longer public exposure. Afaik, bcrypt's canonical reference document is the June 1999 Usenix paper (http://static.usenix.org/event/usenix99/provos/provos_html/i...), while PBKDF2's is the September 2000 RFC (http://www.ietf.org/rfc/rfc2898.txt).
(If scrypt was trivially installable from a gem for Ruby, easy_install for Python, CPAN for Perl, a jar for Java and a .NET assembly for Microsoft, and all those bits came from sources where I didn't have to manually audit them and make personal attestations for their quality when I recommended them to clients, I would immediately stop recommending bcrypt).
More to the point: what are the tradeoffs you'd consider in choosing one over the other?
(Addressed more to @cperciva...) I'm assuming tarsnap uses scrypt as its actual key derivation function for file encryption and authentication. Why scrypt instead of something else (and I have faith that it's not "not invented here" syndrome)?
Short answer: I think scrypt is an advancement over the class of constructions HKDF belongs to. If you're picking nits about which function to use, use scrypt.
Also, somewhat related question -- what if Salsa core in scrypt is replaced with BLAKE core (with fewer rounds than in hash), and SHA-2 in PBKDF2 with BLAKE, thus making it possibly smaller (hardware and lines of code). Will this work well?
In a very theoretical sense, yes. But Salsa would need to be very very broken in order for that to matter (hence the "no reason to think" comment).
what if Salsa core in scrypt is replaced with BLAKE core (with fewer rounds than in hash), and SHA-2 in PBKDF2 with BLAKE, thus making it possibly smaller (hardware and lines of code). Will this work well?
Probably. I proved the security under the random oracle model, but the property I actually need is approximately "can't be iterated fast", which is a far weaker requirement.
PBKDF2 and scrypt each have supposed upsides over bcrypt, along with all of its benefits.
PBKDF2: RSA tested and widely used.
scrypt: memory hard as well.
"RSA tested and widely used" is subjective, not particularly meaningful, and in some senses erroneous, and so makes a poor case for PBKDF2.
If people want to seriously push for scrypt as a replacement for bcrypt as the "default" function, I'll design and print flags and pennants for the movement. But when people say "use PBKDF2 instead of bcrypt", I think the net effect is to scare people back to salted hashes, and my general response is going to be to poke holes in their arguments.
I just spent the past two days sitting in a workshop on "Special-Purpose Hardware for Attacking Cryptographic Systems", and the most recurring thread across all of the talks was how to deal with the unique memory limitations of GPUs and FPGAs when using them to attack crypto. Bandwidth is the largest one, and specifically the tiny amount of shared memory available to the GPU.
Basically, if you're forced to use "local memory" (which has a huge cost in transaction time), the amount of operations per cycle you can perform goes way down, which in some cases can be the difference between an attack taking "2 years" and "until the heat death of the universe".
The key factor is minimizing the relative advantage that an attacker with focused resources (such as dedicated hardware) is able to gain over the defender.
While CPU speeds, transistor densities, and cache sizes have gone through the roof, the 60-80 ns memory latency of off-chip DRAM has been nearly constant over the last few decades of computing.
"Memory-hard" builds the work factor on that.
I'm no expert, but this seems a bit unnecessary; bcrypt is still a perfectly good choice for most password stores.
Unlike some crypto standards, PBKDF2 is just too simple and too user-configurable to hide a meaningful hole in. Given that PBKDF1 was found to be less than ideal and was deprecated in favor of the better PBKDF2, it would be a very risky proposition to attempt to weaken it in a way that gave a meaningful advantage to one party over the other.