This article doesn't feel very well done to me. He writes: "The first cipher I'd suggest you consider besides bcrypt is PBKDF2."
PBKDF2 is not a cipher. It's a KDF, and it's almost always used with an HMAC or a cryptographic hash rather than a cipher. The thesis of this article seems to be "PBKDF2 is well understood, where bcrypt is not." In fact, the opposite is probably true.
bcrypt uses a block cipher (blowfish) to create its underlying compression function. Block ciphers are extremely well understood, have been studied to death for years, and are modeled on extremely well understood constructs. They can be used to create cryptographic hash functions, but usually aren't, because they're slow (which we don't care about in this case).
Cryptographic hash functions, by contrast, are not well understood at all. They are "magic" in many ways, and aren't modeled after anything. Many more "bad things" happen in this space than in the block cipher space. The only reason people mess with them at all is because they're faster than block ciphers, which again, we don't care about in this case.
The other appeal of PBKDF2 is that it "comes from RSA." This doesn't feel like an extremely compelling argument, but if we were going to believe it, then why not use the PKCS#12 KDF? PBKDF2 was proposed in PKCS#5, and "12" is a larger number than "5", so if we're going to do whatever RSA tells us we should do, they're essentially saying we shouldn't actually use PBKDF2.
Now I'm as much a fan of Moxie's as anybody, but I think one part of cryptography that needs to change is this tendency to excessively appeal to authority. Especially when speaking about practical issues.
So, I agree, but am happy he took the time to comment and (reasonably) concerned that his comment would be buried somewhere in the bottom third of the thread. I'm appealing to high quality comments, not authority.
Cryptographic hash functions, by contrast, are not well understood at all. They are "magic" in many ways, and aren't modeled after anything. Many more "bad things" happen in this space than in the block cipher space.
Is this a common opinion amongst practitioners? The opposite philosophy (e.g., that a random oracle is a "weaker object" than an ideal cipher) underlies some lines of work in the theoretical cryptography literature.
> I write this post because I've noticed a sort of "JUST USE BCRYPT" cargo cult... This is absolutely the wrong attitude to have about cryptography.
No. This is incorrect. This is exactly the right attitude for most developers to have about cryptography, because on a subject as complex as cryptography most developers (including me!) are nowhere near smart enough to understand the ins and outs.
Encouraging people to make their own decisions on subjects they aren't equipped to understand fully is dangerous advice. It leads to all sorts of bad outcomes. People who have to choose between options they don't really understand end up choosing things at random, or on the basis of incomplete or misleading information, or getting seized up by the need to make a choice and choosing nothing at all.
This makes cryptography one of the very few cases where a cargo-cult approach is better than the alternatives. A simple message that "this is the approach people smarter than you agree is correct, use it," repeated consistently, will help more people more completely than dumping them into the deep end of the crypto pool ever will.
> No. This is incorrect. This is exactly the right attitude for most developers to have about cryptography, because on a subject as complex as cryptography most developers (including me!) are nowhere near smart enough to understand the ins and outs.
I hear that a lot, and it always reminds me of Jante Law[1]. It's a disservice to keep telling people that they are too stupid to understand something. Too ignorant, perhaps - that can be remedied - but people are not too stupid to understand crypto. It's simply another field, mostly mathematical, and goof-ups are easy to make and often very costly.
edit: I should also make another point. Cryptography is exactingly and excruciatingly hard to do at industrial strength. I am not recommending people go out and roll their own crypto for production systems. It's possible for the initiated to do right; it's possible to get initiated. The uninitiated almost certainly will goof. I'd like to further point out [2], which is a discussion and break on a homebrew crypto, for a taste of the difficulties and mathematical sophistication needed.
> It's a disservice to keep telling people that they are too stupid to understand something
That's not the point. Specialization matters: developing a deep understanding requires time and effort. As a developer, choosing a way to store passwords securely is one task out of 1,000 I'm responsible for. So I acquire a general understanding of cryptography to make a decision - but that decision relies heavily on experts and consensus. The worst thing I can do as a developer is to go down the rabbit hole of cryptography and spend 10 hours researching an optimal password hashing scheme for my app. Sure, I've begun the journey of having a deep understanding of crypto - but I still have 999 things to do!
Which reminds me I need to get off HN and get some work done .....
> So I acquire a general understanding of cryptography to make a decision
But this is precisely what the cult priests don't want you to do. There are only two words you need to know: "use bcrypt". Any attempt to learn more is heresy.
Perhaps. But so what? As a developer the buck stops with you. If something goes wrong the boss (or client) will not be impressed by an excuse of "well I just did what someone on the internet said was best practice".
However I can see the problem with such a culture if your boss happens to be a bcrypt-tard and is closed off to discussion/learning.
Cryptography is exactingly and excruciatingly hard to do at industrial strength.
Related: Which is why I'm against any and all forms of electronic voting. I've done a fair share of crypto (as a user of crypto libraries) and I barely understand how it works. There's ZERO hope for the layperson to understand crypto-based voting systems.
One of the central tenets of American style voting is a public vote count. Using crypto puts the "public vote count" into the hands of a high priesthood (a few adepts), which is a really bad idea.
Regardless of what you think of the kind of people who the government would contract to develop electronic voting machines, I don't think I'd call them "laypeople". Or are you saying that every individual voter needs to understand the inner workings of the system for it to be effective?
Or are you saying that every individual voter needs to understand the inner workings of the system for it to be effective?
Um, yea. That's what "public vote count" means.
As for the technical competence of our nation's election administrators, have you not been paying attention? I've met many, many. Great at elections. Terrible at computers. Completely and utterly reliant on the vendors. Who've manifestly demonstrated their complete inability to code their way out of a wet paper bag, much less be entrusted with the foundations of our democracy.
But they don't need to understand the mathematical details of the cryptographic protocol (do they?), which seemed to be what you were implying. It seems to me the details could easily be abstracted away into a sequence of idiot-proof steps, but I don't know much of anything about public voting systems.
Edit: I'm not sure why I'm getting downvoted - am I misinterpreting what specialist is saying? For another example, look at HTTPS - it secures your communication without any need for knowledge of how it's doing so, you just need to know that if you see the green lock icon you're "safe" (though I'm aware of all the usability issues, like how everyone just ignores warning messages when something goes wrong). Is there something fundamentally different about public voting systems in this regard?
I guess you're getting downvoted because that's exactly the point: Every layperson should be able to double check the vote in case of doubt. That includes the nitty gritty details, including the security of the cryptographic protocol. Paper ballots are simple. There's not much to understand: Make your cross, count the votes, add up, done. If in doubt, count again. You don't need to trust any expert on anything for that.
I would say that yes, every voter needs to be able to comprehend the system. Counting paper ballots is something anyone of normal intelligence can understand. Auditing them is something everyone can understand. Not so for cryptographic based systems, particularly proprietary systems.
The only electronic voting system I'd probably be OK with is one that is totally open-source, software and hardware. The implementation must be completely transparent.
Even then, fraud will happen. Always has, always will.
Crypto-based voting systems are not designed to work for real-world elections. They work fine for contrived academic studies.
They all rely on one's vote being lost within a herd of votes. So if your ballot is one of a million, and the ballots are simple with a few races/issues, it's easy to have a secure one-way hash with collisions. (The collisions make it impossible to work backwards to infer how each person voted.)
Alas, real-world elections in the USA are administered at the precinct level. In my state, that's between 0 and 1000 registered voters.
Further, a general election (November) ballot will have dozens of issues and races.
So it's more than likely that any single ballot will be utterly unique. So with the voting systems I studied, it's trivial to infer how everyone voted.
I could imagine crypto-based voting systems working for certain applications. Like corporate shareholder meetings. Or maybe Australian and British style parliamentary elections. (Don't hold me to the last guess, I only have cursory knowledge of their election systems.)
Edit: I discussed these practical issues with one of the grad student authors of a crypto-based voting system. Being completely ignorant of real elections, he had NO IDEA what I was talking about. He denigrated my input; saying our elections should be easily tailored to accommodate crypto. (Good luck with that.)
Maybe it's just me, but I'm of the opinion that people setting out to solve a problem should probably make some token effort to first understand the problem. YMMV.
Option 1: Use the well-vetted approach everyone else uses. What happens if that method is flawed? You hear about it as soon as it's discovered and it gets fixed quick.
Option 2: Roll your own stack based on a personal understanding of cryptography.
What happens if that method is flawed? Perhaps only you and an attacker could possibly know such a thing. You have to be ever vigilant and you have to acquire an incredible amount of crypto knowledge. If you ever leave the company they are pretty much fucked from then on until whatever you wrote is replaced.
To add on to this, it is of course necessary to have enough familiarity and competence with the basic principles of cryptography to know that you're "doing what everyone else is doing" correctly, instead of just some silly pantomime, such as: http://thedailywtf.com/Articles/Topgrade,-SHA1-Encryption.as...
But it's not necessary to understand the depths of cryptography beyond that there are various black boxes that have certain properties.
The point here is that this particular cargo cult around bcrypt (one subscribed to by some really loud people) has a shaky foundation and does not deserve its reputation. He's offering alternatives that have been better studied.
So, by all means, subscribe to a cargo cult for crypto. But pick the cult carefully.
He's correct in that if you've selected bcrypt for key derivation, there's a good chance you could be doing things better (for one, its output is only 184 bits long; insufficient for AES-256), whereas PBKDF2 lets you customize the output length.
However, the point of the bcrypt argument is not that bcrypt is the best algorithm for certain things, but that it's (at a minimum) about four orders of magnitude better than most people's "secure" password storage algorithm: sha1(password). Because it requires both a salt and a work factor, even a dictionary attack is wildly impractical unless there's a massive flaw discovered in the algorithm.
If developers are going to be trained to pick a specific algorithm for password storage, I'd much prefer bcrypt (no known flaws, many benefits) over sha1 or md5 (designed to be fast for checksumming, salt not required). Might PBKDF2 be a better choice still? Very possibly; I haven't done enough research to intelligently answer - and since this is crypto, I will not best-guess it.
My real point here? The article attacks bcrypt as a key derivation algorithm, but I've never seen someone suggest it to be used in such an application. Even the post that started what you may call the bcrypt movement (http://codahale.com/how-to-safely-store-a-password/) is linked in the article, and it's titled "How to safely store a password". It is NOT titled "How to safely derive encryption keys".
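For concreteness, here's roughly what "use bcrypt for password storage" looks like in Python, assuming the pyca bcrypt package (the password and work factor below are illustrative only):

    import bcrypt  # assumes the pyca "bcrypt" package is installed

    password = b"correct horse battery staple"

    # gensalt(12) returns a salt string carrying a random salt plus the work
    # factor (log2 of the iteration count); hashpw() embeds both in its output.
    hashed = bcrypt.hashpw(password, bcrypt.gensalt(12))

    # Verification: re-hash the candidate using the stored salt/work factor.
    assert bcrypt.hashpw(password, hashed) == hashed

Because the salt and cost are carried inside the hash itself, there's no storage format for the developer to invent, which is a big part of its appeal for password storage.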
But just the existence of multiple cargo cults causes damage, because it leads people to assume that "the experts are divided." Which will lead some people to go with the wrong batch of experts, and others to just throw up their hands in confusion and store their passwords in plain text because it's too hard for them to figure out which group of experts is right.
This is a case where unanimity in the message is important. If all the experts say "use A," people will take that to mean there's no debate about the merits of A over B and C, and use A. If some say "use A" while others say "use B" or "use C", some fraction of listeners will give up and use nothing at all.
One more example: pseudorandom number generation (non-cryptographic). There are lots of algorithms out there, and lots more you could design, but the Mersenne twister has pretty much become the first-choice algorithm.
Especially with linear congruential generators, it's easy for people who don't know what they're doing to add "extra randomization" that makes the resulting numbers worse than the originals.
It's better for most people to not get fancy and use the standard algorithm.
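For what it's worth, CPython's standard random module is exactly this: a Mersenne twister (MT19937) behind a simple interface, so "use the standard algorithm" is usually one import away:

    import random

    # CPython's random.Random is a Mersenne twister (MT19937) under the hood.
    rng = random.Random(12345)            # explicit seed for reproducibility
    print([rng.random() for _ in range(3)])

(Not suitable for anything security-related, of course; that's what CSPRNGs are for.)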
MT uses a lot of memory. IIRC, it's multiple KiB where 64 bits ought to do fine.
A modern CSPRNG like the SHA-3 candidate Skein can produce random data at a steady rate of only a few cycles per byte. I wonder if at some point something like that will dethrone MT for non-crypto PRNGs.
The opposite turns out to be true in practice. People have the choice of using a simple crypto library interface (like BouncyCastle PGP) or "really getting to understand" AES, and so end up fielding software with vulnerabilities PGP addressed in the '90s.
A pithier way to say the same thing is, "you're right, except for the words 'without understanding it'".
This is true, but most programmers don't and won't ever get a deep understanding of crypto. For that large majority, it makes complete sense to ask "What do the people who do understand crypto use in a case like this?" and use that.
That kind of copying is clearly inferior to either gaining the knowledge to make an informed decision or hiring an expert to do it for you, but for most projects that need crypto neither one of those is practical. So, copying the experts is the best practical approach in most cases.
It depends what level you are using it on.
If you are writing crypto libraries on your own then sure you are at significant risk of screwing it up, but a good crypto library should have good defaults that will provide proper security.
Expecting every programmer who needs to build an application that stores passwords or personal information etc to have a rigorous mathematical understanding of crypto (this probably means having a PhD or similar in the subject) is just plain unrealistic.
It's sort of akin to suggesting you shouldn't allow a mechanic to change the brakes on your car without them having a detailed understanding of physics.
I can use a microwave without any understanding of its functioning, or I can try to understand electromagnetism well enough to build my own. I know which method I'd trust not to render me sterile.
It is very dangerous to be ignorant about critical computer science fields like cryptography and just go with the cult.
I'm not saying you should build your own cryptography, but a good hacker (or a good engineer, as we called them in the early '90s) should understand the difference between bcrypt, PBKDF2, and scrypt. At least to understand why bcrypt is better than salt+SHA-1. And some other aspects of security.
However, if somebody has no clue about cryptography and security then she/he should go with bcrypt - but I'm not sure that person should be responsible for, or in the business of, storing somebody's critical data at all.
Why not just say, "people who spent a lot of their time studying cryptography strongly recommend this approach. They feel the approach that you are considering is simply insecure."?
Framing things in terms of intelligence isn't going to win anyone over, if that's your goal. And it probably isn't accurate, either.
He's probably using "smart" in the sense of "accumulated knowledge in a particular space" instead of in the sense of raw mental horsepower. http://news.ycombinator.com/item?id=3583985
Yes, that's right. It doesn't mean that you are hopelessly non-cognitive if you can't understand the complexities of cryptography. (As I said, I certainly can't.) It just means that unless you are one of the very few people who are 1) exceptionally mathematically talented and 2) able to have spent your entire life studying the subject, it's unlikely that you could make an informed choice.
By all means, use different wording when passing this message along if you're worried about hurting the recipient's fee-fees. But you won't be doing them any favors if you change the message to a Stuart Smalley-style "You can do it! You're good enough! You're smart enough! And gosh darn it, people like you!" Because on this specific subject, the odds are very, very, very unlikely that they actually are.
I think "smarter" in this context means more well read about the particular subject of crypto, although people good at that are likely to be very intelligent all round too.
A different approach is not necessarily "less secure" it's just that it may have had less people banging on it trying to figure out ways to break it.
bcrypt has the advantage of being both very good, and also broadly available on web platforms. scrypt does not yet have that advantage; when it does, I will start saying "just use scrypt".
But the simple fact is: all three of these functions are fine. ANY of them is a huge step forward from what people do without them.
"Just use bcrypt" is 1000x more effective as a meme than "just use adaptive hashing" (which is what all these constructions are).
So, while I have a few specific technical qualms with this article (e.g. why do I care whether something is a PKCS standard or not?), the overall message is a bit hyperbolic.
For what it's worth: when dealing with banks and financial firms, where any technology without an adequately staid and reassuring web presence is frowned upon, we happily recommend PBKDF2. There's nothing "wrong" with PBKDF2. It's just not as good as bcrypt.
When we were looking at password hashing, and the choice came down to bcrypt or scrypt (about a year ago, so recently enough), I said we should go for bcrypt because scrypt was comparatively new; inasmuch as it makes a difference, it's just had less time to be attacked.
It's a sane decision. Just make sure you implement it properly.
Like everything in crypto, it will be broken eventually. But it's a safer choice than a new algorithm.
It reminds me of how we treat vulnerabilities. When apps have no known vulnerabilities, all is fine. When a new "instant root compromise of any system" comes out, it's omgomgomg.
Then it's fixed, and all is fine again.
Except that vulnerability was always there. And other ones that are yet to be public are there too. And many of them are "omgomgomg" material.
Well, crypto is the same. We don't have public data on which algorithms are broken. We just know they will be eventually, by logic or by brute force.
So, make the wise decision, and don't forget you might eventually need to update it.
This is a naive attitude. Mathematically-secure cryptography with an implementation that avoids all side-channel attacks is unbreakable, as in, would take more time than the projected heat death of the universe to brute force.
That's not to say that all implementations are secure, or that there are not undiscovered mathematical flaws in common algorithms, but the idea that all encryption is brute-forcable given enough AWS instances is just plain incorrect.
> Mathematically-secure cryptography ... is unbreakable
To expand on sibling comments: Cryptography essentially depends on the assumption that P≠NP (well, not exactly, but...). It's possible, though unlikely, that mathematical discoveries could undermine all possible conventional cryptographic schemes.
As for brute-force, that's a tricky one as well. If you allow a strengthening of Moore's law that says that operations per second per dollar increase exponentially, then you can construct the following "polynomial time" algorithm for any cryptographic problem:
- Wait n*k years, where n is the problem size in bits and k is a scaling factor to get the exponents to align.
- Buy a computer.
- Run the brute force algorithm on your new computer.
I don't think I follow your definition of "mathematically-secure". Is this even an attainable definition? To expect all crypto to be eventually broken is an attitude with foresight.
* These numbers are more appropriately expressed exponentially, i.e. a "factor of 5" is about 2.3 bits of security (log2 of the cost multiplier; see the quick conversion below).
* A couple of bits of security is not significant at all.
* The Scrypt x5000 seems to only apply when your attacker has access to a chip foundry and it buys you maybe about 12 bits of security against such an attacker.
* For comparison, 128 bits of security is often considered a minimum for resisting offline bruteforce attacks (e.g. AES has a 128 bit key). Of course passwords usually don't come anywhere near close to 128 bits of effective security, so 12 bits might make the difference for some of them if you're lucky.
* In a sense, table 1 in the Scrypt paper shows that the relative difference between the functions is less significant than even relatively small variations in password strength. When the defender has 100ms of CPU to spend authenticating each password, all three functions compared cost less than $200 to break an 8 letter password and more than $150M to break a 10 character password.
* Very few attackers are going to actually spend $M++ trying to break your password (cue XKCD strip). A relevant exception might be botnets, the operators of which don't pay the power bill for their computations.
Therefore, I conclude that for most purposes these functions are mostly equivalent and quality-of-implementation and password strength issues dominate in practice.
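The bit conversions above are just base-2 logs of the cost multipliers; as a quick sanity check in Python:

    import math
    print(math.log2(5))      # "factor of 5"  -> ~2.3 bits
    print(math.log2(5000))   # "x5000"        -> ~12.3 bits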
Note that the defender pays a cost for this too though. Where he could be happily running PBKDF2 or Bcrypt in multiple Apache processes on his multicore servers, Scrypt is going to completely trash the L2/L3 caches and saturate the memory bus and make everything else on the server run like a dog.
Scrypt is operating as designed, of course, but it raises the question of whether or not a defender with a busy website on a farm of multicore servers would be able to configure his work factor as high (in terms of single thread benchmark ms) with Scrypt as he would with Bcrypt or PBKDF2.
I tend to think of this stuff from the cloudcracker.com perspective, which is all about cheap jobs rather than millions spent. I rarely do brute force because, honestly, it's rarely necessary. But the single biggest factor in cloudcracker.com support of a hash format really is its efficiency in GPU space.
For instance, right now I'm implementing support for the modern SHA-512 crypt() variation. It doesn't translate well into GPU space at all, which will end up meaning that I can only offer dictionaries that are half a trillion words smaller than formats which are fast on GPUs.
So far the data I'm seeing indicates that differences of that scale in dictionary size really do make a difference on the success rate of the job. So for what it's worth, from that perspective, it is a factor.
I've yet to see a website that gets bogged down with authentication requests in normal operation, even if those requests measure in the multiples of 10ms. The basic idea with all those "slow" password hashes is that authentication is a pretty rare request compared to all other requests. Usually authentication requires a single request, at most a handful. If you run into trouble with bcrypt/scrypt on your webservers, you're doing things wrong. If you have that many authentication requests, direct them to a dedicated server - you'll be in the league that has a large farm running anyways.
Granted, an attacker could use your slow hash for a DOS-attack, but most websites I've seen so far always had some sort of slow operation that was easily exploitable without authentication.
OK, but what if you're a web app that specializes in authentication (like, say, an OAuth provider) or a database server where the attacker-facing app doesn't use connection pooling?
> but most websites I've seen so far always had some sort of slow operation that was easily exploitable without authentication
Yes, but those things are typically easy to optimize or temporarily disable in a hurry once they come under attack. Not so much with authentication.
> OK, but what if you're a web app that specializes in authentication (like, say, an OAuth provider)
Then I'd hope that you make extra double sure that your user's passwords are secure. Let's call it the cost of business.
> or a database server where the attacker-facing app doesn't use connection pooling?
We're talking about how to store user passwords here, aren't we? If your database server passwords get lost or cracked, change them. Use randomly generated passwords, don't reuse them.
>> but most websites I've seen so far always had some sort of slow operation that was easily exploitable without authentication
> Yes, but those things are typically easy to optimize or temporarily disable in a hurry once they come under attack. Not so much with authentication.
That depends on the website. If your major content is available unauthenticated, then you might as well go and disable authentication. My major point was that you can't defend against a DOS attack by using a weaker password hash, the attacker will just throw more requests at you. In a DOS, the attacker does have the advantage of needing fewer computational resources.
The discussion is about best practices and the relative merits of password hashing functions. So the baseline assumption is that the server-side password database isn't perfectly secure.
In practice, the user gets to choose the password and the website at best gets to veto it or accept it without knowing how many other places it's re-used. There aren't too many sites assigning randomly generated passwords right now, I wish there were more.
Yes, some DoS attackers may be able to throw more and more resources at you until you go down. But some don't and you don't have to make it easy for them by preemptively DoSing yourself with too much password hashing! Alternatively: for some fixed amount of attacker DoS resources, your system can support a certain amount of password hash cracking resistance. Cracking resistance is thus a tradeoff with DoS resistance. The root cause of this situation is the poor entropy present in many users' choice of passwords.
Turning off authentication is generally not an option if your site has any data worth securing. If it were, an attacker could bypass your access controls by simply DoSing you until you disabled authentication.
> The discussion is about best practices and the relative merits of password hashing functions. So the baseline assumption is that the server-side password database isn't perfectly secure.
That's right. So I don't see where database connection pooling is part of the issue, and that's what my remark about randomly generated passwords was aimed at. You should use secure passwords to authenticate your app at the database. You should probably use md5 [1] or similar as the password hashing scheme for the database credentials, since this is where you really have a valid tradeoff between performance and security that allows any attacker to bog down your database. But this issue is not the scope of this discussion.
It's certainly a valid technical concern not to increase your password hashing work factor to a point where it becomes a valid attack vector for DoS attacks; my point simply is that you can go quite far in terms of work factor until you reach that limit. Increasing the work factor to a point where authentication takes 10ms will in many cases still leave other parts of your application more vulnerable.
And well, turning off authentication should imply denying access to data that requires authentication. This certainly is an option if the majority of your data is available for unauthenticated users. It certainly is not if you only store data worth securing.
Yep, I work close to an extremely high traffic OAuth endpoint. I think scrypt would probably "cost" a lot more in terms of actual servers required to operate it.
scrypt is tunable; you can make it use as much or as little CPU time as you want. For any particular amount of CPU time, scrypt will give you more security than bcrypt or PBKDF2 would give you for the same amount of CPU time.
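As a rough illustration of that tunability (this uses Python's hashlib.scrypt, available in 3.6+ with a recent OpenSSL; the parameters and password here are purely illustrative, not a recommendation):

    import hashlib, os

    salt = os.urandom(16)

    # N sets the CPU/memory cost, r the block size, p the parallelism; raise N
    # (a power of two) until the call takes as long as you can afford per login.
    key = hashlib.scrypt(b"hunter2", salt=salt, n=2**14, r=8, p=1, dklen=32)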
Perhaps it's more accurate to measure scrypt in terms of cache misses rather than CPU time? This is a different resource on multicore servers with different performance implications (at least if we're down to counting 3 or 4 bits of security).
I thought the definition of "better" in this case is that it requires less work to get the same computational strength with scrypt. Are you saying that the memory locality issues that scrypt causes more than cancel out the computational win?
In this context, work is computational strength so what we're mainly concerned about is an attacker finding a way to do the task significantly more efficiently than the defender. E.g., if the attacker can evaluate the function with half the cost on his power bill relative to the defender, then that can be thought of as knocking off 1 bit of security off the top.
The primary advantage of Scrypt over the others is that it enters a completely pathological memory locality access pattern and stays there for almost the whole function. This works to neutralize the advantage of an attacker who has a custom CPU because he probably can't also develop a custom RAM subsystem to feed it with (at least not one that's many times more efficient than what the defender has in his server).
But if you've done any performance tuning on multithreaded code, you know that cache effects caused by memory access patterns very quickly begin to dominate as multiple cores and threads are added. Things that look great in single-threaded benchmarks almost never scale linearly and there's probably nothing that will scale worse on our shared-memory multiprocessors than Scrypt. It's a feature.
So the defender (say, a busy website with commodity multicore servers) with Scrypt is likely not going to be able to take as good an advantage of his hardware. He won't be able to crank up the work factor quite as high as he could with Bcrypt or PBKDF2.
This may represent an advantage to the attacker, who doesn't have the additional constraint of keeping the response time up on a busy webserver. This attacker's advantage is probably not significant by cryptographic standards (maybe 2 or 3 bits of security lost), but pathological multithreading could represent a big issue operationally.
I'm honestly not trying to cast FUD on Scrypt here, I think it's the best function. I'm just saying like everything else multithreaded you really need to benchmark it under real-world conditions.
scrypt's memory access pattern isn't particularly pathological; it's random, sure, but it reads large blocks.
The key issue isn't the design of the RAM subsystem but rather the amount of memory required; in particular, making sure the attacker can't "cheat" by using a smaller circuit than the defender.
You're right, but don't forget Deep Crack. (FPGA-)hardware-based attacks have only become cheaper since then; I would be surprised if the NSA doesn't have a few arrays.
(Also, 80 bits is usually enough to withstand attack, no?)
Yep. There are off-the-shelf FPGA arrays available. Still, bad guys would probably find it much cheaper to rent botnet time at ~$0.02/(host*day) or whatever the going rate is.
A solid 80 bits of security out of any of these functions might turn out to be safe forever. But, in practice, most password databases are going to have some fraction of users choosing passwords straight out of the cracker's dictionary, some fraction that will never ever be cracked, and the smallest fraction being crackable according to the defender's choice of work factor.
COPACOBANA cost ~$10,000 and apparently is as fast as 2500 PCs for the DES cracking it's optimized for, so ~$4/PC-work-unit plus insignificant power costs. You'd need to find someone with experience implementing crypto in hardware, though. On the other hand, botnets risk detection.
(If you're buying in bulk, ASICs are cheaper, but few will be willing to pay for that much cracking power.)
Yeah, I suspect it depends on your bad guys (er, sorry, "threat model") whether or not they feel more comfortable trying to buy botnet time or acquire $10,000 worth of FPGAs in an untraceable way.
If I were a bad guy, I would prefer to not have a password-cracking special-purpose supercomputer in my possession. (But I'm not a bad guy, and in fact I would love to have a few around the house. :-)
You seem to be speaking only to its compute time -- what do you say to the article's claim that bcrypt has a higher probability of having an unexpected attack that mitigates its computational complexity?
Also, since all of these algorithms have adjustable work factors, what does it even mean to say that one is stronger than another? Couldn't you just calibrate the work factors so that they are equivalently strong? Though naturally scrypt has strength in another dimension also (memory).
It's nonsensical. It essentially argues that SHA2 is less likely to have cryptographic results relevant to hashing in the next 5 years than Blowfish. It's also an argument the post doesn't support with any actual evidence.
But that's not my issue with the article. My issue with the article is that it takes a simple security issue with no "real" wrong answers and turns it into a tribal conflict, which has the net effect of reducing the number of people who will use adaptive hashing at all.
I'm familiar with Tony Arcieri's work and generally think highly of him; this article, though, is inexplicable and smacks of hipsterism. "I liked Nirvana but then they got popular and sold out, so now I listen to Sleater Kinney". Well, that's going to sound dumb in 10 years.
Let me say yet again that if you use PBKDF2, bcrypt, or scrypt, you are going to look smarter than the average webdev, no matter which one you pick. Do whichever is easiest.
The problem is that PBKDF2 isn't superior to bcrypt in every regard:
* It has marginally worse library support and is built out of universally available primitives, which increases the odds that generalist devs will DIY it.
* It is actually faster than bcrypt (see Colin's paper); in other words, even without waiting for a hypothetical research result against bcrypt, PBKDF2 is already "vulnerable".
* PBKDF2 deployments virtually all use SHA2 as their PRF, and PBKDF2/SHA2 is a construction that depends entirely on the security of hash functions; hash functions are more poorly studied than block ciphers.
* Attacker tools are (mostly, but not entirely) built out of preexisting infrastructure and not by cryptographers; of the three functions, the best accelerated brute force support is available for PBKDF2/SHA2. For instance, is there a widely-available GPU implementation of bcrypt?
* The standards process that ran for PBKDF2 did not include the extensive peer review that (say) AES went through, and isn't a significant asset for PBKDF2. Meanwhile, bcrypt had broad deployment long before PBKDF2 was widely deployed, and on higher-value target systems.
You'd rather the meme be "Use bcrypt, scrypt, or PBKDF2". I'm fine with that meme! But that's not what you said. You said "please don't use bcrypt".
PBKDF2 isn't bad. It has one significant asset: you can point to a PKCS standard to convince pointy-haired product managers to accept it into systems. But given the choice between an HN cargo cult and "technology made palatable to enterprise-grade engineering managers", I'll take the cargo cult in this instance.
(Another strength of PBKDF2 that it shares with scrypt but not bcrypt: you can use it as a proper KDF for your AES keys... but note that if you need to generate your own AES keys, you're very likely in trouble for other reasons).
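That KDF use looks roughly like this in Python with the standard library's hashlib.pbkdf2_hmac (iteration count, passphrase, and key size below are illustrative only):

    import hashlib, os

    salt = os.urandom(16)
    iterations = 100000   # illustrative; tune to your hardware budget

    # dklen is the output length in bytes, so a 256-bit AES key is dklen=32;
    # this choose-your-own output length is exactly what bcrypt doesn't offer.
    aes_key = hashlib.pbkdf2_hmac("sha256", b"passphrase", salt, iterations, dklen=32)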
> PBKDF2 deployments virtually all use SHA2 as their PRF, and PBKDF2/SHA2 is a construction that depends entirely on the security of hash functions
A quick look at Wikipedia (and my own recollection) suggests that it may be more commonly used with HMAC-SHA2. Although the HMAC construction is not provably secure [1] and poorly understood, it seems to be fairly resistant to attack (e.g. HMAC-MD5 is not known to be broken, AFAIK.) Also, iterated hashes are much harder to break than single hashes.
[1] Bellare has a result based on a nonstandard but not entirely implausible assumption about the underlying hash function, IIRC. But, as you point out, hash functions are poorly understood...
You're right, but the kinds of research results that would accelerate a brute force PBKDF2/HMAC-SHA2 cracker are a superset of the results that would jeopardize HMAC-SHA2 as a MAC. Not to downplay it (it's very important that you get HMAC right) but HMAC-SHA2 is just double-applying SHA2.
I was imprecise, but my point is just, PBKDF2/xSHAx is a construction that relies entirely on the properties of cryptographic hash algorithms; scrypt and bcrypt rely instead on properties of ciphers.
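For anyone who hasn't seen the construction, "double-applying" is almost literally what HMAC is; a from-scratch sketch in Python (RFC 2104, sanity-checked against the stdlib hmac module):

    import hashlib, hmac

    def hmac_sha256(key: bytes, msg: bytes) -> bytes:
        # HMAC(K, m) = H((K ^ opad) || H((K ^ ipad) || m)), i.e. two nested hashes
        block = 64                                  # SHA-256 block size in bytes
        if len(key) > block:
            key = hashlib.sha256(key).digest()
        key = key.ljust(block, b"\x00")
        inner = hashlib.sha256(bytes(b ^ 0x36 for b in key) + msg).digest()
        return hashlib.sha256(bytes(b ^ 0x5C for b in key) + inner).digest()

    assert hmac_sha256(b"key", b"msg") == hmac.new(b"key", b"msg", hashlib.sha256).digest()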
Can you elaborate a little bit on this? Or point to a reference. Wikipedia mentions that PBKDF2 is easier to implement with ASICs or GPUs. Is that the primary reason?
bcrypt is something like a decade old. You don't have to stay up-to-date on it. The vulnerability that bcrypt accounts for dates back to 1972.
All you have to do is not design your own password hashes with SHA-2 (or Whirlpool or CubeHash or whatever some random Stack Overflow answer says you should use). You can safely keep PBKDF2, bcrypt, and scrypt in your bag of tools --- of those, only scrypt is recent --- and reach for whichever one is easiest.
Still, I'd stick with bcrypt for Ruby. It's in ActiveModel, so it's in Rails. It's in Authlogic. It's the default password storage in Datamapper. I can just point my fellow developer towards the documentation and say "use bcrypt" and be reasonably sure that a basically competent developer will get it right. That's a good thing in my book. Neither scrypt nor PBKDF2 has that level of integration so far. When that changes, I'll reevaluate my decision.
Still, it's certainly good to keep in mind that there are alternatives, usage profiles and requirements differ and so do the solutions.
Isn't that a self-fulfilling prophecy? If you buy into the "just use bcrypt" idea, how will scrypt ever change the fact that it's not widely available on the web?
I addressed this directly downthread. I don't care whether TweetYourCatFood.com uses bcrypt. Specifically, I care that:
* It's got a reliable Gem for Ruby
* It's got a reliable easy_install package for Python
* It's got a good reliable CPAN entry for Perl
* It's got a Java jar file from a reputable source
* It's got a .NET assembly from a reputable source
And what I'm saying is not that scrypt will be "safe to use" when that happens. scrypt is safe to use now; safer, marginally, than bcrypt. What I'm saying is that when that set of things happens, I will personally stop recommending bcrypt and start recommending scrypt. And I only point that out because I always feel a little bad about not recommending scrypt, which is strictly speaking better than bcrypt.
I'm a web developer, so while security is obviously a huge concern, it's not my main area of expertise. I don't have the knowledge to evaluate the pros and cons of each cipher, and the situations in which it's appropriate to use each one.
It would be wonderful if someone with more knowledge of the subject could throw up a 1-page site with an appropriate security choice (or a few choices with situations in which each would be more acceptable) for a given range of situations, to establish a 'sane default', taking into account their availability on a number of platforms and programming languages.
For example:
Need to sign a message? Use HMAC-SHA1
Need to checksum a file? Use SHA-1
Need to hash a password? Use bcrypt
Need to transmit data over a network? Use SSH2
Need to secure HTTP? Use SSL 1.2 with (these ciphers in order of preference)
Need to secure home WiFi? Use WPA2-PSK
Need to encrypt files? Use GnuPG
Need to do (this type) of encryption? Use CBC. For (this type), use ECB
Need to create a TrueCrypt volume? Use (this cipher) with (this many) bits.
Need to sign a message? Use S/MIME or PGP.
Need to checksum a file? Use SHA256.
Need to hash a password? Use bcrypt, scrypt, or PBKDF2.
Need to transmit data over a network? Use HTTPS/TLS
Need to secure HTTP? Use HTTPS/TLS, preferring AES in CTR and then CBC.
Need to secure home WiFi? Use WPA2PSK.
Need to encrypt files? Use any implementation of PGP.
Need to do (this type) of encryption? Use PGP. Never use AES directly. Never use ECB for anything.
Need to create a TrueCrypt volume? Can't help you; we use PGP.
I guess the question is - who do you trust to make these calls?
In this case, the answer probably is OWASP which is a great and often overlooked resource, contributed to by a lot of experts in the area. They have a lot of pages in their wiki that address crypto concerns...
Casting my eye over the recommendations in the pages I linked - yes some of it seems to be a little behind the times (for instance, adaptive hashing isn't mentioned once in terms of securing passwords), but none of it seems outright terrible.
Is this something that the crypto community/experts can come together and improve the same way as the vuln/exploit security community have made OWASP what it is?
Or is the real truth that executing proper crypto techniques is simply too difficult to boil down into a pile of cheatsheets?
I think he meant it hasn't undergone the same level of public scrutiny. You've certainly spent ample time researching it, and it's obviously been tested, but possibly not as much something like AES.
> […] with an academic pedigree from RSA Labs, you know, the guys who invented much of the cryptographic ecosystem we use today.
Appeal to authority fails a little bit when RSA opens random Excel attachments from unknown, untrusted sources - attached to an email that had to be retrieved from the junk mail folder.
EDIT: As other commentators point out, I am wrong to suggest that anything coming from RSA Labs is somehow "weak" because someone at RSA fell for a phishing attack. I do find it odd that a security article suggests "These people are good; they did 'this thing' which everyone uses". That's not a great way to approach choosing crypto components. Even experts make mistakes.
Edit: realized I should probably explain. What an administrative assistant does in a social engineering attack has nothing to do with the quality of cryptographic research at RSA. Humans are ALWAYS the weak point in cryptography.
No, ad hominem is "You're wrong 'cause you're a jerk." DanBC pointed out that their opinion might not be as above reproach as is often assumed because of past lapses in good judgment.
Ad hominem is an attempt to negate the truth of a claim by pointing out a negative characteristic or belief of the person supporting it.[1]
That sounds pretty much like what happened above. The truth item is whether RSA's cipher chops can be respected. The negative characteristic is an HR person succumbing to a social engineering attack. Ad hominem isn't just calling names.
Let me put it differently: in what way does the HR rep's mistake reduce the quality of ciphers that have come from RSA?
Ad hominem is an attack on the person whose argument you're trying to disprove. Here, that person is the one who wrote the article, not RSA.
The argument "X is safe because its creator, Y, has a good reputation" can certainly be refuted by attacking the claim that Y has a good reputation. It's not an ad hominem.
The fact that the error by the HR person doesn't actually invalidate RSA's reputation as cryptographers is more of a non sequitur.
What can you share with us about the quality of cryptographic research at big software companies? And, besides that one: what are some other big companies whose cryptographic research you would rely on when selecting algorithms and crypto constructions?
Are you sure you know what you're talking about here? I'm not saying you don't, but most HN'ers who would write a comment like yours don't.
I can't share anything. All I can do is respect the peer review process. I'm certainly not making the real appeal to authority of "they are a big company, so they make good ciphers". If you have something specific, I'd love to hear it. In particular, I'd like to hear how social engineering negatively impacts cipher algorithms (negatively impacting key security doesn't count).
What can you tell us about the actual peer review process that you think was involved with PBKDF2?
(Me just saying it won't be nearly as impactful, because apparently I'm in the tank for bcrypt).
Don't know? This will make a worthwhile 30 minute Googling project. That's what I do when I get in over my head, and I promise, learning about adaptive hashing is going to be more useful than reading court decisions about what does or doesn't constitute a breach of the duty of loyalty for a company director (to cite my last Google dive from HN).
You seem to have the mistaken opinion I'm arguing for a particular outcome. My point is simple: you can't say RSA is crap because an admin failed. I'm not saying anything else. If you are reading something else into my statement, stop it. If you have information that is relevant to why you shouldn't trust the ciphers from RSA, dish it up. Like I said, I'd love to read it.
It is possible to point out a logical fallacy while having no beliefs (or, in fact, deep knowledge [or, perhaps, religion?]) regarding what the fallacy pertains to.
No, I'm asking you to take a subject you're obviously engaged in and spend a couple minutes researching it before you write your next comment about it. Not to be a jerk, but because (a) the whole thread would benefit and (b) I can vouch, for you, that this is worth your time as a software developer to do.
So: up for it? Is a couple minutes of Google time and the direct attention from several software security experts to learn lots and lots about key derivation functions and password hashes worth it to you?
I guess it's hip to have an opinion, but JUST USE BCRYPT. It's secure and available. Don't spend time thinking about it, just use bcrypt, it does everything you want, move on to something more worth your time.
Unless you're some blogger who needs to generate some page views, then pick some obscure topic like how to store password hashes and rake muck.
I wouldn't exactly call "just use bcrypt" a cargo cult. It has had real benefits to the web dev community because it factors down to "don't store it in plain text or use MD5 or something".
Several people working on small projects have already come to me and asked me "what's this bcrypt thing? should I use it here?" I guarantee these folks would have just stored it plaintext otherwise. So I've directly observed the mantra making stuff safer. Win.
I strongly agree that when asked by friends what to use, you should think about which library is going to be easier for them to use, and not try to think through which construction is "better". That this blog post gets the real ordering of betterness wrong is tangential to the real problem, which is that it gets the value proposition of bcrypt wrong.
Microsoft uses PBKDF2 for newer domain cached credentials (DCC2). These password hashes are stored in the registry of Windows clients (laptops and desktops) and allow users to logon when the domain is unavailable. They use 10240 iterations. It's very compute intensive to crack... roughly 330 guesses per second. Great article BTW!
Edit: I only post this to add to the examples of who uses PBKDF2 in addition to what the article lists.
Oddly enough, this article prompted me to setup bcrypt hashing for our unreleased app. I tend to not follow the tide when it comes to people screaming on the internet about something. People get highly emotional about the "next big thing" and defend with every ounce of willpower the decisions they've made. While I understand this mentality, it can make it hard to be objective.
Seeing an article where someone disagrees with the buzzing hive mind is always refreshing for me and made me actually consider, for the first time, that an algorithm aside from sha1(app_salt + user_salt + password) would be a good idea.
I did some research and decided that for PHP, bcrypt is the absolute easiest option to implement. scrypt is too new, and PBKDF2, while administratively accepted, has much less info on using it in PHP than bcrypt does.
So while I ultimately ended up disagreeing with the author, the article was invaluable in the end.
As stated in the article, a popular stance on Hacker News and Stack Overflow is "USE BCRYPT". It's chanted to crypto-noobs and webdevs as a simple-to-use library for password storage that is more secure than MD5/SHA/Whatever hashing, and with built-in salts.
The whole point of this article is to say that, in fact, there are other options.
Bullshit. There is one very-well-written article at Coda Hale's site that says "just use bcrypt", but in discussions of adaptive hashing on HN, people who know what they're talking about are continuously at pains to vouch for PBKDF2 and scrypt (it helps that one of the people who knows what they're talking about on this subject is (a) vocal on HN and (b) designed scrypt).
It irritates me that despite going through the effort of vouching for PBKDF2 and scrypt every time this f'ing topic comes up on HN, people still manage to reduce this issue to another tribal conflict.
I'm one of those crypto-noobs. I'm getting better and studying, and obviously hanging out on Stack Exchange.
I didn't make it a tribal conflict. If anything, the article did... I was summarizing. I must have missed the part where scrypt was mentioned here, but I have seen it called out on SE.
I wasn't aware that anyone suggested using bcrypt for key derivation. The idea, as I understood it, was to avoid writing your own password hashing implementation. Bcrypt is a complete password-hashing implementation, so use it, rather than cobbling something together yourself. This is the standard advice for cryptographic software.
PBKDF2 isn't a password hash: the specification doesn't define a storage format for iterated, salted password hashes. It's not that hard to invent one if you already specialize in writing cryptographic software, but most programmers still shouldn't be doing that. It's just too easy to make mistakes that go unnoticed until it's too late.
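To make that concrete, here's a minimal sketch of the kind of ad-hoc record format a developer ends up inventing around PBKDF2 (the field layout and parameters below are made up for illustration; they are not part of any spec):

    import base64, hashlib, os

    def make_record(password: bytes, iterations: int = 100000) -> str:
        salt = os.urandom(16)
        dk = hashlib.pbkdf2_hmac("sha256", password, salt, iterations)
        b64 = lambda b: base64.b64encode(b).decode()
        return "$pbkdf2-sha256$%d$%s$%s" % (iterations, b64(salt), b64(dk))

    def check_record(password: bytes, record: str) -> bool:
        _, _, iters, salt, dk = record.split("$")
        expected = hashlib.pbkdf2_hmac("sha256", password,
                                       base64.b64decode(salt), int(iters))
        # Real code should use hmac.compare_digest() for a constant-time check.
        return base64.b64decode(dk) == expected

Every one of those small decisions (encoding, delimiters, iteration count, comparison) is a place to get it quietly wrong, which is the point.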
If you insist on using PBKDF2, then I suggest using my PBKDF2.crypt() implementation at https://github.com/dlitz/python-pbkdf2. I'm not a cryptologist, but I'm the maintainer of PyCrypto, so presumably I can be trusted to do a better-than-average job of this sort of thing. If people want, I'll write a proper spec and add SHA256 support with a different algorithm identifier (the current implementation still uses SHA1).
But really, if you need a password hash, just use bcrypt and get back to writing the code that actually provides value to your user base and differentiates you from your competitors. Bcrypt is good enough for now. This advice might change in the future, so do pay attention, but for now, just use bcrypt.
So the author says to use PBKDF2 because it's researched and tested better than bcrypt, and then suggests scrypt as another alternative despite having, from what I can tell, less research done on it than bcrypt.
The crypto space can be intimidating to your average dev, but almost every app needs some sort of protection (at least for user information). I think the author is fair in wanting to push for the "default" to be PBKDF2 instead of bcrypt, but should he really be advocating a less-tested function in the same article?
To be honest, I don't see why anyone would use bcrypt over PBKDF2, if the security of the primitives is of any serious concern. I am new here on HN, so I'm not aware of your arguments on why bcrypt is better than PBKDF2.
As for the "branding" question, would you recommend XTEA or MARS or Threefish over AES for someone in need of a block cipher? Of course not. Standards are not always perfect (and many a flaw has been found in standards), but they are generally beneficial.
PBKDF2 also has the advantage of being modular. It takes in an arbitrary PRF (although HMAC-SHA1 is the usual); maybe HMAC-SHA1 turns out to be poor for the job, just plug in a better PRF (hell, you can plug in a provable PRF that reduces to integer factorization or the elliptic curve discrete log). bcrypt is just bcrypt --- a seemingly not too peer-reviewed modification of an ancient cipher, that is not even recommended anymore.
scrypt is better than both, of course. Provable time-memory hardness is great, and should be made standard.
It takes an arbitrary PRF that is in practice virtually always SHA2, and practically always a well-known cryptographic hash function. If you're going to bank on a cryptographic primitive and you have a choice between "cipher" and "hash function", you pick "cipher".
Also, for lay developers, choosing a crypto construct for its modularity is like choosing a smoke detector because it allows you to use different radiological bits in it. You're not supposed to be messing with those bits. The whole point of the package is not to have random developers changing them.
Also, I want you to note something:
You pick AES not because it's a standard but because it's the product of a contest in which many of the world's best cryptographers competed to design the replacement to DES. That's not what PKCS standards are. A PKCS standard is simply something that survived a standards group discussion.
I see you've adopted Moxie's argument, block ciphers against hashes. Very well. The hard part of designing a good hash function is achieving collision-resistance; one-wayness is easy. In this context, we don't really care about collision-resistance, since HMAC can be a PRF without collision resistance of the underlying hash (there was a recent proof by Mihir Bellare) --- this is why it's still "OK" to use HMAC-MD5, despite MD5 being completely broken otherwise. The adage that we know much more about ciphers than hashes, although still mostly true, is an exaggeration at this point in time, where we have the HAIFA mode, good block ciphers, and soon SHA-3.
I didn't mean to say that Joe the webdev would be choosing the PRF, that's insane. But whoever provides the (library) implementation would have their life facilitated by having modular primitives, instead of having to code another construct from the ground up.
Edit: True, the AES competition was a much higher-profile event. There is an important difference (this is not a rebuttal, but a remark): AES is a cipher, PBKDF is a construction. AES has no proof of security, nor hope of one: it's a purely heuristic security argument. Through models like the random oracle, however, we could show that the PBKDF construction is secure, if H is secure. In such a case, there is not as much need for a competition, unless you're competing for performance or the like. That said, I would love proofs of security (or show the lack thereof) for PBKDF2.
It's not "Moxie's argument". It's also Schneier and Ferguson's argument from Cryptography Engineering:
Even though hash functions are used in almost every system, we know far less about hash functions than we do about block ciphers. This is one of the failures of the cryptographic community. Compared to block ciphers, very little research has been done on hash functions, and there are not many practical proposals to choose from.†
It is also the reason why we are sponsoring contests to replace SHA2, because the research horizon for the current generation of cryptographic hash functions is... ominous.
You're making a noncontroversial statement (ciphers are better studied than hashes) sound like a controversy. It's not really a controversy. And please note: I didn't bring "conservatism" into this discussion; the blog post we're responding to did. If you put it to me directly, I'll say bcrypt is more conservative than PBKDF2/SHA2 (which is what every current PBKDF2 system is going to end up using). But I didn't write a blog post that says "don't use PBKDF2".
† I say this only to make the point that it's not an argument pulled from thin air or from Moxie comments; something that Schneier commits to writing is, I mean to say, very likely to represent conventional wisdom.
You're right, I do recall that passage. Apologies if I sounded patronizing.
I tried to be careful with the wording, precisely to avoid turning a common-sense statement into a controversy. Yes, block ciphers are better studied (they've been under scrutiny since the '70s, hashes only since the '80s). My point was that the gap is much smaller now than (say) when AES was standardized (I wonder, was that passage also in the "Practical Cryptography" 2003 book?).
As a personal note, I doubt 5-10 years from now SHA-2 will have been compromised for password hashing (or HMAC), though. It has been remarked several times during the process that SHA-2 would have made a great SHA-3 candidate, its only major flaw being length extension attacks. The main fear was that SHA-2 would succumb to the same techniques as SHA-1 and MD5; so far, that has not been the case (perhaps because everyone is fighting it off with the SHA-3 candidates).
Of course, I didn't write a blog post about any of this either; I don't blog. Given the choice, I'd give priority to PBKDF2; that's about it. Of course, I'm not a customer-facing developer, so my worries are different.
No problem. I think you probably know this stuff better than I do. In practice, I think PBKDF2 really means "PBKDF2/SHA2", and that in 5 years attacker tools will be most efficient for PBKDF2/SHA2, less efficient for bcrypt, least efficient for scrypt, and won't address PBKDF2/AES-256-CBCMAC at all because nobody will know how to code it.
Moreover, what we (= provable security wonks) really don't understand is collision resistance, or, more generally, cryptographic security notions that are stronger than one-wayness (really UOWHF) and that do involve a secret key.
Trusting blockciphers over hashes based on this high-level argument is suspect, at least in principle. (To be clear, I'm not arguing with your practical recommendations as they seem pragmatically justified.) It could well be the case that the ways we build password hashes from blockciphers appear to be more secure simply because we haven't sufficiently cryptanalyzed blockciphers for the necessary collision-resistance-type properties. Perhaps collision resistance is just too much to hope from any highly-efficient function, and it is only a matter of investing the effort to find collisions in the functions derived from blockciphers. Adding to the suspicion is the fact that blockciphers give us good hashes by coincidence, not design (And formal evidence doesn't help explain the situation - although I see hand-wavy claims about related-key attack resistance being sufficient, in my work I've only found reverse connections).
So are you aware of any deeper reasoning behind the blockciphers-over-hashes argument? Could trusting blockciphers for collision-resistance just be a good usage of security-through-obscurity, because a tremendous amount of effort has been invested in finding collisions in the hash functions but not in the blockciphers?
I think that book is going to need an update soon, as the SHA-3 competition has directed a significant amount of research toward hash function security.
The good news is that SHA-2 and HMAC remain in pretty decent shape (known limitations notwithstanding).
The reason I would prefer it is that PBKDF2 is standardized in an RFC with proper test vectors and a reference implementation in C.
Bcrypt and scrypt have neither. We could debate the relative quality of the cryptographers behind them all day, but realistically bcrypt has had at least one widely-deployed implementation bug that caused a real decrease in strength, and that bug can be traced back to a lack of proper test vectors.
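(As an aside, that's easy to sanity-check today: RFC 6070 publishes PBKDF2-HMAC-SHA1 test vectors, so any implementation can be verified in a couple of lines. The vector below is quoted from memory of the RFC's first case; double-check against the RFC itself.)

    import hashlib

    # RFC 6070, first test vector for PBKDF2-HMAC-SHA1: P="password", S="salt", c=1, dkLen=20.
    dk = hashlib.pbkdf2_hmac("sha1", b"password", b"salt", 1, dklen=20)
    assert dk.hex() == "0c60c80f961f0e71f3a9b524af6012062fe037a6"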
That's because PBKDF2 is the result of a standards process, and bcrypt is the result of the Unix process (bcrypt is the extraction of OpenBSD's password hash function from the late '90s).
While we agree that using any of the three can't be a bad thing, I'd like to give my opinion on why I favor PBKDF2 over bcrypt, and probably even over scrypt, although I admit the "memory-hardness" of the latter makes it superior in principle. But still, my reasons for going with PBKDF2:
It's not just that it's endorsed by RSA; it's actually the NIST recommendation for password hashing, and I find it rather unfair that people in this thread turn that against it!
It's the same argument for why we generally recommend AES over, let's say, Twofish or Serpent. We all agreed here that in crypto it's a good thing to be mainstream, and being recommended by NIST pretty much makes you the Justin Bieber of crypto, doesn't it? Standard algorithms may be poor, true, but being a standard has one important advantage: most of the public scrutiny goes into the standard. Much more money and fame there. So the public is much more likely to learn quickly about a flaw in the standard than about flaws in non-standard algorithms. And that's why I follow standards: even if it's a crappy algorithm, I will know immediately when it's broken and I can react by replacing it right away. The time between an algorithm being broken and that fact becoming public knowledge is potentially longer the less common the algorithm is, and that window is the most dangerous one in my opinion.
I'd like to point out that bcrypt is not the same thing as Blowfish. It piggybacks on Blowfish's key setup, and only piggybacks: on top of that it further extends the original key setup. Blowfish's key setup was probably never designed to do what bcrypt does now, and the last 30 minutes of googling have not turned up any papers on bcrypt cryptanalysis. Compare that to HMAC. Compare that to using PBKDF2 with HMAC-SHA-3 when it's out. I'm not saying that Blowfish or any of its parts are bad, but even if PBKDF2 itself hasn't received more analysis, its building blocks most certainly have received a lot more than bcrypt or scrypt. With SHA-3 on the horizon, the research community knows a lot more about hash algorithms, and there is a lot of research going into these topics. That's why I personally feel safer with a construction that maybe in itself has not received more research than the other two alternatives, but whose building blocks almost certainly have, unless somebody proves me wrong. And when that happens, I'll stand happily corrected and will use the next standard.
"that won't help you if an attack is discovered which mitigates bcrypt's computational complexity."
AND you lose control of your database. Even if I had a magic instant bcrypt reverser, it does me no good if I don't have the hashes. You cannot be compromised by a bcrypt mistake, it would only make your already existing compromise slightly worse.
It's not an argument in favor of bcrypt. It's a rebuttal to an argument against bcrypt.
I have seen several people, whenever this comes up, make the mistake of thinking that your whole app could somehow be compromised by the wrong choice of password hash.
I agree that this is a second-tier security issue, as well.
I only got religion about it after reading 100 threads from people bragging about how they'd designed their custom password hashes with Whirlpool and AES-256; in other words, as a nerd tic.
To be honest, if anybody breaks into your web application it is very unlikely to be because they broke any encryption. SQL injection or man in the middle type attacks are far far more likely.
What you really need out of an encryption package shows up when you're being tested for PCI compliance or facing a legal liability investigation into a breach. You need to be able to say "all of our encryption is done with bcrypt; it's the industry standard and complies with X, Y, and Z".
The debate around secure password storage is sort of orthogonal to initial compromise of a web application.
The reason that it's important (although as others here have noted, less important than primary application security concerns) is what areas of attack are opened up by using insecure password storage "after" an initial compromise.
This could be anything from something as simple as being a nuisance to users of the system (having to send out those "our password database was compromised, and we didn't do a good job of storing it securely, so you should probably change all of your associated passwords" notices), to something much more serious (using the insecurely stored passwords to attack your other systems, for example).
PCI doesn't really care how you're encrypting your data at rest. I once cracked the password storage of an application that was literally just a simple substitution cipher (positionally dependent... for all intents and purposes as secure as a newspaper cryptogram puzzle). That application was PCI compliant.
While "what you need out of an encryption package" might just be the bare minimum of "cover your ass", that's no reason to settle for insecure password storage.
It's also something of a public health issue given how most users re-use some passwords across multiple sites.
"Site A" may be less than careful about security since they perceive their data as being of low value (e.g. "register for a chance to win free movie tickets"). But when they get hacked and their users' passwords cracked, it will likely expose plenty of Facebook and online banking credentials.
Suppose hypothetically you knew everyone using your service used a strong password. Say they all have good password managers that generate 40 character random passwords for them.
Is there then any need to do more than a simple salted hash? (Remember, the hypothesis is that all your users are using strong passwords).
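(Rough arithmetic behind that hypothetical, assuming roughly 94 printable ASCII characters per position: 40 truly random characters carry on the order of 260 bits of entropy, so even a single fast salted hash is well outside brute-force range.)

    import math

    # Assumes ~94 printable ASCII characters per position (illustrative, not a standard).
    bits_per_char = math.log2(94)      # ~6.55 bits
    print(round(40 * bits_per_char))   # ~262 bits of entropy per password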
The difference in that (obviously unrealistic) case is (I think?) between a cost per password of hundreds of thousands of dollars versus high tens of millions of dollars.
PBKDF2 has had longer public exposure, and also features an adjustable CPU work factor (though with a lower theoretical safety-to-compute-time than bcrypt).
scrypt is newer, but features both a CPU and memory work factor (memory-hard algorithm), and is algorithmically superior to both.
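(A minimal sketch of what "adjustable work factor" looks like in practice, using Python's hashlib; the parameter values below are illustrative, not recommendations.)

    import hashlib, os

    salt = os.urandom(16)
    pw = b"hunter2"

    # PBKDF2: the only knob is the iteration count (CPU cost).
    dk1 = hashlib.pbkdf2_hmac("sha256", pw, salt, 200_000, dklen=32)

    # scrypt: n (CPU/memory cost), r (block size), p (parallelism);
    # working memory is roughly 128 * n * r bytes (here ~16 MiB).
    dk2 = hashlib.scrypt(pw, salt=salt, n=2**14, r=8, p=1, dklen=32)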
Why, exactly, do you think PBKDF2 is a more sensible default than bcrypt?
(If scrypt was trivially installable from a gem for Ruby, easy_install for Python, CPAN for Perl, a jar for Java and a .NET assembly for Microsoft, and all those bits came from sources where I didn't have to manually audit them and make personal attestations for their quality when I recommended them to clients, I would immediately stop recommending bcrypt).
Wow, you jumped right on that :) I deleted the comment immediately after posting it because I wasn't really sure I could support that statement. As someone who doesn't know a lot about security algorithms, I'm susceptible to the idea posed in the article that the better-proven algorithm is a safer bet; but on second thought, it seemed a little paranoid, since I haven't seen any positive reason to doubt bcrypt's security. And given that PBKDF2 is an uglier thing to try to recommend, and "just use something" is the most important message, I think the current atmosphere around bcrypt is probably fine.
This makes PBKDFs very different than the general-purpose KDFs studied here. In particular, while passwords can be modeled as a source of keying material, this source has too little entropy to meaningfully apply our extractor approach except when modeling the hash function as a random oracle. Also the slowing-down approach of PBKDFs is undesirable for non-password settings.
But, as mentioned by @cperciva elsewhere in this thread, generating a key and creating a password hash are nearly synonymous. Using HKDF for passwords would be silly, but the more interesting question is: when would you use scrypt for key derivation in a system?
More to the point: what are the tradeoffs you'd consider in choosing one over the other?
(Addressed more to @cperciva...) I'm assuming tarsnap uses scrypt as its actual key derivation function for file encryption and authentication. Why scrypt instead of something else (and I have faith that it's not "not invented here" syndrome)?
Short answer: I think scrypt is an advancement over the class of constructions HKDF belongs to. If you're picking nits about which function to use, use scrypt.
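(For the key-derivation-in-a-system case, a hypothetical sketch of what that looks like with scrypt in Python; the passphrase and parameters are illustrative only and say nothing about what tarsnap actually does.)

    import hashlib, os

    salt = os.urandom(16)   # store alongside the ciphertext
    key = hashlib.scrypt(b"my passphrase", salt=salt, n=2**14, r=8, p=1, dklen=32)
    # 'key' can now be fed to an AEAD cipher from whatever crypto library you use.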
Only as a cryptographic mixing/expansion function. There is no reason to think that scrypt's security would be any less if the PBKDF2 calls were replaced with xor.
Doesn't it provide more protection against possible flaws in Salsa?
Also, somewhat related question -- what if Salsa core in scrypt is replaced with BLAKE core (with fewer rounds than in hash), and SHA-2 in PBKDF2 with BLAKE, thus making it possibly smaller (hardware and lines of code). Will this work well?
Doesn't it provide more protection against possible flaws in Salsa?
In a very theoretical sense, yes. But Salsa would need to be very very broken in order for that to matter (hence the "no reason to think" comment).
what if Salsa core in scrypt is replaced with BLAKE core (with fewer rounds than in hash), and SHA-2 in PBKDF2 with BLAKE, thus making it possibly smaller (hardware and lines of code). Will this work well?
Probably. I proved the security under the random oracle model, but the property I actually need is approximately "can't be iterated fast", which is a far weaker requirement.
"Memory hard" is a serious benefit that scrypt actually has.
"RSA tested and widely used" is subjective, not particularly meaningful, and in some senses erroneous, and so makes a poor case for PBKDF2.
If people want to seriously push for scrypt as a replacement for bcrypt as the "default" function, I'll design and print flags and pennants for the movement. But when people say "use PBKDF2 instead of bcrypt", I think the net effect is to scare people back to salted hashes, and my general response is going to be to poke holes in their arguments.
I just spent the past two days sitting in a workshop on "Special-Purpose Hardware for Attacking Cryptographic Systems", and the most recurring theme across all of the talks was how to deal with the unique memory limitations of GPUs and FPGAs when using them to attack crypto. Bandwidth is the largest one, and specifically the tiny amount of shared memory available on the GPU.
Basically, if you're forced to use "local memory" (which has a huge cost in transaction time), the amount of operations per cycle you can perform goes way down, which in some cases can be the difference between an attack taking "2 years" and "until the heat death of the universe".
Just to add to that a little: when comparing these functions, the absolute time consumed isn't what matters; all of them can be tuned to take whatever amount of time is acceptable on the defender's general-purpose hardware.
The key factor is minimizing the relative advantage that an attacker with focused resources (such as dedicated hardware) is able to gain over the defender.
By consuming lots of RAM (more than will fit on a single silicon die), a memory-hard function effectively lets the defender leverage the economies of scale that went into optimizing the memory bus of his commodity server. This gives an attacker who can produce his own chips (or, to a lesser extent, use FPGAs or GPUs) much less of an advantage over the defender.
While CPU speeds, transistor densities, and cache sizes have gone through the roof, the 60-80 ns memory latency of off-chip DRAM has been nearly constant over the last few decades of computing.
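(The back-of-the-envelope version, assuming the usual 128 * N * r estimate for scrypt's working set: crank the parameters and the table simply cannot live in on-die memory.)

    # scrypt keeps a table of N blocks of 128 * r bytes each.
    n, r = 2**17, 8
    print(128 * n * r / 2**20)   # ~128 MiB: far larger than any on-die cache or GPU shared memory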
For one, it's something the government uses for its own crypto.
Unlike some crypto standards, PBKDF2 is just too simple and too user-configurable to hide a meaningful hole in. Given that PBKDF1 was found to be less than ideal and deprecated in favor of the better PBKDF2, it would be a very risky proposition to attempt to weaken it in a way that gave a meaningful advantage to one party over the other.