There's interesting technical content here, but it suffers from its alarmist tone.
The MD5 hash function is broken, that is true. However, TLS doesn't use MD5 in its raw form; it uses variants of HMAC-MD5, which applies the hash function twice, with two different padding constants with high Hamming distances (put differently, it tries to synthesize two distinct hash functions, MD5-IPAD and MD5-OPAD, and apply them both). Nobody would recommend HMAC-MD5 for use in a new system, but it has not been broken.
RC4 is horribly broken, and is horribly broken in ways that are meaningful to TLS. But the magnitude of RC4's brokenness wasn't appreciated until last year, and up until then, RC4 was a common recommendation for resolving both the SSL3/TLS1.0 BEAST attack and the TLS "Lucky 13" M-t-E attack. That's because RC4 is the only widely-supported stream cipher in TLS. Moreover, RC4 was considered the most computationally efficient way to get TLS deployed, which 5-6 years ago might have been make-or-break for some TLS deployments.
You should worry about RC4 in TLS --- but not that much: the attack is noisy and extremely time consuming. You should not be alarmed by MD5 in TLS, although getting rid of it is one of many good reasons to drive adoption of TLS 1.2.
There is another technical angle; RC4 is usually quite a lot less CPU intensive than the alternatives available. Not using RC4 can easily mean stuttering video playback, greatly diminished battery life, and even lock ups. Very few users are open to accepting that issues like that are "better" for them.
Many RC4 deprecation efforts have faced rollback in the face of issues like this; especially on hard to fix embedded devices (think TVs, Cars and phones) with comparatively weak CPUs.
There are two solutions: use hardware with the AES-NI instruction set, which makes AES blazing fast, or alternatively use a better stream cipher like Salsa20. On my machine, which has an Intel i5-3570k, Salsa20 is about 25% faster (edit: than RC4)
Unfortunately, neither solution is easy: only the very latest chips have AES-NI instructions, and not many clients support Salsa20 yet (OpenSSL does not, for example, and it powers a lot of SSL stuff).
Anyway, TLS is in a tough spot. It's such a widely adopted standard, with so many implementations, that making radical changes is exceptionally difficult. AES-NI leaves the standard mostly alone but requires new(ish) hardware, but on the other hand, implementing newer, faster primitives (like Salsa20) requires essentially turning the massive boat that is TLS.
There are no easy solutions, at least as far as I can see.
Oh, yes - definitely. I just picked Salsa20 as an example because I already had benchmark data for my machine, and I am familiar with it.
But even TLS 1.2 won't help because 1.2 doesn't include ciphers that are screaming-fast without hardware-acceleration. AES-GCM is faster than AES-128-CBC/HMAC-SHA1, but Salsa20-256/HMAC-SHA1 is still twice as fast on my machine. Now if the AES-NI instruction set is available, then AES-GCM handily beats everything by a large margin. (Of course, using hardware acceleration, AES-128-CBC/HMAC-SHA1 is marginally faster than Salsa20-256/HMAC-SHA1, again on my machine.)
The ultimate point is that, without the AES-NI instruction set, new ciphers are just about the only way to get really good TLS performance.
Does AES-GCM with AES-NI and PCLMULQDQ beat Salsa20+Poly1305 with lots of sessions? I know it's got excellent cycles/byte for a single session, but TLS implementations also need agility.
I'm afraid you've exhausted the limits of my precomputed benchmarks. :)
I don't know the answer offhand, but I would suspect that hardware-accelerated AES-GCM would win. It certainly does in single-threaded, "one-session"-esque tests, and the margin of its victory makes me think that hardware-accelerated GCM would be hard to beat by anything.
On my machine, a single thread/core running nothing but AES-GCM can encrypt/decrypt 8192 byte blocks of data at 1.32 GiB/s (this is using OpenSSL's benchmarking feature). Yes, that's gigabytes, not gigabits. It's literally faster than IO for my SSD. (Salsa20, without a MAC, can do the same at about 0.64 GiB/s.)
When I told OpenSSL to use four threads in parallel, it ranked at 5.01 GiB/s, which is absolutely crazy.
That said, beyond a general leaning towards AES-GCM (simply because it is so fast with hardware acceleration), I don't have any hard data on which would be the victor. But I may just construct some benchmarks to test that out, because it's an interesting question.
(disclaimer, I'm one of the Salsa20 in TLS draft authors).
Note that the suggestion of using Salsa20 is to replace RC4 not only to get better performance, but because RC4 is broken (as you know).
Salsa20 (and ChaCha) can be implemented on constrained devices and reach RC4 like performance. On modern architectures the algorithms word based functionality better utilise the HW than RC4 and can reach better performance.
Yes, AES with HW-support such as AES-NI can provide really good performance too. But then we _only_ have AES (and DES/3DES). Do we want to reduce SSL, TLS to a single symmetric encryption primitive? And no stream ciphers?
There is at least an RFC out for a TLS stream cipher using Salsa20, if I'm remembering current events correctly. Of course publication of a spec will precede implementation in hardware and software by some years, I would imagine.
On the contrary, "It has not been broken" is exactly what I would expect a programmer to say.
If the security of an algorithm is weakened, then it's important to evaluate the use of the algorithm and make efforts to implement stronger security now. You should feel fortunate that you even get the time to move to something better before all hell breaks loose.
This is the same kind of thinking I hear daily when people say things like, "Just use bcrypt" without thinking about the consequences.
The tendency for programmers to think of security in a nihilistic way continues to boggle my mind. I don't think the article suffers from an alarmist tone. I think it's correct to look at something shitty and call it shit.
I have no idea what this comment is even trying to say. I have no idea what MD5 has to do with bcrypt, and I have no idea what "nihilism" has to do with the fact that HMAC-MD5 isn't broken. We didn't just "discover" that MD5 was weak; Paul Kocher knew it was weak when SSL 3.0 was standardized back in 1996, which is why the SSL 3.0 handshake PRF uses both SHA-1 and MD5.
Yours is the kind of comment anyone can write without knowing anything whatsoever about cryptography, so I'm wary of going into more detail.
Apologies. Perhaps I'm being a master of the obvious here, so I'll restate more simply:
When people try to implement security without actually thinking about what the system is doing, it creates weaknesses in the security, not due to algorithmic weaknesses, but because the organization and the engineering discipline for the future is compromised. Thus, while "just use bcrypt" or "just use HMAC-MD5" might work today, the organization doesn't have the mind to update it when it finally does break.
This is exactly what happened (and is still happening) today after MD5 was broken.
This is the same comment with fewer words, and while I appreciate the concision, it doesn't make any more sense to me.
Bcrypt isn't broken or even weakened.
HMAC-MD5 isn't broken.
HMAC-MD5 and bcrypt are unrelated.
Nobody is ignoring the problem of MD5; in fact, suspicion about MD5 animates the very first secure SSL specification we have, from almost 20 years ago. Nobody is saying "just use HMAC-MD5".
I think what he is saying is that many individuals and organizations will not learn the fundamentals behind why X is broken, they only learn "X is broken use Y instead."
They instead should learn that Y is also potentially broken in a given circumstance - and maybe that doesn't apply to my current situation but I need a review process to check that it still doesn't apply to me at a given time in the future.
For someone designing a cryptography application, this understanding should be very deep. I don't think it needs to be as deep for someone who is configuring their Apache server and just needs to know what ciphers to enable and which ones to prefer. In this case it is best to follow an industry best practice based on the type of data being sent over the wire and the compatibility/performance required by the clients/users. Then schedule an annual or quarterly review of those choices to make sure they don't go out of date and keep an eye on security bulletins in case one of them is severely broken.
What he's saying is these blanket statements "just use X" is what is broken. Sometime ago it was "just use md5" and we're still suffering through the fallout of that long after md5 has been shown to be broken. Now we're pointing everyone in another direction and at some point that will be broken too. His point is that we need to educate people on the reasons why one algorithm is better than another for certain security concerns rather than relying on blanket catch-all declarations.
And now I'd like to say for the third time that no, there was no "just use MD5" meme in cryptography or in software development, and if TLS is an illustration of anything, it's of not simply leaning on MD5. Once again: the TLS protocol itself is not vulnerable because of MD5, and it's not vulnerable because its designers and implementors both knew about and accounted for the weaknesses of MD5.
The author took the opposite lesson from TLS than the one that it actually demonstrates, and the commenter above is harping on that broken lesson.
As a computer scientist, it's a joy to discover when you're wrong about things. So I'm enjoying being on the wrong side of the discussion for once, because I'm learning lots.
Thank you for your replies tptacek, I've learned much from this discussion. If I could edit my top comment, I would.
Doubt about this exact case, but I've seen MD5 being (ab)used in a really weird ways, which I attributed to mindless "oh, I'll just use MD5 here, heard it's good for security!"
One particular case I remember was use of md5(md5(md5(unix_timestamp()))) to generate "secure" session tokens.
> This is the same kind of thinking I hear daily when people say things like, "Just use bcrypt" without thinking about the consequences.
Sorry to say, but "just use bcrypt" is currently the right three word statement that you can use if anybody is asking "I'd like to hash a password, and I don't want to learn all of crypto before I do." Bcrypt is currently among the algorithms that are hard to break if used correctly, deployed widely, has wide support in deployed languages and frameworks and it's fairly simple to use. There's little room for major fuckups here.
There are algorithms that are harder to break (scrypt) or an official standard (PBKDF2), but seriously, bcrypt is currently good enough. Sure, it's always better to read and learn, but sometimes people just have to get things done and I'd rather see them use bcrypt than sha1 or unsalted md5.
> The tendency for programmers to think of security in a nihilistic way continues to boggle my mind.
tptacek is appears to be too modest to say it himself, so I'll go ahead and point it out: He's not "just a programmer", he's a well-respected computer security and vulnerability researcher.
This isn't to say that you should ever simply take his word for stuff, but rather that you are on one hand preaching to the choir, and on the other that you are probably not considering practical effects on security design that he has to wrangle with all the time.
For instance, it's probably a bad idea to hop immediately from one weakened (not even broken) cryptosystem to The New Hotness just because flaws are uncovered, especially for those doing this without thinking of the consequences. For every theoretical security bug you may fix while doing the conversion, you may very well introduce two much practical security bugs.
Cargo cults are bad wherever they are encountered, even when the cult involves something as seemingly as innocuous as "Cryptosystem $FOO has been weakened, time to jump ship".
The parent seems to be implying (taking other comments into account) that people "cargo cult"-ing on ideas like "just use bcrypt" might work now, but it will become a liability in future when bcrypt is weakened or broken (making it more difficult to get people to switch to the next standard practice).
That's odd, I don't think anyone is taking the advice to mean "use bcrypt for ever", I'd imagine that everyone understands that we use it because it's good enough for the foreseeable future.
Thanks for your insightful comments and representing the paradox of the internet: Alarmist tones are required to generate interest in a story, but taking the time to explain the cognitively challenging details is instantly off-putting in an article. The average internet users is becoming so stupid, even the simplified versions are TL;DR. I've come to rely on Hacker News comments in ways I used to rely on Slashdot stories. The SNR at Reddit is overwhelming.
Minor nit: note that a collision after the first MD5 application will result in an HMAC-MD5 collision. The dissimilarity between the two MD5 applications isn't for collision resistance. (The second MD5 application is still mandatory for preventing length extension attacks, and makes key recovery attacks more difficult, among other things.)
Rather, the increased collision resistance comes from the fact that the 64-byte keyed padding puts the MD5 context in a state unknown to the attacker before any of the attacker's data touches the MD5 state. As long as the HMAC key has at least 128 bits of entropy, all possible values of the 128-bit MD5 internal state are nearly equally likely. This makes it much more difficult for an attacker to produce collisions.
I'm sort of confused, because TLS 1.1/1.2 support across browsers looks to be quite poor at the moment. Especially when given a large number of IE visitors that will likely upgrade to IE11 (first version that will have TLS 1.1/1.2 enabled by default) sometime around the year 2016.
With that said, I was under the impression that sites will need to support TLS 1.0 for a good long while, and if that is indeed the case, would they not be better off using RC4? From my understanding, the RC4 attacks seemed less practical than attacks against the implementation of CBC mode in SSL 3.0 / TLS 1.0?
Yes, the installed base is going to keep TLS 1.0 and the legacy SSL block cipher construction in deployment for a long time.
Yes, smart people (among them AGL) have said that the RC4 attack is less practical than the M-t-E timing attack on the SSL CBC ciphers. (By the way, it would be great if we could start putting the blame on M-t-E instead of CBC; the vulnerability isn't in CBC per se. CBC is fine; M-t-E is proven not to be.)
But:
* The timing attack also has remediations (see AGL's famous NSS patch) which don't change the protocol.
* The timing attack is fundamentally unlikely to get more powerful; it's exploiting a very simple, well-understood problem.
* Work on exploiting the RC4 attack is in its infancy, and there are multiple ways the attack could get both fundamentally more powerful and more efficiently implemented.
* There are no software-only fixes to the RC4 problem that don't break the protocol; RC4 is fundamentally and irrevocably broken.
But moving from AES and SHA to RC4 and MD5 should have made a few bells go if simply because it is bad engineering and shows lack of knowledge.
If we want to move away from MD5 and RC4 we first must start deprecating their usage wherever we can. Removing suites in SSL/TLS that uses them is a pretty simple step. Moving _from_ good suites _to_ these suites is the totally wrong way to go.
The MD5 hash function is broken, that is true. However, TLS doesn't use MD5 in its raw form; it uses variants of HMAC-MD5, which applies the hash function twice, with two different padding constants with high Hamming distances (put differently, it tries to synthesize two distinct hash functions, MD5-IPAD and MD5-OPAD, and apply them both). Nobody would recommend HMAC-MD5 for use in a new system, but it has not been broken.
RC4 is horribly broken, and is horribly broken in ways that are meaningful to TLS. But the magnitude of RC4's brokenness wasn't appreciated until last year, and up until then, RC4 was a common recommendation for resolving both the SSL3/TLS1.0 BEAST attack and the TLS "Lucky 13" M-t-E attack. That's because RC4 is the only widely-supported stream cipher in TLS. Moreover, RC4 was considered the most computationally efficient way to get TLS deployed, which 5-6 years ago might have been make-or-break for some TLS deployments.
You should worry about RC4 in TLS --- but not that much: the attack is noisy and extremely time consuming. You should not be alarmed by MD5 in TLS, although getting rid of it is one of many good reasons to drive adoption of TLS 1.2.