H(k || m) --> SHA1("secret-key" + "name=bob,withdraw=$200")
H(m || k) --> SHA1("name=bob,withdraw=$200" + "secret-key")
HMAC(k, m) -->
    SHA1(("secret-key" XOR ("\x5c" * 10)) +
        SHA1(("secret-key" XOR ("\x36" * 10)) + "name=bob,withdraw=$200"))
The first two functions are what a normal, reasonable developer could be expected to come up with given SHA1 as a tool. Combine the key with the message and hash them; you can't work back from the hash to the message, so the hash doesn't reveal the key, and the hash will be wildly different if even a single bit of the key is different.
The first example is totally, fatally broken. SHA1 (and MD5 and many other hashes) are machines that share a common design called Merkle-Damgård, which means that they process messages in block-length chunks and use those blocks to permute an internal state. The output of SHA1 is the "final" contents of that state. But there's nothing that actually "finalizes" the SHA1 state; if you see a SHA1 value on the wire, you can keep cranking the Merkle-Damgård machine with additional data. That means you can mint new messages, with arbitrary data tacked onto the end, that will appear to be authentic. This attack is incredibly easy to carry out; it takes ~20 lines of Ruby code.
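The extension attack is easy to demonstrate without real SHA1 machinery. Here's a sketch in Python using a hypothetical toy Merkle-Damgård hash (`toy_hash`, `mac`, and all the constants are invented for illustration, not real primitives); the block-chaining shape is the same thing that makes the real attack work:

```python
import struct

BLOCK = 16  # bytes per block in our toy Merkle-Damgard hash

def _compress(state, block):
    # Toy compression function: mixes each byte into a 64-bit state.
    for b in block:
        state = ((state * 31) ^ b) & 0xFFFFFFFFFFFFFFFF
    return state

def _pad(total_len):
    # Merkle-Damgard strengthening: 0x80, zeros, then the message length.
    zeros = (BLOCK - (total_len + 9) % BLOCK) % BLOCK
    return b"\x80" + b"\x00" * zeros + struct.pack(">Q", total_len)

def toy_hash(msg, state=0x0123456789ABCDEF, done_len=0):
    # `state` and `done_len` let us resume hashing mid-stream --
    # exactly the capability the extension attack abuses.
    data = msg + _pad(done_len + len(msg))
    for i in range(0, len(data), BLOCK):
        state = _compress(state, data[i:i + BLOCK])
    return state

def mac(key, msg):
    return toy_hash(key + msg)  # the broken H(k || m) construction

key = b"secret-key"              # unknown to the attacker
msg = b"name=bob,withdraw=$200"
tag = mac(key, msg)

# The attacker knows only tag, msg, and len(key) (easily guessed).
# The glue padding aligns key||msg to a block boundary, so the server's
# internal hash state at that point is exactly `tag` -- and the attacker
# can keep cranking the machine from there.
glue = _pad(len(key) + len(msg))
forged_msg = msg + glue + b",withdraw=$9999"
forged_tag = toy_hash(b",withdraw=$9999", state=tag,
                      done_len=len(key) + len(msg) + len(glue))
assert forged_tag == mac(key, forged_msg)  # server accepts the forgery
```

The forged message contains some padding garbage in the middle, but for many formats (cookies, query strings) the appended fields still parse and win.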
The second example is also broken, and it's the subject of this blog post. If you tack the key on after the message, you can't keep driving the hash with data, because a secret you can't guess goes on the end of it. Colin and Nate are arguing about that downthread.
The final example is HMAC. People talk a lot about HMAC without really knowing what it is, but there you have it. What makes HMAC so much more secure than the two "normal programmer" constructions is that the key and the message are hashed in separate steps, each made mathematically distinct from the other by the opad (0x5c) and ipad (0x36) constants.
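For the record, this is what actually using HMAC looks like in practice -- a sketch with Python's stdlib `hmac` module and the example key and message from above:

```python
import hashlib
import hmac

key = b"secret-key"
msg = b"name=bob,withdraw=$200"

# Compute the MAC; the library handles key padding and the ipad/opad steps.
tag = hmac.new(key, msg, hashlib.sha256).hexdigest()

# When verifying, use a constant-time comparison so you don't leak
# the correct tag one byte at a time through timing.
expected = hmac.new(key, msg, hashlib.sha256).hexdigest()
assert hmac.compare_digest(tag, expected)
```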
Now you know, and knowing is 1/10000th of the battle.
Thanks for this -- after so long in the field I forget how much we're speaking in our own language until someone points it out. :-)
One small correction:
HMAC(k, m) -->
    SHA1(("secret-key" XOR ("\x5c" * 10)) +
        SHA1(("secret-key" XOR ("\x36" * 10)) + "name=bob,withdraw=$200"))

should be

HMAC(k, m) -->
    SHA1((("secret-key" + "\x00" * 54) XOR ("\x5c" * 64)) +
        SHA1((("secret-key" + "\x00" * 54) XOR ("\x36" * 64)) + "name=bob,withdraw=$200"))
Fortunately, the mistake is harmless here:
1. It doesn't actually weaken the MAC.
2. You'd never manage to ship that bug anyway, since all of your test cases would fail.
I'm not saying that you should be trusted to implement crypto, but I don't think this particular error is a good reason for such a lack of trust. :-)
That said: Use HMAC! It's like having good brakes on your car -- if you're lucky you can get away without it, but it's an easy enough thing to do right that it's not worth taking risks.
In other words, if you use a secure 128-bit hash algorithm (not MD5!) with HMAC, you get approximately 128-bit security. If you use the same hash with "secret suffix", you get 64-bit security. Quite a loss of security for no gain.
Additionally, hashes often fall to collision attacks long before 2nd-preimage attacks. So you've maximally exposed yourself to the leading edge of attacks. Not a good plan.
In summary, today the following MAC approaches for short messages (say, cookies) have approximately this level of security:
SHA1-SecretSuffix: 2^53 (versus 2^80 originally)
MD5-SecretSuffix: 30 seconds on a laptop
You wrote: the construction itself is vulnerable to collisions.
These two statements do not contradict each other. A hash function for which it is feasible to find collisions is considered by the cryptographic community to be broken.
hashes often fall to collision attacks long before 2nd-preimage attacks. So you've maximally exposed yourself to the leading edge of attacks
As I said -- H(m || k) is insecure with a broken hash function, while HMAC(k, m) remains secure unless the underlying hash fails catastrophically. Someone who uses SHA256(m || k) as a MAC is taking an unnecessary risk, but it's not an immediate security flaw.
We both agree that the highest theoretical strength for collision resistance in a hash function is the birthday paradox, which takes sqrt(2^n) work. So if a collision attack is found that is faster, the hash algorithm is considered "broken".
But the H(m || k) construction is broken compared to HMAC by the same definition, no matter how secure the underlying hash function. Out of the gate with a perfect hash, you're reducing the strength from 2^n to 2^(n/2).
If a 2nd-preimage attack were discovered for SHA256 that reduced the effort from 2^256 to 2^128, we would both agree that it was broken for use with HMAC (among other things). This, of course, says nothing about the practicality of a 2^128 search, but cryptographically, SHA256-HMAC would be broken.
So why would you call a construction that always has the same effect as such a momentous discovery not "broken"?
Another example I thought of is truncated HMAC. This is where the output of an HMAC is truncated to meet some space constraint. For example, you might have an embedded device that is now using SHA256-HMAC but you have a legacy need to store the result in 160 bits (due to a hard-coded packet size designed originally for SHA-1). Let's compare this to SHA256-SecretSuffix with no truncation.
Comparison of space used in packet versus the work required to attack each approach:
SHA256-HMAC-Truncated160: 160 bits, 2^160
SHA256-SecretSuffix: 256 bits, 2^128
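The truncation itself is one line. A Python sketch (the key and message are just placeholder values; truncating HMAC output is a standard, sanctioned practice, unlike secret-suffix):

```python
import hashlib
import hmac

key = b"secret-key"
msg = b"name=bob,withdraw=$200"

full = hmac.new(key, msg, hashlib.sha256).digest()  # 32 bytes (256 bits)
tag = full[:20]  # keep 160 bits to fit the legacy SHA-1-sized field

assert len(tag) == 20
```

You give up output bits but keep the HMAC structure, which is exactly why the truncated version above still beats full-width secret-suffix in the table.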
I just don't want people to say "oh my god, Nate says that what I'm doing is insecure, I'd better fix it ASAP" -- and possibly introduce far worse problems in their panic -- if what they're doing is dumb but not necessarily insecure... hence my pointing out that the insecurity of H(m || k) depends on whether the hash function is broken.
I'll tell you what: I bet you $50 that we can find 5 examples of H(m || k) MACs on Google code search, and that none of them will use an "acceptably" secure hash function.
I'd really like to bet that you simply will - not - be - able - to - find an H(m || k) MAC that uses a hash function that is survivable in that configuration, but proving that would take too long. I think I can win that other bet inside of an hour.
If there are no real-world systems that could possibly be secure in the H(m || k) configuration, I'm left wondering why you're sticking up for it, other than to be pedantic. Being pedantic about security on Hacker News is my job, Colin, not yours.
I know you feel like you're just being academically precise in this conversation, but what you're really doing is creating a subtext that SHA1 is survivable in H(m || k) configurations, and that this is really just an example of "how broken MD5 is".
I didn't say that H(m || k) wasn't broken. I said that it wasn't necessarily insecure. :-)
Because I'm an academic who believes in education and getting details right. And also because I don't want to have people panic unnecessarily.
Name a real system that uses H(m||k) that you trust. H(x||y) MAC schemes are a hallmark of DIY crypto, and DIY crypto systems are the ones that get broken first.
I wouldn't say that it's a hallmark of DIY crypto so much as it is a hallmark of not understanding crypto -- like advertising the use of AES-256 without mentioning the block cipher mode, or treating SSL as a silver bullet.
You're right that I wouldn't trust a system which used SHA256(m || k) as a MAC; but if I was hired to fix such a system, that would be the last thing I'd fix. I'd start by looking for the security flaws.
If you're using a 2002 heavyweight hash function with an early-1990s MAC construction, you are an idiot. I can use such strong words because no one does this. No standard anywhere specifies this exact construction, because HMAC was invented long before SHA2. If anyone does this in some private toolkit, it's the perfect example of why unreviewed, proprietary crypto designs are horrible.
However, the much more likely scenario if you use H(m || k) is that you're using MD5 (or possibly, but less likely, SHA-1) in a legacy application that inherited an early-1990s design. This construction was specified for SNMP in 1991, for example.
So while you have a technical point, your argument is detrimental to real, deployed systems. I'm extremely concerned that your straw-man argument, based on a SHA256-SecretSuffix construction that is unlikely to exist anywhere, will be taken as supporting the MD5- and SHA1-SecretSuffix constructions which I know exist in various toolkits -- I've seen them when talking to clients.
Can you make it clear that both MD5- and SHA1-SecretSuffix are not only cryptographically broken, but have feasible attacks with today's hardware?
No matter what Colin is saying, Nate has actually found systems that used SHA1(m||k) and MD5(m||k), and those systems are breakable with off-the-shelf hardware. Which is why he wrote this blog post we're commenting on.
Theory's nice, but Colin's theories are just theories, and in the harsh light of real-world experience, his explanations of those theories are misleading. No secure system uses H(m||k). And H(m||k) is exactly the kind of thing smart developers will come up with if you give them a hash function and say "make a MAC out of this".
I like how you picked two other examples of "bad crypto", one which is clearly an example of bad crypto (using AES and not knowing how to actually apply the function to real data), and the other of which is a personal crusade of yours and, in fact, an example of really good crypto. But you're an academic, and you're all about getting the details right. Right. ;)
I picked those just for you. :-)
the other of which is a personal crusade of yours and, in fact, an example of really good crypto
Treating anything as a silver bullet is a dumb thing to do. I only mentioned SSL because, well, it's something which is commonly used that way.
(I was originally going to use PGP as an example, but figured that you'd appreciate me using SSL instead.)
I originally asked what's wrong with SSL, until I realized it's been replaced by TLS, so I guess there was something wrong with it.
But when you talked about the perils of relying on "SSL", did you mean precisely that, or is there a problem with TLS too?
The fact that a protocol with the same objectives as SSL that you wrote today would be far simpler and more straightforward than SSL is evidence of how important it is to use SSL, because you are not going to think of all the attacks that people like Paul Kocher thought of when they reviewed and modified the protocol.
There are two major problems with SSL:
1. It's very complex and has a lot of optional components an attacker can select. This means that (a) it's very likely that SSL implementations will have bugs; (b) it's very likely that those bugs won't be triggered in common use, and will thus tend to remain unfixed; and (c) if an attacker can find such a bug, he can probably trigger it.
2. It relies on a very large number of single points of failure -- namely, Certificate Authorities. CAs screw up all the time, and an attacker only needs to find one screwy CA in order to pretend to be whoever he wants.
In many situations SSL is the best option available; but that doesn't mean that it's a good option, only that it's the least bad option.
Actually I think I've read something about the different levels of SSL. I suppose it's possible to somehow limit the available modes, to avoid exposing vulnerabilities.
an attacker only needs to find one screwy CA in order to pretend to be whoever he wants.
- What's a screwy CA, and how does the pretending work? .. If it's possible to describe roughly.
Provided that you're using a cipher with a block size equal to the hash length, this can work. This approach is usually avoided for two reasons:
1. It requires two building blocks (hash + cipher) instead of just one (hash).
2. Under the "standard model" of cryptography, you can prove things about HMAC(k, m) which you can't prove about Encrypt_k(Hash(m)) -- for practical purposes these are utterly irrelevant, but cryptographers are academics and like to have cute theoretical results.
You are actually more likely to have your system broken by making a new construction out of well-known pieces like AES and SHA1 than you would be if you made your own crypto primitives out of whole cloth. And you'd never think of designing your own block cipher.
Hey, I like that argument! I think I'll repeat it ad nauseam from now on.
The reason nobody would do this in the real world is that there are already AE cipher modes that are much faster than this, and those modes are NIST standards that have been beaten to death.
Interesting sidenote: one of the most famous AE modes is OCB, which is Phil Rogaway's patented mode. Much of the literature of other AE modes is based on coming up with patent-free alternatives to OCB --- they include EAX, GCM, and CCM. One force that drives real scrutiny into these modes --- there are proposed attacks on some constructions of CCM, for instance --- is the fact that a really good mode is patented, and the people who own the patent are particularly incented to analyze the unencumbered competitors.
Anyways, one of the things that makes the "real" AE modes faster than your proposed bespoke mode is that they can take advantage of the block cipher engine to "fingerprint" the message, instead of having to use an entire hash function. An inaccurate but maybe illuminating way to understand that is to imagine that AES gets extra security with lower overhead because it's designed to integrate a key, where SHA1 isn't designed to take a key and has to incur extra overhead to meet its safety margin.
Just to be clear, I wouldn't attempt to do this at home (so to speak). I was merely curious about the theoretical differences between the alternate approaches.
In practice your options to accomplish this are limited, since any cryptographic hash under serious consideration (i.e., not MD5) will produce an output longer than the block size of common symmetric ciphers (64 or 128 bits).
That means you have to choose a mode of operation to chain two or more blocks together and since you are constructing a MAC you are presumably going to use an unauthenticated one (or else it's turtles all the way down, right?).
What are some popular options to choose from?
ECB: You've reduced the collision resistance of your hash function to the block size of the cipher since you can now attack the output blocks independently.
CTR: Intercept message without delivering it. Extract keystream directly if the plaintext is known. Forge any arbitrary MAC. Game over.
CBC: Make sure not to leak any information about the difference between padding failure and MAC failure or else you have a padding oracle which can do many magical things, including possibly forge an arbitrary MAC.
Building a secure MAC construction in the way you've described is harder than it looks, mainly because symmetric ciphers themselves are unsafe to use without authentication.
[EDIT: Yes, I had that backwards originally.]
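The CTR failure mode is easy to demonstrate. A Python sketch with a hypothetical hash-based keystream standing in for AES-CTR (all the names and the fixed nonce are invented for illustration; the attack only needs the same keystream to be used again, which is exactly what intercepting a message without delivering it buys you):

```python
import hashlib

def keystream(key, nonce, n):
    # Hypothetical CTR-style keystream generator (stand-in for AES-CTR,
    # since the stdlib has no AES): hash of key || nonce || counter.
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def ctr_mac(key, nonce, msg):
    # The broken construction: Encrypt_k(Hash(m)) with a stream cipher.
    h = hashlib.sha256(msg).digest()
    return bytes(a ^ b for a, b in zip(h, keystream(key, nonce, len(h))))

key, nonce = b"secret-key", b"fixed-nonce"
msg = b"name=bob,withdraw=$200"
tag = ctr_mac(key, nonce, msg)

# Attacker: the plaintext is known, so tag XOR Hash(msg) IS the keystream.
ks = bytes(a ^ b for a, b in zip(tag, hashlib.sha256(msg).digest()))

# Now forge a valid MAC for any message of the attacker's choosing.
forged = b"name=eve,withdraw=$9999"
forged_tag = bytes(a ^ b for a, b in zip(hashlib.sha256(forged).digest(), ks))
assert forged_tag == ctr_mac(key, nonce, forged)  # game over
```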
There are attacks against SHA1 that make SHA1(m || k) unsafe, which is a point I think Nate is really trying to get across here.
You might have the terms in that construction backwards, but I'm not going to dig through OASIS standards for more than the 5 minutes I just spent trying, because very few real-world systems use WS-Security.
I'm in the unfortunate position of maintaining a real-world system that uses WS-Security, and this is not the only potential problem I've found. For example, the "m" part of the digested string is composed of parts that are concatenated without delimiters (just like the old AWS authentication digest); one implementation was vulnerable to message forgery because it could be convinced to split the same string in more than one way.
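That splitting bug is easy to reproduce in a few lines. A Python sketch with a hypothetical `sign` function that concatenates fields with no delimiter (the field names are invented; the point is that two different field splits produce the same byte string and therefore the same tag):

```python
import hashlib
import hmac

key = b"secret-key"

def sign(parts):
    # Hypothetical signer: concatenates fields with no delimiter before
    # MACing -- the same flaw as the old AWS authentication digest.
    return hmac.new(key, b"".join(parts), hashlib.sha256).hexdigest()

# Two different messages, same concatenation, same "valid" signature:
a = sign([b"user=bob", b"&role=user"])
b = sign([b"user=bob&role=", b"user"])
assert a == b
```

Even a perfectly good MAC can't save you if distinct messages serialize to identical bytes; fields need unambiguous encoding (length prefixes or escaped delimiters).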