H(m || k) also insecure, you really should use HMAC (root.org)
50 points by NateLawson 2944 days ago | 35 comments

Decoder ring:

  H(k || m) --> SHA1("secret-key" + "name=bob,withdraw=$200")

  H(m || k) --> SHA1("name=bob,withdraw=$200" + "secret-key")

  HMAC(k, m) --> 
      SHA1(("secret-key" XOR ("\x5c" * 10)) + 
        SHA1(("secret-key" XOR ("\x36" * 10)) + "name=bob,withdraw=$200"))

All three functions have the same purpose. If you and the bank share "secret-key", and nobody else can guess it, then only you and the bank can compute the function. So when you send the message "name=bob,withdraw=$200", you tack the result of the function to the end, and the bank can prove the message came from you.

The first two functions are what a normal, reasonable developer could be expected to come up with given SHA1 as a tool. Combine the key with the message and hash them; you can't work back from the hash to the message, so the hash doesn't reveal the key, and the hash will be wildly different if even a single bit of the key is different.

The first example is totally, fatally broken. SHA1 (and MD5 and many other hashes) are machines that share a common design called Merkle-Damgård, which means that they process messages in block-length chunks, and use those blocks to permute an internal state. The output of SHA1 is the "final" contents of that state. But there's nothing that actually "finalizes" the SHA1 state; if you see the SHA1 value on the wire, you can keep cranking the Merkle-Damgård machine with additional data. That means you can mint new messages with arbitrary data tacked to the end that will appear to be authentic. This attack is incredibly easy to carry out; it takes ~20 lines of Ruby code.
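Here's a sketch of that length-extension attack in Python rather than Ruby: a from-scratch SHA1 whose internal state and claimed message length can be seeded, applied to the thread's example key and message (the key length is assumed guessable by the attacker).

```python
import hashlib
import struct

def _rol(x, n):
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def _pad(length):
    # Standard Merkle-Damgard padding for a message of `length` bytes
    return b"\x80" + b"\x00" * ((55 - length) % 64) + struct.pack(">Q", length * 8)

def sha1(data, state=None, total_len=None):
    # `state` lets an attacker resume hashing from a known digest;
    # `total_len` is the length the final padding should claim was hashed.
    h = list(state) if state else [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    msg = data + _pad(total_len if total_len is not None else len(data))
    for i in range(0, len(msg), 64):
        w = list(struct.unpack(">16I", msg[i:i + 64]))
        for t in range(16, 80):
            w.append(_rol(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
        a, b, c, d, e = h
        for t in range(80):
            if t < 20:
                f, k = (b & c) | (~b & d), 0x5A827999
            elif t < 40:
                f, k = b ^ c ^ d, 0x6ED9EBA1
            elif t < 60:
                f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:
                f, k = b ^ c ^ d, 0xCA62C1D6
            a, b, c, d, e = (_rol(a, 5) + f + e + k + w[t]) & 0xFFFFFFFF, a, _rol(b, 30), c, d
        h = [(x + y) & 0xFFFFFFFF for x, y in zip(h, (a, b, c, d, e))]
    return struct.pack(">5I", *h)

# Sanity check against the standard library
assert sha1(b"abc") == hashlib.sha1(b"abc").digest()

# The attack: forge a MAC for m || glue || suffix knowing only H(k || m)
key = b"secret-key"                  # unknown to the attacker
msg = b"name=bob,withdraw=$200"
mac = sha1(key + msg)                # the H(k || m) "MAC" seen on the wire

klen = 10                            # attacker's guess at len(key)
glue = _pad(klen + len(msg))         # the padding the server's hash used internally
suffix = b",withdraw=$9000"
forged_msg = msg + glue + suffix
forged_mac = sha1(suffix,
                  state=struct.unpack(">5I", mac),
                  total_len=klen + len(msg) + len(glue) + len(suffix))

assert sha1(key + forged_msg) == forged_mac   # the server would accept this
```

Note the attacker never touches the key: the published digest *is* the Merkle-Damgård state, so hashing the suffix from that state reproduces exactly what the server computes over the extended message.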

The second example is also broken, and it's the subject of this blog post. If you tack the key on after the message, you can't keep driving the hash with data, because a secret you can't guess goes on the end of it. Colin and Nate are arguing about that downthread.

The final example is HMAC. People talk a lot about HMAC without really knowing what it is, but there you have it. What makes HMAC so much more secure than the other two "normal programmer" examples is that the key and the message are hashed in separate steps, each step made mathematically distinct from the other by the opad (0x5c) and ipad (0x36) constants.

Now you know, and knowing is 1/10000th of the battle.

Decoder ring

Thanks for this -- after so long in the field I forget how much we're speaking in our own language until someone points it out. :-)

One small correction:

  HMAC(k, m) --> 
      SHA1(("secret-key" XOR ("\x5c" * 10)) + 
        SHA1(("secret-key" XOR ("\x36" * 10)) + "name=bob,withdraw=$200"))

The opad and ipad are block-length, and the key is 0-padded; so this should be

  HMAC(k, m) --> 
      SHA1((("secret-key" + "\x00" * 54) XOR ("\x5c" * 64)) + 
        SHA1((("secret-key" + "\x00" * 54) XOR ("\x36" * 64)) + "name=bob,withdraw=$200"))

Doh. Thanks. Like I always say: I should not be trusted to implement crypto.

This is a harmless mistake, for two reasons:

1. It doesn't actually weaken the MAC.

2. You'd never manage to ship that bug anyway, since all of your test cases would fail.

I'm not saying that you should be trusted to implement crypto, but I don't think this particular error is a good reason for such a lack of trust. :-)

Wow, thanks for the summary.. I wish that all security experts had your skill at explaining things in layman's terms.

Very simple, short explanation. Reminds me that shorter is usually better.

To clarify: H(m || k) is insecure if you use a broken hash function. HMAC remains secure unless the underlying hash fails catastrophically. If you use a non-broken hash -- e.g., SHA256, which is what people should be using in 99.9% of situations -- then the attacks Nate describes are harmless.

That said: Use HMAC! It's like having good brakes on your car -- if you're lucky you can get away without it, but it's an easy enough thing to do right that it's not worth taking risks.

I disagree: it is insecure even if you use a perfect hash function. This is because the construction itself is vulnerable to collisions, whereas HMAC requires 2nd-preimage attacks (as far as we know today, modulo any advances in the multicollision work of Joux and Kelsey for long messages, YMMV).

In other words, if you use a secure 128-bit hash algorithm (not MD5!) with HMAC, you get approximately 128-bit security. If you use the same hash with "secret suffix", you get 64-bit security. Quite a loss of security for no gain.

Additionally, hashes often fall to collision attacks long before 2nd-preimage attacks. So you've maximally exposed yourself to the leading edge of attacks. Not a good plan.

In summary, today the following MAC approaches for short messages (say, cookies) have approximately this level of security:

    SHA256-HMAC: 2^256
    SHA256-SecretSuffix: 2^128

    SHA1-HMAC: 2^160
    SHA1-SecretSuffix: 2^53 (versus 2^80 originally)

    MD5-HMAC: 2^128
    MD5-SecretSuffix: 30 seconds on a laptop
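The collision-vs-preimage gap behind these numbers is easy to see at toy scale: a 32-bit hash falls to a birthday search in roughly 2^16 random tries, even though a brute-force preimage would take ~2^32. Here truncated SHA1 stands in for a hypothetical "perfect" 32-bit hash.

```python
import hashlib
import os

def h32(m):
    # Toy 32-bit hash: the first 4 bytes of SHA1
    return hashlib.sha1(m).digest()[:4]

# Birthday search: ~2^16 random messages usually suffice for a collision
seen = {}
while True:
    m = os.urandom(16)
    d = h32(m)
    if d in seen:
        m1, m2 = seen[d], m
        break
    seen[d] = m

assert m1 != m2 and h32(m1) == h32(m2)
```

In a Merkle-Damgård hash, a collision between two block-aligned messages survives any common suffix, including a secret key appended at the end; that is why H(m || k)'s strength collapses to the hash's collision resistance, 2^(n/2), rather than its preimage resistance.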

I wrote: If you use a non-broken hash ... then the attacks Nate describes are harmless.

You wrote: the construction itself is vulnerable to collisions.

These two statements do not contradict each other. A hash function for which it is feasible to find collisions is considered by the cryptographic community to be broken.

hashes often fall to collision attacks long before 2nd-preimage attacks. So you've maximally exposed yourself to the leading edge of attacks

As I said -- H(m || k) is insecure with a broken hash function, while HMAC(k, m) remains secure unless the underlying hash fails catastrophically. Someone who uses SHA256(m || k) as a MAC is taking an unnecessary risk, but it's not an immediate security flaw.

To be clear, we are using the cryptographic definition of "broken": a cipher or construction with known attacks faster than the best theoretical strength.

We both agree that the highest theoretical strength for collision resistance in a hash function is the birthday paradox, which takes sqrt(2^n) work. So if a collision attack is found that is faster, the hash algorithm is considered "broken".

But the H(m || k) construction is broken compared to HMAC by the same definition, no matter how secure the underlying hash function. Out of the gate with a perfect hash, you're reducing the strength from 2^n to 2^(n/2).

If a 2nd-preimage attack was discovered for SHA256 that reduced the effort from 2^256 to 2^128, we would both agree that it is broken for use with HMAC (among other things). This, of course, says nothing about the practicality of a 2^128 search, but cryptographically, SHA256-HMAC would be broken.

So why would you call a construction that always has the same effect as such a momentous discovery not "broken"?

Ok, I think we agree on that: "broken" doesn't necessarily mean it is feasible to attack.

Another example I thought of is truncated HMAC. This is where the output of an HMAC is truncated to meet some space constraint. For example, you might have an embedded device that is now using SHA256-HMAC but you have a legacy need to store the result in 160 bits (due to a hard-coded packet size designed originally for SHA-1). Let's compare this to SHA256-SecretSuffix with no truncation.

Comparison of space used in packet versus the work required to attack each approach:

  SHA256-HMAC-Truncated160: 160 bits, 2^160
  SHA256-SecretSuffix: 256 bits, 2^128

Thus even though it uses less space, a truncated SHA256-HMAC is more secure than a full size SHA256-SecretSuffix authenticator. I think this also illustrates that SHA256-SecretSuffix as a construction is broken.
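That truncation scheme is just slicing the tag; a sketch in Python with the thread's example key and message (verification recomputes, re-truncates, and uses a constant-time compare):

```python
import hashlib
import hmac

key, msg = b"secret-key", b"name=bob,withdraw=$200"

# Truncate a SHA256-HMAC tag to 160 bits to fit a legacy SHA-1-sized field
tag = hmac.new(key, msg, hashlib.sha256).digest()[:20]
assert len(tag) * 8 == 160

# Verification recomputes and re-truncates, comparing in constant time
expected = hmac.new(key, msg, hashlib.sha256).digest()[:20]
assert hmac.compare_digest(tag, expected)
```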

There's lots of good reasons to not use H(m || k) -- I'm not going to argue with you there.

I just don't want people to say "oh my god, Nate says that what I'm doing is insecure, I'd better fix it ASAP" -- and possibly introduce far worse problems in their panic -- if what they're doing is dumb but not necessarily insecure... hence my pointing out that the insecurity of H(m || k) depends on whether the hash function is broken.

So basically you're saying that if they're using SHA-256 H(m||k), they shouldn't worry, and you want to make that clear.

I'll tell you what: I bet you $50 that we can find 5 examples of H(m || k) MACs on Google code search, and that none of them will use an "acceptably" secure hash function.

I'd really like to bet that you simply will - not - be - able - to - find an H(m || k) MAC that uses a hash function that is survivable in that configuration, but proving that would take too long. I think I can win that other bet inside of an hour.

If there are no real-world systems that could possibly be secure in the H(m || k) configuration, I'm left wondering why you're sticking up for it, other than to be pedantic. Being pedantic about security on Hacker News is my job, Colin, not yours.

I know you feel like you're just being academically precise in this conversation, but what you're really doing is creating a subtext that SHA1 is survivable in H(m || k) configurations, and that this is really just an example of "how broken MD5 is".

So why would you call a construction that always has the same effect as such a momentous discovery not "broken"?

I didn't say that H(m || k) wasn't broken. I said that it wasn't necessarily insecure. :-)

I'm confused about why you're even trying to make this argument, Colin. Name a real system that uses H(m||k) that you trust. H(x||y) MAC schemes are a hallmark of DIY crypto, and DIY crypto systems are the ones that get broken first.

I'm confused about why you're even trying to make this argument, Colin.

Because I'm an academic who believes in education and getting details right. And also because I don't want to have people panic unnecessarily.

Name a real system that uses H(m||k) that you trust. H(x||y) MAC schemes are a hallmark of DIY crypto, and DIY crypto systems are the ones that get broken first.

I wouldn't say that it's a hallmark of DIY crypto so much as it is a hallmark of not understanding crypto -- like advertising the use of AES-256 without mentioning the block cipher mode, or treating SSL as a silver bullet.

You're right that I wouldn't trust a system which used SHA256(m || k) as a MAC; but if I was hired to fix such a system, that would be the last thing I'd fix. I'd start by looking for the security flaws.

H(m || k) was a construction that was used in the early 1990's. No protocol standards specified it after HMAC appeared in 1996. SHA256 appeared in 2002.

If you're using a 2002 heavyweight hash function with an early 1990's MAC construction, you are an idiot. I can use such strong words because no one does this. There are no standards anywhere that specified this exact construction because HMAC was invented long before SHA2. If anyone does this in some private toolkit, it's the perfect example why unreviewed, proprietary crypto designs are horrible.

However, the much more likely scenario if you use H(m || k) is that you're using MD5 (or possibly, but less likely, SHA-1). You have a legacy application that inherited an early 1990's design. This construction was specified for SNMP in 1991, for example.

So while you have a technical point, your argument is detrimental to real, deployed systems. I'm extremely concerned that your straw man argument based on SHA256-SecretSuffix, which is extremely unlikely to exist anywhere, would be taken as supporting MD5- or SHA1-SecretSuffix which I know exist in various toolkits. I've seen it when talking to some clients.

Can you make it clear that both MD5- and SHA1-SecretSuffix are not only cryptographically broken, but have feasible attacks with today's hardware?

In other words, for the people who are downmodding Nate because he used the word "idiot" and they don't know who he is because his karma score is 3 digits (and who are concurrently modding me and Colin up because our karma scores are 5 digits because we are losers):

No matter what Colin is saying, Nate has actually found systems that used SHA1(m||k) and MD5(m||k), and those systems are breakable with off-the-shelf hardware. Which is why he wrote this blog post we're commenting on.

Theory's nice, but Colin's theories are just theories, and in the harsh light of real-world experience, his explanations of those theories are misleading. No secure system uses H(m||k). And H(m||k) is exactly the kind of thing smart developers will come up with if you give them a hash function and say "make a MAC out of this".

So "zero" is the number of systems you trust that use hand-hacked H(m||k) MACs, and like Nate you agree that using a hand-hacked H(m||k) MAC is evidence of not understanding how MACs work.

I like how you picked two other examples of "bad crypto", one which is clearly an example of bad crypto (using AES and not knowing how to actually apply the function to real data), and the other of which is a personal crusade of yours and, in fact, an example of really good crypto. But you're an academic, and you're all about getting the details right. Right. ;)

I like how you picked two other examples of "bad crypto"...

I picked those just for you. :-)

the other of which is a personal crusade of yours and, in fact, an example of really good crypto

Treating anything as a silver bullet is a dumb thing to do. I only mentioned SSL because, well, it's something which is commonly used that way.

(I was originally going to use PGP as an example, but figured that you'd appreciate me using SSL instead.)


I originally asked what's wrong with SSL, until I realized it's been replaced by TLS, so I guess there was something wrong with it.

But when you talked about the perils of relying on "SSL", did you mean precisely that, or is there a problem with TLS too?

There's nothing cryptographically wrong with SSL (TLS and SSL are mostly synonyms in modern systems that disallow SSLv1 and SSLv2). SSL is actually an example of a cryptosystem done very right, which has survived and adapted to 15+ years of attacks. If you read the protocol, you will see lots of places that seem clunky, and almost all of them are countermeasures to older attacks.

The fact that a protocol with the same objectives as SSL that you wrote today would be far simpler and more straightforward than SSL is evidence of how important it is to use SSL, because you are not going to think of all the attacks that people like Paul Kocher thought of when they reviewed and modified the protocol.

As Thomas said, I was treating "SSL" and "TLS" as synonyms.

There are two major problems with SSL:

1. It's very complex and has a lot of optional components an attacker can select. This means that (a) it's very likely that SSL implementations will have bugs; (b) it's very likely that those bugs won't be triggered in common use, and will thus tend to remain unfixed; and (c) if an attacker can find such a bug, he can probably trigger it.

2. It relies on a very large number of single points of failure -- namely, Certificate Authorities. CAs screw up all the time, and an attacker only needs to find one screwy CA in order to pretend to be whoever he wants.

In many situations SSL is the best option available; but that doesn't mean that it's a good option, only that it's the least bad option.

Thanks for the information!

Actually I think I've read something about the different levels of SSL. I suppose it's possible to somehow limit the available modes, to avoid exposing vulnerabilities.

an attacker only needs to find one screwy CA in order to pretend to be whoever he wants.

- What's a screwy CA, and how does the pretending work? .. If it's possible to describe roughly.

This might be a stupid question, but why not hash a message then encrypt the hash with a symmetric cipher instead of trying to mix the key material into the pre-hash value? Is that just much slower?

why not hash a message then encrypt the hash with a symmetric cipher

Provided that you're using a cipher with a block size equal to the hash length, this can work. This approach is usually avoided for two reasons:

1. It requires two building blocks (hash + cipher) instead of just one (hash).

2. Under the "standard model" of cryptography, you can prove things about HMAC(k, m) which you can't prove about Encrypt_k(Hash(m)) -- for practical purposes these are utterly irrelevant, but cryptographers are academics and like to have cute theoretical results.

And of course you must avoid using this construction yourself because it involves you designing your own cryptographic construction.

You are actually more likely to have your system broken by making a new construction out of well-known pieces like AES and SHA1 than you would be if you made your own crypto primitives out of whole cloth. And you'd never think of designing your own block cipher.

Hey, I like that argument! I think I'll repeat it ad nauseam from now on.

What you just designed is an AE cipher mode (with "additional data", meaning in this case "all the data"). Your AE mode uses SHA1 instead of a block-cipher permutation.

The reason nobody would do this in the real world is that there are already AE cipher modes that are much faster than this, and those modes are NIST standards that have been beaten to death.

Interesting sidenote: one of the most famous AE modes is OCB, which is Phil Rogaway's patented mode. Much of the literature on other AE modes is based on coming up with patent-free alternatives to OCB --- they include EAX, GCM, and CCM. One force that drives real scrutiny into these modes --- there are proposed attacks on some constructions of CCM, for instance --- is the fact that a really good mode is patented, and the people who own the patent are particularly incented to analyze the unencumbered competitors.

Anyways, one of the things that makes the "real" AE modes faster than your proposed bespoke mode is that they can take advantage of the block cipher engine to "fingerprint" the message, instead of having to use an entire hash function. An inaccurate but maybe illuminating way to understand that is to imagine that AES gets extra security with lower overhead because it's designed to integrate a key, where SHA1 isn't designed to take a key and has to incur extra overhead to meet its safety margin.

Thanks, this makes a lot of sense.

Just to be clear, I wouldn't attempt to do this at home (so to speak). I was merely curious about the theoretical differences between the alternate approaches.

As Colin mentioned, if your cipher block size is the same as the hash output size, there are no obvious attacks.

In practice your options to accomplish this are limited, since any cryptographic hash under serious consideration (i.e., not MD5) will produce an output longer than the block size of common symmetric ciphers (64 or 128 bits).

That means you have to choose a mode of operation to chain two or more blocks together and since you are constructing a MAC you are presumably going to use an unauthenticated one (or else it's turtles all the way down, right?).

What are some popular options to choose from?

ECB: You've reduced the collision resistance of your hash function to the block size of the cipher since you can now attack the output blocks independently.

CTR: Intercept message without delivering it. Extract keystream directly if the plaintext is known. Forge any arbitrary MAC. Game over.

CBC: Make sure not to leak any information about the difference between padding failure and MAC failure or else you have a padding oracle which can do many magical things, including possibly forge an arbitrary MAC.

Building a secure MAC construction in the way you've described is harder than it looks, mainly because symmetric ciphers themselves are unsafe to use without authentication.

The WS-Security standard uses SHA1(m || k), so beware if attacks on SHA1 become feasible.

[EDIT: Yes, I had that backwards originally.]

Systems that use SHA1(k || m), or any other Merkle-Damgård hash in H(k || m), are trivially broken in 20 lines of Ruby code.

There are attacks against SHA1 that make SHA1(m || k) unsafe, which is a point I think Nate is really trying to get across here.

You might have the terms in that construction backwards, but I'm not going to dig through OASIS standards for more than the 5 minutes I just spent trying, because very few real-world systems use WS-Security.

Yes, I had it backwards; it's SHA1(m || k). Fixed now.

I'm in the unfortunate position of maintaining a real-world system that uses WS-Security, and this is not the only potential problem I've found. For example, the "m" part of the digested string is composed of parts that are concatenated without delimiters (just like the old AWS authentication digest); one implementation was vulnerable to message forgery because it could be convinced to split the same string in more than one way.
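The delimiter flaw described above can be sketched with hypothetical field values: two different field splits concatenate to the same byte string, so a delimiter-free MAC cannot tell the messages apart.

```python
import hashlib

def naive_mac(key, fields):
    # Flawed: fields are concatenated with no delimiter before hashing
    return hashlib.sha1(key + "".join(fields).encode()).hexdigest()

key = b"secret-key"
# Two different field lists canonicalize to the same byte string...
benign = naive_mac(key, ["user=bob", "&admin=false"])
forged = naive_mac(key, ["user=bob&admin=", "false"])
assert benign == forged   # ...so a valid MAC on one authenticates the other
```

The usual fix is to include an unambiguous encoding of field boundaries (length prefixes, or an escaped delimiter) in the MAC'd string, so that every byte string corresponds to exactly one field split.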

That attack you described is a canonicalization flaw, and I think it's one of the least talked-about, most prevalent failures in MAC-only security systems. Colin found a good canonicalization flaw in Amazon AWS.

Great post. Thanks for doing the work to get the word out.
