

H(m || k) also insecure, you really should use HMAC - NateLawson
http://rdist.root.org/2009/10/29/stop-using-unsafe-keyed-hashes-use-hmac/

======
tptacek
Decoder ring:

    
    
      H(k || m) --> SHA1("secret-key" + "name=bob,withdraw=$200")
    
      H(m || k) --> SHA1("name=bob,withdraw=$200" + "secret-key")
    
      HMAC(k, m) --> 
        SHA1(
          ("secret-key" XOR ("\x5c" * 10)) + 
          SHA1(
            ("secret-key" XOR ("\x36" * 10)) + "name=bob,withdraw=$200"))
    

All three functions have the same purpose. If you and the bank share "secret-
key", and nobody else can guess it, then only you and the bank can compute the
function. So when you send the message "name=bob,withdraw=$200", you tack the
result of the function to the end, and the bank can prove the message came
from you.

The first two functions are what a normal, reasonable developer could be
expected to come up with given SHA1 as a tool. Combine the key with the
message and hash them; you can't work back from the hash to the message, so
the hash doesn't reveal the key, and the hash will be wildly different if even
a single bit of the key is different.

The first example is totally, fatally broken. SHA1 (and MD5 and many other
hashes) are machines that share a common design called Merkle-Damgaard, which
means that they process messages in block-length chunks, and use those blocks
to permute an internal state. The output of SHA1 is the "final" contents of that
state. But there's nothing that actually "finalizes" the SHA1 state; if you
see the SHA1 value on the wire, you can keep cranking the Merkle-Damgaard
machine with additional data. That means you can mint new messages with
arbitrary data tacked to the end that will appear to be authentic. This attack
is incredibly easy to carry out; it takes ~20 lines of Ruby code.
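
A rough Python sketch of that length-extension attack, for illustration only. It is longer than the ~20 lines mentioned above because `hashlib` does not expose the internal state, so the SHA1 compression function is reimplemented here. The helper names (`sha1_padding`, `sha1_compress`, `length_extend`) are made up for this sketch, and it assumes the attacker knows or guesses the key length, which can be brute-forced by trying each plausible length.

```python
import hashlib
import struct

def _rotl(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def sha1_padding(msg_len):
    """Padding SHA1 appends to a message of msg_len bytes."""
    zeros = (55 - msg_len) % 64
    return b"\x80" + b"\x00" * zeros + struct.pack(">Q", msg_len * 8)

def sha1_compress(state, block):
    """Mix one 64-byte block into the five-word SHA1 state."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 80):
        w.append(_rotl(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1))
    a, b, c, d, e = state
    for i in range(80):
        if i < 20:
            f, k = (b & c) | (~b & d), 0x5A827999
        elif i < 40:
            f, k = b ^ c ^ d, 0x6ED9EBA1
        elif i < 60:
            f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
        else:
            f, k = b ^ c ^ d, 0xCA62C1D6
        temp = (_rotl(a, 5) + f + e + k + w[i]) & 0xFFFFFFFF
        a, b, c, d, e = temp, a, _rotl(b, 30), c, d
    return tuple((x + y) & 0xFFFFFFFF for x, y in zip(state, (a, b, c, d, e)))

def length_extend(known_tag, known_msg, key_len, suffix):
    """Forge (message, tag) for SHA1(key + message) without knowing the key."""
    # The padding the bank's SHA1 already consumed becomes part of the forged message.
    glue = sha1_padding(key_len + len(known_msg))
    forged_msg = known_msg + glue + suffix
    # Resume hashing from the state that the published tag reveals.
    state = struct.unpack(">5I", known_tag)
    data = suffix + sha1_padding(key_len + len(forged_msg))
    for i in range(0, len(data), 64):
        state = sha1_compress(state, data[i:i + 64])
    return forged_msg, struct.pack(">5I", *state)

key = b"secret-key"                      # known only to the bank and the customer
msg = b"name=bob,withdraw=$200"
tag = hashlib.sha1(key + msg).digest()   # what the attacker sees on the wire

forged_msg, forged_tag = length_extend(tag, msg, len(key), b",withdraw=$1000000")
# The bank recomputes SHA1(key + forged_msg) and compares it with the forged tag.
print(hashlib.sha1(key + forged_msg).digest() == forged_tag)
```

The forged message contains the original padding bytes in the middle, so whether it does damage depends on how the receiver parses it.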

The second example is also broken, and it's the subject of this blog post. If
you tack the key on after the message, you can't keep driving the hash with
data, because a secret you can't guess goes on the end of it. Colin and Nate
are arguing about that downthread.

The final example is HMAC. People talk a lot about HMAC without really knowing
what it is, but there you have it. What makes HMAC so much more secure than
the other two "normal programmer" examples is that the key and the message are
hashed in separate steps, made mathematically distinct from one another by the
opad (0x5c) and ipad (0x36).
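
In practice nobody needs to assemble those steps by hand. A minimal sketch using Python's standard `hmac` module follows; the `sign` and `verify` helpers are hypothetical names, and SHA-256 is an assumption here rather than the SHA1 used in the decoder ring.

```python
import hashlib
import hmac

def sign(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(key, message), tag)

key = b"secret-key"
message = b"name=bob,withdraw=$200"
tag = sign(key, message)

print(verify(key, message, tag))                     # True
print(verify(key, b"name=bob,withdraw=$2000", tag))  # False
```

`hmac.compare_digest` is used instead of `==` so the comparison time does not depend on how many leading bytes match.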

Now you know, and knowing is 1/10000th of the battle.

~~~
cperciva
_Decoder ring_

Thanks for this -- after so long in the field I forget how much we're speaking
in our own language until someone points it out. :-)

One small correction:

    
    
      HMAC(k, m) --> 
        SHA1(
          ("secret-key" XOR ("\x5c" * 10)) + 
          SHA1(
            ("secret-key" XOR ("\x36" * 10)) + "name=bob,withdraw=$200"))
    

The opad and ipad are block-length, and the key is 0-padded; so this should be

    
    
      HMAC(k, m) --> 
        SHA1(
          (("secret-key" + "\x00" * 54) XOR ("\x5c" * 64)) + 
          SHA1(
            (("secret-key" + "\x00" * 54) XOR ("\x36" * 64)) + "name=bob,withdraw=$200"))
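
As a sanity check, the corrected construction can be written out in Python and compared against the standard library. This is only a sketch: `manual_hmac_sha1` and `xor_bytes` are names invented here, and the step that hashes keys longer than one block comes from RFC 2104 rather than from the notation above.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA1 block length in bytes

def xor_bytes(data: bytes, pad_byte: int) -> bytes:
    """XOR every byte of data with a single repeated pad byte."""
    return bytes(b ^ pad_byte for b in data)

def manual_hmac_sha1(key: bytes, message: bytes) -> bytes:
    # RFC 2104: keys longer than a block are hashed first.
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha1(key).digest()
    # Zero-pad the key to the block length, as in the correction above.
    padded = key.ljust(BLOCK_SIZE, b"\x00")
    inner = hashlib.sha1(xor_bytes(padded, 0x36) + message).digest()
    return hashlib.sha1(xor_bytes(padded, 0x5C) + inner).digest()

key = b"secret-key"
message = b"name=bob,withdraw=$200"
expected = hmac.new(key, message, hashlib.sha1).digest()
print(manual_hmac_sha1(key, message) == expected)  # True
```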

~~~
tptacek
Doh. Thanks. Like I always say: I should not be trusted to implement crypto.

~~~
cperciva
This is a harmless mistake, for two reasons:

1\. It doesn't actually weaken the MAC.

2\. You'd never manage to ship that bug anyway, since all of your test cases
would fail.

I'm not saying that you should be trusted to implement crypto, but I don't
think this particular error is a good reason for such a lack of trust. :-)

------
cperciva
To clarify: H(m || k) is insecure _if you use a broken hash function_. HMAC
remains secure unless the underlying hash fails catastrophically. If you use a
non-broken hash -- e.g., SHA256, which is what people should be using in 99.9%
of situations -- then the attacks Nate describes are harmless.

That said: Use HMAC! It's like having good brakes on your car -- if you're
lucky you can get away without it, but it's an easy enough thing to do right
that it's not worth taking risks.

~~~
NateLawson
I disagree: it is insecure even if you use a perfect hash function. This is
because the construction itself is vulnerable to collisions, whereas HMAC
requires 2nd-preimage attacks (as far as we know today, modulo any advances in
the multicollision work of Joux and Kelsey for long messages, YMMV).

In other words, if you use a secure 128-bit hash algorithm (not MD5!) with
HMAC, you get approximately 128-bit security. If you use the same hash with
"secret suffix", you get 64-bit security. Quite a loss of security for no
gain.
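
A toy sketch of why the secret-suffix construction only needs a collision. Everything here is an assumption made for demonstration: `toy_hash` is a made-up Merkle-Damgaard hash with a 24-bit internal state (so a collision takes only a few thousand tries), not any real algorithm. The collision search never touches the key, yet the two colliding messages end up with the same MAC under any key.

```python
import hashlib
from itertools import count

BLOCK = 16        # toy block size in bytes
STATE_BYTES = 3   # 24-bit chaining value, so a collision costs roughly 2^12 tries

def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: truncated SHA-256 of state + block."""
    return hashlib.sha256(state + block).digest()[:STATE_BYTES]

def toy_hash(msg: bytes) -> bytes:
    """Toy Merkle-Damgaard hash with length padding (NOT a real hash)."""
    pad_zeros = (-len(msg) - 9) % BLOCK
    padded = msg + b"\x80" + b"\x00" * pad_zeros + len(msg).to_bytes(8, "big")
    state = b"\x00" * STATE_BYTES
    for i in range(0, len(padded), BLOCK):
        state = compress(state, padded[i:i + BLOCK])
    return state

def find_block_collision():
    """Offline search for two one-block messages with equal chaining values."""
    iv = b"\x00" * STATE_BYTES
    seen = {}
    for i in count():
        m = i.to_bytes(BLOCK, "big")
        s = compress(iv, m)
        if s in seen:
            return seen[s], m
        seen[s] = m

def secret_suffix_mac(key: bytes, msg: bytes) -> bytes:
    """The H(m || k) construction under discussion."""
    return toy_hash(msg + key)

m1, m2 = find_block_collision()   # computed without knowing the key
key = b"secret-key"
print(m1 != m2)
print(secret_suffix_mac(key, m1) == secret_suffix_mac(key, m2))
```

Once the internal states match after the message blocks, the key and padding are processed identically, so a tag obtained for `m1` is also valid for `m2`. With HMAC the key enters before the message, so the attacker cannot run this search offline.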

Additionally, hashes often fall to collision attacks long before 2nd-preimage
attacks. So you've maximally exposed yourself to the leading edge of attacks.
Not a good plan.

In summary, today the following MAC approaches for short messages (say,
cookies) have approximately this level of security:

    
    
        SHA256-HMAC: 2^256
        SHA256-SecretSuffix: 2^128
    
        SHA1-HMAC: 2^160
        SHA1-SecretSuffix: 2^53 (versus 2^80 originally)
    
        MD5-HMAC: 2^128
        MD5-SecretSuffix: 30 seconds on a laptop

~~~
cperciva
I wrote: If you use a non-broken hash ... then the attacks Nate describes are
harmless.

You wrote: the construction itself is vulnerable to collisions.

These two statements do not contradict each other. A hash function for which
it is feasible to find collisions is considered by the cryptographic community
to be broken.

 _hashes often fall to collision attacks long before 2nd-preimage attacks. So
you've maximally exposed yourself to the leading edge of attacks_

As I said -- H(m || k) is insecure with a broken hash function, while HMAC(k,
m) remains secure unless the underlying hash fails catastrophically. Someone
who uses SHA256(m || k) as a MAC is taking an unnecessary risk, but it's not
an immediate security flaw.

~~~
NateLawson
To be clear, we are using the cryptographic definition of "broken": a cipher
or construction with known attacks faster than the best theoretical strength.

We both agree that the highest theoretical strength for collision resistance
in a hash function is the birthday paradox, which takes sqrt(2^n) work. So if
a collision attack is found that is faster, the hash algorithm is considered
"broken".

But the H(m || k) construction is broken compared to HMAC by the same
definition, no matter how secure the underlying hash function. Out of the gate
with a perfect hash, you're reducing the strength from 2^n to 2^(n/2).

If a 2nd-preimage attack were discovered for SHA256 that reduced the effort
from 2^256 to 2^128, we would both agree that it is broken for use with HMAC
(among other things). This, of course, says nothing about the practicality of
a 2^128 search, but cryptographically, SHA256-HMAC would be broken.

So why would you call a construction that always has the same effect as such a
momentous discovery not "broken"?

~~~
NateLawson
Ok, I think we agree on that: "broken" doesn't necessarily mean it is feasible
to attack.

Another example I thought of is truncated HMAC. This is where the output of an
HMAC is truncated to meet some space constraint. For example, you might have
an embedded device that is now using SHA256-HMAC but you have a legacy need to
store the result in 160 bits (due to a hard-coded packet size designed
originally for SHA-1). Let's compare this to SHA256-SecretSuffix with no
truncation.

Comparison of space used in packet versus the work required to attack each
approach:

    
    
      SHA256-HMAC-Truncated160: 160 bits, 2^160
      SHA256-SecretSuffix: 256 bits, 2^128
    

Thus even though it uses less space, a truncated SHA256-HMAC is more secure
than a full size SHA256-SecretSuffix authenticator. I think this also
illustrates that SHA256-SecretSuffix as a construction is broken.
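
For reference, truncating an HMAC is a one-line change. A minimal sketch, assuming a 160-bit (20-byte) field as in the example above; `truncated_tag` and `verify_truncated` are hypothetical helper names.

```python
import hashlib
import hmac

TAG_BYTES = 20  # 160 bits, the legacy field size from the example

def truncated_tag(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA256, keeping only the leftmost TAG_BYTES bytes."""
    return hmac.new(key, message, hashlib.sha256).digest()[:TAG_BYTES]

def verify_truncated(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the truncated tag and compare in constant time."""
    return hmac.compare_digest(truncated_tag(key, message), tag)

key = b"secret-key"
message = b"name=bob,withdraw=$200"
tag = truncated_tag(key, message)
print(len(tag) * 8)                          # 160
print(verify_truncated(key, message, tag))   # True
```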

~~~
cperciva
There are lots of good reasons not to use H(m || k) -- I'm not going to argue
with you there.

I just don't want people to say "oh my god, Nate says that what I'm doing is
insecure, I'd better fix it ASAP" -- and possibly introduce far worse problems
in their panic -- if what they're doing is dumb but not necessarily
insecure... hence my pointing out that the insecurity of H(m || k) depends on
whether the hash function is broken.

~~~
tptacek
So basically you're saying that if they're using SHA-256 H(m||k), they
shouldn't worry, and you want to make that clear.

I'll tell you what: I bet you $50 that we can find 5 examples of H(m || k)
MACs on Google code search, and that none of them will use an "acceptably"
secure hash function.

I'd really like to bet that you simply will - not - be - able - to - find an
H(m || k) MAC that uses a hash function that is survivable in that
configuration, but proving that would take too long. I think I can win that
other bet inside of an hour.

If there are no real-world systems that could possibly be secure in the H(m ||
k) configuration, I'm left wondering why you're sticking up for it, other than
to be pedantic. Being pedantic about security on Hacker News is my job, Colin,
not yours.

I know you feel like you're just being academically precise in this
conversation, but what you're really doing is creating a subtext that SHA1 is
survivable in H(m || k) configurations, and that this is really just an
example of "how broken MD5 is".

------
mustpax
This might be a stupid question, but why not hash a message then encrypt the
hash with a symmetric cypher instead of trying to mix the key material into
the pre-hash value? Is that just much slower?

~~~
cperciva
_why not hash a message then encrypt the hash with a symmetric cypher_

Provided that you're using a cipher with a block size equal to the hash
length, this can work. This approach is usually avoided for two reasons:

1\. It requires two building blocks (hash + cipher) instead of just one
(hash).

2\. Under the "standard model" of cryptography, you can prove things about
HMAC(k, m) which you can't prove about Encrypt_k(Hash(m)) -- for practical
purposes these are utterly irrelevant, but cryptographers are academics and
like to have cute theoretical results.

~~~
tptacek
And of course you must avoid using this construction yourself because it
involves you designing your own cryptographic construction.

You are actually more likely to have your system broken by making a new
construction out of well-known pieces like AES and SHA1 than you would be if
you made your own crypto primitives out of whole cloth. And you'd never think
of designing your own block cipher.

Hey, I like that argument! I think I'll repeat it _ad nauseam_ from now on.

------
mbrubeck
The WS-Security standard uses SHA1(m || k), so beware if attacks on SHA1
become feasible.

[EDIT: Yes, I had that backwards originally.]

~~~
tptacek
Systems that use SHA1(k || m), or any other Merkle-Damgaard hash in H(k ||
m), are trivially broken in 20 lines of Ruby code.

There are attacks against SHA1 that make SHA1(m || k) unsafe, which is a point
I think Nate is really trying to get across here.

You might have the terms in that construction backwards, but I'm not going to
dig through OASIS standards for more than the 5 minutes I just spent trying,
because very few real-world systems use WS-Security.

~~~
mbrubeck
Yes, I had it backwards; it's SHA1(m || k). Fixed now.

I'm in the unfortunate position of maintaining a real-world system that uses
WS-Security, and this is not the only potential problem I've found. For
example, the "m" part of the digested string is composed of parts that are
concatenated without delimiters (just like the old AWS authentication digest);
one implementation was vulnerable to message forgery because it could be
convinced to split the same string in more than one way.
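
A small sketch of that ambiguity, with made-up field values. `naive_mac` joins the fields with no delimiter, so two different splits of the same string get the same tag. `length_prefixed_mac` shows one common way to make the encoding unambiguous; it is an illustration, not what WS-Security or AWS actually does.

```python
import hashlib
import hmac

def naive_mac(key: bytes, fields: list) -> bytes:
    """Concatenate fields with no delimiter, then MAC (ambiguous)."""
    return hmac.new(key, b"".join(fields), hashlib.sha256).digest()

def length_prefixed_mac(key: bytes, fields: list) -> bytes:
    """Prefix each field with its 4-byte length so the split is unambiguous."""
    encoded = b"".join(len(f).to_bytes(4, "big") + f for f in fields)
    return hmac.new(key, encoded, hashlib.sha256).digest()

key = b"secret-key"
honest = [b"bob", b"200"]    # name, amount
forged = [b"bob2", b"00"]    # same bytes once joined, different meaning

print(naive_mac(key, honest) == naive_mac(key, forged))                      # True
print(length_prefixed_mac(key, honest) == length_prefixed_mac(key, forged))  # False
```

The MAC itself is fine in both cases; the flaw is that the authenticated byte string does not determine the field boundaries.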

~~~
tptacek
That attack you described is a canonicalization flaw, and I think it's one of
the least talked-about, most prevalent failures in MAC-only security systems.
Colin found a good canonicalization flaw in Amazon AWS.

------
ephermata
Great post. Thanks for doing the work to get the word out.

