
Why you should never use hash functions for message authentication - bakkdoor
http://blog.jcoglan.com/2012/06/09/why-you-should-never-use-hash-functions-for-message-authentication/
======
cbsmith
This is a great essay on why you should never use a hash function for message
authentication.

Except not for the reason the author thinks.

There are several problems here.

First, with SHA-1 for example, you have 64 bytes per chunk. That means you
basically get a free ride on this problem for anything < 64 bytes. A lot of
"application state" fits pretty well in 64 bytes.

Secondly, unless a message ends right on the 64-byte boundary, it is not
nearly that simple. You have a bit of a problem, because the hash input is
padded, and when you add extra characters to your original string, that
padding gets _replaced_ with those values. So, it's no longer simple to just
"keep going" from where you stopped.

Still, you can see how that leaves a distinct subset of cases where you'd be
exposed. SHA-1, along with most secure hash functions, appends the length of
the message to the end of the source text before performing the hash function.
That means that if you add even one byte to the string, you have now changed
the last 8 bytes that were fed in to the "original" hash function. Oh, and
your extra byte goes in _before_ those bytes, so not only did you change those
8 bytes, but you shifted them down a byte.
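
Concretely, that 8-byte field is just the message's bit length, big-endian, so
appending even one byte both changes it and pushes it further down the stream
(a small sketch; the field name and sample message are illustrative):

```python
import struct

def length_field(message: bytes) -> bytes:
    # the 64-bit big-endian bit count SHA-1 feeds in as the final 8 bytes
    return struct.pack(">Q", len(message) * 8)

before = length_field(b"user=alice")         # 10 bytes -> 80 bits
after = length_field(b"user=alice" + b"x")   # 11 bytes -> 88 bits
assert before != after
```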

So, no, it isn't nearly that easy to crack SHA-1 based authentication, and
yet, it is easy enough that you should totally NOT use plain hashes for
authentication and should use HMAC instead; they _are_ vulnerable to extension
attacks, it's just not nearly as easy as this article suggests, and the
conclusions one might draw from this article (like the idea that you can solve
this problem by feeding all source text into the hash algorithm backwards) are
likely ill-founded.

It just turns out that cryptography is way more complicated than it looks,
and even in terms of understanding the weaknesses that arise from doing things
wrong, you are going to get it wrong. Trust the experts when they say it is a
bad idea, but don't assume that _why_ it is a bad idea can be explained in a
short blog article like this.

_UPDATED_: Added an explanation of why it might be dangerous to just take
this article at its word.

~~~
pbsd
"Never use a hash function for message authentication" is such a simplistic
view. The author takes a common hash function design (Merkle-Damgard), and
somehow extrapolates that hash functions should be simply ruled out for
authentication.

First, this may send the wrong message to the less-focused reader: "what, I
should use block ciphers instead?". Luckily, HMAC is eventually brought up,
which is a fine solution.

HMAC requires 2 calls to H, our favorite hash function. Certain applications
may find the overhead to be prohibitively high. With non-broken hash functions
(e.g., any of the SHA-3 finalists), we can use the so-called envelope
authenticator: A = H(K||M||K), with some padding to separate K from M to keep
security proofs happy. This is significantly faster for short messages, and
short is the most common size out there.
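
As a rough sketch of the envelope construction in Python (SHA3-256 stands in
for the "non-broken" H, and the zero-pad to a 64-byte boundary is an
illustrative separator, not the exact padding the security proofs specify):

```python
import hashlib

def envelope_mac(key: bytes, msg: bytes) -> bytes:
    # A = H(K || pad || M || K): one pass over the input, versus
    # HMAC's two hash invocations
    padded_key = key + b"\x00" * (-len(key) % 64)  # separate K from M
    return hashlib.sha3_256(padded_key + msg + key).digest()

tag = envelope_mac(b"secret-key", b"amount=25&to=alice")
```

For a short message this hashes roughly one key block plus the message once,
which is where the speedup over HMAC comes from.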

~~~
threedaymonk
_First, this may send the wrong message to the less-focused reader: "what, I
should use block ciphers instead?". Luckily, HMAC is eventually brought up,
which is a fine solution._

If the reader can't be bothered to read the article to the end, I hardly think
it reflects on the author. Whilst it might indeed be a more concise article if
it just said "don't use a hash function for message authentication, use HMAC",
it would still miss the important final point about timing attacks, not to
mention the journey of explanation about _why_ you shouldn't just use a hash
function.

~~~
pbsd
You are correct, I shouldn't have tried to argue poor readership, that's just
sloppy.

------
Ralz
I e-mailed Visa about something similar with their new upcoming V.me service.
They suggest that you use MD5 to generate the tag, which is known to be even
weaker than SHA-1. I was a little surprised that a company like Visa would
mess up on crypto and not know to use HMAC instead of just a simple hash.
Never heard a response from them either.

Here's what their documentation says:

Standard syntax for generating an MD5 hash, by language:

    Java:
        import org.apache.commons.codec.digest.*;
        hash = DigestUtils.md5Hex(string1+string2+string3...);

    PHP:
        $hash = md5($string1.$string2.$string3...);

    Ruby:
        require 'digest/md5'
        hash = Digest::MD5.hexdigest(string1+string2+string3...)

    Python:
        import md5
        hash = md5.new(string1+string2+string3...)
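
For contrast, a minimal sketch of what the HMAC equivalent of that
concatenate-and-hash pattern looks like in Python (the key and field values
here are placeholders, not anything from Visa's docs):

```python
import hashlib
import hmac

# placeholder shared secret and fields, standing in for string1, string2, ...
key = b"merchant-shared-secret"
string1, string2, string3 = "order=42", "amount=100", "currency=USD"

# same concatenation, but keyed and not subject to length extension
tag = hmac.new(key, (string1 + string2 + string3).encode(),
               hashlib.sha256).hexdigest()
```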

~~~
simondlr
The link: <https://developer.v.me/docs/get_credentials>

------
loeg
Tl;dr: Use HMAC for Hash-based Message Authentication Codes and hash functions
for hash functions. Don't use them the other way around.

PS, maybe more developers should take an intro course on crypto.

~~~
ch0wn
\+ don't compare secret strings in a manner that makes it possible to draw
conclusions about the position of the inequality.

------
klodolph
> Finally, you should make sure your application does not exit early if the
> tag is invalid. You should do all the data processing you would normally do,
> just short of modifying the database, and check the tag last. If you return
> early you risk another timing attack.

What kind of timing attack is that? In order for there to be a timing attack,
there has to be a difference in the timings.

1\. You can either process the data, check the authentication code, then
commit.

2\. Or you can check the authentication code, process the data, then commit.

I don't see any attacks on #2 that couldn't also work on #1.

~~~
jcoglan
The timing attack is if you check the tag, that fails, and then you _don't_ do
any further request processing. This shortens the request time. It depends
quite a lot on what you're actually doing with the message, but in general you
want to leak as little info as possible about what's happening during any
crypto-related process.

~~~
majormajor
But it only tells them that it failed, not that they got closer, as the
string-comparison attack does. What does that gain an attacker if you already
provide other feedback about the request being invalid? And not providing that
feedback would make for a bad user experience: if users run into a bug
somewhere, it fails silently and they don't know that nothing was actually
done.

~~~
tptacek
The leak that you're ostensibly timing is that in order to figure out how much
of the candidate MAC string was valid, the target had to compare more bytes,
which takes more time, which adds observable lag to the error response.

~~~
gav
Is the observable lag for a string comparison significant enough to be useful?

We're talking about such small amounts of time compared to the overhead of the
full web stack.

~~~
Mvandenbergh
The observable lag isn't usually significant enough if you only do it once but
over many requests the stochastic factors can be compensated for.

------
Mithrandir
Number one thing I learned from the Coursera class: don't build your own
crypto.

~~~
cjg
I've heard this said many times before and I agree. However, using crypto
libraries does not solve the problem of vulnerabilities through cryptography
misuse. For example, keys must be stored correctly, algorithms often need
initialising in the correct mode for your specific application, IVs must not
be repeated, and, as shown in this article, hashes should be used in specific
ways to work correctly.

There are many ways to fail with cryptography and avoiding them all takes
considerable expertise. Using crypto libraries does not solve this problem.

~~~
tptacek
This is the entire point of high-level crypto libraries, like Guttman's
libcrypt and Google's Keyczar. So, yeah, don't use OpenSSL or javax::crypto or
whatever .NET calls it; but, do consider using something like Keyczar, or,
better yet, just use PGP/GPG to store data at rest, TLS for data in motion,
and be done with it.

~~~
SoftwareMaven
Those still require key management. There is no way a developer can abdicate
all responsibility for this stuff, no matter how high level (at least, not
until we have good, common, trusted security as a service).

~~~
tptacek
Part of the point of Keyczar (note the name) is to make the right decisions
about key management in advance and abstract them away from developers.

------
theunixbeard
The title is sort of linkbait; what it should really say is "Never use hash
functions vulnerable to extension attacks"... (and most common ones are). With
that said, this stuff is pretty cool, and after reading that the author
learned all this in the Coursera Cryptography class I decided to sign up for
it. (Starts June 11th.)

~~~
quotemstr
You can learn all that and more just by reading Applied Cryptography.

~~~
tptacek
People who build crypto after reading Applied Cryptography are doing a fine
job paying for my kids' college education, so I agree with you and encourage
everyone to do likewise.

If you don't happen to like my kids, well, first, screw you, and secondly: buy
_Practical Cryptography_ or _Cryptography Engineering_ (really the same book)
and burn your copy of "Applied".

~~~
quotemstr
There's nothing wrong with Applied Cryptography so long as you _understand_
it. If you blindly apply outdated algorithms, yes, you lose. Everyone should
read both Applied Cryptography _and_ the other books you mentioned, and keep
up on the literature besides.

~~~
tptacek
No, there's a lot wrong with _Applied Cryptography_, and those things have
very little to do with the fact that AC writes about IDEA and not AES.

If you read _Practical Cryptography_, you don't need to read _Applied
Cryptography_. AC is a book full of trivia, and of encyclopedia-style
descriptions of random block ciphers with minimal attention given to the
actual real-world attacks on implementations of those ciphers.

I strongly advise that you _not_ waste time reading AC. If you're lucky, you
can read it and just lose time; if you're unlucky --- and a lot of my clients
have been --- you can find yourself having learned stuff you'll later need to
unlearn.

~~~
quotemstr
Paging through my copy of AC, I think you're right. It'd been a while since I
read it. PC is indeed the better book.

~~~
tptacek
Even Schneier has somewhat acknowledged how toxic AC turned out to be.

------
jebblue
>> This fact means that an attacker can determine the first correct character
of the tag by submitting requests to a signed URL with a different first
character in the tag each time, and stopping when the request takes a little
longer than usual. After guessing the first character they can move onto the
second, and so on until they’ve guessed the whole correct tag.

Wouldn't this attack be eliminated by using iptables rate limiting to reduce
the attack window of opportunity?

~~~
tptacek
Wouldn't this attack be better eliminated by fixing the timing leak that is
potentially allowing people to guess valid MACs on packets?

The reality is that you probably _can_ dick around with things in your
deployment and your app to make timing attacks prohibitively
expensive/annoying; if you understand that you're not _eliminating_ the timing
leak, but rather _masking_ it, you can take advantage of the additional
measurements required to unmask the leak to give your MAC enough of a buffer
to last for its whole useful lifetime.

But when you do this, you're really playing on the razor's edge of what we
currently know about side channel attacks on crypto, and you're probably going
to end up putting more effort into your workaround than you would in just
fixing the underlying bug.

~~~
CamperBob2
I guess the idea of just blacklisting the client's IP after the first
1,048,576 failed attempts is too boring, or has some other drawback.

~~~
tptacek
The idea of trying to detect people employing timing attacks on your
cryptography and block them individually by IP address is so obviously
retarded that the comment I'm replying to is indistinguishable from trolling.

~~~
CamperBob2
I'm not an IT guy, so no, I wasn't trolling. Why exactly is it "retarded" to
build your system to reject (or at least flag) access patterns that are
unlikely to be due to legitimate activity?

I'd recommend using the word "retarded" with a bit more circumspection.
Obviously the incoming IP address doesn't uniquely identify a client who's
likely to be on the other side of a NAT gateway. But the idea that a system
should just sit there silently and carry on business as usual while any one
address or class-C block generates large numbers of failed access attempts
seems like a _good_ application of the word in question.

~~~
tptacek
You're not an IT guy, but you are a programmer, and you know that leaving a
vulnerability in your code, hoping the devops team catches attempts to exploit
it, is a fucking retarded idea. I think you're just trolling.

~~~
CamperBob2
Who said anything about leaving a vulnerability in the code? If your security
model depends on a suboptimal implementation of strcmp(), you have bigger
problems than timing attacks.

~~~
tptacek
I have no idea what you mean by "suboptimal implementation of strcmp".

------
bemmu
For the string comparison, could you really use that in a timing attack?
Wouldn't the difference from the comparison taking one char longer be measured
in nanoseconds, while the overall network lag would be milliseconds?

~~~
lobster_johnson
Yes, I don't see that being detectable for things like web-based APIs. For
these string lengths (e.g., HMAC with SHA-256), the network lag would add
sufficient randomness that you would not be able to measure any difference in
timings in the string comparison.

~~~
tptacek
It turns out that it _is_ possible, over a WAN, on _some_ frameworks. The rule
of thumb is, if the comparison is effectively being done by libc's memcmp(),
the timing attack is very difficult to do even across a single switch;
however, some platforms don't drop to memcmp and instead compare each byte
explicitly. These are timeable.
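
A sketch of the kind of explicit per-byte comparison meant here; the early
exit is exactly what makes it timeable:

```python
def leaky_equal(a: bytes, b: bytes) -> bool:
    # compares byte by byte and bails at the first mismatch, so running
    # time grows with the length of the matching prefix
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # early exit: this is the timing leak
    return True
```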

If you're wondering, "how do I detect nanosecond differences over a network
when my measurement will be swamped by other things happening on the target,
the network, and my host", the answers boil down to:

* You're going to move your measurement code as close to the drivers as possible, and fix interrupt handling so that interrupts don't confound your measurements.

* You're going to get yourself on the same hosting provider as your target; for instance, a good chunk of all target apps can be attacked via Amazon EC2 for not very much money.

* You're going to take lots and lots and lots and _lots_ of measurements and then use high school statistics to process the results.
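
A toy simulation of that last point. The delay and jitter figures are made
up; the only claim is that averaging many noisy samples can recover a
difference far smaller than the per-sample noise:

```python
import random
import statistics

random.seed(1)

def sample(extra_ns, n):
    # simulated round-trip times: a fixed baseline, large Gaussian
    # jitter, plus a tiny data-dependent delay we want to detect
    base = 5_000_000  # 5 ms baseline, in nanoseconds
    return [base + extra_ns + random.gauss(0, 5_000) for _ in range(n)]

fast = sample(0, 200_000)    # guess whose first byte was wrong
slow = sample(100, 200_000)  # guess whose first byte matched (+100 ns)

# any single sample is useless; the means over many samples separate
assert statistics.mean(slow) > statistics.mean(fast)
```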

------
spicyj
> The easiest way to defeat this attack is, instead of directly comparing two
> strings, compare their mappings under a collision-resistant hash function.

Is this really the best way to compare strings without giving away timing
info?

~~~
tptacek
No. It's A way to do it, but not the fastest or the simplest. An easier,
faster way to do it is what Rails does (after Nate Lawson via Coda Hale told
them how): accumulate the XORs of each offset of both strings and then verify
that they add up to zero.
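
A sketch of that XOR-accumulation comparison in Python (the stdlib's
hmac.compare_digest does the same job and is what you'd actually reach for):

```python
def constant_time_eq(a: bytes, b: bytes) -> bool:
    # XOR every byte pair and OR the results together; the loop always
    # runs to the end, so timing doesn't depend on where strings differ
    if len(a) != len(b):
        return False
    result = 0
    for x, y in zip(a, b):  # bytes iterate as ints in Python 3
        result |= x ^ y
    return result == 0
```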

~~~
reitzensteinm
What's wrong with simply doing a traditional compare, but not exiting early
when it's found the two strings don't match?

I see how the xor method works, but not why it's superior.

Also, it would seem that you'd be leaking information about the size of the
other string with either method (whether you exit early or not when you run
past the length of one of the strings) - not a problem when comparing hashes
presumably, but is it never a concern?

I don't think you'd be leaking that information if you hashed both strings and
compared the hash, because it wouldn't stop getting faster when the shorter
string gets shorter.

~~~
gchpaco
Compiler optimizations, CPU pipeline and branch prediction would be my guess,
although I'm sure someone else can jump in with more detail. XOR is basically
immune to optimized tricks.

~~~
reitzensteinm
Ah, that does make sense. Thanks!

------
ajdecon
Stupid question: Whenever I've used GPG to sign an email, it includes a line
saying "Hash: SHA1". Does this imply PGP-signed messages are vulnerable to
this, or does PGP/GPG do something different/smarter?

------
jiggy2011
One thing I'm slightly confused about here.

The article says:

 _"This sequence is then folded using a compression function h(). The details
of the h() depend on the hashing function, but the only thing that concerns us
here is that the compression function takes two message blocks and returns
another block of the same size."_

So if the chunks in a SHA-1 hash are 512 bits each then surely the output of
the hash function would be 512 bits rather than the 160 bit digest?

Edit: the IV is 160 bits, so "another block of the same size" means each
derivative block is the size of the IV, _not_ of the actual data.

------
more_original
RFC 2104 specifies how you should do it; see e.g.
<http://de.wikipedia.org/wiki/Keyed-Hash_Message_Authentication_Code>

The Handbook of Applied Cryptography, Chapter 9 (free online:
<http://cacr.uwaterloo.ca/hac/>) nicely explains the reasons.

------
terangdom
In order for an extension attack to work, wouldn't the blocks have to align
perfectly? Like suppose I hash [abcd][efgh][k]

How would you extend that?

~~~
tptacek
You guess the length of the last block (or iterate over possible lengths with
trials of the attack). When you know it, you can easily predict the MD padding
at the end of the hash to fill in the block. That fake padding (our code calls
it "glue padding") actually ends up in the forged message; for instance, if
you're signing URLs, you'll see it as gibberish in the middle of the URL. In
practice, most code does not care about the gibberish "glue" bytes.
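
A sketch of the padding being predicted: it's just SHA-1's standard
Merkle-Damgård padding, which depends only on the total input length (the
key-length guess and message here are made-up examples):

```python
import struct

def sha1_glue_padding(total_len: int) -> bytes:
    # SHA-1 appends 0x80, enough zero bytes, then the 64-bit big-endian
    # bit length, so the padded input is a multiple of 64 bytes
    zeros = (55 - total_len) % 64
    return b"\x80" + b"\x00" * zeros + struct.pack(">Q", total_len * 8)

# guessing the key length (here: 16) lets the attacker place this
# "glue" inside the forged message, where it shows up as gibberish
key_len_guess, message = 16, b"user=bob&role=user"
glue = sha1_glue_padding(key_len_guess + len(message))
forged = message + glue + b"&role=admin"
```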

------
seats
tptacek or others with domain knowledge-

Is the timing attack hardening suggested in the blog post a standard approach?

If I was trying to attack a system and knew loosely that they did what he
suggested (hashing then comparing, vs. comparing with timing exposed), my
untrained instinct would be that this is the weakest part. In other words, I
think this just makes the timing attack a little more difficult, but still
possible, by producing specific hashes that carry out the timing attack.

When I've needed to harden comparisons against timing attacks, I've always
just used constant time comparison functions, such as these ->

<http://codahale.com/a-lesson-in-timing-attacks/>
<http://rdist.root.org/2010/01/07/timing-independent-array-comparison/>

------
quotemstr
Also, HMAC is just one MAC (message authentication code). OMAC is another
good one; it has the interesting property of being built on top of a block
cipher instead of a hash function, which can reduce the number of "moving
parts" in a system if you're using a block cipher of some sort anyway.

------
ma2rten
I was wondering about that timing attack. Is that really possible? How many
requests would you have to make until you can get reliable statistics over the
timing of a string comparison, when you have network delays, other requests
and all kinds of stuff that influence timing?

~~~
tptacek
It's very difficult, but possible. It's a plausible enough threat, especially
if you're cloud hosted now or ever might be, that you should take steps to
avoid it.

Jitter and confounding are problems that can be addressed simply by repeated
measurements.

The rule-of-thumb from Crosby & Wallach's paper on remote timing attacks is,
assume tens-to-hundreds of nanoseconds precision if you can colocate the
attacker at the same provider, and tens of microseconds if you have to do it
over the Internet.

~~~
ma2rten
Thanks for your reply. I was totally oblivious to this type of timing attack
before I read the article.

------
exit
what's wrong with hashing (message + secret) instead?

~~~
jcoglan
Relying on the way you happen to combine data, instead of using a function
that's _designed_ for authentication and has baked-in a safe way to combine
the inputs, is a bad idea. "What if
$EDGE_CASE_OF_INAPPROPRIATE_CRYPTO_FUNCTION" is never a good question to ask.
Just use the right tools in the first place.
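
For what it's worth, one concrete way ad-hoc combining bites, separate from
the extension attack: plain concatenation records no boundary between the
inputs. This toy collision uses two different secrets, so it's an
illustration of the missing boundary rather than a practical attack:

```python
import hashlib

def naive_tag(message: bytes, secret: bytes) -> str:
    # hash(message + secret): the concatenation carries no boundary info
    return hashlib.sha256(message + secret).hexdigest()

# different (message, secret) splits, identical concatenated bytes,
# identical tag
assert naive_tag(b"pay=100", b"0secret") == naive_tag(b"pay=1000", b"secret")
```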

~~~
exit
> _"What if $EDGE_CASE_OF_INAPPROPRIATE_CRYPTO_FUNCTION" is never a good
> question to ask. Just use the right tools in the first place._

that's not an intelligent attitude.

understanding where an edge case breaks down is still illuminating, regardless
of whether i use hmac in the end.

------
einhverfr
This seems to me like a variant of "do not trust the client." Good info
though. I have learned a lot more about how hash algorithms work. I do wonder,
though, whether fixed-param hashes are relatively safe due to the inability to
add suffixes.

~~~
tptacek
"Never trust the client" is, like "always validate input", one of those
timeless bits of security strategy that is worth pretty much nothing in the
real world. It's about as useful as "pretty much you should always make sure
you're secure, pretty much."

~~~
einhverfr
I disagree with you here. One should always be sure input is validated
somewhere before anything important is done with it. And not trusting the
client is one element of what I call a "push security back" strategy (that
strategy is, basically: don't do any security enforcement in your application
that you can't make a component further back do just as well). The reasoning
here is that components like operating systems and web servers will always
have more review and more eyes than your web app, so if they can work without
trusting your app then all the better. Work from least privilege and design so
that components further back can do things like authentication, authorization,
and the like, to the extent this is practical. This creates a narrower
security perimeter, and greater depth in defence. Of course, as in all things,
security is a matter of perpetual tradeoffs.

------
X-Istence
If I remember correctly this was the same issue that Flickr had with their API
calls at one point in time!

