

The long tail of MD5 - zdw
http://www.tedunangst.com/flak/post/the-long-tail-of-MD5

======
nathan7
These days, I find using MD5 absolutely inexcusable. SHA-1 is available pretty
much everywhere, SHA-2 algorithms are quite readily available too. BLAKE2
([https://blake2.net](https://blake2.net)) is slightly faster than MD5, and
significantly more secure. It's based on the SHA-3 finalist BLAKE.
Additionally, there are versions of the algorithm that parallelise using SIMD,
and there are both 32-bit and 64-bit optimised versions too. It also gives you
the possibility of customising your hash, using it as an HMAC at no extra
cost, salting it, or adding a personalisation key to effectively have
different hash functions for different purposes. As a nice extra, it also uses
a third less RAM than SHA-2 or SHA-3. If output length is a concern,
truncating the output (with a corresponding increase in the likelihood of
collisions) is perfectly fine with any good hash function.
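
A minimal sketch of the keyed, salted, and personalised modes mentioned above, using the BLAKE2b implementation in Python's standard `hashlib`. The key, salt, and personalisation strings are placeholders invented for illustration. Note that `digest_size` selects a shorter output as a BLAKE2 parameter, which is not the same as truncating a full-length digest afterwards.

```python
import hashlib

message = b"example file contents"

# Plain 32-byte digest (BLAKE2b defaults to 64 bytes).
plain = hashlib.blake2b(message, digest_size=32).hexdigest()

# Keyed mode: usable as a MAC without a separate HMAC construction.
# The key is a hypothetical placeholder; BLAKE2b accepts up to 64 bytes.
mac = hashlib.blake2b(
    message, key=b"hypothetical-secret-key", digest_size=32
).hexdigest()

# Personalisation: the same input hashed for two different purposes
# gives unrelated outputs. BLAKE2b accepts up to 16 bytes for `person`.
for_files = hashlib.blake2b(message, person=b"files-v1", digest_size=32).hexdigest()
for_cache = hashlib.blake2b(message, person=b"cache-v1", digest_size=32).hexdigest()

# Salting: BLAKE2b accepts up to 16 bytes for `salt`.
salted = hashlib.blake2b(
    message, salt=b"0123456789abcdef", digest_size=32
).hexdigest()

print(plain)
print(mac)
print(for_files)
print(for_cache)
print(salted)
```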

------
jonstewart
MD5 also remains quite popular for file hashing/hash set analysis. I.e., is
this file a member of a known set? For example, NIST's NSRL still includes MD5
hashes.

* This is not an endorsement of MD5 for such use cases.

~~~
smackfu
Do the security problems with MD5 make it bad as a hashing algorithm?

~~~
ChuckMcM
Generally the tuple (<length>,<hash>) is unique since the attacks on MD5 all
involve changing the number of octets in the hashed source. That said, it
isn't secure if you can change the length independently of changing the hash.
Hence the challenge of using it cryptographically. If you're confident you
know the correct 'length' value (acts as a sort of openly shared secret in
this case) then you trust the hash.

~~~
heinrich5991
I believe this is wrong, and in this case it's very dangerous information. See
for example the Wikipedia article for two chunks of data that only differ in a
few bits (not length!) and hash to the same MD5 hash:
[https://en.wikipedia.org/wiki/MD5#Collision_vulnerabilities](https://en.wikipedia.org/wiki/MD5#Collision_vulnerabilities)

------
JoachimS
Good article that really shows why design decisions can have an impact for 20,
30 years or more.

This is why I get fairly upset when people design something new in 2014 that
uses md5. Yes even if your application does not use md5 for anything security
related, the mere fact that you use a bad, slow algorithm should be considered
wrong. And adding new dependencies will make it harder for us to migrate away
and extend that tail even further.

If you need a good hash function that is fast, use SipHash. If you need a
really secure one, use SHA-512, SHA-3, BLAKE or just SHA-256.

~~~
pbsd
Do NOT use SipHash as a cryptographic hash function. It is designed to be a
PRF, and its output length is way too small to make it collision-resistant
when used as a hash. SHA-3, SHA-512/256 or BLAKE(2) (in increasing order of
performance) are suitable cryptographic hash functions.

~~~
lisper
I think you meant: it [SipHash] is NOT designed to be a PRF.

~~~
pbsd
No, I meant what I wrote. SipHash is fine as a _keyed_ primitive -- 128-bit
security against key recovery, (64 - t)-bit security against 2^t forgery
attempts.

However, as a hash function (e.g., with a fixed key), it is entirely too weak:
2^32 work for a collision, and 2^64 work for arbitrary (second-)preimages.
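
To make the birthday bound behind the "2^32 work for a collision" figure concrete, here is a small sketch that finds a collision on a deliberately tiny hash. It does not use SipHash: as a stand-in it truncates SHA-256 to 24 bits (an assumption made purely so the search finishes instantly), where a collision is expected after roughly 2^12 attempts. The function names are illustrative.

```python
import hashlib

def truncated_hash(data: bytes, bits: int) -> int:
    """Return the top `bits` bits of SHA-256(data) as an integer."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def find_collision(bits: int):
    """Hash successive counter values until two share a truncated hash."""
    seen = {}
    counter = 0
    while True:
        msg = counter.to_bytes(8, "big")
        h = truncated_hash(msg, bits)
        if h in seen:
            return seen[h], msg, counter + 1
        seen[h] = msg
        counter += 1

first, second, attempts = find_collision(24)
print(first.hex(), second.hex(), attempts)
```

The same search against a 64-bit output needs on the order of 2^32 hashes and a table of similar size, which is well within reach of commodity hardware.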

~~~
lisper
OK... so if it's suitable as a PRF, then you should be able to widen the
output by using two different keys and concatenating the results, no?

~~~
wbl
Unfortunately not: it's only 64 times slower to generate 2^64 collisions for a
hash function by generic attacks than just one, and those 2^64 messages will
contain a collision for another hash function of length 128. The total time is
only the sum of the times for each one.

~~~
pbsd
The attack you describe---Joux's multicollisions, as far as I can tell---
applies to Merkle-Damgard functions. SipHash is a spongy, wide-pipe design
where that sort of thing doesn't work. The generic bound is (k!)^(1/k) ⋅
2^(n(k-1)/k) work to find a k-collision, or approximately k ⋅ 2^n for large
k.
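
As a quick arithmetic check of the generic bound quoted above, the sketch below evaluates log2 of (k!)^(1/k) ⋅ 2^(n(k-1)/k) for a few values of k. It assumes n is the output length in bits (64 for SipHash); the function name is illustrative. For k = 2 it gives 2^32.5, in line with the 2^32 figure for an ordinary collision.

```python
import math

def log2_multicollision_work(n_bits: int, k: int) -> float:
    """log2 of (k!)^(1/k) * 2^(n(k-1)/k), the generic k-collision bound."""
    # lgamma(k + 1) is ln(k!); dividing by ln(2) converts it to log2(k!).
    log2_k_factorial = math.lgamma(k + 1) / math.log(2)
    return log2_k_factorial / k + n_bits * (k - 1) / k

for k in (2, 3, 4, 8):
    print(k, round(log2_multicollision_work(64, k), 2))
```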

------
yuhong
Content-MD5 has been removed from HTTPbis.

Don't forget VBA digital signatures BTW, for which MD5 is the only choice. I
wonder how feasible a collision attack would be.

~~~
bodyfour
Yeah, a couple of years ago I was trying to determine whether to support
Content-MD5 in some HTTP software I was writing, and I could find nobody
actually using it.
There was an attempt a decade ago to make Firefox validate with it, but that
died:
[https://bugzilla.mozilla.org/show_bug.cgi?id=232030](https://bugzilla.mozilla.org/show_bug.cgi?id=232030)

Also the author is wrong that there isn't a proposed replacement -- it's the
"Digest:" header (RFC 3230) although I don't think anybody uses that either.

~~~
donavanm
I'd hazard the most prolific use case of MD5 as an HTTP checksum is AWS S3.
You can, IIRC, send Content-MD5 on PUT and POST and they'll reject with a 400
if the body hash doesn't match. Conversely, the default ETag is the body MD5.
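
For reference, a Content-MD5 header value is the base64 encoding of the raw 16-byte MD5 digest of the body (RFC 1864), not the hex string. A minimal sketch of computing both forms follows; the helper names are illustrative, and the claim that an ETag equals the hex digest is taken from the comment above rather than verified here.

```python
import base64
import hashlib

def content_md5(body: bytes) -> str:
    """Base64 of the raw MD5 digest, as used in a Content-MD5 header."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")

def md5_hex(body: bytes) -> str:
    """Hex MD5 digest, the form the comment above says appears as the ETag."""
    return hashlib.md5(body).hexdigest()

body = b"hello world"
headers = {"Content-MD5": content_md5(body)}
print(headers)
print(md5_hex(body))
```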

------
IvyMike
Want to kill MD5?

Provide public-domain easy-to-compile/use versions for all languages, and
furthermore, get their google page ranks high.

Do not underestimate laziness. If Joe Random can find a suitable MD5 algorithm
in 10 seconds but it takes 30 seconds to find a suitable SHA algorithm, guess
which one gets used?

~~~
awalton
Honestly, this is not so much a problem. Cryptographers like to write simple C
public domain implementations of their algorithms and then people go to work
for months and years squeezing performance out of them, cryptanalyzing them,
recoding them in various languages and releasing their own implementations
under new licenses. You can see this happening with pretty much every newly
designed crypto component from hashes to ciphers to entire crypto systems.

The problem is exactly the one elucidated in the blog post above: the long
tail of baked in brokenness. These systems were never designed to be
extensible, they are cooked into code that hasn't been serviced in years,
perhaps even decades in some cases. They're bolted into specifications in ways
that either obsolete the technology completely, or make it so incredibly
complicated to update the technology that doing so outweighs the apparent
cost. And that's without considering problems like deployment and phase-out.

These types of problems make it very likely we will be stuck with the
stupidity of DES and MD5 in strange places until it becomes a fire drill and
then all of a sudden people will be baking in SHA-1/SHA-2 or BLAKE2 and we'll
be going through these very same motions again in 5-15 years, wondering why we
didn't learn from the mistakes we made last time.

------
w8rbt
HMAC MD5 is a perfectly reasonable choice for some situations.

~~~
astrodust
A situation like not caring.

------
yuhong
_Everybody knows that MD5 is as terribly useless as ROT13_

Huh?

~~~
jmartinpetersen
Collision attacks are trivial. It's been almost ten years since Ron Rivest
declared MD5 broken.

~~~
yuhong
I know, but this does not make MD5 as broken as ROT13.

~~~
NoMoreNicksLeft
Rot13 is actually useful. I have perl scripts which download my bank
statements, running from cron. Gets the latest every month. These perl scripts
use passwords, but without knowing that they are there, who would find them?

The trouble is, they have to pass the name of the form field to the server on
the other side, which is some variation of "password". So anyone doing a
simple string search will eventually find the perl scripts, and have access to
my passwords (supposing they have root).

If I rot13 the form field name, and then have the name be rot13('cnffjbeq') in
the script, this makes it unsearchable.

~~~
kemayo
Note to self: add "cnffjbeq" to my password-finder scripts.

~~~
NoMoreNicksLeft
Might as well add rot-1 through rot-25 to it as well... not like there's a
storage constraint, is there?
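
The exchange above can be sketched in a few lines: generating every rotation of "password" costs almost nothing, so a search script can look for all 25 at once. The `rot` helper is an illustrative name, not taken from anyone's actual scripts.

```python
import string

def rot(text: str, shift: int) -> str:
    """Rotate ASCII letters by `shift` positions, leaving other characters alone."""
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    table = str.maketrans(
        lower + upper,
        lower[shift:] + lower[:shift] + upper[shift:] + upper[:shift],
    )
    return text.translate(table)

# Every non-trivial rotation of the string a searcher would be looking for.
candidates = [rot("password", n) for n in range(1, 26)]
print(candidates)
```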

