

Single-block collision for MD5 - robinhouston
http://marc-stevens.nl/research/md5-1block-collision/

======
dchest
Nice! From PDF:

    
    
                       Message 1  
        
        4d c9 68 ff 0e e3 5c 20 95 72 d4 77 7b 72 15 87
        d3 6f a7 b2 1b dc 56 b7 4a 3d c0 78 3e 7b 95 18
        af bf a2[00]a8 28 4b f3 6e 8e 4b 55 b3 5f 42 75
        93 d8 49 67 6d a0 d1[55]5d 83 60 fb 5f 07 fe a2
                          
                       Message 2
         
        4d c9 68 ff 0e e3 5c 20 95 72 d4 77 7b 72 15 87
        d3 6f a7 b2 1b dc 56 b7 4a 3d c0 78 3e 7b 95 18
        af bf a2[02]a8 28 4b f3 6e 8e 4b 55 b3 5f 42 75
        93 d8 49 67 6d a0 d1[d5]5d 83 60 fb 5f 07 fe a2
                           
                     Common MD5 hash
        
               008ee33a9d58b51cfeb425b0959121c9
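
For anyone who wants to check this locally, here is a minimal sketch using only Python's standard `hashlib`. The byte values are transcribed from the listing above; the variable names are my own.

```python
import hashlib

# Bytes transcribed from the listing above. The bracketed bytes in the
# listing are the only two positions where the messages differ.
MSG1_HEX = (
    "4dc968ff0ee35c209572d4777b721587"
    "d36fa7b21bdc56b74a3dc0783e7b9518"
    "afbfa200a8284bf36e8e4b55b35f4275"
    "93d849676da0d1555d8360fb5f07fea2"
)
MSG2_HEX = (
    "4dc968ff0ee35c209572d4777b721587"
    "d36fa7b21bdc56b74a3dc0783e7b9518"
    "afbfa202a8284bf36e8e4b55b35f4275"
    "93d849676da0d1d55d8360fb5f07fea2"
)

msg1 = bytes.fromhex(MSG1_HEX)
msg2 = bytes.fromhex(MSG2_HEX)

digest1 = hashlib.md5(msg1).hexdigest()
digest2 = hashlib.md5(msg2).hexdigest()

# Number of bits in which the two 64-byte inputs differ.
differing_bits = sum(bin(a ^ b).count("1") for a, b in zip(msg1, msg2))

print("inputs differ: ", msg1 != msg2)
print("digests match: ", digest1 == digest2)
print("differing bits:", differing_bits)
```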

~~~
0x0
Wow, so only two bits flipped? That's pretty bad even if you're using md5 just
as a simple checksum protection against transmission errors.

~~~
rphlx
Not really. This collision required specific intent, and a great deal of
compute power, to find.

md5 is still adequate for accidental/random error detection because the
universe doesn't (usually) spend many hundreds of hours on hundreds of GPUs to
try to corrupt your data without you noticing :)

~~~
0x0
Fair point. I just kind of assumed that any collision would require quite a
lot of flipped bits, because usually just one flipped bit is enough to cause
an "avalanche" of changes in the hash sum.

Would it be possible to find a collision where only one bit is flipped in the
input?
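
The "avalanche" intuition is easy to measure on ordinary input. The sketch below flips each bit of an arbitrary sample message in turn and counts how many of the 128 digest bits change. The helper names and the sample message are my own, and this says nothing about whether a one-bit collision can be constructed; it only shows what happens for a non-adversarial input.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def hash_bit_distance(a: bytes, b: bytes) -> int:
    """Count the bits in which the MD5 digests of a and b differ."""
    da = int.from_bytes(hashlib.md5(a).digest(), "big")
    db = int.from_bytes(hashlib.md5(b).digest(), "big")
    return bin(da ^ db).count("1")


# Arbitrary sample input, chosen only for illustration.
message = b"The quick brown fox jumps over the lazy dog"

# One measurement per single-bit flip of the message.
distances = [
    hash_bit_distance(message, flip_bit(message, i))
    for i in range(len(message) * 8)
]
average = sum(distances) / len(distances)
print(f"average digest bits changed per single-bit flip: {average:.1f} of 128")
```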

------
dorianj
From the paper:

"Based on our complexity analysis and a number of computers available for our
collision search, we estimated that it would take approximately five weeks. As
this was feasible enough, we started the actual search. It was our fortune
that a collision was found a bit earlier, namely after only three weeks."

This on a Core2 Q9550, a four year old processor. I wonder how fast this could
be done using an EC2 temporary compute cluster.

~~~
getsat
Or low end consumer GPUs which can do over half a billion hashes/sec.

~~~
LearnYouALisp
Not _that_ low-end, right? I think the cheapest "good" one was probably over
$100, when I read about "Bitcoin mining" perhaps a month or so ago.

~~~
getsat
$100 is pretty low end for a GPU.

MD5 is many times faster than SHA256, too.

------
maratd
Why is this coming back up again? It is well known that you can create
collisions using MD5. However, you will have to try real hard to do so.

The implications of that are simple.

Do not use MD5 for any type of security or cryptography. On the other hand, if
you're using MD5 for other purposes, you can continue to do so.

I frequently use MD5 to generate unique ids for a ton of stuff. There is
little risk of a collision, since I'm not trying to make things collide. On
the other hand, I would never use it for anything security related.

~~~
Confusion

      However, you will have to try real hard to do so.
    

This is coming back up again, because it's not so hard anymore.

~~~
z92
The question I care about is: is the probability of a naturally occurring
collision in MD5 significantly higher than in any other 128-bit hash algorithm?

The answer is: No.
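
To put a rough number on "naturally occurring", here is a sketch of the standard birthday-bound approximation. It assumes the hash behaves like a uniformly random function on non-adversarial input, which is exactly the premise of the question; the function name is my own.

```python
import math


def birthday_collision_probability(num_items: int, hash_bits: int) -> float:
    """Approximate P(at least one collision) among num_items uniformly
    random values of hash_bits bits each."""
    space = 2.0 ** hash_bits
    # 1 - exp(-n(n-1) / (2N)); expm1 keeps precision when the result is tiny.
    return -math.expm1(-num_items * (num_items - 1) / (2.0 * space))


for n in (10**6, 10**9, 10**12):
    p = birthday_collision_probability(n, 128)
    print(f"{n:>16,d} items, 128-bit hash: p = {p:.3e}")
```

The result depends only on the output size, not on which 128-bit function is used, as long as the uniformity assumption holds.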

~~~
JoachimSchipper
Sure, but if you're willing to accept lots of collisions on "bad" input there
are faster hashes (Bernstein and Jenkins have nice fast non-cryptographic
hashes, for instance.)

Non-cryptographic hash functions may be a bad idea: e.g. don't store URI
parameters (?foo=bar&baz=bar) in such a hash table, or you'll be vulnerable to
rather simple DoS (this was all over the internet a week or two ago.)

~~~
jbapple
> Non-cryptographic hash functions may be a bad idea: e.g. don't store URI
> parameters (?foo=bar&baz=bar) in such a hash table, or you'll be vulnerable
> to rather simple DoS (this was all over the internet a week or two ago.)

I think universal hashing is the usual protection against that kind of attack,
and I think universal hashing is not considered cryptographic:

[http://www.cs.rice.edu/~scrosby/hash/CrosbyWallach_UsenixSec...](http://www.cs.rice.edu/~scrosby/hash/CrosbyWallach_UsenixSec2003/)
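
As a rough illustration of the idea, here is a sketch of a randomly keyed hash for byte strings: polynomial evaluation at a secret random point modulo a prime, followed by the classic `((a*v + b) mod p) mod m` step. The class name, the choice of prime, and the bucket count are my own assumptions, not anything taken from the linked paper.

```python
import secrets

P = (1 << 61) - 1  # the Mersenne prime 2**61 - 1


class UniversalStringHash:
    """Randomly keyed hash for a hash table with a fixed number of buckets."""

    def __init__(self, buckets: int):
        self.buckets = buckets
        # Secret per-table parameters, drawn fresh for every instance, so an
        # attacker cannot precompute keys that land in the same bucket.
        self.x = secrets.randbelow(P - 1) + 1
        self.a = secrets.randbelow(P - 1) + 1
        self.b = secrets.randbelow(P)

    def __call__(self, key: bytes) -> int:
        v = 0
        for byte in key:
            # Horner's rule; byte + 1 keeps every coefficient nonzero, so
            # distinct strings always give distinct polynomials.
            v = (v * self.x + byte + 1) % P
        return ((self.a * v + self.b) % P) % self.buckets


h = UniversalStringHash(buckets=1024)
print(h(b"foo"), h(b"baz"))
```

The point is that the parameters are chosen at random per table and kept secret; the function itself makes no claim to being collision resistant in the cryptographic sense.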

------
jakejake
Does this have security implications? For example allowing you to compute an
alternate, working password if you only know the hash? Are there other things
this article implies?

~~~
teraflop
Well, nobody should be using plain MD5 to hash passwords anyway. However,
preimage attacks (finding an input that produces a specific hash) are still
vastly more difficult than collision attacks (where the hash is not chosen in
advance).

The security flaws introduced by collision attacks tend to be a bit subtler.
For instance, if a digital signature scheme uses MD5 as the underlying hash,
you could generate two different documents with the same hash, convince a
third party to sign one of them, and then transfer the signature to the other
document.
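
A toy sketch of that signature transfer, using the two colliding messages from the paper as the "documents". HMAC-SHA256 with a secret key stands in for a real signature algorithm here, and `toy_sign` / `toy_verify` are hypothetical helpers of my own, so treat this as an illustration of hash-then-sign rather than any real scheme.

```python
import hashlib
import hmac
import secrets

# The two colliding 64-byte messages from the paper.
doc_a = bytes.fromhex(
    "4dc968ff0ee35c209572d4777b721587"
    "d36fa7b21bdc56b74a3dc0783e7b9518"
    "afbfa200a8284bf36e8e4b55b35f4275"
    "93d849676da0d1555d8360fb5f07fea2"
)
doc_b = bytes.fromhex(
    "4dc968ff0ee35c209572d4777b721587"
    "d36fa7b21bdc56b74a3dc0783e7b9518"
    "afbfa202a8284bf36e8e4b55b35f4275"
    "93d849676da0d1d55d8360fb5f07fea2"
)

SIGNING_KEY = secrets.token_bytes(32)  # stand-in for the signer's private key


def toy_sign(document: bytes) -> bytes:
    """Hash-then-sign: only the MD5 digest of the document gets signed."""
    digest = hashlib.md5(document).digest()
    return hmac.new(SIGNING_KEY, digest, hashlib.sha256).digest()


def toy_verify(document: bytes, signature: bytes) -> bool:
    """Recompute the signature over the document's digest and compare."""
    return hmac.compare_digest(toy_sign(document), signature)


signature = toy_sign(doc_a)  # the third party signs document A
print("valid for A:", toy_verify(doc_a, signature))
print("valid for B:", toy_verify(doc_b, signature))  # B was never signed
```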

~~~
andrewcooke
can you clarify what you mean by "plain md5"? if someone is using md5 with
crypt(3) is that ok?

i ask because of this - <https://bugzilla.novell.com/show_bug.cgi?id=743715>

~~~
flixic
I think teraflop had in mind salting.

~~~
dodedo
A salt only protects against pre-computed dictionary attacks (rainbow tables).
It does not offer any additional protection in this scenario.

------
justindocanto
Forgive me if this is a poor question. I need to study up on my cryptography a
little more...

Would salting both of these messages lead to md5 hashes that no longer match?

~~~
marshray
Yes, if they were given different salts. But the attack model for password
authentication is very different (e.g. there's usually only one salt in play
and the attacker doesn't get to choose salt or password he's trying to crack).
So the collision attacks on MD5 don't seem obviously relevant.
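
A quick sketch of what happens to this particular pair when extra bytes are added. The salt value is arbitrary and chosen only for illustration. One detail worth hedging: because MD5 processes input in 64-byte blocks and both messages are exactly one block, I would expect a shared suffix to preserve the collision, while any prefix should break it.

```python
import hashlib

msg1 = bytes.fromhex(
    "4dc968ff0ee35c209572d4777b721587"
    "d36fa7b21bdc56b74a3dc0783e7b9518"
    "afbfa200a8284bf36e8e4b55b35f4275"
    "93d849676da0d1555d8360fb5f07fea2"
)
msg2 = bytes.fromhex(
    "4dc968ff0ee35c209572d4777b721587"
    "d36fa7b21bdc56b74a3dc0783e7b9518"
    "afbfa202a8284bf36e8e4b55b35f4275"
    "93d849676da0d1d55d8360fb5f07fea2"
)


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


salt = b"example-salt"  # arbitrary illustrative value

# Unsalted: the published collision.
plain_match = md5_hex(msg1) == md5_hex(msg2)

# Salt in front: MD5's internal state is already different from the standard
# starting state when the colliding block is processed.
prefixed_match = md5_hex(salt + msg1) == md5_hex(salt + msg2)

# Salt behind: both inputs reach the same internal state after the first
# block, and everything processed afterwards is identical.
suffixed_match = md5_hex(msg1 + salt) == md5_hex(msg2 + salt)

print(plain_match, prefixed_match, suffixed_match)
```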

Even with salting MD5 is still far too efficient to compute to be strong for
password hashing. It could be combined in an iteration framework which made it
secure, but there are plenty of other hash functions (with better reputations)
that would be a better choice.
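
For a concrete picture of an "iteration framework", here is a sketch using PBKDF2 from Python's standard library. Note that it uses HMAC-SHA256 rather than MD5, in line with the point about better choices, and the iteration count is an illustrative assumption rather than a recommendation.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

ITERATIONS = 100_000  # illustrative work factor; tune for your hardware


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a slow, salted hash; returns (salt, derived_key)."""
    if salt is None:
        salt = secrets.token_bytes(16)  # fresh random salt per password
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, derived


def check_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive with the stored salt and compare in constant time."""
    _, derived = hash_password(password, salt)
    return hmac.compare_digest(derived, expected)


salt, stored = hash_password("correct horse battery staple")
print(check_password("correct horse battery staple", salt, stored))
print(check_password("wrong guess", salt, stored))
```

The iteration count is what makes each guess expensive; the salt only ensures that identical passwords do not produce identical stored values.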

------
munchor
This is quite interesting. Common hashes are pretty cool. What about SHA-1:
can there be a collision?

~~~
btilly
There is a well-known principle called the pigeonhole principle: you cannot
put more pigeons into holes than you have holes without putting at least two
in one.

The larger the number of bits you have, the larger the number of possible
messages. Therefore if the size of the message exceeds the size of the hash,
there are more possible messages than there are possible hashes they can be
sent to. Therefore by the pigeonhole principle, some hash value represents
more than one message, so there must be collisions.

That part is simple. The hard part is finding them. The simplest way is to
just try lots of random messages, until 2 give the same hash value. This is
called brute force. But there are enough possible hash values that this is not
feasible.
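
To see why brute force works in principle but not in practice, here is a sketch that shrinks the problem: it looks for two inputs whose SHA-1 digests agree on only the first 3 bytes (24 bits). The function name is my own, and it uses a simple counter instead of random messages so that runs are repeatable.

```python
import hashlib


def find_truncated_collision(prefix_bytes: int = 3):
    """Brute-force two distinct inputs whose SHA-1 digests share their
    first prefix_bytes bytes. Returns (first, second, attempts)."""
    seen = {}  # truncated digest -> message that produced it
    counter = 0
    while True:
        message = str(counter).encode("ascii")
        prefix = hashlib.sha1(message).digest()[:prefix_bytes]
        if prefix in seen:
            return seen[prefix], message, counter + 1
        seen[prefix] = message
        counter += 1


first, second, attempts = find_truncated_collision(3)
print(first, second, attempts)
print(hashlib.sha1(first).hexdigest())
print(hashlib.sha1(second).hexdigest())
```

With 24 bits, a collision typically shows up after a few thousand attempts. The expected work grows roughly as 2^(n/2) for an n-bit hash, so the full 160 bits of SHA-1 would need on the order of 2^80 attempts by this method.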

To date we have not found an SHA-1 collision. However we know of algorithms
that should be able to find one faster than brute force. But we have not
actually found one yet.

However attacks only get better, and computers only get faster. It is widely
accepted that an actual hash collision in SHA-1 is just a matter of time now.

~~~
dedward
it's not just widely accepted... hashing algorithms, due to the pigeonhole
principle you just explained, are by definition full of collisions.

the only security-relevant part is how hard it is to find them for use in
various scenarios...

(this is why I really don't get why ZFS ever did deduplication relying on hash
only... sure, verify is an option, but it would be insane to use hash
only... unless I am overlooking something statistical that makes it make
sense (maybe the odds of a collision are far less than the odds of total
hardware failure? still...))

~~~
caf
Your last sentence hits the nail on the head. Even when storing petabytes of
data, the odds on a freak hash collision are still many orders of magnitude
longer than the odds on a hardware failure.

~~~
saurik
There's a statement that has been "pretty much permanently" on a whiteboard-
covered wall of the computer lab at my college telling a joke about "the
difference between a mathematician and an engineer", that goes through the
math behind a specific type of prime number generator, calculates the
likelihood that it might fail, and then claims the mathematician cares about
that while the engineer knows that is orders of magnitude less likely than a
guaranteed algorithm failing due to a cosmic ray hitting it in RAM and
flipping one of its bits. ;P

