
How I created two images with the same MD5 hash - sebg
http://natmchugh.blogspot.com/2014/10/how-i-created-two-images-with-same-md5.html
======
est
tl;dr: collision by appending data at the end of a .jpeg file.

> The chosen prefix collision attack works by repeatedly adding 'near
> collision' blocks which gradually work to eliminate the differences in the
> internal MD5 state until they are the same

> This type of collision has been termed a chosen prefix collision. In this
> case the image data is the prefix or to be more exact the internal state of
> the MD5 algorithm after processing the image is. You can't see the added
> binary data at the end of jpeg images as it is preceded with an End Of Image
> JPEG marker.
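
To make the "data after the End Of Image marker" point concrete, here is a minimal sketch of how one might look for trailing bytes. It naively takes the first `FF D9` byte pair as the end of the image, which is an assumption: files with embedded thumbnails contain their own EOI markers, so a real tool would walk the JPEG segments instead. The helper name `split_at_eoi` is made up for illustration.

```python
EOI = b"\xff\xd9"  # JPEG End Of Image marker


def split_at_eoi(data: bytes):
    """Split raw JPEG bytes into (image, trailing).

    Naive assumption: the first EOI marker ends the image. Anything after
    it is ignored by typical decoders but still affects the file's hash.
    """
    idx = data.find(EOI)
    if idx == -1:
        raise ValueError("no EOI marker found")
    end = idx + len(EOI)
    return data[:end], data[end:]
```

A non-empty `trailing` part is not proof of tampering (some tools append metadata there), but it is where collision blocks like the ones described would sit.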

------
barrkel
Of course, algorithms like md5 are still useful when not used as cryptographic
functions. For example, they can be used for duplicate detection across a
large data set. Unless the data set is under an attacker's control, and they
have something to gain by creating a collision, there's little risk for this
use.
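
A rough sketch of that duplicate-detection use, assuming the files are trusted (nobody is crafting collisions against you). The function names and the choice to group by size plus digest are illustrative, not taken from the article.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a file in chunks so large files are not loaded into memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root, algorithm="md5"):
    """Return lists of paths under root that share the same size and digest."""
    groups = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            key = (path.stat().st_size, file_digest(path, algorithm))
            groups[key].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Passing `algorithm="blake2b"` swaps in a different hash without touching the rest, if the data might not be trustworthy after all.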

~~~
gliptic
You can also use e.g. BLAKE2 which is at least as fast as MD5, while still
being cryptographically strong. Win-win.

~~~
smt88
For hashing passwords, you want something slow, not fast.

~~~
gliptic
Who said anything about hashing passwords? But this is a misconception. You
don't want a primitive that is slow. You want one that gives you a lot of
security margin in few cycles, then you make it slow by, for instance,
iterating it many times. There are many password hashing schemes based on
BLAKE2, 6 different ones submitted to PHC
([https://password-hashing.net/](https://password-hashing.net/)).
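
The "fast primitive, made slow by iterating" idea can be sketched with PBKDF2 from Python's standard library. Note the substitutions: this uses HMAC-SHA-256 rather than BLAKE2, simply because that is what `hashlib.pbkdf2_hmac` readily offers, and the iteration count is an arbitrary illustrative value, not a recommendation.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # assumption: illustrative work factor, tune for your hardware


def hash_password(password: str, salt: Optional[bytes] = None,
                  iterations: int = ITERATIONS) -> Tuple[bytes, bytes]:
    """Derive a slow, salted hash; returns (salt, digest)."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 salt, iterations)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int = ITERATIONS) -> bool:
    """Recompute the digest and compare in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)
```

The underlying hash stays fast; the cost to an attacker comes from the iteration count multiplying every guess.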

~~~
smt88
It's sadly common for inexperienced devs to use MD5 and call it a day (if they
hash at all).

I just wanted to point out that, for situations where user input and security
are important, you want the algorithm to be slow.

I didn't say anything about how to implement it or whether you should use
BLAKE2 or what. There's a lot more to it than I could put in a reply here, and
even quick Googling would turn up info about salting/iterating/etc.

------
hrjet
Do I understand this right: if the length of the original file is specified
along with the hash, then such attacks are not trivial?

~~~
dnet
Yes, see
[http://blogs.msdn.com/b/oldnewthing/archive/2004/05/19/13493...](http://blogs.msdn.com/b/oldnewthing/archive/2004/05/19/134937.aspx)

------
0x0
When you have <pre> formatted paragraphs that are wider than the viewport,
it's quite annoying that you also intercept swipe-left to open a different
article instead of letting the user scroll.

MobileSafari, iOS8.

~~~
rev087
Valid criticism, but Blogspot is to blame here.

~~~
tedunangst
Do authors not have control over their blogspot theme? I thought they used to.

Or if not, maybe more people need to be told not to use blogspot. Second
article I've read in a week that was broken by this "improvement".

------
rmhsilva
This reminds me of a very interesting challenge on smashthestack.org (IO #11 I
believe), involving creating two brainfuck programs with the same MD5 hash,
which produced quite different outputs. Excellent learning process. Having
this article probably would have cut a week off the time it took to
complete...

~~~
morpher
That was a fun one. I seem to recall it taking a day or so to generate a
suitably colliding prefix pair using publicly available code. After that the
rest was pretty easy.

------
natmchugh
If anyone wants to re-run this, this is the version of the shell script I ended
up with:
[https://gist.github.com/natmchugh/ab3e30a45fd724888ad8](https://gist.github.com/natmchugh/ab3e30a45fd724888ad8)

------
crystaln
I'm curious what sort of vulnerabilities this could expose.

Conflicting git check-ins, breaking cache layers, tampering with downloads,
etc.

~~~
dnet
1\. git uses SHA-1, not MD5; former has problems, but is still better than
latter

2\. git stores the length of a blob before hashing [1] which makes it harder
(but not impossible) to perform such attacks [2]

[1] [http://www.git-scm.com/book/en/v2/Git-Internals-Git-Objects#...](http://www.git-scm.com/book/en/v2/Git-Internals-Git-Objects#Object-Storage)

[2] [http://blogs.msdn.com/b/oldnewthing/archive/2004/05/19/13493...](http://blogs.msdn.com/b/oldnewthing/archive/2004/05/19/134937.aspx)
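
For point 2, a small sketch of how git computes a blob's ID: it hashes a header containing the object type and the content length, followed by a NUL byte and the content itself. The function name is made up; the header layout is the one described in the Git Internals chapter linked as [1].

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """Compute the object ID git assigns to a blob with this content."""
    # Header is "blob <length in bytes>\0", so the length is part of the hash input.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()
```

This should match `git hash-object` on the same bytes, so two inputs of different lengths start from different hash inputs before any content is processed.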

------
jpswade
I've seen quite a few examples of binary collisions, but never plain text
collisions.

I'm aware that they are possible, but they appear to be unlikely, especially
when using md5 correctly, even then most people have already migrated to sha1
or sha256...

------
mcmillion
Few things are as aggravating as a website that commandeers the navigate back
gesture.

------
jmnicolas
I think at this point everybody knows (or should know) MD5 is only useful to
compare 2 files of the same size to find duplicate files. Any other use is
asking for trouble.

~~~
AgentME
There are known MD5 collisions of the same length.

------
vcarl
When is it worth it to be concerned about this type of attack?

Do both files need to be altered to find a collision, or just one? Can it be
done in a fixed file size?

------
srcmap
Do sha1, sha2 have similar weakness?

~~~
ygra
SHA-1 can be considered near-broken at this point¹, as far as I remember. No
actual successful attack like with MD5, but close enough to be theoretically
possible in the foreseeable future.

There was fear that the attacks could be extended to SHA-2, thus we now have
SHA-3 too. However, SHA-2 remains secure for now.

_____

¹ Wikipedia: »As of 2012, the most efficient attack against SHA-1 is
considered to be the one by Marc Stevens[34] with an estimated cost of $2.77M
to break a single hash value by renting CPU power from cloud servers.« I.e.
it's quite expensive, but can be done in a reasonable time, especially by
adversaries with interest and funds to do so.

~~~
tptacek
There is no indication that SHA-2 is threatened in any practical way.

SHA-1 and SHA-2 are similar at an architectural level, in some of the same
ways that two mid-1990s Feistel ciphers might be similar, and share building
blocks, but they aren't the same hash function. They are much more different
than, say, DES and 3DES.

SHA-2 remains the best practical choice for most systems today. The truncated
variants (like SHA2-512/256) even break length extension exploits.

------
callesgg

> "To search through all possible MD5 values is 2^128 operations which is
> massive. To be in with a good chance of finding a collision would take ~ 2^64
> operations which is again far too big for normal computing."

Seems weird. Why is the cost of finding a collision so low compared to a full
search? Or does the author not understand exponentiation?

~~~
mandalar12
He doesn't reference it, but he understands exponentiation. The 2^64 figure
comes from the birthday problem [1], which applies when you control both
files [2].

[1]
[http://en.wikipedia.org/wiki/Birthday_problem](http://en.wikipedia.org/wiki/Birthday_problem)

[2]
[http://en.wikipedia.org/wiki/Collision_attack#Classical_coll...](http://en.wikipedia.org/wiki/Collision_attack#Classical_collision_attack)
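
A quick way to convince yourself of the square-root effect: the sketch below uses the standard birthday approximation p ≈ 1 - exp(-k² / 2N) for k samples over N = 2^bits values, and then brute-forces a collision on MD5 truncated to a few bits. The truncation and both function names are illustrative assumptions; this is a toy, not an attack on full MD5.

```python
import hashlib
import math


def birthday_probability(samples: int, bits: int) -> float:
    """Approximate chance that `samples` random bits-wide values collide."""
    return -math.expm1(-(samples * samples) / 2 ** (bits + 1))


def find_truncated_collision(bits: int = 24):
    """Find two distinct messages whose MD5 agrees on the top `bits` bits.

    Returns (first_message, second_message, number_of_hashes_computed).
    """
    seen = {}
    counter = 0
    while True:
        msg = str(counter).encode("ascii")
        tag = int.from_bytes(hashlib.md5(msg).digest(), "big") >> (128 - bits)
        if tag in seen:
            return seen[tag], msg, counter + 1
        seen[tag] = msg
        counter += 1


# About a 39% chance of a collision after 2^64 tries against a 128-bit hash.
print(round(birthday_probability(2 ** 64, 128), 3))
```

With `bits=24` the search typically ends after a few thousand hashes, on the order of 2^12, rather than anything close to 2^24.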

