
Poisonous MD5 – Wolves Among the Sheep - 4mnt
http://blog.silentsignal.eu/2015/06/10/poisonous-md5-wolves-among-the-sheep/
======
WalterGR
The relevance of the article's mention of the "Flame" malware was puzzling,
since no context is provided and the linked Wired article doesn't shed any
light.

Wikipedia has this to say, which seems to solve that puzzle:

"Flame was signed with a fraudulent certificate purportedly from the Microsoft
Enforced Licensing Intermediate PCA certificate authority. The malware authors
identified a Microsoft Terminal Server Licensing Service certificate that
inadvertently was enabled for code signing and that still used the weak MD5
hashing algorithm, then produced a counterfeit copy of the certificate that
they used to sign some components of the malware to make them appear to have
originated from Microsoft. A successful collision attack against a certificate
was previously demonstrated in 2008, but Flame implemented a new variation of
the chosen-prefix collision attack."

[http://en.m.wikipedia.org/wiki/Flame_%28malware%29](http://en.m.wikipedia.org/wiki/Flame_%28malware%29)

~~~
WalterGR
Whoops - I didn't remember that Wikipedia uses a separate domain for mobile
browsers.

Here's the 'real' link:
[http://en.wikipedia.org/wiki/Flame_%28malware%29](http://en.wikipedia.org/wiki/Flame_%28malware%29)

------
aylons
I'm no security expert, but I have a question.

In some systems I've built in the past, I've used MD5 as a hashing mechanism to
verify firmware integrity after flashing it into memory. I don't use MD5 for
anything security-related (that is handled in other ways, depending on the
system), just to check transmission and memory integrity.

Is MD5 still considered fine for this, or is there a real risk that random or
systematic (but unintentional) noise could produce a collision between the
corrupted and original data? I do believe it should suffice, but hearing all
the badmouthing makes me wonder...

~~~
andrew-lucker
MD5 is good enough to prevent most _random_ collisions. The problem is when
you need to prevent _intentional_ collisions.

~~~
innocenat
As a note, even CRC32 is enough to catch most random corruption.
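
A quick sketch (Python, with a random blob standing in for the firmware image)
showing that both checksums flag a random bit flip; for arbitrary random
corruption the chance of an undetected change is roughly 2^-32 for CRC32 and
2^-128 for MD5:

    import hashlib
    import os
    import random
    import zlib

    # Stand-in for the firmware image being flashed.
    firmware = os.urandom(64 * 1024)

    orig_crc = zlib.crc32(firmware)
    orig_md5 = hashlib.md5(firmware).hexdigest()

    # Simulate random, unintentional corruption: flip a single bit.
    corrupted = bytearray(firmware)
    bit = random.randrange(len(corrupted) * 8)
    corrupted[bit // 8] ^= 1 << (bit % 8)
    corrupted = bytes(corrupted)

    print("CRC32 detects the corruption:", zlib.crc32(corrupted) != orig_crc)
    print("MD5 detects the corruption:  ",
          hashlib.md5(corrupted).hexdigest() != orig_md5)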

~~~
aylons
I actually thought of using CRC32 when developing my first system, but I
preferred MD5 because it would be easier and clearer for people to check file
integrity on a *NIX command line.

------
dperfect
Aren't all hashing algorithms vulnerable to the possibility of collisions
(albeit with different degrees of difficulty)? It sounds like the problem here
is more related to the logic that relies on a hash alone to make important
decisions.

Not saying that MD5 is a good choice in this case, just that we may be blaming
the wrong thing.

~~~
mikeash
Your parenthetical is the key, though. There's a big difference between a hash
algorithm where generating a collision requires a few minutes of work on a
cheap computer (MD5, now) and a hash algorithm where generating a collision
requires a computer the size of the universe operating for a trillion trillion
years (any good cryptographically secure hash).
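
Rough numbers, as a back-of-the-envelope sketch (the 10^9 hashes/second rate
is just an assumption for a single machine): a generic birthday attack costs
about 2^(n/2) evaluations for an n-bit digest, so even brute-forcing MD5's
128-bit output would take centuries on one box - the "few minutes" comes from
cryptanalytic shortcuts, not brute force - while a 256-bit digest's generic
bound is astronomically out of reach:

    # Generic birthday-attack cost for an n-bit hash, assuming
    # (hypothetically) 10^9 hash evaluations per second on one machine.
    RATE = 1e9
    SECONDS_PER_YEAR = 3600 * 24 * 365

    def birthday_years(bits):
        """Years to find a collision by brute force: ~2^(bits/2) evaluations."""
        return 2 ** (bits / 2) / RATE / SECONDS_PER_YEAR

    print(f"128-bit digest (MD5):     {birthday_years(128):.1e} years")  # ~5.8e+02
    print(f"256-bit digest (SHA-256): {birthday_years(256):.1e} years")  # ~1.1e+22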

~~~
dperfect
Cool - didn't realize the difference was so great. I've always known that the
good algorithms are better because they're more difficult to brute-force, but
always wondered if it's just a matter of a few years before the "impossible"
becomes possible. Your illustration helps clarify that improbability in my
mind - thanks!

~~~
__z
>always wondered if it's just a matter of a few years before the "impossible"
becomes possible.

This is from 1998, but here are the relevant parts -
[https://www.schneier.com/essays/archives/1998/05/the_crypto_...](https://www.schneier.com/essays/archives/1998/05/the_crypto_bomb_is_t.html)

>Cryptographic algorithms have a way of degrading over time. It's a situation
that most techies aren't used to: Compression algorithms don't compress less
as the years go by, and sorting algorithms don't sort slower. But encryption
algorithms get easier to break; something that sufficed three years ago might
not today.

>Cryptographic algorithms are all vulnerable to brute force--trying every
possible encryption key, systematically searching for hash-function
collisions, factoring the large composite number, and so forth--and brute
force gets easier with time. A 56-bit key was long enough in the mid-1970s;
today that can be pitifully small. In 1977, Martin Gardner wrote that
129-digit numbers would never be factored; in 1994, one was.

>Aside from brute force, cryptographic algorithms can be attacked with more
subtle (and more powerful) techniques. In the early 1990s, the academic
community discovered differential and linear cryptanalysis, and many symmetric
encryption algorithms were broken. Similarly, the factoring community
discovered the number-field sieve, which affected the security of public-key
cryptosystems.

DES was used in the '70s; now it can be brute-forced in a few days (with the
right hardware).

~~~
dperfect
I suppose one could say this is an argument against using the "computer the
size of the universe operating for a trillion trillion years"-type
illustrations. Statements like that reflect the current theoretical strength
of an algorithm, but they can lead us to (wrongly) assume that no flaws will
be discovered in the algorithm over that period of time - an assumption that's
very much unwarranted and that undermines the practical value of those
statements.

~~~
__z
No arguments from me that cryptography is often explained confusingly, which
leads to misunderstandings.

------
cm2187
I don't get why it is a security problem that someone can manufacture false
positives for an anti-virus. What is the benefit for a virus of having non-
malicious code caught by the anti-virus?

False negatives would be more of an issue if the anti-virus has whitelists
and one can manufacture a piece of malware with Microsoft Excel's MD5
signature. But that's not what the article refers to.

MD5 is only broken if you want to use it as a non-reversible hashing algorithm
or as an unforgeable signature. But it's perfectly fine for many other uses.

~~~
bariumbitmap
From the article:

    
    
      As you can see, binaries submitted for analysis are
      identified by their MD5 sums and no sandboxed execution is
      recorded if there is a duplicate (thus the shorter time
      delay). This means that if I can create two files with the
      same MD5 sum – one that behaves in a malicious way while the
      other doesn’t – I can “poison” the database of the product
      so that it won’t even try to analyze the malicious sample!
    

So it's a technique to get the scanner to ignore a malicious binary by
constructing a non-malicious one with the same MD5 sum. This would be much
harder if the scanner used a SHA-1 hash or similar.
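
A minimal sketch of the flawed logic (hypothetical names, not the vendor's
actual code) - if the sandbox keys its verdict cache on MD5 alone, whoever
submits the benign twin first decides the verdict the malicious twin inherits:

    import hashlib

    class VerdictCache:
        """Hypothetical result cache keyed on a file digest alone."""

        def __init__(self, digest=hashlib.md5):
            self.digest = digest
            self.verdicts = {}               # hex digest -> "clean"/"malicious"

        def analyze(self, sample, run_sandbox):
            key = self.digest(sample).hexdigest()
            if key in self.verdicts:         # duplicate: sandbox run is skipped
                return self.verdicts[key]
            verdict = run_sandbox(sample)
            self.verdicts[key] = verdict
            return verdict

    # Poisoning scenario, assuming benign_twin and malicious_twin share an
    # MD5 sum (producible with a collision tool such as fastcoll):
    #
    #   cache = VerdictCache()                       # MD5-keyed: poisonable
    #   cache.analyze(benign_twin, run_sandbox)      # -> "clean", cached
    #   cache.analyze(malicious_twin, run_sandbox)   # -> "clean", never executed
    #
    # Keying the cache on a collision-resistant digest (e.g. hashlib.sha256)
    # closes this particular hole.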

~~~
cm2187
But that's a whitelist. I thought anti-virus software works by blacklisting.

~~~
Too
virustotal.com allows you to upload files to be scanned with a whole range of
anti-virus programs. Before uploading, it calculates the hash of your file
client-side to see whether the file needs to be uploaded at all, or whether a
previously uploaded file (by someone else) with the same hash should simply be
re-scanned with newer versions of the anti-virus engines.

I don't know which hashing algorithm they use, but it's an example of a
situation where a whitelist is not involved.
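
A hedged sketch of that "hash first, upload only if unknown" flow (the
endpoint URLs and response handling here are made up, not VirusTotal's real
API):

    import hashlib
    import requests  # third-party: pip install requests

    LOOKUP_URL = "https://scanner.example.com/reports/{digest}"   # hypothetical
    UPLOAD_URL = "https://scanner.example.com/files"              # hypothetical

    def submit(path):
        data = open(path, "rb").read()
        digest = hashlib.sha256(data).hexdigest()
        # Ask the server whether a file with this digest was already analyzed.
        resp = requests.get(LOOKUP_URL.format(digest=digest))
        if resp.status_code == 200:
            return resp.json()        # reuse the existing report
        # Otherwise upload the file for a fresh scan.
        return requests.post(UPLOAD_URL, files={"file": data}).json()

If such a lookup were keyed on MD5, the same poisoning trick would apply: a
benign file submitted first could shadow a colliding malicious one.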

~~~
bariumbitmap
Yes, I think that's what the author was alluding to here, although I'm not
sure:

    
    
      The approach may work with traditional AV software too as
      many of these also use fingerprinting (not necessarily MD5)
      to avoid wasting resources on scanning the same files over
      and over (although the RC4 encryption results in VT 0/57
      anyway…).

------
makomk
This is actually one of the older and easier attacks against MD5; we've known
this was possible for over a decade. Nowadays it's actually possible to do
chosen-prefix attacks - you can literally take two arbitrary, unrelated files
and append some data that makes them have the same MD5. So you don't even have
to include the malicious code in the decoy file in any form anymore.
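
For the older identical-prefix variant, collision pairs are easy to verify and
to extend. A small sketch (the file paths are placeholders for a pair produced
by a tool like fastcoll): because MD5 is a Merkle-Damgard construction, once
the internal states collide after the block-aligned colliding prefixes,
appending the same suffix to both keeps the MD5s equal - which is how a single
collision pair gets grown into two complete files that behave differently:

    import hashlib
    import sys

    def check(path_a, path_b, suffix=b"any shared payload appended to both"):
        a = open(path_a, "rb").read()
        b = open(path_b, "rb").read()
        assert a != b, "expected two different files"
        same = hashlib.md5(a).hexdigest() == hashlib.md5(b).hexdigest()
        still_same = (hashlib.md5(a + suffix).hexdigest()
                      == hashlib.md5(b + suffix).hexdigest())
        print("prefixes collide:", same)
        print("still collide with a common suffix:", still_same)

    if __name__ == "__main__":
        check(sys.argv[1], sys.argv[2])   # the two colliding files from the tool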

