
Reports of SHA-1's demise are considerably exaggerated - npongratz
http://www.metzdowd.com/pipermail/cryptography/2017-February/031604.html
======
simias
It's not a particularly interesting email, I don't think it's bringing
anything new to the table. The title also doesn't have anything to do with the
contents (I understand that might be because of HN submission rules but it's
very misleading in this case).

It's a bit hard to take the author seriously when he complains about "headless
chickens" "considerably exaggerat[ing]" then goes on to say that you need "a
nation-state's worth of resources" to find collisions. If anything this
shattered proof of concept showed that it was actually a lot easier than that,
giving an estimate of around $110k IIRC.

I'm also sure that SHA-1 remains pervasive in many codebases, although as long
as pre-image attacks are impractical it might be hard to exploit those
vulnerabilities.

~~~
patcheudor
>I'm also sure that SHA-1 remains pervasive in many codebases

I've done a lot of security reviews of backup solutions over the years and
nearly universally they use SHA1 for de-duplication. Security professionals
have been providing warnings about this practice, but when you are dealing
with a cloud backup solution as an example, speed and low compute requirements
are a must and it's hard to win the argument when someone tosses "theoretical"
onto the table. Well, we are now past "theoretical" and that's what matters.

I just tried out the SHA1 collider: [https://alf.nu/SHA1](https://alf.nu/SHA1)
and created two PDFs containing entirely different bits of text in JPGs. It
took seconds and indeed the two output files had identical SHA1 hashes. This is
no longer nation-state level stuff and computing colliding documents to upload
to cloud backup providers who de-dupe across the file system and not just
restricted within the boundaries of each account could result in the ability
to wipe out and change other people's data. At a minimum that could be an
annoyance. At worst, it could result in the intentional change of someone's
content to cause significant financial or physical harm.
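
A minimal sketch of the cross-account de-dup risk described above, not any real backup product. `DedupStore`, `weak_digest` and `find_colliding_pair` are hypothetical names, and the one-byte truncated digest is only a stand-in for "a hash the attacker can collide":

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: one blob per digest, shared by all accounts."""

    def __init__(self, digest_fn):
        self.digest_fn = digest_fn
        self.blobs = {}  # digest -> stored bytes
        self.files = {}  # (account, name) -> digest

    def upload(self, account, name, data):
        key = self.digest_fn(data)
        # De-dup: if the digest is already known, the new bytes are dropped.
        self.blobs.setdefault(key, data)
        self.files[(account, name)] = key

    def download(self, account, name):
        return self.blobs[self.files[(account, name)]]


def weak_digest(data):
    # Assumption for the demo: keep one byte of SHA-1 so collisions are trivial.
    return hashlib.sha1(data).digest()[:1]


def find_colliding_pair(digest_fn):
    """Brute-force two different inputs with the same digest."""
    seen = {}
    i = 0
    while True:
        data = b"document %d" % i
        key = digest_fn(data)
        if key in seen:
            return seen[key], data
        seen[key] = data
        i += 1


store = DedupStore(weak_digest)
evil, good = find_colliding_pair(weak_digest)
store.upload("attacker", "a.pdf", evil)   # attacker gets there first
store.upload("victim", "v.pdf", good)     # victim's bytes are silently dropped
restored = store.download("victim", "v.pdf")
```

In this toy, `restored` is the attacker's data, because the store trusts the digest instead of comparing bytes.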

~~~
f2f
the two PDFs used the same (colliding) prefix and included both files you
supplied. once you've reached a hash collision, any number of identical blocks
after that would hash the same.

if you want a proof, submit a different image: the hashes will be identical,
but different from your first attempt.
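
A rough illustration of the "identical blocks after the collision hash the same" point, using a toy hash rather than SHA-1. The 16-bit state, the 8-byte block size and all names here are assumptions picked so a collision can be brute-forced in well under a second:

```python
import hashlib
import struct

BLOCK = 8        # toy block size in bytes
STATE_BYTES = 2  # 16-bit chaining state, tiny on purpose so collisions are cheap
IV = b"\x00" * STATE_BYTES


def compress(state, block):
    """Toy compression function: truncated SHA-256 of state || block."""
    return hashlib.sha256(state + block).digest()[:STATE_BYTES]


def toy_hash(msg):
    """Iterated (Merkle-Damgard style) hash with length padding."""
    padded = msg + b"\x80"
    padded += b"\x00" * (-len(padded) % BLOCK)
    padded += struct.pack(">Q", len(msg))
    state = IV
    for i in range(0, len(padded), BLOCK):
        state = compress(state, padded[i:i + BLOCK])
    return state


def find_colliding_blocks():
    """Brute-force two distinct blocks that leave the same chaining state."""
    seen = {}
    counter = 0
    while True:
        block = struct.pack(">Q", counter)
        state = compress(IV, block)
        if state in seen:
            return seen[state], block
        seen[state] = block
        counter += 1


block_a, block_b = find_colliding_blocks()
for suffix in (b"", b"first image", b"a completely different image"):
    # Same suffix appended to both colliding prefixes: digests stay equal.
    print(toy_hash(block_a + suffix).hex(), toy_hash(block_b + suffix).hex())
```

Each printed pair matches, while different suffixes generally give different digests, which is the behaviour described for the generated PDFs.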

~~~
jeffdavis
Can you explain the difference between a collision attack and a preimage
attack? How can you get two PDFs with arbitrary contents having the same SHA1
without a preimage attack?

~~~
jobigoud
The pdfs don't have arbitrary content, they contain very specific headers
(crafted by Google research), in one file the header selects one image, in the
other it selects the other. These two headers are SHA1 colliding and the
remainder of the files are identical.

~~~
jeffdavis
Oh, that makes sense. Thank you!

------
dz0ny
Well SVN is affected if you commit crafted pdf :)

[http://i.imgur.com/iJZe21Z.png](http://i.imgur.com/iJZe21Z.png) Rel:
[https://news.ycombinator.com/item?id=13725093](https://news.ycombinator.com/item?id=13725093)

~~~
tyingq
I think that's actually a great example of how it may not be considerably
exaggerated.

SVN is probably not the only piece of software where you can create a mess
solely with the already released collision. It's more like a DOS, and less
like actually injecting a malicious payload, but potentially still
destructive.

Edit: Perhaps I'm missing the context of "considerably exaggerated"? Are there
some examples of people saying the sky is falling?

~~~
jacquesm
One thing about git versus svn is that svn tends to be used more behind walls
than outside of them and git is used both inside _and_ outside.

------
joshuak
Peter suggests that everyone using SHA-1 should move to SHA-256. That's a
reasonable suggestion, but I say as long as you're making hashing changes why
not move to SHA-512.

Remember, it's also in the FIPS SHA-2 standard and faster on 64-bit CPUs than
SHA-256. It's only 64 bytes long, surely that's not too much to handle.

Edit: Google also suggests SHA-256, so perhaps Peter was simply seconding the
recommendation. I suggest SHA-512 is the better recommendation.
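
For anyone who wants to check the sizes and the speed claim on their own machine, here is a small sketch using Python's `hashlib`. The helper names and the 1 MiB buffer are arbitrary choices, and the timing result depends heavily on the CPU (hardware SHA extensions can change which one wins):

```python
import hashlib
import time


def digest_sizes(names=("sha1", "sha256", "sha512")):
    """Digest length in bytes for each named algorithm."""
    return {name: hashlib.new(name).digest_size for name in names}


def seconds_to_hash(name, data, rounds=20):
    """Wall-clock seconds to hash `data` `rounds` times (rough measurement)."""
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.new(name, data).digest()
    return time.perf_counter() - start


sizes = digest_sizes()
buffer = b"\x00" * (1 << 20)  # 1 MiB of zeros, an arbitrary test input
timings = {name: seconds_to_hash(name, buffer) for name in sizes}
```

`sizes` comes out as 20, 32 and 64 bytes for SHA-1, SHA-256 and SHA-512, so a SHA-512 hex string is 128 characters.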

~~~
base698
96a2dff2ba1aceed1994fddd9adc276556c789c52d3e5f8a1721703a461d41ab4ddd9563075e8adda6382a8c9d52be78ae07eec77dcfba58978ea87290a9028f

SO BIG. For git, most people don't realize in most cases if the repo isn't
giant you can use just 96a as a shortcut. I imagine people would be turned off
by the sheer size.

~~~
npongratz
If I were eyeballing two hashes to make sure they were equal, I'd worry about
my ability to see near collisions (similar to [0]) the longer the hashes were.
My eyes just start skipping ahead at some point, and would probably miss
different-but-similar-looking characters in the middle.

But I think SHA-512 is resistant to near-collisions (sorry, I don't have a
citation), and besides, one would probably best compare hashes using a script
one understands.

[0]
[http://link.springer.com/content/pdf/10.1007%2F978-3-540-286...](http://link.springer.com/content/pdf/10.1007%2F978-3-540-28628-8_18.pdf)

~~~
jgrowl
Maybe we just need better tooling to help us compare hashes easier. Not
exactly sure what that would look like though.

~~~
throwawayish
Color the character boxes and display in two adjacent lines.
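
A minimal sketch of the "two adjacent lines" idea, with a caret row under the differing positions instead of colors. `diff_marker` and `side_by_side` are made-up helper names:

```python
def diff_marker(a, b):
    """Return a string with '^' under every position where a and b differ."""
    width = max(len(a), len(b))
    return "".join(
        " " if i < len(a) and i < len(b) and a[i] == b[i] else "^"
        for i in range(width)
    )


def side_by_side(a, b):
    """Normalize two hex digests and stack them over a marker row."""
    a, b = a.strip().lower(), b.strip().lower()
    return "\n".join((a, b, diff_marker(a, b)))


print(side_by_side("96a2dff2ba1aceed", "96a2dff2ba1acfed"))
```

A length mismatch is marked too, since every position past the end of the shorter string counts as a difference.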

------
username3
Make your own colliding PDFs: [https://alf.nu/SHA1](https://alf.nu/SHA1)

~~~
rcar
Interestingly, Gmail labeled PDFs generated through that tool as viruses.
Anyone happen to know the mechanism behind that?

~~~
arnarbi
See "Mitigating the risk..." at
[https://security.googleblog.com/2017/02/announcing-first-sha...](https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html)

------
hueving
The sha-1 attack is only a collision attack. It's not a pre-image attack. So
if you clone a git repo and want to sneakily replace an existing object, you
will not be able to do this because one of the inputs has already been fixed.

In order to perform that kind of attack, there would need to be a second
pre-image attack, which does not exist right now.

Even md5 still has second pre-image resistance with a search space only
slightly below the entire output space.[1]

1\. [http://crypto.stackexchange.com/questions/13303/is-md5-secon...](http://crypto.stackexchange.com/questions/13303/is-md5-second-preimage-resistant-when-used-only-on-fixed-length-messages)
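
To make the collision vs second pre-image distinction concrete, here is a toy comparison on a hash truncated to 16 bits. It is only a sketch: the truncation, the message formats and the function names are assumptions, and nothing here attacks real SHA-1 or MD5:

```python
import hashlib
from itertools import count

BITS = 16  # toy output size, so both searches finish quickly


def h(data):
    """SHA-256 truncated to its top BITS bits (stand-in for a weak hash)."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - BITS)


def find_collision():
    """Attacker picks BOTH messages: birthday search, roughly 2**(BITS/2) tries."""
    seen = {}
    for i in count():
        msg = b"c%d" % i
        d = h(msg)
        if d in seen:
            return seen[d], msg, i + 1
        seen[d] = msg


def find_second_preimage(target):
    """One message is FIXED in advance: roughly 2**BITS tries on average."""
    want = h(target)
    for i in count():
        msg = b"p%d" % i
        if msg != target and h(msg) == want:
            return msg, i + 1


m1, m2, collision_tries = find_collision()
fixed = b"object already in the repo"
m3, preimage_tries = find_second_preimage(fixed)
print(collision_tries, preimage_tries)
```

Scaled up to a 160-bit output, the same gap is the difference between about 2^80 and about 2^160 work for a generic attack, before any cryptanalytic shortcut.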

~~~
andrewflnr
So add a new object to the git repo (open source projects _usually_ allow
contributions) for which you already have a malicious SHA1-colliding object in
your pocket. If your change is widely distributed, you now have a hash in the
wild that matches your malicious data.

~~~
hueving
Right, assuming it goes through no revisions for code review and doesn't get
rebased, that could potentially work. You would need to figure out where to
have some random data in the commit that you're altering to search for the
collision. Maybe some "test data" or something and hope nobody asks you to
remove it.

------
wbl
It's all theoretical until someone loses a centrifuge.

------
mnarayan01
As a total neophyte on these kinds of things, the article seems to be talking
about Google's SHA vulnerability as if it's a preimage attack rather than a
collision one. Anyone more knowledgeable care to chime in?

~~~
loup-vaillant
Some protocols are sensitive to mere collisions. Also, this suggests actual
preimage attacks could be found later. Not today, but still. Finally an
accurate explanation is longer and has less punch.

It's safer to "join the headless chicken" route, consider SHA-1 officially
broken, and start thinking of alternatives.

~~~
WorldMaker
IIRC, collision attacks may be considered precursors to preimage attacks. Some
preimage attacks look like "brute force" monte carlo simulations of collision
attacks, running enough collision attacks in parallel until a collision attack
results in the preimage.

------
loup-vaillant
I'd rather join the headless chickens, actually. True, it took a year's worth
of computation with incredibly stock hardware _now_. But that's only going to
get cheaper.

By the way, nation-states won't use GPUs. They'll use _ASICs_. They tend to be
4-6 orders of magnitude more energy efficient than GPUs for this kind of thing
(at least they are for Bitcoin mining). I just hope nobody succeeded in the
business of selling MD5 colliders; it would mean the same could work with
SHA-1.

~~~
eterm
The GPU computations didn't take long for Google, it was the CPU computation
that took a long time with 110 years GPU vs 6500 years CPU time. I didn't read
into the technical detail but given it wasn't done with GPU I'd guess it
wouldn't be easily done with or improved with ASIC either in this particular
case.

~~~
loup-vaillant
Hmm, we'll need to wait for the source code and a detailed analysis of the
exploit, but 6500 years of CPU time suggests a high degree of parallelism right
off the bat.

Depending on the demands of the algorithm, an ASIC could outmatch an x86 farm
—possibly by even more orders of magnitude than they do GPUs.

Possible hurdles for the ASIC are memory hardness (memory costs the same no
matter the architecture), branching, and complex operations such as
multiplications. They could destroy any advantage the ASIC has.

------
throwaway2016a
Can anyone answer why exactly this makes Git vulnerable?

I was under the impression the use of SHA1 was only for hashing and not for a
security signature. And what would be the benefit of intentionally causing a
collision? Would it cause some sort of DoS like someone else here mentioned?

~~~
deathanatos
> _I was under the impression the use of SHA1 was only for hashing and not for
> a security signature._

This is _mostly_ correct. The commands git commit -S and git tag -s both sign
commits and tags, respectively, using GPG. The signature covers only the
commit object: the tree's SHA1, the commit/author data, and the commit
message, _not the entirety of the data_.

git's objects' SHA1 is computed, for "file" objects, as "blob " \+ ascii
decimal size of blob + nul + data in the blob. The two PDFs in Shattered are
the same size, and thus, have the same git object header. However, naïvely
prefixing the two shattered PDFs with their git header results in different
hashes; I presume this is b/c the internal state of SHA1 differs from what the
constructed data that causes the collision expects. You can see this yourself:

    
    
      % ls -l shatter*.pdf
      -rw-r--r-- 1 - - 422435 Feb 24 19:27 shattered-1.pdf
      -rw-r--r-- 1 - - 422435 Feb 24 19:27 shattered-2.pdf
      % { printf 'blob 422435\0'; cat shattered-1.pdf; } | gsha1sum -
      ba9aaa145ccd24ef760cf31c74d8f7ca1a2e47b0  -
      % { printf 'blob 422435\0'; cat shattered-2.pdf; } | gsha1sum -
      b621eeccd5c7edac9b7dcba35a8d5afd075e24f2  -
    

If you commit the two PDFs to git, those are the hashes they will have. They
are different. Now, if you take the header into account when computing the
collisions, you can create a collision. The paper seems to say that the attack
takes a known (and _controllable_ ) prefix P, and finds two sets of two
512-bit (64 byte) blocks (different for the two files), M_1^(1) and M_2^(1)
for the first file, and M_1^(2) and M_2^(2) for the second file, that cause
the internal state of the hash to collide; after that, _any_ (also
controllable) suffix S (but the same for both files) can be appended.
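
The blob hashing described above ("blob " + decimal size + NUL + data) can be sketched in a few lines of Python. `git_blob_sha1` is a made-up helper name, and the file name in the usage note assumes you have the same shattered PDFs on disk:

```python
import hashlib


def git_blob_sha1(data):
    """SHA-1 that git assigns to a blob: hash of header || content."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


# The empty blob has a well-known id, which differs from plain SHA-1 of b"".
empty_blob_id = git_blob_sha1(b"")
plain_empty_sha1 = hashlib.sha1(b"").hexdigest()
```

Calling `git_blob_sha1(open("shattered-1.pdf", "rb").read())` should reproduce the value from the shell session above, and the extra header is the reason two files with equal plain SHA-1 do not automatically share a git object id.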

This is why the diff[1] is as long as it is: each side of the diff is 128
bytes; the two M blocks.

Thus, if you computed a prefix P that started with the git blob header, _then_
some data for your file, and ran this attack, you should be able to create two
files that, when committed to git, collide. The two pieces of data in the
header don't cause any trouble: the paper's method allows you to control the
output size, mostly, so we can mostly choose any size we want; the type is
always "blob" and is thus effectively constant.

Now, from what I can gather from the paper, there doesn't seem to be real
control over the portion that differs; that's really what we need to take
advantage of this beyond just creating collisions. This is a collision, but
it's _not_ what's called a chosen-prefix collision. A chosen-prefix collision
lets me choose two _different_ prefixes (which gives the attacker much more
control over the differences in content, thus it becomes much easier to craft
a "good" and a "bad" version); this attack requires both files to have the
same prefix.

Now, here is the worst way I can think of as to how I could get a colliding
object to you:

Imagine that I can convince someone you trust to sign a commit; this commit
contains, either directly or indirectly via an ancestor commit, an object
whose hash we will collide.

Now suppose I can later get you to download that commit and all its parents,
except that I substitute one object's data for another's. The signature is still
good: I've not changed the commit object in any way; it references objects by
SHA1 hash, and the hash hasn't changed, "only" the data.

Here's another scenario, and you don't need signatures in this scheme; if I
push a commit to master w/ the "good" version of the object, but before you
pull it, I push to you a branch that contains the bad object, then git writes
the bad object to your objects folder, under the colliding hash. You now pull
master, but git doesn't pull the "good" version of the object, b/c it already
has an object with that hash. _Your_ master is now different; I've effectively
poisoned your repo with the bad object.

Now, whether you can pull off this stunt or not, IDK. My point is that a)
git's signatures don't cover the entirety of the repository, only a now-very-
weak cryptographic hash, and b) git is (I believe) subject to object collision
from this. But presently I'm not seeing how it can be maliciously taken
advantage of. But then, people are _really_ clever.

[1]:
[https://news.ycombinator.com/item?id=13721633](https://news.ycombinator.com/item?id=13721633)

------
btrask
It probably wouldn't be too hard to create a fork of Git that replaces SHA-1
with SHA-256 (or Blake2, if that's your thing).

At the same time, you could remove the hash prefixes (blobs are prefixed with
"blob") so that the hashes would be identical to those generated by other
software.

------
phh
I'm sorry, but this mail is totally stupid. It assumes it takes one year to
complete the exploit.

I don't think there is anything preventing someone from doing such an exploit in 1
month, or even less. With the various CaaS providers the total cost even
remains the same!

~~~
ghshephard
Or, if you had sufficient budget, not even completely unreasonable for a
nation-state that presumably could use a very large cluster for other
purposes, generate a collision in an hour or two. That would be an interesting
exercise - how much hardware/kWh would it take to generate a SHA-1 collision
in 60 minutes.

~~~
fryguy
Well if you use the Bitcoin network as a metric, there's roughly 3 billion
GH/s (which is really two chained SHA-256 in hardware, not SHA1), and realtimebitcoin.info
claims this is ~2000 MW. If you compare that to the 9 billion GH that the
shattered article claims are needed, then that indicates it would take a
network equivalent in size to the Bitcoin network ~3 seconds and ~1'600 kWh.
There's no indication how "lucky" a 9 billion GH collision is, so perhaps it
would be longer or shorter based on the statistics.

Looking at it from the other direction, they claim 110 GPU-years. A GeForce
GTX 1080 is claimed to be 180 W. That's 175'000 kWh. If you assume that
dedicated hardware ASICs are 100x more power efficient than the card I
claimed, that has at least a similar order of magnitude. To do it in an hour
would take a million graphics cards, and ~200 MW.
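
The back-of-the-envelope numbers above can be re-derived with a short script. Every input is a figure quoted in this comment (9 billion GH of work, 3 billion GH/s, 2000 MW, 110 GPU-years, 180 W per card), not a measurement, and the function names are made up:

```python
HOURS_PER_YEAR = 365.25 * 24


def bitcoin_network_estimate(work_gh=9e9, rate_gh_per_s=3e9, power_mw=2000):
    """Time and energy if a Bitcoin-sized network did the quoted amount of work."""
    seconds = work_gh / rate_gh_per_s
    kwh = power_mw * 1000 * seconds / 3600  # MW -> kW, seconds -> hours
    return seconds, kwh


def gpu_estimate(gpu_years=110, watts_per_gpu=180):
    """Energy for the quoted GPU time, and the fleet needed to finish in 1 hour."""
    gpu_hours = gpu_years * HOURS_PER_YEAR
    kwh = gpu_hours * watts_per_gpu / 1000
    cards_for_one_hour = gpu_hours  # one card contributes one GPU-hour per hour
    fleet_mw = cards_for_one_hour * watts_per_gpu / 1e6
    return kwh, cards_for_one_hour, fleet_mw


net_seconds, net_kwh = bitcoin_network_estimate()
gpu_kwh, gpu_cards, gpu_mw = gpu_estimate()
print(net_seconds, round(net_kwh), round(gpu_kwh), round(gpu_cards), round(gpu_mw))
```

This gives about 3 seconds and roughly 1'700 kWh for the network case, and roughly 174'000 kWh, a bit under a million cards, and about 174 MW for the GPU case, in line with the rounded figures above.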

~~~
loup-vaillant
You have to add a couple thousand years of CPU computation. Though if ASICs can
meaningfully replace _those_ , they're as good as negligible…

------
akkartik
Too bad the thread never discusses Linus's response
([https://news.ycombinator.com/item?id=13719368](https://news.ycombinator.com/item?id=13719368)).
Who's right?

------
npongratz
I guess the submission's title was changed to the email's subject, rather than
the title of Mr. Gutmann's one-line summary (which is how I submitted it):

"Reports of SHA-1's demise are considerably exaggerated"

~~~
ghshephard
The title is incorrect. HN does have a rule that says click-bait or clearly
incorrect titles can be modified to be more representative of the actual
article.

~~~
Dylan16807
That's kind of rude to label a reasonable opinion as "clearly incorrect".

And the title given to the specific email by its author is definitely more
representative of its comments than the title of the first email in the chain.

As far as click-bait, I rate both of them equally click-baity.

There was little reason to change the submission title.

~~~
ghshephard
Well, to be fair, I labeled the title "Incorrect" \- (though I'd agree with
someone if they went and suggested, "Clearly", mostly because the content
doesn't even mention Git...)

