
Adding a SHA1 collision vulnerability test hoses WebKit's source repository - raingrove
https://bugs.webkit.org/show_bug.cgi?id=168774&comment=c27#c27
======
fanf2
OK, this is quite a serious vulnerability in Subversion. SVN depends more on
raw file SHA1 hashes than git because git prepends a header which prevents raw
SHA1 collisions from translating directly into easy svn-style repository
corruption.
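
As a rough illustration of the difference (plain Python using only hashlib; it
assumes the two shattered PDFs are sitting in the current directory):

    import hashlib

    def raw_sha1(data):
        # What svn's rep-sharing keys on: SHA-1 of the bare file contents.
        return hashlib.sha1(data).hexdigest()

    def git_blob_sha1(data):
        # What git stores: SHA-1 of a "blob <length>\0" header plus the contents.
        header = b"blob " + str(len(data)).encode() + b"\0"
        return hashlib.sha1(header + data).hexdigest()

    for name in ("shattered-1.pdf", "shattered-2.pdf"):
        data = open(name, "rb").read()
        print(name, raw_sha1(data), git_blob_sha1(data))

    # raw_sha1 prints the same digest for both PDFs; git_blob_sha1 does not,
    # because the prepended header changes the SHA-1 internal state before the
    # crafted colliding blocks, so the collision no longer lines up.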

The reason svn is broken is its "rep-sharing" feature, i.e. file content
deduplication. It uses a SQLite database to share the representation of files
based on their raw SHA1 checksum - for details see
[http://svn.apache.org/repos/asf/subversion/trunk/subversion/...](http://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_fs_fs/structure)
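
As a deliberately simplified model of why keying deduplication on the raw SHA1
is dangerous (illustrative Python only, not SVN's actual FSFS code; it assumes
the two shattered PDFs are on disk):

    import hashlib

    class NaiveRepSharingStore:
        """Toy content store that dedups on raw SHA-1, like rep-sharing does."""
        def __init__(self):
            self.reps = {}  # sha1 hex -> stored bytes

        def put(self, data):
            key = hashlib.sha1(data).hexdigest()
            # Dedup: if this SHA-1 was seen before, keep the old bytes.
            self.reps.setdefault(key, data)
            return key

        def get(self, key):
            return self.reps[key]

    pdf1, pdf2 = (open(n, "rb").read() for n in ("shattered-1.pdf", "shattered-2.pdf"))
    store = NaiveRepSharingStore()
    k1, k2 = store.put(pdf1), store.put(pdf2)
    assert k1 == k2 and store.get(k2) == pdf1  # reading "file 2" back yields file 1

The recorded secondary checksum for the second file then disagrees with the
bytes actually stored, which appears to be the "Checksum mismatch" error quoted
further down this thread.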

You can mitigate this vulnerability by setting enable-rep-sharing = false in
fsfs.conf - see documentation in that file or in the source at
[http://svn.apache.org/viewvc/subversion/trunk/subversion/lib...](http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c?revision=1737356&view=markup#l862)

This feature was introduced in svn 1.6 (released 2009), and made more
aggressive in svn 1.8 (released 2013):
[https://subversion.apache.org/docs/release-notes/](https://subversion.apache.org/docs/release-notes/)

SVN exposes the SHA1 checksum as part of its external API, but its
deduplication could easily have been built on a more secure foundation. Their
decision to double down on SHA1 in 2013 was foolish.

~~~
acqq
> this is quite a serious vulnerability in Subversion

I rather believe it's a minor bug, and that once it is fixed they can keep
using SHA1 as before, without the denial of service when somebody tries this.
For example, if somebody tries to put in two files with the same SHA1 but
different MD5, the second one can be rejected before it is accepted. Or, if two
different files with the same SHA1 were both accepted and only one content is
stored, SVN can still continue to work: you just can't get the second file back
unless you, for example, put it in some archive format first and then put that
in SVN. OK, that's your problem, but SVN would still work for everything else.
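
A rough sketch of that reject-the-second-one check (illustrative Python, not a
patch against SVN; it assumes both a SHA1 and an MD5 are recorded per stored
file, as the checksum-mismatch error elsewhere in this thread suggests):

    import hashlib

    def accept_or_reject(new_data, md5_by_sha1):
        """md5_by_sha1: sha1 hex -> md5 hex of the content already stored under it."""
        sha1 = hashlib.sha1(new_data).hexdigest()
        md5 = hashlib.md5(new_data).hexdigest()
        if sha1 in md5_by_sha1 and md5_by_sha1[sha1] != md5:
            # Same SHA1 as an existing file but different bytes: refuse the commit
            # instead of silently deduplicating to the old content.
            raise ValueError("SHA1 collision with existing content; commit rejected")
        md5_by_sha1.setdefault(sha1, md5)
        return sha1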

In short, it sounds like a denial of service at the moment, but I think that
DOS can be avoided without changing the hash algorithm.

However, I'm sure that SVN is not the only codebase that has never, until now,
been tested with two different files that have the same SHA1.

~~~
stsp
Apache Subversion developer here.

Andreas Stieger (SUSE, SVN) has written a pre-commit hook script which rejects
commits of shattered.io-style PDFs:

[https://svn.apache.org/viewvc/subversion/trunk/tools/hook-sc...](https://svn.apache.org/viewvc/subversion/trunk/tools/hook-scripts/reject-known-sha1-collisions.sh?view=markup&pathrev=1784336)

This is the first mitigation available. If you are responsible for an SVN
server at risk, please make use of this hook.

If somebody could make a similar hook for Windows and post it here or to
dev@subversion.apache.org that would be highly appreciated.

(edit: switched script link to HTTPS)
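
(For anyone who wants to adapt the idea, here is a rough cross-platform sketch
in Python; to be clear, this is an illustration of the approach rather than
Andreas's script, and it only checks for the one published shattered digest.)

    #!/usr/bin/env python
    # Sketch of a pre-commit hook: reject any file in the incoming transaction
    # whose raw SHA-1 equals the known shattered.io collision digest.
    import hashlib, subprocess, sys

    SHATTERED = "38762cf7f55934b34d179ae6a4c80cadccbb7f0a"

    def main(repos, txn):
        changed = subprocess.check_output(["svnlook", "changed", "-t", txn, repos])
        for line in changed.decode().splitlines():
            status, path = line[:4].strip(), line[4:]
            if status.startswith("D") or path.endswith("/"):
                continue  # skip deletions and directories
            data = subprocess.check_output(["svnlook", "cat", "-t", txn, repos, path])
            if hashlib.sha1(data).hexdigest() == SHATTERED:
                sys.stderr.write("Known SHA-1 collision rejected: %s\n" % path)
                return 1
        return 0

    if __name__ == "__main__":
        sys.exit(main(sys.argv[1], sys.argv[2]))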

~~~
stsp
There have been follow-up fixes to the script and the link above is now stale.
See [https://svn.apache.org/viewvc/subversion/trunk/tools/hook-sc...](https://svn.apache.org/viewvc/subversion/trunk/tools/hook-scripts/reject-known-sha1-collisions.sh?view=log)

------
phaemon
As mentioned in a previous comment
([https://news.ycombinator.com/item?id=13722469](https://news.ycombinator.com/item?id=13722469)),
git doesn't see these files as the same, since it hashes the header plus the
content, which breaks the identical-SHA trick.

Of course, I first tested this on our main production repository at work
because...oh, wait, I didn't because _what were you thinking_?!

~~~
tveita
It could be made to work on Git, but you'd need to make a collision that
included the git blob header. The resulting files would not have the same
SHA-1 hash until the header was added though, so they wouldn't be useful
except for testing Git itself.

My guess is that Git wouldn't be 'hosed' like SVN, since it currently doesn't
have a secondary hash to detect the corruption. It would simply restore the
wrong file without noticing anything was amiss.

~~~
sverige
> It would simply restore the wrong file without noticing anything was amiss.

Why hasn't Git switched to SHA-2? People have been warning that SHA-1 is
vulnerable for over a decade, but that vulnerability was dismissed with a lot
of hand-waving over at Git. Is it a very difficult technical problem to
switch, or just a problem of backward compatibility for existing repos (i.e.,
it would be expensive to change everything over)?

~~~
mintplant
> Why hasn't Git switched to SHA-2? People have been warning that SHA-1 is
> vulnerable for over a decade

Back then Linus shot this down in his typical abrasive fashion:

[http://www.gelato.unsw.edu.au/archives/git/0504/0885.html](http://www.gelato.unsw.edu.au/archives/git/0504/0885.html)

~~~
abalone
His style is a super abrasive unnecessary power trip but this key point is
relevant:

 _> It is simply NOT TRUE that you can generate an object that looks halfway
sane and still gets you the sha1 you want._

The key phrase being "looks halfway sane". Git doesn't just look at the hash;
it looks at the object structure too (headers), and that makes it highly
resistant to weaknesses in the crypto alone. His point, essentially, is that
you should design expecting crypto/hash vulnerabilities, and that's a smart
stance, as they are discovered every few years.

~~~
cesarb
> It looks at the object structure too (headers)

Linus was not talking about the object headers, but about the object contents.
It's harder to make the colliding objects look like sane C code, without some
strange noise in the middle (which wouldn't be accepted by the project
maintainers).

Yes, it's a "C project"-centric view, but consider the date: it was the early
days of git. The main way of receiving changes was emailed patches, not pull
requests. Binary junk would have a hard time getting in. And even if it did
get in, the earliest copy of the object wins, as long as the maintainers added
"--ignore-existing" to the rsync command in their pull scripts (yeah, this
thread seems to be from before the git fetch protocol), as mentioned earlier
in the thread.

------
jmount
(from the link) "For the record: the commits have been deleted, but the SVN is
still hosed." That is pretty much my memory of working with SVN. I remember
SVN fouling its database a few times. Sure I've broken git a few times, but I
am always able to (as Jenny Bryan says) "burn the whole thing down" and take
state from another copy of the repository.

I really tried with SVN (wanted something better than CVS) for quite a long
time.

~~~
mst
I've done surgery on svn repos to unhose things a few times over the years,
_usually_ due to PEBCAK rather than svn shitting itself. It's actually pretty
doable, up to and including the equivalent of interactive rebase.

I much prefer that git's _designed_ to let me do such things and provides
tools for doing so, but you can totally rewire svn repos with vi and a bunch
of swearing if necessary.

(and I was using svk for a merge tool at the time so I _did_ have the option
to burn it down and rebuild from scratch; unhosing svn repos wasn't quite
unpleasant enough for me to want to do so)

Then again, I started off doing more ops than dev and have also happily hand-
edited mysql replication logs to unfuck things after a partial failover, so I
may have more of a masochistic streak than you do :)

~~~
lima
I fondly remember editing the raw metadata in a Gluster cluster to recover it
after a three-node split brain :)

------
lumisota
Isn't it the SVN repo that's "hosed", not the Git repo as suggested by the
title?

~~~
PuffinBlue
Yes, the mailing list post backs this up:

[https://lists.webkit.org/pipermail/webkit-dev/2017-February/...](https://lists.webkit.org/pipermail/webkit-dev/2017-February/028792.html)

------
afandian
Reminds me of when I worked at an antivirus company. We had to be careful with
the EICAR file in test code because it would set off AV alarms.
[http://www.eicar.org/86-0-Intended-use.html](http://www.eicar.org/86-0-Intended-use.html)

------
isp
New SVN attack category: denial-of-service by SHA-1 collision.

~~~
dsp1234
New SaaS service: Repository SHA-1 collision detection

------
raziel2p
A bit hard for me to tell what happened here, maybe because I don't know
anything about SVN. The two PDFs with equal SHA1 hashes were committed to the
git repository, but converting that to an SVN commit failed because... SVN
can't handle two separate files with the same SHA1 hash?

~~~
wyldfire
It's likely some part of the svn implementation that assumes that the SHA1
signatures guarantee uniqueness within a repo. And they might use that hash as
an identifier.

I'm guessing shattered-1.pdf and shattered-2.pdf have identical hashes but
distinct contents. It's not clear for me to know why this results in a
"checksum mismatch."

    
    
        Checksum mismatch: LayoutTests/http/tests/cache/disk-cache/resources/shattered-2.pdf
        expected: 5bd9d8cabc46041579a311230539b8d1
            got: ee4aa52b139d925f8d8884402b0a750c
    

EDIT: see
[https://news.ycombinator.com/item?id=13725312](https://news.ycombinator.com/item?id=13725312)
for the answer

~~~
phaemon
Heh, because those are the md5 checksums which _don't_ match.

    
    
      $ sha1sum shattered*
      38762cf7f55934b34d179ae6a4c80cadccbb7f0a  shattered-1.pdf
      38762cf7f55934b34d179ae6a4c80cadccbb7f0a  shattered-2.pdf
    
      $ md5sum shattered*
      ee4aa52b139d925f8d8884402b0a750c  shattered-1.pdf
      5bd9d8cabc46041579a311230539b8d1  shattered-2.pdf
    

As you can see.

~~~
johnchristopher
Wouldn't using both sha1 and md5 solve the problem, or does the fact that both
have collisions in some cases doom that combination?

~~~
AlexandrB
I asked this question yesterday. Apparently using both is not much more of a
barrier than the stronger of the two by itself.

[1]
[https://news.ycombinator.com/item?id=13715146](https://news.ycombinator.com/item?id=13715146)

[2]
[https://www.iacr.org/archive/crypto2004/31520306/multicollis...](https://www.iacr.org/archive/crypto2004/31520306/multicollisions.pdf)
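
The back-of-the-envelope version of the argument in [2], treating MD5 and SHA1
as ideal 128- and 160-bit hashes (my summary, so the exponents are only rough):

    # Cost of colliding the concatenation MD5(x) || SHA1(x), per Joux's
    # multicollision construction:
    md5_collision  = 2 ** 64               # birthday cost of one MD5 collision
    multicollision = 80 * md5_collision    # chain 80 of them -> 2^80 inputs with equal MD5
    sha1_birthday  = 2 ** 80               # expect a SHA1 collision among those inputs
    total = multicollision + sha1_birthday
    print(total / sha1_birthday)           # ~1.001, i.e. barely harder than SHA1 alone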

------
fapjacks
I have to just say here that WebKit is one of the most over-the-top software
projects I've ever tried to dig into, in my twenty years of programming.
Building it inside a vanilla container was impossible following their
directions exactly, and it required _so much_ research on my part to get
working. I'm used to a bit of back-and-forth with just about every project, but
WebKit was ridiculous. After two workdays of trying, I'd managed to build
WebKit from source, but at that point I had to concede to the universe the
futility of trying to build a golang-based Phantom, as my friend and former
coworker had originally wanted. That also gave me _mad_ respect for Phantom's
author, and it immediately taught me why they do not often incorporate new
WebKit versions into the project and instead just peg to the first one they can
get to build.

------
paulddraper
Site is down.

------
sigjuice
This is why a git clone is not a real backup.

