
Script to do instant MD5 collisions of any pair of PDFs - isp
https://github.com/corkami/pocs/blob/master/collisions/scripts/pdf.py
======
isp
Announced by the author here:
[https://twitter.com/angealbertini/status/1075417521799528448](https://twitter.com/angealbertini/status/1075417521799528448)

Commit:
[https://github.com/corkami/pocs/commit/3832f62d8aad64d541c5d...](https://github.com/corkami/pocs/commit/3832f62d8aad64d541c5d1fee755f30c44535374)

Readme:
[https://github.com/corkami/pocs/blob/master/collisions/READM...](https://github.com/corkami/pocs/blob/master/collisions/README.md#pdf)
("With this script, it takes less than a second to collide the 2 public PDF
papers")

~~~
simongr3dal
The author is also the person responsible for a lot of the polyglot files from
the PoC||GTFO series:
[https://www.alchemistowl.org/pocorgtfo/](https://www.alchemistowl.org/pocorgtfo/)

------
femto113
I note this script takes two existing PDFs as an input and produces two new
PDFs that collide with each other as output, but they do not collide with
either of the originals. Thus this does not enable the obvious attack of
creating a PDF that contains different text but collides with an existing PDF
that you do not control. It does enable some other forms of duplicity, but
only if you are the source of both documents.
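
A minimal sketch of how one might check this claim on the script's output, assuming nothing about pdf.py itself: the helper names `file_digest` and `is_colliding_pair` are made up for illustration, and SHA-256 is used only as a stand-in for "the files really differ".

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=65536):
    """Hash a file in chunks so large PDFs need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_colliding_pair(path_a, path_b):
    """True only if the two files share an MD5 digest but differ in content.

    Content difference is approximated by comparing SHA-256 digests
    (an assumption of this sketch, not something the script does).
    """
    same_md5 = file_digest(path_a) == file_digest(path_b)
    same_content = (
        file_digest(path_a, "sha256") == file_digest(path_b, "sha256")
    )
    return same_md5 and not same_content
```

Run against the two generated files this should return True, while comparing either generated file with its original should return False, which is the limitation described above.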

------
jgehrcke
This has been possible for a long time, right? Is there anything particularly
innovative about this approach which has not been done elsewhere?

~~~
tyingq
It's been possible to create pairs of colliding PDF files. Taking any two
existing PDFs and creating a colliding pair while keeping the same visible
rendered output is probably what's new.

~~~
diminoten
I don't think that's new either. I remember doing that in 2015 to demonstrate
an issue.

~~~
tyingq
There's this, from 2015:
[https://news.ycombinator.com/item?id=8555079](https://news.ycombinator.com/item?id=8555079)

And colliding SHA-1 in pdf form in 2017:
[https://news.ycombinator.com/item?id=13723892](https://news.ycombinator.com/item?id=13723892)

Not saying you didn't, but I'm unable to find any earlier reference to making
2 arbitrary, existing pdf files collide with MD5.

~~~
diminoten
I must have done the first link's thing.

But once you can do that, this isn't new. Assuming you can append garbage to a
PDF, it's the same solution.

Admittedly I know how easy it is to say, "That's obvious!" in hindsight,
but... isn't it?!
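
The "append garbage" argument rests on a property of MD5's Merkle-Damgård construction: once two equal-length, block-aligned inputs collide, appending the same suffix to both keeps them colliding. A small sketch of that property, assuming the widely circulated 128-byte collision pair usually attributed to Wang et al. (it is not taken from the script in this thread; `collides_with_suffix` is a hypothetical helper):

```python
import hashlib

# Widely circulated 128-byte MD5 collision pair (assumed here, not from pdf.py).
A_HEX = (
    "d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f89"
    "55ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5b"
    "d8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0"
    "e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70"
)
B_HEX = (
    "d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f89"
    "55ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5b"
    "d8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0"
    "e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70"
)

block_a = bytes.fromhex(A_HEX)
block_b = bytes.fromhex(B_HEX)


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def collides_with_suffix(a: bytes, b: bytes, suffix: bytes) -> bool:
    """True if a and b differ but still share an MD5 after the same suffix."""
    return a != b and md5_hex(a + suffix) == md5_hex(b + suffix)


if __name__ == "__main__":
    # Both inputs are exactly two 64-byte MD5 blocks, so the internal
    # state is identical before the suffix is processed.
    print(md5_hex(block_a) == md5_hex(block_b))
    print(collides_with_suffix(block_a, block_b, b"any trailing PDF data"))
```

The harder part for PDFs is arranging the file so that the differing blocks select which content gets rendered, which this sketch does not attempt.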

~~~
krageon
The fact that he produced something easy to see helps cement in the mind of
more people exactly how possible it is. It's a benefit not in the sense that
it was impossible before, but rather a benefit in the sense that more people
will really believe that it is possible.

~~~
diminoten
I didn't say it wasn't a benefit; I said there is no innovation here. I
was kind of wrong, but only insofar as it uses a different way to embed
arbitrary garbage in a PDF instead of directly at the end. The concept is
still the same (as it has to be).

------
saagarjha
Also relevant, a PDF SHA-1 collider:
[https://github.com/nneonneo/sha1collider](https://github.com/nneonneo/sha1collider)

------
DoctorOetker
Sci-Hub should really switch away from MD5 in their SQL dump if future readers
want to be sure they are looking at the same (or same-quality) article.

