
An animated GIF that shows its own MD5 - svenfaw
https://shells.aachen.ccc.de/~spq/md5.gif
======
soheil
Here is the explanation:

1\. Generate a gif for each possible digit in the first column

2\. Append collision blocks to each gif to make a 16 way collision

3\. Repeat for each digit

4\. Hash the final product

5\. Replace each digit with the correct digit

From
[https://www.reddit.com/r/programming/comments/5y03g9/animate...](https://www.reddit.com/r/programming/comments/5y03g9/animated_gif_displaying_its_own_md5_hash/)
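
The five steps above can be sketched on a toy hash. This is a minimal sketch under loud assumptions: the "hash" is a 16-bit truncation of MD5 used as a chaining function, so collisions can be found by brute force, and a "file" is just a list of blocks. The real GIF relies on cryptanalytic MD5 collision blocks rather than brute force, and all names here (`compress`, `find_multicollision`, `build_self_describing_file`) are hypothetical.

```python
import hashlib
from itertools import count

STATE_BYTES = 2  # assumption: toy 16-bit chaining value, so collisions are cheap


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: truncated MD5 of state || block."""
    return hashlib.md5(state + block).digest()[:STATE_BYTES]


def toy_hash(blocks) -> bytes:
    """Chain the compression function over the blocks, Merkle-Damgard style."""
    state = b"\x00" * STATE_BYTES
    for block in blocks:
        state = compress(state, block)
    return state


def find_multicollision(state: bytes, ways: int):
    """Brute-force `ways` distinct blocks that all map `state` to one next state."""
    buckets = {}
    for i in count():
        block = i.to_bytes(4, "big")
        nxt = compress(state, block)
        bucket = buckets.setdefault(nxt, [])
        bucket.append(block)
        if len(bucket) == ways:
            return bucket, nxt


def build_self_describing_file():
    """Return (blocks, shown) where `shown` is the hex digest the blocks 'display'."""
    state = b"\x00" * STATE_BYTES
    slots = []
    for _ in range(2 * STATE_BYTES):           # one slot per hex digit of the hash
        alternatives, state = find_multicollision(state, 16)
        slots.append(alternatives)             # alternative k stands for hex digit k
    digest = state.hex()                       # the same for every choice of alternatives
    blocks = [slots[pos][int(d, 16)] for pos, d in enumerate(digest)]
    shown = "".join(format(slots[pos].index(b), "x") for pos, b in enumerate(blocks))
    return blocks, shown
```

Calling `build_self_describing_file()` returns a block list whose `toy_hash` in hex equals the digits it "displays". The key point is that the hash is fixed before the digits are chosen, because every alternative in a slot leads to the same chaining state.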

~~~
taneq
Why would you do it that way, rather than:

1\. Pick a target MD5

2\. Make whatever cutesy animation you want showing that number

3\. Append collision blocks to the resulting gif to make it match the target
MD5

~~~
cardigan
Because this attack is not feasible against MD5 with current state of the art
cryptanalysis.

Instead collisions between files are demonstrated by appending bytes to two
files until their hashes match.

~~~
taneq
Ah darn, for some reason I thought there was a practical preimage attack on
MD5. I must have been thinking of CRC32 (which is just useless for this sort
of thing.) My mistake.

~~~
dom0
CRCs are linear, all 'attacks' are efficient on them.
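
To illustrate what linearity buys an attacker, here is a minimal sketch that forces `zlib.crc32` to any chosen 32-bit target by appending four bytes. It treats the CRC of `prefix + suffix` as an affine function of the 32 suffix bits and solves for them with Gaussian elimination over GF(2). The function name `force_crc32` is hypothetical.

```python
import zlib


def force_crc32(prefix: bytes, target: int) -> bytes:
    """Return prefix plus a 4-byte suffix whose zlib.crc32 equals `target`."""

    def crc_with(suffix: int) -> int:
        return zlib.crc32(prefix + suffix.to_bytes(4, "little"))

    base = crc_with(0)
    # basis[b] = (CRC difference whose highest set bit is b, suffix bits producing it)
    basis = [None] * 32
    for i in range(32):
        vec, combo = crc_with(1 << i) ^ base, 1 << i
        for b in range(31, -1, -1):
            if not (vec >> b) & 1:
                continue
            if basis[b] is None:
                basis[b] = (vec, combo)
                break
            vec ^= basis[b][0]
            combo ^= basis[b][1]

    # Express (target XOR base) as an XOR of basis vectors.
    want, suffix = target ^ base, 0
    for b in range(31, -1, -1):
        if (want >> b) & 1:
            if basis[b] is None:
                raise ValueError("target not reachable")
            want ^= basis[b][0]
            suffix ^= basis[b][1]
    return prefix + suffix.to_bytes(4, "little")
```

Only 33 CRC evaluations are needed, regardless of the target. Nothing comparable is known for MD5.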

------
strictnein
Took me a couple of minutes looking at this to realize why this was
interesting.

It's like a baby being born holding its completed birth certificate.

~~~
superbatfish
This sentence has forty-five (45) characters.

~~~
Lxr
Reminds me of a fun problem I encountered as a younger lad:

This sentence has three as, one b, two cs, two ds, thirty-six es, three fs,
three gs, eleven hs, nine is, one j, one k, three ls, one m, eighteen ns,
twelve os, one p, one q, eight rs, twenty-six ss, twenty ts, two us, five vs,
seven ws, three xs, four ys and one z.

I eventually found the solution by iteratively updating the approximate
distribution of each letter and finally sampling [1] - not sure if there is a
better way!

[1] solve_2 in tsh.py at
[https://bitbucket.org/akxlr/tsh/src](https://bitbucket.org/akxlr/tsh/src)

~~~
ouid
you use a gradient descent approach, but there's no way to guarantee anything
like convexity, right?

I bet you can construct a language where a solution exists, but all of its
neighbors' errors (using the topology implied by your gd function) are local
maxima.

~~~
Lxr
Right, it is definitely not convex in any way and gradient descent is
basically useless (and not very interesting because all it really means here
is try perturbing each count by one and look for improvement). I left that
there as an initial attempt at solving the problem but I don't use it in my
solution (except for the call to `gd(iter(gd(v)))` where I explore the
neighbouring solutions to each sample point, which is probably not necessary).

The basic idea instead is to treat each letter count as a random variable, and
ask what the distribution of each count is. In particular, each letter count
can be expressed as a sum of (a mapping of) the other letter counts, so if you
know the distribution of all other letter counts you can improve the
distribution of the letter of interest. Initially assume uniform for all
counts, then iterate until convergence. The images in that repo show the
distribution of some letters at various stages. After doing this for about 10
iterations you stop seeing any improvement (the letter distributions are as
'peaked' as they are going to get), at which point I drew samples until I
found a solution.
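
For reference, the condition any sampled candidate has to satisfy can be sketched as follows: render the sentence from a vector of letter counts, recount the letters, and compare. This is only the consistency check, not the distribution-update method described above, and it is not the code from tsh.py. The names (`words`, `render`, `recount`, `is_autogram`) are hypothetical, and the sentence template is assumed to be the one quoted earlier in the thread.

```python
import string
from collections import Counter

ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def words(n: int) -> str:
    """Spell out 1..99 in English, hyphenated as in the quoted sentence."""
    if not 1 <= n <= 99:
        raise ValueError("only 1..99 supported in this sketch")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def render(counts: dict) -> str:
    """Build the sentence claiming counts[c] occurrences of each letter c."""
    items = [f"{words(counts[c])} {c}{'' if counts[c] == 1 else 's'}"
             for c in string.ascii_lowercase]
    return "This sentence has " + ", ".join(items[:-1]) + " and " + items[-1] + "."


def recount(sentence: str) -> dict:
    """Count the letters a..z that actually occur in the sentence."""
    tally = Counter(ch for ch in sentence.lower() if ch in string.ascii_lowercase)
    return {c: tally[c] for c in string.ascii_lowercase}


def is_autogram(counts: dict) -> bool:
    return recount(render(counts)) == counts


# Counts taken from the sentence quoted earlier in the thread.
THREAD_COUNTS = dict(zip(string.ascii_lowercase,
                         [3, 1, 2, 2, 36, 3, 3, 11, 9, 1, 1, 3, 1,
                          18, 12, 1, 1, 8, 26, 20, 2, 5, 7, 3, 4, 1]))
```

`is_autogram(THREAD_COUNTS)` passes, while changing any single count breaks it, which is why a sampler needs this check as its stopping condition.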

~~~
ouid
Alright, I admit that I skimmed, came across a bunch of GD code at the
beginning and assumed that you just got lucky with a very greedy approach,
sorry :P.

The update function you wrote on the distribution space is continuous, and
distribution space is compact (since there's no way to have more than N of any
letter), so there is necessarily at least one fixed point.

Fixed points could still be sources though, right? It clearly wasn't, but I'm
curious to know if you got lucky, or if you merely didn't get unlucky.

I wonder how effective your code would be on the following self descriptive
sentences (base j).

For instance when j=2, "100 1,11 0" has 4 ones and 3 zeros as described.

Does your code consistently find solutions as you increase j? At what point
does the computation become unfeasible?
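
A brute-force sketch of these base-j sentences, for experimenting with small j. The format for j > 2 is an assumption extrapolated from the single j=2 example: digits listed from highest to lowest, each preceded by its count written in base j, separated by commas. The sketch only handles j up to 10, and the names are hypothetical.

```python
from itertools import product


def to_base(n: int, base: int) -> str:
    """Write a non-negative integer in the given base (base <= 10 only)."""
    digits = ""
    while n:
        n, r = divmod(n, base)
        digits = str(r) + digits
    return digits or "0"


def render(counts, base: int) -> str:
    """counts[d] is the claimed number of occurrences of digit d."""
    return ",".join(f"{to_base(counts[d], base)} {d}"
                    for d in range(base - 1, -1, -1))


def is_self_descriptive(counts, base: int) -> bool:
    text = render(counts, base)
    return all(text.count(str(d)) == counts[d] for d in range(base))


def search(base: int, max_count: int):
    """Exhaustively try every count vector with entries in 1..max_count."""
    return [c for c in product(range(1, max_count + 1), repeat=base)
            if is_self_descriptive(c, base)]
```

For j=2, `render((3, 4), 2)` reproduces the string `100 1,11 0` from the example. The search space grows as `max_count ** j`, so exhaustive search stops being practical quickly.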

~~~
Lxr
Nice! I admit I don't have a good understanding of the analysis, but if you
mean "does repeatedly applying f to some starting point v, like
f(f(f(...(v)))) eventually lead to a solution" the answer is definitely not -
you can be right next to a solution and not get there by GD or by applying f.
My function `iter` applies f over and over, but this was another failed
attempt at a solution.

I will endeavour to try your problem, is it related in some way to a well-
known problem?

------
Exuma
It's almost like a quine. My favorite quine is
[http://aem1k.com/world/](http://aem1k.com/world/) (view source)

~~~
djsumdog
Have you seen this one? It's insane:

[https://github.com/mame/quine-relay](https://github.com/mame/quine-relay)

~~~
TuringTest
It's awesome how it can maintain the image layout in the source code through
those 100 transformations.

I wonder whether it reads its own source code file to detect where to put the
spaces for the image, or is it all encoded as data in the ruby program itself?
I suppose it's the latter (otherwise it wouldn't qualify as a quine), but I
can't figure out how it is encoded.

~~~
nacs
It appears all the languages' code is encoded in the `code-gen.rb` file:

[https://github.com/mame/quine-relay/blob/master/src/code-
gen...](https://github.com/mame/quine-relay/blob/master/src/code-
gen.rb#L1030-L1056) < VB

[https://github.com/mame/quine-relay/blob/master/src/code-
gen...](https://github.com/mame/quine-relay/blob/master/src/code-
gen.rb#L120-L146) < PHP

------
MontagFTB
There's an explanation of how this 'attack' was composed here:
[http://crypto.stackexchange.com/questions/44463/how-does-
the...](http://crypto.stackexchange.com/questions/44463/how-does-the-attack-
on-md5-work-that-allows-a-file-to-show-its-own-full-hash)

~~~
bonzini
The answer there doesn't explain chosen-prefix attacks, which are more
powerful than building a collision out of a single prefix. They require much
less flexibility in the file format; building the PDF for shattered.io wasn't
trivial exactly because the SHA-1 break didn't allow choosing two prefixes.

------
dexen
In similar spirit, a Zip archive holding itself, by Russ Cox:
[https://research.swtch.com/zip](https://research.swtch.com/zip)

Infinite recursion!

------
clishem
Title checks out.

    
    
        > md5sum md5.gif 
        f5ca4f935d44b85c431a8bf788c0eaca  md5.gif

~~~
joombaga
Never knew that md5sum took a file argument. I've always `md5sum <foo`

Slightly embarrassing, but at least I wasn't `cat foo|md5sum`

~~~
coldpie
The output format of md5sum and other checksum programs is well-defined, which
lets you use the -c switch to verify. This is the format you see in checksum
files when downloading source tarballs, distro ISOs, packages, etc.

    
    
        $ echo hello > fileone
        $ echo there > filetwo
        $ md5sum fileone filetwo | tee /tmp/ck
        b1946ac92492d2347c6235b4d2611184  fileone
        c4ff45bb1fab99f9164b7fec14b2292a  filetwo
        $ md5sum -c /tmp/ck
        fileone: OK
        filetwo: OK
        $ echo oops > filetwo
        $ md5sum -c /tmp/ck
        fileone: OK
        filetwo: FAILED
        md5sum: WARNING: 1 computed checksum did NOT match
    

Edit: Of course, nothing wrong with using stdin:

    
    
        $ wget -q -O- https://shells.aachen.ccc.de/~spq/md5.gif | md5sum 
        f5ca4f935d44b85c431a8bf788c0eaca  -
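
For scripting the same check without the command-line tool, here is a minimal Python sketch that loosely mirrors `md5sum -c`. It assumes the simple `HASH  NAME` line format shown above and does not handle every case the real tool does (for example escaped file names). The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check(checksum_file, base_dir="."):
    """Return [(name, ok), ...] for each 'HASH  NAME' line in checksum_file."""
    results = []
    for line in Path(checksum_file).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # simplification: '*' marks binary mode in md5sum output
        ok = md5_of_file(Path(base_dir) / name) == expected.lower()
        results.append((name, ok))
    return results
```

`check("/tmp/ck")` would report the same OK/FAILED split as the session above, as a list of `(name, ok)` pairs.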

~~~
nicky0
What's the -O- for?

~~~
cobbzilla
-O names the file to write output to; a plain - means stdout. So: run wget and print the result to stdout. Tip: add -S to include the HTTP response headers.

------
z1mm32m4n
Wow! I'd love to read about how this was made.

~~~
JadeNB
soheil heard, and your request was granted
([https://news.ycombinator.com/item?id=13824445](https://news.ycombinator.com/item?id=13824445)).

------
fniephaus
Reminds me of [https://en.wikipedia.org/wiki/Tupper's_self-
referential_form...](https://en.wikipedia.org/wiki/Tupper's_self-
referential_formula)

~~~
arglebarnacle
I was a bit disappointed when I found this out about it:

"The formula is a general-purpose method of decoding a bitmap stored in the
constant k, and it could actually be used to draw any other image."

That makes it less of a self-referential formula in my opinion, and more of a
cool formula that can print any bitmap.

~~~
jwilk
Yeah, the title is misleading. There's nothing self-referential about this
formula.

Now, if the formula was also capable of drawing the constant k, THAT would be
exciting.

~~~
omaranto
Jakub Travnik has got your back. [http://jtra.cz/stuff/essays/math-self-
reference/index.html](http://jtra.cz/stuff/essays/math-self-
reference/index.html)

------
zaf
Not as impressive but similar idea:
[https://github.com/ziqbal/jpegfit/blob/master/61081.jpg](https://github.com/ziqbal/jpegfit/blob/master/61081.jpg)

------
justindocanto
Would love to know how this was made, if anybody knows

~~~
nmat
This was posted to reddit yesterday, here's the discussion:
[https://www.reddit.com/r/programming/comments/5y03g9/animate...](https://www.reddit.com/r/programming/comments/5y03g9/animated_gif_displaying_its_own_md5_hash/)

------
logotype
I think this uses one of the available GIF extensions (e.g. plain-text, custom
data, or other) since GIF is a block-based binary file. Animation blocks in
one part of the file, then for the other blocks you could just put random data
(which is ignored by the GIF parser) until you find the MD5 collision.
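
As a sketch of how arbitrary bytes can ride along in a GIF, here is the layout of a GIF89a comment extension: the bytes `0x21 0xFE`, then data sub-blocks of at most 255 bytes each prefixed by their length, then a zero terminator. Whether the file in question uses comment extensions exactly this way is an assumption; the helper names are hypothetical.

```python
def gif_comment_extension(data: bytes) -> bytes:
    """Wrap arbitrary bytes in a GIF89a comment extension block."""
    out = bytearray(b"\x21\xfe")          # extension introducer + comment label
    for i in range(0, len(data), 255):
        chunk = data[i:i + 255]
        out.append(len(chunk))            # sub-block length byte (1..255)
        out += chunk
    out.append(0)                         # block terminator
    return bytes(out)


def read_comment_extension(blob: bytes) -> bytes:
    """Inverse of gif_comment_extension, for a blob holding exactly one block."""
    if blob[:2] != b"\x21\xfe":
        raise ValueError("not a comment extension")
    pos, data = 2, bytearray()
    while blob[pos] != 0:
        size = blob[pos]
        data += blob[pos + 1:pos + 1 + size]
        pos += 1 + size
    return bytes(data)
```

A decoder skips such a block when rendering, so its contents can be chosen freely without changing what is displayed.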

------
soheil
I wonder if it being an animated GIF as opposed to just an image has anything
to do with it.

~~~
pulse7
Surely it does: an animated GIF offers many more combinations than a static
GIF, so it is easier to make "invisible" changes to the file to get the MD5
hash displayed on the final GIF slide.

------
quirkot
[https://xkcd.com/688/](https://xkcd.com/688/)

------
mmanfrin
I can't imagine how you'd begin to do this.

~~~
tekromancr
Just a wild guess, but I imagine you would start by designing the gif
animation to depict a plausible MD5 hash, then modifying the file in such a
way that you make the file MD5 hash match the one depicted in the animation.

Very impressive!

~~~
MichaelGG
But any 128 bit value is a plausible MD5 hash. Modifying the file to make that
value is akin to doing a preimage attack right?

~~~
rhaps0dy
Yes, that's probably the point of the maker. To prove that they can do a
preimage attack.

~~~
kmm
No, it's just some fancy tricks with collisions. A preimage attack is still
far beyond our current technologies

~~~
kevincox
It's important to note that technologies here means not only our hardware
technologies but also our cryptographic knowledge.

There is a chance that there are unknown flaws in the MD5 algorithm that make
preimage attacks far easier to compute.

Our compute power is also growing cheaper, which would let us solve harder
preimage problems.

------
soheil
This is incredible, it's like winning the lottery except that you can play
incredibly fast with almost no cost for each try.

------
mh-cx
That makes me wonder if it's also easy to create a string that is its own MD5.

~~~
ketralnis
This is called a fixed point, that is an X such that md5(X)==X. A quick google
for "md5 fixed point" shows some claims that the probability that one exists
is about 63%
([https://news.ycombinator.com/item?id=614027](https://news.ycombinator.com/item?id=614027))
but no examples of a known one
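
The 63% figure is what a simple model predicts. Assume MD5 restricted to 128-bit inputs behaves like a uniformly random function on N = 2^128 values (a modelling assumption, not a proven property). Each input is a fixed point with probability 1/N, so the chance that none exists is (1 - 1/N)^N, which tends to 1/e. A minimal sketch of the arithmetic, with a hypothetical function name:

```python
import math


def prob_fixed_point(bits: int) -> float:
    """P(a uniformly random function on 2**bits values has a fixed point)."""
    n = 2.0 ** bits
    # 1 - (1 - 1/n)**n, evaluated stably with log1p/expm1
    return -math.expm1(n * math.log1p(-1.0 / n))
```

`prob_fixed_point(128)` agrees with 1 - 1/e to floating-point precision, which is about 0.632.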

~~~
raverbashing
Also the "real" fixed point wouldn't be necessarily ASCII

So you could have two types: one where the hash matches the input bytes,
another where a hex string is the input and matches the hex representation
of the hash
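
The two notions can be written down as predicates. This is a minimal sketch with hypothetical names; it assumes ASCII encoding and lowercase hex for the second variant.

```python
import hashlib


def is_raw_fixed_point(data: bytes) -> bool:
    """True if the 16 raw digest bytes equal the input bytes."""
    return hashlib.md5(data).digest() == data


def is_hex_fixed_point(text: str) -> bool:
    """True if the 32-character lowercase hex digest equals the input string."""
    return hashlib.md5(text.encode("ascii")).hexdigest() == text
```

The first can only hold for 16-byte inputs and the second only for 32-character inputs, so the two search spaces do not overlap.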

~~~
aidenn0
Also EBCDIC and UTF-16

------
IAmGraydon
That is really amazing. How was this done?

~~~
ruleabidinguser
I bet this has something to do with the fact (or so I'm told) that MD5 is
broken.

------
svenfaw
Writeup by @angealbertini coming soon

~~~
j_s
Dubbed HashQuines; here are PDF and PostScript too:
[https://twitter.com/i/moments/838685002703466497](https://twitter.com/i/moments/838685002703466497)

------
matreyes
How did you do that!

------
dgitg5266
b12092fb2bf87e09ab9e9ad6fc55f3c5

------
Kenji
Now do a SHA1 one ;)

~~~
j_s
People are reusing the collision in many interesting ways, but I don't think
it helps in this case.

Two "different" EXEs with same SHA1/MD5 with different outputs:
[https://roastingbugs.blogspot.com/2017/03/eat-more-
hashes.ht...](https://roastingbugs.blogspot.com/2017/03/eat-more-hashes.html)

------
bbcbasic
Verified correct with [http://onlinemd5.com/](http://onlinemd5.com/) :-)

------
quakeguy
WOW!

