1. Generate a gif for each possible digit in the first column
2. Append collision blocks to each gif to make a 16 way collision
3. Repeat for each digit
4. Hash the final product
5. Replace each digit with the correct digit
This is a Nostradamus attack. For any Merkle-Damgard hash function (MD5, SHA1, SHA2 -- this is a crucial part of why the recent SHA1 collision can be used to create arbitrary colliding pdfs), if hash(A) = hash(B), hash(A+C) = hash(B+C) assuming A and B are at block size boundaries (pad them if not). So you can always add the same suffix to colliding strings and get new colliding strings.
Now, a preimage attack is beyond your means. Given a string, it's hard to find something that hashes to the same thing.
But, given two strings, it's possible to mix crap with both of them till they give you the same hash. This is reasonably fast for MD5, and expensive (but still within the means of folks like Google) for SHA1.
So what you do is you first create images for each digit. Pair them up.
Now, take each pair, and append crap to both instances till you find a collision. Now you have 8 colliding pairs. Now, applying the hash(digit0 + crap0) = hash(digit1 + crap1), hash(digit0 + crap0 + crap01) = hash(digit1 + crap1 + crap01). Appending the same suffix (crap01) to both will still get you a collision.
Now, pair up each pair. Let's say we take the digit0/digit1 pair and pair it with digit2/digit3. Find a collision between digit0+crap0 and digit2+crap2. Let's say that hash(digit0 + crap0 + morecrap0) = hash(digit2 + crap2 + morecrap2). Realize that this is also equal to hash(digit1 + crap1 + morecrap0) and hash(digit3 + crap3 + morecrap2) (since we can add the morecrap suffix to the already-colliding digit+crap combinations to get new collisions).
Now, we have a four-way collision. Repeat with the other pairs and you have 4 of these. Now do the same process, and get 2 8 way collisions. Repeat to get 1 16 way collision. Ultimately you'll have something like
- hash(digit0 + crap0 + crap01 + crap0123 + crap01234567)
- hash(digit1 + crap1 + crap01 + crap0123 + crap01234567)
- hash(digit2 + crap2 + crap23 + crap0123 + crap01234567)
- hash(digit6 + crap6 + crap67 + crap4567 + crap01234567)
- hash(digitF + crapF + crapEF + crapCDEF + crap9ABCDEF)
which is a 16 way collision, of 16 gifs of different digits, followed by some crap for each one.
This only requires 15 collision attacks, which isn't hard with MD5. Needs more than a million dollars for SHA1, though.
(snark: this is the 8th Google search result for "learn more about the specific part of MD5 that makes it weak")
Parent to the comment you are replying to said that it wasn't hard. If it's not hard then it ought to be quite easy.
However flaws in the design have made this easy.
Edit: Oh, I see what you mean.
1. Pick a target MD5
2. Make whatever cutesy animation you want showing that number
3. Append collision blocks to the resulting gif to make it match the target MD5
Instead collisions between files are demonstrated by appending bytes to two files until their hashes match.
(2015, on HN here, but without any significant traction: https://news.ycombinator.com/item?id=9003429)
The software used is "HashClash", seems it supports SHA1, too: https://marc-stevens.nl/p/hashclash/index.php
It's like a baby being born holding its completed birth certificate.
Of course this is trivial to set up (create album before, upload photos after), but I found it amusing nonetheless.
This sentence has three as, one b, two cs, two ds, thirty-six es, three fs, three gs, eleven hs, nine is, one j, one k, three ls, one m, eighteen ns, twelve os, one p, one q, eight rs, twenty-six ss, twenty ts, two us, five vs, seven ws, three xs, four ys and one z.
I eventually found the solution by iteratively updating the approximate distribution of each letter and finally sampling  - not sure if there is a better way!
 solve_2 in tsh.py at https://bitbucket.org/akxlr/tsh/src
I bet you can construct a language where a solution exists, but all of its neighbors' errors (using the topology implied by your gd function) are local maxima.
The basic idea instead is to treat each letter count as a random variable, and ask what the distribution of each count is. In particular, each letter count can be expressed as a sum of (a mapping of) the other letter counts, so if you know the distribution of all other letter counts you can improve the distribution of the letter of interest. Initially assume uniform for all counts, then iterate until convergence. The images in that repo show the distribution of some letters at various stages. After doing this for about 10 iterations you stop seeing any improvement (the letter distributions are as 'peaked' as they are going to get), at which point I drew samples until I found a solution.
The update function you wrote on the distribution space is continuous, and distribution space is compact (since there's no way to have more than N of any letter), so there is necessarily at least one fixed point.
Fixed points could still be sources though, right? It clearly wasn't, but I'm curious to know if you got lucky, or if you merely didn't get unlucky.
I wonder how effective your code would be on the following self descriptive sentences (base j).
For instance when j=2, "100 1,11 0" has 4 ones and 3 zeros as described.
Does your code consistently find solutions as you increase j? At what point does the computation become unfeasible?
I will endeavour to try your problem, is it related in some way to a well-known problem?
I wonder whether it reads its own source code file to detect where to put the spaces for the image, or is it all encoded as data in the ruby program itself? I suppose it's the latter (otherwise it wouldn't qualify as a quine), but I can't figure out how it is encoded.
https://github.com/mame/quine-relay/blob/master/src/code-gen... < VB
https://github.com/mame/quine-relay/blob/master/src/code-gen... < PHP
> md5sum md5.gif
Slightly embarrassing, but at least I wasn't `cat foo|md5sum`
$ echo hello > fileone
$ echo there > filetwo
$ md5sum fileone filetwo | tee /tmp/ck
$ md5sum -c /tmp/ck
$ echo oops > filetwo
$ md5sum -c /tmp/ck
md5sum: WARNING: 1 computed checksum did NOT match
$ wget -q -O- https://shells.aachen.ccc.de/~spq/md5.gif | md5sum
"The formula is a general-purpose method of decoding a bitmap stored in the constant k, and it could actually be used to draw any other image."
That makes it less of a self-referential formula in my opinion, and more of a cool formula that can print any bitmap.
Now, if the formula was also capable of drawing the constant k, THAT would be exciting.
Two "different" EXEs with same SHA1/MD5 with different outputs: https://roastingbugs.blogspot.com/2017/03/eat-more-hashes.ht...
Here's how a similar thing was done for PostScript: https://twitter.com/teh_gerg/status/838422647193157632
There is a chance that there are unknown flaws in the MD5 algorithm that make preimage attacks far easier to compute.
Our compute power is also growing cheeper, which would let us solve harder preimage problems.
It's still fine for non-cryptographic purposes, except that it's far slower than e.g. CRC.
So you could have two types: one where the hash matches the input bytes, another where an hex string is inputed and that matches the hex representation of the hash
I don't think any of the known weaknesses of MD5 would help you in finding MD5(x) = x.