
Image unshredder by simulated annealing (2015) - nayuki
https://www.nayuki.io/page/image-unshredder-by-annealing
======
foob
This is a fun demo but, as others have mentioned, simulated annealing really
isn't necessary, or even that appropriate, for this specific problem. I threw
together the same demo with a very trivial algorithm that I think does a more
accurate job of reconstructing the same images in significantly less time than
the simulated annealing [1]. The algorithm consists of first finding the
closest matching pair of pixels as a seed. It then finds the remaining column
that most closely matches one of the current edges and adds it to the
reconstructed columns. This then repeats until there are no remaining columns.
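
A minimal sketch of the greedy procedure described above, not the code behind the linked demo. It assumes each column is a list of grayscale pixel values and that "closeness" is the sum of absolute differences; the names `column_diff` and `greedy_unshred` are made up for illustration.

```python
from itertools import combinations


def column_diff(a, b):
    """Sum of absolute pixel differences between two columns (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def greedy_unshred(columns):
    """Return column indices in reconstructed order (possibly mirrored)."""
    n = len(columns)
    if n < 2:
        return list(range(n))
    # Seed: the closest-matching pair of columns.
    i, j = min(combinations(range(n), 2),
               key=lambda p: column_diff(columns[p[0]], columns[p[1]]))
    order = [i, j]
    remaining = set(range(n)) - {i, j}
    while remaining:
        # Best remaining candidate for each of the two current edges.
        left = min(remaining,
                   key=lambda k: column_diff(columns[k], columns[order[0]]))
        right = min(remaining,
                    key=lambda k: column_diff(columns[k], columns[order[-1]]))
        left_cost = column_diff(columns[left], columns[order[0]])
        right_cost = column_diff(columns[right], columns[order[-1]])
        # Attach whichever candidate matches its edge more closely.
        if left_cost <= right_cost:
            order.insert(0, left)
            remaining.remove(left)
        else:
            order.append(right)
            remaining.remove(right)
    return order
```

The seed search alone compares every pair of columns, and each growth step rescans the remaining columns, so the whole thing is roughly quadratic in the number of columns. The result can come out mirrored, since nothing distinguishes left from right.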

On a different note, if this quote:

> _The unscrambling problem being solved here is much harder than a typical
> unscrambling problem, because we are rearranging single columns of pixels._

is referring to reconstructing shredded strips of paper then it is laughably
false. The single columns of pixels are extremely clean and correspond very,
very closely to their neighboring columns. In reality, the tearing of the
paper makes the edges not match very well and it's just much noisier in
general. I've worked on this problem before and it's many orders of magnitude
more challenging than the reconstruction that is done in this demo.

[1]
[http://sangaline.com/blog/image_unshredder/](http://sangaline.com/blog/image_unshredder/)

~~~
nayuki
Thanks for the helpful critique. Your proposed algorithm is reasonable and
much, much faster than simulated annealing, and I appreciate the live demo. It
looks like your algorithm is deterministic (except for tie-breaking of a pair
of columns that have the same difference?), so it doesn't matter which way the
image is shuffled.

You are also correct that I overstated how hard the problem is; in retrospect
it isn't that hard at all to unshred clean digital images (as opposed to paper
scans).

On your web page:

> It takes roughly the same amount of time to run but correctly reconstructs
> all of the images.

This is not true, but is very close to correct. The image "Blue Hour in Paris"
has some disturbances in the lower part of the sky, "Alaska Railroad" has a
single wrong column on one side of the image, and "Abstract Light Painting"
sometimes has a few incorrect columns on the edge. At least that's what I
could spot by eye at a glance; I didn't rigorously compare them against the
original images.

~~~
foob
First off, thanks a lot for posting this and you did a great job with it! It
was fun to play around with and it obviously inspired me enough to dig into
the code and try my own variation.

You're right about them not matching exactly; I did it in a hurry and wasn't
looking closely enough. I removed that incorrect statement (though it does
seem to generally outperform the simulated annealing).

The tie-breaking for pairs with equal differences and the parity of the image
are essentially random. It is deterministic in the code, because it won't
replace a candidate unless it's strictly better, but it's totally arbitrary
which is found first.

------
jcoffland
You could improve this greatly by adding another type of change to the
annealing function. Instead of just swapping columns sometimes (i.e. at
random) it should evaluate flips. A flip is performed by choosing two columns
at random and then reversing the order of all the columns between them.
Annealing algorithms in general benefit from a diverse set of change
functions.
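
A rough sketch of such a flip move, assuming the energy is the sum of absolute pixel differences between adjacent columns (the helper names here are hypothetical). Because that difference is symmetric, reversing a range only changes the two edges at its boundaries, so the energy change can be computed without rescanning the image.

```python
import random


def column_diff(a, b):
    """Sum of absolute pixel differences between two columns (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def energy(columns, order):
    """Total difference between adjacent columns in the given order."""
    return sum(column_diff(columns[order[k]], columns[order[k + 1]])
               for k in range(len(order) - 1))


def pick_range(n, rng=random):
    """Choose two distinct positions at random and return them as (i, j), i < j."""
    i, j = sorted(rng.sample(range(n), 2))
    return i, j


def reversal_delta(columns, order, i, j):
    """Energy change from reversing order[i..j] inclusive (requires i <= j)."""
    delta = 0
    if i > 0:  # edge between the range and its left neighbour
        before = columns[order[i - 1]]
        delta += (column_diff(before, columns[order[j]])
                  - column_diff(before, columns[order[i]]))
    if j < len(order) - 1:  # edge between the range and its right neighbour
        after = columns[order[j + 1]]
        delta += (column_diff(columns[order[i]], after)
                  - column_diff(columns[order[j]], after))
    return delta


def apply_reversal(order, i, j):
    """Reverse order[i..j] in place."""
    order[i:j + 1] = order[i:j + 1][::-1]
```

An annealing loop would call `pick_range`, evaluate `reversal_delta`, and call `apply_reversal` only if the move is accepted.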

~~~
nayuki
I really like your idea of mirroring a random range of columns. I would
consider implementing it the next time I have time to work on the code.

~~~
jaggederest
I think another optimization would be to 'grow' the regions considered.
Instead of treating every column as a separate object, have the size of
regions considered vary with the temperature. The metaphor would be crystal
formation within annealed metal.

------
danbruc
Why simulated annealing? Wouldn't it be much more efficient to just compare
all pairs of columns and then sort them according to similarity? That would take
about 360,000 comparisons instead of about a billion iterations for an image 600
pixels wide.

~~~
contravariant
If you have the difference between pairs of columns and try to find an order
such that the total difference between adjacent columns is minimal, you've
basically got a version of the travelling salesman problem.

~~~
robinhouston
Although the travelling salesman problem is NP-hard, and it’s possible to
construct hard-to-solve instances, there are algorithms that work very well
for most instances.

I just tried using LKH[0] to reconstruct these scrambled images, and it does
it perfectly in a fraction of a second.

[0]
[http://webhotel4.ruc.dk/~keld/research/LKH/](http://webhotel4.ruc.dk/~keld/research/LKH/)

~~~
nayuki
Interesting experiment! It took me a minute to understand why you mentioned
the TSP, but then I realized that you can model each column as a destination
and the pixel difference between two columns as the distance. Once I got that,
I can certainly appreciate that a state-of-the-art TSP solver will solve the
scrambled images perfectly in no time. =)
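
A sketch of that modelling step, under assumptions of my own: columns are lists of grayscale values, the distance is the sum of absolute differences, and a dummy city at zero distance from every column is added so that a closed tour can be cut into an open left-to-right ordering. The dummy city is a common trick for turning a path problem into a tour problem; it is not necessarily what the posted code does. The function names are hypothetical.

```python
def column_diff(a, b):
    """Sum of absolute pixel differences between two columns (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def build_tsp_matrix(columns):
    """Return an (n+1) x (n+1) symmetric distance matrix.

    Cities 0..n-1 are the columns. City n is a dummy at distance 0 from
    every column, so the tour's closing edge costs nothing.
    """
    n = len(columns)
    dist = [[0] * (n + 1) for _ in range(n + 1)]
    # n*(n-1)/2 unique pairs: 179,700 for a 600-column image.
    for i in range(n):
        for j in range(i + 1, n):
            d = column_diff(columns[i], columns[j])
            dist[i][j] = dist[j][i] = d
    return dist


def tour_to_order(tour, n):
    """Cut a closed tour at the dummy city n to get a column ordering."""
    k = tour.index(n)
    return tour[k + 1:] + tour[:k]
```

The matrix would then be handed to whichever TSP solver is used, and the returned tour converted back with `tour_to_order`.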

~~~
robinhouston
I just posted my code, in case anyone is interested in playing with it.

[https://github.com/robinhouston/image-unshredding](https://github.com/robinhouston/image-unshredding)

~~~
nayuki
Discussion:
[https://news.ycombinator.com/item?id=12670997](https://news.ycombinator.com/item?id=12670997)

------
08-15
It doesn't work, plain and simple.

Never mind what many others have said, that a greedy algorithm might work,
that you can reduce it to TSP and apply heuristics, etc. The application of
simulated annealing is also crippled: Suppose you reconstructed two slices of
the image. To improve it, you have to take a column from one slice and move it
to the other---but that doesn't lower the energy! What would lower the energy
is to take one of the slices and attach it to the other, possibly flipping it
in the process. So while it says "simulated annealing" on the lid, it's a
random walk for the most part.

The idea is workable, but it needs a different primitive operation: pick a
_range_ of columns, attach it in a different spot, possibly flipping it.
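
A small sketch of that primitive, with hypothetical names and the assumed sum-of-absolute-differences energy. It also replays the two-slice situation described above: the slices are each internally correct, and a single range move with a flip joins them and lowers the energy.

```python
def column_diff(a, b):
    """Sum of absolute pixel differences between two columns (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def energy(columns, order):
    """Total difference between adjacent columns in the given order."""
    return sum(column_diff(columns[order[k]], columns[order[k + 1]])
               for k in range(len(order) - 1))


def move_segment(order, i, j, dest, flip=False):
    """Cut out order[i..j], optionally reverse it, and reinsert it.

    `dest` is the insertion position within the list that remains after
    the segment has been removed. Returns a new list.
    """
    segment = order[i:j + 1]
    if flip:
        segment = segment[::-1]
    rest = order[:i] + order[j + 1:]
    return rest[:dest] + segment + rest[dest:]


# Two reconstructed slices of a smooth gradient; the second one is mirrored.
columns = [[c * 10 + r for r in range(4)] for c in range(6)]
two_slices = [3, 4, 5, 2, 1, 0]
joined = move_segment(two_slices, 3, 5, 0, flip=True)
```

Here `joined` is the fully ordered image, and `energy(columns, joined)` is lower than `energy(columns, two_slices)`, which no single-column move from `two_slices` achieves in one step.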

------
retreatguru
It looks like it might work best as a two-step process: a first pass to connect
the major chunks, then find the edges and anneal those chunks.

~~~
simcop2387
Along with flipping chunks horizontally if no good solution is found. Right
now some chunks end up mirrored.

------
pablobaz
I'd be interested to see how a genetic algorithm with carefully chosen
mutation/recombination operators would suit this problem.

Mutations could be flips and random swaps. Recombination would be combining
chunks of already partially optimised solutions (breeding).

Genetic algorithms do suffer the cost of having to keep a population of
solutions in memory, so it would be interesting to see how the end results
compare in performance to annealing.

~~~
jobigoud
> carefully chosen mutation/recombination operators

A job for a meta-genetic algorithm maybe? Where the mutation operators are
themselves evolved.

------
wwwtyro
Would like to see this applied to a page of text.

~~~
smcmurtry
And to a bag of slices from multiple pages of text.

------
Luc
Some real-world unshredding by the Fraunhofer Institute:

[http://www.ipk.fraunhofer.de/fileadmin/user_upload/IPK_FHG/p...](http://www.ipk.fraunhofer.de/fileadmin/user_upload/IPK_FHG/publikationen/themenbroschueren/at_vReko_en.pdf)

Shredded secret service files, fragments of papyri or frescos...

~~~
SyneRyder
The BBC also did a good story on Fraunhofer's E-Puzzler being used to piece
together shredded Stasi documents:

[http://www.bbc.com/news/magazine-19344978](http://www.bbc.com/news/magazine-19344978)

------
ttul
Simulated annealing is super cool. Back at the dawn of time, my colleague and
I used this technique to place and route transistors for a project in our VLSI
course. It was the most fun.

------
sam4ritan
There are still a lot of destroyed documents being stored around the world
(say, from the Stasi for example). I would love to see this applied to them.
It might shed some light on things supposedly lost to history.

~~~
yiyus
Some way to fix the image would certainly help, but you still need to find a
way to scan all the parts and put them in the right direction.

Moreover, some of those documents may have been destroyed for very valid
reasons.

~~~
diggan
Sliding a bit off topic but what would be a valid reason for them to have been
destroyed? Other than masking history, which I'm sure is valid for some, for
most it's not a great thing.

~~~
throwanem
The internal security agencies of repressive governments are very good in
general at collecting people's secrets for use as weapons. There's no
potential for harm that I can see in destroying such information, and
considerable potential for harm in recovering it.

------
catscratch
Didn't work for me either time I tried.

~~~
nayuki
You are right; with the current algorithm it is quite difficult to fully
unscramble the image into one single strip. For now, the best you can do is
get a bunch (~20) of strips after annealing, and manually flip and rearrange
them (in an image editor) to get back the original image.

The default temperature of 4000 is fine for all the example images at 600×400.

The default 30 million iterations leads to a reasonable run time of about 10
seconds, but leaves many small strips after annealing. Bumping up the number
of iterations to 300 million, 3000 million, etc. will take proportionally
longer time, and give marginally better results.

~~~
mdonahoe
Based on the description in the webpage, I expected it to work on the default
image with default parameters.

------
45h34jh53k4j
Thank you for the demo. This helped me understand the recent quantum computing
(quantum annealing) process a little more concretely.

What I saw here was a scrambled image being sorted to find a local minimum of
entropy in pixel color value between respective stripes. A quantum annealing
computer is doing this, where the image is the problem space (scrambled because
we don't have the solution), and the quantum computer rearranges this to find
a global minimum/maximum.

Where a QC is not useful would be when the problem space is too big (image too
big), or it can only find a local minimum that isn't the solution to the
classical problem (picture is still objectively scrambled after annealing).

Someone please comment if my understanding is wrong.

------
code_sardaukar
Awesome, but please put the original energy somewhere. It's not clear whether
the final configuration actually has a lower or higher energy than the
original image, which would be interesting to know.

------
nayuki
> The unscrambling problem being solved here is much harder than a typical
> unscrambling problem, because we are rearranging single columns of pixels.
> As a consequence of this design, it is impossible to resolve the horizontal
> reflection symmetry ambiguity – every image has exactly the same energy
> score as its mirror image.

> In a more typical problem, the image would be sliced into groups of columns,
> where each group moves as a unit. This situation would reduce the number of
> pieces to unscramble, and also make the arrangement be unambiguous to
> horizontal flips.

------
bragh
IIRC, the best thing to do was to apply clustering to the rows/columns
(k-means and hierarchical both worked, but I think it was hard to find the
appropriate tree height for hierarchical clustering) and then to apply
seriation:
[http://innar.com/Liiv_Seriation.pdf](http://innar.com/Liiv_Seriation.pdf)

------
Beltiras
Couldn't you do some feature detection on the edges of the resulting columns
and repeat the annealing to get a better end-product?

------
yincrash
Is there a definition of temperature that someone can share that would be
relevant in this context?

~~~
corpus
probability of bouncing out of a local minimum
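
To make that concrete, here is a sketch of the standard Metropolis acceptance rule used in textbook simulated annealing. Whether the demo uses exactly this formula is an assumption; the function names are made up for illustration.

```python
import math
import random


def acceptance_probability(delta, temperature):
    """Probability of accepting a move that changes the energy by `delta`.

    Improving moves (delta <= 0) are always accepted. Worsening moves are
    accepted with probability exp(-delta / temperature), so a higher
    temperature makes it easier to climb out of a local minimum.
    """
    if delta <= 0:
        return 1.0
    if temperature <= 0:
        return 0.0  # fully cooled: only improving moves pass
    return math.exp(-delta / temperature)


def accept(delta, temperature, rng=random):
    """Randomly decide whether to accept the move."""
    return rng.random() < acceptance_probability(delta, temperature)
```

With this rule, a move that raises the energy by an amount equal to the temperature is accepted about 37% of the time (`exp(-1)`), and the chance shrinks toward zero as the temperature is lowered over the run.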

