There's a famous anecdote about human unshredding: when Iranian revolutionaries took over the US embassy in 1979, they captured the shredded remains of secret documents. They took those shreds to the carpet weavers Iran is so famous for, who manually rewove them into the original documents (see http://en.wikisource.org/wiki/Documents_seized_from_the_U.S....)
The German government is funding a project to algorithmically unshred the files that the Stasi (East Germany's secret police) destroyed in 1989, when the Berlin Wall came down and protesters started to occupy their buildings:
I thought the point of this exercise was going to be to produce an Instagram filter that creates that awesome 'shredded' effect as seen in those real shredded documents.
Turns out they are just splitting images up and rearranging them - not as interesting.
This is a pretty amazing story. Thanks for sharing! I wasn't inspired by this when I wrote it up, but it looks like it's not the first time it's crossed someone's mind.
This is already a solved problem, but the existing processes are hugely expensive; DARPA is looking to see if anyone has a good idea for making it cheap enough to be routine.
I knew I'd heard of a similar challenge. If I were a cynic, I'd say Instagram would take the winning solution and enter it in the DARPA challenge. Crowd-sourced competition entries!
I thought it would be a fun thing to do tonight, until I saw all the tips given at the end of the article. They basically give away the solution, so it's not fun anymore :(.
We can make it harder in many creative ways, and that would be fun, but I wrote this comment to express the disappointment I felt when I saw the post. I felt a bit like someone had shown me a candy but already eaten half of it.
It is still FizzBuzz++: they're not looking for computer vision researchers; they're looking for programmers who are competent enough to be worth their time interviewing. If you can go from hints in the direction of the solution to a) reading library documentation, b) coming up with an innovative solution to an only partially specified problem, and c) actually producing functioning code which implements your solution, you are head and shoulders above the hiring pool.
I agree with you, and I know you have some serious knowledge in this area, but seriously, reading HN makes me feel that this hiring pool would be filled only with programmers who could solve any type of code puzzle in their sleep.
I have to echo patio11 below. The point here isn't to find PhD-level image analysts; it's to find some relatively solid (Python) developers in fairly short order, while handing out advertising to people in the form of shirts. The problem isn't hard for people who end up reading Hacker News on a Friday night (and Veterans Day in the US, when some folks are off work altogether), but those are just the kind of people they're looking for in a tight tech labor market.
I noticed they did remove one of the functions they had listed in the hints at the bottom, possibly to make things a bit more challenging.
My first take on it was the following:
Take the average of the differences between each pixel and the pixel adjacent to it over each column and store that as, say, X[].
Find the n such that the sum of every nth column of X minus the sum of all other columns of X is maximised. That's the column width (a sketch of this step follows below).
Split into a series of columns, use stable matching to match columns based on the sum of differences between rightmost male pixel and leftmost female pixel over all rows.
Use stable marriage to give you a partial ordering, turn that into a complete ordering, and unscramble the image.
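A minimal sketch of the width-detection step, assuming NumPy and Pillow (neither is required by the challenge); I normalise by the number of seams so that narrower candidate widths don't win just by having more of them, and the file name is only illustrative:

    import numpy as np
    from PIL import Image

    img = np.asarray(Image.open("shredded.png").convert("L"), dtype=float)

    # X[c] = mean absolute difference between column c and column c+1
    X = np.abs(np.diff(img, axis=1)).mean(axis=0)

    def width_score(w):
        # Seams between strips of width w land at indices w-1, 2w-1, ... of X.
        seam_idx = np.arange(w - 1, len(X), w)
        return X[seam_idx].mean() - np.delete(X, seam_idx).mean()

    # Only widths that divide the image evenly are plausible.
    candidates = [w for w in range(8, img.shape[1] // 2 + 1)
                  if img.shape[1] % w == 0]
    best_width = max(candidates, key=width_score)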
"sum of differences between rightmost male pixel and leftmost female pixel over all rows"
I tried this, and it worked for a few test pictures I downloaded, but it fails for the given Tokyo panorama: the dark edges and sharp corners in some buildings confuse the algorithm. The matching must be done on somewhat more than pixel values alone...
I put in a sampling resolution, to only examine every nth row. For a number of pictures, n=50 was fine. One picture even allowed n=200. For the given image, n=2 would not even work; I had to sample every row.
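In code, the sampling is just a step parameter in the edge matcher. A rough sketch, assuming the strips are NumPy arrays of shape (height, width, channels); the names are illustrative:

    import numpy as np

    def edge_cost(left, right, step=1):
        # Sum of absolute differences between the right edge of `left`
        # and the left edge of `right`, sampling every `step`-th row.
        a = left[::step, -1, :].astype(float)
        b = right[::step, 0, :].astype(float)
        return np.abs(a - b).sum()

With step=50 you're only looking at 2% of the rows; as noted above, the Tokyo panorama needed step=1.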
The starting strip is, by definition, the strip that has no other strip placed to its left. There is no other way of determining what the starting strip is; it's not a given. The problem is that with my current pixel matcher (sum of absolute differences) one strip is wrongly attached. That made me think I'd also have to take other features into consideration, like color, hue and/or line detection. But that seems outside the scope of this challenge.
The algorithm works fine for some random pictures I found on the web. It's just not working for the Tokyo picture ;-(
My algorithm, given a strip, would tell you which strip was on the right and also gave a certainty. The last strip would have a "next strip" value shared with an earlier strip (thus wrongly attached) but with less certainty.
Edit: Or (worst case) the last strip would have a high degree of uncertainty.
That's what stable marriage is for. With some reasonable heuristic it should just give you the best partial ordering, which when flattened into a total ordering should give you the most likely picture.
This is exactly how I implemented my solution.
The worst case scenario is important because some images have last strips that have a unique "next strip" value.
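For the curious, here's roughly what that stable-marriage step can look like; a sketch rather than anyone's actual submission. It assumes a precomputed cost matrix (e.g. from a sum-of-absolute-differences matcher like the one upthread), runs Gale-Shapley with right edges proposing to left edges, and then flattens the result:

    def stable_order(cost):
        # Gale-Shapley: strip i's right edge "proposes" to strip j's
        # left edge, preferring low cost[i][j]. Returns the strips in
        # left-to-right order.
        n = len(cost)
        prefs = [sorted((j for j in range(n) if j != i),
                        key=lambda j: cost[i][j]) for i in range(n)]
        next_of = [None] * n   # strip matched to the right of i
        prev_of = [None] * n   # strip matched to the left of j
        pos = [0] * n          # next preference each proposer will try
        free = list(range(n))
        while free:
            i = free.pop()
            if pos[i] == len(prefs[i]):
                continue       # rejected by everyone: i is the last strip
            j = prefs[i][pos[i]]
            pos[i] += 1
            if prev_of[j] is None:
                prev_of[j], next_of[i] = i, j
            elif cost[i][j] < cost[prev_of[j]][j]:
                free.append(prev_of[j])   # j trades up; old partner is freed
                next_of[prev_of[j]] = None
                prev_of[j], next_of[i] = i, j
            else:
                free.append(i)
        if None not in prev_of:
            # Everyone matched: the permutation forms cycles, so break the
            # costliest link to get a path. (Assumes a single cycle, the
            # common case; multiple cycles would need more care.)
            worst = max(range(n), key=lambda i: cost[i][next_of[i]])
            prev_of[next_of[worst]] = None
            next_of[worst] = None
        order = [prev_of.index(None)]     # the strip with nothing to its left
        while next_of[order[-1]] is not None:
            order.append(next_of[order[-1]])
        return order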
I find it strange that they don't provide an interface that the desired program should implement. For example, "provide the input image as a command line parameter to the script / binary, write the output to out.jpg"
Because of this, is someone going to actually take the time to understand what assumptions people make to run the program? If there were an interface, they could just run a script to test submissions...
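Something like this would be enough, assuming Pillow; `unshred` and `solution` here are hypothetical stand-ins for whatever a submission actually implements:

    #!/usr/bin/env python
    # Suggested interface: input image as a command line argument,
    # result written to out.jpg. `unshred` is a hypothetical function.
    import sys
    from PIL import Image

    from solution import unshred   # hypothetical module doing the real work

    if __name__ == "__main__":
        image = Image.open(sys.argv[1])   # the shredded input
        unshred(image).save("out.jpg")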
The tips seem to describe just a crude pixel matcher. It seems much more elegant to transform the image into the frequency domain via FFT and find edges that way.
That is false. (No information is lost in the Fourier Transform.) However, I'm not sure how Fourier analysis would help in this case - image processing is not my forte. Perhaps there would be a spike at the spatial frequency where the discontinuities occur (given that the discontinuities are evenly spaced)?
EDIT: Given the slices, is it possible to order them properly entirely in the frequency domain?
You are correct: no information is lost in the Fourier Transform. What I meant is that it actually becomes harder, representationally, to compare localized amplitudes of the time domain signal once you're in the frequency domain. And typically, what you're looking at in the frequency domain will tell you nothing about location in the original signal.
Here's why:
When you do the Fourier Transform (or in this case, the Discrete Fourier Transform (DFT)) you turn your time domain function into a frequency domain function. This function resides on the complex plane, with a real and imaginary part.
Typically, you're doing the transform to get at the information in the magnitude, which represents the frequency information in the original time domain function. To get at the magnitude, you do: sqrt(Re^2 + Im^2).
Now, when you're looking at the magnitude, you have entirely lost location information about the original signal. Rather, you now have perfect frequency information. (Yes, combined with the phase, atan2(Im, Re), you can reconstruct the original signal.)
Here's an example. Imagine a time domain function x(n) where x(0) = 1 and x(n) = 0 everywhere else. This is just a unit spike at n = 0.
The magnitude of the DFT of that function is 1 everywhere! In the time domain, you have total localization of the signal at a single point. But in the frequency domain you have the opposite (a constant 1 everywhere)! Intuitively, this is because for sines and cosines to represent such a localized function, you have to add up a lot of them so that they cancel everywhere except at that one point.
tl;dr:
When you're comparing localized values in the time domain, the FT will most likely not be the tool to use. The FT gives you frequency information without spatial information, unless you take the phase into account. In this Instagram puzzle, the information about the pixel values at the edges of the strips has been "spread" across many frequency values in the frequency domain.
EDIT:
Another example is x(1) = 1 and x(n) = 0 elsewhere. This is just a time shifted version of the function given above. The magnitude of the DFT is still just a constant 1 everywhere. No difference! Only the phase differs.
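Both examples are easy to check numerically (NumPy assumed, purely for illustration):

    import numpy as np

    x0 = np.zeros(8); x0[0] = 1    # unit spike at n = 0
    x1 = np.zeros(8); x1[1] = 1    # the same spike shifted to n = 1

    F0, F1 = np.fft.fft(x0), np.fft.fft(x1)
    print(np.abs(F0))      # [1. 1. 1. 1. 1. 1. 1. 1.]
    print(np.abs(F1))      # [1. 1. 1. 1. 1. 1. 1. 1.]  -- identical magnitude
    print(np.angle(F0))    # all zeros
    print(np.angle(F1))    # a (wrapped) linear phase ramp: only the phase differs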
Now we are talking past each other ;) You're right, of course, which is why I said I'm not sure FT would be useful in this case. I was giving the grandparent the benefit of the doubt, however: it is at least conceivable to me that the evenly spaced discontinuities in the shredded image (due to the uniform slice width) could present as a frequency spike (or rather, a series of them) in the FT of some function of the input (the derivative, perhaps) in the same way that the FT of a Dirac comb is also a Dirac comb. Wild speculation, of course, because like I said - image processing is not my forte.
Isn't this extremely obvious? Couldn't you just take the first slice, calculate the total color difference between its right edge pixels and the corresponding left edge pixels of each of the other slices, and stick it next to the one with least difference, then repeat?
I mean, you'd need to specify some threshold to handle slices on the ends, and maybe use sum of (difference^2) instead of just the sum, but it shouldn't be hard at all. Am I missing something?
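In sketch form, reusing an edge_cost(left, right) matcher like the one upthread, and glossing over the end-of-row threshold (strip 0 is just assumed to be the leftmost):

    def greedy_order(strips):
        # Start from strip 0 and repeatedly append the unused strip whose
        # left edge best matches the current right edge. An early mistake
        # can never be undone.
        order, remaining = [0], set(range(1, len(strips)))
        while remaining:
            cur = order[-1]
            best = min(remaining,
                       key=lambda j: edge_cost(strips[cur], strips[j]))
            order.append(best)
            remaining.remove(best)
        return order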
While this problem is indeed easy, what you described is a greedy algorithm: it won't necessarily give you the optimal solution and, specifically, could give you a different solution if you shuffled the shreds, which shouldn't happen.
Hint: very often, when you think "isn't it extremely obvious?", you've come up with a greedy algorithm and it's not the right one. :)
That would be slow if you didn't know the slice size (the bonus part of the question), but no, you're not missing anything for the basic solution. That said, it would be O(n^2) since you're comparing every slice to every other slice.
The general problem is fairly easily isomorphic to traveling salesman, so if you can get even O(n^2) you're already making simplifying assumptions that mean you can't solve the general case. Further, with n=20, I wouldn't usually worry too much about asymptotic performance unless you're in exponential territory.
Actually, on second thought, this isn't quite so easily isomorphic to traveling salesman (any particular "order shredded strips" problem is easily equivalent to a traveling salesman problem, but I'm not positive that an arbitrary traveling salesman problem can be re-cast as an "order shredded strips" problem in a consistent manner, so I shouldn't make that claim). So I retract my statement, it's quite possible that someone could effectively solve this problem for reasonable inputs without quadratic runtime (though I'm not positive exactly how you would do it, and the efficacy of your method would likely depend on your assumptions about the inputs).
Interesting. A 90 degree phase change makes a huge difference.
> For maximum security, documents should be shredded so that the words of the document go through the shredder horizontally (i.e. perpendicular to the blades). Many of the documents in the Enron Accounting scandals were fed through the shredder the wrong way, making them easier to reassemble.
For anyone who's thinking of using Ruby, I recommend OilyPNG instead of RMagick. It's a C extension to ChunkyPNG and is much faster if you don't need all of the fancy filters that come with ImageMagick.