Hacker News new | past | comments | ask | show | jobs | submit login
Instagram Engineering Challenge: The Unshredder (instagram-engineering.tumblr.com)
156 points by mikeyk on Nov 11, 2011 | hide | past | favorite | 66 comments

There's a famous anecdote about human unshredding: when the Iranian revolutionary took over the US embassy in 1979, they captured the shredded remains of secret documents. They took those shredding to the carpet weavers Iran is so famous of, who manually rewoven the shreds into the original documents (See http://en.wikisource.org/wiki/Documents_seized_from_the_U.S....)

The German government is funding a project to algorithmically unshred the files that the Stasi (East Germany's secret police) destroyed in 1989 when the Berlin wall came down and protesters started to occupy their buildings:


Yeah it was fascinating, there's a whole article on it from Wired:


I thought the point of this exercise was going to be to produce an Instagram filter that creates that awesome 'shredded' effect as seen in those real shredded documents.

Turns out they are just splitting images up and rearranging them - not as interesting.

(see for eg. http://en.wikisource.org/wiki/File:Espionage_den02_79.png)

This is a pretty amazing story. Thanks for sharing! I wasn't inspired by this when I wrote it up, but it looks like it's not the first time it's crossed someone's mind.

Interestingly enough, DARPA is running a similar challenge (their prize is a lot better): http://www.darpa.mil/shredder_splash.aspx#Splash

This is already a solved problem, but the existing processes are hugely expensive, darpa is looking to see if anyone has a good idea on making it cheap enough to be routine.

My first thought was that not only they were inspired by that one but they're trying to parody it.

I knew I'd heard of a similar challenge. If I was a cynic, I'd say instagram would take the winning solution and enter it in the DARPA challenge. Crowd-sourced competition entries!

The DARPA challenge is much more complicated, the Instagram challenge (which is accompanied by a solution!) is child's play.

Though the apparent ease of the latter could potentially inspire some brilliant approach that ends up being the basis for a solution to the former.

Now that, my friend, seems to be an actual challenge.

I thought it would be a fun thing to do tonight until I saw all the tips given at the end of the article. They basically give the solution, it's not funny anymore :(.

So make it more fun:

* Support crosscut shredders (shredded in both directions).

* Make it work even if the shreds aren't all the same size.

* Make it work when there are some pieces missing.

* Show where those missing pieces would probably go.

* Make it work when two images' shreds are mixed together.

* (with missing pieces)

* (generalized to N images)

* Predict what the missing pieces might look like (this one is much tougher).

* Think of ways you could cheat (like searching Tineye for the fragments to find the original image, if it was originally online somewhere).

You can keep going as long as you want. If it's too easy, just make it harder.

Edited because HN doesn't support Markdown or linebreaks. :(

If it's too easy, just make it harder.


Wa can make it harder in many creative ways, and that would be fun, but I wrote this comment to express the feeling of disappointment I had when I saw the post. I felt a little bit like someone showed me a candy but already ate half of it.

* write a plugin for photoshop or gimp that will select the arbitrary shreds.

Does that exist already?

Implement zoom and enhance so we can see what all the people in those buildings are up to.

Also: make it work for shreds at any angle and overlapping, as if you dumped the pieces on a flatbed scanner and hit 'scan'.

Another idea: combine it with a file recovery or computer forensic program to stitch jpeg file fragments together.

Agreed. Showing how to use PIL, fine. But if they really intended to use this as a gate, then showing the key to a solution -- what's the point?

It is still Fizzbuzz++: they're not looking for computer vision researchers. They're looking for programmers who are competent enough to be worth their time interviewing. If you can go from hints in the direction of the solution to a) reading library documentation, b) coming up with an innovative solution to an only-partially specified problem, and c) actually producing functioning code which implements your solution, you are heads and tails above the hiring pool.

I have a serious problem with this.

I agree with you, and I know that you have some serious knowledge about that, but seriously, reading HN makes me feel that this hiring pool would be filled only with programmers that could solve any type of code-puzzle in their sleep.

This particular problem isn't a puzzle. It really is a Fizzbuzz for image processing. No qualified applicant will fail to implement this.

I have to echo patio11 below. The point here isn't to find PhD-level image analysts, it's to find some relatively solid (Python) developers in fairly short order, while handing out advertising to people in the form of shirts. The problem isn't hard for people who end up reading Hacker News on a Friday night (and Veteran's Day in the US, when some folks are off work altogether), but this is just the kind of people they're looking for in a tight tech labor market.

I noticed they did remove on of the functions they had listed in the hints at the bottom, possibly to make things a bit more challenging.

Warning, spoiler?

My first take on it was the following: Take the average of the differences between each pixel and the pixel adjacent to it over each column and store that as, say, X[]. Find some maximum number such that the sum of every nth column of X - the sum of all other rows of X is maximised. That's the column width. Split into a series of columns, use stable matching to match columns based on the sum of differences between rightmost male pixel and leftmost female pixel over all rows. Use stable marrage to give you a partial ordering, turn into a complete ordering and unscramble the image.

Any better/more elegant solutions?

"sum of differences between rightmost male pixel and leftmost female pixel over all rows"

I tried this, and it worked for a few test pictures I downloaded, but it fails for the given Tokyo panorama: the dark edges and sharp corners in some buildings confuse the algorithm. The matching must be done on somewhat more than pixel values alone...

EDIT: See how my implementation fails: http://www.michielovertoom.com/incoming/TokyoPanoramaUnshred...

I put in a sampling resolution, to only examine every nth row. For a number of pictures, n=50 was fine. One picture even allowed n=200. For the given image, n=2 would not even work; I had to sample every row.

Yeah, there are a whole bunch of things you could try, but i think a standard Gaussian blur on the edges would solve your problem.

Can you post some of your test pictures?

Re: your attempt: I think you just need to find the starting strip, looks like your algorithm is working.

The starting strip is, by definition, the strip on which no other strip has been placed left of. There is no other way of determining what the starting strip is; it's not a given. The problem is that with my current pixel matcher (sum of absolute differences) one strip is wrongly attached. That made me think I also have to take other features in consideration, like color, hue and/or line detection. But that seems outside the scope of this challenge.

The algorithm works fine for some random pictures I found on the web. It's just not working for the Tokyo picture ;-(

Minor spoiler/hint:

My algorithm, given a strip, would tell you which strip was on the right and also gave a certainty. The last strip would have a "next strip" value shared with an earlier strip (thus wrongly attached) but with less certainty.

Edit: Or (worst case) the last strip would have a high degree of uncertainty.

That's what stable marriage is for. With some reasonable heuristic it should just give you the best partial ordering, which when flattened into a total ordering should give you the most likely picture.

How do you calculate your certainty?

Currently I'm using a matrix to determine adjacency, but I'll give this graph idea a try. Thanks for the tip!

PS. Have you actually realized a succesful implementation? (Just curious)

No problem! Yes, I have this working:


It also works on the test images I created:

http://webfuel.org/a.png http://webfuel.org/b.png http://webfuel.org/c.png

This is exactly how I implemented my solution. The worst case scenario is important because some images have last strips that have a unique "next strip" value.

I find it strange that they don't provide an interface that the desired program should implement. For example, "provide the input image as a command line parameter to the script / binary, write the output to out.jpg"

Because of this, is someone going to actually take the time to understand what assumptions people make to run the program? If there was an interface, they could just run a script to test submissions...

The tips seem to be just a crude pixel matcher. Seems much more elegant to transfer the image into the frequency domain via FFT and find edges that way.

No way! You'd lose all spatial information in the frequency domain.

That is false. (No information is lost in the Fourier Transform.) However, I'm not sure how Fourier analysis would help in this case - image processing is not my forte. Perhaps there would be a spike at the spatial frequency where the discontinuities occur (given that the discontinuities are evenly spaced)?

EDIT: Given the slices, is it possible to order them properly entirely in the frequency domain?

You are correct: no information is lost in the Fourier Transform. What I meant is that it actually becomes harder representationally to compare localized amplitudes of the time domain information in the frequency domain. And typically, what you're looking at the frequency domain will tell you nothing about location data in the original frequency.

Here's why:

When you do the Fourier Transform (or in this case, the Discrete Fourier Transform (DFT)) you turn your time domain function into a frequency domain function. This function resides on the complex plane, with a real and imaginary part.

Typically, you're doing the transform to get at the information in the magnitude, which represents the frequency information in the original time domain function. To get at the magnitude, you do: sqrt(Re^2 + Im^2).

Now, when you're looking at the magnitude, you have lost location information entirely about the original signal. Rather, you now have perfect frequency information. (yes, combined with the phase: atan(Re/Im), you can reconstruct the original signal).

Here's an example. Imagine a time domain function x(n) where x(0) = 1, and x(n) = 0 everywhere else. This is just a spike at 1.

The magnitude of the DFT of that function is f(n) = 1 everywhere! In the time domain, you have total localization of the signal all in one point. But, in the frequency domain you have the opposite (constant 1 everywhere)! Intuitively, the reason for this is because in order for sines and cosines to represent such a localized function, you have to add up a lot of them in order to cancel each other out and produce such a localized signal.


When you're comparing localized values in the time domain, the FT will most likely not be the tool to use. The FT gives you frequency information, without spatial information unless you take into account the phase. In this instagram puzzle example, the information about the pixel values at the edges of the strips have now been "spread" across many frequency values in the freq domain.


Another example is x(1) = 1 and x(n) = 0 elsewhere. This is just a time shifted version of the function given above. The magnitude of the DFT is still just a constant 1 everywhere. No difference! Only the phase differs.

Now we are talking past each other ;) You're right, of course, which is why I said I'm not sure FT would be useful in this case. I was giving the grandparent the benefit of the doubt, however: it is at least conceivable to me that the evenly spaced discontinuities in the shredded image (due to the uniform slice width) could present as a frequency spike (or rather, a series of them) in the FT of some function of the input (the derivative, perhaps) in the same way that the FT of a Dirac comb is also a Dirac comb. Wild speculation, of course, because like I said - image processing is not my forte.

Isn't this extremely obvious? Couldn't you just take the first slice, calculate the total color difference between its right edge pixels and the corresponding left edge pixels of each of the other slices, and stick it next to the one with least difference, then repeat?

I mean, you'd need to specify some threshold to handle slices on the ends, and maybe use sum of (difference^2) instead of just the sum, but it shouldn't be hard at all. Am I missing something?

While this problem is indeed easy, what you described is a greedy algorithm and won't give you the most optimal solution and, specifically, could give you a different solution if you shuffled the shreds, which shouldn't be happening.

Hint: very often, when you think "isn't it extremely obvious?", you've come up with a greedy algorithm and it's not the right one. :)

That would be slow if you didn't know the slice size (the bonus part of the question), but no, you're not missing anything for the basic solution. That said, it would be O(n^2) since you're comparing every slice to every other slice.

The general problem is fairly easily isomorphic to traveling salesman, so if you can get even O(n^2) you're already making simplifying assumptions that mean you can't solve the general case. Further, with n=20, I wouldn't usually worry too much about asymptotic performance unless you're in exponential territory.

Actually, on second thought, this isn't quite so easily isomorphic to traveling salesman (any particular "order shredded strips" problem is easily equivalent to a traveling salesman problem, but I'm not positive that an arbitrary traveling salesman problem can be re-cast as an "order shredded strips" problem in a consistent manner, so I shouldn't make that claim). So I retract my statement, it's quite possible that someone could effectively solve this problem for reasonable inputs without quadratic runtime (though I'm not positive exactly how you would do it, and the efficacy of your method would likely depend on your assumptions about the inputs).

In terms of optimization, it is always better to put a solution together in pieces rather than all at once.

Yes, I meant that after combining two, treat them as a new slice, and repeat until there is only one slice.

I did just that, but the algorithm fails on the provided Tokyo picture. It works on other pictures, tho.

Isn't this like the bunny slope of Darpas shredder challenge?

Interesting. A 90 degree phase change makes a huge difference

> For maximum security, documents should be shredded so that the words of the document go through the shredder horizontally (i.e. perpendicular to the blades). Many of the documents in the Enron Accounting scandals were fed through the shredder the wrong way, making them easier to reassemble.

Seems like they're trying to fix the rolling shutter effect on iPhones or shredded photos the iPhone takes sometimes.

For anyone who's thinking of using Ruby I recommend OilyPNG instead of RMagick. It's a c extension to ChunkyPNG and is much faster if you don't need all of the fancy filters that come with ImageMagick.


Looks like I was able to do it. I'm not sure if they will be happy to get a result in C#, though.

Also, not as easy as I thought it would be. A friend had to give me some help with the math to match the stripes.

Edit: How long did you guys take to figure this out? I took nearly 4 hours.

You know you're done if you understand what this image means (and how it was made): http://i.imgur.com/Kn7Za.png

Although Instagram promised a brand-spankin’ new Instagram shirt if you complete the challenge correctly, they are only sending out stickers.

I hope they are not another Zynga :).

Did anyone manage to match the 12th shred next to 6th ? (zero index). That white and black building is really messing it up.

Ok. Got it working... See demo http://vidoss.github.com/


It says in the post you win an Instagram T-shirt, as well as an interview at Instagram if you want one.

Some of those Instagram filters look like they make the job easier.

Doesn't DARPA have a similar challange?

in the sample they provided shreds seem to be 25 pix and not 32

You want to use the linked image [1], not the one they show in the post.

[1] http://instagram-static.s3.amazonaws.com/images/TokyoPanoram...

and this is why I need to start learning python.........

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact