Hacker News
Where's Waldo? (stackoverflow.com)
409 points by bkaid on Dec 18, 2011 | 30 comments

This is a toy example of the kind of problem that the field of Computer Vision is actively working on: object detection. In a (tiny) nutshell, our best answer for general images and objects is:

1) Instead of using the full color pixel image, use an "edge image" with some simple additional normalizations. If color is important, do this per color channel.

2) Create a dataset with as many cropped examples of the target object as you can find (Mechanical Turk is useful for annotating large datasets); every other crop of every image is a negative example.

3) Train a classifier (SVM if you want it to work, neural network if you're so inclined) using this dataset.

4) Apply the classifier to all subwindows of a new image to generate hypotheses of the target object location. This can be sped up in various ways, but this is the basic idea.

5) Post-process the hypotheses using context (this can be as simple as keeping the most confident hypothesis within a neighborhood).
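
Steps 4 and 5 can be sketched roughly like this. It is a toy illustration under stated assumptions, not a real detector: `score_fn` stands in for whatever classifier step 3 produces, and the function names, stride, threshold and radius are all made up for the example.

```python
def sliding_windows(height, width, win, stride):
    """Yield (top, left) corners of every win x win subwindow at the given stride."""
    for top in range(0, height - win + 1, stride):
        for left in range(0, width - win + 1, stride):
            yield top, left


def crop(image, top, left, win):
    """Return the win x win crop of a 2-D list-of-lists image."""
    return [row[left:left + win] for row in image[top:top + win]]


def detect(image, score_fn, win, stride=1, threshold=0.5):
    """Step 4: score every subwindow and keep those at or above the threshold."""
    height, width = len(image), len(image[0])
    hypotheses = []
    for top, left in sliding_windows(height, width, win, stride):
        score = score_fn(crop(image, top, left, win))
        if score >= threshold:
            hypotheses.append((score, top, left))
    return hypotheses


def non_max_suppression(hypotheses, radius):
    """Step 5 (simplest form): keep only the most confident hypothesis
    within each neighborhood of the given radius."""
    kept = []
    for score, top, left in sorted(hypotheses, reverse=True):
        if all(abs(top - t) > radius or abs(left - l) > radius for _, t, l in kept):
            kept.append((score, top, left))
    return kept


# Toy usage: the "classifier" is just the mean brightness of the crop
# (a hypothetical stand-in for a trained SVM or neural network).
def mean_brightness(patch):
    values = [v for row in patch for v in row]
    return sum(values) / len(values)


image = [[0.0] * 8 for _ in range(8)]
for r in range(2, 4):
    for c in range(5, 7):
        image[r][c] = 1.0  # a bright 2x2 "object"

hits = non_max_suppression(detect(image, mean_brightness, win=2), radius=2)
```

Here the windows that only half-overlap the bright square also pass the threshold, and the neighborhood step collapses them into the single best hypothesis at row 2, column 5.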

If you're interested in object detection, an excellent summary of the past decade of research is due to Kristen Grauman and Bastian Leibe: http://www.morganclaypool.com/doi/abs/10.2200/S00332ED1V01Y2... (do some googling if you don't have access to this particular PDF).

A cool paper from a few months ago that should be mentioned when commenting on a post called "Where's Waldo?" is http://www.cs.washington.edu/homes/rahul/data/WheresWaldo.ht...

Heh, I started reading this comment and was ready to jump on something I disagreed with, but remarkably we're in full agreement!

Somehow I'm always surprised when two vision people agree on the right way to approach a problem =)

Something unrelated but perhaps interesting to some people: "Waldo" is actually a localised name for the USA and Canada; his original name is Wally.


It brings me an almost indescribable joy to find that Wally is the original name. Yet I have no idea why.

Waldo always seemed a bit of a strange name, and it still confuses me why it would be changed for the US market. Anyone know why? (Wiki doesn't say.)

Gone with Where is Waldo?! The all new meta-existential gamebook Why is Waldo? is out now in your nearest bookshop!

I felt dirty with all the exclamation marks.

Then you may cry when I tell you that his name is Holger in Denmark.

In Norway it's Willy. Kind of strange with all the different names for the same character.

Why is it strange? "Wally" (pronounced in "correct" Norwegian) sounds really weird.

I should have written fascinating, not strange.

I'm surprised, would have expected that in Germany.

> I'm surprised, would have expected that in Germany.

Wo ist Walter?

At a guess, it's for market differentiation. Like how I get the "international" version of US textbooks, which are exactly the same but with a different cover on the front. Or how Harry Potter and the Philosopher's Stone needed a title change to sell in the US. Or how you can get adult and child versions of the Harry Potter books.

Me too. Probably some committee or executive idea.

Are there other examples of it working? (if there were links, I couldn't see them).

There's a danger of overfitting, where a technique works for one instance (or a subset of instances), but not in general. Detecting stripes could work in general, but as a SO commenter noted, "Where's Wally" images often include spurious stripes to undermine this detection strategy for humans.

To say nothing of that one Where's Wally image consisting entirely of imperfect Wally impersonators, with one real Wally identifiable only because he has the correct hat, glasses and chin.


The algorithm described by Heike is essentially just looking for striped red and white shirts. Anyone who's done more than a couple of "Where's Waldo?" games knows that striped shirts are often thrown in to draw one's eye. In fact, in this very example there is another striped shirt (lower left corner, just above the wall) which could very well have been Waldo that this algorithm did not highlight. Without being able to recognize Waldo's human characteristics (thin, glasses, strong chin) the approach described will inevitably fail.
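
To make the objection concrete, here is a rough sketch of what a "look for red and white stripes" test boils down to. It is not the code from the linked answer (which isn't reproduced in this thread); the colour thresholds, run height and function names are assumptions picked for the toy example.

```python
def is_red(pixel, level=100):
    """Crude 'red' test: red channel exceeds both others by more than `level`."""
    r, g, b = pixel
    return r - max(g, b) > level


def is_white(pixel, level=200):
    """Crude 'white' test: every channel is brighter than `level`."""
    return min(pixel) > level


def stripe_score(image, row, col, height):
    """Count red<->white transitions going down `height` pixels in one column."""
    labels = []
    for r in range(row, min(row + height, len(image))):
        p = image[r][col]
        if is_red(p):
            labels.append("R")
        elif is_white(p):
            labels.append("W")
        else:
            labels.append("-")
    return sum(1 for a, b in zip(labels, labels[1:]) if {a, b} == {"R", "W"})


def find_striped_columns(image, height=6, min_transitions=3):
    """Return (row, col) positions where a vertical run looks red/white striped."""
    found = []
    for row in range(len(image) - height + 1):
        for col in range(len(image[0])):
            if stripe_score(image, row, col, height) >= min_transitions:
                found.append((row, col))
    return found


# Toy image: grey background with one column of alternating red/white pixels.
RED, WHITE, GREY = (220, 30, 30), (240, 240, 240), (120, 120, 120)
image = [[GREY] * 4 for _ in range(8)]
for i, r in enumerate(range(1, 7)):
    image[r][1] = RED if i % 2 == 0 else WHITE

hits = find_striped_columns(image)
```

Nothing in this test knows about glasses, hats or chins, so a decoy in a striped shirt scores exactly as well as Waldo does.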

I had to play around a little with the level. If the level is too high, too many false positives are picked out.

I was impressed until I read that--the guy is basically fitting the model/procedure to the training set (of size 1). I'd wait for a more general approach before accepting the answer.

On "Wait, Wait, Don't Tell Me!", which is a comedy make-fun-of-the-news quiz show. They exaggerate everything in that way for comedic effect.

Programming potential never ceases to amaze me. I want to learn more. NOW!

You might want to check out ai-class.com - it includes an introduction to computer vision (and plenty of other cool stuff).

Cool. I've done some work on things like this before. Some of the things I do to make it work on multiple images:

Template matching is your friend in this case, because most Waldos look similar. You already tried this in a basic way by searching for the stripes of a given color. You can make it more powerful by making the template include more properties, and work in more contexts. For instance: what if Waldo's a different size?
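
A minimal sketch of the "different size" idea: slide the template over the image at a few scales and keep the best normalized cross-correlation. Everything here (the function names, the integer scales, nearest-neighbour resizing, grayscale list-of-lists images) is an assumption chosen to keep the example short, not a recommendation for real images.

```python
import math


def resize_nearest(patch, new_h, new_w):
    """Nearest-neighbour resize of a 2-D list-of-lists patch."""
    h, w = len(patch), len(patch[0])
    return [[patch[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]


def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches, in [-1, 1]."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0  # flat patches carry no information


def match_template(image, template, scales=(1, 2)):
    """Try the template at several integer scales and return the best
    (score, top, left, scale) over all positions."""
    best = (-2.0, 0, 0, 0)
    th, tw = len(template), len(template[0])
    for s in scales:
        t = resize_nearest(template, th * s, tw * s)
        h, w = len(t), len(t[0])
        for top in range(len(image) - h + 1):
            for left in range(len(image[0]) - w + 1):
                window = [row[left:left + w] for row in image[top:top + h]]
                score = ncc(window, t)
                if score > best[0]:
                    best = (score, top, left, s)
    return best


# Toy usage: hide a double-size "plus" shape in a blank 10x10 image.
plus = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
image = [[0] * 10 for _ in range(10)]
big = resize_nearest(plus, 6, 6)
for r in range(6):
    for c in range(6):
        image[2 + r][2 + c] = big[r][c]

score, top, left, scale = match_template(image, plus)
```

The original-size template never matches perfectly here, but the 2x version does, at row 2, column 2. Brute force like this is slow on real scans, so in practice you'd want an image pyramid or an FFT-based correlation instead.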

The other option is to pretend you don't know what Waldo looks like, find him in a bunch of images, label the subimages as "waldo" candidates, measure certain properties of those subimages, and find which coordinates of feature space take similar values across them. Then use these properties as your template.

Finally, you could train a classifier on subwindows like sergeyk suggested. This has some difficulty because Where's Waldo images are hard to subdivide into subwindows on the scale of a single person. Do you move pixel by pixel? Do you divide the image into a grid? Each grid cell will contain odd fragments of people. Etc. If you do find a way to divide the image into "people" -- perhaps by doing a preliminary "person"-template sweep that identifies locations of people in the image -- then you can use a supervised learning algorithm to say "yes, this person is waldo" or "nope, FRWONG!", based on the image properties in the subwindow around that person.

This needs to be an augmented reality mobile app. The problem on the AI side of things is that a good algorithm that reliably "learns" what Waldo looks like would need a substantial number of examples.

A good solution to this would get close, then calculate the probabilities of every "maybe-waldo" and then display the one with the highest probability of being Waldo. An augmented reality app that highlighted Waldo on every page would be awesome.

If you've got net access (or even if you don't), it seems almost plausible that you could just identify the book/page in question and use a lookup table of coordinates.

I don't know how many variations on the /Where's Wa[a-z]+\?/ theme have actually been produced, though, so maybe it wouldn't be easier.

Then again, if you can upload unknowns, wait until you've got enough samples to generate confidence, and then store the result, it'd scale/perform much better :)
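
One way to read that idea, as a sketch: keep a table of known pages, and let unknown pages collect reported answers until enough of them agree. The class name, the vote threshold and the page identifiers are hypothetical, and actually recognizing which page a photo shows is left out entirely.

```python
from collections import Counter, defaultdict


class WaldoLookup:
    """Maps a page identifier to Waldo's (x, y); unknown pages collect
    reports until enough of them agree."""

    def __init__(self, min_votes=3):
        self.known = {}                      # page_id -> (x, y)
        self.reports = defaultdict(Counter)  # page_id -> Counter of (x, y)
        self.min_votes = min_votes

    def find(self, page_id):
        """Return stored coordinates, or None if the page is still unknown."""
        return self.known.get(page_id)

    def report(self, page_id, xy):
        """Record one reported answer; store it once min_votes reports agree."""
        if page_id in self.known:
            return
        self.reports[page_id][xy] += 1
        best_xy, votes = self.reports[page_id].most_common(1)[0]
        if votes >= self.min_votes:
            self.known[page_id] = best_xy
            del self.reports[page_id]


# Toy usage with a made-up page identifier and coordinates.
table = WaldoLookup(min_votes=2)
table.report("book1/page3", (412, 96))
before = table.find("book1/page3")  # one report is not enough yet
table.report("book1/page3", (412, 96))
after = table.find("book1/page3")
```

Real reports would never agree to the pixel, so you'd have to bucket nearby coordinates before counting votes.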

Amusing application, but I'd like to see the version that finds Waldo on the page in which everyone is wearing striped shirts

In most normal applications, the only thing that would change is what your features are. For example, if you wanted to find Waldo using the shape of his face and/or hat, you would probably just find some SIFT points (or something), and then build an eigenWaldoface, possibly using a PCA'd set of Waldo faces and hats as examples, and then SIFT the image and look for the places that are most like the eigenWaldoface.
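
A bare-bones sketch of the eigenface half of that: learn the main direction of variation among example Waldo patches with PCA, then score a candidate by how far it falls from that "Waldo space". SIFT is left out completely, raw pixel vectors stand in for features, and only one principal component is kept; the names and the 4-value "patches" are made up for illustration.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of equally long vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def top_component(vectors, mean, iterations=50):
    """First principal component via power iteration on the covariance."""
    centered = [[x - m for x, m in zip(v, mean)] for v in vectors]
    dim = len(mean)
    start = [float(i + 1) for i in range(dim)]  # arbitrary non-zero start vector
    norm = math.sqrt(sum(a * a for a in start))
    w = [a / norm for a in start]
    for _ in range(iterations):
        # Multiply by X^T X without forming it: w <- sum_i (x_i . w) x_i
        proj = [sum(a * b for a, b in zip(x, w)) for x in centered]
        new_w = [sum(p * x[i] for p, x in zip(proj, centered)) for i in range(dim)]
        norm = math.sqrt(sum(a * a for a in new_w))
        if norm == 0:
            break  # start vector was orthogonal to the data; keep the old w
        w = [a / norm for a in new_w]
    return w


def reconstruction_error(vector, mean, component):
    """Distance from 'Waldo space': small means the patch resembles the examples."""
    centered = [x - m for x, m in zip(vector, mean)]
    coeff = sum(a * b for a, b in zip(centered, component))
    residual = [c - coeff * w for c, w in zip(centered, component)]
    return math.sqrt(sum(r * r for r in residual))


# Toy "patches": four numbers each, varying together in the first two.
examples = [[4, 4, 1, 1], [5, 5, 1, 1], [6, 6, 1, 1], [7, 7, 1, 1]]
mean = mean_vector(examples)
eigen_waldo = top_component(examples, mean)

err_like = reconstruction_error([8, 8, 1, 1], mean, eigen_waldo)
err_unlike = reconstruction_error([5, 5, 4, 0], mean, eigen_waldo)
```

To search an image you'd compute this error for every candidate window and take the smallest. With real faces you would keep many components rather than one.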

This article is not interesting because it's an amazing new algorithm or something that solves some important world problem. It's interesting because it takes something that is not known among the general hacker population for doing this sort of thing really easily, and accomplishes it in a fairly simple way.

Don't be a grump, this is cool. :(

Ha, I wasn't being a grump... these kinds of problems are an important part of the evidence in showing the practicality and usefulness of code. I love it.

I mostly wanted to see who else remembered that particular Waldo puzzle... it was the final one in one of the books.

I do. It took us an entire plane trip from Sacramento to Chicago to find him there :)

Interesting problem. I'd like to then apply this concept of finding a needle in a haystack to satellite imagery. Using super-computing + giant image data sets, you could theoretically find some pretty obscure stuff if you knew what you were looking for (hidden treasures???).

This is undoubtedly a data point on the path to the singularity.
