
How do I find Waldo with Mathematica? - sillysaurus3
http://stackoverflow.com/questions/8479058/how-do-i-find-waldo-with-mathematica?repost_to_hn_because_the_last_discussion_was_861_days_ago=true
======
sillysaurus3
Interestingly, even though the top answer has 1337 upvotes, it appears none of
those people checked whether the answer actually works. Either that, or
Mathematica's algorithms have changed since Jan 2012.

Another commenter got me curious about just how hand-tuned the answer was, so
I fired up Mathematica 8 and put their solution in. I had to dig up the
original image using Wayback Machine (thanks, archive.org!) but then, to my
surprise, their answer didn't actually work. It merely darkened the image
uniformly, meaning it wasn't able to find any potential Waldos.

I fiddled around with their solution for a bit, and then finally just hooked
their parameter up to a slider. I thought maybe the "level" parameter just
needed a bit of tweaking. But surprisingly, it still doesn't work as
advertised: [http://i.imgur.com/XWNTvYU.jpg](http://i.imgur.com/XWNTvYU.jpg)

Their "level" parameter was originally 0.12, but like I said, that didn't work
at all. It doesn't work until the parameter is bumped up to 0.18, at which
point it gives the wrong solution (the lower-left circle). Then, 0.19 gives
another incorrect solution (the lower-right circle). Finally, 0.20 identifies
Waldo.
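
To make the sweep concrete, here is a minimal Python sketch of what a single global cutoff does. It assumes the "level" parameter acts as a cutoff on a match-distance map where lower means a closer match; the three distances are hypothetical values picked to mirror the behaviour described above, not measurements from the real image.

```python
# Toy match distances (hypothetical, NOT measured from the real image).
# Lower distance = closer match to the red-and-white stripe pattern.
candidates = {
    "lower-left decoy": 0.175,
    "lower-right decoy": 0.185,
    "Waldo": 0.195,
}


def detections(level):
    """Return the regions whose distance falls below the cutoff."""
    return [name for name, dist in candidates.items() if dist < level]


def sweep(levels):
    """Map each cutoff to the regions it would highlight."""
    return {level: detections(level) for level in levels}


if __name__ == "__main__":
    for level, hits in sweep([0.12, 0.18, 0.19, 0.20]).items():
        print(f"level={level:.2f}: {hits or 'nothing highlighted'}")
```

With one cutoff for the whole image, any decoy that scores better than the target shows up first, so raising the level until Waldo appears necessarily lets the decoys in too.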

So this solution is a nice way to filter out potential targets, but I wish I
could figure out whether the original poster was being shady or whether
Mathematica's internals changed since they posted their answer. I'm inclined
to give them the benefit of the doubt, but that has me burning with curiosity
about what exactly must've changed in Mathematica's implementation.

EDIT: I didn't do the most basic of checks: making sure I hadn't introduced
any changes to the original experiment. It turns out that when I rehosted the
image on imgur, imgur recompressed it, introducing artifacts into the color
channels. That is why their solution no longer worked as advertised. It shows
how brittle the solution is, though! I wonder if we can make it more robust
without fundamentally rewriting it?

~~~
nswanberg
The code works as-is in Mathematica 9 (though with the URL modified to point
to archive.org), so there's likely a difference between your version of 8 and
9.0.0.0.

~~~
sillysaurus3
Can you try [http://i.imgur.com/WwQ8P5R.jpg](http://i.imgur.com/WwQ8P5R.jpg)
instead of the archive.org link? I rehosted the image on imgur, but it looks
like imgur re-compresses the jpg, resulting in differences in the image. If
so, then that demonstrates just how brittle this solution is.

~~~
mcescalante
I can confirm with this screenshot
([http://cl.ly/image/2Q2e3k1w0g25](http://cl.ly/image/2Q2e3k1w0g25)) that
imgur does indeed do some sort of compression that makes this solution break
with an imgur link in Mathematica 9. I may investigate a bit further. I had
previously tested with the original image, and the same technique did
produce the expected results from the answer on Stack Overflow.

UPDATE: Changed the correlation value from .12 to .2 and the compressed imgur
image has some interesting results (3 circles, one of them is on Waldo):
[http://cl.ly/image/0Z3k150S2z10](http://cl.ly/image/0Z3k150S2z10)

~~~
sillysaurus3
_Changed the correlation value from .12 to .2 and the compressed imgur image
has some interesting results (3 circles, one of them is on Waldo):_
[http://cl.ly/image/0Z3k150S2z10](http://cl.ly/image/0Z3k150S2z10)

Yeah, I'm getting the exact same results. So it looks like compression is
interfering with the algorithm's ability to correlate. I think the problem is
that red color values are often thrown out in image compression (more so than
the intensity or green channels).
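
A rough Python sketch of one mechanism that could be behind this: JPEG-style chroma subsampling, where neighbouring pixels keep their own brightness but share an averaged colour. Whether imgur's recompression actually does this is an assumption. The conversion constants are the standard JFIF ones; `redness` (red minus green minus blue) is only a stand-in for whatever red-emphasis step the detector uses.

```python
def rgb_to_ycbcr(r, g, b):
    """JFIF full-range RGB -> YCbCr conversion."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """JFIF YCbCr -> RGB conversion, clamped to the 0..255 range."""
    def clamp(v):
        return max(0.0, min(255.0, v))
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp(r), clamp(g), clamp(b)


def redness(pixel):
    """Stand-in red-emphasis measure: red minus green minus blue, floored at 0."""
    r, g, b = pixel
    return max(0.0, r - g - b)


def subsample_pair(top, bottom):
    """Two vertically adjacent pixels keep their own luma but share averaged chroma."""
    y1, cb1, cr1 = rgb_to_ycbcr(*top)
    y2, cb2, cr2 = rgb_to_ycbcr(*bottom)
    cb, cr = (cb1 + cb2) / 2, (cr1 + cr2) / 2
    return ycbcr_to_rgb(y1, cb, cr), ycbcr_to_rgb(y2, cb, cr)


def stripe_contrast():
    """Redness contrast of a one-pixel red/white stripe pair, before and after."""
    red, white = (255, 0, 0), (255, 255, 255)
    before = redness(red) - redness(white)
    red_after, white_after = subsample_pair(red, white)
    after = redness(red_after) - redness(white_after)
    return before, after


if __name__ == "__main__":
    print(stripe_contrast())
```

In this toy case the brightness of each stripe survives, but the red/white contrast that a stripe detector keys on drops by more than half. Real JPEG encoding also quantizes the data, which this sketch ignores.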

------
therobot24
It makes me incredibly happy that the top answer is just a simple correlation.
Building a Boltzmann machine or an SVM would completely overcomplicate the
problem. In the other thread, the top comment echoes that this is a common
problem in CV (as in object detection) and goes into that process. But an
important thing we ML practitioners should remember is that we're building
toward a goal, and exercises like this can help put us back in our place when
we see that simple is usually better.

~~~
redmoskito
As long as we're admiring simple solutions that work on a dataset of size one,
and as long as we're allowing a human in the loop, you might like this
algorithm:

    return (150, 200);

It's pretty straightforward, it just took some parameter tweaking to match
Waldo's coordinates exactly.

Obviously I'm being pedantic, and I mean no disrespect, but I have a bone to
pick with so-called "computer vision" algorithms that are little more than
simple image processing. In this case, the time spent implementing and tuning
the algorithm exceeds the time it would take to solve the task manually (e.g.
pay a second-grader to do it). And as others have observed, it isn't obvious
that this algorithm would generalize to other images, in which case no time is
saved over the manual approach.

It is tempting to dismiss sophisticated techniques because (a) they are hard
to understand and (b) the task seems so easy to our own brain-equipped vision
systems. But the fact is, most interesting computer vision problems (including
this one) require sophisticated representations to achieve robustness and
generality. In other words, any good solution will need an answer to the
question "What is a Waldo?" that is better than "a 50x50 patch of pixels with
red and white stripes".

~~~
therobot24
> "so-called 'computer vision' algorithms that are little more than simple
> image processing"

Implementing any learning algorithm should be the absolute last resort.
They're complicated, hard to generalize, and difficult to guarantee in the
real world. Not that image processing is much better, but it's definitely more
deterministic and predictable in its results. With any system geared toward a
specific goal, your first approach should be to look at the simplest solutions
(no matter how rudimentary they may be).

> It is tempting to dismiss sophisticated techniques because (a) they are hard
> to understand and (b) the task seems so easy to our own brain-equipped
> vision systems. But the fact is, most interesting computer vision problems
> (including this one) require sophisticated representations to achieve
> robustness and generality. In other words, any good solution will need an
> answer to the question "What is a Waldo?" that is better than "a 50x50 patch
> of pixels with red and white stripes".

Yes, of course. I'm using Waldo as an example of a problem I often see when
reading research papers: an overly complex system built to solve more than
just the initial task. "What is a Waldo" is irrelevant to the task at hand.
Identifying the best candidate for Waldo is the goal. In this situation simple
correlation (or even more efficient correlation methods: MMCF, OTSDF, QCF,
etc.) can give good results most of the time (except on that all-Waldos
page... damn, that's hard even for a human).
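
As a minimal sketch of "pick the best candidate" rather than "threshold everything", here is plain template matching in Python using a sum of squared differences. This is a deliberately simple stand-in, not MMCF, OTSDF or QCF, and the tiny template and image are made-up toy data (1 = red stripe pixel, 0 = anything else).

```python
def ssd(image, template, top, left):
    """Sum of squared differences between the template and one image window."""
    return sum(
        (image[top + i][left + j] - template[i][j]) ** 2
        for i in range(len(template))
        for j in range(len(template[0]))
    )


def best_match(image, template):
    """Return (row, col) of the window with the smallest SSD."""
    rows = len(image) - len(template) + 1
    cols = len(image[0]) - len(template[0]) + 1
    return min(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda pos: ssd(image, template, *pos),
    )


# Toy data (hypothetical): two red stripes separated by a non-red row.
template = [
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
]
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],  # partial decoy: a single short stripe
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    print(best_match(image, template))
```

Taking the single lowest-distance window sidesteps the hand-tuned cutoff entirely, at the cost of always returning something even when no Waldo is present.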

To make a euchre analogy - you don't pass on a biddable hand, and you don't
waste your trump cards.

------
wingerlang
And here is the discussion mentioned
[https://news.ycombinator.com/item?id=3367865](https://news.ycombinator.com/item?id=3367865)

------
Freeboots
You could also prioritize your search area, assuming you're searching official
Where's Wally? images.

"53 percent of the time Waldo is hiding within one of two 1.5-inch tall bands,
one starting three inches from the bottom of the page and another one starting
seven inches from the bottom, stretching across the spread."

[http://www.slate.com/articles/arts/culturebox/2013/11/where_...](http://www.slate.com/articles/arts/culturebox/2013/11/where_s_waldo_a_new_strategy_for_locating_the_missing_man_in_martin_hanford.html)
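
A small Python sketch of that prioritization, assuming candidate positions are given as (x, y) with y measured in inches from the bottom of the page. The band limits come from the quote above; the function names and the sample candidates are made up for illustration.

```python
# Bands as (start, end) in inches from the bottom of the page, per the quote:
# two 1.5-inch bands starting at 3 and 7 inches.
PRIORITY_BANDS = [(3.0, 4.5), (7.0, 8.5)]


def in_priority_band(y_inches):
    """True if the height falls inside one of the two bands."""
    return any(lo <= y_inches <= hi for lo, hi in PRIORITY_BANDS)


def prioritize(candidates):
    """Order (x, y) candidates so those inside a band are checked first.

    sorted() is stable, so the original order is kept within each group.
    """
    return sorted(candidates, key=lambda pos: not in_priority_band(pos[1]))


if __name__ == "__main__":
    # Hypothetical candidate positions.
    print(prioritize([(1, 1.0), (2, 3.5), (3, 10.0), (4, 8.0)]))
```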

~~~
ronaldx
"53% of the time, Waldo appears in an empirically determined 25% of the page"

If you repeated this with random data, I expect you could find the same
results: this does not constitute evidence that Waldo is placed in those bands
with priority.

You can mathematically say that 100% of the time, Waldo appears in 0% of the
page (defined by individual points rather than areas). This equivalent
statement clearly has no predictive power.

In my opinion, it would not be reliable to use this on novel Waldo images.
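
A quick way to check this is to simulate it. The Python sketch below scatters points uniformly at random on a page, then picks the best two non-overlapping 1.5-inch bands after the fact. The 12-inch page height (so that two bands cover 25%), the 50 points per trial, and the half-inch search step are all assumptions chosen for illustration.

```python
import random

PAGE_HEIGHT = 12.0  # inches; assumed so two 1.5-inch bands cover 25% of the page
BAND = 1.5          # band height in inches
STEP = 0.5          # granularity of candidate band positions (assumption)


def best_two_band_fraction(ys):
    """Largest fraction of points covered by any two non-overlapping bands."""
    starts = [i * STEP for i in range(int((PAGE_HEIGHT - BAND) / STEP) + 1)]
    counts = [sum(s <= y < s + BAND for y in ys) for s in starts]
    best = 0
    for i in range(len(starts)):
        for j in range(i + 1, len(starts)):
            if starts[j] - starts[i] >= BAND:  # bands must not overlap
                best = max(best, counts[i] + counts[j])
    return best / len(ys)


def simulate(n_points=50, trials=200, seed=0):
    """Average best-two-band hit rate over many pages of purely random points."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        ys = [rng.random() * PAGE_HEIGHT for _ in range(n_points)]
        total += best_two_band_fraction(ys)
    return total / trials


if __name__ == "__main__":
    print(f"mean hit rate of the best 25% of the page: {simulate():.2f}")
```

Because the bands are chosen after looking at the data, the hit rate can never fall below 25% here (the best two of eight equal slices always hold at least a quarter of the points) and is typically higher, even though the points carry no placement preference at all.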

------
akusete
At the time of writing this, the top answer has exactly 1337 upvotes.

