
Monopoly Man – the same image uploaded to Imgur 3.2M times? - kolbe
http://imgur.kosiru.com/
======
geofft
Interesting, but the analysis is a bit short. This was posted to Reddit five
months ago with more discussion:

[https://www.reddit.com/r/self/comments/2uzas7/i_found_out_th...](https://www.reddit.com/r/self/comments/2uzas7/i_found_out_that_roughly_1_of_imgur_images_are/)

They found (as I did, with a simple Google reverse image search) that the
image was used to illustrate a Forbes article in 2011:

[http://www.forbes.com/sites/venessamiemis/2011/04/04/the-ban...](http://www.forbes.com/sites/venessamiemis/2011/04/04/the-bank-of-facebook-currency-identity-reputation/)

[https://twitter.com/EricaGlasier/status/563770191440404480](https://twitter.com/EricaGlasier/status/563770191440404480)

Doesn't actually answer the question, but I think the source of the image is
super relevant.

I could sort of imagine some automated process making Imgur thumbnails of
images from links shared on some social media site, and going awkwardly self-
recursive and getting to 3.2M.

~~~
agumonkey
Interesting speculation:

"Sorry to be boring, but Webdriver Torso already gave us the answer: it's a
test image used in integration tests of Imgur itself. Silly image."

[https://www.reddit.com/r/self/comments/2uzas7/i_found_out_th...](https://www.reddit.com/r/self/comments/2uzas7/i_found_out_that_roughly_1_of_imgur_images_are/cod4cv3)

~~~
myohan
why would they have test data in production?

~~~
lukeschlather
The 3.2 million number is highly suggestive:

    3263847/(365*6*24)
    ≈ 62 uploads per hour

It sounds like they've been running an integration test which uploads an image
about once per minute for pretty much their entire existence (they're a little
over 6 years old).

This is sensible. It's the only way to get instant feedback when the site
stops working for any reason.
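
To make the units explicit, here is that arithmetic as a small Python sketch. The count and the 6-year window are the figures quoted above; the function name is made up for illustration, and the window is a parameter because the true sampling period is uncertain.

```python
def uploads_per_minute(total_uploads: int, years: float) -> float:
    """Average uploads per minute over a window of `years` 365-day years."""
    minutes = years * 365 * 24 * 60
    return total_uploads / minutes


rate = uploads_per_minute(3_263_847, 6)
# prints: 62.1 per hour, 1.03 per minute
print(f"{rate * 60:.1f} per hour, {rate:.2f} per minute")
```

Halving the window doubles the rate, so the once-per-minute reading depends on the test having run for the full six years.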

~~~
tgb
The sampling was done only for the old-style 5-character URLs. They've since
moved to 7-character URLs. Depending on how that 3.2 million number was
reached, you should maybe be calculating the rate using just the time they
were on 5-character URLs. I don't know when the switch occurred, though.

------
panic
This chart is hard to read:
[http://i.imgur.com/HJYmCLN.png](http://i.imgur.com/HJYmCLN.png)

Since the values are percentages, it's weird that the scale is out of 0.9 --
why not 100? Why leave out the other ~98% from the visualization? The radial
arrangement makes it even harder to read and compare the values.

The effect (at least for me) was to make it seem like these images appeared
much more often than they actually did. I actually found the raw numeric table
below more illustrative!

~~~
VieElm
Pie charts are already a terrible visualization, and this is way worse. You're
supposed to compare the area of these half-circle-shaped rotated slices
easily? They should have just used a bar chart; in fact, why wouldn't you use
a bar chart? You could just have a single box and illustrate the percentage
amounts by dividing the box up into different colors to indicate the
percentages of a whole, but this? Nope. Tufte would probably not be pleased.

~~~
jnbiche
Pie charts are excellent for showing parts of a whole, and very poor for any
other type of visualization. Just because people are misusing a chart type
doesn't make it fundamentally useless, regardless of what Tufte thinks.

------
arihant
What are the chances that imgur is throwing this image as an exploit
prevention technique? Maybe there were 3000+ times the author triggered the
rate limit, or some other metric?

The next one called "White Line" appeared 2000+ times too, maybe doing the
same job.

In the early days of reCAPTCHA at CMU, Luis von Ahn did a similar thing. There
are a lot of sweatshops that type in CAPTCHAs; they detected that, and instead
of blocking them, they threw longer CAPTCHAs (sometimes entire sentences) at
them to get their help digitizing text.

~~~
JTon
This is interesting and it seems plausible. However, at the beginning of the
article the author said they stumbled upon this image multiple times by
manually entering random 5-character filenames. If it were an exploit
prevention feature, I would hope it would be more difficult to trigger.
------
the_mitsuhiko
It's an easter egg(?) that you get when accessing an imgur album as an
individual image. Because the original user is scraping, he basically found
3.2M galleries.

It only happens with older albums that do not have /a in the URL.

//EDIT: source:
[https://www.reddit.com/r/programming/comments/3clho3/1_out_o...](https://www.reddit.com/r/programming/comments/3clho3/1_out_of_every_120_images_hosted_on_imgur_are/csx6ho7)

------
buddydvd
Clicking the link below creates a new imgur URL:

[http://imgur.com/upload?url=https%3A%2F%2Fnews.ycombinator.c...](http://imgur.com/upload?url=https%3A%2F%2Fnews.ycombinator.com%2Fy18.gif)

Perhaps someone had included a URL like the one above in some WordPress
template.
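
For anyone wanting to build such a link, the target URL just needs to be percent-encoded. A minimal Python sketch, where the `upload?url=` endpoint is copied from the link above and the helper name is hypothetical:

```python
from urllib.parse import quote


def imgur_upload_link(image_url: str) -> str:
    """Build an imgur.com/upload?url=... link (hypothetical helper)."""
    # safe="" makes quote() percent-encode ':' and '/' as well.
    return "http://imgur.com/upload?url=" + quote(image_url, safe="")


print(imgur_upload_link("https://news.ycombinator.com/y18.gif"))
```

Every page view of a template containing a link like this would plausibly trigger another upload of the same file, which is how one image could pile up.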

~~~
jagger27
I like this theory.

------
JohnyLy
This is an old story. The answer is: this is a test image used in integration
tests of Imgur itself.

------
morgante
I'd bet money that it's their standard test image.

~~~
visakanv
Good call. What else would you bet on?

------
megapatch
Why is the author assuming that different URL ids necessarily translate to
different images/uploads? The character based id could simply contain
redundant parts, which favour certain images over others.

~~~
danneu
Seems 1:1

[http://imgur.com/blog/2013/01/18/more-characters-in-filename...](http://imgur.com/blog/2013/01/18/more-characters-in-filenames/)

------
mosselman
What I find strange is the use of a hash of the image. There are other methods
to detect whether images are visually the same while they have a different
resolution, etc.

From what I remember, you'd resize all images to a fingerprint of y x y
(let's say 100x100 pixels), apply some filtering to normalize lighting, and
then XOR two fingerprints to see how similar they are.
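
A rough Python sketch of that idea, using the common "average hash" variant rather than whatever exact method is being remembered here. It assumes the image is already a grayscale grid (a list of rows of 0-255 ints); the function names and the 8x8 fingerprint size are illustrative choices.

```python
def fingerprint(pixels, size=8):
    """Average-hash style fingerprint of a grayscale image, as an int."""
    h, w = len(pixels), len(pixels[0])
    small = []
    # Downscale to size x size by averaging each block of source pixels.
    for by in range(size):
        y0 = by * h // size
        y1 = max((by + 1) * h // size, y0 + 1)
        for bx in range(size):
            x0 = bx * w // size
            x1 = max((bx + 1) * w // size, x0 + 1)
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            small.append(sum(block) / len(block))
    # Thresholding against the mean cancels out overall brightness shifts.
    mean = sum(small) / len(small)
    bits = 0
    for value in small:
        bits = (bits << 1) | int(value > mean)
    return bits


def distance(a, b):
    """Number of differing fingerprint bits (XOR, then count the ones)."""
    return bin(a ^ b).count("1")


image = [[x * 16 for x in range(16)] for _ in range(16)]
resized = [[image[y // 2][x // 2] for x in range(32)] for y in range(32)]
inverted = [[240 - v for v in row] for row in image]
print(distance(fingerprint(image), fingerprint(resized)))   # 0
print(distance(fingerprint(image), fingerprint(inverted)))  # 64
```

A small distance means "probably the same picture", so a real system would need a threshold, which is where false positives can creep in.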

~~~
lmm
That's more complicated, more likely to false-positive, and I doubt it would
provide much more value. Very few people are resizing images and uploading a
slightly different version; the overwhelming majority of the time, duplicate
images are people uploading the exact same file, byte-for-byte.

------
imgurthing
I too noticed some time ago that imgur URLs were short, so I put some JS code
together and let it run in a background tab for a while. What I noticed is
that some images keep reappearing every so often. There is at least that
Coke/Pepsi logo history image, and then there is that man with a big slash on
his face.

Never seen that facebook man before for some reason.

If you have a hosting place I can share the JS. Basically it just randomizes
URLs, and if the resulting image matches "if(e.naturalWidth == 161 &&
e.naturalHeight == 81)" it removes that image, as that is the size of the
"image not found" image.
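
The approach described above can be sketched roughly as follows. This is in Python rather than the original JS, it does no network access, and the ID alphabet (letters and digits) is an assumption; only the 161x81 placeholder size comes from the comment.

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits  # assumed ID alphabet
PLACEHOLDER_SIZE = (161, 81)  # "image not found" size quoted above


def random_image_url(rng: random.Random, length: int = 5) -> str:
    """Return an imgur-style URL with a random ID of `length` characters."""
    image_id = "".join(rng.choice(ALPHABET) for _ in range(length))
    return "http://imgur.com/" + image_id


def is_placeholder(width: int, height: int) -> bool:
    """True if the dimensions match the 'image not found' image."""
    return (width, height) == PLACEHOLDER_SIZE


rng = random.Random()
print(random_image_url(rng))
print(is_placeholder(161, 81))  # True
```

The size check is a heuristic: a real upload that happens to be 161x81 would be discarded too.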

------
thebrettd
Is this your site?

Have you tried any
[https://en.wikipedia.org/wiki/Steganalysis](https://en.wikipedia.org/wiki/Steganalysis)?

~~~
steckerbrett
They are comparing SHA256 hashes, so the images are literally identical (and
don't just appear to be identical visually).

------
dluan
I wish there were more of these little experiments being done! With just a
little more time and resources, imagine what other weird, quirky discoveries
are out there.

~~~
arantius
I agree, I've seen lots of fun things like this on the internet. But good
ideas are hard to come by. I did something like this a few years back, for
favicons:

[http://tech.arantius.com/favicon-survey](http://tech.arantius.com/favicon-survey)

------
mahouse
For those wondering, "TARINGA" is an Argentinian forum full of piracy. I
assume some of those images are uploaded by users to illustrate their posts.

------
joshu
Apparently this is the result of exercising a bug in imgur.

------
anotheryou
monopoly mystery solved, but what's up with DJ David?

------
shmerl
Do they have some kind of data deduplication by the way?

~~~
astrodust
Given how an image can be hashed with something like SHA256, it's kind of
surprising that uploading the same image twice doesn't result in the same URL.

~~~
morcheeba
Well, the image title would be different, and that's a significant difference.
"Smiling guy" and "Justin Bieber at my sister's birthday party" are very
different :-)

Also, the "author" and comments wouldn't want to be merged -- different
comment streams for different audiences.

~~~
shmerl
The entry for the image can be different in the database, but there is no need
to actually store the file twice, is there?
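
A minimal sketch of that split, assuming a content-addressed layout keyed by SHA-256. The class and field names are invented for illustration, and nothing here reflects how Imgur actually stores files.

```python
import hashlib


class ImageStore:
    """Toy store: one blob per distinct file, one entry per upload."""

    def __init__(self):
        self.blobs = {}    # sha256 hex digest -> file bytes, stored once
        self.entries = {}  # public image id -> (title, digest)

    def upload(self, image_id: str, title: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Keep the bytes only if this exact file has not been seen before.
        self.blobs.setdefault(digest, data)
        self.entries[image_id] = (title, digest)
        return digest

    def fetch(self, image_id: str) -> bytes:
        _title, digest = self.entries[image_id]
        return self.blobs[digest]


store = ImageStore()
store.upload("aaaaa", "Smiling guy", b"same bytes")
store.upload("bbbbb", "Justin Bieber at my sister's birthday party", b"same bytes")
print(len(store.entries), len(store.blobs))  # 2 1
```

Each upload keeps its own URL, title, and comments, while identical bytes are stored once.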

------
kaivi
Yeah, I stumbled upon this Monopoly image about 2 years ago, when I was
developing a small JS-based website. It dynamically embedded random pics from
image hosting sites like Imgur, and this Monopoly guy was everywhere.

I guessed that it must have been some sort of massive Facebook spam at some
point in time, or something like this. After all, the guy waves dollars in
your face in exchange for credentials.

------
fugyk
There is a link to the DB attached
([https://mega.nz/#!WBg1zAIA!dg0g2Q0kDm1q6r1WBMPe0-nrP3wxvokeF...](https://mega.nz/#!WBg1zAIA!dg0g2Q0kDm1q6r1WBMPe0-nrP3wxvokeF5XlNjTctdg)).
Can anyone verify whether the images actually point to "The Monopoly Man" or
to a test image?

------
chavesn
I tried this, and the _second_ random string I tried was the image in
question:

[http://imgur.com/52nu1](http://imgur.com/52nu1)

------
hudell
That was actually the first image I got when trying 5 random characters in a
URL.

------
crucifiction
Probably some kind of production canary or integration test in their
deployments.

------
agumonkey
weird project link
[https://www.reddit.com/message/compose/?to=TheGamble&subject...](https://www.reddit.com/message/compose/?to=TheGamble&subject=MONOPOLIED%20MEN)

