
Ask HN: How do Imgur.com and others check that images are not malicious? - nkkollaw
On Imgur and other sites you're allowed to upload an image without even opening an account.

What checks do they probably have in place to make sure the image isn't actually a malicious one?

Is there something that a huge site is able to implement or use instead of the checks that anyone can implement?
======
jrpt
By malicious, do you mean inappropriate, or containing malicious metadata
inside the image? The images are probably stripped of metadata and/or
recompressed, which removes any malicious metadata.
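As a rough illustration of "stripping" metadata without a full re-encode, here is a minimal stdlib-only sketch that walks the marker segments of a JPEG and drops APP1 to APP15 (where EXIF, XMP and similar blocks live) and COM segments, keeping APP0 (JFIF). It is an assumption-laden toy, not what Imgur actually runs: the function name is hypothetical, it stops parsing at the first SOS marker and copies everything after it verbatim (so metadata placed later in, say, a progressive JPEG is not touched), and dropping APP2 also discards any ICC color profile.

```python
import struct

# APP0 (0xE0) is kept; APP1..APP15 (0xE1..0xEF) and COM (0xFE) are dropped.
_DROP_MARKERS = set(range(0xE1, 0xF0)) | {0xFE}


def strip_jpeg_metadata(data: bytes) -> bytes:
    """Return a copy of a JPEG with APP1-APP15 and COM segments removed.

    Raises ValueError if the header segments do not parse cleanly.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    out = bytearray(b"\xff\xd8")
    i = 2
    while i < len(data):
        if data[i] != 0xFF:
            raise ValueError(f"expected a marker at offset {i}")
        # Skip the 0xFF prefix plus any 0xFF fill bytes before the marker code.
        while i < len(data) and data[i] == 0xFF:
            i += 1
        if i >= len(data):
            raise ValueError("truncated marker")
        marker = data[i]
        i += 1
        if marker == 0xD9:  # EOI: end of image
            out += b"\xff\xd9"
            return bytes(out)
        if 0xD0 <= marker <= 0xD7 or marker == 0x01:
            # RSTn and TEM are standalone markers with no length field.
            out += bytes((0xFF, marker))
            continue
        if i + 2 > len(data):
            raise ValueError("truncated segment length")
        (length,) = struct.unpack(">H", data[i:i + 2])  # includes its own 2 bytes
        if length < 2 or i + length > len(data):
            raise ValueError("bad segment length")
        segment = data[i:i + length]
        i += length
        if marker == 0xDA:
            # SOS: copy its header and everything after it (scan data, EOI) as-is.
            out += bytes((0xFF, marker)) + segment + data[i:]
            return bytes(out)
        if marker not in _DROP_MARKERS:
            out += bytes((0xFF, marker)) + segment
    raise ValueError("no SOS or EOI marker found")
```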

You can use machine learning to classify images into categories and so
detect inappropriate ones. For example, this is what Yahoo open sourced:
[https://github.com/yahoo/open_nsfw](https://github.com/yahoo/open_nsfw)

Additionally, user-generated content sites will almost always have a way for
users to flag inappropriate content. If something's flagged
disproportionately, it will be removed.
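"Flagged disproportionately" could mean many things; one simple reading is a flag-to-view ratio combined with a minimum number of flags, so a single flag on an image with three views doesn't remove it. The sketch below assumes exactly that rule; the names and both threshold values are hypothetical illustration values, not any site's real policy.

```python
from dataclasses import dataclass


@dataclass
class UploadStats:
    views: int
    flags: int


def should_remove(stats: UploadStats, min_flags: int = 5, max_flag_rate: float = 0.01) -> bool:
    """Remove an upload once it has enough flags AND flags are a large share of views.

    min_flags and max_flag_rate are hypothetical thresholds for illustration.
    """
    if stats.flags < min_flags:
        return False
    # max(..., 1) avoids dividing by zero for uploads with no recorded views.
    flag_rate = stats.flags / max(stats.views, 1)
    return flag_rate > max_flag_rate
```

In practice a site would more likely route such uploads to a human review queue than delete them automatically.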

~~~
nkkollaw
I just mean viruses/malware.

------
stevekemp
One of my friends joked that the fastest way to acquire a stash of child-porn
would be to set up a free image host.

A lot of the larger image-hosting sites compare uploads against known-bad
hashes, and otherwise fingerprint images to try to determine if they're adult
or not. This is a fragile process, as you can imagine.
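To show why this is fragile, here is a sketch of two kinds of checks: an exact SHA-256 lookup against a set of known-bad hashes, which fails if a single byte changes, and a very simple perceptual fingerprint (an "average hash") compared by Hamming distance, which tolerates small edits such as brightening. It assumes the image has already been decoded and downscaled to an 8x8 grayscale grid; all function names and the `max_distance` default are hypothetical, and production systems use considerably more robust fingerprints than this one.

```python
from __future__ import annotations

import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def is_known_bad_exact(data: bytes, blocklist: set[str]) -> bool:
    """Exact match only: changing any single byte produces a different hash."""
    return sha256_hex(data) in blocklist


def average_hash(pixels: list[list[int]]) -> int:
    """64-bit average hash of an 8x8 grayscale grid (values 0-255).

    Each bit is 1 if that pixel is brighter than the grid's mean.
    """
    flat = [p for row in pixels for p in row]
    if len(flat) != 64:
        raise ValueError("expected an 8x8 grid")
    mean = sum(flat) / 64
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def matches_known_fingerprint(pixels: list[list[int]], known: list[int], max_distance: int = 5) -> bool:
    h = average_hash(pixels)
    return any(hamming(h, k) <= max_distance for k in known)
```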

One thing that Imgur, in particular, does is strip EXIF tags, which probably
goes hand-in-hand with downsizing and compression tweaks. I find it
frustrating when my images are stolen and uploaded there with all author
information removed, but it seems many of the community-based internet hosts
thrive on copyright infringement. Meh.

~~~
nkkollaw
Ha!

I wouldn't think EXIF data is that big; images are usually pretty heavy.

------
uwu
here's my internet power user opinion:

a malicious image, as in one where opening it in a vulnerable program (browser
or image viewer) would allow code execution?

those exploits usually require specially crafted image files, so i think
re-encoding and optimizing the image (like using `jpegtran -optimize -copy none`
or `optipng`) would probably be enough to "sanitize" the file
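A sketch of wiring the `jpegtran` command from the comment above into an upload pipeline, assuming `jpegtran` (from libjpeg/libjpeg-turbo) is on the PATH. `-outfile` writes the result to a separate file instead of stdout. Note that jpegtran transforms the JPEG losslessly at the coefficient level rather than fully decoding and re-encoding pixels, and it only handles JPEG; PNGs would need a separate step such as `optipng`, which is not shown. The function names are hypothetical.

```python
from __future__ import annotations

import shutil
import subprocess


def jpegtran_command(src: str, dst: str) -> list[str]:
    """Build the invocation: optimize Huffman tables, copy no extra markers."""
    return ["jpegtran", "-optimize", "-copy", "none", "-outfile", dst, src]


def sanitize_jpeg(src: str, dst: str, timeout: float = 10.0) -> bool:
    """Rewrite src into dst with jpegtran; return True only if jpegtran succeeded.

    Raises RuntimeError if jpegtran is not installed, and
    subprocess.TimeoutExpired if it runs longer than `timeout` seconds.
    """
    if shutil.which("jpegtran") is None:
        raise RuntimeError("jpegtran not found on PATH")
    result = subprocess.run(
        jpegtran_command(src, dst),
        capture_output=True,
        timeout=timeout,
        check=False,
    )
    return result.returncode == 0
```

A file that jpegtran refuses to parse is a good candidate for rejecting the upload outright.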

a code execution flaw in jpegtran or optipng itself would be worse, but
sandboxing might protect against that (i don't know much about sandboxing,
though)
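One small, partial step in that direction is to run the converter as a separate process with CPU-time and memory limits and a wall-clock timeout, so a decoder stuck in a loop or ballooning in memory gets killed instead of taking the server down. This sketch uses POSIX resource limits (Linux/Unix only) and is an assumption about how one might do it, not a real sandbox: it does not restrict filesystem, network, or system-call access, which is what containers, seccomp filters, or dedicated sandboxing tools are for. The function name and default limits are hypothetical.

```python
import resource
import subprocess
import sys


def _cap(limit: int, value: int) -> None:
    """Set both soft and hard limits to `value`, never above the current hard limit."""
    _soft, hard = resource.getrlimit(limit)
    new = value if hard == resource.RLIM_INFINITY else min(value, hard)
    resource.setrlimit(limit, (new, new))


def run_limited(cmd, cpu_seconds=5, max_bytes=512 * 1024 * 1024, timeout=10.0):
    """Run cmd in a child process with CPU and address-space limits.

    The limits are applied in the child just before exec. When the CPU limit
    is hit the kernel signals the child (SIGXCPU/SIGKILL), so returncode is
    negative. If wall time exceeds `timeout`, subprocess.run kills the child
    and raises subprocess.TimeoutExpired.
    """
    def apply_limits() -> None:
        _cap(resource.RLIMIT_CPU, cpu_seconds)
        _cap(resource.RLIMIT_AS, max_bytes)

    return subprocess.run(
        cmd,
        preexec_fn=apply_limits,
        capture_output=True,
        timeout=timeout,
        check=False,
    )


if __name__ == "__main__":
    # Usage: a busy loop is stopped by the 1-second CPU limit.
    r = run_limited([sys.executable, "-c", "while True: pass"], cpu_seconds=1)
    print(r.returncode)
```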

~~~
nkkollaw
Attacks where the image isn't actually an image must be the minority, simply
because an error would be thrown right away.

Attacks most likely focus on vulnerabilities in image manipulation software:
GD, ImageMagick, Vips, Gifsicle, etc.
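A cheap check that fits both points above: before handing an upload to any of those libraries, look at its first bytes and accept only the formats you expect. ImageMagick picks a decoder based on file content, and the 2016 "ImageTragick" advisory recommended verifying magic bytes for exactly that reason, so that an upload named `.jpg` can't quietly reach a more exotic coder. The signatures below are the standard ones for PNG, GIF, and JPEG; the function names and the allowlist are hypothetical, and passing this check says nothing about whether the rest of the file is well-formed.

```python
from __future__ import annotations

# Standard file signatures ("magic bytes") for the formats we choose to accept.
_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"\xff\xd8\xff": "jpeg",
}


def sniff_image_type(head: bytes) -> str | None:
    """Return the format name if `head` starts with a known signature."""
    for magic, kind in _SIGNATURES.items():
        if head.startswith(magic):
            return kind
    return None


def check_upload(data: bytes, allowed: frozenset[str] = frozenset({"png", "gif", "jpeg"})) -> str:
    """Reject anything that is not one of the allowed formats by its first bytes."""
    kind = sniff_image_type(data[:16])
    if kind is None or kind not in allowed:
        raise ValueError("unsupported or unrecognized file type")
    return kind
```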

It makes sense to use jpegtran.

------
sova
There is a job called Content Screener; these are the people who basically
view all the content before it gets to the Internet.

~~~
nkkollaw
Perhaps I didn't phrase my question correctly, but I meant malicious scripts
(malware) rather than content (pedophilia).

~~~
ge96
Just a random thought: you could ask the FBI to borrow the content they seize
to train your computer vision systems hahaha. How can you train something to
know what CP is without it? Look for this blob of skin, smaller than this blob
of skin, where these smaller blobs touch.

