
Open Sourcing a Deep Learning Solution for Detecting NSFW Images - pumpikano
https://yahooeng.tumblr.com/post/151148689421/open-sourcing-a-deep-learning-solution-for
======
brianwawok
So can it be reversed to become the ultimate porn-finding neural network?

~~~
XCSme
When you said "reversed" I thought about it being a porn-generative neural
network. Just enter your favorite keywords and a unique scene, tailored to
your needs, will be generated just for you!

~~~
zardo
As long as what you need is a nightmarish version of the requested scene.

What happens here with generating illegal content? If you put up a public
text->image GAN and someone uses it to generate child porn, are you
responsible?

How could you make sure it couldn't?

~~~
BinaryIdiot
I would imagine that if it's generated, no real people were part of its
creation, and therefore it would be legal. If I remember correctly, cartoons
of children having sex are not illegal (in the United States, as far as I
know).

Though that raises the question: what happens when it can be generated so
realistically that it looks indistinguishable from the real thing? Would it
still be treated like cartoons? How could you prove it one way or the other?
Lots of questions here.

~~~
daturkel
In the USA:

Provisions against simulated child pornography were found to be
unconstitutional in Ashcroft v. Free Speech Coalition [0] in 2002.

From wiki [1]:

> Referring to [New York v. Ferber, 1982: child pornography is not protected
> speech], the court stated that "the CPPA prohibits speech that records no
> crime and creates no victims by its production. Virtual child pornography is
> not 'intrinsically related' to the sexual abuse of children".

IANAL, but following that logic alone, the degree of realism doesn't seem to
be relevant to the legal precedent insofar as photorealistic imagery would
still "record no crime" nor "create victims by its production." As to whether
it's dangerous for such material to exist because it would create plausible
deniability for the production of actual photography while claiming it's
simulated...I guess that would be a different matter.

[0]:
[https://en.wikipedia.org/wiki/Ashcroft_v._Free_Speech_Coalit...](https://en.wikipedia.org/wiki/Ashcroft_v._Free_Speech_Coalition)

[1]:
[https://en.wikipedia.org/wiki/Child_pornography_laws_in_the_...](https://en.wikipedia.org/wiki/Child_pornography_laws_in_the_United_States#Simulated_pornography)

~~~
voxic11
They actually addressed this with the 2003 PROTECT Act, which makes only
_obscene_ simulated child pornography illegal. Because obscenity isn't
protected by the First Amendment, this has been found to be constitutional.

~~~
justinlardinois
It's hard to say that changed anything. Obscenity is a notoriously thorny
subject in American constitutional law. The legal test for determining
obscenity[0] is highly subjective and, in the context of the internet, very
difficult to apply.

[0]
[https://en.wikipedia.org/wiki/Miller_test](https://en.wikipedia.org/wiki/Miller_test)

~~~
Normal_gaussian
Highlights from:
[https://en.wikipedia.org/wiki/Miller_test](https://en.wikipedia.org/wiki/Miller_test)

The test:

* Whether "the average person, applying contemporary community standards", would find that the work, taken as a whole, appeals to the prurient interest,

* Whether the work depicts or describes, in a patently offensive way, sexual conduct or excretory functions specifically defined by applicable state law,

* Whether the work, taken as a whole, lacks serious literary, artistic, political, or scientific value.

Also:

Critics of obscenity law argue that defining what is obscene is paradoxical,
arbitrary, and subjective.

---

I think it would be hard to not find generated child porn obscene by this
test, unless you have a good lawyer, at which point there is plenty of wiggle
room.

~~~
justinlardinois
I think the technical feats involved in creating such a text-to-image program
might allow a talented lawyer to make an argument for scientific value.

There's also the issue that "contemporary community standards" are hard to
apply, because it's unclear which community you're talking about.

------
bahro
I should update my sexy map finder: [http://exclav.es/2016/05/20/sexy-maps/](http://exclav.es/2016/05/20/sexy-maps/)

~~~
the_af
Those are some hot map pictures!

I don't understand how Google's algorithm can be misled into finding sexiness
in those. I imagine it has something to do with skin tones or flesh colors,
but then what about the high-contrast patchwork of green and brown fields
Google finds "likely to contain adult content"? That's totally puzzling.

The confusion with medical images is way more understandable. If you squint,
you can almost imagine those are pics of skin cancer or lesions.

Oddly enough, I even see violence in the "violent" picture. In an abstract,
Rorschach Test sort of way. Well done, Google!

~~~
re
> I don't understand how Google's algorithm can be misled into finding
> sexiness in those.

I'm reminded of a paper for which the authors generated different pictures of
static that fooled neural network image classifiers into confidently
identifying them as different objects:
[https://arxiv.org/abs/1412.1897](https://arxiv.org/abs/1412.1897)

Wired summary: [https://www.wired.com/2015/01/simple-pictures-state-art-ai-s...](https://www.wired.com/2015/01/simple-pictures-state-art-ai-still-cant-recognize/)

> Computer vision and human vision are nothing alike. And yet, since it
> increasingly relies on neural networks that teach themselves to see, we’re
> not sure precisely how computer vision differs from our own. As Jeff Clune,
> one of the researchers who conducted the study, puts it, when it comes to
> AI, “we can get the results without knowing how we’re getting those
> results.”
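
The basic trick is easy to sketch. Here's a toy hill-climbing version in
Python (the paper itself used much fancier evolutionary algorithms and
gradient ascent); `score(image, target_class)` is a hypothetical callback
wrapping whatever classifier you want to fool:

    import numpy as np

    def fool_classifier(score, target_class, shape=(224, 224, 3),
                        iters=10000, sigma=8.0):
        # Start from uniform noise; keep any random mutation that raises
        # the classifier's confidence in the target class.
        best = np.random.uniform(0, 255, shape)
        best_score = score(best, target_class)
        for _ in range(iters):
            candidate = np.clip(best + np.random.normal(0, sigma, shape),
                                0, 255)
            s = score(candidate, target_class)
            if s > best_score:
                best, best_score = candidate, s
        return best, best_score

You end up with an image that still looks like static to us but that the
network scores as the target class with high confidence.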

------
inlined
Forgive my ignorance of ML, but the last bit, "you'll need your own porn to
train on", confused me. Does this mean that they're just exposing the rough
topology of their neural net (e.g. depth) and not the actual weights between
nodes? I'm curious to learn from an ML expert how much this actually offers.

~~~
zardo
It looks like they are sharing the trained network, but they aren't sharing
the training data set.
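
Concretely, scoring an image with the released files takes only a few lines
of pycaffe. A minimal sketch, assuming the file names from the repo and the
usual Caffe ImageNet-style preprocessing (the mean values and the 'prob'
output name are conventions I haven't verified against the repo):

    import numpy as np
    import caffe

    # Network definition (the topology) plus the trained weights.
    net = caffe.Net('nsfw_model/deploy.prototxt',
                    'nsfw_model/resnet_50_1by2_nsfw.caffemodel',
                    caffe.TEST)

    # Standard Caffe preprocessing: HWC->CHW, RGB->BGR, scale to [0,255],
    # subtract a per-channel mean.
    transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
    transformer.set_transpose('data', (2, 0, 1))
    transformer.set_channel_swap('data', (2, 1, 0))
    transformer.set_raw_scale('data', 255)
    transformer.set_mean('data', np.array([104.0, 117.0, 123.0]))

    img = caffe.io.load_image('some_image.jpg')
    net.blobs['data'].data[...] = transformer.preprocess('data', img)
    # The output is a [SFW, NSFW] pair; index 1 is the NSFW score.
    print('NSFW score:', net.forward()['prob'][0][1])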

~~~
daturkel
The training set is almost certainly composed of copyrighted material.

~~~
Asooka
Interesting thought - doesn't every single porn producer now have a valid
copyright claim on the trained network? I don't see how you can argue this
isn't a derivative work based on the movies they produced.

~~~
derefr
I was debating with a friend about just this: whether a text-to-speech model
is a derivative work of audio recordings by a given speaker, such that they'd
then have claim on ownership of it. (You could almost certainly create an
[overfit] model that could _re-generate_ the original performance of a text
from said text.)

Moot if it was a work-for-hire, of course; but if I, say, created a Samuel L.
Jackson speech model by training on samples from his movies, and sold it as
one of those car-navigation voices, could I be sued? By Mr. Jackson? By the
copyright-holders of the movies?

And if I _could_ , what does that imply about impersonators, who do the same
thing, but with their brains?

~~~
Joof
I imagine you couldn't put his name on it which would be a huge deal if you
wanted to sell it.

------
wildpeaks
Direct link to Github:
[https://github.com/yahoo/open_nsfw](https://github.com/yahoo/open_nsfw)

------
zfedoran
Has anyone tried taking the features that are learned at the various layers of
a neural net and feeding them into something like this:
[https://news.ycombinator.com/item?id=12612246](https://news.ycombinator.com/item?id=12612246)?

I imagine we would get some really interesting images back...

------
echelon
Can you run deep dream on this? That would be quite fascinating.

~~~
RandomInteger4
I think you misspelled "horrifying". I can only imagine it producing
something akin to the human centipede ... "Infinite Girls, One Cup" ...

------
NicoJuicy
> We are not releasing the training images or other details due to the nature
> of the data, but instead we open source the output model which can be used
> for classification by a developer.

I'm guessing the one who had to input the data/images had a fun time at work
:p

~~~
bazzargh
I used to work in a company which had a division doing manual image
classification next door. Not a fun time at all; the people who worked there
regularly burned out from relentlessly seeing terrible things.

~~~
davegauer
I've often thought that it would be even more helpful to automatically filter
violent images. Particularly to spare humans from having to be the filters
(and "relentlessly see terrible things").

However, I imagine that's far more difficult to accomplish. How do you detect
graphic violence? Looking for blood isn't going to cut it. Also, I can't
imagine how you'd separate the fictional from the real - I can watch horror
movies with realistic special effects all day, but real
violence/mutilation/death bothers me deeply.

------
darklajid
They acknowledge that NSFW (or pornographic) is hard to define, a la "I know
it when I see it".

But looking at the meager 3 sample images I'm confused about the scoring
already. Why is the one in the middle scoring the highest?

The question is an honest one. The two rightmost images seem interchangeable
to me and are ~boring~: people at the beach. Is this network therefore
already trained to include the biases of its creators?

~~~
plingamp
All ML networks are inherently biased toward their creators. My colleague
recently described this issue to me as the "old, white, male" problem. This
is why most voice recognition services fail drastically when presented with
foreign accents.

~~~
vidarh
> This is why most voice recognition services drastically fail when they are
> shown foreign accents

As someone with a broad Norwegian accent: This has gotten massively better
over the last few years.

Not that long ago, my local cinema chain started using voice recognition to
discriminate between a list of city names, and it would consistently think I
said "Birmingham" when I said "London" (!).

These days, both my Amazon Fire and the Youtube app will correctly recognise
most things I throw at it, including e.g. names of random Youtube channels
that bear no relation to real English words.

It's by no means perfect, but it's getting there. In relation to the "old,
white, male" problem (well, I do somewhat fit that), it's presumably improving
because these systems are now finally trained on huge and varied data sets.

------
eggoa
I wonder if anyone at Yahoo tried using this to "deconvolute" noise into
Cronenberg nightmare porn?

------
TrueDuality
Sit back, grab some popcorn. Let's see how long it takes people to start
running the data backwards to generate new, original porn.

~~~
pluma
I see you've never seen the kind of images that tends to generate.

Or you're really into R'lyehian porn.

~~~
venomsnake
Cthulhu has nice tentacles. So I guess it will be run-of-the-mill hentai.

~~~
pluma
You have no idea. Try googling for "deep dream porn".

------
lifeisstillgood
My first thought was from years ago, when I was pitching open source forensic
services to London police (did not get far, bad salesman that I am).

Cataloging and categorising seized pornography is a nasty job, and one that
cops across the planet might do better with good, common OSS tools.

Hopefully this will help.

------
c3534l
My first thought: would probably be very useful for sites to crack down on
inappropriate content.

My second thought: I could probably use this to find porn in unexpected places
via a webscraping Python program.
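
A rough sketch of that second thought, using requests and BeautifulSoup;
`nsfw_score(image_bytes)` is a hypothetical wrapper around the open_nsfw
model that returns a float:

    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin

    def find_nsfw_images(page_url, nsfw_score, threshold=0.8):
        # Fetch the page, then download and score every image it references.
        soup = BeautifulSoup(requests.get(page_url).text, 'html.parser')
        hits = []
        for img in soup.find_all('img', src=True):
            url = urljoin(page_url, img['src'])
            try:
                score = nsfw_score(requests.get(url).content)
            except Exception:
                continue  # skip broken links and undecodable images
            if score >= threshold:
                hits.append((url, score))
        return hits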

------
m-i-l
Good to see they've automated this (beyond the initial classification of
training data). In the early days of the web, such filters were typically
based on manually maintained lists of sites. I actually met someone at a party
once whose full-time job was to surf for porn, to maintain the filter for a
provider of IT services to schools (he worked for a company now called RM
Education). He said it was his ideal job for the first few days, but it soon
grew tiresome (note that back in those days there wasn't really any extremely
objectionable material on the web).

------
SloopJon
Anyone else see the irony in acknowledging that NSFW is subjective and
contextual, but assuming that pornographic images are not?

~~~
CodeMage
There's no definition of "ironic" that I can think of applying to that. It's
like saying that "beautiful painting" is subjective and contextual, but "still
life painting" isn't. It just happens that pornography is almost[1] always
considered NSFW, but again, I can't see how that is ironic.

[1] Almost, because porn is SFW when your work involves porn.

~~~
SloopJon
I think it's easier to agree that a picture of a naked person is NSFW than
that it's pornographic.

------
joshmn
I'm not a deep learning person whatsoever, but I do have an interesting use
case that I won't disclose publicly: is there a way to build this so that it
outputs detections based on the, ugh, object it has detected?

e.g.

    penis 0.94
    vagina 0.01

~~~
pjreddie
Yes. You'd need a large training set with these labels, but it would be
pretty straightforward to train. You would probably want a tagging model, not
a classifier, because there could be multiple objects of interest in the same
image. If you get me the training data I could train a model for you pretty
quickly.
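
The difference is just the output head: a classifier's softmax makes the
labels compete, while a tagger gives each label its own sigmoid. A toy numpy
illustration with made-up logits:

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    logits = np.array([3.1, 2.8, -4.0])  # made-up scores for three tags

    # Classifier head: softmax forces the tags to compete; probabilities
    # sum to 1, so two objects in the same image suppress each other.
    print(softmax(logits))   # ~[0.57, 0.43, 0.00]

    # Tagging head: one independent sigmoid per tag, so several tags
    # can be confident at once.
    print(sigmoid(logits))   # ~[0.96, 0.94, 0.02]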

~~~
bbctol
And it would be an... interesting job to tag the training set. Although for
higher level content, I suppose lots of porn videos have very specific
category tags that could be an interesting data set to play with. Uh, to
analyze.

~~~
inlined
IIRC you can use ML to learn the tags as well; they tend to appear in the
text surrounding the media.

------
slowmovintarget
I think I'll pass on browsing the deep dream visualizations for this.

------
zuzun
With access to Flickr and Tumblr it must have been very easy to create a huge
training set for such a task.

------
prirun
Aren't there more important problems to work on than worrying about someone
looking at naked people? This is just what we need: more effort spent on
censoring and controlling people.

~~~
myth_buster
Won't you say that preventing a kid from accidentally viewing porn while
searching images with `Safe Search: On` is an important problem?

~~~
prirun
Well, considering the other shit they look at and participate in, like video
games with people killing each other and blood spattering everywhere, I'd
rather they were viewing naked, non-violent people.

------
cvwright
Reminds me of this post from hackerfactor where he describes his own porn
filter based on pHash.

[http://www.hackerfactor.com/blog/index.php?/archives/529-Kin...](http://www.hackerfactor.com/blog/index.php?/archives/529-Kind-of-Like-That.html)

It'd be interesting to see a direct comparison of the two. Off the cuff, I'd
expect the deep neural network to be more accurate and better at generalizing,
but much more expensive to train.
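
For anyone curious, the pHash side of that comparison is only a few lines
with the imagehash library (the file names here are hypothetical):

    from PIL import Image
    import imagehash

    # Perceptual hashes of images already known to be NSFW.
    known_nsfw = [imagehash.phash(Image.open(p))
                  for p in ['known1.jpg', 'known2.jpg']]

    def looks_like_known_nsfw(path, max_distance=8):
        # Flag the image if its hash is within a few bits of any known-bad
        # hash. This catches near-duplicates (resizes, recompression) but,
        # unlike a neural net, can't generalize to images it hasn't seen.
        h = imagehash.phash(Image.open(path))
        return any(h - known < max_distance for known in known_nsfw)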

------
Dim25
Another work in this field: "Adult video content detection using Machine
Learning Techniques". PDF:
[http://colorlab.no/content/download/37238/470343/file/Victor...](http://colorlab.no/content/download/37238/470343/file/VictorTorres_MasterThesis.pdf)

------
johnnyo
I'll bet this would be a good tool for sysadmins or network administrators to
run against their network and see what it finds.

------
chadscira
Awesome!

I have been using nude.js to do this (
[http://s.codepen.io/icodeforlove/debug/gMrEKV](http://s.codepen.io/icodeforlove/debug/gMrEKV)
), which is hit or miss.

------
ganwar
To be precise, they are only releasing the already-trained model; the
associated dataset is not being made public.

Thus, it is meant for off-the-shelf use rather than for tinkering with the
network to produce nuanced results.

~~~
mattthebaker
Or they just don't want to be distributing gigabytes of porn... most of which
is probably under copyright.

Making the data set available and being able to tinker with or retrain the
model are very different things.

------
patrickaljord
I wonder what would happen if we stopped firing people for watching NSFW
images. I mean bosses look at NSFW images all the time and it sounds like a
shallow reason to fire someone.

~~~
davidgerard
Creates a hostile work environment, however.

------
Joof
Are there any other fairly basic image recognition problems that people want?
I'd be happy to oblige as long as a dataset is easy to collect.

------
CompanionCuube
Has anyone run this NN on the censored Facebook image?

------
askew
Interesting that the photo of two women on the beach is given a higher NSFW
rating than the photo of a man on the beach.

------
Happpy
Could this work on mobile to detect 18+ content in images or video? Or would
the trained library be 50 MB+?

------
KennyCason
I literally just started working on this problem 2 hours ago >_<

------
zwindl
That's it! That's what I'm looking for.

------
matheweis
Is this just the network, or is it a fully trained model? The TechCrunch
article suggests the former, but the Yahoo post the latter...

~~~
tedd4u
It's the trained model. You can use it out of the box or refine it to tailor
it to your environment.
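
"Refine" here means standard Caffe fine-tuning: start from the released
weights and continue training on your own labeled data. A minimal pycaffe
sketch; 'solver.prototxt' is something you'd write yourself, it's not in the
repo:

    import caffe

    caffe.set_mode_gpu()
    # Continue training from the released weights instead of from scratch.
    solver = caffe.SGDSolver('solver.prototxt')
    solver.net.copy_from('nsfw_model/resnet_50_1by2_nsfw.caffemodel')
    solver.solve()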

~~~
matheweis
Nice; I was on my phone at the time and couldn't dig deeper.

It took a couple of hours to get it all up and running, but it does indeed
work, and not half badly at that!

This is obviously a way cheaper alternative to
[https://sightengine.com](https://sightengine.com) or
[http://imagevision.com](http://imagevision.com)

Kudos to Yahoo for releasing this!

------
cft
I hope this is ported to TensorFlow soon!

------
rasz_pl
Oh, silly Americans, it's just tits.

------
yk
I would suggest that the link should go to Yahoo's blog post

[https://yahooeng.tumblr.com/post/151148689421/open-sourcing-...](https://yahooeng.tumblr.com/post/151148689421/open-sourcing-a-deep-learning-solution-for)

which contains some technical details. (And furthermore, I guess the HN crowd
has enough Internet experience to come up with stupid jokes of their own
design.)

------
BinaryIdiot
The Yahoo blog post[1] is far more interesting than this TechCrunch
"article". Suggest changing the URL to the Yahoo blog, please.

[1] [https://yahooeng.tumblr.com/post/151148689421/open-sourcing-...](https://yahooeng.tumblr.com/post/151148689421/open-sourcing-a-deep-learning-solution-for)

~~~
sctb
OK, we've updated the link from [https://techcrunch.com/2016/09/30/yahoo-open-sources-its-por...](https://techcrunch.com/2016/09/30/yahoo-open-sources-its-porn-detecting-neural-network/).

------
Bud
So this is what Yahoo was up to for the last 10 years, instead of building any
sort of security, keeping Yahoo Messenger working properly, or anything else
of value? Heckuva job, Yahoo.

