
What a Deep Neural Network thinks about selfies - vkhuc
http://karpathy.github.io/2015/10/25/selfie/
======
lqdc13
A guide on how to take a good selfie that others will like:

    
    
      be female
      be blonde
      be attractive
    

Incidentally, Christian Rudder did a really good "study" on the dating site
pictures a few years ago:

[http://blog.okcupid.com/index.php/dont-be-ugly-by-
accident/](http://blog.okcupid.com/index.php/dont-be-ugly-by-accident/)

~~~
steve_taylor
A better guide on how to take a selfie:

    
    
        Don't take a selfie.

------
danblick
This is neat. I bet Facebook or OkCupid are sitting on all sorts of click data
that could be used to develop tools for helping people make their photos look
better. (Even if, personally, I can't wait for a cultural backlash against
internet narcissism...)

[Edit: Even better, he didn't use click data to train the model, just public
likes.]

~~~
visarga
The idea to use a convnet to reframe the selfie is neat. Makes it 5% better.
Also, if it can be run on the phone, it could possibly warn people that they
are about to post a shitty selfie before they do.

------
anunderachiever
I would like to see a deep dream selfie ...

Feed it an initial picture (noise, clouds, a selfie) and then backwards
manipulate the input to maximize the assessed quality of the "selfie".

I guess that would look pretty funny.
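
A rough sketch of that "backwards manipulation": gradient ascent on the input to maximize a scorer. The quadratic `score()` below is a made-up stand-in for a ConvNet's selfie-quality output, and the gradient is estimated with finite differences; with a real net you would backprop the score into the pixels instead.

```python
def score(img):
    # Hypothetical scorer: highest when every "pixel" equals 0.5.
    return -sum((p - 0.5) ** 2 for p in img)

def dream(img, steps=200, lr=0.1, eps=1e-4):
    img = list(img)
    for _ in range(steps):
        base = score(img)
        # Finite-difference estimate of the score's gradient w.r.t. the input.
        grad = []
        for i in range(len(img)):
            bumped = img[:]
            bumped[i] += eps
            grad.append((score(bumped) - base) / eps)
        # Ascent step: nudge the input in the direction that raises the score.
        img = [p + lr * g for p, g in zip(img, grad)]
    return img

noise = [0.0, 1.0, 0.2, 0.9]  # the "initial picture"
print([round(p, 2) for p in dream(noise)])  # → [0.5, 0.5, 0.5, 0.5]
```

With a convnet scorer the same loop produces the hallucinated, dream-like textures, which is presumably the funny part.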

~~~
Tyr42
He did run something like that for cropping. He showed his favourite two
"rude" ones at the bottom, where the 'Net cropped out the face of the person
taking the selfie.

~~~
yoha
Actually, he used random crops and selected the highest rated. A "deep dream
selfie" would actually run the neural network in reverse so as to generate a
completely different image.
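
A minimal sketch of the crop search described here, with a toy stand-in scorer (the real scorer would be the trained ConvNet):

```python
import random

def best_crop(image, scorer, crop_w, crop_h, tries=100, seed=0):
    """Score random crops of the image and keep the highest-rated one."""
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    best, best_score = None, float("-inf")
    for _ in range(tries):
        top = rng.randrange(h - crop_h + 1)
        left = rng.randrange(w - crop_w + 1)
        crop = [row[left:left + crop_w] for row in image[top:top + crop_h]]
        s = scorer(crop)
        if s > best_score:
            best, best_score = crop, s
    return best

# Toy usage: a grid of brightness values and a stand-in scorer that just
# prefers bright crops.
image = [[(r * c) % 7 for c in range(10)] for r in range(10)]
brightness = lambda crop: sum(sum(row) for row in crop)
winner = best_crop(image, brightness, crop_w=4, crop_h=4)
print(brightness(winner))
```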

------
misiti3780
One thing I always found interesting: LeCun is credited with developing
convnets, but Hinton is apparently credited with scaling them up and showing
the world how great they are in the 2012 paper. Why was Hinton's group
(Toronto) able to publish these groundbreaking results before LeCun's group
(NYU)?

~~~
pramodliv1
Geoff Hinton answers this question in episode 6 of the Talking Machines
podcast. [http://www.thetalkingmachines.com/blog/2015/3/13/how-
machine...](http://www.thetalkingmachines.com/blog/2015/3/13/how-machine-
learning-got-where-it-is-and-the-future-of-the-field)

Geoff Hinton had grad students who wanted to work on the problem, but Yann
LeCun didn't.

"In about 2012, it should have been Yann's group, but Yann was unlucky, he
didn't have a student who really wanted to do it. But we had a couple of
students who wanted to do it and we took all of Yann's techniques and added
some of our own."

~~~
misiti3780
interesting - i took the course but did not notice that - thanks!

------
nightpool
> Be female. Women are consistently ranked higher than men. In particular,
notice that there is not a single guy in the top 100.

This sounds true, but it can't be the real reason—selfies are ranked relative
to the other images by the _same user_. So unless users are taking a lot of
#selfies of people of different genders, we can assume the dataset is already
controlled for the gender of the person in the image, no? Unless there's some
confounding factor at play, such as some demographic segment being more likely
to optimize for good selfies occasionally but have boring feeds the rest of
the time.

would be super interesting, if the data is available, to normalize this by
exposure. Of the people that saw an image, how many clicked "like"?
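
That normalization could be as simple as dividing likes by views; a toy sketch with invented numbers, where "viewers" stands in for the exposure data that isn't publicly available:

```python
photos = [
    {"likes": 900, "viewers": 100_000},  # big audience, modest engagement
    {"likes": 300, "viewers": 10_000},   # small audience, high engagement
]
for p in photos:
    # Like *rate*: of the people who saw the image, how many clicked "like"?
    p["like_rate"] = p["likes"] / p["viewers"]

best_by_likes = max(photos, key=lambda p: p["likes"])
best_by_rate = max(photos, key=lambda p: p["like_rate"])
print(best_by_likes["likes"], best_by_rate["like_rate"])  # → 900 0.03
```

The two rankings disagree: raw like counts favor the big account, the like rate favors the photo people actually responded to.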

~~~
the8472
Well, one of the other factors is long hair and the tendency to oversaturate
the face. Those factors don't seem independent to me: men are less likely to
sport long hair, and they're also less likely to oversaturate the face to
measure up to some skin-perfection standard (think of it as the photographic
equivalent of makeup).

> but it can't be the real reason

Can't? On top of the above-listed aspects, it is entirely possible that there
is a bias whereby both sexes find female appearance somewhat more
aesthetically pleasing.

Similar to how focus group testing for computer voices tends to result in
female voices being chosen (at least that's what I often hear, couldn't find a
solid source).

Even if the bias is small the correlated factors would amplify it when you're
optimizing for a maximum, i.e. for the top selection.

~~~
nightpool
Neither of those explains why it would rank above the average of _other female
faces_, in general.

Discussion about this with the author reveals that I was misinterpreting how
they were collecting averages. I assumed the "like" count was coming from each
photo collected, but instead they collected the photos and the average likes
in separate steps, where the average likes were computed across recent posts
by that user rather than across that user's selfies.

~~~
karpathy
I screwed up on this point, by the way - I had done this part of the
experiment a few months ago and I incorrectly remembered the details. I went
back and looked through the code and updated the post with more detail
regarding this important point. In particular:

"Now it is time to decide which ones of those selfies are good or bad.
Intuitively, we want to calculate a proxy for how many people have seen the
selfie, and then look at the number of likes as a function of the audience
size. I took all the users and sorted them by their number of followers. I
gave a small bonus for each additional tag on the image, assuming that extra
tags bring more eyes. Then I marched down this sorted list in groups of 100,
and sorted those 100 selfies based on their number of likes. I only used
selfies that were online for more than a month to ensure a near-stable like
count. I took the top 50 selfies and assigned them as positive selfies, and I
took the bottom 50 and assigned those to negatives. We therefore end up with a
binary split of the data into two halves, where we tried to normalize by the
number of people who have probably seen each selfie. In this process I also
filtered people with too few followers or too many followers, and also people
who used too many tags on the image."
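
The quoted procedure, minus the tag bonus and the follower/tag filtering, can be sketched roughly like this (the field names are invented for illustration):

```python
def label_selfies(selfies):
    # selfies: list of dicts with "followers" and "likes" keys.
    # Sort by follower count as a proxy for audience size.
    ordered = sorted(selfies, key=lambda s: s["followers"])
    labeled = []
    # March down the follower-sorted list in groups of 100.
    for i in range(0, len(ordered) - len(ordered) % 100, 100):
        group = sorted(ordered[i:i + 100], key=lambda s: s["likes"])
        labeled += [(s, 0) for s in group[:50]]   # bottom 50 by likes: negatives
        labeled += [(s, 1) for s in group[50:]]   # top 50 by likes: positives
    return labeled

demo = [{"followers": i, "likes": i % 7} for i in range(200)]
print(sum(label for _, label in label_selfies(demo)))  # → 100
```

Because the split happens within each follower bracket, a modest account's selfie competes against similarly-sized accounts rather than against celebrities.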

~~~
simula67
Still no men in the top 100? There must be something deep to learn about the
difference between the sexes there, I am just not sure what it is.

------
JabavuAdams
How to take a good selfie: don't be black or dark-skinned, unless you're a
celebrity.

How do we prevent our AIs from learning racism?

EDIT> Informative article, BTW. A good read.

~~~
Lawtonfogle
If a given question has an answer that is due to racism, the answer is still
the answer. For example, if society has some underlying racism that factors
into what it considers attractive, that doesn't change what it considers
attractive.

I don't think these algorithms are learning racism. They are only being blunt
in revealing what already exists.

~~~
JabavuAdams
> If a given question has an answer that is due to racism, the answer is still
> the answer.

That's why it's important to be clear about the question. This ConvNet doesn't
really answer the question "What makes a good selfie?". It answers a much
narrower question that is more complicated to state.

The absence of reflection in the system means that if it's used to answer a
question that's superficially similar to the designer's intent, there's no way
to reason around the bias in the training data.

Imagine I'm a Canadian who trains an automated turret to classify friend / foe
based on data from Afghanistan and Iraq. I've not trained the system to answer
"Is this group of pixels a friend / foe", in the general sense. If the system
is used outside the narrow context of its validity, say in Northern Ireland,
or in a civilian Muslim neighbourhood in Paris, we should expect bad results.

So you're right to point out that the racism is in the social context. But I'm
arguing that we don't actually want a classifier to learn that if there's a
good chance it'll be used in a way that discards or ignores that social
context. Same as using an expert system outside its domain.

------
vonnik
I think it's less about the head getting chopped off than about having "the
head take up about 1/3 of the image," as Karpathy says. So what the net is
learning is composition, or balance in an image, which is really cool. The
rule of thirds is actually pretty well known to people in photography:

[https://en.wikipedia.org/wiki/Rule_of_thirds](https://en.wikipedia.org/wiki/Rule_of_thirds)

(Our deep-learning framework
[http://deeplearning4j.org](http://deeplearning4j.org) missed his list, but
it's got working convnets, too.)

~~~
Jack000
possibly, but none of the cropped examples have cropped chins. It's also well
known in photography that you can cut off someone's forehead, but never their
chin.

~~~
pshc
Echoing a law of video games: "nobody looks up"!

------
netheril96
One caveat with this machine-inspired knowledge: it is prone to error,
probably more so than humans, at least for now.

For example, if you train a CNN directly on human faces, its recognition rate
comes in well below what a human is capable of. Only after you apply tons of
handcrafted optimizations, which are mostly black art, will you get close to
or surpass a human's capability. Without much domain-specific tuning, an AI's
insight is far from reliable.

~~~
nl
This is more wrong than right.

The _example_ is correct, but not for the reasons stated. Humans are very,
very good at face _recognition_. However, CNNs are pretty close to human
performance for face detection.

 _Only after you apply tons of handcrafted optimizations, which are mostly
black art, will you get close to or surpass a human's capability. Without much
domain specific tuning, an AI's insight is far from reliable._

This just isn't the case. Take the GoogLeNet or VGGNet papers, build the CNN
as described using Caffe/whatever, train as described in the paper and you'll
end up with something that is pretty much on par with human performance for
categorizing ImageNet images.

Take that same CNN architecture, and retrain it for another domain and it will
perform roughly as well there too, for the task of categorizing into ~1K-10K
image classes.

This _isn't_ domain specific tuning. It's domain specific _training_, which is
very different (although collecting the data is a big job).

 _Only after you apply tons of handcrafted optimizations, which are mostly
black art, will you get close to or surpass a human's capability._

For CNNs, this is pretty much entirely false.

~~~
netheril96
A GoogleNet or VGGNet has tons of parameters. How many convolutional layers
are stacked together, the size and stride of each one, where to put the
dropout layers, where to put the fully connected layers, how they are
connected together, global learning rate and momentum and decay, local
learning rate and momentum and decay - each of these myriad parameters has an
unpredictable effect on the final result. The initialization of the network
also has a major bearing on the final outcome. It is almost a chaotic system
where nothing small can be safely ignored. One time my result of training a
CNN was swung by the `batch_size` parameter, and to this day I don't know how.

Those parameters are exactly the type of _handcrafted optimizations_ I am
talking about. You cannot just fill in arbitrary numbers and expect the
network to fare well. In fact, you cannot even expect it to converge.

You can take those papers and build a world class classifier only because
someone else has taken all the time to optimize for the specific case. Once
you switch the task, the result will be OK, but nowhere close to what a human
or a true AI would give you. Not until you take the time to optimize the
parameters.

~~~
nl
_A GoogleNet or VGGNet has tons of parameters._

Kinda, but they are defined for you. For example, the GoogLeNet design is
described in [1]. Page 5 lists the parameters, and the diagram on page 6 shows
how the layers are linked.

Yes, I agree that the design of a new neural network architecture is a skilled
process, and there is a lot of hard work there. I couldn't agree with that
more, but that isn't what we are talking about here.

It is quite possible to take a CNN like GoogLeNet designed for a specific
purpose and reuse it in similar situations. GoogLeNet will always do pretty
well for image classification.

I think of it as analogous to a piece of software like a database. Designing a
new database system is hard, but taking something like SQLite and using it is
easy. Yes, you can tune it and get better performance out of it, and yes, it
will break if you use it in the wrong circumstances, but it is generally
pretty reliable if used as designed.
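
A toy sketch of that reuse pattern: freeze a "pretrained" feature extractor and train only a small head for the new task. Everything here is a stand-in (a one-dimensional featurizer and a logistic-regression head), not the actual GoogLeNet pipeline, but the division of labor is the same:

```python
import math

def features(x):
    # Frozen "pretrained" extractor. In practice this would be a
    # GoogLeNet/VGG forward pass up to the penultimate layer.
    return [x, x * x]

def train_head(data, epochs=500, lr=0.05):
    # Logistic-regression head trained by SGD on top of the frozen features.
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            f = features(x)
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1 / (1 + math.exp(-z))
            g = p - y  # gradient of the log loss w.r.t. z
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * fi for wi, fi in zip(w, features(x))) + b > 0 else 0

# "New domain": is the number positive? Only the head is trained for it;
# the extractor is untouched.
data = [(i / 10, int(i > 0)) for i in range(-40, 41)]
w, b = train_head(data)
print(predict(w, b, 3.0), predict(w, b, -3.0))
```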

Now this analogy breaks down because industrial use of CNNs is pretty new
compared to database systems. It's more like trying to get msql[2] running on
your Slackware 0.9 system in 1993 than it is getting Postgres on Ubuntu 15.10.

Nevertheless, there isn't really a black art to using an existing CNN. Lots of
schlepping to get CUDA running on your machine, though.

[1] [http://www.cv-
foundation.org/openaccess/content_cvpr_2015/pa...](http://www.cv-
foundation.org/openaccess/content_cvpr_2015/papers/Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf)

[2] Not MySQL, msql:
[https://en.wikipedia.org/wiki/MSQL](https://en.wikipedia.org/wiki/MSQL)

------
RealityVoid
It seems this neural network has a sense of humor, if you look at the "Finding
the Optimal Crop for a selfie" area.

You can see it optimized the last selfie by cropping the face fully out of the
picture. :))

------
spikels
DNNs are a key technology of the future. I highly recommend the educational
programs Professor Karpathy mentions at the end of this post. All are
excellent and free.

------
amai
I have seen similar results before: [https://medium.com/the-physics-arxiv-
blog/the-algorithm-that...](https://medium.com/the-physics-arxiv-blog/the-
algorithm-that-sees-beauty-in-photographic-portraits-435ab8064646)

------
JoachimS
A really good read. A good intro to ConvNets and a well designed and
implemented test. And funny.

------
trhway
looking at the top 100, one can only wonder how Hollywood figured it all out
well before the mighty power of the computer :)

~~~
goodJobWalrus
For me, this thing about having the top of your head cut out of the picture is
new. Who would have thought...

~~~
falcolas
Makes a bit of sense: in combination with the "be female" advice, cutting off
the forehead puts the center of the photograph closer to her cleavage and
typically shows off her entire chest.

~~~
goodJobWalrus
Cleavage does not actually feature a lot in the top 100, but I'm halfway
there, in the sense that I'm female. I'll definitely try the half-forehead
thing next time!

~~~
visarga
We could try and see if the activation for good selfies comes from the
cleavage or the eyes.

------
thewhitetulip
Well, you don't need to ask a deep neural network to see that selfies are
getting stupider by the day, with teens sticking their tongues out

~~~
visarga
BEEP BEEP. Bad selfie detected. You run the risk of making a fool of yourself!
BEEP BEEP

