
The infamous “AI gaydar” study was repeated – couldn't be reproduced - maze-le
https://www.theregister.co.uk/2019/03/05/ai_gaydar/
======
czr
This title ('The infamous “AI gaydar” study was repeated – couldn't be
reproduced') is simply incorrect; it does not accurately summarize either the
linked article or the actual paper
([https://arxiv.org/pdf/1902.10739.pdf](https://arxiv.org/pdf/1902.10739.pdf))
that the article is discussing.

From the abstract on the actual paper:

> Using a new dataset of 20,910 photographs from dating websites, the ability
> to predict sexual orientation is confirmed... While demonstrating that
> dating profile images carry rich information about sexual orientation these
> results leave open the question of how much is determined by facial
> morphology and how much by differences in grooming, presentation and
> lifestyle. The advent of new technology that is able to detect sexual
> orientation in this way may have serious implications for the privacy and
> safety of gay men and women.

So, the study was repeated, and _could_ be reproduced. However, it's still not
clear if there's any merit to the facial morphology hypothesis, vs. the (less
worrying) hypothesis that the models are just picking up on differences in
grooming and presentation.

------
rahimnathwani
This article mentions the words accuracy/accurate/accurately 10 times, e.g.

"when using his own dating-site-sourced dataset, was accurate at predicting
the sexuality of males with 68 per cent accuracy – better than a coin flip"

But accuracy seems like a poor measure for something like this, when the
population is highly unbalanced. It's trivial to create a classifier with high
accuracy: just outputting 'heterosexual' every time would yield ~90% accuracy
on faces of the general population.
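The imbalance point is easy to check with a toy sketch (the ~90% base rate here is an assumed figure for illustration, not from the paper):

```python
import random

random.seed(0)

# Simulated labels for a population where ~90% belong to the majority class.
# True = minority class ("positive"), at an assumed 10% base rate.
y_true = [random.random() < 0.10 for _ in range(10_000)]

# A "classifier" that ignores its input and always predicts the majority class.
y_pred = [False] * len(y_true)

accuracy = sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)
print(f"accuracy of the constant classifier: {accuracy:.2%}")  # ~90%
```

The constant classifier scores around 90% without learning anything, which is exactly why raw accuracy is uninformative on an unbalanced population.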

~~~
DanAndersen
This is true, which is why machine learning practitioners long ago stopped
treating raw accuracy as a meaningful measure on imbalanced data. If you look
at the linked paper [0], you'll find that the author uses the "ROC AUC" metric
[1]:

>The ROC AUC score represents the probability that when given one randomly
chosen positive instance and one randomly chosen negative instance, the
classifier will correctly identify the positive instance

[0]
[https://arxiv.org/pdf/1902.10739.pdf](https://arxiv.org/pdf/1902.10739.pdf)
[1]
[https://en.wikipedia.org/wiki/Receiver_operating_characteris...](https://en.wikipedia.org/wiki/Receiver_operating_characteristic)
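The quoted pairwise-probability definition can be computed directly; a small sketch with made-up scores (not the paper's data), counting tied pairs as half:

```python
# Toy labels and classifier scores, made up for illustration.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

# ROC AUC as the probability that a randomly chosen positive instance
# outranks a randomly chosen negative one; ties count as half a win.
pos = [s for s, y in zip(scores, y_true) if y == 1]
neg = [s for s, y in zip(scores, y_true) if y == 0]
wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
           for p in pos for n in neg)
auc = wins / (len(pos) * len(neg))
print(auc)  # 8 of the 9 (positive, negative) pairs are ranked correctly
```

Unlike accuracy, this number is unaffected by the class balance, which is why it's the standard metric for problems like the one in the paper.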

~~~
rahimnathwani
Thanks. That makes more sense.

The article didn't mention AUC, so I assumed they were talking about accuracy
in the sense people normally mean it, which also matches the definition in the
sidebar of the wikipedia link you shared:

(TP + TN) / (P + N)

------
EGreg
People don’t usually see others completely naked and shaved, stripped of
everything like fashion and style. Why shouldn't the AI be allowed to use
those cues?

------
astazangasta
What's really astonishing about this is how opaque the behavior of the CNN is.
We have no idea why it is doing what it is doing, and multiple papers and
orthogonal investigations are required to figure it out.

------
malux85
I'd be interested to see a class activation map to see if there's anything in
it, probably not...

~~~
np_tedious
"class activation map" is a new term to me, so I looked it up very briefly.

Does it mean simply "what part of the image (or in aggregate, images) carried
the strongest signal for categorization"? Or can it also tell you something
about the feature set?

Could you suggest a good introductory treatment of the concept? This was the
best I found in my search:
[https://jacobgil.github.io/deeplearning/class-activation-maps](https://jacobgil.github.io/deeplearning/class-activation-maps)

~~~
czr
From the original paper
([http://cnnlocalization.csail.mit.edu/Zhou_Learning_Deep_Features_CVPR_2016_paper.pdf](http://cnnlocalization.csail.mit.edu/Zhou_Learning_Deep_Features_CVPR_2016_paper.pdf)):

"A class activation map for a particular category indicates the discriminative
image regions used by the CNN to identify that category"

So, it's mostly what you said, but the maps you get are specific to the
particular trained network you're using; the class activation map tells you,
for a given network, for a given image, for a given class, which image regions
did that network consider relevant to that class.
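Mechanically it's very simple: for a network that ends in global average pooling followed by a linear layer (the architecture the Zhou et al. paper assumes), the CAM for class c is just the final conv feature maps weighted by that class's linear weights. A minimal NumPy sketch with made-up shapes:

```python
import numpy as np

rng = np.random.default_rng(0)

K, H, W, C = 8, 7, 7, 2           # feature maps, spatial size, classes (made up)
features = rng.random((K, H, W))  # last conv-layer activations for one image
weights = rng.random((C, K))      # per-class weights of the final linear layer

def class_activation_map(features, weights, c):
    """CAM_c(x, y) = sum_k w[c, k] * f_k(x, y)."""
    return np.tensordot(weights[c], features, axes=1)

cam = class_activation_map(features, weights, c=1)
print(cam.shape)  # one relevance score per spatial location, here (7, 7)
```

In practice the resulting map is upsampled back to the input resolution and overlaid on the image as a heatmap; the weighting itself is the whole trick.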

For learning more about interpreting neural nets I would highly recommend the
posts on distill.pub;
[https://distill.pub/2018/building-blocks/](https://distill.pub/2018/building-blocks/)
and
[https://distill.pub/2019/activation-atlas/](https://distill.pub/2019/activation-atlas/)
are probably most pertinent.

------
snrji
I assume in both cases the dataset was balanced. Otherwise metrics such as
accuracy can be meaningless.

