
Physiognomy’s New Clothes (on the Limits of AI) - herf
https://medium.com/@blaisea/physiognomys-new-clothes-f2d4b59fdd6a
======
rjdagost
It sounds like there are some valid methodological criticisms that can be made
of this criminal prediction paper. But I don't like the rush to condemn the
paper as "pseudoscience" just because it has implications that the OP is
uncomfortable with. The OP's main point, which could have been made far more
succinctly, is that "criminality" is a human judgment that encodes bias, and
therefore the paper authors' machine learning methods were merely trained to
learn these biases. I don't think anyone will claim that there's a perfect
criminal justice system anywhere. But to dismiss these results offhand on the
basis that criminal justice systems are imperfect is extremely hasty. The OP
is too blinded by politically correct ideology to even consider that one
might actually be able to accurately predict criminality with such a
methodology.

There are many behaviors we can predict about a person with a high degree of
confidence just by looking at an image of him. Why is predicting criminality
not one of these behaviors? If the premise is nonsense then it won't stand up
to further scrutiny. OP should do a larger study that addresses the
methodological deficiencies in the original paper. Definitively shoot down
this paper with hardcore research and gain widespread critical acclaim.

~~~
panic
The article's point is broader than that: the entire idea of physiognomy, "the
practice of using people’s outer appearance to infer inner character", is not
scientifically defensible. There's no mechanism that would cause someone's
inner character to be reflected in their appearance in any consistent way.
This remains true even if it's a computer algorithm, not a person, making
judgements based on people's appearances.

~~~
mindviews
"There's no mechanism that would cause someone's inner character to be
reflected in their appearance in any consistent way." That's a pretty big
leap. Down Syndrome has consistent physical characteristics that correlate
with a particular set of cognitive/behavioral characteristics. While it is
important to carefully critique scientific findings that may be motivated by
political biases, it is also important to give science as a process the chance
to find truths even if we might not like their political implications.

~~~
panic
Sure, that's true, and I bet a Down-Syndrome-recognizing neural net could be
made quite accurate. I guess what I mean to say is that there's no _general_
mechanism that would cause someone's inner character to be reflected in their
appearance in any consistent way. If you want to predict behavior from
appearance, you have to actually do the science and prove that there's an
underlying mechanism before you can trust that your neural net is recognizing
something meaningful.

------
gwern
> To put into perspective just how extraordinary a 90% accuracy claim is,
> consider that a well-controlled 2015 paper by computer vision researchers
> Gil Levi and Tal Hassner find that a convolutional neural net with the same
> architecture (AlexNet) is only able to guess the gender [5] of a face in a
> snapshot with 86.8% accuracy. [6]

This seems like a rather misleading comparison. I would be really surprised if
CNNs, which can hit <5% error on ImageNet across 1000 classes, and whose GANs
can generate nearly-photorealistic, clearly gendered faces, can't distinguish
gender at least that well. And guess what happens when you click through to
fact-check this claim that CNNs do worse on a binary gender prediction problem
than on guessing hundreds of categories? You see that the facial images used
by Levi & Hassner are not remotely similar to a clean, uniform government-ID
facial photograph dataset: they are often extremely low quality, blurry, and
taken at many angles and lighting conditions, and I can't even confidently
guess the gender on several of the samples at the beginning and end, because,
as they say:

> These show that many of the mistakes made by our system are due to extremely
> challenging viewing conditions of some of the Adience benchmark images. Most
> notable are mistakes caused by blur or low resolution and occlusions
> (particularly from heavy makeup). Gender estimation mistakes also frequently
> occur for images of babies or very young children where obvious gender
> attributes are not yet visible.

It wouldn't surprise me, going off the samples, if human-level performance was
closer to 86% than 100%, simply due to the noise in the dataset.

(It's also bizarre to make this claim shortly after presenting ChronoNet! So
which is it: are CNNs such powerful learning algorithms that they can detect
the subtlest biases and details in images, to the point of easily dating
photographs of random scenes to within a few years, and so none of their
results are ever trustworthy; or are they so weak and dumb that they cannot
even distinguish male from female faces, and so none of their results are
ever trustworthy? You can't have it both ways.)

~~~
panic
_> It wouldn't surprise me, going off the samples, if human-level performance
was closer to 86% than 100%, simply due to the noise in the dataset._

I think that's kind of the point! The idea that any agent -- human or machine
-- can know someone's gender, criminal convictions, or anything else about
their background just from a photograph is fundamentally flawed.

~~~
gwern
> I think that's kind of the point!

No, my point is that you can't use a hard dataset to say what is possible on
an easy dataset. 'Here is a dataset of images processed into static: a CNN
gets 50% on gender; QED, detecting criminality, personality, gender, or
anything else is impossible'. This is obviously fallacious, yet it is what OP
is doing.

> The idea that any agent -- human or machine -- can know someone's gender,
> criminal convictions, or anything else about their background just from a
> photograph is fundamentally flawed.

This is quite absurd. You think you can't know something about someone's
gender from a photograph? Wow.

Personally, I find it entirely possible that criminality could be predicted at
above-chance levels based on photographs. Humans are not Cartesian machines;
we are biological beings. Violent and antisocial behavior is heritable,
detectable in GWAS, and has been linked to many biological traits such as
gender, age, and testosterone - hey, you know what _else_ testosterone
affects? A lot of things, including facial appearance. Hm...

Of course, maybe it can't be. But it's going to take more than some canned
history about phrenology and misleadingly cited ML research to convince me
that it can't and that the original paper was wrong.

~~~
panic
_> This is quite absurd. You think you can't know something about someone's
gender from a photograph? Wow._

No, I'm saying that neither humans nor machines can determine gender solely by
looking at a picture, no matter how well they're trained. There will always be
examples they get wrong. The problem is not that the machines aren't as good
as humans. The problem is that they're both trying to do something that's
impossible.

And predicting at "above-chance levels" isn't enough. The article goes into
great detail about how this kind of inaccurate prediction can cause real human
suffering.
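To make the base-rate worry concrete, here is a quick sketch using Bayes' rule. All numbers are illustrative assumptions (neither the thread nor the paper reports a real-world base rate): even a classifier that is right 90% of the time in both directions will, at a 1% conviction base rate, be wrong about the overwhelming majority of the people it flags.

```python
# Illustrative Bayes' rule calculation; all numbers are assumptions,
# not figures from the paper under discussion.
def positive_predictive_value(base_rate, sensitivity, specificity):
    """P(actually positive | flagged positive)."""
    true_pos = base_rate * sensitivity
    false_pos = (1 - base_rate) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

# 1% base rate, 90% accuracy in both directions (well "above chance"):
ppv = positive_predictive_value(base_rate=0.01, sensitivity=0.90, specificity=0.90)
print(f"{ppv:.1%}")  # prints "8.3%": most flagged people are false positives
```

In other words, "above chance" says nothing about how a prediction behaves at realistic base rates, which is exactly where the human cost lands.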

~~~
gwern
> No, I'm saying that neither humans nor machines can determine gender solely
> by looking at a picture, no matter how well they're trained. There will
> always be examples they get wrong.

This is irrelevant and dishonest. Don't go around making factual claims like
something can't be done when it manifestly can usually be done.

------
mkrum
I remember a story a professor once told me about something similar to this.
He once knew of a research team that was trying to develop an algorithm to
tell the difference between pictures of American and Russian tanks. They were
able to achieve a very high success rate very quickly. Excited but skeptical,
they decided to keep testing the algorithm on lower and lower resolution
photos. Shockingly, they were still getting close to 100% identification on
images as small as roughly 10 by 10 pixels.

Turns out, all the pictures of Russian tanks had been taken in the winter,
while all the American ones had been taken in the summer. All they had done
was train a model to classify how bright the picture was.
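Whether or not the story is literally true, the failure mode it describes is easy to reproduce. Here is a minimal sketch with synthetic data (the "winter is bright / summer is dark" confound and all numbers are assumptions for illustration): when labels are confounded with overall brightness, a one-line "model" that only thresholds average pixel intensity classifies the "tanks" perfectly, even at 10 by 10 resolution.

```python
import random

random.seed(0)

def make_image(mean, size=10):
    # A "photo" is just a size x size grid of noisy pixel intensities.
    return [[random.gauss(mean, 20) for _ in range(size)] for _ in range(size)]

# Confounded dataset: "winter" (Russian) photos are bright,
# "summer" (American) photos are dark -- the labels track season, not tanks.
dataset = [(make_image(200), "russian") for _ in range(100)] + \
          [(make_image(100), "american") for _ in range(100)]

def classify(img):
    # The "model": a threshold on average brightness. No tank is ever looked at.
    pixels = [p for row in img for p in row]
    return "russian" if sum(pixels) / len(pixels) > 150 else "american"

accuracy = sum(classify(img) == label for img, label in dataset) / len(dataset)
print(accuracy)  # near-perfect on the confounded data
```

The point of the sketch: high accuracy on a confounded dataset tells you nothing about whether the model learned the thing you care about.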

~~~
gwern
Yeah, this urban legend always gets trotted out to criticize neural networks,
but after years of looking, I've never been able to confirm it, and even when
Minsky tells it, he can't name any names or concrete details about when or
where - for example, in Minsky's version, it was how the photographs were
_developed_ , but in yours, it's winter/summer, and in other versions, it's
night vs day, or it was forest vs grass:
[http://lesswrong.com/lw/td/magical_categories/4v4a](http://lesswrong.com/lw/td/magical_categories/4v4a)
[http://lesswrong.com/lw/lvh/examples_of_ais_behaving_badly/ckmm](http://lesswrong.com/lw/lvh/examples_of_ais_behaving_badly/ckmm)

~~~
mindviews
And last week I heard that story but the difference was that the images of the
NATO tanks were in perfect focus while the Soviet tanks were out of focus in
the training images. Clearly this story is all over the place. I wonder if
even the likely source is correct.
[https://www.webofstories.com/play/marvin.minsky/122](https://www.webofstories.com/play/marvin.minsky/122)

------
deeth_starr
Tl;dr Two researchers claim they can use convolutional neural networks to pick
criminals from pictures.

For the criminals they used pictures from government ID cards. For non-
criminals they searched the internet for random pictures.

These bozos just trained a CNN to pick out government ID photos.

This is scary stuff because police are not very technically savvy and use
biased scoring systems already[0].

[0] [https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing](https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing)

~~~
rockmeamedee
All of the photos in the paper, both "criminals" and "non-criminals" are from
government ids. Though as the article mentions, in the pictured example all
the "non-criminals" are wearing collared shirts.

This reeks real hard of overfitting. 2000 images for training a CNN feels so
tiny. The paper should have included a learning curve.

The paper shouldn't have been written at all.
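The learning-curve point can be illustrated with a toy sketch (synthetic noise data and a deliberately memorizing 1-nearest-neighbor model; nothing here comes from the actual paper): on a small dataset with no real signal, a flexible model looks perfect on its training set while sitting at chance on held-out data, and only a learning curve's train/holdout gap would reveal whether anything real was learned.

```python
import random

random.seed(1)

DIM = 50  # feature dimension: high relative to the sample count

def sample(n):
    # Pure-noise features with random labels: there is no signal to learn.
    return [([random.gauss(0, 1) for _ in range(DIM)], random.choice([0, 1]))
            for _ in range(n)]

def nn_predict(train, x):
    # 1-nearest-neighbor: the archetypal memorizing model.
    def dist(a):
        return sum((ai - xi) ** 2 for ai, xi in zip(a, x))
    return min(train, key=lambda t: dist(t[0]))[1]

train, test = sample(100), sample(100)
train_acc = sum(nn_predict(train, x) == y for x, y in train) / len(train)
test_acc = sum(nn_predict(train, x) == y for x, y in test) / len(test)
print(train_acc, test_acc)  # 1.0 on train, roughly 0.5 on held-out data
```

Training accuracy is perfect because each training point is its own nearest neighbor; held-out accuracy stays at chance because there was never anything to learn. That gap is what a learning curve makes visible.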

------
aisofteng
In-depth and interesting article, with thoughtful exposition. It's a longer
read, and I enjoyed it. (I almost said "but", which makes me a bit ashamed,
because the two characteristics are not at all contradictory, we've just been
trained to seek quick reads.)

------
dude01
tldr: Really long article about history of stupid theories people have had
linking superficial traits like your face to criminal behavior and
intelligence. I'm not sure why those old theories needed such debunking.

~~~
coldtea
> _I'm not sure why those old theories needed such debunking._

Well, in case you actually "dr", it's because those old theories not only are
still held by many, but are also re-surfacing in the form of superficial deep
learning applications -- as the article mentions.

~~~
ThisNameIsTaken
To add to that, I guess the intent of the writers is explained by the
following:

> We expect that more research will appear in the coming years that has
> similar biases, oversights, and false claims to scientific objectivity in
> order to “launder” human prejudice and discrimination.

Historical theories are brought up (1) to show that they are resurfacing and
(2) because the sciences (and I'm inclined to say DL in particular) are
susceptible to making similar mistakes. The hard part is that these biases
are often not clear at all, as they are based on general
preconceptions/stereotypes, and the theories thus confirm something we think
we know (confirmation bias).

For example, I have been examining emotion recognition software [1] in which,
just as with physiognomy, the face is taken as a proxy for a person's mental
state. Just as OP examines the terms "criminal" and "justice", one could
inquire into the concepts underlying the digitization of emotions, such as
"anger" and "joy". These terms seem very clear on a brief encounter, but on
closer examination turn out to be heavily influenced by e.g. culture. Though
not as obviously poignant as incriminating an innocent person, one should
still wonder what it means to feel 34% angry.

Now, this is a single example, but I guess OP's use of historical theories
allows for a critical look at more DL applications out there. And maybe helps
convince laymen (policy makers that buy and employ such technology) that DL is
not an easy answer to complex social/political problems, such as OP's example
of Faception's classifiers for terrorists, paedophiles and white-collar
offenders.

[1]: [http://networkcultures.org/longform/2017/01/25/choose-how-you-feel-you-have-seven-options/](http://networkcultures.org/longform/2017/01/25/choose-how-you-feel-you-have-seven-options/)

------
throwawayrace60
> Rapid developments in artificial intelligence and machine learning have
> enabled scientific racism to enter a new era, in which machine-learned
> models embed biases present in the human behavior used for model
> development. Whether intentional or not, this “laundering” of human
> prejudice through computer algorithms can make those biases appear to be
> justified objectively.

Let's be honest here. Realistically, could those biases _ever_ be justified
objectively to the satisfaction of the author? Probably not, because any
evidence in their favor will likely be swiftly dismissed as itself racist, and
thus, somehow automatically invalid.

On the issue of race, the modern leftist increasingly resembles a sixteenth-
century geocentrist, desperately adding epicycle after epicycle in a futile
struggle to preserve their cherished ideal. I suggest listening to Sam
Harris's recent interview with Charles Murray. It's a rare, refreshing example
of a prominent leftist at long last conceding the existence of a reality that
they for so long denied:

[https://www.youtube.com/watch?v=Y1lEPQYQk8s](https://www.youtube.com/watch?v=Y1lEPQYQk8s)

