
Automated Inference on Criminality Using Face Images - igonvalue
https://arxiv.org/abs/1611.04135
======
nl
I thought this was a joke when I read the abstract, but it appears to be a
genuine paper.

This paragraph in particular is one of the worst examples I've ever seen of
researchers NOT UNDERSTANDING WHAT THEY ARE DOING:

 _Unlike a human examiner /judge, a computer vision algorithm or classifier
has absolutely no subjective baggages, having no emotions, no biases
whatsoever due to past experience, race, religion, political doctrine, gender,
age, etc., no mental fatigue, no preconditioning of a bad sleep or meal. The
automated inference on criminality eliminates the variable of meta-accuracy
(the competence of the human judge/examiner) all together._

Please, read _Weapons of Math Destruction_ and understand how excellent
machine learning is at discovering and exploiting the biases in datasets.

Edit, no, sorry, it gets worse:

 _the upper lip curvature is on average 23.4% larger for criminals than for
non-criminals._

 _the distance d between two eye inner corners for criminals is slightly
shorter (5.6%) than for non-criminals_

~~~
fhadley
First of all, I don't think this is satire. I'll admit that the use of a gmail
account by a researcher at a Chinese uni is facially suspicious, but it's not
_that_ odd given that cursory googling shows that both authors appear to be
faculty members at Shanghai Jiao Tong University, as claimed on the paper,
though neither appears to have much, if any, background or expertise in
machine learning.

I'm not much of a fan of a lot of the arguments made in _Weapons of Math
Destruction_ , but I do appreciate that in summarizing you draw the
distinction between the biases of the engineer or (illogically, but oft-
claimed nonetheless) the algorithm itself and the data which is used to train
said model, and I think it's quite a valuable concept in regards to this
particular paper.

For instance, the data set they're using here is fairly small, and while they
did use 10-fold cross-validation, that's still less than ideal, generally
speaking, for neural nets, especially CNN architectures, which are usually
pretty deep. Furthermore, the dataset itself seems fairly questionable
to me. I'm not sure how much I trust the Chinese criminal justice system to
adequately adjudicate culpability in the first place, but even setting aside
such admittedly conspiratorial notions, it seems rather odd indeed that nearly
half of their positive samples are not in fact convicted criminals but merely
suspects. I do not find their attempts at playing devil's advocate persuasive,
as it's not readily obvious exactly how they obtained or used any of the three
different data sets in their testing.
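The worry about cross-validating a small dataset can be made concrete with a
toy sketch (entirely hypothetical data, scikit-learn assumed; the sample size
merely echoes the paper's roughly 1,800 images). With purely random features,
the per-fold accuracies still scatter noticeably around chance, so a single
cross-validated figure deserves wide error bars:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical stand-in for the paper's setup: ~1,800 samples with
# purely random features, so the true accuracy of any classifier is 50%.
rng = np.random.RandomState(0)
X = rng.randn(1856, 20)
y = rng.randint(0, 2, size=1856)

scores = cross_val_score(LogisticRegression(), X, y, cv=10)
# Individual fold accuracies spread around chance on a dataset this size,
# so a single cross-validated number carries real variance.
print(scores.round(2))
print("mean=%.3f std=%.3f" % (scores.mean(), scores.std()))
```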

As for the appropriateness of the broader topic, I'm more or less of the
persuasion that all questions deserve to be examined, and that provided the
work does not cause direct harm, it's hard for me to support a prohibition on
examination of a given topic. That said, I do think that the more
controversial the question, the higher quality of research required, and, good
lord, does this mess fall well short of the mark. Perhaps if there existed a
hypothetical criminal justice system free of systemic biases or, more
realistically, a method by which to exactly define those prejudices and
account for them in the composition of a data set, this could be a potentially
useful question to investigate, but even then it seems to me quite unlikely
that there's any particularly significant relationship between one's upper lip
curvature and criminal disposition.

~~~
nl
Oh, I agree it's an entirely valid area of study.

But to do it you need _experts_ in criminology, physiology and machine
learning, not just a couple of people who can follow the Keras instructions
for how to use a neural net for classification.

For example, I think I remember reading papers in the physiology field that
show a link between increased testosterone and different facial features, but
from memory (and I don't have the paper to hand) there was no link between
that and criminal offending.

In this case, the features they are finding don't seem to make any sense. A
slight smile in the criminals seems more likely to be due to the way that set
of photos was taken, and a number of the other features could possibly be
explained by the fact that the criminal set came from a single police
department (in a single geographical area), while the other dataset was
collected online. Given the small size of the dataset, a single "family" gang
of criminals would likely have been enough to taint the features.

~~~
BickNowstrom
The link between increased testosterone and criminal offending has been
established by research:

> Testosterone plays a significant role in the arousal of these behavioral
> manifestations in the brain centers involved in aggression and on the
> development of the muscular system that enables their realization. There is
> evidence that testosterone levels are higher in individuals with aggressive
> behavior, such as prisoners who have committed violent crimes.

[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3693622/](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3693622/)

> Inmates who had committed personal crimes of sex and violence had higher
> testosterone levels than inmates who had committed property crimes of
> burglary, theft, and drugs. Inmates with higher testosterone levels also
> violated more rules in prison, especially rules involving overt
> confrontation.

[http://www.sciencedirect.com/science/article/pii/01918869940...](http://www.sciencedirect.com/science/article/pii/019188699400177T)

Though the connection is arguably weak, and definitely not the only factor
(high testosterone alone is not sufficient for criminal offending), it is
there.

> In this case, the features they are finding don't seem to make any sense. A
> slight smile in the criminals seems more likely to be due to the way that
> set of photos are taken

From the paper: "We stress that the criminal face images in S _c_ are normal
ID photos not police mugshots."

> by the fact the criminal set came from a single police department (in a
> single geographical area)

Subset S _c_ contains ID photos of 730 criminals, of which 330 are published
as wanted suspects by the ministry of public security of China and by the
departments of public security for the provinces of Guangdong, Jiangsu,
Liaoning, etc.; the others are provided by a city police department in China
under a confidentiality agreement.

> if it included a single "family"-gang of criminals it is likely that would
> have been enough to taint the features.

Family resemblance is an interesting one, but it's unlikely to significantly
affect the accuracy difference between proper labeling and random labeling
(they'd all need to be related).

Overfitting is sufficiently ruled out (to me), but leakage is not.
Unfortunately it is not possible to replicate this study (even if the dataset
were available, the implementation details are scarce). Differently sized raw
ID pictures, or compression artifacts, could lead to leakage that is nearly
undetectable for outsiders. I would probably not give this paper my stamp of
approval, even if it were on an uncontroversial subject, but it is not
abysmally bad.
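To illustrate what that kind of leakage would look like, here's a hypothetical
sketch (scikit-learn assumed; the "nuisance" feature standing in for something
like a compression-quality difference is invented for illustration). The face
features carry no signal at all, yet the classifier scores well above chance
because a source-specific artifact is tied to the class label:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical leakage scenario: the two classes come from two different
# sources, so a nuisance variable (say, a compression-quality proxy)
# differs by class even though the "face" features are pure noise.
rng = np.random.RandomState(0)
n = 1000
y = rng.randint(0, 2, size=n)
face_features = rng.randn(n, 10)            # carries no signal at all
nuisance = y * 0.5 + rng.randn(n) * 0.2     # differs between the two sources
X = np.column_stack([face_features, nuisance])

scores = cross_val_score(LogisticRegression(), X, y, cv=10)
# Cross-validation cannot catch this: the leak is present in every fold.
print("accuracy with leaked nuisance: %.2f" % scores.mean())
```

Note that cross-validation is no defense here, since the leaked feature is
present in training and test folds alike.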

I do think one has to be careful to separate moral concerns from technical
concerns. Sure, this all feels very wrong to me, and should be taken into
account when creating new regulation for ML systems, but the research itself
(apart from the small sample size, and vague data gathering methods) is
sufficiently solid for debate. Maybe we don't want to admit that physiognomy
can have measurable predictive power, but that is wishful thinking, not
science. Like you said: 'a link between increased testosterone and different
facial features' exists, and I just sourced you a link between criminal
behavior and testosterone. Logic would have us conclude that different facial
features can be indicative of different criminal behavior, no matter how
bizarre, scary, or immoral the research that supports it.

~~~
nl
I'd note that they claim 89.5% accuracy(!) using the CNN classifier. One paper
they reference[1] uses a similar technique to attempt the (seemingly much
easier) task of classifying people as Chinese, Korean, or Japanese. They get
75% accuracy.

89% accuracy means that there is _almost no other feature that influences
criminality_.

That should set off all kinds of alarms. If there was some kind of
relationship between facial features and criminality (and I don't discount
that there could be) I'd expect it to be a very weak one, not one that is
accurate 9/10 times.

[1]
[https://arxiv.org/pdf/1610.01854v2.pdf](https://arxiv.org/pdf/1610.01854v2.pdf)

------
sp332
This would be more useful if it were applied exactly the opposite way. What
facial features is the court system or even society at large biased against?

------
a_bonobo
It looks like they didn't split up the two training sets
(criminal/noncriminal) into two testing and training sets?

Which would explain this 'paradox'; it's just overfitting:

>The seeming paradox that Sc [the criminal set] and Sn [the noncriminal set]
can be classified but the average faces of Sc [the criminal set] and Sn [the
noncriminal set] appear almost the same can be explained, if the data
distributions of Sc [the criminal set] and Sn [the noncriminal set] are
heavily mingled and yet separable.

They're heavily mingled because they're identical and you're just testing your
predictions with your training data.

~~~
glglwty
They performed 10-fold cross-validation.

~~~
jupiter90000
However, there is no independent data set used to validate the model(s). We
have no idea how the models will generalize beyond this data set.
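That distinction can be made concrete with a hypothetical sketch (scikit-learn
assumed): cross-validation looks great when a dataset-specific quirk
correlates with the labels, but the same model falls back to chance on
independently gathered data that lacks the quirk.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical sketch: cross-validation only measures generalization
# *within* the collected dataset. If that dataset carries a quirk that a
# fresh, independently gathered set lacks, CV accuracy won't transfer.
rng = np.random.RandomState(0)

def make_data(n, quirk):
    y = rng.randint(0, 2, size=n)
    X = rng.randn(n, 10)
    if quirk:                     # dataset-specific artifact tied to class
        X[:, 0] += y * 1.5
    return X, y

X_train, y_train = make_data(1000, quirk=True)
clf = LogisticRegression().fit(X_train, y_train)

cv_acc = cross_val_score(LogisticRegression(), X_train, y_train, cv=10).mean()
X_new, y_new = make_data(1000, quirk=False)   # independent population
new_acc = clf.score(X_new, y_new)
print("CV accuracy: %.2f, accuracy on independent data: %.2f"
      % (cv_acc, new_acc))
```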

------
ongoodie
Overall I do not think that the result is surprising. Large genetic deviations
result in both deviant behaviour and a deviant face. On the other hand, it is
next to useless for law enforcement, since if it is applied to the general
population, the majority of criminal-like faces belong to law-abiding people.

The fact that we do not like the result does not make it false. For
validation, see page 4, where they checked that a random labeling of images
does not produce such a good distinguisher.
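That random-labeling check is essentially a permutation test, which can be
sketched as follows (hypothetical data, scikit-learn assumed; the paper's
exact pipeline is not replicated here). If accuracy collapses to chance once
the labels are shuffled, the pipeline is at least not scoring well by
memorising arbitrary labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Sketch of a random-labeling check: re-run the same pipeline with
# shuffled labels and compare against the real-label accuracy.
rng = np.random.RandomState(0)
n = 600
y = rng.randint(0, 2, size=n)
X = rng.randn(n, 5) + y[:, None] * 1.0      # features with real signal

real = cross_val_score(LogisticRegression(), X, y, cv=10).mean()
shuffled = cross_val_score(LogisticRegression(), X,
                           rng.permutation(y), cv=10).mean()
# Shuffling breaks the feature-label link, so accuracy drops to ~chance.
print("real labels: %.2f, shuffled labels: %.2f" % (real, shuffled))
```

Note this only rules out that particular failure mode; it says nothing about
leakage, since a leaked nuisance feature would survive a shuffled-label check.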

------
iamthepieman
I can't think of a comment that doesn't immediately invoke Godwin. This is
like the setup to a Philip K. Dick story. I wonder how many people outside of
HN would think this is a perfectly normal result of a perfectly scientific
study.

------
mjburns
People forget that arXiv is just an academic Wikipedia. Anyone can post an
"article" there, so being "published" there is meaningless as to potential
validity.

When referees at real journals actually do their jobs correctly, they check
arXiv when given manuscripts to read, and reject those that have been posted
to arXiv for violating the journals' "no prior publication" rules.

------
Friction87
From the abstract I gathered that average looking people are generally
considered law-abiding whereas people with outlier features are more likely to
fall into the criminal category: "The variation among criminal faces is
significantly greater than that of the non-criminal faces."

Likening this to the "wage gap", where the XY chromosome is responsible for
more outlier behavior: both the top and the bottom of society are heavily
dominated by male participants, whereas the female population is closer to
the average and has far fewer outliers. Could this be related?

There's variance associated with XY chromosomes that causes men to swing
wildly on the scale in both positive and negative directions. This seems to
support the hypothesis that individuals with wildly differing attributes more
often than not fall on the outside of the law.

------
synapticaxon
I can't believe nobody has mentioned William Herbert Sheldon and his famous
Somatotyping. This fell out of favor but he was systematically cataloging body
features as a function of criminality.

------
mattnewton
So people with more average characteristics are less likely to be convicted of
a crime? Could mean they are at a disadvantage with juries.

------
HS1
Fyi

Evidence from Meta-Analyses of the Facial Width-to-Height Ratio as an Evolved
Cue of Threat

[http://journals.plos.org/plosone/article?id=10.1371/journal....](http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0132726)

------
aseipp
I haven't read the paper, and I don't really know much about ML, but this part
stuck out to me from the abstract:

> All four classifiers perform consistently well and produce evidence for the
> validity of automated face-induced inference on criminality, despite the
> historical controversy surrounding the topic.

I realize the authors are intentionally skirting around this bit (it's not
really the point of their paper), but the "problem" isn't that some physical
features may indicate criminality, with some level of success. That's cool or
whatever I guess, but hardly an issue or truly revolutionary I think from a
social perspective (in person, people tend to understand "vibes" rather well,
and bad vibes come from a number of things like body language or visual cues
about a person. Humans have their own inference systems for these things,
flawed as they are.)

No, the problem -- the "controversy" surrounding the topic -- IMO, is that,
with almost 100% certainty, any implementation of this system will be left
completely unchecked, will effectively be private, and will be totally
unaccountable by any practical means.

Do the authors of this paper really think any implementation of this system
would be open to the public in any accountable way, if used by say, LEOs? You
know, as opposed to it being a big "every-criminal.sql" dump, based on hoarded
data mining, and driven along and utilized by proprietary algorithms, created
by some company selling to governments? LEOs in places like the US have
already shown their hands with strategies like parallel construction and the
downright willingness to fabricate evidence out of thin air.

Really, who cares what some data science nerds think of their fancy criminal
face models, and whether they think they're "accurate despite the
controversy", when the police can just say "It's accurate, I say so, you're
going to jail" and they can completely make shit up to support it? It's not a
matter of whether the actual thing is accurate, it's whether or not it gives
them a reason to do whatever they like.

It reminds me of the rhetoric during the election about building a wall along
the Mexican border. That can't happen. Who would build it? It'd be huge. Hard.
Realistically? It'd be "easy". Humans have been building walls for a long,
long time. It's not unthinkable. The difficult part is actually murdering
people who would try to cross the wall by gunning them down -- and they will
try to cross. I mean, you probably don't have to kill _too many_ people to
send the message. Just, enough of them. The Iron Curtain was a real thing,
too, after all.

This is similar. The algorithm is the "easy" part. It's "only" some science.
No, the hard part is dealing with the consequences. The hard part is closing
Pandora's box after you've opened it.

