Automated Inference on Criminality Using Face Images (arxiv.org)
48 points by igonvalue on Nov 18, 2016 | 31 comments

I thought this was a joke when I read the abstract, but it appears to be a genuine paper.

This paragraph in particular is one of the worst examples I've ever seen of researchers NOT UNDERSTANDING WHAT THEY ARE DOING:

Unlike a human examiner/judge, a computer vision algorithm or classifier has absolutely no subjective baggages, having no emotions, no biases whatsoever due to past experience, race, religion, political doctrine, gender, age, etc., no mental fatigue, no preconditioning of a bad sleep or meal. The automated inference on criminality eliminates the variable of meta-accuracy (the competence of the human judge/examiner) all together.

Please, read Weapons of Math Destruction and understand how excellent machine learning is at discovering and exploiting the biases in datasets.

Edit, no, sorry, it gets worse:

the upper lip curvature is on average 23.4% larger for criminals than for non-criminals.

the distance d between two eye inner corners for criminals is slightly shorter (5.6%) than for non-criminals

I think the point of what they're saying is that the algorithm largely doesn't bring additional biases.

This paper is showing there's a detectable difference between the criminal and non-criminal faces in their dataset. Obviously, this would be the case if criminals actually had different faces. However, it could also be the case that people with more extreme faces are more likely to be convicted. That is, this paper could be showing that their dataset is biased.

Personally, I find this paper really interesting. I wouldn't have previously believed you could train a classifier so accurate just from facial images. It could be revealing some really strong biases in the criminal justice system and I'd love to see this kind of work used to help combat human biases rather than simply reinforce them (which seems to be how most people envision this will be used).

I agree in general that this sort of work has a long and storied history of being junk science (like phrenology), and one might even be tempted to describe this sort of work as irresponsible.

But I think we shouldn't get too prescriptive about what constitutes interesting and useful science. There may be very interesting relationships to uncover that we will otherwise miss. Previous research suggests that the correlation between attractiveness and 'averageness of features' is a strong one and one thesis is that this is because having features 'matching the template' is indicative of general fitness and resistance to disease.

Based on this research you might wonder about whether or not a person having perceived low fitness was a risk factor for criminality. Or perhaps the low fitness itself somehow causes criminality? Indeed there are many other interesting interpretations that we might miss out on if we judge all 'dangerous' work at face value. Perhaps we as a society are willing to give up on those ideas in our desire to avoid the atrocities of the past, but personally I am not. Onwards and upwards, they say.

First of all, I don't think this is satire. I'll admit that the use of a gmail account by a researcher at a Chinese uni is facially suspicious, but it's not that odd given that cursory googling shows that both authors appear to be faculty members at Shanghai Jiao Tong University as claimed on the paper- though neither appears to have much, if any, background or expertise in machine learning.

I'm not much of a fan of many of the arguments made in Weapons of Math Destruction, but I do appreciate that your summary draws the distinction between the biases of the engineer (or, illogically but often claimed, of the algorithm itself) and the biases of the data used to train the model. That's a valuable distinction with regard to this particular paper.

For instance, the data set they're using here is fairly small, and while they did use 10-fold cross-validation, that's still less than ideal for neural nets, especially deep CNN architectures. Furthermore, the dataset itself seems fairly questionable to me. I'm not sure how much I trust the Chinese criminal justice system to adequately adjudicate culpability in the first place, but even setting aside such admittedly conspiratorial notions, it seems rather odd indeed that nearly half of their positive samples are not in fact convicted criminals but merely suspects. I don't find their attempts at playing devil's advocate persuasive, since it's not at all obvious how they obtained or used the three different data sets in their testing.
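To make the small-dataset worry concrete, here's a rough synthetic sketch (nothing to do with the paper's actual code or data): even under 10-fold cross-validation, fold scores on a dataset of this size scatter noticeably, so a single headline accuracy number hides a lot of variance.

```python
# Synthetic sketch (not the paper's code): on a dataset roughly this size,
# 10-fold CV fold scores have a noticeable spread even when the labels are
# pure noise, so one headline accuracy figure tells you little on its own.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 1460                          # ballpark: their 730 positives plus negatives
X = rng.normal(size=(n, 100))     # stand-in features
y = rng.integers(0, 2, size=n)    # labels carry no signal at all

scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=10)
print(scores.round(2))            # folds scatter around chance (0.5)
print(round(scores.std(), 3))     # the fold-to-fold spread is not negligible
```

With a deep CNN instead of logistic regression, and real (correlated, source-dependent) images instead of noise, the variance story only gets worse.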

As for the appropriateness of the broader topic, I'm more or less of the persuasion that all questions deserve to be examined, and that provided the work does not cause direct harm, it's hard for me to support a prohibition on examination of a given topic. That said, I do think that the more controversial the question, the higher quality of research required, and, good lord, does this mess fall well short of the mark. Perhaps if there existed a hypothetical criminal justice system free of systemic biases or, more realistically, a method by which to exactly define those prejudices and account for them in the composition of a data set, this could be a potentially useful question to investigate, but even then it seems to me quite unlikely that there's any particularly significant relationship between one's upper lip curvature and criminal disposition.

Oh, I agree it's an entirely valid area of study.

But to do it you need experts in criminology, physiology and machine learning, not just a couple of people who can follow the Keras instructions for how to use a neural net for classification.

For example, I think I remember reading a paper in the physiology field that shows a link between increased testosterone and different facial features - but from memory (and I don't have the paper) there was no link between that and criminal offending.

In this case, the features they are finding don't seem to make any sense. A slight smile in the criminals seems more likely to be due to the way that set of photos are taken, and a number of the other features could possibly be explained by the fact the criminal set came from a single police department (in a single geographical area), while the other dataset was collected online. Given the small size of the dataset, if it included a single "family"-gang of criminals it is likely that would have been enough to taint the features.

China has 105 cities with over a million people each, and people there migrate frequently these days. A single gang?

Having dealt with somewhat similar datasets myself, there is a really, really good chance that the police department grabbed a day's worth of arrests from one or two cities. There are only 730 positive cases - it's pretty easy to imagine that many of them could be from a single gang, either family or ethnically based.

The link between increased testosterone and criminal offending has been established by research:

> Testosterone plays a significant role in the arousal of these behavioral manifestations in the brain centers involved in aggression and on the development of the muscular system that enables their realization. There is evidence that testosterone levels are higher in individuals with aggressive behavior, such as prisoners who have committed violent crimes.


> Inmates who had committed personal crimes of sex and violence had higher testosterone levels than inmates who had committed property crimes of burglary, theft, and drugs. Inmates with higher testosterone levels also violated more rules in prison, especially rules involving overt confrontation.


Though the connection can be said to be weak, and definitely not the only factor (high testosterone alone is not sufficient for criminal offending), it is there.

> In this case, the features they are finding don't seem to make any sense. A slight smile in the criminals seems more likely to be due to the way that set of photos are taken

From the paper: "We stress that the criminal face images in Sc are normal ID photos not police mugshots."

> by the fact the criminal set came from a single police department (in a single geographical area)

Subset Sc contains ID photos of 730 criminals, of which 330 are published as wanted suspects by the ministry of public security of China and by the departments of public security for the provinces of Guangdong, Jiangsu, Liaoning, etc.; the others are provided by a city police department in China under a confidentiality agreement.

> if it included a single "family"-gang of criminals it is likely that would have been enough to taint the features.

Family resemblance is an interesting one, but it's unlikely to significantly affect the accuracy difference between proper labeling and random labeling (they'd all need to be related).

Overfitting is sufficiently ruled out (to me), but leakage is not. Unfortunately it is not possible to replicate this study (even if the dataset were available, the implementation details are scarce). Differently sized raw ID pictures, or compression artifacts, could lead to leakage that is nearly undetectable for outsiders. I would probably not give this paper my stamp of approval even if it were on an uncontroversial subject, but it is not abysmally bad.
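One hypothetical way to probe for that kind of leakage (a sketch only - we obviously don't have their data, so all the numbers below are made up): train a trivial classifier on per-image metadata alone and see whether it already separates the two sources. If it does, a CNN trained on the pixels can "succeed" without ever looking at faces.

```python
# Hypothetical leakage probe with made-up numbers: if per-image metadata
# alone predicts which source an image came from, a CNN trained on the raw
# pixels can pick up the same pipeline artifacts instead of facial features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 1000
source = rng.integers(0, 2, size=n)             # 0 = one source, 1 = the other
# Pretend the two sources differ slightly in resizing and compression:
width   = rng.normal(300, 10, size=n) + 15 * source
file_kb = rng.normal(50, 8, size=n) - 6 * source
meta = np.column_stack([width, file_kb])

probe = cross_val_score(LogisticRegression(max_iter=1000), meta, source, cv=5)
print(round(probe.mean(), 2))   # well above 0.5: metadata alone leaks the label
```

If a two-feature probe like this beats chance on the real dataset, any accuracy the CNN reports is suspect until the sources are normalized.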

I do think one has to be careful to separate moral concerns from technical concerns. Sure, this all feels very wrong to me, and that should be taken into account when creating new regulation for ML systems, but the research itself (apart from the small sample size and vague data-gathering methods) is sufficiently solid for debate. Maybe we don't want to admit that physiognomy can have a measurable correlation with behavior, but that is wishful thinking, not science. Like you said: 'a link between increased testosterone and different facial features' exists, and I just sourced you that a link between criminal behavior and testosterone exists. Logic would have us conclude that different facial features are indicative of different criminal behavior, no matter how bizarre, scary, or immoral the research that supports it.

I'd note that they claim 89.5% accuracy(!) using the CNN classifier. One paper they reference[1] uses a similar technique to attempt the (seemingly much easier) task of classifying people as Chinese, Korean, or Japanese. They get 75% accuracy.

89% accuracy means that there is almost no other feature that influences criminality.

That should set off all kinds of alarms. If there was some kind of relationship between facial features and criminality (and I don't discount that there could be) I'd expect it to be a very weak one, not one that is accurate 9/10 times.

[1] https://arxiv.org/pdf/1610.01854v2.pdf

The first author is a well established academic in Canada: https://scholar.google.com/citations?user=ZuQnEIgAAAAJ&hl=en

All positive instances ARE convicted criminals, among whom there are NO political prisoners, just for your information.

That appears to be a different Xiaolin Wu, affiliated with a different university, with no publications in similar areas.

I am just as shocked and appalled that it's getting upvoted. It's so un-scientific, it should be taught in class as a counterexample.

obviously because faces physically change when you do crime. charles manson was a happy camper before he started a murderin. All in the upper lip curvature, because why would a criminal smile in a mugshot.

that brings up an important point -- being convicted or incarcerated might affect one's daily mood enough to change their resting facial expression -- so to control for that the study should have made sure to only use photos of criminals taken before their first arrest or conviction, perhaps even before their first confessed criminal act if possible.

I'm not sure if you are trolling or not. Please don't.

All in the upper lip curvature, because why would a criminal smile in a mugshot.

Why would a non-criminal smile in a mugshot?

I think it's meant to be satire, in the form of "A Modest Proposal" but applied to ML classifying criminals.

I really, really hope so.

I'm sure we all agree with you about phrenology revivals but please don't use uppercase for emphasis!


Their data set may be more valuable than the paper itself...

This would be more useful if it were applied exactly the opposite way. What facial features is the court system or even society at large biased against?

It looks like they didn't split the two sets (criminal/noncriminal) into separate training and testing sets?

Which would explain this 'paradox', it's just overtraining:

>The seeming paradox that Sc [the criminal set] and Sn [the noncriminal set] can be classified but the average faces of Sc [the criminal set] and Sn [the noncriminal set] appear almost the same can be explained, if the data distributions of Sc [the criminal set] and Sn [the noncriminal set] are heavily mingled and yet separable.

They're heavily mingled because they're identical and you're just testing your predictions with your training data.

They performed 10-fold cross-validation.

However, there is no independent data set used to validate the model(s). We have no idea how the models will generalize beyond this data set.
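The minimal evaluation hygiene being asked for looks something like this (a synthetic sketch, not their protocol): do all fitting on one split and report only on a held-out split the model never touches. On noise labels, the gap between training and held-out accuracy is exactly the overfitting being discussed upthread.

```python
# Synthetic sketch of held-out evaluation: a flexible model memorizes noise
# labels on its training split but shows no skill on data it has never seen.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(1500, 50))
y = rng.integers(0, 2, size=1500)   # noise labels: true skill is ~50%

X_dev, X_hold, y_dev, y_hold = train_test_split(X, y, test_size=0.3,
                                                random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_dev, y_dev)

acc_dev = accuracy_score(y_dev, model.predict(X_dev))
acc_hold = accuracy_score(y_hold, model.predict(X_hold))
print(round(acc_dev, 2))    # near 1.0: the model memorized its training split
print(round(acc_hold, 2))   # near 0.5: no generalization, as expected
```

Cross-validation approximates this within one dataset; it still can't tell you how the model generalizes to a dataset collected differently.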

Overall I do not think that the result is surprising. Large genetic deviations result in deviant behaviour and in a deviant face. On the other hand, it is next to useless for law enforcement, since if it is applied to the general population, the majority of criminal-like faces belong to law-abiding people.
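That base-rate point can be made with a few lines of Bayes arithmetic (the numbers are illustrative, not from the paper): even granting the classifier 90% sensitivity and 90% specificity, a low prevalence of criminality means most flagged faces belong to innocent people.

```python
# Illustrative base-rate arithmetic, not figures from the paper:
sens, spec, base = 0.90, 0.90, 0.005   # assume a 1-in-200 prevalence

tp = sens * base               # flagged and actually criminal
fp = (1 - spec) * (1 - base)   # flagged but law-abiding
precision = tp / (tp + fp)
print(round(precision, 3))     # ~0.043: over 95% of flagged faces are innocent
```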

The fact that we do not like the result does not make it false. For validation, see page 4, where they checked that a random labeling of images does not produce such a good distinguisher.
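The random-labeling check they describe is essentially a permutation test. A synthetic sketch of the idea (not their data or code):

```python
# Permutation-test sketch: with the true labels a classifier beats chance,
# and after shuffling the labels the apparent signal collapses back to ~50%.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n = 1000
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 20)) + 0.5 * y[:, None]   # planted class signal

clf = LogisticRegression(max_iter=1000)
real = cross_val_score(clf, X, y, cv=5).mean()
perm = cross_val_score(clf, X, rng.permutation(y), cv=5).mean()
print(round(real, 2))   # clearly above chance
print(round(perm, 2))   # back near 0.5 once labels are shuffled
```

Note the caveat upthread, though: a permutation test rules out pure overfitting, not leakage, since leaked source artifacts travel with the true labels.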

I can't think of a comment that doesn't immediately invoke Godwin. This is like the setup to a Philip K. Dick story. I wonder how many people outside of HN would think this is a perfectly normal result of a perfectly scientific study.

People forget that arXiv is just an academic Wikipedia. Anyone can post an "article" there. So being "published" there is meaningless as to potential validity.

When referees at real journals actually do their jobs correctly, they check arXiv when given manuscripts to read & reject them if they have been posted to arXiv as violating the "no prior publication" rules at the real journals.

From the abstract I gathered that average looking people are generally considered law-abiding whereas people with outlier features are more likely to fall into the criminal category: "The variation among criminal faces is significantly greater than that of the non-criminal faces."

Likening this to the "wage gap", where the XY chromosome is responsible for more outlier behavior: both the top and the bottom of society are heavily dominated by male participants, whereas the female population is closer to the average and has far fewer outliers. Could this be related?

There's variance in XY chromosomes that causes men to swing wildly on the scale in both positive and negative directions. That seems to support the hypothesis that individuals with wildly differing attributes more often than not fall on the outside of the law.

I can't believe nobody has mentioned William Herbert Sheldon and his famous Somatotyping. This fell out of favor but he was systematically cataloging body features as a function of criminality.

So people with more average characteristics are less likely to be convicted of a crime? Could mean they are at a disadvantage with juries.


Evidence from Meta-Analyses of the Facial Width-to-Height Ratio as an Evolved Cue of Threat


I haven't read the paper, and I don't really know much about ML, but this part stuck out to me from the abstract:

> All four classifiers perform consistently well and produce evidence for the validity of automated face-induced inference on criminality, despite the historical controversy surrounding the topic.

I realize the authors are intentionally skirting around this bit (it's not really the point of their paper), but the "problem" isn't that some physical features may indicate criminality with some level of success. That's cool or whatever, I guess, but hardly an issue or truly revolutionary from a social perspective. (In person, people tend to understand "vibes" rather well, and bad vibes come from a number of things like body language or visual cues about a person. Humans have their own inference systems for these things, flawed as they are.)

No, the problem -- the "controversy" surrounding the topic -- IMO, is that, almost with 100% certainty, any implementation of this system will be completely left unchecked, will effectively be private, and will be totally unaccountable by any practical means.

Do the authors of this paper really think any implementation of this system would be open to the public in any accountable way, if used by, say, LEOs? You know, as opposed to it being a big "every-criminal.sql" dump, based on hoarded data mining, and driven along and utilized by proprietary algorithms, created by some company selling to governments? LEOs in places like the US have already shown their hands with strategies like parallel construction and the downright willingness to fabricate evidence out of thin air.

Really, who cares what some data science nerds think of their fancy criminal face models, and whether they think they're "accurate despite the controversy", when the police can just say "It's accurate, I say so, you're going to jail" and they can completely make shit up to support it? It's not a matter of whether the actual thing is accurate, it's whether or not it gives them a reason to do whatever they like.

It reminds me of the rhetoric during the election about building a wall along the Mexican border. That can't happen. Who would build it? It'd be huge. Hard. Realistically? It'd be "easy". Humans have been building walls for a long, long time. It's not unthinkable. The difficult part is actually murdering people who try to cross the wall by gunning them down -- and they will try to cross. I mean, you probably don't have to kill too many people to send the message. Just enough of them. The Iron Curtain was a real thing, too, after all.

This is similar. The algorithm is the "easy" part. It's "only" some science. No, the hard part is dealing with the consequences. The hard part is closing the box of Pandora after you opened it.
