Some of the press around this seems a bit alarmist - I doubt you would see anywhere near this accuracy out in the real world.
By having their DNN train on faces that have been processed by VGG-Face, they greatly reduce the risk of overfitting or relying on things that would be present in dating site pictures but not in pictures of the same people in other contexts.
Instead of learning "that person looks like a gay person," it learns "that person looks like Tim, who is gay."
Still quite an accomplishment (maybe). But they pretty much already led the horse to water, yes?
So in the circumstance, why should we believe it's generalizable?
The human judges may have been less accurate, but they could likely explain each decision they made and the visual features they based their decision on.
According to Gallup polling, around 4% of the American population is homosexual. So let's be generous and say that their classifier has an 80% accuracy given balanced inputs while humans have 60%.
Let's sample 100 people from the dataset, 50 of whom are homosexual (and the rest heterosexual). If the classifier has 80% accuracy (let's assume false_positive_rate = false_negative_rate, since I didn't find information about that), it means that 80 people were correctly classified, 10 heterosexuals were misclassified as homosexual and 10 homosexuals as heterosexual.
According to Bayes' theorem: P(homosexual | classifier says homosexual) = P(homosexual) * P(classifier says homosexual | homosexual) / P(classifier says homosexual), i.e. the prior, times the likelihood, divided by the overall probability of a positive result.
Substituting, we get: P = 0.04 * 0.8 / 0.5 = 0.064
If instead of the classifier we use human "feeling", we get: P = 0.04 * 0.6 / 0.5 = 0.048
In summary, 4% of people are homosexual. If a human thinks someone is homosexual, the probability "increases" to 4.8%. If instead their algorithm says so, the probability increases to 6.4%.
Not very groundbreaking...
Assuming that's true, no wonder the computer did better. I'd bet it would do far worse dealing with a more representative sample.
Even though it's not a perfect study, it's still impressive that it does as well as it did.
Very good explanation (though the OP's is too).
From what I can gather he's basically saying that the "probability of a random face being classified as homosexual" is 0.5. This isn't REALLY true (you would have to run the classifier on all possible faces to find this), but that is in fact the "environment" the classifier has been trained within.
If there are 1000 people, 40 of them gay, and the algorithm is 80% accurate, it will say 960 * 20% = 192 straight people are gay and 40 * 80% = 32 gay people are gay. Not very impressive, and it shows a very strong gay bias, if I can say so.
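As a quick sanity check, here's that arithmetic in Python (a sketch using the numbers above, assuming the error rate is the same in both directions):

    # 1000 people, 4% gay, classifier 80% accurate on both classes.
    population = 1000
    gay = 40
    straight = population - gay
    accuracy = 0.8

    true_positives = gay * accuracy              # 32 gay people flagged as gay
    false_positives = straight * (1 - accuracy)  # 192 straight people flagged as gay

    flagged = true_positives + false_positives   # 224 people flagged in total
    print(true_positives / flagged)              # ~0.14: most flagged people are straight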
P(being depressed | trivial detector said so) = d * 1.0 / 1.0 = d
It does raise the question of how Bayes' rule applies if your sample is underrepresented, though... say their training set was only 2% gay. Could they achieve a true positive rate of 200%?
The denominator should be the probability of the classifier being positive: P(True Positive)+P(False Positive)
Since you assumed that P(False Positive) = P(False Negative) = (1-P(True Positive))/2 = (1-P(True Negative))/2
In the algorithm case the denominator is 0.8 * 0.04 + 0.1 * 0.96 = 0.128.
In the human case the denominator is 0.6 * 0.04 + 0.2 * 0.96 = 0.216.
In summary, if a human thinks someone is homosexual, the probability increases to 11.1%. If the algorithm makes that call the probability increases to 25%.
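A sketch of that corrected calculation, using the rates assumed above (sensitivity 0.8 with false positive rate 0.1 for the algorithm, 0.6 and 0.2 for the human):

    # P(gay | flagged) = sensitivity * prior / P(flagged), where
    # P(flagged) = sensitivity * prior + false_positive_rate * (1 - prior).
    def posterior(prior, sensitivity, false_positive_rate):
        p_flagged = sensitivity * prior + false_positive_rate * (1 - prior)
        return sensitivity * prior / p_flagged

    print(posterior(0.04, 0.8, 0.1))  # algorithm: 0.25
    print(posterior(0.04, 0.6, 0.2))  # human: ~0.111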
This assumption is usually wrong when trying to detect rare events (such as CC fraud or straight/gay people).
In my own experience, if you train on a prepared 50/50 dataset to some reasonable accuracy, then on actual data where the classes are more like 96/4 you get a LOT of false positives, making the whole thing not very useful.
Where in the article does it say that? I can see
> Studies from several nations, including the U.S., conducted at varying time periods, have produced a statistical range of 1.2 to 6.8 percent of the adult population identifying as LGBT.
but there's no indication of how likely any part of that range is.
Just read the whole article. The references cited are also worth reading if you are actually interested in learning about this topic.
Why are you using the accuracy as a conditional likelihood?
It seems to have infected a lot of the sites I visit, and is always liked or voted to the top.
I guess people just love feeling superior.
> The results show that the faces of gay men were more feminine and the faces of lesbians were more masculine than those of their respective heterosexual counterpart
It looks to me like you're gay if you're a man wearing glasses, and straight if you have a beard and a rounder face. It would be interesting to see what it made of the stereotypical "bear". If you're a brunette with a slightly thinner face, then you're probably (58%) a lesbian ¯\_(ツ)_/¯.
> lesbians tended to wear baseball caps
I'm really not sure what to make of this.
I wonder whether the "judges" had access to that same training set. If they were people off the street that brought in their own biases, I would suspect they would do far worse than if they were able to view the training set themselves and teach themselves what hetero and homosexual people look like. Many heterosexual people have very limited contact with homosexual people. But given a training set (or sufficient contact with homosexual people), they could learn too, and I suspect they could learn better than the machine learning algorithm.
That said, this research is impressive, as well as terrifying and depressing.
I know why you say this, but it's only terrifying because there are so many idiots around. All this study says is that there is a strong genetic/developmental component to sexuality - something we've known about for decades.
Telling people they can't be gay is like telling people they can't be tall, or black.
It doesn't say that. The effects could very well be explained by how people with different sexual orientations (sub)consciously present themselves to attract similar people.
If it can be done, it will be done. In this particular case it doesn't even cost much. You should have started to be concerned when machine learning evolved into a paradigm where any kind of real-world performance requires amounts of data and computing resources available only to megacorps and governments.
> Additionally, given that companies and governments are increasingly using computer vision algorithms to detect people’s intimate traits, our findings expose a threat to the privacy and safety of gay men and women.
He gives several examples, and explains how these types of statements are extremely misleading (and dangerous!), and not something new at all...
The problem isn't the presence of software producing likelihoods of someone being gay. The problem is people interpreting the results in a reductionist way and reducing people into labels. And another problem is you just don't like seeing what you don't like. There are companies out there profiting from the same stuff. They just don't talk about it in this way.
We must make our best effort to understand how this works, not bury our head in the sand as hostile actors use this for ill.
Even so, it is disheartening. It appears we're so fascinated by all the avenues tools like deep learning could open up that we fail to ask more basic questions about the implications of pursuing such research.
Mini disclaimer: I vaguely knew him from university and didn't get on with him. However, I don't think he was the type of person to put people at risk for personal glory.
Also: man, that preprint formatting is annoying to read.
That said, the concern here is the potential to do this at scale in a horrible dystopia: cameras in airports and on sidewalks automatically scan people for "undesirable" features, and they end up in reeducation camps or simply disappear.
Just wait until somebody publishes a "terrorist facial classifier" with "99% accuracy". That's when I'll be scared.
While I'm not a fan of censoring research, I think this is a case where the research community should refrain from publishing results that could help replicate such algorithms.
EDIT: Just to be clear, I'm not saying that this study makes sense or works. But there are areas where large parts of the population really hate homosexuality. Merely believing this could work can make them use such a system, especially in situations where they can afford false positives. So the next time you want to travel to a country, it could happen that they don't let you in because the algorithm says you're likely gay.
This is just the first step, things are about to get much worse.
What will be difficult is pushing the accuracy up enough to get actionable information from it. Assuming you're an evil repressive regime: even if your system had 90% accuracy, you'd be falsely accusing a huge number of people, and I doubt even a repressive homophobic regime would implement it. Getting that 90% accuracy to 99.9% or higher would take a huge effort; this study isn't anywhere close.
That said, the concerns are real. Automatically sorting hetero from homosexual people, Jews from Christians, or even black from white comes with a ton of moral issues.
It is still an impressive result, and one that might be badly misused, despite the numerous warnings in the paper.
The examples on the Wikipedia page are very good. I had to explain that to a friend who had tested positive on an HIV test and was waiting for confirmation over the weekend. It's not easy to talk math when such things happen. I find it very troubling that doctors don't even mention this to patients and present tests as 95% effective. (In fact she was fine.)
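For anyone curious, it's the same Bayes update as above; a sketch with purely illustrative numbers (the prevalence, sensitivity and false positive rate below are assumptions for the example, not figures for any real HIV test):

    # Base-rate effect for a "95% effective" test on a rare condition.
    def posterior(prevalence, sensitivity, false_positive_rate):
        p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
        return sensitivity * prevalence / p_positive

    print(posterior(0.001, 0.95, 0.05))  # ~0.02: a positive is still ~98% likely to be false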
America has a bunch of angry people with twitchy trigger-fingers just begging for targets.
It's fun, but I'd never dare to make such a dumb thing public.
+ Hi trolls, yes, I'm aware of Godwin's Law. I also didn't call anybody a Nazi.
The study seems intentionally divisive. I get that a Stanford BSchool student would take it on to attract attention, but disappointed that Stanford would get behind it.
As for the debate over whether this research is ethical, consider this. If someone actually uses this to discriminate against homosexuals, they must accept that the thing actually works. Which means that homosexuality is determined by biological features beyond anyone's control, which would contradict their own ideology.
And that is the most interesting part of this work, not whether this tool is very accurate or not. This pretty solidly proves that physical features correlate well with sexual orientation, which is strong evidence for the biological theory of sexual orientation. Which has always been one of the biggest arguments for gay rights, that it's not a choice and can't be changed.
On the usefulness of this test to actually classify gay people:
They claim 91% accuracy on a balanced dataset, i.e. where the ratio of gays to straights is 50:50. To get a ratio of correct:incorrect of 91:9 on such a dataset, their test must shift the odds that a person is gay by a factor of about 10.
Now in the general population, the ratio of gays to straights is about 16 to 984 (1.6%). So if their test gives someone a positive reading, that increases the odds to about 162 to 984, or 14%. So you can't use this test to accurately guess someone's sexual orientation, simply because gay people are rare enough that even a few percent of straight people misclassified will overwhelm the number of actual gay people.
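In odds form the whole update is a single multiplication; a sketch of the numbers above:

    # Posterior odds = prior odds * likelihood ratio.
    # 91% pairwise accuracy on a balanced set corresponds to a ratio of 91/9 ≈ 10.
    likelihood_ratio = 0.91 / 0.09   # ~10.1
    prior_odds = 16 / 984            # ~1.6% base rate
    posterior_odds = prior_odds * likelihood_ratio
    print(posterior_odds / (1 + posterior_odds))  # ~0.14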
But still that's a lot more accurate than human guessing or the base rate, and it's scientifically interesting that this is even possible. It's a proof of concept that higher accuracy may be possible with better methods and more data.
Another article claims this:
>when asked to pick out the ten faces it was most confident about, nine of the chosen were in fact gay. If the goal is to pick a small number of people who are very likely to be gay out of a large group, the system appears able to do so.
The test gives varying degrees of confidence, it gives much higher confidence to some people than others. There are some individuals that it can tell are definitely gay or straight. But for most it is more uncertain.
Also note that estimates for the percentage of gay people vary a lot, which could push the probability that a positive reading is correct as high as 42%. Also, some people believe sexuality is more of a spectrum than a binary straight/gay. If so, the straight people it misclassifies might lean further towards the gay/bisexual end of the spectrum than average, and the errors wouldn't seem so unreasonable.
Lastly all these "phrenology" references are silly. If you have methodological problem with this research I'd love to hear it. But I see people discarding the research simply because they don't like the conclusions. For this study and other facial correlations based research.
This isn't new at all, there's tons of scientific research about digit ratios and all kinds of correlations they have with different things (https://en.wikipedia.org/wiki/Digit_ratio). Why wouldn't we expect even better correlations from all facial features?
Some human programmers got together, thought they could identify sexual orientation from faces (a silly bar-room-level idea with no basis in reality), and trained a neural net to express this prejudice. This is the same thing as a seer claiming to see the future.
>Among men, the classification accuracy equaled AUC = .81 when provided with one image per person. This means that in 81% of randomly selected pairs—composed of one gay and one heterosexual man—gay men were correctly ranked as more likely to be gay.
Pretty different from predicting one's sexual orientation.
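For anyone unfamiliar with AUC, it is literally that pairwise ranking probability; a toy sketch with made-up scores:

    # AUC = P(a random positive example scores higher than a random negative one).
    gay_scores = [0.9, 0.7, 0.6, 0.4]      # made-up classifier outputs
    straight_scores = [0.8, 0.5, 0.3, 0.2]

    pairs = [(g, s) for g in gay_scores for s in straight_scores]
    print(sum(g > s for g, s in pairs) / len(pairs))  # 0.75 for these toy scores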
What would happen if you took a big set of Facebook profiles and trained a CNN (the same one, if you want) to classify picture -> f for each feature f in the profile? Surely, for some features, your model would be able to deliver decent precision.
Does this mean that you quickly found out which features can be predicted from pictures and how well your CNN performs on them? Or is it possible that you just trained models picture -> X where X is basically meaningless but significantly correlated with some feature, because of the effect portrayed in xkcd's "Significant" ("Scientists investigate!")?
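That xkcd effect is easy to reproduce; a toy sketch where every feature is pure noise, so any "significant" correlation is spurious by construction:

    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(0)
    target = rng.normal(size=500)           # the "label": pure noise
    features = rng.normal(size=(100, 500))  # 100 meaningless features

    # Test enough meaningless features and some pass p < 0.05 by chance alone.
    hits = sum(pearsonr(f, target)[1] < 0.05 for f in features)
    print(hits)  # expect ~5 spurious "significant" correlations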
This is why the model is validated on a separate testing group from the training group which created it. There are lots of ways to do this, and the more sophisticated continually iterate training and testing to improve the model.
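In its simplest form that's just a held-out split; a minimal sketch with scikit-learn (the dataset and model here are placeholders, not anything from the study):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, random_state=0)  # placeholder data
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print(model.score(X_test, y_test))  # accuracy measured on data the model never saw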
From the abstract: "Additionally, given that companies and governments are increasingly using computer vision algorithms to detect people’s intimate traits, our findings expose a threat to the privacy and safety of gay men and women."
Acknowledging that data analysis can create obstacles to freedom or serious social problems is a necessary step to preventing or addressing these issues, but public opinion is far from being there yet...
Articles like this give the impression that the authors are internet hustlers, not proper grad students.
A reason why we didn't read much about it is that most researchers know that you shouldn't publish about this. Otherwise you can quickly end up with blood on your hands.