
Do algorithms reveal sexual orientation or just expose our stereotypes? - ALee
https://medium.com/@blaisea/do-algorithms-reveal-sexual-orientation-or-just-expose-our-stereotypes-d998fafdf477
======
YeGoblynQueenne
Something I found truly shocking is that, in the paper's data, the number
of straight men is exactly the same as the number of gay men, and likewise for
women (for individuals with at least one picture, i.e. everyone; the numbers
for those with more than one picture differ).

The paper itself cites a 7% prevalence of gay men in the general population.
Yet they trained on a 50/50, uniform distribution. Why?

Well, because a problem with unbalanced classes, like gay/straight
individuals, where your target class (gay men/women) makes up less than a
tenth of your data, is a bitch to train a classifier for. But if you
artificially equalise the data, by just adding more examples of that class,
you can get a pretty good "accuracy" score (precision over recall, which they
used).

Except, of course, that score is completely useless as an estimate of the
classifier's true accuracy in the real world, against the actual distribution
of the data "in the wild". It's also completely useless as evidence for
whatever hare-brained theory you want to posit that involves, oh, say, the
distribution of feminine and masculine features in gay and straight
individuals' faces, which was the point the paper was making.
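To make the base-rate effect concrete, here's a minimal sketch with made-up operating-point numbers (90% sensitivity and 90% specificity are assumptions for illustration, not figures from the paper): the same classifier that looks great on a balanced 50/50 test set sees its precision collapse at the paper's own 7% population prevalence.

```python
def precision_at_prevalence(sensitivity, specificity, prevalence):
    """Precision = TP / (TP + FP) for a classifier with the given
    sensitivity and specificity, applied at a given class prevalence."""
    tp = sensitivity * prevalence            # true positive rate mass
    fp = (1 - specificity) * (1 - prevalence)  # false positive rate mass
    return tp / (tp + fp)

# Hypothetical classifier: 90% sensitivity, 90% specificity.
balanced = precision_at_prevalence(0.9, 0.9, 0.50)  # artificially balanced set
wild = precision_at_prevalence(0.9, 0.9, 0.07)      # paper's cited 7% base rate

print(f"precision on a 50/50 set:   {balanced:.2f}")  # 0.90
print(f"precision at 7% base rate:  {wild:.2f}")      # 0.40
```

In other words, under these assumed numbers, most "gay" predictions in the wild would be wrong, which is exactly why the balanced-set score can't stand in for real-world accuracy.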

This should be a cautionary tale: you can't just force a classifier to give
you the results you want and then claim those results prove your theory.
That's just bad machine learning. Like bad statistics, but with more
assumptions.

