
How Vector Space Mathematics Reveals the Hidden Sexism in Language - stared
https://www.technologyreview.com/s/602025/how-vector-space-mathematics-reveals-the-hidden-sexism-in-language/
======
stared
And my critical remarks:

While "the word embeddings are sexist" sounds cool (as a topic, not as a
phenomenon), it's at best a misunderstanding of word2vec's scope. For any
topic that is correlated with gender (e.g. the gender ratios for doctors and
nurses are different), a non-zero projection onto the "man"-"woman" axis is
expected, and desired.
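
To make "projection onto the man-woman axis" concrete, here is a minimal sketch. The 3-d vectors are made-up toy numbers, not real word2vec output, and `gender_projection` is a hypothetical helper name; the only point is that a word whose usage correlates with gender picks up a non-zero component along that axis.

```python
import numpy as np

# Hypothetical toy embeddings, invented for illustration only.
# Real word2vec vectors have hundreds of dimensions and come from a corpus.
toy_vectors = {
    "man": np.array([1.0, 0.0, 0.2]),
    "woman": np.array([-1.0, 0.0, 0.2]),
    "doctor": np.array([0.3, 0.9, 0.1]),
    "nurse": np.array([-0.4, 0.8, 0.1]),
}


def gender_projection(word, vectors=toy_vectors):
    """Scalar projection of a word vector onto the unit "man" - "woman" axis."""
    axis = vectors["man"] - vectors["woman"]
    axis = axis / np.linalg.norm(axis)  # normalise so the result is a length
    return float(np.dot(vectors[word], axis))


for word in ("doctor", "nurse"):
    print(word, gender_projection(word))
```

With these toy numbers the two professions land on opposite sides of zero, which is all a corpus-level gender correlation would have to produce.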

What is problematic is the over-interpretation of word2vec (and of any other
"meaning by counting" technique). Some of its uses can be X-ist, e.g. the
infamous "black people stole my car" in Google search suggestions. But while
people can be sexist (knowingly or not, with premeditation or accidentally),
"reality" cannot be "biased".

And well, "Sexism can be thought of as a kind of warping of this vector space."
is a bad metaphor. In linear spaces it's a linear transformation (shearing),
whereas "warping" suggests a non-linear transformation.
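
The shear-versus-warp distinction can be checked numerically. This is only a sketch: the 2-d shear matrix and the quadratic `warp` function are arbitrary examples chosen for illustration, not anything taken from the paper.

```python
import numpy as np

# Arbitrary 2-d shear matrix (an assumption for illustration).
SHEAR = np.array([[1.0, 0.5],
                  [0.0, 1.0]])


def shear(v):
    """Linear map: slides the first coordinate in proportion to the second."""
    return SHEAR @ v


def warp(v):
    """Non-linear map: the offset grows with the square of the second coordinate."""
    return v + np.array([0.5 * v[1] ** 2, 0.0])


def is_linear_on(f, x, y, a, b):
    """Check f(a*x + b*y) == a*f(x) + b*f(y) for one choice of inputs."""
    return bool(np.allclose(f(a * x + b * y), a * f(x) + b * f(y)))


x = np.array([1.0, 2.0])
y = np.array([-3.0, 0.5])

shear_is_linear = is_linear_on(shear, x, y, 2.0, -1.5)
warp_is_linear = is_linear_on(warp, x, y, 2.0, -1.5)
print(shear_is_linear, warp_is_linear)
```

A single passing check does not prove linearity in general, but a single failing check is enough to show that the warp is not linear, while the shear passes because it is just a matrix product.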

...and I read this paper before it was on MIT Technology Review.

~~~
brudgers
I agree that the data is the data. I take the author's concern to be with the
practice of treating the data without considering that it may reflect
underlying biases in the materials it encodes and thereby claiming objectivity
or neutrality in the products that apply the data when making decisions.

For example, if an automated news generation application uses Word2Vec,
statistically it will reflect the same word associations as are found in the
source data. Which is to say that _if_ the source articles contain a
significant number of word associations an individual might deem sexist, then
the automated news generation will likely also contain similar word
associations the same individual is likely to deem sexist.

One way of looking at it is that at best, Word2Vec perpetuates the degree of
sexism in word associations that existed at the time the data set was
collected and applications built upon Word2Vec will have no more or less
sexism in their word associations than existed at the time of Word2Vec's
creation.

If one believes that reducing the output of word associations reflecting
sexism is a good thing, then relying on Word2Vec will not achieve that good
thing.

This is the case regardless of how one defines or recognizes "sexism."

~~~
stared
I agree. And that's why I liked the original paper, but was appalled by the
MIT Technology Review layman's presentation.

