ideonexus on Feb 9, 2018:



from https://medium.com/@yonatanzunger/asking-the-right-questions..., discussed at length last week:

AI models hold a mirror up to us; they don’t understand when we really don’t want honesty. They will only tell us polite fictions if we tell them how to lie to us ahead of time.

This kind of honesty can force you to be very explicit. A good recent example was in a technical paper about “word debiasing.” This was about a very popular ML model called word2vec which learned various relationships between the meanings of English words — for example, that “king is to man, as queen is to woman.” The authors of this paper found that it contained quite a few examples of social bias: for example, it would also say that “computer programmer is to man, as homemaker is to woman.” The paper is about a technique they came up with for eliminating that bias.
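
(For readers who haven't played with word2vec: the analogy arithmetic looks roughly like the sketch below. The gensim library and the pretrained GoogleNews vectors are my own assumptions, not something from the paper; any word2vec-format file will do, and the exact nearest neighbours depend on which vectors you load.)

    # Minimal sketch of word2vec analogy arithmetic using gensim.
    # Assumes a local copy of pretrained vectors in word2vec binary format;
    # the filename below is illustrative.
    from gensim.models import KeyedVectors

    vectors = KeyedVectors.load_word2vec_format(
        "GoogleNews-vectors-negative300.bin", binary=True
    )

    # "king is to man as ? is to woman": king - man + woman ~ queen
    print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))

    # The same arithmetic surfaces the bias the paper describes:
    # computer_programmer - man + woman lands near "homemaker"
    # (token spelling depends on the vocabulary of the vectors you load).
    print(vectors.most_similar(
        positive=["computer_programmer", "woman"], negative=["man"], topn=1
    ))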

What isn’t obvious to the casual reader of this paper — including many of the people who wrote news articles about it — is that there’s no automatic way to eliminate bias. Their procedure was quite reasonable: first, they analyzed the word2vec model to find pairs of words which were sharply split along the he/she axis. Next, they asked a bunch of humans to identify which of those pairs represented meaningful splits (e.g., “boy is to man as girl is to woman”) and which represented social biases. Finally, they applied a mathematical technique to subtract off the biases from the model as a whole, leaving behind an improved model.
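
(Concretely, the "subtract off" step is essentially a projection: estimate a he/she direction, then remove each flagged word's component along it. Below is a toy numpy sketch of that idea; the vectors and the list of rater-flagged words are invented for illustration, and the paper's actual method builds a gender subspace from several definitional pairs and adds an equalization step.)

    import numpy as np

    # Toy 4-dimensional vectors standing in for real word2vec embeddings
    # (values invented for illustration).
    vectors = {
        "he":                  np.array([ 0.9, 0.1, 0.3, 0.2]),
        "she":                 np.array([-0.9, 0.1, 0.3, 0.2]),
        "computer_programmer": np.array([ 0.4, 0.8, 0.1, 0.3]),
        "homemaker":           np.array([-0.5, 0.7, 0.2, 0.1]),
    }

    def normalize(v):
        return v / np.linalg.norm(v)

    # Step 1: estimate the he/she axis from a definitional pair.
    g = normalize(vectors["he"] - vectors["she"])

    # Step 2 is the human judgment call: decide which words should carry
    # no component along that axis. Hypothetical rater output:
    flagged = ["computer_programmer", "homemaker"]

    # Step 3: subtract each flagged word's projection onto g and renormalize.
    for w in flagged:
        v = vectors[w]
        debiased = normalize(v - np.dot(v, g) * g)
        print(w, "he/she component:",
              round(float(np.dot(normalize(v), g)), 3),
              "->", round(float(np.dot(debiased, g)), 3))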

This is all good work, but it’s important to recognize that the key step in this — of identifying which male/female splits should be removed — was a human decision, not an automatic process. It required people to literally articulate which splits they thought were natural and which ones weren’t. Moreover, there’s a reason the original model derived those splits; it came from analysis of millions of written texts from all over the world. The original word2vec model accurately captured people’s biases; the cleaned model accurately captured the raters’ preference about which of these biases should be removed.

The risk which this highlights is the “naturalistic fallacy,” what happens when we confuse what is with what ought to be. The original model is appropriate if we want to use it to study people’s perceptions and behavior; the modified model is appropriate if we want to use it to generate new behavior and communicate some intent to others. It would be wrong to say that the modified model more accurately reflects what the world is; it would be just as wrong to say that because the world is some way, it also ought to be that way. After all, the purpose of any model — AI or mental — is to make decisions. Decisions and actions are entirely about what we wish the world to be like; if they weren’t, we would never do anything at all.



