Finding out is the easy part. I don't mean to trivialize it, because doing stats right is actually a very technical matter, but this is just ordinary statistics.
You build a sufficiently expressive statistical model and include the potentially biasing factors as features in the model. Then the model will correct the bias all by itself because correcting for bias maximizes accuracy.
In the example above, you find the bias by doing linear regression and including (written_evaluation x isY) as an interaction term. Least squares will handle the rest. If you're using something fancier than least squares (e.g. deep neural networks, SVMs with interesting kernels), you probably don't even need to include the interaction terms explicitly - the model will learn them for you.
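To make that concrete, here is a minimal sketch with simulated data (the bias mechanism and all of the coefficients are invented purely for illustration): an evaluation that systematically under-scores group Y, and an ordinary least-squares fit that absorbs the bias once isY and the interaction are included as regressors.

```python
# Simulated data, invented numbers: group Y's written evaluations are shifted
# down and compressed relative to true ability.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
is_y = rng.integers(0, 2, n)                      # group indicator
ability = rng.normal(0, 1, n)                     # unobserved true ability
outcome = 2.0 * ability + rng.normal(0, 0.5, n)   # the thing we want to predict

# Biased measurement of ability:
written_eval = ability - 0.8 * is_y - 0.3 * is_y * ability + rng.normal(0, 0.1, n)

# Full model: intercept, evaluation, isY, and the (written_evaluation x isY) interaction.
X = np.column_stack([np.ones(n), written_eval, is_y, written_eval * is_y])
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
print("with isY + interaction:", coef)   # the isY and interaction terms absorb the bias

# Naive model that ignores group: systematically mis-predicts group Y.
X_naive = np.column_stack([np.ones(n), written_eval])
coef_naive, *_ = np.linalg.lstsq(X_naive, outcome, rcond=None)
print("evaluation only:       ", coef_naive)
```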
This paper does the same thing - it discovers that standard predictors of college performance (grades, GPA) are biased in favor of blacks and men, against Asians and women, and the model itself fixes these biases: http://ftp.iza.org/dp8733.pdf
Statistics turns fixing racism into a math problem.
If the topic were anything less emotionally charged, you wouldn't even think twice about it. If I suggested including `isMobile`, `isDesktop` and `isTablet` as features in an ad-targeting algorithm to deal with the fact that users on mobile and desktop browse differently, you'd yawn.
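In code, the "boring" version is literally the same move (hypothetical ad data, invented numbers):

```python
# Hypothetical ad-targeting sketch: device-type indicators as features in a
# click model (desktop is the omitted baseline category).
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
device = rng.choice(["mobile", "desktop", "tablet"], n)
is_mobile = (device == "mobile").astype(float)
is_tablet = (device == "tablet").astype(float)
ad_relevance = rng.normal(0, 1, n)

# Simulated behaviour: mobile users click less for the same ad relevance.
clicks = 0.5 * ad_relevance - 0.4 * is_mobile + 0.1 * is_tablet + rng.normal(0, 0.2, n)

X = np.column_stack([np.ones(n), ad_relevance, is_mobile, is_tablet])
coef, *_ = np.linalg.lstsq(X, clicks, rcond=None)
print(coef)  # the device coefficients capture the behavioural difference
```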
...include the potentially biasing factors as features...
Who decides what the "potentially biasing factors" are? How is that decided without somehow relying on subjective societal values?
Factors that no one thought were biased in the past are considered biased today; factors that no one thinks are biased today may be considered biased in the future; and factors that you and I consider biased today may not be considered biased by people in other parts of the world. I don't know how one would go about finding those "potentially biasing factors" without relying on subjective societal values that are different everywhere and always evolving.
A potentially biasing factor is a factor that you think would be predictive if you included it in the model. If it's actually predictive, you win: your model becomes more accurate and you make more money.
It's true that as we learn more things we discover new predictive factors. That doesn't make them subjective. A lung cancer model that excludes smoking is not subjective, it's just wrong. And the way to fix the model is to add smoking as a feature and re-run your regression.
Again, would you make the same argument you just made if I said I had an accurate ad-targeting model?
OK, I see where the disconnect is. I think the best way to address it is with an example.
Many people today would object a priori to businesses using race as a factor to predict loan default risk, regardless of whether doing that makes the predictions more accurate or not. In many cases, using race as a factor WILL get you in trouble with the law (e.g., redlining is illegal in the US).
Please tell me, how would you predict what factors society will find objectionable in the future (like race today)?
My claim is very specific. If you tell an algorithm to predict loan default probabilities, and you give it inputs (race, other_factor), the algorithm will usually correct for the bias in other_factor.
I claimed a paperclip maximizer will maximize paperclips; I didn't claim a paperclip maximizer will actually determine that the descendants of its creators really wanted it to maximize sticky tape.
Now, if you want an algorithm not to use race as a factor, that's also a math problem. Just don't use race as an input and you've solved it. But if you refuse to use race and race is important, then you can't get an optimal outcome. The world simply won't allow you to have everything you want.
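Here is a minimal sketch of both halves of that claim, with simulated loan data (the bias mechanism, the numbers, and the choice of scikit-learn are all just for illustration): a credit score biased against one group, a default model fit with and without the group indicator, and the accuracy you give up by leaving the indicator out.

```python
# Simulated loan data, invented numbers: the credit score under-states the
# creditworthiness of group 1; default depends only on true creditworthiness.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n = 20_000
group = rng.integers(0, 2, n)                                   # 1 = group the score is biased against
creditworthiness = rng.normal(0, 1, n)                          # unobserved truth
score = creditworthiness - 0.7 * group + rng.normal(0, 0.2, n)  # biased measurement
p_default = 1.0 / (1.0 + np.exp(2.0 * creditworthiness))        # depends only on the truth
defaulted = rng.random(n) < p_default

X_full = np.column_stack([score, group])   # score + group indicator
X_blind = score.reshape(-1, 1)             # score only
idx_train, idx_test = train_test_split(np.arange(n), test_size=0.3, random_state=0)

for name, X in [("score + group", X_full), ("score only   ", X_blind)]:
    model = LogisticRegression().fit(X[idx_train], defaulted[idx_train])
    probs = model.predict_proba(X[idx_test])[:, 1]
    print(name, "log loss:", log_loss(defaulted[idx_test], probs))

# The first model's group coefficient offsets the 0.7 shift (it corrects for
# the bias in the score) and scores a lower log loss; the model that refuses
# to see the group indicator pays for it in accuracy.
```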
A fundamental flaw in modern left wing thought is that it rejects analytical philosophy. Analytical philosophy requires us to think about our tradeoffs carefully - e.g., how many unqualified employees is racial diversity worth? How many bad loans should we make in order to have racial equity?
These are uncomfortable questions - google how angry the phrase "lowering the bar" makes left wing types. If you have an answer to these questions you can simply encode it into the objective function of your ML system and get what you want.
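For instance, one (entirely hypothetical) way to encode an answer into the objective function of a loan-approval system - the penalty form, the names, and the weight are all just one option among many:

```python
# Hypothetical objective: expected defaults from the loans you approve, plus a
# penalty on the gap in approval rates between groups, at an exchange rate lam
# chosen by whoever is willing to answer the uncomfortable question.
import numpy as np

def objective(p_default, approve, group, lam):
    """p_default: predicted default probability per applicant
    approve:   0/1 approval decisions being evaluated
    group:     0/1 group membership
    lam:       how much approval-rate parity is worth, in expected defaults"""
    expected_defaults = np.sum(p_default * approve)
    rate_gap = abs(approve[group == 1].mean() - approve[group == 0].mean())
    return expected_defaults + lam * rate_gap

# lam = 0 approves purely on predicted risk; larger lam trades more expected
# defaults for more equal approval rates. The point is that lam is an explicit
# answer to "how many bad loans is equity worth?", not a hidden assumption.
```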
Modern left wing thought refuses to answer these questions and simply takes a religious belief that multiple different objective functions are simultaneously maximizable. But then machine learning systems come along, maximize one objective, and the others aren't maximized. In much the same way, faith healing doesn't work.
The solution here is to actually answer the uncomfortable questions and come up with a coherent ideology, not to double down on faith and declare reality to be "biased".
My claim was specific too: if a corpus is biased -- as defined by evolving societal values -- then the word embeddings learned from that corpus will necessarily be biased too -- according to those same societal values, regardless of whether you think those values are rational and coherent.
The point stands: build a sufficiently expressive model with the potentially biasing factors as features and it will correct the bias on its own. I give toy examples (designed to illustrate the point and also be easy to understand) here: https://www.chrisstucchio.com/blog/2016/alien_intelligences_...