

Machine Learning and Human Bias: An Uneasy Pair - denzil_correa
http://techcrunch.com/2015/08/02/machine-learning-and-human-bias-an-uneasy-pair/

======
yummyfajitas
Let me rephrase the questions that the reporter raises, but in a manner that's
too direct for any journalist.

What if machine learning systems come up with results that agree with "evil"
stereotypes and "biases"? What if machine learning systems discover that
socially unpleasant stereotypes are actually accurate predictors of reality?

Modern moral philosophy has taken an easy copout in the past. It asserts,
without proof, that various _positive_ claims are false, and therefore will
yield bad decisions if used. We also believe it would be evil to use them to
make decisions. The question we need to address is what moral claims can we
make which are independent of positive claims?

~~~
21echoes
I think it's a much more subtle point than that, and he does make it rather
directly:

* If police departments are racially biased
* and the Heat List algorithm heavily factors in associations
* and most people associate with others from their own race
* then won't the Heat List disproportionately output people from a certain race?
* Won't this then result in increased policing & suspicion of these communities?

In other words, the point is: human biases can be a seed issue that machine
learning then positive-feedback-loops out of control
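
A rough sketch of that loop (synthetic numbers and a hypothetical patrol-allocation rule, not the Heat List's actual method): the true offense rate is identical in both areas, but the initial patrol bias gets amplified round after round because recorded arrests track patrol presence, not offenses.

```python
# Sketch only: two areas with identical true offense rates, but a biased
# initial patrol allocation. Recorded arrests scale with patrol presence,
# and next round's patrols chase the arrest counts, so the gap widens.
import numpy as np

true_offense_rate = np.array([0.05, 0.05])   # identical in both areas
patrols = np.array([0.6, 0.4])               # biased seed allocation

for step in range(10):
    # Arrests recorded are proportional to patrols, not to true offenses.
    arrests = true_offense_rate * patrols * 1000
    # Naive rule: shift patrols toward the area with more recorded arrests.
    patrols = patrols * (1 + 0.5 * (arrests - arrests.mean()) / arrests.sum())
    patrols = patrols / patrols.sum()
    print(step, patrols.round(3))
```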

~~~
yummyfajitas
_then won't the Heat List disproportionately output people from a certain
race?_

This will only happen if that certain race commits more crimes (in the
training data). If you take race out of a statistical predictor designed to
learn crime, but race is a good predictor of crime, then the predictor might
learn race at an intermediate step.

Now there are _statistical_ issues one might run into - e.g., early
overfitting of what is essentially a bandit algorithm, and unaccounted-for
feedback between training data and system outputs. But at least the way I'm
reading the article, it isn't calling for more and better math (which would be
the solution to the problems you describe).
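
For concreteness, a minimal sketch of that intermediate-step effect (entirely synthetic data, hypothetical feature names): drop the protected attribute from the features, keep a correlated proxy, and the model's scores still split by group.

```python
# Sketch only: the protected attribute is NOT a feature, but a correlated
# proxy (e.g. neighborhood) is, so the model recovers it anyway.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

group = rng.integers(0, 2, n)                            # protected attribute
proxy = group + rng.normal(0, 0.5, n)                    # correlated proxy feature
other = rng.normal(0, 1, n)                              # unrelated feature
y = (rng.random(n) < 0.05 + 0.10 * group).astype(int)    # biased label rates

X = np.column_stack([proxy, other])                      # group itself is excluded
clf = LogisticRegression().fit(X, y)

scores = clf.predict_proba(X)[:, 1]
print("mean score, group 1:", scores[group == 1].mean())
print("mean score, group 0:", scores[group == 0].mean())
```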

~~~
21echoes
I think you're missing the first point in both my summary and in the article:
It does not need to be the case that "a certain race commits more crimes". It
can, instead, just be the case that a certain race is _arrested_ for
committing more crimes, despite equal rates of the actual criminal behavior
across races.

For instance: it's a well-recognized fact [1] that blacks and whites use and
deal marijuana at the same rate, but blacks are arrested for it far more
often. So, if this data set and other similar ones are the seed for a machine
learning algorithm, then algorithms like the Heat List will output racially
biased results.

[1] [https://www.aclu.org/files/assets/aclu-thewaronmarijuana-rel...](https://www.aclu.org/files/assets/aclu-thewaronmarijuana-rel2.pdf)
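
To make the arrest-rate point concrete, here's a toy simulation (made-up numbers, not the ACLU's figures): both groups offend at the same rate, one group is arrested far more often, and a model trained on the arrest records reproduces that gap as "risk".

```python
# Sketch only: equal true offense rates, unequal arrest rates. Training
# on arrests (the only label we have) bakes the enforcement bias into
# the model's output.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 50_000
group = rng.integers(0, 2, n)

offense = rng.random(n) < 0.05                     # same rate for both groups
arrest_prob = np.where(group == 1, 0.30, 0.10)     # but 3x the arrest rate
arrested = (offense & (rng.random(n) < arrest_prob)).astype(int)

clf = LogisticRegression().fit(group.reshape(-1, 1).astype(float), arrested)
print("predicted risk, group 1:", clf.predict_proba([[1.0]])[0, 1])
print("predicted risk, group 0:", clf.predict_proba([[0.0]])[0, 1])
```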

~~~
yummyfajitas
I didn't miss that. As I said: _Now there are statistical issues one might run
into - e.g., early overfitting of what is essentially a bandit algorithm, and
unaccounted-for feedback between training data and system outputs._

There is nothing fundamental about machine learning that says seed data like
this will give biased outputs - many algorithms do have this problem (it's a
difficult one to deal with), but it's not fundamental.

I certainly didn't get the impression from the article that it was advocating
for algorithms which are less sensitive to these errors. Among other things,
that's far less of a conversation that "we have to talk about" and far more of
one that some stats geeks have to have. These are also far less of an
"ethical" problem (as the article asserts) and far more of a technical one.

But maybe I misread.

------
Totient
I'm reminded of the classic Charles Babbage quote:

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the
machine wrong figures, will the right answers come out?" ... I am not able
rightly to apprehend the kind of confusion of ideas that could provoke such a
question.

I'd argue for a pretty high level of transparency in the process - I would
like to see whatever classifiers are being used open-sourced, for example. And
I'd want to know where people are drawing the training data from.

But the nice thing is that the tech industry has a large population of people
very sympathetic to transparency, and a culture with a history of supporting
it. Quite frankly, I think the legal community has a lot more to learn from
the open-source community than the other way around.

------
xiler
Summarizing the complexities of human behavior with models feels like an
unpleasant echo from the past. Statistical pioneers Karl Pearson and Francis
Galton were strong proponents of social Darwinism, a.k.a. scientific racism
[1, 2].

The biggest problem with an observational approach to aggregate human behavior
is that it generally ignores internal structure and makes hasty judgements
based on mere appearance.

[1]
[https://en.wikipedia.org/wiki/Karl_Pearson#Politic](https://en.wikipedia.org/wiki/Karl_Pearson#Politic)

[2]
[https://en.wikipedia.org/wiki/Francis_Galton#Heredity_and_eu...](https://en.wikipedia.org/wiki/Francis_Galton#Heredity_and_eugenics)

~~~
DerKommissar
So was Ronald Fisher. He was a big proponent of eugenics.

------
musesum
While at Harvard Law School, Barack Obama argued that economic status is a
better indicator of crime than race (I can't find the cite, as there is more
written about Obama than by him at HLS).

If using an Artificial Neural Network, would this mean that race should be
downstream from economic status? Maybe race should be a hidden layer? What
should be the input nodes? Is there a way to automatically create input nodes?

Perhaps finding better inputs could lead to proactive measures? For example,
let's say the inputs are: adolescence, public schools, evenings, and free
time. That might lead to keeping schools open late for extracurricular
activities.
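
A minimal sketch of what that could look like (feature names and data are entirely made up): a small network whose input nodes are socioeconomic rather than demographic. Whether something like race gets re-derived in the hidden layer would depend on how strongly those inputs correlate with it in the real training data.

```python
# Sketch only: hypothetical socioeconomic input nodes feeding one small
# hidden layer. The hidden activations are the "intermediate
# representation" the thread is worried about.
import numpy as np
from sklearn.neural_network import MLPClassifier

features = ["household_income", "school_funding", "evening_free_time", "age"]

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, len(features)))   # stand-in for real training data
y = rng.integers(0, 2, 1000)                 # stand-in labels

clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=500, random_state=0)
clf.fit(X, y)
print(clf.predict_proba(X[:5])[:, 1])        # predicted "risk" scores
```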

~~~
Retra
Thinking about things that way comes across very much like rationalizing the
conclusions you expect, and building a machine that will just agree with you.

~~~
musesum
Yes, I suppose so, if I were the one choosing the inputs. That's why I'd
rather the machine find the inputs automatically. But, then again, I suppose
there is still the bias of the training set.

