
A.I. ‘Bias’ Doesn’t Mean What Journalists Say It Means - Jerry2
https://jacobitemag.com/2017/08/29/a-i-bias-doesnt-mean-what-journalists-want-you-to-think-it-means/
======
zachwooddoughty
>> ProPublica labelled the algorithm as biased based primarily on the fact that it (correctly) labelled blacks as more likely than whites to re-offend (without using race as part of the predictor), and that blacks and whites have different false positive rates.

>> In the conception of these authors, "bias" refers to an algorithm providing correct predictions that simply fail to reflect the reality the authors wish existed.

The gist of the article is that statistical bias is not the bias journalists are interested in. The article doesn't discuss how these two notions of bias relate to each other; it simply assumes that statistical bias is the only one that should matter. I think the article is missing a discussion of the gap between what your statistical model can actually measure as inputs and what can be acted upon from a policy perspective.
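
To make "different false positive rates" concrete: a false positive here is someone labelled high-risk who did not re-offend. A minimal sketch of the per-group computation (toy made-up labels, not the COMPAS data):

```python
import numpy as np

def false_positive_rate(y_true, y_pred):
    """Fraction of actual negatives (non-reoffenders) labelled positive."""
    negatives = (y_true == 0)
    return y_pred[negatives].mean()

# Toy labels, NOT the COMPAS data: 1 = (predicted to) re-offend.
y_true = np.array([0, 0, 1, 0, 1, 0, 0, 1, 0, 1])
y_pred = np.array([1, 0, 1, 0, 1, 0, 1, 1, 0, 0])
group  = np.array(["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"])

# Two groups can differ here even if overall accuracy is identical.
for g in np.unique(group):
    mask = (group == g)
    print(g, false_positive_rate(y_true[mask], y_pred[mask]))
```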

As a thought experiment, suppose the only two inputs that determine a person's recidivism rate are their past criminal history and whether they had lead poisoning as a child, but that of these two we can only measure past criminal history as an input to our algorithm. If race is strongly associated with childhood lead poisoning (as it is in real life [1]), then our algorithm might get higher classification accuracy by including race [2] as an input in its training data. This would have less statistical bias, but it would be biased against individuals of that race who are in truth not at higher risk of recidivism. (A toy simulation after the footnotes makes this concrete.)

[1] https://scholar.harvard.edu/files/alixwinter/files/sampson_winter_2016.pdf

[2] The actual COMPAS algorithm doesn't use race as an explicit input, but
that doesn't really change the issue.
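
Here's that toy simulation (every variable and effect size below is made up; this is a sketch of the setup above, not the COMPAS model):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000

# Made-up generative story: recidivism depends ONLY on criminal history
# and (unmeasured) childhood lead exposure; race is correlated with lead
# exposure but has no causal effect of its own.
race = rng.binomial(1, 0.5, n)
lead = rng.binomial(1, np.where(race == 1, 0.4, 0.1))
history = rng.poisson(1.5, n)
p_recid = 1 / (1 + np.exp(-(-2.0 + 0.8 * history + 1.5 * lead)))
recid = rng.binomial(1, p_recid)

X_a = history.reshape(-1, 1)                  # measurable input only
X_b = np.column_stack([history, race])        # plus race as a proxy
model_a = LogisticRegression().fit(X_a, recid)
model_b = LogisticRegression().fit(X_b, recid)
print("accuracy without race:", model_a.score(X_a, recid))
print("accuracy with race:   ", model_b.score(X_b, recid))

# The cost: people of race 1 who never had lead poisoning get a higher
# predicted risk from model_b purely because of their group label.
mask = (race == 1) & (lead == 0)
print("mean risk, unexposed race-1, without race:",
      model_a.predict_proba(X_a[mask])[:, 1].mean())
print("mean risk, unexposed race-1, with race:   ",
      model_b.predict_proba(X_b[mask])[:, 1].mean())
```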

~~~
zo7
I feel the author failed to address the crux of why people argue that "non-
statistical" bias is bad – that we should be judged by our actions, and not by
factors out of our control such as race, class, the family we were born into,
or where we were born.

If we included every aspect about a person in some statistical model, we may
discover "uncomfortable truths" that hold true for the general population. But
these truths, while statistically correct, may fail our test for what we
consider to be philosophically fair, and ultimately undermine an individual's
agency to act independently.

So perhaps in your experiment, the problem is that our feature selection does not reflect the values we'd like to uphold: an aspect like "had lead poisoning as a child" is not a sound feature to include in our model because it measures something outside a person's control. Instead, maybe our feature set should only include facets that are under the individual's control, such as community service, whether they still associate with other criminals, whether they have or are pursuing an education, whether they have children to care for, etc. (or some other feature set that's more thought out and sound, but you get the gist)

This still may not have as good accuracy as a model that includes other features about the person, but it's arguable that this system would be more fair, especially compared with a model that uses more features but is artificially fudged to satisfy some prior about what we consider fair/unbiased.
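
A sketch of what that restriction might look like in code (the column names and data are hypothetical, just to show the explicit whitelist and the accuracy tradeoff):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 5_000

# Hypothetical data; the point is the explicit whitelist of features
# that measure choices rather than circumstances.
df = pd.DataFrame({
    "community_service_hours": rng.poisson(5, n),
    "associates_with_offenders": rng.binomial(1, 0.3, n),
    "pursuing_education": rng.binomial(1, 0.4, n),
    "zip_code_income": rng.normal(50_000, 15_000, n),  # circumstance proxy
})
logit = (0.5 * df["associates_with_offenders"]
         - 0.4 * df["pursuing_education"]
         - 0.00002 * df["zip_code_income"])
df["reoffended"] = rng.binomial(1, 1 / (1 + np.exp(-logit.to_numpy())))

CONTROLLABLE = ["community_service_hours", "associates_with_offenders",
                "pursuing_education"]

def accuracy(features):
    X_tr, X_te, y_tr, y_te = train_test_split(
        df[features], df["reoffended"], random_state=0)
    return LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)

# Expect the restricted model to give up some accuracy in exchange for
# only conditioning on things the individual can actually change.
print("all features:         ", accuracy(CONTROLLABLE + ["zip_code_income"]))
print("controllable features:", accuracy(CONTROLLABLE))
```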

~~~
bobcostas55
> I feel the author failed to address the crux of why people argue that "non-statistical" bias is bad – that we should be judged by our actions, and not by factors out of our control such as race, class, the family we were born into, or where we were born.

This is exactly what the author is talking about. You are comparing the
predictions against your fantasy of a world where these aspects do not matter
because they're not "fair". When they don't match up, you call the predictions
biased. But these factors outside our control do matter, accounting for them
does not introduce bias, and averting our eyes will not change that fact.

~~~
zo7
I'm not claiming that we should ignore these aspects and lie to ourselves about reality; in fact, I'm acknowledging that these relationships exist. My greater point is that the author uses the meaning of statistical bias to dismiss what journalists/laypersons consider bias, without addressing why the latter are concerned with bias in the first place.

My suggestion is that we should be using a better feature set that only looks
at aspects that we can reasonably hold an individual responsible for rather
than using demographic information which is out of an individual's control. If
we have two convicted criminals with similar crimes, behaviors, and histories,
but one is white and grew up in a wealthy neighborhood while the other is
black and grew up in a poorer town, why should the former be granted a higher
probability of parole than the latter? Why should either of them be held
responsible for the actions of others? Even if in expectation people from the
latter demographic were more likely to reoffend than the former, that is not
justice – it undermines liberty.

------
skybrian
Remember that these algorithms are attempting to predict the future based on
past results. What the author calls a "true fact about reality" is a fact
about the present or past. We then use this understanding of the past to try
to predict the future. But as they say in finance, past performance is no
guarantee of future results.

The whole idea behind "debiasing" is to avoid writing people off based on the history of their group. Yes, taking chances on people _who haven't done anything yet_ can cost money, and a profit-maximizing algorithm will sometimes automatically avoid the risk by reproducing biases.

And that's why the algorithm designers need to watch over the algorithm and
make sure it's not writing people off based on inferring the group they're in.
As a society we've decided that giving people a chance to defy the historical
odds of their group is worth the cost. Maintaining the status quo may be
profit-maximizing but profit isn't actually the only goal. (Any more than
paperclip-maximizing is.)

On the other hand these inferences can often be used in better ways. For
example, colleges can identify "at risk" students and make sure they get extra
help.

~~~
carlmr
> On the other hand these inferences can often be used in better ways. For example, colleges can identify "at risk" students and make sure they get extra help.

I like that last sentiment. We shouldn't lie to ourselves about the truth, but
we should use it to help make the situation better for society as a whole, not
to discriminate against certain groups.

------
shubb
This brushes over the very real prospect of automated discrimination that we
increasingly face as AI techniques replace simple statistical methods.

As a society, we would want to avoid a situation where someone is turned down
for a job for being female or black. There are various laws in place to try
and prevent this.

In the olden days, a way of pricing insurance policies would be to pick a
bunch of features, identify how each one contributes to risk, and use that for
pricing.

Some characteristics are protected and you can't use them for pricing. If you
are using them in your CV sifting algorithm you are heading for a lawsuit.
It's fairly easy to spot if ethnicity is in your feature set.

Sometimes, you'd have a feature that was a proxy for a protected characteristic. Supposing that women are safer drivers, a medical history of using contraceptive pills might be predictive of safe driving, because it is in fact a proxy for gender. So statisticians would examine the dominant factors in their pricing model and show that they were not proxies.
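
A minimal sketch of one such proxy check (a hypothetical audit, not any regulator's actual procedure): score each candidate feature by how well it alone predicts the protected attribute, and flag strong predictors for human review.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def proxy_scores(X, protected, feature_names):
    """AUC of predicting the protected attribute from each feature alone.
    ~0.5 means uninformative; near 1.0 means the feature is a proxy."""
    scores = {}
    for j, name in enumerate(feature_names):
        xj = X[:, j].reshape(-1, 1)
        model = LogisticRegression(max_iter=1000).fit(xj, protected)
        scores[name] = roc_auc_score(protected, model.predict_proba(xj)[:, 1])
    return scores

# Made-up example: pill prescriptions track gender almost perfectly,
# annual mileage doesn't.
rng = np.random.default_rng(2)
gender = rng.binomial(1, 0.5, 1000)
pill = np.clip(gender + rng.binomial(1, 0.05, 1000)
               - rng.binomial(1, 0.05, 1000), 0, 1)
mileage = rng.normal(10_000, 3_000, 1000)
X = np.column_stack([pill, mileage])

print(proxy_scores(X, gender, ["pill_prescription", "annual_mileage"]))
```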

Now we could conceivably throw a bunch of data at a neural-network-type thing and get an opaque pricing model. It might find proxies even if you don't include the protected data items.

Maybe, if (according to the article) black people with certain characteristics
are more likely to default on loans, it could be using common black names or
media preferences as a proxy for ethnicity. An applicant who had few negative
characteristics might get a high quote simply because the system figured out
he was black. And no one would really know.

It would be very hard to look at a complex bunch of weights in layers and figure out not only that liking certain TV shows affected your quote, but also that certain specific TV shows had a high impact and were correlated with race or gender or whatever. You'd just see a bunch of weights trained on clickstream data or something.
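
One partial answer, sketched below with made-up data: even when the weights themselves are uninterpretable, you can still audit the model's outputs, comparing predictions across protected groups the model never saw as an input.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(3)
n = 5_000

# Made-up data: the protected attribute is never fed to the model, but
# one input ("tv_show_likes") leaks it, as in the scenario above, and
# the historical quotes being learned from were themselves biased.
protected = rng.binomial(1, 0.5, n)
tv_show_likes = protected + rng.normal(0, 0.3, n)     # proxy feature
clickstream = rng.normal(0, 1, n)                     # unrelated noise
quote = 500 + 200 * protected + rng.normal(0, 50, n)

X = np.column_stack([tv_show_likes, clickstream])
model = GradientBoostingRegressor().fit(X, quote)     # opaque model

# Output audit: the disparity shows up in predictions even though no
# weight in the model is labelled "race" or "gender".
pred = model.predict(X)
print("mean predicted quote, group 0:", pred[protected == 0].mean())
print("mean predicted quote, group 1:", pred[protected == 1].mean())
```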

That's worth thinking about as a society, I think.

~~~
carlmr
You mean if we model an AI after our biological brains it will be prone to
stereotyping? So our artificial brains do in fact do what we set out to do.
They act a bit too human.

~~~
shubb
No, that isn't what I meant.

Older statistical methods were not complex and could be analyzed by a human to
ensure they complied with legal and moral obligations.

Newer approaches, such as but not limited to neural networks, create a model that is so complex that it is opaque. You cannot necessarily verify that it is complying with legal and moral obligations.

This is fine when you are doing song recommendations. It is more of a problem
when you are making decisions that could be seen as discriminatory.

There are probably best practices that can mitigate some of these problems,
and we can use our human understanding of the features to try and reduce risk.

Indeed, this is the same problem as "How do we build a safety-critical system with AI and deterministically show it's safe, like we could with a simpler one?" Here is a writeup of some of those issues [1].

But not everyone will use those practices, because sometimes discrimination is profitable if you can get away with it. If a car insurance company can hide gender discrimination behind a complex model, then they are incentivized to do that.

So, we possibly have a social problem.

The problem is not that newer techniques are 'like a brain and susceptible to bias in the same way', because to some extent that isn't true – they work on the real data available, unless you feed them movie scripts.

The issue is if they intentionally or accidentally become a way of getting
around laws society made for good reason.

[1] https://intelligence.org/2013/08/25/transparency-in-safety-critical-systems/

------
launchtomorrow
The thing that I worry about more is the media’s bias toward fairness. Nobody
uses the word lie anymore. Suddenly, everything is 'a difference of opinion.'
If the entire House Republican caucus were to walk onto the floor one day and
say “The Earth is flat,” the headline on the New York Times the next day would
read 'Democrats and Republicans Can’t Agree on Shape of Earth.' I don’t
believe the truth always lies in the middle. I don’t believe there are two
sides to every argument. I think the facts are the center. And watching the
news abandon the facts in favor of “fairness” is what’s troubling to me.

-- Aaron Sorkin

