
Unexpected sources of bias in artificial intelligence - Parbeyjr
https://techcrunch.com/2016/12/10/5-unexpected-sources-of-bias-in-artificial-intelligence/
======
interpol_p
Here is a paper titled _Automated Inference on Criminality using Face Images_
[0]. The system is trained on existing conviction records, inheriting all the
prejudices and biases of our current justice system.

[0] [https://arxiv.org/abs/1611.04135](https://arxiv.org/abs/1611.04135)

~~~
tokai
Phrenology is alive and well I see.

~~~
mattkrause
And raising funds! [http://www.faception.com/](http://www.faception.com/)

~~~
notahacker
It's scary that despite parts of their website appearing to be none-too-subtle
satire, all the media evidence suggests they're dead serious about this...

~~~
mattkrause
Crunchbase says that they recently had a Series A round:
[https://www.crunchbase.com/organization/foundersx-ventures#/entity](https://www.crunchbase.com/organization/foundersx-ventures#/entity)
which would make for a _really_ elaborate prank.

The page describing (defending?) the "theory behind our technology" is
amazing. In brief,

1\. Personality is somewhat heritable.

2\. Face shape is also determined by genes.

3\. ???

4\. Profit.

There is even an explicit _positive_ call-out to phrenology. The mind
boggles...

------
antoinevg
In the days when Sussman was a novice, Minsky once came to him as he sat
hacking at the PDP-6.

"What are you doing?", asked Minsky.

"I am training a randomly wired neural net to play Tic-tac-toe", Sussman
replied.

"Why is the net wired randomly?", asked Minsky.

"I do not want it to have any preconceptions of how to play", Sussman said.

Minsky then shut his eyes.

"Why do you close your eyes?" Sussman asked his teacher.

"So that the room will be empty."

At that moment, Sussman was enlightened.

~~~
justinpombrio
At the risk of spoiling the koan, could someone explain this in more detail?

What is Sussman suggesting that Minsky do instead: wire the neural net in a
fixed way instead of randomly? What does that have to do with the difference
between viewing the room with your eyes open vs. closed?

~~~
JMStewy
The koan is commonly accompanied by the following:

"What I actually said was, 'If you wire it randomly, it will still have
preconceptions of how to play. But you just won't know what those
preconceptions are.'" -- Marvin Minsky
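Minsky's clarification is easy to see concretely: two nets "wired randomly" with different seeds already rank moves differently on the same position, before any training happens. The weights _are_ preconceptions; randomizing just makes them unknown. A toy sketch in NumPy (the board encoding and architecture are invented for illustration):

```python
import numpy as np

def random_net(seed, hidden=16):
    """A tiny one-hidden-layer net with random, untrained weights."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(size=(9, hidden))  # 9 inputs: one per board cell
    w2 = rng.normal(size=(hidden, 9))  # 9 outputs: a score per move
    return lambda board: np.tanh(board @ w1) @ w2

board = np.zeros(9)
board[4] = 1.0  # opponent took the center

net_a, net_b = random_net(seed=0), random_net(seed=1)

# Same position, two "unbiased" random wirings -- yet they already score
# the moves differently. The preconceptions exist; they're just unknown.
print(net_a(board))
print(net_b(board))
```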

------
gosubpl
Obligatory: go to
[http://www.catb.org/jargon/html/](http://www.catb.org/jargon/html/) and
search for "Sussman attains enlightenment"

~~~
Chris2048
Is the take-away that bias is not in the initial conditions of the net, but in
the implementation of the neurons and their structure?

closing my eyes makes my eyes insensitive to the state of the room, it does
not make the room empty;

randomizing my network makes it insensitive to the biases of the AI, it does
not mean there are no biases.

I'm finding it hard to figure this one out...

~~~
AnimalMuppet
Bias _is_ in the initial conditions of the net. It's just not deliberately
biased by the author. But it still has bias. "Closing your eyes" to the
initial content/state of the net does not make the initial state go away.

~~~
GFischer
Thank you both for the explanations, I saw the top comment (the Koan) and I
didn't know what to make of it :)

------
dsjoerg
You can gain insight into human problems by noting that these sources of bias
all pertain to natural intelligence as well.

~~~
tomquin
This. It should not come as a great surprise that AI is inherently biased, as
are the people who create it and the people who use it.

Is it possible to not be affected by so many nested layers of bias?

~~~
Chris2048
The bias of the people needn't bleed into the AI, since the whole point is that
the AI _isn't_ constructed by experts, but inferred from collected data.

~~~
prolways
The worry is that the training data has been affected by human bias, so that
bias will bleed over into the AI.
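The bleed-through is mechanical, not hypothetical. In the toy simulation below (all data invented for illustration), the historical labels penalize one group; an ordinary least-squares fit on those labels learns a nonzero weight for group membership even though the group is irrelevant to the underlying trait:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

skill = rng.normal(size=n)          # the trait we actually care about
group = rng.integers(0, 2, size=n)  # irrelevant group membership

# Historical labels: driven by skill, but with a built-in penalty for
# group 1 -- standing in for biased human decisions in the training data.
label = (skill - 0.8 * group + rng.normal(scale=0.1, size=n) > 0).astype(float)

# Fit a plain linear model to the biased records.
X = np.column_stack([skill, group, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, label, rcond=None)

# coef[1] comes out clearly negative: the human bias baked into the labels
# is now model behavior, with no malicious intent anywhere in the code.
print(coef[:2])
```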

------
vanderZwan
> _We tend to think of machines, in particular smart machines, as somehow
> cold, calculating and unbiased. We believe that self-driving cars will have
> no preference during life or death decisions between the driver and a random
> pedestrian. We trust that smart systems performing credit assessments will
> ignore everything except the genuinely impactful metrics, such as income and
> FICO scores. And we understand that learning systems will always converge on
> ground truth because unbiased algorithms drive them._

I absolutely hate this myth, and I suspect it's just an excuse used by
technologists (consciously or not) to wash their hands clean of taking social
responsibility.

There is no such thing as innocent technology, because it is made by and for
humans, and with a purpose. Which means that before we even built anything we
have already left the domain of "is" and entered "ought". Technology "encodes"
values by being an answer to how we attempt to implement some intended
purpose, and thus becomes entwined with the questions of responsibility that
we all struggle with.

Here's a great anthology of writings on that topic: _Inside the Politics of
Technology_ [0], mostly written from a post-phenomenological[1] perspective.

Excerpt from the introduction:

> _It was eventually proven that the pilot acted as capably as possible, so he
> was not to blame. But that only partly settled the question. After this
> first clarification, however important it was to the pilot, the accident
> could still be ascribed either to human error, e.g., false instructions from
> the control tower, poor maintenance, or lack of knowledge about the
> migratory patterns of birds, or, alternatively, to technical
> deficiencies, e.g., fuel problems or engine failure._

> _In the end, a year after the crash, an official research report established
> the “real” cause of the crash: the snapping-off of a 10-cm metal pin that
> regulated the position of a fin in one of the 13 cogs in the rotor of the
> F-16. This set off a chain reaction demolishing the cogs one by one, ending
> up in a complete breakdown of the engine._

> _But even then the problem of humanity versus technology was not solved. Who
> could be blamed for this technical defect: service engineers, the Air Force,
> or the manufacturer? The latter was ultimately left holding the bag. But
> still then, was it a production fault or a designer’s error? Again, different
> actors and different technicalities are involved._

> _Apparently, a definitive dividing line between technical and human causes
> cannot be drawn. However technical the cause of the crash appeared to be,
> human beings always come along with the technicalities – and vice versa.
> Purely technical causes are just as illusory as purely human faults._

The idea of machines being unbiased and innocent is not just naive, but
outright damaging.

[0] Free PDF:
[https://oapen.org/download?type=document&docid=340207](https://oapen.org/download?type=document&docid=340207)

[1] A friend of mine explained post-phenomenology as follows: _"My
understanding is that for Verbeek and others like him in general,
postphenomenology continues the core traditions of phenomenology but with a
deliberate emphasis on social science methods and empiricism over
philosophical arguments - a positivist Heideggerian phenomenology"_

~~~
andrewboooo
Are there any other writings you could point me to? I've never heard of post-
phenomenology, but I quite like the perspective. The concept of AI certainly
isn't an unfamiliar one, so I imagine there'd be some interesting takes on
that -- hopefully from a realistic, non-AGI perspective.

~~~
vanderZwan
Hmm.. Well, if I'm completely honest I can't say I dove too deep into the
material; this just happened to be part of the theory/philosophy that I had to
study during my design bachelor/master that really resonated with me. But
_"What Things Do: Philosophical Reflections on Technology, Agency, and
Design"_ by Peter-Paul Verbeek is a touchpoint when it comes to
post-phenomenology, AFAIK.

[http://www.psupress.org/books/titles/0-271-02539-5.html](http://www.psupress.org/books/titles/0-271-02539-5.html)

------
UhUhUhUh
The funny thing about bias is that it was invented, technically, as a way
of... improving the accuracy of data in audio recordings. By adding something
(current or frequency) to the signal you could increase the quality of the
output. Without chasing the metaphor too much, it's also interesting to
consider that, maybe, adding something, yes a bias, to the data would also
increase their quality. Why did authors (e.g. Asimov) come up with fundamental
robotic laws?

~~~
TeMPOraL
> _Why did authors (e.g. Asimov) come up with fundamental robotic laws?_

To criticize them by highlighting their failings in the stories they wrote, in
order to show us friendly AI is not an easy problem?

------
yummyfajitas
I think we need to stop using the words "bias" and "prejudice". The term
"bias" has a real, well-defined meaning in statistics, and both of these terms
imply _incorrect_ decisions are being systematically made.
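For reference, the statistical sense of "bias": an estimator is biased when its expected value differs from the true parameter. The textbook example is the divide-by-n variance estimator, sketched here by averaging each estimator over many simulated samples:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
# 200k independent samples of size 5 from a distribution with variance 1.
samples = rng.normal(size=(200_000, n))

# Approximate each estimator's expected value by averaging over samples.
biased   = samples.var(axis=1, ddof=0).mean()  # divide by n
unbiased = samples.var(axis=1, ddof=1).mean()  # divide by n - 1

# The ddof=0 estimator's expected value is (n-1)/n = 0.8, not 1.0; that
# systematic shortfall is "bias" in the statistical sense.
print(biased, unbiased)
```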

Only one example in this article is actually about incorrect decisions. Most
examples are about unbiased learning systems doing exactly what they are
designed to do, but people other than the designers wishing they did different
things.

For example, consider "conflicting goals bias". This isn't a bias - the system
is maximizing clicks exactly as desired. It's just that some random third
party wishes the system were actually trying to mitigate a nonexistent
psychological problem (namely stereotype threat) instead [1].

What they call "similarity bias" is the same thing. The system is attempting
to show people stories they like (and probably does a good job at it), but the
author wishes instead that the selection of stories was closer to what a
journalist might choose.

Another social "bias" - namely "redlining"/redundant encoding/etc. - is
actually the _elimination_ of statistical bias. A lot of input metrics - e.g.
SAT score, FICO score, etc. - are biased in favor of non-Asian minorities (and
against Asians) [2] and machine learning algorithms designed to find hidden
patterns discover this bias and fix it.
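Whatever one makes of that specific claim, the mechanism behind "redundant encoding" itself is easy to demonstrate: dropping a sensitive column does little when the remaining features are correlated with it, because they reconstruct it. A toy sketch (synthetic data, invented correlations):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

protected = rng.integers(0, 2, size=n)
# Two innocuous-looking features that happen to correlate with the
# "deleted" attribute (think zip code, shopping history).
proxy1 = protected + rng.normal(scale=0.5, size=n)
proxy2 = protected + rng.normal(scale=0.5, size=n)

# Regress the dropped column on the remaining features: the proxies
# recover it with high accuracy, so any model trained on them can still
# effectively condition on it.
X = np.column_stack([proxy1, proxy2, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, protected, rcond=None)
accuracy = ((X @ coef > 0.5) == protected).mean()
print(accuracy)
```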

Conflating "someone is doing X but I wish they did Y" with bias is not useful.
It's also not useful to conflate the elimination of socially desirable biases
with introducing new biases.

[1] Stereotype threat has repeatedly failed to replicate. Funnel plots suggest
it only existed to begin with due to publication bias.
[http://www.sciencedirect.com.sci-hub.cc/science/article/pii/S0022440514000831](http://www.sciencedirect.com.sci-hub.cc/science/article/pii/S0022440514000831)
[https://dl.dropboxusercontent.com/u/85192141/2013-ganley.pdf](https://dl.dropboxusercontent.com/u/85192141/2013-ganley.pdf)
[https://en.wikipedia.org/wiki/Stereotype_threat#Failures_to_replicate_and_publication_bias](https://en.wikipedia.org/wiki/Stereotype_threat#Failures_to_replicate_and_publication_bias)
[https://replicationindex.wordpress.com/tag/stereotype-threat-and-womens-math-performance/](https://replicationindex.wordpress.com/tag/stereotype-threat-and-womens-math-performance/)

[2] See for example Figure 7 here:
[https://drive.google.com/file/d/0B-wQVEjH9yuhanpyQjUwQS1JOTQ/view](https://drive.google.com/file/d/0B-wQVEjH9yuhanpyQjUwQS1JOTQ/view)

~~~
pessimizer
> The term "bias" has a real, well-defined meaning in statistics, and both of
> these terms imply incorrect decisions are being systematically made.

Bias is not a word that was invented by statisticians[1], it's a word that
statisticians adopted. It's just as legitimate to say that when a machine
learning technique is trained with biased input that it produces biased output
as it is to say that some particular machine learning method produces biased
output from unbiased input. Either way, it's perfectly fine to say that the
use of that technique results in bias _even if that bias is in an aspect of
the output that the algorithm wasn 't selected to optimize._

It seems that what you're arguing is that the algorithm shouldn't be blamed in
these cases, but in addition, you're declaring that there's no such thing as
prejudice or bias, and implying that what the algorithms are doing is pulling
some (uncomfortable for gatekeepers) platonic truth out of the ether (without
any acknowledgement of the possibility of biased input.) Or more simply,
you're using this article as an excuse to push a favorite "race realist"
agenda which you use to justify a purist libertarianism (and which I agree it
requires.)

[1]
[http://etymonline.com/index.php?term=bias](http://etymonline.com/index.php?term=bias)

~~~
yummyfajitas
The term "bias" and "prejudice" even outside statistics are defined by the
dictionary as being about getting the wrong results: "preconceived opinion
that is not based on reason or actual experience".

This agrees more or less with the statistical term, it's just less precise.
That's also NOT what machine learning algorithms do.

If you want to declare an algorithm biased because it optimizes what it was
designed to optimize rather than some random tangential goal, then the term
has become meaningless. Similarly, my cell phone is broken because it can only
make calls and is ineffective at pulling trains. A locomotive is also broken
because it doesn't make tacos.

 _It seems that what you're arguing is that the algorithm shouldn't be blamed
in these cases, but in addition, you're declaring that there's no such thing
as prejudice or bias...what the algorithms are doing is pulling some
(uncomfortable for gatekeepers) platonic truth out of the ether (without any
acknowledgement of the possibility of biased input.)_

I have no idea how you could possibly think I made this claim given that I
_explicitly listed two examples of algorithms which are biased._ I then hinted
at how algorithms can eliminate this bias. Consider rereading what I wrote.

If you want more details on how algorithms correct biased inputs (it's not
based on "ether") read the blog post I linked to downthread:
[https://www.chrisstucchio.com/blog/2016/alien_intelligences_and_discriminatory_algorithms.html](https://www.chrisstucchio.com/blog/2016/alien_intelligences_and_discriminatory_algorithms.html)

 _Or more simply, you're using this article as an excuse to push a favorite
"race realist" agenda which you use to justify a purist libertarianism (and
which I agree it requires.)_

I have no idea why you think "race realist" agendas require purist
libertarianism, or vice versa. That's a random political tangent and totally
unrelated to this conversation. I'm beginning to think you are seeking to
derail the conversation rather than discussing statistics in good faith.

In the hopes that I've misunderstood you, I will nevertheless provide you with
a good faith clarification. I made no normative claims regarding
libertarianism or anything else. The only normative claim I've made is that we
should not wrongfully apply an existing term (bias) to completely unrelated
concepts (having undesired social outcomes).

~~~
throwaway729
_> That's also NOT what machine learning algorithms do._

A machine learning algorithm run on non-representative input could be said to
lack experience with the portion of input that isn't represented in the
training set.

E.g. I think it's pretty obvious that machine learning algorithms exhibit far
more vernacular-bias than purely logical reasoning techniques.

 _> I made this claim given that I explicitly listed two examples of
algorithms which are biased._

Your parent is pointing out the very real possibility that _technically-
unbiased algorithms can produce vernacularly-biased outputs_. I.e., that
statistics' definition of "bias" does not sufficiently capture the notion of
bias as it's used in the vernacular.

I don't think you've refuted that claim.

At the end of the day, your approach toward side-stepping the issue of
vernacular bias is intellectually lazy. Instead of tackling the problem head
on, you're hiding behind a mathematical object that happens to have the same
name as the actual thing under discussion.

 _> I'm beginning to think you are seeking to derail the conversation rather
than discussing statistics in good faith._

I think your characterization of bias in terms of statistics' technical
definition is already skimming the edges of good faith. You know what the
reporter means when they say "bias", and that's _not_ the meaning that
statisticians use in technical settings.

~~~
yummyfajitas
I know what the reporter means: "bias" refers to "outcomes the reporter
dislikes and thinks he can generate clickbait with". I'm explicitly advocating
against this practice and in favor of using clear and precise language
instead.

 _At the end of the day, your approach toward side-stepping the issue of
vernacular bias is intellectually lazy. Instead of tackling the problem head
on, you're hiding behind a mathematical object that happens to have the same
name as the actual thing under discussion._

I'm not side-stepping the issue. I'm advocating in favor of describing the
problem with clear language - once that's done we can begin useful discussion
on how to address it.

In precise terms, what do _you_ think the problem is?

~~~
throwaway729
_> "bias" refers to "outcomes the reporter dislikes and thinks he can generate
clickbait with"_

That's quite a straw man for someone who claims _others_ are trying to derail
a conversation.

 _> I'm explicitly advocating against this practice and in favor of using
clear and precise language instead._

The language you're suggesting masks the problem.

It's nice that statisticians came up with a mathematical object that happens
to have a politically charged name, but the conflation you're making is _not
an argument_, and doesn't address the underlying (inherently political)
problem.

 _> In precise terms, what do you think the problem is?_

The precise problem is that things like skin color etc. often have strong
predictive power. We've recognized this unfortunate fact -- and its societal
implications -- and decided to create laws and social norms that regulate
(either restrict or sometimes even require) the use of these properties in
certain high-impact decisions.

Unfortunately, algorithms sometimes use these features to make decisions in
settings where a typical human wouldn't (or might even get into legal hot
water if they did).

Notice that an algorithm can exhibit this behavior without being
(statistically) biased.

Now, perhaps you think this problem isn't actually a problem. And you can
certainly make the argument that _any form of vernacular-discrimination should
be OK as long as it doesn't exhibit statistical bias_. But that's a separate
conversation, and you can't defend that position by playing with definitions.
And if you're not making that argument, then it should be clear why
vernacular-bias differs from statistical-bias.

~~~
yummyfajitas
You and pessimizer have just demonstrated pretty clearly why TechCrunch's
language choices are bad.

Pessimizer thinks the article is discussing inaccurate conclusions and that
race realism is wrong (
[https://news.ycombinator.com/item?id=13158797](https://news.ycombinator.com/item?id=13158797)
). You think race realism is accurate but we should ignore it. You both read
the same article but drew totally different conclusions due to its imprecise
language.

Many examples in the article don't fit your description of the problem at all
(e.g. difficulties in image processing, Tay the trolled chatbot, filter
bubbles), so I don't think you are correctly describing what the article means
by "bias" either. I can't figure it out either; the best I can come up with is
"AI-related things that make the author have negative feelings".

I didn't argue in favor of or against your mysterious "vernacular-
discrimination" - how could I when I don't even know what it is? The only
thing that I argued is that the imprecise language used by TechCrunch is
confusing people and that this is bad.

~~~
throwaway729
_> You think race realism is accurate but we should ignore_

No, I do not think race realism is accurate. I just think that race realism is
not an adequate defense of racial discrimination, AND ALSO that race realism
is inaccurate.

Pessimizer and I agree.

 _> Many examples in the article don't fit your description of the problem at
all_

You asked me what _I_ think the problem is, not what _I think the article 's
author thinks the problem is_. The fact that I emphasize different things is
not proof that I misunderstand the article's author.

 _> The only thing that I argued is that the imprecise language used by
Techcrunch is confusing people and that this is bad._

Who, exactly, is supposed to be confused?

Subject matter experts should know the difference, and if they don't, that's
really their own problem.

The public at large doesn't even know what statistical bias is, so why in the
world would they be confused?

No matter how confusing the current state of affairs is, it would be _far more
confusing_ if suddenly everyone started using bias to mean statistical bias.
Mostly because 99.9% of the people using the term wouldn't have an adequate
grasp of mathematics to even know to which object they are referring. So if
your priority is clarity, then keeping around the messy moralized and
politicized vernacular notion of bias is far preferable.

 _> I didn't argue in favor of or against your mysterious "vernacular-
discrimination" - how could I when I don't even know what it is?_

Not everything in the world can be mathematically formalized. Most things
don't even have fixed meaning over time. If your criticism is genuinely about
imprecision, you've got a _lot_ of much lower hanging fruit than
"discrimination" and "bias"...

~~~
yummyfajitas
_The precise problem is that things like skin color etc. often have strong
predictive power...Notice that an algorithm can exhibit this behavior without
being (statistically) biased._ - throwaway729, 2 hours ago
[https://news.ycombinator.com/item?id=13159539](https://news.ycombinator.com/item?id=13159539)

 _No, I do not think race realism is accurate._ - throwaway729, 22 minutes
ago
[https://news.ycombinator.com/item?id=13160709](https://news.ycombinator.com/item?id=13160709)

Whoops!

 _The fact that I emphasize different things is not proof that I misunderstand
the article's author._

Fine - what do you believe the author thinks "bias" means? I.e. what is the
underlying concept behind all his examples?

 _Who, exactly, is supposed to be confused?_

Pessimizer, for one, who thinks the problem is inaccurate predictions. 2 hours
ago you disagreed with this, but 22 minutes ago you agreed. I'm getting kind
of confused.

 _If your criticism is genuinely about imprecision, you've got a lot of much
lower hanging fruit than "discrimination" and "bias"..._

Sure, but this is my field. I give talks about misuse of p-values and the
multiple comparison problem, even though there might be other issues worth
worrying about. It's just what interests me. The fact that bigger problems
exist doesn't mean we shouldn't fix this one.

~~~
throwaway729
_> Whoops!_

I take race realism to be the sort of thing that posits an inherent
relationship between skin color/other physical features, and other
innate/intrinsic/genetic properties such as intelligence or criminality --
especially as a prior and when the trait is normative and negative. So "dark
skinned people are less intelligent" is race realism, but "dark skinned people
are treated poorly by the political establishment" is not race realism. (We're
far afield; I don't mean for this thread to devolve into a definitional debate
about race realism -- I'm just saying what I take race realism to mean.)

The important point isn't a precise characterization. Rather, the important
point is this:

It's possible for race to be predictive without there being _inherent_
differences between people of different physical attributes. Set up a system
in which I take away all green people's life savings at the end of the day
with probability .99 if they don't have a house and .5 if they do have a
house. Both numbers are .2 for purple people. An unbiased lender can choose
not to lend to green people without exhibiting statistical bias -- in fact,
color has huge predictive power.

But that predictive power comes from the social system we set up specifically
to screw green people, not from _inherent_ differences that are causally
related to color posited by race realism.
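The thought experiment runs in a few lines. In the simulation below (numbers taken straight from the setup above, everything else invented), color carries large predictive power even though nothing inherent to color causes anything:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

green = rng.integers(0, 2, size=n).astype(bool)  # green vs purple
house = rng.integers(0, 2, size=n).astype(bool)

# Loss probabilities per the setup: green without a house .99, green with
# a house .5; purple .2 either way.
p_loss = np.where(green, np.where(house, 0.5, 0.99), 0.2)
lost_savings = rng.random(n) < p_loss

green_rate = lost_savings[green].mean()    # roughly (.99 + .5) / 2
purple_rate = lost_savings[~green].mean()  # roughly .2

# Color is hugely predictive -- but only because the system was rigged
# against green people, not because of anything inherent to being green.
print(green_rate, purple_rate)
```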

But this is an extremely obvious observation, so perhaps you meant something
different by race realism? In any case, I don't think the disagreement you
posited really exists.

The rest of the points you're making are covered elsewhere.

I deal with Real numbers every day, but I don't barge into discussions of
epistemology and demand my axioms be used in discussions of a vaguely related
but definitely different topic. And I'm careful in pop lectures to point out
that Real numbers are just a mathematical object and no more "real" than
natural numbers.

If people in your statistics lectures don't know what bias means, then you
should account for this in your teaching and lecturing. I know my statistics
lecturer did.

------
_pdp_
Junk in, junk out - it applies to both machine and human learning.

------
tudorw
Cognitive bias cheat sheet :)
[https://betterhumans.coach.me/cognitive-bias-cheat-sheet-55a472476b18](https://betterhumans.coach.me/cognitive-bias-cheat-sheet-55a472476b18)

------
jakosz
I struggle to see how data-driven bias is unexpected.

