
Text Embedding Models Contain Bias - gajju3588
https://developers.googleblog.com/2018/04/text-embedding-models-contain-bias.html
======
danielvf
This is a genuine, difficult problem. It's so easy to join up with your
political team of choice and scream about it, and all this makes any real
attempt to solve it so much harder to talk about in public or collaborate on.
In fact, there's practically guaranteed to be some greyed out text in the
discussion here.

So some of these associations simply reflect the way-the-world-was or the way-
the-world-is - like associating "woman" with "housewife". That's a whole
debate in itself.

But some of these can be accidental. Suppose a runaway success novel/tv/film
franchise has "Bob" as the evil bad guy. Reams of fanfictions are written with
"Bob" doing horrible things. People endlessly talk about how bad "Bob" is on
twitter. Even the New York times writes about Bob latest depredations, when he
plays off current events.

Your name is Bob. Suddenly all the AIs in the world associate your name with
evil, death, killing, lying, stealing, fraud, and incest. AIs silently,
slightly ding your essays, loan applications, Uber driver applications, and
everything you write online. And no one believes it's really happening. Or the
powers that be think it's just a little accidental damage, because the AI is
still doing a great overall job of sentiment analysis and fraud detection.

~~~
joe_the_user
With current technology, the problem of Bob (or Adolph or Mohamed) becoming
associated with evil is unsolvable, because current deep learning systems
fundamentally cannot distinguish causation from correlation.

The only solution I can see is forcing any company that imposes _life-defining
actions_ on people (credit bureaus, banks, parole boards, personnel offices,
etc) to use only rules based on objective criteria and to prohibit systems
based on a "lasagna" of ad-hoc data like present day AI systems. Indeed, if
one looks at these in the light of day, one would have to describe such system
as _fundamentally evil_ , the definition of "playing games with people's
lives." (just look at the racist parole-granting software, etc).

~~~
AnthonyMouse
> The only solution I can see is forcing any company that imposes _life-
> defining_ actions on people (credit bureaus, banks, parole boards, personnel
> offices, etc) to use only rules based on objective criteria and to prohibit
> systems based on a "lasagna" of ad-hoc data like present day AI systems.

That is probably the exact opposite of what you really want. If the problem is
that someone's name is Bob and the AI thinks Bobs are evil, what you want is
for there to be 100,000 other factors for Bob to show the system that it isn't
so. As many factors as possible, so that the one it gets wrong will have a
very low weight.

Even the objective criteria will have biases. There is a significant racial
disparity in prior criminal convictions, income, credit history and nearly
every other "objective" factor. The more factors you bring in, the more
opportunities someone in a given demographic has to prove they still deserve a
chance.

~~~
ori_b
> That is probably the exact opposite of what you really want.

No, it really isn't. In an ideal world, the reasons behind a decision are
transparent, auditable, understandable, and appealable. Machine learning is
none of those.

~~~
erik_seaberg
In an ideal world, society would compensate you when they would like you to
assume a risk below its market price, rather than forcing you to pretend not
to notice the risk.

~~~
joshuamorton
But there isn't actually a risk associated with an Adolf. It's an
inefficiency born out of an incorrect belief held by the whole of (or at
least most of) society. The correct solution is not to price in the incorrect
assumption, but to not make the incorrect assumption.

In other words, by offering Adolfs below-market rates, you're exploiting a
market inefficiency at no additional risk. This is the ideal world as you
describe it. It's capitalism at its finest!

------
troupe
> But what if we found that while Model C performs the best overall, it's also
> most likely to assign a more positive sentiment to the sentence "The main
> character is a man" than to the sentence "The main character is a woman"?

As I understand the problem, they are saying that statistically, the statement
about the main character being male is a bit more likely to be positive than
if the same thing is said about a woman. If that is statistically true and you
are trying to create a model to determine the level of positive sentiment in a
review, then that may be a legitimate indicator of how people categorize
things. If the goal is to try to "fix" how people talk and write, I'm not sure
ignoring statistical patterns in the way we talk is really the right approach.
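
For concreteness, the check the article describes can be sketched as scoring
sentence templates that differ only in a gendered word and comparing the
averages. A minimal illustration, assuming a hypothetical model object with a
score(text) method returning a sentiment value in [0, 1] (templates and word
groups below are made up for the example):

    import itertools

    TEMPLATES = [
        "The main character is a {}.",
        "I went to see a film about a {} last night.",
    ]
    GROUPS = {"male": ["man", "boy"], "female": ["woman", "girl"]}

    def group_scores(model, templates=TEMPLATES, groups=GROUPS):
        """Average sentiment per group over sentences differing in one word."""
        results = {}
        for label, words in groups.items():
            scores = [model.score(t.format(w))
                      for t, w in itertools.product(templates, words)]
            results[label] = sum(scores) / len(scores)
        return results

    # A systematic gap between results["male"] and results["female"] on
    # otherwise-neutral sentences is the association the article describes.

If such a gap is real in the underlying data, the question above remains:
is it a legitimate signal about how people write, or a bias to correct?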

~~~
lsiebert
Actually it's any name, even the name of the reviewer, as the test they
conducted showed.

The issue is that humans don't understand multi-variable statistical
analysis, whether in the form of ANOVA or machine learning training, so they
try to pack everything down into two or sometimes three output variables.

And that's fine if you are pulling from a population that's homogenous. But if
there are two or more discrete subpopulations, you want to control for them or
represent them separately, not just ignore them or pretend they reflect the
information you want.

Anyway, if the reviewer's name is enough to throw off the results, it may
suggest that guys praise more movies, rather than that movies with male
characters get more praise.

I just looked at the first page of 10-star reviews of Battlefield Earth and
none of the reviewers seemed to be female, just saying.

~~~
dibstern
If multiple subpopulations exist, that'll be reflected in the data, as long
as your dataset is good enough.

~~~
lsiebert
Maybe? But even something as simple as an analysis of human height has
genetic/ethnic, nutritional, age and gender components, not to mention
historical differences. If your input is name, and your output is predicted
height, knowing and testing for bias in your data sources is definitely
important.

I think where people get caught up is that they don't see the world at large
as biased, because they view their understandings as essentially correct. For
example, we expect judges to rule fairly on every case, right?

To pick a non-controversial issue, the likelihood of parole is apparently
affected by how recently the judge ate
[https://www.economist.com/node/18557594](https://www.economist.com/node/18557594)

Now, parole data accurately reflects how people were paroled, so predicting
likelihood of clemency requests is a perfectly valid use of that data. If you
were doing machine learning to try to help people get paroled, you'd want to
leave that bias in as a predictor, because it's unfair but real.

But you'd probably want to adjust that data to correct for the recently having
eaten bias if you were writing a system for parole recommendations for new
judges based on past judicial decisions.

You wouldn't want people to be more likely to be denied just because they came
before a judge before lunch. And if you don't test for a bias like that, how
would you be able to tell that the machine learning algorithm had it? And it
wouldn't even need to be direct... seeing people A-Z in court could mean a
bias based on name.
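
A before/after-lunch effect like that is at least testable once the feature
is explicit: hold everything in a case constant, flip only the time-since-
eating feature, and see how much the recommendation moves. A hypothetical
sketch (model.score(case) and the feature name are assumptions, not any real
API):

    import numpy as np

    def confound_sensitivity(model, cases, feature="hours_since_meal",
                             hungry=3.0, fed=0.5):
        """Average change in the model's parole score when only the
        time-since-eating feature is flipped on otherwise identical cases."""
        gaps = []
        for case in cases:
            case_hungry = dict(case, **{feature: hungry})
            case_fed = dict(case, **{feature: fed})
            gaps.append(model.score(case_fed) - model.score(case_hungry))
        return float(np.mean(gaps))

    # A large positive gap means the model reproduced the judges' bias,
    # even though nobody ever asked it to.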

Long story short, bias is a real issue, and you need to be aware of it and
test for it, not assume that your input data isn't affected by human error.

------
rspeer
I'm glad that Google is part of this conversation, and they're now applying
tests for bias to new models that they release. (Some of their old models are
pretty awful.)

If you want to see a further example, in the form of a Jupyter notebook
demonstrating how extremely straightforward NLP leads to a racist model,
here's a tutorial I wrote a while ago [1]:

[1] [http://blog.conceptnet.io/posts/2017/how-to-make-a-racist-
ai...](http://blog.conceptnet.io/posts/2017/how-to-make-a-racist-ai-without-
really-trying/)
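
The core recipe in that tutorial is short: take pre-trained word embeddings,
train a sentiment classifier on a lexicon of positive and negative words,
then score text with it. A rough sketch of the general shape (not the
notebook's exact code; the embeddings dict, e.g. GloVe, and the helper names
are assumptions):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def train_lexicon_classifier(embeddings, pos_words, neg_words):
        """Fit a sentiment classifier on embedding vectors of lexicon words."""
        pos, neg = set(pos_words), set(neg_words)
        words = [w for w in list(pos) + list(neg) if w in embeddings]
        X = np.vstack([embeddings[w] for w in words])
        y = np.array([1 if w in pos else 0 for w in words])
        return LogisticRegression(max_iter=1000).fit(X, y)

    def text_sentiment(clf, embeddings, text):
        """Score text by the mean embedding of its in-vocabulary words."""
        vecs = [embeddings[w] for w in text.lower().split() if w in embeddings]
        if not vecs:
            return 0.5
        return clf.predict_proba([np.mean(vecs, axis=0)])[0, 1]

    # The punchline of the tutorial: sentences that differ only in an
    # ordinary first name can come out with very different scores, because
    # the embeddings already carry associations with those names.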

~~~
nl
For those who aren't aware, @rspeer has been taking this problem seriously for
years.

His ConceptNet NumberBatch embeddings[1] are one of the few pre-built releases
which attempt to fix this.

[1] [https://github.com/commonsense/conceptnet-
numberbatch](https://github.com/commonsense/conceptnet-numberbatch)

------
cperciva
I imagine that the model would also score "he was murdered" higher than "she
was murdered". Models reflect their inputs, and it happens that yes, murder
victims are disproportionately likely to be male and nurses are
disproportionately likely to be female.

Is there a problem we should address here? Absolutely -- but the problem is
that men keep on getting murdered, not that the model recognizes truths with
which we are uncomfortable.

~~~
allenz
Biased models can cause ethical and legal problems. While your specific
example is not a huge deal, the article gives the example of making hiring
decisions in part based on sentiment analysis of candidates' text reviews. In
this context, an engineer has a responsibility to ensure that the model has
no gender, race, or age bias towards candidates' names.

For a real life example, in 2017 Google was more likely to filter the comment
"I am a woman" than "I am a man": [https://www.engadget.com/2017/09/01/google-
perspective-comme...](https://www.engadget.com/2017/09/01/google-perspective-
comment-ranking-system/)

Or consider the impact of any bias in AI for criminal sentencing
recommendations: [https://www.wired.com/2017/04/courts-using-ai-sentence-
crimi...](https://www.wired.com/2017/04/courts-using-ai-sentence-criminals-
must-stop-now/)

~~~
home_boi
If the biased models replace human decision making, then it just has to be
shown that the models are less biased than humans, which may not be that high
a bar to pass.

~~~
allenz
US law prohibits businesses from discrimination. If you're sued, you can't
argue that you only discriminated the average amount.

------
smallnamespace
Be careful what you wish for.

Under this definition of 'bias', an unbiased model would, say, spit out equal
associations between any occupation and any gender/sex/age/race/religion
label.

We should probably ask ourselves whether that's a strictly desirable outcome,
since by definition the 'biased' model has a higher predictive value. How much
accuracy are we willing to sacrifice for the sake of erasing inconvenient
facts about either our world, or our current models of the world?
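
For what it's worth, the "association" in question is usually quantified with
something like a WEAT-style score: compare a target word's similarity to two
attribute word sets. A toy version, assuming vectors is a word-to-numpy-array
dict from whatever embedding model is under test:

    import numpy as np

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def association(vectors, target, attrs_a, attrs_b):
        """Mean cosine similarity of target to set A minus set B.
        Zero is the 'equal association' described above; real
        embeddings rarely come out at zero."""
        sim_a = np.mean([cosine(vectors[target], vectors[w]) for w in attrs_a])
        sim_b = np.mean([cosine(vectors[target], vectors[w]) for w in attrs_b])
        return sim_a - sim_b

    # e.g. association(vectors, "nurse", ["she", "her", "woman"],
    #                  ["he", "him", "man"])

Whether you then force that score toward zero, and at what cost in accuracy,
is exactly the trade-off being asked about.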

~~~
perfmode
Predictive value isn’t an end in itself.

~~~
maldeh
Hit the nail on the head.

Sure, you can build a discriminating classifier or generative model that is
the most accurate, correctly identifying / emulating the reality of our world
down to 5 nines; and nobody says you shouldn't be able to identify or quantify
all of these "inconvenient" associations. The trouble is always when you
intend to make decisions based on your framework -- the decision to offer
somebody a loan, the choice of language your chatbot uses, etc. -- and that is
where fairness ought to be paramount.

And yes, if you manage to build a discriminating model that sneaks in
protected classes through indirect causal effects with no attempt to suppress
them, it would yield your insurance agency higher returns over time, simply
because that's the way the world currently is. But all this would achieve is
perpetuating the status quo, placing short-term gain ahead of long-term
equality. You might be doing right by yourself, but that still makes you
morally impugnable.

One shouldn't need to wait for an overreaching law prohibiting such
indirection in order to do the right thing.

~~~
haberman
If a person is loan-worthy, offering them a loan is good business sense. If
others are denying loans for no good reason, that is a business opportunity.
It doesn't require benevolence to offer people loans that they are qualified
for.

How far do you expect systems to go to ignore real-world associations? What
if it comes down to personal safety? Would you object to a self-driving car
that routes itself around more dangerous neighborhoods? What about a model
that predicts that unknown men on the street at night are more dangerous than
unknown women?

~~~
paganel
> If others are denying loans for no good reason, that is a business
> opportunity. It doesn't require benevolence to offer people loans that they
> are qualified for.

AFAIK the black families who were segregated out of their neighborhoods back
in the 1940s, 1950s, and into the 1960s were never provided with another
comparable "business opportunity" that could have "righted" their situation;
they had to settle for living in what turned out to become "ghettos" (for
lack of a better word).

> Would you object to a self-driving car that routes itself around more
> dangerous neighborhoods? What about a model that predicts that unknown men
> on the street at night are more dangerous than unknown women?

Following the same line of thought, would you be ok with an AI system giving a
person named Deion or Jayla or Latisha a higher interest rate on their
mortgage (and so, potentially, driving them out of certain markets) compared
to the interest rate offered to persons named Chad or Emma or Sophia?

~~~
haberman
The shameful history of "redlining" (housing segregation) was driven by
policy, not market forces. Black people weren't being judged non-credit-
worthy, they were being explicitly prohibited from borrowing and buying by FHA
policy and housing covenants. So this history, while awful and something that
we should not repeat, does not speak to my point that making rational
decisions about credit-worthiness is good business sense.

> Following the same line of thought, would you be ok with an AI system...

You didn't answer my question. I asked it because I want to know if the people
most vocally arguing that de-biasing is a moral imperative will admit even one
case where there might be a compelling reason to see the world as it is, even
if that association is considered "problematic".

If people will admit this, then we can argue over where the line should be.
But many don't appear to admit that a line even exists.

I will admit that the line exists, and that an incident like this is a clear
example where removing hurtful associations is proper. This was a case where a
ML model reinforced a loaded and racist stereotype, and the harm of removing
that from the model is almost zero:
[https://www.theverge.com/2015/7/1/8880363/google-
apologizes-...](https://www.theverge.com/2015/7/1/8880363/google-apologizes-
photos-app-tags-two-black-people-gorillas)

~~~
paganel
> You didn't answer my question. I asked it because I want to know if the
> people most vocally arguing that de-biasing is a moral imperative will admit
> even one case where there might be a compelling reason to see the world as
> it is, even if that association is considered "problematic".

I didn't find those questions to be that smart, to be honest; they're more on
the scare-mongering side, and I think a smart question should contain half of
the answer, while a scare-mongering question doesn't look like it contains
anything smart (at least to me). But if you really want to know, my answer is
"yes" to all of your questions. To get into more detail: I grew up in a
middle-class-ish family, and back then I had no issues going to the
"ghetto"/dangerous area of the town I grew up in (I grew up in Eastern
Europe, so the "dangerous" area was populated by the local gypsy community
instead of the African American/Latino communities now associated with the
"dangerous" areas of US cities). I turned out fine.

> If people will admit this, then we can argue over where the line should be.
> But many don't appear to admit that a line even exists.

I know of that line; I was just trying to say that further reinforcing it
using ML techniques will only aggravate things at a societal level (so that
"dangerous" areas will become even more "dangerous"). At least when the
discrimination happens out of our own (us humans') volition there are ways to
fix it, but once we "outsource" our racist tendencies to ML-like tools, the
voice inside us telling us that this is all wrong will become even weaker.
After all, the algorithms/machines are more "right" than us humans, or so we
like to think.

~~~
haberman
If we're trading anecdotes, I grew up in a small town where I didn't worry
about where I went. Then as an adult in a big city I was mugged at knife-point
for obliviously walking through a bad part of town at night. If you want to
dismiss people's concerns for their own safety as "scaremongering", don't be
surprised when people don't find much use for your ideology.

------
haberman
This sounds a lot like how people get TSA redress numbers when their info
falsely flags them as suspicious ([https://www.dhs.gov/redress-control-
numbers](https://www.dhs.gov/redress-control-numbers)). Or how Barack Obama
suffered innuendo around the middle name "Hussein." Mistaken identity or
unfortunate associations are as old as humanity. AI systems (and non-AI
systems) need ways to deal with these problems, but we also have a lot of
experience about how to do that.

~~~
bobthepanda
The problem is when we hand AI full reins over a system that should have
humans with the final say, because doing it with humans is "too hard" or
"doesn't scale." At least with humans you have auditable decision making and
recourse via the legal system, even if it can be hard to fight. AI are
essentially black boxes that unconsciously learn the biases of the datasets
they are given.

When AI starts determining rather consequential things like how long to send
someone to prison for, that's a problem.
[https://www.nytimes.com/2017/05/01/us/politics/sent-to-
priso...](https://www.nytimes.com/2017/05/01/us/politics/sent-to-prison-by-a-
software-programs-secret-algorithms.html)

~~~
manux
> AI are essentially black boxes that unconsciously learn the biases of the
> datasets they are given.

Please don't confuse "AI" with the current state-of-the-art of the latest deep
learning model. Many of us researchers are working on interpretability and
understanding of causality.

Calling it "AI" makes it appear final, as if in 50 years all machine learning-
based decision systems will behave exactly as they do now, without nuance.

~~~
bobthepanda
Calling currently existing models AI is no different than calling a Model T a
car. It being more primitive than what may exist in the future is of little
consequence to the public dealing with the consequences of AI today.

AI can only learn from the data it has, so it will always carry some sort of
bias, because it is impossible to collect the nuance of every last bit of
context into a digestible data format. At best it's an advisor, but it should
never be a decision-maker.

~~~
AstralStorm
Very different, in fact. The Model T had most of the major components of a
modern car in place: rubber wheels, protection from the elements, a
combustion engine, a transmission.

In comparison, current automated decision making systems are at the stage
where we do not even know what the components are. And calling them
intelligent is insulting.

------
ajwnwnkwos
This is an example of sacrificing the scientific method to make results more
politically correct. We've come full circle.

~~~
to_bpr
>This is an example of sacrificing the scientific method to make results more
politically correct. We've come full circle.

Wrongthink is nothing new, comrade ajwnwnkwos, and is as prevalent as ever.

The output of science has been suppressed throughout history where it didn't
fit the narrative of the day. In one period, that was by the Church; in
another, at the hands of the government. Today, it's mainly by a
self-censoring, everything-must-be-pleasant-and-entertaining society that is
highly prone to fits of outrage.

Inconvenient facts are, after all, inconvenient.

~~~
SolaceQuantum
I never actually understood the difference between self-censoring and just
deciding to not be offensive, and why deciding not to be offensive is such an
unscientific thing. If you can objectively observe that saying something
doesn't accomplish your goals (assuming your goal is to make friends and get
along with others) and seems to offend or harm people around you, what are
the consequences of simply not saying those things?

In the specific context of science I’m unsure what self-censoring would even
look like. Not only would such conclusions be a relatively small subset (it's
hard to discuss the self-censorship of a graph algorithm), but there is
plenty of attention (and therefore funding) on both sides to uncover some
truth about any political subject.

~~~
olleromam91
Because sometimes to reach the truth, you must risk being offensive.

------
taneq
In Which We Define 'Biased' As Meaning 'Not Conforming To Our Ideas Of How The
World Should Be'. Because it's unthinkable that movies with male main
characters could actually just be better than movies with female main
characters.

(I'm not saying they are, mind you - but when we analyze sentiment in a large
dataset and reach a result like that, the first question to ask should be "is
that result accurate?" not "how do we tune out this problematic result?")

~~~
stfwn
> the first question to ask should be "is that result accurate?" not "how do
> we tune out this problematic result?"

That depends on the product that you want to make. If you’re a company that
wants to sail on the status quo and just make money, by all means be amoral.

On the other hand, technology now allows us to analyze and steer social
structure at scale. We could reproduce past inequalities in the name of
efficiency, or we could think a bit longer about our axis of optimization.

“But cultural relativism!” — I agree. It’s a tough problem, but we don’t all
of a sudden need a definitive, generalized social plan. Biases can be tackled
one algorithm at a time.

~~~
iguy
Maybe a useful way to think about this is the positive/normative distinction.

Science is "amoral" in your sense, it's trying to describe how things really
are, and make predictions. So is the stock market: you're rewarded for correct
predictions, whether or not others regard those outcomes as desirable.

Of course we are also moral beings, interested in changing things for the
better. And we have varying and contradictory ideas about what better means.

As soon as you say "steer social structure at scale" the question is: who gets
to steer? If this were obvious, then we wouldn't need democracy, we could just
all work together not to "reproduce past inequalities"... but we can't.

------
home_boi
The politicization of bias in this realm is unproductive. We know where it
leads. There will be 'committees' of people who are not trained in
statistical thinking, who haven't even taken a statistics class, who are not
qualified to make any statements, and whose interests are not aligned with
making the best product, acting as nuisances and aggressors to the people who
do have the expertise and who strive to make the best product.

The best course of action is to treat this bias like ordinary bias involving
non-politicized, inanimate objects. How would data scientists act if they
encountered similar bias while running machine learning models of the motion
of waves?

~~~
gowld
The article was written by trained experts in statistical thinking. It's worth
a read.

------
lsiebert
I'm reminded of
[https://whitecollar.thenewinquiry.com](https://whitecollar.thenewinquiry.com)

It turns out that white collar crime is predominantly committed by white men.
A system trained to detect white collar crime using, say, Enron emails, might
flag a white guy's emails over those of someone whose name doesn't sound like
an Enron employee's, or who shared pictures of their cat.

I mean, I suppose you can argue that hey, maybe that bias is usually correct.
Maybe it usually is the white guy. But personally, I'd probably control for
things conflated with gender or race and then look for indicators that
differentiate between criminals and innocent people. You will probably have a
lower AUC, but better differentiation between criminals and innocent people is
what matters.

------
Spooky23
I guess the question posed in the article is an interesting one, and it
raises an interesting discussion about how to deal with bias and build
counter-biases into algorithms without the editorial decision-making being
clear.

Movie reviews are editorial content. Measuring that content is a difficult
problem in this type of context... Are the best reviewers people who dislike
movies with female leads? Are you going into a back catalog of movie reviews
from an age where societal expectations were different? Are popular genres
skewing the result?

You could have a curation issue as well — if the female lead movies are
dominated by "Hallmark Channel" fare, algorithm C has a point!

------
pcunite
Here is a video that explains this blog well.
[https://youtu.be/59bMh59JQDo](https://youtu.be/59bMh59JQDo)

The unnerving part for me is this "eliminate negative associations/bias".
Okay, how about we learn the truth, and then address that outside _in real
life_ and keep the computer doing what it's good at ... showing us the data.

~~~
gowld
> Okay, how about we learn the truth, and then address that outside in real
> life

Aside from the aspect you overlooked -- that our online lives _are_ real life,
and that AI models are (over)used to make _decisions_ , not merely show us the
data -- your idea is quite what the article recommends:

> As with Tia, Tamera has several choices she can make. She could simply
> accept these biases as is and do nothing, though at least now she won't be
> caught off-guard if users complain.
>
> She could make changes in the user interface, for example by having it
> present two gendered responses instead of just one, though she might not
> want to do that if the input message has a gendered pronoun (e.g., "Will
> she be there today?").
>
> She could try retraining the embedding model using a bias mitigation
> technique (e.g., as in Bolukbasi et al.) and examining how this affects
> downstream performance, or she might mitigate bias in the classifier
> directly when training her classifier (e.g., as in Dixon et al. [1],
> Beutel et al. [10], or Zhang et al. [11]).
>
> No matter what she decides to do, it's important that Tamera has done this
> type of analysis so that she's aware of what her product does and can make
> informed decisions.
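
To make the "bias mitigation technique (e.g., as in Bolukbasi et al.)" part
concrete: the core of that paper's hard-debiasing step is a projection,
estimating a gender direction from definitional word pairs and subtracting
each neutral word's component along it. A minimal sketch, assuming vectors
maps words to numpy arrays (the real method uses PCA over many pairs plus an
"equalize" step):

    import numpy as np

    def gender_direction(vectors, pairs=(("he", "she"), ("man", "woman"))):
        """Crude estimate of a gender direction from a few definitional pairs."""
        diffs = [vectors[a] - vectors[b] for a, b in pairs]
        g = np.mean(diffs, axis=0)
        return g / np.linalg.norm(g)

    def neutralize(v, g):
        """Remove the component of v along the bias direction g."""
        return v - np.dot(v, g) * g

    # Words that should be gender-neutral (occupations, adjectives, names)
    # get projected off the gender direction; definitionally gendered words
    # ("mother", "king") are left alone.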

------
dgudkov
OK, factual data fed to some AI-based algorithms produces results that are
not as politically correct as some people would like them to be. Is this a
problem with the AI algorithms, or with the people?

------
avz
I think the core issue here is that people want two things. On one hand we
want our models to accurately describe reality, not an idea of what reality
should be. On the other hand, we don't want ML to freeze society and culture
in their current state, but to help decide on and drive social change. The
tension between these goals arises when models trained today are used to make
decisions tomorrow.

One way to resolve the tension might be to add a time dimension and
historical training data. The models might then be able to return, in
addition to any predicted variable p, its time derivative dp/dt. For example,
a model
might then return results such as: "movies with female main character: lower
sentiment, trending up; movies with male main character: higher sentiment,
trending down".

------
jfasi
It's hard to believe these days, but once upon a time language models were
written by hand. Imagine you hand-wrote such a model and put it to use in
(for the sake of example) a psychological evaluation application. Now imagine
that after years of use you discover that your model systematically marks
African Americans as less psychologically fit than white Americans. Who would
be to
blame? Naturally, you would. Your actions led to a biased model being used to
unjustly and arbitrarily harm innocent people, and your leadership would be
right to call into question every decision your application ever made.

Now imagine the same scenario except your app was trained on data instead of
hand-written. Make no mistake, the answer to the question of who's to blame is
exactly the same: the developer. The response should be exactly the same: a
complete loss of confidence in the model.

I'm appalled that this needs to be said, but reading this comments section I'm
afraid it does: _Machine learning models are inference and pattern recognition
devices, not scientific tools_. They don't magically reveal hidden patterns in
the world, they repeat the patterns that the developers train them on. If you
trained a machine learning model to perform psychological evaluations [1], or
sentence convicts [2], or recognize faces [3], and your model is biased in a
way that is unnecessary and unjust, your model is bad and you should be held
accountable for its failures.

[1]
[https://affect.media.mit.edu/projects.php?id=4079](https://affect.media.mit.edu/projects.php?id=4079)

[2] [https://www.nytimes.com/2017/05/01/us/politics/sent-to-
priso...](https://www.nytimes.com/2017/05/01/us/politics/sent-to-prison-by-a-
software-programs-secret-algorithms.html)

[3] [https://www.wnycstudios.org/story/deep-problem-deep-
learning...](https://www.wnycstudios.org/story/deep-problem-deep-learning/)

~~~
whatshisface
What do you do when your classification is correlated to race through no fault
of yours? For example, I might successfully predict credit score from backyard
size, and end up with a correlation to race that doesn't have anything to do
with me.

I don't think the blame always lies with the algorithm, especially when it
doesn't have access to race as an input (this is a reasonable expectation). I
can score students with a simple algorithm based on what they write on their
math tests, and even that's going to correlate with race. In that case the
blame pretty clearly lies in the process that produced the reality that's
being measured, not the measurement technique itself.

Let's say that black people default on their loans more often than white
people. Is it better to criticize the math that discovered that fact, or the
root cause that made it true to begin with?

~~~
jfasi
> Is it better to criticize the math that discovered that fact, or the root
> cause that made it true to begin with?

The only place where math was involved was the guts of the training stage of
the model. A crucial stage, to be sure, but one that's bookended at the front
by problem definition, data selection, model selection, and the design of an
evaluation process, and at the back by the execution of that evaluation and
the decision to launch the model. Literally every other stage of this process
is driven by human decisions.

I'll say it again, because apparently the point didn't sink in the first time
around: _Machine learning models are inference and pattern recognition
devices, not scientific tools._ The fact that it's inhuman and unthinking
mathematics that produced a biased model offers no ethical or legal cover to
the people who decide to put that model into use.

The decision to apply the model is key here. Contrast two applications: one
that takes in a patient's information and diagnosis to compute a dosage for a
drug, and one that takes in a potential tenant's request and produces a
rent/no rent decision. In the first, there are cases in which the bias is
admissible if not necessary, e.g. [1]. However, the legitimacy of that model's
application comes not from the supposed objectivity of the model's findings
but from volumes of peer (i.e. human) reviewed research. In the second, there
is no legal way in which this model can be applied, and I struggle to imagine
a moral one. I can't imagine any court of law taking "the machine made me do
it" as a defense in an FHA case.

[1] [https://www.nytimes.com/2005/06/24/health/fda-approves-a-
hea...](https://www.nytimes.com/2005/06/24/health/fda-approves-a-heart-drug-
for-africanamericans.html)

~~~
iguy
_pattern recognition devices, not scientific tools_

Isn't science all about pattern recognition? If the pattern exists, in the
real world, then a good theory is one which encodes this.

What you're asking for in a "moral" way of applying the model is that our
actions should abide by your moral preferences. Perhaps even universal
preferences. But it seems useful to me to keep logically separate these ideas
about how we ought to do things. They don't flow naturally from observations
of how things are.

The example of adjusting drug & dosage based on race is a good one. The
science backing this is exactly the same kind of statistical correlation as
backs the rental decision. The training input is what race some test patients
ticked on a form, and their tick mark sure as hell isn't the causal factor...
that's some gene which is correlated, maybe, or some diet difference, or
what's on TV, who knows. Nevertheless the correlation is there, as far as we
can tell: the peer-reviewed science process isn't infallible. The reason we're
OK with using this information is, I guess, that it aims to improve things for
the patient. (Not every single patient, only statistically.) We make a moral
judgement that this is more important than a landlord's wish to avoid bad
tenants (again statistically).

------
_pmf_
"Pray, Mr. Babbage, if you put into the machine biased models, will unbiased
answers come out?"

------
jayd16
I'm reminded of a much more obvious example.

[https://www.theverge.com/2016/3/24/11297050/tay-microsoft-
ch...](https://www.theverge.com/2016/3/24/11297050/tay-microsoft-chatbot-
racist)

I see a lot of comments about how it's somehow sinister to want your model to
be better than the lowest common denominator, and that is pretty damn
ridiculous.

------
throwaway84742
I’ll tell you more: _human judgment_ contains bias. You can’t possibly think
logically about every single judgment, particularly when information is
incomplete, which it is in the overwhelming majority of cases. It is not a
given, to me, that on average AI does any worse than the human population
outside the “woke” segment. Or even _within_ that segment considered in
isolation.

~~~
jimmytidey
If you put a human in charge of a judgement call, mostly there are mechanisms
to monitor them, a requirement that they give reasons for their decisions, and
mechanisms to appeal those judgements.

We aren't used to having to do that with decisions made by computers.

It's not that AI makes better or worse decisions, it's the way we treat those
decisions.

~~~
iguy
"a requirement that they give reasons for their decisions"

But this is mostly a sham. People lie about their reasons for doing things,
even to themselves. Especially when they know what reasons are publicly
acceptable.

Maybe the major difference is that it's much easier to run experiments on
computers. I mean people try this on humans but it's very hard to do
realistically: most of those studies where you submit 1000 CVs with varying
details are garbage, because they can only access an unrealistic part of the
process (I mean, who ever got a job without networking? etc.). Whereas with
almost any computer system, you can feed it completely realistic fake data.

------
pgodzin
Key takeaway: It is important to be aware of bias in ML models. Some biases
may correctly model the reality of the world, and some may show the bias in
the underlying dataset or in what the model has focused on in the data. The
goal is not to "unbias" everything, as people seem to be focusing on, but
rather to determine if the bias is appropriate given the context.

~~~
avinium
Damn, I just wrote almost exactly these words before scrolling down to your
comment. Could have saved myself a few minutes by upvoting yours instead.

------
blackbagboys
> Normally, we'd simply choose Model C. But what if we found that while Model
> C performs the best overall, it's also most likely to assign a more positive
> sentiment to the sentence "The main character is a man" than to the sentence
> "The main character is a woman"? Would we reconsider?

It seems like you have discovered that movie reviewers tend to review movies
with a male main character more highly than movies with a female main
character; what you need to consider is that while this may tell you something
about movie reviewers, it doesn't necessarily tell you anything about the
quality of the movie.

------
sooham
Despite the controversy surrounding "debiasing" classifier outputs, I think
further research in this area is still of merit. This area of research would
help us understand and build transformations over latent / high level
representation space, a general use case applicable to all fields interacting
with machine learning.

------
sagarm
There's a lot of complaints here about "erasing reality" and other hyperbolic
talk, but these models are trying to make predictions within a particular
user's context.

It's just inappropriate to apply some global biases for a particular user, and
avoiding that can result in a better user experience.

~~~
haberman
I think you make an interesting point: if we use context to train a model to
reflect _local biases_ instead of _global biases_ , will that be more just
and/or lead to better user experience?

It seems related to the question of whether Google results should be tailored
to you. If I Google "did Russia interfere in the election", should Google
tailor the results so I always see articles that reinforce my world view?

If we go that route, I think we take the path of Stephen Colbert's concept of
"Truthiness", where we judge something as true because it "feels" true. Users
will definitely be happier if everything they see reinforces their existing
world view. So companies will be incentivized to accommodate this desire. But
does this actually lead to a more just society?

------
ablx_
FYI, a good talk about this from 33C3:

[https://media.ccc.de/v/33c3-8026-a_story_of_discrimination_a...](https://media.ccc.de/v/33c3-8026-a_story_of_discrimination_and_unfairness)

------
bloak
I wish people who write articles about "bias" would explain what they mean by
"bias". I've seen hundreds of these articles. I'm still waiting for a usable
definition.

------
cup-of-tea
If the models _reflect_ bias then surely that's a good thing.

It's funny. I like programming because a computer can't lie and doesn't make
mistakes. I guess some people don't like that.

~~~
UncleMeat
The goal of ML isn't to accurately model the training data. The goal is to
make something useful. Correctly doing the former can hinder the latter.

------
callesgg
The only thing that can be biased is the training data.

The model is simply a statistical breakdown of the training data.

------
nukeop
Google uses their AI systems for profiling, where the sex of the people being
profiled is a crucial piece of information and acts as a predictor for
interests, which are then further used in targeting ads. As long as it makes
Google money, it's not a problem. This bias actually accurately reflects
reality and the advertisers know that, otherwise they wouldn't be paying
Google for targeted ads.

