
Attacking discrimination with smarter machine learning - dudisbrie
http://research.google.com/bigpicture/attacking-discrimination-in-ml/
======
mvindahl
Well. At the end of the day, the companies will pick thresholds and rates to
maximize their profits, based upon the data that they have available. They
don't know every detail of our personal lives -- which would also be kind of
unsettling -- so they have to resort to a simplified picture. Simplifications
are always prejudiced at the individual level, but if the prejudice matches
the actual numbers, it's at least balanced. It may not feel perfectly just,
but it's more just than an imbalanced prejudice.

A real life example: In many (all?) countries, it's common to pay more for
your car insurance if you are young, if you are male, or if you recently
acquired your driver's license. One might frame this as a prejudice that
young, male drivers are reckless. But statistically they are just that, so
the prejudice is balanced (within a reasonable tolerance). It's unfair to the
careful young, male driver, but well, life is not always fair.

~~~
tptacek
The difference with machine learning is that the model isn't designed by
humans, through an actuarial process we can keep our brains wrapped around.
It's a black box. We are OK attributing "crash risk" to young male drivers,
because we can observe both that they are as a cohort statistically likely to
crash and also understand why that would be the case. On the other hand, we're
not comfortable with the idea that a computer program might spot some
irrelevant correlation that determines that Armenians shouldn't get car loans
--- and we're _especially_ not comfortable with the idea that we might learn
that this has happened only after years of unintended discrimination, because
there is no straightforward way to interrogate the model for every possible
bogus correlation it may have snagged on.

That's the computer science problem being worked on here.

~~~
tjic
Please don't say "we".

Not everyone shares your politics.

I'm perfectly happy with algorithms detecting that certain people are more
likely to be safe drivers than average, and giving them lower rates, and
concentrating premiums on the groups more likely to be in accidents, even if I
don't understand why Armenians (in your example) get in more crashes.

~~~
KirinDave
I've run into many people who share this sentiment, and it always surprises
me. I've never once met a person who was cheerful about experiencing
algorithm-driven prevenge. I think it's very easy to sit back and say "I
think we should let this happen!" for someone who has never knowingly
experienced loss based on this phenomenon.

A great example is how very resentful many young white men of college age are
that universities are requiring them to take sensitivity courses designed to
reduce the incidence of campus rape, but strictly speaking men of that age are
the overwhelming majority of bad actors in that environment. Statistically and
logistically speaking, it's smarter and cheaper to just require all men of
college age to take courses reminding them that rape is not okay rather than
dealing with the moral, legal and healthcare costs of the alternative.

In some cases, our relatively primitive algorithms pick up on correlations
that should not be acted on because we're actively working to correct them.
For example, it would be inappropriate to pre-reject job applications based
on skin color just because, in a given culture, people with that skin color
are less likely to have a college degree.

Even if that insight is correct, it's usually part of something that society
hopes to correct or that applicants should be given the benefit of the doubt
about, otherwise very serious negative responses will emerge.

Acting on existing categories may reinforce them; it may not. For now, we'll
have to judge on a case-by-case basis. Maybe one day, modeling techniques and
data sources will become sophisticated and robust enough to make every
decision for us. That day is not today.

~~~
tjic
> I think it's very easy to sit back and say "I think we should let this
> happen!" for someone who has never knowingly experienced loss based on this
> phenomenon.

I'm sure that I've experienced increased costs based on this.

The issue is that I don't consider that the morality of forcing my will on
other people depends at all on whether their current behavior is advantageous
or disadvantageous to me.

~~~
KirinDave
> The issue is that I don't consider that the morality of forcing my will on
> other people depends at all on whether their current behavior is
> advantageous or disadvantageous to me.

Why on earth would I believe anyone who says this? I don't think humans are
capable of such abstraction. It's not a matter of wanting to; biology itself
is at odds with this mindset.

But also, there are many markets which are _not_ effectively free markets.
Insurance is a good one, and health care is an especially good one. In these
cases, it's very dangerous to start agreeing that insurance agencies can
start to play "pass the puck" with human life.

------
Scea91
>[...] concept called equal opportunity. Here, the constraint is that of the
people who can pay back a loan, the same fraction in each group should
actually be granted a loan.

This does not seem fair to me, because if this is applied then your race
(group) would determine your credit score threshold, which itself feels
discriminatory.

I feel that, by definition, a decision is non-discriminatory only if none of
the attributes you could be discriminated against (gender, race, ...) are
taken into account at all.

Maybe the concept of 'equal opportunity' is just some compromise between
discrimination and making less informed decisions.
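
To make my objection concrete, here's a minimal sketch (synthetic scores and
made-up numbers, not anything from the article) of what the equal-opportunity
rule does: each group gets the highest threshold that still approves the same
target fraction of its actual payers, which is exactly why the credit-score
bar ends up group-dependent:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_group(mean, n=5000):
    # Synthetic credit scores; repayment probability rises with score.
    scores = rng.normal(mean, 15, n)
    repaid = rng.random(n) < 1 / (1 + np.exp(-(scores - 50) / 10))
    return scores, repaid.astype(int)

def tpr(scores, repaid, threshold):
    # Fraction of actual payers that this threshold approves.
    return np.mean(scores[repaid == 1] >= threshold)

def equal_opportunity_threshold(scores, repaid, target_tpr):
    # Highest threshold that still approves target_tpr of the payers.
    for t in np.unique(scores)[::-1]:
        if tpr(scores, repaid, t) >= target_tpr:
            return t

groups = {"blue": make_group(60), "orange": make_group(55)}
for name, (scores, repaid) in groups.items():
    t = equal_opportunity_threshold(scores, repaid, target_tpr=0.8)
    print(name, "threshold:", round(t, 1))
```

The two printed thresholds differ whenever the score distributions differ, so
group membership determines the bar you face, which is the property I find
troubling.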

~~~
tomp
I think it's useful to figure out _why_ there are discrepancies between two
groups.

For example, let's take blacks in the US. The data tells you that a black
person is more likely to be a criminal than a white person. There are two
possible reasons for this: (1) blacks are more prone to crime, or (2) blacks
are more likely to live in circumstances that make them criminals. With access
to only anecdotal data, I strongly believe that (2) is true, and that if you
took into account enough circumstances (e.g. single parent, school district,
income level, parents' wealth) you'd be able to remove race from your model
and _still_ arrive at the "equal opportunity" result. That way, you wouldn't
discriminate based on race, but you would still help the people most in need
of help (poor, uneducated, etc.). I think the same applies to college
applications and the sex wage gap, which is why I strongly oppose any kind of
affirmative action.

The life expectancy sex gap might be a different case. IIRC, men die earlier
because of some behavioural/social tendencies (working dangerous jobs,
suppressing emotions, risky behaviour (speeding, smoking)), so maybe there are
some non-discriminatory (or at least less-discriminatory) ways of determining
life expectancy (e.g. testosterone level, job description, ...). However,
there are also biological differences that seem very strongly linked to the
quality of being male (i.e. the Y chromosome). E.g. prostate cancer is less
lethal than breast cancer, and women are more likely to get MS than men. These
particular examples suggest that men should live longer, but I'm guessing
there might be other gender-specific illnesses that might reduce their (our)
expected lifespan. If that's the case, I don't think it's too discriminatory
to have different insurance fees for different sexes.

In all such scenarios, however, there could still be broader societal goals
that would override specific instances of (non-)discrimination. For example,
AFAIK women's health is more expensive (because of pregnancy), but having more
children benefits everyone in the society (in the West), so it makes sense to
"discriminate" against men by letting women pay less for their health
insurance.

~~~
corey_moncure
> The data tells you that a black person is more likely to be a criminal than
> a white person. There are two possible reasons for this:

I'd like to add a third possible reason for your consideration. Since
"criminality", i.e. guilt of committing a crime, is determined through a
process involving the law enforcement and justice systems, we have to examine whether
there are inherent biases in those systems that result in skewed statistics.
For instance, do police officers selectively target blacks for monitoring and
investigation? Are blacks discriminated against in the courtroom as a result
of procedure or human nature?

~~~
zmoreira
This is studiable and has been studied. A study regarding police killings, for
example:

 _Do White Police Officers Unfairly Target Black Suspects?_

[https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2870189&...](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2870189&download=yes)

 _Using a unique data set we link the race of police officers who kill
suspects with the race of those who are killed across the United States. We
have data on a total of 2,699 fatal police killings for the years 2013 to
2015. This is 1,333 more killings by police than is provided by the FBI data
on justifiable police homicides. When either the violent crime rate or the
demographics of a city are accounted for, we find that white police officers
are not significantly more likely to kill a black suspect. For the estimates
where we know the race of the officer who killed the suspect, the ratio of the
rate that blacks are killed by black versus white officers is large — ranging
from 3 to 5 times larger. However, because the media may underreport the
officer's race when black officers are involved, other results that account
for the fact that a disproportionate number of the unknown-race officers may
be more reliable. They indicate no statistically significant difference
between killings of black suspects by black and white officers. Our panel data
analysis that looks at killings at the police department level confirms this.
These findings are inconsistent with taste-based racial discrimination against
blacks by white police officers. Our estimates examining the killings of white
and Hispanic suspects found no differences with respect to the races of police
officers. If the police are engaged in discrimination, such discriminatory
behavior should also be more difficult when body or other cameras are
recording their actions. We find no evidence that body cameras affect either
the number of police killings or the racial composition of those killings._

~~~
sangnoir
> This is studiable and has been studied

True, but there has been more than one paper written on the subject, and they
don't all agree with the one you linked vis-à-vis over-representation in
crime.

 _The black/white marijuana arrest gap, in nine charts_

[https://www.washingtonpost.com/news/wonk/wp/2013/06/04/the-b...](https://www.washingtonpost.com/news/wonk/wp/2013/06/04/the-blackwhite-marijuana-arrest-gap-in-nine-charts/)

 _As you're probably aware, black Americans are arrested for marijuana
possession far more frequently than whites. You may also know that there's not
much evidence that black people consume marijuana with greater regularity than
whites do.

...And this is a uniform phenomenon. It's not that some states treat the races
equally and others treat them really unequally. Only in Hawaii are the rates
even close to equal, and that's biased by the fact that blacks make up only
1.6 percent of the population. In the state with the second-lowest disparity,
Alaska, blacks are 1.6 times more likely to be arrested. In the state with the
biggest, Iowa, blacks are 8.34 times more likely to be arrested. D.C. has the
second biggest; in the District, blacks are 8.05 times more likely to be
arrested._

~~~
weberc2
Do the findings control for which group uses more in public? My experience
living in Chicago and Waterloo, Iowa (the "blackest" city in
Iowa--[http://www.iowadatacenter.org/Publications/aaprofile2016.pdf](http://www.iowadatacenter.org/Publications/aaprofile2016.pdf))
suggests that a difference in public use alone could explain the disparity in
arrest rates. It's also quite possible that a cultural difference in drug
abuse could skew the results.

------
Dowwie
For more information about "ethics and algorithms", read: "Weapons of Math
Destruction" [1] or at least listen to the EconTalk podcast between the author
and host Russ Roberts [2]

[1] [https://www.amazon.com/Weapons-Math-Destruction-Increases-In...](https://www.amazon.com/Weapons-Math-Destruction-Increases-Inequality/dp/0553418815)

[2]
[http://www.econtalk.org/archives/2016/10/cathy_oneil_on_1.ht...](http://www.econtalk.org/archives/2016/10/cathy_oneil_on_1.html)

~~~
icebraining
Previous discussion on the podcast episode:
[https://news.ycombinator.com/item?id=12642432](https://news.ycombinator.com/item?id=12642432)

------
denzil_correa
> Restricting to equal opportunity thresholds transfers the "burden of
> uncertainty" away from these groups and onto the creators of the scoring
> system. Doing so provides an incentive to invest in better classifiers.

Personally, I find this to be the key outcome here. The accountability rests
on the people and systems that make the decision, and that creates an
appropriate incentive. A win-win as a start.

------
sergioisidoro
I've played around with this concept, trying to replicate some previous work
[1].

It's a sensitive topic, because sometimes we're actually tampering with the
data, trying to eliminate known human or selection biases.

The first defence against discrimination is, in my honest opinion, for
everyone working with data to be aware of these problems. To know that,
besides ROC curves, precision, and recall, we should measure the impact of
our models on sensitive demographics (gender, race, nationality, sexuality).

And one of the things that I learned (in [1]) is that, even if you're careful
with the features you use, you might still have a negative effect.

[1]
[https://github.com/sergioisidoro/aequalis](https://github.com/sergioisidoro/aequalis)
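
As a concrete illustration of the kind of measurement I mean, here is a
minimal sketch (hypothetical column names, not code from [1]): alongside the
aggregate metrics, report the error rates of the model's decisions for each
sensitive group.

```python
import pandas as pd

def per_group_rates(df, group_col, label_col, pred_col):
    # For each demographic group, report the true-positive and
    # false-positive rates of the model's binary decisions.
    rows = []
    for group, g in df.groupby(group_col):
        rows.append({
            "group": group,
            "tpr": g.loc[g[label_col] == 1, pred_col].mean(),
            "fpr": g.loc[g[label_col] == 0, pred_col].mean(),
            "n": len(g),
        })
    return pd.DataFrame(rows)

# Hypothetical usage, assuming 0/1 labels and predictions:
# df = pd.read_csv("decisions.csv")
# print(per_group_rates(df, "gender", "repaid", "approved"))
```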

~~~
denzil_correa
> And one of the things that I learned (in [1]) is that, even if you're
> careful with the features you use, you might still have a negative effect.

One needs to understand how these features interplay with each other. For
example, you may not directly use a protected-class feature (race) to make
your prediction, but you might end up relying on a secondary or tertiary
variable (like location) that effectively encodes the protected class due to
statistical correlations.
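
To make that concrete, here's a toy sketch (entirely synthetic data): drop
race from the feature set, and a model can still recover it from a correlated
location proxy.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
race = rng.integers(0, 2, n)                 # protected attribute, never a feature
zipcode = race * 5 + rng.integers(0, 8, n)   # location correlates with race

# Train a model to predict the protected attribute from the proxy alone.
clf = LogisticRegression().fit(zipcode.reshape(-1, 1), race)
acc = clf.score(zipcode.reshape(-1, 1), race)
print(f"race recoverable from zip code alone: {acc:.2f}")  # well above the 0.5 chance level
```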

------
cs2818
What strikes me as interesting in this example is that it seems to assume the
classifier is run once and that its decisions have no effect on future
decisions.

I would imagine that if we are separating people into groups based on
demographic or social factors to make decisions, then those decisions may have
an impact on that entire group. (In this example, maybe granting more loans to
the blue group alters the group characteristics and leads to higher profit in
the long term despite higher immediate risks).

Is there an area of ML research that considers this kind of concept?

~~~
zitterbewegung
I don't think there is, though it may fall under the ethical considerations
of AI or the economics of AI. It would be hard to do a study, since AI has
really only been around for less than 50 years.

------
amelius
But how can we find out whether a company uses ML in a non-discriminating
way? If we cannot see it and check it, then there is no incentive for
companies to use it.

My guess is that at most companies, spending time on making an algorithm non-
discriminating will be viewed as a waste of time and money.

~~~
riskable
I work for a big bank... As long as people freak out when banks are found to
be discriminating they will do their best to _not_ discriminate.

Banks are built on trustworthiness. Having your bank's name in the headlines
for discriminatory practices can have a _severe_ negative impact on that
trustworthiness. Banks have whole teams of people devoted to this topic, and
at most banks every employee has to learn about "reputational risk" every
year.

Having worked at big banks for over a decade now I am 90% certain that
discrimination by banks at this point is primarily due to carelessness.

~~~
tptacek
That is a story that banks like to believe about themselves, but it's worth
considering that emergent discrimination (and the broader bucket of emergent
malfeasance) is a property of all complex systems, not just machine learning.
So, for instance, Wells Fargo needed to do more than just hope that its
trustworthiness would survive its incentive systems.

~~~
riskable
Not taking steps to prevent emergent discrimination or malfeasance _is_
careless. Do you think the board of directors wants to find out that the CEO
intentionally ignored calls to the ethics hotline? That he didn't take steps
to ensure whistleblowers weren't punished?

The consequences of these things are _very_ bad for stock prices! This is
doubly true for banks which take reputational risk far more seriously than
most businesses.

~~~
GFK_of_xmaspast
Wells Fargo stock appears to be back up to where it was a few months ago.

------
d--b
Nice to see that the debate has reached the ears of the main people working
in this field.

What is important to note here is that we need to tweak the mathematical
model to the culture we want to achieve. In other words, the objective
function of the optimization problem must not only match the current state of
the world and serve one's own economic interests; it also needs to take into
account the culture that we want to reflect, and influence. Otherwise, these
statistics are just a giant status quo amplifier.
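
One crude way to read "tweak the objective function", as a hedged sketch in
code (this particular penalty pushes toward demographic parity; it's an
illustration of the idea, not what the article itself proposes):

```python
import numpy as np

def fairness_penalized_loss(pred, y, group, lam=1.0):
    # Ordinary squared error plus a penalty on the gap between the
    # groups' average predictions. lam is the explicit knob for how
    # much weight the "culture we want" gets relative to fitting
    # the (biased) historical data.
    fit = np.mean((pred - y) ** 2)
    gap = abs(pred[group == 0].mean() - pred[group == 1].mean())
    return fit + lam * gap
```

Optimizing something like this trades accuracy on the status quo against the
disparity term, which is exactly the tweak being argued for.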

~~~
icebraining
I'm not sure that's clear. There are actually two ways to achieve the outcome
we want: tweaking the model or _changing the inputs_.

What I mean is: say the model identifies that a certain group has a greater
risk due to systemic problems. If you change something about the group, you
can change the calculated risk without changing the model. And this may very
well be a better way to achieve the outcome you want.

Specifically, by preventing insurance companies from using a more accurate
model, what you're demanding is that the random people who happen to have
taken the same insurance packages but are not part of the group should make
an extra contribution to fix these systemic problems.

But why them? Shouldn't we all contribute instead, ideally using a fair
system for assessing how much each should pay?

Instead of tweaking the model, you can change the inputs by providing a
state-backed guarantee to the underprivileged groups. Isn't that more fair
overall?

~~~
d--b
No, I don't think I expressed myself clearly. The problem is that machine
learning learns from biased data. The input data is the total US population,
in which some groups are wealthier than others. Because of the way we train
machine learning models (using all the data we have), the bias in the
original data gets transferred to the learned logic.

Specifically, a machine-learned model will produce different numbers for two
individuals with the exact same characteristics except their race. And that
is the problem that needs to be addressed.

Let's put it in another context. Let's say I'm a white athlete, and I'm very
good at running the 100m. Actually, I run just as fast as a black person who
is my main rival. Now if someone has to select one of us to go to the
Olympics, they should toss a coin to decide who goes. If you use an ML
algorithm, it would absolutely send the black person, because no white person
has won a 100m race in the last 20 Olympics. That's the kind of bias ML
produces and that needs to be addressed.
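
A sketch of how one could test for that specific failure, assuming an
sklearn-style fitted model and a hypothetical binary `race` column: flip the
protected attribute for each individual and measure how far the scores move.

```python
import numpy as np
import pandas as pd

def counterfactual_gap(model, X, protected_col):
    # Score everyone, flip only the protected attribute (assumed 0/1),
    # and score again: any gap is attributable to that attribute alone.
    flipped = X.copy()
    flipped[protected_col] = 1 - flipped[protected_col]
    gap = model.predict_proba(X)[:, 1] - model.predict_proba(flipped)[:, 1]
    return pd.Series(np.abs(gap)).describe()

# Hypothetical usage with a fitted sklearn-style classifier:
# print(counterfactual_gap(clf, X_test, "race"))
```

Note this only catches direct dependence on the column; proxy leakage through
correlated features, discussed elsewhere in this thread, would pass this
check.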

------
beejiu
Presumably it means unlawful or unethical discrimination, since classification
without discrimination doesn't make sense.

~~~
tptacek
It's referring to the much-talked-about effect of emergent discrimination,
where the model fitting process has the effect of amplifying the status quo,
despite the fact that the status quo is informed in large part by structural
injustices. For (oversimplified) instance: poor black people represent a
cohort of loan applicants likely to default, and the model fitting process may
go a step further and attribute "default risk" to all black people.

The key thing to understand is that we're talking about discrimination that is
usually unintended and unexpected by the designers of these systems.

~~~
tjic
> despite the fact that the status quo is informed in large part by structural
> injustices.

This is presented as if it's an unambiguous fact, when it's largely a
political stance.

~~~
dragonwriter
It's an unambiguous fact that the status quo is shaped heavily by many
generations of _de jure_ discrimination, including chattel slavery, and by
continuing structural inequalities in political power, which still exist and
which were designed to protect those other unequal institutions.

It's a subjective political view that any or all of those things are
injustices, of course, since justice is a subjective thing.

~~~
tptacek
This happens to be something I believe strongly, but the comment you're
responding to doesn't make sense even if you strongly disagree with that, as
it supposes we can snapshot American society as it is in 2016 and synthesize
from it reliably just and sensible decisions. Nobody believes this, no matter
what their politics (unless there's a "status-quo-ism" I'm unaware of).

The problem (or at least, one of the more important problems) being addressed
in this work is the unintended amplification of the status quo --- the
implicit notion that if something is a certain way now, it is best that it
always be that way.

------
pidge
If membership in a discrimination-protected category is not completely binary,
equal-opportunity policies incentivize people who could identify with multiple
categories to choose the one with the lowest discrimination-adjusted
threshold.

------
zimzim
Is it possible to make the equal opportunity and the max profit thresholds
the same? Maybe with a special tax system (the closer your threshold is to
the "equal opportunity" one, the less tax you pay)? Edit: I forgot it's
illegal to discriminate.

------
lifeisstillgood
Sorry, just checking: young _black_ men are statistically less likely to
rape than equivalent young white men?

That is surprising. And given this discussion about only wanting stats that
have a valid explanation as well as fitting the facts, it is interesting.

~~~
sctb
We would ask that you please not introduce this into the discussion out of
nowhere; it's just off-topic. See also:
[https://news.ycombinator.com/item?id=13007847](https://news.ycombinator.com/item?id=13007847).

We detached this subthread from
[https://news.ycombinator.com/item?id=13006361](https://news.ycombinator.com/item?id=13006361).

~~~
lifeisstillgood
I did not introduce it out of nowhere; it was a reference to the parent
comment, which did seem relevant to the thread.

It is a sensitive subject, I suspect.

And I cannot find the original parent comment in the discussion, so I look
like a total dick with my comment coming out of nowhere.

I will try and understand what moved where and get back to you.

Aha:

"""A great example is how very resentful many young white men of college age
are that universities are requiring them to take sensitivity courses designed
to reduce the instance of campus rape, but strictly speaking men of that age
are the overwhelming majority of bad actors in that environment. Statist"""

It is only my memory, but I am fairly sure that the above comment said
"strictly speaking young white men ...", which is what prompted my comment.
Maybe I transposed the "white" in the first part of the paragraph to the
final part. Not sure. Please do check, and let me know if I was being a fool
or not.

Darn I need to md5 hash parent comments I reply to now :-)

------
weberc2
I agree that this should be a possibility. While I think the odds that
millions of law enforcement officers and criminal justice personnel would
spontaneously conspire against a particular race are low, I also think it's
more likely than some genetic predisposition toward crime (i.e., the first
option).

~~~
Retric
Now, this is his first offence, but a young man is found high on PCB walking
down the street hitting cars with a baseball bat. It takes five cops and a
significant struggle to arrest him.

Having read that, what would you assume his race and economic background are?

After criminal proceedings he was sentenced to community service.

Now, what would you assume his race and economic background are?

PS: Bias is insidious and really hard to control for.

~~~
striking
I'd assume he's a poor white man, from the very beginning.

White, because most people in the US are white.

~~~
Retric
Interesting that you said poor; at the time he had over 6 million in a trust
fund and stood to inherit vastly more. Some people change their minds because
of the community service vs. prison detail, but some don't.

PS: PCP has a reputation as a poor rural white person's drug. I wonder if you
would have had a different impression if I had started by saying he got
community service.

~~~
striking
Look, if we're talking about assumptions, we're talking about probability. I
have no idea who you're talking about. I'll happen upon some statistics once
in a while, but otherwise I avoid making assumptions because I simply _don't
know._ You asked me for what I'd say was most statistically probable; you got
it.

I assume you mean "PCP" and not what circuit boards are printed on, yeah.
You're right, I've never heard of a rich person taking PCP (cocaine, fancy
liquor, etc. for them, right?) although I'm absolutely unsure of how PCP use
breaks down racially. (If you have any source for those statistics, you've
piqued my interest in them and I'd love a link)

Again, my guess of "white" was based on the fact that most people are white,
and thus most cases of violence (without further stratification) are probably
by whites. And what, do African-Americans not get community service? That
probably wouldn't have changed my mind, and actually sounds a little bit
racist.

~~~
Retric
No, people assume poor people are less likely to get community service after
assaulting police officers.

Further, you pulled "poor" from thin air, so you did make an assumption
without evidence.

~~~
striking
Poor people are more likely to commit street crime.

> Findings on social class differences in crime are less clear than they are
> for gender or age differences. Arrests statistics and much research indicate
> that poor people are much more likely than wealthier people to commit street
> crime.

> [...] most criminologists would probably agree that social class differences
> in criminal offending are “unmistakable” (Harris & Shaw, 2000, p. 138).

[http://catalog.flatworldknowledge.com/bookhub/reader/3064?e=...](http://catalog.flatworldknowledge.com/bookhub/reader/3064?e=barkansoc_1.0-ch08_s03)

And again, as you admitted, PCP has the connotation of being used by poor
people. I can't find stats on it right now, having just run out of time, but
you even admitted it directly.

I didn't pull it out of thin air.

~~~
Retric
As to connotations, that's faulty reasoning. The only correct response to
any of my questions was _not enough information._ Yet you were more than
happy to both pick a response and then justify it, even when I directly said
_you were wrong._

Now, the same identical reasoning happens all over the place in the criminal
justice system, making the link between crime statistics and actual crimes
almost meaningless. Arrest statistics are just that, arrest statistics; they
don't tell you who was actually committing crimes.

PS: I suspect you feel very confident about his race, except I never
confirmed or denied your assumption.

Edit: TL;DR: Making a judgement with limited evidence is a really bad habit.
People soon forget they chose something because it had slightly better odds,
and they reinforce the judgement, so something that may have been even odds
to start with often feels much more likely over time.

------
alvarosm
This is how it should be: "Max Profit. The most profitable, since there are no
constraints. But the two groups have different thresholds, meaning they are
held to different standards."

Here's the big fallacy: "the two groups have different thresholds, meaning
they are held to different standards." _They are not held to different
standards because they're different groups, but because of other factors
that indicate different loan default rates._ So you cannot call this
"discrimination". This is how things should be.

~~~
alvarosm
Any SJW care to explain why I'm wrong instead of downvoting? thanks! this is a
lot of fun :D

~~~
dang
Since you've ignored our repeated requests to stop breaking the HN guidelines,
we've banned your account.

If you don't want to be banned, you're welcome to email hn@ycombinator.com.
We're happy to unban people if they give us reason to believe they'll only
post civil, substantive comments in the future.

