
Ethics in Machine Learning: Interview with Dr. Hanie Sedghi, Google Brain - benbreen
https://medium.com/@RoyaPak/ethics-in-machine-learning-54a71a75875c
======
ASpring
A lot of what Sedghi is saying about definitions of fairness echoes my own
research on how different academics define algorithmic bias. There is a
large class of academics who would say that algorithmic bias is simply a data
problem, but I believe that if we ignore the societal element, we don't
properly account for many examples of algorithmic bias.

I write a bit more about how to define algorithmic bias here:
[http://aaronlspringer.com/algobias-overview/](http://aaronlspringer.com/algobias-overview/)

------
poster123
From the article: "What this means is that, we calibrate classifiers
parameters such that it has the same acceptance ratio for all subgroups of
sensitive features, e.g. race, sex, etc."

But statistics such as crime rates and default rates are NOT the same across
race and sex. Is she saying that models should be fiddled with to make the
same average prediction by race, sex, and other demographic variables?

~~~
ABCLAW
No, she's saying that if the input data is itself skewed, then one method of
fixing the model you generate is to apply corrections to the calibrating
parameters.

Imagine we have a 50/50 population of purple and orange people, and we know
that purple offenders are caught twice as often for a certain crime. If 10
purple offenders and 5 orange offenders are caught, the underlying offense
rates are actually equal. So if we train on this data and our model predicts
that we're getting substantially more purple offenders, we know our model is
wrong: it isn't taking into account the skew due to differential enforcement.

How do you fix this? She gives three methods: 1) change the parameters to
normalize, 2) resample and get better data, 3) use causal reasoning and other
data to plug the holes. A sketch of 1) follows.
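
To make method 1 concrete, here is a minimal sketch of the threshold-calibration idea; the scores, group names, and rates are entirely hypothetical, not from the article. It picks a separate score cutoff per group so each group is flagged at the same rate:

```python
import numpy as np

def calibrate_thresholds(scores, groups, target_rate):
    """Choose a per-group threshold so every group is flagged
    at (approximately) the same target rate."""
    thresholds = {}
    for g in np.unique(groups):
        group_scores = scores[groups == g]
        # The (1 - target_rate) quantile flags ~target_rate of the group.
        thresholds[g] = np.quantile(group_scores, 1 - target_rate)
    return thresholds

# Hypothetical risk scores: purple people score higher only because the
# training data reflects the enforcement skew, not higher underlying risk.
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(0.6, 0.1, 500),   # purple
                         rng.normal(0.4, 0.1, 500)])  # orange
groups = np.array(["purple"] * 500 + ["orange"] * 500)

for g, t in calibrate_thresholds(scores, groups, target_rate=0.2).items():
    flagged = (scores[groups == g] >= t).mean()
    print(f"{g}: threshold={t:.3f}, flagged rate={flagged:.2f}")
```

A single shared threshold would flag purple people far more often; the per-group cutoff is one way to get the "same acceptance ratio for all subgroups" quoted above.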

~~~
bobcostas55
The data[0] shows that the race distribution of the assailant in crime reports
is about the same as the race distribution of arrestees. There's no support
for the differential enforcement hypothesis.

[0]
[https://www.ncjrs.gov/App/Publications/abstract.aspx?ID=2560...](https://www.ncjrs.gov/App/Publications/abstract.aspx?ID=256035)

~~~
ucaetano
> The data[0] shows that the race distribution of the assailant in crime
> reports is about the same as the race distribution of arrestees. There's no
> support for the differential enforcement hypothesis.

Unless the reports are also influenced by race, e.g. a police officer not
stopping a white person who committed a crime because he/she is white, while
over-scrutinizing non-white people.

~~~
tschwimmer
Incidentally "Driving While Black" is a fairly well-known phenomenon.[1]

[1]
[https://en.wikipedia.org/wiki/Driving_while_black](https://en.wikipedia.org/wiki/Driving_while_black)

~~~
ucaetano
Exactly. Let's say cops stop 10% of blacks for "looking suspicious" (aka,
being black), but only 1% of whites.

Now say that 10% of both whites and blacks are criminals, that criminals are
always identified if stopped (and eventually arrested and sentenced), and
that the population is evenly split between blacks and whites.

(1) Criminal prevalence: 50% B, 50% W

(2) Police stops: 91% B, 9% W

(3) Criminals caught (reports): 91% B, 9% W

(4) Incarceration: 91% B, 9% W

Now remember that only (3) and (4) are available to the public, so one might
look at those numbers and conclude that the problem isn't that blacks are
incarcerated differently, but that blacks commit more crimes, when they don't.
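
Spelling the arithmetic out per 1,000 people of each group (same assumptions as above):

```python
# Equal true criminality, unequal stop rates, every stopped criminal caught.
pop = 1000                            # per group
stop_rate = {"B": 0.10, "W": 0.01}
criminal_rate = 0.10                  # identical for both groups

caught = {g: pop * stop_rate[g] * criminal_rate for g in ("B", "W")}
print(caught)                              # {'B': 10.0, 'W': 1.0}
print(caught["B"] / sum(caught.values()))  # 10/11 ≈ 0.909
```

The 91/9 split in reports and incarceration falls straight out of the stop rates, even though true prevalence is 50/50.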

------
roenxi
If the techniques in use put emphasis on correlation over causation then the
results will be unfair.

But once you put causation as the primary driver, is machine learning still
powerful? My understanding of a causative model is that you start with a model
that includes information sourced from real-life observation, then fit
parameters from the data. That doesn't sound like machine learning to me.

Isn't the point of the various machine learning methods to find obscure
correlations? They will be really effective at picking up proxies for race
and gender if race and gender are correlated with a statistic. The 'machine
learning' part is irrelevant; the question here is an age-old one about
using data to make decisions.

~~~
andrewprock
I think you are overstating the purpose of machine learning quite a bit.

What machine learning does is function fitting, no more, no less. Whether this
is causal, correlative, obvious, or obscure is irrelevant to the algorithm.

All it does is try to find the parameters of a model function that provide
the best predictive power.
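
A toy illustration of that point, with synthetic data: the fitting loop below adjusts parameters to reduce prediction error, and nothing in it knows or cares whether the relationship between x and y is causal.

```python
import numpy as np

# Fit y ≈ w*x + b by minimizing mean squared error with gradient descent.
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 200)
y = 3.0 * x + 0.5 + rng.normal(0, 0.1, 200)  # data from some unknown process

w, b, lr = 0.0, 0.0, 0.5
for _ in range(2000):
    err = (w * x + b) - y
    w -= lr * 2 * (err * x).mean()   # dMSE/dw
    b -= lr * 2 * err.mean()         # dMSE/db

print(w, b)  # ≈ 3.0, 0.5: good predictions, zero causal insight
```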

~~~
ASpring
There is an argument to be made, though, that we should be looking for causal
connections rather than correlative ones when we think about what fairness
looks like. Unfortunately, even with the many recent advances in causal
network detection, we're not quite at a point where I would trust causal
modeling for this.

~~~
PeterisP
Causal factors don't seem to be a road to avoiding discrimination.

If we're looking at causal factors for the classic examples of potentially
discriminatory classifiers, e.g. loan default risk and reoffending risk, then
no matter how you slice it, the important causal factors aren't only objective
measurements and things under your control, but also influences, upbringing,
and cultural values. They're not _the_ cause, and likely not even the majority
of the cause, but they're certainly a non-zero causal factor.

Having "bad" friends is not only a correlation, but a causal factor that
affects these things - we're social animals, and our norms are affected by
those around us. Would we consider fair to discriminate people in these
ratings because of the friends they have? Do we want to ostracize e.g. ex-
convicts by penalizing people who associate with them (so motivating them to
choose not to associate), _even if_ there's a true causal connection of that
association increasing some risk?

Abuse of alcohol and certain drugs during pregnancy is not only a correlation
but a causal factor for these things (the mechanism, IIRC, was a decrease in
risk avoidance and intelligence). Would we consider it fair to discriminate
against people in these ratings because of what their mothers did?

Etc., etc.; I have a bunch more in mind. _And_ on top of that, many of these
things will (in the USA) be highly correlated with race for various historical
and socioeconomic reasons, so taking them into account would still harm some
races more than others. It seems it just might be in everyone's interest to
avoid that huge can of worms.

------
fnl
A very relevant "must see" if this stuff interests you: Kate Crawford's NIPS
2017 keynote on bias in machine learning
[https://nips.cc/Conferences/2017/Schedule?showEvent=8742](https://nips.cc/Conferences/2017/Schedule?showEvent=8742)

------
krautt
It's pretty insane that we live in a time where we're on the brink of a
technology that could surpass electricity, and we're still selectively
acknowledging reality only when it conforms to our own biases.

------
VanillaCafe
Step 1? Maybe if you don't want your model predicting crime based on race,
sex, or other sensitive features, you shouldn't train your model on race, sex,
or other sensitive features.

~~~
xyzzyz
This approach doesn't work. If sensitive features are actually predictive
(and they very often are), the model will just learn to predict these features
from the non-sensitive ones. For example, if you know that someone is 30 years
old, lives in the Bay Area, and earns less than $30k/year, and you bet that
this person is neither white nor Asian, you are more likely to win the bet.
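
A synthetic sketch of that "redundant encoding" effect (all numbers invented): even when the sensitive attribute is excluded from training, correlated non-sensitive features can reconstruct it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 5000
sensitive = rng.integers(0, 2, n)    # the attribute we refuse to train on
# Non-sensitive features that happen to correlate with it
# (think zip code, income band, age):
zip_proxy = sensitive + rng.normal(0, 0.5, n)
income_proxy = -sensitive + rng.normal(0, 0.5, n)
X = np.column_stack([zip_proxy, income_proxy])

clf = LogisticRegression().fit(X, sensitive)
print("accuracy recovering the excluded attribute:", clf.score(X, sensitive))
# Well above the 0.5 chance level: any model trained on X effectively
# still has access to the sensitive attribute.
```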

~~~
mathperson
I don't like this example; they could be a graduate student.

~~~
xyzzyz
Of course. They could be many other things as well. The question is, what
would you bet on, if you didn't want to lose money?

------
shadowtree
Tangent - on the other side of the spectrum is r/deepfakes. Unbelievable how
video clips can be altered with ML, destroying the last bit of credibility
for _any_ video.

And no ethics.

