
Bias detectives: the researchers striving to make algorithms fair - onuralp
https://www.nature.com/articles/d41586-018-05469-3
======
local_yokel
It's worth pointing out that the original ProPublica investigation was
conducted by journalists unskilled in statistics and machine learning. There
was a convincing rebuttal posted by the actual scientists involved, which was
of course ignored, since "racist AI" is the kind of headline that's just too
golden to abandon.

[http://www.uscourts.gov/sites/default/files/80_2_6_0.pdf](http://www.uscourts.gov/sites/default/files/80_2_6_0.pdf)

~~~
Bartweiss
ProPublica's work on algorithmic bias has all seemed well below their usual
standards. I haven't followed this rebuttal, but their work showing racism in
car insurance pricing was heavily criticized, and while the authors defended
the work it looked to me like they picked only the weakest criticisms to
respond to.

(In the insurance case, ProPublica attempted to compare areas with comparable
crash frequencies and show that rates were higher in poor and minority areas.
But the data they had was moving accidents, especially with injuries, and the
data they _didn't_ have was stuff like "rate of car break-ins" and "odds of
being hit by an uninsured driver". Which you would obviously expect to vary by
region even when serious-injury accidents don't.)

------
kyleperik
The purpose of machine learning is to generalize on a large scale. I like to
think of it as the equivalent of someone with years of experience in a
particular area: they've seen so much that they can size up a situation in an
instant from clues and generalizations. It wouldn't be fast if it weren't
generalizing.

If you want to claim you know what fair is in any given situation, then go and
hardcode all your own fair rules, because you aren't going to find "fairness"
in machine learning.

~~~
throwawayjava
_> If you want to claim you know what fair is in any given situation, then go
and hardcode all your own fair rules_

This strikes me as eerily similar to the argument that type systems are
impossible because of the halting problem. It's sort of true in some sense,
but not in an even remotely useful way. So it mostly functions as a way of
derailing the conversation away from the more subtle distinctions that do
matter (e.g., could we design an easy-to-use type system that rules out this
_particular_ type of non-termination or other class of bugs?).

There's a large middle-ground between "hard code all your own rules" and
"completely unconstrained learning". Learning under constraints is not a new
idea.
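
To make "learning under constraints" concrete, here's a minimal sketch,
assuming the common approach of adding a fairness penalty to an ordinary
loss (the demographic-parity penalty and all names here are illustrative,
not a reference implementation):

```python
import numpy as np

def fairness_penalized_loss(w, X, y, group, lam=1.0):
    """Logistic loss plus an (assumed) demographic-parity penalty.

    `group` is a 0/1 array marking a protected attribute; the penalty is
    the squared gap between the two groups' mean predicted scores.
    """
    p = 1.0 / (1.0 + np.exp(-X @ w))  # predicted probabilities
    eps = 1e-12
    log_loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    gap = p[group == 1].mean() - p[group == 0].mean()
    return log_loss + lam * gap ** 2  # lam trades accuracy against parity
```

Tuning `lam` moves you along exactly that middle ground: `lam = 0` is
unconstrained learning, and a large `lam` forces the learned model toward
equal treatment under that one formal notion.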

A classical programming analogy to your argument might be "well the halting
problem is undecidable so ignore all this high-level language stuff and just
go code up your own turing machine; it's the best you'll ever be able to do".

 _> because you aren't going to find "fairness" in machine learning._

Why not? The human notion of fairness is fuzzy, which is why researchers have
proposed various formal notions of fairness for machine-learning tasks.
Obviously, these formal definitions may or may not correspond to your own gut
instinct about what is "fair". And there might be friction between different
notions of fairness. None of that should be surprising; otherwise, fairness
wouldn't be something philosophers continue to spill ink over.
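
For what it's worth, two of the standard formal notions (standard in this
literature, though not spelled out in the article) can be stated precisely
for a binary predictor Y-hat, true outcome Y, and protected attribute A:

```latex
% Demographic parity: equal positive-prediction rates across groups.
\Pr(\hat{Y} = 1 \mid A = 0) = \Pr(\hat{Y} = 1 \mid A = 1)

% Equalized odds: equal error rates across groups, conditional on the outcome.
\Pr(\hat{Y} = 1 \mid A = 0, Y = y) = \Pr(\hat{Y} = 1 \mid A = 1, Y = y)
\quad \text{for } y \in \{0, 1\}
```

The friction mentioned above is real: a classifier can satisfy one of these
while badly violating the other.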

But it is equally obvious that for some notions of fairness, there will exist
machine learning algorithms that learn well under the given constraint.

~~~
kyleperik
I agree with you that fairness is fuzzy. But that implies there is no set of
rules that defines it, which also means there is no way to train on it. By
choosing the right features and utilizing them in the right way, I believe you
can avoid putting people in bad situations for bad reasons. What I'm saying
is: if you train an algorithm to guess whether someone is involved with crime,
it's going to be incredibly stereotypical with its answers. Same as if you
rely completely on statistics.

I'm not actually saying hardcode all your rules. I was making the point that
if you really knew exactly what fairness was, you could program it. But we
both agree there are no definite rules, so why hardcode at all?

I think it is never the ML that is "unfair"; the person who made it is
responsible. If you find yourself running into fairness issues in your ML, I
think you're just using it wrong.

 _EDIT: Rewording last sentence_

------
Sol-
When I was reading through some of the algorithmic-fairness literature a while
ago, I came away a bit frustrated because, as the article mentions, the
fairness definitions are mutually incompatible (though some seem more
plausible than others), and it's not really a problem that can be fully solved
on a technical level. The only flicker of hope was that a perfect classifier
can, by some definitions, be considered fair, so at least you have something
to work with: if your classifier discriminates by gender or other attributes,
you should at least make it good enough to back up its bias with perfect
accuracy (at which point you can investigate why inherent differences between
groups seem to exist).

It's good that some Computer Science researchers are ready to work in such
politicized fields though, it's definitely necessary. I find it admirable
because I personally wouldn't enjoy those discussions.

~~~
LoSboccacc
If unconstrained learning emits biased results, it was given biased samples.
That, or the bias is in the dataset itself, which could still be fixed by
removing and randomizing traits. But at that point your algorithm is learning
a representation of reality whose usefulness depends on the realm of
application: great for university admittance, say, and not so great for
medical-insurance purposes.

~~~
tomjen3
You assume that the underlying data cannot be correct and yet unfair.

~~~
LoSboccacc
uh, no?

> [then] it was given biased samples

------
kgwgk
In summary: “You can’t have it all. If you want to be fair in one way, you
might necessarily be unfair in another definition that also sounds
reasonable.”
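
That trade-off has a precise form. If I recall the fairness-impossibility
results correctly (Chouldechova's, in particular), for a binary predictor the
group-level error rates are tied together by the group's base rate p:

```latex
\mathrm{FPR}
  = \frac{p}{1-p}\cdot\frac{1-\mathrm{PPV}}{\mathrm{PPV}}
    \cdot\left(1-\mathrm{FNR}\right)
```

So if two groups have different base rates and the predictor is calibrated
(equal PPV across groups), their false-positive and false-negative rates
cannot both be equal unless prediction is perfect. Calibration and error-rate
balance are two definitions that each sound reasonable and yet conflict.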

~~~
mlthoughts2018
It reminds me of Arrow’s Impossibility Theorem [0].

David Deutsch had an anecdote in his book _The Beginning of Infinity_ about
explaining Arrow’s theorem to a US congressman and eventually getting him to
the point of understanding how it applies to the electoral college _in
principle_, and that there is no simple legislative change that could get
around it (I think the discussion was about how preferential voting would
improve on FPTP voting).

The congressman replied something like “this is lamentable” and Deutsch wrote
a big passage about whether it makes sense that anyone should ever find a
_mathematical fact_ to be “lamentable.”

I do think there will be pragmatic varieties of this sort of impossibility
theorem for machine learning fairness and that society in a broad sense will
have a hard time processing it, and the possible legislative reactions might
be totally unreasonable, even unintentionally harmful.

[0]: [https://en.m.wikipedia.org/wiki/Arrow's_impossibility_theorem](https://en.m.wikipedia.org/wiki/Arrow's_impossibility_theorem)

------
aldanor
Note that there's not only a potential selection-bias problem, but a feedback
issue as well. For instance, if the algorithm is biased toward assigning
higher criminal-activity risk to black people, black people will be more
likely to be checked, and as a consequence future versions of such algorithms
will be even more biased in the same direction. Debiasing in such situations
is a very tough endeavour.
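
A toy simulation of that feedback loop, with made-up numbers: two groups with
identical true rates, one initially over-checked, and a naive per-capita
"risk" estimate driving the next round's checks:

```python
import numpy as np

rng = np.random.default_rng(42)
pop = np.array([10_000, 10_000])     # two equally sized groups
true_rate = np.array([0.10, 0.10])   # identical true offense rates
checks = np.array([1_000, 2_000])    # group 1 starts out checked twice as often

for round_ in range(5):
    recorded = rng.binomial(checks, true_rate)  # incidents only seen if checked
    est_risk = recorded / pop                   # naive estimate ignores check counts
    # Next round's checks are allocated in proportion to estimated risk:
    checks = np.round(3_000 * est_risk / est_risk.sum()).astype(int)
    print(round_, checks, est_risk)
```

The recorded gap never closes even though the true rates are identical, and
any allocation rule more aggressive than proportional widens it each round.
Normalizing by check counts would fix this toy, but in real deployments the
sampling process is rarely recorded, which is part of what makes debiasing so
tough.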

~~~
marcoperaza
A lot hinges on how you define fairness. Is it unfair if a higher percentage
of <racial group> is flagged as “criminal activity risks”, to use your
example, than of <other racial group>? What if it aligns with reality?

No matter how accurate your metric is—nay, _because_ of its accuracy—people
will be mad at you because it reflects an underlying reality they are in
denial of or believe is itself unfair, and want to rectify by requiring
fictions elsewhere.

I don’t envy people working on this because you by definition can’t win. The
powerful political forces on the one hand, and the demands for accuracy and
effectiveness on the other, are irreconcilable.

~~~
jakelazaroff
That's why the grandparent mentioned feedback. Even if a disparity "aligns
with reality", if the algorithm is used in a way that reinforces the
disparity, then it's discriminatory. When designing algorithms as parts of
systems, we need to be careful to not ossify statistics we're ostensibly just
reflecting.

------
thisisit
Isn't the whole point of machine-learning algorithms to find the best spot
between variance and bias? In which case, every algorithm will have some bias.

IMO, the focus should instead be on not overselling algorithms as infallible,
but treating them as something that will have some bias and needs overriding
from time to time. If a system is fully automated without checks and balances,
we might have serious problems. A good non-ML example was discussed a couple
of days ago on HN, where a person was terminated by a machine without much
oversight:

[https://idiallo.com/blog/when-a-machine-fired-me](https://idiallo.com/blog/when-a-machine-fired-me)

~~~
pliny
>Isn't the whole point of machine-learning algorithms to find the best spot
between variance and bias? In which case, every algorithm will have some bias.

This references two different, almost opposite, meanings of the word bias. The
meaning in TFA is being individually subject to judgements about groups (e.g.
an individual African-American prisoner being denied parole because
African-Americans as a population have higher rates of recidivism), whereas in
the context of ML it refers to a model ignoring important features of a
dataset.
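
For reference, the ML sense is the bias term in the textbook decomposition of
expected squared error (standard material, not from TFA):

```latex
\mathbb{E}\!\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\operatorname{Var}\!\left[\hat{f}(x)\right]}_{\text{variance}}
  + \sigma^2
```

Trading that bias off against variance says nothing either way about the
social sense of bias in TFA; a model can sit at the sweet spot of this curve
and still be discriminatory.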

------
andrewlee224
Aren't the algorithms already reasonably fair? The researchers are just trying
to get them to be politically correct?

~~~
Sol-
Before throwing around accusations of political correctness, you should
consider that questions like this are not new; they have just received more
attention now that algorithms decide many more things in life than in the
past.

For instance, US law has had "disparate impact" provisions at least since the
Civil Rights Act (a time which was probably not dominated by political
correctness), which require outcomes to be not too different between races or
other groups.

(Though disparate impact is not a particularly good metric of fairness and
doesn't seem to be used so much in algorithmic fairness nowadays.)
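
In practice, US enforcement often operationalizes disparate impact with the
EEOC "four-fifths rule": a group whose selection rate is below 80% of the
highest group's rate is presumptively flagged. A minimal check, with
hypothetical group names and numbers:

```python
def four_fifths_rule(selected, applicants):
    """Flag groups whose selection rate falls below 80% of the best rate."""
    rates = {g: selected[g] / applicants[g] for g in applicants}
    best = max(rates.values())
    return {g: (round(r / best, 3), r / best < 0.8) for g, r in rates.items()}

# Hypothetical hiring data:
print(four_fifths_rule(selected={"group_a": 48, "group_b": 24},
                       applicants={"group_a": 100, "group_b": 80}))
# {'group_a': (1.0, False), 'group_b': (0.625, True)}  <- group_b is flagged
```

It's a blunt metric, which is part of why it doesn't play much of a role in
the algorithmic-fairness literature.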

~~~
marcoperaza
The disparate impact doctrine is quite controversial, especially the ever more
aggressive applications of it.

It has led to some unfortunate rules. For example, it’s (generally and
presumptively) illegal to hire based on intelligence tests, but seems to be
okay (in practice) to hire only from elite universities that select students
largely on the basis of SAT scores, which correlate very strongly with IQ.

~~~
dragonwriter
> The disparate impact doctrine is extremely controversial, especially the
> ever more aggressive applications of it.

What's mostly controversial is _fictitious_ applications of it.

> For example, it’s (generally and presumptively) illegal to hire based on
> intelligence tests, but totally okay to hire only from elite universities
> that select students largely on the basis of SAT scores, which correlate
> very strongly with IQ.

No, it's not. Both policies have differential impact, and both have the same
requirement of a tight connection to job performance under disparate impact
analysis. The main relevant differences are:

(1) There is a widespread cargo-cult belief, including among many people
involved in hiring, that IQ tests are _categorically_ illegal in hiring, so
people avoid them in part based on a prohibition that does not actually exist,
and

(2) The same false belief is also common among _workers_, so they are
particularly likely to seek legal counsel if they experience an adverse hiring
decision and were subjected to an IQ test, at which point they are likely to
discover the actual disparate impact rule. People who have adverse results and
were, even overtly, subjected to another criterion that might invoke disparate
impact analysis are less likely to seek a remedy.

(3) IQ tests are overt, and the failed applicant is unlikely to be ignorant of
the standard used; places that filter strictly by elite universities rarely
_disclose_ that, they just require a resume and apply opaque criteria to it.
Even if those criteria (such as an elite-university filter) would provoke
disparate impact analysis, it is difficult for an injured party to know what
criteria were used (and to prove it, even if they know or suspect that a
criterion subject to disparate impact analysis was used).

------
paulus_magnus2
This will be interesting. Most actions we people take, and ALL actions
corporations take, are optimised for maximal self/personal gain and not for
justice (which is hard or impossible to define). This is the basis of
neoliberalism. It will be interesting to see where pressure points will emerge
and how negotiations will progress.

