
Machine Bias - r0h1n
https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
======
kough
This is totally fucked. Morally wrong, deeply unethical, and probably illegal
– if you're adding punishment without basing that additional punishment on new
evidence, isn't that like being treated as guilty without proof? Obviously
I'm not a lawyer, but how could anyone, let alone the whole huge set of people
that led to these policies, think that applying group statistics to
individuals to determine the severity of their punishment is ok?

On the other hand, these biases (most notably the racial ones) exist in the
process anyway, and now they're simply being codified and exposed. If these
algorithms were published we could see exactly how much more punishment you
get for being black in America versus being white.

Thanks again to ProPublica for an important piece of reporting; hopefully
changes get made for the better.

~~~
mikeash
Punishment is always considered somewhat separately from the determination of
guilt. The judge would already try to account for things like this when
determining your sentence. They just do it in a deeply ad hoc and personal
manner: they take a stab at it, try to account for things like how sorry you
seem to be, apply guidelines, and come up with a number. This means
that you might ultimately be punished for the judge not having a good
breakfast:

[http://www.scientificamerican.com/article/lunchtime-leniency/](http://www.scientificamerican.com/article/lunchtime-leniency/)

And of course it goes without saying that judges will be affected by their
biases, racial and otherwise.

I'm not sure what to do about it, though. Handing down the exact same
punishment for every single person who commits a particular crime seems too
blind. But any variation is going to be problematic.

~~~
pessimizer
All this does is systematize those biases so that they can't be challenged
like a judge with a record of bias can. The statistics that they choose to
record create bias in and of themselves - by using race in the algorithm, you
are building in the possibility that race influences criminality. If you built
in favorite foods, some foods would end up resulting in higher sentences than
others, just as if you built in phases of the moon when the crime was
committed or the astrological sign of the victim.

Even where there is absolutely no real effect, roughly one out of every twenty
combinations of the other variables will, by chance alone, show a statistically
significant association with the likelihood of future crime at p = 0.05.
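
To see roughly how big that effect is, here's a quick simulation (a minimal
sketch; the sample size and junk variables are made up, nothing here comes
from the COMPAS data):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_defendants = 1000
n_junk_vars = 100  # favorite foods, moon phases, victims' star signs...

# Outcome with no real relationship to any of the junk predictors.
reoffended = rng.integers(0, 2, size=n_defendants).astype(bool)

significant = 0
for _ in range(n_junk_vars):
    junk = rng.normal(size=n_defendants)  # pure-noise predictor
    # Compare the junk variable between reoffenders and non-reoffenders.
    _, p = ttest_ind(junk[reoffended], junk[~reoffended], equal_var=False)
    if p < 0.05:
        significant += 1

# Expect roughly 5 of 100 pure-noise variables to look "significant".
print(f"{significant} of {n_junk_vars} junk variables significant at p < 0.05")
```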

Furthermore, the algorithm would simply extend existing biases in arrest and
sentencing, because it simply can't account for crimes that are uncaught and
unpunished. Groups that are stopped, searched, arrested, and convicted at
greater rates would without fail be sentenced to more time. Just another
benefit of being white in America.

You end up using the fact that some groups are punished _more often_ to
justify punishing them _more harshly._

Even worse, I bet that the fact that it rates women as being at higher risk
of recidivism means that somewhere within the algorithm it's using the fact
that women in general are less criminal than men to decide that women who do
commit crimes are more exceptional (among women), and therefore more deviant.
It's disgusting. If you can't legally discriminate against a person on
particular grounds, you certainly can't feed those grounds into an algorithm
to let it discriminate for you while you shrug and feign innocence.

The algorithm is the innocent one - it's just attempting to reflect the system
as it is. It's like an algorithm you would write to predict the winners of
horse races, or the sports book. And just like one of those algorithms, if you
stuff it with garbage (the kind of garbage that makes it wrong 77% of the
time), it will produce garbage. And if you feed the results back into the very
system being modeled, bad variables will feed back into themselves and make
the results progressively worse - what's the effect of a longer sentence on
recidivism? How profitable is the arbitrage on your sports-book algorithm if
people use its results to bet, and the distribution of bets shifts the odds?
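
A toy simulation of that feedback loop (every coefficient below is invented
for illustration; none of this comes from the COMPAS model):

```python
# Toy feedback loop: scores set sentence length, longer sentences raise
# actual recidivism, and observed recidivism feeds back into next year's
# scores. All coefficients are made up for illustration only.
score = 0.3  # group's average risk score, on [0, 1]
for year in range(5):
    sentence = 1 + 4 * score             # higher score -> longer sentence (years)
    recidivism = 0.2 + 0.1 * sentence    # longer sentence -> worse outcomes
    score = min(1.0, 0.5 * score + 0.5 * recidivism)  # score tracks observed recidivism
    print(f"year {year}: sentence={sentence:.2f}, recidivism={recidivism:.2f}, "
          f"next score={score:.2f}")
# The group's score ratchets upward year over year, driven by the
# sentences the score itself produced.
```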

~~~
argonaut
But they can be challenged. That's why you're reading an article about it. If
you have a judge that is biased, it is probably harder to challenge his
sentences than if you had an algorithm that you proved was biased.

------
Malarkey73
One of the most mind boggling sentences in that article was:

"On Sunday, Northpointe gave ProPublica the basics of its future-crime formula
— which includes factors such as education levels, and whether a defendant has
a job. It did not share the specific calculations, which it said are
proprietary."

How on earth can you lock people up based on secret information? That is Kafka
meets Minority Report.

~~~
yummyfajitas
This is done regularly. It's called "judicial discretion" - a judge uses a
neural network so secret that even he doesn't understand it (in fact the
entire scientific field of "neuroscience" exists to try and analyze it).

Variables used in the formula include details of the case, the race/appearance
of the defendant, and how recently the judge had eaten lunch at the time of
sentencing. Unlike
the ProPublica claims of racial bias (which are merely "almost statistically
significant" at the p=0.05 level), the lunch bias is statistically significant
at the p < 0.01 level.

[http://www.pnas.org/content/108/17/6889.full](http://www.pnas.org/content/108/17/6889.full)

This system sounds like a huge improvement.

~~~
mattkrause
Just FYI: The lunch paper has very serious problems as described in this
reply, also published in PNAS:
[http://www.pnas.org/content/108/42/E833.full](http://www.pnas.org/content/108/42/E833.full)

In particular, the cases are heard in a particular order. For each prison, the
prisoners with counsel go before those who are representing themselves. As in
the US, those representing themselves typically fare worse. The judges try to
finish an entire prison's worth of hearings before a meal, so the least-
likely-to-succeed cases are typically assigned to spots right before a break.

There are some other bits of weirdness in the original data too. They found a
statistically significant association between the ordinal position (e.g., 1st,
2nd, ..., last) and the parole board's decision, but failed to find any effect
of actual time elapsed (e.g., in minutes), even though the latter is much more
compatible with a physiological hypothesis like running out of glucose.

~~~
yummyfajitas
Interesting, I was unaware. I need to attach more uncertainty to my beliefs
about how terrible humans are at making decisions.

------
pdkl95
Weapons of Math Destruction

[http://boingboing.net/2016/01/06/weapons-of-math-destruction-h.html](http://boingboing.net/2016/01/06/weapons-of-math-destruction-h.html)

It's easy to hide an agenda behind an algorithm, especially when the details
of the algorithm are not publicly visible.

~~~
yummyfajitas
It's far easier to hide an agenda behind verbiage and anecdotes. Go read the
author's actual statistical analysis:

[https://github.com/propublica/compas-analysis/blob/master/Compas%20Analysis.ipynb](https://github.com/propublica/compas-analysis/blob/master/Compas%20Analysis.ipynb)

In the statistical analysis (unlike the verbiage) she is completely unable to
hide the _lack_ of bias and the _accuracy of the algorithm_, all of which are
clearly on display in line [36]. In contrast, her verbiage somehow conveys the
exact opposite impression.

~~~
zyxley
Uh... it's all right there in your link, across several sections that analyze
specific parts of the data.

> Black defendants are 45% more likely than white defendants to receive a
> higher score correcting for the seriousness of their crime, previous
> arrests, and future criminal behavior.

> Women are 19.4% more likely than men to get a higher score.

> Most surprisingly, people under 25 are 2.5 times as likely to get a higher
> score as middle aged defendants.

> The violent score overpredicts recidivism for black defendants by 77.3%
> compared to white defendants.

> Defendants under 25 are 7.4 times as likely to get a higher score as middle
> aged defendants.

> [U]nder COMPAS black defendants are 91% more likely to get a higher score
> and not go on to commit more crimes than white defendants after two years.

> COMPAS scores misclassify white reoffenders as low risk 70.4% more often
> than black reoffenders.

> Black defendants are twice as likely to be false positives for a higher
> violent score than white defendants.

> White defendants are 63% more likely to get a lower score and commit another
> crime than Black defendants.

Calling out one specific section that doesn't show bias doesn't magically
exonerate the rest.

~~~
yummyfajitas
None of these things are evidence of bias.

The algorithm is biased if it's giving the wrong score due to race or
redundantly encoded race. To show that the algorithm is biased, you need to
show that (score, race) pairs are more predictive than (score, ) singletons.

Line [36] and [46] both attempt to address this question. The only one of
these which is statistically significant is
"race_factorOther:score_factorHigh" in line [46].

The other things you bring up are interesting, but do not show bias. At best
they show disparate impact, which isn't remotely the same thing.
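
In code, that test is a nested-model comparison, something like the sketch
below (the file path and the column names `score`, `race`, and `reoffended`
are placeholders, not the notebook's actual variables):

```python
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import chi2

# Hypothetical data with one row per defendant.
df = pd.read_csv("compas.csv")  # placeholder; columns: score, race, reoffended

# Reduced model: predict recidivism from the score alone.
reduced = smf.logit("reoffended ~ score", data=df).fit(disp=0)
# Full model: add race main effects and race-by-score interactions.
full = smf.logit("reoffended ~ score * race", data=df).fit(disp=0)

# Likelihood-ratio test: does knowing race improve prediction beyond the score?
lr_stat = 2 * (full.llf - reduced.llf)
dof = full.df_model - reduced.df_model
p_value = chi2.sf(lr_stat, dof)

# A small p here would mean (score, race) outpredicts (score,) alone,
# i.e. the score leaves racial information on the table - bias.
print(f"LR = {lr_stat:.2f}, p = {p_value:.4f}")
```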

~~~
TheCoelacanth
It is consistently giving incorrectly low scores to white subjects and
consistently giving incorrectly high scores to black subjects. That is clearly
bias, at least in the colloquial sense.

~~~
yummyfajitas
The degree to which it does this cannot be distinguished from random chance (p
> 0.05).

If the predictor were biased then you could build a more accurate score based
on both the original scores and race_factorBlack:score_factorHigh (and other
interaction terms). I.e. you'd be building a new bias in to cancel the old
bias, leaving an accurate predictor.

Their analysis doesn't show that this is possible.

~~~
TheCoelacanth
p < 0.05 is the type of cutoff you would need to meet to get published in a
peer-reviewed paper. Such a high bar of evidence is not necessary in this
situation. To prevail in a civil suit, a person harmed by this algorithm would
only have to prove that it is more likely than not that the algorithm is
biased.

------
yummyfajitas
According to ProPublica's own analysis, the claim of bias cannot be shown to
be statistically significant:
[https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm](https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm)

This article is terrible data journalism and probably deliberately misleading.

Step 1: write down conclusion.

Step 2: do analysis.

Step 3: if analysis doesn't support conclusion, write down a bunch of
anecdotes.

Really, here's her R script:
[https://github.com/propublica/compas-analysis/blob/master/Compas%20Analysis.ipynb](https://github.com/propublica/compas-analysis/blob/master/Compas%20Analysis.ipynb)

Just read that. It's vastly better than this nonsensical article.

~~~
daveguy
They analyzed what they could -- the outcomes of the algorithm
(recommendation) and the accuracy of those recommendations. They picked out
specific examples, but the analysis was over the whole data set. I think you
missed these relevant parts from the article:

> We obtained the risk scores assigned to more than 7,000 people arrested in
> Broward County, Florida, in 2013 and 2014 and checked to see how many were
> charged with new crimes over the next two years, the same benchmark used by
> the creators of the algorithm.

> The score proved remarkably unreliable in forecasting violent crime: Only 20
> percent of the people predicted to commit violent crimes actually went on to
> do so.

> The formula was particularly likely to falsely flag black defendants as
> future criminals, wrongly labeling them this way at almost twice the rate as
> white defendants. White defendants were mislabeled as low risk more often
> than black defendants.

> Could this disparity be explained by defendants’ prior crimes or the type of
> crimes they were arrested for? No. We ran a statistical test that isolated
> the effect of race from criminal history and recidivism, as well as from
> defendants’ age and gender.

> Black defendants were still 77 percent more likely to be pegged as at higher
> risk of committing a future violent crime and 45 percent more likely to be
> predicted to commit a future crime of any kind.
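
The quoted "twice the rate" figures are ordinary group-wise error rates, which
anyone can recompute from the published data; a rough sketch (the file path
and column names are placeholders, with boolean flag columns assumed):

```python
import pandas as pd

# Placeholder columns: race, high_risk (boolean: algorithm flagged high risk),
# reoffended (boolean: actually charged with a new crime within two years).
df = pd.read_csv("compas.csv")

for race, g in df.groupby("race"):
    # False positive rate: flagged high risk among those who did NOT reoffend.
    fpr = (g.high_risk & ~g.reoffended).sum() / (~g.reoffended).sum()
    # False negative rate: labeled low risk among those who DID reoffend.
    fnr = (~g.high_risk & g.reoffended).sum() / g.reoffended.sum()
    print(f"{race}: false positive rate={fpr:.2f}, false negative rate={fnr:.2f}")
```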

~~~
yummyfajitas
Go read the description of the statistical analysis or just view their R
notebook:

[https://github.com/propublica/compas-analysis/blob/master/Compas%20Analysis.ipynb](https://github.com/propublica/compas-analysis/blob/master/Compas%20Analysis.ipynb)

Their own analysis shows (p ≈ 0) that high and medium risk factors are
predictive. They also showed that the racial bias terms (race_factorAfrican-
American:score_factorHigh, etc.) are probably not predictive (p > 0.05).

Your quotes are not evidence of bias, though I see how they might confuse an
innumerate reader. It's interesting how good a job this article is doing
confusing the innumerate - it's almost as if it was written to mislead without
technically lying.

For example, black defendants being pegged as more likely to commit crimes can
be caused by one of two things: bias, or perhaps black defendants _actually
are_ more likely to commit crimes. According to ProPublica's own analysis (see
race_factorAfrican-American), the latter is actually the case. This is true
with p = 4.52e-06 - see line [36].

~~~
daveguy
I read through the entire analysis. It appears that you stopped reading after
you saw a p-value that supported your bias. That is bias in the sense of a
preconceived notion. You then proceeded to pedantically argue that the well-
demonstrated bias of the algorithm (more false positives for blacks than
whites, about 40% vs. 20%) does not exist because of a p-value that came in
between 0.05 and 0.1 instead of below 0.05.

Please let me know when your reading comprehension catches up with your
mediocre statistics comprehension.

Maybe you just didn't realize that the 20/20 hindsight data -- prediction vs.
recidivism -- is included right there in the analysis. Or maybe you did
realize it later and just decided you'd dug in so much that you didn't want to
admit your ignorance.

Or maybe you still haven't comprehended the difference between the meanings of
the word bias.

------
thejefflarson
Thanks for posting this. I encourage this crowd to take a look at the
methodology too:
[https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm](https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm)

~~~
gleb
Are you sure what you found is not just Simpson's paradox?

When I look at the two Kaplan-Meier (KM) plots for whites/blacks, they are
mostly the same. It's pretty clear that the model is not prejudiced against
blacks; in fact it's somewhat prejudiced against whites. [1]

Your main editorial claim is that whites tend to be misclassified as "good"
and blacks as "bad."

But I think what's actually happening is that the algorithm is more likely to
misclassify low_risk as "good" and high_risk as "bad".[2] Combine that with
vastly more whites than blacks being low_risk (as you show earlier) and you
get the observed "injustice".

I'll also note that the KM curve for whites flattens out at 2 years, unlike for
blacks. This is actually a big deal if statistically significant. But that's a
separate conversation.

Footnotes:

1 - this is acknowledged in methodology page "black defendants who scored
higher did recidivate slightly more often than white defendants (63 percent
vs. 59 percent)."

2 - why that is I don't yet fully understand (and I'd like to), but it looks
to be simple math that follows from the low-risk group mostly not recidivating
and the high-risk group mostly recidivating
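
Here's one way footnote 2's "simple math" can play out: fix the score's
calibration (what "high" and "low" mean) to be identical for both groups, vary
only the base rate, and the group-level error rates diverge. All numbers below
are invented for illustration:

```python
def error_rates(base_rate, p_reoffend_high=0.6, p_reoffend_low=0.2):
    """The score is equally well calibrated for every group: P(reoffend | high)
    and P(reoffend | low) are fixed. Only the group's overall reoffense rate
    (base_rate) differs. All numbers are invented for illustration."""
    # Fraction of the group scored high, implied by the base rate:
    frac_high = (base_rate - p_reoffend_low) / (p_reoffend_high - p_reoffend_low)
    # False positive rate: scored high among those who do NOT reoffend.
    fpr = frac_high * (1 - p_reoffend_high) / (1 - base_rate)
    # False negative rate: scored low among those who DO reoffend.
    fnr = (1 - frac_high) * p_reoffend_low / base_rate
    return fpr, fnr

print(error_rates(0.5))  # higher-base-rate group: FPR 0.60, FNR 0.10
print(error_rates(0.3))  # lower-base-rate group:  FPR ~0.14, FNR 0.50
```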

------
wyager
I don't have an issue with using statistical analysis to direct crime
prevention efforts. I think it's unconscionable to use statistical analysis
for sentencing. We don't want Minority Report in real life.

~~~
michaelbuddy
I think the problem is that the amount of crime is causing the legal system to
buckle, so there is a search for solutions to make the process more efficient.
There may be some value to sentencing standards that _may_ serve as a
deterrent.

In other words, if you are a repeat offender, in some cases you think you know
what your lawyer can do for you. But a system replacing that, one that is
overly harsh, may deter you. All things being equal in a system of punishment,
I think I want the one that's got some deterrence in it. So this is worth
exploring.

If every criminal knew that getting caught meant being put into a meat grinder
of sorts, I wonder how that would change their thinking about how to navigate
the world and problem solve.

------
Dowwie
Consistent with the theme of this story is the content from discussions held
at a conference at NYU School of Law, featuring human rights and legal
scholars. Coincidentally, I submitted a link on this yesterday.

See
[https://news.ycombinator.com/item?id=11753089](https://news.ycombinator.com/item?id=11753089)

------
thisisdave
In block [37] of the IPython notebook, are racial main effects missing? I only
see interactions.

[https://github.com/propublica/compas-analysis/blob/master/Compas%20Analysis.ipynb](https://github.com/propublica/compas-analysis/blob/master/Compas%20Analysis.ipynb)

