
The accuracy, fairness, and limits of predicting recidivism - thomasz
http://advances.sciencemag.org/content/4/1/eaao5580/tab-pdf
======
thomasz
> Algorithms for predicting recidivism are commonly used to assess a criminal
> defendant’s likelihood of committing a crime. These predictions are used in
> pretrial, parole, and sentencing decisions. Proponents of these systems
> argue that big data and advanced machine learning make these analyses more
> accurate and less biased than humans. We show, however, that the widely used
> commercial risk assessment software COMPAS is no more accurate or fair than
> predictions made by people with little or no criminal justice expertise. We
> further show that a simple linear predictor provided with only two features
> is nearly equivalent to COMPAS with its 137 features.
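A two-feature linear predictor of the kind the abstract describes can be sketched as a logistic score. This is a toy illustration only: the feature names and the weights below are invented placeholders, not the coefficients the paper fits on real data.

```python
import math

def two_feature_risk(age: float, priors: float) -> float:
    """Toy logistic risk score built from two features only.

    The weights are invented for illustration; the paper fits its own
    coefficients to actual defendant data.
    """
    z = -0.05 * age + 0.25 * priors + 1.0  # simple linear combination
    return 1.0 / (1.0 + math.exp(-z))      # logistic link -> score in (0, 1)
```

The point of the paper is that something this simple, once properly fitted, matches the predictive accuracy of a 137-feature commercial product.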

~~~
wu-ikkyu
>Proponents of these systems argue that big data and advanced machine learning
>make these analyses more accurate and less biased than humans.

This assumes the data sets of the criminal punishment system are not
inherently biased, which is of course a false assumption.

All this does is make the runaway train of the US criminal punishment system a
more efficient machine.

~~~
vec
> This assumes the data sets of the criminal punishment system are not
> inherently biased, which is of course a false assumption.

It doesn't even require that assumption to go awry.

Assume there exists a stereotype, say "people with freckles are more likely to
be criminals". It doesn't actually matter if the stereotype has any basis in
reality, just that it is widely believed.

People will, on the margin, be less likely to hire freckled people. This
reduces the legitimate employment opportunities available to a freckled
individual, which tends to make the illegitimate opportunities more attractive
by comparison. So the stereotype becomes a self-fulfilling prophecy: by
assuming that freckled people are more likely to commit crimes, society
actually causes freckled people to be more likely to commit crimes.

An expert system will notice this and begin using "has freckles" as a
weighting factor in predicting recidivism. It's important to note that the
expert system is not wrong. Freckles _are_ in fact, at this point,
statistically correlated to recidivism. But the expert system can't know _why_
the correlation exists. All it can do is tighten the vicious feedback loop,
noticing a statistical correlation that strengthens the existing stereotype,
which exacerbates the real world impacts, which increases the statistical
correlation, which strengthens the stereotype, and so on.

------
gizmo686
Not the main point of the paper, but

>it is argued that the COMPAS score is not biased against blacks because the
>likelihood of recidivism among high-risk offenders is the same regardless of
>race (predictive parity), it can discriminate between recidivists and
>nonrecidivists equally well for white and black defendants as measured with
>the area under the curve of the receiver operating characteristic, AUC-ROC
>(accuracy equity), and the likelihood of recidivism for any given score is the
>same regardless of race (calibration)

Simpson's paradox [0], probably one of the most insidious problems of well-
intentioned statistics.

[0]
[https://en.wikipedia.org/wiki/Simpson%27s_paradox](https://en.wikipedia.org/wiki/Simpson%27s_paradox)

~~~
harryh
How is this Simpson's Paradox?

~~~
gizmo686
There is a correlation across the entire population that does not exist in any
'bucket' of the population.

More importantly, the bucketing involved is reasonable and has explanatory
power. (If this were not the case, you might have picked the buckets
specifically to get this result. That would still be an example of Simpson's
Paradox, but not an interesting one.)
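A minimal numeric illustration of the point, with all counts invented: within each risk bucket the two groups recidivate at identical rates, yet the pooled rates diverge, because the groups are distributed differently across buckets.

```python
# bucket -> group -> (recidivists, total); every number is made up.
buckets = {
    "low_risk":  {"A": (10, 100), "B": (5, 50)},
    "high_risk": {"A": (30, 50),  "B": (120, 200)},
}

def rate(counts):
    hits = sum(c[0] for c in counts)
    total = sum(c[1] for c in counts)
    return hits / total

# Within each bucket, groups A and B have the same rate...
per_bucket = {b: {g: r / n for g, (r, n) in gs.items()}
              for b, gs in buckets.items()}

# ...but pooled over buckets the rates differ, because group B is
# concentrated in the high-risk bucket.
pooled = {g: rate([buckets[b][g] for b in buckets]) for g in ("A", "B")}
```

Here the per-bucket comparison (the calibration-style claim) and the pooled comparison genuinely disagree, which is exactly the trap.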

------
mrow84
This is an interesting paper on a subject that has provoked debate on HN in
the past. I would be interested to read criticism of the methods from those
who have expressed support for these kinds of automated systems.

The most obvious criticism is that the uninspiring performance of the COMPAS
model does not provide any evidence of the weakness of automated systems in
general, which is surely correct. This kind of argument would not, however,
address more general concerns about how and why these kinds of systems are
adopted - why is this system being used when its performance is not obviously
competitive? (The answer to that is presumably "corruption", in a generalised
sense, but that is hardly an adequate response.)

Even more broadly, as the complexity of human society increases, it seems
likely that the difficulty of making constructive interventions will also
increase, and possibly more rapidly ("The Collapse of Complex Societies" by
Joseph Tainter investigates this hypothesis). It does seem risky to just
plough on, hoping that we can invent our way out of any problems that might
arise as a result of what we are doing.

~~~
stult
Well hold on. COMPAS performs as well as people but costs less than getting a
jury together or paying a professional. So I wouldn't say that's uninspiring.

It also performs as well as their simpler model. But they only evaluated the
simple model on data from a single county. There's not enough evidence to
determine whether that model would perform as well on other data sets. I would
presume, perhaps wrongly, that more work has been done to ensure the
generalizability of COMPAS, and that this work militated against adopting a
simpler algorithm - possibly because the algorithm has to avoid racial
disparities that might appear in counties with a different demographic mix.
Even if the simpler model performs equally well across many
jurisdictions, that merely suggests that a simpler non-proprietary algorithm
would be better. But we're still choosing between algorithms, not discrediting
the idea of algorithmic sentencing altogether.

~~~
mrow84
As long as you are vaguely positivist, we are surely only "choosing
between algorithms" - the question is rather how those choices are made. When
we consider computerised solutions we are nearly always forced to radically
simplify our frame, dispensing with "judgment". This can be an advantage,
because it reduces the scope for individual bias, but it can also be a
disadvantage, because it reduces the scope for mitigating _systemic_ biases.
It is much more difficult to design a perfect system than it is to apply
corrections to a merely good one - capturing what those corrections are,
however, is very difficult.

As an aside, do you have a source for the cost comparison, because I didn't
see anything in the paper?

~~~
stult
In the interests of disclosure and not as an assertion of authority: I must
admit that I am a lawyer and my perspective is shaped by my dissatisfaction
with the profession's lack of scientific rigor both in the academic and
practical spheres.

And as an initial matter, I would point out that there is no judicial
procedure which relies on algorithmic sentencing or bail decisions without a
judge reviewing the decision. COMPAS is merely used to produce a
recommendation for a bail amount which a judge then has to review and approve.
Defendants are still afforded a hearing where they can object and raise any
extenuating circumstances. So to the extent that judicial discretion can
address systemic biases, it already does.

And I certainly recognize the value of judicial discretion. Mandatory minimums
alone demonstrate how prioritizing the punitive and retributive goals of the
judicial system over simple human mercy can amplify the negative effects of
racial and other social biases.

However, my experience has been that the legal field has a strong systemic
bias _against_ empirical or statistical techniques. Essentially anything that
involves math. There's a running, tired joke at every law school that students
pursue JDs because they couldn't get a good score on the math GREs (the LSATs
in contrast involve no math).

And to clarify since I am at risk of confusing what we mean by the term bias:
I am not denying that the legal profession is susceptible to the social biases
you mention, but rather I would argue those biases are impossible to address
until the deeper, methodological biases in how the legal profession pursues
objective truth are addressed.

Algorithmic sentencing and bail setting software can assist there in two ways.
First, it ensures the collection of extensive, objective, and standardized
data that can be used to evaluate the potential systemic biases you are
concerned about. A single, consolidated, structured database with enormous
amounts of information about each individual defendant is a gold mine for
researchers. Simply put, we cannot even begin to resolve our biases without
first identifying them. Software like COMPAS can help (though I'm not sure if
it does in this case because I don't think they provide public access to the
data).

Second, the introduction of this software gives lawyers a chance to get used
to machine learning algorithms and other advanced statistical tools. They're
powerful techniques and need to become a larger part of the standard lawyer's
tool kit.

I don't think COMPAS is the terminal destination for sentencing software. In
fact, I really hope otherwise. But, as a profession, I think we need to start
taking steps to modernize our practices, one little step at a time. Each
individual step may be imperfect at first, but so long as we maintain judicial
discretion and careful oversight, these tools will be a boon for American
justice.

And in response to your last question, I don't have a citation on hand. I've
read it somewhere before. Basically they replace a professional clerk or low
level prosecutor in the DA's office who prepares a bunch of data and
biographical information for the bail recommendation to the judge. It ends up
being about one FTE a year and the software costs less than that to license.

~~~
mrow84
Thanks for a comprehensive reply, and I apologise for not seeing it earlier.
To frame what follows, my perspective is that of someone who spends a lot of
time learning about and using these kinds of techniques.

You seem to appreciate the problems with potential biases, and your point
about data collection is well taken, though I would note it doesn't require
application of algorithmic sentencing. I would, however, urge you to continue
to think about the difficulty of automating the sentencing decisions made by a
human, from a mathematical perspective - the problem amounts to performing
factor analysis by simply throwing away a large number of factors that are
difficult to measure. It is not clear to me that that method will be able to
capture those difficult parts of the distribution that people intuitively
refer to as justice.

I also urge you to continue to think about the difficulty of optimising that
automated sentencing procedure, which requires you to select parameters to
optimise for. This problem faces similar difficulties - whereas a human might
update their judgment on a wide range of observations (potentially including
everything they experience), for an automated procedure we typically have to
choose a handful of parameters based on some expert intuition about what is
important. The choice of those parameters can be critical, and can expose
feedbacks that would be naturally corrected by humans.

The previous two points are often dismissed as merely being difficult
challenges, rather than fundamental problems with the techniques, and, whilst
I am inclined to agree with that perspective, they are _difficult_ challenges,
and the decisions that such systems end up making can (and often do, in my
experience) have consequences that weren't expected.

This leads to my political reason for being cautious about these systems,
which is motivated precisely by the fact that they can be cheaper, at least
when one is only making a comparison based on decisions per dollar, or
something along those lines. I am happy for these systems to be used by
informed practitioners as a mechanical aid to their decision making, but cost
savings are such an evergreen issue that I fear the pressure to dispense with
the human and "make do" with the automated system will become too strong to
resist (for the politicians, not the lawyers).

Ultimately one can imagine a system that is constantly improved over time (an
expensive prospect) could result in more objective sentencing - my concerns
are that uncaring politicians will exploit the availability of the techniques
to cut costs, and that once those costs are cut it will be difficult to
justify the (relatively) increased expenditures necessary to fund
improvements. I don't have your insider's perspective, but I suppose overall I
struggle to accept that saving costs in this way is one of the most necessary
reforms to the American justice system.

------
gadders
On a semi-related note, Theodore Dalrymple (ex-prison psychiatrist) argued
that the practice of parole should be ended for similar reasons:
[https://www.spectator.co.uk/2018/01/parole-is-unfair-and-unworkable-lets-abolish-it/](https://www.spectator.co.uk/2018/01/parole-is-unfair-and-unworkable-lets-abolish-it/)

