
Sent to Prison by a Software Program’s Secret Algorithms - gscott
https://www.nytimes.com/2017/05/01/us/politics/sent-to-prison-by-a-software-programs-secret-algorithms.html
======
jawns
Haha, I just looked up the product sheet for Compas CORE, the software at
issue:

[http://www.equivant.com/assets/img/content/Risk-Needs-Assess...](http://www.equivant.com/assets/img/content/Risk-Needs-Assessment.pdf)

It has a section titled "Make Defensible Decisions":

> Fully web-based and Windows compliant, COMPAS is applicable to offenders at
> all levels from non-violent misdemeanors to repeat violent felons. COMPAS
> offers separate norms for males, females, community and incarcerated
> populations.

That has ... nothing to do with defensibility. Defensibility is being able to
say, "Here's the evidence and the reasoning I used to arrive at this
conclusion."

What I think is especially frustrating is that in all three risk scales the
product offers -- risk of new violent crime, risk of general recidivism, and
pretrial risk -- they're dealing in probabilities that don't work on an
individual level. That's why they're called "norms." So, if you have 1,000
offenders who match a certain profile, you might be able to fairly accurately
predict that 40% of that population are going to reoffend. But being able to
predict _which_ offenders constitute the 40% and which constitute the 60% is a
totally different thing. There is simply no way to take these probabilities
that apply to groups of people and apply them in any fair way to individuals.
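
To make the group-vs-individual point concrete, here's a toy simulation in Python (all numbers invented for illustration): everyone matching the profile gets the identical 0.4 score, so no threshold on that score can separate the 40% who reoffend from the 60% who don't.

    import random

    random.seed(0)

    # 1,000 offenders who all match the same "norm": identical 0.4 score.
    population = [{"score": 0.4, "reoffends": random.random() < 0.4}
                  for _ in range(1000)]

    observed = sum(p["reoffends"] for p in population) / len(population)
    print(f"group prediction: 0.40, observed rate: {observed:.2f}")  # ~0.40

    # Any threshold on the score labels all 1,000 people identically, so
    # picking out the individuals who reoffend is no better than chance.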

~~~
sideshowb
Not saying it's right, but isn't that exactly what we do with insurance for
driving, health, etc.?

~~~
pdkl95
Many types of data are prohibited for various types of insurance decisions.
This varies greatly, from general rules such as banning price discrimination
due to race or sex, to specific topics like the ACA[1]'s prohibition against
denying coverage due to "pre-existing conditions".

With modern data analysis methods, it is (sometimes) possible to infer
prohibited data from other collections of data. For example, various machine
learning techniques used on economic values such as income, home value, and
proximity to good schools _might_ reveal something about recidivism rates, but
it's probably finding a confounding variable such as redlining[2] or other
types of institutional bias[3].
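
Here's a toy sketch of that inference problem (data entirely synthetic, correlation strength invented): a model that is never given the prohibited attribute can still recover it from a permitted proxy such as ZIP code.

    import random

    random.seed(1)

    def make_person():
        group = random.choice(["A", "B"])  # the prohibited attribute
        # Redlining-style correlation: group predicts ZIP 90% of the time.
        in_zip = (group == "A") == (random.random() < 0.9)
        return group, "94xxx" if in_zip else "60xxx"

    people = [make_person() for _ in range(10000)]

    # Recover the prohibited attribute from the permitted proxy alone.
    hits = sum((zip_code == "94xxx") == (group == "A")
               for group, zip_code in people)
    print(f"recovered from ZIP alone: {hits / len(people):.0%}")  # ~90%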

This _is_ a problem in both legal decisions and insurance, and transparency in
how decisions are made is vital in _both_. That said, the impact on insurance
is probably[4] a relatively minor change in price, which is very different
from a Boolean decision about someone's freedom. Even in the unlikely case
where the statistical biases are known and accounted for, it still isn't
appropriate to over-interpret results about _populations_ as if they apply to
any particular _individual_.

[1] Affordable Care Act ("Obamacare")

[2]
[https://en.wikipedia.org/wiki/Redlining](https://en.wikipedia.org/wiki/Redlining)

[3]
[https://www.youtube.com/watch?v=qXQA6D4JC0A](https://www.youtube.com/watch?v=qXQA6D4JC0A)

[4] for definitions of "probable" somewhere between "an educated guess" and
"meh; whatever"

~~~
sideshowb
Good answer. Btw in the EU it has only recently been made illegal to
discriminate on sex for car insurance rates.

------
a3n
Consider:

- Write the algorithm out.

- Extract all the decision points.

- Pay legal interns to collect all the data, apply the data to the decision
points.

- Produce a report.

- Produce a recommendation.

- The judge sentences based on that process.

And now tell the defense that they have no right to any knowledge about the
above process. Not the decision points, and not what data was fed to what
decision, and not even what the data was.

If you can't release that information to the defense because of trade secrets,
then, it seems to me, it's not appropriate as a sentencing tool.

Of course, IANAL.

~~~
crb002
You could do automatic differentiation: take each input -- race, sex, age,
... -- and see how the output changes. If one field has a huge swing, it
might be a good basis for claiming discrimination based on variable X.
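
A minimal sketch of that probe (with categorical inputs this is really a one-at-a-time perturbation test rather than true automatic differentiation; `risk_score` is a hypothetical stand-in for the black-box model):

    def sensitivity(risk_score, features, alternatives):
        """Report how the score moves when one field at a time is changed."""
        base = risk_score(features)
        return {field: risk_score({**features, field: alt}) - base
                for field, alt in alternatives.items()}

    # Hypothetical usage: a large swing on "race" alone would support a
    # discrimination claim.
    # swings = sensitivity(model, {"race": "black", "sex": "male", "age": 25},
    #                      {"race": "white", "sex": "female", "age": 45})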

~~~
BearGoesChirp
Sex would have a major impact. At least in the US our justice system is
extremely sexist, far more than it is racist. I once saw sex offender
guidelines that automatically put the default sex offender sentence as medium
for males and low for females (the default was used before an official
evaluation was completed, and I would bet some money that the official
evaluations discriminated based on sex).

While I don't disagree our legal system is also racist, the racism pales in
comparison to the current level of sexism.

~~~
dTal
That's not a bug, it's a feature! User jawns posted some of their marketing
copy elsewhere in this thread:

"COMPAS offers separate norms for males, females, community and incarcerated
populations"

And they call that "defensible decisions"! In other words, they want to make
sure the software doesn't rock the boat. If there's a bias, they want to
preserve it so it doesn't look like their algorithms are "weird".

~~~
a3n
> In other words, they want to make sure the software doesn't rock the boat.
> If there's a bias, they want to preserve it so it doesn't look like their
> algorithms are "weird".

Isn't that what judges do when they follow precedent?

~~~
dragonwriter
No, and judges don't always follow precedent (they theoretically have to when
it is binding, but no precedent is binding on the highest court, even its
own.)

When they follow precedent, they are not matching outcomes to outcomes to
avoid looking weird; they are applying what was previously established as the
formal _decision rule_.

That's a very different thing than attempting to preserve biases that may not
even reflect the formally-announced decision rules in the cases that produced
them, but may represent silent deviations from the formal rules.

------
danso
It's worth remembering that there are already algorithms used in determining
how long a prison sentence should be, such as how much time off an inmate gets
for good behavior, and whether there are mitigating factors about the
conviction that would reduce the inmate's chances of early parole. Keeping
track of these calculations for all the inmates is non-trivial enough that
the Bureau of Prisons has a separate complex in Texas to handle it: the
Designation and Sentence Computation Center
[https://www.bop.gov/inmates/custody_and_care/sentence_comput...](https://www.bop.gov/inmates/custody_and_care/sentence_computations.jsp)

California's rule for `good behavior == half the prison sentence` [0], like
any algorithm implemented in the real world, is ostensibly based on data
analysis (studies of recidivism) and cost trade-offs. So is the 3-strikes-law
[1], which vastly simplifies the cost and complexity of sentencing judgments,
while making moral tradeoffs.
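
Encoded as code, the half-time rule really is just arithmetic. An illustrative sketch only -- real sentence computation layers many more credits and exceptions on top:

    def projected_days_served(sentence_days, good_behavior_credit=True):
        # California-style half-time: good behavior can halve the term served.
        return sentence_days // 2 if good_behavior_credit else sentence_days

    print(projected_days_served(3650))  # 1825 days on a 10-year sentence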

And as in software systems, sometimes judicial algorithms have severe bugs.
The one that comes to mind recently is how sex offenders face penalties beyond
prison because of alleged recidivism worries. Apparently, the belief that sex
offenders are at greater risk of recidivism was based on an unsupported
statistic found in a magazine that ended up being cited in a pivotal Supreme
Court decision [2].

[0] [http://www.latimes.com/local/crime/la-me-ff-early-release-20...](http://www.latimes.com/local/crime/la-me-ff-early-release-20140817-story.html)

[1] [https://en.wikipedia.org/wiki/Three-strikes_law](https://en.wikipedia.org/wiki/Three-strikes_law)

[2] [https://www.nytimes.com/2017/03/06/us/politics/supreme-court...](https://www.nytimes.com/2017/03/06/us/politics/supreme-court-repeat-sex-offenders.html)

~~~
Bartweiss
I worry that articles like this are driven by an unwillingness to grapple with
how screwed up the _whole_ system is.

Sure, opaque decisions made by software are creepy. But mostly, they
faithfully reproduce the decisions humans (judges and legislators) were
already making, and are in fact trained on those decisions. Algorithms are an
easy no-guilt target to write about, but I worry that focusing on them
obscures deeper sentencing issues.

~~~
tasty_freeze
Say in a trial an expert witness is called -- based on the data, was this fire
intentionally set or an accident? That expert might be a poor one, but the
other side has the chance to cross examine and expose that expert as being
unreliable.

In the case of this software, though, there is no cross-examination. Say the
secret sauce is simply a data set and Bayesian reasoning. How do we know if
the data set is representative? If it is just an opaque box, that is
unacceptable.
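
To see why representativeness matters, here's a toy Bayesian update (all numbers invented): the same evidence yields very different conclusions depending on the base rate baked into the data set, and without cross-examination nobody can check which one was used.

    def posterior(prior, p_evidence_if_risky, p_evidence_if_safe):
        """P(high risk | evidence) by Bayes' rule."""
        num = prior * p_evidence_if_risky
        return num / (num + (1 - prior) * p_evidence_if_safe)

    # Representative sample: 30% base rate of reoffense.
    print(posterior(0.30, 0.6, 0.2))  # ~0.56
    # Skewed sample (oversampled high-risk arrests): 60% base rate.
    print(posterior(0.60, 0.6, 0.2))  # ~0.82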

~~~
Bartweiss
The arson example is an interesting one.

As far as I can tell, pretty much all of arson science was fundamentally
broken for at least 30 years. The basic principles of the field -- that arson
fires burn hotter than others, that puddle marks indicate accelerants, that
multiple-origin burns indicate arson -- are all wrong.

Multiple people were executed for arson homicides on this evidence. Some were
executed or even convicted _after_ these principles were rejected. Nor was
this a "science marches on" thing. The assumptions had simply been made based
on observation and intuition, with no formal testing _ever_.

So this is the sort of thing I mean. It's not a poor expert who can be cross-
examined. It's an entire field producing entirely broken results for decades
using unquestioned best practices. You can't cross-examine "are even textbooks
in your field a complete pack of lies?" and win.

I understand the concern, and I'm not comfortable with this, but I'm also not
convinced that black box software is actually much different from what we have
now. The grounds for conviction and sentencing are often completely opaque to
the defense already.

[http://www.newyorker.com/magazine/2009/09/07/trial-by-fire](http://www.newyorker.com/magazine/2009/09/07/trial-by-fire)

[https://books.google.com/books?id=GSJ7Ja95oegC&pg=PA382&lpg=...](https://books.google.com/books?id=GSJ7Ja95oegC&pg=PA382&lpg=PA382)

------
mnm1
How do we know that the algorithm is at all appropriate? How do we even know
it's an algorithm at all and not a mechanical turk? The algorithm could be to
hand off the case files to some minimum wage college student and have them
come up with some nice charts and graphs. After Theranos, this isn't even a
hypothetical. It is clearly an injustice to use a secret algorithm
_regardless_ of outcome. But once again, corporate profit must come before
justice, it seems.

------
zyxzevn
I think that the problem is very simple:

a) The person involved has to obey the law.

b) The computer-program dictates the law and punishment.

c) In a free country, a person should be able to follow the law.

So: The person must be able to know the law, and thus must be able to know the
program.

If the program is secret in any way, the person can never obey it.

So: if the punishment is based on this program alone, the person should go
free.

~~~
CompanionCuube
The program dictates the punishment, not the law. The program can be secret
and have no impact on the person's ability to follow the law, which remains
public.

------
bhhaskin
It seems to me that we are at a critical tipping point in becoming a dystopian
society. If we are not careful we are going to fall right off the edge. Crazy
times we live in.

------
creaghpatr
The obvious flipside: Sent to Prison at The Discretion of a Human Being.

~~~
danso
Reminds me of this 2011 study of how time relative to lunch break seemed to
have an impact on judges' leniency:
[https://economix.blogs.nytimes.com/2011/04/14/time-and-judgm...](https://economix.blogs.nytimes.com/2011/04/14/time-and-judgment/)

~~~
schoen
This has become pretty proverbial for questioning human judges' objectivity,
but another interpretation has been suggested: possibly different kinds of
matters were commonly scheduled for morning and afternoon sessions (for
example, maybe defendants represented by counsel, or those expected to enter a
plea bargain, were commonly seen before lunch).

I forgot where I read this point, but it might imply that the judges' hunger
is not necessarily the only factor leading to the different outcome.

~~~
Bartweiss
This got passed around from SlateStarCodex, not sure where else it got
traction.

But yes: a followup study found that judges schedule open-and-shut cases near
lunch so they won't get held up, and more ambiguous cases for open periods.
Since "open and shut" usually means "guilty", the original study found higher
conviction/sentence rates for near-lunch cases. Remove the cases where judges
can set their own schedule and you see far less of this effect.

~~~
schoen
I thought it was probably on SSC and I even searched there, but didn't manage
to find the article. Thanks for the summary!

------
simpfai
I found this paper[1] to be a particularly good source of information on the
legal landscape of using algorithms in the judicial system.

[1] Barocas, Solon, and Andrew D. Selbst. "Big data's disparate impact."
(2016).
[https://www.accmeetings.com/AM16/faculty/files/Article_461_D...](https://www.accmeetings.com/AM16/faculty/files/Article_461_D117_SSRN-id2477899.pdf)

------
gajjanag
The issue raised by this article is discussed in a broader context in the
excellent book by Cathy O'Neil: "Weapons of Math Destruction: How Big Data
Increases Inequality and Threatens Democracy". The dangers of proprietary,
secret algorithms making judgements at critical junctures (e.g. whether a
person is sentenced or not) are raised in the introduction of the book.

The book also gives copious concrete examples of these dangers. In particular,
the book describes the LSI-R (Level of Service Inventory-Revised)
questionnaire and how it effectively pinpoints race even though it never
directly asks for the person's race (asking would be illegal).

------
smcg
I think I've seen this Black Mirror episode.

------
bobcostas55
We don't have access to the sentencing algorithms in the minds of judges,
either.

~~~
zardo
They don't keep their reasoning secret and sell their sentencing
recommendations, either.

~~~
Jtsummers
There's reasoning and there's justification. It's entirely possible that the
reason a judge does something is not revealed by the justification they
present.

------
yeukhon
As a technologist, I feel there are problems we can solve using computers, and
there are problems we cannot solve using computers.

I am a big fan of smarter computation, but when it comes to legal judgement, I
defer to a human. We have all heard of bizarre rulings before (I don't have to
remind everyone of the Stanford rape case last year), but the human
involvement in making a judgement call is what makes the justice system
precious.

I am a big fan of kicking Donald Trump out of office. I would describe this
secret algorithm as Donald Trump. Some data were collected; we're not sure how
much, how authentic, or how much bias was introduced. We just know some answer
is produced. The algorithm might be as simple as tossing a coin. If I can't
trust the leader of our government, how can I trust a machine, secret or not,
to make judgement calls, when humans are prone to making poor and irrational
judgement calls?

So why a human, if humans are prone to mistakes and unfair judgements? Because
there should be humanity in justice. Yes, Lady Justice is blindfolded, but
that doesn't mean we can't show compassion or anger.

Is there a real correlation between crime rate and number of years in prison?
I have heard many say criminals are likely to commit crime again because
either they have no other skills to depend on, or they have a mental illness
that prevents them from obeying laws. So if the dataset says a 90% recurrence
rate, are we going to sentence people longer? Then why not lock the person up
for good, or go for an immediate execution, if we want peace?

You see, the purpose we want to add to a jail sentence is correction. This is
not just idealistic talk. Many convicts do turn out right and fine if they are
given the chance to redeem themselves. We shouldn't have to beg for a safe
prison, a prison with staff ready to help, because those should be a
requirement of a jail.

I can't help but be reminded of Futurama, where there are robot judges (one of
the cops is also a robot). We should fear people trying to robotize our
humanity. If judging can depend on data, then raising a kid from infancy to
adulthood could be done with algorithms too. We just need lots of data, lots
of simulations.

------
1024core
Hypothetically speaking: what if the algorithm had hardwired names and
probabilities? {name:"Loomis", prob_recidivism:0.99} ? Then you're just
throwing him into jail because a lot of other Loomises turned out to be jerks.

He must be allowed to verify the validity of the algorithm. The current
situation is just 1 step above secret laws.

------
kirykl
If the prison system actually focused on reform, perhaps history and this
algorithm would be largely trailing indicators of repeat offense.

------
iplaw
My main issue with this algorithm is that it purports to predict THE
DEFENDANT'S actual risk to the community. That's impossible to do.

Instead, it seems like it should provide data-centric trends based on
objective data and metrics, such as age, sex, race, socio-economic metrics,
housing, charges, peers, gang affiliations, state, city, ZIP, block,
historical recidivism, charge severity, marital status, child status, and any
other obtainable piece of data. From this, you would be able to generate
concrete statistical evidence which could be used to supplement, not replace,
the standard factors considered by the Court. This data could be used by
prosecution or defense to counteract any intentional bias introduced by those
running the numbers.

Even then, however, this data would be generalized, treated as a predictor,
and applied as prophecy to the particular defendant in question. It seems like
it should be inadmissible.

~~~
olliej
Here's the problem with including "age, sex, race, socio-economic..." as
/objective/ data: it literally repeats existing stereotypes and biases.

There's plenty of evidence that there is a huge amount of bias due to race,
that being poor means you can't afford good lawyers, etc. So the information
that feeds into the algorithm is going to be something like "<X> currently
receives longer sentences", which leads it to just repeat the existing
behaviour -- but now it comes from a machine and so claims to be unbiased, so
judges defer to it.

I feel that as a defendant I should be able to ask them to re-run the
algorithm with a changed race and a changed zip code so we can get a
comparison and a measure of effective algorithmic bias.
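
Here's a sketch of that audit run in aggregate rather than for a single defendant (`score` is a hypothetical handle to the black-box model, and the swapped values are placeholders):

    def audit_bias(score, defendants, swaps):
        """Mean score shift when the swapped fields are the only change."""
        deltas = [score({**d, **swaps}) - score(d) for d in defendants]
        return sum(deltas) / len(deltas)

    # Hypothetical usage: a mean shift far from zero is a direct measure of
    # effective algorithmic bias attributable to race and ZIP code.
    # shift = audit_bias(model, docket, {"race": "white", "zip": "94105"})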

~~~
iplaw
My suggestion was to 1) not use any data or algorithms, or 2) use an objective
algorithm to figure out if poor white males with criminal records but with a
new child are likely to be repeat violent offenders.

I did not suggest to use a single variable, nor did I suggest that it be used
to determine guilt, explicitly dictate sentencing, etc. If there is going to
be any algorithmic "fact" finding, it should involve plugging in all available
data for a defendant and trending it against all available data. Not cherry
picking.

