
AI ‘judge’ can predict court verdicts with 79 per cent accuracy - CapitalistCartr
http://www.telegraph.co.uk/science/2016/10/23/artifically-intelligent-judge-developed-which-can-predict-court/
======
Grue3
There are countries where more than 90% of court cases result in a "guilty"
verdict. My AI judge that always says "guilty" has very high accuracy in these
jurisdictions.

~~~
binalpatel
This is the imbalanced class problem that comes up often in the real world.

You can't optimize for overall accuracy, otherwise you'd have classifiers that
always just predict guilty (or not sick, or not fraudulent, and so on) because
they'd achieve 95%+ accuracy.
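A toy sketch of the point (the numbers are made up, not from the article): with a 95/5 class split, the trivial "always guilty" classifier gets 95% accuracy while having zero recall on the minority class.

```python
# Hypothetical illustration of the imbalanced-class problem: a classifier
# that always predicts the majority class looks accurate but learns nothing.
from collections import Counter

labels = ["guilty"] * 95 + ["not_guilty"] * 5   # made-up imbalanced data
predictions = ["guilty"] * 100                  # the trivial "always guilty" model

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(accuracy)  # 0.95

# Recall on the minority class exposes the problem: it is 0.
true_positives = sum(p == y == "not_guilty" for p, y in zip(predictions, labels))
minority_recall = true_positives / Counter(labels)["not_guilty"]
print(minority_recall)  # 0.0
```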

~~~
hammock
So what is the metric to optimize?

~~~
binalpatel
I use AUC often, and always look at the confusion matrix. Metrics derived from
it (recall, precision) are useful as well.

In the end the probability cutoffs you choose are more often than not based on
business context and costs associated with various actions. For example - if
we're trying to predict whether a customer is going to leave or not, I'd
choose a different threshold based on whether we're mass e-mailing people, or
if we're having customer service reps call people, since the underlying costs
for each action are so different.

In the first case it's fine to cast a wide net, and e-mail lots of people who
we're less certain about, in the second we'd optimize to target people we're
most sure are going to leave, since it costs so much more to reach out to
them.
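The cost-driven threshold idea above can be sketched in a few lines (the scores, threshold values, and names like `churn_probs` are illustrative assumptions, not from the comment):

```python
# Hypothetical sketch: one model score, two probability cutoffs chosen by
# the cost of the downstream action (cheap email vs. expensive phone call).
churn_probs = [0.10, 0.35, 0.55, 0.80, 0.95]  # model scores for 5 customers

EMAIL_THRESHOLD = 0.30  # cheap action: cast a wide net
CALL_THRESHOLD = 0.75   # expensive action: only the most likely churners

email_list = [p for p in churn_probs if p >= EMAIL_THRESHOLD]
call_list = [p for p in churn_probs if p >= CALL_THRESHOLD]

print(len(email_list))  # 4 customers get an email
print(len(call_list))   # 2 customers get a call
```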

------
mtgx
I'm always amused by people who see these headlines like "AI does X with 80%
accuracy" and think that's _high_.

That's _terrible_ accuracy. Imagine this algorithm were then repurposed to
replace the judge (or to augment judges in a way that judges mostly rely on
the tool to make a decision, which is already starting to happen [1]) and
automatically put people in prison - with _only 80% accuracy_. Does that
number still seem high now?

[1] - [https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing](https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing)

~~~
arcanus
I generally agree with your skepticism of % as a meaningful metric for AI
accuracy or technological prowess.

However, do we have any reason to believe that a collection of random,
qualified judges would be able to predict the outcome of a court verdict with
80% accuracy? I would be surprised if judicial decisions are not stochastic.

Also, we should consider a substitution effect. According to a cursory
internet search, 'As of 2012, the pay for federal district judges was $174,000
per year'. If this system is substantially less expensive to purchase and
maintain, then there is an argument to be made that it is more _efficient_,
and provides better value, even with perpetually reduced accuracy.

~~~
tpm
Even if it were cheaper to purchase and maintain, it wouldn't necessarily be
more efficient, because money spent is not the only measure of a justice
system's efficiency. That measure could be, for example, lower criminality:
if a system fails to jail enough criminals to achieve that, it isn't working,
however cheap it is. And false positives (e.g. jailed innocent people) are
especially troubling and can be very expensive in a justice system.

------
ppereira
It should be noted that these predictions are not based on the lawyers'
pleadings, but on the facts and law sections from the judges' decisions.[1]

There is definitely a possibility of bias in the characterization of the
facts. The authors state that for their appellate court, the facts section is
uncontested by the parties and derived from the lower court's findings. Even
so, it is not clear whether the court's reversal rate is high or low. If low,
it could be that the lower court's presentation of the facts was equally
varnished.

With some judges, it is possible to predict the outcome of the case from the
very first sentence laying out the facts and circumstances. In a famous
dissenting judgement by Lord Denning, in which he attempts to save a cricket
field, the judge begins as follows:[2]

    
    
      In summertime village cricket is the delight of everyone. 
      Nearly every village has its own cricket field where the 
      young men play and the old men watch. In the village of 
      Lintz in County Durham they have their own ground, where 
      they have played these last 70 years. They tend it well. 
      The wicket area is well rolled and mown. The outfield is 
      kept short. It has a good club house for the players and 
      seats for the onlookers. The village team play there on 
      Saturdays and Sundays. They belong to a league, competing 
      with the neighbouring villages. On other evenings after 
      work they practise while the light lasts. Yet now after 
      these 70 years a judge of the High Court has ordered that 
      they must not play there any more.
    

Even with this critique, I think that this research is an excellent first
step. It would be great to use pleadings, which are available with effort from
some appellate courts. It would likely be necessary to OCR some PDF files.

I doubt that this textual approach would work for the more technical parts of
the law. Many cases are in areas with very few preceding cases from which to
train an n-gram-based algorithm. While the authors' approach worked for
certain human rights cases, it would likely fail for cases turning on an
obscure tax provision.
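For readers unfamiliar with the term: the surface features such a classifier trains on are just short runs of adjacent words. A minimal sketch (the tokenization here is deliberately simplistic, and the sample sentence is made up):

```python
# Minimal sketch of extracting word n-grams, the kind of surface feature
# an n-gram-based text classifier is trained on.
def ngrams(text: str, n: int) -> list[tuple[str, ...]]:
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

bigrams = ngrams("the applicant alleged a violation", 2)
print(bigrams[0])   # ('the', 'applicant')
print(len(bigrams)) # 4 bigrams from 5 tokens
```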

[1] Original paper:
[https://peerj.com/articles/cs-93/](https://peerj.com/articles/cs-93/)

[2] Miller v Jackson [1977] QB 966, see:
[https://en.wikipedia.org/wiki/Miller_v_Jackson](https://en.wikipedia.org/wiki/Miller_v_Jackson)

------
denzil_correa
Full Paper :
[https://peerj.com/articles/cs-93/](https://peerj.com/articles/cs-93/)

I think the authors report the "Accuracy", but I would be really interested
to see "Precision, Recall and F1" scores for this classification task.
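For reference, all three follow directly from confusion-matrix counts; a minimal sketch (the tp/fp/fn values below are made up for illustration):

```python
# Precision, recall and F1 from hypothetical confusion-matrix counts.
tp, fp, fn = 80, 20, 10  # true positives, false positives, false negatives

precision = tp / (tp + fp)  # of everything flagged, how much was right
recall = tp / (tp + fn)     # of everything real, how much was caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(round(precision, 3), round(recall, 3), round(f1, 3))
```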

------
danvoell
I assume that inputting historical judgements by the judge ruling over the
case would improve the results, but it could also reveal whether bias was
involved.

------
h4nkoslo
It sounds like they effectively built a rhetoric detector.

"The team found that judgements by the European Court of Human Rights are
often based on non-legal facts rather than directly legal arguments,
suggesting that judges are often swayed by moral considerations rather than
simply sticking strictly to the legal framework."

------
dragonwriter
Predicting the outcome of a case from the text of the decision in the case
(even if only the facts and law sections) isn't that impressive, since that
material is selected from all the facts and law that were presented and
considered, to focus on what is relevant to and determinative of the outcome.

------
visarga
With under 600 training examples, I doubt it would generalize well to unseen
data.

------
norswap
Another question is whether a human could have predicted as well or better.
If not, the system does not help with the actually difficult case: predicting
the outcome of a case that isn't clear-cut for a human.

------
raverbashing
This is one case where a score much higher than this value would be
indicative of overfitting.

This is not like classifying pictures into obvious classes; it is something
more complex.

------
thecolorblue
What cases are misses? Did those cases get overturned later?

------
Iv
Compared to which score for a layperson?

