
Impossibly Hungry Judges - dnetesn
http://nautil.us/blog/impossibly-hungry-judges
======
program_whiz
Here is the refutation being referred to in the article. The article's author did a poor job explaining it, and the reason to reject isn't "the effect is too large" -- the effect in the data is real. This study explains what actually drives it:
[http://journal.sjdm.org/16/16823/jdm16823.html](http://journal.sjdm.org/16/16823/jdm16823.html)

1. Prisoners are ordered by whether they have an attorney (so those going last in a session are self-represented)

2. Judges often order prisoners by "complexity of the case", so the ones taking the most time come first (and are therefore most likely to get a favorable decision; a short/easy case is probably a no-parole situation)

3. Statistically, the cases that are granted parole take longer than those that aren't, so even if cases are randomly ordered, one that falls at the end of a session gets pushed back to the beginning of the next. A simulation shows that even with random ordering, you still get the same graph (because the long/complex cases that could be paroled tend to be moved to the next session when the current one is almost out of time).

So after reading this, what the graph actually shows is the ratio of cases with a lawyer that can be finished in the time remaining in the session.

Both values decrease as the session continues, producing a steeply down-sloping graph that resets at the start of each session to roughly the same value (.65).
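The mechanism above can be sketched with a toy simulation (all the parameters below are made up for illustration, not taken from the paper): cases arrive in completely random order, favorable cases take longer to hear, and a case that doesn't fit in the remaining session time gets bumped to the start of the next session. The favorable rate still starts high and falls within each session, with no judge psychology anywhere in the model:

```python
import random

random.seed(0)
SESSION_MINUTES = 180    # hypothetical session time budget
P_FAVORABLE = 0.4        # hypothetical base rate of favorable rulings
N_CASES = 20000

def duration(favorable):
    # Key assumption from the critique: favorable (complex) cases
    # take much longer to hear than quick denials.
    mins = random.gauss(25, 5) if favorable else random.gauss(8, 2)
    return max(1.0, mins)

# Build a docket in *random* order.
cases = []
for _ in range(N_CASES):
    fav = random.random() < P_FAVORABLE
    cases.append((fav, duration(fav)))

slot_stats = {}                  # position within session -> (favorable, total)
remaining, slot = SESSION_MINUTES, 0
for fav, mins in cases:
    if mins > remaining:         # doesn't fit: heard first in the next session
        remaining, slot = SESSION_MINUTES, 0
    f, t = slot_stats.get(slot, (0, 0))
    slot_stats[slot] = (f + fav, t + 1)
    remaining -= mins
    slot += 1

# Favorable rate by position within a session: high right after a
# break, falling toward the end, despite the random ordering.
for s in range(10):
    f, t = slot_stats[s]
    print(f"slot {s}: {f / t:.2f}")
```

The session opener skews favorable because it is usually the long case that didn't fit at the end of the previous session, and late slots skew unfavorable because only short (likely no-parole) cases still fit in the remaining time.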

~~~
logfromblammo
I don't remember much from my operating systems class, but I do remember that
"shortest job first" is a more efficient scheduling algorithm than "longest
job first".

I cannot fathom why anyone would want to schedule 12 cases that will likely
take 15-30 minutes each after the one that will almost certainly take more
than 4 hours. I understand that type of scheduling in hospitals, because the
complex, time-consuming case is probably the guy who will die if not treated
_immediately_, while the simple cases can afford to wait, but no one is going
to die or go broke if their court appearance gets pushed back a week because
too many "easy" cases suddenly popped up and pre-empted it.
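For what it's worth, the classic scheduling argument fits in a few lines (the durations below are invented to mirror the comment's example): shortest-job-first minimizes the average time everyone spends waiting, because the long job only delays itself instead of delaying every job queued behind it.

```python
# Toy docket: several short hearings and one very long one (minutes).
durations = [15, 20, 25, 30, 240]

def mean_wait(order):
    """Average time a case waits before it is heard."""
    total_wait, elapsed = 0, 0
    for d in order:
        total_wait += elapsed   # this case waited for all earlier ones
        elapsed += d
    return total_wait / len(order)

sjf = mean_wait(sorted(durations))                  # shortest job first
ljf = mean_wait(sorted(durations, reverse=True))    # longest job first
print(sjf, ljf)   # SJF: 40.0 minutes average wait; LJF: 224.0
```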

~~~
Bartweiss
The big thing missing from that assessment is case _variability_.

The clear-cut case with a plea bargain already assessed is short, but also
predictable. If you schedule it at 11:50 AM, you'll still go to lunch at
12:00. The case that's going to take 16 different motions with unclear
evidence isn't just long, it's unpredictable. Even if such cases take an
average of one hour, you don't want to put them at 11:00 AM.

Shorter tasks are also easier to reschedule. Pushing three short cases from
before lunch to after is much lower-impact than interrupting one big case
midway through.

So I don't think your assessment is wrong, but there are other factors deciding this.

~~~
Retric
You impact fewer people by delaying one long case vs. five short ones. But you're also likely to forget details and/or spend lunch thinking about a case that's still open, which IMO is a stronger motivation.

------
jakewins
A lot of replies in here seem to think the article is arguing that the _data_
be disregarded. My reading is that the author is saying that _because_ of the
data, the conclusion that the cause is purely psychological must be
disregarded.

The argument for that being, partially, that if the effect is caused by hunger
that would predict widespread issues elsewhere in society that we do not see
and, partially, that there are no other psychological effects this strong.

Another hypothesis is needed. For instance, anyone who has spent time in a
courtroom knows judges like to get small simple cases done first to get people
on about their day - perhaps the effect is because more complex cases
scheduled towards the end of a shift at the bench are more likely to get a
negative ruling?

~~~
exclusiv
From the article: "If hunger had an effect on our mental resources of this
magnitude, our society would fall into minor chaos every day at 11:45 a.m."

This looks compelling at first but how many people are really making any
considerable decisions on an empty stomach before lunch? Everyone at my
company sets up meetings earlier (9 or 10am) or after lunch because they know
how they are when they are hungry and/or thinking about their lunch break. How
many people are actually subject to this effect on a given day and how many
are making important decisions at that time?

There are a lot of aspects to our daily life that could be under this effect
without us observing them individually or as a whole as "minor chaos".

Automobile drivers could be more aggressive and dangerous when hungry. "the
highest percentage of them (crashes) occurring between 6-9pm -- evening rush
hour. Commuters rush home daily to eat, spend time with their families, watch
television and/or get to work on a second job." [1] and "Nationwide, 49% of
fatal crashes happen at night, with a fatality rate per mile of travel about
three times as high as daytime hours." [2]

Perhaps this effect is causing more automobile crashes but as individuals we
chalk it up to people just wanting to get home under their own free will
(because we all want to get home after work for a variety of reasons). Or
we've just become accustomed to how evening rush hour is and we don't observe
it as a more chaotic event affected by psychology.

Separately, why is "minor chaos" necessary to prove the data? Those are just the author's parameters. Even if what the judges do when hungry is messed up, it doesn't cause minor chaos, and they're still making substantial decisions. And the aggregation of this effect could amount to minor chaos that is nonetheless not readily observable, contrary to what the author suggests.

So to the author's key argument about minor chaos: if this effect has always been around, then the chaos may be baked into what we consider normal daily life, neither attributable nor observable even though it's there. Or the chaos might not exist at all, because the number of people truly affected on a given day is very small.

[1] [https://coverhound.com/blog/post/what-time-of-day-do-most-accidents-occur](https://coverhound.com/blog/post/what-time-of-day-do-most-accidents-occur)

[2] [https://www.forbes.com/2009/01/21/car-accident-times-forbeslife-cx_he_0121driving.html](https://www.forbes.com/2009/01/21/car-accident-times-forbeslife-cx_he_0121driving.html)

~~~
cakedoggie
> How many people are actually subject to this effect on a given day and how
> many are making important decisions at that time?

That is the point: this effect is so large that even a small number of people making a small number of decisions over a long period of time would add up to a huge aggregate effect.

~~~
BearGoesChirp
Maybe it does. How many 2pm meetings are held to correct for 11am meetings? The thing is, most systems have a lot of checks and balances built in. Even if a decision is made, it is quite easy to revisit it hours later; even days later it is often pretty easy to revise.

It would be better to look in areas where decisions are irreversible and
consequences are immediate instead of society in general.

------
lubujackson
This feels like half good science (be skeptical of outlandish results) and half bad science (it can't be true! so... dismiss it?). Humans think in narratives, which is why every time the stock market goes down 3 points it's because "investors are nervous about Syria", then when it rebounds it's "investors unfazed by Syria" (or whatever).

The scientific approach should be to come up with a plausible narrative and
then do everything possible to discredit it. If the narrative survives, it is
likely accurate. This hungry-judges narrative seems inaccurate for a lot of reasons, the double dip being the easiest grounds for dismissal. But it does still raise the question: why this pattern? Are cases staggered in a specific
way? How many cases were looked at? Does this pattern still hold and could we
sit in on a case and see it happen in realtime?

~~~
Symmetry
As scientists in a field, dismissing an outlandish result out of hand is horribly dangerous. As lay consumers of science who don't have the time to double-check every paper we hear about, we should absolutely ignore counterintuitive papers until such time as they're replicated and accepted by the relevant scientific community.

~~~
Retra
Dismissing a result doesn't mean it won't come back later with better data and
modelling. It's really not that dangerous.

------
thefalcon
A) Cases are not presented in a random order. B) Judges attempt to complete all cases from a given prison before taking a break. C) The study groups together "deferred" and "not granted parole"; it's possible "deferred" cases are pushed to the back of the prison's docket. Also, D) shared representation is common, and in these cases it's possible and likely for the representative (who chooses what order to present his cases to the judge) to present the most promising case first.*

How this at all boils down to "hungry judges" and then gets reported as such
is basically everything wrong with science reporting (which is to say, almost
everything is wrong with science reporting).

* Probably oversimplified from [http://www.pnas.org/content/108/42/E833.full](http://www.pnas.org/content/108/42/E833.full)

------
lukas
I really appreciate the author's critical analysis of this correlation
presented as "fact" by Radiolab and I love how Hacker News and other blogs
take these types of scientific findings and dig in for the truth. I think the
PNAS paper refutes the original conclusion pretty thoroughly - I wish the
Nautilus author would just explain that.

I don't think we should dismiss effects just because they seem really large
(as the Nautilus author claims) but I do think that it's incredibly
irresponsible of Sapolsky and Radiolab to be uncritically citing a study that
looks like it was debunked in 2011.

I also think it's strange that the author cites the SJDM paper, which is much, much less convincing, as the thing that refutes the original experiment. It looks to me like that paper just shows that by simulating a non-random ordering of parole requests you can create data that looks like the original experiment's.

I love that Hacker News posts these things and people go through and analyze
the papers. No one outside of the specialized field could possibly have time
to analyze all of these papers but they clearly have implications that matter
for everyone. I wish that popular science shows would do a more thorough
analysis of these results on their own.

------
roywiggins
For some reason I assumed the title referred to a problem like the "Byzantine
Generals", or "Dining Philosophers." This is much more interesting.

------
DINKDINK
This author misuses a Gaussian distribution by claiming that a standard deviation is outrageous... all for a phenomenon that is overwhelmingly likely NOT to be Gaussian distributed.

~~~
simonster
For anyone who's curious, the procedure used is described in
[http://www.aliquote.org/pub/odds_meta.pdf](http://www.aliquote.org/pub/odds_meta.pdf).
The standardized mean difference is given by:

    
    
      log((65/35)/(5/95))/(pi/sqrt(3)) = 1.9646484930140864
    

The justification for this procedure is that the judges are assumed to make decisions by dichotomizing a continuous variable with logistic-distributed error (this is one statistical justification for logistic regression; see
[https://en.wikipedia.org/wiki/Logistic_regression#Latent_var...](https://en.wikipedia.org/wiki/Logistic_regression#Latent_variable_interpretation)).
The mean difference in the continuous variable is given by the log odds ratio
times some constant, and the standard deviation of the continuous variable is
pi/sqrt(3) times the same constant. Because the logistic distribution
resembles a normal distribution (see
[https://en.wikipedia.org/wiki/Logistic_distribution](https://en.wikipedia.org/wiki/Logistic_distribution)
and figure in paper), the standardized mean difference given by this method
will approximately equal the standardized mean difference of the latent
continuous variable.
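The latent-variable reading can be sanity-checked with a quick simulation (a sketch with illustrative parameters, not anything from the paper): draw logistic-distributed latent scores for two groups whose means differ by d times pi/sqrt(3), dichotomize at zero, and the log odds ratio divided by pi/sqrt(3) recovers d.

```python
import math
import random

random.seed(1)
N = 200_000
d = 1.96                           # target standardized mean difference
scale = math.pi / math.sqrt(3)     # sd of the standard logistic distribution
shift = d * scale                  # mean shift in latent units

def logistic_sample():
    # Inverse-CDF sampling from the standard logistic distribution.
    u = random.random()
    return math.log(u / (1.0 - u))

# Dichotomize latent scores at 0: "favorable" iff latent score > 0.
p1 = sum(logistic_sample() + shift > 0 for _ in range(N)) / N
p0 = sum(logistic_sample() > 0 for _ in range(N)) / N

log_or = math.log((p1 / (1 - p1)) / (p0 / (1 - p0)))
d_hat = log_or / scale
print(d_hat)                       # close to 1.96, up to sampling noise
```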

------
kerkeslager
There are a lot of problems with the conclusions Radiolab and other science
journalism draws from the original study. But this article isn't better: the
author dismisses the results of the study as impossible, because it _seems_
impossible, because we would already know, etc.

~~~
wahern
If I ran a study that said whenever a bell rings at 11:45AM students in public
schools exit their classrooms (99% within 2 minutes) and quickly file into the
cafeteria, and that this was evidence for mass systemic Pavlovian
conditioning, what would you say?

Might you say, "that's an impossibly unnatural conclusion", given everything we accept to be true, including our lived experiences--as students, as human beings?

Of course, it's a _logical_ fallacy to say so. But pure, unadulterated
Aristotelian logic has surprisingly little application even in science. While
as a logical matter such a conclusion on its face might not be impossible,
it's not necessarily hyperbolic to label it so even in scientific discourse.
Not the least because everything we know about Pavlovian responses suggests
that there should be far more outliers, such that there must be at least some
structural component involved, not a purely psychological response. In other
words, the result is too _clean_. The real world is much messier, especially
in the world of human psychology in complex settings, and the odds of seeing
such a clean and consistent response relationship is extremely slim.

Quantum physics suggests that you could spontaneously teleport a kilometer
away. Would I be wrong in believing impossible a paper that concluded that you
spontaneously quantum teleported from home to work? What if the paper fails to
establish--even nominally--the absence of other, less shocking, explanations?

Logically speaking, the teleportation might not be impossible, but in most other contexts (including statistics and other forms and methods of reasoning) one might fairly call the conclusion impossible, reserving the label "implausible" for scenarios more deserving of critical assessment.

~~~
kerkeslager
> If I ran a study that said whenever a bell rings at 11:45AM students in
> public schools exit their classrooms (99% within 2 minutes) and quickly file
> into the cafeteria, and that this was evidence for mass systemic Pavlovian
> conditioning, what would you say?

> Might you say, "that's an impossibly unnatural conclusion", given everything
> we accept to be true, including our lived experiences--as students, as human
> beings.

No, I would say that there are alternate theories which better explain the
data. And it would not be a logical fallacy to say that.

> Of course, it's a _logical_ fallacy to say so. But pure, unadulterated
> Aristotelian logic has surprisingly little application even in science.
> While as a logical matter such a conclusion on its face might not be
> impossible, it's not necessarily hyperbolic to label it so even in
> scientific discourse. Not the least because everything we know about
> Pavlovian responses suggests that there should be far more outliers, such
> that there must be at least some structural component involved, not a purely
> psychological response. In other words, the result is too _clean_. The real
> world is much messier, especially in the world of human psychology in
> complex settings, and the odds of seeing such a clean and consistent
> response relationship is extremely slim.

The problem with the "argument from too-clean" is that clean data occurs all
the time. If you drop a ball in my living room it will fall down 100% of the
time. Are we to disbelieve these results because they are too clean? Obviously
not.

Look at your analogy. Obviously the migration to the lunch rooms isn't caused
by Pavlovian response (at least not exclusively) but that's not because the
data is wrong: the students _do_ migrate to the lunch room at the described
time. If you're claiming this data is too clean to be believable, then you're claiming that the students _don't_ migrate to the lunch room in consistent numbers. This is just as wrong as the Pavlovian response theory.

------
coldtea
> _But I want to take a different approach in this blog. I think we should
> dismiss this finding, simply because it is impossible. When we interpret how
> impossibly large the effect size is, anyone with even a modest understanding
> of psychology should be able to conclude that it is impossible that this
> data pattern is caused by a psychological mechanism._

That's not how science works.

~~~
cwyers
Isn't it? The answer wasn't that judges were making decisions based on their hunger; it was that judges arrange their schedules in some non-random way. If a biologist working with samples gets a reading that can't be
explained by a natural process but can be explained by a miscalibrated piece
of equipment or a test tube that wasn't properly washed, it absolutely is how
science works for the problem to be identified and the labwork to be run
correctly, rather than publishing a result based on bad data. It's the same
here; if the magnitude of the effect you find is at least an order of
magnitude higher than any other study of similar effects, before publishing
the finding you should have an airtight case that what you found is the effect
you're measuring, not a data quality problem. They couldn't have an airtight
case because their findings weren't driven by what they thought they were
measuring.

------
rtpg
> If hunger had an effect on our mental resources of this magnitude, our
> society would fall into minor chaos every day at 11:45 a.m. Or at the very
> least, our society would have organized itself around this incredibly strong
> effect of mental depletion.

Kind of pedantic, but we do indeed organize society around lunchtime.

------
erikb
It would be great if this kind of criticism started by just writing down what the author sees in the diagram. What I see, without any training, looks totally normal, as if it strongly validates the claim.

------
jimmywanger
> But I want to take a different approach in this blog. I think we should
> dismiss this finding, simply because it is impossible.

That is truly begging the question. He goes into details about what the impact
to society would be if we're that affected by hunger/fatigue, but the validity
of the finding still remains.

> It is up to authors to interpret the effect size in their study, and to show
> the mechanism through which an effect that is impossibly large, becomes
> plausible. Without such an explanation, the finding should simply be
> dismissed.

Again, horrible circular logic here. Simply disregarding things because
they're "impossible" is the antithesis of science. The authors of the paper
gave a hypothesis that explains the (undisputed) data.

You should not simply dismiss this by just saying "it's not true because it can't possibly be true". For instance, try explaining the germ theory of disease to somebody before Antonie van Leeuwenhoek invented the microscope.

~~~
nerdponx
_Based on this data, the difference between the height of 21-year-old men and
women in The Netherlands is approximately 13 centimeters. That is a Cohen’s d
of 2. That’s the effect size in the hungry judges study.

If hunger had an effect on our mental resources of this magnitude, our society
would fall into minor chaos every day at 11:45 a.m. Or at the very least, our
society would have organized itself around this incredibly strong effect of
mental depletion. Just like manufacturers take size differences between men
and women into account when producing items such as golf clubs or watches, we
would stop teaching in the time before lunch, doctors would not schedule
surgery, and driving before lunch would be illegal. If a psychological effect
is this big, we don’t need to discover it and publish it in a scientific
journal—you would already know it exists. Sort of how the “after lunch dip” is
a strong and replicable finding that you can feel yourself (and that, as it
happens, is directly in conflict with the finding that judges perform better
immediately after lunch—surprisingly, the authors don’t discuss the after
lunch dip)._

It is impossible because we have very strong evidence against it.

~~~
ballenf
The flaw is that the leaps from decisions in difficult cases to dangerous driving or other behavior are wild guesses. He sets up straw-man versions of what the impact of the evidence should be, then rebuts them.

The reality is that the same psychological state that makes a judge less
lenient might also make us better drivers, not worse.

~~~
nerdponx
You're missing the point. The point is that the effect sizes in question are large enough that people should notice them in their day-to-day lives, because the proposed mechanism of action should apply not only to judges but to everyone, everywhere. Yet we don't see this big, obvious effect anywhere else.

