
Impossibly Hungry Judges (2017) - laurex
http://m.nautil.us/blog/impossibly-hungry-judges
======
mark212
This is a ridiculous study because it implies that the order in which cases
are heard is random. That is, a "hard" case is just as likely to be
first in the morning session as it is to be fifth in an afternoon session, or
vice versa.

The data is easily explained as judges taking clear cases of leniency first,
then doing the tough ones (that could go either way), and ruling on the clear
cases of parole denials later in the session. The judges have hopefully done
their homework in advance and have read the case files and petitions, so they
come in with at least a rough sense for which cases are in each bucket.
There's likely a significant degree of agreement on that metric among the
members of the panel, so it's not surprising that cases are heard in that
order: clear grants, ones that require discussion, and clear denials.

The lesson here, it seems to me, is that when the evidence shows a result this
large, you should take a really hard look at your study before you publish it.

~~~
claytonjy
This is exactly what the criticism linked in TFA explores:
[http://www.pnas.org/content/108/42/E833.long](http://www.pnas.org/content/108/42/E833.long)

A related problem here is that even though the original study was bad for many
reasons, it continues to be cited much more than the refutation.

~~~
99052882514569
And mentioned in podcasts by people[1] who really, seriously, obviously should
know much better (based on how famous and otherwise competent they seem to
be).

[1]
[https://en.wikipedia.org/wiki/Robert_Sapolsky](https://en.wikipedia.org/wiki/Robert_Sapolsky)

~~~
oh_sigh
If we could stop quoting this, and the baloney implicit bias study, we would
all be so much better off.

~~~
jfz
Can you post a link to the implicit bias study you're talking about and a
refutation?

------
Waterluvian
I did a bit of science as an undergrad and a bit more in grad school. I
learned a few things, chiefly that I suck at statistics and won't make a good
formal scientist. But the other main takeaway was that when results look
strikingly obvious and "beautiful," they're usually just wrong. I did GIS, so
the statistical data was usually in the form of big colourful maps. And when
those maps told such an obvious story, they were really telling me that I
messed something up in my processing.

Not that this is universally true, but it's what I found to be pretty common
over the years.

~~~
nonbel
>"I suck at statistics and won't make a good formal scientist"

Most really good science got done before statistics was even a thing (it only
became standard in the 1930s-1940s). Stats has very little to do with being a
good scientist.

~~~
whatshisface
Depends on the field. Now that we all realize that you can't actually conclude
that you've found something until you've thought about how likely your result
was to occur randomly, only a few stragglers and the "lucky ones" where effect
sizes are gigantic (you either land on the moon or you don't) can get away
without it.

~~~
nonbel
>"how likely your result was to occur randomly"?

What does random mean? It just means you didn't include every possible
influence in your mathematical model of what's going on. Instead you used some
distribution of "noise" as an approximation to whatever was really going on.

E.g., if you knew everything about the airflow, gravitational, electrical,
etc. forces on a coin throughout a flip, you could predict which side it will
land on with near-100% accuracy. Most of the time we don't have that info
available, so we use the binomial distribution as an approximation.
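
A toy sketch of that idea (hypothetical numbers, just to make the point
concrete): with no physical model of the flip, the unmodeled forces get lumped
into a coin that is "random" with p = 0.5.

    import random
    random.seed(1)

    # No physics here: the unmodeled airflow, spin, etc. is summarized as a
    # Bernoulli(0.5) coin, so the heads count follows a binomial distribution.
    n = 10_000
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(heads)  # ~5000, with a spread of about sqrt(n * 0.5 * 0.5), i.e. ~50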

>"where effect sizes are gigantic"

Most science has nothing to do with "effect sizes", in fact I'd say concern
about "effect sizes" is indicative of very rudimentary science. Advanced
science is concerned with making precise and accurate predictions.

~~~
whatshisface
> _What does random mean? It just means you didn't include every possible
> influence in your mathematical model of what's going on._

In physics, it is fundamentally impossible to predict the outcome of wave
function collapses/pick your favorite interpretation's verb. CERN doesn't do
statistics on their histograms because they don't have good models...

That aside, does the difference between a lack of knowledge and a lack of
predictability really matter? Either way all you can do is calculate the odds.

> _Most science has nothing to do with "effect sizes",_

True, that language is not used everywhere. However if your department deals
with signals that are hard to distinguish from background then you have some
way of expressing that idea in your vernacular, whatever it is.

~~~
nonbel
>"In physics, it is fundamentally impossible to predict the outcome of wave
function collapses/pick your favorite interpretation's verb."

This is one interpretation of quantum mechanics. Plenty of physics is done
without any consideration of this.

>"CERN doesn't do statistics on their histograms because they don't have good
models..."

I believe the histograms you are referring to were done on the data, not on
the predictions. And afaik they actually do lack a good model of the Higgs
boson mass, since they never predicted an exact mass.

>"However if your department deals with signals that are hard to distinguish
from background then you have some way of expressing that idea in your
vernacular, whatever it is."

Distinguishing signal from background is the same thing; it's rudimentary
stuff you do when you can't predict exactly what you are looking for. The key
difference is that in advanced science you check for deviations from the model
you believe is correct. No one at CERN believed that the Higgs-less standard
model they used for the background reflected reality.

~~~
whatshisface
There are zero valid interpretations of quantum mechanics where you can tell
me which particles will come out of an event, even in principle. It seems
silly to suggest that describing your certainty in your results is
incompatible with "advanced science," when the reason you're doing it is
because of a direct law of nature. The Higgs boson might have never been
observed - there is a certain chance that it was just lucky photons all along.

If you are in a field where you don't need statistics, it is not because they
aren't at play, but instead because they all have -9999 in the exponent and
you have implicitly decided not to worry about them.

~~~
nonbel
>"There are zero valid interpretations of quantum mechanics where you can tell
me which particles will come out of an event, even in principle. It seems
silly to suggest that describing your certainty in your results is
incompatible with "advanced science," when the reason you're doing it is
because of a direct law of nature."

There is no fundamental uncertainty in GR predictions. There is none in the
Stefan-Boltzmann law, etc. In many cases whatever QM uncertainty exists will
be negligible next to measurement error and so is not considered. Acting like
QM uncertainty has anything to do with 99% of physics is nonsense.

~~~
whatshisface
> _In many cases whatever QM uncertainty exists will be negligible next to
> measurement error and so is not considered._

If you are in a field where you don't need statistics, it is not because they
aren't at play, but instead because they all have -9999 in the exponent and
you have implicitly decided not to worry about them.

~~~
nonbel
>"a field where you don't need statistics"

I feel some topics are getting confused here.

Now you seem to be saying "a field that doesn't need statistics" _only_
doesn't need it because the error due to QM is negligible compared to other
sources. How do you account for all the advances before QM _and_ statistics?

~~~
whatshisface
If you have predictively modeled an error source you can subtract it, and then
you're left with error sources that you have not modeled. If this is repeated
indefinitely you will end up with transistor noise and quantum effects which
cannot be modeled. As a result you will always have some aspects of your
experiment that must be dealt with statistically. If a field is getting by
without expecting its contributors to do statistics it is either a straggler
before we realized this aspect of rigor or it is a case where the effects are
so large that a nonrigorous detection of them works well enough.

Discoveries made before the introduction of error analysis turned out to
either be lucky or wrong - a pattern repeated every time we all realize that
something was being overlooked. It's worth pointing out that the astronomers
of yore probably knew that their measurements weren't perfect and that
averaging them was a good idea, which counts as statistics.

~~~
nonbel
>"If a field is getting by without expecting its contributors to do statistics
it is either a straggler before we realized this aspect of rigor or it is a
case where the effects are so large that a nonrigorous detection of them works
well enough."

It makes no sense to be concerned about sources of prediction error that are
orders of magnitude smaller than your measurement error. Such concerns would
only waste time and money for no benefit. This has nothing to do with
"straggling" or expected "effect size".

~~~
whatshisface
It sure sounds like you're describing large effect sizes to me. The size of
the effect you're measuring is large compared to signal uncertainty, and
incidentally your systematics are also high.

~~~
nonbel
So you want to use "effect size" to mean "deviation from prediction"?

~~~
whatshisface
You compare the effect sizes with the measurement uncertainty. For example if
I wanted to conclude that a solenoid worked I could correlate piston position
with applied voltage, and I might not need to worry about statistics because
the motion of the piston is very large compared to the uncertainty in my
implicit visual measurement of its movement. Admittedly that's not how anybody
thinks intuitively, but it is correct.
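
To make that comparison concrete, here is a minimal sketch (all numbers
hypothetical) of the implicit calculation: the piston's travel dwarfs the
eyeball measurement noise, so the effect-to-noise ratio is enormous and no
formal test is needed.

    import statistics

    # Hypothetical piston positions in mm: voltage off vs. voltage on
    off = [0.2, -0.1, 0.4, 0.0, -0.3]      # scatter around 0 mm (measurement noise)
    on  = [49.8, 50.3, 50.1, 49.6, 50.2]   # scatter around 50 mm

    noise_sd = statistics.pstdev(off + [x - 50 for x in on])
    d = (statistics.mean(on) - statistics.mean(off)) / noise_sd
    print(round(d))  # on the order of 100+: the effect is obvious without statistics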

~~~
nonbel
>"For example if I wanted to conclude that a solenoid worked I could correlate
piston position with applied voltage"

No one would do this; it's once again a "rudimentary science" behavior. People
have figured out what we need to know to get the exact relationship between
voltage and position.

------
thesausageking
Here's a good explanation of the flaws in the original study:
[http://journal.sjdm.org/16/16823/jdm16823.html](http://journal.sjdm.org/16/16823/jdm16823.html)

It's primarily due to "a statistical artifact resulting from favorable rulings
taking longer than unfavorable ones".
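
As a rough illustration of how that kind of artifact can arise (a toy sketch
with made-up durations, not a re-implementation of the linked analysis): if
favorable rulings take longer, and a session ends when the next case no longer
fits, then cases heard late in a session skew unfavorable even though the
underlying favorable rate never changes.

    import random
    random.seed(0)

    FAVORABLE_MIN, UNFAVORABLE_MIN = 15, 5   # assumed ruling durations (minutes)
    SESSION_MIN = 90                         # assumed session length
    by_position = {}                         # ordinal position -> list of outcomes

    for _ in range(20_000):                  # simulate many sessions
        elapsed, position = 0, 0
        while True:
            favorable = random.random() < 0.5       # true rate is a constant 50%
            duration = FAVORABLE_MIN if favorable else UNFAVORABLE_MIN
            if elapsed + duration > SESSION_MIN:    # long (favorable) cases stop
                break                               # fitting earlier in the session
            by_position.setdefault(position, []).append(favorable)
            elapsed += duration
            position += 1

    for pos in sorted(by_position):
        outcomes = by_position[pos]
        print(pos, round(sum(outcomes) / len(outcomes), 2))
    # The favorable share drifts below 0.5 at later positions: a selection
    # artifact, not a hunger effect.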

~~~
baybal2
Not much of an artifact - during my BLAW101 class I was told that in North
America judges really don't like people intentionally dragging out cases, but
can't do much about it other than drop hints amounting to a white flag -
something along the lines of "you've babbled for 3 hours; we surely have
enough argument from you that the worst-case ruling is now out of the
question."

------
quotemstr
Am I the only one who feels sharper and more adept while somewhat hungry? Food
makes me lethargic. I've never been sympathetic to the idea that "you should
always eat breakfast!" If possible, I'd prefer to skip breakfast _and_ lunch.

My idea: our ancestors were opportunistic savanna hunters. Why would we be
unable to maintain homeostasis with respect to cognition even without food?
Our ancestors would have needed their best cognition when they lacked food!

~~~
gowld
Optimal alertness/health is at a point between "paralyzed with starvation" and
"paralyzed by overstuffedness". What you are calling "somewhat hungry" may be
better stated as "full, but not overstuffed".

To maintain a consistent energy level, 5+ small snacks is better than 3 large
meals.

~~~
air7
I've always found it amusing that there is no word in English meaning "not
hungry". There's only "full". Such a word exists in many languages alongside
"full", and there's a clear distinction between the two.

~~~
quotemstr
Satiated?

------
olooney
"The effect size is too large" seems like a weak heuristic to me. Feynman
talks about the kind of problems such a heuristic can cause:

> It's interesting to look at the history of measurements of the charge of an
> electron, after Millikan. If you plot them as a function of time, you find
> that one is a little bit bigger than Millikan's, and the next one's a little
> bit bigger than that, and the next one's a little bit bigger than that,
> until finally they settle down to a number which is higher.

> Why didn't they discover the new number was higher right away? It's a thing
> that scientists are ashamed of—this history—because it's apparent that
> people did things like this: When they got a number that was too high above
> Millikan's, they thought something must be wrong—and they would look for and
> find a reason why something might be wrong. When they got a number close to
> Millikan's value they didn't look so hard. And so they eliminated the
> numbers that were too far off, and did other things like that...

[https://hsm.stackexchange.com/questions/264/timeline-of-measurements-of-the-electrons-charge](https://hsm.stackexchange.com/questions/264/timeline-of-measurements-of-the-electrons-charge)

A more theoretical argument is as follows. Suppose we systematically remove
from analysis any data point which is more than 1.5 times the IQR below the
first quartile or above the third quartile
([https://en.wikipedia.org/wiki/Outlier#Tukey's_fences](https://en.wikipedia.org/wiki/Outlier#Tukey's_fences)).
Points censored by such a rule are often called "outliers," but because we do
not yet know the true underlying probability distribution this terminology is
misleading. Our estimates for skew, kurtosis, and all higher moments will then
be biased downward, and furthermore the censored sample is more likely to
"pass" normality tests such as Shapiro-Wilk, which can lead us to apply
inappropriate models. But what is the interpretation of such a model when
applied to a new observation? If the data point is well within the Tukey
fences, then we can use the model to make a valid prediction. However, if it
is outside the fences, we can say nothing. What then is the population mean of
the predicted value? It is the weighted mean of the values for those data
points for which our model is valid and those which lie outside of it. But
those that lie outside of it can be enormously influential.
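
A quick sketch of that bias (a toy example using an assumed heavy-tailed
sample rather than any particular dataset): censoring at Tukey's fences pulls
the sample kurtosis down and makes a normality test much easier to "pass".

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    x = rng.standard_t(df=3, size=500)        # heavy-tailed, decidedly non-normal

    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    kept = x[(x >= q1 - 1.5 * iqr) & (x <= q3 + 1.5 * iqr)]   # Tukey's fences

    for label, data in (("raw", x), ("censored", kept)):
        print(label,
              "excess kurtosis:", round(float(stats.kurtosis(data)), 2),
              "Shapiro-Wilk p:", round(float(stats.shapiro(data).pvalue), 4))
    # The censored sample looks far more "normal" than the process that generated it.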

An example can make this concrete. Suppose an actuary estimates the amount of
insurance payouts for house insurance. 99% of the time the payouts are small,
a few hundred or a few thousand dollars. But 1% of the time they are much
larger, perhaps $500,000 or more. If the actuary applies Tukey's fences to his
data and fits a model, he may infer that 1% of policies pay out each year, and
that those payouts average $1,000. So he sets the price of the insurance at
$20/year and makes a 100% profit margin. Until the first house burns down, and
his company goes bankrupt.
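
A back-of-the-envelope version of that example (using the rough numbers
above):

    # Assumed: 1% of policies pay out each year; of those payouts,
    # 99% are about $1,000 and 1% are about $500,000.
    p_payout = 0.01
    small, large = 1_000, 500_000

    censored_cost = p_payout * small                      # tail removed by Tukey's fences
    true_cost = p_payout * (0.99 * small + 0.01 * large)  # tail included
    print(censored_cost, true_cost)  # ~$10 vs ~$60 expected payout per policy per year
    # A $20/year premium looks like a 100% margin under the censored model, but
    # loses roughly $40 per policy per year once rare large payouts are counted.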

The point of this is merely to illustrate the dangers of ignoring any effect
which simply seems "too large." The beneficial effects of penicillin are
"impossibly large" - would we reject the evidence on those grounds? The change
in resistance of a semiconductor in the presence of an electric field seems
"impossibly large" until the solid-state physics is carefully analyzed - would
we reject this clear and highly reproducible evidence of a qualitative change
in behavior because it is _too_ evident?

A much better approach is to scrutinize the methodology for, say, omitted
variables. This is what the more detailed criticisms he cites in the third
paragraph do. Or to attempt to reproduce the study, which is perhaps the
strongest approach.

~~~
Bartweiss
The point being made isn't that large effects are impossible, or that high-
_d_ results should be dismissed outright. Rather, it's that for sufficiently
large _d_, subtlety becomes impossible. Results will either be overwhelmingly
obvious to any observer, or false. It's worth giving that list of sample
effects in the article another look. In particular, the finding that juries
usually make decisions matching what the majority of jurors started out
believing has d=1.62. This study claimed that hunger was having a bigger
effect on trial outcomes than whether people believed the charges were true.
It was still worth looking for the confounding effect, certainly, but when
people start making claims that far out of proportion to comparable findings,
simply refusing to accept the causative claim is an entirely reasonable
decision.

The effects of penicillin are overwhelmingly obvious: for some set of
diseases, penicillin produces rapid and complete recovery even in otherwise-
terminal patients. I don't know the _d_ for "penicillin treats staph", but in
pharmacology _d=5_ isn't considered unreasonable for precisely these reasons.

The average _d_ in psychology, though, is around 0.4 - and that was before
many of the 'strongest' results fell apart in recent years. This study's
effect size wasn't inherently insane, but it was claiming confidence in the
realm of "antibiotics kill bacteria" for a subtle and previously unobserved
force on judges' decision-making. It absolutely should have raised questions
like "if this effect holds, why aren't trial lawyers acutely aware of it and
working to manipulate it?"
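
For intuition about those numbers (a standard conversion, assuming two normal
distributions with equal variance): Cohen's d maps to the chance that a random
member of one group scores above a random member of the other.

    from math import sqrt
    from statistics import NormalDist

    def prob_superiority(d: float) -> float:
        # P(X > Y) for two equal-variance normal groups separated by d standard deviations
        return NormalDist().cdf(d / sqrt(2))

    for d in (0.4, 1.62, 5.0):   # typical psychology, jury-majority finding, pharmacology
        print(d, round(prob_superiority(d), 3))
    # ~0.611, 0.874, 1.000 - at d = 1.62 the effect is already hard to miss in daily life.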

~~~
olooney
What you are saying is reasonable but it is not what the author proposed. This
is your rephrasing:

> The point being made isn't that large effects are impossible, or that high-d
> results should be dismissed outright.

But what the author actually says in the concluding paragraph is that certain
results _are_ to be dismissed outright and moreover the burden of proposing a
plausible mechanism is wholly on the experimenter:

> Implausibility is not a reason to completely dismiss empirical findings, but
> impossibility is. It is up to authors to interpret the effect size in their
> study, and to show the mechanism through which an effect that is impossibly
> large, becomes plausible. Without such an explanation, the finding should
> simply be dismissed.

He does not say, "other experimenters should attempt to reproduce the result"
or "theoreticians should explore the phenomena mathematically" but only "the
finding should simply be dismissed." Dismissed! Note the stark contrast: while
you emphasize that such results should _not_ be "dismissed outright," that is
_exactly_ what the author calls for! No follow up studies, no novel
theoretical models proposed, just blanket dismissal; furthermore, in the
author's own words, the test used to decide which results get dismissed is
"implausible" vs. "impossible."

But who can so finely demarcate the implausible from the impossible? Certainly
not the contemporary scientists, who in every age we find railing against
newly discovered theories and inconvenient facts. And why would the
experimenter who first notices a phenomenon also be required at once to
provide a satisfying theoretical explanation? The role of the experimenter is
careful lab work. Was the mechanism behind penicillin understood immediately?
Was the quantum physics of the depletion zone fully understood when the
strange electrical properties of doped silicon were first observed? When half
of the neutrinos from the sun go missing, the experimenter must simply attend
to his instruments to ensure he has not made an error, and then throw up his
hands and let the theoreticians get to the bottom of the thing. This is not
the only way science can proceed, but it is an important way: surprising
empirical observations, outside the realm of current theory or even directly
in contradiction to it, are announced decades, or centuries, or in a few cases
(such as ancient supernovas) millennia before they are fully understood.

[https://en.wikipedia.org/wiki/Solar_neutrino_problem](https://en.wikipedia.org/wiki/Solar_neutrino_problem)

[https://en.wikipedia.org/wiki/History_of_supernova_observati...](https://en.wikipedia.org/wiki/History_of_supernova_observation)

If the author had been content with making a specific criticism of this one
study, I would not have objected. The study is almost certainly flawed, and
various good reasons why it is flawed have been discussed. But the author
seeks to elevate this principle to a general maxim (as the concluding
paragraph makes clear), recommending it to one and all as a valid and useful
form of inference. But it is not (in the general case) a valid "form" of
inference, as we see by swapping out the details of this one flawed
psychological study for several others, such as penicillin, the field effect,
or neutrinos. In _those_ cases, arguments of the exact same form would have
resulted in the dismissal of legitimate data. It is often _not_ a good idea to
reject experimental results just because they seem "impossible." Concretely,
we may analyze the author's proposed method as a three-step process:

1. Before the experiment, decide what is possible and what is impossible. (We
fix a hypothesis space. Anything outside of this space is given zero a priori
probability.)

2. We conduct the experiment and observe the outcome.

3. If the outcome of the experiment is impossible, we "dismiss" the result.
(We update no posterior probabilities.)
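
Put in Bayesian terms, step 3 amounts to this: a hypothesis assigned zero
prior probability in step 1 can never be revived by evidence, since Bayes'
rule gives P(H|E) = P(E|H)P(H)/P(E) = 0 whenever P(H) = 0. "Dismissing" the
result is just insisting that the prior was right.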

It is difficult to see how science could have progressed if such a rule were
universally applied. For example, Rutherford was absolutely shocked when a
certain percentage of alpha particles were reflected from a thin sheet of gold
foil back at an angle near 180 degrees:

> "It was quite the most incredible event that has ever happened to me in my
> life. It was almost as incredible as if you fired a 15-inch shell at a piece
> of tissue paper and it came back and hit you."

Yet, because he did not follow the author's advice and dismiss these
incredible observations, the nucleus was discovered.

Kuhn argues that when this situation occurs in the history of science, what
actually happens is that the old theories are discarded and new ones are
proposed and validated until even the "impossible" observations are now
explained. (A paradigm shift.) I would suggest, and I think Kuhn would agree,
that in step #3 we should not discard the observation, but rather discard the
theory used in step #1 that led us to conclude that it was "impossible."

[https://en.wikipedia.org/wiki/Paradigm_shift](https://en.wikipedia.org/wiki/Paradigm_shift)

~~~
soVeryTired
Quoting the article again: "It is up to authors to interpret the effect size
in their study, and to show the mechanism through which an effect that is
impossibly large, becomes plausible. Without such an explanation, the finding
should simply be dismissed."

Rutherford was able to come up with such a mechanism in his work with Niels
Bohr. Atomic theory was quite new when Rutherford made his discovery: in a
sense this was low-hanging fruit. The article's point is that this effect size
is so big that if it were real, any half-decent lawyer would _already know
about it_. Without an explanation for why everyone somehow missed this point, the
claim is pretty much invalid.

I think the right analogy is if someone comes up with evidence for a speed of
light that does not agree with the standard measurement. If the claim is to be
taken seriously, not only do they have to be rigorous in their experiment -
they also need to explain why all other measurements were incorrect. This
stuff happens all the time in science. Experiments are sloppy, and results are
difficult to replicate. The go-to reaction _should_ be to throw out a result,
unless it can be replicated time and time again.

Kuhn's description of a paradigm shift included room for 'problems' that can't
be explained within the confines of normal science. You only start throwing
things out when the evidence against you becomes unassailable.

------
chicob
_«If a psychological effect is this big, we don’t need to discover it and
publish it in a scientific journal—you would already know it exists.»_

On the one hand, shouldn't we? There are a lot of supposed facts that need to
be checked. Also, the size of an effect everybody supposedly knows about still
needs to be measured if, for instance, comparisons between different
populations are to be made at all.

On the other hand, and personally, I find it to be common sense never to have
difficult conversations on an empty stomach.

------
tantalor
[2017]

Previous comments:
[https://news.ycombinator.com/item?id=14701328](https://news.ycombinator.com/item?id=14701328)

------
jellicle
This article, which boils down to "if the data seems unlikely from what we
know beforehand, throw it the hell out and never look at it again", is the
exact opposite of what scientists are taught, and what they are taught - not
this article - is the correct approach.

~~~
claytonjy
This is a very poor and incorrect reading of the article here. The point is
not that the data were unlikely and should be thrown out; it's that if the
effect size reported was accurate, that would affect so many other things so
strongly that the current state of civilization refutes the estimated effect
size from the study.

It's not that the data was poor, but that it and the analysis were incomplete
and ill-considered.

In psychology in particular, large effects will seep into everyday life well
beyond the phenomena being studied, which gives a very useful check on
astronomical effects like those originally reported on judges and time of day.

~~~
MrPrinter
GP says, and I concur, that the scientific method would not be to just say
"this must be impossible, let's ignore it", but to try to disprove the study
(maybe by trying and failing to replicate it, or by designing better studies).

~~~
claytonjy
I think we all agree with that position; the issue is the author of TFA did
not take that position, so GGP was arguing against a strawman.

The original study was disproved, and TFA includes multiple such links near
the top.

~~~
Retric
The study was not disproved, the conclusion was.

The critical part of science is the actual data and methodology. "If you do X,
you will see Y" is what constrains new theories. Conclusions, on the other
hand, carry very little weight.

PS: Sorry pet peeve.

