
The GRIM test – a method for evaluating published research - maxharlow
https://medium.com/@jamesheathers/the-grim-test-a-method-for-evaluating-published-research-9a4e5f05e870
======
canjobear
Another simple thing you can test in a paper to see if it is credible is
p-curve and related methods from Uri Simonsohn et al.

[http://www.p-curve.com/](http://www.p-curve.com/)

You just look at the distribution of the p-values that are used to support
the authors' hypotheses. If the significant p-values pile up just below .05
instead of clustering near zero, then something fishy is going on.

------
jldugger
Interesting approach -- get the scientific community to agree on the
mathematical principles first, before anyone specifically is outed as
cheating.

But this article feels like reading a teaser chapter of a bigger story.

> The amount of (toil) required to actually create data like this from scratch
> is (very) nightmarish. It’s a task drastically out of reach of the (foolish
> people) who’d try such a bush league stunt in the first place.

This assumes that all experiments lead to publications. We know there's a
strong publication bias, and that the bias favors positive results, and
dramatically favors unintuitive positive results. Which means you need to find
correlations where none were expected. How many experiments do you need to get
a significant correlation when there is none? Hint: more than one.
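
To put rough numbers on that hint, here is a minimal sketch. It assumes every
experiment is an independent test of a true null at a 0.05 threshold; the
function names are made up for illustration.

```python
def prob_any_false_positive(k: int, alpha: float = 0.05) -> float:
    """Chance that at least one of k independent null experiments is 'significant'."""
    return 1 - (1 - alpha) ** k


def experiments_needed(target_prob: float, alpha: float = 0.05) -> int:
    """Smallest k for which the chance of a false positive reaches target_prob."""
    k = 1
    while prob_any_false_positive(k, alpha) < target_prob:
        k += 1
    return k


if __name__ == "__main__":
    # Expected number of null experiments until the first false positive is 1/alpha.
    print(1 / 0.05)
    print(experiments_needed(0.5))
    print(round(prob_any_false_positive(14), 3))
```

Under those assumptions, 14 quiet attempts already give better than even odds
of one publishable "effect", and on average the first one shows up by
experiment 20.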

It's also worth noting that it wouldn't be difficult to produce a genetic
algorithm using various statistical checks, including this one, as a fitness
function.

~~~
moyix
Just to address your comment about publication bias – we do have techniques
for detecting that across a set of studies on the same subject, namely funnel
plots:

[https://en.wikipedia.org/wiki/Funnel_plot](https://en.wikipedia.org/wiki/Funnel_plot)

~~~
jldugger
Wasn't implying we don't. Just that the calculus of data fraud is weighted
more towards fraud than the article suggests.

------
bayesian_horse
I don't agree with the idea that faking data is more difficult than running
the experiment. A lot of courses in universities now teach R or something
similar to run statistics. Relatively simple "monte carlo" simulations would
provide results which satisfy the GRIM test.
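
A rough sketch of that point, in Python rather than R. It assumes the
responses are whole numbers on a 1-7 scale and that the mean is reported to
two decimals; `grim_consistent` and `simulate_reported_mean` are illustrative
names, not anything from the article.

```python
import math
import random


def grim_consistent(reported_mean: str, n: int) -> bool:
    """True if some integer total divided by n rounds to the reported mean."""
    decimals = len(reported_mean.partition(".")[2])
    target = float(reported_mean)
    half_unit = 0.5 * 10 ** -decimals
    # Only the integer totals on either side of target * n can round to the target.
    approx_total = target * n
    candidates = (math.floor(approx_total), math.ceil(approx_total))
    # The small epsilon tolerates either rounding direction on exact ties.
    return any(abs(total / n - target) <= half_unit + 1e-9 for total in candidates)


def simulate_reported_mean(n, low=1, high=7, decimals=2, rng=random):
    """Draw n integer responses and report their mean as a rounded string."""
    responses = [rng.randint(low, high) for _ in range(n)]
    return f"{sum(responses) / n:.{decimals}f}"


if __name__ == "__main__":
    rng = random.Random(0)
    fakes = [simulate_reported_mean(28, rng=rng) for _ in range(1000)]
    print(all(grim_consistent(m, 28) for m in fakes))  # simulated means
    print(grim_consistent("5.19", 28))                 # a hand-typed mean
```

Any mean computed from integer data, real or simulated, passes by
construction. What the check catches is a number typed in by hand, such as a
mean of 5.19 with n = 28, which no set of 28 whole numbers can produce.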

------
michaelmior
While it's true that fractional ages are pretty much never used, ages do not
necessarily have to be recorded as whole numbers.

~~~
LionessLover
Since few people will find it, here is the response on the web page:

[https://medium.com/@jamesheathers/a-lot-of-people-are-hung-u...](https://medium.com/@jamesheathers/a-lot-of-people-are-hung-up-the-fact-that-its-possible-to-collect-ages-in-finer-grains-to-the-c2739f9e0b9b#.9gc1sesfz)

> A lot of people are hung up the fact that it’s possible to collect ages in
> finer grains — to the nearest month, or day, etc.

> Remember a) this is just a hypothetical illustration and b) the vast
> majority of the time, age is collected exactly how I’ve described here.

~~~
moyix
A more practical problem with applying this on age data specifically is that
everyone I looked at (granted, only a handful of CS papers) only gave one
decimal place, not two.
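
To see how much the number of decimals matters, here is a small sketch that
counts what share of possible reported means are impossible for a given
sample size. It assumes integer data and round-half-up reporting; the
function name is made up for illustration.

```python
def grim_detectable_fraction(n: int, decimals: int) -> float:
    """Share of possible fractional parts that no integer total over n can produce."""
    steps = 10 ** decimals
    reachable = set()
    # The fractional part of total / n depends only on total mod n.
    for k in range(n):
        # Integer arithmetic for round-half-up of (k / n) * steps.
        rounded = (2 * k * steps + n) // (2 * n)
        reachable.add(rounded % steps)  # a value that rounds up to 1.00 wraps to .00
    return 1 - len(reachable) / steps


if __name__ == "__main__":
    for n, decimals in [(28, 2), (28, 1), (5, 1)]:
        print(n, decimals, grim_detectable_fraction(n, decimals))
```

With n = 28, two decimals leave 72% of possible values impossible, while one
decimal leaves none, so the check has nothing to catch. One decimal only has
teeth for samples smaller than 10.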

------
_bdog
I work together with a psychology professor (was head of department) now and
then. She said there's a large problem with students, even graduates, just not
understanding statistics and math properly (or _at all_).

~~~
untilHellbanned
Same is true in biology

~~~
davidgerard
Medical researcher rediscovers integration, gets 75 citations
[https://fliptomato.wordpress.com/2007/03/19/medical-research...](https://fliptomato.wordpress.com/2007/03/19/medical-researcher-discovers-integration-gets-75-citations/)

------
bluenose69
Some of these might be simple errors, with results being typed in from other
documents. Authors who are worried about making such errors might want to
consider using methods of reproducible research, e.g. writing "blah had mean
value `round(mean(x), 3)` (n=`length(x)`)" or similar in Sweave, where the
items in the back-ticks are R code working on the actual data. This is a bit
more work, but it prevents transcription errors, and also a pernicious type of
error that comes about by adjusting the data analysis during the writing
process.

------
bane
Also see:
[https://en.wikipedia.org/wiki/Benford%27s_law](https://en.wikipedia.org/wiki/Benford%27s_law)
for a related test used in various fields.

------
moyix
Wow, this is a really clever technique, and the results are really alarming. I
suspect we'll see some careers unravel as a result of it.

~~~
xiphias
Not really...there's a reason why psychology is not considered science by
non-psychology scientists: it's not because the research couldn't be done in
a scientifically useful way, but because even the teachers don't care about
math. I think Facebook and Google and other ad agencies have a better model
about human behavior than psychologists (although they only have data on a
specific part of human behavior)

------
progers7
I've done a similar trick where the ratio of two secret integers is released
publicly with many significant digits and you can sometimes find the two
integers by brute forcing the division over all possible values. Does anyone
know a name for this approach?

~~~
kurlberg
Brute force is not needed, check out "continued fraction" (in particular "11.
Best rational approximations" at wikipedia.)
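
A minimal sketch of the continued-fraction route, assuming the published
ratio is a decimal string rounded to the digits shown. `convergents` and
`recover_ratio` are illustrative names.

```python
from fractions import Fraction


def convergents(x: Fraction):
    """Yield the successive continued-fraction convergents of x."""
    h_prev, h = 0, 1  # numerators h_{-2}, h_{-1}
    k_prev, k = 1, 0  # denominators k_{-2}, k_{-1}
    while True:
        a = x.numerator // x.denominator  # next partial quotient
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        yield Fraction(h, k)
        remainder = x - a
        if remainder == 0:
            return
        x = 1 / remainder


def recover_ratio(published: str) -> Fraction:
    """Return the first convergent that rounds to the published decimal."""
    decimals = len(published.partition(".")[2])
    target = Fraction(published)
    half_unit = Fraction(1, 2 * 10 ** decimals)
    for candidate in convergents(target):
        if abs(candidate - target) <= half_unit:
            return candidate
    return target  # not reached: the final convergent equals target exactly


if __name__ == "__main__":
    print(recover_ratio("0.4285714"))
```

For 0.4285714 the convergents run 0, 1/2, 2/5, 3/7, and 3/7 is the first one
that rounds to the published digits. Two caveats: this only recovers the
ratio in lowest terms (3/7, not 6/14), and the first match is the simplest
candidate, not a guarantee that it is the pair actually used.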

------
flerchin
Good. Fuck the liars and cheats right in their... careers.

