
How should we critique research? - bookofjoe
https://www.gwern.net/Research-criticism
======
leksak
Should we maybe encourage research that tries to disprove existing research?
Or at least, ensure that there is some funding allocated to reproducing
scientific studies, so that as we progress we aren't building on falsehoods?

This is a sidebar, directed more at the unfortunate consequences we suffer
because of the pressure on researchers to produce positive results.

~~~
frankling_
Some ACM conferences are now handing out "badges" that mark papers where the
associated "artifacts" are publicly available or have been independently
reproduced or replicated: [https://www.acm.org/publications/policies/artifact-
review-ba...](https://www.acm.org/publications/policies/artifact-review-
badging)

The main issue with that right now seems to be the lack of real incentives:
those badges are nice to have, but in the end you're evaluated by publications
in highly regarded journals and conferences, citations, and funding brought
in. Those things of course tend to push people toward novel, fancy, exciting
work, instead of the time-consuming process of making things easy for others
to replicate, or of replicating other people's experiments. Incentive-wise,
the publication business follows the principle of fire-and-forget.

What's needed, but I believe very difficult to introduce, is the strict
expectation of independent replication. That would mean replication as part of
the core review process, which would A. create a lot of work for reviewers B.
initially reduce the attractiveness of whichever conference introduces it
first.

One thing that would be easy to introduce in computer science would be to make
it mandatory to share all code used in experiments - if you cannot/won't do
that, you should not be able to take part in this game.

~~~
seanmcdirmid
In my CS area, many results aren’t interesting enough to be replicated, even
in high quality journals. Often the results aren’t even very relevant to what
is otherwise a design paper, and are just there to check some boxes. The fact
that research results aren’t replicated is just symptomatic of a much larger
dysfunction.

It has been a long time since I’ve read a paper where I actually wanted to use
whatever they were selling, let alone reproduce whatever results they were
claiming. We don’t even have “novel, exciting, fancy” work!

When something comes out of a paper that really changes the game (like say
MapReduce), it gets replicated a lot.

~~~
frankling_
Good point: a lot of work really falls below the threshold where replication
would still make sense.

However, I do think there is a lot of incremental, piecemeal type of work that
over the years could amount to a decent step forward. It's just that
currently, from looking at the code behind quite a few published papers, the
statements often just cannot be trusted enough to actually build on these
works. It is my view that some subsubfields in CS sustain themselves by
avoiding the most pertinent questions, because their answers would reveal that
the entire subsubfield has been superseded, or was never that promising to
begin with. Unfortunately, that type of noise generation is actually more
profitable than a single paper saying "nope", although the value of the latter
in terms of knowledge generation is enormously high.

------
hoseja
What a beautiful site.

Look at the initial. Look at the notes in the actual margins.

~~~
bookofjoe
Concur. It's why I pay $1/month to his Patreon account.

------
monarda
Interesting read, and worth thinking about the question the way it's framed,
although I don't know that there's actually an answer. I agree that how much
something matters comes down to how much things would change if it were
different, but I think that depends heavily on the scenario, in terms of the
questions asked, the design, and so forth. I also think problems in
contemporary academia go far beyond statistics, and won't be addressed with
statistics, but rather with funding and sociopolitical changes.

Take the conclusion that "issues like measurement error or distributions,
which are equally common, are often not important." I strongly disagree with
this. In a classic randomized controlled trial, maybe yes. But even then there
are potentially serious problems. Just to offer a few examples:

Measurement is key in this age of unreproducibility. It's not uncommon for
claims to be made based on results involving one particular measure, when the
hypothesis would apply to many measures of the same thing in the sample. When
an author claims X causes Y, and there are multiple measures of Y, but results
are only reported for one, that's a problem. Modeling the joint effect on Y,
rather than on the individual measures of Y, is key.
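
To make that concrete, here's a toy simulation (my own sketch, not taken from
any particular paper): with five correlated measures of the same outcome and
only the most favorable one reported, the false positive rate lands far above
the nominal 5%.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n, k, sims = 100, 5, 2000  # sample size, measures of Y, simulated studies
    false_pos = 0
    for _ in range(sims):
        x = rng.normal(size=n)                           # exposure with NO real effect on Y
        latent = rng.normal(size=n)                      # the underlying outcome
        ys = latent[:, None] + rng.normal(size=(n, k))   # k noisy, correlated measures of Y
        pvals = [stats.pearsonr(x, ys[:, j])[1] for j in range(k)]
        false_pos += min(pvals) < 0.05                   # "report" only the best-looking measure
    print(false_pos / sims)  # well above the nominal 0.05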

Overfitting goes hand in hand with distributional misassumptions, because to
the extent that an observed distribution deviates from the assumed one,
overfitting models can capitalize on excess information missed by the base
distribution. A classic example of this is assuming a normal distribution when
fitting a linear regression but fitting it to very nonnormal data: in
many cases an interaction term will come out significant even in the absence
of a real interaction, because it captures more of the interestingness of the
data, information-theoretically speaking.
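
As a rough illustration (a sketch of mine, assuming the deviation from the
assumed model takes the form of an unmodeled nonlinearity): fit a linear model
with an interaction to data where the true relationship is curved and the
predictors are correlated, and the interaction term routinely comes out
"significant" even though no interaction exists.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 2000
    x1 = rng.normal(size=n)
    x2 = 0.6 * x1 + rng.normal(size=n)       # correlated predictors
    y = x1 ** 2 + rng.normal(size=n)         # true model: curved in x1, NO interaction

    df = pd.DataFrame({"y": y, "x1": x1, "x2": x2})
    fit = smf.ols("y ~ x1 * x2", data=df).fit()  # assumed model: linear plus interaction
    print(fit.pvalues["x1:x2"])                  # the interaction soaks up the missed curvature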

Measurement error also becomes key when modeling things like mediation effects,
or when trying to control for covariates. Residual confounding is increasingly
being recognized, and it is all about measurement error. This is maybe related
to overfitting, but many claimed effects can be attributed to measurement
error, especially differential measurement error between variables. This is
often more of a problem in observational studies, but it can easily apply to
experimental designs as well, when there's some ambiguity about how an effect
is acting, or whether it's actually acting through the hypothesized mechanisms.
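
For the residual confounding point, a minimal sketch (mine, with made-up
variables): X has no effect on Y, both are driven by a confounder C, and
adjusting for a noisy measurement of C leaves a clearly nonzero "effect" of X,
while adjusting for the true C removes it.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 5000
    c = rng.normal(size=n)          # true confounder
    x = c + rng.normal(size=n)      # exposure: driven by C, no causal effect on Y
    y = c + rng.normal(size=n)      # outcome: driven by C only
    c_obs = c + rng.normal(size=n)  # confounder measured with error

    adj_noisy = sm.OLS(y, sm.add_constant(np.column_stack([x, c_obs]))).fit()
    adj_true = sm.OLS(y, sm.add_constant(np.column_stack([x, c]))).fit()
    print(adj_noisy.params[1])  # roughly 0.33: confounding survives the noisy adjustment
    print(adj_true.params[1])   # roughly 0.00: adjusting for the true confounder removes it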

------
lazyjones
Researchers could publish their findings, along with sources for all involved
data, in a machine-readable/analyzable form (formal logic or simple natural
language), so that conflicting results in other publications could be found
quickly and reports needing additional verification would get more scrutiny.
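
Purely as a hypothetical sketch of what such a record might look like (every
field name here is my invention, not an existing standard):

    from dataclasses import dataclass

    @dataclass
    class PublishedFinding:
        claim: str          # formal-logic or controlled-natural-language statement
        variables: dict     # operational definitions of the terms in the claim
        effect_size: float
        ci_95: tuple
        data_sources: list  # identifiers/URLs for the raw data behind the claim
        analysis_code: str  # pointer to the exact analysis script

A registry of such records could then be queried for pairs of claims about the
same variables with opposite signs, which is roughly the "conflicting results
could be quickly found" part of the idea.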

------
dalbasal
IMO, the most prominent/important critic in this space was Karl Popper. He,
for example, championed falsifiability for scientific theories.

Popper's two big targets for his criticism were Marxism and Freudianism, which
he accused of being pseudoscience. Despite making a lot of enemies, his
criticism made an impact and the fields evolved to address it, to some extent.
They often dropped the "science" claim, or adopted more scientific
methodologies. Today, Popper's pseudosciences are often characterized as (or
evolved into) "soft sciences."^

Still, I think it's telling that 80 years later these general areas of study
(economics, psychology, sociology, policy research) contain the majority of
the problems this article is talking about.

A big part of the problem is statistics. That is, these fields are studying
statistical phenomena. The relationship between the "big hairy theories"
(e.g. supply & demand) and the more scientific/falsifiable hypotheses that
they can actually test is... problematic.

So... part of the problem is fixable... The scientific framework needs to be
designed for statistics. Experimental design plans can be published prior to
experimentation, for example. Negative results need to be published too. Part
of the problem is harder. Without a huge increase in independent replication
studies, many of these fields are not going to be genuinely producing
scientific knowledge, as fields. Individual results are more akin to anecdote.

The bigger problem is that "big" theories (e.g. Keynesian macro, Maslow's
hierarchy, etc.) are not generally scientific theories. That split between
"small" testable theories and interesting, fundamental big theories is just
very hard to bridge in a fundamental way.

^He went easy on liberal economists, possibly because of personal
friendships, but mostly because they didn't claim to be scientific to the same
extent. In retrospect, I think this may have been a mistake.

~~~
mxcrossb
Popper’s approach still works here. When someone writes a statistics-based
paper, they have basically said “I have a hypothesis that X correlates with Y.
If this is not true (falsifiable), X will not correlate with Y in this data”.
The hypothesis survives the test and this is science. Of course, the authors
probably did the opposite, but this doesn’t matter for the philosophical
underpinnings.

So how do we detect bad studies? With this same framework. Take the same
hypothesis, but design a different test. If the hypothesis fails the test,
publish that. That is science.

We shouldn’t be so hung up on reproducing old papers or scrutinizing every
study. You’ll get lost in the details. If a hypothesis is true, it will stand
up to every test of it. So if you doubt a result, test it!

------
amelius
Perhaps publishing bad research should be punishable in some way. Also, the
system of citations does not work, because negative citations also count as
citations.

~~~
gradstudent
> negative citations also count as citations.

If by "negative citations" you mean works that are widely criticised, I don't
see the problem. Most scientific papers cite prior works in order to point out
their limitations. This is not a bad thing. We need to understand how previous
attempts fell short in order to understand how and why we might want to do
better in the future.

Maybe, however, by "negative citations" you mean citations of works that are
plainly wrong. I think this occurs very infrequently, to the point where it
probably isn't an issue. I certainly haven't come across works in my area
which are cited for being wrong. I don't see the point of citing them either;
I'd rather cite a paper that points out the problems and analyses their impact
(i.e. something constructive).

~~~
gwern
The problem is that raw citation counts are used for promotion and hiring. And
they look the same for 'did groundbreaking work' as for 'actually so flawed
that everyone cites it just to make fun of it and as a cautionary lesson to
everyone else to not be so incompetent'. Some authors explicitly take a
mercenary attitude and don't care about sloppiness - after all, if someone
criticizes them, that just means their citation count goes up...

~~~
gradstudent
With respect, I think you don't know what you're talking about. I've never
seen cases where a researcher will cite a paper to poke fun at it. One might
cite an erroneous paper in order to point out errors if one were interested in
surveying the types of errors which occur in science. However: (i) that paper
is unlikely to be famous, and (ii) even then it's considered poor form to shit
on your colleagues.

------
klim_bim
Determine whether the practical application of research contributes to
extending human lifespan for the purpose of inhabiting other planets.

