
The Irreproducibility Crisis of Modern Science - vixen99
https://www.nas.org/projects/irreproducibility_report
======
lisper
The site is down so I can't read the original report, but I've read reports on
this topic in the past so I'm going to chime in with some "usual suspects"
caveats:

1. No result is 100% reproducible because you can never completely reproduce
the conditions of any experiment. The best you can hope to do is to reproduce
the conditions that _matter_, but enumerating those has to be part of the
theory you are testing, and so you can never be 100% sure that you have a
complete list.

2. Even a completely non-reproducible result can be scientifically
significant. For example, celestial events are almost never reproducible. Our
understanding of celestial mechanics nonetheless rests on solid science.

3. The end-product of science is not truth, it is _explanations_ of
observations. Those observations can (indeed must) include non-reproducible
ones. Sometimes the explanation of non-reproducible results is "experimental
error" or "delusion" or "we just don't know." But non-reproducible events are
nonetheless within the purview of science.

On the other hand...

4. The statistical tests currently in widespread use as a criterion for
publication in peer-reviewed journals _guarantee_ that _at least_ one result
in 20 will be due to chance and not because the hypothesis being tested is
actually true. That, combined with the suppression of negative results,
guarantees that the results published in scientific journals that adhere to
those standards _will_ be unreliable. But that doesn't really have anything to
do with reproducibility _per se_; it has to do with the fact that the
journals use a weak criterion for defining positive results. That, combined
with our human predilection to value positive results over negative ones and
the understandable desire of scientists to advance their careers, all but
guarantees that journals will contain many defensible but false results. But
this is not because of a lack of reproducibility per se, it's because of poor
policy choices.

~~~
3JPLW
> The statistical tests currently in widespread use as a criterion for
> publication in peer-reviewed journals guarantee that at least one result in
> 20 will be due to chance and not because the hypothesis being tested is
> actually true.

That's wildly over-pessimistic. That would only be the case if scientists
just went around looking at the world and testing null hypotheses willy-nilly.
That's generally not the case. There is often a _reason_ for conducting the
test and a plausible mechanism of action. There are two possible explanations
for a significant result:

* The null hypothesis is true, but they got "unlucky" data (a 5% chance at the conventional α = 0.05 threshold)

* There is a real effect (and the null hypothesis is actually wrong)

This becomes more problematic in reproducibility tests, though, since that
biases my prior towards a correct null hypothesis; there you must be very
careful about pre-registration and about the number of groups worldwide trying
to reproduce a given experiment.
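
To make the dependence on the prior concrete, here is a rough sketch (the
priors and the 80% power figure are illustrative assumptions, not numbers from
the report):

```python
# Sketch: why "p < 0.05" does not mean "95% likely to be real". The share
# of significant results that are false depends on the prior odds that
# tested hypotheses are true, and on statistical power.

def false_discovery_rate(prior_true, alpha=0.05, power=0.8):
    """Fraction of significant results that are false positives."""
    true_pos = prior_true * power          # real effects that get detected
    false_pos = (1 - prior_true) * alpha   # null effects that slip through
    return false_pos / (true_pos + false_pos)

for prior in (0.5, 0.1, 0.01):
    print(f"P(effect is real) = {prior:4.2f} -> "
          f"{false_discovery_rate(prior):.0%} of significant findings are false")
```

With well-motivated hypotheses (a high prior), far fewer than 1 in 20
published positives are flukes; with hypothesis-fishing, it can be far more.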

~~~
TangoTrotFox
Are you aware of the scale of failure here? The article (referring to the
executive summary here) gives some really incredible figures. In psychology,
100 reproducibility studies were able to produce statistically significant
results for only 36% of prominent papers, compared with 97% of the original
studies. In biotech, a firm tried to reproduce 53 "landmark" studies in
hematology and oncology; only 6 could be replicated.

The reason this is being called a 'crisis' is because it's looking like the
vast majority of science, particularly in the social and human physiological
sciences, is junk.

~~~
3JPLW
Absolutely! I'm only addressing the common misconception in the grandparent
that the p-value is the percent likelihood that the effect is real. That's not
at all accurate.

There are definitely other reasons that lead us to such horrifying results —
especially in fields where experimenters _do_ conduct experiments seemingly at
random and without a reasonable pathway of action (I'm looking at you, social
psychology and friends).

------
Thrymr
To be clear, the "NAS" (nas.org) that published this study is the National
Association of Scholars [0], a political group, _not_ the National Academy of
Sciences (nasonline.org) [1], a nongovernmental organization that consists of
scientists elected by their peers to provide independent scientific advice to
the US government. There was in fact a study published recently in PNAS, the
Proceedings of the National Academy of Sciences, on this topic [2].

[0]
[https://en.wikipedia.org/wiki/National_Association_of_Schola...](https://en.wikipedia.org/wiki/National_Association_of_Scholars)

[1]
[https://en.wikipedia.org/wiki/National_Academy_of_Sciences](https://en.wikipedia.org/wiki/National_Academy_of_Sciences)

[2]
[http://www.pnas.org/content/early/2018/03/08/1802324115](http://www.pnas.org/content/early/2018/03/08/1802324115)

~~~
jrochkind1
Good catch thanks! Wikipedia says "The National Association of Scholars (NAS)
is an American non-profit politically conservative advocacy group, with a
particular interest in education"

Thanks for the link to the PNAS -- looks like a whole special issue, not just
an article, in fact. Great. I'm gonna ignore the fake-NAS one, and read the
real-PNAS one.

I wonder what National Association of Scholars' motivation is here exactly.

~~~
DanielBMarkham
Isn't the entire purpose of reproducibility to eliminate such questions from
discussions about science?

~~~
jrochkind1
To eliminate what such questions?

To eliminate any questions about social context or bias or agenda? Totally not
possible, social context is an inherent part of science, as in all human
endeavors.

I think the purpose of reproducibility is to, well, make science _possible_;
it is the fundamental bedrock of science -- to determine what and how we can
make reliable predictions about the world. It doesn't mean bias and the
effects of social context no longer exist, though. Science exists anyway, in a
social context -- as evidenced by the past couple hundred years!

In general, I don't think science exists to eliminate _any_ questions; science
exists to ask questions. Eliminating questions may be a goal of ideology or
religion, but not of science. (And in the real world, they certainly influence
each other. Science is a goal to strive for, not a fait accompli. Scientific
understandings of the world are always changing; science is an approach to
understanding the world, not a set of settled conclusions.)

~~~
jrochkind1
Also, if we're looking for the social context that leads to the
reproducibility crisis, I think it's a lot more likely to be about academic
career incentives (you _gotta_ publish things, a lot of things, that all seem
like important and socially useful results, none of which are negative results
-- when the actual practice of science means negative results are an expected
and useful thing, not an indication that you are a bad scientist) than about
the political ideologies/biases of researchers.

Maybe I'm wrong. There are probably ways to investigate it. This essay isn't
one of them.

------
mistrial9
Some of the comments miss an important distinction. "Science" in the popular
press often refers to science versus completely non-scientific ways of forming
an opinion or deciding policy. Meanwhile, science within a technical community
is subject to human error and manipulation, and relies on reproducible
results, as well as peer review, to settle conflicting claims.

There are certainly non-scientific ways of forming an opinion and deciding
policy, in many cases totally legitimate. But once science is claimed, then of
course it has to be subject to scientific rigor.

~~~
m-watson
Yes, thank you for saying this. I feel like people often treat science as some
ethereal being. It is a process that has to be used; if that process isn't
happening, it isn't science. That doesn't mean everything has to be science,
but things that are science should be held to a certain standard, and given
extra weight when that standard is met.

~~~
Bograff
It's important to make the distinction between Pure Science and anything on
the sliding scale from technical description to ELI5. There are levels to it,
you and I know.

~~~
m-watson
Oh, absolutely there is a sliding scale on both the actual practice of science
and the reporting. On the practice side, as a simple anecdote, I have
participated in conducting research in hard science (physics), softer hard
science (biology), social science (behavioral and economic), as well as
straight policy analysis with a "scientific approach" through data science.
The findings from those are reported differently, with stronger or weaker
wording and conclusions. I think that is where the sliding scale in practice
comes in: how the researchers decide to write up and report their findings,
and how appropriately they identify their methodology and shortcomings.

On the journalism side, I think you end up with the problem the whole field of
journalism has: you have to be read by your audience. That means something
with a wider audience is probably going to be more on the ELI5 side, which
means both the reader and the journalist have to take any reported findings
more lightly.

The sliding scale, I believe, comes in with the rigor with which something is
reported, on both the practice and the journalism side. More details can give
a more granular understanding that shows limitations, but sometimes broad
strokes are necessary to get information out to a wider audience, or more
quickly.

------
timtadh
In response to several threads here: it is important to distinguish when
scientists are self critical vs. when non-scientists are critical of the
scientific method. For instance, there is a long history of scientists
criticizing how the scientific process is currently conducted for the purposes
of improving the scientific endeavor. That work is sometimes used by non-
scientists who question the overall scientific method. However, such use is
invalid as the scientific self-criticism

1. assumes the validity of the scientific method

2. relies on the scientific method as its critical lens

Whereas those who critique science as a whole:

1. assume that the scientific method does not work and does not arrive at
"truth"

2. then use scientists being self-critical to prove #1.

Such a "proof" does not work as there is its uses the assumption "the
scientific method arrives at truth" to derive the contradiction "the
scientific method does not arrive at truth". See for instance comment:
[https://news.ycombinator.com/item?id=16859200](https://news.ycombinator.com/item?id=16859200)

In reality, work on reproducibility is about improving the practice of science
overall. It does not in itself show that science is inherently untrustworthy.
What it does show is that scientific discovery is difficult, that it takes a
lot of effort, and that new findings should be treated _critically_. What does
critically mean in this context? It means, _within the boundaries of science_,
analyzing the theoretical basis, hypothesis, method, and experimental results
for potential flaws. It does not mean being skeptical by default because
science "doesn't work."

~~~
buvanshak
> scientists criticizing how the scientific process is currently conducted
> for the purposes of improving the scientific endeavor.

I think what is happening here is a bit more serious. They are showing a
widespread crisis, not just giving some minor feedback to improve the process.

>It does not in itself show that science is inherently untrustworthy.

I think when statistics is involved, the results are inherently untrustworthy.
This is not really surprising, because there are a whole bunch of ways that
studies involving statistics can go wrong, and we are still finding new ones.

Then there are things like publication bias, which takes this to a whole new
level. It means that a biased body of journals can project any consensus it
favors just by selecting studies that fit its narrative. The inherent issues
with statistics mean that you can find studies showing any possible outcome.
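
A rough simulation of that journal filter, with made-up parameters (a tiny
true effect, small samples, and "publish only if p < 0.05"):

```python
# Sketch: publication bias as a filter. Simulate many small studies of a
# tiny true effect, "publish" only the significant ones, and compare the
# published average effect with the truth. Parameters are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect, n, studies = 0.1, 20, 5000

published = []
for _ in range(studies):
    treat = rng.normal(true_effect, 1, n)
    ctrl = rng.normal(0, 1, n)
    t, p = stats.ttest_ind(treat, ctrl)
    if p < 0.05:  # the journal filter
        published.append(treat.mean() - ctrl.mean())

print(f"true effect: {true_effect}")
print(f"published: {len(published)} of {studies} studies")
print(f"mean published effect: {np.mean(published):.2f}")  # several times the truth
```

The "literature" ends up reporting an average effect several times larger than
the true one, without any individual author doing anything fraudulent.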

~~~
TangoTrotFox
> _" I think when statistics is involved, the results are inherently
> untrustworthy. This is not really surprising because there is a whole bunch
> of ways these studies that involve statistics could go wrong. And we are
> still finding new ways on how this could go wrong."_

Another very real issue here is that statistics can be used maliciously to
show nearly anything, in ways that can be extremely difficult to detect even
when the manipulation is hidden in plain sight. A step beyond that, there's
plain old number fudging, which is almost impossible to prove since variance
provides sufficient plausible deniability. And finally there is, of course,
plain old ineptitude. As you mention, even when trying to do things completely
by the book, statistics are _incredibly_ difficult to get right.

Something that comes to mind here is the recent MIT study stating that Uber
drivers earned $3.37/hour. That study was completely broken. [1] It's
debatable whether the cause was maliciousness or ineptitude, but the point is
that these problems arise, with a disturbing regularity, even when the most
reputable of names are attached to them.

[1] - [https://qz.com/1222744/mits-uber-study-couldnt-possibly-have...](https://qz.com/1222744/mits-uber-study-couldnt-possibly-have-been-right-it-was-still-important/)

~~~
AnimalMuppet
Feynman said, "The first rule is that you must not fool yourself. And you are
the easiest person to fool."

There can be malice, sure. But there can also be _desire to believe_. And,
hey, here's a statistical analysis that shows what the investigator is already
biased to believe anyway...

------
danharaj
Let's take a page from Marx. Science is many things, in particular a
relationship between capital and labor. The scientific method is a wonderful
idea, but it is subordinate to the economic forces that underlie scientific
activity. Look at the conflicts and contradictions between those doing science
(labor) and those deciding the science to be done (capital), and that is the
ultimate source of these crises.

The executive summary lists 40 points on how to improve the reproducibility of
science. A bit over half of them are addressed to the sources of capital such
as private organizations, universities, and governments. I think many of those
points are good. However, I don't think the other points, the ones that
recommend doing science in different ways, carry much punch. Even if you fix
the problems there are today, so long as science is a rat race of trying to
get grant money to stay afloat while burning out grad student after grad
student, I think other pathological practices will creep in as a completely
rational response on the parts of scientists to a hostile ecosystem. There's
just a very big gap between how science should be done and what capital owners
want from science.

~~~
JumpCrisscross
> _There's just a very big gap between how science should be done and what
> capital owners want from science_

It was "a team from Bayer Healthcare" who "tried to replicate the results of
basic cancer studies," failed, "and kicked off a media storm questioning the
legitimacy of cancer science—and science in general" [1]. The "capital owners"
looking out for their own buck are performing more effectively, narrowly
speaking, than academia.

[1] [https://www.wired.com/2017/01/fighting-cancers-crisis-confid...](https://www.wired.com/2017/01/fighting-cancers-crisis-confidence-one-study-time/)

~~~
danharaj
Even if that's true, it's a problem that only science with a profit motive
slapped on it can be done effectively in the current system. If only such
science were done, we would face different crises of science, because profit
seeking is at odds with much of science, basic research in particular.

It's also quite obviously the case that profit seeking research has incentives
to be bad science in other ways.

~~~
JumpCrisscross
> _It's also quite obviously the case that profit seeking research has
> incentives to be bad science in other ways_

I agree. My point is the Marxist framework is a bad one for modern science.

We have endowed academics, academics working for private institutions, and
academics getting grants from public bodies run by scientists.
Privately-financed biotech companies founded by academics take moon shots and
get acquired by marketing and distribution powerhouses. Even delineating who
is "capital" and who is "labour" quickly becomes more pedantic than useful.

~~~
danharaj
> My point is the Marxist framework is a bad one for modern science.

I will concede that it's reasonable to think it's an incomplete and limited
framework, but I have to take issue with calling it _bad_. For example, I
think this starting point is more productive than the premises of this report.

> Privately-financed biotech companies founded by academics take moon shots
> and get acquired by marketing and distribution powerhouses. Even delineating
> who is "capital" and who is "labour" quickly becomes more pedantic than
> useful.

In this example it's plain to me that VCs are capital and entrepreneurs are
labor. That's usually the game in such moonshots. Here, too, we can analyze
what sort of science gets done through this relationship with a Marxian lens.
VCs want to make big bets with potentially astronomical upside. That means
only certain kinds of science will ever be done in this way. Yea, this kind of
science is probably more replicable on average, but what about something like
Theranos? This sort of science funding has its own problems with inefficient
and incomplete allocation of resources.

I'm not a hardcore Marxist; I don't think you can reduce the world to one
grand theory based on hot takes inspired by Hegel, but I think it's a more
powerful tool than most care to admit.

------
btilly
The article may be good, but the source has a definite bias. See
[https://www.sourcewatch.org/index.php/National_Association_o...](https://www.sourcewatch.org/index.php/National_Association_of_Scholars)
to see what that bias is.

That said, they should be well-informed on the subject. So it probably does
contain a lot of good information.

------
makecheck
There has to be similar prestige/career-building/notoriety/funding for
spending time on reproducing the experiments of others. Without that shift
there will _clearly_ be a greater tendency to just try something new.

Also, when experiments depend on source code, etc. we need real engineering
tools/principles applied. (Something like: “you _can’t publish_ paper X if you
aren’t including a public repository with build/run instructions”.)
Unfortunately, there are all kinds of reasons why scripts/builds could fail
just a few months or years later so they would have to be checked too.

I think it would be cool if document-generation caught on in the publication
of papers, i.e. the paper itself is _generated_ by running actual experiment
scripts and producing charts, etc. from plain text source.
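
As a toy illustration of what that could look like, here is a minimal sketch
(the file names and the stand-in "experiment" are hypothetical):

```python
# Sketch of a self-generating report: the figure and the numbers in the
# text come from the same script run, so the document cannot drift from
# the code. File names and the "experiment" are made up for illustration.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
data = rng.normal(loc=1.0, scale=2.0, size=500)  # stand-in experiment

plt.hist(data, bins=30)
plt.xlabel("measurement")
plt.savefig("figure1.png")

with open("paper.md", "w") as f:
    f.write("# Results\n\n")
    f.write(f"We observed a mean of {data.mean():.2f} "
            f"(sd {data.std(ddof=1):.2f}, n={len(data)}).\n\n")
    f.write("![Histogram of measurements](figure1.png)\n")
```

Re-running the script regenerates both the chart and the prose statistics from
scratch.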

~~~
coldacid
So, Jupyter?

------
Retric
Being reproducible is only _critically_ important if people treat individual
studies as meaningful.

That IMO is a far more dangerous stance. Any study can have hidden flaws; none
should be trusted without some form of replication.

~~~
rplst8
How often is a study even repeated in the course of normal scientific
research? I think most studies are focused on expanding the research, and
therefore the knowledge about the science being studied. Accepting previous
studies as fact is dangerous and could easily lead to a house-of-cards
scenario. I think this is especially true of the very specific, niche studies
that are common in today's highly competitive graduate-student,
publish-or-perish research landscape.

~~~
lotsofpulp
That is why, as a (global) society, we should spend our resources on making
sure the data we have is accurate, by funding research and repeated studies,
and that data should be open to all to verify and build upon.

~~~
jbob2000
I think we're past the point where studies can be reproduced effectively. I
actually think the scientific model is falling apart. We've discovered the
easy things, and those studies were easy to reproduce. We built a model of
science around the low-hanging fruit.

How are you going to reproduce a study about a cancer drug? Synthesizing the
drug itself is a huge endeavor, never mind finding a group of participants
with the exact cancer you need. How would you reproduce the Large Hadron
Collider experiments? You'd need to rebuild one of the most technically
challenging, most expensive scientific instruments ever made.

~~~
haZard_OS
> I actually think the scientific model is falling apart.

You must have a totally different definition of "the scientific model" than I
do. I suspect (although I could be wrong) that you have an oversimplified view
of just what the scientific process is.

The scientific model/process is not a linear list of discrete steps moving
from "Ask a question" to "Communicate results"; it is not a discrete and
static set of criteria we apply to individual experiments or instrument
output.

UC Berkeley's Understanding Science project does a good job of providing visualizations on this topic:

[http://undsci.berkeley.edu/article/0_0_0/howscienceworks_02](http://undsci.berkeley.edu/article/0_0_0/howscienceworks_02)

~~~
jbob2000
Ok, I read your link. I still think my comment stands. My argument would be
that the "Community Analysis and Feedback" section is practically impossible
to maintain.
to maintain.

Back in early science days, this used to be a valuable section because the
pool of "scientists" was huge and they often crossed disciplines. Now you have
all sorts of specialties such that the pool of colleagues who are able to
understand and assess your work might be, like, 8 people.

There was a Japanese mathematician who recently put out a huge proof that
claimed to solve a major math problem (I am forgetting all of the important
details here, so forgive me). The news wasn't so much that he created the
proof, but that they couldn't find anyone to verify it!

------
tonystubblebine
We did an article trying to help people be better critical thinkers when they
hear about psychology research. I thought the author did a good job of getting
into how these studies go wrong, and also the types of magical results you
should be suspicious of.
[https://medium.com/@jhreha/most-psychology-research-is-bs-73...](https://medium.com/@jhreha/most-psychology-research-is-bs-73d4793b4dc6)

------
SubiculumCode
We know that weak classifiers can be combined to produce a strong classifier,
à la AdaBoost.

Each study is a weak classifier and would have its own 'reproducibility
crisis' if retested on new data. However, after lots of studies of similar
phenomena, a strong classifier emerges. In the field, we call this converging
lines of evidence.
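
A quick simulation of that intuition (the 60% per-study accuracy is an
illustrative assumption, and independence between studies is doing all the
work):

```python
# Sketch: majority vote over many weak but *independent* "studies".
# Each one reaches the right conclusion 60% of the time; the aggregate
# verdict becomes far more reliable as studies accumulate.
import numpy as np

rng = np.random.default_rng(1)
p_correct, trials = 0.6, 100_000

for n_studies in (1, 5, 15, 51):
    votes = rng.random((trials, n_studies)) < p_correct
    majority = votes.sum(axis=1) > n_studies / 2
    print(f"{n_studies:3d} studies -> majority right {majority.mean():.1%}")
```

If the studies share a systematic bias, they stop being independent votes and
the aggregation stops helping, which is essentially the caveat raised
downthread.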

~~~
scottfr
I'm not sure I really agree with this, because in science, studies are biased
towards aligning with the results of previous studies. This can be due to many
factors: authors' expectations, journals preferring positive over negative
results, peer reviewers looking more critically (in an analytical sense) at
work that disagrees with accepted research, and so on.

It can be very hard for many reasons to stand up and say "this prior work is
wrong".

~~~
SubiculumCode
The argument is not that all is perfect in science. It is not. The argument is
that strong replication of each individual study is not required for progress.
Back to my analogy with algorithms like AdaBoost: why ever aggregate a bunch
of weak classifiers instead of training a single strong one? Well, please
correct me if I am wrong (machine learning is not my field), but the primary
advantage is computational cost. Sometimes, for the same overall
classification performance, training hundreds or thousands of weak classifiers
requires fewer computations than training the strong classifier.

To me it brings up an interesting point. If we view experiments as
classifiers, then how would a machine learning expert set science policy and
practice?

Also, viewed this way, it makes me wonder about p-hacking, which increases
sensitivity at the cost of more false positives. Since negative results are
not generally reported, I wonder whether p-hacking diminishes replicability at
the study level in exchange for efficiency at the aggregate level. This is an
empirical question; ethically, of course, it is a dubious practice under
current understanding.
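
One flavor of p-hacking is easy to simulate (the 20 outcomes and 30 subjects
per group are arbitrary illustrative choices): measure many outcomes where
nothing is real and report whichever looks significant.

```python
# Sketch: multiple-comparisons p-hacking. With 20 null outcomes per
# experiment, the chance of at least one "significant" result is about
# 1 - 0.95**20, or roughly 64%, instead of the nominal 5%.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
experiments, outcomes, n = 2000, 20, 30

hits = 0
for _ in range(experiments):
    a = rng.normal(size=(outcomes, n))  # null data: no true effects anywhere
    b = rng.normal(size=(outcomes, n))
    pvals = stats.ttest_ind(a, b, axis=1).pvalue
    hits += pvals.min() < 0.05

print(f"experiments with at least one p < 0.05: {hits / experiments:.0%}")
```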

~~~
PeterisP
The ensemble advantage is completely the other way around - we might choose to
use a large ensemble of classifiers because they can get slightly better
results than the best single strong classifier we can make; and we might
choose not to use an ensemble because of computational cost reasons,
especially because you train a model once but infer forever (and likely on
more limited hardware) and _inference_ for a hundred classifiers takes a
hundred times more computing power even if you skimped on training length. A
good example is the Netflix Prize, where the best accuracy was achieved by a
large ensemble, but the practical implementation afterwards chose a slightly
less accurate non-ensemble approach for performance reasons.

What you describe sometimes happens in a tradeoff between very different ML
models (e.g. an ensemble of decision trees versus a single deep neural network
has the properties you describe), but within any single paradigm (an ensemble
of neural networks versus one NN with more training time) it's the other way
around.
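
A toy scikit-learn comparison of that tradeoff (the synthetic dataset and
parameters are arbitrary, so treat this as a sketch rather than a benchmark):

```python
# Sketch: an ensemble (random forest) usually edges out a single strong
# tree on accuracy, at roughly n_estimators times the inference cost.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=30, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

single = DecisionTreeClassifier(random_state=0).fit(Xtr, ytr)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xtr, ytr)

print(f"single tree accuracy: {single.score(Xte, yte):.3f}")
print(f"100-tree ensemble:    {forest.score(Xte, yte):.3f}")  # better, but 100x the trees to evaluate
```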

~~~
SubiculumCode
Always willing to learn more. I was going off of
[https://en.wikipedia.org/wiki/AdaBoost](https://en.wikipedia.org/wiki/AdaBoost),
which mentions: _"Unlike neural networks and SVMs, the AdaBoost training
process selects only those features known to improve the predictive power of
the model, reducing dimensionality and potentially improving execution time as
irrelevant features need not be computed."_

edit: Reading about ensemble theory, I do find support for the increased
computational cost:

_"Evaluating the prediction of an ensemble typically requires more computation
than evaluating the prediction of a single model, so ensembles may be thought
of as a way to compensate for poor learning algorithms by performing a lot of
extra computation. Fast algorithms such as decision trees are commonly used in
ensemble methods (for example Random Forest), although slower algorithms can
benefit from ensemble techniques as well."_

But I found this interesting:

_"Empirically, ensembles tend to yield better results when there is a
significant diversity among the models.[4][5] Many ensemble methods,
therefore, seek to promote diversity among the models they combine.[6][7]
Although perhaps non-intuitive, more random algorithms (like random decision
trees) can be used to produce a stronger ensemble than very deliberate
algorithms (like entropy-reducing decision trees).[8] Using a variety of
strong learning algorithms, however, has been shown to be more effective than
using techniques that attempt to dumb-down the models in order to promote
diversity.[9]"_

...which, under my analogy, would seem to suggest that exact replication is
less productive than having a diversity of study designs.

~~~
AstralStorm
Not quite. If you read the latter paragraph closely, you would note that an
ensemble of stronger models performs better than a multitude of dumbed-down
models.

Then there is the problem that an ensemble of biased classifiers (biased
mostly in the same direction in this case - positivity bias) will magnify the
bias.

This is a reason why meta-analyses have to get at the actual raw data to pool
and analyse, as well as correct for multiple biases. Even then the process is
not perfect.

------
arafa
What I'd like to know is whether the "irreproducibility crisis" is really some
combination of "the sample size was too small" and "the effect size was too
small". When I went through a lot of these studies myself, I saw this theme
over and over in the ones that don't reproduce. "P-hacking" is less of a
concern to me when the effect is real and widespread.

It's so bad now that for any article/study, I look at the sample size first.
If it's too small (especially < 100) or they don't say, I just ignore it. And
if they don't publish or give some estimate of the effect size, I just think
about it directionally but don't give it much weight mentally.
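
For a sense of why small samples fail to replicate even real effects, here is
a back-of-the-envelope power calculation (normal approximation; the d = 0.3
effect size is an illustrative assumption):

```python
# Sketch: approximate power of a two-sided two-sample z-test. A small
# effect with a small sample is badly underpowered, so even a *real*
# effect will usually fail to reach significance on a retest.
from math import sqrt
from scipy.stats import norm

def power(effect_size, n_per_group, alpha=0.05):
    """Probability of detecting the effect at the given alpha."""
    z_crit = norm.ppf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    return 1 - norm.cdf(z_crit - shift) + norm.cdf(-z_crit - shift)

for n in (20, 50, 100, 400):
    print(f"n = {n:3d} per group -> power {power(0.3, n):.0%} for d = 0.3")
```

Below n = 100 per group, a true small effect is detected well under half the
time, which looks exactly like a "failed replication".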

~~~
hnuser1234
I'm not sure this captures the whole problem. Sure, huge sample sizes are
great, but look at the current top comment by lisper - we have gained a
boatload (that's a scientific term) of insight from single events. LIGO is
currently rocking a sample size of 7, but I doubt you ignored it.

~~~
arafa
Of course there are exceptions and what I said doesn't capture all cases (like
with massive fixed costs or unique kinds of studies), you're right. I find
it's a pretty good rule of thumb, though.

------
air7
A good solution would be to simply reduce the p-value threshold.

The current "standard" p-value threshold in many fields, 5%, is arguably too
high. Consider that throwing a double six in backgammon has a probability of
about 3% (1/36 ≈ 2.8%). That means that, on some level, throwing 6-6 would
count as valid scientific "proof" of ESP (p < 0.05). And this is even before
p-hacking.

A high p-value threshold basically externalizes part of the research cost onto
other scientists, and in the process creates a lot of false-positive "noise".

------
JohnL4
I think Planet Money did an episode on this a few years ago. Good stuff, and
maybe easier than reading a paper. :)

[https://www.npr.org/sections/money/2018/03/07/591213302/epis...](https://www.npr.org/sections/money/2018/03/07/591213302/episode-677-the-experiment-experiment)

~~~
Afforess
The report (not a paper) is actually written for a lay audience and very
readable:
[https://www.nas.org/images/documents/irreproducibility_repor...](https://www.nas.org/images/documents/irreproducibility_report/NAS_irreproducibilityReport.pdf)

~~~
HarryHirsch
It's a position paper from a conservative think tank and needs to be treated
as such.

~~~
ItsMe000001
> and needs to be treated as such

What does that even mean?

You have one point "it's a conservative think tank" \- which does not mean
anything - coupled with another empty statement.

Now I sure don't mind criticism of that entity, but I find it ironic that it
is done with such an empty statement that's just a not very cleverly disguised
type of ad hominem attack.

May I suggest you take some time actually reading the paper and then coming
back with criticism of their actual points?

~~~
samfriedman
It is absolutely relevant to consider the political motivations of publishers
of reports like this, when the publisher is a politically active organization
as opposed to an independent research group.

While the report itself may not push an obvious partisan agenda, a critical
reader should consider the conservative agenda of the NAS in relation to why
they might publish such a report. Given the conservative wing's long-standing
opposition to science, it's not far-fetched to view this report as being
published tactically -- even if all of its points are well-founded -- as
something to point to when future efforts attempt to undermine science-based
policy on issues like climate change, stem-cell research, abortion, etc.

Especially when the President of the NAS is a public "climate skeptic".

[https://www.chronicle.com/blogs/innovations/bottling-up-glob...](https://www.chronicle.com/blogs/innovations/bottling-up-global-warming-skepticism/29754)

------
terminado
I mean, this is still a good thing, since it charts a map of decidedly grey
areas, where information may always be ambiguous and useful information needs
to be sussed out carefully.

It's better than presumptuously assuming that "_Science_" is infallible and
always black-and-white.

------
rdlecler1
Maybe papers need Yelp reviews on reproducibility. 1 star: couldn't reproduce.

------
mathgenius
I have met several "refugees" from particle physics who left in part because
of flimsy statistical methods. So let me ask: who is going to reproduce the
experiment that found the Higgs boson?

------
gringoDan
One of Slate Star Codex's top all-time articles discusses this very issue.
Highly recommend: [http://slatestarcodex.com/2014/04/28/the-control-group-is-ou...](http://slatestarcodex.com/2014/04/28/the-control-group-is-out-of-control/)

~~~
jnordwick
> On the meta-level, you’re studying some phenomenon and you get some positive
> findings. That doesn’t tell you much until you take some other researchers
> who are studying a phenomenon you know doesn’t exist – but which they
> themselves believe in – and see how many of them get positive findings. That
> number tells you how many studies will discover positive results whether the
> phenomenon is real or not. Unless studies of the real phenomenon do
> significantly better than studies of the placebo phenomenon, you haven’t
> found anything.

This is such an astute observation - that studies of phenomena known not to
exist can serve as a placebo control group for bad science. Parapsychology was
one such group, but now we can find many others. Brilliant.

------
blueprint
"In order to be a true scientist, you should be familiar with philosophy,
first"

------
flamedoge
There is the problem of conflating science with mathematics.

------
ajarmst
Judging by the sheer amount of XKCD content in this report, Randall Munroe
should get a co-author credit.

------
rplst8
This is the exact reason I don't believe science should be used blindly to
affect government policy. It flies in the face of democracy and is subject to
falsehoods just like any other human endeavor.

~~~
lmm
If we don't use science to make decisions, what are we supposed to use? Gut
instinct?

Of course science is imperfect, but it tends towards truth over time, whereas
dogma and populism have no way of correcting mistakes that isn't just as
likely to introduce new ones.

~~~
lliamander
> If we don't use science to make decisions, what are we supposed to use? Gut
> instinct?

I thought the parent comment made it clear: the alternative we are supposed to
use is "democracy". That is, the people who are ruled should determine the
rules that are imposed on them.

Now, democracy does not preclude the use of science to make decisions. The
voters may rely on science to inform their decisions (when the science is
available). It does, however, preclude rule by scientific experts/bureaucrats.

But I also think it's foolish to frame the decision as solely one of "science"
vs. "gut instinct". There are many phenomena, especially when it comes to
human society, that are too causally dense for us to sufficiently tease out
the causal factors and make accurate predictions.

In those conditions, the goal should be to minimize the consequences of being
wrong. We do that by decentralizing decisions to the greatest extent possible.

~~~
AstralStorm
How do you know democratizing decisions results in minimax-optimal decisions?

I would expect the result to be strongly biased by societal norms, to say
nothing of direct manipulation.

Suppose a majority wants strong social programmes and basic income, but the
economy is nowhere near big enough to afford them. A satisficing (!) executive
branch puts some of the policies in place, starving innovation and education
budgets and causing long-term harm and stagnation.

Where is the minimax decision in here? Over what timeframe?

(P.S. I could have as easily picked a liberal, fascist, military, or
conservative example. This one is relatively easy to understand.)

~~~
lliamander
I should clarify that there were two parts to my answer: first was answering
your question "on behalf of" the parent poster. It seemed their position was
to "let people vote on it". I was somewhat defending that position, but more
just clarifying that democracy wasn't necessarily in conflict with using
science to make decisions.

The second part of my response was to your framing of "science vs. gut-
instinct". My point is that sometimes ignorance is thrust upon us by the
complexity of the circumstances. In that case, my point was that decentralized
policy making (separate from any notion of voting) was essential to reducing
the worst case.

In other words, _I_ am not claiming that democratization "results in minimax
optimal decisions". I _am_ claiming that decentralized governance structures
do "result in minimax optimal decisions". Does that change your question?

edit: fixed spelling

------
quantumwoke
Recently I have begun to question university education. It seems that today's
education system has mismatched priorities, which may have an impact on
the reproducibility crisis. You see, to come up with a fantastic new alloy,
you don't have to be a genius. Or even very good. You have to be persistent,
and clever enough to realize when you've found something interesting. You will
need to understand your field very well, but there are many, many fields. And
each one is quite narrow, so while it is hard to understand your field well,
many, many people can do this.

So my less dystopian future goes like this. Train lots and lots of physicists,
chemists, engineers, programmers, materials scientists, biologists,
mathematicians. And (shock horror) through higher taxes employ them. Employ
them on promising ideas, and on impossible ideas. Employ them on the problems
of the day, and on arcane research that will almost certainly never bear
fruit. And sure, right at the top in the Apples and Googles of the world,
you'll have the best and brightest pulling together all of the discoveries
made without all these reproducibility issues.

Or we could continue down the dystopian path. The low tax path where there
really are only meaningful jobs for the best and the brightest. Where we rely
on Apple and Google to decide what ideas are worth pursuing. Where the skills
and abilities of a great many are not used. And most of us fight for the
crumbs from the oligarchs' tables. And the oligarchs? For every Elon Musk,
there will be 10 who just want the biggest yacht and the prettiest mistress.

Let me say that again. In a low tax environment, the skills and abilities of a
great many will not be used. And we will all be poorer for that. Genuinely
poorer, because great discoveries will not be made. And it is happening now.

And I'll go one step further. We are also wasting the talents of our best
writers and artists. The BBC and other national broadcasters are squeezed.
Squeezed so that the idea of something as original and amazing as Monty Python
appearing now is laughable. And again, we are all the poorer for that. Not
just the lucky ones who get to make amazing shows or works of art, but all of
us who never get to revel in them.

The low tax, small government world is a step backwards. If you wish to sum it
up succinctly, you may say it is a world where, in order to placate the very
rich, we take away opportunities from many, and impoverish the world as a
whole.

~~~
emodendroket
> Recently I have been beginning to question university education. Why are we
> bothering to educate anyone except the very, very best students in physics,
> maths and engineering? In fact, why aren't we identifying these students and
> sending them straight to Caltech? It seems intuitive that this would solve
> one half of the reproducibility crisis.

That doesn't seem intuitive at all.

~~~
quantumwoke
Please read the comment again. I address this in the very next sentence.

~~~
emodendroket
Well, not really.

~~~
quantumwoke
From the HN guidelines:

>Please don't post shallow dismissals, especially of other people's work. A
good critical comment teaches us something.

~~~
emodendroket
I just don't see what relationship the second thought has with the first.

