
AAAS: Machine learning 'causing science crisis' - adzicg
https://www.bbc.co.uk/news/science-environment-47267081
======
kevcampb
Is machine learning really to blame for the reproducibility crisis? I'm not in
academia, but it seemed to me that the problem was entirely present without
machine learning being involved.

For example, Amgen reported that of the 53 landmark cancer papers it reviewed,
47 could not be replicated [1]. I would have assumed that most of them
didn't involve 'machine learning'.

[1] [https://www.reuters.com/article/us-science-cancer/in-
cancer-...](https://www.reuters.com/article/us-science-cancer/in-cancer-
science-many-discoveries-dont-hold-up-idUSBRE82R12P20120328)

~~~
hannob
The problem was there before, but there are reasons why Machine Learning is
amplifying bad practices.

In the past people were manually fishing for results in available datasets.
Now they have algorithms to do it for them.

In medicine a popular way to use ML is to improve diagnosis. Now there's
already a problem in medicine that the benefits of early diagnosis are
overrated and the downsides (overtreatment etc.) usually ignored. You get more
of that.

And TBH computer scientists aren't exactly at the forefront when it comes to
scientific quality standards. (E.g. practically no one is doing preregistration
in CS, which in other fields is considered a prime tool to counter bad
scientific practices.)

~~~
cle
It seems what ML is really doing is exposing weaknesses in our scientific
processes. The appropriate response here is to fix the processes, instead of
blaming the latest fad and imploring people to "try harder". If the root cause
isn't fixed, the next fad after ML will cause the same thing again. What
feasible systemic changes can we make so that scientists can't get away with
publishing sloppy results?

It's not an interesting question for many scientists, who prefer focusing on
technical solutions over political ones.

~~~
Certhas
I think it is a very interesting question for a lot of scientists, but also an
extremely hard one to answer. And an even harder one to implement.

Even obvious wins that almost everyone can agree on, like getting publishing
out of the hands of for-profit entities that add no value, are taking forever,
because cultural, social and political institutions are hard to move.

------
harry8
Fails to touch on the perverse incentives in academia, "publish or perish"
etc. Torturing a dataset to find a p value (or equivalent stat measure) that a
journal will like is better for your career than not publishing at all, even
if the paper gets discredited in time. You have no incentive at all to decide "my
results are unconvincing at this point, I'm not going to submit them" and
every reason to write them up as a useful contribution to human understanding
even if you kind of know, deep down, it really isn't. Especially if you're not
senior...

~~~
konschubert
A few days ago there was a post here on HN about a postdoc who failed to get
tenure.

And every other comment was like: “What did he expect, he had far fewer than
the usual two papers a year.”

So it seems that even here on HN, the mindset of quantity over quality still
persists.

~~~
baxtr
The problem really is: there is no alternative (that I know of). The number of
papers in well-known journals is all we have to gauge the quality of people
looking for a lifetime position. It's sad but true.
~~~
pas
No, the number of papers is an okay metric; the problem is that journals don't
like to publish honest negative results.

~~~
Jeff_Brown
There are a few journals that will promise to publish your results based
entirely on your pre-registered plan -- e.g. whether or not you find the
correlation you were looking for.

In the long term, it seems like journals that publish lots of falsified papers
should be punished, and journals that don't (e.g. because of a judge-upon-pre-
registration policy) should crowd them out.

~~~
pas
Pre-registration is great, and I think it even helps
researchers/scientists/post-docs to submit something (it doesn't have to be
perfect; after all, it's just an experiment design, and they don't have to
worry about massaging the data to have promising results to report), then
stick to the topic, carry out the experiment as well as they can, gather as
much high-quality, high-fidelity data as they can, do the analysis according
to the plan, and then report it. No extra pressure to think about how to frame
what you "found".

Though this will inevitably lead to the problem that grant boards face, that
it'll be a lot harder to differentiate between proposed experiments. And it'll
be even harder to do boring stuff. (So if we assume all submitted plans are
sound, they have to publish them all. Though then we'll have journals based on
how strict they are with experiment design requirements, 1 sigma, 2 sigma, 5
sigma, etc.)
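
(For concreteness: a quick sketch of what those sigma levels translate to as
two-sided p-value thresholds, assuming a normally distributed test statistic;
the numbers are just the standard normal tail.)

```
from scipy.stats import norm

# Two-sided p-value corresponding to an n-sigma threshold,
# assuming a normally distributed test statistic.
for sigma in (1, 2, 3, 5):
    p = 2 * norm.sf(sigma)  # sf = 1 - cdf, i.e. the upper tail
    print(f"{sigma} sigma ~ p < {p:.2e}")
# 1 sigma ~ 3.2e-01, 2 sigma ~ 4.6e-02, 3 sigma ~ 2.7e-03, 5 sigma ~ 5.7e-07
```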

------
hobofan
ML is not causing a reproducibility crisis, it just exposes one that is
already there.

> If we had an additional dataset would we see the same scientific discovery
> or principle on the same dataset?

The same holds true for traditional science based on traditional statistics.
It just seems that traditional datasets come under less reproducibility
scrutiny and are more easily taken at face value.

~~~
rocqua
Two specific issues with machine learning are overfitting and non-
interpretability.

The first means it is possible to get results that don't generalize (even if
they survive cross validation). The second means it is a lot harder to detect
use of correlations that cannot possibly be causal.
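
To illustrate the first point with a toy sketch (Python/scikit-learn, all data
is pure noise, numbers invented): one common way overfitting survives
cross-validation is when a feature-selection step is done on the full dataset
before the CV loop, so the "validation" folds have already leaked into the
selection.

```
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10_000))   # pure noise features
y = rng.integers(0, 2, size=50)     # random labels: nothing to learn

# Wrong: pick the 20 "best" features on the full dataset, then cross-validate.
X_sel = SelectKBest(f_classif, k=20).fit_transform(X, y)
print(cross_val_score(LogisticRegression(max_iter=1000), X_sel, y, cv=5).mean())
# typically far above chance, even though there is no signal at all

# Right: keep the selection inside the CV loop; the effect disappears.
pipe = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression(max_iter=1000))
print(cross_val_score(pipe, X, y, cv=5).mean())
# roughly 0.5, i.e. chance level
```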

~~~
a_bonobo
One of the issues with problems of reproducibility in the social
sciences/psychology is that early studies usually choose WEIRD (Western,
educated, from an industrialised, rich, democratic country) subjects, often
university students, who are very different from the rest of the world.

One article: [https://slate.com/technology/2013/05/weird-psychology-
social...](https://slate.com/technology/2013/05/weird-psychology-social-
science-researchers-rely-too-much-on-western-college-students.html)

Could you not interpret this, in the widest sense, as overfitting
_interpretations_ to a 'weird' dataset, with results that do not generalize,
even though the stats (in this case, likely a t-test) say everything is fine?
In which case overfitting isn't an ML-only problem?

~~~
rocqua
Overfitting is about reading too much into a properly sampled data set from any
distribution. It comes about when your model has so many parameters that it can
be made to fit anything. Regularization keeps it in check, but it remains an
issue.

The example you cite has an issue earlier in the chain. Here, we are dealing
with a biased sample from some distribution. The results from that won't
generalize to the unbiased distribution, no matter how great your model is.
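
A minimal sketch of that second failure mode (Python, made-up numbers):
regularize all you want, a model fit on a biased subsample still misses the
wider population.

```
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)

# Hypothetical population: the effect of x on y differs between two groups.
n = 10_000
group = rng.integers(0, 2, size=n)   # 0 = "students", 1 = everyone else
x = rng.normal(size=n)
y = np.where(group == 0, 2.0 * x, -1.0 * x) + rng.normal(scale=0.1, size=n)
X = x.reshape(-1, 1)

# Biased sample: only the "students" end up in the study.
students = group == 0
model = Ridge(alpha=1.0).fit(X[students], y[students])

print("R^2 on the sampled group:", model.score(X[students], y[students]))  # close to 1
print("R^2 on the full population:", model.score(X, y))  # negative, i.e. worse than guessing the mean
```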

------
willj
Curious (possibly naive) question: isn't there a fundamental difference
between the goals behind creating models with ML vs the "old-fashioned" way?
That is, in modern ML applications, you're creating a model with
dozens/hundreds of potential variables, without a hypothesis of how they
relate or contribute to the target (other than that they might, hence your
including them in the modeling process). You're using the model for prediction
more than for explanation (there is ongoing work on improving explainability,
but it seems kind of post hoc to me). And there's an expectation that you will
retrain, or at least tune, the model as its predictive accuracy decays over
time.

By contrast, traditionally in science you're coming in with a hypothesis ahead
of time about what variables predict what target. The goal is to come up with
a model that is consistent with your hypothesis (and possibly some existing
theory), and which can be applied generally, and which should need no tuning.
For example, take the very simple model behind Beer's Law: absorbance vs
concentration. That law applies in every other circumstance, but if modern ML
methods had been applied, the scientist might have chosen the model with a
slightly better score that includes nonsense variables in addition to
concentration.
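
A toy sketch of that scenario (Python, invented data; "absorbance ~
concentration" stands in for Beer's Law): the model stuffed with junk
variables usually edges out the honest one on the data it was fit to, and
loses on fresh data.

```
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)

def make_data(n):
    conc = rng.uniform(0, 1, size=n)                        # concentration
    absorb = 0.9 * conc + rng.normal(scale=0.05, size=n)    # "Beer's Law" plus noise
    junk = rng.normal(size=(n, 20))                         # 20 irrelevant variables
    return conc.reshape(-1, 1), junk, absorb

conc_tr, junk_tr, y_tr = make_data(30)
conc_te, junk_te, y_te = make_data(30)

simple = LinearRegression().fit(conc_tr, y_tr)
kitchen_sink = LinearRegression().fit(np.hstack([conc_tr, junk_tr]), y_tr)

print("train R^2:", simple.score(conc_tr, y_tr),
      kitchen_sink.score(np.hstack([conc_tr, junk_tr]), y_tr))   # kitchen sink "wins"
print("test  R^2:", simple.score(conc_te, y_te),
      kitchen_sink.score(np.hstack([conc_te, junk_te]), y_te))   # simple model usually wins
```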

All that to say, it seems to me the problem stems from scientists' lack of
hypotheses at the outset of a project, and/or the understandable desire to get
the best bang for their buck out of an experiment by measuring dozens of
variables at once and hoping the magic of ML can find a hypothesis for them.

Hope that made sense.

~~~
fock
I think you got the point. A lot of people don't seem to realize that ML might
be great for finding patterns but will never yield scientific knowledge in the
cause-and-effect sense.

Unfortunately everyone thinks they can use it for finding "new stuff", and so
in my field they "predict material properties", etc., using ML fed with data
where every review of the physics tells you that the algorithms used to
extract that data are domain-specific and can yield results that differ by
orders of magnitude. But nobody cares: take some software off the net that
claims to extract what you want, run it, train your ML, publish your results.

~~~
ppod
What method would you use to yield scientific knowledge in the cause-and-effect
sense? Many important processes really do have large numbers of causal
factors that interact non-linearly. If we want to try to learn about that,
some statistical method that deals with many parameters will be needed. Such
models are generally referred to as "Machine Learning". Their generalisation
or causal inference properties are particular to each implementation and
identification strategy, but you can't just say "ML will never yield
scientific knowledge".

~~~
fock
A tool called mathematics, which can describe these interactions exactly. And
if those processes have a lot of variables, an ML model might certainly be
useful, but it will never be generally applicable! That probably also
contributes to "scientific knowledge", but it's not the same as scientific
facts (or whatever you call universally transferable results).

~~~
ppod
You're building your mathematical model based on the knowledge you have, which
is from the data you have, and there is still the same risk that your theory
won't generalize to new observations.

------
paraschopra
Science works because it posits a model first, and then data is sought to
confirm or disconfirm it. The benefit of having the model first is that it is
much more likely to be general (and hence reproducible).

ML does the complete opposite: data first, and then the model is discovered
from the data. It's pretty easy to see why that would lead to non-reproducible
models.

~~~
hinkley
From this perspective, where is the line between ML and automated p-hacking?

------
fock
An undergrad to his supervisor in our office, talking about publishing a paper:
"I've fixed the data, now the plots look OK." I (an undergrad too) am sitting
there thinking: well, you are using ML as a regression black box to plot a
line; I can do that too without ML if I'm "fixing" the data. Supervisor: "OK,
that's really great." Me, cringing...

I'm not hammering the ML keyword onto my work (and thus am getting
considerably less academic attention), but it's nice to hear from people who
made it in academia that they support my theory. 50% of the people are just
showoffs throwing buzzwords and positivity around while they produce a load of
sh__...

~~~
altairiumblue
Yup. As someone who is in a junior position in the field - I'm torn between,
on one hand, riding the wave so to speak and taking advantage of all the
buzzwords that I can put on my resume (which is fine with me because I can
back them up), and on the other hand avoiding association with a certain type
of person/career path that might turn out to be just hot air in a couple of
years.

So if I can actually write decent code, have a solid understanding of software
development principles, have studied math and statistics from the ground up to
an advanced level, am familiar with relevant research - then do I really want
to call myself a data scientist (or ML-something) just because it might
improve my job/salary prospects, or do I want to stay away from it because
everyone who takes a one-week course on Udemy calls themselves a data
scientist without being able to back it up with actual skills?

~~~
fock
Having been subjected to a lot of buzzword-blarers over the last term, I'd say
that if people ask what makes you special, you just tell everybody that you
are working on the foundations of what all the other people are doing. If you
can back it up, why not ride the wave. I would refrain from putting the
sticker on everything, though. I also made a turnaround and now spin
everything I do that takes time (basically automating my research and data
vis) as building blocks for (grad school) ML-based research.

------
caramelsuit
That was a terrible article. I didn't see even one concrete example of their
complaint. Blaming the reproducibility crisis on machine learning methods is
just a cheap dodge.

------
bitL
My impression from the article was that the doctor stating those opinions has
no idea how ML works and how to apply it properly, leading to statements like
that. "ML gap" is real I guess...

~~~
sonofaragorn
She has a PhD in Statistics from Stanford. The title of her thesis was
"Transposable Regularized Covariance Models with Applications to High-
Dimensional Data".
([http://www.stat.rice.edu/~gallen/](http://www.stat.rice.edu/~gallen/))

I think she knows what she is talking about.

~~~
neurostats
Thanks for pointing this out. The professor cited most certainly knows ML at
its best. I'm guessing the author of this article simply attended the AAAS
session where Allen gave a talk on recent work addressing inferential
challenges with modern ML, and wrote a piece that doesn't do her work
justice. See
[https://aaas.confex.com/aaas/2019/meetingapp.cgi/Session/215...](https://aaas.confex.com/aaas/2019/meetingapp.cgi/Session/2159)
& list of recent papers
[https://arxiv.org/search/?query=genevera+allen&searchtype=al...](https://arxiv.org/search/?query=genevera+allen&searchtype=all&source=header)

Nearly all statisticians realize the need for more inferential thinking in
modern ML. E.g.
[http://magazine.amstat.org/blog/2016/03/01/jordan16/](http://magazine.amstat.org/blog/2016/03/01/jordan16/)
We still don't do that well in high dimension, low sample size regimes that
make up the majority of life science research.

------
boomskats
Good read. It's also refreshing to see a mainstream article that talks about
ML without once mentioning 'AI'.

~~~
harry8
ML is statistics with a different name, done on a computer; that should always
be mentioned in articles for the general public.

On the other hand, AI is fantasy BS hype, boosting off the fact that ML sounds
similar to AI to people who aren't aware Machine Learning _is_ stats.

Maybe AI one day, but today it is utterly ridiculous. No really. Every single
article should mention both of those things at least in passing. Downvote away,
all you AI hype surfers, but you know it's true.

~~~
rocqua
I'd say that currently ML is heuristics done by a computer. The rigour of
statistics isn't quite present in ML yet. At least, not in the basic courses.

~~~
harry8
You can equally do the thing named statistics without rigour. In fact, that
describes most of it. Sadly. See the replication crisis, p-hacking, the garden
of forking paths, etc. etc. I'd say the overwhelming majority of university
stats courses have no rigour at all, and that may not be a bad thing in and of
itself?

------
evrydayhustling
How does this article manage not to mention a single actual example of ML-
related misconceptions?? I'm sure they exist, but there is literally nothing
here except some assertions and a plug for a vaguely remedial research line.

~~~
neurostats
Here is the missing context from press release:

``` "In precision medicine, it's important to find groups of patients that
have genomically similar profiles so you can develop drug therapies that are
targeted to the specific genome for their disease," Allen said. "People have
applied machine learning to genomic data from clinical cohorts to find groups,
or clusters, of patients with similar genomic profiles.

"But there are cases where discoveries aren't reproducible; the clusters
discovered in one study are completely different than the clusters found in
another," she said. "Why? Because most machine-learning techniques today
always say, 'I found a group.' Sometimes, it would be far more useful if they
said, 'I think some of these are really grouped together, but I'm uncertain
about these others.'"

Allen will discuss uncertainty and reproducibility of ML techniques for data-
driven discoveries at a 10 a.m. press briefing today, and she will discuss
case studies and research aimed at addressing uncertainty and reproducibility
in the 3:30 p.m. general session, "Machine Learning and Statistics:
Applications in Genomics and Computer Vision." Both sessions are at the
Marriott Wardman Park Hotel. ```
[https://eurekalert.org/pub_releases/2019-02/ru-
cwt021119.php](https://eurekalert.org/pub_releases/2019-02/ru-cwt021119.php)

& the context of the AAAS session
[https://aaas.confex.com/aaas/2019/meetingapp.cgi/Session/215...](https://aaas.confex.com/aaas/2019/meetingapp.cgi/Session/21598)

------
dguest
This is a misleading title. The researcher they quote is

> ... developing the next generation of machine learning and statistical
> techniques that can ... also report how uncertain their results are and
> their likely reproducibility.

So she's actually using machine learning to assess systematic uncertainties,
i.e. to get better, more reproducible research. Of course, as with all forms of
automation, people tend to sensationalize progress as a crisis because it
makes it too easy to shoot yourself in the foot.

But doing things "the old-fashioned way" isn't any better. Early particle
physics experiments would get armies of undergrads to classify photographs of
collisions in bubble chambers. These results took thousands of researcher-
hours to compile, which might seem all fine and dandy, until you realize that
there may have been a systematic bias in your classification. Now what do you
do?

Thanks to machine learning, there are a lot of things we can do: we can try to
remove the bias and retrain the algorithm, or we can train with extreme
examples of bias and use that to quote a systematic uncertainty. We can try a
multitude of approaches to estimate uncertainties and rerun our entire analysis
in a few hours. Good luck doing that with an army of undergrads.

------
sgt101
Case in point: the LHC Higgs results. How many detections vs how many events?
How were the detections determined... The answer is: with a large booster [1].

I postulate that out of 12 billion random events it would be remarkable if a
booster didn't extract 100 or so items that looked similar to a Higgs
detection.

Well, let's give it 20 years and a new generation of PIs who aren't invested
in this and have grad students who are keen to find something different in the
data.

But ohh... all the data has been thrown away... oh! [2]

[1]
[https://indico.cern.ch/event/705941/contributions/2897000/at...](https://indico.cern.ch/event/705941/contributions/2897000/attachments/1605280/2546655/mlhepAthens-
Feb22-2018.pdf)

[2]
[https://www.forbes.com/sites/startswithabang/2018/09/13/has-...](https://www.forbes.com/sites/startswithabang/2018/09/13/has-
the-large-hadron-collider-accidentally-thrown-away-the-evidence-for-new-
physics/#d1c86469270a)

~~~
dguest
90% of the work in LHC physics is estimating the amplitude of backgrounds that
look almost exactly like your signal process. Coming up with the "large
booster" for classification is only a small part of it. So yes, machine
learning is used, but no, we don't use it blindly like you imply.

As for throwing all the data away, the article you link to actually does a
good job of explaining how this is done: we look at every collision with
thousands of sensors before deciding whether to keep it. At this stage there
is absolutely no machine learning anyway (just physics knowledge), so be
careful blaming machine learning for any missed discoveries.

------
afabisch
Overfitting is a well-known problem in the ML community. There are methods to
avoid this: cross validation, train-test splits, etc. There are also models
that give you an estimate of the standard deviation of a prediction. What is
the point? We don't need new algorithms, we just have to apply existing
methods properly.
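
For readers outside the field, a minimal sketch of what "apply existing
methods properly" can look like (Python/scikit-learn, toy data; the specific
models are just placeholders):

```
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# Hold out a test set and never touch it during model development.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Cross-validate on the training portion only, for model selection.
print("CV R^2:", cross_val_score(Ridge(alpha=1.0), X_tr, y_tr, cv=5).mean())

# A model family that reports predictive uncertainty, not just a point estimate.
gp = GaussianProcessRegressor(normalize_y=True).fit(X_tr, y_tr)
mean, std = gp.predict(X_te[:3], return_std=True)
print("predictions:", mean, "+/-", std)
```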

------
itg
The title makes it sound as if the AAAS made this statement; it's a single
researcher who is making this claim.

------
x3tm
> Machine learning 'causing science crisis'

ML, or more generally mathematics, does not cause anything. People who misuse
mathematics are to blame here. Some fields are simply using tools they don't
understand, and this predates ML advances by decades. Think of the use of
stats in psychology and medicine, for instance.

This trend of presenting ML as some kind of magic powder is ridiculous. I
blame hyped presentations by influential ML scientists for this.

------
e_carra
I wonder: don't machine learning frameworks' results come with a level of
confidence?

Ps: I have no experience with anything regarding ML.

~~~
paraschopra
Because it's difficult. Bayesian methods don't scale well to the number of
parameters that most modern ML demands. Doing inference for large models is
intractable.

However, even if you're able to do that, the problem isn't solved. You have to
ask: confidence in what? The uncertainty/confidence you get assumes your model
is right. No model can tell you whether it is a true reflection of reality.
I've written more about this on my Twitter:
[https://twitter.com/paraschopra/status/1075033048767520768](https://twitter.com/paraschopra/status/1075033048767520768)
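
To make that last point concrete, a small sketch (Python, contrived data): a
perfectly standard classifier reports near-certainty far outside the data it
was trained on, because the reported confidence is conditional on the model
being the right one.

```
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)

# Train on a narrow slice of "reality": x roughly in [0, 2].
X_train = rng.uniform(0, 2, size=(500, 1))
y_train = (X_train.ravel() + rng.normal(scale=0.3, size=500) > 1.0).astype(int)

clf = LogisticRegression().fit(X_train, y_train)

# Ask about a point nowhere near the training data.
print(clf.predict_proba([[100.0]]))
# ~[[0., 1.]] -- essentially certain, although the model has never seen anything like this
```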

------
anjc
I can see there being issues with reproducibility, i.e. getting the exact same
results, but has there ever been a time when science was more replicable?
Data/techniques/findings/papers are under more scrutiny than ever. No positive
results will be taken as sacrosanct in CS anymore. This is a complete 180 from
10+ years ago.

~~~
analog31
In my view, rather than talking about time periods, it might be preferable to
consider different fields of science wrt replicability. In fact it doesn't
even make sense to put all of "science" in one basket. The medical and
behavioral sciences get the most attention these days, but are not comparable
to physics, chemistry, geology, astronomy, etc. My field (physics) doubtlessly
has its own problems, yet has produced theories of astounding generality and
accuracy in spite of potential flaws in the individual studies that led to the
success of those theories.

Replication might not turn out to be the big problem. The lack of progress
towards a unifying theory might be a more important long term issue.

------
raverbashing
Hopefully machine learning helps with confidence and with making predictions
from experiments, as opposed to the limited "understanding" we get from the
way things are done now (where an experiment with slightly higher p values is
ignored, or one with smaller values might have hidden biases, etc.).

------
77pt77
I've often imagined how different Newtonian physics would be if we had gone
the ML route from the beginning.

------
bayesian_horse
The other day someone lamented that you can't get published as an honest ML
researcher, because the rest of the field is apparently rendering whole
professions obsolete all the time...

~~~
sampo
> you can't get published as an honest ML researcher

If you research ML, you can publish in ML journals, there are several. If your
research is about applying ML to domain problems, are you then an ML
researcher or a domain researcher?

~~~
bayesian_horse
The point was that it is difficult to get noticed with down-to-earth work when
the whole field seems to be aiming for the stars.

------
daodedickinson
It's not like teaching to the test works better for humans.

------
repolfx
As other comments observe, the replication crisis predates the use of ML, so
the causes are clearly deeper.

I think there's actually a very simple explanation for this which lots and
lots of people hate, so they're sort of in denial about it. Academia is
entirely government funded and has little or no accountability to the outside
world. Academic incentives are a closed loop in which the same sorts of people
who are producing papers are also reviewing them, publishing them, allocating
funding, assessing each other's merits etc. It's a giant exercise in marking
your own homework.

Just looked at in purely economic terms, academia is a massive planned
economy. The central planners (grant bodies) decide that what matters is
volume and novelty of results, so that's what they get, even though the
resulting stream of papers is useless to the people actually trying to apply
science in the real world ... biotech firms here but the same problem crops up
in many fields. It's exactly what we'd expect to see given historical
precedent and the way the system works.

There's another huge elephant in the room here beyond the replication crisis
("to what extent are the outputs wrong") which is the question of to what
extent are the outputs even relevant to begin with? Whenever I sift through
academic output I'm constantly amazed at the vast quantity of obviously
useless research directions and papers that appear to be written for their
cleverness rather than utility. The papers don't have to be wrong to be
useless, they can just solve non-problems or make absurd tradeoffs that would
never fly in any kind of applied science.

I read a lot of CS papers and I've noticed over time that the best and most
impactful papers are almost always the ones coming out of corporate research
teams. I think this is because corporate funded research has some kind of
ultimate accountability and connection to reality that comes from senior
executives asking hard questions about applicability. For instance in the
realm of PL research academia pumps out new programming languages all the
time, but they rarely get any traction and the ideas they explore are
frequently ignored by the industrial developers of mainstream languages
because they're completely impractical. This problem is usually handwaved away
by asserting that the ideas aren't bad ideas, they're just incredibly
futuristic and 30 years from now we'll definitely be using them - but this
kind of reasoning is unfalsifiable on any kind of sensible timescale so it's
the same as saying, "I shouldn't be held accountable within the span of my own
career for how I spend tax and student money".

As time goes by I am getting more and more sympathetic to the idea of just
drastically cutting academic funding and balancing the books by drastically
reducing corporation tax. The amount of total research would fall
significantly because corporations wouldn't invest all the newly available
money in research, or even most of it, but it's unclear to me that this would
be a bad thing - if 75% of research studies coming out of academic biotech are
wrong then it stands to reason that if standards were improved significantly,
funding could be reduced by (say) 50% and still get a similar quantity of
accurate papers out the other end. It's possible the science crisis is really
just reflecting massive oversupply of scientists, massive undersupply of
accountability and in general research should be a much smaller social effort
than it presently is.

~~~
agent008t
Somehow, it seems like during the Cold War (50s, 60s), there were fewer
scientists and the quality of the output was higher. Not sure if that is the
case, or survivorship bias. But if it is the case, what was different about
the system back then?

To play the devil's advocate: scientists in industry/corporations do not come
out of nowhere - they come from academia. Will the academics not move to
countries where academic research is better funded? The students will follow.
Corporations will set up their research labs in those countries near the
universities to poach the best talent. Suddenly, your country is at a
disadvantage.

------
stiff
A dishonest scientist can mine a dataset for statistically significant
hypotheses, and for a long time there was no institutional protection against
it:

[https://en.wikipedia.org/wiki/Data_dredging](https://en.wikipedia.org/wiki/Data_dredging)

[https://www.xkcd.com/882/](https://www.xkcd.com/882/)

Machine learning makes it easier to test a great many hypotheses, but even
going fully "by hand" it is very easy to deviate from what the statistical
framework of hypothesis testing demands. There are now some discussions about
counter-measures, e.g. about preregistration of studies:

[http://www.sciencemag.org/news/2018/09/more-and-more-
scienti...](http://www.sciencemag.org/news/2018/09/more-and-more-scientists-
are-preregistering-their-studies-should-you)

You can see this as another chapter in the long debate about the correct way
to test scientific hypotheses:

[https://en.wikipedia.org/wiki/Statistical_hypothesis_testing...](https://en.wikipedia.org/wiki/Statistical_hypothesis_testing#Criticism)
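
A sketch of how cheap that dredging is (Python, pure noise, everything
invented): scan enough candidate variables and the "winner" always looks
convincing if you quote its p-value as though it were the only test you ran.

```
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(5)
outcome = rng.normal(size=100)
candidates = rng.normal(size=(100, 500))   # 500 variables, none related to the outcome

# Test every candidate, then report only the best one.
pvals = [pearsonr(candidates[:, j], outcome)[1] for j in range(500)]
best = int(np.argmin(pvals))
print(f"variable #{best}: p = {pvals[best]:.4f}")
# usually p < 0.005 -- by chance alone
```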

~~~
raverbashing
As your number of samples increases, so does the chance that there is a hidden
variable that explains the phenomenon but merely correlates with the thing
you're testing.

All experiments have a limit, it seems.

------
maxander
The issue talked about here is distinct from the larger "reproducibility
crisis"; the latter is a result of shoddily designed (or simply fraudulent)
_experimental_ work, whereas the issue here is the aggregate effect of the
huge amount of _computational_ work that is being done, even when that work is
being done correctly and honestly.

Testing a hypothesis against a pre-existing dataset is a valid thing to do,
and it is also almost trivially simple (and completely free) for someone with
a reasonable computational background. There are researchers who spend a
decent portion of their careers performing these analyses. This is all well
and good (we want people to spend time analyzing the highly complex data that
modern science produces), but we run into problems with statistics.

Suppose an analyst can test a hundred hypotheses per month (this is probably a
low estimate). Each analysis (simplifying slightly!) ends with a significance
test, returning a p-value: roughly, the probability of seeing a result this
strong if the hypothesis were false. If p < 0.01, the researcher writes up the
analysis and sends it off to a journal for publication, since the odds that
this result was spurious seem to be literally a hundred to one. But you see
the problem; even if we assume that this researcher tests _no valid hypotheses
at all_ over the course of a year, we would expect them to send out one paper
per month, and each of these papers would look entirely valid, with no
methodological flaws for reviewers to complain about.
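
A back-of-the-envelope simulation of that scenario (Python; every hypothesis
is false by construction, numbers are made up):

```
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(6)

papers = 0
for _ in range(1200):  # ~100 hypothesis tests per month for a year, all of them null
    a = rng.normal(size=30)
    b = rng.normal(size=30)          # same distribution: the hypothesis is false
    if ttest_ind(a, b).pvalue < 0.01:
        papers += 1

print(papers)  # ~12, i.e. roughly one "publishable" false result per month
```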

In reality, of course, researchers sometimes test true hypotheses, and the
rate of true to false computational-analysis papers would depend on the ratio
of "true hypotheses that analysis successfully catches" to "false hypotheses
that squeak by under the p-value threshold" (i.e., the True Positive rate vs
the False Positive rate). It's hard to guess what this ratio would be, but if
AAAS is calling things a "crisis," it's clearly lower than we would like.

But there's a further problem, since the obvious solution, lowering the
p-value threshold for publication, would lower _both_ the False Positive rate
and the True Positive rate. The p-value that gets assigned to the results of
an analysis of a _true_ hypothesis is limited by the statistical power
(essentially, the size and quality) of the dataset being looked at; lower the
p-value threshold too much, and analysts simply won't be able to make a
sufficiently convincing case for any given true hypothesis. It's not a given
that there is a p-value threshold for which the True Positive/False Positive
ratio is much better than it is now.
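
And a sketch of that trade-off (Python; the effect size and sample size are
hypothetical): tighten the threshold and the true positives go away along with
the false ones, because the data only has so much power.

```
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(7)

# A real but modest effect, measured with a modest sample.
pvals = []
for _ in range(5000):
    a = rng.normal(loc=0.0, size=30)
    b = rng.normal(loc=0.5, size=30)   # the hypothesis is true here
    pvals.append(ttest_ind(a, b).pvalue)
pvals = np.array(pvals)

for alpha in (0.05, 0.01, 0.0001):
    print(f"alpha={alpha}: power ~ {np.mean(pvals < alpha):.2f}")
# roughly 0.48, 0.26, 0.02 -- the stricter threshold rejects most true findings too
```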

"More data!" is the other commonly proposed solution, since we can safely
lower the p-value threshold if we have the data to back up true hypotheses.
But even if we can up the experimental throughput so much that we can produce
True Positives at p < 0.0001, that simply means that computational researchers
can explore more complicated hypotheses, until they're testing thousands or
millions of hypotheses per month- and then we have the same problem. In a race
between "bench work" and "human creativity plus computer science," I know
which I'd bet on.

