
How big data has created a big crisis in science - kouzant
https://theconversation.com/how-big-data-has-created-a-big-crisis-in-science-102835
======
x3tm
Not sure what big data has to do with this. It is a problem that has always
existed, particularly in the "soft" sciences and biology. It's certainly not
the case in physics, for instance.

Big data may bring another dimension to the problem once deep learning is
used in science. However, that's a detail and we're not there yet.

The fact that a paper is reproducible or not is not problematic per se. This
is not what defines science. The real problem arises when 1/ a big
claim/discovery is made in a paper that is not reproducible, 2/ nobody tries
to check the results independently, and 3/ the community nevertheless takes
the paper seriously and accepts its findings. All this has nothing to do with
the use of statistics (unless the whole community makes the exact same errors)
or big data.

------
iso1337
I don’t think the author has the causality correct here, at least for
biosciences. The statistical problems existed long before omics, big data,
etc.

Most graduate programs don’t require students to take statistics, or if
they do, it’s very cursory. Furthermore, students often learn very little
about assay design: they end up thinking that non-linear responses are linear
and do things like divide assay signals to get ratios (two sins here: assuming
the assay response has a zero intercept and that it’s linear).
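
A minimal sketch of the two sins, assuming two made-up assays: a linear one whose blank reads 100 units (slope 10), and a saturating one with hypothetical parameters S_MAX=1000 and K_HALF=50. None of these numbers come from the thread, and the standard curves are assumed known rather than fitted.

```python
# Hypothetical assay models; every parameter here is made up for illustration.
BLANK, SLOPE = 100.0, 10.0      # linear assay: signal = BLANK + SLOPE * conc
S_MAX, K_HALF = 1000.0, 50.0    # saturating assay: signal = S_MAX * conc / (K_HALF + conc)


def linear_with_blank(conc):
    """Linear response whose intercept is NOT zero (the blank gives signal)."""
    return BLANK + SLOPE * conc


def saturating(conc):
    """Hyperbolic response that flattens out at high concentration."""
    return S_MAX * conc / (K_HALF + conc)


def invert_saturating(signal):
    """Back-calculate concentration from the (assumed known) saturating curve."""
    return K_HALF * signal / (S_MAX - signal)


c_treated, c_control = 40.0, 20.0   # the treated sample truly has 2x the analyte
true_ratio = c_treated / c_control

# Sin 1: dividing raw signals when the intercept is not zero.
naive_linear = linear_with_blank(c_treated) / linear_with_blank(c_control)
blank_corrected = (linear_with_blank(c_treated) - BLANK) / (linear_with_blank(c_control) - BLANK)

# Sin 2: dividing raw signals when the response is not linear.
naive_saturating = saturating(c_treated) / saturating(c_control)
curve_corrected = invert_saturating(saturating(c_treated)) / invert_saturating(saturating(c_control))

print(f"true ratio:            {true_ratio:.2f}")
print(f"linear + blank, naive: {naive_linear:.2f}  corrected: {blank_corrected:.2f}")
print(f"saturating, naive:     {naive_saturating:.2f}  corrected: {curve_corrected:.2f}")
```

A true 2x difference shows up as 1.67x and 1.56x when raw signals are divided; subtracting the blank, or converting signals back to concentrations through the standard curve, recovers 2.00.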

So at least for the biosciences, it’s been a shitshow for a while.

~~~
amelius
One question is: would science come to a halt if people are not sufficiently
careful, or would scientific discovery just slow down a bit?

I'm asking because there will probably always be something you are not careful
about.

So in other words: how bad is bad statistics?

~~~
jeffdavis
Why can't it go in reverse?

New discoveries that are actually false seem likely to cause a lot of
confusion, leaving you worse off than when you started.

~~~
iso1337
Indeed, look at the sirtuin boondoggle:

[https://blogs.sciencemag.org/pipeline/archives/2011/09/22/th...](https://blogs.sciencemag.org/pipeline/archives/2011/09/22/the_latest_sirtuin_controversy)

[https://blogs.sciencemag.org/pipeline/archives/2011/12/13/th...](https://blogs.sciencemag.org/pipeline/archives/2011/12/13/the_sirtuin_saga)

[https://en.m.wikipedia.org/wiki/Sirtris_Pharmaceuticals](https://en.m.wikipedia.org/wiki/Sirtris_Pharmaceuticals)

Famous professor from Harvard discovers a possible route to extending lifespan
and then spends 1bn+ in funding to pursue that idea. I haven’t kept up with
that story since then, so it’s possible that someone might be able to use it
as a therapeutic target.

------
sgt101
Data-driven hypotheses have always been central to science, but the trick is
that they are used to generate a theory which produces a prediction not (yet)
seen in the data, which can then be tested with statistical methods.

~~~
jeffdavis
Or, after making the hypothesis, collect _new_ data to try to refute it.

~~~
sgt101
yup - even better; new data from new sources with different instruments -> if
it thwarts the prediction of your theory then your theory's wrong!
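
A toy simulation of that workflow, using only Python's standard library: mine one dataset for the strongest of many associations, then test that single hypothesis on freshly collected data. Everything here is hypothetical pure noise (200 made-up features, 30 samples), so any "discovery" is a false positive by construction; the helper names are illustrative.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def best_feature(features, outcome):
    """Index and |r| of the feature most correlated with the outcome (data dredging)."""
    scores = [abs(pearson(f, outcome)) for f in features]
    i = max(range(len(scores)), key=scores.__getitem__)
    return i, scores[i]


rng = random.Random(0)
n_features, n_samples = 200, 30

# 1) Mine an existing dataset: features and outcome are independent noise.
disc_features = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_features)]
disc_outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
idx, r_discovery = best_feature(disc_features, disc_outcome)

# 2) Collect NEW data and test only that one pre-specified hypothesis.
#    Feature idx is pure noise, so its fresh measurements are just new draws.
new_x = [rng.gauss(0, 1) for _ in range(n_samples)]
new_y = [rng.gauss(0, 1) for _ in range(n_samples)]
r_replication = abs(pearson(new_x, new_y))

print(f"feature {idx}: |r| = {r_discovery:.2f} in mined data, {r_replication:.2f} in new data")
```

Typically the mined feature looks impressive in the data it was found in and shrinks toward zero on new data, which is the refutation step described above.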

------
rossdavidh
For an article on proper use of statistics in science, this is rather short on
data for an empirical test. For example, did studies from the pre-Big Data era
(whenever you think that was) actually have a higher rate of reproducibility?
If this has been demonstrated, I am not aware of it, and certainly we are not
given a reference to such data in this article.

------
matchagaucho
Seems like the failure is in paper editing and review... why are these
"findings" getting published at all?

------
rafiki6
I don't see an issue with having a data collection and data engineering
function that operates separately from the scientists who create hypotheses;
they can then search data catalogs and libraries to serve their hypotheses,
no? It seems the author has confused the ability to collect and process data
with running an experiment. Further, does cross-validation not apply in the
sciences? Does sample size not apply? In the apocryphal example given in the
story, wouldn't a study get tossed if it used a sample size of 8 data points
to begin with? And wouldn't it be really, really stupid for scientists
attempting to reproduce the study to go and reuse the same 8 samples?
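
A rough sketch of why n = 8 is worrying, assuming the simplest possible case: two independent, normally distributed variables with no real relationship. It estimates by simulation how often chance alone produces |r| > 0.5 (an arbitrary threshold chosen for illustration) at n = 8 versus n = 80.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def chance_of_large_r(n_samples, threshold=0.5, trials=5000, seed=1):
    """Fraction of pure-noise datasets of size n_samples with |r| > threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n_samples)]
        ys = [rng.gauss(0, 1) for _ in range(n_samples)]  # independent of xs
        if abs(pearson(xs, ys)) > threshold:
            hits += 1
    return hits / trials


small = chance_of_large_r(8)
large = chance_of_large_r(80)
print(f"P(|r| > 0.5 with no real effect): n=8 -> {small:.3f}, n=80 -> {large:.3f}")
```

With n = 8, roughly one pure-noise dataset in five clears that bar (the theoretical value is close to 0.2), while at n = 80 it essentially never happens. Reanalysing the same 8 points would just reproduce whatever chance correlation they happened to contain, which is why reusing them is not a replication.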

~~~
iso1337
Usually the cost of doing the experiments precludes large sample sizes.

And you would be shocked at the lack of statistical education amongst
biologists. Many of them freely admit that they chose biology because they
like science but are terrible at math.

------
guscost
Science Has Only Two Legs:
[https://m-cacm.acm.org/magazines/2010/9/98038-science-has-on...](https://m-cacm.acm.org/magazines/2010/9/98038-science-has-only-two-legs/fulltext)

