
Science Isn’t Broken - ot
http://fivethirtyeight.com/features/science-isnt-broken/
======
yk
Science is not broken, but the scientific process is severely misrepresented.
I can only really speak about physics, but at least there, and I believe also
in other fields, people do experiments, observations, etc., and then write
papers on whatever they did. The papers are intended for a highly specialized
audience and contain a lot of detail besides some claim and a p-value, and
there are usually quite a few closely related papers. Actually, I sat in more
than one conference session where every single talk had the title "The X-ray
spectra of NGC XXXX." Then someone will write a review article on the field.
And that person is highly specialized, has probably read hundreds of papers in
the field over the last decade(s), and has hands-on experience. That person
knows that if there is no mention of X in the methodology section, then the
authors did not deal with X, which may or may not be justifiable. After that,
someone writes a textbook; that author will have read several review papers
and probably has enough experience to know that a sentence like "compare the
contradictory results of Y" is an indication that they should look into the
actual papers and perhaps use very cautious language or omit the topic
entirely. A result is settled science at that point, when it appears in
textbooks, not when the first indications appear in a paper.

On the other hand, science writers want a headline like "Something baffles
scientists." Well, the scientists are probably baffled because the result
contradicts their intuition, which is a strong indicator that something is not
really working. This something may be an honest mistake by the authors, a
poorly understood experimental effect, outright fraud, or it may be something
worth reporting. The entire process of science writing is geared towards the
papers which are most likely to be wrong.

~~~
Retric
I agree with what you're saying. However, I feel publish-or-perish has
infected many fields, resulting in a lot of crap instead of more focused,
high-quality work. Few people can risk spending 10 years on something and
having nothing to show for it.

It's a vicious cycle: quality is hard for people outside the field to judge,
so it's just a race to see who can lower quality more while still getting
published.

~~~
cossatot
> Few people can risk spending 10 years on something and have nothing to show
> for it.

This is a common sentiment, and fortunately for scientists I don't think it's
a very accurate description of the real problems we face. I really don't think
that there are many good reasons to spend 10 years on _one thing_ when there
is a substantial risk of failure. Scientists spend 10 years on problems all
the time! But it's either part time, or has measurable metrics for progress
along the way that can be published. I mean, you try to go to the moon before
Pluto, right? Science (as an institution) is really a lot more like Tetris
than people think, except some of the blocks (publications) get moved or
destroyed later. But if the blocks are smaller, they're much easier to fit
together and build upon, and more fault-tolerant, so if one blows up it's less
of a big deal. Smaller, better-defined, more robust studies are generally
better.

To change analogies from blocks to paths, which is a better description of the
decision-making a scientist goes through (rather than of building the edifice
of human knowledge), it's a lot more like finding your way through a new city
than choosing between two paths in a snowstorm that won't encounter anything
for 10 years' walk. At every fork, you have to decide where to go, and there
will be some reason you made that choice. After some shorter period, you can,
if you choose, report where you've gotten and why. Other people
care! They will be happy to read about it, as long as your logic is pretty
good, especially if you make nice observations along the way. You can publish
this, and probably no one will steal it from you. Maybe someone will... but if
you do it right, do it incrementally and let people know what you're up to,
you'll at least have your batshit ideas published as unreviewed abstracts and
folks will know who made the first progress.

~~~
Retric
Let's say you want to study the long-term impacts of taking Viagra, which is
clearly going to take some time. Sure, you could look at old data, but that
always has issues. So you get funding, start some research, and want to
publish something when you're done.

Option A: Pick just one thing to look at say cardiovascular health.

Option B: Collect a lot of data, and then look for trends.

Option B has a much higher risk of false positives simply because you're
looking at more factors, but those false positives seem much more
interesting, which helps you get published. Worse, you can't use any of that
data for verification, so you now need to run another 10-year study. Upside:
you got published. Downside: your results are almost meaningless. Bigger
upside: you then get to publish again in 10 years.

Well, what about option C: use approach B but publish sooner? Now you're not
only risking false positives, but also extrapolating from more limited data.

Hmm, guess which option, say, nutritionists are going to pick... "Chocolate
good/bad for you!"

------
thinkmoore
I feel like every description of the problem with p-values that I read misses
the real issue: people are using the p-value incorrectly.

A fairly standard definition of the p-value (from wikipedia) says: "In
statistics, the p-value is a function of the observed sample results (a
statistic) that is used for testing a statistical hypothesis. Before the test
is performed, a threshold value is chosen, called the significance level of
the test, traditionally 5% or 1% and denoted as α."

What this description is missing, though, is the crucial importance of the
fact that the threshold value is chosen _before_ doing the analysis, and
moreover, that the entire _analysis plan_ has been chosen before doing the
analysis.
Because what the p-value is really telling you is the probability that on
repeating the experiment (and its accompanying analysis!) you would see a
result as or more extreme than what you observed.

If your experiment amounts to "try all of the combinations of variables to
see what gives me the best answer", the p-value you compute would need to
come from some very fancy test that took that into account... and you would
find your analysis has _much_ less statistical power.

For a simple example, look at a statistically rigorous method for dealing with
multiple hypothesis testing when you plan it in advance:
[https://en.wikipedia.org/wiki/Bonferroni_correction](https://en.wikipedia.org/wiki/Bonferroni_correction).
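
As a minimal sketch of what that correction looks like in practice (my own
illustration, not code from the linked article): with m hypotheses planned in
advance, each individual test is held to alpha/m, which keeps the family-wise
error rate at or below alpha.

    # Minimal sketch of the Bonferroni correction (illustrative only).
    # With m pre-planned hypotheses, each test is held to alpha/m so the
    # family-wise error rate stays at or below alpha.

    def bonferroni(p_values, alpha=0.05):
        """Return which hypotheses are rejected at family-wise level alpha."""
        m = len(p_values)
        return [p < alpha / m for p in p_values]

    # Five planned tests at family-wise alpha = 0.05, so each needs p < 0.01:
    print(bonferroni([0.001, 0.02, 0.03, 0.04, 0.20]))
    # [True, False, False, False, False]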

Of course p-hacking is bad. The problem isn't frequentist statistics or
p-values, though; it's scientists not understanding the statistics that they
use. If you want to use a p-value to help make a decision about a hypothesis,
you have to commit to your analysis plan _in advance_.

edit: Furthermore, p-values were designed to deal with experimental data. If
you're doing an observational study, perhaps you should use statistical tools
designed for that purpose.

To sum up: when you have people who have no idea what they're doing do
statistics, they will do it badly.

~~~
Hermel
> Because what the p-value is really telling you is the probability that on
> repeating the experiment you would see a result as or more extreme than what
> you observed.

That's incorrect. If you perform the exact same experiment twice, the chances
are exactly 50% that the first result is more extreme than the second one
(neglecting equal outcomes).

This illustrates nicely how hard it is to explain the p-value properly.
Things would be more intuitive if people reported confidence intervals. I
would take "x lies with 95% likelihood between a and b" a thousand times over
"we measured x=c and reject the null hypothesis with 95% probability".

~~~
gjm11
> the chances are exactly 50% that the first result is more extreme than the
> second one

The chance is 50% if you condition only on the information you have before
doing either experiment. But once you've done the first experiment, the chance
of a more extreme result _given what happened the first time_ may be much more
or much less.

(Extreme example: Your experiment consists of rolling ten ordinary 6-sided
dice. The null hypothesis is that they're fair dice, fairly rolled, in which
case you expect a total not too different from 35 pips. All the dice come up
6. It is not now true that if you run the experiment again, you're as likely
to get a more extreme result as you are to get a less extreme one!)
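
A quick simulation under the null hypothesis (my own sketch, assuming fair
dice as in the example) makes the asymmetry concrete:

    import random

    # Rolling ten fair dice repeatedly: after observing a total of 60, how
    # often would a rerun be at least as extreme (|total - 35| >= 25)?
    random.seed(1)
    trials = 100_000
    totals = [sum(random.randint(1, 6) for _ in range(10))
              for _ in range(trials)]

    print("mean total:", sum(totals) / trials)  # close to the expected 35
    print("reruns at least as extreme:",
          sum(abs(t - 35) >= 25 for t in totals))
    # The exact chance is 2 * (1/6)**10, about 3.3e-8 -- nowhere near 50%.
    print("exact P(as extreme as 60):", 2 * (1 / 6) ** 10)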

> confidence intervals [...] "x is with a likelihood of 95% between a and b"

But that isn't what a confidence interval means! A 95% confidence interval
[a,b] means "If we ran the experiment lots of times, using the same method of
computing the interval [a,b] each time, then in 95% of runs (in the long run)
the true value would be in the interval [a,b] obtained on that run".

(What you described is what Bayesians call a "credible interval". Of course
that interval depends on your prior.)

~~~
Hermel
> in 95% of runs (in the long run) the true value would be in the interval
> [a,b] obtained on that run

So 95% of the runs produce an interval that includes the true value. Doesn't
that imply that when doing one run, there is a 95% chance of the resulting
interval including the true value? I.e. the probability is 95% that the true
value is in the interval?

~~~
kgwgk
Not really. For example, following a procedure which behaves well in the long
run and provides 95% coverage, you could get a confidence interval that
contains only impossible values. Consider also the following example: you
want to estimate x from two random numbers taken uniformly from the interval
[x-1, x+1]. The interval defined by the two random numbers is a 50%
confidence interval (half of the time x will be between the two numbers).
However, for some specific realizations (when the distance between the
numbers is greater than one) you can say with certainty that the interval
contains x. See also
[https://github.com/richarddmorey/ConfidenceIntervalsFallacy](https://github.com/richarddmorey/ConfidenceIntervalsFallacy)
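
That example is easy to check numerically (a sketch of my own, assuming an
arbitrary true value of x):

    import random

    # kgwgk's example: estimate x from two draws uniform on [x-1, x+1].
    # The interval between the two draws is a 50% confidence interval,
    # yet whenever its width exceeds 1 it contains x with certainty.
    random.seed(0)
    x = 3.7  # true value, unknown to the analyst
    trials = 100_000

    covered = wide = covered_when_wide = 0
    for _ in range(trials):
        a = random.uniform(x - 1, x + 1)
        b = random.uniform(x - 1, x + 1)
        lo, hi = min(a, b), max(a, b)
        hit = lo <= x <= hi
        covered += hit
        if hi - lo > 1:  # draws more than 1 apart must straddle x
            wide += 1
            covered_when_wide += hit

    print("overall coverage:", covered / trials)                 # ~0.50
    print("coverage when width > 1:", covered_when_wide / wide)  # 1.0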

~~~
Hermel
Thanks, you are right. I thought I understood confidence intervals, but
apparently I did not.

------
dluan
I think it'd be cool if there were a rogue Batman scientist on the loose,
self- or publicly funded, who'd go around cleaning up the streets by
replicating studies and forcing retractions.

Why Batman? Because he had to fight cops just as often as baddies. Not that
the current system is necessarily corrupt, but it is almost as blind when it
comes to recognizing good science. Our current system is a bit, well, dumb.

~~~
kansface
You'd need a rogue in each specialty - I'm also guessing that replicating
studies doesn't lead to tenure or papers in Nature.

~~~
dluan
Batman doesn't care about tenure.

------
jonasvp
From the article:

> some people have begun to ask: “Is science broken?”

> I’ve spent many months asking dozens of scientists this question, and the
> answer I’ve found is a resounding no.

News at 11: cognitive dissonance is a thing.

~~~
robotresearcher
Science's old and new fruits benefit the lives of almost everyone on the
planet every day and revolutionary new technologies, therapies and knowledge
are created faster than ever before. It's accelerating, not slowing down and
certainly not stopping. So it's not very broken.

Things can certainly improve, and they usually do, perhaps a bit gradually.
Take a long view. Science is funded by the public and not the church these
days for example. But 'broken'? Not really.

Edit: and non-scientists are very quick to think they know the problems and
offer solutions. Working scientists are actively discussing the problems and
trying things out all the time. These are not the dumbest people you'll meet.
The current model is a bit like democracy - it's the worst system apart from
all the other things we've ever tried.

To be clear: I'm not an apologist and I'm not sweeping troubles under the rug.
It's just that it is complicated, we're working on it, and it is currently
much too productive to be called broken. We have to be careful not to break
it.

~~~
tdaltonc
We don't expect insiders to objectively improve most social systems (politics,
business, education, defense, etc). Why should we expect scientists to fix
science (as opposed to just pushing the system to serve the insiders)?

~~~
robotresearcher
We do expect insiders to improve those systems. Why would outsiders know what
needs to be done?

~~~
WalterSear
We would like them to, but it's reasonable not to expect them to.

------
Xcelerate
I think there needs to be a classification system for the "rigor" of studies.
A low rigor rating doesn't mean bad science; it just means that the topic is
very complex and difficult to study.

For instance, studies that compute the gyromagnetic ratio of the electron (to
10 decimal places no less) and then compare that against the experimentally
obtained value would be classified as "high rigor". Studies that assess
whether watermelon is linked to heart disease would be "low rigor".

The question now is how to come up with a rigorous classification system...

~~~
tikhonj
Rigor is a loaded word, but otherwise I love the proposal. Maybe call it
confidence instead, or something?

I'd love this to extend to pursuits outside of science too, ranging from
"mathematical result verified with multiple proof assistants" to "this thing I
made up and argued persuasively". Unfortunately, most of my arguments for why
Haskell is the best language would fall dangerously close to that end of the
spectrum :P.

------
WalterSear
That made a pretty convincing argument that it's broken, followed by a
thoroughly unconvincing, hand-wavey claim to victory because "we eventually
muddle through."

------
joaorico
I like this summary of the evidence that most published research is false [1]:

\- "One of the hottest topics in science has two main conclusions:

  * Most published research is false

  * There is a reproducibility crisis in science

  The first claim is often stated in a slightly different way: that most
  results of scientific experiments do not replicate."

\- [Ioannidis'] "Paper: Why most published research findings are false.

  Main idea: People use hypothesis testing to determine if specific
  scientific discoveries are significant. This significance calculation is
  used as a screening mechanism in the scientific literature. Under
  assumptions about the way people perform these tests and report them it is
  possible to construct a universe where most published findings are false
  positive results.

  Important drawback: The paper contains no real data, it is purely based on
  conjecture and simulation."

\- It then summarises 7 other papers in the same way, including the Many Labs
results.

\- "I do think that the reviewed papers are important contributions because
they draw attention to real concerns about the modern scientific process.
Namely:

  * We need more statistical literacy

  * We need more computational literacy

  * We need to require code be published

  * We need mechanisms of peer review that deal with code

  * We need a culture that doesn't use reproducibility as a weapon

  * We need increased transparency in review and evaluation of papers"

\- The final paragraph:

"The Many Labs results suggest that the hype about the failures of science
are, at the very least, premature. I think an equally important idea is that
science has pretty much always worked with some number of false positive and
irreplicable studies. This was beautifully described by Jared Horvath in this
blog post from the Economist. I think the take home message is that
regardless of the rate of false discoveries, the scientific process has led
to amazing and life-altering discoveries."

[1] [http://simplystatistics.org/2013/12/16/a-summary-of-the-evidence-that-most-published-research-is-false/](http://simplystatistics.org/2013/12/16/a-summary-of-the-evidence-that-most-published-research-is-false/)

------
1971genocide
Science and modern capitalistic thinking do not work well together.

Most people in business view any sort of rigorous research as a "cost"
center, and this thinking has diffused into government too.

On the other hand, marketing and advertising produce the most value per
dollar. Why?

In a market, those activities generate new demand - something that a
capitalistic system needs in order to survive.

Why spend 20 years doing research when we could spend 1 year aggressively
expanding our market? Even 1% growth would be worth it compared to the 0%
growth if the money were spent doing research.

I like that rather than vilify a single person, Nate Silver points the finger
at the system. This type of thinking is where we should all be heading.

~~~
brc
On the contrary, only free markets generate enough surplus capital for it to
be invested in long term research. Planned and managed economies don't
generate enough production to even feed and supply their existing populations
well.

It's a false dichotomy to say that you can only choose between 20-year
research and 1-year marketing. Plenty of private money goes into research
with long-term payoffs. There are private companies engaged in fusion
research, because the payoff is huge if it is achieved. Even the public money
that goes into publicly funded research comes from private earnings in the
first place. So science and free markets are absolutely suited to each other.
That's before we even start discussing the corruption of objectives in
totalitarian states, based on leaders' whims (e.g., Nazi science research,
Lysenko, etc.)

~~~
Umn55
"Planned and managed economies don't generate enough production to even feed
and supply their existing populations well."

Sorry to tell you, but capitalist society as it exists is a planned and
managed economy. Large corporations are command economies.

You should see what science has discovered about the brain:

[https://www.youtube.com/watch?v=PYmi0DLzBdQ](https://www.youtube.com/watch?v=PYmi0DLzBdQ)

~~~
brc
If you truly believe that you live in a planned economy there is absolutely
nothing more I can add.

~~~
Umn55
What do you think copyright, IP, and patent law are? In other words: if you
have the money, you can change the rules of the game.

------
animefan
This article is a good summary of the issues around the (mis)use of
statistics in science. Having done some research myself, I can say that in
most cases the statistics don't scream a particular message at you, and it's
really hard to understand the data. Even without any external pressures,
statistical tools like p-values have limits. If you run 10 different models
(all a priori reasonable) and 8 of them seem to say roughly the same thing,
does that mean your result is correct?
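
To illustrate that question, a hypothetical sketch in Python (assuming numpy
and statsmodels are available; the data and variable names are made up): fit
several a-priori-reasonable specifications to the same data and compare the
p-value for the predictor of interest.

    import numpy as np
    import statsmodels.api as sm

    # Hypothetical sketch: one data set, several defensible model
    # specifications, and the p-value for the predictor of interest in each.
    rng = np.random.default_rng(7)
    n = 200
    x = rng.normal(size=n)                # predictor of interest
    z1, z2, z3 = rng.normal(size=(3, n))  # candidate control variables
    y = 0.1 * x + 0.5 * z1 + rng.normal(size=n)  # weak true effect of x

    specs = {
        "x only":      [x],
        "x + z1":      [x, z1],
        "x + z2":      [x, z2],
        "x + z1 + z2": [x, z1, z2],
        "x + all z":   [x, z1, z2, z3],
    }
    for name, cols in specs.items():
        X = sm.add_constant(np.column_stack(cols))
        p = sm.OLS(y, X).fit().pvalues[1]  # p-value for x's coefficient
        print(f"{name:12s} p = {p:.3f}")
    # Near the significance threshold, reasonable specifications can
    # disagree; agreement across most of them is evidence, not proof.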

------
unabst
Science might not be broken for the reasons mentioned, but I find the article
overlooks the greatest evidence of all, which is the article itself.

Objectively deconstructing the practice of p-hacking is science at work _and_
a scientific work. Say we devised a way to measure p-value-based research for
its p-hack-ability: we would then have a way to validate what that research
is saying (or not saying), as well as to test the integrity of the research
and its researchers.

------
speechduh
I saw a great talk about this at SIAM Data Mining by Dave Madigan, for
[http://omop.org/](http://omop.org/)

Basically, they took a TON of demographic research in the health sciences,
explored all possible hyperparameter tunings, and found that for most of the
papers they could get p < 0.05 in either direction, depending on the choice
of data sources and many other kinds of hyperparameters.
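
The shape of that result is easy to reproduce in a toy setting (my own
sketch, not OMOP's actual methodology): with a true effect of zero, a search
over enough analysis choices will usually turn up "significant" effects in
both directions.

    import numpy as np
    from scipy import stats

    # Toy version of the effect described above (not OMOP's methodology):
    # no true difference between groups, but many candidate "study designs"
    # (here: which subsample to analyze) to choose from.
    rng = np.random.default_rng(0)
    exposed = rng.normal(size=2000)
    unexposed = rng.normal(size=2000)

    best_pos = best_neg = None
    for design in range(200):  # 200 candidate analysis choices
        sub = np.random.default_rng(design)
        e = sub.choice(exposed, size=300, replace=False)
        u = sub.choice(unexposed, size=300, replace=False)
        t, p = stats.ttest_ind(e, u)
        if p < 0.05 and t > 0 and best_pos is None:
            best_pos = (design, round(p, 4))
        if p < 0.05 and t < 0 and best_neg is None:
            best_neg = (design, round(p, 4))

    print("design showing 'harm':   ", best_pos)
    print("design showing 'benefit':", best_neg)
    # With 200 specifications and a 5% false-positive rate per test,
    # finding p < 0.05 in both directions is routine.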

