We Should Not Accept Scientific Results That Have Not Been Repeated (nautil.us)
910 points by dnetesn on July 29, 2016 | 275 comments



Define repetition.

It's not as simple as that for all sciences - once again, an article on repeatability has focused on medicinal drug research (it's usually that or psychology) and labelled the entire "scientific community" as 'rampant' with "statistical, technical, and psychological biases".

How about physics?

The LHC has only been built once - it is the only accelerator we have that has seen the Higgs boson. The confirmation between ATLAS and CMS could be interpreted as merely internal cross-referencing - it is still using the same acceleration source. But everyone believes the results, and believes that they represent the Higgs. This isn't observed once in the experiment; it is observed many, many times, and very large amounts of scientists' time are spent imagining, looking for, and measuring any possible effect that could cause a distortion or bias in the data. When it costs billions to construct your experiment, sometimes reproducing the exact same thing can be hard.

The same lengths are gone to in order to find alternate explanations or interpretations of the resulting data. If they don't, they know that some very hard questions are going to be asked - and there will be hard questions asked anyway, especially for extraordinary claims. Look at e.g. DAMA/LIBRA, which for years has observed what looks like indirect evidence for dark matter, but very few people actually believe it - the results remain unexplained while other experiments probe the same regions in different ways.

Repetition is good, of course, but isn't a replacement for good science in the first place.


ATLAS, CMS, ALICE, LHCb, etc. are all different experiments that just happen to share the same accelerator. They use different detectors, designed, built, and operated by different groups of people, and the data is analyzed by different groups as well.

If you follow your logic - that different experiments using the same accelerator negate the whole thing - to the extreme, then doing two experiments on the same planet/solar system/universe won't be enough.

And I don't buy your inverse argument that one (good) experiment is good enough either. It is very difficult to tell from outside the group whether the experiment is actually good or not (although you probably can tell if it's bad). Screw-ups can happen no matter how many people look at the data if there is some flaw in the experimental setup - the only way to be really sure is to use different experiments to measure the same thing.


I think you've got my argument completely backwards - I'm pointing out that if repetition as implied by the article is everything, then results from e.g. the LHC are discounted.

Similarly, I am not saying that you don't need to repeat - just that it isn't the be-all and end-all of what defines 'science'. Supporting this interpretation, I mentioned DAMA: nobody is accusing them of not taking care, but nobody really believes the result either.


There is a philosophical assumption underpinning astronomy and cosmology, the so-called "cosmological principle."[0] This is an assumption (albeit not a completely unfalsifiable one) that physics across the universe at large scales is the same for all observers - that physics here on Earth should be similar to physics in Andromeda. Of course, all scientific experiments have been done on or near Earth, and thus our interpretation of astronomy is tempered by the experiments we do here; for example, we assume that the speed of light here is the same as the speed of light elsewhere.

This is, in principle, a problem. We unfortunately have no way to measure the speed of light in Andromeda the way we can here on Earth, so we really have no way to know whether our astronomical models are wrong because the speed of light is not constant in Andromeda or elsewhere.

So, yes, in principle not being able to repeat science experiments everywhere in the universe is a problem. However, if one thinks a little less broadly, testing Newtonian gravity in, say, Italy and also in China shows that at least across the Earth the phenomenon is similar. Then one can say, "certainly, gravity is the same in Italy and China, and perhaps across the surface of the Earth." That is a stronger statement than "gravity is this way in Italy." Ordering claims by "scientific goodness", we can say that

   Gravity is the same *across the universe* > gravity is the same across the Earth
                                             > gravity is the same in Italy
My point here is that even if one can't fulfill the extreme of testing theories everywhere in the universe, one can progressively prove stronger and stronger statements regarding the validity of scientific theories.

The LHC stands somewhere between "the SM is validated across the universe" and "the SM is validated at one detector at the LHC". Yes, it would be "better" if the Higgs were found at other experiments, but the current situation is "better" than if the Higgs had been found at one detector there and not in any other. Repeatability, like everything in science, is not a binary step function; it is a matter of degree, taking values in [0,1].

[0] https://en.wikipedia.org/wiki/Cosmological_principle


But we can measure gravity in faraway galaxies. We don't need to actually go there, we have telescopes :)


How can you measure the strength of the gravitational constant if you don't know the masses of the objects in question?


The Chandrasekhar mass depends on G and a bunch of fundamental constants, and that's the mass of a type Ia supernova. Unfortunately this Wikipedia article doesn't talk about astronomy much:

https://en.wikipedia.org/wiki/Gravitational_constant#History...

But it does point at one measurement of G on cosmological timescales and distances, from 2014:

https://arxiv.org/abs/1402.1534

There are other papers with similar measurements published in the past two decades.
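
For reference, the textbook proportionality behind that argument (stated up to a dimensionless prefactor; the familiar ~1.4 solar masses assumes a mean molecular weight per electron of mu_e = 2):

    M_{\rm Ch} \sim \left(\frac{\hbar c}{G}\right)^{3/2} \frac{1}{(\mu_e m_H)^2} \approx 1.4\, M_\odot

so a different value of G elsewhere would shift the mass scale, and hence the peak luminosity, of type Ia supernovae.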


That's only true if all the other physics involved are the same wherever that supernova is; your own link mentions this. We can't prove that's the case since we can't test the individual parts of that theory - it's possible to construct a different set of physics that produces results identical to the physics we're familiar with.

That isn't a particularly fruitful approach since it doesn't permit any real discovery - it's rather brain-in-a-jar - but it is still something you have to assume away.


There are some phenomena which allow good remote assessments of properties, based on various interactions. Cepheid variable brightness, for example, serving to validate the expanding-universe theory. The stellar main sequence gives a good sense of stellar masses. Spectroscopy works pretty well across a wide span of the Universe (several billions of light years). Etc.

Since these operate in concert across the observable universe, and themselves involve various other interactions (elements, strong and weak nuclear forces, gravity, speed of light, rates of hydrogen fusion, etc.), we can conclude that either there is no appreciable change in any of the underlying fundamental constants, or that change occurs in a compensated fashion such that no net change is detectable.

The second fails Occam's Razor. The conclusion that the laws of physics appear to be similar throughout observable space seems robust.


I read the article carefully, and was unable to find where it says what you say it says.

In particular, you're making super-strong claims like "be all and end all of what defines 'science'" - that's not what I see in the article. The article is reasonably pointing out that we have a repetition crisis and that more emphasis should be placed on it. It doesn't make the various super-strong claims you say you see in it.


The very title claims that unrepeated results shouldn't be accepted.


The contents of the article are the only thing that ever matters.

The title is picked by editors to get the most people to click on it.


I think the title is all that matters. I'm only here for the discussion, and the title defines the topic.


Your approach will lead you to having an incorrect take quite often. You should reconsider unless you enjoy being wrong for no particular reason.


Well, it hasn't, and I've had this "approach" for years.


That's like going to a book club without having read the book.

You miss all nuance.

It becomes very quickly apparent, to those who've read the article, when others comment without having read beyond the title.


You're wrong


See? There you go again. Man, trolls here are way less funny than on reddit.


Not a troll


Are you under the impression you aren't wrong here?


This is why I don't participate in HN discussions. The level of vapidness is mind boggling.


If the title is click-bait that doesn't match the content, it should be corrected when submitting it on HN.


2nd paragraph.


In fact, CMS and ATLAS are sort of the same experiment repeated twice, just in parallel rather than serially. Repeatable doesn't imply identical experimental design.


Does it not? If the experimental design is not preserved, don't you open yourself to inconsistent controls between the experiments? Where do you draw the line?


If you think about it, complete replication of any experiment is useless, and impossible.

But that doesn't matter, because it is not the experiment you are trying to replicate - it is the effect or the observation.

Carefully changing the design of the experiment can allow you to verify that your explanation of the effect or observation is accurate.

Of course, if you change the experiment too much, a failure to replicate will carry less useful information.


In hindsight, I do think it's true that you want numerous experiments with possibly different controls, with the exception of whatever is important for the effect. In that sense you want your experiment to "discover the right abstraction" - the set of details that your principle works over.

So I think what has to occur is a gradual "loosening up" of the controls from strict replication to weak replication, with both types of replication giving information about the effect you are testing: its validity and its generality, respectively.


We don't need to 'define repetition'; we need to foster a culture that a) accepts repetition and b) does not accept something for a fact just because it's in a journal. Right now, (a) is not even acceptable: nobody will publish a replication study. Of course, the LHC is impossible to replicate, but the vast majority of life science studies are.

I should think this is mostly needed in life sciences. Other, more 'exact' sciences seem to not have this problem.


> we need to foster a culture that a) accepts repetition

Do we?

I don't think we do. I think we need to foster a culture of honesty and rigor. Of good science. Which is decidedly different from fostering a culture of "repetition" for its own sake.

Paying for the cost of mountains upon mountains of lab techs and materials that it would require to replicate every study published in a major journal just isn't a good use of ever-dwindling science dollars. Replicate where it's not far off the critical path. Replicate where the study is going to have a profound effect on the direction of research in several labs. But don't just replicate because "science!"

In fact, one could argue that the increased strain on funding sources introduced by the huge cost of reproducing a bunch of stuff would increase the cut-throat culture of science and thereby decrease the scientist's natural proclivity toward honesty.

> and b) does not accept something for a fact just because it's in a journal

Again, it's entirely unclear what you mean here.

It's impossible to re-verify every single paper you read (I've read three since breakfast). That would be like re-writing every single line of code of every dependency you pull into a project.

And I'm pretty sure literally no scientist takes a paper's own description of its results at face value without reading through methods and looking at (at least) a summary of the data.

Taking papers at face value is really only a problem in science reporting and at (very) sub-par institutions/venues.

I don't care about the latter, and neither should you.

WRT the former, science reporters often grossly misunderstand the paper anyway. All the good reproducible science in the world is of zero help if science reporters are going to bastardize the results beyond recognition...


> I don't think we do. I think we need to foster a culture of honesty and rigor. Of good science. Which is decidedly different from fostering a culture of "repetition" for its own sake.

No one is proposing repetition for its own sake. The point of repetition is to create rigor, and you can't do rigorous science without repetition.

> Paying for the cost of mountains upon mountains of lab techs and materials that it would require to replicate every study published in a major journal just isn't a good use of ever-dwindling science dollars. Replicate where it's not far off the critical path. Replicate where the study is going to have a profound effect on the direction of research in several labs. But don't just replicate because "science!"

I could see a valid argument for only doing science that will be worth replicating, because if you don't bother to replicate you aren't really proving anything.


> I could see a valid argument for only doing science that will be worth replicating, because if you don't bother to replicate you aren't really proving anything.

Exactly. A lot of the science I've done should not be replicated. If someone told me they wanted to replicate it, I would urge them not to. Not because I have something to hide. But because some other lab did something strictly superior that should be replicated instead. Or because the experiment asked the wrong questions. Or because the experiment itself could be pretty easily re-designed to avoid some pretty major threats.

The problem is that hindsight really is 20/20. It's kind of impossible to ONLY do good science. So it's important to have the facility to recognize when science (including your own) isn't good - or is good but not as good as something else - and is therefore not worth replicating.

I guess the two key insights are:

1. Not all science is worth replicating (either because it's too expensive or for some other reason).

2. Replication doesn't necessarily reduce doubt (particularly in the case of poorly designed experiments, or when the experiment asks the wrong questions).


This is a really good post which contributes to the conversation. Why make it on a throwaway account? We need more of this here!


>I don't think we do. I think we need to foster a culture of honesty and rigor.

Foster all you want; an honor system doesn't protect you from incompetent and dishonest people publishing junk for funding or self-promotion. A culture of repetition promotes cross-checks that make up for the flaws in human nature.


Trust but confirm.

The entire point of science, and this is not hyperbole, is that results are reproducible. If the experiment is not reproducible, one must take the results on faith. There is no such thing as faith-based science.

In order to build a shared body of knowledge based on scientific facts, then, results must be repeated. It is how different people can talk about the same thing without fearing an asymmetry of knowledge and understanding about the axioms on which their discussion of the world rests. Otherwise it is faith or religion or narrative - something other than science.


> The entire point of science, and this is not a hyperbole, is that results are reproducible.

No, it's not. The point of science -- its end -- is to understand the natural world. Or to cure diseases. Or, more cynically, to learn how build big bombs and more manipulative adverts.

Reproducible results are the means, not the end.

I know that seems like hair splitting, but it's important. Epistemological purity can do just as much harm as good, because even the most pure science is usually motivated more by "understanding the natural world" or "improving our understanding of some relevant mathematical abstraction" than by epistemological purity itself.

To be quite honest about it, this sort of epistemological purity that insists on reproducibility as a good in itself feels a lot like some sort of legalistic religion.

> If the experiment is not reproducible one must take the results on faith. There is no such thing as faith based science.

I don't think I (or anyone here) is arguing against this. Or against reproducing important experiments.

I'm wholly supportive of reproducing results when it makes sense. But I'm also wary, in a resource-constrained environment, of preferring reproducing results over producing good science in the first place.

To be concrete about it, I'll always prefer a single (set of) instance(s) of a well-designed and expertly executed experiment over 10 reproductions of a crappy experiment. In the former case I at least know what I don't know. In the latter case, the data from the experiment - no matter how many times it's reproduced - might be impossible to interpret in anything approaching a useful way.

Put simply, a lot of science isn't worth the effort of reproducing. Either because it's crap science, or because the cost of reproducing is too high and the documentation/oversight of the protocol is sufficiently rigorous.

The point of science isn't to adhere perfectly to the legalistic tradition of a Baconian religion. The point of science is to learn things.


The only way we "understand" the natural world is by making verifiable predictions. If those predictions can't be consistently verified, then we don't understand the relevant phenomenon at all, and we haven't learned anything.

> To be concrete about it, I'll always prefer a single (set of) instance(s) of a well-designed and expertly executed experiment over 10 reproductions of a crappy experiment.

I'd take 2-3 repetitions of a moderately well-designed and moderately well-executed experiment over either. Even the most well-designed and executed experimental protocols can produce spurious results, due to the stochastic nature of the universe.


I think the issue here is that scientists want to get funding, have prestige, and perhaps learn about the world in the process. The public wants to be cured of diseases, see new technologies, and learn about the world too.

There is a disconnect between the motivation and capability of scientists under the current funding system and what the public wants. So an easy solution is that if the public wants reproducible science, they need to pay for it. I'm sure some scientists who couldn't make it into Harvard or Caltech (i.e., me) and thus can't do cutting-edge science would be happy to take the dollars, make a living, and just reproduce the work of others. But you can't simply declare to scientists that they should do X while not enabling them to.


Science is a tool. Tools are means, not ends. We use science to gather facts we can agree on. But science isn't the "truth". Science isn't the facts. It's a process that produces facts - if and only if they are reproducible. Otherwise it is faith, religion, or a narrative.

What's more, the scientific process is applied discretely, one fact at a time. Understanding of our world and its meaning is cumulative, over the entire context of our experience, and we use things like feelings, faith, religion, and narrative to create it.


> Taking papers at face value is really only a problem in science reporting and at (very) sub-par institutions/venues.

> WRT the former, science reporters often grossly misunderstand the paper anyway. All the good reproducible science in the world is of zero help if science reporters are going to bastardize the results beyond recognition...

Science is funded by the public and done for the public. Good science reporting is very important to ensure that science continues to get funded. Too often, scientific papers are written in a way that makes them incomprehensible to anyone outside the field, whether through pressure to use the fewest words possible or through technical jargon.


There's also the option that papers are written the way they are so that they still remain papers and not books. A five page paper on someone's findings is much easier to read than a 20-30 page paper where field specific knowledge is redefined and explained instead of referenced.



I know I made the original "dwindling funding" claim, but it's actually a red herring. I should have said something like "increasingly competitive nature of grants". The argument relevant to this article is NOT about whether there's enough funding for science. Rather, the argument is about how difficult it is to get funding. I think it's fairly uncontroversial (and correct) to say that getting good science funded has become considerably harder over the years. I'm not sure anyone has done the work of trying to quantify that; maybe someone else can help find data for that.

But to address your question anyways:

1. USFG scientific funding institutions are only one source of science funding. There are many others. If you look across the federal government, there's a downward trend: http://www.aaas.org/sites/default/files/DefNon%3B.jpg

One must also take into account non-federal-government sources, which in many cases have substantially decreased their investments in R&D since 2008.

2. As a percentage of GDP, there's been a steady decline: http://www.aaas.org/sites/default/files/RDGDP%3B.jpg

3. From an impact-on-culture perspective - which is the relevant one in my comment - I think (2) is more interesting than your data, and also more interesting than (1). The question should be "how difficult is it to fund good science", not "how much are we spending in absolute or relative terms". This is, of course, very difficult to quantify. But looking at percentage of GDP is at least better than looking at absolute dollars.


It's not repetition for its own sake. It's for the sake of good science.

I find many who are against repetition have certain views that are helped by soft science.


I suppose I should remind you, then, that the up-thread comment we're responding to was written in the context of a physics experiment.

And I'm about as far from "soft sciences" as you can get.


Oh, I was not commenting on what you do. I was commenting that many who hold that view of non-repetition have leanings that are favored by soft sciences. They don't feel the "need" for repetition even in hard science. It is a personal bias that influences their view. It can happen when emotion overcomes hard logic and science.


> many who hold that view of non-repetition have leanings that are favored by soft sciences. They don't feel the "need" for repetition even in hard science.

Really? You're suggesting that psychologists (to arbitrarily pick a softer science) would deny physicists (to arbitrarily pick a harder science) should reproduce their studies where possible?

That seems remarkable to me, perhaps I've missed these discussions. Can you provide evidence that this is a pervasive movement in some sciences, rather than the opinion of a few?

"Many" is a trigger weasel word, of course, and needs backing up.

My interpretation - perhaps incorrect - is that you feel the softer sciences are wilfully undermining the quality of harder sciences. I very much doubt this is the case. Some philosophers of science and some softer-science key influencers may introduce difficult and challenging questions about the appropriateness and usefulness of some research methodologies (as people in this thread are doing), but I doubt they'd make the blanket assertion you're suggesting.


You're clearly reading far too much into the parent. Do you have an emotional attachment to "soft" sciences that is causing you to be defensive in the face of no attack at all?


Project much? Looks like you're the one getting all heated and angry at someone defending the social sciences and are trying to attack them. It's better not to bring emotion into discussions of science.


Agreed - only important studies are worth replicating; the problem is that no major journal will publish a replication. Journals in general push for novelty rather than quality here.

b) can be a problem with meta-analyses and reviews. When gathering data "from the literature", not all the data gathered is of the same quality/certainty, which can have a compounding effect. The same goes when someone from a mathematical or computational field tries to create a model using data reported in the literature. It is often difficult when working in an interdisciplinary environment to assess the quality of everything you read, especially if you're not familiar with all the experimental methods.

Also, off topic, but I wonder why you chose a throwaway account to weigh in on this. I hope it's not a "science politics" reason.


Results can be irreproducible even if nobody was dishonest.

Reproduction is a way of bolstering rigor.


Code-wise, wouldn't the analog be writing a new set of unit tests for each dependency you pull?

It'd probably make sense to do that, actually, so you can verify that the dependency actually fits your use case as time goes on.
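
A minimal sketch of what that could look like (pytest-style; `slugify` is a hypothetical stand-in for any third-party dependency, and the expected outputs are assumptions about the behavior your code relies on):

    # test_deps.py - pin the dependency behavior we actually rely on.
    from slugify import slugify   # hypothetical third-party dependency

    def test_unicode_normalization():
        # If an upgrade changes this, the test fails before production does.
        assert slugify("Héllo Wörld") == "hello-world"

    def test_separator_handling():
        assert slugify("a/b c!") == "a-b-c"

Run in CI, these catch a version bump that silently changes behavior you depend on.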


> I think we need to foster a culture of honesty and rigor. Of good science. Which is decidedly different from fostering a culture of "repetition" for its own sake.

That is not sufficient. Honesty and rigor are of course required for good science, but they are not sufficient.

Even with honesty and rigor you WILL still get false positives. Statistics is used to measure how likely this is, but statistical methods do nothing to tell you whether any particular case happens to be a false positive. For many studies a confidence of 95% is considered good enough to publish, but if you do the math, that means an honest researcher who publishes 20 such studies has, in expectation, published one false result! If a journal publishes 20 such studies, statistically one is false. Thus replication is important.
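
A quick back-of-the-envelope check of that arithmetic (a Python sketch, assuming 20 independent studies where every effect studied is actually null and alpha = 0.05):

    # Expected false positives among n independent true-null studies.
    alpha, n = 0.05, 20
    expected = n * alpha              # = 1.0, one false result on average
    p_any = 1 - (1 - alpha) ** n      # ~0.64, chance of at least one
    print(expected, round(p_any, 2))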

It gets worse, though: the unexpected is published more often - if it is true, it means a major change to our current theories, and that is important to publish. However, our theories have been created and refined over many years; it is somewhat unlikely they are wrong (but not impossible). To put it a different way: if I carefully drop big and small rocks off the Leaning Tower of Pisa and measure that the large rock "falls faster", that is more likely to be published than if I found they fell at the same speed. I think most of us would be suspicious of that result, but examining my "methods and data" will not show any mistake I made. Most science is in areas where wrong results are not so obvious.

> It's impossible to re-verify every single paper you read

True, but somebody needs to re-verify every paper. It need not be you personally, but someone needs to. Meta-analysis only works if people re-verify every paper. Note that you don't need to do the exact experiment; verifying results with a different experimental design is probably more useful than repeating the exact same experiment: it might by chance remove a design factor that we don't even know we need to account for yet.

> And I'm pretty sure literally no scientist takes a paper's own description of its results at face value without reading through methods and looking at (at least) a summary of the data.

I hope not, but even if they check out, it doesn't follow that things are correct. Maybe something wasn't calibrated correctly and that wasn't noticed. Maybe there are other random factors nobody knows to account for today.

The above all assumes that good science is possible. In medical fields you may only have a few case studies: several different doctors saw different people each with some rare disease, tried some treatment and got some result. There is no control, no blinding, and a sample size of 1. But it is a rare disease so you cannot do better.


The key here is we need a culture of skepticism. That's the fundamental core philosophical trait of the enlightenment: not belief in what anyone says (just 'cause they say it or have authority to say it) but doubt in claims until they are rationally proven, and proven again, and proven again. Skepticism builds on itself because a culture where everyone is skeptical is a culture where the burden of proof lies with the claimant. This applies to science as well: whether an experiment is repeated once or hundreds of times, the real measure is to what degree it convinces the winnowing number of skeptics aligned against the idea.


Does doing 50 studies and only publishing the 3 that produce favorable results count as repetition? Because, BTW, that's exactly what Big Pharma does, and also intrinsically what the bias towards publishing only significant results does.
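
To put rough numbers on that question (a Python sketch, assuming a drug with zero true effect, 30 subjects per arm, and a "publish only if p < 0.05" filter; the counts are illustrative, not real pharma data):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    published = 0
    for trial in range(50):                    # 50 trials of a useless drug
        drug = rng.normal(0.0, 1.0, size=30)   # no real effect in either arm
        placebo = rng.normal(0.0, 1.0, size=30)
        if stats.ttest_ind(drug, placebo).pvalue < 0.05:
            published += 1                     # "favorable", so it gets written up

    print(published, "'significant' results out of 50")

On average two or three of the fifty null trials clear the bar by chance alone - and seen in isolation, those look like independent replications of a real effect.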


Pharma needs to send information about studies to regulators beforehand, and regulators pay attention to this for new drugs, which counters the effect you're talking about. Further, they are happy to report inconclusive results; the FDA mostly cares about harm to people, not effectiveness.

Now, after drugs are on the market, there is another wave of less reputable research. But the FDA has already approved the drug, so they don't care as much.


> Because, BTW, that's exactly what Big Pharma does

That's exactly what almost everyone in academia does. Money is not the only corrupting factor.


>Other, more 'exact' sciences seem to not have this problem

I'm not sure of this. By 'exact' disciplines I'm assuming you mean disciplines more dependent on mathematical proofs. A CACM paper a little while ago discussed this, and found that a large number of papers were not repeatable. If I recall, it mostly found that nobody shares code and/or the code didn't build.

http://cacm.acm.org/magazines/2016/3/198873-repeatability-in...


I think there's also a certain lack of transparency in the scientific process that needs to be fixed. But that's just like my opinion.


> and b) does not accept something for a fact just because it's in a journal.

I can't say I know any scientists who think any differently.


You ask that like it is something new. Reproducibility[0] is something at the core of science.

The problem these days is that spending time doing replication is not glamourous and will not help you get funds.

[0]: https://en.wikipedia.org/wiki/Reproducibility


The problem is worst in fields that don't have reproducibility 'built in' to the field.

I do genetics and development, and the main sanity check we have is the distribution of mutant lines. If you say that mutant X does Y, other people are likely going to see that (or not) when they get their hands on it and start poking around. This strength of working with mutants is at the core of the success of molecular biology. Even if you don't set out to confirm someone else's results, you're quite likely to come across inconsistencies in the course of investigation.

If a field lacks that sort of mechanism, they need to take special care to address reproducibility.


Exactly. Most 'good science' starts with premises A, B, and C from their respective papers, and goes on to show that X is likely true because of this new data. In general, it would be impossible to propose X if A, B, and C were not themselves true - they were the foundations of the new research. In that way, even though the publication of X may not have directly repeated the experiments of A, B, and C (though often it does precisely that), it has still replicated the core concepts.

As noted, this does not get rid of the confounding negative effects, where paper D was also used as an (in retrospect, faulty) premise. Though no one ever actually comes out and says 'D is bad'.

Over time, A, B, and C theories accrue significant weight while D falls off. It's not as explicit as some may like, but in the end the spirit of replication is well and alive.


Add to that the impossibility of publishing negative studies. Science really needs to get its shit straight.


I suppose the funding agencies could play a role in this. When you submit a proposal that cites prior work as a basis or motivation for your project, you should be required to show that either 1) the cited works have been reproduced, or 2) you're going to reproduce them.


And also we need to start publishing negative studies way more.


I dimly recall some physics papers from years ago, with titles such as: "New experimental limit on the existence of..." Whenever my project was going particularly badly, we'd lighten things up by describing it as a negative study. One such highlight was: "New experimental limit on the existence of oil in the vacuum pump."


I think the difference is that we are able to test and understand/verify the operation of the LHC at multiple levels. It's not like they just built what was on paper, flicked the ON switch, and assumed all the data that came out was correct. The components were individually tested, and collisions at energies accessible to other supercolliders were done so we could compare to other results.

Unfortunately for medical drugs and psychology, researchers are mostly gathering data without an understanding of the underlying mechanisms. There are also virtually never proposals which can be tested for compliance with reality in a quantifiable and isolated way, as we can in physics, chemistry, or parts of biology.

So I feel the replication crisis is not a matter of various fields just not knowing how to do "good science", but that these fields by their nature make clean hypothesis testing vastly more difficult and p-hacking and statistical trickery (intentional or otherwise) harder to sift out.


> but that these fields by their nature make clean hypothesis testing vastly more difficult and p-hacking and statistical trickery (intentional or otherwise) harder to sift out.

How certain are we that it's 'the nature' of these fields rather than political choices that were solidified long ago? There could be much larger samples and six-sigma confidence in the life sciences if grants were allocated differently. I know this is happening, for example, in neuroscience, where the (private) Allen Institute is creating significantly more rigorous (and useful) datasets compared to the bulk of studies, because they fund their studies differently.


Thinking about physics, the LHC experiment benefits from not being a completely isolated study. Instead, its results are multiple pieces of a larger jigsaw puzzle or web of interconnected results that include both a quantitative predictive theory and other experiments that test other predictions of the same theory. And the Higgs was not the only thing that could be tested at the LHC. For instance, it probably replicated a huge slew of prior studies during construction and testing of the system, from already-known particles all the way down to the calibration of voltmeters.

In physics, some theories have been tested and interconnected to such a degree, that if an experiment conflicts with theory, it's reasonable to suspect the experiment rather than discard the theory. That's what happened with that apparent faster-than-light travel of neutrinos captured in Italy. It was something like a partially unplugged cable. Once corrections were made for the cable, the theory snapped right back into place. Those theories can be trusted as akin to tools in day to day physics. For my humble graduate experiment, refutation of any major law would have led me to fix the experiment and try again. I simply assumed things like conservation of charge. Such laws probably include electrodynamics, gravitation, quantum mechanics, thermodynamics, and Darwinian evolution.

And this is a common feature of major studies in physics, chemistry, evolutionary biology, and other sciences as well.

Where we may run into trouble is in branches of sciences that don't have over-arching theories or that web of connected results. At the other end of the scale are areas of sciences where the results are mainly a database of observed statistical correlations, with little or no apparent progress towards a general theory. When someone publishes a surprising result, there is no reason to say that the experiment must have been done wrong. You just add it to the pile. The best that can be hoped for is that some kind of meta-analysis will demonstrate an inconsistency among multiple studies. Those fields don't have the "hard" theories that can be used as tools to test experimental results.

To be fair, there's another situation, where you have not one, but zero, reproducible results. That's the state of affairs in the search to unify gravity and quantum mechanics.


There were several "bogey Higgses" at different energies during earlier Tevatron and LHC runs - maybe 3-5 events which hinted at a signal. Then they disappeared. But the real Higgs was not confirmed until there were 40 events across two different decay paths at two different detectors, i.e. 5-sigma significance. Different runs and decay paths are like multiple replications.


The LHC is somewhat of a special case, because repetition is extremely difficult for obvious reasons. That doesn't preclude making repetition an appropriate requirement for the 99.9% of other scientific findings. In this particular case, we just have to make sure that the LHC guys conform to extremely stringent burdens of proof and high scientific accountability, which they already impose on themselves but which the 99.9% do not, for a variety of reasons (an uncomfortable truth, due to many things like pressure to publish and generally being less good, but a truth nonetheless).


Physics is a special case for several reasons; [1] most importantly, there is a good theory, with the effect that different experiments measure the same thing. So once you start looking for a specific deviation from the standard model, you can not only look at accelerator experiments, but also at g-2 experiments [2], or the Lamb shift, or beta decay in the case of the electroweak interaction. (Or at different places in accelerator experiments, in the sense that data at one point should lead to corrections at another point.)

In other disciplines, the role of repetition is a lot more important. The look-elsewhere effect means that your statistical significance depends on all other research, published or unpublished. [3] If you are doing a repetition study, the look-elsewhere effect just goes away. So good science in the first place is very important, of course, but there is a lot of value in doing repeated experiments for fundamental statistical reasons - especially in fields where there is no good predictive theory (everywhere except physics) and where it is very hard to reach very high significance.

[1] I recently read

John Lewis Gaddis, The Landscape of History, 2004

which is an essay on the epistemology of history and in a way tries to reject "physics envy," with the argument that physics is everything where there is a good theory - in a way, the domain where we are winning. Consequently, physics envy looks a lot like cargo cult, and other disciplines need to figure out their own epistemology.

[2] Incredibly high-precision experiments which measure the magnetic moment of the electron.

[3] https://xkcd.com/882/


It's not that repetition isn't important in physics - it is important, but for different reasons. Nobody is coming around selling physicists anything based on a non-repeated experiment. Drug studies have a lot of perverse incentives in play, and the subsequent interpretation of results - for example, for off-label use - is also skewed toward one set of interpretations. There could be billions of dollars in sales at stake.


>Repetition is good, of course, but isn't a replacement for good science in the first place.

Repetition is science.

>Define repetition.

Google says: "the recurrence of an action or event."

Science requires 1) a prediction of an observation 2) repeated observation consistent with the prediction.

If there is no repeated observation of what was predicted, it is not science.

>But everyone believes the results, and believes that they represent the Higgs.

It doesn't matter what is believed. "Everyone" used to believe all kinds of things, that doesn't mean that those things were true, or accurate. Repeated observation of prior predictions is all that matters.

>When it costs billions to construct your experiment, sometimes reproducing the exact same thing can be hard.

That doesn't excuse, or allow in, things that haven't been reproduced.


With all respect, I do not agree with your example of the LHC, for two reasons primarily. The first has already been mentioned: the various groups and experiments/detectors. The second, I think, is more important: in order to claim discoveries at any of those experiments, certain statistical constraints must be met. The 5-sigma rule is by its very nature garnered by repetition, and is necessary because of the nature of quantum mechanics and particle physics. Here is a write-up on CERN and 5 sigma: http://physicsbuzz.physicscentral.com/2012/07/does-5-sigma-d...
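
For concreteness, the one-sided p-value that the 5-sigma discovery threshold corresponds to (a small Python check using scipy's normal survival function):

    from scipy.stats import norm

    # One-sided tail probability beyond 5 standard deviations.
    p = norm.sf(5)        # survival function, 1 - CDF
    print(p)              # ~2.87e-07, roughly 1 in 3.5 million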


What has been seen is readings from instruments that were built according to some mathematical model to register conditions which are supposed to happen according to a bunch of theories; then a lot of calculation follows which presumably confirms the assumptions from the readings.

The simple analogy is the pictures one gets from fMRI. These pictures are, well, pictures based on approximations - not the mind. Not even close.

There is a paradox: one cannot make an instrument out of atoms to "see" what is going on inside these atoms. What they see are pictures created out of mathematical models, not reality as it is.


Repetition means that, given all the facts and the original conditions, the same or a similar result can be expected.

Physics has its own cases, such as cold fusion, which is unreproducible - which it should not be, if the original claims held true.


>The confirmation between ATLAS and CMS could be interpreted as merely internal cross-referencing

Why? ATLAS and CMS are different detectors operated by different teams.


The difference is maths. E.g. Einstein proved his theory of gravity mathematically; THEN it was proven via empirical observation.

We don't have a way to mathematically prove a drug for depression or a psychological phenomenon.


What does "prove mathematically" mean here? He constructed a self-consistent mathematical theory, and validated that it reproduced both the special relativistic and Newtonian limits, but I don't think "proved mathematically" is a good way to state that.

You can't "mathematically prove" a theory agrees with reality except by reconciling it with experiments (possibly indirectly through links to previous theories), so the only difference here is that a non-ad-hoc mathematical theory of depression medication doesn't appear to be feasible, not that physics has some magic alternate method of proof.


Did you google it?

https://en.wikipedia.org/wiki/Mathematical_proof

You're missing the point of what I said with the Einstein example. He proved it via maths. Whether or not it was empirically validated is irrelevant. It was true mathematically.

Today, this cannot be done in fields like medicine and psychology. No one can create a mathematical proof for a cure for cancer, make a treatment based upon that proof, and have it work on the first try.


I've done enough mathematical proofs that I don't think I need to "google it", but I read the Wikipedia page anyway. It failed to explain how a mathematical proof could have anything to do with "proving" General Relativity in the absence of empirical evidence. He certainly proved that (as I said before) it agreed with existing (experimentally backed) theories using mathematics, but if you have a method which allows you to deduce the truth of physical laws from purely mathematical proofs (no links to experiments at all!), you'll revolutionize epistemology. As it stands, most scientists are using the crutch of experimental evidence at some level (either directly or by reasoning within theories that are, at least for now, validated by experiment).

> Whether or not it was empirically validated is irrelevant. It was true mathematically

If you mean it was a self-consistent theory, note that there are infinite physically incorrect theories which are "true mathematically". It is true mathematically in the same way number theory is "true mathematically", but you don't see anyone assume that we can describe gravity with prime number theory. If you mean it was mathematically proved consistent with previous physical theories backed by evidence, we're back to the experimental link, and also there are infinite physically incorrect theories which would agree with Newtonian mechanics and special relativity. If you mean he proved it satisfied some properties we expect from physical theories because they've been consistently upheld (e.g. energy conservation), there's an infinite number of wrong ones there too.

Mathematical convenience has been an excellent guide in physics, particularly fundamental physics (electromagnetic waves, antiparticles, and the Higgs Boson were all the results of positing things for mathematical convenience), but it is not a substitute for verifying novel experimental predictions. Nobody taken seriously in the physics (or mathematics for that matter!) community thinks it is, not even the often decried string theorists.


Interesting points. I still disagree, or perhaps I'm not doing a good job explaining.

My point is this: Something is true whether one accepts it or not. E.g. the earth was proven to be round before anyone actually went around it.

Another example is climate change deniers. To them there's no proof of: 1) climate change; 2) that it's man-made.

Another example is people who believe the earth is 10,000 years old (I'm not joking - millions of people believe this). They will deny any evidence you put in front of them.

That's the beauty of maths. If you can prove it mathematically, it's true. Whether or not you believe it is irrelevant.


Epistemic simplicity is one of the things I love about math, but it's hard to extend it wholesale to physics. GR will almost certainly be proved wrong some day, at least in some regime (Einstein was ironically one of the first to feel it needed something more, as an alternative to the inelegant cosmological constant). If GR had been mathematically proven, this would be terrible, as it would prove our mathematics inconsistent!

As for the existence of objective reality, that's another thing that seems hard to prove conclusively. I'd suggest looking at some of the basic epistemology surrounding modern science (e.g. Popper [1]) for some thoughts on this.

As an interesting aside, while it's true that mathematical proofs (idealizing here, and assuming incorrect proofs are never accepted by the mathematical community, because on occasion they are) are absolute statements of truth, they may not be stating precisely the truth you expect. Thanks to Gödel [2], we know that it is not possible for any consistent mathematical theory rich enough to talk about addition and multiplication of natural numbers to prove its own consistency. As a result, we may have a proof that 2+2 != 5, but that actually doesn't exclude the possibility that there is also a proof of 2+2 = 5. In fact, his result shows we will never be able to prove such a thing does not exist (since that would imply the consistency of our mathematical system). So our absolute truths from proofs of X are actually of the form "ZFC implies X", where ZFC is the background theory which is generally taken to underlie modern mathematical work unless otherwise specified. So things are not so clear cut even here.

[1]: http://plato.stanford.edu/entries/popper/ [2]: https://en.m.wikipedia.org/wiki/G%C3%B6del%27s_incompletenes...
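
As a tiny, concrete illustration of that last point - a machine-checked proof really certifies "the axioms imply X" - here is one in Lean 4 (an illustrative sketch; it is only as trustworthy as Lean's kernel and the axioms behind it):

    -- Lean 4: a machine-checked proof that 2 + 2 ≠ 5.
    -- `decide` evaluates the decidable proposition; the kernel certifies it.
    example : 2 + 2 ≠ 5 := by decide

    -- Per Gödel, this certifies "the axioms prove 2 + 2 ≠ 5";
    -- it cannot certify that those axioms are themselves consistent.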


You've totally lost sight of the issue at hand. Now you're talking about things utterly unrelated to this discussion.

It seems you're more interested in dicing up and attacking what I said, instead of what I mean.

I'll re-articulate it for you once more: We don't have a way to mathematically prove a drug for depression or to explain psychological phenomena. The studies the article talks about are those which rely on empirical observation.

Go home.


What? You've been talking about (a) whether GR needs empirical observation, (b) whether things are true whether people admit them or not (i.e. Objective reality) and (c) mathematical proofs being evidence of absolute truths. Those were my three paragraphs.

How many problems have you worked through in GR? Have you gone through the proofs that GR recreates Newtonian gravity in the low-energy limit? If not, don't go around talking about how Einstein proved GR was a true description of reality with naught but mathematical proof when he didn't think he did that.

Physics has the same requirements for empirical evidence as the life sciences; it's just that we work within regimes where we can apply fundamental, empirically validated mathematical theories directly. If you don't believe me, go ask another physicist.


>It's not as simple as that, for all sciences

Yes, it is - that's the very basis for science. One of the main problems is that many academic disciplines have been wrongly classified as hard science (see "social" sciences).

>The LHC has only been built once.. But everyone believes the results, and believes that they represent the Higgs.

Nobody intelligent "believes" anything. We examine evidence and draw tentative conclusions based on that evidence, always retaining doubt because, unless you are omniscient, there is always new information that can come to light that can cause you to change your conclusion. Science is a process. If you "believe" anything without doubt, you aren't a scientist, you are a priest.


I see a lot of people commenting here that there's no incentive to repeat previous research because it's not useful for getting grants, etc. This is kind of true, but I think it misses something important.

At least in life sciences (I can't comment on other fields), it's not that scientists don't repeat each other's results. After all, if you're going to invest a significant fraction of your tiny lab budget in a research project, you need to make sure that the basic premise is sound, so it's not uncommon that the first step is to confirm the previously published result before continuing. And if the replication fails, it's obviously not a wise idea to proceed with a project that relies on the prior result. But that work never makes it into a paper.

If the replication succeeds, great! Proceed with the project. But it's time-consuming and expensive to make the reproduction publication worthy, so it will probably get buried in a data supplement if it's published at all.

If the replication fails, it's even more time-consuming and expensive to convincingly demonstrate the negative result. Moreover, the work is being done by an ambitious student or postdoc who is staring down a horrible job market and needs novel results and interesting publications in order to have a future in science. Why would someone like that spend a year attacking the work of an established scientist over an uninteresting and possibly wrong negative result, and getting a crappy paper and an enemy out of it in the end, instead of planning for their own future?

If enough people fail to replicate a result, it becomes "common knowledge" in the field that the result is wrong, and it kind of fades away. But it's not really in anyone's interest to write an explicit rebuttal, so it never happens.


As a former life scientist, I completely agree.

If something truly revolutionary is published and it's relevant to your own work, it's very common to repeat the experiment yourself. If it works, great, continue on with your own work and reference the original paper when you publish new results.

If it doesn't work, the outcome can range from: (1) mentally note and move onto an alternative if they are available to (2) spend a ton of time trying to "fix" it, including contacting the original author.

The only time you see reproducibility experiments published is if the original paper made some serious errors, including outright fraud.

And there is no way you'd be able to test the reproducibility of every paper, because some papers aren't that important. The important ones get the attention, and if you see multiple scientists referencing a work (and building off of it), you can have confidence it actually works.


As a former life scientist, I completely agree also. Things do get repeated. If the findings aren't robust, scientists cut bait and move on. This knowledge is shared informally, for certain. Formalization of this experimental repetition would be difficult, since no one wants grant money or their career spent generating "yeah, s/he's right about that" outcomes.


It's a bit redundant now, but it's a point that deserves repetition. Replication happens through the scientists who want to build on the original work and advance it - not because they simply want to replicate it, but because they have to be sure they understand the method and all the practical details that are necessary.

If there is only a single paper about the result, most scientists will regard that with a certain suspicion. That doesn't mean we expect every new thing to be fake, but there are just too many ways these experiments can go wrong.

One problem is certainly that negative reproductions of experiments are more often communicated informally, and not published. If you're active in the field you will likely hear about the scientists who tried to reproduce the results without any success. But most of these attempts are not published.

Any experiment worth replicating is usually also worth expanding and building on, and that is where the actual verification takes place. Science works in this regard, but it doesn't work as cleanly and efficiently as one could hope for.


Yes. Life scientist here. This is 100% on point. It seems that most people who comment that we need 100% published repetition aren't aware of how the process currently works, as you describe it. Repetition happens regularly, if not in the majority of published research. It's just that it often doesn't get written up in a way that lets one easily see the connection with the previous work. Citations are supposed to help this problem, but because citations aren't just links and are hidden behind paywalls (a whole different can of worms), it's not easy to see what's been replicated vs. not.


This is why I'm always skeptical of people zealously claiming "all GMOs are perfectly safe! It's been proven!"

Yeah, proven by expensive studies that were funded by the company making the GMO. Who is going to pay for another study to try to disprove it?

There's a reason we sprayed DET all over our vegetables for years before it was banned: there were no scientific studies proving that it was harmful, even though it clearly was harmful in hindsight.

Science is not instant, and there's no way someone can claim that some brand-new GMO is "perfectly safe", without any long-term studies on its effects over 10, 20, 30 years of exposure. That's just not possible. And yet you try to explain it to these science zealots and they just brush you off as being "anti-science".


Do you mean DDT?

Anyway, beyond the 'philosophy of science' issue of whether you can prove something, there is good affirmative evidence that existing GMOs are safe for numerous reasons.

First, there's no mechanistic reason to think they would be dangerous. T-DNAs are not somehow magically toxic to humans; everything you eat is riddled with millennia of T-DNAs, old viruses, transposon blooms, etc.

The technologies themselves should be safe as well. Bt is well understood and embraced by the organic community as an applied natural pesticide, so you would need to find evidence that the localization somehow makes it toxic. Glyphosate resistance is also unlikely to have any effect a priori, because it affects a metabolic pathway completely absent in animals.

Argue all you like about how nothing can be 'perfectly safe', sure, but there's no reason to think that GMOs are dangerous, and people have looked quite hard.

Finally, just look at the Seralini pile-of-bullshit for evidence that there's plenty of incentive to publish anything critical of GMOs. No one is sitting on career-making evidence.

https://en.wikipedia.org/wiki/S%C3%A9ralini_affair


> First, there's no mechanistic reason to think they would be dangerous.

That's like saying "there's no reason to think peanuts would be dangerous" since humans eat them all the time. And yet, they are deadly to some humans. No one knows why.

But they do know that food allergies are far more prevalent in the U.S. than in other countries. And now, suddenly, people in Africa and China are starting to exhibit food allergies that the U.S. has had for a while. So what have we started shipping over to them that's causing these allergies? Who will fund that study?


What you've hinted at here is fascinating to me.

Do you have some references?

In case it's not clear, I'd like to read articles about food allergies that have been common in the USA for some time now becoming common in countries that are coming out of 2nd world status and into 1st world.


This is just a large strawman argument. Talking about the safety of GMOs is more like talking about the safety of "Medicine"- you're actually talking about the safety of a large variety of different products. Pushing the narrative that all GMOs should be treated equally is like pushing the narrative that all medicines should be treated equally, which is nonsensical.

The very idea that there can be "something wrong with GMOs" is as anti-scientific as the narrative that there can be "something wrong with medicines". It's anti-scientific on principle, because it's not the safety of individual products you are taking issue with, it's the very existence of the entire scientific field.

Anyways, it takes about 10 years for the average Ag product to make it to market. That's on par with the time for the average drug. If you are serious about safety, start talking about real improvements that could be made to the Ag approval process. Start talking about the flaws in individual studies. Until you do something like that, your anti-science blanket dismissal of an entire field is as ridiculous as someone who is "against medicines".


>The very idea that there can be "something wrong with GMOs" is as anti-scientific

Are you kidding me? Have you read Taleb on the risks of GMOs? He argues that the possibility space of danger from GMOs is different from the possibility space of danger from regular breeding.


Sorry, but this is such a fundamentally wrong assertion on so many levels. Quite the contrary: people ought to prove that there is a good reason to assume GMOs are inherently more unsafe than anything else. It is the anti-GMO crowd who are the creationists here, relying on emotions and dogma.

GMO is controlled alteration of specific genes in a crop. Why on earth should we assume this is less safe than uncontrolled changes? Massive changes to crop DNA have been going on for decades. Food you eat every day was originally produced by essentially nuking seeds to create lots of random mutations, then growing the various "nuked" seeds and looking for those with favorable characteristics. Plants get changed by picking up genes from bacteria, by cross-breeding, and by all sorts of other processes. Why on earth should we assume uncontrolled massive changes to plant DNA are somehow safer than GMO?

GMO is not really what needs to be proven. It is the other methods of changing plants which need to prove that they are safer than GMO.

The problem with GMO isn't the science of it. The problem with GMO is the politics. Here I agree with the anti-GMO crowd. Companies like Monsanto are highly immoral and a scourge of the earth. It is companies owning particular strains of seeds and requiring everybody to keep buying from them which naturally causes resentment.


>>If the replication fails, it's even more time-consuming and expensive to convincingly demonstrate the negative result.

Said scientist will gain immense fame and recognition if they show established ideas are wrong. (Source: physicist in academia)


You only get famous if you produce a new order that replaces the old one. Throwing stones at the old order doesn't actually get you very much unless you provide something better.

Especially in psych/bio, a negative finding can almost always be blamed on the experimentalist -- even more so if they are young. It is no great feat to conduct a study so poorly that it fails to detect a signal others have reported.


I think yours and the parent comment are both right, just addressing different extremes.

Sometimes you find evidence that doesn't quite match up with previous work, but if an explanatory framework isn't forthcoming, it can be difficult to say whose results are right, and what they mean. On the other hand, sometimes you come across something blatantly wrong, so you do your follow-up experiments to confirm and you're all set. It's only the latter case that will quickly make it into a manuscript.

I will say, I've seen some big corrections (like this-allele-was-actually-a-completely-different-mutant bad) that just get buried in the results section of a subsequent publication, and no retraction was ever submitted for the original paper. That was definitely a failure in the field, and likely the result of the status of the authors involved.


Why is it not in anyone's interest to write an explicit rebuttal?


I'm a physician and I've been suggesting this to my colleagues for a few years, only to be met with alienated stares and labeled cynical.

The doctors and doctors-in-training I work with have altruistic motives, but place too much stock in major medical studies. They also frequently apply single-study findings to patient care, even to patients that would've been excluded from that study (saw this a lot with the recent SPRINT blood pressure trial).

And don't even get me started on the pulmonary embolism treatment studies. What a clinical mess that is.

It's frustrating.


Personal experience (I've met many current and future doctors during my studies) suggests that few doctors are scientists. They know a lot of things, but they're not what I'd call scientists in the same way a physicist or biologist would be (and I've often wondered why they'd get a "Doctorate" at all). So this might explain the above-mentioned behaviour...

Yet I'm surprised by your colleagues' behaviour nonetheless. I would have thought they'd show more restraint.


I also wouldn't call most of us scientists, though many at university hospitals are.

I think the problem is that it's simpler to just take a study's conclusion and believe it. Because hey, it's peer-reviewed and in the NEJM! Easy!

The adverse reaction I described is normal in medicine when you oppose the status quo. That's probably true in other professions and industries.


Aside from MD/PhD's, medical doctors are not trained in research or evaluating evidence, so are not scientists by any reasonable definition.


but in medicine, what is the practical alternative?

how do you incorporate these findings? ignore them?

if so, it's probably bad for your patients. the only thing worse than a single-study finding is a zero-study finding.


I was simply suggesting what we all learn in medical school and residency: to appropriately evaluate clinical studies. Just don't think most doctors do.

Let me give you an example of how I approach things. The guidelines for acute pancreatitis recommend using a fluid called LR instead of NS for volume resuscitation. This is based on a single study that included 10 patients and simply noted slightly better lab numbers; there was no difference in clinical outcome. Lots of problems with that study, right (small, underpowered, confounders, validity issues, etc.)? However, there's no major disadvantage to using LR in those patients (unless hyperkalemia is a concern), so I use it since it might have a benefit.

This is a very simple example. It gets much more complicated than that.

"Probably" is one of favorite words in medicine, btw :).


    "Probably" is one of favorite words in medicine, btw :).
Right, as it should be. If somebody answers my question with "It depends", then I know I'm in good company!


And the root of the word "probably" is the Latin probare, meaning "to test". As in, you have to have some empirical basis for your belief in the likelihood of something.


No. Step 1: Do no harm. In practice, this means staying up to date on the latest research (by law), but follow the guidelines. These are updated regularly, just not after each and every study that comes out. And it's a good thing a lot of doctors exercise caution.


Apply Bayesian reasoning. Weigh results appropriately.
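
To make that concrete, here's a minimal sketch (in Python, with invented numbers) of treating a single positive study as a likelihood update on a prior, rather than as binary proof:

    # All numbers are hypothetical, for illustration only.
    prior = 0.10              # P(effect is real) before seeing the study
    p_pos_given_real = 0.80   # study power
    p_pos_given_null = 0.05   # false-positive rate (alpha)

    # Bayes' rule: P(real | one positive study)
    posterior = (p_pos_given_real * prior) / (
        p_pos_given_real * prior + p_pos_given_null * (1 - prior))
    print(round(posterior, 2))  # 0.64 -- informative, but far from settled

An independent replication would update again from 0.64 (to roughly 0.97 with these numbers), which is the quantitative case for weighting replicated results so much more heavily than a lone p < 0.05.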


> And don't even get me started on the pulmonary embolism treatment studies. What a clinical mess that is.

Please, start.

I'm very interested and will read what you write with full attention.


Can I have you as my PCP?

I had a severe reaction to fluoroquinolones and maybe some confounding comorbidities and have been pretty much unable to get effective help in our medical system so far :(


> Nowadays there's a certain danger of the same thing happening (not repeating experiments), even in the famous field of physics. I was shocked to hear of an experiment done at the big accelerator at the National Accelerator Laboratory, where a person used deuterium. In order to compare his heavy hydrogen results to what might happen with light hydrogen, he had to use data from someone else's experiment on light hydrogen, which was done on different apparatus. When asked why, he said it was because he couldn't get time on the program (because there's so little time and it's such expensive apparatus) to do the experiment with light hydrogen on this apparatus because there wouldn't be any new result. And so the men in charge of programs at NAL are so anxious for new results, in order to get more money to keep the thing going for public relations purposes, they are destroying--possibly--the value of the experiments themselves, which is the whole purpose of the thing. It is often hard for the experimenters there to complete their work as their scientific integrity demands.

-- Richard Feynman, "Surely You're Joking, Mr. Feynman", pp. 225-226


Current top comment https://news.ycombinator.com/item?id=12186295 , specifically rebutted 30 years before it was written.


This isn't a rebuttal of the linked comment.

The linked comment doesn't state that it would be a waste of time to replicate on a hypothetical LHC clone.

Rather, the linked comment states that we can accept the Higgs result with reasonable confidence even though it's currently infeasible to replicate that experiment.

Feynman's issue was also qualitatively different -- the scientist was comparing results from two different instruments. The people in charge of one of the instruments wouldn't allow the scientist to run both experiments on a single instrument. In fact, from context, it's not even clear to me Feynman would have insisted on re-running the original experiment if the scientist were not using a different accelerator for the second one. Anyways, in the Higgs case, there's no potential for a "comparing readings from instrument A to readings from instrument B" type bug.

More to the point, and FWIW, I somehow doubt Feynman would insist on building a second LHC for the sole purpose of replicating the Higgs measurement. But I guess we have to leave that to pure speculation.


My first reaction to this headline was "duh." Of course we should hold off on accepting scientific claims (i.e., predictions about the natural world) that to date have been verified only by the same person making those claims!

My next reaction was, "wow, it's a sad state of affairs when a postdoctoral research fellow at Harvard Medical School feels he has to spell this out in a blog post." It implies that even at the prestigious institution in which he works, he is coming across people who treat science like religion.


Pretty much everyone in science and academia is saying we need to reproduce results. It is a sad state of affairs, but I'm not sure why having someone from Harvard say it is surprising or different than anywhere else.

The real issue is that there are anti-incentives to reproducing other people's results. All scientists want to see it, and nobody is able to actually do it, because they'll lose status, publication opportunities, and funding. It's viewed as career suicide. Unfortunately, this article doesn't suggest any solutions to that problem.

It would be best to not suggest this problem is somehow due to "people who treat science like religion". That association isn't called for, nor is it largely true or applicable here. Ahmed even said very early in the article "the majority of irreproducible research stems from a complex matrix of statistical, technical, and psychological biases that are rampant within the scientific community."

Statistical and technical biases are common human errors that affect everyone equally and don't amount to religion. Even beliefs don't amount to religion. You and I both believe electrons run through our computers, yet I haven't verified electrons exist and I've never seen one - have you? Almost everything we know is belief based on what others have told us, whether it's science or not. Even if scientific results are reproduced, unless you're doing the reproducing, you're still subject to believing the results. It's a more believable story when two independent people verify some result, but that doesn't mean that believing one story or one paper demonstrating a result is somehow akin to blind faith, ritual, and deity worship.


I don't have anything constructive to add other than sadness. This bums me out.

Culturally academia is responsible for a lot of cult-like approaches to things. It's just a humanity problem that we have to acknowledge and use science to fight against.


My reaction was the same. I would argue that the requirement is even stronger: the hypothesis must be verified by different scientists over time. One of the key roles of the scientific method is error correction for bugs in our mental machinery (i.e., cognitive biases, of which there are many).

I look at science as an essentially evolutionary process: a single study has a very low probability of being accurate; if the results survive additional testing over time, the probability of accuracy increases.


" he is coming across people who treat science like religion."

The only difference between scientists and people in other fields is that scientists completely lack self-awareness.

Science is obviously systematically biased.

Have you ever been to a good school? Every word a prof says is designed to make you think he is smart. That's the only way they make careers. They 'live in the identity' of being smart.

It's laughable and hyper competitive.

Bad studies ensue.

It's obvious to anyone with a basic grasp of human behaviour.

The only thing that surprises me is how they are unwilling to admit the problem.


Of course you as a reader of said claims confirm at the very least that they've been independently reproduced, right?

(If so, this shouldn't be news.)


Unfortunately, there is no easy way to do it. Confirmation studies are not easily accepted by impactful journals/conferences, thus nearly nobody bothers to do them. Even if there is one, it can be surprisingly hard to find it.

As a point of anecdata: my wife's master's thesis was a confirmation study of using LLDA for face recognition. I remember seeing it included in some book by the university press. I gave up Googling for it after 5 minutes.


There needs to be better scientific protocol. More linking through data instead of annoying cites. I think anyway.


We shouldn't "accept" or "reject" results at all.

It's not a binary option. One poor experiment might give us some evidence something is true. A single well-reviewed experiment gives us more confidence. Repeated results add more still, as does the reputation of the person conducting the experiment and the way in which it was conducted.

It's not a binary thing where we decide something is accepted or rejected; we gather evidence and treat it accordingly.


So many scientists I talk to don't have a basic understanding of philosophy of science. I don't necessarily blame them--I understand why "philosophy" as an academic field is seen as a soft, speculative, and pretentious field compared to the rigor of science, but as Daniel Dennett said, “There is no such thing as philosophy-free science; there is only science whose philosophical baggage is taken on board without examination."

These days, if you ask a scientist "So how do we prove something is true using science?" they'll be able to recite Popper's falsificationism as if it's a fundamental truth, not a particular way of looking at the world. But the huge gap between the particular theory that people get taught in undergrad--that science can't actually prove anything true, just disprove things to approach better hypotheses--and the real-world process of running an experiment, analyzing data, and publishing a paper is unaddressed. The idea that there's a particular bar that must be passed before we accept something as true is exactly what got us into this mess in the first place! There's a naive implicit assumption in scientific publishing that a p-value < 0.05 means something is true, or at least likely true; this author is just suggesting that true things are those which yield a p-value under 0.05 twice!

What's needed, in my opinion at least, is a more existential, practically-grounded view of science, in which we are more agnostic about the "truth" of our models with a closer eye to what we should actually do given the data. Instead of worrying about whether or not a particular model is "true" or "false," and thus whether we should "accept" or "reject" an experiment, focus on the predictions that can be made from the total data given, and the way we should actually live based on the datapoints collected. Instead, we have situations like the terrible state of debate on global warming, because any decent scientist knows they shouldn't say they're absolutely sure it's happening, or a replication crisis caused by experiments focused on propping up a larger model, instead of standing on their own.


^ One of the only reasonable responses here.


The arXiv, but instead of saying "published" or "not published" we put a score on the paper suggesting the probability one should believe its thesis.

Kinda like Rotten Tomatoes, but... for science?


I agree in principle. There are a few concerns:

1. How should we receive costly research that took special equipment and lots of time to develop and cultivate? E.g., CERN?

2. A lot of research is published, ignored, and then rediscovered. In this case, we may want to accept the research until it fails to be repeated (e.g., in another journal publication).

3. Reviewers of academic publications probably are not qualified or have the time to recreate all scientific research.

4. Isn't the academic system at its core kinda... broken?


"Isn't the academic system at its core kinda... broken?"

can you elaborate on what you mean?


I gather it's things like misaligned incentives: funding sources dictating the "desirable" result, reluctance to publish negative results, pursuit of (vanity) metrics that leads to quota systems at universities, etc.


Maybe conferences should have a "reproducibility" track for that purpose? Also, I don't know about other fields, but I'm pretty sure that in CS, if you just took a paper and tried to reproduce the results, you'd get rejected on the grounds that you offer no original contribution; no original contribution => no publication => no funding.


I don't think reproducibility should necessarily mean literally repeating the same study over again.

Whether or not a paper is fully correct is less important than the further work it stimulates or informs -- either via more papers (impact factor) or via APPLICATIONS of the research.

In the example you're giving, to get accepted, the author doesn't need to do much more than extend the research in some small way, compare it to other research, or explore some aspect more fully. If someone is going to take the time to repeat something, they might as well go a little further.

If someone takes the time to apply research results they're going to, in a way, test the validity of the results. Maybe the author's experiences in the pharma world are very different, but I doubt that.


"I don't think reproducibility should necessarily mean literally repeating the same study over again."

Yes, it should. We have too many demonstrated instances of studies built on other studies that turned out to be flawed, but the second study, rather than showing the first one was flawed, was rationalized and massaged until it conformed to the first study because the first study already had the imprimatur of peer-reviewed correctness on it.

Only someone replicating the original study directly (give or take sample size or duration or other simple such changes) will have the guts and moral authority to stand up and say "I can't replicate this. The original study may be wrong."


But there should also be replication of the concepts of the study under a different paradigm, otherwise you might just be finding out effects of the particular procedure. An original study is rarely "wrong" (fraud is a serious problem, but not a typical one); it's merely one data point, from one procedure, and a lot of the time the effect that can be replicated is "real," it's just a side effect of the process. Basically, there's way more replication that should be going on in science at many levels.


Yes. But replication is a starting point. It isn't just about fraud, it's also about error and failure to communicate the experiment. (It's also beyond time for paper journals to just contain the relevant summaries and conclusions for given studies and for "what fits in a journal" to cease being "what fits in publication". Authors ought to be able to publish as much stuff as they want with a paper, including full source code sets or data sets or whatever, with the paper limitations affecting only the top-level summary.)


So I work for a scientific journal (insert comments about how evil I am here) and we have much bigger problems with authors not providing full data than with us refusing it. It is almost always possible to upload a full data set and code to a paper; what we're working on now is standardizing formats as much as possible so it's easy to immediately see what equipment and reagents were used, where the experimental procedures came from, what data was gathered and when, and how exactly it was analyzed without grabbing tons of files in different formats. "As much information as authors want" is high for an excellent but small set of authors who are probably right that the journal should do more to facilitate hosting their data--for most authors, that's far less information than we actually need.


Yes, I do not mean to just criticize journals; authors also need to start realizing they must produce that as well. 10-20ish pages of paper for the work of a dozen people over three years (and that for the good-to-great papers) is just not an acceptable payoff for the amount of money involved.


For CS, reproducing should be easy given the code and input data, right? Other sciences' inputs aren't so easily shared.


Ideally you should re-implement the algorithm based on the description in the paper to verify that the description of the algorithm is correct. You should also test with your own data to make sure that the algorithm works on all reasonable data and not only on some provided cherry-picked data. If you can't get the expected results with your own implementation and your own data, then the results aren't reproduced.
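
As a sketch of that workflow (all names here are invented stand-ins): re-implement the routine purely from the paper's prose, then check it against an independent oracle on randomly generated inputs rather than the authors' supplied data.

    import random

    def paper_median(xs):
        # Your re-implementation, written only from the paper's
        # description of the method (a trivial stand-in here).
        return sorted(xs)[len(xs) // 2]

    def oracle_median(xs):
        # Independent reference: brute force, a trusted library,
        # or hand-checked cases.
        return sorted(xs)[len(xs) // 2]

    for _ in range(1000):
        xs = [random.randint(-100, 100) for _ in range(random.randint(1, 50))]
        assert paper_median(xs) == oracle_median(xs), f"mismatch on {xs}"

The random inputs are the point: they probe the method where the paper's own examples might have been chosen to flatter it.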


Yes. So being able to rerun with the same code and same inputs to get the same outputs is a lower bar. Many papers don't meet even that bar.

(Mostly because they don't publish code or data; and academic code is often a horrible mess, and the code was mucked around with between different stages of running.)


Great points, thanks!


Rerunning code does not count as reproduction, as it is not an independent test of what is being claimed. You do have the option of verifying the code against the goals of the study, though that is a review rather than a reproduction (which is still valuable).


Most CS papers I read (actually all) do -not- include code (or input data), which makes this an issue even in the case of CS.


Not necessarily. There are fields like complexity and computability which are purely mathematical in nature and have little to do with actual code, but much with the abstract theoretical fundamentals of code in general. And on the other hand you have fields like machine learning, where you can have an algorithm, which you can implement (or have as code), but you don't necessarily know the values of certain parameters specific to your problem space.


I mean, if a paper gives an algorithm, proves its correctness, and you are convinced by the proofs, then you're done. I don't see how implementing the algorithm gives you more insight. I'm doing my master's degree in computational geometry and most people in my lab don't even implement their algorithms. They just know they are correct from their proofs.


Because evaluating algorithms is not always that straightforward. For some algorithms runtime is hugely important, and I don't mean the asymptotic complexity but a hard benchmark of how much it can do in what time span. Stuff like a SLAM algorithm being O(n^2) is nice and all, but to compare it to other SLAM algorithms, I need hard numbers on what it can do in how many milliseconds.

Often, I find that authors don't publish their code. If they do publish their code, they rarely publish their code for their benchmarks.
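
Even a minimal timing harness published alongside the code would help here. A sketch of one, where run_step is an invented stand-in for whatever algorithm is under test:

    import statistics
    import time

    def run_step(data):
        # Stand-in for the algorithm being benchmarked.
        return sorted(data)

    data = list(range(100_000))
    samples_ms = []
    for _ in range(50):
        t0 = time.perf_counter()
        run_step(data)
        samples_ms.append((time.perf_counter() - t0) * 1e3)

    print(f"median {statistics.median(samples_ms):.2f} ms, "
          f"max {max(samples_ms):.2f} ms")

Of course, the milliseconds only mean something next to a full hardware and runtime spec, which belongs in the paper too.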


Yeah, sure, but at that point it's more optimization than algorithm design. Of course, with any algorithm, you always have the hidden constant that you must account for. Also, what I was saying does not apply to the entire CS field. It only applies when you try to design an algorithm for a problem that does not yet have an efficient algorithm. I don't have much experience but I don't think it is really hard to see the cost of the hidden constant in most algorithms.


“Beware of bugs in the above code; I have only proved it correct, not tried it.”

— Donald Knuth

(https://staff.fnwi.uva.nl/p.vanemdeboas/knuthnote.pdf)


Good luck getting the code. Abandon all hope if you want to make it run on a machine different from the laptop on which the original grad student implemented it.


Real issue is that the way science is funded is broken.


Broken assumes it was working well at some point. Nothing new here.


It worked pretty well back when scientific inquiry was funded by selling inventions or other work that the scientist did "as a day job." That was a long time ago, though.


This has worked and still works for technology, but it is not how basic science research has been done. There's never been "enough" money for the sciences, nor has it ever been allocated in a manner agreeable to all parties. For a long time, research was either a gentleman's hobby or a side project under the auspices of an unsuspecting donor. Lagrange developed variational calculus during his tenure at an artillery school. Einstein's annus mirabilis happened during his stint at the Swiss patent office.


Many people have mentioned that replicating an experiment can be expensive, but I don't think anybody has really brought up just how expensive this can be.

Not all science is done in a lab. Replicating an experiment is obviously feasible for a short-term psychology experiment, but in earth sciences (oceanography, for instance) it is far less often possible to reproduce an experiment, for the following reasons. N.B. This is all from my personal experience of one field of science.

1.) Cost. If you got funding to take an ice-breaker to Antarctica to "do science", it required several million dollars. It is difficult enough to secure funding for anything these days, let alone prohibitively expensive attempts to reproduce results. (Honestly, any serious research vessel will run costs into the millions, regardless of destination.)

2.) Time. Say you are on a research vessel taking measurements of the Amazon river basin. This is a trip that takes months to years to plan and execute. If you return to duplicate your experiment 2 years later, the ecology of the area you were taking measurements of may have changed completely.

3.) Politics. Earth sciences often require cooperation from foreign entities, many of which are not particularly stable, or which may be engaging in political machinations that run counter to your nationality's presence in the country, or both. Iran and China are two good examples. Both are home to some excellent oceanographers, and both can be very difficult to do science in when your team includes non-Iranian/Chinese nationals.


The big issue right now is funding of replication research: who wants to fund research to prove someone else was right? Most of these funds are granted based on the potential outcome of the new discovery (potential business, patents, licenses, etc.); not being the first would probably wipe out most of these benefits, cutting the probability of getting funded down to almost nothing...

Now, straight to the point, who's going to pay for the repeated research to prove the first one?


On a low level, I think it should be mandatory for master's students to do a pre-thesis project, which is replicating findings in a published paper.

It would do something about low-hanging fruit in terms of testing reproducibility, and since there is a published paper, the student has access to guidelines for setting up and reporting on a large project, which will help them learn how to do their own, original thesis.


I had my Masters students do this as part of my wireless networking class this year. It was very instructive for me and the students seemed to enjoy it, so I'll definitely keep it in the syllabus.


> Now, straight to the point, who's going to pay for the repeated research to prove the first one?

Who is and who should are two different questions. The body that funded the original research should be best placed to fund the verification of the research. If the research isn't compelling enough to fund verification, then why was it funded in the first place? And if the principal research group is requesting additional funding for more research that builds on the initial unverified research, then that sounds like poor governance.

I realise that this simplistic view gets messy and confused when research is really academic-led innovation and incubation.


Perhaps those who are skeptical of the first research (and are losing money from it) should fund the replication research.

Incentive for corrupting the data seems high, however.


This works for some "controversial" research, which can be dismantled. What about successful (or apparently successful) research that may lead to tons of $$$ in return? Is there any value in paying for a study that is going to prove someone else was right, fattening their wallets in the end?

Sorry for the informal language, but it makes things a little more salty.


If someone is making money from it, someone else is (most likely) losing money from it as well.

When the lightbulb was invented, Edison made lots of money, but I am sure candlemakers had plenty of incentive to fund research that hypothesized that lightbulbs emitted toxins.


We can all donate to organizations like the Reproducibility Project: https://osf.io/ezcuj/

But yeah, larger entities (universities, businesses) should also be factoring in the cost of reproduction when they commission research.


Totally agree. I'd go even further and make free licenses on scientific source and datasets mandatory. Research that is funded by public money should lead to public code and data.


So, $1 of public funding triggers public code release? Or is there a threshold?

No one has been able to tell me why the need for reproducibility requires software freedom.

Consider the program 'nauty'. It is available in source code form for anyone to review, but it cannot be used for military purposes. That's not free, certainly. But isn't that enough to call it good science?

Similarly, consider the clause "only for use in verifying the result of paper X". That's also not free. But it serves the goal of letting others be able to verify X.

Also, you haven't gone far enough. It's not only the license that matters, but access. You have to mandate that either anyone can get access to the code for no/low cost for some years (since I can sell my GPL'ed software for $30,000 or take down the download link once published), or link publication with a required submission to some repository with the mission of keeping all that source and data around, available to anyone, at no cost.


> So, $1 of public funding triggers public code release? Or is there a threshold?

Yes, $1, no threshold necessary.

For projects where the cost of publication of data is not worth the dollar, people will just not accept the dollar. (That's where a 'natural threshold' comes in.)


So, if I have a summer intern who is paid by the government, and helps on my otherwise non-government-funded project, then is that enough to trigger the requirement?

Or, if I use a government facility, like a supercomputing center, does that also trigger the release requirement? Even if no money changes hands? What about a government network?

It sounds very tricky. If I get a government grant to work on a project, and that grant buys equipment X, which I also use for another project, then must both projects be subject to this form of release?

Even if the second project is 10 years later?

If, in the process of working on project X, I find some new knowledge Y and publish it, even though it has nothing to do with the original grant for X, does that count? (Think AT&T's observation of the cosmic microwave background, when their goal was to reduce noise in terrestrial microwave communications.)

It seems like a very complicated scheme.


No it's very simple. If you want "the people" to pay for you to do things at all, you owe them all of the things or you don't get their money. Your objections make it seem complicated but this feels like an effort to muddy fairly clear waters.


I'm pointing out that it's impossible to work this way. People don't like it. The accounting systems aren't set up for this. The entire post-war research system isn't structured this way.


The entire post-war research system is set up to create endless streams of non-replicable studies based on incorrectly-applied statistics all the while convincing itself that it's doing something useful while wasting billions upon billions of dollars.

I'm not too impressed by "well, that's not how it works right now". The whole problem is "how it works right now". That's what we're discussing, the need for it to not work that way.


You want to change the system. I understand that.

We have many systems to go on over the last few hundred years of science. We have the pre-war system, primarily funded by private philanthropy. We have the communist system.

None of them seem to create the stream of highly replicable studies you want.

That may indicate something deep about how people work and how science is really done, and suggest that your admirable goals are not tenable.


I tend to agree, actually. In some sense the real solution is to take science off its pedestal, as it does not generally deserve to be up there. The 17th-19th centuries were in some sense a fluke of low-hanging fruit, and the science of most of the 20th and 21st centuries does not deserve to be regarded with the same worshipful gaze, a word I choose carefully. By taking it off its pedestal and subjecting it to a lot more scrutiny, we'd all win, including science in general.

This is not necessarily because we're worse people than them, but because the problem is now much much harder. It's always better to acknowledge that hard problems are hard, rather than trying to solve them by pretending they're easy.

As for the model I would propose, I believe all funding models are fundamentally flawed, and the best model is all of them at once, so hopefully the flaws at least sort of cancel out. At the moment, that generally means seeking a decrease of the current government funding strategy and breaking the peer review monopolies, not because either of them are necessarily especially bad, but because they are too powerful and their flaws are coming to define the flaws of science in general.

Some of this would just be a mindset change, to recognize that "research" isn't isomorphic to "producing peer-reviewed papers" and that there's nothing wrong with setting up some equivalents of Xerox PARC in other disciplines. Potentially with government money, since my point is more about multiple models than the literal funding sources. If "science" as it is practiced today were less pedestalized, this would be a much less horrifying suggestion.


"Science" is no longer on a pedestal. The PR campaigns against conclusions for leaded gas, smoking, acid rain, global warming, and vaccine safety, and the scientific development of leaded gas, ozone-depleting CFCs, Agent Orange/dioxin, etc., plus concerns like GMOs and Monsanto, mobile phone safety, plasticizers/hormone disruptors, and more make for a decidedly mixed view of science by the general public.

As you can see from http://www.pewforum.org/2013/07/11/public-esteem-for-militar... , the military, teachers, and medical doctors are on higher pedestals than scientists.

That said, I'm all for the mixed development model.


I think it's on a pedestal where it matters, where the funding decisions are being made. The public's opinion only matters in the long term. Though, admittedly, the long term is probably coming up pretty quickly. There's a lot of things that have had their opinion trending negative for so long that people have become blase about the negative trends suddenly coming due this year.


On reconsideration, I'm wavering on my belief. NIH and NSF get a lot more support than, say, the NEA. HST, fine as it is, was extremely expensive.

I still maintain that the military is on a higher pedestal than science, in terms of funding and prestige. You hear stories of people buying uniformed military personnel their meals, to honor their service. That's much less common for scientists.


No, it isn't.

You calculate the fair market value of the public resources you used, and subtract what you paid the public for them. If it is positive, you have a publicly-supported project.

So if you use that government-paid intern for several hours, you ought to pay their agency or department $7.25 for each. You pay for what you use, and there's no problem.

If you work in a government-built facility, and you pay rent for your space, there's no problem. It doesn't matter that the public is your landlord. The space has a market value, and you pay it. There is no net transfer of value to your project at the expense of anyone else. If someone else could make better use of your space, they could have paid higher rent to get it.

If you're accepting a grant, that makes it a bit more difficult for you. If you get $50000, you would have to pay back $50000, plus the interest and the administrative overhead for processing your grant request. And then there's the value of the risk premium and moral hazard. You would have to find some other source of funding to "close" the research, and it would have to do it before starting work. Otherwise, potentially profitable projects could get privatized just before triggering the public release requirement, and the money sinks would be left as public.

If you use public funds to buy equipment for a publicly supported project, and then later want to use it for a private project, you have three options: lease it from the public project, or pay the depreciated value of the equipment to buy it outright, or make your private project publicly-supported.

It isn't any more complicated than the GPL copyleft. If you use GPL'ed code, you have to make public everything you do with it. If you don't want to do that, don't use GPL'ed code.


> "You calculate the fair market value of the public resources you used"

Which is very hard for things with no market.

I use government libraries. They are free to me. What is the fair market value of that? There are private and subscription libraries, so it's not like no market exists.

What is the fair market value of time on Hubble?

> "if you use that government-paid intern for several hours ... You pay for what you use"

I think you mean $0, not the $7.25 you estimated. Under the Fair Labor Standards Act, an internship is "for the benefit of the intern", not the company. An internship is not supposed to improve the bottom line of a company. An intern may even get in the way, and cause negative value.

And that's my point. The public gains more than can easily be counted by simple, direct market valuation. What is the worth of having students with industrial training? What is the worth of having broad public access to the literature?

Or, for a more real-world case, companies might not be interested in tropical disease research because the revenue won't justify the development costs. But the US military would like to be able to send troops to places with an endemic tropical disease, so they want some way to be able to prevent or treat the disease. The US foreign diplomatic policy would also like the good-will of those countries. The US could, by subsidizing tropical disease research, tilt the "fair market value" so is more weighted towards its military and diplomatic policy goals.

That assumes that part of the corporate revenue comes from subsidy, and part comes from being able to sell the drug on the market. But now, if part of the revenue comes from the government, the company cannot seek patent protection. This reduces the profit expectation, which means the government will need to subsidize the project even more to get a company to be interested in the effort.


The public does not want its funds diverted into private profits, period. Allowing any exceptions or leeway is a wide-open door to a trough filled with corruption and deceit.

You receive no special additional benefit from a library by being a researcher. Everyone can read the same materials as you do. Time on the Hubble costs more than any individual astronomer could pay. If you are keen on closed, private astronomy, you would need to check the NASA budget figures.

If you derive useful benefit from work done at your request, you need to pay the person doing it. If the intern is working for the government for no pay, how would they not just laugh in your face when you ask them to do work for you? You invented the hypothetical; I won't fix it for you.

If the military or state department could derive some benefit from subsidizing private research, they can bloody well do the research on their own. "US Army cures Dengue" would be great for both operations and PR, and would be a much better use of funds than a smart bomb that can stalk you on Facebook and blow up all your friends at the same time as you. If you as a private company want to sell a cure for Dengue on your own, then don't go begging the government for money. Fund it yourself!


> "The public does not want its funds diverted into private profits, period"

Sure. But no grants allow the diversion of funds into private profits, so I don't know what you're referring to.

Take the SBIR grants. It's a way for the government to help small, for-profit companies do the R&D that might lead to results that will benefit the overall US economy and policy. The hope is for the companies to commercialize the results and do well.

It's not money that the SBIR recipients can use to party on Maui. The SBIR system has accounting and oversight in place to help prevent that.

Or, take the (infamous) Bayh–Dole Act. Quoting from https://en.wikipedia.org/wiki/Bayh%E2%80%93Dole_Act :

> The key change made by Bayh–Dole was in ownership of inventions made with federal funding. Before the Bayh–Dole Act, federal research funding contracts and grants obligated inventors (wherever they worked) to assign inventions they made using federal funding to the federal government. Bayh–Dole permits a university, small business, or non-profit institution to elect to pursue ownership of an invention in preference to the government.

That sounds very much like that the public, through its elected officials, don't actually want what you say they want, because what we had was more like what you say we should have, and they decided to change it.


Just set up some very clear and simple rules. Don't worry about small thresholds when setting up the rules; simplicity trumps.

(Because corner cases where small amounts of funding would trigger the requirements can be worked around by just not using that fund, as long as the circumstances triggering the requirements are easy to predict.)


What about defense research?


Well, there's the GPL-styled approach: anyone with access to the results must also have access to the associated data. This doesn't mean it is mandatory to make it public, though you'd have to restrict the redistribution freedom.


I recently used a large dataset of tweets in a research project. As far as I know, I do not have the rights to distribute these.

I also used a dataset consisting of newspaper articles. It cost me $1,000 to get access to, and I definitely do not have the rights to redistribute it.


As long as you provide a detailed enough description of the source of your dataset that I can reproduce it myself then that is fine. So in your first case tell me what criteria you used to select your tweets and in the second tell me where to send my $1000 and what to ask for.


Unfortunately not everyone reports this information. Here is a study that we did of over 500 papers using online social network data: http://tnhh.org/research/pubs/tetc2015.pdf While most authors would report high-level characteristics (e.g., which social network they measured), fewer authors reported how they sampled the network or collected data, and very few people reported on how they handled ethics, privacy and so forth.


In that vein, what about {industrial|scientific|...} espionage?


Lots of scientific results are repeated but not published. If it doesn't work then people just move on. The problem is journals. There is no way to publish your attempts to repeat an experiment, unless you put it into another paper.

The other issue, especially in the life sciences, is inadequate statistical input. If someone performs an underpowered, confounded experiment and gets a positive result, then someone else performs the same underpowered, confounded experiment and gets a negative result, what have we learned except that the experiment is underpowered?
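
To put a number on "underpowered" (with assumed values: a modest true effect of d = 0.3 and 20 subjects per arm), here's a quick simulation of how often such a study detects the effect at all:

    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(0)
    effect, n, trials = 0.3, 20, 10_000

    # Count how many simulated studies reach p < 0.05.
    hits = sum(
        ttest_ind(rng.normal(0, 1, n), rng.normal(effect, 1, n)).pvalue < 0.05
        for _ in range(trials)
    )
    print(f"power ~ {hits / trials:.0%}")  # roughly 15% at these settings

At ~15% power, one positive and one negative study disagreeing is unsurprising even when the effect is real, so the pair settles nothing.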


With science, the profession and the product are distinctly different, and we are failing to hold the profession to the standards of the product. Science, the profession, is political, incentive-driven, and circumstantial. Scientists need to get paid. Science, the product, is apolitical, existential, and universal. So those who love and believe in the products of science may wish to hold themselves to those standards too. I know I do. Except sometimes it just isn't practical, or even possible.

But repeatability actually matters more professionally. Scientifically speaking, if the science is bad it just won't work when others try to make use of it. All bad science will be identified or corrected as we try and make use of it and convert it into new technology. Technology mandates repeatability. So those scientists who fail to produce repeatable science, regardless of how professionally successful they may be, will inevitably fail to produce any new technology or medicine, and vice versa.


Obviously I agree that scientific results must be reproducible. But I also realize that it's simply infeasible to repeat the entirety of every study, and much less to also go to the effort to write and peer-review those repeated results.

What I think is overlooked in this discussion is that a lot of confirmation work already happens. Most (all?) scientific results are incremental progress built on a heap of previous work. In the course of normal research, you reproduce existing results as necessary before altering conditions for your own study. If you can't confirm the results, well then perhaps you have a paper (though it can be politically challenging to get it published, and that's a separate problem). But if you do, then you don't waste time publishing that, you get on with the new stuff.

Ultimately, I don't think scientists do accept results in their field that they have not repeated.


Cue all the people justifying their pseudoscientific behavior. If it is too expensive to fund twice, it shouldn't be funded once. If that means the LHC and LIGO wouldn't get done, then we should have only funded one of them. We need to remain skeptical of those results until replicated by a new team. Even one replication is pretty weak...

Independent replications of experiment (and the corresponding independent reports of observations) are a crucial part of the scientific method, no matter how much you wish it wasn't. Nature doesn't care if it is inconvenient for you to discover her secrets, or that it is more difficult for you to hype up your findings to the unsuspecting public.


You do realize that scientists who work on the LHC have the highest repeatability standards of any science profession, right?


The LHC experiment is not the issue here.

There is a lot of transparency there, a lot of well meaning people with a lot of oversight.

I suggest most would admit 'there could be a problem' there, but it's out in the open if there is.

The problem of lack of repeatability, I think, has to do with subconscious bias on the part of the experimenters, which will be less pronounced when there are 5,000 people working on it.


As far as I know there is only one organization running a machine that can check the LHC results. That is the team that runs the LHC.

On the other hand, there are many other experiments that are repeated billions+ times a day in order for consumer electronics to work, etc.


Having wasted time trying to replicate the results of someone who 'lost' their code, I agree! Maybe repeating the experiment should be part of the peer review.


So frustrating. Lost = "I didn't think anyone would hold me accountable for it."


Looking good is, sadly, better rewarded than doing good in many areas of life. It's doubly sad that this affects our body of scientific knowledge. Even claims that are reproduced can suffer from funding bias and confirmation bias. The truth hopefully comes out in the end, but I'm sad for the harm that's caused in the interim.


I don't get why this is not top of the agenda for the scientific community and the government. Huge amounts of research money are lost repeating stuff that doesn't work. Huge amounts of money are lost chasing broken science.

I blame this on neo-liberal ideology: the intense focus on getting money's worth, on tying grants to specific goals, on counting publications, etc. Driving research exclusively on a very narrowly defined money incentive has driven us further into this sort of mess, as have the money-grabbing journals, which have prevented any significant innovation in how science is shared.

I think what science needs is a model closer to that of open source: open projects anybody can contribute to, but where verification happens through personally forged relationships. The Linux kernel's code quality is verified by a hierarchy of people trusting each other and knowing something about each other's quality of work. Work should be shared like Linux source code, in a transparent fashion, and not behind some antiquated paywall.

I don't think the grant system can go away entirely, but perhaps it should be deemphasized, instead paying scientists a higher minimum amount of money for doing what they want. Fundamental science breakthroughs don't happen because people had a clear money incentive. Neither Einstein, Niels Bohr, Isaac Newton, nor Darwin pursued their scientific breakthroughs with an aim of getting rich. Few people become scientists to get rich. Why not try to tap into people's natural desire to discover?


This problem, like many in modern-day science, can in large part be traced back to unstable funding. On a Maslow-style hierarchy of research lab needs, the need for funding sits a lot lower on the scale than the aspiration for scientific purity, just as a human's need for food is lower on the scale than their desire for self-actualization.

If competition for research dollars ceases to be so cutthroat, it will go a long way towards solving this and many other seemingly entrenched cultural problems.


A big distinction here is that different fields have different levels of dependence on prior results. In fields like psychology etc, you don't need the previous results to work in order to run your own experiment. In other words, if you cite a well-known paper saying "people seem to work faster near the color red" and your paper runs an experiment to see if they work faster near the color yellow, if the red paper is later unreplicable, it doesn't change the outcome of your experiment in any way.

In contrast, if you are in machine learning and you are extending an existing architecture you are very directly dependent on that original technique being useful. If it doesn't "replicate" the effectiveness of the original paper, you're going to find out quickly. Same for algorithms research. Some other comments here have mentioned life sciences being the same.

So I think there's a qualitative difference between sciences where we understand things in a mostly statistical way (sociology, psychology, medical studies) where the mechanism is unknown (because it's very very complicated), but we use the process of science mechanistically to convince ourselves of effectiveness. e.g. I don't know why this color makes people work faster/ this drug increases rat longevity / complex human interactions adhere to this simple equation, but the p value is right, so we think it's true. Versus sciences where we have a good grasp of the underlying model and that model is backed up by many papers with evidence behind it, and we can make very specific predictions from that model and be confident of correctness.


In the world of chemistry, biochemistry and microbiology, a huge step forward would be for journals to require a complete list of products used. The publication should also include the certificate of analysis for each item, as these vary over time.

For example, here are two product specifications for a dye called Sirius Red, the first by Sigma-Aldrich[1] and the second by Chem-Impex[2]. The Sigma-Aldrich product contains 25% dye while the Chem-Impex product contains 21% or more. These two dyes could be quickly assessed with a spectrophotometer in order to determine an equivalency; however, you need both dyes on hand, which doesn't seem like a good use of funding. This also touches on another problem in replication, which is: what is in the other 75%+ of the bottle?

[1] http://www.sigmaaldrich.com/Graphics/COfAInfo/SigmaSAPQM/SPE... [2] http://www.chemimpex.com/MSDSDoc/22913.pdf
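
That quick spectrophotometer assessment would essentially be a Beer-Lambert comparison. A sketch with made-up absorbance readings, assuming equal masses of each product dissolved in equal volumes and measured in the same cuvette:

    # Beer-Lambert: A = epsilon * l * c. At a fixed wavelength and path
    # length, the absorbance ratio tracks the dye-content ratio.
    # The readings below are invented for illustration.
    A_sigma, A_chemimpex = 0.52, 0.43   # hypothetical readings at lambda_max
    sigma_dye_content = 0.25            # 25% dye, per the Sigma-Aldrich spec

    inferred = sigma_dye_content * (A_chemimpex / A_sigma)
    print(f"inferred Chem-Impex dye content ~ {inferred:.1%}")  # ~20.7%
    print(f"scale factor for equal staining ~ {A_sigma / A_chemimpex:.2f}x")

It answers the equivalency question, but as noted above it says nothing about what the remaining 75%+ of each bottle actually is.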


Look at research done on many political hot button topics. They love results that have not been repeated. I see all sorts of posts even on HN that reference such "science" as well. The root problem, people who are pushing an agenda.


> The inconvenient truth is that scientists can achieve fame and advance their careers through accomplishments that do not prioritize the quality of their work

An even more inconvenient truth is that scientists cannot even keep their jobs if they prioritize the quality of their work. The pressure to publish novel results is too strong and it is almost impossible to get any support for confirming previous ones.


I agree with the main point of this article but in terms of its analysis and prescriptions I think it gets two things backwards. (1) Most scientists seek fame as a means to the end of getting tenure and funding, not the other way around; if you gave them tenure (and the ability to move their tenure to somewhere else if they wanted to move) and perpetual funding and told them they could choose to be anonymous, I think many would choose that option. (2) Replication is not done/published enough because the incentive to do so (measured in: increase in probability of getting tenure per hour spent) is not high enough, not because people are overly willing to accept unreplicated work.

In order for a lot more replication to get published, what would be needed would be for people who spent their careers replicating others' results (at the expense of not producing any important novel results of their own) to get tenure at top institutions (outcompeting others who had important novel results but not enough published replications).


"repeated" in this context is not incorrect, but i think "replicated" is perhaps a better choice.

That aside, I think repeatability is a much more useful goal (rather than "has been repeated"). For one thing, meaningful replication must be done by someone else; for another, it's difficult and time-consuming, and the original investigator has no control over whether and when another in the community chooses to attempt replication of their result. What is within their control is an explanation of the methodology they relied on to produce their scientific result, in sufficient detail to enable efficient repetition by the relevant community. To me that satisfies the competence threshold; good science isn't infallible science, and attempts to replicate it might fail, but some baseline frequency of failure ought to be acceptable.


This is wrong-headed in the extreme.

What we should demand is to see the scientific results that have FAILED.

When we see a p=0.05 result but don't know that this SAME EXACT EXPERIMENT has been run 20 times before, we're really screwing ourselves over.

Relevant: https://xkcd.com/882/
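
The arithmetic behind that xkcd scenario is easy to check. A minimal simulation (pure null hypothesis, no real data):

    # If the null is true and 20 groups each run the same experiment at
    # alpha = 0.05, the chance at least one of them "finds" an effect is
    # 1 - 0.95**20, about 64%. Quick simulation to confirm:
    import numpy as np

    rng = np.random.default_rng(0)
    n_runs, n_experiments, alpha = 10_000, 20, 0.05
    # under the null, each experiment's p-value is uniform on [0, 1]
    p_values = rng.uniform(size=(n_runs, n_experiments))
    at_least_one = (p_values < alpha).any(axis=1).mean()
    print(f"simulated P(at least one p < 0.05): {at_least_one:.2f}")
    print(f"analytic:                           {1 - (1 - alpha) ** n_experiments:.2f}")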


Replication isn't enough. It's also necessary to know how many non-replications occurred but got swept under the rug. It's not the existence of replications that matters; it's the rate of successful replication relative to the number of replication attempts.

So I agree with the title "We Should Not Accept Scientific Results That Have Not Been Repeated". But I would add to it "We Should Not Accept Scientific Results from Studies That Weren't Preregistered". Registration of studies forces negative results to be made public, allowing for the positive result rate / replication rate to be calculated.

Otherwise the existence of a "positive" result is more a function of the trendiness of a research area than it is of the properties of the underlying system being studied.
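
A hedged sketch of what registration buys you: once the denominator (all preregistered attempts) is public, you can ask whether the observed positive-result rate beats the false-positive floor. The counts here are invented for illustration:

    # With preregistration you know attempts as well as successes, so you can
    # test the replication rate against the false-positive floor alpha.
    # (Counts below are hypothetical.)
    from scipy.stats import binomtest

    attempts, successes, alpha = 40, 9, 0.05
    # Under "the effect is pure noise", each attempt succeeds with prob ~ alpha.
    result = binomtest(successes, attempts, p=alpha, alternative="greater")
    print(f"observed replication rate: {successes / attempts:.0%}")
    print(f"P(>= {successes} successes from noise alone): {result.pvalue:.2g}")

Without the registry, "attempts" is unknowable and this calculation can't even be set up.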


More pragmatically, we should not accept scientific publications and conferences which do not publish negative results and disconfirmations.


I disagree.

One part of science is observation. Including observations which cannot be, or at least have not been, repeated. For example, consider a rare event in astronomy which has only been detected once. Is that science? I say it is. But it's surely not repeatable. (Even if something like it is detected in the future, is it really a "repeat"?)

Some experiments are immoral to repeat. For example, in a drug trial you may find that 95% survive with a given treatment, while only 5% survive with the placebo. (Think of the first uses of penicillin as a real-world example.)

Who among you is going to argue that someone else needs to repeat that experiment before we regard it as a proper scientific result?


> One part of science is observation. Including observations which cannot be, or at least have not been, repeated. For example, consider a rare event in astronomy which has only been detected once. Is that science? I say it is. But it's surely not repeatable.

First off, you can accept the observation at face value as an observation, but conclusions drawn from it that have no other support or means of verification should not be accepted, and would not be accepted. Fortunately, even if something is initially sparked by a very rare occurrence, most of the time it will have some kind of implications that are verifiable by means other than just waiting for something to happen in space.

But even something that is rare and relies on observation, like gravitational waves, we have already been able to identify more than one occurrence.

> Some experiments are immoral to repeat. For example, in a drug trial you may find that 95% survive with a given treatment, while only 5% survive with the placebo.

What's more immoral: releasing a drug to the public as a miracle cure on the strength of a single test, even a striking one, without truly verifying your claims, or performing another test to actually be sure of them before you release it?

> Who among you is going to argue that someone else needs to repeat that experiment before we regard it as a proper scientific result?

That's how science works. If something is not independently repeatable and verifiable, then science breaks down. Look at the recent EM drive: most scientists in the field were skeptical of it, and once independent verification was finally attempted, the problems were found.

Independent verification is the cornerstone of science and what makes it different from bogus claims by charlatans.


> conclusions drawn from the claims which have no other support or means of verification should not be accepted and would not be accepted

I disagree. In all cases, even with repeated experiments, the claims are only tentatively accepted. The confirmation by others of Blondlot's N-rays didn't mean they were real, only that stronger evidence would be needed to disprove the conclusions of the earlier observations.

Astronomy papers make conclusions based on rare or even singular observations. Take SN1987a as an example, where observations from a neutrino detector were used to put an upper limit on the neutrino mass, and establish other results.
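
For the curious, that SN1987a bound follows from simple time-of-flight kinematics: a neutrino of mass m and energy E >> mc^2 arrives later than light by roughly dt = (L / 2c) * (m c^2 / E)^2, so a bounded spread in arrival times bounds the mass. A back-of-envelope version in Python (round numbers only; the published analyses are far more careful):

    # Back-of-envelope SN1987a neutrino mass limit from arrival-time spread.
    import math

    L = 1.6e21    # distance to the LMC in meters (~168,000 light years)
    c = 3.0e8     # speed of light, m/s
    dt = 10.0     # spread of neutrino arrival times, seconds
    E = 10.0e6    # typical neutrino energy, eV (~10 MeV)

    # delay relative to light: dt ~ (L / (2*c)) * (m*c^2 / E)^2
    # invert: m*c^2 <= E * sqrt(2 * c * dt / L)
    m_limit_eV = E * math.sqrt(2 * c * dt / L)
    print(f"m c^2 <~ {m_limit_eV:.0f} eV")  # on the order of tens of eV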

> "or performing another test"

This question is all about repeating an experiment. Repeating the experiment would be immoral.

There are certainly other tests which can confirm the effectiveness, without repeating the original experiment and without being immoral. For the signal strength I gave, we can compare the treated population to the untreated population using epidemiological studies.

But under current medical practices, if a drug trial saw this sort of effectiveness, the trial would be stopped and everyone in the trial offered the treatment. To do otherwise is immoral. As would repeating the same trial.


> But under current medical practices, if a drug trial saw this sort of effectiveness, the trial would be stopped and everyone in the trial offered the treatment. To do otherwise is immoral. As would repeating the same trial.

Then perhaps current medical practices should change. The benefits to those who were previously given the placebo should be balanced against the probability that the observed outcomes may not occur in other circumstances.


Are you for real? You would sacrifice people upon the altar of reproducibility?

Down that path lies atrocities. The system was put into place to prevent repeats of horrors like the "Tuskegee Study of Untreated Syphilis in the Negro Male".


I'd rather not sacrifice people on the altar of a single study, no matter how significant the results. Down that path lies atrocities, too, albeit of a quieter sort.


As I said earlier, there are alternatives which are both moral and can verify effectiveness without having to repeat the original experiment.

You chose to not verify, and insist upon repeating, thus likely consigning people to unneeded pain and even death.

I'll give a real-world example to be more clear cut about modern ethics and science. Ever hear of TGN1412? https://en.wikipedia.org/wiki/TGN1412

It went into early human trials, and very quickly caused a reaction. "After very first infusion of a dose 500 times smaller than that found safe in animal studies, all six human volunteers faced life-threatening conditions involving multiorgan failure for which they were moved to intensive care unit." (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2964774/ )

Here's a publication of the effects: http://www.nejm.org/doi/full/10.1056/NEJMoa063842 .

Is it moral to reproduce that experiment? I say it is not moral, and must not be repeated even though it is possible to do so.

Can a publication about the effects still be good science even though medical ethics prevent us from repeating the experiment? Absolutely.

What say you?


I love that both of our replies used 1987A as an example...


> But even something that is rare and relies on observation, like gravitational waves, we have already been able to identify more than one occurrence.

But who didn't believe the original result? And does the same experiment observing multiple occurrences really count as 'reproducibility'?

We've only had ONE observed local group supernova within the past several hundred years (and that was within our lifetime, thankfully with several relevant detectors up and running). Should we ignore any result or conclusions from this instance?

No. If the data from the next supernova disagrees or reshapes the field - and it probably will, given the huge amount of resources dedicated to studying it (see e.g. http://snews.bnl.gov/) - that will just be evidence of scientific progress: reshaping your position based on experimental data.

Again, I think there is a certain amount of crosstalk with people who say that the "Entire community of scientists" has a problem, whilst actually meaning specific fields. Perhaps an ironic imprecision.


The point of science is to discover truth in a way that is objective and convincing. That's not to say there isn't other truth out there. I think people mixing 'science' up with 'truth about the universe' is causing some unnecessary cognitive dissonance.


Didn't John Oliver say the same thing a few months ago in his episode on scientific studies? https://www.youtube.com/watch?v=0Rnq1NpHdmw


It's not just money that prevents people from repeating experiments, it's recognition.

The general idea is that for research to be accepted, it must make some novel, albeit small, impact on the field, acceptable for publication in a peer-reviewed journal or conference proceedings. Repeating someone else's experiments won't get you that, so in general it won't help you graduate or move you toward a higher position at a university or in your profession, meaning there is very little motivation for researchers to pursue such endeavors.

So instead of just throwing money at the problem, we may need to entirely revamp how we recognize the pursuits of researchers.


We learned the importance of this in high school science and it baffles me that it's not already the case.


We have some kind of weird hero-worship of scientists where the general public just believes what they say, even if they never even attempt to replicate their results. They do an experiment (which may or may not be scientifically sound to start with) and then publish results, and the public eats it up.

And then people have the nerve to say, "Last week chocolate was bad for me, now it's good? Make up your mind!" No, stop listening to un-replicated studies! Jeez.


Good point about the public involvement. The public and the news systems are part of the problem.

I've lost count of how many 'battery breakthrough' articles I've come across, but they seem to pass the newsworthy test.


Wasn't the problem with battery breakthroughs that they don't commercialise well, rather than that the science doesn't repeat?


I have an alternative proposal: do a study right the first time.

That means:

A) Pre-registering the study design, including the statistical analysis. Otherwise, attaching a big label "Exploratory! Additional confirmation needed!"

B) Properly powering the study. That means gathering a sample large enough that the chances of a false negative aren't just a coin flip.

C) Making the data and analysis (scripts, etc.) publicly available where possible. It's truly astounding that this is not a best practice everywhere.

D) Making the analysis reproducible without black magic. That includes C) as well as a more complete methods section and more automation of the analysis (one can call it automation but I see it more as reproducibility).

Replication of the entire study is great, but it's also inefficient in the case of a perfect replication (the goal). Two identical and independent experiments will have both a higher false negative and false positive rate than a single experiment with twice the sample size. Additionally, it's unclear how to evaluate them in the case of conflicting results (unless one does a proper meta-analysis--but then why not just have a bigger single experiment?).
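
A quick way to see the false-negative half of that trade-off, as a sketch with assumed numbers (two-sample t-test, true effect d = 0.3, "replicated" meaning both studies independently reach p < 0.05):

    # Power of one study with 2N per arm vs. two independent N-per-arm studies
    # that must BOTH reach p < 0.05. Effect size and N are assumptions.
    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()
    d, alpha, N = 0.3, 0.05, 100

    power_single = analysis.power(effect_size=d, nobs1=2 * N, alpha=alpha)
    power_each = analysis.power(effect_size=d, nobs1=N, alpha=alpha)
    power_both = power_each ** 2  # both independent replications must succeed

    print(f"one study, 2N per arm:         power ~ {power_single:.2f}")  # ~0.85
    print(f"two studies, both significant: power ~ {power_both:.2f}")    # ~0.31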


Your proposal is comparable to saying that checks and balances are not needed in a democracy, politicians just need to govern "right". This is about incentivising scientists to do the right thing instead of merely demanding it, like you do.


How is advocating for a new set of best practices any more "demanding" or wishful than a regime of obligatory replication? And how is this categorically different from current practices such as peer review, disclosing conflicts of interest, an IRB, etc.?


I think with the increased visibility of scientific research to the general public, it's less that science needs to stop accepting unrepeated results and more that the paper process needs to be updated to reflect the new level of availability, and journal databases need better relationship views between papers and repeated tests.

As an outsider looking in on the Scientific process, I am not really sure how applicable my opinions are, but I see these as useful changes.

Basically, in reverse order, my suggestions for science to adopt are as follows:

Papers in databases need to have fields related to reproduction studies, and reproducibility needs to become a point of pride in the scientific process. Just as there is a lot of pride (and money) in novel results, researchers should start to thump their chests about the reproducibility of their work, actively seeking out contemporaries and requesting a reproduction study as part of the publishing process, and updating the record afterwards.

The papers published themselves should take a moment (perhaps no more than a paragraph) to include a "for media" section that outlines the dos and don'ts of reporting on the research. For example, cancer research should clearly state examples of acceptable understandings in lay-person terms as a catch for sloppy reporting: something like "do not write 'cure for cancer found' or 'effective treatment'; instead write 'progress made'", etc. Basically, put a sucker punch to outlandish headlines and reporting right in the paper itself, and let journalists who want to be sensationalist embarrass themselves.

This seems like two very simple acts that could raise the bar for science a bit.


Those are both good but the key here is the media needs to understand that scientific papers that have not been independently verified are in a "maybe" state.

Of course, they probably do know this and just choose to ignore it because "Unverified Study that MIGHT Point to M&M's Being Good For You" won't get as many clicks as "M&M's Are Good For You Says New Study!"


This is sort of why I think it should be stated explicitly within the paper, not just as an aside but as part of the actual process. It's to pit less scrupulous journalists against one another, in an "honor among thieves" sort of way, I guess. If someone wants to go ahead and write clickbait, they can, but it leaves them open to someone else looking to discredit them going "well, did you even read the paper? They told you not to write that."

It's not so much a check for the public's sake; it's for other journalists.


I think it would also be helpful if they listed all the possible flaws first, including whether the work has been replicated, etc.


Most disciplines where correctness is important seem to end up having some adversarial component. It is explicitly how the justice system in the US works [1]. Many software companies have separate QA departments that are deliberately kept at a remove from the engineers to encourage some rivalry between them. Security issues are almost always managed in an adversarial way (though here, you could argue that's because it reflects how the system itself is [mis-]used). Markets are intended to allow fair competition between producers to find an optimal price and product for consumers.

Peer review is supposed to do this, but the fact that peer reviewers are often colleagues leads to collusion, whether intended or not.

Maybe we need a separate body of scientists whose sole job—and whose entire prestige—derives from taking down and retracting bad science.

[1]: https://en.wikipedia.org/wiki/Adversarial_system


It's unfortunate that the suggestions at the end don't seem to offer a realistic attack vector.

> First, scientists would need to be incentivized to perform replication studies, through recognition and career advancement. Second, a database of replication studies would need to be curated by the scientific community. Third, mathematical derivations of replication-based metrics would need to be developed and tested. Fourth, the new metrics would need to be integrated into the scientific process without disrupting its flow.

Yes, absolutely those things need to happen, but the problem is how to get this funded, how to get people to not see reproducing results as career suicide, right? Items 2-4 will fall out as soon as item #1 happens.

How do we make item #1 happen? What things could be done to make reproducing results actually an attractive activity to scientists?


The problem is that, if you put mere reproduction as a goal, many scientists would see that as low hanging fruit to beef up the resume, so we'd get countless unnecessary "experiments".

I'd say the goal that gets credited should not be merely reproducing the results, but finding errors in the previous research. That would count as novel, and is something that is presently recognized as contribution. The only problem is that journals or conferences treat it as unattractive, so good luck publishing something of the kind...


> The problem is that, if you put mere reproduction as a goal, many scientists would see that as low hanging fruit to beef up the resume, so we'd get countless unnecessary "experiments".

Only if you assume the incentives for the 2nd, 3rd, 4th, etc. reproduction experiments remain the same, right? I wouldn't assume that, both because the first reproduction is the most valuable, and for the reasons Ahmed discussed in the article - that scientists are motivated by their perceived ability to do something novel. So first reproduction might be novel, but the fifth would certainly be less valuable, so I wouldn't personally assume we'd get a flood of useless experiments.

> I'd say the goal that gets credited should not be merely reproducing the results, but finding errors in the previous research

Reproducing an experiment is meant to, without prejudice, either confirm or deny the previous research. It's not meant to confirm the previous results, it is meant to ask whether there could be errors in the research, but without assuming there are errors.

It is novel to validate a result the first time, whether it's positive or negative, and for this incentive system to work, it has to appeal to people who might not find something dramatic or contradictory. It must be appealing to do the work, regardless of the outcome, or it's not an incentive at all.


I thought this was in the definition of "scientific"


Peer review is how most science is defined as science and peer review does not require reproduction of the work.


Much of what we peer review is not real science, at least by the definition of applying the scientific method.

For example, much of computer "science" is not. Math maybe, engineering probably, design sometimes, but "science" is rarely done. BUT the science envy is there, especially post 1990s, and it is as confusing as heck when multiple definitions of "science" collide in a conference culture.

Yes I'm a researcher, no I'm not a scientist.


Peer review is a relatively new aspect of science: the Philosophical Transactions of the Royal Society started systematic peer review in the mid-19th century https://arts.st-andrews.ac.uk/philosophicaltransactions/brie...


You've illustrated the distinction between prescriptive and descriptive definition nicely.


Define reproduced? Do we mean "conduct the same experiment multiple times so we can assess the variance on the outcome"? Or do we mean "conduct the same experiment multiple times to figure out if the first result is a screw-up"?

Those two aren't the same, and I think far too many people assume the point is the latter when, imho, it's actually the former. Pure screw-ups will likely get found out, just as glaring bugs are usually found. What's insidious is when your result actually has a huge variance but you're looking at only one (or a few) samples and drawing conclusions from it; like bugs that change the output by only a tiny bit, it's the hardest kind of error to notice.


I've always been amazed by how widely the Stanford Prison Experiment results are accepted when a) the experiment has not been repeated and b) the experiment didn't even get completed. It was stopped when the researchers had made up their minds about the results.


So, we should have a structured way to represent that one study reproduces another? (e.g. that, with similar controls, the relation between the independent and dependent variables was sufficiently similar)

- RDF is the best way to do this. RDF can be represented as RDFa (RDF in HTML) and as JSON-LD (JSON Linked Data).

... " #LinkedReproducibility "

https://twitter.com/search?q=%23LinkedReproducibility

It isn't/wouldn't be sufficient to, with one triple, say (example.org/studyX, 'reproduces', example.org/studyY); there is a reified relation (an EdgeClass) containing metadata like who asserts that studyX reproduces studyY, when they assert that, and why (similar controls, similar outcome).

Today, we have to compare PDFs of studies and dig through them for links to the actual datasets from which the summary statistics were derived; so specifying who is asserting that studyX reproduces studyY is very relevant.

Ideally, it should be possible to publish a study with structured premises which lead to a conclusion (probably with formats like RDFa and JSON-LD, and a comprehensive schema for logical argumentation which does not yet exist). ("#StructuredPremises")

Most simply, we should be able to say "the study control type URIs match", "the tabular column URIs match", "the samples were representative", and the identified relations were sufficiently within tolerances to say that studyX reproduces studyY.
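
A minimal sketch of such a reified assertion using Python's rdflib; every URI and predicate here is an invented placeholder, not an existing vocabulary:

    # Hedged sketch: a reified "reproduces" assertion with provenance metadata.
    # All names under the EX namespace are hypothetical placeholders.
    from rdflib import Graph, Literal, Namespace, RDF
    from rdflib.namespace import XSD

    EX = Namespace("http://example.org/repro#")
    g = Graph()

    claim = EX.claim42  # the reified edge: studyX reproduces studyY
    g.add((claim, RDF.type, EX.ReproductionClaim))
    g.add((claim, EX.subjectStudy, EX.studyX))
    g.add((claim, EX.reproduces, EX.studyY))
    g.add((claim, EX.assertedBy, EX.someResearcher))
    g.add((claim, EX.assertedOn, Literal("2016-07-29", datatype=XSD.date)))
    g.add((claim, EX.controlsMatch, Literal(True)))
    g.add((claim, EX.outcomeWithinTolerance, Literal(True)))

    print(g.serialize(format="json-ld"))  # or "turtle"; JSON-LD needs rdflib >= 6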

Doing so in prosaic, parenthetical two-column PDFs is wasteful and shortsighted.

An individual researcher then builds a set of beliefs about relations between factors in the world from a graph of studies ("#StudyGraph") with various quantitative and qualitative metadata attributes.

As fields, we would then expect our aggregate #StudyGraphs to indicate which relations between dependent and independent variables are relevant to prediction and actionable decision making (e.g. policy, research funding).


According to old-school philosophy of science, truth could be discovered only by removing all the nonsense, as a remainder - not by piling up nonsense on top of nonsense out of math and probabilities.

Probabilities, for example, are not applicable to partially observed, guessed and modeled phenomena. It should be a type-error.

As for math - existence of a concept as a mathematical abstraction does not imply its existence outside the realms of so-called collective consciousness. Projecting mathematical concepts onto physical phenomena which could not be observed is a way to create chimeras and to get lost in them.

Read some Hegel to see how it works.


Ironically, one of the reasons Semmelweis's colleagues rejected his "hand-washing" hypothesis was that it did not have a good enough empirical/statistical basis.

http://www.methodquarterly.com/2014/11/handwashing/ https://en.wikipedia.org/wiki/Contemporary_reaction_to_Ignaz...


Or at least, the media shouldn't report on results until they have been repeated. This would cut down on the daily "X causes cancer / X doesn't cause cancer" media spam.


The solution is easy and it applies to most sciences: all research articles should include a pointer to download the dataset that was used and an annex with the details on how it was collected.


Agreed, which means at least 50% of social science is disqualified and should not make it into future publications or become part of curricula.


Like climate science, right? Let's set up a statistically meaningful set of equivalent Earths and start doing some serious peer review.


This increasingly includes code that needs to run in the future, and citations within code; see this group working in that field: https://www.force11.org/sites/default/files/shared-documents...


Just to play devil's advocate: won't there be a self-correcting mechanism?

If results are genuinely useful, then people will want to build upon that work and will have to repeat the science. On the other hand, if the results can't be repeated, then no further work gets done and they fade into obscurity. Curious what other people's opinions on this are.


I think a better way of thinking about what we want than "repetition" is "independent corroboration".


Sometimes in CS, if your research is embedded in a huge ecosystem, it can become quite expensive to reproduce results. I mean proper reproduction, not just rerunning the benchmarks. If you are dealing with complicated stuff, the reproducer might also simply not be able to do the same thing technically.


Maybe, maybe not. Do you have something specific in mind here?

I hope researchers and scientists don't consider others not capable enough and therefore withhold info on how to reproduce.

Even if the experiment is crazy expensive and complex right now it might be considered much more tractable in 10 years, or someone builds upon your work and invents a simpler method to show the same thing.


I am thinking of huge endeavors like building an ASIC, or huge complex systems like virtual machines. A comparable system for repetition is not always available and must be built from scratch; affording such rebuilds requires huge sums.

Of course nobody considers others not capable enough. It's just that there are not many people experienced enough to build certain systems in a decent amount of time.


Just being able to rerun the benchmarks (or other data analysis for non-CS papers) would be an improvement on the current state of affairs, where people often don't publish code nor data.


Agreed. Some CS conferences verify artifacts these days, which is a good first step.


In, for instance, a bioscience lab, I don't believe that results should even be accepted unless they're repeated with similar reagents. Some reagents are so specific that they only prove something for that one single thing, which could be unique on this planet.


In that case, macro-economics is simply disqualified from being scientific. It's almost impossible to repeat large-scale events, controlling for all variables. Have to say I'm not particularly impressed with the quality of Nautilus's analysis.


> In that case, macro-economics is simply disqualified from being scientific.

Well, duh? (In other words: of course it's not a science.)


One problem is that there is no incentive to replicate. From the PhD onwards, academia creates incentives for original research. Replications, particularly those that confirm existing research, would not benefit the researcher much.


From the authors of "Why Science Needs Metaphysics" this rings a little hollow.

Nautilus is just a slightly less vitriolic version of the Aeon-class anti-science postmodernist blog. Like Aeon, it's garbage.


Independently verified and repeated, I would add.

After all, any scientific test that fails when somebody else repeats it belongs to the domain of magic and religion - clearly not science.


I searched for the word "tenure" in the article, but didn't find it.

The drive to get tenure is a big reason that scientists publish so much; funny that it was not mentioned.


What we need is accountable statistics: something that cannot be manipulated.

One idea is to enforce storing (or at least indexing) a time-stamped database of the raw data on a blockchain.
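
The cheap version of this doesn't even need the data itself on-chain: publish a cryptographic hash of the raw dataset and anchor just that digest (a few bytes) in a timestamped ledger. A sketch of the client side, with a hypothetical file name:

    # Fingerprint a raw dataset so any later tampering is detectable.
    # "raw_data.csv" is a hypothetical file; only the 32-byte SHA-256 digest
    # would need to be anchored in a timestamped ledger / blockchain.
    import hashlib

    def dataset_fingerprint(path: str) -> str:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
                h.update(chunk)
        return h.hexdigest()

    print(dataset_fingerprint("raw_data.csv"))  # record this digest on-chain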


How do you do that in the medical field? Studies are often based on a small number of patients affected by a particular condition.


Wait! We should use the scientific method for science?

It's a radical suggestion. Sad that it is, but true... ;)

Seriously though... you can't have falsifiable results if you don't constantly try to falsify them. Otherwise it just becomes a result, which means the conclusion you can draw is close to nothing... not quite nothing, but exceptionally close. :)


There is no science without repetition.


We also should not accept historical claims that have not been repeated :)


We now have the tools to do it, and we should be doing it. The fate of scientific findings is not just to fill published papers; they belong under open and continuous scrutiny. And someone should build a GitHub of scientific facts.


Let's try it! If interested, send me a tweet at @isaacpei - I'm seriously thinking about creating something for this.


There is no lack of efforts to get scientists discussing (shameless related plug http://sciboards.com), but unfortunately there is a disincentive for scientists to do so (politics).


UGH. Scientists are not getting funds to repeat results.


Lottery < Statistics < Science



