Hacker News
Science Exchange Reproducibility Initiative (reproducibilityinitiative.org)
113 points by abbottry on Aug 14, 2012 | 35 comments

This is excellent. I have a few questions:

> You need to provide the background of your study, the types of experiments undertaken, the materials and methods, and initial results of your study.

Do the technicians reproducing the results get to see the initial results? It seems like it might be more accurate if they didn't. Lots of parameters can be fudged and adjusted, a la Millikan's Oil Drop, when the results don't quite match. I imagine this might be exacerbated with the necessity of researcher-validator communication.

How are conflicts resolved? If my results are not validated, someone made a mistake - me or the validator. If both parties stand by their mutually incompatible results, where does it go from there? I can imagine a lot of researchers I know feeling annoyed that someone whose expertise they cannot verify (due to anonymity) won't "do my experiment correctly".

I imagine that in time there might be specific requirements or explicit funding allocations for such reproduction on grant applications, which would really allow it to take off. As it stands, I imagine a lot of PIs would just ask "hmm, I can spend money that might risk my already high-impact paper, or I can keep the money and not be considered wrong."

Still, this is a great first step toward facilitating a central tenet of the scientific method. Congratulations.

This is Bilal from Science Exchange, and we greatly appreciate your support for our initiative!

We will provide the methodology of the original study to those reproducing the results, and we believe the original results will be helpful for them to check against.

For conflicts, it's true we can't force an investigator to publish or note the lack of reproduced outcomes. We do hope they will through the PLOS Collection, for transparency. We do feel though it can provide a valuable check for 'failing fast', for those investigators who want robust results.

In this initial stage, we also agree that funding will be difficult. That's why we hope to focus on small biotechs and research labs that are interested in commercializing their research, and need to show robustness of results for licensing opportunities.

Hopefully then, it can serve as a proof-of-principle for funding agencies to provide a requirement or increase support for reproducibility.

I think this will be a very valuable service for small biotechs. It would improve the value of their patents and increase chances of getting further funding.

As an experienced scientist myself, I can say there is plenty of scope for misunderstanding and misinterpretation, no matter how careful the original authors were. So I only hope that there will be some (maybe blinded) mechanism for communication between the original researchers and those replicating the published findings, in the event of the "usual" complications.

Also, who would do this replication? Would some of it be outsourced to academic labs with the requisite experience? Advanced findings often depend on advanced techniques. Outsourcing could be problematic, politically, since Big Shot No.1 might not be too interested in shooting down his pal, Alpha-MD-PhD.

Thank you for the support!

To clarify, the validation studies will be matched to core resource facilities and commercial research organizations, who specialize in conducting certain experiments on a fee-for-service basis. As they are paid upon completion of a service, regardless of outcome, we feel they are the solution to many of the misaligned incentives in academic research.

With respect to communication between the original authors and those conducting the validation, we definitely agree there needs to be some degree of communication, given the complexities of research. We will initially match a researcher to their provider in a blind fashion, so they have no choice in who conducts their study. But once a provider is selected, they can communicate with one another in explaining the methodology, experiments, etc.

It's probably a bit late to respond to this... However, given that this work is likely only to be undertaken when the stakes are high, I would think that blinded communication between the original researchers and those replicating the work would be a good idea. If they are going to shoot it down, they probably don't want the original people to know that they did so.

Bilal, I just want to thank you wholeheartedly for your, and your team's, effort! I'm a PhD student in engineering, where as far as I can tell, it is even worse. I love academia, but the fact that we don't even attempt to live up to the standards we supposedly agreed on makes me sad.

Especially in computer-focused areas like CS or Statistics, it would be straightforward to submit more or less completely self-reproducible papers. Alas, it is seemingly impossible to find an adviser who will allow you to publish all your source code and primary data, let alone spend the additional time to get them into a publishable form.

You're welcome! We completely acknowledge the difficulty in reproducing scientific research, across disciplines. We are focused on preclinical biological research for now, but hopefully in the future we can expand to other areas like CS and Statistics.

It's very expensive (prohibitively so) to reproduce any substantial study. It also feels like there's a lot of potential downside and very little reward, since there's a presumption of correctness in published papers today. Further, wet lab protocols are subjective enough that I'd imagine most labs would ignore a negative result as having been performed incorrectly.

Where does the money for this come from? Are you expecting that labs will write this cost into their grants? Have you seen interest from grant agencies to then actually pay for this?

> Further, wet lab protocols are subjective enough that I'd imagine most labs would ignore a negative result as having been performed incorrectly.

That kind of illustrates the usefulness of replication! If the effect is so finicky that it cannot even be replicated in another lab, in what sense is that result interesting, useful, or even real?

I think you overestimate most biology labs. It's really easy to screw up a protocol in subtle ways. Often it took the original lab weeks of debugging to get it to work (yes, to get it to work, not to produce a false positive).

Things get more reliable as time goes on and methods become better understood, but the critical steps in most experiments are usually nontrivial to replicate in the beginning. Especially by CROs, which are notorious black boxes, incredibly expensive, and often just plain old unreliable.

Hi, this is Bilal from Science Exchange. We are in discussions with funding agencies and foundations to possibly gather increased financial support. We feel this initial launch will provide a proof-of-principle to gather support.

Currently though, we hope investigators will pay for the validation themselves. We believe it can provide a valuable service to those in small biotechs or those who want to commercialize their research, to improve the robustness of outcomes for potential licensing opportunities.

There are very few journals that do this -- I can't imagine this ever taking off.

Reproducing experiments seems like a costly (and mostly thankless) effort that few PIs would ever take up.

The only journal I know of that completely reproduces results is Organic Syntheses (http://www.orgsyn.org/), which reproduces every reaction before publication and has a Procedure Checklist for authors: http://www.orgsyn.org/AuthorChecklist.pdf

Hi, this is Bilal from Science Exchange. Appreciate your concern, but that is why we feel Science Exchange is uniquely positioned to make a difference. We have developed a network of 1000 core facilities and CROs on our platform, who operate on a fee-for-service basis to conduct experimental services. The Reproducibility Initiative will leverage this network to outsource specific experiments within a study to these facilities.

It is quite common for experiments to be replicated, usually as preliminary to a further advance.

If the findings claimed are significant enough, other labs will often attempt to replicate a few of the key experiments, and if they cannot do so quickly, they abandon that direction of investigation and do not report their negative findings.

The part I like about the execution is where you have CROs and Core Facilities, and not scientists themselves, validate results. Apparently Millikan, when he measured the electric charge, was off (smaller than actual) in the measurements... but researchers who checked, worried perhaps about their academic reputation (speculating here), slowly adjusted this number upwards over several years till it asymptotically approached the truth. Having a third party that is less impacted by academic politics might be a good thing.

From Feynman's Caltech commencement speech on this [1]:

We have learned a lot from experience about how to handle some of the ways we fool ourselves. One example: Millikan measured the charge on an electron by an experiment with falling oil drops, and got an answer which we now know not to be quite right. It's a little bit off, because he had the incorrect value for the viscosity of air. It's interesting to look at the history of measurements of the charge of the electron, after Millikan. If you plot them as a function of time, you find that one is a little bigger than Millikan's, and the next one's a little bit bigger than that, and the next one's a little bit bigger than that, until finally they settle down to a number which is higher.

Why didn't they discover that the new number was higher right away? It's a thing that scientists are ashamed of--this history--because it's apparent that people did things like this: When they got a number that was too high above Millikan's, they thought something must be wrong--and they would look for and find a reason why something might be wrong. When they got a number closer to Millikan's value they didn't look so hard. And so they eliminated the numbers that were too far off, and did other things like that. We've learned those tricks nowadays, and now we don't have that kind of a disease.

[1] http://www.lhup.edu/~dsimanek/cargocul.htm
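
The drift Feynman describes can be illustrated with a toy simulation. This is only a sketch under invented assumptions: the function name and every number below (`true_value`, `first_published`, `noise_sd`, `tolerance`) are made up for illustration and are not historical values. Each simulated lab measures honestly and without bias, but repeats the measurement whenever the result lands too far from the last published figure.

```python
import random

def simulate_anchored_measurements(true_value=1.0, first_published=0.95,
                                   noise_sd=0.02, tolerance=0.02,
                                   n_labs=40, seed=0):
    """Toy model of anchoring on a previously published number.

    All parameters are hypothetical; they are not historical values.
    """
    rng = random.Random(seed)
    published = [first_published]
    for _ in range(n_labs):
        previous = published[-1]
        while True:
            # An honest, unbiased measurement of the true value.
            measurement = rng.gauss(true_value, noise_sd)
            # A result far from the accepted number is assumed to be
            # "debugged" away, modelled here as repeating the measurement.
            if abs(measurement - previous) <= tolerance:
                break
        published.append(measurement)
    return published

history = simulate_anchored_measurements()
for i, value in enumerate(history):
    print(f"lab {i:2d}: {value:.4f}")
```

By construction, each published value can differ from the previous one by at most `tolerance`, so the published record needs several labs to move from the first figure toward the true value even though no individual measurement is biased.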

I was very surprised when I first learned that results could be published without first having been reproduced. I thought at the least they should have been published with a separate label.

This is a worthy initiative.

Traditionally, the public literature is how scientists communicated, which is what made replication possible in the first place. Lab A, perhaps in Germany, works on a research project, and gets some results. They write up their methodology and results, and communicate them to other scientists by publishing the write-up in a journal. The journal will have the write-up peer-reviewed before publishing it, but that doesn't mean verifying it's correct, only that it's a contribution to the scientific literature worth communicating to its readership. For example, it would make sure the paper has done appropriate data analysis, has cited relevant existing work, is not outright kookery, etc.

Now other scientists, anywhere in the world, will read this and decide what they think of it, which may include attempting to replicate it. If they find different results, or have a qualm with an interpretation or methodology, they can write a response, which will also (after peer-review) be communicated by the journal to its readership, either as another paper, or if shorter, as a letter. Thus the back-and-forth of claim/replication/critique happens, all in the open air of the journal's pages, with any lab anywhere in the world able to follow the conversation and jump in if they wish.

To me that's more transparent than doing all this in some kind of closed pre-publication process. Perhaps in the modern era, the open communication doesn't necessarily have to happen in journals anymore: perhaps it should happen on arXiv, and journals should become more conservative in what they publish. But I would like it to still be public.

> I was very surprised when I first learned that results could be published without first having been reproduced.

A lot of psychology journals have been burned by that over the years, and as a consequence, just within the last year or so, the most prestigious psychology journals are beginning to ask authors of papers on new experimental research to show replication across at least two distinct data sets. This, in turn, is driving more collaboration among researchers at different research centers. It's a long overdue development, one in accord with best practice in science, and I hope this practice spreads to most disciplines that publish new research reports in the major journals of the discipline.

(Although the requirement for prepublication replication is not explicitly mentioned in the publication guidelines of Psychological Bulletin, which is the most prestigious psychology journal published in the United States, I have been told at the University of Minnesota behavior genetics "journal club" that replication across more than one data set is becoming a tacit requirement for publication in most of the better journals. Perhaps that guideline will eventually be made explicit for all of the better journals.)

I don't see this approach as really able to solve the underlying problems that it references from http://www.nytimes.com/2012/04/17/science/rise-in-scientific... and other articles.

"Each year, every laboratory produces a new crop of Ph.D.’s, who must compete for a small number of jobs, and the competition is getting fiercer. In 1973, more than half of biologists had a tenure-track job within six years of getting a Ph.D. By 2006 the figure was down to 15 percent."

I would claim that science requires some basic integrity in its practitioners, and if an institution is treating its members as so many throw-away resources, it is hard to expect those members to move ahead with great idealism. The model of every desperate competitor watching every other competitor seems to be the replacement for the model of science as a high ideal. I don't see it working out well.

It's true the issue of reproducibility in research is a complex one, with many underlying factors. Culture, misaligned incentives, and lack of resources play strong roles.

We believe, however, that our Initiative can help in laying an initial framework for how one can possibly address aspects of the problem. We've tried to build incentives (ease of outsourcing, rewarding publications) that factor into this issue, and can assist in improving outcomes. But we definitely agree that a holistic solution will require further changes to the academic research infrastructure.

Proofs are for math, not science.

Agree, the title is poorly editorialized. Proofs and validations are very different things. Validation is about being confident that a model can make useful predictions.

Maybe the title should say "probe it". Just saying.

The NSF CISE guidelines (which apply to much of the funded CS research that folks on HN care about) currently require you to retain all data required to reproduce your experiment and make them available to any reasonable request: http://www.nsf.gov/cise/cise_dmp.jsp

My advisor is on appointment there right now, and they're working on guidelines that require you to provide full reproducibility of your program results (modulo system differences). At least for software, things are looking up. Of course, reproducibility is both infinitely easier (I can do it on any system!) and harder (what do you mean the kernel patchlevel or CPU/GPU sub-model-number matters?).

Plasmyd does this via crowdsourcing: it lets users comment on papers so that researchers can point out anomalies or post their inability to reproduce the same results. It also gives the author a chance to explain their work.

It is frustratingly difficult to get biomedical scientists to openly discuss the work in their field. PLoS tried to have discussion sections for papers, and Nature has tried various approaches as well. Neither has worked. I looked up my favorite topic (a popular one) on Plasmyd, and found 20 papers, none of which had a single comment associated with it.

It's honestly a vast waste of intellectual potential. I wish I knew the solution!

Bilal from Science Exchange here. Definitely agree that Plasmyd is a valuable service. We feel this Initiative operates in complement, providing a mechanism for independent validation.

"Validations are conducted blind, on a fee-for-service basis."

Seems like a potential conflict of interest there.

How? If they don't know what the answer is supposed to be, it's not clear how they can fake data. This can further be improved by universities contributing to a 'noise' fund, where requested experiments are expected to _not_ turn anything up.

Also, the main purpose of this is not to catch corruption, but to prevent mostly honest researchers from fooling themselves (and then others), and to raise the baseline for research.

I see it as something like the National Institute of Standards and Technology (http://www.nist.gov/index.html).

I don't think most academic labs will pay for this. However, I can see it as being something that VC investors might like to see the next time someone presents with a wonder drug.

Bilal from Science Exchange here. The validations are blind in the sense that we match an investigator who has submitted a study to a provider who can validate it. The investigator doesn't have a choice of who validates their study though.

The fee-for-service actually helps to guard against biased results in two ways. Firstly, providers who validate studies are paid on completion, regardless of a reproducible or irreproducible outcome. Secondly, operating on a fee-for-service basis allows providers to operate outside academic incentives for publications, ensuring no incentive for biased-positive results.

This is great! Too often scientific papers are published that are never actually checked by anyone else. A key part of the scientific method is reproducibility though.

The title irks me. Instead of "prove it", how about "let us reproduce your results" or "show us the data". Proof in this context is a mistaken concept.

Title changed -- geeeeeze #cowersinthecorner
