To reduce the occurrence of future similar programming errors, the Johns Hopkins Biostatistics Center has instituted a new standard operating procedure for checking randomization assignment to be followed in all trial analyses. To ensure that the group assignment used in any of the trial analyses is correct, a verification process will be included at the beginning and end of each analysis program. This process is intended to confirm that the group assignment separately provided by the trial team matches the group assignment used in the analysis program. The matching confirmation is reviewed by a second biostatistician/analyst before its use in the results.
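For what it's worth, the kind of check described in that notice can be automated in a few lines. Below is a minimal sketch in Python/pandas of what such a verification step might look like; the file names and column names (participant_id, group, the two CSVs) are assumptions for illustration, not anything taken from the actual SOP:

```python
# Minimal sketch (assumed file and column names) of the verification step
# described above: confirm that the group assignment used in the analysis
# matches the assignment provided separately by the trial team, at both
# the start and the end of the analysis program.
import pandas as pd

def verify_group_assignment(analysis_path: str, reference_path: str) -> None:
    analysis = pd.read_csv(analysis_path)    # dataset used in the analysis
    reference = pd.read_csv(reference_path)  # assignment provided by the trial team

    merged = analysis[["participant_id", "group"]].merge(
        reference[["participant_id", "group"]],
        on="participant_id",
        how="outer",
        suffixes=("_analysis", "_reference"),
        indicator=True,
    )

    # Every participant must appear in both files...
    missing = merged[merged["_merge"] != "both"]
    if not missing.empty:
        raise ValueError(f"{len(missing)} participants not present in both files")

    # ...and their group labels must agree exactly.
    mismatched = merged[merged["group_analysis"] != merged["group_reference"]]
    if not mismatched.empty:
        raise ValueError(f"Group assignment mismatch for {len(mismatched)} participants")

# Run at the top and bottom of the analysis script:
# verify_group_assignment("analysis_dataset.csv", "randomization_list.csv")
```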
I don't know what software quality control is already in place at this organization, but this corrective measure seems on its face wholly inadequate to me: they're just preventing a recurrence of the same exact problem, rather than the much broader class of problems due to programming errors. Do they have a code review process in place?
This speaks to a larger issue: if you write software for manipulating data as part of the production of a scientific paper, then the source code should be available for review as an attachment to that paper, and review of said code should be part of the peer review process in any reputable journal. Professional software engineers write bugs all the time that invalidate the correctness of their programs, never mind individuals whose primary job is research, not software.
>This speaks to a larger issue: if you write software for manipulating data as part of the production of a scientific paper, then the source code should be available for review as an attachment to that paper, and review of said code should be part of the peer review process in any reputable journal. Professional software engineers write bugs all the time that invalidate the correctness of their programs, never mind individuals whose primary job is research, not software.
Completely agreed, the source should be open (ideally FOSS), but the software development should also be conducted properly. On a practical level, that means using version control and a code review mechanism (e.g. GitHub PRs) within the research group, with rigor equivalent to what you'd see at a well-run software development shop in industry.
Clinical decisions are made off the back of evidence published in peer-reviewed, respected journals. It would seem to me that serious software errors in this domain can contribute to grave patient consequences. Much more serious consequences than if I introduce a bug into a client project.
> On a practical level, that means using version control and a code review mechanism (e.g. GitHub PRs) within the research group, with rigor equivalent to what you'd see at a well-run software development shop in industry.
Let us (professional s/w engineers) not pat ourselves on the back by confusing standard industry practices with 'rigor'. Rigor would be formal verification and proofs. How many s/w engineers can do that? How much would it slow down development?
There’s a tradeoff between rigor and output. Except in toy or very niche problems, I’m not aware of formal verification striking a good balance between the two, although perhaps I’m just ignorant. But it seems to me that code review and some unit and functional tests are usually a clear win.
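To make that concrete, here's a minimal, hypothetical example (Python/pytest) of the kind of low-cost unit test I have in mind; the impute_missing function is invented for illustration, not taken from any real analysis code:

```python
# A minimal, hypothetical example of a lightweight unit test: pin down the
# intended behaviour of a small data-handling function so that a silent
# change in its logic fails loudly instead of propagating into results.
import math

def impute_missing(values, fill):
    """Replace missing (None/NaN) entries with `fill`, leaving others untouched."""
    return [fill if v is None or (isinstance(v, float) and math.isnan(v)) else v
            for v in values]

def test_impute_missing_only_touches_missing_values():
    assert impute_missing([1.0, None, 3.0], fill=0.0) == [1.0, 0.0, 3.0]

def test_impute_missing_leaves_complete_data_alone():
    assert impute_missing([1.0, 2.0], fill=99.0) == [1.0, 2.0]
```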
The way many clinical research groups are structured (unfortunately) precludes code review, and to some extent, version control. Often for projects like the one referenced in the OP you'll have just one statistical programmer working with a PhD-level biostatistician in addition to the MD investigators. The biostatistician guides the programmer through the statistical methods to use and steers the overall study design, but otherwise they never see the underlying code. There are some exceptions, but in many cases, the programmer ends up being the only person who ever sees the code. A lot of this has to do with how funding is structured: grants are written assuming one FTE programmer, and something like 0.05 FTE for the biostatistician. It's hard to convince funding agencies that you need more than one FTE programmer in many cases; on top of that, farming out the biostatistician's time in small chunks like that splits their attention between dozens of projects, which precludes them engaging with any one of those projects in depth.
So, there's literally nobody else on these projects who could conduct a code review. Being the only person contributing code also provides something of a disincentive toward using VCS, even though it's so obviously a good idea. I've talked with programmers about this before, and the response, unfortunately, is "why bother?"
Unfortunately, this would make the already laborious process of peer review even longer and require more work. Given the reality of academia today (publish or perish), not to mention the added work of preparing even well-structured, version-controlled code (which is not, in my experience, common) for publication, most researchers would not opt in to (or support) something that makes publishing more difficult.
Who cares? If researchers are producing crap to get published, why should we mind if they stop producing that crap when journals raise the standards of publication?
I previously made a big list of papers that were retracted due to software bugs. It was intended to go in a manuscript but I had to cut it out because the conference limited the number of references for the camera-ready version. If anyone is interested I can try to dig up the list again!
In fact, I would treat the retracted papers more like a data set than things to be cited. Then you get a nice paper with counts and statements about common themes, and references on those themes. Then post the dataset as supplementary material available on arXiv.
Some of these links go directly to the retracted paper, some go to the retraction notice (where available), and some go to reports of software bugs which impacted others' results but didn't cause any retractions. This list is of course incomplete, but here are a few, mostly pulled from Retraction Watch:
This isn't surprising, and I'm sure has happened many times. If you get the result you expect, you are much less likely to check for a mistake. The authors deserve a lot of credit for owning up to it.
> Given the corrected finding of a paradoxical increase in acute care use in the intervention group
Now I’m curious why long-term intervention/support increased the number of acute cases. Maybe people were more likely to find themselves sick when provided with additional monitoring after they leave the hospital? Some sort of psychological connection or being overly careful?
Plenty of doctors will simply blame your past diagnosis for any broad new symptoms, without doing much critical thinking or investigating. I’ve seen this personally many times in the years following a colitis diagnosis. The symptoms are quite broad and easily mistaken.
You can find it here [0]. It's easy to miss the reference since it is only indicated in the text by a superscript. Personally I prefer using the text as a link, or putting the reference inline in brackets.
Could easily be a type I error. That's the problem with null hypothesis testing: you can't confidently say that the intervention caused the observed change without replication.
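For intuition, here's a quick simulation (Python; purely illustrative, not based on this trial's data) showing that even when there is no true effect at all, roughly 5% of trials will come out "significant" at the conventional 0.05 threshold:

```python
# Illustrative only: simulate many two-arm "trials" with NO true treatment
# effect and count how often a t-test declares significance at alpha = 0.05.
# Roughly 5% of null trials come out "significant", which is why a single
# surprising finding needs replication before it is treated as causal.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n_trials, n_per_arm = 0.05, 10_000, 100

false_positives = 0
for _ in range(n_trials):
    control = rng.normal(0, 1, n_per_arm)
    treatment = rng.normal(0, 1, n_per_arm)  # same distribution: no real effect
    _, p = stats.ttest_ind(control, treatment)
    if p < alpha:
        false_positives += 1

print(f"False positive rate: {false_positives / n_trials:.3f}")  # ~0.05
```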
>Over the course of this reanalysis, we detected an error in imputing missing values for the SGRQ, whereby the worst possible score (100) was incorrectly imputed for missing values of participants who had died beyond the 6-month study period. The correct approach would have been to classify those values as missing because those participants had not died by the 6 months after discharge study end point.
The reassignment error is possibly forgivable, but I think this second error should have been easier to catch and is much less easy to forgive. A simple filter check between possible score and some other status variable in the dataset would of caught this mistake. I am doing a Masters in Biostatistics and this kind of checking is being taught to us early on, I hope there is more focus on it later to help avoid mistakes like this.
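Something like the sketch below (Python/pandas, with made-up column names, since the real dataset's variables are unknown to me) is the kind of filter check I mean: cross-check the imputed score against vital status and surface anything inconsistent.

```python
# Rough sketch (column names are assumptions) of a cross-check: flag any
# participant whose SGRQ score was set to the worst possible value (100)
# even though they had not died by the 6-month endpoint, since the
# worst-score imputation was only meant to apply to in-study deaths.
import pandas as pd

def check_sgrq_imputation(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows where the worst-score imputation looks inconsistent with vital status."""
    return df[(df["sgrq_6mo"] == 100) & ~df["died_within_6mo"]]

# Example with a tiny toy dataset:
toy = pd.DataFrame({
    "participant_id": [1, 2, 3],
    "sgrq_6mo": [42.0, 100.0, 100.0],
    "died_within_6mo": [False, True, False],  # participant 3 died *after* 6 months
})
print(check_sgrq_imputation(toy))  # should surface participant 3
```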
Even when following every protocol, everyone will screw up at some point.
The authors' approach is the correct way to deal with a screw up in the academic/research context -- broad communication, transparent assessment of the mistake, eager explanation.
Trivial errors can slip through the cracks easily.
For example, people sometimes misspell “would’ve” as “would of” even if they actually know that the latter spelling is actually incorrect.
Pointing fingers is easy after the fact, but spotting every possible error all of the time – no one is able to do that.
I’ve even made that very same spelling mistake you did a time or two myself even though I try really hard to be correct in spelling and in all aspects of grammar, and even though I am well aware that “would of” is just plain wrong. We all slip up, and sometimes we do so in embarrassing ways. Especially when we are lacking sleep or when we are otherwise exhausted.
The notion of "failing forward" is somewhat related here -- don't focus on the blame game when addressing honest mistakes. (Fraud is a different matter). Focus on rectifying then moving forward.