Notice of Retraction due to a programming error (jamanetwork.com)
88 points by jphoward on Oct 13, 2019 | 29 comments



> To reduce the occurrence of future similar programming errors, the Johns Hopkins Biostatistics Center has instituted a new standard operating procedure for checking randomization assignment to be followed in all trial analyses. To ensure that the group assignment used in any of the trial analyses is correct, a verification process will be included at the beginning and end of each analysis program. This process is intended to confirm that the group assignment separately provided by the trial team matches the group assignment used in the analysis program. The matching confirmation is reviewed by a second biostatistician/analyst before its use in the results.
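
For illustration, here is a minimal sketch of what such an assignment verification step could look like (Python/pandas; the file names, column names, and workflow are assumptions, not the Center's actual SOP code):

    # Verify that the group assignment used in the analysis dataset matches an
    # independently provided randomization list. Names are hypothetical.
    import pandas as pd

    reference = pd.read_csv("randomization_list.csv")   # columns: patient_id, group
    analysis = pd.read_csv("analysis_dataset.csv")      # columns: patient_id, group, ...

    merged = reference.merge(
        analysis[["patient_id", "group"]],
        on="patient_id",
        how="outer",
        suffixes=("_reference", "_analysis"),
        indicator=True,
    )

    unmatched = merged[merged["_merge"] != "both"]
    mismatched = merged[
        (merged["_merge"] == "both")
        & (merged["group_reference"] != merged["group_analysis"])
    ]

    assert unmatched.empty, f"{len(unmatched)} participants missing from one source"
    assert mismatched.empty, f"{len(mismatched)} participants with conflicting group assignment"
    print("Group assignments verified against the reference randomization list.")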

I don't know what software quality control is already in place at this organization, but this corrective measure seems on its face wholly inadequate to me: they're just preventing a recurrence of the same exact problem, rather than the much broader class of problems due to programming errors. Do they have a code review process in place?

This speaks to a larger issue: if you write software for manipulating data as part of the production of a scientific paper, then the source code should be available for review as an attachment to that paper, and review of said code should be part of the peer review process in any reputable journal. Professional software engineers write bugs all the time that invalidate the correctness of their programs, never mind individuals whose primary job is research, not software.


>This speaks to a larger issue: if you write software for manipulating data as part of the production of a scientific paper, then the source code should be available for review as an attachment to that paper, and review of said code should be part of the peer review process in any reputable journal. Professional software engineers write bugs all the time that invalidate the correctness of their programs, never mind individuals whose primary job is research, not software.

Completely agreed, the source should be open (ideally FOSS), but the software development should also be conducted properly. On a practical level, that means using version control and a code review mechanism (e.g. GitHub PRs) within the research group, with rigor equivalent to what you'd see at a well-run software development shop in industry.

Clinical decisions are made off the back of evidence published in peer-reviewed, respected journals. It would seem to me that serious software errors in this domain can contribute to grave consequences for patients, far more serious than if I introduce a bug into a client project.


> On a practical level, that means using version control and a code review mechanism (e.g. GitHub PRs) within the research group, with rigor equivalent to what you'd see at a well-run software development shop in industry.

Let us (professional s/w engineers) not pat ourselves on the back by confusing standard industry practices with 'rigor'. Rigor would be formal verification and proofs. How many s/w engineers can do that? How much would it slow down development?


There’s a tradeoff between rigor and output. Except in toy or very niche problems, I’m not aware of formal verification striking a good balance between the two, although perhaps I’m just ignorant. But it seems to me that code review and some unit and functional tests are usually a clear win.
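
As a sketch of what even a small unit test buys you in analysis code (the helper function here is hypothetical, not taken from the study in question):

    # A tiny pytest-style test for a hypothetical analysis helper.
    import pytest

    def acute_care_rate(events, person_days):
        """Acute care visits per 100 person-days of follow-up."""
        if person_days <= 0:
            raise ValueError("person_days must be positive")
        return 100.0 * events / person_days

    def test_basic_rate():
        assert acute_care_rate(events=3, person_days=600) == pytest.approx(0.5)

    def test_zero_follow_up_rejected():
        with pytest.raises(ValueError):
            acute_care_rate(events=1, person_days=0)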


Right, that's what I meant by code review in the first paragraph: internal review within the research group.

Agreed also that they should use version control and other standard practices, etc.


The way many clinical research groups are structured (unfortunately) precludes code review, and to some extent, version control. Often for projects like the one referenced in the OP you'll have just one statistical programmer working with a PhD-level biostatistician in addition to the MD investigators. The biostatistician guides the programmer through the statistical methods to use and steers the overall study design, but otherwise never sees the underlying code. There are some exceptions, but in many cases the programmer ends up being the only person who sees the code. A lot of this has to do with how funding is structured: grants are written assuming one FTE programmer and something like 0.05 FTE for the biostatistician. It's hard to convince funding agencies that you need more than one FTE programmer in many cases; on top of that, farming out the biostatistician's time in small chunks like that splits their attention across dozens of projects, which precludes them from engaging with any one of those projects in depth.

So there's literally nobody else on these projects who could conduct a code review. Being the only person contributing code also sort of removes the incentive to use VCS, even though it's still obviously a good idea; I've talked with programmers about this before, and unfortunately the response is "why bother?"


> Right, that's what I meant by code review in the first paragraph: internal review within the research group.

Ah, I took it as code review during the peer review process (i.e. from reviewers). Unsure how realistic this would be.


> Code review during the peer review process

Unfortunately, this would make the already laborious process of peer review even longer and require more work. Given the reality of academia today (publish or perish), not to mention the added work of preparing even well-structured, version-controlled code (which is not, in my experience, common) for publication, most researchers would not opt in to (or support) something that makes publishing more difficult.


Maybe this sounds harsh, but:

Who cares? If researchers are producing crap to get published, why should we mind if they stop producing that crap when journals raise the standards of publication?


That’s what I was referring to in my second paragraph. :)


I previously made a big list of papers that were retracted due to software bugs. It was intended to go in a manuscript but I had to cut it out because the conference limited the number of references for the camera-ready version. If anyone is interested I can try to dig up the list again!


Sounds great for the arxiv!

In fact, I would treat the retracted papers more like a data set than things to be cited. Then you get a nice paper with counts and statements about common themes, and references on those themes. Then post the dataset as supplementary material available on the arxiv.


Good point, and by not citing the flawed papers you are not inflating their academic pagerank.


That sounds interesting, yes.



I would be very interested!


This isn't surprising, and I'm sure has happened many times. If you get the result you expect, you are much less likely to check for a mistake. The authors deserve a lot of credit for owning up to it.

They did the analysis with Stata.


This may be the best way to get a negative result published: as a retraction of the published reversed-positive findings!


> Given the corrected finding of a paradoxical increase in acute care use in the intervention group

Now I’m curious why long-term intervention/support increased the number of acute cases. Maybe people were more likely to find themselves sick when provided with additional monitoring after leaving the hospital? Some sort of psychological connection, or being overly careful?

Plenty of doctors will simply blame your past diagnosis for any broad new symptoms, without doing much critical thinking or investigating. I’ve seen this personally many times in the years following a colitis diagnosis. The symptoms are quite broad and easily mistaken.

Anyone know if the new article is available yet?


You can find it here [0]. It's easy to miss the reference since it is only indicated in the text by a superscript. Personally I prefer using the text as a link, or putting the reference inline in brackets.

0. https://jamanetwork.com/journals/jama/fullarticle/2752467


Could easily be a type I error. That's the problem with null hypothesis testing: you can't confidently say that the intervention caused the observed change without replication.
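
A quick simulation illustrates the point (a sketch using NumPy/SciPy with arbitrary parameters, not the trial's data): with no true difference between arms and alpha = 0.05, about 5% of comparisons still come out "significant".

    # Type I error rate under the null: two identical arms, repeated t-tests.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_trials = 2000
    false_positives = 0
    for _ in range(n_trials):
        control = rng.normal(0, 1, 100)        # no real difference between arms
        intervention = rng.normal(0, 1, 100)
        _, p = stats.ttest_ind(control, intervention)
        false_positives += p < 0.05

    print(f"False positive rate: {false_positives / n_trials:.3f}")  # ~0.05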


I feel like a "best retraction of the year" award with a real monetary prize would be a good incentive to keep these coming.


How does this stand up as a post-mortem? Sufficient? Is there a standard for post-mortems?


Curious: what is the incentive for an author to retract a study?

Shouldn't they just leave it alone, continuing to accrue publications and citations, or whatever the metrics are in research?


Well, other than academic integrity and general honesty, if people can't reproduce your research you might not keep your job.


>Over the course of this reanalysis, we detected an error in imputing missing values for the SGRQ, whereby the worst possible score (100) was incorrectly imputed for missing values of participants who had died beyond the 6-month study period. The correct approach would have been to classify those values as missing because those participants had not died by the 6 months after discharge study end point.

The reassignment error is possibly forgivable, but I think this second error should have been easier to catch and is much less easy to forgive. A simple filter check between possible score and some other status variable in the dataset would of caught this mistake. I am doing a Masters in Biostatistics and this kind of checking is being taught to us early on, I hope there is more focus on it later to help avoid mistakes like this.
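
For illustration, a filter check along those lines might look like this (Python/pandas; the column names and the 182-day window are assumptions):

    # Cross-check imputed SGRQ scores against vital-status dates before analysis.
    import pandas as pd

    df = pd.read_csv("analysis_dataset.csv",
                     parse_dates=["discharge_date", "death_date"])

    study_end = df["discharge_date"] + pd.Timedelta(days=182)  # ~6 months post-discharge

    # Worst-score imputation (100) is only appropriate for deaths on or before
    # the 6-month end point; deaths after that should stay missing.
    died_after_end = df["death_date"].notna() & (df["death_date"] > study_end)
    suspicious = df[died_after_end & (df["sgrq_score"] == 100)]

    assert suspicious.empty, (
        f"{len(suspicious)} participants who died after the study end point "
        "carry the worst-score imputation; their SGRQ should be missing"
    )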


Even when following every protocol, everyone will screw up at some point.

The authors' approach is the correct way to deal with a screw up in the academic/research context -- broad communication, transparent assessment of the mistake, eager explanation.


Trivial errors can slip through the cracks easily.

For example, people sometimes misspell “would’ve” as “would of” even if they know perfectly well that the latter spelling is incorrect.

Pointing fingers is easy after the fact, but spotting every possible error all of the time – no one is able to do that.

I’ve even made that very same spelling mistake you did a time or two myself even though I try really hard to be correct in spelling and in all aspects of grammar, and even though I am well aware that “would of” is just plain wrong. We all slip up, and sometimes we do so in embarrassing ways. Especially when we are lacking sleep or when we are otherwise exhausted.


Exactly.

The notion of "failing forward" is somewhat related here -- don't focus on the blame game when addressing honest mistakes. (Fraud is a different matter.) Focus on rectifying, then moving forward.



