Fatal Dose – Radiation Deaths linked to AECL Computer Errors (1994) (ccnr.org)
120 points by agopinath on July 5, 2014 | 40 comments

>>As a result of the Therac-25 accidents, the FDA now requires documentation on software for new medical and other products: a paper trail, in other words, that can be examined by an independent body and retraced for flaws.<<

Anyone have any idea if this can be looked at by the end user? I'm not a radiation technologist of the flavour mentioned in the article; I'm on the diagnostic side. I use an MR scanner with numerous software bugs that I have reported but which remain. For example, the scanner can be made to display data which it says it is going to use in the next scan, but which it isn't using. I suspected a bug and found a way to reproduce it. My last email listed 24 similar bugs (I've found more since), but other than a "thanks, we will forward this on" there has been no reply or comment. It is hard to imagine when this could be a safety issue, but it is a waste of valuable time, it is a waste of money, and it's frustrating when I have gone to the trouble of working out the exact way of reproducing the issues.

If anyone is interested, the interface is so god-awful that instead of having an on/off button or switch, the scanner gets the user to type 1 or 0 for on and off into a text field. Some fields take other values like 1, 2 and 3. Some take decimal values like 0 to 1 in 0.1 increments. There is no pattern to what the user is expected to type. Yuck. This data is not properly sanitized either, and you can make the scanner say it's "doing" something it's not: type in 1.999 and an error message appears, the field corrects to 2.0, but the scanner does the thing that a setting of 1 would produce. These sorts of bugs occur all over the place.
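To make the 1.999 case concrete: the safe pattern is to snap the typed value onto the allowed grid once, and then use that single snapped value for both the display and the scan. A rough sketch (hypothetical code; the field names, ranges and step size are made up, not the scanner's actual logic):

```python
# Hypothetical sketch: parse free-text input, snap it to the nearest
# allowed step, and use the *snapped* value everywhere, so the display
# can never disagree with what the scanner actually does.

ALLOWED_MIN, ALLOWED_MAX, STEP = 0.0, 3.0, 0.1

def sanitize(raw: str) -> float:
    """Parse free-text input and snap it onto the allowed grid."""
    value = float(raw)  # raises ValueError on junk like "abc"
    if not (ALLOWED_MIN <= value <= ALLOWED_MAX):
        raise ValueError(f"{value} outside [{ALLOWED_MIN}, {ALLOWED_MAX}]")
    snapped = round(value / STEP) * STEP
    return round(snapped, 1)

# The bug described above: the display shows 2.0 but the scanner acts
# on 1.  Feeding one sanitized value to both display and control avoids it.
setting = sanitize("1.999")
assert setting == 2.0
```

The point of the design is that there is exactly one parse step, so the "corrected" value shown to the user and the value the hardware acts on cannot diverge.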

Edit: The "thanks" email is the most positive reply I've ever got; my previous reports were met with statements like "we have some very experienced users who haven't had this issue" even when there were clear safety problems with earlier scanner implementations. (The scanner was producing axial slices at a location different from where I asked for them, on a spine patient due in theatre; good luck operating on the correct vertebral level.) It's FDA approved and it's on the latest software release. I have undergone manufacturer training and have had additional training half a dozen times, at my request and at the manufacturer's request, after my bug reports were met with "you're doing it wrong". I'm not: the software is buggy, and I have some excellent and amazing screen shots and camera-phone video of the bugs in action.

I don't know if this makes you feel any better, but if the device manufacturer is indeed playing by FDA rules, the emails you have sent should have triggered serious investigations into these bugs. That doesn't mean they would be fixed, but they would be triaged to assess how and when they happen and what risk they pose to patients.

This is known as the Corrective and Preventive Action (CAPA) process [0]. Note that an investigation into your complaint is an absolute requirement. Not just emails, but even phone conversations or comments made verbally in passing - if any of them constitute a comment (positive or negative) on the device, it needs to be logged and, if the comment warrants it, an investigation must take place.

So either your comments have been or are in the process of being investigated, or the device manufacturer is not following the FDA rules.

[0] http://en.wikipedia.org/wiki/Corrective_and_preventive_actio...

My son the hacker used to work in the medical device industry as a summer employee while he was a student. The code he wrote for a medical device user interface was to be submitted for a line-by-line code review by the FDA. He estimated that the product would actually come to market more than three years after the summer he worked on it. And maybe that is what you are encountering--the person at the company who built in the bugs you have discovered has moved on, and doesn't work at the company anymore, and the other employees there are trying to figure out how to debug that old code and fix the problem. (Similarly, my son groused about the code in the device he was working on, which was acquired by his company from another company that had originally developed the device.) Always comment your code. You never know how long after you wrote it someone else will have to fix it, especially if the code is embedded in a medical device.

Thanks - this has been in the back of my mind and is a reason I'm trying to be patient. A two-line message saying what was happening would remove my frustration. Usually I get a corporate-speak reply with a suggestion that it is my fault, though. What does the FDA code review do? If it isn't catching bugs that take the scanner offline for hours at a time, what is the point?

There's a difference between bugs that cause downtime and bugs that endanger the life of the patient, and I think the FDA is primarily concerned with the latter. I would think a bug that caused the wrong image to be captured and could cause doctors to make the wrong decisions would be taken very seriously.

I've worked on several FDA-regulated products and have never had the FDA review my code. I would guess this only happens in extenuating circumstances. The FDA does not have the resources to do this for most products out there.

We are required, however, to review our own code and maintain records of those reviews.

Exactly. FDA doesn't review code!

If there are complaints, what the FDA does sometimes review is the mountain of device-related documentation: design, assembly, maintenance, end-user manuals, etc. Checking the paper trail: is the paperwork done correctly, signed by a competent employee and reviewed by the appropriate persons? There also needs to be a watertight trail of employee training. Failure to have that does not end well!

Traceability (both physical and in code) is another thing you had better get right as a medical company. You need to know where, when, and how each major component of the device came to be.

Medical companies generate so much paperwork that separate storage facilities are needed for it. While you'd obviously have it in digital format for yourself, all of it is also printed out and signed.

Compliance officer for a med device company, can confirm. Even vendor audits don't look at code, just SOPs and spreadsheets documenting that you have the processes in place to log the shit out of everything.

For future bugs:

Don't send random emails to people in the company; most people can't be held accountable for mishandling bug reports (or ignoring them).

Look up the contact for non-conformance reports in the device's user manual; there has to be a contact address, maybe even a (paper) form to fill out. Send it by paper mail to that address (most likely the QA department). Request a classification of the issue (urgent, user error, critical, ...) and an ID under which the issue is tracked. Set a deadline for replies to your inquiries.

If you really want to be serious, the FDA takes reports on defective medical products, here's a webpage on this process:


GE scanner, by chance? I've heard their software is pretty terrible compared to their competitors.

I actually laughed. Yes. Latest release, flagship scanner, 3T 750W. We have a few of the extra bells and whistles but not all. I'm not entirely sure which we have, as trying to run something you haven't paid for used to crash the scanner, so I'm hesitant to try some things. (That bug I did report.)

Perhaps it's time to forward your materials to the FDA? (Or whatever the appropriate regulator is for your device.)

Time to get in touch with the press. 25 bugs in something health-critical? They'll have a field day.

My hope (which is steadily fading) was to make contact with someone in the development team to suggest a few improvements and show a few of the more obscure bugs that I haven't reported. I'd love to be able to positively influence the development. There has been limited "end user" input as far as I can tell, and some small changes would make the platform so much more powerful. The upside to the lack of communication is that I've begun learning to code and now use my own code on my phone to bypass the scanner's crap; without the poor scanner interface I wouldn't have done this. I should note that my basic code is truly awful to look at, although it works reliably and does a better job of its small task than the multimillion-dollar console I use.

They're waiting for patents on all the stuff you suggested before contacting you back to say that you've violated their intellectual property. Don't worry. You'll hear from them soon enough...

There are so many layers between you and the developer... without going into specifics, probably about five. There's pretty much no chance you get to interact with the developers directly.

As others have said, any complaint you make must be processed by the company. If it endangers human safety, failure to do so is against the FDA's rules and can have dire and expensive consequences for the company in question.

Fixing bugs in a medical device is not slow because it takes a long time to fix the bug itself. It's slow because of all the paperwork (Device History Files, specifications, tons of reviews, etc.) and the system testing that follows. That's why you'd only correct issues that do not endanger patient safety once enough of them have accumulated, or when they can be combined with new features in the firmware.

There's also always a risk that the fixes themselves introduce patient-endangering bugs.

It can take almost one year from implementing the fix until it is running on any significant number of medical (imaging) devices in question.

The FDA's rules are there for a reason. Although... someone should tell the FDA that git exists. It's silly to do revision control manually in Word documents.

The Therac-25 case study is a tragic one, but fortunately it is not forgotten.

I work on medical devices (and have worked on radiotherapy devices previously) and the standards for quality systems and regulatory hurdles (which I occasionally see bemoaned here on HN) are there with good reason. In fact, Therac-25 is often cited when training new hires on quality (as required with any ISO-13485 compliant QMS).

Diagnostic imaging guy here. We point our recruits to this; balding patients from diagnostic tests shouldn't happen. http://www.ajnr.org/content/31/1/2.full

Definitely not forgotten. I wrote an essay on software safety when I was at college (not uni; UK meaning), and the Therac-25 was a big chunk of it.

The research I did has always stuck with me because of the suffering the patients were exposed to, combined with a company attitude of "admit only what we are forced to, so we don't hurt sales".

Nothing I write is safety critical (though it was a field I was fascinated by when I was younger), so I can sleep at night.

I hope there are sensor mechanisms nowadays that confirm the exposures and fail safe.

One of the more infamous classes in Computer Science at Cal Poly SLO is "Professional Responsibilities", taught by Dr. Clark Turner. The class delves into Therac-25 and similar cases that have happened since. I found the class really interesting because it makes you question and think about the ethics of what you are building and what others have built.

Knowing about, and thinking about, the ACM Code of Ethics, Stuxnet, Therac-25, the Windows security patch policy, and other problems our programming culture has come across is important. Realizing that the code we write can affect people in both positive and negative ways, on both long and short time scales, is something that can change both your product and how you build it.

There was an article in the NY Times a few years ago that discussed malfunctions of radiology equipment. One story in particular stood out for me. It describes a reportedly-not-unusual malfunction/crash of a linear accelerator used for Intensity-Modulated Radiation Therapy (IMRT):

"An error message asked [the medical physicist operating the device] if she wanted to save her changes before the program aborted. She answered yes."

How many programmers read that and cringe? I know I did. My guess is that the operating system being used for the device is some standard OS (Windows CE, maybe?) that is being repurposed to run the application and provide the GUI for the device. It's not that this is necessarily bad, but I would think the most important thing to do would be to strip the OS (or UI) of the various "user conveniences" that in a life or death situation could have all kinds of unintended consequences.

If a person is coding or doing graphic design -- or typing up cooking recipes -- and a crash happens, it's a good thing to have the opportunity to save your work. If 1 teaspoon of butter gets changed to 1 tablespoon because of some kind of data corruption, big deal. So your cookies come out terrible!

It's quite a different matter if the application is coordinating 120 moving parts to direct a radiation beam onto a human body.

The article is here:


Thanks, I hadn't read this before.

For those that haven't read it, here's Leveson's article on the Therac-25: http://sunnyday.mit.edu/papers/therac.pdf

This one made me think about public outrage against tobacco companies.

One minor theme in this article is that AECL denied knowledge of any reports of Therac-25 malfunctions even when, looking at a timeline of publicly-known events, such ignorance might be described as "implausible".

They don't seem to have been punished for this, and while I agree that it isn't laudable I also agree that it's not the greatest infraction. AECL really did care about the proper functioning of their machine. They really did look for problems. They cooperated with the FDA to a very great extent. It's hard to fault them for not thinking of testing "what if we enter incorrect configuration information, and then correct it within 8 seconds?"

But tobacco companies are routinely vilified for sitting on cigarette mortality data, as if this by itself were enough to make them irredeemable. They didn't even get off with a light punishment, much less the zero punishment AECL received. I suspect the difference, in the minds of many, is that AECL was a benign company advancing a useful purpose, while tobacco companies sold a product whose only use was to kill the operator. But that was legal then and remains legal today; how can it be the justification for punishing them extra hard for otherwise minor problems? AECL's product, misrepresented as safe, didn't even kill the operator; it killed random sick people who trusted the hospitals.

Didn't the tobacco companies also spend money to discredit scientists and peer-reviewed articles and seed misinformation about the real risks of smoking, all while they were sitting on that mortality data? I think that was the real problem.

All of what you've said is also legal today.

At the time of Therac-25, FDA was only budgeted to investigate 6 percent of device applications.

Currently, the same mistakes made in the eighties with Therac-25 are being made in many radiation therapy devices. The two NY Times articles (Pulitzer Prize winning) in 2010 and 2011 describe some of the newer cases.

What's shocking to me is that the incidents are always reported in isolation. People become briefly outraged, then the furor dies down until the next death.

Many of the comments in this thread suggest that people can't or won't face the fact that this is a current, ongoing problem of great complexity.

A couple of comments mentioned the coverage of Therac-25 in schools. Very little of what is taught in schools makes it into the programming of radiation therapy devices. History has shown that schooling is not a sufficient solution.

Other comments claim (erroneously) that the FDA is attending to the problem. The FDA has been carefully defanged by the medical device lobby. The FDA has gotten smarter, but has nowhere near the funding to keep pace with its charge and never will.

I wish I could say that I see some hope but I don't see it.

Why hasn't the hardware failsafe for overdoses become mandatory? Why don't we apply defense-in-depth to all worst-case scenarios involving deadly things?

Of course, sometimes hospitals aren't logical, air circulation between rooms comes to mind. And here, I'm sure everyone just trusts the machines because they paid a lot of money for them and it's always worked in the past ...

> Why hasn't the hardware failsafe for overdoses become mandatory? Why don't we apply defense-in-depth to all worst-case scenarios involving deadly things?

Because money

"A professor in computer engineering at the University of Toronto told me that, as a matter of course, his undergraduate students are warned about the risks of incrementing numbers in a computer program."

As someone with a computer science degree who was warned of such risks and studied the Therac-25 in my classes, this sentence made me realize how far we have to go as professionals. Something seemingly so simple as incrementing a number, one of the most common things done in a program, can cause serious problems (of course we have more help with this now than in the mid 80's). Other people must read things like that and cement any distrust they have in computers and computer programmers. And they're probably right to.
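For context, the "incrementing numbers" warning traces straight back to one of the Therac-25 bugs: per Leveson's report, a shared one-byte variable (Class3) was incremented on each pass rather than set to a constant, so every 256th pass it wrapped around to zero, the value that meant "no check needed". A toy Python sketch of that failure mode (the variable name comes from Leveson's account; the rest is illustrative, not the actual PDP-11 code):

```python
# Sketch of the Therac-25 "Class3" rollover bug (after Leveson's
# account): a one-byte flag is *incremented* each pass instead of
# being set to a constant.  Every 256th pass it wraps to zero, and
# zero means "skip the check" -- so the safety interlock is bypassed.

def class3_increment(counter: int) -> int:
    """Increment an 8-bit counter, wrapping at 256 like a one-byte variable."""
    return (counter + 1) & 0xFF

counter = 0
skipped_checks = 0
for _ in range(512):            # two full trips around the byte
    counter = class3_increment(counter)
    if counter == 0:            # nonzero means "run the safety check"
        skipped_checks += 1     # here, the check is silently skipped
assert skipped_checks == 2      # once per 256 passes
```

Setting the flag to a fixed nonzero constant instead of incrementing it would have removed the failure window entirely, which is why "just increment it" is singled out as a risk.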

The Therac-25's software program, relatively crude by today's standards, probably contained 101000 lines of code. At one error for every 500 lines, that works out to the possibility of twenty errors.

I'd say 200, not twenty.

I think the article was OCR'd. There were a few other mistakes that were clearly misinterpreted characters. I think the 1 in the thousands place is actually a comma in the source.
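The arithmetic supports the comma reading: at one error per 500 lines, 10,000 lines gives exactly the article's "twenty errors", while a literal 101,000 lines would give about 200.

```python
# Sanity-checking both readings of the OCR'd line count
assert 10_000 // 500 == 20     # "10,000": matches the article's "twenty errors"
assert 101_000 // 500 == 202   # literal "101000": ~200 errors, not twenty
```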

Good point. There are many hints that article was OCR'd, for instance an ".4ECL" instead of "AECL".

I had a professor read this case study in a lecture.

It amazes me that merely one programmer was trusted with building the software for a radiation beam cannon.

I think a manager would give as much work and responsibility to one person as possible if they say they can do it, and sometimes even when they say they don't know whether they can, but that they'll try. An experienced manager might know how realistic the workload is and the downsides of having only one person on a task, but every manager sees the upside: fewer people = less cost.

It makes me uncomfortable as well.

But keep in mind that at that time, he may well have been the only programmer on the project, and that his manager was likely not qualified to read code.

101000 lines, 500 lines per error gives about 200 errors, not 20

We covered this in my CS courses as well. I feel bad if anyone comes out of a CS program and isn't exposed to the Therac-25 incident even if superficially.
