Siblings miss crucial life-extending treatment because of CrowdStrike outage (kiro7.com)
155 points by nullindividual 11 months ago | 169 comments




It's probably been raised before, but the CrowdStrike terms of use (https://www.crowdstrike.com/software-terms-of-use/), section 6.1, have the usual blurb on them (emphasis mine):

> Neither the software or any other Crowdstrike offerings are for use in the operation of aircraft navigation, nuclear facilities, communication systems, weapons systems, DIRECT OR INDIRECT LIFE-SUPPORT SYSTEMS, air traffic control, or any application OR INSTALLATION WHERE FAILURE COULD RESULT IN DEATH, SEVERE PHYSICAL INJURY, or property damage. SOFTWARE USER agrees that it is SOFTWARE USER’S RESPONSIBILITY TO ENSURE SAFE USE OF SOFTWARE AND ANY OTHER CROWDSTRIKE OFFERING IN SUCH APPLICATIONS AND INSTALLATIONS.

We don't really think long and hard enough about isolating systems and about what level of access they actually need to do their tasks. It's entirely practical to build completely isolated networks. The US Government (and most major governments) operates classified networks with air gaps, network diodes and the like. We don't have to make everything internet-accessible, and we can still retain the ability to get data into such isolated networks.
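To make the "get data in without a path back out" idea concrete, here is a minimal Python sketch of the software half of a one-way (diode-style) transfer: the sender fires UDP datagrams at the isolated network and never expects a reply. The address, port, and framing are made up for illustration; real diodes enforce the one-way flow in hardware and add forward error correction, since nothing can be retransmitted on request.

    # One-way file push toward an isolated network: the software half of a
    # network-diode setup. Assumes the diode hardware physically blocks any
    # return traffic; the endpoint address and framing are illustrative only.
    import hashlib
    import pathlib
    import socket

    DIODE_INGEST_ADDR = ("10.0.0.1", 9000)   # hypothetical ingest endpoint
    CHUNK_SIZE = 1024                        # small datagrams; no retransmission possible

    def push_file(path: str) -> None:
        """Send a file as numbered UDP datagrams; the receiver reassembles them."""
        data = pathlib.Path(path).read_bytes()
        digest = hashlib.sha256(data).hexdigest().encode()
        total = (len(data) + CHUNK_SIZE - 1) // CHUNK_SIZE
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            # Header first: filename, chunk count, and a checksum the receiver
            # can verify on its own, since it has no way to ask for a resend.
            sock.sendto(b"HDR|%s|%d|%s" % (path.encode(), total, digest), DIODE_INGEST_ADDR)
            for i in range(total):
                chunk = data[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE]
                sock.sendto(b"CHK|%d|" % i + chunk, DIODE_INGEST_ADDR)
        finally:
            sock.close()

    if __name__ == "__main__":
        push_file("lab_results.csv")  # placeholder file name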


Now comes the important question: did CrowdStrike's sales team try to sell solutions to operators of such critical systems? If so, how hard did they push?


Having been in hospital IT: yes, they did. Sure, they talked about not putting it on "Life Critical Systems", but just about every system at a hospital is life critical. This is extremely common in IT; I'm sure Microsoft Windows has a similar clause. Hell, I bet Epic (Electronic Medical Records) has some clause about "make sure you have backups" when their whole pitch is "throw out the paper!"


I have seen first hand the mental gymnastics that finance/MBA leadership will go through to pretend to themselves (and the regulators) that a particular IT system isn’t “critical”.

“We have a tested backup paper procedure” - Yes, but have you tested being able to access that document when the system is down? Have you tested it when everyone’s workload is 5x normal during a real life incident, given that doing it on paper takes much longer and everyone’s already at the edge of their capacity on a normal day? Have you considered in a real life disaster scenario that something like 20% of the employees might just call out sick?

Of course not, but nobody asks that, so they just tick the “risk mitigated” box and don’t allocate any engineering effort to ensuring the system is robust.


Given this: https://www.theregister.com/2024/07/19/crowdstrike_update_nh... someone has certainly been selling Crowdstrike software to be used on life-critical systems.


Of course they did. This type of fine print is meaningless and only there to push responsibility onto their customers. This type of issue shouldn’t happen at large companies. We need regulations that create penalties and jail time for larger companies that have security incidents or other issues like this one.


We need regulations that create an environment for which those affected can directly seek retribution with full civil and criminal immunity.

I'm sure there are lesser measures that could work, but I'm exceedingly confident the above will be extremely effective.


We can look at somewhat more regulated industries like airlines (see: Boeing) and see that it's not necessarily working; in any case, whatever path we're on is certainly not that approach.


with larger corp users there is often an MSA that overrides the TOS; no idea if that's the case here, but give hospital GC benefit of the doubt re due diligence

also would be interesting if crowdstrike was installed by a reseller who specializes in healthcare / airlines, not the hospital or airline itself

waivers not always enforced, per [1]

> Courts will hold overbroad liability waivers unenforceable on public policy grounds. ... injurers routinely ignore these holdings and persist in requiring would-be plaintiffs to sign such unenforceable waivers anyway

at least one federal case in florida[2] saying advertising a product as safe doesn't defeat the EULA, but it's florida

1. https://wp0.vanderbilt.edu/lawreview/wp-content/uploads/site...

2. https://casetext.com/case/justtech-llc-v-kaseya-us-llc


The NT 4 EULA had a similar note, but it was limited in scope to Java.

> 8. NOTE ON JAVA SUPPORT. THE SOFTWARE PRODUCT CONTAINS SUPPORT FOR PROGRAMS WRITTEN IN JAVA. JAVA TECHNOLOGY IS NOT FAULT TOLERANT AND IS NOT DESIGNED, MANUFACTURED, OR INTENDED FOR USE OR RESALE AS ON-LINE CONTROL EQUIPMENT IN HAZARDOUS ENVIRONMENTS REQUIRING FAIL-SAFE PERFORMANCE, SUCH AS IN THE OPERATION OF NUCLEAR FACILITIES, AIRCRAFT NAVIGATION OR COMMUNICATION SYSTEMS, AIR TRAFFIC CONTROL, DIRECT LIFE SUPPORT MACHINES, OR WEAPONS SYSTEMS, IN WHICH THE FAILURE OF JAVA TECHNOLOGY COULD LEAD DIRECTLY TO DEATH, PERSONAL INJURY, OR SEVERE PHYSICAL OR ENVIRONMENTAL DAMAGE.


If I recall correctly, part of the context for this was that Microsoft had shipped its own implementation of Java separate from Sun's, but it only implemented a subset of the language.

https://en.wikipedia.org/wiki/Microsoft_Java_Virtual_Machine


I remember it being a superset: they added an extension to make implementing native methods easier on Windows. ActiveX or J/Direct or something like that.


I wonder if that's MS's Java or Sun's. If it's from a third party they probably copy-pasted the relevant paragraphs.

QuickTime also has a paragraph about not using it to operate nuclear facilities...


The Quicktime nuclear clause also got copied into the iTunes EULA, which is extremely funny.


Well shit, I guess I have to pivot. I was going to operate my nuclear facility with Quicktime.


I probably shouldn't mention how UK Nuke subs used to use WinXP. Maybe they still do. Even if they were using CS, I doubt they would update systems while deployed. I had this horrified image of a nuke sub at the bottom of the ocean while the onboard IT tech was trying hard to make sense of why all the critical systems kept blue-screening.

https://www.theregister.com/2008/12/16/windows_for_submarine...


The disclaimer about not using Java for X, Y, and Z was part of the Java EULA for many years (and may still be - no idea).


QuickTime? The video stuff from Apple?


Yeah, the one we had to install on Windows 98 to play MOV files. Which, I learnt yesterday[1]/[2], is the video format that turned into MP4.

[1] https://obsproject.com/blog/obs-studio-hybrid-mp4

[2] https://news.ycombinator.com/item?id=40951187


I don't think an EULA absolves you of any possible liability. Courts can more or less decide to ignore it if they see sufficient reasons to.


Why does the hospital get a pass? They have critical care equipment, with no redundancies, and no plan to deal with an outcome so predictable it's been in EULAs for decades.

These lessons need to cut in all directions. If you want to profit off of treating disease you should be held to a much higher standard. Passing the buck off to your AV provider is convenient in the current atmosphere but it's incredibly short sighted.


> If you want to profit off of treating disease you should be held to a much higher standard.

The consequences of which will be even higher costs for an incredibly marginal improvement in outcomes.

We spend ~$13,500/year/person at the moment; what's another few thousand on top of that, for an average gain of a few life-days per person?

At some point, you have to accept that medicine is an unlimited money hole, and that it can always do better, and that people are going to die, and that you're going to have to draw the line at some arbitrary cost/benefit limit.


> and that you're going to have to draw the line at some arbitrary cost/benefit limit.

Well, the US is spending significantly more money than all other highly developed countries, with (on average, across the population) worse outcomes than many of them to show for it. Also, there is a lot of variance inside the country (i.e. the cost of identical treatment might vary wildly depending on the health/insurance/etc. provider), making cost/benefit analysis semi-meaningless.


If you've got a defendant with a war chest of several billion USD there's going to be a very drawn out legal process.


I suspect that if the operators of such facilities obeyed all of the terms of use for every product they wanted to use, they’d be using pen and paper for everything…


No, that’s why medical and aviation products are expensive and these are difficult markets to penetrate.

The degree of reliability that is required is insane. I cannot read the article since I am outside the US, BUT if these are the terms of service and the product was used in any area that was excluded under these terms, the entity that used the product might very well be guilty of gross negligence.


And it was not. No plane was flying with that software. It was booking services and similar needs. Planes could fly just fine; it was impossible to book people, though.

My guess is that it's similar in this case. (Site is down)


Hospital networks and computers running Epic are very much indirect life-support systems, failure of which can cause lots of injuries and deaths - as we're learning now in real time.


I'm hearing that most hospital cybersecurity insurance requires Crowdstrike (or a product like it) on all the endpoints, so if that's true the liability might fall back on them. It will be a protracted argument for sure.


This is likely the case in a lot of places. We're still in the midst of ransomware groups targeting hospitals.


> communication systems

So anything connected to the Internet? Shame on these lawyers.


That's only legal until a court throws it out


https://crowdstrike.wd5.myworkdayjobs.com/crowdstrikecareers

Looks like Crowdstrike outsources their SDET/QA while keeping most software engineers stateside.

I generally don't have an issue with outsourcing, but it's obvious they're trying to save money on QA here. A few 200k SDETs could probably have caught this.

I see this at tons of companies, they see QA as less important...


There are 3 axes of risk: probability that something goes wrong, the impact of something going wrong and the time to remediation when something goes wrong.

You're arguing that on-shoring QA would reduce the probability of something going wrong. I'm neither going to agree nor disagree.

However, I think the failure here is to mitigate the impact of something going wrong. Their rollout plan was fundamentally flawed - it shouldn't have taken out so many machines at the same time. It should have been rolled out in stages, with only 1 machine at most at any given customer receiving early versions.

It's best to assume a bug will get through one day or another, and spend some time mitigating the other axes too.
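As a rough illustration of the staged rollout described above, here is a minimal Python sketch (not anyone's real tooling): push the update to progressively larger rings of hosts, let telemetry soak between rings, and halt as soon as a ring looks unhealthy. The ring sizes, thresholds, and the deploy_to/failure_rate helpers are all hypothetical placeholders.

    # Minimal canary/staged-rollout loop: deploy to progressively larger rings
    # and abort if the failure rate in any ring exceeds a threshold.
    # deploy_to() and failure_rate() stand in for real fleet tooling.
    import random
    import time

    RINGS = [0.001, 0.01, 0.10, 0.50, 1.00]   # fraction of the fleet per stage
    MAX_FAILURE_RATE = 0.01                   # abort if >1% of a ring goes unhealthy
    SOAK_SECONDS = 1                          # seconds here; hours in practice

    def deploy_to(hosts):
        print(f"deploying update to {len(hosts)} hosts")   # placeholder for real deployment

    def failure_rate(hosts):
        return random.random() * 0.005        # placeholder for real health telemetry

    def staged_rollout(fleet):
        done = 0
        for fraction in RINGS:
            target = int(len(fleet) * fraction)
            ring = fleet[done:target]
            if not ring:
                continue
            deploy_to(ring)
            time.sleep(SOAK_SECONDS)          # let telemetry accumulate before judging
            if failure_rate(ring) > MAX_FAILURE_RATE:
                print("ring unhealthy; halting rollout and paging a human")
                return False
            done = target
        return True

    if __name__ == "__main__":
        staged_rollout([f"host-{i}" for i in range(10_000)])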


My argument is they decided to cut costs on QA. It's very likely a higher paid QA team would have caught this.

A higher paid QA team might have told management: hey, this is a very high risk change. If we're going to roll this out, let's limit it to reduce the number of people affected.

If you on-shore your core development but outsource all of your QA, I'm forced to assume you value QA less.


"If we're going to roll this out let's limit it to reduce the numbers of people affected." Ime this is something senior developers would themselves do - and not only for changes they deem "high risk", but also by default.

I say this because this case a data file was changed. Probably done thousands of times without an issue.

QA would have never said "we need a staged rollout for this". Developers and those who set the process should do it.


Senior SDETs can and should set the process for deploying software fixes.

But many companies don't view SDETs as equal partners in the developing process.

Anyway the entire world knows they cut corners here.


I am not sure if the pay of the QA team is truly a factor, but it is extremely likely that if you off-shore QA, you don't give them the absolute power to override anyone below the CEO/CTO...


The flawed version was only up for about an hour and 18 minutes. In that time it was able to have the impact that it did.

https://www.crowdstrike.com/blog/falcon-update-for-windows-h...


I had to get someone life-critical medicine yesterday. My GP practice's computers were down because, presumably, of Crowdstrike. Manual pen-and-paper processes saved the day.

I wonder how many people didn't get so lucky?


I was due to pick up my ADHD meds yesterday, and couldn't, for crowdstrike and reversion to paper reasons.

For me, it's mildly annoying, but I've got an emergency supply. The lines of truly desperate people with much more urgent needs than mine were long, and there was a lot of crying and despair in the lobby. I can only imagine the situation in larger cities.


This is what I was most curious about for this story. What exactly failed that meant the appointment had to be cancelled? If it was set up and planned for, presumably all the infrastructure was in place to conduct the fairly specific treatment.


Was this a first time prescription? If not, was there something that prevented the script from being filled earlier?


In usual quiescent conditions, the "insurance" companies often do their damnedest to delay medical care as long as they can get away with. I've had to suffer them only allowing one week of a prescription to be filled at a time, with fulfillment done by overnight shipping. I don't even think it's about saving money per se (overnight shipping of a refrigerated package isn't cheap, and I sure as shit wouldn't have been eating the cost of a misdelivery), but rather control for control's sake.

So ultimately it's not bona fide regulation taking away those last few weeks of slack and creating a needless mad rush, but rather the common setup of the "free market" unaccountably setting uniform policy in lockstep, while you might get to choose which hold music you listen to.


Is that relevant?


If there's something that says you can't fill the script until X days before the previous one runs out because of some "regulation", then yes, it is relevant. It's just another example of short-sighted JIT-style expectations that only compound situations like this where the supply chain is interrupted.


Just to be a little fair here, healthcare providers are a major target of ransomware. How many ransomware attacks has CrowdStrike thwarted?


It's not "fair" that a product can brick your appliances if they have previously protected against it.

In my mind, that is because the protection is what they're paid bags of cash to do.

If Crowdstrike was a charity I might find myself agreeable on your description of fair in favor of Crowdstrike.


Why don't EMR/EHR systems have write only encrypted journal storage that they use to guarantee data safety against this problem?

I mean, just for _basic_ audits you would hope to have that. If ransomware can destroy your entire facility, then an angry insider can do much worse.
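For what it's worth, here is a toy Python sketch of the tamper-evident, append-only journal idea: each entry commits to the hash of the previous one, so later edits or deletions show up when the chain is verified. Encryption, durable WORM storage, and access control are deliberately left out; it's only meant to illustrate the shape of the thing.

    # Tamper-evident append-only journal: every entry records the hash of the
    # previous entry, so rewriting history is detectable on verification.
    import hashlib
    import json
    import time

    class AppendOnlyJournal:
        def __init__(self):
            self._entries = []   # in-memory for the sketch; real systems use WORM storage

        def append(self, record: dict) -> None:
            prev_hash = self._entries[-1]["hash"] if self._entries else "0" * 64
            body = {"ts": time.time(), "record": record, "prev": prev_hash}
            payload = json.dumps(body, sort_keys=True).encode()
            body["hash"] = hashlib.sha256(payload).hexdigest()
            self._entries.append(body)

        def verify(self) -> bool:
            prev_hash = "0" * 64
            for entry in self._entries:
                body = {k: entry[k] for k in ("ts", "record", "prev")}
                payload = json.dumps(body, sort_keys=True).encode()
                if entry["prev"] != prev_hash or entry["hash"] != hashlib.sha256(payload).hexdigest():
                    return False
                prev_hash = entry["hash"]
            return True

    journal = AppendOnlyJournal()
    journal.append({"patient": "example-id", "event": "dose administered"})   # illustrative record
    assert journal.verify()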


Even with encrypted append only storage, any actor just needs to be more patient. Wait a week, 2 weeks, a month, whatever.


By "write only" do you mean "append only"? Just for my own understanding.


Why isn't this stuff air gapped?


Should be, or with a clearly defined interface against the open web


Ugh:

    Error 451
    It appears you are attempting to access this website from a country outside of the
    United States, therefore access cannot be granted at this time.
Fortunately the archive.today link works.


Why is this a thing?


GDPR


As does my vpn.


The article is devoid of any information about why they missed the treatment that day, just that their appointments were canceled, and that's it. What terrible reporting.



Just FYI: site is inaccessible from outside the US.


Ooof, yeah, just found out about it too - first time seeing something not available in my country.


People ridicule the German fetish for doing things on paper and using cash, but many things tend to work here even if the computers stop working.

My general practitioner once treated me during a power outage, all I had to do was come back and have my insurance scanned later.


As someone who straddles the Austrian and Swabian ways of doing things, if I had to pinpoint the true underlying German fetish, it's one of avoiding risk. The bureaucracy and obsession with paper-based processes really do stem from that.


That is an upside, but what are the downside costs? Are there more errors? Do people die more often because something got overlooked because it was not shared? Certainly it is more expensive, but even putting that aspect aside I would want to know for sure that it was a net positive and not just in this one situation.


That's a really moot point because the legacy banking industry there got heavily affected.

The nation's weird affinity for pen and paper is nothing but Luddism. Once you have experience living in a country with good e-governance you'd roll your eyes at Germany and their love for faxes.


Faxes - I fully agree.

But there are areas where it definitely isn't just Luddism, especially in healthcare. See what happened in Finland [1]. Yeah, you can break into a GP's office and steal the physical data quite easily, but that doesn't scale; hacking a centralized service does.

(Sure, there are solutions which would be similarly resistant against hacks as paper - like saving the data on the actual insurance card. But those are not implemented - it has to be a centralized (often SaaS) solution, where hacks can scale nicely)

[1] https://www.bbc.com/news/technology-54692120


Well I did not notice whatever affected the banking industry of my country, which at least for me makes my point.

Good e-governance is incredibly difficult, which explains our love for faxes.

E.g. Microsoft faced a lot of scrutiny in the congressional hearing in June; some people go as far as saying MS is a danger to the national security of the USA.

If the US, which is the home of these companies and can put pressure in ways the German government cannot, still can't force them to deliver secure systems, a fax (at least an encrypted one) looks pretty attractive.

When the EU parliament still used faxes, the US at least had to break into the offices and manually install components on the machines to get access.


>Good e-governance is incredibly difficult

And yet Estonia, a formerly impoverished post-communist country significantly less wealthy than Germany, did it, and did it well. But no, Germans always have a laundry-list bingo of FUD excuses as to why it can't possibly work. The bingo usually starts with "but m'uh privacy!" even though BAMF, Schufa and every law firm and government agency remotely interested in you can find out everything about you if they want to fine you for something.


1. Estonia started more or less from scratch after 1990. Building a new system based on modern standards is easier than reforming one that is working under full load.

2. Estonia is much smaller than Germany, and afaik much more centralised.

Estonia did a lot of things right, but countries like Britain, France and Germany can learn little from Estonia. The results just do not translate.

3. The German system, as it is, is designed to be slow to change. Even if the hard-right AfD got a majority of votes in the next election, they could not change the system much during one legislative term. To do this, even the absurdly long Merkel reign was not long enough.

Point 3 might sound like a disadvantage, and it certainly is for the Germans, BUT if Germany would follow the likes of Hungary, Poland and Turkey, that would destabilise the whole continent.

There is a joke that claims the German anthem actually is: "Stability, stability über alles".


> BUT if Germany would follow the likes of Hungary, Poland and Turkey, that would destabilise the whole continent.

Kind of an unrelated point. Using faxes and paper-based bureaucracy won't save you from crappy politicians implementing crappy policies or a government going crazy.

Why are Germans so obsessed with the idea that crappy bureaucracy automatically means more political stability, as some lame excuse for maintaining an inefficient public bureaucracy? You can also have political stability with an efficient bureaucracy. The key to maintaining stability is political accountability and separation of democratic powers; it has nothing to do with whether the bureaucracy runs on paper or on digital systems.


Germany is NOT politically stable. I would also prefer that when (not if) the next Nazi party wins an election, it has a hard time getting my data.

Germany seems to have not learned very much the first time, though, because it still keeps a hierarchical registry of everyone's religion and home address, and everyone in the bureaucracy has a "just follow orders" mentality.


> Even if the hard-right AfD got a majority of votes in the next election, they could not change the system much during one legislative term. To do this, even the absurdly long Merkel reign was not long enough.

This is a widespread misunderstanding. The AfD could drastically change the system if it had a sufficient number of seats (or a willing partner). There are very different pathways depending on what your goal is. Merkel was a conservative who had no interest in "changing the system a lot" (quite the opposite). The AfD wants to drastically change things like the immigration system.

It's a bit like how in the US the SCOTUS, thanks to the Trump appointed judges, effectively ruled that the President has a lot more power and is largely above the law (more so than these things already used to be the case before) but the Dems and Biden think the ruling is bad and thus refuse to do anything with that. They could also easily change the tune of the SCOTUS by appointing more judges but refuse to do so to avoid setting a precedent (as if the GOP needed it). "Centrists" often value decorum above succeeding at their stated goals.

There's an episode of Die Anstalt that plays through how the AfD could functionally abolish most of the constitution and democratic system within a single term if you want to know the specifics but it mostly comes down to abusing rules that exist because the system was created under the assumption that everyone would play fair (which is ironic given how things like the 5% hurdle are justified).

> There is a joke that claims the German anthem actually is: "Stability, stability über alles"

I'm not sure where you heard that joke or whether this is a translation of it, but the German anthem does not actually contain the words "über alles". The German national anthem consists of a single stanza of Das Lied der Deutschen, the first stanza of which begins with "Deutschland, Deutschland über alles".

BTW it's a common misunderstanding that the "über alles" ("above all") part is why only the final stanza was used by West Germany for its anthem. However that was originally meant as an appeal to German nationhood and unification "above" the monarchs although it was later adapted as an expression of national superiority. The actually problematic part comes later in the first stanza where it names rivers as boundaries - not only did that include East Germany (a separate country) but also parts that were no longer part of either of the two countries.

As for why the second stanza didn't make the cut, I guess it was just a weird one to start a national anthem with as it celebrates German women, fidelity, wine and song and their "old respected fame" which at this point probably felt anachronistic and also wasn't as strong as the third stanza's "unity and justice and freedom" (with "unity" also having a new meaning when East Germany had become a separate country). That said, personally as a German I think it's a terrible anthem, especially given the melody was originally written to celebrate the Kaiser (first of the Holy Roman Empire and later of Austria-Hungary).

Personally I would have preferred the original East German anthem Auferstanden aus Ruinen, at least textually. It's also not great but at least it's better than digging through scraps to adapt an outdated poem set to a monarchist hymn.


> even though BAMF, Schufa and every law firm and government agency remotely interested in you can find out everything about you if they want to fine you for something.

This isn't true, IMO. German administrative offices in general don't talk to each other without your permission, with exceptions like the police. This is also the reason why it's e.g. a giant pain to change your name in Germany - everyone has their own independent database. Can you maybe clarify, please?

BAMF has no data on German citizens ("Bundesamt für Migration und Flüchtlinge", the Federal Office for Migration and Refugees).

Schufa is a private company which only gets data from other companies, not the government. If you don't allow a company to give your data to the Schufa (which, to be fair, you have to do for many things), they cannot legally get the data (and in this case you could force them to delete it via GDPR).

German administrative authorities don't even have compatible databases. Like, if you move from Munich to Berlin, they have to basically enter your data manually. The software of the local municipalities (Einwohnermeldeämter) has no common API, and up until a few months ago, there wasn't even a unique ID for every citizen which could be used as a key in databases.

Law firms only can get some data if a court allows it.

If you get e.g. a ticket for speeding, and you didn't drive your car, but your spouse did and you don't tell them who the person in the driver seat on the picture is, there is nothing they can do (except forcing you to, from now on, keep a log book of who drives your car). They can't just call your local municipality and get the ID pictures of your spouse or something.


> but many things tend to work here even if the computers stop working.

Sure, they work. At the pace of 1980s business. The U.S. has 4x the population and 6x the GDP of Germany.


The amount of military power that is used to support that GDP is much larger, though. As long as oil is traded in dollars, the US will not have a normal economy.


Full title:

Siblings miss crucial life-extending treatment at Seattle Children’s because of CrowdStrike outage


There’s are lots of reports like this. In Boston, Mass General and Brigham both shut down normal operations from what I heard.


Does it really make sense to blame just CrowdStrike for this?

They were one link in what appears to be a pretty fragile dependency graph.

For example, wouldn't it possibly make sense to also blame:

* Regulators / insurers / etc. who require passing the audits that mandate using services like this.

* System designers who failed to implement disaster recovery plans for this scenario.

* Auditors who failed to highlight this risk.

* Device vendors who made medical equipment susceptible to this kind of DoS.

* U.S. FDA / DEA who allowed and/or mandated systems with this kind of vulnerability.

* Voters (in democracies) who ultimately bear responsibility for their government's actions/inactions.

Etc.?


There's lot of blame to pass around, and a lot of systems to reconsider, but at least initially, the blame lies with people who had a kill switch to critical infrastructure in multiple countries, were fully aware of that fact, and yet were so careless they accidentally pulled it.


I don't exactly care who is blamed for this in the chain of stupidity, but it must happen. This corrosive attitude of "oops software problems nothing we can do" must end fast.


The more you spread the blame, the less likely it is anything will change.


While I appreciate the effect this kind of downtime can have, I just don't understand these stories.

Presumably it was planned in advance, so the patients know the time of their appointment and the doctor knows what was planned, and everything necessary to physically perform the treatment is already prepared at the hospital. What's stopping them from doing it without filling it into a digital system? Why is it impossible to make a paper record and fill it into the computer system later?

If somebody was literally dying, would they stand around the computer like confused characters in a The Sims game who can't find the door, instead of saving the life? And if not, why is this less urgent case different?


A nurse was unable to give my wife medication while in labor because the barcode on the bag of drugs wouldn’t scan. Fortunately we just had to wait another 20 minutes to get a new bag from the pharmacy but I can easily imagine a world where doctors are unable to perform procedures they are physically capable of doing because of liability surrounding not using the computer systems as intended. Epic particularly has really done a number on the healthcare system.


I really, sincerely don't understand that. How does an unscannable barcode prevent a doctor/nurse from administering medicine they are holding in their hands?


The other commenter already said it: liability. What if the scan is part of a procedure that ensures the right drug is given to the right patient? Giving someone the wrong drug or even the wrong dose can cause serious harm. Imagine they kill someone that way and then during the investigation it turns out that they didn't scan the meds. It doesn't matter why they didn't scan it (lazy, forgetful, computer problem); it is an enormous legal risk for every party involved. Thousands of people die each year because of medical errors, so trying to prevent doctors from killing people by using strict procedures is very important. Even if it means that in extreme situations like this the procedure can cause harm as well. Overall it will save many, many more people than it will kill.


What if it turns out they harmed the patient by insisting on following the standard procedure during a worldwide outage? Isn't that the same kind of liability risk, and is the regulation really going to protect them in this case? If so, isn't that a hugely problematic regulation?


In that case the nurse or doctor has a strong defense of "I was following policy" for their insurance and boss.

The people writing hospital policies or regulations aren't thinking about individual patient outcomes unless some notable news story came out recently, and even then it's maybe the third or fourth priority on a list a hundred items long.


We don't know that this is what actually happened in OP's case. I was referring to the comment you replied to, and there it is pretty obvious that the regulation exists to prevent harm from being done. But even if there is a clear justification, you would expose yourself to a lawsuit and need to argue all this in court. I can totally understand why people don't want that, especially in the US. So if anything, you should blame the legal system.


I doubt refusing to treat patients when the computer is down insulates you from malpractice claims.


Yes, and yes. Welcome to our messed up society.


It probably also automates the chart entry and billing to insurance/patient, at least to an extent. I wouldn’t rest sole responsibility for this system on legal compliance or risk mitigation. Under normal circumstances, there’s also an efficiency improvement. The problem arises when there either is no workaround when the system doesn’t work, or workers aren’t trained well enough to know how to do things manually (or don’t have enough time under the less efficient mode of operating).


It doesn't. We do this all the time in rapid responses and cardiac arrest scenarios, when we can't wait for an order in the EHR; someone keeps track of the medications, doses and rough times of administration, and it's entered into the EHR later.


Because the law doesn't want to do its job anymore; it created useless bureaucracy to make its own life easier and human life hell.


An awful lot of apparently useless bureaucracy exists because many people, left to themselves, are often very, very stupid.

Bureaucracy certainly stops smart people from doing the right thing, but more often, it stops stupid people from doing the wrong thing. Hack away at bureaucracy at your peril.


It also stops smart people from doing stupid things.


Well said.


If the barcode wouldn't scan there, it might also not have scanned correctly when that bag was being filled, which could have led to it being filled incorrectly.


Because they’re accepting the liability of it going wrong if they make an unusual choice to disregard the error


I think a good remedy would be to completely remove "normal procedure" as a defense against liability. Our legal standard should defend people who break protocols when they know following them would result in harm, and prosecute people who don't, or prosecute the people who made the protocols in those cases. Law should supersede corporate policy, not treat it as a form of law.


So the problem really isn't CrowdStrike or any computer at all, but dumb policy or regulation?


That's not how liability works. There is no "I followed some written procedure when it didn't make sense to do so" defense to malpractice claims.


> What's stopping them from doing it without filling it into a digital system? Why is it impossible to make a paper record and fill it into the computer system later?

It’s not filling in new data that’s the problem - every person involved in treatment needs to be able to access the patient’s medical records to check for contraindications. Allergies and drug interactions are a quick way to kill someone when injecting drugs directly into their veins even if they’re already in a hospital.

At a major hospital there are too many patients coming through and the data changes too frequently to keep paper backups.
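One mitigation that exists in this space (details vary by deployment, so treat this as a sketch rather than how any particular EHR does it) is to keep a periodically refreshed, read-only snapshot of the safety-critical subset of the record (allergies, active meds) on machines outside the normal fleet, so it can still be consulted when the central system is unreachable. A minimal Python sketch of the idea, with hypothetical table, field, and path names, and sqlite3 standing in for whatever reporting database really sits behind the EHR:

    # Periodic read-only export of the safety-critical subset of an EHR
    # (allergies, active meds) to local disk, for use when the central system
    # is down. Table, field, and path names are hypothetical.
    import json
    import os
    import sqlite3
    import time

    SNAPSHOT_PATH = "/var/downtime/patient_safety_snapshot.json"   # illustrative path

    def export_snapshot(ehr_conn: sqlite3.Connection) -> None:
        rows = ehr_conn.execute(
            "SELECT patient_id, allergies, active_meds FROM patient_safety_view"
        ).fetchall()
        snapshot = {
            "generated_at": time.time(),
            "patients": {pid: {"allergies": a, "active_meds": m} for pid, a, m in rows},
        }
        # Write to a temp file and rename, so a crash mid-export never leaves
        # a torn snapshot behind.
        tmp = SNAPSHOT_PATH + ".tmp"
        with open(tmp, "w") as f:
            json.dump(snapshot, f)
        os.replace(tmp, SNAPSHOT_PATH)

    def lookup(patient_id: str) -> dict:
        """Read the last good snapshot; works even if the EHR itself is down."""
        with open(SNAPSHOT_PATH) as f:
            return json.load(f)["patients"].get(patient_id, {})

The point is only that "too much data, changing too fast for paper" doesn't have to mean "no offline fallback at all"; the snapshot just has to be scoped to what's needed to treat someone safely.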


>Her children’s appointments were cancelled, the first they would miss in five years.

They've been going every two weeks for the last five years. I doubt they wouldn't know what to do ...


It’s a large children’s hospital with thousands of employees treating tens of thousands of kids a year, not some rural family doctor with a list of patients that can fit on a single sheet of A4. They’re not going to get the same staff every time and the staff isn’t going to memorize the charts of every patient.


It’s not clear to me that this case is actually life threatening. They have a regular procedure which even in the article they say they have wiggle room for timing.

If all of your computers go down your throughput is going to go down because other kinds of organization are going to be slower to do ad hoc… so you triage.


See some of the comments from affected medical staff on the main outage story, but the tldr is, tightly coupled systems.


I can get stories like call centers, but I absolutely don't understand how life critical systems aren't air gapped and rigidly controlled.

Fail safe is the only acceptable failure mode for any critical system. Crowdstrike failed here, but they're not the only thing that can go wrong with computers. Where is the redundancy?


Life-critical systems are air-gapped. Just no one considered systems running Epic to be life-critical. It turns out they are, probably more so than most.

Also, air-gapping helps only so much when network dies and hospitals can't exchange patient information or send images from MRIs and X-rays to radiologists.


>and hospitals can't exchange patient information or send images from MRIs and X-rays to radiologists

My dentist literally took a photo of my x-ray with his phone and sent it to my orthodontist via WhatsApp, and everything went quick and smooth, much faster than the official channels. Solutions to get a job done quickly and efficiently in case of emergency always exist; they're just not "by the book".


Imagine a news story about a dentist that violated HIPAA (or equivalent) laws because they used WhatsApp / Facebook to share medical records. Will this news story be about a hero, or about someone who got into trouble?


How would he get in trouble?

HIPAA doesn't apply in Europe, but GDPR does, and I don't see how that would be in violation, since my information was exchanged only between the two parties with my consent, on an encrypted channel.

They would only get into trouble if that info leaked in an identifiable way to unauthorized third parties and caused damages (here there are no punitive damages like in the US). And people here tend to guard their WhatsApp chats pretty well, since it's what everyone uses and it also contains their private chats, so in a sense it can even be more secure than the official medical channels, which are just more bureaucratic but offer no actual guarantee of more data security.


> my information was exchanged only between the two parties with my consent, on an encrypted channel

Say WhatsApp is found to have a security hole that has been leaking data to third parties. What may be the fate of dentists / doctors that decided to use it as an "encrypted channel" for medical records? Are doctors / dentists not fat targets for lawsuits? What might the guidance be from their lawsuit insurance policy?


Lots of encryption software that’s been used in the past was found to be deficient. I can’t recall any entity that used it while being unaware of its deficiency being held responsible.


I get that, my point is, why is it absolutely necessary to use the computer system? Why don't they just knock on the door, go grab the medicine and tools, apply it, then fill it into the system later?

I understand they would just postpone whatever can be postponed to save the headache, I don't get the stories about life/health threatening situations.


Have you ever worked a job that requires a high degree of physical-world logistics? When the primary coordination mechanism is down, any action becomes much slower to implement, and often at a direct cost to implementing other actions.

With regard to this case, I don't know any specifics, but I can imagine tools requiring digital calibration, inventories not tracked outside digital systems, certain meds behind digital access control, and emergency response strained to the point where complicated non-emergency procedures would be more risk than benefit.


I have managed IT departments that managed hundreds of locations and thousands of computers running Windows XP and Windows Server 2003, no cloud at all. And I went through several similar outages (similar in impact on our operations, not cause or impact on others). Our first priority was to get the critical computers that operated machinery running - we did that hours (1-2) after the problem started. Then we played around with the servers and network for few weeks - but critical stuff was operable, albeit with lesser capacity and efficiency.

And we were managing forests and waterways, not hospitals and human lives.


That's all fine, but this time, no one could get those computers back up in the first few hours, since they were stuck in a boot loop. Plus, systems like hospitals had to be running all that time. Plus, at the scale this outage is reported to be - banks, stores, factories, phones, emergency services, CNC machines, networking, aircon - I imagine everyone was confused and trying to figure out if anything works.

I'm happy nothing significant was hit over here in Poland; reading the main HN thread on the outage feels like reading war reports.


If it's stuck in a boot loop, the first thing I do is call the local admins and tell them to take a fresh SSD and a Windows installation USB drive with them. Plug in the new SSD, reinstall the OS and copy the files from the old one. Computer running in less than an hour.

That's literally what we did to restart our forest logging machinery. Are human lives less critical than that?


You might consider that things have changed in the past 20 years. Also that medicine operates differently than forest logging.


Things haven't changed in IT so much. I am not in ICT management anymore, but I write software for the modern enterprise systems and networks - I'm reasonably up to date.

As for medicine - hence my question. I'd really like to know what the blocker is. So far it seems the blocker is bad IT management, regulation and liability, not the impossibility of performing the treatment.


Your answers indicate that you have not worked in an environment heavily dependent on ever-shifting physical-world logistics. You might try talking to some coordinators on the ground of a hospital, rescue center, construction site, theme park, or military operation for insight.


I talked to people in charge of the operations on a daily basis for years. I really don't think these considerations have changed that much since my times of leadership of an entire department managing just that.


Reminds me of a ticket I once worked on. No emergency or anything, I just needed to crank out a few pages of COBOL, and it took a little while to type it in. The boss of the department wanting the ticket done came by and asked what the holdup was. "I'm working on it, will be ready in a bit." The boss asked, "What, don't you just need to press a button or something?" She too "led a department." hahaha


I can imagine that for something like this procedure, which is an infusion of medication into the brain it sounds like?, that the "tools" to perform the procedure themselves are computer based or computer dependent. It might not be as simple as injecting a drug into an IV line.

Note that I am not a doctor and have absolutely no specific knowledge beyond what is in the original article, but I am guessing at potential explanations.

Additionally, the article states that there is some "wiffle [sic] room" around the timing of the infusions. So it may be that the delay is not quite as serious as the title makes it sound.


Presumably they would fix these computers first thing during the night from a backup? If not, is this really about CrowdStrike, and not about a hospital unable to keep their absolutely critical computers backed up and restored in a timely manner?

Again, I understand that restoring a complex net of servers is hard and takes time. But they surely have local hospital IT admins for these absolutely critical computers who are always available on site and can do it individually - it's not like there will be more than a hundred of these at a particular hospital? Hack it a little if you have to, disable the SSO etc - all that can be fixed later.


The unfortunate fact of the matter is that centralizing IT systems around large corporate products, including the on-prem software and any cloud services, necessarily means less local control of what can go wrong and how it can be mitigated, and thus often problems that simply can't be fixed, even by competent on-prem staff. Even when it is possible, it's often highly illegal, and most organizations do a lot to beat risk-aversion into everyone on their staff, and of course I mean aversion to risk of breaking rules or protocols, not risk like "someone dying"

I think it's always a mistake to outsource control of a mission-critical system, but that is exactly what large tech companies have been encouraging every organization that will listen to them to do for decades now


I have trouble accepting that. Even if they had to unplug the computer from the network and disable SSO and antivirus in safe mode, it's possible to get the computer operational. Even if they had to reinstall the OS and the critical software from scratch. There are solutions, the question is - did they even try? If not, why? And is CrowdStrike really to blame if they didn't? I just don't think so.


Who in the org do you expect to have that competency, and do you think hospitals aren't keeping crucial things like credentials or software that gates access to things in the cloud when literally everyone in the world is encouraged to at every turn?

The culture of organizational IT is broken because a lot of powerful companies found it profitable to break it and leave something inadequate in its place


I agree with this sentiment. If you ask me, the entities that come out looking the worst from this Crowdstrike debacle are the companies that bought their service. Crowdstrike made a poorly designed and maintained product. I heard multiple people on reddit say it's the best of that type of product, but what the hell? Why does it need kernel-level control?

Why did we get here? If you're installing kernel-level software you might as well run a kiosk that only runs presigned code and runs off a read-only system image. And a lot of the machines in question DO APPEAR to be kiosk settings (like hospital data entry terminals).

It's easy to sit back and armchair, I'm sure there will be many cybersecurity experts who would figuratively jump at my throat for suggesting that trusting a vendor to run a rootkit on your computers is a bit incompetent. LOL. :D


Everyone installing Crowdstrike seems like they want to build locked-down kiosks but haven't heard of Windows Embedded yet. Or at least I'm assuming there's an Embedded configuration that lets you do AMFI[0]-tier code signing enforcement.

[0] AppleMobileFileIntegrity, the daemon and kext on iOS that enforces very strict code signing.
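Not how Windows Embedded, WDAC, or AMFI is actually configured (those enforce signed policies in the OS itself), but as a toy Python illustration of the allowlist idea a locked-down kiosk launcher embodies: refuse to start anything whose hash isn't on a pre-approved list.

    # Toy hash-allowlist check a kiosk launcher might do before executing a
    # binary. Real platforms enforce this in the kernel with signed policies;
    # this is only meant to illustrate the concept.
    import hashlib
    import subprocess
    import sys

    APPROVED_HASHES = {
        # populated at image-build time from known-good binaries (illustrative value)
        "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    }

    def sha256_of(path: str) -> str:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(1 << 20), b""):
                h.update(block)
        return h.hexdigest()

    def launch(path: str, *args: str) -> int:
        if sha256_of(path) not in APPROVED_HASHES:
            raise PermissionError(f"{path} is not on the allowlist; refusing to run it")
        return subprocess.call([path, *args])

    if __name__ == "__main__":
        sys.exit(launch(sys.argv[1], *sys.argv[2:]))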


At this point I just assume any "cybersecurity expert" that defends Microsoft's nonsense is a cop


I expect the local admins to be able to install a fresh OS not connected to the enterprise network. And I expect them to have physical copies of stuff like disk encryption keys, also backups of OS installations and images, and all critical software. If they don't have that or can't use it during an outage, the problem is incompetent IT management that has no business running a hospital, not CrowdStrike. Something else would take them out sooner or later.

Again, we had all of this for a forest logging operation - is it too much to expect at a hospital?


I agree with you, and kind of even agree that CrowdStrike may not directly be at fault. But my point is that this competency is bled out of hospitals by two external forces. First, distant administration from companies that buy and manage multiple hospitals, often applying the same "efficiency" mindset that strip-mines other industries in the name of profit. Second, the cloud-tech sector (Google, Amazon, and Microsoft in particular) is very aggressive about selling its services along with demands that everything be handed over to its platforms, which often involves purging technicians who want on-site redundancy. This makes the systems more brittle, but it also often gets the people with the competency you're advocating fired.


Absolutely. The risk being managed is the risk to the CEO/CTO's jobs, not the risk to life.


Hospital IT sucks. Look at a news report about ransomware or this, and it can easily be a few weeks for them to get back in shape. This one is hopefully easier because reportedly CrowdStrike can sometimes pull an update before it BSODs, and most Windows machines auto-restart on BSOD, so just leaving things unattended may be enough.

Restore from backup or reimaging fresh often means you need a working backup or image server, which at a lot of these places is also a Windows server and is likely also running the same endpoint protection, and is likely also boot looping.

Restore from zero isn't something any IT wants to do, and many of them aren't prepared to do it either.

Like it or not, hospital care revolves around the electronic medical records systems, and while Kaiser Southern California in the 90s was using amber screens and some sort of mainframe, afaik, almost everyone is on EPIC now, which is a windows application with all the baggage that contains. Even before EPIC took over Kaiser, they were running terminal emulators on Windows.

IMHO, it would be better for them to put together a ground up desktop distribution with exactly what they need, but that has user training costs and development costs.


From having seen the infusion process myself, I take it that it requires precision measurements over an extended period of time. That seems like an unreasonable thing to require staff to do by hand.

Again, from what I've seen, infusions are not just "throw it in an IV bag and wait".


If it requires a computer, why was that operationally critical computer not restored from a backup within hours after the problem started? This has nothing to do with CrowdStrike or other bugs - it could've simply failed hardware wise and the hospital should have been able to replace it immediately.


You have a naive view of how modern operations work, I must say. This shows when you suggest endpoints have backups. We're back to the mainframe/terminal times where all software is running on a web server or other centralized application server, which is also in a boot loop, somewhere else.

Failed hardware is different, but hospitals likely have very few computers just 'lying around'. Especially the highly regulated machines, such as those which are attached to MRIs and the like.

CFR 21 Part 11 was the bane of my existence. Software that can be installed and configured in a matter of minutes? That's a six month project, at least. Sure, backups are great, but then you've got a significant process to get it back up and running.

These aren't early-2000s logging operations.

I see you'll never be convinced, but this is how modern operations work. Being a hospital (or other industry with heavy government regulations) make operations that much worse.


You misunderstood me, I am easily convinced that this is the case - what I don't get is how they could let it be the case.


Very few companies, for-profit or otherwise, keep gobs of machinery on hand "just in case". It's expensive, not only the machinery, but the space to store it, maintain it while not in use, replace it when it ages out, and so on. It's also exceedingly rare to need it.

Hospitals also have limited resources in terms of IT staff. It's not an Azure army of operations staff that can rush out to every endpoint and click buttons.

When I was in helpdesk eons ago, I was "responsible" for roughly 300 - 400 endpoints, plus a handful of servers. As were all of the other helldesk techs. If something like this happened, there's simply not enough hands to go around as fast as everyone would like.


What I meant when I said reinstall the PCs was to reinstall the critical computers necessary for operation of medical machinery to make basic and still mostly manual/paper based operation possible, not every computer they have there. I really don't think they have hundreds of computers necessary for operations of MRIs and other machines.


Modern operations doesn't go around reinstalling things on machines like that.

And due to CFR 21 Part 11, what you ask would be a non-starter.


Hotels have difficulty with paper and pen bookings when their computers are down. You expect a modern hospital to function in those circumstances?


the hospital better function.

what you're saying is, if the less important service fails, of course the more important one will fail too.


Yes


It's because these computers are a means of corporate control. Policies and checks and procedures and whatever are all delivered through them.

It's preferable, from the corporate perspective, to have everything fail temporarily than to relinquish this level of workforce management.

If this is hard to imagine, just think of a Lyft driver from the perspective of Lyft Inc.


The truth is that many medical personnel are not agentic. They are human robots unable to act unless instructed to by a computer. The computer tells them when they can do something and they do it.


*because of CrowdStrike and Microsoft


If some third party software you chose to install on your system added a kernel module and started causing kernel panics, would you blame the kernel maintainers?

I'm sure if MS decides to remove the ability for third parties to write code that runs in kernel mode in the name of security/reliability/whatever, this site would immediately turn on a dime and say that Microsoft is evil for removing user control over their machines.


"Well since it's not open source and they can't audit the code line by line (like I always do), they shouldn't use it so it still the user's fault." Probably. Tech nerds tend to be hilariously out of touch with big picture stuff beyond their basement lab.


I’m not even sure Microsoft could actually restrict 3rd party code running in kernel mode like that from a legal perspective. There are a certain requirements about documenting interfaces in Windows from the 90s antitrust stuff.


Security should be baked in. Between this incident and the one where Azure certs were stolen and abused, it's clear security isn't an upcharge, it is a necessity.


According to some comments in yesterday's thread, Debian was also hit with something similar back in April.

Another comment in this thread quotes Crowdstrike's ToS which states that their software should not be used on critical systems.

I blame the hospital for its inability to operate with pen and paper in the event of a computer crash or a power outage.


It's pretty difficult to operate a CT scanner with pen and paper, to name just one thing that fell over yesterday. CT scanners are life-critical.


Why does any machine related to the operation of a CT scanner itself have to be connected to the internet? The problem is not "using technology" in general. The problem is internet connectivity not being correctly identified as a liability when designing our technology infrastructure.


From what I've seen when I've had CTs (I am not a medical professional and have no direct ties to their industry), the machine sends the images in real time to a technician in another room. Those images are then sent to an offsite service for review by a radiologist, then returned to the doctor to give you the results and they're uploaded to Epic where I can review them online at my leisure.

It's all on a network for a reason. If it's on a network, it has to comply with all regulations that govern the service.


Data transfer can happen after data acquisition, through another machine that can be connected. If that machine is compromised, alternative channels can be found. There is no fundamental reason in how the systems work that anything before that step has to be connected online. Having things online is actually a liability, as evidenced by the mere fact that software like CrowdStrike has to be installed with kernel access on these machines. Why, instead of going down that path, do we not just segregate networks to make them more resilient? The only reason I see is all these -aaS business models, nothing that really relates to the real needs of people or the healthcare system itself. After the ransomware attacks, instead of reducing the attack surface, the direction was to actually increase liability and risk by adding another point where things can fail. I do not think anybody really learned anything from that, imo.


Coulda woulda shoulda, of course. Money. That's why.


Siblings miss critical life-extending treatment because the hospital IT department didn’t architect their endpoint update strategy correctly. This should have rolled out in small, incremental steps to verify no failures. A mass “select * from hosts” global update with no testing (even if the vendor says it’s good) is entirely foolish.

Hopefully folks learn from this.


Maybe try reading what happened during the CrowdStrike fiasco and then comment about it. CrowdStrike auto-updates itself; the agent allows you to select the update cadence, but this was an update to the "channel" files, which auto-update themselves a few times per day and over which you don't have any control.


Doesn’t matter. You had control over instituting it in your environment. The onus is on you.


No you didn't. External audits and compliance forced you to install an EDR, said EDR had no option to defer this kind of update. Crowdstrike sold itself to the suits as said EDR. What do you do, ignore both your management and the security checklist given by the government?


I've personally trialed Crowdstrike for my company (around 30 people) and found it a buggy mess, especially on non-Windows platforms - after the trial was over we decided not to use it. However, this is not an option for most companies; feds, auditors or both will force you to use this BS and there's absolutely nothing you can do.


In the 20 years I've been in IT, in a variety of industries, with all the legacy manufacturing systems I've dealt with, I've never seen a software patch that blue screens all computers. Firmware, yes, rare, but never software. This is way outside of the norm.


Anything operating that low in the stack is going to be a risk if shit goes wrong. It's essentially an untrusted third-party kernel update. Any sysadmin worth a dime knows not to blindly upgrade something like a kernel without staging it first - this is no different.

But honestly, Windows admins don't understand the systems they're maintaining... so this was inevitable.


You have no idea what you're yapping about. Bye.


Do you?


CrowdStrike apparently doesn't believe in doing that. There is a way for firms to set that, but this issue bypassed (ignored) that protection.



