They're probably deployed to a virtualized system to make maintenance and upkeep easier.
Updates are partially necessary to ensure you don't end up completely unsupported in the future.
It's been a long time, but I worked IT for an auto supplier. Literally nothing was worse than some old computer crapping out with an old version of Windows and a proprietary driver. Mind you, these weren't mission critical systems, but they did disrupt people's workflows while we were fixing them. Think things like digital measurements or barcode scanners. Everything can be done by hand, but it's a massive pain.
Most of these systems end up migrated to a local data center and then deployed via a thin client. Far easier to maintain and fix than some box that's been sitting in the corner of a shop collecting dust for 15 years.
The real problem is not that it's just a damn lift and shouldn't need full Windows. It's that something as theoretically solved and done as an operating system is not practically so.
An Internet of Lifts can be done with <32MB of RAM and a <500MHz single-core CPU. Instead they (whoever "they" are) put in a GLaDOS-class supercomputer for it. That's the absurdity.
You’d be surprised at how entrenched Windows is in the machine automation industry. There are entire control system algorithms implemented and run in realtime on Windows; vendors like Beckhoff and ACS only offer Windows builds of their control software, which developers extend and build on top of with Visual Studio.
Siemens is also very much in on this. Up to about the 90s most of these vendors were running stuff on proprietary software stacks running on proprietary hardware networked using proprietary networks and protocols (an example for a fully proprietary stack like this would be Teleperm). Then in the 90s everyone left their proprietary systems behind and moved to Windows NT. All of these applications are truly "Windows-native" in the sense that their architecture is directly built on all the Windows components. Pretty much impossible to port, I'd wager.
According to reports the ATMs of some banks also showed the BSOD, which surprised me; I wouldn't have thought such "embedded" devices needed any type of "third-party online updates".
So, for maintenance and fault indications. It probably saves some time over someone digging up manuals to look up error codes from wherever they may (or may not) be filed. It could also display things like height and weight.
It's easier and cheaper (and a li'l safer) to run wires to the up/down control lever and have those actuate a valve somewhere than it is to run hydraulic hoses to a lever like in lifts of old, for example.
That said, it could also be run by whatever the equivalent of a "PLC on an 8-bit microcontroller" is, and not some full embedded Windows system with live online virus protection, so yeah, what the hell.
I'm having a hard time picturing a multi-story diesel repair shop. Maybe a few floors in a dense area but not so high that a lack of elevators would be show stopping. So I interpret "lift" as the machinery used to raise equipment off the ground for maintenance.
The most basic example is duty cycle monitoring and troubleshooting. You can also do things like digital lock-outs on lifts that need maintenance.
While the lift might not need a dedicated computer, they might be used in an integrated environment. You kick off the alignment or a calibration procedure from the same place that you operate the lift.
how many lifts, and how many floors, with how many people are you imagining? Yes, there's a dumb simple case where there's no need for a computer with an OS, but after the umpteenth car with umpteen floors, when would you put in a computer?
and then there's authentication. how do you want key cards which say who's allowed to use the lift to work without some sort of database which implies some sort of computer with an operating system?
It's a diesel repair shop, not an office building. I'm interpreting "lift" as a device for lifting a vehicle off the ground, not an elevator for getting people to the 12th floor.
Your understanding of Stuxnet is flawed. Iran was attacked by the US government in a very, very specific spearphishing-style attack with years of preparation to get Stux into the enrichment facilities - nothing to do with lifts connected to the network.
Also the facility was air-gapped, so it wasn't connected to ANY outside network. They had to use other means to get Stux onto those computers and then used something like 7 zero-days to move from Windows into the Siemens computers to inflict damage.
Stux got out potentially because someone brought their laptop to work, the malware got into said laptop and moved outside the airgap from a different network.
"Stux got out potentially because someone brought their laptop to work, the malware got into said laptop and moved outside the airgap from a different network."
The lesson here is that even in an air-gapped system the infrastructure should be as proprietary as possible. If, by design, domestic Windows PCs or USB thumb drives could not interface with any part of the air-gapped system because (a) the hardware was incompatible at, say, OSI layers 1, 2 & 3; and (b) the software was in every respect incompatible at the level of its APIs, then it wouldn't really matter if by some surreptitious means these commonly-used products entered the plant. Essentially, it would be almost impossible† to get the Trojan onto the plant's hardware.
That said, this requires a lot of extra work. Excluding subsystems and components that are readily available in the external/commercial world means a considerable amount of extra design overhead, which would both slow down a project's completion and substantially increase its cost.
What I'm saying is obvious, and no doubt noted by those who have similar intentions to the Iranians. I'd also suggest that individual controllers etc. such as the Siemens ones used by Iran either wouldn't be used, or they'd need to be modified from standard in both hardware and firmware (hardware mods would further strengthen protection if an infiltrator knew the firmware had been altered and found a means of restoring the factory default version).
Unfortunately, what Stuxnet has done is to provide an excellent blueprint of how to make enrichment (or any other such) plants (chemical, biological, etc.) essentially impenetrable.
† Of course, that doesn't stop or preclude an insider/spy bypassing such protections. Building in tamper resistance and detection to counter this threat would also add another layer of cost and increase the time needed to get the plant up and running. That of itself could act as a deterrent, but I'd add that in war that doesn't account for much, take Bletchley and Manhattan where money was no object.
I once engineered a highly secure system that used (shielded) audio cables and a modem as the sole pathway to bridge the airgap. Obscure enough for ya?
Transmitted data was hashed on either side, and manually compared. Except for very rare binary updates, the data in/out mostly consisted of text chunks that were small enough to sanity-check by hand inside the gapped environment.
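For anyone curious what "hashed on either side and manually compared" can look like in practice, here's a minimal sketch (not the actual system; the payload and digest length are invented): both sides run the same digest over the chunk and a human compares the values across the gap.

    import hashlib

    def chunk_digest(text: str) -> str:
        # Both sides of the gap run this and compare the output by hand.
        # Truncating keeps it short enough to read out loud; the goal is
        # catching corruption, not resisting a deliberate forgery.
        return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]

    if __name__ == "__main__":
        payload = "status report 2024-07-19: all pumps nominal"
        print(chunk_digest(payload))  # read this value out on both sides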
Stux also taught other government actors what's possible with a few zero-days strung together, effectively starting the cyberwar we've been in for years.
To work with various private data, you need to be accredited and that means an audit to prove you are in compliance with whatever standard you are aspiring to. CS is part of that compliance process.
Another department in the corporation is probably accessing PII, so corporate IT installed the security software on every Windows PC. Special cases cost money to manage, so centrally managed PCs are all treated the same.
Anything that touches other systems is a risk and needs to be properly monitored and secured.
I had a lot of reservations about companies installing Crowdstrike but I'm baffled by the lack of security awareness in many comments here. So they do really seem necessary.
They optimize for small-batch development costs. Slapping a Windows PC on the product when you sell a few hundred to a thousand units is actually pretty cheap. The software itself is probably the same order of magnitude, cheaper for the UI itself...
And it's cheap both short and long term. Microsoft has 10-year lifecycles you don't need to pay extra for. With Linux you need IT staff to upgrade it every 3 years, not to mention engineers to recompile your software with every distro upgrade.
Probably a Windows-based HMI (“human-machine interface”).
I used to build sorting machines that use variants of the typical “industrial” tech stack, and the actual controllers are rarely (but not never!) Windows. But it’s common for the HMI to be a Windows box connected into the rest of the network, as well as any server.
In a lot of cases you find tangential dependencies on Windows in ways you don't expect. For example a deployment pipeline entirely linux-based deploying to linux-based systems that relies on Active Directory for authentication.
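A contrived sketch of that kind of hidden dependency, assuming the third-party ldap3 package (hostname, domain and account are made up): a "Linux-only" deploy gate that silently requires a Windows domain controller to be reachable.

    from ldap3 import Server, Connection, NTLM
    from ldap3.core.exceptions import LDAPException

    def ad_login_ok(user: str, password: str) -> bool:
        # The pipeline runs on Linux and deploys to Linux, yet this gate
        # depends entirely on Active Directory being up.
        server = Server("ldaps://ad.example.internal", use_ssl=True)
        try:
            conn = Connection(server, user=f"EXAMPLE\\{user}",
                              password=password, authentication=NTLM)
            ok = conn.bind()
            conn.unbind()
            return ok
        except LDAPException:
            return False  # domain controller down => "Linux" pipeline stalls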
I'm more confused because I have never, ever encountered a lift that wasn't just some buttons or joysticks on a controller attached to the lift. There is zero need for more computing power than an 8-bit microcontroller from the 1980s. I don't know where I would even buy such a lift with a Windows PC.
No one sells 8 bit microcontrollers from the 1980s anymore. Just because you don't need the full power of modern computing hardware and software doesn't mean you are going to pay extra for custom, less capable options.
I think the same question can be asked for why lots of equipment seemingly requires an OS. My take is that these products went through a phase of trying to differentiate themselves from competitors and so added convenience features that were easier to implement with a general purpose computer and some VB script rather than focusing on the simplest most reliable way to implement their required state machines. It's essentially convenience to the implementors at the expense of reliability of the end result.
My life went sideways when the organizations I worked for all started to make products solely for selling, not for using. If the product was useful for something, that was a side effect of it being sellable, not the goal.
Worse is Better has eaten the world. The philosophy of building things properly with careful, bespoke, minimalist designs has been totally destroyed by a race to the bottom. Grab it off the shelf, duct tape together a barely-working MVP, and ship it.
Some idiot with a college degree, in an office nowhere near the place, sees that we have these PCs here. Then they go over the compliance list and mandate that this is needed. Now go install it, and the network for it too...
Or they want to protect their Windows-operated lifts from very real and life-threatening events, like an attacker jumping from host to host until they are able to lock the lifts and put people's lives at risk or cause major inconveniences.
Not all security is done by stupid people. Crowdstrike messed up in many ways. It doesn't make the company that trusted them stupid for what they were trying to achieve.
For the same reason people want to automate their homes, or the industries run with lots of robots, etc: because it increases productivity. The repair shop could be monitoring for usage, for adequate performance of hydraulics, long-term performance statistics, some 3rd-party gets notified to fix it before it's totally unusable, etc.
I have a friend that is a car mechanic. The amount of automation he works with is fascinating.
Sure, lifts and whatnot should be in a separate network, etc, but even banks and federal agencies screw up network security routinely. Expecting top-tier security posture from repair shops is unrealistic. So yes, they will install a security agent on their Windows machines because it looks like a good idea (it really is) without having the faintest clue about all the implications. C'est la vie.
But what are you automating? It's a car lift, you need to be standing next to it to safely operate it. You can't remotely move it, it's too dangerous. Most of the things which can go wrong with a car lift require a physical inspection and for things like hydraulic pressure you can just put a dial indicator which can be inspected by the user. Heck, you can even put electronic safety interlocks without needing an internet connection.
There are lots of difficult problems when it comes to car repair, but cloud lift monitoring is not something I've ever heard anyone ask for.
The things you're describing are all salesman sales-pitch tactics, they're random shit which sound good if you're trying to sell a product, but they're all stuff nobody actually uses once they have the product.
It's like a six in one shoe horn. It has a screw driver, flash light, ruler, bottle opener, and letter opener. If you're just looking at two numbers and you see regular shoe horn £5, six in one shoe horn £10 then you might blindly think you're getting more for your money. But at the end of the day, I find it highly unlikely you'll ever use it for anything other than to put tight shoes on.
I imagine something monitors how many times the lift has gone up and down, for maintenance reasons. Maybe a nice model monitors fluid pressure in the hydraulics to watch for leaks. Perhaps a model watches strain, or balance, to prevent a catastrophic failure. Maybe those are just sensors, but if they can't report their values they shut down for safety's sake. There are all kinds of reasonable scenarios that don't rely on bad people trying to screw or cheat someone.
None of these features require internet or a windows machine, most of them do not require a computer or even a microcontroller. Strain gauges can be useful for checking for an imbalanced load, but they cannot inspect the metal for you.
In my office, when we swipe our entry cards at the security gates, a screen at the gate tells us which lift to take based on the floor we work on, and sets the lift to go to that floor. It's all connected.
Remote monitoring and maintenance. Predictive maintenance: monitor certain parameters of operation and get maintenance done before the lift stops operating.
These requirements can be met by making the lift's systems and data observable, which is a uni-directional flow of information from the lift to the outside world. Making the lift's operation modifiable from the outside world is not required to have it be observable.
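As a rough illustration of that one-way flow (collector address and fields are made up), the controller can fire-and-forget telemetry over UDP without ever listening for anything in return:

    import json
    import socket
    import time

    COLLECTOR = ("192.0.2.10", 9999)   # hypothetical monitoring host

    def publish_lift_telemetry(lift_id: str, cycles: int, pressure_bar: float) -> None:
        # UDP send only: the controller never listens on this socket, so the
        # data flow is strictly one-way (observable, not remotely controllable).
        sample = {
            "lift_id": lift_id,
            "ts": time.time(),
            "cycles": cycles,
            "hydraulic_pressure_bar": pressure_bar,
        }
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(json.dumps(sample).encode("utf-8"), COLLECTOR)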
It's a car lift. Not only would it be irresponsible to rely on a computer to tell you when you should maintain it, as some inspections can only be done visually, it seems totally pointless as most inspections need to be done manually.
Get a reminder on your calendar to do a thorough inspection once a day/week (whatever is appropriate) and train your employees what to look for every time it's used. At the end of the day, a car lift on locks is not going to fail unless there's a weakness in the metal structure, no computer is going to tell you about this unless there's a really expensive sensor network and I highly doubt any of the car lifts in question have such a sensor network.
Moreover, even if they did have such a sensor network, why are these machines able to call out to the internet?
The same reason everyone just uses a microcontroller on everything. It's like a universal glue and you can develop in the same environment you ship. Makes it easy.
Lathes probably have PCs connected to them to control them, and do CNC stuff (he did say the controllers). Laser alignment machines all have PCs connected to them these days.
The cranes and lifts though... I've never heard of them being networked or controlled by a computer. Usually it's a couple buttons connected to the motors and that's it. But maybe they have some monitoring systems in them?
Off the top of my head, based on limited experience in industrial automation:
- maintenance monitoring data shipping to centralised locations
- computer-based HMI system - there might be good old manual control, but it might require unreasonable amounts of extra work per work order
- Centralised control system - instead of using a panel specific to the lift, you might be controlling a bunch of tools from a common panel
- integration with other tools, starting from things as simple as pulling up manufacturers' service manual to check for details to doing things like automatically raising the lift to position appropriate for work order involving other (possibly also automated) tools with adjustments based on the vehicle you're lifting
Remember that CNC is a programming environment. Now how do you actually see what program is loaded? Or where the execution is at the moment? For anything beyond a few lines of text on a dot-matrix screen, an actual OS starts to become desirable.
And all things considered, Windows is not that bad an option. Anything else would also have issues. And really, what is your other option - some outdated, unmaintained Android? Does your hardware vendor offer long-term support for Linux?
Windows actually offers extremely good long term support quite often.
> And all things considered, Windows is not that bad option
I'm gonna go out on a limb and say that it actually is. It's a closed source OS which includes way more functionality than you need. A purpose-built RTOS running on a microcontroller is going to provide more reliability, and if you don't hook it up to the internet it will be more secure, too. Of course, if you want you can still hook it up to the internet, but at least you're making the conscious decision to do so at that point.
Displaying something on a screen isn't very hard in an embedded environment either.
I have an open source printer which has a display, and runs on an STM32. It runs reliably, does its job well, and doesn't whine about updates or install things behind my back because it physically can't, it has no access to the internet (though I could connect it if I desired). A CNC machine is more complex and has more safety considerations, but is still in a similar class of product.
> Does your hardware vendor offer long term support for Linux?
This seems muddled. If the CNC manufacturer puts Linux on an embedded device to operate the CNC, they're the hardware manufacturer and it's up to them to pick a chip that's likely to work with future Linuxes if they want to be able to update it in the future. Are you asking if the chip manufacturer offers long-term-support for Linux? It's usually the other way around, whether Linux will support the chip. And the answer, generally, is "yes, Linux works on your chip. Oh you're going to use another chip? yes, Linux works on that too". This is not really something to worry about. Unless you're making very strange, esoteric choices, Linux runs on everything.
But that still seems muddled. Long-term support? How long are we talking? Putting an old Linux kernel on an embedded device and just never updating it once it's in the field is totally viable. The Linux kernel itself is extremely backwards compatible, and it's often irrelevant which version you're using in an embedded device. The "firmware upgrades" they're likely to want to do would be in the userspace code anyhow - whatever code is showing data on a display or running a web server you can upload files to or however it works. Any kernel made in the last decade is going to be just fine.
We're not talking about installing Ubuntu and worrying about unsolicited Snap updates. Embedded stuff like this needs a kernel with drivers that can talk to required peripherals (often over protocols that haven't changed in decades), and that can kick off userspace code to provide a UI either on a screen or a web interface. It's just not that demanding.
As such, people get away with putting FreeRTOS on a microcontroller, and that can show a GUI on a screen or a web interface too, you often don't need a "full" OS at all. A full OS can be a liability, since it's difficult to get real-time behaviour which presumably matters for something like a CNC. You either run a real-time OS, or a regular OS (from which the GUI stuff is easier) which offloads work to additional microcontrollers that do the real-time stuff.
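To make the "regular OS plus real-time microcontroller" split concrete, here's a rough host-side sketch in the style of GRBL/Marlin-type setups, assuming the third-party pyserial package and a made-up device path; the MCU firmware handles the hard real-time motion control, while the PC just streams commands and shows status:

    import serial  # third-party pyserial

    def stream_program(port: str, lines: list[str]) -> None:
        # The non-real-time host streams one command at a time and waits for
        # the microcontroller's acknowledgement before sending the next.
        with serial.Serial(port, 115200, timeout=2) as mcu:
            for line in lines:
                mcu.write((line.strip() + "\n").encode("ascii"))
                ack = mcu.readline().decode("ascii", errors="replace").strip()
                if not ack.startswith("ok"):
                    raise RuntimeError(f"controller rejected {line!r}: {ack}")

    # stream_program("/dev/ttyUSB0", ["G28", "G1 X10 Y10 F600"])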
I did not expect Windows to be running on CNCs. I didn't expect it to be running on supermarket checkouts. The existence of this entire class of things pointlessly running self-updating, internet-connected Windows confuses me. I can only assume that there are industries where people think "computer equals Windows" and there just isn't the experience present, for whatever reason, to know that whacking a random Linux kernel on an embedded computer and calling it a day is way easier than whatever hoops you have to jump through to make a desktop OS, let alone Windows, work sensibly in that environment.
5-10 years is not an unreasonable support expectation, I think.
And if you are someone manufacturing physical equipment, be it a CNC machine or a vehicle lift, hiring an entire team to keep Linux patched and make your own releases seems pretty unreasonable and a waste of resources. In the end, nothing you choose is error-free. And the box running the software is not the main product.
This is actually a huge challenge: finding a vendor that can deliver a box to run your software on, with promised long-term support, when that support is actually more than just a few years.
Also, I don't understand how it is any more acceptable to run unpatched Linux in a networked environment than it is Windows. These are very often not stand-alone things, but connected to at least a local network if not larger networks, with possible internet connections too. So not patching vulnerabilities is as unacceptable as it would be with Windows.
With CNC there is a place for something like a Windows OS. You have a separate embedded system running the tools, but you still want a different piece managing the "programs", as you could have dozens or hundreds of these. And at that point reading them over the network starts to make sense once again. The time of dealing with floppies is over...
And with checkouts, you want more UI than just buttons, and Windows CE has been a reasonably effective tool for that.
Linux is nice on servers, but on the embedded side keeping it secure and up to date is often a massive amount of pain. Windows does offer excellent stability and long-term support, and you can simply buy a computer with sufficient support from MS. One could ask: why don't massive companies run their own Linux distributions?
> 5-10 years is not unreasonable expected support I think.
A couple of years ago, I helped a small business with an embroidery machine that runs Windows 98. Its physical computer died, and the owner could not find the spare parts. Fortunately, it used a parallel port to control the embroidery hardware, so it was easy to move to a VM with a USB parallel port adapter.
That was very lucky, then. USB parallel port adapters are only intended to work with printers; they fail with any hardware that does custom signalling over the parallel port.
Maybe you want your lift to be able to diagnose itself and point to likely faults, instead of spending man-hours troubleshooting every part each time, downtime included. With big lifts there are many parts that could go wrong. Being able to identify which one saves a lot of time, and time is money.
These sorts of outages are actually extremely rare nowadays. Considering how long these control systems have been kept around, they must not actually be causing enough issues to make replacing them worth it.
you log into the machine, download files, load files onto the program. that doesn't need a desktop environment? you want to reimplement half of one, poorly, because that would have avoided this stupid mistake, in exchange for half a dozen potential others, and a worse customer experience?
> you log into the machine, download files, load files onto the program. that doesn't need a desktop environment?
Believe it or not, it doesn't! An embedded device with a form of flash storage and an internet connection to a (hopefully) LAN-only server can do the same thing.
> you want to reimplement half of one, poorly
Who says I would do it poorly? ;)
> and a worse customer experience?
Why would a purpose-built system be a worse customer experience than _windows_? Are you really going to set the bar that low?
Or lathe, or cranes, or alarms, or hvac... what the actual fuck.
The next move should be some artisanal, as-mechanical-as-possible quality products, or at least Linux(TM)-certified products or similar (or Windows-free(TM)). The opportunity is here, everybody noticed this clusterfuck, and smart folks don't like ignoring threats that are right in their face.
But I suppose in 2 weeks some other bombastic news will roll over this and most will forget. But there is always some hope.
I feel like this is the fake reason given to try to hide the obvious reason: automatic updates are a power move that allows companies to retain control of products they've sold.
Yep. And even aside from security, it's a nightmare needing to maintain multiple versions of a product. "Oh, our software is crashing? What version do you have? Oh, 4.5. Well, update 4.7 from 2 years ago may fix your problem, but we've also released major versions 5 and 6 since then - no, I'm not trying to upsell you ma'am. We'll pull up the code from that version and see if we can figure out the problem."
Having evergreen software that just keeps itself up to date is marvellous. The Google Docs team only needs to care about the current version of their software. There are no documents saved with an old version. There's no need to backport fixes to old versions, and no QA teams that need to test backported security updates on 10 year old hardware.
It's just a shame about, y'know, the aptly named CrowdStrike.
Fine. But Google can mass-migrate all of them to a new format any time they want. They don’t have the situation you used to have with Word, where you needed to remember to Save As Word 2001 format or whatever so you could open the file on another computer. (And if you forgot, the file was unreadable). It was a huge pain.
Yes, it is better than the Word situation, but no, it isn't "not caring". There do exist old-format docs, and Google does have to care - to make that migration.
Yes, they have to migrate once. But they don't need to maintain 8 different versions of Word going back a decade, make sure all security patches get backported (without breaking anything along the way), and make sure all of them are in some way cross-compatible despite having differing feature sets.
If Google makes a new storage format they have to migrate old Google docs. But that's a one-off thing. When migrations happen, documents are only ever moved from old file formats to new file formats. With Word, I need to be able to open an old document with the new version of Word, make changes, then re-save it so it's compatible with the old version of Word again. Then edit it on an old version of Word and go back and forth.
I’m sure the Google engineers are very busy. But by making Docs be evergreen software, they have a much easier problem to solve when it comes to this stuff. Nobody uses the version of Google docs from 6 months ago. You can’t. And that simplifies a lot of things.
They have to migrate each time they change the format, surely. Either that or maintain converters going back decades, to apply the right one when a document is opened.
> but they don’t need to maintain 8 different versions of Word going back a decade, make sure all security patches get back ported
Nor does Microsoft for Word.
> With word, I need to be able to open an old document with the new version of word, make changes then re-save it so it’s compatible with the old version of word again.
You don't have to, unless you want the benefit of that.
And Google Docs offers the same.
> Nobody uses the version of Google docs from 6 months ago. You can’t. And that simplifies a lot of things.
Well, I'd love to use the version of Gmail web from 6 months ago. Because three months ago Google broke email address input such that it no longer accesses the contacts list and I have to type/paste each address in full.
That's a price we pay for things being "simpler" for a software provider that can and does change the software I am using without telling me, let alone giving me the choice.
Not to mention the change that took away a large chunk of my working screen space for an advert telling me to switch to the app version, despite my having the latest version of Google's own Chrome. An advert I cannot remove despite having got the message 1000 times. Pure extortion. Simplification is no excuse.
It used to be the original reason why automatic updates were accepted and it was valid.
But since then it has been abused for all sorts of things that really are nothing more than consolidation of power, including an entire shift in the mentality of what "ownership" even means: tech companies today seem to think it's standard that they keep effective ownership of a product for its entire life cycle, no matter how much money a customer has paid for it, and no matter how deeply the customer relies on that product.
(Politicians mostly seem fine with that development or even encourage it)
I agree that an average nontechnical person can't be expected to keep track of all the security patches manually to keep their devices secure.
What I would expect would be an easy way to opt-out of automatic updates if you know what you're doing. The fact that many companies go to absurd lengths to stop you from e.g. replacing the firmware or unlocking the bootloader, even if you're the owner of the device is a pretty clear sign to me they are not doing this out of a desire to protect the end-user.
Also, I'm a bit baffled that there is no vetting at all of the contents of updates. A vendor can write absolutely whatever they want into a patch for some product of theirs and arbitrarily change the behaviour of software and devices that belong to other people. As a society, we're just trusting the tech companies to do the right thing.
I think a better system would be if updates at the very least had to be vetted by an independent third party before being applied, and a device would only accept an update if it's signed by both the vendor and the third party (a rough sketch of that dual-signature check follows the list below).
The third party could then do the following things:
- run tests and check for bugs
- check for malicious and rights-infringing changes deliberately introduced by the vendor (e.g. taking away functionality that was there at time of purchase)
- publicly document the contents of an update, beyond "bug fixes and performance improvements".
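Something like this is what I have in mind for the dual-signature acceptance - a sketch only, assuming the third-party cryptography package; key distribution and file layout are invented for illustration:

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

    def update_accepted(payload: bytes,
                        vendor_sig: bytes, vendor_pub: bytes,
                        auditor_sig: bytes, auditor_pub: bytes) -> bool:
        # Both the vendor AND the independent auditor must have signed the
        # exact same payload bytes, or the device refuses the update.
        for sig, pub in ((vendor_sig, vendor_pub), (auditor_sig, auditor_pub)):
            try:
                Ed25519PublicKey.from_public_bytes(pub).verify(sig, payload)
            except InvalidSignature:
                return False
        return True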
What you're describing is what Linux distro maintainers do: Debian maintainers check the changes of different software repos, look at new options and decide if anything should be disabled in the official Debian release, and compile and upload the packages.
The problem you are complaining about here is the weakening of labor and consumer organizations vis-a-vis capital or ownership organizations. The software must be updated frequently due to our lack of skill in writing secure software. Whether all the corporations will take advantage of everything under the sun to reduce the power that the purchasers and producers of these products have is a political and legal question. If only the corporations are politically involved, then only they will have their voice heard by the legislatures.
no reason why both can't be true — the security is overall better, and companies are happy to invest in advancing this paradigm because it gives them more control
incentive can and does undermine the stated goal. what if the government decided to take control of everyone's investment portfolio to prevent the market doing bad things? or an airplane manufacturer got to take control of its own safety certification process because obviously it's in their best interest that their planes are safe? or an imposed curfew, where everyone has to be inside their homes while it's dark outside because most violent crimes occur at night?
how much lathe-ing have you done recently? did you load files onto your CNC lathe with an SD card, and thus there is a computer, which needs updates, or are you thinking of a lathe that is a motor and a rubber band, and nothing else, from, like, high school woodshop?
I bought a 3d printer years ago then let it sit collecting dust for like 2 or more years because I was intimidated by it. Finally started using it and was blown away how useful it has been to me. Then a long time later realized holy shit there are updates and upgrades one can easily do. I can add a camera and control everything and monitor everything from any online connected device. I always hated pulling out the sd card and bringing it to my computer and copying it over and back to the printer and so on. Being online makes things so much easier and faster. I have been rocking my basic printer for a few years now and have not paid much attention to the scene and then started seeing these multi color prints holy shit am I slow and behind the times. The newer printers are pretty rad but I will give props to my Anycubic Mega it has been a work horse and I have had very little problems. I don't want it to die on me but a newer printer would be cool also.
There are immense benefits to using modern computing power, including both onboard and remote functionality. The cost of increased software security vulnerability is easily justified.
1. Nobody auto updates my linux machines. They have no malware.
2. It's my job to change the oil in my car. When Ford starts sending a tech to my house to tamper with my machines "because they need maintenance" will be the day I am no longer a Ford customer.
The irony of this comment is almost perfected by the fact that Ford was one of the leading companies in bringing ECUs (one of the myriad computer systems essential to modern vehicles that can and do receive regular updates) to market in, checks notes, 1975.
Those Linux systems that aren't getting updates must be the ones sending Mirai to my Linux systems, which are getting updates (and also Mirai, although it won't run because it's the wrong architecture).
No malware? Only if you have your head in the sand.
I assume that comment was saying that they handle the update process and that their machines don't have any malware on them.
I ignored it because it was somewhat abusive and is missing the problem that automatic updates are trying to solve: that most people, but not all, don't do updates.
Wow, this hits close to home. Doing a page fault where you can't in the kernel is exactly what I did with my very first patch I submitted after I joined the Microsoft BitLocker team in 2009. I added a check on the driver initialization path and didn't annotate the code as non-paged because frankly I didn't know at the time that the Windows kernel was paged. All my kernel development experience up to that point was with Linux, which isn't paged.
BitLocker is a storage driver, so that code turned into a circular dependency: the attempt to page in the code resulted in a call to that not-yet-paged-in code.
The reason I didn't catch it with local testing was because I never tried rebooting with BitLocker enabled on my dev box when I was working on that code. For everyone on the team that did have BitLocker enabled they got the BSOD when they rebooted. Even then the "blast radius" was only the BitLocker team with about 8 devs, since local changes were qualified at the team level before they were merged up the chain.
The controls in place not only protected Windows more generally, but they even protected the majority of the Windows development group. It blows my mind that a kernel driver with the level of proliferation in industry could make it out the door apparently without even the most basic level of qualification.
> without even the most basic level of qualification
That was my first thought too. Our company does firmware updates to hundreds of thousands of devices every month and those updates always go through 3 rounds of internal testing, then to a couple dozen real world users who we have a close relationship with (and we supply them with spare hardware that is not on the early update path in case there is a problem with an early rollout). Then the update goes to a small subset of users who opt in to those updates, then they get rolled out in batches to the regular users in case we still somehow missed something along the way. Nothing has ever gotten past our two dozen real world users.
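For context, the ring logic itself can be as simple as deterministically bucketing devices into rollout cohorts. This is a generic sketch, not our actual implementation (or CrowdStrike's); ring sizes and the salting scheme are made up:

    import hashlib

    # Ring sizes: internal -> trusted early users -> opt-in -> 10% -> 50% -> everyone.
    ROLLOUT_RINGS = [0.001, 0.01, 0.05, 0.10, 0.50, 1.00]

    def device_bucket(device_id: str, release_id: str) -> float:
        # Deterministically map a device to a value in [0, 1). Salting with
        # the release id reshuffles which devices go first each release, so
        # the same machines aren't always the guinea pigs.
        h = hashlib.sha256(f"{release_id}:{device_id}".encode()).digest()
        return int.from_bytes(h[:8], "big") / 2**64

    def should_install(device_id: str, release_id: str, ring: int) -> bool:
        # Install only if the device falls inside the currently open ring.
        return device_bucket(device_id, release_id) < ROLLOUT_RINGS[ring]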
Exactly this is what was missing for me in the story. Why not have a limited set of users get it before going live for the whole user base of a mission-critical product like this? That is beyond the comprehension of everyone who has ever come across software bugs (so, billions of people). And that's before we even get to the part about not testing internally well, or at all. Some clusterfuck must have happened there, which is still better than imagining that this is the normal way the organization operates. That would be a very scary vision. Serious rethinking of trusting this organization is due everywhere!
The funniest part was seeing the Mercedes F1 team pit crew staring at BSODs at their workstations[1] while wearing CrowdStrike t-shirts. Some jokes just write themselves. Imagine if they lose the race because of their sponsor.
But hey, at least they actually dogfood the products of their sponsors instead of just taking money to shill random stuff.
Because CrowdStrike is an EDR solution it likely has tamper-proofing features (scheduled tasks, watchdog services, etc.) that re-enables it. These features are designed to prevent malware or manual attackers from disabling it.
These features drive me nuts because they prevent me, the computer owner/admin, from disabling it. One person thought up techniques like "let's make a scheduled task that sledgehammers out the knobs these 'dumb' users keep turning", and then everyone else decided to copycat that awful practice.
Are you saying that the compliance rule requires that the software can't be uninstalled? Once it's installed it's impossible to uninstall? No one can uninstall it? I have a hard time believing it's impossible to remove the software. In the extreme case, you could reimage the machine and reinstall Windows without Crowdstrike.
Or are you saying that it is possible to uninstall, but once you do that, you're not in compliance, so while it's technically possible to uninstall, you'll be breaking the rules if you do so?
The person I originally replied to, rkagerer, said there was some technical measure preventing rkagerer from uninstalling it even though rkagerer has admin on the computer.
I was referring to the difficulty overriding the various techniques certain modern software like this use to trigger automatic updates at times outside admin control.
Disabling a scheduled task is easy, but unfortunately vendors are piling on additional less obvious hooks. Eg. Dropbox recreates its scheduled task every time you (run? update?) it, and I've seen others that utilize the various autostart registry locations (there are lots of them) and non-obvious executables to perform similar "repair" operations. You wind up in "Deny Access" whackamole and even that isn't always effective. Uninstalling isn't an option if there's a business need for the software.
The fundamental issue is their developers / product managers have decided they know better than you. For the many users out there who are clueless to IT this may be accurate, but it's frustrating to me and probably others who upvoted the original comment.
Is what you're saying relevant in the Crowdstrike case? If you don't want Crowdstrike and you're an admin, I assume there are instructions that allow you to uninstall it. I assume the tamper-resistant features of Crowdstrike won't prevent you from uninstalling it.
It's currently a DOS by the crashing component, so it's already broken the Availability part of Confidentiality/Integrity/Availability that defines the goals of security.
But a loss of availability is so much more palatable than the others, plus the others often result in manually restricting availability anyway when discovered.
I think the wider societal impact from the loss of availability today - particularly for those in healthcare settings - might suggest this isn't always the case
And what about the importance of data integrity? If important pre-op data/instructions are missing or get saved to the wrong patient record, causing botched surgeries; if there are misprescribed post-op medications; if there is huge confusion and there are delays in critical follow-up surgeries because a 100%-available system messed up patient data across hospitals nationwide; if there are malpractice lawsuits putting entire hospitals out of business, etc. etc. - then is that fallout clearly worth having an available system in the first place?
Huh? We're talking about hypotheticals here. You're saying availability is clearly more important than data integrity. I'm saying that if a buggy kernel loadable module allowed systems to keep on running as if nothing was wrong, but actually caused data integrity problems while the system is running, that's just as bad or worse.
If Linux and Windows have similar architectural flaws, Microsoft must have some massive execution problems. They are getting embarrassed in QA by a bunch of hobbyists, lol.
Isn't DoSing your own OS an attack vector? and a worse one when it's used in critical infrastructure where lives are at stake.
There is a reasonable balance to strike, sometimes it's not a good idea to go to extreme measures to prevent unlikely intrusion vectors due to the non-monetary costs.
In the absence of a Crowdstrike bug, if an attacker is able to cause Crowdstrike to trigger a bluescreen, I assume the attacker would be able to trigger a bluescreen in some other way. So I don't think this is a good argument for removing the check.
That assumes it's more likely than crowdstrike mass bricking all of these computers... this is the balance, it's not about possibility, it's about probability.
If you're planning around bugs in security modules, you're better off disabling them - malware routinely use bugs in drivers to escalate, so the bug you're allowing can make the escalation vector even more powerful as now it gets to Ring 0 early loading.
I use Explorer Patcher on a Windows 11 machine. It had such a history of crash loops with Explorer that they implemented this circuit-breaker functionality.
It's baffling how fast and wide the blast radius was for this Crowdstrike update. Quite impressive actually, if you think about it - updating billions of systems that quickly.
This was my first thought too. I'm not that familiar with the space, but I would think for something this sensitive the rollout would be staggered at least instead of what looks like globally all at the same time.
This is the bit I am still trying to understand. In CrowdStrike you can define how many updates behind a host should be, i.e. n (latest), n-1 (one behind), n-2, etc. This update was applied to hosts on the 'latest' policy and to the n-2 hosts. To me it appears that there was more to this than just a corrupt update; otherwise how was this policy ignored? Unless the policy doesn't apply to this kind of update and only covers some small policy aspect, which would also be very concerning.
I guess we won't really know until they release the post mortem...
Yeah, my guess is that they roll out the updates to every client at the same time, and then have the client implement the n-1/2/whatever part locally. That worked great-ish until they pushed a corrupt (empty) update file which crashed the client when it tried to interpret the contents... Not ideal, and obviously there isn't enough internal testing before sending stuff out to actual clients.
But boy do you ever get free worldwide advertisement that everyone uses your product. Crowdstrike sure did, and I'm sure they'll use that to sell it to more people.
> It blows my mind that a kernel driver with the level of proliferation in industry could make it out the door apparently without even the most basic level of qualification.
Discussed elsewhere, it is claimed that the file causing the crash was a data file that was corrupted in the delivery process. So the development team and their CI probably tested a good version, but the customers received a bad one.
If that is true, the problem is that the driver uses an unsigned file at all, so all customer machines are continuously at risk from local attacks. And then it does not do any integrity check on the data the file contains, which is a big no-no for all untrusted data, whether user space or kernel.
> And then it does not do any integrity check on the data it contains, which is a big no no for all untrusted data, whether user space or kernel.
To me, this is the inexcusable sin. These updates should be signed and signatures validated before the file is read. Ideally the signing/validating would be handled before distribution so that when this file was corrupted, the validation would have failed here.
But even with a good signature, when a file is read and the values don’t make sense, it should be treated as a bad input. From what I’ve seen, even a magic bytes header here would have helped.
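Even a trivial pre-parse gate would have rejected the file - something along these lines (the magic value, size floor, and manifest hash are invented for illustration; a real system should verify an asymmetric signature as well):

    import hashlib

    MAGIC = b"CHAN"          # made-up 4-byte header for a "channel file"
    MIN_SIZE = 64            # made-up floor: a truncated or empty file fails fast

    def channel_file_ok(blob: bytes, expected_sha256: str) -> bool:
        # Reject obviously bad update content before the parser ever sees it.
        if len(blob) < MIN_SIZE:
            return False
        if not blob.startswith(MAGIC):           # catches the all-zeroes case too
            return False
        # Integrity against a (separately signed) manifest; a real system would
        # verify an asymmetric signature rather than trust a bare hash.
        return hashlib.sha256(blob).hexdigest() == expected_sha256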
the flawed data was added in a post-processing step of the configuration update, which is after it's been tested internally but before it's copied to their update servers
"And they promise fast threat mitigation... Let allow them to take over EVERYTHING! With remote access, of course. Some form of overwatch of what they in/out by our staff ? Meh...
And it even allow us to do cuts in headcount and infra by $<digits_here> a year."
> I didn't know at the time that the Windows kernel was paged.
At uni I had a professor in database systems who did not like written exams and mostly did oral exams. Obviously for DBMSes the page buffer is very relevant, so we chatted about virtual memory and paging. In my explanation I made the distinction between kernel space and user space; I am pretty sure I had read that in a book describing VAX/VMS internals. However, the professor claimed that a kernel never pages its own memory. I did not argue the point and passed the exam with the best grade. I never checked that book again to verify my claim, and I have never done any kernel-space development even vaguely close to memory management, so still today I don't know the exact details.
However, what strikes me here: when that exam happened, in 1985-ish, the NT kernel did not yet exist. However, IIRC a significant part of the DEC VMS kernel team went to Microsoft to work on the NT kernel. So the concept of paging (a part of) kernel memory went with them? Whether VMS --> WNT (every letter increased by one) is just a coincidence or intentionally the next baby of those developers, I have never understood. As Linux has shown us, much bigger systems can today be successfully handled without the extra complication of paging kernel memory. Whether it's a good idea I don't know; at least it's not a necessary one.
The VMS --> WNT acronym relationship was not mentioned, maybe it was just made up later.
One thing I did not know (or maybe don't remember) is that NT was originally developed exclusively for the Intel i860, one of Intel's attempts to do RISC. Of course, in the late 1980s CISC seemed doomed and everyone was moving to RISC. The code name of the i860 was N10. So that might well be the inside origin of "NT", with the marketing name "New Technology" retrofitted only later.
"New Technology", if you want to search the transcript. Per Dave, marketing did not want to use "NT" for "New Technology" because they thought no one would buy new technology.
Actually it was not only x86 hardware that was not originally planned for the NT kernel; the Windows user space was not the first candidate either. POSIX and maybe even OS/2 were earlier goals.
So the current x86 Windows monoculture came up as an accident, because the strategically planned options did not materialize. The user-space history should finally debunk the theory that VMS advancing into WNT was a secret plot by the engineers involved. It was probably a coincidence discovered after the fact.
"Perhaps the worst thing about being a systems person is that
other, non-systems people think that they understand the daily
tragedies that compose your life. For example, a few weeks ago,
I was debugging a new network file system that my research
group created. The bug was inside a kernel-mode component,
so my machines were crashing in spectacular and vindic-
tive ways. After a few days of manually rebooting servers, I
had transformed into a shambling, broken man, kind of like a
computer scientist version of Saddam Hussein when he was
pulled from his bunker, all scraggly beard and dead eyes and
florid, nonsensical ramblings about semi-imagined enemies.
As I paced the hallways, muttering Nixonian rants about my
code, one of my colleagues from the HCI group asked me what
my problem was. I described the bug, which involved concur-
rent threads and corrupted state and asynchronous message
delivery across multiple machines, and my coworker said,
“Yeah, that sounds bad. Have you checked the log files for
errors?” I said, “Indeed, I would do that if I hadn’t broken every
component that a logging system needs to log data. I have a
network file system, and I have broken the network, and I have
broken the file system, and my machines crash when I make
eye contact with them. I HAVE NO TOOLS BECAUSE I’VE
DESTROYED MY TOOLS WITH MY TOOLS. My only logging
option is to hire monks to transcribe the subjective experience
of watching my machines die as I weep tears of blood.”
Ah, the joys of trying to come up with creative ways to get feedback from your code when literally nothing is available. Can I make the beeper beep in morse code? Can I just put a variable delay in the code and time it with a stopwatch to know which value was returned from that function? Ughh.
Some of us have worked on embedded systems or board bringup. Scope and logic analyzer ... Serial port a luxury.
IIRC Windows has good support for debugging device drivers via the serial port. Overall the tooling for dealing with device drivers in windows is not bad including some special purpose static analysis tool and some pretty good testing.
Yeah. Been there, done that. Write to an unused address decode to trigger the logic analyzer when I got to a specific point in the code, so I could scroll back through the address bus and figure out what the program counter had done for me to get to that piece of code.
This is an interesting piece of creative writing, but virtual machines already existed in 2013. There are very few reasons to experiment on your dev machine.
At the time, Mickens worked at Microsoft Research, and with the Windows kernel development team. There may only be a few reasons to experiment on your dev machine, but that's one environment where they have those reasons.
>Doing a page fault where you can't in the kernel is exactly what I did with my very first patch I submitted after I joined the Microsoft BitLocker team in 2009.
Hello from a fellow BitLocker dev from this time! I think I know who this is, but I'm not sure and don't want to say your name if you want it private. Was one of your Win10 features implementing passphrase support for the OS drive? In any case, feel free to reach out and catch up. My contact info is in my profile.
Win8. I've been seeing your blog posts show up here and there on HN over the years, so I was half expecting you to pick up on my self-doxx. I'll ping you offline.
"It blows my mind that a kernel driver with the level of proliferation in industry could make it out the door apparently without even the most basic level of qualification."
It was my understanding that MS now signs 3rd-party kernel-mode code, with quality requirements. In which case, why did they fail to prevent this?
Drivers have had to be signed forever and pass pretty rigorous test suites and static analysis.
The problem here is obviously this other file the driver sucks in. Just because the driver didn't crash for Microsoft in their lab doesn't mean a different file can't crash it...
How so? Preventing rollbacks of software updates is a "security feature" in most cases, for better and for worse. Yeah, it would be convenient for tinkerers or in rare events such as these, but it would be a security issue the other 99.9...9% of the time for enterprise users where security is the main concern.
I don't really understand this, many Linux distributions like Universal Blue advertise rollbacks as a feature. How is preventing a roll-back a "security feature"?
Imagine a driver has an exploitable vulnerability that is fixed in an update. If an attacker can force a rollback to the vulnerable older version, then the system is still vulnerable. Disallowing the rollback fixes this.
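In sketch form, anti-rollback is just a monotonicity check against a persisted high-water mark (the version scheme and storage here are invented for illustration):

    # The device persists the highest version it has ever run and refuses
    # anything older, so an attacker cannot reinstall a known-vulnerable build.
    def accept_version(offered: tuple[int, int, int],
                       highest_installed: tuple[int, int, int]) -> bool:
        # Monotonic check: only move forward (or stay put), never backward.
        return offered >= highest_installed

    assert accept_version((4, 7, 2), (4, 7, 1))      # upgrade: allowed
    assert not accept_version((4, 5, 0), (4, 7, 1))  # rollback: refused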
This is what I don’t get: it's extremely hard for me to believe this didn't get caught in CI when things started bluescreening. Everywhere I've worked, test rebooting/power-cycling was part of CI, with various hardware configs. This was before even our lighthouse customers saw it.
Apparently the flaw was added to the config file in post-processing after it had completed testing. So they thought they had testing, but actually didn't.
I was thinking this doesn't seem like a case of all these machines being on an old or otherwise specific version of Windows that is having issues, where QA just missed one particular variant in their smoke testing. It seems like it's every Windows instance with that software, so either they don't have basic automated testing, or someone pushed this outside of the normal process.
> Even then the "blast radius" was only the BitLocker team with about 8 devs, since local changes were qualified at the team level before they were merged up the chain.
Did I mention this was 15 years ago? Software development back then looked very different than it does now, especially in Wincore. There was none of this "Cloud-native development" stuff that we all know and love today. GitHub was just about 1 year old. Jenkins wouldn't be a thing for another 2 years.
In this case the "automated test" flipped all kinds of configuration options with repeated reboots of a physical workstation. It took hours to run the tests, and your workstation would be constantly rebooting, so you wouldn't be accomplishing anything else for the rest of the day. It was faster and cheaper to require 8 devs to rollback to yesterday's build maybe once every couple of quarters than to snarl the whole development process with that.
The tests still ran, but they were owned and run by a dedicated test engineer prior to merging the branch up.
Oh I rebooted, I just didn't happen to have the right configuration options to invoke the failure when I rebooted. Not every dev workstation was bluescreening, just the ones with the particular feature enabled.
But as someone already pointed out, the issue was seen on all kinds of windows hosts. Not just the ones running a specific version, specific update etc.
There's "something that requires highly specific conditions managed to slip past QA" and then there's "our update brought down literally everyone using the software". This isn't a matter of bad luck.
The memory used by the Windows kernel is either paged or non-paged. Non-paged means the memory is pinned in physical RAM. Paged means it might be swapped out to disk and paged back in when needed. OP was working on BitLocker, a storage driver, which handles disk IO. It must be pinned in physical RAM to be available at all times; otherwise, if it's paged out, an incoming IO request would find the driver code missing from memory and try to page it in, which triggers another IO request, creating an infinite loop. The Windows kernel usually crashes at that point to prevent a runaway system and stops at the point of failure to let you fix the problem.
Linux is a bit unusual in that kernel memory is generally physically mapped, and unless you use vmalloc, any memory you allocate has to correspond to pages backed by RAM. This also ties into how file IO happens, how swapping works, and how Linux's approach to IO is actually closer to Multics and OS/400 than OG Unix.
Many other systems instead default to using the full power of virtual memory, including swapping kernel space to disk, with only the things that explicitly need to be kept in RAM being allocated from "non-paged" or "wired" memory.
Must have been DNS... when they did the deployment run and the necessary code was pulled and the DNS failed and then the wrong code got compiled...</sarcasm>
that they don't even do staged/A-B pushes was also <mind-blown-away>
- lifts won't operate.
- can't disarm the building alarms. (they have been blaring nonstop...)
- cranes are all locked in standby/return/err.
- laser aligners are all offline.
- lathe hardware runs but the controllers are all down.
- can't email suppliers.
- phones are all down.
- HVAC is also down for some reason (it's getting hot in here.)
the police drove by and told us to close up for the day since we don't have 911 either.
alarms for the building are all offline/error so we chained things up as best we could (might drive by a few times today.)
we don't know how many orders we have, we don't even know who's on the schedule or if we will get paid.