The correct lesson is to stop introducing more vulns into your systems by running "security" products. CrowdStrike was just an outage, but it could just as easily have been SolarWinds 2.0.
CrowdStrike is probably less bad than the alternatives I have run into, which are largely developed by very low-cost engineers (cough TrendMicro cough), but even so, they aren't NT kernel engineers, nor do they have the NT kernel release process.
Companies need to find ways to live without this crap or this will keep happening, and it will be a lot worse one day. Self-compromising your own systems with RATs/MDMs/EDR/XDR/whatever other acronym soup is needed to please the satanic CISSPs is just a terrible idea in general.
Probably runs on some dev machines, though. Maybe it can infect a valuable FOSS dev and then do the xz backdoor without bothering with the Jia Tan persona.
The regulation requires controls, there are usually lots of ways of implementing those controls.
However more often than not the people buying these piles of goop are more interested in the cheapest/fastest/neatest way of satisfying those controls, actual security be damned.
So when a vendor comes along and says "our product meets those controls! all you need to do is tell the auditor you use product $X and you are done!", well, that is an awfully compelling argument when those are your priorities.
Of course every security-minded person dies a little inside, but that is just how the world works for now. No one has been sufficiently punished for these bad decisions yet, but it will happen. Not a question of if, just when.
The lobbying does matter, though, as they lobby for increasingly useless controls that are harder and harder to satisfy with actual security (read: least-privilege design, capabilities, sandboxing, etc.) and audit logging, and that become increasingly about "how big is the ruleset in your SIEM or equivalent, and does it contain a specific rule we can show the auditor for control $Y" (yay Splunk/SumoLogic lobbyists), etc.
Not to mention stuff like pushing for TLS MITM boxes (self-compromising all your outgoing encrypted sessions is beyond insane) etc. The modern "security" industry is an embarrassment sigh.
Essential Eight 'maturity levels' are vague in this regard, though they seem to mandate it (at least 'application control'), at varying levels.
Seems like the only way third-parties could do it 'effectively' is with kernel-level access, though I am absolutely not an expert in this field (light years from it).
Having been an end-user going through the adoption of one of these third-party application control and auditing systems (I won't say which, or for whom), I can definitely state it was painful. Said system broke quite a few business-as-usual processes, despite a months-long 'learning phase', and often required 'super user' intervention. Occasionally this is still the case now, because almost no software vendor (commercial or open-source) digitally signs all of their files capable of executing instructions (either natively or via an interpreter). That said, the third-party app control package stops everything it doesn't recognise or has an exception for. I don't know how far it embeds itself into the system though.
It seems like vendors like CrowdStrike swoop in on the 'easy to be compliant' gimmick.
One wonders how many are aware of Microsoft's own products in this space? AppLocker comes to mind.
I am very interested in alternative ways of implementing controls to satisfy regulations; especially in ways that reduce, rather than increase, attack surface. I'd love if you could elaborate here, but links to documentation/technical articles are also welcome!
I'm not up-to-date on regulations (or many other things) but decreasing the attack surface in a way that could give a direct percentage improvement could get serious.
If lots of people wanted to reduce their attack surface by about 90% or more they would already be unplugging their LAN cable (or equivalent wifi logoff) except when they have a strong need to be online to accomplish something that is not (or never was) possible to handle otherwise.
If that's not enough, then escalate until no-one without a desperate need to be online is ever connected, and then only during the most desperate times, if necessary.
Depends on how much isolation from questionable incoming code (whatever you did not specifically request) you prefer to have, and how much of the time.
Ideally, all incoming code or downloads should be directed to safe storage and never executed until after the user has had a chance to deliberately scan it offline as many ways as they would like before attempting a test deployment.
Things just used to be a lot closer to that ideal more of the time. So many kinds of business-as-usual once raked in the bucks without any online connection, have little or nothing more to show for going online afterward, and now the threats are eclipsing the benefits, if any.
>the cheapest/fastest/neatest way of satisfying those controls, actual security be damned.
Cheapest, across-the-board is auto-patching when online, since so many systems are expected to be online whether it's good for them or not.
I thought it was plain to see it was moving in this direction consistently since the arrival of Windows Update.
That's the one big thing that initiated the possibility of mindless on-line patching replacing the previous deliberate, focused approach.
Every year since then fewer people even try to make software that is good enough to release under the original paradigm where it had full value, back when the need for patching would only occur on the rare occasion of true unforeseen bizarreness that could only be discovered after full release.
There's always been a real advantage when everything is more ideally suited to the task to begin with, and stays the same for years in a row except for when you really want it to change and know exactly what those changes are going to do in detail.
The need or preference for routine patching or updating is a dead giveaway for software that's probably not worth money to begin with.
This ignores security incident response entirely. I don’t think removing the tools that slow or prevent a hack is the answer. I think it is unfair to call it crap if you have not dealt with security incidents.
Of all the lessons that could be learned from this, I think this not the best one. We don’t call for everyone to go back to having their own datacenter because AWS had a massive outage.
> I don’t think removing the tools that slow or prevent a hack is the answer.
No one ever said that. It's about removing tools that introduce a huge security risk for abysmal gain. Antiviruses are there for compliance and are detrimental to actual security as explained by GP.
CrowdStrike isn't just an anti-virus/malware solution. It also provides observability into the running processes. It integrates into threat intelligence so you can take hashes of known bad processes and see if it exists across your fleet of devices.
It can isolate devices from a network and only allow the SOC access to investigate.
The difference between a vulnerability and a bug is slim. This is why I don't immediately install every single software update; official code is more likely to cut me off from my livelihood than an actual exploit taking place.
I think that Russians are now in a unique position, having learned this lesson almost universally due to sanctions. Any foreign vendor of self-updating software can, in theory, be ordered to kill your system remotely or to deliberately introduce a backdoor.
P.S. I do realize that the same applies to domestic products, especially to the ones mandatory to install, like Russian software that is required to be present on new smartphones sold in Russia. But my point is about people learning a lesson.
What a ridiculous take. I'd love to see any one of the 70 Fortune 100 companies that are CrowdStrike customers turn off CrowdStrike for an extended period of time and see how long it'll take for someone to brick their entire infrastructure with ransomware.
We live in an immensely networked world built on top of swiss-cheese infrastructure (even the latest Windows is riddled with points of attack surface); unless you plan to overhaul the underlying infrastructure, it's simply not an option to run large-scale infra without a protective layer on top.
> I'd love to see any one of the 70 Fortune 100 companies that are CrowdStrike customers turn off CrowdStrike for an extended period of time and see how long it'll take for someone to brick their entire infrastructure with ransomware.
I've got two objections to this conventional wisdom:
1) Endpoint protection in particular isn't essential to good infrastructure. It's only a bandaid over bad infrastructure. Perhaps a moot point, given that bad infrastructure is prevalent, but I digress.
2) Every IT director asserting this should be challenged to prove it. Prove that you stopped credible threats equal to X dollars of damage or lost revenue compared to the Y dollars of actual lost revenue due to production outages. And I'm not talking about the inflated threat totals that such systems produce to justify their existence, but actual threats that, absent CrowdStrike, would have penetrated other defenses. Until this can be done, it's simply a hypothetical vs the actual damages inflicted by the outage.
For your second point: That line of reasoning has traditionally resulted in underfunding of security.
It’s also quite impossible in practice.
Say that you know for a fact that CrowdStrike prevented a ransomware attack. How can you accurately assess the money saved by prevention if you don’t know what the ransom would have been set at? How can you accurately assess the value of the data loss from not paying the ransom?
You can make educated guesses in both cases, but they are still strictly in the realm of the hypothetical.
>For your second point: That line of reasoning has traditionally resulted in underfunding of security.
Underfunding relative to what? Another example I'd cite would be the TSA. How much do we pay it now vs how much would we lose based on what they've truly stopped?
>Say that you know for a fact that CrowdStrike prevented a ransomware attack. How can you accurately assess the money saved by prevention if you don’t know what the ransom would have been set at? How can you accurately assess the value of the data loss from not paying the ransom?
You could estimate at an order of magnitude based on the level of access of the user and a model of your infrastructure. For example, if only one machine is compromised, you could estimate order of magnitude based on the value of the data lost, the floor being set at the days of work lost * hourly rate of the user.
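As a concrete (and entirely hypothetical) illustration of that floor estimate, here is the arithmetic spelled out in a small sketch; the numbers are placeholders and would come from your own environment:

```c
/* Back-of-the-envelope floor for the cost of one compromised workstation.
   All inputs are hypothetical placeholders, not measured figures. */
#include <stdio.h>

int main(void) {
    double days_of_work_lost = 3.0;   /* assumed rebuild + redo time     */
    double hours_per_day     = 8.0;
    double hourly_rate       = 60.0;  /* assumed loaded cost of the user */

    /* The floor described above: days of work lost * hourly rate. Data
       value, lateral movement, ransom, etc. only push it upward.        */
    double floor_cost = days_of_work_lost * hours_per_day * hourly_rate;
    printf("floor estimate: $%.0f for this one machine\n", floor_cost);
    return 0;
}
```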
Underfunding in the sense that extremely preventable real-world hacks and breaches happened, with loss of customer private data. And underfunding because those companies had documented requests from IT asking to be allowed to do exactly the things that would have prevented the attack.
If you demand proof that a measure would stop a real-world attack before funding it, then you will leave the system vulnerable to everything your internal people were not able to exploit themselves.
Do we have available statistics as to how many actual attacks are thwarted by Crowdstrike or similar products (and not by other mitigation strategies -- i.e. all mitigations failed, CS succeeded)?
For some definition of "other mitigation strategies", yes. The differences between CrowdStrike's EDR and any of the other top 5-10 products in its category are negligible, for example. But having strong GPO controls and a firewall and an email gateway doesn't absolve you of the need for something on endpoints. "Malware can hide, but it must run", as the old SANS saying goes, and the endpoint is where that happens.
This is an outdated philosophy that should have died long ago, and CrowdStrike's marketing itself agrees. The solution is to have zero-trust endpoints. That's what CrowdStrike claims to be doing, and that's also what it can't do. Having true zero trust endpoints absolves you of the need for something, anything, on the endpoints. It's not even as disruptive as it seems to implement.
Even if, say, banks and insurance companies could move a couple of hundred thousand Windows endpoints to fully functional zero trust tomorrow, they'd still need something on the endpoint, if only for log aggregation and detection. It is certainly awesome if the bad guys can't move laterally but you still want to know they're there.
If you want to harden the endpoints that's fine. But it's not worth doing that if it means that you now have to trust your endpoints. If the choice is between a zero trust architecture or detection, you should take zero trust every time. That said, you don't need to trust the endpoint to harden it.
Why does my log aggregation and monitoring tool need to be a kernel driver? Why can't I just use Windows Defender and a network scanner for detection, and netboot the endpoints to make bootkits useless? That would provide just as much in the way of detection, and more in the way of protection, and I still don't need to trust my endpoint, nor any third party.
From a security perspective what you're describing is a nice-to-have, and to get it you'd decuple your attack surface by adding a networked, unfirewallable, impossible to remove rootkit-shaped target for every hacker in the world to instantly own your entire network.
Banks and insurance companies already have a lot of endpoints with no kernel level EDR, those are their Android and iOS devices. It works just fine and has for a long time.
> a networked, unfirewallable, impossible to remove rootkit-shaped target for every hacker in the world to instantly own your entire network.
This old chestnut?
1. Exploiting endpoint agents is mildly popular on blogs and in conference talks. It could happen in the wild, but that's costly R&D for something that would be found and patched immediately. The economics are against it, so in practice it doesn't happen.
2. Exploits in general are actually pretty rare in the corporate world, and when we do see them they're mostly command injections like Log4Shell. This is a 180-degree turn from a decade ago, when Internet Explorer was the dominant browser, the web ran on Flash, people read PDFs in Acrobat, and Microsoft could barely patch the Office equation editor because they'd lost the source. That world is gone.
3. In the 2020s, almost every attack you'll have heard of was because of weak passwords or missing MFA. SolarWinds, Colonial Pipeline, Uber, Okta, Microsoft, UnitedHealth, on and on. Plenty of malware, zero exploits.
So to sum up, you have real risk from bad things that actually happen, and then you have tools that effectively manage that risk, and then you have hypothetical risk that never happens associated with those tools. And your position is that the hypothetical risk that never happens is so great that people shouldn't use these tools to stop the real risk that actually does happen.
> 1. Exploiting endpoint agents is mildly popular on blogs and in conference talks. It could happen in the wild, but that's costly R&D for something that would be found and patched immediately. The economics are against it, so in practice it doesn't happen.
You can't know that with anywhere near this level of confidence. For most organizations, the only thing that would detect such an attack to begin with is the EDR itself. An exploited endpoint agent, in a world where endpoint agents are the cornerstone of security, means there is no guarantee it would be found immediately, especially if it's used for a targeted attack and cleaned up afterwards. Even if it was, it would be impossible to patch automatically and would require reinstalling the OS.
> 3. In the 2020s, almost every attack you'll have heard of was because of weak passwords or missing MFA. SolarWinds, Colonial Pipeline, Uber, Okta, Microsoft, UnitedHealth, on and on. Plenty of malware, zero exploits.
There were plenty of attacks that relied on exploits in the 2020s. The KEV database itself lists hundreds every year. Multiple ransomware groups relied on RCE vulnerabilities. Sometimes weak credentials are also implicated, that doesn't mean they are the only reason, and certainly doesn't mean "zero exploits".
> So to sum up, you have real risk from bad things that actually happen, and then you have tools that effectively manage that risk, and then you have hypothetical risk that never happens associated with those tools. And your position is that the hypothetical risk that never happens is so great that people shouldn't use these tools to stop the real risk that actually does happen.
The real risk that gets exploited every day is that we trust endpoints we shouldn't and don't need to. These tools do not manage this risk effectively, and in fact they make it worse because they are extremely difficult to administer when you don't integrate the endpoints in your network, and in addition to that they require you to trust one more party. The "hypothetical" risk of trusting your endpoint and integrating it in your network literally proves itself dozens of times every day.
Besides, SolarWinds and Okta showed that trusting additional security vendors is a real risk. What if, instead of SolarWinds, CrowdStrike had been the supply chain attack, and patching out the malware required manual intervention? Supply chain attacks are hardly a hypothetical risk.
And before you say that endpoint trust and EDR is unrelated, try managing an EDR deployment without Active Directory. It's basically impossible. EDR tools assume your endpoint is trusted by default because they're too invasive to be managed otherwise.
> Why can't I just use Windows Defender and a network scanner for detection, and netboot the endpoints to make bootkits useless?
Defender for Endpoint runs in kernel mode. If you just mean plain jane Defender, well, there's a reason Microsoft sells Defender for Endpoint.
I'm not sure what a network scanner is intended to do here but if you mean a network intrusion detection system, those still exist but the market has kind of been strangled by EDR because you get higher fidelity detections and need one fewer skillset.
> It works just fine and has for a long time.
Sort of. There are a lot of controls (MDM and sandboxing and such) that go into that. I'd guess most orgs would just slap EDR on the phones if they could. Again, same result with fewer skillsets and products if so. You need a pretty severe risk to outweigh that and I'm not sure a one-time incident and the slight risk of an actual hostile compromise of an EDR vendor get there. Maybe if there are more like this.
Plain Jane Defender also has a kernel component. The point is to not have to trust any additional vendor. You can't avoid trusting your OS vendor with writing and updating kernel code but you can avoid trusting CrowdStrike.
> Sort of. There are a lot of controls (MDM and sandboxing and such) that go into that. I'd guess most orgs would just slap EDR on the phones if they could. Again, same result with fewer skillsets and products if so. You need a pretty severe risk to outweigh that and I'm not sure a one-time incident and the slight risk of an actual hostile compromise of an EDR vendor get there. Maybe if there are more like this.
There's nothing wrong with MDM and group policy in and of itself.
If orgs could slap EDRs onto their mobile devices, the mobile devices would have to become less secure to begin with. The EDR itself seems to be a marginal risk; the huge risk is the architecture change needed to accommodate a product as invasive as an EDR.
Windows has Defender built in and most of those companies are presumably already using an SCCM like Software Center. If they upgrade from ancient versions of Software Center (because let's be honest if they are using CrowdStrike they are probably behind the times) to more recent versions of Intune it includes most of the exact same "fleet configuration" and "fleet anti-virus" tools (including ransomware prevention/quarantine) that CrowdStrike ever provided, only this time with stuff the Fortune 100 companies are likely already paying Microsoft for. They are trading one vendor for another, but the new vendor actually owns the Operating System and has a lot more reasons not to break it with bad update deployments.
It's not like Microsoft has spent the last few decades entirely ignoring these large scale infrastructure needs, they've just been competing with these bloatware vendors like CrowdStrike with significant marketing arms and incredible sunk cost inertia. (Given most of these companies already pay for things like Intune, it's a weirdly doubled sunk cost inertia, but still a sunk cost inertia.)
The Microsoft competitor to CrowdStrike is Microsoft Defender XDR (specifically Defender for Endpoint), an enterprise product that is pretty good (in my opinion) but not one that Microsoft just throws in as a sweetener with Office 365 or whatever. You will pay specifically for it. It also runs in the kernel, as most effective EDR does, so until eBPF for Windows is done you are taking the same chance with it as with CrowdStrike. It's just a matter of which company you trust more not to BSOD you.
One of those two companies also makes the OS, the other one piggybacks on it. So it seems updates to Defender XDR would be less likely to brick the OS like this.
I've never worked for Microsoft so I'm speculating, but I doubt their org structure supports that. The Windows team isn't going to be the same as the EDR team.
The EDR team might have more info about Windows internals than CrowdStrike does, but then again, maybe not; CrowdStrike already has to work pretty closely with Microsoft to get into the kernel in the first place, and Microsoft is (sometimes) pretty careful about 'bundling' antitrust concerns.
I think the feeling is that Microsoft, as the company that has to field the blame for any problems in Windows, has more incentive structures in place internally to "never further tarnish the Windows brand". CrowdStrike doesn't care if people blame Windows (and partially benefited in some parts of this news cycle from people blaming Windows for being the messenger), but Microsoft has nothing but skin in that game. Even if the team sat in a terribly remote part of the org structure from the Windows team, it would still be incentivized to avoid making Windows (any more) the bad brand, if for no other reason than "shareholder value" and the equity stakes in their total compensation. (But Microsoft will have other internal incentives in place, especially the closer the team gets to Windows.)
Red Hat says CrowdStrike themselves have sent out eBPF files that have caused all sorts of kernel panics. Any sufficiently advanced kernel scripting language is still kernel-level code. I'm not sure eBPF is exactly the savior people are hoping for, though I do still think it remains something of a general improvement if Microsoft can actually get AV vendors and others to use eBPF rather than bespoke kernel drivers.
I'm pretty sure those were actual kernel modules, not eBPF. eBPF for their Linux agent is on their roadmap but not something they are actually doing yet. eBPF should be sandboxed and unable to crash the kernel. Could be I'm missing something, though.
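For readers unfamiliar with why eBPF is considered safer than a bespoke driver, here is a minimal sketch of a sandboxed probe using Linux/libbpf conventions (the Windows eBPF runtime discussed above has its own hooks and toolchain, so treat this purely as an illustration of the model): the in-kernel verifier has to accept the program before it runs, and it rejects unbounded loops and unchecked memory access, so a bad update should fail to load rather than panic the kernel.

```c
// Minimal eBPF probe sketch (Linux/libbpf conventions, illustrative only).
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("tracepoint/syscalls/sys_enter_execve")
int trace_execve(void *ctx)
{
    // Log the PID of every process calling execve; an EDR-style sensor
    // would ship such events to userspace instead of printing them.
    int pid = (int)(bpf_get_current_pid_tgid() >> 32);
    bpf_printk("execve by pid %d", pid);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```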
The price is no joke. Microsoft has a lot of really good security products locked behind a paywall that security vendors know they can beat. We were looking at the price for Sentinel the other day and holy shit it’s costly.
CrowdStrike included some basic SCCM/fleet management "compliance checkboxes". That is one of the reasons companies were paying for CrowdStrike: "fleet management" of AV.
> Intune is already outdated btw.
I do know that part of Microsoft loves to change brand names every quarter for fun and the "parent" organization has changed a few times from different parts of Office and Windows Server to now "Microsoft 365", but as far as I can tell (and I don't keep up with it) Intune is still the active brand name: https://en.wikipedia.org/wiki/Microsoft_Intune
It's because of the merger into Microsoft 365 that I feel confident that most big Enterprises are already paying for Intune even if they aren't using the most up-to-date version and don't know what compliance "checkboxes" it solves versus something like CrowdStrike.
Maybe stop running critical infra on top of Windows? lol. Microsoft is claiming they have zero culpability here but they created this ecosystem and Crowdstrike is just playing around in it.
Ancient IT mindset says "me need domain controller. me buy win server." but it does not need to be like this.
Most companies buy 3rd-party software that is critical to their business, and most of that software was designed to run on Windows. Since people are locked into this software, the vendors of said software have little incentive to move off Windows Server. Exchange Server would be a prime example of this, if everyone hadn't switched to 365.
Agree, but in addition, the situation that created this problem will soon be resolved when eBPF for Windows goes to prod. It won't be possible to bluescreen a Windows box just because EDR has a bad update.
The situation(s) that make EDR valuable, however, are not going away. Not just for prevention, but for detection and threat hunt, EDR in general and CrowdStrike in particular have pretty much eaten the market that was once shared by network detection, etc.
eBPF is a good direction, although I'm not sure what the maturity of the Windows fork is right now, or how many kernel APIs are enabled in it - i.e. how far away we are from having enough capability to run an EDR in it.
I immensely doubt that actually happened (I saw the Musk tweet) - from working with some companies of similar size, EDR rehauls are multi-month projects. Yes, they might've dumped them as a vendor, but I'm 100% certain they're not going without an EDR throughout this time.
If they are it'll be a fantastic litmus test of my claim.
They probably bought another similar solution like S1 or Defender. I don’t think they are running without a security solution which was this thread’s discussion point.
> A Microsoft spokesman said it cannot legally wall off its operating system in the same way Apple does because of an understanding it reached with the European Commission following a complaint. In 2009, Microsoft agreed it would give makers of security software the same level of access to Windows that Microsoft gets.
It's not Microsoft that should wall off the operating system. It's banks, airlines, health care providers that should not use Windows the way they currently do.
No employee there needs the ability to install any software themselves. Without the ability to install software, you don't need anti-virus software.
These systems should just run immutable images, in A/B deployment, just in case the new image is broken.
Of course that does not solve supply chain security: how do you make sure that the images contain no known malware? But the problem no longer needs to be addressed on millions of machines with millions of employees. It gets reduced to thousands.
Your suggestion simply shows that you have no understanding of how things work.
Terminals already do not allow users to install anything. As for the rest of the workstations and work laptops - users already don't install anything on them. The issue with CrowdStrike is not with users but with the service that is maintaining these computers. It is a very frightening thing that all those companies are dependent on the f*ck-up of their service provider, and it costs them their whole business.
Viruses don't need installation - in fact, it would be the easiest thing if viruses were listed among the installed programs. Are you representing your whole generation, or are you the only one such strange person?
Also, your suggestion is outdated by at least 60 years, as it assumes that hardware with no software update capabilities can't be hacked...
The predecessor anti-virus to Windows Defender was originally meant to be released in-box with Windows XP. Due to pressure from both the US and the EU (themselves pressured by massive lobbying from McAfee and Symantec/Norton), Microsoft was not allowed to ship the anti-virus with XP and had to release it separately on a web page as an "optional" download. This gave the anti-virus vendors an additional "free decade" (just about exactly) of being able to advertise that Windows was insecure by default and pretend that this was Microsoft incompetence.
Today a lot of average users (and as CrowdStrike has indicated, many large enterprises) still believe that Windows doesn't have built-in anti-virus because of "Microsoft incompetence" despite Defender having been bundled with Windows since Vista (2007).
Microsoft has spent decades removing security holes but doesn't get even half the credit for it, because it still has to deal with an open kernel, because people want to pay for security-blanket "products" like CrowdStrike's and Symantec/Norton's bloatware. That's in part because the US DOJ and the EU, in trying to do the "right thing" for anti-trust reasons, did the exact "wrong thing" for consumer protection reasons, and left all these shady vendors with too much "everyone knows Windows has no anti-virus out of the box" PR, based on Microsoft being forced to pull it from Windows XP into an "optional download" and that still being the benchmark version of Windows in many minds.
Microsoft made a market for these snake oil products because of their incompetence at making a secure operating system, not because of governments. Things such as Defender wouldn't even be necessary (well, they are still not necessary today, but people believe they are, and you can't disable Defender anyway).
Linux still has AV scanners. MacOS still has AV scanners, the most common ones are just built-in and unbranded.
Everyone needs Ransomware scanners. Some Linux users and MacOS users rely on security through obscurity, which isn't actual security.
Even with the most rock solid and secure kernel, as long as software is allowed to run in userspace you need to detect when the user accidentally ran software they didn't intend to and/or that is trashing that user's space. You can't just delete a bad userspace, people store their files and increasingly their whole lives there.
You likely will never agree with me on this, but from what I've seen the NT Kernel is one of the most secure kernels on the planet in active mainstream usage. It doesn't have that reputation because the NT Kernel also paradoxically has to be the most open to plugins and third party drivers. People blame the NT Kernel for things the plugins and third party drivers get wrong. Every time Microsoft closes plugin APIs and moves drivers to userspace: companies and users get angry even as the overall security goes up. (That was the real "Vista problem": it moved too many drivers to userspace at once and hurt a lot of third party feelings and seemed to break a lot of hardware for a bit while things caught up.)
But you also don't really care how secure the kernel is because you don't live in kernel space, you live in userspace. You and everybody else also want to be able to run whatever software you want in userspace because you should be in control. (Yes, it's good to have control of your own userspace, that's a lovely freedom.) So Windows doesn't have a working central App Store today and users can still install software from anywhere they find it. That's considered a useful freedom. Things like Defender (ClamAV) and UAC (sudo) and more are still desirable tools that need to exist to protect userspace. (Tron fights for the users!) That's not a failing of OS security, that's a tool to protect user freedom. We know for a fact from mobile OSes that the alternative is locked down app stores, locked down file systems, and a lot less freedom in your userspace. Those are trade-offs we make every day now in which devices we prefer to which tasks. Neither is necessarily the best solution and it is nice being able to pick between systems with more user freedom for some tasks and systems with less for other tasks.
I don't expect you to agree with me and this discussion is close to arguing in circles at this point, but I still believe the reputation of Microsoft's "incompetence" is sorely over-exaggerated, in part by third parties that have always benefited from the platform's openness and predilection towards user freedoms over kernel lockdowns (and also some governmental oversight decisions that claimed to be for user freedom but mostly just lined the pockets of third parties while moving userspace security features out of the normal install for too long).
Microsoft's products aren't full of security holes. If you have a 0-day on fully patched Windows, that is worth a pretty penny, which implies they aren't that easy to come by.
They aren't worth quite as much as an iOS 0-day, but they are by no means cheap.
Of course if you think otherwise you can be making 7 figures per bug (assuming you are OK selling to brokers for the 3 letter agencies) so go dig some up?
>Microsoft's products aren't full of security holes
They are though, just look at Exchange[1] and what problems Microsoft itself has.[2] There is no such thing as a "secure Microsoft product". Microsoft is single-handedly responsible for making the IT world worse because they do not care and have a monopoly.
>If you have a 0-day on fully patched Windows, that is worth a pretty penny, which implies they aren't that easy to come by.
It's what the market pays for it, not what it's actually worth as you have already pointed out. Three-letter agencies buy these 0-days themselves for a big sum and support the black market so the prices go even higher because they have infinite money.
Maybe Windows Defender should also not have the ability to crash the kernel. Instead, Microsoft could provide proper hooks for pluggable drivers that can be used both by themselves and by third-party AV.
I found the article unbearable and just a convoluted way to say: this incident would have had a lot less impact if CrowdStrike had fewer customers or more competitors. A real page-filler without any insight or solutions; just look at this paragraph, completely devoid of anything useful.
> This time, the digital cataclysm was caused by well-intentioned people who made a mistake. That meant the fix came relatively quickly; CrowdStrike knew what had gone wrong. But we may not be so lucky next time. If a malicious actor had attacked CrowdStrike or a similarly essential bit of digital infrastructure, the disaster could have been much worse.
Gee, the damage from an honest mistake (what does the author even base that on) is most likely easier to fix than the damage done by a malicious actor with bad intent. I feel so enlightened!
Funny way of spelling ‘complete failure in functional testing before going to prod causing billions of dollars worth of damage likely due to criminal level negligence’.
You’re not wrong about the end result, but the breakdown of systems this complex goes deeper than placing the blame on some CrowdStrike employee.
Whoever thought up the great idea to allow auto-update-able kernel modules for something as mission critical as emergency response or healthcare deserves just as much blame.
I’ve worked in healthcare for my whole career, and this is madness. Not that their process is without flaw, but can we remind ourselves of how stringently we assess medical devices? I cannot imagine it’s controversial to say that emergency response equipment is every bit as critical as an insulin pump. If they fail, someone dies.
> Whoever thought up the great idea to allow auto-update-able kernel modules
What's made this whole thing so "interesting" is that the whole point of these "channel files" was to decouple the risk from updating the kernel driver.
Accepted best practice for this product has been to stagger rollout of the kernel driver, so a pilot group gets the current release, the herd get n-1, and sensitive machines get n-2. The product provides for this, and most sites either use it, or admit they should.
So when your pilot group start bluescreening with "DRIVER OVERRAN STACK BUFFER" (actual example from last year), it's caught (by the customer, still) and triaged before it reaches n-1, let alone n-2 & front page of The Times.
But the whole 'sell' of the product is that they get 0-day definitions. So endpoints running the relatively trusted n-2 release still get the same protection against active threats. n-2 have a stable driver running today's "channel data".
I'm not clear if Friday's "channel file" is the issue in itself, or whether it triggered a less-explored code path in the kernel driver - but the result is the same. The best practice of staggering the kernel driver releases, didn't save us from a logic bomb in the "channel file".
I just think the distinction is interesting because following accepted best practices, vendor recommendations, and conservative deployment recommendations did not protect from this. It's not the customers that were yolo'ing this.
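For what it's worth, here is a rough sketch of that n / n-1 / n-2 ring scheme in plain C, with made-up host names, ring assignments, and version strings; the point it tries to make explicit is that the rings only gate the sensor (driver) version, while channel-style content updates bypass the rings entirely.

```c
/* Sketch of a staged-rollout policy as described above. Hostnames, ring
   assignments and version strings are invented for illustration. */
#include <stdio.h>

enum ring { PILOT, HERD, SENSITIVE };

/* releases[0] = latest (n), [1] = n-1, [2] = n-2 */
static const char *sensor_for(enum ring r, const char *releases[3])
{
    switch (r) {
    case PILOT:     return releases[0];
    case HERD:      return releases[1];
    default:        return releases[2];
    }
}

int main(void)
{
    const char *releases[3] = { "7.16", "7.15", "7.14" };  /* hypothetical */
    struct { const char *host; enum ring ring; } fleet[] = {
        { "it-pilot-01",  PILOT     },
        { "office-042",   HERD      },
        { "er-triage-01", SENSITIVE },
    };

    for (size_t i = 0; i < sizeof fleet / sizeof fleet[0]; i++)
        printf("%-14s sensor %-5s channel data: latest (not gated by ring)\n",
               fleet[i].host, sensor_for(fleet[i].ring, releases));
    return 0;
}
```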
It seems like a (possibly obvious?) variation on the Church-Turing thesis that any sufficiently advanced scripting language for a kernel driver is still a kernel-level deployment and should be treated as such. Which is to say that these "conservative deployment recommendations" don't seem conservative enough given what we know of Turing completeness and how easy it is to break any Turing machine. (I still love that our academia has found an unfixable "0-day jailbreak" in the Universal Turing Machine itself, proving that this root problem is truly deep in computation theory and reproducible at the most abstract levels.)
(The other recent news, that Red Hat has been blaming CrowdStrike for sending eBPF files that also kernel panic on Linux, contributes further evidence that any sufficiently advanced scripting language for kernel drivers carries kernel-driver levels of deployment risk.)
This was a very valuable insight. I’m a med student at the moment, and my interest in networking and tech in general is a tad more shallow, but I appreciate your perspective nonetheless!
Additionally, would you mind sharing your thoughts on the following observations? Afaik, similarly to medical devices, we recognize the criticality of software for applications such as ATC or microcontroller-based railway switchyards; for obvious reasons ofc. Alright, but ensuring the availability of barebones emergency response or Hospital IT shouldn’t be far off in terms of criticality, no?
Yet, ATC, avionics, rail DMIs/infrastructure and similar go through the effort of building ultra-available, purpose built systems that are very different from Windows instances running CS kernel tools, even thoughtful ones.
In contrast, apparently said healthcare/emergency related applications seemingly are okay with relying on mission critical windows boxes. I hope that info is factual, otherwise mea culpa.
I don’t mind healthcare using less elaborate tech for non-critical purposes - the equivalent of the service responsible for providing train delay updates, stuff far away from operating-signals-type ops. But if it’s mission critical or able to impede critical services, that’s really worrying to me.
So straight up I have to admit that this isn't my wheelhouse - I support a bunch of developers who seem to enjoy breaking things. Safety-critical or life-critical just isn't my thing. If breaking stuff is half the fun, you probably shouldn't be in medicine ;)
Say you have one server that houses all your patient data, and 1000 workstations that access it. I think it's safe to assume you'd treat that one server as your "crown jewels". You want it to be triple-redundant, you want it to be on battery, generator, a very conservative lifecycle management, replicas in different fire zones, immutable backups, etc etc.
Your thousand desktops .. meh. This is where you want your endpoint protection, this is where you're worried about data egress, etc. They still need to be controlled because they have access to the patient data. But you're not so worried about resilience. If a workstation goes bang, you just go out and image it.
I'd consider this a fairly typical way to evaluate risk and threat.
"I felt a great disturbance in the Force, as if millions of voices suddenly cried out in terror and were suddenly silenced. I fear something terrible has happened."
So on Friday, those thousand workstations simultaneously turned blue. Our hypothetical threat model so far has treated workstations as disposable, replaceable, but didn't consider workstations in their entirety. And once we lose the entirety, all our "crown jewels" are safe on our triple-redundant servers, but there's no way to access them. And the resulting "stop work" is a risk to any patient who really needed that work done today.
Now as I said, this isn't my area at all, I'm spit-balling here, but this is how I understand the fallout from this. An analogy is that we put more effort into protecting the president than the man on the street - but if you wake up one morning and the general population has disappeared, the impact is bigger than losing the president.
Automated CI/CD - many of us already do this hundreds of times a day. If you’re an emergency call centre, join a consortium of similar orgs and standardise tech and do it properly.
Defer updates. Most things can wait 8-12 hours. Even more can wait 3 weeks (did this for all but security-critical npm package updates in one place).
Demand legal changes to ensure fair liability for failure to undertake basic measures by service providers for paid software and services. Demand proper liability for C-suites not ensuring that actual risk management is in place instead of stupid box-ticking.
Design better software. Seriously, the kinds of half-baked stuff that costs so much is incredible. It doesn’t take longer, and it doesn’t cost more to do things right, the only change is that management needs to be engaged with outcomes and have skin in the game. Execs should run the risk of going to jail for egregious failures.
Don’t let remote operation of safety critical systems at oil refineries + removal of physical emergency automatic shutdown systems at oil refineries worry you either. When one of those fails for a stupid reason or is hacked by psychopaths it’ll make Bhopal look like a walk in the park.
Don’t over-egg the CrowdStrike thing - the really poor tech choices are going to deliver between three and six orders of magnitude more deaths when the inevitable happens.
It’s a relatively low-impact warning. We can learn from it or not. I know what I’m expecting.
When you factor on the time spent in meetings discussing endlessly what to do about this in corporations around the world, besides the cancelled flights and banking and hospital and 911 mayhem, a trillion isn't indefensible.
If CrowdStrike’s internal competency failed to detect a consistent BSOD, I doubt they’d notice North Korea dropping malware into their product, SolarWinds-style.
But it's also true that companies depending on more or less random third parties to provide them with kernel modules for a monolithic kernel - and then giving them control over auto-updating those kernel modules - is beyond dumb.
I mean, we've seen impacts of these unstaged updates in the deeper layers of banks, airports and hospitals. It's unfair that CrowdStrike gets more than half the blame here.
Not to mention, probably half these companies were actually forced to install this. Getting SOC2 is essentially letting a senior-branded KPMG or EY intern tell you how to enforce security policies based on their checklist.
There is far more to blame in this incident than CrowdStrike themselves.
Compliance is one thing, but a cybersecurity incident response team is very blind without an EDR, not to mention you would have less protection from insider threats.
Crowdstrike does not have a shortage of competitors, there are plenty in this field, at least a dozen. They just happen to be the best, primarily because of their intel, and you wouldn't believe this but because their sensor has always been very lightweight and easy to manage.
The article is barely scratching the surface, supply chain security is a lot more scary than this.
> They just happen to be the best, primarily because of their intel, and you wouldn't believe this but because their sensor has always been very lightweight and easy to manage.
I think an important part of their dominance is that they're the zero configuration EDR. Buy Carbon Black, and you'll need to know something about what's installed in your environment, what you do and don't want running. Buy SentinelOne, and you need to know that plus you have to learn a complex product.
But with CS, you pretty much just install it, then get that sweet, sweet green checkmark on your security audit.
You nailed it. It is a sweet green checkmark, but it also works well. The main reason CISOs get it is so that if they ever do get compromised they can say they purchased the best security product out there. But there is still a certain portion of their customers that get targeted by APTs - think defense contractors, government orgs like the NRO, oil and gas, finance, power companies - that have a large budget, and for whom that checkbox is the least of their worries.
You are right about that. The company seems to have bad QA process and that is not something that can be improved easily as it requires change in culture and management.
The warning in this case is hire security people who actually have a clue and include vendor software in their risk assessment.
Literally every time I see stuff like this go down, the security software had exactly zero engineering research put into it whereas everything else did.
If people did this, CrowdStrike would either not exist or look completely different.
The culture at MS$ is to servitize enough of their products and then force customers to use them. That way, the products won't be so exposed to users and Microsoft will be able to limit their own liability without actually improving the back end product.
Servitization is a clever way to consolidate your perpetual-license customers over to perpetual service contracts, while also further obfuscating and locking down the underlying operating environment.
This is in the best interest of Microsoft bottom line, at the expense of all private business, government, or anyone who values consumer experience really. It reduces the number of drive-by security incidents, but when WW3 happens and 75% of our economy is hosted in a whopping 12 datacenters across 3 companies I'm sure we'll be screwed. I mean just depth charging Google fiber today would probably take down 25% of the world economy.
No. It was just a failure. The warnings have been trumpeted for decades.
It should have been no surprise that the giant company that was trusted to secure our single source of OS software against "supply chain attacks" ended up committing the largest "supply chain attack" yet seen on Earth.
We are effectively still in the wild west. The gold rush has to end before we can truly civilize the place.
Is it true that the owner of CrowdStrike is an ex-employee of McAfee, the same company that got sold because they had massive downtime for basically the same reason?
If these machines were backend systems (which most of the ones that mattered were), you have to ask: why are they running malware detection when they should have minimal-to-no surface area for an attacker?
Sorry, but if decades of warnings from actual qualified security experts, hired specifically for their expertise in such matters, went ignored long enough to reach this point, then this incident isn't gonna change much of anything. It'll be news for a short while, then forgotten. No lessons will have been learned, and few if any changes will be made. More things like this will happen in the future. Guaranteed...
I'm a grumpy old man on the internet.... let's just get that out of the way
The root cause is NOT capitalism, nor is it users, Microsoft, or even CrowdStrike. You can't legislate, regulate, or "be more careful next time" your way out of this. Hell, blaming the users won't even work.
Here are 3 stories:
---
Imagine yourself as an inspector for the Army. The 17th Fortress has exploded this month, and nobody can figure out why. You've checked all the surviving off-site records, and are reasonably sure that the crates of dynamite that used to make up the foundations and structure of the fort were properly inspected, and even updated on a regular basis.
You more closely inspect the records, looking for any possible soldier or supplier who might have caused this loss. It might possibly be communist infiltration, or one of those pacifists!
You encounter an old civilian, who remembers a time when forts were built out of wood or bricks, and suggests that. But he's not a professional soldier, what could he know.
---
Imagine you're a fire inspector. You've been to your 4th case this month of complete electrical network outage. This time, the cause seems to be that Lisa Douglas at Green Acres had Eb Dawson climb the pole, and he plugged one too many appliances into the electrical circuit.
If only there were a way to make sure that an overload anywhere couldn't take down the grid, and ruin so many people's days. You desperately want a day without house fires, and so many linemen being called out to test and repair circuits before connecting them back to the grid.
It will take some time before the boilers and generators get back online from their cold restart. In the meanwhile, business in town has ground to a halt.
The paperwork and processes to track and certify each appliance doesn't seem efficient.
There's this grumpy old guy who talks about fuses and circuit breakers, but he's just a crank.
---
The United States found itself embedded in yet another foreign entanglement, in Vietnam. There was a severe problem planning air strikes, because multiple sources of information were required to plan them, and no single computer could be trusted with all of them. The strikes themselves were classified, but the locations of the enemy radar installations couldn't be trusted to the computers, because they were occasionally accessed by enemy sources. Thus the methods and means of locating the enemy radar equipment could become known, and thereby rendered ineffective.
A study was done[1], and the problems were solved. There were systems based on the results of these studies[5], and they worked well.[2] Unfortunately, people thought that it was un-necessary to incorporate these measures, and they defaulted to the broken ambient authority model we're stuck with today. Here's some more reading, if you're interested.[3]
---
If you're bored... I've even got a conspiracy theory that explains how I think we actually got here, if it wasn't simply historical forces (which I think it was, 95% certainty).[4] If true, those forces would still be here today, actively suppressing any such stories.
I don't disagree that the prevailing ambient authority model is less than ideal, but I'm left scratching my head at how capability security would solve the more acute problem we're facing.
Of particular note, the kinds of ransomware compromise that lead to organizations asking underwriters to sell them cybersecurity insurance policies wouldn't immediately be thwarted by moving to capabilities. If a principal has valid requisite pointers to the resources they need to do their job, they have valid requisite pointers to the resources that a cryptolocker needs to cause disaster. It's those cybersecurity insurance policies (or regulatory frameworks, where those exist) that ultimately drive the decisions to implement EDRs.
Who would trust the code that the ransomware sneaks in with all of the resources of their system? You can hand $5 over to someone without risking your life savings; surely you could do the same and hand over a temporary folder to run a random piece of code from the internet, rather than your full administrator credentials.
Capabilities change things in a profound way. They make it trivial to run any old code, yet not have to ever trust it.
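To make the "$5, not your life savings" analogy concrete, here is a rough POSIX approximation (my own sketch, not something from a real capability system): hand the untrusted code an open directory descriptor rather than ambient filesystem access. It's only an approximation, since ".." and absolute paths can still escape unless you add something like openat2()'s RESOLVE_BENEATH or run on a proper capability OS, but it shows the shape of the idea.

```c
/* Delegate a scratch directory, not the whole filesystem (approximation). */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* The untrusted task receives 'dirfd' and nothing else. */
static int write_report(int dirfd)
{
    int fd = openat(dirfd, "report.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    dprintf(fd, "hello from the delegated folder\n");
    close(fd);
    return 0;
}

int main(void)
{
    /* The caller decides exactly which directory to hand over. */
    int dirfd = open("/tmp/untrusted-scratch", O_RDONLY | O_DIRECTORY);
    if (dirfd < 0) {
        perror("open scratch dir");
        return 1;
    }
    int rc = write_report(dirfd);
    close(dirfd);
    return rc == 0 ? 0 : 1;
}
```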
The thing that still puzzles me is how we determine which workload gets which capabilities.
On Plan 9, I took a lot of glee in rfork()ing my way into a tightly curated namespace that disallowed subsequent use of mount and bind, but that was a deliberate decision under the philosophy of the principle of least privilege. Is that the other half of what you're after?
Unlike money in a wallet, capabilities can be composed. You can take an existing capability, and tack a filter onto it, and use that composed capability. For example, you need to do backups.
Take the system administrator's access to everything, but put it through a read-only filter, and pass that to the backup program.
Instead of using the "File Open" dialog in the Win32 GUI API and then a "file open" which could go anywhere, replace it with a call to the file-open "powerbox", which hands a file back to the application. The user could even make it read-only, append-only, etc.
As far as users are concerned, it doesn't have to be hard. We just have to learn some new command line tricks.
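Here is a toy, userspace-only sketch of that composition idea (names and structure invented for the example; a real system would enforce this at the OS or IPC layer): the backup job is handed a capability that has been run through a read-only filter, so the write authority simply is not there to abuse.

```c
/* Capability attenuation sketch: compose an existing capability with a
   read-only filter before delegating it. Purely illustrative. */
#include <stdio.h>
#include <stddef.h>

typedef struct file_cap file_cap;
struct file_cap {
    void *state;                                               /* backing object  */
    long (*read)(file_cap *self, char *buf, size_t n);         /* granted         */
    long (*write)(file_cap *self, const char *buf, size_t n);  /* NULL = withheld */
};

static long stdio_read(file_cap *self, char *buf, size_t n) {
    return (long)fread(buf, 1, n, (FILE *)self->state);
}
static long stdio_write(file_cap *self, const char *buf, size_t n) {
    return (long)fwrite(buf, 1, n, (FILE *)self->state);
}

/* The "filter": same object, write authority stripped. */
static file_cap read_only(file_cap full) {
    file_cap ro = full;
    ro.write = NULL;
    return ro;
}

/* The backup job only ever sees the attenuated capability. */
static void run_backup(file_cap src) {
    char buf[256];
    long n;
    while ((n = src.read(&src, buf, sizeof buf)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);  /* stand-in for "copy to backup" */
    /* src.write is NULL here: the authority was never delegated. */
}

int main(void) {
    FILE *f = fopen("/etc/hostname", "r");  /* arbitrary readable file */
    if (f == NULL)
        return 1;
    file_cap full = { f, stdio_read, stdio_write };
    run_backup(read_only(full));
    fclose(f);
    return 0;
}
```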
Right. On Plan 9, this kind of composition usually took the form of rfork()ing to inherit the parent process's namespace, mutating it in a handful of ways deemed useful, and then exec()ing the new image. The mechanics are clear to me, I feel.
What isn't clear are the logistics: what, concretely, do we need to undertake in order to avoid a security posture lateral step from ambient authority to object-capabilities? Importantly, how do we make this ergonomic?
Doing it in a GUI should be fairly trivial, it's just update/replacement of the file pickers with a few more options.
As far as the command line goes... that's the problem. We don't have any standardization for command line syntax. My first instinct is to try to impose it, but that seems like a "boil the ocean" approach.
Next up, something like pipes? Maybe a new set of delimiters to pipe capabilities, and allow testing of them in some fairly transparent way?
I strongly suspect that a good GUI implementation will look like how file choosers on RISC OS work, and the command line environment might be inspired by both Plan 9's namespaces and how data definitions (and their accompanying data control blocks) work in JES2 JCL on MVS.
My only concern is that, without well-thought-out access controls, the only people who will care will be nerds or folks with a high frustration tolerance for poorly-considered human-computer interaction.
Sorry, I was imprecise. If a principal starts a new process that indiscriminately inherits the same capabilities of its parent, the security posture is no different from what we have today.
My experience with this on Plan 9 (while not an object-capability operating system, it gets IMO most of the way there in the ways that matter in order to experiment) was that it worked a treat for long-running services, but it was a little hard to make ergonomic for the end user sitting at a terminal.
I played with augmenting factotum with a form of mandatory access control, whereby I could ask the hostowner's factotum to mount and bind things into a freshly rfork()'d namespace on an as-needed basis, but the feeling was that it could build up access delegation fatigue pretty quickly.
plan9 is a very inspiring system but its security model is in no way similar to the object-capability model. well, one way: it has file descriptors. but can you even pass a file descriptor across a file descriptor in plan9 the way scm_rights lets you do in bsd and linux?
plash did get a reasonably ergonomic pola shell working under linux, but unfortunately it has bitrotted. fundamentally the insight is that most of the time when you want a process to open a file you have to tell it which file to open. if you give it an open file descriptor instead of a string, it doesn't need the authority to open existing files itself. just one step further down the path where the shell does the globbing, not the program launched
in theory you could do the same thing with per-process filesystem namespaces as you are describing, but it seems like it would take a lot more work to make it reliable and usable
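a tiny posix sketch of that plash-style idea (mine, not taken from plash itself): the launcher resolves the name and opens the file, and the "confined" part of the program only ever receives an already-open descriptor, never the authority to open paths on its own.

```c
/* The launcher opens; the worker only gets a descriptor. Illustrative. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* "Confined" worker: can read from the fd it was handed, nothing more. */
static void worker(int fd)
{
    char buf[512];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        write(STDOUT_FILENO, buf, (size_t)n);
}

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    /* The "shell" side does the name resolution (and could do globbing). */
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror(argv[1]);
        return 1;
    }
    worker(fd);  /* in a real design this would cross a process boundary */
    close(fd);
    return 0;
}
```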
"but can you even pass a file descriptor across a file descriptor in plan9 the way scm_rights lets you do in bsd and linux?" No, the mechanism was even simpler. We had the #s device (usually bound as /srv) for this, whereby you could drop a file in /srv containing a single integer representing a file descriptor, and other processes could open() that file to get another reference to that file descriptor.
My mandatory access control experiments abused the fact that I could have open file descriptors and barely anything bound in the namespace by the time I exec()ed what I wanted to run. What files did those file descriptors reference? You didn't need to know, and you couldn't see them.
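For anyone who hasn't seen it, the /srv idiom being described looks roughly like this in Plan 9 C (the service name and path are made up; see the srv(3) manual for the real details):

```c
/* Plan 9 C sketch: publish an open file descriptor via /srv. */
#include <u.h>
#include <libc.h>

void
main(void)
{
	int fd, srv;

	fd = open("/tmp/some-resource", ORDWR);
	if(fd < 0)
		sysfatal("open: %r");

	/* Creating a file in /srv and writing the fd number publishes it;
	   ORCLOSE removes the posting when this descriptor is closed. */
	srv = create("/srv/examplesrv", OWRITE|ORCLOSE, 0600);
	if(srv < 0)
		sysfatal("create /srv: %r");
	fprint(srv, "%d", fd);

	/* Another process can now open("/srv/examplesrv", ORDWR) and get
	   its own reference to the same open file. */
	sleep(60*1000);		/* keep the posting alive for a minute */
	exits(nil);
}
```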
thanks for explaining! that sounds like, as long as the confined process couldn't mount more devices, it could definitely work, with a whole new set of core utilities that looked at predetermined filenames in the filesystem rather than taking filenames on the command line
the #s mechanism sounds considerably less horrific than scm_rights
Yeah, that was the subtext that I left unsaid through this. rfork(RFNOMNT|RFCNAMEG) got you both a completely clean namespace and a namespace that denied additional mounts (as well as dereferencing pathnames beginning with #, as in #s).
The Plan 9 primitives were refreshingly simple and consistent.
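For readers who haven't played with this, a minimal Plan 9 C sketch of the curated-namespace pattern might look like the following (paths and the target command are invented; the stricter rfork(RFCNAMEG|RFNOMNT) variant mentioned above starts from an empty namespace and forbids further mounts, which takes more setup):

```c
/* Plan 9 C sketch: give a child a private, curated namespace, then exec. */
#include <u.h>
#include <libc.h>

void
main(void)
{
	int pid;

	pid = rfork(RFPROC|RFNAMEG|RFFDG|RFENVG);
	if(pid < 0)
		sysfatal("rfork: %r");
	if(pid == 0){
		char *args[] = {"ls", "/tmp", nil};

		/* Child: private copy of the parent's namespace. Curate it
		   before running the tool. */
		if(bind("/usr/glenda/tmp/scratch", "/tmp", MREPL) < 0)
			sysfatal("bind: %r");
		unmount(nil, "/srv");	/* e.g. hide posted services */
		exec("/bin/ls", args);
		sysfatal("exec: %r");
	}
	waitpid();
	exits(nil);
}
```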
i see, thanks! were you working at cisco at the time? i remember they were having a hard time sourcing video cards with plan9 support because nobody had added pci video card support to it yet
i feel like plan9 still tends to pass a lot of ambient authority to new processes by default
No, I've never worked for Cisco, but I did have my own firewall product at one point. (: That's another story for another day though.
Plan 9 is, at its core, still an ambient authority system, and I will never try to misrepresent it otherwise. That said, I've found it does an admirable job emulating what a "familiar" (in the sense of not looking completely alien to someone who only knows ambient authority) object-capability system could look like, kind of like how you can emulate the free monad in Ruby despite it not being a functional programming language.
This is possible principally because the mantra of Plan 9 is that everything is a filesystem: if you can bind it into your namespace, you can access it. As a corollary, if you can't bind it into your namespace, you can't access it. It makes shaping one's ambient authority a remarkably productive endeavor.
> [You] are reasonably sure that the crates of dynamite that used to make up the foundations and structure of the fort were properly inspected, and even updated on a regular basis.
Who could ever have predicted that the head of maintenance would leave his cigar on top of the newest crate of dynamite he had just carried in to update the foundation? And that the slight tremor of the closing door would be enough to let it roll towards the fuse of the closest stick, that needed to stick out of the box per company requirement? Which then caused, due to some not-fully-understood thermochemical reaction, a series of unfortunate events?
A rigorous post-mortem analysis will reveal that what we need to avoid a repeat of this event are bigger, vibration-proof, double doors with additional draft prevention, as this was determined to be a possible future path of failure.
And may the wrath of god strike the unholy communists.
the people making the decisions know they need computer security, but they can't tell the difference between real computer security and security theater. moreover, secops con men and edr vendors are better at security theater than approaches that could potentially provide actual security, because they're constantly active, constantly detecting and stopping attacks, just like antivirus vendors. pola doesn't have that
as long as nobody serious is attacking you, security theater is indistinguishable from real security for anyone who isn't technical (and for technical people who are indoctrinated in pci or secops snake oil). thinkpieces like this will say bullshit like 'CrowdStrike worked like clockwork' and focus on anti-bourgeois nonsense that management will correctly ignore. and credit unions and hospitals will keep installing wide-open vulnerabilities like clownstrike and getting popped
it's like how western medicine bled sick people to death for 20 centuries, and how people keep using homeopathy
i don't know enough about what happens inside the nsa to know whether your conspiracy theory is true or not. certainly they did work hard to backdoor ec_drbg and weaken exportable ssl, and bernstein makes a good case that they weakened des back in the 70s, and it's well known that internet protocols before ssl didn't include encryption because the dod sponsors didn't want them to. so lots of similar conspiracy theories were actually true. but i don't have convincing evidence for this one. evidently ignorance is a sufficient cause
> the people making the decisions know they need computer security, but they can't tell the difference between real computer security and security theater.
Hit the nail on the head.
Also, real security being something you do rather than something you can buy is a big thing; these people think you can buy your way out of properly understanding and designing systems. Vendors have taught them to think this way across all software (not just security), so it's hard to blame them, but it does make this very hard to fix.
What baffles me is just how many IT personnel in so many organizations around the world apparently just blindly hit the "Deploy this zero-day update to all production systems without any testing" button instead of the "Test this update on our test systems first" button.
Or maybe even just looking up the update online to see whether any problems had been reported before deploying it wholesale across their organizations.
Are these the same IT people whose systems all went offline in the left-pad incident because they 'accidentally' set their production servers to be dependent on a third-party repository?
I've worked at some low-budget places that didn't have much in the way of a vetting process, but even there auto-deploying unknown updates to third-party dependencies into production was always a capital N No.
They didn’t. Most orgs run with at least an n-1 Sensor version, with test groups for latest. This was essentially a definitions update pushed by Crowdstrike to all customers, regardless of the deployed sensor version.