I'm confused as to how this issue is so widespread in the first place. I'm unfamiliar with how Crowdstrike works; do organizations really have no control over when these updates occur? Why can't these airlines just apply the updates in dev first? Is it the organization's fault, or does Crowdstrike just deliver updates like this with no control? If that's just how they do it, how do they get away with it?
Can somebody summarize what CrowdStrike actually is/does? I can't figure it out from their web page (they're an "enterprise" "security" "provider", apparently). Is this just some virus scanning software? Or is it some bossware/spyware thing?
It's both. Antivirus along with spyware to also watch for anything the user is doing that could introduce a threat, such as opening a phishing email, posting on HN, etc.
It's not really up to the companies. In this day and age, everyone is a target for ransomware, so every company with common sense holds insurance against a ransomware attack. One of the requirements of the insurance is that you have to have monitoring software like Crowdstrike installed on all company machines. The company I work for fortunately doesn't use Crowdstrike, but we use something similar called SentinelOne. It's very difficult to remove, and it's a fireable offense if you manage to.
No doubt mandated so that the NSA can have a backdoor to everything just by having a deal with each one of those providers.
I think there's a Ben Franklin quote that applies here. "Those who would give up essential liberty, to purchase a little temporary safety, deserve neither liberty nor safety."
It is kinda implied throughout SP 800-171r3 that EDRs will make meeting the requirements easier, although they are only specifically mentioned in section 03.04.06
Most corporate places I've encountered over the last N years mandate one kind of antivirus/spyware combo or another on every corporate computer. So it'd be pretty much every major workplace.
Just because everyone does it doesn't mean it's not a dumb idea. Everyone eats sugar.
If the average corporation hates/mistrusts their employees enough to add a single point of failure to their entire business and let a 3rd party have full access to their systems, then well, they reap what they sow.
I think you have to look beyond the company. In my experience, even the people implementing these tools hate them and rarely have some evil desire to spy on their employees and slow down their laptops. But without them as part of the IT suite, the company can't tick the EDR or AV box, pass a certain certification, land a certain type of customer, etc. It is certainly an unfortunate cycle.
This goes way higher than the average corporation.
This is companies trying desperately to deliver value to their customer at a profit while also maintaining SOC 2, GDPR, PCI, HIPAA, etc. compliance.
If you're not a cybersecurity company, a company like CrowdStrike saying: 'hey, pay us a monthly fee and we'll ensure you're 100% compliant _and_ protected' sounds like a dream come true. Until today, it probably was! Hell, even after today, when the dust settles, still probably worth it.
Sounds like the all too common dynamic of centralized top-down government/corporate "security" mandates destroying distributed real security. See also TSA making me splay my laptops out into a bunch of plastic bins while showing everyone where and how I was wearing a money belt. (I haven't flown for quite some time, I'm sure it's much worse now)
There's a highly problematic underlying dynamic where 364 days out of the year, when you talk about the dangers of centralized control and proprietary software, you get flat out ignored as being overly paranoid and even weird (don't you know that "normal" people have zero ability or agency when it comes to anything involving computers?!). Then something like this happens and we get a day or two to say "I told you so". After which the managerial class goes right back to pushing ever-more centralized control. Gotta check off those bullet point action items.
They fixed that. Now you can fly without taking your laptop out, or taking your shoes and belt off. You just have to give them fingerprints, a facial scan and an in-person interview. They give you a little card. It's nifty.
My response was intended as sarcasm. But eventually, I don't think it will be a two-tiered system. You simply won't be allowed to fly without what is currently required for precheck.
And fwiw, I don't think the strong argument against precheck has to do with social class... it's not terribly expensive, and anyone can do it. It's just a further invasion of privacy.
Precheck is super cheap, it's like less than $100 once per 5 years. Yes, it is an invasion of privacy, but I suspect the government already has all that data anyway many times over.
> showing everyone where and how I was wearing a money belt
I only fly once every couple years, but I really hated emptying my pockets into those bins. The last time I went through, the agent suggested I put everything in my computer bag. That worked a lot better.
Last time I flew, in Sweden, the agent was angry about having to do his job, so he slipped my passport off the tray so that I'd lose it. Lucky for me, I saw him do it.
At my work in the past year or two they rolled out Zscaler onto all of our machines, which I think is supposed to be doing a similar thing. All it's done is cause us regular network issues.
I wonder if they also have the capability to brick all our Windows machines like this.
Zscaler is awful. It installs a root cert to act as a man-in-the-middle traffic snooper. It probably does some other stuff too, but all your TLS traffic is snooped by Zscaler. It is creepy software, IMO.
Ah, yeah, they gave us zscaler not too long ago. I wondered if it was logging my keystrokes or not, figured it probably was because my computer slowed _way_ down ever since it appeared.
Zscaler sounds like it would be a web server. Just looked it up: "zero trust leader". The descriptiveness of terms these days... if you say it gets installed on a system, how is that having zero trust in them? And what do they do with all this nontrust? Meanwhile, Wikipedia says they offer "cloud services", which is possibly even more confusing for what you describe as client software
Somebody upthread pointed out that it installs a root CA and forces all of your HTTPS connections to use it. I verified that he's correct - I'm on Hacker News right now with an SSL connection that's verified by "ZScaler Root CA", not Digicert.
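If you want to check this on your own machine, a quick way (assuming the openssl CLI is installed and your traffic actually goes through the proxy) is to look at who issued the certificate you're being served:

    openssl s_client -connect news.ycombinator.com:443 -servername news.ycombinator.com </dev/null 2>/dev/null | openssl x509 -noout -issuer

If the issuer comes back as something like "Zscaler Root CA" rather than a public CA, your TLS connections are being intercepted and re-signed by the proxy.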
ZScaler has various deployment layouts. Instead of the client side TLS endpoint, you can also opt for the "route all web traffic to ZScaler cloud network" which office admins love because less stuff to install on the clients. The wonderful side effect is that some of these ZScaler IPs are banned from reddit, Twitter, etc, effectively banning half the company.
Zero trust means that there is no implicit trust whether you’re accessing the system from an internal protected network or from remote. All access to be authenticated to the fullest. In theory you should be doing 2FA every time you log in for the strictest definition of zero trust.
They are a SASE provider; I assume they offer a BeyondCorp-style product that lets companies move their apps off a private VPN and allow access over the public internet. They probably have a white paper on how they satisfy zero-trust architecture.
See the recent waves of ransomware encrypting drives and similar attacks. Those cause real costs as well, and this outage can be blamed on CrowdStrike without losing face. If you are in the news for phished data, or have an outage because all your data got encrypted, blaming somebody else is hard.
Well it’s not aimed at IT people and programmers (though the policies still apply to them), it’s aimed at everyone else who doesn’t understand what a phishing email looks like.
These comments make me think that both you and the commenter you replied to have never read 1984.
It's anti-totalitarian propaganda. IIRC there is not much about how Airstrip One came to be; it's kinda always been there, because the state controls history. People did not ask for the telescreens; they accept them.
The system in the book is so strongly based on heavy-handed coercion and manipulation that I actually find it psychologically implausible (though, North Korea...). The strength of the book, I would say, is not its plausibility, but the intensity of the nightmare and the quality of the prose that describes it.
So there's the control freak at the top who made this decision, and then there are the front lines who are feverishly booting into safe mode and removing the update, and then there are the people who can't get the data they need to safely perform surgeries.
So yeah, screw 'em. But let's be specific about it.
I think the question this raises is why critical systems like that have unrestricted 3rd party access and are open to being bricked remotely. And furthermore, why safety critical gear has literally zero backup options to use in case of an e.g. EMP, power loss, or any other disruption. If you are in charge of a system where it crashing means that people will die, you are a complete moron to not provide multiple alternatives in such a case and should be held criminally liable for your negligence.
Agreed on all points, but if we're going to start expecting people to do that kind of diligence, re: fail-safes and such (and we should), then we're going to have to stop stretching people as thin as we tend to, and we're going to have to give them more autonomy than we tend to.
Like the kind of autonomy that lets them uninstall Crowdstrike. Because how can you be responsible for a system which could start running different code at any time?
What I don't get is why nobody questions how an OS that needs all this third-party shit to function and be compliant gets into critical paths in the first place??
This kind of thing is required by FedRAMP. Good luck finding a company without endpoint management software that is legally allowed to be a US government vendor.
If you stick to small privately held companies you might be able to avoid endpoint management, but that's it... any big brand you can think of is going to be running this or something similar on their machines -- because they're required to.
Presumably endpoint detection & response (EDR) agents need to do things like dynamically fetch new malware signatures at runtime, which is understandable. But you'd think that would be treated as new "content", something they're designed to handle in day-to-day operation, hence very low risk.
That's totally different to deploying new "code", i.e. new versions of the agent itself. You'd expect that to be treated as a software update like any other, so their customers can control the roll out as part of their own change management processes, with separate environments, extensive testing, staggered deployments, etc.
I wonder if such a content vs. code distinction exists? Or has EDR software gotten so complex (e.g. with malware sandboxing) that such a distinction can't easily be made any more?
In any case, vendors shouldn't be able to push out software updates that circumvent everyone's change management processes! Looking forward to the postmortem.
My guess is it probably was a content update that tickled some lesser-trodden path in the parser/loader code, or created a race condition in the code which led to the BSOD.
Even if it’s ‘just’ a content update, it probably should follow the rules of a code update (canaries, pre-release channels, staged rollouts, etc).
CrowdStrike is an endpoint detection and response (EDR) system. It is deeply integrated into the operating system. This type of security software is very common on company-owned computers, and often has essentially root privileges.
Well, actually more than root. Even for an administrator user on Windows, it’s pretty hard to mess with things and get into BSOD. CrowdStrike has these files as drivers (as indicated by .sys file extension) which run in the kernel mode.
Companies operate on a high level of fear and trust. This is the security vendor, so in theory they want those updates rolled out as quickly as possible so that they don't get hacked. Heh.
These updates happen automatically and, as far as I can tell, there is no option to turn this feature off. From a security perspective, the vendor will always want you to be on the most recent software to protect from attack holes that may open up by operating on an older version. Your IT department will likely want this as well to avoid culpability. Just my 2 observations; whether it is the right way, or whether CS is effective at what it does, no idea.
Crowdstrike did this to our production linux fleet back on April 19th, and I've been dying to rant about it.
The short version was: we're a civic tech lab, so we have a bunch of different production websites made at different times on different infrastructure. We run Crowdstrike provided by our enterprise. Crowdstrike pushed an update on a Friday evening that was incompatible with up-to-date Debian stable. So we patched Debian as usual, everything was fine for a week, and then all of our servers across multiple websites and cloud hosts simultaneously hard crashed and refused to boot.
When we connected one of the disks to a new machine and checked the logs, Crowdstrike looked like the culprit, so we manually deleted it and the machine booted; we tried reinstalling it and the machine immediately crashed again. OK, let's file a support ticket and get an engineer on the line.
Crowdstrike took a day to respond, and then asked for a bunch more proof (beyond the above) that it was their fault. They acknowledged the bug a day later, and weeks later produced a root cause analysis: they hadn't covered our scenario (Debian stable running version n-1, I think, which is a supported configuration) in their test matrix. In our own post mortem there was no real ability to prevent the same thing from happening again -- "we push software to your machines any time we want, whether or not it's urgent, without testing it" seems to be core to the model, particularly if you're a small IT part of a large enterprise. What they're selling to the enterprise is exactly that they'll do that.
Oh, if you are also running Crowdstrike on linux, here are some things we identified that you _can_ do:
- Make sure you're running in user mode (eBPF) instead of kernel mode (kernel module), since it has less ability to crash the kernel. This became the default in the latest versions and they say it now offers equivalent protection. (See the example commands after this list.)
- If your enterprise allows, you can have a test fleet running version n and the main fleet run n-1.
- Make sure you know in advance who to cc on a support ticket so Crowdstrike pays attention.
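For reference, here's a rough sketch of checking and switching the sensor backend, extrapolated from the falconctl command in CrowdStrike's KB quoted further down this thread (exact flags may vary between sensor versions, so treat this as an assumption rather than gospel):

    sudo /opt/CrowdStrike/falconctl -g --backend           # report the current backend, if your sensor version supports -g for this
    sudo /opt/CrowdStrike/falconctl -s --backend=bpf       # user mode (eBPF)
    sudo /opt/CrowdStrike/falconctl -s --backend=kernel    # kernel module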
I know some of this sounds obvious, but it's easy to screw up organizationally when EDR software is used by centralized CISOs to try to manage distributed enterprise risk -- like, how do you detect intrusions early in a big organization with lots of people running servers for lots of reasons? There's real reasons Crowdstrike is appealing in that situation. But if you're the sysadmin getting "make sure to run this thing on your 10 boxes out of our 10,000" or whatever, then you're the one who cares about uptime and you need to advocate a bit.
I would wager that even most software developers who understand the difference between kernel and user mode aren't going to be aware there is a "third" address space, which is essentially a highly-restricted and verified byte code virtual machine that runs with limited read-only access to kernel memory
Not that it changes your point, and I could be wrong, but I'm pretty sure eBPF bytecode is typically compiled to native code by the kernel and runs in kernel mode with full privileges. Its safety properties entirely depend on the verifier not having bugs.
FWIW there are like a billion devices out there with CPUs that can run Java bytecode directly - it's hardly experimental. For example, Jazelle for ARM was very widely deployed.
Depending on what kernel I'm running, CrowdStrike Falcon's eBPF will fail to compile and execute, then fail to fall back to their janky kernel driver, then inform IT that I'm out of compliance. Even LTS kernels in their support matrix sometimes do this to me. I'm thoroughly unimpressed with their code quality.
JackC mentioned in the parent comment that they work for a civic tech lab, and their profile suggests they’re affiliated with a high-profile academic institution. It’s not my place to link directly, but a quick Google suggests they do some very cool, very pro-social work, the kind of largely thankless work that people don’t get into for the money.
Perhaps such organizations attract civic-minded people who, after struggling to figure out how to make the product work in their own ecosystem, generously offer high-level advice to their peers who might be similarly struggling.
It feels a little mean-spirited to characterize that well-meaning act of offering advice as “insane.”
This is gold. My friend and I were joking around that they probably did this to macOS and Linux before, but nobody gave a shit since it's... macOS and Linux.
(re: people blaming it on windows and macos/linux people being happy they have macos/linux)
I don’t think people are saying that causing a boot loop is impossible on Linux, anyone who knows anything about the Linux kernel knows that it’s very possible.
Rather it’s that on Linux using such an invasive antiviral technique in Ring 0 is not necessary.
On Mac I’m fairly sure it is impossible for a third party to cause such a boot loop due to SIP and the deprecation of kexts.
I believe Apple prevented this also for this exact reason. Third-parties cannot compromise the stability of the core system, since extensions can run only in user-space.
I might be wrong about it, but I feel that malware with root access can wreak quite a havoc. Imagine that this malware decides to forbid launch of every executable and every network connection, because their junior developer messed up with `==` and `===`. It won't cause kernel crash, but probably will render the system equally unusable.
Root access is a separate issue, but user-space access to sys-level functions is something Apple has been slowly (or quickly on the iOS platform, where they are trying to stop apps snooping on each other) clamping down on for years.
On both macOS and Linux, there's an increasingly limited set of things you can do from root. (but yeah, malware with root is definitely bad, and the root->kernel attack surface is large)
Malware can do tons of damage even with only regular user access, e.g. ransomware. That’s a different problem from preventing legitimate software from causing damage accidentally.
To completely neuter malware you need sandboxing, but this tends to annoy users because it prevents too much legitimate software. You can set up Mac OS to only run sandboxed software, but nobody does because it’s a terrible experience. Better to buy an iPad.
> but nobody does because it’s a terrible experience
To be fair, all apps from the App Store are sandboxed, including on macOS. Some apps that want/need extra stuff are not sandboxed, but still use Gatekeeper and play nice with SIP and such.
FWIW, according to Activity Monitor, somewhere around 2/3 to 3/4 of the processes currently running on my Mac are sandboxed.
Terrible dev experience or not, it's pretty widely used.
It depends on your setup. If you actually put in the effort to get apparmor or selinux set up, then root is meaningless. There have been so many privilege escalation exploits that simply got blocked by selinux that you should worry more about setting selinux up than some hypothetical exploit.
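To be concrete about "put in the effort": the bare minimum on an SELinux distro is making sure it is actually enforcing rather than merely installed, e.g.:

    getenforce                                                           # should print "Enforcing"
    sudo setenforce 1                                                    # enforce until the next reboot
    sudo sed -i 's/^SELINUX=.*/SELINUX=enforcing/' /etc/selinux/config   # persist across reboots

Writing or tightening the actual policies for your workloads is the real work, but without enforcing mode none of it matters.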
It's not unnecessary, it's harder (no stable kernel ABI, and servers won't touch DKMS with a ten foot pole).
On the other hand you might say that lack of stable kernel ABI is what begot ebpf, and that Microsoft is paying for the legacy of allowing whatever (from random drivers to font rendering) to run in kernel mode.
I’ve had an issue with it before in my work MacBook. It would just keep causing the system to hang, making the computer unusable. Had to get IT to remove it.
> we push software to your machines any time we want, whether or not it's urgent, without testing it
Do they allow you to control updates? It sounds like what you want is for a small subset of your machines using the latest, while the rest wait for stability to be proven.
This is what happened to us. We had a small fraction of the fleet upgraded at the same time and they all crashed. We found the cause and set a flag to not install CS on servers with the latest kernel version until they fixed it.
I wonder if the changes they put in behind the scenes for your incident on Linux saved Linux systems in this situation and no one thought to see if Windows was also at risk.
So in a nutshell it is about corporations pushing for legislation which compels usage of their questionable products, because such products enable management to claim compliance when things go wrong, even when the thing that goes wrong is the compliance-ensuring product itself.
CrowdStrike Falcon may ship as a native package, but after that it completely self-updates to whatever they think you should be running. Often, I have to ask IT to ask CS to revert my version because the "current" one doesn't work on my up-to-date kernel/glibc/etc. The quality of code that they ship is pretty appalling.
Thanks for confirming. Is there any valid reason these updates couldn't be distributed through proper package repositories, ideally open repositories (especially data files which can't be copyrightable anyway)?
Yes, but that puts a lot of complexity on the end user, and you end up with:
1. A software vendor that is unhappy about the speed they can ship new features at
2. Users that are unhappy the software vendor isn't doing more to reduce their maintenance burden, especially when they have a mixture of OS, distros and complex internal IT structures
IMO the default package managers on both Linux and Windows have failed to provide a good solution for remote updates, so everyone reinvents the wheel with custom mini package managers + dedicated update systems.
This seems to be misinformation? The CrowdStrike KB says this was due to a Linux kernel bug.
---
Linux Sensor operating in user mode will be blocked from loading on specific 6.x kernel versions
Published Date: Apr 11, 2024
Symptoms
In order to not trigger a kernel bug, the Linux Sensor operating in user mode will be prevented from loading on specific 6.x kernel versions with 7.11 and later sensor versions.
Applies To
Linux sensor 7.11 in user mode will be prevented from loading:
For Ubuntu/Debian kernel versions:
6.5 or 6.6
For all distributions except Ubuntu/Debian, kernel versions:
6.5 to 6.5.12
6.6 to 6.6.2
Linux sensor 7.13 in user mode will be prevented from loading:
For all distributions except Ubuntu/Debian, kernel versions:
6.5 to 6.5.12
6.6 to 6.6.2
Linux Sensors running in kernel mode are not affected.
Resolution
CrowdStrike Engineering identified a bug in the Linux kernel BPF verifier, resulting in unexpected operation or instability of the Linux environment.
In detail, as part of its tasks, the verifier backtracks BPF instructions from subprograms to each program loaded by a user-space application, like the sensor. In the bugged kernel versions, this mechanism could lead to an out-of-bounds array access in the verifier code, causing a kernel oops.
This issue affects a specific range of Linux kernel versions, that CrowdStrike Engineering identified through detailed analysis of the kernel commits log. It is possible for this issue to affect other kernels if the distribution vendor chooses to utilize the problem commit.
To avoid triggering a bug within the Linux kernel, the sensor is intentionally prevented from running in user mode for the specific distributions and kernel versions shown in the above section
These kernel versions are intentionally blocked to avoid triggering a bug within the Linux kernel. It is not a bug with the Falcon sensor.
Sensors running in kernel mode are not affected.
No action required, the sensor will not load into user mode for affected kernel versions and will stay on kernel mode.
For Ubuntu 22.04 the following 6.5 kernels will load in user mode with Falcon Linux Sensor 7.13 and higher:
6.5.0-1015-aws and later
6.5.0-1016-azure and later
6.5.0-1015-gcp and later
6.5.0-25-generic and later
6.5.0-1016-oem and later
If for some reason the sensor needs to be switched back to kernel mode:
Switch the Linux sensor backend to kernel mode
sudo /opt/CrowdStrike/falconctl -s --backend=kernel
At one point overnight airlines were calling for an "international ground stop for all flights globally". Planes in the air were unable to get clearance to land or divert. I don't believe such a thing has ever happened before except in the immediate aftermath of 9/11.
A pilot WILL land, even without clearance. They're not going to crash their own plane. Either way, ATC has fallback procedures and can just use radio to communicate and manage everything manually. Get all the planes on the ground in safe order and then wait for a fix before clearing new takeoffs. https://aviation.stackexchange.com/questions/43379/is-there-...
Planes always get landing clearance via radio. "Planes in the air were unable to get clearance to land or divert" strongly suggests that the radios themselves were not working if it's actually true.
I wouldn't expect emergency rooms and 911 to stop working either, but here we are, so until someone says otherwise, I'm assuming some ATCs went down too.
I imagine the flight planning software they use was affected (so their ability to coordinate with other airport's ATC), but not their radio systems or aircraft radar (nearly all radar systems I've worked with are run on Linux, and are hardened to the Nth degree). Been out of the game for 12 years though, so things have likely changed.
The Tenerife disaster (second-deadliest aviation incident in history, after 9/11) was ultimately caused by chaotic conditions due to too many airplanes having to be diverted and land at an alternate airport that wasn't equipped to handle them comfortably.
I'd argue that Tenerife was due to taking off (in bad weather), not landing. But of course, a bunch of planes landing at the same airport without ATC sounds quite dangerous.
There were a lot of contributing causes, but it wouldn't have happened if not for the fact that Tenerife North airport was massively overcrowded due to Gran Canaria airport being suddenly closed (for unrelated reasons) and flights forced to divert.
The issue wasn't with landing specifically; I'm just using it as a general example of issues caused by havoc situations in aviation.
Pilots know where there are other places to land, e.g. there are a lot of military strips and private airfields where some craft can land, depending on size.
I would also point out that the backup plan (radio and binoculars) is not only effective but also extremely cheap and easy to keep ready in the control tower at all times.
Why does this tool exist and must be installed on servers? Well, Windows OS design definitely plays a role here.
Why does this software run in a critical path that can cause the machine to BSOD? This is where the OS is a problem. If it is fragile enough that a bad service like this can cause it to crash in an unfixable state (without manual intervention), that’s on Windows.
> Why does this tool exist and must be installed on servers?
Fads, laziness, and lack of forethought. This tool didn't exist a few years ago. Nobody stopped IT departments worldwide and said "hey, maybe you shouldn't be auto-rolling critical software updates without testing, let alone doing this via a third-party tool with dubious checks."
This could have happened on any OS. Auto deployment is the root problem.
In this very thread there was a report of a Debian Linux fleet being kernel-crashed in exactly the same scenario by exactly the same malware a few months ago.
So the only blame Windows can take is its widespread usage, compared to Debian.
Yes, the Linux device driver has many of the same issues (monolithic drivers running in kernel space/memory). I’m not sure what the mitigations were in that case, but I’d be interested to know.
But we both know this isn’t the only model (and have commented as such in the thread). MacOS has been moving away from this risk for years, largely to the annoyance of these enterprise security companies. The vendor used by an old employer blamed Apple for their own inability to migrate their buggy EDR program to the new version of macOS. So much so that our company refused to upgrade for over 6 months, and then it was begrudgingly allowed.
A tool that has full control of the OS (which is apparently required by such security software) fundamentally must have a way to crash the system, and continue to do so at every restart.
This really should be a hell no. Perhaps Microsoft's greatest claim to fame is their enduring ability to quickly and decisively react to security breaches with updates. Their process is extremely public and hasn't significantly changed in decades.
If your company can't work with Microsoft's process, your company is the problem. Every other software company in the last forty years has figured it out.
I don't blame Windows, but do blame these systems for running Windows, if that makes sense.
I imagined a lot of this ran on some custom or more obscure and hardened specialty system. One that would generally negate the need for antiviruses and such. (and obviously, no, not off the shelf Linux/BSD either)
Legit question, not trolling. Android is the next biggest OS used to run a single application like POS, meter readers, digital menus, navigation systems. It might be the top one by now. It's prone to all the same 'spyware' drawbacks and easier to set up than "Linux".
It would be better than Windows for sure. You’ve got A/B updates, verified boot, selinux, properly sandboxed apps and a whole range of other isolation techniques.
For something truly mission critical, I’d expect something more bespoke with smaller complexity surface. Otherwise Android is actually not a bad choice.
Any sort of Immutable OS would be better for critical systems like this. The ability to literally just rollback the entire state of the system to before the update would have gotten things back online as fast as a reboot...
Something like Android Lollipop from 2014 supports all the latest techniques. It's likely there are no security issues left in Lollipop by now.
A lot of the new forced updates on Android are there to prevent some apps from being used to spy on other apps, steal passwords, backdoor notifications, etc., but you don't need that if it's just a car radio.
Around the same time the news showed up here, the WeChat TikTok clone (Moments, I think, in English) was showing animations of the US air traffic maps and how the tech blackout affected them. From those images I could tell it was huge.
Crowdstrike though is not part of a system of engineered design.
It’s a half-baked rootkit sold as a fig leaf for incompetent IT managers so they can implement ”best practices” on their company's PCs.
The people purchasing it don’t actually know what it does; they just know it’s something they can invest their cybersecurity budget into and have an easy way to fulfill their ”implement cybersecurity” KPIs without needing to do anything themselves.
Exactly, and this is why I've heard the take that the companies who integrate this software need to be held responsible for not having proper redundancy, and while it's a fine take, we need to keep assigning blame squarely to Crowdstrike and even Microsoft. They're the companies that drum the beat of war every chance they get, scaring otherwise reasonable people into thinking that the cyber world is ending and only their software can save them, who push stupid compliance and security frameworks, and who straight-up lie to their prospects about the capabilities and stability of their product. Microsoft sets the absolutely dog-water standard of "you get updates, you can't turn them off, you can't stagger them, you can't delay them, you get no control, fuck you".
Perhaps true in some cases, but in regulated industries (for example, federally regulated banks) a tool like CrowdStrike addresses several controls that, if left unaddressed, result in regulatory fines. Regulated companies rarely employ homegrown tools due to maintenance risk. But as we now see, these rootkit-style or even agent-based security tools bring their own risks.
I’m not arguing against the need to follow regulations. I’m not familiar with what specifically is required of banks. All I’m saying is that Crowdstrike sucks as a specific offering. I’m sure there are worse ways to check the boxes (there always are), but that’s not much of an endorsement.
My rant is from a perspective in an org that most certainly was not a bank (b2b software/hardware), and there was enough of a ruckus to tell it was not mandated there by any specific regulation (hence incompetence).
A properly used endpoint protection system is a powerful tool for security.
It's just that you can game compliance by claiming you have certain controls handled by purchasing Crowdstrike... then leave it not properly deployed and without an actual security team in control of it (maybe there will be a few underpaid and overworked people getting pestered with BS from management).
I think a lot about software that is fundamentally flawed but gets propelled up in value due to great sales and marketing. It makes me question the industry.
It's interesting that this is being referred to as a black swan event in the markets. If you look at the SolarWinds fiasco from a few years ago, there are some differences, but it boils down to problems with shitty software having too many privileges being deployed all over the place. It's a weak mono culture and eventually a plague will cause devastation. I think a screw up for these sorts of software models shouldn't really be thought of as a black swan event, but instead an inevitability.
That is how all of these tools are. I have always told people that third-party virus scanners are just viruses that we are ok with having. They slow down our computers, reduce our security, many of them have keyloggers in them (to detect other keyloggers). We just trust them more than we trust unknown ones so we give it over to them.
CrowdStrike is a little broader, of course. But yeah, it's a rootkit that we trust to protect us from other rootkits. It's like fighting fire with fire.
That’s my experience as an unfortunate user of a PC as a software engineer in an org where every PC was mandated to install crowdstrike. Fortune 1000.
It ran amok on every PC it was installed on. Nobody could tell exactly what it did, or why.
Engineering management attempted to argue against it. This resulted in quite public discourse, which made obvious the incompetence of the relevant parties in IT management with respect to its implementation.
Not _negligently_ incompetent. Just incompetent enough that it was obvious they did not understand the system they administered from any set of core principles.
It was also obvious it was implemented only because ”it was a product you could buy to implement cybersecurity”. What this actually meant from a systems architecture point of view was apparently irrelevant.
One could argue the only task of IT management is to act as a dumb middleman between the budget and service providers. So if it’s acceptable that IT managers don’t actually need to know anything about computers, then the claim of incompetence can of course be dropped.
If you realize something horrific, your options are to decide it's not your problem (and feel guilty when it blows up), carefully forget you learned it, or try to do something to get it changed.
Since the last of these involves overcoming everyone else's shared distress in admitting the emperor has no clothes, and the first of these involves a lot of distress for you personally, a lot of people opt for option B.
> overcoming everyone else's shared distress in admitting the emperor has no clothes
I don't disagree, but why do we react this way? Doesn't knowing the emperor has no clothes instill a bit of hope that things can change? I feel for the people who were impacted by this, but I'm also a little bit excited. Like... NOW can we fix it? Please?
The higher up in large organizations you go, in politics or employment or w/e, the more what matters is not facts, but avoiding being visibly seen to have made a mistake, so you become risk-averse, and just ride the status quo unless it's an existential threat to you or something you can capitalize on for someone else's misjudgment.
So if you can't directly gain from pointing out the emperor's missing clothes, there's no incentive to call it out, there's active risk to calling it out if other people won't agree, and moreover, this provides an active incentive for those with political capital in the organization to suppress the embarrassment of anyone pointing out they did not admit the problem everyone knew was there.
(This is basically how you get the "actively suppress any exceptions to people collectively treating something as a missing stair" behaviors.)
I've not seen that at my Fortune 100. I found others willing to agree and we walked it up to the most senior EVP in the corporation. Got face time, and we weren't punished. Just... nothing changed. Some of the directors who helped walk it up the chain eventually became more powerful, and the suggested actions took place about 15 years later.
Sure, I've certainly seen exceptions, and valued them a lot.
But often, at least in my experience, exceptions are limited in scope to whatever part of the org chart the person who is the exception is in charge of, and then that still governs everything outside of that box...
It's a nice idea, but has that worked historically? Some people will make changes, but I think we'd be naive to think that things will change in any large and meaningful way.
Having another I-told-you-so isn't so bad, though - it does give us IT people a little more latitude when we tell people that buying the insecurity fix du jour increases work and adds more problems than it addresses.
Sure, on long enough timescales. I mean, there's less lead in the environment than there used to be. We don't practice blood letting anymore. Things change. Eventually enough will be enough and we'll start using systems that are transparent about what their inputs are and have a way of operating in cases where the user disables one of those inputs because it's causing problems (e.g. crowdstrike updates).
I'd just like it to be soon because I'm interested in building such systems and I'd rather be paid to do so instead of doing it on my off time.
There are way too many horrific things in the world to learn about... and then you realize you can't do something about every one of them. But at least you can tackle one of them! (In my case, antibiotic resistance.)
My issue is WTF do sooooooo many companies trust this 1 fucking company lol, like its always some obscure company that every major corporation is trusting lol. All because crowdstrike apparently throws good parties for C-Level execs lol
We are a major CS client, with 50k windows-based endpoints or so. All down.
There exists a workaround but CS does not make it clear whether this means running without protection or not. (The workaround does get the windows boxes unstuck from the boot loop, but they do appear offline in the CS host management console - which of course may have many reasons).
Does CS actually offer any real protection? I always thought it was just feel-good software, that Windows had caught up to separating permissions since after XP or so. Either one is lying/scamming, but which one?
> Does CS actually offer any real protection? I always thought it was just feel-good software, that Windows had caught up to separating permissions since after XP or so. Either one is lying/scamming, but which one?
Our Zscaler rep (basically, they technically work for us) comes out with massive, impressive-looking numbers for the thousands of threats they detect and eliminate every month.
Oddly before we had zscaler we didn't seem to have any actual problems. Now we have it and while we have lots of zscaler caused problems around performance and location, we still don't have any actual problems.
Feels very much like a tiger repelling rock. But I'm sure the corporate hospitality is fun.
AFAIK, most of the people I know that deploy CrowdStrike (including us) just do it to check a box for audits and certifications. They don't care much about protections and will happily add exceptions on places where it gives problems (and that's a lot of places)
It's not about checking the boxes themselves, but the shifting of liability that enables. Those security companies are paid well not for actually providing security, but for providing a way to say, "we're not at fault, we adhered to the best security practices, there's nothing we could've done to prevent the problem".
Shouldn't that hit Crowdstrike's stock price much more than it has then? (so far I see ~11% down which is definitely a lot but it looks like they will survive).
Not quite. Insurance is a product that provides compensation in the event of loss. Deploying CrowdStrike with an eye toward enterprise risk management falls under one of either changing behaviors or modifying outcomes (or perhaps both).
Pay for what exactly though? Cybersecurity incidents result in material loss, and someone somewhere needs to provide dollars for the accrued costs. Reputation can't do that, particularly when legal liability (or, hell, culpability) is involved.
EDR deployment is an outcome-modifying measure, usually required as underwritten in a cybersecurity insurance policy for it to be in force. It isn't itself insurance.
Just adding my two cents: I work as a pentester and arguably all of my colleagues agree that engagements where Crowdstrike is deployed are the worst because it's impossible to bypass.
It definitely isn't impossible to bypass. It gets bypassed all the time, even publicly. There's like 80 different CrowdStrike bypass tricks that have been published at some point. It's hard to bypass and it takes skill, and yes it's the best EDR, but it's not the best solution - the best solution is an architecture where bypassing the EDR doesn't mean you get to own the network.
An attacker that's using a 0 day to get into a privileged section in a properly set up network is not going to be stopped by CrowdStrike.
By “impossible to bypass” are you meaning that it provides good security? Or that it makes pen testing harder because you need to be able to temporarily bypass it in order to do your test?
The first. AV evasion is a whole discipline in itself and it can be anything from trivial to borderline impossible. Crowdstrike definitely plays in the champions league.
I’ll say this: I did a small lab in college for a hardware security class and I got a scary email from IT because CrowdStrike noticed there was some program using speculative execution/cache invalidation to leak data on my account - they recognized my small scale example leaking a couple of bytes. Pretty impressive to be honest.
Those able to write and use FUD malware do not create public documentation. Crowdstrike is not impossible to bypass, but a junior security journeyman known as a pentester, working for corporate interests with no budget and an absurdly limited scope, under contract for n hours a week for 3 weeks, will never be able to pull off something like EDR evasion. However, if you wish to actually learn the basics of how the common practitioners of this art work, go study the OffSec evasion class. Then go read a lot of code and syscall documentation, and learn assembly.
I don't understand why you were downvoted. I'm interested in what you said. When you mentioned offsec evasion class, is this what you mean? It seems pretty advanced.
What kind of code should I read? Actually, let me ask this, what kind of code should I write first before diving into this kind of evasion technique? I feel I need to write some small Windows system software like duplicating Process Explorer, to get familiar with Win32 programming and Windows system programming, but I could be wrong?
I think I do have a study path, but it's full of gap. I work as a data engineer -- the kind that I wouldn't even bother to call myself engineer /s
I know quite a few offensive security pros that are way better than I will ever be at breaking into systems and evading detections that can only barely program anything beyond simple python scripts.
It’s a great goal to eventually learn everything, but knowing the correct tools and techniques and how and when to use them most effectively are very different skillsets from discovering new vulnerabilities or writing new exploit code and you can start at any of them.
Compare for instance a physiologist, a gymnastics coach, and an Olympic gymnast. They all “know how the human body works” but in very different ways and who you’d go to for expertise depends on the context.
Similarly just start with whatever part you are most interested in. If you want to know the techniques and tools you can web search and find lots of details.
If you want to know how best to use them you should set up vulnerable machines (or find a relevant CTF) and practice. If you want to understand how they were discovered and how people find new ones you should read writeups from places like Project Zero that do that kind of research. If you’re interested in writing your own then yes you probably need to learn some system programming. If you enjoy the field you can expand your knowledge base.
My contacts abroad are saying "that software US government mandated us to install on our clients and servers to do business with US companies is crashing our machines".
When did Crowdstrike get this gold standard super seal of approval? What could they be referring to?
I guarantee you that the damage caused by Crowdstrike today will significantly outweigh any security benefits/savings that using their software might have had over the years.
* lights-out interfaces not segregated from the business network. Bonus points if it's a Supermicro, which discloses the password hash to unauthenticated users as a design feature.
* operational technology not segregated from information technology
* Not a Windows bug, but popular on Windows: 3rd-party services with unquoted exe and uninstall strings, or a service executable in a user-writable directory.
I remediate pentests as well as real-world intrusion events, and we ALWAYS find one of these as the culprit. An oopsie on the public website leading to an intrusion is actually an extreme rarity. It's pretty much always email > standard user > administrator.
I understand not liking EDR or AV but the alternative seems to be just not detecting when this happens. The difference between EDR clients and non-EDR clients is that the non-EDR clients got compromised 2 years ago and only found it today.
Thanks for the list. I got this job as the network administrator at a community bank 2 years ago and 9/9 of these were on/enabled/not secured. I've got it down to only 3/9 (dhcpv6, unquoted exe, operational tech not segregated from info tech).
I'm asking for free advice, so feel free to ignore me, but of these three unremediated vectors, which do you see as the culprit most often?
dhcpv6 poisoning is really easy to do with metasploit and creates a MITM scenario. It's also easy to fix (dhcpv6guard at the switch, a domain firewall rule, or a 'prefer ipv4' reg key).
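For the registry option, the knob I believe is meant here is the DisabledComponents value Microsoft documents for IPv6 behavior; setting it to 0x20 makes Windows prefer IPv4 in prefix policies without disabling IPv6 outright (it takes effect after a reboot, and as with anything like this, test on a few machines first):

    reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip6\Parameters" /v DisabledComponents /t REG_DWORD /d 0x20 /f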
unquoted paths are used to establish persistence and are just an indicator of some other compromise. There are some very low-impact scripts on GitHub that can take care of it
Network segregation, the big thing I see in financial institutions is the cameras. Each one has its own shitty webserver, chances are the vendor is accessing the NVR with teamviewer and just leaving the computer logged in and unlocked, and none of the involved devices will see any kind of update unless they break. Although I've never had a pentester do anything with this I consider the segment to be haunted.
I believe the question was 'in which ways is windows vulnerable by default', and I answered that.
If customers wanted to configure them properly, they could, but they don't. EDR will let them keep all the garbage they seem to love so dearly. It doesn't just check a box, it takes care of many other boxes too.
At work we have two sets of computers. One gets beamed down by our multi-national overlords, loaded with all kinds of compliance software. The other is managed by local IT and only uses Windows Defender, has some strict group policies applied, BMCs on separate VLANs, etc.
Both pass audits, for whatever that's worth.
Believe it or not, most users don't run around downloading random screensavers or whatever. Instead they are receiving phishing emails, often from trusted contacts who have recently been compromised, using the same style of message they are used to receiving, that give the attacker a foothold on the computer. From there, you can use a commonly available insecure legacy protocol or other privilege escalation technique to gain administrative rights on the device.
You don't need exploits to remotely access and run commands on other systems, steal admin passwords, and destroy data. All the tools to do that are built into Windows. A large part of why security teams like EDR is that it gives them the data to detect abuse of built-in tools and automatically intervene.
Not the same poster, but one phase of a typical attack inside a corporate network is lateral movement. You find creds on one system and want to use them to log on to a second system. Often, these creds have administrative privileges on the second system. No vulnerabilities are necessary to perform lateral movement.
Just as an example: you use a mechanism similar to psexec to execute commands on the remote system using the SMB service. If the remote system has a capable EDR, it will shut that down and report the system from which the connection came from to the SOC, perhaps automatically isolate it. If it doesn't, an attacker moves laterally through your entire network with ease in no time until they have domain admin privs.
Anyone who claims CS is nothing but a compliance checkbox has never worked as an actual analyst. Of course it's effective... no, dur, it's worth $50bn for no reason... god, some people are stupid AND loud.
Every company I’ve ever worked at has wound up having to install antivirus software to pass audits. The software only ever caused problems and never caught anything. But hey, we passed the audit so we’re good right?
A long time ago I was working for a web host, and had to help customers operating web shops pass the audits required for credit card processing.
Doing so regularly involved allowing additional SSL ciphers we deemed insecure, and undoing other configurations for hardening the system. Arguing about it is pointless - either you make your system more insecure, or you don't pass the audit. Typically we ended up configuring it in a way that let us easily toggle between those two states: we reverted to a secure configuration once the customer got their certificate, and flipped it back to insecure when it was time to reapply for the certification.
This tracks for me. PA-DSS was a pain with ssl and early tls... our auditor was telling us to disable just about everything (and he was right) and the gateways took forever to move to anything that wasn't outdated.
Then our dealerships would just disable the configuration anyway.
The dreaded exposed loopback interface... I'm an (internal) auditor, and I see huge variations in competence. Not sure what to do about it, since most technical people don't want to be in an auditor role.
We did this at one place I used to work at. We had lots of Linux systems. We installed clamAV but kept the service disabled. The audit checkbox said “installed” and it fulfilled the checkbox…
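In case anyone wonders what that looks like in practice, on a Debian-ish box it's roughly the following (a hypothetical sketch of the setup described above, not a recommendation):

    sudo apt-get install clamav                       # package shows up in the inventory scan
    sudo systemctl disable --now clamav-freshclam     # ...but nothing ever actually runs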
Yes, it offers very real protection. Crowdstrike in particular is the best in the market, speaking from experience and having worked with their competitor's products as well and responded to real world compromises.
I'm a dev rather than infra guy, but I'm pretty sure everywhere I've worked which has a large server estate has always done rolling patch updates, i.e. over multiple days (if critical) or multiple weekends (if routine), not blast every single machine everywhere all at once.
If this comment tree: https://news.ycombinator.com/item?id=41003390 is correct, someone at Crowdstrike looked at their documented update staging process, slammed their beer down, and said: "Fuck it, let's test it in production", and just pushed it to everyone.
Which of course raises the question: How were they able to do that? Was there no internal review? What about automated processes?
For an organization it's always the easiest, most convenient answer to blame a single scapegoat, maybe fire them... but if a single bad decision or error from an employee has this kind of impact, there's always a lack of safety nets.
This is not a patch per se; it was Crowdstrike updating their virus definitions, or whatever their internal database is called.
Such things are usually enabled by default to auto-update, because otherwise you lose a big part of the interest (if there's any) of running an antivirus.
Surely there should be at least some staging on update files as well, to avoid the "oops, we accidentally blacklisted explorer.exe" type of thing (or, indeed, this)?
This feels like an auto-update functionality. For something that's running in kernel space (presumably, if it can BSOD you?) Which is fucking terrifying.
Windows IT admins of the world, now is your time. This is what you've trained for. Everything else has led to this moment. Now, go and save the world!!
Does it require physically going to each machine to fix it? Given the huge number of machines affected, it seems to me that if this is the case, this outage could last for days.
The workaround involves booting into Safe mode or Recovery environment, so I'd guess that's a personal visit to most machines unless you've got remote access to the console (e.g. KVM)
It gets worse if your machines have BitLocker active - lots of typing required. And it gets even worse if the servers that store the BitLocker keys also have BitLocker active and are also held captive by CrowdStrike lol
I've already seen a few posts mentioning people running into worst-case issues like that. I wonder how many organizations are going to not be able to recover some or all of their existing systems.
Presumably at some point they'll be back to a state where they can boot to a network image, but that's going to be well down the pyramid of recovery. This is basically a "rebuild the world from scratch" exercise. I imagine even the out of band management services at e.g. Azure are running Windows and thus Crowdstrike.
• Servers: you have to apply the workaround by hand.
• Desktops: if you reboot and get online, CrowdStrike often picks up the fix before it crashes. You might need a few reboots, but that has worked for a substantial portion of systems. Otherwise, it’ll need the workaround applied by hand.
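For anyone doing it by hand: the manual fix circulating today (check CrowdStrike's official advisory for the authoritative version rather than trusting a comment thread) is to boot into Safe Mode or the recovery environment, enter the BitLocker recovery key if prompted, and delete the bad channel file (the drive letter may differ inside the recovery environment):

    cd C:\Windows\System32\drivers\CrowdStrike
    del C-00000291*.sys

Then reboot normally and the machine should come back up.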
This is insane. The company I currently work for provides dinky forms for local cities and such, where the worst thing that could happen is that somebody will have to wait a day to get their license plates, and even we aren't this stupid.
I feel like people should have to go to jail for this level of negligence.
Maybe someone tried to backdoor Crowdstrike and messed up some shell code? It would fit and at this point we can't rule it out, but there is also no good reason to believe it. I prefer to assume incompetence over maliciousness.
>True for all systems, but AV updates are exempt from such policies. When there is a 0day you want those updates landing everywhere asap.
This is irrational. The risk of waiting for a few hours to test in a small environment before deploying a 0-day fix is marginal. If we assume the AV companies already spent their sweet time testing, surely most of the world can wait a few more hours on top of that.
Given this incident, it should be clear the downsides of deploying immediately at a global scale outweigh the benefits. The damage this incident caused might even be more than all the ransomware attacks combined. How long to take to do extra testing will depend on the specific organization, but I hope nobody will allow CrowdStrike trying to unilaterally impose a standard again.
I wonder if the move to hybrid estates (virtual + on-prem + issued laptops, etc.) is the cause. Having worked only in on-prem, highly secure businesses, no patches would be rolled out intra-week without a testing cycle on a variety of hardware.
I consider it genuinely insane to allow direct updates from vendors like this on large estates. If you are behind a corporate firewall there is also a limit to the impact of discovered security flaws, and thus reduced urgency in pushing out fixes anyway.
Most IT departments would not be patching all their servers or clients at the same time when Microsoft release updates. This is a pretty well followed standard practice.
For security software updates this is not standard practice; I'm not even sure you can configure a canary update group in these products. The expectation is that any updates are pushed ASAP.
For an issue like this though Crowdstrike should be catching it with their internal testing. It feels like a problem their customers should not have to worry about.
Their announcement (see Reddit for example) says it was a “content deployment” issue which could suggest it’s the AV definitions/whatever rather than the driver itself… so even if you had gradual rollout for drivers, it might not help!
I came to HN hoping to find more technical info on the issue, and with hundreds of comments yours is the first I found with something of interest, so thanks! Too bad there's no way to upvote it to the top.
In most assessments of upgrade risk in environments I'm familiar with, changing config, static data, etc. counts as a systemic update and is controlled in the same way.
A proper fix means that a failure like this causes you a headache; it doesn't close all your branches, ground your planes, stop operations in hospitals, or take your TV channel off the air.
You do that by ensuring that a single point of failure, like virus definition updates, or an unexpected bug in software that hits on Jan 29th, or a leap second going backwards, can't affect all your machines at the same time.
Yes, it will be a pain if half your check-in desks are offline, but not as much as when all of them are.
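One common way to get there is to split the fleet into rollout rings and only promote an update once the earlier ring has stayed healthy for a soak period. A minimal sketch of the idea; ring names, host names, the soak time, and the health check are all placeholders, not anyone's real tooling:

    # Minimal staged-rollout sketch; everything here is a placeholder assumption.
    import time

    RINGS = {
        "canary": ["checkin-desk-01", "checkin-desk-02"],
        "early": ["checkin-desk-03", "checkin-desk-10"],
        "broad": ["checkin-desk-11", "checkin-desk-99"],
    }
    SOAK_SECONDS = 4 * 60 * 60  # wait four hours between rings (placeholder)

    def apply_update(host: str) -> None:
        print(f"applying update to {host}")  # stand-in for the real deployment mechanism

    def ring_is_healthy(hosts: list[str]) -> bool:
        return True  # stand-in for "did every host check in and stay up?"

    for ring_name, hosts in RINGS.items():
        for host in hosts:
            apply_update(host)
        time.sleep(SOAK_SECONDS)
        if not ring_is_healthy(hosts):
            print(f"halting rollout: ring '{ring_name}' looks unhealthy")
            break

Even with only two rings, a bad update takes out half your check-in desks instead of all of them, which is the whole point.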
Wow that's terrible. I'm curious as to whether your contract with them allows for meaningful compensation in an event like this or is it just limited to the price of the software?
Let's say you're a CISO and it's your task to evaluate Cybersecurity solutions to purchase and implement.
You go out there and find that there are multiple organizations that periodically test (simulate attacks against) the EDR capabilities of these vendors and publish grades for them.
You take the top 5 to narrow down your selection and pit them against each other in a PoC consisting of attack simulations and end-to-end response (that's the Response part of EDR).
The winner gets the contract.
Unless there are tie-breakers...
PS: I've heard (and read) others say that CS was best-in-class, which suggests they probably won those PoCs and received high grades from the independent organizations.
I don't mean this to be rude or as an attack, but do you just auto-update without validation?
This appears to be clearly the fault of the companies where the buck stops - those who _use_ CS and should be validating patches from it and other vendors.
I'm pretty sure CrowdStrike auto-updates, with no option to disable it or manually roll out updates. Even worse, people running the N-1 and N-2 channels also seem to have been impacted by this.
I think it's probably not a kernel patch per se. I think it's something like an update to a data file that Crowdstrike considers low risk, but it turns out that the already-deployed kernel module has a bug that means it crashes when it reads this file.
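If that theory is right, the failure mode is the classic one of trusting a data file your own code ships: the parser assumes the file is well-formed, and a malformed file turns into an invalid memory access. In the kernel that's a bugcheck; here's the same shape of bug as a user-space toy (the file format and the bug are invented for illustration, nothing to do with CrowdStrike's actual format):

    # Toy illustration only: "already-deployed code crashes on a malformed data file".
    # In a kernel driver the equivalent mistake is an invalid pointer dereference, i.e. a BSOD.
    def load_signatures(blob: bytes) -> list[int]:
        count = blob[0]                              # trusts the record count declared in the file
        return [blob[1 + i] for i in range(count)]   # reads past the end if the count lies

    good = bytes([2, 10, 20])
    print(load_signatures(good))   # [10, 20]

    bad = bytes([200, 10, 20])     # header claims 200 records, only 2 follow
    print(load_signatures(bad))    # raises IndexError here; the raw-pointer equivalent
                                   # in kernel mode takes the whole machine down

In that scenario, the driver itself passed whatever testing it got; it's the combination of old driver plus new data file that was never exercised.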
Apparently, CS and Zscaler can apply updates on their own, and that's by design, with 0-day patches expected to be deployed the minute they are announced.
Why do they "have to"? Why can't company sysadmins at minimum configure rolling updates or a 48-hour validation stage - either of which would have caught this? Auto-updating external kernel-level code should never, ever be acceptable.
But isn't that a fairly tiny risk, compared with letting a third party meddle with your kernel modules without asking nicely? I've never been hit by a zero-day (unless Drupageddon counts).
I would say no, it's definitely not a tiny risk. I'm confused what would lead you to call getting exploited by vulnerabilities a tiny risk -- if that were actually true, then Crowdstrike wouldn't have a business!
Companies get hit by zero days all the time. I have worked for one that got ransomwared as a result of a zero day. If it had been patched earlier, maybe they wouldn't have gotten ransomwared. If they start intentionally waiting two extra days to patch, the risk obviously goes up.
Companies get hit by zero day exploits daily, more often than Crowdstrike deploys a bug like this.
It's easy to say you should have done the other thing when something bad happens. If your security vendor were releasing definitions 48 hours later than they could have, and some huge hack happened because of that, the internet commentary would obviously say they were stupid to be waiting 48 hours.
But if you think the risk of getting exploited by a vulnerability is less than the risk of being harmed by Crowdstrike software, and you are a decision maker at your organization, then obviously your organization would not be a Crowdstrike customer! That's fine.
CS doesn't force you to auto-upgrade the sensor software – there is quite a lot of FUD being thrown around at the moment. It's a policy you can adjust and apply to different sets of hosts if needed. Additionally, you can choose whether you want the latest version or a number of versions behind the latest.
What you cannot choose, however - at least to my knowledge - is whether or not to auto-update the release channel feed and IOC/signature files. The crashes that occurred seem to have been caused by the kernel driver not properly handling invalid data in these auxiliary files, but I guess we have to wait on/hope for a post-mortem report for a detailed explanation. Obviously, only the top-paying customers will get those details...
Stop the pandering. You know very well CrowdStrike doesn't offer good protection to begin with!
Everyone pays for legal protection. After an incident you can show you did everything, which means nothing (well, now it shows it's even worse than nothing), simply by showing you paid them.
If they tell you to disable everything, what does that change? They're still your blame shield, which is the reason you have CS in the first place.
... the only real feature anybody cares about is inventory control.
You said CrowdStrike doesn't offer protection, but plenty of people in this thread have suggested they actually do, and they seem to be highly regarded in the field.
Facts speak louder than words. If you cared about protection you would be securing your systems, not installing yet more things, especially one that requires you to open up several other attack vectors. But I will never manage to make you see it.
Writing a mission-critical product in the safest programming language, deployed on the most secure and stable OS that the world depends on, would be a developer's wet dream.