CrowdStrike's Falcon Sensor also linked to Linux kernel panics and crashes

roblabla · 2024-07-22T07:17:56.000000Z

This is some very poor journalism. The linux issues are so, so very different from the windows BSOD issue.

The redhat kernel panics were caused by a bug in the kernel ebpf implementation, likely a regression introduced by a rhel-specific patch. Blaming crowdstrike for this is stupid (just like blaming microsoft for the crowdstrike bsod is stupid).

For background, I also work on a product using eBPF, and have had kernel updates cause kernel panics in my eBPF probes.

In my case, the panic happened because the kernel decided to change an LSM hook interface, adding a new argument in front of the others. When the probe gets loaded, the kernel doesn’t typecheck the arguments, and so doesn’t realise the probe isn’t compatible with the new kernel. When the probe runs, shit happens and you end up with a kernel panic.
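
A rough sketch of that failure mode, as a toy model in plain Python rather than real eBPF or kernel code. The hook and probe names are made up for illustration; the only point is that a probe built against the old argument order silently reads the wrong slot once an argument is inserted in front.

```python
# Toy model (not real eBPF): a hook hands positional arguments to a probe
# that was built against the old argument order. All names are hypothetical.

def probe_built_for_old_kernel(ctx):
    # The probe assumes ctx[0] is a file path, as in the old hook layout.
    path = ctx[0]
    return path.startswith("/etc/")

def old_kernel_hook(probe, path, flags):
    # Old layout: (path, flags)
    return probe((path, flags))

def new_kernel_hook(probe, path, flags):
    # Hypothetical change: a new argument is inserted in front of the others.
    cred_id = 1000
    return probe((cred_id, path, flags))

if __name__ == "__main__":
    print(old_kernel_hook(probe_built_for_old_kernel, "/etc/passwd", 0))
    try:
        new_kernel_hook(probe_built_for_old_kernel, "/etc/passwd", 0)
    except AttributeError as exc:
        # Python raises an exception; in the kernel there is nothing to catch it.
        print("probe misread its arguments:", exc)
```

Python notices the type mismatch at runtime and raises. A loaded probe that was never typechecked against the new interface gets no such safety net.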

eBPF probes causing kernel panics are almost always an indication of a kernel bug, not a bug on the eBPF vendor's side. There are exceptions of course (such as an eBPF probe denying access to a resource, causing pid1 to crash). But they’re very few.

xyzzy123 · 2024-07-22T14:13:34.000000Z

It's not clear to me they are so different but maybe I am not "sufficiently smart".

To me this feels like a complicated question - both Linux and Windows organisations are quite good at kernel reliability engineering even though quite different organisational structures and engineering approaches are involved.

Yes "the wrong people were trusted" but I don't see how we can completely solve this with engineering.

roblabla · 2024-07-22T22:31:27.000000Z

> It's not clear to me they are so different but maybe I am not "sufficiently smart".

They're different because linux promises "eBPF are safe and cannot crash the kernel", and failed to deliver on that, while Microsoft says "drivers are all-powerful and as such must be written with care", and CrowdStrike did not heed this warning.

> Yes "the wrong people were trusted" but I don't see how we can completely solve this with engineering.

I mean, we could solve the "third party software fucks the kernel up" problem easily with engineering: providing userspace APIs to do stuff that currently needs kernelspace access. There's no inherent reason security products (or, really, any products) need to live in the kernel, it's just that there are no APIs to do this job, so security products have to go there. If Microsoft provided a good API doing what the custom drivers currently do, most security products would drop their driver in a heartbeat.

For instance, macOS fixed this exact issue a couple years ago by introducing Endpoint Security Framework, a userspace API that allows watching a bunch of events, and authorizing whether they should be allowed or blocked. It's a well-designed API that should obsolete the need for kernelspace access in security products.

j2bryson · 2024-07-23T07:37:29.000000Z

So what happened with the linux bug? Presumably people fixed the OS side problem straight away?

roblabla · 2024-07-24T16:26:02.000000Z

kernel-5.14.0-427.13.1.el9_4 broke it. It was released on Apr 30, 2024, with RHEL 9.4 (this was the RHEL 9.4 release kernel).

According to the comments on https://access.redhat.com/solutions/7068083, RHEL became aware of the issue on May 3, 2024.

A workaround was identified (configuring CS to use the kernel module backend instead of the ebpf backend) on May 9, 2024.

RHEL then fixed it in kernel-5.14.0-427.18.1.el9_4, on May 23, 2024.

So the bug was fixed in ~20 days from the moment it was reported.
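
For anyone who wants to check the arithmetic, here is a quick sketch using only the dates quoted above (the variable names are mine):

```python
from datetime import date

# Dates as quoted in this comment.
released = date(2024, 4, 30)    # kernel-5.14.0-427.13.1.el9_4 shipped
reported = date(2024, 5, 3)     # RHEL became aware of the issue
workaround = date(2024, 5, 9)   # kernel-module backend workaround identified
fixed = date(2024, 5, 23)       # kernel-5.14.0-427.18.1.el9_4 shipped

days_report_to_workaround = (workaround - reported).days
days_report_to_fix = (fixed - reported).days
days_release_to_fix = (fixed - released).days

if __name__ == "__main__":
    print(days_report_to_workaround, days_report_to_fix, days_release_to_fix)
```

So the broken kernel was the current release kernel for a little over three weeks, and a workaround existed for most of that time.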

It's unclear whether this issue was caused by a RHEL-specific backport/patch or was also present in mainline kernels.

josefx · 2024-07-22T08:06:09.000000Z

> likely a regression introduced by a rhel-specific patch. Blaming crowdstrike for this is stupid (just like blaming microsoft for the crowdstrike bsod is stupid).

Yeah, it isn't as if crowdstrike was specifically advertising certified support for RedHat Linux and related products.

https://www.crowdstrike.com/partners/falcon-for-red-hat/

dtx1 · 2024-07-22T08:33:12.000000Z

But being certified for RedHat Linux doesn't protect you from Bugs in the RedHat Kernel. That's on RedHat.

michaelt · 2024-07-22T08:51:29.000000Z

Back in The Good Old Days, an OS vendor would release a beta version and software vendors would test against it and fix problems before the stable OS version was released.

Obviously OS updates come out a lot more often these days than they used to - but we're also better at test automation than ever before, and beta software is easier to get than ever.

It sure would be nice if companies that decide to produce kernel modules and to support certain OSes could test those kernel modules against those OSes at the beta stage.

roblabla · 2024-07-22T09:07:09.000000Z

1. An eBPF probe is not a kernel module. An eBPF probe should never cause kernel panics.

2. RHEL didn't provide beta kernels before very recently, as far as I can tell.

3. Even if you caught an error then, you're still at the mercy of RHEL to fix it. If RHEL breaks a feature, you report it to them, and they decide to ship anyways... well, your product will still kpanic. I'm not talking hypotheticals: I haven't seen RHEL do that, but I've seen other distros do it.

freedomben · 2024-07-22T13:59:27.000000Z

Emphasis added:

> An eBPF probe should never cause kernel panics.

Should, but did. This is the point at which to learn and adapt.

Also, kernels are software just like nearly everything else, and software is buggy. It's a balance obviously, but some basic defensive development can be a real savior for your users.

I don't know the details about this CrowdStrike incident, but I would also be surprised if you couldn't write an automated test (even a "smoke test") to quickly test out these new kernels before they hit your customers. Given what happened, it seems like negligence not to do that.

roblabla · 2024-07-22T15:37:43.000000Z

It's possible CS can do better, of course. But it's just wrong to blame them for the Linux crashes - they're not the ones that introduced buggy code and broke their users. RHEL/Linux did.

_flux · 2024-07-22T10:51:08.000000Z

> and they decide to ship anyways... well, your product will still kpanic

But then you are in a position to tell your customers that this will happen before it actually does, and they can choose their way of proceeding.

One such way would be being careful with the update and then exercising their own support contracts with RH.

CRConrad · 2024-07-23T06:39:13.000000Z

> An eBPF probe is not a kernel module.

But if it runs on the same privilege level as the rest of the kernel, then isn't it, for the purposes of this discussion, in effect "a kernel module"?

kragen · 2024-07-24T13:24:26.000000Z

it doesn't—the semantics of ebpf confine it—so it isn't

CRConrad · 2024-07-27T15:39:07.000000Z

Either

1) Those Crowdstrike unit files aren't ebpf probes, so the whole subject of ebpf probes is irrelevant here; or

2) They're obviously able to stop the rest of the kernel from even booting up (as Crowdstrike so convincingly demonstrated millions of times over[1]), so yes, they do indeed have at least as much power as any other bit of the kernel.

Either way, hunting around for nits to pick is a bit pathetic.

[1]: In July 0000002024...

kragen · 2024-07-27T17:09:31.000000Z

denial of service is not the same thing as arbitrary code execution, and that goes double in kernel mode, but yes, it does seem that the linux implementation of ebpf had buggy sandboxing; i don't think allowing clownstrike to prevent booting was part of the intended objective

i wasn't hunting around for nits to pick; i was hunting around to see if you'd ever contributed any useful comments to the site. instead i found you making authoritative pronouncements about ebpf that were so wrong that you had evidently never read so much as a one-line summary of what ebpf was for. do you have a more promising historical comment to offer? perhaps something where people complimented your contribution as being informative?

have you ever made a worthwhile comment on hn?

on thursday, wahern posted this comment https://news.ycombinator.com/item?id=41061179 where they traced through the illumos/opensolaris source code to track down how a peculiar solaris interprocess communication mechanism worked, an investigation i had started but gotten stuck on. why can't you make comments like that instead of harassing me about how i format my comments?

the reason i'm asking is because i'd like to be able to talk to more people like wahern, but most of them avoid this site. a major reason why is that comments here frequently receive vacuous, aggressive responses like the comment you made the day before in https://news.ycombinator.com/item?id=41056718, where you launched a personal attack on me because you didn't like how i was formatting my comments

i'd like you to ⓐ apologize for doing that (this is not the first time you've done that to me personally; so far i haven't looked through your comment history far enough to find out how many other people you have a history of repeatedly harassing) and ⓑ commit to not doing it again

because i'm sure you're capable of making comments that make the site better instead of worse

CRConrad · 2024-07-30T19:44:11.000000Z

> do you have a more promising historical comment to offer? perhaps something where people complimented your contribution as being informative?

> have you ever made a worthwhile comment on hn?

I might answer that. If I thought you were owed any justifications from me. Which I don't.

And no, I'm neither “harassing” you nor being “vacuous, aggressive”. This isn't ad hominem, it's ad habitem. You write here for other people to read, and I'd even appreciate many of your comments -- if they weren't so infuriatingly idiosyncratically formatted as to disrupt fluent reading. Have the fucking courtesy to write like a normal person, and you'll be treated like a normal person. To begin with, get the shift key on your keyboard unstuck so you can start your sentences with capitals. And in case your dot / period / full-stop key is totally gone, copy-paste some of these: ........... So you can end them properly too.

Because I'm sure you're capable of making comments without coming off like an illiterate buffoon.

And yes, BTW, you totally were. Careful now, you don't want to end up like chockablock again, do you?

roblabla · 2024-07-22T09:04:00.000000Z

Yes, and? They probably do test their software on RHEL.

But how are they supposed to prevent a bug in a newly released kernel update? You can't test your software on future updates that aren't out yet.

If RHEL breaks some core functionality you depend on, in a newly released update, you can't really do much to prevent breakage, even with the best QA in the world. At best, they could have caught it as soon as RHEL published the new kernel... but by then it's already too late, all your currently-deployed probes now have a ticking time bomb, and need to be updated before the RHEL kernel update is applied, lest you kernel panic.

hsbauauvhabzb · 2024-07-22T09:16:08.000000Z

Maybe by not loading the module into unknown kernels in the first place?

If you say you support a distro, you can’t turn around and complain that supporting the newest version is hard, no matter who caused the problem. Plenty of products say ‘this works on $x but it’s not officially supported’.

roblabla · 2024-07-22T11:22:46.000000Z

Again: this is not a kernel module. eBPF probes are meant to be Compile Once, Run Everywhere, that's their whole point! https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf...

If you expect software to be future-bug-proof, well, I guess you live in a far better world than I do.

If you advertise your software to be compatible with RHEL, but a glibc bug gets in and causes your sw to crash for a couple of days, before RHEL realises the problem and fixes it, does that mean your software should instantly no longer be advertised as RHEL compatible? That'd make things a lot more confusing, if you ask me.

linuxftw · 2024-07-22T12:30:06.000000Z

Not all eBPF programs are compile once, run everywhere.

RHEL 'updates' can mean different things. A patch release won't change kernel ABI. A minor release will. Writing a non-CORE eBPF program for, say RHEL 8.6, might break on RHEL 8.7. It's not advisable to update across minor releases without lots of testing. Most of the time, things 'just work' but RHEL is a very complex product with a specific support cycle, and laziness of users and 3rd party vendors is not their fault.
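
As a minimal sketch of how a vendor or admin might gate on that distinction: the helper below assumes the kernel release string carries an `elX_Y` suffix like the package names quoted elsewhere in this thread. That is an assumption, not a guarantee; strings without the suffix are treated as "unknown, retest". The function names are hypothetical.

```python
import re

# Assumption: the release string carries an "elX_Y" suffix, as in the
# package names quoted in this thread (e.g. "5.14.0-427.13.1.el9_4").
_EL_SUFFIX = re.compile(r"\.el(\d+)_(\d+)")

def rhel_minor(kernel_release):
    """Return (major, minor) parsed from the release string, or None."""
    match = _EL_SUFFIX.search(kernel_release)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

def needs_retest(tested_release, running_release):
    """True when the minor release differs or cannot be determined."""
    tested = rhel_minor(tested_release)
    running = rhel_minor(running_release)
    if tested is None or running is None:
        return True
    return tested != running

if __name__ == "__main__":
    print(needs_retest("5.14.0-427.13.1.el9_4", "5.14.0-427.18.1.el9_4"))
    print(needs_retest("5.14.0-362.8.1.el9_3", "5.14.0-427.13.1.el9_4"))
```

This only flags minor-release jumps. It says nothing about whether a given patch release is actually safe.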

hsbauauvhabzb · 2024-07-22T19:52:37.000000Z

That doesn’t really change my point - if stability issues are known to occur in a dependency, you can’t say you support that system.

broknbottle · 2024-07-22T10:27:24.000000Z

This was their newer eBPF falcon sensor that was trying to load a bpf program in the kernel and triggered kernel panic. This shouldn’t have happened and was definitely a bug in the kernel.

For the kernel mode, their software will flag an unknown kernel as unsupported and go into a reduced functionality mode (rfm).
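
Conceptually that gate looks something like the sketch below. This is not CrowdStrike's code; the allowlist contents, `Mode` and `select_mode` are all hypothetical names for illustration.

```python
from enum import Enum

class Mode(Enum):
    FULL = "full"
    REDUCED = "reduced functionality"

# Hypothetical allowlist; a real product would ship and update its own list.
SUPPORTED_KERNELS = frozenset({"5.14.0-427.18.1.el9_4"})

def select_mode(kernel_release, supported=SUPPORTED_KERNELS):
    """Run fully only on kernels that were explicitly validated."""
    return Mode.FULL if kernel_release in supported else Mode.REDUCED

if __name__ == "__main__":
    print(select_mode("5.14.0-427.18.1.el9_4"))
    print(select_mode("5.14.0-999.1.1.el9_9"))
```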

The idiots didn’t know that RH E4S was a thing for like 3+ years.. I’m still baffled by how clueless most of the security people and vendors are when it comes to backporting and different streams / channels that are offered by multiple Linux OS vendors.

https://access.redhat.com/solutions/7001909

freedomben · 2024-07-22T14:03:45.000000Z

> Maybe by not loading the module into unknown kernels in the first place?

Then you better tell your customers not to `dnf update` until you've had a chance to whitelist the new kernel and ship it in your own stream. Otherwise everyone who updates before you do ends up broken. If a vendor told me that, I would laugh, realize they were serious, thank them for their time but let them know that we will be going a different direction.

mbesto · 2024-07-22T08:51:01.000000Z

> just like blaming microsoft for the crowdstrike bsod is stupid

Wait, how is this stupid? Unless I'm missing something, wasn't the patch part of a Microsoft payload that included an update to Crowdstrike? Surely Crowdstrike is culpable, but that doesn't completely absolve Microsoft of any responsibility, as it's their payload.

sschueller · 2024-07-22T09:45:58.000000Z

Microsoft should revoke the CrowdStrike driver signature and should do an internal check as to why CrowdStrike's driver was approved when it can execute arbitrary code on the kernel level without any checks. If your "driver" requires this feature MS should require CrowdStrike to submit the entire source and they should have to pay MS to do a review of the code.

What is the point of driver signing if a vendor can basically build in a back door and Microsoft doesn't validate that this back door is at least somewhat reasonable

_flux · 2024-07-22T10:47:47.000000Z

Do you think Microsoft customers using CrowdStrike would then be happier, being unable to run the software at all, due to an action Microsoft took?

Backdoors of all kinds can be installed to most any operating system without vendor co-operation. That is the nature of general-purpose operating systems.

sigseg1v · 2024-07-22T12:48:45.000000Z

I'm a customer that is forced to use CrowdStrike via IT policies and I would be giddy with delight if something came along and caused the removal of it from my systems. I don't need programs sitting on my computer preventing me from installing code that I've literally just compiled, preventing me from deleting or modifying folders on my machine, and causing extreme lag for many basic system operations even when it does work. At this point, the time in lost productivity (via normal operation) and downtime (via their recent bug) has easily exceeded a thousand times over the aggregate sum of all benefits that CrowdStrike will ever have provided from threat detection and prevention. It's time to remove the malware.

_flux · 2024-07-22T13:05:11.000000Z

You are not the customer, though, your employer is the customer.

Perhaps you should push this change up in the food chain, then, and if the company is good the request will be taken seriously. As I understand it, while CrowdStrike is the biggest name in EDR, it's far from the only one, if that's what your company requires to pass some checkboxes in certifications.

hello_moto · 2024-07-22T14:34:13.000000Z

Vendors are competing with one and another to win contracts.

CIO/CISO don't select vendors lightly.

There seems to be a typical/classical Engineer's mindset of "make a claim first, ask later" around the subject lately.

"My boss plays golf with Sales Rep" might need more proof because if they selected the lesser capable vendors and they got hit with ransomware, bet my ass your boss will no longer play Golf with any Sales Rep ever.

CRConrad · 2024-07-23T07:16:26.000000Z

> Vendors are competing with one and another to win contracts.

Sure, in a well-functioning market economy without any distortions. But there are lots of those at play, so competition is severely hampered (by network effects, regulatory capture, and on and on... Up to and including, I suspect, mere ephemeral fashion). What we actually have in many areas of the "tech market" are oligopolies and near-monopolies, not perfect competition.

> CIO/CISO don't select vendors lightly.

Muahaha. Seems rather more like they're at least as naïve as any Web-surfing consumer on their sofa, easily bamboozled by trendy buzzwords and slick marketing campaigns.

hello_moto · 2024-07-22T14:31:06.000000Z

Sounds like your IT (sec team, specifically) doesn't set up the software correctly.

I've worked for a company that installs Falcon on all its fleet and I never ran into issues like yours.

mbreese · 2024-07-22T12:58:15.000000Z

At this point… yes.

It would be one thing Microsoft could do to focus 100% of the attention/blame away from Windows and onto CrowdStrike. And customers will want their pound of flesh from somewhere.

Really, this should serve as a wake up call w/in Microsoft to start to harden the kernel against such vulnerabilities.

Was the crash the fault of Windows? No. But did a Windows design decision make this possible? yes.

I’m sure the design decision made sense at the time (at least business sense). Keeping the kernel more open for others to add drivers to makes it easier to write/add drivers, but makes the system more vulnerable. This is a good opportunity within Microsoft to get support for changing that.

_flux · 2024-07-22T13:17:43.000000Z

Ultimately this would have been almost a non-issue if there had been better deployment strategies in place for also the data file updates.
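
A minimal sketch of what I mean by a staged deployment: push to a small ring first, check health, and only then continue. The ring names, the `deploy` and `healthy_fraction` callbacks and the threshold are all hypothetical; this is not how any particular vendor does it.

```python
def staged_rollout(rings, deploy, healthy_fraction, min_healthy=0.99):
    """Deploy ring by ring; stop at the first ring that looks unhealthy.

    Returns (completed_rings, halted_ring). halted_ring is None when
    every ring passed its health check.
    """
    completed = []
    for ring in rings:
        deploy(ring)
        if healthy_fraction(ring) < min_healthy:
            # Halt here: later (larger) rings never receive the update.
            return completed, ring
        completed.append(ring)
    return completed, None

if __name__ == "__main__":
    deployed = []
    # Simulated health data: the canary ring crashes on the bad update.
    health = {"internal": 1.0, "canary": 0.2, "everyone": 1.0}
    done, halted = staged_rollout(
        ["internal", "canary", "everyone"], deployed.append, health.get
    )
    print(done, halted, deployed)
```

The same loop works for data files as well as binaries, which is the part that apparently was missing here.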

If by changing the system you mean adding some kind of in-kernel isolation to it, then I don't think it would be worth the effort to make that kind of major change to the way operating systems work just to give arguably a minor risk reduction to systems—in particular if CrowdStrike and other vendors take some learnings from this event.

Microsoft might improve their system rollback mechanism to also include files that are not strictly integrated to the system, merely used by the parts that are (the channel files loaded by the driver).

Actually I think we can just be happy that the incident was a mistake, not an attack. Had this kind of "first ever" situation been an attack, it could be extremely difficult to recover from it. I wonder how well EDRs deal with "attacks from within"..

CrowdStrike pulled off the update within 1.5 hours. I wonder if they actually use Falcon themselves? But then somehow missed the problem? Doesn't seem like they eat their own dog food :). (Or at least their own channel files.)

roblabla · 2024-07-22T14:35:22.000000Z

There's a simple thing microsoft could do to avoid this, that doesn't require anything too crazy. EDRs work in kernel-land because that's the only place you can place yourself to block certain things, like process creation, driver loading, etc...

macOS has a userland API for this, called EndpointSecurity, which allows doing all the things an EDR needs, without ever touching kernelland. Microsoft could introduce a similar API, and EDRs would no longer need a driver.

mbreese · 2024-07-22T15:17:16.000000Z

This is exactly what I’d advocate for. There are many things that run in kernel space that don’t need to. The Mac model with user-land hooks is one model. EBPF from Linux (and Windows?) is another.

I’m sure the reason why Apple migrated was because of all of the bugs/crashes security companies kept introducing into the kernel with kexts. Apple had the ability to change their architecture on a whim because they aren’t quite as beholden to backwards compatibility as Windows.

Microsoft could take this as an opportunity to make some major changes that would be more readily accepted by the market.

_flux · 2024-07-22T14:44:55.000000Z

I suppose that's what CrowdStrike's system on Mac uses as well, then. Apparently on Linux they use EBPF and Microsoft is researching that for Windows as well: https://github.com/microsoft/ebpf-for-windows . So maybe that's actually the solution they'll go with?

It would certainly help solving this particular problem, even if not the kernel-integration in general.

mbreese · 2024-07-22T14:16:44.000000Z

If many things had gone differently, this could have been avoided. But I’m looking at this from the Microsoft perspective. No matter how much people scream high and loud that it was a CrowdStrike issue and not Microsoft’s fault, Microsoft is still getting blamed. It’s a Windows BSOD.

I talked to my dad (retired enterprise operations/IT) this weekend and he was telling me that the next computer he buys will probably be a Mac, largely because he doesn’t want to deal with the possibility of a crash like this. Does he run CrowdStrike? Not at all. Does he know who they are? Nope. (He’s been retired for a while) What he does know (well, thinks) is that Windows now has an unstable kernel.

And Microsoft has no control over distribution policies for other vendors. How those vendors distribute updates is up to them. Even if a sane deployment strategy could have avoided the larger global problems, Microsoft can’t control that.

So, if you have Microsoft dealing with negative publicity and public sentiment, with no way to control errors like this in the future, what can you do? To me, the best they can do is kneecap CrowdStrike, put the full blame on them, and use this as an excuse to change the kernel/driver model to one where they can have more control over the stability of the OS.

hello_moto · 2024-07-22T14:37:10.000000Z

They will kneecap the security industry and open up another can of worms: Windows insecure back on the menu.

mbreese · 2024-07-22T15:11:47.000000Z

There are other vendors.

Microsoft could even reinstate CrowdStrike at some point, but only after an extensive review process. And then probably require similar process reviews/checks for any other vendor that requires the same kernel access.

Or just remove the need for kernel access at all and migrate to a better driver architecture at the sacrifice of backwards compatibility. Security software doesn’t need to run in kernel space… there are other ways.

hello_moto · 2024-07-22T15:22:41.000000Z

That could potentially be a lawsuit against MSFT since their own MSFT Defender is in this space and potentially doing the same thing or else they have way less potency of catching attacks no?

CaptainZapp · 2024-07-22T13:37:25.000000Z

> Backdoors of all kinds can be installed to most any operating system without vendor co-operation

Not on Kernel level. Not without active support by the vendor.

_flux · 2024-07-22T13:48:18.000000Z

How much does it really help you if your complete user-space can still be messed up by an offending Windows SYSTEM process? As I understand it, they are able to hurt the system e.g. by killing processes, uninstalling applications, replacing binaries, allocating memory, starting too many processes, ..

Actually I could easily see a buggy remote system management update could just decide to uninstall everything and nuke the system, because it thinks it's stolen. And it would be designed functionality for it.

roblabla · 2024-07-22T11:14:31.000000Z

Unless you have a source, you should really avoid spreading misinfo here. CrowdStrike doesn't have kernel-level ACE. It has a buggy configuration parser, and they pushed a corrupted config that triggered those buggy codepath in the parser.
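
To illustrate what a non-buggy parser does differently, here is a sketch against a made-up binary layout. The magic value, header and entry formats are invented for the example and have nothing to do with CrowdStrike's actual file format. The point is that every length and count is checked against the data actually present before anything is read.

```python
import struct

# Hypothetical layout, invented for this example:
#   header: 4-byte magic + little-endian uint32 entry count
#   entry:  two little-endian uint32 values (rule_id, action)
MAGIC = b"CFG1"
HEADER = struct.Struct("<4sI")
ENTRY = struct.Struct("<II")

def parse_config(blob):
    """Parse the blob, rejecting anything that does not match the layout."""
    if len(blob) < HEADER.size:
        raise ValueError("truncated header")
    magic, count = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("bad magic")
    expected = HEADER.size + count * ENTRY.size
    if len(blob) != expected:
        # The declared count must agree with the bytes actually present.
        raise ValueError("entry count does not match payload size")
    return [
        ENTRY.unpack_from(blob, HEADER.size + i * ENTRY.size)
        for i in range(count)
    ]

if __name__ == "__main__":
    good = HEADER.pack(MAGIC, 2) + ENTRY.pack(1, 0) + ENTRY.pack(2, 1)
    print(parse_config(good))
    try:
        parse_config(bytes(64))  # corrupted blob: rejected, not dereferenced
    except ValueError as exc:
        print("rejected:", exc)
```

A rejected config is an error you can log and skip. An unchecked one in a driver is a BSOD.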

dathinab · 2024-07-22T12:46:28.000000Z

there are some places claiming that this "config" language is so flexible that it's basically an interpreted scripting language

but AFAIK no sources I trust have yet claimed it

but it's probably where the idea comes from

sschueller · 2024-07-22T11:32:16.000000Z

My source: https://www.youtube.com/watch?v=wAzEJxOo1ts

From my understanding the CS driver lives in the kernel space and parses configs/applications downloaded in the user space. Hence the system even does a BSOD.

"CrowdStrike doesn't have kernel-level ACE" please provide your source.

roblabla · 2024-07-22T14:27:09.000000Z

> From my understanding the CS driver lives in the kernel space and parses configs/applications downloaded in the user space. Hence the system even does a BSOD.

That's my understanding as well, but not quite the same as

> execute arbitrary code on the kernel level without any checks

At least for me, when we talk about kernel-level ACE, it's something like libcapcom[0], which allowed executing arbitrary unsigned code in the kernel.

Here, the driver can only execute the code present within itself, which was signed by Microsoft. The configuration itself isn't signed by microsoft, but the config isn't code (at least, as far as I can tell - I see some people claiming the CS-0000.sys files are essentially bytecode, but have yet to see conclusive proof of this).

Now, we could argue that it's weird that Microsoft signed a buggy driver, and MS should do better qualification of third-party drivers. But in practice, MS doesn't really vet driver quality. From what I can tell, the driver signing is mostly there so they can easily attribute provenance of drivers, and revoke the certs if it ends up in the hands of malicious actors.

[0]: https://github.com/notscimmy/libcapcom

SoftTalker · 2024-07-22T14:28:11.000000Z

I never assumed that driver signing was any kind of indicator of quality. It simply says "this is the Crowdstrike driver, it has not been modified"

Maybe I'm wrong and Microsoft does some QA on drivers before they are signed?

AshamedCaptain · 2024-07-22T11:55:39.000000Z

> when it can execute arbitrary code on the kernel level without any checks

That would be grounds for blacklisting indeed (out of experience). However, that's not the case here, no matter how you put it.

cookiengineer · 2024-07-22T12:39:15.000000Z

> What is the point of driver signing if a vendor can basically build in a back door and Microsoft doesn't validate that

Before you downvote that comment, I'd like to remind everyone that this was already happening. Realtek's driver cert was leaked, and a lot of malware used this cert to sign their drivers for _a decade_ until anything happened about it.

Microsoft's driver signing workflow is utterly pointless and it doesn't mean anything. Any vendor that takes their security seriously should never trust those driver signatures.

rramadass · 2024-07-22T14:29:52.000000Z

Finally! You hit the nail right on the head!

angulardragon03 · 2024-07-22T09:05:19.000000Z

You’re missing something. The Crowdstrike issue was caused by a channel update (basically a definitions update) that they pushed that broke their own sensor. Microsoft wasn’t involved in the delivery of that update.

roblabla · 2024-07-22T08:57:55.000000Z

Do you have a source for this? It's the first time I hear of this. From what I've understood (perhaps wrongly), the error came from the CrowdStrike driver (csagent.sys) having bugs in their configuration parser that could cause it to BSOD. CrowdStrike pushed a corrupted configuration (the CS-000whatever.sys we're told to delete) that hit that bug. I'm not sure how Microsoft fits in this story.

mbesto · 2024-07-22T09:14:26.000000Z

Just read more into it. You're correct. I think it would be dumb to solely blame MS, but I don't think you can completely absolve them.

this comment right here sums it up:

> Sure, but Windows shares some portion of the blame for allowing third-party security vendors to “shit in the kernel”.

https://news.ycombinator.com/item?id=41006176

roblabla · 2024-07-22T11:11:24.000000Z

Yeah, the fact that Windows requires kernel-level access to be able to do EDR stuff is really unfortunate. MacOS has been very successful with their userspace EndpointSecurity Framework for this purpose.

On the other hand, Linux is similarly crippled: eBPF LSM are fairly recent and don't work everywhere (I'm looking at you Ubuntu[0]), and the only real alternative if you want to be able to block processes is a kernel module. Which comes with the same dangers as Windows.

[0]: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2054810

sgift · 2024-07-22T12:04:14.000000Z

Well, Microsoft says that's because of the EU commission: https://news.ycombinator.com/item?id=41029590

I have my doubts, but that's at least what they give as the reason for the kernel-level access of EDR tools.

beAbU · 2024-07-22T10:07:45.000000Z

Crowdstrike pushed out a world-wide update to their client software, which auto-updated itself.

This update was buggy and it caused the host machine to go into a BSOD boot loop.

The fact that the host machine happened to be running Windows has very little to nothing to do with it.

It's like blaming a pothole for your car exploding. Yes there was a pothole, yes it shouldn't have been there, yes it could have been avoided, but the fact that your car self-immolated because of a mere pothole points to possibly other underlying issues with your car.

PedroBatista · 2024-07-22T10:01:24.000000Z

The "only" thing Microsoft should be blamed for is signing a driver that updates itself by taking "configurations ( code )" from userspace, and turning a blind eye to all these loopholes because they also know it's not practical for them to sign all the driver code that goes into the kernel.

Maybe most of these "drivers" shouldn't be in Ring0 to begin with? This is a general problem and the norm, Windows is just another OS that allows this this way.

dathinab · 2024-07-22T13:12:01.000000Z

it's not that simple

user-land drivers are a thing, heck they are the standard for modern micro kernel architectures

and even with hybrid kernels pushing part of the code out of the kernel into something like "user land co-processes" is more than doable; it's not trivial to retrofit in a performant and flexible way, but it is possible

Mac has somewhat done that (but I don't know the details).

On Linux it's also possible, though with BPF it's a bit of an in-between hybrid (leaving some parts of the drivers in the kernel, but as BPF programs, which are much less likely to cause such issues compared to "normal" drivers).

A good example for that is how graphic drivers have developed on Linux, with most code now being in a) the user-land part of the driver and b) on the GPU itself leaving the in kernel part to be mostly just memory management.

And the thing is Windows has not enforced such direction, or even pushed hard for it AFAIK, and that is something you can very well blame them for. You in general should not have a complicated config file parser in a kernel driver, that's just a terrible idea, some would say negligent, and Windows shouldn't have certified drivers like that. (But then given that CrowdStrike insists that it _must_ be loaded on start (outside of recovery mode) I guess it would still have hung all systems even if the parsing had been outsourced, because it can't start if it can't parse its config.)

traject_ · 2024-07-22T14:17:46.000000Z

> And the thing is Windows has not enforced such direction, or even pushed hard for it AFAIK, and that is something you can very well blame them for.

Even here it's pretty hard to blame them due to antitrust concerns. Just google the word Patchguard.

dathinab · 2024-07-24T07:58:42.000000Z

That is misleading.

Falcon uses APIs like eBPF when available/usable. They are not stupid: if they can use something which is more secure and reliable, why should they not use it?

E.g. they use it on Linux, even though they could have created a custom kernel module (idk if they maybe also have a custom kernel module, tbh).

And pushing for something doesn't mean banning other things. E.g. they could certify "following best security practices" and not give it to vendors not using the more modern APIs. While they can't block drivers based on it, with the right marketing customers of CrowdStrike wouldn't want to buy it without such a cert.

I.e. while MS doesn't provide viable ways to get the functionality Falcon and similar need without kernel modules, it indeed would be a bit ridiculous for them to ban such software, and as of yet they do not.

GuB-42 · 2024-07-22T09:29:47.000000Z

What I understand is that some Azure VMs are running CrowdStrike, and like any other computer running CrowdStrike on Windows, they crashed. Totally not Microsoft's fault, CrowdStrike messed with the kernel, the only thing we can blame Microsoft for is allowing such software to exist.

Where Microsoft is to blame however is the unrelated Azure outage in the Central US region that happened (and was fixed) just before the CrowdStrike faulty update.

ninepoints · 2024-07-22T09:11:55.000000Z

You're missing something. Many somethings.

watt · 2024-07-22T09:59:30.000000Z

There should not be any software-caused crashes during operation of software. Every NPE that is not caused by a hardware issue is a null pointer not properly handled. Software needs to handle its null checks. Missing a check (or precondition, or validation) is squarely on Microsoft.

> always indication of a kernel bug and before

but:

> blaming microsoft for the crowdstrike bsod is stupid

and who owns the kernel in windows land? Microsoft. how is it stupid to blame Microsoft for not making kernel safe?

mebeim · 2024-07-22T10:27:41.000000Z

Let me give you an analogy: Volvo is known to manufacture very safe cars. Now let's say I drive a Volvo car with a box of dynamite on the passenger seat. I stop at a red light but hit the brake a bit too hard and the box of dynamite falls and causes an explosion, disintegrating everything in a 20-foot radius. So whose fault was it? Volvo?

> Missing a check (or precondition, or validation) is squarely on Microsoft.

Missing a check for presence of dynamite before allowing me to start the car is squarely on Volvo!

You see how silly that sounds?

Now, back to being serious: MS cannot possibly control and validate everything you decide to install and run on your system, especially if the things you install are kernel drivers. It is simply impossible. If you install a kernel driver developed by a 3rd party company, and that driver crashes your system because the devs at that company forgot to perform proper validation of data, well... that's on them. Even if MS wanted, they wouldn't be able to verify the soundness of any piece of code that is installed as a driver and runs with kernel level privileges. That'd require solving the halting problem.

Diggsey · 2024-07-22T10:24:54.000000Z

Microsoft don't own the kernel in that sense: anyone can write kernel drivers for windows... While there are some things that the kernel can do to protect against a bad driver, it's not a security boundary, so ultimately bad code can cause crashes.

AIUI, Microsoft actually has good tooling for validating drivers before they are deployed, but it requires that you actually run the validation...

roblabla · 2024-07-22T11:06:17.000000Z

@watt there's a big difference here.

eBPF is a bytecode that is interpreted in the kernel, with the explicit goal to allow writing code that executes at the kernel-level in a safe way. Any kernel panic (again, short of pid1 kills) is considered a bug, and could even potentially be exploited to gain capabilities in some cases. Here, the kernel explicitly says "this is safe", so any problem within is a bug in the kernel.

In contrast, a kernel module/driver is just some third-party code that is loaded in the kernel. Here, all bets are off: it is up to the third-party to do their job properly and make sure their code is correct.

In this case, CrowdStrike explicitly opted into writing a kernel module, and then failed to, as you say, "handle their null check". The segfault wasn't in Windows code, it was in CrowdStrike code that lives in the kernel. Crowdstrike should have handled their nullcheck, failed, and that will lead to a BSOD.

To be clear: the only way microsoft could make the kernel safer here is by disallowing kernel modules entirely. While there is an argument to be made that this could be a good idea, it is a bit beside the point.

yftsui · 2024-07-22T07:28:43.000000Z

Not surprising at all. On my work-issued MacBook, the top process by CPU time has always been `com.crowdstrike.falcon.Agent`. Before the Apple M1 was released, my 2019 Intel MacBook Pro could barely do any everyday task with that Agent running in the background. It crashed video calls, crashed the entire OS, and I couldn't even type smoothly in an IDE back then.

fernandotakai · 2024-07-22T09:16:44.000000Z

yup. i worked at a company that used crowdstrike's falcon agent and it was an incredible cpu hog.

nowadays i work at a place that uses a different solution and guess what: it's also a f-ing cpu (and i/o) hog -- it makes my m1 pro macbook slow to a crawl and there's no way to disable it.

em500 · 2024-07-22T09:52:12.000000Z

Part of Windows' bad reputation (for instability and poor performance) is likely due to Windows being the standard on corporate computers (outside of tech companies) where admins/management insist on installing tons of "enterprise solutions" that slow quad core PCs with lightning fast SSDs to a crawl. MacOS has the same problem as soon as they're deployed in large corporations. I had a company issued MacBook where a bad printer driver cut the battery life in half for a month or so.

acdha · 2024-07-22T11:46:35.000000Z

> MacOS has the same problem as soon as they're deployed in large corporations.

Except where Apple does not allow vendors loose in key places like the kernel. One of the interesting questions here is whether Microsoft could possibly do that: Windows users would be better if the kernel was restricted to first-party code, things like AV used the same kind of interface which macOS has, and third-party code was forced into more moderated channels (malware uses many of the same techniques) – but there’s a security industry with revenue measured in tens of billions of dollars annually who would be running to the regulators if there was anything which could remotely be seen as favoring Defender over their products. I still think it’d be possible but hard enough that I’m not surprised they’ve slowly been letting awareness of the downsides build, especially on the enterprise IT side.

I was wondering whether this debacle might push them to have a roadmap for restricting kernel drivers in favor of the Windows eBPF implementation which has been approaching production grade. Sometimes you need a huge blowup to remove support for the status quo.

WorldMaker · 2024-07-22T14:26:23.000000Z

> I was wondering whether this debacle might push them to have a roadmap for restricting kernel drivers in favor of the Windows eBPF implementation which has been approaching production grade.

Though as this article and its Red Hat respondents admit eBPF isn't a perfect solution either because it is still a somewhat Turing Complete scripting language and bad vendors will find ways to get kernel panics out of eBPF scripts no matter how hardened the eBPF driver gets.

Microsoft is probably in a good position to use this debacle to push more vendors to Windows' implementation of eBPF. It doesn't solve the crisis that a vendor like CrowdStrike exists that is "beloved" by Enterprise Solution Architects for all the compliance boxes it checks, but is run as a terrible software company with bad standards and has multiple "accidents" in recent weeks.

acdha · 2024-07-22T19:42:21.000000Z

Yeah, I’m not saying eBPF is perfect but it’s getting better and has a path to making things much safer. I’d compare that to where things were with memory safety 20 years ago where it seemed unlikely that anything could displace C/C++ but by now we’re seeing a lot of important things written in memory safe languages. For a company with Microsoft’s resources, I’d imagine they could do quite a lot if 10% of the CEO’s bonus was instead invested in making their customers safer.

nikcub · 2024-07-22T12:47:40.000000Z

Technically, they could do it - I believe Microsoft tried in the distant past. Problem is as soon as they restrict ring 0 to first-party only it would raise competition and antitrust issues and be seen as Microsoft favoring its own solution and locking out third parties.

mrkstu · 2024-07-22T13:02:27.000000Z

Not if MS’s equivalent also used the new system.

acdha · 2024-07-22T19:38:42.000000Z

I definitely think it would be critical that Defender launches on day one using only the new APIs.

jajko · 2024-07-22T12:07:46.000000Z

Quadcore? More like 12 core corei7-1365U. It's literally just a function of time (aka forced silent updates from admins) till it becomes slow like early 2000s desktops running modern software. Same for HDD.

Once I got new laptop due to some internal migration, it was blazingly fast. Well, not so much anymore. I literally don't install anything on it since receiving it, I simply can't (unless it's just about copying to c: and it runs). Some colleagues have stuff like windows firewall running constantly on 50% cpu, nothing admins can fix apart from replacing ntb.

fernandotakai · 2024-07-22T10:56:21.000000Z

totally. i have a macbook air m2 and it performs better than my work m1 pro because of the bloatware.

my zsh config spawns in ~90ms on my macbook air m2 while it takes 600ms in the m1 pro.

vips7L · 2024-07-22T14:34:41.000000Z

My PowerShell startup time on my work laptop is around 4000ms. Corporate IT ruins everything.

krzyk · 2024-07-22T12:00:13.000000Z

This is not something Crowdstrike specific, my company uses SentinelOne and it is also as intrusive and CPU intensive - basically makes development work on intel mac almost impossible.

I hate all the EDR nonsense on laptops. I wonder if the added cost for lost workhours and electricity wouldn't be more than the tiny chance of catching a malware.

ykonstant · 2024-07-22T06:05:37.000000Z

Sorry to hijack this post, but for affected admins reading this: how is the recovery process going? What is your estimated time to normalcy?

Also, for Linux and especially BSD admins: has this incident affected your perspective on EDR/XDR systems in the kernel? What would you suggest as an alternative to ensure regulatory compliance?

tgv · 2024-07-22T06:15:17.000000Z

I do manage a few Linux machines (firewall passes only http and https -> nginx -> custom backend), but I'd never heard of Crowdstrike before. I don't even know what their product is supposed to do. As far as I can see, kernel level protection could only help prevent someone bypassing the firewall and trigger an exploit in nginx. But if Crowdstrike knows about such exploits, everyone does, and the firewall or nginx gets patched.

What am I missing?

Edit: I know it is supposed to implement "EDR", but it's always explained in the vaguest of terms.

Khaine · 2024-07-22T06:26:18.000000Z

It is primarily aimed at workstations, although it does run and is run on servers. The idea is to be able to identify malware based on behaviour, rather than rely on signatures.

EDR solutions hook into the kernel to log, and block system calls. They use this information to try and generically identify malware. For example you could detect ransomware by identifying a process that is enumerating a large number of files, reading from those files, and then saving those files.
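
The ransomware heuristic above can be sketched roughly as follows. This is only a toy illustration, not how any real EDR product works: the `(pid, action, path)` event format, the `suspicious_pids` name and the threshold are all assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical event format (an assumption for this sketch):
# (pid, action, path), where action is "read" or "write".
# Any other action is ignored. Real EDR telemetry is far richer.
def suspicious_pids(events, threshold=100):
    """Return pids that read and later rewrote at least `threshold` distinct files."""
    read_by_pid = defaultdict(set)
    rewritten_by_pid = defaultdict(set)
    for pid, action, path in events:
        if action == "read":
            read_by_pid[pid].add(path)
        elif action == "write" and path in read_by_pid[pid]:
            # Only count writes to files the same process read earlier.
            rewritten_by_pid[pid].add(path)
    return sorted(pid for pid, paths in rewritten_by_pid.items()
                  if len(paths) >= threshold)
```

A backup tool or a compiler would trip a rule this naive, which is presumably part of why real products layer many more signals on top.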

For a SOC, you can also use an EDR to identify files, hashes, and connections to given IPs across your fleet of servers. This lets you see which devices have been compromised. The EDR can then isolate them by blocking network syscalls and allowing only the SOC access to investigate and remediate.
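
A fleet-wide hash lookup of the kind described above might look something like this in miniature. The data shapes and the `compromised_hosts` name are hypothetical; a real EDR would query its own backend rather than an in-memory dict.

```python
def compromised_hosts(fleet_hashes, bad_hashes):
    """Return {host: matching_hashes} for hosts holding any known-bad hash.

    fleet_hashes: assumed mapping of hostname -> iterable of file hashes seen there.
    bad_hashes:   assumed iterable of hashes tied to known malware.
    """
    bad = set(bad_hashes)
    hits = {}
    for host, hashes in fleet_hashes.items():
        matched = set(hashes) & bad
        if matched:
            hits[host] = matched
    return hits
```

The hosts returned would be the candidates for the isolation step.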

This is the value they provide (or at least claim to) to a cyber team

weberer · 2024-07-22T11:34:49.000000Z

Basically it monitors activity on your computer (process spawning, file changes, etc) and logs them as "Events". Then it sends those to their ML models for "Detection". And if malware behavior is detected, then they perform a "Response" whatever that may be. Probably notifying the user and IT department.

jabroni_salad · 2024-07-22T06:31:11.000000Z

> But if Crowdstrike knows about such exploits, everyone does

This is actually the most important thing happening with EDR as a concept, it handles novel cases that have never been seen before, with a human review very quickly. Our csirt has an SLA of 3 minutes.

It's right there in the acronym: Detection and Response.

therein · 2024-07-22T08:07:39.000000Z

So like let's say a user of a computer in my fleet ran something infected with malware that had enough diligence to have a unique file signature. It puts itself to startup items in a creative way and then calls back home with just a standard SSH connection.

In that case are you telling me their pitch is that they detect this behavior, dispatch some human agent from their CSIRT within 3 minutes to remotely but manually come check the binary, dump some strings, do some reverse engineering and track the CC server etc?

michaelt · 2024-07-22T09:14:15.000000Z

> In that case are you telling me their pitch is that they detect this behavior, dispatch some human agent from their CSIRT within 3 minutes to remotely but manually come check the binary, dump some strings, do some reverse engineering and track the CC server etc?

Crowdstrike is not in the business of selling to people who know WTF any of that means.

Crowdstrike is in the business of selling to people like the CEO of Southwest Airlines. Their pitch is "The definitive AI-native SOC platform; Forrester named CrowdStrike a Leader in The Forrester Wave for Managed Detection and Response (MDR) in Europe; IDC MarketScape name CrowdStrike Named a Leader in Worldwide Risk-Based Vulnerability Management Platforms 2023 Vendor Assessment"

If the CEO consults people lower in the hierarchy, the pitch is "Some asshole has decided you need to be SOC2 compliant, that means you need to run antivirus, our product will check that checkbox and though our product is not good, it is at least better than mcafee or symantec"

p_l · 2024-07-22T10:56:55.000000Z

Not necessarily from Crowdstrike CSIRT, but I have experience of security calling me back within 30 minutes of changing a security-impacting system file to verify that it was done by me and not something else.

Probably because they had already looked at the modification which was benign so slower escalation path in absence of other indicators.

rswail · 2024-07-22T14:31:40.000000Z

tripwire(1) has been part of systems for decades.

Bullshit about "they had already looked at the modification which was benign".

So your "security" is to totally expose every operation of your software to an external party with absolutely no auditing of what data they are exfiltrating from your system?

p_l · 2024-07-22T15:13:56.000000Z

It was handled by internal security team.

Also, tripwire was limited to periodically scanning files, couldn't scan for example syscalls and trace relationships between them.

But yes, tripwire is a very early EDR/XDR.
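
For readers unfamiliar with the periodic file scanning mentioned above, here is a minimal sketch of the general idea (hash files, compare against a baseline later). It is not tripwire's actual implementation or file format; `snapshot` and `diff` are names invented for the example.

```python
import hashlib
from pathlib import Path

def snapshot(paths):
    """Map each existing file path to the SHA-256 hex digest of its contents."""
    return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest()
            for p in paths if Path(p).is_file()}

def diff(baseline, current):
    """Return (added, removed, modified) path lists between two snapshots."""
    added = sorted(current.keys() - baseline.keys())
    removed = sorted(baseline.keys() - current.keys())
    modified = sorted(p for p in baseline.keys() & current.keys()
                      if baseline[p] != current[p])
    return added, removed, modified
```

Anything that changes and changes back between two scans is invisible to this approach, which is the limitation compared to tracing syscalls.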

jabroni_salad · 2024-07-22T16:08:21.000000Z

If the device isnt in a technical user collection, then the fact that an outbound SSH connection happened at all is a pretty good IOC. A fucking slack bot can respond to that.
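
A rule like that is about as simple as detection logic gets. A sketch, assuming a made-up connection record and allow-list (neither comes from any particular product):

```python
# Assumed allow-list of devices belonging to technical users (hypothetical names).
TECHNICAL_HOSTS = {"dev-laptop-01"}

def is_ioc(conn, technical_hosts=TECHNICAL_HOSTS):
    """Flag outbound SSH (TCP port 22) from a device outside the technical collection.

    conn is a hypothetical record: {"host": str, "dest_port": int, "direction": str}.
    """
    return (conn["direction"] == "outbound"
            and conn["dest_port"] == 22
            and conn["host"] not in technical_hosts)
```

Note that this only checks the port number, so SSH on a non-standard port would slip past it.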

rswail · 2024-07-22T14:28:54.000000Z

That's just marketing bullshit.

"We have magic code that watches everyone's computer and sends it all back to our system, where we apply magic to detect malware and then send the code back to all of your systems and until we can say we have AI, we're going to lie that a human will be able to review this information in 3 minutes."

jabroni_salad · 2024-07-22T16:06:51.000000Z

We staff our own SOC and 99% of the tickets that go thru it are just 'some app we already know about has updated to be slightly different' or 'some new app has appeared and needs to be documented'. It is super rote and boring.

p_l · 2024-07-22T06:28:19.000000Z

The whole point of those systems is catching actual behaviours, not patching/firewalling per se (though they do some level of permission management on some platforms).

For example, patching nginx is not going to help if your user gets phished of an auth token that was explicitly supposed to let them run code on the server - but catching that the code started browsing files elsewhere and sending data out will help you notice the breach.

skywhopper · 2024-07-22T13:35:48.000000Z

In the ideal case, it logs everything your computer does, every process that's running, every system call they make, every Internet connection made, website visited, etc, and reports it all back to a central data repository that's constantly being scanned for suspicious behavior. But more importantly, when a hack does occur, the security team can go back to that data lake and figure out exactly what happened.

In reality, that's way too much data for anyone to make sense of, but giant companies spend tens of millions of dollars per year to deploy all the things so they can say they're doing it.

On the other hand, funny things can happen. I got called out by the security team at one job because the EDR agent on my workstation registered that I had put a file on disk that had a malware signature. Well, it turns out that I had checked out the security team's git repo containing malware signatures...

But I did get called out in about 20 minutes by a random security engineer I'd never met who told me the exact path on my PC where the file was. Is that a good thing? I'm not sure.

tguvot · 2024-07-23T01:11:00.000000Z

not strictly sysadmin. working on fedramp certification of really big system.

we have crowdstrike in our usual production environment and i was always against it because i was always afraid of something like what happened. but security department pushed it through because this is something that they understand and can control. "security architecture of product" is not a concept that they understand.

but answering your question, fedramp does require EDR to be present. according to our fedramp advisors clamav is sufficient for passing audit.

pilif · 2024-07-22T11:38:19.000000Z

> We're in the process of operationalizing an opt-in to this technique

oh the jargon

> We're making progress by the minute

I better hope you do.

red_admiral · 2024-07-22T08:15:33.000000Z

In principle, yes, if you have third-party Ring 0 kernel-mode drivers, they could crash a POSIX system as well as a windows one.

But that doesn't seem to be what happened here.

Random idea that I haven't fully thought through: continue to run the kernel at Ring 0 and userland at Ring 3, but move "tools" like this to Ring 1.

doikor · 2024-07-22T08:40:35.000000Z

Problem with that is the tool can’t protect the system from any bad actor who gets ring 0 access.

And even if it has ring 0 access it can’t really verify anything without secureboot or something like it verifying that nothing else started before it. This is also why Riot's anti-cheat runs in ring 0, as it has to protect the game against the owner/admin of the machine.

(And after that there is still bios or firmware level exploits)

ahazred8ta · 2024-07-22T08:56:58.000000Z

Windows has an official ELAM Early Launch AntiMalware framework, which Crowdstrike complies with. The Crowdstrike driver is right where it's supposed to be, according to Microsoft.

prmoustache · 2024-07-22T08:47:16.000000Z

I'd be interested to know how many of the companies involved in data breach / massive pwnages in the last 2 years were using crowdstrike on their devices/servers.

naveen99 · 2024-07-22T11:05:42.000000Z

I want to see some proper accounting on what percent of company budget, percent of compute and memory, percent of employee time is spent in the name of security. Is it more like the usa on military 1-3% or like 10-20% ? 1% seems like a reasonable number. I suspect places that have crowdstrike installed are closer to 10-25% closer to military dictatorships.

Aperocky · 2024-07-22T13:52:59.000000Z

Might have been 4-5% in name but in between lost productivity by computers running as fast as a Pentium 4 (maybe slower?) and of course this complete dumpster fire that just happened maybe closer to 20%?

jgalt212 · 2024-07-22T13:19:41.000000Z

We had some hiccups over the weekend with Ubuntu Server / Hetzner.

- apache2 crashed on VM (had not happened in 8 months--our entire tenure at Hetzner)

- another VM became entirely unresponsive--would only respond to ping. could not even access via control panel provided console. had to do a reboot. after that, the box seems to be ok.

- we are still waiting on a response to our ticket from Hetzner.

ExoticPearTree · 2024-07-22T08:24:44.000000Z

People drink the kool-aid and believe that Linux needs antivirus. So many vendors tried and failed at scaring people into buying Linux antivirus that I've lost track over the years.

My perspective is that it is a very very poor idea to have an anti-malware solution running on a Linux system. CrowdStrike is very persistent in their sales pitches and does FUD campaigns better than the competition to convince people with decision making authority that Linux and Containers and whatnot need an antivirus.

nikcub · 2024-07-22T12:50:52.000000Z

Recently helped somebody who accidentally left dockerd open to the internet with no auth. It was hacked, backdoored and running a crypto miner within hours. Only detected when the hosting company emailed him days later because the worm was further scanning other hosts from his machine.

How would you stop, detect and/or remove this threat from this machine on a linux server without antivirus / EDR?

ExoticPearTree · 2024-07-22T13:17:14.000000Z

You have metrics from the server which tell you that you're running 100% CPU for a period of time. If the crypto miner wasn't something very dumb, it would not be detected. And I can scan a network using nmap with an XML or grepable output format option.
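
The "100% CPU for a period of time" check can be sketched as a simple run-length test over metric samples. The function name, threshold and window here are assumptions for illustration, not taken from any monitoring tool:

```python
def sustained_high_cpu(samples, threshold=95.0, min_consecutive=10):
    """True if `min_consecutive` samples in a row are at or above `threshold` percent.

    samples: CPU utilisation percentages, assumed to be taken at a fixed interval.
    """
    run = 0
    for value in samples:
        # Extend the current run of high samples, or reset it on a low one.
        run = run + 1 if value >= threshold else 0
        if run >= min_consecutive:
            return True
    return False
```

A miner that throttles itself below the threshold would not trigger this, which matches the caveat about only catching the dumb ones.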

CrowdStrike doesn't remove threats. It would stop the process and quarantine the file. It requires knowledge on how to actually remove the threat beyond the quarantined file.

Ajedi32 · 2024-07-22T14:25:56.000000Z

Yeah, desktop Linux is more obscure and generally used by more technically inclined people so malware is less common, but out of the box it's just as vulnerable to viruses as Windows or any other OS that runs user-installed applications with no sandbox.

If we were _really_ serious about endpoint security we'd be pushing business users towards operating systems with more modern security architectures, like Android, iOS or Chrome OS. That would be a lot of work though due to the fact that most legacy software is not compatible with those systems.

louthy · 2024-07-22T10:06:16.000000Z

What makes you think Linux can't have a virus or malware? It has attack vectors like any piece of software, surely?

Spivak · 2024-07-22T16:58:17.000000Z

> People drink the kool-aid and believe that Linux needs antivirus

No we don't we just have compliance checklists to get through.

ChrisMarshallNY · 2024-07-22T12:11:05.000000Z

Oh dear.

I would really hate to be anyone on the team responsible for the Falcon Sensor, right now. I suspect that their printer is churning out CVs, like nobody's business.

> and will update this story if we receive substantial information.

I'll lay odds that there's folks at Crowdstrike that are thinking of just responding with poop emojis.

chucke1992 · 2024-07-22T08:23:37.000000Z

I mean fundamentally if Linux was used in the same use cases as Windows, it would have more issues like this. After all the scenarios of Windows are based on real world use cases.

notepad0x90 · 2024-07-22T06:27:24.000000Z

I'd like to see a proper journalistic investigation into every other EDR sensor on Linux as well. I really hate it when supposed journalists look for an angle and pursue it without critical thinking.

But they could be right, they may have an issue in their engineering department recently.

---

Speaking of, I wanted to mention a slightly related observation I've had recently scrolling through twitter (unfortunately) looking for information on this crisis. There are a lot of people who at least know enough technical jargon to probably work in IT or technology but they're using arguments like "it must be because of a DEI hire", for those who are unaware they're using "DEI" as a replacement for a hard-r N-word. In other words, I just learned that blatant racists are not a rarity in our corner of the world. If you're not one of them, I wanted to inform you of what they really mean. Fortunately I haven't seen this on HN so far :)

rsynnott · 2024-07-22T07:05:06.000000Z

> There are a lot of people who at least know enough technical jargon to probably work in IT or technology but they're using arguments like "it must be because of a DEI hire"

There is a certain type of tech person (well, they probably exist elsewhere, but the tech variety is particularly noisy) who seems absolutely determined to use this one every time a company does something stupid; there seems to be an odd unwillingness to blame _process_, rather than some sort of imagined individual saboteur (who would, preferably, in the minds of these people, be someone other than a straight white man).

A particularly extreme example, to the point that it almost read as parody; when the door plug blew out of that 737, there was a certain amount of fixation by the weirdos on Twitter on how the pilot was a woman. Quite how this was supposed to have anything to do with it was unclear.

The whole thing is pretty weird, and feels quite new. Sometimes, a poorly-run company is just a poorly-run company.

notepad0x90 · 2024-07-22T07:12:24.000000Z

You're right, I'm actually surprised my post is getting negative downvotes, it was only informative in my opinion. But perhaps the people I'm talking about are also on HN they're just silent on this topic due to efficient moderation.

Imagine actually being a minority, woman or any group like that and making a mistake. sucks.

thworp · 2024-07-22T10:23:10.000000Z

> it was only informative in my opinion

Was it though? Your second paragraph is written in a way that it can easily be interpreted as "all critics of DEI hiring are racists". Is that a more sophisticated statement than "all minorities are incompetent"? Is being a racist the only reason someone might be opposed to quota-based hiring?

notepad0x90 · 2024-07-22T11:43:10.000000Z

yes, it was. I was informing you that all critics of DEI outside of an actual discussion thread about DEI are racists and "DEI" in that context is equivalent to the n-word. I hope that clarifies my intent better.

To clarify, I'm not too keen on DEI either, but we're not really talking about DEI here are we now?

shiroiushi · 2024-07-22T08:28:41.000000Z

>The whole thing is pretty weird, and feels quite new.

It's new because, 10-20 years ago in America at least, it finally became extremely unfashionable to utter blatantly racist stuff in public, to the point where it cost you your friends and maybe your job, depending on who heard it. But then Trump happened, and casual racism became OK again among half the population, like the 50s have returned.

EnigmaFlare · 2024-07-22T10:08:33.000000Z

Don't be so quick to assume racism. You don't see people blaming Microsoft's cock-ups on Indians who are not DEI hires but would be a valid target for racists. I think blaming DEI hires for problems is mostly people who don't like the idea of hiring based on race/gender/etc. and who also realize that if you do that hard enough, you're bound to get inferior people because you're limiting your hiring pool. Of course companies that pay low salaries are also limiting their hiring pool and also bound to get inferior people. There are plenty of complaints about that too - if a famously low-paying big consultancy company (IBM?) has a cock-up, it's often blamed on their low pay.

SoftTalker · 2024-07-22T14:56:04.000000Z

"Must be a DEI hire" is also a zero-effort meme response that will reliably get upvoted by at least some subset of followers/readers. A lot of the time it's no more than that.

notepad0x90 · 2024-07-22T16:13:45.000000Z

What upset me was not the trolling, they say the actual hard-r n-word all the time on twitter with no consequence. It is them actually going into technical details and then using that slur.

regularfry · 2024-07-22T10:27:30.000000Z

> I think blaming DEI hires for problems is mostly people who don't like the idea of hiring based on race/gender/etc. and who also realize that if you do that hard enough, you're bound to get inferior people because you're limiting your hiring pool.

Wherever I've seen diversity initiatives, the point is to expand the hiring pool, not shrink it. In other words this is people who, for whatever reason, want to portray the situation as precisely the opposite of what it is.

plesner · 2024-07-22T10:44:05.000000Z

> I think blaming DEI hires for problems is mostly people who don't like the idea of hiring based on race/gender/etc. and who also realize that if you do that hard enough, you're bound to get inferior people because you're limiting your hiring pool.

Isn't that exactly the problem though: hiring is currently based on race/gender in favor of white/male hard enough that you get inferior people from that hiring pool?

notepad0x90 · 2024-07-22T11:44:55.000000Z

no, not in tech companies and certainly not at crowdstrike. lookup people who work there and you'll see. In cybersecurity, the talent pool is so small you can't even pick and choose like that even if you wanted to.

notepad0x90 · 2024-07-22T16:12:29.000000Z

I disagree, it is 100% racism. See, you're talking about DEI and its merits instead of how in this context there is no reason to even bring up DEI. The point of me mentioning this is for people like you who think they just disagree with DEI, that isn't the case. The far right is using "DEI" to mean n-word just like they did with the whole "woke" thing. Look at Kamala Harris, and how they're calling her a DEI VP/nominee or the USSS chief being a woman and them saying it's because of "DEI", and look at the post history of the people that are saying this and then tell me they are not racists substituting n-word or any other "you know what I really mean" slurs with "DEI".

Imagine if any minority or woman was involved in any crowdstrike team that had anything to do with this outage and consider how the DEI debate is being raised in that context. If those people are hired, does it make them a DEI hire by default? of course not! that is silly, that's not even what DEI is. It is a slur in this context, nothing short of it.

I wouldn't be saying this if DEI was in any way relevant to the topic at hand, then we can discuss how DEI was to blame. Honestly, I don't even agree with the corporate approach of DEI, but bringing it up in this context would indeed be racism. Imagine making a mistake and someone say "this is what you get for allowing white privilege" is that fair? is that not racism? that's what's happening here. Everyone deserves fair treatment.

EnigmaFlare · 2024-07-23T01:17:22.000000Z

We must hang out in different places. Are you seeing all that racism directly or has it been curated and presented to you by leftists to show off how racist rightists are? That's a big problem with social media - highlighting the extremes of the other side which most members of the other side don't even know about.

notepad0x90 · 2024-07-23T06:02:27.000000Z

This is specifically an alt-right thing. Twitter has been taken over by them thanks to Elon, who is also alt-right, so you'll see this on there.

jordanb · 2024-07-22T10:33:49.000000Z

> Speaking of, I wanted to mention a slightly related observation I've had recently scrolling through twitter (unfortunately) looking for information on this crisis. There are a lot of people who at least know enough technical jargon to probably work in IT or technology but they're using arguments like "it must be because of a DEI hire",

It could very easily be one highly technical or just highly resourced individual with a bot farm.

Rinzler89 · 2024-07-22T06:47:28.000000Z

>I just learned that blatant racists are not a rarity in our corner of the world.

That's what you get if you browse Twitter. Stay off mainstream Twitter for your own sanity.

> Fortunately I haven't seen this on HN so far :)

Here people just blame offshore workers for this bug or other such critical bugs, as if US workers don't make mistakes. People's egos are just unbelievable. If it's not DEI developers, then it must be those filthy foreign programmers from developing nations who are responsible for poor-quality software, and I've even seen it mentioned here that Windows' worldwide dominance can also be blamed on SW devs from developing nations for being too poor to own Macs. I despise this "holier than thou" mentality of some privileged tech workers.

shiroiushi · 2024-07-22T08:32:03.000000Z

>Windows' worldwide dominance can also be blamed on SW devs from developing nations for being too poor to own Macs.

Linux has been a viable alternative for decades now for many tasks. This problem wasn't caused by a lack of money. If you want to blame someone for the dominance of Windows, blame corporations and managers, because they're the ones that have chosen it. SW devs from developing nations have only been doing what they perceived to be in their best economic interest.

notepad0x90 · 2024-07-22T11:47:03.000000Z

I'm sorry to say that while Linux is viable for highly skilled people, managing it as you would an AD-joined Windows workstation or server is not as easy. Finding people to run Linux is also much harder.

shiroiushi · 2024-07-22T23:56:43.000000Z

Again, this is a decision by the managers and corporations, which is my whole point. They could have chosen to use Linux despite these difficulties. You can argue all you want about how much easier Windows was to administer, but ultimately it was a choice by the corporate managers to use it. Linux had its own advantages (cost, not having to worry about all the MS viruses of the 2000s, etc.) too.

notepad0x90 · 2024-07-23T06:03:56.000000Z

If you can't afford something, for you it does not exist. Linux does not exist for those managers and corporations for the specific use cases we're talking about. If you simply can't even get an applicant for a linux admin job in your hiring pipeline, what is the point of even pretending linux exists?

shiroiushi · 2024-07-23T06:20:22.000000Z

You still haven't invalidated my point. The managers chose Windows, instead of choosing something else, like Linux, Mac, Sun, SGI, Cray, etc., for various reasons (possibly including cost, availability of qualified employees, etc.).

They did not choose Windows because a bunch of devs from India were too poor to buy Macs, which was the original assertion.

notepad0x90 · 2024-07-23T11:35:38.000000Z

I don't agree with the original assertion either. Yes, it was for various reasons; my point was that it is not their responsibility to choose non-Windows, but rather the responsibility of Linux, Mac, Sun, etc. to make their product acceptable to the market.

Rinzler89 · 2024-07-22T09:21:10.000000Z

>Linux has been a viable alternative for decades now for many tasks.

For decades?! Maybe on the servers, but on PCs, hardly. Also, even if that may be the case now, it doesn't change the fact that at your job they'll most likely use Windows not Linux on the workstations. Linux is king on the servers, but PCs everywhere will mostly still be Windows, especially in corporate environments.

shiroiushi · 2024-07-22T09:51:43.000000Z

>For decades?! Maybe on the servers, but on PCs, hardly.

While it wasn't as easy as it is today, I've been running Linux on my home desktop since around 1999. It's never been that hard, and it did require more careful hardware selection (no, you can't just grab some random dirt-cheap piece-of-shit "winprinter" and expect it to work with Linux), but it's always been quite doable for anyone who claims to be skilled with computers. We're talking about IT workers here, not grandma.

>at your job they'll most likely use Windows not Linux on the workstations

This is exactly my point in my prior post.

Rinzler89 · 2024-07-22T10:51:59.000000Z

>I've been running Linux on my home desktop since around 1999

Personal anecdotes are not statistics or cases representative of the average user or business. What others do with their systems and their requirements and apps could be very different than yours. You think if Linux was that usable at everything in 1999 companies and individuals wouldn't have loved to use that instead of paying thousands of dollars to Microsoft?

Just because you could set it up and use it in 1999 doesn't mean it was the norm. Some people know how to change their own oil while most don't and don't care to since they prefer to pay someone else to do it as they have other hobbies than learning to tinker with their car. Similarly some people like you like to tinker and find out how to get Linux to work in 1999 while most prefer to just pay to use Windows NT/MacOS and get to work.

krzyk · 2024-07-22T12:14:49.000000Z

> You think if Linux was that usable at everything in 1999 companies and individuals wouldn't have loved to use that instead of paying thousands of dollars to Microsoft?

Inertia is a strong force in corporations. I wouldn't count on reason there; inertia trumps it.

Personal anecdotes is something that shows it can be done. In few corps I worked since 2004 I was also able to switch windows to linux (as many other developers there) and we didn't lose functionality. But I get it that for people who work mostly in Excel it would be a blocker, as it doesn't have a Linux version - so not all work could be done on Linux. But having options is good.

My current corp decided to give people a choice (after a decade of asking for it) and for the past 2 years we have been able to choose between Windows, macOS and Linux.

(I'm still amazed that most developers chose macOS, as it is less power-user friendly than Windows). Before, I was the only one with Linux; now there are > 20% of us. And possibly more in the future when hardware is upgraded.

SoftTalker · 2024-07-22T15:04:10.000000Z

The cost of a Windows license as a fraction of what it costs to staff a position in a corporate environment is not enough that it gets noticed or worried about. The costs of using Linux in terms of not being able to use standard software, not being able to hire administrators, having to train users who are unfamiliar with it, etc. just dwarf any savings in license costs.

Rinzler89 · 2024-07-22T13:37:37.000000Z

>Personal anecdotes is something that shows it can be done.

Some people have built their own car from scratch in their garage that they use to drive to work, but it's unrealistic to expect most people to do that, even though someone proved it's possible. By the same logic, why aren't you building your own car to daily drive instead of paying Ford/Toyota? Someone proved it's possible.

> In few corps I worked since 2004 I was also able to switch windows to linux

The vast majority of Windows/MacOS users are not SW developers, nor do they have any deeper interest in tinkering with computers and learning Linux. They're content with what they're already familiar with.

You keep taking highly niche technical cases from your SW dev bubble and trying to extrapolate that experience as being mainstream when you're far from it. The photo studio or flower shop down the road in 2004 was no way gonna switch to Linux even if it was technically possible.

shiroiushi · 2024-07-23T00:00:20.000000Z

>You keep taking highly niche technical cases from your SW dev bubble and trying to extrapolate that experience as being mainstream

Why is it so hard for you to understand that we're not talking about average users here?

shiroiushi · 2024-07-22T23:59:05.000000Z

>Similarly some people like you like to tinker and find out how to get Linux to work in 1999 while most prefer to just pay to use Windows NT/MacOS and get to work.

We're not talking about regular people here. Try reading the prior messages for context before commenting. We're talking about people who claim to be IT professionals. If they were really as smart as they claimed, they could have done the same thing I did easily. They didn't, not because they loved Windows that much (maybe they did, maybe they didn't, it's irrelevant), but because it was seen as essential for their career.

protocolture · 2024-07-22T08:27:12.000000Z

It's honestly hard to hire women anyway. In my experience it doesn't matter if you find an extremely talented person; the prejudices of the HR team often lead to them being pulled from the shortlist.

bdjsiqoocwk · 2024-07-22T06:36:58.000000Z

> it must be because of a DEI hire

Of course, white people making mistakes is unheard of.

Yeah it's quite frustrating how easily people latch on to the daily meme if it reinforces their preexisting prejudices.

Republicans calling Kamala Harris the DEI candidate in 3, 2, 1...

notepad0x90 · 2024-07-22T07:01:28.000000Z

For the record, crowdstrike isn't the type of company that does DEI hiring. For technical roles, they let technical people from the team you're interviewing for grill you and it is their opinion that decides the outcome of first round interviews.

rsynnott · 2024-07-22T09:37:51.000000Z

I'm not sure what people think inclusive hiring _is_. In almost all cases, it's pretty much entirely about the _pipeline_, trying to attract a more diverse set of people to interview in the first place. There's generally nothing special about the interviews.

bdjsiqoocwk · 2024-07-22T21:27:43.000000Z

> I'm not sure what people think inclusive hiring _is_

I bet you a lot of people think DEI is companies go "you're black, you're hired". Instead you're absolutely right, it's manipulating the pipeline so that 50% are the overall best and 50% are diverse. The "diverse" ones also have to do great in their interviews, no one is automatically hired.

rsynnott · 2024-07-23T06:08:31.000000Z

> it's manipulating the pipeline so that 50% are the overall best and 50% are diverse

Not even that; it’s changing the pipeline so that it has a lower weighting on upper-middle-class white men who went to one of about four universities (which does not actually equate with ‘overall best’). One surprisingly simple thing that companies can do is be realistic about their requirements in the job spec. There’s a fair bit of evidence that if you have an aspirational job spec, then some candidates will go “well, I meet some of that” and apply, and others will go “well, I only have five years of experience in [whatever], not six like it says in the job spec” and won’t apply. And group 2 tends to be less male, less white, and more working-class.

wongogue · 2024-07-22T07:21:18.000000Z

As for your last point, there is already a long thread on the Biden post from yesterday. According to them, it somehow makes both sides the same.

formerly_proven · 2024-07-22T06:58:35.000000Z

That stuff is absolutely on HN.

notepad0x90 · 2024-07-22T07:02:41.000000Z

Well, I just don't spend enough time on HN then :)

I'm curious about what you mean though, if you have any sample threads.

TacticalCoder · 2024-07-22T09:21:37.000000Z

> CrowdStrike's Falcon Sensor also linked to Linux kernel panics and crashes

How many Linux machines did crash a few days ago? How many Windows machines did crash a few days ago? Case closed.

ranguna · 2024-07-22T09:55:38.000000Z

This is not a competition or a case.

rramadass · 2024-07-22T05:47:38.000000Z

I had read reports of this earlier, which is what makes me speculate that the Windows Crowdstrike issue is more than "just an update error", i.e. there might be some nefarious hand behind this. Given that they were already aware of the Linux issue it boggles my mind that they did not take extra precautions when it came to Windows updates. We will have to wait and see for further trustworthy info.

Btw - The article mentions Dave Plummer's analysis of the issue which might be easier for people to understand and worth a watch. - https://www.youtube.com/watch?v=wAzEJxOo1ts

rsynnott · 2024-07-22T07:09:30.000000Z

I mean, what’s more likely, realistically? Shadowy saboteurs, or a cybersecurity company being poorly run, like, well, all other cybersecurity companies ever?

Like, this is not new. They, as an industry, have been a byword for shoddy nonsense for literally decades.

michaelt · 2024-07-22T10:02:42.000000Z

I mean, hypothetically you might think computer security companies would be full of passionate computer security enthusiasts.

And as security tools break a lot of security norms - like sandboxing, least privilege, and running in userspace - you might think such enthusiasts would make sure they were coded with the utmost care. That this team of secure coding all-stars would be code reviewing, managing scope, fuzz testing, static analysing, formally validating and suchlike, as befits code running with the highest privilege levels.

Surely huge multinational corporations wouldn't grant unlimited privileges to kernel modules written by clowns.... would they?

If you believe the crowdstrike marketing, I can see how you might think shadowy saboteurs are the only plausible explanation.

rsynnott · 2024-07-22T10:24:39.000000Z

> I mean, hypothetically you might think computer security companies would be full of passionate computer security enthusiasts.

Eh, I mean, you might think that, absent any other information about the industry, but they're largely not.

Muromec · 2024-07-22T08:52:47.000000Z

Indeed, the issue could have been planted there by some nefarious occult hand

bawolff · 2024-07-22T06:01:17.000000Z

Some part of a company already aware of an issue but different part still ships is a pretty common tale and seems much more likely than some nefarious conspiracy theory. (And that is even assuming this is the same issue, which seems questionable)

After all, who exactly would benefit from such a nefarious scheme to crash windows computers? Certainly not Crowdstrike.

rramadass · 2024-07-22T06:40:09.000000Z

> who exactly would benefit from such a nefarious scheme

State Actors, given the current Geo-Political tensions.

You have to take an all-in-all broader view. I remember a while ago Kaspersky was accused of data-siphoning/spying from computers it was installed on and other nefarious activities. See New Government Ban on Kaspersky Would Prevent Company from Updating Malware Signatures in U.S. - https://www.zetter-zeroday.com/new-government-ban-on-kaspers...

As for your opening statement "Some part of a company already aware of an issue but different part still ships is a pretty common tale" is not applicable here since this code runs in kernel mode (in both OSes) and thus would be subject to far far greater scrutiny and testing than an ordinary app. As Dave Plummer points out in his analysis, Microsoft Kernel Drivers are signed and certified after an exhaustive testing process. Even if Crowdstrike wrote their drivers as an interpreter and the data update files were actually programs in some p-code, Microsoft would have definitely known of it and its inherent vulnerabilities. I would bet money that Microsoft knows more about preventing threats/vulnerabilities than any other company simply because of their long experience and large userbase and thus would not have allowed Crowdstrike such a free hand.

krisoft · 2024-07-22T08:14:31.000000Z

> State Actors, given the current Geo-Political tensions.

I love a good conspiracy just like anyone. And i certainly hope the relevant authorities will take a good, deep look at what knocked over the dominoes CrowdStrike set up in a line. But i just don’t see how those state actors would benefit from this. There is damage, both financial and humans harmed, but is that the best a state actor could do? I would have thought they would sync such an action with other measures for maximum impact.

> You have to take an all-in-all broader view.

That is always wise. Can you tell us more? In particular could you spell out how the Kaspersky ban factors in here in your opinion?

> As for your opening statement "Some part of a company already aware of an issue but different part still ships is a pretty common tale" is not applicable here since this code runs in kernel mode (in both OSes) and thus would be subject to far far greater scrutiny and testing than an ordinary app.

Are you saying that this scrutiny is somehow enough to overcome companies' natural tendency to be disparate and unorganised?

rramadass · 2024-07-22T10:18:39.000000Z

> But i just don’t see how those state actors would benefit from this.

You should never stop at obvious/superficial explanations but look at all scenarios (i.e. Game Theory probabilities) including "false flag" operations. Eg. a) What might have happened elsewhere when the world's attention was focused on this one incident? Did we miss something of greater importance? b) Was this a dry run/false flag to get businesses to tighten their cyber defences because somebody knows something about what might be forthcoming? c) The Russia/Ukraine war seems to be entering a critical phase with increasing incidents across NATO countries; see https://edition.cnn.com/2024/07/10/europe/russia-shadow-war-... etc. etc. At the minimum there has already been billions in damage and counting; one Australian report - https://www.youtube.com/watch?v=YedowOtznNo

> how the Kaspersky ban factors in here in your opinion?

Because this is very recent news; see https://www.zetter-zeroday.com/kaspersky-lab-closing-u-s-div... Is somebody flexing their attack capabilities just to demonstrate they can do it without Kaspersky? Also the US govt. has specifically banned "updating of malware signatures" in Kaspersky software which was exactly the vector used with Crowdstrike.

> Are you saying that this scrutiny is somehow enough to overcome companies' natural tendency to be disparate and unorganised?

Yes. Companies do not treat kernel mode code with the same laissez-faire attitude that they might take with user mode apps. In particular, Microsoft has the most experience with this given their long history/evolution/problems and sheer number of installations. That they would allow some third-party software to bypass their testing/certifications is unbelievable to me. I am sure they would have also done some formal verifications on this as well. Remember Crowdstrike was meant to help prevent zero-day vulnerabilities and hence they would have looked at it closely.

When certain things happen at a global scale, you have to take a global view, factor in parameters like Geopolitical tensions, Economic advantages/disadvantages, Propaganda, etc. and simulate all possible scenarios one by one w.r.t. all parameters.

Remember Clausewitz, “War is not merely a political act but a real political instrument, a continuation of political intercourse, a carrying out of the same by other means”.

Also Sun Tzu, “All warfare is based on deception. Hence, when we are able to attack, we must seem unable; when using our forces, we must appear inactive; when we are near, we must make the enemy believe we are far away; when far away, we must make him believe we are near.”

Finally, you might find the classic Deception - The Invisible War Between the KGB and the CIA by Edward Jay Epstein very relevant here - https://archive.org/details/Deception-TheInvisibleWarBetween... "Deception" is the foundation for everything and "Asymmetric Warfare (Cyber and others)" is the name of the game today.

bawolff · 2024-07-22T19:44:06.000000Z

> State Actors, given the current Geo-Political tensions.

I disagree. If state actors had this type of capability they would use it to spy on big companies. The espionage potential is huge. They wouldn't waste it on causing a minor inconvenience.

> is not applicable here since this code runs in kernel mode (in both OSes) and thus would be subject to far far greater scrutiny and testing than an ordinary app

Lol. What next? Politicians always tell the truth? Everyone gets a free unicorn? This is just obviously not how the world works. There is a long history of anti-virus software being kind of crap.

rramadass · 2024-07-22T22:04:33.000000Z

> If state actors had this type of capability they would use it to spy on big companies.

Who says that is not ongoing? You just don't hear about it that much because the companies downplay/hide it for obvious reasons.

> They wouldn't waste it on causing a minor inconvenience.

This is not a "minor" inconvenience. The losses to the Economy are already running into billions and counting. See for example https://www.youtube.com/watch?v=YedowOtznNo

> Lol. What next? Politicians always tell the truth? Everyone gets a free unicorn? This is just obviously not how the world works. There is a long history of anti-virus software being kind of crap.

Snark/Glibness is not an argument. I have worked in Network Security and know for a fact that Kernel mode code is treated differently than User mode code in terms of scrutiny/testing/staging/release. Second, Crowdstrike is not just another anti-virus software; they are far broader in scope and more complex, hence their wide user base. Microsoft, with their wide experience, would definitely have processes in place to validate them comprehensively. Hence one should be cautious in taking this incident at face value and investigate everything thoroughly. I am almost sure multiple lawsuits are in the offing but am not so sure whether the full story will come out.

logicchains · 2024-07-22T06:29:28.000000Z

>After all, who exactly would benefit from such a nefarious scheme to crash windows computers?

Russia or China would certainly benefit from the ability to do this at a time of their choosing, and it's possible they could have an agent inside Crowdstrike, especially given China's history of industrial espionage.

bawolff · 2024-07-22T19:37:21.000000Z

You think Russia or China would benefit more from this than from having a stealthy rootkit on basically every important computer in the world?

I think Russia or China are probably the least likely perpetrators possible. Their incentives are strongly misaligned with this.

tgv · 2024-07-22T06:09:13.000000Z

Wrt trying to get something out of the crash: https://www.theregister.com/2024/07/19/cyber_criminals_quick.... The conspiracy-minded could be suspicious of the quick response to this outage. "They must have known in advance!" Or they could suspect an overeager account manager at Crowdstrike, who wanted to show how important the product is.

But indeed, this really sounds like it was an internal error.