Yeah, me too, I found that pretty shocking actually. So shocking that I basically didn't believe it, so I tried it myself (in C++, but it doesn't really make a difference). With 2000 threads, the computer had no problem whatsoever: the process took only around 16 megabytes of memory and not much CPU.
So I bumped it up to 20,000, thinking that would probably work as well: the computer immediately crashed and rebooted. I didn't even get the "kernel panic" screen; it just died (this is on a 2018 Mac mini with a 3.2 GHz Core i7). When it turned back on, this was the first line of the error message:
Panic(CPU 0, time 1921358616513346): NMIPI for unresponsive processor: TLB flush timeout, TLB state:0x0
Weird. I really thought this wouldn't be an issue at all, and that if it was, the OS would be able to handle it and kill the process or something, not kernel panic.
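Roughly, the test program looks like this (a minimal sketch, not my exact code; the 100 ms sleep loop is an assumption based on the example discussed elsewhere in the thread):

    #include <chrono>
    #include <thread>
    #include <vector>

    int main() {
        // 2,000 was harmless in my tests; 20,000 is what killed the
        // machine, so raise this with care.
        constexpr int kThreads = 2000;

        std::vector<std::thread> threads;
        threads.reserve(kThreads);
        for (int i = 0; i < kThreads; ++i) {
            // Each thread just sleeps in a loop, so the load is pure
            // scheduling/bookkeeping overhead rather than real work.
            threads.emplace_back([] {
                for (;;) {
                    std::this_thread::sleep_for(std::chrono::milliseconds(100));
                }
            });
        }
        for (auto& t : threads) {
            t.join();  // blocks forever; kill the process to end the test
        }
    }

Build with something like "c++ -std=c++17 -pthread test.cpp" and watch memory and CPU in Activity Monitor or top.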
Yeah, it's pretty worrying that the OS just punts. I always thought limiting the potential damage a user-space process can do was one of the main jobs of an OS.
If you have any more OSes lying around to run the test on, I'd be interested to hear how well Windows and Linux handle the same thing.
Because on the face of it, this seems like a serious bug in the OS. I'm only used to seeing this sort of thing with bad drivers.
2000 threads does nothing - everything's still responsive, and the process is shown as using 0% of the CPU.
16,000 threads uses ~30% of a core, with a ~136MB RSS. The system still handles it fine, though, and everything stays responsive.
At 20,000 the program panics when spawning threads, with the message "failed to set up alternative stack guard page" due to EWOULDBLOCK. I'm not sure exactly what limit it's hitting, though.
Sounds like it's having trouble allocating memory for the stack and stack guard. Whatever limit it's hitting, though, Linux seems to handle it correctly: the process dies instead of the kernel panicking.
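If you want to check which knob you're hitting, these are the usual suspects on Linux (a quick sketch; which limit is actually responsible here is a guess on my part):

    #include <sys/resource.h>

    #include <cstdio>
    #include <fstream>
    #include <string>

    // Print a /proc tunable, if readable.
    static void show(const char* path) {
        std::ifstream f(path);
        std::string value;
        if (std::getline(f, value)) {
            std::printf("%-32s %s\n", path, value.c_str());
        }
    }

    int main() {
        // Per-user limit on tasks/threads; hitting it makes thread
        // creation fail with EAGAIN (== EWOULDBLOCK on Linux).
        rlimit rl{};
        if (getrlimit(RLIMIT_NPROC, &rl) == 0) {
            std::printf("%-32s %llu\n", "RLIMIT_NPROC",
                        (unsigned long long)rl.rlim_cur);
        }
        show("/proc/sys/kernel/threads-max");  // system-wide thread cap
        show("/proc/sys/vm/max_map_count");    // caps mappings (each stack needs a guard page)
    }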
It's probably about system resources - VM reservations for each stack and heap, etc. There aren't many checks inside kernel thread-creation code, and not a lot it can do if anything fails. My friend Mike Rowe put it this way: it's like having an altimeter on your car, so that if you go off a cliff, you know how far it is to the ground. When hard limits on system resources are exhausted, it can be very hard to write code that recovers. Imagine if the recovery code for a failed kernel call needed to create a thread to handle the exception!
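To make the per-stack reservation concrete: every pthread carries its own stack mapping, and you can inspect and change its size through the attribute object (the 64 KiB below is an arbitrary illustrative choice, not a recommendation):

    #include <pthread.h>

    #include <cstddef>
    #include <cstdio>

    static void* body(void*) { return nullptr; }  // trivial thread body

    int main() {
        pthread_attr_t attr;
        pthread_attr_init(&attr);

        // How much address space does each thread reserve by default?
        size_t def = 0;
        pthread_attr_getstacksize(&attr, &def);
        std::printf("default stack reservation: %zu bytes\n", def);

        // Shrinking the reservation lets far more threads fit in the
        // same address-space and map-count budget.
        pthread_attr_setstacksize(&attr, 64 * 1024);

        pthread_t t;
        if (pthread_create(&t, &attr, body, nullptr) == 0) {
            pthread_join(t, nullptr);
        }
        pthread_attr_destroy(&attr);
    }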
I believe this is related to the operating system being overwhelmed by waking up 2k threads every 100 ms. It's not a great example, though: depending on the OS and CPU, you should be able to run a much higher number of threads.
I'm not too surprised. macOS isn't tuned out of the box for high loads, and in some areas really can't be (there's no SYN flood protection, for example, since they forked FreeBSD's TCP stack months before that was added).
Not that 2k threads is really that high, but it's probably high enough to break something.
If you hit an explicit limit, I'd expect the thread spawn to fail, and most processes to panic if thread spawning failed, sure.
But if you run below the explicit limit yet above the implicit limit of whatever actually continues to work, it's not surprising to me that the OS just freaked out.
You could report it to Apple, but their reporting process is pretty opaque, especially if you're not a registered developer (and why would you be, if you're just using macOS because you like the desktop environment, or whatever). Who knows if they'll fix it, but it's not worth making a big deal over: you weren't really wanting to run 2k threads on macOS anyway, because it would suck even if it did work.
From the other message in the thread, it looks like too many threads caused a watchdog timeout to trip, leading to the panic.
I mean, I would expect most OSes to handle it today. If you run a couple thousand threads on Windows 10, or Android 10, or a recent Linux or BSD, it's probably fine (ish).
But probably not on Windows 9x, and maybe not even on NT4 (although NT4 was pretty solid), and I wouldn't expect good results on Linux or a BSD from 2000 either.
But macOS has lowered my expectations. It's a mash of FreeBSD from the turn of the century, Mach from earlier, whatever stuff they've fiddled with since then, plus a nice UI layer. They don't regularly pull in updates from FreeBSD, and they killed their server line (which was mediocre at best anyway), so when it panics because you did something weird, it's not unexpected.
A quick stab on my Linux laptop has it hitting at most 25% CPU utilisation and consuming almost zero memory. It seems really odd that this would nuke a Mac somehow.