On N-core, 2-way SMT hardware (which describes pretty much all consumer hardware), the maximum number of threads that can be usefully doing work in parallel is between N and 2 * N. Any more threads than that, and it must be the case that you expect those threads to spend a lot of time doing nothing to justify the cost of their creation.
There are two main categories that can justify that. The first is threads that spend a lot of time stuck in blocking I/O, i.e., they're calling things that exist as yield points in practical kernels anyway. The second is some sort of event processing loop, which looks kind of like this:
while (!shutdown) {
    event_t *e = get_next_event(queue);       /* NULL if the queue is empty */
    if (!e) { sleep_until_work(queue); }      /* block until someone enqueues */
    else { execute(e); }
}
The call in sleep_until_work already uses a syscall today to indicate that the thread shouldn't be scheduled. But even get_next_event can likely be trivially modified to use a syscall to add a yield point. Since multiple threads are able to enqueue events (else who would fill in work while you're sleeping?), you need some sort of threading library support to implement get_next_event correctly. Change your OS's equivalent of pthread_cond_wait to introduce a yield point, and you'll have introduced yield points to the vast majority of applications.
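For what it's worth, here is a rough sketch of that kind of loop on top of POSIX threads, assuming a small fixed-size ring buffer. Only event_t comes from the loop above; queue_t, enqueue, wait_next_event, request_shutdown, and run_demo are names I made up for illustration, and wait_next_event folds get_next_event and sleep_until_work into one call. The point is just that the consumer's idle time is spent inside pthread_cond_wait, which is where a cooperative yield point would naturally live.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 64

/* Hypothetical event type; the loop above only names event_t. */
typedef struct { int payload; } event_t;

typedef struct {
    event_t *items[QUEUE_CAP];
    size_t head, tail;        /* monotonically increasing ring indices */
    bool shutdown;
    int executed;             /* written only by the worker thread */
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
} queue_t;

static void queue_init(queue_t *q) {
    q->head = q->tail = 0;
    q->shutdown = false;
    q->executed = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
}

/* Producer side: any thread may call this. No overflow check in this sketch. */
static void enqueue(queue_t *q, event_t *e) {
    pthread_mutex_lock(&q->lock);
    q->items[q->tail++ % QUEUE_CAP] = e;
    pthread_cond_signal(&q->nonempty);   /* wake a sleeping consumer */
    pthread_mutex_unlock(&q->lock);
}

static void request_shutdown(queue_t *q) {
    pthread_mutex_lock(&q->lock);
    q->shutdown = true;
    pthread_cond_broadcast(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
}

/* Consumer side: sleeps in pthread_cond_wait while the queue is empty.
 * Returns NULL only once shutdown is requested and the queue is drained. */
static event_t *wait_next_event(queue_t *q) {
    event_t *e = NULL;
    pthread_mutex_lock(&q->lock);
    while (q->head == q->tail && !q->shutdown)
        pthread_cond_wait(&q->nonempty, &q->lock);  /* the cooperative sleep point */
    if (q->head != q->tail)
        e = q->items[q->head++ % QUEUE_CAP];
    pthread_mutex_unlock(&q->lock);
    return e;
}

static void *worker(void *arg) {
    queue_t *q = arg;
    event_t *e;
    while ((e = wait_next_event(q)) != NULL)
        q->executed += e->payload;       /* stands in for execute(e) */
    return NULL;
}

/* Enqueue n events (capped at QUEUE_CAP) and return how many the worker ran. */
int run_demo(int n) {
    event_t events[QUEUE_CAP];
    queue_t q;
    pthread_t t;
    if (n > QUEUE_CAP) n = QUEUE_CAP;
    queue_init(&q);
    int rc = pthread_create(&t, NULL, worker, &q);
    assert(rc == 0);
    (void)rc;
    for (int i = 0; i < n; i++) {
        events[i].payload = 1;
        enqueue(&q, &events[i]);
    }
    request_shutdown(&q);
    pthread_join(t, NULL);               /* join makes q.executed safe to read */
    pthread_cond_destroy(&q.nonempty);
    pthread_mutex_destroy(&q.lock);
    return q.executed;
}
```

Calling run_demo(5) from a main function should hand five events to the worker and return once they are all processed. Note the worker never spins: whenever the queue is empty it is parked in the condition variable wait, which is the existing cooperative facility I'm talking about.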
Most programs hopefully do the sensible thing. But the mere possibility of one browser tab spawning 8 web workers that don't do any I/O but just busy-wait makes it clear that current implementations of common software don't play well with that model.
It might not lock up your OS, but it would lock up all of userland, and the kernel couldn't do anything about it because it can't pre-empt. All it could do under that model is let the user kill the browser, or kill the browser itself automatically.
I think most people would prefer being able to use their PC for other things while transcoding a video or compiling a program at near the PC's full capability, instead of sacrificing a whole core to the OS, waiting longer for those tasks to finish, and not being able to do anything else in the meantime.
A model with less pre-emption is certainly possible, but current end-user software makes an approach without any pre-emption very inadvisable.
It is true that it's possible to screw your machine over in that manner. I was pointing out that the large number of threads wasn't evidence in support of that fact, but rather the contrary: those threads must already be using existing facilities that provide cooperative sleeping points.