> The point is that non-blocking IO wants to abstract away the hardware, but the abstraction is leaky.

Why do you say it doesn't match hardware? Basically all hardware is asynchronous — submit a request, get a completion interrupt, completion context has some success or failure status. Non-blocking IO is fundamentally a good fit for hardware. It's blocking IO that is a poor abstraction for hardware.
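
That submit/complete shape shows up directly in the data structures: a device ring is basically an array of request descriptors plus an array of completion records. Something like the following, loosely NVMe-flavored, with field names invented purely for illustration (real NVMe/virtio descriptors differ in detail):

    #include <stdint.h>

    /* Illustrative only; not a real device's layout. */
    struct sq_entry {            /* submission: what we want done        */
        uint8_t  opcode;         /* e.g. read or write                   */
        uint64_t lba;            /* starting block                       */
        uint32_t nblocks;        /* how many blocks                      */
        uint64_t buf_addr;       /* DMA address of the data buffer       */
        uint16_t command_id;     /* echoed back in the completion        */
    };

    struct cq_entry {            /* completion: how it went              */
        uint16_t command_id;     /* matches the submission               */
        uint16_t status;         /* success or an error code             */
    };
    /* The driver writes an sq_entry, rings a doorbell register, and the
       device raises an interrupt when the matching cq_entry is ready. */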

> Most programs which use non-blocking IO actually want to implement multitasking without relying on threads. But that turns out to be the wrong approach.

Why is that the wrong approach? Approximately every high-performance httpd for the last decade or two has used a multitasking, non-blocking network IO model rather than thread-per-request. The overhead of threads is just very high. They would like to use the same model for non-network IO, but Unix and Unix-alikes have historically not exposed non-blocking disk IO to applications. io_uring is a step towards a unified non-blocking IO interface for applications, and it is also very similar to how the operating system itself interacts with most high-performance devices (i.e., a bunch of queues).
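
To make the "bunch of queues" model concrete, here's a minimal sketch using liburing (the file name and buffer size are placeholders, and most error handling is elided): the application pushes a read onto the submission queue, goes off and does other work, and later reaps the result from the completion queue.

    #include <fcntl.h>
    #include <stdio.h>
    #include <liburing.h>

    int main(void) {
        struct io_uring ring;
        io_uring_queue_init(8, &ring, 0);             /* set up SQ + CQ rings  */

        int fd = open("data.bin", O_RDONLY);          /* placeholder file      */
        if (fd < 0) { perror("open"); return 1; }
        char buf[4096];

        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);  /* describe the read */
        io_uring_submit(&ring);                       /* hand it to the kernel */

        /* ... do other useful work here; the read is in flight ... */

        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(&ring, &cqe);               /* reap the completion   */
        printf("read returned %d\n", cqe->res);
        io_uring_cqe_seen(&ring, cqe);

        io_uring_queue_exit(&ring);
        return 0;
    }

The same pair of rings works for file IO, network IO, and more, which is what makes it a step towards a unified interface.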




> Why do you say it doesn't match hardware?

Because the CPU itself can block, in this case on memory access. Most (all?) async software assumes the CPU can't block. A modern CPU is pipelined, and parts of the pipeline can simply stall waiting for, e.g., memory to return. If you want to handle that properly, you have to respect the API of that process, which happens to go through the OS. So, for example, while your memory page is being loaded, the OS can run another thread (which it can't in the async case, because there isn't any other thread).
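
As a concrete illustration of the "waiting for your memory page" case, here's a sketch (made-up helper and path, error checks elided, assumes the file is at least one page long): a plain memory access into an mmap'd file can take a major page fault, and the faulting thread is stuck until the kernel has read the page from disk, even though the code never calls read().

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Sketch: touching an mmap'd page from a single-threaded event loop.
       If the page isn't resident, the load takes a major page fault and
       the whole thread blocks until the kernel has done the disk IO,
       with no read()/write() call anywhere in sight. */
    int first_byte(const char *path) {           /* hypothetical helper      */
        int fd = open(path, O_RDONLY);
        char *p = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, fd, 0);
        int b = p[0];                            /* may fault and block here */
        munmap(p, 4096);
        close(fd);
        return b;
    }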


A CPU stall on an L3 miss (~100 ns) is orders of magnitude shorter than the kinds of blocking IO we don't want to wait on (tens to hundreds of µs even for an empty-queue NVMe read; slower still for everything else).
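
Back-of-the-envelope, using the numbers above (ballpark figures, not measurements):

    L3 miss stall:            ~100 ns
    NVMe read (empty queue):  ~10,000-100,000 ns   -> roughly 100x-1000x longer
    SATA SSD / network / HDD: longer still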

The OS can't run another thread while fulfilling an mmap page fault because it has to actually do the IO to fill the page while taking that trap. And in the async scenario, CPUs and high-speed devices can do clever things like snooping DMAs directly into the L3 cache, avoiding your L3-miss scenario as well.

The comparison between an L3 miss and an mmap fault is apples and oranges.



