> The point is that non-blocking IO wants to abstract away the hardware, but the abstraction is leaky.

Why do you say it doesn't match hardware? Basically all hardware is asynchronous — submit a request, get a completion interrupt, completion context has some success or failure status. Non-blocking IO is fundamentally a good fit for hardware. It's blocking IO that is a poor abstraction for hardware.
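
That submit/complete shape shows up directly in the data structures: a device ring is basically an array of request descriptors plus an array of completion records. Something like the following, loosely NVMe-flavored, with field names invented purely for illustration (real NVMe/virtio descriptors differ in detail):

    #include <stdint.h>

    /* Illustrative only; not a real device's layout. */
    struct sq_entry {            /* submission: what we want done        */
        uint8_t  opcode;         /* e.g. read or write                   */
        uint64_t lba;            /* starting block                       */
        uint32_t nblocks;        /* how many blocks                      */
        uint64_t buf_addr;       /* DMA address of the data buffer       */
        uint16_t command_id;     /* echoed back in the completion        */
    };

    struct cq_entry {            /* completion: how it went              */
        uint16_t command_id;     /* matches the submission               */
        uint16_t status;         /* success or an error code             */
    };
    /* The driver writes an sq_entry, rings a doorbell register, and the
       device raises an interrupt when the matching cq_entry is ready. */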

> Most programs which use non-blocking IO actually want to implement multitasking without relying on threads. But that turns out to be the wrong approach.

Why is that the wrong approach? Approximately every high-performance httpd for the last decade or two has used a multitasking, non-blocking network IO model rather than thread-per-request. The overhead of threads is just very high. They would like to use the same model for non-network IO, but Unix and Unix-alikes have historically not exposed non-blocking disk IO to applications. io_uring is a step towards a unified non-blocking IO interface for applications, and it is also very similar to how the operating system itself interacts with most high-performance devices (i.e., a bunch of queues).
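
To make the "bunch of queues" model concrete, here's a minimal sketch using liburing (the file name and buffer size are placeholders, and most error handling is elided): the application pushes a read onto the submission queue, goes off and does other work, and later reaps the result from the completion queue.

    #include <fcntl.h>
    #include <stdio.h>
    #include <liburing.h>

    int main(void) {
        struct io_uring ring;
        io_uring_queue_init(8, &ring, 0);             /* set up SQ + CQ rings  */

        int fd = open("data.bin", O_RDONLY);          /* placeholder file      */
        if (fd < 0) { perror("open"); return 1; }
        char buf[4096];

        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);  /* describe the read */
        io_uring_submit(&ring);                       /* hand it to the kernel */

        /* ... do other useful work here; the read is in flight ... */

        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(&ring, &cqe);               /* reap the completion   */
        printf("read returned %d\n", cqe->res);
        io_uring_cqe_seen(&ring, cqe);

        io_uring_queue_exit(&ring);
        return 0;
    }

The same pair of rings works for file IO, network IO, and more, which is what makes it a step towards a unified interface.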




> Why do you say it doesn't match hardware?

Because the CPU itself can block, in this case on memory access. Most (all?) async software assumes the CPU can't block. A modern CPU is pipelined, and parts of the pipeline can simply stall waiting for, e.g., memory to return. If you want to handle that properly, you have to respect the API of that process, which happens to go through the OS. So, for example, while your memory page is being loaded, the OS can run another thread (which it can't in the async case, because there isn't any other thread).
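
As a concrete illustration of the "waiting for your memory page" case, here's a sketch (made-up helper and path, error checks elided, assumes the file is at least one page long): a plain memory access into an mmap'd file can take a major page fault, and the faulting thread is stuck until the kernel has read the page from disk, even though the code never calls read().

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Sketch: touching an mmap'd page from a single-threaded event loop.
       If the page isn't resident, the load takes a major page fault and
       the whole thread blocks until the kernel has done the disk IO,
       with no read()/write() call anywhere in sight. */
    int first_byte(const char *path) {           /* hypothetical helper      */
        int fd = open(path, O_RDONLY);
        char *p = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, fd, 0);
        int b = p[0];                            /* may fault and block here */
        munmap(p, 4096);
        close(fd);
        return b;
    }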


A CPU stall on an L3 miss (~100 ns) is orders of magnitude shorter than the kinds of blocking IO we don't want to wait on (tens to hundreds of µs even for an empty-queue NVMe read; slower still for everything else).
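
Back-of-the-envelope, using the numbers above (ballpark figures, not measurements):

    L3 miss stall:            ~100 ns
    NVMe read (empty queue):  ~10,000-100,000 ns   -> roughly 100x-1000x longer
    SATA SSD / network / HDD: longer still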

The OS can't run another thread while fulfilling an mmap page fault because it has to actually do the IO to fill the page while taking that trap. And in the async scenario, CPUs and high-speed devices can do clever things like snooping DMAs directly into the L3 cache, avoiding your L3-miss scenario as well.

The comparison between an L3 miss and an mmap fault is apples and oranges.



