This article focuses on IOPS and throughput, but for many applications I/O latency matters just as much. It can be measured with ioping (apt-get install ioping). Unfortunately, even ten PCIe 4.0 NVMe drives do not provide any better latency than a single NVMe drive: striping adds parallelism, but each individual request still takes as long as it would on one drive. If you are constrained by disk latency, then 11M IOPS won't gain you much.
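To make the latency vs. IOPS point concrete, here is a rough Python sketch (not a replacement for ioping) that times individual 4 KiB random reads and derives the IOPS you can reach with a single outstanding request, which is just 1 / latency. The function names, the 4 KiB block size and the temp file are my own assumptions for illustration. Note that it reads through the page cache, so on a freshly written file it mostly measures syscall overhead; for real device latency you would need O_DIRECT or a tool like ioping.

```python
import os
import random
import statistics
import tempfile
import time

BLOCK_SIZE = 4096  # assumed request size in bytes


def measure_read_latency(path, requests=200, block_size=BLOCK_SIZE):
    """Return per-request latencies (seconds) for random block-aligned reads.

    Reads go through the page cache, so this is NOT raw device latency.
    """
    size = os.path.getsize(path)
    blocks = max(1, size // block_size)
    latencies = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for _ in range(requests):
            offset = random.randrange(blocks) * block_size
            start = time.perf_counter()
            os.pread(fd, block_size, offset)  # one request at a time (queue depth 1)
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return latencies


def qd1_iops(mean_latency_s):
    """IOPS achievable with one outstanding request: 1 / latency."""
    return 1.0 / mean_latency_s


if __name__ == "__main__":
    # Hypothetical scratch file, 1 MiB of random data.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(os.urandom(BLOCK_SIZE * 256))
        path = f.name
    try:
        lat = measure_read_latency(path)
        mean = statistics.mean(lat)
        print(f"mean latency: {mean * 1e6:.1f} us")
        print(f"QD1 IOPS bound: {qd1_iops(mean):.0f}")
    finally:
        os.unlink(path)
```

The takeaway is the arithmetic in qd1_iops: at 100 µs per request, a workload that issues one request at a time tops out around 10,000 IOPS, no matter how many drives sit behind it.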
> Does this come up in practice? What kind of use cases suffer from disk latency?
One popular example is high-frequency trading (HFT).
And in my experience on a desktop PC, it is better to disable swap and let the OOM killer do its work than to swap to disk, which makes my system noticeably laggy, even with a fast NVMe drive.