> What you want is to have the CPU coordinate the SSD to copy chunks directly -and as is- to the NIC/app/etc.
Isn't that what DMA is supposed to be?
Also, there's work in getting GPUs to load data straight from NVME drives, bypassing both the CPU and system memory. So you could certainly do similar things with the PCIE bus.
A big problem is that a lot of data isn't laid out in a way that's ready to be stuffed in memory. When you see a game spending a long time loading data, that's usually why. The CPU will do a bunch of processing to map on disk data structures to a more efficient memory representation.
If you can improve the on-disk representation to more closely match what's in memory, then CPUs are generally more than fast enough to copy bytes around. They are definitely faster than system RAM.
Isn't that what DMA is supposed to be?
Also, there's work in getting GPUs to load data straight from NVME drives, bypassing both the CPU and system memory. So you could certainly do similar things with the PCIE bus.
https://developer.nvidia.com/blog/gpudirect-storage/
A big problem is that a lot of data isn't laid out in a way that's ready to be stuffed in memory. When you see a game spending a long time loading data, that's usually why. The CPU will do a bunch of processing to map on disk data structures to a more efficient memory representation.
If you can improve the on-disk representation to more closely match what's in memory, then CPUs are generally more than fast enough to copy bytes around. They are definitely faster than system RAM.