Now I'm kinda curious to see how much faster you could go on an M1 Max with the GPU generating the data. Once his solution gets to the point of being a bytecode interpreter, it's trivially parallelizable, and the M1 has _fantastic_ memory bandwidth. Does anyone know if the implementation of pv or /dev/null actually requires loading the data into CPU cache?
If that is the case, does it actually matter what CPU core pv runs on? I feel like _something_ must ultimately zero the page out before it can get re-mapped to the process, but I'm not sure which core that happens on, or whether there's some other hardware mechanism that allows the OS to zero a page without actually utilizing memory bandwidth.
Unfortunately, the memory bandwidth that matters here is not bandwidth to GPU memory, but bandwidth to main system memory (unless anyone knows how to splice a pointer to GPU memory onto a unix pipe). That's specifically why the M1 came to mind, as a powerful UMA machine that can run Linux. Perhaps a modern gaming console could hit the same performance, but I don't know if they can run Linux.
PCIe 5.0 x16 tops out around 64 GB/s in each direction, so theoretically, if you perfectly pipelined the result, you could match that performance on a next-generation discrete GPU.
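Back-of-envelope for where that number comes from (line-rate math only; real links lose a few more percent to TLP/DLLP protocol overhead):

```python
# PCIe 5.0: 32 GT/s per lane, 128b/130b line coding, 16 lanes.
lanes = 16
transfers_per_s = 32e9      # one bit per transfer per lane
coding_efficiency = 128 / 130  # 128b/130b encoding overhead
bits_to_bytes = 1 / 8

bw = lanes * transfers_per_s * coding_efficiency * bits_to_bytes
print(f"{bw / 1e9:.0f} GB/s per direction")  # ~63 GB/s
```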