Even LMDB on the raw block device, which immediately did a physical write upon every logical write, wasn't that slow. It's just bizarre.
By the way, I think you can still get free access to run your own tests. https://twitter.com/IntelStorage/status/1010284314121129985
Let me know if you want to try to rerun these and double-check the results. Also, we found that 4.x kernels performed significantly better than 3.x kernels on these tests.
Shouldn't they use mmap instead of syscalls for byte-addressable storage to avoid switches into kernel space?
Using mmap with infrequent msync calls would mean you're running with looser data consistency guarantees. That may be suitable for some use cases, but it doesn't necessarily make for fair benchmarking.
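To make that trade-off concrete, here's a minimal sketch. It assumes a POSIX system and page-sized records, and the helper names (`write_records_mmap`, `write_records_pwrite`, `demo_msync`) are hypothetical, not LMDB code. Stores into a shared mapping only dirty the page cache; Python's `mmap.flush()` calls msync(MS_SYNC) on POSIX, so anything stored after the last flush is not yet guaranteed durable. The pwrite+fsync path makes every record durable before moving on, which is the stricter guarantee a fair comparison would have to match.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE

def write_records_mmap(fd: int, records: list[bytes], sync_every: int) -> int:
    """Store page-sized records through a shared mapping, calling
    mmap.flush() (msync with MS_SYNC on POSIX) only every `sync_every`
    records. Returns how many records were stored after the last flush,
    i.e. records not yet guaranteed to be on stable storage."""
    size = len(records) * PAGE
    os.ftruncate(fd, size)
    unsynced = 0
    with mmap.mmap(fd, size) as m:  # MAP_SHARED, read/write by default on POSIX
        for i, rec in enumerate(records):
            if len(rec) != PAGE:
                raise ValueError("records must be exactly one page")
            m[i * PAGE:(i + 1) * PAGE] = rec  # only dirties the page cache
            unsynced += 1
            if unsynced == sync_every:
                m.flush()  # blocks until dirty pages are written back
                unsynced = 0
    return unsynced

def write_records_pwrite(fd: int, records: list[bytes]) -> None:
    """Stricter baseline: each record is durable before the next is written."""
    for i, rec in enumerate(records):
        os.pwrite(fd, rec, i * PAGE)
        os.fsync(fd)

def demo_msync() -> tuple[int, bytes]:
    records = [bytes([ord("A") + i]) * PAGE for i in range(10)]
    with tempfile.TemporaryFile() as f:
        unsynced = write_records_mmap(f.fileno(), records, sync_every=4)
        last = os.pread(f.fileno(), PAGE, 9 * PAGE)
    return unsynced, last
```

With 10 records and `sync_every=4`, the last two records are readable but were never explicitly synced; raising `sync_every` makes the mmap path faster precisely by widening that window.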
To answer the parent post: when using the raw block device, we're actually using mmap already. That's what LMDB means: Lightning Memory-Mapped Database. And while not all raw devices support read/write calls, if they support mmap, we use it. But writing through mmap actually performs poorly for larger-than-RAM DBs. Whenever you write to a page that isn't resident in memory, the OS takes a page fault and reads it in from storage first. That read is a wasted I/O in this case, because we're about to overwrite the entire page.
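A minimal sketch of the two write paths, assuming Linux and a regular file standing in for the device; the helper names are hypothetical and this is not LMDB's actual write path. The comments describe typical Linux behavior: a write fault on a non-resident page of a shared file mapping reads the old contents in first, while a write() covering a whole, page-aligned page can fill the page cache directly from the user buffer without that read.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE

def overwrite_page_mmap(m: mmap.mmap, pgno: int, data: bytes) -> None:
    # The first store into a non-resident page triggers a page fault, and
    # the kernel reads the old page from storage before the copy proceeds,
    # even though every byte is about to be replaced.
    if len(data) != PAGE:
        raise ValueError("expected exactly one page")
    m[pgno * PAGE:(pgno + 1) * PAGE] = data

def overwrite_page_pwrite(fd: int, pgno: int, data: bytes) -> None:
    # A full, page-aligned write lets the kernel skip reading the old page.
    if len(data) != PAGE:
        raise ValueError("expected exactly one page")
    os.pwrite(fd, data, pgno * PAGE)

def demo_overwrite() -> tuple[bytes, bytes]:
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        os.ftruncate(fd, 2 * PAGE)
        with mmap.mmap(fd, 2 * PAGE) as m:
            overwrite_page_mmap(m, 0, b"x" * PAGE)
            overwrite_page_pwrite(fd, 1, b"y" * PAGE)
            via_map = m[PAGE:2 * PAGE]  # pwrite is visible through the mapping
        via_read = os.pread(fd, PAGE, 0)  # mmap store is visible to pread
    return via_map, via_read
```

Both paths end with the same bytes in the page cache; the difference is only the extra read-in on the mmap path, which is invisible here because a freshly truncated file has no real data to read, but costs a full I/O per page on a larger-than-RAM database.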