Did you do each of these after a clean reboot, or are we looking at possible caching effects from the kernel? If any part was in cache, then we might be just comparing shared memory against IPC, and that's an obvious performance win, but not really what's intended to be examined here.
The first numbers seem to imply that it takes `pread` as long to copy bytes from memory as it does to fetch them from disk. For a quick back-of-the-napkin check, let's assume that disk I/O accounts for 100% of this workload and that local memory is one order of magnitude faster. In that case, I would expect the difference for an optimized implementation to be at most about 10%.
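To make that napkin math concrete, here's a tiny sketch of the model I have in mind (the 10x disk-vs-memory ratio is an assumption, not a measurement):

```python
# Model: pread pays for the disk fetch plus a memory copy into the
# caller's buffer; mmap pays for the fetch only. If the copy is ~10x
# faster than the fetch, eliminating it saves at most ~1/11 of the total.
disk_time = 1.0                # normalized disk fetch cost (assumed)
copy_time = disk_time / 10.0   # assumed: memory is one order of magnitude faster

pread_total = disk_time + copy_time   # fetch, then copy
mmap_total = disk_time                # fetch only; pages mapped directly

max_speedup = (pread_total - mmap_total) / pread_total
print(f"expected upper bound on improvement: {max_speedup:.0%}")  # roughly 9%
```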
I do think it is true that there are scenarios where the file mmap is faster, or that certain operations on each kernel might fall off a cliff. I just find it hard to believe that `mmap` would be as much faster as shown here in a typical situation (e.g. after a clean reboot, doing about the same amount of work, issuing optimal syscalls, with the OS/kernel not doing anything foolish).
Yes, the file is in cache, and that was my intent. That's why my `time` output says `faults 0` for both runs. That is, no page faults occurred. Everything is in RAM.
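For what it's worth, here's a rough way to sanity-check the "everything is in RAM" claim from inside a script (Unix only; `ru_majflt` counts page faults that actually had to hit disk, which is what the `faults 0` output reflects):

```python
import mmap
import os
import resource
import tempfile

# Write a small file, then read it back two ways. Since we just wrote
# it, it should sit in the page cache, so neither read should major-fault.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"corpus" * 100_000)
    path = f.name

fd = os.open(path, os.O_RDONLY)
size = os.fstat(fd).st_size

before = resource.getrusage(resource.RUSAGE_SELF).ru_majflt
via_pread = os.pread(fd, size, 0)            # kernel copies out of the page cache
with mmap.mmap(fd, size, prot=mmap.PROT_READ) as m:
    via_mmap = bytes(m)                      # pages mapped in directly, then copied
after = resource.getrusage(resource.RUSAGE_SELF).ru_majflt

assert via_pread == via_mmap                 # both paths see the same bytes
print(f"major page faults during reads: {after - before}")  # typically 0 here
os.close(fd)
os.unlink(path)
```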
That is indeed a very common case for ripgrep, where you might execute many searches against the same corpus repeatedly. Optimizing that use case is important.
For cases where the file isn't cached, then it's much less interesting, because you're going to just be bottlenecked on disk I/O for the most part.
> then we might be just comparing shared memory against IPC, and that's an obvious performance win, but not really what's intended to be examined here.
Please don't take my comment out of context. I was specifically responding to this rather sweeping claim with actual data:
> This old myth that mmap is the fast and efficient way to do IO just won't die.
You might think the fact that this isn't a myth is "obvious," but clearly, not everyone does. The right remedy for that is data, not back-of-the-napkin theorizing. :-)
On Linux at least, you do not need to do a clean reboot to measure something without cache. You can drop the file cache with `sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'`.