
How is it so fast? Files are mmap()ed instead of read into a buffer.

It's hard to believe this would give a significant performance boost. Is there evidence of this?




In my benchmarking, mmap() was about 20% faster than read() on OS X, but the same speed on Ubuntu. Pretty much everything else in the list (pthreads, JIT regex compiler, Boyer-Moore-Horspool strstr(), etc) improves performance more than mmap().
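For anyone who wants to reproduce the comparison, here's a minimal sketch (POSIX assumed; the function names are mine). Both functions count newlines so the scan can't be optimized away; time each over a large file with time(1) or clock_gettime():

```c
/* Minimal benchmark sketch, POSIX assumed. Both scans count newlines
 * so the compiler can't elide the work; time each and compare. */
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Scan via read() into a fixed userspace buffer. */
static long count_read(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    char buf[1 << 16];
    long n = 0;
    ssize_t got;
    while ((got = read(fd, buf, sizeof buf)) > 0)
        for (ssize_t i = 0; i < got; i++)
            if (buf[i] == '\n') n++;
    close(fd);
    return n;
}

/* Scan the same bytes through an mmap()ed view: no explicit copy into
 * a userspace buffer; the kernel faults pages in as we touch them. */
static long count_mmap(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return -1; }
    long n = 0;
    if (st.st_size > 0) {
        const char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { close(fd); return -1; }
        for (off_t i = 0; i < st.st_size; i++)
            if (p[i] == '\n') n++;
        munmap((void *)p, st.st_size);
    }
    close(fd);
    return n;
}
```

One caveat if you try this: drop the page cache between runs (on Linux, `echo 3 > /proc/sys/vm/drop_caches` as root), or the second scan will be measuring cache hits rather than I/O strategy.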

Also, mmap() has the disadvantage that it can kill your process with SIGBUS if something else truncates the underlying file while you're reading the mapping. In fact, there have been kernel bugs related to separate processes mmapping and truncating the same file.[1] I mostly use mmap() because my primary computer is a mac.

A side note: Parts of OS X's kernel seem... not very optimized to say the least. See the bar graph at http://geoff.greer.fm/2012/09/07/the-silver-searcher-adding-... for an example.

1. http://lwn.net/Articles/357767/


There is a nice (and often posted) mailing list post explaining some of the reasons GNU Grep is so fast:

http://lists.freebsd.org/pipermail/freebsd-current/2010-Augu...

He mentions: "So even nowadays, using --mmap can be worth a >20% speedup."


Now I'm burning with curiosity. I have to know why! My plan:

- replicate the experiment and confirm that --mmap shaves off a non-negligible amount of time. It could be that his computer happened to be running something in the background that was hitting his hard drive, for example, which would skew the results.

- look at the code, figure out the exact difference between what --mmap is doing and what it does by default. Confirm that the problem isn't in grep itself (it's probably not, but it's important to check).

- dig into the kernel source to figure out the difference under the hood and why it might be faster.


I wonder if it has to do with not having to copy data back and forth between kernel and userspace. My mildly uneducated thought is that you could do this with splice() or whatever, but mmap is an easy drop-in replacement.

edit: I've been reading your posts for a while and I like them, but I keep wondering, why do you have sillysaurus1-2-3?


That's what has me so curious, because it doesn't seem like copying between kernel/userspace should account for a 20% speed drop. Once data is in the L3 CPU cache, it should be inexpensive to move it around.

Regarding my ancestry, I'm sillysaurus3 because I've (rightfully) been in trouble twice with the mods for getting too personal on HN. I apologized and changed my behavior accordingly, and additionally created a new account both times to serve as a constant reminder to be objective and emotionless. There's rarely a reason to argue with a person rather than with an idea. Debating ideas, not people, has a bunch of nice benefits: it's easier to learn from your mistakes, it makes for better reading, etc. It's pretty important, because forgetting that principle leads to exchanges like https://news.ycombinator.com/item?id=7700145

Another nice benefit of creating a new account is that you lose your downvoting privilege for a time, which made me more thoughtful about whether a downvote is actually justified.


Possibly the OS is doing interesting things with file access and caching and opting out of that has benefits for this particular workload?

...

I just skimmed the bsd mailing list email on why grep is fast which was linked up-thread, and it seems that's somewhat the case. It sounds like since they are doing advanced search techniques on what matches or can match, they use mmap to avoid requiring the kernel to copy every byte into memory, when they know they only need to look at specific ranges of bytes in some instances. At least that was the case at some point in the past.

Finally, when I was last the maintainer of GNU grep (15+ years ago...), GNU grep also tried very hard to set things up so that the _kernel_ could ALSO avoid handling every byte of the input, by using mmap() instead of read() for file input. At the time, using read() caused most Unix versions to do extra copying.

P.S. Nice attitude, it earned an upvote from me. Which is probably one reason why your third account has more karma than my first.


Right, I think the point of Boyer-Moore is that it lets you skip large chunks of the text during the search.

So the assumption is that some of those pages never even get paged in, but I think that'd only be the case when the pattern is at least as large as a page (usually 4KB!), which is not the case in the example in the mailing list. So the mystery continues!
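To make that concrete, here's a sketch of the Horspool variant's skip table (function names are mine): the longest possible jump on a mismatch is the pattern length, so a short needle can never leap over a whole 4KB page:

```c
/* Sketch of Boyer-Moore-Horspool, 8-bit bytes assumed. The skip for
 * any byte is at most strlen(pattern), so a short pattern still
 * touches every 4KB page of the input even while "skipping". */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Build the bad-character shift table for Horspool. */
static void build_skip(const char *pat, size_t skip[256]) {
    size_t m = strlen(pat);
    for (size_t i = 0; i < 256; i++)
        skip[i] = m;                       /* default: jump a full pattern */
    for (size_t i = 0; i + 1 < m; i++)
        skip[(unsigned char)pat[i]] = m - 1 - i;
}

/* Horspool search: returns offset of the first match, or -1. */
static long bmh_search(const char *text, size_t n, const char *pat) {
    size_t m = strlen(pat);
    if (m == 0 || m > n)
        return m == 0 ? 0 : -1;
    size_t skip[256];
    build_skip(pat, skip);
    size_t i = 0;
    while (i + m <= n) {
        if (memcmp(text + i, pat, m) == 0)
            return (long)i;
        /* Shift by the skip value of the last byte of the window. */
        i += skip[(unsigned char)text[i + m - 1]];
    }
    return -1;
}
```

With a 5-byte pattern the best case is still one byte examined per 5 bytes of input, so every page of the mapping gets touched.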


You can do this with large reads:

     read(fd, buf, 100 megs)
can, in the kernel, do something like:

     read(fd, buf[0:first-page-boundary])
     remap(fd, buf[first-page-boundary:last-page-boundary])
     read(fd, buf[last-page-boundary:end])
There you go, zero copy reads. Or at least, minimal copy reads -- at most you will get 2 pages worth of copying.


The last time I had to do fast, large sequential disk reads on Linux it was surprisingly complex to get all the buffering/caching/locking to not do the wrong thing and slow me down a lot. I wouldn't be surprised if non-optimized mmap() is a whole lot faster than non-optimized use of high level file i/o libraries.


[deleted]


If anything, that post is evidence of how tricky optimization is, and how easy it is to fool yourself about what matters. It's probably best to be skeptical about mmap() as a performance optimization over reading into a buffer unless evidence demonstrates otherwise. Most OS's do a pretty good job of caching at the filesystem level, and under the hood paging is essentially reading into a buffer anyway. mmap() might make the code simpler, but it's hard to imagine it makes it faster. If it does, I'd like to understand why.

EDIT: The post was http://geoff.greer.fm/2012/08/25/the-silver-searcher-benchma...


> It's hard to believe this would give a significant performance boost.

Why is that so hard to believe? It's a standard optimization: the kernel can almost certainly coordinate reading better than your userspace C can.


So are we talking about constant-time optimization, then? I.e. it shaves off a few milliseconds regardless of how complex the search is, or how many files it's reading, or how large each file is. I'll happily concede that mmap() might do that. But a performance boost linear w.r.t. search complexity/number of files/filesize? Hard to believe, and I should go measure it myself to prove the point or learn why I'm mistaken.


Likely a constant time improvement... for each file being searched.

I don't think that anybody is claiming that mmapping actually changes the algorithmic complexity of the actual search operations.


Constant-time improvements are still improvements, especially if they're in an inner loop. Otherwise we would all be using Python and just writing great algorithms.



