
Mmap(2) - luu
http://jmoiron.net/blog/mmap2/
======
rdtsc
As a fun heuristic I have come to believe that mmap advocacy and use for
storage is somewhat correlated with medium experience level.

At first, junior developers don't know about it. Then they read a blog post or
hear something about it at a meeting, and all of a sudden they fall in love
with mmap. Then they want to rewrite every storage-related thing using it,
claiming it will be lightning fast and efficient. "You just write to memory and
the kernel writes to the file, that's so awesome, here, I refactored your
storage backend to use it over the weekend!"

Some might even venture to write whole database products based on it.

But then one day they "grow up" and realize what a great pain mmap can be:
their storage backend is being hammered constantly by input data, dirty
pages are not flushed fast enough, and then they start blocking other
processes for seconds at a time. They start tweaking madvise settings, trying
to control the behavior, adjusting the number of pdflush threads, etc. Oh, and
SIGBUS errors are fun too...
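
For readers who have not seen the pattern being described, here is a minimal Python sketch of "write to memory and let the kernel write to the file", with an explicit flush so dirty pages do not pile up. The helper names `make_file` and `write_through_mmap` are hypothetical, and flushing after every store is an assumption made for illustration, not a tuning recommendation. On Unix, Python's `mmap.flush()` wraps `msync(2)`.

```python
import mmap
import os
import tempfile


def make_file(size):
    """Create a zero-filled temporary file of `size` bytes (hypothetical helper)."""
    fd, path = tempfile.mkstemp()
    os.ftruncate(fd, size)
    os.close(fd)
    return path


def write_through_mmap(path, payload, offset=0):
    """Store `payload` into the file through a shared mapping, then flush."""
    with open(path, "r+b") as f:
        # Length 0 maps the whole file; the file must not be empty.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as mm:
            # A plain memory store: this dirties page-cache pages,
            # and no write() call is made for the data itself.
            mm[offset:offset + len(payload)] = payload
            # Ask the kernel to write the dirty pages back now
            # instead of whenever its writeback policy decides to.
            mm.flush()
```

A call looks like `write_through_mmap(make_file(mmap.PAGESIZE), b"hello", 10)`. The sketch does not handle the SIGBUS case: if the file is truncated while it is mapped, touching a page past the new end of file kills the process unless the signal is handled.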

~~~
striking
And yet it exists, and works gloriously:
[http://symas.com/mdb/doc/](http://symas.com/mdb/doc/)

The kernel is smarter than your app. Trying to force it to do something it
doesn't want to do is a very bad idea.

~~~
mveety
Kernels generally aren't all that smart. They're usually dumb and simple
programs that do things based on some average defaults that look okay and
might work for most people. The kernel needs to be told what to do especially
if data integrity is important. Like if some dude thought it was a great idea
to store up writes and do them in large transactions and you're writing a
filesystem that needs verifiable writes, you turn that off or force it off if
you can't disable it otherwise. Relying on the utter sausage factory that is
your average kernel is a recipe for disaster. That is especially true for
Linux in my experience.

~~~
striking
I said the kernel is smarter than your app, not psychic. Of course you need to
tell Linux what to do with specialized applications.

However, any option worth disabling can be disabled. Nothing needs to be
forced off like the writer of the article did. mmap'd writes should be flushed
as soon as possible, not dragged out, especially in the SSD age we have today.
What the writer did was actually counterintuitive and makes no sense. None.

------
vanviegen
So what's the alternative? Doesn't data dirtied by a write() experience the
same or similar messy flushing behavior?

~~~
rewqfdsa
It depends. Are you using aio? O_DIRECT? write(2) gives you plenty more
options and these days, it's no less efficient than mmap.

The whole idea of system calls being expensive is obsolete now that we're not
using heavyweight software interrupts to implement them.

What the fuck do you think happens when you write to a mmap(2)ed page anyway?
The kernel has to know the page is dirty somehow so that it can remember to
flush it later. It learns about page dirtying by initially mapping the page
read-only, letting the CPU fault when you try to write, marking the page
dirty, and letting the write proceed.

You're entering the kernel either way.
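
As a point of comparison, a minimal Python sketch of the explicit-syscall path: a positioned write followed by an optional `fsync`, so the caller decides when the data is pushed to disk. The function name `write_with_syscalls` and its `sync` parameter are hypothetical. The sketch uses neither `O_DIRECT` nor aio, both of which bring extra requirements (buffer alignment, for one) that are out of scope here.

```python
import os
import tempfile


def write_with_syscalls(path, payload, offset=0, sync=True):
    """Write `payload` at `offset` using pwrite(2), optionally followed by fsync(2)."""
    fd = os.open(path, os.O_WRONLY)
    try:
        view = memoryview(payload)
        done = 0
        # pwrite may write fewer bytes than requested, so loop until done.
        while done < len(view):
            done += os.pwrite(fd, view[done:], offset + done)
        if sync:
            # Block until the kernel has handed the data to the device.
            os.fsync(fd)
        return done
    finally:
        os.close(fd)
```

With `sync=False` the data sits in the page cache as dirty pages, much like an unflushed mmap store; the difference is that the point at which durability is requested is a single explicit call.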

~~~
hyc_symas
"system calls being expensive is obsolete now" - utter nonsense. System calls
are still expensive. Argument checking, copying data between user space and
kernel space - none of that cost goes away.

~~~
mveety
It's far from nonsense. Everything that happens when you call a syscall
nowadays is minimal compared to firing off interrupts to do the job. Memory
operations are pretty cheap and the argument checking is probably only 20 or
so instructions at most. A lot of the cost has gone away. If you still believe
syscalls are expensive, run dtrace on some random programs and be _horrified_
over how many syscalls are being fired off.

~~~
eloff
Syscalls are hugely expensive. Not for those reasons, but because of the way
they trash the cache, causing reduced performance for a long time after
returning control to userspace. This is why to really achieve the performance
the hardware is capable of, both for nvram ssds and for high speed network
interfaces, you need to bypass the kernel completely. You can do a little
better using calls that amortize the costs of the syscalls over multiple units
of work, but even with the best optimizations it's still an order of magnitude
less than what the hardware is capable of.
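
One way to picture "calls that amortize the costs of the syscalls over multiple units of work" is vectored I/O. The Python sketch below hands many records to the kernel per `writev(2)` call rather than issuing one `write` per record. The function name `write_batched` and the default of 1024 buffers per call are assumptions (1024 is a common `IOV_MAX`, but the real limit is system-specific), and short writes are not retried, so treat it as an illustration only. It does nothing about the cache effects mentioned above.

```python
import os
import tempfile


def write_batched(fd, records, max_iov=1024):
    """Write `records` (a list of bytes objects) using as few writev() calls as possible.

    Returns (bytes_written, syscalls_made). Assumes max_iov <= the system's IOV_MAX
    and does not retry short writes.
    """
    total = 0
    calls = 0
    for start in range(0, len(records), max_iov):
        # One kernel entry covers up to max_iov records.
        total += os.writev(fd, records[start:start + max_iov])
        calls += 1
    return total, calls
```

For example, 3000 records with `max_iov=1000` take 3 kernel entries instead of 3000.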

~~~
rewqfdsa
Sure. The problem is that intermediate-skill developers read comments like
yours and just glean "lol, syscalls are slow" from them. They then go on to
use mmap and kill MM performance. It's important to keep your audience in
mind. Most developers do not write high-performance IO layers that talk
directly to hardware. Most developers will blindly follow any sufficiently
authoritative voice that tells them X is better than Y, which leads them to
misuse powerful tools.

In the vast majority of cases, you want plain read and write. In the vast
majority of cases, the overhead of jumping into the kernel is not your
bottleneck. Simplicity counts for a lot.

~~~
hyc_symas
Yes, simplicity counts for a lot. Staying in user space _is_ simpler.

The reason LMDB can pack so much functionality into only 7KLOCs is because
using mmap is much simpler than using read() and maintaining a user-level
buffer cache.
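
To illustrate the simplicity argument (this is not LMDB's code or file format), here is a minimal Python sketch of reading fixed-size records straight out of a read-only mapping. The record layout `RECORD` and the helpers `build_file`, `open_mapped`, and `lookup` are all hypothetical. There is no read() call per lookup and no user-level cache to size or evict; the page cache plays that role.

```python
import mmap
import struct
import tempfile

# Hypothetical record layout: two little-endian unsigned 64-bit integers.
RECORD = struct.Struct("<QQ")


def build_file(n):
    """Write n records (i, i*i) to a temporary file and return its path."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        for i in range(n):
            f.write(RECORD.pack(i, i * i))
        return f.name


def open_mapped(path):
    """Map the whole file read-only; the mapping outlives the file object."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def lookup(mm, index):
    """Decode record `index` directly from the mapped pages."""
    return RECORD.unpack_from(mm, index * RECORD.size)
```

After `mm = open_mapped(build_file(100))`, a call such as `lookup(mm, 42)` is just pointer arithmetic plus a decode; the kernel faults the page in on first touch.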

In the vast majority of cases, developers need to profile their code if they
actually care about performance - and fail to do so.

------
todd8
These ideas have been around for a long time. I remember using the ideas a
quarter of a century ago! See DEC SRC Report 24, "A simple and efficient
implementation for small databases", by Birrell, et al. 1988.
[http://web.archive.org/web/20051215194340/http://gatekeeper....](http://web.archive.org/web/20051215194340/http://gatekeeper.research.compaq.com/pub/DEC/SRC/research-reports/abstracts/src-rr-024.html)

I used to read these reports as they came out, very interesting.

