
Why not mmap? - nkurz
http://useless-factor.blogspot.com/2011/05/why-not-mmap.html
======
rdtsc
The main gist:

"mmap() interface is missing a non-blocking way to access memory."

At the end of the day, your process will still be suspended when a page is
brought in from disk the first time it is read. In situations where you can't
afford to have your thread suspended waiting for disk I/O (say, because it
could be doing something else in the meantime, like servicing network
requests), it would be better to use AIO, for example, to schedule a file read
into a buffer and be notified later when it finishes (
<http://linux.die.net/man/3/aio_read> )
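As a rough illustration of the AIO alternative, here is a minimal sketch using the POSIX calls behind the linked page (aio_read, aio_error, aio_return). The helper names are hypothetical. Note that glibc implements POSIX AIO with its own helper threads in user space, and glibc versions before 2.34 need -lrt at link time.

```c
#define _GNU_SOURCE
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Queue an asynchronous read of len bytes at offset into buf.
 * Returns 0 if the request was queued, -1 (errno set) otherwise. */
int start_async_read(struct aiocb *cb, int fd, void *buf, size_t len, off_t offset)
{
    memset(cb, 0, sizeof *cb);
    cb->aio_fildes = fd;
    cb->aio_buf = buf;
    cb->aio_nbytes = len;
    cb->aio_offset = offset;
    cb->aio_sigevent.sigev_notify = SIGEV_NONE; /* poll instead of a signal */
    return aio_read(cb);
}

/* Non-blocking completion check, meant to be called once per event-loop
 * iteration. Returns 1 when done (*nread set), 0 while still in progress,
 * -1 on error (errno set). */
int poll_async_read(struct aiocb *cb, ssize_t *nread)
{
    int err = aio_error(cb);
    if (err == EINPROGRESS)
        return 0;               /* not finished: go service other work */
    if (err == -1)
        return -1;              /* aio_error itself failed */
    ssize_t n = aio_return(cb); /* call exactly once to reap the request */
    if (err != 0) {
        errno = err;
        return -1;
    }
    *nread = n;
    return 1;
}
```

In a real loop, poll_async_read would sit next to the socket readiness checks; aio_suspend() is the blocking alternative when there is nothing else to do.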

~~~
rbranson
The mechanism suggested by the author seems a bit absurd. Code would have to
pre-warm regions of memory before accessing them for that access to be
non-blocking. One of the major attractions of mmap() is that this is
unnecessary: the OS handles all of the read-ahead and caching for you.
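The objection is easier to weigh with the pre-warming step written out. The author's own proposed interface isn't reproduced here; this is a minimal sketch, assuming Linux, that approximates it with existing calls: madvise(MADV_WILLNEED) to start read-ahead, and mincore() to ask whether touching a range would fault. The helper names are hypothetical, and mincore's answer is only a snapshot that can be stale by the time the memory is accessed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Ask the kernel to start reading [addr, addr+len) in the background.
 * addr must be page-aligned. Returns 0 on success. */
int prewarm(void *addr, size_t len)
{
    return madvise(addr, len, MADV_WILLNEED);
}

/* Returns 1 if every page of [addr, addr+len) is resident right now, 0 if
 * touching it could block on a page fault, -1 on error. addr must be
 * page-aligned. Pages can still be evicted after this returns. */
int all_resident(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    unsigned char *vec = malloc(npages ? npages : 1);
    if (vec == NULL)
        return -1;
    int result = -1;
    if (mincore(addr, len, vec) == 0) {
        result = 1;
        for (size_t i = 0; i < npages; i++)
            if (!(vec[i] & 1)) {  /* low bit set means the page is resident */
                result = 0;
                break;
            }
    }
    free(vec);
    return result;
}
```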

It also seems silly to go through all of this and still end up using a thread
pool to perform the page faults, which brings back the thread starvation,
context switches, and lock overhead that cooperative concurrency models try
to avoid.
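For comparison, the thread-based variant being objected to looks roughly like this: a helper thread touches one byte per page so that any blocking page faults land on it rather than on the event loop. A minimal sketch with POSIX threads; it spawns one thread per job instead of keeping a real pool, and the struct and function names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical job: touch one byte per page of a mapped range. */
struct prefault_job {
    const volatile char *base;
    size_t len;
    unsigned long checksum;  /* consumes the reads so they are not optimized out */
};

static void *prefault_worker(void *arg)
{
    struct prefault_job *job = arg;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    unsigned long sum = 0;
    for (size_t off = 0; off < job->len; off += page)
        sum += (unsigned char)job->base[off];  /* a fault here blocks only this thread */
    job->checksum = sum;
    return NULL;
}

/* Start prefaulting on a new thread; join it (or notify the loop) later. */
int start_prefault(pthread_t *tid, struct prefault_job *job)
{
    return pthread_create(tid, NULL, prefault_worker, job);
}
```

The caller still has to learn when the job is done (pthread_join here, a pipe or eventfd in a real loop), which is the synchronization cost the comment points at.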

I'm trying to think through all the scenarios, but it doesn't seem as if it's
really practically possible (in C at least) to make mmap() I/O work in a
cooperative concurrency model.

~~~
jerf
Not everything is C. Anything from event-based libraries like Twisted or
Node.js up through Erlang or Haskell could potentially make very good and very
transparent use of this, if the implementation were slick enough, without
changing much code.

I'd say that generally, when the kernel can do something such that a
cooperating VM can get a speed boost without doing very much, that's a good
thing, all else being equal. Which it never is, when it comes to the kernel.
But it's worth thinking about, even if it's too complicated for C to use
reasonably. (It's time to let that bottleneck go anyhow. There are all kinds of
very good things that are simply too complicated to reasonably use in C, and
the solution is not to limit ourselves to C for the rest of eternity.)

~~~
asomiv
What does not being in C have to do with that? Whatever tricks Twisted,
Node.js and friends use can be done in C, just with a different syntax; at the
end of the day it's the same machine code. I don't know how Twisted handles
file I/O, but Node.js uses libeio, which performs all file I/O in a thread
pool. That's no better than what you can do in C.
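The libeio approach can indeed be sketched in plain C: a helper thread does the blocking pread() and then writes a byte to a pipe whose read end the event loop is already watching, and the loop reads the result only after that byte arrives. This is a minimal illustration under those assumptions, not libeio's actual code or API; read_req and submit_read are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical request record handed to a helper thread. */
struct read_req {
    int fd;
    void *buf;
    size_t len;
    off_t offset;
    ssize_t result;   /* set by the worker: bytes read or -1 */
    int notify_fd;    /* write end of a pipe the event loop watches */
};

static void *read_worker(void *arg)
{
    struct read_req *req = arg;
    req->result = pread(req->fd, req->buf, req->len, req->offset); /* may block */
    char token = 1;
    ssize_t w;
    do {
        w = write(req->notify_fd, &token, 1);  /* wake the event loop */
    } while (w < 0 && errno == EINTR);
    return NULL;
}

/* Hand the read to a new detached thread (a real pool would reuse threads).
 * Returns 0 on success or an error number from pthread_create. */
int submit_read(struct read_req *req)
{
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, read_worker, req);
    if (rc == 0)
        pthread_detach(tid);
    return rc;
}
```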

~~~
jerf
rbranson said: "but it doesn't seem as if it's really practically possible (in
C at least) to make mmap() I/O work in a cooperative concurrency model."

In that context, it ought to make sense.

Further, there's a set of technologies that are good, but range from difficult
to effectively impossible to use in C, and the list is growing, not shrinking.
We can't afford to bind ourselves to what will work in C forever. For
instance: garbage collection is possible but hard and certainly a bit
inelegant; you're fighting C. Software transactional memory: _Don't even think about
doing it in C._ But it may still be a useful thing in some places. And so on,
for a mix of things. C is what C is, but "it doesn't practically work in C"
can't be allowed to be a fatal objection if we want to progress.

Sure, you can type anything you want in C. But _you won't_. And I know it.
Don't even try to argue that you will, it's obvious that you won't. Given the
current state of the programming language environment, it is an impossibly
small window to sail through to claim that Java, Erlang, Haskell, Javascript,
C#, C++, SQL, etc, all have no advantages over C because we can always do the
same thing in C (and I deliberately picked a wide range), yet C has some sort
of advantage over assembler that makes it worth using. I reject that C is a
Platonic default language that all others must justify their existence
against; it's merely one that got some popular OSes written in it for certain
good reasons, but that doesn't gold-plate it against criticism. This argument
hasn't been sensible for about 30 years now, and now it's just gibberish; we
_know_ better. Cost factors matter a _lot_.

~~~
rbranson
Definitely agree with you that abstraction matters. I use C as an example
because it's the canonical way in which we as developers can reasonably
converse about directly dealing with the machine. C is still important and
will continue to be important because it's arguably the best abstraction we
have to write code directly against the machine. GC and STM would be
orthogonal to the purpose of C.

It's really sort of pointless to argue what's possible in environments lacking
real, unsafe pointers, because the whole idea with mmap()'d I/O is that it
lets apps access files as if they are a region of memory. Java wraps up the
behavior in FileChannel, and most other platforms have a way to access mmap()
in some fashion, albeit the practical use cases are vanishingly few. In C,
it's done mostly for squeezing out bits of extra performance by shrinking
memory requirements/allocation cost and avoiding copies. Many of the safety
mechanisms used by VM environments invalidate, or make unavailable, the
shortcuts that are used in tandem with mapped I/O to achieve higher
performance.
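To make the "avoiding copies" point concrete: a minimal sketch, assuming a POSIX system, that scans a file through its mapping with memchr() instead of read()ing it into a buffer first. count_lines_mmap is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count '\n' bytes by scanning the file's mapping directly: no read()
 * buffer and no copy from the page cache into user memory.
 * Returns the count, or -1 on error. */
long count_lines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {        /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }
    size_t len = (size_t)st.st_size;
    const char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                    /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return -1;
    long lines = 0;
    const char *cur = p, *end = p + len;
    while (cur < end && (cur = memchr(cur, '\n', (size_t)(end - cur))) != NULL) {
        lines++;
        cur++;                    /* continue after the newline just found */
    }
    munmap((void *)p, len);
    return lines;
}
```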

In the end, it's all still triggering a page fault that will load the page
from disk if it's not in RAM, and the kernel is still going to suspend the
calling thread while that's happening. No way around that.
Alleviating this through thread pools would just re-introduce the issues of
inter-thread locking and context switch overhead, and present an additional
bottleneck; in the end just piling on extra complexity. I don't see that as a
net win at all.
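The fault that suspends the thread is visible from user space. A minimal sketch, assuming Linux or BSD getrusage() semantics, that counts the page faults incurred while touching each page of a mapping: ru_majflt counts faults that needed I/O, ru_minflt those served without it. The helper name is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Touch one byte in every page of [p, p+len) and report how many page
 * faults that cost this process. *major_out receives the faults that had
 * to wait for I/O; the rest were served from memory (page cache or a
 * zero page). Returns the total, minor plus major. */
long faults_while_touching(const volatile char *p, size_t len, long *major_out)
{
    struct rusage before, after;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    getrusage(RUSAGE_SELF, &before);
    unsigned long sink = 0;
    for (size_t off = 0; off < len; off += page)
        sink += (unsigned char)p[off];     /* first touch of a page may fault */
    getrusage(RUSAGE_SELF, &after);
    (void)sink;
    long minor = after.ru_minflt - before.ru_minflt;
    long major = after.ru_majflt - before.ru_majflt;
    if (major_out != NULL)
        *major_out = major;
    return minor + major;
}
```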

While the post has some great information on mapped I/O, it seems as if the
author is dogmatic about the use of mapped I/O and is trying to find a way to
use this hammer in a situation where it might be inappropriate.

EDIT: To the author who replied: awesome, awesome. Ultimately this is great
stuff to talk about and these types of posts are what HN should really be
about, even though it's turned into startup TMZ.

~~~
jerf
Broad agreement in general, but I would observe there are still use cases for
non-bare-metal languages for mmap. It doesn't conflict with _all_ VMs. Haskell
could use it (it actually has surprising interfacing capabilities on this
front), Google says OCaml can use mmap though AFAIK it has a less useful
threading story, Lua can probably productively use it though again I'm unclear
on the threading. Erlang can't use it out-of-the-box but conceptually it could
be modelled as a port, though again whether you could get a performance win I
don't know. Mono and the JVM could use it, though again, primitive threading
story. Python and Perl have interfaces to it, but you sacrifice so much
performance simply by using them that yeah, it doesn't much matter. But at
least in theory there are VMs that can productively use it.

------
forkqueue
mmap is how the Varnish web cache works: the cache file is mmap()ed in, and
the kernel does the rest.

Given the awesome performance of Varnish, I am surprised more applications
haven't taken this approach.
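The general pattern (one large file mapped shared, with the kernel deciding when to write dirty pages back) can be sketched as follows. This is not Varnish's implementation, only a minimal illustration under POSIX; struct store and its functions are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical file-backed store: the whole cache file is one mapping. */
struct store {
    char *base;
    size_t size;
};

/* Size the backing file and map it shared and writable, so stores into
 * memory reach the file through the page cache. Returns 0 on success. */
int store_open(struct store *s, const char *path, size_t size)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;
    s->base = p;
    s->size = size;
    return 0;
}

/* Copy an object in at off. No write() call: the kernel writes dirty
 * pages back to the file on its own schedule. */
int store_put(struct store *s, size_t off, const void *obj, size_t len)
{
    if (off > s->size || len > s->size - off)
        return -1;                 /* would run past the end of the file */
    memcpy(s->base + off, obj, len);
    return 0;
}

/* Force dirty pages to disk (e.g. at shutdown), then unmap. */
int store_close(struct store *s)
{
    int rc = msync(s->base, s->size, MS_SYNC);
    if (munmap(s->base, s->size) != 0)
        rc = -1;
    return rc;
}
```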

~~~
aaronblohowiak
Varnish is good because the size of a web page fits nicely with the size of
memory pages... if you have very many small objects, you'll have to get a bit
more clever about how you place them.
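One way to be "a bit more clever" with many small objects: pack them densely into the mapped region, but never let an object that fits in a page straddle a page boundary, so reading it can fault in at most one page. The allocator below and its 8-byte alignment are assumptions for illustration, not something described in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical bump allocator handing out offsets inside a mapped region. */
struct packer {
    size_t next;      /* offset of the next free byte */
    size_t capacity;  /* size of the mapped region */
    size_t page;      /* system page size */
};

void packer_init(struct packer *pk, size_t capacity)
{
    pk->next = 0;
    pk->capacity = capacity;
    pk->page = (size_t)sysconf(_SC_PAGESIZE);
}

/* Returns the offset for an object of len bytes, or (size_t)-1 if full. */
size_t packer_place(struct packer *pk, size_t len)
{
    size_t off = (pk->next + 7) & ~(size_t)7;           /* 8-byte alignment */
    size_t page_end = (off / pk->page + 1) * pk->page;
    if (len <= pk->page && off + len > page_end)
        off = page_end;                 /* would straddle: start a new page */
    if (off > pk->capacity || len > pk->capacity - off)
        return (size_t)-1;
    pk->next = off + len;
    return off;
}
```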

------
premchai21
The big problem with mmap seems to be handling I/O errors. The most
transparent uses of mmap are for executables. Failing to page in part of the
program can probably reasonably result in a crash. But what of a server
process handling a pile of different data on unreliable disks? Destroying the
entire process on a single I/O error isn't ideal.

On Windows it seems like the best way to handle these errors is by either
embedding SEH into the surrounding code or adding a vectored exception handler
globally. On Unix, you have to set a SIGBUS signal handler. But then, mmap is
apparently not guaranteed to be async-signal-safe if you want to remap a zero
page over the broken one, and longjmp out of a signal handler is its own pile
of potatoes; both seem to work on various modern Unixoids, but I haven't been
able to find documentation saying that they'll continue to work. And with
longjmp, or on Windows (where you can't remap pages over other pages directly,
that I know of), any surrounding code that accesses the map needs to be
abortable all the way up to a suitable error point rather than just having to
handle bogus values. Much code assumes that a simple memory access will not
cause a _recoverable_ exception that may result in reëntering the code later.
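A minimal sketch of the Unix approach, assuming Linux or a similar system: a SIGBUS handler that siglongjmp()s back to the copy routine that faulted, so callers get an error code instead of a crash. As the comment says, jumping out of a handler for a synchronous SIGBUS works in practice on common systems, so treat it as an assumption rather than a portable guarantee. The sketch also installs the handler process-wide, which is exactly what a library may not be free to do. Function names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Where the handler should jump, or NULL when no guarded copy is running
 * on this thread. volatile keeps the stores ordered around the reads. */
static _Thread_local sigjmp_buf *volatile bus_jmp = NULL;

static void on_sigbus(int sig)
{
    sigjmp_buf *target = bus_jmp;
    if (target != NULL)
        siglongjmp(*target, 1);       /* abandon the faulting copy */
    /* Not inside a guarded copy: restore the default action, which takes
     * effect when the faulting access is retried after we return. */
    signal(sig, SIG_DFL);
}

int install_sigbus_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}

/* Copy len bytes out of a file mapping. Returns 0 on success, or -1 if a
 * page could not be read (I/O error, file truncated under the mapping). */
int guarded_copy(void *dst, const volatile char *src, size_t len)
{
    sigjmp_buf env;
    if (sigsetjmp(env, 1) != 0) {     /* 1: also restore the signal mask */
        bus_jmp = NULL;
        return -1;
    }
    bus_jmp = &env;
    char *d = dst;
    for (size_t i = 0; i < len; i++)
        d[i] = src[i];
    bus_jmp = NULL;
    return 0;
}
```

Truncating a file while it is mapped is an easy way to provoke SIGBUS when trying this out, since pages beyond the new end of file can no longer be read.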

And if you're in a library on Unix, good luck getting permission from the main
process to alter signal handlers. The hook mechanism isn't as rich as that in
Windows, so with the exception of large application framework libraries that
are expected to take over the process anyway, it's an invasive and possibly
irreversible activity.

This is all sad, because I love the idea of mmap. I was tinkering with a C
library for accessing certain kinds of files, and I want to do it with all
mmap, but I'm not sure I can overcome the I/O error problem adequately. (The
blocking and address space problems are not too bad here; they impact
performance and capacity, but not correctness.)

~~~
rbranson
On top of all of that, using mapped I/O effectively means embracing it
directly. Adding an abstraction layer on top that would allow a quick switch
to stdio would negate the benefits. Sucks.

------
phamilton
Another problem with mmap, not mentioned in the post, is that not all file
systems support it. Filesystems like jffs2, which access NAND flash, don't
support mmap.
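When a filesystem can't map a file, mmap() fails (with ENODEV, per the Linux man page), so code that wants to work across filesystems needs a fallback. A minimal sketch, assuming a plain read() fallback is acceptable; load_file is a hypothetical helper, and empty files are treated as an error for brevity.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Load a whole file, preferring mmap() and falling back to read() when the
 * filesystem refuses to map it. *mapped tells the caller whether to
 * munmap() or free() the result. Returns NULL on error or empty file. */
char *load_file(const char *path, size_t *len, int *mapped)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    *len = (size_t)st.st_size;

    char *p = mmap(NULL, *len, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p != MAP_FAILED) {
        close(fd);
        *mapped = 1;
        return p;
    }
    if (errno != ENODEV) {          /* a real failure, not "can't map here" */
        close(fd);
        return NULL;
    }

    /* Fallback: ordinary reads into a heap buffer. */
    char *buf = malloc(*len);
    size_t got = 0;
    while (buf != NULL && got < *len) {
        ssize_t n = read(fd, buf + got, *len - got);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0) {
            free(buf);
            buf = NULL;
            break;
        }
        got += (size_t)n;
    }
    close(fd);
    *mapped = 0;
    return buf;
}
```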

------
rbranson
mmap is great, but it often lures developers into thinking that in-memory and
on-disk data structures can be reasonably unified. It's still spinning media,
and even SSDs are orders of magnitude slower than regular RAM. Optimized data
structures should be used accordingly.

It's also not a hard rule that the OS cache is always better. It's probably
very good at block-level caching, so don't rewrite that, but that's not very
fine-grained. The OS can't collect much more than some basic access statistics
and madvise()s to figure out what to keep resident. It's kind of dishonest to
make it seem like "advanced databases" like PostgreSQL should abandon their
own caches entirely. In fact, most PostgreSQL tuners suggest that only a
fraction of available RAM should be used as a buffer cache, and that it's
prudent to just let the OS manage most of it. The query planner can even be
advised as to how much effective cache space is available, including the OS
disk cache.
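To show how coarse these hints are: a minimal sketch, assuming Linux madvise() semantics, that maps what an application knows about a region onto the few advice values the kernel accepts. The enum and function are hypothetical. The kernel only ever sees region-level hints like these, which is the limitation the comment describes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical access-pattern labels an application might track. */
enum access_pattern { ACCESS_SEQUENTIAL, ACCESS_RANDOM, ACCESS_DONE };

/* Translate the application's knowledge into a kernel hint.
 * addr must be page-aligned. Returns 0 on success, -1 on error. */
int hint_region(void *addr, size_t len, enum access_pattern pat)
{
    switch (pat) {
    case ACCESS_SEQUENTIAL:
        return madvise(addr, len, MADV_SEQUENTIAL); /* read ahead aggressively */
    case ACCESS_RANDOM:
        return madvise(addr, len, MADV_RANDOM);     /* little or no read-ahead */
    case ACCESS_DONE:
        /* Drop the pages from this mapping. For private anonymous memory
         * the contents are discarded and later reads see zeros. */
        return madvise(addr, len, MADV_DONTNEED);
    }
    return -1;
}
```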

------
nicolas314
Another common use of mmap is allocating blocks of memory, e.g. by mapping
/dev/zero in private mode. A couple of convenience functions to use mmap can
be found here: <https://github.com/ndevilla/mmapi>
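A minimal sketch of both variants, assuming a POSIX system with /dev/zero and the MAP_ANONYMOUS flag supported by Linux and the BSDs. These are hypothetical helpers, not the API of the linked mmapi library. Memory from either one is released with munmap(), not free().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Zero-filled memory from a private mapping of /dev/zero: writes are
 * copy-on-write into pages private to this process. */
void *alloc_zero_dev(size_t len)
{
    int fd = open("/dev/zero", O_RDWR);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping outlives the descriptor */
    return p == MAP_FAILED ? NULL : p;
}

/* The same result without opening a file. */
void *alloc_zero_anon(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```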

------
danssig
I don't get the point of things like this. If you remove every single blocking
part in your code then you will always use up your scheduling quantum and
become the lowest priority process (the mother of all blocking) anyway.

~~~
asomiv
If database throughput is that important to you then you will put it on an
idle machine that has no other services running, or assign a dedicated CPU
core to it.
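On Linux, the "dedicated CPU core" part can be sketched with sched_setaffinity(). This is a minimal, Linux-specific sketch with hypothetical helper names; it only restricts where the calling thread may run, and keeping other processes off that core needs separate configuration (for example the isolcpus boot parameter or cpusets).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Restrict the calling thread (pid 0) to a single CPU. Returns 0 on success. */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof set, &set);
}

/* Lowest-numbered CPU the calling thread is currently allowed to use,
 * or -1 on error. */
int first_allowed_cpu(void)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}
```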

------
kunley
Referring to comments below the original article:

So when epoll doesn't cope with regular files as the author expected, why not
use a better interface: kqueue?

~~~
bnoordhuis
Linux doesn't have kqueue. But kqueue is only an API; you could emulate it
with io_submit() and io_getevents().

A bigger obstacle is that not all file systems support asynchronous I/O, and
the io_*() syscalls won't help you there.
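For reference, this is roughly what the io_submit()/io_getevents() path looks like. A minimal, Linux-only sketch that calls the raw syscalls (glibc has no wrappers for them; libaio provides some) and waits for a single read; native_aio_pread is a hypothetical helper and does not emulate kqueue. It also runs into the caveat above: for buffered, non-O_DIRECT files, many filesystems complete the read inside io_submit() itself, so that call can still block.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Thin wrappers over the raw syscalls. */
static long sys_io_setup(unsigned nr, aio_context_t *ctx)
{ return syscall(SYS_io_setup, nr, ctx); }
static long sys_io_destroy(aio_context_t ctx)
{ return syscall(SYS_io_destroy, ctx); }
static long sys_io_submit(aio_context_t ctx, long n, struct iocb **iocbs)
{ return syscall(SYS_io_submit, ctx, n, iocbs); }
static long sys_io_getevents(aio_context_t ctx, long min, long max,
                             struct io_event *ev, struct timespec *ts)
{ return syscall(SYS_io_getevents, ctx, min, max, ev, ts); }

/* Read len bytes at offset with Linux native AIO and wait for completion.
 * Returns the number of bytes read, or -1 on error. */
long native_aio_pread(int fd, void *buf, size_t len, long long offset)
{
    aio_context_t ctx = 0;                 /* must be zero before io_setup */
    if (sys_io_setup(1, &ctx) < 0)
        return -1;

    struct iocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_lio_opcode = IOCB_CMD_PREAD;
    cb.aio_fildes = (uint32_t)fd;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = offset;

    struct iocb *list[1] = { &cb };
    long result = -1;
    if (sys_io_submit(ctx, 1, list) == 1) {
        struct io_event ev;
        /* An event loop would reap with a zero timeout instead of NULL. */
        if (sys_io_getevents(ctx, 1, 1, &ev, NULL) == 1)
            result = (long)ev.res;         /* bytes read, or -errno */
    }
    sys_io_destroy(ctx);
    return result < 0 ? -1 : result;
}
```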

