
Mmap(): Now you're coding with portals - luu
http://www.willemthiart.com/2011/09/mmap-keeping-code-elegant.html
======
eridius
Mike Ash covered the idea of using virtual memory tricks to handle ring
buffers 3 years ago, written up at https://mikeash.com/pyblog/friday-
qa-2012-02-03-ring-buffers-and-mirrored-memory-part-i.html.

Unlike the OP's article, Mike Ash's version uses vm_remap() to remap memory
around instead of hitting the filesystem and relying on tempfs to keep the
data in-memory. vm_remap() is an OS X API, and I don't know offhand if there
is any equivalent on Linux (though I would be surprised if there isn't some
way to do the same thing).

~~~
adrusi
mmap doesn't necessarily have to hit the filesystem, as there are several ways
to create file descriptors that are backed by memory (POSIX shared memory,
typed memory objects, for instance).

~~~
dividuum
Is this the general idea?

    
    
        int fd = shm_open("test", O_RDWR|O_CREAT, 0600);
        ftruncate(fd, 4096*2);
    
        char *part1 = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
        char *part2 = mmap(part1 + 4096, 4096, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, fd, 0);
    
        part1[0] = 'X';
    
        assert(part2 - part1 == 4096);
        assert(part1[0] == part2[0]);
    
        shm_unlink("test");

~~~
agwa
I've programmed with POSIX shared memory, and yes, that's the general idea.

A very important caveat is that the POSIX shared memory namespace is shared
among all processes, so you need to wrap shm_open with a mkstemp()-style
algorithm that generates a random name, opens with O_EXCL, and tries again if
it fails. Unfortunately it's very easy to mess that up and introduce a
security vulnerability.

~~~
hawski
It would also be preferable to unlink just after the /open/ call.

Wouldn't mkstemp("/dev/shm/tmp-XXXXXX") be enough? As strace shows, /shm_open/
is just a wrapper for /open/ with a /dev/shm/ prefix.

~~~
agwa
On Linux, shm_open is implemented with /dev/shm, but that's not the case on
other platforms.

------
Animats
At first I thought this was for interprocess communication. But it's not. It
doesn't even have the locking for multithread use. This is a micro-
optimization for a single-thread, single process program.

The advantage gained with all this memory mapping is that you get to avoid an
extra copy coming out of the buffer, because the "poll" function returns a
pointer into the buffer, not the data itself. Avoiding that copy creates a
potential race condition. When "poll" is called, and returns a pointer into
the buffer, it advances "head", indicating the data has been consumed. That
space is now both available for writing and being used by the caller as if
immutable. The code that calls "poll" must be done with the data before anyone
calls "offer". You've now created an undocumented constraint on the callers to
"poll" and "offer". If someone doesn't know that constraint and modifies the
code, it will randomly break.

Is this micro-optimization really worth it? Modern CPUs are good at copying
recently touched data.

~~~
yew
This strikes me as being interesting primarily for the potential impact it has
on interface design. There are applications of (this sort of) virtual memory
manipulation beyond circular buffers. Performance is one consideration, but
not the only one.

Also, this is a demonstration - of course it lacks synchronization mechanisms
for multithreading! Any particular application of the principle would be
adapted for the context in which it occurred (and hopefully be justified
thereby).

(As an aside, that caveat only applies if you begin with a _poll_ function
that performs the copy itself. That implementation isn't the only obvious one,
especially given a large buffer - though I suppose there's room for
disagreement on that.)

------
jwatte
I agree: Magic ring buffers are cool! (We used them in BeOS in 1999!)

Separately, watch the number of mmap segments. The Linux kernel uses a tree to
manage them, and the O(log N) operations really start to hurt at larger
numbers.

------
yifanlu
Does this play nice with caches? I know some systems like ARM allow aliases in
the MMU which will ensure cache coherency, but it is system dependent and a
lazy implementation would just disable caching and slow down the code.

~~~
Tuna-Fish
On physically addressed cache architectures, like all x86 implementations,
this has no ill effects. With virtually addressed caches like on ARM, this is
generally a bad idea.

------
danbruc
It is a neat idea, indeed, but from a design perspective it is pretty bad. A
general purpose implementation of a collection should always copy read results
to a separate buffer. Otherwise a malicious client could use the pointer to
modify the content of the collection or at least, if the memory is read-only,
read the content directly potentially bypassing necessary checks.

Further, without external synchronization, writers may at any point overwrite
data not yet completely processed by a reader. This could be solved by first
just peeking at the data and, only after the data has been completely
processed, performing a read to indicate that it can now be overwritten. But
this obscures the semantics of the operations and breaks multiple-reader
scenarios, because all readers will see the same data until the first reader
has finished processing it.

There may be, and probably is, a use for this trick, but 99% of the time you
should probably not consider doing something like that.

~~~
pjc50
_malicious client_

Inside the same process? Is this really a risk one can sensibly defend
against? A malicious client can take your copy of the data, scan the entire
memory space of the process for the other copy, and overwrite that.

~~~
danbruc
Yeah, you are probably right. I mostly used managed languages for the last ten
years or so and slowly start forgetting what unrestricted access to the entire
address space even means. In a managed context not everything is lost if you
have malicious code in your process, but then again it would probably be quite
hard to make use of manual address space mappings there. So I retract my
position to this: not handing out pointers into your private data merely makes
life a bit harder for malicious code.

------
rdtsc
Very cool stuff. I like the trick.

Mmapped areas can be tricky if you cast them directly to a struct; depending
on compiler optimizations you might have to make some things "volatile". I
remember hitting a bug along those lines.

You'll also get SIGBUS errors on Linux. I was kind of surprised the first time
I hit those as well.

~~~
asveikau
mmap seems very awesome when you first get to know it. You enter one of those
"I just found a new programming technique" phases where you naively want to do
all your I/O that way because you have just seen the light.

Then hopefully you start to understand the SIGBUS problem. I/O failure becomes
indistinguishable from a bad pointer dereference. Oh wait, maybe I/O and
memory really should be separate...

At least that's how I felt about it. From what I see many people do not reach
that last phase.

~~~
mtanski
With great power comes great responsibility. mmap is one of those tools.

Keep in mind that your whole linux system essentially mmaps your binaries /
shared libraries when you run an application. And with caveats our world still
keeps going around.

Error handling with mmap is a PITA, but there are a few ways you can work
around the common cases:

Use the mapped region for reading data and use write() for writing it. That's
what LMDB does. It bets on errors occurring in the write path.

If you're doing I/O in a tight loop you can catch the SIGBUS sent to your
thread (SIGBUS/SIGSEGV are always delivered to the thread that caused them).
You can deal with the fault via sigsetjmp/siglongjmp. This has all sorts of
fun drawbacks (like if you're using C++ RAII after sigsetjmp).

~~~
asveikau
> Keep in mind that your whole linux system essentially mmaps your binaries /
> shared libraries when you run an application. And with caveats our world
> still keeps going around.

Yes, and it does very admirable things there, brilliant things I would say. If
you aren't going to touch the whole thing it doesn't have to load it from
disk. If there is memory pressure it can just evict the in-memory copies of
pages. All great stuff for that usage.

That said I have seen it cause issues. Most commonly I'd see it on Windows
(it's not called mmap there, but whatever, same issue) if you run an app from
a network share. Suddenly network timeouts make the whole app blow up. Not
cool. There is actually a flag in the EXE file format that says "if you run
this from a network, copy the contents to the pagefile first" - meant for
exactly this scenario.

------
joosters
A nice trick!

An alternative API, if you are using a circular buffer that is just being
read() into or write() out of, is to make the I/O parts of your code use
readv() and writev() instead. The circular buffer call then returns either one
or two memory ranges depending on whether the range crosses the end of the
buffer or not. Then you achieve the same thing as the mmap trick: full reads
and writes with one syscall.

------
amelius
Why not just use the % operator to make the memory wrap around? It seems so
much simpler and less prone to errors. Ok, you'll need an extra ALU operation,
but these are cheap nowadays, especially if % is implemented by bit-masking.

Also, mmap may confuse the compiler, and to counter that you will have to add
"volatile" everywhere, which very likely implies a performance hit anyway.

