Uvm: a BSD virtual memory system (2016) (pr4tt.com)
60 points by fanf2 on Sept 5, 2018 | 14 comments

The author writes: Page loanout is when a process loans its memory to another process. This is useful particularly in networking, in which data can be sent to the kernel’s network stack simply by loaning the appropriate pages. This avoids the need for costly copy operations.

Does NetBSD/OpenBSD actually have a zero-copy sosend()?

I wrote one once for FreeBSD back in the 90s for some research work I was doing. FreeBSD, even at that time, had the primitives needed for a userspace application to loan pages to the kernel for transmit. However, unless the application knew it was using a zero-copy socket, it was a bit of a mess in practice: most applications would not benefit, because they took COW page faults by re-writing to memory that was loaned to the kernel (and was marked read-only). The other big problem was handling the mapping changes (e.g., marking memory read-only, and then restoring RW access). The whole thing was crying out for a better interface. I probably should have required applications to use aio_write(), or something similar. That would have removed a lot of the overhead.

So, I just looked at their code. They do actually loan from sosend(), but they seem to have the same issues that I did, and disable loaning when on a multi-core system:

commit 386144702d14ee9cc5303a073383cf4e607a1ffe
Author: ad <ad@netbsd.org>
Date: Wed May 28 21:01:42 2008 +0000

    Disable zero copy if MULTIPROCESSOR, until it is fixed:

    - The TLB coherency overhead on MP systems is really expensive.
    - It triggers a race in the VM system (grep kpause uvm/).

Not that I know of.

Isn't sendfile() the usual API to trigger zero-copy send?

It isn't available in NetBSD yet.

Yes and no. sendfile() is for files. Sometimes you want to send normal anonymous memory.
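To make the distinction concrete, here is a minimal Python sketch (not NetBSD code). It assumes a Unix-like host and uses a local `socketpair()` as a stand-in for a network connection; the helper names `recv_exact`, `send_file_region` and `send_anon_memory` are made up for illustration. The file-backed path goes through `socket.sendfile()`, which uses `os.sendfile()` where the platform supports it and otherwise falls back to ordinary `send()`. The anonymous-memory path has no sendfile-style call to lean on, so the bytes are copied into the socket buffer.

```python
import mmap
import socket
import tempfile


def recv_exact(sock, n):
    """Read exactly n bytes from sock (fewer only if the peer closes early)."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_file_region(sock, payload):
    """File-backed path: write payload to a temp file, then sendfile it."""
    with tempfile.TemporaryFile() as f:
        f.write(payload)
        f.flush()
        f.seek(0)
        # socket.sendfile() uses os.sendfile() when available and
        # falls back to plain send() otherwise.
        return sock.sendfile(f)


def send_anon_memory(sock, payload):
    """Anonymous-memory path: nothing file-like to hand over, so send() copies."""
    buf = mmap.mmap(-1, len(payload))  # anonymous mapping, not backed by a file
    try:
        buf[:] = payload
        sock.sendall(buf)
        return len(payload)
    finally:
        buf.close()
```

Called as `send_file_region(a, data)` and `send_anon_memory(a, data)` on one end of a `socket.socketpair()`, both deliver the same bytes to the other end; only the first gives the kernel a chance to skip the user-to-kernel copy.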

Why do CoW in sosend(2) at all? Just have a system call that requires the caller not to touch the loaned memory until the writes have completed. CoW for memory is way too expensive.

This reminds me that CoW filesystems (think ZFS) and writing through mmap() don't play well. You end up having to use msync(2), which many apps assume is unnecessary, and msync(2) is often terribly slow (ISTR at least one system ended up doing page-at-a-time sync writes!).

Now, what happens if you have a sosend(2) that requires the caller leave the memory alone, and the caller touches it anyways? Undefined behavior. Possible outcomes include: some CRC/hash/MAC will fail to verify, the mod will have come too late and not been included, the mod will have come soon enough to be included.

Yes, as I said above, I should probably have required a new API.

It could just be a new flag.

Also need some way to indicate the send is complete.

Eh, as usual. If the socket is non-blocking, then you'll find out how much did not fit in the send buffers and you can poll for when the socket becomes writable.

EDIT: Oops, no, the memory would not be copied to the send buffers, and since this is for IPC, we don't even need to account for buffering in this path. Also, for IPC, just sharing with the receiver doesn't work: you can't tell when the receiver will be done with the memory. You'd need the receiver to munmap() the memory when done, else you'll never know when it's done. Though presumably even sosend(2) as-is requires a munmap() on the receive side... but the docs I can find don't mention it, e.g., https://www.freebsd.org/cgi/man.cgi?query=sosend&sektion=9&m...
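For reference, the ordinary copying pattern (a short write on a non-blocking socket, then polling for writability) looks roughly like this in Python. This is only a sketch of the conventional path, not of page loaning: it assumes a Unix-like host and a local `socketpair()`, and the helper names `send_nonblocking`, `wait_writable` and `drain` are made up for illustration.

```python
import select
import socket


def send_nonblocking(sock, data):
    """Push as much of data as the send buffer accepts; return bytes taken."""
    view = memoryview(data)
    sent = 0
    while sent < len(view):
        try:
            sent += sock.send(view[sent:])
        except BlockingIOError:
            break  # send buffer is full; the rest did not fit
    return sent


def wait_writable(sock, timeout):
    """Return True if the socket becomes writable within timeout seconds."""
    _, writable, _ = select.select([], [sock], [], timeout)
    return bool(writable)


def drain(sock, n):
    """Receiver side: read and discard up to n bytes, return the count read."""
    got = 0
    while got < n:
        chunk = sock.recv(min(65536, n - got))
        if not chunk:
            break
        got += len(chunk)
    return got
```

Here "writable" only means the kernel has room to copy more bytes in. It says nothing about when the peer is done with the data, which is exactly the signal a loaning interface would still need.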

What happens in the case of TCP when the remote end hasn't acked the data yet?

I'm assuming this is an IPC case. If not then sosend(9) (ah yes, it's not a system call) would have to devolve into a proper sendmsg(). The memory might still be held onto by the kernel until sent and ack'ed.

The mechanism for completion notification (when the data is ack'ed in the non-IPC case, or when the data is unmapped in the IPC case) would be kqueue- or epoll- or whatever-based.

In any case, I'm skeptical of CoW for this. The problem is that on today's CPUs any manipulation of memory mappings is just expensive. I'm even more skeptical of loaning w/o CoW -- its semantics rub me very much the wrong way (see ZFS experience with writing through mmaps). So I'm inclined to suspect this path is just not worthwhile for sockets. (sendfile(2) is different because there is no loaning there as the blocks to [read, if not already read, and then] send are in the same address space already.) I'll be very happy to be very wrong about this because indeed, it feels wrong that copying should be faster than loaning, and CoW feels so right.

I should have looked at the source before posting: NetBSD can loan pages to the kernel within sosend().

UVM used to be really cool, when there was only one core and one memory map.

Then those busybody CPU manufacturers gave us lots of cores, and each core its own TLB, and then we had to trash everybody's TLB whenever somebody re-mapped something. That made working by flapping page mappings slow everything else down, and UVM became a specialist technique for embedded systems small enough to have just one core, but big enough to have mapped memory.

Meanwhile, memory systems have got better and better at copying -- mostly just by adding bandwidth, but also by having lots of registers to slurp bytes into and spew out elsewhere -- to the point that gymnastics to avoid copying are often slower than just copying.
