This was interesting. It reminded me how weird fork() is, and I found an explanation for that weirdness that loops back to this conversation about nommu:
"Originally, fork() didn't do copy on write. Since this made fork() expensive, and fork() was often used to spawn new processes (so often was immediately followed by exec()), an optimized version of fork() appeared: vfork() which shared the memory between parent and child. In those implementations of vfork() the parent would be suspended until the child exec()'ed or _exit()'ed, thus relinquishing the parent's memory. Later, fork() was optimized to do copy on write, making copies of memory pages only when they started differing between parent and child. vfork() later saw renewed interest in ports to !MMU systems (e.g: if you have an ADSL router, it probably runs Linux on a !MMU MIPS CPU), which couldn't do the COW optimization, and moreover could not support fork()'ed processes efficiently.
Another source of inefficiency in fork() is that it initially duplicates the address space (and page tables) of the parent, which may make running short programs from huge programs relatively slow, or may make the OS deny a fork() thinking there may not be enough memory for it (to work around this, you could increase your swap space or change your OS's memory overcommit settings). As an anecdote, Java 7 uses vfork()/posix_spawn() to avoid these problems.
On the other hand, fork() makes creating several instances of a same process very efficient: e.g: a web server may have several identical processes serving different clients. Other platforms favour threads, because the cost of spawning a different process is much bigger than the cost of duplicating the current process, which can be just a little bigger than that of spawning a new thread. Which is unfortunate, since shared-everything threads are a magnet for errors."
https://stackoverflow.com/questions/8292217/why-fork-works-t...
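
For anyone who hasn't seen the two approaches side by side, here is a rough sketch in C of the "spawn a new program" case the quote talks about: once with the classic fork()+exec() pair, once with posix_spawnp(). The function names (run_with_fork, run_with_spawn, wait_for_exit) are mine, made up for illustration, and this is only a minimal sketch for a POSIX system, not how any particular runtime does it.

```c
#include <spawn.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: wait for pid and return its exit status,
 * or -1 if waiting failed or the child did not exit normally. */
static int wait_for_exit(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Classic pattern: duplicate the calling process, then immediately
 * replace the copy with a new program. The duplication is the part
 * that gets expensive for large parents. */
int run_with_fork(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;              /* fork itself was refused */
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);             /* only reached if exec failed */
    }
    return wait_for_exit(pid);
}

/* posix_spawnp: ask the system to start the new program directly,
 * leaving it to the implementation to avoid a full fork(). */
int run_with_spawn(char *const argv[])
{
    pid_t pid;
    int err = posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ);
    if (err != 0)
        return -1;              /* err is an errno-style code */
    return wait_for_exit(pid);
}
```

Both functions take a NULL-terminated argv, e.g. {"sh", "-c", "exit 3", NULL}, and return the child's exit status. The caller can't tell the two apart from the result; the difference is only in how much of the parent has to be duplicated before the exec happens.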