
A future for fork(2) - lizmat
http://brrt-to-the-future.blogspot.com/2018/10/a-future-for-fork2.html
======
MisterTea
Interestingly enough, plan 9 did away with threads and replaced fork() with
rfork() [http://man.9front.org/2/fork](http://man.9front.org/2/fork).

Instead of having to deal with the duality of threads and processes, they
opted to implement threading using lightweight processes that share the
parent's data and bss segments. This is only turned on if you pass the RFMEM
flag to rfork(); otherwise it behaves the same as Unix fork and gives the child
a copy of those segments.

This simplifies the system design as you only deal with processes which is
arguably a more correct and simplified approach (I mean the OS only has 30
something syscalls). The api is implemented via a CSP library simply called
thread. You replace main() with threadmain() and the thread api gives you all
the concurrency tools. Or you can write your own threading library if you so
choose.

If you are familiar with Go's CSP concurrency then you can thank plan 9's
thread as Rob Pike, Ken Thompson and Russ Cox came from Bell Labs and built
plan 9. And those ideas were born in Alef and Newsqueak.

~~~
khuey
fork(2) is largely superseded by clone(2) which works similarly to rfork. The
CLONE_VM flag is equivalent to rfork's RFMEM.
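
A minimal sketch of that difference, assuming Linux with glibc's clone() wrapper. The names `run_clone`, `child_fn` and `shared_counter` are made up for illustration. With CLONE_VM the child's store is visible to the parent; without it the child writes to its own copy, as with fork:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024)

/* Hypothetical variable used only to observe whether memory is shared. */
static volatile int shared_counter = 0;

/* Runs in the child. With CLONE_VM this store lands in the parent's memory. */
static int child_fn(void *arg)
{
    (void)arg;
    shared_counter = 42;
    return 0;
}

/* Returns the parent's view of shared_counter after the child has exited,
 * or -1 on error. Pass CLONE_VM to share the address space, or 0 to get
 * fork-like copy semantics. */
int run_clone(int extra_flags)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    shared_counter = 0;

    /* The stack grows downward on most architectures, so pass its top.
     * SIGCHLD makes the child waitable with plain waitpid(). */
    pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE,
                      extra_flags | SIGCHLD, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status;
    int ok = waitpid(pid, &status, 0) == pid;
    free(stack);
    return ok ? shared_counter : -1;
}
```

Under these assumptions, `run_clone(CLONE_VM)` should return 42 and `run_clone(0)` should return 0.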

~~~
jadbox
Weird, I've never seen clone(2) used before

~~~
caf
It's used in the implementation of pthread_create().

~~~
dullgiulio
Also, fork(3) (the libc wrapper in glibc) often just calls clone(2)
internally:
[https://stackoverflow.com/questions/18904292/is-it-true-that...](https://stackoverflow.com/questions/18904292/is-it-true-that-fork-calls-clone-internally)

------
hyperman1
Has anyone an example for wanting to fork? Most use cases seem covered by
either threads or fork+exec process spawning.

The example given by the author:

    
    
      make it practical to introduce concurrency after development, rather than designing it in from the start
    

seems risky to me: even if you're safe from other threads, there is a lot of
other unknown state, like fds, that is still inherited. And concurrency can be
introduced after development, e.g. by using message passing. So going fully
threaded or re-execing yourself both seem safer to me, with no real
downsides.

~~~
nemanjaboric
One of the available garbage collector implementations in D (version 1) (and
there's work in progress to port it to D2) is a concurrent garbage collector
which performs a mark phase in a forked process going over the address space.

Also I've seen a project which persists large amounts of data to disk by
forking and using a consistent snapshot of the data structures (which the new
process will not mutate, so there's no need for locking).
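
The snapshot idea can be sketched roughly as follows. This is not code from either project; `snapshot_sum` and `items` are hypothetical names. The child is held back until the parent has overwritten the data, yet it still sums the values as they were at the time of the fork:

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define N_ITEMS 4

/* Hypothetical stand-in for the large in-memory data structure. */
long items[N_ITEMS];

/* Forks, mutates the data in the parent, then returns the sum that the
 * child computed from its snapshot. Returns -1 on error. */
long snapshot_sum(void)
{
    int go[2], result[2];
    long sum = -1;

    for (size_t i = 0; i < N_ITEMS; i++)
        items[i] = (long)i + 1;              /* 1 + 2 + 3 + 4 = 10 */

    if (pipe(go) == -1)
        return -1;
    if (pipe(result) == -1) {
        close(go[0]);
        close(go[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: wait until the parent says it has finished mutating. */
        char c;
        long child_sum = 0;
        close(go[1]);
        close(result[0]);
        if (read(go[0], &c, 1) != 1)
            _exit(1);
        for (size_t i = 0; i < N_ITEMS; i++)
            child_sum += items[i];           /* reads the pre-fork values */
        if (write(result[1], &child_sum, sizeof child_sum)
                != (ssize_t)sizeof child_sum)
            _exit(1);
        _exit(0);
    }

    /* Parent (or fork failure): close the ends only the child uses. */
    close(go[0]);
    close(result[1]);
    if (pid > 0) {
        for (size_t i = 0; i < N_ITEMS; i++)
            items[i] = 100;                  /* invisible to the child */
        if (write(go[1], "x", 1) == 1 &&
            read(result[0], &sum, sizeof sum) != (ssize_t)sizeof sum)
            sum = -1;
    }
    close(go[1]);
    close(result[0]);
    if (pid > 0)
        waitpid(pid, NULL, 0);
    return sum;
}
```

The child needs no locking because copy-on-write gives it a private view; the price is that every page the parent touches while the child is alive gets duplicated.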

~~~
scottlamb
> One of the available garbage collector implementations in D (version 1) (and
> there's work in progress to port it to D2) is a concurrent garbage
> collector which performs a mark phase in a forked process going over the
> address space.

I'm pretty skeptical of that approach. If there's heavy churn, the copy-on-
write behavior means that the mark phase _increases_ memory use, when
presumably you're doing GC because you need to _decrease_ memory use. And I
assume there's some cost to these VM manipulations in general. Is this
implementation competitive with other D garbage collectors in terms of memory
usage, CPU time, latency, etc.? Are D's garbage collectors in general
competitive with those of Go and Java?

------
pjc50
This probably needs to be fixed in POSIX with the gradual deprecation of
fork(2) over decades. If we start now we might be done by 2038. Maybe start by
deprecating fork-after-multithreading, since that's extremely hard to get
right in the first place.

fork/exec was a convenient hack for small single-threaded programs on the
PDP-11. Its semantics interact badly with all kinds of subsequent features,
and it interferes with porting to other operating systems.

~~~
noselasd
I don't think we'll see the fork/exec semantics going away any time soon -
inheriting privileges/file-descriptors etc. in this way is quite core to all
Unix-like OSs.

posix does provide posix_spawn() though, which an OS could implement in a more
efficient way than fork()+exec().
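
For reference, a minimal sketch of spawning a child with posix_spawn() instead of fork()+exec(). The helper name `spawn_shell` is made up, and it assumes a shell exists at /bin/sh:

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Runs `sh -c command` via posix_spawn() and returns its exit status,
 * or -1 if spawning or waiting failed or the child died from a signal. */
int spawn_shell(const char *command)
{
    pid_t pid;
    char *argv[] = {"sh", "-c", (char *)command, NULL};

    /* NULL file actions and attributes: the child inherits the parent's
     * open descriptors and signal settings, much as it would after
     * fork()+exec(). posix_spawn() returns an error number, not -1. */
    int err = posix_spawn(&pid, "/bin/sh", NULL, NULL, argv, environ);
    if (err != 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `spawn_shell("exit 7")` should return 7. Descriptor redirection, which fork()+exec() code usually does by hand between the two calls, would go through the `posix_spawn_file_actions_*` functions instead.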

~~~
cryptonector
It is usually implemented as vfork()+exec/exit.

Some OSes implement posix_spawn() as a system call. IIRC NetBSD has had that,
and Solaris 12 has it too.

~~~
trasz
OSX as well.

------
ComputerGuru
fork is a nightmare. On paper, fork/exec sounds better than Microsoft’s
CreateProcess. In the real world, I’ll take the latter plus threads any day.

Fork is also stupid. 99.9% of the time (if not more), it’ll be followed by
exec/execve. Why go through all the pain of either copying everything or taking
the performance and complexity hit of setting up COW memory space if it’ll all
be thrown away in a few microseconds so a different process can be launched?

Then it leaves as “a trivial implementation detail” to the developer the
process of setting up the pgrp, inheriting or negotiating terminal access, and
dealing with the race condition (impossible to avoid, except by using vfork
instead and accepting its limitations) of determining whether the child set its
own pgrp before the parent did, so that tcsetpgrp can be called before the
child can call exec and the child doesn’t end up with SIGTTIN or SIGTTOU when
it tries to use the terminal after exec.

No thanks.

------
cryptonector
fork(2) is evil.

I used to think fork(2)+exec(2) was genius and much more versatile than the
WIN32 CreateProcess*() functions. The last part is still true, but I no longer
think fork(2) is genius. fork(2) was genius back in the 70s when it was easier
to do all the spawning work in user-land, but it's not actually a good thing,
not now.

The only sane way to use fork(2) is to use it very early on in main() to start
worker child processes. Or to immediately exit(2) or exec(2) on one or the
other (or both) side(s) of fork(2) without calling any async-signal-UNsafe
code on the child side. If you're going to exec() on the child side, then just
use vfork(2) -- it's much easier and safer. Even better, if you're going to
exec() on the child side then just use posix_spawn(3) and be done -- let the C
library / OS figure out how best to spawn the child process. (On some systems
posix_spawn() is a system call. On others it's implemented in terms of
vfork(2) or clone(2) called in vfork(2)-like manner.)

The irony is that vfork(2) has the worse reputation, but it really is fork(2)
that is evil. fork(2) is the root of much evil in Unix-land:

    
    
      - there are fork-safety considerations galore
      - so you can only call async-signal-safe functions
        on the child-side of the fork (unless you're
        quite certain of the parent's state at the time
        of the fork, which is why it's OK for the child
        to continue if the fork was done early in main())
      
      - it's EXPENSIVE
         - copying the RSS, or all the writable data, or
           arranging for CoW -- all of these are expensive
           nowadays
         - it complicates accounting of swap / VM space
        (JNI code that calls fork(2) in a JVM with a 32GB
        heap... really slows you down.)
      
      - it has terrible semantics as to signals
      
      - every time something is added to what it means to
        be a process... one has to think about whether that
        should be inherited across fork(2) or not
    

fork(2) is evil. Do not use. Especially do not expose fork(2) in other
languages or their system libraries.

------
xenadu02
I suppose it might be possible to fix fork() but the performance costs might
be prohibitive.

The kernel would need to suspend any thread that attempts to take a lock or is
not currently holding a lock. That immediately makes userland-only
synchronization besides lockless impossible; the kernel has to know about the
synchronization primitives.

Once all threads besides the forking thread are blocked trying to acquire a
lock or suspended without holding any locks the process can fork and all
threads resume.

A far better payoff might be making processes cheaper to create, faster to
startup, and providing easier-to-use IPC mechanisms. Those things benefit lots
of programs, not just esoteric uses of fork().

~~~
comex
I don’t think that’s true. There’s nothing inherently bad about forking a
thread while it’s holding a lock. The problem, rather, is with _not_ forking a
thread while it’s holding a lock. That is – from the child process’s
perspective, it’s as if all the threads other than the one that called fork()
were abruptly terminated. If any of those threads happened to be holding a
lock, then it will never have the chance to finish whatever work it was doing
and release the lock… because it no longer exists. On the other hand, if
fork() cloned all threads in the process, then they could keep
synchronizing and working as usual, oblivious to the fact that they’re now in
a new process.

Well, mostly. One significant exception: Synchronization mechanisms sometimes
identify threads internally using a system-global ID. For example, some
variants of the Linux futex() syscall require you to write your thread ID, as
returned by gettid(), to the lock word to indicate that the lock is owned by
you. This obviously won’t work if a thread’s ID can change at any time; you
would have to switch to some other ID that’s instanced per-process. Except
that would break mutexes shared between processes using shared memory, so
you’d have to find a different solution for that. Solvable, but not quite
trivial.

By the way, as a sort of existence proof, Linux already has a way to do
something like multithreaded fork()! Namely, CRIU (“checkpoint restore in
userland”), used with containers, is a program that can dump a process (or set
of processes) to disk and restore them later – including multithreaded
processes. This is normally used for things like migrating containers across
physical machines, but AFAIK, in theory you could restore the same dump
multiple times to get multiple copies of the same process; it would just be
slow, and massively overkill. :) The catch: it relies on PID namespaces, such
that all processes and threads get the same PIDs and TIDs after being restored
as they had before – but they’re only unique within the PID namespace, i.e.
container. Thus it doesn’t have to deal with the possibility of TIDs changing,
but the process has to be isolated from others. For a container, isolation is
what you want anyway, but it’s probably _not_ what you want if you’re just a
random Unix program trying to fork.
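
The first point above, that the child sees every other thread as abruptly terminated, can be sketched like this. The names `holder` and `child_sees_lock_held` are hypothetical. A helper thread holds a mutex across the fork; in the child that thread no longer exists, but the mutex was copied in its locked state:

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t lock_held, may_release;

/* Helper thread: takes the lock and holds it until told to let go. */
static void *holder(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    sem_post(&lock_held);
    sem_wait(&may_release);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Returns 1 if the forked child found the mutex still locked, 0 if it
 * could take it, -1 on error. */
int child_sees_lock_held(void)
{
    pthread_t tid;
    int status;

    if (sem_init(&lock_held, 0, 0) == -1 ||
        sem_init(&may_release, 0, 0) == -1)
        return -1;
    if (pthread_create(&tid, NULL, holder, NULL) != 0)
        return -1;
    sem_wait(&lock_held);            /* the helper now owns the lock */

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: only the forking thread exists here, so nobody will
         * ever unlock this copy. A blocking lock would hang forever;
         * trylock is used so the sketch can report the state instead. */
        _exit(pthread_mutex_trylock(&lock) == EBUSY ? 1 : 0);
    }

    /* Parent: let the helper finish normally. */
    sem_post(&may_release);
    pthread_join(tid, NULL);
    sem_destroy(&lock_held);
    sem_destroy(&may_release);

    if (pid == -1 || waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In a real program the lock in question is often one the caller cannot see, such as the one inside malloc(), which is why only async-signal-safe functions are considered safe in the child. Depending on the toolchain this may need to be built with `-pthread`.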

------
wging
See also rachelbythebay, "Don't mix threads and forks":
[http://rachelbythebay.com/w/2011/06/07/forked/](http://rachelbythebay.com/w/2011/06/07/forked/)

------
caf
Another alternative (if you want to write a multi-threaded runtime that allows
its clients to request a fork) is to implement your own memory allocation
routines that use robust mutexes, along with appropriate cleanup routines for
the EOWNERDEAD case.

This may be easier said than done.

------
k2k9
Hm, if on POSIX systems a thread inherits "The entire virtual address space of
the parent is replicated in the child", that means that a context switch
between threads within one process is not that costly (the TLB stays intact,
for example). Am I right?

~~~
hyperman1
It is surely faster than Windows. But I don't think the TLB stays intact: you
have to redo all the page tables etc., as every memory write will cause a
copy-on-write (if one process writes to memory, it gets a copy of the memory
and the other process never sees the write).

~~~
yuubi
Copy on write is for fork. Threads share memory, so you specifically don't
want to CoW.

~~~
hyperman1
Correct, of course. I somehow read 'forked process' where he said 'thread'.
Sorry.

------
glenrivard
If you look at the new kernel, Zircon, you do not have a fork to create a
process like Linux and other kernels.

Here is a recent article on creating a process.

[https://depletionmode.com/zircon-process.html](https://depletionmode.com/zircon-process.html)

One correction on this article I noticed is that the PCI driver is currently
in the kernel and not user space.

