
Why threads can't fork - akerl_
http://thorstenball.com/blog/2014/10/13/why-threads-cant-fork/
======
Animats
Sun did try multi-thread fork semantics, but that didn't help.

UNIX "fork" started as a hack. The reason UNIX originally used a fork/exec
approach to program launch was to conserve memory on the PDP-11. "fork"
originally worked by swapping the process out to disk. Then, at the moment
there was a good copy in both memory and on disk, the process table entry was
duplicated, with one copy pointing to the swapped-out process image and one
pointing to the in-memory copy. The regular swapping system took it from there.

Then, as machines got bigger, the Berkeley BSD crowd, rather than simply
introducing a "run" primitive, hacked on a number of variants of "fork" to
make program launch more efficient. That's what we mostly have today. Plan 9
supported more variants in a more rational way; you could choose whether to
share or copy code, data, stack, and I think file opens. The Plan 9 paper says
that most of the variants proved useful. But that approach ended with Plan 9.
UCLA Locus just had a "run" primitive; Locus ran on multiple machines without
shared memory, so "fork" wasn't too helpful there.

Threads came long after "fork" in the UNIX world. (Threads originated with
UNIVAC 1108 Exec 8 (called OS 2200 today), first released in 1967, where they
were called "activities"). Exec 8 had "run", and an activity could fork off
another activity, but not process-level fork. Activities were available both
inside and outside the OS kernel, decades before UNIX.

That's why UNIX thread semantics have always been troublesome, especially
where signals are involved. They were added on, not designed in.

~~~
Danieru
> share or copy code, data, stack, and I think file opens

Sounds like Linux's "clone" system call, which is the underlying syscall that
glibc's fork() uses.

You can do just about anything imaginable with it:
[http://linux.die.net/man/2/clone](http://linux.die.net/man/2/clone)

For example: you could create a child process-like-thing which shares nothing
but the signal handler table. No idea what that would be good for.

~~~
caf
Not all combinations are allowed. In this specific case, if you specify
CLONE_SIGHAND then you must also specify CLONE_VM (so the processes share a
virtual memory space, and are essentially threads).
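
A minimal sketch of that restriction, assuming Linux on x86-64 or AArch64
(where the flags are the first argument of the raw clone syscall). The helper
name try_clone_flags is hypothetical. It makes a fork-like raw clone() call and
reports the errno; per the clone(2) man page, CLONE_SIGHAND without CLONE_VM
should be rejected with EINVAL before any child is created.

```c
#include <assert.h>
#include <errno.h>
#include <linux/sched.h>   /* CLONE_VM, CLONE_SIGHAND */
#include <signal.h>        /* SIGCHLD */
#include <sys/syscall.h>   /* SYS_clone */
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: try a fork-like raw clone() with the given flags.
 * Returns 0 if a child was created (and reaped), otherwise the errno value. */
int try_clone_flags(unsigned long flags)
{
    /* A child that shares our address space needs its own stack, which this
     * sketch does not provide, so refuse CLONE_VM outright. */
    assert((flags & CLONE_VM) == 0);

    /* Null stack pointer: the child runs on a copy of our stack, like fork(). */
    long pid = syscall(SYS_clone, flags | SIGCHLD, 0, 0, 0, 0);
    if (pid < 0)
        return errno;
    if (pid == 0)
        _exit(0);                      /* child: leave immediately */
    waitpid((pid_t)pid, NULL, 0);      /* parent: reap the child */
    return 0;
}
```

Calling try_clone_flags(CLONE_SIGHAND) is expected to return EINVAL, while
try_clone_flags(0) behaves like a plain fork().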

~~~
Danieru
Ah good catch, sorry I just skimmed the man page for an interesting sounding
feature.

------
ChuckMcM
In my opinion this is fundamentally a problem of mixing coprocessing
metaphors. The whole thread vs process vs container differences are tied up in
leakage between the permissions model (process) and computation model (threads
and containers). The thread equivalent of 'fork' would be some form of
promotion to a 'root' thread (which is to say one where instance data about
computation limits can be changed independently). Processes, which were the
traditional collection point of resources under an identity, ideally sit apart
from computation constraints. And if you follow that path you realize that
resource allocation (which is one of the three key parts of OS management)
then needs to be both computation aware and identity aware. In the example of
the article, malloc would break the lock such that its validity would be
related to the identity it was associated with, so if you promoted a thread to
the 'identity' level you would invalidate any locks visible to it that were
attached to identity.

There have been discussions about this in OS design for almost forever.

------
ridiculous_fish
The fish shell [http://fishshell.com](http://fishshell.com) is multithreaded
and calls fork, so it can be done. But it is difficult, even if you just call
execve().

An example of a problem we encountered: what if execve() fails? Then we want
to print an error message, based on the value of errno. perror() is the
preferred way to do this. But on Linux, perror() calls into gettext to show a
localized message, and gettext takes a lock, and then you deadlock.

This is a hard problem to solve because it requires knowing the status of all
locks. This breaks the abstractions that library authors present, where locks
are internal and not exposed.

~~~
ben0x539
Isn't it easier to have the parent print the error message by passing yet
another pipe to the child that closes on exec and can be used to pass up
errno?
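
A minimal sketch of that idea in C, under stated assumptions: the function name
spawn_reporting_errno is hypothetical, and the child restricts itself to
async-signal-safe calls (execve, write, _exit). The write end of the pipe is
marked close-on-exec, so the parent reads end-of-file if the exec succeeded and
reads an errno value if it failed.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: fork and exec `path`. Returns 0 and stores the child's
 * pid on success; otherwise returns the errno of the failing call. */
int spawn_reporting_errno(const char *path, char *const argv[], pid_t *pid_out)
{
    assert(path != NULL && argv != NULL && pid_out != NULL);

    int fds[2];
    if (pipe(fds) < 0)
        return errno;
    /* Close-on-exec: a successful execve() closes the write end for us. */
    if (fcntl(fds[1], F_SETFD, FD_CLOEXEC) < 0) {
        int err = errno;
        close(fds[0]);
        close(fds[1]);
        return err;
    }

    pid_t pid = fork();
    if (pid < 0) {
        int err = errno;
        close(fds[0]);
        close(fds[1]);
        return err;
    }
    if (pid == 0) {
        /* Child: no perror(), no malloc(), nothing that might take a lock. */
        close(fds[0]);
        execve(path, argv, environ);
        int err = errno;               /* only reached if execve() failed */
        ssize_t w = write(fds[1], &err, sizeof err);
        (void)w;
        _exit(127);
    }

    /* Parent: EOF means the exec happened; a full int is the child's errno. */
    close(fds[1]);
    int child_errno = 0;
    ssize_t n;
    do {
        n = read(fds[0], &child_errno, sizeof child_errno);
    } while (n < 0 && errno == EINTR);
    close(fds[0]);

    if (n == 0) {
        *pid_out = pid;
        return 0;
    }
    waitpid(pid, NULL, 0);             /* reap the child that failed to exec */
    return n == (ssize_t)sizeof child_errno ? child_errno : EIO;
}
```

The parent can then format the message with strerror() or perror() in its own
context, where taking the gettext lock is safe. In a multithreaded parent there
is still a small window between pipe() and fcntl() in which another thread's
fork could inherit the descriptors; pipe2() with O_CLOEXEC closes that window
where it is available.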

~~~
ridiculous_fish
That's a good idea! I'm not sure if it's easier but it would effectively
sidestep this class of issues.

------
rubiquity
You give people an amazing implementation of M:N concurrency and the people
want to fork. You give people concurrency via forking, and they want M:N
concurrency.

~~~
wahern
Did Go ever fix M:N threading? From what I've read there's significant
performance degradation when you use M:N threading as opposed to 1:N.

If Go figured out how to efficiently detect data dependencies and
relationships, and then automatically move goroutines around, that would be
exceptionally noteworthy. Everybody starts out thinking they can do this,
which is why Solaris, NetBSD, Linux, Java, et al. all started out with M:N
threading. But then when they figure out that it's a Really Hard(tm) problem,
they invariably shift to 1:1.

I've found that it's better to leave it to the developer to choose whether to
run an OS thread or coroutine, just as the developer chooses between a process
and OS thread. So in my project[1] I don't spend much time trying to automate
that.

[1]
[http://25thandclement.com/~william/projects/cqueues.html](http://25thandclement.com/~william/projects/cqueues.html)

~~~
justincormack
Google are working on some kernel help for userspace threads; there was an
article on lwn.net a while ago. Someone told me yesterday he was seeing a lot
of spurious wakeups but hadn't debugged them yet. So I think there is room for
improvement.

------
preillyme
The pthread_atfork() function shall declare fork handlers to be called before
and after fork(), in the context of the thread that called fork(). The prepare
fork handler shall be called before fork() processing commences. The parent
fork handler shall be called after fork() processing completes in the parent
process. The child fork handler shall be called after fork() processing
completes in the child process. If no handling is desired at one or more of
these three points, the corresponding fork handler address(es) may be set to
NULL.

The order of calls to pthread_atfork() is significant. The parent and child
fork handlers shall be called in the order in which they were established by
calls to pthread_atfork(). The prepare fork handlers shall be called in the
opposite order.
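
The ordering rule can be illustrated with a small C sketch. The handler names
and the fork_and_log_handler_order function are hypothetical; the sketch
registers two sets of handlers, forks once, and returns the order in which the
handlers ran in the parent.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static char call_log[128];

/* Append a handler name to the log. */
static void note(const char *name)
{
    assert(strlen(call_log) + strlen(name) + 2 <= sizeof call_log);
    strcat(call_log, name);
    strcat(call_log, " ");
}

static void prepare_a(void) { note("prepare_a"); }
static void parent_a(void)  { note("parent_a"); }
static void child_a(void)   { note("child_a"); }
static void prepare_b(void) { note("prepare_b"); }
static void parent_b(void)  { note("parent_b"); }
static void child_b(void)   { note("child_b"); }

/* Registers set A, then set B, forks, and returns the parent's log.
 * Handlers stay registered for the life of the process, so call this once. */
const char *fork_and_log_handler_order(void)
{
    call_log[0] = '\0';
    if (pthread_atfork(prepare_a, parent_a, child_a) != 0)
        return NULL;
    if (pthread_atfork(prepare_b, parent_b, child_b) != 0)
        return NULL;

    pid_t pid = fork();
    if (pid < 0)
        return NULL;
    if (pid == 0)
        _exit(0);   /* the child handlers ran, but only in the child's copy */
    waitpid(pid, NULL, 0);
    return call_log;
}
```

In the parent the log reads "prepare_b prepare_a parent_a parent_b ": prepare
handlers run in reverse registration order, parent handlers in registration
order. A typical use is to take a module's locks in the prepare handler and
release them in the parent and child handlers.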

I'm not sure if that's the best approach, but it's an attempt at least.

~~~
pjmlp
I just had a look at IEEE Std 1003.1, 2013 Edition for pthread_atfork().

It has a few corner cases across POSIX systems. I wouldn't bet it works 100%
the same way in all UNIXes.

~~~
djcapelis
Of course it doesn't, very few things do. Cross platform is hard, but that
doesn't mean you don't use features.

~~~
pjmlp
Of course, but sometimes it is a huge pain.

I used to do cross platform across Aix, HP-UX, Solaris, GNU/Linux, FreeBSD and
Windows NT/2000 back in the .COM days.

~~~
wahern
It's much easier in 2014 than it was in 2004. POSIX has evolved, and POSIX
conformance has substantially improved. Most systems are, in practice, nearly
100% conformant to POSIX-2001.

Excepting Windows, I rarely run into difficult portability problems except
when I deliberately use non-POSIX functionality or newer POSIX functionality.

I target Linux, OS X, OpenBSD, NetBSD, FreeBSD, Solaris, and AIX. The biggest
laggard was OpenBSD, particularly wrt threading, signal handling, and
real-time extensions. But in the past couple of years that's been substantially
addressed.

One of my biggest headaches now is OS X. They appear to have stopped trying to
track POSIX, so while everybody else is busily implementing POSIX-2004,
POSIX-2008, and tentative POSIX features, OS X is nearly at a stand-still. OS
X hasn't fixed any significant conformance issues, adopted real-time
extensions, nor adopted any POSIX-2008 features for several years, now.

~~~
pjmlp
Thanks for the update. However, it seems there is still a long way to go until
most systems reach UNIX V7 X1201 compliance.

------
aidenn0
This is an issue in Common Lisp as well; you generally have the choice of
either fork or threads. In fact ClozureCL, a popular Lisp implementation,
launches an extra thread at startup (for I/O, IIRC), and for a while someone
maintained a fork of it that did not do so to allow usage of the fork syscall.

~~~
616c
Interesting. I noticed that with RESTAS (a CL web server/API server library)
they make a big deal of daemonization and not requiring the usual hacks to
daemonize a CL server (tmux/screen/dtach/etc).

Was this a common (no pun intended) problem among CL implementations and why
server daemonization is an issue? I am just learning CL, and noticed RESTAS
only really supports this daemonization feature in SBCL, if I believe the
documentation.

I guess I am going to have to dive into the source this weekend and check it
out.

~~~
aidenn0
I use daemontools to monitor my daemons, so I've never tried to daemonize a
lisp daemon. With SBCL you can still fork so long as you do it before you
spawn any threads. I'm guessing that RESTAS uses the sb-posix:fork function to
daemonize, which would explain why it only works on sbcl.

~~~
616c
Neat. Thanks for the explanation. I had installed RESTAS inside of a Clozure
CL image but had not gotten as far as daemonization and forking.

Judging from the one sentence on the landing page, I supposed it would blow
up. Haha.

------
Dylan16807
That's quite a copout with forkall. Presumably you have _some_ idea what your
threads are doing if you want to fork the entire process, so don't do it in
the middle of writing to a file.

~~~
swartkrans
How could you know what your threads are doing at the time you call fork? That
doesn't sound safe. I write such code in a way that threads communicate with
each other when they need to, but if I start assuming what those threads are
doing that seems like it could get really complicated really fast.

~~~
thetrb
Sure it can get complicated, but there are also some safe ways to fork.
Imagine you fork at the very beginning of your program before starting any
threads. No danger in that. So the language could at least allow it (but
provide the proper warnings around it).

~~~
Vendan
The issue arises when the programming language doesn't let you run code before
the threads split out, like golang.

------
robmccoll
In the motivating Go example, implementing a fork doesn't seem like it would
be particularly challenging since the runtime can pause all executing
goroutines and reschedule them in the same place on the other side. Bit easier
than pthreads.

------
lmm
fork() is a dumb interface, and non-portable anyway. I've yet to see a use
case that couldn't be handled with either threads, or spawning another process
- after all, those are the only APIs you get elsewhere.

If you need to use a language with a runtime (not just Go by any means, the
likes of Python also suffer from this issue) from two processes that need to
be separate but communicate with each other, do the fork first, then start the
language runtime (i.e. embed the language in your parent program).

~~~
mdwrigh2
> I've yet to see a use case that couldn't be handled with either threads, or
> spawning another process - after all, those are the only APIs you get
> elsewhere.

Android uses this to pre-load framework resources and code in a way that lets
all applications share the backing memory. And when applications crash, they
don't bring down the initial process that preloaded everything (so it can
continue spawning new apps).

How would you handle that with only threads or spawning another process?

~~~
lmm
It's possible to share memory between processes that weren't originally forks
- consider e.g. X clients using XShm to communicate with the server, or jk for
fast communication between apache and tomcat. I guess forking lets you do
"share everything, COW", which is kind of handy, but it's also a very lazy way
of programming; you get access to the whole address space, so it relies on the
other processes to not reuse data that doesn't make sense when shared. Better
to only share memory that processes explicitly want to share, and make it
clear which one owns any given region of memory.

~~~
mdwrigh2
Shared memory also means that any modifications are also shared, which is
really _not_ good in this case. We could mark the sections RO I suppose, but
then we have them occasionally copying things out of the RO pages to modify
them which just bloats the address space (though doesn't change the number of
backing pages). It's also slightly more brittle because you have to be careful
about marking everything shared RO.

> Better to only share memory that processes explicitly want to share, and
> make it clear which one owns any given region of memory.

We are only sharing memory that we explicitly want to share: we load _only_
what we care about, then fork.

------
djcapelis
Really? Writing a wrapper around fork with a mutex doesn't seem like that huge
of an issue. This combined with pthread_atfork() should provide a way to make
this work, no?

~~~
asveikau
A mutex around fork() buys you nothing. The problem is _other_ locks, which
often are not even part of your code.

The article has perhaps the most likely example. Imagine malloc() has a lock.
Another thread happens to be inside malloc() at the moment that you fork(),
and therefore owns the lock and might be in the middle of manipulating shared
data structures. Now suddenly the child cannot malloc(), because that thread
[suspended in the middle of its execution] isn't going to be carried over to
it, and will never be able to clean up its intermediate state and release the
lock.
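
A minimal C sketch of this scenario, with hypothetical names: lib_lock stands
in for a library-internal lock such as malloc's, and a worker thread holds it
while the main thread forks. The child uses pthread_mutex_trylock() so that the
sketch reports the stuck lock instead of deadlocking on it. Strictly speaking,
only async-signal-safe calls are allowed in the child of a multithreaded
process, so calling trylock there is an assumption that holds on common
implementations rather than a guarantee.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t lib_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int lock_is_held;
static atomic_int may_release;

/* Worker: takes the lock and keeps it until the main thread says otherwise. */
static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lib_lock);
    atomic_store(&lock_is_held, 1);
    while (!atomic_load(&may_release))
        sched_yield();
    pthread_mutex_unlock(&lib_lock);
    return NULL;
}

/* Returns 1 if the forked child found the lock still held, 0 if the child
 * could take it, and -1 on error. */
int child_inherits_held_lock(void)
{
    pthread_t tid;
    atomic_store(&lock_is_held, 0);
    atomic_store(&may_release, 0);
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    while (!atomic_load(&lock_is_held))
        sched_yield();                 /* wait until the worker owns the lock */

    pid_t pid = fork();
    if (pid == 0) {
        /* Only the forking thread exists here. The worker was not carried
         * over, so nobody in this process will ever unlock lib_lock. */
        int rc = pthread_mutex_trylock(&lib_lock);
        _exit(rc == EBUSY ? 1 : 0);
    }

    atomic_store(&may_release, 1);     /* let the parent's worker finish */
    int rc = pthread_join(tid, NULL);
    assert(rc == 0);
    (void)rc;
    if (pid < 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Replacing the trylock with a plain pthread_mutex_lock() in the child would
hang forever, which is the malloc() deadlock described above.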

~~~
djcapelis
It buys you plenty if you prevent other threads from entering until you all
meet in a rendezvous lock.

~~~
asveikau
The worker thread that was kicked off by some random library dependency
(acquiring a lock you didn't know about) isn't going to care about your
rendezvous lock. Even if you do control all the threads, getting them to do
what you're suggesting may be nontrivial and costly.

Edit: IMO it's better to just admit the programming model is thorny and move
on. I feel similarly about signals. Restrict what you do after fork() and
generally be cautious, the same way you'd be cautious reacting to a signal.

~~~
djcapelis
Wrap the thread create calls and make them care. :)

