
A fork() in the road - ralish
https://www.microsoft.com/en-us/research/publication/a-fork-in-the-road/
======
sfink
I read the paper, and they make a lot of good points about fork's warts.

But I really wanted some explanation of why Windows process startup seems to
be so heavyweight. Why does anything that spawns lots of little independent
processes take so bloody long on Windows?

I'm not saying "lots of processes on Windows is slow, lots of processes on
Linux is fast, Windows uses CreateProcess, Linux uses fork, CreateProcess is
an alternative to fork/exec, therefore fork/exec is better than any
alternative." I can imagine all kinds of reasons for the observed behavior,
few of which would prove that fork is a good model. But I still want to know
what's going on.

~~~
ralish
I'm a bit rusty on this, but from memory the overhead is by and large specific
to the Win32 environment. Creating a "raw" process is cheap and fast (as you'd
reasonably expect), but a lot of additional initialisation needs to occur
before a "fully-fledged" Win32 process can start executing.

Beyond the raw Process and Thread kernel objects, which are represented by
EPROCESS + KPROCESS and ETHREAD + KTHREAD structures in kernel address space,
a Win32 process also needs to have:

- A PEB (Process Environment Block) structure in its user address space

- An associated CSR_PROCESS structure maintained by Csrss (Win32 subsystem
user-mode)

- An associated W32PROCESS structure for Win32k (Win32 subsystem kernel-mode)

I'm pretty sure these days the W32PROCESS structure only gets created on-
demand with the first creation of a GDI or USER object, so presumably CLI apps
don't have to pay that price. But either way, those latter three structures
are non-trivial: they're complicated to set up, and I assume at least the
Csrss part involves a context switch (or several). At least some steps in the
process also involve manipulating global data structures which block other
process creation/destruction (Csrss steps only?).

I expect all this Win32-specific stuff largely doesn't apply to e.g. the Linux
subsystem, so creating processes there should be _much_ faster. The key
takeaway is that it's all the Win32 machinery that contributes the bulk of the
overhead, not the fundamental process or thread primitives themselves.

EDIT: If you want to learn more, Mark Russinovich's Windows Internals has a
whole chapter on process creation which I'm sure explains all this.

~~~
Gibbon1
> created on-demand with the first creation of a GDI or USER object, so
> presumably CLI apps don't have to pay that price

This tickles my brain. I read some blog post bitching that because Windows
DLLs are kinda heavyweight, it's way too easy to end up paying that price
without realizing it.

~~~
cesarb
It probably was this one: [https://randomascii.wordpress.com/2018/12/03/a-not-
called-fu...](https://randomascii.wordpress.com/2018/12/03/a-not-called-
function-can-cause-a-5x-slowdown/)

------
rgovostes
On macOS, fork() is a bit weird:
[https://opensource.apple.com/source/Libc/Libc-997.90.3/sys/f...](https://opensource.apple.com/source/Libc/Libc-997.90.3/sys/fork.c.auto.html)

Many frameworks are backed by XPC services, where the parent process has a
socket-like connection to a backend server. After forking, the child would
have no valid connection to the server. The fork() function establishes a new
connection in the child for libSystem, to allow Unix programs to port easily
to macOS, but other services' connections are _not_ re-established. This makes
fork on macOS (i) slow, and (ii) unsafe for code that touches virtually any of
Apple's APIs.

~~~
cryptonector
fork() is generally unsafe for that reason, and OS X is only special in this
regard in that it has more of these hidden C library handles that can blow up
on the child side of fork(). vfork()+exec()-or-_exit() is much safer.
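
A minimal sketch of that safer pattern in C (error handling elided); the rule
is that after vfork() the child may only exec or _exit:

    #include <unistd.h>

    extern char **environ;

    pid_t spawn_child(const char *path, char *const argv[]) {
        pid_t pid = vfork();
        if (pid == 0) {
            /* child: borrows the parent's address space until exec/_exit,
               so it must not touch memory or call anything else */
            execve(path, argv, environ);
            _exit(127);   /* exec failed; _exit(), never exit() */
        }
        return pid;       /* parent resumes once the child execs or exits */
    }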

------
dis-sys
I agree with most parts of the paper.

Fork() is now basically the root of a looong list of special cases in so many
aspects of programming. Things get even worse when you use a language with a
built-in runtime, such as Golang, where multi-threaded programming is the
default behaviour. If fork() can't even handle multiple threads, what is the
real point of having it when an 8-core/16-thread AMD processor costs about
$150?

~~~
sqrt17
> If fork() can't even handle multiple threads, what is the real point of
> having it when a 8 core 16 threads AMD processor ...

These threads and those threads are not the same thing. A 16-thread SMT
processor will happily chew on 16 different programs, processes, or whatever
the load of the moment is; e.g. with Python's multiprocessing you can create
16 processes and they'll execute in parallel.

fork() can handle multiple threads, but you have to be attentive when cleaning
up etc. - quite often, code using fork() will get confused when you spawn
threads, and code using threads will get confused when you fork().

------
swiftcoder
Fork has really weird semantics, and a lot of fun gotchas around managing
resources. Good riddance?

~~~
evilotto
It's not just the semantics; the performance is awful too. Even when the fork
is virtual (as any modern fork is) and there's no memory copying because it's
COW, all the kernel page tables still need to be copied, and for a multi-GB
process that's nontrivial. That's why any sane large service that needs to
fork anything will, early on, start up a slave subprocess whose only job is to
fork quickly when the master process needs it.
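
A rough sketch of that fork-server pattern, assuming a trivial socketpair
protocol (the command strings and helper loop are illustrative, not any
particular server's design):

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        int sv[2];
        socketpair(AF_UNIX, SOCK_STREAM, 0, sv);

        if (fork() == 0) {                /* helper: forked while still tiny */
            char cmd[4096];
            ssize_t n;
            while ((n = read(sv[1], cmd, sizeof cmd - 1)) > 0) {
                cmd[n] = '\0';
                if (fork() == 0) {        /* cheap: small page tables to copy */
                    execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
                    _exit(127);
                }
                wait(NULL);
            }
            _exit(0);
        }

        /* ... parent grows to many GB ... later it delegates spawning: */
        const char *cmd = "echo spawned via helper";
        write(sv[0], cmd, strlen(cmd));
        return 0;
    }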

~~~
dexen
_> all the kernel page tables still need to be copied and for a multi-GB
process that's nontrivial_

Only in the pathological case where the large process is backed solely by
4 KB pages. The hardware has long supported large pages - on x86 since the
Pentium Pro, if memory serves - and huge pages. The popular OSes (Linux 2.6+
and Windows 2003+) support large and huge pages too. A 2 GB process can
easily be three pages: r/x code, r/w stack, r/w data (2 GB). Granted, it gets
a bit more complex if mmapped I/O or a JIT is used, but since both are mature
technology now, it's fine to point fingers at any inefficiency and demand
better. Another caveat would be shared libraries loading at separate address
ranges, which, IMO, is another reason to ditch shared libraries for good.
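
On Linux an application can request huge pages explicitly; a minimal sketch,
assuming the administrator has reserved huge pages (e.g. via
/proc/sys/vm/nr_hugepages):

    #include <stdio.h>
    #include <sys/mman.h>

    int main(void) {
        size_t len = 2UL * 1024 * 1024;   /* one 2 MB huge page */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (p == MAP_FAILED) {            /* fails if none are reserved */
            perror("mmap");
            return 1;
        }
        munmap(p, len);
        return 0;
    }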

Contrary to popular wisdom, OS research is still relevant.

~~~
temac
You want to ditch shared libraries and use mmap to map your big processes with
GB pages, to make fork fast again (even though that's neither fork's main nor
its only drawback)???

OS research might be relevant, and it's good that some people have wild ideas,
but honestly I doubt this one will go anywhere :P

~~~
dexen
Ah sorry, I only want to ditch the shared libraries; the mmap bit was a later
edit and I didn't realize it was unclear. Of course mmap is necessary.

~~~
temac
About shared libraries: I know there is this line of thought considering them
"evil" (well, at least sufficiently to want to get rid of them), but I'm quite
unsure what a modern system would look like without them (and although this is
less of a problem at the application level on e.g. Android, the system level
is still extremely important).

With Spectre, proper process boundaries (well, address spaces) are more
important than ever -- and even without that I'd still have cited them as
incredibly important, in the sense that I'd rather have more than fewer.
Given that, code reuse involves shared libraries, for several good reasons.
The obvious one is not wasting RAM. Then there is the update problem (how to
patch programs when security holes are discovered, especially if multiple
parties are involved). And on top of that there is the cache-pollution
problem, which is related to code duplication and is quite insidious, because
it is probably simultaneously hard to benchmark and very real (an ambient
loss of perf, not in the very hot paths, but it still has an impact on the
general perf of a system, much as Spectre mitigations are having a big
impact).

Now, we could like address-space boundaries so much that we would want to use
even MORE processes in place of shared libraries, but this obviously does not
work for all services (and Spectre bites us again, because context switches
are not cheap); plus, taken to the extreme, this makes systems _extremely_
hard to design, and even bigger. This is part of why we are using Linux
instead of Hurd... (well, Linux is too far in the opposite direction, but
there is hope that it will evolve toward a middle ground in the long term)

And anyway, that does not fit the narrative of using more huge pages at all.

Now there are the usual radical ideas about how everything should run on some
kind of VM (sometimes even including the kernel), drastically reducing the
amount of "native" code; but the reality of our current systems is that
"everything" already relies on multiple VMs, and I doubt it will ever converge
on just one, nor should it (because of the monoculture that would induce).
Plus, ambient performance is _still_ lower than native code, and TBH I don't
expect that to ever change.

So, why and how would you like to get rid of shared libraries?

~~~
pjmlp
We are using Linux instead of Hurd due to manpower.

Most high integrity real time OSes are microkernels.

Interesting that you mention Android, one of the key points of Project Treble
is using separate processes for drivers with Android IPC to talk to the kernel
(including hardware buffer handles).

~~~
temac
Well in the end we are using any X rather than Y tech because of manpower,
regardless of pretty much any other characteristics.

So let's put manpower kind of aside for the bulk of the dev (where thousands
of man-years are needed for any big project) and look at what could actually
be achieved with the very small amount of manpower _bootstrapping_ those
projects. At this point you understand that the manpower thing is only a
convenient narrative, while the reality is that even at the early time, Linux
based systems worked _really better_ than Hurd based systems.

Because general purpose micro-kernels based systems are hard, and especially
those with a design as ambitious as the Hurd. (When you start to want to
strongly isolate FS from VM code, it even stops being just hard and starts to
be really HARD.)

And this was even worse at the time for perf reasons (but perf reasons are
still applicable even today, given the impact on mobile and datacenter
workloads)

 _However_ , yes, I'm in favor of more isolation today, because for a shitload
ton of drivers it literally won't make any difference whether or not you take
1us vs 50us if you need to execute once every few seconds. So it is retarded
to the highest level to run in kernel space if you don't actually need it.
Sadly, Linux is _way behind_ on that subject today.

That being said, and back to the original subject, a microkernel or at least a
less monolithic one won't really get us in the less shared-libraries direction
if it just re-implements the same perimeter of features of a monolithic ones,
nor in the huge pages everywhere direction...

------
vbernat
fork() is also used to daemonize and for privilege separation, two tasks where
posix_spawn() cannot be used. I suppose daemonization can be seen as something
of the past, but privilege separation is not. On Linux, privileges are
attached to a thread, so it should be possible to spawn a new thread instead
of a new process; however, a privileged thread sharing the same address space
as an unprivileged one doesn't seem like a good idea.
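
For reference, the classic double-fork daemonization dance, which
posix_spawn() has no way to express (a bare-bones sketch; stdio redirection
and error handling elided):

    #include <stdlib.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static void daemonize(void) {
        if (fork() > 0) exit(0);   /* parent exits; child reparented to init */
        setsid();                  /* new session: detach controlling tty */
        if (fork() > 0) exit(0);   /* non-leader can never reacquire a tty */
        umask(0);
        chdir("/");                /* don't pin the filesystem we started in */
        /* reopen stdin/stdout/stderr on /dev/null, etc. */
    }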

The paper also mentions the use case of multiprocess servers, which rely
heavily on fork(), but dismisses it because it could be implemented with
threads. With threads, though, a crash in one worker takes down the whole
application, while a crashed worker process can simply be restarted.

A proper worked example of removing fork() from an actual program would help.
For example, how is nginx implemented on Windows?

~~~
alkonaut
I can’t answer for Nginx but normally on windows if you want “worker
processes” you just start N of them and have them read work from a shared
memory queue. That is, workers live longer than the tasks they perform. If one
crashes, a new one is spawned. This does seem like a more sensible way of
doing things than forking tbh. It isolates work in processes but doesn’t pay
for process creation per request.

~~~
bloak
"If one crashes, a new one is spawned."

I suppose that makes sense on an OS on which crashing is expected behaviour,
though some people would want to know what bug caused the crash and whether
that bug has security implications.

~~~
zaarn
Crashing is expected behaviour on Linux as well; you can enable coredumps or
check an application's log if you want to know why.

~~~
bloak
The Linux kernel doesn't crash much, unless you have dodgy drivers or dodgy
hardware. Whether your userland programs crash or not depends on what you're
running. I don't expect to see sshd crashing, for example, though it's true
that almost any program will exit suddenly if the system runs out of memory,
which to an ordinary user looks like a crash, though it's a very different
thing really.

~~~
Dylan16807
If you weren't talking about userland crashes, then your crack about "an OS on
which crashing is expected behavior" makes no sense.

~~~
bloak
The comment was not meant to be taken all that seriously, of course, but an OS
is more than just the kernel, and I do tend to disapprove of brushing a crash
under the carpet.

System runs out of memory, various processes get terminated, and the easiest
way to get it back into a good state is a restart: not that worrying, but do
you have a memory leak? Some process segfaults with 54584554454d4f53 in the
PC: should be investigated, not glossed over.

------
cryptonector
I've been saying this for quite some time. Here's a gist I wrote about it:
[https://gist.github.com/nicowilliams/a8a07b0fc75df05f684c23c...](https://gist.github.com/nicowilliams/a8a07b0fc75df05f684c23c18d7db234)

------
ktpsns
Can anybody elucidate why fork() is still used in Chromium or Node.js? They
are not old-school traditional forking Unix servers (unlike Apache or the
databases mentioned in the paper). I would expect them to implement some of
the alternatives and keep fork() only as a fallback in the code (i.e. behind a
cascade of #ifdefs) when no other API is available. So I wonder where the
fork() bottlenecks really appear in everyday life.

~~~
xyzzyz
> why fork() is still used in Chromium

To support a multi-process web browser architecture that Chromium pioneered,
you need to spawn processes. See
[https://chromium.googlesource.com/chromium/src/+/HEAD/docs/l...](https://chromium.googlesource.com/chromium/src/+/HEAD/docs/linux_zygote.md)

~~~
Dylan16807
That's not what the page says. It says the use of fork() saves 8MB and a few
tens of milliseconds per process spawn.

------
eesmith
It points out that "1304 Ubuntu packages (7.2% of the total) calling fork,
compared to only 41 uses of the more modern posix_spawn()".

In section 7 it suggests "We should therefore strongly discourage the use of
fork in new code, and seek to remove it from existing apps."

Is anyone here going to help work on changing those 1304 packages?

I have already over-volunteered for thankless FOSS tasks like this, so I know
it won't be me.

~~~
wbl
There are things that don't fit within posix_spawn's limitations, especially
anything involving fd or capability manipulation.
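
For illustration, what does fit: posix_spawn's file-actions object can
open/close/dup2 descriptors in the child, as in this sketch that redirects a
child's stdout; anything fancier pushes you back to fork():

    #include <fcntl.h>
    #include <spawn.h>

    extern char **environ;

    int run_ls(pid_t *pid) {
        posix_spawn_file_actions_t fa;
        posix_spawn_file_actions_init(&fa);
        /* child's fd 1 becomes out.log -- no fork, no manual cleanup */
        posix_spawn_file_actions_addopen(&fa, 1, "out.log",
                                         O_WRONLY | O_CREAT | O_TRUNC, 0644);
        char *argv[] = { "ls", "-l", NULL };
        int rc = posix_spawnp(pid, "ls", &fa, NULL, argv, environ);
        posix_spawn_file_actions_destroy(&fa);
        return rc;
    }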

~~~
eesmith
Yes, certainly. The paper covers many of those limitations.

The goal is not "remove", but "seek to remove". The relevant definition of
"seek" here is "to make an attempt" says [https://www.merriam-
webster.com/dictionary/seek](https://www.merriam-webster.com/dictionary/seek)
.

How many of those 1304 Ubuntu packages require fork()? Are there benefits to
replacing (say) 1283 of them with posix_spawn()?

~~~
sanxiyn
Yes, there are benefits to using posix_spawn: It's faster. See Figure 1.

~~~
eesmith
How many of those packages would be improved with a faster spawn mechanism?
Who is going to investigate each one? How will they convince upstream to
change well-tested code?

------
harryf
From the paper...

> 7\. GET THE FORK OUT OF MY OS!

Someone couldn't resist...

------
heavenlyblue
If they're going to remove fork, then Python's multiprocessing is going to be
dead. Maybe then the community will be forced to get rid of the GIL?

------
fopen64
When I learnt how fork() and select() worked, I just fell in love with Unix.
The Win32 API felt so ad hoc and unnatural in direct comparison.

~~~
pova
For me it was poll(), due to its simple and intuitive API. Also, it's much
faster than select() when you have a large number of file descriptors being
monitored.
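
A minimal sketch of the API (fd_a and fd_b are placeholder descriptors):

    #include <poll.h>

    void wait_for_input(int fd_a, int fd_b) {
        struct pollfd fds[] = {
            { .fd = fd_a, .events = POLLIN },
            { .fd = fd_b, .events = POLLIN },
        };
        /* block until at least one descriptor is readable */
        if (poll(fds, 2, -1) > 0) {
            if (fds[0].revents & POLLIN) { /* read fd_a */ }
            if (fds[1].revents & POLLIN) { /* read fd_b */ }
        }
    }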

~~~
fopen64
For some reason the book (Beginning Linux Programming, Wrox Press, 1998
edition I think) explained select() first, so like the proverbial duckling
that imprints on the first thing it sees moving after hatching, select()
caught my heart.

------
stuaxo
And there I was thinking they would expose their fork() implementation.

Interested to see what this paper has to say.

------
zbentley
While fork() might be sub-optimal for launching different programs (fork() +
exec() vs. posix_spawn()), it's absolutely essential in several types of
common systems that don't use it to launch different programs.

Fork-requiring program class 1:

The biggest example where fork() is needed is webservers/long-running
programs with significant unchanging memory overhead and/or startup time.

Many large applications written in a language or framework that prefers the
single-process/single-thread model for executing requests (e.g.
Python/gunicorn, Perl, a lot of Ruby, NodeJS with ‘cluster’ for multicore,
etc.) are basically dependent on fork(). Such applications often have a huge
amount of memory required at startup (due to loading libraries and
initializing frameworks/constant state). Creating workers that can execute
requests in parallel but _don’t require any additional memory overhead_ (just
what they consume per request) is essential for them. fork()ing _without_
exec()ing a new program facilitates this memory sharing; everything is copy-
on-write, and most big webapps don’t need to write most of the startup-
initialized memory they have, though they may need to read it.

Additionally, starting up such programs can take a long time due to costly
initialization (seconds or minutes in the worst cases); using fork() allows
them to quickly replace failed or aged-out subprocesses without having to pay
that overhead (which also typically pegs a CPU core) to change their
parallelism. “Quickly” might not be quick enough if a program needs to
continually launch new subprocesses, but for periodically forking (or just
forking-at-startup) long-running servers with a big footprint, it’s far better
than re-initializing the whole runtime. For better or worse, we’ve come far
enough from old-school process-per-request CGI that it is no longer feasible
in most production deployments.
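
A sketch of the prefork pattern described above (load_frameworks_and_config
and serve_requests are hypothetical placeholders for the expensive init and
the worker loop):

    #include <sys/wait.h>
    #include <unistd.h>

    void load_frameworks_and_config(void);   /* hypothetical: the slow part */
    void serve_requests(void);               /* hypothetical: worker loop */

    int main(void) {
        load_frameworks_and_config();        /* pay startup cost exactly once */
        for (int i = 0; i < 4; i++)
            if (fork() == 0) {               /* worker shares init'd memory, COW */
                serve_requests();
                _exit(0);
            }
        for (;;) {                           /* replace dead/aged-out workers */
            wait(NULL);                      /* cheaply, with no re-init */
            if (fork() == 0) { serve_requests(); _exit(0); }
        }
    }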

Anticipated rebuttals:

Q: Wouldn't it be nice if everyone wrote apps small enough that startup time
was minimized and memory footprint was low?

A: Sure, but they won’t.

Q: People should just write their big, long-running services in a framework
that starts fast, has low memory requirements, and uses threads instead of
fork()s.

A: See previous answer. Also see zzzcpan’s response.

Q: Can you access some of those benefits with careful use of shared memory?

A: Yes, but it’s _much_ harder to do than it is to use fork() in most cases
(caveat Windows, but it’s still hard).

Q: Do tools exist in single-proc/single-thread forking frameworks/languages
which switch from forking to hybrid async/threaded paradigms (like gevent)
instead?

A: Yes, but they’re not nearly as mature, capable, or useful (especially when
you need to utilize multiple cores).

Fork-requiring program class 2:

Programs which fork infrequently in order to parallelize uncommon tasks over
shared memory. Redis does this to great effect; it doesn’t exec(), it just
forks off a child process which keeps the memory image at the time of fork
from the parent, and writes most of that memory state to disk so that the
parent can keep handling requests while the child snapshots.
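
The pattern itself is tiny; a sketch, where struct db and serialize_to_disk()
are hypothetical stand-ins for the in-memory state and the RDB-style writer:

    #include <unistd.h>

    struct db;                               /* hypothetical in-memory state */
    void serialize_to_disk(struct db *);     /* hypothetical snapshot writer */

    void snapshot(struct db *state) {
        pid_t pid = fork();
        if (pid == 0) {
            /* the child sees a frozen, copy-on-write image of `state` */
            serialize_to_disk(state);
            _exit(0);                        /* _exit: skip atexit handlers */
        }
        /* parent keeps serving and mutating `state`; reap pid later */
    }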

Python’s multiprocessing excels at these kinds of cases as well. If you’re
launching and destroying multiprocessing pools multiple times a second, then
sure, you’re holding it wrong, but many people get huge wins from using
multiprocessing to do parallel operations on big data sets that were present
in memory at the time multiprocessing fork()ed off processes. While this isn’t
cross-platform, it can be a really massive performance advantage: no need to
serialize data and pass it to a multiprocessing child (this is what
apply_async does under the covers) if the data is already accessible in memory
when the child starts. Node's 'cluster' module will do this too, if you ask
nicely. Many other languages and frameworks support similar patterns: the
common thread is making fork()ing parallelism "easy enough" with the option of
spending a little extra effort to make it _really really cheap_ to get pre-
fork memory state into children for processing. Oh, and you basically don't
have to worry about corrupting anyone else's in-memory state if you do this
(not so with threads).

Anticipated Rebuttals:

Q: $language provides a really accessible way to use true threads that isn’t
nearly as tricky as e.g. multiprocessing or knowing all the gotchas (e.g.
accidental file descriptor sharing between non-fork-safe libraries) of fork();
why not use that?

A: Many people still prefer languages with primarily-forking parallelism[1]
constructs for reasons besides their fork-based concurrency capabilities--
nobody’s claiming multiprocessing beats goroutines for API friendliness--so
fork() remains useful in much more than a legacy capacity.

Q: Why not use $tool which does this via threads or why not bind
$threaded_language to $scripting_language and use threads on the other side of
the FFI boundary?

A: People won’t switch. They won’t switch because it’s hard (don't tell me
threaded Rust is as easy to pick up as multiprocessing--Rust has a lot of
advantages in this space, but that ain't one of them) and because there’s a
positive benefit to staying within a given platform, even if some infrequent
tasks (hopefully your Python doesn’t invoke multiprocessing _too_ much) are a
bit more cumbersome than usual. Also, “Friendly, easy-to-use concurrency with
threads” is often a very false promise. There’s a reason Antirez is resistant
to threading.

--------------

TL;DR perhaps using fork() _and exec()_ for launching new programs needs to
stop. But fork() itself is absolutely essential for common real-world use
cases.

[1] References to parallelism via fork() above assume you have more than one
core to schedule processes onto. Otherwise it’s not that parallel.

EDITs: grammar. There will be several because essay. I won't change the
substance.

~~~
cryptonector
There is one case where fork() is fantastic: as a way to dump a core of a live
process while leaving it running -- just fork() and abort()! But even this
case should be handled by having something like gcore(1).
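
The trick is small enough to sketch inline (assumes core dumps are enabled):

    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    void dump_core_and_continue(void) {
        pid_t pid = fork();
        if (pid == 0)
            abort();            /* child dumps a snapshot core and dies */
        waitpid(pid, NULL, 0);  /* parent reaps it and keeps running */
    }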

Another common use of fork() for things other than exec()ing is multi-process
services where all the processes keep running the same program. Arranging to
spawn or vfork-then-exec self, and have the child realize it's a worker and
not a (re)starter, is more work because a bunch of state needs to be passed to
the child somehow (via an internal interface), and that feels hackish... And
this case doesn't suffer much from fork()'s badness: you fork() early and have
little or no state in the parent that could have fork-unsafety issues. But
it's worth switching this use case to spawn or vfork-then-exec just so we have
no use cases for fork() left.

~~~
IshKebab
These are both mentioned in the paper.

------
makach
fork() must go too??!!

------
zerr
Interesting that Redis uses fork() for its COW snapshot implementation.

~~~
Mic92
It should be possible to achieve the same with mmap() and MAP_PRIVATE.

------
bbsimonbb
[https://www.youtube.com/watch?v=p-mGXLgGqkY](https://www.youtube.com/watch?v=p-mGXLgGqkY)

------
zzzcpan
It's hard to take them seriously when they imply that the mess that threads
are is somehow acceptable and necessary, but the nicer, less error-prone, and
simpler fork isn't. Threads are a nasty hack and a liability for the modern
programmer. And systems researchers really should acknowledge that threads'
continued existence as first-class OS primitives is holding back systems
research much more than fork is. I guess they are looking to spread FUD and
justify the mess that Windows got itself into, not doing actual research.

~~~
IshKebab
What's wrong with threads exactly?

~~~
a1369209993
Aliasable, mutable memory (i.e. race conditions) is evil, and threads suffuse
the entire programming environment with it. This is a dirty implementation
detail that operating-system kernels have to deal with, and we should be
burying it in the same hole as memory swapping and TCP retransmits, not making
it a fundamental hazard every application developer has to worry about.

------
chasil
I readily admit that I am unfamiliar with posix_spawn() and its benefits over
fork().

However, may I point out that Microsoft SQL Server benchmarks have been posted
showing Linux outperforming Windows on TPC-H?

[https://www.dbbest.com/blog/running-sql-server-on-
linux/](https://www.dbbest.com/blog/running-sql-server-on-linux/)

While I am sure this is wise criticism, it might also be concluded that
Windows itself contains no small number of architectural decisions that limit
performance.

------
kazinator
Fork is quite excellent, except in cases when the intent is to run a different
program or when threads are involved (threads are basically an incompatible,
competing model of concurrency).

The use of fork as a concurrency mechanism (creating a new thread of control
that executes in a copy of the address space) is very good and useful.

In the POSIX shell language, the subshell syntax (command1; command2; ...) is
easily implemented using fork. This is useful: all destructive manipulations
in the subshell, like assignments to variables or changing the current
directory, do not affect the parent.
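
A sketch of how fork gives a shell that isolation: the child's chdir (or
variable assignment) dies with the child:

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        pid_t pid = fork();
        if (pid == 0) {          /* the "( ... )" subshell */
            chdir("/tmp");       /* destructive, but only in the copy */
            /* run command1; command2; ... here */
            _exit(0);
        }
        waitpid(pid, NULL, 0);
        char buf[4096];
        printf("parent cwd: %s\n", getcwd(buf, sizeof buf));
        return 0;
    }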

Check out the fork-based Perl solution to the Amb task in Rosetta code:
[https://rosettacode.org/wiki/Amb#Using_fork](https://rosettacode.org/wiki/Amb#Using_fork)

This essentially simulates continuations (in a way). (If the parent process
does nothing but wait for the child to finish, fork can be used to perform
speculative execution, similar to creating a continuation and immediately
invoking it).

Microsoft "researchers" can stuff it and their company's flagship piece of
shit OS.

~~~
afiori
The paper agrees with you that the fork model had a reason to exist and that
it is perfect for shells.

They also point out that on modern hardware you often want to write
multithreaded, multiprocess applications.

Their main criticism of fork is that it does not compose at any level of the
OS (it cannot be implemented on top of a different primitive).

I understand that a lot of people here dislike Microsoft for good reasons (not
only historical ones), but the drawbacks of fork() are well known and
recognized; here they point out that fork is also hard-to-impossible to
implement as a compatibility layer if the kernel does not support it.

Also:

> Microsoft "researchers" can stuff it and their company's flagship piece of
> shit OS.

Do you have any reason to insult Microsoft researchers? They cite plenty of
other researchers in this paper who appear to agree with them. This type of
comment does not seem constructive to me.

~~~
kazinator
It's an idiotic argument. Only functions compose. Though fork is packaged as a
function, it's really an operator with a big effect.

Booting a system doesn't compose; let's not have power-on reset and
bootloaders.

Everything in this paper could have been cribbed from twenty-year-old (or
older) Usenet postings, mailing lists and other sources. Fork has been
dissected _ad nauseam_; anyone who is anyone in the Unix-like world knows
this.

Oh, and threads have perpetually been the way to go on current hardware ---
every damn year since 1988 and counting.

~~~
afiori
Also, levels of abstraction compose.

> Booting a system doesn't compose;

Actually this is false: virtual machines and hypervisors let you boot a system
inside another system.

~~~
kazinator
Virtual machines can be forked processes, and contain operating systems with
forked processes, some of which are virtual machines ... fork composes!

~~~
afiori
As a function, obviously. The point is that it does not compose easily with
other abstractions: every other library and OS facility needs to be
fork-aware.

spawn does not have this requirement.

~~~
kazinator
The concept of "fork aware" didn't exist until threads. You could argue it's a
thread problem. Remember, every library and OS functionality aso needs to be
"thread aware" when threads are introduced. The _pthread_atfork_ function can
be thought about as "what do we do about thread and thread paraphernalia when
we fork" rather than "what do we do about fork when we have threads".

Even the close-on-exec flag race condition is a result of threads. duplicating
a file descriptor and setting its close-on-exec flag is a two step process
during which a fork can happen, causing a child to inherit the descriptor
without close-on-exec flag being yet set. But that can only happen if there
are threads. (Or something crazy, like fork being called out of an async
signal handler).
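
Concretely, the racy two-step versus the atomic form later added to close the
hole (a sketch):

    #include <fcntl.h>
    #include <unistd.h>

    int dup_cloexec(int fd) {
        /* racy with threads: a fork() between these two calls leaks the
           descriptor into the child with close-on-exec not yet set */
        int copy = dup(fd);
        fcntl(copy, F_SETFD, FD_CLOEXEC);
        close(copy);

        /* atomic alternative (POSIX.1-2008): duplicate with the flag set */
        return fcntl(fd, F_DUPFD_CLOEXEC, 0);
    }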

~~~
afiori
> You could argue it's a thread problem

But I explicitly want not to :) threads are obviously a good thing to have.

> every library and OS functionality also needs to be "thread aware"

Which is good, because unlike the fork case, thread-aware libraries/OSes help
performance. Fork-aware libraries/OSes (in the fork+exec case) do not.

~~~
kazinator
"Fork aware" is "thread aware". Hint: see the "pthread" substring in the
identifier "pthread_atfork".

Note that this is necessary only because of the broken threading model that
was retrofitted into Unix.

How it should work is that fork should clone the threads also. If a process
with 17 threads forks, then the child has 17 threads. The thread IDs should be
internal, so that all the pthread_t values in the parent space make sense in
the child space and refer to the corresponding threads.

It's not fork's fault that the hacky thread design broke it. Fork is supposed
to make a faithful replica of a process; of course if that principle is
ignored in a major way (like, oops, where are the parent's threads?) then
things are less than copacetic.

Threads also break the concept of a current working directory. If one thread
makes a relative path access and another calls chdir, the result is a race
condition.

Threads also break signals quite substantially; the integration of signal
handling with threads is a mess.

Threads are not inherently a good thing to have; they are idiotic, in fact.
Fork provides a disciplined form of threading that eliminates problems from
the mutation of shared state, and provides fault isolation. It's much better
to use forked processes instead of threads. Shared memory can be used for
direct data structure access. With fork, you can create a shared anonymous
mmap. This is then cloned into child processes as shared memory at the same
virtual address.
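
A sketch of that shared-mapping setup (POSIX/Linux; error checks elided):

    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        /* a MAP_SHARED|MAP_ANONYMOUS mapping is inherited by the child as
           shared memory, at the same virtual address in both processes */
        long *counter = mmap(NULL, sizeof *counter, PROT_READ | PROT_WRITE,
                             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        *counter = 0;
        if (fork() == 0) {
            __sync_fetch_and_add(counter, 1);   /* the parent sees this */
            _exit(0);
        }
        wait(NULL);
        printf("%ld\n", *counter);              /* prints 1 */
        return 0;
    }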

------
lawl
I haven't read the entire thing yet, but from "replacing fork" to the end it
reads too much like embrace, extend, extinguish.

~~~
saagarjha
It's suggesting posix_spawn, which is standardized and has nothing to do with
Microsoft.

~~~
lawl
> Just as a programming course would not today begin with goto, we suggest
> teaching either posix_spawn() or CreateProcess(), and then introducing fork
> as a special case with its historic context (§2).

Or CreateProcess(), which has a lot to do with microsoft.

~~~
saagarjha
Yeah, if you're using Windows you aren't going to be able to use it. Or are
you suggesting that Microsoft should implement posix_spawn?

~~~
tomjakubowski
Can't you use posix_spawn() with WSL and your favorite POSIX-compatible libc
implementation?

~~~
Dylan16807
Well that's a complicated question to answer.

You can use the posix_spawn function in glibc, which uses a vfork or clone
syscall just like on Linux.

~~~
acqq
Also relevant, regarding the Linux native performance:

[https://mobile.twitter.com/RichFelker/status/602313979894038...](https://mobile.twitter.com/RichFelker/status/602313979894038528)

"Rich Felker, May 24, 2015: Some interesting preliminary timing of @musllibc
's _posix_spawn vs fork+exec shows it ~25x faster for large parent processes.
(~360us vs 9ms)._ #glibc has a vfork-based posix_spawn but it's only usable
for trivial cases; others use fork. @musllibc posix_spawn always uses
CLONE_VM. This also means @musllibc posix_spawn will fill the fork gap on
NOMMU systems cleanly/safely (unlike vfork) once we get NOMMU working."

Also evilotto's post here:

[https://news.ycombinator.com/item?id=19622477](https://news.ycombinator.com/item?id=19622477)

"a 100mb process generally takes >2ms to fork, while a 1mb or less process
takes 70us"

~~~
Dylan16807
Glibc got its main clone-based implementation in 2016, so it should be much
more competitive now.

------
Solomoriah
Okay, this one has me laughing out loud. Of COURSE Microsoft doesn't like
fork()... Windows pretty much can't do it. I'll admit, there have been a lot
of times I wished there were a more streamlined way to spawn processes on
Linux (particularly daemons), but when I don't have fork() I always end up
missing it. I'd take this paper a lot more seriously if it came from someone
with a less obvious bias.

~~~
arghwhat
As Linux developer and Windows hater, I agree with Microsoft. fork() is a
hack.

Of course, all Windows APIs are terrible, but that doesn't make complaints
about fork() any less legitimate. The concept of Establishing empty processes,
instead of cloning yourself, is much more sane.

After all, the use of fork() is 99% of the time just to call execve(), and
anything done in between is just to clean up the mess from fork(). Having a
dedicated way to just create processes in a controlled fashion would have been
better there. And, the other 1% is usually cases where pthread should have
been used instead.

~~~
gmueckl
Cleaning up your own process between fork and exec is hard. Several programs
resort to terrible hacks, like force-closing every file descriptor except 0,
1 and 2 in a loop, or scanning their /proc directory to discover which file
descriptors exist, which is only marginally better. But when your process is a
house of cards built on third-party libraries with minds of their own, there
are not a lot of other options.
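
For the record, the /proc variant of the hack looks something like this
(Linux-specific sketch; note it is still not async-signal-safe, which is part
of why it's a hack):

    #include <dirent.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* between fork() and exec(): close every descriptor above stderr,
       because third-party libraries left theirs without FD_CLOEXEC */
    static void close_inherited_fds(void) {
        DIR *d = opendir("/proc/self/fd");
        if (!d) return;                      /* fall back to the dumb loop */
        struct dirent *e;
        while ((e = readdir(d)) != NULL) {
            int fd = atoi(e->d_name);
            if (fd > 2 && fd != dirfd(d))    /* skip 0,1,2 and our DIR* */
                close(fd);
        }
        closedir(d);
    }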

~~~
tobias3
Use O_CLOEXEC everywhere (even in third-party libs). It's really annoying, but
necessary. It means you need to use accept4(), dup3(), popen() with an
additional "e" in the mode string, and so on (and of course all of that needs
to be feature-tested at compilation time or runtime).

~~~
gmueckl
The catch is that you may not be able to control third-party libraries enough
to be able to do all that; thus all these annoying hacks. To me, the
complexity of using fork() and the race conditions around pid reuse are the
worst design problems of POSIX systems.

------
localhostdotdev
"every other system has a feature except us, and we are not going to add it
(because reasons) even if it's very widely used"

also:

> When a fork syscall is made on WSL, lxss.sys does some of the initial work
> to prepare for copying the process. It then calls internal NT APIs to create
> the process with the correct semantics and create a thread in the process
> with an identical register context. Finally, it does some additional work to
> complete copying the process and resumes the new process so it can begin
> executing.

[https://blogs.msdn.microsoft.com/wsl/2016/06/08/wsl-
system-c...](https://blogs.msdn.microsoft.com/wsl/2016/06/08/wsl-system-
calls/)

~~~
alkonaut
They must have considered it many times (not least when making the partial
posix support for NT) but felt that supporting it wouldn’t help deprecating it
either.

AFAIK it’s only unix/Linux (posix) OSes that implement fork. Perhaps that’s
what you meant by “every other system”, ie unix + clones/derivatives?

~~~
dijit
VAX/VMS implements "vfork()", which covers the most common use cases of fork.

VAX and VMS are not POSIX or UNIX-like.

~~~
msla
VAX is a hardware architecture. I'm not one to nitpick, but differentiating
between VAX and VMS when VMS ran on the VAX is confusing.

------
stirfrykitty
We already have posix_spawn. I guess MS isn't aware of this.

Methinks MS needs to focus on their own issues and leave the *nix world alone.
While many people find their involvement in FOSS welcome, I do not and never
have. They are still a for-profit company beholden to shareholders.

The purchase of GitHub by MS may, again, be welcomed by many, but I find it
disastrous. I smell triple E here no matter what anyone says. This is why
distros like Debian and Slackware are still so important. All *nix needs to do
is start adopting MS ideas, and then it's only a matter of time before distros
adopt disastrous code like systemd. MS wants to control everything around
them, like every other for-profit company; I cannot see this any other way.
They are involved for their own good, for things like Azure and their own
"cloud". MS needs to tend their own garden and not that of *nix. I always have
and always will prefer the "us and them" mentality when dealing with MS. Don't
forget EEE. It's still a reality, should you care to look hard enough.

~~~
naasking
Talk about uncharitable. MS Research produces world-class research. They don't
just research Windows, they do research in all operating systems, programming
languages and more.

~~~
stirfrykitty
It's not about being "uncharitable". It's about protecting _nix from being
controlled by outside forces. MS does, indeed. have world-class research, but
they are sticking their heads in the_ nix camp, which some of us don't like.
We're not all in this together, despite what some will tell you.

Sadly, UNIX (umbrella term here) is not what it was a few years ago. I dearly
miss Solaris, for example. Nothing touched it in its day, not even AIX or HP-
UX. I was a UNIX admin for 10 years; I've used them all. Nothing MS can
produce will ever be better than pure UNIX. There is a reason it's still being
made. FreeBSD can outperform anything MS has on offer. Hell, they borrowed
networking code because they couldn't come up with better.

Not all of us see us all under the same tent; I surely don't and never will.
It's us and them. To say otherwise would suggest we are all on a level playing
field, working together toward a common good. We're not. Good research aside,
I don't like their history, their stewardship, or about anything else they do.
Agenda...

~~~
naasking
Microsoft Research is not Microsoft. Microsoft Research employs some of the
main Haskell developers, and you don't see Haskellers going all conspiracy
theory. Research is research, and either the ideas they describe are good and
should be adopted, or they're bad and should be ignored.

~~~
stirfrykitty
This may be true, but I don't want MS having ANY say in what goes into a Linux
or FreeBSD OS. None. They have an agenda that doesn't fit well with FOSS. Make
no mistake about it: driving everything so that it works with Azure/VS/
whatever is about staying relevant in a world that is largely leaving them
behind. Short of having to write PS at work (required), I haven't run anything
MS at home since 1998 and have no need to do so. EEE is alive and well. Ask
why they want *nix compatibility so badly: to extend their hegemony into
everything. There is nothing MS offers that I need. Nothing. I'm about to set
up a shop for some people that is completely and utterly MS-free. The only
cost will be the HW. No software license costs. Freedom to do whatever. No
stupid, arbitrary concurrent-connection limits. FOSS all the way.

~~~
naasking
> This may be true, but I don't want MS having ANY say on what goes into a
> Linux or FreeBSD OS. None. They have an agenda that doesn't fit in well with
> FOSS.

False. MS is now one of the leading FOSS contributors. You're living in the
past.

~~~
stirfrykitty
My living in the past is YOUR opinion. There are many millions of FOSS users
who are highly opposed to MS having any influence whatsoever on FOSS. They
have an agenda, and it's never in the best interest of the FOSS crowd. Do you
think they are doing what they do out of benevolence? It's done for MS
software compatibility with FOSS, so users will choose Azure and their other
cloud offerings. It's done purely to keep them in the game and relevant; no
other reason. I heavily distrust MS, as over the years they have given many
reasons not to trust them. Ever wonder why so many people abandoned GitHub
after MS bought them?

