
Understanding the fork system call in Unix - objectivem
http://mohit.athwani.net/unix/understanding-the-fork-system-call-in-unix/
======
mehrdadn
Question: What happens (and should happen) if a GUI app forks? Do/should the
windows get duplicated? Do/should two apps control the same windows? Is the
answer the same/should it be the same regardless of all the underlying stack
(X/Wayland/Gnome/KDE/whatever)?

Edit: I really care about the "should" part too, not just what currently
happens. I've been trying to think of semantics for forking that could work in
these situations but I haven't been able to think of great solutions.

~~~
theamk
No, windows are not duplicated. Yes, two apps control the same windows. Yes,
this answer is the same regardless of the underlying stack.

That solution is very obvious once you realize it is exactly the same as what
happens with the terminal and network connections.

For example, after forking, both processes are still attached to the same
terminal (what else could they do? It's not like they can make new terminals;
what if it was an ssh connection?). They had better agree on who gets to read
and write, or the user will get confused. In the shell, the parent voluntarily
stops accessing the terminal until the child is done.
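A minimal C sketch of that sharing, assuming a POSIX system: a descriptor
inherited across fork() refers to the same open file description in both
processes, so they share one file offset, the same way they share one
terminal. The function name `shared_offset_demo` and the file path are
illustrative. The parent waits for the child before writing, which is the kind
of turn-taking described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative helper: child and parent both write through the same
 * inherited descriptor. Returns 0 on success, -1 on error. */
int shared_offset_demo(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {              /* fork can fail (e.g. EAGAIN, ENOMEM) */
        close(fd);
        return -1;
    }
    if (pid == 0) {             /* child: write first, then exit */
        const char *msg = "child\n";
        ssize_t n = write(fd, msg, strlen(msg));
        _exit(n == (ssize_t)strlen(msg) ? 0 : 1);
    }

    /* Parent: wait for the child, like a shell waiting for a
     * foreground job before touching the terminal again. */
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0) {
        close(fd);
        return -1;
    }

    /* No lseek needed: the child's write advanced the shared offset,
     * so this lands after "child\n" instead of overwriting it. */
    const char *msg = "parent\n";
    ssize_t n = write(fd, msg, strlen(msg));
    close(fd);
    return n == (ssize_t)strlen(msg) ? 0 : -1;
}
```

Afterwards the file holds `child` followed by `parent` on separate lines; with
separately opened descriptors, the second write would have landed at offset 0.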

~~~
mehrdadn
It's not clear to me if you can really just solve this like you can with the
terminal case. The thing with a terminal is that it's insanely easy to fork
streams -- you can just interleave the streams, and even without active
cooperation between the processes you still get a behavior that has some
semblance of semantic sense. But with a GUI system, you're not just dealing
with a pipeline; you're dealing with a state machine. Do you really take a
similar approach? When a window receives a message/event, which process
responds? If multiple processes respond, which message/event gets sent and
which gets discarded? If only one responds, does the same process also get the
reply to the reply, or might the fork get the reply instead? You could
interleave events randomly here too, but it seems like a far less sensible
solution than in the terminal case.

~~~
chrisseaton
The answer to most of your questions is that the GUI would quickly break after
being forked.

------
quotemstr
It's worth noting that on Linux, the address space itself is copy-on-write,
but the page tables _describing_ the address space are nevertheless copied
eagerly, and this copying can take significant time. If you're going to be
calling execve right after fork, you should be using vfork, not fork.
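A minimal sketch of the vfork-then-exec pattern in C, assuming Linux with
glibc. The child borrows the parent's memory and the parent is suspended until
the child calls execve() or _exit(), so the child should do nothing else in
between. The helper `run_vfork` is hypothetical, and `_DEFAULT_SOURCE` is
defined because vfork() was dropped from POSIX.1-2008 and may not be declared
under a strict POSIX.1-2008 feature-test setting.

```c
#define _DEFAULT_SOURCE          /* glibc: make sure vfork() is declared */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run `path` with `argv`, wait for it, and return
 * its exit status, or -1 on error. */
int run_vfork(const char *path, char *const argv[])
{
    pid_t pid = vfork();
    if (pid < 0)
        return -1;                       /* vfork failed */
    if (pid == 0) {
        /* Child: shares the parent's memory, so only execve or _exit. */
        execve(path, argv, environ);
        _exit(127);                      /* exec failed; never return */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With that, `run_vfork("/bin/true", (char *[]){"true", NULL})` should return 0,
and a path that does not exist gives 127.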

~~~
mehrdadn
Why don't people just use the standardized posix_spawn() or popen() in the
first place? Why fork then exec?

~~~
umanwizard
posix_spawn (at least in GNU/Linux) is a library function, not a syscall.
Under the hood, it calls clone (which, like vfork, avoids copying page tables)
and then execve.

The reason fork (or vfork, or clone) and exec need to be separate steps is
that you may want to do some stuff in between the two steps. The most
obvious example is redirection: when you run "grep foo < bar", the shell needs
to set stdin to point to "bar" before it execs "grep".
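Roughly what that looks like in C, as a minimal sketch assuming a POSIX
system: the redirection runs in the child after fork() and before exec, so it
only changes the child's stdin. The helper `run_with_stdin` and its 126/127
exit codes are made up for illustration.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] (searched in PATH) with stdin
 * redirected from `infile`; return the exit status, or -1 on error. */
int run_with_stdin(const char *infile, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: this is the "stuff in between" fork and exec. */
        int fd = open(infile, O_RDONLY);
        if (fd < 0)
            _exit(126);                    /* arbitrary status for this sketch */
        if (fd != STDIN_FILENO) {
            if (dup2(fd, STDIN_FILENO) < 0) /* fd 0 now refers to infile */
                _exit(126);
            close(fd);
        }
        execvp(argv[0], argv);
        _exit(127);                        /* exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

`run_with_stdin("bar", (char *[]){"grep", "foo", NULL})` then behaves much
like `grep foo < bar`, returning grep's own exit status.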

~~~
mehrdadn
Thanks but why don't people just use the portable posix_spawn when they can
(which should be the vast majority of the time)? That it calls clone/execve
underneath doesn't really answer my question...

~~~
umanwizard
As far as I know, there isn't any good reason not to. So indeed people should,
if they can.

(IIRC, posix_spawn was introduced relatively recently, so most old software
wouldn't use it).
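For comparison, the same `grep foo < bar` redirection with posix_spawnp(), as
a minimal sketch assuming a POSIX.1-2001 system. Instead of running code in the
child, the caller describes the redirection as a file action that the
implementation applies before exec. The helper `spawn_with_stdin` is
hypothetical.

```c
#include <fcntl.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: like `argv[0] ... < infile`; returns the exit
 * status, or -1 if the spawn or wait failed. */
int spawn_with_stdin(const char *infile, char *const argv[])
{
    posix_spawn_file_actions_t fa;
    if (posix_spawn_file_actions_init(&fa) != 0)
        return -1;
    /* Like open(infile, O_RDONLY) landing on fd 0 in the child. */
    if (posix_spawn_file_actions_addopen(&fa, STDIN_FILENO, infile,
                                         O_RDONLY, 0) != 0) {
        posix_spawn_file_actions_destroy(&fa);
        return -1;
    }
    pid_t pid;
    /* posix_spawnp searches PATH and returns an error number (not -1). */
    int err = posix_spawnp(&pid, argv[0], &fa, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&fa);
    if (err != 0)
        return -1;
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

File actions and spawn attributes cover the common in-between steps (opening,
closing and dup2-ing descriptors, signal masks, process groups) but not
arbitrary code, which may be part of why fork/exec is still so common.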

~~~
manwe150
I was trying to find a year on it and couldn’t, but agree it seems newer.

FWIW, it’s not required to be able to run any setup code in the child to do
arbitrary setup. For instance, WinNT makes that very hard, but has a larger
syscall interface to compensate, so it’s actually more general. An alternate
design of running all code in the parent can be made to work if the kernel
supports passing a pid handle to every syscall:
[https://lwn.net/Articles/360747/](https://lwn.net/Articles/360747/)

On the other hand, this is perhaps a bit more complex in some ways, or at
least just different.

~~~
cesarb
> I was trying to find a year on it and couldn’t, but agree it seems newer.

[http://man7.org/linux/man-
pages/man3/posix_spawn.3.html#CONF...](http://man7.org/linux/man-
pages/man3/posix_spawn.3.html#CONFORMING_TO) says POSIX.1-2001, so I'm
guessing between 1998 and 2002 (according to
[https://en.wikipedia.org/wiki/Single_UNIX_Specification#2001...](https://en.wikipedia.org/wiki/Single_UNIX_Specification#2001:_Single_UNIX_Specification_version_3,_POSIX:2001)
it started development in 1998 and was released in 2002). That man page also
says "since glibc 2.2", which according to Wikipedia was released November
2000.

------
saagarjha
> So “cin“, “cout” and “cerr” also known as stdin, stdout and stderr are
> shared.

Note that these are different things: std::cin is a std::istream, std::cout
and std::cerr are std::ostreams, and stdin, stdout, and stderr are FILE *s.
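The difference shows up across fork(): the descriptor under a FILE * is shared
with the child, but the FILE's user-space buffer is copied along with the rest
of memory. A minimal C sketch assuming a POSIX system (the helper
`buffered_fork_demo` is illustrative): output still sitting in a fully
buffered stream when fork() is called gets written twice, once by each
process, unless it is flushed first.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative helper: write "hello\n" through stdio, optionally flush,
 * then fork; both processes close the stream. Returns 0 or -1. */
int buffered_fork_demo(const char *path, int flush_first)
{
    FILE *f = fopen(path, "w");          /* regular file: fully buffered */
    if (f == NULL)
        return -1;
    fputs("hello\n", f);                 /* sits in f's buffer, not yet written */
    if (flush_first)
        fflush(f);                       /* empty the buffer before it is copied */

    pid_t pid = fork();
    if (pid < 0) {
        fclose(f);
        return -1;
    }
    if (pid == 0) {
        fclose(f);                       /* child flushes its copy of the buffer */
        _exit(0);                        /* _exit: don't flush anything else */
    }
    waitpid(pid, NULL, 0);               /* let the child write first */
    return fclose(f) == 0 ? 0 : -1;      /* parent flushes its own copy */
}
```

With `flush_first` set to 0 the file ends up containing `hello` twice; with 1,
once. The same effect is why `printf` output can appear doubled when a forking
program's stdout is redirected to a file.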

~~~
chrisseaton
They clearly mean in a practical sense they have the same purpose, not that
they’re literally aliases.

------
rachelbythebay
Please don’t use this code. fork can fail, and this doesn’t handle it.

~~~
civility
I wouldn't criticize anyone for posting code snippets for educational purposes
without confounding the learner with all of the error checking. Pedagogy vs
pedantry.

~~~
asveikau
I think the number of people who will copy-paste from a website and not
revisit it may be alarming. It is on those who publish snippets to make these
things explicit. Maybe the following is enough:

    
    
        if (pid < 0) { /* TODO: handle error */ }
    

Reading the actual code I doubt it's just an oversight, the coding style used
seems to lend strongly towards not thinking an error is possible:

    
    
        if ((pid = fork()) == 0)
        { //Child
            longTask(); 
        } else {
            children[i] = pid;
        }
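For reference, a minimal sketch of that loop with the failure case handled and
the children reaped, assuming a POSIX system. `N`, `children` and `longTask()`
follow the article's snippet, but the `longTask()` body here is a hypothetical
stand-in, and `run_children()` is an illustrative wrapper. The child calls
`_exit()` so it cannot fall through into the parent's loop and fork children
of its own.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define N 2                          /* number of children, as in the article */

/* Hypothetical stand-in for the article's longTask(). */
static void longTask(void)
{
    sleep(1);
}

/* Start N children and reap them. Returns N on success, or -1 if fork
 * failed partway (children already started are still reaped). */
int run_children(void)
{
    pid_t children[N];
    int started = 0;

    for (int i = 0; i < N; i++) {
        pid_t pid = fork();
        if (pid < 0) {               /* e.g. EAGAIN: process limit reached */
            perror("fork");
            break;
        }
        if (pid == 0) {              /* child */
            longTask();
            _exit(0);                /* don't return into the parent's loop */
        }
        children[started++] = pid;   /* parent */
    }

    for (int i = 0; i < started; i++) {
        while (waitpid(children[i], NULL, 0) < 0 && errno == EINTR)
            ;                        /* retry if interrupted by a signal */
    }
    return started == N ? started : -1;
}
```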

~~~
civility
> the coding style used seems to lend strongly towards not thinking an error
> is possible

So does every tutorial which doesn't check the return from printf or check for
errors from cout. Did you notice that the parent and child are both writing to
the same file descriptor? Maybe the writer should add some words and tests in
his code about the inherent race condition there. A true pedant would push
this even further and teach their audience how to eliminate this problem with
cross process mutexes or semaphores. Better be sure to check the error codes
on those calls too.

I know arguing with a pedant is like wrestling with a pig, but honestly I'm
more offended by the "#define N 2" than by the lack of error checking.

------
millstone
archive.org mirror:
[https://web.archive.org/web/20190519000123/http://mohit.athw...](https://web.archive.org/web/20190519000123/http://mohit.athwani.net/unix/understanding-
the-fork-system-call-in-unix/)

~~~
joshbaptiste
Thanks, the site itself can no longer fork ;)

~~~
DonHopkins
Stick a fork in it. It's done.

------
DonHopkins
While we're on the topic of calling fork and overloading servers to the point
that they can't establish database connections or serve web pages, I'd like to
point out that forking another process is an astronomically EXPENSIVE and
WASTEFUL way of performing simple arithmetic and string manipulation.

Please stop wasting electricity and time by writing shell scripts that fork
processes to do ridiculously simple things like calling "bc" to add two
numbers or "sed" to perform simple string manipulation, since your computer
has a "mul" instruction that doesn't require millions of instructions and
bytes of memory and disk accesses to perform.

It's like using Bitcoin to pay for a stick of gum.

[https://www.quora.com/How-much-energy-does-it-take-to-
make-a...](https://www.quora.com/How-much-energy-does-it-take-to-make-a-
Bitcoin-transaction)

>According to Digiconmist, each bitcoin transaction took 215 kWh. Whereas an
average Indian household consume : 90 kWh per month. It sums up that, each
bitcoin consumption can power a average indian household for 2.5 months
probably. The average electricity cost in India is 5 rupees per kwh.

It's like rolling coal because you want to trigger the libs.

[https://www.youtube.com/watch?v=fkmLf5OXxz0](https://www.youtube.com/watch?v=fkmLf5OXxz0)

Stop and think before you write that shell script. It's not going to take you
any more time to write it in Python, and then you won't be so shocked and
surprised and totally fucked when you realize later that you need to add more
features and make it more complicated than you originally thought it would need
to be in the first place.

It just boggles my mind that some people have such a hard time understanding
how spectacularly inefficient shell scripts are, on top of how horrible to
write, read, understand and maintain they are.

[https://news.ycombinator.com/item?id=19886164](https://news.ycombinator.com/item?id=19886164)

~~~
narnianal
For many things, shell scripts are the easiest and most expressive way to write
them. And you probably don't know that human resources are more expensive than
CPU cycles.

Sure, I love Python, but if I want to find and filter some files, nothing is
easier to write than bash, so even a fairly big bash script can do a good job
and still be readable.

I'm honestly a little shocked that this doesn't seem to be common knowledge
anymore and your post got upvoted so high.

My main language is Python btw.

~~~
DonHopkins
What if one of your file names has a space in it? Most bash scripts will crash
when that happens.

Please don't tell me never to use spaces in file names.

How many human resources are required to track down and fix why a bash script
suddenly starts failing mysteriously, years after it was written, when
somebody mistakenly dares to use a space in a file name?

That kind of problem never happens with Python scripts (including problems
with file names including other punctuation and unicode characters), because
Python simply represents file names as strings.

For a scripting language that was intended to manipulate file names all the
time, bash sure fucked up big time.

[https://www.linuxjournal.com/article/10954](https://www.linuxjournal.com/article/10954)

>Work the Shell - Dealing with Spaces in Filenames

>[...] I've outlined three possible solution paths herein: modifying the IFS
value, ensuring that you always quote filenames where referenced, and
rewriting filenames internally to replace spaces with unlikely character
sequences, reversing it on your way out of the script.

>By the way, have you ever tried using the find|xargs pair with filenames that
have spaces? It's sufficiently complicated that modern versions of these two
commands have special arguments to denote that spaces might appear as part of
the filenames: find -print0 and xargs -0 (and typically, they're not the same
flags, but that's another story).

>During the years I've been writing this column, I've more than once tripped
up on this particular problem and received e-mail messages from readers
sharing how a sample script tosses up its bits when a file with a space in its
name appears. They're right.

>My defensive reaction is “dude, don't use spaces in filenames”, but that's
not really a good long-term solution, is it?

>What I'd like to do instead is open this up for discussion on the Linux
Journal discussion boards: how do you solve this problem within your scripts?
Or, do you just religiously avoid using spaces in your filenames?

>Dave Taylor has been hacking shell scripts for a really long time, 30 years.
He's the author of the popular Wicked Cool Shell Scripts and can be found on
Twitter as @DaveTaylor and more generally at www.DaveTaylorOnline.com.

(Dave wrote that 8 years ago, and there are still zero comments on it. Nobody
has a good solution! Bash is just permanently fucked when it comes to file
names.)

~~~
pdkl95
bash has handled spaces in filenames easily for a long time:

    
    
        #!/bin/bash
        # run this script with 0 or more filename args
        for file in "$@" ; do
            do_something_with "${file}"
        done
    

> Nobody has a good solution! Bash is just permanently fucked when it comes to
> file names

From the man page bash(1):

    
    
        Special Parameters
           @  Expands to the positional parameters, starting from one.
              When the expansion occurs within double quotes,
              each parameter expands to a separate word.
              That is, "$@" is equivalent to "$1" "$2" ...

~~~
IshKebab
It's _possible_ to write bash that handles spaces in filenames properly, but
it's really hard to do it consistently. Unless you test it then you've
probably got it wrong.

Most people don't test their bash on spaces. Look at the dismissive commenter
above who just thinks "nah, why would people use spaces in filenames?". Do you
think he quotes everything properly? No chance.

CMake is also really bad about this, although it is arguably slightly better
because it uses semicolons as list delimiters instead of spaces, and
semicolons in filenames are going to be very uncommon. I would expect there
isn't a single CMake project out there that works if you build it in a path
containing a semicolon though!

~~~
DonHopkins
To compound the problem, bash tends to attract the kind of care-free security-
unconscious people who don't want to be bothered with going out of their way
to handle unexpected "trivialities" like spaces in file names or carefully
sanitizing user input or meticulously testing their code.

The same kind of lazy care-free people who love the convenience of PHP's
"register_globals", because it "saves typing".

If they're too lazy to use something like Python in the first place, they're
probably also too lazy to be conscientious and anal retentive enough to write
and test bash scripts that handle spaces in file names properly, or do any
kind of user input sanitization.

And they don't think twice about running their scripts as root, either. Or
writing blog postings telling other people to copy and paste their code
snippets into a root shell, or worse yet download and run them without
auditing first, by piping curl into a root shell.

I recently had an Xcode project fail to build (blender, which uses cmake),
because I was using a version of Xcode that had a space in its directory name
(because I had foolishly added a space followed by the version number to the
name "Xcode 9.4.1.app"). Of course that was because some of the tools and
custom build steps were calling out to bash. The problem wasn't in any of the
source code or configuration files; it was in the file name of the tool
itself! It took hours of frustration, which I will never get back, poring over
machine-generated project files and makefiles to diagnose the problem from the
obscure, unhelpful error message.

In the long term, isn't it a more efficient use of a lazy person's time to
simply use a scripting language that just doesn't have these problems in the
first place, and doesn't require you to preemptively bend over backwards and
dodge silent invisible bullets all the time?

It's pretty ridiculous that a scripting language intended to be used to
operate on files falls flat on its face and is so difficult to use when you
have something as common and innocuous as files with spaces in their names (or
worse yet, file names that begin with dashes -- think of the security
implications of that: not something a lazy careless bash scripter would ever
pause to reflect about)!
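For contrast, a minimal C sketch (POSIX assumed) of starting a program without
a shell: with fork() and execvp(), each argv element reaches the program as
one string, so spaces are never split on, and a `--` argument tells utilities
that follow the POSIX utility syntax guidelines to stop parsing options, so a
name starting with a dash is not mistaken for one. The helper `run_argv` and
the file name in the usage line are illustrative.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] (found via PATH) with the given
 * arguments, no shell involved; return its exit status or -1. */
int run_argv(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);   /* each argv[i] arrives intact, spaces included */
        _exit(127);              /* command not found or not executable */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

`run_argv((char *[]){"rm", "--", "-rf my file.txt", NULL})` removes exactly one
file with that literal name; nothing is word-split and nothing is read as an
option.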

It really is an extremely common and unsolved problem. Notice how each "better
answer" is actually worse than the last and comes with its own host of
problems and edge cases and work-arounds:

[https://unix.stackexchange.com/questions/9496/looping-
throug...](https://unix.stackexchange.com/questions/9496/looping-through-
files-with-spaces-in-the-names)

[https://stackoverflow.com/questions/3967707/file-names-
with-...](https://stackoverflow.com/questions/3967707/file-names-with-spaces-
in-bash)

[https://www.cyberciti.biz/tips/handling-filenames-with-
space...](https://www.cyberciti.biz/tips/handling-filenames-with-spaces-in-
bash.html)

[https://www.tecmint.com/manage-linux-filenames-with-
special-...](https://www.tecmint.com/manage-linux-filenames-with-special-
characters/)

[https://www.hecticgeek.com/2014/02/spaces-file-names-
command...](https://www.hecticgeek.com/2014/02/spaces-file-names-command-
line/)

------
dicroce
fork() explanations generally describe the APIs, maybe talk a little about
COW and memory protection... but I never see a sufficient explanation of WHEN
to use fork().

