
fork() can fail - dantiberian
http://rachelbythebay.com/w/2014/08/19/fork/
======
kabdib
When I was young and really didn't understand Unix, my friend and I were
summer students at NBS (now NIST), and one fine afternoon we wondered what
would happen if you ran fork() forever.

We didn't know, so we wrote the program and ran it.

This was on a PDP-11/45 running v6 or v7 Unix. The _printing_ console (some
DECWriter 133 something or other) started burping and spewing stuff about fork
failing and other bad things, and a minute or two later one of the folks who
had 'root' ran into the machine room with a panic-stricken look because the
system had mostly just locked up.

"What were you DOING?" he asked / yelled.

"Uh, recursive forks, to see what would happen."

He grumbled. Only a late 70s hacker with a Unix-class beard can grumble like
that, the classic Unix paternal geek attitude of "I'm happy you're using this
and learning, but I wish you were smarter about things."

I think we had to hard-reset the system, and it came back with an inconsistent
file system which he had to repair by hand with ncheck and icheck, because
this was before the days of fsck and that's what real programmers did with
slightly corrupted Unix file systems back then. Uphill both ways, in the snow,
on a breakfast of gravel and no documentation.

Total downtime, maybe half an hour. We were told nicely not to do that again.
I think I was handed one of the illicit copies of Lions Notes a few days
later. "Read that," and that's how my introduction to the guts of operating
systems began.

~~~
derefr
> ...a minute or two later one of the folks who had 'root' ran into the
> machine room with a panic-stricken look because the system had mostly just
> locked up.

It's kind of weird that, while root has always had e.g. 5% reserved disk space
on the rootfs for emergencies, one thing no Unix has ever done is enforce a 5%
CPU reservation for root so administrators can "talk over" a cascading
failure. I think this has only recently become possible in Linux with cgroups,
but it's still not something any OS does by _default._

~~~
zurn
It's not specifically the lack of cpu timeslices that crowds out other
programs, it's more like exhaustion of all the OS resources (process table
fills up, file table fills up, memory runs out, swap death etc).

Sure if you carefully made everything fork-bomb-resistant then a cpu quota
would be a part of it. Container systems use fork bombs as basic test cases.

~~~
derefr
I'm surprised that this wasn't one of the primary goals of cgroups: the
ability to group "all userspace processes" into one cgroup, and then say that
that cgroup can _in sum_ only use so much CPU, so many processes, so many
inodes, etc. You know, a control plane/data plane separation, without
requiring a hypervisor.

~~~
throwaway0010
It is. Cgroups provide limits on memory and CPU time. We already have other
accounting mechanisms for processes/threads (rlimits) and for inodes and disk
space (disk quotas). We've had those for ages. I imagine there will be more
work to integrate these various accounting mechanisms with cgroups as the
work continues.

------
cperciva
This reminds me of one of the most epic bugs I've ever run into:

    
    
        mkdir("/foo", 0700);
        chdir("/foo");
        recursively_delete_everything_in_current_directory();
    

Running as root, this usually worked fine: It would create a directory, move
into it, and clean out any garbage left behind by a previous run before doing
anything new.

Running as non-root, the mkdir failed, the chdir failed, and it started eating
my home directory.
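
For contrast, a sketch of what checking those two calls could have
looked like (BSD-style warn() from <err.h> assumed; `enter_scratch_dir`
is an invented name):

```c
#include <err.h>
#include <errno.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create the directory (tolerating "already exists") and enter it,
   refusing to proceed on any other failure instead of silently staying
   in the old cwd. */
static int enter_scratch_dir(const char *path)
{
    if (mkdir(path, 0700) == -1 && errno != EEXIST) {
        warn("mkdir %s", path);
        return -1;
    }
    if (chdir(path) == -1) {
        warn("chdir %s", path);
        return -1;
    }
    return 0;   /* only now is it safe to clean out the cwd */
}
```

Had the original checked these return values, the failed chdir() would
have aborted the run instead of unleashing the delete on the home
directory.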

~~~
stinos
When you see chdir, or any notion of the current working directory being used
for anything: run as fast as you can. (or refactor if it's not too late).
Things I've seen because of software relying on it.. Sometimes it's just
directories/files it creates popping up all over the place, sometimes it's
'just' crashing, but yes sometimes it starts to erase and all hell really
breaks loose.

~~~
pachydermic
If you can't rely on current working directories then you have to specify any
file locations absolutely? That doesn't seem like a good idea because then
your code quickly turns into a hot mess if you ever have to change where stuff
lives.

This is such a stupid problem I run into a lot. Both alternatives (doing
things with absolute paths vs doing things entirely with relative paths) seem
to have a lot of downsides. Overall relative paths seems to be way better, but
then you leave yourself open to problems which "rhyme" with the one OP was
talking about.

~~~
dllthomas
As agwa and I mentioned in sibling comments, there are the ...at() functions,
which let you specify actions on paths relative _to a specific directory you
have a file descriptor for_. This not only avoids issues like the above
(failing to open the directory and then failing to check for failure will mean
you're passing -1 into unlinkat, which would simply fail) but will also keep
you talking about the same place if links are moved around somewhere up the
tree from where you are working.

~~~
tedunangst
Rearranging the tree above CWD isn't a problem, CWD will follow along just
fine (as in, be the same directory). Also, you're assuming AT_FDCWD won't be
-1, which could be reasonable but isn't guaranteed afaik.

~~~
dllthomas
_" Rearranging the tree above CWD isn't a problem, CWD will follow along just
fine (as in, be the same directory)."_

I was citing rearranging the tree is a potential issue with absolute
directories, not with relying on CWD - that certainly could have been clearer.

 _" Also, you're assuming AT_FDCWD won't be -1, which could be reasonable but
isn't guaranteed afaik."_

Interesting point regarding guarantees. It's not -1 on any existing OS that I
can find (it seems to be -100 on Linux and FreeBSD, -3041965 on Solaris, -2 on
AIX), and shouldn't be for precisely this reason, but something to bear in
mind if you are working on something more obscure that nonetheless has these
functions.

Of course, you shouldn't be relying on reasonable behavior from functions
passed a bad FD in general. It's just nice to have the additional defense when
that _does_ get missed.

------
spudlyo
_If a function be advertised to return an error code in the event of
difficulties, thou shalt check for that code, yea, even though the checks
triple the size of thy code and produce aches in thy typing fingers, for if
thou thinkest "it cannot happen to me", the gods shall surely punish thee for
thy arrogance._ [0]

[0]: [http://www.lysator.liu.se/c/ten-
commandments.html](http://www.lysator.liu.se/c/ten-commandments.html)

~~~
quotemstr
Counterexample: pthread_mutex_unlock. That function returns an error code, but
it cannot possibly fail in a well-formed program. Checking for an error for
mutex unlock is pointless: what would you do in response?

~~~
cpeterso
> cannot possibly fail in a well-formed program

I think that's your answer. A mutex error return probably indicates an
application bug, such as double unlock. You should probably assert or abort on
"can't happen" mutex errors.

Programmers are lazy. If they take the time to document an error return value,
then you should probably heed their warnings. :)

~~~
quotemstr
Indeed. I like using a VERIFY macro:

    
    
        #ifdef NDEBUG
        /* release build: still evaluate x for its side effects */
        # define VERIFY(x) ((x), 1)
        #else
        /* debug build: evaluate x and abort if it is false */
        # define VERIFY(x) assert((x))
        #endif
    

Then you can write

    
    
        VERIFY(pthread_mutex_unlock(&lock) == 0);
    

You don't need, however, to consider the possibility of your program
continuing to run after pthread_mutex_unlock fails.

~~~
unwind
Huh? You do realize that the standard assert() macro _already_ is compiled out
if NDEBUG is defined, right?

Your code could just as well be written as

    
    
        assert(pthread_mutex_unlock(&lock) == 0);
    

which of course has the added benefit of not inventing anything new, i.e.
being standard and immediately understood by anyone who knows the language and
its libraries reasonably well.

~~~
unwind
Replying to self since I can't edit: d'oh. Yes, I totally mis-read the
original code. I should have realized why the other comment questioning this
practice had been down-voted, heh.

Of course not unlocking the mutex in non-debug builds would be a problem.

Thanks, and sorry.

------
jwise0
In a similar family, note also that setuid() can fail! If you try to setuid()
to a user that has reached its ulimit for number of processes, then setuid()
will fail, just like fork() would for that user.

This is a classic way to get your application exploited. Google did it (at
least) twice in Android: once in ADB [1], and once in Zygote [2]. Both
resulted in escalation.

Check your return values! All of them!

[1]
[http://thesnkchrmr.wordpress.com/2011/03/24/rageagainsttheca...](http://thesnkchrmr.wordpress.com/2011/03/24/rageagainstthecage/)
[2]
[https://github.com/unrevoked/zysploit](https://github.com/unrevoked/zysploit)

~~~
agwa
Thankfully, setuid() no longer fails on Linux because of RLIMIT_NPROC, as of
2011[1].

Still, I agree with you 100%: check your syscall return values, especially
security-critical syscalls like setuid!

[1] [http://lwn.net/Articles/451985/](http://lwn.net/Articles/451985/) and
[http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.g...](http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=72fa59970f8698023045ab0713d66f3f4f96945c)

------
azinman2
I see a lot of comments blaming the programmer. This is completely the wrong
attitude.

Why are you treating the programmer like a machine? They're not a machine --
they're human. Regardless of whether they fully understand the API, things
should have sane defaults for HUMAN FACTORS reasons.

Bugs will always exist. The Linux kernel -- a code base with over a decade of
work put into it by many highly skilled people -- still has many bugs; that
alone shows bugs are inevitable.

The goal should be to assume people will do stupid things and make fatal
behavior more explicit/difficult. Do we really need -1 for kill to do such
behavior? How common is that anyway? It's a pretty destructive behavior, and
probably should be removed from kill. The human factors approach would say if
you really want that behavior then write a for loop to do over the list of
pids, because it should never be within easy reach especially for such an
uncommon scenario.

Apple's iOS API is similar. Try to insert a nil object into an array? Crash.
Try to reload an item in a list that's past the known objects index? Crash. So
instead of doing something sane like reloading the entire list, the user has a
shit experience because off by one errors happen easily especially in front-
end/model work [1 re: fb's persistent unread chat].

Not recognizing the human part of things leads to issues everywhere..
reminding me of this article on human factors in health care previously posted
on HN [2].

Conclusion: design for humans and default to non-fatal situations.

[1] [http://facebook.github.io/flux/](http://facebook.github.io/flux/) [2]
[http://www.newstatesman.com/2014/05/how-mistakes-can-save-
li...](http://www.newstatesman.com/2014/05/how-mistakes-can-save-lives)

~~~
aroman
In general I agree with your "don't blame the programmer" point, but I would
seriously hesitate to criticize fork(). Yes, in 2014, that behavior seems
uncommon and it seems like very poor design to lump such destructive behavior
into an otherwise meaningless "-1"...

but remember that fork was not written in 2014. It was written _forty-five
years ago_. I'm not saying it was a great API design decision back then, but
I'm willing to bet that it seemed a lot less "wrong" at the time.

~~~
cpncrunch
I think azinman2 was criticizing kill(), not fork(). It would make more sense
to have a separate system call, something like killall().

~~~
masklinn
You can criticise both, and more importantly criticise C for its inability to
create sensible APIs: in a good design, fork() would have exclusive domains
for a PID, an Error and a Child result and you couldn't confuse an error for a
pid.

~~~
SeanLuke
> criticise C for its inability to create sensible APIs

Where in the C manual is kill() described again?

------
jgrahamc
Quietly goes to check the last piece of C I wrote containing a fork():

    
    
        if (daemon && !test_mode) {
          int pid = fork();
          if (pid == -1) {
            fatal_error("Failed to fork");
          }
          if (pid != 0) {
            write_pid(pid_file, pid, !test_mode);
            exit(0);
          }
        } else {
          write_pid(pid_file, getpid(), !test_mode);
        }
    

Phew!

~~~
cperciva
I usually use switch with fork:

    
    
        if (daemon && !test_mode) {
            int pid;
            switch (pid = fork()) {
            case -1: /* Error */
                fatal_error("Failed to fork");
            case 0: /* In child */
                break;
            default: /* In parent */
                write_pid(pid_file, pid, !test_mode);
                exit(0);
            }
        } else {
            write_pid(pid_file, getpid(), !test_mode);
        }

~~~
colanderman
For daemonization, daemon(3) is better (EDIT: assuming you only care about
Linux). (It also chdirs to /, closes STD*, and detaches from the terminal.)

~~~
cperciva
Alas, daemon(3) is not POSIX.

------
mutation
Just noticed that in Perl the behavior is slightly different:
[http://perldoc.perl.org/functions/fork.html](http://perldoc.perl.org/functions/fork.html)
unsuccessful fork() returns undef, effectively stopping you from kill-ing what
you don't want to kill.

~~~
takeda
Similarly, Python throws an exception, and I bet other languages have their
own behaviors, but in C this is the only way (or at least the only
uncomplicated way) to do it.

When I read this article I thought it was preaching to the choir. I'm
actually quite surprised people programming C don't check for errors. That's
the only way the functions can provide feedback.

~~~
_yosefk
Nothing in C forces the API designer to use -1 as "bad PID" in one place and
as "the set of all PIDs" in another, however. Perl's undef isn't that
different from returning, gosh, -2 or any other bloody number _except_ -1 in
C.

~~~
takeda
Actually this is not true.

fork() returns pid_t type which is usually mapped to int32_t. For this type
there's no equivalent of Perl's "undef", the -1 is standardized as an error in
all system calls that return an integer.

As for the argument why not send -2 instead, well guess what? Other negative
values also have a meaning. Negative values in kill send signal to a process
group instead of a process.

It's not libc responsibility to predict all possible things the programmer can
do. Also unlike perl, C doesn't have exceptions so it can't exactly quickly
terminate on error showing what went wrong.

Imagine C throwing SIGSEGV every single time a function failed.

~~~
lmm
> As for the argument why not send -2 instead, well guess what? Other negative
> values also have a meaning. Negative values in kill send signal to a process
> group instead of a process.

That's the problem there. Kill takes an argument that's either a process id or
a magic number or a different magic number or.... Those should be different
functions, and the special cases like "kill all processes", "kill all
processes in this group",... should be some kind of enumeration type. But it's
C, so...

------
quotemstr
I wish posix_spawn were ubiquitous; it's a much better process-launching
interface than fork: it's naturally race-free and amenable to use in multi-
threaded programs, and unlike fork(2), it plays well with turning VM
overcommit off. (If overcommit is off and a large process forks, the system
must assume that every COW page could be made process-private and reserve that
much memory. Ouch.)
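
A small sketch of what that looks like in practice (POSIX; `spawn_echo`
is an illustrative wrapper). Note that, unlike fork(), posix_spawn()
reports failure via its return value, an errno-style code, rather than
via -1:

```c
#include <spawn.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Launch `echo msg` without ever calling fork().  Returns the child's
   exit status, or -1 on failure. */
static int spawn_echo(const char *msg)
{
    pid_t pid;
    char *argv[] = { "echo", (char *)msg, NULL };

    int rc = posix_spawnp(&pid, "echo", NULL, NULL, argv, environ);
    if (rc != 0) {                       /* rc is an errno value */
        fprintf(stderr, "posix_spawnp: %s\n", strerror(rc));
        return -1;
    }
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```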

~~~
FooBarWidget
Unfortunately, posix_spawn is woefully underpowered. I can't make the child
process a session leader (setsid) or process group leader (setpgrp). I can't
set a working directory. Etcetera.

~~~
quotemstr
Yeah --- but at least it's fairly obvious how to add platform-specific
extensions that won't conflict with future standards.

~~~
wiml
And extensions to let us specify failure-case or other behavior of those
extensions, and so on. But at that point, we're already heading down the road
of implementing a tiny DSL for "the program that posix_spawn() should run
after creating the new process but before exec()ing the new executable". Why
not simply write that code in the host language? You could specify a thunk of
code to be sent to the new process and executed there. Oh, you'll also need to
pass any data structures that code relies on--- pass a closure, not just a
thunk. And garbage-collected references to any system objects that those data
structures rely on. Congratulations, now you have fork()! If you squint a
little, that's exactly what fork() provides you --- a closure and
continuation.

~~~
quotemstr
Yes, you end up with a tiny DSL for specifying transformations to make to a
child process: the key difference is that the kernel can execute this DSL
much more efficiently than it can code written in the host language: fork
closes over the entire world, and posix_spawn doesn't have to do that.

------
jbb555
This to me is a good example of why exceptions in modern languages are a good
way to handle errors. In this case the user basically ignored the error
return from fork() and then accidentally used it in kill.

If fork() had thrown an exception for an unexpected failure then the user
could not have accidentally ignored it in the same way.

I realize that this is not appropriate for a system call but it seems like a
good example of why handling errors using exceptions is helpful sometimes.

------
AnimalMuppet
Somewhat OT, but in the same neighborhood:

Standard file handles are another thing you should not assume are there
(though I'm not sure how to test for it programmatically).

We once had a user that, for whatever reason, tweaked their Unix installations
to not pass an open stderr to processes - they just got stdin and stdout (that
is, file handles 0 and 1, but not 2). If you wrote to stderr anywhere in your
program, it wrote to whatever was open on handle 2, which was _not_ a stderr
that the OS passed in.

Yeah, that's a pretty insane thing to do, but somebody was doing it...

~~~
djcapelis
Wow that's a really wild story, but I think it's also pretty different. fork()
returning -1 is defined behavior, as is true for many functions. Whereas not
having stderr open defies everything about the C standard I/O. (K&R B1, 7.5 &
7.6)

Note however, that strictly speaking stderr does not have to be 2. It can be
any number, but it has to be whatever the include file says it is, so if you
don't specify the stream as stderr but instead write to stream 2, that would
potentially be a problem.

That said, if for some reason you have a program completely defying the C
standard, you can test whether the streams are open (and they are explicitly
defined as having to be open) using fcntl and testing for EBADF at the very
beginning of the program.
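
That check can be sketched roughly like this (POSIX; the helper names
are invented): probe each standard descriptor with fcntl() and pin any
missing one to /dev/null, so the lowest-free-fd rule can't later hand
that number to a real file:

```c
#include <errno.h>
#include <fcntl.h>

/* Returns 1 if fd refers to an open descriptor, 0 if not. */
static int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}

/* Call early in main(): any of fds 0-2 that the parent failed to pass
   gets pinned to /dev/null, so later writes to "stderr" can't scribble
   on whatever file happens to land on descriptor 2. */
static void sanitize_std_fds(void)
{
    for (int fd = 0; fd <= 2; fd++)
        if (!fd_is_open(fd))
            open("/dev/null", O_RDWR);  /* open() fills the lowest gap */
}
```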

~~~
__david__
> Note however, that strictly speaking stderr does not have to be 2. It

Nope. On POSIX systems stderr is defined as 2:

[http://pubs.opengroup.org/onlinepubs/9699919799/functions/st...](http://pubs.opengroup.org/onlinepubs/9699919799/functions/stdin.html)

    
    
      > The following symbolic values in <unistd.h> define the file descriptors
      > that shall be associated with the C-language stdin, stdout, and stderr
      > when the application is started:
      > 
      > STDIN_FILENO
      >     Standard input value, stdin. Its value is 0.
      > STDOUT_FILENO
      >     Standard output value, stdout. Its value is 1.
      > STDERR_FILENO
      >     Standard error value, stderr. Its value is 2.

~~~
djcapelis
Not quite. That is according to POSIX, but not the C standard.

stderr is defined by the C standard. POSIX is a standard followed by many of
the systems that run C.

Strictly speaking, stderr does not have to be 2. It wouldn't comply with POSIX
in that case, but it would still be C.

~~~
__david__
You are not correct here: There are no fds in the C standard. The only thing
defined is fopen/fread/fwrite (which are FILE*). The open/read/write API is
only defined by POSIX, where 2 is most definitely stderr.

~~~
djcapelis
I cited the exact portions of K&R that specify stdin, stdout and stderr in my
original post: K&R B1, 7.5 & 7.6

Section 7.6 explicitly says all three must be open when the program begins.
There are equivalent sections in the formal standards, I cited K&R because it
was on my shelf.

Please stop and bother to check your facts before continuing. You're very
close to being accurate (POSIX does define open, but C defines stderr and some
functions to print to it, the underlying mechanics are the choice of the
implementing system.) But don't you think it would be nice to check that I
actually am before writing yet another post simply asserting I'm wrong?

~~~
__david__
You are wrong because you keep asserting that file descriptor #2 is not
defined to be standard error. The only place that defines the interface that
uses file descriptors is POSIX and it defines standard error to be file
descriptor #2, end of story.

The sources you cite say nothing of file descriptors; they are all references
to the standard FILE* interface in C. Those are opaque pointers and have
nothing to do with 0, 1, or 2.

You may be confused because I abbreviated "standard error" as "stderr", yet I
was never talking about the C standard global "FILE *stderr". That was sloppy
of me.

~~~
djcapelis
When you printf to stderr, _nothing_ in that function call is dependent on
POSIX semantics. It is dependent on the semantics of C, which requires a
stream called stderr to: 1) exist 2) be open when the program starts

POSIX implements this using file descriptors and specifies that stderr is 2.

I said: "strictly speaking, stderr does not have to be 2" which is true. A
system is welcome to implement file descriptors and make stderr's something
other than 2. It will be a blatant violation of POSIX, but complying with
POSIX is optional. Systems that don't just aren't POSIX systems. Complying
with the C standard? Not really optional.

~~~
__david__
> When you printf to stderr, nothing in that function call is dependent on
> POSIX semantics.

fprintf(), but yeah. That is exactly what I've been saying, too.

> I said: "strictly speaking, stderr does not have to be 2" which is true.

Ok, but that's just like saying, "strictly speaking it's valid C to write a
bunch of zeros to a file and call it a JPEG."

It might be technically true (complying with the JPEG spec is optional for C
programs, too), but it's in no way a reasonable thing to argue for.

~~~
djcapelis
If I was arguing that it was reasonable, I'd have said that. I merely stated C
allows it. Which it does.

As for the printf/fprintf flub, totally.

~~~
__david__
> If I was arguing that it was reasonable, I'd have said that. I merely stated
> C allows it. Which it does.

That's the thing. The C standard does not _allow_ it. The C standard merely
never mentions it, which is a different thing entirely. Hence my analogy to
JPEGs (which the C standard never mentions either).

And I meant the argument itself is unreasonable. It makes no sense! I could
just as well argue that the TCP/IP RFCs allow fd 2 to be something other than
standard error. Or the HTTP 2.0 spec. Or the Ecmascript spec. Arguing that
something is allowed by a spec that never mentions it and _has absolutely
nothing to do with it_ is not an argument.

------
IgorPartola
Back in the day I had a Motorola Atrix (remember those? First dual core
Android phone, best thing since sliced bread, abandoned by Motorola a few
months after launch?). Well, one of the ways to root it was to keep forking a
process until the phone ran out of memory. After fork failed, you were left
with a process that for some reason was running with root privileges...

~~~
TheLoneWolfling
Probably tried to setuid and didn't bother to check the result.

------
Aurel1us
Just as a reminder: "So, malloc on Linux only fails if there isn’t enough
memory for its control structures. It does not fail if there isn’t enough
memory to fulfill the request." \-
[http://scvalex.net/posts/6/](http://scvalex.net/posts/6/)

~~~
quotemstr
Malloc can also fail if you're out of _address space_ without necessarily
being out of _memory_.

NT does a much better job of separating these concepts than Unix-family
operating systems do. Conceptually, setting aside a region of your process's
address space and guaranteeing that the OS will be able to serve you a given
number of pages are completely different operations. I wish more programs
would use MAP_NORESERVE when they want the former without the latter. (I'm
looking at you, Java.)

One day, perhaps when I am old and frail, we will achieve sanity and turn
overcommit off by default. But we're a long way from being able to do that
now.

~~~
klodolph
These days, I describe malloc() as "a function which allocates address space",
to avoid confusion. Which means it makes sense that malloc() returns NULL if
you are out of address space, even if you have lots of memory. (But so many
people don't check malloc()'s return anyway...)

------
kazinator
I once did

    
    
        rm -rf $PREFIX/usr/lib
    

in a Bash script being run as root. PREFIX was misspelled, and set -u was not
in effect, so the misspelled variable silently expanded to nothing ...

~~~
Macha
Or Bumblebee's pretty infamous by now:

    
    
        rm -rf /usr /lib/bumblebee

------
zokier
Who needs type safety when we got integers.

~~~
AnimalMuppet
Is this just using some vague connection to ride on one of your favorite hobby
horses? Or does this have a connection to the article, and I missed it?

~~~
mkehrt
The problem is that C and POSIX don't (idiomatically) provide rich enough
data types to force checking the error condition. The fact that the error is
signaled by a magic integer (-1) is horrible. In a language with stronger
types and richer data structures, one can have a return type that is a
disjunction of {failure, parent, child}, so you can never accidentally treat
a failure as a PID.

In a functional language this would look like a datatype (essentially a
generalized enum), while in an OO language you would use different subclasses
of a common superclass.

In C you _can_ return a tagged union and check the tag, but nothing forces
you to do the check. A user of this API can just go ahead and assume the
success branch of the union. Furthermore, this isn't idiomatic POSIX, so it
is never done.

[edit: "subclass" -> "superclass"]

~~~
dllthomas
You could return a pointer or null. That would successfully force the "did it
succeed" check, but of course raises questions about memory management.

Tagged union is probably the best approach. It doesn't prevent skipping the
check, but it at least makes "thing I am supposed to use" different than
"thing I am supposed to check".

------
trippy_biscuits
"Unix: just enough potholes and bear traps to keep an entire valley going."

If you don't understand how to use sharp tools, you may hurt yourself and
others. Documentation for fork() clearly explains why and when fork() returns
-1. Those that find the man page lacking or elusive may get more out of an
earnest study of W. Richard Stevens' book, Advanced Programming in the UNIX
Environment. In any case, every system programmer should own a copy and
understand its contents.

~~~
ben336
> If you don't understand how to use sharp tools, you may hurt yourself and
> others.

It's still bad API design when naively handling an error case kills
everything. Is there an inherent reason that the error value for a pid has to
be the same as the "all pids" value? Unless there's a very compelling reason,
it seems like very poor design, well documented or not.

~~~
din-9
The inherent reason is that -1 is the most common error return code in C based
APIs. The problem is not naively handling an error case, it's not handling an
error case. Using a different value might avoid calling killall -1, but the
program would still be incorrect.

This is the same sort of argument as strlcat vs strncat, and people can't
agree on that one.

------
ajarmst
Stevens and Rago, "Advanced Programming in the Unix Environment, Volume II",
page 211,212.

    
    
        if ((pid = fork()) < 0)
            err_sys("fork error");
    

is idiomatic in Unix.

~~~
wiml
err_sys() is something you have to provide yourself, though, which takes you
out of the flow of the code you're thinking about, which is half the reason
people don't write error handling in the first place.

Once I discovered the existence of the BSD err()/errx()/warn() functions,
though, the error handling in even my quick one-off programs became much
better and more informative.

    
    
       pid_t child = fork();
       if (child < 0) {
         err("fork");
       } else if (child == 0) {
         ... in child
       } else {
         ... in parent
       }
    

is idiomatic, quick to write, and produces useful error messages when that
"throwaway" program starts failing years later.

------
VLM
"Neither of them fail often"

See /etc/security/limits.conf and nproc and "fork bomb"

Aside from intentional fork bombs, I've seen limits like this used
deliberately, in the spirit of an OOM killer, to keep a machine alive for
debugging/detection of problems: 100 "whatever" processes will kill this
webserver, making it impossible to log in and diagnose (much less fix), so
we'll limit it to 50 processes in the OS.

I've also seen it in systems where people are too lazy to test whether a
process is already running before forking another, and the system doesn't
like multiple copies running (like a keep-alive restarter pattern). If ops
has no access to the source to fix that, or no one cares, then just run it in
a jail where you only get two processes: the restarter-forker and the forkee.
Then hilarity can result if the restarter thinks the PID of the failed fork
means something, like sending an email alert or logging the restart attempt.
"Why are my logs now gigabytes of ERROR: restarted process new pid is -1?"

------
ionelm
Seems Python handles this correctly (by raising an exception):

    
    
        >>> resource.setrlimit(resource.RLIMIT_NPROC, (0, 0))
        >>> os.fork()
        Traceback (most recent call last):
          File "<ipython-input-7-348c6e46312a>", line 1, in <module>
            os.fork()
        OSError: [Errno 11] Resource temporarily unavailable

~~~
illumen
There are plenty of little nice things like this in python that save you. So
if you're porting python code to other platforms... be very careful!

------
prasoon2211
The first time I learned of fork (from an OS book), the example had three
branches in the if statement after fork - and the first tested for a negative
pid. I suspect the reason this link has 400-odd upvotes is that more people
aren't learning OS fundamentals the right way from the beginning. Or maybe my
OS book was just good. IDK.

~~~
lchengify
This. One of the most useful classes I took in undergrad was implementing a
scheduler and an I-node disk. Nothing shows you all the ways fork() can fail
like having to implement it.

Every system call can fail, even if it doesn't do something obvious like use
disk resources. Ignoring this is how subtle bugs appear that seem
unreproducible until you implement correct error handling.

------
brazzy
And this right there is why exceptions are a superior mechanism for
announcing errors...

~~~
andrewchambers
Multiple return would be fine too.

pid,err = fork()

~~~
knocte
Are you sure?

And what if $programmer forgets to check what's in err? What would pid
contain in that case?

I mention this because the syntax you quoted looks like Go's, so I'm guessing
Go would simply ignore the error in this case.

With a proper exception mechanism, however, if you don't catch the problem it
bubbles up, and the program doesn't continue with wrong data (which is a good
thing => fail fast!).

This is the biggest downside of GoLang IMNSHO.

~~~
MichaelGG
Yeah, better than returning a tuple, it should return a sum type. Then you
would need to deconstruct it, like how people suggest using a switch:

match fork() -> | Error(errno) -> ... | Pid(pid) -> ...

Or a general "Choice" sum, perhaps using phantom types so int<err> isn't
compatible with int<pid>. But then all of a sudden, instead of a single word
being returned, a tag and possibly variably-sized result has to be returned,
and that's quite a hassle which doesn't fit well with C.

~~~
masklinn
A "Choice" (Either in Haskell, Result in Rust) wouldn't work for fork() as it
can have 3 results, and you'd want the `Child` case cleanly and easily
separated from `Pid`.

~~~
mbel
I think the parent meant only a sum type, not a concrete example of it such as
Either of Haskell, which would surely not suffice here. In Haskell you would
probably define a new sum type for this occasion, e.g.:

    
    
        data ForkResult = Failure | Parent Int | Child

------
serve_yay
Out of "-1 as failure return value" and "-1 to signal all possible
processes", at least one is a bad idea.

~~~
leni536
It's the second one. kill() shouldn't have the kill(-1,...) functionality,
there should be a separate syscall for this.

------
mcguire
If you have set a non-root user's process limits correctly, sending SIGKILL to
all of that user's processes is likely a perfectly fine response to their
fork() failing.

If you haven't limited the number of processes a given non-root user can start
to some value the machine can handle, sending SIGKILL to all of the user's
processes is probably not going to do any more damage.

If a program running as root _doesn't_ correctly handle fork() failing,
_someone_ needs to be taken out back and beaten with a stick. Maybe the person
who wrote the program, maybe the person who ran it as root. But somebody.

------
Mister_Snuggles
This reminds me of the time I was telnet'd (since SSH wasn't a thing at the
time) into a remote SunOS/Solaris server. At the time my only Unix experience
was with Linux.

"killall -9 httpd" gave an unhelpful error message. "killall httpd" also gave
an unhelpful error message. "killall", which would give you usage instructions
in Linux, killed all processes on the system. Reading this article makes me
figure that killall was likely a frontend to kill(-1, ...).

That day I learned a valuable lesson about reading man pages and understanding
that not all unixes are the same.

~~~
tracker1
That's funny... I almost always try "command --help" first if I'm not sure. Of
course some may point out "man command" but I always find man painful, and
revert to google.

~~~
acdha
At least one hairy old Unix killall took no arguments – so "killall --help"
was just as bad…

------
japaget
See also
[https://news.ycombinator.com/item?id=8189968](https://news.ycombinator.com/item?id=8189968)
for another UNIX trap for the unwary.

~~~
mnw21cam
I sometimes drop a "--help" or "--version" file in a directory I don't want
to accidentally run "rm *" in.

------
quackerhacker
Sometimes these threads are just serendipity!

I just recently finished a multithreaded program where I found that the only
effective way to obtain the pids [on Linux: getpid()] of the spawned child
processes was to have them report back over a common non-blocking pipe
[fcntl(pipefd[1], F_SETFL, O_NONBLOCK)].

In other, more humorous words, as a "parent," it's great to know what your
"child" is doing (or in this sense, who your child is, i.e. the actual pid),
instead of just kill SIGTERM-ing them.

------
JD557
Is there any clean way to use an Option/Maybe monad in C (or C++)? It should
be a simple way to solve problems where error codes are valid inputs of other
functions.

The simplest way I can think of is:

    
    
        struct maybe {
            bool isEmpty;
            void *value;
        };
    

Although I wonder whether, using C++ templates, classes and operator
overloading, it's possible to make a more practical implementation (using
void* does seem like a bad idea).

~~~
sbmassey
In C++ std::optional<T> does the trick, as of version 11, and
boost::optional<T> previously.

In C ... hrm you could probably wrangle some macros around that struct if you
were desperate.

~~~
pbsd
Nitpick: std::optional<T> does not exist in C++11, nor in the recently-
approved C++14. It is, nevertheless, in the process of getting into the
standard, and is already present in GCC 4.9's libstdc++ as
std::experimental::optional.

------
yokom
I don't use fork() that often, but my own paranoia is why I always test for <=
0 instead of == 0. Some people think I'm weird for doing something like:

    
    
      if len(some_list) <= 0:
          # Test for empty list
    

But it's just my way of covering my ass in case the laws of physics change
during execution, or just in case weird bugs exist like those found in this
article.

~~~
kelnos
Perhaps I'm misunderstanding what you're saying, but that's wrong too. With
fork() you need to _always_ test 3 cases:

    
    
        * -1: error
        * 0: success, in child
        * > 0: success, in parent
    

Testing for <= 0 would cause you to think there's an error when you're really
just the child process.

~~~
dllthomas
Or that you're the child process when really there's an error, which might go
undetected longer.

~~~
djcapelis
Thankfully that case will never happen because you can't be a child if there
was an error. :)

Also the check as presumably written would never _miss_ an error, it would
just potentially assume valid return values were also errors.

~~~
dllthomas
_"Thankfully that case will never happen because you can't be a child if
there was an error. :)"_

Uh, no. If there is an error, there will be no child process _but the parent
process will think it is the child_.

From the fork man page, emphasis mine: "On success, the PID of the child
process is returned in the parent, and _0 is returned in the child._ "

_"Also the check as presumably written would never miss an error, it would
just potentially assume valid return values were also errors."_

That was already discussed as a possibility; I was addressing the other. In
the case you describe the software would never work at all, even when fork
successfully forks, because the child will always think there was an error and
presumably fall over rather than getting things done. That's probably the
better case, in terms of development progress, because it would be spotted and
fixed right away. But hopefully fixed correctly, and not converted to the
broken-but-working-when-fork-succeeds other variant that also uses "<= 0".

~~~
djcapelis
Ah, I see what you mean! Sorry for the misunderstanding, I had trouble parsing
your post.

~~~
dllthomas
No worries :)

------
nikita
Yes, fork can fail, and we ran into this a few years ago at MemSQL. The
problem was that MemSQL would allocate a lot of memory, and Linux wouldn't
allow a process that large to fork. A remedy is to create a separate process
early and talk to it via TCP. This small, low-memory-consumption process is
responsible for the fork/exec work.

~~~
leni536
Why doesn't it use vfork() for fork+exec? AFAIK vfork() doesn't clone the
allocated memory of the parent process and is only useful for calling exec
immediately after forking.
[http://linux.die.net/man/2/vfork](http://linux.die.net/man/2/vfork)

------
walski
Thanks! Definitely in my "shit I should know but didn't before HN schooled
me" top 10 :)

~~~
vigneshv_psg
I had a similar thought when I read this. It would be nice if you could share
other similar stuff in that list of yours! :-)

------
alan-crowe
Is there a test command that asks the operating system to run a program but
cause the _n_th fork to fail? I would be more diligent about writing code
that handles rare errors if I could create test cases. Writing code that I
cannot test feels wrong.

------
CSDude
I'm a teaching assistant for an OS course, and I grade projects. I constantly
remind students to check the return values of their system calls; it is
consistently the main issue in their code.

------
wmil
The kill -1 behaviour seems like a bug.

Sure, it's documented, but how often is it done on purpose? It seems like
something that should at least be a separate function.

~~~
dllthomas
If I set off something that proves to be a fork bomb as an unprivileged user,
kill -1 might be a reasonable approach. And note that kill exists as a builtin
in bash, so you can run it without forking a process.

Hopefully doesn't happen often, but potentially very useful in narrow
circumstances. The problem is it being -1, not it being available. If it took
an argument to kill that you'd never accidentally generate as a PID then it
might as well be a different function. Of course, with negative numbers
otherwise referring to process groups, there's not a lot of room remaining, so
yeah...

------
jdrago999
Yes, yes it can. That's why it's always:

    
    
        defined(my $pid = fork()) or die "Cannot fork: $!";

------
dasmithii
I'm uncertain that I've ever checked for errors after calling fork.

Thanks for providing an impetus to do so.

------
donatj
And this is why I think Go-lang and its multi-return is the way of the future.
In Go you are _required_ to handle errors. If you want them to go away you
have to explicitly use an _, and that's really easy to find in the code and
shame the person who did it. Nothing fails silently. Nothing fails via primary
return. It is such greatness it is hard to express.

~~~
nwmcsween
You can easily do multi-return in C by returning a struct; it just doesn't
hold your hand.

~~~
donatj
But as a language construct and the agreed upon way of returning errors it's
much more powerful. Returning a structure you can just ignore the error piece.
Returning an error however in Go, you have to explicitly handle it. If you
don't, it's literally a compile error.

------
jacquesm
That's why you read the manpage on a function before you apply it rather than
just cutting-and-pasting the first bit of code google returns when you search
for 'fork example unix'.

(In this particular case that actually returns (for me) a bit of code that
gets it right.)

------
jheriko
i've never actually used fork... feeling glad now. i probably would have not
realised this...

reminds me of allocating memory for an error message to tell someone they are
out of memory. :)

------
runarb
Easy to test out, too. In C:

    
    
      #include <unistd.h>
      
      int main(void)
      {
             /* warning: fork bomb -- only run under strict ulimits */
             while(1) {
                     fork();
             }
      }

~~~
tdicola
Haha too cruel. I remember in one of my early college CS classes the professor
told us about fork bombs and wasn't sure if they still worked on the Sun Fire
servers we were using. About 5 minutes later someone piped up and confirmed
that yep they still worked at bringing the server down. :)

~~~
X-Istence
That's why user accounts should have appropriate resource limits set per user.

------
general_failure
This is why we need checked exceptions.

Imagine if C/POSIX had a checked ChildProcessCloneException

------
gre
C programming 101.

------
smegel
I see it happen quite often on boxes with limited memory and hungry processes.

------
danielbhall001
Lol this is a true war HAHA..

