Normally if the process group leader dies the system will send SIGHUP to all the members of the process group. The default behavior is to exit. So on Unix systems the first process calls setpgrp() and becomes the leader of a new process group. All of its children will, by default, be members of the group, and their children, and so on. When the leader dies they'll all get SIGHUP and die as well. They have the option (if needed) to ignore that signal or spawn their own process group.
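For anyone who hasn't done this: a minimal sketch of the pattern in C (error handling omitted; `sleep 1000` is just a stand-in for real work). The child calls setpgid(0, 0) to lead a new group, and any descendants that stay in that group can later be signalled all at once with killpg():

```c
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t child = fork();
    if (child == 0) {
        setpgid(0, 0);                 /* child: become leader of a new process group */
        execlp("sleep", "sleep", "1000", (char *)NULL);
        _exit(127);
    }
    setpgid(child, child);             /* parent does it too, to close the race */
    /* ... real work ... */
    killpg(child, SIGHUP);             /* signal every member of the child's group */
    waitpid(child, NULL, 0);
    return 0;
}
```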
Session leaders are similar in some ways. When a session leader dies SIGHUP is sent to all members of that session's foreground process group and its controlling tty is released but IIRC it doesn't do anything about background process groups. (Session leaders control this with tcsetpgrp()).
Structured concurrency as commonly understood requires children to not outlive parents, recursively. For example, if A spawns B and B spawns C, and B cancels C with a 30s timeout, then 5s later A cancels B with a 5s timeout, the timeout for C must effectively be reduced by 20s [1].
But the issue here is more basic than timeouts. If we are to use process groups in the example, then B must be a process group leader for C and simultaneously belong to the process group led by A, which is impossible on Unix—that’s what TFA means by “[the] process group mechanism [is] one-level deep”. Close but no cigar.
(On the other hand, no reason this can’t be an extension to—I would even say a rationalization of—the process group mechanism. It doesn’t exist right now, though.)
I'm puzzled, too. The example is about spawning threads in Rust, not processes under UNIX. In Rust, if the main thread dies, all other threads get killed. So child threads outliving the main thread is not a problem.
A harder problem is making sure that the program shuts down if some thread panics. There are several ways to do that, but it's not automatic.
Incidentally, regular std::sync mutexes are "poisoned" if the thread holding them panics. Anyone trying to lock the mutex gets an error. However, parking_lot mutexes do not do this. If the thread holding one of those panics, the mutex simply unlocks. That won't cause memory unsafety, but it can expose data the panicking thread left in an inconsistent state.
I think the example is the code for the child process, which uses a separate thread to block on stdin for the lifetime of the process. As soon as the parent process dies (there's no example code for it?), the child's stdin unblocks with EOF, and the monitor thread detects that and panics.
Well for one thing, the child process could handle sighup, or just mask it and ignore it. And as another comment mentioned, it only goes one level, so it won't signal grandchildren. But, if you control the code of the child, as this article assumes, you could probably deal with that.
The article is wrong FWIW. By default a child is a member of the same process group as its parent no matter how many levels deep.
Shells push each job into a separate process group, with the job's first process as the group leader. That's why suspend works: the terminal delivers SIGTSTP to that whole foreground process group.
Depending on how an interpreter is set up, a shell script might end up under its own process group if the shell always creates one... though usually, when invoked as an interpreter, it shouldn't be making a new one.
Well, process groups aren't hierarchical. So if a child (or other descendant) process sets itself up as a process group leader, so that it can kill its own children, it escapes your process group. So it isn't composable.
Process groups aren't hierarchical because it is a two-level namespace: process groups (jobs) are members of a session and all processes in a job must be members of the same session. All child processes inherit their parent's process group and session. If they didn't then such a program would not work correctly in any shell pipeline and a bunch of other behavior would break.
I'm assuming all processes are under your control in this scenario. It should be very rare for any process that isn't a shell or daemon to create its own process group or session.
Sessions can optionally have a controlling terminal, one (and only one) foreground process group, plus any number of background process groups.
The foreground job is how the system decides who receives signals generated by the TTY. If the controlling TTY disconnects or is closed the session leader gets SIGHUP which it usually uses to signal all the process groups it created. If the session leader dies or exits then all processes in the foreground process group get a SIGHUP.
If you exit an interactive shell or close your SSH session the system generally cleans everything up as you would expect. The processes involved generally don't implement any special handling code to make this happen: the session leader (your login shell) uses the standard Unix job control APIs to manage the session + process groups and the expected behavior falls out naturally.
Unless there is a good reason to implement this differently: don't. Use the standard Unix job control APIs as they were designed to be used.
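For concreteness, here's roughly what "the standard Unix job control APIs" amount to, sketched in C as a shell would use them to run a single foreground job (no error handling, no background jobs, and a real shell also deals with SIGTTIN, SIGCHLD, and more):

```c
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

void run_foreground_job(char *const argv[]) {
    signal(SIGTTOU, SIG_IGN);                 /* shells ignore this so tcsetpgrp() won't stop them */
    pid_t pid = fork();
    if (pid == 0) {
        setpgid(0, 0);                        /* the job gets its own process group */
        tcsetpgrp(STDIN_FILENO, getpgrp());   /* make it the foreground group */
        signal(SIGTTOU, SIG_DFL);             /* restore default job-control signals */
        signal(SIGTSTP, SIG_DFL);
        signal(SIGINT,  SIG_DFL);
        execvp(argv[0], argv);
        _exit(127);
    }
    setpgid(pid, pid);                        /* parent does the same, closing the race */
    tcsetpgrp(STDIN_FILENO, pid);             /* hand the terminal to the job */
    int status;
    waitpid(pid, &status, WUNTRACED);         /* returns on exit or stop (Ctrl-Z) */
    tcsetpgrp(STDIN_FILENO, getpgrp());       /* take the terminal back */
}
```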
>Well for one thing, the child process could handle sighup
I've come across programs that use SIGHUP for various things like config reloading when they could use USR1/USR2 instead. UNIX signals are kind of crap in my opinion. USR1/USR2 should probably not even exist, there should be some other way to 'message' processes without IPC.
This cannot be achieved using a pattern equivalent to { ... }.
The parent process itself might be abruptly killed, and the finally blocks / destructors / atexit hooks are not run in this case.
It is kinda bonkers that after 30 years Linux still doesn't have some kind of inescapable, recursive process group mechanism. How many tens of thousands of hours of programming time would have been saved over the years if there were a way to:
1. Your process can create a new process group ID (and write it out to stable storage if it wants)
2. Add an argument to fork() so the same process can pass that ID into fork(), the child of the fork wakes up permanently in that group, and any child processes later created by that child will also be in that group
3. Any new groups created by any process in the group are always contained in the original group, you can never leave a group but you might be in multiple groups.
4. There's a way to kill an entire group and verify that everything for an identifier is dead.
I think Solaris and some of the BSDs had something like this, so I wish Linux would add one too (though, I guess Linux has managed without one for so long that maybe that's proof enough that it's not really needed, and worst case you can always just reboot the box - nuking from orbit is the only way to be sure)
1. It is relatively complicated to use, and even harder to use properly. From what I understand, to reliably kill all processes you need to freeze the cgroup, then list the pids in it, then send a signal to each of those pids (see the sketch below). That is pretty involved, requires a separate supervisor process, and isn't 100% reliable in cgroup v1.
2. It requires root, or at least having control of a cgroup delegated to the process. You might be able to use user namespaces, depending on the distro and kernel, but that makes the implementation even more complicated.
3. It is possible to escape the cgroup, if the child process has permission to write to the task file of another cgroup.
Cgroups are useful, and can be used for this use case in some common scenarios, such as docker and systemd.
But as a general tool for structured concurrency that normal processes can use, it doesn't quite fit the bill.
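To make point 1 above concrete, here's a rough sketch in C of the freeze-list-kill dance against cgroup v2, with error handling omitted. The `/sys/fs/cgroup/myapp` path is a made-up example and assumes the group has been delegated to you; newer kernels (5.14+, if I remember right) also have a `cgroup.kill` file that does all of this with a single write.

```c
#include <signal.h>
#include <stdio.h>

#define CG "/sys/fs/cgroup/myapp"          /* hypothetical delegated cgroup */

static void write_str(const char *path, const char *val) {
    FILE *f = fopen(path, "w");
    if (f) { fputs(val, f); fclose(f); }
}

int kill_cgroup(void) {
    write_str(CG "/cgroup.freeze", "1");   /* freeze first so nothing can fork or escape */
    FILE *procs = fopen(CG "/cgroup.procs", "r");
    if (!procs)
        return -1;
    int pid;
    while (fscanf(procs, "%d", &pid) == 1)
        kill(pid, SIGKILL);                /* v2 lets you kill frozen processes */
    fclose(procs);
    write_str(CG "/cgroup.freeze", "0");   /* thaw; anything left now sees its signals */
    return 0;
}
```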
It's sort of weird that portability with respect to cgroups is now primarily about whether your Linux kernel supports the revision of cgroups you care about, rather than about whether you have a kernel that understands cgroups at all.
Two. And what's funny, I saw some of the fallout from the transition from v1 to v2 only last year at my last gig. The company upgraded Debian, and the Debian maintainers had opted to go v2-only in the kernel build. However, the version of the JVM the company was using did not fully support v2. Without the cgroup support, each JVM in a container thought it had the resources of the whole system. It was a cluster (heh) fuck: all of the affected services OOM'd thinking they had more heap headroom than they did, and the k8s cluster was churning pods like no tomorrow.
Aren't you talking about cgroups? They "allow processes to be organized into hierarchical groups whose usage of various types of resources can then be limited and monitored." `man cgroups` and `man unshare`.
Or even just have a way for a process to tell the OS that when it dies, all descendant processes should be sent a signal (including SIGKILL), with no way for them to opt out of it.
99% of the time when people say "I wish Linux had ...", the problem is that they're not using systemd.
It's possible to do it without systemd of course, but that involves an ad hoc, informally-specified, bug-ridden, slow implementation of half of systemd anyway.
If you do this, and you should, you should probably set up the child's stdin to be a pipe, not whatever the parent's stdin happens to be, as the comment about /dev/null implies.
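A sketch of that wiring in C (error handling omitted; `./child` is a placeholder binary): the parent holds the only write end of the pipe, so when the parent dies for any reason the child's blocking read on stdin returns 0 and its watchdog thread can bail out.

```c
#include <sys/types.h>
#include <unistd.h>

pid_t spawn_watched_child(void) {
    int fds[2];
    pipe(fds);                        /* fds[0]: read end, fds[1]: write end */
    pid_t pid = fork();
    if (pid == 0) {
        close(fds[1]);                /* child must not hold the write end */
        dup2(fds[0], STDIN_FILENO);   /* child's stdin is now the pipe */
        close(fds[0]);
        execlp("./child", "./child", (char *)NULL);
        _exit(127);
    }
    close(fds[0]);                    /* parent keeps only the write end... */
    return pid;                       /* ...which the kernel closes when the parent dies */
}
```

On the child side, the watchdog thread just blocks in read(STDIN_FILENO, ...) and exits when it returns 0. If the parent spawns other children, mark the write end close-on-exec so they don't keep the pipe alive.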
Maybe not in /dev/, but if you make two named pipes ("mkfifo") you can read from one and write to the other, and they'll block since nothing is attached to the other end.
I had a hell of a time a few months ago debugging why my child processes were dying, before learning that PDEATHSIG=9 (don't ask) kills child processes when the thread that created them in the parent process exits.
My debugging was not aided by the fact that disabling the code where I set PDEATHSIG had no effect, since someone else's code was invisibly setting it regardless.
I'm a little rusty here, but I think you are talking about the fact that the exit status of the child process has to remain available until the parent process can reap it. Until that happens the child process is in the zombie state. You indicated that the creating thread needed to exit, but that seemed a bit too specific to me, I think that any thread can reap the exit status of a child process.
Related to this is the double-fork pattern to avoid zombie processes (and a couple other issues) when initiating a daemon process.
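Since it came up, the double fork looks roughly like this in C (error handling, plus the umask/chdir/fd-redirection work a real daemonize() does, all omitted):

```c
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

void spawn_daemon(void) {
    pid_t first = fork();
    if (first == 0) {                 /* intermediate child */
        setsid();                     /* new session, drops the controlling terminal */
        if (fork() == 0) {            /* grandchild: the actual daemon */
            /* not a session leader, so it can never reacquire a controlling tty */
            /* ... daemon work, or execvp() something here ... */
            _exit(0);
        }
        _exit(0);                     /* intermediate child exits immediately */
    }
    waitpid(first, NULL, 0);          /* reap it: no zombie, and the daemon gets
                                         reparented to init (or a subreaper) */
}
```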
The surprising behavior was that the child process received the configured signal not when the process that created it exited, but rather when the specific thread that called fork(2) exited.
The parent process was an event-loop-based Python program whose main job was to manage the creation and deletion of these child processes, and the simplest way to spawn child processes without blocking the event loop is to call fork(2) on a thread pool. My thread pool was scaling the number of worker threads up and down based on demand, so occasionally it would decide a worker was no longer needed, and all the child processes that happened to have been created on that thread would get SIGKILL'd, something you rarely want when using a thread pool!
I didn't want the child processes to die unless the parent process's business logic decided they were no longer needed, or if the parent was itself killed (this latter reason being the motivation for setting PDEATHSIG).
Once I understood why my processes were dying, the solution was simple: make sure the worker threads never exit.
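For anyone hitting the same thing, here's a sketch of the PDEATHSIG setup in C, with the two gotchas from this story called out in comments (the prctl() is Linux-specific):

```c
#include <signal.h>
#include <sys/prctl.h>
#include <unistd.h>

pid_t spawn_with_pdeathsig(char *const argv[]) {
    pid_t parent = getpid();
    pid_t pid = fork();
    if (pid == 0) {
        /* Gotcha 1: the child must set this itself; it is not inherited across fork().
           Gotcha 2: "parent death" really means "death of the thread that called
           fork()", which is exactly what bit me above. */
        prctl(PR_SET_PDEATHSIG, SIGKILL);
        /* Close the race where the parent already died before the prctl() took effect. */
        if (getppid() != parent)
            _exit(1);
        execvp(argv[0], argv);
        _exit(127);
    }
    return pid;
}
```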
No, what they're talking about is the fact that in Linux, sometimes "process" in the documentation actually means "thread", and this is particularly true if you're doing mildly funky process management APIs.
The comment about '/dev/reads-and-writes-block-forever' intrigues me, as I think there could actually be lots of different use cases for this hypothetical '/dev/blocking' device.
For instance, it could be used to intentionally plug the stdin of a process like described in the blog post. Another way that it could be beneficial would be debugging async yielding of blocking operations.
Can't think of any better examples, am I totally off base?
Hm interesting, although I've never come across the need for this. I think it arises in a fairly specific set of circumstances
- You're running unit tests for a compiled language. Broken code may crash
- You also need to start processes from the same process.
In contrast, I basically never have unit test processes that themselves start processes. Instead all unit tests are a single process, and started with a shell script. The shell is responsible for waiting, and handles concurrency.
Although I guess the third consideration is
- the tests need to work on multiple platforms
I can sorta see that, though I still think it's fairly niche, because I think that starting both processes and threads should be parameterized. Library code generally shouldn't start processes or threads -- that should be a policy that you pass in.
---
I have come across cases where I'm specifically testing code that starts processes (a shell!), and you get orphans if either the test code or the tested code isn't quite correct. However I'm pretty sure the pattern in the OP would not be enough in those cases -- as mentioned, it requires some cooperation from children and parents.
And BTW, the general problem of containing adversarial processes is impossible in plain Unix and would require Linux cgroups (mentioned in the article). There's an inherent race between enumerating children and killing processes: a malicious process can always escape your attempts to kill its children.
To be honest, what I started doing more is running tests in CI, so there is another layer of cleanup.
I think the bottom line is that tests that start processes are very much special cases, and should be separate from most of your test suite.
I don't know about unit tests, but I often write regression tests in Python that start and orchestrate a dozen or so subprocesses that communicate with each other. Making sure everything cleans up properly no matter what is a pain.
I have solved this problem, except for grandchild processes. (Use some non-portable construct if you must.)
This is what I did: I have one thread whose sole purpose is to wait on children. It's activated by a semaphore. It waits, and when a child exits it sends a message through a pipe to the thread that started that child. (As a side benefit, the launching thread can then use poll() or other fd-based waiting.)
The killer is killing children; I found that when killing is necessary, it is best to kill all children that haven't been waited on. The problem is that child-exit messages can back up in the pipe, so you need to drain them, but they may arrive in a different order than you expect. If you're just killing all children launched by that thread, no problem: just read and ignore the pipe every time you kill a child.
There are a lot of details (that code has the single longest code comment I've written in my life), but most of them are in that classic "Gotchas" document matklad references.
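A very rough sketch of the reaper-thread part of that design, in C with pthreads. The per-launcher routing (and the semaphore that wakes the reaper when there's nothing to wait for) is elided; this version just reports every exit on one shared pipe that launching threads can poll():

```c
#include <pthread.h>
#include <sys/wait.h>
#include <unistd.h>

static int exit_pipe[2];                       /* read end is poll()-able by launchers */

struct exit_msg { pid_t pid; int status; };

static void *reaper(void *arg) {
    (void)arg;
    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, 0);   /* block until any child exits */
        if (pid < 0) {                         /* ECHILD: no children right now */
            sleep(1);                          /* the real code blocks on a semaphore */
            continue;
        }
        struct exit_msg msg = { pid, status };
        write(exit_pipe[1], &msg, sizeof msg); /* tell whoever launched it */
    }
    return NULL;
}

void start_reaper(void) {
    pthread_t t;
    pipe(exit_pipe);
    pthread_create(&t, NULL, reaper, NULL);
    pthread_detach(t);
}
```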
But as the original article mentions, what about the case where your parent process is killed uncatchably, e.g., with kill -9? Your children may persist, because the parent was not afforded an opportunity to kill them.
Yes, but if a process gets a SIGKILL, the user is explicitly saying that they don't want that process to do anything else. Otherwise, they would send a SIGTERM.
So a SIGKILL is really the user passing the responsibility of the children onto some other process.
I think you may have misunderstood the problem that this article is solving.
You've implemented something which kills all your child processes when you exit cleanly, but which leaks child processes if you exit uncleanly. This is, frankly, easy to do, and not interesting. It's not a "problem".
The article is solving the problem of: how do I make sure my child processes die even if I exit uncleanly? That's an actual hard problem to solve.
Is having a "process manager process" a good pattern? This process would have a record of all currently running processes and their children. When you want to create a new process you make a request to the process manager process. When a process is killed the process manager could then kill its children.
This looks entirely wrong to me? The stdin file descriptor isn't "owned" by or associated with the parent process at all. It's whatever the parent decided to set up at fork time, and by default it's just "whatever the parent's parent set up for IT". That is absolutely not guaranteed to be closed when the parent exits.
It's true that if you spawn something interactively from the shell and kill it, the shell will clean up the file descriptors before returning to read the next command. And that will look like a close to any children that might have launched. But redirect stdin from a real file and no such close is going to happen. Likewise launcher tools like systemd that want to parse the process output and control it have behavior that tends not to be shell-like.
This sounds like a really good pattern, and something that deserves to become a modern standard (like https://no-color.org/).
I feel like this should be made opt-in somehow. If you are a noninteractive process and stdin is a tty, then you probably shouldn't be swallowing input. I frequently blind-type the next command while a long-running command is active, because well-behaved noninteractive programs don't swallow input from stdin.
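One cheap way to get that opt-in behavior: the child-side watchdog could simply be disabled when stdin is a terminal, so typed-ahead input is never swallowed. Something like:

```c
#include <unistd.h>

/* Only treat stdin EOF as "parent died" when we were clearly launched
   non-interactively (stdin is a pipe or file, not the user's terminal). */
int should_watch_stdin(void) {
    return !isatty(STDIN_FILENO);
}
```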