One place where the inconsistency gets weird is when you use signalfd with epoll. The epoll will flag events on the signalfd based on the process where the signalfd was registered with epoll, not the process where the epoll is being used. One case where this can be surprising is if you set up a signalfd and an epoll and then fork() for the purpose of daemonizing -- now you will find that your epoll mysteriously doesn't deliver any events for the signalfd despite the signalfd otherwise appearing to function as expected. That took me a day or two to debug. :(
With all that said, at the end of the day I disagree with Geoff. I would rather use signalfd than signal handlers. The "self-pipe trick" is ugly, involves a lot of unnecessary overhead, and runs the risk of deadlocking if you receive enough signals to fill the pipe buffer before you read them back (which can be solved with additional synchronization, but ick). In fact, in my own code, on systems that don't have signalfd or any similar mechanism, I tend to block signals except when I'm about to call poll(), and then siglongjmp() out of the signal handler to avoid the usual race condition. (See pselect(2) for discussion of said race condition.)
I think it's just a fact of life that you need to clear your signal mask between fork() and exec(), and yeah no one does this, whoops.
BTW, for the specific problem of dealing with child processes, I really hope Linux adopts the Capsicum interface as FreeBSD has:
Until then, you simply can't expect to reap children via signals. You use the signal to let you know that it's time to call wait().
The unfortunate terseness of the original "self-pipe trick" description makes the solution to this difficult to see. As far as I've figured out there are two things to notice:
1) You're supposed to set the pipe to be non-blocking. Presumably you also then don't check the return code of the write(2) call in the signal handler. While this solves the case of a signal handler blocking forever, it does mean you might have dropped writes that correspond to signal receptions. That leads us to:
2) The self-pipe trick specifically calls out handling SIGCHLD (probably because it's one signal that you don't want to ignore!) But given the chances of dropping a byte as described in 1) and the fact that SIGCHLD and fork are explicitly called out, I can only assume that the lesson here is: only have one pipe per signal you intend to handle. Since multiple signals sent to a process may result in a single signal being delivered, your real signal handling code (the stuff that's watching the other end of the pipe) already has to deal with this situation.
As for Capsicum, I can't wait til they implement pdwait(2)! Until then, at least pdfork(2) ensures that the parent process' death kills the child process...
That doesn't matter. You're not supposed to have a byte in the pipe for every signal. What matters is having at least one byte any time there are unprocessed signals. The only function of the pipe is to wake the select(2) up. You still need bookkeeping elsewhere.
> That leads us to:
> 2) The self-pipe trick specifically calls out handling SIGCHLD (probably because it's one signal that you don't want to ignore!) But given the chances of dropping a byte as described in 1) and the fact that SIGCHLD and fork are explicitly called out, I can only assume that the lesson here is: only have one pipe per signal you intend to handle. Since multiple signals sent to a process may result in a single signal being delivered, your real signal handling code (the stuff that's watching the other end of the pipe) already has to deal with this situation.
Meh. Just have one pipe and a sig_atomic_t for each different type of signal you're interested in.
Whoops, I mentioned this in another comment and missed this, but see http://lwn.net/Articles/638613/
Is this what libuv does? I'm pretty sure it reads signals using epoll on linux, so in theory - if it does it this way - this bug could be underlying all of node.js.
There's no need for threads. Set the pipe to non-blocking and ignore the write() error if it's EAGAIN/EWOULDBLOCK. See my response above for why dropping writes if a byte already exists in the pipe is okay.
(But yes, setting it non-blocking is correct.)
Perhaps but then you're mixing signals and threads and you're in for a whole new world of hurt. :)
E.g. I've found that OSX does not always behave correctly when delivering signals to a process where one thread has blocked the signal but another hasn't, though I cannot remember the exact details. And of course on any system there is such a thing as signals addressed to a specific thread rather than a whole process (pthread_kill()).
I'm not really sure that thread-directed signals are in scope for the sorts of things where you must use signals (SIGINT, SIGTSTP, etc. from a terminal, SIGCHLD from child termination, etc.) Those should all be process-directed. If you design your own API that involves signals, then sure, but that's a problem of your own making.
To his broader point, the mistake is to assume you will be able to get one signal delivered per signal raised. That's just not how (classic) UNIX signals work (POSIX realtime signals are different, and are queued) - they fundamentally need to be treated as level-triggered, not edge-triggered. For the SIGCHLD example, when a SIGCHLD is received (no matter whether through signal handler, self-pipe trick, signalfd() or sigwaitinfo()) you need to loop around waitpid() with the WNOHANG flag until it stops returning child PID statuses.
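That loop is short enough to sketch in full (the function name is mine):

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Treat SIGCHLD as level-triggered: one notification may stand for
   several dead children, so loop until WNOHANG reports nothing left.
   Returns the number of children reaped this call. */
static int reap_children(void)
{
    int reaped = 0, status;
    pid_t pid;
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        /* record pid/status in the event loop's bookkeeping */
        reaped++;
    }
    return reaped;   /* 0 if no child had exited yet */
}
```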
> So you have to be very careful to reset any masked signals before starting a child process
I don't see what this has to do with signalfd. That statement is true generically. Non-default Unix signal handling and subprocess management have never cooperated cleanly. The point to signalfd is to provide a simpler mechanism to integrate signals (which are a legacy API in almost all cases) with existing synchronization and event handling architectures, not to magically make them not suck.
It's particularly bad because the only way to get notified on a child exiting, in an event-handling architecture, is to wait for SIGCHLD notifications. (You can't call wait/waitpid because that's blocking; at best you can call it in a separate thread.) So even if all you're trying to do is write a program that runs a handful of children asynchronously, you have to incorporate signals into your architecture. And signalfd taunts you by providing siginfo with each notification, so you think you know which child exited -- but in fact, those siginfos could have coalesced, so this data is useless.
A friend claimed to me that siginfo is only useful for so-called synchronous signals (SIGSEGV, SIGILL, etc. -- stuff that you can't handle in an event loop anyway), which I'm inclined to believe, the more I think about it. So there's no reason for signalfd to have included siginfo.
This is kind of nasty if your pids tick over very fast, since you'd have to do this fairly frequently to make sure you don't hit the same pid twice.
Fortunately this is hardly ever a problem but it is something worth thinking about when using the trick.
No actual signal is sent, it's just asking the kernel to check if the signal could have been sent.
The only major thing you have to remember when using signalfd is to mask the signals you want to receive only via signalfd, and then to unmask those signals in any child processes before calling the exec*() functions.
If signals are so problematic, why rely on them? Is the functionality useful for things other than dealing with 'emergencies'?
One thing I can see that is useful, is that it allows a program to gracefully deal with a kill, but many applications seem to have a 'graceful stop' mechanism that doesn't need signals.
You could certainly imagine some kernel extensions that take all of this useful functionality and make it available in ways other than signals, leaving just signals for things you have to deal with immediately like SIGSEGV (so you can print a nice error message before quitting), but they don't exist yet. I imagine some of the intent behind signalfd was to do this all at once for all signals, but it didn't quite work.
In the SIGCHLD case, there's a proposed CLONE_FD flag to clone which would return a file descriptor instead of a PID. This fd could be poll'd on and read from, which is much nicer than dealing with SIGCHLD. See http://lwn.net/Articles/638613/
So those kernel extensions are happening :)
Overall, it seems easier to avoid process ID wraparound attacks via using the full 32-bit number space for PIDs. There may be a few programs that need to be changed because they did something silly like cast pid_t to short, but I think overall most programs would work just fine. As far as I can remember, the reason for using low numbers was because people didn't want to type longer ones at the shell. Internally the kernel and libraries store everything as 32-bit, at least on Linux.
Absolutely. But once you've opened the file descriptor, the kernel would guarantee that its corresponding process ID would remain unused until you closed the file descriptor. (For example, it could keep the process a zombie if it exits.)
This way, it's possible to write a reliable killall: walk /proc, call openpid() on each entry, and with the PID FD open, examine the process's user, command line, or whatever else, kill the process if necessary, and close the process file descriptor.
That seems like it would open you up to a trivial denial-of-service attack where some attacker just spawns a bunch of processes and never closes the /proc handles. Then you can't start any more processes because there are no more process IDs available. The only workaround is to have a larger PID space, which poses the question... why not just have a larger PID space in the first place and skip the new, non-portable API?
I don't think a ulimit would be very effective here at preventing denial-of-service. Let's say I set it to 100... I can just have my 100 children each spawn and hold on to 100 children of their own, and so on and so forth. If I just go with a bigger process ID space all these headaches go away, plus existing software works without modification.
The advantages of process handles outweigh this small risk.
That means you ought to be able to transfer it to other processes via file descriptor passing (the SCM_RIGHTS ancillary message; see man unix).
The identity of a process would thus be local to its parent or to a process with which the parent has agreed to share that identity. Not only does this avoid race conditions, it also enables a completely unrelated process to reap a child, which can be terrifically useful.
This is exactly the approach the Capsicum sandboxing framework (mentioned elsewhere) is taking. The goal there, though, is to eliminate globally shared identifiers as much as possible -- which makes sense for sandboxing!
I don't see how that's possible. You need to listen to at least SIGTERM, SIGINT and SIGHUP if you're going to gracefully stop.
There used to be a FUTEX_FD, but it got removed. I think you can mostly achieve the effect with eventfd, though.
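A rough sketch of what that eventfd substitute looks like - not a full futex replacement, just a pollable counter you can wake an event loop with (helper names are mine):

```c
#include <stdint.h>
#include <sys/eventfd.h>

/* eventfd is a kernel counter behind a file descriptor: writes add to
   it, a read returns the accumulated total and resets it to zero
   (non-semaphore mode), and the fd polls readable while it's nonzero. */
static int make_wakeup_fd(void)
{
    return eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
}

static int wakeup_post(int efd)
{
    return eventfd_write(efd, 1);
}

static int wakeup_take(int efd, uint64_t *n)
{
    return eventfd_read(efd, n);   /* -1/EAGAIN if the counter is zero */
}
```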
I am rusty on low level programming but I have done enough to know that this poster is whining a bit too much.
Signals should only be used in the general case for exceptional circumstances, like killing a programme. A signal handler's job is to deal with the crisis, e.g., gracefully exit.
In lower level cases signals mean there is an urgent event, something that must be done now or it is useless to bother.
If you try to use signals for general purpose IPC then you get what you deserve - chaos.
(That said, I would definitely agree that the kernel is misusing signals -- SIGWINCH should just be some form of metadata on the terminal fd, not a process-wide signal.)
It's not a case of "misuse": the API is so truly atrociously bad that any programmer's attempt is going to be wrong. I'm aware of the pitfalls, and I do not feel comfortable stating that I would get it right; someone who is not aware of the pitfalls is hopelessly screwed.