
Should you be scared of Unix signals? - ingve
http://jvns.ca/blog/2016/06/13/should-you-be-scared-of-signals/
======
pm215
If you've ever read the Lions' Book commentary on 6th Edition Unix, you'll
notice that many parts of the API as implemented back then are pretty solid
-- quality, well-designed interfaces that have stood the test of time.

Signals are not one of those parts. The 6th Ed signal handling code reads to
me as somewhat of an afterthought whose use cases were mostly "kill the
process for a fatal signal or terminal ^C", "ptrace for a debugger" and maybe
SIGALRM. The data structures don't allow a process to have more than one
pending signal -- if a new one comes along the fact an old one was pending is
simply dropped. Running a signal handler automatically deregistered it,
leaving a race condition if two signals arrived in close succession (a
well-known bug, later fixed by BSD). And EINTR is an irrelevance if signals
are generally fatal, but its effects spread like poison through every other
kernel API if you need your program to be reliable even with signals being
delivered.

The worst bugs and races were fixed up by the BSD folks and others, but the
underlying concept is an unfortunate combination of "basically irredeemable",
"indispensable" (you have to have some kind of "kernel tells you something has
happened" API, and signals are what we got) and "insidious" (thanks to EINTR).
I think they're a strong candidate for "worst design decision in unix".

(PS: one of the reasons they stand out in 6th Ed is that so much of the rest
of that code is so good!)

~~~
Qwertious
>The worst bugs and races were fixed up by the BSD folks and others, but the
underlying concept is an unfortunate combination of "basically irredeemable",
"indispensable" (you have to have some kind of "kernel tells you something has
happened" API, and signals are what we got) and "insidious" (thanks to EINTR).
I think they're a strong candidate for "worst design decision in unix".

Suppose you were to throw the whole thing out and write a _good_ replacement
(and backwards-compatibility be damned), what would it be like?

~~~
pjc50
_Suppose you were to throw the whole thing out and write a good replacement
(and backwards-compatibility be damned), what would it be like?_

Steal the best bits from Windows NT, and improve the existing mechanisms.

Kill signals in their current form. Build a general-purpose notification
mechanism consisting of a _mutex_ and a _message_. Possibly allow a process to
have more than one message queue (Windows makes this really, really easy).

All IO, networking and informational signals (SIGWINCH, SIGCHLD etc) then come
as messages (these may have to be fixed size, but anything from a few words to
a 4k page would do). select, poll etc are replaced by waiting on a mutex. You
can put all your worker threads waiting on that mutex. A message arrives. The
kernel wakes _one_ waiting thread and gives it the message (via an atomic
dequeue-or-block syscall). You don't have to do any O(n) processing to work out
which socket it relates to as the kernel has helpfully put it in the message.

In the deluxe 4k page version, a 1500-byte ethernet frame arrives, is DMA'd
into the top half of a page, the kernel inspects it and sets message headers
to say where the data is, and hands it directly into the userspace of a
waiting process.

The one downside of this is that UNIX pipe programs become slightly more
complicated. Rather than just doing "while(read()) write()" you'd have to
switch on the type of message received and implement your own abnormal-exit
functionality. This could probably be tidied away for you inside the standard
library.

External process control mechanisms would have to be built for killing
processes and suspend/resume.

~~~
gpderetta
Unix already has a general notification mechanism in the form of poll and
select; no need to add a new one. The problem is that not all interesting
events are portably delivered via a file descriptor, but that can be more
easily fixed by extending it (as lots of unices, including Linux, have done)
than by coming up with a completely new primitive.

But some messages really must be delivered synchronously and can't normally be
queued: SIGSEGV, SIGFPE, SIGBUS, etc. There is really no way around
interrupts.

BTW mutexes are not for signaling. What you want for signaling in a queue are
semaphores, events or condition variables (or even file descriptors, like
eventfd).

~~~
pjc50
Well, the brief was to "throw the whole thing out", including select (which is
bad) and poll (which is merely adequate).

The machine traps are interesting in that they should only be generated
locally - there's no sensible case for injecting SIGSEGV into other processes.
Arguably we should learn from Windows "structured exception handling" here.
There are two sensible things to do with traps (other than sudden death): hand
over to a callback of some kind (which should be told about the state of the
stack), or turn into a language-native exception and throw that.

~~~
gpderetta
Poll is perfectly fine for the very large majority of unix applications, which
do not need to scale to tens of thousands of sockets.

The handing over to a callback is exactly what is done by unix signals.
Converting to exceptions can be implemented on top of signal handlers, but
note that even MS stopped mapping structured exceptions to language exceptions
by default a while ago, at least in C++, as unwinding the stack, destroying
state and potentially calling destructors is the last thing you want on a
segmentation fault or other unexpected event.

~~~
pjc50
_The handing over to a callback is exactly what is done by unix signals_

Not quite, there are quite a lot of restrictions on what you can do in a
signal handler. It ought to be possible to design a callback mechanism without
those restrictions. And a signal tells you nothing about its origin or what
file descriptor / child process etc. it might relate to.

~~~
gpderetta
I assume that by restrictions you are talking about async-signal safety; this
is inherent in the 'interrupt' nature of signals, as they can happen at any
point in a program's execution; there is really no way around that. It would
of course be nice if more functions were async-signal-safe (especially
malloc).

Regarding the lack of metadata, I agreed elsewhere in the thread that messages
carrying such data ought to be transported via an explicit message queue, not
via signals.

------
euske
I found a paragraph in this article
[http://www.linusakesson.net/programming/tty/](http://www.linusakesson.net/programming/tty/)
that very aptly describes what Unix signals are like:

    
    
      In *The Hitchhiker's Guide to the Galaxy*, Douglas Adams 
      mentions an extremely dull planet, inhabited by a bunch of 
      depressed humans and a certain breed of animals with sharp
      teeth which communicate with the humans by biting them very
      hard in the thighs. This is strikingly similar to UNIX, in
      which the kernel communicates with processes by sending
      paralyzing or deadly signals to them.

------
Animats
Signals are like interrupts, and like interrupts, they're handled in an
unusual environment. That's the main problem. You can be inside some
nonreentrant library when a signal handler is called.

Most programs that do something complicated with signals generate an event in
the signal handler and put it on a queue to be handled later. The queue should
be lock-free, or there's a risk of deadlock.

~~~
ambrop7
Interrupts typically do not _observably_ interrupt currently running code. In
a simple system (e.g. embedded system with no threads, all event-driven), the
interrupt handler will run, then the processor will go back to running
whatever it was running before. This is not so for UNIX signals, in case you
are in the middle of a system call, because the mere occurrence of the signal
will change the behavior of the interrupted code.

Yes I know it's not the same because in my example single-threaded system
there's no such thing as a blocking call.

Actually I don't see a good reason why signals in unix would have to cause
EINTR errors in system calls. Perhaps a better solution would be to let the
system call go on normally. Since the signal doesn't observably interrupt code
not in a system call, why would it observably interrupt code in a system call?

In case anyone thinks, "so you can detect the signal in the main code", that
is a bad answer because whatever you do you will have race conditions if the
signal happens just before you enter the system call. Your only chance is to
use things like ppoll() which are designed for proper signal handling, and
these things could work just as well in a hypothetical unix design with no
EINTR.

~~~
gpderetta
You can siglongjmp out of a signal handler [1]. If you sigsetjmp right
before doing a blocking call, you can reliably detect signals.
Another way to avoid the race condition in poll/select, before ppoll/pselect
were standardized, was to store the timeout parameter in a global variable and
have the signal handler set it to zero. Finally, there is the self-pipe trick,
which admittedly doesn't require EINTR at all.

[1] This is historical unix behaviour. At one time it was specified by the
SUS, but it seems that it was dropped from more recent SUS/Posix standards.

~~~
scottlamb
> You can siglongjmp out of a signal handler [1]. If you sigsetjmp right
> before doing a blocking call, you can reliably detect signals.

The problem with that approach is that if the system call has already returned
by the time the signal handler runs and jumps, the system call's return gets
clobbered. So if for example you're doing blocking reads/writes, you don't
know how many bytes you read or wrote.

If your only blocking syscall is level-triggered polling, this approach is
fine, but the self-pipe trick is easier.

I wrote (10 years ago) a library to do something similar reliably. It required
custom wrappers for every system call of interest so I could know by the
instruction pointer in the ucontext_t whether the system call had actually run
yet or not.
[http://www.slamb.org/projects/sigsafe/](http://www.slamb.org/projects/sigsafe/)
The library's a bit stale now; it doesn't do the vsyscall thing for example.

~~~
gpderetta
Duh! You are right, losing the results of partial read/writes is not
acceptable. I guess on x86, completely unportably, you could check whether the
current ip is pointing to a syscall/int instruction.

~~~
scottlamb
You probably want to jump if you're "just before" the syscall, too, though. So
you end up with basically this:

syscall wrappers:
[https://github.com/scottlamb/sigsafe/blob/master/src/x86_64-...](https://github.com/scottlamb/sigsafe/blob/master/src/x86_64-linux/sigsafe_syscalls.S)
(probably could do something better with that thread local; and as I mentioned
this isn't using vsyscall)

signal handler:
[https://github.com/scottlamb/sigsafe/blob/master/src/x86_64-...](https://github.com/scottlamb/sigsafe/blob/master/src/x86_64-linux/sighandler_platform.c)
(although it bugs me now that I iterate the whole array if the instruction
pointer's not in any of the syscalls)

and just to be sure, a race checker:
[https://github.com/scottlamb/sigsafe/blob/master/tests/race_...](https://github.com/scottlamb/sigsafe/blob/master/tests/race_checker/race_checker.c)

------
eric_the_read
This is FoaF-level stuff, but:

I used to work with a guy who in a past life was an HP-UX dev. He told me that
the guys who worked on the signals support in the OS had a 10-foot pole
between their cubicles that had a flag on top reading: "You must be THIS tall
to use signals."

~~~
vikiomega9
What's FoaF? (RDF?)

~~~
ludamad
Friend of a friend

------
marios
I've linked to this before, but AFAIK it's still relevant, as the gotchas
regarding signals haven't changed. Slides from a talk titled "Signal Handlers"
by OpenBSD developer Henning Brauer:
[http://www.openbsd.org/papers/opencon04/index.html](http://www.openbsd.org/papers/opencon04/index.html)
To answer the article's question: should you be scared of Unix signals? No.
But you shouldn't do anything complicated in signal handlers.

------
wscott
BitKeeper uses signals to implement a paging data structure from a compressed
backing store. I allocate the memory for my data structure that is backed by a
file on the disk and then use mprotect() to mark that memory as read-only.
Later when trying to access that memory a signal handler traps the access and
loads and decompresses the data from disk into memory.

This is only done on unix systems that implement POSIX sigaction() signal
handling. It is tricky to get right, but it does work.

BTW I did find I could never get OSX 10.4 to work correctly, but by 10.7 Apple
had finally fixed the bugs in their signal code.

~~~
monk_the_dog
OODBs (object-oriented databases) used a similar technique to translate
addresses from a large "global" address space into a smaller "local" one. Here
is a paper if you're interested
(ftp://ftp.cs.utexas.edu/pub/garbage/swizz.ps). Did you discover this
technique on your own? At any rate, very cool.

------
blucoat
>SIGSEGV is a very important signal. It happens when your program tries to
access memory that it does not have. An appropriate reaction might be to

    
    
        allocate more memory
        read some data from disk into that memory
        do something with garbage collection (but what? I'm confused about this still.)
    

What? Are there any Real World Programs which do anything other than print a
stacktrace and exit? I don't think this person gets what a segfault is.

~~~
hornetblack
If you had a green-threaded program and one of the threads segfaulted, you
would probably want to catch SIGSEGV and kill that thread (not the OS thread
running it).

I've also seen it used to implement a distributed malloc. When a segfault
occurs, the handler messages the program's peers asking if they have the data
for that address. If so, the peer sends the page, and the handler maps in a
new page for that address with the correct data in it. This is essentially
implementing a page-fault handler in user space (for some network-backed
memory).

~~~
gpderetta
Why would you want to kill only that green thread? On any thread
implementation I'm aware of, an unhandled segfault kills the whole process.
Anything else is a disaster waiting to happen.

------
wyldfire
As others have said, there's peril to be had there for sure, so tread
carefully. Minimizing the scope of your handler is the best advice; certainly
also refer to "Async-signal-safe functions" in signal(7) if you must use libc
functions.

One challenge in a distributed system when there's (ab)use of signals is
finding out which process issued a signal. There might be a better facility to
do it now but I've used systemtap [1] to find out who the sender was with
satisfactory results.

[1]
[https://sourceware.org/systemtap/examples/process/sig_by_pid...](https://sourceware.org/systemtap/examples/process/sig_by_pid.stp)

------
js2
_Except for SIGKILL. When you get sent SIGKILL nobody communicates with you,
you just die immediately. But the rest of the signals you're allowed to
install signal handlers for._

SIGSTOP also cannot be caught nor ignored.

~~~
wtf_is_frp
or blocked.

------
ambrop7
The worst thing about signals is that they interrupt whatever system call the
thread is currently inside (EINTR). This is not typically observed, but it can
have dire consequences at random. For example, last time I checked, in Python
2.7, a signal that invokes a signal handler will cause a running print() to
throw an exception. Here, consider signals like SIGCHLD, which you want to
handle without killing the process.

A particular case where this happens is event-driven programming. The only way
to be sure that you don't have such bugs lurking is to set up signal handling
such that a signal cannot possibly interrupt unsuspecting code. Currently, I'm
aware of two solutions, both of which involve blocking signals:

1. Block relevant signals in the main loop (or generally all threads) and use
signalfd to detect and consume signals (or similar mechanisms on other
platforms, e.g. kqueue).

2. Start a dummy thread whose only purpose is to handle signals, leave
relevant signals unblocked in this thread and block them in all other threads.
Write the signal handler to communicate the signal to your main loop via the
self-pipe mechanism or similar.

Note that solution (2) can usually be implemented for an existing framework
without changing that framework - you only need to add code to main which
starts that thread then blocks signals, before any other threads are started.

I consider "fixing" code to be robust to signals a non-solution, because you
would have to verify every single piece of code running in your program,
including third-party libraries.

~~~
ptx
> For example, last time I checked, in Python 2.7, a signal that invoked a
> signal handler will cause a running print() to throw an exception.

Python 3.5 fixes this – system calls are now automatically retried:

[https://docs.python.org/3.5/whatsnew/3.5.html#pep-475-retry-system-calls-failing-with-eintr](https://docs.python.org/3.5/whatsnew/3.5.html#pep-475-retry-system-calls-failing-with-eintr)

------
0x0
I've never seen a safe signal handler beyond a SIGTERM that sets a "volatile
int time_to_quit = 1" for a main loop to pick up on later...

~~~
JdeBP
volatile int isn't necessarily safe. volatile sig_atomic_t would be, however.

------
bitwize
YES. Especially when there are much better ways of handling OS interrupts to
your program -- like Structured Exception Handling under Windows.

------
gpderetta
The big problem with unix signals is that they have been abused to deliver
messages that should really be delivered via a message pipe (e.g. SIGCHLD and
all the terminal/tty-specific signals). The other is that the set of signals
is limited and signal handlers are a process- (or thread-) wide resource, so
it is hard to make use of them in a library.

Other than that, the general ability to interrupt a thread and deliver a
message to it no matter what it is doing is necessary, and signals are one way
to implement that. Exceptions are another way, but they can be implemented on
top of signals.

edit: but there is really no excuse for EINTR. The "Worse is Better" essay has
something to say about this.

------
known
Consider this code highly experimental and yourself highly mental if you try
and use it in a production environment

[http://www.kegel.com/c10k.html#examples.nb.sigio](http://www.kegel.com/c10k.html#examples.nb.sigio)

------
jasonzemos
On Linux, the behavior when locking a mutex (I tested a GNU C++11 std::mutex)
from a signal handler is to treat it (at least for the interrupted thread) as
having been unlocked. This allows intuitive synchronization from handlers and
avoids deadlocks, which I assume is facilitated by the kernel-futex design. If
any kernel hackers want to chime in on why this works (and is safe (is
safe?)) in the face of most unix specifications, generic docs and articles
like this, it may be enlightening.

~~~
gpderetta
You are well into UB land. The behaviour you describe is very dangerous, as
the signal handler will be accessing the mutex-protected data structure while
it is in a potentially inconsistent state.

The right, portable way to signal from a signal handler is POSIX semaphores,
which on glibc are a thin wrapper over futexes. Any data-structure access must
be non-blocking.

Edit: autocorrect

~~~
jasonzemos
I should mention I was using x86. My initial assumption was that the kernel
simply references the robust list (even if it was initially entirely resolved
in userspace), and yields back to the interrupted thread -- I should emphasize
my test showed the kernel _breaks into the lock_ and presents a semi-coherent
_as-is_ structure entering and exiting the handler's lock. Of course this is
all way way UB for portability or complex structures indeed...

~~~
0x0
Even on x86, you could have a pthread_mutex protecting a struct with two
integers that need to be updated "atomically", and have a signal delivered in
the middle?

~~~
jasonzemos
Sure, and then the handler only sees one integer as updated and the other
integer will be updated after the handler. The lock gets silently broken
unfortunately, but there's probably a useful reason for why this is. It could
just deadlock instead.

~~~
gpderetta
Deadlocking would be the ideal outcome. Deadlocks are easy to debug. Silent
concurrent memory corruptions not so much.

------
breadbox
Also, signals become much easier to deal with if your program is single-
threaded. Once threads get involved, it becomes more complex to know which
thread(s) will receive a given signal.

~~~
signa11
> Once threads get involved, it becomes more complex to know which thread(s)
> will receive a given signal.

well, one approach that might be worth looking into would be to designate a
special thread as a signal-handling-only thread. others just block every
signal that can possibly be blocked. this signal-handling thread then
communicates the signals etc. to others as needed.

prima facie, this boils down to signal handling for single-threaded programs.
what might be the downsides?

~~~
scottlamb
People say "signals" as if they're just one thing, but I find it more useful
to break them into two categories:

* process-directed signals such as SIGHUP, SIGINT, SIGWINCH, SIGTERM, SIGQUIT, SIGCHLD. They come from outside the process, including the `kill` command, the init system, and the terminal. For these, a dedicated signal-handling thread is a common, practical approach. Even if your program is single-threaded before implementing signal handling, creating a new thread might be the best approach. Or you could integrate signal handling with an event loop via the self-pipe trick.

* thread-directed signals such as SIGSEGV, SIGFPE, SIGBUS (the preceding are all machine exceptions), SIGPIPE, SIGPROF, or anything sent by pthread_kill / pthread_sigqueue. If you need to handle these signals (usually for diagnostics), by definition you have to do it in the thread in question. And you almost certainly need a traditional signal(2) / sigaction(2) style signal handler.

------
jhallenworld
When I first started programming UNIX I thought SIGIO made sense as a good
mechanism for I/O multiplexing. I thought this because of much previous
experience with interrupt handlers in the embedded world. However, at the time
it just did not work (no sigsuspend), and even today it's a big mess. SIGIO
should just be removed -- it's the wrong way to handle I/O in UNIX.

------
ausjke
I think these days we should all use sigaction() instead of signal(). Why is
this totally missed in the post?

[http://stackoverflow.com/questions/231912/what-is-the-difference-between-sigaction-and-signal](http://stackoverflow.com/questions/231912/what-is-the-difference-between-sigaction-and-signal)

------
zAy0LfpBZLC8mAC
I would say: Oh, yes, be as afraid as you can be. But don't let that stop you
from figuring out why it is perfectly rational to be afraid of signals ;-)

Now, it is not impossible to use signals, but there are many opportunities to
screw it up, often in non-obvious ways (so, things seem to work, but it's not
actually reliable). And at the same time, signals almost never give you any
advantage over alternatives if you do it correctly. That profiler thingy might
be one of the rare cases where it actually makes sense.

In particular, what tends to be so tempting about signals is that they are
executed "immediately", so you get to react without any further delay, no
matter what else your program is currently doing--who wouldn't want that?

Except that doesn't actually work, because you need to somehow access the
state of your program in order to actually do anything useful with the signal
notification. But you cannot access that state unless you can be sure it's in
a consistent state and that your accesses won't interfere with what your
program is doing in some unpredictable way. Just like in multithreaded
programming, you have to somehow coordinate with your program to make sure
things happen in an orderly fashion. Which essentially means that you can only
access the program's state at certain times when the program isn't currently
using it--like when it is unlocked.

Except the signal handler potentially preempts your program, so you can't use
locks to perform the coordination, because that could deadlock. Except if you
were to use locking primitives that also block signals, so that preemption
during critical sections can't occur. But then, you effectively have a weird
polling solution (the unlocking at the end of the critical section/before
entering the event loop dispatcher effectively acts as if you were polling
for signal events).

Also, you cannot even reliably queue signals without potentially dropping
some. Now, the kernel does that anyhow, so you can't rely on all signals being
delivered individually, but it still is important to understand why that is
(which is also why the kernel behaves the way it does): If you consume
events, you have to either have some mechanism to slow down the source to
prevent it from generating events at a higher rate than you can handle (like,
if you can't keep up reading from a pipe, the writer end of that pipe will
block in order to stop it from producing more data), or you would need
potentially infinite amounts of memory to be able to store all those events
for later processing. Now, the latter isn't really possible, of course - but
it's even worse in signal handlers because you cannot really allocate memory
there because there is only one memory allocator in the libc and that most
definitely is not reentrant (like, you cannot allocate memory right in the
middle of your thread freeing memory).

What this boils down to is that you always have to somehow defer processing of
signals to some point in time where you can actually safely access your
program's state, which is something that you can achieve with pipes and
sockets much more easily.

------
jfoutz
This is one of the great counterexamples to Betteridge's law.

------
puppers
Signals are not scary if you read the docs! Perhaps you could tone it down
with the exclamation points!

