
EINTR and PC Loser-Ing: The “Worse Is Better” Case Study (2011) - akkartik
http://blog.reverberate.org/2011/04/eintr-and-pc-loser-ing-is-better-case.html
======
haberman
Wow. I am the author of this, but I forgot all of this after 8 years and it
was like reading an article written by someone else. :)

I think the only thing I would add now is that SA_RESTART does seem like
inherently the wrong design for this. The code that registers the signal
handler might be totally unrelated to the code making the system call. They
might be from different libraries, written by different people, completely
unaware that the other exists. The right place to specify whether the system
call should automatically retry is at the place where the system call is made.
So I don't think Unix actually evolved into the "right thing" here.
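
To make it concrete, a minimal sketch (handler and function names are made
up): the flag is chosen where the handler is installed, yet it silently
changes the behavior of every interruptible call everywhere else in the
process.

    
    
        #include <signal.h>
        #include <stddef.h>
    
        static void reap_children(int sig) { (void)sig; /* ... */ }
    
        /* Library A installs a handler and, in passing, picks restart
           semantics for the whole process. */
        static void install_handler(void)
        {
            struct sigaction sa;
            sigemptyset(&sa.sa_mask);
            sa.sa_handler = reap_children;
            sa.sa_flags = SA_RESTART;  /* library B's read() now silently restarts */
            sigaction(SIGCHLD, &sa, NULL);
        }
    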

~~~
ncmncm
Yes, it's fraught. Unix didn't evolve to the "right thing", if indeed there is
a right thing to evolve to.

An important consideration here is that Unix, for much of its history, ran on
machines with only a few kilobytes of memory -- a really big one might have
256k. So unnecessary complexity in the kernel translates to it being too big
to be useful at all. Extra code in user programs wasn't great either, but
couldn't make the whole system unusable, so it was less bad.

System calls that might return EINTR could have been wrapped in a standard
library call that would take, say, an extra argument that says what to do --
maybe even a function pointer to call. But there also wasn't really a standard
library, yet, and not everybody wanted that much junk going on around their
system calls. You got less than a million instructions per second, and a
system call often burned a millisecond, or a hundred. Anyway, what would a
library pass to it? Something it got from its caller?
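
A hypothetical shape such a wrapper might have taken (purely illustrative;
nothing like this was ever standardized):

    
    
        #include <errno.h>
        #include <unistd.h>
    
        /* The caller passes a predicate that is consulted after each EINTR. */
        ssize_t read_w(int fd, void *buf, size_t len, int (*keep_going)(void))
        {
            for (;;) {
                ssize_t n = read(fd, buf, len);
                if (n != -1 || errno != EINTR)
                    return n;              /* success, or a real error */
                if (keep_going == NULL || !keep_going())
                    return n;              /* give up: hand EINTR to the caller */
            }
        }
    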

But if you want a loop, you know how to write one, and you can put in it
exactly what you want. So the only problem is what happens in some library you
use that does system calls. At the time it was considered good form to return
to the caller if you got EINTR, and let the caller decide what to do, instead
of looping in the library. Then your caller could have a loop calling you,
instead.
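
The loop in question is tiny; a sketch of the conventional form, essentially
what glibc's TEMP_FAILURE_RETRY macro (mentioned further down) expands to:

    
    
        #include <errno.h>
        #include <unistd.h>
    
        /* Retry only on EINTR, i.e. only when no progress was made;
           anything else goes straight back to the caller. */
        ssize_t read_retrying(int fd, void *buf, size_t len)
        {
            ssize_t n;
            do {
                n = read(fd, buf, len);
            } while (n == -1 && errno == EINTR);
            return n;
        }
    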

Then BSD did their thing, and then library callers that might need to break
out had to have a setjmp to break out to, and generally had to shut down
immediately afterward, because the library was probably left in a corrupt
state.

So things are still not right, by any defensible definition.

But there's no evidence Richard Gabriel really understood these tradeoffs, or
cared. It was all a metaphor for LISP vs. C, where C was "worse" and drove out
LISP, and Unix and PDP-11s drove out TOPS-10 and then the Lisp Machine, and
then the PC came along with DOS, and the world went all to hell.

And here we all are!

------
amluto
There is a genuine complication that this post seems to have missed. If a
system call is interrupted and returns to user space (due to a signal, most
likely), then it is in one of three states: nothing happened, the call
finished, or it is part-way done. If nothing happened, then the PC can point
back to the syscall or EINTR can be returned. (EINTR is a promise that no progress
was made.) If it’s all done, it can return success. But, if it’s only part way
done, then either it needs to restart transparently or the user code needs to
be prepared to handle this.

With recv(), for example, the return value can indicate that some but not all
bytes were read. When read()ing from a file, most programs aren't prepared for
partial success.

Linux has some ERESTARTSYS mechanisms to handle some of these cases. The code
and the semantics are a bit gross.
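
A sketch of what "prepared to handle this" means in user code (helper name is
made up): short counts and EINTR are both just partial progress.

    
    
        #include <errno.h>
        #include <unistd.h>
    
        /* Loop until len bytes arrive, EOF, or a real error. */
        ssize_t read_full(int fd, char *buf, size_t len)
        {
            size_t got = 0;
            while (got < len) {
                ssize_t n = read(fd, buf + got, len - got);
                if (n == 0)
                    break;                /* EOF */
                if (n == -1) {
                    if (errno == EINTR)
                        continue;         /* interrupted, no progress made */
                    return -1;            /* real error */
                }
                got += (size_t)n;         /* partial progress: keep going */
            }
            return (ssize_t)got;
        }
    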

~~~
mwcremer
> most programs aren’t prepared for partial success

But read() and write() return the number of bytes actually read or written. So
actually they are, or should be...

~~~
amluto
On a normal file, unless it’s NFS with a special mount option, read() and
write() never return partial results, which means that many programs can’t
handle it.

Even modern async/await/promise/future systems frequently have no reliable way
to express partial success.

------
wrs
The Unix side of this comparison is talking about signals, but what is the ITS
side talking about? A technical description of that is available at [0].

>Another way to state the PCLSRing modularity principle is to say that when a
process looks at the program counter of another process, it must always see a
User mode PC, never an Exec mode PC. When one process wants to access the
state of another process, and the target process happens to be executing in
Exec mode, then something must be done in order to put the target process into
User mode.

[0] [https://hack.org/mc/texts/pclsr.txt](https://hack.org/mc/texts/pclsr.txt)

~~~
comex
And to complete the circle, the Unix way for one process to look at another’s
program counter, ptrace(), works via signals.

I recently had to implement a version of that behavior when writing a GDB stub
for an embedded microkernel. The system in question originally had neither a
debugger nor an equivalent of signals, nor any other way for system calls
(message sends and receives) to be interrupted – so the programs written for
it naturally assumed they wouldn’t be, and didn’t have any code to retry
interrupted operations. I could have gone through and changed those programs,
but it would have been a lot of work and easy to get wrong. Instead, just as
described in that link, I had the GDB stub set the PC to point back to the
system call instruction, so it would be restarted after continuing the thread,
and adjust the argument registers if necessary (e.g. if a combined message
send+receive operation was interrupted after the send but before the receive,
the arguments had to be changed to request only a receive). This was made more
difficult by the fact that the microkernel originally threw away some of those
register values, in the name of efficiency, assuming they wouldn’t be needed
once the system call had started…

------
iiirogers
Discussion on EINTR wrt close in Linux:
[https://lwn.net/Articles/576478/](https://lwn.net/Articles/576478/)

Glibc shows use with TEMP_FAILURE_RETRY:
[https://www.gnu.org/software/libc/manual/html_node/Opening-a...](https://www.gnu.org/software/libc/manual/html_node/Opening-and-Closing-Files.html)

Using TEMP_FAILURE_RETRY when close() always deallocates the fd will at best
get EBADF; at worst you close a random file descriptor.
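
In other words, roughly this (a deliberately broken sketch):

    
    
        #include <errno.h>
        #include <unistd.h>
    
        /* Broken on Linux: by the time close() reports EINTR the descriptor
           is already deallocated, so the retry races with any thread that
           has since been handed the same fd number by open()/accept()/dup(). */
        void close_retrying_broken(int fd)
        {
            while (close(fd) == -1 && errno == EINTR)
                ;  /* at best EBADF; at worst this closes someone else's fd */
        }
    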

~~~
mcculley
This is the best example of how bad EINTR is as a design, in my opinion. I was
painfully aware of how hard it is to write correct code when dealing with
interruptible system calls, but even I didn't know that close() could
potentially be interrupted.

~~~
jstimpfle
It's not hard to write correct code in the face of EINTR. You can simply try
again. Or just use libc (or another language's wrapper).

There is a real need for interrupts during a "slow" system call. I much prefer
EINTR to something like completion ports. EINTR is simple and usable. (I don't
understand the problems around interrupted close() well enough.)

~~~
mcculley
One cannot simply try again. One has to take the byte count returned by read()
and write() and deduct that from what was expected. One has to find out what
time it is after returning prematurely from sleep() and nanosleep(). For a lot
of the interruptible system calls, one has to write defensive code.

~~~
jstimpfle
> For a lot of the interruptible system calls, one has to write defensive
> code.

Errrr, duh. Remove "interruptible" and that sentence still holds very true.

If you don't care about being interruptible, write restarting wrappers doing a
simple subtraction. (Or use fread(3)/fwrite(3)). It's not hard.

nanosleep(2) gives you back the remaining time; it's about as easy to use as
read(2)/write(2). Very simple. sleep(3) doesn't, but due to its seconds
resolution it's not a generally useful API anyway. Still useful for quick
debugging. Not a system call, but a libc call, by the way.
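
A sketch of that pattern (helper name is made up):

    
    
        #include <errno.h>
        #include <time.h>
    
        /* Sleep out the full duration across signals: nanosleep() writes
           the unslept remainder into its second argument, so feed it back. */
        void sleep_fully(struct timespec req)
        {
            struct timespec rem;
            while (nanosleep(&req, &rem) == -1 && errno == EINTR)
                req = rem;
        }
    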

~~~
mcculley
I think we are in complete agreement that it is not hard to write defensive
wrappers to work around the fact that the API is a mess.

------
gwern
"The "worse" system (Unix) did indeed do "the right thing" eventually, even if
it didn't at first. "Worse is better" systems incrementally improve by
responding to user needs. Since users got tired of checking for EINTR, the
"worse" system added the functionality for addressing this pain point. The
whole thing did leave a rather large wart, though"

Which of course is exactly what RPG predicts about 'worse' systems: after a
lot of pain, they will gradually evolve into doing the right thing, but
probably never as well as if they had been correctly conceived from the start.

~~~
klodolph
And the “better” systems don’t get a chance to evolve into the right thing,
because while you can kludge together a fix for something that’s broken, it’s
much harder to remove an unnecessary or extraneous feature.

That, and the “worse” systems have been on the market for nine months by the
time the better systems are available. I try to err on the side of
underengineering systems, as long as there are no lives on the line.

~~~
c256
I think I understand what you're trying to say, but the UNIX/New Jersey
approach came much later than the MIT/ITS approach. The conversation happened
(if I recall correctly) because the ITS approach required more architecture-
specific changes (to revert/restore state), while the later UNIX approach was
quickly ported to new architectures. The ITS people were effectively asking
“how’d you manage to handle this gnarly thing so quickly, so often.”, and the
UNIX people’s response was “we punted”.

------
ts4z
[http://wiki.c2.com/?DanWeinreb](http://wiki.c2.com/?DanWeinreb) Dan Weinreb
was the MIT guy; Bill Joy was the Unix partisan. Before he passed away, Dan's
blog had his side of this story; unfortunately, I can't find it now.
Fortunately WikiWikiWeb knows a little!

Edit: found it!
[https://web.archive.org/web/20121107034606/http://danweinreb...](https://web.archive.org/web/20121107034606/http://danweinreb.org/blog/the-worse-is-better-idea-and-the-future-of-lisp)

~~~
larrik
I'm confused why the guy from Berkeley (in CA), originally from Michigan, is
suddenly referred to as "the New Jersey guy".

~~~
LukeShu
The essay being quoted compares two design philosophies, the MIT approach:

 _> I will call the use of this philosophy of design the ``MIT approach.''
Common Lisp (with CLOS) and Scheme represent the MIT approach to design and
implementation._

and the New Jersey approach:

 _> Early Unix and C are examples of the use of this school of design, and I
will call the use of this design strategy the ``New Jersey approach.''_

(Unix came out of Bell Labs, which is in Murray Hill, New Jersey.)

When the anonymized Bill Joy is introduced, it is written that he is _"from
Berkeley (but working on Unix)"_. That parenthetical is important because it
establishes him as working under the New Jersey approach (and possibly working
with people in New Jersey), and therefore as being a "New Jersey guy", despite
not physically being located there.

------
emtel
"worse is better" seems to be born out again and again and again. In fact it's
fairly hard to find cases where "do the right thing" won. Here's some examples
where a clearly flawed technology became totally dominant.

    
    
      * Unix
      * The C language
      * DOS and Windows
      * Javascript
      * The x86 instruction set.
    

And here are some cases where dominance was achieved in a more narrow market,
or in which one might argue about which technology was "worse", or where it's
not totally clear which was the winner.

    
    
      * Linux/SysV vs any of the BSDs
      * Ethernet vs ATM, SONET, etc.
      * Android vs iOS
      * MySQL vs Postgres (Postgres is having a renaissance, but for a long time mysql was very dominant)
    

What is also especially striking is how often "The Right Thing" in programming
languages seems to lose, where by "lose" I mean "have a very small fraction of
market share despite almost everyone agreeing that the language is better in
almost every way than more widely-used languages like C/C++, Java, etc."

    
    
      * LISP and Scheme
      * The whole ML family
      * Haskell
      * Rust (although it's too soon to tell on this one)
    

Meanwhile, languages that aren't obviously terrible, but have major flaws,
like Java and Python, are ubiquitous, and the dumpster-fire languages Perl and
PHP have enjoyed massive success.

~~~
jstimpfle
My impression is that there are various, even contradictory, understandings and
opinions in the wild about what "worse is better" actually means. Rarely do people
who interpret it in a cynical way demonstrate a good understanding of the
total set of qualities that matter for the success of a technology.

If you think LISP or Haskell are without flaws (despite all the years of
hype), then... you're wrong.

~~~
lisper
It's not that they are without flaws, it's that it is a lot easier to
fix/ignore/work-around the flaws than in other languages, where the flaws are
unchangeable (except by the standards committee) and constantly in your face.

~~~
ncmncm
No. You are just used to your flaws, and not used to theirs; and have learned
not to complain about yours, but complaints about theirs are welcomed in your
circle.

In other words: other systems really are crap, but so is yours, really, but we
all get our stuff done, despite it all. Sometimes things get better (Moore's
law has forgiven all manner of evil) and sometimes they get worse (Java).

~~~
lisper
> You are just used to your flaws, and not used to theirs

This is a testable theory: name me a flaw in Common Lisp that cannot easily be
worked around. I'll bet that for any flaw you name, I'll be able to show you
an easy way to work around it.

~~~
jstimpfle
Of course you will be able to give a brittle, half-assed workaround for any
issue that someone throws at you. But you know, not all people are obsessed
with LISP; not all people are OK with half-assed workarounds.

I think you are being very dishonest to yourself if you think that LISP is
universally the best just because you can theoretically turn it into anything.
You can't, really. Good tooling matters. Error messages matter.
Standardization matters. Syntax matters (to some extent). Mindshare matters (a
lot). There are a lot of subtle qualities that you can't just fake.

When I write performance sensitive code I'll personally just pick C. I'm sure
there's a Common Lisp solution available to specify machine types and compile
efficient code. (Probably there are many, which is another problem.) But what's
the point in putting lipstick on a pig when I can just use the real deal that
works perfectly for what I'm doing?

~~~
lisper
"Half-assed" is in the eye of the beholder. If you appoint yourself as the
ultimate arbiter of half-assedness then your position becomes unfalsifiable.

But I think you might be surprised at how un-half-assed these things can be.

~~~
jstimpfle
> But I think you might be surprised at how un-half-assed these things can be.

You might be surprised how little I care, because I'll just use a solid,
battle-tested tool instead.

Are you suggesting we should rewrite compression or imaging or ... software in
Common Lisp because it will be so much better?

~~~
lisper
> You might be surprised how little I care

You obviously care enough to have taken the initiative to engage me on this.

> Are you suggesting we should rewrite compression or imaging or ... software
> in Common Lisp because it will be so much better?

No. It's generally never worthwhile to _re-_ write _anything_. But one should
not fall prey to the sunk cost fallacy either.

~~~
jstimpfle
If you think we should not "fall prey to the sunk cost fallacy" and should
write future software in Common LISP, then I recommend you start working on
improving the Common LISP systems-level coding situation. Because I'm not
willing to replace this

    
    
       unsigned long update_crc(unsigned long crc, unsigned char *buf,
                                int len)
       {
         unsigned long c = crc;
         for (int n = 0; n < len; n++)
           c = crc_table[(c ^ buf[n]) & 0xff] ^ (c >> 8);
         return c;
       }
    

with something like this
([https://github.com/pmai/Deflate/blob/master/deflate.lisp](https://github.com/pmai/Deflate/blob/master/deflate.lisp))

    
    
        #-lispworks
        (defun update-crc32-checksum (crc buffer end)
          (declare (type (unsigned-byte 32) crc)
                   (type (simple-array (unsigned-byte 8) (*)) buffer)
                   (type fixnum end)
                   (optimize (speed 3) (debug 0) (space 0) (safety 0))
                   #+sbcl (sb-ext:muffle-conditions sb-ext:compiler-note))
          (let ((table (load-time-value (generate-crc32-table)))
                (cur (logxor crc #xffffffff)))
            (declare (type (simple-array (unsigned-byte 32) (256)) table)
                     (type (unsigned-byte 32) cur))
            (dotimes (i end)
              (declare (type fixnum i))
              (let ((index (logand #xff (logxor cur (aref buffer i)))))
                (declare (type (unsigned-byte 8) index))
                (setq cur (logxor (aref table index) (ash cur -8)))))
            (logxor cur #xffffffff)))
    
        #+lispworks
        (defun update-crc32-checksum (crc buffer end)
          (declare (type (unsigned-byte 32) crc)
                   (type (simple-array (unsigned-byte 8) (*)) buffer)
                   (type fixnum end)
                   (optimize (speed 3) (debug 0) (space 0) (safety 0) (float 0)))
          (let ((table (load-time-value (generate-crc32-table)))
                (cur (sys:int32-lognot (sys:integer-to-int32 
                                            (dpb (ldb (byte 32 0) crc) (byte 32 0) 
                                             (if (logbitp 31 crc) -1 0))))))
            (declare (type (sys:simple-int32-vector 256) table)
                     (type sys:int32 cur))
            (dotimes (i end)
              (declare (type fixnum i))
              (let ((index (sys:int32-to-integer
                            (sys:int32-logand #xff (sys:int32-logxor cur (aref buffer i))))))
                (declare (type fixnum index))
                (setq cur (sys:int32-logxor (sys:int32-aref table index)
                                            (sys:int32-logand #x00ffffff 
                                                              (sys:int32>> cur 8))))))
            (ldb (byte 32 0) (sys:int32-to-integer (sys:int32-lognot cur)))))
    

Have a good day.

~~~
lisper
That is not exactly an apples-to-apples comparison. You've taken a very simple
C function and compared it to Lisp code written in two different ways to take
advantage of features available only in specific compilers. Equivalent C code
would be chock-full of #ifdefs and look just as ugly. I could produce an
equally contrived example where to do the same thing as two lines of Lisp
would require pages and pages of C.

Second, these code snippets are not comparable in their functionality. The
Lisp code generates the crc table, the C code assumes it has already been
done. If you wanted to add bounds checking to the C code you would have to
make major changes. If you wanted to add bounds checking to the Lisp code, all
you would have to do is change the optimization settings. (This is one of the
reasons that today's computing world is a swiss-cheese of security holes.)

Third, there are any number of ways to make the Lisp code look significantly
prettier. Comparable code would look something like:

    
    
        (defun crc (crc l)
          (dolist (item l)
            (setf crc (logxor (aref crc-table (logand 255 (logxor crc item))) (ash crc -8))))
          crc)
    

It's even easy to embed an infix parser [1] in Lisp so that you could write:

    
    
        (defun crc (crc l)
          (dolist (item l)
            (setf crc infix(crc_table[(crc ^ item) & #xff] ^ (crc >> 8))))
          crc)
    

if you wanted to. You could also embed all this in a little DSL so that the
resulting semantics and even the generated code would be exactly the same as
the C code.

Fourth, most Lisp systems have a foreign function interface, so if you want to
you can call C functions directly and so get all the benefits of C (such as
they are) from within Lisp.

---

[1]
[http://www.cs.cmu.edu/Groups/AI/util/lang/lisp/code/syntax/p...](http://www.cs.cmu.edu/Groups/AI/util/lang/lisp/code/syntax/parcil/parcil.cl)

~~~
jstimpfle
> The Lisp code generates the crc table

I don't think it's generating a table, just calling generate-crc32-table.
Let's remove that line and the difference is still ridiculous (even if you
look at only one of the two implementations).

> Comparable code would look something like:

Can't see any machine types here.

> (setf crc infix(crc_table[(crc ^ item) & #xff] ^ (crc >> 8)))

You really don't see how ridiculous that is? You are chasing exactly the
target that is unholy in your eyes. Except you will never reach it (the Lisp
version is really ugly, probably has really bad tooling, poor error messages,
poor standardization and mindshare, and so on...). While you could just use
the right tool for the job.

> and compared it to Lisp code written in two different ways

The question is: WHY was the Lisp code written in two different ways, but not
the C code? Go figure.

> If you wanted to add bounds checking to the C code you would have to make
> major changes.

If you want automated and _perfect_ bounds checking, you need a few more
flavours of GC/OOP (because you need some automated notion of where the
"length" field is, and that length field needs to be authoritative), and you
will end up with less modular, less portable code (because you can't just pass
sub-arrays / smaller lengths, but always need to pass object handles; can't
support sub-allocators, etc.). It's a tradeoff. If you go for automated bounds
checking, that's a totally different game. Personally I don't like to do GC,
and I like to manually insert explicit "bounds" checks at strategic locations.
But YMMV.

~~~
lisper
> I don't think it's generating a table, just calling generate-crc32-table.

Well, yeah. What do you suppose a function called generate-crc32-table could
plausibly do?

> Let's remove that line and the difference is still ridiculous (even if you
> look at only one of the two implementations).

OK, let's do that experiment:

    
    
        (defun update-crc32-checksum (crc buffer end)
          (declare (type (unsigned-byte 32) crc)
                   (type (simple-array (unsigned-byte 8) (*)) buffer)
                   (type fixnum end)
                   (optimize (speed 3) (debug 0) (space 0) (safety 0)))
          (let ((table crc-table) (cur (logxor crc #xffffffff))) ; crc-table assumed precomputed, as in the C version
            (declare (type (simple-array (unsigned-byte 32) (256)) table)
                     (type (unsigned-byte 32) cur))
            (dotimes (i end)
              (declare (type fixnum i))
              (let ((index (logand #xff (logxor cur (aref buffer i)))))
                (declare (type (unsigned-byte 8) index))
                (setq cur (logxor (aref table index) (ash cur -8)))))
            (logxor cur #xffffffff)))
    

That's 14 LOC, and 8 of those are declarations. For an example that you
cherry-picked to be a perfect match for the kind of task that C was designed
to do.

> You really don't see how ridiculous that is? You are chasing exactly the
> target that is unholy in your eyes.

You are completely missing the point, which was:

> It's not that they are without flaws, it's that it is a lot easier to
> fix/ignore/work-around the flaws than in other languages, where the flaws
> are unchangeable (except by the standards committee) and constantly in your
> face.

And specifically, you were responding to this:

> I'll bet that for any flaw you name, I'll be able to show you an easy way to
> work around it.

You haven't actually done that, you've just exhibited some nice clean C code
and some ugly CL code. I was just showing you some of the possible ways that
the Lisp code could be improved. I threw the infix in there not because I was
advocating it, just to show you that it was possible.

So what exactly is it about that code that makes it "ridiculous" in your eyes?
Whatever it is, I'll bet I can fix it with very little effort.

For example:

> Can't see any machine types here.

That's right, I left them out (because you can do that in CL if you want). Is
that what you want to see, a cleaner syntax for type declarations? Because
that's trivial, a few DEFTYPEs and a two-line macro.

> WHY was the Lisp code written in two different ways, but not the C code?

Because the Lisp code was trying to leverage system-specific optimizations
while remaining portable. Do you seriously doubt that I could find an example
of ugly #idef-laden C code that does the same?

> If you want automated and perfect bounds checking, you need [a bunch of
> stuff]

That's right. And why is it that you don't see that in C? Because adding that
to C is _hard_. Really really hard. So hard that in 47 years it still hasn't
been done in any standardized way. Yes, it's true that CL is not a
particularly good systems programming language out of the box. But turning
Lisp into a good systems programming language (if that's what you want) is
relatively easy, almost an elementary exercise. And it can be done at the user
level. No need to wait for a language design committee.

You want to know what is ridiculous? A language that after 47 years still
doesn't have a linked list as part of its standard library because the
language design makes it impossible.

~~~
kazinator
C doesn't have linked lists in its standard library probably because there is
a myriad of ways of doing linked lists with different performance, stylistic
and other trade-offs.

Should it be macro-based, like that <sys/queue.h> thing from BSD? Or just pure
declarations?

Many C programs that maintain lists favor the "intrusive container" approach:
mixing in the link node into their own payload data structure and then using
that structure itself as a list node.
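
A minimal sketch of that intrusive shape (names made up; BSD's <sys/queue.h>
macros generate roughly this):

    
    
        #include <stddef.h>
    
        /* The link lives inside the payload, so a node needs no separate
           allocation and the payload can sit on a list without copying. */
        struct list_node {
            struct list_node *next;
        };
    
        struct task {
            int id;
            struct list_node link;   /* mixed into the payload */
        };
    
        /* Recover the payload from a pointer to its embedded link. */
        #define CONTAINER_OF(ptr, type, member) \
            ((type *)((char *)(ptr) - offsetof(type, member)))
    
        static void push(struct list_node **head, struct list_node *n)
        {
            n->next = *head;
            *head = n;
        }
    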

C++ certainly provides lists. C++ is opinionated: they pick a representation
(or at least API) and stick it in.

~~~
lisper
> C doesn't have linked lists in its standard library probably because there
> is a myriad of ways of doing linked lists with different performance,
> stylistic and other trade-offs.

No, that's not why. The fact that there are myriad ways to implement linked
lists is a fact independent of any programming language. And yet Lisp somehow
manages to offer linked lists as a native part of the language.

The reason C cannot be extended in this way is that its memory model precludes
it. You can't add GC to C because of the way that pointers work.

> C++ certainly provides lists.

Yes. Many languages do. Nonetheless, my original claim, that it is easier to
work around Lisp's limitations than those of other languages, stands.

~~~
kazinator
C's memory model also precludes decent string handling, yet it has a string
library. But that's it; the only other data structuring provided is the
_qsort_ function for sorting arrays and _bsearch_ for searching them. I think
it's deliberate. It looks like ISO C tends to avoid specifying new library
features that can (and likely would) be written entirely in C.

~~~
lisper
> it has a string library

But not a decent one :-)

~~~
jstimpfle
Which makes the point: C is not a language for convenient canned solutions.
But I'm not too sad about that. There are a myriad ways of doing strings, and
most high-level programming languages are still stuck in the 90s with UCS-2
encodings or such, offering only solutions that don't scale well for, say,
dozens of megabytes of strings. C isn't stuck, at the cost of convenience.

------
xenadu02
A decent number of theoretically interruptible system calls don’t actually get
interrupted (or automatically retry) regardless of the signal settings because
no one bothers to check for it and the calls are cheap.

Honestly, only I/O really has any excuse for supporting this sort of behavior.

------
ansible
In general I think that using the POSIX API to deal with system calls,
threads, and signals is unnecessarily hard to do _correctly_, especially if
you want good performance.

I'd welcome some completely new operating system API that guides programmers
towards correct and efficient solutions, rather than leaving numerous pitfalls
that may only be discovered after the software is put into production.

------
jforberg
This discussion seems to ignore the fact that being blocked in a "long-running
system call" is the normal state for many (most?) Unix services.

If you look at `ps ax` on your system, you'll likely see about a hundred
processes. But if you look at `top`, you'll see only a handful of processes
having non-zero CPU usage. Why? Because most processes are just waiting (in a
system call) for something to do. A web server is blocked in a
select/poll/epoll() call waiting for a connection. Your shell is blocked in a
read() call waiting for you to type something. This is just the normal way
that a main loop is implemented on Unix.

When you kill one of these processes, they need a way to break out of their
loop and with the EINTR approach, they get a chance to break and exit.

I'm far from convinced that a "majority" of services want to just catch
signals and carry on.

~~~
haberman
Correct me if I'm wrong, but I think if you kill a process normally, it will
invoke a signal handler which will exit the process (maybe writing a core file
first) without ever returning to normal program flow.

Recognizing EINTR at the program level isn't required for this kind of
shutdown. I think the system call will only return if the signal is ignored or
if the signal handler returns, but you would only do this if you thought you
had recovered from the error.

~~~
jforberg
This is true, but if anything it reinforces my point that continuing past a
signal is the exception, not the rule.

In the general case services are not at liberty to just exit(), they need to
perform some kind of active cleanup action before exit. So the signal handler
would set an "exit flag" somewhere and the EINTR would be an indication for
the main loop to check this flag before continuing.
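
A sketch of that arrangement (names made up; note it still leaves the small
check-then-wait window that a sibling thread here gets into):

    
    
        #include <errno.h>
        #include <poll.h>
        #include <signal.h>
        #include <stddef.h>
    
        static volatile sig_atomic_t exiting = 0;
        static void on_term(int sig) { (void)sig; exiting = 1; }
    
        void serve(struct pollfd *fds, nfds_t nfds)
        {
            struct sigaction sa;
            sigemptyset(&sa.sa_mask);
            sa.sa_handler = on_term;
            sa.sa_flags = 0;           /* deliberately no SA_RESTART */
            sigaction(SIGTERM, &sa, NULL);
    
            while (!exiting) {
                if (poll(fds, nfds, -1) == -1 && errno == EINTR)
                    continue;          /* EINTR: go around, check the flag */
                /* ... handle ready descriptors ... */
            }
            /* ... active cleanup, then exit ... */
        }
    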

The only common case I can think of for continuing past signals is SIGHUP,
which some services interpret as a command to re-read their configuration
file. In this case, you are essentially doing a shutdown and startup sequence
anyway, only in a possibly more efficient way. E.g. in the case of a web
server: if you were previously listening on port N, there's no reason to
believe that the new config file won't ask you to instead listen on port M. So
you will be closing
down most connections anyway, and catching SIGHUP is mostly an optimisation as
exiting and restarting would have a similar effect.

~~~
abbeyj
For continuing past signals, what about SIGCHLD? I don't think anybody wants
that to kill their process.

Most programs are also going to want SIGSTOP, SIGTSTP, SIGTTIN, SIGTTOU, and
SIGCONT to be handled transparently with the process continuing on as if
nothing happened. I've actually been bitten by this. I had written a program
that worked fine on Linux. While it was waiting for input on stdin you could
suspend it with Ctrl-Z and then resume it with `fg`. When my friend tried this
on his machine (a Mac, IIRC), it would suspend fine. Then resuming it would
cause it to immediately die with "Interrupted system call". I hadn't written
any code to handle EINTR. I'd been getting away with it because something in
glibc and/or the kernel was helping me out and hiding this detail from me.

After a SIGWINCH or a SIGINFO you probably want to continue on too. If you've
registered a handler for SIGUSR1 and/or SIGUSR2 you usually don't shut down
after receiving them.

SIGPIPE is arguable. If you've decided to handle broken pipes by looking for
errno == EPIPE then you likely want SIGPIPE to be ignored entirely, as if it
had never happened. But you have to opt into this behavior by explicitly
setting SIGPIPE to be ignored. The default behavior is to terminate your
process.
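
Roughly (a minimal sketch):

    
    
        #include <signal.h>
    
        int main(void)
        {
            /* Opt in: with SIGPIPE ignored, write() to a broken pipe fails
               with errno == EPIPE instead of terminating the process. */
            signal(SIGPIPE, SIG_IGN);
            /* ... */
            return 0;
        }
    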

For SIGALRM you probably do want a system call to be interrupted and return
EINTR. But that usually doesn't indicate that your program will quit but that
you wanted a timeout on some operation and that you intend to continue on with
other work in the case that it didn't complete quickly.

The issue here is that signals have a bunch of different behaviors shoehorned
into them. There are things like SIGSEGV and SIGBUS that need to be handled
synchronously because they relate to the current instruction. There are
things like SIGINT, SIGINFO, and SIGTSTP that come from user action on the
terminal. There are things like SIGCHLD that are generated by the kernel in
response to events on the system. And there are things like SIGTERM and
SIGUSR1 that are generated by a different process calling kill(2). Trying to
make blanket statements about all of them is tricky.

------
toast0
That you can interrupt long running system calls with a signal provides a way
to add a timeout to system calls that may (or probably will) block, by
scheduling a timer signal with alarm or setitimer in advance of the possibly
time consuming system call. Which is more clear is certainly up for debate, of
course. Additionally, that break sends an interrupt signal also allows the
user to bail out of long running system calls and at least potentially get
back a responsive system.

~~~
zAy0LfpBZLC8mAC
No, you can't, because that strategy has a race condition.

~~~
jschwartzi
You should really explain what that race condition is.

~~~
sersnth
I think they are referring to the fact that the signal could be triggered
before the system call is executed, in which case you will miss the signal and
the system call will never time out. For example:

    
    
      alarm(1);
      // alarm could theoretically be triggered here before read() syscall is executed
      read(...);
    

This sort of problem is what the self pipe trick
[https://cr.yp.to/docs/selfpipe.html](https://cr.yp.to/docs/selfpipe.html) is
made to solve. I believe it is not really necessary anymore because there are
functions like pselect() which solve this by accepting a sigmask parameter.

Anyway, most system calls that one would want to have a timeout for have
alternatives that accept a timeout parameter.
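
A minimal sketch of the pselect() version (names made up): block the signal
first, then let pselect() unblock it atomically only while waiting, so it can
never fire in the gap between the flag check and the call.

    
    
        #include <errno.h>
        #include <signal.h>
        #include <stddef.h>
        #include <sys/select.h>
        #include <unistd.h>
    
        static volatile sig_atomic_t timed_out = 0;
        static void on_alarm(int sig) { (void)sig; timed_out = 1; }
    
        int wait_for_stdin(void)
        {
            sigset_t block, orig;
            struct sigaction sa;
            sigemptyset(&sa.sa_mask);
            sa.sa_handler = on_alarm;
            sa.sa_flags = 0;
            sigaction(SIGALRM, &sa, NULL);
    
            /* Block SIGALRM so it can't fire between the check and the wait. */
            sigemptyset(&block);
            sigaddset(&block, SIGALRM);
            sigprocmask(SIG_BLOCK, &block, &orig);
            alarm(1);
    
            for (;;) {
                if (timed_out)
                    return -1;             /* safe: SIGALRM is blocked here */
                fd_set rfds;
                FD_ZERO(&rfds);
                FD_SET(STDIN_FILENO, &rfds);
                /* SIGALRM is unblocked only for the duration of the wait. */
                int r = pselect(STDIN_FILENO + 1, &rfds, NULL, NULL, NULL, &orig);
                if (r == -1 && errno == EINTR)
                    continue;              /* interrupted: recheck the flag */
                return r;                  /* data ready, or a real error */
            }
        }
    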

------
nibbula
It's sad, but worse-is-better is so pervasive that even many of you very
smart people have difficulty conceiving of what the problem is, much less what
would be a better way, not because it's particularly hard, but because of
technology culture. It seems trivial. Just return an error, right?

I've had to deal with the problems of EINTR for so long. SA_RESTART seems
nice, but you can't rely on it, and it only begins to scratch the surface of
what would actually be "better".

How about you can call anything in a signal handler?

How about any code, user or kernel, be safely re-entered?

Of course you can't magically solve every concurrency and resource contention
problem, but having functional style, and things like unwind-protect, language
support for safety, etc. goes a surprisingly long way.

How about you can have arbitrarily many signal handlers for any condition and
have arbitrarily many places to continue on the stack, which can receive
arbitrary data?

How about I can call a system call from inside a signal handler, from inside
the debugger, from inside a signal handler, inside a system call, and pop up a
UI to ask the user what they want?

How about every piece of code can be moved around and called from anywhere?

How about you can pass around an object that can resume a function or a
system call in the middle?

How about the thing reading trashy network packets doesn't have absolute
power, while my user process doesn't have permission to close the lid or play
audio?

Not just Lisp code, but a large fraction of modern languages that have things
like GC and bignums, have to make sure it's safe to do anything, even in the
middle of adding a number. So why can't we have these things that have been
well known for over 40 years? But I'm sure you're familiar with the relatively
small set of crews that designed most of the stuff we use. vmlinux, kernel32.dll,
and even shabby xnu, might be cute, like the jerq is a cute joke on the perq,
if we didn't have to actually rely on them for real stuff.

Unfortunately, it's not hard to see that the worse-is-better culture is a
result of larger human culture, which is pushed around by the same typical
things. But the good thing is that, culture being intangible, it only takes
the mostly mental acts of skepticism, education, and personal communication to
change it.

