I've run into a similar issue in Rust with SIGPIPE. The compiler adds a call to signal() to ignore SIGPIPE when a Rust program starts. Apparently this was to stop a listening server from exiting when a client closed its connection. But in my book I'm teaching Rust by rewriting classic BSD utilities (cat, head, wc, ...) in Rust, and none of those utilities have to care about broken pipes, because they don't ignore the signal and the OS silently kills the process. Instead, if a Rust program doesn't handle a write failing with an EPIPE error, the process panics and exits with a backtrace. The worst part is that the Rust standard library doesn't include a module for handling signals, so it's not possible to undo the signal() call ignoring SIGPIPE without pulling in another crate.
In the end it's fine and you just have to handle broken pipes, but it adds a lot of boilerplate to small CLI programs that in C just work as expected. Another twist (and a possible source of further subtle bugs) is that most shells report the exit status of a program killed by a broken pipe as 141 (128 + the signal number, in this case 13). So when you catch the EPIPE you can exit the process with a status of 141, but that's mimicking the shell's behaviour; there's no clean way to fake a killed-by-signal exit status from inside the process.
> The compiler adds the call to signal() to ignore SIGPIPEs when a Rust program starts. Apparently this was to stop a listening server from exiting when a client closed their connection
That’s appalling and irresponsible if true. Not every program is a server! If you want to ignore SIGPIPE you can do that yourself.
Well, that's the source of the bug in Python as well so it's not exactly a unique choice or default. It's tricky to manage this in a cross-platform way.
Python is a high-level language, though, and python code is run through an interpreter, so it's not a surprise that it might handle signals (and other things) differently than your average program. (Not restoring signals handlers to their default on fork()/exec() is definitely a bug, though.)
Rust is intended to be a systems programming language, and a replacement for C/C++. It should not be mucking about with signal handler defaults before main() runs.
The one thing that Rust does right, though, is it resets SIGPIPE to SIG_DFL when spawning processes[0]. Of course, I assume it only does that if you use std::process::Command, not if you use libc::fork()/libc::exec(), or get there some other way.
> Well, that's the source of the bug in Python as well so it's not exactly a unique choice or default. It's tricky to manage this in a cross-platform way.
Well, Python wasn't intended for writing the sort of programs that C and Rust get used for.
In Python, such a choice is not necessarily a design failure. In systems languages it is.
Thank you thank you thank you! Docker processes randomly exiting with "137" suddenly make a lot more sense!
137 - 128 == 9
... that's SIGKILL, probably via some sort of pipe/subshell. We'd figured it was the OOM-killer or something similar, but now a few more of the dots are connected.
Who can/will send random `kill -9`s? Probably the kernel, or some sort of supervisory process.
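The 128 + signal arithmetic works for any signal-killed process, and Python's signal module can decode these statuses directly (a small illustration, not tied to any particular tool):

```python
import signal

# Shell convention: an exit status above 128 usually means the process
# was terminated by signal number (status - 128).
for status in (137, 141, 139):
    sig = signal.Signals(status - 128)
    print(status, "->", sig.name)
# 137 -> SIGKILL, 141 -> SIGPIPE, 139 -> SIGSEGV
```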
The usual way to generate an exit status for a signal that normally terminates the process is to explicitly unset the handler and then raise the signal yourself. In C, the latter half of that can be done with the 'raise' function; on Linux with glibc, at least, this uses tgkill underneath. I assume Rust doesn't expose this though, from what you said?
That's a cool trick. There are crates for signal handling, and the libc crate exports raise(), they just aren't part of the standard library which attempts to be cross-platform. But if you're already going to unset the signal handler you might as well do that from the start (in this particular case).
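For what it's worth, the trick is easy to sketch in Python, which does ship a signal module. The demo runs the snippet in a child interpreter so the demo itself isn't killed; a shell would report the child's death as 128 + 13 = 141:

```python
import signal, subprocess, sys

# The trick: restore the default disposition for SIGPIPE, then re-raise
# it so the kernel terminates the process as "killed by signal 13".
snippet = (
    "import os, signal\n"
    "signal.signal(signal.SIGPIPE, signal.SIG_DFL)\n"
    "os.kill(os.getpid(), signal.SIGPIPE)\n"
)
proc = subprocess.run([sys.executable, "-c", snippet])

# Python exposes a signal death as a negative returncode; a shell
# computes 128 + 13 = 141 from the same wait status.
print(proc.returncode)       # -13
print(128 + signal.SIGPIPE)  # 141
```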
The "fix" also has a bug - `preexec_fn` causes nasal demons if the current process has threads.
Fortunately, Python 3.2 added the `restore_signals` argument for exactly this purpose, and it is enabled by default. The (2010) in the title is quite relevant.
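The effect of `restore_signals` is easy to observe. In this sketch the parent ignores SIGPIPE (simulating CPython's own startup behaviour), and the probe shell survives `kill -13` only when the ignored disposition leaks across exec:

```python
import signal, subprocess

# Simulate CPython's startup behaviour: ignore SIGPIPE in this process.
signal.signal(signal.SIGPIPE, signal.SIG_IGN)

# The probe dies if SIGPIPE is at its default, and survives if it's
# ignored (a non-interactive shell leaves inherited SIG_IGN alone).
probe = ["sh", "-c", "kill -13 $$; echo survived"]

# restore_signals=True (the default since 3.2) resets SIGPIPE, among
# others, to SIG_DFL in the child before exec.
r1 = subprocess.run(probe, capture_output=True, text=True)
# restore_signals=False reproduces the 2010 bug: SIG_IGN leaks through.
r2 = subprocess.run(probe, capture_output=True, text=True,
                    restore_signals=False)

print(r1.returncode)      # -13: killed by SIGPIPE
print(r2.stdout.strip())  # survived
```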
Yeah, the problem is not on the python script side. It seems like a problem with tar or gzip.
Tar expects gzip to react gracefully to SIGPIPE. Gzip only registers a SIGPIPE handler if SIGPIPE is not ignored. This is either a bug in gzip, or tar has to make sure it starts gzip without SIGPIPE ignored.
This doesn't make the failure any less complex. Tar assumes that gzip, started without any extra setup, won't have SIGPIPE ignored, and that assumption is subtly wrong here.
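The gzip behaviour described here is a classic C idiom: check whether a signal is already ignored before installing a handler, so callers that deliberately ignored it (nohup-style) keep their semantics. A minimal Python sketch of the same idiom (helper and handler names are mine):

```python
import signal

def install_unless_ignored(signum, handler):
    # gzip-style idiom: respect a caller's deliberate SIG_IGN.
    if signal.getsignal(signum) != signal.SIG_IGN:
        signal.signal(signum, handler)

def on_pipe(signum, frame):
    raise BrokenPipeError("downstream reader went away")

# If a parent (like the buggy Python script) already set SIG_IGN,
# the handler is silently skipped -- the interaction that bites tar|gzip.
signal.signal(signal.SIGPIPE, signal.SIG_IGN)
install_unless_ignored(signal.SIGPIPE, on_pipe)
print(signal.getsignal(signal.SIGPIPE) == signal.SIG_IGN)  # True

# With the default disposition, the handler does get installed.
signal.signal(signal.SIGPIPE, signal.SIG_DFL)
install_unless_ignored(signal.SIGPIPE, on_pipe)
print(signal.getsignal(signal.SIGPIPE) is on_pipe)  # True
```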
Is tar really only expected to work in a standard shell script setup? It's a general-purpose utility after all.
Python not wanting to handle SIGPIPE is perfectly fine. It is less clear whether it's fine for python to keep it disabled for subprocesses.
The other side: Is it okay for `tar` to fail if it is started with SIGPIPE disabled? That's definitely not what I'd expect, although it's maybe excusable since disabled SIGPIPE is nonstandard. You can argue about whether a system utility like tar is expected to handle nonstandard conditions -- I'd tend towards 'yes'.
If you're going to use popen(), you have to be prepared to handle how the utility you call sends data over that pipe. The bug is indisputably on the Python script side unless there's a spec tar claims to adhere to but doesn't.
That said, I don't think it's an unreasonable bug. I certainly wouldn't think any less of a programmer who wrote it.
I remain unconvinced that the decision to ignore SIGPIPE is right. The description is very hand-wavy about it: "Python knows what it's doing". Maybe, but I can think of half a dozen other situations where this may trip up someone who doesn't take it into consideration.
How do similar languages like Perl do it? They seem to work fine doing things the Unix way.
Pretty clearly a problem with the implementation of Python, not a problem with gzip (which is following the POSIX spec) and Linux pipes (which are also POSIX compliant).
periodic nasal irrigation with beer also helps mitigate the impact of nasal demons. although, they may briefly become unusually rowdy before they pass out.
That, however, is an extremely well-known class of bug. The post is about an extremely obscure (but presumably quite common at the time!) class of bug.
Also, if you're writing about bug x, you might make a minimal reconstruction of that piece of code only to showcase bug x, instead of the full version with all bells and whistles.
That's usually prevented by using double dashes ("--") to indicate the end of options, before the file argument. I try to remember to include double dashes in my shell scripts for as many commands as possible (it's easier to just include them than to figure out whether a user could possibly influence the filename).
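The same habit applies when building argument lists from a script. A small illustration, assuming GNU coreutils' cat (which honours "--" as the end-of-options marker), with a filename that would otherwise be parsed as a flag:

```python
import os, subprocess, tempfile

# A filename that looks like an option; without "--", cat would treat
# "-n" as its number-the-lines flag instead of a file to read.
d = tempfile.mkdtemp()
with open(os.path.join(d, "-n"), "w") as f:
    f.write("hello\n")

# "--" tells cat that everything after it is a filename.
out = subprocess.run(["cat", "--", "-n"], cwd=d,
                     capture_output=True, text=True)
print(out.stdout)  # hello
```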
Apart from what's already been mentioned: I have often seen tarfile replaced by running tar as a subprocess, because tarfile is just so slow that it can make or break your use case.
I'm not sure if tarfile got better since I've last used it.