I've run into a similar issue in Rust with SIGPIPE. The compiler adds a call to signal() to ignore SIGPIPE when a Rust program starts. Apparently this was to stop a listening server from exiting when a client closed its connection. But in my book I'm teaching Rust by rewriting classic BSD utilities (cat, head, wc, ...) in Rust, and none of those utilities have to care about broken pipes, because they don't ignore the signal and the OS silently kills the process. Instead, if a Rust program doesn't handle a write failing with an EPIPE error, the process panics and exits with a backtrace. The worst part is that the Rust standard library doesn't include a module for handling signals, so it's not possible to undo the signal() call ignoring SIGPIPE without pulling in another crate.
In the end it's fine and you just have to handle broken pipes, but it adds a lot of boilerplate to small CLI programs that in C just work as expected. Another twist (and a possible source of further subtle bugs) is that most shells report the exit status of a program killed by a broken pipe as 141 (128 + the signal number, in this case 13). So when you catch the EPIPE you can exit the process with a status of 141, but that's mimicking the shell's behaviour; there's no clean way to fake a killed-by-signal exit status from inside the process.
> The compiler adds the call to signal() to ignore SIGPIPEs when a Rust program starts. Apparently this was to stop a listening server from exiting when a client closed their connection
That’s appalling and irresponsible if true. Not every program is a server! If you want to ignore SIGPIPE you can do that yourself.
Well, that's the source of the bug in Python as well so it's not exactly a unique choice or default. It's tricky to manage this in a cross-platform way.
Python is a high-level language, though, and python code is run through an interpreter, so it's not a surprise that it might handle signals (and other things) differently than your average program. (Not restoring signals handlers to their default on fork()/exec() is definitely a bug, though.)
Rust is intended to be a systems programming language, and a replacement for C/C++. It should not be mucking about with signal handler defaults before main() runs.
The one thing that Rust does right, though, is it resets SIGPIPE to SIG_DFL when spawning processes[0]. Of course, I assume it only does that if you use std::process::Command, not if you use libc::fork()/libc::exec(), or get there some other way.
> Well, that's the source of the bug in Python as well so it's not exactly a unique choice or default. It's tricky to manage this in a cross-platform way.
Well, Python wasn't intended for writing the sort of programs that C and Rust get used for.
In Python, such a choice is not necessarily a design failure. In systems languages it is.
Thank you thank you thank you! Docker processes randomly exiting with "137" suddenly make a lot more sense!
137 - 128 == 9
... that's SIGKILL, probably via some sort of pipe/subshell. We'd figured it was the OOM-killer or something similar, but now a few more of the dots are connected.
Who can/will send random `kill -9`s? Probably the kernel, or some sort of supervisory process.
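The 128 + signal arithmetic works for any signal-killed process, and Python's signal module can decode these statuses directly (a small illustration, not tied to any particular tool):

```python
import signal

# Shell convention: an exit status above 128 usually means the process
# was terminated by signal number (status - 128).
for status in (137, 141, 139):
    sig = signal.Signals(status - 128)
    print(status, "->", sig.name)
# 137 -> SIGKILL, 141 -> SIGPIPE, 139 -> SIGSEGV
```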
The usual way to generate an exit status for a signal that normally terminates the process is to explicitly unset the handler and then raise the signal yourself. In C, the latter half of that can be done with the 'raise' function; on Linux with glibc, at least, this uses tgkill underneath. I assume Rust doesn't expose this though, from what you said?
That's a cool trick. There are crates for signal handling, and the libc crate exports raise(), they just aren't part of the standard library which attempts to be cross-platform. But if you're already going to unset the signal handler you might as well do that from the start (in this particular case).
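For what it's worth, the trick is easy to sketch in Python, which does ship a signal module. The demo runs the snippet in a child interpreter so the demo itself isn't killed; a shell would report the child's death as 128 + 13 = 141:

```python
import signal, subprocess, sys

# The trick: restore the default disposition for SIGPIPE, then re-raise
# it so the kernel terminates the process as "killed by signal 13".
snippet = (
    "import os, signal\n"
    "signal.signal(signal.SIGPIPE, signal.SIG_DFL)\n"
    "os.kill(os.getpid(), signal.SIGPIPE)\n"
)
proc = subprocess.run([sys.executable, "-c", snippet])

# Python exposes a signal death as a negative returncode; a shell
# computes 128 + 13 = 141 from the same wait status.
print(proc.returncode)       # -13
print(128 + signal.SIGPIPE)  # 141
```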
The "fix" also has a bug - `preexec_fn` causes nasal demons if the current process has threads.
Fortunately, Python 3.2 added the `restore_signals` argument for exactly this purpose, and it is enabled by default. The (2010) in the title is quite relevant.
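The effect of `restore_signals` is easy to observe. In this sketch the parent ignores SIGPIPE (simulating CPython's own startup behaviour), and the probe shell survives `kill -13` only when the ignored disposition leaks across exec:

```python
import signal, subprocess

# Simulate CPython's startup behaviour: ignore SIGPIPE in this process.
signal.signal(signal.SIGPIPE, signal.SIG_IGN)

# The probe dies if SIGPIPE is at its default, and survives if it's
# ignored (a non-interactive shell leaves inherited SIG_IGN alone).
probe = ["sh", "-c", "kill -13 $$; echo survived"]

# restore_signals=True (the default since 3.2) resets SIGPIPE, among
# others, to SIG_DFL in the child before exec.
r1 = subprocess.run(probe, capture_output=True, text=True)
# restore_signals=False reproduces the 2010 bug: SIG_IGN leaks through.
r2 = subprocess.run(probe, capture_output=True, text=True,
                    restore_signals=False)

print(r1.returncode)      # -13: killed by SIGPIPE
print(r2.stdout.strip())  # survived
```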
Yeah, the problem is not on the python script side. It seems like a problem with tar or gzip.
Tar expects gzip to react gracefully to SIGPIPE. Gzip only registers a SIGPIPE handler if SIGPIPE is not ignored. This is either a bug in gzip, or tar has to make sure it starts gzip without SIGPIPE ignored.
This doesn't make the failure any less complex. Tar assumes that gzip, started without any extra setup, won't have SIGPIPE ignored, and that assumption is subtly wrong here.
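The gzip behaviour described here is a classic C idiom: check whether a signal is already ignored before installing a handler, so callers that deliberately ignored it (nohup-style) keep their semantics. A minimal Python sketch of the same idiom (helper and handler names are mine):

```python
import signal

def install_unless_ignored(signum, handler):
    # gzip-style idiom: respect a caller's deliberate SIG_IGN.
    if signal.getsignal(signum) != signal.SIG_IGN:
        signal.signal(signum, handler)

def on_pipe(signum, frame):
    raise BrokenPipeError("downstream reader went away")

# If a parent (like the buggy Python script) already set SIG_IGN,
# the handler is silently skipped -- the interaction that bites tar|gzip.
signal.signal(signal.SIGPIPE, signal.SIG_IGN)
install_unless_ignored(signal.SIGPIPE, on_pipe)
print(signal.getsignal(signal.SIGPIPE) == signal.SIG_IGN)  # True

# With the default disposition, the handler does get installed.
signal.signal(signal.SIGPIPE, signal.SIG_DFL)
install_unless_ignored(signal.SIGPIPE, on_pipe)
print(signal.getsignal(signal.SIGPIPE) is on_pipe)  # True
```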
Is tar really only expected to work in a standard shell script setup? It's a general-purpose utility after all.
Python not wanting to handle SIGPIPE is perfectly fine. It is less clear whether it's fine for python to keep it disabled for subprocesses.
The other side: Is it okay for `tar` to fail if it is started with SIGPIPE disabled? That's definitely not what I'd expect, although it's maybe excusable since disabled SIGPIPE is nonstandard. You can argue about whether a system utility like tar is expected to handle nonstandard conditions -- I'd tend towards 'yes'.
If you're going to use popen(), you have to be prepared to handle how the utility you call sends data over that pipe. The bug is indisputably on the Python script side unless there's a spec tar claims to adhere to but doesn't.
That said, I don't think it's an unreasonable bug. I certainly wouldn't think any less of a programmer who wrote it.
I remain unconvinced that the decision to ignore SIGPIPE is right. The description is very hand-wavy about it: "Python knows what it's doing". Maybe, but I can think of half a dozen other situations where this may trip up someone who doesn't take it into consideration.
How do similar languages like Perl do it? They seem to work fine doing things the Unix way.
Pretty clearly a problem with the implementation of Python, not a problem with gzip (which is following the POSIX spec) and Linux pipes (which are also POSIX compliant).
periodic nasal irrigation with beer also helps mitigate the impact of nasal demons. although, they may briefly become unusually rowdy before they pass out.
That, however, is an extremely well-known class of bug. The post is about an extremely obscure (but presumably quite common at the time!) class of bug.
Also, if you're writing about bug x, you might make a minimal reconstruction of that piece of code only to showcase bug x, instead of the full version with all bells and whistles.
That's usually prevented by using double dashes ("--") to indicate the end of options, before the file argument. I try to remember to include double dashes in my shell scripts for as many commands as possible (it's easier to just include them than to figure out whether a user could possibly influence the filename).
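The same habit applies when building argument lists from a script. A small illustration, assuming GNU coreutils' cat (which honours "--" as the end-of-options marker), with a filename that would otherwise be parsed as a flag:

```python
import os, subprocess, tempfile

# A filename that looks like an option; without "--", cat would treat
# "-n" as its number-the-lines flag instead of a file to read.
d = tempfile.mkdtemp()
with open(os.path.join(d, "-n"), "w") as f:
    f.write("hello\n")

# "--" tells cat that everything after it is a filename.
out = subprocess.run(["cat", "--", "-n"], cwd=d,
                     capture_output=True, text=True)
print(out.stdout)  # hello
```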
Apart from what's already been mentioned: I have often seen tarfile replaced by running tar as a subprocess, because tarfile is just so slow that it can make or break your use case.
I'm not sure if tarfile got better since I've last used it.