Hacker News new | past | comments | ask | show | jobs | submit login
A very subtle bug (2010) (nelhage.com)
118 points by r4um 9 months ago | hide | past | favorite | 38 comments



I've run into a similar issue in Rust with SIGPIPE. The compiler adds the call to signal() to ignore SIGPIPEs when a Rust program starts. Apparently this was to stop a listening server from exiting when a client closed their connection. But in my book I'm teaching Rust by rewriting a classic BSD utility (cat, head, wc, ...) in Rust and none of those utilities have to care about broken pipes because they don't ignore the signal and the OS silently kills the process. Instead, if a Rust program doesn't handle a write failing with a EPIPE error then the process panics and exits with a backtrace. The worst part is that Rust standard library doesn't include a module for handling signals so it's not possible to undo the signal() call ignoring SIGPIPEs without using another crate.

In the end it's fine and you just have to handle broken pipes but it adds a lot of boilerplate to small CLI programs that in C just work as expected. Another twist (and possible further subtle bug) is that most shells set the exit status of a program that exits from a broken pipe to 141 (128 + the signal number, in this case 13). So when you catch the pipe you can exit the process with a status of 141, but that's the shell's behaviour and there's no safe way to fake an exit status.


> The compiler adds the call to signal() to ignore SIGPIPEs when a Rust program starts. Apparently this was to stop a listening server from exiting when a client closed their connection

That’s appalling and irresponsible if true. Not every program is a server! If you want to ignore SIGPIPE you can do that yourself.


Well, that's the source of the bug in Python as well so it's not exactly a unique choice or default. It's tricky to manage this in a cross-platform way.


Python is a high-level language, though, and python code is run through an interpreter, so it's not a surprise that it might handle signals (and other things) differently than your average program. (Not restoring signals handlers to their default on fork()/exec() is definitely a bug, though.)

Rust is intended to be a systems programming language, and a replacement for C/C++. It should not be mucking about with signal handler defaults before main() runs.

The one thing that Rust does right, though, is it resets SIGPIPE to SIG_DFL when spawning processes[0]. Of course, I assume it only does that if you use std::process::Command, not if you use libc::fork()/libc::exec(), or get there some other way.

[0] https://doc.rust-lang.org/beta/unstable-book/language-featur...


I consider it malpractice


> Well, that's the source of the bug in Python as well so it's not exactly a unique choice or default. It's tricky to manage this in a cross-platform way.

Well, Python wasn't intended to write the sort of programs that C and Rust gets used for.

In Python, such a choice is not necessarily a design failure. In systems languages it is.


The attribute for controlling this is implemented but not yet stable: https://doc.rust-lang.org/beta/unstable-book/language-featur...


I'll have to rewrite some chapters once it's stable and released but I'm not holding my breath... It will be a nice problem to have anyway.


Thank you thank you thank you! Docker processes randomly exiting with "137" suddenly make a lot more sense!

137 - 128 == 9

... that's SIGKILL, probably via some sort of pipe/subshell. We'd figured it was OOM-killer or something similar, but now it seems like there's a bit more of the dots connected between.

Who can/will send random `kill -9`s? Probably the kernel, or some sort of supervisory process.


If the oom-killer is called, you'll see something on dmesg.


The usual way to generate an exit status for a signal that normally terminates the process is to explicitly unset the handler and then raise the signal yourself. In C, the latter half of that can be done with the ‘raise’ function; on (at least one) Linux with glibc, this uses tgkill underneath. I assume Rust doesn't expose this though from what you said?


That's a cool trick. There are crates for signal handling, and the libc crate exports raise(), they just aren't part of the standard library which attempts to be cross-platform. But if you're already going to unset the signal handler you might as well do that from the start (in this particular case).


That's silly, there are ways to mark that network operations shouldn't cause signals.


> The compiler adds the call to signal()

The compiler adds library calls? Seriously?


The "fix" also has a bug - `preexec_fn` causes nasal demons if the current process has threads.

Fortunately, Python 3.2 added the `restore_signals` argument for exactly this purpose, and it is enabled by default. The (2010) in the title is quite relevant.


Though from the bug report, the fix did not make it into Python 2.7 ... still being used some places today — by other people, not me (:

There is a backport fixed version though: https://code.google.com/archive/p/python-subprocess32/


In my experience virtually every use of the subprocess module at the time of first writing contains at least one more or less subtle bug.


> This code has a bug

From what I see, the code should be expected to work without requiring workarounds; instead, it is one of following:

- a python design problem

- gzip signal handling bug

- a flaw in linux pipe spec


Yeah, the problem is not on the python script side. It seems like a problem with tar or gzip.

Tar expects gzip to react graciously to SIGPIPE. Gzip only registers a SIGPIPE handler if SIGPIPE is not ignored. This is either a bug in gzip, or tar has to make sure it starts gzip without SIGPIPE ignored.

This doesn't mean the failure is any less complex. Tar incorrectly assumes that starting gzip without extra setup doesn't make it ignore SIGPIPE, which is subtly wrong.


> the problem is not on the python script side. It's a problem with tar or gzip

As it works perfectly in the equivalent shell script, I'd point the finger at Python for not wanting to handle SIGPIPE signals.


Is tar really only expected to work in a standard shell script setup? It's a general-purpose utility after all.

Python not wanting to handle SIGPIPE is perfectly fine. It is less clear whether it's fine for python to keep it disabled for subprocesses.

The other side: Is it okay for `tar` to fail if it is started with SIGPIPE disabled? That's definitely not what I'd expect, although it's maybe excusable since disabled SIGPIPE is nonstandard. You can argue about whether a system utility like tar is expected to handle nonstandard conditions -- I'd tend towards 'yes'.


If you're going to use popen(), you have to be prepared to handle how the utility you call sends data over that pipe. The bug is indisputably on the Python script side unless there's a spec tar claims to adhere to but doesn't.

That said, I don't think it's an unreasonable bug. I certainly wouldn't think any less of a programmer who wrote it.


I remain unconvinced that the decision to ignore SIGPIPE is right. The description is very hard wavy about it, "Python knows what it's doing". Maybe, but I can think of half a dozen other situations where this may trip up someone who does not take this into consideration.

How do similar languages like Perl do it? They seem to work fine doing things the Unix way.


Pretty clearly a problem with the implementation of Python, not a problem with gzip (which is following the POSIX spec) and Linux pipes (which are also POSIX compliant).


I'd call it a Python design problem. Python should have never let its changes to signal dispositions leak into subprocesses by default.


this is clearly fixed by popen starting a docker container for the subprocess to run in


beershootingoutofnose.gif


periodic nasal irrigation with beer also helps mitigate the impact of nasal demons. although, they may briefly become unusually rowdy before they pass out.


The inevitable error messages can be explained by an LLM?



i think more bugs reside here.

if tarball or path is user input then they could be used to inject tar command options.


That, however, is an extremely well-known class of bug. The post is about an extremely obscure (but presumably quite common at the time!) class of bug.


Also, if you're writing about bug x, you might make a minimal reconstruction of that piece of code only to showcase bug x, instead of the full version with all bells and whistles.


That's usually prevented by using double dashes ("--") to indicate the end of options, before the file argument. I try to remember to include include double dashes in my shell scripts for as many commands as possible (it's easier to just include them rather than figure out whether a user could possible influence the filename).


tarball is the mandatory parameter to the preceding -f and can’t be used to inject commands


Maybe better to use something like https://docs.python.org/3/library/tarfile.html instead of calling some "outside" utilities?

This might be safer (command injection) and maybe more reliable.

However I don't know what I would have chosen and hindsight is always 20/20 and there might be other external commands and requirements...


If you're wondering about downvotes, it's probably because this is mentioned briefly.

> It probably should be using tarfile, but let’s ignore that for the moment.


Apart from that it's mentioned. I have often seen tarfile being replaced by tar as subprocess, because tarfile is just so slow that it can make or break your use-case.

I'm not sure if tarfile got better since I've last used it.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: