
Hanging the Linux core dump pipe helper - weinzierl
https://rachelbythebay.com/w/2018/04/29/core/
======
phaker
In re the 'and nothing improves' sentiment at the end: there _is_ a safety
limit and it's even documented in core(5):

 _/proc/sys/kernel/core_pipe_limit

When collecting core dumps via a pipe to a user-space program, it can be
useful for the collecting program to gather data about the crashing process
from that process's /proc/[pid] directory. In order to do this safely, the
kernel must wait for the program collecting the core dump to exit, so as not
to remove the crashing process's /proc/[pid] files prematurely. This in turn
creates the possibility that a misbehaving collecting program can block the
reaping of a crashed process by simply never exiting.

Since Linux 2.6.32, the /proc/sys/kernel/core_pipe_limit can be used to defend
against this possibility. The value in this file defines how many concurrent
crashing processes may be piped to user-space programs in parallel. If this
value is exceeded, then those crashing processes above this value are noted in
the kernel log and their core dumps are skipped.

A value of 0 in this file is special. It indicates that unlimited processes
may be captured in parallel, but that no waiting will take place (i.e., the
collecting program is not guaranteed access to /proc/<crashing-PID>). The
default value for this file is 0._
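
To check which mode a given machine is in, the file can simply be read and interpreted according to the quoted text. A minimal sketch, assuming a Linux system with procfs mounted; the helper names `describe_core_pipe_limit` and `read_core_pipe_limit` are made up for illustration and are not part of any library:

```python
from pathlib import Path

# Path as given in the quoted man page text; absent on non-Linux systems.
CORE_PIPE_LIMIT = Path("/proc/sys/kernel/core_pipe_limit")


def describe_core_pipe_limit(raw: str) -> str:
    """Interpret the file contents according to the quoted documentation."""
    limit = int(raw.strip())
    if limit == 0:
        # Special value: no cap on concurrent dumps, and no waiting.
        return "unlimited concurrent piped dumps; kernel does not wait for the helper"
    return f"at most {limit} concurrent piped dumps; kernel waits for the helper to exit"


def read_core_pipe_limit(path: Path = CORE_PIPE_LIMIT):
    """Return a description of the current setting, or None if unreadable."""
    try:
        return describe_core_pipe_limit(path.read_text())
    except OSError:
        # File missing (not Linux, or an older kernel) or not readable.
        return None
```

Calling `read_core_pipe_limit()` with no arguments reads the real file; passing another `Path` makes it easy to try against a copy.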

(edit: in case that came off as curt, i didn't intend it to; i do agree that
far too many things suck because "it's documented in a defunct mailing list
somewhere and if you don't know it's your own fault")

~~~
digi_owl
I find more often than not that the kernel people are reasonably good with
documentation. User space on the other hand seems to treat documentation
either like a chore best avoided, or something to bludgeon people with when
an API is mangled between releases.

~~~
monocasa
Kernel space has the benefit of a very clear public/private boundary. The
kernel internals have much worse docs than user space once you get over that
wall.

------
mst
Certain types of script really really really should always start with

    alarm 10

or similar. This has saved me from myself way too many times.

~~~
cpach
What does that do?

~~~
jey
Probably sends a SIGALRM signal in 10 seconds.

[https://linux.die.net/man/2/alarm](https://linux.die.net/man/2/alarm)

~~~
mst
Which, if you haven't taken precautions to make it otherwise, will shoot your
process in the head.

"Dear operating system, please shoot me in the head in ten seconds if I
haven't already finished" is a really useful thing to ask for sometimes.
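
For anyone who wants to see that behaviour without risking their own shell, here is a rough Python equivalent. It is only a sketch: it assumes a POSIX system, uses a one-second alarm instead of ten to keep the demo short, and `run_with_deadline_demo` is a made-up name. The child arms the alarm first and then pretends to hang; the parent reports how the child died.

```python
import signal
import subprocess
import sys

# Child script: arm the alarm before doing anything else, then simulate
# work that never finishes. No SIGALRM handler is installed, so the
# default action (terminate the process) applies.
CHILD = "import signal, time; signal.alarm(1); time.sleep(60)"


def run_with_deadline_demo() -> int:
    """Run the child and return its exit status.

    subprocess reports death by signal N as a return code of -N.
    """
    proc = subprocess.run([sys.executable, "-c", CHILD])
    return proc.returncode
```

In a real script the only line needed is `signal.alarm(10)` near the top; the subprocess wrapper is there purely so the effect can be observed from outside.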

------
amelius
Can't you just make core-files land on a FUSE-mounted filesystem, and from
there decide if you want to keep the file?

~~~
ahefner
That only moves the point of failure from one userspace process to another.
The FUSE handler can hang just like the core dump pipe helper.

