
Why doesn't `kill -9` always work? - fool
http://www.noah.org/wiki/Kill_-9_does_not_work
======
agwa
Do not mount NFS with "soft" unless you really know what you're doing. NFS's
tendency to hang is not "stupidity" - it's actually one of the best things
about NFS. Applications do not deal well with failed reads/writes. If there's
a brief network interruption or the NFS server goes down, it's WAY safer to
cause applications to hang until the server comes back. Since NFS is a
stateless protocol, when the server comes back, I/O resumes as if nothing ever
happened. This helps make using NFS feel more like using a local filesystem.
Otherwise it becomes a very leaky abstraction.

Not being able to kill processes stuck on NFS I/O is annoying though, so you
can mount with the "intr" option and that makes such processes killable.
However, since Linux 2.6.25, you don't even need this and SIGKILL can always
kill applications stuck in NFS I/O.
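
For reference, those options go in the mount options field; a sketch of an
/etc/fstab entry, with the server name and paths as placeholders:

```
# "hard" (the default) hangs I/O until the server returns; "soft" lets
# reads/writes fail with an error instead.  "intr" made hung I/O killable
# on older kernels; since 2.6.25 it is accepted but ignored, because
# SIGKILL always kills a process stuck in NFS I/O.
nfsserver:/export  /mnt/data  nfs  hard,intr  0  0
```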

~~~
dredmorbius
Expecting NFS (or any other remote filesystem) to behave as if it were local
is a fundamental error.

Time, and speed of light, ultimately matter. If you need assurance, find a way
of getting reliability in your system through redundancy and locality.
Distinguish between "task has been delegated" and "task has been confirmed
completed". Down any other path runs pain, and anyone who tells you otherwise
is selling something.

You're going to have to compromise: whole systems (or clusters) going titsup
because your NFS heads had a fart, or lost commits. Neither is very attractive
when shit's on the line.

Your database relying on NFS is a fundamental error you'll have to design
around.

~~~
laumars

> Expecting NFS (or any other remote filesystem) to behave as if it were local is a fundamental error.

Any storage medium is subject to failure; the issue isn't specific to remote
file systems. In fact, I personally consider the Plan 9 approach to file
systems to be one of the greatest ideas it has imparted to Linux and Unix, so
I'm all in favour of local file systems, services, and remote objects and
file systems transparently behaving as one.

    
    
> Time, and speed of light, ultimately matter.

People often cite the speed of light when talking about electronics, when
actually electrons don't travel at the speed of light (they have mass). The
difference may be small, but using precise scientific terms imprecisely is one
of my pet hates.

    
    
> You're going to have to compromise: whole systems (or clusters) going titsup because your NFS heads had a fart, or lost commits. Neither is very attractive when shit's on the line.
    

If this is a serious issue then you should be looking into iSCSI. A bad
workman blames his tools, a good workman finds the best tool for the job.

~~~
SEMW
> People often cite the speed of light when talking about electronics when
> actually electrons don't travel at the speed of light (they have mass)

Actual electrons move pretty slowly - of the order of millimetres per hour.
That speed has nothing to do with the speed the signal propagates down the
wire.

The speed the signal is propagated is the speed the electromagnetic wavefront
moves along it. Which is mostly limited by the dielectric constant of the
wire's insulator. (That actually _is_ related to the speed of light in the
insulator - they both depend on its permittivity). It's not related to the
speed electrons move in the wire, which depends on its cross-sectional area
and the current. In particular, the fact that electrons have mass isn't
relevant to the wavefront propagation speed.

(C.f. fibre optic cables, which will have a wavefront propagation speed not
dissimilar to copper wire (i.e. both will be a substantial fraction of c),
even though, unlike copper wire, their carriers _are_ massless and actually
_do_ move at the wavefront speed).

Analogy: imagine pushing the end of a very long, rigid broomstick. The actual
wood in the broomstick moves pretty slowly (maybe you move it a cm in a
second). But the person at the other end feels their end of the broomstick
move almost immediately, limited only by a speed of light delay (or a little
more if the broomstick isn't as rigid as possible).

~~~
SummonJetTruck
With a broomstick, wouldn't it be the speed of sound, since it's a pressure
wave?

~~~
SEMW
Fair point. For the analogy I was going for a hypothetical rigid broomstick,
made of stuff that's as incompressible as physically possible. The speed of
sound in that would be _c_. Yeah, that isn't very realistic, but I don't think
that hurts the point it was illustrating.

------
ambrop7
I consider any uninterruptible sleep in the kernel a bug. There's no technical
reason a process waiting for a resource (e.g. disk I/O) couldn't be killed on
the spot, leaving the resource on its own. If it can't be, it just means it
hasn't been implemented in the kernel.

~~~
dfox
It's a bug motivated by compatibility. In the original 70's implementations of
Unix, file system I/O mostly led to a busy wait in the kernel and thus was not
interruptible, because interrupting it was simply not possible, and there were
applications that relied on this behavior. On Unix, a signal received during a
system call generally causes the kernel to abort whatever it was doing and
requires the application to deal with that situation and restart the
operation. Implementations of stdio in libc generally do the right thing, but
most applications that do filesystem I/O directly do not (and a surprisingly
large number of commonly used network services behave erratically when a
network write(2) is interrupted by a signal). And even applications that
handle -EINTR from all I/O still have places where it is not handled (allowing
interruptible disk I/O would cause things like stat(2) to return EINTR).
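
To make the EINTR problem concrete: the defensive loop an application needs
around write(2) to survive both signal interruptions and short writes looks
something like this (a minimal sketch; `write_all` is my name for it, not a
real libc function):

```c
#include <errno.h>
#include <unistd.h>

/* Retry idiom for write(2): keep issuing the syscall until it either
 * makes progress or fails for a reason other than a signal.  Also
 * handles short writes, which a plain retry-on-EINTR does not. */
ssize_t write_all(int fd, const void *buf, size_t count)
{
    const char *p = buf;
    size_t left = count;

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n == -1) {
            if (errno == EINTR)
                continue;   /* interrupted before any bytes moved: retry */
            return -1;      /* real error: surface it to the caller */
        }
        p += n;             /* partial write: advance and keep going */
        left -= (size_t)n;
    }
    return (ssize_t)count;
}
```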

Allowing SIGKILL to work and not any other signal is an ugly special case,
and while generally reasonable it is still a special case that is relevant
for things like NFS (with the modern Linux NFS client allowing you to disable
this behavior) and broken hardware (and then trying to recover the situation
with anything other than a kernel-level debugger is mostly meaningless, with
power cycling being the real solution when you can do that; incidentally, we
currently have a similar issue on one backend server where power-cycling is
not an option).

~~~
asveikau
> implementations of stdio in libc generally do the right thing

Tell me, what is the "right thing" for stdio to do when it sees EINTR? It
strikes me that this can't really be solved at the library level. There are
times when you'll want to retry and there are times when you'll want to drop
your work and surface the error to the caller. Doesn't seem to me like a
library can decide which is which. Which is probably why the I/O syscalls need
to surface it in the first place. (I'd argue if a library like stdio, which
does nothing but wrap syscalls and buffer stuff, can decide it, then there's
no need for EINTR to exist at all because the syscall could theoretically make
the same decisions.)

~~~
agwa
The right thing is almost always to retry the syscall. Syscalls on Unix return
EINTR because it makes the kernel simpler, which was a key design goal in
Unix[1]. If you need to do something when a signal fires, you do it in a
signal handler (relying on EINTR instead of a signal handler is error-prone
because if the signal fires between syscalls you lose it).

That's the theory anyways - in practice it's really hard to use signal
handlers to do stuff because of things like threads and async-signal safety.
There are newer syscalls like pselect() which let you atomically unblock
signals, execute the syscall, and re-block the signals, meaning EINTR _can_ be
used reliably, and then there's the even newer Linux-only signalfd() syscall
which lets you receive signals via a file descriptor. But stdio is still very
much oriented for the older signal handler approach.

[1] See the paper "The Rise of ``Worse is Better''":
<http://www.stanford.edu/class/cs240/readings/worse-is-better.html>

~~~
asveikau
Yes, I'm aware, "almost always". Not always, though. I was thinking
specifically of an application that might want to use signals to cancel
blocking I/O and continue running.

(PS: When I wrote my reply I was also already familiar with your linked
article, the challenges of signal safety, and the signalfd() syscall. Surely
an interesting set of topics but I still maintain that a library doesn't
really have a "good" way to deal with EINTR, especially if all it does is wrap
read or write.)

~~~
dfox
The simplest solution would be for libc to export some flag that could be set
in a signal handler, signifying that the I/O operation should be aborted.
As for the simpler kernel: I think the Windows NT/VMS solution, where user
code has to explicitly block on I/O completion, is simpler kernel-wise, but
it leads to unnecessary complexity in applications (which is abstracted away
by winapi, but it's sometimes a leaky abstraction). On the other hand, the
most common use for interrupting syscalls is timeouts, and then killing the
thread is most often what you want.

In all, EINTR is not a way to find out that there was a signal during a
syscall, but a hack to get the process into a meaningful state the easiest
possible way when a signal handler runs. By the way, for some syscalls
post-2.6 Linux transparently does something reasonably similar to ITS'
PCLSRing, without returning EINTR.

------
Trufa
That site is trying to murder my eyes!

Go here <http://www.readability.com/articles/zcqkmihi> and switch to
Readability view!

~~~
glenstein
Another (which I originally found because of a comment here a few years ago)
is <http://viewtext.org/>.

I like this one because I can add it as a custom search engine in Opera
without adding a browser extension or leaving the page.

~~~
aw3c2
Opera has user style sheets built in. See the "author" and "user" modes. Also
check out the accessibility layout.

No need to rely on a third party or even tell anyone what you are reading.

------
a_bonobo
Can anyone explain the "Why is a process wedged?" part? I do understand that
piping from /dev/random to /dev/null is going to run forever, but I do not
understand the gdb output, nor what that has to do with the rest of the text.

~~~
dllthomas
Not sure just how much you do/don't understand, or how much others will/won't
understand, so I'll run through line by line:

    
    
        PID=$!
    

grabs the PID of the process you just spawned (in the background, with &) into
shell variable PID

    
    
        CMDLINE="!-2"
    

grabs the full line you just ran (before the line storing PID) with shell
history expansion

    
    
        CMD=${CMDLINE%% *}
    

expands the CMDLINE variable, removing everything from the first space onward
(so CMD now holds "cat") with bash trickery

    
    
        WCHAN=$(cat /proc/${PID}/wchan)
    

grabs the name of the currently executing syscall for the process (at least,
according to <http://www.lindevdoc.org/wiki/proc/pid/wchan>)

    
    
        echo "command: ${CMD}, pid: ${PID}, wchan: ${WCHAN}"
    

prints the info we've grabbed

    
    
        strace -p ${PID}
    

connects a trace to the process to see what it's doing

    
    
        gdb ${CMD} ${PID}
    

connects to the process (gdb needs program name and can be given a pid to
connect to)

    
    
        (gdb) disassemble
    

prints the actual (assembler) code being run. In this case, I think all we get
from the output is that it's in fact in the middle of some syscall - you'd
have to check registers and syscall tables to determine which.

As others have mentioned, much of this is less useful than implied in the face
of an actual wedged process.

Tangentially, using gdb to attach to running processes is a very powerful
technique - I've been able to get line numbers out of running bash scripts.

~~~
a_bonobo
Thank you for that! Makes it clear. I guess I have to properly learn gdb's
output... I work in bioinformatics; we never go that low-level.

------
nikster
Much simpler answer: Bugs.

If kill -9 does not work, it's a bug. The kernel needs to be able to end
processes no matter what the process is doing. By definition this should not
depend on how the misbehaving process was implemented. I imagine practical
considerations are keeping these bugs in there, e.g. the effort of making all
processes killable would stand in no relation to the gains - it's hard to do,
and rare to occur.

------
kqr2
Slightly off-topic, but kill -9 even has a rap song dedicated to it:

<http://www.youtube.com/watch?v=Fow7iUaKrq4>

~~~
SG-
Disgusting.

~~~
tutuca
You spelled AWESOME wrong...

------
darwinGod
The number of times I have spent 10 minutes staring at the output of 'pgrep
processname' , when I had attached gdb to the process in another terminal
session... Urgh!! :-/

------
JimmaDaRustla
I always thought kill -9 won't always work because it currently has control of
a system resource, like disk or something.

~~~
dfox
That is an almost correct understanding. Processes that are waiting for
things like disk I/O do not respond to any signals, not even KILL.

------
liotier

      That is not dead which can eternal lie
      And with strange aeons even death may die

~~~
ucee054
_ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn_

------
kaeso
> ps Haxwwo pid,command | grep "rpciod" | grep -v grep

pgrep(1) is there for a reason.

------
drivebyacct2
sshfs used to have this problem and it was enough to bring nautilus and a lot
of other applications to their knees as they tried to stat() my homedir and
failed on the hung sshfs mount point.

~~~
pyre
Same for the cifs.kext (or was it smbfs.kext?) in early versions of OS X.
Putting your laptop to sleep with a mounted Samba share was enough to slowly
grind the system to a halt when you woke it up.

