

The right way to deal with frozen processes on Unix - dRiek
http://blog.phusion.nl/2012/09/21/the-right-way-to-deal-with-frozen-processes-on-unix/

======
trotsky
I'm not sure if that's Darwin terminology, but calling a process that's
blocking on I/O or in a tight loop "frozen" certainly isn't tradition in UNIX -
you're much more likely to use hung or stuck. In Linux specifically the term
has a completely different meaning - "freezing" a process is intentional and
involves putting the process into a cgroup and using the freezer controller to
prevent it from executing - suspending the process and allowing for
sleep/hibernation/etc.

~~~
jlgreco
The "frozen" terminology is amusing to me since I would normally say that
processes in tight loops are "burning CPU".

------
bradleyland
I don't mean to disparage other Ruby application servers, but this is why I
use Passenger. It's not that I believe other Ruby application servers aren't
good, or that their authors don't understand Unix as well as the guys at
Phusion, but there's a real dedication to stability and predictability inside
Phusion that I don't see elsewhere.

You can achieve the same results with other app servers, but you're going to
have to do a lot of the heavy lifting yourself. I'm not ashamed to admit that
I have a lot more confidence in Passenger's solution than I do my own.

------
unixnoob
OK, total noob question here. Could we achieve the same thing with something
like pgrphack from daemontools?

    pgrphack sh -c "processes"

Kill the pid for the sh ("agent") and you thereby kill all the processes?

Again, sorry for the noob question. I'm still learning and making mistakes.

~~~
FooBarWidget
No. To instruct kill() to kill a process group, you have to specify the PID
of the process group leader as a negative number. Otherwise kill() will kill
only a single process.

~~~
unixnoob
But won't all the processes in my example have the PGID of sh?

~~~
FooBarWidget
Yes they do, but that is irrelevant. kill(pid) kills the _process_ specified
by 'pid'. kill(-pid) kills the _process group_ specified by 'pid'.
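
The difference between the two calls can be tried out from a script. What follows is only a minimal sketch in Python, whose `os.kill` wraps the same kill() call. The helper names `spawn_group` and `kill_group` are made up for illustration, and it assumes a POSIX system with `sh` and `sleep` on the PATH.

```python
import os
import signal
import subprocess


def spawn_group():
    """Start a shell in its own process group with two background children."""
    # start_new_session=True makes the child call setsid(), so the shell
    # becomes the leader of a new process group whose PGID equals its PID.
    return subprocess.Popen(
        ["sh", "-c", "sleep 300 & sleep 300 & wait"],
        start_new_session=True,
    )


def kill_group(pgid, sig=signal.SIGTERM):
    """Signal every process in the group: a negative PID targets the group."""
    os.kill(-pgid, sig)  # same effect as os.killpg(pgid, sig)


if __name__ == "__main__":
    leader = spawn_group()
    pgid = os.getpgid(leader.pid)
    kill_group(pgid)  # os.kill(pgid, ...) would only have hit the shell
    leader.wait()
    print("leader exit status:", leader.returncode)
```

With a positive PID only the shell would receive the signal and the `sleep` children would be left behind; the negative value is what makes the kernel deliver it to the whole group.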

~~~
unixnoob
What if I just use the userland kill(1) utility? Is it possible to kill all
processes under a PGID using kill(1)?

Say the PGID I get for sh is 321. If I do

    kill [signal] 321

that will not kill all the processes having PGID 321?

If it would not kill them, then couldn't we modify kill(1) to be able to call
kill() with a negative integer as you describe?

Sorry for the noob questions. I am still learning and making mistakes.

------
ibotty
Isn't "frozen" in usual Unix terminology any SIGSTOPped process?

~~~
Evbn
That's suspended to me.
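
For reference, the SIGSTOP behaviour being discussed can be observed directly. This is a rough sketch, assuming a POSIX system; `stop_and_resume` is a hypothetical helper name, not something from the article.

```python
import os
import signal
import subprocess


def stop_and_resume(proc):
    """Suspend a child with SIGSTOP, confirm it stopped, then resume it."""
    os.kill(proc.pid, signal.SIGSTOP)
    # WUNTRACED makes waitpid() also report children that have stopped,
    # not only children that have exited.
    _, status = os.waitpid(proc.pid, os.WUNTRACED)
    stopped = os.WIFSTOPPED(status)
    # SIGCONT lets the process run again from where it was suspended.
    os.kill(proc.pid, signal.SIGCONT)
    return stopped


if __name__ == "__main__":
    child = subprocess.Popen(["sleep", "300"])
    print("child was stopped:", stop_and_resume(child))
    child.terminate()
    child.wait()
```

The stopped process uses no CPU and keeps its state until it receives SIGCONT, which is quite different from a process that is hung or spinning.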

------
bifrost
It's nice to see traditional debugging made easy. This is stuff that you try to
teach people as a sysadmin/opsguy and they never pay attention. Yay!

------
X-Istence
The page is unfortunately not loading, and there doesn't seem to be a Google
cache for the page.

~~~
FooBarWidget
There was a slight interruption of service, but the blog has been restored
now. Our apologies for the inconvenience.

------
janerik
The server is not responding right now and Google Cache is empty. Anyone got
a copy of this?

------
dredmorbius
In related news, Hongli Lai has solved the halting problem.

~~~
geofft
You do realize that it's quite easy to solve the halting problem in most
cases, right?

The halting problem is unsolvable because _very particular_ (and pathological)
cases are unsolvable. If a process is in a loop between the same instructions
at the same states, it's very easy to tell that it's not going to halt -- the
only challenge is that you can't make this determination for all processes all
the time.

Mathematically, note that the concept of an oracle for the halting problem is
well-defined (and a useful concept).
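
The easy case mentioned above, a process coming back to exactly the same state, can be sketched for a toy deterministic machine. This is an illustration only: `classify` and the three example machines are hypothetical names, and a real process has far too much state to track this way.

```python
def classify(step, state, max_steps=10_000):
    """Run a deterministic machine; report 'halts', 'loops', or 'unknown'.

    `step` maps a hashable state to the next state, or to None when the
    machine halts.
    """
    seen = set()
    for _ in range(max_steps):
        if state in seen:
            # Same state reached twice in a deterministic machine:
            # it will repeat forever, so it never halts.
            return "loops"
        seen.add(state)
        state = step(state)
        if state is None:
            return "halts"
    # Neither halted nor repeated within the budget: no verdict.
    return "unknown"


def countdown(n):
    """Halts when it reaches zero."""
    return None if n == 0 else n - 1


def spin(n):
    """Cycles through 0, 1, 2 forever."""
    return (n + 1) % 3


def grow(n):
    """Never halts and never repeats a state."""
    return n + 1


if __name__ == "__main__":
    print(classify(countdown, 5))            # halts
    print(classify(spin, 0))                 # loops
    print(classify(grow, 0, max_steps=100))  # unknown
```

The `grow` machine is the troublesome kind: it never revisits a state, so repeated-state detection alone cannot tell it apart from a program that merely takes a long time.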

~~~
dredmorbius
I was mostly making a jibe at the notion that there is a single, correct, and
reliable method for identifying stuck/hung processes.

There are in fact fairly reliable heuristics for noting when things are going
pear-shaped. The edge cases get sticky though.

