

Dustin Sallings: Running Processes - mbrubeck
http://dustin.github.com/2010/02/28/running-processes.html

======
mojombo
As the author of god, I'll agree that if your only need is to ensure that a
process is running, and that process isn't doing a lot of fancy daemonization
or forking, then init is the best solution. In fact, on all our boxes at
GitHub we use init to run god, because init is incredibly reliable; who better
to ensure that god is actually running?

But if your needs extend to ensuring that your processes are well behaved with
respect to memory usage, cpu usage, response to HTTP requests, etc, or any
custom metric of your choice, then you have to go a bit farther than the
suggestions in this article. God is all about making it easy to keep
everything running, no matter how complicated the setup or metrics may be.

~~~
dlsspy
I've used god quite a bit (that's actually how I found github in the first
place, I believe) and found many of its facilities useful.

I was mostly aiming for the stuff your system already does (and daemontools
for when it doesn't do what it should already do). As this was becoming a
rather large work of writing, I didn't want to spend some extra time to, you
know, praise god.

Some of these things do go into the basics, though. At the very least, memory
usage. Using rlimit for memory usage and death-of-child events for signaling
restart gives really quick turnarounds for almost free.
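
For anyone who wants to see the rlimit-plus-child-death idea spelled out, here is a minimal Python sketch. It is not from the article: `limit_memory`, `supervise` and the `max_restarts` cap are hypothetical names, and using `RLIMIT_AS` (address space) as the stand-in for "memory usage" is an assumption.

```python
import resource
import subprocess


def limit_memory(max_bytes):
    """Return a hook that caps the address space of the process it runs in."""
    def apply():
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        # Never ask for more than the existing hard limit allows.
        soft = max_bytes if hard == resource.RLIM_INFINITY else min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
    return apply


def supervise(argv, max_bytes, max_restarts=3):
    """Run argv under a memory cap and restart it when it dies.

    Returns the exit statuses seen (negative means killed by a signal).
    max_restarts is an illustrative cap so the sketch terminates.
    """
    statuses = []
    for _ in range(max_restarts):
        # preexec_fn runs in the child after fork() and before exec().
        child = subprocess.Popen(argv, preexec_fn=limit_memory(max_bytes))
        # wait() blocks until the child exits, so there is no polling loop.
        statuses.append(child.wait())
        if statuses[-1] == 0:
            break  # clean exit, nothing to restart
    return statuses
```

The parent only ever blocks in `wait()`, so a child that hits the limit and dies is noticed right away rather than on the next polling interval.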

But rlimit for cpu utilization is less useful for long-lived processes. And
for things that are entirely outside of the scope (the most common one for me
is, "is my log still growing"), it's just not helpful at all. These are the
types of process monitoring where god helps a lot.
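
As a rough illustration of the "is my log still growing" kind of check, something like the following Python sketch would do. `log_is_growing` and its parameters are made-up names for illustration, and this is not how god itself implements conditions.

```python
import os
import time


def log_is_growing(path, interval=1.0, sleep=time.sleep):
    """Return True if the file at path got bigger during the interval.

    The sleep parameter is injectable only so the check is easy to test.
    """
    before = os.stat(path).st_size
    sleep(interval)
    after = os.stat(path).st_size
    return after > before
```

A real check would also have to cope with log rotation, where the size drops even though the process is healthy.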

------
strlen
Upvoting, because it's a great article.

My own personal preference (on systems where I don't have smf, upstart or
launchd available, e.g., production RHEL/CentOS machines) is daemontools. My
only quibble with it is that the logging system it comes with has a fairly
insane feature of calling fsync() on every write (which, if you run a service
which spews a lot of logs to STDOUT, can seriously degrade performance). I'd really
like to investigate runit as an alternative: <http://smarden.org/runit> (the
site is down at the moment, but here's Google's archive:
[http://74.125.155.132/search?q=cache:ZPOjz5Z7k9IJ:smarden.or...](http://74.125.155.132/search?q=cache:ZPOjz5Z7k9IJ:smarden.org/runit/)),
but haven't yet.

~~~
dlsspy
Yes, I meant to look at runit some last night, but it was unavailable. It does
look pretty good.

------
bensummers
The Solaris section is very short, but I suppose that's because SMF is very
good.

Although if you do start using it, make sure your daemons fork, contrary to
the advice at the beginning of the article. I've encountered some race
conditions and other bugs when you use 'transient' processes which don't fork.
Solaris doesn't need to poll the process list with forking daemons because it
uses Contracts which keep track of a set of forking processes.

~~~
dlsspy
I have limited experience with smf (mostly, I work with people who think it's
really awesome and who would punch me if I didn't at least mention it). It
seems to do a lot of things, which, in turn makes the descriptors a bit more
verbose than launchd's.

If you have a good introduction to writing one of these, I'd love to
link to it. In particular, the idea of forking seems to conflict with classic
process monitoring.

I haven't been a Solaris sysadmin in over a decade. I'm hopefully going to be
taking care of some modern Solaris boxes real soon now, though, so I'm looking
forward to what all's changed.

~~~
bensummers
Here's a good SMF resource, by the ever reliable c0t0d0s0!

[http://www.c0t0d0s0.org/archives/4144-Solaris-Features-Servi...](http://www.c0t0d0s0.org/archives/4144-Solaris-Features-Service-Management-Facility-Part-1-Introduction.html)

------
lmz
One of the positives of shell-script init.d/rc.d scripts is that you can
write your own start/stop steps e.g. sending a shutdown command to a socket.
Not all programs react well to a SIGTERM.
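
A custom stop step of that kind could look roughly like this Python sketch. The `shutdown` command, the TCP control port and the assumption that the service sends a reply are all hypothetical; a real service defines its own protocol.

```python
import socket


def request_shutdown(host, port, command=b"shutdown\n", timeout=5.0):
    """Send a shutdown command to a service's control socket.

    Returns whatever the service replies with (possibly empty bytes).
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(command)
        # Half-close so the service knows the command is complete.
        conn.shutdown(socket.SHUT_WR)
        return conn.recv(1024)
```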

~~~
dlsspy
I don't see that as a huge advantage for a few reasons:

1\. We're mostly talking about startup (restart, etc...), not shutdown. You
can still write your own start script and that's all I've found myself caring
about most of the time.

2\. Writing a shutdown script is still plenty possible, though in the worst
case, you may need an adaptor (and only one of these needs to exist for all
apps).

3\. Processes that don't handle TERM signals properly should have bugs filed
against them to do so.
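
To make point 2 concrete, here is a minimal Python sketch of what such an adaptor might look like: the supervisor sends TERM to the adaptor, and the adaptor turns that into whatever stop procedure the app actually needs. `run_with_adapter` and `graceful_stop` are hypothetical names, and this is only one way to do it.

```python
import signal
import subprocess


def run_with_adapter(argv, graceful_stop):
    """Run argv; on SIGTERM, call graceful_stop(child) instead of passing TERM on.

    Must be called from the main thread, since that is where Python allows
    signal handlers to be installed. Returns the child's exit status.
    """
    child = subprocess.Popen(argv)

    def on_term(signum, frame):
        # Translate TERM into the app-specific stop procedure.
        graceful_stop(child)

    # The handler is left installed; the adaptor is expected to exit
    # right after the child does. A TERM that arrives before this line
    # still hits the default action, which a real adaptor should close off.
    signal.signal(signal.SIGTERM, on_term)
    return child.wait()
```

The supervisor only ever needs to know how to send TERM; the per-app knowledge lives in the `graceful_stop` callable.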

To be honest, the software that's the biggest pain for me to deal with right
now is one that requires a gentle shutdown. I can't always give it one, and my
system sometimes crashes and can require a painful recovery.

I'm becoming more of a fan of crash-only software every day for this reason.

