Because of that I usually make all my services and systems crash-only. I end up using things like atomic file moves, opening files in append-only mode, stopping services with kill -9, and so on. To make your system crash-only, you have to go down to the base system calls.
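To make the atomic move and append-only ideas concrete, here is a rough sketch in Python. The helper names `atomic_write` and `append_record` are made up for illustration, and the directory fsync assumes a POSIX system; treat it as one possible shape, not a reference implementation.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` so readers see the old or the new file, never a partial one."""
    dir_name = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem) for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # get the bytes on disk before the rename
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Assumption: POSIX. fsync the directory so the rename itself survives a crash.
    dir_fd = os.open(dir_name, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def append_record(path, record):
    """Append one record; mode "ab" opens the file with O_APPEND."""
    with open(path, "ab") as f:
        f.write(record)
        f.flush()
        os.fsync(f.fileno())
```

If the process is killed with kill -9 at any point inside `atomic_write`, the worst leftover is a stray `.tmp-` file, which startup code can delete.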
Some observed effects so far (many are covered in the article):
* Faster restarts (if your regular operation involves restarting lots of processes).
* Less code (don't have to handle both the clean shutdown and dirty shutdown).
* Recovery/cleanup code, if it is needed, often ends up moved to startup instead of shutdown (you might have to recover corrupt files when you start up again, for example by re-truncating the files to a known offset based on some index).
* Something else might need to manage external resources (OS IPC resources, shared memory, IPC message queues, etc.). This could be a supervisor process.
* If you do a lot of socket operations on localhost, your sockets could get stuck in the TIME_WAIT state, and you'll eventually run out of ephemeral ports if you do a lot of restarts (say, during testing). SIGTERM, by contrast, is often caught, and processes (or their libraries) perform a cleaner shutdown.
* Think very carefully about the database you use and check whether it can support crash-only operation. Some do, some don't (I won't name any names here).
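The "re-truncate to a known offset" recovery step could look roughly like this. The on-disk layout is an assumption for the sake of the example: the index file holds the last committed log offset as an 8-byte big-endian integer, and a missing or short index means nothing was committed. `recover_log` is a hypothetical name.

```python
import os
import struct


def recover_log(log_path, index_path):
    """Run at startup: drop any bytes written after the last committed offset."""
    try:
        with open(index_path, "rb") as f:
            raw = f.read(8)
    except FileNotFoundError:
        raw = b""
    # Assumed index format: one unsigned 64-bit big-endian offset.
    committed = struct.unpack(">Q", raw)[0] if len(raw) == 8 else 0

    fd = os.open(log_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        size = os.fstat(fd).st_size
        if size < committed:
            # The index points past the end of the log; truncation cannot fix that.
            raise ValueError("index is ahead of log: %d > %d" % (committed, size))
        if size > committed:
            os.ftruncate(fd, committed)  # discard the torn tail
            os.fsync(fd)
    finally:
        os.close(fd)
    return committed
```

The point is that there is no matching shutdown code at all: every start runs the same recovery path, whether the previous run ended cleanly or not.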
My favorite one of the principles is "All important non-volatile state is managed by dedicated state stores". Being both crash-only (or even just tolerating crashes) and keeping state is a very difficult combination, and you don't want every one of your services needing to solve that problem over and over. Dedicated state stores let you hand this problem off, which makes many systems stateless (or at least free of hard state). Tolerating crashes in soft-state-only services is much easier, perhaps even trivial if you follow the other rules.
I wrote a blog post about this paper a while back (http://brooker.co.za/blog/2012/01/22/crash-only.html), if anybody is interested.
(Although, if you want to preserve undo/redo over (unintended) shutdowns, it becomes much more difficult.)