
Introducing dumb-init, an init system for Docker containers - ckuehl
http://engineeringblog.yelp.com/2016/01/dumb-init-an-init-for-docker.html
======
gnud
The actual reason you really need a proper PID 1 is not explained in this
post, but a couple of clicks away at [0]:

    
    
      > [...] the init process must also wait for child processes to terminate,
      > before terminating itself. If the init process terminates prematurely,
      > then all children are terminated uncleanly by the kernel.
    

0: [https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zomb...](https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/)
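The reaping requirement is easy to demonstrate outside Docker. Here's a minimal Python sketch (assuming Linux, since it reads /proc) of a child turning into a zombie until its parent reaps it:

```python
import os
import time

# Fork a child that exits immediately. Until the parent calls wait(),
# the kernel keeps the dead child around as a zombie (state 'Z') so that
# its exit status can still be collected.
pid = os.fork()
if pid == 0:
    os._exit(0)

time.sleep(0.2)  # child has exited; the parent has not reaped it yet

# /proc/<pid>/stat is "pid (comm) state ..."; comm may contain spaces,
# so split on the last ')' to find the state field.
with open(f"/proc/{pid}/stat") as f:
    state = f.read().rsplit(")", 1)[1].split()[0]

print(state)        # the child shows up as defunct until it is reaped
os.waitpid(pid, 0)  # reaping removes the zombie entry
```

Inside a container, whatever runs as PID 1 inherits this duty for every orphaned descendant; if it never calls wait(), the zombies accumulate.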

~~~
jzelinskie
Considering phusion/baseimage has been around for more than 2 years and plenty
of people have been using an init system inside containers that run multiple
processes, why didn't Yelp just pick something up off the shelf? Why not use
runit or one of the plenty of more mature lightweight init systems?

~~~
predakanga
I can't speak to the others that have been mentioned in this thread (tini in
particular seems to be identical), but the solution used by phusion/baseimage
is written in Python[0]; a C-based solution allows for lighter-weight
containers.

[0]: [https://github.com/phusion/baseimage-docker/blob/master/imag...](https://github.com/phusion/baseimage-docker/blob/master/image/bin/my_init)

------
stephen-mw
Very nice! I see this as the next evolution of Phusion's custom init
system[0], which was created to solve largely the same problems.

I should be able to take Yelp's dumb-init and add it easily to any Linux
container I want -- including things such as Alpine[1].

[0] [https://github.com/phusion/baseimage-docker/blob/master/imag...](https://github.com/phusion/baseimage-docker/blob/master/image/bin/my_init)
[1] [https://github.com/gliderlabs/docker-alpine](https://github.com/gliderlabs/docker-alpine)

~~~
Gigablah
s6 and s6-overlay were also created with the same goal:
[https://github.com/just-containers/s6-overlay](https://github.com/just-containers/s6-overlay)

I use it in my Alpine containers.

------
akavel
If I understand correctly, the main goal here can be summarized in the quote
below:

 _" The motivation: modeling Docker containers as regular processes_

 _[...] we want processes to behave just as if they weren’t running inside a
container. That means handling user input, responding the same way to signals,
and dying when we expect them to. In particular, when we signal the docker run
command, we want that same signal to be received by the process inside. "_

and that seems to me as the core reason why they can't just use a simple init
system (like e.g. runit I suppose?)

------
vikiomega9
> Having a shell as PID 1 actually makes signaling your process almost
> impossible. Signals sent to the shell won’t be forwarded to the subprocess,
> and the shell won’t exit until your process does. The only way to kill your
> container is by sending it SIGKILL (or if your process happens to die).

Noob question. Why is it impossible? You have the PID, no?

~~~
ckuehl
Good question! The problem is trying to signal it from outside the Docker
container.

If your container has a process tree like

    
    
        PID 1: /bin/sh
        +--- PID 2: <your Python server>
    

then if you use `docker kill` from the host, it will only send a signal to
PID 1, which is the shell. However, the shell won't forward it on to your
Python server, so nothing happens (in most cases).

dumb-init basically replaces the shell in that diagram, but forwards signals
when it receives them. So when you use `docker kill`, the Python process
receives the signal.

Alternatively, just eliminating the shell (so your Python app is PID 1) works
for some cases, but you get special kernel behavior applied to PID 1 which you
usually don't want. This is the main purpose of dumb-init.
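A rough Python sketch of the pattern being described (an illustration of the idea, not dumb-init's actual C implementation): spawn the real command as a child, forward any signal we receive to it, and reap children until ours exits.

```python
import os
import signal
import time

def forwarding_init(argv):
    """Run argv as a child, forward received signals to it, return its exit code."""
    child = os.fork()
    if child == 0:
        os.execvp(argv[0], argv)      # the real workload (PID 2 in the diagram)

    def forward(signum, frame):
        os.kill(child, signum)        # pass the signal straight through

    for signum in (signal.SIGTERM, signal.SIGINT, signal.SIGQUIT):
        signal.signal(signum, forward)

    # Reap children (including any orphans reparented to us) until ours exits.
    while True:
        pid, status = os.waitpid(-1, 0)
        if pid == child:
            return os.waitstatus_to_exitcode(status)

# Demo: run the wrapper around `sleep` in a separate process, then signal the
# wrapper -- the signal is forwarded, so the child dies instead of being stuck.
wrapper = os.fork()
if wrapper == 0:
    code = forwarding_init(["sleep", "30"])
    os._exit(0 if code == -signal.SIGTERM else 1)

time.sleep(0.5)                   # let the wrapper install its handlers
os.kill(wrapper, signal.SIGTERM)  # like `docker kill` hitting PID 1
_, status = os.waitpid(wrapper, 0)
forwarded = os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
print(forwarded)
```

With a plain shell in the wrapper's place, the SIGTERM would stop at PID 1 and the `sleep` would keep running.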

~~~
vikiomega9
Ah, that makes sense. I did not realize that signals are only delivered to
PID 1. From Docker's perspective, targeting PID 1 makes sense, because that's
where the "application" should run, as specified in your Dockerfile.

------
lox
This looks to be an alternative
[https://github.com/krallin/tini](https://github.com/krallin/tini)

~~~
ckuehl
Yup, tini is really, really similar and looks pretty cool! They're solving
much of the same problem. It's unfortunate that we didn't find tini before we
went and wrote dumb-init.

There are some minor differences (dumb-init looks like it's probably a bit
better for interactive commands, since it e.g. handles SIGTSTP). You can also
get process-group behavior at run time with dumb-init rather than at compile
time, and it's on by default, unlike tini (as far as I can tell from a brief
reading). But for most cases it won't make a difference.
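The process-group behavior being compared can be sketched like this in Python (an illustration of the general technique, not code from either project): make the child the leader of its own process group, then signal the whole group so its descendants get the signal too.

```python
import os
import signal

# Signal a whole process group rather than a single PID: the child is made
# a group leader after fork, so killpg reaches it and anything it forks.
child = os.fork()
if child == 0:
    os.setpgid(0, 0)                  # child: become leader of a new group
    os.execvp("sleep", ["sleep", "30"])

try:
    os.setpgid(child, child)          # parent sets it too, closing the race
except OSError:
    pass                              # child already exec'd after setting it itself

os.killpg(child, signal.SIGTERM)      # the whole group gets the signal
_, status = os.waitpid(child, 0)
group_signaled = os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGTERM
print(group_signaled)
```

Signaling the group matters when the child itself spawns workers: a plain `kill` to the child leaves its subprocesses untouched.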

~~~
krallin
Quick disclaimer: I'm the author of Tini (thanks for the hat tip, by the
way!).

Note that for interactive usage, Tini actually hands over the tty (if there is
one) to the child, so in that case signals that come "from the TTY" (though in
a Docker environment this is an over-simplification) actually bypass Tini and
are sent to the child directly. This _should_ include SIGTSTP, though I'm not
sure I tested this specifically.

That being said, both tools are probably indeed very similar — after all there
is little flexibility in that kind of tool! Process group behavior is probably
indeed where they differ the most. : )

------
DanielDent
Another alternative, for what it's worth:
[https://github.com/rciorba/pidunu](https://github.com/rciorba/pidunu)

I've created an Ubuntu PPA packaging of it
([https://launchpad.net/~danieldent/+archive/ubuntu/pidunu](https://launchpad.net/~danieldent/+archive/ubuntu/pidunu))
and you can see an example of it in use at:
[https://github.com/DanielDent/docker-powerdns](https://github.com/DanielDent/docker-powerdns)

For situations involving multiple processes, there's also
[https://github.com/just-containers/s6-overlay](https://github.com/just-containers/s6-overlay)

Example use: [https://github.com/DanielDent/docker-nginx-ssl-proxy](https://github.com/DanielDent/docker-nginx-ssl-proxy)
(automated Let's Encrypt SSL front-end)

------
lox
If it's such a straightforward fix, why isn't it part of Docker core? I'd
love to hear from the Docker team why it's not a concern for them. Presumably
if it were, they'd have addressed it by now.

From my own experience with Docker in production, I've yet to see any of the
described scenarios crop up. Has anyone else, or is this solving an extreme
edge case?

~~~
ckuehl
> From my own experience with Docker in production, I've yet to see any of
> the described scenarios crop up. Has anyone else, or is this solving an
> extreme edge case?

The biggest issue we see at Yelp is leaking containers in test (e.g. Jenkins
aborting a job but leaving the containers it spawned still running).

Depending on how you orchestrate containers, you might not encounter the issue
in prod. If you're using something like Kubernetes or Marathon or PaaSTA,
they're probably going to do the "right thing" and ensure the containers are
actually stopped.

We also use containers a lot in development. For example, we might put a
single tool into a container, and then when developers call that tool, they're
actually spawning a container without realizing it. For this use case, it's
really important that signals are handled properly so that Ctrl-C (and
similar) continues working.

~~~
sandGorgon
Why did you not use something like supervisord? I run a few containers
(obviously not at Yelp scale) and supervisord has been spectacular at
restarting, managing, reloading, etc. It handles nginx, gunicorn, puma,
tomcat, etc. pretty well. Yes, it's Python -- but was that the motivation?

Also, you guys should comment on
[https://github.com/docker/docker/pull/5773](https://github.com/docker/docker/pull/5773),
which is work on unprivileged systemd in Docker. I think you guys can
influence the discussion there with your experience.

~~~
anonbanker
some of us don't want systemd and its tag-alongs ruining our docker.

------
dpedu
This is cool, but you can solve the same problem with a single line of bash

    
    
        trap 'kill $(jobs -p)' EXIT

~~~
FooBarWidget
No you can't. It _looks_ like it can, but there are various edge cases that
aren't handled.

See [https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zomb...](https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/),
section "A simple init system".

------
ivan_ah
Is there any reason you wouldn't run normal "non-dumb" init for this using

    
    
        CMD ["/sbin/init", "2"]
    

and start your app using init.d scripts or supervisord as usual?

I feel like logrotate, cron, etc., are worth having inside the container, no?

~~~
jarito
Generally accepted practice is no - they aren't worth having. Containers
should only contain a single process. That process shouldn't be writing logs
to disk (hence no logrotate) and timed tasks would generally be done outside
the container rather than in it (though there are a lot of ways to skin that
cat).

Single process containers generally don't need all the baggage of a full init
system or other dependencies - hence this project.

~~~
rvense
At my current job, we're basically using Docker as a sort of package manager
and deployment script runner. Our containers are very fat, things are
installed with apt. One of them has GCC in it but I'm not sure why. One
installs Node and runs a few js scripts during the build process, then never
runs it again but keeps it around. It's obviously wrong, but I think it's just
a new set of bad ideas that this software has allowed people to have.

~~~
XorNot
Under a time crunch, I've not found a way to use language package managers and
not wind up with GCC in the container.

The problem is that apt does a poor job of letting you set up something like
build-essential, then remove it and leave just the runtime shared libraries
you need for the things you build to actually work.

~~~
justinsaccount
I kinda solve that by doing something like this:

    
    
        # install runtime deps
        dpkg -l | awk '{print $2}' | sort > old.txt
        # install build deps
        # build software
        dpkg -l | awk '{print $2}' | sort > new.txt
        apt-get -y remove --purge $(comm -13 old.txt  new.txt)
    

There's probably a better way to accomplish that, but it was the easiest way I
could see to implement an 'undo'.

~~~
zenlikethat
If doing this sort of thing, make sure to accomplish it in a single step
(image layer) in the Dockerfile. Otherwise it won't do any good: the "removed"
files still persist in an earlier layer, and the later layer merely marks them
as removed.
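In Dockerfile terms, the single-layer version of the install-build-purge dance looks something like this (the package names and build commands are placeholders):

```dockerfile
# One RUN = one layer: the build tools are purged in the same layer they
# were installed in, so they never persist in the final image.
RUN apt-get update \
 && apt-get install -y --no-install-recommends build-essential \
 && make && make install \
 && apt-get purge -y --auto-remove build-essential \
 && rm -rf /var/lib/apt/lists/*
```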

~~~
justinsaccount
I've been using docker-squash for that. This way I can take advantage of layer
caching during development, and still have a small image for uploading.

------
ktt
Reminds me of RancherOS, which also has Docker running as PID 1:
[https://github.com/rancher/os#how-this-works](https://github.com/rancher/os#how-this-works)

------
anonbanker
Another sign that systemd is not going to infect docker any time soon.

