
Docker containers should not run an SSH server - davidkellis
http://jpetazzo.github.io/2014/06/23/docker-ssh-considered-evil/
======
rdtsc
What is the general pattern of usage with Docker containers? Are they supposed
to isolate just one application and do IO via a network socket?

It used to be that you ran everything on a real server. Then it moved to VMs in
the cloud. So spawning and managing VMs was the new thing to do, with a large
IaaS, PaaS, ...aaS industry around it.

The product would consist of one or multiple VMs. All possibly running
multiple applications as different processes.

Then I guess inter-dependencies between applications and OS versions got to be
complicated, so the idea was that each application should run in its own
lightweight VM (with LXC, as long as they at least share the same base OS
kernel).

Isn't this just pushing the problems into managing dependencies between more
little VMs, while also constraining the architecture? It increases the
difficulty of synchronizing the start, upgrade, and failure of 10 or 20
separate isolated applications. Maybe all of that lives on yet another VM guest
machine (so conceptually there are 2 virtualization levels). Handling a more
complicated network setup (bridging, firewall rules at multiple levels).
Handling the effects of disk and other subsystems interacting with each other
in strange, sometimes sub-optimal ways.

One idea was that, OK, this is good for security: one can build secure
containers. But doesn't SELinux do that better? It even has a multi-level
security mode. Sure, it's complicated, but it is used.

~~~
opendais
> What is the general pattern of usage with Docker containers? Are they
> supposed to isolate just one application and do IO via a network socket?

The maintainers of Docker _appear to_ strongly believe that 1 application per
container is the way to go.

At sufficient scale, they are pretty much correct. You grab Y bare metal hosts
and spin up X docker containers for X processes.

> Isn't this just pushing the problems into managing dependencies between more
> little VMs while also constraining the architecture? It increases the
> difficulty of synchronizing 10 or 20 separate isolated applications
> start,upgrade,fail. Maybe all that living on yet another VM guest machine
> (so conceptually having 2 virtualization levels). Handling more complicated
> network setup (bridging, firewall rules at multiple levels). Handling
> effects of disk and other subsystem interacting with each other in strange,
> sometimes sub-optimal ways.

A handful of people advocate for immutable role-based containers [e.g.
[https://devopsu.com/blog/docker-misconceptions/](https://devopsu.com/blog/docker-misconceptions/)]
for that reason. In that use case, it's really replacing running something like
Xen + Chef, KVM + Ansible, or whatever.

You grab a host machine, you stand up your X containers with Z processes per
container on Y hosts.

I think this is _really_ equivalent to the Microservices vs. Normal SOA vs.
Monolithic argument. Given sufficient scale/requirements, each makes sense.
However, none of them are optimal for all situations.

~~~
FooBarWidget
> At sufficient scale, they are pretty much correct. You grab Y bare metal
> hosts and spin up X docker containers for X processes.

Can you elaborate why they are correct with this view? What is inherently
superior about 1 process per container? What is wrong with having two
cooperating processes per container? And why should Docker containers only run
on bare metal?

~~~
superuser2
> And why should Docker containers only run on bare metal?

The "point" of Docker is to be a lightweight replacement for virtualization in
some cases. Layering Docker on top of virtualization, while fine, adds layers
rather than removing/replacing them.

------
opendais
I think the core flaw with the "no SSHD in Docker, ever" reasoning is that it
makes a number of assumptions.

To me, for my use, Docker is really a lightweight role-based VM I can put on a
provisioned host of some kind [another VM, a dedicated server]. In other words,
it's a really simple way to deploy X identical instances of an entire
service/application. If _any_ component in that instance of the application is
non-functional, the container is dead and you route requests to a different
container.

The problem with the "one process per Docker instance" logic is that you need a
great deal more service discovery logic as a result. There are a bunch of
projects/methods to simplify this, but at the end of the day it adds complexity
to operations. Instead of a single health check to monitor, you have X
containers to monitor. You have X containers to discover, etc. Sure, this lets
you get out of running Supervisor and SSH on a Docker container... but I think
it adds a lot of application complexity that you can avoid 90% of the time.

SOA makes sense when you have a large deployment, but if you are deploying to a
cluster of 5 machines... it's overkill. Many [likely the majority] of projects
are at the scale of a basic cluster for redundancy.

~~~
vidarh
I think that's just an issue of tooling and getting used to a different way of
working with the services. If you run each of those services as a single
docker container and group them on the same server, there does not need to be
any more service discovery: You can link all of them. That still gives you the
flexibility of modifying the linking without having to modify any of the
containers.

The main practical difference vs. running them all in one vm is explicitly
documenting dependencies - both _which service_ relies on which package sets
etc. (assuming you're strict about what you put in the Dockerfiles) and which
part of your system needs access to which other component or needs to share
access to which directories.

You can also easily start them e.g. via systemd unit files or via a tool like
fleet, and pull the ip/port of dependencies on startup of the container and
pass them as arguments. This also makes deploying them as a unit easy, but
still decouples it:

If you need to move one component to a different server, or suddenly realize
your foobar server needs lots of resources and want to load-balance it across
multiple machines, you can easily do so by forwarding a port via stunnel or
load-balancing via haproxy or what have you without affecting the actual
containers.
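
To make the "pull the ip/port of a dependency on startup and pass it as
arguments" part concrete, it can be as little as a small wrapper around docker
run - roughly like this sketch (the "db"/"webapp" names and the image are made
up, not from any real setup):

    # Look up the IP of the (hypothetical) "db" container on the same host,
    # then start the (hypothetical) web app container with it as environment.
    DB_IP=$(docker inspect --format '{{ .NetworkSettings.IPAddress }}' db)
    docker run -d --name webapp -e DB_HOST="$DB_IP" -e DB_PORT=5432 example/webapp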

As for your health checks, you can still have a single all-encompassing health
check if you want - whether it's in one "vm" or a set of separate containers
does not really change that in any way, though the bigger the setup, the more
additional checks you'll probably want to add. You have X containers to
monitor, but you had X server processes to monitor previously. If your use
case made it ok to depend on just e.g. hitting a page on a website that
depends on all X services being up, then your single old monitor still does
the job.

The overhead can be made really minimal and beneficial even for a single
server setup, yet help you if/when you suddenly want to scale part of it.
Personally, I run about a dozen docker containers on my home server so far,
and I expect that to increase several times over as e.g. every web app I
experiment with ends up running in its own container, with the docker setup in
a git repo as documentation of exactly which project caused me to pull in
which additional packages etc., and keeping the host itself as pristine as
possible. It makes me a lot less nervous about distribution upgrades etc.

~~~
opendais
> I think that's just an issue of tooling and getting used to a different way
> of working with the services.

Load Balancer w/ Builtin HTTP Health Checks <-> Application Containers
[nginx+php-fpm]

vs.

Load Balancer w/ Builtin HTTP Health Checks <-> Container [nginx] <->
Container [php-fpm] <-> Additional health check you have to build out to check
that php-fpm is working correctly <-> Additional tooling to make your process
work

For every single service in your architecture you are adding an additional
level of redundant complexity because it "helps you scale".

I can spin up as many app containers as I want, as often as I want without a
second thought or a moment's consideration. I don't care if I have 1000 of
them or 1, as long as the load balancer can handle the connections.

The scale you are talking about is when you need more than just a couple load
balancers...and you aren't going to need that for 90% of projects.

~~~
mwcampbell
I think that in the specific case of nginx plus php-fpm, separating things all
the way out to one process per container is overkill, and it would be better
to put the two in a single container, managed with something like runit or
supervisord. Conceptually, the combination of nginx and php-fpm can be treated
as a single service, and your operations team (or you when wearing your ops
hat) would probably prefer to think of it as such. There are two tricky parts
though: getting both underlying processes to log to the container's
stdout/stderr, and getting them both to shut down cleanly in response to a
SIGTERM to the supervisor.
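
To sketch what I mean (only an illustration of the supervision part, assuming
stock nginx and php-fpm; the logging part still needs both configured to write
to /dev/stdout and /dev/stderr):

    #!/bin/sh
    # Minimal entrypoint for a combined nginx + php-fpm container:
    # run both in the foreground and forward SIGTERM/SIGINT to both.
    php-fpm --nodaemonize &
    fpm_pid=$!
    nginx -g 'daemon off;' &
    nginx_pid=$!
    trap 'kill -TERM "$fpm_pid" "$nginx_pid"' TERM INT
    wait "$fpm_pid" "$nginx_pid"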

~~~
vidarh
I disagree. It's a perfect example where one of the components is likely to
scale differently than the other, and where the components are reusable
separately but less reusable together.

As someone constantly wearing my "ops hat", I much prefer to see them treated
separately for that reason.

------
contingencies
Hrrm. I want to agree, but as far as containers go outside of purely docker, I
just don't buy these arguments.

It makes perfect sense to use industry standard, encrypted communications with
proven cryptography when creating vast numbers of systems. Yes, you don't have
to. No, that doesn't mean it's a bad idea.

I'm not sure why things would be any different with docker, and after reading
the article I'm unconvinced. If you want to be locked in to docker's APIs, so
be it. If you want to be free of them and integrate in other ways, such as
proven, portable, secure methods like SSH, that's fine too.

If a management task as simple as deploying a key is so hard in docker
(couldn't you just bind-mount a read-only _.ssh_ dir?), maybe you should
consider alternative methods of container instantiation.
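
To be concrete about the bind-mount idea (a sketch; the paths and image name
are only illustrative):

    # Bind-mount a prepared .ssh directory read-only into the container.
    docker run -d -v /srv/keys/app1/ssh:/root/.ssh:ro example/app1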

I don't see how granting access to the host is a cleaner architecture... from
a security standpoint, it seems the opposite.

~~~
ldlework
I don't think I understand your point.

From the Docker Host itself, if you need to manage the state of a container,
the intuition is that you need to go into the container (with SSH) in order to
do so. But by externalizing your state, you can manage it without the need to
enter the container. Assuming your Docker Host is secure, this doesn't make
anything less secure just because you're no longer abusing SSHd in order to
manage your application's state.

In the case that you need to gdb or strace the process, you can do that from the
Docker Host with nsenter. Assuming your Docker Host is secure, you no longer
need to abuse SSHd to carry out a debugging task that has nothing to do with
needing a secure shell.
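
For example, roughly (the container name here is hypothetical):

    # From the Docker Host: find the container's init PID as seen by the host,
    # then enter its namespaces and use gdb/strace/etc. from that shell.
    PID=$(docker inspect --format '{{ .State.Pid }}' mycontainer)
    nsenter --target "$PID" --mount --uts --ipc --net --pid -- /bin/bash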

Neither of these use-cases has anything to do with the security of SSH.

In the case that you need to do these things from a remote host, the
prescribed answer is indeed SSHd to access the Docker Host, at which point you
switch to the previously suggested methods for managing state.

"I don't see how granting access to the host is a cleaner architecture... from
a security standpoint, it seems the opposite."

Because now you only have to worry about one security layer instead of N
security layers for each container you run. The security layer is now actually
coupled to the act of granting access to the host (its intended purpose), versus
granting access to a container so you can manage its state or debug it or
whatever.

As far as being locked into Docker's APIs, I totally miss the aim of this
remark. Volumes are just paths on the filesystem. If you're talking about the
interoperability of standard tools to manage your state, I don't think they
will have problems in this case.

~~~
contingencies
_the prescribed answer is indeed SSHd to access the Docker Host, at which
point you switch to the previously suggested methods for managing state. [...]
As far as being locked into Docker's APIs, I totally miss the aim of this
remark._

Yes, you missed the point. Please read the other response to comprehend the
difference.

------
CraigJPerry
I don't get it. What's wrong with putting my SSH public key in the image?
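
What I mean is just a couple of lines in the Dockerfile, roughly like this
sketch (the base image and paths are only illustrative):

    FROM ubuntu:14.04
    # Install sshd and bake my public key into the image at build time.
    RUN apt-get update && apt-get install -y openssh-server && \
        mkdir -p /var/run/sshd /root/.ssh
    COPY id_rsa.pub /root/.ssh/authorized_keys
    RUN chmod 700 /root/.ssh && chmod 600 /root/.ssh/authorized_keys
    CMD ["/usr/sbin/sshd", "-D"]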

Tbh I'd pay more attention to host key regeneration! Easily overlooked.

Until there's better investigation tooling available, sshd seems a fairly
sensible approach.

MAC / SELinux prevents many other poke-inside techniques on enterprise
platforms.

~~~
waffle_ss
I don't think it's SSHd in particular that he has a problem with. Docker
maintainers do not seem to like the idea of people treating Docker containers
as lightweight VMs. They seem to want images to be restricted to running as
few processes as possible - ideally one. I think they view it as too hard to
scale once you start going down the path of running multiple services in an
image - I've seen the "pets vs cattle"[1] analogy used for explaining why.

There is an image by Phusion called baseimage-docker[2] which adds SSHd, init,
syslog, and cron in an attempt to make Docker containers more like lightweight
VMs. But in the #docker channel, I've seen people have issues with it. For
example, one person had some /etc/init.d scripts that wouldn't start up (other
ones started up fine). Turns out that one of the signals that init was waiting
on to start that script was never getting sent (I think it was networking
coming online?), and that was just a side effect of how Docker works that
couldn't easily be fixed. The Docker maintainers in the channel discouraged
using this image for these reasons.

[1]: [https://groups.google.com/forum/#!msg/docker-user/pNaBYJkmnA...](https://groups.google.com/forum/#!msg/docker-user/pNaBYJkmnAA/TsA3b8u5kf0J)

[2]: [http://phusion.github.io/baseimage-docker/](http://phusion.github.io/baseimage-docker/)

~~~
Dylan16807
Pets vs. cattle is about customization vs. automatic setup; I don't understand
how it's relevant to whether the containers have no SSH or identical SSH.

~~~
wmf
Pets vs. cattle was the last battle. Now there's immutable cattle vs. mutable
cattle; Docker is trying to promote immutable infrastructure and microservices
and eliminating ssh is one aspect of that.

------
FooBarWidget
I am the author of baseimage-docker
([http://phusion.github.io/baseimage-docker/](http://phusion.github.io/baseimage-docker/))
and I work at Phusion. I
have the feeling that Jerome wrote this article mainly in response to the fact
that baseimage-docker encourages using SSH as a way to login to the container.
I believe that the ability to login to the container is very important.
Depending on how you architect your container, you might not have to, but I
believe that it's always good to have the _ability_ to, even if only as a last
resort method.

I had a pleasant conversation with Jerome quite a while ago about SSH and what
the "right" way is to login to a Docker container. We were not able to find
consensus, but Jerome is a brilliant guy and his reasons were sound. For some
time, I considered using lxc-attach to replace the role of SSH. Unfortunately,
a few weeks later, Docker 0.9 came out and no longer used LXC as the default
backend, and so suddenly lxc-attach stopped working. We decided to stick with
SSH until there's a better way. Solomon Hykes told us that they have plans to
introduce an lxc-attach-like tool in Docker core. Unfortunately, as of Docker
1.0.1, this feature still hasn't arrived.

Now, Jerome is advocating nsenter. There is currently an ongoing discussion on
the baseimage-docker bug tracker about replacing SSH with nsenter:
[https://github.com/phusion/baseimage-docker/issues/102](https://github.com/phusion/baseimage-docker/issues/102)

But leaving all of that aside, we regularly get told by people that Baseimage-
docker "misses the point" of Docker. But what is the point of Docker? Some
people, including Jerome, believe it's all about microservices and running one
process in a container.

We take a more balanced, nuanced view. We believe that Docker should be
regarded as a flexible tool that can be molded into whatever you want. You
_can_ make single-process microservices, if you want to and if you believe
that's the right choice for you. Or you can choose to make multi-process
microservices, if that makes sense. Or you can choose to treat Docker like a
lightweight VM. We believe that all of those choices are correct. We don't
believe that one should ONLY use Docker to build microservices, especially
because Microservices Are Not A Free Lunch
([http://highscalability.com/blog/2014/4/8/microservices-not-a...](http://highscalability.com/blog/2014/4/8/microservices-not-a-free-lunch.html)).

Baseimage-docker is about _enabling users_ to do whatever they want to. It's
about choice. It's not about cargo-culting everything into a single
philosophy. This is why Baseimage-docker is extremely small and minimalist
(only 6 MB memory overhead), flexible and thoroughly documented.
Baseimage-docker is _not_ about advocating treating Docker as a heavyweight VM.

~~~
shykes
I don't think your base image misses the point of Docker. Different people use
Docker for different purposes, that is normal and a fundamental goal of
Docker.

I do have criticism for your _communication_ around that base image, starting
with the link-bait blog post "you're using Docker wrong". Your message is that
anybody not using Docker _your_ way (full-blown init process, sshd, embedded
syslog) is doing it wrong. That is not only incorrect, it contradicts Docker's
philosophy of allowing and supporting more than one usage pattern.

My other criticism is that you point out a known Docker bug (the pid1 issue)
and use it as a selling point for your image, without concerning yourself with
reporting the bug, let alone contributing to a fix. Meanwhile many people have
hit the same pid1 bug and reported it, suggested possible fixes, or contributed
code to help implement that fix. If you want to be taken seriously in the
Docker community, my recommendation is that you consider doing the same.

~~~
FooBarWidget
Hi Shykes, glad to see you replying. Your point about communication is fair
enough. I will take a look at how the communication can be improved. However,
let me stress that the message is _not_ "you're using Docker wrong unless
you're using it our way". I see how it can be read like that, but the real
message is much more technical, complicated and nuanced. The message is
fourfold:

1. Your _Unix system_ is wrong unless it conforms to certain technical
requirements.

2. Explanation of the requirements.

3. One possible solution that satisfies these requirements: Baseimage-docker.

4. Does your image already satisfy the requirements? Great. If not, you can
implement these requirements yourself, but why bother when you can grab
Baseimage-docker? And oh, it happens to contain some useful stuff that is not
_strictly_ necessary but that lots of people want anyway.

As you can see, such a complicated message becomes waaay too long and hard to
explain to most people. It probably only makes sense if you've contributed to
the Linux kernel, or read an operating systems book. If I explained it in a
way that's too technical and nuanced, 99% of the people will fall asleep after
reading 1 paragraph. So the message was simplified. I apologize if the
simplified message has offended you, and I am continuing to finetune the
message.

As for the PID 1 issue: I genuinely thought you guys didn't include a PID 1
_on purpose_, because running one isn't that hard. Last time I talked to
Jerome, he had the opinion that, if software couldn't deal with zombie
processes existing on the system, it's a bug in the software. With that
response in mind, I thought that the Docker team does not recognize the PID 1
issue as really an issue. So please do not mistake the lack of a bug report as
malice.

Later on, you told me that you guys are working on this, and I was glad to
hear that.

I get the feeling that you feel bitter about the fact that I chose to write
Baseimage-docker instead of contributing a PID 1 to Docker. Please understand
that I did not do this out of any adversarial intentions. My Go skills are
minimal and I am busy enough with other stuff. This, combined with the fact
that at the time I thought the PID 1 issue was simply not recognized, led me
to write Baseimage-docker. I would like to stress that I look forward to
friendly relationships with you, with the Docker team and with the community.

~~~
ithkuil
Do you have any links to relevant discussion, documentation and/or code
related to an (official?) pid 1 process for/by docker? I'm not able to find it
quickly and I thought it might be useful if you could share given that you
clearly have some context. Thanks!

~~~
FooBarWidget
I don't know what the Docker team are working on, but this is the PID 1
process we use in Baseimage-docker: [https://github.com/phusion/baseimage-
docker/blob/master/imag...](https://github.com/phusion/baseimage-
docker/blob/master/image/my_init) It's a custom system we wrote specifically
for use inside Docker.

------
ksikka
Volumes are convenient! But what do you do if you have a multi-host setup?
Multiple volumes, shared volumes, distributed file systems, NFS? What do you
use when?

~~~
mixmastamyk
Would you not mount an NFS volume in each container to share data?
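
I.e. roughly this (a sketch; the export and paths are made up):

    # On the host: mount the NFS export once, then bind it into each container.
    mount -t nfs filer.example.com:/export/data /mnt/shared
    docker run -d -v /mnt/shared:/data example/app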

~~~
ksikka
OK, then how do you increase server storage with minimal downtime? Say you're
in a cloud environment, specifically AWS.

------
x1798DE
Reading some of the other comments in this thread, I think I'm getting a
better idea of why this guy is suggesting that SSH in Docker could complicate
things, especially when you are scaling to a large number of servers, but what
I still don't quite understand is his argument about security updates. Is that
in any way SSH-specific, beyond the fact that SSH is powerful and it's critical
to be up-to-date on powerful things?

If it's a problem to update SSH, then it should also be a problem to update
whatever else you have in your Docker container. I guess there's some argument
to be made that if you're running a single-purpose Docker container, the
updates to whatever service it's running won't sync up with the updates to
SSH, so you may drastically increase the number of times you'll have to
package the image, but that's just a general argument in favor of single-
purpose containers, not anything specific to SSH like the key management
issue.

------
lifty
It's funny that he uses Docker as a build and installation tool for nsenter.
If you look at the installation example from the nsenter GitHub page, it shows
how you can mount your host's /usr/local/bin inside the container where nsenter
will be built. Pretty nifty/hacky way of building software while keeping your
system clean.
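
If I remember the README right, it boils down to something like this (from
memory, so check the exact image name and target path there):

    # Throwaway container that builds nsenter and drops the binary into the
    # host's /usr/local/bin via the bind mount.
    docker run --rm -v /usr/local/bin:/target jpetazzo/nsenter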

------
vbit
Wow. Given all the hype around docker, I'm really surprised nsenter type
functionality isn't part of the core. Glad I'm using FreeBSD where running a
shell in your jail is just 'jexec my_jail bash'. Compare that to the
rigamarole described in the article. Maybe the docker guys will benefit from
just reviewing the tooling for existing systems. I also recommend a look at
ezjail-admin.

~~~
vidarh
nsenter is part of util-linux, which means it will eventually make it into
pretty much every Linux distro around. The functionality doesn't belong in
Docker, because it isn't Docker specific - all it depends on is cgroups, and
cgroups is a kernel feature, not a Docker feature.

Once it's in there, a "jexec" equivalent would be a couple of lines of shell
script.
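
Something like this, roughly (an untested sketch; the function name is just a
placeholder):

    # jexec-style helper: drop into a shell (or run a command) inside a
    # running container by entering its namespaces.
    docker-enter() {
        local pid
        pid=$(docker inspect --format '{{ .State.Pid }}' "$1") || return 1
        sudo nsenter --target "$pid" --mount --uts --ipc --net --pid -- "${2:-/bin/bash}"
    }
    # usage: docker-enter my_container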

~~~
vbit
Being part of util-linux makes sense. However, it would be so much nicer if I
didn't have to look up those two lines.

------
jaybuff
nsenter might be a short term solution, but IMHO, getting a resolution to this
issue
[https://github.com/dotcloud/docker/issues/1228](https://github.com/dotcloud/docker/issues/1228)
so you can just do "docker exec <cid> /bin/bash" is much simpler than the
nsenter call he recommends. Plus, he leaves out chroot, so I have a different
view of the filesystem than PID 1 inside the container does.

------
atoponce
As a cloud hosting provider, I don't want to give customers access to the
hardware node. Then what?

------
drydot
I don't see what's wrong with setting up the SSH service, if I accept that the
ssh daemon is safe, which I do.

~~~
vidarh
You need not just accept that the ssh daemon itself is safe, but also that:

- Your key management is safe.
- The process manager you now need to introduce to start sshd and the app is safe.
- The ssh daemon is sufficiently protected against abuse.
- Your configuration of it is safe.

If you don't _need_ ssh in every container to achieve what you need to achieve,
why do you want to have to deal with each of those _and_ waste the extra
resources of having a bunch of extra sshd's and process monitors running?

(To the last point: Yesterday we suffered an attempt at brute-forcing ssh on a
public facing server. We're used to people trying to brute force passwords.
But as it happens, it is "easy" to make openssh consume all of your server's
resources if you don't block access on the network level in the event of an
apparent attack; so if any of those ssh servers are reachable in any way from
the outside, you _have_ just increased your attack surface even if your key
management and everything else is perfect and they have no way of actually
getting _in_ )

~~~
FooBarWidget
If you are worried about the attack surface, then SSH - as Baseimage-docker
configures it - isn't that much of an issue. By default, we do not expose the
SSH port to the public Internet, nor do we install any keys. Unless otherwise
configured by the user, you first have to login to the host machine, and then
from there login to the container through SSH.

~~~
vidarh
While it's great that you ship with secure defaults, to me, if you're going to
restrict it to access from the host only, that just makes it more pointless to
run sshd in the containers vs. the alternatives presented in the article.

~~~
FooBarWidget
Like I already said, there is an ongoing discussion about replacing SSH with
nsenter now that nsenter is a viable alternative:
[https://github.com/phusion/baseimage-docker/issues/102](https://github.com/phusion/baseimage-docker/issues/102)

SSH was purely chosen because until recently there wasn't a better
alternative. lxc-attach stopped working out of the box since Docker 0.9. See
[https://news.ycombinator.com/item?id=7951042](https://news.ycombinator.com/item?id=7951042)

------
zobzu
It's all about usability, basically: use nsenter.

The other points in the blog are true, but that's not what people want. People
want nsenter, but they don't know it, so they use ssh.

------
yebyen
Typo: where a special key with force a specific command

