
Understanding Docker Container Escapes - ingve
https://blog.trailofbits.com/2019/07/19/understanding-docker-container-escapes/
======
rtempaccount1
Whilst from an overall security perspective this may not be that meaningful
(the prerequisites are CAP_SYS_ADMIN or --privileged), it's still a really
neat hack.

Also a good demonstration of how runc container security is really just Linux
security, and that the isolation provided by that type of container depends
on underlying Linux features.

I could see circumstances where this would be of practical use, where you've
landed in a privileged container and are looking for a relatively
easy/universal breakout to the underlying node.
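
A quick way to tell whether you've landed in that kind of container is to
look at the effective capability mask. A minimal sketch, assuming a Linux
/proc and that CAP_SYS_ADMIN is capability number 21 (its value in
linux/capability.h); the function names are made up for illustration:

```shell
# Print the effective capability mask (hex) that a child of this shell inherits.
current_capeff() {
    awk '/^CapEff:/ { print $2 }' /proc/self/status
}

# Print "yes" if the given hex mask has CAP_SYS_ADMIN (bit 21) set, else "no".
has_cap_sys_admin() {
    mask=$((0x$1))
    if [ $(( (mask >> 21) & 1 )) -eq 1 ]; then
        echo yes
    else
        echo no
    fi
}
```

Inside a container you would call it as `has_cap_sys_admin "$(current_capeff)"`.
A "yes" only says the prerequisite is present, not that any particular
breakout will work.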

The ToB breakdown is cool too as the initial tweet was a bit hard to parse,
unless your Bashfu is good.

(ofc the post from ToB I'm really looking forward to hasn't been released yet,
which is what they found in the Kubernetes audit)

~~~
panpanna
Two minor remarks:

1\. Docker is not synonymous with Linux containers. LXC for example has always
had much better security than Docker despite using the same infrastructure.

2\. You don't need to interface with the kernel directly. There is for example a
Google project (gvisor?) that acts as a syscall proxy in userland to combat
these types of problems.

~~~
rtempaccount1
Out of curiosity, what makes you say that LXC has better security than Docker?
What aspects of LXC's design or implementation provide additional security
controls over those of Docker?

~~~
panpanna
For example LXC containers are unprivileged by default. If you escape the
container you end up as a normal user on host, not root.

LXC main dev is on HN. I'm sure he can explain this much better.

~~~
rtempaccount1
I'm not sure I follow you here.

The Docker daemon runs as root (which is kind of inevitable, if you don't then
you end up needing hacks like slirp4netns to hook up networking)

However, there's nothing innately inside Docker that requires contained
processes to run as root.

Indeed it's a standard part of recommended Docker security guidance not to run
your contained processes as the root user.

Docker also provides the facility to enable user namespaces at the daemon
level, so that root inside the container != root on the host.
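
You can check which case you're in from inside the container by reading the
uid_map. A rough sketch, assuming a Linux /proc and the usual
`<id inside> <id outside> <length>` line format; the function name is
hypothetical:

```shell
# Print "yes" if uid 0 in this user namespace maps to uid 0 outside it,
# "no" otherwise. $1 is a uid_map file (defaults to this process's own).
root_maps_to_host_root() {
    map_file=${1:-/proc/self/uid_map}
    while read -r inside outside count; do
        if [ "$inside" -eq 0 ] && [ "$outside" -eq 0 ]; then
            echo yes
            return 0
        fi
    done < "$map_file"
    echo no
}
```

With user namespaces enabled you'd expect a line like `0 100000 65536`, so
the answer is "no"; without them the map typically starts with `0 0`.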

~~~
cyphar
When GP said unprivileged containers, they meant user namespaces (that's the
terminology LXC uses). Docker doesn't default to user namespaces being used
(LXC does) and within Docker it has many limitations that LXC/LXD do not.
LXC/LXD can also isolate containers from each other by mapping different
uid_maps, but with Docker all containers use the same mapping.

Disclaimer: I'm a maintainer of runc, the runtime Docker uses.

~~~
rtempaccount1
I understand that Docker's user namespace support is relatively basic, but the
point I was looking at was that if you don't run your contained process as
root (i.e. unprivileged) by specifying a USER in the Dockerfile, then I wasn't
aware of major differences in the security of an LXC container as against one
running under runc.

ofc happy to be corrected, as I'm aware you'd know more about this :)

~~~
cyphar
It really depends whether you want to compare similarly-configured containers
or the defaults.

If you compare the defaults, LXC wins overall because they have rootless
containers and user namespaces by default (runc has them too -- I implemented
them -- but it's not the default in Docker). To be balanced, LXD's isolation
of individual containers is not on by default either (because of backwards
compatibility requirements) -- but Docker doesn't have an equivalent feature.
If you configure a Docker setup to be as-close-as-possible to an LXC setup,
then it's much harder to give a definitive answer. Generally, the containers
we set up look almost identical from the kernel's point of view so we have
similar kernel 0day problems. So it comes down to the security of the runtime
in particular.

I am currently working on solving several pretty fundamental security issues
that exist both within LXC and runc (and many more programs generally)[1], so
it's not like either is perfect (though LXC does have more code to defend
against the attacks I'm working on fixing). LXC does make use of more of the
kernel hardening work that we (both the LXC folks and myself) have worked on.
A trivial example is that LXC uses TIOCGPTPEER (a feature I originally
implemented that allows you to avoid certain theoretical attacks by container
processes against the runtime) but Docker doesn't use it (and because runc
doesn't have a container manager by design we can't implement it in runc). LXC
also supports using pidfds (a new feature in Linux 5.1 that Christian Brauner
has been working on for a while) which allow much nicer methods of avoiding
PID recycling race conditions -- with runc we still use the old pid+starttime
method which is prone to well-known (though usually harmless) attacks.
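
To make the pid+starttime idea concrete, here is a rough sketch of that kind
of check (not runc's actual code, and the function names are invented). It
assumes the /proc/<pid>/stat layout from proc(5), where starttime is field 22
and the comm field may itself contain spaces or parentheses:

```shell
# Extract starttime (field 22) from the contents of a /proc/<pid>/stat file.
starttime_from_stat() {
    rest=${1##*) }   # drop "pid (comm) "; cut at the LAST ") " in the line
    set -- $rest     # field 3 (state) is now $1, so field 22 is ${20}
    echo "${20}"
}

# Succeed only if <pid> still exists and has the start time recorded earlier.
same_process() {
    stat=$(cat "/proc/$1/stat" 2>/dev/null) || return 1
    [ "$(starttime_from_stat "$stat")" = "$2" ]
}
```

The check is still racy: the PID can be recycled between reading the stat
file and acting on the PID, which is the gap pidfds close.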

Funnily enough, I'm actually giving a talk about this topic at the end of this
week[2] and was writing slides when I saw this thread. :P

[1]: [https://github.com/openSUSE/libpathrs](https://github.com/openSUSE/libpathrs)
[2]: [https://2019.container.camp/au/schedule/securing-container-r...](https://2019.container.camp/au/schedule/securing-container-runtimes-how-hard-can-it-be/)

------
awinter-py
> use official docker images

(as a mitigation)

okay, but even for official images, figuring out the provenance of a build on
docker hub is totally impossible

I challenge you to start with an image sha and tell me what git version (or
even _what repo_ ) was used to create it

docker needs to get better at supply chain

------
zeroxfe
This is a weird kinda escape. The only time I use --privileged is when I'm
debugging, _because_ it lets me easily elevate access. Does anyone actually
run privileged containers for production workloads?

~~~
tln
Not sure about production workloads, but it's not just privileged containers
you start that should be a concern. Anyone who can access /var/run/docker.sock
can run a privileged container, so this can be a privilege escalation.

Because of this escape, giving access to /var/run/docker.sock to regular users
is the same as giving them root access.

Also as the article says mounting /var/run/docker.sock is (now, because of
this escape) the same as giving that container access to the host system.
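
A trivial way to check whether the current user (or container) is in that
position, assuming the default socket path; the function name is just for
illustration:

```shell
# Succeed if the given path is a Unix socket the current user can write to.
# Write permission is what connecting to a Unix socket requires.
can_use_docker_socket() {
    sock=${1:-/var/run/docker.sock}
    [ -S "$sock" ] && [ -w "$sock" ]
}
```

Something like `can_use_docker_socket && echo "effectively root on the host"`
makes the point in an audit script. Note that root passes the write test
regardless of the socket's mode.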

~~~
mclehman
On the other side of things, my favorite demo for people new to docker who
aren't yet aware that sudoless docker ≈ root access is:


    docker run -itv /:/host ubuntu chroot /host

~~~
cyphar
Or, even better, use nsenter to join all the namespaces of PID 1 on the host
(making your process an ordinary root process in the init namespaces).
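
For anyone who wants to see what that looks like, a sketch along these lines
should work, assuming the ubuntu image ships nsenter (it is part of
util-linux) and that you are allowed to pass --privileged and --pid=host. The
host_root_shell name and the DOCKER override are made up here so the command
can be dry-run with echo:

```shell
# Start a container that shares the host PID namespace, then use nsenter to
# join the mount, UTS, IPC, network and PID namespaces of host PID 1.
# Set DOCKER=echo to print the arguments instead of running anything.
host_root_shell() {
    "${DOCKER:-docker}" run --rm -it --privileged --pid=host ubuntu \
        nsenter -t 1 -m -u -i -n -p sh
}
```

`DOCKER=echo host_root_shell` prints the arguments without touching the
daemon; run it for real only on a machine you own.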

------
oso2k
The title ought to be updated to specify the container types in question are
privileged containers. Even if the original blog post doesn’t. A privileged
container is quite like running a process with a user that has sudo
permission. Being surprised that such a container can do bad things seems
disingenuous. It’s like saying, “When you give me the keys to your house, I
can rob you blind. I just need to figure out which key is for the front door.”

~~~
tptacek
The post says, over and over, almost ad nauseam, that you shouldn't be using
--privileged and that the original tweet depends on --privileged; moreover, it
presents an alternate version of the escape that doesn't depend on
--privileged (but does depend on other elevated privileges).

The title should not in this case be editorialized; relative to the topic,
it's about as boring as titles come.

~~~
lawnchair_larry
I think the title is a bit clickbaity, perhaps unintentionally. I think
parent’s point is that this post isn’t really going to teach you anything at
all about docker container escapes, but rather, it teaches you about escaping
containers that an admin took several deliberate uncommon and non-default
steps to basically escape the container for you. If one clicked it to learn
about real world container escapes, they left disappointed.

------
OJFord
_Obviously_ CAP_SYS_ADMIN and root-running containers should be avoided.

But on the occasion that they're necessary, isolating from other pods with a
network policy and having no public ingress is enough right?

Assuming of course that you trust the container process(es) - or is that the
issue?

~~~
rtempaccount1
I'd say "that depends"

For example if you're running Kubernetes, then if you have a user who has RBAC
rights to exec into a privileged pod, but doesn't have rights to create new
privileged pods, then this could be a privesc risk, as they can use something
like this to escalate to the underlying node after exec'ing into the
privileged pod.

It (as with all things security) depends on your threat model :)

~~~
OJFord
Good point - I was thinking that 'operators' were trusted though, potential
threat is the (ab)users of the running software.

As far as I can tell, as long as there is no service/ingress on the privileged
container, and a netpol blocks those that do from accessing it, it's less than
ideal but 'ok' that this privileged container is running behind the scenes.

------
mychael
The author is presenting this as if they found a serious security flaw in
Docker. They're starting a container in privileged mode, what exactly did they
expect?

This is like acting surprised that a Linux root user can do immeasurable harm
to the underlying OS.

------
based2
[https://www.reddit.com/r/netsec/comments/cfh7rk/understandin...](https://www.reddit.com/r/netsec/comments/cfh7rk/understanding_docker_container_escapes/)

------
king_phil
Funny thing about Docker container security, a bug that has not been fixed
for ages: a custom AppArmor profile is only applied on the first container
start, but not on any later restart.

Yes, the container runs in the "unconfined" profile after a restart.

[https://github.com/moby/moby/issues/38075](https://github.com/moby/moby/issues/38075)

~~~
zapita
That’s disingenuous. In this issue, the maintainers clearly explain that
running your container as privileged is _supposed_ to disable all confinement
by apparmor. The bug is that the custom apparmor profile is sometimes applied,
when it should never be. This is not a security issue in any way since the
container is already privileged.

~~~
king_phil
But in a privileged container you could still take away capabilities and/or
permissions with an apparmor profile. Sometimes that happens, sometimes it
does not. And when it does not, you have no way of knowing.

~~~
zapita
> _But in a privileged container you could still take away capabilities and/or
> permissions with an apparmor profile._

Right, what you want is “privileged except for XYZ”, which is not supported by
Docker. That’s a missing feature which is not the same as a bug. Calling it a
security bug is even more misleading.

> _Sometimes that happens, sometimes it does not. And when it does not, you
> have no way of knowing._

Right, it should fail every time. That is a bug. But it’s not a security bug,
and fixing that bug won’t give you the feature you want, it will just make it
clearer that the feature is not supported.

