
Docker: insecure opening of file-descriptor allows privilege escalation - emilburzo
https://bugzilla.redhat.com/show_bug.cgi?id=1409531
======
cyphar
I discovered the vulnerability, and I'm not entirely sure that Trevor Jay
fully understands the issue (though to be fair, the easiest way of exploiting
it is using ptrace(2) which is blocked by most default security policies). You
don't need to use ptrace(2) or CAP_SYS_PTRACE to exploit the vulnerability.

You just need proc_fd_access_allowed() to succeed. I've not checked if
ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS) calls into SELinux hooks (it
probably does, and if it doesn't then resolving further files probably does
too) but neither seccomp profiles (unless you're blocking open(2)) nor
blocking CAP_SYS_PTRACE can help you here.
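
To make the procfs path concrete, here is a minimal Python sketch (Linux only) of re-opening a descriptor through /proc/<pid>/fd with nothing but open(2). It is a benign same-process demo that uses /proc/self, not an exploit for this issue. The function names and the temporary file are hypothetical and exist only for illustration.

```python
import os
import tempfile


def reopen_via_procfs(fd: int, pid: str = "self") -> int:
    """Open a new descriptor for whatever `fd` refers to in process `pid`.

    Only open(2) is involved; no ptrace(2) call is made here.
    """
    path = f"/proc/{pid}/fd/{fd}"
    return os.open(path, os.O_RDONLY)


def demo() -> bytes:
    # Create a file, then remove its name so it is reachable only via the fd.
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"host data")
        os.unlink(path)
        # The procfs entry still resolves to the (now nameless) file.
        new_fd = reopen_via_procfs(fd)
        try:
            # The new descriptor has its own offset, starting at 0.
            return os.read(new_fd, 64)
        finally:
            os.close(new_fd)
    finally:
        os.close(fd)
```

Against another process, the same open() is gated by the kernel's procfs access check, which is the condition named above.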

Now, the LXC exploit used ptrace in order to stop the process from closing its
file descriptors. I'm not sure how you would reliably hit the race in this
issue (something with SIGSTOP presumably?).

In any case, SUSE's update has additional fixes which also fix the issue even
when you give a container CAP_SYS_PTRACE (the released patch does _not_
protect containers that have CAP_SYS_PTRACE enabled). The patches will be
merged upstream ASAP, but Docker didn't want them in the patchset sent to its
customers (preferring instead to update their vendored runC once they are
merged upstream).

~~~
gtjay
Sorry for not seeing your comment until now. Amazingly great vuln BTW. It's
early in 2017, but this is probably going to be one of this year's best.

It's very important for everyone to understand my advice is RHEL/Fedora
specific, which is---I think---the source of the misunderstanding here.

Putting aside `ptrace` being the best way to guarantee a race win, the reason
for my focus on `CAP_SYS_PTRACE` is that with SELinux enabled there is no
other way to exploit having access to the file descriptors. Even if you
explicitly try to pass a containerized process an external file descriptor
"legitimately" (e.g. with `sendmsg`) SELinux will still ultimately block the
access due to the type restrictions. This means that with `setenforce 1` you
need to use something like code injection to get the external process to
access the file descriptors on your behalf.
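
For readers unfamiliar with the "legitimate" route, this is roughly what passing a descriptor with `sendmsg` looks like, as a Python sketch using SCM_RIGHTS over a UNIX socket pair. It only shows the mechanism; whether the receiver may then use the descriptor is decided by the kernel (for example by SELinux type enforcement), which this sketch does not model. The helper names are made up for illustration.

```python
import array
import os
import socket


def send_fd(sock: socket.socket, fd: int) -> None:
    """Send one file descriptor as SCM_RIGHTS ancillary data."""
    fds = array.array("i", [fd])
    sock.sendmsg([b"x"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock: socket.socket) -> int:
    """Receive one file descriptor sent with send_fd()."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = sock.recvmsg(1, socket.CMSG_SPACE(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing bytes that do not form a whole int.
            usable = len(data) - (len(data) % fds.itemsize)
            fds.frombytes(data[:usable])
    if not fds:
        raise RuntimeError("no file descriptor received")
    return fds[0]


def demo() -> bytes:
    read_end, write_end = os.pipe()
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with a, b:
        send_fd(a, read_end)
        received = recv_fd(b)
    try:
        os.write(write_end, b"hello")
        # The received descriptor refers to the same pipe as read_end.
        return os.read(received, 16)
    finally:
        for fd in (read_end, write_end, received):
            os.close(fd)
```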

~~~
cyphar
> Sorry for not seeing your comment until now. Amazingly great vuln BTW. It's
> early in 2017, but this is probably going to be one of this year's best.

Thanks. :D

> Putting aside `ptrace` being the best way to guarantee a race win, the
> reason for my focus on `CAP_SYS_PTRACE` is that with SELinux enabled there
> is no other way to exploit having access to the file descriptors. Even if
> you explicitly try to pass a containerized process an external file
> descriptor "legitimately" (e.g. with `sendmsg`) SELinux will still
> ultimately block the access due to the type restrictions. This means that
> with `setenforce 1` you need to use something like code injection to get the
> external process to access the file descriptors on your behalf.

Ah okay, yeah I suspected that's what you meant (on _RHEL_ xyz is the case).
Thanks for clarifying.

~~~
gtjay
> [...] you meant (on _RHEL_ xyz is the case) [...]

Totally on me. We fight against it, but it's hard not to have the implicit
context of RHEL/Fed be omnipresent on the Red Hat bugzilla. In fact, when I
wrote the comment in question I had just finished lighting my incense to the
sīla of `systemd`... :)

Did not realize you were in Sydney. AU truly has the best hackers.

------
gtirloni
For those that won't open the link :)

Update by Trevor Jay:

"This is an extremely difficult to exploit flaw on standard RHEL and Fedora
systems.

I checked the 1.10.3 and 1.12.5 builds on Brew. Both drop the `CAP_SYS_PTRACE`
capability by default. 1.10.3 blacklists `ptrace` calls under the default
seccomp profile. Thus, this flaw only comes into play for containers that
already have elevated privileges.

Even if `ptrace` is available, the proposed exploit scenario of quickly
attaching to a process joining the container space and using its file
descriptors is _not_ possible under the default SELinux configuration. The
containerized PID 1 will have a type of `container_t` or similar SELinux type
and thus will be blocked by standard type enforcement from accessing
any resources that haven't already been made available to containerized
processes."
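
If you want to check whether a given process actually holds `CAP_SYS_PTRACE`, one way is to read the effective capability mask from /proc. A minimal Python sketch, assuming Linux; bit 19 is `CAP_SYS_PTRACE` in linux/capability.h, and the function names are hypothetical.

```python
CAP_SYS_PTRACE = 19  # bit number from linux/capability.h


def parse_cap_eff(status_text: str) -> int:
    """Extract the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_capability(mask: int, cap: int) -> bool:
    """Return True if bit `cap` is set in the capability mask."""
    return bool((mask >> cap) & 1)


def current_process_has_ptrace_cap() -> bool:
    """Check the calling process; run this inside the container to test it."""
    with open("/proc/self/status") as f:
        return has_capability(parse_cap_eff(f.read()), CAP_SYS_PTRACE)
```

This only covers the capability side. The seccomp profile and SELinux policy mentioned in the update are separate layers and are not inspected here.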

~~~
zokier
Gotta love modern computing and defense in depth. Sometimes it actually really
helps despite feeling like a hassle at times.

~~~
lotyrin
Every once in a while I'll say something along the lines of "Wouldn't it be
nice if there was <describes SELinux>" or "wouldn't it be nice if you could
<describes using strace>" etc. and get enthusiastic nods, disbelief or "if
only" and have to break it to someone that they don't know what the hell
they're doing and they simply overlooked a huge feature of their production
platform.

------
geofft
Oh good, yet another vulnerability from the model of retroactively changing
the execution environment of a process after it's been created. We had a
thread about setuid binaries a week ago, which is the most common case of this
design:
[https://news.ycombinator.com/item?id=13312722](https://news.ycombinator.com/item?id=13312722)

We would all be better off if we designed systems such that some helper
process, already running with the right environment / config / privileges,
spawns the process for you and proxies input/output to your terminal.

And (as I mentioned in the other thread) this helper process could be
literally sshd. Instead of having sudo, ssh root@localhost. No weird process
trees with confusing things like effective UIDs. Instead of having runc exec,
ssh root@container. No file descriptors get passed that aren't explicitly
forwarded over the SSH connection.

Patching sshd to run over UNIX sockets without encryption and to use
getpeername() for authentication is left as an exercise to the reader.
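
As a hedged note on the authentication part: on Linux, getpeername() on a UNIX socket returns the peer's address, and the peer's pid/uid/gid are normally read with the SO_PEERCRED socket option instead. A minimal Python sketch of that lookup, assuming Linux; it is not a patch to sshd, and the function names are made up.

```python
import socket
import struct


def peer_credentials(sock: socket.socket) -> tuple:
    """Return (pid, uid, gid) of the peer on a connected AF_UNIX socket."""
    fmt = "iII"  # struct ucred: pid_t pid; uid_t uid; gid_t gid
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize(fmt))
    return struct.unpack(fmt, raw)


def demo() -> tuple:
    # With socketpair(), both ends belong to this process, so the "peer"
    # credentials are simply our own.
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with a, b:
        return peer_credentials(a)
```

A server would call peer_credentials() on each accepted connection and decide from the uid whether to spawn anything.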

~~~
mnarayan01
How would you implement e.g. ping? I mean obviously you could have e.g. user
accounts which were "sufficiently" locked down, but that seems like an even
more likely source of problems.

~~~
aaronmdjones
Along with the aforementioned IPPROTO_ICMP, you can still use file
capabilities(7) instead (or as well, in the case of older kernels):

    # chmod 0711 /bin/ping
    # setcap CAP_NET_RAW=eip /bin/ping
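
For the IPPROTO_ICMP route, here is a rough Python sketch of an unprivileged "ping socket". It assumes Linux and that the `net.ipv4.ping_group_range` sysctl includes the caller's group; otherwise opening the socket is refused and the helper returns None. The function names are hypothetical, and nothing is sent on the wire here.

```python
import socket
import struct

ICMP_ECHO_REQUEST = 8


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum as used by ICMP (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, payload: bytes) -> bytes:
    """Build an ICMP echo request with a valid checksum."""
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, ident, seq)
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, csum, ident, seq) + payload


def open_ping_socket():
    """Try to open a datagram ICMP socket; needs neither setuid nor CAP_NET_RAW.

    Returns None if the kernel refuses (e.g. caller outside ping_group_range).
    """
    try:
        return socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_ICMP)
    except OSError:
        return None
```

With a socket in hand, `sock.sendto(build_echo_request(1, 1, b"ping"), (host, 0))` would send the probe; the point is that the binary itself carries no elevated privilege.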

~~~
geofft
File capabilities are certainly better than setuid, but they still have the
same problem of elevating privileges in a potentially-attacker-controlled
environment. If setuid ping has a vulnerability that lets you get root,
CAP_NET_RAW ping would also have a vulnerability that lets you read all
traffic into the machine and spoof packets from privileged ports or existing
connections. That's an uncomfortably large amount of access, even if it isn't
quite root.

------
robszumski
CoreOS engineers started deploying patches across all channels for this CVE
minutes after it was made public. More info here:
[https://coreos.com/blog/cve-2016-9962.html](https://coreos.com/blog/cve-2016-9962.html)

------
lukeck
A fix for CVE-2016-9962 was released in Docker upstream 1.12.6 a couple of
days ago.

[https://github.com/docker/docker/releases/tag/v1.12.6](https://github.com/docker/docker/releases/tag/v1.12.6)

------
Hortinstein
is there a poc? would love to see a walkthrough

