
Yet Another Reason Containers Don't Contain: Kernel Keyrings - mmastrac
http://www.projectatomic.io/blog/2014/09/yet-another-reason-containers-don-t-contain-kernel-keyrings/
======
biot
As Theo de Raadt so eloquently said:

    
    
      "x86 virtualization is about basically placing another nearly
       full kernel, full of new bugs, on top of a nasty x86
       architecture which barely has correct page protection. Then
       running your operating system on the other side of this brand
       new pile of shit.
    
       You are absolutely deluded, if not stupid, if you think that a
       worldwide collection of software engineers who can't write
       operating systems or applications without security holes, can
       then turn around and suddenly write virtualization layers
       without security holes."
    

See
[http://web.archive.org/web/20120513060008/http://kerneltrap....](http://web.archive.org/web/20120513060008/http://kerneltrap.org/OpenBSD/Virtualization_Security)
for discussion/context.

------
lifeisstillgood
"Let's walk before we run"

Totally totally totally. I loved building VMs with BSD jails, and was excited
to try out LXC a couple of years back. It felt similar to the very early
years of the big jails push: everything worked, but kinks still existed and the
userland was immature.

This is not stuff that great UX with Ansible will fix - the underlying
mechanisms will still take a couple of years to shake out. We will run on them
but ... Keep your secrets close.

------
kapilvt
Also known as user namespaces to the rescue again (available via LXC or other
users of the relevant parameterized syscalls). They've been available for a
while, though libcontainer doesn't support them. In conjunction with a good
LSM, seccomp filtering, and capabilities, that's pretty good protection. Those
willing to endure the pain might also go with grsec.

------
kentonv
Sandstorm.io's containerization (which uses Linux namespaces just like Docker
and others) is not affected by this, because add_key, request_key, and keyctl
are among the syscalls which Sandstorm disables with seccomp. The other issues
the author alludes to from his previous post on opensource.com also do not
affect Sandstorm, because it does not mount /proc or /sys and only exposes the
devices null, zero, and urandom.

[https://blog.sandstorm.io/news/2014-08-13-sandbox-security.html](https://blog.sandstorm.io/news/2014-08-13-sandbox-security.html)
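
For readers unfamiliar with how a seccomp denylist of this kind works, here is a minimal sketch in Python. It is not Sandstorm's actual filter. It only assembles a classic-BPF program that returns EPERM for add_key, request_key, and keyctl, and runs it through a toy interpreter so the decision logic can be checked without touching the kernel. The syscall numbers are the x86_64 ones (an assumption; other architectures differ), and the helper names `build_filter`, `run_filter`, and `pack_filter` are made up for illustration.

```python
import struct

# Classic BPF opcodes (values from linux/bpf_common.h).
BPF_LD_W_ABS = 0x20  # BPF_LD | BPF_W | BPF_ABS
BPF_JEQ_K = 0x15     # BPF_JMP | BPF_JEQ | BPF_K
BPF_RET_K = 0x06     # BPF_RET | BPF_K

# seccomp return actions (values from linux/seccomp.h).
SECCOMP_RET_KILL = 0x00000000
SECCOMP_RET_ERRNO = 0x00050000
SECCOMP_RET_ALLOW = 0x7FFF0000

AUDIT_ARCH_X86_64 = 0xC000003E
EPERM = 1

# Assumption: x86_64 syscall numbers. Other architectures use other numbers.
KEYRING_SYSCALLS = {"add_key": 248, "request_key": 249, "keyctl": 250}


def build_filter(denied):
    """Return (code, jt, jf, k) instructions that deny the given syscall numbers."""
    denied = sorted(denied)
    prog = [
        (BPF_LD_W_ABS, 0, 0, 4),               # load seccomp_data.arch
        (BPF_JEQ_K, 1, 0, AUDIT_ARCH_X86_64),  # expected arch: skip the kill
        (BPF_RET_K, 0, 0, SECCOMP_RET_KILL),   # unexpected arch: kill
        (BPF_LD_W_ABS, 0, 0, 0),               # load seccomp_data.nr
    ]
    for i, nr in enumerate(denied):
        # On a match, jump past the remaining comparisons and the ALLOW return.
        prog.append((BPF_JEQ_K, len(denied) - i, 0, nr))
    prog.append((BPF_RET_K, 0, 0, SECCOMP_RET_ALLOW))
    prog.append((BPF_RET_K, 0, 0, SECCOMP_RET_ERRNO | EPERM))
    return prog


def run_filter(prog, nr, arch=AUDIT_ARCH_X86_64):
    """Toy interpreter for the three opcodes above; returns the seccomp action."""
    data = struct.pack("<II", nr, arch)  # first 8 bytes of struct seccomp_data
    acc, pc = 0, 0
    while True:
        code, jt, jf, k = prog[pc]
        pc += 1
        if code == BPF_LD_W_ABS:
            (acc,) = struct.unpack_from("<I", data, k)
        elif code == BPF_JEQ_K:
            pc += jt if acc == k else jf
        elif code == BPF_RET_K:
            return k
        else:
            raise ValueError("unsupported opcode: %#x" % code)


def pack_filter(prog):
    """Serialize to the 8-byte struct sock_filter layout (u16, u8, u8, u32)."""
    return b"".join(struct.pack("=HBBI", *ins) for ins in prog)


if __name__ == "__main__":
    prog = build_filter(KEYRING_SYSCALLS.values())
    for name, nr in KEYRING_SYSCALLS.items():
        print(name, hex(run_filter(prog, nr)))
```

Actually installing such a filter would mean handing the packed program to the kernel (via prctl or the seccomp syscall) after setting no_new_privs, which this sketch deliberately leaves out.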

------
maccam94
Yawn. All containers will have potential security problems until people start
using different user namespaces. I'm looking forward to Docker implementing an
easy way to create unprivileged containers.
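
To make the user-namespace point concrete, here is a rough sketch of how a uid inside a container translates to a uid on the host. It assumes the three-column `inside outside length` layout of `/proc/<pid>/uid_map`; the mapping `0 100000 65536` and the function names are hypothetical values picked for illustration, not anything Docker or LXC is confirmed to use.

```python
def parse_uid_map(text):
    """Parse uid_map-style lines of the form 'inside outside length'."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        inside, outside, length = (int(field) for field in line.split())
        entries.append((inside, outside, length))
    return entries


def to_host_uid(entries, container_uid):
    """Translate a uid seen inside the namespace to the uid the host sees."""
    for inside, outside, length in entries:
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    return None  # no mapping covers this uid


if __name__ == "__main__":
    # Hypothetical mapping: container uids 0..65535 become host uids 100000..165535.
    entries = parse_uid_map("0 100000 65536\n")
    print(to_host_uid(entries, 0))     # container root
    print(to_host_uid(entries, 1000))  # an ordinary container user
```

With a mapping like this, root inside the container is just an unprivileged uid as far as the host is concerned, which is the property people mean by "unprivileged containers".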

------
stephenr
Please enlighten me if I have misunderstood but isn't this something where
unprivileged containers (under lxc 1.0+) can be a bonus?

------
duaneb
Note that these are docker containers and not "containers" as a general
purpose concept. Jails still contain fine....

~~~
justincormack
No, these are Linux namespaces. These break up the things jail does into
smaller pieces of functionality, and that is more difficult. But this bug is
to do with the fact that Linux just has a lot more complex kernel
functionality than the BSDs do.

You can break out of a jail if there is a local exploit that gives kernel code
execution as non root.

~~~
duaneb
> You can break out of a jail if there is a local exploit that gives kernel
> code execution as non root.

Well, this has always been true of any sandbox or virtual machine.

> But this bug is to do with the fact that Linux just has a lot more complex
> kernel functionality than the BSDs do.

Also probably true. The Linux kernel was not built to be namespaced,
unfortunately, so bolting on namespacing will have edge cases that are not
covered. I would not recommend anyone use Linux containerization for
production environments for some years.

~~~
nl
> > You can break out of a jail if there is a local exploit that gives kernel
> > code execution as non root.
>
> Well, this has always been true of any sandbox or virtual machine.

That's not true for Virtual Machines.

> I would not recommend anyone use linux containerization for production
> environments for some years.

Too bad people have been doing it for years, successfully, already. For
example, most PaaS products use Linux containers for isolation and as a
security layer.

OpenVZ (i.e., an early form of Linux containers) has been used in production
hosting environments to give people root shell access for just as long.

~~~
krakensden
> > You can break out of a jail if there is a local exploit that gives kernel
> > code execution as non root.
>
> That's not true for Virtual Machines.

That's certainly an opinion:
[http://www.ubuntu.com/usn/usn-2342-1/](http://www.ubuntu.com/usn/usn-2342-1/)

> It was discovered that QEMU incorrectly handled certain PCIe bus hotplug
> operations. A malicious guest could use this issue to crash the QEMU host,
> resulting in a denial of service

Most virtualization servers are great big chunks of C and C++ running as root
with a bunch of crazy optimizations for benchmarks. They get some help from
the ISA, but... stuff happens.

