
If you take a look at LXC and LXD, I would very much argue you can use them as a security boundary. One of the main problems with Docker is that the most powerful isolation primitive available in Linux -- user namespaces -- is not used by default and doesn't fully utilise the underlying feature. LXC uses unprivileged user namespaces by default, and LXD defaults to user namespaces as well. You can even isolate containers from each other with a basic config option.
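As a sketch of that config option (assuming LXD's standard keys; `c1` is a hypothetical container name):

```shell
# LXD runs containers in user namespaces by default; this opts a container
# into an isolated uid/gid map so it cannot share ids with other containers.
lxc config set c1 security.idmap.isolated true
lxc restart c1
# Inspect the effective id map the container ended up with:
lxc config get c1 volatile.idmap.current
```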

All of that being said, this bug is caused by container runtimes trusting the rootfs too much. This is something I've been trying to improve but it's definitely not a trivial problem (lots of things require trusting the container processes in specific ways due to limitations in the corresponding kernel APIs -- though I am working on fixing those too).



Apparently even user namespaces can't be trusted for secure isolation, so much so that Arch Linux even has them disabled by default[1]. That said, it's possible that security improved since then, and I don't know when the most recent user namespace vulnerability was found.

[1] https://lists.archlinux.org/pipermail/arch-general/2017-Febr...


That mail is outdated. Arch, like some other distros such as Debian, now applies a kernel patch that allows toggling user namespace support via the `kernel.unprivileged_userns_clone` sysctl.
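Concretely, on a kernel carrying that patch (the sysctl only exists on patched kernels):

```shell
# Query the current setting (0 = unprivileged user namespaces disabled):
sysctl kernel.unprivileged_userns_clone

# Enable at runtime:
sudo sysctl -w kernel.unprivileged_userns_clone=1

# Persist across reboots:
echo 'kernel.unprivileged_userns_clone = 1' | sudo tee /etc/sysctl.d/99-userns.conf
```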


Oh, I'm aware that you can toggle it via sysctl, but it's still off by default. That said, I can't find any user namespace CVE from 2019, only 2018, so maybe it's safe enough now. I suppose "safe enough" is the operative phrase: if you really worry about the kernel's attack surface, you'll use a separation kernel, VMs, or separate machines altogether.


The issue is not that user namespaces cannot be used for secure isolation -- the problem is that they have been used for privilege escalation in the past. They are definitely more secure than they were 5+ years ago, and there are ways of restricting their use on running systems through a couple of sysctls (in addition to the out-of-tree patch that Debian and Arch use).

But, in the case of running things in containers, you can stop exploits of user namespaces through seccomp filters that block unshare(CLONE_NEWUSER) -- Docker does this by default.
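As a sketch, a custom profile in the same spirit might deny the namespace-entry syscalls wholesale (Docker's real default profile is far more complete, and filters `clone`/`unshare` by the `CLONE_NEWUSER` flag argument rather than blocking them outright):

```shell
# Hypothetical minimal profile: allow everything by default, but make
# unshare/setns fail with EPERM inside the container.
cat > no-userns.json <<'EOF'
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    { "names": ["unshare", "setns"], "action": "SCMP_ACT_ERRNO" }
  ]
}
EOF

# Usage (inside such a container, unshare(2) returns EPERM):
# docker run --security-opt seccomp=no-userns.json --rm alpine unshare --user true
```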


I wish it were possible to run Docker as a regular user, or to run a separate Docker-in-Docker in CI. (I assume the Docker CI runners on things like GitLab are running as root or sharing the host daemon via `-v /var/run/docker.sock:/var/run/docker.sock`, since Docker-in-Docker is only recommended for actually developing Docker.)


Docker is a front end to underlying technologies that do the actual work.

Red Hat created a Kubernetes-compatible set of tools for running Docker-compatible OCI containers called CRI-O (https://cri-o.io/). With RHEL/CentOS 7.7 and 8+ you can run containers as a regular user using their tools: https://www.redhat.com/en/blog/preview-running-containers-wi...


And Fedora!


Good news, there is a lot of work on that front, including an official "rootless" distribution: https://get.docker.com/rootless

The main caveats are that cgroups are disabled and it requires userspace networking.

Here's a write-up on it: https://engineering.docker.com/2019/02/experimenting-with-ro...
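Installation looks roughly like this (a sketch based on the script above; exact paths and variables may differ between versions):

```shell
# Install the rootless distribution into the current user's home directory:
curl -fsSL https://get.docker.com/rootless | sh

# Point the client at the per-user daemon socket:
export PATH=$HOME/bin:$PATH
export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock

# The daemon and its containers now run entirely under your uid;
# the container's uid_map shows the remapping:
docker run --rm alpine cat /proc/self/uid_map
```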


You may also need to use a FUSE-based implementation of overlayfs.


It is possible to run Docker-in-Docker in CI. At a previous job I built containers that ran Docker as Bamboo build agents. The containers did not use the host's docker socket; instead each had its own socket and its own `/var/lib/docker` directory. However, the containers have to run Docker as root (I started Docker and then dropped privileges to run the Bamboo agent) and have to run with the `--privileged` option. The advantage of doing it that way was that the image storage was cleaned up with the containers and kept separate from the host's. The disadvantage was that you have to use loopback-based storage, which makes Docker a little slower. I don't think there's a huge difference in security, since Docker would end up being accessible via the socket anyway, and by dropping privileges for the build agent you're losing the capabilities that you get from `--privileged`.
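A sketch of the entrypoint described above (all names and paths are illustrative, not the actual setup):

```shell
#!/bin/sh
# Runs as root inside a --privileged container: start a private dockerd
# with its own storage, then drop privileges to run the build agent.
dockerd --data-root /var/lib/docker &

# Wait for the daemon's socket to appear before starting the agent.
while [ ! -S /var/run/docker.sock ]; do sleep 1; done

# Drop to an unprivileged user for the agent itself ("bamboo" and the
# agent path are hypothetical).
exec su -s /bin/sh bamboo -c '/opt/bamboo-agent/bin/start.sh'
```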


The issue is that if you want to communicate with the outside world you need to create a network bridge, which only a sufficiently privileged user on the host system can do.

An unprivileged-user docker daemon would be limited to either communicate with an isolated network namespace on the parent side or do userspace forwarding of network traffic. Or it would require a privileged helper for the network parts.
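You can see the privilege requirement directly (`br0` is an arbitrary name):

```shell
# Bridge creation needs CAP_NET_ADMIN in the host's network namespace:
ip link add br0 type bridge        # as an unprivileged user: EPERM
sudo ip link add br0 type bridge   # succeeds
sudo ip link del br0               # clean up
```

This is why rootless setups fall back to userspace networking helpers such as slirp4netns or VPNKit.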



