Launch HN: Nestybox (YC S20) – Containers beyond microservices
168 points by ctalledo on Aug 7, 2020 | 111 comments
Hi HN,

This is Cesar Talledo and Rodny Molina, co-founders of Nestybox (www.nestybox.com).

Nestybox has developed a new container runtime that sits under Docker/containerd (it's a new type of runc) and enables containers to act as virtual-servers capable of running software such as systemd, Docker, and Kubernetes, easily and with proper isolation.

The motivation came from noticing that containers are great at running microservices but struggle to run system-level software such as the software mentioned above. That is, to run such software in a container, we needed insecure privileged containers with complex images, custom entrypoints, volume mounts, etc., or alternatively a heavier virtual machine. This did not seem right.

We studied the problem and noticed that the container abstraction was not complete enough: inside the container, a root process lacked the capabilities to perform certain low-level operations, the namespacing of procfs and sysfs had a few holes, there were limitations on running overlayfs-on-overlayfs, and more.

To solve this, we decided to create a new container runtime that would set up the container in such a way that it could run system software easily and without resorting to privileged containers. That is, a user should be able to do "docker run -it some-image" and get a container inside of which she can run systemd, dockerd, or even K8s without problems (much as if it were a virtual machine).

After lots of long days, we came up with Sysbox. It's a new type of "runc" and sits below OCI-based container managers (e.g., Docker/containerd). You typically don't interact with Sysbox directly, but rather use Docker (or similar) to launch the containers. Sysbox was forked from the excellent OCI runc in early 2019 and has undergone significant changes since then. It uses OS virtualization techniques such as always enabling the Linux user namespace, uid shifting via shiftfs, partial virtualization of procfs and sysfs, selective syscall trapping in user-space, setting up special mounts into the container, and more. It's written in Go.
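A note on the "uid shifting" mentioned above: with the user namespace enabled, root inside a container is backed by an unprivileged uid on the host. The minimal Python sketch below (not Sysbox code) shows that translation, assuming the `/proc/<pid>/uid_map` line format from user_namespaces(7) (`<first uid inside> <first uid outside> <count>`) and a hypothetical host range starting at 165536.

```python
from __future__ import annotations

from typing import Optional


def parse_uid_map(text: str) -> list[tuple[int, int, int]]:
    """Parse uid_map contents into (inside, outside, length) tuples."""
    entries = []
    for line in text.strip().splitlines():
        inside, outside, length = (int(field) for field in line.split())
        entries.append((inside, outside, length))
    return entries


def to_host_uid(container_uid: int, entries: list[tuple[int, int, int]]) -> Optional[int]:
    """Return the host uid backing container_uid, or None if it is unmapped."""
    for inside, outside, length in entries:
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    return None


# Hypothetical mapping: container uids 0..65535 -> host uids 165536..231071.
uid_map = parse_uid_map("0 165536 65536")
print(to_host_uid(0, uid_map))      # 165536 (container root is not host root)
print(to_host_uid(1000, uid_map))   # 166536
print(to_host_uid(70000, uid_map))  # None (outside the mapped range)
```

Because container root maps to host uid 165536 here, a process that escapes the container holds no special privileges on the host.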

Here is a video: https://asciinema.org/a/kkTmOxl8DhEZiM2fLZNFlYzbo?speed=1.75

Today we are happy to announce that we are open-sourcing Sysbox (Apache 2.0). You can find it at https://github.com/nestybox/sysbox . We welcome users and contributors, as it has plenty of room to grow and improve. There are plenty of docs in the repo describing how to use it and how it works.

We think Sysbox is a very useful tool for expanding the use cases of containers, and it provides an alternative to virtual machines in many scenarios, particularly for dev environments, testing, CI/CD, and even running legacy apps in containers.

In order to pay the bills, Nestybox (the company we founded) will sell a version of Sysbox called Sysbox Enterprise Edition (Sysbox-EE). We are using an open-core model, such that Sysbox-EE is based on the open-source Sysbox and adds a layer of proprietary enterprise level features. We think this model will help us strike a healthy balance between creating useful technology that all can benefit from and keeping the lights on.

Thanks for reading and we welcome your feedback.

Best, -Cesar & Rodny




Ex-Docker person here. I got an early peek at Sysbox and I'm really excited by it -- it's really neat.

Docker is missing a bunch of features that make some software work, which is why you can't run Docker inside Docker by default. Instead of dropping from containers all the way down to hardware virtualization, Sysbox is "augmenting" containers with the missing features by simulating them in userland. That gives you all the power of a VM, without any of the downsides of slow start-up speed, provisioning blocks of memory, not being able to run them on EC2, etc.

It reminds me a bit of user-mode Linux [0], weirdly. There's something kinda interesting about simulating a bunch of the kernel in userland.

[0] https://en.wikipedia.org/wiki/User-mode_Linux


Thanks! Yes, Sysbox is using OS-virtualization techniques to augment the abstraction of the container, thereby enabling software that interacts deeply with the kernel (e.g., dockerd, k8s, etc.) to run inside the container, and to do so with proper isolation (no privileged containers). Now that you mention User-mode Linux, it is one of the references we used as we built Sysbox, though they are very different things of course. I think my background as a VMware ESX kernel developer had a strong influence too ...


So can you run sysbox in sysbox?


Unfortunately not. Sysbox requires "true root" privileges, so it can't run inside a system container deployed by Sysbox itself (since that system container would use the user namespace). You can run Sysbox inside a privileged container however, and in fact the Sysbox test framework relies on this heavily.

What use case do you envision for running sysbox in sysbox?


Not OP, but one of my first questions about any layer is “how transparent is it”. If it can’t host itself, it’s clearly not 100% transparent.

This matters because it adds cognitive overhead - I have to keep track of which features are available at which layer.


Agreed; it's certainly something we will keep in mind as we mature Sysbox.


It's mostly curiosity. Probably docker in docker in docker.


Got it; note that inside a system container you can always run docker-in-docker using privileged containers (https://hub.docker.com/_/docker). That is, you don't need sysbox nesting in order to run more levels of docker nesting inside the system container. And those privileged containers would only be privileged within the system container, but not at host level.


Oh, that's cool. TIL. Thanks for taking the time to respond to something pretty far off in the weeds.


i'm just curious. For what kind of use cases do you need docker inside of another docker?


As @wh33zle mentioned, CI/CD is an obvious use-case. Development environments are another one (pls see the other question I just answered on this topic).

But I also see 'production' scenarios. Think about having a reverse-proxy system container, inside of which you host the backend applications, which happen to be docker containers.

In essence, Sysbox allows you to create hierarchies of containers in a secured fashion, without the need to abuse 'sidecar' pattern, nor priv containers.


I use dind to build containers in gitlab ci.


Try using Kaniko with GitLab CI, it will save you from having to run privileged / DIND containers.


I've been experimenting with buildah, but that does look like another option I should try, thanks.


For example, my CI provider might want to execute my build in a docker container for security reasons but I also want to run/build a docker container as part of my CI.


I think you'll also need it if you want to use docker-compose in your CI.


Have you considered looking at data science use cases for this (particularly with Python)? I've spent a lot of time tweaking containers meant for data science and one issue I run into a lot is that since you actually spend a lot of time developing inside the containers you want access to a lot of developer tools, but you end up with enormous images as a result. I could see using a tool like this to have one main developer container for working from that has all of your tooling and then using more streamlined containers for individual projects that have only the dependencies you need.


Absolutely, that's one of Sysbox's main use-cases; we usually refer to it as 'docker sandboxes'. As you mentioned, the idea is to have your entire dev environment within your fully-customized container, which allows you to take that environment wherever you go; you are not tied to a particular hypervisor or cloud service-provider. We have heard of people already using Sysbox in the Jupyter ecosystem.


That's awesome. Are there public examples of what you'd consider a good setup for that use case that you could point me to? Or is one of the use cases in the Sysbox-EE User Guide a good example (I don't see a 'docker sandbox' one but I may have missed it). Thanks!


Please take a look at these examples and let us know if they help:

https://github.com/nestybox/sysbox/blob/master/docs/quicksta...


Take a look at the Nestybox blog site: https://blog.nestybox.com/

It has an article on Docker sandboxing. Hope that helps!


How is NestyBox different from LXD (and LXC), as they too can:

- Run a distro including system software, systemd etc. as a container

- Run unprivileged, using uid-mapping to provide root and other system uids inside the container?

I've been using LXC and LXD for years to run system images as containers, even migrated some real machines to containers this way.


The main difference is that it's OCI-based, so works with Docker/containerd and hopefully K8s soon (we are working on the latter). Also, correct me if I am wrong, but I don't believe LXD runs K8s inside without privileged containers. Having said this, I know LXD and Sysbox use many of the same OS-virtualization techniques to do what they do. And in fact we owe much of the work we've done to the ground-work done by the good folks at Canonical/LXD.


Here is how to think about NUMA. In a modern two-socket AMD EPYC box, you have 2 CPUs. Each CPU has 350GB/s of memory bandwidth and has PCI devices connected to it. Memory bandwidth between CPU sockets is some fraction of that, 70GB/s? If you run computation on cpu0 and talk to a NIC on cpu1, you burn a lot more CPU cycles than if you move your computation to cpu1. So in theory, if you partition the box using containers such that everything on cpu0 runs in container0 and has its own NICs, and the same for cpu1, you end up with 2 'virtual' boxes that might actually perform better within your container than outside of it.

Note that on modern CPUs, each CPU is further broken down into NUMA nodes (see numactl -H and the NUMA info in lscpu). Perf degradation isn't as great going between chiplets, but it's measurable (2x reduction in RAM bandwidth?).
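The partitioning described above can be approximated with Docker's standard `--cpuset-cpus` and `--cpuset-mems` flags, which pin a container's CPUs and memory allocations to one NUMA node. The sketch below only builds the command lines; the two-node topology is hypothetical (read the real one from `numactl -H` or `lscpu`), and it does not cover NIC assignment.

```python
from __future__ import annotations


def compact_cpuset(cpus: list[int]) -> str:
    """Render CPU ids in cpuset list format, e.g. [0, 1, 2, 5] -> "0-2,5"."""
    cpus = sorted(set(cpus))
    if not cpus:
        return ""
    parts = []
    start = prev = cpus[0]
    for cpu in cpus[1:]:
        if cpu == prev + 1:  # still inside a contiguous run
            prev = cpu
            continue
        parts.append(f"{start}-{prev}" if start != prev else str(start))
        start = prev = cpu
    parts.append(f"{start}-{prev}" if start != prev else str(start))
    return ",".join(parts)


def numa_pinned_run(node: int, cpus: list[int], image: str) -> list[str]:
    """Build a docker run argv pinning a container's CPUs and memory to one NUMA node."""
    return [
        "docker", "run", "-d",
        f"--cpuset-cpus={compact_cpuset(cpus)}",
        f"--cpuset-mems={node}",
        image,
    ]


# Hypothetical 2-node topology with SMT siblings numbered 64 and up.
topology = {
    0: list(range(0, 32)) + list(range(64, 96)),
    1: list(range(32, 64)) + list(range(96, 128)),
}
for node, cpus in topology.items():
    print(" ".join(numa_pinned_run(node, cpus, "some-image")))
# docker run -d --cpuset-cpus=0-31,64-95 --cpuset-mems=0 some-image
# docker run -d --cpuset-cpus=32-63,96-127 --cpuset-mems=1 some-image
```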


Got it, thanks for the explanation. I clearly see the use-case, just need to review cgroup specs (specifically cpuset) to fully understand if what you mention is already supported (which i believe it is).


This is very cool.

I'm trying to get a better sense of how this approach differs technically from rootless Docker / usernetes. I understand that it's not there yet, for many reasons, and I see your FAQ about it, but it's clearly working towards the same goal, right?

I think what's going on is that you depend on shiftfs from Ubuntu, and SECCOMP_RET_USER_NOTIF (or something?), unprivileged user namespaces, cgroup namespaces, etc. from the upstream kernel, but the major missing parts in the upstream kernel are procfs and sysfs virtualization and making shiftfs feel seamless, and so you've written a syscall trapper and a FUSE filesystem that run on the host and emulate the things you need. Is that approximately right?

If so, I'd be really curious whether you see a path to get onto upstream runc at some future point. It seems like you'd need shiftfs to be upstreamed, but if an unprivileged procfs2 + sysfs2 shows up upstream, I think you can use that? And you'd probably fit in at approximately the place something like vpnkit fits in for managing shiftfs?

I have a use case for this sort of thing at work, and we've been exploring rootless Docker and unprivileged containers a bit. I'm trying to get a sense of why to prefer Sysbox EE instead of waiting for (or, ideally, contributing to) upstream support for namespaced procfs/sysfs, for shiftfs, and for properly teaching Kubernetes about user namespaces. I suppose the answer is that your solution works right now, and upstream support might take years?

I guess that puts you in a position much like OpenVZ and even LXC itself, which both had significant out-of-tree code in years past and seemed to be decently successful businesses as stuff slowly got upstreamed.

It seems like the major benefits of Sysbox EE are paid support and not using the same uid_map for each container?


Thanks @geofft, you made a lot of great points.

I don't think the rootless approach is fully aligned with what we're doing right now. True, we both rely on user-namespaces, and we both emphasize the security angle, but our goal is to expand the number of applications/functionality that can run in containers, which is something the rootless approach may struggle with for some time.

Please see here if you haven't done it yet: https://github.com/nestybox/sysbox#sysbox-is-not-rootless-do...

Regarding our dependencies, we can operate with or without shiftfs. In both cases user-namespaces are always utilized. The rest of your approximation is correct: we need most of what you mentioned in your second paragraph, which btw is already there (thanks to the Canonical/LXD folks) starting with kernel 5.0+ on Ubuntu and 5.5+ on other distros. As you know, shiftfs is only present in Ubuntu at the moment, but as I said, we can live without it.

Which leads me to your question: why would you wait if the functionality you're after is already there? If having dockerd running as an unprivileged user is not a real must-have for you, then Sysbox provides a fairly secure solution while giving you all the functionality.

Sorry, I'm not familiar with vpnkit yet; will take a look.

Correct, those are some of the benefits Sysbox-EE offers at the moment. That, plus efficiency & scalability features and hardened testing.

Thanks a lot for your detailed feedback @geofft. Please ping us on slack anytime.


I'm mostly meaning vpnkit in the sense that it's a thing that plugs into rootless Docker to provide networking. It seems like you could also be a plugin to upstream rootless Docker to provide sysbox-fs and your shiftfs management, at least in the long term.

Will try to remember to join the Slack next week, this is definitely a cool project :)


Actually - I thought unprivileged procfs and sysfs already are there, modulo the whole mount_too_revealing thing? https://github.com/opencontainers/runc/issues/1658


Awesome, this will be very useful.

It would be good to be able to assign physical network interfaces into the containers (using network namespaces) and also to document how to create virtual network devices shared between containers.

Furthermore it would be awesome to make all this numa-aware. So you could have network interfaces and cpus and memory be assigned to a single container where everything is numa local. Then you could break up a single physical box into high performance independent domains.


Thanks! Both of your suggestions sound very interesting. I personally like the idea of creating large network topologies with very few outer containers: the real mesh would be at the L2/L3 levels. You would launch your large topo with just a 'docker run'.

The numa-aware idea would take some more research on our part, but at first glance looks like something that we could definitely explore.


What’s the benefit of this over Firecracker & Ignite (https://github.com/weaveworks/ignite)?


I've not used either, but conceptually the main difference is that those approaches use micro-VMs and thus require hardware virtualization (hypervisors). This can be a challenge if you want to run those on cloud VMs, as it would require nested virtualization. Sysbox on the other hand is a pure OS-virtualization container runtime, so it does not require hardware virtualization.

Also, I think the goal is different: I understand Firecracker is meant as a way of strengthening the isolation of containers by wrapping them in micro-VMs. Sysbox is meant as a way of enabling containers to run system workloads without complex images, entrypoints, volume mounts, etc., and with proper isolation via the Linux user-namespace.


How does Sysbox compare to Podman?

(If I remember correctly you can run systemd in a podman container and/or run a podman container with systemd.)


We haven't had enough cycles to look at Podman in detail (yet), but my understanding is that Podman and Docker serve similar purposes: they are high-level runtimes. (I'm glossing over important nuances though, and I'm not a Podman expert.)

Sysbox, on the other hand, acts as a low-level runtime (same as runc), so we could potentially integrate with Podman too. In fact, we could _potentially_ integrate with anything that speaks OCI spec.

Having said that, we are not there yet, as for example sysbox wouldn't work with Podman in rootless mode right now; it should work in regular mode though, but we haven't tried it yet. If we accomplish this, it would allow podman to launch a larger set of applications too, same as we are doing for Docker.


Podman can use runc or their own runtime crun https://github.com/containers/crun so it should be able to work without much drama :tm:


Interesting. Thanks @jdoss!


Yes, the situation even improved with the latest releases:

  podman run -ti --security-opt label=disable --security-opt seccomp=unconfined --cap-add SYS_ADMIN --env STORAGE_DRIVER=vfs quay.io/podman/stable sh -c "dnf update -y; podman run hello-world"


Thanks; one thing I may have omitted mentioning is that Sysbox works with the fast overlayfs storage driver, meaning that when you do use it for Docker-in-Docker for example, both the outer Docker and the inner Docker are using overlayfs (as opposed to the slower vfs driver).


I think this might have some use cases outside of running microservices, services, Docker / container images, etc.

For example, let's say that I'd like to compile some complex piece of Linux software, software for which I don't have all of the third-party software/library dependencies, and I don't want to download/install all of those packages on my desktop Linux computer, because they're only going to spam it up...

Well, it sounds like with a Nestybox -- I could install all of those 3rd party libraries/packages, compile the code (inside of the Nestybox), and then not have to worry about my main Linux desktop being clogged/spammed up by unwanted third party libraries/binaries/dependencies... is that true?

Will Nestybox work for the above scenario?


Right, that will work.

I fully agree that Sysbox use-cases extend beyond docker-in-docker and k8s-in-docker. These, docker and k8s, were just the first two system 'apps' that we decided to support, but Sysbox can grow to support many others.

To be fair, we are not expecting Sysbox to be able to run _every_ application; for many apps you'll still need hw virtualization. Our goal is to focus on the (potentially large) subset of apps that could technically run inside a container, but are not capable of doing so due to current runtime limitations (mainly because they were not designed for that purpose).

Thanks.


Sounds absolutely awesome! Congrats on your excellent product / company!

I hope you find much success!


Thanks a lot Peter!


can you run kubernetes in nestybox ? entirely within the container ?

if you can do this (and build a great experience around it), you have a winner.

k3d.io does it somewhat, but not all the way: updates, rollbacks, etc., everything that a sysadmin needs.


Yes, it's possible already to run K8s entirely inside a system container deployed with Docker + Sysbox. It's as easy as "docker run --runtime=sysbox-runc -it some-image" and running kubeadm inside to setup K8s. We also have images that come preloaded with K8s to make it easier.

Here is a demo video: https://asciinema.org/a/V1UFSxz6JHb3rdHpGrnjefFIt?speed=1.75

Having said this, K8s is very complex and while its most common functionality works inside the container, we've not yet tested it all.

Here is a doc that describes this in more detail, including what is supported and not supported at this time: https://github.com/nestybox/sysbox/blob/master/docs/user-gui...


We also created a simple tool called kindbox that runs K8s inside system containers deployed with Docker + Sysbox. It's a simple bash script around "docker run --runtime=sysbox-runc" commands. It does some of the same things that the K8s.io KinD tool does, but using simpler Docker images and without using privileged containers. It's meant as a reference example to show developers how to deploy K8s clusters using Docker + Sysbox.

You can find kindbox here if you are curious: https://github.com/nestybox/kindbox


Yes.

Having said that, we have certain limitations at the moment (e.g., we don't run all CNIs), but we are not relying on priv containers, as I believe is the case for existing K8s-in-docker solutions (pls correct me if I'm wrong in k3d's case).

Btw, we haven't tested k3d.io & sysbox interaction yet (part of our roadmap), but i believe we should be able to interoperate after making some minor adjustments. That would allow existing k3d users to operate in a more secure environment.


> can you run kubernetes in nestybox ? entirely within the container ?

> if you can do this (and build a great experience around it), you have a winner.

Genuine question. I'm not that familiar with Kubernetes. Can you explain the pain this solves? What is this winning at?


There is plenty of info on Kubernetes (K8s) on the web, so I would start there. As far as running K8s inside Docker containers though, the use case would be one in which you want to run multiple isolated K8s clusters on a single host. One way is to use VMs, but recently people are resorting to using containers for this purpose due to their ease & efficiency. It's in the latter that Sysbox really helps, because it's capable of creating a container that runs K8s easily and with proper isolation. Typical use cases are testing, CI/CD, learning. But I would not discount this moving into production use cases in the future as the technology matures.


This seems like a feature that, once the need is demonstrated, docker will add to its own product eliminating the niche you hope to fill.

Is it wise to go through all the effort and risk of starting a business to prove demand for a feature that an existing established product will then add, removing the need for your company?

(Note this question is not a general critique of all startup ideas, it's specific to startups launching to address missing features of established products)


Startups that address gaps in established products are a great acquisition target for those companies. The companies would not have invested the time, effort and resources that a niche startup would have. I believe such startups have a better chance of a good exit.


True, but in that scenario such a startup only really has one exit strategy and it's hard to get much above acqui-hire pricing because the potential acquirer is well aware that they are making a build-vs-buy decision now that the market need and fit has been proven, and they as the potentially acquiring company control the cards in doing so.


That's always a possibility, and the future will tell, but Docker appears to be more focused on improving application development rather than enabling containers to run system software as we are doing. It's a risk we were willing to take.


Isn't this true for most non-giant-funded, non-hard-tech startups out there that are anywhere near a funded competitor?


This is very interesting! I’m not well versed in runc and friends but I was just exploring runsc[0] and gvisor.

Is there any overlap here even though your project seems to go the path of adding more functionality? Is it fair to think of NestyBox as a type of sandbox?

https://pkg.go.dev/github.com/google/gvisor/runsc?tab=overvi...


I understand gvisor's main goal is to improve container isolation by intercepting and inspecting syscalls before they reach the kernel to reduce the attack surface. Sysbox on the other hand is meant as a way to run system software (in addition to apps/microservices) easily inside a Docker container, so its focus is on enabling this functionality. Having said this, Sysbox always enables the Linux user-namespace in containers, and thus also improves container isolation.


Any advantage of using your solution over Kata Containers?

https://katacontainers.io/


It's hard to compare them because the goals are different. Kata containers seeks to harden container security by wrapping it with a highly optimized VM. Sysbox seeks to enable containers to run system-level workloads (systemd, dockerd, k8s) without requiring complex images, privileged containers, special mounts, etc.

Also, Sysbox is a pure OS-virtualization technology, which means it runs in environments where hardware virtualization is not available (e.g., a cloud VM, since most cloud providers don't allow nested virtualization).


Congrats on the launch - looks very interesting!

Are there any performance implications using this custom runtime?

Have you used this in production systems?

Are there any known limitations of using sysbox?


Thanks! Performance wise, we've not noticed any reduction in performance compared to a regular container, mainly because Sysbox sits on some control-path operations (e.g., accesses to /proc/sys, mount syscall, etc) but is really not intercepting anything on the datapath. For example, deploying K8s inside a system container takes < 40 seconds on my laptop, same as it takes with K8s.io KinD which uses the OCI runc (with privileged containers). Having said this, we've not done a thorough perf analysis yet.

As far as using this in production, the software is well tested but has not been used in production to the best of our knowledge. It still has room to mature, both in terms of functionality and security, but it's in pretty good shape already. We hope open-sourcing it helps it mature faster.

And as far as limitations, there are a few, here is a list: https://github.com/nestybox/sysbox/blob/master/docs/user-gui... . We hope to remove some of these as the product matures too.


Thanks for the response - looking forward to trying it out this weekend!


Great! ping us on our slack channel (the link is in the sysbox README file) in case you need help.


Thanks!

We are out of the critical path, meaning we only emulate interactions with procfs / sysfs, and we only intercept mount syscalls at the moment, so we don't see any tangible performance hit. Having said that, we haven't done large-scale & perf tests yet.


Ah okay cool, makes sense.

Thanks for the response!


This looks interesting and promising. There are definitely use-cases for running docker in docker. We run an open source project that relies heavily on Docker; let's see if we can collaborate to make things work end-to-end. I believe that would be beneficial for both communities.


Thank you. Please reach out to us through email/slack. Would love to hear more.


This is interesting, however the open core business model is absolutely terrible. Why can't you just provide Enterprise support with an EE-friendly license, but keep the EE features open source? That will also remove the conflict of interest in your business model.


We felt just providing enterprise support would not be sufficient to create a healthy business, given that Sysbox is designed to work under the covers (i.e., under docker/containerd) and does not require a lot of support. We opted for the open-core model as we felt it creates a good balance between contributing to the container ecosystem while still allowing us to sell some enterprise-level functionality (rather than just support).

Regarding the conflict of interest, we decided to handle it as follows: features that mainly benefit practitioners would go in the open-core, while features that mainly benefit enterprise deployments would not. Of course, there is still ambiguity there, but that will need to be resolved on a feature-by-feature basis based on feedback from practitioners and enterprises.

This is a learning process for us, but we understand this model is being used successfully by other IT infrastructure companies such as HashiCorp, so we opted for it.


Any time I hear about running docker inside docker (or indeed a VM host inside a VM) I'm reminded of a lesson from the great duo B&BH : https://youtu.be/PyrRVNyjlqU


That's funny ... brings back memories of the 90s :)

One thing I've noticed is that in modern IT infrastructure, there are usually two levels of sandboxing going on. At the low level you have VMs (sandboxed OS), and on top of it you have containers (sandboxed applications). Sysbox makes it easy to replace that lower level with containers (which naturally leads to docker-in-docker, or more accurately containers-in-containers).

To be clear I am not saying that containers are equivalent to VMs or that containers should always replace VMs. They are different beasts with different properties. But I am saying that in many scenarios it does make sense to use containers instead of VMs, particularly if your stack is all Linux, you don't need the isolation strength provided by VMs, and want the higher efficiency of the container.


How many levels of nesting are supported? From reading the Sysbox readme, I am led to believe that you can create a system container, and run docker containers in that container, but then you're done. Is this true?


IIRC, Linux supports up to 32 levels of namespace nesting, so that's an upper bound. This means that within a system container deployed by Sysbox, you can in theory nest inner containers up to 31 levels (since one of the 32 levels is used by the system container). In fact you can do docker-in-docker using privileged containers inside the system container. Having said this, while I've tried docker-in-docker inside the system container and it works fine, I've not gone to deeper nesting levels yet. And this is complex stuff, so I won't say that it definitely works until we try it.


Congrats on launching, can you give more info on the Sysbox EE pricing? I couldn't find anything on the site on pricing besides the contact form which is a pet peeve of mine.


Thanks! Regarding Sysbox EE pricing, it's something that we honestly are still trying to figure out. The reason we ask enterprises to contact us is to understand their use case and needs, so that we can derive a fair price based on this. It's early days for Nestybox, and pricing is a work in progress at this time.


This is a great contribution to the container/K8s ecosystem.


Thanks, that's really encouraging. While it has taken a lot of hard-work to develop it, Sysbox would not exist without the excellent work done by OCI runc developers (Sysbox was forked from runc) as well as the LXD developers (who have done a bunch of the kernel work to enable the advanced OS-virtualization techniques we incorporated in Sysbox).


congrats on the launch! Sysbox seems to be a solution that combines the best of both worlds (vm and containers) - excited it will be open source too :)


Thanks for the kind words!


I hadn’t run across shiftfs before - does this mean the podman style of subuid/subgid allocations no longer need to be manually managed?


Correct; Sysbox always enables the user-namespace in containers and manages the subuid/subgid allocation. In the open-source version, it assigns all containers the same subuid/subgid range, which is not ideal for cross-container isolation. In the enterprise-version (Sysbox-EE), it assigns an exclusive subuid/subgid range to each container automatically.
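As a rough illustration of why exclusive ranges matter: with a shared range, uid 0 in every container lands on the same host uid, whereas exclusive ranges keep containers apart. The allocator below is a hypothetical sketch; the 65536-uid block size and the host range starting at 100000 are assumptions, not Sysbox-EE's actual implementation.

```python
from __future__ import annotations

# Hypothetical values: host subuid range and per-container block size.
SUBUID_START = 100000
BLOCK = 65536
SUBUID_SIZE = BLOCK * 4


class SubuidAllocator:
    """Hand out a non-overlapping block of host uids to each container."""

    def __init__(self, start: int, size: int, block: int) -> None:
        self.block = block
        self.free = [start + i * block for i in range(size // block)]
        self.owner: dict[str, int] = {}

    def allocate(self, container_id: str) -> tuple[int, int]:
        """Return (first host uid, count) exclusively owned by container_id."""
        if not self.free:
            raise RuntimeError("subuid range exhausted")
        base = self.free.pop(0)
        self.owner[container_id] = base
        return base, self.block

    def release(self, container_id: str) -> None:
        """Return a container's block to the pool when it is removed."""
        self.free.append(self.owner.pop(container_id))
        self.free.sort()


def shared_range(start: int, block: int) -> tuple[int, int]:
    """The shared scheme: every container gets the same (start, count)."""
    return start, block


alloc = SubuidAllocator(SUBUID_START, SUBUID_SIZE, BLOCK)
print(alloc.allocate("web"))              # (100000, 65536)
print(alloc.allocate("db"))               # (165536, 65536)
print(shared_range(SUBUID_START, BLOCK))  # (100000, 65536) for every container
```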


The Docker installation that RHEL uses carries small patches that make it easy to build containers that run systemd as init, without the user having to do anything special (or run with privileges); its runtime sets everything up correctly for you. I've built and run them, so I know they work.

RedHat wanted docker to take them, but Docker refused. <shrug>


Didn't know that. But it makes sense given that podman already supports it. Btw, i did a quick search but couldn't find anything on this (docker's systemd support in rhel). If you happen to know where to find these patches, please send them my way. Thanks.


https://www.projectatomic.io/docs/docker_patches/

specifically, the hooks patch.

then see https://developers.redhat.com/blog/2016/09/13/running-system... (i.e. 4 years ago when I built containers to do this)

though of course as you note they now say to use podman https://developers.redhat.com/blog/2019/04/24/how-to-run-sys...


Is there any chance it will work with nvidia-docker? I.e., support accessing the host's GPU inside the nested containers?


This is something we've not tried yet, so I don't know. I would be surprised if it worked right now. But in general, it's something we would definitely be interested in exploring.

The containers created by Sysbox act like virtual-hosts, so it makes sense to have the ability to expose GPUs / hw-accelerators within them. But container nesting, which comes naturally with Sysbox, would introduce another challenge since the GPUs would not just need to be passed to the outer system container, but also to the inner application containers.


Any reason why sysbox is only supported on Ubuntu and not say RHEL / CentOS / Fedora?


Ubuntu carries a few things that Sysbox relies on: a couple that come to mind are the shiftfs module (which Sysbox uses to enable the user-namespace in containers without requiring Docker to be set in userns-remap mode) and a kernel patch that allows overlayfs mounts from within a user-namespace (since the Docker running inside the container uses overlayfs mounts for its inner images). Having said this, we are looking at ways of overcoming these requirements to extend Sysbox to more distros; it's one of the most asked features.


We are actively working on this one as @ctalledo mentioned. Please ping us offline if you want more details.


Doesn't systemd use PID 1? Meaning you can't run it multiple times; are you utilizing the host's systemd then? In that case: why would I want to dockerize systemd services in the first place?


Right, systemd uses pid 1, but it does so within the pid-namespace of the container, so each container has its own systemd. Hope it makes sense. Thanks!
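One way to observe this on Linux 4.1 and later is the NSpid field in /proc/<pid>/status, which lists a process's PID in each nested PID namespace, outermost (host) first. A containerized systemd would show something like a large host PID followed by 1. The Go sketch below parses that line; the function name is hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseNSpid extracts the PIDs on an "NSpid:" line of /proc/<pid>/status.
// The first value is the PID in the outermost (host) PID namespace; the
// last is the PID in the process's own namespace, which is 1 for the
// init process (e.g. systemd) of a container.
func parseNSpid(line string) ([]int, error) {
	if !strings.HasPrefix(line, "NSpid:") {
		return nil, fmt.Errorf("not an NSpid line: %q", line)
	}
	var pids []int
	for _, f := range strings.Fields(strings.TrimPrefix(line, "NSpid:")) {
		n, err := strconv.Atoi(f)
		if err != nil {
			return nil, err
		}
		pids = append(pids, n)
	}
	return pids, nil
}

func main() {
	f, err := os.Open("/proc/self/status")
	if err != nil {
		fmt.Println("cannot open status file:", err)
		return
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if strings.HasPrefix(sc.Text(), "NSpid:") {
			pids, err := parseNSpid(sc.Text())
			if err != nil {
				fmt.Println("parse error:", err)
				return
			}
			fmt.Println("PID per namespace (outermost first):", pids)
			return
		}
	}
}
```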


what are some practical use cases of being able to run docker inside docker? Does that help with hardware in the loop?


We use it a bit, mainly as a side-effect of infrastructure. For instance, our CI runs each job in a container; the spec for the container is checked into source control. That way it's easy to add new software in CI: just change the Dockerfile in the project repo. It's also easy to run CI jobs locally for debugging.

Then some of our integration tests themselves use Docker, for various things. And at that point, docker in docker comes in handy.

We just run regular docker-in-docker though, which is indeed a very leaky abstraction, lots of pitfalls.


A use case that we often get asked about for Docker-in-Docker is using the outer container as a dev environment that includes a developer's tools, ssh, and a dedicated Docker (CLI + daemon). It gives sys-admins a lighter-weight alternative to VMs for launching those dev environments, and works well in scenarios where efficiency & cost reduction is important and having VM-level isolation is not required. The problem is that prior to Sysbox, those outer containers had to be privileged containers, which provide very weak isolation (e.g., it's possible to turn off the host from within the privileged container!). With Sysbox, those outer containers are now properly isolated via the Linux user-namespace, truly enabling this use-case.
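The isolation difference comes down to the container's UID map. A process outside any user namespace (including a privileged container) sees the identity map "0 0 4294967295", so root inside is root on the host; a user-namespaced container's /proc/<pid>/uid_map instead maps root to an unprivileged host UID. The Go sketch below parses the uid_map format ("inside outside length" per line) and translates a container UID to its host UID. The function names and the 165536/65536 values are hypothetical, for illustration only.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// mapping is one line of /proc/<pid>/uid_map: IDs [Inside, Inside+Length)
// in the container correspond to [Outside, Outside+Length) on the host.
type mapping struct {
	Inside, Outside, Length uint64
}

// parseUIDMap parses the text of a uid_map file.
func parseUIDMap(text string) ([]mapping, error) {
	var ms []mapping
	for _, line := range strings.Split(strings.TrimSpace(text), "\n") {
		f := strings.Fields(line)
		if len(f) != 3 {
			return nil, fmt.Errorf("malformed uid_map line: %q", line)
		}
		var vals [3]uint64
		for i, s := range f {
			v, err := strconv.ParseUint(s, 10, 32)
			if err != nil {
				return nil, err
			}
			vals[i] = v
		}
		ms = append(ms, mapping{vals[0], vals[1], vals[2]})
	}
	return ms, nil
}

// hostUID translates an in-container UID to the host UID it runs as.
func hostUID(ms []mapping, uid uint64) (uint64, bool) {
	for _, m := range ms {
		if uid >= m.Inside && uid < m.Inside+m.Length {
			return m.Outside + (uid - m.Inside), true
		}
	}
	return 0, false
}

func main() {
	// Hypothetical map for a user-namespaced container.
	ms, err := parseUIDMap("0 165536 65536\n")
	if err != nil {
		fmt.Println(err)
		return
	}
	host, ok := hostUID(ms, 0)
	fmt.Println("container root runs on the host as UID", host, ok)
}
```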


Great work, César and Rodny, best of luck with this.

David.


Thanks @lopezator.


Seems like a promising solution.


Thanks!


What's the difference from, say, LXC/LXD?


Please see our response to a similar question below. Hope that helps. Thanks.

"The main difference is that Sysbox is OCI-based, so it works with Docker/containerd and hopefully K8s soon (we are working on the latter). Also, correct me if I am wrong, but I don't believe LXD runs K8s inside without privileged containers. Having said this, I know LXD and Sysbox use many of the same OS-virtualization techniques to do what they do. And in fact we owe much of the work we've done to the ground-work done by the good folks at Canonical/LXD."


How can I use this to start a system container on e.g. AWS?


You can certainly install Sysbox on an AWS EC2 VM and launch system containers inside that VM.
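Once Sysbox is installed on the VM, system containers are launched through Docker by selecting the Sysbox runtime. The Go sketch below shells out to the docker CLI with the standard `--runtime` flag, assuming the installer registered the runtime with Docker under the name "sysbox-runc". The container name "syscont" and image "some-image" are placeholders, not real artifacts.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// systemContainerArgs builds docker CLI arguments for launching a
// container under Sysbox, assuming the runtime is registered with
// Docker as "sysbox-runc".
func systemContainerArgs(name, image string) []string {
	return []string{
		"run", "-d",
		"--runtime=sysbox-runc", // use Sysbox instead of the default runc
		"--name", name,
		image,
	}
}

func main() {
	// Placeholders: substitute an image whose entrypoint starts
	// systemd or dockerd.
	args := systemContainerArgs("syscont", "some-image")
	cmd := exec.Command("docker", args...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "docker run failed:", err)
		os.Exit(1)
	}
}
```

The same arguments could of course be typed directly at a shell; wrapping them in Go is only useful if you are scripting container creation on the EC2 nodes.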

Note that Sysbox currently requires Ubuntu Linux, because the latest Ubuntu releases use pretty new kernels and carry kernel patches that Sysbox relies on in order to perform some of the OS-virtualization in userland. See this doc for the distros supported by Sysbox:

https://github.com/nestybox/sysbox/blob/master/docs/distro-c...

We are actively working on adding support for more distros.


Ah sorry, I was unclear: I meant Amazon ECS.


Got it; the answer is no, because I believe AWS ECS (Fargate) creates the containers using the OCI runc. In order for them to offer system-containers as a service, AWS ECS would need to run Sysbox on their backend to deploy the containers.


That should be possible with EC2 nodes for your ECS cluster rather than Fargate.


Yes, as long as you can install Sysbox on the EC2 nodes, you are good to go.


This is completely off-topic, but why use a Github user like an organization?


Thanks @sanketdasgupta for reporting the issue and @asadlionpk for the explanation. Nestybox's github account is an organization now.


Sorry, not sure I got that. Can you please elaborate?


I think they mean: why is https://github.com/nestybox a user instead of an org account?


I see. Will look into that right away. Thanks!


Nothing but love for Rodny and Cesar - we love Nestybox at ilk!



