I've been using LXD for the last 6 years in production. It's a little gem hiding while all the spotlight is on Kubernetes and Docker. Indeed, the way LXD works, it naturally provides a path from physical servers and VM instances to containers and back.
The important part is that it runs containers in userspace, providing much better security than Docker, but it seems developers don't care about it.
Also, you can use traditional Ansible, Puppet or Chef to create and manage images and containers directly, instead of learning the shell-script-and-Dockerfile way with its additional cognitive load.
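As a hedged sketch of that workflow (the container name and image are placeholders), Ansible can talk to an LXD container directly through its `lxd` connection plugin, with no SSH daemon or Dockerfile involved:

```shell
# Launch a system container, then point Ansible at it via the lxd API.
lxc launch ubuntu:18.04 web1

# Ad-hoc: gather facts from the container with the lxd connection plugin.
ansible all -i 'web1,' -c lxd -m setup

# Or run an ordinary playbook against it (site.yml is an assumption).
ansible-playbook -i 'web1,' -c lxd site.yml
```

The same playbooks you use for VMs and physical hosts can then target containers unchanged.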
The only issue I am facing is mounting a shared NFS or CIFS filesystem in a userspace container using FUSE, since the client drivers for these filesystems need to run as a root process. Hopefully it will be resolved in the future. I tried LizardFS instead of NFS and the system failed in the middle; my issues on GitHub have been there for weeks without any feedback. I will try GlusterFS and am also looking at BeeGFS; let's see if it can work.
> The important part is that it runs containers in userspace, providing much better security than Docker, but it seems developers don't care about it.
I'm really not sure what you mean here by "runs containers in userspace". LXC and runc (the runtime underneath Docker) both use the same kernel primitives and generally work in fairly similar ways. LXC has done more specific work in order to support more VM-like containers (such as "console" support and the "fun" that is booting systemd inside containers), but it's fundamentally no different from runc containers. In many cases, kernel work by one of us will benefit the other (right now we're working on several kernel patches in parallel, and each of us is considering how we can integrate the other's work into our projects).
I do think LXC has better protections against certain attacks (I worked with them on some of those protections) and has a much nicer design in many respects. So you could argue it has some security upsides over Docker or runc, but that's a different topic from what primitives they use -- they're effectively indistinguishable at that level.
(I'm one of the maintainers of runc.)
The main difference is that LXC boots an init in the namespace as the first process, so you get a standard OS environment with multi-process support. Docker runs the application directly, so you get a single-process environment unless you decide to run a process manager. IMHO this creates far more problems than it solves, as anything nonstandard would: you now need workarounds for basics like logging, networking, daemons and anything that expects a standard OS environment.
Docker also uses layers to build containers, and treats container data as ephemeral. None of these three critical decisions has seen much technical scrutiny, resulting in persistent confusion in the ecosystem; with scrutiny, many users will find they add far more management overhead and complexity to containers. We have written about some of these issues here.
For instance, OP is referring to unprivileged containers and user namespaces, which let you run containers as a user process. LXC has supported these since 2013; Docker added user namespace support in recent versions. However, taking just one issue: because of the use of layers to build containers, you can't use unprivileged containers for app builds, as only root users can mount filesystems. So a lot of these decisions have all kinds of tradeoffs that should be better known and discussed.
Disclaimer: Founder at Flockport and trying to build a simpler alternative with LXC with support for app builds, provisioning, orchestration, service discovery and an app store. 
Not sure what you mean by "app builds" here. The container process is in a user namespace, not docker itself and "docker build" will work just fine. Maybe you are referring to something else? Let me know, would love to take a look.
Layers are largely an implementation detail in Docker. Yes, they are there (you can even check their SHAs), but they're not something users interact with. I don't think I've personally heard any of this confusion except with very old versions of Docker, before content-addressability, when the concept of layers did leak through (each layer was an "image")... maybe there's still some lasting confusion here? I could see this, I suppose, with things like older books/blogs/etc. referring to them as a first-class citizen. They are definitely not, and certainly should not be. It was one of the early design mistakes which have long since been corrected (Docker 1.10 introduced content-addressable images, and the concept of the image as a whole rather than an image being a tree of other images).
Also, a side note: "docker run --init" gets you the init you are looking for; you can set it as the default in the daemon as well.
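For reference, a minimal sketch of both forms (the daemon.json path is the usual default location):

```shell
# --init injects a tiny init (tini) as PID 1 to reap zombies
# and forward signals to the application process.
docker run --init --rm alpine ps

# Daemon-wide default, in /etc/docker/daemon.json:
#   { "init": true }
```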
Note, I've been a maintainer/contributor to Docker for 5 years now.
> Layers are largely an implementation detail in Docker.
You say this, but they are so deeply baked into many tools that there have been hundreds of talks about how to "reduce the size of your Docker image with this one simple trick!", which boils down to understanding how layers in Docker images work. Layers definitely leak through a lot of the tooling.
I'm hoping that my OCIv2 proposal will remove the practical need for layers entirely -- and also remove a lot of the cargo-culting which has popped up around "how to get smaller images". Maybe Docker might adopt this in the future (after you guys get OCIv1 support :P).
> Also, a side note: "docker run --init" gets you the init you are looking for; you can set it as the default in the daemon as well.
This is different from LXC. With LXC, the "init" is actually systemd, not just a zombie reaper -- and when you do an "lxc attach" you are actually put into a new console (getty and all).
Yes, you can run systemd inside Docker (after some headaches), but LXC definitely handles this much better because that's the use case it was designed around. It's not really a defect in Docker; it's just not what Docker was designed for.
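To illustrate the "headaches" part, one common recipe looks roughly like this (the image name is an assumption; systemd wants the cgroup hierarchy visible and, in most setups, extra privileges):

```shell
# Run systemd as PID 1 inside a Docker container.
# "my-systemd-image" is a placeholder for an image whose entrypoint
# environment has systemd installed.
docker run -d --name sysd \
  --privileged \
  -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
  my-systemd-image /sbin/init

# Then interact with the booted system.
docker exec sysd systemctl list-units --no-pager
```

With LXC this is the default behaviour rather than a workaround.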
> I'm hoping that my OCIv2 proposal
I've seen (and largely agree with) your post (somewhere? Can't remember where); are you working on a formal proposal?
> "init" is actually systemd
I think this opens up additional security issues (requires SYS_ADMIN as I recall), though?
I'm currently working on a blog post to describe and justify the design, as well as a PoC. The formal proposal for OCI will come after the PoC (and after sufficient benchmarking and discussion within the community), but I plan to write a spec document alongside the PoC.
Here is an example. I build an Nginx container with no layers from a base Alpine image. Users can download and run it in a layer, so the original Nginx container remains untouched, or run a copy of it. If there is a security alert or update, I rerun the build. No layers used. You build the same Nginx container with multiple layers, and have all the overhead of tracking and managing those layers. Users then download the Nginx container made up of all these layers and run it. If there is a security update in any lower layer, it needs a rebuild. You have built a workflow and designed an entire system based on stacking layers to compose and build your image. That is a lot of overhead and complexity for basic image management. What is the benefit?
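The layer-free flow described above maps onto plain LXD commands roughly like this (the image alias and Alpine version are illustrative):

```shell
# Build an Nginx image from a base Alpine image -- no layer stack involved.
lxc launch images:alpine/3.8 nginx-build
lxc exec nginx-build -- apk add --no-cache nginx
lxc stop nginx-build
lxc publish nginx-build --alias my-nginx   # one flat image

# On a security update: rerun the same steps and republish under the alias.
```

Users launch from `my-nginx` and their running copies never touch the original image.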
Caching for `docker build` really doesn't have anything to do with layers except as a means of storage; it doesn't need to be layers at all.
To be honest, layers aren't even all that efficient (except in perfectly ideal scenarios), just better than no sharing at all, and no one has taken the time to come up with something better yet... because frankly there are many other things that need improvement before bothering with it.
Again, just figured I'd mention again... since you are trying to explain Docker to me... I'm a Docker maintainer and have been for the last ~5 years. Happy to chat about this.
Although today my system with LXD is much easier and getting better with every subsequent release of LXD, I owe you gratitude for helping me to get off the ground.
Full user namespacing has been in LXC from very early on (day 1?), which is the "increased security" -- this is where I believed the difference between "system containers" for LXC and regular "application containers" lay. runc was not always the runtime underneath Docker, and IIRC it has not always had support for proper user namespacing for as long as LXC has.
Also, LXC has so many more features packed in that I think it's a disservice to compare it to runc in the first place.
It should go without saying but I would absolutely love for you to sprinkle some knowledge on me wherever I'm wrong about either of these.
[EDIT] - Getting downvotes, so I figured maybe I said something egregious, and did some digging:
- LXC had user namespaces @ v1.0 in 2013
- Docker got user namespaces @ 1.10 in 2016
This is the distinction I meant.
I'm not sure this is true. User namespaces were added a while after LXC first started development (from memory), and weren't safe (at all) for several years. There was a lot of churn to make user namespaces work (and both LXC and runc had issues with them).
It should be noted that the user namespacing available in LXC is emphatically the correct way of doing it, and few other runtimes have had a similarly correct approach. rkt had a similar idea, but never really went all the way with the extra work. Christian Brauner (one of the LXC maintainers) did quite a bit of work to extend the mapping limit for user namespaces just so that LXC could do even more with them.
runc _could_ be used in a similar way to LXC (after all, from userspace, user namespaces are just three files and a clone flag), but you'd need to do the management separately. And doing it right is quite hard when you get into the weeds -- I absolutely give props to the LXC folks for getting them right.
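The "three files and a clone flag" can be seen from a plain shell with util-linux's unshare, which wraps the same clone(CLONE_NEWUSER) call and writes the mapping files for you:

```shell
# Create a user namespace and map the current uid to 0 inside it.
unshare --user --map-root-user sh -c 'id -u; cat /proc/self/uid_map'
# id -u should report 0 inside the namespace (where unprivileged user
# namespaces are enabled); uid_map shows the 0 -> <real uid> mapping.
# The three files in question: /proc/<pid>/uid_map, gid_map and setgroups.
```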
> IIRC it has not always had support for proper user namespacing for as long as LXC has.
runc has had user namespace support for a shorter time because the project is younger. User namespaces aren't particularly difficult to use (though a lot of the ancillary work is quite significant -- and LXC does a lot of things that are outside the scope of runc).
Most of the delay for Docker to have user namespaces was purely because of Docker (and I even think runc wasn't a separate project when Docker got user namespace support -- it was still libcontainer back then).
> LXC has so many more features packed in that I think it's a disservice to compare it to runc in the first place.
I absolutely agree LXC has many more features (many of them that I'm jealous of), it's just that runc is the most apt comparison to it from the Docker stack.
I disagree it's a disservice, given that I know the LXC folks personally and they feel it's okay to make such comparisons (especially when we're discussing individual runtime issues that both runtimes had to deal with in different ways). I'd never claim that runc does things much better than LXC -- it has lots of really great engineering that I wish more people took advantage of.
There are only two things that runc has over LXC:
* In our design, there are no monitor or long-running processes. The upside of this is that you don't have to worry about us being bad at programming -- your container is your business. The downside is that you likely will need to manage your containers somehow (getting the exit code is the most obvious thing you currently need a monitor process for). And console handling is quite complicated. But it does allow for quite a bit of flexibility in the layers above runc -- and it took quite a while to get this right.
* runc has a fully-unprivileged mode ("rootless containers"). Now, LXC can do most of this, but with runc my main idea was that we needed to have completely unprivileged containers (no admin setup, no setuid helpers, and so on). This has spawned a few projects (slirp4netns -- a highly performant unprivileged user-space network stack) which now allow you to do an incredibly large number of things without any privileges at any step in the process. So while LXC could do most of this, it used several setuid binaries, which made it not work for me -- and runc's rootless container support has really made a lot of strides that wouldn't have been made without the work I did in runc.
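A minimal rootless-runc session looks roughly like this (what goes into the rootfs is up to you):

```shell
# All of this runs as an ordinary user: no setuid helpers, no admin setup.
mkdir -p mycontainer/rootfs
cd mycontainer
# Populate rootfs somehow, e.g. by extracting a busybox or alpine tarball.

runc spec --rootless        # generates a config.json with userns mappings
runc run demo               # starts the container fully unprivileged
```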
But those are fairly minor gripes. LXC even has OCI image support with templates (using a tool I wrote called "umoci" :P), and there were plans to add OCI runtime configuration support as well.
liblxc has go bindings now!
(As an aside, the liblxc Go bindings have been around for a while -- the LXC guys added them so that Docker could use LXC more effectively. We all know how that worked out.)
> I'm not sure this is true. User namespaces were added a while after LXC first started development (from memory), and weren't safe (at all) for several years. There was a lot of churn to make user namespaces work (and both LXC and runc had issues with them).
Ah, this is news to me -- thanks for clearing that up. I knew that both projects had worked hard on getting user namespaces working and kernel changes did have to land to make it happen.
> runc has had user namespace support for a shorter time because the project is younger. User namespaces aren't particularly difficult to use (though a lot of the ancillary work is quite significant -- and LXC does a lot of things that are outside the scope of runc).
> Most of the delay for Docker to have user namespaces was purely because of Docker (and I even think runc wasn't a separate project when Docker got user namespace support -- it was still libcontainer back then).
Yup, I was thinking of the switch from LXC -> libcontainer -> CRI-shim|containerd|runc which is the reality now. Basically LXD just stayed on LXC and let that mature and got all the benefits that came along. But I also value what runc achieved because the CRI is a fantastic abstraction and fossilization of what people can/should expect from a container runtime, and runc is the minimal realization of those requirements.
> It should be noted that the user namespacing available in LXC is emphatically the correct way of doing it, and few other runtimes have had a similarly correct approach. rkt had a similar idea, but never really went all the way with the extra work. Christian Brauner (one of the LXC maintainers) did quite a bit of work to extend the mapping limit for user namespaces just so that LXC could do even more with them.
Agreed, I actually used rkt for my first few k8s installs back when CoreOS was brandishing "rktnetes". For anyone interested, Christian has an excellent talk about the kernel work that had to land for all this magic to be possible.
> I absolutely agree LXC has many more features (many of them that I'm jealous of), it's just that runc is the most apt comparison to it from the Docker stack.
> I disagree it's a disservice, given that I know the LXC folks personally and they feel it's okay to make such comparisons (especially when we're discussing individual runtime issues that both runtimes had to deal with in different ways). I'd never claim that runc does things much better than LXC -- it has lots of really great engineering that I wish more people took advantage of.
I want to be clear that I didn't mean to disparage runc in any way as a tool; they just seem (to me) to operate at entirely different layers. runc (to me) is small, nimble and unix-y; LXC is more like a microkernel. I meant it would be a disservice to both tools to compare them.
> There are only two things that runc has over LXC: ...
Thanks for the from-the-horses-mouth comparison of both!
I had no idea you wrote umoci -- thanks for that as well -- it's a super useful tool.
It's always nice to see that more people are using the stuff I've made. I'm currently working on some ideas for OCIv2 images which hopefully will massively improve the current state of things (images are really inefficient and quite problematic). I'm writing a blog post about it at the moment.
You can certainly provision docker containers using ansible / chef / puppet. It's not popular because they normally do way more than is required in those situations, but it doesn't stop you. A lot of larger applications on dockerhub are done that way - splunk's dockerfile is just an ansible starter, for example.
Puppet even has a page about it: https://puppet.com/blog/running-puppet-software-docker-conta...
Could you elaborate on what they're doing beyond what's required?
> A lot of larger applications on dockerhub are done that way
From what I understand, you have to write the configuration management in a different way such that it doesn't require restarting services after configuration files have been updated. Converting an existing playbook or cookbook to work that way isn't trivial depending on how many things it's doing to provision the server.
Chef, puppet, ansible and other config management tools are designed to run nicely in an existing system. That means if you want to do action X, they'll usually check if it's possible, try to resolve internal dependencies, handle potential errors, register the state change for report and other things.
Most of that is unnecessary when you're building a docker image from scratch. You control all the dependencies - they're the lines before this one. You don't care about nice reports of partial failures - they should cause immediate aborts. You don't care about managing state - each step is its own fully defined state.
For example, if I remember correctly, chef will query available package list when you request something installed. It will also continue running independent recipes if the installation fails. When building images I want literally "apt-get install foo || crash_and_burn".
> you have to write the configuration management in a different way such that it doesn't require restarting services after configuration files have been updated
In my experience with cfmgmt - this needs to be implemented whether you build images or update running systems. There always comes a time when you want to change something but not trigger service updates. When building images that time is always.
LXD makes it such a trivial job to host a private image repository and serve it to a large cluster automatically that I don't need to break a sweat.
This way my infrastructure code is uniform, be it compute instances, physical servers or containers. That is a beautiful thing in terms of infrastructure management, whether for a micro-services architecture or a monolith.
In our case, all the code which launches containers, creates images, or launches physical servers is the same codebase, following a unified pattern of doing things. In our setup we use many playbooks, which we built leveraging the work done by the community for physical servers, VMs and containers.
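A hedged sketch of the private image repository part (addresses, aliases and the password are placeholders, not the poster's actual setup):

```shell
# On the image server: expose the LXD API and set a trust password.
lxc config set core.https_address "[::]:8443"
lxc config set core.trust_password "some-secret"

# Publish a container as an image on that server.
lxc publish mycontainer --alias base-image

# On each cluster node: add the server as a remote and launch from it.
lxc remote add myimages images.example.com --password "some-secret"
lxc launch myimages:base-image web1
```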
In the docker world I cannot trust an image completely until I go through the code, which is a mixture of how the image is built and which layers are used. If the image is built from successive images, I really need to go back to each image's code, which is again a mixture of scripts and Dockerfiles. Being a follower of the Zen of Python (PEP 20), I like explicit, and hence like to know the details of images used in production and generate them myself from base OS images.
I was one of the first users of Docker when it was released, but was a bit disappointed later when they moved away from LXC to build their own layer. I wrote a blog post on it at http://www.vyomtech.com/2014/03/04/docker_and_linux_containe...
An issue was created and, to my surprise, a fix landed in the next release. It helped me launch haproxy containers binding ports 80/443 on a compute instance with the correct client IP in the haproxy logs, which let me get analytics from the haproxy logs directly.
Was reported on July 14th and committed on July 19th.
I would like as well to see usage examples in the official documentation.
Having said that, here is the per-distribution usage of the snap package of LXD (see the end of the page).
The other big one for a long time was fuse inside user namespaces, but that has been merged upstream in 4.18.
Some of the AppArmor features were also a sticking point for a while, as upstream was lagging behind quite a bit, but the AppArmor maintainer has since fixed that, so recent Linux kernels have everything that we use.
Author: Stéphane Graber
Date: Wed Nov 5 10:09:28 2014 -0500
Add licensing and contributing guidelines
Signed-off-by: Stéphane Graber <email@example.com>
There is a migration path from LXC to LXD and probably the OP migrated at some point in time.
In addition, Samsung is somehow using LXD in their new phones, with "Linux on DeX". That way, you can get a Linux interface with Ubuntu when you plug your phone into a monitor/TV.
If you use a Linux desktop, you can run GUI applications in an LXD "system container", thereby isolating their files from your host's filesystem.
Here is how to do this by sharing the X server of the host (filesystem separation but sharing the X for convenience),
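In outline, and with device names, uids and paths as assumptions rather than the linked guide's exact steps:

```shell
lxc launch ubuntu:18.04 guiapps

# Map our desktop uid into the container so X authorization works
# (assumption: your desktop user is uid/gid 1000).
lxc config set guiapps raw.idmap "both 1000 1000"

# Share the host's X socket into the container as a disk device.
lxc config device add guiapps xsocket disk \
    source=/tmp/.X11-unix/X0 path=/tmp/.X11-unix/X0

# Run a GUI app against the host's display.
lxc exec guiapps -- env DISPLAY=:0 xterm
```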
You can run things like CS GO DeathZone in such an LXD container with full GPU acceleration, and even the closed-source NVidia driver.
Some people also got an LXD container to run in a separate X server, thus having even better isolation.
I installed CLion inside the container and use the host X server to display the IDE and the app being developed.
The containers are so useful for isolating the main system from these `make install` runs, and for being able to install dependencies or roll back to a previous version in case of mistake, without fearing side effects on the host system.
What then increases my concern is that the two other major technology pushes Ubuntu has made into the wider ecosystem - Upstart and Mir - have both failed to take off.
Adopting a niche technology - especially for something as fundamental as a hypervisor - is already a hard choice. The LXD team is making adoption even harder by not offering official repos for the major distros.
A Linux distribution can repackage LXD in their native packaging format and some distributions are doing it already (Debian, Alpine, Fedora, etc). If you are familiar with packaging, you can help promote those packages to the official repositories.
Also, while the main development team are Canonical employees, LXC existed long before the lead developer was hired by Canonical -- and predates many projects. It's not going to be going away soon -- I'd bet that the development team would just move to a different employer if Canonical wanted to stop supporting its development.
We welcome distributions packaging LXD directly themselves and tend to recommend that they stick to the LTS releases for that as they move at a much more manageable pace.
As noted, there are native packages for several distros out there, I'm currently aware of Fedora, Alpine, Arch and Gentoo.
The snap makes us, the upstream, able to easily build and test a package that works identically on many popular Linux distributions and releases. It also makes it possible for us to release fixes and new releases to everyone at the exact same time, and to very easily reproduce any issue that gets reported to us.
We have CI in place for a lot of distributions that can use the snap package and do not let anything reach users until it's green on all of them: https://jenkins.linuxcontainers.org/job/lxd-test-snap-latest...
LXD offers snaps, which work on a wide variety of distributions. As somebody who has packaged LXD for Void Linux, I can tell you it is not a simple piece of software to package. It requires a patched version of SQLite, for example.
I plan on releasing some of the work I've been doing on LXD soon; hopefully it will push LXD into the spotlight over Kubernetes and Docker.
So basically: I push code, Terraform detects whether a container running the service exists; if not, it creates the containers; if it does, it upgrades the service in the existing containers without destroying them, so my deploys are super fast. I will have a more detailed video series coming out soon on the site I mentioned above.
I have 5 machines in a cluster right now running various services; it is definitely over-provisioned, and I'm barely using the machines' resources. I could technically host the whole thing on one server, but given what I'm building, I wanted to test how my solution scales in a clustered setup. It's got an Elixir backend, so the Elixir apps are also clustered, sharing data using LXD's fan networking.
All the services get deployed / updated into the cluster via an automated CI / CD
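For reference, the fan networking mentioned above is set up with a couple of commands (the subnet and names here are assumptions):

```shell
# Create a fan overlay bridge spanning the cluster's underlay network;
# each host gets its own overlay /24 slice automatically.
lxc network create fanbr0 bridge.mode=fan fan.underlay_subnet=10.0.0.0/16

# Attach a container's eth0 to the fan bridge.
lxc launch ubuntu:18.04 app1
lxc network attach fanbr0 app1 eth0
```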
LXC doesn't have the flexibility (for example you can't just add another runtime underneath it), but the amount of features it has packed in could have given earlier k8s versions a run for their money. Also, a lot of the implementation flexibility that Kubernetes is seeing now (ex. runtimeClass) is the result of years of inflexibility and people finding ways to work around it (frakti, etc) and build at a slower pace (which is also a good thing, slow measured progress and all that) -- my point is that LXC might have been the better base, all things equal.
Moving VM workloads right into LXD without needing to port them to be "cloud native" might have saved people a bunch of time, never having to deal with complexities of overlay networks (as long as you could set up LXD properly) would also likely have saved people some time.
In any case, all you have to do these days is to satisfy the Container Runtime Interface (CRI).
kube-lxd and lxdlet seem pretty old, though; containerd has moved at breakneck speed in the last year... Looking through the code, it might need some TLC.
You create a "system container" and it keeps running as a stock Linux distribution (many flavors are supported).
It takes about a couple of seconds to create a new container, and a bit less to remove it.
Unfortunately, I've got some older LXC (v1.0) containers that fail to boot under LXD, which means I need to have both container systems installed on my hosts.
It's unfortunate that this means I have to have both the "dash" commands like `lxc-ls` and the "space" commands like `lxc ls`, which return separate lists. That's because one of them is the LXC project and one of them is the LXD project, confusingly using very similar command names.
The one feature of LXC that I really miss in LXD is the `lxc-console` command. Among other things, without console it's harder to debug those LXC containers to figure out why they aren't booting under LXD.
Obviously this can be fixed but it's non-trivial to figure out why. For the time being it's easier to keep using LXC for those.
If you mean why can't I completely replace them with newly built containers, it's because they are complete but rather old OS images with years of configuration history in them, running specific versions of mission critical applications.
They are older than LXC, and were originally run on real servers, then under chroot and UML, then under KVM, before LXC or Docker were a thing.
To be tinkered with most carefully. As much as I would like to restart from scratch, with new versions of everything, experience has taught me it would be unwise to do it in a hurry, or without extensive testing.
So yes they can be replaced and will be eventually, but it's not a small job and will need time scheduled to it.
For the time being it's the Right Thing To Do(tm) to keep the existing images still running as they are.
For example, if you're testing your infrastructure configuration code, you need some type of server to run it against. Using a base Ubuntu 18.04 Docker image might work, but there are a lot of things Docker is incapable of letting you configure, whereas with LXC you can.
In other words, using Docker as an infrastructure target gives me about 80% confidence that things will work on a real host, but with LXC that confidence level goes up to 99%.
So it's common to have a snapshotted LXC image that you can use as a target system, letting you spin up a fresh LXC-based Linux distro in a few seconds. It's a lot faster than using something like Vagrant, and fills a real gap in testing how you provision servers.
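That snapshot-and-restore loop is only a few commands (names are illustrative):

```shell
# A clean target for infrastructure-code test runs.
lxc launch ubuntu:18.04 target
lxc snapshot target clean

# ... run your provisioning against "target", inspect the result ...

# Throw the run away and get a pristine system back in seconds.
lxc restore target clean
```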
 - https://discuss.linuxcontainers.org/t/lxd-on-centos-7/1250
 - https://pkgs.alpinelinux.org/packages?name=lxd&branch=edge
Obviously, this does not preclude the distributions from packaging LXD in their own packaging format.
Apart from the AlpineLinux effort to package LXD, there is also a Debian effort, https://wiki.debian.org/LXD
Users upgrading from 18.04 to 18.10 are already being automatically migrated over to the snap.
We have 3 tracks for the snap right now:
- 2.0 LTS
- 3.0 LTS
- latest
So users that want to stay on an upstream LTS release can still do even after switching to the snap.
Void Linux has LXD LTS (3.0.3) and latest (3.8) available.
Reading further, I see that LXD uses liblxc for container management. This comment about LXD using lxc command naming also was helpful: https://news.ycombinator.com/item?id=18672510
I'm excited about Silverblue, just need to get a few more apps into sandbox-ready Flatpaks...(Firefox, I'm looking at you)
If instead you are using commands like "lxc-start", then you are using "LXC".
There are two possible aspects of confusion:
1. LXD uses the "lxc" command line utility for all the management of LXD containers.
2. LXC is both the name for the Linux kernel "Linux Containers" functionality and the first/early implementation of tools (those "lxc-????") for Linux Containers.
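Side by side, the two confusingly-named command families:

```shell
lxc-ls            # LXC: "dash" tools from the classic lxc packages
lxc list          # LXD: its client binary happens to be called "lxc"
lxc-start -n c1   # LXC starts one of its containers
lxc start c1      # LXD starts one of *its* containers -- separate lists
```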
Makes life very easy. I used to have to mess with iptables too, but with the proxy device things became much simpler.
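A proxy device replacing an iptables DNAT rule looks like this (container and device names are placeholders):

```shell
# Forward host port 80 into the container -- no iptables involved.
lxc config device add mycontainer web80 proxy \
    listen=tcp:0.0.0.0:80 connect=tcp:127.0.0.1:80
```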