Thoughts on a Container Native Package System

philips · on July 1, 2015

If you haven't seen the MPM talk that is referenced in this blog post, I would recommend it. It is very interesting for understanding the thought process behind Google's deployment pipeline: https://www.youtube.com/watch?v=_uJlTllziPI

vezzy-fnord · on July 1, 2015

This seems like a roadblock that is rather inherent to the app container model of making images of host OS subsystems partitioned into their own namespaces.

I thought that the idiomatic usage was not to treat them as distros, but as immutable images that you simply tear down and provision when a major system state change transaction needs to be performed?

Package introspection and sharing seems like something people would run a CM or task runner on the containers for. Hence the whole move to build large orchestration suites for them.

Not sure what is exactly meant by "order dependence" in this context.

Package manager cruft is just bad resource management, or sometimes even deficiencies in the PM itself.

jbeda · on July 1, 2015

Immutable images are the name of the game. Any sort of surgical update to image A would result in image A-prime.

Any sort of orchestration system will help to run containers and will know what images you are running but not which packages you are running inside those images.

With something like this, a cluster admin could find all the containers across a 10k node cluster that are running an unpatched version of openssl. They could also block containers that use unapproved packages from running.
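
A rough sketch of that kind of query, in Python. It assumes each image publishes a package list that can be read without opening the image, which is exactly the piece that is missing today. The image names, versions, container names, and both function names are made up for illustration.

```python
# Hypothetical package metadata published alongside each image.
IMAGE_PACKAGES = {
    "web:1.4": {"openssl": "1.0.1k", "nginx": "1.8.0"},
    "api:2.0": {"openssl": "1.0.2d"},
    "worker:0.9": {"libxml2": "2.9.2"},
}

# Hypothetical view of what the orchestrator already knows: container -> image.
RUNNING = {
    "node-17/web-a": "web:1.4",
    "node-17/api-a": "api:2.0",
    "node-42/web-b": "web:1.4",
    "node-42/worker-a": "worker:0.9",
}


def containers_with(package, bad_versions, running=RUNNING, image_packages=IMAGE_PACKAGES):
    """Return sorted container names whose image ships `package` at a flagged version."""
    hits = []
    for container, image in running.items():
        version = image_packages.get(image, {}).get(package)
        if version in bad_versions:
            hits.append(container)
    return sorted(hits)


def admit(image, approved, image_packages=IMAGE_PACKAGES):
    """Allow an image only if every package it ships is on the approved list."""
    return set(image_packages.get(image, {})) <= set(approved)
```

For example, containers_with("openssl", {"1.0.1k"}) lists the two web containers, and admit("web:1.4", {"openssl"}) rejects the image because nginx is not on the approved list.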

"Order dependence" is the fact that a Dockerfile is strictly linear -- even when 2 steps won't interact in any way. It makes caching and reuse more difficult.
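
To make the cost of that linearity concrete, here is a minimal Python sketch. It is a simplified model, not Docker's actual cache implementation: it assumes each step's cache key is a hash of the parent key plus the instruction text. The function names and the two sample builds are hypothetical.

```python
import hashlib


def chained_keys(steps):
    """Cache key per step when each key also hashes the previous key (linear build)."""
    keys, parent = [], ""
    for step in steps:
        parent = hashlib.sha256((parent + "\n" + step).encode()).hexdigest()
        keys.append(parent)
    return keys


def independent_keys(steps):
    """Cache key per step when steps are hashed on their own (order-free packages)."""
    return [hashlib.sha256(step.encode()).hexdigest() for step in steps]


def reusable(old, new):
    """Count how many keys in `new` were already produced by `old`."""
    return len(set(old) & set(new))


# Two hypothetical builds that install the same two things in a different order.
build_a = ["install openssl", "install python"]
build_b = ["install python", "install openssl"]

shared_when_chained = reusable(chained_keys(build_a), chained_keys(build_b))
shared_when_independent = reusable(independent_keys(build_a), independent_keys(build_b))
```

With chained keys the two builds share no cache entries even though they install the same two things; with independent keys both entries can be reused.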

mattzito · on July 1, 2015

Why wouldn't you be better served by an adapted version of a configuration management toolchain like Puppet, Chef, or Ansible?

This would give you the semantic representation you're looking for, idempotency, and a prebuilt library of reusable components that you could leverage going forward.

In addition, it could also handle the case where you have a long-running process in a container and suddenly there's a critical severity bug that you need to patch.

jbeda · on July 1, 2015

I think this is really orthogonal. Puppet/Chef/Ansible/Salt all help to configure an environment and work with the native package manager for the OS. What I'm suggesting here is to instead create a new package manager that is native to the container world and is used at build time.

There is still a need for an improved tool set to build the containers and use this package manager. Perhaps these tools could expand into that area.

As for long-running processes -- I'd consider that an anti-pattern. If there is a critical bug, the correct thing to do would be to build a new (versioned) image and restart (either blue/green or a rolling update) to that new image. This also enables rollback if your fix breaks things further.

jbeda · on July 1, 2015

Happy to answer questions and brainstorm ideas here.

cwp · on July 1, 2015

Just curious about this sentence:

> There are package managers out there that are cleanly factored from the underlying OS (Homebrew, Nix) but they aren’t typically used in Docker images.

I agree that Homebrew is probably not useful. It's cleanly factored from the OS, but it's still pretty Mac-centric.

Nix, however, has lots of Linux packages. Why didn't you consider using it, rather than starting over?

willsher · on July 1, 2015

There is also pkgsrc, which from memory can be built to be separate from the underlying OS. FreeBSD's ports is similar. Slackware's packaging is based on simple tar files.

It does seem that this software build/deploy is a key problem that needs solving and the decentralised, easy to grok nature of Docker is key to the software delivery system. Nix is a really clever bit of engineering and design, but is also hard to grasp. Could it be made simpler? I suspect its 'functional' nature is the hard part.

The nature of containers being essentially immutable, at least from a base software stance, with packages not being upgraded so much as newly installed avoids the problem of upgrading running services. Most (all?) software would run as its own user, so no root level daemons.

Configuration files are built from service discovery (e.g. via Kelsey's confd in lieu of the apps themselves deriving config), so even config need not be preserved if a roll back to prior to the package layering is done.

Just some thoughts, but I agree there is a need to better manage dependencies. Heck, why not build static linked binaries?

jbeda · on July 1, 2015

Thanks for the pointers -- I'll dig in to those to get more ideas.

Also agree that config-as-package is part of this too.

As for statically linked binaries -- this solves some of it but not all. It is still hard to figure out which version of openssl is actually running in production. It also falls down in the world of dynamic languages where your app is a bunch of rb/php/py files.

willsher · on July 2, 2015

Which version of library X is in use can be introspected via what the binary is linked against, rather than via which package is installed - inspecting the actuality rather than a metadata wrapper in the form of a package may be preferable.

As for static vs dynamic binaries, is dynamic less memory intensive even in containers - that is, do library versions get shared across containers in RAM or are they separate? If they are not shared, static may be much easier to manage in general, though there may be cases where static binaries can't be provided. Upgrading the app involves a recompile, but that's a container building exercise. Versioning can then become tied explicitly to the container version, or be determined by inspecting the static binary.

masom · on July 1, 2015

> Still hard to figure out which version of openssl is actually running in production

You could check the version of openssl in a specific image id and check if that image is used on the cluster.

If one of your images has a package with known security updates, it is just a matter of rebuilding it with the newer package and redeploying.

jbeda · on July 1, 2015

Agreed -- but how do you figure out which packages are in an image without cracking that image?

Quick -- what version of OpenSSL is in the golang Docker images? (https://registry.hub.docker.com/_/golang/). Short of downloading them and poking around the file system, I can't tell.

willsher · on July 2, 2015

That's a fair point, but the alternative of always compiling the packages at build time negates that, at the risk of regressions and other 'new version' bugs. That risk may be endemic in this anyway - if the author of the golang container built it at a point in time against version X, then went away and left the container alone, upgrading versions may become risky in any case.

Those kinds of risks, and perhaps the larger container risks, look like CI pipeline issues - how is the container tested given new versions? As an adjunct, how do we do container component integration testing? Is that part of this packaging & building system?

snw · on July 1, 2015

pkgsrc is brilliant for containers!

Not using docker but zones as a container solution here, but what we do is bake images containing: pkgsrc + static config files + small scripts.

To provision we take the image + metadata containing dynamic configuration values (network details, keys & certs, etc.) and execute that.

This allows us to make very stable releases containing all our software. Pkgsrc is the most important part of this. It already contains very recent versions of packages as it is released quarterly. But sometimes we need to run a very specific version in production, or maybe add a patch to fix some bugs. This is super easy with pkgsrc.

I gave a talk last year about some parts of this at a local meetup: http://up.frubar.net/3165/deploy-zone.pdf

jbeda · on July 1, 2015

I looked closely at Nix and there is a lot to like there. The thing that turned me off is that it is just too complex. Going from something like Dockerfiles to Nix is just a huge leap.

Example: http://sandervanderburg.blogspot.com/2014/07/managing-privat...

A lesson from Docker is that we have to make this dead simple. It has to be something you can grok in 15-60m.

nhorman · on July 9, 2015

Joe, in regards to your use of packaging systems to create container images, I've been working on that here: http://freightagent.github.io/freight-tools/

gdamjan1 · on July 1, 2015

what about vagga http://vagga.readthedocs.org/

jbeda · on July 1, 2015

I haven't seen vagga before. It looks very interesting.

But, since it is still using an OS based package manager (Ubuntu, for example) it doesn't address all of the issues that I outlined. Specifically packages can't be cached across different images by the host. It also isn't obvious which packages are included in which image without cracking the image open.
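
As a sketch of what host-level sharing might look like (Python, purely illustrative): the class, its method names, and the two manifests below are hypothetical, and the "download" is just a counter standing in for real work. The assumption is that every image declares its packages as name/version pairs.

```python
class HostPackageCache:
    """Hypothetical host-side store keyed by (name, version), shared by all images."""

    def __init__(self):
        self._store = {}
        self.downloads = 0

    def fetch(self, name, version):
        """Return the cached package, downloading it only the first time it is seen."""
        key = (name, version)
        if key not in self._store:
            self.downloads += 1                 # stand-in for a real download
            self._store[key] = f"{name}-{version}.pkg"
        return self._store[key]

    def materialize(self, manifest):
        """Resolve every package an image declares, reusing cached entries."""
        return [self.fetch(name, version) for name, version in sorted(manifest.items())]


# Two hypothetical image manifests that overlap on one package.
web_manifest = {"openssl": "1.0.2d", "nginx": "1.8.0"}
api_manifest = {"openssl": "1.0.2d", "python": "2.7.10"}

cache = HostPackageCache()
web_files = cache.materialize(web_manifest)
api_files = cache.materialize(api_manifest)
```

Here the host performs three downloads for four declared packages, because openssl is fetched once and reused. The manifests also double as the answer to "which packages are in this image" without opening it.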

Thanks for the pointer!

gdamjan1 · on July 2, 2015

I've not played with it yet. wrt caching packages across different hosts, I have the same problem with debootstrap.

I'm using a common bind mount outside of the container for that now.

redsymbol · on July 1, 2015

Am I the only one who thinks containers and config management tools - e.g., chef, puppet, ansible, salt - are completely orthogonal to each other, and actually work really well together? A simple cycle like the following solves at least a couple of the problems mentioned in the article:

1) Run a base docker image.

2) Run the config management tool on/in this image (ansible playbook, chef cookbook, etc.)

3) Commit the container to the new image

Dockerfiles are very convenient and easy, and are great as far as they go. They also don't have idempotent actions built-in. As a result, a minor change requires one to rebuild the entire image from scratch.

In contrast, the CM systems mentioned above operate by letting you declare the desired system state - i.e., what it is, instead of how to create it. This lets the CM system runtime automatically skip unnecessary steps. For example, in (say) an Ansible playbook, instead of (in the Dockerfile):

  RUN mkdir -p /path/to/some/dir

... we'd say something like (in the Ansible playbook):

   - file:
       state: directory
       path: /path/to/some/dir

If /path/to/some/dir already exists, and I run this play twice, I do not get the "mkdir: cannot create directory ‘/path/to/some/dir’: File exists" error that a plain mkdir (without -p) would give, because this is designed to be declarative. That means: Ansible first checks whether the directory exists, and if it does, it does nothing. If it doesn't, it creates it.
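
The check-then-act idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not how Ansible's file module is implemented; ensure_directory is a made-up helper, and the example runs against a temporary directory rather than a real system path.

```python
import os
import tempfile


def ensure_directory(path):
    """Make sure `path` is a directory; return True only if something changed."""
    if os.path.isdir(path):
        return False          # already in the desired state: do nothing
    os.makedirs(path)         # not there yet: create it (and any parents)
    return True


# Usage against a throwaway location instead of a real system path.
with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "path", "to", "some", "dir")
    first = ensure_directory(target)    # creates the directory
    second = ensure_directory(target)   # no-op on the second run
```

The return value plays the role of a "changed" flag: the first call reports a change, the second reports none.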

It also helps a lot with surgical updates. For example, suppose my Dockerfile for a web service container contains the following:

  RUN apt-get install -y apache2 php5 libapache2-mod-php5

An analogous Ansible play might include the following:

   - apt:
       name: "{{ item }}"
       state: present
     with_items:
       - apache2
       - php5
       - libapache2-mod-php5

Now if an Apache security vulnerability is announced, with a fix pushed to the apt sources, I can rapidly modify the image like so:

1) Run the previous docker image.

2) Run the config management tool with the surgical-update action.

3) Commit the container to create an updated image.

The Ansible version of the update would be as simple as:

   - apt:
       update_cache: yes
       name: apache2
       state: latest

And again, this is a fast no-op if the package is already up-to-date. (It also doesn't needlessly restart apache, though when creating an image that doesn't matter much.)

Using a CM system like this doesn't solve all the problems outlined in the article - actually, I don't think it even solves a majority of them. But I can't figure out why so many smart people are focused on reinventing parts of it poorly, or thinking in terms of "docker vs. ansible/chef/puppet/salt". I know they're not idiots. Am I missing something? If so I'd sincerely like to know.

willsher · on July 2, 2015

Config management, as it currently stands with Puppet/Chef etc., ideally has no place _in_ containers iff the issue of how applications are deployed there can be solved in a suitably generic and straightforward way. A container becomes an immutable item - logs get shipped off container, config comes from service discovery, failing containers get taken out of service.

There is a need for shims like confd, though perhaps we'll see configuration information libraries emerge that can go straight to a service discovery/config lookup endpoint such as etcd or zookeeper.

All of the config management tools have some sort of overhead and all affect the ability for a container (bar its data in the case of databases) to be immutable.

Building the container should be as simple as installing the package for the given app into whatever container system is being used, be it Docker, a more bare bones container, or something that comes from the standardisation effort.

The base 'OS' in the container then becomes irrelevant, as the base OS is really the containing, host OS, with the containers just containing the apps and not additional overhead.

In terms of where configuration management fits, it potentially is at the orchestration level, but that said there are already other, more specialised tools emerging in that space too.

jbeda · on July 1, 2015

I agree that they are orthogonal but I'm not convinced that they are the right tool for the job. They can do all sorts of things above and beyond what is needed. Those capabilities come at a cost. For instance, they also manage things like ensuring that servers are running and other transient state.

In addition -- I'd consider the "rerun the config tool on a pre-existing container" to be an anti-pattern. This adds more and more layers to the Docker image that have to be downloaded every time that container is run. I'd much rather build each container from the bedrock up with well versioned dependencies (not as easy as it should be with many package managers).

redsymbol · on July 1, 2015

Ah. It clicks for me now. You're proposing a package management system that operates on the image level. Somehow I didn't fully understand that before.

I fully agree that "rerun the config tool on a pre-existing container" is an anti-pattern, and it's best to avoid it if at all possible. I can see how what you are proposing would be a great step in a positive direction.

I'm not convinced CM systems don't have a role, still. They are very flexible and powerful at accommodating many specialized system setup scenarios. But I think that's independent of what you're really focused on here.

Anyway, excellent and well-written article!