Microcontainers – Tiny, Portable Docker Containers

_query · on Feb 2, 2016

> Docker enables you to package up your application along with all of the application’s dependencies into a nice self-contained image. You can then use use that image to run your application in containers.

Actually docker was created to enable easy scaling of web applications. If you want to package your application up take a look at some modern package managers. By using a modern package manager like nix you can work around the issues of containers while still keeping the "it just works" advantage of containers.

This does not mean that containers are bad, it's just that using a good package manager is usually a simpler solution.

Take a look at my blog post if you want to read more on this: http://www.mpscholten.de/docker/2016/01/27/you-are-most-like...

mstade · on Feb 2, 2016

I mostly agree with what you're saying, but what caught my eye was:

> About the author > Hey! I’m Marc, an eighteen years old software engineer located in Germany.

You're obviously quite knowledgeable and well spoken for such a young age, and presumably English isn't your first language either. I just wanted to say keep up the good work, you have a bright future ahead of you – kudos!

wcummings · on Feb 2, 2016

Surprised he advertises his age, I started working around that age and pretty deliberately avoided age as a topic of discussion. Age is something best left unmentioned in a professional context.

_query · on Feb 2, 2016

Thanks for the advice. I've thought about it a little bit and decided to remove it with my next update to the web page.

mstade · on Feb 7, 2016

I'd love to say it's not sage advice, but it is. Unfortunately, once people know your age they will – consciously or subconsciously – treat you differently. I rather quickly opted to never disclosing my age in any context, unless it is absolutely necessary (it almost never is.) It's just easier that way, leads to less discrimination. Unlike skin color or other highly visual traits, it's one aspect of your person you can hide without any significantly negative consequence.

wcummings · on Feb 3, 2016

You don't have to, but it will help avoid a lot of snark as well as well-intentioned but annoying jokes (like the babel dev described), not to mention less visible discrimination like being passed over for promotions, being given worse projects and so on.

There are also programmers whose pride is damaged by doing the same work as someone much younger than themselves.

Remember (at least in the US) age is a protected category and you can't be asked about it during an interview, so if you're careful about how you write your resume no one really has to know exactly how old you are.

Good luck.

marianoguerra · on Feb 2, 2016

I used to do it a lot until I noticed, I think it's related to the first topics you learn while learning English (presenting yourself, name, age, where you come from).

honua · on Feb 2, 2016

Yeah usually gets messy. E.g. the Babel author's experience

petetnt · on Feb 2, 2016

Happen to have a link at hand where I could read up on this? Tried searching for it but drew blank.

bgaid · on Feb 2, 2016

From the creator, Sebastian McKenzie: https://medium.com/@sebmck/2015-in-review-51ac7035e272#.e80o...

sjclemmy · on Feb 2, 2016

Awesome read. Sounds like a great guy.

karmajunkie · on Feb 2, 2016

Its the kind of thing most people learn after they wish they'd learned it earlier :)

_query · on Feb 2, 2016

Thanks, made my day :) In case you found some spelling or grammar mistakes let me know.

simonebrunozzi · on Feb 2, 2016

Agree with mstade - very impressive for your age.

I just sent you a private email.

pea · on Feb 2, 2016

Yes, layering is somewhat of a hack to get around the fact that most OS package managers are mutable. IMO nix / guix have a much better approach here.

bboreham · on Feb 2, 2016

> docker was created to enable easy scaling of web applications

Citation? From my perspective, most of the features you'd need to do that appeared around 2 years into Docker's life.

_query · on Feb 2, 2016

Initial docker was part of dotCloud, a platform-as-a-service company[0]. From the perspective of a paas-company container technology allows for easy scaling. From the perspective of a docker user you're right that many of these tools were not available until recently.

[0]: https://en.wikipedia.org/wiki/Docker_(software)#History

fapjacks · on Feb 2, 2016

Nah. I don't like articles with phrases like "you are doing it wrong" when it comes to Docker, because Docker affords you choice. It is up to you to make the choices, and there are some "best practices" around using Docker, but there is no "right way" to use this technology. Use what is "right" for your environment and avoid platitudes like "you are doing it wrong". Anybody that wants a taste of what I'm talking about should stop by #docker on Freenode and see how people are using it.

foldr · on Feb 2, 2016

I don't know if things have changed in the past few months, but the last time I tried nix it's used too much memory to make it practical to run it within a VM on my laptop.

treeder · on Feb 2, 2016

I"m pretty sure that's not why Docker was created. And Nix is definitely not a simpler solution for packaging/dependency management. It's a good idea, but with all the advantages of Docker (full isolation, flexibility of using any OS and any package manager, easy distribution), why bother with Nix?

matzipan · on Feb 2, 2016

I believe Ubuntu click packages have been developed exactly for this purpose. That isn't to say alternatives aren't good, but maybe you can draw inspiration.

tmd · on Feb 2, 2016

Docker storage engines remember all the layers by storing incremental differences so it makes little sense to have a layer that only removes things (like the last RUN in https://github.com/iron-io/dockers/blob/master/ruby/Dockerfi...). You can see with `docker history <image>` that this layer has 0 bytes and you can access anything that it removes by using the parent image (`docker run -it 88ae7e32865f ls /var/cache/apk`).

In this case it doesn't make much difference (a couple of MB perhaps) but often you'll see people chaining all the commands and ending it with a cleanup in a single RUN like here: https://github.com/docker-library/ruby/blob/ccbf9e54f60ba245...

grkvlt · on Feb 5, 2016

Heh, looks like the author fixed this [1] and it now uses the technique you suggest, based on issue #22 [2] which restates the same thing as your comment, basically chaining everything together with `&&` in a single `RUN` entry.

[1] https://github.com/iron-io/dockers/commit/71687edb849dcb9079... [2] https://github.com/iron-io/dockers/issues/22

lox · on Feb 2, 2016

Do we really need to brand using smaller containers as microcontainers? Why not submit these as pull requests to the official docker images rather than introducing more fragmentation?

Also worth noting that the official docker images are moving towards alpine. Of course if you have ruby or node apps your biggest space hog is still going to be your packages.

TheDong · on Feb 2, 2016

There's also a very good reason to not move some things to alpine, including ruby and node applications. A significant number of ruby and nodejs apps depend on some c/c++ code hidden away in this module or that gem. Any time you're compiling c/c++ code, you run into a chance of musl vs libc causing differences.

These differences can range from building requiring different options to the application having strange performance characteristics to random crashes (rare!).

In addition, people who write these modules typically test against an ubuntu or debian system, so those are much safer bets. Alpine makes no serious attempt to match packages with those, either in breadth nor version.

It really isn't feasible to unilaterally move people, and for compatibility reasons Docker does not often change tags in such a drastic way.

The official docker images are often offering alpine alternatives in different tags, which is not the same as "moving towards" as you remarked.

shykes · on Feb 2, 2016

We are in fact moving towards alpine, but can't just switch the default tags overnight in case some users depend on the properties of certain distros.

ishtu · on Feb 2, 2016

excited to see recently added support[1] of dns search domains in musl (and therefore Alpine).

[1] http://git.musl-libc.org/cgit/musl/commit/?id=3d6e2e477ced37...

kylequest · on Feb 2, 2016

Alpine is an option in some cases, but, like you said, it's not always straight forward. Native application modules, 3rd party OS packages and the overall package echosystem differences are important factors that need to be considered. This is why DockerSlim [1] was created. You use a regular distro like Ubuntu and you still get micro containers! Note that the sample node microcontainer with DockerSlim is 14MB and in this post it's 29MB :-) It uses hapi.js instead of express.js though, but it's probably not the main reason for the size difference.

[1] http://dockersl.im

lox · on Feb 2, 2016

Sure, but you have it the wrong way around. The default should be slim and Ubuntu/Debian/etc should be an alternative.

https://news.ycombinator.com/item?id=11000827

cyphar · on Feb 2, 2016

One solution would be to have a "build" image that produces your binary or set of packages you need and then a "service" image that actually runs it.

TheDong · on Feb 3, 2016

In general, I agree with this, but it specifically doesn't help with issues where glibc vs musl matters.

Either your build image's libc (musl in this case) is statically linked in and you have the aforementioned issues, or it's not statically linked and your binary won't run. Most things compiled within an ubuntu container won't run in an alpine one; you specifically have to go out of your way to make that so.

cdnsteve · on Feb 2, 2016

Agreed. Docker was suppose to inherently be about small containers. Vendors just choose to use the largest Linux base image out there.

AstroJetson · on Feb 2, 2016

I agree, they are still docker images, they have less stuff inside. I didn't know about Alpine, so I'm going to start down that path. Working on Lua based applications, hoping to see if Alpine will work.

philipwood · on Feb 2, 2016

I've always felt that VMs are a sign that show that OSes have imperfectly abstracted the virtualization of their resources. A "design smell" if you will.

When you get right down to it a window is a virtualization of a screen, a packet switching network is a virtualization of a fixed line, a process is a virtualization of a CPU, etc. (I can't remember who I'm quoting).

The direction this is all heading (VMs, containers, unikernels, etc.) feels like confirmation of this to me. A series of corrections aligning our OSes with more ideal/virtual abstractions.

sirtaj · on Feb 2, 2016

Plan 9 was a realization of this line of abstraction, in a lot of ways. Even though I never used the platform for anything serious, I still draw inspiration from it all these years later.

antouank · on Feb 2, 2016

Isn't it the same as using "alpine" as base ? https://news.ycombinator.com/item?id=10782897

woadwarrior01 · on Feb 2, 2016

It looks like their base image is based off of the alpine image[1].

[1]: https://github.com/iron-io/dockers/blob/master/base/Dockerfi...

schappim · on Feb 2, 2016

I have to say I love the work iron.io does. Their "iron worker"[1] service is awesome (think Amazon Lambda for pretty much any common language).

[1] http://www.iron.io/worker/

markherhold · on Feb 2, 2016

Interesting. How long does it take an "iron worker" to start code execution after a webhook (latency)?

treeder · on Feb 2, 2016

Typically a couple seconds.

_Marak_ · on Feb 2, 2016

This seems to solve one of my biggest issues with trying to use individual Dockers to run microservices.

Past experiments proved that it wasn't feasible to match one microservice request to one Docker instance. The size of each Docker required vertical scaling of the microservices per container, which was not ideal.

Good job Iron.io! Might trying playing with these at some point.

TheDong · on Feb 2, 2016

... Why? The scalability difference (in terms of disk space) between having multiple services per container and many containers is, in practice, zero.

That's because of layered filesystems and most of the language runtimes and other images pulling from a small set of base images.

For example, if you have two node applications, the base "nodejs" image is 633MB in size. If both of your applications have "FROM node" at the top and proceed to add different applications (say 2MB each), you'll find the "node" portion is totally shared and only the diff, you copying in your application's files, is different. You'll only store a single copy of the 633MB node image + your two application image's diffs, for a total of 637MB.

That number is the same if you bundle the applications (vertically scale as you say it).

This is also sorta true across language runtimes too! With some creative diffing, I experimentally tested the difference between the ruby and nodejs images. It turns out there's just under 600MB in common. The ruby image has 116MB of data the node one doesn't, and the node one has ~40MB of data the ruby one doesn't.

Again, you'll end up using a very small amount of disk space.

These savings also apply to downloading images after the initial base layers are downloaded (e.g. docker pull ruby; docker pull node will only download roughly 750MB of data, even though that's two >600MB images).

In any case, it's strange that disk space on the order of a couple gigs is even a scaling concern at all; I'd imagine memory or your application's disk needs would far outweigh such concerns.

My main point, however, is that your discussion of one service per container vs multiple services per container being any different in terms of disk space is rubbish and utterly false.

_Marak_ · on Feb 2, 2016

I would assume the smaller images would also result in a smaller memory footprint for running the images and a general reduction in time of starting images. You seem to know a lot of about Docker, is that a wrong assumption?

The scale which I'm discussing is in the order of at least several hundred docker images per second. Previous attempts at making this work involved keep a warm elastic pool of Dockers. I'm working with at least 11 environments, ( which all have separate dependency requirements ).

Instead of trying to manage a very large pool of Dockers, I opted for a smaller pool with several larger servers to scale the microservices vertically ( using tools like chroot to help try to isolate each service per silo ).

My main issue with using Docker for this was the bulk of the containers. Startup time, RAM consumption, and the size of the images were all causing me issues.

lucaspiller · on Feb 2, 2016

Docker isn't a VM, so the memory usage should be pretty much on par with chroot. The only difference is shared libraries will need to be duplicated in each container (as nothing is shared) and loaded into memory multiple times, but that should be on the order of a few megabytes.

davexunit · on Feb 2, 2016

The duplication is worse than that. It's a data structure problem. Docker deals in opaque disk images, a linear, order-dependent sequence of them. The data structure is built this way because Docker has no knowledge of what the dependency graph of an application really is. This greatly limits the space/bandwidth efficiency Docker can ever hope to have. Cache hits are just too infrequent.

So how do we improve? Functional package and configuration management, such as with GNU Guix. In Guix, a package describes its full dependency graph precisely, as does a full-system configuration. Because this is a graph, and because order doesn't matter (thanks to being functional and declarative), packages or systems that conceptually share branches really do share those branches on disk. The consequence of this design, in the context of containers, is that shared dependencies amongst containers running on the same host are deduplicated system-wide. This graph has the nice feature of being inspectable, unlike Docker where it is opaque, and allows for maximum cache hits.

cyphar · on Feb 2, 2016

> The duplication is worse than that. It's a data structure problem. Docker deals in opaque disk images, a linear, order-dependent sequence of them. The data structure is built this way because Docker has no knowledge of what the dependency graph of an application really is. This greatly limits the space/bandwidth efficiency Docker can ever hope to have. Cache hits are just too infrequent.

This is only true when you're building your images. Distributing them doesn't have this problem. And the new content-addressability stuff means that you can get reproducible graphs (read: more dedup).

> So how do we improve? Functional package and configuration management, such as with GNU Guix. In Guix, a package describes its full dependency graph precisely, as does a full-system configuration. Because this is a graph, and because order doesn't matter (thanks to being functional and declarative), packages or systems that conceptually share branches really do share those branches on disk. The consequence of this design, in the context of containers, is that shared dependencies amongst containers running on the same host are deduplicated system-wide. This graph has the nice feature of being inspectable, unlike Docker where it is opaque, and allows for maximum cache hits.

For what it's worth, I would actually like to see proper dependency graph support with Docker. I don't think it'll happen with the current state of Docker, but if we made a fork it might be practical. At SUSE, we're working on doing rebuilds when images change with Portus (which is free software). But there is a more general problem of keeping libraries up to date without rebuilding all of your software when using containers. I was working on a side-project called "docker rebase" (code is on my GitHub) that would allow you to rebase these opaque layers without having to rebuild each one. I'm probably going to keep working on it at some point.

lisivka · on Feb 2, 2016

Your assumptions are wrong. Glibc is faster (and better) than musl. Systemd is faster (and better) than SYSV init scripts.

Moreover, for example, I can update my running containers based on Fedora 23 without restarting container, by issuing "dnf update", which will download updated package from local server, which is much faster that to build container, publish it to hub, download it back, restart container (even when only static files are changed).

bluejekyll · on Feb 2, 2016

> Glibc is faster (and better)

Faster is objective, and in most cases correct. Glibc has a lot mor optimization over the years. Better is subjective and completely depends on your use case:

http://www.etalabs.net/compare_libcs.html

Similar point to systemd, it is kind of misleading to say that it's faster, it is parallel and event driven, which definitely makes it's end to end time shorter on parallel hardware. And again better is subjective, it's so much more complex that it might not always be the right choice.

Also, why use systemd inside a container at all? There's just one process in there usually.

sandGorgon · on Feb 2, 2016

whoa - you have a container with working systemd ? I thought that was an unfixed bug - https://github.com/docker/docker/pull/5773 and https://github.com/docker/docker/issues/3629

lisivka · on Feb 2, 2016

Yep. It have some quirks (hard to shutdown properly), but it works.

    [vlisivka@apollo5 docker-centos7-systemd-unpriv]$ ./enter.sh
    [root@e3c3dd7539ad /]# ps ax
    PID TTY      STAT   TIME COMMAND
        1 ?        Ss     0:00 /usr/lib/systemd/systemd
       71 ?        Ss     0:00 /usr/lib/systemd/systemd-journald
       74 ?        Ss     0:00 bash
       92 ?        R+     0:00 ps ax

https://github.com/vlisivka/docker-centos7-systemd-unpriv

sandGorgon · on Feb 2, 2016

Hmm.. It's not worth it for me right now. I am using supervisord which works brilliantly

_Marak_ · on Feb 2, 2016

So the overhead of using Docker is still to great.

Thanks for the information.

cyphar · on Feb 2, 2016

Docker has very little overhead (apart from all of the setup required to start a container). In principle it has no overhead, but Linux memory accounting has implicit memory overhead (this is a kernel issue, not a Docker issue).

BenjaminCoe · on Feb 2, 2016

I believe Alpine is GPL licensed. Curious what companies are using Alpine? What ramifications does this have on the licensing of a micro service running in Alpine.

danieldk · on Feb 2, 2016

The GPL only comes into effect when you redistribute software. Since most companies will probably use it to deploy their own software, GPL does not matter much in practice.

Besides that, for an application only the licenses of libraries that you link against are relevant.

cyphar · on Feb 2, 2016

You can't use the GPL for an entire distribution. You could argue that it's possible to only package GPL-compatible software and sublicense all of it under the GPL. This is impractical because of MPL and LGPL libraries. But even then the distribution isn't under the GPL, so you could inject an "MIT version" of musl and other non-GPL versions of the rest of the software.

tl;dr: That's not how licenses work.

There have been some interesting questions about packaging Ubuntu and their terms of use. I always found their terms of use odd, because they are basically reinstating trademark law inside a software license.

bmajz · on Feb 2, 2016

While it's nice to have small on-disk containers for some applications (e.g. deployment pipelines/CI), for production, I've found that Alpine doesn't save you much in RAM. I'd love to see this become something people look at as well when evaluating base images. To me at least, this was a far more important constraint when running Docker in production.

bmajz · on Feb 2, 2016

To be fair, the case I was testing was node, so the packages could be killing me. Could be much better for a standalone bundled application.

mullsork · on Feb 2, 2016

I thought everyone was doing this already. The first thing I noticed when pushing my first test version of our Node server was that the images was way above what I thought it should be. It seems really odd that the "official" Docker images are this big. Perhaps to enable easy development within the VMs, and not meaning for these to be distributed?

ohdrat · on Feb 2, 2016

I predict that the term "micro" will be replaced by the term "modular", and that docker will create a replacement for the microcontainer named the modular loader/container.

ecesena · on Feb 2, 2016

Related repo with micro containers for each language: https://github.com/iron-io/dockers

jorgecurio · on Feb 2, 2016

how are you guys using docker btw? I keep hearing about it, I like it but have almost to no reason to use it.

tombh · on Feb 2, 2016

Amongst other things I use it to run Skype on my Linux laptop, cos I don't trust Microsoft.

https://github.com/sameersbn/docker-skype

kasey_junk · on Feb 2, 2016

What trust does docker give you? It seems less secure in this instance than properly permissioning skype or creating a true vm for it?

mikewhy · on Feb 2, 2016

We use it in development to match production environments

We ditched Heroku + S3 for dev/test/staging branches (if your app can be expressed in a docker-compose.yml file you can get a server up with the command `b3cmd server-scaffold`)

Many small tools for random workflows

cyphar · on Feb 2, 2016

You can use it to run desktop apps if you're as anal as me about having a clean host machine. I also use it to run all of my small services (my website, IRC bouncer, etc).

dc_new · on Feb 2, 2016

I use it for dev to more closely mock prod, as well as ensure repeatable builds. It allows me to run all the services we use locally without a separate vm for each.