Hacker News
Super small Docker image based on Alpine Linux (github.com/gliderlabs)
261 points by antouank on Dec 23, 2015 | 159 comments

Size is such a tiny concern. I'm surprised people make such a big deal about it. When all of your images use the same base, it's only a one-time cost anyway.

And there are FAR more important concerns:

- Are the packages in your base system well maintained and updated with security fixes?

- Does your base system have longevity? Will it still be maintained a few years from now?

- Does it handle all of the special corner cases that Docker causes?

That's why I use https://github.com/phusion/baseimage-docker

Sorry - but the phusion images are unnecessarily bloated. Their existence has been defended by 'fixing' many so-called problems that are actually no problem at all - or at least shouldn't be a problem if you know what the hell you're doing. No - well-written software won't spawn zombie processes, sorry. Reaping dead child processes is something pretty basic if you're using "fork".

And then - a logger daemon. Guess mounting /dev/log into a container is too complex if you care about this?
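Sharing the host's syslog socket with a container is indeed just a bind mount. A sketch, where "myapp" is a placeholder image name:

```shell
# Bind-mount the host's /dev/log so syslog() calls inside the
# container go straight to the host's logger daemon.
docker run -d -v /dev/log:/dev/log myapp
```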

Logrotate - sure, useful - but if you care about logs and aren't sending them to your logger daemon or /dev/null, you probably want to store them externally - in a volume or mounted host directory - and have a separate container taking care of that.

The ssh server... Containers are not VMs; if you have to log in to a container running in production, you're doing something wrong - unless that container's only job is running SSH (which can be useful, for example, for Jenkins build slaves).

Cron - again - same thing: run in a separate container and give access to the exact things your cronjob needs.

That is for me the essential thing about containers: separate everything. But sure, you could treat containers as a special VM only for one service - nobody is going to stop you. I however prefer isolating every single process and explicitly telling it how to communicate with other processes. It's sane from many perspectives: security, maintainability, flexibility and speed.

> Containers are not VMs

A container is whatever you want it to be. Single process? Sure. Full OS? Sure. Somewhere in between? Sure.

Containers are not new technology, and they were not invented by Docker or Linux. An artificially-constrained view of what a container is (or should be) that's driven by one tool's marketing (Docker) isn't helpful.

Sorry, but it's not only Docker using 'containers' that way. I'm no fan of systemd for various other reasons - but that is one thing it does correctly: use namespaces aka 'containers' to separate processes.

It simply makes no sense to add unnecessary overhead and complexity to something that is essentially very lightweight. If you want a full-blown OS, a VM is much better suited to that, and modern hypervisors come with a ton of bells and whistles to help you manage full-OS environments.

LXC uses containers in the same manner as VMs. There are still reasons to use a container over a VM. To name a big one: application density. There's a Canonical page about it I can dig up if you want that claims you can get ~14 times the OS density with LXC containers compared to KVM VMs. That allows you to provide a high degree of separation while still allowing you to use more traditional tools to manage it.

Not everyone is of the caliber that tends to browse HN. Not everyone adapts to new technology as quickly as people around here tend to, especially if that new technology requires a huge upheaval in the way that things have been done for the last 10 or 15 years. Using containers the same way we do VMs provides a lot of the benefits of containers without requiring a drastic change from other departments.

Scalability of LXC vs a HW VM was written up by a Canonical engineer here:


I've had up to 512 nested LXC containers running quagga for BGP & OSPF to simulate "the internet". My machine is an i7 laptop, and this used less than 8-10 gigs of RAM to run.

FYI, the GitHub repo of "The Internet" setup was from the 2014 NSEC conference, where they used it so the participants had a large internet routing simulation available to test security.

The github for "The Internet" simulation is here:


"The Internet" creates 1 single LXC parent/master container and then 500+ Nested LXC containers each running quagga & setup for the simulation used.

Containers also have a massive attack surface in comparison with VMs. Modern KVM has a comparable density to containers (except for memory).

I agree on the advantages on LXC though. Many hosting companies use it. Why fix it if it ain't broken?

They're supposedly coming along quite nicely with the security of containers. Can you run docker containers in userspace? It's been a while since I did much with it; I know LXC can, with a fair bit of customization. That would do a lot to help with security, and if you're following good containerization principles you should be able to set up a really finicky IDS that shuts down containers on even the slightest hint of a breach.

> Modern KVM has a comparable density to containers (except for memory)

It does, but the memory can make a big difference if you're running microservices. If I'm guesstimating, there's probably about a 200MB difference in memory usage between a good container image and a VM. With microservices that can grow quite a bit. Say you have 4 microservices and need at least 2 instances of each for redundancy: you're already looking at a difference of 1.6GB of memory. If you need to massively scale those, that's 0.8GB of memory for every host you add, not including any efficiency gains from applications running in containers rather than VMs (which is going to be largely negligible unless we're talking massive scale).

You can create either privileged or unprivileged LXC containers. Creating unprivileged containers only requires a very simple configuration that takes 60 seconds to do.

Here's Stephane Graber's blog on it: https://www.stgraber.org/2014/01/17/lxc-1-0-unprivileged-con...

Also, note that with LXD/LXC the "default" container is now unprivileged. The LXC command syntax has also been simplified even further than traditional LXC, with the added power of being able to orchestrate and manage LXC containers either remotely or locally.
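For reference, the 60-second setup from that post boils down to delegating a range of subordinate UIDs/GIDs to your user and mapping container root onto it. A sketch based on the linked blog, with "myuser" as a placeholder username:

```
# /etc/subuid and /etc/subgid: delegate 65536 subordinate ids
myuser:100000:65536

# ~/.config/lxc/default.conf: map container uids/gids 0-65535
# onto the unprivileged range
lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536
```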


> Can you run docker containers in userspace?

Yes, and it increases the attack surface even more in some scenarios. Now, an unprivileged user can create new namespaces and do all sorts of things which were previously limited to root.

With "clear containers" (very minimal KVM VMs), you get the overhead down to <20MB:


Also, RAM is cheap.

Today you can run Docker in LXC and you can run KVM in an LXC container.

LXC also supports Nested LXC.

LXC 2.0 and LXD 1.0 are scheduled for release sometime around mid to late January.

This will also include support for live migration/CRIU.

LXC (www.linuxcontainers.org) supports AppArmor, SELinux and Seccomp. And as for what's probably the only way of making a container actually safe: LXC has supported user namespaces since the LXC 1.0 release in 2014.

Yeah that's cool, but my main point is that images which make use of the stable debian package system and are actively maintained are a better approach than an image that makes use of more obscure technology that could be abandoned, or worse, maintaining your own container infrastructure.

> No - well-written software won't spawn zombie processes, sorry.

And yet it happens.

> The ssh server... Containers are not VMs; if you have to log in to a container running in production, you're doing something wrong

The SSH server is incredibly useful for diagnosing problems in production, so I for one applaud it (although it's not really necessary anymore with docker exec).

> Cron - again - same thing: run in a separate container and give access to the exact things your cronjob needs.

Or just run it in-container to keep your service clusters together.

> That is for me the essential thing about containers: separate everything.

It's a question of degree. Where you draw the line is almost always a personal, aesthetic choice.

>And yet it happens.

I can understand that argument. It's an edge case, and for me the way to go is building a sane Dockerfile on top of Alpine that runs applications through s6 (or runit), which developers then use for their applications. Isn't this what phusion baked in?

>The SSH server is incredibly useful [...] (although it's not really necessary anymore with docker exec).

It's an additional attack vector and, by your own admission, it's useless. docker exec has been baked into docker for over a year.

>Or just run [cron] in-container to keep your service clusters together.

Per-container cron sounds painful. Then you have to deal with keeping every container's system time in sync with the host (yes, they can deviate). Not only that, if you have a periodic cron job that runs an app to update some database value, scaling becomes bottlenecked and race conditions (and data races) can get introduced. You are prevented from running multiple instances of one application to alleviate load because the container has the side-effect of running some scheduled job. Cron should be separate.

One can also choose the degree to which they want to throw out good practices that prevent them from repeating others' mistakes.

Have you ever seen a container's system time deviate from a host? This makes sense with boot2docker since it runs in a VM but I can't think of a reason this would happen in a container.

Yes, time keeping is up to the host kernel. The time can't deviate in the container.

>> No - well-written software won't spawn zombie processes, sorry.
> And yet it happens.

Strange, I have been running software in docker for almost 2 years in production on 6 docker hosts running a ton of containers these days, and yes - a lot of this software spawns child-processes.

In all this time I have never seen zombie processes, with one major exception: Phusion Passenger running our Redmine instance. If you run this under supervisord as the 'init' process, you indeed notice the init process cleaning up "zombie processes" at startup, like this:

  2015-12-24 01:00:32,273 CRIT reaped unknown pid 600)
  2015-12-24 01:00:34,774 CRIT reaped unknown pid 594)
  2015-12-24 01:00:35,802 CRIT reaped unknown pid 610)

So that case, for me, is the exception, and I do use an init process (supervisord) to run only apache with passenger. Note that using Apache with PHP, or plain Apache, does not leak zombie processes.

Some things you really can't split into one-process-per-container. Like how WAL-E needs to run alongside the Postgres daemon (or at least, I was unable to get it to run otherwise). You might argue you shouldn't run Postgres in a Docker container, but that's just one example of IPC you can't delegate to shared files / TCP ports.

The real problem with splitting things into a bunch of containers is that the story around container orchestration is still poor. Kubernetes is the leader here, but running a production-ready cluster takes some work (besides Google Container Engine, there are some nice turn-key solutions for spinning up a cluster on AWS but they come with short-lived certificates and rigid CloudFormation scripts which create separate VPCs; so you have to setup your own PKI and tweak CloudFormation scripts).

I see no reason why it couldn't run in a separate container. You'd probably have to mount the postgres socket directory and the WAL archive dir into it, but it could be tricky - true. But containers are just a tool. Some things are not suitable to run in containers, don't try to shoe-horn everything into them.

Other than that, there's no problem running postgres itself in a container - as long as your data is stored in a volume that ends up bind-mounted on the local disk, and not on the layered filesystem - otherwise performance will suffer badly.
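A minimal sketch of that setup - the data directory bind-mounted from the host disk so it bypasses the layered filesystem (the host path is illustrative):

```shell
# PGDATA lives on a host directory, not in the image's layers.
docker run -d --name pg \
  -v /srv/pgdata:/var/lib/postgresql/data \
  postgres:9.4
```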

And yes - orchestration - especially at small scale - is still a sore point. All the tools like kubernetes seem to focus on large scale and scaling multiple instances of the same containers - which is not what I and many people need. Something like docker-compose, but in daemon form, would be nice.

Personally, I've run into weird issues sharing sockets and other files that need to be read+write on both containers. One thing is you have to set up the users very carefully/similarly in both containers, due to file ownership issues with bind mounts (UIDs have to align in both containers).

Agreed about not shoehorning things into containers. Redis, for instance, should be run with custom kernel parameters (transparent huge pages disabled), so it doesn't fit well in the container paradigm, since containers share the same kernel.

Agree in general, but you can overdo it with splitting services up. E.g. would you really run an extra container just for a cronjob that runs once a night to e-mail some data from a database? Especially if you run on a platform where you essentially pay per container, that seems like a waste.

Most of the things I described assume you have full control over your host's OS.

For stuff like you mention - maybe you should reconsider using containers at all if you're on a pay-per-container platform? They are just a tool, and certainly don't fit every single use-case. Also - paying per container seems like a silly thing to do, since containers can be very short-lived. Resource-based billing would be a better fit - although that could be tricky to measure, I guess.

I'm currently toying with IBM Bluemix (mostly because they have a relatively big free tier) and they have resource-based billing, but since you can't make containers arbitrarily small and you pay for RAM reserved for a container, it is effectively per container. So even if you only need 1 GB for 30 min every night, you either build something that starts a worker container on schedule or you pay for resources you don't use 98% of the time. I guess other platforms are similar.

But of course, if you can afford to use that in production it probably doesn't matter very much, and you might choose a different platform if it bugs you. Just came to mind because I just was wondering how to split stuff up.

Size of programs, in terms of disk, memory, cpu time, and network usage, is bloated by multiple orders of magnitude by all the confused people who think the only thing that matters is "developer productivity". Maybe 20% is worth sacrificing, maybe 50%, but 100x? 1000x? It all adds up.

One really easy and relevant example, sizes of docker images for running memcached:

  vagrant@dockerdev:~$ sudo docker images | grep memcached
  memcached                     latest              0868b36194d3        2 weeks ago         132.2 MB
  sylvainlasnier/memcached      latest              97a88c3744ef        13 months ago       297.4 MB
  ploxiln/memcached             2015-07-08          aa4a87ee2c05        5 months ago        7.453 MB
(that last one is my own, the other two are the two most popular on docker hub).

As another example, a co-worker recently was working with some (out-of-tree) gstreamer plugins, and the most convenient way to do so was with a docker image in which all the major gstreamer dependencies, the latest version of gstreamer, and the out-of-tree plugins were built from source. The offered image was over 10GB and 30 layers, took quite a while to download, and a surprising number of seconds to run. With just a few tweaks it was reduced to 1.1GB and a handful of layers which runs in less than a second. It was just a total lack of care for efficiency that made it 10x less efficient in every way, enough to actually reduce developer productivity.

Size matters.

> the confused people who think the only thing that matters is "developer productivity".

Developers, especially good developers (or hell, even just competent) are more than worth the effort put into improving their productivity, and the good ones will usually intuitively have a grasp of the XKCD time trade-off graph and reduce or eliminate delays themselves given the chance.

That being said, even in this day and age of extremely cheap cycles, non-volatile and volatile storage, and insane throughput, making something like VM/chroot images smaller can lead to higher productivity in that you can spin them up faster, or spin up tons more in parallel than you would normally think of. Having that option can help shape alternate modes of development and open up possibilities previously undreamt of ("spin up 1000 docker images? Can't do that because they each need 200MB RAM and I only have 32GB of RAM").

Size of cruft aside, there's value in discussing whether such cruft should exist.

It's normal for common tools to be SUID root - it's necessary for operation on a normal machine. Do you really need 30+ SUID binaries inside your Docker container built for one thing?

Docker seems to present an ideal situation for stripping such potential exploit vectors back.

Are you able to share any of the tweaks that were used?

One really easy one: write a shell script to do most of the image building (run by the Dockerfile), instead of adding a bunch of RUN directives in the Dockerfile, especially if you clean up intermediate files with a "make clean" or something. Each directive in the Dockerfile adds a layer, which adds container setup overhead, and also "locks in" all filesystem space usage at that point.
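A sketch of the difference, assuming a hypothetical build.sh in the build context that does the configure/make/install/clean dance:

```
# Many layers - the space freed by 'make clean' stays locked
# into the earlier committed layers:
#   RUN ./configure
#   RUN make && make install
#   RUN make clean
#
# One layer - intermediate files never reach a committed layer:
COPY build.sh /tmp/build.sh
RUN /tmp/build.sh
```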

Container size influences memory usage, which is an important concern:


I'd be careful about drawing conclusions from those tests. We know that the number of bits in a container does not directly influence how much RAM it consumes. Therefore, there must be something the images are doing that consumes memory which is not happening in the "smaller" images. The key would be to find the culpable process or daemon(s).

It could well be due to things like shared libraries. A larger distro will have more options enabled, causing more shared libraries to be linked into the same running processes, and thus more shared libraries to be fully loaded into memory.

A smaller distro might even statically compile most things - Alpine does. If you dynamically link shared libraries, the whole library is loaded into memory to serve the process. If you statically link, only the actually used part of the library is included in the binary.

Statically linked binaries can't share the library in memory between each other like dynamically linked binaries can, but if all your processes are running in separate containers, they won't share those libraries anyway (unless they're all in a VM and the rarely used "samepage merging" is enabled for the VM).

Finally ... simplicity has knock-on effects. Making things simpler and smaller (not easier), and reducing the number of moving parts in the implementation, makes cleaning up more stuff easier.

That's not really how it works. Both executables and shared libraries are mapped into the virtual address space of the process, then only stuff that is actually used will be faulted (read) into physical memory. At page granularity, so yes, there is some bloat due to unused functionality, but it's not as bad as requiring the entire thing to be loaded into memory.

That's an awful lot of conjecture. I'd wager that most of what you would actually be running in a container would not have its memory usage significantly affected by the presence or absence of optional shared libs. I'm with the parent on this; such claims warrant research.

Not really, it was an educated guess, and then a description of how binaries and libraries work on modern unix systems.

Here's a quick demo based on the trivial example of the memcached docker images I mentioned in another thread:

  vagrant@dockerdev:/host/scratch/janus-gateway$ sudo docker run --name=mc_big --detach --publish=11212:11211 --user=nobody sylvainlasnier/memcached /usr/bin/memcached -v -m 64 -c 1024
  vagrant@dockerdev:/host/scratch/janus-gateway$ sudo docker run --name=mc_small --detach --publish=11213:11211 --user=nobody ploxiln/memcached /bin/memcached -v -m 64 -c 1024
  vagrant@dockerdev:/host/scratch/janus-gateway$ top c -b -n1 | grep 'COMMAND\|memcached'
   5984 nobody    20   0  316960   1192    768 S   0.0  0.1   0:00.02 /usr/bin/memcached -v -m 64 -c 1024
   6091 nobody    20   0  305256    780    412 S   0.0  0.0   0:00.00 /bin/memcached -v -m 64 -c 1024
Notice the significant difference in RES (resident set size) and SHR (shared memory). Less trivial processes will have more shared libraries and bigger differences here. Multiply this kind of result times all the contained processes. It adds up.

Sorry, I was responding to your post in the context of logician's "an important concern" assertion. You and jabl are correct technically of course.

Within the context of "an important concern", though: the difference in RES and SHR between the two is about 330KB. I suspect most people wouldn't find that significant, particularly given memcached's common use cases.

Are you sure that is the general case? I have no reason to believe that this is true in general.

No I am actually skeptical myself, I was hoping somebody here would explain the real cause of their findings :)

As a sysadmin, I fully agree. I shudder at the thought of using Alpine Linux (or Arch Linux, or...) in production.

The value that stable, long term support distros provide shouldn't be underestimated.

Using Gentoo stable in production right now. I'm in charge of how long a package is supported now. All execs get a brand new gentoo machine built with binaries compiled by myself.

You wouldn't believe how fast you can get a gentoo machine up and running compared to other distros. Build for a minimum common architecture (all intel binaries are based on Sandy Bridge, all ARM based on Rockchip RK3088), and installing on a new computer is little more than untarring a bunch of binaries to /. My record is 5 minutes for a full KDE Plasma 5.5 software stack.

I explicitly did not mention Gentoo - I know a bunch of people who run it in production. But, for anyone considering doing this: if you're running Gentoo, you're essentially building your own distro, which has massive advantages but is also a huge effort. You're now in charge of security updates, maintenance and QA. What if you leave the company? There are many Debian or Redhat admins, but good luck finding a Gentoo expert.

I train my replacements, much like every Sith should.

We use Alpine Linux for our applications and I like it, and I too shudder at it being used for the entire production system. As a sysadmin, you can still administer the LTS distro that hosts the docker containers and whatever other pieces of the stack you interact with. Alpine Linux containers, like any other container, should host an instance of an application (maybe not even that, depending on how complex the application is) -- not the entire production server, not SSH keys, not iptables, firewall rules, etc.

"Will it still be maintained a few years from now?"

What would you suggest? Debian? Ubuntu LTS?

Both. Debian is the gold standard of long term support, and Ubuntu is a stable company that builds upon this. And that's why I approve of phusion/baseimage being based off it.


Or CentOS 7 (Docker is not supported any more on 6.x).

CentOS is the gold standard of ANCIENT packages. I think they still ship Ruby 1.8.

RHEL (and by extension CentOS) 7 provides Ruby 2.0. And a 3.10 kernel even. If you're running docker with CentOS, this is what you're likely to use.

RHEL/CentOS 6 provides Ruby 1.8.7 and a 2.6.32 kernel. It can be made to run with docker, but it's unsupported and it won't be easy.

RHEL/CentOS 5 provides Ruby 1.8.5. The 2.6.18 kernel it comes with won't even run go binaries such as docker, much less lxc. Yes this is ancient. It was released in 2007 and it will be supported until 2017.

I like this. It makes a lot of sense to use something like Alpine Linux for Docker images. If you're going to build a 'process container' like Docker -- something that does not encourage the same mindset as a traditional container or VM -- it makes sense to start with a stripped-down operating system and then build it up to be exactly what you need.

Perhaps loud suggestions like these are necessary due to a bias in the group that uses Docker: 20% people who really know what they're doing and have chosen Docker for a specific reason, and the hangers-on who try to emulate them by using the same tools.

Docker is an interesting, useful, and extremely overhyped tool. I may be wrong, but I feel like its popularity has caused a bunch of people who don't really need Docker to use it. Besides popularity, they can't really explain why they are using it over something like LXC or FreeBSD jails.

I imagine (again, no data to back this up) that this same large percentage of people are also the ones that just keep using the default Ubuntu image once they finish the "Get Started" tutorial.

I'm glad to see suggestions like this gaining popularity. If you're going to make the most of Docker, I think there's value to be found in really committing to the mindset of a 'purpose-built, no-frills environment for running a single process.'

From what I can tell right now, a huge number of people are using Docker "sort of like a VM but you need more of them, and Git is integrated and you have to tell it to do something or it stops running".

I think one of the main problems with Alpine adoption is due to how the official run-times are set up on the Docker hub.

- Python-slim uses debian:jessie

- Ruby-slim uses debian:jessie

- Node-slim uses debian:jessie

Your web application is probably going to pull in from one of those run-times which automatically sets you up to use jessie.

I'd also like to see someone take a random large project and see if their native extensions compile under Alpine without any other dependencies and to compare the final image size of a real world web app with alpine vs jessie.

It's sort of a micro benchmark to compare it like this because a project with 75 gems/packages and a couple of native extensions that need to be compiled will drastically increase the size of your image, with or without Alpine.

I absolutely do think it's worth optimizing your images, but this seems like something that may end up being quite personal to your app because it will require a bit of tinkering to get everything your app needs to work. I also wouldn't bother doing it until I was constantly pulling them down in production to auto-scale.

actually, in this case it totally makes sense. the issue is that alpine uses musl, and many things only compile for glibc. if you're writing in any of those languages, chances are you're going to install a library that requires some c compilation (yaml parsing, database libraries, numeric processing, etc) and this becomes an issue

*edit: which is what you were saying all along, and this didn't sound enough like "yes I agree"

Have you had trouble with any specific libraries? We're using alpine-based images with statically-linked binaries and haven't had any issues compiling third-party libs. One area you're likely to run into trouble is RPC, but I only discovered that in messing about with something experimental.

The real problem with musl in these environments is its DNS behavior, particularly if you're running on a platform like Kubernetes that uses DNS search domains for service discovery. Not hard to work around, but the workarounds are a bit, er, inelegant. See http://www.openwall.com/lists/musl/2015/09/04/4 and https://github.com/gliderlabs/docker-alpine/issues/8

Yes, I found out Rails containers were problematic with Alpine because therubyracer would segfault when run under musl. It appeared to be a known issue at the time, though I haven't looked into whether it's fixed.

Go has official images based both on alpine and wheezy, so you can choose https://hub.docker.com/_/golang/ in case you have issues with C extensions. Most upstream projects are happy to take patches to work with Musl, and with docker it is much easier to replicate issues than it used to be when you had to install the distro.

Yep, that is true. I can't remember the last time I worked on an app that didn't require compiling 1 or more dependencies.

As for the edit, sorry about that. I edited my comment about a minute after posting it.

Yeah, totally. My rule of thumb is that if it's got C extensions, Debian is fine. If it's compiled, you'll be doing that outside your container anyway (because who wants gcc in prod, aye?), so you may as well copy your 20kb binary into alpine rather than Debian.
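The resulting Dockerfile is about as small as they get. A sketch, where the binary name is a placeholder and the binary is assumed to be statically linked (Alpine ships musl, not glibc):

```
FROM alpine:3.3
COPY myapp /usr/local/bin/myapp
ENTRYPOINT ["/usr/local/bin/myapp"]
```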

I'm pretty new to Docker, so I'm curious about "a project with 75 gems/packages and a couple of native extensions that need to be compiled"...

Is the common procedure in the Docker world to build an application image that includes all the build tools that were used to build native dependencies? That seems like it does generate a pretty large image.

I figured I'd take a three-step approach to my first node.js app in Docker:

1. Build an image to build my dependencies. This uses the same base image as step #2 will, but installs all the development tools and libraries (eg. build-essentials, libpq5-dev), and then outputs a .tar.gz to a shared volume containing my node_modules folder.

2. Build an image with my dependencies; imports the runtime versions of any libraries (eg. libpq5), imports & expands the .tar.gz generated by #1.

3. Build an image with my application, FROM the image in #2.

The process is optimized by having the automation check for the existence of #2 by hashing the contents of the relevant Dockerfiles, and the package.json list of dependencies, and doing a `docker pull` with that hash to see if I've already built #2. If so, my build just needs to build #3.
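The cache check can be as small as a helper like this - a sketch with placeholder file and registry names; the docker usage is shown as comments since it needs a daemon and a registry:

```shell
#!/bin/sh
# Derive a deterministic tag from the dependency inputs, so an
# unchanged Dockerfile + package.json maps to an already-built image.
deps_tag() {
  cat "$@" | sha256sum | cut -c1-12
}

# Usage (placeholder registry; requires docker):
#   TAG=$(deps_tag Dockerfile.deps package.json)
#   docker pull "registry.example.com/app-deps:$TAG" || {
#     docker build -f Dockerfile.deps -t "registry.example.com/app-deps:$TAG" .
#     docker push "registry.example.com/app-deps:$TAG"
#   }
```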

It's a bit more complex (Hello, everything in Docker-land), but ends up being pretty powerful. But your post makes me think I've over-complexified it a bit.

My suggestion is to build a package installer for your app and use that to build the final image. For example, we use fpm (running in a container) to build .deb packages, then we push those to an apt repository (artifactory) and then build images downstream using apt-get.

Initially we did a lot of cloning from source and compiling/installing dependencies, but it's very slow, there's a lot of wasted time in rebuilding identical code, and it's hard to provide patches and upgrades to customers.

Yours isn't overly complex; it's one way to trim down an image. However, it is a lot more complicated than just defining one Dockerfile that at least copies in your package.json file separately to speed up future deploys that don't touch your packages.

I guess I just don't see the time vs. effort value in optimizing most smaller projects.

For example, that 75 gem project may take 5 minutes to build once but after that it takes 10 seconds to build and push a version that updates the app code.

I'm ok with this pattern for most of my projects because you can easily get by with 1 host to serve tens of thousands of requests a month on most typical projects. It's not like I'm spinning up and destroying dozens of instances a day where the network overhead is a legit concern (if I were, then I would optimize).

I simply build all required packages externally to the container and then bake the resulting binaries into the image by adding the requisite file trees. Fairly easy to script after the first two or three attempts, really.

Alpine is a great example of a lightweight, container first approach.

Please stop using Ubuntu as your base images people!

There's no excuse for using a full Linux distribution (especially one that's really aimed at the desktop) for a container unless you're doing a staged migration or something along those lines.

* Edit: Formatting.

No excuse? How about it works well enough? It's a semi-consensus, if you've used any other Docker image you already have the base downloaded, the distribution is familiar and everything offers Ubuntu debs. If you like Alpine that's cool, but why do people always have to make strong prescriptive claims out of their opinions?

"and everything offers Ubuntu debs"

This, great hardware support, and decent stability are the main reason I use Debian-based distros. Almost anything is a deb away.

> Please stop using Ubuntu as your base images people!

I think Ubuntu images are a symptom of a much more serious disease: Ubuntu usage in general. Ubuntu is not really concerned with software freedom (its origin was Debian-plus-proprietary-blobs), nor does it strongly care about privacy (although it can be shamed into doing the right thing), nor does it care terribly much about getting along with everyone else (c.f. Mir vs. Wayland).

As a distro for my family, it's fine. But I expect my fellow developers to run something which indicates more technological prowess than does Ubuntu: Debian or Arch or Gentoo or Slackware are all good choices for different reasons.

Except that Ubuntu provided a decent Linux on my desktop that is relatively polished to be used by a normal human being.

It did more than Debian, Red Hat, Mandriva, Gentoo and others could do, even though they had a longer head start.

Once I run that on my desktop, I don't really want to learn another distro, I'll just use that on the server as well.

> run something which indicates more technological prowess than does Ubuntu:

There's one difference in how I develop -- I don't develop to show my technological prowess; in fact when I do that, I start making mistakes and generate complicated and hard to maintain systems.

> Except that Ubuntu provided a decent Linux on my desktop that is relatively polished to be used by a normal human being.

They did, and they should be congratulated for that. I like to believe that Debian learnt a hard lesson from its long delay.

> Once I run that on my desktop, I don't really want to learn another distro, I'll just use that the server as well.

Ubuntu on the server has essentially been Debian unstable-ish. It's not really a case of learning another distro.

Your argument would also apply to running OS X Server, and I don't think anyone outside of Cupertino thinks that's a good idea…

> There one difference between how I develop -- I don't develop to show my technological prowess

If you like, substitute 'competence' for 'prowess.' Running Ubuntu is like running Windows: it's popular; it's not really wrong; it even has advantages; but running Windows doesn't indicate any level of competence. In Bayesian terms, P(competence | Ubuntu) < P(competence | ~Ubuntu).

> Your argument would also apply to running OS X Server,

And it does! Old work had a few in house servers with OS X. If it was free, we'd see a lot more of it, I am convinced.

> If you like, substitute 'competence' for 'prowess.' Running Ubuntu is like running Windows: it's popular;

Isn't the ability to quickly ship a stable, reliable product that customers are happy to pay for a better sign of competence than, say, picking Slackware or FreeBSD for a server for no good reason except to show competence?

The question is who is the show of competence for? Other developers, customers, management? I can see developers boasting who knows how to configure and run obscure distros and use exotic functional languages and that's cool. I was just saying after a while you realize that show of prowess is not what is important.

> Isn't the ability to quickly ship a stable, reliable product that customers are happy to pay for a better sign of competence than, say, picking Slackware or FreeBSD for a server for no good reason except to show competence?

Sure! What I'm saying is that if someone is unable or unwilling to run something other than Ubuntu then I suspect he is less likely to be able to build that stable, reliable product in the first place.

It's like how I suspect I'm likely to have a better meal if the cook prepares it from fresh ingredients than if everything comes pre-made off of a truck.


    Hey, what does 'Ubuntu' mean anyway?
    It's a South African word meaning, "doesn't know how to install Debian"

I've run Ubuntu for some years, mostly because software was newer in Ubuntu than Debian (which I've used before).

With One-Service-Per-Docker I think about migrating back to Debian, as I can choose the version of my service myself (with curl if needed) and don't need the newest versions in my base image.

I have the feeling with Docker over time Debian might have a comeback.

Why not? Let's run the numbers.

* http://wiki.alpinelinux.org/wiki/Special:ActiveUsers

14 active users. That is not a lot. In fact, that is tiny. Maybe their redmine has more? http://bugs.alpinelinux.org/projects/alpine says 16 users in the 'developers' group.

But maybe they're very pro-active on the mailinglist? Let's check their security announce list:

* http://lists.alpinelinux.org/alpine-security/

1 message. From 2009. Hmm. Well, their alpine-devel list then? 5110 messages in 10 years -- about 9 messages per week. By comparison: the debian developers mailinglist had 492 messages in November 2015 alone.

So even though Alpine Linux looks nice and lean, it is maintained by a very small group of developers.

Now. If building your own container were very very hard, I'd sure understand grasping for something like Alpine.

But here's how you build an Ubuntu Trusty container:

     debootstrap --variant="minbase" --include="systemd-sysv" trusty ${TMPDIR} ${UBUNTU_MIRROR}
     chroot ${TMPDIR}
     # inside the chroot:
     dpkg-divert --local --rename --add /sbin/initctl
     ln -sf /bin/true /sbin/initctl
     dpkg-divert --local --rename --add /usr/bin/ischroot
     ln -sf /bin/true /usr/bin/ischroot
Just tarball it and throw it in 'docker import'. You're done.
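Spelled out, the tarball-and-import step could look like this (it assumes ${TMPDIR} was populated by the debootstrap snippet above; the image tag is made up):

```shell
# Stream the chroot tree straight into docker import; no intermediate
# tarball file needed. "local/trusty-base" is an invented tag.
tar -C "${TMPDIR}" -c . | docker import - local/trusty-base:latest

# Quick smoke test that the image actually works:
docker run --rm local/trusty-base:latest cat /etc/lsb-release
```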

Need to add or remove software? Use Ansible to configure specific containers with specific configurations.

Need security-updates? chroot into the folder, apt-get update; apt-get upgrade.

Throw the new tarball into docker import again. You use long-term supported methods and systems. There are tens of thousands of packages. Bazillions of PPAs to use.

Edit: typo in code snippet

OpenBSD is likewise developed by a small number of users. A Linux distribution is arguably easier to maintain for a smaller group because the core components are developed upstream.

Alpine does well at making security measures like SELinux accessible. Meta-distributions like Debian serve a different purpose.

Alpine Linux supporting selinux is only relevant in this discussion if you run Alpine as the container host. To the Alpine containers it is of no consequence.

OpenBSD has more than a great track record on security, maintainability, community spirit. As has Debian.

Alpine, after ten years, was simply not on the radar as a distro.

It is merely developers that do not seem to care about the actual systems these containers are built from that find Alpine interesting.

It's small, so even on a 3g connection you can download those containers and get the functionality a developer seeks. Fast. And that is fine. It gets alpha code out in a timely manner without too many resources.

Just do not pretend that this way of working will deliver sustainable, maintainable and consistent code that will work just as well inside as well as outside containers.

Maintained, secure, stable and proven distributions have served any purpose given in the past. From embedded systems to HPCs, from trading floors to satellites.

Saying any of the "old school" distro's are a bad fit for running in a container is a display of ignorance at best.

quantity-of-people-involved is a terrible, irrelevant metric for code quality.

Maybe, but it's a fantastic metric when guessing how many undiscovered bugs and security vulnerabilities there are.

GP also goes into some detail about the amount of discussion and updates to Alpine Linux, which are excellent metrics for code quality.

Nonsense, it's just a straight up ridiculously uncorrelated, terrible metric. Which would you say has been more buggy and broken, djbdns or php? nacl or mysql? qnx or openssl? windows 95 or ping? Which one of each do you think has had more "discussion and updates"? Which one of you think is better code?

Code quality has nothing to do with how much jibber-jabber there is on some mailing list, nor with how widely used a piece of code is. It has to do with the actual code.

In the case of Alpine Linux (which I've never used), probably 50% of the code is the linux kernel itself, another 20% is musl and busybox, and the rest is random gnu utilities. Which of those things is 'low quality' and has 'undiscovered bugs and security vulnerabilities' that broken, random, low-quality high-politics tire fires like most linux distributions don't have?

But conversely, is it not intrinsically obvious that not having the grotesque pile of random freshman desktop apps and terrible init systems that other distros have, could reduce the attack surface to a point where a single organization could conceivably make sense of it?

You are correct on all points concerning the quality of code of Alpine Linux. I do not doubt it. But it is irrelevant to the discussion. The Linux kernel is not part of the containers that are based off of Alpine. That is the whole point of this level of virtualization: sharing the kernel.

Furthermore, the problem I have with Alpine-based containers is that if you use them as the basis of the tooling that builds your own product, your product will have a hard time staying maintainable, sustainable and secure.

I've had developers doing make; make install in Dockerfiles just because Alpine doesn't have some library or version packaged.

Containerization brings all manner of sweetness to the table, but the current way it is used is a throwback to 1998.

Not having desktop software inside a small container does reduce the attack surface. Debian, Ubuntu, Centos can handle that requirement just fine. What is your point?

Your sentences don't make sense next to each other. If you're unable to point to any fault in the quality of Alpine Linux, then why are you trying to create FUD about how Alpine Linux is unmaintainable, unsustainable, and insecure? Could you maybe, instead of just repeating it over and over without evidence, provide some example of how Alpine is concretely any one of those things?

While you're at it, please show me the Debian, Ubuntu, or CentOS distribution that doesn't have desktop bus installed. I'll wait.

Apologies if I am unclear.

> why are you trying to create FUD about how Alpine Linux is unmaintainable, unsustainable, and insecure?

I never tried to make that claim.

What I am trying to say is that if YOU built YOUR software against Alpine, IT will be hard to maintain/sustain/insecure. Because your software will probably have dependencies. Dependencies not found in Alpine. And now you have to maintain and test those dependencies. You'll have to keep informed on all the security advisories of those dependencies. All the changelogs. And by then, you've started to reinvent wheels that the fine folks of Debian, Ubuntu, Centos have invented already.

That is a resource drain on companies that is inefficient and cumbersome with little to no added value.

> While you're at it, please show me the Debian, Ubuntu, or CentOS distribution that doesn't have desktop bus installed.

A container is not the same beast as a distribution. It does not have the same requirements. It is just a tarball. And you can throw anything into it, or out of it.

I'm just saying to use debootstrap to throw stuff in that tarball so you have the benefits of an enterprise-level, proven distribution, instead of using something that has not yet proven itself. So if you ever need to take your software OUT of the container and run it on an AWS instance, or on your own hardware, you'll have no problem with it.

In short: I see no added value for Alpine. It does not address my operational concerns, and raises a bucketload of new ones when I compare it to Debian, Ubuntu or Centos.

That makes a little more sense, thanks. Although, I will disagree that your software will probably have dependencies that are not in Alpine; I tested it out and installed a large software stack and found no such issue. And I think you radically underestimate the crumminess and incompetence of the, e.g., debian package system. Nevertheless, good luck with your systems and Merry Christmas.

Who said anything about the quality of their code?

We took alpine, added the oracle jre, and use it as a base image for our clojure apps. Seems to be a reasonable choice for that use-case since you just need something that will start your jar.

I'd have thought the same would be true for apps written in Go.

You can actually take things way further in Go: https://blog.codeship.com/building-minimal-docker-containers...

You're probably right, but have you ever thought that maybe Debian has so much mailing list activity because it is such a huge project? There's something to be said for lightweight software. Of course, workaday devs should use the accepted best practices, but can't people experiment with better techniques until they have been proven out?

A huge community and installable packages does not mean you need to use all those packages. The smallest container I've built is about 90MB, using Ubuntu. That is pretty lightweight. Of course, the container doesn't actually do anything....

Another thing to consider: if you software works inside you self-built Ubuntu container you can be pretty sure it works on any Ubuntu install anywhere. Even if your company does not use containers everywhere, your developers can.

Edit: typo and sentence finishing.

Alpine Linux has been a fantastic choice for docker images. Small, light and has a package manager that keeps getting better. I had one big snag using it with kubernetes, DNS based service discovery didn't work (https://bugs.alpinelinux.org/issues/4371). The work around was fairly simple and used env vars, but if you dig on this bug I'm not sure when (if ever) that will get fixed.

Interestingly, alpine doesn't use systemd.

I'd like to point out that a docker container, usually doesn't run an init system. [I get that OP was talking about standalone alpine]

We use phusion baseimage, which does have an init system (runit). We have been happy with that. Our customers are not ready for us to provide our product as 6 distinct linked docker containers. The image we build is more like a VM in that it contains multiple processes (pg, nginx, and some apps/services).

If you're providing an all-in-one solution like that, it sounds like you would be better served by just providing an OVF image to them.

Or better yet, keep 6 distinct containers like you should, and use docker-compose to spin them up together. Makes upgrades easier because you can simply point to a newer version of whatever service containers you're using without having to take out everything.

If you're using containers as a VM, you're essentially adding overhead for no good reason.

I am not sure if I saw it on HN recently, but there is an article by Phusion [1] talking about the init process and how to handle it with Docker. They have a base image, baseimage-docker [2], that is supposed to solve that problem.

[1]: https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zomb... [2]: https://phusion.github.io/baseimage-docker/

I've used this a bunch. Its great. Runit is super simple too.

I remember reading centos tried to make systemd available in docker containers briefly but went back on it due to bugs (not sure if they were ever resolved). I've seen containers that need init management using supervisord.

Why is this downvoted? It is interesting (and worth pointing out), since not using systemd is different from every "enterprise" Linux distro out there.

Sorry, that was me. When I read it I thought it was a snarky 'systemd vs bla' comment; upon re-reading I see that you could well be correct, so I apologise for that. I will be less hasty next time.

Actually, I take it back -- you are probably wiser than me, pre-empting the systemd battles that always ensue :-D.

Alpine really wouldn't fall into the Enterprise category due to its small size and developer pool. It's a great community project that I wouldn't doubt has some form of corporate financial support.

Personally I don't miss it. IMHO the alpine image is perfect to run single services that don't have too many dependencies. On your host system you can use systemd to spin up these services as docker instances (if you don't need scheduling).

Does upstream systemd build on musl now?

Alpine is so cool. grsec, musl libc, openRC. If you are sad about the direction most distros are headed, Alpine represents a good stab at "the way things should be".

Love Alpine. The biggest use case I've found for small images so far is for testing a microservices system on Travis or a similar CI. You package each of your services in a container, then use docker-compose[1] to start everything up. The faster it can pull down the images, the faster your CI build runs, so size can be important here.

One interesting thing about Alpine is that it uses MUSL[2] for its libc. If you want the bare minimum image size, you can use a scratch or busybox image and statically compile your binaries[3].

[1] https://docs.docker.com/compose/

[2] http://www.musl-libc.org/

[3] http://blog.xebia.com/create-the-smallest-possible-docker-co...
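The static-binary approach in [3] boils down to something like the following sketch, with made-up names; CGO_ENABLED=0 forces a binary with no libc dependency at all (neither glibc nor musl), which is what lets it run on an empty scratch base:

```shell
# Build a fully static linux binary for the current package
CGO_ENABLED=0 GOOS=linux go build -a -o app .

# Wrap it in an empty base image ("myapp-tiny" is an invented tag)
cat > Dockerfile.scratch <<'EOF'
FROM scratch
COPY app /app
ENTRYPOINT ["/app"]
EOF
docker build -f Dockerfile.scratch -t myapp-tiny .
```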

For simple containers, this is a great base image. I have been using it for an rsync server for a few months. https://github.com/Thrilleratplay/docker-ssh-rsync

The only problem is that the package library is not as extensive as Ubuntu or Arch. https://pkgs.alpinelinux.org/packages

For anyone who is using Digital Ocean, I have written up a little guide on how to install Alpine directly onto a droplet:


When packaging up various npm components, BusyBox was recommended to me as a great solution for creating containers with a low overhead -- I eventually stumbled across Alpine (which is built on top of BusyBox) and have been really happy.

Alpine has a tiny footprint, which is great for wrapping Node.js which itself is tiny; But wait, there's more, Alpine has a great package-manager similar to apt, called apk -- this is what sold me on it over BusyBox.

Alpine looks great and has grsecurity baked in.

I've been struggling to find a lightweight PHP image, they're all huge. I'm starting to see a big advantage for Go in the container world where I've seen containers 10MB in size instead of the 200-600MB I'm getting in Python, PHP or Ruby. The nature of these dynamic languages and their libraries I guess.

Grsecurity is a patch on Linux. The kernel. As far as I know this does nothing for a container unless your container host runs Alpine.

Having been running Docker in a production system now for 6+ months, the size of the images is really a non-issue if you're using the Docker layer cache effectively. We use all of the official language images and builds are under a minute, with deploys happening in ~5-10 seconds (machines generally only need to download a single layer).

Alpine linux is great, however lack of JDK (needed for Java/Scala/Clojure/JRuby/Groovy projects) keeps me from using it. Hopefully this can be fixed soon :)

I remember seeing a blog post[1] about getting JRE running on Alpine containers.

[1]: https://developer.atlassian.com/blog/2015/08/minimal-java-do...

Why can't you build an image with the JDK based on Alpine and use that as your base image for everything else?

glibc. alpine uses musl, and from what I understand this is actually not a thing you can do very easily. not that I use the jvm in any way (thankfully)

at the very least, it's probably not worth spending too much time on, because you'd get better optimisations from doing other things

Alpine has Sable, and you can also compile your own OpenJDK (if it doesn't have it already). There are plenty of reasons to use Alpine and just as many not to use it, but lack of a JVM isn't one of them. (Pretty much the same with Docker -- it added nothing of apparent value to me beyond what LXC did, or BSD jails like someone above mentioned, but I'm sure it provides value to some people.)

This is pretty helpful and what I've used.


I haven't yet used Alpine, but from its wiki it seems you can easily create the packages yourself?

What's the difference between FROM gliderlabs/alpine & FROM alpine:latest ?

Love Alpine though I did run into an annoying nginx permission issue: http://lists.alpinelinux.org/alpine-user/0002.html

They're more or less the same, just that gliderlabs/alpine includes the apk-install convenience script that removes the apk cache for you.

Although with 3.3 that isn't necessary anymore if you use the --no-cache flag.
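For example, with Alpine 3.3's apk-tools the cache-cleanup dance collapses into a single flag (a minimal config sketch):

```dockerfile
# --no-cache fetches the package index on the fly and leaves nothing in
# /var/cache/apk, so no separate "rm -rf /var/cache/apk/*" step is needed
FROM alpine:3.3
RUN apk add --no-cache curl
```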

In what context would the 100MB-ish saved help a lot? It feels like such a tiny amount compared to modern storage capacity (the cheapest SSDs are like 100GB). And I wouldn't think of tiny devices (raspberry pi, etc) as good candidates for hosting lots of docker containers.

It's a good practice nonetheless

It's not only about size (though getting something that's 10x-20x smaller is helpful)

It's about getting only what you need, reducing security issues, disk usage, memory usage, etc

Makes sense.

Impressive, I can't think of another time I've seen someone just acknowledge a response like that on the interwebs. Most Impressive.

in this case, it has little to do with storage cost and more to do with network transfer. when you're deploying an Ubuntu container to several nodes, your startup time is probably < 1sec for the app, and a lot more pulling down the image (most of which is totally wasted time)

the thing is, what do you gain out of using Ubuntu over alpine? chances are that its very little. the gains of using alpine are a more efficient, faster deployment system
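A rough back-of-envelope for the pull-time claim above — the image sizes are ballpark figures and the link speed is an assumption, not a measurement:

```shell
# ~188 MB ubuntu base vs ~5 MB alpine base over a 100 Mbit/s link
# (ballpark sizes; integer arithmetic, so results round down)
ubuntu_mb=188; alpine_mb=5; link_mbit=100
echo "ubuntu: $(( ubuntu_mb * 8 / link_mbit ))s"
echo "alpine: $(( alpine_mb * 8 / link_mbit ))s"
```

That's seconds versus roughly nothing, per node, on every fresh pull.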

But in a datacenter, the connection is most likely 1Gbit. Isn't the download time of the image insignificant compared to the boot time of linux?

well, if you have, say, 10 nodes on AWS connecting to the docker hub the speed is definitely an issue, because, let's be honest here, neither is very fast at all

*edit: also, downloads don't just happen in a data centre. chances are your (or many) office connections just really... well, are not very good. also, think of Australia. please think of Australia (our internet is something of a dire situation)

I'm using slugrunner from flynn [0] for deploying my apps. This way I can share a base image, and each compiled slug is about 40mb for ruby apps and 10mb for golang apps. This is similar to how heroku works.

When I deploy, I generate the slug using slugbuilder, push it to a local storage on the same network, and each docker task is instructed to pull the "latest" slug from the slug storage. Containers start after a code update in a couple of seconds.

Continuous deployment can be easy achieved by copying slug from staging to production, similar to how pull docker image each time is currently done.

[0] https://github.com/flynn/flynn/tree/master/slugrunner [1] https://github.com/flynn/flynn/tree/master/slugbuilder

For me it shaves several minutes (about 80% of the total time) off a build/deploy cycle, which is huge.

build time and upload time (huge savings sometimes)

Once you have the base image though, there should be no real difference between alpine / debian / pick your poison. Docker isn't uploading the base image for every single push, and if you've pulled any other container that uses that base image, it's similarly not re-downloading it.

But there are cases in which you'll have to decide not to use alpine. If your application and libraries need glibc support, you are pretty much on your own - and you have to patch the build scripts to use the busybox tools rather than the GNU ones [e.g., sed and other tools].

You can install the Gnu tools in alpine, at which point they replace the busybox versions. They are all packaged up, just not installed by default.
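A minimal sketch of that, assuming the package names as they appear in the Alpine index; once installed, the GNU versions take precedence over the corresponding busybox applets:

```dockerfile
FROM alpine:3.3
RUN apk add --no-cache bash coreutils grep sed
```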

Alpine Linux looks great. Unfortunately there is an outstanding issue (https://github.com/gliderlabs/docker-alpine/issues/8) with its DNS implementation that makes it tough to use in Kubernetes or similar environment that uses DNS for service discovery.

It does look like a fix is on the way though.

Interesting to see that Alpine, which began life as an embedded Linux project, is now being used for server deployments. This has to be a win for the open source style of projects. Also reminds me of the time when Kroah-Hartman mentioned that people working on embedded Linux ended up improving the power efficiency of Linux, saving the data center guys tons of money.

> Also reminds me of the time when Kroah-Hartman mentioned that people working on embedded linux ended up improving power efficiency of Linux saving the data center guys tons of money.

This right here. There's so many here decrying Alpine who can't see the bigger picture: having options for different Docker deployments will create possibilities currently undreamt of. Maybe you can't use it on your project; fine, keep on keepin' on with what suits you best, but don't knock another project just for a different approach, especially when it might have huge benefits to the overall environment in the future.

Fun fact: the creator of Alpine Linux, Natanael Copa works at Docker :)

Although not clear from the GitHub link, this is actually the official Docker image of Alpine: https://hub.docker.com/_/alpine/ - see the Dockerfile links.

If you like Alpine and it works for you, great, but it's not always an option. If you use Ubuntu and you still want small containers there are a number of options. DockerSlim (http://dockersl.im) is one of these options. A sample node app container built from Ubuntu is around 14MB.

Is that used heavily in production? Looking at a container based on usage, and slimming sounds very risky to me.

Would Alpine be the best idea in-general to use as a replacement for Ubuntu even when not using Docker? What are you giving up by going with Alpine?

Since I've never even heard of Alpine before today, I'd say you are at the very least giving up on a community of users and support, but maybe Alpine will gain a large community and this is the start?

It would be nice if most of the package manager and basic shell commands could be kept out of the docker image too. I think this would require a utilities image that could be mounted during the build of the docker image. Some images would still require a shell for startup scripts or other tasks, but they can include them when needed.

> Alpine Linux has a much more complete and up to date package index

> ...

> (5/5) Installing nodejs (0.10.33-r0)

They should really update their examples, as that makes it look worse than it actually is (the current repository contains 4.2.3).

And while I'm at it: "use % as a wildcard" on the package page? Really?

What I like about Alpine is grsec being built in, and personally I like grsec's new distribution model. Remove openssh, put in tinyssh (curve25519), OSSEC, Hiawatha, a few other tweaks, and you have a pretty secure system. I wish they were more GNU friendly, but still...

Anybody using Alpine for a minimal Desktop OS? How does it compare to something like Arch?

I gave it a try, but it's a bit less well documented and has fewer packages than Arch.

If heading this route, perhaps Guix or Nix are a nice option too. As you can get something declarative and with traceable builds.

I'm hoping a minimal distro that has a nix-like package manager gets mainstream soon.

You can use the Guix package manager on top of any distribution and get declarative builds. https://www.gnu.org/software/guix/manual/html_node/Invoking-... Like the main GNU page says, it's not distribution ready, but it's close. In fact if you run it on top of Arch (just install the minimal requirements via pacman, then defer to guix) or Alpine you've pretty much got your minimal distribution with declarative package management.

Thanks! I'll give it a try then and maybe Nix too!

Of course, all the HN comments on a page about making a small Docker image are just bitching about the person making a small Docker image, because of <reasons> that would all make it not a small Docker image any more.

It's been great for us. We use it for deploying applications such as curl or netcat in containers without overhead of Ubuntu potentially being downloaded onto a host.

Alpine's size combined with its simple package manager is great, I use it for easy image pull, but of course it lacks some packages.

Except, well, I don't care. All my images derive from ubuntu, so I pay the download cost once, due to docker having this whole delta thing going on, and storage cost really isn't a problem. Developers know how to apt-get usually, so that's less support cost on me. In fact, most of them run ubuntu themselves, which is a massive help with dev to prod parity. Oh and of course, there are more packages and a whole lot more forums/community etc etc.

> so I pay the download cost once

Not if you're spinning up new hosts via autoscaling.

Sure. I'm still paying it once per box lifetime, which really isn't too bad. And given I'm orchestrating stuff etc etc, I can use fairly large boxes and get quite a lot of work out of one lifetime.

This thread needs to get nuked by admins. The amount of "downvote because I disagree" and "upvote whatever because hype" is astounding. Your response is an example: it's better articulated than many others [0], [1] and yet it was downvoted to hell. Groupthink is strong.

[0]: https://news.ycombinator.com/item?id=10783021 [1]: https://news.ycombinator.com/item?id=10782946

You're seriously suggesting censorship as a counter to "groupthink"?

Alpine Linux has packages for Inkscape.

This runs against the notion of minimalism in the context of a harness for containers in my view.

I'm currently using rancherOS, it's awesome!!

It comes with its own package format, naturally.

Sounds great. Will definitely give a try !

For anyone wondering what "APK" is, I think it's the Android packager. Interesting choice!


Different apk format. Alpine linux apk is just a tar.gz of a signing key, .APKBUILD file and the files rooted in /
