What worries me about Alpine is that it invents YET ANOTHER package manager (apk). I think it claims that it builds on top of Gentoo... but could someone explain why deb or rpm could not be used?
Does using deb or rpm screw up size or security in some way? We could also have had access to the Debian/Fedora repositories in a pinch. Not all of us run statically compiled Go executables; many of us have tons of dependencies that need to be installed for a webapp.
P.S. And the package manager apk is unfortunately named to conflict with IMHO the biggest package repo in the world: Android. It is virtually impossible to Google anything meaningful without running into Android results.
Alpine uses the musl libc and the OpenRC init system. In order for a deb or rpm package to install correctly, the binary would have to be built for musl already, and the necessary information to set up services on the init system would have to be provided. I don't know how the latter works for other package managers, or whether it would be compatible with OpenRC.
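To make the incompatibility concrete: a binary from a deb or rpm built against glibc asks for the glibc dynamic loader, which Alpine doesn't ship. You can see this on any ELF binary (the path below is a placeholder):

    $ readelf -l ./some-glibc-built-binary | grep interpreter
    # a glibc x86-64 build requests /lib64/ld-linux-x86-64.so.2;
    # Alpine only provides /lib/ld-musl-x86_64.so.1, so the binary won't start as-is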
P.S. In my experience APK has been rock-solid and simple to use.
This is a huge blocker - we are already in a world where RPM and DEB don't get released in tandem. And it has taken millions of man-hours to resolve deep problems in dependency resolution, circular dependencies, etc.
For a startup like mine, I already estimate I'm installing hundreds of libraries (Ruby gems, Python wheels, etc). I'm pretty sure some of them are weird dependencies... but I have come to trust apt and dnf.
There is zero reason for me to trust Alpine's new package manager that is managed by a single dev. Disk space is cheap - my time is not.
This is the big blocker - if Alpine can figure out a way to co-exist with apt or dnf (pick your own poison), that makes it compelling.
But then again, I would ask: can something like Alpine be achieved with Debian or CentOS, even if it is 3x the size (30MB)?
EDIT: I run a fat docker VM based on Debian with nginx, rsyslog, postgresql and ruby+puma using supervisord. There is absolutely ZERO need for OpenRC. I have been running this in production for over 18 months, since before Docker was this stable.
> But then again, I would ask: can something like Alpine be achieved with Debian or CentOS, even if it is 3x the size (30MB)?
No. In the past I've tried several times to create a lightweight Debian derivative that would still allow me to install Debian/Ubuntu packages from their repositories. The smallest I've been able to get things without completely breaking everything was around 230MB. This required lots of ugly hacks such as post-install triggers that would remove useless garbage such as /usr/share/doc/ and man pages.
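For what it's worth, the doc/man-page stripping part can also be done with a dpkg config drop-in instead of triggers, roughly:

    # /etc/dpkg/dpkg.cfg.d/01_nodoc
    path-exclude=/usr/share/doc/*
    path-exclude=/usr/share/man/*
    path-exclude=/usr/share/info/*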
It's simply not possible to take the Debian ecosystem and magically turn it into a slim version of itself.
I would never run Alpine stand-alone on a server. But for containers, it's absolutely amazing. I don't need the full Debian ecosystem. All I need is some basic tools such as NPM, Pip or gem. They can take care of the rest. The whole point of containers is to escape from the dependency and package hell that we're currently in.
> The smallest I've been able to get things without completely breaking everything was around 230MB. This required lots of ugly hacks such as post-install triggers that would remove useless garbage such as /usr/share/doc/ and man pages.
You can also use separate build and deploy containers... build in a container with all the tooling you need, export everything needed to run out to a mount point, and then turn that mount point into the new deploy container.
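A minimal sketch of that pattern, assuming a hypothetical app whose build drops its artifacts into ./dist:

    # 1. build inside the fat tooling image, exporting artifacts to a mount point
    $ docker run --rm -v "$PWD/dist:/out" my-build-image \
          sh -c 'make && cp -r build/* /out/'
    # 2. bake only the artifacts into a slim runtime image
    $ docker build -t my-deploy-image -f Dockerfile.deploy .
    # where Dockerfile.deploy is little more than a tiny base plus COPY dist/ /app/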
This is by one of the posters above you. It's called DockerSlim (dockersl.im).
> Here are a few numbers to give you an idea (using an Ubuntu 14 base image):
> node: 431MB -> 14MB
> python: 433MB -> 15MB
> ruby: 406MB -> 14MB
> java: 743MB -> 100MB
> I don't need the full Debian ecosystem. All I need is some basic tools such as NPM, Pip or gem. They can take care of the rest.
What happens when your npm/pip/gem package depends on a C library like libpq or libsasl2? Restricting yourself to pure JS/Python/Ruby code to avoid having a few MBs sitting on the disk sounds like a terrible tradeoff.
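In practice that dependency turns into a layer like this on Alpine (a sketch assuming psycopg2/libpq; package names are Alpine's):

    FROM alpine:3.3
    RUN apk add --no-cache python python-dev py-pip build-base postgresql-dev \
     && pip install psycopg2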
And your reason is ... to share less between instances? The more code in common, the better, isn't it?
When I try to optimize my system of containers, I move everything common into the topmost container, to make the application containers smaller. You are doing the opposite. Why?
The package ecosystem is the biggest challenge for Alpine. It's just not there and there's only so much you can do yourself. You need to have enough critical mass before the majority of the 3rd party providers begin to release packages for Alpine.
There's a way to have small Docker images using the standard distros like Ubuntu where you get to use your favorite packages. DockerSlim [1] was created for this. It collects the artifacts you're actually using in the container, throwing everything else away...
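As far as I understand it, the basic workflow is a single pass over an existing image, roughly:

    $ docker-slim build my-fat-image
    # runs the container, traces what it actually uses, and emits a
    # minimized image (tagged something like my-fat-image.slim)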
Here are a few numbers to give you an idea (using an Ubuntu 14 base image):
node: 431MB -> 14MB
python: 433MB -> 15MB
ruby: 406MB -> 14MB
java: 743MB -> 100MB
It's a work in progress, so there might be corner cases where it's less than ideal. It's not tested well enough with "fat" images. Pull Requests are greatly appreciated :-)
You also get auto-generated Seccomp (already usable) and AppArmor (WIP) profiles, so you don't have to create them manually.
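Applying the generated seccomp profile is then just a run-time flag (the profile and image names here are illustrative):

    $ docker run --security-opt seccomp=./my-app-seccomp.json my-app.slim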
Yes, I'm one of the authors. If you want to contribute too it'll be awesome and it'll be greatly appreciated!
The quoted numbers are based on the set of sample apps you can find in the repository. Take a look in sample/apps. You'll see the code and the Dockerfiles that go along with them.
I haven't thought about publishing the images. Thanks for the idea!
Ehh, I wouldn't worry about it too much. The whole idea behind these skinny distros is that there are few, if any, packages. Distros can also consider using something like rpm-ostree[1] for making skinny, RPM-based immutable file systems where you only update the "master" image and then push it out.
What is the fallback if a package is not available? For example, I was able to find an APK for Python 3.5.1-r0. However, I can find nothing for Python 3.4 except a bug report and some hacky fixes.
Also, it looks like running anything that requires binaries compiled against glibc will be wonky as hell.
> What is the fallback if a package is not available?
Then use another distro if you're not prepared to invest the effort into it. The point of people wanting to move Docker images to distros like Alpine is to minimize size. That matters if you're going for massive density etc., but depending on your use case, using e.g. Ubuntu or Debian or CentOS as a base can be just fine.
It's worthwhile to move the official images to it because they are used so widely. It's not necessarily worth it for every random image someone puts together.
The official images are used so widely precisely because they are trivially extendable using apt. This will not be the case if they are based on something obscure.
There are even more exciting libs like numpy/scipy which rely on a system BLAS package before they can even compile (and take forever). This is a problem that Ubuntu has made less difficult, but a better solution would be package managers for C and Fortran.
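For example, on a Debian-based image that system-level dependency typically looks something like this (package names are Debian/Ubuntu's, and just one way of providing BLAS/LAPACK):

    FROM ubuntu:14.04
    RUN apt-get update && apt-get install -y --no-install-recommends \
          build-essential gfortran python-dev python-pip \
          libopenblas-dev liblapack-dev \
     && pip install numpy scipy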
If the 'alpine experiment' results in more work on portable build tools for complex projects, that's a win for the open source community.
And if it's a proprietary piece of software and you don't have the source code? Or you do have the source, but the build process is too convoluted to figure out?
1. If it is proprietary, we don't need it in docker. If they want to play in this ecosystem, they have to make their source free.
2. If the build process is too convoluted, we try our best to simplify it.
Building something like Mozilla Firefox might take a few hours the first time but it will not always take that long. I for one would fully support this new pro-source software distribution mechanism. We could probably use git tags to find out when we have updates if we could get people to agree on some kind of convention...
Processor vendors should love this change because every server will build all the software it needs from source.
Sure. You don't get to dictate what I run in MY docker containers hosted on MY private registry used in MY environment. If I want to run proprietary software in my docker containers, I damn well will. And I expect Docker not to work against that, if just for the reason that it works fine today, why not tomorrow?
You're more than welcome to build that from source too! Where you get the source code could still be a private, authenticated area. You could choose to never publish your docker files. That's fine. I'm just saying that we should move to a better model where if you distribute software, you should also distribute the source code (and hopefully build tools) for it.
Why is this so difficult? It does not put any constraints on the user that vendors of proprietary software haven't artificially erected.
Talk about creating more problems than you solve. If this were such a good model, why aren't all Linux distros shipped with a minimal set of tools, where the users are given a "go build it all yourself" note on the box? I will tell you why: because it's a suckfest that can drag expert and non-expert Linux swashbucklers alike into the weeds for untold amounts of time, depending on the software that needs to be built.
Just because something is a bad idea today doesn't necessarily mean it will forever remain so. What we have today is far from perfect and I think any effort to branch out is a good idea. In the worst case, we won't be any worse off than we are today.
Of course, my whole idea depends on many things such as the hypothesis that processing and storage will continue to get cheaper with time. I don't know if it will be true. I hope it will though.
"Moving" is a bit of a strong word. It would be much more accurate to say "providing alternatives". For example, the "golang" image now has an "alpine" variant for each supported version (https://hub.docker.com/_/golang/), but the default variant is still Debian-based (especially given that switching the base outright would break far too many existing Dockerfiles). Additionally, the documentation calls out that there might be libc compatibility issues in the spirit of trying to ensure our users are properly informed about the potential problems they might run into in their quest for the smallest possible base: https://github.com/docker-library/docs/blob/b7b6b86124682ef1...
I would definitely welcome PRs to make this verbiage more accurate or more informative of pros and cons.
What's with the weird comparison to the Windows start button? Is that if you encode the button as a BMP? Or is it the size of the compressed vector graphics? Or is it the size of the binary used to implement the start button? Does it include the shared libraries that binary uses? wtf.
Last I'd heard (I believe) the creator of Alpine Linux was looking for work and the future of the project was a bit uncertain. I'm happy to hear that he'll be able to continue working with Alpine Linux and that we'll still be able to make great Docker images with it.
Either way, it seems it's now getting some backing from Docker. I can only imagine that the number of contributors and packages for it will grow and mature.
The blog claims Alpine is based around being secure and lightweight... but gives no indication of why it is secure. Oh, lightweight because of BusyBox? Is there scrutiny of the packages installed? I don't see the security component.
Maybe Docker can reveal more there, though given how they iterate and things break, I'm not sure they are willing (or capable).
From the Alpine linux site: "Alpine Linux was designed with security in mind. The kernel is patched with grsecurity/PaX out of the box, and all userland binaries are compiled as Position Independent Executables (PIE) with stack smashing protection. These proactive security features prevent exploitation of entire classes of zero-day and other vulnerabilities."
I got excited, but then remembered - grsec will not affect containers. Neither will PaX unfortunately. PIE + stack smashing protection is already available in most serious distros. From the basic info I can find, I don't see a huge difference.
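If you want to sanity-check the PIE claim on any given binary, it's a one-liner:

    $ readelf -h /bin/busybox | grep 'Type:'
    # "DYN (Shared object file)" means PIE; "EXEC (Executable file)" means non-PIE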
Having less crap in by default reduces the attack surface area.
Having a smaller libc makes it easier to audit. (It still needs to actually be audited of course)
Why does the size of a base image matter? What happened to the shared layers between images? Did the new file systems completely sacrifice that?
The original filesystem (AUFS) used shared read-only layers, so if two images used the same base image, only their differences contributed to disk usage. I know there has been a lot of work to move to filesystems supported by more kernels, but if shared layers have been sacrificed, that makes me sad.
> Why does the size of a base image matter? What happened to the shared layers between images?
It matters because when bootstrapping new hosts you still need to download all the base images, and because in many systems the base images can come to totally dominate the storage needs.
It still can often save a lot, but it's not enough for a lot of places where people want to use Docker.
If you need a library, you will download it anyway. But you can download it once, in the base image, or multiple times. IMHO, the fatter the _base_ image is, the better.
Ideally, the base image should be a full installation of everything, one large image for all. You just download it, and it just works.
Alpine is pretty nice, though by using it in containers you are not getting the best part of it: the hardened kernel. I'd say Alpine is a better fit for the host OS, where you have a few moving parts.
It will be great for compiled-language base images, but even there it might be tricky if you rely on 3rd party packages. Libc compatibility issues are also real. It's great that they are slowly addressing them though.
These incompatibilities arose from software that was written to rely on glibc-specific name resolution quirks. Neither musl nor docker rely on these misplaced expectations, but unfortunately some projects have decided to blame a standards-compliant libc implementation rather than fix their own software.
I am not familiar enough with the quirks of glibc name resolution to say much more than that a lot of my stuff relies on binaries compiled against it. Who, in your opinion, is to blame here, and what can we do to fix it?
If Alpine does not support the software I depend on, I have no use for it. I would like to use the official Docker containers as far as I can, but I'm most certainly not going to spend time recompiling third-party deps to accomplish that. Shipped products trump "security" and, by god, "size" every time, always.
Great move by the Docker team. The promise of containers has always been to make the environment that processes live in super lightweight, with a minimum of unnecessary binaries and permissions. Having a full Ubuntu installation per container has been a major hazard to that, as it's required the use of overlays and other tricks to avoid having huge disk overhead per container. Moving to a much lighter-weight base image means you need fewer overlays, because you can pay the 5MB cost all day long. It also reduces the attack surface by a lot, for much greater security.
Gives you an image that is ~30M (15M compressed) plus the size of whatever package you are installing. The advantage is that you get access to Fedora's package database with security updates.
This could probably be slimmed down further with custom versions of the base packages.
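Roughly, the recipe is something along these lines (a sketch; the exact invocation and package set will differ):

    # install only what the app needs into a fresh root, then import it as an image
    $ dnf install -y --installroot=/tmp/rootfs --releasever=23 \
          --setopt=install_weak_deps=false nginx
    $ tar -C /tmp/rootfs -c . | docker import - my-nginx-minimal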
Combine that with apt-cacher-ng and whooosh. And now all you have to do is tell Ansible what needs to get done in that chroot. And you have a container that matches production one-on-one.
Has Alpine Linux solved the glibc problem? I recall you can't simply install Oracle Java 7+ on Alpine Linux without a lot of setup first because of the dependency on glibc.
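From what I remember, the "setup" amounts to bolting a glibc compatibility layer on first, roughly one of:

    # thin shim that maps some glibc expectations onto musl;
    # enough for some binaries, not for others
    $ apk add --no-cache libc6-compat
    # or install a full third-party glibc build (e.g. the alpine-pkg-glibc packages),
    # which is what most Oracle-JDK-on-Alpine Dockerfiles end up doing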
Ubuntu is great because people already understand it and probably already know how to install their apps on it, which makes moving to Docker easy. Most businesses unsurprisingly don't like the idea of running on an OS they've never heard of, with tooling their engineers don't have experience with, limited community or enterprise support, and no internal repos.
Alpine may be the 'technically correct' choice but Ubuntu is easily a much better business choice.
Because all you're really doing is increasing your attack surface and wasting storage space. If you have a need for a specific piece of software, you should be able to identify that and include that in your docker image. Starting with a kitchen sink is only good when you're too lazy to spend an hour to understand what your software depends on.
If you run the _same_ application in a container with 20MB of files and in a container with 2000MB of files, how can that affect the attack surface at all? Bytes on disk are just data.
Moreover, if I use a standard RPM package to run a service as a non-root user in a limited environment under systemd, it will be far less risky than running the same service in a container as root - by an order of magnitude.
Containers are not a solution to security problems. Quite often they are a huge security hole.
Fedora 23 minimal image in docker: 43MB in archive.
With tons of packages available (with patches and live maintainers). With a formal stabilization process. With a well-tested package management system (with hundreds of bug fixes over 20 years of use). It can be used both as a host and as a container OS (so you only need to learn and support one OS). With systemd, which handles daemons well. With a well-supported LTS version (RHEL/CentOS). With an option for paid support. With glibc, which is much faster and more feature-rich than musl.
Why should I use Alpine, which cannot even handle versioned dependencies between packages? Literally, I cannot say that package A needs package B >= 3.x or package C < 2.x, which causes serious trouble in complex systems.
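For comparison, that kind of constraint is a one-liner in an RPM spec file:

    # in package A's .spec file
    Requires: B >= 3.0
    Requires: C < 2.0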
Red Hat's Project Atomic "installer ISO" is ~630MB and the uncompressed qcow2 is >900MB.
I was impressed with Atomic's size, but seeing how much smaller Alpine is, I can't help but wonder what all the additional size is in Red Hat's images.
RHEL is an enterprise OS. It is designed to handle various drivers (video, network, storage). It has monitoring, auditing, and reporting stuff. Some of those dependencies bring in others (say the monitor needs a mail client to send messages - ok, install the mail client - oh, looks like that brings in perl, etc.). Then there might be multiple versions of said monitoring. And I think they just never really try to make it small. That is just what their customers pay for.
If Alpine did what RHEL does out of the box it would be hundreds of MBs as well.
Atomic is intended to be used as a host OS and uses RHEL, CentOS, or Fedora as a base typically. And the installer ISO is that large precisely because it bundles hardware, language, and all kinds of other support.
However, if you'd like to craft your own minimal Atomic host, you can.
Making minimal containers is pretty easy, though, since yum/dnf lets you create execution trees that contain only what's needed for an application to run (as others have mentioned).
So, really, doing micro-services on RHEL/CentOS/Fedora hosts is pretty easy.
I agree with snubbing glibc, systemd, and apt-get. Seriously, it is about time to put an end to the bloated fatibubbul fest going on over there! I intend to move my dockers to Alpine. I will only come back to Debian after, and not before, they have downsized, trimmed, and laid off all the useless fat around their bellies!
Dunno if it is about snubbing, but I get the impression that having systemd inside the container makes it damn hard to work with if you do not also have it outside the container. And at that point, systemd is likely to take over control completely rather than play nice with docker.
Na, that's not how it works at all. When you run a docker container, you'll be running your app directly, thus systemd won't be involved and isn't even needed within the container.
The problem with Debian and CentOS is that they weren't created with containers in mind, thus by default their base images pull in a lot of stuff that's required to actually init hardware.
We can see with Fedora that efforts have been made to slim it down, and I suspect that as containers become a popular use case we're going to see smaller base images, but still with access to 30,000 stable packages and a mainstream community.
You might find people attempting to start systemd as PID 1 and then running their containers as a sort of lightweight VM, but that's not really the typical use case.
More common is folks not wanting to deal with splitting up processes that are grouped together, such as an internal GitLab instance that has nginx, unicorn, postgres, ssh, and sidekiq running.
Best practice is arguably that you should be able to run all these processes in different containers and share the directories or sockets where possible, but the pragmatist is probably running them together using supervisor.
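The pragmatist's setup looks roughly like this (program names and paths are illustrative, not GitLab's actual layout):

    ; supervisord.conf - one container, several services under one supervisor
    [supervisord]
    nodaemon=true

    [program:nginx]
    command=/usr/sbin/nginx -g "daemon off;"

    [program:unicorn]
    command=bundle exec unicorn -c /app/config/unicorn.rb
    directory=/app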
Well, the ISO includes the Linux kernel. A Docker image only comes with user space components since the host provides the kernel, so an ISO would need to be larger.
Based on my understanding of the threads on the musl list, part of this was features that have since been implemented in musl, and part was Kubernetes using DNS in ways that are incorrect.
So Alpine Linux has more manpower and resources to maintain kernel and core-library stability and binary compatibility than Ubuntu (with its huge community) or Fedora/RH/CentOS (with its money)?
This is silly nonsense. Before I care about image size or "security", I need to ship my products. After I've accomplished that, I make sure my firewall is still good and if necessary buy $20 more volume space.
You have "solved" a problem that no one has. I don't care about image size, and neither should you. I don't care about container security, and neither should you. How does struggling with a anaemic non-distribution hacked together by a handful of kids that believes software is about size and "security" help me accomplish shipping shit ? It does not.