Unless building a base image... doesn't this just take away from the benefits of using Docker? If I understand correctly, one of the primary goals of containers is to create an isolated environment, with quotas and restrictions on the underlying OS, by using Linux namespaces and cgroups. But one of the great things about Docker is that I can do FROM ubuntu and then, anywhere I run my container, I have my app running in an OS that I'm comfortable with. So I can always run bash inside the container, apt-get whatever I need, debug it, experiment, etc...
I understand the problem with Docker image sizes. I worked at a company where we had a ~1GB image and our CI tool didn't support caching of Docker images, so it would take a good 15 minutes to do a build every time. But when we were faced with the option of using another smaller OS, like alpine, we decided not to do it because we would give up a lot of flexibility that the OS was providing us.
If you're running a statically linked binary produced by Go and that's all you want on your pretty much empty image, why not just scp the file and run it manually under a cgroup? Or good ol' chroot/jails/zones?
> If you’re running a statically linked binary produced by go [...]
In a way you are answering your own question. Sure you can give up Docker and use something else, but you are giving up benefits of using the Docker infrastructure and ecosystem.
If you are just using Docker for one app, then yes I agree, but if you have other apps running through Docker then it’s certainly beneficial to do so even for statically linked executables, to keep everything consistent.
This is especially important once an organization grows much. Once you start having ops or security teams, different development groups, etc. there's a significant benefit to having one way to manage everything.
A new sysadmin doesn't need to learn that custom way your hand-rolled deployment system handles dependencies, how to see what's supposed to be running on a box, etc. A security person who's wondering whether something is supposed to be listening on a port can at least see that it's something someone went to the trouble of exposing. That QA team or the person who inherited your maintenance coding can avoid learning a new way to ensure they have a clean CI build.
(That doesn't mean that Docker's always the right answer — maybe you've identified an area like networking where there's a reason not to use it — but in most cases the differences are below the threshold where it's worth caring)
Funny, I think using FROM ubuntu takes away the main benefits of Docker, namely the ability to have a perfectly groomed execution environment that is also fully reproducible and has minimal external dependencies. Basically, having immutable infrastructure and infrastructure as code as the core pillars, and making them so smooth that live mutation etc. becomes unnecessary.
Weird, if "fully reproducible" and "groomed/minimal" environments were the goal of docker, I'd expect image tags to not be mutable, the "FROM" line in a Dockerfile to only take immutable hash references, and for the default `docker build` environment to have no network access.
At the very least, `docker build` should accept and produce a "lock" like file which specifies the source image it came from, and mark anything that does a network request (e.g. `apt-get update`) as tainted and unreproducible.
There are plenty of ways to create reproducible minimal images (such as using the nix package manager to create the rootfs), but the official docker images don't use those techniques and the docker tooling and ecosystem actively fight against it.
Docker is clearly focused on usability / first-user experience at the expense of reproducibility and immutability. They encourage the use of the 'latest' tag, they encourage the use of procedural network-heavy build steps, and they have made no attempt to promote or create alternative tooling which tackles reproducibility correctly.
FROM myregistry.example.com/foo:1.2.3@sha256:<hash>
You can call `docker build --network=none` to disable network access during the build step.
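Roughly, putting those two together (image names, the tag, and the digest here are placeholders):

    # Dockerfile: base pinned by digest, nothing fetched during the build
    FROM myregistry.example.com/foo:1.2.3@sha256:<hash>
    COPY app /app
    ENTRYPOINT ["/app"]

    # build with network access for RUN steps disabled, so anything that
    # secretly needs the network fails loudly instead of silently drifting
    docker build --network=none -t myapp:1.0 .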
Why would you make a lock file from a Dockerfile? You can specify everything inside it already; from the version you pull with yum to explicitly COPY'ing RPMs/debs you keep locally.
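Something like this, just to make the point concrete (package names and versions are made up):

    # pin the exact version instead of taking whatever the mirror serves today
    RUN yum install -y nginx-1.20.1

    # or keep the package in the build context and skip the network entirely
    COPY vendor/nginx-1.20.1.x86_64.rpm /tmp/
    RUN rpm -ivh /tmp/nginx-1.20.1.x86_64.rpm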
In today's Docker the biggest benefit I can see is networking that container with others. Docker networking is quite powerful, and having the container auto-join a network and be routable from any other machine, without even having to know which machine(s) it is actually running on, is a nice thing.
Being able to use consistent tooling with the rest of the containers is another plus.
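A rough single-host sketch of the name-based routing (image and network names are just examples); with swarm and an overlay network the same reachability-by-name extends across machines:

    docker network create backend
    docker run -d --name db  --network backend postgres:15
    docker run -d --name api --network backend myorg/api:latest
    # inside "api", the database resolves by name as db:5432;
    # the app never needs to know an IP or a host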
> But when we were faced with the option of using another smaller OS, like alpine, we decided not to do it because we would give up a lot of flexibility that the OS was providing us.
Alpine is getting easier to work with. It's been a while since I've failed to find a pre-built package for something I needed, for instance.
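For instance, most day-to-day things are one apk line away (the packages here are just an example):

    # --no-cache keeps the apk index out of the image layer
    RUN apk add --no-cache curl ca-certificates tzdata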
The 'FROM scratch' with a single binary pattern is something I use a lot with stowage.org: basically, I can create a series of containers that allow other developers to easily install and update a dev environment / build toolchain without having to do a bunch of packaging.
That said, I definitely agree that you don't want to do 'FROM scratch' unless you're definitely not re-using the various upper layers. Having a fat base image is a one-time cost that potentially pays itself back many times over.
If you're running multiple containers of images that are themselves derived from a same image, is it still a one-time cost? Is Docker smart enough to run "only one" instance of Ubuntu, for example?
What resources are you concerned about being consumed?
- There will only ever be one running kernel with docker.
- The base filesystem layer, if identically hashed, will be shared as an overlay filesystem.
- The memory footprint of whatever each container runs (which will generally not be a full from runlevel0 system) will not be shared, except in the sense that binaries loaded into ram from the same overlay filesystem will have some of their disk pages shared.
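One way to see the sharing (image names are examples): build two images whose Dockerfiles both start with FROM ubuntu:22.04 and compare their layers; the common layers are stored and mounted only once.

    docker build -t app-a ./a
    docker build -t app-b ./b
    docker inspect --format '{{.RootFS.Layers}}' app-a app-b
    # the leading layer digests are identical, so the base is stored once
    docker system df -v
    # the verbose view reports shared size separately from each image's unique size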
What stormbrew said above. It's probably not strictly an O(1) cost, but in practice I haven't noticed the difference. I definitely have noticed a cost from pushing many different "slim" alpine images around (alpine is not at fault here, but the more differences your images have, the less you get to re-use existing layers).
Agreed, for just one app there's no need to create unneeded overhead. But PROD usually looks different. See the paragraph "Application Environment" here http://queue.acm.org/detail.cfm?id=2898444 for why you would still want to put static binaries in Docker containers.
My current philosophy on this is to always start from scratch if I can. This would be the case where I'm using something that is statically compiled, or a standalone Go binary.
If I need more facilities from an OS, then I try to use a micro-distribution like Alpine. This could be because I have a more complex Go binary, or because I have a Python script that I want to execute.
If Alpine isn't cutting it, then I go for something like Ubuntu. This is typically because Alpine doesn't have some library that I need, or because musl libc isn't behaving properly.
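Concretely, the first two tiers tend to look something like this (paths and versions are just placeholders), using a multi-stage build so the toolchain never ends up in the shipped image:

    # build stage: full Go toolchain
    FROM golang:1.21 AS build
    WORKDIR /src
    COPY . .
    RUN CGO_ENABLED=0 go build -o /out/server .

    # tier 1: statically linked binary and nothing else
    FROM scratch
    COPY --from=build /out/server /server
    ENTRYPOINT ["/server"]

    # tier 2 (alternative final stage): alpine when a shell, apk packages,
    # or CA certificates are needed
    #FROM alpine:3.19
    #RUN apk add --no-cache ca-certificates
    #COPY --from=build /out/server /server
    #ENTRYPOINT ["/server"]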
Works great with Haskell statically compiled binaries. Running the binary through UPX, I've managed to get small HTTP microservices down to a 2MB Docker image with just scratch.
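Roughly what that pipeline looks like (the binary name is a placeholder, and this assumes the upx package is available for your platform): compress the static binary in a throwaway stage, then copy only the result onto scratch.

    FROM alpine:3.19 AS compress
    RUN apk add --no-cache upx
    COPY server /server
    # --best trades compression time for the smallest output
    RUN upx --best /server

    FROM scratch
    COPY --from=compress /server /server
    ENTRYPOINT ["/server"]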
Works just as well for Go binaries. It's pretty much the recommended base image for distribution of Go apps on Docker. I assume that it would be just as effective for any statically compiled binary.
Edit: I really should have read the article first. It uses Go binaries as the example. Good to know Haskell folks are also using it.
Yup, in the end it's an OS process and all rules apply. I did not care too much about Dockerfile best practices in my article. Good point, I should at least have used a non-root USER.
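A minimal version of that fix: scratch has no adduser, but USER accepts a numeric uid:gid, so the process still doesn't have to run as root.

    FROM scratch
    COPY server /server
    # run as an arbitrary unprivileged uid instead of uid 0
    USER 65534:65534
    ENTRYPOINT ["/server"]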
Thx! See my comment above on why you would want to put static binaries in "scratch", i.e. use "scratch" to deploy apps and not just for building base layers as you suggest.
You can however compose micro services with scratch
For example I created a cntlm base image (linked in another comment)
From there I can do
FROM my_base_image
COPY whatever
and then add layers of services
first one is proxy
second one could be queue service (for example http://nsq.io)
then a message server, that just sends notifications
etc. etc. etc.
The same could be achieved by downloading and configuring the static binaries by hand, but Docker packaging, security, and network separation make everything a little bit easier.