Inside Docker's “FROM scratch” (embano1.github.io)
147 points by kiyanwang 10 months ago | 29 comments

Unless you're building a base image... doesn't this just take away from the benefits of using Docker? As I understand it, one of the primary goals of containers is to create an isolated environment, with quotas and restrictions on the underlying OS, using Linux namespaces and cgroups. However, one of the great things about Docker is that I can do FROM ubuntu and then, anywhere I run my container, I have my app running in an OS that I'm comfortable with. So I can always run bash inside the container and apt-get whatever I need, debug it, experiment, etc...

I understand the problem with Docker image sizes. I worked at a company where we had a ~1GB image and our CI tool didn't support caching of Docker images, so it would take a good 15 minutes to do a build every time. But when we were faced with the option of using another, smaller OS like Alpine, we decided not to do it because we would give up a lot of flexibility that the OS was providing us.

If you’re running a statically linked binary produced by go and that’s all you want on your pretty much empty image, why not just scp the file and run it manually under a cgroup? Or good ol' chroot/jails/zones?

> If you’re running a statically linked binary produced by go [...]

In a way you are answering your own question. Sure you can give up Docker and use something else, but you are giving up benefits of using the Docker infrastructure and ecosystem.

If you are just using Docker for one app, then yes I agree, but if you have other apps running through Docker then it’s certainly beneficial to do so even for statically linked executables, to keep everything consistent.

This is especially important as an organization grows. Once you start having ops or security teams, different development groups, etc., there's a significant benefit to having one way to manage everything.

A new sysadmin doesn't need to learn that custom way your hand-rolled deployment system handles dependencies, how to see what's supposed to be running on a box, etc. A security person who's wondering whether something is supposed to be listening on a port can at least see that it's something someone went to the trouble of exposing. That QA team or the person who inherited your maintenance coding can avoid learning a new way to ensure they have a clean CI build.

(That doesn't mean that Docker's always the right answer — maybe you've identified an area like networking where there's a reason not to use it — but in most cases the differences are below the threshold where it's worth caring)

Funny, I think using FROM ubuntu takes away the main benefits of Docker, namely the ability to have a perfectly groomed execution environment that is also fully reproducible and has minimal external dependencies. Basically, it's about having immutable infrastructure and infrastructure as code as the core pillars, and making them so smooth that live mutation etc. would be unnecessary.

Weird, if "fully reproducible" and "groomed/minimal" environments were the goal of docker, I'd expect image tags to not be mutable, the "FROM" line in a Dockerfile to only take immutable hash references, and for the default `docker build` environment to have no network access. At the very least, `docker build` should accept and produce a "lock" like file which specifies the source image it came from, and mark anything that does a network request (e.g. `apt-get update`) as tainted and unreproducible.

There are plenty of ways to create reproducible minimal images (such as using the nix package manager to create the rootfs), but the official docker images don't use those techniques and the docker tooling and ecosystem actively fight against it.

Docker is clearly focused on usability / first-user-experience at the expense of reproducibility and immutability. They encourage the use of the 'latest' tag, they encourage the use of procedural network-heavy build steps, and they have made no attempt to promote or create alternative tooling which tackles reproducibility correctly.

But you can already do this.

FROM myregistry.example.com/foo:1.2.3@sha256:<hash>

You can call `docker build --network=none` to disable network access during the build step.

Why would you make a lock file from a Dockerfile? You can specify everything inside it already; from the version you pull with yum to explicitly COPY'ing RPMs/debs you keep locally.
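As a sketch of the point above (the registry name, tag, and package path are placeholders, and the digest is elided): a Dockerfile can pin its base image by digest and install only locally vendored packages, so that a build run with `docker build --network=none` still succeeds.

```dockerfile
# Hypothetical example: registry, tag, digest, and package names are placeholders.
FROM myregistry.example.com/base:1.2.3@sha256:<hash>

# COPY locally vendored packages instead of fetching them at build time,
# so the build works even with --network=none.
COPY vendor/myapp.rpm /tmp/
RUN rpm -i /tmp/myapp.rpm && rm /tmp/myapp.rpm
```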

In today's Docker, the biggest benefit I can see is networking that container with others. Docker networking is quite powerful, and having the container auto-join the network and be routable from any other machine, without even having to know which machine(s) it is actually running on, is a nice thing.

Being able to use consistent tooling with the rest of the containers is another plus.

> But when we were faced with the option of using another smaller OS, like alpine, we decided not do it because we would give up a lot of flexibility that the OS was providing us.

Alpine is getting easier to work with. It's been a while since I've been unable to find a pre-built package for something I needed, for instance.

The straightforward resource isolation is a big deal. CPU/Memory quotas are necessary for distributing containers among a machine pool.

The 'FROM scratch' with a single binary pattern is something I use a lot with stowage.org: basically, I can create a series of containers that allow other developers to easily install and update a dev environment / build toolchain without having to do a bunch of packaging.

That said, I definitely agree that you don't want to do 'FROM scratch' unless you're definitely not re-using the various upper layers. Having a fat base image is a one-time cost that potentially pays itself back many times over.

If you're running multiple containers of images that are themselves derived from a same image, is it still a one-time cost? Is Docker smart enough to run "only one" instance of Ubuntu, for example?

What resources are you concerned about being consumed?

- There will only ever be one running kernel with docker.

- The base filesystem layer, if identically hashed, will be shared as an overlay filesystem.

- The memory footprint of whatever each container runs (which will generally not be a full from runlevel0 system) will not be shared, except in the sense that binaries loaded into ram from the same overlay filesystem will have some of their disk pages shared.

In practice, the base layers update often, and not all apps will be running from the same version.

That's true, but it is something that is potentially within your control as well.

What stormbrew said above. It's probably not an absolutely O(1) cost but in practice I haven't noticed the difference. I definitely have noticed a saving from pushing many different "slim" alpine images around (alpine is not at fault here, but the more differences your images have, the less you get to re-use existing layers).

Agreed, for just one app there's no need to create unneeded overhead. But PROD usually looks different. See the paragraph "Application Environment" here http://queue.acm.org/detail.cfm?id=2898444 for why you would still want to put static binaries in Docker containers.

My current philosophy on this is to always start from scratch if I can. This would be the case where I'm using something that is statically compiled, like a standalone go binary.

If I need more facilities from an OS, then I try to use a micro-distribution like Alpine. This could be because I have a more complex go binary, or if I have a python script that I want to execute.

If Alpine isn't cutting it, then I go for something like Ubuntu. This is typically because Alpine doesn't have some library that I need, or because musl libc isn't behaving properly.
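To illustrate the "scratch if I can" case above, here's a minimal sketch (the paths, package layout, and Go version are assumptions, not from the thread): a multi-stage build compiles a static Go binary and copies only that binary into an empty final image.

```dockerfile
# Build stage: compile a fully static Go binary (CGO disabled).
FROM golang:alpine AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app .

# Final stage: an empty image containing nothing but the binary.
FROM scratch
COPY --from=build /app /app
ENTRYPOINT ["/app"]
```

The resulting image is roughly the size of the binary itself, which is what makes this pattern attractive for Go services.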

Very interesting. Does anyone have an example of a project where they used scratch? It seems to be only useful for building base distribution images.

Static building, as mentioned by others, also works for Rust. Portier builds this way:


This uses the awesome clux/muslrust Docker image as a build environment, then copies the result into a new ‘from scratch’ layer.


Works great with Haskell statically compiled binaries. Running the binary through UPX, I've managed to get small HTTP microservices down to a 2MB Docker image with just scratch.

Works just as well for Go binaries. It's pretty much the recommended base image for distribution of Go apps on Docker. I assume that it would be just as effective for any statically compiled binary.

Edit: I really should have read the article first. It uses Go binaries as the example. Good to know Haskell folks are also using it.

Massive downside: You run your app as root or you have to do nasty mounts of /etc/passwd and /etc/group from your host

You don't have to mount the host versions - you can create container-specific ones.

See https://medium.com/@lizrice/non-privileged-containers-based-...

You can run a Docker container as a particular user.


You can use `setcap` to grant capabilities to the binary or the `pam_cap` module if you need to do capabilities per user.

I haven't run across the need to run most containers as root for a while now.
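A minimal sketch of the container-specific approach mentioned above (the user name, IDs, and binary path are arbitrary): generate passwd/group entries in a throwaway build stage and copy them into the scratch image, then switch to the unprivileged user, instead of mounting the host's files.

```dockerfile
# Build stage: create a dedicated unprivileged user and group.
FROM alpine AS build
RUN addgroup -S app && adduser -S -G app app

# Final stage: copy only the account databases plus the binary.
FROM scratch
COPY --from=build /etc/passwd /etc/group /etc/
COPY myapp /myapp
USER app
ENTRYPOINT ["/myapp"]
```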

Yup, in the end it's an OS process and all the usual rules apply. I did not care too much about Dockerfile best practices in my article. Good point, I should at least have used "user <!root>".

I used FROM scratch to build a docker container for AppFS, which just has 2 files: init, and appfsd


init comes from: http://appfs.rkeene.org/web/artifact/ecb8eda1cfb32ecc

And just sets up some symlinks and starts appfsd, followed by running bash (which is cached and run transparently).

Thx! See my comment above on why you would want to put static binaries in "scratch", i.e. use "scratch" to deploy apps and not just for building base layers as you suggest.

You can, however, compose microservices with scratch.

For example I created a cntlm base image (linked in another comment)

From there I can do

FROM my_base_image
COPY whatever

and then add layers of services:

- the first one is the proxy

- the second could be a queue service (for example http://nsq.io)

- then a message server that just sends notifications

- etc. etc. etc.

The same could be achieved by downloading and configuring the static binaries, but Docker packaging, security, and network separation make everything a little bit easier.
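A sketch of one such service layer (my_base_image and the nsqd binary path are hypothetical, following the cntlm example above): each layer just adds its static binary on top of the shared scratch-based base.

```dockerfile
# Hypothetical: my_base_image is the scratch-based cntlm proxy base layer.
FROM my_base_image

# Add the next service as its own layer: a locally built static nsqd binary.
COPY nsqd /nsqd
EXPOSE 4150 4151
CMD ["/nsqd"]
```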

I have a sidecar pod in kubernetes that runs "kubectl proxy" so the image just has the kubectl binary.
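A sketch of what such a sidecar image could look like (assuming a statically linked kubectl binary is available in the build context):

```dockerfile
# Assumes a static linux/amd64 kubectl binary sits in the build context.
FROM scratch
COPY kubectl /kubectl
ENTRYPOINT ["/kubectl", "proxy"]
```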

I use it as a base image for docker-compose.

