Kaniko: Build container images in Kubernetes (googleblog.com)
261 points by dlor 6 months ago | 45 comments



This work (and related efforts like Img and Buildah) is a big deal.

Right now docker images and Dockerfiles are joined at the hip to the Docker daemon.

It works great for local development, but for hosted systems that run on containers, it's a dire mess. I have personally slammed head-first into Docker-in-Docker quagmires on Kubernetes and Concourse. Not knowing the particular arcane rites and having neither sufficient eye of newt nor sufficient patience to get it to work, I like everyone else in the universe gave up.

Not an acceptable state of affairs, given the many problems of Dockerfiles in themselves. Dockerfiles force an ugly choice. You can have ease of development or you can have fast, safe production images. But you can't really have both.

Kaniko is another step toward separating three things: Docker images as a means of distributing bits, Dockerfiles as a means of describing Docker images, and Docker daemons as a means of assembling those images. All three are different and should no longer be conflated.

Disclosure: I work for Pivotal, we have a lot of stuff that does stuff with containers.


> Not knowing the particular arcane rites and having neither sufficient eye of newt nor sufficient patience to get it to work, I like everyone else in the universe gave up.

One thing I feel like more people need to know: Docker container-images are really not that hard to build "manually", without using Docker. Just because Docker itself builds images by repeatedly invoking `docker run` and then snapshotting the new layers, people think that's what their build tools need to do as well. No! You just need to have the files you want, and know the config you want, and the ability to build a tar file.

Here's a look inside an average one-layer Docker image:

    $ mkdir busybox_image; cd busybox_image
    $ docker pull busybox:latest
    $ docker save busybox:latest | tar x
    $ tree
    .
    ├── 8ac48589692a53a9b8c2d1ceaa6b402665aa7fe667ba51ccc03002300856d8c7.json
    ├── f4752d3dbb207ca444ab74169ca5e21c5a47085c4aba49e367315bd4ca3a91ba
    │   ├── VERSION
    │   ├── json
    │   └── layer.tar
    ├── manifest.json
    └── repositories

    1 directory, 6 files

• `repositories` contains the tag refs that will be imported when you `docker load` this archive;

• `manifest.json` contains the declarations needed for the daemon to unpack the layers into its storage backend (just a listing of the layer.tar files, basically);

• the SHA-named config file specifies how to reconstruct a container from this archive, if you dumped it from a container (and I believe it's optional when constructing a "fresh" archive for `docker load`ing);

Each SHA-named layer directory contains:

• a `layer.tar` file, which is what you'd expect, e.g.:

    -rwxr-xr-x  0 0      0     1037528 16 May  2017 bin/bash

• a `json` file, specifying (the patch of!) the container config that that layer creates. (If you're composing a docker image from scratch, you just need the one layer, so you don't have to worry about the patching semantics.)

That's pretty much it. Make a directory that looks like that, tar it up, and `docker load` will accept it and turn it into something you can `docker push` to a registry. No need to have the privileges required to run docker containers (i.e. unshare(2)) in your environment. (And `docker load` and `docker push` work fine without a working Docker execution backend, IIRC.)
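To make the "directory plus tar" idea concrete, here is a minimal Python sketch that assembles a one-layer archive in memory. It is an illustration under assumptions, not a verified recipe: the layer directory name `layer0`, the `hello:latest` tag and the `amd64`/`linux` values are made up, and it assumes that `manifest.json` plus the config file is enough for `docker load`, leaving out the `repositories`, `VERSION` and per-layer `json` files shown in the listing above. It has not been checked against any particular Docker release.

```python
import hashlib
import io
import json
import tarfile


def _add_bytes(tar, name, data, mode=0o644):
    """Add an in-memory byte string to an open tar archive."""
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    info.mode = mode
    tar.addfile(info, io.BytesIO(data))


def build_image_archive(files, repo_tag, cmd):
    """Return the bytes of a one-layer image archive meant for `docker load`.

    files:    dict of path inside the image -> file content (bytes)
    repo_tag: tag to import, e.g. "hello:latest" (hypothetical name)
    cmd:      list of strings used as the container's default command
    """
    # 1. The layer is a plain, uncompressed tar of the root filesystem.
    layer_buf = io.BytesIO()
    with tarfile.open(fileobj=layer_buf, mode="w") as layer:
        for path, data in sorted(files.items()):
            _add_bytes(layer, path, data, mode=0o755)
    layer_bytes = layer_buf.getvalue()
    diff_id = "sha256:" + hashlib.sha256(layer_bytes).hexdigest()

    # 2. The image config refers to the layer by the digest of its tar.
    config = {
        "architecture": "amd64",  # assumed target platform
        "os": "linux",
        "config": {"Cmd": cmd},
        "rootfs": {"type": "layers", "diff_ids": [diff_id]},
    }
    config_bytes = json.dumps(config, sort_keys=True).encode()
    config_name = hashlib.sha256(config_bytes).hexdigest() + ".json"

    # 3. manifest.json ties the config, the tags and the layers together.
    layer_path = "layer0/layer.tar"  # assumed: any path the manifest lists
    manifest = [
        {"Config": config_name, "RepoTags": [repo_tag], "Layers": [layer_path]}
    ]

    out = io.BytesIO()
    with tarfile.open(fileobj=out, mode="w") as image:
        _add_bytes(image, config_name, config_bytes)
        _add_bytes(image, layer_path, layer_bytes)
        _add_bytes(image, "manifest.json", json.dumps(manifest).encode())
    return out.getvalue()
```

Writing the returned bytes to a file and piping that file to `docker load` would be the intended use. Nothing in the sketch runs a container or needs any privileges.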


+1000 on docker images being easier to construct than it seems.

I wrote a few blog posts a while ago on this where I reimplemented docker pull and push in bash: https://www.danlorenc.com/posts/containers-part-1/

It even got up to basic image modification support.

Disclosure: I work on kaniko and lots of other things that construct docker images without docker at Google.


See also: Docker image spec 1.2 https://github.com/moby/moby/blob/master/image/spec/v1.2.md

Not to be confused with the similarly named “Image Manifest V2, Schema 2”: https://docs.docker.com/registry/spec/manifest-v2-2/

Oh and there’s the OCI spec too: https://github.com/opencontainers/image-spec/blob/master/spe... (and its repo)

Really enjoying all these Docker-demystifying posts. I wish I’d found these months ago while puzzling over how Dockerless worked from confusing bzl files. In fact, I really wanted Kaniko just yesterday as I pored over DinD and wondered why it was so complicated to build Docker images within Kubernetes. Now all we need is a lightweight CI wrapper for K8S jobs with GitHub webhook and kubectl apply support! :)


This unpacks a series of tarballs fetched with curl, but it's not clear how it would correctly handle file deletions.


In the OCI image spec file deletions are handled with special whiteout files... See: https://github.com/opencontainers/image-spec/blob/master/lay...
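For readers who have not met whiteouts before, here is a small Python model of how a layer with whiteout entries is applied on top of the layers below it. It is a sketch, not an extractor: the filesystem is modelled as a dict of relative path to content, only regular files are represented, and the function name `apply_layer` is made up. The two marker names (`.wh.<name>` to delete one entry, `.wh..wh..opq` to hide everything the lower layers put in a directory) are the ones the OCI layer spec defines.

```python
import posixpath

WHITEOUT_PREFIX = ".wh."
OPAQUE_MARKER = ".wh..wh..opq"


def apply_layer(rootfs, layer):
    """Apply one layer changeset to a filesystem and return the new state.

    Both arguments are dicts of relative path -> file content. A directory
    exists implicitly whenever a file lives underneath it.
    """
    result = dict(rootfs)

    def remove_under(prefix):
        # Drop every lower-layer entry whose path starts with `prefix`.
        for existing in list(result):
            if existing.startswith(prefix):
                del result[existing]

    # Deletions are resolved against the lower layers first ...
    for path in layer:
        parent, name = posixpath.split(path)
        if name == OPAQUE_MARKER:
            # Opaque whiteout: hide all lower-layer content of `parent`.
            remove_under(parent + "/" if parent else "")
        elif name.startswith(WHITEOUT_PREFIX):
            target = posixpath.join(parent, name[len(WHITEOUT_PREFIX):])
            result.pop(target, None)    # the entry itself, if it is a file
            remove_under(target + "/")  # or everything below it, if a dir

    # ... then the layer's own files are added. Markers never reach the result.
    for path, content in layer.items():
        if not posixpath.basename(path).startswith(WHITEOUT_PREFIX):
            result[path] = content
    return result
```

A script that simply untars each layer in order skips the first loop, which is why deleted files would reappear in its output.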


> Docker container-images are really not that hard to build "manually", without using Docker.

I've sometimes described docker images as a collection of tarballs stickytaped together with a JSON manifest.

I understand that the format is simple, but I don't want to write such a tool. I want such a tool to exist and be in wide usage, so I have assurance that it will keep up with changes, receive security scrutiny and receive improvements in features and performance.

This is a question of economics. I can write software to do anything software can do. Whether that makes sense is a different question. Until now, it has not made sense for me or for many others. Given Google's clear interest in prying apart the tangle and willingness to assign fulltime engineering to it, there is a chance that we can all get out of the quagmire.


> `docker load` them, `docker push` to a registry. No need to run docker in docker.

This is missing the point.

The point of the tool is to do docker builds + pushes on Kubernetes (or inside other containerized environments) securely.

If you can `docker load/push`, that means you have access to a docker daemon. If that daemon is not docker-in-docker, you have root on the machine since access to the docker.sock is trivially the same as root.

As such, to do `docker load` + `docker push` in a containerized environment reasonably securely, you do need docker-in-docker (which is probably insecure anyway if the container still needs to be privileged).

In addition, sure you can piece together a tarball, but the point of this tool is backwards compatibility with Dockerfiles, not to be able to manually piece things together.


I wasn't trying to argue against the existence of this product; I was, like I said, trying to make a separate point—that people don't realize it's very simple to manually construct Docker images, and that this kind of pipeline may be preferable to a Dockerfile-based one for some CI environments. (And, in such cases, you really didn't need to be waiting around for something like this to exist. You could have reached CI/CD nirvana long ago!)

> If you can `docker load/push`, that means you have access to a docker daemon.

Yes†, but by manually creating a container image, you've decoupled CI from CD: you no longer need to actually have a trustworthy execution sandbox on the machine that does the `docker push`-ing, because that machine never does any `docker run`-ing. It doesn't need, itself, to be docker-in-docker. It can just be a raw VM that has the docker daemon installed (sitting beside your K8s cluster), that receives webhook requests to download these tarballs, and then `docker load`s them and `docker push`es them.

---

† Though, consider:

• You can talk to a Docker registry without a Docker daemon. The Docker daemon<->Docker registry protocol is just a protocol. You can write another client for it. (Or, you can just carve the registry-client library out of Docker and re-use it as a Go library in your own Go code.)

• You can parse and execute every line of a Dockerfile just as `docker build` does, without a running Docker daemon, as long as none of those lines is a RUN command. Many application container-images (as opposed to platform container-images) indeed do no RUNing. You've already got a compiled static binary from earlier in your CI pipeline; you just want it "in Docker" now. Or you don't have a build step at all; you're just "composing" a container by e.g. burning some config files and a static website into an Nginx instance. In either of these cases, you might have a Dockerfile with no RUN at all.

Combine the two considerations, and you could design and implement a `docker`-compatible executable that supports `docker build` and `docker push`, without doing anything related to containers!

(The simplest way to do this, of course, would be to just take the docker client binary—which is, handily, already the same binary as the docker daemon binary—and make it so the Docker client spawns its own Docker daemon as a thread on each invocation. Add some logic for filesystem-exclusive locking of the Docker state dir; and remove all the logic for the execution driver. Remove the libcontainer dependency altogether. And remove `RUN` as a valid `docker build` command. There: you've got a "standalone Docker client" you can run unprivileged.)
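As a rough illustration of the "no RUN" condition, the following Python sketch lists the instructions in a Dockerfile and reports whether any of them is `RUN`, that is, whether the build could in principle be done by file shuffling alone. It is deliberately simplified and both function names are made up: it handles comments and backslash line continuations, but not heredocs or the `# escape=` parser directive, and `ONBUILD RUN` shows up as `ONBUILD`, which is fine here because it does not execute during this build.

```python
def dockerfile_instructions(text):
    """Return the instruction keywords of a Dockerfile, upper-cased, in order."""
    keywords = []
    continuing = False  # True while inside a backslash-continued instruction
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments neither start nor end an instruction
        if not continuing:
            keywords.append(line.split(None, 1)[0].upper())
        continuing = line.endswith("\\")
    return keywords


def needs_container_runtime(text):
    """True if any instruction is RUN, the one step that must execute code."""
    return "RUN" in dockerfile_instructions(text)
```

A CI job could use a check like this to route RUN-free Dockerfiles to an unprivileged builder and everything else to a full one.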


- We've published libraries to interact with the registry without docker or the docker CLI, which we use in these projects

https://github.com/google/go-containerregistry

https://github.com/google/containerregistry

- Our team has built something exactly like you're describing https://github.com/GoogleCloudPlatform/distroless

Dockerfiles without RUN commands are technically more correct: reproducible, much easier to inspect. However, it's quite limiting for the existing corpus of Dockerfiles.

I like to think of kaniko as the (pull) + build + push decoupling of the docker monolith. Other tools, like cri-o, have implemented the complement (pull + run).

Disclaimer: I work on kaniko and some of these other tools at Google


What should one replace the RUN instructions with? Say I have:

`RUN apt-get update` and then a `RUN apt-get install -y pkgX pkgY..pkgN`

I could download each package beforehand, tar em and use docker save, but I'd want the recursive dependency tree of packages too....


That's what https://github.com/kubernetes/contrib/tree/master/go2docker does, without even constructing the transient directory.


This is one of the things that has me really interested in ansible-container. I don't like investing in Dockerfiles when there's still so much other stuff that has to go on AROUND the container itself.

The concept behind ansible-container - having the ability to create Docker, LXC, LXD or any future type or flavor of container...from Ansible playbooks...that you're already able to use to configure entire VMs or bare metal machines just feels like a much more efficient use of ops resources.

Ansible becomes portable across everything.

https://www.ansible.com/integrations/containers/ansible-cont...


I like reading and writing dockerfiles, not so much with most other tools.


I actually have that working... The trick is DIND wipes /tmp (for no reason whatsoever) on startup, which also wipes out the concourse build dir. You need a custom version of DIND with the startup script set up not to wipe /tmp.


We’re using docker for development, but we still have to take the leap into production. The whole build/push/pull part is rather confusing somehow. I tried docker hub or docker cloud build as it’s now called(?), but the build itself takes forever... what are people using these days??

Also for development machines, how do you sync things between developers? I can commit a Dockerfile change, but unless I explicitly tell docker-compose to rebuild my images and containers, it will happily stick to the old version. I have to keep nagging our (3) developers to do this from time to time... what am I doing wrong?? Sorry if these are dumb questions but we’re still stuck with the basics it seems.


If you're still struggling with the build workflow, it's probably not yet the right time to take that leap.

It's not rocket science, of course. You build an image somewhere (your local machine, a CI server, anywhere), push to a registry, and when you want to run the image, you pull from the registry and run it. ("docker run" will, by default, automatically pull when you ask it to run something.)

I don't quite understand what your Compose problem is. Is the Compose file referencing images published to, say, Docker Hub? If so, the image obviously has to be built and published beforehand. However, it's also possible to run Compose against local checkouts, then run "docker-compose up --build", e.g.:

    version: '3.2'
    services:
      mainApp:
        build:
          context: .
      service1:
        build:
          context: ../service1
      service2:
        build:
          context: ../service2

and so on.

There's a whole ecosystem of tools built around Docker for building, testing, deploying and orchestrating Docker applications. Kubernetes is one. If you're having issues with the Docker basics, however, I wouldn't consider any of these systems quite yet, although you should consider automating your building and testing with a CI (continuous integration) system, rather than making your devs build and test on their local machines.

As with anything, to actually use Docker in production you'll need an ops person/team that knows how to run it. That could be something as simple as a manual "docker run" or a manual "docker-compose", to something much more complex such as Kubernetes. This is the complicated part.


The problem I was referring to with docker-compose:

let's say I update my Dockerfile and change from `FROM ruby:2.3.4` to `FROM ruby:2.5.1` and commit the Dockerfile change, merge it to master, etc.

Our developers have to remember to manually run "docker-compose up --build", or to remove their old containers and create new ones, which would get them rebuilt... I couldn't find something that would warn them if they're running off of stale images, or better, simply build them automatically when the Dockerfile changes.

Part of the benefits of docker is creating a repeatable environment with all sub-components on all dev machines. Isn't it?

Maybe our devs should only pull remote images and never build them, but then wouldn't I have the same problem that docker-compose won't force or remind the developers to pull unless they explicitly tell it to? And also, isn't this detaching the development process around the Dockerfiles/builds themselves from the rest of the dev process??


If you run with "docker-compose up --build", it should automatically build. This requires that any app you want to work on references the local Dockerfile, not a published one, the same way as in my paste. I.e. "build: ./myapp" or whatever.

Edit the code, then restart Compose, and repeat. It will build each time. If you want to save time and you have some containers that don't change, you can "pin" those containers to published images — e.g., the main app is in "./myapp", but it depends on two apps "foo:08adcef" and "bar:eed2a94", which don't get built every time. This speeds up development.

Building on every change sounds like a nightmare, though. It's more convenient to use a file-watching system such as nodemon and map the whole app to a volume. Here's a blog article about it that also shows how you'd use Compose with multiple containers that use a local Dockerfile instead of a published one: https://medium.com/lucjuggery/docker-in-development-with-nod....


We're not building every time. But sometimes, like in the example above, we do need to build. The problem, however, is that this becomes a fairly manual process. If a developer forgets to do it, they will keep running with an older base image. So all the consistency benefits across developers are gone.

In any case, thanks for your suggestions. I think it's some misconception on my part about how docker-compose should behave.


So to me it's starting to sound like "developers forgetting" is your problem. Not Docker or Compose.

The solution I've used in the multiple companies I've started is to maintain a developer-oriented toolchain that encodes best practices. You tell the devs to clone the toolchain locally and you build in a simple self-update system so it always pulls the latest version. Then you provide a single tool (e.g. "devtool"), with subcommands, for what you want to script.

For example, "devtool run" could run the app, calling "docker-compose up --build" behind the scenes. This ensures that they'll always build every time, and never forget the flag.

If you have other common patterns that have multiple complicated steps or require "standardized" behaviour, bake them into the tool: "devtool deploy", "devtool create-site", "devtool lint", etc.

We've got tons of subcommands like this. One of the subcommands is "preflight", which runs a series of checks on the local development environment (Docker version, Kubectl version, whether Docker Registry auth works, SSH config, etc.), and fixes issues (e.g. if the Google Cloud SDK isn't installed, it can install it). It's a good pattern that also simplifies onboarding of new developers.
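A wrapper like this can also cover the "stale image" worry without rebuilding on every run. Below is a minimal Python sketch of the idea: hash the files that should trigger a rebuild, compare the hash with the one recorded last time, and add `--build` only when it changed. The function names, the stamp-file approach and the choice of which files to hash are all assumptions made for illustration, not a description of our actual tool.

```python
import hashlib
from pathlib import Path


def build_inputs_digest(paths):
    """Hash the files whose changes should trigger an image rebuild."""
    digest = hashlib.sha256()
    for path in sorted(Path(p) for p in paths):
        digest.update(str(path).encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
    return digest.hexdigest()


def compose_command(paths, stamp_file):
    """Return the docker-compose argv to run, adding --build when inputs changed.

    stamp_file records the digest seen at the last rebuild (hypothetical
    location, e.g. a git-ignored file in the checkout).
    """
    stamp = Path(stamp_file)
    current = build_inputs_digest(paths)
    previous = stamp.read_text().strip() if stamp.exists() else None
    command = ["docker-compose", "up"]
    if current != previous:
        command.append("--build")
        # Simplification: the stamp is written before the build has succeeded.
        stamp.write_text(current + "\n")
    return command
```

The returned list would be handed to something like `subprocess.run`. Note that this only notices changes to the hashed files (for example a Dockerfile whose `FROM` line was edited), not a base image tag that was re-pushed upstream.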


That's a great suggestion! Thanks. We're doing parts of it, but I just need to expand it to work with docker-compose. As I mentioned, I probably had the wrong preconceptions about it "figuring out" when components were stale... I guess a few simple bash scripts can work wonders to make it more intelligent :)


We build a microservices-based tool, hosted as containers in AWS, and have a very developer-friendly workflow. My team's workflow might not work well for yours, YMMV, etc, but here's how we do it:

- When we make a PR, we mark it as #PATCH#, #MINOR#, or #MAJOR#.

- Once all tests pass and a PR is merged, CI uses that tag to auto-bump our app version (e.g. `ui:2.39.4`, or `backend:2.104.9`) and update the Changelog. [0]

- CI then updates the Dockerfile, builds a new image, and pushes that new image to our private repo (as well as to our private ECR in AWS).

- CI then updates the repo that represents our cloud solution to use the newest version of the app.

- CI then deploys that solution to our testing site, so that we can run E2E testing on APIs or the UI, and verify that bugs have been fixed.

- We can then manually release the last-known-good deployment to production.

The two main keys to all of this are that our apps all have extensive tests, so we can trust that our PR is not going to break things, and that our CI handles all the inconvenient version-bumping and generation + publication of build artifacts. The best part is, we no longer have to have 5 people getting merge conflicts when we go to update versions of the app, as CI does it for us _after_ things are merged.

0: We use pr-bumper (https://github.com/ciena-blueplanet/pr-bumper), a tool written by my coworkers, for our JS apps and libraries, and a similar Python tool for our non-JS apps.
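The bump step itself is simple. Here is a minimal Python sketch of the rule, written only to illustrate the workflow above; it is not how pr-bumper is implemented, the function name is made up, and reading the tag from the PR description is an assumption.

```python
import re

# One of the three scope tags from the workflow above.
BUMP_TAG = re.compile(r"#(PATCH|MINOR|MAJOR)#")


def bump_version(version, pr_description):
    """Return the next "major.minor.patch" version for a merged PR."""
    match = BUMP_TAG.search(pr_description)
    if match is None:
        raise ValueError("PR description has no #PATCH#, #MINOR# or #MAJOR# tag")
    major, minor, patch = (int(part) for part in version.split("."))
    scope = match.group(1)
    if scope == "MAJOR":
        return f"{major + 1}.0.0"
    if scope == "MINOR":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

Because CI computes the new version after the merge, two PRs that were open at the same time never both edit the version line, which is where the merge conflicts used to come from.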


My first recommendation would be to separate in your head the Docker development environment from the Docker production environment. They can be very different, and that is OK.

For production you want the Docker image to be built when PRs are merged to master (or whatever your flow is). Google Container Builder makes that very easy: you can set up a trigger to build an image and push it to the registry when there are changes to git (code merged to a branch, tag pushed, etc.). Then you need to automate getting that deployed, hopefully to Kubernetes, but that is a different issue.


> They can be very different, and that is OK.

This feels odd to me. Isn't one of the major selling points of docker development-production parity?


This is cool! Thanks for posting. I can see how this is useful if building images is part of your CI process.

I’ve been using https://github.com/dminkovsky/kube-cloud-build to build images on Google Cloud Container Builder. It handles generating Cloud Container Builder build requests based on the images specified in my Kubernetes manifests, which was a big deal for me since writing build requests by hand was a total pain.


If you have CI, you normally shouldn't need something like this.


The idea is to not have to maintain separate CI infrastructure in addition to your Kubernetes cluster.

Disclaimer: I worked on kaniko at Google


Thanks for your work, it's really appreciated. Is there any way to cache some layers?


There should be! We haven't thought too much about it yet, but there's no reason it wouldn't work. We can't reuse the Docker daemon cache because that would imply access to the docker daemon.

Disclosure: I work on kaniko and other container things at Google.


My understanding is that it is best practice to run your docker builds and images as a non-root user. OpenShift, for example, will complain if you run them as root. Now this kaniko image runs the build as root, contrary to the recommendation, and the post explicitly mentions this difference with Orca.

Why is it okay now for kaniko to run as root user?


This was my first thought as well.


With the availability of the free Red Hat tools for building container images (buildah...) and this, it will be interesting to see what remains of Docker (Inc).


It's pretty clear that Docker has been focused on moving downstream. They want to add value by assembling open-source components into a complete platform that they can control and sell. They don't want to be the ones developing all the components themselves - at this level of maturity and sophistication in the container market, they just don't have the manpower to do that. A major benefit of that strategy is that they can use the best component available, regardless of who developed it. I bet they're feeling spread very thin on the open-source side, and would love to redirect some of their resources away from developing a gazillion open-source gadgets on their own, and towards their commercial products (which historically have been not as good in my experience).

Evidence that Docker is doing this:

- They only advertise three things with the name Docker: "Docker for Mac" (a free product that is not open-source), "Docker EE" (an enterprise product), and "Docker Hub" (a cloud service). Those are all downstream products, like RHEL or Openshift.

- The whole "Moby" thing is basically their upstream brand, aka "the things not called Docker".

- They spun out tons of smaller projects like buildkit, linuxkit, containerd, runc, and seem eager to get others to use them and contribute, even competitors.

- They embraced Kubernetes as part of their downstream product, even though they famously did not invent it, and they certainly don't control it.

So I think people saying "these free open-source tools are killing Docker" are missing the point. The real competition for Docker is Openshift vs Docker EE, everything else is implementation details.

If you listen to the sales pitch of these two companies right now, it's an absolute tug of war. Docker focuses on independence and innovation ("we know where containers are going, and we don't force RHEL down your throat"). Red Hat focuses on maturity and upstream control ("We've been by your side for 20 years, are you going to trust us or some Silicon Valley hipster? Also we employ more Kubernetes contributors than anyone else").

That's the real battle. In my experience, on the open-source side you'll mostly find engineers from all sides collaborating peacefully and building whatever they need to get their job done.


Although Docker images are not hard to build (an image is just layers of tars with the proper JSON files), it is very nice to see such tools appear. Even though I have a nice Kubernetes cluster (or any other orchestrator), for security reasons I have to bring up a new VM with Docker installed and build there, which really sucks. It is sad that Docker did not implement this years ago, although people wanted it a lot. They were busy deprecating the Swarm Whatever^TM for the 3rd time and not listening as usual.


Interesting! We built and use a service called Furan: https://github.com/dollarshaveclub/furan

That said, Furan isn't suitable for untrusted Dockerfiles (or multi-tenant environments) exactly due to the security implications of access to the Docker engine socket.

The issue I see with Kaniko is drift from upstream Moby/Docker syntax. One of the strengths of Furan is the guarantee that the Docker build you perform locally is exactly what the service performs. When you can't make this guarantee you get into weird situations where "the build works for me locally" but there's some issue when doing a remote build. That's also why we've resisted putting special build magic into Furan (like injecting metadata into the build context, for example).


Does this (or could this) use Buildkit? It seems that Docker themselves are encouraging the development of an ecosystem of third-party container build tools, with buildkit as an interoperability layer. I heard good things about buildkit but haven't tried it yet.

If Kaniko authors are reading this: have you considered buildkit and, if not, would you be open to contributions based on it?

My understanding is that the official 'docker build' itself is based on Buildkit.

https://github.com/moby/buildkit


Kaniko doesn't use buildkit - buildkit still uses containerd/runC under the hood so it can't run inside a container easily.

We are looking at interoperability with buildkit (and the large set of other tooling like this) through the CBI: https://github.com/containerbuilding/cbi which aims to be a neutral interface on top of things like buildkit, buildah, docker and kaniko that build images.

Disclosure: I work on kaniko and other container things at Google.


Very interesting, and it looks like the people working on CBI are also active on Buildkit, which is a good sign!

Thank you for the pointer.


@zapita

FYI: Kaniko plugin for CBI is now available. https://github.com/containerbuilding/cbi/pull/35


Thanks Akihiro for the follow-up. And while I'm at it, thank you for all this excellent code that we get to enjoy for free! It's really fantastic work.


This is great work. Github link for the lazy: https://github.com/GoogleCloudPlatform/kaniko


This is a great tool. Wish it could work with build workflow tools like Habitus (http://www.habitus.io)


Sweet, this is truly great. I have been hoping for a service like this for a long time: being able to build images without root privileges!


Kaniko does run in the container as root, but the container doesn't need to be granted any extra privileges when run (you don't need the equivalent of Docker's --privileged flag).



