What you're looking at here is informally the "v3" effort, which extends and consolidates the "v2a" and "v2b" designs evolved by Heroku and Cloud Foundry respectively from the original Heroku design.
In v2 both Heroku and Cloud Foundry provide supported PHP buildpacks, as well as Java, Ruby, Python, .NET Core and I forget the rest right now. There are hundreds of community buildpacks.
It's not tied to the packaging format. Detect is the step that decides which buildpack or buildpacks will be responsible for constructing the image from the source code.
Typically this means that buildpacks look for files that correspond to the relevant ecosystem. Maven buildpacks look for pom.xml. PHP buildpacks look for composer.json. Etc.
Nothing in this creates a hard binding. Detect steps may use whatever logic they need to decide on whether to signal they can work on a codebase.
Edit: in the v3 design the detect script can also provide dependency information that later steps can pick up. So, for example, a JDK buildpack can say "yes, I can interpret this codebase, and I can contribute a JDK". A later buildpack can then look for this contribution as a condition, e.g. the Maven buildpack can say "I will proceed if I see a pom.xml and if there is a JDK available".
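To make that concrete, a v2-style detect is usually nothing more than a tiny script. Here's a minimal sketch of a Maven-flavoured bin/detect (the paths and reported name are illustrative; in v3 this step additionally reports what it can contribute and what it requires, rather than just a name):

    #!/usr/bin/env bash
    # bin/detect: the platform passes the build directory as the first argument.
    # Exit 0 to claim the app (and print a name), non-zero to pass.
    BUILD_DIR="$1"

    if [ -f "$BUILD_DIR/pom.xml" ]; then
      echo "Maven"
      exit 0
    fi

    exit 1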
> As one of the maintainers of Herokuish - a tool that manages the application of buildpacks to repositories and is used in quite a few Buildpack implementations - I am super happy that CNCF reached out to us and included us in the process. Oh wait, that didn't happen...
All jokes aside, this looks great. Super-early of course - seems like there are quite a few issues in the `pack` repository to be implemented - but I'm excited to see where this lands. Buildpack detection and application is not a straightforward problem.
Dockerfiles require you to rebuild lower layers when any upper layers change, even though the OCI image format doesn't care about this. Cloud Native Buildpacks can intelligently choose which layers to rebuild and replace. Additionally, certain layers can be updated en masse for many images on a registry (using cross-repo blob mounting, with no real data transfer!), which is useful for patching CVEs quickly at scale.
The samples take advantage of this (as well as a separate transparent cache) in order to demonstrate the different aspects of the formal spec. A simple buildpack is not necessarily much more complicated than a simple Dockerfile.
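For anyone curious what "no real data transfer" means mechanically: it's the registry API's cross-repo blob mount. A rough sketch with curl (the registry, repository names, digest and token are all placeholders):

    # Docker Registry HTTP API v2: ask the registry to mount an existing blob
    # from another repository instead of re-uploading it.
    curl -i -X POST \
      -H "Authorization: Bearer $TOKEN" \
      "https://registry.example.com/v2/team/app-b/blobs/uploads/?mount=sha256:<layer-digest>&from=team/app-a"
    # A "201 Created" response means team/app-b now references the layer
    # without any blob bytes moving over the network.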
Yes, if the underlying layer’s hash changes then it has to be rebuilt. But if you just change index.html it caches the other layers and builds are very quick.
My issue with Buildpacks is that it looks like a glorified bash script (a skill I am not bashing, pun not intended), whereas a Dockerfile is much more human-readable, and the idea of layers, for a guy coming from a systems background, is much more intuitive to me. The analogy of a very lightweight VM makes perfect sense to me, which means I'm much more productive with it.
1. Developers.

What you need to know is: it just works. You don't even need to write a Dockerfile any more. The buildpack turns your code into a well-structured, efficient, runnable image with no additional effort (there's a rough sketch of the workflow below, after the list).
2. Operators.
What you need to know is: oh thank god no more mystery meat in production. Patching the OS is a ho-hum affair instead of a stone grinding nightmare that turns your developers into a white hot bucket of rage because you have to nag them or block them from deploying or both.
3. Platform vendors, buildpack authors and curious passers-by
What you need to know is: All the other stuff about detect, analyse, build or export.
Unless you are in group 3, the basic thing is that Buildpacks require less effort than Dockerfiles with more safety and faster builds.
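For group 1, the day-to-day workflow looks roughly like this. It's a sketch only, since the exact flags and builder image depend on which tooling you're using:

    # Build a runnable OCI image straight from source with the pack CLI,
    # then run it like any other container image. Builder name is illustrative.
    pack build myorg/myapp --path . --builder example.com/some/builder
    docker run -p 8080:8080 myorg/myapp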
I don't think you're missing anything. It's vendors that don't do Docker trying to stay relevant in a Docker/Kubernetes world. By making Docker more complex, said vendors can continue to charge $500/pod for OSS k8s.
It's open source and you can run it yourself, which has incidentally been true since before either Docker or Kubernetes existed. We've got an open PR to add Cloud Native Buildpack support to Knative Build. You can take these containers and run them on whatever Kubernetes you like.
I just tried this on my rails project and it is detecting a nodejs project. Is there a way to have it use ruby instead while still taking care of my yarn dependencies?
> This can be a problem with buildpacks, since you'd need one buildpack that has both the ruby and nodejs runtimes.
This isn't strictly true of v2 designs (both Heroku and Cloud Foundry have schemes for multi-buildpack support) and in the v3 design explicit consideration is given to making mix-and-match a triviality. Buildpacks can cooperate quite easily and in multiple groups.
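For the Rails-detected-as-Node case above, the v2-era escape hatch looks something like the following (buildpack names are the common defaults; adjust for your platform):

    # Heroku: stack buildpacks in order; Node.js first for the yarn/asset work,
    # Ruby last so it remains the primary buildpack.
    heroku buildpacks:add --index 1 heroku/nodejs
    heroku buildpacks:add heroku/ruby

    # Cloud Foundry: pass multiple buildpacks to cf push, final buildpack last.
    cf push my-rails-app -b nodejs_buildpack -b ruby_buildpack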
Where this shines is in updates. If I have an OS base layer, a JDK buildpack and a Maven buildpack, the layers they generate can be independently updated without needing a complete rebuild. So far as I am aware, this is not currently possible with a Dockerfile, multi-stage or not. If you invalidate the OS layer, everything else gets rebuilt whether it needed to be or not.
That's cool, I'll need to read up on the v3 design.
I think you are right about the rebuilds, but maybe not in all cases. If you had:

    # Build the JS assets in one stage...
    FROM node:whatever AS js
    RUN npm run build

    # ...and the Go binary in another...
    FROM golang:whatever AS go
    RUN go build

    # ...then assemble the runtime image from a shared base.
    FROM base
    COPY --from=js app.js .
    COPY --from=go app .

If only 'base' was updated, rebuilding the image would just need to re-run the COPY commands. The only way everything would get rebuilt is if you also updated the go and js images.
I agree, the Nix approach makes lots of problems go away, but you have to be bought in first. I wrote something distinct a few months back which overlaps on the "we can be smarter about layers" thing[0].
Google folks worked on "FTL", which is a technique for determining layering by reasoning about packaging system information. Jib[1] is one such system. There is a view that it will be possible to use FTL implementations or derivatives as buildpack components in the future.
Isn’t that the problem that Google’s image-rebase tool [0] is supposed to solve? Given a well constructed image with an OS, JDK, and app layers, it could rewrite any (or all) layers.
That's one of the technologies being used in buildpacks v3. It's the key to very fast updates. But it's not the whole picture: having a standard way to assemble the images means that the rebasing operation can be done safely.
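As a sketch of what a rebase looks like operationally, go-containerregistry's crane tool exposes the same idea directly. The flag spellings below are from memory and may differ between versions, so treat this as illustrative and check `crane rebase --help`:

    # Swap the base layers under an app image without rebuilding the app layers.
    # Image names are placeholders; flags may vary by crane version.
    crane rebase registry.example.com/team/app:latest \
      --old_base registry.example.com/base/run:2018-09-01 \
      --new_base registry.example.com/base/run:2018-10-01 \
      -t registry.example.com/team/app:patched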
We've also met to talk about buildpacks and have pre-existing working relationships with all the relevant folks in Google Cloud Builder and Google Container Tools teams through our work on Knative.
I currently use Bazel in combination with rules_docker to construct container images of applications, often written in Go. As I don't depend on cgo, my container images may be bare (i.e., only containing a Go executable and some static artifacts).
Though I understand that there are likely only dozens of us, my question is: what would the use of Buildpacks buy us for this specific use case?
One case is that you could have a Bazel buildpack that contains Bazel, Java, and everything else required to run the build.

Unfortunately the resulting image would be larger than just a single binary, but building it would be a lot easier and more repeatable for other people working on the project. The base layers would be cached, though, so in practice it might not be that much larger on disk.
I looked into using Bazel for doing similar things, and the biggest stumbling block is that Bazel itself is a PITA to install on all platforms. I may end up trying https://please.build/ at some point.
Unfortunately that's a non-starter for things like Debian, but that's good to know; I'll give it another look.

One thing I really like about Bazel is the pkg rules. I currently use fpm/goreleaser (nfpm) to build RPMs for things, and it's nice having a single build tool that can build the app and spit out an RPM.
Mostly the advantage would be that 1) you wouldn't have to maintain the buildpack yourself, 2) operators can upgrade your software without needing to get you to do it, 3) upgrades might actually be faster due to layer rebasing and 4) if you add other stacks to your architecture, you still have the same development, deployment and update experience.
Hi, in my role as professional gadfly, I have been involved with this tangentially for a few months and directly for a few weeks (on behalf of Pivotal).
I used the Onsi Haiku over on the equivalent HN thread about Heroku's blog post:
Here is my source code.
Run it on the cloud for me.
I do not care how.
The gist is that Docker containers are awesome for the Day 1 experience. I write a Dockerfile and I'm off to the races.
But then Day 2 rolls around and I have a production system with 12,000 containers[0].
1. What the hell is in those containers, anyhow?
2. A new CVE landed and I want to upgrade all of them in a few minutes without anyone being interrupted (or even having to know). How?
3. I have a distributed system with many moving parts. I build a giant fragile hierarchy of Dockerfiles to efficiently contain the right dependencies, making development slower. Then I snap and turn it into a giant kitchen-sink Dockerfile with the union of all the dependencies in it. Now production is slow as hell.
4. Operations become upset about points 1-3. Now I can only use curated Dockerfiles, every build has to come through our elaborate Jenkins farm, rules, rules, rules. Wasn't the purpose of Dockerfiles to make this all just ... go away?
Buildpacks solve all of these. I know what's in the container because buildpacks control the build. I can update CVE flaws in potentially seconds. Each container can have what it needs - no more, no less.
And most important: the buildpack runs locally, or in the cluster, exactly the same. It's all the developer benefits of Dockerfiles/docker build, minus most of the suck.
It sounds like a buildpack identifies a common case of what someone might be doing in a container / Dockerfile, and standardizes it to remove the arbitrary variation that inevitably occurs when lots of people are independently solving exactly the same problem. For example, the patch level of the OS is directly or indirectly specified in the Dockerfile, but generally doesn’t actually matter to the application.
If your use case fits into a scenario which is handled by an existing buildpack, then the claim is that you’ll be better off using the buildpack because the infrastructure can make optimizations that can’t be made with arbitrary containers.
If your use case isn’t covered by a buildpack, then you can either (1) make a buildpack or (2) revert to raw containers.
Cloud Native Buildpacks are based on a very well-proven model. Heroku does this at massive scale. So does Pivotal, and so do many of our customers and the customers of other buildpack-using systems like Deis.
Step 1: replace your Dockerfile with one that consists of the single line "FROM cloud-gov-registry.app.cloud.gov/python-buildpack". The buildpack contains magic that knows how to turn a standard python program into a usable docker image. (AKA, it inspects requirements.txt, et cetera).
Step 2: Now that your Dockerfiles no longer contain any real information, retool your orchestration system to use source tarballs rather than docker images.
That sounds un-ideal to folks who are used to and comfortable with the Docker image/Dockerfile based workflow. What advantages does this provide over plain Dockerfiles?
In a situation where there are many running containers, a security patch could be applied across the board with confidence that running apps are not affected. This would be a case-by-case situation if they were all Dockerfiles.
The cool part is that when the patch arrives, devs don't have to do anything to be patched. Cloud Foundry already does this with buildpacks and so does Heroku.
A big part of what's new is that layer rebasing could make this really really fast.
Could you elaborate, though? It helps us to understand what seems magical so that we can either explain it better or make the process more transparent.
For companies with compliance requirements, you're required to change systems explicitly; this is one of the reasons why Docker images work so well: you can target specific tags for deployment and nothing changes with the same tag.

Aside from that, I think it's a bad idea to have things update automatically. What if the upstream fix breaks things? It reduces trust in the build system.
> For companies with compliance requirements, you're required to change systems explicitly; this is one of the reasons why Docker images work so well: you can target specific tags for deployment and nothing changes with the same tag.
This isn't really true, though. Tags are floating targets, only the digest is stable. Taking Kubernetes as an example, suppose I push an updated Pod definition where I've changed an image tag from v1 to v2. If the tag is not properly locked, then I can be running multiple versions of the software without even realising it.
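If the compliance concern is immutability, pinning by digest is the way to get it. Something like this (all names are illustrative):

    # Resolve a tag to its content-addressed digest, then pin the workload to
    # it so the tag can't silently move underneath you.
    DIGEST=$(docker inspect --format='{{index .RepoDigests 0}}' registry.example.com/team/app:v2)
    kubectl set image deployment/myapp app="$DIGEST"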
Speaking of regulation, we find a lot of people like buildpacks for that exact reason. Operators know exactly what OS is running in every container, exactly what JDK is running in every container, everything up to the runtime (and as FTL matures, up to the package dependencies as well). The platform doesn't have to accept any old container, they can all enter through a trusted pathway.
You can do this with Docker builds, of course: you build CI/CD, you have centrally controlled images, you prevent non-conforming images from reaching production and so forth. But then you've pretty much recreated all the stuff buildpacks gave you, except you're the one having to maintain it.
> Aside from that, I think it's a bad idea to have things update automatically. What if the upstream fix breaks things? It reduces trust in the build system.
Rebasing layers is close to instant. You can roll back the change as soon as it looks bad. More to the point, if the OS vendor or runtime has broken ABI compatibility, rebuilding a Docker image won't necessarily help you notice that before runtime.
Gotcha. Not my intention to jump down your throat with hiking boots, but the tag-vs-digest thing has been a big part of what I've worked on over the past few months. I agree heartily that keeping the books using digests is the only sane option and buildpacks retain that property. They're just producing and updating OCI images, at the end of the day.
LinuxKit (as I understand it) is for building the OS itself. Buildpacks are a higher-level abstraction for building a complete application image (with the emphasis on application). They take app source code as input and output a Docker image that's ready for prod.
Presentation to CNCF TOC: https://docs.google.com/presentation/d/1RkygwZw7ILVgGhBpKnFN...
Formal specification: https://github.com/buildpack/spec
Sample buildpacks: https://github.com/buildpack/samples