Introducing Jib — Build Java Docker images better (googleblog.com)
215 points by rbjorklin 8 months ago | 102 comments

I'm a Docker noob and I haven't shipped Java in ten years, and I find that this article leaves the main question unanswered: what happened to just building a JAR?

When I read that sentence, I assumed that it must be something about controlling the JVM parameters and version, or the ability to include certain system tools or imagemagick type of CLI thingos, but then as the article goes on, I see none of that. None of the things that, to my uninitiated mind, give Docker an edge over building a plain old JAR.

So why? Why is it necessary / a good idea to Dockerize a Java app? Can anyone help me out?

A jar needs deps. A fat jar still often needs a script to run it with baked-in parameters, JAVA_HOME/PATH manipulation, maybe a Tanuki service wrapper, a cron job to keep it running, some way to bootstrap it from scratch (Maven) - everything extremely specific to the Java ecosystem. Docker can do a lot of that stuff generically, and then leveraging Kubernetes you can get much more advanced deployments based on pods, labels, etc. at a much higher level, with a dashboard. Throw in Prometheus/ELK etc. and you have a centralized log store you can just read.

If your entire stack is a monolithic fat jar, this isn't the solution you are looking for.

> I find that this article leaves the main question unanswered: what happened to just building a JAR?

What happened is that the world has turned polyglot. Many of the things that Java pioneered in the wider industry have been resupplied in different places at different levels of the stack.

Containerisation essentially creates uniformity for a lot of workflows. While Java has addressed all these concerns itself for a long time, the overhead of two build chains, two admin toolkits, two registries etc etc becomes increasingly onerous.

Some folks will have a 100% Java mandate. This project is not for them.

Others will have a 100% containers mandate. This project is for them.

We have a custom service that requires a Java jar that plugs into an application server. Our partner org only supports this on CentOS/Fedora/RedHat.

This also talks to another service, which is an Apache binary plugin provided by another partner org who once again only supports CentOS/Fedora/RedHat. This Apache plugin not only requires Apache, but it has to sit on top of a properly configured Fuse file system. For extra fun, we have to run 4 copies of this Apache server with 4 different license keys which are all hard-coded to the same damn file on the file system (thanks assholes).

This application server sits behind an Nginx proxy for a variety of annoying reasons (routing, rewriting response output, security stuff, etc.). We also package it up with a Python server that does some additional delegated work depending on the type of request.

Oh, and we run all of the rest of our infrastructure on Ubuntu.

All these different pieces work together to provide a "single" service. None of them make any sense on their own.

We wrap up all these services and deploy them as a unit. We deploy them to a couple servers behind a load balancer. There's no point breaking them out into separate layers on different servers, we'd just be wasting computation resources as this system sits idle most of the day. We pack them in as tightly as we can manage given the above constraints.

There's some consolidation we could potentially do, i.e. we could do some of the routing in Apache thus removing our dependency on Nginx, but uh, we use Nginx everywhere else.

We could remove the Python piece, write it as another Java jar and plug it into the application server, but again, most of our other stuff is in Python.

This is the kind of stuff where Docker is really useful. It gives us a uniform interface to all of these different pieces. It gives us centralized logging, resource control, process management, service dependency management, isolation (4 instances of Apache with their 4 damn license keys in the same location but isolated from each other), deployment, etc.

If you don't have these kinds of problems, then you might not need Docker. If you do have these kinds of problems, Docker is a life saver.

Do I like that it's this way? No. But this is the reality of the business we are in and we have to accommodate the needs of our biggest customers otherwise we can't pay the bills.

There are really a host of reasons why it's a good idea. Off the top of my head here are three that I think justify the weight:

1. A Jar includes your app, but not your runtime or your applications native deps. Docker containers let you specify a whole lot more about your image that goes beyond the language specific application package.

2. You can use the same tooling and infrastructure for managing your executable binary artifacts across all the languages used in your org.

3. You can take advantage of cluster schedulers like K8s.
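To make point 1 concrete, here's a minimal sketch of what an image pins down beyond the jar itself (the image name, native package, and JVM flags are illustrative, not from the article):

```dockerfile
# Base pins the OS and JRE version; the jar alone carries neither.
FROM openjdk:8-jre-slim

# Native dependencies the jar can't bundle (e.g. the imagemagick
# CLI mentioned upthread).
RUN apt-get update \
    && apt-get install -y --no-install-recommends imagemagick \
    && rm -rf /var/lib/apt/lists/*

# The language-specific artifact is just one layer of the image.
COPY target/app.jar /app.jar

# JVM parameters travel with the image instead of a wrapper script.
ENTRYPOINT ["java", "-Xmx512m", "-jar", "/app.jar"]
```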

Point 1 is taken care of by fat jars and jlink.

I hadn't looked at:


before - nor at the linked JEP 220.

I wasn't really aware Project Jigsaw and JDK 9/10 had quite such ambitious goals (paraphrasing: ...getting rid of the old and inefficient JAR format based on the zip file format...).

While technically correct, the maven-jlink-plugin is in alpha and I've not yet seen any application packaged this way. If you're aware of any open-source project doing this I would be very interested :)

With the planned removal of Java Web Start from Java 11 onwards this September, you can expect jlink to become much more relevant.

Not yet.

But I would argue that it would still be more mature than Jib, given that Java 9 is almost one year old.

For large projects, fat JARs are slow to build. Unless you bundle JARs inside the JAR and use a special classloader, like Spring Boot... I don't use them at all because it's a pain. Plus you'd probably want to put your app code that changes in a different layer if you're doing docker images, as opposed to the other 90% of your app which is dependencies that rarely change
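For what it's worth, that dependency/app layer split is exactly what Jib automates: dependencies, resources, and classes land in separate image layers. A sketch of the Maven plugin configuration (the version and target image are placeholders, not from the article):

```xml
<plugin>
  <groupId>com.google.cloud.tools</groupId>
  <artifactId>jib-maven-plugin</artifactId>
  <!-- version current around the time of this thread; check for newer -->
  <version>0.9.0</version>
  <configuration>
    <to>
      <!-- hypothetical registry path -->
      <image>gcr.io/my-project/my-app</image>
    </to>
  </configuration>
</plugin>
```

Then `mvn compile jib:build` pushes the image without a Dockerfile or a Docker daemon.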

That is what build servers are for.

waiting on build servers is still waiting

Not if they are doing continuous builds on regular intervals.

I guess waiting for building Docker images doesn't count then.

fat jars are just as expensive to create as war files. No matter which way you do this you’re going to be copying a lot of files around.

For one I’ve found that Docker containers provide some teachable moments for people trying to learn (or who should already know better) about the CI/CD pipeline.

Docker images take away a lot of wishful thinking. Some people will, at least occasionally, refuse to take responsibility for a problem until you can demo it for them locally. If you can’t demo a problem locally you’re already in a world of pain.

Call it a good fence for good neighbors, if that works for you.

You still build the jar and then you wrap that in a docker image so it can be deployed in a standard way just like all other (non java) services in the cluster.

Assuming you run container infrastructure like Kubernetes or Mesos or what have you

If all you deploy are fat jars to a few nodes, this is overkill and will increase complexity for no good reason. Yet, I predict that most K8S deployments fall into this range. If your devs are bored and want $tech to play with, at least this won't be as damaging as introducing an overly complicated programming language or something.

For heterogeneous deployments with a lot of nodes, this standardizes everything and thus reduces complexity, instead of increasing it without adding any value.

    at least this won't be as damaging as introducing an overly complicated programming language or something.
As a 15 year veteran of the Java salt mines, this made me chuckle.

The publication of the J2EE 1.0 spec left me feeling queasy. If I had known, instead of just suspected, that Rube Goldberg would rule Java for the next ten years, I might have made different choices.

I don’t think it’s an accident that the two projects I’m proudest of were an embedded and a mobile Java app.

Every generation's got its own disease.

In particular, if the use-case calls for all dependencies to be bundled together, then why not an uberjar? Why use something as wonky as Docker when the JVM eco-system offers a solution that has been solid and mature for 15 years?

As to my use of "wonky", see some of the frustrations at the end of "Docker protects a programming paradigm that we should get rid of":


It's about having a closer integration (i.e. beyond dropping a fat jar in a Docker) between the Java toolchain and Docker toolchain.

Why? Because when you need to operationalize stuff, you suddenly (or less suddenly and more painfully) discover a ton of other concerns not directly tied to the actual Java build.

Dependencies outside the uberjar (like the JVM, glibc, etc.), speed of builds (through incremental builds), and potentially the size of the final artifact.

This is a tool that helps with Docker images. If you don't need a Docker image, you clearly don't need it. If the question is why Docker images or why containers, then the answer is not in the docs for this tool.

> what happened to just building a JAR?

Same thing that happened to rsyncing an executable. Every application has _something_ specific to the environment eventually.

To be fair, Docker is not somehow magic. Its "_something_" is a Linux kernel + Docker daemon. That's just a rather small/common dependency.

To actually have no "_something_", you need unikernels.

> what happened to just building a JAR

(1) JARs don't have native code (unless you package it in and extract it at runtime)

(2) JARs don't have the Java runtime (unless you package that in to an executable and extract it at runtime)

(3) JARs can't be easily isolated or managed, at least not to the extent of Docker

jar still needs a java binary

and which java version does it need?

Who wants to install and maintain different Java versions? Not me. So users provide a Docker image and I can run it, whatever it is.

Sometimes an application also needs some locales, timezone data, etc.

Because you can hide that you're running an old and insecure version (some EE software still only works with Java 1.6).

It's not necessary. The simplest thing to do is to build a fat jar.

Shiny new toys syndrome.

A plain JEE container is more than enough for like 99% of use cases.

And starting with Java 9, you can even link everything together VM + fat jar.

> A plain JEE container is more than enough for like 99% of use cases.

Odd, I would say that a JEE container is too much for like 99% of use cases :).

(Point being that most of the time these days all I seem to need is an embeddable HTTP server and a main() function. Yet I see people insisting on incredibly complex JEE container deployments rather than just a simple "java myserver.jar".)
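For reference, even the bare JDK ships an embeddable HTTP server (`com.sun.net.httpserver`), so the "server + main()" setup really is this small; a minimal sketch (port and response body are arbitrary):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class MyServer {
    // Bind an HTTP server on the given port (0 picks a free port).
    public static HttpServer start(int port) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/", exchange -> {
            byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        start(8080); // the whole deployment: java -jar myserver.jar
    }
}
```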

Usually JEE containers get to be shared.

Of course, I am not advocating a JEE container for doing a single REST endpoint or website.

You can't simply deploy a plain JAR to ECR, K8S (EKS/AKS/GKE), OpenShift, or any other such system. And organizations/people are increasingly using such tools/services for building their services.

Here is my "initial" sbt implementation: https://github.com/schmitch/sbt-jib

(Should be easier to do after: https://github.com/GoogleContainerTools/jib/issues/337)

It also still misses some important things: not all resources will be put into the resources layer, and resources should be deduplicated, if any.

Edit: thanks Google for this awesome library! With this library you'll probably lose some money!

Yay! Good steps (in the Java ecosystem at least) to bring containers closer to their potential.

Docker is the best example I know of how "worse is better" can lead to absurdly bad local maxima. Docker solves the most shallow and most obvious problems of software deployment: How to package all dependencies in one big ball of mud.

Instead of declaring dependencies explicitly (with acceptable version ranges like maven, bundler, even nvm can do for ages) we get the equivalent of a bash script (Dockerfile) that is impossible to parse automatically and encourages volatile external dependencies (`curl http://mirror.com/current_version.sh | bash`)

Instead of slim, reproducible build descriptions that could reference single files or packages via hash sums (ala `packages.lockfile`) we get fat filesystems with the caching granularity of snapshots. But why care, developers have multi terabyte SSDs and gigabit network connections by now, right?

Instead of being able to propagate security updates down to each package (/container), we have filesystem snapshot caching, where cache invalidation is for all intents and purposes impossible. But cache invalidation is hard, so how could you blame Docker?

And I get why Dockerfiles are so seductive: they are damn easy. You can build your first Docker image in minutes. But Docker doesn't solve any of the deeper problems for you. E.g. you'd better figure out how to invalidate your cached image layers yourself when packages get security updates. Or just rebuild everything all the time, but there goes your performance, and maybe some of those pesky external dependencies are no longer available. Want to internalize those? Build that yourself, too; Docker can't help you there, because it is too damn simple.
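One blunt but common workaround for the layer-invalidation problem is to thread a date-stamped build argument through the Dockerfile, making "rebuild everything below this line" an explicit knob (base image chosen for illustration):

```dockerfile
FROM debian:stretch-slim

# Changing CACHE_DATE invalidates the build cache for every
# instruction below it, forcing fresh security updates.
ARG CACHE_DATE=1970-01-01
RUN apt-get update && apt-get -y upgrade \
    && rm -rf /var/lib/apt/lists/*
```

Invoked as, say, `docker build --build-arg CACHE_DATE=$(date +%F) .` from a scheduled job, so the upgrade layer is refreshed at most once a day.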

So, hooray for Jib! All the work that has already gone into Maven to solve the deeper problems (e.g. internalizing external dependencies with Nexus) can be repurposed to build container images. Let's hope we get more of that "more complex but better" sanity.

Seems like a good idea for a new project. The article says "You don't have to create a Dockerfile" but honestly, who does?

There are a ton of Gradle/SBT/Maven plugins that build your container for you. Most call your Docker daemon API directly, copy in all your jars, and write a startup script. Some generate a Dockerfile, but they stick it in your build directory and you never touch it.

The advantage is not needing a Docker daemon. That is pretty nice. However I already need a daemon in our Jenkins pipeline and a lot of my unit tests use the Scala testcontainers framework (so I can run tests against a real MySQL or Postgres DB, which is way nicer than using mocks or H2).

So it is nice, but chances are we're already running container daemons everywhere.

Plus I'm sure for this to be really useful, we'll need ECR plugins as well (which will probably need to be 3rd party because I don't see Google supporting AWS officially).

I wouldn't move to this if I was on something that worked. That previous article about the team going back to Maven after a Gradle migration is kinda in the same vein: with a new project you can make it clean, but your current project is probably so customized around your workflow that unless you can make a case for a really bad pain point it will solve, it's probably not worth it.

A good pain point I think this would solve is if your CI doesn't have access to a Docker daemon at all (for security or you're using a hosted solution or whatever) and you want to be able to build containers and publish them in your CI.

If jib doesn't work against ECR, please file a bug. There isn't anything specific to Google here -- it's all based on standard container tooling.

The only thing you should need to do is configure the ECR credential helper: https://github.com/awslabs/amazon-ecr-credential-helper

There's two things here.

The first is layering. A goal is smarter layering without a Dockerfile. A dependency-management tool gives fairly good insight into how to arrange layers to improve cacheability and reduce layer churn.

The second is security. Going without a daemon socket is more than "pretty nice", it is a massive security win. Docker daemons have a lot of privileged access and a wide API. Having that entire massive attack surface lying around to write some tarballs and a JSON file on your behalf doesn't make a great deal of sense.

These both become more valuable at large scales, which is why it makes sense for Google folks to have developed it.

Agreed re: both of these wins.

On the security front, would be ideal if these resulting containers ran as non-root. Hopefully that will come soon.

It'll be really interesting to see where this goes. We currently use Fabric8's Docker Maven Plugin (https://dmp.fabric8.io/) for our Java-based containers. It's a little verbose but works really well- it allows us to run our integration tests directly against the final container images (+ any container dependencies) as part of the standard Maven build.

I wish fabric8 had Gradle support.

This ticket implies that it isn't "build tool agnostic" (https://github.com/fabric8io/fabric8-maven-plugin/issues/609...) and I found a few github repos that are "gradle plugins" for Fabric8 but they only had a few stars or were out of date.

The Fabric8 Maven plugin is awesome. It has some rough edges, but streamlines a ton of things.

Maybe someone can help me out on a question I've not got a clear answer on: is it better to build your artefact (jar file in this case) and put that in the docker image, or is it better to put everything you need in the image (source files, dependencies etc) then build the artefact from within the container?

Seems to me like the first approach (compile outside then put artefact in container) is the most sane/sensible approach and the one used here in Jib, but I have seen people advocating the build-in-container approach as better (self-contained, repeatable and so on).

Any benefits to the build-in-container approach that are compelling?

Docker recently added support for multi-stage builds. So you can use one Dockerfile to define how to build your image (produce your jar and other necessary resources) and then copy them into a fresh container based on a more lightweight image that doesn't contain all the intermediates and your build tools.
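A minimal sketch of such a multi-stage Dockerfile for a Java build (image tags and the jar path are assumptions about the project layout):

```dockerfile
# Stage 1: build with the full JDK + Maven toolchain
FROM maven:3-jdk-8 AS build
WORKDIR /src
COPY . .
RUN mvn -q package

# Stage 2: copy only the artifact onto a slim runtime image;
# the build tools and intermediates never ship.
FROM openjdk:8-jre-alpine
COPY --from=build /src/target/app.jar /app.jar
ENTRYPOINT ["java", "-jar", "/app.jar"]
```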


> Docker recently has support for multi-stage builds.

I view these as a future antipattern. They bring what is properly CI's work inside the Docker daemon, which makes it a larger attack surface (many folks run PRs sight-unseen) and obscures it from outside tooling.

They only exist because of the limits of the Dockerfile format. With something like Jib, that disappears.

Multi-stage builds are definitely my preferred method of building stuff, but the problem I've found is that caching dependencies is a bit of a pain: either you need to do some tricks with volumes or your builds will be very slow.

At work, I setup the build system to use Apache Archiva to cache Java dependencies, along with other services like apt-cacher, devpi, and squid, for Debian, Python, and general HTTP caching, respectively. We build all of our images atop the "bitnami/minideb" base, and split the Dockerfiles into "partials" that source a dynamic parent image via a build argument. For example, the partial that sets up Debian package and HTTP caching:

    ARG base
    FROM ${base}
    ARG apt_cacher_host
    ARG http_proxy
    ARG https_proxy
    ARG no_proxy
    ENV HTTP_PROXY="${http_proxy}"
    ENV HTTPS_PROXY="${https_proxy}"
    ENV NO_PROXY="${no_proxy}"
    ENV http_proxy="${http_proxy}"
    ENV https_proxy="${https_proxy}"
    ENV no_proxy="${no_proxy}"
    RUN set -ex; \
        if [ ! -z "${apt_cacher_host}" ]; then \
            echo "Acquire::http::Proxy \"${apt_cacher_host}\";Acquire::https::Proxy \"false\";" >/etc/apt/apt.conf.d/01proxy; \
        fi
For Java, another partial creates a Maven settings file with the Apache Archiva URL:

    ARG base
    FROM ${base}
    ARG maven_mirror
    COPY devops/image-partials/java-artifacts/settings.xml /tmp/settings.xml
    RUN set -ex; \
        env; \
        if [ ! -z "${maven_mirror}" ]; then \
            sed -r "s!%%MAVEN_MIRROR%%!${maven_mirror}!g;" /tmp/settings.xml >/usr/share/maven/conf/settings.xml; \
        fi
The important part of settings.xml is this section:
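(The snippet itself was lost in formatting; the standard Maven shape of a mirror section, using the %%MAVEN_MIRROR%% token that the sed above substitutes, would look roughly like this, with the id invented:)

```xml
<mirrors>
  <mirror>
    <id>internal-archiva</id>
    <!-- route every repository through the caching proxy -->
    <mirrorOf>*</mirrorOf>
    <url>%%MAVEN_MIRROR%%</url>
  </mirror>
</mirrors>
```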

And voila, Maven will go through Archiva for all its dependencies, which are transparently cached so that subsequent builds don't need to re-download everything again.

I would do both.

Make a build container that pulls in the Git hash that you want to build/test against. You can keep your dependencies, build process, unit and func tests in the container. Your dev team can use the same container image to build and test in the same way that you do, pulling in the latest code from Git and running a docker build. Now nobody needs to set up their environment because it's all just in the Docker image in the same way.

You don't need all those build and test files in production, so you can build a new "deploy container" with just the artifacts from your build process. I would also record the versions of your dependencies, as well as the SHA hash of the build/test container you used, and the SHA hash of the Git repos you pulled from. You can also set things like flyway migrations to run each time the container runs.

Now you can deploy this second container somewhere. You can then use the same build/test container you used before to run func tests against the deployed container, to double check it works in your deployed environment.

The former can be handily taken care of in your CI toolchain. You need a build agent with all of the CLI tools you need for your build. Many support Docker these days. Some can even spin up images on demand.

This is (for me) the best part of keeping your build process in a container: reducing the complexity of the build agent. The only software dependencies on your build agent are Docker and Git. Your CI/CD job pulls a Git repo with a Dockerfile, and runs 'docker build', and that pulls every dependency it needs.

Agreed, and I can’t wait to work someplace like that again.

We have a build agent with 9 versions of Node in it. It was 6 when I started here. Once a month I get to help someone figure out weird errors because one task has the wrong value in a drop down. It’s such busy work I can’t even.

When I run the CI system, shit gets done and people are expected to be able to recreate build failures by themselves. Give a man a tool, make it reliable, then expect him to use it.

I am obliged to refer you to Concourse, which is really really good at containerised build and automation: https://concourse-ci.org/

I really like how stupid easy that index page is to read and understand. These seem like pretty good docs.

Some of the documentation is missing, so I'm not sure if I missed this or not. But can you make a pipeline out of independently configured jobs? Or a pipeline of pipelines? Often you'll want to chain jobs or pipelines that are owned by different groups. I imagine they support this, if they support teams...

The usual way to create pipelines of pipelines is that one pipeline puts to a resource that another pipeline will trigger on.

For example, one pipeline might do a bunch of stuff and drop a file into a bucket. Unbeknownst to pipeline 1, pipeline 2 is triggered by the file drop and begins to do its own thing.
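Sketched as Concourse configuration (the bucket, names, and image are all invented for illustration), pipeline 2 just declares the dropped file as a resource and triggers on it:

```yaml
# pipeline-2.yml
resources:
- name: handoff
  type: s3
  source:
    bucket: team-one-handoff       # hypothetical bucket
    regexp: artifact-(.*)\.tgz

jobs:
- name: do-own-thing
  plan:
  - get: handoff
    trigger: true                  # fires when pipeline 1 drops a new file
  - task: process
    config:
      platform: linux
      image_resource:
        type: docker-image
        source: {repository: alpine}
      inputs:
      - name: handoff
      run:
        path: sh
        args: [-c, "ls handoff"]
```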

The key here is that all statefulness is the responsibility of Resources -- Concourse does not do very much except bookkeeping about what Jobs have seen what versions of a Resource.

So if you need multi-pipeline coordination you can do this through any shared state you please. One that I personally like is tagging repositories, these are a nice visible marker of activity to observers.

At Pivotal each team tends to have several pipelines. Teams which rely on lots of upstream work (Release Engineering, Master Pipeline, CloudOps) tend to run many more pipelines which respond to activity by upstream teams. Similarly, Buildpacks has many pipelines which react to changes made in projects outside our control. When Node releases a new version, the pipeline checks out the sourcecode and begins to roll a new buildpack from scratch. At no point do the downstream teams need to coordinate with the upstream teams.

I'd say that automation is one of our secret weapons, but I never shut up about it.

Building in a container gives you the portability benefit of having tools installed in a container.

But not building in your production container makes it smaller.

Dockerfiles can actually do both, via "multi-stage builds". You can build in one container and, in the same Dockerfile, copy files from that build container to your final container.

Some take a really extreme stance where the production container has nothing superfluous, not even a shell. https://github.com/GoogleContainerTools/distroless

> Any benefits to the build-in-container approach that are compelling?

The build-in-container approach means that, in theory, you get reproducible builds from any machine that supports Docker. I personally like this approach because you don't run into the "it works on my machine" case when testing locally.

I think Java has such good build tooling that you don’t need to build in a container for reproducible builds. Other languages might be different.

Install the JDK, install Maven, check out the project, “mvn package” and it’ll download what it needs and produce the JAR to be copied into the Docker image by the Dockerfile.

Everything your jar contains can be stored in a Git repo with the project. Then you build a fat jar and put it in the image.

I would advise against using fat jars with Docker. They do not layer nicely, so your layer sizes become much bigger than necessary, slowing everything down. If you have multiple applications, then with fat jars you lose the ability to share the common bits between the applications, which could also have an impact on maintainability.

I would argue that fat jars make everything simple and portable. Especially if you do performance testing frequently on hardware that isn't running docker. You can compare performance baremetal vs. VM's this way.

Furthermore, as long as you use Gradle/Maven with Artifactory, then maintainability of fat jar artifacts is pretty simple.

We run our builds within containers (using the Jenkins k8s plugin) for repeatability, but the generated artifacts (jars and Docker images) are pushed to our artifact repository (Artifactory) as part of the build process. Our final deployable artifact is our base image with a single layer added to include our app.jar.

Maybe I am old school. But I kinda like a WAR file, run/deployment via a Maven plugin, and a simple servlet container (like Tomcat). If you run a pure Java stack, why not keep to this stuff that has been around for ages?

The reason to run your WAR+Tomcat or JAR file in a container is that it is portable across different Linux distributions. The container appears to have a certain environment to your application, regardless of the Linux it is running on.

Some containers launch a single process, your process.

Some containers launch a "linux distribution" sans kernel, really launching the /sbin/init process, such that you now think you have a complete Linux system, as root, with files and libraries laid out in a way that you expect -- regardless of the conventions of the actual Linux system running your container.

A container can be as cheap as launching a process, since that is all a container really does. It launches a process with a lot of parameters and controls that set up the environment, network, isolated process table, your own user zero (root), your own root directory, etc. It's just kernel features that isolate your process so that it has a controlled view of what the network looks like, what the filesystem is, etc.

The other reason to package as a container instead of a WAR or JAR is that other DevOps or sys admins can install your container without knowing anything about Java. They don't even have to know that you are using Java. I can get a container and install it and run it so that it does some specific function, and I don't care what technology the developers used. Java, Python, PHP, C, God forbid Perl, it doesn't matter. Maybe the container has a mini-Linux distribution inside it. Maybe not. I don't care, I don't know. The container just plugs in and runs. Think of a Container as the benefits of a WAR / JAR but for a larger world outside of Java.

A container is also a great way to package a legacy application. Or an application using a technology unfamiliar to those deploying it on servers. Your container has all of your dependencies. A specific Java. Specific dependencies, regardless of what the host system has.

I can kinda understand that using Docker makes Java programs "run" the same way as other languages. But from my perspective, if you run a pure Java stack, using Docker instead of the Java technologies for packaging and deployment seems like a step backwards.

You can probably use it for defining and spinning up new servers (with JRE, database, servlet container and so on), though, but that's a different story.

For a pure Java stack, you might not find any advantage in Docker.

There is still one however -- Kubernetes. Run many instances of a Docker container in Kubernetes as a cluster.

Oh, but I can set up various Java technology clusters without Kubernetes.

Think of Kubernetes as doing for clusters what an OS does for a single computer.

1960s: Joe: can I use the computer today to run my workload? Bob: yeah, but my workload is running the computer until about 3 PM.

Today: Joe: can I use the (let's say Hadoop) cluster to run my workload? Bob: yeah, but my workload is running the cluster until about 3 PM.

An OS lets Joe and Bob both use the computer at the same time.

Kubernetes lets Joe and Bob both use a generalized cluster at the same time.

Joe has his Docker container (say a Java workload set up as a Hadoop node).

Bob has his Docker container (say some other kind of Java parallel computing workload, or maybe Python or other).

Joe tells Kubernetes to run 50 instances of his workload.

Bob tells Kubernetes to run 30 instances of his workload.

Joe and Bob's workloads can each see their own nodes on a private network that cannot see the other's nodes.

Maybe the physical hardware cluster only has 45 nodes. Kubernetes might schedule multiple docker containers running on the same physical node.

Kubernetes is something you would expect from Google running a giant data center where many different people want to run multiple-node clusters ad-hoc at any given time. Even multiple instances of the same type of workload, such as Jane wanting her own 50 node Hadoop, while Carol runs a 40 node Hadoop cluster, but they don't interfere with each other.
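In Kubernetes terms, Joe's "run 50 instances" request is a single Deployment manifest (all names and the image are hypothetical):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: joes-workload
spec:
  replicas: 50                 # "run 50 instances of my workload"
  selector:
    matchLabels:
      app: joes-workload
  template:
    metadata:
      labels:
        app: joes-workload
    spec:
      containers:
      - name: worker
        image: registry.example.com/joes-workload:1.0
```

Bob submits his own manifest with `replicas: 30`, and the scheduler packs both workloads across the same 45 physical nodes.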

Containers give control over the application's JRE version (without making a self-extracting executable).

Sure but in practice that would change very rarely.

For what I am working on a deployment takes a couple of seconds including controlled shutdown and startup of the application.

I don't know how fast docker/jib would be but I am imagining this would be a somewhat heavier setup?

Not having to install and manage a tomcat server is pretty nice. There is "embedded" tomcat so you can just run `java -jar my-app.jar` but then you still have to worry about installing and managing the JVM plus whatever file-system resources like logging locations etc.

> [JRE version] would change very rarely.

It's optimizing for setting up new machines and having everything about how to run an application contained within that application itself rather than relying on sysadmins to `yum install` things or whatever. Plus with docker you always run the same version in prod that you do in dev (and on your local machine) unless you intentionally do something else.

> I don't know how fast docker/jib would be but I am imagining this would be a somewhat heavier setup?

Not really - a layered docker image may have tomcat as a base layer and then the app jars etc layered atop. Deployment is just downloading a new layer which wouldn't really be any overhead atop of just the jars.
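A hedged sketch of that layering (base image tag and paths are illustrative):

```dockerfile
# Base layer: OS + JVM + Tomcat. Rarely changes, so it stays cached.
FROM tomcat:9.0-jre8
# Third-party jars: change occasionally.
COPY lib/ /usr/local/tomcat/lib/
# The app itself: the only layer a typical redeploy has to push and pull.
COPY target/my-app.war /usr/local/tomcat/webapps/
```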

I am curious about startup / deployment times?

When I deploy a WAR file, it is fairly small (just app + dependencies), and startup / shutdown is well-defined and happens fast, allowing me to update an application with just a few seconds of downtime. But here - how fast would an update be? How much downtime? Does it reinstall the os, the vm, the app, are the shutdown / startup operation well defined or does the process just get killed?

Wouldn't it take quite a bit of work just to match what you get out of the box in a traditional setup?

Yes to what kimdotcom says - blue/green is the only real sane way to do it with minimal/no downtime, and that strategy/recommendation is good even if you're not running in Docker. If you're just doing cowboy deploys and accepting a few seconds of downtime while the new war file explodes and tomcat starts up, then Docker won't add any noticeable overhead of its own.

> Does it reinstall the os, the vm, the app,

No. First, "reinstall" isn't the right word. The "OS" is baked into the container. There is no vm, Docker is not a vm. It doesn't re-install the app, it's starting it up in a new container. All your deploys are likely doing is pulling a new container image or layer (about the same size as your war) and calling `docker run` which could be more or less the same as `java -jar...`.

> are the shutdown / startup operation well defined or does the process just get killed?

There is a well-defined container lifecycle that iirc sends SIGTERM by default. This is similar to what systemd or similar would do.
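On the JVM side, shutdown hooks are what make that SIGTERM useful. A minimal sketch (class name and message are my own, not from the thread):

```java
public class GracefulApp {
    // Cleanup shared by the SIGTERM path and normal exit: in a real service
    // this would drain in-flight requests, close pools, flush logs, etc.
    static String shutdownCleanly() {
        return "shutting down cleanly";
    }

    public static void main(String[] args) {
        // The JVM runs shutdown hooks on SIGTERM (what `docker stop` sends
        // before escalating to SIGKILL), so the process is not "just killed".
        Runtime.getRuntime().addShutdownHook(
                new Thread(() -> System.out.println(shutdownCleanly())));
        System.out.println("app started");
    }
}
```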

You would bring up a 2nd container with the new code, map the traffic to this container, and once you are sure it is good, trash the old container.

If there is a problem with the new code, just point the traffic back to the working container and kill the new container.

So it is so slow you have to spin up a second node :-D

> you still have to worry about installing and managing the JVM

Up to a very limited point. JVM backward compatibility is very good so you can do global version upgrades with reasonable confidence, and the JVM is such a common requirement (in the context of being a java shop) that including it in the base system image makes sense.

> plus whatever file-system resources like logging locations etc.

It's very much doable to get these down to zero (e.g. using remote logging), and I find it's worthwhile.

Backward-compatibility is one thing but forward-compatibility is something else entirely.

I upgraded a whole java shop to java8 and it was not a fun experience having two JVMs on the same machine and all the different tools thought they knew which one was the correct one but none of them agreed (it wasn't as simple as setting JAVA_HOME).

All of this java tooling/versioning is simple in theory since java is a pretty simple ecosystem (compared to something like C++ or even python to an extent), but there are so many layers of indirection in a modern build & deployment system that it's really painful to debug when some tool somewhere sets its PATH incorrectly or doesn't respect JAVA_HOME or symlinks or...

If the apps had been docker-ized there wouldn't have needed to be a JVM on the host machine at all, and teams could come in and use java9 or whatever without having to work with SREs/devops to upgrade the machines and worry about whatever other apps may be using those machines or virtual-machine images.

> I upgraded a whole java shop to java8 and it was not a fun experience having two JVMs on the same machine and all the different tools thought they knew which one was the correct one but none of them agreed (it wasn't as simple as setting JAVA_HOME).

I agree that having two JVMs on the same machine without something like docker can be problematic, but in my experience there's no need to ever do that - the backwards compatibility is good enough that you can just replace the old JVM with the new one.

I mean, if you're worrying about this kind of thing shouldn't you also worry about whether apps are built against older versions of docker? How easy is it to have two different versions of docker on the same machine and make sure that docker upgrades don't impact other apps running on those machines?

Not if the JVM goes with the application, especially now that Java also has a linker.

How a container works is that the file system has a series of layers. Like stacked sheets of glass with drawings on them. What you see at the top is the file system that results from what is on each sheet of glass. Things on higher sheets of glass can obscure or shadow things from lower layers.

Your layer, the topmost layer, might have only your JAR. Assembling and deploying your container could be very fast -- but docker takes care of that. You could organize your container so that you have a layer with your Tomcat. Another layer with your JVM. Then you can easily swap layers in and out to define different versions of your container.

Whoever deploys your container on a server doesn't know or care what technology you use. Java, Python, etc. They just plug in the container and it runs. (Docker assembles its layers.)
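Swapping those "sheets of glass" can be as small as changing one build argument; a hedged sketch (tags and paths are illustrative):

```dockerfile
# The base image supplies the lower layers (OS, JVM, Tomcat); changing its
# tag swaps those layers while the app layer on top stays the same.
ARG TOMCAT_TAG=9.0-jre8
FROM tomcat:${TOMCAT_TAG}
COPY target/my-app.war /usr/local/tomcat/webapps/ROOT.war
```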

It depends. The build / environment base images can be cached, and then deployment could be just pushing an additional layer with the jar. This could be pretty fast if set up correctly.

You can achieve this with JAVA_HOME and installing your JDK or JRE in a standard location. This has been possible for 20+ years.

The person installing your application may not know anything about Java. They don't want a 35 page installation document.

With a container, they just plug in your application and it runs. They don't know and don't care what technology it was developed with.

You have complete control of your dependencies, which exact Java runtime you use, any other native libraries or tools you use, etc. They are all part of your container and independent of the host system. You can't see the host system, and the host system doesn't care what is in your container.

The container is like a giant simple JAR file for any technology that runs on a Linux system.

Once you know how to install and run a container, you can run apps developed in unfamiliar technologies. You don't have to know about their conventions for paths, or classpaths, or other things you don't care about. That is all packaged up inside the container.

I already know all this, and it's nice, but still unnecessary for many, many use cases.

> JAVA_HOME and installing your JDK or JRE in a standard location

...and that's exactly what Docker + Java does.

Just in a standard format that works with any kind of program, Java or otherwise.

This seems pretty neat if you have the infrastructure already, removes a lot of the busy work around creating docker images etc, and from the looks of the instructions it looks like you don't even need Docker installed to create the image - which is great for CI!

Someone asked me if I used Google's Skaffold (save+push to kubernetes), and I giggled and told them we mainly use Java and Spring: even boot takes more than 10 seconds and building also takes around 10 seconds, so if you do not have .m2 caches it would take much more time.

Dependency caches and already-built components, in all languages (Go, Java, Node.js), are always a problem for us in Docker in our CI/CD; glad to see such solutions coming up.

I work on skaffold and with the jib folks at Google. Why not have the best of both worlds? :) The code that lives here will use jib for the build and skaffold for the deploy, so you get those really fast (and native) workflows.


I had this problem with Gitlab until I worked out how to use its built-in cache mechanism to save the .m2 directory; the bit I was missing was to force maven to use a different location for that directory.
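For reference, the shape of that fix (image tag is illustrative; the key is keeping Maven's repo inside the project directory so GitLab's cache can see it):

```yaml
# .gitlab-ci.yml
variables:
  # Move Maven's local repo from ~/.m2 (outside the workspace, not cacheable)
  # into the project directory, where the GitLab cache mechanism can save it.
  MAVEN_OPTS: "-Dmaven.repo.local=$CI_PROJECT_DIR/.m2/repository"

cache:
  paths:
    - .m2/repository

build:
  image: maven:3-jdk-8
  script:
    - mvn package
```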

Often you can just mount a persistent volume where the cache should go.

What do you do when they are shared and remote?

I started to make something similar with Node, since I wanted to use Node to make containers rather than shell scripts (I don't think learning shell scripts should be a prereq for NoOps). Unfortunately I got distracted, before I eliminated the need for a Dockerfile (it had a wrapper Dockerfile that ran npm scripts) https://github.com/diodejs/dev/blob/master/Dockerfile

It's based on a plain alpine container, with the idea that you can run `apk remove node` when you're done to clean it up if you don't need node once the container is built.

A lot of people are missing that this is really about replacing shell scripts with Maven or Gradle. You can still build a fat JAR.
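For example, with the jib-maven-plugin the whole image build lives in the pom (version and image name here are illustrative; check the project for the current release):

```xml
<!-- pom.xml: Jib builds and pushes the image from the build itself,
     with no Dockerfile and no docker daemon required -->
<plugin>
  <groupId>com.google.cloud.tools</groupId>
  <artifactId>jib-maven-plugin</artifactId>
  <version>0.9.9</version>
  <configuration>
    <to>
      <image>gcr.io/my-project/my-app</image>
    </to>
  </configuration>
</plugin>
```

Then `mvn compile jib:build` replaces the shell script.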

Google have also done work on the more general idea of "read dependencies to layout containers", under the title FTL. I would give you a link but it is a maddeningly ungoogleable name.

Docker works wonders if you don’t have a functioning app server or something equivalent. If you don’t have the possibility of deploying to an already functioning cluster, use docker. If you have an environment where you can’t “run everywhere” use docker. However docker is just something to handle environments. Not applications.

If you really want to explore why running a JVM in a Docker container makes a lot of sense, try running a modded Minecraft server with and without a container for a few weeks.

Another commenter in the thread mentioned that "Docker eliminates wishful thinking." Getting that first modded Minecraft server operational will drive that home.

Looking forward to Microsoft investing some more in this area with Core. Creating a new Core project offers to add Docker support which is pretty decent out of the box but it’s pretty much hands off after that point and you have to do everything yourself from there.

Java isn't an ideal fit for containerization or horizontal scaling by nature, but if you're stuck with it and really want to avoid using anything other than Java ecosystem tooling, this seems like a timesaver.

Why would Java not be an ideal fit for containerization (why would any runtime not be)? Also, given you don't have to pack everything into one VM (can have many nodes, VMs, load balanced, whatever)...what makes Java particularly bad for horizontal scaling?

Yea I don't understand the original comment... however it is true Java isn't (or at least wasn't until recently) ideal for containerization, since the JVM would not obey things like cgroup limits for memory and CPU. This was fixed in Java 9 and backported to Java 8 (update 131); however, there are still potential issues regarding CPU isolation that are not trivially solved without some explicit configuration.

For a good overview, see: https://mesosphere.com/blog/java-container/
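The era-appropriate workaround, sketched as a Dockerfile (the flags are the real 8u131+ experimental ones; image tag and jar path are illustrative):

```dockerfile
FROM openjdk:8-jre
COPY target/my-app.jar /app.jar
# On 8u131+ these experimental flags make the heap respect the cgroup
# memory limit; Java 10+ enables this by default via -XX:+UseContainerSupport.
ENTRYPOINT ["java", \
  "-XX:+UnlockExperimentalVMOptions", \
  "-XX:+UseCGroupMemoryLimitForHeap", \
  "-jar", "/app.jar"]
```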

I ended up leveraging multi-stage building with Docker and have a gradle image as my builder image, build the jar, and copy that to the image that is going to get deployed.
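That pattern looks roughly like this (image tags and paths are illustrative):

```dockerfile
# Stage 1: build the jar inside a Gradle image; nothing from this
# stage ends up in the final image except what we copy out.
FROM gradle:4.10-jdk8 AS builder
COPY --chown=gradle:gradle . /home/gradle/project
WORKDIR /home/gradle/project
RUN gradle build --no-daemon

# Stage 2: a slim runtime image that only contains the JRE and the jar.
FROM openjdk:8-jre-alpine
COPY --from=builder /home/gradle/project/build/libs/*.jar /app.jar
ENTRYPOINT ["java", "-jar", "/app.jar"]
```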

How does this compare to sbt-native-packager docker integration?

sbt-native-packager doesn't do dependencies as separate layers, so the image has a big fat layer with all your jars.

I can help with dockerizing a Java application and running it in a Nomad cluster.

I can help

