
Introducing Jib — Build Java Docker images better - rbjorklin
https://cloudplatform.googleblog.com/2018/07/introducing-jib-build-java-docker-images-better.html
======
skrebbel
I'm a Docker noob and I haven't shipped Java in ten years, and I find that
this article leaves the main question unanswered: _what happened to just
building a JAR?_

When I read that sentence, I assumed that it must be something about
controlling the JVM parameters and version, or the ability to include certain
system tools or imagemagick type of CLI thingos, but then as the article goes
on, I see none of that. None of the things that, to my uninitiated mind, give
Docker an edge over building a plain old JAR.

So why? Why is it necessary / a good idea to Dockerize a Java app? Can anyone
help me out?

~~~
koalaman
There are really a host of reasons why it's a good idea. Off the top of my
head, here are three that I think justify the weight:

1. A jar includes your app, but not your runtime or your application's native
deps. Docker containers let you specify a whole lot more about your image,
going beyond the language-specific application package (see the sketch after
this list).

2. You can use the same tooling and infrastructure for managing your
executable binary artifacts across all the languages used in your org.

3. You can take advantage of cluster schedulers like K8s.
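
For point 1, here's a minimal sketch of what an image definition captures
beyond the jar itself (the image tag, package names, and flags are
illustrative):

        # pin the exact JRE the app runs on
        FROM openjdk:8-jre-slim
        # native dependencies the jar itself can't carry
        RUN apt-get update && apt-get install -y --no-install-recommends imagemagick && \
            rm -rf /var/lib/apt/lists/*
        # the artifact and its JVM flags travel with the image
        COPY target/my-app.jar /app.jar
        ENTRYPOINT ["java", "-Xmx512m", "-jar", "/app.jar"]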

~~~
pjmlp
Point 1 is taken care of by fat jars and jlink.

~~~
rbjorklin
While technically correct, the maven-jlink-plugin is in alpha and I've not yet
seen any application packaged this way. If you're aware of any open-source
project doing this I would be very interested :)

~~~
nomercy400
With the planned removal of Java Web Start from Java 11 onwards this
September, you can expect jlink to become much more relevant.
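
A minimal jlink invocation looks roughly like this (JDK 9+; the module list is
illustrative):

        # assemble a trimmed runtime image containing only the listed modules
        jlink --module-path $JAVA_HOME/jmods \
              --add-modules java.base,java.logging,java.sql \
              --strip-debug --compress=2 \
              --output my-runtime
        # run the app on the custom runtime instead of a full JRE
        my-runtime/bin/java -jar my-app.jar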

------
merb
Here is my "initial" sbt implementation:
[https://github.com/schmitch/sbt-jib](https://github.com/schmitch/sbt-jib)

(Should be easier to do after:
[https://github.com/GoogleContainerTools/jib/issues/337](https://github.com/GoogleContainerTools/jib/issues/337))

It also still misses some important things: not all resources are put into the
resources layer yet, and resources should be deduplicated, if there are any.

Edit: Thanks, Google, for this awesome library! With this library you'll
probably lose some money!

------
Perseids
Yay! Good steps (in the Java ecosystem at least) to bring containers closer to
their potential.

Docker is the best example I know of how "_worse is better_" can lead to
absurdly bad local maxima. Docker solves the most shallow and most obvious
problems of software deployment: how to package all dependencies in one big
ball of mud.

Instead of declaring dependencies explicitly (with acceptable version ranges,
as Maven, Bundler, and even nvm have been able to do for ages) we get the
equivalent of a bash script (the Dockerfile) that is impossible to parse
automatically and encourages volatile external dependencies (`curl
[http://mirror.com/current_version.sh](http://mirror.com/current_version.sh) |
bash`).

Instead of slim, reproducible build descriptions that could reference single
files or packages via hash sums (à la `packages.lockfile`) we get fat
filesystems with the caching granularity of snapshots. But why care,
developers have multi-terabyte SSDs and gigabit network connections by now,
right?

Instead of being able to propagate security updates down to each package
(/container), we have filesystem snapshot caching, where cache invalidation is
for all intents and purposes impossible. But cache invalidation is hard, so
how could you blame Docker?

And I get why Dockerfiles are so seductive: they are damn easy. You can build
your first Docker image in minutes. But Docker doesn't solve any of the deeper
problems for you. E.g., you'd better figure out how to invalidate your cached
image layers yourself when packages get security updates. Or just rebuild
everything all the time, but there goes your performance, and maybe some of
those pesky external dependencies are no longer available. Want to internalize
those? Build that yourself, too; Docker can't help you there, because it is
too damn simple.

So, hooray for Jib! All the work that has already gone into Maven to solve the
deeper problems (e.g. internalizing external dependencies with Nexus) can be
repurposed to build container images. Let's hope we get more of that "more
complex but better" sanity.

------
djsumdog
Seems like a good idea for a new project. The article says "You don't have to
create a Dockerfile" but honestly, who does?

There are a ton of Gradle/SBT/Maven plugins that build your container for you.
Most call your Docker daemon API directly, copy in all your jars and write a
startup script. Some generate a Dockerfile, but they stick it in your build
directory and you never touch it directly.

The advantage is not needing a Docker daemon. That is pretty nice. However, I
already need a daemon in our Jenkins pipeline, and a lot of my unit tests use
the Scala testcontainers framework (so I can run tests against a real MySQL or
Postgres DB, which is way nicer than using mocks or H2).
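
(For reference, the testcontainers idiom looks roughly like this; Java shown,
the Scala wrapper is similar, and the image tag is illustrative:

        import org.testcontainers.containers.MySQLContainer;
        import java.sql.Connection;
        import java.sql.DriverManager;

        // spins up a throwaway MySQL container for the duration of the test
        try (MySQLContainer<?> mysql = new MySQLContainer<>("mysql:5.7")) {
            mysql.start();
            try (Connection conn = DriverManager.getConnection(
                    mysql.getJdbcUrl(), mysql.getUsername(), mysql.getPassword())) {
                // run assertions against a real database instead of mocks or H2
            }
        }

)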

So it is nice, but chances are we're already running container daemons
everywhere.

Plus I'm sure that for this to be really useful, we'll need ECR plugins as
well (which will probably need to be third-party, because I don't see Google
supporting AWS officially).

I wouldn't move to this if I were on something that worked. That previous
article about the team going back to Maven after a Gradle migration is kinda
in the same vein: with a new project you can make it clean, but your current
project is probably so customized around your workflow that unless you can
make a case for a really bad pain point it will solve, it's probably not worth
it.

A good pain point I think this would solve is if your CI doesn't have access
to a Docker daemon at all (for security or you're using a hosted solution or
whatever) and you want to be able to build containers and publish them in your
CI.

~~~
jacques_chester
There are two things here.

The first is layering. A goal is smarter layering without a Dockerfile. A
dependency-management tool gives fairly good insight into how to arrange
layers to improve cacheability and reduce layer churn.

The second is security. Going without a daemon socket is more than "pretty
nice", it is a _massive_ security win. Docker daemons have a lot of privileged
access and a wide API. Having that entire massive attack surface lying around
to write some tarballs and a JSON file on your behalf doesn't make a great
deal of sense.

These both become more valuable at large scales, which is why it makes sense
for Google folks to have developed it.

~~~
kronin
Agreed re: both of these wins.

On the security front, it would be ideal if the resulting containers ran as
non-root. Hopefully that will come soon.

------
pillfill
It'll be really interesting to see where this goes. We currently use Fabric8's
Docker Maven Plugin ([https://dmp.fabric8.io/](https://dmp.fabric8.io/)) for
our Java-based containers. It's a little verbose but works really well: it
allows us to run our integration tests directly against the final container
images (+ any container dependencies) as part of the standard Maven build.

~~~
blahblahblogger
I wish fabric8 had Gradle support.

This ticket implies that it isn't "build tool agnostic"
([https://github.com/fabric8io/fabric8-maven-plugin/issues/609#issuecomment-297363332](https://github.com/fabric8io/fabric8-maven-plugin/issues/609#issuecomment-297363332)),
and I found a few GitHub repos that are "Gradle plugins" for Fabric8, but they
only had a few stars or were out of date.

------
mattlondon
Maybe someone can help me out on a question I've not got a clear answer on: is
it better to build your artefact (jar file in this case) and put that in the
docker image, or is it better to put everything you need in the image (source
files, dependencies etc) then build the artefact from within the container?

Seems to me like the first approach (compile outside then put artefact in
container) is the most sane/sensible approach and the one used here in Jib,
but I have seen people advocating the build-in-container approach as better
(self-contained, repeatable and so on).

Any benefits to the build-in-container approach that are compelling?

~~~
bzajax
Docker recently added support for multi-stage builds. So you can use one
Dockerfile to define how to build your image (produce your jar and other
necessary resources) and then copy the results into a fresh container running
off a more lightweight image that doesn't contain all the intermediates and
your build tools.

[https://docs.docker.com/develop/develop-images/multistage-build/#use-multi-stage-builds](https://docs.docker.com/develop/develop-images/multistage-build/#use-multi-stage-builds)
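
A minimal sketch of that pattern (image tags and paths are illustrative):

        # stage 1: full JDK plus build tool, produces the jar
        FROM maven:3-jdk-8 AS build
        WORKDIR /src
        COPY . .
        RUN mvn -q package

        # stage 2: slim JRE-only image that carries just the artifact
        FROM openjdk:8-jre-slim
        COPY --from=build /src/target/my-app.jar /app.jar
        ENTRYPOINT ["java", "-jar", "/app.jar"]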

~~~
zokier
Multi-stage builds are definitely my preferred method of building stuff, but
the problem I've found is that caching dependencies is a bit of a pain: either
you need to do some tricks with volumes or your builds will be very slow.
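
One common workaround, short of volumes, is to copy the dependency manifest in
first so the download layer stays cached until the pom actually changes. A
sketch (image tag and paths are illustrative):

        FROM maven:3-jdk-8 AS build
        WORKDIR /src
        # copy only the pom first: the dependency download below is cached
        # until the pom itself changes
        COPY pom.xml .
        RUN mvn -q dependency:go-offline
        # source changes only invalidate the layers from here on
        COPY src ./src
        RUN mvn -q package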

~~~
kinghajj
At work, I set up the build system to use Apache Archiva to cache Java
dependencies, along with other services like apt-cacher, devpi, and squid, for
Debian, Python, and general HTTP caching, respectively. We build all of our
images atop the "bitnami/minideb" base, and split the Dockerfiles into
"partials" that source a dynamic parent image via a build argument. For
example, the partial that sets up Debian package and HTTP caching:

    
    
        ARG base
        FROM ${base}
        ARG apt_cacher_host
        ARG http_proxy
        ARG https_proxy
        ARG no_proxy
        ENV HTTP_PROXY="${http_proxy}"
        ENV HTTPS_PROXY="${https_proxy}"
        ENV NO_PROXY="${no_proxy}"
        ENV http_proxy="${http_proxy}"
        ENV https_proxy="${https_proxy}"
        ENV no_proxy="${no_proxy}"
        RUN set -ex; \
            if [ ! -z "${apt_cacher_host}" ]; then \
                echo "Acquire::http::Proxy \"${apt_cacher_host}\";Acquire::https::Proxy \"false\";" >/etc/apt/apt.conf.d/01proxy; \
            fi
    

For Java, another partial creates a Maven settings file with the Apache
Archiva URL:

    
    
        ARG base
        FROM ${base}
        ARG maven_mirror
        COPY devops/image-partials/java-artifacts/settings.xml /tmp/settings.xml
        RUN set -ex; \
            env; \
            if [ ! -z "${maven_mirror}" ]; then \
                sed -r "s!%%MAVEN_MIRROR%%!${maven_mirror}!g;" /tmp/settings.xml >/usr/share/maven/conf/settings.xml; \
            fi;
    

The important part of settings.xml is this section:

    
    
        <mirrors>
            <mirror>
                <id>archiva.default</id>
                <url>%%MAVEN_MIRROR%%</url>
                <mirrorOf>external:*</mirrorOf>
            </mirror>
        </mirrors>
    

And voilà, Maven will go through Archiva for all its dependencies, which are
transparently cached so that subsequent builds don't need to re-download
everything.

------
chvid
Maybe I am old school, but I kinda like a WAR file, run/deployment via a Maven
plugin, and a simple servlet container (like Tomcat). If you run a pure Java
stack, why not stick to this stuff that has been around for ages?

~~~
paulddraper
Containers give control over the application's JRE version (without making a
self-extracting executable).
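
With Jib specifically, the same control comes from the plugin's base-image
setting; a sketch, assuming the jib-maven-plugin's `from`/`image`
configuration (image names are illustrative):

        <plugin>
            <groupId>com.google.cloud.tools</groupId>
            <artifactId>jib-maven-plugin</artifactId>
            <configuration>
                <from>
                    <image>openjdk:8-jre-slim</image>
                </from>
                <to>
                    <image>gcr.io/my-project/my-app</image>
                </to>
            </configuration>
        </plugin>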

~~~
chvid
Sure but in practice that would change very rarely.

For what I am working on, a deployment takes a couple of seconds, including
controlled shutdown and startup of the application.

I don't know how fast Docker/Jib would be, but I imagine this would be a
somewhat heavier setup?

~~~
ryanianian
Not having to install and manage a Tomcat server is pretty nice. There is
"embedded" Tomcat, so you can just run `java -jar my-app.jar`, but then you
still have to worry about installing and managing the JVM, plus file-system
resources like logging locations etc.

> [JRE version] would change very rarely.

It's optimizing for setting up new machines and having everything about how to
run an application contained within the application itself, rather than
relying on sysadmins to `yum install` things or whatever. Plus, with Docker
you always run the same version in prod that you do in dev (and on your local
machine) unless you intentionally do something else.

> I don't know how fast docker/jib would be but I am imagining this would be a
> somewhat heavier setup?

Not really: a layered Docker image may have Tomcat as a base layer and the app
jars etc. layered atop. Deployment is just downloading a new layer, which
wouldn't really add any overhead beyond the jars themselves.

~~~
chvid
I am curious about startup/deployment times.

When I deploy a WAR file, it is fairly small (just app + dependencies), and
startup/shutdown is well-defined and happens fast, allowing me to update an
application with just a few seconds of downtime. But here, how fast would an
update be? How much downtime? Does it reinstall the OS, the VM, the app? Are
the shutdown/startup operations well defined, or does the process just get
killed?

Wouldn't it take quite a bit of work just to match what you get out of the box
in a traditional setup?

~~~
kimdotcom
You would bring up a 2nd container with the new code, map the traffic to this
container, and once you are sure it is good, trash the old container.

If there is a problem with the new code, just point the traffic back to the
working container and kill the new container.

~~~
chvid
So it is so slow you have to spin up a second node :-D

------
djhworld
This seems pretty neat if you already have the infrastructure. It removes a
lot of the busywork around creating Docker images, and from the looks of the
instructions you don't even need Docker installed to create the image, which
is great for CI!
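
For reference, the setup from the article boils down to one plugin entry and
one goal; a sketch (the registry path is illustrative, and the plugin version
is omitted, so check the project README):

        <plugin>
            <groupId>com.google.cloud.tools</groupId>
            <artifactId>jib-maven-plugin</artifactId>
            <configuration>
                <to>
                    <image>gcr.io/my-project/my-app</image>
                </to>
            </configuration>
        </plugin>

Then `mvn compile jib:build` builds and pushes the image straight to the
registry, with no Docker daemon involved.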

------
CSDude
Someone asked me if I used Google's Skaffold (save+push to kubernetes), and I
giggled and told we mainly use Java and Spring even boot takes more than 10
seconds and building also takes around 10 seconds, so if you do not have .m2
caches, it would take much more time.

Dependency caches and already-built components, in all languages (Go, Java,
Node.js), are always a problem for us with Docker in our CI/CD, so I'm glad to
see such solutions coming up.

~~~
akvadrako
Often you can just mount a persistent volume where the cache should go.
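
For example, a named volume can hold the Maven cache across builds (volume and
image names are illustrative):

        # ~/.m2 persists in the "m2-cache" volume between runs
        docker run --rm \
            -v m2-cache:/root/.m2 \
            -v "$PWD":/src -w /src \
            maven:3-jdk-8 mvn -q package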

~~~
CSDude
What do you do when they are shared and remote?

------
benatkin
I started to make something similar with Node, since I wanted to use Node to
make containers rather than shell scripts (I don't think learning shell
scripts should be a prereq for NoOps). Unfortunately I got distracted before I
eliminated the need for a Dockerfile (it had a wrapper Dockerfile that ran npm
scripts):
[https://github.com/diodejs/dev/blob/master/Dockerfile](https://github.com/diodejs/dev/blob/master/Dockerfile)

It's based on a plain Alpine container, with the idea that you can run `apk
del nodejs` when you're done, to clean up if you don't need Node once the
container is built.

A lot of people are missing that this is really about replacing shell scripts
with Maven or Gradle. You can still build a fat JAR.

~~~
jacques_chester
Google have also done work on the more general idea of "read dependencies to
lay out containers", under the title FTL. I would give you a link, but it is a
maddeningly ungoogleable name.

------
rad_gruchalski
How does this compare to sbt-native-packager docker integration?

~~~
maccam94
sbt-native-packager doesn't do dependencies as separate layers, so the image
has a big fat layer with all your jars.

------
he0001
Docker works wonders if you don't have a functioning app server or something
equivalent. If you don't have the possibility of deploying to an already
functioning cluster, use Docker. If you have an environment where you can't
"run everywhere", use Docker. However, Docker _is_ just something to handle
environments, not applications.

------
politician
If you really want to explore why running a JVM in a Docker container makes a
lot of sense, try running a modded Minecraft server with and without a
container for a few weeks.

Another commenter in the thread mentioned that "Docker eliminates wishful
thinking." Getting that first modded Minecraft server operational will drive
that home.

------
joshschreuder
Looking forward to Microsoft investing some more in this area with .NET Core.
Creating a new .NET Core project offers to add Docker support, which is pretty
decent out of the box, but it's pretty much hands-off after that point and you
have to do everything yourself.

------
phanboy4
Java isn't an ideal fit for containerization or horizontal scaling by nature,
but if you're stuck with it and really want to avoid using anything other than
Java ecosystem tooling, this seems like a timesaver.

~~~
preordained
Why would Java not be an ideal fit for containerization (why would any runtime
not be)? Also, given you don't have to pack everything into one VM (can have
many nodes, VMs, load balanced, whatever)...what makes Java particularly bad
for horizontal scaling?

~~~
opmac
Yea, I don't understand the original comment... however, it is true that Java
isn't (or at least wasn't until recently) ideal for containerization, since
the JVM would not obey things like cgroup limits for memory and CPU. This was
fixed in Java 9 and backported to Java 8 (update 131); however, there are
still potential issues regarding CPU isolation that are not trivially solved
without some explicit configuration.

For a good overview, see:
[https://mesosphere.com/blog/java-container/](https://mesosphere.com/blog/java-container/)
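
The linked post boils down to flags like these on Java 8u131+ (experimental
there; Java 10 replaced them with -XX:+UseContainerSupport, on by default):

        # let the JVM size its heap from the cgroup memory limit
        java -XX:+UnlockExperimentalVMOptions \
             -XX:+UseCGroupMemoryLimitForHeap \
             -jar app.jar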

------
jcolella
I ended up leveraging multi-stage building with Docker and have a gradle image
as my builder image, build the jar, and copy that to the image that is going
to get deployed.

------
dogtail
I can help with dockerizing a Java application and running it in a Nomad
cluster.


