
An alternative to fat JARs - javinpaul
http://product.hubspot.com/blog/the-fault-in-our-jars-why-we-stopped-building-uber-jars
======
bradleybuda
> We have over 1,000 microservices constantly being built and deployed.

Wait until they find out how much faster in-process function calls are than
RPCs.

~~~
heartsucker
Wait until they realize microservices can be deployed independently and have
strict APIs, without people using classes they shouldn't or creating spaghetti
code because they have 800k LOC.

~~~
msbarnett
> without people using classes they shouldn't or creating spaghetti code
> because they have 800k LOC.

Honestly, if you can't trust your devs not to make a mess in a monolith, you
can't trust them not to make a distributed mess with microservices.

Both architectural patterns have their uses. Neither will save you from bad
developers.

~~~
heartsucker
I think monoliths lend themselves to hacks more often than microservices do.
It's too easy to just import something you shouldn't, whereas breaking the
REST API is a lot harder. You can't just do a bad database access, because
there's usually a network firewall that will keep you from doing your
tomfoolery on the DB.

I could also be wrong about this, but I like microservices once projects
reach a certain size.

~~~
takeda
The problem with an API is that it's not easy to create a good one. It's so
easy to do it wrong, and once you do, it's very hard to change later. An API
should be designed pretty much by committee, where everyone in the company can
have input on how it will affect them.

It's not a panacea; it looks like you're just pushing the complexity
somewhere else.

~~~
heartsucker
At least with an API, people seem to care more about consensus and seem to
agree that it should be planned out. "Just make it work" seems to happen to
the guts.

And no, not a panacea, just a way of thinking that I think is useful.

------
exelius
Everyone uses fat JARs because they remove a lot of things that can go wrong
operationally (especially when standing up new servers).

The approach described in this article is the way Java was designed to work;
but then everyone said you needed a "self-executing" service binary for
portability. Enter the fat JAR, containers, orchestration layers, etc.

JVM application servers are certainly the way to go if you run a bunch of
Java. But then you're committing to maintaining a bunch of dependencies on
your application servers, so really you're pushing the tech debt down the line
to your ops folks, who now have to ensure dependencies are synced, Maven
servers are running and updated properly, proxies are punched for new external
dependencies, etc.

------
pcl
Cool! We've been doing something similar with public Maven repositories as the
backing store.

Jonathan: if you're listening, have you guys looked at Maven as a source
instead of an S3 upload? What percentage of your dependencies are internal vs.
external? We get 10-20x space savings from public-Maven-only elision.

EDIT: we've been working on a similar technique for executable JARs as well,
which shares the same general optimization numbers and approach. It's a bit
trickier, since we want to preserve the same 'java -jar' syntax, so it
requires a bit of classloader tomfoolery, and corresponding classloader
awareness in code.
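
A minimal sketch of what that classloader tomfoolery can look like (this is
illustrative, not pcl's actual code; the lib directory and the RealMain class
name are made up): a stub Main-Class builds a URLClassLoader over the
downloaded dependency JARs and hands off to the application's real entry
point.

    import java.net.URL;
    import java.net.URLClassLoader;
    import java.nio.file.*;
    import java.util.*;
    
    // Hypothetical stub entry point: 'java -jar' runs this class, which loads
    // the real application from JARs fetched into ./lib at deploy time.
    public class Launcher {
        public static void main(String[] args) throws Exception {
            List<URL> urls = new ArrayList<>();
            try (DirectoryStream<Path> jars =
                    Files.newDirectoryStream(Paths.get("lib"), "*.jar")) {
                for (Path jar : jars) urls.add(jar.toUri().toURL());
            }
            ClassLoader cl = new URLClassLoader(urls.toArray(new URL[0]),
                    Launcher.class.getClassLoader());
            // The 'classloader awareness' part: code that scans the classpath
            // must consult the context classloader, not the system one.
            Thread.currentThread().setContextClassLoader(cl);
            Class<?> realMain = Class.forName("com.example.RealMain", true, cl); // hypothetical
            realMain.getMethod("main", String[].class).invoke(null, (Object) args);
        }
    }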

~~~
jpollock
I would recommend against depending on public repositories (of any form) as a
backing store for repeatable builds. We've repeatedly seen public repositories
remove packages, disappear entirely, or otherwise make third-party projects
that a release depends on unavailable. If bringing up a production service
requires the presence of Maven, it will suffer an outage when Maven does.

Treat any open source project your product depends on the same way you would
treat anything you paid cash for - store a copy in your code repository and
back it up with the rest of your source code.

~~~
niftich
You can run Nexus or Archiva or some other Maven artifact mirror within your
network to get the best of both worlds -- Maven builds, and none of your stuff
breaks if Central is down.

~~~
jpollock
Since those are proxy caches, they don't provide repeatable builds. The
referred-to release can be changed without your knowledge, and Nexus will
happily serve that to you as correct.

~~~
niftich
I don't follow. I shouldn't have mentioned Nexus as I'm a lot more familiar
with Archiva. But,

- if the release of the same version was changed upstream, and you want this
change to be picked up, that's not a repeatable build anyway.

- if the release was changed upstream and you don't want to pick it up,
configure Archiva to fail on checksum mismatch [1].

- or as an alternative to enforcing the checksum, configure your local repo
as internal instead of proxied, and handle it all yourself [2].

[1] [https://archiva.apache.org/docs/2.2.1/adminguide/proxy-conne...](https://archiva.apache.org/docs/2.2.1/adminguide/proxy-connectors.html)

[2] [https://archiva.apache.org/docs/2.2.1/adminguide/repositorie...](https://archiva.apache.org/docs/2.2.1/adminguide/repositories.html)

------
devsatish
Back in the day, we used to have all common dependency JARs in the server/lib
folder and only deploy a slim application WAR to the server. Reminds me of
this.

~~~
HiJon89
What was the process for adding or removing a dependency? Changing a version?
If two apps needed different versions?

~~~
niftich
You added the dependency to server/lib. You standardized on versions, and
applied only security fixes, testing them in a separate environment first. You
only upgraded versions of libs if you had a critical feature you couldn't live
without, or if your existing libs were about to be deprecated and out of
support.

I'm not saying it wasn't a dark time, but it had its merits and drawbacks just
like everything else.

------
sytringy05
I did my first fat JAR project in 2010 with embedded Jersey, and it was about
10 MB. We use Dropwizard now, and most of our microservices sit at around
20 MB, which I think is still slim enough to avoid the extra overhead of
managing the dependencies on deployment. IMHO it's hard to overstate how much
benefit you get from java -jar.

~~~
pron
You can have both. Capsule.io (I'm a developer) gives you the option of either
embedding all dependencies in the JAR (including native libraries, if you need
them), or just listing Maven dependencies and having them downloaded, cached,
and shared on the first launch. It also lets you supply all the JVM flags
directly in the JAR manifest.
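
For a rough idea, a thin capsule's manifest is just a few attributes. This is
a sketch from memory of Capsule's README, so treat the exact attribute names
as an assumption; the application class and versions are made up:

    Main-Class: Capsule
    Application-Class: com.example.app.Main
    Dependencies: com.google.guava:guava:19.0 org.slf4j:slf4j-api:1.7.21
    JVM-Args: -Xmx512m

At launch, Capsule resolves the Dependencies line against Maven, caches the
artifacts, and starts the application class with the given flags.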

~~~
sytringy05
I'll have to check it out, thanks!

------
chadmaughan
First thought (on the article title "Why we stopped building fat JARs") was
that the build person had "graduated" and took their newly acquired
"superpowers" elsewhere [0].

Joking aside, fantastic idea =)

Have you received a cease and desist from SlimFast yet for violating their
trademark?

[0] - [http://fortune.com/disrupted-excerpt-hubspot-startup-dan-lyo...](http://fortune.com/disrupted-excerpt-hubspot-startup-dan-lyons/)

~~~
gurubavan
Don't believe everything you read -- especially when there's a single source.

------
twic
This sounds pretty sensible. I wonder if there's a simple way to reproduce it.
Maybe:

    
    
      I. Build your app as a thin JAR, with:
        a. a Main-Class entry in the manifest
        b. Class-Path entries in the manifest
        c. a file containing a list of dependency coordinates (group, artifact, version) in META-INF
      II. Write a shell script (for whatever value of 'shell' you like) which:
        1. takes a Maven repo URL and a Maven coordinate (group, artifact, version)
        2. downloads the JAR from the repo, extracts its dependency list, then pulls its dependencies too
        3. (optionally) somehow records which JAR is the main one, say by writing a tiny shell script or a symlink pointing at it
    

You might be able to use a standard embedded POM instead of a dependency list,
which might reduce the work a bit, but that would then require doing
transitive dependency resolution at deploy time, which is probably a bad idea.

Phase I is something like 10-20 lines of Gradle, tops. You could pack it up
into a plugin easily enough. Phase II is a similar amount of shell script,
maybe more.

For extra safety, add SHA3 hashes to the dependency list file, and check them
when you download the dependencies.
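
Here's a minimal Java sketch of that Phase II downloader, under a few
assumptions of my own: the dependency list has already been extracted from the
thin JAR and flattened into a TSV file of group, artifact, version, and hash
columns, and it uses SHA-256 rather than SHA3 (MessageDigest only grew SHA3
support in Java 9).

    import java.io.InputStream;
    import java.net.URL;
    import java.nio.file.*;
    import java.security.MessageDigest;
    
    // Hypothetical Phase II: fetch a thin JAR's dependencies from a
    // Maven-layout repo and verify their hashes before launch.
    public class FetchDeps {
        public static void main(String[] args) throws Exception {
            String repo = args[0];  // e.g. https://repo.example.com/maven2
            Path libDir = Paths.get("lib");
            Files.createDirectories(libDir);
            for (String line : Files.readAllLines(Paths.get("META-INF/dependencies.tsv"))) {
                String[] f = line.split("\t");  // group, artifact, version, sha256
                String jar = f[1] + "-" + f[2] + ".jar";
                String url = repo + "/" + f[0].replace('.', '/')
                        + "/" + f[1] + "/" + f[2] + "/" + jar;
                Path dest = libDir.resolve(jar);
                try (InputStream in = new URL(url).openStream()) {
                    Files.copy(in, dest, StandardCopyOption.REPLACE_EXISTING);
                }
                if (!sha256(dest).equalsIgnoreCase(f[3]))
                    throw new IllegalStateException("hash mismatch for " + jar);
            }
        }
    
        static String sha256(Path p) throws Exception {
            StringBuilder sb = new StringBuilder();
            for (byte b : MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(p)))
                sb.append(String.format("%02x", b));
            return sb.toString();
        }
    }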

I've described this as using a Maven repo, but that doesn't mean it has to hit
Nexus or whatever; you can just put JARs in S3 in the right layout. A while
ago, I was doing this by maintaining a repository locally, and just pushing
the whole thing to a Bitbucket website:

[https://bitbucket.org/twic/twic.bitbucket.org/src/14ac48d4c4...](https://bitbucket.org/twic/twic.bitbucket.org/src/14ac48d4c4a9/repository/?at=default)

You could do something similar, perhaps going from Nexus to S3.

~~~
HiJon89
This is similar to our setup with the slimfast-plugin. We use the
maven-jar-plugin to add the Main-Class and Class-Path entries to the manifest.
At build time, we use the upload goal of the slimfast-plugin to upload the
dependencies. It automatically reads the configuration of the maven-jar-plugin
to make sure the paths it generates match the classpath in the manifest. Then
it spits out a JSON file that has info on each dependency, including its
location in S3, file size, checksum, and the relative path where it needs to be
copied to match the Class-Path in the manifest (the JSON entries look like
this:
[https://gist.github.com/jhaber/3029dc55a568f0954b1c4b459657e...](https://gist.github.com/jhaber/3029dc55a568f0954b1c4b459657e1bc)).

On deploy, the simplest way to get up and running is to use the download goal
of the slimfast-plugin. It reads this JSON file, downloads each dependency
(using an optional cache folder), verifies the file size and checksum, and
copies it to the correct relative path. The application will then start up
happily with java -jar.
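
In Java terms, each JSON entry carries roughly this shape. The field names
here are my guesses for illustration; the linked gist shows the real format.

    // Assumed shape of one slimfast dependency entry -- illustrative only.
    public class DependencyEntry {
        String s3Bucket;   // where the artifact was uploaded
        String s3Key;      // e.g. a path under the bucket
        long size;         // verified after download
        String checksum;   // verified after download
        String targetPath; // relative path matching the manifest's Class-Path
    }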

At HubSpot, we instead integrated this download step more transparently into
our deploy process. At build time we read this JSON file and store the
dependency information in our build database. Our deploy infrastructure
already accepts S3 URLs and handles downloading, caching, verifying checksums,
and copying to arbitrary directories so we just piggybacked on this existing
functionality to have it download the application plus all of its dependencies
for us.

~~~
twic
Here's my very rough attempt in Gradle:

[https://bitbucket.org/twic/ensure](https://bitbucket.org/twic/ensure)

Turned out to be more than 20 lines, but then I did it in Java rather than
Groovy.

I do the downloads from the Maven repo (which should be your internal Maven
repo!), so there's no need for an upload step. Deployment is done with a shell
script with a few undemanding dependencies - unzip, curl, and openssl to
verify the digests. I use a TSV file rather than JSON because it's easier for
the shell script to read!

My next trick will be to figure out how to get this into a Cloud Foundry
buildpack ...

Oh, the name: [https://ensure.com/nutrition-products](https://ensure.com/nutrition-products)

------
setheron
Here is something I mentioned on the Reddit post:

You can go just as far with [http://www.capsule.io/](http://www.capsule.io/)
and a caplet to import dependencies with Maven (from a Nexus proxy that
also fronts S3).

You are now deploying thin JARs and having the artifacts cached on the local
host through Maven!

------
virmundi
It's neat. But is it worth it? I thought inbound traffic for S3 was free, so
the 150 MB doesn't cost much. You're paying $0.25 a month to store up to a gig
of fat JARs, so really nothing. It's 20-ish seconds faster. Assuming a CI
build, that doesn't matter. It is a more complex solution, which makes it more
brittle, which makes it worse. Finally, you're losing the single deployable.
When something goes bad, you can't just download the JAR. You have to fish
through the Maven dependencies.

~~~
jacques_chester
> _It's neat. But is it worth it?_

Saving traffic is good in general, insofar as it speeds deploys and allows
higher container density.

But for me the key was trying to get out of dependency hell. The shade plugin
isn't perfect.

I work at Pivotal; we inherited the Spring team from VMware. When I first saw
Spring Boot I wondered what all the fuss was about. Then I worked with a
classic Spring-with-hand-rolled-Maven app and oh boy, let me tell you, _I got
it_.

Having someone else level the dependencies for you? Huge. I'd be interested to
see if this tool can help Spring Boot too, though the runtime downloading of
dependencies is not without problems.

~~~
HiJon89
Agreed, we might not have done it if we had to resort to having the
application download its own dependencies at startup. In our case, our Mesos
scheduler, Singularity, handles S3 artifacts at deploy time for us. Previously
we gave it a single S3 URL for the fat JAR, but now we just give it a list of
artifacts (the app plus its dependencies) and it handles everything for us, so
it's not much more complexity, and it ended up being really easy to integrate
into our deploy process. By the time the app starts up, all of its
dependencies are guaranteed to be present (if an S3 download had failed, the
deploy would have failed), so it's totally transparent to the application.

~~~
jacques_chester
That makes sense.

I'm used to having to worry about disconnected environments (I worked on Cloud
Foundry buildpacks for a while), for which Spring Boot's JARs work a treat.

In a fully connected environment this approach looks promising. If I bump into
any of the Spring folks I will mention it.

------
pacoverdi
To deploy a process in my current project, I use Apache Ivy to resolve its
dependencies (starting from a small set of top-level modules). Then I cache
the results because Ivy resolution tends to take a while when you have a lot
of dependencies and a lot of repositories.

Then I send a message to a remote agent with the command to run (classpath,
main class, JVM args, args, configuration, etc.). The agent then downloads the
JARs that are not already present in its cache with a valid SHA-1 checksum,
then starts the process.
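
The launch step on the agent could be as simple as this sketch (the message
fields and names are assumptions based on the description above, not my actual
code):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    
    // Hypothetical agent-side launch: assemble a java invocation from the
    // message's JVM args, classpath, main class, and program args.
    public class AgentLauncher {
        static Process launch(List<String> jvmArgs, String classpath,
                              String mainClass, List<String> programArgs) throws IOException {
            List<String> cmd = new ArrayList<>();
            cmd.add("java");
            cmd.addAll(jvmArgs);   // e.g. -Xmx512m
            cmd.add("-cp");
            cmd.add(classpath);    // cached JARs joined with the path separator
            cmd.add(mainClass);
            cmd.addAll(programArgs);
            return new ProcessBuilder(cmd).inheritIO().start();
        }
    }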

Works quite well.

------
cmcginty
Sounds like an interesting idea. I would be even more excited to see someone
leverage Gradle for this. Assuming you publish your custom JAR artifacts from
your build server, your "deployment artifact" could be a single Gradle file
that acts as your JAR dependency definition and execution wrapper. For
deployment, sync the Gradle file and execute the wrapper to launch the service,
with Gradle handling all dependency updates and caching.

------
_pmf_
Reinventing OSGi configurations.

~~~
mindcrime
We're using ServiceMix as a container for services (micro or otherwise), and
while OSGi isn't without its own annoyances, it really is nice once you get
everything working.

The biggest problem I've found so far is simply 3rd-party dependencies that
aren't packaged with OSGi manifest information. Recently I specifically had an
issue with Spring in this regard (yeah, yeah, don't use Spring, I know... but
there was a specific reason in that case, at least initially) that would have
been easy to resolve if the Spring guys still distributed OSGi-ified versions
of their JARs.

Still, all in all, I'm pretty happy with this approach.

~~~
_pmf_
> The biggest problem I've found so far is simply 3rd-party dependencies that
> aren't packaged with OSGi manifest information.

It's a mess, and I fear Java 9/10's modularization approach will make it even
worse (so that instead of P2 (Eclipse Equinox's OSGi bundles) and Maven,
we'll have yet another set)... sigh. I'm still looking for a solution that
transparently translates Maven meta-information to OSGi bundle information at
some higher level (for example, by hooking into the OSGi container's
dependency resolution logic).

Edit: apparently, such a mechanism exists (ResolverHook)

~~~
sytringy05
Isn't that exactly what wrap: is for?

~~~
mindcrime
That has mostly worked well for me when dealing with non-OSGi JARs. Once I
figured out how to set things up using the features.xml file and specify wrap:
for the generic JARs, things worked nicely.

All of this said, I'm fairly new to OSGi, so I may still run into some dark
corners that will turn me off. But right now, it seems to be serving the
purpose well.

------
mjt0229
It doesn't use S3, but you can do something similar with Coursier:

[https://github.com/alexarchambault/coursier#launch](https://github.com/alexarchambault/coursier#launch)

------
1138
On limited information, this sounds like a workaround for a root problem of
runaway dependencies.

Maybe something like ProGuard could reduce the deployable JAR size to only the
classes used.

I assume these are daily dev deploys and not production deploys.

~~~
HiJon89
We definitely have some dependency cruft that could be trimmed (lots of
relocated copies of Guava due to incompatibilities, for example).

We do frequent production deploys (of individual services, there's no such
thing as deploying our entire application). To give an idea, it's a little
before 1pm here and across our team there have been 180 production deploys
already today.

------
matt_wulfeck
It's hard for me to understand why, in 2016, 200 MB files are still any kind
of real problem.

~~~
pjc50
From the article:

 _" Combining 100,000 tiny files into a single archive is slow. Uploading this
JAR to S3 at the end of the build is slow. Downloading this JAR at deploy time
is slow (and can saturate the network cards on our application servers if we
have a lot of concurrent deploys)."_

~~~
matt_wulfeck
I wasn't being critical of the OP's build process. I agree that uploading 200
MB to S3 is _slow_, but it shouldn't be.
