
Need for Speed: Accelerating Maven Snapshots - HiJon89
http://product.hubspot.com/blog/speeding-up-maven-snapshots
======
64BitChris
While I commend the author for taking the time to write this article and
sharing their experience with the community, I can't help but think that this
is a solution to a problem that was created by the very team that solved it.
Or maybe another way to think about it is that this accelerator is attacking a
symptom of a larger, more systemic problem.

The real problem I see is a system composed of over 3500 frequently changing
modules, all with the concept of repeatability (i.e., release versions) removed.
Snapshot-only issues aside, the core issue is the massive number of modules
involved.

I mean what is the ratio of modules to engineers? I don't know the size of
their engineering team but assuming it's around 100, that ends up with 35
modules per engineer!

Looking at it from another perspective, what's the ratio of modules to
features in HubSpot and their internal tools? My guess is HubSpot has fewer
than 3500 features...so now you have more modules than features?! That implies
that functionality for a single feature spans multiple modules and isn't
organized properly. Again, a fundamental architecture problem that should be
solved rather than patching Maven to work around it.

Even with a magic build tool to handle this at build time, the system is so
complex just with modules alone that it's fundamentally unmanageable in any
holistic way, which is probably why they stopped bothering with releases and
went to a snapshot only approach.

My unsolicited suggestion to them would be to focus on reducing the complexity
of the modules and their interdependencies along quantitative metrics like
modules:engineers or modules:features ratios. It varies by org of course, but
in general, the worst case should be no more than one module per engineer and
definitely no more than one module per feature.

It's great that they've open-sourced the tool, but if they don't solve the
underlying complexity problem of their system they're going to spend most of
their time solving problems that they've created. This time would be better
spent moving their product forward, which would benefit all stakeholders at
HubSpot.

~~~
HiJon89
[Copying my reply from reddit]

I can give you some background on the way we structure things in case it's
helpful. We use GitHub for our open-source projects, GitHub Enterprise for our
internal code, and our engineering team has about 300 engineers. Each project
or feature usually lives in its own repository, and is split into a number of
Maven modules. Using the maven-snapshot-accelerator
([https://github.com/HubSpot/maven-snapshot-accelerator](https://github.com/HubSpot/maven-snapshot-accelerator)) as an
example, it's roughly 1,000 lines of code split into 6 Maven modules
(representing a small percentage of what my team owns). There's the root pom
which just aggregates the other modules, a core module that contains the
POJOs, a REST API module that uses the POJOs, a client module which uses the
POJOs to hit the REST API, a module for the Maven plugin, and a module for the
Maven extension. If this were an internal project, it might also have
supporting Hadoop jobs, Kafka workers, Spark jobs, etc. depending on how the
system was designed. Each of these would usually live in a separate Maven
module. I'm not sure how much thought we gave to splitting into Maven modules
along these lines; it just kind of seemed natural to us. It also has a few
benefits compared to larger modules:

- The dependency trees are smaller, because you're pulling in a more focused
set of libraries rather than the kitchen sink

- Our build infrastructure only builds what has changed when you push a
commit, but it can only do this at Maven module granularity. So by splitting
things up this way, pushing a change to your Kafka workers won't rebuild your
Spark or Hadoop jobs.
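
To make the second point concrete, here is a minimal sketch of module-granularity rebuild selection. It is not our actual build infrastructure; the module names, the `DEPENDS_ON` map, and the `modules_to_rebuild` helper are all hypothetical, and it assumes a changed module forces a rebuild of everything that transitively depends on it.

```python
from collections import defaultdict, deque

# Hypothetical module graph: module -> modules it depends on.
DEPENDS_ON = {
    "core": [],
    "api": ["core"],
    "client": ["core"],
    "kafka-workers": ["client"],
    "spark-jobs": ["client"],
    "hadoop-jobs": ["client"],
}


def modules_to_rebuild(changed, depends_on):
    """Return the changed modules plus every module that transitively depends on them."""
    # Invert the graph so we can walk from a module to its dependents.
    dependents = defaultdict(set)
    for module, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(module)

    # Breadth-first walk over downstream modules.
    to_build = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for downstream in dependents[current]:
            if downstream not in to_build:
                to_build.add(downstream)
                queue.append(downstream)
    return to_build


# A change to a leaf module rebuilds only that module.
print(sorted(modules_to_rebuild(["kafka-workers"], DEPENDS_ON)))
# A change to a shared module rebuilds everything downstream of it.
print(sorted(modules_to_rebuild(["core"], DEPENDS_ON)))
```

In this toy graph, touching `kafka-workers` selects only that module, while touching `core` selects all six. With one large module there would be nothing to skip.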

Using one of our larger open-source projects as an example, Singularity
([https://github.com/HubSpot/Singularity](https://github.com/HubSpot/Singularity))
is split into 15 Maven modules (and represents maybe a third to a half of
what that team is responsible for).

Snapshots vs. versions is a whole different topic and probably deserving of
its own blog post. More generally, whether we should bend our workflow to the
tools or bend the tools to our workflow is a constant tension and something
we're always thinking about, so feedback is always appreciated.

