
Benefits of a Monorepo - ingve
https://pspdfkit.com/blog/2019/benefits-of-a-monorepo/
======
notacoward
These benefits do exist, but drawbacks still exist too. The "constantly
updating, constantly recompiling" problem can get pretty bad when there are
many developers. Developers either lose time recompiling more than they should
need to, or keep working in out-of-date trees and dealing with trickier
merges. Multiply by the number of developers in a large org, and that can be a
lot of lost productivity.

The thing I don't see discussed often enough, in all of the "monorepos are
good/bad" debates, is that they require a certain discipline. When one commit
can span multiple components, it's easy to add or ignore bad coupling between
those components. That contributes to the rebuild treadmill for everyone, and
is bad design in a bunch of other ways as well. IDEs make it worse, because
they obscure the boundaries between components, but everyone ends up using an
IDE because their predecessors made it very difficult to follow the flow of
control across components otherwise. So you end up with one big ball of mud.
With separate repos this is discouraged, because violating the rules of good
design causes immediate pain. With a monorepo the pain must come in the form
of stronger reviews, to discourage bad coupling even though it's easy.
Unfortunately, the very same companies that have embraced monorepos have also
abandoned separation of concerns as a design principle. :(

ETA: please read bunderbunder's comment
([https://news.ycombinator.com/item?id=19796960](https://news.ycombinator.com/item?id=19796960))
as well. It's an insightful take on some of the same issues.

~~~
ceronman
There is tooling available to help with that; it doesn't all have to come down to
discipline.

A monorepo-oriented build tool such as Bazel will make sure that you don't
compile more than needed, and it's extremely efficient. In fact, at my company,
with a few hundred projects, we tried it and it was 10 times faster than our
previous Maven-based workflow. These tools also help you with dependency
management, so you keep your code organized. The killer feature is that you
finally have true Continuous Integration, because the build tool will figure
out all the dependencies of the code that you change, rebuilding and testing
them to make sure that nothing is broken.

~~~
notacoward
> A monorepo oriented build tool such as Bazel will make sure that you don't
> compile more than needed.

Not at all true in my experience. I use Buck, which is very similar. It
_constantly_ recompiles stuff it doesn't need to by any standard. I've
seriously considered avoiding Mercurial bookmarks in favor of manual patch
management to see if I can avoid triggering the cases where it gets stupid.
Maybe it's a problem specific to Buck, but my conversations with peers at
other companies suggest not (and only a tiny percentage of developers at any
of these companies get a choice of which tool to use).

The other issue is that "more than needed" is subject to more than one
interpretation. There's "more than needed" as code was actually written, and
there's "more than needed" as code should have been written. Tooling can help
with the first but not the second. If developers are sloppy about how their
header files are structured, then a change to one that's widely used will make
it "necessary" to recompile a lot of code - including code that didn't
actually use the part that changed. Oddly enough, though, that "necessary"
recompilation wouldn't occur if those same header files were split up. (BTW
some languages make good design more difficult by requiring private
implementation details in public header files, but that's a separate
conversation.) Tooling can optimize what it can see, but what it sees might
already be a tangle of dependencies added only because a monorepo made it
easy.

~~~
yegle
You are describing a case where modifying someone else's unrelated code causes
a rebuild of your code, right? That sounds like your targets unintentionally
depend on the other code.

To find out how/where the dependency was introduced, you can use `bazel query
'somepath(your_target, some_other_file)'`. Not sure if Buck has an equivalent,
but Bazel should work with a Buck repo.
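
For intuition, `somepath` is essentially a path search over the build graph. A
minimal sketch in Python (the graph and target names here are made up, not
Bazel's actual internals):

```python
from collections import deque

def somepath(deps, start, goal):
    """Return one dependency chain from start to goal, or None.

    deps maps each build target to the targets it depends on
    (a simplified stand-in for Bazel's build graph).
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        for nxt in deps.get(path[-1], []):
            if nxt == goal:
                return path + [nxt]
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Hypothetical build graph: //app reaches //utils only via //core.
deps = {
    "//app": ["//core"],
    "//core": ["//utils"],
    "//utils": [],
}
print(somepath(deps, "//app", "//utils"))  # ['//app', '//core', '//utils']
```

Running this against your real build graph is what tells you which intermediate
target dragged in the unwanted dependency.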

~~~
notacoward
> This looks like your targets unintentionally depend on the other code.

Sometimes it's unintentional, for example an include left in after it's no
longer needed. Other times it's intentional but lazy. For example, I often see
commits that pull in one dependency just to use one tiny little utility
function, but that dependency pulls in ten others, which each pull in ten
others, and before you know it any change in a hundred files forces yours to
be recompiled even when that tiny utility function wasn't touched.

A responsible developer would refactor the header file to keep the number of
spurious rebuilds down, but the world is full of irresponsible developers.
With multiple repos, the public immutable part of an API and the private fluid
part tend to find their way into separate files pretty quickly. With a
monorepo they don't, which is kind of ironic because one of the purported
benefits of a monorepo is that it's easy to reach into "someone else's code"
and refactor it. Unfortunately, people just don't seem to use that ability for
good often enough unless review culture forces them to.

------
mark_l_watson
When I was a contractor at Google I was surprised by the mono repo, but really
liked it. Scaling issues aside, it makes sense having all dependencies in one
repo.

I thought of having a monorepo at home because many of my projects are
interrelated, but instead I spent some time on a library setup. For Common Lisp, I
used a pattern in one of Zach’s blogs to have one central repo for all of my
personal Common Lisp code that is configured as a root for Quicklisp. Any of
my code can be quick loaded no matter where I am working. Really simple but it
makes it possible to write small utilities/libraries and reuse them for
whatever I am doing.

I sort of do the same for Python, writing a setup.py for each small project or
library to install globally, including appropriate executable scripts. This
also allows me to combine small things easily. I am not yet set up to do this
for my Haskell projects, but it is on my todo list.
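
For reference, the per-project setup.py can be tiny. A minimal sketch, with
made-up project and script names:

```python
# setup.py - minimal sketch; "my_small_utils" and "mytool" are made-up names
from setuptools import setup, find_packages

setup(
    name="my-small-utils",
    version="0.1.0",
    packages=find_packages(),
    entry_points={
        "console_scripts": [
            # installs a `mytool` command that calls my_small_utils.cli:main
            "mytool=my_small_utils.cli:main",
        ],
    },
)
```

Then `pip install -e .` in each project directory makes the library and its
scripts available everywhere, while edits to the source take effect
immediately.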

~~~
petters
I merged all my personal C++ projects into one monorepo and I really liked it.
Especially useful with C++ since setting up the build system is so cumbersome.

------
royjacobs
Hmm, if you're working on a single product (like the author is), then _of
course_ it makes sense to stick them in the same repository. But isn't this
just common sense?

I mean, if you have a single product split over many repositories then you
will run into the issues mentioned by the author: A change will invariably
involve multiple repositories. Again, to me this seems to be just an
indication that you are, in fact, working on a single product.

~~~
TickleSteve
This is obvious to me (a monorepo advocate) but most people these days tend to
organise their repos based on module/implementation lines rather than on
product boundaries.

My motto is: "What you release together, you should version-control together".

~~~
ctroein89
I've learnt that for "What you release together, you should version-control
together", the phrase "released together" needs to be defined as "what gets
globally updated at once". If you provide anything for customers to implement,
you cannot update those globally at once (because you need the customer to
update), and should think about using a multi-repo set-up.

My experience is that testing the interactions of historic versions of
different components against each other is difficult in a mono-repo. With a
multi-repo set-up, one can check out whatever historic version one needs, which
makes it trivially easy to set up tests of historic versions of components
against each other.

------
bunderbunder
For my part, I've worked with two long-lived monorepos so far. Both were quite
small, all told, but, nonetheless, in both cases I saw the monorepo largely
serving as an incubator for bit rot.

All those direct interdependencies, combined with the impossibility of just
using package versioning to allow a library's consumers to upgrade at their
own pace, meant that developers had a strong disincentive to do any cleanup
that might result in a breaking change. They'd tend to prefer patches instead.
So the long-term trend was for everything to eventually become a tangled mass
of patchwork.

Google has entire teams devoted to combatting this problem. I'm not sure most
companies can (or want to) afford that.

This is one of those spots where I think it's wise to be cautious about
blog-driven development. By analogy, I've seen experienced woodworkers have fun
mocking the DIY furniture projects that come up in places like Pinterest.
One common criticism was that, since wood expands and contracts with changing
humidity, but only in certain directions, if you put it together in certain
ways, you end up with a product that will literally tear itself apart over
time. Of course, this isn't apparent from the Instructable article, because
it's a slow process, and all those pictures were taken when the piece was
still brand spanking new. It's also not something you're likely to see written
in the comments section or anything like that - the people I've known who know
this sort of thing aren't generally the type to write Instructables articles
or maintain their own DIY blogs.

I suspect something like this also happens with any decent sized codebase:
Code is a living thing that expands and contracts with changing business
requirements. Over time, it will twist itself out of shape. Weak coupling and
fragmented codebases are two different (though related) ways of trying to
limit the damage from this effect. They both bring their own challenges,
though, and aren't perfect. Monorepos don't have the same drawbacks, but that
doesn't mean they don't have any. One, I think, is that you're left directly
exposed to the full strength of the forces that cause bit rot.

If you're disciplined enough to recognize this and treat it as another
incentive to attack bit rot head-on, then you'll probably end up in a better
place for it. If you think monorepos are a panacea, though, you're setting
yourself up for the kinds of failure that people generally don't like to blog
about.

~~~
wcdolphin
Can you help me understand what "bit rot" means as you're using it, and why it
is different in a monorepo vs. an equally large multi-repo system?

~~~
bunderbunder
Just cherry-picking one rather extreme example: There was a method in a core
library that had an argument that was not used. Few people knew this, and it
was one that was called at least once by most services.

When I discovered this, I asked the module's maintainer about it. He explained
that it used to be used, but requirements changed so that it wasn't needed
anymore. He decided to leave it anyway, though, because the alternative would
have been an immense effort to track down all the call sites, remove the
argument, then trace all the resulting dead code, and remove that as well. It
would have eaten up a week, I'm sure. Everyone had bigger fish to fry, and
always would have bigger fish to fry.

As I understand it, Google gets around that problem by having a group of
people where frying fish like that is their primary responsibility.

~~~
sa46
I don't understand how avoiding bitrot is better in a multi-repo setup. With a
multi-repo, if you're changing a core-library you can make the change quickly
but you can't upgrade clients until they upgrade on their own. You still have
bitrot. The difference is that the bitrot isn't visible because it's
compartmentalized into separate repos.

~~~
andolanra
The reason a monorepo facilitates this kind of bit rot is one of the advantages
of the monorepo setup: it also facilitates easily making connections between
different parts of the codebase. In a multi-repo setup, you need to be
intentional about when and how you pull in another repo, because the bar to
doing so is high. In a monorepo, every piece of every part of the code is
available to you and no extra action is required to "pull in" that other repo,
which means in practice that, without tooling or discipline, you end up with a
codebase with lots of tight interconnections.

So the reason it happens in a monorepo more is that those connections are
easier to make and therefore a lot more common, not because those connections
are impossible in a multi-repo setup.

~~~
sa46
I see - the argument is that a monorepo facilitates tight coupling because it's
easy to add dependencies. A multi-repo setup doesn't have the same problem
because it's more difficult to have cross-module dependencies.

There are a number of ways to prevent tight coupling:

1. Language-level visibility modifiers.

2. Build-system visibility. Bazel offers fine-grained visibility; I'm not sure
about others.

3. Using RPCs as the primary interconnect between services.
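
For option 2, this is roughly what Bazel's visibility control looks like in a
BUILD file (the target and package names here are made up):

```
cc_library(
    name = "strings",
    srcs = ["strings.cc"],
    hdrs = ["strings.h"],
    # Only these two packages may depend on :strings;
    # any other dependency edge is a build error.
    visibility = [
        "//services/frontend:__pkg__",
        "//services/backend:__pkg__",
    ],
)
```

Adding a lazy dependency then fails loudly at build time instead of silently
growing the tangle.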

My view is that the disadvantages of tight coupling in a monorepo are a much
better problem to have than the disadvantages of having logic spread across
different repos.

------
is0tope
I've been an avid user of docker-compose for web app type projects. In this
case, I've found putting everything in one repo perfectly manageable. Usually
this means the backend, frontend, nginx and any other random microservices are
just in separate folders in the base repo along with a compose yaml file. I
gather that if you are using more complex CI/CD tools this becomes more
complicated...

------
rkangel
Side note: blog posts and news articles that don't display a date are very
irritating. There have been a lot of discussions about monorepos, and I want to
know whether this is something new adding to the debate or something old. If
it's a news article, I want to know how up to date it is. All I get from this
is the slug saying '2019'.

------
Ensorceled
I decided to read this expecting to see an explanation as to why my DRF
backend should be in the same repository as my React Native frontend, and
instead found a discussion on why my React Native code shouldn't be broken up
into iOS and Android.

Not sure this is what people mean by a monorepo ...

~~~
shibel
I tried deploying my DRF backend[1] + Nuxt.js[2] demo app on Heroku[3] as one
repo, but it was a NIGHTMARE. Yes, it sucks to have to open both projects in my
IDE during development, but it's nowhere near the difficulty I had with the
monorepo. Plus, I can find things much faster _within_ each project when they
are split.

[1]: backend → [https://github.com/SHxKM/django-vue-
ssr](https://github.com/SHxKM/django-vue-ssr)

[2]: frontend → [https://github.com/SHxKM/django-nuxt-ssr-
front](https://github.com/SHxKM/django-nuxt-ssr-front)

[3]: live-app → [https://django-nuxt-ssr.herokuapp.com/](https://django-nuxt-
ssr.herokuapp.com/) (on a free dyno, could crash on first load)

~~~
nicoburns
I've settled on a many-repos-in-same-parent-folder pattern, which means that I
can open them all in the same editor window if I want to.

I've also forgone proper IDE "projects" in favour of opening the directory in
my editor on an ad-hoc basis from the command line (e.g. `cd frontend && subl
.`). It's not perfect, but it avoids the issue of having too many windows, or
having to fumble through IDE project-switching dialogs.

~~~
shibel
Yeah, the "issue" for me is spinning up the backend-server from the frontend
project, or vice-versa, but I should probably stop being lazy and write a
simple script to spin up both at the same time.

~~~
is0tope
I highly recommend trying out docker-compose if you are doing this kind of
thing.

------
mikewhy
The CI story has been my biggest gripe with monorepos. I'm surprised that no
separate tool has emerged, like the "custom Jenkins scripts" mentioned in the
article.

Maybe I'm missing something, but couldn't you get pretty far with something
like:

    # This tells the tool where to look for changes
    packages_folder: ./packages

    # List dependencies. In this instance, if any files in `shared_api`
    # change, frontend and backend need to build.
    dependencies:
      - frontend:
        - shared_api
      - backend:
        - shared_api

    # Define the build order
    build_order:
      - shared_api
      - backend
      - frontend

    # Find these scripts in each package subfolder and try to run them
    scripts:
      - build
      - test
      - deploy


The tool itself would see which files have changed between two git revisions,
figure out what packages need to be built, and run the scripts. But I'm
probably missing a million and one things.
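
As a sketch of what the core of such a tool might look like, assuming the
changed-file list comes from `git diff --name-only <rev1> <rev2>` and the
package layout mirrors that hypothetical config:

```python
def packages_to_build(changed_files, deps):
    """Return every package that must be rebuilt.

    deps maps each package to the packages it depends on,
    mirroring the `dependencies:` section of the config.
    """
    # A path like "packages/shared_api/client.py" belongs to "shared_api".
    dirty = {p.split("/")[1] for p in changed_files
             if p.startswith("packages/") and len(p.split("/")) > 2}

    # A package is dirty if it changed, or if anything it depends on is dirty.
    grew = True
    while grew:
        grew = False
        for pkg, pkg_deps in deps.items():
            if pkg not in dirty and dirty.intersection(pkg_deps):
                dirty.add(pkg)
                grew = True
    return dirty

deps = {"frontend": ["shared_api"], "backend": ["shared_api"], "shared_api": []}
print(sorted(packages_to_build(["packages/shared_api/types.ts"], deps)))
# ['backend', 'frontend', 'shared_api']
```

Ordering the result by `build_order` and invoking each package's build/test
scripts would then cover the rest of the workflow described above.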

------
robbrit
One advantage that isn't mentioned in the article: code examples. As a
monorepo grows, it's more likely that someone else using the repo somewhere
has attempted to do the same thing that you want to do, and you can use tools
like searchcode to easily find usages and see how they did it. Often this
works better than documentation, which for internal projects tends to be non-
existent or out-of-date.

One disadvantage: a lot of IDEs don't cope well with monorepos as they try to
index the entire thing.

------
tracker1
One of the projects I'm on at work just switched several repos into a monorepo.
The issue was that there were often related changes to the Database, UI,
Configuration and API projects... For a given feature, 2-3 of those four almost
always had to be released together, and even coordinating releases to dev/qa
servers was a bit of a pain. Merging them removed most of this.

There are several smaller/micro services that are still separate, however.
They're generally less affected by the coordination of db/api/ui changes.

------
bostonpete
I read this, and I still don't know what "many benefits" they're referring to.
I basically read one in the article -- a monorepo avoids the pain associated
with making changes to multiple repos at once, which was happening frequently.
Does the author make reference to some other benefits that I just completely
missed...?

------
jacques_chester
> _This makes CI test runs slightly longer, as they first need to do a fresh
> checkout._

`--depth=1` ameliorates this somewhat (though it plays poorly with tag-heavy workflows).

------
bluGill
Summary for those who can't be bothered to read: we had two repos and
copy/pasted code that needed to be in both instead of adding more repos. That
copy/pasting was painful, and we solved it by going to one repo.

They could have solved the pain by adding more repos as well. They never
considered this, as far as I can tell.

~~~
jorams
That is a very uncharitable summary. The article discusses codebases being
developed together or growing closer together over time, causing feature
changes in one to need changes in another. It doesn't mention copy/pasting
anything. They solved it by moving everything into a single repo.

Adding more repos would once again split up pull requests for features that
touch multiple codebases, which is the reason they merged two codebases into
one repo in the first place.

~~~
bluGill
I will admit to being uncharitable.

I am assuming they did copy/paste coding to get features into both. It is
possible that they actually did write everything twice, but that is generally
such a stupid idea I can't believe it (except for UI - it makes perfect sense
to write the UI for Android and iOS completely separately - the logic behind
the UI should still be common, but the UI elements/APIs are different enough
that sharing probably isn't worth it).

Adding more repos should not split up pull requests. If it does, that is a
sign of bad architecture.

------
satya71
It sounds like this person would have benefited from using Subversion or
Perforce. Subversion 1.12 has alpha support for a limited local branch (called
Checkpoints).

------
kevinsimper
I really wish more tools existed; it seems like there is a gap for some
tooling.

It could be a simple one that checks the git log to see which files have
changed since last time.

Or it could check the hash of each folder and compare it to the previous
commit.

If you use Docker, you can leverage the fact that Docker takes a hash of the
build context and, if the image already exists on the machine, uses the cache
when nothing has changed for those files. That is also quite effective.
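
The folder-hash idea fits in a few lines of Python (a simplified version of the
content hashing Docker does over its build context):

```python
import hashlib
import os

def tree_hash(root):
    """Hash a directory's contents deterministically: if any file's
    relative path or bytes change, the hash changes."""
    h = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make traversal order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            h.update(os.path.relpath(path, root).encode())
            with open(path, "rb") as f:
                h.update(f.read())
    return h.hexdigest()
```

A CI job could store each package folder's hash per commit and skip rebuilding
whenever it matches the previous run.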

------
baybal2
It could as well have been titled "The Many Benefits of Using Repo Modules".

To me it feels like a misguided attempt to work around the need for a package
manager.

Why not simply build your own SRPM or DEB package repo for code? It worked like
magic for me at four of my past workplaces.

This also works towards building your culture of stable releases instead of
non-stop development and hotfixes.

------
solarengineer
Approaches such as fan-in, supported by GoCD, are of great help when working
with monorepos. See
[https://docs.gocd.org/current/advanced_usage/fan_in.html](https://docs.gocd.org/current/advanced_usage/fan_in.html)

------
amelius
What tools are people using to manage multiple repos? A search reveals many,
but what is the latest and greatest?

~~~
pjmlp
NuGET, Maven, whatever is the package manager for the language in question.

The outcome of multiple repos should be modular libraries.

~~~
amelius
Sounds good in theory, but I'm wondering: doesn't your edit-test cycle now
become edit-install-test?

~~~
oblio
Not necessarily. With Java IDEs, if you have the sources you can usually
modify those and the IDE knows to ignore the package manager dependencies.

~~~
oblio
To the downvoter: [https://stackoverflow.com/questions/10688344/in-
eclipse-m2e-...](https://stackoverflow.com/questions/10688344/in-
eclipse-m2e-how-to-reference-workspace-project)

