
We’ll Never Know Whether Monorepos Are Better - kalimatas
https://redfin.engineering/well-never-know-whether-monorepos-are-better-2c08ab9324c0
======
jasode
From the arguments I've seen, the top highlights of single vs multi repo:

Single repo enables company-wide code sharing and refactoring with the least
amount of friction. E.g. a programmer can modify _"foobar(x,y,z)"_ to
_"foobar2(z,y,x)"_ and update _all_ the corporate-wide client code that calls
it. This ensures all teams are _always_ using the same code instead of Team A
on helperutils_v5.2.3 and Team B on helperutils_v6.1.0 which are incompatible
with each other.

The biggest downside seems to be slow performance for syncing huge repos. This
was one of the motivations for Microsoft's GVFS for git. It enables _sparse_
downloads.

If there are good arguments for multi-repos _that are not related to
performance_ which override the benefits of company-wide consistency of code,
it would be good to discuss them. (I'm talking about companies like
MS/Google/Facebook and not consulting firms with a different repo for each
customer project.)

~~~
awinder
Off the top of my head:

1\. Security. It boggles the mind that companies with large source trees would
let their entire source tree get cloned by a single developer. Maybe this
isn’t really an issue but seems like you’re creating unnecessary surface area
when the iOS dev team can walk around with the machine learning group's code.

2\. Dev aesthetics / onboarding. There’s a complexity cost to sharing all of
that code. If I join a company and the first step is spending (weeks? months?)
getting comfortable understanding the universe of this gigantic source tree,
instead of being able to safely understand a small piece of it, that can lead
to frustration and it’s certainly not going to be productive.

I’ve never worked in a massive, well put together monorepo, so maybe these
issues are solved in other ways. But it seems like your point #1 is a trade-
off – being able to call code from anywhere in the same repo is powerful, but
it’s also capable of breeding a ton of complexity, so please mind the mere
mortals who will follow one day.

~~~
Walkman
Neither will help if you employ bad people or the company has a bad culture.
You should trust every developer.

~~~
xref
Even if you trust every developer implicitly that's not a reason to forgo the
Principle of Least Privilege

~~~
Klathmon
I can't imagine working somewhere that I'm not allowed to even see all the
code.

------
erikb
There is no _better_, there are only tradeoffs, and some people prefer one
tradeoff and others prefer another tradeoff. That's just how it is.

So before making any significant changes one should discuss what the desired
tradeoff is, and then measure the continuous change towards that tradeoff.

For instance monorepo is a goal for people who want interacting tools to be
built in their corresponding state without need to consult someone else. Can
you do that? Check out the repo (for 2-5 hours depending on your code size)
and then just hit "make" and "make test" and it runs through? If yes it's a
success, if no something went wrong in the process.

PS: There are actually build tools that are language independent, e.g. make,
CMake or Maven. You can just use these.

PPS: I'm personally really happy to see that monorepo in itself doesn't
resolve the dependency tracking for internal tools. I still believe in the
other way around. A good dependency tracking tool can totally replace
monorepos.

~~~
dfabulich
(Author here.) I agree with your main point but wanted to respond to your PS.

I think it's important to distinguish build tools like make from declarative
versioned-dependency management tools, which multirepos require.

For example, you can build a multi-language monorepo with either make or
CMake, but you can't declare that your repo builds "foo" version 1.2.3 and
that it depends on "bar" version 2.3.4.

Maven does that, and it will automatically download the right version of bar
for you and make it available, but Maven doesn't work well for C# dlls. You
can use NuGet for dlls, but don't try it with JS modules. Use npm to manage JS
modules, but don't try it with Perl packages. You can use CPAN for Perl
packages, etc.
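
To make the distinction concrete, here is a minimal Python sketch of what a
declarative versioned-dependency tool adds on top of a build tool. The
in-memory REGISTRY and the resolve() helper are hypothetical, invented for
illustration; they are not the API of Maven, NuGet, npm or CPAN, and real tools
also handle version ranges and downloads.

```python
# Hypothetical in-memory "registry": (name, version) -> declared dependencies.
# A real tool would read these declarations from per-package manifest files.
REGISTRY = {
    ("foo", "1.2.3"): {"bar": "2.3.4"},
    ("bar", "2.3.4"): {},
}


def resolve(name, version, registry=REGISTRY, resolved=None):
    """Return {package: version} for a package and its transitive dependencies.

    Raises ValueError if two packages pin different versions of the same
    dependency, and KeyError if a declared version is not in the registry.
    """
    if resolved is None:
        resolved = {}
    pinned = resolved.get(name)
    if pinned is not None:
        if pinned != version:
            raise ValueError(f"conflict: {name} {pinned} vs {version}")
        return resolved  # already handled with the same version
    if (name, version) not in registry:
        raise KeyError(f"unknown package: {name} {version}")
    resolved[name] = version
    for dep_name, dep_version in registry[(name, version)].items():
        resolve(dep_name, dep_version, registry, resolved)
    return resolved


print(resolve("foo", "1.2.3"))
```

A plain make or CMake build has no equivalent of the REGISTRY lookup: it can
order build steps, but it cannot say which published version of "bar" to fetch.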

~~~
erikb
Agreed. Make+Monorepo possible. Make+Multirepo maybe not enough.

------
somehnreader
> Rolling our own dependency-management tooling took a small team of engineers
> (including me) more than a year of very hard work, and there were problems.

I am amazed what is done in large organisations sometimes. No one would ever
sign off on me doing something like this. How do you convince the number
crunchers that it's a good thing to build things like this if there is no
measurable outcome?

Not trying to be rude or devalue OP's work, but I haven't worked in an
organisation that would allow for this in a long while.

~~~
zaphar
There are two factors at play here:

1\. Larger organizations have bigger budgets they can allocate toward solving
problems unique to that organization and the overhead they have built up that
is unique to their particular way of working.

2\. Most of the organizations that work on this type of thing don't require
number cruncher approval. The number crunchers may track and report on these
types of efforts but they don't control the decision making. If you are lucky
enough to work at one of those be thankful.

------
bryanlarsen
Note that there is a third option: both.

You can split up a git monorepo with splitsh[1], and/or pull in & out with git
subrepo[2]

This gets you the best of both worlds and avoids most of the cons of both,
although you do get a new set of cons, like keeping them in sync and ensuring
tooling works in both places. It's not too bad, though. Make sure you bless
one or the other as the source of truth and make the other basically read-
only, and add a hook to automatically merge from one to the other.

1: [https://github.com/splitsh/lite](https://github.com/splitsh/lite)
2: [https://github.com/ingydotnet/git-subrepo](https://github.com/ingydotnet/git-subrepo)

~~~
catern
There is also git-meta:
[https://github.com/twosigma/git-meta](https://github.com/twosigma/git-meta)

~~~
bryanlarsen
Cool. It looks like git-meta stores metadata to make git submodule easier, and
git-subrepo stores metadata to make git subtree easier.

------
suvelx
I've recently inherited a nearly 100% Python microservice-monolith monorepo
project, whose tooling was written in-house by the original developers, and
then subsequently re-implemented using the tooling's plugin system within the
project. And even then, the entire project is coupled together with some bash
scripts to actually run it.

Fuck me it's a nightmare wrapped in a shit show.

Due to the way the build tool generates the app, not a single 'standard'
method can be used. The usual developer niceties are unavailable (think
interactive debugging, code-completion, hot-code-reloading).

I can see why it was done... sorta. Our FE team ripped out their component and
standardised development back to their platform norms. But now we have
integration issues between front and back, and the BE devs are now subject to
a 60 second front-end build each time we change some code and reload the
application.

------
Androider
There are technical pros and cons to both, but monorepos in my experience can
"fix" or at least sidestep many human/big organization issues:

\- NIH, teams re-creating instead of re-using other teams' work either due to
not knowing it exists or because they feel they cannot control it.

\- The gatekeeper: "just ask Bob to give you access", repeat for the next two
to four weeks. Once you get really big, source code access is even used as a
political tool between teams.

\- The internal customers: Team A works on a common framework, teams B, C, D
all use it. Good luck getting any one of them to take a new release more than
once in a blue moon, and everyone will be on a different version too so you
have to support them all. Fun times.

------
vmateixeira
> _There’s no way to measure productivity in software_

This is clearly not true

Edit: It seems obvious to me that OP has a problem with his team, rather than
with being able to measure the pros/cons of his chosen technological path

~~~
AndrewDucker
You've made an assertion, but not provided any evidence for it. That doesn't
seem, to me, to add much to the discussion.

Would you like to back your statement up?

~~~
renesd
They don't need to back it up. It is up to the person making the original
claim to back it up.

There's half a century of research and practice into this topic. Anyone in the
industry _should_ know about this. Anyone who doesn't can find it on a search
engine within 30 seconds.

~~~
Retric
No, it's a well known problem. The old lines of code joke still applies. You
can introduce metrics, but those will just be gamed because software is a
multidimensional optimization problem with complex trade offs. So, you can't
translate that to a single measurement and call it productivity.

~~~
renesd
Yes, it's a well known problem.

------
Jyaif
Everybody will agree that big codebases need some sort of componentization so
that understanding some part of the code doesn't require understanding the
entire codebase.

I like multiple repos because it encourages that: having pieces of code that
do not depend on each other, and having better-defined APIs.

The speed gain of the tooling is just a side benefit.

------
donatj
While there are certainly ups and downs to both, I'd much rather have to deal
with the downs of multirepo myself.

------
TheAceOfHearts
You can have your cake and eat it by mixing both. I've been reading through
the Fuchsia [0] source, and they essentially have a monorepo setup that's made
up of multiple repos.

They wrote a tool called jiri [1], and track projects in a manifest [2] repo.

[0]
[https://fuchsia.googlesource.com/?format=HTML](https://fuchsia.googlesource.com/?format=HTML)

[1]
[https://fuchsia.googlesource.com/jiri/](https://fuchsia.googlesource.com/jiri/)

[2]
[https://fuchsia.googlesource.com/manifest/](https://fuchsia.googlesource.com/manifest/)

------
arjie
Cross-language artefact and dependency management. Tell us more about the
tool.

~~~
nerdponx
I've always wondered why this kind of tool wasn't widely available. Is it
really that hard to get right? What's so complicated about it?

------
klodolph
My conclusion in the monorepo/multirepo debate is that for large code bases,
current tooling isn't up to the task, and switching between monorepo and
multirepo changes one set of tooling problems for another.

With custom tooling, you can overcome almost any of the differences between
the two approaches, enough so that the two approaches can start to look rather
similar. There are ways to do atomic refactors across large numbers of repos,
with the right tooling, and there are ways to use semver for components inside
a monorepo.
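
As a rough illustration of what "semver for components" implies, the Python
sketch below checks whether an available version can stand in for a required
one. It assumes plain MAJOR.MINOR.PATCH strings and a same-major rule; the
parse() and is_compatible() helpers are hypothetical, and the sketch ignores
pre-release tags and the special treatment of 0.x versions.

```python
def parse(version):
    """Split a 'MAJOR.MINOR.PATCH' string into a tuple of three ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_compatible(required, available):
    """True if `available` has the same major version as `required`
    and is not older than it (tuples compare element by element)."""
    req, avail = parse(required), parse(available)
    return avail[0] == req[0] and avail >= req


# A newer minor release within the same major version is acceptable;
# a new major version is treated as a breaking change.
print(is_compatible("5.2.3", "5.4.0"))
print(is_compatible("5.2.3", "6.1.0"))
```

The same check works whether the component lives in its own repo or in a
subdirectory of a monorepo, which is part of why the two setups can converge.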

------
kozak
Interesting how the HN comments are also dividing between "it's obvious that
monorepos are better" and "it's obvious that multirepos are better".

------
falcolas
So, here is some more anecdata for the pile. We go with the multi-repo model
at work, and the number of repos we "use" is over 1100 right now, and growing
by about a hundred repos a month.

How did we get here? An (imo) extreme dedication to "microservice all the
things", a CI system which intentionally only allows one build artifact per
repo, and a release system that encourages the separation of parts of a
service's components to minimize red tape for changes.

------
andreareina
OT, what is it with Medium sites pushing onto my history state? Getting out
requires at least two, frequently more applications of <Back>.

------
GrumpyNl
It all depends on your needs and architecture.

------
jakub_g
Anecdata:

At my $DAY_JOB we went with a mostly-monorepo solution (1 main repo and 2
small auxiliary repos), with multiple dependent modules inside it -- after me
insisting on it, while the other dev wanted a bazillion small repos --
we're talking about just a really small dev team (3 devs initially, more
later) -- but we still agreed that 1) each subfolder of the monorepo should be
as independent as possible and only depend on the minimum subset of other
modules it really needs (so multirepo-ish behavior in a sense), while at the
same time 2) allowing us to change things in multiple modules and quickly check
if everything still works (we used symlinking between the modules' folders).

Result:

I was babysitting our build and tooling for _months_ and fixing more and more
exotic edge cases with incompatible transitive dependencies/multiple
branches/versioning/submodules building order etc. (In hindsight, we threw way
more complexity and requirements at the open-source tooling we used than it
was able to handle - but we didn't acknowledge that upfront, because at the
beginning it looked like it would work fine).

If we went with a multirepo instead, it would have been even worse: all the
pains that I described would still be there, plus we'd go crazy with non-
atomic refactorings and merges.

IMO starting a project as a multirepo is madness: things are messy and
change too often at the beginning and it's not worth the overhead. Maybe it
makes sense to start with a monorepo but structure things in a way to make
them splittable in the future, when things stabilize more. But even that is
_way_ more costly than expected.

------
jt2190
This article missed a great opportunity to enumerate the advantages and
disadvantages of either model.

~~~
qwer
It's because this is an article about engineering decision-making, not repo
management.

------
luord
Same with:

* Vim vs emacs

* Tabs vs spaces

* Static vs dynamic

* Monolith vs microservices

* Functional vs imperative

And a long etc. We're dogmatic folks.

I fall in the multirepo camp.

------
vesak
Who cares about that? Which is better, vi or emacs?

~~~
tatersolid
Notepad++

------
sheeshkebab
monorepo is very difficult to get right - and once you do, you are pretty much
stuck with the language/tool choices since the switch usually takes
months/years.

multirepo is a nightmare for changes spanning multiple repos, but the freedom
of using the best tool for the job is much easier to achieve.

or so is my thinking about these from experience.

~~~
icebraining
I think that's having a single codebase. You can have different projects using
different languages on a single repo.

------
rdsubhas
Again, this article, and engineers discussing Monorepo vs Multirepo in general,
seem to be discussing splitting code without splitting dependency
graphs, CI/CD and independent delivery. So they had "religious arguments"
(exactly as said in the article). Holistic Modularization and SOA (i.e. not
just splitting dev's job or qa/ops/architect/manager's job, but holistic) is
not an easy thing in general.

Lots of FUD on this topic, and although it's good that the author has brought
this up, it's just adding to even more FUD. I'd go as far as to say the title
is clickbait. The end topologies and tooling for monoliths are different. Not
necessarily one is better than other, they are _just different_. There is no
magic pill, there are right places/requirements and right
people/experience/culture to use each of them.

Mono-repo:

* Imagine it like: Building a new Ubuntu OS release for every Firefox update. Then work backwards

* Mono-repo needs a completely different toolchain like borg/cmake, to find out the whole dependency graph, rebuild only changes and downstreams, avoiding a 2-hour build, avoiding 10 GB release artifacts, scoped CI/commit builds (i.e. triggering build for only one changed folder instead of triggering a million tests), independent releases, etc

* Some more tooling for large repository and LFS management, buildchain optimization, completely different build farm strategies to run tests for one build across many agents, etc

* Must have a separate team with the expertise to maintain a mono-repo with 100 components, one that knows everything that needs to be done to make it happen

* Making sure people don't create a much worse dependency graph or shared messes, because it's now easier to peek directly into every other module. Have you worked in companies where developers still have trouble with maven multi-module projects? Now imagine 10x of that

* Making sure services don't get stuck on shared library dependencies. Should be able to use guice 4.0 for one service, 4.1 for another, jersey 1.x f/a, jersey 2.x f/a, etc etc. Otherwise it becomes an all-or-nothing change, and falls back to being a monolith where services can't evolve without affecting others

* Does not mean it's easy to break compatibility and do continuous delivery (no, there are database changes, old clients, staggered rollout, rollbacks, etc. contracts must always be honored, no escaping that, a service has to be compatible with its own previous version for any sane continuous delivery process)
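
The "rebuild only changes and downstreams" point in the list above can be
sketched in a few lines of Python. The DEPS graph and the component names are
made up for illustration, and affected() is a hypothetical helper, not the
interface of any particular build tool.

```python
from collections import deque

# Hypothetical dependency graph: component -> components it depends on.
DEPS = {
    "web": ["api-client", "ui-kit"],
    "api-client": ["core"],
    "ui-kit": [],
    "billing": ["core"],
    "core": [],
}


def affected(changed, deps=DEPS):
    """Return the changed components plus everything downstream of them."""
    # Invert the graph so each component maps to its direct dependants.
    reverse = {name: set() for name in deps}
    for name, requires in deps.items():
        for dep in requires:
            reverse.setdefault(dep, set()).add(name)

    # Breadth-first walk over dependants, starting from the changed set.
    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependant in reverse.get(current, ()):
            if dependant not in seen:
                seen.add(dependant)
                queue.append(dependant)
    return seen


print(sorted(affected(["ui-kit"])))
```

A scoped CI job would then build and test only the returned set instead of the
whole tree; a change to "core" in this toy graph still touches almost
everything, which is the shared-dependency risk the list describes.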

Multi-repo:

* Dependency graphs have to be split for continuous delivery

* Must not get into multi-repo if you don't have true Continuous Delivery practices (not continuous integration/deployment as tools, but Delivery as a mindset)

* Do not have "Release management" and "Release managers" with multi-repos; use trunk-based practices

* Don't split just code but split whole dep graph

* Proper artifact management and sharing, because shared components are enforced to be "diligently" managed, not just thrown into a folder

* Plan for multiple types of deliverables: Services (runtime), static artifacts, libraries/frameworks, contracts, data pipelines, etc

Not even scratching the surface of anything here. It's just an introduction to
the introduction. If the team believed they would get this resolved, they are
wrong, both are just the beginning of two journeys. With the other direction, we
would have got a different blog post today, that's all.

In the end, if one has >10 services, they WILL have challenges whether it's
mono-repo or multi-repo without having holistic modularization. They will need
componentization of the service, metrics isolation, build process isolation,
issues/project management isolation, team isolation, etc etc. The repo is the
last thing to worry about.

Edit: Formatting

------
TokenDiversity
Monorepo is harder to get right initially, and you have to think things
through. But once you do, you're set for it. I love that there is so little
code duplication.

