
Monorepos: Please don’t - louis-paul
https://medium.com/@mattklein123/monorepos-please-dont-e9a279be011b
======
curtis
My advice is that if components need to release together, then they ought to
be in the same repo. I'd probably go further and say that if you just think
components might need to release together then they should go in the same
repo, because you can in fact pretty easily manage projects with different
release schedules from the same repo if you really need to.

On the other hand if you've got a whole bunch of components in different repos
which need to release together it suddenly becomes a real pain.

If you've got components that will never need to release together, then of
course you can stick them in different repositories. But if you do this and
you want to share common code between the repositories then you will need to
manage that code with some sort of robust versioning system, and robust
versioning systems are hard. Only do something like that when the value is
high enough to justify the overhead. If you're in a startup, chances are very
good that the value is _not_ high enough.

As a final observation, you can split big repositories into smaller ones quite
easily (in Git anyway) but sticking small repositories together into a bigger
one is a lot harder. So start out with a monorepo and only split smaller
repositories out when it's clear that it really makes sense.

~~~
forty
My rule of thumb is: if you need to open PRs in several repositories to ship one
feature, you should probably merge the repositories. At work, we have code
spread among a bunch of repositories, and having to link to the 2/3 related
PRs in other repos is a major PITA, and even more so for the reviewers.

~~~
majewsky
Just because things change in tandem, that does not mean that they're all the
same thing. When I add a new function to my backend service, all frontends
that consume its API also need to be adjusted. But that doesn't mean that the
backend service, its command-line clients and its web GUI client should live
in the same repo.

~~~
sangnoir
It's probably a matter of taste - but I think they should be in the same repo.
I like tying test failures/regressions to a specific commit for documentation
and admin purposes. Having a test fail or regression due to an 'unrelated'
commit in another repo sounds like a nightmare waiting to happen when you try
investigating.

I think the difference of opinion is between developers who work on self-hosted
"evergreen" products where the latest version is deployed, and others who work
with multiple release branches with fixes/features constantly being cherry-
picked.

------
mrgriffin
My problem with polyrepos is that often organizations end up splitting things
too finely, and now I'm unable to make a single commit to introduce a feature
because my changes have to live across several repositories. Which makes code
review more annoying because you have to tab back and forth to see all the
context. It's doubly frustrating when I'm (or my team is) the only one
working on those repositories, because now it doesn't feel like we gained any
advantages. I know the author addresses this, but I can't imagine projects are
typically at the scale they're describing. Certainly it's not my experience.

Also I definitely miss the ability to make changes to fundamental (internal)
libraries used by every project. It's too much hassle to track down all the
uses of a particular function, so I end up putting that change elsewhere,
which means someone else will do it a little different in their corner of the
world, which utterly confuses the first person who's unlucky enough to work in
both code bases (at the same time, or after moving teams).

~~~
inertiatic
My current team managed to break a single "component" out into a separate
repository. Then that repository broke into two, then those broke into other
repositories, until we eventually have around 10 different
repositories that we work on every day.

An average change touches 4 of them, and touching one triggers releases of 2
or 3 others on average. Even building these locally is super tedious, because
we have no automation in place (and no formal plans for any) for
chain-building them locally.

This is a nightmare scenario for myself. A simple change can require 4 pull
requests and reviews, half a day to test and a couple hours to release.

Yet my team keeps identifying small pieces that can be conceptually separated
from the rest of the functionality, even if they are heavily coupled, and
makes new repos for these!

~~~
drugme
_even if they are heavily coupled,_

So don't use polyrepos for heavily coupled projects, then. Or even better...

... try to avoid heavy coupling in the first place.

~~~
andrewprock
Unfortunately, these debates tend to be of the bikeshed variety.

Q: Why are we debating the merits of mono-repos over poly-repos?

A: Because managing dependencies is really hard and needs expertise.

------
yowlingcat
I think this article is complete horseshit. A monorepo will serve you 99% of
the time until you hit a certain level of scale when you get to worry about
whether a monorepo or a polyrepo is actually material. Most cases are never
going to get there. Before that point, a polyrepo is purely a distraction and
makes synchronous deployment really painful. We had to migrate a polyrepo to a
monorepo, and it was not fun, because it was a migration that should never have
needed to happen in the first place. Articles like this are fundamentally
irresponsible.

~~~
thanatos_dem
I work on CI/CD systems, and that’s one thing that definitely gets harder in a
monorepo.

So you made a commit. What artifacts change as a result? What do you need to
rebuild, retest, and redeploy? It doesn’t take a large amount of scale to make
rebuilding and retesting everything impossible. In a polyrepo world, the
repository is generally the unit of building and deployment. In a monorepo it
gets messier.

For instance, one perceived benefit of a monorepo is it removes the need for
explicit versioning between libraries and the code that uses them, since
they’re all versioned together.

But now, if someone changes the library, you need to have a way to find all of
its usages and retest those to make sure the change didn't break them.
So there's a dependency tree of components somewhere that needs to be
established, but now it's not explicit, and no one is given the option to pin
to a particular version if they can't/won't update. This is the world of
Google, and it influenced the (lack of) dependency management in Go.
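For what it's worth, the "find all usages and retest" step is just a walk of the reverse-dependency graph. A minimal sketch, with hypothetical component names and a hand-written dependency map standing in for whatever the build system actually knows:

```python
from collections import deque

def affected_targets(deps, changed):
    """deps maps each target to the list of targets it depends on.
    Returns every target that must be rebuilt/retested when `changed` changes."""
    # Invert the graph: for each dependency, record who depends on it.
    rdeps = {}
    for target, its_deps in deps.items():
        for d in its_deps:
            rdeps.setdefault(d, set()).add(target)
    # Walk the reverse edges outward from the changed target (BFS).
    seen, queue = {changed}, deque([changed])
    while queue:
        cur = queue.popleft()
        for dependent in rdeps.get(cur, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen

# Hypothetical component graph:
deps = {
    "libtime": [],
    "libqueue": ["libtime"],
    "svc-api": ["libqueue"],
    "svc-web": ["svc-api"],
    "svc-batch": ["libtime"],
}
print(sorted(affected_targets(deps, "libtime")))
```

The hard part in a monorepo isn't this walk; it's keeping the `deps` map accurate when it's implicit rather than declared.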

You could very well publish everything independently, using semver, and put
build descriptors inside each project subdirectory, but then, congratulations,
you just invented the polyrepo, or an approximation thereof.

~~~
manigandham
Rebuild and deploy everything, what's the actual problem? Like the OP said,
that's a scale issue and most projects don't have it.

Also building/testing is far more effective at finding dependencies than just
going by repo structure. There are numerous package managers available to
solve versioning if you need separate components.

~~~
yowlingcat
100% agree with your entire comment. This is what we do with our monorepo now
-- it turns out that rebuilding and deploying everything is actually just fine.
If your application services are stateless and decoupled from your state
stores, it's completely harmless. If you need to do something fancy, congrats!
You're at scale -- enjoy it but remember that it's something rare.

------
sfrench
My last 2 jobs have been working on developer productivity for 100+ developer
organizations. One is a monorepo, one is not. Neither really seems to result
in less work, or a better experience. But I've found that your choice just
dictates what type of problems you have to solve.

Monorepos are going to be mostly challenges around scaling the org in a single
repo.

Polyrepos are going to be mostly challenges with coordination.

But the absolute worst thing to do is not commit to a course of action and
have to solve _both_ sets of challenges (eg: having one pretty big repo with
80% of your code, and then the other 20% in a series of smaller repos)

~~~
01100011
Jesus, this. Look, you're going to run into issues either way, because you're
trying to solve a difficult problem.

It's like thinking OOP or functional programming is going to solve all your
issues... I mean, in some limited cases they could, but realistically you're
just smooshing the difficulties around and hopefully moving them to somewhere
where you are more able to deal with them.

FWIW, I've worked in a many-repo org and it sucked worse than huge companies
with monorepos and good tooling, but I'm not going to make some blanket
statement because it depends on the specifics of your code/release
process/developer familiarity etc.

~~~
brodo
This. Every decision is a trade-off. There is no silver bullet. Context
matters.

------
rossjudson
Hilariously misguided.

Pretty funny to read that the things I do every day are impossible.

Monorepo and tight coupling are orthogonal issues. Limits on coupling come
from the build system, not from the source repository.

Yes, you should assume there is a sophisticated "VFS". What is this "checkout"
you speak of? I have no time for that. I am too busy grepping the entire code
base, which is apparently not possible.

If the "the realities of build/deploy management at scale are largely
identical whether using a monorepo or polyrepo", then why on earth would
google invest enormous effort constructing an entire ecosystem around a
monorepo? Choices: 1) Google is dumb. 2) Mono and poly are not identical.

~~~
ashelmire
“why on earth would google invest enormous effort constructing an entire
ecosystem around a monorepo?”

Didn’t Google have a monorepo before Git was created? And wasn’t it created by
academics? Legacy and momentum have a strong influence on the future. Hasn’t
Google also built a lot of tools for the monorepo, and doesn’t it dedicate
employees to it? That’s exactly the issue this article is about.

From an external perspective, the speed and scale of product rollouts from the
bigger tech companies is very slow. I don’t know if the tooling has much to do
with it, but I suspect it might. I’ve heard some horror stories (some from
here) about how it takes months to get small changes into production.

~~~
titanomachy
Does Google require more engineers to support their build system than they
would with a polyrepo? That question is not trivial to answer, IMO.

~~~
ashelmire
Any is more than 0 though. In my experience (probably shared by many devs),
polyrepos don’t require a team, or even a single person, dedicated to version
control. It’s a minor part of the software management (usually: “mind if I
create a new repo for this?” “Yes/no”).

It does affect dependency management but no more than any external dependency.

~~~
thedufer
A polyrepo setup at Google's scale would pretty obviously require some dev
work. For example, their CI/build story would be way more complex.

~~~
swish_bob
While that may be true, I'm not convinced it is a given. Any complicated
enough monorepo requires complex CI/build tools, and Bazel/Blaze exist for a
reason ...

~~~
joshuamorton
At Google scale, you'd either need tooling for automated version bumps, or
some other infra to manage versioning.

You'd need cross repo bisection.

You'd need a way to run all tests in all repos reflecting a new change.

There are tens or hundreds more of these I could list.

~~~
ashelmire
“You'd need a way to run all tests in all repos reflecting a new change.”

You really shouldn’t have to run every test on every product. Or really any
other repos. Use semantic versioning, pin your dependencies, don’t make
breaking changes on patch or minor versions.
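The "pin your dependencies" half of that advice usually means caret-style ranges: accept any release with the same major version and an equal-or-newer minor/patch. A few-line sketch of that check (ignoring semver's special 0.x rules and prerelease tags for brevity):

```python
def satisfies_caret(pin, candidate):
    """True if `candidate` is compatible with a caret pin like "^1.4.2":
    same major version, and an equal-or-newer minor/patch version."""
    want = tuple(int(x) for x in pin.lstrip("^").split("."))
    have = tuple(int(x) for x in candidate.split("."))
    return have[0] == want[0] and have >= want

assert satisfies_caret("^1.4.2", "1.5.0")      # minor bump: allowed
assert not satisfies_caret("^1.4.2", "2.0.0")  # major bump: breaking
```

The scheme only works as well as the "don't make breaking changes on patch or minor versions" discipline it assumes.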

~~~
joshuamorton
Pinning your dependencies is an antipattern (or at least in the eyes of many
people who support monorepos it is).

It results in one of three things:

1. People never update their dependencies. This is bad (consider a security
issue in a dependency).

2. Client teams are forced to take on the work of updating due to breaking
changes in their dependencies. If they don't, we're back at 1.

3. Library teams are forced to backport security updates to N versions that
are used across the company.

But really, the question to ask is

>don’t make breaking changes on patch or minor versions

How can you be sure you aren't breaking anyone without running their code? You
can be sure you aren't violating your defined APIs, but unless you're perfect,
your API isn't, and there are undocumented invariants that you may change.
Those break your users. Monorepo says that was _your_ responsibility, and
therefore it's your job to help them fix it. Polyrepo says that you don't need
to care about that, you can just semver major version bump and be done with
it, upgrading be damned.

No semver means that you, not your users, feel the pain of making breaking
changes. That's invariably a good thing.

~~~
senderista
At AMZN, which has 1000s of separate repos, 1) was the general case, with 2)
occurring whenever there was a critical security issue in some library that no
one had updated for years. The resulting fire drill of beating transitive
dependencies into submission could occupy days or weeks of dev time.

------
0xFACEFEED
At least the author gave us the courtesy of italicizing his broken assumption
from the outset of the post.

> Because, at scale, a monorepo must solve every problem that a polyrepo must
> solve, with the downside of encouraging tight coupling, and the additional
> herculean effort of tackling VCS scalability.

Right.

But you have to get to "scale" first (as it relates to VCSs). Most companies
don't. Even if they're successful. Introducing polyrepos front loads the
scaling problems for no reason whatsoever. A giant waste of time.

Checkmate! I didn't even need a snarky poll. The irony of that poll is that it
clearly demonstrates his zealotry, not other people's.

~~~
adrianN
You can't split monorepos after the fact, at least not without immense costs.
You can always just put all your small repos into a big one.

~~~
btschaegg
I think there's a nuance to this that should be pointed out: monorepos allow
you to do very bad hacks ("I need this other component over there; let me just
put in a symlink. Done."). And if people can, they will use those hacks.

If you split your repo up from the get-go, the worst case is that you'll have
to assemble multiple distinct, well-encapsulated (in terms of project
structure) things into one. In Git, that could lead to multiple root commits,
but that's about it.

~~~
wcdolphin
No. The worst case is that the engineering team spent more time working on
“well encapsulated projects” than on the most important project for their
business and are all now out of jobs. Most companies don’t fail because of
tech debt. And certainly not because of version control tech debt.

~~~
btschaegg
Not exactly. Companies (at least small ones) can go out of business because of
bugs. And one great way to "achieve" said bugs is implicit dependencies
hidden from the developers who didn't introduce them.

> The worst case is that the engineering team spent more time working on “well
> encapsulated projects” than on the most important project for their business

I'm not really sure how I should read this. Don't you use your repos to solve
business problems? Why should that change because of the repo layout?

~~~
wsy
If you do a poly-repo approach from the start, and have dependencies between
repos, you need to introduce component versioning from the start. Component
versioning doesn't solve any business problems, but requires engineering
effort.

------
jayd16
There's a lot wrong with this article. Most of the arguments are either not
backed up or are misleading. I haven't heard anyone argue they can drop
dependency management because of a monorepo.

The author lists downsides of monorepos without listing the upsides and
downsides of polyrepos, so it's really only half complete.

I don't think anyone who likes a monorepo is suggesting you just commit
breaking changes to master and ignore downstream teams. What it does do is
give the ability to see who those downstream teams (if any) might be.

The crux of the author's argument is that added information is harmful because
you might use it wrong. It's just as easy (far easier, in fact) to ignore your
partners without the information a monorepo gives. It's not really an argument
at all. There's really nothing here but "there be dragons".

Monorepos provide some cross-functional information for a maintenance price.
It's up to you whether the benefit is worth the overhead.

~~~
briantakita
"... Please don't" titles also give off a condescending vibe, which usually
means the author has erected strawmen, is appealing to emotion, & has not
thought things through.

------
jonex
Seems like the main point is that you'll still need to add additional tooling
(search, local cloning, build, etc) to handle scaling, something you can do
just as well with polyrepos. Conversely, for polyrepos, you can add tooling to
fix issues with dependency management and multi-project changes/reviews.
However, the author figures that monorepos encourage bad code culture and
points out that Git is hard to build a monorepo on.

To me this message seems a bit shallow; of course we can build tooling to hide
the fact that we have a polyrepo. Given good enough tooling and a
consistent enough polyrepo structure (all using the same VCS, all being linked
from common tooling, following common coding standards and using the same
build tooling, etc.), the distinction from having a monorepo is more of an
implementation detail.

Given the choice between a consistent monorepo where everyone is running
everything at HEAD and a polyrepo where each project has its own rules and
there's no tooling to make a multi-project atomic change, I'd go for the
former.

Given the choice between identical working environments but different
underlying implementations, I would go for whatever the tools team thinks is
easier to maintain.

~~~
woolvalley
What is the tooling for multi-repo atomic synchronized commits? Monorepos
give you that for free, which is the reason why I think monorepo projects
exist. SVN kind of gave you partial checkouts, which was helpful.

~~~
bluGill
Polyrepo argues that this is a non-feature and doesn't give it to you. You can
figure out where things are, but you never get synchronization.

This is a good thing because when you have to make the multi-repo commit you
make the change and then update each downstream repo one at a time. Each
change is much smaller and so easier to review (and it's also easier to find
the right reviewer).

Of course, the downside is you either have to maintain both ABIs (not just
the API), have a rollout scheme where two versions of the upstream library
exist side by side, or don't release.
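The "two versions side by side" option often amounts to keeping the old entry point as a deprecated shim while downstream repos migrate one at a time. A toy sketch, with hypothetical function names:

```python
import warnings

def fetch_user(user_id, *, include_profile=False):
    """New API: profile data is now opt-in."""
    record = {"id": user_id}
    if include_profile:
        record["profile"] = {}
    return record

def get_user(user_id):
    """Old API, kept as a shim until every downstream repo has migrated.
    It preserves the old behavior (profile always included) exactly."""
    warnings.warn("get_user is deprecated; use fetch_user", DeprecationWarning)
    return fetch_user(user_id, include_profile=True)
```

The shim is cheap to write but has to be maintained (and eventually deleted), which is exactly the coordination cost being traded against smaller reviews.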

Nothing is perfect.

------
olingern
I’ve found monorepos to be extremely valuable in an immature, high-churn
codebase.

Need to change a function signature or interface? Cool, global find & replace.

At some point monorepos outgrow their usefulness. The sheer number of files in
something that's 10K+ LOC (not that large, I know) warrants breaking apart
the codebase into packages.

Still, I almost always err on the side of monorepos because of the conveniences
that editors like VS Code offer: autocomplete, auto-updating imports, etc.

~~~
jayd16
Hold on, are we talking about monorepos, ie a set of projects with shared
change history (and possibly 'build it all' type tooling) or single monolithic
apps?

I'm seeing these two things conflated in this thread.

~~~
olingern
To me, a monorepo consists of a set of related or semi-related services or
runtimes that can operate autonomously, but have a dependency on their
siblings to operate correctly.

In some cases, this could be two separate backend projects where you want to
re-use the same deployment pipeline.

Often, I find that API wrappers are something that I share across frontends
and backends in the JS world, so it often makes sense to separate my projects
into:

- backend

- frontend

- common

In Typescript I really like this pattern and can namespace shared types so
that it’s very clear to the future reader that this type is probably used
outside of the current context.

So, to reply to your comment — I think the term “monorepo” can encompass a lot
of different project types.

I think Dan Luu covers the bases quite well here:

[https://danluu.com/monorepo/](https://danluu.com/monorepo/)

------
im_down_w_otp
The biggest gripe I have with modern day monorepos is that people are trying
to use Git to interact with them, which doesn't make a tremendous amount of
sense, and results in either an immense amount of pain and/or the creation of
a bunch of tools to try to coerce Git into behaving basically like SVN.

Which of course raises the question: rather than trying to perform a bunch of
unnatural acts, why not just use SVN to start with? It works extremely well
with monorepo & subtree workflows.

Sure it has some warts in a few dimensions around branching, versioning, etc.
compared to Git when using Git in ways aligned with how Git wants to work, but
those warts are minimal in comparison to what's required to pretzel Git
monorepos into scaling effectively.

------
thedufer
Maybe it's just that the author's cutoff is at the wrong team size, but the
monorepo I work on (with ~150 devs) has almost none of the problems presented.

Unreasonable for a single dev to have the entire repo? I'm looking at a repo
with ~10 million LoC and ~1.4 million commits. I have 74 different branches
checked out right now. Hard drives are _cheap_.

Code refactors are impossible? I reviewed two of those this morning. They're
essentially a non-event. I'm not sure what to make of the merge issue - does
code review have to start over after a merge? That seems like a deep issue in
your code review process. The service-oriented point seems like a non-
sequitur, unless you're telling me I'm supposed to have a service for, say, my
queue implementation or time library.

The VCS scalability issue is the only real downside I see here. And it _is_
real, but it also seems worth it. It helps that the big players are paving the
way here - Facebook's contributions to the scalability of Mercurial have
definitely made a difference for us.

~~~
kiallmacinnes
In theory, yes - if the underlying repo changes, code review should start
over. In practice though, it's a terrible idea ;)

Part of code review is to ensure the code "fits" with all other merged code -
so a re-review is "needed" when other changes merge. E.g. if I merge a
refactor that changes everything from Pascal case to 100% SHOUTING, reviews
now need to take this into account.

In practice, this doesn't happen - it's way too much effort for far too little
value.

~~~
thedufer
I think the trick is to only re-review the areas that had merge conflicts, and
to do the re-review aware of both the changes you already reviewed and the
changes that caused the conflict. Merge conflicts, even in big code refactors,
are fairly rare, so this ends up not being much additional work in practice.

------
malkia
I do really like monorepos, but Google's other significant new project,
Fuchsia, is set up as a multi-Git repo (and I believe Chromium is too, and
maybe Android - I haven't checked). For Fuchsia, they use a tool called
"jiri"[1] to update the repos; previously (and maybe still in use) there was
the "gclient" sync tool [2] from depot_tools[3].

[1] -
[https://fuchsia.googlesource.com/jiri/](https://fuchsia.googlesource.com/jiri/)
[2] -
[https://chromium.googlesource.com/chromium/tools/depot_tools...](https://chromium.googlesource.com/chromium/tools/depot_tools.git/+/master/gclient)
[3] -
[https://chromium.googlesource.com/chromium/tools/depot_tools...](https://chromium.googlesource.com/chromium/tools/depot_tools.git)

It even reflects a bit in the build system of choice, GN (used in the above;
previously gyp), which feels similar on the surface (script) to Bazel but has
some significant differences (GN has some more imperative parts and is a
ninja-build generator, while Bazel, like Pants/Buck/please.build, is a build
system in its own right).

Simply fascinated :), and can't wait to see what the resolution of all this
would be... Bazel is getting there to support monorepos (through WORKSPACEs),
but there are some hard problems there...

~~~
robaato
Having worked with some organisations building on Android (>1,000 repos), life
is not easy when you are trying to build on top of it and regularly take
updates etc.

I asked one company how many changes required changes to more than one repo
and was told "a small percentage". We then did some basic analysis of issue
IDs across commits and discovered that it was in reality nearer 30% of
changes. Keeping those together was just plain very hard.

Start to scale this by teams of hundreds or thousands of devs and you get a
lot of pain.

Managing branches is also hard - easy to create (with the repo tool) but hard
to track changes across.

------
towaway1138
My polyrepo cautionary tale: Two repos, one for fooclient, one for fooserver,
talking to each other over protocol. Fooserver can do scary dangerous
permanent things to company server instances, of which there are thousands.

Fooserver sprouts a query syntax ("just do this for test servers A and B"),
pushed to production. Fooclient sprouts code that relies on this, pushed to
production. A bit later, Fooserver is rolled back, blowing away query syntax,
pushed to production. "Just do this for test servers A and B" now becomes "Do
this for every server in the company". Hilarity ensues.

~~~
lioeters
Ouch. I suppose the lesson is that a monorepo with both client and server
being developed _and tested_ together would have reduced such risk.

~~~
yellowapple
Versioning the client/server interface would've also reduced such risk.
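Concretely, that might mean the client declaring which protocol version it speaks and the server failing closed on versions it doesn't recognize. A toy sketch, with hypothetical request fields:

```python
# Versions this server build understands. After a rollback, newer versions
# simply drop out of this set instead of being silently reinterpreted.
SUPPORTED_VERSIONS = {1, 2}

def handle_request(request):
    version = request.get("version")
    if version not in SUPPORTED_VERSIONS:
        # Fail closed: an unknown (e.g. rolled-back) syntax is an error,
        # never "apply to every server in the company".
        return {"ok": False, "error": f"unsupported protocol version {version}"}
    if version >= 2:
        targets = request.get("query", "all")  # v2 added the query syntax
    else:
        targets = "all"
    return {"ok": True, "targets": targets}
```

With this in place, the rolled-back fooserver would have rejected the v2 request outright instead of treating "test servers A and B" as "everything".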

------
CJefferson
Are there any examples of someone who actually maintained a monorepo for a
massive company who now says they shouldn't have? It always seems to be
"back-seat drivers" arguing against monorepos, not people with practical
experience (that I can see, at least).

------
ajuc
I call bullshit on "our repository is too big for one machine".

Seriously, you have over 1 TB of code and 100 people wrote it?

~~~
Xylakant
Adding raw versions of binary assets (designs, video, ...) can quickly lift a
repo beyond a TB. Now, you could say "don't do that", but there are valid use
cases where you'd want to track all binary assets as part of the development
cycle.

~~~
tome
Ouch, well, yes _that_ is a very good situation in which not to take the
"mono" part of "monorepo" too seriously.

~~~
Xylakant
or use a VCS that allows partial checkouts of repositories. There's no DVCS
that I know of that can do that, but for example SVN can. Git LFS might be an
option, too. There are also commercial products that target that market.

I just wanted to point out that reaching a measly TB of data doesn't require
much effort. (I worked on a product that would version rendered clips for
special-effects production.)

------
thehazard
Better title would be "Monorepos don't fit with my particular use case."

~~~
forrestthewoods
I strongly agree. I hate this style of blog post.

Telling people what they should or should not do is generally absurd. Every
situation is unique and you can't possibly know another project's requirements
or acceptable trade-offs.

A better approach, in my opinion, is "Here's what we did and why". The author
clearly has experience in the area. Great! Tell me about your problems. Tell
me about your attempted solutions and what did or did not work. Tell me what
you wish you had done! I'd love to use knowledge of your situation to inform
my own decision making.

But don't be surprised if my circumstances are different and lead me to prefer
different trade-offs and choose a different solution. That doesn't make me a
zealot or an idiot.

~~~
icebraining
Isn't your own post telling people what they should and should not do
(specifically on how to give advice)?

~~~
forrestthewoods
The irony wasn't lost on me. It's a fine line. Let me try a slightly different
approach.

When I blog I've had much better luck telling people "here's what I did and
why". I don't know your circumstances and can't tell you how to solve your
problems. You may need to choose different trade-offs than I did. With that
said, here is my problem, how I solved it, and what I learned along the way.
Hopefully you can learn from my experiences and make a more informed decision
for how to handle problems you may encounter.

------
rkangel
To me, the key point is this: Splitting your code into multiple repos draws a
permanent architectural boundary, and it's done at the _start_ of a project
(when you know the least about the right solution).

The upsides and downsides of this are an interesting debate, but there is a
cost to polyrepos if you want to change the system architecture. There is a
cost to monorepos too, as argued by this post, and it's up to the tech leads
to decide which cost is greater.

------
peterwwillis
_" The frank reality is that, at scale, how well an organization does with
code sharing, collaboration, tight coupling, etc. is a direct result of
engineering culture and leadership, and has nothing to do with whether a
monorepo or a polyrepo is used. The two solutions end up looking identical to
the developer. In the face of this, why use a monorepo in the first place?"_

.....because, as the author directly stated, the type of repo has nothing to
do with the product being successful. So stop bikeshedding, pick a model, and
get on with the real business of delivering a successful product.

------
sterlind
Could you get the best of both worlds by having a monorepo of submodules? Code
would live in separate repos, but references would be declared in the
monorepo. Checkins and rollbacks to the monorepo would trigger CI.

~~~
naniwaduni
There's not much good to either world.

You need fairly extensive tooling to make working with a repo of submodules
comfortable at _any_ scale. At large scale, that tooling can be simpler than
the equivalent monorepo tooling, assuming that your individual repos remain
"small" but also appropriately granular (not a given--organizing is hard,
especially if you leave it to individual project teams). However, in the
process of getting there, a monorepo requires no particular bespoke tooling at
small or even medium scale (it's just "a repo"), and the performance pain
scales up smoothly from there. And performance problems can be treated as
purely technical problems if you don't want to approach the social ones.

To put it another way, we're comparing asymptotic O(n) with something bigger,
neglecting huge constant factors on the former. There's a lot of
path-dependence, since restructuring all your repos around new tooling is a
hard sell.

------
sierdolij
Polyrepos are the way to go:

\- Semantic versions.

\- Group components into reusable packages.

\- Don't use git modules or other source cloning in builds, use
native/platform package management.

\- Access control is made much easier.

\- Sign commits and tags.

\- Code review either before- or after-the-fact, just do it(tm).

\- Reproducible builds - strip out timestamps/random tokens/unsorted metadata.

\- Create CHANGELOGs semi/automatically.

\- Eliminate manual steps altogether.

\- Distributed builds/build caching (distcc, ccache).

\- TDD smoke tests should run automatically in dev on save with 10 seconds.
Bonus points for running personal TDD sandbox on faster remote servers via
rsync and trigger on file-save.

\- Standardize on 1-3 languages.

\- Services composed of simpler 12factor microservices, not monorepo
megaservices. Deploy fuse switching, proxying, HA/redundancy, rate limiting,
monitoring and performance stats collection just like macroservices.
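Several of the points above (semantic versions, signed tags, native package
management instead of source cloning) hang off the same primitive: a release
tag that the package manager consumes. A minimal sketch, using a throwaway
repo so it's self-contained:

```shell
# Cutting a semver release of one polyrepo component is just an annotated
# (ideally signed) tag; the platform package manager builds from that tag.
set -e
repo=$(mktemp -d); cd "$repo"; git init -q
git -c user.email=dev@example.com -c user.name=dev \
    commit -q --allow-empty -m "feat: initial release"
git tag -a v1.0.0 -m "release 1.0.0"   # use -s instead of -a to GPG-sign
git describe --tags                     # prints v1.0.0
```

With tags as the release unit, the CHANGELOG and version-bump automation in
the list above can key off tag history rather than manual bookkeeping.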

~~~
joshuamorton
About half of these aren't specific to polyrepos.

Changelogs, reproducible builds, code review, signing, grouped components,
distributed builds, TDD smoke tests, standardized languages, and microservices
are all possible (and just as easy) in monorepos.

You no longer need to worry about versioning, which means no manual updates of
either your package or updating dependency versions. Although access control
is more difficult, that doesn't seem like a good enough reason to make this
kind of decision.

------
rakoo
So the conclusion is "monorepo or polyrepo, you'll need a lot of tooling
anyway. So why use monorepos?"

Very easy: because having everything in a single place is just easier to work
with.

~~~
diminoten
Easier because you can commit more atrocities.

Easier is not better, some things should be hard to do to dissuade you from
doing them. Stop burdening your co-workers!

People in here keep saying "easy" like it's the end goal, but it's not.
Correct is. Writing great software is hard, and monorepos make it even harder
because a monorepo encourages an "anything goes" vibe.

~~~
rakoo
I don't understand how you can say that monorepos and correct cannot coexist.
Don't you have the minimum that is code quality analysis, automated testing
and mandatory code reviews in place? Those must exist to maximize correctness,
and they must exist whether you have 1 or 100 repos.

If I can commit more atrocities I can also commit more fine things, and I have
infra in place to stop crap making it to master.

Like the article said: it's all about the tooling

~~~
diminoten
I didn't say cannot, I said that correctness in a monorepo exists despite the
monorepo, not because of the monorepo.

~~~
rakoo
And conversely, polyrepo doesn't bring correctness just because it is poly.

~~~
diminoten
It does promote it, however, due to the abundance of support, tooling, and how
well it integrates with most processes and software available.

Being a special snowflake (monorepo) makes it a lot harder to write good
software.

------
harunurhan
I have worked with polyrepo madness... I remember making commits to up to 5
different repos just for one feature. And to roll that feature to prod, a few
of those repos had to go through a release process. On top of everything, we
couldn't really write tests to ensure the feature worked. The best we could
do was write tests in the "user facing" repo and keep fixing and releasing
the others until those passed.

Well, I am sure many companies do better with polyrepos than we did, though.

~~~
twic
Monorepo = all applications in one repo

Polyrepo = each application in its own repo

Whatever-madness-you-had-repo = each application across multiple repos

I'm sorry to hear you had to suffer what you did, but that was not the only
alternative to a monorepo!

------
chvid
Does a lot of the pain from a monorepo come from trying to use a tool - Git -
that is explicitly designed to support distributed repositories? Wouldn't
things be easier if you used eg. Subversion instead? That is a tool that was
designed around a client/server paradigm and had a single repository as its
main use case.

~~~
icebraining
Git was designed for a monorepo (the kernel). When people talk about monorepo,
they mean a single history line.

~~~
rkangel
The kernel is not a monorepo. The kernel is a large(ish) repo.

A monorepo would be if you put the kernel AND userspace in the same repo (e.g.
all the code for a Yocto distro). To me, when people talk about a monorepo
they are talking about putting separate pieces of the architecture in a single
repo.

It's a great example actually. If it was all in a monorepo _and you could
release it together_ then you wouldn't have to worry about breaking
userspace; you could make the changes to both sides at once. In practice,
because that's not how releasing works in that environment, you can't do that.

------
laurencei
Can anyone here explain to me how a monorepo like Google or Facebook handles
security?

If I pull the repo - I have the _entire_ contents of Google or Facebook? Is
that right?

Surely that lacks the normal security measures around what must be highly
sensitive information, so there must be more to it than I know of?

~~~
joshuamorton
(there's an acm paper about Google's repo that dives deeper into this).

First thing, you can't just "pull the whole repo" at Google or fb scale. It
doesn't fit on a single hard drive.

This means the entire repo is normally accessed via networked means. As a
result, builds can also be done over the network transparently.

So building and testing is done as a different user. That user can have
different privileges than the individual requesting the build.

So there is a way to hide source code so that only the output artifacts
(compiled binaries) can be accessed.

But I think the other part of it is that that's normally a tiny minority of
the code.

The other option is of course to live outside the monorepo, as some projects
do.

~~~
ddulay
Link to the paper "Why Google Stores Billions of Lines of Code in a Single
Repository" [https://cacm.acm.org/magazines/2016/7/204032-why-google-
stor...](https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-
billions-of-lines-of-code-in-a-single-repository/pdf)

Also, "Software Engineering at Google"
[https://arxiv.org/pdf/1702.01715.pdf](https://arxiv.org/pdf/1702.01715.pdf)

------
erik_seaberg
I wish this had touched on polyrepos' ability to pin known-good versions of
dependencies; that tends to be the Achilles' heel of monorepos.

~~~
muro
Yet who unpins them or updates when a new good version is available?

~~~
erik_seaberg
There's a cost for being out of date, but there's also a cost for learning the
hard way whether a new version breaks prod. Pay it down like any other tech
debt.

Maybe I could test literally every release version of each of my dependencies,
but that isn't really my job.

~~~
WorldMaker
Greenkeeper (and similar systems) comes to mind, too, in the polyrepo case.
You can still CI with "the latest" in the polyrepo case. We have the
technology to automate that. Including situations like 'let me know when the
next version of my dependency that passes _this_ test is released and send me
a PR to update my pinned version when it happens'.
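In the npm ecosystem, for example, the pinning being discussed is just an
exact version instead of a range in package.json (the package name here is
illustrative):

```json
{
  "dependencies": {
    "some-internal-lib": "2.3.1"
  }
}
```

A range like "^2.3.1" would float to the latest compatible release on each
install; with an exact pin, tools like Greenkeeper are what move the number
forward, via a PR that only lands once CI passes against the new version.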

------
EngineerBetter
Visited a customer recently who had inherited a monorepo.

All their CI and release problems traced back to it.

At the risk of sounding like an old git, package coupling and package cohesion
principles were defined for a reason.

I do feel like a lot of patterns in contemporary development are kneejerk
reactions to how last generation's programmers did things.

Exceptions? Nah, multiple returns! Dependency management? Who needs it... Oh,
wait.

Many small, single-responsibility repos? Wang it all in one, and then invent
your own tooling to cope with it!

~~~
fnord123
>Exceptions? Nah, multiple returns! Dependency management? Who needs it... Oh,
wait.

I thought the consensus was that exceptions, like OOP, are an antipattern. I
guess there's room for different opinions. :-/

~~~
marcosdumay
There is a largely held opinion (that I share) that exceptions are a dated
pattern that is better replaced by the more modern alternatives.

They are still much better than the older patterns they were created to
replace, like multiple return.

~~~
fnord123
Go uses multiple returns. Rust uses Result<T, E>.

------
icedchai
Monorepos are _way_ simpler for small teams to work with. At my startup we
run roughly 10 services out of the same repo. It's much easier to "cut a
release" across the entire system. It's much easier to share code internally,
upgrade dependencies, etc.

For a larger company, it might not be a good idea. However, most startups
start small and stay that way. Why take on the overhead you don't need?

------
wtracy
I'm not familiar with how monorepos work in practice, but it seems obvious to
me that it's going to complicate everyday tasks.

Ready to commit? Whoops, another team made a bunch of commits to their
project, and you need to rebase your project before you can commit. (I'm
having flashbacks to Clearcase already.)

Need to roll back the last two commits you made? Sure, that takes two seconds
--oh, wait, another team made multiple commits that got interleaved with
yours. Have fun cherry picking the files you want to revert.

Of course, I'm apparently a curmudgeon, because as soon as someone starts
talking about running a find/replace globally across multiple projects, I want
to grab something sharp.

~~~
lclarkmichalek
> Ready to commit? Whoops, another team made a bunch of commits to their
> project, and you need to rebase your project before you can commit. (I'm
> having flashbacks to Clearcase already.)

If they merge cleanly, it's not an issue. If they don't, you need to fix the
merge conflict. The work you need to do is proportional to the number of merge
conflicts, which isn't special to monorepos.

> Need to roll back the last two commits you made? Sure, that takes two
> seconds--oh, wait, another team made multiple commits that got interleaved
> with yours. Have fun cherry picking the files you want to revert.

Again, it's only an issue if the changes are on the same files. It can be a
bit of a pain to revert a stack of diffs, but if it's just a random commit
with no other relevant commits to the same files, it's very easy.

~~~
wtracy
Yeah, the rebasing complaint isn't fair if you're using a modern VCS.

I used to work on a large team at Cisco Systems that used Clearcase. Clearcase
does not do merges. If _anything_ has changed in master, you have to check out
again, which obliterates _all_ local changes.

(I have never met a developer who liked Clearcase. It was built to simplify
life for system administrators and to tick the right boxes for management, not
to be useful for developers.)

My general VCS experience is that you can't roll back a commit without also
rolling back all subsequent commits, related or not. I'm glad to hear that
modern systems have fixed that. (It looks like even Subversion does that now,
cool!)

------
sytse
The article is a great summary of the pros and cons.

What is still missing from the default tooling is a way to make a change
across repos.

At GitLab we're working on group merge requests to solve this
[https://gitlab.com/gitlab-org/gitlab-
ee/issues/3427](https://gitlab.com/gitlab-org/gitlab-ee/issues/3427)

------
rbetts
Unless you are pure OSS or pure closed source - you end up with a poly-repo
strategy regardless as you split open and closed code, suffering the
annoyances of both systems.

~~~
madhadron
No, what you end up with is a system for mirroring open source code into your
repo, and a system for mirroring commits that should be open source from your
code into external repos. All active work still happens in a monorepo.

~~~
rbetts
Any good examples of small companies that pull this off, and the
bots/CI/tooling they use to do it?

------
madhadron
The truth is that you're not going to get to make this decision. If you're
starting greenfield, you're going to start a single repo for your project. If
that greenfield is the whole company and everything is part of that project,
you get a giant monorepo. If greenfield is a new division that's not part of
another project, you're going to create a new repo, and now you're in a
polyrepo environment.

Which way it goes is determined by the environment, wherein the engineers do
the sensible thing at the time. Then you do the engineering to solve the
problems with whatever way you went.

~~~
sfink
At Mozilla, we started with a monorepo, then went to a mostly-monorepo, then
consolidated just about everything back into a monorepo, and have since
"decayed" a bit to a mostly-monorepo.

There were major gains from merging our source tree with the continuous
integration support code and configuration. We've pretty much always vendored
selected third-party code, so that didn't really change. Large collections of
tests and their infrastructure have been much easier to manage as part of a
monorepo.

Given that our tooling efforts are mostly in handling a monorepo, I can't
really judge how differently things would be if we had gone full multirepo --
my experience with multirepos has been pretty awful, but that's an unfair
comparison since we intentionally haven't worked on tooling for it. We solved
our worst multirepo problems by going monorepo, but I'm sure other projects
have solved their worst monorepo problems by going multirepo, so neither
really proves anything.

The push for separate repos these days mostly comes from social reasons -- if
you have a separable piece and want external contributors, that's a strong
motivator for putting it on github since that's just Where Things Are these
days. No matter how good your tooling and issue tracking and whatever is, even
if it's far superior to github's, it doesn't really matter. People have to
learn yours; they already know github's. I don't particularly like github's
workflow, but I still use it.

------
superasn
The biggest advantage monorepos have offered is the development of tools like
Lerna(1) and Yarn workspaces.

Before that, there used to be a node_modules folder with GBs of [useless]
data in all my projects. Now there is just one folder at the top and that's
it. Also, if you're developing lots of modules or plugins, it makes it super
easy to work without committing changes, since they are symlinked.

(1) [https://lernajs.io](https://lernajs.io)
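The single top-level node_modules described above comes from a root
package.json along these lines (the packages/* layout is an assumption, not
anything from the comment):

```json
{
  "private": true,
  "workspaces": ["packages/*"]
}
```

With this, Yarn hoists shared dependencies into the root node_modules and
symlinks the local packages into it, which is what lets you develop across
modules without publishing or committing intermediate versions.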

~~~
WorldMaker
node_modules is an interesting worst case of package management systems.

There's some good exploratory work currently happening on making node_modules
and the node package ecosystem better in general, but especially in the
polyrepo case. Yarn "Plug'n'Play" is one, and Tink [1] the other.

[1] [https://npm.community/t/tink-faq-a-package-unwinder-for-
java...](https://npm.community/t/tink-faq-a-package-unwinder-for-
javascript/3191)

------
sftwds
>Scaling a single VCS to hundreds of developers, hundreds of millions of
lines of code...

Maybe I am way out of my element here, but is this a common problem? Do
companies with only “hundreds of engineers” really have “hundreds of millions
of lines of code”?

~~~
hermitdev
From personal experience, it can happen. At one point, I was personally
responsible for about 2 million lines of code. Over several years, I was able
to reduce it to about 500k through generous use of code generation for ORM
type work. The generated code never ended up in VCS, but the generator and
model did. It certainly helped checkout/update times, as there were several
thousand fewer files to deal with.

I was one of about 900 engineers at a financial company of about 1500
employees at the time.

I don't honestly know how many lines of code there were across the company,
but I imagine it easily exceeded 100M. It took us a full week to do a full
recompile of everything. We had no CI... Was always a problem approaching
release time.

~~~
tome
> I was one of about 900 engineers at a financial company of about 1500
> employees at the time.

Can you say which company it is, or give a few more details? I'm fascinated to
know which financial company can consist of 60% engineers!

~~~
hermitdev
Sorry, no, NDA at what not. But, it was a very technology driven hedge fund,
not one of the big banks you read about in the news.

------
paulddraper
> is there any real difference between checking out a portion of the tree via
> a VFS or checking out multiple repositories? There is no difference.

How big is your monorepo? Assume each line of code is a full 80 characters,
stored as ASCII/UTF-8. That's _67 million lines of code_ in 5GB. I can fit
five of those on a Blu-ray.
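The back-of-envelope number checks out if you read 5GB as 5 GiB:

```python
# Sanity check on the comment's arithmetic: 80 bytes per line, 5 GiB of
# source. How many lines of code is that?
bytes_per_line = 80
total_bytes = 5 * 1024 ** 3          # 5 GiB
lines = total_bytes // bytes_per_line
print(lines)                          # 67108864, i.e. ~67 million lines
```

In practice real code averages well under 80 characters per line, so the true
line count for 5GB of source would be even higher.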

> The end result is that the realities of build/deploy management at scale are
> largely identical whether using a monorepo or polyrepo.

True.

> It might be deployed over a period of hours, days, or months. Thus, modern
> developers must think about backwards compatibility in the wild.

Depends entirely on the application. Lots of changes are deployed within
short periods of time with low compatibility requirements.

> Downside 1: Tight coupling

Monorepos do often have tightly coupled software. Polyrepos also often have
tightly coupled software. Polyrepos _look_ more decoupled, but pragmatically I
can't say I've noticed much of a difference.

> Downside 2: VCS scalability

I've also heard Twitter engineers complain about the VCS. But what is the
scope of the author's discussion? 1,000 engineer orgs? Or 20 engineer orgs?
Those are _vastly_ different levels of engineering collaboration. I assume the
article was not written to cover both of those. Or was it?

\---

Ultimately, I think the author implicitly assumed a universe of discourse of
gigantic repos with hundreds and hundreds of daily contributors.

When people talk about the spectrum of monorepo vs polyrepo architectures,
that is very extreme. For example, last I knew, Uber had more repos than it
had engineers. And I don't assume that "polyrepo" always means multiple repos
per engineer.

------
m0zg
"Scalability" issues aren't encountered until your repo has many millions of
LOCs and a lot of churn. For 99.99% of organizations this is not an issue and
will never be an issue.

------
titzer
No silver bullet here, I think.

It's definitely the case that a mega monorepo doesn't, in practice, have the
atomic commit property. E.g. once you add owner files and separate code
reviews, you're in for a world of hurt. Case in point, Google developed an
internal tool to split cross-cutting CLs into manageable pieces, wrangle all
the owners and approvals, presubmits, etc, and then submit the CL piecemeal--
i.e. _not_ atomic.

Chromium uses a different model. It just DEPS's in other repos at pinned
versions. That has a whole other set of problems.

~~~
EricBurnett
(Disclaimer: at Google)

It's not quite so black and white. It's true that repo-wide refactorings often
get carved into little changes and so aren't made atomically, but they're the
exception rather than the rule. Any small change, e.g. changing an interface
and the 5 callers of it, _can_ be made atomically. And changing code that's
reused a small number of times is a far more common case then changing core
libraries the whole company uses, so atomic submit ends up being hugely
valuable.

------
dajonker
I've been in a project where some (authoritative) people had a tendency to
split things into separate repositories for very small things, e.g.
repositories with a single class. This was pure developer hell. Any change
meant changing at least 3 repositories, including a review for each change.
Never understood this decision as all parts needed to be on the latest version
anyway. Caused lots of dependency and versioning issues too.

------
notacoward
You know what's worse than a monorepo? A duorepo. Yes, that's right, two huge
repositories embodying all the problems of a monorepo, but coupled in such a
way that it's easy to break something if the commits and deployments from one
are out of sync with the other. It's like drinking both bottles of poison, yet
it (or minor variations such as three or four entangled ginormorepos) is a
thing that really exists.

------
pbiggar
Alternate title: monorepos - ideal for teams under 100 devs

~~~
joeblau
At Uber, our iOS and Android teams each have over 100 contributors, and we
have a monorepo for each app platform. I'm not on the ops team, but being in
a monorepo here has been one of the best development experiences of my career.

~~~
pbalau
FB has a monorepo for most of the known universe

~~~
paulddraper
And Google has one that includes the multiverse.

------
mr_tristan
Open source software workflows are very common and provide a _lot_ of tooling,
e.g., Maven, bundler, npm, etc. Add semantic versioning and you have a lot of
tooling that you basically get for free for polyrepo setups. With monorepos,
you have to really spend a lot of time tooling, because you basically don't
use the OSS tools.

There's a lot of odd arguments in this blog that are very spurious:

"If an organization wishes to create or easily consume OSS, using a polyrepo
is required."

What? _Consuming_ OSS is usually not that bad. I've even imported the
complete history from external repos pretty easily. (It does suck with git,
but I wouldn't use git for a monorepo...) _Contributing_ to OSS is tricky, but
the fact that you use polyrepos doesn't really help you much there either.

"Polyrepo code layout offers clear team/project/abstraction/ownership
boundaries and encourages developers to think carefully about contracts."

Clear ownership boundaries have _zero_ to do with polyrepos. In fact, I'd say
monorepos can be easier, since you can say "everything under this directory is
owned by X, Y, Z". No search function is required to figure out where some
other team hid their code. So many times, with polyrepos, projects are
_hidden_ because they're off in some other grouping unit that you're not a
member of, so you don't even know who owns what or where it came from.

In the end, I'd still strongly recommend using polyrepos because you get _a
lot_ of tooling for free, and most integration issues are solved with semantic
version locking and CD automation. But the arguments here are not really
great.

------
hvindin
I suspect the problem most people end up trying to solve isn't "how do I
technically scale my tools", because, as the author points out, tools and
techniques for this already exist and it's an already-solved problem.

Instead, my experience has largely been that the problem to solve is "how do
I make a few hundred developers behave in a predictable way". You have many
OK developers, but you can't really be sure that none of them will break
stuff. You're trying to solve the organisational problem of restricting merge
rights to the people who won't break things, while at the same time not
bottlenecking development on too small a number of people. In that scenario,
sure, split your repos up so that people can only break stuff that they
'own'.

But at least be honest about the fact that most of the technical issues of
having a monorepo have been solved already, so the issues you are probably
trying to solve are actually people problems.

------
zbentley
The VCS/codebase-tooling-size argument rings a bit hollow.

We have really good code-search tools that are heavily optimized and indexed
(from ripgrep/silversearcher to more centralized things like Hound, for when
local-disk performance just won't cut it).

It's not hard to optimize Git workflows to be faster with relatively simple
tricks, and if that absolutely doesn't scale for some reason and VFS isn't an
option, there are always centralized VCS systems like Perforce that solve
this. P4 gets a lot of shit, but it's _really_ good at the gigantic-repo
problem; tune your client properly and you can initial-sync 10+ GB repos in
the time it takes to get a cup of coffee (and, if your company is large/old
enough to have a repo that big, it can probably afford the Perforce licenses).

------
etxm
I feel like a lot of arguments against monorepos assume micro services are the
_only_ option on the other side.

I tend to break up most of my projects at the edge of business logic or domain
logic and lean on a package manager to “deploy together” like any other
dependency that’s not in your repo.

This allows teams to work independently without a large sprawling repo. If
you're following anything semver-ish, hopefully the other teams in the
company aren't breaking releases, and you can auto-upgrade on patch-level
changes. If not, well, thank goodness CI is there.

I’ve always had difficulty working in projects with too many purposes. This
helps me focus where to put things and gives an easy point of escape if a
dependency needs to become a service in the future.

------
cmrdporcupine
It's worth pointing out that while Google has a monorepo in Google3, it also
doesn't at the same time. There are other projects, such as Android and ones
based on Chrome, that are composed of multiple git repositories and use the
repo tool to manage and sync them.

------
userulluipeste
This is just modularity, applied more broadly to how development is
organized. People tend to get political about things that make their own
involvement comfortable, and/or when it's not them who has to deal with the
pesky consequences. I strongly suspect that this is how giants like Google¹
or Facebook² ended up with monorepos. It's a choice between making developers
do the workout or letting them have their cake: kick the problem down the
road and hope to acquire, in time, the resources necessary to throw at it
later.

¹² "Don't be evil" (with developers, among others) and "move fast and break
things" most definitely ask for cutting a few corners here and there.

------
NicoJuicy
Since I learned that Google did it, I have thought about it a lot.

And monorepos really do make sense (a lot) when you need things tied
together. Finding errors in your console immediately, without version
numbers, gets the job done faster.

There are other ways though, for example if you use .NET: a monorepo that
creates NuGet packages, and projects that pull the latest build of those into
their solution. This way, external parties can re-use the same components.

On a beta version that releases new NuGet components, if there is a file
change (and so a version update), notify the external parties.

Have one website which announces the schedule of an update to the live
version, to reduce email traffic. Oldskool, but it seems to work.

------
obeattie
The real issue is that this is more nuanced than is appropriate for one-size-
fits-all advice. "Everyone should use a monorepo" isn't helpful, but neither
is "everyone should not use a monorepo."

Sadly, this article falls into the same trap.

------
manigandham
As always, the real world is a whole lot of gray between the black and white
articles that are fun but useless.

Multirepos like microservices are all about scaling people, not the project.
Start with the monolith and monorepo until you need to split, and then focus
on separating by groups of logical functionality or team responsibilities
(although if those are different then you'll end up with other problems).

Also stop taking things literally. Monorepo does not mean you must have
everything in a single repo. Even a startup can put the majority of the
codebase in one place and have things like a corporate website or small admin
backend in another.

------
daemin
If you spend your time maintaining and improving a low level library then
having a monorepo is much better than having a polyrepo. Primarily because as
you make changes to the library you can update everyone else's code that uses
it rather than having to wait on them to do so. This reduces the need to
maintain older versions of these libraries and applications.

Additionally you can submit the single change in one go which updates everyone
and it is much cleaner than having to find out and know all of the repos that
could use your library and manually submit to each one of them, probably
breaking some for a time in the process.

------
vikingcaffiene
I am sure this author means well but I respectfully disagree with this advice.

The author is arguing against the monorepo approach and then proceeds to list
out some of the most successful software companies on earth as reasons NOT to
do it. The reason they were able to get to their lofty heights was in some
part because they used a monorepo. The biggest advantage of a monorepo is you
can move quickly and understand the implications of changes since everything
is housed under one roof. That's critical for startups IMO. By the time you
reach the "scale" the author is talking about, you have the resources to deal
with it. Is it hard? Yes. Is it worth throwing out the baby with the
bathwater? No IMO.

I currently work in the polyrepo world that the author is encouraging. I can
tell you it f*cking sucks. Just take the very simple example of firing up your dev
environment. In a polyrepo world, you have to individually fire up each
codebase or write up some sort of script to do that for you. The former
example sucks for obvious reasons and the latter example makes the case for a
monorepo since one dev could author a script that could then be used by all
(since he/she will know the paths to all things that need to start). Don't
even get me started on setting up an environment from scratch. Containers make
this easier but again, it would be nice to just rock `./start.sh` and be off
to the races. A monorepo can give you that.

Pulling/pushing changes to your vcs becomes a tiresome error prone nightmare
since now you need to remember to run git pull on all the codebases that touch
the area you are working on. You might forget to pull on one of those
codebases and everything starts breaking and now you need to stop and track it
down. Dumb error? Yep. Not a thing in a monorepo? Yep. PR's become really
sucky because now you need to harass your team for n PR's instead of just the
one if the feature you are working on cuts across codebases. I've worked in
some fairly large monorepo codebases with lifespans of >10 years and I can
tell you that I have yet to encounter any of the issues with VCS scaling the
author speaks of. In the future if I find myself in a situation like that you
know what I'll do? Migrate to a more performant solution like Mercurial or
something. Will it suck? Sure. But not as much as dealing with a polyrepo.

Then there's dependency management. Holy sweet mother of god dependency
management is the worst. Lets say you need to make a breaking change to one of
your codebases, in a monorepo (with decent test coverage or a type system
worth a damn) you have a decent chance of tracking down everything that needs
to get patched. In a polyrepo? Phttt! Enjoy those bug reports from your
customers and 1am hotfixes bruh.

I really really wish people on here would stop trying to solve problems of
"scale" when that's literally the last thing you need to worry about. Being
able to respond quickly to business requirements is the only thing you should
be worrying about until its obvious that you've made it. Then feel free to
worry about scale.

~~~
pcj-github
Yup... This. You saved me 20 minutes of writing this same/similar reply.

Tragically, this article will subsequently be cited by countless software
managers who retain a fear of monorepos for some of the reasons cited here;
junior (& senior) devs will go along with it out of not wanting to stick
their necks out, and the cycle of pain will continue.

------
lmilcin
"Yeah, well, that's just, like, your opinion, man"

Monorepos are one way of solving some of the problems each organization has.
Monorepos require discipline in solving those problems and if the organization
is not willing to get there all the way or if it takes too much time then it's
just pain and suffering for everybody.

I suspect the author works for one of those organizations that wanted to be
hip but did not actually understand what it entails. Maybe faking agile and
devops sort of works for you (works as in "it's difficult to pinpoint where
the problem is") but faking monorepos certainly does not.

------
username90
With the right tooling for both types, each directory in a monorepo is
equivalent to a repository in a multirepo setup. The only difference is that
in the monorepo it is easier to create new repos and dependencies between
repos (just add a directory in a commit or add a dependency on another
directory).

The author of this piece apparently thinks that the ease of working in a
monorepo is a bad thing; I disagree. I think that being able to treat
repositories as easily as directories is awesome, since it is a lot simpler
and requires a lot less training for your devs to understand.

------
sandov
I agree, but also...

Medium: Please don't

~~~
erikpukinskis
Why?

~~~
sandov
TL;DR: Really bad user experience.

For a longer explanation see [https://medium.com/@nikitonsky/medium-is-a-poor-
choice-for-b...](https://medium.com/@nikitonsky/medium-is-a-poor-choice-for-
blogging-bb0048d19133)

Alternatives better than medium: Wordpress, Blogger, github pages, plain html
files.

~~~
garmaine
“For a longer explanation: see this Medium post.”

There is a certain irony there which betrays one of the problems left
unaddressed.

~~~
sandov
The purpose of it being a Medium post is so you can instantly see what he's
talking about.

------
kerng
Static analysis is easier on a monorepo: at least you can run it on all the
code at once. A polyrepo has the problem that some code is off the radar. That
might be the only advantage of a monorepo, in my opinion.

------
ascorbic
One thing I really dislike about monorepos for node modules is that you can't
npm install from them directly. Unless it's a project with very fast PR
merging and releases, you can be stuck with a broken module that has an open
or even merged PR to fix it, but that you can't install because it hasn't been
pushed to npm. npm link might work locally, but that doesn't help if it needs
to build on a CI server. If it's one repo per module, then I can just npm
install the git URL and it works fine.
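For the per-module case, installing straight from the repository really is a one-liner; a sketch (the repository URL and branch name here are hypothetical):

```shell
# Install an unreleased fix directly from a standalone module's git repo;
# npm accepts a branch, tag, or commit after the '#'.
npm install git+https://github.com/someuser/some-module.git#fix-branch

# There is no equivalent shortcut for a package living in a monorepo
# subdirectory: npm expects package.json at the repository root, so the
# fix has to reach the registry first.
```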

------
groestl
Me, I dream of a monorepo covering the whole world. Give me a single hash, and
let me know the state of things as they are, reaching from the toolchain used
to compile the bootloader to the state of the database, which has just dropped
a row and therefore generated a new commit, forever secure, an immutable
history.

I accept the infeasibility of my dream. But I'd like my repo to cover as much
as my tooling realistically allows.

------
mcguire
" _If an individual clone got too far behind, it took hours to catch up (for a
time there was even a practice of shipping hard drives to remote employees
with a recent clone to start out with). I bring this up not specifically to
make fun of Twitter engineering, but to illustrate how hard this problem is._
"

But mostly to make fun of Twitter engineering.

Seriously, what advantages would a big bag of billions of lines of code have?

------
isacikgoz
Just read the article, and it was a really great read. We're using polyrepos,
and dealing with so many repos was not great. That's why I created "gitbatch",
which lets you manage multiple repositories in an easy way.

[https://github.com/isacikgoz/gitbatch](https://github.com/isacikgoz/gitbatch)

------
jamietanna
It's funny you say this, because my most viewed article from organic searches
is about converting your polyrepo setup to a monorepo
[https://www.jvt.me/posts/2018/06/01/git-subtree-
monorepo/](https://www.jvt.me/posts/2018/06/01/git-subtree-monorepo/)

------
1024core
Google uses a monorepo: [https://cacm.acm.org/magazines/2016/7/204032-why-
google-stor...](https://cacm.acm.org/magazines/2016/7/204032-why-google-
stores-billions-of-lines-of-code-in-a-single-repository/fulltext)

I don't think "scale" gets much bigger than Google.

------
karmakaze
TL;DR

The post just reads like an opinionated piece written for traffic. The author
has never even used a monorepo as far as I can tell, so he can only argue from
one side, the only one ever used: polyrepo. He then goes on to list
'theoretical' benefits and downsides of monorepos (which should also be
theoretical, having never been used). It concludes with "The two solutions end
up looking identical to the developer. In the face of this, why use a monorepo
in the first place? Please don’t!", implying that 'Google, Facebook, Twitter,
and others' do it for no benefit.

------
z3t4
You can make shallow clones, and auto-push to a single repo; it's common, for
example, to auto-push to GitHub from an internal repo. It sure has its issues,
but problems are solved with solutions. Simply burying your head in the sand,
e.g. using single repos where a monorepo is the best solution, is not a
solution.
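For reference, the shallow clones mentioned above are built into git; a minimal sketch (the repository URL is hypothetical):

```shell
# Clone only the most recent commit instead of the full history;
# this keeps the initial download small for a large monorepo.
git clone --depth 1 https://example.com/big-monorepo.git

# History can be deepened later, on demand:
git -C big-monorepo fetch --deepen=100
```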

------
qwerty456127
I didn't actually know somebody already does this, but the idea came to my
mind yesterday: what if there were just one big code repository for a
particular programming language, and everything anybody writes would
immediately become part of the standard library? It feels like kind of a
collective brain...

------
hennsen
To sum up the discussion of why this is either absolutely right or absolutely
wrong, how about: "mono/poly-repo: neither is THE single solution for every
use case in every organization and project"? Besides that, sure, let's keep
analyzing the pros and cons of each in different scenarios...

------
shereadsthenews
This argument boils down to people who have used Perforce, who believe in the
benefits of a monorepo, and people who have only ever used git, who do not.
While it's true that git is a terrible program, that does not lead to
conclusions about the merits of a monorepo.

------
KuhlMensch
Hm, I find monorepos are a natural fit in JavaScript land. A little
meta-orchestration affords a lot less wiring. This is especially helpful in
the repos I've worked on.

But from reading the article, it seems like there are legitimate areas where
they might not fit.

------
sbr464
One glaring omission of the monorepo design is that if you want open- and
closed-source software in the same monorepo, it doesn't seem possible. Curious
as to why this design choice was made.

~~~
lclarkmichalek
It's entirely possible, and repositories like
[https://github.com/facebook/fbthrift/](https://github.com/facebook/fbthrift/)
are an example of an open source project that is synced commit for commit with
a private monorepo.

It just requires some tooling (like everything with monorepos)

~~~
sbr464
Thanks for sharing

------
astrostl
The word "workflow" is suspiciously absent from the OP.

Annoying workflow is my #1 complaint against polyrepos.

------
costrouc
In my opinion, several build systems / package managers have already solved
this issue. The answer is that monorepo vs. polyrepo doesn't matter. Look at
nixpkgs/NixOS if you are interested.

------
nathan_f77
I'm using a monorepo as a solo developer, and it's been pretty good. I like
having everything in one place, so I can work on everything in a branch,
including the feature, updates to API clients, documentation, blog post, etc.

One problem is that my test suite is very inefficient. I have to run through
every integration test, even if I haven't changed any code that might cause
these tests to fail. It's especially weird that CI runs all my tests whenever
I write a new blog post. So I'm very tempted to split up some things into
internal libraries and keep them in a separate repo, and add all these repos
as submodules. I know this can be pretty dangerous, and it's easy to break
things when you update dependencies, OS versions, language versions, etc.
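For what it's worth, splitting the repo isn't the only fix for the slow-suite problem: CI can skip suites whose input paths didn't change. A sketch, assuming a hypothetical `blog/` directory and an `origin/main` base branch:

```shell
#!/bin/sh
# Decide whether the integration suite needs to run, based on which
# paths changed relative to the main branch.
CHANGED=$(git diff --name-only origin/main...HEAD)

# grep -v drops blog/ paths; -q succeeds only if anything else remains.
if echo "$CHANGED" | grep -qv '^blog/'; then
  echo "non-blog files changed: run the integration suite"
else
  echo "blog-only change: skip the integration suite"
fi
```

Dedicated tools (Bazel, Nx, Lerna, and the like) compute this kind of affected-target set from the dependency graph instead of raw paths, but the path filter covers the blog-post case with no extra machinery.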

If I go down this road, I have to be extremely careful to enumerate all the
things that might break the library, and prevent any of these things from
being updated automatically. I'll set a very strict version constraint in the
package.json / gemspec, and throw an error if I detect a different version of
Node, Python, Ruby, system libraries, etc. Then I'm forced to run all the
library tests and explicitly bump the versions if I want to update anything.

I should also only do this when the library is a pure function with no side
effects.

The really tricky part is figuring out how to write robust integration tests.
API boundaries can be a big source of bugs. I think I'll do something similar
to VCR [1], where the first integration test executes all of the code without
any mocks, and then records the response. The response would then include
those exact arguments, and it would also be tied to a specific commit hash for
the library. If I change anything in the library, then I just need to re-run
the slow tests, and then everything will be cached. I guess a real advantage
of putting things in a separate library is that you know exactly what files
are required for a specific feature, and the commit hash gives you a
"fingerprint" of those files that you can use for caching in your tests.

Just have to be super careful about any dependencies that might break the
library. Also I really need to start running all my tests in a Docker
container which matches CI and production. I even have some screenshot tests
where I have alternative versions for Mac and Linux. Would be nice to delete
those. The experience was really bad when I tried to do this in the past, so I
need to figure out a better way.

Anyway, sorry for the train of thought! Would be interested to hear your
thoughts, and if there's anything else I should watch out for.

[1] [https://github.com/vcr/vcr](https://github.com/vcr/vcr)

------
jrochkind1
the grass is always greener.

------
nunez
These arguments are weak, IMO.

Yes, monorepos can be slow to browse through if the VCS isn’t configured to
handle the size (sparse pulls aren’t the default with Git; that alone can make
a massive difference when your repo is massive). Polyrepos can be just as
slow, however; what’s worse is that there are _more_ of them.
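The sparse pulls mentioned above are available in modern git via `sparse-checkout`; a sketch, with a hypothetical repo URL and team directory:

```shell
# Work on only one team's directory of a large monorepo
# (the URL and the services/payments path are hypothetical).
git clone https://example.com/monorepo.git
cd monorepo
git sparse-checkout init --cone           # enable cone-mode sparse checkout
git sparse-checkout set services/payments # keep only this directory
# For very large repos, clone with --filter=blob:none as well, so blobs
# outside the sparse set are never downloaded up front.
```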

I remember working with a repo that was >20GB large, mostly from videos (we
didn’t know that initially). Pulling that repo took _forever_. Nobody on that
team cared, because they almost never did a fresh pull and accounted for the
time it took their CI/CD to do so in their reports. If it were a monorepo,
MANY teams would’ve felt that pain more immediately.

Yes, monorepos require some tooling to prevent a gazillion artifacts from
being deployed at once (and to specify what’s related to what if code lives
across different folders). So do polyrepos! I’ve configured a few Jenkins jobs
for my clients to dynamically pull different co-dependent Git repositories at
build time. It’s a pain! Especially when multiple credentials are involved!
Then there’s the whole “We have a gazillion repos and 20% of them are junk”
problem, which requires automated reaping; also a more difficult problem than
it seems.

Same with refactors. Refactors across polyrepos are just as much of a pain
because you’re now subject to _n_ build and review processes/pull requests,
and seeing the entire diff is hard/impossible. This introduces mistakes. If
anything, refactors in polyrepos are more of an event than they are for
monorepos.

While monorepos have their problems, I will continue to advocate for them
because the ability to see what’s going on in one place and for any developer
to propose changes to any part of the code (theoretically) is massively
beneficial, ESPECIALLY for complex business domains like healthcare or
financial services. Plus, you will have a RelEng/BuildEng team when your
codebase and engineering org gets large enough; why add more complexity by
creating a gazillion repos that are possibly related to each other?

(The large engineering organization without a team focused on tools and builds
doesn’t exist. If yours doesn’t have one, that means that some/many developers
are spending way more time spinning their wheels on build systems than they
should be.)

The real reason why monorepos don’t happen in the aforementioned domains is
because there’s no easy way to allow them and pass regulatory audits.

Many regulating bodies require hard boundaries enforced by role-based access
control, especially for code that deals with personally-identifiable
information or code between two or more domains that have a Chinese Wall
between them. “All of my developers can check out the entire codebase” is an
easy way to get fined hard, and polyrepos are much easier to restrict access
to than folders in a monorepo are (one advantage not mentioned in the
article). While you _can_ restrict access to directories within a single
repo, doing so is not straightforward, and most organizations would rather not
waste the engineering effort.

I would like to think that Google and Facebook have gotten away with it
because they implemented a monorepo from the very beginning and the
engineering involved in splitting it up is much more involved than engineering
around it.

That said, I continue to advocate for them because discoverability is good and
it builds a better engineering culture in the end. I would rather hit those
walls and make just-in-time exceptions for them than assume that the walls are
there and create a worse development experience without exploring better
alternatives.

------
mrbanks
Overly opinionated garbage, imagine having to work with this guy.

