
Monorepo: please do - Soliah
https://medium.com/@adamhjk/monorepo-please-do-3657e08a4b70
======
sharpercoder
In my experience, this discussion gets convoluted by confusing _modularity_
with _monorepo_. They are orthogonal to each other; you can have a very
modular codebase in a monorepo, but also a very coupled (non-modular)
codebase spread across polyrepos.

It's true that monorepos without proper discipline can tend towards coupling.
Still, when discussing mono vs poly, we should keep this distinction in mind.

~~~
benmarten
It's not: why do I have to check out a terabyte of code that I don't need,
even if the code is modularized?

~~~
erulabs
If a monorepo has a terabyte of code, or if 10 small repos have 1/10th of a
terabyte each, what have you really gained? In any case, git LFS solves large
file storage effectively, as do a number of other artifact storage solutions,
and a repo with a terabyte of code is _not_ going to be trivially split apart,
since a terabyte of code would be, by a factor of thousands, the biggest
codebase ever created by humankind.

~~~
twblalock
If I only need to check out one of the smaller repos then I've gained quite a
lot in terms of download speed, storage size, etc. Git LFS adds a lot of
complexity I'd rather avoid.

~~~
erulabs
Sure, but then you only have some small portion of the total infrastructure,
which adds its own layer of complexity for the people reviewing your changes
:P It's all trade-offs, is all I'm saying - I honestly still can't decide
between the two, although for any company under 20 people, I'd for sure stick
with a single repo.

~~~
tracker1
If I'm working on Application X, wtf do I care about infrastructure code? Or,
to be specific: if someone is working on Google Maps, should they care about
the codebase for Google Inbox for Android?

~~~
malkia
You may be relying on a shared component for your app; you simply put a deps
reference to it in your Bazel (Blaze) BUILD file - e.g. "//base:something".
That "//base:something" might itself rely on other deps, but that should not
be your concern.

So - what's stopping you from depending on (using) anything else? Or rather,
how can you be stopped from doing this? Bazel (Blaze) has visibility rules,
which by default are private - i.e. the targets in your package are hidden
unless explicitly made public. Alternatively, you can whitelist, one by one,
which other packages (//java/com/google/blah/myapp) can depend on you.
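
As a minimal sketch of how that whitelist looks in a BUILD file (the target
and consumer names here are illustrative, borrowed from the examples above):

    # //base/BUILD
    cc_library(
        name = "something",
        srcs = ["something.cc"],
        # Private by default; each consumer is whitelisted explicitly...
        visibility = ["//java/com/google/blah/myapp:__pkg__"],
        # ...or open it to everyone: visibility = ["//visibility:public"]
    )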

Let's say there is a cool new service and your team wants to try it out...
but it's not out there for everyone to use; it's in alpha, beta, whatever
stage. So you ask the owning team for permission, or simply create a CL
adding your package (by target, name, or "..." folder resolution) to the
whitelist - and eventually you'll get in (if it's a good idea, and approved).
Or the reverse: some library got deprecated and has been slowly replaced with
another, and now, instead of being "//visibility:public", it just whitelists
its last remaining users... Probably not a good idea to be added to that
list, as the whole thing is going away soon (yes, Google tends to deprecate
internally even faster than externally... which is good!). But such
mechanisms are helpful in getting this to work correctly.

------
jonthepirate
The reason I am upvoting this is that it is written in a positive tone. Too
many people - especially in the world of DevOps - trash everything ("X is the
worst, don't do that," etc.) and more often than not offer no better guidance
to follow up their whiny tone. We need more "please do"s in this industry.
Thank you Adam.

------
klodolph
I have personally migrated a medium-sized polyrepo codebase (something like
~20 repos?) into a monorepo, and I agonized over the decision. But it lifted
a huge weight off my shoulders.

I feel like if you are working completely in the open-source world, and you
are contributing one open-source project to a larger array of available
projects, then the decision to use a polyrepo makes a lot of sense. You can
submit libraries to a package repository like Yarn/NPM/PyPI or you can use Git
references for e.g. Go's package manager.

But what I experienced with polyrepos outside this world is that we ended up
with a weird DAG of repos. It was always unclear whether a specific piece of
code that was duplicated between projects should be moved into one dependency
or another, or whether it should have its own repo. Transitive dependencies
were no fun at all; if you used git submodules you might end up with two
copies of the same dependency. You might have to make a sequence of commits
to different repos, remembering to update cross-repo references as you go,
and if you got stuck somewhere you had to work backwards. This feels like a
step backwards, like the step backwards from CVS to RCS.

Again, in the open-source world you might have some of this taken care of by
using a package manager like Yarn. But if your transitive dependencies aren't
suitable for being published that way, it can be tough. Monorepo + Bazel right
now is a bit rough around the edges but overall it's reduced the amount of
engineering time spent on build systems.

On the other hand, it's not like Bazel can't handle polyrepos. In fact, they
work quite nicely, and Bazel can automatically do partial checkouts of sets of
related polyrepos, if that's your thing.

As for VCS scalability problems, I expect that Git is really just the popular
VCS du jour, and some knight on a white horse will show up any day now with a
good story for large, centralized repos with a VFS layer. In the meantime,
any company large enough to experience VCS performance problems but not large
enough to have its own VCS team (like Google and Facebook have) will suffer,
or possibly pay for Perforce.

~~~
malkia
Bazel has a WORKSPACE file that can work with multi-repos, but AFAIK things
are still rough there, though they should get better eventually (I'm a bit
hand-wavy on the details).

~~~
klodolph
Yes, exactly. Unfortunately, the partial checkouts can sometimes be limited
by the fact that your WORKSPACE code will import Starlark defined in other
repos. This can get a bit ridiculous if your repo uses a bunch of different
languages; if you browse through e.g. the TypeScript support instructions for
Bazel you'll see some of what you're in for.

If your project is mostly something like C++ (which has support built-in to
Bazel) then the WORKSPACE rules will be much more manageable and partial
checkouts become a lot easier.
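
For instance (the archive name, URL, and checksum are hypothetical), a
mostly-C++ WORKSPACE can stay down to plain archive fetches, loading nothing
but the http rules that ship with Bazel itself:

    # WORKSPACE
    load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

    http_archive(
        name = "somelib",
        urls = ["https://example.com/somelib-1.0.tar.gz"],
        strip_prefix = "somelib-1.0",
        sha256 = "0000000000000000",  # real checksum goes here
    )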

------
kokokokoko
It's almost as if both approaches have positives and negatives, some of which
are more important depending on your project and organization.

I'd be more interested to read about a project or company that failed due to
making one choice or the other, and then, by switching to the other way,
fixed things.

Otherwise, as someone who has worked with both, I imagine there are a host of
other decisions that will be much more determinative of your success.

Let's not get too wrapped up in what color to paint the shed.

~~~
natalyarostova
>It's almost as if

Please don't do this.

------
ceronman
I work at a large organization (2000+ devs). We have used both a monorepo and
polyrepos. After some extensive experience with both models, my conclusion is
that the monorepo is by far the superior model, especially for a large
organization.

Of course the monorepo is not free of downsides; those mentioned in the
original article are real, although a bit exaggerated in my opinion. VCS
operations can be slow, and scaling a VCS system is challenging, but
possible. And the risk of high coupling and a tangled architecture is also
very real if you don't use a dependency management system like
Bazel/Buck/Pants.

But in my opinion the downsides of the polyrepo are much worse and much, much
harder to fix. The main problem is that you need a parallel versioning scheme
like SemVer on top of your VCS. SemVer is fine for open source projects, but
for a dynamic organization it is a nightmare, because it is a manual process
prone to failure. SemVer dependency hell is really hard to deal with and
creates a lot of technical debt.

Additionally, once you go polyrepo you lose _true_ CI/CD. Yes, you still have
CI/CD pipelines, but those apply only to a fraction of the code. Once you get
used to running `bazel test` knowing you will run every single test of any
piece of code that could depend on the code you just changed, you never want
to go back. Yes, you could have true CI/CD with polyrepos, but it requires a
lot of work and writing a lot of tooling that does not exist in the wild. It
is cheaper to invest in scaling your VCS than in multi-repo tooling.
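
To make that concrete (the target path is hypothetical), this is the kind of
question Bazel can answer from its dependency graph:

    # Every target that transitively depends on the code you just changed:
    bazel query "rdeps(//..., //base:something)"

    # Or simply build and test everything that could be affected:
    bazel test //...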

------
woolvalley
My org went from polyrepo, ten-commit SemVer dependency hell when updating an
internal API, to a monorepo, and it saves a lot of time. Unmigrated SemVer
breaking changes are a form of technical debt, and it takes a lot more total
man-hours to do the 'proper' one-by-one, many-commit polyrepo migration than
the other way around.

If we had the tooling to do multi-repo atomic commits and reviews then maybe
we would have stuck with polyrepos, but it doesn't really exist out in the
wild, so monorepo it was.

~~~
mlthoughts2018
My org went from a monorepo where every project had to obey the same CI model
and you could not introduce entirely new CI tools for new prototypes over to a
polyrepo with separated semver library repos for shared dependencies, and it
simplified everything so much.

Adding additional PRs across different repos is functionally no different
than the same PR with scattered dependencies in a monorepo, except that
separating the PRs makes each isolated set of changes more atomic and
focused, which has led to fewer bugs and better-quality code review. And, the
hugest win: each repo is free to use whatever CI & deployment tooling it
needs, with absolutely no constraints based on whatever CI or deployment tool
another chunk of code in some other repo uses.

The last point is not trivial. Lots of people glibly assume you can create
monorepo setups where arbitrary new projects inside the monorepo are free to
use whatever resource provisioning strategy or language or tooling they like,
but in reality this is not true: there is an implicit bias to rely on the
existing tooling (even if it's not right for the job), and monorepos beget
monopolicies, where experimentation that violates some monorepo decision can
be wholly prevented by political blockers in the name of the monorepo.

One example that has frustrated me personally is when working on machine
learning projects that require complex runtime environments with custom
compiled dependencies, GPU settings, etc.

The clear choice for us was to use Docker containers to deliver the built
artifacts to the necessary runtime machines, but the whole project was killed
when someone from our central IT monorepo tooling team said no. His reasoning
was that all the existing model training jobs in our monorepo ran as Luigi
tasks executed in Hadoop.

We tried explaining that our model training was not amenable to a map-reduce
style calculation, and that our plan was for a Luigi task to invoke the
entrypoint command of the container to initiate a single, non-distributed
training process (I have specific expertise in this type of model training,
so I know from experience that this is an effective solution and that
map-reduce would not be appropriate).

But it didn’t matter. The monorepo was set up to assume model training compute
jobs had to work one way and only one way, and so it set us back months from
training a simple model directly relevant to urgent customer product requests.

Had we been able to set this up as a separate repo where there were no global
rules over how all compute jobs must be organized, and used our own choice of
deployment (containers) with no concern over whatever other projects were
using / doing, we could have solved it in a matter of a few days.

In my experience, this type of policy blocker is uniquely common to monorepos,
and easily avoided in polyrepo situations. It’s just a whole class of problem
that rarely applies in a polyrepo setting, but almost always causes huge
issues with monorepo policies and fixed tooling choices that end up being a
poor fit for necessary experiments or innovative projects that happen later.

~~~
jacques_chester
> _except that separating the PRs makes each isolated set of changes more
> atomic and focused_

It makes it _less_ atomic if you need simultaneous changes in multiple
repositories.

> _Had we been able to set this up as a separate repo where there were no
> global rules over how all compute jobs must be organized, and used our own
> choice of deployment (containers) with no concern over whatever other
> projects were using / doing, we could have solved it in a matter of a few
> days._

I think this was an organisational problem, but I accept the argument that
monorepos will provide a seed around which such pathologies can crystallise.
But I don't believe it's the only such seed and I don't think it's an
inevitable outcome from monorepos.

~~~
mlthoughts2018
> It makes it less atomic if you need simultaneous changes in multiple
> repositories.

No, each individual set of changes is more atomic (smaller in scope, mutating
a system from one state of functionality to a new state of functionality).

The problem is that it's a linguistic fallacy to act like, in the monorepo
case, "the system" is the sum of a bunch of separate systems. It isn't,
because those systems are not logically required to transition
simultaneously. So in the monorepo case, to move subcomponent A from some
state of functionality to a new state of functionality, you unfortunately
also have to include totally unrelated (from subcomponent A's point of view)
changes that correctly transition subcomponent B to a new state of
functionality, and subcomponent C, etc. That is exactly _less_ atomic: to
transition states, you are required to make simultaneous other transitions
that are not logically required for any reason other than the superficial
sake of the monorepo.

~~~
jacques_chester
> _simultaneous other transitions that are not logically required for any
> reason other than the superficial sake of the monorepo_

I don't see what's superficial about "everything everywhere is in sync",
myself.

And I have absolutely seen PR race conditions. Assuming that everyone
perfectly sliced up the polyrepo on the first go is optimistic.

~~~
mlthoughts2018
> “I don't see what's superficial about "everything everywhere is in sync",
> myself.”

Well, it is superficial by definition, because two unrelated things are "in
sync" only because you say so. The very meaning of "in sync" in your sentence
is some particular superficial standard you chose that has nothing to do with
the logical requirements of the isolated subcomponents (i.e. "in sync"
meaning that two independent subcomponents were adjusted in the same large
commit or PR is, by definition, superficial... it's just a cosmetic notion of
"in sync" you chose for reasons unrelated to any kind of requirement).

~~~
jacques_chester
I work on a polyrepo. The code in repo A has a dependency on the code in repo
B. When I update B, I sometimes need to update A.

In a monorepo that's already done when I finish working on the modules in B.

That I am unable to release from A until it has been synced with the module in
B is not "a cosmetic notion". It's _being unable to release_. I consider
releasability at all times to be the most important invariant to be sought by
the combination of tests, CI and version control.

~~~
mlthoughts2018
> “In a monorepo that's already done when I finish working on the modules in
> B.”

This is not usually true in monorepos or polyrepos, and it is quite a
dangerous practice that nobody should use; it hasn't got much at all to do
with what type of repo you use.

I worked in a monorepo for a long time where you still had to deploy
versioned artifacts. So when you make changes to B, you still have to bump
version IDs, pass deployment requirements, and upload the new version of B to
internal PyPI or internal Maven or internal Artifactory, etc.

Then consumer app A needs to update its version of B, test that it works and
that, from app A's point of view, it is ready to opt in to B's new changes,
and do a build + deploy of its own to ship A with the upgraded B.
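
A sketch of that workflow for a Python library (the package name, version,
and internal index URL are all hypothetical):

    # In the monorepo, after changing library B: bump its version and publish.
    cd libs/b
    python setup.py sdist bdist_wheel
    twine upload --repository-url https://pypi.internal.example.com dist/*

    # Later, app A opts in explicitly in its own requirements.txt:
    #   b==1.4.0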

Doing this in a way where a successful merge to master (or equivalent) of a
change to B is suddenly a _de facto_ upgrade for all the consumers of B is
_insanely bad_ for so many reasons that I'm not even going to try to list
them all. Monorepo or not, nobody should be doing that; that is bonkers,
crazy-town bad. It's a similar order of magnitude of bad as naively checking
credentials into version control.

~~~
joshuamorton
I think you're conflating wire format changes (which, agreed, should be
versioned and backwards compatible) with code-level API changes. If v2 of
xyz.h adds an argument to some method, a polyrepo just updates the tests and
submits the change. In a monorepo, you can't submit until all clients are
also updated.

~~~
mlthoughts2018
No, even in a monorepo you can submit the code whenever you want and have CI
publish a versioned artifact from just that submodule/package/whatever. Other
client code in the same repo can happily keep going along, never caring about
those new changes, until it's explicitly ready to adopt the new version.

There's no reason why CI in a monorepo can't create versioned code artifacts
like Python packages, Java libraries or special jars, Docker containers,
whatever. This is a very common workflow, e.g. combining a monorepo with
in-house artifactory.

Definitely not talking wire format changes. Talking about publishing versioned
libraries, jars, etc., from subsets of monorepo code.

~~~
joshuamorton
Why would you do versioning in a monorepo?

>There's no reason why CI in a monorepo can't create versioned code artifacts
like Python packages, Java libraries or special jars, Docker containers,
whatever. This is a very common workflow, e.g. combining a monorepo with
in-house artifactory.

Correct, and this is necessary. But there's no reason for A to depend on B
from the artifactory instead of just depending on B at the source level,
building A and B simultaneously and linking them together. Now you have fully
hermetic, reproducible builds and tests.
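
A minimal sketch of that source-level wiring in a BUILD file (the target
names are hypothetical):

    # a/BUILD - A depends on B by source path; both are built at the same
    # commit, so there is no version to pin and the build stays hermetic.
    java_library(
        name = "a",
        srcs = glob(["*.java"]),
        deps = ["//b"],
    )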

Why is not doing versioning so insanely bad that you can't list all the
reasons? (This would be a much more interesting discussion if you did.)

------
pdpi
Can we just move along and get to "Monorepo: Maybe do it, maybe don't. Just
think it through and own your decision"?

Both monorepos and polyrepos have advantages and disadvantages. Many factors —
scale, overall team quality and experience, level of integration between
projects are a few that come to mind — will affect how much those advantages
and disadvantages matter to any given company at any given point in time. The
right choice for you isn't necessarily the right choice for me.

Much more important than which approach you choose is understanding, and
accepting, the consequences of your choice. You'll want to extract value out
of the advantages, you'll need to mitigate the disadvantages. You won't be
able to adopt tools and processes meant for the other approach without some
degree of friction.

~~~
0xFACEFEED
That's what most people do. They just don't blog about it.

------
mmmeff
Thank you so much for writing this. As someone who's worked in the best and
worst of these two worlds, the productivity gains are absolutely insane, and
the limitations, as stated by the author, are no more painful than the
limitations of federated/polyrepo code.

Fighting back against monorepo design is dangerous - embrace experimentation.

~~~
shados
> Fighting back against monorepo design is dangerous

What's dangerous about it? Monorepos have a lot of benefits, and should
absolutely be considered. Maybe even by most. But right now in the community
it's almost pushed as the "only true way with all benefits and no drawbacks",
and that's absolutely not true. To the point that the knowledge of why and
how to polyrepo is already starting to get lost.

That's dangerous.

~~~
Benjammer
The real danger here is anyone talking about any system architectures or
tooling as "dangerous" (or "not dangerous") absent any other context...

What do you even mean by "dangerous"? To a business? To your health?

What is the deal with people trying to make these sorts of global assertions
in a vacuum about what's "good" and "bad"? This doesn't make any engineering
sense in any way to me. You have a problem and you figure out the best way for
your business to solve that problem given some bounded resources. Nothing in
the basic problem solving process (scientific method?) necessitates all the
arbitrary "should" axioms. Why don't people just analyze their specific
situation and figure out a solution?

It's like people arguing vehemently about the optimal design that every
company "should" be using for all windshields for all personal vehicles on the
road, without even remotely discussing various vehicle body shapes and sizes.

~~~
sebastos
Well just to present the other side, I don't really understand the prevalence
of the "there's no one answer that fits for everybody" comment trope. You see
a couple of comments like yours in every discussion like this. So no offense,
but I'm going to rant about it for a few paragraphs.

If the "no one-size-fits-all" claim happens to be genuinely and axiomatically
true for a particular engineering trade-off, then fine. There's no one correct
displacement of an internal combustion engine. There's no one correct
resolution of an LCD screen. Fine. It's demonstrably true that a trade space
exists.

But a lot of times people seem to just throw up their hands and call it a
trade space when really they just haven't reached a conclusion yet. "There's
nothing inherently better or worse between Ubuntu and Windows, they're
basically just ice cream flavors!" No! Maybe we haven't fully realized a more
perfect operating system yet to settle the debate, but that doesn't just make
it a meaningless question. It's perfectly possible for a system to be
architected poorly given both the real world it has to interact in and the
future world it makes possible. To say that this question is an unanswerable
matter of taste is to be completely unimaginative about how good an operating
system _COULD_ be. (See the death of operating system research and all that).

CVS is _worse_ than git. It just is. I don't want to hear this "well maybe if
it fits your use case" mumbo jumbo. If you think that you have a unique
snowflake reason that CVS is more appropriate than git, then you are almost
certainly lying to yourself or misinformed.

And it's strict hierarchies like that that inspire these articles. There are
a lot of technologies out there, and a lot of ideas, and most people don't
know most of the things you need to know to come up with a good answer to
what suits "their specific situation". So people like myself are looking for
lessons learned and certain invariants that help them narrow the solution
space. I have no idea whether a monorepo would work well for my organization,
and if the only thing your article has to contribute is "monorepos sometimes
work for some people, but YMMV! Good luck!" then I have learned nothing. But
if somebody thinks they've learned a fundamental truth about the universe,
then that could be useful to me. What's more, most people like me have a
situation that _isn't_ that specific. We have to write some code, there's
some ML shit in there, and some real-time critical stuff in there. Nothing
mindblowing. _Most_ software shops shouldn't need something particularly
bespoke. So coming in with the prior that everybody will have to do something
unique to their organization is bizarre. There is so much commonality between
what software companies do, in fact, that if a commonly used technology can
be used by shop A but legitimately can't be used by shop B, there's a decent
chance that this is a problem or limitation with the tech.

So who knows, maybe saying monorepos are _always_ better or _always_ worse
really is too ambitious. But I don't think the concept that they _could_ be is
a priori ridiculous. End this software relativism! Things can be made better!
Yes, strictly better!

~~~
Benjammer
I'm of a mind that true understanding only really comes from questioning the
most firmly held "universal truths." I want to know why I know what I know, I
have no use for vague quality axioms that put black box abstractions on top of
complex systems and processes.

> most people don't know most of the things you need to know to come up with a
> good answer to what suits "their specific situation".

And most people aren't competent software architects capable of adeptly
steering an engineering team toward the right choices. I'm not sure I
understand the point here, or why you want a technical field like software
engineering dumbed down to the point where "most people" can intuit the right
decisions simply by asking HN what "the best thing" is.

------
sigil
Observe how the verb "force" gets used 6 times. Monorepos "force the
conversation." You the individual contributor are "forced to deal with the
situation" and "forced to see the upfront cost" of breaking contracts. Your
team is forced to "look up from their component, and see the perspectives of
other teams and consumers."

All this forcing people to do things the Right Way (_my way_) is surely part
of the pushback against monorepos.

But set that aside for the moment. Let's suppose defaults _should_ force
people to do things the Right Way, and that we also know what the Right Way
is.

Instead of letting anyone sloppily depend on any code checked into the
monorepo, shouldn't we _force_ people to think long and hard about contracts
between components -- the default concern in a polyrepo architecture? When and
how to make contracts, when and how to break contracts? Isn't this how Amazon
moved past their monorepo woes, adopted SOA, built AWS, and became one of the
largest companies on earth? Heck, isn't this how the Internet itself was
built?

~~~
holoway
Author here. The Right Way :tm: is situational - there isn't one right answer
to things like when and how to make contracts, or how to break them. When I
used the term "force", you'll see that I'm usually talking about dialog
between people and teams.

It's not that it's a single right way to do it. There isn't, and anyone who
tells you there is has something to sell you, or is inexperienced enough to
not have seen enough of the problem domain.

What is for certain: teams need to have tooling that causes the conversations
and behavior that lead to the outcomes we want. As systems and teams scale
large enough, this tooling becomes essential - without it, teams go their own
way, and in so doing, may or may not create the culture needed for the
outcomes you want.

I have never once in my career, so far, had to tell a team to communicate
_less_. When we're talking about engineering organizations that are large
enough to diverge, you must solve these problems somehow, and it needs to be
systemic and intentional.

~~~
sigil
Thanks for the response. Out of curiosity, how does your engineering
organization introduce new dependencies within the monorepo? Can B, C and D
all depend on A without A's consent or even awareness? (Suppose A is some
checked-in code that's useful and going to see updates in the future, but is
dormant at present.)

Your post puts a lot of the onus on A for breaking B, C, and D, but I think
equal care and consideration needs to come from the other side of the
contract. E.g.: What are you depending on? Is it a dependency you want to
take on, or are you and the shared code likely to diverge in life? These are
top-of-mind decisions in a polyrepo architecture, but in my experience
they're often not even considered in a monorepo. Anything checked in is fair
game for reuse. This is why I suspect you may be "forcing" the wrong thing.

For reference, I've worked in companies large and small, both monorepo and
polyrepo. When I worked on Windows back in the '00s, the monorepo tooling
(Source Depot) was quite amazing for the time, but the costs of that sort of
coordination were also painfully apparent to everyone.

The place I currently work has a monorepo for desktop software and polyrepos
for everything else. It isn't a straight-up A/B experiment, but anecdotally
the pain is higher and shipping velocity lower in the monorepo half of the
world. Most of the monorepo pain is related to CI or other costs of global
coordination, the kind of things Matt touches on midway (albeit probably too
subtly). I'd be interested to see your counterarguments to those points as
well. Do you need fancy dependency management tooling to make your global CI
builds fast and reproducible? Matt argues those end up being equivalent to
the kind of dependency tooling that's intrinsic to polyrepo architectures
anyway.

~~~
holoway
Disclaimer: it depends. :) Since that's not a good answer at all, I'm going to
write the rest of this as if I have the answer, even though I know I do not,
because it's deeply situational.

Equal care does need to come from the other side of the contract. Most
frequently, I see teams B, C, and D in a polyrepo world do the worst of all
worlds: take dependencies liberally, pin them in place, and try to forget
about them. Of course, high-functioning engineering teams (and cultures) will
try to avoid this: they will be thoughtful about dependencies, and they will
keep them up to date. In practice, they most frequently do not. This is
especially true in the enterprise broadly. When we get it wrong and take a
dependency we wish we hadn't, how do we know? When do we know? What is our
recourse? If I depend on code in the monorepo that diverges, I'm more likely
to know near the point of divergence (because of the nature of the system).
That means the conversation about how to fix it happens sooner. I'm not
interested in avoiding error - that's going to happen. I'm interested in how
close to the introduction of the error we understand it, and how we
communicate about its remediation.

As far as CI and global coordination go, the cost is high in either direction
if the system is distributed, and the solutions are similar in my experience.
I think the worst case is the mixed one (which is a world I inhabit): you
wind up splitting your investment, in both style and effort, across both
approaches. With the monorepo style, one big advantage is where the complex
CI interactions can be encoded, since you have access to more of the code
itself. Granted, at scale, you are likely testing against artifacts rather
than point-in-time commits outside of the component in question (this is very
similar to what you're going to do in a polyrepo, too).

I think solid testing design requires real effort and understanding of the
system under test, regardless of repository layout. Which brings us back to
communication again. The more you can see, and the more clearly the pain is
experienced across teams, the more likely you are to have the critical
conversations needed to improve the system - rather than making local fixes
("my team's tests are fast", "their component sucks").

~~~
sigil
_Most frequently, I see teams B, C, and D in a polyrepo world do the worst of
all worlds: take dependencies liberally, pin them in place, and try to forget
about them._

This has been my observation as well, minus the value judgment. Why is pinning
dependencies and moving on with life the worst thing in the world? As you
point out in your article, a security fix in A does suddenly force B, C, and
D’s hand. Another scenario I’ll add to that: if A provides communication
between B, C and D, a synchronized update to all dependents might be required.

Thing is, I’d argue these scenarios are the exception to the rule. If you’re
drawing boundaries in the right places (again this may come back to contract
design) you’re largely free to change implementation details when you need to,
on your own terms, and not because some distant transitive dependency has
decided it’s time for your build to break.

With monorepos I see lots of the latter. Lots of breakage for no other reason
than “everyone needs to be on the same page.” Lots of conversations — O(N^2)
conversations, times some constant factor — that might not need to take place,
ever, but it’s critical the entire company have them _right now_ because the
global build is broken.

Here’s another way of looking at it. Until a few years ago, it was standard
practice to frequently update npm dependencies against fuzzy semvers. Now most
people pin their dependencies, and their dependencies’ dependencies, with a
lockfile. And in other ecosystems like go’s you also have tooling to support
much more controlled, infrequent and minimal dependency upgrades (see MVS).

Why the change? Because people got tired of things breaking all the time. They
wanted off the treadmill so they could Get Things Done again. I don’t see how
monorepos provide this stability, and frankly it seems like the monorepo idea
is where npm was about 5 years ago. Perhaps even farther behind than that,
since C, C++ and others haven’t even evolved viable language package managers
yet.

You’re a rust fan, so maybe cargo + a monorepo is a sweet spot I haven’t
encountered yet? Anyway, I do really appreciate you taking the time to share
your perspective on these things. It’s been great having a reasonable
discussion about them.

~~~
Too
If you've got a pre-merge build check, you can't break the global build in a
monorepo. That's the benefit: _the one introducing the breakage_ gets the
failure in CI. There is no need for other teams to catch up.

By doing this you only ever "step" a dependency one at a time, one minor
version at a time, so you only get very few and very small breakages each
time. Instead of locking your depfile and then, 6 months down the road,
realizing you need a security fix in component foo, but now you have 1000
other backwards-incompatible changes to fix because transitive dependencies
also need to be upgraded in order to satisfy foo 1.2.

------
est31
There aren't good monorepo solutions out there (yet). Git LFS is great for a
few large files, but it doesn't help with tons of smaller files. Git
submodules are crap when it comes to usability, and have been for a long
time; it's even mentioned in the famous Torvalds Git talk.

Git has had a sparse checkout feature for a long time, but it only affected
the checkout itself; all the blobs would still be synced.

Now, Git is gaining good monorepo capabilities with the partial clone feature
[1]. The idea is that you can clone only the parts of a repository that are
interesting to you. This has been brewing for a while already, but I'm not
sure how ready it is. There doesn't seem to be user-level documentation for
it yet, to my knowledge, so I am linking to the technical docs.

[1]: https://github.com/git/git/blob/master/Documentation/technical/partial-clone.txt
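
For the curious, a rough sketch of the intended workflow (the flags are
recent and may still change, and the repo URL is hypothetical):

    # Clone the full history but none of the file contents (blobs).
    git clone --filter=blob:none --no-checkout https://example.com/big-monorepo.git
    cd big-monorepo
    git checkout master  # fetches only the blobs needed for this checkout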

~~~
dangoor
From earlier discussions around monorepos, I saw references that Google,
Facebook, and other large monorepo orgs have been making use of Mercurial.

~~~
est31
Yes, Facebook is Mercurial-based to my knowledge. Google is using its custom
solution called Piper, I think:
https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext

------
malkia
Monorepo is a total win if you have something like
[https://github.com/Microsoft/VFSForGit](https://github.com/Microsoft/VFSForGit)
(ex-GVFS) - i.e. any monorepo where your changes are overlaid and the rest
are simply file names with no actual contents is a win.

You can certainly achieve this with Perforce, SVN, Hg, or any other repo
system too.

Linux: FUSE + ?

Windows: Dokan? CBFS? Or the newfangled
https://docs.microsoft.com/en-us/windows/desktop/projfs/projected-file-system
which VFSForGit uses

------
e3b0c
Monorepo could be a decent choice if your software stack does not require too
many external dependencies. Or, more precisely, if the ratio of your own code
to third-party code is reasonably high.

Let me give a concrete example. The Android Open Source Project (AOSP), which
builds the system for Android devices, has a code size in the tens of GB (let
alone all the history!). It is already a massive monorepo in itself. And
typically you would have many of them, from different OEM/SoC vendors and
different major releases. In such a scenario, it would turn into 'a monorepo
of monorepos,' which is quite unpleasant to imagine.

------
totallysnowman
I think the reason for the argument is that the two authors understand the
definition of "large repository" very differently.

With 100 engineers a monorepo might seem a good idea. With 500 it becomes
nearly impossible to do anything involving a build. Some isolation is needed.

Also, from my experience, many engineers just don't give a shit about
architecture. They create an entangled mess that kind of works for the
customer, and go home. Without some enforced isolation it is impossible to
maintain.

That being said, I am more inclined towards polyrepos.

~~~
thurn
The fact that essentially 100% of big tech companies use monorepos seems like
evidence that it is at least possible to do it in a scalable way...

~~~
shados
Definitely not 100%. It also has a lot less to do with company size, and more
with when the company was created. Before git and similar tools came to be,
managing a single repo was a pain, never mind hundreds or thousands of them.
So (almost) everyone did it the way these big companies did.

Today, not quite. I work for a multi-billion-dollar tech company and we have
several thousand repos (and it's awesome)

~~~
user5994461
Not true. Google, Facebook, Goldman Sachs and JP Morgan are all companies
that run monorepos and predate git by far.

Git cannot check out subdirectories, and it slows down exponentially with the
number of branches. It's the opposite of what is needed to run a monorepo in
a large company.

~~~
shados
My wording must have been awful... because that's exactly what I was trying
to say.

The big companies that predate git and such used monorepos because that was
the norm at the time, and it was easier to do with the tools of the day; as
they scaled, they just scaled their process instead of changing everything.
But several large tech companies, especially newer ones, take the multi-repo
approach.

------
skybrian
I wonder if a star pattern would work, where you have a single, shared repo
for all your libraries and a repo for each app.

This would help people working on smaller apps, since they don't need to look
at other apps unless they're working on shared library code.

Of course, once you are working on library code, you have to build and test
all the apps that use it. But even at Google, the people working on the lowest
levels of the system can't use the standard tools anyway.

~~~
ceronman
A star pattern still has most of the downsides of the multi-repo approach.
Specifically, it has the problem of needing a parallel versioning scheme
(e.g. SemVer) on top of your individual repositories. This creates
fragmentation, where different applications depend on different versions of
the libraries, which ends up in dependency hell, technical debt, and CI hell.

~~~
skybrian
An alternative would be to have a policy where all the app repos must use the
same version (nobody can upgrade until they all upgrade). This makes things
harder for the library maintainers, but no more so than a monorepo would.

I don't see why you'd need SemVer. The apps could sync to a particular commit
in the library repo.
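
One low-tech sketch of that commit-level pinning (the repo URL is
hypothetical) is a plain submodule pointer in each app repo:

    # Pin the shared library repo at an exact commit inside an app repo.
    git submodule add https://example.com/libs.git libs
    git -C libs checkout <commit>   # <commit> set by the shared policy
    git add libs
    git commit -m "Pin libs"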

~~~
Too
What you propose is just a fake monorepo, containing the global policy of
allowed version of X, disguised as multiple subrepos. OP discusses this.

------
mindcrime
"Shared responsbility" is one of those ideas that sounds good on paper, but
doesn't really scale terribly well in the real world. As the old saying goes
"when everybody is responsible, nobody is responsible".

More to the point, as the author of TFA allows, once a system reaches a
certain size, nobody can understand it all. At some point you have to engage
division of labor/specialization, and once you do that, it doesn't make sense
to have just anybody randomly making changes in parts of the codebase they
don't normally work in.

I'd rather see a poly-repo approach, with a designated owner for discrete
modules, but where anybody can clone any repo, make a proposed fix, and submit
a PR. Basically "internal open source" or "inner source"[1].

In my experience, this is about as close as you can get to a "best of both
worlds" situation. But, as the author of TFA also says, you absolutely _can_
make either approach work.

[1]:[https://en.wikipedia.org/wiki/Inner_source](https://en.wikipedia.org/wiki/Inner_source)

~~~
jacques_chester
> _As the old saying goes "when everybody is responsible, nobody is
> responsible"._

Here's my rule: You break it, you fix it.

> _I 'd rather see a poly-repo approach, with a designated owner for discrete
> modules, but where anybody can clone any repo, make a proposed fix, and
> submit a PR._

I'd rather see pairing, extensive tests and fast CI. I see PRs as a necessary
evil, rather than a good thing in themselves. If I make a change that breaks
other teams, I should fix it. If I can make a change to fix code anywhere in
the codebase, I should write the test, write the fix and submit it.

Small, frequent commits with extensive testing creates a virtuous cycle. You
pull frequently because there are small commits. You are less likely to get
out of sync because of frequent pulls. You make small commits frequently
because you want to avoid conflicts. Everyone moves a lot faster. I have had
this exact experience and it is frankly _glorious_.

~~~
sixstringtheory
> You break it, you fix it.

I’ve seen this invoked so many times to shirk responsibility though. Someone
piles up all kinds of crap in a tight little closet, complete with a bowling
ball on top, and the next unsuspecting dev who comes by and opens it gets an
avalanche of crap falling on them while the original author can be heard
somewhere in the background saying “it’s not my problem.”

This leads to more crap-stacking just to get the work done ASAP, and you wind
up with a mountain of tech debt.

I like the zero flaw principle where new feature work stops until all
currently known flaws are fixed. Then everyone is forced to pitch in and
responsibility is shared whether you want it or not.

~~~
jacques_chester
> _I’ve seen this invoked so many times to shirk responsibility though.
> Someone piles up all kinds of crap in a tight little closet, complete with a
> bowling ball on top, and the next unsuspecting dev who comes by and opens it
> gets an avalanche of crap falling on them while the original author can be
> heard somewhere in the background saying “it’s not my problem.”_

I'm accustomed to collective ownership where, ideally, this never happens and
in practice happens rarely (followed by the little closet being torn out and
replaced).

> _I like the zero flaw principle where new feature work stops until all
> currently known flaws are fixed._

I agree: stop the line. But I think it's orthogonal to the sins or virtues of
_n_-repology.

------
Rapzid
Any good monorepo build tools out there? I've been thinking about this for
the past few weeks. I'm considering creating a general-purpose monorepo
toolchain and potentially a monorepo-first CI system.

Unfortunately, some of the most popular CI/CD services out there (Travis,
Circle, etc.) don't even support cross-repo pipelines, much less monorepo
builds.

~~~
fxfan
Pants and Bazel sound like favorites.

~~~
Rapzid
Interesting, thanks! Didn't realize Bazel was open-sourced...

Those both look way more in the weeds than what I would have imagined... I
guess for Bazel at least it makes sense, given Google's scale, how
fine-grained they would get with caching and incremental builds...

For my needs, a simple tool that could discover "WORKSPACES" and construct a
build graph based on what's changed, while handing off the actual building to
some entry point in each workspace, would be good enough. We have a weird
collection of Gradle projects, node projects, test suites, docs, etc., each
with its own build process already in place.
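
A rough sketch of that kind of dispatcher (the repo layout and per-project
build.sh entry points are hypothetical):

    # Top-level projects touched since master, each delegating to its own build.
    for project in $(git diff --name-only origin/master... | cut -d/ -f1 | sort -u); do
      if [ -x "$project/build.sh" ]; then
        (cd "$project" && ./build.sh)
      fi
    done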

Some things are also on a "critical" path, while others can run async given
the context (branch, tag, etc.)...

I'm rambling though.

------
cryptonector
I agree, use a monorepo. I anxiously await MSFT's git mega-monorepo
functionality. Until then there are things like git-meta [0].

[0] [http://twosigma.github.io/git-meta/](http://twosigma.github.io/git-meta/)

------
luord
Yet another chapter in one of the big flamewars. Seeing as I fall in the
monorepo camp, I must say I mostly agree; also, I much prefer this tone for an
article.

I find it amusing how plenty of comments, both here and in the other
discussion, are from people saying "We had a mono/polyrepo and things
improved tremendously when we migrated to a poly/monorepo". The issue might
be one of growth and complacency: a drastic change like that forces the team
to face the technical debt that was being ignored and to do a better
implementation using what was learned from past mistakes.

------
coldtea
> _But I think Matt’s argument misses the #1 reason I’ve flipped quite hard to
> a monorepo perspective as my own level in the organization has gotten
> higher_

Perhaps the fact that, since their level is now higher, they don't have to
deal with the nitty-gritty details and pain of working with a monorepo as a
developer?

E.g. "I wasn't for it when I was a dev, but now that I can just impose it on
others, I love it." Same with how various 'development process' rituals get
adopted...

------
Tempest1981
For those using monorepos, what is your branch strategy? Say that 3 projects
share a library, and release on different schedules. How does each project
freeze shared library changes? Do you keep N version branches?

How does the library team know which consumers a commit may break? What tools
are recommended?

------
AzzieElbab
As engineers we spend vast amounts of time in constant search of a rival to
the "tabs vs spaces" debate.

------
randyrand
The more complicated answer is sometimes you should use a mono repo and other
times you shouldn't.

------
rdsubhas
This is starting to turn into a debate about "principles", like forcing A and
B to talk, or forcing A and B to have more explicit boundaries, and so on.
Guess where that ends (hint: it doesn't).

With a monorepo, the basic effort you have to put in to start scaling is
quite high. To properly do a local build, you need Bazel or something like
it. But Bazel doesn't stop at just building; it manages dependencies all the
way down to individual libraries. Say you're using certain Maven plugins for
code coverage, shading, etc. Would Bazel have all the build plugins your
project needs? Most likely not. You have to backport a bunch of plugins from
Maven to Bazel, and so on. Guess how many IDEs support Bazel? Not a lot.

Then you need to run a different kind of build farm. When you check in to a
monorepo, you need to split and distribute one single build. Compared to a
polyrepo, where one build == one job, a monorepo is more like one build == a
distributed pool of jobs, which again needs very deep integration with the
build tool (Bazel again) to fan out and fan in across multiple machines,
aggregate artifacts, and so on.

Then the deployment. Same again. There is no "just works" hosted CI or hosted
git or anything else for monorepos. People still dabble with Concourse and
the like.

And guess what: for a component in its own repo, you don't need to do
anything. Existing industry and OSS tooling is built from the ground up for
that. Just go and use it.

To provide a developer the "basic experience" of working on, building, and
deploying a single component, the upfront investment you need to make for a
monorepo is very high. Most companies cannot spend time on that, because
scale means different things to different companies. There is a vast gap
between the amount of ops/dev tooling available for independently hosted
components and what exists for monorepos. Just search for "monorepo tools" or
DAG and see how many you can come up with. So what _really_ happens with a
monorepo is that most companies go with multi-module Maven and Jenkins
multi-jobs. The results are easy to predict. I'm not saying that
Maven/Jenkins are bad, but they are _not_ sophisticated, and they are nowhere
close to what Twitter/Facebook/Google or any modern company uses to deal with
a monorepo (for good reason). They are just not good at DAGs. If you're
relying on Maven+Jenkins as your monorepo solution, all I can say is "good
luck".

Instead, if you start by putting one component in one repo, you keep scaling
for _much longer_ before you hit a barrier.

In principle, monorepos are better. In practice, they don't have the basic
"table stakes" tooling that you need to get going. Maybe monorepo devops
tooling is the next developer-productivity startup space. But until then,
it's not mainstream for very good reasons.

------
marcosdumay
So... an article based on equating the change-recording medium with
integration-testing procedures.

------
fxfan
There's a lot of discussion of Bazel and co. in the sub-comments, but I have
a question that isn't addressed:

How do the "global build tools" play with language-specific build tools?

My primary stack is Rust and Scala. Both have excellent build capabilities in
their native tools. How well do Pants/Bazel integrate with them? I wouldn't
want to rewrite complex builds, nor would I expect these tools to have 100%
of the functionality of the native ones.

~~~
laurentlb
Bazel has some level of support for many languages:
https://docs.bazel.build/versions/master/be/overview.html#additional-rules

I know the Scala rules are used in production by multiple companies. Rust
support is improving quickly, but it's not perfect. See the dedicated GitHub
repositories for more information.

(I work on Bazel)

------
benmarten
Please don't. It's just too slow and not efficient. Instead, use the common
open-source best practice of a shared-library architecture. Problem solved!
Putting everything into one repo is just a lack of organization and creates a
huge mess.

~~~
zamadatix
Too slow as in "to do it", or too slow as in "to use it"? In either case, I
think if that were true there wouldn't be monorepos at Google, Facebook, and
Microsoft. I will say that didn't come for free; e.g. Microsoft had to make
GVFS due to the sheer enormity of their codebase, but that's already done and
works pretty well.

I agree the shared-library style makes more sense in most cases, though. The
main problem with it is forcing everyone onto the latest library versions,
but that isn't insurmountable by any means.

~~~
mlthoughts2018
My old boss was an engineering manager at Google in the 90s and early 2000s.
He used to tell us that _everyone_ he interacted with at Google _hated_ the
monorepo, and that Google's in-house tooling did not produce anything
approaching a sane developer experience. He used to laugh cynically at
stories like that big ACM article touting Google's use of a monorepo (which
was a historical, unplanned accident rooted in a toppling, poorly planned
Perforce repository way back when), because in his mind, his experience with
the monorepo at Google was exactly why his engineering department (several
hundred engineers) at my old company did not use one.

~~~
user5994461
His experience from the 90s and early 2000s is meaningless in the current
era. Version control and Google were in their infancy.

SVN was first released in 2000, Git in 2005. Branching, tagging and diffing
were nowhere near what is possible now.

That era goes back to desktops with disks smaller than a GB, CPUs in the tens
of MHz, and networks so slow and unreliable, if you had one at all.

~~~
mlthoughts2018
My understanding from many Google employees is that the properties of the
system that caused problems in ~2000-2010 are largely still the same today:
the canary-node model of deployment, a fixed small set of supported
languages, code bloat, the inability to delete code, a bias towards feature
toggles even when separate library dependency management would be better for
the problem at hand, various firefighting when in-house monorepo tooling
breaks, difficult onboarding for people unfamiliar with that workflow, and
difficult recruiting for candidates who refuse to join if they have to work
under the limits of a monorepo like that.

