
Monorepos and the Fallacy of Scale - loevborg
https://presumably.de/monorepos-and-the-fallacy-of-scale.html
======
ken
Whenever I hear smart and reasonable people argue well for both sides of an
engineering issue, my experience is that it will turn out that we're arguing
the wrong question. The perspective is wrong. We can't get past thinking in
terms of our old terminology.

What we all _really_ want is a VCS where repos can be combined and separated
easily, or where one repo can gain the benefits of a monorepo without the
drawbacks of one.

Another crazy tech prediction from me: just as DVCS killed off pre-DVCS
systems practically overnight, the thing that will quickly kill off DVCS is a
new type of VCS where you can trivially combine/separate repos and sections
of repos as needed. You could assign, at the repo level, sub-repos to include
in this one and get an atomic commit hash for the state of the whole thing,
and my VCS client wouldn't need to actually download every linked repo, but
tools would be available to act as if I had.

(This will also enable it to replace the 10 different syntaxes I've had to
learn for one project to reference another, via some Dependencies list and its
generated Dependencies.lock list.)

In a sense, we already have all of these features, in folders. You can combine
and separate them, you can make a local folder mimic a folder on a remote
system, and access its content without needing to download it all ahead of
time. They just don't have any VCS features baked in. We've got {filesystems,
network filesystems, and VCS}, and each of the three has some features the
others would like!

I don't have much money right now but I'd pay $1000 for a good solution to
this. I'd use it for my home directory, my backups, my media server, etc.

~~~
WorldMaker
I don't think it would be quite the revolution from, or killer of, DVCS that
you think. A lot of what you ask for can be done with DVCS tools if you go
low level enough. The git graph supports more complex shapes in the raw than
the git UI tools ('porcelains') tend to make easy to work with, especially
when you throw in new tools like GitVFS for virtualizing some or all of the
git object database. To some extent what you ask for is simply a UX problem
in the DVCS world: how to make such power available in an easy-enough-to-use
way, without providing too many additional new footguns.

> Whenever I hear smart and reasonable people argue well for both sides of an
> engineering issue, my experience is that it will turn out that we're arguing
> the wrong question.

Right, though my feeling is that the root underlying discussion is actually
sociopolitical rather than technical. The preference between monorepos and
polyrepos seems to have more to do with how organizations and _people_ are
structured than with the actual technical merits of either approach. I think
the issue is that developers just feel safer trying to tie things to
technical merits than engage with the problem at a sociopolitical level.

~~~
candiodari
> tie things to technical merits than engage with the problem at a
> sociopolitical level.

This hits the nail on the head. The real advantage of monorepos is that they
are _reproducible_. Many-small-repos aren't. A monorepo, ideally one built
the way Bazel [1] pushes you toward, will produce the same bitstream every
time you run "make".

Projects terminated, going in a different direction, servers moved, down,
forked, ... it all just doesn't matter. _Your_ stuff works. Network or no
network.
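
To make that concrete, a target in such a repo tends to look something like
this minimal sketch (target and file names are made up); every dependency is
an explicit, in-tree label, so the build doesn't silently depend on whatever
happens to be installed on the machine:

    # BUILD file for one service in the monorepo (illustrative only)
    py_binary(
        name = "billing_service",
        srcs = ["billing_service.py"],
        deps = [
            "//libs/dbdriver",   # upgraded by editing this tree, not by apt-get
            "//libs/logging",
        ],
    )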

The problem is that many small repos allow programmers to be very
undisciplined in a way that sabotages projects. In a monorepo _you_ are
updating libraries, in every individual project. That's a big plus when it
comes to "my software works", but of course it is work. Your software doesn't
suddenly stop working, but that's _because_ it doesn't change, which means it
won't change unless you make it change. Which means: new database driver?
You're upgrading it. No apt-get, no sysadmin deploying a shared library or
DLL install to do it for you.

There is always a _strong_ drive by higher-up programmers and managers to fix
"undisciplined" programming, and... I've never seen this work. Or at least,
I've never seen it do more than move the problem.

The thing I don't understand about some languages, like Golang, is the
(ridiculous) insistence on hermetic binaries (not using anything on the
system, not even libc), and yet their very strong insistence on non-hermetic
source (even to the point of sabotaging the tools for people who wish to do
it, and sabotaging to some extent the community to achieve this). Either do
one, or the other. I feel like C/C++'s support of both approaches is a
superior option. You want shared (nobody does, but hey, that's my opinion)?
You can do that. You want everything hermetic? Not a problem.

In my opinion sharing things between projects is a mistake.

[1] [https://bazel.build/](https://bazel.build/)

~~~
WorldMaker
There are just as many tools for reproducible builds of polyrepos, between
various package management strategies and CI tools.

> Projects terminated, going in a different direction, servers moved, down,
> forked, ... it all just doesn't matter. Your stuff works. Network or no
> network.

I think it becomes a forest/trees thing. People that prefer a monorepo are
most focused on the forest, and people that prefer polyrepos are most focused
on the individual trees. The end goal is usually the same (a healthy forest),
and done right both approaches are generally isomorphic (the "canopy" of the
forest looks generally the same to the users either way).

To push the metaphor perhaps to the breaking point, a lot of the arguments
between monorepos and polyrepos break down into management of the "root
system" in the forest. Monorepos often don't care as much how deeply the roots
are entangled between trees so long as the forest is healthy, and polyrepos
tend to encourage a more "bonsai" approach of tending to each tree on its own.
Either way, it rarely seems to affect the "canopy" (end user experience) of
the forest, but can matter a great deal to things like technical debt and
project management structure and project deadlines.

------
jupp0r
While certainly interesting, the article leaves out the juicy bits monorepos
and their workflows offer:

\- library developers can easily see the impact their changes will have on
their consumers

\- library changes that introduce regressions in their consumers can be caught
pre-merge given good test coverage

\- dependency version updates between packages cause less mayhem because they
are performed atomically and only merged when green

At the same time, many drawbacks are also left out:

\- the incentive to have long-lived branches for stability reasons can negate
most of the benefits mentioned above

\- build times for compiled languages can become problematic even for
moderately sized organizations (I’m looking at you, C++)

\- in my experience, you pretty much need a dedicated dev team working on
workflow tooling because ready solutions are fragmented and hard to integrate
(code review, merge bots, CI/CD, ...)

~~~
mlthoughts2018
You list three items here:

“- library developers can easily see the impact their changes will have on
their consumers

\- library changes that introduce regressions in their consumers can be caught
pre-merge given good test coverage

\- dependency version updates between packages cause less mayhem because they
are performed atomically and only merged when green”

But I think these are actually signs of big failure modes of monorepos, and
each one has an analog for solving it with polyrepos and/or versioned
artifacts that’s actually much safer and more practical.

Firstly, wishing to see how library changes will cause issues in downstream
consumers is often a sign of very deep problems, because the library designer
should be free to make changes or enhancements as they deem necessary to solve
the problems they need to solve, which may require validating new changes
don’t introduce unexpected problems for consumers, but, critically, also _may
not_ involve that, and in fact the usual case should be that it doesn’t
involve that.

Instead, it’s up to consumers to consume versioned artifacts of the library
(more on this in a moment), so that consumers are totally in control of
“opting in” to new changes. If the consumer doesn’t want to opt in, they
should not be forced to (e.g. in monorepos where a successful merge of library
X is a silent, de facto upgrade for all consumers required to consume library
X from the same commit, etc.)

Instead, the consumer should be the one creating an experimental branch with
the intended version upgrade, and running it through the consumer’s validation
tests to see if opting to upgrade the version will work.

If it fails, they can open a bug report, and the library maintainer will
decide if that regression is now necessary because of a constraint in the new
version’s design, or if it can or should be patched.

To be clear, all of this should be happening whether you’re inside a monorepo
or outside of one; it truly doesn’t matter. The point is to get rid of the
horrible practice of forcing de facto software upgrades on one section of the
code merely via a successful merge in another section. That idea is terrible
in concept, a very misguided thing that shouldn’t be desired.

Instead, when you merge library changes in one part of the monorepo, CI or
other tooling should produce a fixed artifact from it (a Python wheel, jar
file, Docker container, shared-object binaries, whatever) and automatically
upload it to an artifact store like Artifactory, with versioned identifiers.
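
As a rough sketch of that CI step (paths, library name, and the index URL are
hypothetical; it assumes the `build` and `twine` packages, and auth is
omitted):

    import glob
    import subprocess

    # Build a versioned wheel for one library in the monorepo...
    subprocess.run(["python", "-m", "build", "libs/payments"], check=True)
    # ...then push it to the internal index; consumers upgrade later by
    # bumping their own pin, on their own schedule.
    subprocess.run(
        ["twine", "upload", "--repository-url",
         "https://artifactory.example.com/api/pypi/internal-pypi"]
        + glob.glob("libs/payments/dist/*"),
        check=True,
    )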

The code in some other section of the monorepo then can just happily keep
plugging along, not blindsided by new changes, hit by errors uncaught in
testing, etc., and the maintainer of that code can plan for their version
upgrade according to their own timeline and testing.

In particular, this is what actually reduces the “mayhem”, as you put it, in
dependency upgrades. Doing upgrades implicitly whenever there is a green merge
is just a form of putting your head in the sand: it acts as if the library
author can trust green merge status as a reliable indicator that downstream
consumers can auto-accept new changes, when really only the devs on those
downstream consumer teams actually know whether that’s true or desirable.

~~~
treis
> Firstly, wishing to see how library changes will cause issues in downstream
> consumers is often a sign of very deep problems, because the library designer
> should be free to make changes or enhancements as they deem necessary to solve
> the problems they need to solve, which may require validating new changes
> don’t introduce unexpected problems for consumers, but, critically, also may
> not involve that, and in fact the usual case should be that it doesn’t involve
> that.

One of my pet peeves is when another developer tells me something "should"
happen. I know library changes should not cause unexpected problems. The
question is _does_ that library change cause unexpected problems. Which is one
of the benefits of a monorepo. You can make a library change and with at least
some level of backing say that it doesn't break anything.

> Instead, it’s up to consumers to consume versioned artifacts of the library
> (more on this in a moment), so that consumers are totally in control of
> “opting in” to new changes. If the consumer doesn’t want to opt in, they
> should not be forced to (e.g. in monorepos where a successful merge of library
> X is a silent, de facto upgrade for all consumers required to consume library
> X from the same commit, etc.)

> Instead, the consumer should be the one creating an experimental branch with
> the intended version upgrade, and running it through the consumer’s validation
> tests to see if opting to upgrade the version will work.

The problem is that in an enterprise you will have many things that aren't
under active development. So if you rely on consumers to pin a version and
then upgrade you might end up with dozens of different versions of a library
out there. And then what happens if you find a security issue or something
else that forces everyone to upgrade to the latest version? Suddenly you have
a bunch of applications that no one has touched for a while that all need to
be upgraded.

~~~
mlthoughts2018
> “One of my pet peeves is when another developer tells me something "should"
> happen.”

Exactly, like when someone tells you that you should upgrade your dependency
simply because some new code was merged.

> “So if you rely on consumers to pin a version and then upgrade you might end
> up with dozens of different versions of a library out there.”

The same thing happens in a monorepo, usually with dozens of incompatible
legacy bundles of feature flags / toggles.

------
Joe8Bit
The benefits of monorepos (in my experience) are all people/organisation
based, e.g. it can be easier to enforce standards/processes among 100s-1000s
of engineers with a monorepo, or it can be easier to manage/release a very
large interdependent codebase/ecosystem being worked on/coordinated between
dozens of teams.

However, this linked post makes a great point: those benefits are all 'scale'
problems which 99.99% of orgs don't have. The corollary is that I've seen how
hard it is to go from multi-repo -> monorepo when you reach the scale where
you _would_ see some benefit.

I also think that the tooling/UX doesn't publicly exist to solve the multi-
repo problem with 100s-1000s of engineers working on 100s of repos. It becomes
so hard to navigate, understand and grok and so much is buried in dark
corners. My experience is that that tooling is _less hard_ to build around
monorepos (Google for example).

~~~
allochthon
Interestingly, the article linked here argues that monorepos are good at small
scale (e.g., startups) and multi-repos are good at large scale (e.g.,
Twitter).

The author does not discuss the fact that Google has used a monorepo. (Not
sure if it still does.)

~~~
Veelox
It still does. While someone might point out that a few percent of Google's
code is not in the monorepo, most of it is in the same repo.

------
justinwp
We are a small 20ish dev company using a monorepo with mostly python. Our
tooling is Bazel and Drone.

\- Ease of onboarding. Being able to quickly build or test any target is
awesome for the new employee.

\- Ease of collaboration. I can see all of the code easily and can learn from
these patterns. I can also quickly contribute or extend APIs and fix all
usages without concern for breaking changes.

Our use of Bazel quickly gets us around git scale issues by enabling external
dependencies that can be loaded into the workspace without fully vendoring
everything.
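
Roughly, a WORKSPACE entry like this (name, URL, and checksum are
placeholders) pins an external dependency at an exact version without
checking its source into our tree:

    # WORKSPACE: fetch a third-party archive at a pinned, checksummed version
    load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

    http_archive(
        name = "vendor_lib",
        urls = ["https://example.com/vendor_lib-1.2.3.tar.gz"],
        strip_prefix = "vendor_lib-1.2.3",
        sha256 = "<sha256 of the archive>",  # placeholder; pins the exact bytes
    )

Targets can then depend on labels like "@vendor_lib//..." much as if the code
lived in the repo.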

~~~
hknd
That sounds cool. Could you elaborate on this "Our use of Bazel quickly gets
us around git scale issues by enabling external dependencies that can be
loaded into the workspace without fully vendoring everything."?

~~~
oblio
The other question is: how stable is Bazel and how easy is it to extend and/or
find open source extensions for it?

~~~
justinwp
We haven't had any issues regarding stability. For the most part we extend by
creating additional rules. Many of these are supported by the community
(search for bazel and rules_* and you should get a bunch of results on
GitHub).

------
azhenley
I’ve never understood the debate between mono and multi repos. With the right
tooling, the line seems to vanish and you just have folders anyway.

Each repo may have its own policies and permissions, which is the biggest
reason I see to keep them separate, but again the distinction still seems
little more than a folder.

Am I missing something?

~~~
weberc2
In my case, I'm struggling to find good tooling to support monorepos. We're
running a microservice architecture, but our CI is triggered by GitHub web
hooks. Currently we're either doing a full build of everything on each webhook
event or we're doing some error-prone git-diffing to try to make sure we're
only rebuilding when necessary. I've looked at tools like Buck and Bazel, but
they seem really heavy for our ~30-person engineering team, and they also seem
to have odd ways of doing things (no support for pulling from a package
repository; dependencies are assumed to be vendored--which incidentally the
author of this blog post characterizes as a Bad Thing). Folks who are using
monorepos successfully--what tooling do you use to solve these problems?

~~~
gowld
make solved dependency resolution for partial rebuilds over 30 years ago.
There has been lots of improvement since. I don't doubt you have a real
challenge, but the problem comes from some immature tool in your toolchain
(that I'd be willing to speculate is because someone decided only Javascript
tools are interesting after 2010, because reimplementing features classic
tools is more fun than learning old reliable tools), not an aspect of the
monorepo.

~~~
weberc2
How does make solve the problem of only building what has changed in the
current PR? It seems like this would only work if you have a single CI server
and all of your artifacts live in the filesystem on that server, no? Solving
that seems sufficiently difficult that saying "make solves the problem" seems
disingenuous, but maybe it's easier than it appears to me?

------
grey-area
We're pretty small scale (< 20 services, < 10 devs), and happily use a
monorepo (recently moved from multiple repos when that became unwieldy as
services grew). If you have a lot of services/projects with some shared
dependencies they can make tracking that easier. I agree with the article that
in general they make life easier.

It depends on what tooling you're using, and whether it is tied to the version
control system. Clearly if the tooling makes assumptions about one deployable
per repo and works on git hooks that's going to cause pain, but the answer is
don't use monorepos if your tooling doesn't support it, or change the tooling
so it does.

Most companies won't scale past a few hundred employees, so they're never
going to hit any sort of scale issues with monorepos, and if they do, they'll
have the resources to deal with it.

Does this have to be a religious war? Does one size fits all really apply
here?

~~~
shados
> We're pretty small scale (< 20 services, < 10 devs)

This is really important and the whole point, indeed.

Eg: We consider our org to be "many repos" (we have several thousand).
However, hundreds of them contain 5, 10, or 20+ packages/projects/services.
It's funny because we'll talk about creating "monorepos" (plural) for certain
parts of our product, and it confuses the hell out of people.

~~~
aidos
Several thousand repos - ye gads! How do you document them? How do they fit
together? Do they reference each other?

~~~
shados
The answers to all those questions are "it depends". There are a few thousand
libraries; those obviously refer to each other. Some have readme files and
that's enough, some have full documentation "books", some have comments in the
code and that's enough.

We don't mandate a company-wide development process, so each team and group
can choose their own process and how they track their stuff.

We do have automation and tooling to keep track of things though.

------
nwhatt
The right way to write this kind of content is like Digital Ocean did:
[https://blog.digitalocean.com/cthulhu-organizing-go-code-in-a-scalable-repo/](https://blog.digitalocean.com/cthulhu-organizing-go-code-in-a-scalable-repo/)

Rather than this back and forth about the theoretical implications of a
monorepo, actual stories of implementing one are 10x more useful to me.

~~~
aidenn0
I think there is a balance. It's always possible to argue that DO would have
done better or worse with a polyrepo. However if you ignore evidence, then the
theoretical arguments can get silly very quickly.

~~~
mlthoughts2018
Another problem is confusing causal effects with confounders. Did a monorepo
_cause_ success, or did they succeed _in spite of_ a monorepo? Studying
individual cases for which there could never have been a counterfactual
outcome (e.g. FB or Google) literally cannot provide evidence of a causal
effect.

~~~
aidenn0
That's essentially what I was saying. We have evidence that it is possible to
be successful with both a monorepo and a polyrepo setup, which is perhaps
uninteresting other than to contradict people saying that either setup is
guaranteed to be an unmitigated disaster.

We also, however, have subjective feedback from people working on those teams
as to how they think it would have been different. It's not rigorous data, but
it shouldn't be altogether ignored either, particularly since rigorous data is
so hard to come by.

------
monksy
> Developers are not arguing children that need to be confined to separate
> rooms to prevent fights

Has the author seen the fights that go on? We're extremely opinionated.

------
asdfasdfasdfa
The original "Monorepos please don't" article really just convinced me how
great monorepos are when you _aren't_ at scale. So you know, put your shit in
a monorepo, and then when it gets painful, break it out.

~~~
busterarm
That process will take you 3-7 years, depending on how many resources you
throw at it. Can your business survive 3-7 years of #seriouspain?

This article hand-waves over many of the criticisms while ignoring a few cold
realities. If you're following an infrastructure as code pattern and/or if you
run bare metal, at some point you _WILL_ determine that some things are too
sensitive to keep in the monorepo. Here is one of the places where coupling
will screw you the hardest.

Your lightweight production deployment repo will have a hard dependency on
some nightmarish 35-40GB monorepo. Your collaboration tools like
rietveld/gerrit will choke under the load and you will struggle to get big
enough servers to maintain it. You'll do things like push to one target and
pull from another. You'll deal with all sorts of transient failures trying to
push or pull. Your CI/CD platform will start taking an eternity to do
anything.

Monorepos absolutely result in coupling and coupling is one of those nasty
things that you don't realize how much of a problem it is until you're
drowning.

None of the above-mentioned complaints are theoretical. I've lived through
them all.

~~~
aidenn0
I imported a 20 year old SVN monorepo to git with 100s of thousands of commits
and tens of thousands of branches/tags and it was under 10GB. Removing a few
large .tgz files that were inadvertently committed brought it down to 5 GB.

Linux has 25Mloc and ~800k commits; I think the pack is on the order of 2GB?

I don't doubt that 40GB nightmarish monorepos exist, I'm just wondering how
and why.

~~~
busterarm
Linux is a highly focussed project trying to accomplish a single thing well
and with rigorous standards.

If you have 100 developers working average US work schedules and making 5-10
commits per workday (a debatable number, it depends on culture, but I'm
averaging between the "big" commit and lots of small commits), you're going to
end up with 100k commits _per year_. And many large startups have a multiplier
of that number of developers and they're much, much messier than kernel devs.

Referencing the ideal case as a counter-example is a bit silly.

~~~
aidenn0
So the nightmare monorepos are caused by unfocused teams trying to accomplish
many different things poorly?

------
jph
Monorepo vs polyrepo summary notes from previous HN discussions:
[https://github.com/joelparkerhenderson/monorepo_vs_polyrepo](https://github.com/joelparkerhenderson/monorepo_vs_polyrepo)

I'm adding notes from this HN discussion today. Feedback welcome.

------
nijave
Merging/integrating code & styles is difficult and error prone. At the end of
the day, if two systems interact they will need to be "merged" at some point.
It seems to make more sense to handle this in tests/at a source code level
than to risk doing it in the runtime environment alone.

I think tooling and granular permissions (still part of tooling) can be
blockers, though. It makes less sense outside an enterprise/company
perspective, such as when developing a discrete component that gets pushed to
a public package repository (Maven, PyPI, npm, etc.).

------
EGreg
What are the actually serious downsides of having a repo for each project
again? Serious question. Mercurial supports Subrepositories for example. Just
define your rules for pulling stuff.

From my own experience, if you are arguing about whether to use convention A
or convention B, the answer should be to have C that allows both, and then
configurations on top of C for A and B.

This applies, for example, to lookups in the database by an index.

------
austincheney
> Does the practice of keeping all code together in one place lead to better
> code sharing? In my experience that's clearly the case.

This is where abstraction comes in. When done correctly abstractions are
necessary so that you can separate your work from things you don't want to
work on. In my application I want to be able to access and modify files on the
local filesystem. I don't care about the differences between opening files in
Windows versus Linux or the intricacies of how filesystems work at the bit
level. My application evaluates some code and writes some output to a file. I
use Node.js to solve for a universal file management API. This is an example
of a good abstraction because the separation is clear and explicit.

The simple rule for abstractions is: if you can do the very same job at a
lower level, you don't need the higher-level code. In the Node.js example you
cannot access the filesystem at a lower level, because no such standard
library exists in JavaScript.

Bad abstractions don't provide separation. Many times developers want to use
an abstraction to solve for complexity, but inadvertently do the very same
things the abstraction is supposedly solving for, just in a different style or
syntax. Many JavaScript developers use abstractions to access the DOM or XHR.
XHR is simple: assign a handler to the onreadystatechange property, open the
connection, and then send the request. You lose huge amounts of performance by
abstracting these and dramatically increase your code base, and the separation
between the API, the framework performing the abstraction, and the code you
are writing is superficial and self-imposed.

By using and enforcing good abstractions while avoiding bad abstractions you
keep your application far more lean and restrict the focus of your development
team to the goals of the project. Without that, your code isn't a monorepo;
it's a dependent library of another repo.

------
oblio
I just have this to say: the discussions here are painfully oriented around
SaaS. Once you're doing stuff on-premise or making desktop applications
(things requiring long-lived release branches), the discussion is totally
different.

~~~
devonkim
I don’t see why shipped software is mutually exclusive with monorepos. You can
always check out new subdirectories of repos and treat them as a form of
branching, where eventually directories are merged together in a separate
commit.

~~~
oblio
I find that shipped software actually operates better in a normal, branched
monorepo. You just branch the whole thing. The alternative is several repos
and using a package manager. That minimizes merging across many branches, as
you can just point the package manager at the updated module version, but it
brings its own hassles.

Either way, as I said, different world from the discussion here, which I'd
summarize as "SaaS monorepo vs SaaS polyrepo".

------
eddieh
I think I can summarize my thinking on this pretty succinctly: I want to build
a product, not tooling for software development, and I certainly don't want to
spend any time trying to keep different repos synchronized, etc.

~~~
mlthoughts2018
It turns out that effectively separating dependencies _is a huge part of_
building a product.

This is like saying you really want to swim but you don’t want to get wet.

------
vorpalhex
I'm part of a company that went from a boring VCS strategy to jumping on the
monorepo bandwagon, against my advice to keep our git usage simple. It's been
fairly terrible - merge conflicts, code going to the wrong environment, nobody
can actually do a hot patch, and even long-running feature branches which
should be stupidly simple run into immense problems.

It also caused issues with our npm repo solution, and has created the worst
case of dependency lock we've ever had.

Do yourself a favor and say no to monorepos. It is massive complexity for no
benefit.

------
DannyBee
1\. It's really hard to tell if any of the people writing blog posts about
these things have ever experienced the larger-scale monorepos for any length
of time.

As best I can tell, the answer is "no", and they are mostly writing based on
perception. They don't appear to even do things like "try to talk to people
who have experienced the good and bad of it".

While the writing is fun, it makes it a lot less useful in both directions,
IMHO.

2\. The author is right that planning more than 6 months ahead makes no sense
for smaller-scale companies. However, both of these authors seem to
fundamentally miss the actual problem in large companies, which they assume is
around engineering and scaling large systems. In fact, it is not. The
underlying issue is that engineering a thing is no longer your main cost. This
is one of many reasons larger teams/companies are fundamentally different (as
this author does correctly point out).

There are 2080 work hours in a year.

If I have 8000 developers, and I have to spend an hour teaching them a new
thing, I just spent ~4 people for a year.

If you spend a day teaching them something new, I just spent ~31 people for a
year.

If you spend a work week teaching them something new, I just spent ~154 people
for a year.
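
(The arithmetic, spelled out; the hour counts are just illustrative:)

    WORK_HOURS_PER_YEAR = 2080
    devs = 8000
    for hours in (1, 8, 40):  # an hour, a day, a work week of training
        print(hours, "h ->", round(devs * hours / WORK_HOURS_PER_YEAR), "person-years")
    # -> 4, 31, and 154 person-years respectively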

That's just the basic learning cost; it doesn't include migration costs for
the code base or _anything else_ [1].

But these costs certainly dominate the cost to engineer a solution as you get
larger - the systems being talked about here (which _have_ scaled
engineering-wise) are not 50 people a year (I work next to them :P). Not even
close.

In some sense, talking about the engineering challenges makes no sense - they
basically don't matter to the overall cost at large scale.

These same things apply to most of the broader (in the sense of who it
touches) pieces of developer infrastructure like programming languages, etc.

As you can also imagine, you can't stand still, and so you will pay costs
here, and need to be able to amortize these costs over the longer term. In
turn, this means you have to plan much longer term, because you want to pay
these costs over a 5-10 year scale, not a 1 year scale.

[1] It also excludes the net benefits, but you still pay the costs in actual
time even if you get the benefits in actual time as well :)

Also, productivity benefits from new developer infrastructure are wildly
overestimated in practice. Studies I've seen basically show people perceive a
lot of benefit that either doesn't pan out or doesn't translate into real time
saved. So at best you may get happiness, which while great, doesn't pay down
your cost _here_ ;)

~~~
gowld
What is the difference in cost between an 8000-person org all spending a week
learning something new, and 800 10-person orgs all spending a week learning
something new? It's the same fraction of available time.

Large orgs have more resources (revenue) and more costs. What matters is
whether revenue:cost ratio is superlinear or sublinear in org size.

~~~
DannyBee
I'm not sure of your first point. The organizations I'm talking about are
clearly agglomerations of teams anyway. Nobody has a flat 8000-person team
structure that I'm aware of :)

Also, the fraction is not the problem, it's certainly the same. But the
absolute scale of the number matters at some point.

0.001 people a year of lost dev time is not likely to change what your company
could accomplish.

200 is.

(I think you don't disagree, but I can't really tell, so if you do, let me
know and I'm happy to argue about it further :P)

------
SideburnsOfDoom
This can't really be discussed without also clarifying how your target
language and tools ecosystem does package management, i.e. are you expecting
to generate and version internal packages and then consume them from a feed?

It seems like, if you don't have this facility, then the monorepo becomes more
compelling.

Not to mention, are you building 1 or 2 apps, or a whole host of
microservices?

Without knowing that, your experience of mono vs. multi-repos won't be much
use.

------
pm90
I have a simple question: are monorepos possible in git? What is the upper
limit of contributions per day for git to be effective in a monorepo?

~~~
CydeWeys
Absolutely. Most monorepos are git.

The entire commit history of Linux kernel development exists as a single git
repository which you can check out here:
[https://github.com/torvalds/linux](https://github.com/torvalds/linux)

(Indeed, git was developed for this very purpose.)

And it's very doubtful you're going to become larger than Linux. If your
company becomes so large that git can't handle all of your code then quite
frankly you've succeeded beyond your wildest dreams. Worrying about Google
scale for your code repo when you're a startup is the ultimate case of
counting chickens before they hatch.

~~~
beojan
I really wouldn't call the Linux kernel a monorepo. Not compared to Microsoft
keeping all of Windows in a single repo.

The problems start when you can't clone the repo within an hour or so, because
git doesn't allow partial clones like SVN did.

~~~
CydeWeys
All of the Linux source code is in a single repo. How is that not a monorepo?
How are you defining monorepo if not this?

The entire Windows codebase is also a monorepo. It just happens to be a bigger
one.

~~~
beojan
I'd define a monorepo as multiple loosely (or un-) coupled projects in the
same repository. The Linux kernel is a single strongly coupled project.

Nevertheless, whether or not you should keep your code in a single repository
is more a question of the sheer size of the repository than of whether the
code logically belongs in the same place.

~~~
CydeWeys
Makes sense.

But yeah, even the total code of all projects combined at a startup isn't
likely to get anywhere close to the scale of the Linux repo, so it's at least
a good example of how far you can get with a single large git repo (regardless
of the relatedness of the contents therein).

------
lmm
> Looking at it from the other side, could introducing strict borders somehow
> make it easier to reuse logic? I think it's clear that borders can only take
> away from your ability to perceive opportunities to use abstractions or to
> unify code.

This is not at all clear. I'd argue that a visible organization of your
codebase into repositories makes it easier to reuse code in the same way that
interface/implementation splits do: it makes it clearer which parts felt
domain-specific and which felt like reusable libraries.

> The bottom line is that you should pick the right abstraction and the right
> place for a function or class based on the individual merits of the case -
> and not driven by facts about repos created a long time ago.

This seems to be assuming that repository boundaries are defined in the
beginning and fixed for all time - the same mistake I see opponents of static
typing making. Your repository structure reflects your logic and business
structure; as those change you change your code structure to match.

> True, touching multiple subprojects in a single commit is not always
> desirable. For example, updating backend and frontend components
> incrementally in backward-compatible ways can be the better approach. But
> even so, it's useful to retain the option of cross-boundary commits for many
> reasons including simplicity and enforced coordination.

This needs to be justified. In much of programming we consider the benefits of
strict isolation to outweigh the costs - e.g. private fields in OO languages,
true parametric polymorphism, microservices. You can't just assert that having
the option of bypassing the good practice is worthwhile.

> If you think about it, splitting a codebase into sub-repos is a ham-fisted
> way to enforce ownership boundaries. Developers are not arguing children
> that need to be confined to separate rooms to prevent fights. With
> sufficient communication and good practices, a monorepo will allow you to
> avoid the question “which repo does this piece of code belong to?” Instead
> of thinking about repo boundaries - effectively a distraction - a monorepo
> allows you to focus on the important question: where should we draw the
> boundaries between modules to keep the code maintainable, understandable and
> malleable in the light of changing requirements?

Communication and good practice are the most costly way to enforce important
things; you could equally well argue that e.g. unit tests are a ham-fisted way
to enforce non-breaking of code and developers are not arguing children that
need to be reminded not to break each other's functionality.

Repo boundaries are higher-level than directory boundaries. No-one is arguing
for having each directory in its own repo, but being able to represent "not
directly involved, but versioned together" and "separate enough to be
versioned separately" is a very valuable distinction to have in your toolbox.

> Many of us, especially in the world of startups, work in smaller teams -
> let's say less than 100 developers.

Do you find it practical to communicate and co-ordinate with 100 other
developers before making any changes? Because that's the only case where a
single repo makes sense - when you are working closely enough with every other
developer sharing the repository that you don't need to go to any extra effort
to organize who is changing what.

Once you're not attending the same standup, you shouldn't be working on the
same repository. You need to have a release cycle with SemVer etc. so that
people who aren't in close communication with you can understand the impact of
changes to your code area. Since tags are repository-global, the repository
should be the unit of versioning/releasing.

~~~
loevborg
OP here - interesting perspective, especially the parallel drawn to
dynamic/static types. Thanks for the thoughtful post.

------
chrismatheson
Glad you wrote this and saved me the effort, pretty much what I was thinking
of jotting down.

