
Advantages of Monolithic Version Control - benkuhn
http://danluu.com/monorepo/
======
erikb
Both variants are nasty in a huge repo. The quoted example at the end
illustrates this well. If you have different repos, you sometimes can't push
to repoA because you are depending on repoC. If everything is in one repo it
should just work, but that's not really the case: you still need to wait for
changeC before you can go and do changeA. Same problem, independent of the
repo situation.

The solution is that changeA must be backwards compatible. In a complex system
you always need to have some kind of backwards compatibility, at least for
some time.

In the end, neither approach (mono or multi repo) really works in huge,
complex scenarios.

------
Eridrus
For a moment I was a little frustrated at how we split everything into
separate repos at work, mostly due to the difficulty of finding and
refactoring code.

And while a monorepo would help a lot with discoverability, I think that the
promises this article makes about cross-project changes are a bit optimistic
since it ignores the difficulty of doing deployments in live distributed
systems. Even if you have a single git repo, there will certainly be an order
in which you need to deploy your services so that things don't break when an
API consumer gets updated before the API provider. Google/FB/Twitter/etc certainly have a
better deployment system than we do, so maybe for them it's easy, but it's not
something that is just solved by going to a monorepo.

------
w_t_payne
The right approach depends on your configuration management strategy, which in
turn depends on the kind of product you are making.

If you are developing a big system with lots of components that need to work
together (e.g. an embedded machine vision system with a multitude of related
data-recording, calibration, simulation & test utilities and subsystems), then
the matrix showing which versions and configurations are compatible with one
another quickly grows to an unmanageable size.

Unless you have a god-like configuration management system, the only practical
approach is to go "green trunk" and co-version all of your components.
Granted, this doesn't necessarily _force_ you to use a single repo, but a
single repo is (at least initially) the simplest approach.

Of course, you could go "old-skool" and define interfaces up-front then freeze
them, but this just slows your development down to a snail's pace. Better
(IMHO) to co-version your components then lean on your integration tests to
maintain compatibility.

~~~
w_t_payne
Of course ... when your development team grows _really_ big, then this may
well become untenable ... but "green trunk" should work in development teams
up to a couple hundred developers or so.

------
qznc
One big issue where I would love Open Source to innovate: Test downstream.

If you change an Open Source library, there is no way to check against the
users of your library, because they are hard to track down and use various
different build processes. In a monorepo this is much easier: you could
create an automatic "build and test everything".

~~~
steveklabnik
We've been working on tooling for Rust that runs tests across everything in
the package repository, as a way of hopefully detecting regressions.
https://internals.rust-lang.org/t/regression-report-beta-2015-05-01-vs-nightly-2015-05-03/1990

------
deathanatos
> Me: I think engineers at FB and Google are probably familiar with using
> smaller repos (doesn’t Junio Hamano work at Google?), and they still prefer
> a single huge repo for [reasons].

I'm a former such engineer; I still prefer smaller repos. There's enough
engineers at both companies that I can assure you such opinions (and
knowledge) are quite varied.

> it’s often the case that it’s very easy to get a dev environment set up to
> run builds and tests.

I've worked with both; in both cases, the workflow was essentially a checkout,
followed by a build, followed by running the tests. I've found this is more a
product of the environment (i.e., do the developers care about tests being
easy to run) than the VCS in use.

> With a monorepo, you just refactor the API and all of its callers in one
> commit.

I'd restate this: with a monorepo, you _must_ refactor the API and _all_ of
its callers in one commit. You cannot do it gradually, or you _will_ break
someone. A gradual refactor is only possible in multiple repositories,
specifically multiple repositories that obey something resembling semantic
versioning. You make your breaking change, and because it is a breaking
change, you up the version to indicate that. Reverse-dependencies wishing to
update then must make the change, but can do so at their leisure.
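
A hedged sketch of that flow using plain git tags (the library name, paths,
and version numbers are hypothetical):

    # In the library repo: land the breaking change and signal it with a
    # new major version, per semver.
    git commit -am "remove legacy frobnicate() API (breaking change)"
    git tag -a v2.0.0 -m "2.0.0: frobnicate() removed; use frob() instead"
    git push origin master v2.0.0

    # A reverse-dependency stays pinned to the old major line and
    # migrates at its leisure:
    git -C vendor/libfoo checkout v1.4.2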

I've seen some truly heroic work done to get "APIs with thousands of usages
across hundreds of projects" refactored. Sometimes it _is_ easy: you can
track down the callers with a grep, and fix them with a perl script. But I
think you must limit yourself to changes of that nature: a refactor too big
for a script leaves you editing the call sites by hand. With thousands of
callers that is probably impractical anyway; I find that moving even a couple
dozen callers through a major change (such as one where the paradigm
expressed by the API is completely wrong) is difficult if you must update
them all at once.

Last, the most common "monorepo" system I've seen is Perforce, and compared to
git it has such stark usability issues that I'd rather not go back to it
(staging, git add -p, bisect, real branches). This comment though,

> where it’s impossible to do a single atomic commit across multiple files

I would hesitate to use "atomic" to describe commits in Perforce; if you check
out CL X, make some changes, and "commit" ("submit" is Perforce's term), the
parent of your new CL might be Y, _not_ X, and you might get no warnings about
this, either. Collisions on an individual file will prevent the submit from
going through, but changes on separate files (that together represent a
human-level merge conflict) will not get caught. (They wouldn't show as merge
conflicts in git, either, but git will tell you that someone updated the code,
and refuse your push; unit tests must catch these, but in Perforce's case, you
must run them after making your change permanently visible to the world.)

~~~
robaato
It is not difficult to address the "atomic" commit issue you mention via
triggers. And indeed streams provide it out of the box.

These days, with the use of shelving, there are integrations with CI tools
such as Jenkins which provide "pre-flight" builds before checkin - thus before
your changes are world visible.

------
moron4hire
Malarkey.

>> With multiple repos... having to split a project because it’s too big or
has too much history for your VCS is not optimal... With a monorepo, projects
can be organized and grouped together in whatever way you find to be most
logically consistent, and not just because your version control system forces
you to organize things in a particular way.

Uuuuh, if your one project has to be split because it's too big for your VCS,
then you aren't going to make that thing smaller by putting multiple projects
in with it.

>> A side effect of the simplified organization is that it’s easier to
navigate projects.

That's a UI issue. Build a better UI, don't use a dirty hack, especially one
that has other implications.

>> A side effect of that side effect is that, with monorepos, it’s often the
case that it’s very easy to get a dev environment set up to run builds and
tests.

With the growing trend of package managers being able to install dependencies
straight from a git repository, I don't see this being an issue much longer.
Again, this is a UI issue.

>> This probably goes without saying, but with multiple repos, you need to
have some way of specifying and versioning dependencies between them.

Yeah, no shit, that's just good software development. The argument here is
that a monorepo lets you be lazy.

~~~
durin42
You're missing a huge part of the benefit, which is that when you go and do
internal API cleanups you don't have an awkward dance across N repositories,
you just have _a change_ that's atomic across the entire project.

It's a huge benefit to be able to have all your mobile apps and web apps and
whatever in the same repo, because then you can easily see who calls what, and
how various RPCs are used, etc. Don't knock it until you've tried it.

(I used to think monorepos were dumb, but over the course of several years I
came around.)

~~~
moron4hire
That's what I'm talking about with being lazy. These companies provide public
APIs, and we will all have the same headaches of having to update our
software to those APIs. By dogfooding your own APIs in the same way your
customers eat them, you force yourself to communicate change correctly.

~~~
plorkyeran
You say "lazy", I say "not wasting time on unnecessary things". Versioning
dependencies is something done to solve problems, not an inherently good moral
imperative.

------
otibom
Could an alternative be to use individual repos and also a meta-repo? The
metarepo contents are the commit ids of each individual repo.

So let's say you want to update a repo which depends on another one:

Update project A, commit changes.

# Your product hasn't changed at this point

Update project B, commit changes.

# Your product hasn't changed at this point

In the metarepo, check out the new 'master' branches of Projects A and B, and
commit that to the metarepo

# Your product is now updated!
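
A hedged sketch of this scheme, with git submodules standing in for the
metarepo's commit-id bookkeeping (the repo names are hypothetical):

    # One-time setup: the metarepo records one commit id per project.
    git init metarepo && cd metarepo
    git submodule add https://example.org/projectA.git
    git submodule add https://example.org/projectB.git
    git commit -m "pin initial versions of A and B"

    # Later, after A and B have each landed their changes independently,
    # the product only moves when both pins advance together:
    git -C projectA pull origin master
    git -C projectB pull origin master
    git commit -am "advance A and B together: product updated"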

~~~
mryan
This sounds a lot like git submodules.

~~~
jordigh
Or Mercurial subrepos.

Both of which are far more of a pain in the ass than monorepos.

------
lmm
I view the repo as the unit of versioning. If something has its own release
cycle with its own semver number, it should be in its own repo, and vice
versa. Monorepos make sense if your whole site is at a single version, as in
the article. But if you want to have libraries that have stable versions
(which I find useful, because it allows teams to own codebases - and I find
maven is much, much better than this article seems to think), then it's worth
putting them in their own repositories.

~~~
gecko
As a counter-argument: I think that using a monorepo encourages a steady and
careful process of API deprecation, which is actually healthier than semver.
In fact, I'll contend that semver is nothing more than an effort to bring
monorepo-like bug fixes to environments that can't otherwise have monorepo-
like management.

Here's the thing: if all the code is in a single repo, it's _really_ easy for
me to find all users of a given function. This means that, if I need to alter
the behavior of a function, I can sanely pursue any of three options:

      1. Know for a fact I can alter the behavior without impacting anyone.
      2. Refactor across the whole code base in one shot.
      3. Deprecate the API and provide a replacement immediately, then work
         with teams to get off the deprecated API.

In all three cases, I can trivially know exactly what I'm impacting and make a
decision on how I want to impact it. As a bonus, not only can I much more
directly control deprecation times, but it's dramatically less likely that one
of my client programs keeps deploying with an outdated, insecure/buggy version
of my library.
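
For illustration, the "find all users" step really is a one-liner in a
monorepo (the function name is hypothetical):

    # every call site, across every project in the repo
    git grep -n 'frobnicate('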

In my view, the whole point of semver is handling situations wherein I
_cannot_ do that. For example, if I publish a library on GitHub, it's simply
not reasonable for me to know who's using my library and why. Thus, semver
provides a contract between me and you: since I can't know what you're doing,
I promise not to do certain things with my library so that you can use it with
some confidence. But there is a cost here: it's harder for you to track
upstream, it's harder for me to know who's still using what in the older
library, and it's harder for me to figure out how I can help people upgrade.

I'm emphatically not saying semver is bad, but I am saying that it's
deliberately compensating for source code federated across many repos. A
properly managed monorepo keeps you from having to worry about it.

~~~
JoshTriplett
You don't need a monolithic repo to do that. Just pull in the library as a git
submodule, and you control exactly which version (by git commit hash) of the
library you use. And in a single commit, you can change that commit hash and
change the code using the library, just as you could in a monolithic repo.
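
A hedged sketch of that single commit (the paths, commit hash, and file
names are hypothetical):

    # Bump the library pin and adapt the caller in the same commit.
    git -C vendor/libfoo fetch origin
    git -C vendor/libfoo checkout a1b2c3d   # the new library commit
    $EDITOR src/caller.c                    # adjust to the new API
    git add vendor/libfoo src/caller.c
    git commit -m "update libfoo and adjust callers atomically"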

~~~
acveilleux
With subrepos, you -- the consumer -- are still stuck with either stagnation
("stability") or taking on a bunch of refactors whenever you update the
reference to upstream.

The producer in that case is obviously completely unaware of what you're doing
so they cannot make decisions based on that fact. They may, for example,
deprecate an API and provide no direct equivalent because they don't know
anyone's relying on it. Then you suddenly have to come up with additional code
to restore an equivalent API...

The subrepo approach is OK if you're tracking some external code; in that
case it's comparable to semver but possibly better integrated with your
build/tooling, so you win something. But if it's internal to your organization,
you're hacking around what a single repo could provide you.

------
kossmoboleat
I wonder if Google is still using Perforce? That's one of the arguments used
at our company to argue that Perforce handles huge repos very nicely.

~~~
gecko
<wrong>Google is still using don't-look-behind-the-curtain-it's-totally-not-
Perforce-but-it's-Perforce</wrong>

Google is using a Perforce lookalike that is not actually Perforce, but
they're moving to Mercurial. Subversion also scales up to similar sizes,
although I confess to not knowing what "similar" means. (I do know that Google
did an experiment and concluded that Subversion would scale to their needs,
but they were already committed to Perforce by that point.)

EDIT: Got corrected by someone who has good reason to know.

~~~
kalmar
> Google is using a Perforce lookalike that is not actually Perforce, but
> they're moving to Mercurial.

Do you have a source for this move to mercurial? This is the second time I've
heard it, but couldn't find a source before.

~~~
gecko
I don't have a publishable source, no, but you can look at the number of
@google.com email addresses on the Mercurial mailing list if you want to do
your own inference, or just hang in there for a bit and you won't have to.
:)

------
nothrabannosir
_Of course, there are downsides to using a monorepo. I’m not going to discuss
them because the downsides are already widely known and discussed._

I searched but I honestly can't find them. The only things I can think of are:
implementation problems (repo too large, every git pull takes more time than
strictly necessary for your project) and tainting tools like "git log".

Those don't seem very fundamental. Solvable by a "monogit" wrapper, if someone
put their mind to it. Or are they?

Is there a fundamental problem I'm missing? I feel stupid for asking, given
how matter-of-factly he dismissed it :(

EDIT: I guess what I'm trying to say is: monorepos feel, theoretically, like a
strict superset of many independent ones. It's just the tooling that makes it
less convenient.

~~~
blackaspen
The speed is a serious downside.

Go on vacation for two weeks? You get to go on another vacation while you wait
for your monorepo to update. Similarly, good luck ever trying to work from a
coffee shop -- at the very least you probably have 50 MB of updates because
everyone in the company is committing code all the time.

~~~
gecko
This is only true if you do two unrelated things simultaneously: have a
monolithic repo _and_ keep the whole thing checked out at all times. Provided
you have a sane build system (pants, Blaze, Buck, etc.), the only thing that
should need the full checkout is the CI server; you should be
able to safely work from a narrow clone.
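
For example, modern git grew sparse-checkout support for exactly this (the
subcommand post-dates this thread, and the directory names are hypothetical;
Perforce client views and Mercurial narrow clones serve the same purpose):

    # A narrow working copy: clone without checking out, then restrict
    # the checkout to the directories you actually work on.
    git clone --no-checkout https://example.org/monorepo.git
    cd monorepo
    git sparse-checkout init --cone
    git sparse-checkout set services/myservice libs/common
    git checkout master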

That said, I find your note a bit amusing. I'd have agreed a decade ago, but
nowadays, 50 MB should take effectively zero time for you to both download and
apply. It's worth revisiting things like this as LANs and the internet get
faster; I think the speed comment is a bit outdated now.

~~~
blackaspen
True; however, Pants right now cannot use Git to check out only the
dependencies it actually needs for the build.

I agree that it should take effectively zero time for me to download and apply
50 MB of updates, but poor-quality internet is still pretty easy to find
(working on plane wifi, for example) -- maybe my coffee shops have worse internet than
yours!

------
JDDunn9
I've found companies typically use monolithic development because they don't
get all of the benefits of modular development, e.g. they don't re-use code
because they only have one website (and don't care about open sourcing). I'll
stick with modular all the way. :)

~~~
Negitivefrags
Most of the arguments in the article were in cases where you do reuse code
though.

If you didn't reuse code (across repos) then you wouldn't have any of the
problems that a monolithic repo solves.

~~~
hvidgaard
I just cannot imagine trying to make sense of the history of a giant monorepo.

What I don't get is the example from Twitter. If I need a fellow developer to
fix projectB and projectC for me to fix projectA, I ask him to fix them. As
soon as he has committed the fixes to their respective repos, the build
server would pick them up, and I can expect that the next time it builds
projectA, it pulls the latest versions of projectB and C and uses them.

~~~
michaelmior
If you always pull the latest version of other projects, you have no way to
have a consistent snapshot of a working system. If you use a monorepo, it
means that you always know which versions of every file in every project are
in use at a particular commit. This of course is also true if you use separate
projects and pin their dependencies. But if you use separate projects and
always grab the latest version of each subproject, there's no way to ensure
you have compatible versions of different projects in use.

~~~
theCodeStig
You don't need a monorepo to solve this problem. The dependent projects would
depend on a specific version of an artifact, rather than a 'latest' pointer.
Example: Project 1 depends on version 1.2.3 of Library A, and Library A builds
are hosted in an artifact repo. Project 2 needs an update to Library A, and
those updates are built and released as version 1.3.0. Project 1 continues to
use version 1.2.3, until it's updated to work with the new 1.3.0 API.
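
A hedged sketch of that pinning scheme, with git tags standing in for the
artifact repo (the repo layout is hypothetical; the version numbers are the
ones above):

    # Library A releases the new API as 1.3.0.
    git -C libA tag -a v1.3.0 -m "1.3.0: new API"

    # Project 1 keeps its vendored copy pinned to 1.2.3...
    git -C project1/vendor/libA checkout v1.2.3
    # ...until it is updated to work with the new API:
    git -C project1/vendor/libA checkout v1.3.0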

~~~
michaelmior
Of course you don't need a monorepo to solve the dependency problem. My point
was that just using the latest version of all projects in different repos is
not necessarily an acceptable solution.

