
On Monolithic Repositories - tristanz
http://gregoryszorc.com/blog/2014/09/09/on-monolithic-repositories/
======
wtbob
I've worked with monolithic and project-based repositories, and in practice
I've found the problems with monolithic repos to be smaller than the problems
with project-based repos, and the benefits of monolithic repos to be greater
than the
benefits of project-based repos. Certainly there can be issues at a very large
organisation with very many extremely large projects—but most of us don't work
at those organisations with that many projects that large.

I think that having one large repo helps identify cross-project dependency
breakage faster, e.g. on a small team without fully-automated integration
tests by increasing the likelihood that the person or team who broke the
integration notices rather than the person or team who maintains the affected
components.

There's also the issue, as jacques_chester notes, of shared components, some
of which are far too small to be their own projects and which don't
necessarily make sense thrown into a pile with other projects.

Project-specific repos make a lot of sense from an organised, a-place-for-
everything-and-everything-in-its-place perspective, but real life is often
quite messy and mutable, and the proper organisation for a project can change
frequently (as the article notes); there's no sense chiselling it into stone.

~~~
viraptor
I think there's a missing distinction here: is it a monolithic repo for
products available separately, or for monolithic final deployable thing(s)?
(Google has multiple products, but it's still "google services".)

This may be better for them, but it wouldn't work for example for OpenStack,
which has many projects available and released separately. Putting nova,
stevedore, anchor, bandit, etc. in one repository just wouldn't make sense -
they have their own versioning and live their own lives, even if they will be
frozen/released in one go as a single working deployment in October.

So when the author writes "When I am interacting with version control, I just
want to get stuff done. I don't want to waste time dealing with multiple
commands to manage multiple repositories." they just don't have that use case.
They don't have to care about releasing different bits separately.

~~~
Nitramp
Google has self driving cars, YouTube, various Android apps, and web search in
the same repository. That does seem a lot more varied at first glance than
e.g. OpenStack, and different products certainly have extremely different
release cycles and even deployment platforms (hardware in cars, Android
phones, web servers).

The question really is whether you can scale your version control practices,
build tools, and source organization habits across many diverse projects.

~~~
viraptor
Do they really store YouTube, the self-driving cars, and Android in the same repo?

As far as I can tell, at least android bits live in _many_ repositories:
[https://android.googlesource.com/](https://android.googlesource.com/) \- I'd
say over 300, with one for each utility.

~~~
Confusion
Exactly. I frankly have no clue what this article is about. I work at a
company with < 10 developers. We started with one repo and now have over 20
repositories for various bits and pieces of our code. Each one maps to its own
releasable component. I don't recognize the 'I don't want to waste time
dealing with multiple commands to manage multiple repositories' at all. The
only time there is a difference on the cli is when you clone a repo. If
anything about this would hurt, we'd change it: optimizing our workflow is
something we pay a lot of attention to.

In fact, with hundreds of developers and everything in one repo, I don't see
how you'd ever be able to get a commit through: you'd be merging commits that
others just pushed all the time and would have to get lucky?

~~~
teraflop
The issue with multiple repositories has nothing to do with the number of
commands you have to run. As you say, that's the kind of thing that can easily
be automated.

The problems arise when you have to combine code from different repositories
into a single deployable product. Most of us don't take Amazon's hard-line
stance of making absolutely everything a microservice, so we end up with
libraries of reusable code that are referenced by multiple projects. But when
you store those libraries in separate repositories, it becomes impossible to
describe the state of your deployed code without listing the version of every
single dependency. That makes it easy for subtle inconsistencies and bugs to
creep in, especially when the dependencies are multiple levels deep and are
owned by different teams. If everything lives in the same tree, then a single
commit ID reproducibly describes a complete system from top to bottom. And you
can atomically make changes that cross module boundaries, which is difficult
to do safely with separate repositories.
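
To make the contrast concrete, a minimal shell sketch (repo names hypothetical):

    # Multi-repo: describing what is deployed means recording every repo's version
    for r in libfoo libbar app; do
      (cd "$r" && echo "$r $(git rev-parse HEAD)")
    done > deployed-versions.txt

    # Monorepo: a single commit ID pins the entire system
    git rev-parse HEAD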

I don't really follow your comment about merging. Pretty much every version
control system since forever has been smart enough to realize that, if I make
changes only to foo/src/ and you make changes to bar/src/, our changes don't
conflict and can be merged automatically without user intervention. (There
might be _technical_ difficulties; for example, if you're using Git, I would
imagine that trying to view the list of commits of a small subtree of a
gigantic repo might not be terribly efficient. But just like the issue of
managing multiple repos, that's something that you can solve with better tool
support, if you really need to and are motivated enough.)
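
For reference, the subtree log itself is a one-liner in git, though it has to walk the whole repo's history to compute it (path hypothetical):

    git log --oneline -- services/payments/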

~~~
Confusion
About merging (I'll get back to the other bit later): if I want to push my
change, I already regularly encounter the situation where git tells me: you
are not up to date, even if I pulled say 15 minutes ago. So I have to fetch
and rebase before I can push. Even if that doesn't require manually solving
any merge conflicts (and I think I have to in about 1/4th of the cases), it
takes a bit of time. In that bit of time another developer can push a commit.
If that already happens on occasion with 10 developers, I imagine it becomes
very problematic with 1000 developers on the same repo.
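
A minimal sketch of the loop being described (assuming a shared master branch):

    git push                   # rejected: the remote has commits you don't
    git fetch origin
    git rebase origin/master   # may need manual conflict resolution
    git push                   # can be rejected again if someone pushed meanwhile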

Of course you effectively organise your repos in a 'tree' like e.g. happens
with the Linux kernel, but, well, then you again have many repos. One per
component, with someone responsible for signing off on it. It reduces the
number of people committing to each specific repo, but in the end, it's still
many repos. So that's not 'one monolithic repo' in my book.

~~~
teraflop
Still, that's a limitation of the specific tools and the way you're using
them, not a fundamental problem with monolithic repositories.

You're running into problems because you have multiple people all trying to
push to the same branch, and Git errs on the side of extreme caution rather
than creating a merge that you didn't specifically request. If you use merge
requests instead, as supported by Github/Gitlab, you don't get blocked by
other people's commits.

( _Actual_ merge conflicts are an orthogonal problem. They'll happen whenever
you have multiple people editing the same code, regardless of what VCS you use
or how it's organized.)

The Linux kernel example is a bit of a tricky one. Yes, kernel.org hosts a lot
of repos, but they're all different versions of the same codebase. They have a
common commit ancestry, and commits get merged from one to another. So they're
not really separate modules in the sense that we're talking about; they're
more like namespaces that each contain a set of branches.

~~~
Confusion
I guess it comes down to organisational style. Using merge requests requires
some benevolent dictator (poor sod?) to perform the merges. For us, if your
commit is approved in Gerrit, it's your task to merge it. If some other commit
was merged in between, you have to rebase, possibly solving merge conflicts.
So I guess that with a monolithic repo, that wouldn't scale, but it could be
made to scale by appointing someone to perform the merge requests. I'm not
sure I would like to be that person...

~~~
Mathiasdm
I helped set up such a system for a few hundred developers. We had an
'automated Linus Torvalds', which did the merge, and aborted whenever a file
was changed on both sides of the merge.

In the good case (almost every time), there were no conflicts, and the merge
went fine (we had unit tests, builds and regression tests as extra checks in
our CI system).

In the bad case, the developer's request was rejected and the developer was told
to rebase or merge his code on his own, so the merge issues would be handled.
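
A minimal bash sketch of such an 'automated Linus' gate (branch names hypothetical):

    #!/bin/bash
    # Abort the automated merge if any file changed on both sides since the fork point.
    base=$(git merge-base master feature)
    comm -12 <(git diff --name-only "$base" master | sort) \
             <(git diff --name-only "$base" feature | sort) > /tmp/overlap
    if [ -s /tmp/overlap ]; then
        echo "rejected: files changed on both sides; please rebase/merge yourself" >&2
        exit 1
    fi
    git checkout master
    git merge --no-ff feature   # CI (unit tests, builds, regressions) runs after this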

------
jcranmer
I'm very much a fan of monolithic repositories, because I have had horrible
experiences working with project-based repos. I've been around long enough to
recall discussions of why people should stop using CVS, and invariably one of
the bullet points on that list is "CVS lacks atomic commits." Projects that
use multiple repositories invariably fall into the trap of having non-atomic
commits, and people who advocate multiple repositories have likely never had
the fun task of doing archaeology on those repositories, where the
non-atomicity suddenly becomes painful.

When I've brought this up before, people occasionally mention submodules or
subrepositories, and those are equally broken. They make big assumptions about
how you're going to organize repositories (i.e., a strict tree), and if your
design doesn't fit that organization scheme, you're up a creek. For practical
development, the subrepo tree effectively becomes one monorepo anyway: touch
the innermost subrepo, and now you need to add a commit to all the outer repos
to reference the new version of the innermost subrepo.

A saner way to handle repos is to recognize that it's not necessary for people
to have the full history of everything in the repository stored on their local
machine most of the time. This is something that SVN does better--you can
check out subdirectories of an SVN repo, but the commits are still atomic
across the entire repository.
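
For example (URL hypothetical):

    # Check out just one subdirectory of a large SVN repository
    svn checkout https://svn.example.com/repo/trunk/libfoo
    cd libfoo
    # A commit here still gets a single, repository-wide revision number
    svn commit -m "Fix frob handling"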

~~~
warkid
> A saner way to handle repos is to recognize that it's not necessary for
> people to have the full history of everything in the repository stored on
> their local machine most of the time.

Local history enables rebasing. Of course you can do it without local history
already present; in SVN you'd use separate branches for changes and 'rebased'
changes, where you merge your work on top of some new state of trunk. But this
involves creating branches (and at least one checkout for a separate branch
folder) and communicating with the server (which takes time). This all means
that people almost never do this. With git, rebase is a snap. But of course
one can live without rebase; it's not oxygen or something.

~~~
jcranmer
You can download the history you need for rebasing when you run the rebase
command, in effect turning the local .git (or .hg) store into a local cache of
a remote repository.

------
gnoway
I'm sitting here running git gc and repack on about 50 repos right now, of
varying sizes. We just actually combined two of our larger repos into one for
productivity reasons, so this article resonates with me a little on that
front.

I spend a lot more time on the build and administration side of things than
the code side, and I personally prefer more, smaller repos. Builds are faster
and less error prone, less disk space is used overall (regardless of cloning
scheme - I have used them all), and I do believe the separation and inherent
difficulty aids quality at the expense of productivity. I'm about the only
person in the company who does, though, and that tells me this discussion
depends more on how you personally interact with source control than any
abstract 'monolithic vs. not' ideal. Or that I'm crazy, but I refuse to accept
that.

------
pcwalton
Small repositories are great for open source because they encourage code
sharing. When your project consists of many pieces all owned by different,
often volunteer, teams, then having one big repository is a barrier to that.

For example, Servo, the project I work on, consists of around 150 small
repositories (in contrast to every other browser engine). Lots of them are
maintained by the Servo project, but lots of them aren't. They're separate
projects of which Servo is only one user of many. The fact that we can take
advantage of the fantastic work of the Rust community by simply adding a
couple of lines to our Cargo files has been invaluable in helping our small
team get the browser off the ground. A culture of monolithic repositories
would discourage code sharing, leading to more wheel reinvention and less code
collaboration overall.
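
For illustration, pulling in a community crate is typically this much in a Cargo file (crate name and version are illustrative):

    # In Cargo.toml
    [dependencies]
    url = "0.1"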

I think the model works well for Google and Facebook because they're big
centralized companies with thousands of engineers under one management and
reporting structure. But that's a far cry from the aggressive, fine-grained
code reuse culture that the Ruby, JS, and Rust communities (for example) have
fostered.

~~~
indygreg2
Thought experiment: the entire github.com URL space is a single repository.
Each organization/user has a top-level directory and projects/forks exist
under them. Does that prevent/encourage code sharing? Why or why not?

~~~
viraptor
Prevents. Instead of cloning just one of your repos and expecting it to work,
I clone `indygreg2` and need to answer:

\- is there one build system for everything, or one per project?

\- are there dependencies between projects?

\- are they symlink-vendored? (which means that potentially I need more than
one toolchain if projects A and B are in different languages)

\- are they completely separate? (do they always assume latest version of your
projects, or do they have reasonable version qualifiers)

\- are any elements included cross-project, or can I just copy one directory
and package it separately?

Those and other similar questions just don't exist in well-maintained projects
that have separate repos. I know that I can clone one project and build it.

~~~
indygreg2
Instead of cloning just one of your non-monolithic repositories and expecting
it to "just work" I need to answer:

\- How does the build system integrate multiple, discrete repositories into a
unified system?

\- What are the dependencies between the repositories?

\- How are the sub-repositories laid out on disk? Do the separate repositories
use separate toolchains?

\- How do I decide when to update the reference to a sub-repository? Are they
completely separate? Versioned as one logical entity?

\- Do separate repositories reference elements in each other? Can I copy files
between repositories or should certain files live in certain repositories?

These and other similar questions exist when you use multiple repositories.

(I hope you see that multiple, discrete repositories aren't a panacea and
there is a counterpoint to each of your points.)

~~~
viraptor
I don't agree with some of the counterpoints. Not because they couldn't exist
in theory, but because we've already worked out common solutions for them, and
by cloning a separate repository I expect that kind of problem to be one of:
solved, documented in big red letters, or worthy of a bug report.

1\. (build integration) All popular build systems have some dependency
management answers. Even down to C's `autotools` and `pkg-config` which will
at least tell you what you're missing. But more likely something like pip /
gem / cargo which can just get it for you. Whether that's a released version,
or another repo - none of my business.

2\. (dependencies between repos) Same as 1. They're separate projects.

I don't think these apply at all:

3\. (sub-repositories) I don't see a difference between sub-repos and
symlinking to a repo outside. This is a single-repo problem.

4\. (sub-repositories update) Same as 3 - it's the same as one repository -
avoid sub-repos unless you want to pretend you have one big repo with
everything.

5\. (moving elements) I think that's a straw man. Does anyone have a
reasonable expectation that a file containing code can be moved between
repositories without issues?

While no repository layout is perfect and there are always pros and cons, I
think those examples are really bad as counterpoints.

------
vicapow
I've worked both at a large company (Facebook) that used the monorepo approach
and at a large company that uses the per-project repo approach (Uber), and I
have to say I'm personally a VERY big fan of the project-based repo approach.
But every company is different. So is every team. If you're a small company
with primarily one service and primarily one programming language, the
monorepo way seems to be the best approach. On the other hand, if you're a
company that has embraced a service-oriented architecture, the per-project
repo approach is likely the way to go, especially if your company is OK with
services being written in a variety of languages and wants it to be as easy to
use open source code as it is to use code written within your org. It also
goes a long way in supporting local (i.e., laptop) development. Otherwise, the
entire codebase would be too big to fit in RAM.

Disclaimer: these opinions do not necessarily represent the opinions of my
employer.

~~~
indygreg2
If your concerns are driven by resource requirements, then I posit your
concerns are driven by limitations of _fully_ distributed version control
tools of today. Shallow and/or narrow clones (like the Subversion model) limit
the amount of data required on clients and thus facilitate monolithic
repositories without the extreme resource requirements on clients.

I posit that if Git or Mercurial allowed you to clone a subset of directories,
a monolithic repository and a set of smaller repositories would become
indistinguishable, as a clone of a sub-directory is functionally equivalent to
a standalone repository! The problem is that narrow clone is not implemented
in any popular DVCS tool today (but Mercurial is working on it).

~~~
glandium
I posit that if Git or Mercurial had better workflows for submodules,
monolithic repositories would seem less attractive.

~~~
indygreg2
I don't disagree. Although, for the case where you want to copy/move things
across repositories, monolithic repositories still have the advantage that
history is more easily preserved. Then again, you can argue that proper
submodule support would handle this and preserve history.

~~~
glandium
_Although, for the case where you want to copy/move things across
repositories, monolithic repositories still have the advantage that history is
more easily preserved._

Well, let's wait and see how partial clones actually handle this situation.
I'm not convinced a partially cloned monolithic repository will be better than
what submodules currently do.

------
jacques_chester
Another way to look at this is that repos quickly ossify into unplanned Conway
boundaries.

Where possible, the goal is to decouple software components by design, not by
backpressure from the toolchain.

I've seen the many-repo approach. It's particularly frustrating on distributed
systems when a shared component migrates from repo to repo like a sad ronin,
sometimes alighting in some of them more than once.

~~~
jamesrom
> Where possible, the goal is to decouple software components by design, not
> by backpressure from the toolchain.

Taken to the extreme, this produces a single main.c file for an entire
organisation.

Good software design should dictate the kinds of toolchain-backpressure
tradeoffs that need to be managed.

------
ElHacker
It all comes down to the scale factor.

The monolithic approach makes more sense for big companies that have a large
portfolio of projects, an army of software engineers, and the resources to
develop their own tools around an SCM. This solves several problems: handling
multiple commits per minute, resolving ever-present merge conflicts, code
search, and code sharing. Someone might argue that it can slow down
development by requiring tons of data on a local machine, but that can be
solved with custom tooling to download a sub-node of the overall repo. At this
scale it's the only feasible way to empower developer productivity and avoid
headaches in keeping track of a long array of projects.

The project-based repositories solution has nice qualities as well:
service-oriented projects, clear-cut responsibilities, and clear dependencies.
This seems to be a good solution for a small- to middle-scale organization.
You can design a well-documented interface for every service and even expose
some of those services to external clients when needed
([https://www.nginx.com/blog/microservices-at-netflix-architec...](https://www.nginx.com/blog/microservices-at-netflix-architectural-best-practices/)).
Not that this service design is incompatible with a monolithic approach; it's
just easier to arrive at a simple answer when knowledge is siloed.

At the lower scale, where most startups reside, either one seems a reasonable
approach. Although a startup has neither the resources nor the time to spend
developing custom tools to manage a monolithic repo, it turns out they don't
need to: they have a small number of developers and little ongoing commit
traffic. A plain simple git repo works as well as a multi-repo setup. This is
a matter of taste and of the organization the founders/early employees wish to
create for their project.

------
jamesrom
Option C: Custom tooling built for the purpose of managing collections of
repositories.

It appears the main argument for monolithic repositories is that it improves
developer productivity by giving the developer access to the entire
organisation's codebase.

What a terrible hack to provide something that could be better managed with
other tools. No DCVS is an island.

~~~
stinos
There's also the multiple repository tool 'mr'
([http://linux.die.net/man/1/mr](http://linux.die.net/man/1/mr)), which
addresses exactly the _I don't want to waste time dealing with multiple
commands to manage multiple repositories_ complaint of the author, and
moreover can do so for repositories using different version control systems.

I've been using it for years mainly because I have a few repos shared between
projects and jobs, and using one monolithic repo would mean having to copy
those repos around. (At least AFAIK - or how else do people using one big
repository use external projects?)
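
A quick sketch of typical mr usage (paths hypothetical):

    # Register existing working copies (they may use different VCSes)
    cd ~/src/libfoo && mr register
    cd ~/src/website && mr register
    # From a directory above them, one command fans out to each repo
    cd ~/src
    mr update    # runs 'git pull', 'svn update', etc. as appropriate
    mr status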

------
jinushaun
Someone needs to do for subrepos what git did for branches. Working with
subrepos is a nightmare.

~~~
saurik
Subversion ;P (the reason their branches were so awkward was because they were
essentially sub repositories...)

~~~
lfowles
So we combine the best of git, subversion, and perforce. perverted-git,
anyone?

~~~
pm
What's the advantage of Perforce (I don't know as I've never used it before)?

~~~
to3m
It's quite fast and it seems to scale well to large (though not necessarily to
ridiculously large) repositories. The largest one I've used it with had a 1TB
head revision - I don't know how large the history was - and a zillion files.
Performance was still fine. (This is probably its main advantage, really: you
can just put all your files in it. Then it's easy for everybody to get them,
and you didn't have to think too hard about it.)

It uses the check in/check out model, so there's no problem with unmergeable
binary files. There are per-user access permissions. Branches are folder
copies. There's a GUI tool, but you can do everything from the command line as
well (I believe that is exactly how the GUI tool does everything).
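
For a rough feel of the day-to-day commands (depot paths hypothetical):

    p4 sync //depot/libfoo/...          # fetch only the subtree you work on
    p4 edit libfoo/src/widget.c         # "check out" (open) a file for editing
    p4 submit -d "Fix widget sizing"    # atomic changelist, depot-wide revision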

(UX is a bit hit or miss though. There's no git-style index, and the command
line tool's output isn't as convenient to parse as you might like. On the
other hand the diff/merge tool is alright and the UI for keeping your branches
in sync is fine.)

I've never minded using it.

------
reissbaker
I work on the Developer Infrastructure team at Airbnb — essentially our tools
team — and have some experience with both sides of the coin. Airbnb's
monolithic vs project-based repository organization is currently split along
language lines on the backend: the Java folks prefer a single monorepo (and
have one), and the Ruby engineers use project-based repos (and have many).

There are a lot of good points made about the benefits of monorepos, and at
Airbnb we enjoy several of them. What hasn't been mentioned is the effort
required to do them well: you need specialized build and dependency tools to
ensure that you only run builds+tests for the single project that's being
changed; engineers have to check out extremely large amounts of data to work
on a single subdirectory, or else you need custom tooling to allow them to
only check out portions while still contributing to the larger whole; if
someone mistakenly breaks a piece of shared code and merges it to master,
every project is now broken and engineering work may be stalled unless you
have very good debugging tools and testing frameworks to quickly recover from
and prevent these kinds of issues.

The upfront costs of doing monorepos well are high, and doing them poorly is
in my experience a net productivity loss. For large companies with established
business models (and I'd consider Facebook, Google, and Airbnb to be some
degree of "large" and with "established business models," although Airbnb is
clearly still much smaller and pre-IPO), the tradeoff of allocating some
number of engineers to work on in-house tools for a much larger engineering
team is usually worthwhile, and monolithic repos start to become an attractive
option. I'd caution small companies or early-stage startups against monorepos,
though: when you're a twenty-person team, that amount of tools work just isn't
worth it. Use open-source tools, and spend the rest of your time shipping
product.

TL;DR: Facebook and Google have optimized their workflows for their size; if
you're not a Facebook or a Google, your mileage will probably vary.

~~~
qznc
Facebook uses Mercurial. Mercurial works well for small projects as well.

Mercurial has no clear technical advantage over git though. That might change
if Mercurial gets narrow clones (only check out subdirectories) working and
git fails to clone (haha) that.

~~~
glandium
IIRC there were experimental patches implementing it for git a few months ago.

Ironically, to make narrow clones happen, Mercurial is going to change its
internal structure representing the files in the repository (the manifest) to
use a separate manifest for each sub-directory... like git has been doing for
trees from day one.

~~~
qznc
I found patches from 2008, but nothing recent.

[http://article.gmane.org/gmane.comp.version-control.git/9634...](http://article.gmane.org/gmane.comp.version-control.git/96349/match=narrow+clone)

~~~
glandium
Your link is about narrow checkout, not narrow clone (so the .git data is
still transferred completely, as opposed to transferred partially). Maybe what
I'm thinking about was just a discussion with no formal patch; I don't
remember.

~~~
davvid
The two features of interest are "shallow clone" and "partial checkout".

Both of these features are already part of git.
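
For reference, a sketch of both with stock git (URL and path hypothetical):

    # Shallow clone: truncate history to the most recent commit
    git clone --depth 1 https://example.com/monorepo.git
    cd monorepo
    # Sparse (partial) checkout: only materialize one subtree in the worktree
    git config core.sparseCheckout true
    echo "services/payments/" > .git/info/sparse-checkout
    git read-tree -mu HEAD
    # Note: the full object database is still fetched; only the worktree narrows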

A current experiment is an untracked files cache, which speeds up stuff on
large repos considerably. This stuff is actively being worked on -- the git
project has always valued performance.

~~~
qznc
No, "shallow clone" and "partial checkout" are both different to "narrow
clone". Maybe "partial clone" would fit the git terminology better.

A shallow clone misses history after a certain point in time (well, commit
order). A narrow clone misses history except in a subtree. A partial clone
looks like a narrow clone in the workspace, but a narrow clone should have
substantially less objects to download and store.

------
wpeterson
Architecture is just dividing pain into different buckets.

Large repositories are painful, as each developer bears the pain of
integration every time they make a change, integrating with all of the other
code in the repository.

Small, distributed repos are painful when change accrues in larger increments
and must be resolved at release/integration time.

As a developer in one of the largest monolithic repos in the world, I feel the
pain every day of that pattern.

~~~
itsdrewmiller
Why do you think integration is more painful in larger repositories?
Presumably the architecture of the code is unchanged.

~~~
wpeterson
When you have separate repos, the external code and integration points remain
relatively static; i.e., you may consume an API, but it's versioned/snapshotted
at a given version.

With a monolithic repo, especially without development branching - everyone is
running against HEAD all the time. So while you are developing your component,
everything else is changing around you.

------
rdtsc
As someone already said, it is not "many repositories vs one" but often "many
repositories + custom tooling to merge, aggregate, diff, checkout, etc. vs
one".

We've tried both. We had one large repo (2G in size, mostly due to historical
mistakes of checking in large binaries). Now we have many little repos, but
we've spent I don't know how many man-months writing custom scripts (based on
gyp) to manage this collection of repos.

So far I would say it is a toss-up. For completely separate products that
share almost no code, it might make sense to have separate repos. But if you
find yourself building a custom version of the git porcelains on top of your
multiple repos -- you have probably gone too far.

------
tristanz
Interesting comment on that article by Siddharth Agarwal:

"(I work on source control at Facebook.)

Every problem you mentioned with monolithic repositories is a well-known
problem with Git (though some of them do have workarounds, such as clone with
--depth). None of them are issues in principle.

With Mercurial we're aiming to provide tooling that scales well while still
maintaining DVCS workflows like local commits."

~~~
lfowles
Related: [https://code.facebook.com/posts/218678814984400/scaling-merc...](https://code.facebook.com/posts/218678814984400/scaling-mercurial-at-facebook/)

------
comex
The most interesting takeaway for me is that Mercurial is planning to support
"narrow clones" soon, i.e. sparse fetches (i.e. you only have to download some
subtree of the whole repository). It would be great if Git followed suit at
some point - the whole monolithic versus project-based repository debate would
be a lot more interesting if it were just a matter of convention, as opposed
to the former being associated with "ew, SVN/Perforce/[insert old fashioned
feeling tool]".

------
yellowapple
I'd agree with this article's conclusions if the current-gen VCSes (like Git
and Mercurial) actually provided the necessary features and tooling to
effectively manage a monolithic repo. Comparing to SVN's ability to check out
portions of a repo instead of the whole repo doesn't do much good in systems
that aren't designed to handle that sort of workflow.

Whether or not one should go monolithic or project-based depends on how
tightly integrated those projects are. Folks like Google and Facebook and
OpenBSD's dev team develop all their things under a single source tree because
those things are pretty tightly integrated and designed to interoperate. Other
folks develop each of their things in a separate repo because they expect
those things to be independently useful (OpenBSD's subprojects _do_ happen to
be independently useful, but AFAIK the priority is generally "OpenBSD first,
everyone else second").

Multi-repository environments can also be very manageable depending on the
language, runtime, etc. Dividing things into do-one-thing-and-do-it-well
chunks (perhaps gems in Ruby land, or crates in Rust land, or packages in Perl
land, or whatever) goes a long way toward alleviating the typical non-
atomicity of multi-repo setups. Of course, this tends to be easier for open-
source projects than closed-source projects (though the closed-source camp can
have some of this fun, too; using Ruby as an example, one can run a private
gem server just by running the "gem server" command, or - if using Bundler -
can even install dependencies directly from Git repos), but it's still a
possibility.

Plus, the whole "well Google says it's more productive for them, so we should
take it as a general rule" vibe doesn't really sit well with me.

------
mirceal
I am definitely in the multiple repository camp, but with one caveat (below).

With the monolithic approach everybody is forced to be on the latest version
(or to update to the latest version). There can be a lot of contention when
trying to get things out - especially when things are moving fast. You need a
lot of discipline to make it work and you also are forced to constantly update
parts of the code you own. Also, we tend to forget about deployment and
management of the build artifacts. I am willing to bet that bigCo's that do
this have a separate system for managing the artifacts and performing the
deployments.

In the presence of a decent build and deployment system the only way to go is
multiple repos - a logical separation per component / service. The build
system can easily track dependencies between components via metadata associated
with each of them. You can easily orchestrate builds when something happens,
it's easier to rebuild only what changes and you get granular control over
build artifacts.

------
sandGorgon
can someone talk about how you keep track of commit messages when committing
to different subfolders/subprojects of a monolithic repository?

We have 6 projects on bitbucket - and we would love to move to a monolithic
model (we have already seen integration issues because of developer
carelessness with deployments, that would go away)... but I'm just not sure on
how we will do things like Slack/Hipchat integration, emails on commits, etc.

------
Zardoz84
Where I work we have a monolithic Subversion repo that contains:

\- DB DDLs

\- Selenium automatic tests

\- A hierarchy with our Java code, where we have our libraries, modules, final
products and client personalizations of our products.

Our build solution relies on Maven, with a Nexus server and Jenkins to
auto-deploy Maven artifacts.

Pros:

\- We can do a single commit across all of it to update many modules/libs/final
products to fix an issue or implement something new, so tracking an issue fix
or a new feature across everything is easier.

Cons:

\- We can't work like in git, where we can create a branch to develop a new
feature or fix an issue isolated from other changes.

\- Enforces big commits.

Actually, I managed to use git svn to map our Subversion repository, but I
needed to create many small local git repositories to track every
module/lib/product/personalization that I work on. I also have a few scripts
to update my local git repos (master branch) against the Subversion repo. This
allows me to be more flexible when I work on a new feature or a potentially
dangerous change, as I can create local branches, do local commits (and squash
them before sending to Subversion), stash, etc.
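
A sketch of that kind of git-svn round trip (URL and branch names hypothetical):

    git svn clone https://svn.example.com/repo/module-a
    cd module-a
    git checkout -b risky-feature   # cheap local branch, invisible to SVN
    # ...commit locally; squash with 'git rebase -i' before publishing...
    git svn rebase                  # pull in new SVN revisions under your work
    git svn dcommit                 # replay local commits as SVN commits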

I think that with this I grab the best of both ways of working. I can be more
flexible and more productive, as I can switch branches of a project/module/lib
with a few keystrokes, and do local code versioning. At the same time, I only
miss doing a single commit across modules/libs/products when resolving an
issue requires changes across many of them. In that case, I try to use the
same commit message, or at least put the issue number in the commit message,
to make it easy to track in Fisheye.

------
barries1
We use Perforce (since before DVCSs came along) and follow the monolithic
approach with 100s of projects. We use git and mercurial when interfacing to
clients' repositories and for small, local temporary repos to structure work
that shouldn't clutter our main repository.

The monolithic approach works so smoothly because Perforce makes it trivial to
only check out the directories we're working on. It has huge advantages when
we're working on interdependent projects.

Perforce does have some historical oddities (top level directories are called
"depots" and are slightly different than normal directories), but the ability
to branch, merge, and check out using normal filesystem concepts is a huge
usability boon.

------
_pmf_
I was a proponent of project-specific repositories before I actually had to
work with a large-scale system that consistently handled development in this
way. This taught me the hard way that managing transitive SVN externals
(project A refers to project B as an external, which refers to project C as an
external; I've had a hierarchy of 5 levels) is an incredibly tedious affair
(tagging to get a consistent, pegged state of the main project is now a
multi-hour process instead of a single mouse click).
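
For the record, pegging a single external looks like this (URL and revision hypothetical); multiply by every project and every nesting level to see where the hours go:

    # Peg one external to a fixed revision; this must be repeated at every level
    svn propset svn:externals '-r1520 https://svn.example.com/libB/trunk libB' .
    svn commit -m "Peg libB for the release" .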

That alone justifies a single repository.

TortoiseSVN has a feature of pegging externals at a specific release when
tagging the main project, but this only works at one level.

------
flagZ
I agree with the author... In the context of startups that are very much in
"discovery" mode (still making big changes to their product), breaking the
repository into subparts is just a source of confusion for the tech lead. I
would also argue it is bad for devs, because they then need to manage
version-binding between different repos. The many-repos approach is only
justifiable if you have many teams with many tech leads, each _independently_
doing releases.
------
TheAceOfHearts
How do people do CI with monolithic repositories? Do they run ALL the tests or
is there an easy way around that?

~~~
QuercusMax
At Google, the dependencies for everything are very well specified, so in most
cases the code affected by a single change is quite small. Dependencies are
often specified on a per file basis. It's kind of a pain in the ass, but being
able to know for sure you won't break stuff is very powerful.

Also, they have a ridiculous number of machines dedicated to running automated
tests. My tech lead told me "feel free to add as many tests as long as they're
under 30 seconds each".

I was super skeptical of Perforce before starting at Google, but it works
really well, especially in conjunction with the code review tools and
processes.

~~~
cpeterso
> Dependencies are often specified on a per file basis. It's kind of a pain in
> the ass, but being able to know for sure you won't break stuff is very
> powerful.

How are the dependencies specified? Is this a language-specific tool?

~~~
gefh
Bazel is the open-source version of the internal build tool; it is
multi-language. BUILD files in each directory specify dependencies and more.
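
A sketch of what such a BUILD file can look like (target names hypothetical):

    java_library(
        name = "payments",
        srcs = glob(["src/main/java/**/*.java"]),
        deps = [
            "//common/logging",   # dependencies are explicit, per target
            "//common/metrics",
        ],
    )

With dependencies declared at that granularity, CI can rebuild and retest only the targets whose transitive dependencies actually changed.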

------
KB1JWQ
I feel like I'm in the minority here; even my dotfiles live in separate
repositories, managed by vcsh
([https://www.github.com/richih/vcsh](https://www.github.com/richih/vcsh)).

~~~
mirceal
minority as in multiple repositories? eh... I don't think so. I would say a
lot of people, given the choice, would go for this approach.

