
Case Study: Monolithic Repository at Google (2018) [pdf] - svat
https://people.engr.ncsu.edu/ermurph3/papers/seip18.pdf
======
svat
This paper from last year surveys Google “engineers who have experience with
both monolithic repos and multiple, per-project repos” to see what exactly
they like and dislike about monorepos. The results match my experience, and I
find the paper well-written (I especially like the “Threats to Validity” section
that says what biases may have affected the results; more papers and arguments
should have them IMO).

Older reading on monorepos that I'm aware of:

• Dan Luu, _Advantages of monorepos_ (2015)
[http://danluu.com/monorepo/](http://danluu.com/monorepo/)

• _Why Google Stores Billions of Lines of Code in a Single Repository_ (2015):
[https://cacm.acm.org/magazines/2016/7/204032/fulltext](https://cacm.acm.org/magazines/2016/7/204032/fulltext)
(HTML) / [https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/pdf](https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/pdf)
(PDF) / [https://www.youtube.com/watch?v=W71BTkUbdqE](https://www.youtube.com/watch?v=W71BTkUbdqE)
(video)

This paper is from the “Software Engineering in Practice” track of the ICSE
2018 conference:

• [https://www.icse2018.org/track/icse-2018-Software-Engineering-in-Practice](https://www.icse2018.org/track/icse-2018-Software-Engineering-in-Practice)

~~~
svat
What engineers like the most:

• “I can search across our company's source code”

Interesting point: “Reduction of cognitive load” and “available developer
tools” are cited as advantages of both monorepos and multi-repos.

~~~
reacweb
Yes. There are different kinds of programmers, with different ways of thinking.
Just as there are personality tests, there could be tests to identify what type
of developer you are, so you can select the set of tools and configurations
that best matches you.

~~~
0xdeadbeefbabe
Would it be naughty (legally) to discriminate based on the results of this
test?

------
01100011
This topic is going to keep coming up because there isn't a single correct
answer for all situations. It's like arguing about which editor is better.

I have my opinions, but ultimately you need to choose what is right
for your team and your project. You should also be open to new ideas and
look for ways to improve your current development processes and tooling. At
the end of the day, version control is a means to an end and shouldn't
consume a significant amount of your developers' time.

~~~
ithkuil
I wonder if there is some fertile ground for tooling that can bridge the gap,
so you don't have to make an actual binding choice when you set up your
project. For example, tooling that federates multiple repos in a way that
preserves most or all of the advantages of a monorepo.
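One existing example of this kind of bridging is Android's `repo` tool, which reads an XML manifest listing many git repos and the revision to check out for each. The sketch below is not that tool: it's a minimal, hypothetical Python version, assuming a manifest of (name, url, revision, path) entries, that lists the git commands it would run and derives one ID for the whole combined checkout, which is something a monorepo gives you for free with a single commit hash.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Project:
    """One entry in a hypothetical manifest that federates several repos."""
    name: str
    url: str
    revision: str  # exact commit hash or tag, so the combined workspace is reproducible
    path: str      # where this repo is checked out inside the workspace


def checkout_commands(manifest: List[Project]) -> List[str]:
    """Git commands that would materialize the pinned multi-repo workspace."""
    cmds: List[str] = []
    for p in sorted(manifest, key=lambda p: p.path):
        cmds.append(f"git clone {p.url} {p.path}")
        cmds.append(f"git -C {p.path} checkout {p.revision}")
    return cmds


def snapshot_id(manifest: List[Project]) -> str:
    """One short ID for the whole workspace, playing the role of a monorepo commit.

    Order-independent: entries are sorted by path before hashing.
    """
    h = hashlib.sha256()
    for p in sorted(manifest, key=lambda p: p.path):
        h.update(f"{p.path}@{p.revision}\n".encode("utf-8"))
    return h.hexdigest()[:12]
```

Bumping any one repo's pinned revision changes the snapshot ID, so "which exact set of code was this built from?" still has a single answer.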

------
shouldnt_be
For a company like Google or Facebook, where everything is done in-house, this
could be feasible; they had to build a lot of tooling to accommodate their
engineers. But for a smaller company that uses tools like git, IntelliJ,
VS Code, Jenkins, etc., plus whatever it uses to deploy its applications, don't
the size and complexity (number of files and their structure) eventually become
a problem? What's the size of their repo? And at what point does it become too
much for most software to handle?

------
lassejansen
So how does it work if a library owner changes something in their library? Do
they have to update all projects that depend on it and then push this as a
single commit?

~~~
svat
Generally it's on the library owner to update all projects that depend on it,
i.e. not break their tests, but this need not be done as a single commit. More
concretely:

If the change is internal to the library, i.e. does not require the projects
that depend on it to change their code, then the library owner simply checks
that none of the affected tests break (globally) and submits the library
change.

If the change requires dependent projects to change their code, then the owner
first finds a way to have code that works with both versions of the library,
then updates all the callers individually (e.g. in a separate, mostly-automated
change for each project to review), and then we're back in the first situation
above.
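A minimal sketch of the "works with both versions" step, assuming a Python library where a hypothetical `parse_date(s)` is being replaced by a hypothetical `parse_timestamp(s, tz)`. The old name stays as a thin deprecated wrapper around the new one, so nothing breaks while callers are migrated one reviewable change at a time.

```python
import warnings
from datetime import datetime, timezone, tzinfo


def parse_timestamp(s: str, tz: tzinfo) -> datetime:
    """New API (hypothetical): callers must state which timezone `s` is in."""
    return datetime.strptime(s, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)


def parse_date(s: str) -> datetime:
    """Old API (hypothetical), kept only as a shim while callers migrate.

    It preserves the old behavior (assumed here to be "always UTC") so that
    unmigrated projects keep passing their tests, and it emits a
    DeprecationWarning so the remaining call sites are easy to spot.
    """
    warnings.warn("parse_date is deprecated; use parse_timestamp",
                  DeprecationWarning, stacklevel=2)
    return parse_timestamp(s, timezone.utc)
```

Each per-project migration change then swaps `parse_date(s)` for `parse_timestamp(s, timezone.utc)`; once nothing calls `parse_date`, the shim can be deleted.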

“Old APIs can be removed with confidence, because it can be proven that all
callers have been migrated to new APIs.” (from the older article)
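At Google that "proof" comes from their own global code search and build tooling. As a rough, hypothetical stand-in for a checkout of Python code, the sketch below uses the standard `ast` module to list every remaining call to an old function such as `parse_date`. It only catches direct calls by name (`f(...)` or `mod.f(...)`), not aliased imports or `getattr` tricks, so an empty result is strong evidence rather than a real proof.

```python
import ast
from pathlib import Path
from typing import Dict, List, Optional, Tuple


def _called_name(call: ast.Call) -> Optional[str]:
    """Name being called: `f(...)` -> "f", `mod.f(...)` -> "f", otherwise None."""
    if isinstance(call.func, ast.Name):
        return call.func.id
    if isinstance(call.func, ast.Attribute):
        return call.func.attr
    return None


def find_calls_in_sources(sources: Dict[str, str], func_name: str) -> List[Tuple[str, int]]:
    """(file, line) of every call to `func_name` across the given Python sources."""
    hits: List[Tuple[str, int]] = []
    for filename in sorted(sources):
        try:
            tree = ast.parse(sources[filename], filename=filename)
        except (SyntaxError, ValueError):
            continue  # unparseable file; a real tool should report it, not skip it
        for node in ast.walk(tree):
            if isinstance(node, ast.Call) and _called_name(node) == func_name:
                hits.append((filename, node.lineno))
    return sorted(hits)


def find_calls(root: Path, func_name: str) -> List[Tuple[str, int]]:
    """Same search over every .py file under `root` (e.g. a monorepo checkout)."""
    sources = {
        str(p.relative_to(root)): p.read_text(encoding="utf-8", errors="replace")
        for p in root.rglob("*.py")
    }
    return find_calls_in_sources(sources, func_name)
```

When `find_calls(Path("."), "parse_date")` comes back empty, the old function can be removed in one change.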

------
kureikain
I myself love monorepos, but the CI is a mess.

I'm using Jenkins, and right now I don't have a way to build only a specific
directory :( It builds the entire thing :( Does anyone know how to solve this
problem? I have a Rails app, Go services, iOS, and Android all in the same repo.

It's interesting that we're moving to a microservice architecture while trying
to consolidate the codebase into one repo.
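A common workaround for "it builds the entire thing" is a small script that runs first in the CI job, looks at which files changed, and only triggers the builds for the affected top-level directories. Here's a minimal Python sketch; the directory layout, the build commands (including `./build.sh`), and the `origin/master` base branch are all assumptions for illustration. Build systems like Bazel do this more precisely from the real dependency graph.

```python
import subprocess
from typing import Dict, Iterable, List

# Hypothetical layout: top-level directory -> command that builds/tests it.
PROJECTS: Dict[str, str] = {
    "rails/": "cd rails && bundle exec rake test",
    "go/": "cd go && go test ./...",
    "ios/": "cd ios && ./build.sh",
    "android/": "cd android && ./gradlew test",
}


def changed_files(base: str = "origin/master") -> List[str]:
    """Files changed on this branch since it diverged from `base` (git three-dot diff)."""
    out = subprocess.run(["git", "diff", "--name-only", f"{base}...HEAD"],
                         capture_output=True, text=True, check=True).stdout
    return [line for line in out.splitlines() if line]


def builds_to_run(changed: Iterable[str], projects: Dict[str, str]) -> List[str]:
    """Build commands whose directory contains at least one changed file.

    A change outside every known project (e.g. shared config at the repo
    root) conservatively triggers all builds.
    """
    selected = set()
    for path in changed:
        owners = [d for d in projects if path.startswith(d)]
        if not owners:
            return [projects[d] for d in sorted(projects)]
        selected.update(owners)
    return [projects[d] for d in sorted(selected)]
```

The Jenkins job would then run `builds_to_run(changed_files(), PROJECTS)` and execute only the commands it returns. The "anything unknown rebuilds everything" rule trades speed for safety.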

~~~
oweiler
Wouldn't git submodules solve your problem? You'd need a separate repo for
each service, though.

~~~
ses1984
IMO this is the worst of both worlds.

~~~
shouldnt_be
Nice explanation

------
ridiculous_fish
During my 18 months at Google circa 2015, I used the following repos: google3,
Android, ChromeOS, and Chrome browser. 1 is 4 for large values of 1, after
all.

One telling question is whether there have been any efforts to unify these
repos. Monorepos can come about passively, through uncontrolled growth. But if
you truly believe in the monorepo, surely you'll knock down the barriers and
unify the distinct repos. Has Google done that?

If they've actively chosen not to, why not? Is Android separate from google3
for hysterical raisins, or on sound principles, or some mix? Is there a
cross-platform repo for shared code?

~~~
tylerl
Yeah no. It's because the development model is part of the product for things
like Chromium and Android.

You can add a number of other codebases to your list; Kubernetes has its
source of truth on GitHub, for example.

These all add complexity, and not in a good way. With enough small repos you
lose the ability to centrally monitor dependencies and make sweeping fixes.
For example it becomes difficult to do things like safely patch your entire
codebase all in one go to defend against the next embargoed vulnerability,
where you have weeks or days to deploy a fix to all your products before it
goes public.
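The "centrally monitor dependencies" point boils down to a graph query: with the whole dependency graph in one place, "which products use the vulnerable library?" is a walk over reverse edges. Bazel's query language has an `rdeps(universe, x)` function for exactly this. The Python sketch below is a hypothetical, minimal version of the same idea; the Bazel-style target labels are made up for illustration.

```python
from collections import deque
from typing import Dict, Iterable, Set


def affected_by(deps: Dict[str, Iterable[str]], vulnerable: str) -> Set[str]:
    """Every target that depends on `vulnerable`, directly or transitively.

    `deps` maps each target to the targets it depends on directly.
    """
    # Invert the graph once: dependency -> the targets that use it.
    users: Dict[str, Set[str]] = {}
    for target, direct in deps.items():
        for d in direct:
            users.setdefault(d, set()).add(target)

    # Breadth-first walk up the reverse edges; `seen` also guards against cycles.
    seen: Set[str] = set()
    queue = deque([vulnerable])
    while queue:
        node = queue.popleft()
        for user in users.get(node, ()):
            if user not in seen:
                seen.add(user)
                queue.append(user)
    return seen
```

With many small repos, the same question first requires collecting every repo's dependency list, and any repo you don't know about silently drops out of the answer.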

~~~
joshuamorton
But k8s does live inside the monorepo's third_party directory. And then there's
stuff like TensorFlow, where the source of truth is in the monorepo, but PRs
via GitHub are accepted and merged in.

------
ddtaylor
Is it possible to break this down into a set of generic pros and cons?

------
Isamu
I have had only good experiences with it, but I think it could be unmanageable
without automated integration testing (with reverts). That is, everyone's
submits are subject to a build (if the build fails, your submit is
automatically reverted) and a battery of tests (if a test fails, your submit
is automatically reverted). So you have to be able to submit code changes and
test changes together as one change list.
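The submit-then-verify loop described above can be modeled in a few lines. This is a toy, single-threaded Python sketch: `Trunk`, `submit`, and the build/test callables are hypothetical stand-ins for real version control and CI machinery, and a real system also has to cope with other submits landing between yours and its test results.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Check = Callable[[List[str]], bool]  # takes the trunk's change list, returns pass/fail


@dataclass
class Trunk:
    """Hypothetical in-memory stand-in for the shared branch: an ordered list of change IDs."""
    changes: List[str] = field(default_factory=list)


def submit(trunk: Trunk, change_id: str, build: Check, tests: List[Check]) -> bool:
    """Land a change, then build and test the new trunk; revert it on any failure.

    A change list should carry its code *and* the matching test updates, since
    both are checked together against the same trunk state.
    """
    trunk.changes.append(change_id)
    ok = build(trunk.changes) and all(test(trunk.changes) for test in tests)
    if not ok:
        trunk.changes.pop()  # automatic revert: drop the change that was just landed
    return ok
```

If the test update were submitted separately, whichever half landed first could fail the checks and get reverted, which is why both have to travel in one change list.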

------
tozeur
I wanted to see an architecture diagram :’(

