
Monorepos - luu
https://qafoo.com/resources/presentations/froscon_2015/monorepos
======
bsimpson
Everyone I've ever met who's worked at Twitter _hates_ the monorepo.

They have a single git repo with _way_ more files than that tool was ever
designed to maintain. As such, normally instantaneous commands like status,
log, ls, etc. take an inordinate amount of time to execute. Apparently
their onboarding process is "have someone copy his working copy to a flash
drive, then copy it from there to your laptop" because git pull would take too
long.
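
(Aside: git has since grown features aimed at exactly this pain point. A
hedged sketch of one of them, sparse checkout, using a throwaway local repo
with an invented two-service layout:)

```shell
# Sketch of one mitigation for huge repos: sparse checkout, which keeps
# only the subtrees you work on in the working tree. The repo layout
# below is invented purely for illustration.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/big"
mkdir -p "$tmp/big/serviceA" "$tmp/big/serviceB"
echo a > "$tmp/big/serviceA/main.py"
echo b > "$tmp/big/serviceB/main.py"
git -C "$tmp/big" add -A
git -C "$tmp/big" -c user.email=a@b -c user.name=x commit -q -m init

# Clone, then restrict the working tree to serviceA only.
git clone -q "$tmp/big" "$tmp/work"
git -C "$tmp/work" sparse-checkout set serviceA
ls "$tmp/work"    # serviceA remains; serviceB is dropped from the worktree
```

Commands like status then only have to consider the materialized subtree,
which is the point.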

~~~
neilk
This has to be a sick joke. The monorepo at Google was the bane of my
existence, and of everyone else I knew.

I don't know what it's like now, but at the time Google was on svn, and there
had to be a specially hacked version of it. It used a home-grown internal
requirements system to pull down the tree in a sparse way, and to accomplish
that you had to re-specify all your dependencies in these special text files
(because specifying deps in two places is obviously better). And then the tool
would spend a lifetime figuring it all out, and then you would still download
like 35% of the repo anyway, because the nature of giant repos is to get
tangled together.

With languages like PHP it might not be so bad, as you typically are editing
one file at a time. But most of Google is Java, and most people like using
IDEs for that, so it was a disaster. An IDE likes to touch everything all the
time, to re-index and re-compile. And, fun fact: at the time Google
recommended you store your entire home directory on an NFS network drive, for
security. Everyone quietly ignored that.

It was better in one way: everyone could see everyone else's source, re-using
libraries was more common, and branching and contributing patches was more
common. It prevented people from hoarding their source code, and just throwing
.jars or .so's over the wall. But I think the Github era has shown that this
did not require everything being in one giant repo! I mean, at the time,
Google had Codesearch and other great tools; even though git wasn't super
common at the time, this could have been solved in other ways.

My current company is in the process of breaking up their monorepo, even
though our codebase is nowhere near the same size. Lately I get to taunt my
co-workers because I created a new project in the new broken-up way. All the
tests on my micro-repo pass in 5 seconds, and it takes many minutes for the
big one.

~~~
neilk
Missed the edit window, but of course it was p4 and not svn. Small blessings.

------
dlitz
> Facebook, Google and Twitter

Aren't these the companies that are all terrible at taking outside
contributions, except for the projects they don't manage in their monorepos?

~~~
falcolas
Based on some brief experience at Oracle, accepting outside contributions is a
legal nightmare - it is frequently less expensive to do a clean-room
reimplementation of a feature than to go through the lawyers and contract
negotiations, now and in the future, to get code ownership rights worked out.

~~~
mcs_
Agree with this. I think monolithic repos are part of a legal strategy in the
enterprise.

------
fsloth
The key is enabling maximum developer leverage - and one aspect of this is to
get rid of as many systems errors and inefficiencies as possible.

I think build systems and internal code management schemes form one large
interconnected system. It feels fairly pointless to expose just one facet of
it, since the total system encompasses all of the individual entities dealing
with the codebase - developers, platform, the whole works.

For example, I see no overhead in my daily work from the fact that our
codebase is split into several libraries with dependencies through explicit,
built release versions. On the contrary, a monorepo there would be just
horrible.

So, before saying X is great, one should first define what type of software is
built, what the specific toolchain is, what the specific configurations are,
how many developers modify the same codebase frequently, and so on.

~~~
beberlei
hi, author of those slides here. I actually mention during the talk that it
over-emphasizes the awesome parts to get the point across. It definitely is
not a technique to use everywhere, so I agree with you.

It works for some kinds of projects, and I was motivated to give this talk
because I saw so many projects using many-repos in cases where they may not
fit very well.

~~~
fsloth
Hi, yeah, sure, effective communication requires simplification.

A few points in the slides got me concerned - namely, the emphasis on how
_easy_ this makes it to change things. I would argue it is not a positive
thing that adding a library is cheap and easy, and even more so that component
versioning can be bypassed.

It all sounds like grievous local greedy optimization without a view of the
entire lifecycle of the codebase. The slides say _change_ is the only
constant. I would claim that unless the change is backed by good and strong
architecture - which throwing all architectural firewalls away does not
support - then the result will be an ever-worsening mess of spaghetti as the
codebase ages.

But, I may just be too much set in _my ways_ of coding and I respect the fact
that these may be non-issues in the particular domain these slides address.

------
StavrosK
Is it really a single big repo for the whole organization? What happens if you
want to deploy your code on servers? Do you have to upload all the mobile app
code and history as well? Doesn't the history get very polluted this way?

~~~
NickNameNick
Are you really using a checkout on a production server to install and
configure your software?

This feels like a "Doctor, doctor - it hurts when I stick a spoon in my eye"
moment - "Don't do that!"

Build tools and build servers exist for a reason.

Subversion has "svn export" if you insist on working that way. With git you
can "git clone --depth 1 repo && rm -rf repo/.git" if you really, really must.

And what you are calling polluted history, they are calling efficient
collaboration.
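
(The "shallow clone, then strip VCS metadata" deploy above can be sketched
end-to-end; a throwaway local repo stands in for the real origin here:)

```shell
# End-to-end sketch of the "shallow clone, then delete .git" deploy
# described above. A throwaway local repo stands in for a real origin.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/origin"
echo "print('hello')" > "$tmp/origin/app.py"
git -C "$tmp/origin" add app.py
git -C "$tmp/origin" -c user.email=a@b -c user.name=ci commit -q -m "release"

# Deploy: a history-free copy of the tree, with no .git left behind.
# (file:// is needed for --depth to take effect on a local clone.)
git clone -q --depth 1 "file://$tmp/origin" "$tmp/deploy"
rm -rf "$tmp/deploy/.git"
ls "$tmp/deploy"    # only the working tree ships to the server
```

Only the tip of history is transferred, and nothing under .git reaches the
production box.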

~~~
StavrosK
> Build tools and build servers exist for a reason.

What's the build server going to do? Copy the Python code from the repo to the
other server? It doesn't seem like setting up a build server just to do "git
checkout; rm .git" is worth the effort.

~~~
NickNameNick
I mostly work with java, so the build server is doing a clean compilation, to
ensure the code works outside my dev environment. Pulling in known versions of
dependencies, some packaging, unit and integration tests etc. Mostly using
Maven.

Whilst some of that is somewhat specific to java, I'd still want a build
server to ensure I hadn't managed to pollute my dev environment with an
undeclared dependency before I pushed my code to production. And I'd still
want to run at least some integration tests, even for the simplest project.

And I really REALLY don't want my internet facing servers to be able to reach
into my internal network, which means a checkout simply isn't going to work
anyway.

Lastly I don't want my code on those servers either. You can set up .htaccess
or equivalent to prevent access to .svn or .git (or dotfiles in general..) but
those files really shouldn't have been there in the first place.

~~~
StavrosK
> I'd still want a build server to ensure I hadn't managed to pollute my dev
> environment with an undeclared dependency before I pushed my code to
> production.

In my case, the production server is the build server. You can't have dev
dependencies there, because you don't develop there.

> And I'd still want to run at least some integration tests, even for the
> simplest project.

Integration tests run on the dev or CI server, in my case.

> And I really REALLY don't want my internet facing servers to be able to
> reach into my internal network, which means a checkout simply isn't going to
> work anyway.

The code servers aren't in the internal network, in my case. They're already
internet-facing, since I need to pull from them.

> Lastly I don't want my code on those servers either. You can set up
> .htaccess or equivalent to prevent access to .svn or .git (or dotfiles in
> general..) but those files really shouldn't have been there in the first
> place.

In my case, if my code isn't on the server, the server doesn't run. There's no
need to set up .htaccess, since the HTTP server doesn't serve anything off the
filesystem.

~~~
NickNameNick
>> I'd still want a build server to ensure I hadn't managed to pollute my dev
>> environment with an undeclared dependency before I pushed my code to
>> production.

> In my case, the production server is the build server. You can't have dev
> dependencies there, because you don't develop there.

The problem is not that dev dependencies exist, the problem is that they might
be undeclared or unrecorded. I have ImageMagick on my dev machine, but if it
isn't installed as part of the server setup script, then it won't be on the
production server. The tests might run fine on a dev server that happens to
have ImageMagick on it, but fail in production.

> Integration tests run on the dev

If you run your integration tests in your dev environment, you might not catch
some on the fly configuration changes or utility installations - undeclared
dependencies.
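
(The point above is essentially an argument for declaring every system-level
dependency in one checked-in setup script, so dev servers and production are
built identically. A minimal sketch; the package list is hypothetical, with
ImageMagick included precisely because it is the kind of tool that tends to
go undeclared:)

```shell
# Hypothetical provisioning sketch: one checked-in list of system
# packages acts as the single source of truth, so a tool that merely
# happens to exist on a dev laptop (like ImageMagick) can't silently
# stay an undeclared dependency of production.
PACKAGES="python3 imagemagick"

install_all() {
    for pkg in $PACKAGES; do
        # A real script would run e.g.: apt-get install -y "$pkg"
        echo "would install: $pkg"
    done
}

install_all
```

Running the same script on dev servers and production is what catches the
"works on my machine" dependency before deploy.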

~~~
StavrosK
I'm using "dev environment" and "dev server" distinctly. The dev environment
is whatever I have on my computer, the dev server is a server that mirrors
production. If there's a dependency on your dev environment that isn't on
production, you'll catch it when you deploy to one of the dev servers. Same
with the integration tests.

------
zokier
> Require Trunk-Based Development

This, I think, needs a bit of expanding. Does he mean in contrast to feature
branches? Why would monorepos not be compatible with feature branches?

~~~
shoo
Not to answer your questions directly, but Paul Hammant has written a bunch
about this over the years:

http://paulhammant.com/2013/04/05/what-is-trunk-based-development/

http://paulhammant.com/2013/05/06/googles-scaled-trunk-based-development/

http://paulhammant.com/2014/04/03/microsofts-trunk-based-development/

http://paulhammant.com/2014/01/06/googlers-subset-their-trunk/

https://dzone.com/articles/legacy-app-rejuvenation

~~~
zokier
I just read the first link (adding the others to my reading backlog). Based on
it, it seems like trunk-based development is less of a _development_ model and
more of a _release_ model. In particular, it does not seem to conflict
directly with the feature-branch model.

~~~
shoo
Perhaps partly. But Paul also mentions "Developers do not break the build with
any commit" -- that places a pretty heavy constraint upon regular development
outside of releases.

See also the so-called "not rocket science rule of software engineering":
https://graydon.livejournal.com/186550.html
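
(That rule can be sketched as a tiny gatekeeper script: merge the candidate
into a scratch branch, run the tests there, and only fast-forward the trunk
if they pass. Branch and file names below are invented; real tools like bors
automate this flow:)

```shell
# Minimal sketch of the "not rocket science" rule: the trunk only ever
# advances to a commit whose *merged* state has passed the test suite.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git config user.email gk@example.com
git config user.name gatekeeper
trunk=$(git symbolic-ref --short HEAD)   # master or main, depending on git

echo 'exit 0' > test.sh                  # stand-in test suite
git add test.sh && git commit -qm "trunk: add test suite"

git checkout -qb feature
echo hello > feature.txt
git add feature.txt && git commit -qm "feature work"

# Gatekeeper: test the merged result on a scratch branch first.
git checkout -q "$trunk"
git checkout -qb scratch
git merge -q --no-edit feature
if sh test.sh; then
    git checkout -q "$trunk"
    git merge -q --ff-only scratch       # trunk now points at a tested commit
fi
git branch -qD scratch
```

Because the tests run against the merge result rather than the branch tip,
the trunk cannot break even when two green branches conflict semantically.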

~~~
speedkills
See also git-dmz, or Atlassian Bamboo's gatekeeper feature. TeamCity has
something similar too. I'm surprised the guy who wrote bors hadn't seen more
examples; I have seen the idea come up over and over again (and strongly
support it).

