
Monorepo or Multirepo? Role-Based Repositories - optician_owl
https://blog.7mind.io/role-based-repositories.html
======
Cedricgc
The main driver of success in either model is the tooling and practices invested to make it work in an organization. Google is successful with their monorepo because they have invested in building (Blaze), source control (Piper, Code Search), and a commitment to always developing at HEAD. Multirepo is currently easier for most companies because most public tooling (git, package managers) is built around multirepos. One place I see multirepos fall over is awful dependency management practices, internally and in open source. Many dependencies quickly become outdated and are not updated in cadence, slowing down writers and consumers. Better tooling can help here, but an organization needs real discipline to stay on top of things.

~~~
username90
I wonder why nobody has made a good public monorepo offering similar to what Google has internally. It would probably be a hit at many companies, since it fixes so many issues related to working in very large teams.

~~~
dmoy
At large enough scale, it causes a lot of problems and breaks like every other dev tool, requiring a lot of work to get it back together.

That said, there are some open source pieces to help. Facebook open sourced their Mercurial stuff, so you can get version control at scale (before that you'd just use Perforce). Google open sourced Bazel, and some parts of the underlying infra behind Code Search, but not enough to really work properly. And of course at a lower level there's a plethora of reasonable db offerings, etc.

It would still require a lot of glue though.

~~~
username90
It is just that Google's tooling around code works really well together. Code Search to view code, with directory-based history so you aren't swamped by others' commits. TAP to run all unit tests all the time, but with sectioned projects so you don't run every test on every presubmit. Sponge to gather every test log ever (even for the tests you run locally), so you can link full test logs to coworkers when you have problems (note that the source file links in the logs are actual links into Code Search). Critique for easy versioned code reviews, where you can diff code between any sets of comments so you can see its evolution, with presubmit checks and Sponge links for tests. Blaze for a structured, directory-based dependency management system that makes partial checkouts and distributed cached builds work well.

I'd like a set of tightly coupled tools like that working outside of Google, but I guess it might be just a dream; it is a bit too big of a project.

~~~
djhaskin987
> directory based commits

This whole thread is interesting because Subversion is exactly this, and it works with large code bases. We used to have these tools and we moved away from them.

~~~
dmoy
Subversion works with large code bases, but not crazy massive codebases.

The version control is just the tip of the iceberg though, and is largely a
solved problem: git or subversion, then perforce or straight to Facebook's
mercurial stuff.

It's the other tooling that breaks on large enough monorepos, and that's what you have a hard time with publicly. Searching your code takes a while. Cross-references don't go wide enough or take too long to generate. Builds take too long. Refactoring tools either take too long or don't support a repo of sufficient size.

Mostly this isn't a problem though, because few repos are actually large
enough to cause real problems.

------
01100011
Monorepo shortcomings 1 and 2 seem like bullshit to me. Perforce, the popular monorepo at most companies I've worked at, supports access control. Monorepos do not prevent you from segmenting your code into modules and pushing binary/source packages into source control so that builds can avoid compiling everything (TiVo used to do this, and it worked well once you got the hang of it).

I feel like these debates are often fueled by false arguments. Either way you
go, you're going to want to build support tools and processes to tailor your
VCS to your local needs.

~~~
jsnell
VCS access control is the wrong tool for solving the "people use code they shouldn't" complaint.

First, VCS ACLs will massively reduce the benefits you're supposed to get from a monorepo. How will you do global refactors in that kind of situation? How does a maintainer of a library figure out how the clients are actually using it? (The clients must have visibility into the library, but the opposite is unlikely to be true.)

Second, let's say that I maintain a library with a supported public interface that's implemented in terms of an internal interface that nobody's supposed to use. How will VCS ACLs allow me to hide the implementation but not the interface? When clients kick off a build, the compiler needs to be able to read the implementation parts to actually build the library. The only way out is for clients to read the headers but link against a pre-built binary blob. At that point you don't have a monorepo, you've got multirepos stored in a monorepo.

The actual solution is build-system ACLs. Not ACLs for people, but ACLs for projects. Anyone can read the code, but you can say "only source files in directory X can include this header" or "only build files in directory Y can link against this object file".
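Bazel's per-target visibility is one public implementation of exactly this idea. A sketch (the target names and layout here are hypothetical):

```python
# BUILD file (Bazel/Starlark). Anyone can read the sources, but the
# build system enforces who may depend on what.
cc_library(
    name = "mylib",  # the supported public interface
    hdrs = ["mylib.h"],
    srcs = ["mylib.cc"],
    deps = [":impl"],
    visibility = ["//visibility:public"],
)

cc_library(
    name = "impl",  # the internal implementation
    hdrs = ["impl.h"],
    srcs = ["impl.cc"],
    # Only targets in this package may depend on :impl.
    visibility = ["//visibility:private"],
)
```

A client elsewhere can put `//mylib` in its deps, but depending on `//mylib:impl` fails the build with a visibility error, even though the source itself is readable by everyone.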

~~~
01100011
VCS ACLs can allow for read-only access. You can also split public interfaces
into their own header. If you want the maintainer of a library to be able to
refactor clients of the library, then you have to grant them access to the
client code. How does a multirepo solve this issue?

> How will VCS ACLs allow me to hide the implementation but not the interface?

If you don't give people access to the code, they can't build it. So what?
Publish pre-built binaries from your CI system back to source control.

> At that point you don't have a monorepo, you've got multirepos stored in a
> monorepo.

I think it's a spectrum. It would be stupid to dogmatically stick to either
extreme. You modify things in a pragmatic fashion to solve the problems you're
facing. In my experience, starting with a monorepo and making exceptions as
needed has worked better than the alternative.

Your post sounds similar to a lot of the multi/mono repo discussions. You've
focused on one problem and one way to solve that problem without considering
that there are many ways to work around it. Neither approach is going to be
pain-free and both require tooling for special scenarios.

------
pricechild
I regularly join projects where someone has decided to place the project's
code in half a dozen different repositories.

Even though it's one project.

Even though they refuse to allow a release of a single component - it must all
be released together without forwards/backwards compatibility.

I think most of the time, the mono/multi debate is spoiled by people who feel they can have their cake and eat it too.

~~~
megous
I think that whether to use a mono or multi repo depends on whether you're willing to dump money into updating everything at once, or not. If not, monorepos are really a big hindrance. It's better to split on project boundaries (things that may have different development paces), and use git worktree to keep different versions of libraries checked out for building/bundling.

It works fairly nicely with meson, as you can simply checkout a worktree of a
library into a subprojects directory, and let individual projects move at
their own paces even if you don't do releases for the libraries/common code.
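As a concrete sketch of that workflow (the repo name, tags, and app layout are invented for illustration), two projects can build against different tags of the same library via worktrees of a single clone:

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"

# A shared library repo with two tagged states (no formal releases needed).
git init -q libcommon
cd libcommon
git -c user.email=a@b.c -c user.name=dev commit -q --allow-empty -m "v1 work"
git tag v1.0
git -c user.email=a@b.c -c user.name=dev commit -q --allow-empty -m "v2 work"
git tag v2.0

# Each app checks its pinned version out into a meson-style
# subprojects/ directory via git worktree.
mkdir -p "$tmp/app-old/subprojects" "$tmp/app-new/subprojects"
git worktree add "$tmp/app-old/subprojects/libcommon" v1.0
git worktree add "$tmp/app-new/subprojects/libcommon" v2.0

git -C "$tmp/app-old/subprojects/libcommon" describe --tags  # v1.0
git -C "$tmp/app-new/subprojects/libcommon" describe --tags  # v2.0
```

Both worktrees share one object store, so updating the library in one place makes new versions available to every consumer without forcing anyone to move.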

It's not really clear why having to update every consumer in sync with library
changes is beneficial. Some consumers might have been just experiments, or one
off projects, that don't have that much ongoing value to constantly port them
to new versions of the common code. But you may still want to get back to them
in the future, and want to be able to build them at any time.

It's just easier to manage all this with individual repos.

~~~
pricechild
> whether you're willing to dump money into updating everyting at once

I think the majority of projects in this world only update everything at once. They haven't invested in the testing and sensible APIs that would allow updating small pieces of their solution.

From my experience, I also think the majority of people who think they have a
library and need multi repos to deal with that, don't have a library.

To further clarify: having one user of your library means you could stop pretending you have a library and avoid the pain.

I don't mean to insist these problems do not exist, I simply don't think many
people have them.

~~~
megous
Project != company. Project != consultancy.

Monorepo just didn't work for me. I have ~10 web projects for different
customers + my personal projects that use various versions of some common
code.

It doesn't make any financial sense to evolve my common code by updating all the customers' code for free when they're not even asking for it. So on this level it doesn't work.

Even within a single company, with just two devs and around 90 repos for the main product and plugins, it was hard to justify making a monorepo with plain git. The plugins and the main app had different release schedules and priorities, so it was never really economical to port all plugins to the new version of the main app right away on every change.

I still think going multi vs mono is a business decision, rather than a
technical one. You'll have to have special tooling for either case, just a
different one.

~~~
pricechild
Yep, I agree and that all makes sense.

Don't you find those projects within the company wanting multi-repos also?

~~~
megous
They have multi-repos. There was some thought given to an idea to transition
to mono-repo, but it never happened.

------
mattbillenstein
One of the things I've done at a couple of companies now is flatten multi into mono. It just simplifies everything: it's all deployed as one unit, so it's easier to track and make changes across different parts of the code base in unison.

I have typically left mobile iOS/Android in separate repos however - they have
a different deployment cadence, so you need to manage breaking changes
differently anyway.

------
djhaskin987
There's a lot of people on here defending their current workflow, whatever
that is.

I for one find it refreshing that people are willing to think about different workflows, even ones different from their own.

It feels like what is described is a cross between a good language package
manager and git submodules. It's an interesting space to explore, because a
lot of nice things come out of submodules, but it's not a proper package
manager.

A _proper_ dependency manager that puts code in a workspace and manages it as you work on it, in a non-clunky way, is not something we have right now, and it could be a game changer. Thanks to the authors for sharing.

------
scarmig
I'm curious: how would most people here define monorepo vs multirepo?

On the surface, most people seem to think of a monorepo as a source control
management system that exposes all source code as if it's a traditional
filesystem accessed through a single point of entry. Multirepo, in contrast,
seems to be about multiple points of entry.

But that's a superficial and uninteresting distinction. All the hard parts of managing code remain for both, and, for a sufficiently large organization, you'll still need multiple dedicated teams to build tooling to make either work at scale. All the pros listed in the article need a team to make them work for either approach, and all the cons are a sign that you need a team to make up for that deficiency in either approach.

Aesthetically a single point of entry appeals to me, in that it allows for a
more consistent interface to code. But I'd go for good tooling above that in a
heartbeat.

~~~
givehimagun
I've shifted to focusing on repo == team. If your organizational structure is
to have many little teams that are independent from each other, then you build
your source code management to reflect that.

I built my engineering staff to focus on whichever initiatives my boss hands to me (they change week to week), so we went monorepo so we could move between those projects/apps/programs quickly.

We knew that we didn't want to pay the maintenance cost just because microservices/multirepo was a buzzword, AND we wanted future ventures to get faster (example: we solved identity for authn/authz once, and now every app that needs it can leverage it, and we can upgrade identity and all of its consumers in one pull request).

~~~
huherto
This is my conclusion too. The team becomes the fundamental entity, and projects/products belong to teams. Everything the team produces stays in the team's repo, so you always know which team owns what. It is also easier to review, supervise, and clean up.

------
ChristianBundy
Why not both? I've been using
[https://github.com/mateodelnorte/meta](https://github.com/mateodelnorte/meta)
and having a great time so far, it's just that GitHub (and others) don't have
a simple way to bundle multi-repo commits in pull requests.

~~~
punkdata
I agree with Christian... Why not both? Lots of teams I interact with have great reasons for a monorepo, which they admit requires some work in tooling and processes, and claim they're successfully releasing software faster and with less effort than if their code lived in disparate repos. I believe teams must choose the appropriate patterns that work best for their architectures and situations.

------
jayd16
What's the current state of git submodules? It seems like you could get some of the benefits of monorepos, in that you can reference dependency projects directly like in a monorepo. You can, in theory, treat many projects like a single code base.

I don't see it used very often though. Why not?

~~~
avisser
Even with submodules it's still a PR per repo. Global, atomic changes are
super powerful.

~~~
likeliv
There is another tool called git-subtree that should solve these problems, I think, but I've never seen it in use.

~~~
Skunkleton
git subtree is just a wrapper around subtree merges; I don't think it solves the same problem as submodules.

------
kerng
In one of my first jobs, about 15 years ago at a large software company, we had just moved to a monorepo.

It was introduced to counterbalance what many saw as a big mess. The result was a lot of process being introduced, which slowed everything down, but that was probably necessary at that stage. To my knowledge the company keeps switching back and forth, but new projects that need to move fast are typically still done independently.

~~~
Stevvo
I would expect you need really good training in place to make it work. E.g. Microsoft uses a git monorepo for the Windows codebase; obviously that is not something you could just come in to and "git clone" as you might on a small project.

------
philwelch
I bet you could address this with a third approach: a metarepo. The metarepo is a repo that uses submodules to combine your multirepo ecosystem into a simulated monorepo. The metarepo is what ultimately gets built and deployed, with no versioned dependencies to manage. Local development usually happens at the multirepo level, and the metarepo is managed mostly via CI.

~~~
amelius
Good idea. But what gets checked in at the metarepo level? The names of the
branches that are checked out in the submodules under it?

Can you have two metarepos, each with its own set of checked-out branches of
the same original submodules?

~~~
philwelch
I believe submodules pin specific commits, though they can be configured to track branches.
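More precisely, what gets checked in at the metarepo level is a fixed commit ID per submodule (a "gitlink" tree entry), which also means two metarepos can pin different commits of the same submodules. A quick sketch with throwaway temp repos:

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"

# A component repo, and a metarepo that will embed it.
git init -q lib
git -C lib -c user.email=a@b.c -c user.name=dev commit -q --allow-empty -m "lib work"
git init -q meta
cd meta
git -c user.email=a@b.c -c user.name=dev commit -q --allow-empty -m "init"

# Newer git blocks file-protocol submodules by default; allow it for the demo.
git -c protocol.file.allow=always submodule --quiet add "$tmp/lib" lib
git -c user.email=a@b.c -c user.name=dev commit -q -m "pin lib"

# The metarepo's tree stores a commit ID with mode 160000, not a branch name.
git ls-tree HEAD lib
```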

------
intellix
For TypeScript/JS projects, I think the NX CLI is pretty awesome, as it handles multiple frameworks.

------
ledneb
So, in a monorepo world, isn't it often that you _have_ to deploy components together, rather than "it's easy to"? How are services deployed only when there has been a change affecting said service? Presumably monorepo orgs aren't redeploying their entire infrastructure each time there's a commit? Are we talking about writing scripts which trigger further pipelines if they detect a change in a path or its dependencies? How about versioning: does monorepo work with semver? Does it break git tags, given you have to tag everything?

So many questions, but they're all about identifying change and only deploying
change...

~~~
peterwwillis
Each service has its own code directory, and there's one big "shared code" directory. When you build one service, you copy the shared code directory and the service-specific directory, move to the service-specific folder, and run your build process. The resulting artifact is that one service. Tagging becomes "<service>-<semver>" instead of just "<semver>". You may start out with deploying all the services every time (which actually hugely simplifies building, testing, and deploying), but later you can break the infra out into separate services the same way as the builds.
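A minimal sketch of that build flow (the directory names, demo files, and tag format are assumptions for illustration):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"

# Monorepo layout: one shared-code directory plus per-service directories.
mkdir -p shared services/billing
echo "common code" > shared/util.txt
echo "service code" > services/billing/main.txt

SERVICE=billing
VERSION=1.4.2

# Build one service: copy the shared code and the service directory,
# then run that service's build process from inside its folder.
build=$(mktemp -d)
cp -r shared "$build/shared"
cp -r "services/$SERVICE" "$build/$SERVICE"
cd "$build/$SERVICE"
# ...real build steps would go here; the artifact is this one service...
echo "tag: $SERVICE-$VERSION"  # release tagged "<service>-<semver>"
```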

------
bechampion
I worked at a big bank in the UK using a monorepo "cuz Google uses it". Error number 1: you're not Google. The clones were gigantic; Jenkins would time out cloning the whole project when all it needed was a bunch of files. Merge conflicts all over the place. But the best part: we had scripts in our pipeline literally removing folders after cloning the repo to avoid automatic inclusion of libs, etc. In my opinion, separation of boundaries is one of those things that shouldn't be messed with.

~~~
kyrra
Monorepos and git don't play together nicely. Perforce is key if you have lots of devs on a monorepo.

------
jayd16
I don't understand the isolation difference. You can hide, protect and branch
code in a monorepo so why is isolation a concern?

~~~
swsieber
It depends on which VCS you use. Git, for example, doesn't have any native support for hiding or protecting code in particular folders within the repository.

~~~
jayd16
Hmm, seems unfair to judge monorepos on what git is capable of. I hate Perforce, but it accomplishes this easily.

------
edoceo
We do multi-repo. It makes things a little slower, because we have to get commits into our common libs repos (there are two) before we can get app/product repos updated. Using the environment's package manager (composer, npm, yarn) rather than git submodules helps a lot.

------
solarengineer
GoCD provides “fan in” which supports monorepos

[https://docs.gocd.org/current/advanced_usage/fan_in.html](https://docs.gocd.org/current/advanced_usage/fan_in.html)

------
akhilcacharya
Amazon does multi-repo. I don't see what the problem or debate over this is. We seem to be handling it pretty well despite a massive-scale SOA architecture.

~~~
catalogia
When I was there, they were migrating away from perforce because they could no
longer scale perforce fast enough to meet demand. I've not seen this talked
about much outside of Amazon.

It was also a huge day-to-day quality of life improvement for the users (the developers). There are UX problems with git, but they pale in comparison to the UX problems with perforce, which is _truly_ unpleasant software.

~~~
blandflakes
The Alexa division migrated aggressively to git as soon as it was available
and nobody publicly voiced any regret about losing perforce.

~~~
catalogia
Several of the people I worked with at Amazon were skeptical of git, at least initially. Some people prefer tools they already know, the routine and habit, over learning a new tool. And I totally respect that; git's UX is superior in mainly aesthetic ways, and in terms of tactical productivity it's more of a wash. I still think git has the edge, but there is nothing to say a seasoned developer who's used perforce for years isn't exceptionally productive with it.

Nearly everybody I talked to about it eventually came around to prefer git
though. Once you've been forced to swallow the bitter pill of learning
something new and changing your workflow, I think the advantages begin to
shine through.

On the other hand, maybe I'm just biased because I was proficient in git years before I was ever exposed to perforce. So maybe it was me who was balking at learning something new, and that's why I was so relieved when my team switched to git. But I do genuinely believe that git has a superior UX.

~~~
blandflakes
Our team _did_ have the complicating factor that we were doing private builds, which meant that our source code was in a private subversion repository, and perforce was used to track the Brazil primitives and private build decryption key stuff.

Once git support was good enough, leadership was very supportive of an en
masse exodus.

I also think my team was pretty junior, which meant they'd never actually
_seen_ perforce, so as you say, moving to git was going back to something
familiar for nearly everybody.

------
sthomas1618
I'm curious about those who use a monorepo with microservices: how do you solve CI/CD? Is Bazel the only solution?

~~~
peterwwillis
CI and CD are more workflows than tools. It doesn't really matter what your
repo setup is, you just adapt your workflows to it. On one project I work on
we use a monorepo for a handful of microservices. We use standard GitHub flow,
no special repo consideration for the CI.

For CD, we have scripts that ask what service you want to build, and they
specifically package that service using the set of files & processes dedicated
to that service. The build generates a versioned artifact. After that, repo
doesn't matter at all, we're just moving service artifacts around.

------
nhumrich
The cons to multi-repo are all anti-patterns for microservices anyway. If you're doing microservices, you shouldn't have build dependencies on other projects. They should only call each other at the network level.

~~~
likeliv
Calling each other at the network level is still a dependency (and even a build dependency if you use something like protobuf or other protocol description files).

~~~
nhumrich
A network dependency is not a build dependency. Protobuf files should be copy-pasted, not referenced directly. Saying you need a single repo to build correctly because of your network dependencies is like saying you can't use a third-party system (AWS, etc.) without having a link to their code base.

~~~
likeliv
The point is that when you want to make a change in the "API" (or call it the "protocol"), you need to touch the different repositories and coordinate to use the right versions together.

About copy/pasting protobuf files: it works, but it makes it more difficult to keep them in sync.

And I did not say you need a single repo. I'm saying the stated disadvantages
of multi repo are real.

------
gravypod
I am a big fan of monorepos. I've worked on a few open source projects that used multi-repos, and at some places that used a hybrid approach. I agree with some of the ideas this article has put into writing, but I wanted to provide some pointers from my experience.

Some background: at my current place of employment I have 28 services (should be 30 in the next few days), so I think my current use case is very representative of a small to medium monorepo. At my last job, right before this one, we had sort of a monorepo that was strung together with git submodules, although each project was developed independently with its own git repo and CI.

> Isolation: monorepo does not prevent engineers from using the code they
> should not use.

Your version control software does not prevent or allow your developers from
using code they should not use. It is trivial to check in code that does
something like this:

    import "~/company/other-repo/source-file.lang" as not_mine;

Or even worse, in something like golang:

    import "github.com/company/internal-tool/..."

Because of this, it is my opinion that it is impossible to rely solely on your source control to hide internal packages/source/deps from external consumers. That responsibility, preventing people from touching deps they shouldn't, has to be pushed up the stack, either to developers or to tooling.

> So, big projects in a monorepo have a tendency to degrade and become
> unmaintainable over time. It’s possible to enforce a strict code review and
> artifact layouting preventing such degradation but it’s not easy and it’s
> time consuming,

I think my above example demonstrates this is not unique to monorepos. The level of abstraction that VCSes operate at is not ideal for code-level dependency concepts.

> Build time

Most build systems support caching. Some even do it transparently. Docker's
implementation of build caching has, in my experience, been lovely to work
with.

\---- Multi repo section ----

> In case your release flow involves several components - it’s always a real
> pain.

This is doubly or triply true for monorepos, because the barrier to cross-service refactors is so low. Due to a lack of good rollout tooling, most people with monorepos release everything together; I know my CI essentially does `kubectl apply -f`. Unfortunately, due to the nature of distributed compute, you have no guarantee that new versions of your application won't be seen by old versions (especially with zero-downtime deployments like blue-green/red-black/canary). Because of this you constantly need to be vigilant about backwards compatibility: version N of your internal protocol must be N-1 compatible to support zero-downtime deployments. This is something new members of a monorepo team have huge difficulty working with.
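The N-1 constraint can be made concrete with a small sketch (the message shape and the field rename are hypothetical): during a rolling deploy, a consumer has to accept both the old and the new form of a message.

```python
# Hypothetical consumer tolerating protocol versions N and N-1 at once.
def parse_order(msg: dict) -> dict:
    # Version N renamed "amount_cents" to "amount"; accept either while
    # producers of both versions may still be live during the rollout.
    amount = msg.get("amount", msg.get("amount_cents"))
    if amount is None:
        raise ValueError("order message has no amount field")
    return {"id": msg["id"], "amount": amount}

print(parse_order({"id": 1, "amount_cents": 500}))  # from an old (N-1) producer
print(parse_order({"id": 2, "amount": 700}))        # from a new (N) producer
```

Only once no N-1 producer can possibly be running is it safe to drop the fallback.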

> It allows people to quickly build independent components,

To start building a new component, all one must do is `mkdir projects/<product area>/<project name>`. This is far lower overhead than most multi-repo situations. You can even `rm -r projects/<product area>/<thing you are replacing>` to completely kill off legacy components so they don't distract you while you work. The rollout of the new tool went poorly? Just revert to the commit beforehand and redeploy; your old project's directories, configs, etc. are all in the repo. Separate git repos, by contrast, present unversioned cross-repo state, and a retired repo inherently can never be removed if you want a source tree that is green and deployable at any commit hash.

\--- Their solution ---

I accomplish the same tasks with a directory structure. As mentioned before, if you just put your code into a `projects/<product area>/<project>` structure, you can get the same effect they are going for by minimizing the directory layout in your IDE's file view. The performance hit from having the entire code base checked out is very much a non-issue for >99% of us. Very, very few of us have code bases larger than the Linux mainline, and git works fine for their use case.

Also, any monorepo build tool like Bazel, Buck, Pants, or Please.build will perform adequately for the most common repo sizes, and will give you hermetic, cached, and correct builds. These tools already exist and have communities around them.

[0] - [https://docs.microsoft.com/en-us/azure/devops/learn/git/git-at-scale](https://docs.microsoft.com/en-us/azure/devops/learn/git/git-at-scale)

