
Monorepoize – Bash scripts for creating a monorepo out of smaller repos - deepaksurti
https://github.com/gigamonkey/monorepoize
======
frollo
After working on a monorepo and then on the split-up repos for the same codebase,
I cannot fathom _why_ somebody would want to take small repos and merge them
in a single repo. The mess and complexity just increase.

~~~
koonsolo
For git, I fully agree, and I wonder which company successfully runs a monorepo
in git.

Personally, I prefer git submodules, which seem to offer the benefits of both a
monorepo and separate repos.

~~~
Kipters
> I wonder which company successfully runs a monorepo in git.

Microsoft

~~~
pjmlp
Microsoft definitely uses git; a monorepo, I am not so sure.

~~~
wikibob
[https://www.google.com/search?q=microsoft+monorepo](https://www.google.com/search?q=microsoft+monorepo)

[https://devblogs.microsoft.com/bharry/the-largest-git-repo-o...](https://devblogs.microsoft.com/bharry/the-largest-git-repo-on-the-planet/)

~~~
pjmlp
That doesn't say that everything related to Windows is developed in the same
repo, just the kernel and several core components.

~~~
foobarian
How did I miss this! Could anyone have imagined, as recently as 10 years ago,
someone claiming Windows would be in git? This world never ceases to amaze me.
What happened to SourceSafe?

~~~
oblio
SourceSafe was never used internally for anything major. They had their own
custom source control system (based on Perforce, I believe), then TFS, and
then they switched to git.

SourceSafe was something inflicted on SMBs :-D

~~~
foobarian
Interesting, I didn't realize that. I haven't worked there, but I knew some
people who did, and they were very proud of dogfooding: at least the OS and
IDEs, I guess, though maybe not the VCS.

------
stared
In the last few years, I've seen a lot of people adopting monorepos and
discouraging the use of submodules.

Frankly, I don't know why. For small projects, sure, it is overhead. For
anything larger, I find it useful to encapsulate things: as installable
packages where possible, or otherwise at least as separate repositories.

In the last few months I split one project into 4 repos
([https://github.com/Quantum-Game/](https://github.com/Quantum-Game/)) and
couldn't be happier with that move. It keeps the code cleaner, pull requests
more focused, etc.

~~~
edwintorok
Once your code grows and you end up with too many repos, you'll be yearning to
get back to a monorepo. It lets you make changes in one go instead of having
3-4 PRs that all have to be merged at the same time, or else the build breaks.

Having separate repos only makes sense if they are maintained by separate
teams with a well-defined API in between that doesn't change often. If the
same team maintains all the repos, then it is just overhead, especially if you
end up with 100 repos and significantly fewer than 100 engineers on your team.

There are also potential build-time savings with a monorepo, since you can
build different subdirectories in parallel without serialising at package
boundaries.

~~~
cryptica
I maintain such a project, with many dependencies which I also maintain
separately. Yes, it can be a lot of extra work to publish, but this argument
assumes that dependencies need to change constantly, and that assumption is
wrong. There should be pressure to design dependencies so that they are
modular and rarely need to change; the unpleasantness of updating deep
dependencies is good and necessary. They should rarely need updating; if they
need to be updated often, they were designed incorrectly.

Many of the dependencies and sub-dependencies which I maintain are several
years old. They are small enough and stable enough that they basically never
need to change.

If I need to update them (which is rare), I may need to do a cascading update
of multiple dependents, and this can take a while, but if I only have to do
this big update once every 6 months, it's totally worth having them in
separate repos.
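
The cascading update described above amounts to walking the reverse dependency graph from the changed package and republishing every affected package after its own dependencies. A minimal sketch, assuming the dependency graph is available as a plain dict; the package names and the `cascade_order` helper are hypothetical, not part of any tool mentioned in this thread:

```python
from graphlib import TopologicalSorter


def cascade_order(depends_on: dict[str, set[str]], changed: str) -> list[str]:
    """Packages to republish after `changed`, each listed after its own deps."""
    # Invert the graph: map each package to the packages that depend on it.
    dependents: dict[str, set[str]] = {}
    for pkg, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(pkg)

    # Collect everything that transitively depends on the changed package.
    affected = {changed}
    stack = [changed]
    while stack:
        for parent in dependents.get(stack.pop(), set()):
            if parent not in affected:
                affected.add(parent)
                stack.append(parent)

    # Keep only edges inside the affected set, then sort dependencies first.
    subgraph = {pkg: depends_on.get(pkg, set()) & affected for pkg in affected}
    return list(TopologicalSorter(subgraph).static_order())


graph = {
    "app": {"http", "log"},
    "http": {"log"},
    "cli": {"log"},
    "log": set(),
    "unrelated": set(),
}
print(cascade_order(graph, "log"))  # "log" first, and "http" before "app"
```

Packages that don't depend on the changed one (`unrelated` here) are left alone, which is why such an update can be rare but still long when it does happen.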

~~~
beagle3
That depends on the scale of the project, of course: if you have thousands of
engineers, like MicroFaceGoog does, then it’s a daily, even hourly, occurrence
for many dependencies.

You are not MicroFaceGoog, of course, so you only do that every 6 months.

However, in what way are separate repos nicer for you? Personally, I settled on
a monorepo with a “$dep-devel” branch for every dependency, which I
occasionally have checked out in multiple working trees at the same time. I
feel it gives me all the benefits a multirepo could have (whatever they may
be; I can’t find them) while still making tracking, branching and merging
across dependencies trivial.

------
shufeng
I have worked on projects organized both ways with git, and my observation is
that regardless of the choice, the right tooling can make the workflows a lot
smoother.

For separate repos, share code as much as possible through a packaging system,
so you don't have to do a lot of refactoring across multiple repos. It sounds
backward, but automatic minor-version updates can ease a lot of merge pain in
a CI environment.
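
A minimal sketch of one way to read that advice: CI automatically adopts the newest release of a shared package that keeps the same major version, assuming the packages follow semantic versioning (plain MAJOR.MINOR.PATCH, no pre-release tags). For 1.x and later this roughly matches what an npm caret range such as `^1.4.2` allows; `parse`, `is_auto_update` and `pick_update` are hypothetical helpers, not part of any package manager:

```python
def parse(version: str) -> tuple[int, int, int]:
    """Split a plain MAJOR.MINOR.PATCH string (no pre-release tags) into ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_auto_update(current: str, candidate: str) -> bool:
    """True if candidate is newer than current but keeps the same major version."""
    cur, cand = parse(current), parse(candidate)
    return cand[0] == cur[0] and cand > cur


def pick_update(current: str, available: list[str]) -> str:
    """Newest version CI may adopt without a human; falls back to current."""
    eligible = [v for v in available if is_auto_update(current, v)]
    return max(eligible, key=parse, default=current)


print(pick_update("1.4.2", ["1.4.3", "1.10.1", "2.0.0", "1.3.9"]))  # 1.10.1
```

Comparing parsed tuples rather than strings is what makes `1.10.1` newer than `1.4.3`; a major bump (`2.0.0`) still needs a human.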

For monorepos, figure out as early as possible what should be shared and what
should be separated, and how. I worked on a project with 20+ services and
websites that together formed one product; each service chose its own
languages and build systems but shared deployment interfaces for unified
service discovery. CI got tricky, as it could be blocked by an unrelated
change. I have not found the unified commit history very helpful, since I only
have context on a few of the services, and most of the time I only look at
history with `git log my/path`.

------
jakub_g
I feel bad about plugging my own work in a post about Peter Seibel's, but if
you're interested in preserving git history like this, I wrote a script to
move a _subfolder_ between two repos:

[https://github.com/jakub-g/git-move-folder-between-repos-kee...](https://github.com/jakub-g/git-move-folder-between-repos-keep-history)

(I haven't used it in a while, but it worked for my use case a few times in
the past.)

------
wodenokoto
Why is there so much code, in both Python and bash? What edge cases is it
handling? What niceties does it introduce that you might otherwise miss out
on?

I combined ~10 repos last year using a few one-liners and loops.

------
dustinchilson
Having done this migration, I recommend you look into filter-repo:
[https://github.com/newren/git-filter-repo](https://github.com/newren/git-filter-repo)

I don't remember the specifics, but the method used here didn't produce the
results we were looking for when migrating long histories and lots of
branches.
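
For reference, a common git-filter-repo recipe for this kind of migration (not necessarily what monorepoize does, nor what was used in the migration described above): rewrite each repo in a fresh clone so its files live under a subdirectory with `--to-subdirectory-filter`, then merge it into the monorepo with `--allow-unrelated-histories`. A Python sketch that only builds the command list, assuming git-filter-repo is installed and every repo has a `main` branch; the repo names and URLs are placeholders:

```python
import subprocess


def merge_plan(repos: dict[str, str], branch: str = "main") -> list[tuple[str, list[str]]]:
    """(working directory, argv) steps that fold each repo into monorepo/<name>/."""
    plan = [
        (".", ["git", "init", "monorepo"]),
        # An initial commit gives the first merge something to merge into.
        ("monorepo", ["git", "commit", "--allow-empty", "-m", "Initial commit"]),
    ]
    for name, url in repos.items():
        plan += [
            # filter-repo expects a fresh clone and refuses others without --force.
            (".", ["git", "clone", url, name]),
            # Rewrite every commit so the repo's files live under <name>/.
            (name, ["git", "filter-repo", "--to-subdirectory-filter", name]),
            ("monorepo", ["git", "remote", "add", name, f"../{name}"]),
            ("monorepo", ["git", "fetch", name]),
            ("monorepo", ["git", "merge", "--allow-unrelated-histories",
                          "--no-edit", f"{name}/{branch}"]),
            ("monorepo", ["git", "remote", "remove", name]),
        ]
    return plan


def run(plan: list[tuple[str, list[str]]]) -> None:
    """Execute the steps in order, stopping at the first failing command."""
    for cwd, argv in plan:
        subprocess.run(argv, cwd=cwd, check=True)


for cwd, argv in merge_plan({"api": "https://example.com/api.git"}):
    print(f"(cd {cwd} && {' '.join(argv)})")
```

Only one branch per repo is merged here; carrying over many branches, as in the migration above, needs extra steps.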

------
zaphar
The single biggest impediment to monorepos might be Jenkins. Monorepos need a
radically different approach to CI/CD, and none of the existing tooling does it
well. There is probably a market for tooling that does monorepo CI/CD well
without requiring hacky scripts and polling.

~~~
neeeeees
This is really a build-system issue, i.e. figuring out which parts of the
(mono)repo are affected by a given change. Bazel, for example, relies on
explicit per-directory BUILD files and does a fairly good job of finding
dependents.
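
A minimal sketch of that idea, assuming a simplified model where each package is a directory containing a BUILD file and the package-level dependency edges are already known. The data and helpers are hypothetical; Bazel itself answers the same question through its query language (e.g. the `rdeps` function):

```python
from pathlib import PurePosixPath
from typing import Optional


def owning_package(path: str, packages: set[str]) -> Optional[str]:
    """Nearest enclosing directory that is a package (has a BUILD file)."""
    for parent in PurePosixPath(path).parents:
        if str(parent) in packages:
            return str(parent)
    return None


def affected(changed_files: list[str], packages: set[str],
             deps: dict[str, set[str]]) -> set[str]:
    """Packages to rebuild and test: owners of changed files plus all dependents."""
    dependents: dict[str, set[str]] = {}
    for pkg, pkg_deps in deps.items():
        for dep in pkg_deps:
            dependents.setdefault(dep, set()).add(pkg)

    owners = (owning_package(f, packages) for f in changed_files)
    result = {pkg for pkg in owners if pkg is not None}
    stack = list(result)
    while stack:
        for parent in dependents.get(stack.pop(), set()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


packages = {"lib/log", "svc/api", "svc/web", "tools"}
deps = {"svc/api": {"lib/log"}, "svc/web": {"svc/api"}, "tools": set(), "lib/log": set()}
print(sorted(affected(["lib/log/log.go"], packages, deps)))
# ['lib/log', 'svc/api', 'svc/web']
```

A CI job can then build and test only the returned packages instead of the whole repo, which is the part generic CI servers don't do for you.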

~~~
dkryptr
Bazel works great for our fairly simple monorepo. My main gripe is that it's
a pretty big pain to add support for custom build requirements. There have
been times when it took a whole sprint to implement some build functionality
because, imo, it's hard to run with unless you've been in the ecosystem long
enough to understand everything.

I'm curious, though: am I the only one who feels this way?

------
alexhutcheson
Repo[1] is another tool for this, which is pretty well-tested and widely used.

[1] [https://gerrit.googlesource.com/git-repo/](https://gerrit.googlesource.com/git-repo/)

------
jtompl
My initial thought was: "Oh! This lets me quickly build a repo that merges all
the smaller repos using git submodules!"

And then I saw that it doesn't; it just merges the repos... Why would you do
that?

Honestly, I don't know why git submodules have had such a bad reputation for
years. They work out of the box. You can check out subrepos, but you don't
have to if you don't want to. You can set up separate branches, separate
commits, and separate CI/CD workflows for each of the repos.

AND you still get the monorepo advantages: locking the dependencies to an
exact version, and letting a dev or CI pull all of them in one go.

Why wouldn't you just use git submodules for that?
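
For comparison, the submodule layout described here can be set up with standard git commands: `git submodule add` records each child repo in `.gitmodules` and pins it at a specific commit, and `git clone --recurse-submodules` fetches everything in one go. A Python sketch that only builds the commands; the superproject name, repo names and URLs are placeholders:

```python
def superproject_plan(repos: dict[str, str]) -> list[tuple[str, list[str]]]:
    """(working directory, argv) steps for a superproject with one submodule per repo."""
    plan = [(".", ["git", "init", "super"])]
    for name, url in repos.items():
        # Records the repo in .gitmodules and pins its current commit.
        plan.append(("super", ["git", "submodule", "add", url, name]))
    plan.append(("super", ["git", "commit", "-m", "Add submodules"]))
    return plan


def clone_command(super_url: str) -> list[str]:
    """A dev or CI job gets the superproject and every pinned submodule at once."""
    return ["git", "clone", "--recurse-submodules", super_url]


for cwd, argv in superproject_plan({"api": "https://example.com/api.git"}):
    print(cwd, " ".join(argv))
```

Bumping a pinned dependency then means checking out a newer commit inside the submodule and committing the changed pointer in the superproject.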

------
seph-reed
I have a custom script for my monorepo that circumvents submodules while
getting most of their benefits.

Basically, on git pull it goes through and finds all the folders with
`.x_git`, renames them to `.git` and pulls. On push it does the inverse. Each
of my "submodules" is really just a copy of one small part of the monorepo,
but this lets me have a monorepo with some parts of it public.

So far it's been amazing: there's no real line between sub-repos and
subfolders, I can push a commit that modifies many of them in a single PR, and
it's clear that the monorepo is for internal stuff while the public repos are
for public-facing discussions... I really like it.
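
The renaming trick can be sketched as a context manager that exposes every nested `.x_git` directory as `.git` only for the duration of a git operation and then hides it again. This is a guess at the mechanism based on the description above, not the commenter's actual script; `activated_subrepos` and `pull_all` are hypothetical names:

```python
import subprocess
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def activated_subrepos(root: str) -> Iterator[list[Path]]:
    """Temporarily rename each nested `.x_git` dir to `.git`, then restore it."""
    hidden = [p for p in Path(root).rglob(".x_git") if p.is_dir()]
    activated: list[Path] = []
    try:
        for marker in hidden:
            marker.rename(marker.with_name(".git"))
            activated.append(marker.parent)
        yield activated
    finally:
        # Hide them again: while a `.git` is present, the outer repo would see
        # each directory as an embedded repository rather than ordinary files.
        for repo in activated:
            (repo / ".git").rename(repo / ".x_git")


def pull_all(root: str) -> None:
    """Run `git pull` inside every sub-repo while it is temporarily a real repo."""
    with activated_subrepos(root) as repos:
        for repo in repos:
            subprocess.run(["git", "pull"], cwd=repo, check=True)
```

The `finally` block restores the hidden names even if a pull fails halfway through.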

------
fgeiger
What is the advantage over using `git subtree`?

~~~
leethargo
Git subtree wouldn't preserve hashes of the existing history?

~~~
eithed
You can preserve history. You can also squash it.

~~~
leethargo
Maybe I don't know enough about git subtree. I guess it can "preserve history" in the
sense of keeping a corresponding new commit for every old commit, but they
wouldn't have the same hash id?

~~~
sgn
Everything stays the same as in the old subtree: commit messages, history,
object IDs, commit IDs.

git-subtree(1) also allows splitting a subtree back out.

git-subtree(1) uses plumbing git commands, which are a stable interface,
unlike this one, which uses porcelain commands.

IIRC, Git’s maintainer uses git-subtree(1) to merge gitk and git-gui into Git.
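
Roughly what that looks like with the standard `git subtree` commands: `git subtree add` without `--squash` merges the other repo's existing commits, unchanged, into a directory of the current repo, and `git subtree split` later extracts that directory's history onto its own branch. A sketch that only builds the commands; the prefix, URL and branch names are placeholders:

```python
def subtree_add(prefix: str, url: str, ref: str = "main",
                squash: bool = False) -> list[str]:
    """argv that imports another repo's history under <prefix>/ via a merge commit."""
    argv = ["git", "subtree", "add", f"--prefix={prefix}"]
    if squash:
        # --squash replaces the imported history with a single commit.
        argv.append("--squash")
    return argv + [url, ref]


def subtree_split(prefix: str, branch: str) -> list[str]:
    """argv that extracts the history of <prefix>/ onto a new branch."""
    return ["git", "subtree", "split", f"--prefix={prefix}", "-b", branch]


print(" ".join(subtree_add("vendor/lib", "https://example.com/lib.git")))
# git subtree add --prefix=vendor/lib https://example.com/lib.git main
```

Because the unsquashed import keeps the original commit objects as merge parents, their IDs survive, unlike a history rewrite.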

------
globular-toast
I'm constantly amazed at how fashions come and go in cycles. Only a few years
ago everyone was vigorously splitting everything up into microservices, and
now the monolithic repository is the latest trend.

~~~
juped
As far as I can tell, both of those are current trends: microservices in a
monorepo. (Not a fan of either.)

~~~
reilly3000
Right, microservices beget monorepos, because managing permissions and build
pipelines across hundreds of repos isn't pleasant. As mentioned elsewhere in
this thread, coordination across multiple repos and versions is really
challenging.

------
jupp0r
Having done this a couple of times in the last few years: make sure you use
the history rewrite this process requires anyway to get rid of history you
don't want. Team Christmas party videos that got checked into master (just
kidding) can easily be removed before switching.
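
With git-filter-repo (recommended earlier in this thread), such a cleanup can be folded into the same rewrite: `--analyze` writes size reports to help spot the offenders, `--path ... --invert-paths` drops the given paths from every commit, and `--strip-blobs-bigger-than` drops oversized files. A sketch that only builds the command; the example path and the 10M threshold are placeholders:

```python
def purge_command(paths: list[str], max_blob: str = "10M") -> list[str]:
    """git filter-repo argv that drops `paths` and any blob over `max_blob`."""
    argv = ["git", "filter-repo"]
    for path in paths:
        argv += ["--path", path]
    if paths:
        # Without --invert-paths, --path would keep ONLY these paths.
        argv.append("--invert-paths")
    if max_blob:
        argv += ["--strip-blobs-bigger-than", max_blob]
    return argv


print(" ".join(purge_command(["media/party.mp4"])))
# git filter-repo --path media/party.mp4 --invert-paths --strip-blobs-bigger-than 10M
```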

------
LockAndLol
It seems like GitHub and GitLab are designed for multi-repo work, especially
their CI. Are there tools that make monorepo life on those services easier?

------
cryptica
Big corporations are extremely inefficient. Why does everyone want to copy
them?

And why do we use all the bulky tools they produce when there are far simpler,
better and more open alternatives available?

~~~
beagle3
Because it turns out big corporations that use multirepos tend to be even less
efficient (about versioning) than those that use monorepos.

What are those far simpler, better, more open alternatives?

~~~
hocuspocus
> Because it turns out big corporations that use multirepos tend to be even
> less efficient (about versioning) than those that use monorepos.

How do you measure that exactly?

~~~
beagle3
Personally, from an informal survey of people I know who work at big corps.

Google and Microsoft have both evaluated this internally and reached that
conclusion, with some of the evaluation criteria and conclusions publicly
documented (I don’t have links handy on my phone, but Google will likely find
them for you).

~~~
hocuspocus
Microsoft doesn't really use a monorepo.

This leaves companies like Google and Facebook that made a decision fairly
early in their existence, set constraints, and then spent several hundred
engineer-years developing their own infrastructure and tools to support
that decision.

Does it work? Yes. Is it more efficient than what other companies of similar
scale are doing? I don't believe anyone knows.

~~~
beagle3
From reading articles, I got the impression that Windows is one monorepo (but
it doesn’t include Office) and that e.g. Visual Studio is another. So it’s true
that Microsoft doesn’t use a monorepo as a whole, but Windows alone is a
codebase significantly larger and more diverse than what 99% of companies
have, and it does contain multiple “independent” products: Notepad is
independent of Solitaire, and both are independent of essentially everything
else, whereas e.g. File Explorer, the win32 subsystem and the kernel often
need to be updated together for new features.

GoogBook made the decision early, but did re-evaluate it. Sure, they found the
cost of switching higher than any potential benefit. But I must say I have not
found anyone describing benefits other than “git is faster for smaller
repositories” (which was very true, but is only slightly true these days with
sparse checkouts and shallow clones) and “I like it that way”.

Yosefk has a convincing article arguing, IIRC, that “if your culture is bad,
it doesn’t matter; if your culture is good, monorepos are technically easier
until you hit some scale wall, but even then they don’t become harder than
multirepo at that scale”.

But w.r.t. Microsoft, I admit I’m not very well informed; thank you for
setting the record straight.

