A single repo enables company-wide code sharing and refactoring with the least amount of friction. For example, a programmer can change "foobar(x,y,z)" to "foobar2(z,y,x)" and update all the company-wide client code that calls it. This ensures all teams always use the same code, instead of Team A being on helperutils_v5.2.3 and Team B on helperutils_v6.1.0, which are incompatible with each other.
The biggest downside seems to be slow performance when syncing huge repos. This was one of the motivations for Microsoft's GVFS (Git Virtual File System), which enables sparse downloads.
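A minimal sketch of what such a repo-wide refactor might look like for Python call sites, assuming the hypothetical `foobar`/`foobar2` from the comment and a repo of plain `.py` files. It uses the standard-library `ast` module; `ast.unparse` (Python 3.9+) drops comments and original formatting, so files are only rewritten when a call actually changed, and real codemod tools work on concrete syntax trees instead.

```python
import ast
from pathlib import Path


class FoobarToFoobar2(ast.NodeTransformer):
    """Rewrite positional foobar(x, y, z) calls into foobar2(z, y, x)."""

    def __init__(self) -> None:
        super().__init__()
        self.rewrites = 0

    def visit_Call(self, node: ast.Call) -> ast.Call:
        self.generic_visit(node)  # rewrite nested calls first
        is_target = (
            isinstance(node.func, ast.Name)
            and node.func.id == "foobar"
            and len(node.args) == 3
            and not node.keywords  # keyword-argument calls are left for manual review
        )
        if is_target:
            node.func = ast.Name(id="foobar2", ctx=ast.Load())
            node.args = list(reversed(node.args))  # (x, y, z) -> (z, y, x)
            self.rewrites += 1
        return node


def migrate_source(source: str) -> tuple[str, int]:
    """Return the rewritten source and the number of calls changed."""
    transformer = FoobarToFoobar2()
    tree = transformer.visit(ast.parse(source))
    if transformer.rewrites == 0:
        return source, 0  # untouched: keep the original formatting and comments
    ast.fix_missing_locations(tree)
    return ast.unparse(tree), transformer.rewrites


def migrate_repo(root: str) -> int:
    """Rewrite every .py file under root in place; return how many files changed."""
    changed_files = 0
    for path in Path(root).rglob("*.py"):
        new_source, rewrites = migrate_source(path.read_text())
        if rewrites:
            path.write_text(new_source)
            changed_files += 1
    return changed_files
```

Running `migrate_repo(".")` once and committing the result is the "update all the client code" step; in a single repo that lands as one atomic commit that CI can verify as a whole.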
If there are good arguments for multi-repos that are not related to performance which override the benefits of company-wide consistency of code, it would be good to discuss them. (I'm talking about companies like MS/Google/Facebook and not consulting firms with a different repo for each customer project.)
It does, but primarily it enables the tooling around that to be so much more integrated.
Indexing the repo becomes possible, and fast search, find-usages, and inter-project code review become much more frictionless. Build and test systems end up having inter-project dependencies all the time. You probably already have them.
It's like the difference between a big database with a hundred tables and a hundred small databases. Go with the big database every time, until you have a reason not to, at which point you'll know a lot more about your requirements.
1. Security. It boggles the mind that companies with large source trees would let their entire source tree be cloned by a single developer. Maybe this isn’t really an issue, but it seems like you’re creating unnecessary surface area when the iOS dev team can walk around with the machine learning group’s code.
2. Dev aesthetics / onboarding. There’s a complexity cost to sharing all of that code. If I join a company and the first step is spending (weeks? months?) getting comfortable understanding the universe of this gigantic source tree, instead of being able to safely understand a small piece of it, that can lead to frustration and it’s certainly not going to be productive.
I’ve never worked in a massive, well put together monorepo, so maybe these issues are solved in other ways. But it seems like your point #1 is a trade-off – being able to call code from anywhere in the same repo is powerful, but it’s also capable of breeding a ton of complexity, so please mind the mere mortals who will follow one day.
Most of the code that you run is open source dependencies, like the Python interpreter, GNU C library, and Linux kernel.
You could import everything, including outside dependencies, into a single repo and build an OS/application image from that repo. That is effectively what Google does. But that requires a lot of careful attention to pulling in updates of external code, and requires a particularly excellent build and dependency manager (aka a package manager). It is also huge enough to run into serious performance problems no matter how you try to fix it.
So more practically, your monorepo is just for your internal code. Which means if you ever want to fix a bug in any outside libraries, or make some of your code available as a separate library for the outside world, you need to work with multiple repos.
So why not just be multi-repo from the start? Or rather, have multiple monorepos, one per project or group.
Now that could mean a monorepo, if you're a company with one product that's released as a single unit. Or it could mean multi-repo if you have distinct components with their own release cycles. IMO that comes down to Conway's law in reverse: it's best to structure your components in a way that reflects your organization's communication structure. If you're in constant communication, then moving from `foobar` to `foobar2` is a "just do it" scenario; if you're in distinct teams that don't talk to each other much, then having this happen via an explicit release cycle, deprecation period and so on is better.
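For the distinct-teams case, the explicit deprecation period might look like this minimal sketch, assuming Python and the hypothetical `foobar`/`foobar2` pair from the thread (the `foobar2` body is a placeholder). The old name survives for a release cycle as a thin wrapper that emits a `DeprecationWarning` and forwards to the new signature.

```python
import warnings


def foobar2(z, y, x):
    """New API with the argument order reversed (placeholder body)."""
    return f"{x}-{y}-{z}"


def foobar(x, y, z):
    """Old API, kept only until downstream teams have migrated."""
    warnings.warn(
        "foobar(x, y, z) is deprecated; call foobar2(z, y, x) instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not to this shim
    )
    return foobar2(z, y, x)
```

After the deprecation period the wrapper is deleted in the next major release; in a monorepo the same change could instead be a single commit that updates every caller.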
Personally, I'd find it a bit frustrating. It can be really interesting to poke around and learn how some internals or backend stuff works.
Over here we use both and make a call based on the project and team that will have to work on it. For a couple of projects we even went through mono->multi->mono (and kept the whole git history without history rewriting thanks to git-write-tree and git-commit-tree, see ).
PS: our history looks like this (yes, multiple starts of history, which is as valid a git DAG as anything but throws some GUI tools off):
P1 split --*---*---*------ |
P2 *---*---*---*---*---*-- |
In our case, it went from app, to app with supporting library, the library got extracted into a module and used with multiple apps, tools and new libraries were added that got shared between multiple apps, and pretty soon it felt like everything was sized MxN. We ended up combining all of the source code into one Git repo and stashing the media assets in SVN.
One of the primary driving factors here was the ability to refactor code between modules more easily. Code would show up in an app or two, and then get refactored into a common library.
So before making any significant changes, one should discuss what the desired tradeoff is, and then measure continuous progress towards that tradeoff.
For instance, a monorepo is a goal for people who want interacting tools to be built in their corresponding state without needing to consult someone else. Can you do that? Can you check out the repo (2-5 hours depending on your code size), then just run "make" and "make test" and have it all go through? If yes, it's a success; if not, something went wrong in the process.
PS: There are actually build tools that are language independent, e.g. make, CMake or Maven. You can just use these.
PPS: I'm personally really happy to see that a monorepo in itself doesn't solve dependency tracking for internal tools. I still believe it works the other way around: a good dependency tracking tool can totally replace monorepos.
This is technically true, but I wouldn't inflict on myself and my team the use of Maven if I wasn't using Java or a related language.
Some language ecosystems come with a preferred or natural build system (and related tools), and it's weird to use something else within that ecosystem, or to use standard but cumbersome tools (such as Maven) outside their ecosystem. It's possible in practice, but weird.
I think it's important to distinguish build tools like make from declarative versioned-dependency management tools, which multirepos require.
For example, you can build a multi-language monorepo with either make or CMake, but you can't declare that your repo builds "foo" version 1.2.3 and that it depends on "bar" version 2.3.4.
Maven does that, and it will automatically download the right version of bar for you and make it available, but Maven doesn't work well for C# dlls. You can use NuGet for dlls, but don't try it with JS modules. Use npm to manage JS modules, but don't try it with Perl packages. You can use CPAN for Perl packages, etc.
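To make the distinction concrete, here is a toy sketch of what declarative versioned dependencies add on top of a plain build tool: each (name, version) declares exact pins, and a resolver walks them into an install order or reports a conflict. The registry layout and `resolve` function are hypothetical; real tools such as Maven or npm also handle version ranges, remote repositories, and their own conflict-resolution strategies, none of which is modeled here.

```python
Package = tuple[str, str]  # (name, exact version)
Registry = dict[Package, dict[str, str]]  # package -> {dependency name: exact version}


def resolve(root: Package, registry: Registry) -> list[Package]:
    """Return root and its transitive dependencies, dependencies first."""
    chosen: dict[str, str] = {}  # name -> version already selected
    order: list[Package] = []

    def visit(pkg: Package) -> None:
        name, version = pkg
        if name in chosen:
            if chosen[name] != version:
                raise ValueError(f"version conflict for {name}: {chosen[name]} vs {version}")
            return  # already selected at the same version
        if pkg not in registry:
            raise KeyError(f"{name} {version} is not in the registry")
        chosen[name] = version
        for dep_name, dep_version in sorted(registry[pkg].items()):
            visit((dep_name, dep_version))
        order.append(pkg)  # appended only after all of its dependencies

    visit(root)
    return order
```

With `{("foo", "1.2.3"): {"bar": "2.3.4"}, ("bar", "2.3.4"): {}}`, resolving `("foo", "1.2.3")` yields bar before foo, which is the "foo 1.2.3 depends on bar 2.3.4" declaration that make or CMake alone cannot express.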
I am amazed what is done in large organisations sometimes. No one would ever sign off on me doing something like this. How do you convince the number crunchers that it's a good thing to build things like this if there is no measurable outcome?
Not trying to be rude or devalue OPs work, but I haven't worked in an organisation that would allow for this in a long while.
1. Larger organizations have bigger budgets they can allocate toward solving problems unique to that organization and the overhead they have built up that is unique to their particular way of working.
2. Most of the organizations that work on this type of thing don't require number cruncher approval. The number crunchers may track and report on these types of efforts but they don't control the decision making. If you are lucky enough to work at one of those, be thankful.
It rarely ever is, of course, but don't underestimate the will of a developer to push an interesting project that they know in their hearts will not really benefit the company.
Fuck me it's a nightmare wrapped in a shit show.
Due to the way the build tool generates the app, not a single 'standard' method can be used. Developer niceties you'd normally rely on are unavailable (think interactive debugging, code completion, hot code reloading).
I can see why it was done... sorta. Our FE team ripped out their component and standardised development back to their platform norms. But now we have integration issues between front and back, and the BE devs are now subject to a 60 second front-end build each time we change some code and reload the application.
You can split up a git monorepo with splitsh, and/or pull in & out with git subrepo
This gets you the best of both worlds and avoids most of the cons of both, although you do get a new set of cons, like keeping them in sync and ensuring tooling works in both places. It's not too bad, though. Make sure you bless one or the other as the source of truth, make the other basically read-only, and add a hook to automatically merge from one to the other.
- NIH (not invented here): teams re-creating instead of re-using other teams' work, either because they don't know it exists or because they feel they cannot control it.
- The gatekeeper: "just ask Bob to give you access", repeat for the next two to four weeks. Once you get really big, source code access is even used as a political tool between teams.
- The internal customers: Team A works on a common framework, teams B, C, D all use it. Good luck getting any one of them to take a new release more than once in a blue moon, and everyone will be on a different version too, so you have to support them all. Fun times.
This is clearly not true
It seems obvious to me that OP has a problem with his team, rather than with being able to measure the pros and cons of his chosen technological path.
If you think that's not true, the industry would love to see what you've got, as it's one of the central problems in software development.
(edit: It seems funny to me that many other industries are fine with less than perfect information, yet somehow the software industry can derive absolutely no value from it.)
Lots and lots of other industries struggle with measuring productivity. The medical profession is rife with that problem. I think what might exaggerate the problem in software is that we overvalue objective decision-making. It's our nature to want to quantify, and we don't like making subjective decisions even when that is what we are being called to do.
Would you like to back your statement up?
There's half a century of research and practice into this topic. Anyone in the industry _should_ know about this. Anyone who doesn't can find it on a search engine within 30 seconds.
If you're aware of one then a link to an article about it, or even a wikipedia page, would be awesome.
1. Break your workload into small-ish units of work
2. Before starting a unit, assign it a number value based on how complex the task is. 1 is absolute easiest, 2 is twice the work, etc.
3. Compare how many points of work you complete over periods of time
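The three steps boil down to summing estimated points per period and comparing the totals. A minimal sketch, assuming completed units are recorded as hypothetical `(period, points)` pairs; the resulting number is only as meaningful as the estimates fed into it, which is exactly what the rest of this thread disputes.

```python
from collections import defaultdict


def points_per_period(completed: list[tuple[str, int]]) -> dict[str, int]:
    """Sum the estimated points of finished units of work, per period."""
    totals: dict[str, int] = defaultdict(int)
    for period, points in completed:
        totals[period] += points
    return dict(sorted(totals.items()))  # periods in label order for comparison
```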
I believe productivity/efficiency in software engineering is something that is felt (i.e. you can measure some of its effects, but many are a matter of perception, not definitive metrics, like pain) rather than measured (i.e. you can synthesize a definitive metric from a set of known, well-defined data).
Let's imagine that I wrote some test cases. To me, if my piece of code passes those tests, I will consider it to be of X quality. I won't care about any quality other than this X.
If I write/build/test two different pieces of code, both passing my tests, and it happens that I took less time to write the second one, then I would consider myself more productive writing the latter.
I think many of us have known software developers who are incredibly fast, but who write "rat's nest" code that's impossible for anyone else to decipher, or which is very difficult to modify when new requirements come along. Quantifying those attributes is incredibly difficult, but as most time in software is actually spent on maintenance, it usually ends up being more important than a simpler assessment of initial output.
I agree with your view regarding maintenance cost, but I don't think that's what's at stake here. We can go further and compare the time needed to fix a rat's nest with the time needed to write the software all over again. Fixing the rat's nest is on many occasions faster (which I consider more productive) than rewriting.
Years ago "function points" were bandied about as a truly objective measure of software output and value. It turned out to be trivially game-able and added 0 value to software development estimation.
There are so many visible as well as hidden factors that attempting to measure any (let alone all) of that is a pipe dream.
Also, by writing the second one you are not accounting for second version effects since it is a reimplementation of something you know, and if someone else develops it for you then you just changed a major variable.
I get that you said "oversimplified", but this is just like saying "thought experiment", yet as soon as you confront it with the real world, it just doesn't realistically fly.
We're paid per hour of work.
Also, on today's market, the first to release wins. I know there are some exceptions but whoever wins the market first has clear advantages.
So why is time not a good metric?
And I didn't account for second version effects, that was not my aim.
Besides, how often do you write code twice just to see which way is better? What you have is a comparative measure that is completely useless for measuring new code, i.e. code which implements a new feature.
"Productivity, of course, is something you determine by looking at the input of an activity and its output. So to measure software productivity you have to measure the output of software development - the reason we can't measure productivity is because we can't measure output."
In an earlier post, I argue that in light of the fact that we can't measure productivity, we should measure job satisfaction instead. https://redfin.engineering/measure-job-satisfaction-instead-...
I like multiple repos because they encourage exactly that: pieces of code that do not depend on each other, and better-defined APIs.
The speed gain of the tooling is just a side benefit.
They wrote a tool called jiri, and track projects in a manifest repo.
With custom tooling, you can overcome almost any of the differences between the two approaches, enough so that the two approaches can start to look rather similar. There are ways to do atomic refactors across large numbers of repos, with the right tooling, and there are ways to use semver for components inside a monorepo.
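Using semver for components inside a monorepo mostly means each component carries a version and consumers check compatibility by the usual rule: same major version, and no older than what was required. A minimal sketch, assuming plain `MAJOR.MINOR.PATCH` strings and ignoring pre-release tags and semver's special treatment of 0.x versions; the function names are hypothetical.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into integers (no pre-release/build metadata)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_compatible(required: str, available: str) -> bool:
    """True if `available` can stand in for `required` under semver rules."""
    req, avail = parse_version(required), parse_version(available)
    # Same major version (no breaking changes), and at least as new as required.
    return avail[0] == req[0] and avail >= req
```

By this rule a consumer that asked for 5.2.3 accepts 5.4.0 but not 6.1.0, which is the helperutils split described at the top of the thread.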
How did we get here? An (imo) extreme dedication to "microservice all the things", a CI system which intentionally only allows one build artifact per repo, and a release system that encourages the separation of parts of a service's components to minimize red tape for changes.
At my $DAY_JOB we went with a mostly-monorepo solution (1 main repo and 2 small auxiliary repos), with multiple dependent modules inside it. That was after I insisted on it, while the other dev wanted a bazillion small repos, and we're talking about a really small dev team (3 devs initially, more later). But we still agreed that 1) each subfolder of the monorepo should be as independent as possible and only depend on the minimal subset of other modules it really needs (so multirepo-ish behavior, in a sense), while at the same time 2) it should be possible to change things in multiple modules and quickly check that everything still works (we used symlinks between the modules' folders).
I was babysitting our build and tooling for months, fixing more and more exotic edge cases with incompatible transitive dependencies, multiple branches, versioning, submodule build order, etc. (In hindsight, we threw far more complexity and requirements at the open-source tooling we used than it was able to handle, but we didn't acknowledge that upfront, because at the beginning it looked like it would work fine.)
If we went with a multirepo instead, it would have been even worse: all the pains that I described would still be there, plus we'd go crazy with non-atomic refactorings and merges.
IMO starting a project as a multirepo is madness: things are messy and change too often at the beginning, and it's not worth the overhead. Maybe it makes sense to start with a monorepo but structure things in a way that makes them splittable in the future, when things stabilize more. But even that is way more costly than expected.
* Vim vs emacs
* Tabs vs spaces
* Static vs dynamic
* Monolith vs microservices
* Functional vs imperative
And a long etc. We're dogmatic folks.
I fall in the multirepo camp.
multirepo is a nightmare for changes spanning multiple repos, but the freedom of using the best tool for the job is much easier to achieve.
or so is my thinking about these from experience.
It's very common to have multiple languages in a monorepo codebase.
Lots of FUD on this topic, and although it's good that the author has brought this up, it's just adding even more FUD. I'd go as far as to say the title is clickbait. The end topologies and tooling for monoliths are different. Neither is necessarily better than the other; they are just different. There is no magic pill: there are right places/requirements and right people/experience/culture for each of them.
* Imagine it like this: building a new Ubuntu OS release for every Firefox update. Then work backwards.
* Mono-repo needs a completely different toolchain, like Bazel/CMake, to find out the whole dependency graph and rebuild only changes and downstreams: avoiding a 2-hour build, avoiding 10 GB release artifacts, scoped CI/commit builds (i.e. triggering a build for only the one changed folder instead of triggering a million tests), independent releases, etc.
* More tooling for large repository and LFS management, build chain optimization, completely different build farm strategies to run tests for one build across many agents, etc.
* Must have a separate team with the expertise to maintain a mono-repo with 100 components, one that knows everything that needs to be done to make it happen
* Making sure people don't create a much worse dependency graph or shared messes, because it's now easier to peek directly into every other module. Have you worked in companies where developers still have trouble with Maven multi-module projects? Now imagine 10x of that
* Making sure services don't get stuck on shared library dependencies. Should be able to use guice 4.0 for one service, 4.1 for another, jersey 1.x f/a, jersey 2.x f/a, etc etc. Otherwise it becomes an all-or-nothing change, and falls back to being a monolith where services can't evolve without affecting others
* Does not mean it's easy to break compatibility and do continuous delivery (no: there are database changes, old clients, staggered rollouts, rollbacks, etc. Contracts must always be honored, there's no escaping that; a service has to be compatible with its own previous version for any sane continuous delivery process)
* Dependency graphs have to be split for continuous delivery
* Must not get into multi-repo if you don't have true Continuous Delivery practices (not continuous integration/deployment as tools, but Delivery as a mindset)
* Do not have "Release management" and "Release managers" with multi-repos; use trunk-based practices
* Don't split just code but split whole dep graph
* Proper artifact management and sharing, because shared components are enforced to be "diligently" managed, not just thrown into a folder
* Plan for multiple types of deliverables: Services (runtime), static artifacts, libraries/frameworks, contracts, data pipelines, etc
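The "scoped CI" point above (rebuild only what changed plus everything downstream of it) reduces to a reverse-dependency walk. A minimal sketch, assuming a hypothetical layout where each folder maps to one build target and the target dependency graph is already known; tools like Bazel derive that graph from build files, which this sketch does not attempt.

```python
from collections import defaultdict, deque


def affected_targets(
    changed_files: list[str],
    owners: dict[str, str],
    deps: dict[str, set[str]],
) -> set[str]:
    """Targets whose folder changed, plus every target that depends on them."""
    # Invert "target -> what it depends on" into "target -> who depends on it".
    dependants: dict[str, set[str]] = defaultdict(set)
    for target, needs in deps.items():
        for dep in needs:
            dependants[dep].add(target)

    # Seed with targets that own a changed file (matched by folder prefix).
    start = {
        target
        for path in changed_files
        for folder, target in owners.items()
        if path.startswith(folder.rstrip("/") + "/")
    }

    # Breadth-first walk over the reverse edges to collect downstream targets.
    seen = set(start)
    queue = deque(start)
    while queue:
        current = queue.popleft()
        for downstream in dependants[current]:
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    return seen
```

A change under `libs/auth/` would then trigger builds and tests for `auth` and everything that transitively depends on it, rather than for the whole repo.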
Not even scratching the surface of anything here. It's just an introduction to the introduction. If the team believed they would get this resolved, they are wrong: both are just the beginning of two journeys. Had they gone in the other direction, we would have gotten a different blog post today, that's all.
In the end, if one has >10 services, they WILL have challenges, whether it's mono-repo or multi-repo, without holistic modularization. They will need componentization of the service, metrics isolation, build process isolation, issues/project management isolation, team isolation, etc. The repo is the last thing to worry about.