We’ll Never Know Whether Monorepos Are Better (redfin.engineering)
50 points by kalimatas on June 29, 2017 | 94 comments



From the arguments I've seen, the top highlights of single vs multi repo:

Single repo enables company-wide code sharing and refactoring with the least amount of friction. E.g., a programmer can modify "foobar(x,y,z)" to "foobar2(z,y,x)" and update all the corporate-wide client code that calls it. This ensures all teams are always using the same code, instead of Team A being on helperutils_v5.2.3 and Team B on helperutils_v6.1.0, which are incompatible with each other.
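
As a rough illustration, this is the kind of atomic, repo-wide migration a single repo allows. The commands below are a hypothetical sketch using the example names above; a real migration would use a language-aware codemod tool rather than sed:

    # rewrite every caller in one pass, then commit the whole migration atomically
    git grep -l 'foobar(' | xargs sed -i 's/foobar(\(\w\+\), *\(\w\+\), *\(\w\+\))/foobar2(\3, \2, \1)/g'
    make test    # assumes a top-level test target; every team's code is checked against the new signature at once
    git commit -am "Migrate all callers from foobar(x, y, z) to foobar2(z, y, x)"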

The biggest downside seems to be slow performance when syncing huge repos. This was one of the motivations for Microsoft's GVFS for Git, which enables sparse downloads.

If there are good arguments for multi-repos, not related to performance, that override the benefits of company-wide consistency of code, it would be good to discuss them. (I'm talking about companies like MS/Google/Facebook, not consulting firms with a different repo for each customer project.)


> enables company-wide code sharing and refactoring

It does, but primarily it enables the tooling around that to be so much more integrated.

Indexing the repo becomes possible, and fast search, find-usage, and inter-project code review become so much more frictionless. Build and testing systems end up having inter-project dependencies all the time. You probably already have them.

It's like the difference between a big database with a hundred tables and a hundred small databases. Go with the big database every time, until you have reason not to, by which point you'll have a lot more knowledge about your requirements.


Off the top of my head:

1. Security. It boggles the mind that companies with large source trees would let their entire source tree get cloned by a single developer. Maybe this isn't really an issue, but it seems like you're creating unnecessary surface area when the iOS dev team can walk around with the machine learning group's code.

2. Dev aesthetics / onboarding. There’s a complexity cost to sharing all of that code. If I join a company and the first step is spending (weeks? months?) getting comfortable understanding the universe of this gigantic source tree, instead of being able to safely understand a small piece of it, that can lead to frustration and it’s certainly not going to be productive.

I’ve never worked in a massive, well put together monorepo, so maybe these issues are solved in other ways. But it seems like your point #1 is a trade-off – being able to call code from anywhere in the same repo is powerful, but it’s also capable of breeding a ton of complexity, so please mind the mere mortals who will follow one day.


I think that being able to look at the source of everything is such a big productivity advantage that the risk of an employee stealing your source is negligible in comparison.


Neither will help if you employ bad people or the company has a bad culture. You should trust every developer.


Even if you trust every developer implicitly, that's not a reason to forgo the Principle of Least Privilege.


I can't imagine working somewhere that I'm not allowed to even see all the code.


Company-wide code sharing is also why you wouldn't want a monorepo. By putting everything in a monorepo, you require a coordination effort on every change, even when it would be OK for people to be using different versions.


There are ways of doing this in a monorepo (by copying code to a new target for instance) - the difference is that you don't need to pay the price of versioning with every change.


Of course. That's why this is one of those holy war things. It's trade-offs all the way down.


Another problem is that a single repo for all the code you run is impossible for all but the biggest companies (Google).

Most of the code that you run is open source dependencies, like the Python interpreter, GNU C library, and Linux kernel.

You could import everything, including outside dependencies, into a single repo and build an OS/application image from that repo. That is effectively what Google does. But that requires a lot of careful attention to pulling in updates of external code, and requires a particularly excellent build and dependency manager (aka a package manager). It is also huge enough to run into serious performance problems no matter how you try to fix it.

So more practically, your monorepo is just for your internal code. Which means if you ever want to fix a bug in any outside libraries, or make some of your code available as a separate library for the outside world, you need to work with multiple repos.

So why not just be multi-repo from the start? Or rather, have multiple monorepos, one per project or group.


The repo should be the unit of versioning/releasing, just for convenience - that way bisection is easier and release tags for different units are in different places. (As an extreme example, if you had two completely distinct projects with no code in common, you wouldn't want them to be in the same repo would you?)

Now that could mean a monorepo, if you're a company with one product that's released as a single unit. Or it could mean multi-repo if you have distinct components with their own release cycles. IMO that comes down to Conway's law in reverse: it's best to structure your components in a way that reflects your organization's communication structures. If you're in constant communication then moving from `foobar` to `foobar2` is a "just do it" scenario; if you're in distinct teams that don't talk to each other much, then having this happen via an explicit release cycle, deprecation period and so on is better.


Company-wide code sharing is not always desired, especially if you want to protect IP.


Monorepo doesn't imply read permissions for the entire repo for everyone in the company.


Based on what I've heard from people at Facebook and Google, they both grant engineers access to most code. I've read Apple doesn't provide full access.

Personally, I'd find it a bit frustrating. It can be really interesting to poke around and learn how some internals or backend stuff works.


The natural namespacing of multirepo makes this (and many other things) much simpler and sometimes even almost zero effort through the repo-handling tool (like Git{Hub,Lab}).

Over here we use both and make a call based on the project and team that will have to work on it. For a couple of projects we even went through mono->multi->mono (and kept the whole git history without history rewriting thanks to git-write-tree and git-commit-tree, see [0]).

[0]: https://gist.github.com/lloeki/37044e3f28812ce4ced0d9924a651...

PS: our history looks like this (yes, multiple starts of history, which is as valid a git DAG as anything, but it throws some GUI tools off):

    P1 *---*---*---*---*---*---*---*---*---*---*
                    \                 /|
            P1 split --*---*---*------ |
                                      /|
            P2 *---*---*---*---*---*-- |
                                       /
                   P3 *-------*---*---*
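
In case it's useful, here's a minimal sketch of that kind of graft (branch names and paths are hypothetical; the gist above has the full recipe):

    # bring the other project's history in and stage it under a subdirectory
    git fetch ../project2 master:p2-master
    git read-tree --prefix=project2/ p2-master
    # write the combined tree and make a merge commit with both histories as parents
    tree=$(git write-tree)
    commit=$(git commit-tree "$tree" -p HEAD -p p2-master -m "Merge project2, keeping its history intact")
    git merge --ff-only "$commit"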


That history graph looks pretty familiar :-)

In our case, it went from app, to app with supporting library, the library got extracted into a module and used with multiple apps, tools and new libraries were added that got shared between multiple apps, and pretty soon it felt like everything was sized MxN. We ended up combining all of the source code into one Git repo and stashing the media assets in SVN.

One of the primary driving factors here was the ability to refactor code between modules more easily. Code would show up in an app or two, and then get refactored into a common library.


then again, point about "requesting access rights" as the downside of the multirepo is mute.


From a broader perspective, most of these complaints are complaints about specific tooling (Git, Perforce, whatever build system or dependency management you use) rather than about multirepo or monorepo approaches.


Moot?


The primary reason for multi-repo in my experience is when you have to share a code base with an external team: contractors, customers, open source, et cetera. Certainly you can do that with a mono-repo, but if tools & processes have to work without the whole mono-repo, it may be easy to go all-in on multi-repos.


Doesn't that (monorepo) approach essentially force you to have one release and a synchronized release schedule? What if you have rolling releases?


I am pretty solidly in the multirepo camp more often than not, but nothing about monorepo forces synchronized release schedules as long as you are branching appropriately.


There is no better; there are only tradeoffs, and some people prefer one tradeoff and others prefer another. That's just how it is.

So before making any significant changes, one should discuss what the desired tradeoff is, and then measure the continuous change towards that tradeoff.

For instance, a monorepo is a goal for people who want interacting tools to be built in their corresponding state without needing to consult someone else. Can you do that? Check out the repo (for 2-5 hours depending on your code size), then just hit "make" and "make test" and it runs through? If yes, it's a success; if not, something went wrong in the process.

PS: There are actually build tools that are language independent, e.g. make, CMake or Maven. You can just use these.

PPS: I'm personally really happy to see that monorepo in itself doesn't resolve the dependency tracking for internal tools. I still believe in the other way around. A good dependency tracking tool can totally replace monorepos.


> PS: There are actually build tools that are language independent, e.g. make, CMake or Maven. You can just use these.

This is technically true, but I wouldn't inflict on myself and my team the use of Maven if I wasn't using Java or a related language.

Some language ecosystems come with a preferred or natural build system (and related tools), and it's weird to use something else within that ecosystem, or to use standard but cumbersome tools (such as Maven) outside their ecosystem. It's possible in practice, but weird.


(Author here.) I agree with your main point but wanted to respond to your PS.

I think it's important to distinguish build tools like make from declarative versioned-dependency management tools, which multirepos require.

For example, you can build a multi-language monorepo with either make or CMake, but you can't declare that your repo builds "foo" version 1.2.3 and that it depends on "bar" version 2.3.4.

Maven does that, and it will automatically download the right version of bar for you and make it available, but Maven doesn't work well for C# dlls. You can use NuGet for dlls, but don't try it with JS modules. Use npm to manage JS modules, but don't try it with Perl packages. You can use CPAN for Perl packages, etc.
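
(For concreteness, this is roughly what that declaration looks like in Maven's case; the coordinates and versions here are made up:)

    <project>
      <modelVersion>4.0.0</modelVersion>
      <!-- hypothetical: "this repo builds foo version 1.2.3..." -->
      <groupId>com.example</groupId>
      <artifactId>foo</artifactId>
      <version>1.2.3</version>
      <dependencies>
        <!-- "...and it depends on bar version 2.3.4", which Maven downloads for you -->
        <dependency>
          <groupId>com.example</groupId>
          <artifactId>bar</artifactId>
          <version>2.3.4</version>
        </dependency>
      </dependencies>
    </project>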


Agreed. Make+Monorepo possible. Make+Multirepo maybe not enough.


At Google, which is mostly a very large mono-repo (Android/Chrome being exceptions), I can get my own writable copy of the entire code base in about 3 seconds.


They don't use git, right?


> Rolling our own dependency-management tooling took a small team of engineers (including me) more than a year of very hard work, and there were problems.

I am amazed at what is done in large organisations sometimes. No one would ever sign off on me doing something like this. How do you convince the number crunchers that it's a good thing to build something like this if there is no measurable outcome?

Not trying to be rude or devalue OPs work, but I haven't worked in an organisation that would allow for this in a long while.


There are two factors at play here:

1. Larger organizations have bigger budgets they can allocate toward solving problems unique to that organization and the overhead they have built up that is unique to their particular way of working.

2. Most of the organizations that work on this type of thing don't require number cruncher approval. The number crunchers may track and report on these types of efforts but they don't control the decision making. If you are lucky enough to work at one of those be thankful.


In my experience - working solely for large organisations for the last 17 years - the number crunchers are almost always swayed by "it'll be better in the long run".

It rarely ever is, of course, but don't underestimate the will of a developer to push an interesting project that they know in their hearts will not really benefit the company.


I've recently inherited a nearly 100% Python microservice-monolith monorepo project, whose tooling was written in-house by the original developers and then subsequently re-implemented using the tooling's plugin system within the project. And even then, the entire project is coupled together with some bash scripts to actually run it.

Fuck me it's a nightmare wrapped in a shit show.

Due to the way the build tool generates the app, not a single 'standard' method can be used. Any developer niceties you'd normally rely on can't be used (think interactive debugging, code completion, hot code reloading).

I can see why it was done... sorta. Our FE team ripped out their component and standardised development back to their platform norms. But now we have integration issues between front and back, and the BE devs are now subject to a 60 second front-end build each time we change some code and reload the application.


Note that there is a third option: both.

You can split up a git monorepo with splitsh[1], and/or pull in & out with git subrepo[2]

This gets you the best of both worlds and avoids most of the cons of both, although you do get a new set of cons, like keeping them in sync and ensuring tooling works in both places. It's not too bad, though. Make sure you bless one or the other as the source of truth and make the other basically read-only, and add a hook to automatically merge from one to the other.

1: https://github.com/splitsh/lite 2: https://github.com/ingydotnet/git-subrepo
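
(If you'd rather not add a dependency, stock git can do the basic split half of this too via git subtree; a rough sketch with hypothetical paths:)

    # carve one component's history out of the monorepo into its own branch
    git subtree split --prefix=services/auth -b auth-only
    # publish that branch as the master of a standalone (read-only) mirror
    git push git@example.com:auth.git auth-only:master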


There's a fourth option: neither. It's also surprisingly common. A lot of companies have a component named something like CompanyNameSupportLibrary that's pulled in and used by everything. Voila most of the cons of both multirepo and monorepo without any of the benefits.


There is also git-meta: https://github.com/twosigma/git-meta


Cool. It looks like git-meta stores metadata to make git submodule easier, and git-subrepo stores metadata to make git subtree easier.


There are technical pros and cons to both, but monorepos in my experience can "fix" or at least sidestep many human/big organization issues:

- NIH: teams re-creating instead of re-using other teams' work, either due to not knowing it exists or because they feel they cannot control it.

- The gatekeeper: "just ask Bob to give you access", repeat for the next two to four weeks. Once you get really big, source code access is even used as a political tool between teams.

- The internal customers: Team A works on a common framework; teams B, C, D all use it. Good luck getting any one of them to take a new release more than once in a blue moon, and everyone will be on a different version too, so you have to support them all. Fun times.


> There’s no way to measure productivity in software

This is clearly not true

Edit: It seems obvious to me that OP has a problem with his team, rather than with being able to measure the pros/cons of his chosen technology path


It's a simplification of "there's no way to objectively measure productivity in software that doesn't introduce bad moral hazards".

If you think that's not true, the industry would love to see what you've got, as it's one of the central problems in software development.


Should we as an industry stop trying to measure productivity, or should we measure and try to manage the moral hazards? Is this a case of "perfect is the enemy of good enough"?

(edit: It seems funny to me that many other industries are fine with less than perfect information, yet somehow the software industry can derive absolutely no value from it.)


I don't think we've stopped trying to measure productivity, but when the author says "there's no way to measure productivity" he's short-circuiting a whole meta-conversation about trying to measure whether a monorepo is better than a multirepo approach, because the merits of the measurements then become what the argument is about, and you end up in the same place but with an extra level of abstraction on top of what you are trying to determine.

Lots and lots of other industries struggle with measuring productivity. The medical profession is rife with that problem. I think what might exaggerate the problem in software is that we overvalue objective decision making. It's our nature to want to quantify, and we don't like making subjective decisions even when that is what we are being called on to do.


You've made an assertion, but not provided any evidence for it. That doesn't seem, to me, to add much to the discussion.

Would you like to back your statement up?


They don't need to back it up. It is up to the person making the original claim to back it up.

There's half a century of research and practice into this topic. Anyone in the industry _should_ know about this. Anyone who doesn't can find it on a search engine within 30 seconds.


No, it's a well known problem. The old lines of code joke still applies. You can introduce metrics, but those will just be gamed because software is a multidimensional optimization problem with complex trade offs. So, you can't translate that to a single measurement and call it productivity.


Yes, it's a well known problem.


There's half a century of research - and yet, no clear way of measuring productivity, that I am aware of.

If you're aware of one then a link to an article about it, or even a wikipedia page, would be awesome.


I find Agile velocity to be a fairly useful measure:

1. Break your workload into small-ish units of work

2. Before starting a unit, assign it a number value based on how complex the task is. 1 is absolute easiest, 2 is twice the work, etc.

3. Compare how many points of work you complete over periods of time


In software, productivity can pretty much be measured through time spent on a task, while maintaining the same quality.


Before which you have to figure out how to measure quality. There are countless ways of course, but choosing your metrics is subjective and pretty much arbitrary.


Also, since many tasks are literally only done once, "task" is itself a variable, not a fixed data point. As for "time", that can be quite hard to measure, since unless it's a very menial task it'll span several hours, or days even, and it's quite elusive to total what really constitutes real time spent on a given task.

I believe productivity/efficiency in software engineering is something that is felt (i.e you can measure some of its effects but many are a matter of perception, not definitive metrics. like pain.) rather than measured (i.e you can synthesize a definitive metric from a set of known, well-defined data).


I will oversimplify my response.

Let's imagine that I wrote some test cases. To me, if my piece of code passes those tests, I will consider it to be of X quality. I won't care about any quality other than this X one.

If I write/build/test two different pieces of code, both passing my tests, and it happens that I took less time to write the second one, then I would consider that I was more productive writing the latter.


I think your example shows exactly why measuring software productivity is so problematic. With your scenario, probably the easiest way to "improve productivity" would be to not write any documentation comments, heck it would probably be faster to just stick it all in one giant method.

I think many of us have known software developers who are incredibly fast, but who write "rat's nest" code that's impossible for anyone else to decipher, or which is very difficult to modify when new requirements come along. Quantifying those attributes is incredibly difficult, but as most time in software is actually spent on maintenance, it usually ends up being more important than a simpler assessment of initial output.


One can assume writing documentation and comments is covered by both approaches. Still, the fastest scenario would be the most productive one.

I agree with your view regarding maintenance cost, but I don't think that's what's at stake here. We can go further and try to compare the amount of time needed to fix the rat's nest with the amount of time needed to write the software all over again. Patching the rat's nest is on many occasions faster (which I'm considering more productive) than rewriting.


You're never going to get around the fact that most of these things are qualitative assumptions. Is your documentation good? There's really no completely objective way to measure maintainability, because maintainability is often about how easy it is to handle unknown future change requests.

Years ago "function points" were bandied about as a truly objective measure of software output and value. It turned out to be trivially game-able and added 0 value to software development estimation.


Where is your metric? By how much will you be declared more productive? Is it just time to achieve a result? Because that is obviously wrong. How are you accounting for quality, or even measuring that X? Do you account for readability? maintenance? documentation? factoring? over-engineering?

There are so many visible as well as hidden factors that attempting to measure any (let alone all) of that is a pipe dream.

Also, by writing the second one you are not accounting for second version effects since it is a reimplementation of something you know, and if someone else develops it for you then you just changed a major variable.

I get that you said "oversimplified", but this is just like saying "thought experiment", yet as soon as you confront that to the real world, it just doesn't realistically fly.


> Why is time for achieving a result a wrong metric?

We're paid per hour of work.

Also, on today's market, the first to release wins. I know there are some exceptions but whoever wins the market first has clear advantages.

So why is time not a good metric?

And I didn't account for second version effects, that was not my aim.


I think you really did oversimplify it. Writing a piece of code once obviously affects your speed at writing it again.


As you clearly understood, that's not what I meant.


Ok, so now you "just" need to factor in long-term maintenance.

Besides, how often do you write code twice just to see which way is better? What you have is a comparative measure that is completely useless for measuring new code, i.e. code which implements a new feature.


Even if you just hardcode the test cases in a giant lookup table? Tests can prove the presence of bugs, but passing doesn't mean the software is bug-free.


I don't know if I would agree that there are countless ways to measure quality. I think the standard one is: the ability of the software to meet its design specification. Depending on your spec this could be more or less objective.


I would say that's a good baseline, but I would extend that to "the ability of the software to meet its design specification well", where real quality lies in the more subjective subtleties between minimally functional and genuinely happy users.


(Author here.) When I wrote that, I was echoing what I believed to be a widespread view. For example: https://martinfowler.com/bliki/CannotMeasureProductivity.htm...

"Productivity, of course, is something you determine by looking at the input of an activity and its output. So to measure software productivity you have to measure the output of software development - the reason we can't measure productivity is because we can't measure output."

In an earlier post, I argue that in light of the fact that we can't measure productivity, we should measure job satisfaction instead. https://redfin.engineering/measure-job-satisfaction-instead-...


I think that productivity is a tough thing to measure...yes there are metrics, but do any of them give an accurate picture of productivity? What do you think is an accurate measure of productivity?


Time would be an accurate measure, considering the same level of quality is met.


Time spent working on the wrong thing is not productive. How do you measure quality in software? If you need to measure quality in order to measure productivity, then one cannot truly measure productivity.


Or to describe this in practice: you can write perfectly modular, documented, tested and well-written code that gets top marks in cyclomatic complexity checks. And yet, if you think carefully, your problem could have been solved by a quick shell one-liner. Were you productive at all?


You don't need to measure quality unless you want to achieve perfection. If you set your quality according to goals, you can ensure that no matter what different software you write, it will need to meet the same goals (the same level of quality).


Everybody will agree that a big codebase needs some sort of componentization, so that understanding some part of the code doesn't require understanding the entire codebase.

I like multiple repos because they encourage that: having pieces of code that do not depend on each other, and having more well-defined APIs.

The speed gain of the tooling is just a side benefit.


While there are certainly ups and downs to both, I'd much rather have to deal with the downs of multirepo myself.


You can have your cake and eat it by mixing both. I've been reading through the Fuchsia [0] source, and they essentially have a monorepo setup that's made up of multiple repos.

They wrote a tool called jiri [1], and track projects in a manifest [2] repo.

[0] https://fuchsia.googlesource.com/?format=HTML

[1] https://fuchsia.googlesource.com/jiri/

[2] https://fuchsia.googlesource.com/manifest/


Cross-language artefact and dependency management. Tell us more about the tool.


I guess Conda is the most popular one. In the python world, we glue all sorts of things together. So I guess it's natural it'd come from there. But now it supports all sorts of stuff other than python.


Generally known as a "package manager". I wrote this short article comparing some: http://catern.com/posts/deps.html


Nix (http://nixos.org/nix). Disadvantage: works only on Linux and MacOS X.


I've always wondered why this kind of tool wasn't widely available. Is it really that hard to get right? What's so complicated about it?


My conclusion in the monorepo/multirepo debate is that for large code bases, current tooling isn't up to the task, and switching between monorepo and multirepo changes one set of tooling problems for another.

With custom tooling, you can overcome almost any of the differences between the two approaches, enough so that the two approaches can start to look rather similar. There are ways to do atomic refactors across large numbers of repos, with the right tooling, and there are ways to use semver for components inside a monorepo.


Interesting how the HN comments are also dividing between "it's obvious that monorepos are better" and "it's obvious that multirepos are better".


So, here is some more anecdata for the pile. We go with the multi-repo model at work, and the number of repos we "use" is over 1100 right now, and growing about a hundred repos a month.

How did we get here? An (imo) extreme dedication to "microservice all the things", a CI system which intentionally only allows one build artifact per repo, and a release system that encourages the separation of parts of a service's components to minimize red tape for changes.


OT, what is it with Medium sites pushing onto my history state? Getting out requires at least two, frequently more applications of <Back>.


It all depends on your needs and architecture.


Anecdata:

At my $DAY_JOB we went with a mostly-monorepo solution (1 main repo and 2 small auxiliary repos) with multiple dependent modules inside it -- after me insisting on it, while the other dev wanted a bazillion small repos -- and we're talking about just a really small dev team (3 devs initially, more later). But we still agreed that 1) each subfolder of the monorepo should be as independent as possible and only depend on the minimum subset of other modules it really needs (so multirepo-ish behavior in a sense), while at the same time 2) allowing changes to things in multiple modules and quickly checking that everything still works (we used symlinking between the modules' folders).

Result:

I was babysitting our build and tooling for months, fixing more and more exotic edge cases with incompatible transitive dependencies, multiple branches, versioning, submodule build order, etc. (In hindsight, we threw way more complexity and requirements at the open-source tooling we used than it was able to handle - but we didn't acknowledge that upfront, because at the beginning it looked like it would work fine.)

If we went with a multirepo instead, it would have been even worse: all the pains that I described would still be there, plus we'd go crazy with non-atomic refactorings and merges.

IMO starting a project as a multirepo is madness: things are messy and change too often at the beginning, and it's not worth the overhead. Maybe it makes sense to start with a monorepo but structure things in a way that makes them splittable in the future, when things stabilize more. But even that is way more costly than expected.


This article missed a great opportunity to enumerate the advantages and disadvantages of either model.


It's because this is an article about engineering decision-making, not repo management.


Same with:

* Vim vs emacs

* Tabs vs spaces

* Static vs dynamic

* Monolith vs microservices

* Functional vs imperative

And a long etc. We're dogmatic folks.

I fall in the multirepo camp.


Who cares about that? Which is better, vi or emacs?


Notepad++


monorepo is very difficult to get right - and once you do, you are pretty much stuck with the language/tool choices since the switch usually takes months/years.

multirepo is a nightmare for changes spanning multiple repos, but the freedom of using the best tool for the job is much easier to achieve.

or so is my thinking about these from experience.


I think that's having a single codebase. You can have different projects using different languages on a single repo.


Don't confuse multi-repo with multi-language.

It's very common to have multiple languages in a monorepo codebase.


In my experience the problem you're identifying there is a general failure to understand boundaries when designing the system. A language or framework shouldn't pervade every package. To aid this, in my most recent large project we formulated a naming structure for packages that gave an indication of where they fit in, from a high-level perspective, and the sorts of dependencies they should have.


Again, this article, and engineers discussing monorepo vs multirepo in general, seem to be discussing splitting code without splitting dependency graphs, CI/CD, and independent delivery. So they had "religious arguments" (exactly as said in the article). Holistic modularization and SOA (i.e. not just splitting the dev's job or the qa/ops/architect/manager's job, but holistic) is not an easy thing in general.

Lots of FUD on this topic, and although it's good that the author has brought this up, it's just adding even more FUD. I'd go as far as to say the title is clickbait. The end topologies and tooling for the two are different. It's not that one is necessarily better than the other; they are just different. There is no magic pill; there are right places/requirements and right people/experience/culture for using each of them.

Mono-repo:

* Imagine it like: Building a new Ubuntu OS release for every Firefox update. Then work backwards

* Mono-repo needs a completely different toolchain, like borg/cmake, to work out the whole dependency graph, rebuild only changes and their downstreams, avoid a 2-hour build, avoid 10GB release artifacts, scope CI/commit builds (i.e. triggering a build for only the one changed folder instead of triggering a million tests), keep releases independent, etc. (see the sketch after this list)

* Some more tooling for large repository and LFS management, buildchain optimization, completely different build farm strategies to run tests for one build across many agents, etc

* Must have a separate team with the expertise to maintain a mono-repo with 100 components, one that knows everything that needs to be done to make it happen

* Making sure people don't create a much worse dependency graph or shared messes, because it's now easier to peek directly into every other module. Have you worked in companies where developers still have trouble with Maven multi-module projects? Now imagine 10x that

* Making sure services don't get stuck on shared library dependencies. Should be able to use guice 4.0 for one service, 4.1 for another, jersey 1.x f/a, jersey 2.x f/a, etc etc. Otherwise it becomes an all-or-nothing change, and falls back to being a monolith where services can't evolve without affecting others

* Does not mean it's easy to break compatibility and do continuous delivery (no: there are database changes, old clients, staggered rollouts, rollbacks, etc. Contracts must always be honored, no escaping that; a service has to be compatible with its own previous version for any sane continuous delivery process)
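
To make the "completely different toolchain" point above concrete, here's a hedged sketch of what a target declaration looks like in a Bazel/Buck-style build graph (all names are hypothetical): each folder declares fine-grained dependencies, and the tool rebuilds and retests only what sits downstream of a change.

    # payments/BUILD -- one of thousands of targets living in the same repo
    java_library(
        name = "payments",
        srcs = glob(["src/main/java/**/*.java"]),
        deps = ["//common/helperutils"],          # depends on another folder in the repo
        visibility = ["//services:__subpackages__"],
    )
    # a scoped CI command like "bazel test //payments/..." then rebuilds and
    # retests only the targets affected by a change, not the whole tree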

Multi-repo:

* Dependency graphs have to be split for continuous delivery

* Must not get into multi-repo if you don't have true Continuous Delivery practices (not continuous integration/deployment as tools, but Delivery as a mindset)

* Do not have "Release management" and "Release managers" with multi-repos; use trunk-based practices

* Don't split just code but split whole dep graph

* Proper artifact management and sharing, because shared components are enforced to be "diligently" managed, not just thrown into a folder

* Plan for multiple types of deliverables: Services (runtime), static artifacts, libraries/frameworks, contracts, data pipelines, etc

Not even scratching the surface of anything here. It's just an introduction to the introduction. If the team believed they would get this resolved, they are wrong; both are just the beginnings of two journeys. Had they gone in the other direction, we would have got a different blog post today, that's all.

In the end, if one has >10 services, they WILL have challenges, whether it's mono-repo or multi-repo, without holistic modularization. They will need componentization of the service, metrics isolation, build process isolation, issues/project management isolation, team isolation, etc. The repo is the last thing to worry about.

Edit: Formatting


Monorepo is harder to get right initially, and you have to think things through. But once you do, you're set for it. I love that there is so little code duplication.



