From a Single Repo, to Multi-Repos, to Monorepo, to Multi-Monorepo (css-tricks.com)
122 points by leoloso 64 days ago | 68 comments

I recently built mani[0], a CLI tool for managing multiple repositories, which can solve some of the same issues that come with many-repo setups, though I have far fewer packages than the author. It's similar to the meta package mentioned in the article, but it offers better filtering of projects when running commands, a way to organize all your projects from a single config, scripts that target only certain projects or groups, auto-completion, and some other helpful features.

[0] https://github.com/alajmo/mani
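A minimal sketch of the core idea behind such tools: one config listing projects with tags, and a command that runs only in the matching subset. The `Project` type and the in-memory config below are hypothetical; this is not mani's actual config format or implementation.

```python
import subprocess
from dataclasses import dataclass, field


@dataclass
class Project:
    name: str
    path: str
    tags: set[str] = field(default_factory=set)


def select(projects: list[Project], tags: set[str]) -> list[Project]:
    """Return projects carrying every requested tag (all projects if tags is empty)."""
    return [p for p in projects if tags <= p.tags]


def run_in(projects: list[Project], cmd: list[str]) -> dict[str, int]:
    """Run cmd inside each project's directory; map project name to exit code."""
    results = {}
    for p in projects:
        completed = subprocess.run(cmd, cwd=p.path)
        results[p.name] = completed.returncode
    return results


# Hypothetical config; a real tool would load this from a single file.
PROJECTS = [
    Project("api", "./api", {"backend", "go"}),
    Project("web", "./web", {"frontend"}),
    Project("worker", "./worker", {"backend"}),
]

if __name__ == "__main__":
    print([p.name for p in select(PROJECTS, {"backend"})])
```

Keeping selection separate from execution makes it easy to preview which repos a command would touch before running `run_in(select(...), ["git", "status"])`.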

I have worked with all of these models.

I have figured out that a monorepo mainly covers for failures in maintaining separation between components (like ensuring dependencies only go in a single direction with no cycles, ensuring backwards compatibility, self-service, etc.) and, to some extent, for the out-of-control microservice craze.

It's kinda like giving everyone unrestricted access to your PROD databases: it does improve efficiency, but at the cost of additional risk, eroded separation between applications, a lack of APIs that let users serve themselves, etc.

Microservices especially tend to fare poorly if you don't solve the static costs of maintaining a service. So rather than invest in proper tooling, let's just plop it all into a single repository. It will cover some problems a little bit.

Now, I don't want to say monorepo is bad by itself. The problem is when it stunts people's ability to maintain proper APIs and development process.

That's the whole point of a monorepo, is it not? You get rid of all the ceremony around APIs and just say "Whoever modifies an API is responsible for updating all the API consumers as well as the API itself" (and the reverse too).

This means that developers are free to write code that is correct now rather than be limited by technical decisions made in the past, that may no longer be valid.

It also means that when you do update an API you are confronted with the technical reality of how the API consumers actually work. This may cause you to reevaluate and improve your mental model of the API.

If your API is siloed away in its own repo then you can easily get in a position of taking it in a direction that is not actually in line with what API consumers actually need.

Whenever you update the code of separate services, you still need to update all running instances at the same time. Being able to change the code in one commit does not mean the services also update at the same time, and that mismatch will cause API errors.

It depends on what kind of APIs we're talking about. There are APIs in the sense of protobuf microservices, which are deployed individually and talk to live systems, but there are also APIs in the sense of libraries.

In my company the former is handled by never deprecating any field (and then clients just get to deal with picking and choosing the ones they want). If a major schema change is required, spin up an entirely new service and tell people to migrate over and eventually turn off the old one when nobody's using it anymore.
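The "never remove a field" rule works because readers ignore fields they don't know about and default fields that are absent. A minimal Python sketch of that tolerant-reader behaviour, using plain dicts as a stand-in for protobuf messages; the `UserV1` type and its fields are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class UserV1:
    # The schema this (older) client was built against.
    id: int
    name: str = ""
    email: str = ""


def decode(cls, payload: dict):
    """Build cls from payload, dropping unknown keys and defaulting missing ones.

    Loosely mirrors how protobuf readers tolerate unknown fields. Fields
    without a default (like `id`) must still be present, which is why
    removing or repurposing fields is what actually breaks clients.
    """
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in payload.items() if k in known})


# A newer server added "avatar_url"; the old client simply never sees it.
newer_payload = {"id": 7, "name": "ada", "avatar_url": "https://example.com/a.png"}
user = decode(UserV1, newer_payload)
print(user)  # UserV1(id=7, name='ada', email='')
```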

Library changes can be done atomically: change the API, change the call sites, if tests pass you're done. One may opt to use the same strategy as w/ microservices here too.

Regardless, the dynamics of interacting w/ microservice API changes don't change based on whether you're on a monorepo or not. But a monorepo can help in the sense that some aspects of a service are version-controllable (namely the schema definition files) and it's in the clients' best interest to have those always up-to-date rather than having to remember to run some out-of-band syncing command all the time.

If you wanted to you could spin up new clusters of every changed service and direct traffic such that every version of a service is a separate cluster. Then slowly redirect external traffic to new services.

Every internal service would always hit the exact version it was compiled with, and you only need to worry about external API compatibility at that point.

For most use cases you can just get some scheduled downtime, though.

Common ownership of code doesn't work, though. People become experts in the things they work on a lot, and they refactor better for it because they have long-term plans they are working towards. Someone from outside making updates will make a mess of those long-term plans, not understanding where the code needs to go.

Maybe I missed some implied context, but I would still assume a CODEOWNERS model for the monorepo, wherein _someone_ is the expert/owner of any given folder of code and is brought into conversations when others are touching that code.

Yep. Bazel has a visibility feature that indicates what projects are allowed to consume a given thing. This is used by the owner of the library to indicate to what extent they are willing to support the library. Some libraries are only meant to be used by the immediate team, some are designed to be useful to other teams.
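In Bazel this is the `visibility` attribute on a target, written in Starlark BUILD files. The Python below is only a rough model of how the common label forms (`//visibility:public`, `//visibility:private`, `:__pkg__`, `:__subpackages__`) gate a consumer package; it deliberately ignores `package_group`, `default_visibility`, and other details, so treat it as an illustration rather than Bazel's actual rules.

```python
def is_visible(consumer_pkg: str, owner_pkg: str, visibility: list[str]) -> bool:
    """Roughly model whether a target in owner_pkg may be used from consumer_pkg.

    Packages are written like "//libs/auth". Only the common label forms are
    handled; package_group and default_visibility are not modelled.
    """
    if consumer_pkg == owner_pkg:
        return True  # targets are always visible within their own package
    for label in visibility:
        if label == "//visibility:public":
            return True
        if label == "//visibility:private":
            continue  # grants nothing beyond the owner's own package
        pkg, _, kind = label.partition(":")
        if kind == "__pkg__" and consumer_pkg == pkg:
            return True
        if kind == "__subpackages__":
            # Match the package itself or anything nested under it.
            prefix = pkg if pkg.endswith("/") else pkg + "/"
            if consumer_pkg == pkg or consumer_pkg.startswith(prefix):
                return True
    return False
```

A team-internal library might list only `//teams/payments:__subpackages__`, while a shared one would use `//visibility:public`.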

In the same vein, at my company we have a mechanism to specify package ownership, and the owners may opt to make themselves mandatory code reviewers for any incoming change.

IMHO, this is nicer than multi-repo because you get a lot more visibility into who's actually using what and you can enforce some level of accountability, which means you don't get into awkward situations where A made a breaking change, B uses A but never upgraded and now C is trying to deal with a newly discovered vulnerability affecting A and B.

Right, what you outline here is exactly what I have in mind when I think of monorepo. I actually always assume the CODEOWNER is a mandatory code reviewer in this model. The import rules are less a guarantee since they rely so much on tooling support.

The way we do it is a bit more complex to account for meatspace:

A project may have ownership data, and if it does it defaults to making the owner an optional reviewer. But if the project doesn't have ownership data, ownership bubbles up the folder tree to a folder that does have such information.

We organize projects such that increasing folder depth also increases ownership specificity (e.g. at the project level, the folder structure implies a specific team has ownership, one level up is their business vertical, one level up is the cost center category and so on, all the way up to root, which represents "everyone"). With this scheme, we can reassign ownership in situations like when the sole owner of a project leaves the company, or if teams get restructured. And by not making review mandatory, we are able to unblock cases like landing a high priority security patch while the reviewer of one of the affected packages is on vacation.
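The bubbling-up rule amounts to a nearest-ancestor lookup. A minimal sketch, assuming a hypothetical mapping from repo-relative folder paths to owner lists, with `""` standing for the root ("everyone"); this is not the commenter's actual tooling.

```python
from pathlib import PurePosixPath


def find_owners(path: str, owners: dict[str, list[str]]) -> list[str]:
    """Return the owners of path, walking up the folder tree until some are found.

    owners maps folder paths (e.g. "commerce/payments") to owner lists; the
    key "" is the repo root. Returns [] if nothing matches, even at the root.
    """
    current = PurePosixPath(path)
    while True:
        key = "" if str(current) in (".", "/") else str(current)
        if key in owners:
            return owners[key]
        if current == current.parent:
            return []  # reached the root without finding any owner
        current = current.parent


OWNERS = {
    "": ["everyone"],
    "commerce": ["commerce-vertical"],
    "commerce/payments/ledger": ["ledger-team"],
}
print(find_owners("commerce/payments/ledger/api.py", OWNERS))  # ['ledger-team']
print(find_owners("commerce/payments/refunds", OWNERS))        # ['commerce-vertical']
```

Reassigning ownership after a reorg is then a matter of editing one entry; every project below it picks up the change.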

There can be that (should be), but the difference is the code owner isn't making the change, instead the person changing the API is making it.

If there's a dispute then a monorepo makes things easier because all interested parties have a complete global view of the entire change.

In a multi-repo setup, one change becomes many commits and everybody sees only their piece of the jigsaw puzzle.

It does depend on everybody being adults and behaving sensibly; unfortunately there's no technology that can solve that problem.

Software repositories don't solve process problems in meatspace.

That is the point. People who claim any repository organization scheme is better are wrong because they can't touch on the real problems.

The failings of scaling source control should not be what enforces your dependency graph. Some other tool that actually understands the semantics of your code can do that. Source control should just support any workflow you throw at it.

And this is right. In an ideal world, the choice of source control would not influence how you design your application.

Unfortunately, the world is not ideal. When you have multiple applications with their own teams, APIs, release schedules, and repositories, some problems are easy to detect. Those same problems stop being naturally detectable when every developer has access to every repository and can introduce changes on both the client and the service side in sync.

One company I worked for used a monorepo. Developers would shortcut the development process and skip some practices by modifying both the client and the server at the same time, in sync, without caring about backwards compatibility.

Then there was a huge outage because the two changes required both the client and the service to be deployed at the same time. But you know, in a distributed system no two things happen at exactly the same time, so there was a short window where the services were mismatched and some broken data was saved. A day later that broken data completely destroyed an important production batch, at a large loss for the company.

And while it is definitely fun, convenient and efficient to be able to do just that, it requires also a little bit of care so that the whole system does not deteriorate over time.

How does forcing two commits to two repos help this example in any way? You still need the testing to catch backwards compatibility and that can be done in a monorepo. It can arguably be done better because you have a more direct link to the downstream code.
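One way to catch that kind of mismatch before deploy, whatever the repo layout, is a CI check comparing the old and new versions of a message schema. A minimal sketch, assuming a made-up schema representation (field name to type name and a required flag) rather than any real IDL; the rules are deliberately conservative.

```python
Schema = dict[str, tuple[str, bool]]  # field name -> (type name, required?)


def breaking_changes(old: Schema, new: Schema) -> list[str]:
    """Return reasons the new schema could break a peer still running the old one.

    An empty list means the change looks purely additive.
    """
    problems = []
    for name, (old_type, old_required) in old.items():
        if name not in new:
            problems.append(f"field '{name}' was removed")
            continue
        new_type, new_required = new[name]
        if new_type != old_type:
            problems.append(f"field '{name}' changed type {old_type} -> {new_type}")
        if new_required and not old_required:
            problems.append(f"field '{name}' became required")
    for name, (_, required) in new.items():
        if name not in old and required:
            problems.append(f"new field '{name}' is required")
    return problems


OLD = {"id": ("int", True), "amount": ("int", True)}
NEW = {"id": ("int", True), "amount": ("str", True), "currency": ("str", True)}
for reason in breaking_changes(OLD, NEW):
    print(reason)
```

In a monorepo the old schema can simply be read from the merge base, which is one concrete way the "more direct link to the downstream code" pays off.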

Every choice is a compromise with pros and cons. You make a choice and then figure out how to mitigate the cons of your choice.

I have a project, a programming environment/runtime, that contains a number of overlapping applications. I started with a small number of repos, including one main one. I had a number of frustrations with that. One of them is I didn't like having shared and unshared code from different applications in the same repo, for the sake of managing versioning. I separated into more repos, maybe 20. After I split it up I discovered a number of new issues, much like the OP here. One of which is complications from working with several repos at the same time, such as when refactoring.

I'll think about the next steps taken by the OP. Does anyone have any other practices they can recommend for managing these type of projects?

A monorepo that contains a directory per thing is the way forward IMO.

> I had a number of frustrations with that.

Could you elaborate? The OP had to split the monorepo due to https://getcomposer.org/ limitations (each package needs its own repo, it seems). What other issues have you found when working with a monorepo? Thanks!

For some background, I have a web version and two electron versions of the programming environment. Then there is a web runtime and a web server which run on the output of the programming environment. Originally the main repo contained code for all the applications, but also some code specific to one or more of the applications. I had trouble versioning a subset of the repository. I ended up versioning the main repo and bumping the version of all apps if anything in the main repo changed, because I didn't have a systematic way of knowing that only certain parts of the repo changed. Perhaps that is the piece I would need?

I said there were a number of frustrations, but that was the driver.
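The missing piece described here is roughly "which packages did this diff touch?". A minimal sketch, assuming each app or library lives under its own directory (the directory names are hypothetical); Lerna's `changed` command does something similar when deciding what to publish.

```python
import subprocess
from pathlib import PurePosixPath


def files_changed_since(ref: str) -> list[str]:
    """List paths changed between ref (e.g. the last release tag) and HEAD."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{ref}..HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def changed_packages(changed_files: list[str], package_roots: list[str]) -> set[str]:
    """Map changed files to the innermost package directory containing them.

    Files outside every package root are reported as "<root>", which a
    release script might treat as "bump everything".
    """
    # Check deeper roots first so nested packages win over their parents.
    roots = sorted(package_roots, key=lambda r: len(PurePosixPath(r).parts), reverse=True)
    hits = set()
    for f in changed_files:
        parts = PurePosixPath(f).parts
        owner = "<root>"
        for root in roots:
            root_parts = PurePosixPath(root).parts
            if parts[: len(root_parts)] == root_parts:
                owner = root
                break
        hits.add(owner)
    return hits


ROOTS = ["packages/runtime", "packages/server", "apps/electron"]
files = ["packages/runtime/src/eval.js", "packages/runtime/README.md", "docs/intro.md"]
print(sorted(changed_packages(files, ROOTS)))  # ['<root>', 'packages/runtime']
```

In practice `files` would come from `files_changed_since("v1.4.0")` or similar, and only the reported packages would get a version bump.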

In the new repo layout, each repo produces a single library/module and can be versioned as a whole. When I run an application for dev, I can run a given library from a release or from live code (with a dev server serving ES modules from a configured path). This works well when one of the repos is not changing. But I end up developing in a significant number of repos at the same time, so I just configure it to run from live code rather than from a release for all the libraries I'm working on. This feels like working with a monorepo, except I have the hassle of managing commits to all these repos. Long story short, right now I am working out of the master branch because I screwed up commits. You may have figured out by now that my strength is not in DevOps.

Thanks for sharing your experience! I can see how you were struggling trying to handle different "versions" in subsets of the repo. In my experience the "monorepo" workflow implies a "working on master/head" mindset, but this is not always the best approach depending on your specific needs.

Lerna lets you version things independently, and if you use something like Conventional Commits / commitizen it can auto-detect whether versions are major or minor.

When you make a version with lerna it auto-tags the commit with the version numbers of the components, so consumers can still depend on a specific version of a component.

I'm setting it up now at a new company and it's pretty amazing.
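Roughly how the Conventional Commits convention maps to a version bump: `fix` gives a patch, `feat` a minor, and a `!` after the type or a `BREAKING CHANGE:` footer a major. This is a simplified sketch for illustration, not Lerna's or commitizen's actual implementation.

```python
import re
from typing import Optional

# Conventional Commits header: type(optional scope)(optional !): description
HEADER = re.compile(r"^(?P<type>\w+)(?:\([^)]*\))?(?P<bang>!)?: ")
RANK = {None: 0, "patch": 1, "minor": 2, "major": 3}


def bump_level(messages: list[str]) -> Optional[str]:
    """Return "major", "minor", "patch", or None for a set of commit messages."""
    level = None
    for msg in messages:
        m = HEADER.match(msg)
        if m is None:
            continue  # not a conventional commit; contributes no bump
        if m.group("bang") or "BREAKING CHANGE:" in msg:
            candidate = "major"
        elif m.group("type") == "feat":
            candidate = "minor"
        elif m.group("type") == "fix":
            candidate = "patch"
        else:
            candidate = None  # e.g. docs:, chore:, refactor: without !
        if RANK[candidate] > RANK[level]:
            level = candidate
    return level


def next_version(version: str, level: Optional[str]) -> str:
    """Apply a bump level to a plain MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(x) for x in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return version
```

Run per package over the commits that touched it, this is what lets each component carry its own version while living in one repo.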

While composer does have this limitation in that packages are published by making new tags within the repo, frameworks like symfony and cakephp have workarounds where they have one monorepo where all packages are worked on, and then automation to push changes to read only repos of each component. So there's https://github.com/symfony/symfony pushing to https://github.com/symfony/event-dispatcher which gets published to packagist.

> Does anyone have any other practices they can recommend for managing these type of projects?

Honestly, the only way around these sorts of issues is to utilize automation in some form.

I've found that setting up package repositories (like devpi[0], Artifactory[1], or Docker Registry[2]) on a shared network location (for your project, this could be local if you work alone) and using CI/CD tools (like Jenkins[3]) is the key. The goal is that you end up working on one portion of the code base at a time, and each portion needs to go through the standard validation processes so that you can pull in the updated package version when you work on something downstream. Making sure that the CI/CD environment _doesn't_ have access to other packages' non-versioned code is key to making sure things actually work as expected.

For example, say you have FooLib, and you need an update in it for BarApp. Even if you branch FooLib 1.2.3 to 1.2.3-1-gabc1234d (the `git describe` of the commit) on `feat/new-thingy`, and BarApp v2.3.4-1-gaf901234 depends on that new branch, BarApp shouldn't be able to reference that branch in any way during the CI/CD build process. How do you get around this? Good development: finish the FooLib branch, get it working, merge it in with the updated version, and push the package (with the new version) to the CI/CD-accessible repository. At that point, when you push your BarApp change, it can actually build and not die. But until FooLib has got a versioned update, BarApp's branch _shouldn't_ be able to build.
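A minimal sketch of that gate as a CI step, assuming dependency pins are available as a simple name-to-version mapping and that the CI-visible repository can be queried for published versions. Both inputs are hypothetical stand-ins for whatever pip, Maven, or NuGet actually provides.

```python
import re

# Plain MAJOR.MINOR.PATCH; `git describe`-style builds like 1.2.3-1-gabc1234d fail.
RELEASE = re.compile(r"\d+\.\d+\.\d+")


def check_pins(requirements: dict[str, str], published: dict[str, set[str]]) -> list[str]:
    """Return errors for dependencies that CI must not build against.

    requirements maps package -> pinned version; published maps package ->
    versions that exist in the CI-accessible package repository.
    """
    errors = []
    for pkg, version in sorted(requirements.items()):
        if not RELEASE.fullmatch(version):
            errors.append(f"{pkg}=={version} is not a release version")
        elif version not in published.get(pkg, set()):
            errors.append(f"{pkg}=={version} has not been published")
    return errors


PUBLISHED = {"FooLib": {"1.2.3"}}
print(check_pins({"FooLib": "1.2.3-1-gabc1234d"}, PUBLISHED))
print(check_pins({"FooLib": "1.2.4"}, PUBLISHED))
print(check_pins({"FooLib": "1.2.3"}, PUBLISHED))
```

A non-empty list fails the build, so BarApp can only go green once FooLib's new version has actually been released.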

The objection "But I want to work on the changes locally, in parallel" is valid. That's what local development is for: giving you space to work on related things without impacting the upstream codebase. You should have the option to use FooLib's branch code in your BarApp code locally, and you can often do that via things like `pip install` or `mvn install` or whatever the relevant local install command is. At this point, the package probably still has the same version number, so the local build doesn't trigger issues. You can work on the two and tweak and twist as you want, but refrain from actually pushing BarApp referencing FooLib's branch until it's actually in the repo.

This all takes a great deal of restraint and patience. The goal here is to make it just a tad harder to introduce problems somewhere, since you can't depend on something that hasn't been given the go-ahead. While there might be a lot of "Updated FooLib requirement to v1.2.4" throughout your codebase, why are you doing that just off-hand? If you are doing it because of a security issue or bug, let that be known in the commit message. If you are doing it because you can use a new feature/whatever, your commit message won't be just "Updated FooLib"; you're likely writing "Added Feature X2Y, updated FooLib to 1.2.4".

PHP I try not to touch much, simply because I've always had bad experiences. I know for a fact that there are decent ways to do it with build tools like Maven[4], setuptools[5], and Docker[6]. Hell, I have used Docker as a way to introduce versioned dependency packaging, only needing to use Docker Registry (each dependent project does a multi-stage build, pulling in the dependencies via the versioned package images).


[0]: https://devpi.net/docs/devpi/devpi/latest/%2Bd/index.html

[1]: https://jfrog.com/artifactory/

[2]: https://docs.docker.com/registry/

[3]: https://www.jenkins.io/

[4]: https://maven.apache.org/

[5]: https://setuptools.readthedocs.io/en/latest/

[6]: https://www.docker.com/

> But until FooLib has got a versioned update, BarApp's branch _shouldn't_ be able to build.

This is such a horrible practice. You're creating mountains of extra work, and encouraging devs to delay integration testing, which is certain to lead to cycles of rework. It also only 'works' on toy features. When you're building a complex feature that requires a few weeks of work and a few devs, it quickly breaks down, and further prevents early QA testing of the new feature itself.

Unfortunately this practice is often forced on people by reliance on the horrid SemVer scheme, which only makes any kind of sense for 3rd party dependencies, but is foisted on internal dependencies as well by many idiotic package managers, like Go mod or NPM.

I find that this pain is a symptom of complecting. If you well and truly can't test your code to 80% or better confidence until another feature comes online, then maybe there is insufficient separation of concerns.

Typical CRUD stuff should be like at least 80% purely functional business logic (that is 100% testable without integration) and 20% or less IO code. If you really need that integration to find all the rough edges and work out the bugs, you probably have too much surface area in your IO "tainted" code.
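A minimal illustration of that split, often called "functional core, imperative shell": the pricing rule is a pure function testable with plain values, while the IO sits in a thin wrapper that takes its dependencies as arguments. The order/coupon domain is invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Order:
    subtotal_cents: int
    coupon: Optional[str] = None


def total_cents(order: Order, coupons: dict[str, int]) -> int:
    """Pure business rule: apply a percentage coupon; unknown coupons are ignored."""
    pct = coupons.get(order.coupon, 0) if order.coupon else 0
    return max(order.subtotal_cents - order.subtotal_cents * pct // 100, 0)


def checkout(
    order_id: int,
    load_order: Callable[[int], Order],
    load_coupons: Callable[[], dict[str, int]],
    save_total: Callable[[int, int], None],
) -> int:
    """Thin IO shell: fetch inputs, call the pure core, persist the result."""
    order = load_order(order_id)
    total = total_cents(order, load_coupons())
    save_total(order_id, total)
    return total
```

Only `checkout` needs integration testing against a real database; every pricing edge case can be covered by calling `total_cents` directly.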

Java-style-OOP really encourages this sort of thing by subconsciously compounding data state with functional methods. The whole "I needed a banana and you gave me a whole jungle" problem.

Integration vs component testing has almost nothing to do with functional vs state/IO-heavy work. Instead, it has everything to do with the amount of effort you spend specifying your components and writing test cases. If we're building a feature where componentA must call componentB with some data structure to achieve some goal, we can formally specify all valid inputs of componentB and their semantics, write tests for each combination, etc., and then have componentB religiously stick to the same; or we can agree in more informal terms on the expected input values and rely on integration testing to put them together and make sure we are achieving the right result for the expected inputs to componentA.

For some problems, the formal specification is tractable and even necessary. But for many complex problems, it is either not tractable (the input is too complex, and you would need something on the order of componentB itself just to specify the semantics) or it's just not worth it (componentB is only called by componentA).

I also want to note that I'm not talking about regular types when I say 'formally specifying the valid inputs and their semantics', though I'm sure dependent types could in principle achieve this. I'm talking about cases like components that communicate through script-like objects or configuration templates, etc.

If you have a CI system your version patch will always increase and you can then always integrate the latest dev version. Most of your developer issues will be someone not using the latest versions for everything.

How will that work with in development branches? At any one time, there are multiple sub-teams developing multiple independent sub-features all impacting some of the same components. How are they supposed to do this if they can't branch out the components and each work on their own independent branch of the integrated application? There isn't a single 'latest dev version', there are many.

In lerna you can restrict versioning to a particular branch. So you check out your feature branch, work until it's ready to share and then merge it back into master and create a version.

Creating a version tags the commit with the version number for each package that's been updated, and it allows for the creation of pre-release versions if you have things that aren't ready for prime time.

Consumers can depend on a particular git commit by referencing the tag.

So there is one main branch that contains all of the commits, but different components are versioned independently and reference particular commits in the branch.

Integrate often enough that nothing has diverged enough for that to be a problem. Short lived branches are good, long lived branches get into the problem you speak of.

We're back to things that only work for small changes. Large features that need days or weeks of work before they can be mainlined aren't a rare occurrence, they are the norm, and usually generate the most value for a product.

Not to mention, you often need to polish a release while developing large features for the next release - again cases where you need branches.

Of course, you can also try to take the feature flag model, and avoid refactoring entirely. Unlikely to be a good strategy for a long lived product.

I agree. Having a huge monorepo is basically throwing in the towel on your automation/package management/dependency validation.

In the .net world your CI/CD pipeline should continuously build and publish NuGet packages of your common code as you make changes. Since the old versions are obviously still available, other parts of the system are not forced to be updated to the new version of the dependency.

Thanks for those links! I will check that out.

I will say one problem I have is refactoring the interfaces of my modules, which is what I seem to spend a lot of time on, at least at this stage of the project. When I update the bottom ones, I pretty much have to update the others in parallel.

Yeah, that's understandable. Don't be afraid to have refactored changes on other repos locally, just be sure to do the package version updates first.

Whatever the problem, git submodules are rarely the solution.

Now that most guis support them, submodules actually work pretty well.

The only major issue I face with git novices is making sure everyone on the team sets their machines to pull recursively.

I saw this recently https://chromium.googlesource.com/chromiumos/platform2

" We moved from multiple separate repos in platform/ to a single repo in platform2/ for a number of reasons:

    - Make it easier to work across multiple projects simultaneously
    - Increase code re-use (via common libs) rather than duplicate utility functions multiple items over
    - Share the same build system"

As for "multi"-repo tooling (not sure what's really standard), Fuchsia uses something called jiri, and there is still gsync.

One thing to note is that Chromium and other Google projects are HUGE codebases; most codebases of people thinking about multi- or monorepos, microservices or monoliths, etc only WISH they had the scale of code and developers that companies like Google have.

All I'm saying is, beware of cargo cult thinking. Do something because you need it, it's practical, it's faster, not because someone else does it. I've had a few projects that were intentionally difficult because Someone decided to make it a microservices architecture, but in a monorepo to encourage the distributed monolith idea.

To be fair, Google, Facebook, et al. chose to have monorepos when they were startups, so it's not necessarily cargo culting to look at what they did. I'd argue part of the success was cultural: with everything in one repo, there generally aren't specific folders that are restricted in these large codebases, and that helps with a bunch of things.

Nice write-up. I have explored different repo strategies quite a bit myself in the course of a few efforts I've been involved with. On one, we originally had a monolithic framework, and everything the article said about the cons is pretty spot on. However, I'll qualify that by saying I think the problems come less from the nature of monoliths in general and more from a lack of experience with modular design.

We wrote a new framework from scratch using a monorepo approach, with separate packages via Lerna. The problem here was tooling. Dependent builds were not supported, and I've had to delete node_modules more times than I'd care to count. The article talks about some GitHub-specific problems (namely, the issues list being a hodge-podge of every disparate package). We tried ZenHub; it works OK, but it's a hack and it kinda shows. I've seen other projects organize things via tags. Ultimately it comes down to what the team is willing to put up with.

We eventually broke the monorepo out into multi-repos, and while that solved the problem of managing issues, now the problem was that publishing packages + cross-package dependencies meant that development was slower (especially with code reviews, blocking CI tests, etc).

Back to a monorepo using Rush.js (and later Bazel). Rush had similar limitations to Lerna (in particular, no support for dependent tests) and we ditched it soon afterwards. Bazel has a lot of features, but it takes some investment to get the most out of it. I wrote a tool to wrap over it[0] and set things up to meet our requirements.

We tried the "multi-monorepo" approach at one point (really, this is just git submodules), and didn't get very good results. The commands that you need to run are draconian and having to remember to sync things manually all the time is prone to errors. What's worse is that since you're dealing with physically separate repos, you're back to not having good ways to do atomic integration tests across package boundaries. To be fair, I've seen projects use the submodules approach[1] and it could work depending on how stable your APIs are, but for corporate requirements, where things are always in flux, it didn't work out well.

Which brings me to another effort I was involved with more recently: moving all our multi-repo services into a monorepo. The main rationale here is somewhat related to another reason submodules don't really fly: there's a ton of packages being used, a lot of stakeholders with various degrees of commit frequency, and reconciling security updates with version drift is a b*tch.

For this effort we also invested in Bazel. One of the strengths of this tool is how you can specify dependent tasks, for example "if I touch X file, only run the tests that are relevant". This is a big deal, because at 600+ packages, a full CI run consumes dozens of hours' worth of compute time and we see several dozen commits a day. The problem with monorepos comes largely from the sheer scale: bumping something to the next major version requires codemods, and there's always someone doing some crazy thing you never anticipated.
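"Only run the relevant tests" boils down to computing the reverse-dependency closure of what changed. Bazel can answer this itself with `bazel query 'rdeps(//..., <target>)'`; the Python below only models the idea over a hypothetical, hand-written dependency map.

```python
from collections import deque


def affected(changed: set[str], deps: dict[str, set[str]]) -> set[str]:
    """Return every target that transitively depends on a changed target.

    deps maps target -> targets it depends on. We invert it once, then walk
    the reverse edges breadth-first starting from the changed set.
    """
    rdeps: dict[str, set[str]] = {}
    for target, ds in deps.items():
        for d in ds:
            rdeps.setdefault(d, set()).add(target)
    seen = set(changed)
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        for parent in rdeps.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


DEPS = {
    "//app:web": {"//lib:ui", "//lib:auth"},
    "//app:admin": {"//lib:auth"},
    "//lib:ui": {"//lib:tokens"},
    "//lib:auth": set(),
    "//lib:tokens": set(),
}
print(sorted(affected({"//lib:tokens"}, DEPS)))
```

CI would then run only the test targets inside the returned set, instead of the full suite.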

With that said, monorepos are not a panacea. A project from a sibling team is a components library and it uses a single repo approach. This means a single version to manage for the entire set of components. You may object that things are getting bumped even when they don't need to, but it turns out this is actually very well received by consumers, because it's far easier to upgrade than having to figure out the changelog of dozens of separate packages.

I used a single-repo monolith-but-actually-modular setup for my OSS project[2] and that has worked well for me, for similar reasons: people appreciate curation, and since we want to avoid willy-nilly breaking changes, a single all-encompassing version scheme encourages development to work towards stability rather than features for features' sake.

My takeaway is that multi-repos cause a lot of headaches both for framework authorship and for service development, that single repos can be a great poor man's choice for framework authors, and that monorepos, with the appropriate amount of investment in tooling, have good multiplicative potential for complex project clusters. YMMV.

[0] https://github.com/uber-web/jazelle

[1] https://github.com/sebbekarlsson/fjb/tree/master/external

[2] https://mithril.js.org/

It looks like the last step is missing: after discovering all the disadvantages of submodules (they are in fact quite dangerous when working in a team), the author should have switched to Git X-Modules[0] and used the same multi-monorepo approach without the extra hassle :-)

[0] https://gitmodules.com

Maybe it's time to wonder if it's not how you arrange the files?

Monorepos have nothing to do with arranging files. It's all about making atomic commits to multiple projects simultaneously.

I'd argue that it's not. The atomic commit thing is cited because it's about the only argument in favor of monorepos that has technical merit, so a smart debater will latch onto that argument knowing it is hard to defeat. IMHO it is all about inertia and laziness: an organization begins with little code and one repo. They grow. They write more code. They acquire more code. Nobody is interested in stepping forward to do the hard work of managing all this code. So the repo gets bigger and bigger, with more and more discrete projects inside it, until the situation becomes largely unworkable (e.g. cloning the tree takes 20 min, or CI builds of different projects don't work because the tools used can't deal with multiple projects in the same repo). Then people justify their mess with arguments about sharing code and atomic commits. I'd also argue it is to some extent about a lack of knowledge: people may just not be familiar with modularity mechanisms that would solve their problems, because they've always depended on code being in the same repo.

The atomic commit thing is just a means to some ends. Some ends are indeed easier to achieve with monorepos. Security audits/patching is one example I've dealt with in the past (dealing w/ supply chain vulns in a multi-repo world is a goddamn nightmare). Cross-package integration testing is another example.

The thing about projects getting bigger is orthogonal to whether people pick monorepos or not. Many project feature sets are simply complex in a way that you can't refactor your way out of. When I hear this argument that projects could somehow be made smaller, typically it's from someone who's never had to deal with the regressions of such a decision or someone who's never had to be accountable for their estimates. I've seen firsthand projects that got rewritten with the benefit of hindsight, proper staffing, management blessing and all the jazz, and still struggle to meet feature parity of their older counterparts. It's clearly not a question of being interested in putting in the work, or even having the budget for it.

From what I've seen, messiness is largely a function of experience. A lot of people simply don't have experience in writing libraries and/or architecting systems. I find that once they acquire the skills, quality of encapsulation improves a lot. FWIW, I think monorepos help with that transition because it provides that fast feedback loop of single repos while a developer is learning the ropes of how to librarize, instead of getting bogged down waiting on slow code review/CI/publish feedback loops.

People with monorepos just want a versioned filesystem. They don't care about splitting and separating stuff.

Should managing all this code be hard work? I don't blame people who want to skip that.

The core problem is that code is always changing, every single day from Monday to Friday. And it gets worse when other code/projects depend on it.

How far does this have to go before people realize how ridiculous it is? The multi-mono-multi-monorepo stage?

You don't need a monorepo. It's an anti-pattern that people resort to when their code is a tangled, tightly coupled mess.

If the modules are loosely coupled and highly cohesive, multiple repos is the ideal approach.

The solution to the problem of having to 'constantly update dependencies' is not to bring them all into a single monorepo. The solution is to ensure that these dependencies handle separate concerns and have simple interfaces which allow them to be loosely coupled with the main project logic in such a way that they can be updated independently of each other.
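A minimal sketch of what "simple interfaces which allow them to be loosely coupled" can look like, in Python. All names here (`Notifier`, `InMemoryNotifier`, `notify_overdue`) are hypothetical and chosen only for illustration: the main logic depends on a one-method protocol, so any implementation living in another repo can change internally without the caller changing, as long as it keeps that signature.

```python
from typing import Protocol


class Notifier(Protocol):
    """Hypothetical interface: the only surface the main project sees."""

    def send(self, recipient: str, message: str) -> bool: ...


class InMemoryNotifier:
    """One possible implementation. It could live in its own repo and be
    versioned independently as long as send() keeps the same signature."""

    def __init__(self) -> None:
        self.outbox: list[tuple[str, str]] = []

    def send(self, recipient: str, message: str) -> bool:
        self.outbox.append((recipient, message))
        return True


def notify_overdue(notifier: Notifier, users: list[str]) -> int:
    """Main project logic: depends only on the Notifier protocol, never on a
    concrete implementation or its internals. Returns how many sends succeeded."""
    sent = 0
    for user in users:
        if notifier.send(user, "Your invoice is overdue."):
            sent += 1
    return sent
```

The catch raised further down the thread still applies: this only buys independence while `send()` itself doesn't need to change.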

If different module dependencies often need to be updated together whenever you add a new feature or fix a bug, it almost certainly means that you have a problem with coupling and/or cohesion. You don't need tools that make it easier to work with a tangled mess. The correct solution is to untangle the mess. Otherwise the mess will keep getting worse.

Tightly coupled code is, and should be, the norm. Loose coupling is a pipe dream as soon as domain-specific data structures are involved.

Loose coupling over domain-specific data structures (and their related procedures) just means that errors that should be caught by a run-of-the-mill type checker are now found in production.

An example of this pattern can be seen pretty much whenever you open the developer console in a non-Chrome browser on any site using some JavaScript framework. Downloading some code from the website and executing it in the browser is an extreme form of loose coupling, and in my experience it is pretty much never free of errors, except on the very specific combinations of browser/platform that the developer could test.

This line of thinking goes against not only everything that the original inventors of programming languages ever taught, but also almost 20 years of my own professional experience... And I've written and worked on many highly maintainable projects which had loose coupling (including in some very complex business domains).

In fact, they're the only 2 rules that have consistently delivered value across all the languages that I've tried and for all different kinds of system that I've built. Closest thing possible to a silver bullet.

I find this kind of mindset, that x is not possible, to be defeatist.

Instead of acknowledging that programming is difficult and can take many decades to master, people prefer to pretend that programming is easy but unavoidably messy. People always try to come up with narratives which make it easier to accept mediocrity rather than work hard to keep improving themselves.

You can also see the difficulty of loose coupling when people transition from a monolith to microservices. What used to be a function call is now a network request that can fail in all sorts of interesting ways.
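To illustrate the point about a function call turning into a network request, here is a minimal Python sketch of the extra handling callers suddenly need: retrying transient failures with exponential backoff. `FlakyService` and `get_price` are hypothetical stand-ins for a remote service, and `ConnectionError`/`TimeoutError` are assumed to be the "transient" errors. Retrying like this is only safe when the remote operation is idempotent.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.1) -> T:
    """Call fn, retrying on network-style errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            time.sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, 0.4s, ...
    raise RuntimeError("unreachable")  # keeps type checkers happy


class FlakyService:
    """Hypothetical remote service that fails its first `failures` calls."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def get_price(self) -> int:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("simulated network blip")
        return 42
```

In-process, `svc.get_price()` would just return; over the network, the caller has to decide how many retries, how long to wait, and what to do when all of them fail.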

You can have the procedural/functional code tightly coupled only to the data structures, with loose coupling between the other bits.

  > The solution is to ensure that these dependencies handle
  > separate concerns and have simple interfaces which allow
  > them to be loosely coupled with the main project logic
  > in such a way that they can be updated independently of
  > each other.

If your requirements ever change, you will need to make changes to these simple interfaces. At this point you will have to make changes to multiple repositories. Some changes can be done without breaking changes, but sometimes a change must be a breaking change.

A 'single' monorepo makes your problem easily solvable using automation and code. You can use a multirepo if you prefer to solve problems using human coordination.

I've worked on several projects/codebases which could handle requirement changes whilst rarely requiring changes to the interfaces of dependencies. It's not easy to design a system this way, but it's possible and it's a worthy goal to aim for. Making it out as if it's simply not possible is not only mediocre, it's untrue.

It would be mediocre if I suggested that all changes are breaking changes and didn't know about additive changes, deprecations, or (kelvin) versioning.
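For readers unfamiliar with the terms, here is a minimal Python sketch of an additive change plus a deprecation; the function names `get_user` and `fetch_user` are hypothetical. The new keyword argument has a default, so existing callers keep working, and the old name survives as a thin alias that emits a `DeprecationWarning` so consumers in other repos can migrate on their own schedule before it is removed.

```python
import warnings


def get_user(user_id: int, include_email: bool = False) -> dict:
    """New API. `include_email` defaults to False, so adding it is an
    additive, non-breaking change for existing callers."""
    user = {"id": user_id, "name": f"user{user_id}"}
    if include_email:
        user["email"] = f"user{user_id}@example.com"
    return user


def fetch_user(user_id: int) -> dict:
    """Old name kept as a deprecated alias; removing it later is the
    actual breaking change, announced well in advance."""
    warnings.warn(
        "fetch_user() is deprecated; use get_user() instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not this wrapper
    )
    return get_user(user_id)
```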

If an API turns out to be insecure and has to be removed, or a mitigation has to be put in place that changes its behaviour, that can be an unavoidable breaking change.

Either way, my point wasn't that "breaking changes always have to be made" but "avoiding making breaking changes because everything has to be deployed granularly slows you down and so does needing to make the same change in N places".

I think I could not disagree more. Coupling and cohesion are problems you will have, regardless of the repo approach you take. But a mono-repo gives you a host of incredibly beneficial advantages, especially for larger organisations.

Any downstream library team can instantly tell if their change has broken any upstream users because a good CI system will just automatically run all tests of all relevant projects. That way a library developer can either iterate until all projects depending on it are green, or the library developer can even proactively change all dependant projects and roll all fixes out in a single atomic commit.

This has proven incredibly beneficial to our development speed and the number of avoidable code conflicts that crop up.
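The "run all tests of all relevant projects" step usually comes down to a reverse-dependency walk: start from the changed packages and collect everything that depends on them, directly or transitively. A minimal Python sketch, assuming the dependency graph is available as a plain dict; the package names are hypothetical, and real monorepo build tools compute this from their own build metadata.

```python
from collections import defaultdict, deque


def affected_projects(deps: dict[str, list[str]], changed: set[str]) -> set[str]:
    """Return the changed projects plus every project that depends on them,
    directly or transitively. These are the ones whose tests CI should run."""
    # Invert the graph: for each project, which projects depend on it?
    dependents: dict[str, set[str]] = defaultdict(set)
    for project, requirements in deps.items():
        for req in requirements:
            dependents[req].add(project)

    # Breadth-first walk over the inverted edges.
    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for consumer in dependents[current]:
            if consumer not in affected:
                affected.add(consumer)
                queue.append(consumer)
    return affected


# Hypothetical graph: project -> the projects it depends on.
GRAPH = {
    "app-web": ["ui-kit", "auth"],
    "app-admin": ["ui-kit"],
    "ui-kit": ["core"],
    "auth": ["core"],
    "core": [],
    "billing": [],
}
```

With this graph, a change to `auth` only re-tests `auth` and `app-web`, while a change to `core` re-tests everything except `billing`.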

> Any downstream library team can instantly tell if their change has broken any upstream users because a good CI system will just automatically run all tests of all relevant projects. That way a library developer can either iterate until all projects depending on it are green, or the library developer can even proactively change all dependant projects and roll all fixes out in a single atomic commit.

Any proper (recent) package manager will do the same with a good CI without having any of the mono-repo disadvantages (duplicated libraries, diamond dependency mess).
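For context, the "diamond dependency mess" is when two libraries you depend on pin different versions of a shared third library. A minimal Python sketch that detects that situation from a hypothetical map of each package's pinned dependency versions; package names and versions are made up. A resolver then has to pick one version, fail, or (in some package managers) keep several versions installed side by side.

```python
from collections import defaultdict


def find_diamond_conflicts(pins: dict[str, dict[str, str]]) -> dict[str, dict[str, set[str]]]:
    """Given {package: {dependency: pinned_version}}, report every dependency
    that is pinned to more than one version, and by whom."""
    seen: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for consumer, requirements in pins.items():
        for dep, version in requirements.items():
            seen[dep][version].add(consumer)
    return {dep: dict(versions) for dep, versions in seen.items() if len(versions) > 1}


# Hypothetical diamond: app -> lib-a, lib-b; both depend on json-utils.
PINS = {
    "app": {"lib-a": "1.0", "lib-b": "3.1"},
    "lib-a": {"json-utils": "1.2"},
    "lib-b": {"json-utils": "2.0"},
}
```

Here `find_diamond_conflicts(PINS)` reports `json-utils` pinned at 1.2 by `lib-a` and 2.0 by `lib-b`.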

The only real advantage of a monorepo is that simultaneously editing multiple software components with API-breaking changes is made easier, much easier.

Please share the package managers that handle this, thanks!

Nix, Guix or Spack do that correctly.

The problem is not when you need to change the implementation of a module, it's when you need to move the boundaries between them.

What you are describing is a reasonable approach if you can get the module boundaries right in advance. For more open ended software problems, that is impossible - you need the freedom to rearchitect and change responsibilities. That's the situation where a monorepo is much simpler than anything else.

To me, the issue is more that Git is terrible for some enterprise use cases, yet people try to force a square peg into a round hole.

Git isn't the best tool for THAT job. I've been saying the same thing for 10 years now; nobody wants to listen because a lot of developers only know Git.

I see this thinking frequently, and it really is a fundamental issue: how differently we design software.

Everyone starts with an idea of how the software should turn out. It's a vision on which there is often remarkable agreement among people: software should be modular, with independent parts encapsulated in some way, and easy to change over time.

Then some people start from this vision by focusing on the ideals. Since stateless software is so much easier to reason about, let's build our architecture on the assumption that it should be stateless. The same goes for other ideals such as freedom from side effects and idempotency. Any parts that deviate from this vision are dirty little special cases and can be treated as such.

But software without state is useless. State, and side effects, are the whole reason for the software to exist.
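A small Python sketch of the distinction, with hypothetical names (`apply_discount`, `Ledger`, `request_id`). A pure function has no state at all. An idempotent operation still changes state, which is the whole point, but it guarantees that repeating the same request does not apply the effect twice.

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Pure: same inputs always give the same output, no side effects."""
    return price_cents * (100 - percent) // 100


class Ledger:
    """Hypothetical stateful component. The balance change is the side
    effect that justifies its existence; remembering which request ids were
    already applied makes a retried request a no-op (idempotent)."""

    def __init__(self) -> None:
        self.balance = 0
        self._applied: set[str] = set()

    def deposit(self, request_id: str, amount: int) -> int:
        if request_id not in self._applied:
            self._applied.add(request_id)
            self.balance += amount
        return self.balance
```

Note that even the idempotent version needs more state (`_applied`), which is the kind of design work that doesn't go away by calling it a special case.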

Other people take the second approach and start with the state and side effects: how these are represented and stored, and how to allow for change over time. Then the rest of the software, the easy parts if you will, is sketched out afterwards to accommodate this design. These tend to be the same people who start by thinking in data structures and who think the design of data is more important than the design of code.

Just as an example, sometimes I see people with a monstrous Kubernetes-style architecture for a web app and then all state shoved into a Postgres in the back without even a thought. Well, in reality that's your whole application right there, in the back. There are a million ways to start stateless web workers, all perfectly fine; that's not where your energy should be spent.

Maybe the above is a simplification. I know for certain that I am in the latter camp. But over the years where I have found myself in disagreement over architecture, it is often with people I have later come to see as in the first camp. And this keeps coming back again and again, on all levels of software architecture.

In the JavaScript ecosystem, most of the challenges with repos are not the code but the tooling. SOLID principles are not going to solve that.

In our monorepo, few changes are cross-module. But it's a grand day when we reduce the total package count by 20 by finally getting rid of some old project dependencies and can use a single test runner config and version across packages. I keep my sanity that way.

Meanwhile, our non-JS repos haven’t required as much attention in this area.

What if you want to publish some of your modules and keep the rest private but depending on the published ones?
