
The real problem with monorepos is that most of the benefits vanish as the scale increases, unless you invest in building more monorepo tooling.

In your particular case, if your library becomes too popular, your one- or two-line implementation-detail changes ripple out and trigger rebuilds of too many downstreams, many of which will have flaky tests and fail your MR. If most users are not actually depending on that functionality, or if you are simply doing semver properly, then you could avoid rebuilding those downstreams. Eventually the builds take too much time and CI rejects the pipelines, or you continuously bump the build timeout but wait longer and longer for your changes to go live. You can solve these problems, if you invest in more monorepo tooling.
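To make "more monorepo tooling" concrete, here is a minimal sketch of the affected-targets computation such tooling usually starts with, assuming a hand-rolled dependency graph rather than any particular build system (the target names are invented): given the targets a change touches, walk the reverse edges to find every downstream that has to be rebuilt and retested.

    # Sketch of "which downstreams does this change trigger?" over a toy
    # dependency graph. The graph, target names, and API are illustrative,
    # not taken from any real build system.
    from collections import deque

    # target -> targets it depends on
    DEPS = {
        "app/checkout": ["lib/strings", "lib/payments"],
        "app/search":   ["lib/strings"],
        "lib/payments": ["lib/strings"],
        "lib/strings":  [],
    }

    def reverse_deps(deps):
        rdeps = {target: set() for target in deps}
        for target, its_deps in deps.items():
            for dep in its_deps:
                rdeps[dep].add(target)
        return rdeps

    def affected_targets(changed, deps):
        """Everything that transitively depends on a changed target."""
        rdeps = reverse_deps(deps)
        affected, queue = set(changed), deque(changed)
        while queue:
            for downstream in rdeps[queue.popleft()]:
                if downstream not in affected:
                    affected.add(downstream)
                    queue.append(downstream)
        return affected

    # A one-line change to lib/strings rebuilds and retests everything above it.
    print(affected_targets({"lib/strings"}, DEPS))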

Similarly, once your library in a monorepo becomes too popular, you will never do any atomic breaking API changes, since that would require updating too many downstreams. Instead you will fake-version it: add a new API, migrate users to it over multiple commits, then delete the old version. Some of these migrations run out of steam midway through the biggest phase: phase 2. This approach does have the benefit of forcing the upstream author to make the two versions of the API coexist.
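As an illustration of that coexistence step, a hedged sketch (the function names and deprecation mechanism are hypothetical, not from any real codebase): the old entry point becomes a thin shim over the new one while callers migrate over many commits, and only when the last caller has moved does the shim get deleted.

    # Illustrative "fake versioning" shim: both APIs coexist during the
    # migration. Names and behavior are made up for the example.
    import warnings

    def render_report_v2(data, *, currency="USD"):
        """The new API the maintainers actually want."""
        return f"{sum(data):.2f} {currency}"

    def render_report(data):
        """Old API, kept alive until the last downstream migrates (phase 2...)."""
        warnings.warn(
            "render_report is deprecated; use render_report_v2",
            DeprecationWarning,
            stacklevel=2,
        )
        return render_report_v2(data)  # delegate so the two can't drift apart

    print(render_report([1, 2, 3]))      # legacy caller, still works
    print(render_report_v2([1, 2, 3]))   # migrated caller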

Of course, I am talking about scales where you start hitting real limits. When your codebase is larger than your RAM, an index won't fit in memory anymore and every code search requires disk or network I/O. Eventually your repository doesn't fit on disk anymore, and you interact with it through a special IDE that only downloads the things you start editing, or you perform sparse checkouts in the first place, so discoverability is again a problem.

Edit: of course some problems crop up sooner than the hard limits are reached, like the flaky test issue I mentioned, as well as visibility and control over changes for the actual maintainers.




> Similarly, once your library in a monorepo becomes too popular, you will never do any atomic breaking API changes, since that would require updating too many downstreams.

This happens no matter the repo type. It's even worse if a project chooses to update after a while: it's far more painful to make the changes after losing the context you had when you made the original changes.

If you want a monorepo, libraries being on the same version is a feature, and it keeps you from diverging.


This doesn't happen in a polyrepo, because you can just do it. You release version 2.0.0 of something and downstreams update at their own pace. Diverging, as you call it.

But this isn't a problem. If 1.0.0 is a finished product then why do you ever need to move to 2.0.0 if you don't need the new features?

The issue in the monorepo is that, if your library is too popular, the change must happen all at once or with copying (fake versioning, like people who version Excel files by suffixing them with dates), which places pressure on maintainers not to fix design mistakes.

It isn't a feature of the monorepo, because you can still diverge by copying, forking, or merely stopping support for the old library, and this becomes more and more necessary at scale: you lose the feature you thought you wanted the monorepo for.


> But this isn't a problem. If 1.0.0 is a finished product then why do you ever need to move to 2.0.0 if you don't need the new features?

Fair point. I assume (from personal experience at the places I've worked) that updating the library is inevitable, and doing so at a later date tends to be more painful than doing these migrations all at once.

> It isn't a feature of the monorepo, because you can still diverge by copying, forking, or merely stopping support for the old library, and this becomes more and more necessary at scale

This is a problem if the projects in the monorepo are not actually related. But imagine that all these subprojects are bundled as one OS image; in that case it is very rare that you want multiple library versions.

At very large scale I can see your point; I don't have experience there, so I can't really argue.


I think in a centralized environment (workplace), it could be argued that immediately triggering all the build failures and having good hygiene in cleaning them up is actually not a bad thing. It really depends on how that's set up.

And how is sparse checkout worse for discoverability? With multiple repos it's sometimes even harder to find what you want, especially if you are talking about hundreds of random repos that aren't organized well.


> I think in a centralized environment (workplace), it could be argued that immediately triggering all the build failures and having good hygiene in cleaning them up is actually not a bad thing.

In the abstract I agree. However, when I'm trying to get my code working, having test failures in code that isn't even related to the problem I'm working on is annoying, and I can't switch tasks to work on this new failure when my current code isn't working either.


How could broken code (or broken tests) be merged into master? That is a rhetorical question: of course it happens, and of course this is the root issue you would be facing.


There are multiple ways that code gets merged into master and ends up broken.

First, the case where everyone does everything correctly. CI executions do not run serially, because when enough people are producing code you need them to run at the same time. So you have two merge requests, A and B, opened around the same time, each branched from a commit C and neither seeing the other. Say merge request A deletes a function (or class, or whatever) that merge request B uses. A deleted every use of that function it could see, but could not delete the use added by B, since it was not visible. A + C passes all CI checks and merges. B + C passes all CI checks and merges. A + B + C won't compile, since B is using a function deleted by A. If you are lucky they touch the same files, B hits a merge conflict, and the rebase picks it up; otherwise, broken master.
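A toy model of that race, with entirely made-up files and symbols, just to make the A/B/C interaction concrete: each merge request is a diff against commit C, "CI" here only checks that every referenced symbol is still defined, and only the combined result is broken.

    # Toy model of the race: a "snapshot" is file -> {defines, references},
    # and "CI" checks that every referenced symbol is still defined.
    # All file and symbol names are invented for illustration.

    def merged(base, *diffs):
        snapshot = dict(base)
        for diff in diffs:
            snapshot.update(diff)
        return snapshot

    def ci_passes(snapshot):
        defined = set().union(*(f["defines"] for f in snapshot.values()))
        referenced = set().union(*(f["references"] for f in snapshot.values()))
        return referenced <= defined

    C = {  # the commit both merge requests branched from
        "lib.py": {"defines": {"helper"}, "references": set()},
        "app.py": {"defines": set(), "references": {"helper"}},
    }
    # MR A deletes helper() and every use of it visible at commit C.
    diff_A = {
        "lib.py": {"defines": set(), "references": set()},
        "app.py": {"defines": set(), "references": set()},
    }
    # MR B, written against C, adds a brand-new caller of helper().
    diff_B = {
        "new_feature.py": {"defines": set(), "references": {"helper"}},
    }

    print(ci_passes(merged(C, diff_A)))          # True: A + C is green, merges
    print(ci_passes(merged(C, diff_B)))          # True: B + C is green, merges
    print(ci_passes(merged(C, diff_A, diff_B)))  # False: master is now broken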

Then you will typically have emergency commits to hotfix issues which might break other things.

Then you will have hidden runtime dependencies that, being hidden, won't trigger retests before merge, but every subsequent change to that repo will fail.

Then you will have certificates that expire and dependencies on external systems that go away.


As you may be aware, 100% broken code cannot be merged. However, code that works 99.99% of the time can be merged, and then weeks later it fails once, but you rebuild and it passes. There are a lot of different ways this can happen.
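Back-of-the-envelope arithmetic for why 99.99% is not good enough at scale (the test and pipeline counts below are made up, and failures are assumed independent): with enough tests per pipeline and enough pipelines per week, "fails once in ten thousand runs" becomes a routine event.

    # Chance that at least one 99.99%-reliable test fails somewhere, assuming
    # independent failures. Test and pipeline counts are illustrative.
    p_fail = 1e-4             # each test flakes 0.01% of the time
    tests_per_pipeline = 2000
    pipelines_per_week = 500

    runs = tests_per_pipeline * pipelines_per_week
    p_at_least_one_failure = 1 - (1 - p_fail) ** runs
    print(f"{p_at_least_one_failure:.10f}")  # effectively 1.0
    print(runs * p_fail)                     # ~100 expected spurious failures per week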


Things become difficult at scale regardless of mono- or multirepo. If you lean heavily into splitting things into a lot of repositories, you also have to build dedicated tooling in order to align and propagate changes across them.


Sure, but polyrepos don't break with scale in the same way monorepos do. You only need additional tooling when you are trying to coordinate homogeneity at a scale larger than your manual capability. Autonomous services don't typically need the kind of coupling without cohesion that people naturally find necessary in a monorepo, and you can build cooperative, coexisting products without that kind of coupling.

When I read the white papers by Google or Uber on their monorepos, or when I see what my company is building, it is just a custom VCS. Everything that was thrown away initially gets rebuilt over time. A way to identify subprojects/subrepositories. A way to check out or index a limited number of subprojects/subrepositories. A way to define ownership over that subproject/subrepository. A way for that subproject/subrepository to define its own CI. A way to only build and deploy a subproject/subrepository. Custom build systems. Custom IDEs.
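For one of those pieces, the ownership one, a minimal sketch under invented paths and team names: map directory prefixes to owning teams and resolve a changed file by its longest matching prefix, which is roughly what CODEOWNERS-style tooling does.

    # Sketch of subproject ownership resolution by longest matching path prefix.
    # The directory layout and team names are invented for illustration.
    OWNERS = {
        "payments/":       "team-payments",
        "payments/fraud/": "team-risk",
        "search/":         "team-search",
    }

    def owner_of(path):
        matches = [prefix for prefix in OWNERS if path.startswith(prefix)]
        return OWNERS[max(matches, key=len)] if matches else "unowned"

    print(owner_of("payments/fraud/rules.py"))  # team-risk
    print(owner_of("payments/api.py"))          # team-payments
    print(owner_of("docs/readme.md"))           # unowned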

The entirety of code on the planet is a polyrepo, and we don't have the problems dealing with that scale that we would have if we stuffed it all into one repo, as this Debian monorepo shows. Independence of lifecycle is important, and as a monorepo scales up, people rediscover that importance bit by bit.



