
I think this article is complete horseshit. A monorepo will serve you 99% of the time until you hit a certain level of scale when you get to worry about whether a monorepo or a polyrepo is actually material. Most cases are never going to get there. Before that point, a polyrepo is purely a distraction and makes synchronous deployment really painful. We had to migrate a polyrepo to a monorepo and it was not fun because it was a migration that should have never had to be done in the first place. Articles like this are fundamentally irresponsible.



I work on CI/CD systems, and that’s one thing that definitely gets harder in a monorepo.

So you made a commit. What artifacts change as a result? What do you need to rebuild, retest, and redeploy? It doesn’t take a large amount of scale to make rebuilding and retesting everything impossible. In a polyrepo world, the repository is generally the unit of building and deployment. In a monorepo it gets messier.

For instance, one perceived benefit of a monorepo is it removes the need for explicit versioning between libraries and the code that uses them, since they’re all versioned together.

But now, if someone changes the library, you need a way to find all of its usages and retest those to make sure the change didn’t break their use. So there’s a dependency tree of components somewhere that needs to be established, but now it’s not explicit, and no one is given the option to pin to a particular version if they can’t/won’t update. This is the world of Google, and it influenced the (lack of) dependency management in Go.

You could very well publish everything independently, using semver, and put build descriptors inside each project subdirectory, but then, congratulations, you just invented the polyrepo, or an approximation thereof.


> So you made a commit. What artifacts change as a result? What do you need to rebuild, retest, and redeploy?

If you're using Git, then typically for each push to the remote repository you get a notification with this data in it:

  BRANCH        # the remote branch getting updated
  OLD_COMMIT    # the commit the branch ref was pointing to before the push
  NEW_COMMIT    # the commit the branch ref was pointing to after the push

  # To get the list of files that changed in the push:
  git diff --name-only "$OLD_COMMIT" "$NEW_COMMIT"
Once you know which files changed in a push you can figure out which artifacts you need to build. Right now you'll have to write that tooling yourself since I don't know of any off-the-shelf tools that do it. In my company's case, we have "project.yml" files scattered through the repo telling us which directories have buildable artifacts and what branches each one needs to be built for. The tooling to support this is a few hundred lines of Bash and Python. In our case we're still small enough that we can brute force some stuff, but we can easily improve the tooling as we go along.
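As a rough sketch of that kind of tooling (hypothetical: the function names and the convention that a directory with a project.yml is a buildable project are illustrative, not the poster's actual scripts), mapping changed files back to their owning projects might look like:

```python
import os
import subprocess

def changed_files(old_commit, new_commit):
    """Files touched between two commits, via `git diff --name-only`."""
    out = subprocess.run(
        ["git", "diff", "--name-only", old_commit, new_commit],
        capture_output=True, text=True, check=True,
    )
    return [line for line in out.stdout.splitlines() if line]

def owning_project(path, is_project_dir):
    """Walk up from a changed file to the nearest project directory."""
    d = os.path.dirname(path)
    while d:
        if is_project_dir(d):
            return d
        d = os.path.dirname(d)
    return None  # file is not part of any buildable project

def projects_for_files(files, is_project_dir):
    """Deduplicated, sorted list of projects owning the changed files."""
    owners = {owning_project(f, is_project_dir) for f in files}
    return sorted(owners - {None})
```

In real use, `is_project_dir` would be something like `lambda d: os.path.exists(os.path.join(d, "project.yml"))`; passing it in as a predicate just keeps the sketch testable without a checkout on disk.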


This is something I've been working on a bit myself.

Figuring out which files changed is relatively easy (as you've demonstrated). Figuring out the impact of that is quite hard in non-compiled languages (tools like Maven, Buck, Bazel, etc. do this well for compiled languages). I.e., in a repo which is primarily JavaScript, I can get the list of changed files, and hopefully there are unit test files which map obviously onto those. However, knowing whether those files are depended on by other files/modules (at some depth) is much harder. Same for integration tests -- which of those are related?
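One crude way to approximate this for JavaScript (a hypothetical sketch, not a production tool: real resolution of extensions, index files, and package imports needs an actual parser) is to scan import statements, build a reverse dependency map, and take the transitive closure of importers of the changed files:

```python
import os
import re
from collections import defaultdict

# Naive matcher for `import ... from './x.js'` and `require('./x.js')`;
# a real tool would use a proper parser and resolve extensions/index files.
IMPORT_RE = re.compile(r"""(?:from|require\()\s*['"](\.{1,2}/[^'"]+)['"]""")

def reverse_deps(sources):
    """sources: {module_path: source_text} -> {module: set of its importers}."""
    rdeps = defaultdict(set)
    for path, text in sources.items():
        base = os.path.dirname(path) or "."
        for rel in IMPORT_RE.findall(text):
            rdeps[os.path.normpath(os.path.join(base, rel))].add(path)
    return rdeps

def affected(changed, rdeps):
    """Changed modules plus everything that transitively imports them."""
    seen, stack = set(changed), list(changed)
    while stack:
        for importer in rdeps.get(stack.pop(), ()):
            if importer not in seen:
                seen.add(importer)
                stack.append(importer)
    return seen
```

The `affected` set is then the candidate list of modules whose unit and integration tests need to run.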


I believe the typical approach is to have each project.yml list its dependency projects. Build a DAG (erroring on cycles), then build all changed projects and their downstream dependents.
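A minimal sketch of that approach (hypothetical: `deps` here stands in for the dependency lists you'd read out of each project.yml):

```python
from collections import defaultdict

def downstream(changed, deps):
    """deps maps project -> the set of projects it depends on.
    Returns the changed projects plus all transitive dependents."""
    dependents = defaultdict(set)  # invert edges: dependency -> dependents
    for proj, its_deps in deps.items():
        for d in its_deps:
            dependents[d].add(proj)
    seen, stack = set(changed), list(changed)
    while stack:
        for p in dependents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def build_order(projects, deps):
    """Topological order of the given projects; raises on a cycle."""
    done, in_progress, order = set(), set(), []
    def visit(p):
        if p in done:
            return
        if p in in_progress:
            raise ValueError(f"dependency cycle involving {p}")
        in_progress.add(p)
        for d in deps.get(p, ()):
            if d in projects:
                visit(d)
        in_progress.discard(p)
        done.add(p)
        order.append(p)
    for p in sorted(projects):
        visit(p)
    return order
```

So a change to a shared library selects the library plus everything that transitively depends on it, and `build_order` gives a safe sequence to build them in.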


Rebuild and deploy everything, what's the actual problem? Like the OP said, that's a scale issue and most projects don't have it.

Also building/testing is far more effective at finding dependencies than just going by repo structure. There are numerous package managers available to solve versioning if you need separate components.


100% agree with your entire comment. This is what we do with our monorepo now -- it turns out the rebuilding and deploying everything is actually just fine. If your application services are stateless and decoupled from your state stores, it's completely harmless. If you need to do something fancy, congrats! You're at scale -- enjoy it but remember that it's something rare.


Yes! This brings to mind Donald Knuth: "Premature optimization is the root of all evil."


One thing I heavily enjoy about monorepos (I'm talking Java/C#/C++ projects) is the ability to navigate the entire codebase from within an IDE. That alone has caused me to migrate projects (medium projects, ~20 developers) from poly- to monorepos, dropping tons of duplication in the build system in the process. I can think of good reasons to split projects along boundaries when it makes sense, but not blindly by default, and not without carefully considering the tradeoffs.


bazel/buck/pants all solve this, but independently of that, they're probably the best build systems.


In the java world this gets solved with gradle's incremental build system, which uses a build cache, a user configured dependency tree, and some hashing to determine what needs to build.
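The hashing idea behind that can be sketched roughly like this (an illustration of the general fingerprint-and-cache technique, not Gradle's actual implementation; all names here are made up):

```python
import hashlib

def input_fingerprint(source_texts, dep_fingerprints):
    """Combine a project's own sources with its dependencies' fingerprints,
    so any upstream change propagates into this project's hash."""
    h = hashlib.sha256()
    for name in sorted(source_texts):
        h.update(name.encode())
        h.update(hashlib.sha256(source_texts[name].encode()).digest())
    for dep in sorted(dep_fingerprints):
        h.update(dep.encode())
        h.update(dep_fingerprints[dep].encode())
    return h.hexdigest()

def needs_rebuild(project, cache, fingerprint):
    """A cache hit (same fingerprint as the last successful build) skips the task."""
    return cache.get(project) != fingerprint
```

Because a dependency's fingerprint feeds into its dependents' fingerprints, touching a low-level library automatically invalidates everything downstream while leaving unrelated projects cached.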


I found it to be neither horseshit nor irresponsible. A bit overdrawn and skewed in some of its arguments, perhaps. But then again... so was your critique. For example:

> We had to migrate a polyrepo to a monorepo and it was not fun because it was a migration that should have never had to be done in the first place

s/polyrepo/monorepo/ in the above and you have an assertion of about equal plausibility and weight.


No, it is horseshit. 99% of companies will never hit big-company VCS scaling issues, and once they do, they're on their own. To characterize that scale as common is one of the most embarrassing failures of modern software engineering. People are too embarrassed to use well-worn tooling and accept that large scale is both uncommon and something that doesn't invalidate tried-and-true patterns for smaller scales. It's utterly baffling to me.


> It's utterly baffling to me.

It's not hard to explain: Scale has been fetishized by the industry/trade. Everyone wants the cachet of working at scale. 1.5 GB of CSV text? That's Big Data, let's break out map-reduce. 1 load balancer and not enough servers to fill half a rack? That's a scalable architecture, we could scale to multiple datacenters at some point in the future, so let's design it now.

Deploying oversized solutions is partly due to outsiders jonesing for the scale of Google, FB, and gang; partly resume-stuffing ("I have worked with this tech before"); and lastly FAANG diasporans who miss the tech they used and rewrite systems / evangelize the effectiveness of those solutions to much smaller organizations.


To be fair, part of the problem is that each of us has been bitten throughout his career by issues which could have been prevented by being able to predict the future. We then move from the truth that if we had known the future, we could have acted better yesterday to the fallacy that today we finally know what we're going to need tomorrow.

This isn't isolated to our industry, of course: a constant refrain is that generals & admirals fight the last war; the financial industry is rife with products which are secure against the last recession, and so forth.


"To characterize that scale as common is one of the most embarrassing failures of modern software engineering."

This point cannot be stressed enough. Almost all the worst software engineering failures I have seen have been caused by premature scaling - which is way worse than premature optimization, because the latter's effects are usually local. But premature scaling causes architectural decisions that affect the whole project and simply cannot be undone.

One example among many: some influential engineers insisted that we needed four application servers with failover because they had experienced servers crashing under heavy load. This complicated failover setup took a huge amount of time and resources, delaying the project by months. In the end the product only attracted a few hundred visitors per day and was cancelled in under a year.


> This complicated failover setup took huge amount of time and resources to setup, delaying the project by months.

Hmm - failover shouldn't be that hard to set up. If it was then that suggests that other issues (technical debt, inexperienced management) were the more likely culprits.

Not the simple fact that they chose not to ignore the need for failover.


> [it] shouldn't be that hard ...

Now where have I heard those words before... :)


> 99% of companies will never hit big company VCS scaling issues

A much higher percentage of developers will. Number of companies is not a good metric for whether a topic is worthy of discussion.


Number of companies is a good metric, because companies own the repos and if it becomes a pain-point, only the developers working at that point in time will be hit by this. Anyone who leaves before this inflection point or joins after it's been solved will not be hit, so I don't think the percentage of developers in that intersection is large.


> after it's been solved

I think a quick perusal of this page will show that it's not really "solved" after all. A far higher percentage of developers continue to be affected by large-repo issues than a Python-specific issue (currently #1 story on the front page) or anything to do with Ethereum (currently #7). Are those "horseshit" topics too?


I agree, it's not really solved, but it's solved "enough". You can't have your cake and eat it; there are tradeoffs involved. If you grow large enough to hit monorepo limitations, you are large enough to invest in tooling that manages your workflow (the tradeoff). However, if you're a small organization, you can't afford the tooling and you're wasting time/quality coordinating polyrepo releases, so you are better off with a monorepo.

> A far higher percentage of developers continue to be affected by large-repo issues than...

Are you suggesting that the results of the HN ranking algorithm at this very moment in time are a good metric for what affects developers? I don't agree, and besides, @yowlingcat's opinion that the article is "horseshit" is unrelated to how well it's ranked on HN.


> opinion that the article is "horseshit" is unrelated to how well its ranked on HN.

When the opinion is not just disagreement but outright dismissal of the topic as worth discussing, I'd say ranking is relevant. So is comment count. Clearly a lot of people do believe it's worth discussion, not irrelevant or a foregone conclusion as yowlingcat tried to imply.


A lot of people can think a lot of things are worth discussion, but it doesn't mean it's prudent to waste time on it.


Incidentally, I think those are horseshit topics as well (Coconut is someone trying to daydream Python into Haskell with no practical reason to do so, and making Ethereum scale better doesn't make a legitimate use case for it emerge), but that's beside the point.

What you call large-repo issues I call organizational issues. From your other comments, it's clear that we draw the lines at different places, but I think I'm right and you're wrong in this case, because I've seen engineers try to solve organizational issues with technology enough times to recognize it as an anti-pattern. Why don't we take your own words at face value?

"That hasn't been my experience. Yes, it's a culture thing rather than a technology thing, but with a monorepo the "core" or "foundation" or "developer experience" teams tend to act like they're the owners of all the code and everyone else is just visiting. With multiple repos that's reversed. Each repo has its owner, and the broad-mandate teams are at least aware of their visitor status. That cultural difference has practical consequences, which IMO favor separate repos. The busybodies and style pedants can go jump in a lava lake."

Why are there busybodies and style pedants working in your organization? Because your organization has an issue. Do you think that would be at the root of this pain, or a tool choice? I'll give you a hint, it's not the tool choice.


> Why are there busybodies and style pedants working in your organization?

Because to an extent they serve a useful purpose. In a truly large development organization - thousands of developers working on many millions of lines of code - fragmentation across languages, libraries, tools, and versions of everything does start to become a real problem with real costs. You do need someone to weed the garden, to work toward counteracting that natural proliferation. That improves reuse, economies of scale, smoothness of interactions between teams, ease of people moving between teams, etc. It's a good thing. Unfortunately...

(1) That role tends to attract the very worst kind of "I always know better than you" pedants and scolds. Hi, JM and YF!

(2) Once that team reaches critical mass, they forget that the dog (everyone else) is supposed to wag the tail (them) instead of the other way around.

At this point, Team Busybody starts to take over and treat all code as their own. Their role naturally gives them an outsize say in things like repository structures, and they use that to make decisions that benefit them even if they're at others' and the company's expense. Like monorepos. It's convenient for them, and so it happens, but that doesn't mean it's really a good idea.

Sure, it's a culture issue. So are the factors that lead to the failure of communism. But they're culture issues that are tied to human nature and that inevitably appear at scale. I know it's hard for people who have never worked at that scale to appreciate that inevitability, but that doesn't make it less real or less worth counteracting. One of the ways we do that is by putting structural barriers in the corporate politicians' way, to maintain developers' autonomy against constant encroachment. The only horseshit here is the belief that someone who rode a horse once knows how to command a cavalry regiment.


You realize many, if not most, people reading this work at places already big enough to have "VCS scaling issues". I've seen more than a few monorepos, but I've never seen one used as anything but a collection of small repos.


> No, it is horseshit [because scale]

The thing is, scale was only one factor listed among many.


Was it? Once scale problems are gone -- you assume that all code can be checked out on one machine, and you have enough build farm to build all the code -- most of the article's points no longer apply.

The downsides which still apply are Upside 3.3 (you don't deploy everything at once) and Downside 1 (code ownership and open source is harder).

And those are pretty weak arguments -- I would argue that deployment problems exist with polyrepos as well, and there are now various OWNERS mechanisms.

The fact that monorepos are harder to open source is a good point, but having to maintain multiple separate repos just in case we might want to open source one day seems like severe premature optimization.


In my experience, monorepos cause outrageous problems that have nothing to do with scale. Small and medium monorepos are equally terrifying.

It’s much more about coupling and engendering reliance on pre-existing CI constraints, pipeline constraints, etc. If you work in a monorepo set up to assume a certain model of CI and delivery, but you need to innovate a new project that requires a totally different way to approach it, the monorepo kills you.

Another unappreciated problem of monorepos is how they engender monopolicies as well, and humans whose jobs become valuable because of their strict adherence to the single accepted way of doing anything will, naturally, become irrationally resistant to changes that could possibly undermine that.

It’s a snowball effect, and often the veteran engineers who have survived the scars of the monorepo for a while will be its biggest cheerleaders, like some type of Stockholm syndrome, continually misleading management by telling them the monorepo can keep growing by accretion and will be fine and keep solving every problem, right up until it starts breaking in colossal failures and people sit around confused about why some startup is eating their lunch and capable of much faster innovation cycles.


Oddly enough, you could s/mono/multi in your post and that would exactly align with my own experience. I'm not kidding: everything from engendering reliance on weird homegrown tooling, CI & build pipelines to the pain of trying to break out to a different approach, to enforced bad practices, to developers (unknowingly) misleading management, to colossal failures.

I've worked on teams with monorepos and teams with multiple repos, and so far my experience has been that monorepo development has been better — so much so that I feel (but do not believe) that advocating multiple repositories is professional malpractice.

Why don't I believe that? Because I know that the world is a big place, and that I've only worked at a few places out of the many that exist, and my experience only reflects my experience. So I don't really believe that multiple repositories are malpractice: my emotions no doubt mislead me here.

I suspect that what you & I have seen is not actually dependent on number of repositories, but rather due to some other factor, perhaps team leadership.


Everyone always says this type of response about everything though. If you like X, you’ll say, “In my experience you can /s/X/Y and all the criticisms of X are even more damning criticisms of Y!”

All I can say is I’ve had radically the opposite experience across many jobs. All the places that used monorepos had horrible cultures, constant CI / CD fire drills and inability to innovate, to such severe degrees that it caused serious business failures.

Companies with polyrepos did not have magical solutions to every problem, they just did not have to deal with whole classes of problems tied to monorepos, particularly on the side of stalled innovation and central IT dictatorships. Meanwhile, polyrepos did not introduce any serious different classes of problems that a monorepo would have solved more easily.


Absolutely amazing to me how much engineers conflate organizational issues with tooling issues. Let's take a look at one of your comments:

"The last point is not trivial. Lots of people glibly assume you can create monorepo solutions where arbitrary new projects inside the monorepo can be free to use whatever resource provisioning strategy or language or tooling or whatever, but in reality this not true, both because there is implicit bias to rely on the existing tooling (even if it’s not right for the job) and monorepos beget monopolicies where experimentation that violates some monorepo decision can be wholly prevented due to political blockers in the name of the monorepo.

One example that has frustrated me personally is when working on machine learning projects that require complex runtime environments with custom compiled dependencies, GPU settings, etc.

The clear choice for us was to use Docker containers to deliver the built artifacts to the necessary runtime machines, but the whole project was killed when someone from our central IT monorepo tooling team said no. His reasoning was that all the existing model training jobs in our monorepo worked as luigi tasks executed in hadoop.

We tried explaining that our model training was not amenable to a map reduce style calculation, and our plan was for a luigi task to invoke the entrypoint command of the container to initiate a single, non-distributed training process (I have specific expertise in this type of model training, so I know from experience this is an effective solution and that map reduce would not be appropriate).

But it didn’t matter. The monorepo was set up to assume model training compute jobs had to work one way and only one way, and so it set us back months from training a simple model directly relevant to urgent customer product requests."

What do you think is the cause of your woes, the monorepo, or the disagreement between your colleague in central IT tooling who disagreed with you? Where was your manager in this situation? Where was the conversation about whether GPU accelerated ML jobs were worth the additional business value to change the deployment pipeline? Was that a discussion that could not healthily occur? Perhaps because your organization was siloed and so teams compete with each other rather than cooperate? Perhaps because it's undermanaged anarchy masquerading as a meritocracy? Stop me if this sounds too familiar.

I've been there before. I know what it feels like. But, I also know what the root cause is.


Nobody is conflating anything. Culture / sociological issues that happen to frequently co-occur with technology X are valid criticisms of technology X and reasons to avoid it.

To argue otherwise, and draw attention away from the real source of the policy problems (that the monorepo enables the problems) is a bigger problem. It’s definitely some variant of a No True Scotsman fallacy: “no _real_ monorepo implementation would have problems like A, B, C...”.

The practical matter is that where monorepos exist, monopolicies and draconian limitations soon follow. It’s not due to some first principles philosophical property of monorepos vs polyrepos — who cares! — but it’s still just the pragmatic result.

Also you mention,

> “Where was the conversation about whether GPU accelerated ML jobs were worth the additional business value to change the deployment pipeline.”

but this was explicitly part of the product roadmap, where my team submitted budgets for the GPU machines, we used known latency and throughput specs both from internal traffic data and other reference implementations of similar live ML models. Budgeting and planning to know that it was cost effective to run on GPU nodes was done way in advance.

The people responsible for killing the project actually did not raise any concern about the cost at all (and in fact they did not have enough expertise in the area of deploying neural network models to be able to say anything about the relative merit of our design or deployment plan).

Instead the decision was purely a policy decision: the code in the monorepo that was used for serving compute tasks just as a matter of policy was not allowed to change to accommodate new ways of doing things. The manager of that team compared it with having language limitations in a monorepo. In his mind, “wanting to deploy using custom Docker containers” was like saying “I don’t want to use a supported language for my next project.”

This type of innovation-killing monopolicy is unique to monorepos.


Hear, hear, yowlingcat. The article is way too prescriptive and, agreed, borders on irresponsible. The monorepo vs. polyrepo argument is way too broad a subject for generalized stereotypes like this. These opinions sadly get taken as facts by impressionable managers, new developers, etc., and have cascading effects on the rest of us in the industry. Use what makes sense for the project environment and team; don't just throw shade at teams who are successfully and productively using monorepos where they make sense. Sure, there is good reason to split things up on boundaries sometimes (breaking out libraries, RPC modules, splitting along dev team boundaries, etc.), but not blindly by default. Will Torvalds split up the kernel into a polyrepo after reading this article? Something tells me that would be a bit disruptive.


It's interesting that you talk about "teams using monorepos". I think that's different from what the article is arguing against, which is an entire company (100+ devs) using a monorepo.

A team with 5 services and a web front-end in a single repo is doable with regular git. It's a different beast I think.


Thanks softawre. What triggered me is the sensationalist title and general bashing of monorepos (which a large percentage of impressionable readers will walk away from this article believing, i.e., that monorepos are only for dummies and you're doing it wrong if you're not using a polyrepo). A less inflammatory title would be more along the lines of "Having trouble scaling development of a single codebase amongst 100s of developers? Consider a polyrepo". This argument comes up in developer shops almost as much as emacs vs. vi, tabs or spaces, etc.

When you have 100+ developers on a project, managing inbound commits/merges/etc will become tedious if they're all committing/merging into one effective codebase.

IMHO, it depends on the project, the team makeup, the codebase's runtime footprint, etc., whether or when it makes sense to start breaking it up into smaller fragments, or, on the other hand, vacuuming up the fragments into a monorepo.

I did enjoy reading Steve Fink of Mozilla's comment (it's the top response on the OP's Medium article) and his counterarguments about monorepos vs. polyrepos in that ecosystem (also clearly north of 100 developers). It's easy to miss if you don't expand the Medium comment section, but very much worth reading.


> A monorepo will serve you 99% of the time until you hit a certain level of scale when you get to worry about whether a monorepo or a polyrepo is actually material

If you worked in a company that had a core product in a repo, and you wanted to create a slack bot for internal use, where would you put the code? I assume not within your core product's codebase, but within a separate repo, thus creating a polyrepo situation.

So when you say a monorepo will serve you in 99% of cases, are you not counting "side" projects, and simply talking about the core product?


This article is too aggressive and has a childish tone that is not to my taste.



