The post tangentially touches on a pet peeve of mine that I think lots of companies get fundamentally wrong these days: version control, CI, and deployment are separate concerns and shouldn't be jumbled together into an amorphous blob.
If the build fails, you should be able to re-run it without pushing to version control. You can e.g. have builds fail due to issues with your artifact repository that don't require any changes to the code to debug/fix. You can have issues with deployment after the build completes; you don't want to have to rebuild to redeploy. The most shocking of all, though, and I have actually seen this: if you deploy a broken build, a rollback to an earlier version should never, ever, require a rollback on your source control.
As always, it's very much a case of too much of a good idea. People correctly identify that automation is a good thing, but then turn that into "manual operation is a bad thing, and therefore we shouldn't cater to it".
I kind of disagree. While these three (VCS, CI, and deployments) are obviously different things, there's a lot of value in reducing all of them to just one thing: VCS is truth. It can make the system far easier to reason about.
Question 1: What's running in production? Allowing production to run "anything" means "anything" could be running there, so you need tooling to answer that question. Maybe that's just "give every developer read-level access to AWS", or maybe it's a Slack slash command, or something, but there's a cost there. With a coordinated model: it's just the main branch. It's always the main branch. So look at the main branch.
(Sidenote: this isn't a perfect solution, because oftentimes there's a temporal delay between a main branch merge and deployment, and developers want to know precisely when the cutover starts/finishes. So you end up building that extra stuff anyway. But: that can be delayed quite a bit, as it's a higher-level question, given that the main-is-production model solves a lot of problems on its own.)
Question 2: How do I re-run the build? Well, you can push a new commit, or issue some slack slash command, or maybe a comment in the PR that reads `!buildbot run` or whatever. How about we just make it simpler? Just push a new commit.
The broader point is that Turing machines can do anything. Human brains can't. So when operations are simplified around models like "source-is-truth", it's because that model is a satisfying fit for the problem space which helps the human brain reason about what is happening. All models are wrong; but some are useful.
> Allowing production to run "anything" means "anything" could be running there, so you need tooling to answer that question. Maybe that's just "give every developer read-level access to AWS",
You can set up your deployment tools to always deploy the latest from a specific branch unless manually overridden. It is also not very difficult to post the deployed git commit hash to an internal channel, or even expose it via a web interface. None of this requires designing your system so that the only way to trigger a new deployment is by pushing a new commit.
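To illustrate (just a rough sketch, not tied to any particular stack): reporting the deployed commit can be as small as the script below, which asks git for the current revision and posts it to an internal webhook. The `DEPLOY_WEBHOOK_URL` environment variable and the channel semantics are assumptions for illustration.

```python
# deploy_report.py - minimal sketch: announce the deployed git commit hash.
# Assumes it runs from a checkout of the deployed revision and that
# DEPLOY_WEBHOOK_URL points at an internal Slack-style incoming webhook
# (both are assumptions, not part of the original comment).
import json
import os
import subprocess
import urllib.request

def deployed_commit() -> str:
    # Ask git which revision is being deployed.
    return subprocess.run(
        ["git", "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()

def announce(commit: str, environment: str = "production") -> None:
    webhook = os.environ["DEPLOY_WEBHOOK_URL"]  # hypothetical internal webhook
    payload = {"text": f"Deployed {commit} to {environment}"}
    req = urllib.request.Request(
        webhook,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # fire-and-forget; add error handling as needed

if __name__ == "__main__":
    announce(deployed_commit())
```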
> Well, you can push a new commit, or issue some slack slash command, or maybe a comment in the PR that reads `!buildbot run` or whatever. How about we just make it simpler? Just push a new commit.
And what happens if you want to test building before your change is fully ready to merge or roll back to an older release? It is better to design a modular system than to try to hack this basic stuff in when you suddenly need it.
Then there are my security concerns about conflating "write access to GitHub" with full access to the environment. If you allow code to be deployed immediately on commit, then anyone who can commit can compromise your environment.
"VCS is truth" does not assert what it is, but rather its treatment in the system. There is no knowable truth in large-scale systems design.
Velocity is relative in physics, right? Relative to what? What's the frame of reference? Usually, we say, the Earth itself. VCS is that to software engineering, in this setup; there's movement and realities outside of VCS, but VCS is our zero; the tare. Everything is relative to the VCS. It is not truth through its nature; it is truth through how we treat it, how we build systems to interface with it, synchronize with it, etc.
I agree. This goes back to the recent post "don't use latest". If you don't pin everything you can't claim that VCS is truth. A built binary is the truth, whether that be a container image, go bin or whatever. Even then I wouldn't say it's the absolute truth. The whole system is the truth.
> If the build fails, you should be able to re-run it without pushing to version control.
I've uh, not seen one that doesn't? (I agree with you, it shouldn't. Does your CI system not have a "retry" or "restart" button?)
> The most shocking of all, though, and I have actually seen this: If you deploy a broken build, a rollback to an earlier version should never, ever, require a rollback on your source control.
So, my company does this, and I love it; it gives even the least skilled of our devs a good shot at the "recognize things aren't working → problem started with last deploy → revert last deploy" debug path, and getting it right. Having that last step of revert a deploy literally correspond to a git revert is very useful & easy.
Keeping the "state of what is deployed" in code also gives us good confidence that there aren't random changes being made, allows for code review, provides a log of who changed what when — all the goodness of infrastructure as code.
Now, we keep the main production "monorepo" separate from our repo that describes what is deployed (let's call it "deploy-state"). So a revert of a deployment happens in the "deploy-state" repo, not in the monorepo. So, someone's merged feature branch isn't going to get reverted. (Though, we should have a conversation about whatever the actual problem is: if it is code in the monorepo, why did it not fail tests there? We also have ways of saying "this range of commits, from [break, fix), should not get deployed" to prevent repeats.)
Whether that setup is a "true monorepo" or not, since it literally involves more than one repo … well, you be the judge. It seemed to us pragmatic to have it be separate, and we've not regretted that.
> So a revert of a deployment happens in the "deploy-state" repo, not in the monorepo.
This makes me think you're disagreeing with the literal interpretation of what I wrote, while very much agreeing in spirit.
I'm not trying to say that a VCS is an intrinsically wrong solution to managing deployment state (I'm not a fan, but it's an eminently reasonable approach). My issue is with deployment state and application source being enmeshed in the same repo such that you can't evolve one without messing with the other.
> Does your CI system not have a "retry" or "restart" button?
It's pretty common on GitHub that PRs trigger a build but unless the author has some permission on the target project already, can't trigger a rebuild. I'd estimate 30%+ of the PRs I create fail for reasons unrelated to my changes and I have to ask someone to retry.
Ah, if the project is open-source & its CI is external, I guess I could see that. I was reading the remark in the context of "within a company", where devs would have access to the CI system (& thus, the "restart" button).
This isn't all that different from having a "deploy-state" branch that runs alongside your master branch, bringing it even closer to a true monorepo.
Another class of problem, even without errors, is needlessly restricted concurrent access: your team tries to push an urgent fix that needed to be built, tested, and deployed an hour ago, only to find that the CI system is on its knees, or actually locked, because another team is compiling and testing something unimportant. In many cases, insult is added to injury by forcing you to waste more time, after the other build finishes, changing version numbers or merging completely irrelevant changes in source control.
We have reverts bypass testing and most of CI. A revert commit on master goes directly to production as fast as possible. Nothing should stop an emergency revert. If it’s not an emergency, it goes through the normal process.
Don't all action runners allow you to re-run tests and builds with a click of a button and most platforms allow rolling back to any arbitrary commit number?
(also I don't agree on not rolling back, I want one branch to always be the actual code living on prod, or else you end up with arbitrary time wasting rabbitholes of accidentally debugging code that isn't live)
Key word there is "require". Having that automation set up is super convenient and a pretty important factor in making things more productive, not contesting that.
The other side of the equation is that if I break something, I want to make the smallest change possible that fixes it, so as to avoid confounding variables if it doesn't work. Off the top of my head:
* I don't want to make things worse by messing up the rollback due to stress
* I don't want to trigger a build and wait for CI to run and build
* I don't want to depend on my build not being perfectly reproducible (e.g. because the problem comes from a bad dependency upgrade and the rollback rebuilds with the bad dependency)
What I do want is a really simple way to just tell my system "You know that thing you were running 5 minutes ago? Please go back to running that again".
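To make that concrete, here's a hedged sketch of what such a rollback could look like: keep a short history of already-deployed artifact references and re-point to the previous one, with no rebuild and no new commit. The `deploy()` function and the history file are hypothetical stand-ins for whatever your platform actually provides.

```python
# rollback.py - sketch of artifact-level rollback: redeploy the previous
# already-built artifact instead of rebuilding from source.
# DEPLOY_HISTORY and deploy() are placeholders; substitute your platform's
# real mechanism (kubectl rollout, ECS service update, etc.).
import json
from pathlib import Path

DEPLOY_HISTORY = Path("deploy_history.json")  # newest entry last

def deploy(artifact_ref: str) -> None:
    # Placeholder: point the runtime at an existing image/binary by reference.
    print(f"deploying {artifact_ref}")

def record_deploy(artifact_ref: str) -> None:
    history = json.loads(DEPLOY_HISTORY.read_text()) if DEPLOY_HISTORY.exists() else []
    history.append(artifact_ref)
    DEPLOY_HISTORY.write_text(json.dumps(history))

def rollback() -> None:
    history = json.loads(DEPLOY_HISTORY.read_text())
    if len(history) < 2:
        raise RuntimeError("nothing to roll back to")
    previous = history[-2]
    deploy(previous)          # no build, no new commit: just re-point
    record_deploy(previous)   # the rollback itself becomes the newest deploy

if __name__ == "__main__":
    rollback()
```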
Thank you! I thought I was alone in thinking like this.
I hate Atlassian Bamboo with a passion but it is the only CI/CD system I remember coming across that actually has a concept of deploying a built artifact.
> In real life, using a hermetic build system for tests does not actually scale. The dependency graph of your repo becomes entwined with how much of your repo gets tested. Change a deeply nested component often? Even though you only want to deploy one out of all your services, you’ll get full rebuilds of your repo. I still think a hermetic build system is the way to go; but I wouldn’t gate merges on testing the full repo. Instead, let’s give some flexibility to our engineers.
Scaling a monorepo also involves scaling with the number of engineers at the company. The strategy mentioned here, of putting the trust in your engineers to determine what tests need to be run after platform-wide changes, does not scale. Eventually there will be enough product teams who care more about speed than quality, likely resulting in increased outages, because changes won't be guaranteed to be fully tested in CI.
A more reliable way to solve this problem would be to scale out testing in CI. For example with Bazel's remote execution or any type of CI sharding strategy. Optimizing CI build and test times for changes impacting many targets is a scaling problem any monorepo will eventually need to tackle.
In a perfect world, you can scale CI indefinitely. However, I don't think it's as simple as that. As mentioned in the post, even with a hermetic build system your CI times become entwined with your dependency graph, no matter whether you're able to shard it over remote executors or not.
The block you've quoted specifically mentions gating _merges_. I still think it's prudent to run ~all tests after merge, with automatic reverts if tests start failing. I really want to make sure that people think of CI, merges, and deploys as pieces of a larger puzzle and not a monolith.
I updated the language around this in [0], hope that makes it clearer as to my intention.
Thanks for the clarification! fwiw Uber's monorepo treats CI and merges as the same, and deployments separately [0].
What it comes down to is engineering a solution to "mitigate a broken main" (automatic reverts) or a solution to "keep main green" (gating commits on a green build/test).
Monorepo is just one small part of the puzzle. If you want to actually achieve the dream state that is alluded to when someone says "monorepo", you have to be willing to endure a super-deep and honest evaluation of your tech stack and processes.
We have been successfully running a monorepo for ~5 years now using nothing more than the bare-ass GitHub PR process with some basic checkbuilds sprinkled on top. We intentionally avoided getting our hands dirty with CI automation as much as was feasible. Building our software manually is so simple it's not really worth dedicating a team to worrying about.
I would say the biggest key to our success was finding a way to minimize 3rd party dependencies. By making certain strategic technical choices, we were able to achieve this almost by default. Our repository is effectively hermetically sealed against outside changes in all of the ways that matter to the business owners. Github providing issues at the repository grain is also a very powerful synergy for us in terms of process - we started using issue numbers as the primary key for all the things.
With regard to technical choices - consider that some languages & ecosystems provide a lot more "batteries included" than others, which provides a substantially different landscape around how many 3rd parties may need to be involved and how.
Same here. We migrated from 5 repositories down to a single one in 2013, and have been working with a simple PR process (first Phabricator, then Bitbucket) ever since, along with TeamCity as a CI to run tests and deploy.
We do try to minimize our "technological footprint" by reducing the number of languages, tools, third party packages, etc. But I would say our main strategy is to enforce static analysis everywhere. Global type-checking between C# and F# projects comes out-of-the-box (but you have to design the shared types properly), and we produce TypeScript contracts and signatures out of C# contracts and API endpoints.
Strong typing + monorepo is a superweapon in my view.
I enjoy being able to make a change to some common type, open up a "Monorepo.sln" panopticon and instantly see every project/system/service/library/et.al. across the entire organization that would be impacted by the change.
> Building our software manually is so simple it's not really worth dedicating a team to worrying about.
I’m curious about this one — do you mean you don’t have an automated build/deploy mechanism? If something is very simple, it’s normally also very simple to automate. A short bash script running in a basic GitHub action is next to no maintenance and saves a lot of time, even if it’s just 5-10 minutes each day.
> do you mean you don’t have an automated build/deploy mechanism? If something is very simple, it’s normally also very simple to automate.
Absolutely. Making things simple also synergizes very well with automation.
We actually have written an in-house system that can build our software and package it up to AWS S3 buckets for final delivery. This is something used more frequently by project managers to create releases for customers. Developers usually prefer to run builds out of VS2022 for the special ad-hoc cases and other non-customer unicorn targets (i.e. internal admin dashboards).
The point is that the whole thing is so simple in aggregate that we don't need engineers spending dedicated time worrying about how it all fits together. One basic msbuild command and you have a release folder you can zip up and send to the customer (or extract to some internal QA system).
I think I get where the parent is coming from -- automations are a "behind the scenes" thing and can balloon in complexity without you realizing it. But if you're forced to perform every step manually with no automation, you create natural pressure to avoid doing complex things.
I'm not OP, but we have a similar setup at my company. We actually use C++ as our primary language, with a significant amount of python for testing and glue.
In general, dependencies are included in our repository directly; changing/adding dependencies means getting your dependency into the repository.
"There are a couple of types of flaky tests that come to mind:
Tests that perform network operations
...
The first one is easy, ban these. You don’t need them in the blocking portion of your pipelines."
At a certain large FAANG company I worked at in the past, there were some end-to-end tests that used the network that were blocking. Most e2e tests were not blocking, but a few were considered critical enough to halt the whole build/deploy.
We had a system that checked both flakiness and running time for each of these tests. If they were consistently flaky, they were removed from the pool and the owner was informed and had to manually work to put it back in the pool and get sign-off, including showing things like load test results to prove that the same test had been run many hundreds of times in different environments.
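A toy sketch of that kind of bookkeeping, in Python (the window size and thresholds are made up for illustration; the real system was of course far more elaborate):

```python
# flake_tracker.py - toy sketch: track recent results per test and pull
# consistently flaky or slow tests out of the blocking pool.
from collections import defaultdict, deque

WINDOW = 100           # how many recent runs to consider (arbitrary)
MAX_FLAKE_RATE = 0.02  # >2% failures on unchanged code => quarantine (arbitrary)
MAX_SECONDS = 600      # tests slower than this leave the blocking pool (arbitrary)

class FlakeTracker:
    def __init__(self) -> None:
        self.results = defaultdict(lambda: deque(maxlen=WINDOW))
        self.durations = defaultdict(lambda: deque(maxlen=WINDOW))
        self.quarantined = set()

    def record(self, test: str, passed: bool, seconds: float) -> None:
        self.results[test].append(passed)
        self.durations[test].append(seconds)
        if self.flake_rate(test) > MAX_FLAKE_RATE or max(self.durations[test]) > MAX_SECONDS:
            # Removed from the blocking pool; the owning team is notified and
            # must show evidence (e.g. many clean reruns) to get it back in.
            self.quarantined.add(test)

    def flake_rate(self, test: str) -> float:
        runs = self.results[test]
        return runs.count(False) / len(runs)

    def blocking_pool(self, all_tests: list[str]) -> list[str]:
        return [t for t in all_tests if t not in self.quarantined]
```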
"Your version control system not being able to handle the sheer number of references and operations done to the repo."
This was also an issue. For most companies, git is totally fine (unless you are filling it with petabytes of binary data or something else that is not a good idea without special mitigations) but the argument that "git is used by the Linux kernel! It should work for you!" falls down when your codebase is at least 20 times bigger than the Linux kernel.
Interesting post! We use a monorepo at Mito [1], it’s open source if you want to check it out [2].
We solved the test flakiness issue by getting rid of all of our frontend tests, which means we spend more time manually testing our product. We’re happy with the tradeoff because we like being forced to use our product. It’s something a velocity focused team can forget to do… at least something we can forget.
Merge races aren’t really a huge issue for us. They happen sometimes, we catch them, and we fix them. Putting thought into making them easier to fix isn’t something that makes sense at our scale.
That being said, we’re a tiny team of 3 first-time founders - so the above choices make a lot more sense in our context than at Stripe :-)
A reminder that you need not design your systems how the big companies do! Read their informative blog posts, and then design them for your goals and constraints! There’s a whole world of solutions to problems you don’t have when you’re small, and it can be very easy to feel like you need to adopt all of them. Hint: you don’t.
> We solved the test flakiness issue by getting rid of all of our frontend tests, which means we spend more time manually testing our product.
This is a terrible idea. Getting rid of testing doesn't make the problem (flakiness) go away, it just moves it to another part of workflow (manually testing). So, now instead of having tests, you have more humans (or because you're a small company, instead of building features which grow the company, you spend time manually testing and fixing regressions that your end users embarrassingly find for you). Never mind that it just doesn't scale as your codebase increases.
Even a basic level of (snapshot) testing react components is not difficult. I wrote a library [1] that intentionally has a ton of tests and over the years, I've been able to do new releases that upgrade to new versions of the underlying dependencies, without worry.
You can't avoid manually testing the frontend anyways. In my experience there is a small part that benefits from automated tests but the vast majority of frontend code needs to be looked at and interacted with by a human being, preferably people who are really good testers.
I work with someone who does this _really_ well, he finds stuff (very quickly) that your typical user/client just wouldn't - and all the other stuff as well. We sometimes joke about how cool it would be if we were able to encode this in automated tests somehow. A good tester knows their users and knows the tech well enough to find anything that could break that communication between them.
Some of that you can encode in acceptance tests - after the fact. But the tricky things you just can't. You have to interact with it, discuss it, look at it from different angles and try to break it - play.
This is why I use frontend error tracking tools (Rollbar, Sentry, LogRocket, take your pick) to notify me when there are edgecase errors.
There are too many combinations of browsers and settings to really rely on any sort of manual frontend testing. It is definitely a case of letting your users test for you, but if you've got notification of thrown exceptions, then now you have a chance of fixing things.
Getting rid of frontend unit testing is a huge mistake though. I don't blame people for doing it. Testing frontend is poorly documented and misunderstood. It took me hours of research to even figure out how to do it myself. That said, I did it. It works. I've proven it works over many years of doing frontend development. Don't use initial difficulty as an excuse to not do it.
There's a difference between regression testing and exploratory testing. For regression testing, stuff that gets repeated every release is hard to catch by human eye, especially as releases get more and more frequent. Automating rote regression testing, just to ensure that what worked yesterday still works today, is a great way to ensure that manual testing is free to focus on exploratory testing only.
I think frontend tests are the reason that every f*king app I use these days seems to crash regularly. Nobody is manually testing, all the (flawed) tests are passing, "yay, we're done, push the damn thing out".
Classic garbage in, garbage out. If all the tests are bad then how could they possibly help anything? That doesn't mean unit tests or other automated tests are worthless though.
> We solved the test flakiness issue by getting rid of all of our frontend tests
Getting rid of the whole swath of tests seems extreme, but I am absolutely a proponent of outright deleting flaky tests. If a test has a reputation for being flaky, if people roll their eyes and sigh when it fails, and especially if anyone ever ignores it and submits/merges anyway, then that test has negative value.
"Maybe it will catch something one day" is not a strong enough argument for keeping it around forever. The person making that argument should be arguing to spend the time/effort to make the test not flaky, because tests that people don't trust are just useless baggage.
I think this becomes more true as a product gets bigger and more complicated, because it's more likely that people are changing something they don't fully understand and thus have more of a need to be able to trust the tests.
I like the blog post and broadly agree with the conclusions, but I want to double down on something in the article:
> Shared pain
I've worked in some orgs with very large monorepos (1000s of developers working in a single repo) and broadly have had positive experiences with them, but this 'shared pain' was by FAR the biggest drawback I experienced. When things went wrong with the monorepos they tended to go VERY wrong and affect EVERYONE. Multiple incidents of the monorepo just crushing productivity for 1000-person engineering orgs, for a period of time.
That's not to say I think monorepos are bad, it's 100% context dependent whether they make sense for your org in my experience, but I learned that the same tradeoffs you get with architectural monoliths/distributed systems often apply to multi/mono repos as well.
Same experience. It's interesting because we've had great success on our internal design lib's monorepo compared to the overall application. Which makes sense in some regards.
There's a split between our orgs. DocuSign runs a mono repo. But the big product I work on has its own giant mono repo. Design lib is another giant mono repo comprising general front end things also.
I'm sure there are better arguments for a monorepo, because I disagree with all 4 of the OP's:
> Single source of truth
This has never been a problem for me. You have an app in one repo depending on a version of a package built from another. I don't see the problem.
> Visible cost of change
Altering code does NOT require you to make sure everything is updated or compatible, unless you make a breaking change that the compiler can see or you affect a test that will fail after your change. There are lots of changes you can make that the monorepo doesn't detect any more than a multi-repo would.
> Sharing and reusing code
I don't agree that finding existing functionality to reuse is easier. How so? You search for "Tokenize" and hope to find something relevant? It is honestly no harder for me than looking into shared libraries or other packages.
> Improved collaboration
"Aligning repos". Not an issue as mentioned earlier. "Change the code and get proper reviews", this is nothing to do with monorepos.
So sorry, I am not personally convinced that many orgs would benefit from monorepos unless they have the skill or cash to pay for the maintenance, the much larger testing requirements, the ability to unblock the organisation when one numpty has broken something, and the churn of a load of projects getting up-versioned all the time because one change was made to one thing. This article did not convince me any more.
I think the biggest argument in favour of monorepos is that both code and dependencies are version controlled. You can check out any commit and get a fully working environment, knowing that each module will be at the right version that's compatible with the other modules.
Without this, you need to have some infrastructure to handle these dependencies, for example by agreeing on version numbers, making sure dependency versions of all modules in all repositories are correctly updated, etc. Maybe also scripts to rollback everything to a different versions. There are tools for this of course, but the nice thing with a monorepo is that it's built-in - it's all in the commit history.
So when you say the org needs cash and skills for monorepo maintenance, I actually think it's the other way around. I've seen companies split each module into its own repository, but with no tooling to get dependencies right, and the whole thing was a mess. You can't know what commit of what repo is compatible with another commit of a different repo. Had they used a monorepo they wouldn't have this problem, because a commit either builds or it doesn't; there are no unknown dependencies.
You have found a local optimum that involves multiple repos. Others have found a local optimum that involves a single repo. Arguments for mono-repos are not arguments against multi-repos.
For some background: we migrated from multi-repo to mono-repo to solve several pain points, which are touched on in TFA. It's fair for you to state that you have never had a problem, or that you don't see it, but don't conclude that no one else has ever had that problem, or that they, unlike you, consider that problem to be important enough to need a solution.
> [Single source of truth] has never been a problem for me. You have an app in one repo depending on a version of a package built from another.
But I don't have a package built from another. To support multiple repos, do I need to configure package creation, set up an artifact server, and teach the team how to use a locally-built package for debugging/prototypes?
> There are lots of changes you can make that the monorepo doesn't detect any more than a multi-repo
There are several techniques for working on a large codebase that involve making easy-to-detect changes, which can then be followed to both explore the extent of what the work should cover and ensure that no piece is forgotten. Having a monorepo makes it easier to put all the code in one place for the static analysis tools to crunch through.
> I don't agree that finding existing functionality to reuse is easier. How so? You search for "Tokenize" and hope to find something relevant?
You can "Find all references" of a specific type to identify functions that can operate on it. The list will usually be short enough to scan in a minute or two. If you're looking for a function to frobnicate that type, it will almost certainly be in the list. I'm not sure how you would this on packages from several repos (especially if that function is non-public).
> "Change the code and get proper reviews", this is nothing to do with monorepos.
My current tooling does not allow me to submit a PR that pushes two different commits to two different repos. If using packages to share code between repos, it also means that the change must deal with package versioning.
There exists a set of tools, techniques and never-been-a-problems that can make mono-repo work and be productive, just like there exists a set of tools, techniques and never-been-a-problems that can do the same for multi-repos.
I like the concept of a monorepo, but have found it challenging to implement because most developers are only responsible for their part - and there is often a big productivity benefit to keeping them narrowly focused. One trick has been to have a monorepo for CI, rather than a monorepo for code. When one of the smaller packages gets updated, the CI monorepo is triggered and all of the systems are tested for interoperation. GitHub makes this particularly easy with repository dispatches. It's been a wonderful "canary in the coal mine" for early problem detection. Bonus: the monorepo for CI becomes deployment documentation and can easily have its own set of tests that are specific to interop.
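For anyone who hasn't used repository dispatches: the triggering side can be a tiny script like the hedged sketch below, which calls GitHub's "create a repository dispatch event" endpoint after a package build; the CI monorepo's workflows then listen for that event type. The repo name, event type, and payload fields are placeholders.

```python
# notify_ci_repo.py - sketch: after a package builds, poke the CI monorepo
# via GitHub's "create a repository dispatch event" REST endpoint.
# OWNER/ci-monorepo, the event type, and the payload fields are placeholders.
import json
import os
import urllib.request

def trigger_interop_tests(package: str, version: str) -> None:
    url = "https://api.github.com/repos/OWNER/ci-monorepo/dispatches"
    body = {
        "event_type": "package-updated",          # matched by `types:` in the workflow
        "client_payload": {"package": package, "version": version},
    }
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        },
        method="POST",
    )
    urllib.request.urlopen(req)  # GitHub answers 204 No Content on success

if __name__ == "__main__":
    trigger_interop_tests("payments-lib", "1.4.2")
```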
The flakiness recommendation isn't tenable at reasonable scale. All tests are flaky to some degree. People frequently set timeouts - if your CI infra is overloaded, the assumptions behind those timeouts will fail. Additionally, the number of non-flaky tests trends to 0 as the number of tests & engineers increases.
Now I am far from an expert on monorepos but I want to know what you think of the next sentence:
> The first one is easy, ban these. You don’t need them in the blocking portion of your pipelines.
I don't think they mean ban integration testing.
I think they are saying don't block merge requests on integration testing.
Is this still a problem?
Perhaps they are doing some kind of trade-off management. Isn't that basically what engineering is? I remember that in high school the classic engineering problem was: do you want thicker, heavier wires on electricity poles, which means more poles and more expense, or thinner, lighter wires, which are cheaper but have higher resistance and more "leakage"? My understanding is there are no correct answers, only the least bad ones.
Imagine someone trying to change the background color of a button in css and waiting three hours for integration tests to finish before anyone can +2 it.
I'd argue that you should have thorough unit tests for each service to ensure that they always respect their public API. Additionally, in a monorepo you can share type definitions so regardless of the service, you know for sure that you're using the right API. If all of that is in place then you can test the integration by just mocking out those services rather than testing their API n times.
If you're in an environment where you have limited or no type checking, then you're right. In my experience, most problems that come from the integration of networked services come from not properly accounting for all possible responses. You're expecting to get a 200 response and a field of `x` but you got a 204, so the response is empty. That sort of thing.
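A small sketch of the shared-types-plus-fake-service idea in Python (the `User` type, `UserService` interface, and test are invented for illustration; in a typed monorepo the contract module would be imported by both the service and its consumers):

```python
# checkout_test.py - sketch: the contract (types + interface) lives in a
# shared module of the monorepo, so a consumer can be tested against a fake
# that is forced to honour the same signatures. Names are illustrative.
from dataclasses import dataclass
from typing import Protocol

@dataclass
class User:                     # shared contract type
    id: str
    email: str

class UserService(Protocol):    # shared service interface
    def get_user(self, user_id: str) -> User: ...

def welcome_message(users: UserService, user_id: str) -> str:
    # Consumer code under test: depends only on the shared contract.
    return f"Hello {users.get_user(user_id).email}!"

class FakeUserService:
    """Stands in for the real networked service in unit tests."""
    def get_user(self, user_id: str) -> User:
        return User(id=user_id, email="test@example.com")

def test_welcome_message() -> None:
    assert welcome_message(FakeUserService(), "u1") == "Hello test@example.com!"

if __name__ == "__main__":
    test_welcome_message()
    print("ok")
```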
This is easy when you own all the code, it's a god damn nightmare the moment you introduce code that isn't yours like Redis, Postgres, Elasticsearch, every cloud API.
If you're assuming that the hundreds of millions of lines of other people's code totally outside your control always follows its spec, doesn't have bugs, and can't change out from under you with no version bump (yay cloud), you're gonna be bit hard and then have to invest in integration tests.
Good point. I have always run integration tests with Postgres etc. and have found value in it. I do think that you should have a substantially smaller set of those sorts of tests though.
This hit a little too close to home for me. Haha, the basic sections are perfect. It is supremely hard to get monorepos right at scale without the right culture, tooling, and gating. But on the microservice side there is a list of clear cons as well. I prefer it, but I have run into the deployment side being difficult: getting things in sync at times, if you don't have truly isolated services and the testing in the middle to expose failures before anything goes out. Databases are an especially difficult variable when you need to roll out migrations.
I’ve seen a company move from mono repo to micro services with moderate success. However they had to increase the team size 5x. Most productivity dropped a lot.
Although on some of the really hard problems it freed them up enough to be more successful.
Using monorepos at a scale above 3 people is difficult to do right. Using multirepos at a scale of repos above 7 becomes difficult as well. It is certainly possible to use either model well, but both are also guaranteed to be much more difficult and time-consuming than you first thought. It takes several years of increasing complexity to finally see all the weird little quirks they need to work well.
As with many words, monorepos can mean different things to different people.
Some people use it for a repo where a single product is developed and typically deployed/published together, but happens to be distributed through separate packages. The Babel repository would be one example of this: https://github.com/babel/babel
Many “design system monorepos” fall in this category as well.
I would say the difference between one team building a single product in one repo or many might be interesting to some, but it’s a completely different problem from a multi-product multi-team repository. At some companies this would be a single repo for all source code.
Building software like this can have a profound impact on your entire engineering culture - for the positive, if you ask me. The single-product monorepos are unlikely to have a similar impact.
I'm working on a huge monorepo hosted on GitHub and using Git. I just wanna say at some scale Git stops being fast and GitHub doesn't have the tools needed for a monorepo. This is a 10 million lines of code monorepo. Not that big. I wonder how Microsoft manages to use Git for Windows. My git status takes 2 seconds to return!
Git can be used in monorepos. The problem is often that people don't put in the effort to configure it (or configure their aliases, or per-repo gitconfig) so that it's not slow; they just accept that slowness is inevitable. I think that your slow `git status` may be due to untracked file detection: try `git status --untracked-files=no`
I got some good pointers the last time I asked, but I'd love some more in depth information:
Does anyone have a good Bazel recipe for making a Rust + TypeScript + protos monorepo that atomically and hermetically builds only the changed portions of the DAG?
Are there any samples of this to refer to? (Any that run on Github workers?)
I'd massively appreciate it. I'm sure I'll get to this naturally someday, but I've got too many other things going on and it's not quite a "life or death" pain point just yet. But it sure would help to have now.
Conversely, if someone wants some contract (possibly to hire) work to help me get to this, my email is in my profile. :)
> At Stripe, my current project is to implement a speculative approach to testing and merging changes, this will allow us to scale merges to hundreds of commits per hour while keeping the main branch green.
I don't know how to interpret this sentence. What does the author mean when they say "implement a speculative approach"? Is this something that other monorepo orgs can also adopt? Or is it specific to Stripe?
Merge conflicts seem like an area with heavy potential for headaches that arises from the use of monorepo, and I wish the author would've shared deeper insights here.
> What does the author mean when they say "implement a speculative approach"?
My guess is that they want to speed up testing by being able to say "this PR only touches these files, and we know they are only tested by these tests, so we don't need to run all the other tests". And instead of needing to define these connections manually, it should be possible to use information from previous tests runs (although that information may be out of date depending on the changes being made, making it speculative). Anyways, just a guess
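Sketching that guess out (to be clear: an illustration of the guess above, not of whatever Stripe actually built): map changed files to the tests known from earlier runs to exercise them, and fall back to running everything for files with no recorded mapping. The map file name and the git invocation are assumptions for illustration.

```python
# select_tests.py - sketch: pick tests based on which files a change touches,
# using a file -> tests map recorded from previous runs (e.g. coverage data).
import json
import subprocess

def changed_files(base: str = "origin/main") -> list[str]:
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]

def select_tests(file_to_tests: dict[str, list[str]], all_tests: list[str]) -> set[str]:
    selected: set[str] = set()
    for path in changed_files():
        if path in file_to_tests:
            selected.update(file_to_tests[path])
        else:
            # Unknown file: be conservative and run everything.
            return set(all_tests)
    return selected

if __name__ == "__main__":
    with open("test_map.json") as fh:        # produced by earlier CI runs
        mapping = json.load(fh)
    every_test = sorted({t for tests in mapping.values() for t in tests})
    print(sorted(select_tests(mapping, every_test)))
```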
It's more likely that they will become independent and non-blocking.
But then, "monorepos done right" tend to be quite similar to "multirepos done right", just with the tooling organized differently. So they often share problems and failure modes too.
I worked in a monorepo for the first 10 years of my career. That repo was pretty much a custom Linux distribution, including userspace, various kernel versions, and various wireless driver versions. By the time I left, the repo was close to 20 years old.
Due to all the legacy baggage, this surely wasn't the best example.
But I don't miss one bit being called to ssh onto a machine to resolve a merge conflict urgently to unblock some unknown project's release. Nor the fear of blocking 1000+ devs if I f up somehow.
Not really informative, but it did bring up Bazel, which I went and looked at: https://bazel.build/docs
Wondering what the experiences of others are with this thing. It claims to be anti-"task-based" which strikes me as anti-flexible, although it does appear to have a custom programming language... eh... Suppose I wanted to add ssh/scp, how would I do that?
Bazel is a very powerful tool. Due to its design, builds are fully cacheable and you can bring your own toolchain - so it's good for having reproducible builds without any system dependencies - here is a complete example: https://github.com/drakery3d/fullbazel - it's flexible and powerful but also complicated, and it's probably difficult to convince your team/org to adopt it.
Bazel isn't a general-purpose automation tool or a deployment tool, so you wouldn't use Bazel for this.
Your deployment script would run Bazel to build your binary/container image/whatever. That is, building might be part of the deployment process, but deploying is not part of the build process.
I'm not sure it's so black and white. We have bazel targets that invoke the Serverless framework (for deploying to lambda), or invoke kubectl (rules_k8s), and we `bazel run` these targets in our GitHub workflows. Through dependency resolution, running the deploy targets will rebuild artifacts as necessary (a lambda archive, a docker image, rendered kubernetes yaml) as needed.
If I'm understanding you correctly that's using bazel to _build_ a deployment script/deployment tool (injecting the actual deployment artefacts as build dependencies) and then running the script?
I think that's compatible logically with what I said, but I agree that setup does blur the boundaries a bit.
1. build the app, as needed (java, python, go, or some mixture)
2. build and push docker images, as needed.
3. render kubernetes manifests, replacing image name placeholders with something that exactly matches what's been pushed.
4. apply the kubernetes manifests to the cluster.
Of course we still use an automation system (Github workflows), and it does a few more things such as actually installing bazel, setting up some credentials, etc. But yeah - the lines are blurred.
Well now I'm curious how the push-docker-image step is implemented...? :-)
`bazel run` just builds the specified target (a normal hermetic build) and then executes what it just built (not hermetic). But if you're actually pushing images to somewhere as part of the build itself (not pushing as part of whatever the app_prod.apply tool does) then that sounds like a non-hermetic/non-reproducible build step, which makes me think there's something happening that I don't understand. Sounds interesting.
`app_prod.apply` is just an executable target that can do the non-hermetic stuff. The hermetic building of rendered yaml files, docker images containing built app artifacts, etc. happens through normal Bazel actions.
You can perform bazel run to execute a build artefact. This executable could run a deployment, SSH somewhere, etc. The advantage of this is that Bazel will ensure the artefacts the executable needs are built before execution.
> Pipelines taking way too long. Build, test, and deploy cycles growing linearly with the size of the codebase.
Gitlab CI config files have a "changes" stanza to limit which pipelines run, so that when you make a change in one part it doesn't run the entire codebase through CI, and only releases that component. I'm sure other CI systems have similar controls, so by using that, this should be a non-issue?
I'm not opposed to monorepos, but a lot of folks use them as an excuse to build monoliths for cases where those are counter to goals of the business. Therefore I have some level of (hopefully healthy) skepticism about them.
> If your organization puts all its code, and all assets that go along with that code into a single repository; you’ve got yourself a monorepo.
I'm not sure I agree with this. I suppose in the most technical sense, sure, but it's not really true.
We have a single repo with a bunch of microservices in it. Builds/tests are localized to a single microservice though. The beauty of git is that two people can work on two parts of the repo pretty much independently. So while technically there is only one repo, I feel like calling it a monorepo would just confuse people.
The author draws a monorepo vs monolith distinction that articulates it well. I've only ever understood monorepo to be literally about how source is managed.
A Submit Queue such as the one described in e.g. Uber’s “Keeping master green at scale” absolutely does scale, as demonstrated by their paper. What I’m referring to is that naively serializing commits does not. I’ll improve the phrasing.
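For readers wondering what "speculative" can mean here: one common shape of the idea in submit-queue systems (not necessarily Stripe's exact design) is to optimistically test each queued change on top of the changes ahead of it, in parallel, and then merge the longest passing prefix. A toy sketch, with `run_ci` standing in for a real CI invocation:

```python
# submit_queue.py - toy sketch of speculative merging: test cumulative
# prefixes of the queue in parallel, merge the longest green prefix,
# and hand the rest back for another pass. run_ci() is a stand-in for real CI.
from concurrent.futures import ThreadPoolExecutor

def run_ci(changes: list[str]) -> bool:
    """Pretend to build and test main with all `changes` applied together."""
    return not any(c.endswith("!") for c in changes)  # '!' marks a broken change

def speculative_merge(queue: list[str]) -> tuple[list[str], list[str]]:
    # Prefixes: [c1], [c1, c2], [c1, c2, c3], ... tested concurrently.
    prefixes = [queue[: i + 1] for i in range(len(queue))]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(run_ci, prefixes))
    # Count how many cumulative batches passed, starting from the front.
    green = 0
    for ok in results:
        if not ok:
            break
        green += 1
    return queue[:green], queue[green:]   # (merged, requeued/rejected)

if __name__ == "__main__":
    merged, leftover = speculative_merge(["pr-1", "pr-2", "pr-3!", "pr-4"])
    print("merged:", merged)              # ['pr-1', 'pr-2']
    print("needs another look:", leftover)
```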