Keeping master green at scale (uber.com)
301 points by roshanj on Apr 18, 2019 | 115 comments



Adrian Colyer dug into this a little further on the morning paper:

https://blog.acolyer.org/2019/04/18/keeping-master-green-at-...

His analysis indicates that what Uber does as part of its build pipeline is to break up the monorepo into "targets" and for each target create something like a Merkle tree (which is basically what git uses to represent commits) and use that information to detect potential conflicts (for multiple commits that would change the same target).

What it sounds like to me is that they end up simulating multirepo to enable tests to run on a batch of most likely independent commits in their build system. For multirepo users this is explicit in that this comes for free :-)

Which is super interesting to me, as it seems to indicate that optimizing CI/CD systems requires dealing with all the same issues whether it's mono- or multi-repo, and problems solved by your layout result in a different set of problems that need to be resolved in your build system.
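
A rough sketch of the per-target hashing idea, to make it concrete. Everything here is an assumption for illustration: the toy `TARGETS` graph, the file names, and the helper names `target_hash`, `changed_targets` and `may_conflict` are made up and are not Uber's actual implementation. Each target's hash covers its own sources plus the hashes of its dependencies, and two commits are flagged as potentially conflicting when they change the hash of a common target.

```python
import hashlib

# Hypothetical build graph: target -> (source files, dependency targets).
TARGETS = {
    "lib": (["lib/util.py"], []),
    "rider_app": (["rider/main.py"], ["lib"]),
    "driver_app": (["driver/main.py"], ["lib"]),
}

def target_hash(name, files, cache=None):
    """Hash a target's own sources plus the hashes of its dependencies."""
    cache = {} if cache is None else cache
    if name not in cache:
        sources, deps = TARGETS[name]
        h = hashlib.sha256()
        for path in sorted(sources):
            h.update(path.encode())
            h.update(files[path].encode())
        for dep in sorted(deps):
            h.update(target_hash(dep, files, cache).encode())
        cache[name] = h.hexdigest()
    return cache[name]

def changed_targets(base, commit):
    """Targets whose hash differs once `commit` (path -> new content) is applied."""
    after = {**base, **commit}
    return {t for t in TARGETS if target_hash(t, base) != target_hash(t, after)}

def may_conflict(base, commit_a, commit_b):
    """Two commits potentially conflict if they affect a common target."""
    return bool(changed_targets(base, commit_a) & changed_targets(base, commit_b))

BASE = {"lib/util.py": "v1", "rider/main.py": "v1", "driver/main.py": "v1"}
bob = {"rider/main.py": "v2"}      # touches only the rider app
sally = {"driver/main.py": "v2"}   # touches only the driver app
carol = {"lib/util.py": "v2"}      # touches the shared library

print(may_conflict(BASE, bob, sally))  # False
print(may_conflict(BASE, bob, carol))  # True
```

Bob and Sally can be built and merged independently, while Carol's library change ripples into both apps and has to be ordered against them.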


> For multirepo users this is explicit in that this comes for free :-)

Only if you spend the time to build tools to detect commits in your dependencies, as well as your dependent repositories, and figure out how to update and check them out on the appropriate builds.

So, no, it doesn't come for free.


sorry, "this" is rather ambiguous.

You are totally correct that to achieve the same performance, correctness, and overall master level "green"ness in a multirepo system you would have to either define or detect dependencies and dependent repos, build the entire affected chain, and test the result. That part is much easier in monorepo.

What I was referring to with "this" is Uber's method of detecting potential conflicts. In multirepo land it would be a "conflict" if two people commit to the same repo. In multirepo, therefore, detecting potential conflict is trivial.

If Bob commits to repo A and Sally commits to repo B, their commits can't result in a merge conflict. Well, unless the repos are circularly dependent - which would be bad :-) don't do that. Of course, monorepo makes that situation impossible so there's an advantage for monorepo.

It seems like whether you have mono- or multi- the problems solved by one choice will leave other problems the build system has to solve that it wouldn't have to solve if the other option were chosen.

Different work would be required in multirepo but it would be work to solve the problems that monorepo solves just by virtue of it being a monorepo.


> You are totally correct that to achieve the same performance, correctness, and overall master level "green"ness in a multirepo system you would have to either define or detect dependencies and dependent repos, build the entire affected chain, and test the result. That part is much easier in monorepo.

You also would need to do that as an atomic operation (in the mono-repo + especially with a commit queue you're building on the atomicity of git).

Having to unwind that transaction if you aren't atomic can get you into a big mess at larger scales.

Here's a good related talk: C++ as a "Live at Head" language: https://www.youtube.com/watch?v=tISy7EJQPzI by Titus Winters (from Google).


thanks for the link - looking forward to watching.

Currently, I'm not convinced that you need to track and apply commits across dependencies and dependent systems atomically/transactionally to have a sane build environment even at scale, but you definitely get that part free with monorepo.

Any links to docs or presentations that address that specific issue would be very welcome :-)


> If Bob commits to repo A and Sally commits to repo B, their commits can't result in a merge conflict.

This holds true when A and B are leaf repos, but gets tricky with repos inside a dependency graph. More concretely, if C depends on both A and B, and it turns out that C depends on A and B in such a way that A_bob and B_sally are mutually incompatible, you need some kind of mechanism for reconciling that.

Of course, exactly as you point out, mono and multi are two tradeoffs for the problems that large codebases intrinsically are.


Package managers solve it quite well. Just depend on the latest version of your dependencies and tag a new version whenever they change.


This doesn’t work when an underlying system changes, and upgrading is mandatory for all clients or package dependants (happens often at scale for a multitude of reasons).


That's not good stewardship. You have a better API? Great, convince us it's worth investing in soon, you can even deprecate the known-good version.

There's always a window where both will be in use, because we can't synchronously replace every running process everywhere (not that it's even a good idea without a canary). The shorter you try to make that window, the more needless pain is created and plans disrupted. While we could use prod to beta test every single version of everything, that shouldn't be our priority.


That's not reality for most large companies, even though it's the right mindset for most software libraries. Ex: A new legal requirement comes in, resulting in a mandate that fields X and Y for certain queries, that are being done all over the codebase, now have to be tokenized. This is a breaking and mandatory change, with no room to allow systems to stay behind, and expensive consequences.

In this case you'll have a very short transition, between all consumers updating their client code (possibly in a backwards compatible way) and the change in the implicated system being deployed, not the other way around.


If adopting the new API is mandatory, every team should be told why it's mandatory, and we'll reprioritize and get it done. Doing it to our code behind our back is passive-aggressive and likely to break stuff, because who is monitoring and reporting on a change in our system's behavior that we didn't even see?


This is essentially the problem Go has had, precisely because you could only ever pull the latest from master.

Introduce a breaking change into a common library and now you have to update every other dependency to support it.

Not so bad in a monorepo. But when your codebases are distributed?


It's almost like semantic versioning exists for some reason


Or realize that it is an advantage to control your dependencies.


Now you have to desperately try to get people to upgrade every time an important change goes through. And you quickly live in a world where you need to maintain tons of versions of all your services.


Not exactly for free, but there are free tools that handle this job for you very nicely:

https://zuul-ci.org/


You are right to say that the conflict analyzer tries to treat commits independently based on the service or app (which are usually in separate repositories in a multi-repo world). However, note that the problem of conflicting changes (or a red master) exists even when you are in a multi-repo world, as you could have one repository getting a large number of commits.

In fact, at Uber we have seen that behaviour with one of our popular apps when we did not have a monorepo. The construct of probabilistic speculation explained in the paper applies even in this scenario to guarantee a green master.


Do you mean the construct of probabilistic speculation applies in multirepo because you may end up with a hot spot repo that receives a high volume of commits at once?

Or do you mean that multirepo could also benefit from the construct of probabilistic speculation by ordering commits across multiple repos such that you are maximizing the number of repos that have changed before you build and minimising the number of commits applied to single repos?

Or both :-)


the former actually.

If you have 1 app that's the bread and butter of your company, and 60% of your 2000+ engineers work on various features of that one app, then even in a multi-repo world you are going to have that 1 repo receiving a ton of commits, and the problem of keeping it green remains. Prob. speculation helps there.


Funny that you would draw a comparison to a Merkle tree. At one client they had such coupling between systems that CI/CD was nearly impossible without either an explosion of test environments or grinding everything to a near halt.

We began working with the idea of consensus-based CI/CD. If you pushed a change, you published that to the network. It gave other systems the opportunity to run their full suite of tests against the deployment of your code. Some number of confirmations from dependent systems was required to consider your code "stable". This progressed nearly sequentially, assembling something like a blockchain.

Ultimately the client was unable to pull this off for the same reason they were unable to decouple the systems: lack of software engineering capability.


"Based on all possible outcomes of pending changes, SubmitQueue constructs, and continuously updates a speculation graph that uses a probabilistic model, powered by logistic regression. The speculation graph allows SubmitQueue to select builds that are most likely to succeed, and speculatively execute them in parallel"

This is either brilliant or just something built for a promotion packet


Sounds so much simpler outside the context of a 'research' paper:

>When an engineer attempts to land their commit, it gets enqueued on the Submit Queue. This system takes one commit at a time, rebases it against master, builds the code and runs the unit tests. If nothing breaks, it then gets merged into master. With Submit Queue in place, our master success rate jumped to 99%.

https://eng.uber.com/ios-monorepo/


I can guarantee you that the system that's described in the paper is what we use in production. The blog post that you are pointing to was meant to describe the usage of monorepo at Uber and the challenges we faced at a high level. It didn't dive deep into the submission system and we have the paper to address that :-).

(I'm one of the authors as well as the tech-lead of the system.)


The paper is fantastic and the system sounds brilliant. Thanks for writing it and sharing your experiences. Don't let HN's characteristic middlebrow dismissals get you down.


That's because the research-y part is "how do we pick which commit to enqueue next", and that's harder to answer succinctly.


You can get most of the benefit on smaller scales by building feature branches and ensuring they pass unit tests, deployment and integration testing before they're allowed to be merged to mainline.

It still depends on well written tests, lest your confidence be dashed when a human starts pushing buttons and pulling levers.

Also, don't break up tightly coupled code/modules into separate repos for the sake of microservices. Hard working developers will have to do two or more builds, PRs, possibly update semvers, etc... Find the right seams. If two repos tend to always change in lockstep, think about merging.


This is what we did for a while at a project; there are options in most git hosts nowadays that force any PR to be up to date with master before merging on the one hand, and to have a green pipeline on the other. That works fine for smaller projects, but because it's not automated you end up with quite a bit of manual labor (rebase on master, push, wait for CI, discover someone else merged into master first, repeat).


It isn't just that the tests need to be good and humans break things sometimes. At a certain scale, the following happens enough to be a problem:

- changeset A is submitted, an integration branch is cut from latest master, and CI begins

- changeset B is submitted, an integration branch is cut from latest master, and CI begins

- changeset A's integration branch passes CI build/test, so A is merged into master

- changeset B's integration branch passes CI build/test, so B is merged into master

- however, changeset A + B interact in such a way that causes build and/or tests to fail

- build is now broken

You're probably thinking "that sounds like it wouldn't happen very often. Both changes would need to be submitted within some window such that changeset B's integration branch does not include changeset A, and vice-versa". Which is correct, but that's where the scale comes in. With enough engineers this starts happening more, and the more engineers you have the more unacceptable it is to have the build broken for any amount of time. And the more engineers the more code you have so the longer any individual build starts taking which lengthens the window during which the two conflicting changes could be submitted.

You need to do it in a way that serializes the changes because that's the only way to prevent this, but that takes too long. So the paper is about how to solve this problem.
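
The sequence above can be reproduced with a toy model. This is only a sketch under made-up assumptions: `ci_passes` is a hypothetical stand-in for a real build, and the two "changes" are just labels (A renames a helper, B adds a caller of the old name). Each change is green on its own branch, but master is red once both land; testing them serially catches the second one.

```python
def ci_passes(code):
    """Toy CI: green unless the rename and the old-name caller are both present."""
    return not ({"rename_helper", "call_old_helper"} <= code)

master = set()
change_a = {"rename_helper"}      # A renames a function
change_b = {"call_old_helper"}    # B adds a caller that uses the old name

# Both integration branches are cut from the same master and tested in isolation.
a_green = ci_passes(master | change_a)
b_green = ci_passes(master | change_b)

# Each passed on its own, so both get merged.
if a_green:
    master |= change_a
if b_green:
    master |= change_b

master_green = ci_passes(master)
print(a_green, b_green, master_green)  # True True False

# Serialized: each change is tested on top of everything merged before it.
serial_master = set()
rejected = []
for change in (change_a, change_b):
    if ci_passes(serial_master | change):
        serial_master |= change
    else:
        rejected.append(change)

print(ci_passes(serial_master), rejected)  # True [{'call_old_helper'}]
```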


This is why you rebase and test before each feature branch is to be merged to master. The only issue comes up when someone decides to merge while someone else has already rebased and is running their tests... but when they try to merge to master they will see their branch is out of date and that they need to rebase again. For small teams, it's easy enough to let everyone know that you're merging and not to merge anything else in the meantime. In a larger company, I've seen queue tools that give teams a 'ticket' for their turn to merge into master. It's a little clunky, and probably wouldn't scale to huge engineering teams... but sometimes low tech solutions work just as well.


submit queue makes sense and is used by lots of people, it's the "machine learning" which is applied to choosing commits to enqueue which I found to be interesting. if the master success rate was already 99% in 2017, with just submit queue, why build the complex ML stuff?


As far as I understand, what you're describing is a simple submit queue that builds only one commit at a time.

What they are describing here is to detect if items do not conflict beyond a simple merge conflict and build & commit them simultaneously, increasing the throughput of the submit queue system.


With a 99% success rate and 1000 commits per day (which wouldn't be that many), that's 10 master breaks per day.
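
Spelling out the arithmetic, assuming the 99% master success rate quoted from the blog post upthread and treating every commit as an independent attempt (`expected_breaks` is just an illustrative helper name):

```python
def expected_breaks(commits_per_day, success_rate):
    """Expected number of commits per day that leave master broken."""
    return commits_per_day * (1 - success_rate)

print(round(expected_breaks(1000, 0.99)))  # 10
```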


And at least in my limited experience, the impact of master being broken is pretty big, and even bigger when you have multiple teams. Either you block master - leading to a lot less than those 1000 commits making it to master on that day - or continue merging in stuff, which causes the root cause of the master branch to become fuzzy - and if your reporting is not in order, that is, if the person who broke the build isn't told they did, it'll be a lot of "Who broke master?" and people looking at each other to find out who is going to look into it. That's not scalable.


That's why master shouldn't be whatever you're about to deploy for the first time, but the known-good version that has been burned-in on prod (for an hour or a week or whatever), so if you have to abandon a release you don't have an entire team who already rebased on top of it.


The way we achieved the master success rate of 100% at scale was by using the techniques that we describe in the paper. The blog doesn't go into details on Submit Queue and how it works.

Just to clarify, the ML models are used to predict the prob. that a given change will succeed against master as well as the prob. of conflict between changes.
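
To make the role of those probabilities concrete, here is a heavily simplified sketch. It is not the paper's algorithm: the success probabilities are hard-coded placeholders standing in for model output, conflict probabilities are ignored, outcomes are assumed independent, and `speculative_builds` / `pick_builds` are hypothetical names. Each possible build tests one change on top of a guess about which earlier changes succeed, and with limited workers you run the builds whose results are most likely to be needed.

```python
from itertools import product

def speculative_builds(p_success):
    """List (needed_probability, change_index, assumed_predecessors) tuples.

    A build tests change i on top of the earlier changes assumed to have
    succeeded; its result is needed only if exactly those predecessors succeed.
    """
    builds = []
    for i in range(len(p_success)):
        for outcome in product([True, False], repeat=i):
            prob = 1.0
            for j, ok in enumerate(outcome):
                prob *= p_success[j] if ok else 1 - p_success[j]
            assumed = tuple(j for j, ok in enumerate(outcome) if ok)
            builds.append((prob, i, assumed))
    return builds

def pick_builds(p_success, workers):
    """Choose the `workers` builds whose results are most likely to be used."""
    ranked = sorted(speculative_builds(p_success), key=lambda b: (-b[0], b[1]))
    return ranked[:workers]

# Placeholder probabilities for three queued changes.
for prob, change, assumed in pick_builds([0.9, 0.8, 0.95], workers=3):
    print(f"build change {change} on top of {assumed}: needed with p={prob:.2f}")
```

With three changes there are seven possible builds; with three workers this picks the "everything ahead of me succeeds" chain, and a low-probability change near the head of the queue would shift the choice toward builds that assume it fails.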


The Submit Queue is the name of the complex ML stuff.


Promotion-oriented design, no doubt.


No, it is not. This whole thing is being used in production and really reduces the time between when changes are submitted and when they are merged into master. And master is almost always green, meaning developers can build and test any piece of code without problems.

They have designed this as a result of a need, not just a fancy project.


I can guarantee you that none of the ideas in the paper were born out of a desire to get promoted. They were invented because ML models helped figure out which set of builds we need to run more accurately at scale.


I still hope you got promoted though, you deserve it :)


We're building some similar tech at GitLab, though without the dependency analysis yet.

Merge Requests now combine the source and target branches before building, as an optimization: https://docs.gitlab.com/ee/ci/merge_request_pipelines/#combi...

Next step is to add queueing (https://gitlab.com/gitlab-org/gitlab-ee/issues/9186), then we're going to optimistically (and in parallel) run the subsequent pipelines in the queue: https://gitlab.com/gitlab-org/gitlab-ee/issues/11222. At this point it may make sense to look at dependency analysis and more intelligent ordering, though we're seeing nice improvements based on tests so far, and there's something to be said for simplicity if it works.


There's a nice middle ground between this and a one-at-a-time submit queue: have a speculative batch running on the side. This gives nice speedups (approaching N times more commits, where N is the batch size) with minimal complexity.

One useful metric is the ratio between test time and the number of commits per day. If your tests run in a minute, you can test submissions one at a time and still have a thousand successful commits each day. If your tests take an hour, you can have at most 24 changes per day under a one-at-a-time scheme.
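
The back-of-the-envelope version, as a sketch. It assumes tests run back to back around the clock and, for the batched case, that every batch passes, so these are upper bounds; `max_merges_per_day` is just an illustrative name.

```python
def max_merges_per_day(test_minutes, batch_size=1):
    """Upper bound on merges per day when test runs happen one after another."""
    runs_per_day = (24 * 60) // test_minutes
    return runs_per_day * batch_size

print(max_merges_per_day(1))                 # 1440
print(max_merges_per_day(60))                # 24
print(max_merges_per_day(60, batch_size=5))  # 120
```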

I worked on Kubernetes, where test runs can take more than an hour-- spinning up VMs to test things is expensive! The submit queue tests both the top of the queue and a batch of a few (up to 5) changes that can be merged without a git merge conflict. If either one passes, the changes are merged. Batch tests aren't cancelled if the top of the queue passes, so sometimes you'll merge both the top of the queue AND the batch, since they're compatible.

Here's some recent batches: https://prow.k8s.io/?repo=kubernetes%2Fkubernetes&type=batch

And the code to pick batches: https://github.com/kubernetes/test-infra/blob/0d66b18ea7e8d3...

Merges to the main repo peak at about 45 per day, largely depending on the volume of changes. The important thing is that the queue size remains small: http://velodrome.k8s.io/dashboard/db/monitoring?orgId=1&pane...


The paper mentions Zuul as a previous work, but notes that batching has downsides:

> Optimistic execution of changes is another technique being used by production systems (e.g., Zuul [12]). Similar to optimistic concurrency control mechanisms in transactional systems, this approach assumes that every pending change in the system can succeed. Therefore, a pending change starts performing its build steps assuming that all the pending changes that were submitted before it will succeed. If a change fails, then the builds that speculated on the success of the failed change needs to be aborted, and start again with new optimistic speculation. Similar to the previous solutions, this approach does not scale and results in high turnaround time since failure of a change can abort many optimistically executing builds. Moreover, abort rate increases as the probability of conflicting changes increase (Figure 1).
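
A toy simulation of the abort behaviour that passage describes. This is a sketch, not Zuul's implementation: it assumes each change's outcome is fixed regardless of which other changes it is built with, and `optimistic_gate` is a made-up name. Every pending change starts a build assuming everything ahead of it succeeds; when one fails, everything behind it is aborted and restarted.

```python
def optimistic_gate(outcomes):
    """Simulate optimistic gating over a queue of changes.

    `outcomes[i]` says whether change i passes. Returns a tuple of
    (merged change indices, total builds started, builds aborted).
    """
    pending = list(range(len(outcomes)))
    merged, started, aborted = [], 0, 0
    while pending:
        started += len(pending)  # one speculative build per pending change
        first_failure = next(
            (k for k, change in enumerate(pending) if not outcomes[change]), None
        )
        if first_failure is None:
            merged.extend(pending)
            break
        merged.extend(pending[:first_failure])
        behind = pending[first_failure + 1:]
        aborted += len(behind)   # these builds speculated on the failed change
        pending = behind
    return merged, started, aborted

print(optimistic_gate([True, True, True, True]))   # ([0, 1, 2, 3], 4, 0)
print(optimistic_gate([True, False, True, True]))  # ([0, 2, 3], 6, 2)
print(optimistic_gate([False, True, True, True]))  # ([1, 2, 3], 7, 3)
```

The earlier in the queue a failure sits, the more speculative work gets thrown away, which is the scaling concern the paper raises.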


The same thing was (is?) done in OpenStack with Zuul, I believe. When you're going to merge something, your branch goes on top of things already going through the CI.


We talked to the Zuul team, they use more parallelism but it's similar: https://zuul-ci.org/docs/zuul/user/gating.html

Most of the complexity and suffering of a submit queue evolves from the interactions between your VCS and CI systems. Keeping things simple is great! Kubernetes' CI system is Prow, which runs the tests as pods in a Kubernetes cluster. Dogfooding like this is great, since the team you're providing CI for can also help fix bugs that arise.


Yes, I recently switched my org to using Zuul for this purpose; by having an internal speculative queue of future states for master you can have multiple pending changes tested at once, while also ensuring that the tested code is exactly what goes into master. So far it's been a really good experience, in particular as our tests take a long time.

It sounds like Uber's thing has a lot more smarts regarding deciding what gets tested. For the scale I work at (<200k lines of code) that isn't necessary.


I am still trying to wrap my head around a giant monolithic repo model instead of breaking code into multiple repos.

At Amazon, for example, they have multi repos setup. A single repo represents one package which has a major version. Amazon's build system builds packages and pulls dependencies from the artifact repository when needed. The build system is responsible for "what" to build vs "how" to build, which is left to the package setup (e.g. maven/ant).

I am currently trying to find a similar setup. I have looked at Nix, Bazel, Buck and Pants. Nix seems to offer something close. I am still trying to figure out how to vendor npm packages and which artifact store is appropriate. And also whether it is possible to have the Nix builder pull artifacts from a remote store.

Any pointer from the HN community is appreciated.

Here is what I would like to achieve:

1. Vendor all dependencies (npm packages, pip packages, etc) with ease.

2. Be able to pull artifacts from a remote store (e.g. artifactory).

3. Be able to override a package locally for my build purposes. For example, if I am working on a package A which depends on B, I should be able to build A from source and, if needed, build B, which A can later use for its own build.

4. Support multiple languages (TypeScript, JavaScript, Java, C, Rust, and Go).

5. Have each package own its repository.


> At Amazon, for example, they have multi repos setup.

And didn't you find that this created massive headaches trying to build many disparate and inconsistent dependencies across repos? I think the benefits touted from mono-repos are exactly illustrated by the pain points working with Amazon's multi repo setup, in my opinion.

https://danluu.com/monorepo/

"Refactoring an API that's used across tens of active internal projects will probably take a good chunk of a day."

This was my experience.


How often have you interacted with “hot” packages that both change rapidly and are high dependency? Haven’t worked at amazon but in my experience that’s been low occurrence or a reason to build evolving api / not breaking the api.

I’m just curious, but in fairness both of these schemes have obvious issues that will become headaches or positive design depending on your outlook. Clearly you can engineer effectively in either scheme.


Monorepos are really nice if you want to enforce consistent and sane engineering practices and not waste time managing all the repos individually by teams.

Bazel has target caching including remote caching which can be shared across multiple engineers/execution environments. The tricky part would be ensuring your builds are hermetic and reproducible (which is also easier to achieve in monorepo setup).


Quite a premise: "Giant monolithic source-code repositories are one of the fundamental pillars of the back end infrastructure in large and fast-paced software companies."


facebook, google, airbnb, quora, many more all use monorepo

obviously there are many others who do not use monorepo (amazon comes to mind) but it's reasonable to claim that they are actually widely used and fundamental when used


Microsoft uses it for Windows as well, which was so large they wrote their own git filesystem to power it.


Does anybody know what these companies' development environments look like? I know about Piper at Google but how do the rest manage? Does every single engineer have the entire monorepo on their machine?


At facebook, a virtual filesystem (https://github.com/facebookexperimental/eden) + change monitor (https://facebook.github.io/watchman/) = source control operations run in O(files I have modified) instead of O(size of repository) time


Very interesting, thanks!


Most places I know of use Git Virtual File System or equivalents.


It is my understanding that VFSForGit only works on Windows.


The github repo has instructions for running it on Mac and says that the stable Mac version is under active development.


Airbnb uses a monorepo for JVM-based projects, but most of Airbnb's code, at least as of mid-2017, was not run on the JVM and was hosted multi-repo.


What are Airbnb's unique scaling issues other than just being a web app with tons of usage?

Uber has navigation, route optimization, queuing, etc. Facebook has to propagate activity out to massive and complex social network graphs.

I'm not discounting the toughness of operating at Airbnb's scale, but from my limited understanding it seems like they are not solving a new problem.


Map search, automatic dynamic pricing, predicting busy periods, predicting user & provider preferences, etc.


i was a data scientist so i knew of monorail but never touched that side of things :)


Notably, Netflix doesn't (or at least didn't):

https://medium.com/netflix-techblog/towards-true-continuous-...


til! Most of my experience with large, single-repository projects is with plain monoliths. The design goal we strive for tends to be the microservice architecture, assuming that isolation of responsibility leads to more maintainability, better decision making, etc. I can see how, with a well disciplined team, the monorepo could have the best of both worlds.


Many companies do use a monorepo. Many other companies do not. There are trade offs.


What all those companies have in common is a huge budget they can invest in their build systems to overcome the shortcomings of monorepos.

They do have some benefits, but they also come with an immense cost


But that's what the OP's quote is saying. "large" companies use them.


all of those companies were once small too


I like it at Google!


Google has the team + tooling to properly support it. The same cannot be said for many other orgs.


Google has more people working on the problem than many other companies have employees.


I don’t doubt it. They also do more traffic through their VCS than most companies do through their main product.


Do they? They didn't used to. In 2015 we were routinely dead in the water, unable to test and deploy anything from our google3 projects because some random submitted a CL for a project we didn't even care about. Teams would appoint "build cops" whose job is to complain as quickly as possible because that's all we could do about it.

Every problem you could have with bad dependencies is entirely self-inflicted. The Right Thing™ is to choose a known-good version, and update when you have the bandwidth to pay down the tech debt.


Many teams


Yes, my first thought "I wish the systems I worked on in big corps were big monolithic giants"


Which huge successful software companies don't use a monorepo?


Amazon.


I think these count as huge (although maybe not when next to Google), but Spotify and Netflix.


I'm willing to agree with that premise.


Anyone fancy comparing this to bors?


Actually we have compared it in our paper.

Bors builds one change at a time. On the other hand, Submit Queue speculatively builds several changes at a time based on the outcomes of other pending changes in the system. Apart from that, Submit Queue uses a conflict analyzer to find independent changes in order to commit changes in parallel as well as trim the speculation graph.

We have also evaluated the performance of Single-Queue (idea of Bors) on our workloads. In fact, as described in the paper, the performance of this technique at scale was so poor (~132x slower) that we omitted its results. Submit Queue, on the other hand, operates in the 1-3x region compared to an optimal solution.

I recommend reading the paper for further details: https://dl.acm.org/citation.cfm?id=3303970


> Bors builds one change at a time.

Bors builds multiple changes at once (it creates a merge commit of all available changes and then runs the tests on all of them), and merges if all of them are good.

Possibly you are thinking of the older bors, as opposed to modern bors-ng?


The main difference is in the conflict-detection system. Whereas bors only has a single queue, this new system can have one queue for each set of changes which doesn't interact with any other set. Eg. if you've got an ios app, a webapp, and a bunch of documentation all in the same repo, then this system will automatically work out that changes to each of those independent projects can be tested and merged in parallel, because they can't possibly conflict.

It relies on understanding the inputs and outputs for all CI build steps to work out how changes to particular files might conflict.

Also, it has a much more sophisticated understanding of how likely a change is to be the source of failure, which it updates in response to repeated test runs. It can then prioritise the changes which are most likely to succeed.
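
The "one queue per independent set of changes" part can be sketched as a grouping problem. This is an illustration only: it assumes you already know which build targets each change affects, and `independent_queues` plus the change and target names are hypothetical. Changes that share a target, directly or through a chain of other changes, land in the same queue; everything else can be tested and merged in parallel.

```python
def independent_queues(changes):
    """Group changes so that changes affecting a common target share a queue.

    `changes` maps a change id to the set of targets it affects, in submission
    order. Returns a list of queues, each a list of change ids.
    """
    order = {change_id: n for n, change_id in enumerate(changes)}
    queues = []  # (targets covered by the queue, change ids in the queue)
    for change_id, targets in changes.items():
        merged_targets, merged_ids = set(targets), [change_id]
        remaining = []
        for q_targets, q_ids in queues:
            if q_targets & merged_targets:
                # Overlap: fold the existing queue into the new one.
                merged_targets |= q_targets
                merged_ids.extend(q_ids)
            else:
                remaining.append((q_targets, q_ids))
        queues = remaining + [(merged_targets, merged_ids)]
    result = [sorted(ids, key=order.get) for _, ids in queues]
    result.sort(key=lambda ids: order[ids[0]])
    return result

pending = {
    "c1": {"ios_app"},
    "c2": {"webapp"},
    "c3": {"docs"},
    "c4": {"ios_app"},
}
print(independent_queues(pending))  # [['c1', 'c4'], ['c2'], ['c3']]
```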


Is the logic of which files trigger which queue determined automatically or manually?


I think that what works for companies like Uber/Google/Facebook is not applicable to the rest of the Fortune 500 or to all the other companies.

disclaimer: I am one of Datree.io founders. We provide a visibility and governance solution to R&D organizations on top of GitHub.

Here are some rules and enforcement around Security and Compliance which most of our companies use for multi-repo GitHub orgs.

1. Prevent users from adding outside collaborators to GitHub repos.

2. Enforce branch protection on all current repos and future created ones - prevent master branch deletion and force push.

3. Enforce pull request flow on default branch for all repos (including future created) - prevent direct commits to master without pull-request and checks.

4. Enforce Jira ticket integration - mention ticket number in pull request name / commit message.

5. Enforce proper Git user configuration.

6. Detect and prevent merging of secrets.


Having been at both Lyft and DoorDash where I've been an engineer responsible for unit test health, I decided to do a side project called Flaptastic (https://www.flaptastic.com/), a flaky unit test resolution system.

Flaptastic will make your CI/CD pipelines reliable by identifying which tests fail due to flaps (aka flakes) and then give you a "Disable" button to instantly skip any test which is immediately effective across all feature branches, pull requests, and deploy pipelines.

An on-premise version is in the works to allow you to run it onsite for the enterprise.


I don't want to come across as negative, but just an observation and to play devil's advocate - wouldn't it be better to fix the flaky test or delete it entirely instead of build a feature to disable it during a test run in an automated fashion?

Whenever our team has a significant number of flakey tests (more than 1-2) we usually schedule a bug squash session to fix them and amortize the cost over the whole team.


What you really want to do is first disable a test you know is unhealthy to unblock everybody. Then, you fix it. After you've reintroduced it healthy, you can turn it back on.


I was talking to someone from Google who works on Bazel things, and he brought up an interesting point: flaky tests are asymmetric in that they don't provide much value when they fail (since you don't know if the failure was due to flakiness), but they do provide a lot of value when they pass (because they presumably test something non-trivial).

With this in mind, what Bazel does when a test is marked flaky is run it several times. This is a simple way of minimizing the effect of flakiness while still getting confidence from green tests.
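
A generic sketch of the rerun idea, not Bazel's code: `run_flaky` and `false_failure_rate` are made-up helpers, and three attempts is just a value picked for illustration. A test marked flaky counts as passing if any attempt passes; if flakes were independent with failure probability p, a healthy test would be reported red with probability p to the power of the number of attempts.

```python
def run_flaky(test, attempts=3):
    """Run `test` up to `attempts` times; report a pass if any attempt passes.

    Returns (passed, attempts_used).
    """
    for attempt in range(1, attempts + 1):
        try:
            test()
        except AssertionError:
            continue
        return True, attempt
    return False, attempts

def false_failure_rate(p_flake, attempts):
    """Chance that every attempt flakes, assuming independent flakes."""
    return p_flake ** attempts

calls = {"n": 0}

def sometimes_fails():
    """Simulated flaky test: fails on the first two calls, then passes."""
    calls["n"] += 1
    assert calls["n"] >= 3, "simulated flake"

print(run_flaky(sometimes_fails))  # (True, 3)
```

The same wrapper also shows the downside raised elsewhere in the thread: a genuine intermittent bug that passes one run in three would be reported green.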


I dislike rerunning flaky tests. It too often masks genuine failures.


If the effort required to mark-disable/comment out/rm a known-unhealthy test is more than a few seconds beyond the efforts to navigate through a tool like the one you describe, I think the problem is likely in the change control/source control processes being employed. That seems like it should be so easy as to not need an additional tool (unless tests are flaking out so often that even the <1min of overhead to disable them is adding up, in which case I suspect that people are misinterpreting something fundamental about the role of tests in their development processes).


What I've seen in most companies is that when a test goes bad (imagine 10k unit tests, and 1 hits Stripe's API sandbox, which just went down) the bad test affects everybody who's busy working on their respective feature branches. Everybody wonders how their feature branch broke the Stripe integration and you have hundreds of developers trying to diagnose and fix the same broken test.

Our solution lets someone know that the test failed because it's flaking out as soon as it flakes, and provides a 1-click option to instantly disable that test across all feature branches so that everybody else can continue working undisturbed.

Without something like this, you have to: 1) Create a new feature branch 2) Comment out the broken test 3) Wait for it to pass CI 4) Gain approvals as needed 5) Merge the PR back to the master line 6) Message everybody to let them know the test was removed and they should rebase

The process above is sort of the industry standard and this means a giant loss in productivity for everybody on your team and is especially painful for monolith codebases.

Companies where I've worked easily hemorrhage $1m per year on this problem in terms of developer productivity losses if you consider the number of hours wasted per year.


Your first example is an integration test, not a unit test, which should be changed.

Integration tests are nice, but best if run separately...


Best practice is actually just to disable all tests that are failing. Can't hold up our sprint deadlines!


Failing != flaking. If your tests interact with any level of randomness (seed data, time-based constraints, etc.) you're going to find the occasional test that fails and then passes on the rebuild.

If something is consistently failing I would assume this tool does not disable it.
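One way to draw that line, sketched in Python. This is a guess at the general approach, not how any particular tool works; `classify` and the `runs` parameter are hypothetical. A test that fails on every run is "failing", one with mixed outcomes is "flaky".

```python
import itertools
from typing import Callable


def classify(test: Callable[[], None], runs: int = 5) -> str:
    """Run the test several times and label it by the mix of outcomes."""
    outcomes = []
    for _ in range(runs):
        try:
            test()
            outcomes.append(True)
        except AssertionError:
            outcomes.append(False)
    if all(outcomes):
        return "passing"
    if not any(outcomes):
        return "failing"
    return "flaky"


# Stand-in for a test with a nondeterministic dependency: every third call fails.
counter = itertools.count(1)


def intermittent() -> None:
    assert next(counter) % 3 != 0, "simulated intermittent failure"


def always_fails() -> None:
    assert False, "consistent failure"


intermittent_label = classify(intermittent)
broken_label = classify(always_fails)
```

Only the "flaky" label would be a candidate for temporary disabling; a consistent failure should still block.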


A possible complication would occur if there are tests that occasionally fail.


What exactly is "monolithic"? Is it only related to the codebase (monolithic vs. monorepo)? Or is it about the runtime, like microservices vs. monolithic?


From the first sentence of the abstract:

> monolithic source-code repositories

A monorepo is a monolithic repository


To answer the parent, it doesn’t imply a monolith application, but deployment to multiple server roles and apps will happen using the same source repository.


How common is this in the industry? Do multirepos run on a batch?


Is this novel? Other companies have had this for ages.


No, they haven't. This is a system to queue commits, not a simple CI setup. This problem only comes up when you start having contention due to commit volume in a monorepo (think thousands of commits/day). This is only the 3rd one I've heard about.

> This paper introduces a change management system called SubmitQueue that is responsible for continuous integration of changes into the mainline at scale while always keeping the mainline green. Based on all possible outcomes of pending changes, SubmitQueue constructs, and continuously updates a speculation graph that uses a probabilistic model, powered by logistic regression. The speculation graph allows SubmitQueue to select builds that are most likely to succeed, and speculatively execute them in parallel. Our system also uses a scalable conflict analyzer that constructs a conflict graph among pending changes. The conflict graph is then used to (1) trim the speculation space to further improve the likelihood of using remaining speculations, and (2) determine independent changes that can commit in parallel


The problem also occurs if your CI Build + Test steps take a while to run, even on a small team pushing dozens of commits per day.

Two code-conflict-free changes may pass a pre-merge build+test cycle independently but may logically break one another if both changes are merged into master. Using a submit/merge queue guarantees that each change has passed tests with the exact ordering of commits it would be merged onto. The example described here is a better explanation: https://github.com/bors-ng/bors-ng#but-dont-githubs-protecte...
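A toy Python illustration of two changes that merge cleanly but break each other. Everything here is hypothetical: the tree is a dict of file name to one line of "source", and `build_ok` stands in for a real build by checking that every called name is defined somewhere.

```python
from typing import Dict

Tree = Dict[str, str]  # file name -> a single line of pretend source


def build_ok(tree: Tree) -> bool:
    """Pretend build: every name that is called must be defined in some file."""
    defined = {line.split()[1] for line in tree.values() if line.startswith("def ")}
    called = {line.split()[1] for line in tree.values() if line.startswith("call ")}
    return called <= defined


base: Tree = {"lib": "def foo", "a": "call foo"}

rename = {"lib": "def bar", "a": "call bar"}  # renames foo and fixes the known caller
new_caller = {"b": "call foo"}                # adds a new file that still calls foo

branch_1 = {**base, **rename}
branch_2 = {**base, **new_caller}
merged = {**base, **rename, **new_caller}

# The two changes touch disjoint files, so a textual merge reports no conflict.
no_textual_conflict = not (rename.keys() & new_caller.keys())
```

Each branch builds on its own, and the merge has no textual conflict, yet `build_ok(merged)` is false. Testing each change on top of the exact commits it will land on is what catches this.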


I don't quite understand the problem they are trying to solve. Are there so many change sets that they couldn't provision enough CI servers, hence the "speculation graph with probabilistic model"?


Sort of, though not really.

Imagine I have three changes, C1 modifies F1, C2 modifies F2, and C3 modifies F1. There's no relation between F1 and F2.

At low-ish rate of submission, you test and commit C1, then test and commit C2, then when you try and test and commit C3, you rebase, and re-test and commit. (the merge doesn't conflict so can be automatically fixed)

Now assume all three changes are submitted by 3 different engineers in the span of a minute and engineers don't want to manually rebase. The rebase/build/submit time is less than the time between changes!

So you have a tool that queues up the changes, and at each change you

1. Rebase onto current head

2. Build with the new changes

3. Submit
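The three steps above can be sketched as a simple sequential loop in Python. This is only an illustration: `Change`, `process_queue`, and the dict-based `Tree` are made-up stand-ins for real commits and a real checkout, and `build_ok` stands in for the full build and test run.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Tree = Dict[str, str]  # file name -> contents, standing in for a checkout


@dataclass
class Change:
    name: str
    apply: Callable[[Tree], Tree]  # returns a new tree with the change applied


def process_queue(
    head: Tree, queue: List[Change], build_ok: Callable[[Tree], bool]
) -> Tuple[Tree, List[str]]:
    """Land queued changes one at a time, each tested on top of the current head."""
    landed: List[str] = []
    for change in queue:
        candidate = change.apply(dict(head))  # 1. rebase onto current head
        if build_ok(candidate):               # 2. build with the new changes
            head = candidate                  # 3. submit
            landed.append(change.name)
    return head, landed


c1 = Change("C1", lambda t: {**t, "F1": "v1"})
c2 = Change("C2", lambda t: {**t, "F2": "v1"})
c3 = Change("C3", lambda t: {**t, "F1": t["F1"] + "+c3"})

final_head, landed = process_queue(
    {"F1": "v0", "F2": "v0"}, [c1, c2, c3], lambda tree: True
)
```

Because C3 is applied after C1 has landed, it sees C1's version of F1, which is the automatic rebase the comment describes.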

But that's still really slow, since everything is sequential. If my change takes ~30m to test, it blocks everyone else who depends on my change.

So OK, do things in parallel: build and test C1, C1+C2, and C1+C2+C3. Then, as soon as C1 is finished testing, you can submit all 3. There are still 2 problems, though: C2 is unreasonably delayed, and "what if C1 is broken?".
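A rough Python sketch of building every prefix of the queue in parallel. `speculative_land` and `build_ok` are hypothetical names, and this version is deliberately conservative: it lands changes only up to the first prefix whose build fails, since every later build assumed the failed change would land.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def speculative_land(
    queue: Sequence[str], build_ok: Callable[[Sequence[str]], bool]
) -> List[str]:
    """Build C1, C1+C2, C1+C2+C3, ... concurrently and land the passing run of prefixes."""
    prefixes = [queue[: i + 1] for i in range(len(queue))]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(build_ok, prefixes))
    landed: List[str] = []
    for prefix, ok in zip(prefixes, results):
        if not ok:
            break  # later prefixes were built on top of a broken change
        landed = list(prefix)
    return landed


queue = ["C1", "C2", "C3"]
all_green = speculative_land(queue, lambda prefix: True)
c2_broken = speculative_land(queue, lambda prefix: "C2" not in prefix)
```

When C2 is broken, the C1+C2+C3 build is wasted and C3 has to be rebuilt on top of C1 alone, which is the "what if C1 is broken" cost in miniature.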

So, if C2 and C1 don't conflict, you can actually just submit C2 before C1 even though the request to submit was made after. But when there really is a dependency, like C3 and C1, the question is: do I build and test {C1, C1+C3}, {C3, C3+C1}, or something else? SubmitQueue appears to try to address that question. "Given potentially conflicting changes (not at a source level but at a transitive closure level), how do I order them so that the most changes succeed the fastest, assuming some changes can fail, and I have enough processing power to run some, but not all, permutations of changes in parallel?"
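To make the "which changes are independent" part concrete, here is a small Python sketch that groups changes by what they touch. It is a simplification under stated assumptions: it compares touched file names directly, whereas the comment above notes the real analysis works at the transitive closure level, and `independent_groups` is a made-up name.

```python
from typing import Dict, List, Set, Tuple


def independent_groups(touched: Dict[str, Set[str]]) -> List[List[str]]:
    """Group changes so that any two changes sharing a file end up in the same group."""
    groups: List[Tuple[List[str], Set[str]]] = []
    for name, files in touched.items():
        merged_names, merged_files = [name], set(files)
        remaining: List[Tuple[List[str], Set[str]]] = []
        for group_names, group_files in groups:
            if group_files & merged_files:
                # Overlap: fold the existing group into the one being built.
                merged_names = group_names + merged_names
                merged_files |= group_files
            else:
                remaining.append((group_names, group_files))
        remaining.append((merged_names, merged_files))
        groups = remaining
    return [names for names, _ in groups]


groups = independent_groups({"C1": {"F1"}, "C2": {"F2"}, "C3": {"F1"}})
```

With the example from this comment, C2 ends up in its own group and can be tested and submitted without waiting on C1 or C3, while C1 and C3 must be ordered relative to each other.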


awesome explanation


Say you change test A, and I change test B. Before, they both expect a value VAL to be 1, but now test A expects it to be 2 and test B expects it to be 3. We both submit a change, and each change passes CI on its own branch. You merge your change into master since it looks OK, and I do too. Now master is broken. Womp womp.


I think they are more common than you are thinking. I am familiar with several, even going back to the svn and cvs era, all of which predated the whole formalized-and-named CI/CD thing. In my experience, we called this model submit-to-commit, and depending on the specific manifestation, worked with diffs or branches. I'm talking 1990s.

The fancy bits in this implementation from the paper are interesting but the model itself is not that unusual.


In companies like that, is there any consideration given to minimizing conflict-prone actions, like, say renaming functions, an activity that could conflict with any commit that uses the old function name, but which in itself is unlikely to break anything? Maybe certain commits could be scheduled over the weekend?

I guess I just have a hard time imagining how many busy developers really commit important work all at once on large projects...


Other companies often just wait for tests to finish, while at the time of running tests the proposed changes (branch/PR) might not be based on the current version of master. Then they just rebase/merge after tests pass, without running tests again. For smaller projects, this rarely breaks. For a monorepo with lots of committers, the rate of breakage becomes too large.

The next step is to serialize all proposed changes, so they are rebased one on top of the other before running tests. This eliminates breakage due to merging, but does not scale:

> The simplest solution to keep the mainline green is to enqueue every change that gets submitted to the system. A change at the head of the queue gets committed into the mainline if its build steps succeed.
>
> This approach does not scale as the number of changes grows. For instance, with a thousand changes per day, where each change takes 30 minutes to pass all build steps, the turnaround time of the last enqueued change will be over 20 days.
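The "over 20 days" figure in the quote is easy to check, assuming the thousand changes are processed strictly one after another with no parallelism. The variable names below are just for illustration.

```python
changes_per_day = 1000
minutes_per_change = 30

# Strictly serial queue: the last change waits for every change ahead of it.
total_minutes = changes_per_day * minutes_per_change
turnaround_days = total_minutes / (60 * 24)
```

That is 30,000 minutes of serial build time, or a bit under 21 days, for a single day's worth of submissions.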

This paper is about scaling a variant of such a queue.


Which companies?


Us, for instance.[0,2]

But sure enough, we definitely weren't the first to go down this path. Facebook was using (or developing the tech for) server-side rebasing in 2015.[1] GitLab provides native server-side rebase functionality, likely inspired by various parties already having developed tools to do the same.

These aren't new ideas. But handling them at the scale where you land hundreds or even thousands of commits a day to a repo and require the ability to deploy at will, that's where engineering comes into play.

0: https://smarketshq.com/marge-bot-for-gitlab-keeps-master-alw...

1: https://softwareengineering.stackexchange.com/questions/2787...

2: https://github.com/smarkets/marge-bot


Yours seems identical to Bors, just for GitLab instead of GitHub? That isn't really what's described in the OP.


Yup, pretty much. I was mostly answering the parent, who in turn was questioning the lack of novelty.

The concept of an evergreen master with testing done in branches, followed by automated merges/rebases, is not special. Quite a few companies have been doing it for years; it's the off-the-shelf tooling and subsequent publicity that haven't necessarily been around as long.

As for OP's material? The automated conflict resolution via reordering to optimise parallelism - that certainly feels novel.



