
Keeping master green at scale - roshanj
https://eng.uber.com/research/keeping-master-green-at-scale/
======
underrun
Adrian Colyer dug into this a little further on The Morning Paper:

[https://blog.acolyer.org/2019/04/18/keeping-master-green-at-...](https://blog.acolyer.org/2019/04/18/keeping-master-green-at-scale/)

His analysis indicates that what Uber does as part of its build pipeline is to
break the monorepo up into "targets" and, for each target, create something
like a Merkle tree (which is basically what Git uses to represent commits),
then use that information to detect potential conflicts between commits that
would change the same target.

What it sounds like to me is that they end up simulating a multirepo setup, so
that their build system can run tests on a batch of most-likely-independent
commits. For multirepo users this is explicit, so it comes for free :-)

Which is super interesting to me, as it seems to indicate that an optimizing
CI/CD system has to deal with all the same issues whether it's mono- or
multi-repo: the problems solved by your repo layout resurface as a different
set of problems that need to be solved in your build system.
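
A minimal sketch of that idea (hypothetical data structures, not Uber's actual
code): hash each target from its source files plus the hashes of the targets
it depends on, Merkle-style, so a change anywhere in the subtree bubbles up,
and treat two pending commits as potentially conflicting only when they change
the hash of a common target.

    import hashlib

    def target_hash(target, targets, file_hashes):
        """Merkle-style hash of a build target: combines the hashes of
        its source files and of every target it depends on."""
        h = hashlib.sha256()
        for f in sorted(targets[target]["files"]):
            h.update(file_hashes[f].encode())
        for dep in sorted(targets[target]["deps"]):
            h.update(target_hash(dep, targets, file_hashes))
        return h.digest()

    def touched_targets(targets, base_files, changed_files):
        """Targets whose hash differs after applying a change."""
        new_files = {**base_files, **changed_files}
        return {t for t in targets
                if target_hash(t, targets, base_files)
                != target_hash(t, targets, new_files)}

    # Two pending commits only need to be serialized if they touch a
    # common target.
    targets = {
        "lib":  {"files": ["lib.go"],  "deps": []},
        "app":  {"files": ["app.go"],  "deps": ["lib"]},
        "docs": {"files": ["docs.md"], "deps": []},
    }
    base = {"lib.go": "v1", "app.go": "v1", "docs.md": "v1"}
    c1 = touched_targets(targets, base, {"lib.go": "v2"})   # {'lib', 'app'}
    c2 = touched_targets(targets, base, {"docs.md": "v2"})  # {'docs'}
    print("independent" if not (c1 & c2) else "potential conflict")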

~~~
ori_b
> For multirepo users this is explicit, so it comes for free :-)

Only if you spend the time to build tools to detect commits in your
dependencies, as well as your dependent repositories, and figure out how to
update and check them out on the appropriate builds.

So, no, it doesn't come for free.

~~~
msangi
Package managers solve it quite well. Just depend on the latest version of
your dependencies and tag a new version whenever they change.

~~~
ricardobeat
This doesn’t work when an underlying system changes and upgrading is mandatory
for all clients or package dependents (which happens often at scale, for a
multitude of reasons).

~~~
erik_seaberg
That's not good stewardship. You have a better API? Great, convince us it's
worth investing in soon; you can even deprecate the known-good version.

There's always a window where both will be in use, because we can't
synchronously replace every running process everywhere (not that it's even a
good idea without a canary). The shorter you try to make that window, the more
needless pain is created and the more plans are disrupted. While we _could_
use prod to beta test every single version of everything, that shouldn't be
our priority.

~~~
ricardobeat
That's not reality for most large companies, even though it's the right
mindset for most software libraries. Example: a new legal requirement comes
in, resulting in a mandate that fields X and Y in certain queries, which are
issued all over the codebase, now have to be tokenized. This is a breaking
_and mandatory_ change, with no room for systems to stay behind, and expensive
consequences.

In this case you'll have a very short transition window: all consumers update
their client code first (possibly in a backwards-compatible way), and then the
change in the implicated system is deployed, not the other way around.

~~~
erik_seaberg
If adopting the new API is mandatory, every team should be told why it's
mandatory, and we'll reprioritize and get it done. Doing it to our code behind
our back is passive-aggressive and likely to break stuff, because who is
monitoring and reporting on a change in our system's behavior that we didn't
even see?

------
huac
"Based on all possible outcomes of pending changes, SubmitQueue constructs,
and continuously updates a speculation graph that uses a probabilistic model,
powered by logistic regression. The speculation graph allows SubmitQueue to
select builds that are most likely to succeed, and speculatively execute them
in parallel"

This is either brilliant or just something built for a promotion packet

~~~
pastor_elm
Sounds so much simpler outside the context of a 'research' paper:

>When an engineer attempts to land their commit, it gets enqueued on the
Submit Queue. This system takes one commit at a time, rebases it against
master, builds the code and runs the unit tests. If nothing breaks, it then
gets merged into master. With Submit Queue in place, our master success rate
jumped to 99%.

[https://eng.uber.com/ios-monorepo/](https://eng.uber.com/ios-monorepo/)
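
In sketch form, that whole system is a loop (a minimal illustration with
hypothetical rebase/build/merge/notify helpers, not Uber's code):

    def run_submit_queue(q, rebase, build_and_test, merge_to_master, notify):
        """Serial submit queue: take one pending change at a time, rebase
        it onto the current master, test, and merge only on green. Master
        can then only break via flaky tests or coverage gaps, never via
        two concurrent merges stepping on each other."""
        while True:
            change = q.get()           # blocks until an engineer hits "land"
            rebased = rebase(change)   # replay the change onto the tip of master
            if rebased is None:
                notify(change, "rebase conflict, please resolve manually")
            elif build_and_test(rebased):
                merge_to_master(rebased)
            else:
                notify(change, "tests failed against current master")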

~~~
huac
A submit queue makes sense and is used by lots of people; it's the "machine
learning" applied to choosing which commits to build that I found interesting.
If the master success rate was already 99% in 2017 with just the submit queue,
why build the complex ML stuff?

~~~
numlocked
If there are 1000 commits per day (which wouldn't be that many), a 99% success
rate still means 10 master breaks per day.

~~~
Cthulhu_
And at least in my limited experience, the impact of master being broken is
pretty big, and even bigger when you have multiple teams. Either you block
master, meaning a lot less than those 1000 commits make it in that day, or you
continue merging stuff, which makes the root cause of the breakage fuzzy. And
if your reporting is not in order, that is, if the person who broke the build
isn't told they did, you get a lot of "Who broke master?" and people looking
at each other to find out who is going to look into it. That's not scalable.

~~~
erik_seaberg
That's why master shouldn't be whatever you're about to deploy for the first
time, but the known-good version that has been burned in on prod (for an hour
or a week or whatever), so that if you have to abandon a release you don't
have an entire team who already rebased on top of it.

------
jl-gitlab
We're building some similar tech at GitLab, though without the dependency
analysis yet.

Merge Requests now combine the source and target branches before building, as
an optimization:
[https://docs.gitlab.com/ee/ci/merge_request_pipelines/#combi...](https://docs.gitlab.com/ee/ci/merge_request_pipelines/#combined-ref-pipelines-premium)

Next step is to add queueing ([https://gitlab.com/gitlab-org/gitlab-ee/issues/9186](https://gitlab.com/gitlab-org/gitlab-ee/issues/9186)), then
we're going to optimistically (and in parallel) run the subsequent pipelines
in the queue: [https://gitlab.com/gitlab-org/gitlab-ee/issues/11222](https://gitlab.com/gitlab-org/gitlab-ee/issues/11222). At
this point it may make sense to look at dependency analysis and more
intelligent ordering, though we're seeing nice improvements based on tests so
far, and there's something to be said for simplicity if it works.

------
Scaevolus
There's a nice middle ground between this and a one-at-a-time submit queue:
have a speculative batch running on the side. This gives nice speedups
(approaching N times more commits, where N is the batch size) with minimal
complexity.

One useful metric is the ratio between test time and the number of commits per
day. If your tests run in a minute, you can test submissions one at a time and
still have a thousand successful commits each day. If your tests take an hour,
you can have at most 24 changes per day under a one-at-a-time scheme.
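
Back-of-the-envelope, the ceiling for a strict one-at-a-time queue is just
minutes per day divided by test time (a trivial sketch):

    MINUTES_PER_DAY = 24 * 60

    def max_serial_merges_per_day(test_minutes):
        """Upper bound on merges under a strict one-at-a-time queue."""
        return MINUTES_PER_DAY // test_minutes

    print(max_serial_merges_per_day(1))   # 1440: room for ~a thousand commits
    print(max_serial_merges_per_day(60))  # 24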

I worked on Kubernetes, where test runs can take more than an hour; spinning
up VMs to test things is expensive! The submit queue tests _both_ the top of
the queue and a batch of a few (up to 5) changes that can be merged without a
git merge conflict. If either one passes, the changes are merged. Batch tests
aren't cancelled if the top of the queue passes, so sometimes you'll merge
both the top of the queue AND the batch, since they're compatible.

Here are some recent batches:
[https://prow.k8s.io/?repo=kubernetes%2Fkubernetes&type=batch](https://prow.k8s.io/?repo=kubernetes%2Fkubernetes&type=batch)

And the code to pick batches: [https://github.com/kubernetes/test-infra/blob/0d66b18ea7e8d3...](https://github.com/kubernetes/test-infra/blob/0d66b18ea7e8d3f216287ad06b11042c12bc6e48/prow/tide/tide.go#L759)

Merges to the main repo peak at about 45 per day, largely depending on the
volume of changes. The important thing is that the queue size remains small:
[http://velodrome.k8s.io/dashboard/db/monitoring?orgId=1&pane...](http://velodrome.k8s.io/dashboard/db/monitoring?orgId=1&panelId=10&fullscreen&from=now-7d&to=now)
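
One round of that scheme might look roughly like this (a sketch with
hypothetical `mergeable` and `test` helpers; the real batch-picking logic
lives in the tide.go link above):

    from concurrent.futures import ThreadPoolExecutor

    def tick(queue_prs, mergeable, test):
        """Test the head of the queue and, in parallel, a batch of up to
        5 changes that merge cleanly together. Whatever passes gets
        merged; a passing batch isn't cancelled just because the head
        also passed, so sometimes both land."""
        head = [queue_prs[0]]
        batch = []
        for pr in queue_prs:
            if len(batch) == 5:
                break
            if mergeable(batch + [pr]):   # no git merge conflict so far
                batch.append(pr)
        merged = []
        with ThreadPoolExecutor() as pool:
            head_ok = pool.submit(test, head)
            batch_ok = pool.submit(test, batch)
            if head_ok.result():
                merged += head
            if batch_ok.result():
                merged += [pr for pr in batch if pr not in merged]
        return merged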

~~~
viraptor
The same thing was (is?) done in OpenStack with Zuul, I believe. When you go
to merge something, your branch goes on top of the things already going
through the CI.

~~~
Scaevolus
We talked to the Zuul team; they use more parallelism but it's similar:
[https://zuul-ci.org/docs/zuul/user/gating.html](https://zuul-ci.org/docs/zuul/user/gating.html)

Most of the complexity and suffering of a submit queue arises from the
interactions between your VCS and CI systems. Keeping things simple is great!
Kubernetes' CI system is Prow, which runs the tests as pods in a Kubernetes
cluster. Dogfooding like this is great, since the team you're providing CI for
can also help fix bugs that arise.

------
antimora
I am still trying to wrap my head around the giant monolithic repo model
instead of breaking the code into multiple repos.

At Amazon, for example, they have a multi-repo setup. A single repo represents
one package, which has a major version. Amazon's build system builds packages
and pulls dependencies from the artifact repository when needed. The build
system is responsible for "what" to build; "how" to build is left to the
package setup (e.g. Maven/Ant).

I am currently trying to find a similar setup. I have looked at Nix, Bazel,
Buck, and Pants. Nix seems to offer something close. I am still trying to
figure out how to vendor npm packages, which artifact store is appropriate,
and whether it is possible to have the Nix builder pull artifacts from a
remote store.

Any pointers from the HN community are appreciated.

Here is what I would like to achieve:

1. Vendor all dependencies (npm packages, pip packages, etc.) with ease.

2. Be able to pull artifacts from a remote store (e.g. Artifactory).

3. Be able to override a package locally for my build purposes. For example,
if I am working on a package A which depends on B, I should be able to build A
from source and, if needed, build B, which A can later use for its own build.

4. Support multiple languages (TypeScript, JavaScript, Java, C, Rust, and Go).

5. Have each package in its own repository.

~~~
PKop
> At Amazon, for example, they have multi repos setup.

And didn't you find that this created massive headaches trying to build many
disparate and inconsistent dependencies across repos? I think the benefits
touted for monorepos are exactly illustrated by the pain points of working
with Amazon's multi-repo setup.

[https://danluu.com/monorepo/](https://danluu.com/monorepo/)

 _" Refactoring an API that's used across tens of active internal projects
will probably a good chunk of a day."_

This was my experience.

~~~
awinder
How often have you interacted with "hot" packages that both change rapidly and
are heavily depended on? I haven't worked at Amazon, but in my experience
that's been rare, or a reason to evolve the API rather than break it.

I'm just curious, but in fairness both of these schemes have obvious issues
that become headaches or positive design choices depending on your outlook.
Clearly you can engineer effectively in either scheme.

------
chairleader
Quite a premise: "Giant monolithic source-code repositories are one of the
fundamental pillars of the back end infrastructure in large and fast-paced
software companies."

~~~
huac
Facebook, Google, Airbnb, Quora, and many more all use a monorepo.

Obviously there are many others who do not (Amazon comes to mind), but it's
reasonable to claim that monorepos are widely used and fundamental where they
are used.

~~~
vruiz
Does anybody know what these companies' development environments look like? I
know about Piper at Google, but how do the rest manage? Does every single
engineer have the entire monorepo on their machine?

~~~
Shish2k
At Facebook, a virtual filesystem
([https://github.com/facebookexperimental/eden](https://github.com/facebookexperimental/eden))
+ change monitor
([https://facebook.github.io/watchman/](https://facebook.github.io/watchman/))
= source control operations run in O(files I have modified) instead of O(size
of repository) time

~~~
vruiz
Very interesting, thanks!

------
richardwhiuk
Anyone fancy comparing this to bors?

~~~
sundargates
Actually, we have compared them in our paper.

Bors builds one change at a time. Submit Queue, on the other hand,
speculatively builds several changes at a time, based on the possible outcomes
of the other pending changes in the system. Apart from that, Submit Queue uses
a conflict analyzer to find independent changes, both to commit changes in
parallel and to trim the speculation graph.

We have also evaluated the performance of Single-Queue (the idea behind Bors)
on our workloads. In fact, as described in the paper, this technique was so
slow at scale (~132x slower) that we omitted its results. Submit Queue, on the
other hand, operates in the 1-3x region compared to an optimal solution.
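
In sketch form, the conflict-analyzer idea looks something like this (a
minimal illustration with a hypothetical `targets_of` function returning the
transitive set of build targets a change affects; not the actual
implementation):

    def independent_groups(pending, targets_of):
        """Build a conflict graph over pending changes (edge = overlapping
        build targets) and return its connected components. Changes in
        different components can't affect each other's builds, so they
        can be tested and committed in parallel; speculation is only
        needed within a component."""
        adj = {c: set() for c in pending}
        for i, a in enumerate(pending):
            for b in pending[i + 1:]:
                if targets_of(a) & targets_of(b):
                    adj[a].add(b)
                    adj[b].add(a)
        seen, groups = set(), []
        for c in pending:
            if c in seen:
                continue
            stack, comp = [c], []
            while stack:
                x = stack.pop()
                if x not in seen:
                    seen.add(x)
                    comp.append(x)
                    stack.extend(adj[x] - seen)
            groups.append(comp)
        return groups

    # C2 touches disjoint targets, so it can build and land in parallel
    # with the C1/C3 chain:
    print(independent_groups(["C1", "C2", "C3"],
                             {"C1": {"app"}, "C2": {"docs"}, "C3": {"app"}}.get))
    # -> [['C1', 'C3'], ['C2']]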

I recommend reading the paper for further details:
[https://dl.acm.org/citation.cfm?id=3303970](https://dl.acm.org/citation.cfm?id=3303970)

~~~
richardwhiuk
> Bors builds one change at a time.

Bors builds multiple changes at once (it creates a merge commit of all
available changes and then runs the tests on all of them), and merges if all
of them are good.

Possibly you are thinking of the older bors, as opposed to modern bors-ng?

------
shimont
I think that what works for companies like Uber/Google/Facebook is not
applicable to the rest of the Fortune 500, or to most other companies.

Disclaimer: I am one of Datree.io's founders. We provide a visibility and
governance solution for R&D organizations on top of GitHub.

Here are some rules and enforcements around security and compliance which most
of our customers use for multi-repo GitHub orgs:

1. Prevent users from adding outside collaborators to GitHub repos.

2. Enforce branch protection on all current and future repos: prevent master
branch deletion and force pushes.

3. Enforce a pull request flow on the default branch of all repos (including
future ones): prevent direct commits to master without a pull request and
checks.

4. Enforce Jira ticket integration: mention the ticket number in the pull
request name / commit message.

5. Enforce proper Git user configuration.

6. Detect and prevent merging of secrets.

------
jonthepirate
Having been at both Lyft and DoorDash where I've been an engineer responsible
for unit test health, I decided to do a side project called Flaptastic
([https://www.flaptastic.com/](https://www.flaptastic.com/)), a flaky unit
test resolution system.

Flaptastic makes your CI/CD pipelines reliable by identifying which tests fail
due to flaps (aka flakes) and then giving you a "Disable" button to instantly
skip any test; the skip takes effect immediately across all feature branches,
pull requests, and deploy pipelines.

An on-premise version is in the works to allow you to run it onsite for the
enterprise.

~~~
roskilli
I don't want to come across as negative, but just as an observation and to
play devil's advocate: wouldn't it be better to fix the flaky test, or delete
it entirely, instead of building a feature that disables it during a test run
in an automated fashion?

Whenever our team has a significant number of flaky tests (more than 1-2) we
usually schedule a bug squash session to fix them and amortize the cost over
the whole team.

~~~
jonthepirate
What you really want to do is first disable a test you know is unhealthy, to
unblock everybody. Then you fix it. Once you've reintroduced it in a healthy
state, you can turn it back on.

~~~
lhorie
I was talking to someone from Google who works on Bazel things, and he brought
up an interesting point: flaky tests are asymmetric in that they don't provide
much value when they fail (since you don't know if the failure was due to
flakiness), but they do provide a lot of value when they pass (because they
presumably test something non-trivial).

With this in mind, what Bazel does when a test is marked flaky is run it
several times. This is a simple way of minimizing the effect of flakiness
while still getting confidence from green tests.
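
The policy is tiny in sketch form (Bazel's `flaky` attribute retries a failing
test, up to three runs by default; this is just the shape of it, not Bazel's
implementation):

    def run_flaky(test, attempts=3):
        """Rerun a test marked flaky and report green if any attempt
        passes: a pass carries most of the signal, a single failure
        carries very little."""
        for _ in range(attempts):
            if test():
                return True
        return False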

~~~
jacques_chester
I dislike rerunning flaky tests. It too often masks genuine failures.

------
cjfd
A possible complication would occur if there are tests that occasionally fail.

------
revskill
What exactly is a monolith? Is it only about the codebase (monolith vs.
monorepo)? Or is it about the runtime, like microservices vs. monolith?

~~~
jade12
From the first sentence of the abstract:

> monolithic source-code repositories

A monorepo _is_ a monolithic repository

~~~
ricardobeat
To answer the parent: it doesn't imply a monolithic application, but
deployment to multiple server roles and apps will happen from the same source
repository.

------
techmortal
How common is this in the industry? Do multirepos run on a batch?

------
7e
Is this novel? Other companies have had this for ages.

~~~
ricardobeat
No, they haven't. This is a system to queue commits, not a simple CI setup.
This problem only comes up when you start having contention due to commit
volume in a monorepo (think thousands of commits/day). This is only the 3rd
one I've heard about.

> This paper introduces a change management system called SubmitQueue that is
> responsible for continuous integration of changes into the mainline at scale
> while always keeping the mainline green. Based on all possible outcomes of
> pending changes, SubmitQueue constructs, and continuously updates a
> speculation graph that uses a probabilistic model, powered by logistic
> regression. The speculation graph allows SubmitQueue to select builds that
> are most likely to succeed, and speculatively execute them in parallel. Our
> system also uses a scalable conflict analyzer that constructs a conflict
> graph among pending changes. The conflict graph is then used to (1) trim the
> speculation space to further improve the likelihood of using remaining
> speculations, and (2) determine independent changes that can commit in
> parallel

~~~
zyang
I don't quite understand the problem they are trying to solve. Are there so
many change sets that they couldn't provision enough CI servers, hence the
"speculation graph with probabilistic model"?

~~~
joshuamorton
Sort of, though not really.

Imagine I have three changes, C1 modifies F1, C2 modifies F2, and C3 modifies
F1. There's no relation between F1 and F2.

At a low-ish rate of submission, you test and commit C1, then test and commit
C2, then when you try to test and commit C3, you rebase, re-test, and commit
(the merge doesn't conflict, so it can be handled automatically).

Now assume all three changes are submitted by 3 different engineers in the
span of a minute, and engineers don't want to manually rebase. The
rebase/build/submit time is now greater than the time between changes!

So you have a tool that queues up the changes, and at each change you

1. Rebase onto current head

2. Build with the new changes

3. Submit

But that's still really slow, since everything is sequential. If my change
takes ~30m to test, it blocks everyone else queued up behind it.

So OK, do things in parallel: build and test C1, C1+C2, and C1+C2+C3. Then, as
soon as those builds finish, you can submit all 3. There are still two
problems though: C2 is unreasonably delayed, and "what if C1 is broken?"

So, if C2 and C1 don't conflict, you can actually just submit C2 before C1,
even though the request to submit was made later. But when there really is a
dependency, like between C3 and C1, the question is: do I build and test {C1,
C1+C3}, {C3, C3+C1}, or something else? SubmitQueue appears to address that
question: "Given potentially conflicting changes (not at a source level, but
at a transitive-closure level), how do I order them so that the most changes
succeed the fastest, assuming some changes can fail, and I have enough
processing power to run some, but not all, permutations of changes in
parallel?"

~~~
jrochkind1
awesome explanation

