Hacker News new | past | comments | ask | show | jobs | submit login
A Continuous Integration Mystery (danhough.com)
31 points by basicallydan on Dec 5, 2020 | hide | past | favorite | 25 comments

> Should CI create a merged branch behind-the-scenes and run tests on that before allowing the branch to be merged?

This is surprisingly difficult to get right, especially with projects with lots of concurrent changes. Gitlab merge trains[1] really help here.

1. https://docs.gitlab.com/ee/ci/merge_request_pipelines/pipeli...

> Should CI create a merged branch behind-the-scenes and run tests on that before allowing the branch to be merged?

Yes. Anything else is bound to run into the issue described in the article.

I'm honestly surprised that anybody would operate a system that does not do this. Master has to be where all tests are green.

> Should CI create a merged branch behind-the-scenes and run tests on that before allowing the branch to be merged?

The answer to that is: yes. The final test cycle before landing code to master MUST be against the code that would be in master after merging.

This is what the better workflow tools do, among other things to prevent these weird error situations. We have Marge-bot for Gitlab. Uber open-sourced their tool for the same some time back.[0] Bors comes up often as a reference, too.[1]

0: Discussion at the time: https://news.ycombinator.com/item?id=19692820

1: https://github.com/bors-ng/bors-ng

This is helpful, thank you. I will bring this up with my colleagues.

It isn't difficult if you use tools like GitHub, Bitbucket, Azure DevOps, Gitlab, Gitea/GOGS, etc. Because they automatically create git refs for those merge commits for each Pull Request that you can use in your CI. Often called refs/pull-requests/<PR-ID>/merge or similar.

As I always understood it, the role of a CI server was primarily to draw together all outstanding ready-to-merge feature branches into a single merged proposed branch and run integration tests against that. Anything which didn't integrate would be marked red by default, effectively "breaking the build". By doing that consistently before merging you'd have many fewer opportunities to land things which break other branches.

The "integration" part of CI appears to have largely been lost - CI pipelines now tend to just build branch tips and confirm tests run. Performing the proposed merge and checking that combination still runs is still pretty rare IME.

I'd like to actually do this, but how do you tell your CI server what it needs to merge where? In our case we have Jenkins and BitBucket. It's not in my power to change that, and these tools aren't integrated that well (to my knowledge) that you could say which PRs are open and which branch merges into which.

Maybe if you have a workflow where it's always the same target branch (master) it works, but we often have long standing feature branches that get multiple smaller merges from other branches.

Bitbucket (at least on-premise version) has REST api designed to give you information and allow to modify existing PRs.

This is new to me, thank you for sharing! It would fit the bill. Very interesting.

EDIT: Hang on a second, Tom Forbes! Fancy bumping into you here yet again. It's Dan from Marvel!

Is this the best OSS implementation of this? It looks like there's research in doing it better[0] but I can't find an open implementation.

[0] https://news.ycombinator.com/item?id=19692820

The issue is with how GitHub (and most PR / CI systems...) relies on stale CI results.

Master/Main changed between the tests running for that PR, and it merging. Why should those test results be considered a useful indicator for what happens after the PR merges anymore?

GitHub does have a setting you can enable to help with this, essentially, when any PR merges, it forces the developer to rebase any open PRs. A huge waste of effort, so nobody turns it in.

There are far better systems out there, e.g. zuul-ci.org which was designed to handle this exact problem at scale. Others have mentioned GitLab merge trains, though I've never used them - so hard to tell if they really solve this :)

You can actually clone/checkout a "merged" PR provided by Github:

git checkout --force refs/remotes/pull/119/merge

They generate it async behind the scenes, so it may not be immediately available after your PR is opened, and changes to master may be delayed.

This won't help all that much (though, it is slightly better)..

Testing the merged branch at say 10am, then actually merging the PR at 11am says nothing about how that PR interacts with all the changes merged between those 2 points in time.

The only way to ensure always green master/main branch is to test every change as it goes in serially. (There are optimisations that allow you to test things serially but in parallel - which of course sounds like madness .. https://zuul-ci.org/docs/zuul/discussion/gating.html explains it better than I could :D)

I miss having Zuul CI on the projects I work on - was far better than any of the others I've used :)

Just configure your CI to re-run all PR jobs when master updates. E.g. Bitbucket is updating the refs/pull-requests/pr-id/merge git ref automatically and I'm sure many other tools do, too. We are doing exactly this at my workplace and I am surprised that it seems to be unknown to so many people on HN.

We are also doing the same, however it obviously does not scale well.

That's one way of doing exactly what I describe - but it's horribly slow and wastes a tonne of CPU resources so is far from the ideal option.

Maybe the difference is whether you consider each atom to be a releasable candidate, or is there is another release layer (a-la good old “a successful git branching model”)

I’m confused because this isn’t stated clearly in the comments, but shouldn’t you create a merge commit (for the feature branch) that combines both master and feature branch code before running the tests? And then change master to point to your newly created merge commit once tests pass?

Actually one should treat changes as totally ordered before merging them.

But that brings a choke point into your workflow:

* Change A developed and good for master

* Change B developed and good for master

* Change A merged into master

* Change B not good for master any more, gets pushed back

* Change C developed and good for master

* ...

In principle, a change can be put back forever, depending on your practical situation.

That's not a technical problem. No tool can solve this for you. If people keep stepping on each others toes, you need some process in place to ensure some fairness, or better, they talk to each other and find a solution.

You could use a merge queue, that does an rebase on latest master, then tests your code and merges.

Another point in favor of rebase not merge.

This is nothing to do with force pushing / rebasing vs merging. The CI only tested changes in isolation, and not the cumulative changes on the default branch

If anything this is a reason to use merge rather than rebase and force push because it’s easier to debug this nuisance when your devs aren’t mutating implicitly their history

I meant the workflow where you rebase your changes on master, let tests complete, then “merging” doesn’t result in a merge commit with unexpected changes.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact