The advantage of that is that you get "linear" history, with each just-pushed topic building on the last one, and you can use merges to group commits, which allows e.g. easy reverting of an entire buggy feature introduced over N commits by reverting the merge.
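For example, reverting a whole feature then takes one command against the merge commit (the SHA below is illustrative):

    # Undo the entire feature by reverting its merge commit; -m 1 keeps
    # the first-parent (master) side as the mainline.
    git revert -m 1 abc1234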
I've enforced this rule with my team for years, and our git tree is completely flat, with the occasional branch diverging from and merging back into master.
Here's what it looks like in practice: https://imgur.com/a/OMplzuv (although in this example you can see that someone messed up and thought they'd rebased before merging but actually hadn't; mistakes happen).
We use a pre-receive hook which checks that the "oldest ancestor" between the branch being pushed to (i.e. the current master's SHA) and the branch being pushed (i.e. the tip of what is to become the new master) is actually the current master's SHA; otherwise, the push is rejected with "Rebase a merged branch before merging it into master".
"oldest ancestor" being:
* take git rev-list --boundary $current_master_sha..$new_sha
* get the last line matching ^- and remove the ^-
* that should be $current_master_sha; if not, error.
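A minimal sketch of that hook (assuming the standard pre-receive stdin format of "oldref newref refname" lines, and that only master is guarded):

    #!/bin/sh
    # pre-receive: reject pushes to master whose history does not
    # branch directly off the current master tip.
    while read oldref newref refname; do
        [ "$refname" = "refs/heads/master" ] || continue
        # Boundary commits are prefixed with "-"; the last one listed
        # is the oldest ancestor of the pushed range.
        oldest=$(git rev-list --boundary "$oldref..$newref" \
                 | grep '^-' | tail -n 1 | sed 's/^-//')
        if [ "$oldest" != "$oldref" ]; then
            echo "Rebase a merged branch before merging it into master" >&2
            exit 1
        fi
    done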
That's not very likely, but in the interest of sanity and avoiding history complexity (which is the whole point) perhaps it's better to check:
1. That what's being pushed has either zero or one merge commit. Use "git rev-list --min-parents=2". If it's zero you don't need to check anything else (I'm assuming you're checking non-fast-forwards already).
2. If that one merge commit is there, unpack its parents and check that one side of it is strictly ahead of the current master. You can do this with "git merge-base --is-ancestor A B".
3. For consistency, validate that the LHS or RHS of the merge is the existing master, depending on your taste. Usually you'd want it to be the LHS.
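A rough sketch of those three checks, assuming $master and $new hold the current master SHA and the pushed tip:

    # 1. Allow at most one merge commit in the pushed range.
    merges=$(git rev-list --min-parents=2 --count "$master..$new")
    [ "$merges" -eq 0 ] && exit 0   # no merges; nothing else to check
    [ "$merges" -gt 1 ] && { echo "more than one merge commit" >&2; exit 1; }
    merge=$(git rev-list --min-parents=2 "$master..$new")
    # 3. The LHS (first parent) of the merge must be current master.
    if [ "$(git rev-parse "$merge^1")" != "$master" ]; then
        echo "LHS of the merge must be current master" >&2; exit 1
    fi
    # 2. The other side must be (strictly) ahead of current master.
    if ! git merge-base --is-ancestor "$master" "$merge^2"; then
        echo "rebase the branch onto master before merging" >&2; exit 1
    fi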
Only this way will you guarantee a history without nesting in "git log --oneline --graph", which I suspect is what people who'd like a hook like this actually want, not what you've outlined.
But in the context of this discussion, LLVM is moving to GitHub, and as far as I know there is no possibility of a custom pre-receive hook there.
Using either of those methods they can enforce this. E.g. if they go through PRs, just write a script that grabs the refs for the PR, rebases (and --no-ff merges, depending) the commits there, pushes them to master, and closes the PR as merged. The user running that script would be the only one allowed to commit to "master", and would use its own scripted method of integrating changes, instead of clicking "Merge" in the GitHub UI.
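Such a script could be as simple as the following sketch (GitHub exposes PR branches as refs/pull/<number>/head; the $PR variable and branch names here are illustrative):

    # Integrate PR $PR by rebasing it onto master and merging with
    # --no-ff, so it appears as one topic in first-parent history.
    git fetch origin
    git fetch origin "refs/pull/$PR/head:pr-$PR"
    git checkout -B master origin/master
    git checkout "pr-$PR"
    git rebase master
    git checkout master
    git merge --no-ff "pr-$PR"
    git push origin master
    # Marking the PR as merged would then be an API call.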
If we were to go through pull-requests, we wouldn't need anything as GitHub has this setting built-in: https://help.github.com/articles/configuring-commit-rebasing...
This is generally rather easy to do, and for feature work it's compatible with a 'simple' rebase workflow: simply rebase onto master whenever conflicts start to show up. For bugfixes it's rather useful to be able to base a fix on the commit which caused the bug (or a common ancestor of all maintained releases which have it) and just submit multiple merges.
Conflicts are a fact of life: you either deal with them as part of a rebase, or you deal with them when merging. But it's the testers or reviewers who accept the merge, and they might not be the right people to resolve the conflicts (plus GitHub's PR workflow provides no nice way to deal with merge conflicts), so the author should fix them.
tl;dr: Non-FF non-conflicting merges only, so your changes get integrated as a 'single' commit in first-parent history, but you get to pick where you base the branch. Props for minimising 'copy' branches.
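That 'single commit per topic' view is then exactly what something like this shows:

    # One line per integrated topic; the individual commits of each
    # branch stay tucked behind their merge commit.
    git log --first-parent --oneline master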
Does depend on density and structure of code, of course. If every feature branch is touching the same file, you're going to have problems either way.
IMO: linear history all the way. Please!
For large projects, the upstream would get closed for a few hours (or days, even, like when SMF and ZFS integrated) so that tests could all pass without getting reset.
Optimizing CI/CD is still important for the smaller pushes.
A merge workflow doesn't really prevent pushes that get in ahead of yours from resetting the testing of yours, so anyway, I think the CI/CD thing is orthogonal to merge vs. linear history.
Incidentally, I have a script for rebasing a branch across thousands of commits that "bisects", so that for any conflicts you're always rebasing across one commit. I.e., first it tries to rebase across all upstream commits; if there are no conflicts, it's done, else it picks the N/2'th commit to rebase onto and repeats, and when that succeeds it restarts from whichever commit it ended up at.
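A rough sketch of that idea (the structure and names are illustrative, not the actual script):

    #!/bin/sh
    # Bisecting rebase of the current branch onto $1: halve the jump
    # whenever a conflict appears, so conflicts are always resolved
    # against as few upstream commits as possible.
    target=$(git rev-parse "$1")
    while true; do
        base=$(git merge-base HEAD "$target")
        [ "$base" = "$target" ] && break       # caught up; done
        n=$(git rev-list --count "$base..$target")
        onto=$target
        while ! git rebase "$onto"; do
            if [ "$n" -le 1 ]; then
                echo "Resolve this single-commit conflict, then re-run." >&2
                exit 1
            fi
            git rebase --abort
            n=$((n / 2))
            # The n-th upstream commit, counting from the oldest.
            onto=$(git rev-list --reverse "$base..$target" | sed -n "${n}p")
        done
    done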
In VCS systems other than git, you don't have the same "immutability" issue, which means a "push" can be (for example) just sending a "diff" for the server to apply.
Presumably the tooling does this automatically though? They don't literally have to start CI pipelines themselves?
If you rebase, you have to restart CI.
If another developer merges, you have to rebase.
As a result I sometimes get stuck in a cycle where I'm not fast enough on the trigger and it's difficult to get my code merged because while I am waiting for CI to pass, someone else merges and I get to rebase and start CI over.
Merge-vs-rebase is totally orthogonal to CI/CD, but if you find that "merging means not restarting the CI" then you've already got a bias in favor of merging in your policies and tooling, and that's why you have that perception. I would challenge you to understand how your CI/CD is forcing a merge workflow rather than merely accepting it unquestioningly.
> Merge commit with semi-linear history
> A merge commit is created for every merge, but merging is only allowed if fast-forward merge is possible. This way you could make sure that if this merge request would build, after merging to target branch it would also build.
Is there a good way to avoid that?
If you have CI on every PR, and you require all merges into master to be fast-forward merges, then you a) guarantee that master is always green and b) don't need to run CI again on the master branch.
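Locally, git can enforce the fast-forward-only part itself, e.g. (the branch name is illustrative):

    # Refuses to create a merge commit: fails unless master can simply
    # fast-forward to the already-rebased, already-green branch tip.
    git checkout master
    git merge --ff-only my-feature
    git push origin master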
If you want a) integration tests that may take some minutes, and b) a bulletproof guarantee that those tests are always green on master...
..then I don't think linear history can be made to work (without a lot of real rebase pain in daily work). I would love to be proved wrong.
Not exactly true. We have a different CI pipeline run based on whether the commit is on a feature branch vs. master.
On a feature branch you probably just want to make sure it builds and passes tests. On master you might want to do things like generate documentation, generate a code-coverage report (doing this on a feature branch might be very time consuming), push to production, etc.
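For instance, the branch-dependent part can just be a conditional in the CI driver script (the script names and the $CI_BRANCH variable are illustrative; most CI systems expose the branch name somehow):

    # Run the cheap checks everywhere, the expensive ones only on master.
    ./build.sh && ./run-tests.sh || exit 1
    if [ "$CI_BRANCH" = "master" ]; then
        ./generate-docs.sh
        ./coverage-report.sh
        ./deploy.sh
    fi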
I'd argue that a feature branch is exactly where you want to run code-coverage reports, to make sure you improve the numbers.
The downside of rebasing is that you need some kind of queuing mechanism for pull requests so that developers aren't constantly rebasing and racing each other to get changes in.
Use pull request reviews.
Use a sane subset of existing buildbot configurations in webhook-triggered CI to maximize coverage within some "reasonable" build+test duration, and optimistically batch commits for testing. Pay for CI service from Circle/Azure/Travis/whoever, or use existing LLVM infrastructure to execute CI. Some of these CI services are downright simple to enable and start using.
Yes, unfortunately, this will probably involve duplicate configurations between CI and buildbots. Yes, unfortunately, commits will still escape the limited CI tests and cause buildbot regressions. It's still a net win IMO.
After having contributed to projects like Rust and others that use an integrated CI, Phabricator and Arcanist feel too disconnected to me. Whenever I upstream a change to LLVM I'm never confident that it's going to work, and it feels so dissimilar from a 'normal' workflow.
I did not mean incomplete pull requests, I meant master and release branches.
Though using unmerged stuff is quite common if the project doesn't yet have an established release process, or isn't well maintained.
1. A branching history allows use of git bisect to identify where breaking bugs were introduced.
2. An aesthetic argument to the effect of: it's awesome to have that branching history to explore, to see alternatives and choices. A linear history is a recitation; a branching history is a multiverse.
I don't find either convincing. 2 is a personal preference, but seriously, who has time to go wandering through commit histories looking for precious gems?
1 seems like a more valid argument, but against it: who actually uses git bisect? In both essays I've read arguing for it, they cite "once I used it to find a really gnarly bug". It doesn't seem like a useful everyday tool, or am I missing a larger set of use cases for it? Is git bisect happening a lot more than it seems to be?
git bisect bad on the bad commit, git bisect good on the good commit, and as long as there is an unbroken path in the graph from bad to good, you should be able to do a binary search.
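A typical session, fully automated with a test script (the known-good ref and the test script name are illustrative):

    git bisect start
    git bisect bad HEAD        # current tip is broken
    git bisect good v1.2       # last release known to pass
    git bisect run ./test.sh   # exits non-zero on a bad commit
    # ...git prints the first bad commit...
    git bisect reset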
Linear history is easier on everybody but the contributor who can't wrap their mind around it. It's easier on everyone else because git log tells them everything they need to know -- no need to chase branches -- and because commits are meaningful (because the contributor must take the time to make them so).