RFC: linear history vs merge commits (llvm.org)
53 points by jupp0r 18 days ago | 52 comments

There's an option between #1 and #2 that they're not listing. They could allow merges, but only merges after a rebase on top of upstream produced with --no-ff.

The advantage of that is that you get "linear" history as far as the just-pushed topic building upon the last one, and you can use merges to group commits, which allows e.g. easy reverting of an entire buggy feature introduced over N commits by reverting out the merge.
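A minimal sketch of the "revert the merge" step, assuming the feature landed as a two-parent --no-ff merge whose first parent is the upstream branch. The helper names are made up for illustration; the git flags themselves (`log --first-parent`, `revert -m 1`) are standard.

```python
from typing import List


def grouped_log_command(branch: str = "master") -> List[str]:
    """Command listing only first-parent commits, so each merged topic is one entry."""
    return ["git", "log", "--first-parent", "--oneline", branch]


def revert_feature_command(merge_sha: str) -> List[str]:
    """Command reverting everything a --no-ff merge brought in.

    -m 1 tells git to treat the first parent (the upstream side) as the
    mainline, so the revert undoes the topic side of the merge.
    """
    return ["git", "revert", "--no-edit", "-m", "1", merge_sha]
```

Either list can be handed to `subprocess.run(cmd, check=True)` inside a checkout of the repository.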

Yes this.

I've enforced this rule with my team for years and our git tree is a completely flat tree with the occasional branch diverging from and merging back into master.

Here's what it looks like in practice: https://imgur.com/a/OMplzuv (although in this example you can see that someone messed up and thought they rebased before merging but actually didn't; mistakes happen.)

> you can see that someone messed up and thought they rebased before merging but actually didn't

we use a pre-receive hook which checks that the "oldest ancestor" between the branch being pushed to (i.e. current master's SHA) and the branch being pushed (i.e. the tip of what is to become the new branch) is actually the current master's sha; else, the push is rejected with "Rebase a merged branch before merging it into master".

"oldest ancestor" being:

* take git rev-list --boundary $current_master_sha..$new_sha

* get the last line matching ^- and remove the ^-

* that should be $current_master_sha; if not, error.
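The three steps above could be sketched roughly like this. It is an illustration, not the poster's actual hook: the function names are hypothetical, and the parsing is split out so it can be checked without a repository.

```python
import subprocess
from typing import Optional


def last_boundary(rev_list_output: str) -> Optional[str]:
    """Return the SHA on the last line starting with '-', without the '-'."""
    boundary = [line[1:] for line in rev_list_output.splitlines()
                if line.startswith("-")]
    return boundary[-1] if boundary else None


def push_is_rebased(current_master_sha: str, new_sha: str) -> bool:
    """True if the last boundary commit of master..new is the current master."""
    out = subprocess.run(
        ["git", "rev-list", "--boundary", f"{current_master_sha}..{new_sha}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return last_boundary(out) == current_master_sha
```

In a pre-receive hook, the old and new SHAs arrive on stdin as lines of the form `<old-value> <new-value> <ref-name>`; when `push_is_rebased` returns False the hook would print the rejection message and exit non-zero.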

Note that this will allow someone to push a history where you have a deeply nested merge mess of some sort all based off the current upstream master.

That's not very likely, but in the interest of sanity and avoiding history complexity (which is the whole point) perhaps it's better to check:

1. That what's being pushed has either zero or one merge commit. Use "git rev-list --min-parents=2". If it's zero you don't need to check anything else (I'm assuming you're checking non-fast-forwards already).

2. If that one merge commit is there, unpack its parents and check that one side of it is strictly ahead of the current master. You can do this with "git merge-base --is-ancestor A B".

3. For consistency, validate that the LHS or RHS of the merge is the existing master, depending on your taste. Usually you'd want it to be the LHS.

Only this way will you guarantee that you have a history without nesting in "git log --oneline --graph", if that's what you're after, which I suspect people who'd like a hook like this actually want, not what you've outlined.
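A rough sketch of checks 1 to 3, under some assumptions: master is required to be the LHS (first parent), octopus merges are rejected, and the git runner is injectable so the logic can be exercised without a repository. `check_push` and `run_git` are hypothetical names; the git invocations are the ones named in the list.

```python
import subprocess
from typing import Callable, List, Optional, Tuple

GitRunner = Callable[[List[str]], Tuple[int, str]]


def run_git(args: List[str]) -> Tuple[int, str]:
    """Run git and return (exit code, stdout)."""
    proc = subprocess.run(["git", *args], capture_output=True, text=True)
    return proc.returncode, proc.stdout


def check_push(old_master: str, new_tip: str,
               git: GitRunner = run_git) -> Optional[str]:
    """Return None if the push is acceptable, otherwise an error message."""
    # Precondition from the comment: non-fast-forward pushes are refused.
    code, _ = git(["merge-base", "--is-ancestor", old_master, new_tip])
    if code != 0:
        return "non-fast-forward push"

    # 1. Zero or one merge commit among the pushed commits.
    _, out = git(["rev-list", "--min-parents=2", f"{old_master}..{new_tip}"])
    merges = out.split()
    if not merges:
        return None
    if len(merges) > 1:
        return "more than one merge commit in push"

    # Unpack the parents: output is "<merge> <parent1> <parent2> ...".
    _, out = git(["rev-list", "--parents", "-n", "1", merges[0]])
    parents = out.split()[1:]
    if len(parents) != 2:
        return "merge must have exactly two parents"
    lhs, rhs = parents

    # 3. The LHS of the merge must be the existing master.
    if lhs != old_master:
        return "first parent of the merge must be the current master"

    # 2. The other side must be strictly ahead of the current master.
    code, _ = git(["merge-base", "--is-ancestor", old_master, rhs])
    if code != 0 or rhs == old_master:
        return "Rebase a merged branch before merging it into master"
    return None
```

This does not check whether extra commits were pushed on top of the merge; whether to allow that is a policy choice the comment leaves open.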

> we use a pre-receive hook

But in the context of this discussion, LLVM is moving to GitHub and there is no possibility of custom pre-receive hook there as far as I know.

As discussed in the linked E-Mail thread they're either talking about using GitHub as a publishing platform for something where they do have custom hooks, or making everything go through GitHub pull requests.

Using either of those methods they can enforce this. If they go through PRs, for example, they can write a script that grabs the refs for the PR, rebases the commits there (and --no-ff merges them, depending), pushes them to master, and closes the PR as merged. The user running that script would be the only one allowed to commit to "master", and would use its own scripted method of integrating them, instead of clicking "Merge" in the GitHub UI.
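Such a script might boil down to a command sequence like the one below. This is only a sketch: the function name, the local branch name `pr-<number>`, and the remote/target defaults are assumptions, and closing the PR (which would go through the GitHub API) is left out. It relies on GitHub exposing each pull request's head as `refs/pull/<number>/head`.

```python
from typing import List


def integration_commands(pr_number: int, remote: str = "origin",
                         target: str = "master") -> List[List[str]]:
    """Commands a bot account could run to land one PR with rebase + --no-ff merge."""
    topic = f"pr-{pr_number}"  # hypothetical local branch name
    return [
        # Refresh the remote-tracking branch for the target.
        ["git", "fetch", remote, target],
        # Fetch the PR head into a local branch.
        ["git", "fetch", remote, f"+refs/pull/{pr_number}/head:{topic}"],
        # Rebase the PR commits on top of the current upstream tip.
        ["git", "rebase", f"{remote}/{target}", topic],
        # Reset the local target branch to upstream, then merge with a merge commit.
        ["git", "checkout", "-B", target, f"{remote}/{target}"],
        ["git", "merge", "--no-ff", "--no-edit", topic],
        ["git", "push", remote, f"{target}:{target}"],
    ]
```

Each entry can be passed to `subprocess.run(cmd, check=True)` so the run stops at the first failure, for example a rebase conflict that has to go back to the author.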

LLVM allows its contributors to push directly to master.

If we were to go through pull-requests, we wouldn't need anything as GitHub has this setting built-in: https://help.github.com/articles/configuring-commit-rebasing...

The rule we use is: DAGgy fixes for preference, then create multiple fix/feature branches and use rebases to eliminate conflicts. The idea is that each individual branch is based on a commit chosen such that it won't conflict when merged.

This is generally rather easy to do, and for feature work it's compatible with a 'simple' rebase workflow: simply rebase onto master whenever conflicts start to show up. For bugfixes it's rather useful to be able to base a fix on the commit which caused the bug (or a common ancestor of all maintained releases which have it) and just submit multiple merges.

Conflicts are a fact of life: you either deal with them as part of a rebase, or you deal with them when merging. But the testers or reviewers are the ones accepting the merge, and they might not be the right people to resolve the conflicts (plus GitHub's PR workflow provides no nice way to deal with merge conflicts), so the author should fix them.

tl;dr: Non-FF non-conflicting merges only, so your changes get integrated as a 'single' commit in first-parent history, but you get to pick where you base the branch. Props for minimising 'copy' branches.

Does depend on density and structure of code, of course. If every feature branch is touching the same file, you're going to have problems either way.

So, this doesn't suck. The merge commits are noise, but you can filter it out.

As I've explained before, at Sun we used a rebase workflow -- linear history -- for decades. Here's one comment on HN from me about this:


IMO: linear history all the way. Please!

How many developers contribute to a given repository using this model? A problem we have with rebasing is that developers wanting their code merged are constantly having to start CI pipelines over because they had to rebase because another developer's code merged into master before theirs.

At Sun, just in Solaris engineering it was over 2,000 during the 8 years I was there.

For large projects, the upstream would get closed for a few hours (or days, even, like when SMF and ZFS integrated) so that tests could all pass without getting reset.

Optimizing CI/CD is still important for the smaller pushes.

A merge workflow doesn't really prevent pushes that get in ahead of yours from resetting the testing of yours, so anyways, I think the CI/CD thing is orthogonal to merge vs linear history.

So were developers constantly having to rebase their feature branches trying to race them into master before another developer merged? How did you solve that problem?

Regarding conflicts, I don't see how merge vs rebase makes a difference. If some other change sneaks in before yours and causes conflicts, well, you'll have to fix them -- rebase or merge makes no difference.

Incidentally, I have a script for rebasing a branch across thousands of commits that "bisects" so that for any conflicts you're always rebasing across one commit. I.e., first it tries to rebase across all upstream commits, and if there are no conflicts, it's done; else it picks the N/2'th commit to rebase onto, and repeats. When that's done, it restarts from whichever commit it ended up at.
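The script itself isn't shown, but the idea as described could look something like the following sketch. Everything here is an assumption for illustration: the function names, the callback-based design, and the choice to hand a single conflicting commit to a manual-resolution callback.

```python
import subprocess
from typing import Callable, List, Sequence


def git_try_rebase(target: str) -> bool:
    """Try to rebase the current branch onto target; abort and report False on conflict."""
    if subprocess.run(["git", "rebase", target]).returncode == 0:
        return True
    subprocess.run(["git", "rebase", "--abort"], check=True)
    return False


def bisecting_rebase(upstream: Sequence[str],
                     try_rebase: Callable[[str], bool],
                     resolve_manually: Callable[[str], None]) -> List[str]:
    """Rebase across upstream commits (oldest first), halving the jump on conflict.

    Returns the commits that were rebased onto, in order.
    """
    steps: List[str] = []
    start = 0  # index of the first upstream commit not yet rebased across
    while start < len(upstream):
        span = len(upstream) - start  # first try everything that is left
        while True:
            target = upstream[start + span - 1]
            if try_rebase(target):
                break
            if span == 1:
                # The conflict is isolated to a single upstream commit.
                resolve_manually(target)
                break
            span //= 2
        steps.append(target)
        start += span
    return steps
```

With `git_try_rebase` as the `try_rebase` callback, a human only ever sees conflicts caused by one upstream commit at a time.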

Parent should clarify, but I doubt they were actually using git (Sun is quite old, and comment said "for decades", git isn't that old).

In other VCS systems than git, you don't have the same "immutability" issue which means a "push" can be (for example) just sending a "diff" to apply to the server.

We used Teamware, then Mercurial. Always in a rebase workflow, at least from 1992 onward IIRC (I was there from late 2002 through one year past the completion of the Oracle acquisition).

git rerere and mostly automatic rebasing. very rarely there are conflicts.

I edited my comment to respond to that.

> are constantly having to start CI pipelines

Presumably the tooling does this automatically though? They don't literally have to start CI pipelines themselves?

Correct, the pipeline is automatically restarted, but it takes a good 5 minutes to run to completion, in which time another developer might sneak in a merge and the cycle continues.

Merge or rebase is orthogonal to CI/CD.

Our policy is that code can't merge until CI passes.

If you rebase, you have to restart CI.

If another developer merges, you have to rebase.

As a result I sometimes get stuck in a cycle where I'm not fast enough on the trigger and it's difficult to get my code merged because while I am waiting for CI to pass, someone else merges and I get to rebase and start CI over.

When you merge you still need to restart the CI. You're taking a shortcut if you have a head that passed CI, do a merge w/ no conflicts and push without redoing the CI. EDIT: And you can totally take that same shortcut with rebase. Yes, with rebase you've lost the metadata about the previous head, but you can actually keep it by including a merge commit whose only purpose is to record the previous head of your branch (you can strip these out at push time, or keep them if you like, but they will essentially be useless -- at Sun we called merge commits "turds").

Merge-vs-rebase is totally orthogonal to CI/CD, but if you find that "merging means not restarting the CI" then you've already got a bias in favor of merging in your policies and tooling, and that's why you have that perception. I would challenge you to understand how your CI/CD is forcing a merge workflow rather than merely accepting it unquestioningly.

Right, but how's that different between a rebase and a merge commit? In both cases, new commits appear and you need to re-run to get your commit (merge or rebased) in on top of them.

Our current policy in GitLab is like a combination of 1 and 2:

> Merge commit with semi-linear history

> A merge commit is created for every merge, but merging is only allowed if fast-forward merge is possible. This way you could make sure that if this merge request would build, after merging to target branch it would also build.

A more general approach to this is something like https://bors.tech/ where each merge is explicitly tested before hitting the main branch. Branches then don't need to be manually rebased, but there's still much reduced risk of failures due to not merging (if tests aren't even run on the potential merge commit), or "merge skew" (if they are, but not every time the target branch is updated, like Travis CI, etc., do by default).
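As a toy model of that idea (not how bors itself is implemented), a queue can test each candidate merge against the current tip of the main branch and only advance the branch when the tests pass. The list-of-names representation and the `tests_pass` callback are assumptions made purely for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def run_merge_queue(main: List[str], queue: Sequence[str],
                    tests_pass: Callable[[List[str]], bool]
                    ) -> Tuple[List[str], List[str]]:
    """Process PRs one at a time; return (new main history, rejected PRs)."""
    rejected: List[str] = []
    for pr in queue:
        # Stand-in for "merge the PR into the *current* main".
        candidate = main + [pr]
        if tests_pass(candidate):
            main = candidate  # main only ever moves to a tested state
        else:
            rejected.append(pr)
    return main, rejected
```

Because every candidate is built on whatever main looks like at that moment, a PR that was green on its own branch but breaks in combination with an earlier PR is caught before it lands, and nobody has to rebase by hand to find out.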

Exactly, we prefer that too. But we haven't found a way to enforce that on GitHub. Are there built-in pre-commit hooks for that on GitLab?

Not sure about GitHub. In GitLab it's just a Merge Request setting[1].

[1] https://docs.gitlab.com/ee/user/project/merge_requests/#semi...

In my organization we use merge commits because references to the original commits in the PR discussion (including code comments) get all confused after a rebase.

Is there a good way to avoid that?

A nice thing about merges is:

If you have CI on every PR, and you require all merges into master to be fast-forward merges, then you a) guarantee that master is always green and b) don't need to run CI again on the master branch.

If you want a) integration tests that may take some minutes, and b) a bullet proof guarantee on always having those tests green on master...

..then I don't think linear history can be made to work (without a lot of real rebase pain in daily work). I would love to be proved wrong.

> b) don't need to run CI again on the master branch.

Not exactly true. We have a different CI pipeline run based on whether the commit is on a feature branch vs. master.

On a feature branch you probably just want to make sure it builds and passes tests. On master you might want to do things like generate documentation, generate a code-coverage report (doing this on a feature branch might be very time consuming), push to production, etc.

I definitely want those things you say done on the feature branch. Whether push to production happens before or after integration to master isn't critical, but it should be a close 1:1 between master and prod (within minutes).

> generate a code-coverage report (doing this on a feature branch might be very time consuming)

I'd argue that a feature branch is the place where you want to run code-coverage reports, to make sure you improve the numbers

This is quite simplistic: it does not scale on projects that combine many PRs in flight (many developers) with hours of CI resources needed to validate each one.

I believe that I am on some level OCD and like things to be nice and neat, but I think the push that some people have for these linear histories on git takes things way too far. It could be that I'm OCD on the other side of this in that I don't want to do things that modify or attempt to hide the messy realities of distributed development. I do merges, never squash, never rebase, and my tree may be busy but it's easy to follow. A linear history in my opinion only really works for small projects where development is actually linear.

Feature branch, review, squash, merge - only way to roll imho.

The question is how the merge should be enforced. Should the developer have to rebase their feature branch or not? The advantage of rebasing is that you can run CI on the rebased branch, and if it passes, it's guaranteed to pass post-merge. Not so for non-rebased merges.

The downside of rebasing is that you need some kind of queuing mechanism for pull requests so that developers aren't constantly rebasing and racing each other to get changes in.

Squash or not depends on the nature of the change. Sometimes it can be a feature branch with enough work in it that rolling it all up into a single large commit is doing a disservice to somebody who's going to be doing code archaeology later while debugging.

For longer running larger features I would use interactive rebasing to selectively squash the larger feature into several more bite sized commits that declare intent and then merge that

Indeed, and if they're using GitHub, there's actually a 'Squash & Merge' option that can be enabled for all PRs (and possibly the other options can be disabled also). This keeps the feature branch history in the PR, but squashes and merges a single commit into master. Best of both worlds!

No squashing. Unless your log consists of developer diary instead of properly fragmented logical changes - in this case squash before final review (and not necessarily into a single commit).

This discussion is bound to LLVM's current process: phabricator reviews and commit-but-revert-if-it-breaks. Several folks on the thread suggest that it can be decided independently but IMHO it should not be. Want to leverage GH? Then let's really leverage it!

Use pull request reviews.

Use a sane subset of existing buildbot configurations in webhook-triggered CI to maximize coverage for some "reasonable" build+test duration, optimistically batch commits for test. Pay for CI service from Circle/Azure/Travis/whoever, or use existing LLVM infrastructure to execute CI. Some of these CI services are downright simple to enable and start using.

Yes, unfortunately, this will probably involve duplicate configurations between CI and buildbots. Yes, unfortunately, commits will still escape the limited CI tests and cause buildbot regressions. It's still a net win IMO.

After having contributed to projects like Rust and others that use an integrated CI, phabricator & arcanist feels too disconnected to me. Whenever I upstream a change to llvm I'm never confident that it's going to work and it feels so dissimilar from a 'normal' workflow.

Please, never rebase published commits if you are an open source project. If your project is used as a submodule anywhere, it screws up release tags hard.

This is about rebasing "user branches" and/or "pull-requests", if your project is referencing un-merged pull-request as a submodule, there is something fishy.

> if your project is referencing un-merged pull-request as a submodule, there is something fishy

I did not mean incomplete pull requests, I meant master and release branches. Though using unmerged stuff is quite common if the project is yet without an established release process, or not maintained well.

You're right, but no one is proposing to rebase the upstream. So many users think that rebase == lie about history... That's just not what any of us rebase proponents ever do -- we do not rebase upstreams. Indeed, the upstreams I manage disallow non-fast-forward pushes, yet I rebase all the time, just not the upstreams.

The only arguments I hear against the linear history model are:

1. A branching history allows use of git bisect to identify where breaking bugs were introduced.

2. An aesthetic argument to the effect of: it's awesome to have that branching history to explore, to see alternatives and choices. A linear history is a recitation; a branching history is a multiverse.

I don't find either convincing. 2 is a personal preference, but seriously, who has time to go wandering through commit histories looking for precious gems?

1 seems like a more valid argument, but against it: who actually uses git bisect? In both essays I've read arguing for it, they cite "once I used it to find a really gnarly bug". It doesn't seem like a useful everyday tool, or am I missing a larger set of use cases for it? Is git bisect happening a lot more than it seems to be?

git bisect is actually easier to use with a linear commit history. And it is useful; I've root-caused regressions with it multiple times. So I think this is actually an argument in favor of linear commit history.

Why can't you use bisect on linear histories?

git bisect bad on the bad commit, git bisect good on the good commit, and as long as there is an unbroken path in the graph from bad to good, you should be able to do a binary search.
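On a linear history the search is an ordinary binary search. A minimal sketch, assuming the commits are listed oldest first, the first is known good, the last is known bad, and the bug stays present once introduced (`first_bad_commit` and `is_bad` are illustrative names, not git APIs):

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str],
                     is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit between a known-good start and a known-bad end."""
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the regression is at mid or earlier
        else:
            good = mid  # the regression is after mid
    return commits[bad]
```

With git itself the equivalent is `git bisect start <bad> <good>` followed by `git bisect run <test command>`, which needs about log2(N) test runs for N commits.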

For what it’s worth I use git bisect at least once a month. I tend to use it when fixing bugs in code I’m less familiar with, and the fastest way to track it down is bisect.

If you have thousands of contributors, then the branch graph becomes incomprehensible. Only linear history will do at that point.

Linear history is easier on everybody but the contributor who can't wrap their mind around it. It's easier on everyone else because git log tells them everything they need to know -- no need to chase branches -- and because commits are meaningful (because the contributor must take the time to make them so).

What's stopping you from bisecting a linear history?
