
RFC: linear history vs merge commits - jupp0r
http://lists.llvm.org/pipermail/llvm-dev/2019-January/129723.html
======
avar
There's an option between #1 and #2 that they're not listing. They could allow
merges, but only merges after a rebase on top of upstream, produced with
--no-ff.

The advantage of that is that you get "linear" history in the sense that each
just-pushed topic builds upon the last one, and you can use merges to group
commits, which allows e.g. easy reverting of an entire buggy feature
introduced over N commits by reverting the merge.
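
A rough sketch of that workflow as a command plan, assuming a topic branch called `topic`, a remote called `origin` and a mainline called `master` (all placeholder names, as are the two helper functions). It only builds the git command lines and executes nothing:

```python
def integration_plan(topic, upstream="origin/master", target="master"):
    """Return the git commands for 'rebase, then --no-ff merge' as argv lists.

    Branch and remote names are illustrative placeholders.
    """
    return [
        ["git", "fetch", "origin"],
        ["git", "rebase", upstream, topic],       # replay topic on top of upstream
        ["git", "checkout", target],
        ["git", "merge", "--ff-only", upstream],  # bring local target up to date
        ["git", "merge", "--no-ff", topic],       # force a merge commit grouping the topic
    ]


def revert_feature_plan(merge_sha):
    """Revert a whole feature by reverting its merge commit.

    -m 1 tells git to treat the first parent (the mainline) as the base.
    """
    return ["git", "revert", "-m", "1", merge_sha]


if __name__ == "__main__":
    for argv in integration_plan("topic"):
        print(" ".join(argv))
    print(" ".join(revert_feature_plan("<merge-sha>")))
```

Because the topic was rebased first, the merge commit's first parent is the previous mainline tip, so first-parent history stays a straight line.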

~~~
gouggoug
Yes this.

I've enforced this rule with my team for years and our git tree is a
completely flat tree with the occasional branch diverging from and merging
back into master.

Here's what it looks like in practice:
[https://imgur.com/a/OMplzuv](https://imgur.com/a/OMplzuv) (although in this
example you can see that someone messed up and thought they rebased before
merging but actually didn't; mistakes happen.)

~~~
mfontani
> you can see that someone messed up and thought they rebased before merging
> but actually didn't

we use a pre-receive hook which checks that the "oldest ancestor" between the
branch being pushed to (i.e. current master's SHA) and the branch being pushed
(i.e. the tip of what is to become the new branch) is actually the current
master's sha; else, the push is rejected with "Rebase a merged branch before
merging it into master".

"oldest ancestor" being:

* take git rev-list --boundary $current_master_sha..$new_sha

* get the last line that starts with "-" (i.e. matches ^-) and strip the leading "-"

* that should be $current_master_sha; if not, error.
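
The three steps above could be sketched in Python roughly as follows. This is only an illustration of the steps as listed, not the actual hook: the function names are made up, and it assumes both SHAs are full-length (as a pre-receive hook receives them), since `git rev-list` prints full SHAs.

```python
import subprocess


def oldest_boundary(rev_list_output):
    """Return the SHA on the last line starting with '-', or None if there is none."""
    boundary = [
        line[1:] for line in rev_list_output.splitlines() if line.startswith("-")
    ]
    return boundary[-1] if boundary else None


def is_rebased(current_master_sha, new_sha, run=subprocess.check_output):
    """True if the oldest boundary commit of master..new is current master.

    `run` is injectable so the check can be exercised without a repository.
    """
    out = run(
        ["git", "rev-list", "--boundary", f"{current_master_sha}..{new_sha}"],
        text=True,
    )
    return oldest_boundary(out) == current_master_sha
```

A hook built on this would print the rejection message and exit non-zero whenever `is_rebased` returns False.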

~~~
Joky
> we use a pre-receive hook

But in the context of this discussion, LLVM is moving to GitHub and there is
no possibility of custom pre-receive hook there as far as I know.

~~~
avar
As discussed in the linked E-Mail thread they're either talking about using
GitHub as a publishing platform for something where they do have custom hooks,
or making everything go through GitHub pull requests.

Using either of those methods they can enforce this. E.g. if they go through
PRs just write a script that e.g. grabs the refs for the PR, rebases (and
--no-ff merges, depending) the commits there, pushes them to master, and
closes the PR as merged. The user running that script would be the only one
allowed to commit to "master", and would use its own scripted method of
integrating them, instead of clicking "Merge" in the GitHub UI.

~~~
Joky
LLVM allows direct pushes to master from its contributors.

If we were to go through pull-requests, we wouldn't need anything as GitHub
has this setting built-in:
[https://help.github.com/articles/configuring-commit-rebasing...](https://help.github.com/articles/configuring-commit-rebasing-for-pull-requests/)

------
cryptonector
As I've explained before, at Sun we used a rebase workflow -- linear history
-- for decades. Here's one comment on HN from me about this:

[https://news.ycombinator.com/item?id=19012599](https://news.ycombinator.com/item?id=19012599)

IMO: linear history all the way. Please!

~~~
umvi
How many developers contribute to a given repository using this model? A
problem we have with rebasing is that developers wanting their code merged are
constantly having to start CI pipelines over because they had to rebase
because another developer's code merged into master before theirs.

~~~
cryptonector
At Sun, just in Solaris engineering it was over 2,000 during the 8 years I was
there.

For large projects, the upstream would get closed for a few hours (or days,
even, like when SMF and ZFS integrated) so that tests could all pass without
getting reset.

Optimizing CI/CD is still important for the smaller pushes.

A merge workflow doesn't really prevent pushes that get in ahead of yours from
resetting the testing of yours, so anyway, I think the CI/CD thing is
_orthogonal_ to merge vs linear history.

~~~
umvi
So were developers constantly having to rebase their feature branches trying
to race them into master before another developer merged? How did you solve
that problem?

~~~
Joky
Parent should clarify, but I doubt they were actually _using git_ (Sun is
quite old, and comment said "for decades", git isn't that old).

In other VCS systems than git, you don't have the same "immutability" issue
which means a "push" can be (for example) just sending a "diff" to apply to
the server.

~~~
cryptonector
We used Teamware, then Mercurial. Always in a rebase workflow, at least from
1992 onward IIRC (I was there from late 2002 through one year past the
completion of the Oracle acquisition).

------
umvi
Our current policy in GitLab is like a combination of 1 and 2:

> Merge commit with semi-linear history

> A merge commit is created for every merge, but merging is only allowed if
> fast-forward merge is possible. This way you could make sure that if this
> merge request would build, after merging to target branch it would also
> build.
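
To make the "fast-forward merge is possible" condition concrete: it holds when the tip of the target branch is an ancestor of the tip of the source branch. A minimal sketch over a toy commit graph follows (the `parents` dict and the function names are made up for illustration; in a real repository `git merge-base --is-ancestor` answers the same question):

```python
def is_ancestor(parents, ancestor, commit):
    """True if `ancestor` is reachable from `commit` by following parent links."""
    stack, seen = [commit], set()
    while stack:
        sha = stack.pop()
        if sha == ancestor:
            return True
        if sha in seen:
            continue
        seen.add(sha)
        stack.extend(parents.get(sha, []))
    return False


def can_fast_forward(parents, target_tip, source_tip):
    """A fast-forward is possible when the target tip is an ancestor of the source tip."""
    return is_ancestor(parents, target_tip, source_tip)


if __name__ == "__main__":
    # Toy history: A <- B (master), B <- C (rebased feature), A <- D (stale feature)
    parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["A"]}
    print(can_fast_forward(parents, "B", "C"))  # rebased feature
    print(can_fast_forward(parents, "B", "D"))  # stale feature, needs a rebase
```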

~~~
deepsun
Exactly, we prefer that too. But we haven't found a way to enforce that on
GitHub. Are there built-in pre-commit hooks for that on GitLab?

~~~
umvi
Not sure about GitHub. In GitLab it's just a Merge Request setting[1].

[1]
[https://docs.gitlab.com/ee/user/project/merge_requests/#semi...](https://docs.gitlab.com/ee/user/project/merge_requests/#semi-linear-history-merge-requests)

------
ufo
In my organization we use merge commits because references to the original
commits in the PR discussion (including code comments) get all confused after
a rebase.

Is there a good way to avoid that?

------
dagss
A nice thing about merges is:

If you have CI on every PR, and you require all merges into master to be
fast-forward merges, then you a) guarantee that master is always green and
b) don't need to run CI again on the master branch.

If you want a) integration tests that may take some minutes, and b) a
bulletproof guarantee on always having those tests green on master...

...then I don't think linear history can be made to work (without a lot of real
rebase pain in daily work). I would love to be proved wrong.

~~~
umvi
> b) don't need to run CI again on the master branch.

Not _exactly_ true. We have a different CI pipeline run based on whether the
commit is on a feature branch vs. master.

On a feature branch you probably just want to make sure it builds and passes
tests. On master you might want to do things like generate documentation,
generate a code-coverage report (doing this on a feature branch might be very
time consuming), push to production, etc.

~~~
dagss
I definitely want those things you say done on the feature branch. Whether
push to production happens before or after integration to master isn't
critical, but it should be a close 1:1 between master and prod (within
minutes).

------
optimuspaul
I believe that I am on some level OCD and like things to be nice and neat but
I think the push that some people have for these linear histories on git
takes things way too far. It could be that I'm OCD on the other side of this in
that I don't want to do things that modify or attempt to hide the messy
realities of distributed development. I do merges, never squash, never rebase,
and my tree may be busy but it's easy to follow. A linear history in my
opinion only really works for small projects where development is actually
linear.

------
jtms
Feature branch, review, squash, merge - only way to roll imho.

~~~
int_19h
Squash or not depends on the nature of the change. Sometimes it can be a
feature branch with enough work in it that rolling it all up into a single
large commit is doing a disservice to somebody who's going to be doing code
archaeology later while debugging.

~~~
jtms
For longer-running, larger features I would use interactive rebasing to
selectively squash the larger feature into several more bite-sized commits
that declare intent and then merge that.

------
wyldfire
This discussion is bound to LLVM's current process: phabricator reviews and
commit-but-revert-if-it-breaks. Several folks on the thread suggest that it
can be decided independently but IMHO it should not be. Want to leverage GH?
Then let's really leverage it!

Use pull request reviews.

Use a sane subset of existing buildbot configurations in webhook-triggered CI
to maximize coverage for some "reasonable" build+test duration, optimistically
batch commits for test. Pay for CI service from Circle/Azure/Travis/whoever,
or use existing LLVM infrastructure to execute CI. Some of these CI services
are downright simple to enable and start using.

Yes, unfortunately, this will probably involve duplicate configurations
between CI and buildbots. Yes, unfortunately, commits will still escape the
limited CI tests and cause buildbot regressions. It's still a net win IMO.

After having contributed to projects like Rust and others that use an
integrated CI, phabricator & arcanist feels too disconnected to me. Whenever I
upstream a change to llvm I'm never confident that it's going to work and it
feels so dissimilar from a 'normal' workflow.

------
lostmsu
Please, never rebase published commits if you are an open source project. If
your project is used as a submodule anywhere, it screws up release tags hard.

~~~
Joky
This is about rebasing "user branches" and/or "pull-requests", if your project
is referencing un-merged pull-request as a submodule, there is something
fishy.

~~~
lostmsu
> if your project is referencing un-merged pull-request as a submodule, there
> is something fishy

I did not mean incomplete pull requests, I meant master and release branches.
Though using unmerged stuff is quite common if the project is yet without an
established release process, or not maintained well.

------
fatbird
The only arguments I hear against the linear history model are:

1. A branching history allows use of git bisect to identify where breaking
bugs were introduced.

2. An aesthetic argument to the effect of: it's awesome to have that branching
history to explore, to see alternatives and choices. A linear history is a
recitation; a branching history is a multiverse.

I don't find either convincing. 2 is a personal preference, but seriously, who
has time to go wandering through commit histories looking for precious gems?

1 seems like a more valid argument, but against it: who actually uses git
bisect? In both essays I've read arguing for it, they cite "once I used it
to find a really gnarly bug". It doesn't seem like a useful everyday tool, or
am I missing a larger set of use cases for it? Is git bisect happening a lot
more than it seems to be?

~~~
KerrickStaley
git bisect is actually _easier_ to use with a linear commit history. And it is
useful; I've root-caused regressions with it multiple times. So I think this
is actually an argument in favor of linear commit history.
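
To illustrate why a linear history suits bisection: with commits in a single ordered list, finding the first bad one is a plain binary search. The sketch below is not how git bisect is implemented (it handles arbitrary commit graphs); the function name and the sample history are made up, and it assumes the oldest commit is good, the newest is bad, and a regression stays bad once introduced.

```python
def first_bad_commit(commits, is_bad):
    """Binary-search a linear history (oldest first) for the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and that every commit
    after the first bad one is bad as well.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


if __name__ == "__main__":
    history = [f"c{i}" for i in range(8)]      # c0 (oldest) .. c7 (newest)
    broken = {"c5", "c6", "c7"}                # regression introduced in c5
    print(first_bad_commit(history, lambda c: c in broken))
```

Each test halves the remaining range, so N commits need about log2(N) builds.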

