
How well do various revision tools handle merge conflicts? - decklin
https://plus.google.com/u/0/113507488402105365870/posts/WHCbnMsqLoB
======
notaddicted
Flashback: Discussion of merging

2009: Git: Bram Cohen vs Linus Torvalds
<http://news.ycombinator.com/item?id=505876>

which refers to

2007: A look back: Bram Cohen vs Linus Torvalds
<http://www.wincent.com/a/about/wincent/weblog/archives/2007/07/a_look_back_bra.php>

which refers to

2005: Re: Merge with git-pasky II.
<http://www.gelato.unsw.edu.au/archives/git/0504/2153.html>

Where Linus says:

For example, it seems like most SCM people think that merging is about getting
the end result of two conflicting patches right.

In my opinion, that's the _least_ important part of a merge. Maybe the kernel
is very unusual in this, but basically true _conflicts_ are not only rare, but
they tend to be things you want a human to look at regardless.

The important part of a merge is not how it handles conflicts (which need to
be verified by a human anyway if they are at all interesting), but that it
should meld the history together right so that you have a new solid base for
future merges.

In other words, the important part is the _trivial_ part: the naming of the
parents, and keeping track of their relationship. Not the clashes.

For example, CVS gets this part totally wrong. Sure, it can merge the
contents, but it totally ignores the important part, so once you've done a
merge, you're pretty much up shit creek wrt any subsequent merges in any other
direction. All the other CVS problems pale in comparison. Renames? Just a
detail.

And it looks like 99% of SCM people seem to think that the solution to that is
to be more clever about content merges. Which misses the point entirely.

Don't get me wrong: content merges are nice, but they are _gravy_. They are
not important. You can do them manually if you have to. What's important is
that once you _have_ done them (manually or automatically), the system had
better be able to go on, knowing that they've been done.

~~~
haberman
I agree with Linus that creating a solid merge base is far more important than
clever merging. But I think there is still room for improving on Git in this
respect. A lot of what I'm about to say is inspired by this video from the
Camp guys: <http://projects.haskell.org/camp/unique>

Git forces you to treat your history as a single linear sequence of commits.
This is an unnecessary restriction if some of the changes in that sequence are
totally independent of each other. For example, if two changes touch two
completely different files and are unrelated, why should you be forced to
sequence them in one order?

Here is a practical situation illustrating this limitation. Once in a while
I'll want to patch a coworker's in-progress change into my working directory
where I also have changes. Perhaps I want to build a binary with several
experimental in-progress changes in it. Suppose my coworker's changes are
totally independent of mine (say they touch completely different files).

I can do this in Git by applying his patch to my working directory (or doing a
"git merge" with his branch). But now suppose I'm done with the experiment and
want to back out my coworker's change, so my working directory is left with
only my change. If I haven't made any more tweaks to _my_ change in the
meantime then I'm ok, I can just "git reset --hard HEAD^" to discard my
coworker's change. But what if I made further changes to my change in the
meantime? There's no easy way with Git to manipulate the two changes
independently within the same branch, even though there are no actual
dependencies between them.

Sure you could create a separate branch for the merged thing. Every time you
want to change _your_ part you switch back to your branch, make the change,
then switch back to the merged branch and merge again. But who wants to be
that disciplined? Who should _have_ to be that disciplined when the computer
could do the work of knowing that the two lines of change are independent of
each other?

Git's ability to create stable and verifiable SHA1's is important, and I think
that any future SCM will need to have this capability. But I don't think this
implies that you have to treat the history in a strictly linear way. You could
create SHA1 checkpoints when a particular person wants to publish and/or sign
a tree and its contents, but still allow the individual commits to be treated
in a more flexible way. The SHA1 checkpoints could be like barriers; each
change is either part of the checkpoint or not, and the checkpoints could know
their parent checkpoint(s) so that there is still a verifiable history
available for auditing.

I hope an approach like this could make large projects like Linux more
intuitive to follow. I always found it unfortunate that the graph of commits
for any project with lots of merge activity is totally indecipherable. For
example, here is a screenshot of Git's own Git repository:
<http://i.imgur.com/RyQm3.png> If independent changes could be viewed
independently, and if every merge didn't have to be an explicit commit,
perhaps this could be easier to follow.

There are definitely lots of unanswered questions here and I don't claim to
have all the answers. My point is just that I don't think Git is necessarily
the last word in distributed version control.

~~~
ithkuil
> If I haven't made any more tweaks to my change in the meantime then I'm ok,
> I can just "git reset --hard HEAD^" to discard my coworker's change.

You can use "git rebase -i" to remove the unwanted commit from the history.

Git rebase is very powerful, definitely worth mastering.

~~~
rwmj
This. In fact we have a policy of never using merge, but always using rebase
(to the point where we think 'git pull --rebase' should be the default action
of pulls).

~~~
lparry
+1, although we use "git fetch origin && git rebase -p origin/branchname" to
avoid the nasty behaviour of 'git pull --rebase', where it rewrites all commits
of a merged branch onto the current branch instead of just redoing the merge
commit. Looking back at an 18-month-old part of the history and seeing (feature
branch x was merged here) is far more helpful than finding a bunch of
duplicate commits IMHO.

------
jcdavis
Some of the more complicated merges, e.g. "adjacent lines" scare me. The
comment says "They clearly don't conflict since they don't modify the same
lines." And while that seems obvious for humans reading the given test case,
it seems easy enough to construct a situation where that is not the case due
to, for instance, a function call spread out over multiple lines.

Sadly these merges require a fair amount of language-specific knowledge. That
isn't something we can never expect merge tools to do, but one has to be
realistic.

~~~
qznc
More than language-specific knowledge is necessary. Let's say we have two
branches to merge:

branch A: rename foo() to bar() and adapt calls

branch B: add baz(), which calls foo()

We can assume that there is no merge conflict here, since A touches various
lines within major code blocks and B adds some lines between two code blocks.

Now show me one merge tool that understands that the call to foo() within
baz() must also be renamed to bar(). Most tools will probably just merge and
produce a broken build.
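
A minimal Python sketch of this scenario, for illustration only. The `merge3`
helper is a hypothetical, simplified line-based three-way merge built on
`difflib` (it refuses hunks that overlap or touch); it is not the algorithm of
any particular tool, and the foo/bar/baz sources are made up to match the two
branches above. The merge raises no conflict, yet baz() fails when it runs:

```python
from difflib import SequenceMatcher

def hunks(base, other):
    """Changed regions of `other` relative to `base` as (start, end, new_lines)."""
    sm = SequenceMatcher(a=base, b=other, autojunk=False)
    return [(i1, i2, other[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]

def merge3(base, a, b):
    """Hypothetical simplified three-way merge over lists of lines."""
    ha, hb = hunks(base, a), hunks(base, b)
    for s1, e1, _ in ha:
        for s2, e2, _ in hb:
            if s1 <= e2 and s2 <= e1:
                raise ValueError("conflict: hunks overlap or touch")
    merged = list(base)
    # Apply hunks from the bottom up so earlier indices stay valid.
    for start, end, new_lines in sorted(ha + hb, key=lambda h: h[0], reverse=True):
        merged[start:end] = new_lines
    return merged

BASE = """def foo():
    return 1


def main():
    return foo()
""".splitlines()

# Branch A: rename foo() to bar() and adapt the existing call.
BRANCH_A = """def bar():
    return 1


def main():
    return bar()
""".splitlines()

# Branch B: add baz(), which calls foo().
BRANCH_B = """def foo():
    return 1


def baz():
    return foo() + 1


def main():
    return foo()
""".splitlines()

merged = merge3(BASE, BRANCH_A, BRANCH_B)  # textually clean, no conflict raised

namespace = {}
exec("\n".join(merged), namespace)
print("main() still works:", namespace["main"]())
try:
    namespace["baz"]()
except NameError as err:
    print("merged code is broken:", err)
```

Treating touching hunks as conflicts is a conservative choice made for this
sketch; the failure shown here happens even so, because the two branches edit
lines that are nowhere near each other.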

~~~
InclinedPlane
Imagine you have a modularized compiler that can round-trip between raw
text-based source and parse trees, as well as final binaries with associated
metadata attached. In that case it's not too far-fetched to imagine version
control systems that merge at the level of parse trees, which would allow them
to detect the conflicts you describe.

~~~
qznc
Detection is possible. Just automatically try to build it after the merge.
Auto-fixing seems impossible to me, though.

~~~
rwmj
... unless the reason you renamed 'foo' was so you could introduce another
function called 'foo' which does foo properly/differently.

For a realistic example, suppose you decided that 'foo' should acquire a lock.
So you rename all existing 'foo' to 'foo_nolock', and add a new wrapper 'foo'
which takes the lock and calls 'foo_nolock'.

If your other branch called the original 'foo', it should probably now be
calling 'foo_nolock', but instead it'll be calling the lock function after the
merge, and your compile (or even tests) may not be able to find that error.

~~~
sirclueless
This is why the round trip between source-code and parse tree is so great. Say
branch A adds a call to foo(), and branch B swaps out foo() for foo_nolock().
You can tell from the round trip on branch A that there was a new reference to
foo(). Then in branch B you can tell that the implementation of foo() has
changed.

I'm not sure how you would represent such a conflict. A valid way to resolve
it would be to tell the DVCS, "You dummy, this isn't a conflict, the author of
branch B obviously wanted to change foo() for every call-site, even those he
didn't know about." The normal diff-file syntax of "this branch added these
lines, that branch removed those lines" wouldn't work.

~~~
rwmj
There is already a semantic format for patches:
<http://coccinelle.lip6.fr/sp.php>

However I don't think semantic parsing helps here. For example, suppose I'd
told you (the feature branch developer) that I was going to change 'foo' so
that it had locking semantics, and you had deliberately used 'foo' because of
this. Now when we merge you definitely don't want your 'foo' to be changed to
'foo_nolock'. Alternately you can think of a case where I don't change all
'foo' to 'foo_nolock', so the VCS has no idea what the "rule" is.

~~~
sirclueless
I don't think it is appropriate to change a reference, like you say. If two
people modify the same code, then you should signify a conflict and have
someone resolve the issue by hand. There's no machine on earth that can tell
whether you _meant_ to call foo() or foo_nolock(). The point is to prevent a
false positive (the worst thing by far when merging). If you modify the foo()
function and I add a new reference to it, current line-based merge strategies
will silently resolve that because our edits appear to be far apart, even
though they are semantically conflicting. With some semantic analysis you can
determine that manual resolution is much better. The point is to throw a
conflict, not change a reference silently.
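
A rough Python sketch of that kind of check, using the standard `ast` module.
The `semantic_conflicts` function and its helpers are hypothetical names, and
the rule is deliberately crude: it only looks at top-level-style function
definitions and direct calls by plain name, and it flags a name when one side
changed (or removed) its definition while the other side added calls to it:

```python
import ast
from collections import Counter

def function_defs(src):
    """Map function name -> structural dump of its definition (no line numbers)."""
    return {node.name: ast.dump(node)
            for node in ast.walk(ast.parse(src))
            if isinstance(node, ast.FunctionDef)}

def call_counts(src):
    """Count direct calls by plain name, e.g. foo() but not obj.foo()."""
    return Counter(node.func.id
                   for node in ast.walk(ast.parse(src))
                   if isinstance(node, ast.Call) and isinstance(node.func, ast.Name))

def semantic_conflicts(base, ours, theirs):
    """Names redefined on one side while the other side added calls to them."""
    conflicts = set()
    base_defs, base_calls = function_defs(base), call_counts(base)
    for changed_side, calling_side in ((ours, theirs), (theirs, ours)):
        side_defs = function_defs(changed_side)
        changed = {name for name in base_defs
                   if side_defs.get(name) != base_defs[name]}
        new_calls = call_counts(calling_side)
        conflicts |= {name for name in changed
                      if new_calls[name] > base_calls[name]}
    return conflicts

BASE = "def foo():\n    return 1\n"
OURS = "def foo():\n    return 2\n"  # modifies foo()
THEIRS = BASE + "\ndef baz():\n    return foo() + 1\n"  # adds a reference to foo()

print(semantic_conflicts(BASE, OURS, THEIRS))
```

The sketch only raises a flag for a human to look at; it never rewrites the
reference, which matches the point above.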

~~~
InclinedPlane
The point of better merge tools shouldn't be to automate merging with 100%
correctness, that's an impossible task. Instead, the point should be to have a
high level of accuracy in doing safe merges and in alerting a human being to
an unsafe merge that requires resolution.

------
SteveJS
I thought the three way merge tool was independent of the source control
system. I'm pretty sure that's true for the four systems I've used: hg, tfs,
perforce, and the horrible horrible SLM. I can say however that SLM's default
three way merge just seemed to always do the right thing.

Tools for managing real conflicts seem more interesting. Most conflict
resolution tools seem to 'help' in a way that leaves me completely baffled.
They automate the creation of unintentional edits rather than helping you
understand the history of the changes that led to the conflict, or tracking
what you do during the merge and making it reversible.

I've resorted to temporarily overlaying another source control system to track
my work while resolving large, complicated merge conflicts.

~~~
sirclueless
Inasmuch as a merge is a 3-way comparison between a common parent and two
branches, it is basically DVCS-agnostic. The really interesting thing here is
that Darcs doesn't just do three-way merges: it actually tracks every change
along the way. From what I understand, Darcs conceptually resolves conflicts
as if you rewound one branch and played it on top of the other, and vice versa
simultaneously. A conflict is considered resolved if these operations are
commutative, that is, the order of commits doesn't affect the result. Manual
intervention is required when this fails to be true. This is inherently more
powerful than a three-way merge, because you have the entire history of each
divergent branch to help you understand changes, instead of only the net
effect of each.
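
To make the commutativity idea concrete, here is a toy Python sketch. It is
not Darcs's actual patch theory: the patch representation (replace one line at
a fixed index) and the `apply_patch` and `commutes` names are assumptions made
purely for illustration. Two patches are treated as conflict-free when applying
them in either order gives the same result:

```python
def apply_patch(patch, lines):
    """Apply a toy patch (index, old_line, new_line) to a list of lines."""
    index, old, new = patch
    if index >= len(lines) or lines[index] != old:
        raise ValueError(f"patch does not apply at line {index}")
    return lines[:index] + [new] + lines[index + 1:]

def commutes(p, q, lines):
    """True if p-then-q and q-then-p both apply and give the same result."""
    try:
        return apply_patch(q, apply_patch(p, lines)) == \
               apply_patch(p, apply_patch(q, lines))
    except ValueError:
        return False  # one order fails to apply: manual resolution needed

base = ["a", "b", "c"]
p = (0, "a", "A")  # one branch edits the first line
q = (2, "c", "C")  # the other branch edits the last line
r = (0, "a", "X")  # a third change that also edits the first line

print(commutes(p, q, base))  # independent edits
print(commutes(p, r, base))  # both touch the same line
```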

------
Too
See <http://www.guiffy.com/SureMergeWP.html> for another merge test suite with
some background material. A year ago I tried a few of the tests in various
diff tools; none passed all of them, including Guiffy, even though the article
claims it does. Some of the tests can also be considered subjective or
non-resolvable, but it is still an eye-opener to see how poor the merge tools
really are.

Btw, I thought merge conflict handling was a feature of the diff tool, not the
SCM?

~~~
natep
Try Beyond Compare (I'm not affiliated, just a longtime customer). I just went
through each test case and had no problems. Resolving one of the standard
conflicts involved using 'align with <F7>' to separate the changes at the end
of the file. Most of the pathological cases were solved automatically and
without even conflict markers, and when they weren't, selecting a hunk, right
clicking, and choosing 'take left then right' worked.

~~~
tomlu
Seconded. Beyond Compare is the best merge and diff tool bar none.

------
drivebyacct2
I'd like to see TFS in there. Mostly out of resentment.

~~~
EtienneK
Agreed.

Last week alone I lost code 3 times.

