
Git can't be made consistent - mark_h
http://bramcohen.livejournal.com/74462.html
======
decklin
Another interesting case:
[http://www.kernel.org/pub/software/scm/git/docs/howto/revert...](http://www.kernel.org/pub/software/scm/git/docs/howto/revert-a-faulty-merge.txt)

I'm not sure if the "forget whatever happened" metaphor works for me. In the
"revert a revert" article above, the problem is that merging a topic branch
doesn't cause the first few commits on it to be applied if those commits were
already merged but then reverted: the revert has no effect on the merge.
This is precisely because every commit object, since it includes its parent's
sha1, uniquely determines a history of changes, and commits that are in
both ancestries don't get re-applied.
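A minimal sketch of that reachability argument, under a toy model: commits are plain names (hypothetical, not real sha1s) mapped to their parents, and a merge can only introduce commits reachable from the topic head but not from the target head. Git's real merge is a three-way content merge from the merge base, but that base is chosen by the same reachability, so the already-merged-then-reverted commits count as "already there".

```python
from typing import Dict, List, Set

# Toy history: commit name -> list of parent commit names (hypothetical names).
Parents = Dict[str, List[str]]


def ancestors(head: str, parents: Parents) -> Set[str]:
    """Every commit reachable from head by following parent links, including head."""
    seen: Set[str] = set()
    stack = [head]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def commits_brought_in(target: str, topic: str, parents: Parents) -> Set[str]:
    """Commits a merge of topic into target could introduce: reachable from topic only."""
    return ancestors(topic, parents) - ancestors(target, parents)


# Topic commits T1, T2 are merged into main (M2), then undone by a revert R.
# The topic then grows a fix, T3, and is merged again.
history: Parents = {
    "O": [],
    "T1": ["O"], "T2": ["T1"],
    "M1": ["O"],
    "M2": ["M1", "T2"],  # merge of the topic
    "R": ["M2"],         # revert of that merge
    "T3": ["T2"],        # new work on the topic
}

print(sorted(commits_brought_in("R", "T3", history)))  # ['T3']: T1 and T2 are not re-applied
```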

In Bram's example, you have the opposite problem -- two commits are
semantically the same but were made independently and have different sha1s. If
Linus were drawing this diagram he would label them B and B' (and so on...
there's a lot). To git, B' is totally different so a merge applies the change
"again". If the other person had noticed this and reset their branch to the
first B, the merge would be a fast-forward.
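To see why B and B' look unrelated to git, here is a sketch of how a commit id is derived, assuming git's loose-object layout for commits (a `commit <size>\0` header, then `tree`, `parent`, `author` and `committer` lines, a blank line and the message). The tree id, parent id, author and timestamps are made-up placeholders.

```python
import hashlib
from typing import List


def commit_id(tree: str, parents: List[str], author: str, timestamp: int, message: str) -> str:
    """SHA-1 of a commit object serialised in git's text layout (timezone fixed at +0000)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author} {timestamp} +0000")
    lines.append(f"committer {author} {timestamp} +0000")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    header = f"commit {len(body)}\0".encode()  # size is the body length in bytes
    return hashlib.sha1(header + body).hexdigest()


TREE_B = "1" * 40    # placeholder tree id: both commits record the same content
PARENT_A = "a" * 40  # placeholder parent id
ME = "Someone <someone@example.com>"

b = commit_id(TREE_B, [PARENT_A], ME, 1300000000, "Make change B")
b_prime = commit_id(TREE_B, [PARENT_A], ME, 1300000500, "Make change B")

print(b == b_prime)  # False: same tree, same parent, different timestamp
```

Identical content is not enough: anything in the header (parent, author, time) feeds the hash, so a cherry-pick or an independent redo is a different commit as far as the object model is concerned.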

IMHO, the Don't Do That should apply to creating those commits (by cherry-
picking, or not rebasing duplicated work) rather than merging. Not because
such commits are morally wrong or something like that, but because git
intentionally ("the stupid content tracker") doesn't handle them well. That's
the tradeoff of the nice object model.

Our git workflow at my job is pretty messy and does run into this sort of
stuff. I'd love something just a _little_ more darcs-y, like say grafting
together the two branches in the second example that arrive at the same
content (without having to manage a local grafts file separate from the
repository), but that opens many other cans of worms that I'm sure I'm not
intelligent enough to deal with.

~~~
lysium
Thanks for mentioning B'; at first I did not understand what the problem was.
Merging B' will turn A into B', which is equivalent to B. This causes confusion
if the author of the left branch intended to get rid of B.

------
riffraff
>I have a little secret for you: Git can't be made to have eventual
consistency

David Roundy, the initial author of darcs, seems to disagree on this. From
<https://github.com/droundy/iolaus>:

> I realized that the semantics of git are actually not nearly so far from
> those of darcs as I had previously thought. In particular, if we view each
> commit as describing a patch in its "primitive context" (to use
> darcs-speak), then there is basically a one-to-one mapping from darcs'
> semantics to a git repository.

~~~
gwern
'actually not nearly so far', 'if we view each commit', 'there is basically' -
many differences and gotchas can lurk in such qualifiers.

------
someisaac
I am surprised to see this post from Bram Cohen, as he himself had a heated
argument with Linus Torvalds on git design.

<http://www.gelato.unsw.edu.au/archives/git/0504/2153.html>
<http://news.ycombinator.com/item?id=505876>

~~~
ob
Which he lost... i.e. Linus paid no attention to what Bram said ;)

------
jarin
In short: Don't be a dummy and expect git to be some kind of advanced
artificial intelligence.

~~~
divtxt
Someone who knows this stuff please tell me if my analysis is correct:

I see smart people pointing out "flaws" in software, not realizing the
solution requires strong AI.

(e.g. yesterday's post: <http://news.ycombinator.com/item?id=2455793>)

~~~
dons
Merging _associativity_ doesn't require magic AI.

~~~
divtxt
Sorry - I'm not sure what "merging associativity" is - can you give an
example?

The above article gives us a simple example (A vs B) of a situation where
doing the right thing requires a human (a.k.a. strong AI), because you need to
know the "intent" of a commit.

Is there a simple solution - or even a complex one - which would not require a
human to verify?

For a similar analysis of yesterday's post, see this comment:
<http://news.ycombinator.com/item?id=2455970>

~~~
dspillett
Merge associativity would be where taking an initial stage and merging commit
A then merging commit B (where A and B are commits created independently but
from a common start point) _always_ creates _exactly_ the same results as
merging in commit B followed by commit A. The word "associativity" in this
instance is being used in the same sense as it is used in basic arithmetic:
(1+A)+B === 1+(A+B) === (1+B)+A and (1xA)xB === 1x(AxB) === (1xB)xA.

The merge processes used by Git and other common source control systems are
associative in most circumstances where the two (or more) merges affect
different parts of the code (including different parts of the same source
file). The issue tends to raise its ugly head when the two merges affect the
same lines. For instance:

    
    
        Original:     Commit A:     Commit B:
        line 1        line 1        line 1
        line 2        line 3        line 2
        line 3        line 4        line 3 updated
        line 4                      line 4
    

If you merge in that order, line 2 will get put back, as to simple inspection
that will look like what is intended: merging in A removes line 2, then merging
in B inserts line 2 (which to the merge algorithm is now a new line) and
updates line 3. If you merge B first, line 2 is gone from the result: merging
in B updates line 3 (line 2 doesn't need to be touched, as it is the same), and
merging in A after that removes line 2.

    
    
        Merge A then B:     Merge B then A:
        line 1              line 1   
        line 2              line 3 updated
        line 3 updated      line 4
        line 4       
    

It isn't just deletes/inserts that are affected: changes to the same lines can
produce similarly inconsistent results depending on merge order. The trouble
is that for a DVCS it is impossible to consistently deal with these situations
without a manual merge (or AI better than we currently have). Either output
could be the intention and without context other than the original state and
the two commits you can't tell one way or the other.

A centralised source control system doesn't have this problem because, as far
as the repository is concerned, there is one and only one timeline: commits
happen in one order, so where there is a question the second will always
override the first. This doesn't mean that the CVCS would be _correct_
though, just that it would be _consistent_.

With either a CVCS or a DVCS where a three-way merge can be used (where the
start point of each commit is known, so the comparison is done between the
commit, the original state and the current state), a merge conflict could be
flagged for these issues, but a human still needs to make the final decision,
as no algorithm can be _consistent_ (or _correct_) 100% of the time without a
universe of extra context.
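The three-way merge described above can be sketched in a few lines. This is a simplified, hypothetical diff3-style merge over lists of lines, built on Python's `difflib` rather than on git's own merge code. Its one design decision is to treat edits that overlap or merely touch as a conflict; with that rule the example from this comment stops for a human in either merge order instead of silently picking a result.

```python
import difflib
from typing import List, Tuple

Hunk = Tuple[int, int, str, List[str]]  # (base start, base end, side, replacement lines)


def hunks(base: List[str], other: List[str], side: str) -> List[Hunk]:
    """Regions of base that `other` changed, with the lines it put there."""
    sm = difflib.SequenceMatcher(a=base, b=other, autojunk=False)
    return [(i1, i2, side, other[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


def apply_side(base: List[str], lo: int, hi: int, side_hunks: List[Hunk]) -> List[str]:
    """One side's version of base[lo:hi], applying only that side's (sorted) hunks."""
    out: List[str] = []
    pos = lo
    for i1, i2, _, lines in side_hunks:
        out += base[pos:i1] + lines
        pos = i2
    return out + base[pos:hi]


def merge3(base: List[str], a: List[str], b: List[str]) -> Tuple[List[str], bool]:
    """Line-based three-way merge; overlapping or touching edits become a conflict."""
    all_hunks = sorted(hunks(base, a, "A") + hunks(base, b, "B"), key=lambda h: (h[0], h[1]))
    out: List[str] = []
    pos, conflict, i = 0, False, 0
    while i < len(all_hunks):
        # Grow a group while the next hunk starts at or before the group's end.
        lo, hi = all_hunks[i][0], all_hunks[i][1]
        group = [all_hunks[i]]
        i += 1
        while i < len(all_hunks) and all_hunks[i][0] <= hi:
            hi = max(hi, all_hunks[i][1])
            group.append(all_hunks[i])
            i += 1
        out += base[pos:lo]  # unchanged lines before this group
        va = apply_side(base, lo, hi, [h for h in group if h[2] == "A"])
        vb = apply_side(base, lo, hi, [h for h in group if h[2] == "B"])
        if not any(h[2] == "B" for h in group):
            out += va  # only A touched this region
        elif not any(h[2] == "A" for h in group) or va == vb:
            out += vb  # only B touched it, or both made the same change
        else:
            conflict = True
            out += ["<<<<<<< A", *va, "=======", *vb, ">>>>>>> B"]
        pos = hi
    return out + base[pos:], conflict


original = ["line 1", "line 2", "line 3", "line 4"]
commit_a = ["line 1", "line 3", "line 4"]
commit_b = ["line 1", "line 2", "line 3 updated", "line 4"]
merged, conflicted = merge3(original, commit_a, commit_b)
print(conflicted)  # True
```

Calling `merge3(original, commit_b, commit_a)` reports the same conflict with the two sides swapped, while edits that are further apart (say, deleting line 2 and changing line 4 of a five-line file) merge cleanly in either order.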

If you were presented with the commits above, would you know what should be
done with line 2? Does the change in line 3 depend upon it existing, so you
must keep it, or is it irrelevant, so you should delete it (A says delete, B
doesn't care either way)? Even if you knew that commit B was done later than
commit A that wouldn't mean that it is necessarily the one to trust, and in
any case there might be a more complex set of commits with a mix of conflicts
where A is right in some cases and B in others.

People expecting Git to be associative in these instances are (by my
understanding) asking for the impossible. Perhaps the merge algorithm could be
made a little more intelligent, but I doubt it could ever be 100% correct or
consistent (where consistent implies the associativity of merges). Remember
that what we are dealing with here are edge cases (unless you have lots of
people working on the same areas of the source tree at any one time, in which
case you should probably consider a more hierarchical distributed repository
arrangement). Changing the behaviour will likely create other, similar edge
cases, so it is probably not worth spending many man-hours tweaking the merge
algorithms rather than introducing a little human intervention into the
potentially inconsistent situations. Any changes that get "lost" due to the
wrong decision being made by the automatic merge algorithm or the human will
still be present in a good source control system (unless you have explicitly
told it to purge them), so they are not lost forever.

_Caveat: I've not used Git (or any DVCS) in anger yet,_ but I have been
reading around the area with the intention of starting to use it to track my
personal projects and perhaps recommending it (or something similar) for
consideration at work. This is an issue that I thought about a while ago, and
I'm thankful for this recent discussion, as it has reaffirmed what I decided
after thinking about it back then: these are edge cases that are safe to
ignore until the rare occasion when they happen, at which point nothing is
lost (I may just have to make some decisions manually and/or raise a new
commit to revert changes that were "made in error" due to the inconsistency).
Of course, I lack the experience needed to be confident that I can't be
proven completely wrong on the matter!

~~~
divtxt
Thank you for the explanation.

------
dons
Note that darcs implements the "expected" or "naive" semantics, at the cost of
edge cases that take exponential time (rather than going ahead with unflagged
inconsistent merges).

~~~
pedrocr
The really big insight Linus had, which Bram apparently still doesn't want to
recognize, is that if you design what is essentially a snapshotted
filesystem, the merge algorithm is just a convenience. Any better merge
algorithms can be added to git without touching the format. In fact, any
individual user can pick and choose their merge algorithm, since the repository
just cares about the recorded content history (which trees are parent to which
trees).
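A toy version of that separation, assuming a snapshot is just a path-to-content mapping and a history entry records only a snapshot plus its parents. `ours` loosely mimics what git's `-s ours` strategy does (keep our tree, ignore theirs); `file_level`, `record_merge` and the other names are hypothetical. The strategy is a swappable function, and the recorded entry has the same shape whichever one ran.

```python
from typing import Callable, Dict, List, Optional, Tuple

Snapshot = Dict[str, str]                        # path -> file content (toy "tree")
Result = Tuple[Snapshot, List[str]]              # merged snapshot, conflicting paths
Strategy = Callable[[Snapshot, Snapshot, Snapshot], Result]
History = Dict[str, Tuple[Snapshot, List[str]]]  # commit -> (snapshot, parent commits)


def ours(base: Snapshot, mine: Snapshot, theirs: Snapshot) -> Result:
    """Keep our snapshot as-is and never report a conflict."""
    return dict(mine), []


def file_level(base: Snapshot, mine: Snapshot, theirs: Snapshot) -> Result:
    """Whole-file three-way rule: take the side that changed a file; flag files both changed differently."""
    merged: Snapshot = {}
    conflicts: List[str] = []
    for path in sorted(set(base) | set(mine) | set(theirs)):
        b, m, t = base.get(path), mine.get(path), theirs.get(path)
        result: Optional[str]
        if m == t or t == b:
            result = m          # both agree, or only we changed it
        elif m == b:
            result = t          # only they changed it
        else:
            conflicts.append(path)
            result = m          # keep ours; a human has to decide
        if result is not None:  # None means the file is deleted in the result
            merged[path] = result
    return merged, conflicts


STRATEGIES: Dict[str, Strategy] = {"ours": ours, "file_level": file_level}


def record_merge(history: History, new: str, base: str, mine: str, theirs: str,
                 strategy: str) -> List[str]:
    """Run any strategy, then record only the resulting snapshot and its parents."""
    tree, conflicts = STRATEGIES[strategy](history[base][0], history[mine][0], history[theirs][0])
    history[new] = (tree, [mine, theirs])
    return conflicts


history: History = {
    "O": ({"a.txt": "1", "b.txt": "1"}, []),
    "X": ({"a.txt": "2", "b.txt": "1"}, ["O"]),  # changes a.txt
    "Y": ({"a.txt": "1", "b.txt": "2"}, ["O"]),  # changes b.txt
}
record_merge(history, "M1", "O", "X", "Y", "file_level")
record_merge(history, "M2", "O", "X", "Y", "ours")
print(history["M1"])  # ({'a.txt': '2', 'b.txt': '2'}, ['X', 'Y'])
print(history["M2"])  # ({'a.txt': '2', 'b.txt': '1'}, ['X', 'Y'])
```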

~~~
DougBTX
Git also stores diffs from time to time: <http://book.git-scm.com/7_how_git_stores_objects.html>

~~~
pedrocr
I'm guessing you're referring to packed objects. If I understand them
correctly, they are just there for space efficiency of the _filesystem that is
git_. They're not first-order concepts that _git the DVCS_ builds upon,
just an implementation detail.
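A sketch of why a delta can stay a pure storage detail, assuming git's object id for a blob (SHA-1 of `blob <size>\0` followed by the bytes). The delta here is an invented copy/insert encoding built with `difflib`, not git's actual pack delta format; the point is only that anything that reconstructs the exact bytes yields the same object id, so nothing above the storage layer can tell.

```python
import difflib
import hashlib
from typing import List, Tuple, Union

# Toy delta ops: ("copy", start, end) from the base, or ("insert", literal_bytes).
DeltaOp = Union[Tuple[str, int, int], Tuple[str, bytes]]


def blob_id(data: bytes) -> str:
    """Object id git assigns to a blob with this exact content."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def make_delta(base: bytes, target: bytes) -> List[DeltaOp]:
    """Describe target as byte ranges copied from base plus inserted literals."""
    ops: List[DeltaOp] = []
    sm = difflib.SequenceMatcher(a=base, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:  # "replace" or "insert": emit the new bytes
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no op: those base bytes are simply not copied
    return ops


def apply_delta(base: bytes, ops: List[DeltaOp]) -> bytes:
    """Rebuild the target bytes from the base and the delta ops."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += base[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


old = b"line 1\nline 2\nline 3\n"
new = b"line 1\nline 2 changed\nline 3\n"
stored = make_delta(old, new)        # what a pack might keep instead of `new`
restored = apply_delta(old, stored)  # what readers actually see
print(restored == new, blob_id(restored) == blob_id(new))  # True True
```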

~~~
DougBTX
My thinking when I posted my comment above was that any diff format used in
the repository could be treated as an "internal" format, and any actual merges
that you perform could use any merge strategy that they like, as long as the
commit code converted it into the repository's format on the way in. Which is
why I pointed out that git also uses an internal diff format. However, if your
point is that hg uses an internal format which cannot store particular changes
to files correctly, or requires excessive engineering, then yes, that would be
a problem and I see where you are coming from. I do very much like the
conceptual simplicity of git.

------
Matt_Rose
Nice to see Bram Cohen coming to the same conclusion I did. Having two
branches constantly cross-merging is a bad idea, no matter what SCM you use.

~~~
apenwarr
I'm pretty sure the example in this article _wouldn't_ confuse git: weirdness
like this is the reason git has the "recursive" merge algorithm instead of
just doing a plain three-way merge. A recursive merge basically tries to merge
some of the parents together before doing the final merge, which resolves this
sort of case.
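To make the criss-cross case concrete: in the toy history below (commit names are hypothetical), the two branch heads have two equally good common ancestors, which is the situation where `git merge-base --all` reports more than one base. A plain three-way merge has to pick one of them; the recursive strategy instead merges those bases into a virtual ancestor first. This sketch only finds the bases and does not perform that merge.

```python
from typing import Dict, List, Set

Parents = Dict[str, List[str]]  # commit name -> parent commit names


def ancestors(head: str, parents: Parents) -> Set[str]:
    """Commits reachable from head, including head itself."""
    seen: Set[str] = set()
    stack = [head]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def merge_bases(x: str, y: str, parents: Parents) -> Set[str]:
    """Best common ancestors: common ancestors that are not behind another common ancestor."""
    common = ancestors(x, parents) & ancestors(y, parents)
    return {c for c in common
            if not any(c != d and c in ancestors(d, parents) for d in common)}


# Criss-cross: each branch merges the other's first commit.
history: Parents = {
    "O": [],
    "A1": ["O"], "B1": ["O"],
    "A2": ["A1", "B1"],  # branch A merges B1
    "B2": ["B1", "A1"],  # branch B merges A1
}
print(sorted(merge_bases("A2", "B2", history)))  # ['A1', 'B1']
```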

I do criss-cross merges between git branches all the time with no ill effects.
Maybe non-git VCSes can't handle this sort of thing.

~~~
ob
You need to do criss-cross merges that revert previous commits on one or both
sides of the merges. If you're not reverting you're not hitting Bram's
corners.

This is a tough corner case and I'm pretty sure you can confuse _any_ source
control system currently in production with cases like this. BitKeeper has
some theoretical solutions, but we haven't gotten around to actually testing
them in production.

------
codex
I stopped reading after the first sentence. The author takes some liberties
with the definition of "eventual consistency". Either he doesn't know what it
means, or he likes to demolish terms which used to be defined precisely.

~~~
random42
On a related note, to establish the credentials of the author: he is the
creator of the BitTorrent protocol.

<http://en.wikipedia.org/wiki/Bram_Cohen>

~~~
jacobolus
And has worked quite a bit on the revision control diff/merge problem, e.g.
<http://bramcohen.livejournal.com/37690.html>

------
closedbracket
Mr. Joy is a really ironic name.

------
mrwhy2k
Holy crap... someone still uses LiveJournal as their blog.

