I am the original sentence.
I am a different sentence.
I am the original sentence.
I am the original sentence.
The argument is that in certain cases it can be known which of Bob's 2 sentences is the original and which is the copy (due to context provided by an intermediate commit) and that therefore a correct VCS will figure out that the original is on the bottom:
I am the original sentence.
I am a different sentence.
I am a different sentence.
I am the original sentence.
The problem is not actually solvable. So git doesn't try to solve it. I think that's why it's called "the stupid content tracker."
EDIT: Is there anything worse than "smart" features that only work, say, 80% of the time? The closer they get to 100% the worse it gets, because then you start relying on them and they break right when you stop paying attention.
I thought the point was that if you pull the exact same commits in different order the merge will produce a different result for the same files, meaning that in git the history does matter. Whereas darcs/etc will always produce the same result, such that history does not matter?
Sort of. The OP doesn't write clearly. He's also confused about how git works. What he means is this:
Say Bob has 2 commits (B1-B2) and Alice has 1 (A1)
Scenario 1: Alice merges each of Bob's commits in sequence (i.e. she replays his commit history onto her repo: A1-B1-B2).
Scenario 2: Alice merges only B2 (A1-B2).
The point is that, with git, Alice's repo will be different in each scenario. Because in scenario 2 git doesn't examine commit B1 and use that info to try and figure out what the content in commit B2 "means".
With darcs, on the other hand, her scenario 1 repo will be identical to her scenario 2 repo.
The flip side is that in scenario 2 git will always produce the same result for the same B2, because B1 is irrelevant. With darcs a change in B1 will change the result.
NOTE: "git pull --rebase" actually does "replay commit history" instead of "merge" when pulling code into your repo (result: B1-B2-A1). I use it as my default. The outcome is the same as darcs, the difference is that everything is explicit.
I don't understand where or how you could encounter a circumstance where this would matter. This complaint seems to be an abstract theoretical point (maybe to support git alternatives? dunno) that even esoteric usage of a DVCS would never run into.
I dunno, maybe I'm not being creative enough in my use of histories.
EDIT: Okay this explains everything in a considerably more concise fashion than the article does: http://news.ycombinator.com/item?id=2456529
In the beginning:
In Git, or by applying patches manually, it depends on the order in which you merge. If you merge the `B()A()` branch with the `return 2` branch and then the `A()B()A()` one, you'll get the second result. But if you merge the `A()B()A()` branch directly with the `return 2` branch, you'll get the first one. The same set of changes producing different outcomes.
In Darcs, the history between `A()`, `B()A()`, and `A()B()A()` is checked, and it's seen that the second `A()` is the "original" one, so the `return 2` is applied to that one.
Which means that you won't necessarily get the same behavior merging two Darcs patches as you would merging it within the repository, where there is a history. Git behaves exactly as if you were dealing with patches. I side with Git on this, personally, but it's a valid point - you have history, why not use it?
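In command terms, the two orders look something like this (the branch names are made up; the file contents are the ones from the article):

git checkout -b order1 return2     # start from the `return 2` branch
git merge ba                       # merge the `B()A()` commit first...
git merge aba                      # ...then the `A()B()A()` tip

git checkout -b order2 return2
git merge aba                      # merge the `A()B()A()` tip directly

# per the article, order1 and order2 end up with `return 2` attached to
# different copies of the duplicated block, even though the same commits were merged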
The probable case is something like:
What I "meant to do" could have been as you stated, where both should have changed. Or I could have copied the internals of a function to a new one, and made minor changes around it, and actually do wish to use that new copy as the official version. There is no way to 100% accurately detect such intent without being explicit about it, so I'd prefer something dumb and therefore extremely predictable.
The git people are arguing that the speed you'd lose to gain this commutative property just isn't worth it. I agree.
Alice and Bob both make changes. Alice pulls Bob's change and merges it. Bob makes a second change. Alice pulls the second change and merges it.
Alice and Bob both make changes. Bob makes a second change. Alice pulls Bob's changes and merges.
The final result, which is Alice's change merged with Bob's two changes, ends up different in the two cases, and there were no merge conflicts.
History is EVERYTHING to a VCS. You ALWAYS want exact information of what changed at what time. This lets you do all sorts of cool things like examine the provenance of a file in detail, integrate a similar change across two different branches whose code may have diverged, etc.
Meticulous tracking of history as well as efficient handling of large binary blobs are why the pros almost always rely on Perforce for large projects.
"And so, when we want to merge our code together, Mercurial actually has a whole lot more information: it knows what each of us changed and can reapply those changes, rather than just looking at the final product and trying to guess how to put it together.
"For example, if I change a function a little bit, and then move it somewhere else, Subversion doesn’t really remember those steps, so when it comes time to merge, it might think that a new function just showed up out of the blue. Whereas Mercurial will remember those things separately: function changed, function moved, which means that if you also changed that function a little bit, it is much more likely that Mercurial will successfully merge our changes."
I'd assumed git and mercurial worked the same way.
If you make a change to a file in Git and commit it, the new version will store the full updated contents of that file (delta compression is an orthogonal issue). Indeed, my use of the word "version" is revealing. That concept is secondary in Darcs; changes are what have primary ontological status.
At least for git, git will start by doing a 3-way merge, and if that fails, only then will it try to resolve the merge conflict by looking at the intermediate history. This is much faster, and for Linus, who wants to encourage lots of branching and merging, merge speed is highly important. This is what makes git fundamentally better than svn or cvs: the fact that it can get many more merge cases right, and that it can do this quickly and painlessly. So the darcs folks who say that git only does 3-way merges are incorrect; git can do much more sophisticated things than just 3-way merges. However, it only pulls out these more sophisticated weapons when the simple approach doesn't work (and 95+% of the time, the simple approach works just great).
What Darcs did is focus on the "get many, many, MANY more merge cases right" part, but it completely ignored the "quickly" part of the equation. That's partially because it's amazingly complicated. Just take a look at the Darcs "Theory of Patches", with its obsessive fixation on being able to determine whether or not different patches are commutative, etc., and you get a very strong hint of its complexity right there: http://en.wikibooks.org/wiki/Understanding_Darcs/Patch_theor...
The question is whether this complexity is necessary or not. It certainly does slow things down. And fundamentally, that's the question; is it worth it to slow down nearly every single SCM operation just so that a few corner cases can be handled automatically, instead of requiring minimal human intervention? Since people of good will can disagree on this, the controversy certainly continues to exist. But I think a very large number of people are quite happy with the engineering tradeoff made by systems such as Git and Mercurial.
I stopped using Darcs a few years ago, but I heard the current generation at least resolved the notorious exponential time slowdowns.
Git's speed is definitely a big selling point. More than that, the ecosystem and services like GitHub are what really sold me on it versus alternatives. But Mercurial has a lot to offer and its simpler user interface, better Windows support and extensions like BFiles make it a much better fit for certain use cases.
I shouldn't have been so hasty to say that Mercurial doesn't store changes. But I'd argue, and you seem to agree, that Mercurial's revlog does not reflect a difference from Git in the basic philosophy of merging and the status and role of versions. In both cases you're basically dealing with genealogically annotated purely functional trees. By comparison, Darcs's theory of patches represents a radical departure. At the very least I'm happy that someone is trying to think deep and different thoughts in this area.
You can always spend more time trying to use more data, or to deduce more semantic information, but past a certain point, it's what Linus Torvalds has called "mental masturbation".
For example, you could try to create an algorithm that notices that in branch A a function has been renamed, and in branch B a call to that function was introduced, and when you merge A and B, it automatically renames the function invocation that was added in branch B. That might be closer to "doing the right thing". But does it matter? In practice, a quick trial compile of the sources before you finalize the merge will catch the problem, and that way you don't have to start adding language-specific semantic parsers for C++, Java, etc. So just because something could be done to make merges smarter doesn't mean that it should be done.
Something similar is going on here. Yes, if you prepend and append identical text, a 3-way merge can get confused. And since git doesn't invoke its extra resolution magic unless the merge fails, the "wrong" result, at least according to the darcs folks, can happen. But the reason why git has chosen this behavior is that Linus wanted merges to be fast. If you have to examine every single intermediate node to figure out what might be going on, merges become much slower, since in real life there will be many, many more intermediate nodes that darcs would have to analyze. Given that this situation doesn't happen much in real life (notwithstanding SCM geeks who spend all day dreaming up artificial merge scenarios), it's considered a worthwhile tradeoff.
This all just proves again that there can be no perfect merge strategy;
you'll always have to verify that the right thing was done.
Some people, however, feel that the Git algorithm is good enough, and doing it the Darcs way would be slower without much benefit other than for fairly artificial examples (you have to be doing something where you move a block of code, and then re-introduce that same block back in the original place on one side of the merge, while patching that block on the other side of the merge). Personally, I've found Git's merge strategy adequate for everything I've used it for. Git has support for multiple merge strategies, so if someone wanted to implement a better but slower one as an opt-in, they could do so.
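Strategies are already selectable per merge; the first three below are stock ones, and the last line shows how a hypothetical slower-but-smarter strategy would be invoked:

git merge -s recursive topic    # the default for merging two heads
git merge -s resolve topic      # the older, plain 3-way strategy
git merge -s ours topic         # keep our tree, just record the merge
git merge -s darcsish topic     # hypothetical opt-in strategy

If I remember right, git resolves a non-builtin -s <name> by running a git-merge-<name> executable from your PATH, so such a strategy could be shipped separately without patching git itself.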
For what it's worth, "git pull --rebase" does enforce a specific order to changes (local changes always happen after remote changes), which will produce the same results regardless of when user Bob pulls user Charlie's changes: whether Bob pulls change c1 after committing both b1 and b2, or after committing b1 and before committing b2, the final commit order will always be "a, c1, b1, b2".
Of course, if Bob commits and pushes b1 before Charlie commits and pushes c1, the final commit order will be "a, b1, c1, b2", but how could it ever be otherwise?
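A quick illustration of that order, assuming Charlie's c1 has already reached the shared remote by the time Bob pulls:

git commit -m "b1"
git commit -m "b2"
git pull --rebase           # fetch c1, then replay b1 and b2 on top of it
git log --oneline           # shows b2, b1, c1, a (newest first)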
Ultimately, though, what you really want is for the VCS to just do what you mean. That's a lot trickier than providing mathematical guarantees about patch reordering and convergence.
Some DVCSs, like Darcs, might behave better, but they all seem almost comically slow even for medium-sized repos. If I have to sacrifice git's speed for certain types of correctness (that don't trouble me on a daily basis), I will be VERY reluctant to make that choice.
Git merges files, not file-histories. Git's behavior is simple, clear, and easy to understand.
I can see why you might expect merges to be transitive like this (it would be an elegant property, if it were true), but why does it matter to you? In what way do you use merges that could rely on this expectation?
> There are still some people who still think nothing is wrong with git; that it is okay for the result of a merge to depend on how things are merged rather than on only what is merged; that is it okay for two git repositories that pull the same patches to have different contents depending on how they pulled those patches. I don’t know what to say to those people. Such a view seems like insanity to me.
I think the idea is that the potential problems with this could emerge if you have two people simultaneously doing somewhat larger, complicated merges that hit this core problem perhaps more than once. That may be true, but the probability of it occurring is well below plain old-fashioned human screwups, and the solution ("laborious history comparison, examination, and a reset --hard to a hash by somebody") is the same in either case. I really don't see how fixing this would solve any real-world problem.
FWIW, I use git-svn to handle complex merges in svn because git has a better merge algorithm. While this particular situation doesn't affect that use case - I think it could, but it should be rare with (svn) branch discipline - the fact that it might is something to keep in mind.
More specifically, if they pulled the same patches, the outcome would be identical. What he wants to be able to do is pull the same history by pulling different patches in that history. A patch is the diff between two repository states, and that's all it is. Sadly, diffs are intransitive.
That said: An elegant property? Are you kidding? That is intrinsic to most tools that dare call themselves "source control". Git requires extraordinary explanation if it behaves in an extraordinary fashion.
Bullshit. 'Merge' is one of the most complicated operations in every versioning system. I'm pretty confident svn is 'inconsistent'. Or is that too niche?
Do I understand you right? This is not the expectation for a source control tool?
Merge is not a tool for reproducing a canonical state, it's a tool for combining two or more of them, an entirely different topic.
Any other straw men you'd like to hold up real fast?
1) The article talks about auto-merges. If the code is "too close" by some definition of close, you get a conflict that needs to be manually merged. The article does NOT talk about manual merges.
2) The article is titled "Git is Inconsistent", it doesn't claim Git is WRONG, it claims it is INCONSISTENT. It does different things depending on how you merge and when.
I think consistency in a DVCS is a desirable goal. It should not matter whether you pull A then B, or pull B then A, or whether given a series of commits, you pull after each one, or just once at the end. The end result should be the same.
That it is a rare occurrence only makes it worse. You will mostly trust the auto-merge algorithm until you hit the corner case and it will be very expensive in terms of time/money to fix the mistake.
Git's brilliance/stupidity is precisely that it only tracks contents, so although it could get the right answer, doing so would be very expensive.
Ok. The claim that git is inconsistent is wrong. From OP:
The problem with git’s merging is that it doesn’t satisfy the “merge associativity law” which states that merging change A into a branch followed by merging change B into the branch gives the same results as merging both changes in together in one merge.
There is no such concept in git as "merging both changes in together in one merge".
I have modified a shell script written by Simon Marlow that illustrates, using git, how merging two patches separately can give different results than merging two patches together.
The shell script doesn't do what is claimed. It can't because git has no facility for "merging two patches together". Git can only do 2 things with patches:
1. generate a patch
2. apply a patch
But! git has a function which is equivalent to combining 2 patches in a single merge:
git pull --rebase
The shell script does not use this command. It first applies 2 patches separately. It then applies 1 patch separately.
There are still some people who still think nothing is wrong with git; that it is okay for the result of a merge to depend on how things are merged rather than on only what is merged; that is it okay for two git repositories that pull the same patches to have different contents depending on how they pulled those patches. I don’t know what to say to those people.
This is just incoherent. I have no idea what to say in response because I have no idea what the intended meaning is.
Git can't do (at all) what he wants to accuse it of doing wrong (because it has nothing to do with what git does). So I'm just pointing out the closest approximation to what he's aiming at is to use pull --rebase.
Personally I like to have a straight line history as a default and only merge when required. Rather than always merge by default.
Edit: Ok, I'm not sure I understand the point of the pastebin. Maybe. If you want the lower C to become X you need to git checkout master and then git rebase c. Not the other way around. Is that it?
No, OP is saying "when I cook my food in the microwave for 3 minutes, I get it to a very different temperature than if I cook it for 1.5 minutes first and then another 1.5 minutes"
Git doesn't use history to determine merge behavior (edit: in this circumstance). Git behaves like applying patches. Darcs uses the history to make "intelligent" patches.
It's a matter of taste. If you look at Git as having a history, therefore should use the history, yes, it's incorrect. But if you look at it as a patch manager, it's behaving as it should, and Darcs is frighteningly unpredictable - the numbers on the patch might not match the numbers of the lines it modifies.
I side with Git on this. I can generate patches from Git that will work anywhere, and use them 100% identically within Git as manually applying them. The same cannot be said for Darcs.
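For instance (a sketch; the patch file name is just whatever format-patch generates):

git format-patch -1 HEAD        # write the last commit out as 0001-*.patch
git am 0001-*.patch             # apply it inside git, keeping the commit message
patch -p1 < 0001-*.patch        # or apply the exact same file with plain patch(1)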
Darcs, however, is more declarative--it stores patches. And not just patches but patches with dependencies. This set of patches describes how the current state of the repository is constructed. So when you merge you're really just adding new patches to the repo and it knows exactly what to do to make it work.
The interesting thing is that git has all the information there... It could go through the relevant history, diff everything and put the resulting patches in a darcs-like data structure and then commute patches with darcs' patch theory.
But in the end I'm not sure I'm ready to call darcs' style right and git's wrong. Both of them have fairly easy-to-understand object models, and both have merges that act in accordance with the internal philosophies of those models.
What is the specific problem with the algorithm that causes this?
In other words, we're already at the point of significantly diminished, possibly negative returns on effort. The last few percent will always require some level of human-equivalent intelligence. I think effort here is much better spent elsewhere, like researching general AI or playing on waterslides.
Simple merge strategies are "good enough" in practice.
"We have tried to draw spirals using cartesian coordinates, what we have gets us 90% there, but there are infinities and edge cases involved in getting a perfect spiral. The equations describing them would get so complicated it's just not worth it."
What we have in BitKeeper is the equivalent of polar coordinates... it makes drawing spirals much, much easier ;)
It would be nice if you could give some examples where bk gets the merge right while git doesn't.
One example that bk gets right and git doesn't is precisely the one explained in the article.
I suspect what you'll find is that changing the base in this way, while fixing this problem, would introduce other problems that occur much more regularly, but I hope I'm wrong.
I can see how that can be achieved in the case of fully automatic merges. When merging B2 into C1+B1, you'd effectively un-merge C1+B1, merge B1 and B2, and then merge C1 and B1+B2.
But how would that work if C1+B1 had a conflict that had to be manually resolved? Assuming merging B1+B2 into C1 has the same problem (a fair assumption) will I have to do the same manual fixes again?
Or are they smart enough to look at the failed automatic C1+B1 merge, and generate a patch to that from the manual fixes I did, and then try to use those to resolve the merge of C1 and B1+B2?
I suspect there will be cases where this is just not going to work well.
Can't have everything, I guess.
The good thing is: "how it works" is really simple.
You should treat it like a language (just like all system/unix tools), not an "app".
checkout and reset do completely different things when given files or when not given files.
reset on files should really have been called unadd. reset on refs should really have been jumpto, moveto or something else indicating that the current branch ptr is moved to a new commit. --soft and friends could have been --no-update-index or --no-update-files.
checkout on files should really have been called overwrite. checkout on branch names should probably have been switch, setcurrentbranch or a name indicating that the current branch is being changed.
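To make the overloading concrete (file and branch names made up):

git checkout topic           # switch branches ("setcurrentbranch")
git checkout -- main.py      # overwrite main.py from the index ("overwrite")
git reset topic              # move the current branch ptr ("jumpto"/"moveto")
git reset main.py            # unstage main.py ("unadd")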
pull and push are symmetric names for asymmetric behavior. pull could have been a flag for merge (-f meaning fetch first).
reset --hard was for a long time the only way to move a branch ptr to a new position along with the files, but it has the potentially unintended consequence of also irreversibly deleting working tree changes. If you use it to delete, that's fine, but since you had to use it to move the branch ptr, it is simply wrong to have irreversible damage as a side effect. Especially in an RCS which is used by many as the fail-safe against their own user mistakes.
There's no easy way to see which branches are tracking what. And until recently it was a big PITA to even make the current branch track a remote branch.
Deleting remote branches has awkward syntax (pushing an empty string to a branch name) and then you have to use a specialized command (remote prune) if you want the deletion to be propagated to other repositories.
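That is, something like (names made up):

git push origin :old-branch     # "push nothing" to old-branch, deleting it on the remote
git remote prune origin         # and every other clone has to prune its stale tracking refs itself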
Another annoyance: Git doesn't let you push a detached head to a new remote branch, so you have to create a temp branch ptr to the detached head position and later delete it.
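The workaround looks roughly like this (branch names made up):

git branch temp                  # pin the detached HEAD with a temporary branch
git push origin temp:newbranch   # push it to a new branch on the remote
git branch -d temp               # then clean up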
Git also doesn't have good support for versioned sub-projects. submodule is sub-par, and requires a multitude of extra commands even in the cases that should have been seamless.
I can understand your confusion, given the seemingly separate use cases for reset, but in fact, it makes perfect sense. Reset always does what it says it does. Let's break it down:
git reset --mixed <commit> will make your current HEAD point to <commit>, reset the index to <commit>, and leave your working tree alone. This is useful for "uncommitting" the last commit, e.g. so you can split it up into smaller commits. Example:
git commit -am "lots of changes"
# realize you should really do better
git reset --mixed HEAD~1
git add myfile.py
git commit -m "implemented feature x"
git add yourfile.py
git commit -m "bugfix #3182"
git add dontstage.py
git reset HEAD dontstage.py   # same as "git reset --mixed HEAD dontstage.py", since --mixed is the implicit default
Now, if someone (e.g. easy git: http://people.gnome.org/~newren/eg/) wants to alias git reset HEAD to unadd, that's fine by me. I'm speculating here, but I imagine that the Linus/git dev point of view is: why call it anything other than exactly what it is? It's just nice and elegant that it happens to cover multiple use cases.
The more you get into git, the more you start to realize why some of the commands that seemed arcane in the beginning are simple and elegantly named.
I'm OK with having a low-level primitive like "reset" that doesn't have a simple meaning so cannot have a meaningful name. But then, it should be wrapped with meaningful commands such as "moveto" with flags to avoid touching index or working tree, and "unadd" on top of "reset". Then, I don't think anyone would ever use reset directly, so it would probably be phased out :-)
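You can get part of the way there today with aliases, e.g. (using the names suggested above):

git config --global alias.unadd 'reset HEAD --'
git config --global alias.moveto 'reset --soft'
# then:
git unadd dontstage.py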
All of these ship with Mercurial, but are turned off by default. Enabling them is just a matter of adding a few lines to your hgrc.
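Something like this, say (the extension names are just placeholders for whichever ones you mean; all three happen to ship with Mercurial):

cat >> ~/.hgrc <<'EOF'
[extensions]
rebase =
mq =
color =
EOF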
Though I will add that the index is a horribly named concept and it really bugs me that different commands use different names for it ("--cached", but sometimes "--index"). They need to rename it to "staging" and change all the command line options to --staging (keeping the old ones as hidden backward compatible options, of course... "diff --cached" is engrained in my memory at this point). I think that would make things more consistent and clear.
I wouldn't call strychnine a poison. It's just a quirky food additive.