Still hatin' on git: now with added Actual Reasons (reprog.wordpress.com)
112 points by MikeTaylor on May 12, 2010 | 166 comments



I have read a fair amount of documentation on git. I still don't even know what most git commands do. When people recommend a rebase, I pretty much just pretend that my repository isn't going to die. I wince every time I have to revert a file, as "git checkout" is friendly but "git checkout <filename>" is destructive without warning. I once destroyed a number of local files (I don't remember how) that were thankfully still open in emacs.

git is awesome, and git is terrible, often for the same reasons. git is a Sharp Knife, which is fantastic if you need one, and horrible if you don't. I personally have basic distributed version control needs and git is far too sharp for my needs. I have managed to not cut anything important off, but I tread carefully around each command, often worrying that I will irrevocably destroy my files.

So, +1 for being way better than previous version control systems, but a friendlier modern version control might easily win.


> I wince every time I have to revert a file, as "git checkout" is friendly but "git checkout <filename>" is destructive without warning.

From my point of view, that's like saying that 'rm -rf' is destructive 'without warning.' It does what it was meant to do; you can't expect everything to warn you all the time in an effort to save you from yourself. It would get really annoying, really fast if it asked you to confirm every time you performed an operation in git.

'git checkout <filename>' == 'replace <filename> with the version of <filename> in the index' (if the index is empty, then index == HEAD)
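In other words, something like this sketch ('foo.c' is a made-up name):

    echo "local edit" >> foo.c
    git checkout foo.c    # foo.c is overwritten with the version in the index
    git diff foo.c        # now shows nothing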

I'm also unsure how 'svn checkout <filename>' is less destructive.

[Note: Though these 'git debates' tend to get heated because you have "frustrated people that don't understand git but are trying to use it" on one side and "people that understand git and are frustrated at the 'ignorance of newbs'" on the other side, this is not an attack.]


> that's like saying that 'rm -rf' is destructive 'without warning.'

Well, "rm -r" when run as root does warn. You have specified a flag that explicitly doesn't not ask. Also, "rm" only ever destroys things, whereas "git checkout":

  - creates a new branch (non-destructive, additive)
  - switches to an existing branch (non-destructive)
  - obliterates uncommitted changes to a file (destructive)
And the two major modes (destructive and non-destructive) have no syntactical differences. "git checkout X/Y" is non-destructive if X is the name of a remote and Y a branch, and destructive if X is the name of a folder and Y a file. Deep down, you know this just isn't right.
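For instance (an illustrative sketch; the names are made up):

    git checkout origin/master    # remote/branch: detached HEAD, nothing lost
    git checkout src/main.c       # folder/file: uncommitted edits to main.c are gone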

The chief problem with git is that the user interface is a simple reflection of the mechanics and vocabulary used to implement it. That's great for performance, and the whole thing might seem like a good idea if you happen to be a filesystem hacker, but it's arbitrary nonsense to the rest of the world.

git has been purposely architected so that you do not lose data. "stash" allows you to temporarily store local changes while you test things out. Extensive branching is key to sharing patches in a trackable way. Most destructive commands actually alert you that data will be lost. I really like this philosophy of never accidentally losing anything.

But the interface is inconsistent and needlessly complex, and should not be defended. git could be vastly improved simply by changing the command structure and vocabulary. Destructive commands could universally prompt unless an option ("-f"?) is specified.

Maybe someone could make "tig", which is a proper version control system organized from an everyday user's perspective, based on git.


> Well, "rm -r" when run as root does warn.

No it doesn't, but perhaps you're using a distro (like RedHat) where "rm -r" is actually aliased to "rm -ri".


> 'git checkout <filename>' == 'replace <filename> with the version of <filename> in the index' (if the index is empty, then index == HEAD)

Well, really: "git checkout foo" means:

1) Switch the branch, if foo is a branch.

2) Destroy all local modifications, if foo is a file.

These are two drastically different actions, and they're given the same name. The "rm" command has one name, and does one thing: remove files. It isn't also, sometimes, used to gain access to otherwise inaccessible files. I think, ultimately, the names are the worst part of git for me.

There's checkout, which is ambiguous. I was used to "svn revert" in the past, but I have to use "git checkout" to do that ... at least for individual files. If I want to do them all, "git reset" is usually what I'm looking for in this case. The "git revert" command also exists, but does something different (apply past commits in reverse). Git has three different commands for the "put the files back the way they used to be" concept, as sketched below (at least; are there more I don't know about yet?).
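For the record, the three (the file name is made up; <commit> stands for whatever you're undoing):

    git checkout -- foo.c    # discard uncommitted edits to one file
    git reset --hard         # discard uncommitted edits to everything
    git revert <commit>      # create a new commit that reverses an old one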

Then there's "the in between thing". From the man pages:

If you're doing "git add" then it "updates the index using the current content found in the working tree" and "the content staged for the next commit". If you're doing "git reset" then you have to ask for "--mixed" or "--hard" in order to "resets the index". If you're doing "git diff" then you have to ask for "--cached" to get the "changes you staged for the next commit". Then there's "git ls-files" which talks about "the file listing in the directory cache index" and "--cached Show cached files" and "--stage Show staged contents object name". Wait, "--cached" and "--staged" are different options?

So what is it? Is it "the index" or "the stage" or "the cache"? Do I have to pass "--hard" or "--cached" or "--index"? Whoops, I made that last one up! But wouldn't it be nice if it had one name, and that's always the name I used to refer to it? I know there's something between "files I edit on disk" and "files in the repository" but the documentation does anything but make it clear what that something is even called.
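And all three names really do exist as flags, which is rather the point:

    git diff --cached       # diff the index against HEAD ("cached")
    git ls-files --stage    # list the contents of the index ("stage")
    git reset --mixed       # reset the index but not the working tree ("mixed")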


> 2) Destroy all local modifications, if foo is a file.

Not exactly. It updates the file to the state from the index.

git-checkout is just poorly overloaded. The idea is the same in both operations (updating some set of files in the working directory to some state from this or some other branch) but the two should be separated.


> Not exactly. It updates the file to the state from the index.

Whoops, you got me. I of course meant "in the working tree" when I said local. It's just easy to be close, but not quite right, with the terminology.


'svn checkout <filename>' isn't a valid svn operation; checkout applies to directory urls. Actually, beyond svn revert, I struggle to think of an easy way for you to lose your local modifications. If you really just wanted the latest version of <filename>, I guess you might use svn cat, but you'd have to redirect it to overwrite the other file yourself. For svn update, it won't overwrite local modifications.

svn revert seems to be better named for this purpose.


Sorry, it's been a while. But yea. 'git checkout <filename>' is basically doing what 'svn revert <filename>' is doing. So, I fail to see how one is necessarily 'more dangerous' than the other.


On the face of it, I think it's dangerously misleadingly named, but that's all.


Not a good comparison. "rm" stands for remove; "f" stands for force. Neither "git" nor "checkout" indicate anything irreversible or destructive.


Well, you should be using more than just the 'name of the command' as your determination of whether or not you should just run a command with little to no understanding of the consequences... Unless the command is 'do-something-completely-safe' and it doesn't live up to its name, I'm failing to see the issue. 'git checkout' doesn't convey safety or danger, so I would think that one should try to have a grasp of what it does before randomly running it.


Git checkout also switches branches. "Checkout" sounds pretty safe, and svn has a checkout command (which doesn't do anything but print an error message if you're in a working directory; it's the equivalent of git clone). Finally, in most commands, git is paranoid about not destroying your changes. Combine all of these and you're in for a nasty surprise.


It's still being paranoid. You shouldn't be checking out anything while you have working copy changes unless you really, really mean to. This is why it's not a big deal that it's named the same.

Also, git defaults to working on a branch if the names conflict. If I have branch 'foo' and file 'foo' (both terrible names for files and branches) and try: `git checkout foo` it will switch to the foo branch and not checkout the file. Why would you ever do this with working copy changes except in very rare circumstances?

In short, I'm suggesting that if you have working copy changes then `checkout` means "restore this file to its state in the index" and if you don't have working copy changes then it means "switch to this branch".

Since you always know if you have working copy changes or not, this is a pretty easy distinction to make.


That's not being paranoid, that's depending on me being paranoid. People make mistakes, and a clearer separation of these two functions would make it harder to make this mistake.


I don't really want my tool to be the kind of paranoid you're describing. I know what I'm doing, I've never lost changes like that, and I don't want git acting like it knows better. The commands are semantically correct, you just need to use them in conjunction with your brain.


Er, first, if you'd like to call me stupid, please do so openly.

Personally, I like safety-features on tools. You know that feature that stops rotating blades when they come into contact with fingers? Would "just don't touch the blade" have been an equal solution to you?


I didn't mean to imply you personally are stupid. Just that 'one' wanting to use git should effectively concede that it gives you the tools to cut off your hands if that's what you're aiming to do.

So as to your next point, I don't think that's a very good analogy. If the rotating blades are meant to be used to amputate limbs and to cut logs in half - then yes, I'd say if you are trying to cut a log in half you shouldn't put your arm in it.


"I didn't mean to imply you personally are stupid."

You implied I wasn't using my brain when using git. What did you want to imply with that if not stupidity?

"Just that 'one' wanting to use git should effectively concede that it gives you the tools to cut off your hands if that's what you're aiming to do."

Not sure I get that. I need to use git, so it's my fault? The thing about a mistake is that I wasn't aiming to do it.

"If the rotating blades are meant to be used to amputate limbs and to cut logs in half - then yes, I'd say if you are trying to cut a log in half you shouldn't put your arm in it."

Who would build such a thing? And even further, who would build a machine with 100 buttons, two of which look exactly the same, but one cuts wood and one cuts off your arm?

What is to gain by having a destructive file operation and a non-destructive branch operation have the same name? Why do people seem to feel hurt when people suggest to name them separately?


"What is to gain by having a destructive file operation and a non-destructive branch operation have the same name? Why do people seem to feel hurt when people suggest to name them separately?"

Because they are both the semantically correct name for the operation. Check out a file, overwriting my local unstaged changes. Or check out an entirely new working copy from the point referred to by a branch.

I don't feel hurt that you suggest it, though. In fact I mentioned in another comment that I always thought `git branch` could do the switching itself rather than checkout... I just don't think it's a problem. I know I'm working with a file or a branch, so I just know what will happen. It doesn't matter that the name is the same since its function depends on context.


"Check out a file, overwriting my local unstaged changes. Or checkout an entirely new working copy from the point referred to by a branch."

Well, of course they both seem the same if you use the word "checkout" in the explanation for "checkout." But "rolling back changes in a file" and "switching the current branch I'm developing in" aren't that close anymore. The word itself might mean multiple things, but that doesn't imply it's a good idea to use it in both meanings.

"In fact I mentioned in another comment that I always thought `git branch` could do the switching itself rather than checkout... I just don't think it's a problem. I know I'm working with a file or a branch, so I just know what will happen."

But evidently there are people who do think it's a problem. Also, the argument is that this would prevent mistakes from being destructive. When you know what you want to do, and it does it, it's not a mistake. When you want to do one thing, but the other one happens for any reason (bash completion was one of the mentioned ones), you don't know it.

"It doesn't matter that the name is the same since it's function depends on context."

It depends on the type of its argument. A simple "git checkout foo" is indistinguishable. I'm just a bit astounded by the amount of pushback against this simple proposal for an existing problem.


Once files are added to git it is almost impossible to destroy them irrecoverably. So my impression of git is the opposite: I perform any command without fear of losing anything. If the command doesn't do what I need I just roll back to the last known good state.


'git gc' ? ;-)


http://www.kernel.org/pub/software/scm/git/docs/git-gc.html

"git gc tries very hard to be safe about the garbage it collects. In particular, it will keep not only objects referenced by your current set of branches and tags, but also objects referenced by the index, remote tracking branches, refs saved by git filter-branch in refs/original/, or reflogs (which may reference commits in branches that were later amended or rewound)."


If you have an object that's no longer referenced by anything you can lose it when you garbage collect, which was my point. I was only being half-serious, which is what the wink was meant to convey.


If you are having trouble with git losing your changes you should really be committing more often. Once a change is in git it is very hard to lose, but until it is in git losing it is very easy.


The complaints about git porcelain are totally understandable. The UI is horrendous. However the internals are elegant and actually much simpler than subversion, so if you understand the internals and do a little rote memorization then suddenly you can manipulate your repository expertly, creating better commits and handling unusual needs with ease. It's the difference between being good with a unix shell and running windows configuration wizards to do everything.

I don't expect the OA to grok that, however one thing that I hope the author takes to heart is this:

With git plumbing it would be fairly straightforward to create a new porcelain VCS combining any balance of elegance, simplicity and power that you desire. With subversion on the other hand, it is almost unimaginable that they would ever be able to correctly track a merge without a hundred bizarre edge cases due to the fact that their repository structure muddles branches and directories.

Why hasn't someone who knows git created a better porcelain yet? Probably because once you know git well enough you don't want to give up any of its native power just to satisfy some noob's anxiety attack.


> Why hasn't someone who knows git created a better porcelain yet? Probably because once you know git well enough you don't want to give up any of its native power just to satisfy some noob's anxiety attack.

They exist. gitx, magit, etc... are all great tools.

Most people who use it for any length of time seem to be just fine with it, though. I can recognize a few things as... silly (e.g. checkout), but not so much as to try to introduce a new verb and redefine the semantics just to come up with a better separation when I already know what I'm doing with the tool.

I certainly don't have enough of a problem to write an entirely new interface when the existing one continues to get incrementally better.

Similarly vi and emacs are both great editors whose power cannot be harnessed by those who don't spend time trying to learn them. The vested are less affected by idiosyncrasies.


> So I edit the file, fix the trivial conflict, and git commit filename.

This is a bit vague, but I'm guessing you mean you open up the file in your text editor of choice and edit it directly.

What I do these days, probably originally prompted by exactly the sort of frustration you describe, is to run "git mergetool", which I happen to have configured to fire up emerge, but you can use any of a number of tools. Once I'm done, git is happy and I can just push. Initially I was a little grumpy about learning how to use emerge when before I just edited the file with all its <<<< and ==== and >>>> directly, but after learning the tool, I wish I'd been doing conflict resolution this way back when I used CVS and SVN. It's really much nicer and less error prone.
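The setup is just something like this (I happen to use emerge, but any of the supported tools works):

    git config --global merge.tool emerge
    # then, whenever a merge stops on conflicts:
    git mergetool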

So, I hope that's a helpful hint. You're absolutely right about a lot of the unhelpful & unintuitive messages and commands.


Isn't the message intuitive? Git usually says, "merge conflict, fix the conflict, stage the fix, and run git rebase --continue". If you follow the instructions exactly, everything works.

Now, I know he wasn't doing a rebase (but should have been), but still; when you edit things in git, you stage them, and then commit them. After a merge, the index is mostly ready to be committed, except for the conflicts. So you are supposed to manually stage the fixed conflicts, and then plain "git commit". Simple. It works exactly like anyone who understands git would expect it to.

Now, if you don't like the index, then git simply isn't for you. It's a core concept, you just can't avoid it. (Personally, when I was using non-git version control systems, I often had a separate working copy I used exactly like git's index. So having that built in is a joy for me.)


No, “! [rejected] master -> master (non-fast forward)” is not intuitive. The example you give is one of the pleasantly surprising examples of a good error message from git. But, by and large, git's error messages are pretty awful.


What else should a one-line summary of syncing master -> master say?

"Hello dear user. The remote server refuses to accept your branch, because it would delete information. As a result, your request has been rejected. Please rebase and try again. If you need help, hang up, and then dial your operator."

Personally, I am fine with [rejected]. I figured it out, after all...


Mercurial says this in the same situation:

    abort: push creates new remote heads on branch 'default'!
    (you should pull and merge or use push -f to force)
Yes, it's two lines. I think we all have large enough monitors now that spending an extra line of output to be a bit friendlier is a good thing.


Where does it end? Should a compiler say "unknown identifier 'foo': perhaps you need to import a header file that defines foo, or spell it better, or write code that works instead of this garbage" instead of a simple "unknown identifier: foo"?

No. You just look up the error in the manual, and never think about it again.


LLVM will attempt to guess the proper name of a misspelled identifier for you: http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-re...


It theoretically ends where it's most comfortable for a wide range of people, experienced or not. And hyperbole doesn't help. Telling me to pull first so it can incorporate the remote changes isn't a big deal I think. Where does this idea come from that the interface has to be cryptic?

If it isn't in there because nobody has done it, fine, that's cool and understandable. Actively arguing against better user messages is simply weird to me.


1> Intuitive is not everything

2> To actually make said message more intuitive, it could have a "type git help error ffwd for an explanation"


You said he should have been doing a rebase. I disagree. Almost always, if you think you should rebase, you really should merge instead.


All the people suggesting various git commands he should be using are completely missing the point.


The point -- as I understand it -- is that the author feels that a version control system should act like X. So he takes a random version control system, reads about it briefly, and then tries to use it assuming that it acts like X. The version control system doesn't act like X, which leads to problems for the author. Author writes a blog post about how random version control system sucks because it doesn't meet his assumptions.


That is what I got out of it. A big clue was his "git commit filename". git is not svn. Is git's greatest sin really that it uses a few of the same words with different meanings from how they have been used in the past?

Someone in another post mentioned revert. svn revert file == git checkout file, while git revert patches out a whole change. It actually makes sense if you forget about cvs and svn's version of the API.

I saw the word "intuitive" thrown around... http://www.asktog.com/papers/raskinintuit.html


The only valid point made is that checkout has two functions:

* If there does not exist a branch 'foo', then 'git checkout foo' will reset the file named foo to its state in the index.

* If such a branch does exist, then 'git checkout foo' will move to that branch. You have to say 'git checkout -- foo' to have the first behaviour.
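Concretely (a sketch with made-up names):

    # both a branch 'foo' and a file 'foo' exist
    git checkout foo       # switches to the branch
    git checkout -- foo    # restores the file from the index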


You mean it actually makes sense if you forget the other VCS versions of that command, and know git's? Because ignoring all VCS experience I have, "checkout" would indicate to me that I could "check something out" as in "have a peek at it."


In git, it means "checking out the current branch version of a file" or "checking out an entire branch". Pulling all the patches from another resource also makes sense.

Think of checkout as a local thing to do, and pull/fetch as a remote action.

It can make sense if you do not have prior associations. It is not as if English isn't already terribly abused and ambiguous in our tools. What does "cancel" mean? In Linux with CUPS, it cancels a print job. To cancel a process, you have to kill it, but you cannot kill print jobs. Shocking.

What is OOP? How does that definition change between C++, Java, Smalltalk, Lisp/CLOS, and vimscript? How about MVC? What about the differences in GUI lingo between Object Windows, McClim, Gtk, and qt? In other words, you have to learn the language of each different tool you use, and it is unreasonable to expect that it is the same as your other tools.

Aside from that, it is terribly unfair to try to compare svn to git in any meaningful architectural sense; they are completely different creatures that accomplish some of the same goals. git does it better. And it uses different language to describe its new approach.

I have not had a difficult time switching between using svn and git after using both for a long time, except for missing nice features like "git log -p" and "git add -p" when I work with svn. It seems petty to me to quibble over command names.


"What does "cancel" mean? In linux with CUPS, it cancels a print job. To cancel a process, you have to kill it, but you cannot kill print jobs. Shocking."

Those are different projects; git is one. OO is an abstract concept. We're talking about git communicating to the user, and what level of knowledge the user must have to understand that communication.

The problem is not that git and svn have different opinions on what "checkout" means, it's that git itself has different opinions on what it should mean.

Again, why is it so bad that we say the interface could be better? Why do so many of those that managed to get a grip on this feel the need to strictly deny that there could be any problem in the interface itself?

Yes, Linus designed it. But he's just human too.

I still have yet to see a single useful advantage of mixing up different meanings in the same word. All I get is "It is that way, learn it or GTFO."


When you 'checkout' something from the library, you are not 'having a peek at it' in the 'Hey! Come check this out!' usage of the word. Words can have multiple meanings depending on context. 'git checkout' has no context, so you can't just assume what the author meant when the command was named as such.


I know, and having multiple meanings for a single word that's used as a technical command without visible context is not a good idea. That's what I'm saying.


> without visible context

There is a manual... You don't blame a drill press company just because someone thought it was a good idea to use their drill press to pierce their nose, do you? What's the point of providing people with documentation, if you are going to rail on the product for not being 'so easy' that people don't need to read the documentation?


This is about catching mistakes. No manual will help you there. Documentation doesn't even come into the mix.


The point being that the author isn't looking for solutions, but rather just wants to bitch?


The point being that the solutions are problematic.


And the problems shouldn't exist in the first place.


Linus didn't write git for you. He wrote it for himself. Git works exactly the way Linus (and his merry band of kernel hackers (and others, who thought DVCS was a cool idea when they heard about it and were prepared to relearn SCM to use it)) wants it to work. If you want to learn to use it, cool, welcome to the party. If you aren't prepared to invest the time, that's cool too. But don't try it for like a day expecting it to be svn++ and then complain that it isn't. That's like a Windows user declaring that Linux sucks because it isn't Windows. It wasn't meant to be.

Although, complaining about Ubuntu not doing things the way they "should" is more valid, since Ubuntu is targeted at everyone. :)


Git is gaining a lot of attention lately. It's becoming "the dvcs" now. And that means, if you start using it for your project, you should consider what it means for contributors and users. If it creates an additional step where someone will say "it will take me more time to learn than to create the patch itself", you've lost. If many people start using git, we can start expecting things from git. It's just not about an abstract "you" anymore.


> If many people start using git, we can start expecting things from git.

Why? The author of a piece of software is under no obligation to satisfy the expectations of its incidental users, especially if he's not selling it to them.


If they get big enough... no, they don't have an obligation, but the idea of many people depending on them should be enough. It's like MS had no obligation to satisfy the expectations of IE5 users (it wasn't sold directly) - sure, they didn't - but many got very annoyed, and because of the market share we had internet-wide consequences for standards-compliant websites. Still - they had no obligation.


That is just wrong on so many levels. Firstly, Microsoft marketed the hell out of IE. The bundling of IE with Windows was their way of getting people to use IE instead of Netscape Navigator. Since their stand (in court, no less!) was that IE was an integral part of Windows, which they sold to users, they did have an obligation to satisfy users' expectations of it. Did anyone force you to use git by bundling it with some other piece of software you were using? Nope.

Secondly, in what crazy alternate universe can git, which is given to you free as in libre as well as beer, be compared with IE? If you think git sucks, but still want to use it for some crazy reason, go ahead and modify it to suit your needs.

Lastly, I cannot imagine a world where anyone would be forced to have to cater for a VCS the way we have to for browsers. They're simply too different in function. It's not like you're in danger of being forced into the business of implementing a service supporting commits and checkouts for two different types of repositories on the same source files... unless you work at GitHub. Heh.


Nobody is telling Linus he messed something up. Are we not allowed to raise issues, discuss and bring forth our perceived problems? "Use it as it is or GTFO" doesn't seem like the Open Source way.

The problem here is that there are people telling everybody that "git is easy," "everybody should use git," and then every time someone raises a concern someone brings up the argument above. I wonder, what is your goal? Are you trying to get people who "don't get it" to stop using git? Do you think killing discussions about it will further progress? Are you afraid the interface you learned might change?


I don't follow this at all. Taking time to learn a new and improved way of managing your source code is a deal breaker? These are exactly the people I'd hope to catch by doing phone interviews. "So, what's the last thing you picked up and learned for fun?... Oh, nothing? Ok, thank you for your time."

The creators of git have created/are creating a modern changeset management tool which, yes, you will have to learn to use effectively. This is a good thing.


New - definitely. Improved... it depends. If you think it's improved, fair enough, but this is your opinion. Not everyone has to agree. For example, I think it's got more features, but is lacking in interface and usability. Overall, I don't consider it "improved" enough to adopt in everyday development.

Not spending time on completely learning how to deal with an inferior (imo) tool gives me more time to learn things which matter to me. So it's the exact opposite of "learnt nothing".


But, yes, I do also like to bitch about them :-)


Watch out for the flying door-knobs. ( http://news.ycombinator.com/item?id=1272779 )

Seems that the peasants who have binaries should use rsync manually, instead of expecting tools to do the job.


Thank you.

I suspect these folks were OK with spending hours learning about their DVCS, but for most others, this just seems like unnecessary tedium and frustration, "When can I get back to worrying about the code?"

Someone needs to make the Mac of DVCSs, i.e. one that doesn't require a 20+ page tutorial to figure out, is completely intuitive and "just works."

I currently use Mercurial, and while I don't think it qualifies as the Mac of DVCS's, it seems to be working decently enough so far (for my simple purposes). I could learn git, but thanks to hg-git I'm hoping I won't have to.


Why yes, a lot of us are just fine spending "hours" learning something that we'll probably spend more than a few years using.

Git's user interface isn't perfect, but you're not going to get the "Mac of DVCSs" either. There's a reason for the complexity: doing stuff like merging together the work of three people is inherently complex.

You're not going to come up with some solution that allows the user to "just do it" without also destroying history.


> You're not going to come up with some solution that allows the user to "just do it" without also destroying history.

Really? I've got state A and state B. I tell my DVCS to merge them into state C. After reviewing the automatically guessed changes and manually changing the things which could not be guessed, I commit C'. C' is successor to both A and B.

What exactly has been lost / destroyed here?


Nothing. Conveniently, git allows you to do exactly that.

1. Check out B.

2. Merge A into B.

3. Git tells you there's a merge conflict and what to do to fix it:

     Auto-merging foo
     CONFLICT (content): Merge conflict in foo
     Automatic merge failed; fix conflicts and then commit the result.

4. Edit foo and fix the conflict.

5. "git add foo"

6. Commit.

What's the problem?


Parent claimed that it cannot be done; I responded. Not sure what the problem is...


But you're not committing, you're adding.


My post was unfortunately worded. That's what you get for posting late at night I guess.

What I meant to say is that you're not going to get away with using any sort of complex system without understanding what it's doing. This seems to be the sentiment expressed in the original posting, and some of the replies here.

Sure, Git's UI could be improved. But managing several divergent histories and correctly merging them together is a non-trivial problem. If you make a tool that allows the user to just plow on without understanding the underlying paradigm the results are going to suffer.

It's like asking for a RDBMS that allows you to "just do it". You might get away with that for a while, but not for long.


> Git's user interface isn't perfect, but you're not going to get the "Mac of DVCSs" either. There's a reason for the complexity, doing stuff like merging together the work of three people is inherently complex.

This is a limitation of your imagination. You can write a DVCS that's easy to use and intuitive. Git has an absolutely terrible UI; it borders on idiotic.

One example of a DVCS with a good design and UI is Bazaar. Sort of like the Mac a few years ago: no one uses it, but I wouldn't be surprised if it eventually gains in popularity. I see two things holding it back right now:

- Launchpad is terrible and needs to be at least as good as Github.

- In concert with having a terrible "hub", Bazaar has a small community of users.

To gain users, it will need to out-innovate like Apple did. That means on all fronts, not just bzr itself but the community, marketing, and the websites that support it.


Issues that I've heard of with Bazaar:

- Being pure Python it doesn't have the same performance that git does. This may not show up in day-to-day usage, but things like 'convert this svn repo to Bazaar' tend to bring the bottlenecks/performance issues into the light.

- Bazaar made this weird design decision to have two revision numbers: an 'r12345'-type revision number that is similar to SVN's, but only applies to your local checkout of the repo, and a checksum (SHA1 or MD5) which applies globally (across different checkouts). I think that choosing a revision scheme similar to SVN's, but one that doesn't map 1-to-1 with SVN, is just asking for confusion from newbies. (I'm not entirely sure why one needs to have local and global revision numbers...)


The use case for numeric local revisions is pretty clear. How many global operations do you really do? In reality, nearly all operations are local. So would you rather write r37 or 0b090d7267df? Why wouldn't you want the choice?


Two points:

1. You can abbreviate the SHA-1's in git

2. When you are collaborating on code, it's much more common to use a global identifier so that everyone knows which commit is being referenced. It seems like you are making things needlessly complex by trying to maintain local and global revision numbers, for little (in my opinion) return. Sure, 'r37' is easier to remember off-hand than '5b98d8f' (a shortened SHA1), but if you memorize 'r37' it only means something locally; you'll still have to look up the global revision id when you want to communicate with others.


>Being pure Python it doesn't have the same performance that git does.

Well, hg is nearly all Python (if I recall, their diff algorithm is the only C code) and it's quite comfortably fast. Seems to me that it's a design issue with Bazaar, and not something that one can simply blame on Python.


How do you think the Bzr UI design compares with that of Hg?


Don't give up faith my friend. Git and mercurial are not the end of the story.

I'm optimistic that, within another few years (give or take a decade), someone will come up with a VCS that's both powerful and usable. :-)

Other than that: I enjoyed your article. Very nice walk through some of the problems and hair-pulling that I'm going through regularly, too.

Pro-Tip: My life with git got a bit easier once I started keeping a big bucket of paste-ready shell snippets near my terminal window. You know, those handy little 14-liners that look like line-noise at a glance, but get you out of "situations" when chanted.


> Git and mercurial are not the end of the story.

> I'm optimistic that, within another few years (give or take a decade), someone will come up with a VCS that's both powerful and usable. :-)

Do you think Hg shares the same amount of usability as Git?


link to your 14-liners?


Sadly, I guess most of them don't make much sense without context or the discussion/thinking that led to them.

But for example, this is my snippet for merging my current branch into master and pushing that to origin (hopefully merging/cleaning up in between as appropriate):

  # on local branch
  git fetch origin master
  git rebase origin/master

  # tidy up
  git rebase -i origin/master

  # merge to master
  git checkout master
  git merge $my_branch

  # push to master
  git push origin master

  # go back to local branch
  git checkout $my_branch
I didn't originally come up with all that, mind you. It's pulled from a workflow tutorial and you'll find basically the same procedure recommended in every tutorial (keywords: feature branches and "don't work directly on master").

And yes, I paste that multiple times per day and still throw up a little bit inside my mouth every time.


Your first two commands can be replaced with a one-time config change:

    git config --global branch.autosetuprebase always

Then just "git pull". Without the config change, you can do a "git pull --rebase" each time.

Your "git push origin master" is typically what I spell "git push" (recommend "git config --global push.default tracking" just to make sure you're pushing the same stuff that's tracked automatically).

So, my every day workflow (rebasing upstream and building on top of stuff) looks like this:

    git pull
    [edit a bunch of files]
    git add -p
    git commit
    git push
My mouth is minty fresh.


I guessed I'd have something like that coming and yes, that might work.

Except when you regularly need to create new working copies on varying git versions, where those config details tend to differ in subtle ways. Too often have I had to wrestle with problems on someone's MacBook simply because the config or command syntax changed slightly between git revisions.

I'd go as far as to say that local (i.e. not repository-wide) config settings that change the semantics of how the commands work are deeply harmful in a team-setting.

It's yet another layer of complexity that you have to keep in mind when debugging problems.

Sure, it can save typing when everyone involved uses the same git version and has a good understanding of git. Unfortunately neither has been the case on most projects I've been involved with.


'git commit -a' is not the answer to your merge problem. You should 'git add <file-you-merged>', then commit.


Also, doesn't it tell you to do that when the conflict happens?

edit, yup:

    $ git status
    # On branch master
    # Your branch and 'origin/master' have diverged,
    # and have 1 and 1 different commit(s) each, respectively.
    #
    # Unmerged paths:
    #   (use "git add/rm <file>..." as appropriate to mark resolution)
    #
    #       both modified:      readme
    #
    no changes added to commit (use "git add" and/or "git commit -a")


Nope.


I'm confused about what part of:

  (use "git add/rm <file>..." as appropriate to mark resolution)
is confusing...


see above, I edited to add what I'm talking about


My bad, I was wrong on this.

I would just delete my comment, but that would leave your replies looking dumb. Instead, feel free to downvote my earlier "Nope." into oblivion.


No worries, color me unoffended. I should have generated the message I was talking about before I posted anyway.


You'd still need to commit the files you pulled (that have changes) though right? Which is sort of fundamental to the whole distributed VCS, it doesn't matter that they were committed in the pulled repository, you have to commit them in the new one.


That doesn't work. The "git add README.md" completes with no output, but thereafter, "git commit README.md" still fails with "fatal: cannot do a partial commit during a merge."


just "git commit" after that, not "git commit README.md"

edit: it's infuriating that yc won't let me reply to you, so I have to reply in this edit. Anyway.

"git commit" says to git: "move anything in the index (i.e. things which I have 'git add'ed, or which have been merged) into the local repository".

"git add FILE && git commit" says "put FILE in the index and then commit the whole index to the local repository"

"git commit FILE" says "assume the index is clear, and only commit FILE" to which git says the logical response, "sorry, dude, the index isn't clear, I can't do that"

perhaps this will help: http://osteele.com/archives/2008/05/commit-policies


Oh! Well, thank you. Yes, I see that that works.

But:

(A) I see that it still commits everything, just like "commit -a", including all the files that I didn't change but that my colleague did in the branch that I'm pulling; and

(B) Why would "git add FILE; git commit" but "git commit FILE" not work? I bet it's something to do with the index.


> (A) I see that it still commits everything, just like "commit -a", including all the files that I didn't change but that my colleague did in the branch that I'm pulling; and

'git commit' commits everything that is currently in the index. 'git commit -a' commits all files currently tracked by git in their current state (if the version in the index and in the working directory differ, IIRC, the version in the working directory is used). It's the equivalent of a "git add FILE" for all tracked files with outstanding changes, right before a 'git commit'.

> (B) Why would "git add FILE; git commit" but "git commit FILE" not work? I bet it's something to do with the index.

The index is meant to be the 'staging area' where you can pick and choose what you want to commit. See "git add -p FILE" for an example of more complex usage of the index than just picking and choosing which files to commit.

In general, think of 'git add' as your ability to promote changes from the working directory to the index, and 'git commit' as your ability to create a commit from the contents of the index.
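A tiny sketch of that split (the file name is made up):

    git add -p foo.c     # interactively stage only some hunks of foo.c
    git diff             # what's still unstaged: working tree vs index
    git diff --cached    # what will actually go into the commit
    git commit           # commits just the staged hunks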


> you can pick and choose what you want to commit

How is that supposed to be a reasonable thing to do? Unlike my working directory, I can't build and test the contents of the index. To me, making a commit that I've never tested is somewhere between sloppy and negligent, so why is git encouraging me to do it?


I think that your issue is that you're thinking of a commit in git in the same way that you would think of a commit in svn. They are not the same. To commit in svn you are forced to push to the remote repo. Therefore you screw everybody up if your commit hoses things. In git, you can commit locally, and you are not even required to push those changes out. You could trash those commits without anyone ever knowing that you made them. You could make small atomic commits of small changes, and when the feature is ready to be pushed out to the 'canonical' repo, you could squash all of those small commits into one large commit, then push it out.
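That flow looks roughly like this sketch:

    # small, local commits as you go (repeat as needed)
    git commit -a -m "wip"
    # when the feature is ready, squash (say) the last three into one
    git rebase -i HEAD~3
    # and only then push the tidy result out
    git push origin master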


After some reading, it looks like it's actually possible to use git-checkout-index after an uncommitted merge to get the index contents out where I can test them. I'm still mystified that the preferred workflow is to blindly commit stuff that may not even run, which leaves you with the problem of somehow knowing which of the commits are actually useful points to roll back to or start new work from.
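Something like this, if I'm reading the man page right (the scratch path is arbitrary, and the trailing slash matters):

    # copy the full index contents into a scratch directory for testing
    git checkout-index --prefix=/tmp/merge-test/ -a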


This is a bit of a late reply, but the reason it's committing everything that your colleague changed is that it's the definition of a merge commit.

I find that it's desirable to avoid merges to reduce the number of overall commits. To do this, do "git pull --rebase" instead of just "git pull" when your push is rejected.
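That is, roughly:

    git pull --rebase    # replay your local commits on top of the remote ones
    git push             # now a fast-forward, so no merge commit is created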


This is exactly the kind of unintuitive behavior that he was complaining about.


The thing is that it is intuitive if you only take some time to learn the new tool's features and how it is effectively used. Simply treating it like svn won't work. Yes, you can 'get by' with some svn-like behaviors, but really a dvcs is an entirely new thing you will have to learn to make sense of.

Here's an analogy. Svn is a horse and git is a car. You can imagine all these horse-riders (and there are tons of them) sitting around in a bar saying to each other "these new cars? What is up with them? It makes no sense that you change 'gears' and push pedals to stop and go! I just want to <insert whatever you do to ride a horse> and I'll get to my destination!" -- Well, yea, horses work fine a lot of the time. But it's 2010 (1910? :)) and I have more eclectic needs in my change management. I'm willing to invest some time in learning it just like I was willing to invest time into learning vim, or Erlang, or anything else of value.


1) "pull --rebase" might help with one of the particular problems mentioned (though obv. not appropriate for all occasions)

2) More broadly: yeah, if you expect it to work Just Like Things You Already Know, it just won't work out for you. It really is a totally different workflow, and it either works for you, or it doesn't. Empirically, it works great for many people/organizations (though, absolutely, not without some trade-offs). But truthfully, yeah, if you're a single developer, disinclined to use some of the niceties and very interested in maximizing simplicity/reducing keystrokes... svn really may be a better fit.


> Cheap branching is better than expensive branching, sure, but that’s like saying influenza is better than cancer. I’d rather just not be ill at all, thanks.

Please stop posting this know-nothing's uninformed rants.


> I think the move back to CVS/Subversion might be the way to go

From your complaints, it sounds like you'd really like Mercurial. Give it a shot.


No, mercurial has exactly the same problem he mentioned: You want to push 'some' of your local uncommitted changes, but if the remote head has also changed you often can't. (hg shelve is a poor substitute).

This happened to my team enough in practice that they rebelled and forced a switch back to SVN.


Looks to me like he had two problems.

The first was that git gave very poor error messages and needed a confusing series of command-line options. In mercurial, this is much better; the pull creates a new head, and tells you what to do if you want to merge them. Commit is one-step, instead of via the staging area. It just gives more feedback, more usefully, and requires less knowledge about the underlying representation.

His second problem is common to both systems, namely that during a merge-with-conflicts you have to essentially re-commit both changes. This is actually a kind of safety feature, since the conflicted change might have screwed up something and it's up to you to make sure the code is in a good state after the merge. If the merge isn't conflicted, then Mercurial (and probably git) does it cleanly without forcing you to read over your teammate's change, which he hates doing.


There's two problems with this approach:

1. If you have other uncommitted changes, you're simply hosed. You can't continue with the merge process with local changes at all.

2. It pollutes your change list with unrelated, unconflicted changes belonging to your teammate (the whole changeset).


> 1. If you have other uncomitted changes, you're simply hosed. You can't continue with the merge process with local changes at all.

That's what 'git stash' is for. It stores away all changes to the local tree and index (but leaves ignored and untracked files alone). Then you can perform operations like changing branches, doing merges, etc. Afterwards you run 'git stash pop' or 'git stash apply' to pull back your changes. (You'll need the '--index' option if you want your index back in the same state, though. It won't 'trash' your changes; they will just all be in the working tree, not in the index.)
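In practice, a sketch:

    git stash                # shelve working-tree and index changes
    git merge origin/master  # or whatever needed a clean tree
    git stash pop --index    # restore the changes, index state included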

> 2. It pollutes your change list with unrelated, unconflicted changes belonging to your teammate (the whole changeset).

Yes. The automatic algorithms have failed to determine how to cleanly merge the changes, so you have to rebuild the final state that you want the tree to be in. When you commit that state, it commits the diff of what is needed to resolve the conflict.

Remote:

  A-B-C
Local:

  A-B-D
Final:

  A-B--D--E
     \_C_/
You are basically building the state that you want the source tree to look like in commit 'E'. Then the diff of what needs to be done to resolve the merge of states 'D' and 'C' is recorded in the merge commit 'E'.

It's not 'polluting your change list' with those changes. It's setting your index to what the final state will look like and pointing out to you the files it couldn't figure out. You just need to make them look like they are supposed to (i.e. resolve the conflicts), put them into the index (i.e. 'git add') and push out the merge commit (i.e. 'git commit').


1. Yes, git stash is an option. Unfortunately, if it's anything like mercurial's shelve, it's not a good one. What happens when you 'unstash'? Does it properly give you conflict markers, or does it generate patchfiles?

The latter is mercurial's behavior, and it sucks. To do the former, the stash command needs to also track the repo version that the local changes were based upon, so it has the historical information to present conflict markers (effectively, enough info for a 3-way merge).

FWIW, this is actually the only thing holding my team back from switching back to hg. If hg shelve were part of the standard distribution and rock solid (multiple shelves would form a queue, no chance of corruption, and unshelve preserved historical information so it could properly merge conflicts), the world would be a better place. :)

2. Yes, I know WHY it does what it does. But the way it presents this info is unfortunate, because after a merge it's far more difficult to know exactly which changes you were making - many programmers are in the habit of reviewing their changes before committing. Does GIT let you easily tease out these differences?

[edit: call to action]


> many programmers are in the habit of reviewing their changes before committing. Does GIT let you easily tease out these differences?

I'm unsure what you're asking here. I can say that 'git diff' shows you the difference between the working tree and the index, and 'git diff --cached' shows you the difference between the index and the most recent commit (i.e. HEAD). Both of those accept a filename as an argument to just get the diff of that file in the respective spaces. All of the changes that merged cleanly are in the index, and all of the unresolved conflicts are not, when you first enter the 'unresolved conflicts' state while attempting a merge.
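So, right after a conflicted merge (the file name is made up):

    git diff                   # the unresolved conflicts: working tree vs index
    git diff --cached          # everything that auto-merged cleanly: index vs HEAD
    git diff --cached foo.c    # the same, narrowed to one file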

If you're asking something more specific than that, you'll have to rephrase or explain for me.


There are a number of ways to do this in mercurial. If you've got a set of uncommitted changes, only some of which you want to push out, just commit those files (or use the record extension to commit only parts of files). Then update back to the previous revision and commit the remaining changes.

This makes 2 heads. You can then update back to the first commit and nudge those changes out ("hg push --rev .", which will only push the current version and any unpushed parent revisions).

Switch back to the other revision to finish off your work there and merge the 2 heads together when you're done and ready to push it all.

You're basically working with anonymous branches at that point, but if you understand that the commits are really just nodes in a DAG, it's pretty easy to visualize what to do.
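Roughly like this sketch (the file name and revision numbers are made up, and the '.^' syntax needs a reasonably recent hg):

    hg commit urgent.c -m "fix to push"   # head 1: just the ready change
    hg update '.^'                        # back to the previous revision
    hg commit -m "wip"                    # head 2: the remaining changes
    hg update -r 5                        # switch to head 1 (say it's rev 5)
    hg push --rev .                       # push only head 1 and its ancestors
    hg update -r 6                        # back to head 2 to keep working
    hg merge                              # later, merge the two heads together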


Mike, Git seems unintuitive because you don't have a good grasp of what it does behind the scenes. Imagine trying to get to grips with a Unix shell, if you had no concept of files or directories. In such a scenario, even a simple command like "cat" would seem incomprehensible.

If you'll indulge me, I'd like to propose a thought experiment.

* * Designing a patch database * *

Consider you're responsible for administering a busy open source project. You get dozens of patches a day from developers and you find it increasingly difficult to keep track of them. How might you go about managing this influx of patch files?

The first thing you might consider is how do you know what each patch is supposed to do? How do you know who to contact about the patch? Or when the patch was sent to you?

The solution to this is not too tricky; you just add some metadata to the patch detailing the author, the date, a description of the patch and so forth.

The next problem you face is that some patches rely on other patches. For instance, Bob might publicly post a patch for a great new scheduler, but then Carol might post a patch correcting some bugs in Bob's code. Carol's patch cannot be applied without first applying Bob's patch.

So you allow each patch to have parents. The parent of Carol's patch would be Bob's patch.

You've solved two major problems, but now you face one final one. If you want to talk to other people about these patches, you need a common naming scheme. It's going to be problematic if you label a patch as ABC on your system, but a colleague labels a patch as XYZ. So you either need a central naming database, or some algorithm that can guarantee everyone gives the same label to the same patch.

Fortunately, we have such algorithms; they're called one-way hashes. You take the contents of the patch, its metadata and parents, serialize all of that and SHA1 the result.
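In shell terms, the naming scheme amounts to something like this sketch (the file names are hypothetical):

    # serialize the metadata, parent labels and diff, then hash the lot;
    # anyone hashing the same bytes gets the same label
    cat metadata parents patch.diff | sha1sum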

Three perfectly logical solutions, and ones you may even have come up with yourself under similar circumstances.

* * Merging patches * *

Under this system, how would a merge be performed? Let's say you have two patches, A and B, and you want to combine them somehow. One way is to just apply each in turn to your source, fix any differences that can't be automatically resolved (conflicts), and then produce a new patch C from the combined diff.

That works, but now you have to store A, B and C in your patch database, and you don't retain any history. But wait! Your patches can have parents, so what if you created a 'merge' patch, M, with parents A and B?

   A   B
    \ /
     M
This is externally equivalent to what you did to produce C: patches A and B are applied to the source code, and then you apply M to resolve the differences. M will contain both the differences that can be resolved automatically, and any conflicts we have to resolve manually.

Having solved your problem, you write the code to your patch database and present the resulting program to your colleague.

* * A user tries to merge * *

"How do I merge?" he asks.

"I've written a tool to help you do that," you say, "Just specify the two patches you want to combine, and the tool will merge them together."

"Um, it says I have a merge conflict."

"Well, fix the problem, then tell the system to add your file to the 'merge patch' it's making."

Your colleague dutifully hacks away, and solves the conflict. "So I've fixed the file," he says, "But when I tell it to 'commit file' it fails."

"Remember, this is a patch database," you reply, "We're not dealing with files, we're dealing with patches. You have to add your file changes to your patch, and then commit the patch. You can't commit an individual file."

"What? That's not very intuitive," he grumbles, "Hey! I've added the file to the patch, but it tells me the merge isn't complete!"

"You need to add all of the files that have differences that were automatically resolved as well."

"Why?!"

"Because," you explain patiently, "You might not like the way those files have been changed. It needs your approval that the way it's resolved the differences is correct."

"Why to I have to re-commit everything my buddy has made?" he complains, "Seriously, I want to just commit one file. What the hell is up with your system?"


> Mike, Git seems unintuitive because you don't have a good grasp of what it does behind the scenes.

In other words, Git's abstraction is leaky. That's usually considered a bad thing in our profession.

Except that since it's Git, and we all use it, and it's better than the alternatives, we all pretend that's a good thing in this case.

I'm fine with the way Git works internally, and by now I've come to deal with the fact that sometimes it takes five commands to carry out what is, in fact, one desired action.

But Git's main point of failure is typical of all young projects that are in any way involved with Linux - there's no effort to make it elegant or pretty, and anyone who points that out and suggests that maybe things could be easier is ridiculed for not understanding it.

Usually "That's not very intuitive" is, in fact, an indication of something that could be improved...


In other words, Git's abstraction is leaky

No, it means you're using the wrong abstraction.

As you change your codebase, files are modified. To the untrained eye, it looks like a simple linear progression of history, and you just want to record savepoints as you go along. CVS lets you pretend this is the case.

Actually, that's not the case at all. What you actually want to record is the changes you're making, and the relations between them. In the vanishingly small edge case where you have no collaborators, no experimental code, never need to backtrack, and never work on more than one portion of the code at a time - the two views are isomorphic.

The rest of the time, it's not. CVS & SVN try to stretch the first abstraction to take care of these differences, but fail.

git makes you face up to the fact that your abstractions are wrong.


No, it means you're using the wrong abstraction.

It seems a large number of users prefer to work at a different level of abstraction than git requires.

git makes you face up to the fact that your abstractions are wrong.

Strangely, mercurial tackles the exact same abstractions, yet has a much friendlier user interface. The standard rebuttal at this point will be "Fine, use mercurial then". I wonder if, at some point, more users will start doing that than the git community would like. I, for one, am certainly tempted, but have so far held out due to the switching cost and because hg is, of course, not without flaws either.

However, I don't think this "if it hurts, then you're doing it wrong" attitude can be healthy for git in the long term.

A bad user interface remains a bad user interface, no matter how you spin it. The big problem I see is not even with git currently having this bad user interface, but rather with the widespread reluctance in git circles to even think about ways to improve it.


You make valid points about open source tools frequently having leaky abstractions, and I often have exactly the same response that you do -- "Why don't people make more effort to make this elegant/pretty/intuitive?"

But the more I use git, the more I actually appreciate the fact that the abstraction is leaky. When I'm manipulating my history, I often really _want_ to have all the guts hanging out so that I can slice and dice them. If git wasn't designed using the "composable tools" idea [1] then it would make this stuff a lot harder.

The tradeoff is that it makes the learning curve a lot steeper. I understand that some developers don't want to know too much about their VCS, but I can't count the number of times I have appreciated having an in-depth knowledge of it [2].

I said this here once before, and I don't mind repeating it: git is a power tool for power users. It is also designed as a VCS toolbox, so anyone who wants to write a more intuitive UI layered on top of git is welcome to. There are a couple out there, but they don't seem popular. I'm not sure why.

[1] Although this phrase also conjures the UNIX Hater's Handbook's take on it: "tools for fools"

[2] The next question is how much of the time do I get into these situations _because_ the guts are hanging out? Is that the reason I need the power tools to get me out of trouble? It's hard to judge because I'm too close to the trees to see the forest on this issue.


I see this mentioned so many times:

"Git seems unintuitive because you don't have a good grasp of what it does behind the scenes"

but I fail to see why this is true. Do we need to understand the implementation of block allocation, snapshots, atomic writes, etc. to save files? Do we need to know the IP checksum algorithm to use the internet? Do we need to understand congestion control algorithms to browse websites? {and many other examples}

No - we don't, and many of us do not know those things. Then why are we expected to know the internals of git to use it? (use - not modify, analyse, or edit without an interface, etc.) Why can't we get a tool which takes an address and makes the files appear on local storage? (or whatever the equivalent is for a VCS workflow)

"Designing a patch database" - No, I want to use a VCS, not design it. If it works on pixie dust, I'm ok with that. It's just a tool. It's supposed to help me do the real work, not give more stuff to think about on every step. Why is "you don't understand how it works" an acceptable answer here? Where's the iphone of VCS-es?


"Do we need to understand the implementation of block allocation, snapshots, atomic writes, etc. to save files?"

No, but consider what you need to know about file systems to use them. At the barest minimum, you need to know that:

* Data is stored in files

* Files have names

* Files are contained in directories

* Directories have names

* Directories can contain directories

If you had no understanding of what a file or a directory was, I'd imagine you'd find the behaviour of "grep" or "cat" completely unintuitive.

You don't need to know how Git stores data internally, but you do need to have a basic understanding of its design, just as you need a basic understanding of a filesystem in order to use it.
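
By analogy, the corresponding minimum for git is about as small. As commands, it might look like this (a sketch, not an exhaustive list; the file and branch names are made up):

    git add file.c      # changes are staged into the index
    git commit          # the index becomes a commit; commits have parents
    git branch topic    # a branch is just a movable name for a commit
    git checkout topic  # switch to working on it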


I expect to need exactly the same level of knowledge to work with a DVCS, because I'm editing files. SVN was very close to providing that in many ways, and it worked. I don't want to know about a staging area unless I explicitly use it, for example. I don't want to know about the design or patch handling unless I explicitly request or apply a patch. Merging is merging - it's a good couple of levels of abstraction above patches.

Basically - I want my DVCS to require as much knowledge as cp, diff, and patch. I could live with those, if I had to (see `quilt`). Now, a DVCS should be easier to use, not harder. Otherwise, what's the point?


People use a DVCS for its features and capabilities - because it can do more things, not because it is easier to use. In the case of git, the 'simple' workflow runs into some trouble because git is designed to accommodate much more complex workflows, and as a result some of the core concepts have been tweaked in what people consider to be 'unintuitive' ways.

People choose vi/emacs over notepad/ed/pico for many of the same reasons and people complain about many of the same things (it's unintuitive, complicated, confusing, and so on...)


The reason why this is important is that DVCSes are very different from editors. If I don't like emacs, I'll use vi, gedit, ed, ... - we'll get the same file and the same result; it's only the method that's different.

If you use git however, I have a choice of a) git b) hg-git c) not working with your code.


Surely the same argument applies to SVN as well? Whichever VCS you use, you're going to force contributors to adapt to a particular version control philosophy.


"I expect to need exactly the same level of knowledge to work with a DVCS, because I'm editing files."

From Git's point of view, you're not editing files; you're creating patches.

Git is fundamentally a tool for constructing, sharing and storing patches. You may disagree that this is the best way to approach version control, but if you accept this philosophy, then Git is remarkably simple and logical.
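
That patch orientation shows up directly in the command set; for instance (the patch filename below is hypothetical):

    git format-patch origin/master     # one patch file per commit not yet upstream
    git am 0001-fix-scheduler.patch    # apply a mailed patch, preserving authorship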

Personally, I feel that treating a version control system like a filesystem is the wrong approach. I'm primarily interested in managing changes to the code, not in tracking chronological changes in individual files.


Weavejester, this is utterly brilliant. Like a lightbulb going on. THANK you!

I'd love to post it (with attribution, of course) as a followup article on The Reinvigorated Programmer. Please contact me to let me know whether that's OK -- mike@miketaylor.org.uk


You're very welcome to! I had worried it was a little too long, but I'm glad it turned out to be enlightening despite its length.

Git is not without its flaws, but I'm convinced the majority of problems people have with it stem from the fact that most tutorials on Git focus only on the commands, without giving any context on how Git actually works. Initially, I had exactly the same problems as you did (and exactly the same disillusionment) until I happened across Git From The Bottom Up (http://ftp.newartisans.com/pub/git.from.bottom.up.pdf). Upon reading that, I also had a lightbulb moment.

Git From The Bottom Up is definitely worth reading, but it does tend to be a little too low level at times. So I've been toying around with the idea of writing a "You Could Have Invented Git" article, in the style of You Could Have Invented Monads (And Maybe You Already Have!) (http://blog.sigfpe.com/2006/08/you-could-have-invented-monad...).


Many thanks, WJ. As you may have seen already, I went ahead and posted at http://reprog.wordpress.com/2010/05/13/you-could-have-invent... Much appreciated!


>Mike, Git seems unintuitive because you don't have a good grasp of what it does behind the scenes

I actually believe that's what "Unintuitive" means.

However, intuitiveness is not everything. Power is often more useful than ease of learning.


Seems to me he should just use Subversion. He has a mental model of how version control works (based on his prior use of CVS) and apparently doesn't wish to invest the time to learn git's model.


Someone needs to introduce this guy to the concept of rebase, and then maybe he'll get on board with branching. And then maybe some of his problems will be solved. I used to have these issues until I started branching and rebasing off a core "develop" and "master" branch.


Am I the only one who reads posts like this and wonders why people make merging/pushing/pulling with Git so complicated? It's really not that hard. Sure there are plenty of esoteric and strange things you can do with Git but I have seen devs go from zero knowledge to using the basics in no time. Some never have to go beyond the add, commit, push, pull and merge commands (with their respective switches) and work quite happily. Sure they aren't Git experts by any means but they are able to get their work done and the tool is just as transparent to their workflow as any other SCM tool.


I sort of take issue with this statement:

“git is bad for me because it makes assumptions about how I work that don’t match how I actually work”

I think that it's the other way around. git was built with a specific type of workflow in mind. If you take git and try to insert it into your current workflow with a minimal understanding of git (or its intended workflow), then isn't it really you making assumptions about how the tool works, rather than the tool making assumptions about how you work?

{edit}

  - I make a one-line change to a single file.
  - I commit my change.
  - I git push, only to be told “! [rejected] master -> master (non-fast forward)”.
    (This is git’s cuddly way of telling you to do a pull first.)
  - I git pull, only to be told
    “CONFLICT (content): Merge conflict in filename.
    Automatic merge failed; fix conflicts and then
    commit the result.”
This is because you've created your local tree as such:

   A-B-C
And the remote tree looks like:

  A-B-D
When you run a 'git pull' it's a combination of two operations:

  1. 'git fetch' or 'git remote update' which updates your local
     copy of the remote tree. This is stored locally in the branch
     <origin_name>/<branch_name> (e.g. origin/master).

  2. 'git merge <branch_name> <origin_name>/<branch_name>'. Most
     of the time this merge is a fast-forward, which you don't
     notice at all. It's when your local commit 'C' conflicts
     with remote commit 'D' that you run into an issue.
In the end your tree will look like:

   A-B--D--E
      \_C_/
Where E is a commit that resolves the conflict and has two parent commit IDs pointing at D and C. In general, you want to avoid this kind of thing in the first place, either by using rebase or by doing the 'git pull' before you commit your changes, then pushing them out.
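
For reference, those two steps spelled out for master, assuming a remote named origin (an illustrative sketch):

    git fetch origin          # step 1: update origin/master
    git merge origin/master   # step 2: usually a fast-forward; here it conflicts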

[ I'm having a hard time picturing how you would have resolved this with SVN. If you have a conflict between a local file and an update that you are pulling down with 'svn up' what happens? (I've not used SVN extensively) ]

{edit}

> Well, darn. So, OK, no problem, I already fixed the conflict, so now I’ll just git merge again to get it to make its world consistent, right? Nope: “fatal: You have not concluded your merge. (MERGE_HEAD exists)”. Well, duh! I was telling you to merge, you stupid git. You’re the branches-and-merges-are-easy version-control system around here.

You're telling git to start a new merge, not complete an in-progress merge. Think of 'git merge' as 'create a commit that merges these two things together.' When you run into a conflict, 'git merge' tells you, "Sorry, I couldn't automatically resolve this for you, but I got you as far as I could. You'll have to manually resolve these conflicts and create the merge commit." At this point your working directory is in a 'middle of a merge' state. You just have to resolve the conflicts that were pointed out to you (adding them to the index once they are resolved), and run a 'git commit' to create the merge commit.
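
Concretely, finishing the in-progress merge looks like this (the filename is hypothetical):

    $EDITOR filename    # remove the conflict markers
    git add filename    # mark the conflict resolved in the index
    git commit          # completes the merge; MERGE_HEAD supplies the second parent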


"svn update" will randomly delete your work with no way to ever get it back.

I don't think new git users migrating from svn actually understand that they can roll back any changes git makes. If a pull goes bad and you don't want to deal with it, just reset to your last head. All is forgotten until you feel like fixing it. And no data is ever lost.
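
Concretely (merge and pull set ORIG_HEAD to where your branch was beforehand):

    git reset --hard ORIG_HEAD    # put the branch back exactly where it was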

(svn loses data by merging everything it can with your uncommitted changes, and then barfing when it can't do the merge. You're left with a bunch of conflicted files, and a bunch of merged files, with no way to ever get the unmerged files again. With git, this simply cannot happen. Git will not touch untracked files or unstaged changes; it will just die. And what it does automatically merge can be unmerged with reset. Or, you can commit the conflict markers to a separate branch, deal with it later, undo that commit, and move on with your life. Git makes easy things a little harder, but hard things easy. Subversion makes anything except hair loss very difficult.)

Oh, and also, the OP really wants his tree to end up like:

    A--B--D--C
This can be achieved with "git pull --rebase" (or automagically with the config key branch.<name>.rebase = true). That is what Subversion does, modulo the ability to revert to your original branch.
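
In command form (branch name 'master' assumed):

    git pull --rebase                        # one-off
    git config branch.master.rebase true     # make it the default for master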


"svn update" will randomly delete your work with no way to ever get it back.

No, it will not.

You're left with a bunch of conflicted file, and a bunch of merged files, with no way to ever get the unmerged files again.

Umm... no? Take a look at conflicted-filename.mine. Whenever there is a conflict, svn appends .mine to your copy of the file. After that, it creates conflicted-filename.r123 and conflicted-filename.r124, and goes crazy with the angle brackets in conflicted-filename.
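
So after a conflicted update, a directory listing looks something like this (illustrative):

    $ ls
    conflicted-filename        conflicted-filename.r123
    conflicted-filename.mine   conflicted-filename.r124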

I've lost plenty of data with git. Most of it has to do with innocuous-sounding commands that don't ask for confirmation when deleting data. For example, git checkout filename is equivalent to svn revert filename. Of course git checkout branchname does something completely different. If a branch and a file share the same name, git will default to switching branches, but that doesn't stop bash autocomplete from ruining the day.

Here's a crazy idea: If you have an innocuous action and a dangerous action, do not label them with the same command.


No, it will not.

Yes it will. Imagine you check out revision 1, which consists of two files:

    foo:
      a

    bar:
      b
You do some hacking, and end up with:

    foo:
      a
      

    bar:
      b
      d
While you were doing that, though, someone else committed, revision 2, which is:

    foo:
      a
      b

    bar:
      c
You go to commit your changes, and svn tells you you can't, because you are out of date. So you have to svn update. Now you have:

    foo:
      a
      b

    bar: CONFLICT
      >>>>
      b
      d
      ====
      c
      <<<<
Now, how do you roll back to what you had before you updated? The state of "foo" has been lost forever by the successful merge.

For example, git checkout filename is equivalent to svn revert filename.

Annoying, maybe, but this is user error, not design error. With git, if I want to losslessly discard my working copy, I can just "git stash". If I want to losslessly update my svn working copy, though, I have to make a copy myself, and then manage the copy.

By your logic, "rm" is flawed because it doesn't ask for confirmation when you pass -f instead of -i. Well, yeah. Sorry.


You didn't lose any data in that scenario. Every line of code you wrote still exists in those files. The problem is that you are in a conflicted state that must be manually resolved. Unfortunately, making updates atomic across a branch has disadvantages. For example, svn lets you update individual files or directories instead of the whole branch. If you want to avoid this pitfall in the future, run "svn merge --dry-run -r BASE:HEAD ." before a real update. (I wish svn update had a --dry-run flag. Just because git is bad doesn't mean svn is perfect.)

Also, your scenario is extremely unlikely. I've used svn for 5 years and I've encountered that problem once. It was for a binary file. Two versions of the same image. It's not very often that two developers create a new file with the same name at practically the same time. It's even less often that those files can be properly merged.

By your logic, "rm" is flawed because it doesn't ask for confirmation when you pass -f instead of -i. Well, yeah. Sorry.

$ git checkout blah

This command either switches to branch blah or it erases all uncommitted changes in a directory or file named blah. Without more information, you can't tell. I find that frustrating and annoying. Your analogy would be more accurate if rm somename was the equivalent of apt-get update, and rm othername was rm -fr othername. Oh, and somename is never tab-completed but othername is.
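
For what it's worth, git does let you force the file interpretation - though nothing reminds you to:

    git checkout -- blah    # '--' means: treat 'blah' as a path, never a branch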


> You didn't lose any data in that scenario. Every line of code you wrote still exists in those files. The problem is that you are in a conflicted state that must be manually resolved.

I think that the point is that file 'foo' has already been merged, regardless of the conflict in file 'bar'. There is no way for you to revert to the pre-update state. In git, your previous commit still exists in the object store even if it is no longer connected to the tree. And garbage collection won't even clean it out right away, because it is still in the reflog ('git reflog'). The point being that once something is committed, it's permanently (barring garbage collection) in the repository. Whenever you make a change to a commit, a new commit is created, some pointers are changed, and the old commit still remains in the repository.
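
A sketch of that recovery (the branch name 'rescue' is made up, and <id> is a placeholder):

    git reflog                    # find the id of the commit you 'lost'
    git checkout -b rescue <id>   # re-attach it to a branch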

> Oh, and somename is never tab-completed but othername is.

Responsibility for the tab-completion falls squarely on your shell (or wherever you got the tab-completion setup from). Don't point your finger at git and say, "git sucks because bash tab-completion screwed me up." Neither rm nor git can control how your shell determines tab-completion.


Still, it can't be right that "git checkout foo" does one of two COMPLETELY different things depending on whether or not there is a file called foo in the current directory. Surely one of those two commands should have a different name.


I've always felt like `git branch` could be the one to switch branches (since it's used to create them too). But `git checkout -b` also creates branches... I think semantically checkout is the right command for this.

It's never come up as a problem because I tend to know what files are in my project, and I also tend to know what it is I'm about to do. I very rarely switch branches with a dirty working copy anyway, and my branches are never named even remotely close to what files are named (by coincidence, I suppose, since I name branches after 'releases', which have names like "2.2.2"; 'bug fixes', which have names like "bug2598"; or 'features', which have names like "dashboard-rewrite" and "chunk-load-thumbnails").


Here's another crazy idea: don't run 'git checkout ...' on a dirty work tree. Problem solved.

Another one: don't reuse filenames as branch names.

To be honest, I have the same problem with careless invocations of 'rm' ruining my day, but when I'm muttering curses it is at my laziness/stupidity and not at bash completions or the behavior of 'rm'.


Honestly, it's annoying that we all use single-version filesystems. It was a good idea back when computer storage consisted of a big rotating metal drum and a major government could only afford 1MB of storage.

Now that 1TB is like $70, we should just keep every filesystem state around. Maybe not forever, but so that "rm -rf *" is just a "gah, that's not what I wanted!" moment instead of a "the rest of my day is ruined" moment.

Premature optimization is the root of all data loss.


svn update has never randomly deleted my work, but then I never willingly do an svn update with local changes.

svn update with local changes gives me this dialog:

    Conflict discovered in 'test.pas'.
    Select: (p) postpone, (df) diff-full, (e) edit,
            (mc) mine-conflict, (tc) theirs-conflict,
            (s) show all options:
However, I would normally just abort at this point, and make sure my local changes are packed up into a diff like they should be.


So basically, you've written your own revision control system that integrates with Subversion. Fine.


Well, this raises the question: why not just let git do all that diff-packing and 99% of the conflict resolution crap for you?

As long as you don't mind spending a weekend or a few evenings reading about the new tool and how it works (like any new addition to your skillset), that is.


I'm afraid the git clone behaviour is unusable for the svn repository I work from. A clean pulled tree is over 1GB at the moment I believe, and contains many binary files (static libraries etc.) that aren't rebuilt that often, but often enough to make the repository as a whole very large. I don't know exactly how large it is, but it's large enough that it would take many weeks to download over VPN. Just getting a fresh tree takes over 24 hours as it is.

As for diff packing, it's as simple as this on the bash command line:

    collect-patch $(now)-hint.patch foo/{bar,baz}
The collect-patch script will do an svn diff over foo/bar and foo/baz from the root of my working directory, and deposit the results in a file in my current directory (e.g. 2010-05-13_15-50-whatever.patch).
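
The script itself isn't shown; a minimal sketch of the idea might look like this (hypothetical - the real script presumably does more, such as running from the working-copy root):

    #!/bin/sh
    # collect-patch <output.patch> <paths...>: svn-diff the paths into a patch file
    out="$1"; shift
    svn diff "$@" > "$out"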

Then, I can revert local changes:

    revert 2010-05-13_15-50-whatever.patch
(This reverts changes to the files listed as modified in the patch.) Because it's a file, I get completion on that. And to apply (e.g. after an svn update):

    apply 2010-05-13_15-50-whatever.patch
If things get more complicated, and there are conflicts between the lines modified by my patch and the updated source (which is less than 1% of the time), then I do:

    manual-merge 2010-05-13_15-50-whatever.patch
That uses a tool (BeyondCompare 3) to do three-way merging between the original revision of my patch, my local edit of the original revision, and the new revision that's been fetched locally. BC3 can auto-merge all non-conflicting edit sections, but is interactive with a nice diff display and editing if there's a conflict (/automerge and /reviewconflicts arguments to bcomp).

I'm sure I could migrate this scheme onto an analogous git scheme that could keep track of the merges better, but in practice this scheme is enough - and because I wrote it, I'm intimately familiar with its mechanics, so I can extend it, and I'm not at risk of file loss, etc.


I'll tell you what happens when I use svn and there's been an upstream change: I never update my local tree with local modifications. Instead, I extract all my local changes into a diff, then I update my local tree, and then I merge my diff back into the updated tree and commit.

When I need three-way merging, which isn't often - usually patch can resync simple things like line offsets - it's handled by a file comparison tool. I have a simple script which handles this.

I work remotely (across the Atlantic) on a large svn repository, so I tend not to update every day, nor commit every day. Instead, I version local changes with diffs.


I would suggest using git-svn rather than versioning your local changes with diffs, but I guess that's just personal preference.

{edit}

Just to add that:

> I'll tell you what happens when I use svn and there's been an upstream change: I never update my local tree with local modifications. Instead, I extract all my local changes into a diff, then I update my local tree, and then I merge my diff back into the updated tree and commit.

git has this built-in with 'git stash', which cleans up the index and working directory (leaving ignored or untracked files alone) and stashes the changes in a 'stacked' list. You can then use operations like 'git stash pop' or 'git stash list' to operate on that list.
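
In other words, the whole extract-diff, update, reapply dance becomes (a sketch):

    git stash        # shelve index and working-directory changes
    git pull         # bring in the upstream changes
    git stash pop    # reapply your changes on top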


Sure, if I wanted to solve a problem I didn't have.

I found this answer enlightening:

http://stackoverflow.com/questions/747075/how-to-git-svn-clo...

git-svn doesn't look like it deals terribly well with only a shallow copy of the repository. I may be entirely wrong, of course, but I'd hate to have to pull down 100+GB of data over VPN onto my 80G SSD just to keep git happy. (Yes, there are binaries in there, rather too many of them for me to update every day.)


It was just a suggestion (I use git-svn at work for our svn repositories). But none of our repos are 100+GB, so it wouldn't work in your use case. Though I shudder to think what maintaining an SVN repo of that size must be like (from the back-end perspective of keeping it running and performant).


Duh. I've never got the hang of git either, but I figure git is for those people who don't get enough mileage out of Mercurial because they have 42 megatons of source code, or need git rebase, or just want to shock and awe their friends. But no need to hate git for that. No one's forcing you to use it. You don't like vim? Go ahead and use emacs/gedit/eclipse/perl. You don't like git? Use mercurial. Or subversion, if it makes you happier. Merging is always a pain with a VCS, more so with DVCSes because they let you conveniently forget a bunch of patches in that repository copy on the USB stick in your back pocket. And these long mumblish version numbers are also going to stay - but they're not so bad when you use a graphical repository browser (hg view/hg_web/git-web/tortoise).

And if you just need versioning for a couple of .doc files: use Apache's mod_dav_svn and be done with it (i.e., you can mount it as a directory in Nautilus or the Mac Finder and it just works). Even though I like all the fancy stuff in Hg, I started using mod_dav_svn for this and never looked back, because it's always been painless. (Oh, and you can work directly on the repository; the version history is mostly useless there because it reflects weird program behaviour, including temporary files, but you'll never get any conflicts unless you mess up really badly.)


"But no need to hate git for that. No one's forcing you to use it. You don't like vim? Go ahead and use emacs/gedit/eclipse/perl. You don't like git? Use mercurial"

If only this were true. But it's not: if I work on a project where my colleagues keep the source in git, then I have to use git (whereas if they use vi, I am still at liberty to use emacs). That's the problem with VCSs -- they have a lot more inertia than most other tools because they are, by nature, shared by groups.


I presume you haven't come across Hg-Git, then.

http://hg-git.github.com/


I've seen things like this. I would be very wary about trusting them with my precious source code, wouldn't you? However good they are, they're an additional layer in which (A) things can go wrong, and (B) you're insulated from the underlying reality that you need access to in order to fix problems.

I fear that Hg-Git would be one of those things that is an absolute delight for as long as it Just Works, then abruptly transitions into an absolute horror the moment something goes wrong.


I suspect that many people feel the same about computers in general...

I don't use Hg-Git, but I do use git-svn. Just like Hg-Git, I'm sure there are some caveats, but what finally got me to switch is that (at least for git-svn) it's a real live Git repo. Which means the code I write, my actual work, is all still there. The fix for my nightmare scenario? Do a fresh svn checkout into a new directory, do a git checkout of my work, copy the files over, and do an svn commit of that version of the files.

Definitely far from ideal, to be sure, but in my mind, knowing that was the worst-case scenario put me quite at ease. (Sure, I'd read about git svn dcommit, but knowing that fallback existed is what set my mind at ease.)


I've been trusting my precious source code to Hg-Git for a while. The sort of "something goes wrong" moments have usually been fixable with a bit of grubbing around in Hg-Git's source. They've got themselves a pretty inviting code base, FWIW.


> they have a lot more inertia than most other tools because they are, by nature, shared by groups.

The term you're looking for is "network effect". http://en.wikipedia.org/wiki/Network_effect


Mercurial also has had rebase built-in for some time. For git users who have compared both, what features made you choose git over hg?


I think git is much better than hg, but I think most complaints made in the post are fair. Git's UI is truly awful. IMO, the two killer features of git are the index and content-based tracking.

First, the index makes dealing with the OP's situation much better once you understand it. For example, in hg (or any other VCS I know of), when you merge something and there is a conflict, every change is in your tree, and you can't tell what was merged correctly from what wasn't through the VCS's commands. For example, hg diff will show you the changes for both merged and unmerged stuff, but thanks to the index, git diff will show only the unmerged stuff. As you start fixing conflicts and add files with git add, those are no longer shown by git diff (but are through git diff --cached). The index has a high learning curve, though.
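
For example, during a conflicted merge (the filename is hypothetical):

    git diff            # shows only the paths still unmerged
    git add fixed.c     # mark one file as resolved
    git diff --cached   # shows what the merge commit will contain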

The other killer feature is code tracking: git blame -C -M is extremely powerful. It can tell you which changes came from which file (through heuristics, so it also works for code converted to git, e.g. via git-svn). I explained this in more detail here: http://cournape.wordpress.com/2009/05/12/why-people-should-s....
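
Invocation is simple (the path is hypothetical); roughly, -M follows lines moved within a file, while -C follows lines copied from other files:

    git blame -C -M src/scheduler.c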

I think in the end git is actually simpler than hg - that is, the UI is awful, but the underlying model is simple. For example, the branching model in git is simpler than hg's. In hg, you have branch-through-clone, bookmarks, and branches created by hg branch (http://stevelosh.com/blog/2009/08/a-guide-to-branching-in-me...). This is maybe a matter of personal opinion, but I hate simple version numbers in a DVCS (I changed from bzr to git because I wasted a lot of time with bzr's so-called simple numbers). With a DVCS, it is impossible to have a consistent version scheme (that is, at some point you will have several branches with the same simple version referring to different commits). One thing to understand is that you almost never need to use the raw id in git, because most commands understand a lot of different syntax, such as HEAD^, HEAD~, branch names and tags. If I need to refer to a commit more than once, I use a tag for it.
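
A few examples of that syntax (the tag name is made up):

    git show HEAD^          # the first parent of the current commit
    git log HEAD~3..HEAD    # the last three commits
    git tag needs-review    # bookmark the current commit with a tag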

One thing which git got wrong IMO is fast-forwarding when pulling: it means you lose branch information, and the history is hard to understand (it complicates life for bisect or continuous integration) - I configure pull to be non-fast-forward by default in git.

Going back to svn is insane if you ask me, at least for the usual source-code-only usage (DVCSes currently suck at asset management). I agree with the OP that branching is sometimes overused by git users, but branching for release management, code reviews, etc. has saved me hours of work as a release manager on several mid-sized open source projects (through git-svn).

Git got the low-level stuff right, but I think we have barely seen what's possible with DVCSes. Bug-tracker integration, code-review integration, etc. are still in their infancy.


> One think which git got wrong IMO is fast-forward when pulling: it means you lose branch information, and the history is hard to understand (it complicates life for bisect or continuous integration) - I change the pull command to make pull non fast forward by default in git.

That sounds pretty terrible. I would certainly refuse any changesets from you in OSS projects I'm managing if you tried to send me changes that had artificial commits introduced every time you tried to sync with the upstream.

You don't lose any information with a ff. There's no information to lose -- you're just not specifically recording a commit that has no code changes, but says "and on this date, I grabbed some upstream code."

I've seen that complaint elsewhere, though. I always do pull --rebase (and only take merge commits in exceptional cases) because I'm on the other end of that spectrum. My history is very easy to read. Glad you can bend it to do what you want, though.


You do lose useful information, because you don't know where branches started and ended. For feature branches, that's very useful: useful for bisect, useful for reverting a branch. For syncing with upstream, that's indeed not so useful.


Git pull/merge etc. have a --no-ff switch that will do a normal merge even if a fast-forward is possible. You can even configure the default, though I don't remember the config variable.
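
For example (the branch name is hypothetical):

    git merge --no-ff topic    # force a real merge commit even when a fast-forward is possible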


If it fast-forwarded, it's because you haven't added any changes that aren't already upstream. If you're losing information, it's only about the point at which you considered starting a branch.


I get the idea that he wants version control that 'just works'. Unfortunately version control, like most other software, has become complicated.

He needs to spend less time typing and more time reading. I don't think git is something you can reason out just by throwing commands at it.


The fact that you need to install either Cygwin or a 100MB collection of Msys files put us off it for our project. We stuck to Mercurial because we thought it would be easier for newbies to get involved, and it seems to be working.

Also, unlike Git, Mercurial is coded in Python. Git is a mish-mash of half a dozen different languages. I'm guessing that, long term, Mercurial will be maintained better (but I might be wrong).


The length of the explanations in this thread says something about the usability of git. Not hating, just saying.


I generally agree with the sentiment of this article, though I feel like I need to get better at using git anyway. It's a leaky abstraction and very unintuitive, no doubt, but after using Git regularly I just can't go back to Subversion; the workflow in Git is so much nicer (when everything goes as expected, at least...).


I felt the same way about git but I wasn't able to articulate it as well as he did. Go mercurial!


We all need to throw a tantrum sometimes when things don't work the way we like. It's ok. Let it out. We're computer users, we understand.

...feel better?

Now shut up and learn how to use your tools.


It's true. www.ventatme.com isn't registered...



