
Mike, Git seems unintuitive because you don't have a good grasp of what it does behind the scenes. Imagine trying to get to grips with a Unix shell, if you had no concept of files or directories. In such a scenario, even a simple command like "cat" would seem incomprehensible.

If you'll indulge me, I'd like to propose a thought experiment.

* * Designing a patch database * *

Suppose you're responsible for administering a busy open source project. You get dozens of patches a day from developers and you find it increasingly difficult to keep track of them. How might you go about managing this influx of patch files?

The first thing you might consider is how do you know what each patch is supposed to do? How do you know who to contact about the patch? Or when the patch was sent to you?

The solution to this is not too tricky; you just add some metadata to the patch detailing the author, the date, a description of the patch and so forth.

The next problem you face is that some patches rely on other patches. For instance, Bob might publicly post a patch for a great new scheduler, but then Carol might post a patch correcting some bugs in Bob's code. Carol's patch cannot be applied without first applying Bob's patch.

So you allow each patch to have parents. The parent of Carol's patch would be Bob's patch.

You've solved two major problems, but now you face one final one. If you want to talk to other people about these patches, you need a common naming scheme. It's going to be problematic if you label a patch as ABC on your system, but a colleague labels a patch as XYZ. So you either need a central naming database, or some algorithm that can guarantee everyone gives the same label to the same patch.

Fortunately, we have such algorithms; they're called one-way hashes. You take the contents of the patch, its metadata and parents, serialize all of that and SHA1 the result.
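To make the three ideas so far concrete (metadata, parents, content-derived names), here is a minimal sketch in Python. The `Patch` class, its field names, the JSON serialization and the example dates are all assumptions made up for illustration; this is not Git's actual object format, only the thought experiment written down.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """Hypothetical patch record: a diff plus metadata plus parent ids."""
    diff: str
    author: str
    date: str
    description: str
    parents: tuple = ()  # ids of the patches this one depends on

    def patch_id(self) -> str:
        # Serialize everything deterministically, then SHA-1 the bytes.
        payload = json.dumps(
            {
                "diff": self.diff,
                "author": self.author,
                "date": self.date,
                "description": self.description,
                "parents": list(self.parents),
            },
            sort_keys=True,
        ).encode("utf-8")
        return hashlib.sha1(payload).hexdigest()


# Illustrative data only: Bob's patch, then Carol's fix that depends on it.
bob = Patch("+ new scheduler", "Bob", "2010-05-01", "Add a great new scheduler")
carol = Patch("- bug\n+ fix", "Carol", "2010-05-02", "Fix bugs in the scheduler",
              parents=(bob.patch_id(),))
```

Because the id is computed only from the patch's own contents, anyone who holds the same patch computes the same label, with no central naming database involved. Since Carol's id covers her parent list, it also pins down exactly which version of Bob's patch she built on.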

Three perfectly logical solutions, and ones you may even have come up with yourself under similar circumstances.

* * Merging patches * *

Under this system, how would a merge be performed? Let's say you have two patches, A and B, and you want to combine them somehow. One way is to just apply each in turn to your source, fix any differences that can't be automatically resolved (conflicts), and then produce a new patch C from the combined diff.

That works, but now you have to store A, B and C in your patch database, and you don't retain any history. But wait! Your patches can have parents, so what if you created a 'merge' patch, M, with parents A and B?

   A   B
    \ /
     M
This is externally equivalent to what you did to produce C: patches A and B are applied to the source code, and then you apply M to resolve the differences. M will contain both the differences that can be resolved automatically, and any conflicts we have to resolve manually.
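A rough sketch of that merge patch, again in Python and again using made-up names (`db`, `store`, `history`) rather than anything from Git itself: M is stored like any other patch, it just happens to list two parents, and walking the parent links recovers the full history that the squashed patch C would have lost.

```python
import hashlib

db = {}  # hypothetical patch database: id -> (diff, parent ids)


def store(diff, parents=()):
    """Store a patch and return its id, derived from its content and parents."""
    parents = tuple(parents)
    patch_id = hashlib.sha1(repr((diff, parents)).encode("utf-8")).hexdigest()
    db[patch_id] = (diff, parents)
    return patch_id


def history(patch_id):
    """Return the set containing a patch and every patch it builds on."""
    seen = set()
    stack = [patch_id]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(db[current][1])
    return seen


a = store("diff for A")
b = store("diff for B")
m = store("resolution of A against B", parents=(a, b))
```

Here `history(m)` contains the ids of A, B and M, so nothing about where the merged code came from has been thrown away.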

Having solved your problem, you write the code to your patch database and present the resulting program to your colleague.

* * A user tries to merge * *

"How do I merge?" he asks.

"I've written a tool to help you do that," you say, "Just specify the two patches you want to combine, and the tool will merge them together."

"Um, it says I have a merge conflict."

"Well, fix the problem, then tell the system to add your file to the 'merge patch' it's making."

Your colleague dutifully hacks away, and solves the conflict. "So I've fixed the file," he says, "But when I tell it to 'commit file' it fails."

"Remember, this is a patch database," you reply, "We're not dealing with files, we're dealing with patches. You have to add your file changes to your patch, and then commit the patch. You can't commit an individual file."

"What? That's not very intuitive," he grumbles, "Hey! I've added the file to the patch, but it tells me the merge isn't complete!"

"You need to add all of the files that have differences that were automatically resolved as well."

"Why?!"

"Because," you explain patiently, "You might not like the way those files have been changed. It needs your approval that the way it's resolved the differences is correct."

"Why do I have to re-commit everything my buddy has made?" he complains, "Seriously, I want to just commit one file. What the hell is up with your system?"
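The rules the colleague keeps running into can be sketched as a small state machine. This models the imaginary tool from the conversation above, not Git's real implementation; the class name `MergeInProgress`, its methods and the file names are all hypothetical.

```python
class MergeInProgress:
    """Hypothetical model of a merge patch that is still being assembled."""

    def __init__(self, auto_resolved, conflicted):
        # Every file the merge touched must be approved before committing.
        self.pending = set(auto_resolved) | set(conflicted)
        self.added = set()

    def add(self, path):
        """Approve the current contents of one file for the merge patch."""
        if path not in self.pending:
            raise ValueError(f"{path} is not part of this merge")
        self.pending.discard(path)
        self.added.add(path)

    def commit(self):
        """Commit the whole patch; there is no way to commit a single file."""
        if self.pending:
            raise RuntimeError(
                f"merge not complete, still pending: {sorted(self.pending)}"
            )
        return sorted(self.added)


merge = MergeInProgress(auto_resolved=["sched.h"], conflicted=["sched.c"])
merge.add("sched.c")      # the file that was fixed by hand
try:
    merge.commit()        # refused: sched.h has not been approved yet
except RuntimeError as error:
    print(error)
merge.add("sched.h")      # approve the automatic resolution as well
print(merge.commit())
```

The unit being committed is the patch, so `commit` takes no file argument at all; that is the design choice the colleague finds unintuitive.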




>Mike, Git seems unintuitive because you don't have a good grasp of what it does behind the scenes.

In other words, Git's abstraction is leaky. That's usually considered a bad thing in our profession.

Except that since it's Git, we all use it, and it's better than the alternatives, we all pretend that's a good thing in this case.

I'm fine with the way Git works internally, and by now I've come to deal with the fact that sometimes it takes five commands to carry out what is, in fact, one desired action.

But Git's main point of failure is typical of all young projects that are in any way involved with Linux - there's no effort to make it elegant or pretty, and anyone that points that out and suggests that maybe things could be easier is ridiculed for not understanding it.

Usually "That's not very intuitive" is, in fact, an indication of something that could be improved...


>In other words, Git's abstraction is leaky

No, it means you're using the wrong abstraction.

As you change your codebase, files are modified. To the untrained eye, it looks like this is a simple linear progression of history, and you just want to record savepoints as you go along. CVS lets you pretend this is the case.

Actually, that's not the case at all. What you actually want to record is the changes you're making, and the relations between them. In the vanishingly small edge case where you never have any collaborators, never have any experimental code, never need to backtrack, and never need to work on more than one portion of the code at a time, the two views are isomorphic.

The rest of the time, it's not. CVS & SVN try to stretch the first abstraction to take care of these differences, but fail.

git makes you face up to the fact that your abstractions are wrong.


>No, it means you're using the wrong abstraction.

It seems a large number of users prefer to work at a different level of abstraction than git requires.

>git makes you face up to the fact that your abstractions are wrong.

Strangely, mercurial tackles the exact same abstractions, yet has a much friendlier user interface. The standard rebuttal at this point will be "Fine, use mercurial then". I wonder if, at some point, more users will start doing that than the git community would like. I, for one, am certainly tempted, but have so far held out due to the switching cost and because hg is, of course, not without flaws either.

However, I don't think this "If it hurts then you're doing it wrong" attitude can be healthy for git in the long term.

A bad user interface remains a bad user interface, no matter how you spin it. The big problem I see is not even with git currently having this bad user interface, but rather with the widespread reluctance in git circles to even think about ways to improve it.


You make valid points about open source tools frequently having leaky abstractions, and I often have exactly the same response that you do -- "Why don't people make more effort to make this elegant/pretty/intuitive?"

But the more I use git, the more I actually appreciate the fact that the abstraction is leaky. When I'm manipulating my history, I often really _want_ to have all the guts hanging out so that I can slice and dice them. If git wasn't designed using the "composable tools" idea [1] then it would make this stuff a lot harder.

The tradeoff is that it makes the learning curve a lot steeper. I understand that some developers don't want to know too much about their VCS, but I can't count the number of times when I have appreciated having an in-depth knowledge of it [2].

I said this here once before, and I don't mind repeating it: git is a power tool for power users. It is also designed as a VCS toolbox, so anyone who wants to write a more intuitive UI layered on top of git is welcome to. There are a couple out there, but they don't seem popular. I'm not sure why.

[1] Although this phrase also conjures the UNIX-HATERS Handbook's take on it: "tools for fools"

[2] The next question is how much of the time do I get into these situations _because_ the guts are hanging out? Is that the reason I need the power tools to get me out of trouble? It's hard to judge because I'm too close to the trees to see the forest on this issue.


I see this mentioned so many times:

"Git seems unintuitive because you don't have a good grasp of what it does behind the scenes"

but I fail to see why this is true. Do we need to understand the implementation of block allocation, snapshots, atomic writes, etc. to save files? Do we need to know the IP checksum algorithm to connect to the internet? Do we need to understand congestion control algorithms to browse websites? (and many other examples)

No - we don't and many of us do not know those things. Then why are we expected to know the internals of git to use it? (use - not modify, analyse, edit without an interface, etc.) Why can't we get a tool which gets an address and makes the file appear on the local storage? (or any equivalent to VCS workflow)

"Designing a patch database" - No, I want to use a VCS, not design it. If it works on pixie dust, I'm ok with that. It's just a tool. It's supposed to help me do the real work, not give me more stuff to think about at every step. Why is "you don't understand how it works" an acceptable answer here? Where's the iPhone of VCSes?


"Do we need to understand the implementation of block allocation, snapshots, atomic writes, etc. to save files?"

No, but consider what you need to know about file systems to use them. At the barest minimum, you need to know that:

* Data is stored in files

* Files have names

* Files are contained in directories

* Directories have names

* Directories can contain directories

If you had no understanding of what a file or a directory was, I'd imagine you'd find the behaviour of "grep" or "cat" completely unintuitive.

You don't need to know how Git stores data internally, but you do need to have a basic understanding of its design, just as you need a basic understanding of a filesystem in order to use it.


I expect to need exactly the same level of knowledge to work with a DVCS, because I'm editing files. SVN was very close to providing that in many ways and it worked. I don't want to know about a staging area unless I explicitly use it, for example. I don't want to know about the design or patch handling, unless I explicitly request or apply a patch. Merging is merging - it's a good couple of levels of abstraction above patches.

Basically - I want my DVCS to require as much knowledge as cp, diff, patch. I could live with them, if I had to (see - `quilt`). Now a DVCS should be easier to use, not harder. Otherwise, what's the point?


People use a DVCS for its features and capabilities: because it can do more things, not because it is easier to use. In the case of git, the 'simple' workflow runs into some trouble because git is designed to accommodate much more complex workflows, and as a result some of the core concepts have been tweaked in what people consider to be 'unintuitive' ways.

People choose vi/emacs over notepad/ed/pico for many of the same reasons and people complain about many of the same things (it's unintuitive, complicated, confusing, and so on...)


The reason why this is important is that dvcs-es are very different from editors. If I don't like emacs, I'll use vi, gedit, ed, ... - we'll get the same file and the same result, it's only the method that's different.

If you use git, however, I have a choice of a) git b) hg-git c) not working with your code.


Surely the same argument applies to SVN as well? Whichever VCS you use, you're going to force contributors to adapt to a particular version control philosophy.


"I expect to need exactly the same level of knowledge to work with a DVCS, because I'm editing files."

From Git's point of view, you're not editing files; you're creating patches.

Git is fundamentally a tool for constructing, sharing and storing patches. You may disagree that this is the best way to approach version control, but if you accept this philosophy, then Git is remarkably simple and logical.

Personally, I feel that treating a version control system like a filesystem is the wrong approach. I'm primarily interested in managing changes to the code, not in tracking chronological changes in individual files.


Weavejester, this is utterly brilliant. Like a lightbulb going on. THANK you!

I'd love to post it (with attribution, of course) as a followup article on The Reinvigorated Programmer. Please contact me to let me know whether that's OK -- mike@miketaylor.org.uk


You're very welcome to! I had worried it was a little too long, but I'm glad it turned out to be enlightening despite its length.

Git is not without its flaws, but I'm convinced the majority of problems people have with it arise because most tutorials on Git seem to focus only on the commands, without giving any context on how Git actually works. Initially, I had exactly the same problems as you did (and exactly the same disillusionment) until I happened across Git From The Bottom Up (http://ftp.newartisans.com/pub/git.from.bottom.up.pdf). Upon reading that, I also had a lightbulb moment.

Git From The Bottom Up is definitely worth reading, but it does tend to be a little too low level at times. So I've been toying around with the idea of writing a "You Could Have Invented Git" article, in the style of You Could Have Invented Monads (And Maybe You Already Have!) (http://blog.sigfpe.com/2006/08/you-could-have-invented-monad...).


Many thanks, WJ. As you may have seen already, I went ahead and posted at http://reprog.wordpress.com/2010/05/13/you-could-have-invent... Much appreciated!


>Mike, Git seems unintuitive because you don't have a good grasp of what it does behind the scenes

I actually believe that's what "Unintuitive" means.

However, intuitiveness is not everything. Power is often more useful than ease of learning.



