
Git from the bottom up [pdf] - iamwil
http://ftp.newartisans.com/pub/git.from.bottom.up.pdf
======
10ren
Linus is a hacker, not an academic, so there is no clean high-level
abstraction of git; no _algebra of git_. OTOH, it is ridiculously fast and
ridiculously useful (for Linus, and for anyone who goes to the trouble to
understand its execution model).

I think there'd be a paper or two in attempting to infer an algebra of git.
There wouldn't be a clean one, so you could propose modifications of git to
facilitate a clean algebra, in the way that physicists and biologists
hypothesize laws to explain the observed universe.

~~~
etherealG
I disagree; this paper outlines the algebra quite clearly: a series of commits
laid out as a directed acyclic graph, with the ability to modify this graph
however you please.

~~~
10ren
Yes, that's the underlying execution model. The approach of the article "git
from the bottom up" is that one needs to understand this in order to use it,
rather than being able to understand it in terms of the UI. Taking a lateral
step towards the ease-of-adoption and learning of a tool, there's an
intriguing aspect of git that I want to articulate further:

 _To understand part of git, you must first understand all of it._

Several aspects of git are interconnected; some design decisions don't make
sense in isolation; and you need to know the detailed and apparently
incidental behaviour of some commands in order to be able to use them. It's an
_expert tool_.

As an ideal (perhaps unrealizable), a tool has a learning curve, with a series
of "closed" subsets of functionality, such that each subset is complete in
itself. Learn a subset, and you are a master of that aspect. (I'm using
"closed" not in the strict mathematical sense of the results of operations
being in the domain of their operands, but to mean that you don't need to go
outside that subset). It's a similar concept to minimizing coupling between
modules.

e.g. you can play UT without knowing about alternate fire; you can use basic
regexps without knowing all the weird clever stuff; you can write C in C++.
Languages especially have this quality - you can write procedural code in an
OO language, you can use one library without using all the libraries.

Git doesn't do this.

As an example, if you want to change the message of a commit, you need to
create a new commit with the new message. This immutability is important (I
infer) so that other instances of the repository can remain in sync (the new
commit will have a different hash from the old one (with a very high
probability), because the message is different). Thus, the apparently simple
operation of "changing a message" is interconnected with the distributed
nature of the tool.
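
A minimal sketch of why the hash changes, assuming git's default SHA-1 object
format, in which a commit's id is the SHA-1 of a `commit <size>\0` header
followed by the commit text. The author line and the two messages are made up
for illustration, and `commit_id` is a hypothetical helper, not a git API:

```python
import hashlib

# Id of git's well-known empty tree, used so the example needs no real files.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

def commit_id(tree, message, parent=None,
              who="A U Thor <author@example.com> 0 +0000"):
    """Return the SHA-1 id of a commit object built from these fields."""
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines.append(f"author {who}")      # assumed identity and timestamp
    lines.append(f"committer {who}")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    header = b"commit " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

old = commit_id(EMPTY_TREE, "Fisrt commit")   # message with a typo
new = commit_id(EMPTY_TREE, "First commit")   # "the same" commit, reworded
print(old != new)  # True: the message is part of the hashed content
```

Because the parent id is hashed too, a child built on `old` and the same child
built on `new` also get different ids, which is where the next paragraph picks
up.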

If you want to change the message of the _first_ commit (at the root of the
DAG), it is more complex. First you 'change' the message of that first commit,
via making a copy (as above) - but the rest of the DAG is still pointing to
the old root, not your new one. Therefore, you _rebase_ (disconnect and
reattach a node) the child from the old root to the new root. Finally,
_rebase_ does not actually disconnect and reattach - it makes copies
(immutability again). You haven't changed the old DAG, but created a new one.
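
To make the copying concrete, here is a toy model (not git's real object
format): each commit's id is a hash of its parent id and its message, so
rewording the root forces a new id for every descendant. `Commit`, `add`,
`history` and `reword_root` are hypothetical names for illustration, and only
linear history is handled:

```python
import hashlib
from typing import NamedTuple, Optional

class Commit(NamedTuple):
    parent: Optional[str]   # id of the parent commit, None for the root
    message: str

def cid(c: Commit) -> str:
    """Toy content address: hash of parent id and message."""
    return hashlib.sha1(f"{c.parent}\n{c.message}".encode()).hexdigest()

def add(store: dict, parent: Optional[str], message: str) -> str:
    """Store a new commit and return its id; existing commits are never edited."""
    c = Commit(parent, message)
    store[cid(c)] = c
    return cid(c)

def history(store: dict, tip: Optional[str]) -> list:
    """Messages from root to tip."""
    out = []
    while tip is not None:
        out.append(store[tip].message)
        tip = store[tip].parent
    return out[::-1]

def reword_root(store: dict, tip: str, new_message: str) -> str:
    """Copy the linear history ending at tip onto a reworded root; return the new tip."""
    chain = []
    cur = tip
    while cur is not None:
        chain.append(store[cur])
        cur = store[cur].parent
    chain.reverse()                            # root first
    new_parent = add(store, None, new_message)
    for c in chain[1:]:                        # re-create each descendant as a copy
        new_parent = add(store, new_parent, c.message)
    return new_parent

store = {}
root = add(store, None, "inital import")
tip = add(store, add(store, root, "add README"), "fix bug")
new_tip = reword_root(store, tip, "initial import")

print(history(store, tip))      # ['inital import', 'add README', 'fix bug']
print(history(store, new_tip))  # ['initial import', 'add README', 'fix bug']
print(len(store))               # 6: three old commits plus three copies
```

The old chain is still in the store and still reachable from `tip`; nothing was
modified, only added.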

Changing a message is not as simple as one might expect.

An additional problem is that the documentation (man pages) doesn't always
define its terms, and when it does, not always clearly. The terms are obvious
only to someone who already knows how git works (and doesn't need the man
pages, except for reference). BTW: I found that some terms are defined at the
end of the options section (the definitions aren't always complete or
unambiguous, but they help); and it's helpful to think of the _type_ of the
arguments (though the man pages don't use that term).

Please don't take this as a disparagement of git - it is an expert tool that
tackles and solves some of a master's extremely difficult problems. I
understand it took Linus years of coping with the problem, and of having the
brilliant example of _bitkeeper_ before he could whip up git overnight. I've
invested quite a bit of time to reach the (limited) understanding that I have
so far, and each step teaches me more about the problems and solutions of
serious distributed version control. It is complicated because the problem is
complicated; and it _remains_ complicated because a master already grasps the
problem, and can cope with that complexity.

~~~
etherealG
Your example makes a great point about something that just isn't practical.
Once history has been pushed, it can't be changed without causing chaos for
everyone else who uses your code. And in git, commit messages are part of
history, not just annotations on it.

Perhaps in other ways, though, the closed subsets do exist, as long as what you
want to do isn't "complex" in git: things like commits, pushes, etc.

Most importantly, I would say that branching, merging, and sharing that
branched history _is_ a closed subset. If you only ever use the
commit/push/fetch/merge commands (with pull being the same as fetch+merge), and
perhaps local branching as well, then you have a relatively simple tool for
sharing code and the history of that code. You don't need to understand the
plumbing to work within that workflow, and I think I could explain it to
someone in an hour or so of explanation and example.
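
As a rough illustration of how small that subset is, here is a toy model of
pull as fetch followed by merge. It is a sketch under stated assumptions, not
git's implementation: repositories are plain dicts, commits are hashes of
their parents and message, file contents and conflicts are ignored, and the
names `commit`, `ancestors`, `fetch`, `merge` and `pull` are hypothetical
helpers:

```python
import hashlib

def commit(objects, parents, message):
    """Store a toy commit and return its id (hash of parent ids + message)."""
    cid = hashlib.sha1((",".join(parents) + "\n" + message).encode()).hexdigest()
    objects[cid] = (tuple(parents), message)
    return cid

def ancestors(objects, tip):
    """All commit ids reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(objects[c][0])
    return seen

def fetch(local, remote):
    """Copy the remote's objects and record where its branch points."""
    local["objects"].update(remote["objects"])
    local["refs"]["origin/master"] = remote["refs"]["master"]

def merge(repo, ours, theirs):
    """Merge ref `theirs` into ref `ours`: no-op, fast-forward, or merge commit."""
    a, b = repo["refs"][ours], repo["refs"][theirs]
    if b in ancestors(repo["objects"], a):
        return a                                   # already up to date
    if a in ancestors(repo["objects"], b):
        repo["refs"][ours] = b                     # fast-forward
    else:
        repo["refs"][ours] = commit(repo["objects"], [a, b], f"Merge {theirs}")
    return repo["refs"][ours]

def pull(local, remote):
    """pull = fetch, then merge the fetched branch into the local one."""
    fetch(local, remote)
    return merge(local, "master", "origin/master")

# Two repositories that share a base commit and then diverge.
remote = {"objects": {}, "refs": {}}
base = commit(remote["objects"], [], "base")
remote["refs"]["master"] = base
local = {"objects": dict(remote["objects"]), "refs": {"master": base}}  # "clone"

theirs = commit(remote["objects"], [base], "their change")
remote["refs"]["master"] = theirs
mine = commit(local["objects"], [base], "my change")
local["refs"]["master"] = mine

tip = pull(local, remote)
print(local["objects"][tip][0] == (mine, theirs))  # True: a two-parent merge commit
```

A repository that had made no commits of its own would take the fast-forward
branch instead and simply move `master` to `theirs`.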

------
fdb
It took a lot of time for me to really _get_ Git, and reading this document I
finally got it. Genuinely recommended.

I also like PeepCode's "Git Internals":
<http://peepcode.com/products/git-internals-pdf>.

------
oozcitak
Being new to git I found this document very helpful. _git stash_ was new for
me, and I am glad to have learned it.

------
tyrmored
This looks very helpful. I've had some trouble migrating from a GUI Subversion
client to command-line Git.

~~~
maurycy
Everyone I know seems to have. :-)

Personally, I'm a bit disgusted with the git hype. I love the git idea but the
interface is horrible.

~~~
davepeck
By interface, I assume you mean git's command-line interface? I agree; it's
terrible.

My impression is that projects like grit are far enough along that it should
be possible to build an entirely new git front-end/porcelain on a clean
technology stack, one that exposes git in a saner way. Is anybody actually
working on something like this right now?

~~~
mbrubeck
David Roundy (author of Darcs, a DVCS that predates git and has one of the
nicest command-line interfaces) is working on a git porcelain that has the
same user interface and semantics as Darcs:

<http://github.com/droundy/iolaus>

~~~
davepeck
Thanks for the pointer -- I played with Darcs back in the day. This looks
interesting.

