
Git concepts simplified - whiteshadow
http://gitolite.com/gcs/index.html
======
stormbrew
A suggestion for anyone hoping to write accessible tutorials:

> Hg folks should read this section carefully. Among various crazy notions Hg
> has is one that encodes the branch name within the commit object in some
> way. Unfortunately, Hg's vaunted "ease of use" (a.k.a "we support Windows
> better than git", which in an ideal world would be a negative, but in this
> world sadly it is not) has caused enormous takeup, and dozens of otherwise
> excellent developers have been brain-washed into thinking that is the
> only/right way.

If this is one of the most important concepts, starting it off with a slew of
negativity about things the reader may be currently using (windows, hg) is
probably not the best way to get them to keep reading.

~~~
ender7
There's also a bit of "people in glass houses..." to this comment. I don't
think git really wants to start a fight when it comes to poor design
decisions.

~~~
ranman
out of curiosity could you enumerate some of the poor design choices in git?

~~~
stormbrew
I'm a huge fan of git (but became one the hard way) and I think that most of
the poor design decisions are in the command layer. Largely the tendency for
the same command to do several things that, to the user, are wildly different
(often because they map to the same fundamental operation on the DAG).

As the simplest example, git add both adds a new file to the index and adds
changes to an existing file to the index. They are both fundamentally the same
operation, but your intentions are different.

Checkout, likewise, both changes HEAD's symbolic reference and changes the
working copy to match the index or a commit. This makes it both a means of
full tree changes and local reversion.
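Here is a minimal Python sketch of those two faces of checkout. It rests on some stated assumptions: `Repo`, `checkout_branch` and `checkout_paths` are hypothetical names, a commit is just a `{path: content}` snapshot, and the model ignores how real git carries uncommitted edits across a branch switch.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Toy model (hypothetical): a commit is just a {path: content} snapshot."""
    commits: dict              # commit id -> {path: content}
    branches: dict             # branch name -> commit id
    head: str                  # name of the checked-out branch
    index: dict = field(default_factory=dict)
    worktree: dict = field(default_factory=dict)

    def checkout_branch(self, name):
        """Like `git checkout <branch>`: repoint HEAD, then make the index
        and working tree match that branch's commit (a full-tree change)."""
        self.head = name
        snapshot = self.commits[self.branches[name]]
        self.index = dict(snapshot)
        self.worktree = dict(snapshot)

    def checkout_paths(self, *paths):
        """Like `git checkout -- <path>`: leave HEAD alone and overwrite
        the named working-tree files from the index (a local reversion)."""
        for path in paths:
            self.worktree[path] = self.index[path]
```

Both methods come down to copying a snapshot over files, which is the shared underlying operation. The difference the user actually cares about is whether HEAD moves.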

And then there's rebase...

I don't think these are huge issues, but I do think they're barriers and a
source of common early issues with git for new users.

~~~
Already__Taken
Even just simple commands kind of bother me for poor consistency.

    
    
      git remote -v
      git branch -a
      git tag -l
      git stash list
    

1 concept, 4 commands. I don't even use git much but this stuff makes it
really slow to pick up.

~~~
davvid
If you're curious, "git remote", "git branch", and "git tag" all present a
list without supplying those options. You only need "git branch -a" if you
want to see remote branches.

"git stash" is certainly the outlier; the reason is that when "git stash" was
first added it was meant to be a short-and-sweet command. Otherwise, you'd be
forced to type "git stash save <name>", which is not so short and sweet. "git
stash list" was added later.

~~~
dolinsky
You can also list only the remote branches with one flag: `git branch -r`.
Neat note about the reasoning behind stash.

------
tootie
There is an inverse relationship between the number of articles titled "x
explained simply" and the actual simplicity of x. I honestly don't understand
why the developer community refuses to admit the obvious: git is an unholy
clusterfuck of a product. It has a nice data structure inside it? Name another
end-user product for which you are even vaguely aware of what data structures
were used.

~~~
jlgreco
> _Name another end-user product for which you are even vaguely aware of what
> data structures were used._

Unix.

You are operating primarily on a tree of files and streams of text. To operate
on these you have a wide array of utilities that perform simple tasks (and a
handful that perform complex tasks as well) that, when composed, allow you to
perform any transformation you want. You can get a freshman CS student off the
ground with Unix-like systems in what, one lecture?

Because the (bleedingly simple) data model is the focus when learning Unix,
you don't need to memorize every single little edge case of the system.
Knowledge of the data model alone is enough to tell you what sort of things
can or cannot be done, and _general_ knowledge of the sort of thing that a few
utilities do is enough to bootstrap yourself. If you want to list out some
files in some _particular_ way, you may not know immediately what exactly to
type, but you probably do know that ls or find is a decent place to start
looking.

~~~
pjmlp
Except there isn't a single version of UNIX, each flavour has its own
deviations.

~~~
jlgreco
Technically there isn't a single version of Git either, though _(thankfully)_
they can all operate on the same repositories (whereas in UNIX land, you'll
find different file systems that the others do not support). They do however
have some different capabilities. jGit for instance can push to S3, which is
pretty neat.

------
WestCoastJustin
Gitolite (where this is hosted) is actually pretty cool too. For those who
don't know what gitolite is, it is software that works in tandem with
git-daemon and basically allows you to run a centralized git server with
access rules.

I created a screencast about it @
[http://sysadmincasts.com/episodes/11-internal-git-server-wit...](http://sysadmincasts.com/episodes/11-internal-git-server-with-gitolite)

~~~
beagle3
There's also gitlab, which is a github style app you can run locally. It used
to use gitolite internally for access control, but it is now using its own
access control system.

~~~
csense
I've used both. Gitolite does everything on the command-line, including access
management.

Gitlab is essentially an open-source clone of Github's web UI. Of the two
projects, I think Gitlab is harder to deploy and far more resource-intensive
on the server, but easier for users.

------
tekp2
I was expecting this: [http://tartley.com/?p=1267](http://tartley.com/?p=1267)

~~~
nilkn
I really did not expect that. That's one of the best unexpected funny things
I've experienced this week. I don't think I would have found it as funny if I
had stumbled on it randomly; the discussions here provided the perfect context
for this link.

------
lisper
Great article. Just one rather glaring omission: only a single mention of the
index, and that only in passing. I have used git for years, and I still don't
understand what the fleeping index is supposed to be for. What can you do with
the index that you can't do with a branch? And why is it called the index?
(And why is it git add -a but git commit -A? Or maybe it's the other way
around?)

~~~
Cyranix
Not to be cruel, but I don't understand how you've used Git for years without
understanding what the index is. There are more than a few learning resources
for Git online to satisfy your curiosity. Anyway, to give an abbreviated
explanation:

The index is a staging area for your commits. When you use `git add`, changes
in the working directory are staged (prepared) for the next commit. If you
pass the `-a` flag to `git commit`, Git will stage all changes to files that
it is already aware of. (Recall that new files are untracked and must be
manually added to the index the first time they're committed; `-a` won't add
those files because Git doesn't already know about them.)

Why have a staging area instead of just creating a commit directly from all
the changes in the working directory? It's basically a sanity measure for
organizing commits if you're ever anything less than a perfect developer. If
you make a bunch of changes and later realize that there's more than one "unit
of work" represented in those changes (however you choose to define those
units), you can selectively add files to the index to create commits that make
sense. You can even use the interactive mode of `git add` to selectively stage
changed sections within a single file. If you care about the benefits of
sensible commits -- bug hunting with bisection, ability to run `git revert` to
undo a logical unit of work -- then the index is your friend.
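To make the staging idea concrete, here is a toy Python model. The class and method names are hypothetical, and this is not git's on-disk index format. It shows `add` doing the same thing for new and modified files, two units of work landing in separate commits, and the `-a` behaviour described above: tracked files are re-staged, new files are not.

```python
class StagingModel:
    """Toy model (hypothetical names): files are {path: content} dicts and a
    commit is a frozen copy of the index. Not git's on-disk index format."""

    def __init__(self):
        self.worktree = {}   # what is on disk right now
        self.index = {}      # what the next commit will contain
        self.commits = []    # snapshots, oldest first

    def add(self, *paths):
        """Like `git add <path>`: copy current content into the index.
        It is the same operation for a new file and a modified one."""
        for path in paths:
            self.index[path] = self.worktree[path]

    def commit(self, all_tracked=False):
        """Like `git commit`. With all_tracked=True it behaves like
        `git commit -a`: re-stage every file the index already tracks
        (dropping ones deleted from disk) but never add untracked files."""
        if all_tracked:
            for path in list(self.index):
                if path in self.worktree:
                    self.index[path] = self.worktree[path]
                else:
                    del self.index[path]
        snapshot = dict(self.index)
        self.commits.append(snapshot)
        return snapshot


# Two unrelated edits sitting in the working tree, committed separately.
s = StagingModel()
s.worktree = {"parser.py": "bug fix", "README": "typo fix"}
s.add("parser.py")
first = s.commit()     # contains only the bug fix
s.add("README")
second = s.commit()    # now contains the typo fix as well
```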

A few random pages on the index that I pulled up:

[0] [http://www.gitguys.com/topics/whats-the-deal-with-the-git-in...](http://www.gitguys.com/topics/whats-the-deal-with-the-git-index/)

[1] [http://git-scm.com/book/en/Git-Tools-Interactive-Staging](http://git-scm.com/book/en/Git-Tools-Interactive-Staging)

~~~
lisper
Well, I'm being a little facetious. I do understand what the index is and what
it's used for (but not why it's called the "index" instead of, say, the
"stage"). What I don't understand is why the index exists as a separate
abstraction. You could have the exact same effect by, for example, doing a git
stash, and then popping changes out of the stash into your (now clean) working
directory. The WD in effect plays the role of the index, and you get the same
result, but with fewer abstractions, fewer commands, and less confusion.

But I hold Linus in high enough regard to take very seriously the possibility
that the index is a reflection of some deep wisdom that I have missed. That's
the real reason I raise this every now and again.

~~~
Cyranix
I'm not sure I see how you could use `git stash` to accomplish the same thing.
Running `git stash && git stash pop` is almost a no-op, so you don't get the
benefits described above. Am I missing something?

~~~
lisper
I thought you could pull individual files out of a stash, but it seems I was
wrong about that. But it's not hard to imagine a variation of git stash that
allowed you to select individual files from a stash to pull back into your
working directory.

------
aeon10
There seems to be a LOT of 'Git Explained' stuff. The best way to learn Git,
in my opinion, is to start using it! Google stuff as you go.

~~~
zwieback
True, but even after travelling along the RCS->VSS->CVS->SVN graph for the
past 20 years I still find git baffling sometimes. I like it but the question
is whether I want to invest my brain resources, which are a scarce resource at
age 47, in learning a revision control system that can drive you into a
cul-de-sac when using even the most basic workflow.

I think git!=simple but I applaud the effort.

~~~
MBCook
The model behind Git, the way the repository works/conceptual model, is
incredibly simple. The only trick is learning which commands are used to make
the changes you want, and that can take a little time (and Googling).

------
dcre
I love Git, but I'm pretty sure there's no way to explain it simply.

~~~
jlgreco
I don't get this attitude. The DAG is exceedingly simple and I think it can be
taught reasonably in only a few minutes. From there you really only need to
teach a _very_ minor amount of the UI, and teach the user how to perform
_"I want to do this to the DAG"_ => _"This is what I type"_ translations on
their own. I've seen all of this done well in sub-hour presentations.

Mercurial, on the other hand, has a pain in the ass data model (so much so that
most introductions to it that I have seen do not even approach the topic), so
you actually do have to learn all of the UI commands to get an idea of what
can be done and what cannot be done. It is far more complex than git.

I really cannot think of a simpler VCS than git. I've used plenty, but never
got off the ground faster with anything else.

~~~
damncabbage
I agree with the model being great. The git command itself is abominably
baroque in its user interface (inconsistencies and strange defaults abound),
but I've gotten over that with more effort than I'd like to admit.

I love Git's plumbing; I just hate its porcelain.

~~~
tootie
People who refuse to see the ugliness in git are the same people who think
it's manly to live on the command line. Using git through Eclipse looks a lot
like using SVN through Eclipse. If I right-click on a file and go Team >
Replace With > Remote and select 'origin master', it saves me from trying to
remember the obscure list of flags I need to pass to the command line to do
the same thing.

~~~
jlgreco
_Refuse_ to see the ugliness? No, I really don't see it. Sure, a few flags
could be cleaned up, but the beauty of the rest of git more than offsets a few
weirdly named flags.

Meanwhile git through eclipse causes nothing but trouble as far as I have
seen. Making it seem like SVN is exactly the problem: git _isn't_ like SVN, so
if it seems that way, something is going wrong. Pretending git is something
that it isn't will bite you in the ass sooner rather than later. The
disappointing part is that there isn't any technical reason why git
integration in Eclipse _couldn't_ be good; it just isn't currently.

~~~
tootie
The fact that I can select files to add/commit with checkboxes and permanently
check a box that will push every commit is exactly what I want 99% of the
time. I know that isn't the "git way", but the "git way" has no value for the
kind of projects I work on.

~~~
jlgreco
The "git way" is just whatever way you want. If you want a central server that
you always push to, that is fine; there is no problem with that.

The problem is with that particular _tooling_. That tooling presents a
workflow that is perfectly fine (though it is problematic that it does not
facilitate alternative workflows, which becomes particularly problematic when
working on a team with other users), but it obscures what is actually going on
and executes that workflow imperfectly, generally falling over in rather novel
ways. When it fucks something up, and it eventually will, you will need an
understanding of the basic concepts underlying git to figure out what went
wrong. I'm not saying you need to know how to use the default git porcelain,
I'm saying you need to be aware of the _concepts_ underlying git.

------
andrewflnr
This has been danced-around in the comments, so I'm just going to say it: I
don't need the concepts of git simplified, I need a better explanation of how
git's bizarre command set maps onto the obvious DAG/filesystem operations.

~~~
ams6110
Seconded. And, at 30 printed pages, I'd hate to see the non-simplified
explanation (no, I didn't actually print it).

------
Pxtl
... that is not simple. But I'm figuring it out anyways.

I commit my changes to my own repo and keep building changes under HEAD,
committing as I go. If I get a branch from a buddy and I want to add it to my
code, I either merge or rebase depending how I want his commits to be
intertwingled.

Because Git is decentralized, there's no difference between merging in a
buddy's branch and importing the latest changes from Origin into my branch. So
I fetch the changes from origin/master and then rebase or merge my repo on top
of that. That's my "Get Latest" command, basically, right? Assuming I'm
working on "master", I fetch, then rebase onto or merge origin/master.

To check in, I tell the origin server to take _my_ stuff and then rebase or
merge its master with that.

I still feel like this is a rather baroque approach to the problem... managing
oodles of local commits separate from rebase/merges seems bizarre, above and
beyond the decentralized approach that makes my own repo, my peer, and the
"origin's" stuff all equivalent.

The decision of when to merge vs. rebase is still confusing to me.

~~~
dcre
One small point: on check-in, I don't think it's exactly that origin is _also_
rebasing/merging just like you did. It's more like origin is just taking
whatever you have and copying it exactly. The merging/rebasing process itself
only happens locally.

Regarding merge vs. rebase, here's my approach: rebase to keep history a
straight line when it's just your changes and it's just a few commits. If it's
too many commits you tend to have more conflicts and it's usually easier to
merge.
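Here is a toy Python sketch of the difference. It assumes commits are just messages with parent links; the `Commit`, `merge` and `rebase` names are hypothetical, and conflicts are not modeled. Merging adds one commit with two parents. Rebasing replays your commits as new commits on top of upstream, which is what keeps the history a straight line.

```python
class Commit:
    """Toy commit (hypothetical): a message plus parent links, no file contents."""
    def __init__(self, msg, parents=()):
        self.msg = msg
        self.parents = tuple(parents)


def ancestors(tip):
    """Every commit reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(c.parents)
    return seen


def merge(ours, theirs):
    # A merge keeps both lines of history: one new commit, two parents.
    return Commit(f"Merge {theirs.msg!r}", (ours, theirs))


def rebase(ours, upstream):
    # Walk back from our tip until we hit something upstream already has.
    # Assumes our side is a simple line sharing a base with upstream.
    known = ancestors(upstream)
    todo, c = [], ours
    while c not in known:
        todo.append(c)
        c = c.parents[0]
    # Replay oldest-first on top of upstream. Each replay is a NEW commit
    # with one parent, so the result is a straight line.
    tip = upstream
    for old in reversed(todo):
        tip = Commit(old.msg, (tip,))
    return tip


def is_linear(tip):
    return all(len(c.parents) <= 1 for c in ancestors(tip))


base = Commit("base")
mine = Commit("my fix", (Commit("my feature", (base,)),))
theirs = Commit("upstream change", (base,))
merged = merge(mine, theirs)
rebased = rebase(mine, theirs)
```

Because a rebase replays commits one at a time, a conflict can surface at each replayed commit, while a merge resolves the combined difference once. That is one way to read the "too many commits" advice above.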

~~~
Pxtl
> One small point: on check-in, I don't think it's exactly that origin is also
> rebasing/merging just like you did. It's more like origin is just taking
> whatever you have and copying it exactly. The merging/rebasing process
> itself only happens locally.

But then what happens when I merge/rebase locally and publish at the same time
as somebody else? I assume the origin doesn't keep a lock on the whole mess
while I'm doing my local merge/rebase. That sounds like it would lead to a
"last-one-wins" conflict-resolution or a complete reversion of the origin if I
publish without rebasing on the origin's version first.

~~~
sethrin
Are you using gitflow?
[https://github.com/nvie/gitflow](https://github.com/nvie/gitflow)

So yes, if you try to push without pulling down changes, then you will get an
error about the histories having diverged. Sometimes you don't care about
wiping out history, and can just push with the -f parameter, but most of the
time that's your cue to rebase.
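The exchange above (origin just copies what you push; a concurrent push gets an error) can be sketched with a toy Python model. Everything here is hypothetical (`Remote`, `push`, the error text). It only illustrates the rule that a normal push must be a fast-forward, meaning the remote's current tip has to be an ancestor of what you send. `force=True` mimics `-f` by skipping that check.

```python
class Commit:
    """Toy commit (hypothetical): message plus parent links."""
    def __init__(self, msg, parents=()):
        self.msg = msg
        self.parents = tuple(parents)


def is_ancestor(old, new):
    """True if `old` is reachable by walking parents from `new`."""
    seen, stack = set(), [new]
    while stack:
        c = stack.pop()
        if c is old:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(c.parents)
    return False


class Remote:
    """The remote never merges or rebases on push; it only moves a pointer."""
    def __init__(self):
        self.refs = {}   # branch name -> Commit

    def push(self, branch, new_tip, force=False):
        old = self.refs.get(branch)
        # Accept only a fast-forward: the remote's current tip must already
        # be part of the history being pushed, unless the push is forced.
        if old is not None and not force and not is_ancestor(old, new_tip):
            raise ValueError(f"rejected: {branch} would lose commits (non-fast-forward)")
        self.refs[branch] = new_tip


base = Commit("base")
origin = Remote()
origin.push("master", base)
alice = Commit("alice's change", (base,))
bob = Commit("bob's change", (base,))      # made concurrently, also on base
origin.push("master", alice)              # alice pushes first: fast-forward
try:
    origin.push("master", bob)            # bob's history lacks alice's commit
except ValueError as err:
    print(err)
bob_rebased = Commit("bob's change", (alice,))  # bob rebases onto alice
origin.push("master", bob_rebased)        # now a fast-forward, so accepted
```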

Branches are cheap in git, and there's no real advantage to doing dev work in
the master branch. You should create a branch at least as often as you develop
a new feature. In git, history isn't necessarily a static thing. Sometimes you
need to mangle it to achieve your goals, and sometimes you fat-finger the
merge and do time in History Hell. In either case it's only a huge flaming
deal if you're working on master. Ditto with the problem you describe.

I recommend the O'Reilly book on Git. The simple explanations are nice in
theory, but this is a complicated subject and deserves a full explanation.
There are very sound reasons for the how and why of git, and they should be
within the grasp of any aspiring programmer.

~~~
Pxtl
I'm not using git _at all_; I'm trying to understand how it works before I
start. Every guide to git I read before was a completely opaque mess of
terminology, so this is the first time it's starting to make a lick of sense.

~~~
MBCook
I think Git is much easier to understand if you understand the underlying data
model. The core object in Git (as far as you need to care) is a commit. Each
commit holds a link to the commit(s) it was based on. If two commits have the
same parent, you have two branches. If one commit has two parents, it's a
merge. Individual commits don't know what branch(es) they are a part of.
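Here is a minimal Python sketch of that data model. It assumes a simplified `Commit` (hypothetical) that holds only a message and parent links. It shows a fork (two commits sharing a parent), a merge (one commit with two parents), and the fact that nothing in a commit records a branch name. The id is a truncated SHA-1 over the message and parent ids, loosely imitating how git derives commit ids from content that includes the parents.

```python
import hashlib


class Commit:
    """Toy commit (hypothetical): a message plus links to parent commits.
    Real git commits also reference a tree of files and author metadata."""

    def __init__(self, message, parents=()):
        self.message = message
        self.parents = tuple(parents)
        # Loosely mimic git: the id is a hash over content *including* the
        # parent ids, so the same change on a different parent gets a new id.
        payload = message + "".join(p.id for p in self.parents)
        self.id = hashlib.sha1(payload.encode()).hexdigest()[:7]
        # Note what is absent: no field says which branch this commit is on.


def children(commits, parent):
    return [c for c in commits if parent in c.parents]


def is_merge(commit):
    return len(commit.parents) > 1


root = Commit("root")
a = Commit("work A", (root,))        # a and b share a parent:
b = Commit("work B", (root,))        # the history forks into two lines
m = Commit("merge A and B", (a, b))  # one commit, two parents: a merge
```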

Branches are just pointers to commits. When you make a new commit on a branch,
the new commit is saved and the branch pointer is moved forward to point to
the new commit.

Tags point to branches, but they don't get updated. They're supposed to be
permanent (but can be modified if you try). If a tag points at a commit and
you make a new commit, the tag still points to the original commit.
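Here is a toy Python model of branches versus tags. It assumes both are plain name-to-commit mappings (`Refs` and its methods are hypothetical). Following the correction further down the thread, the tag here records a commit, not a branch. Committing advances only the checked-out branch. The tag, and a branch created earlier, stay put, and creating a branch just writes one more pointer.

```python
class Commit:
    """Toy commit (hypothetical): just a message and one parent."""
    def __init__(self, message, parent=None):
        self.message = message
        self.parent = parent


class Refs:
    """Branches and tags are both just names pointing at commits."""

    def __init__(self, root):
        self.branches = {"master": root}  # these move when you commit
        self.tags = {}                    # these stay where they were put
        self.current = "master"

    def commit(self, message):
        new = Commit(message, self.branches[self.current])
        self.branches[self.current] = new  # only the checked-out branch moves
        return new

    def tag(self, name):
        self.tags[name] = self.branches[self.current]

    def branch(self, name):
        # A new branch is one more pointer to the same commit; nothing is copied.
        self.branches[name] = self.branches[self.current]


refs = Refs(Commit("root"))
first = refs.commit("first")
refs.tag("v1.0")
refs.branch("topic")
second = refs.commit("second")
```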

The only other odd term would be HEAD, which is like a branch that _ALWAYS_
points at what you have checked out at the moment. You'll see this if you make
commits without being on a branch (say you checked out an old commit and just
started working).
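Here is a minimal sketch of HEAD in Python. It assumes a hypothetical `Repo` where HEAD holds either a branch name (the normal case) or a commit (detached). Committing on a branch moves the branch. Committing while detached moves only HEAD, so after switching back to `master` nothing points at the experimental commit.

```python
class Commit:
    """Toy commit (hypothetical): a message and one parent."""
    def __init__(self, message, parent=None):
        self.message = message
        self.parent = parent


class Repo:
    def __init__(self, root):
        self.branches = {"master": root}
        # HEAD normally holds a branch *name*; "detached" means it holds a
        # commit directly instead.
        self.head = "master"

    def head_commit(self):
        if isinstance(self.head, str):
            return self.branches[self.head]
        return self.head

    def checkout(self, target):
        self.head = target  # a branch name, or a Commit to detach HEAD

    def commit(self, message):
        new = Commit(message, self.head_commit())
        if isinstance(self.head, str):
            self.branches[self.head] = new  # on a branch: the branch advances
        else:
            self.head = new                 # detached: only HEAD advances
        return new


root = Commit("root")
repo = Repo(root)
on_master = repo.commit("on master")
repo.checkout(root)                  # check out an old commit: HEAD detached
experiment = repo.commit("experiment")
repo.checkout("master")              # back on master; no branch points at
                                     # `experiment` any more
```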

Since branches are just pointers at commits, you can move them around easily.
If you make 6 new branches, they all just point at the same commit to start,
which is why it's so amazingly fast to do. If you want to undo a commit (that
you haven't pushed), you can move the branch pointer to a previous commit.
After that when you make new commits it will be like the mistake never
existed. (Note: If you accidentally do this, it can be fixed if you catch it
soon enough).
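Here is a toy Python sketch of undoing a commit by moving the branch pointer. It adds a log of past positions standing in for git's reflog (`Branch`, `undo_last_commit` and `previous` are hypothetical names). In real git, reflog entries and commits that nothing else refers to are eventually cleaned up, which fits the "catch it soon enough" caveat.

```python
class Commit:
    """Toy commit (hypothetical): a message and one parent."""
    def __init__(self, message, parent=None):
        self.message = message
        self.parent = parent


class Branch:
    """A branch as a movable pointer, plus a log of every position it has
    held (loosely modeled on git's reflog; names here are hypothetical)."""

    def __init__(self, tip):
        self.tip = tip
        self.log = [tip]

    def move(self, commit):
        self.tip = commit
        self.log.append(commit)

    def commit(self, message):
        self.move(Commit(message, self.tip))
        return self.tip

    def undo_last_commit(self):
        # Roughly what `git reset HEAD~1` does to the branch pointer: the
        # commit is not deleted, it is just no longer pointed at.
        self.move(self.tip.parent)

    def previous(self, n=1):
        # "Where did this branch point n moves ago?"
        return self.log[-1 - n]


b = Branch(Commit("root"))
good = b.commit("good")
mistake = b.commit("mistake")
b.undo_last_commit()          # branch is back on `good`
better = b.commit("better")   # new work continues from `good`
recovered = b.previous(2)     # the log still remembers `mistake`
```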

When it comes to using branches, "A Successful Git Branching Model" [1] is
_very_ commonly used, and works fantastically. I had almost no trouble getting
the other developers in my company on the model, and it makes it very easy to
keep things straight.

If you'd like help understanding Git, I'd be glad to try to help you. My email
address should be in my profile. You may find this kind of thing much simpler
if you look at the graph of the repository. My company uses SourceTree[2], which
is a pretty great GUI and makes it easy for me to see how the various branches
I've got relate.

[1] [http://nvie.com/git-model/](http://nvie.com/git-model/)

[2] [http://www.sourcetreeapp.com](http://www.sourcetreeapp.com) (Mac & Windows;
Git also has a basic GUI command built in if you want one)

~~~
dolinsky
> Tags point to branches, but they don't get updated.

Did you mean to say tags point to a commit?

~~~
MBCook
Yep, good catch.

------
0xdeadbeefbabe
It's possible, at least after I see the concepts this way, that git has the
simple design Hoare was talking about when he said:

“There are two ways of constructing a software design: One way is to make it
so simple that there are obviously no deficiencies, and the other way is to
make it so complicated that there are no obvious deficiencies. The first
method is far more difficult.” ― C.A.R. Hoare

------
b0z0
Very nice, and to see it in action, this is also really cool:
[http://pcottle.github.io/learnGitBranching/?demo](http://pcottle.github.io/learnGitBranching/?demo)

------
forrestthewoods
If your "simple" explanation of how to use something is 27 pages then it is
many things and simple isn't one of them.

My favorite thing about Git is how it's forced Perforce to add features and
lower price. Thanks Git!

------
bb0wn
I found the git book to be more than adequate for explaining the
structures and practice patterns of git.

[http://git-scm.com/book](http://git-scm.com/book)

------
pessimizer
This did it for me:
[http://www.sbf5.com/~cduan/technical/git/](http://www.sbf5.com/~cduan/technical/git/)

------
acqq
I don't use Git and would like to know how the following problem is solved in
Git.

Say you have a project which is a hundred megabytes big. And you have to
develop almost in parallel three or four "generations" of the project -- let's
say v1, v2 and v3. "In parallel" means you'd like to be able to build any of
the three versions without having to take the version out of the repository
first. You can't say that v1 is obsolete, as soon as some bugs are reported in
v1 you have to fix them in v1, v2 and v3. And every bigger version is "newer"
but some features can be added in v2 and v3 some just in v3 etc.

How can you work on such a big project and have a single repository where all
three versions are present, and work on these three versions in parallel
(having sources which are compiled in different base directories)?

~~~
taspeotis
> (having sources which are compiled in different base directories)

With my nascent Git understanding, I think you would just have multiple
branches for v1, v2... and then clone the repository multiple times so you
have multiple working copies.

Check out v1 in the first one, v2 in the second one.

Although changing between related branches is usually quite quick in Git.
Also, a fresh checkout of ~100 MB is not a lot, at least on an SSD.

This also relies on having a centralised Git repository for you to push/pull
changes to. But I believe Git allows you to synchronise multiple repositories
on disk.

You're rarely developing two things at once in any given instant of time...
why not just quickly check out the branch you want?

~~~
MBCook
> and then clone the repository multiple times so you have multiple working
> copies.

This is probably not what you want. First, you should know that switching
between branches in Git is _insanely_ fast. In general, it won't get in your
way.

If you clone the repository, each one is a _full_ git repository. That means
you'll triple the storage on the disk. Worse, you'll have to do 3x as many
pulls to keep all 3 repositories up to date.

> You're rarely developing two things at once in any given instant of time...
> why not just quickly check out the branch you want?

It often comes up, but that's what we do. We may have a dozen branches on our
machines (the thing(s) we're working on, recent things we worked on, the one
that's been sitting for a while we're waiting on an answer to pick up again)
and we can switch our project within a second or two on a simple rotating hard
drive.

~~~
taspeotis
I know this, but GP was asking about how to do it with "different base
directories" which I'm assuming is asking for an analogy to Subversion's
multiple working copies (i.e. check out this location from the repository to
this location on disk).

~~~
acqq
Yes, you're right. The thing is, the project produces different binaries which
should be accessible for all the different versions. That was solved by having
only different base directories; all the configuration files build to the same
subdirectories no matter which version. If I switched to "always having only
one version in a single base directory" then I'd have to maintain different
temporary output and binary output names in all the configuration files in
every version which is quite ugly. Then I can't use any "known fixed
subdirectory names" in the projects.

------
anandabits
This is very nice as a review of Git, but probably best in that context rather
than an initial presentation of the concepts. I really enjoyed the Source
Control Made Easy series by Jim Weirich. It presents the same information in
an easily digestible, step-by-step approach. Highly recommended for those
trying to understand how Git works and how to best make use of it.

[http://pragprog.com/screencasts/v-jwsceasy/source-control-ma...](http://pragprog.com/screencasts/v-jwsceasy/source-control-made-easy)

------
wiremine
Git's learning curve feels similar to the learning curve of a programming
language like Python.

Once you understand you're not going to pick it all up in an afternoon (just
like a language) and that there will be lots more to learn down the road (like
a language), git feels great.

~~~
guard-of-terra
Programming language is 80% of my work effort but version control is more like
5%. It should be a _utility_; I don't want it to be a world in its own right.
I don't need an insanely powerful version control system because my needs are
sane and limited - and that's where git is not so cool.

~~~
Perseids
You can use the same argument about debuggers, dependency managers, editors,
etc.

If you are happy with whatever tool you are using, why change?

~~~
guard-of-terra
Version control is social and there you can see a few maniacs ruining it for
the rest of the team.

Debuggers aren't much harder than pour and drink. Dependency managers are a pain
in the ass (an unsolved problem in CS) but you don't wrestle with them every day.
I don't use very many features of my Eclipse and I don't use terribly many
commands in vim. I also use arrows, I kid you not.

------
Sir_Cmpwn
It's the stuff behind the curtain that makes me love git. It doesn't just seem
like version control software - it seems like the software is an interface to
a much more powerful version control _engine_. Git just makes _sense_ under
the covers.

------
delinka
See also Git for Computer Scientists at [http://eagain.net/articles/git-for-computer-scientists/](http://eagain.net/articles/git-for-computer-scientists/)

------
gbog
Nice. I'd love to see more details about the index, the way to see differences
between branches with log and diff, and I think stash should be mentioned.

------
mrcactu5

    git init
    git add .
    git commit -m ":-)"
    git push origin master
    

6 months later I still look it up... this tutorial is for me!

------
jheriko
a lot of this is generally (d)vcs and applies equally to mercurial or even
svn...

also if you think x pages of anything is a simple explanation then you missed
a trick or two.

e.g. if you have to explain why your arrows are pointing backwards you are
doing it wrong, instead of using the standard notation for graphs and lists
and stuff which are not generally well known, use what most people will
understand on inspection.

------
obilgic
Can someone please convert this into a nice pdf? Chances are I will do this
type of reading when I am offline...

------
ezrasuki
Yeah right.

------
Jugurtha
I prefer the Fox News tutorial on the subject.

'Repo' means 'reciprocity' or 'reposotory' if you didn't know.

~~~
grapeot
[http://static4.businessinsider.com/image/52330e4deab8eaef7ac...](http://static4.businessinsider.com/image/52330e4deab8eaef7ac8a35b-960/fox-news-interview-github.jpg)

I also wanted to post this screenshot and then saw your comments...

~~~
MBCook
I'm... slightly afraid. I can't wait to send this to my coworkers.

------
btbuildem
The '90s called; they want their bitmaps back.

