
Git from the inside out - Tomte
https://codewords.recurse.com/issues/two/git-from-the-inside-out
======
edejong
Certain systems can best be understood as black boxes. You put some commands
in and magic happens. Git was not designed to be such a system and early users
of git know this.

During the last 5 years, many GUIs have filled in this gap, making it
increasingly likely to find people completely stuck because they miss
knowledge of the foundations.

Git is a utility to manage an append-only repository of tree-objects, blobs
and commits. To help humans, git adds

- human-readable pointers (branches, HEAD, stash)

- a method to incrementally add changes (staging/index/working area)

- a method to append tree-objects, blobs and commits from one repository to
another

- some commands which alleviate steps in common tasks

This last set of commands causes pain, as users without foundational
knowledge do not realize these commands compound many small steps.
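
To make the "human-readable pointers" item concrete: in a typical repository a branch is a small text file holding a commit id, and HEAD is usually a text file naming a branch. The sketch below follows that chain. The function name `resolve_ref` is made up for illustration, and it assumes loose ref files only; it ignores `packed-refs` and other storage details that real git handles.

```python
from pathlib import Path


def resolve_ref(git_dir, name="HEAD"):
    """Follow 'ref: ...' indirections until a commit id is found.

    Assumption: refs are stored as loose files under git_dir.
    Refs that live only in packed-refs are not handled here.
    """
    text = (Path(git_dir) / name).read_text().strip()
    while text.startswith("ref: "):
        # A symbolic ref such as HEAD just names another ref file.
        name = text[len("ref: "):]
        text = (Path(git_dir) / name).read_text().strip()
    return text
```

Calling `resolve_ref(".git")` in a repository on a branch with loose refs should return the same id that `git rev-parse HEAD` prints; moving a branch amounts to rewriting that one small file.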

~~~
annnnd
> During the last 5 years, many GUIs have filled in this gap, making it
> increasingly likely to find people completely stuck because they miss
> knowledge of the foundations.

Arguably, these GUIs became popular because Git by itself offers awful UX. I
am all for obscure commands and switches (I use Linux and prefer CLI to
mouse), but Git really took it to the next level. And it's not as if it
couldn't be done better (as Mercurial shows).

As a user I don't want to know the intrinsic details about how some system is
implemented. I just don't care - give me an external model which helps me use
it, and leave it at that. I have my own code I need to worry about.

If most users get the "wrong" mental model when using Git, then the problem
lies with Git, not with users. </rant>

~~~
derefr
A large part of the point of Git (in contrast to earlier tools like CVS) is
that the internal data model that it enforces is one that _works_ in the long
term in distributed project settings.

Distributed source control requires a kind of _hygiene_, much like dental
hygiene, or like cryptographic security. In the practice of hygiene, what's
_convenient_ is usually opposed to what's _sustainable_. Hygiene is
effectively the set of discoveries of _un-natural_ or _non-intuitive_ ways to
do things that _work better_ than the equivalent intuitive/natural practices.
Thus, any process that is "hygienic" is going to cause at least a little bit of
pain or annoyance to follow; if it didn't, it'd be "natural."

And _that's_ the fundamental problem with Git: that it tries to paper over
the fact that it's "hygienic" by presenting itself as something seemingly
"natural." People have the wrong mental model of Git because Git tries to meet
them in the middle, by translating its (correct, sustainable) internal model
into (broken, leaky) familiar abstractions.

Git isn't your friend. Git is your toothbrush. Imagine your dental hygienist
telling you they're disappointed in how infrequently and badly you're brushing
your teeth. The right response to that isn't _blaming the hygienist for making
toothbrushes annoying to use_, right? It's _shame_, shame that you haven't
bothered to overcome the stupid stubborn ignorance getting in the way of you
taking better care of your teeth. Shame that you can't "meet your toothbrush
where it lives."

Meet Git where it lives.

~~~
ubernostrum
If git exposed a consistent, understandable interface to its data model you'd
have a point.

Mercurial exposes a much more consistent, much more understandable interface
to a very-similar-to-git data model, so it's not like this is impossible.

So it'd be really nice if people could be allowed to point out the
shortcomings of git's command-line interface without other people implying
their complaints must be rooted in being too dumb and/or too lazy to
understand git.

~~~
jeremy_wiebe
Having never used Mercurial, I'm curious: what are some of the places where
Mercurial provides a better interface than git?

~~~
marcinkuzminski
Just read this:
[http://stevelosh.com/blog/2013/04/git-koans/](http://stevelosh.com/blog/2013/04/git-koans/)

And then consider that Mercurial simply handles branches/updates/resets in a
very clean and consistent way.

~~~
amdavidson
Clearly not a new post, but this is the first time I've seen it, and it
clearly spells out what has frustrated me (and many others) for a very long
time.

------
AceJohnny2
See also "Git from the Bottom Up":
[https://jwiegley.github.io/git-from-the-bottom-up/](https://jwiegley.github.io/git-from-the-bottom-up/)

I read lots of tutorials on Git when I started with it just a few years ago,
and that's the one that best helped me grok it.

~~~
josteink
Am I the only one who read the "official" Git ebook and found that perfectly
understandable?

[https://git-scm.com/book/en/v2](https://git-scm.com/book/en/v2)

~~~
AceJohnny2
Yes, the book is excellent and also recommended :) But for me, having an
understanding of git's internals really helped provide a foundation for the
rest. Otherwise a lot of the stuff (like rebases, or amending commits...) was
just black magic.

------
no_protocol
I like the writing style and the scope of the piece. Well done.

I kind of wish there were at least mentions of git plumbing commands where
appropriate, to shake off one more level (half a level?) of magic. For
example, just link to some information on `git hash-object` in the section on
`git add`. Footnotes would probably be enough. No need to bog down the
relatively quick pacing. Sometimes it can be hard to discover which plumbing
commands correspond to the actions mentioned.
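
As a sketch of what `git hash-object` computes for a file: the object id of a blob is the SHA-1 of a short header followed by the file's bytes. The helper names below are made up for illustration, and the sketch assumes a default SHA-1 repository with no line-ending conversion or clean filters applied.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the id git assigns to a blob with this content.

    Assumption: SHA-1 object format, content hashed exactly as given.
    """
    # The header is the object type, a space, the size in bytes, then NUL.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Path of a loose object relative to the .git directory."""
    # The first two hex digits name the directory, the rest the file.
    return "objects/" + object_id[:2] + "/" + object_id[2:]
```

For an empty file, `git_blob_id(b"")` gives `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, which is the id `git hash-object` reports for empty content. Roughly speaking, `git add` writes the zlib-compressed object to the path above and then records the id in the index.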

Most git tutorials come with diagrams of blobs and trees and branches with all
the arrows and color coding. They get the meaning across but often seem to
come with a bit of a mental disconnect from what is actually happening in the
working directory and .git directory. Does anyone know of a tool to display
that kind of diagram in real time while you are making commits or checking out
new branches? It could bring an extra level of interactivity to the
presentation. Imagine if the graphs on this page were updating live while you
had to type the git commands to get them to update AND you could monitor the
filesystem at the same time, showing exactly which files were changed by the
command.

~~~
qznc
Github built one: [https://try.github.io](https://try.github.io)

More:
[http://sixrevisions.com/git/interactive-git-tutorials/](http://sixrevisions.com/git/interactive-git-tutorials/)

------
davewhat
[http://learngitbranching.js.org/?NODEMO](http://learngitbranching.js.org/?NODEMO)

I haven't seen a git article link to this amazing website recently. By far one
of the best ways to teach someone git is to walk them through it by executing
commands and letting them see the visual representation of those commands.

There is also an amazing single-player learning mode.

------
avip
OT (or not) -

Of the 20 most popular Qs on SO, 7 now ask how to do trivial operations in git.

[http://data.stackexchange.com/stackoverflow/query/36656/most...](http://data.stackexchange.com/stackoverflow/query/36656/most-upvoted-answers)

Take it or leave it, git has a fact-based, proven track record of UI wtfness.

~~~
qznc
The two top questions are "How to modify existing, unpushed commits?" and "How
to undo last commit(s) in Git?". That could be interpreted as evidence in
favor of a staging area. People actually need to change committed things very
often.

On the other hand, in 20th place is "How to undo 'git add' before commit?",
which displays a UI shortcoming with staging.

~~~
avip
It suggests staging is confusing for newcomers (or maybe - for the mass in
general).

~~~
crooked-v
A simple way to make it clearer: have 'git stage <file>' and 'git unstage
<file>' commands, instead of the extraordinarily unintuitive 'git reset HEAD
<file>' for the latter.

------
MichaelBurge
To understand git by itself, I recommend reading the README on the initial commit:

[https://github.com/git/git/commit/e83c5163316f89bfbde7d9ab23...](https://github.com/git/git/commit/e83c5163316f89bfbde7d9ab23ca2e25604af290)

It was only 1000 lines of C at the time, so it couldn't have needed a million
different articles and blogs to explain.

That doesn't give you a workflow or explain any of the more advanced features.
The workflow you can get from any cookbook list of shell commands, and the
advanced features you can get from the manual.

~~~
glandium
Direct link to the README:
[https://github.com/git/git/blob/e83c5163316f89bfbde7d9ab23ca...](https://github.com/git/git/blob/e83c5163316f89bfbde7d9ab23ca2e25604af290/README)

It's interesting to note that what is currently the index/staging area was,
back then, only a cache (which is also why you'll still find it called any of
these three names, adding to the overall confusion).

------
martijn_himself
Off-topic: this must be one of the most beautifully designed sites I've come
across lately.

I wish more sites were an oasis of calm like this.

------
benhoyt
Very good read. I was just hacking around with a tiny Python program that
implemented enough to init, add, commit, and push itself to GitHub. It's all
very simple until you get to the index format ... which isn't that bad, but
it's definitely more complicated than this article makes out.

She refers to .git/index as a text file where "each line of the file maps a
tracked file to the hash of its content". However, .git/index is actually a
binary file where each entry is a bunch of different fields like creation time
and modify time and SHA-1 encoded in binary. See
[https://github.com/git/git/blob/master/Documentation/technic...](https://github.com/git/git/blob/master/Documentation/technical/index-format.txt)

So I wasn't sure whether this part of the article was simply wrong, or whether
git index format "version 1" was text, or something else?
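
To illustrate the binary layout: the index file begins with a 12-byte header made of the signature `DIRC`, a 4-byte big-endian version number, and a 4-byte big-endian entry count. A minimal sketch that reads only that header follows; the function name is made up for illustration, and the per-entry fields and extensions after the header are not parsed.

```python
import struct


def parse_index_header(data: bytes):
    """Return (version, entry_count) from the start of a git index file."""
    if len(data) < 12:
        raise ValueError("too short to contain an index header")
    # ">4sII" = big-endian: 4-byte signature, two unsigned 32-bit integers.
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a git index file")
    return version, entry_count
```

Reading a real repository's `.git/index` in binary mode and passing the bytes to this function should give the format version and the number of tracked entries.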

~~~
benhoyt
For future reference: I asked Mary (the author) about this and she said,
"Great point! As far as structure of the data, the file may as well be a text
file of lines of information. I decided to omit talk about how git actually
stores data to focus on the structural concepts. Maybe I should have included
a bullet point. I don't actually control the version of the article that was
posted on HN, but I'll certainly update the version on my website."

------
Meic
See also
[http://eagain.net/articles/git-for-computer-scientists/](http://eagain.net/articles/git-for-computer-scientists/)

------
noufalibrahim
I think this is a good way to teach git.

I approach the whole thing similarly during my trainings and wrote a few dirty
scripts to generate an image of what the repository looks like using graphviz
[https://gist.github.com/nibrahim/6119925](https://gist.github.com/nibrahim/6119925)

~~~
zwischenzug
I did similar with mermaid:

[https://github.com/ianmiell/learn-git-the-hard-way](https://github.com/ianmiell/learn-git-the-hard-way)

[https://github.com/ianmiell/learn-git-the-hard-way/blob/mast...](https://github.com/ianmiell/learn-git-the-hard-way/blob/master/learngitthehardway.pdf)

------
eropple
This is how I teach git when I do training sessions for companies. I really
dig this approach; the porcelain on top of Git is not really sufficiently
abstracted to avoid knowing this stuff, but at the same time it tries to hide
just enough of it to end up biting you when (not if) something goes sideways.

------
cakeface
This reminds me of The Git Parable which helped me understand git when I was
first getting started. I really do believe that you have to understand how git
actually works under the UI in order to have long term success using it.
Whether it's bad or good to need this understanding can be debated but I
believe that it is truth.

[http://tom.preston-werner.com/2009/05/19/the-git-parable.htm...](http://tom.preston-werner.com/2009/05/19/the-git-parable.html)

------
GoToRO
The thing that is missing from all tutorials is that branches, tags and
everything else are just pointers to a node in the tree. I.e. branches have no
"content".

------
preordained
Good stuff. Can't help but agree with others that I don't really want to be
forced to be intimate with Git at a gory-insides level, though. I use
TortoiseGit, and perhaps it's removed the temptation to experiment with things
that could blow my foot off, but 99% of the time I have no reason to drop to
the command line, nor do I want to.

------
scandox
There's also this from the same author:
[https://maryrosecook.com/blog/post/git-in-six-hundred-words](https://maryrosecook.com/blog/post/git-in-six-hundred-words)

She's a really clear, informative writer.

------
yread
This is all simple stuff. It's when, after a rebase, there are conflicts in
files where there shouldn't be any and the CI says "failed" that my knees
weaken and I yearn for a guide that would explain everything.

------
partycoder
Explaining version control might be a bit challenging. Distributed version
control is a bit more challenging. So teaching git can be a lot to take for a
newcomer.

I went from Subversion to git. In retrospect, Subversion was much simpler
conceptually (but problems like syncing branches were harder).

I once found myself explaining a git concept based on the plot of Back to the
Future II. I think it was a perfect example of how to resolve some merge
problem.

There are some "git cheatsheets" that provide a very straightforward graphical
explanation of what some commands do. That helped me to consolidate some
concepts.

------
disposablezero
Anyone really interested in git internals should look at git-draw.

[https://github.com/sensorflo/git-draw/wiki](https://github.com/sensorflo/git-draw/wiki)

------
bobthedino
Worth linking to the excellent YouTube version of this too:
[https://www.youtube.com/watch?v=fCtZWGhQBvo](https://www.youtube.com/watch?v=fCtZWGhQBvo)

------
RobinL
This is really excellent. I have now read it three times, running through the
commands as I go, and for the first time I feel like I actually understand
git.

Here are some commands I found useful as I went through the tutorial:

To show a given commit without diffs but with tree and parent sha1s: `git
cat-file -p sha-1`

To show a given commit including diffs: `git show sha-1`

To show a given tree (blobs and subtrees): `git ls-tree sha-1`

To show the tree recursively (i.e. show the sha1 of all files, ignoring
folders): `git ls-tree -r sha-1`
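
For a commit, `git cat-file -p` prints header lines (`tree`, zero or more `parent` lines, `author`, `committer`), a blank line, and then the message. A minimal sketch that pulls out the tree and parent ids from that text is below. The function name is made up for illustration, and the sketch deliberately ignores the author and committer lines and multi-line headers such as signatures.

```python
def parse_commit(text):
    """Split the text of a commit object into tree, parents, and message."""
    # Headers and message are separated by the first blank line.
    head, _, message = text.partition("\n\n")
    commit = {"tree": None, "parents": [], "message": message}
    for line in head.splitlines():
        key, _, value = line.partition(" ")
        if key == "tree":
            commit["tree"] = value
        elif key == "parent":
            # Merge commits carry more than one parent line.
            commit["parents"].append(value)
    return commit
```

Feeding it the output of `git cat-file -p` for a merge commit should yield two entries in `parents`, which is all that distinguishes a merge from an ordinary commit in the object store.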

------
supersan
This is one of the best ways to learn new technology. I remember that long
ago, when I was in college, I had some trouble understanding some aspects of
web servers, so I decided to write a small web server in Perl, and soon I knew
it inside out. Same for writing my own SMTP client; the knowledge I gained
from it will be with me forever.

~~~
LoSboccacc
True. I never understood git's restriction on empty folders until I had to
build my own filesystem-on-database.

------
caf
This kind of makes me wonder why you have separate objects/ stores for each
repository you have checked out - it seems like you could have just one in
your $HOME that all your git repositories share.

------
golergka
Read and upvoted this previous time it was posted. Reread and upvoted it this
time too.

This is a good example explaining why reposts should be not only allowed, but
encouraged on HN.

------
mnsc
Or just check out the talk with the kindergarten teacher wearing bib pants
who stacks balls and pins... "Git for ages four and up", I think.

------
JustSomeNobody
Author has a lot of good reads on her blog. Also, some of her live coding
demos on YouTube are fun to watch.

------
erikb
This is how you do git. Read it. Learn it. And you will see the light of
version control awesomeness!

------
rplst8
There are a lot of comments here such as "Git has a horrible UI/UX", "I don't
want to have to learn the inner workings of a tool", "Certain systems are
designed so well that they can be understood as black boxes", and a lot of
other complaining about Git and its idiosyncrasies.

I think this needs to be dissected a bit. First, Git operates in a manner
(internally) that is foreign to most users of _other_ SCM systems. Second, Git
has a bit of a "tacked-on" nature to its CLI that can make use cumbersome for
newcomers especially when they have been taught the shortcuts before the
fundamentals.

For the first problem, I think this is where the black-box comments apply. And
honestly, I think treating SCMs as black boxes is what got us into the
situation we were in before Git. Version control, branching, merging, change
management, and change deconfliction are _hard_ problems, IMO. Personally, I
think the base level functions that Git provides, combined with Git workflows
from Atlassian (and others), really help provide a daily routine to handle
these situations. After cloning: branch -> change -> index (or update) ->
commit -> pull -> fix conflicts -> commit -> push -> merge (or pull request)
-> repeat. There are some variations depending on your branching model, but
by and large this is what prevents regressions and forces people doing the
committing to resolve the changes and not to break master.

I think you need to understand the "internals" of any SCM to really be able to
conquer the challenges of distributed version control and the complexities of
modern software development. I've worked in Rational ClearCase shops, and we
needed a ClearCase guru on site too. Every team should have a Git guru.

For the second problem, yes, the CLI is a bit clunky at times. This, combined
with a misunderstanding of Git fundamentals can lead you down some bad paths.
Cleaning up the CLI is an independent problem from Git internals - and I'll
admit some taxonomy/hierarchy/ontology/whatever of commands is probably needed
to refine the day to day workflows. However, if you mess up the repo because
you don't understand the branching and merging model, you are going to have to
use the more "specialized" commands which, let's face it, are going to be a bit
more cryptic. This is the same for any system that has some maintenance or
repair type functionality.

This is why I say that learning how Git works allows you to learn the branching
model better, which will hopefully allow you to avoid those particularly
thorny paths.

Sure, you can choose some other SCM system that seems less cryptic or easier
to use, but you will likely find yourself in a bind someday that you will need
that system's own cryptic commands to get out of. Or, more likely, you will end
up doing a lot of work that Git would have allowed you to do in a fraction of
the time.

------
ryenus
Too bad git clone and fetch are still not resumable.

~~~
zimmund
How would you implement it?

------
kasabali
Mods, can you please add 2015 to the title?

------
andrewvijay
Just when I needed it the most. Thanks a lot homie!

