
The Architecture of Git (2012) - wheresvic1
http://aosabook.org/en/git.html
======
nine_k
The best (known to me) informal intro to the architecture of Git is The Git
Parable: [http://tom.preston-werner.com/2009/05/19/the-git-
parable.htm...](http://tom.preston-werner.com/2009/05/19/the-git-parable.html)

~~~
svat
Also greatly worth reading: “Git from the Bottom Up”:

* HTML version: [https://jwiegley.github.io/git-from-the-bottom-up/](https://jwiegley.github.io/git-from-the-bottom-up/)

* PDF version: [http://ftp.newartisans.com/pub/git.from.bottom.up.pdf](http://ftp.newartisans.com/pub/git.from.bottom.up.pdf) (via [http://newartisans.com/2008/04/git-from-the-bottom-up/](http://newartisans.com/2008/04/git-from-the-bottom-up/))

The great thing is that after reading and understanding these, one's mental
model matches the reality of the Git program, so one can both try bolder
things, and get unstuck from any mess.

~~~
freshhawk
I recommend this constantly, it is excellent.

Not to outright beginners, but anyone past that point should read this. It does
a great job of clearly presenting an accurate mental model that helps you use
git. If you are already a git expert, it is still a good thing to learn from,
both in general and as a model of how to explain and teach git usage.

It is the only documentation I know of that can turn people from cargo cult git
users into people who just do the version control things they need done with the
parts of git they need. That is damn useful.

------
vram22
I had come across this gem about understanding the vi editing model a while
ago:

"Your problem with Vim is that you don't grok vi."

It's the top answer on this StackOverflow question:

What is your most productive shortcut with Vim?

[https://stackoverflow.com/questions/1218390/what-is-your-
mos...](https://stackoverflow.com/questions/1218390/what-is-your-most-
productive-shortcut-with-vim)

Wonder if there is any such post about Git that cuts to the chase, even for a
part [1] of the Git model, and explains it clearly [2].

[1] I had also come across a StackOverflow post that explains a part of git
very clearly, like the vi example I quoted. I think it was about how to roll
back accidental changes using "git reset --hard" and variants. Saved it, but
don't have it handy right now.

[2] Note: I said "clearly", not necessarily "simply". I like to quote the
(probably out-of-print) book by Abbe Dimnet [3] called The Art of Thinking, in
which he said something like this (while deploring the trend of books that try
to make things artificially simple, a.k.a. dumbed down):

"French grammar cannot be made simple. It can be made clear."

[3]
[https://en.wikipedia.org/wiki/Ernest_Dimnet](https://en.wikipedia.org/wiki/Ernest_Dimnet)

He also wrote a book on that same topic (French grammar made clear). I Googled
the former book recently and came across the second one. I've read the first,
long ago, which I found in a second-hand bookshop. Good book. Apparently it
was a best-seller at the time it came out, according to Wikipedia.

Quotes by him:

[https://en.wikiquote.org/wiki/Ernest_Dimnet](https://en.wikiquote.org/wiki/Ernest_Dimnet)

------
sfescape
I'm surprised that it's not mentioned in the article that one of the most
interesting architectural aspects of git is that it's a blockchain system.

~~~
erwan
That's arguably _not_ the most interesting architectural aspect of Git.

Or does any back-linked tree data structure become interesting if the nodes
keep a hash of their parent instead of a raw reference? I don't think that's
the case.
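
To make "a hash of their parent instead of a raw reference" concrete, here is a minimal sketch. The `commit_id` helper and its payload layout are made up for illustration and are not Git's actual commit object format; the only point is that each node's id depends on its parent's id.

```python
import hashlib


def commit_id(parent_id: str, message: str) -> str:
    """Toy node id: hash of the parent's id plus this node's own content."""
    payload = f"parent {parent_id}\n\n{message}".encode()
    return hashlib.sha1(payload).hexdigest()


root = commit_id("", "initial commit")
child = commit_id(root, "add README")

# Editing the root changes its id, so a child with the very same
# message ends up with a different id as well.
tampered_root = commit_id("", "initial commit (edited)")
tampered_child = commit_id(tampered_root, "add README")

print(child != tampered_child)
```

Nothing here involves consensus or proof-of-work; it is only the back-linked, hash-addressed structure being discussed.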

It might be a bit heretical but I don't think Git has a super interesting
internal architecture. I'm not downplaying the fact that Git was very
innovative, especially considering the landscape of SCMs at the time. The tool
as a whole is great and has desirable properties but its internals don't
strike me as particularly innovative. It's a clever composition of solutions
to well established problem domains. And in that aspect it is a beautiful
engineering solution although there is room for a lot of improvement in terms
of UX.

And in addition to that, I would argue that it would be a very weak definition
of "blockchain". The innovation in Bitcoin is the incorporation of proof-of-
work and resulting alignment of incentives such that it can achieve
probabilistic consensus in an adverse setting and with some degree of
asynchronicity.

The underlying structure of the data is an obvious choice because it is simple
and "captures" the idea of aggregate global state, but it's also hardly an
important innovation. UTXOs are more significant.

And also, recall that the textbook example for state-machine replication is
always an append-only log. So that's not the crux of it, or of
blockchain, in my humble opinion.

~~~
sfescape
I said one of the most. I didn't declare it the most...

------
devy
Very informative. This explains well why Linus once said that signing every
single Git commit is unnecessary and it only means you have automated the
signing process. [1]

[1]:
[https://news.ycombinator.com/item?id=12290873](https://news.ycombinator.com/item?id=12290873)

------
50
Not sure if it’s of any relevance but I recently discovered
[https://gitup.co](https://gitup.co) and must say, it’s pretty cool.

~~~
dilap
Frickin' _amazing_ UI design, and some great pieces of functionality (edit the
commit graph directly; ability to undo most operations).

Unfortunately development seems mostly paused, and it still has some big
gaps...

Still tho by far my favorite git client. (#2 would be Sublime Merge.)

------
royalghost
I still recommend that people watch this video from the man himself, Linus
Torvalds, if they really want to understand the architecture and data
structure behind git -
[https://www.youtube.com/watch?v=4XpnKHJAok8](https://www.youtube.com/watch?v=4XpnKHJAok8)

If I remember correctly, he mentioned that he wrote the MVP in 4 weeks and the
data structures he used are quite simple. Never got a chance to look at the
source code but I guess it is on GitHub (at least the mirror copy).

~~~
nickthemagicman
It only took 4 weeks plus 30 years of programming experience and a genius
mind.

------
dabei
The whole website is a treasure trove for learning about system design.

------
macspoofing
"The Architecture of Open Source Applications" is great. I highly recommend
their other articles, as they are all interesting and informative. If you can,
throw a few bucks their way by buying the PDF or paperback versions.

------
xrd
I still prefer my book's history of Git in which I compare Linus's rejection of
Monotone to the movie "Back to the Future":

[https://github.com/xrd/BuildingToolsWithGithubBook/blob/ca7f...](https://github.com/xrd/BuildingToolsWithGithubBook/blob/ca7f2c0b24496019a56fb1a1e3d81c752d601082/old/chapter-
bootstrapping-git.asciidoc#monotone)

~~~
agbell
Interesting, what am I reading?

~~~
xrd
This (a book about building tools with Git and the GitHub API) was published
by O'Reilly in 2016, and we just released it under creative commons. You can
read it completely free at
[https://buildingtoolswithgithub.teddyhyde.io](https://buildingtoolswithgithub.teddyhyde.io)
or get the repo of the book contents above.

------
adamkl
I've found this online course by Paolo Perrotta to be the best introduction to
Git's architecture, by far: [https://www.pluralsight.com/courses/how-git-
works](https://www.pluralsight.com/courses/how-git-works)

He takes you pretty much from first principles all the way up to how remote
repositories are tracked. It's been far more useful to me than simply trying to
learn the CLI.

I realize it's behind a paywall, but I'd recommend signing up for the free
trial just to watch this course (if you're curious about the inner workings of
Git).

------
frou_dh
Git is the VCS most suitable for bottom-up learners. Which is the VCS most
suitable for top-down learners?

~~~
crispyambulance

        > Which is the VCS most suitable for ....
    

Git.

It doesn't matter what your question is. :-)

I mean, come on, it's not like you have a choice. You gotta use whatever your
teammates/coworkers/organization is using.

That said, yeah, there are a lot of pedagogic problems with git stemming from
the INSANE inconsistency of the command line. The only redeeming quality? It works
and it's popular.

~~~
pfranz
I really hope Git isn't the end of the road for VCS. I work in visual effects
and video games. Most video game studios still begrudgingly use Perforce for
project data. Many also have a separate Git server for code. Perforce or SVN
is still the go-to solution for binary assets. Trying to explain Git to
programmers is difficult enough; for artists who have never used the command
line it is unreasonable. Every time I've seen an "asset library" it's usually
written from scratch instead of built on another VCS. I've made some crude
attempts at building on Git and the data structure just isn't appropriate.

I know it's a bit unreasonable to expect the same tool for everything (but
that's what you were implying). The binary problem is more manageable for code
with a few UI assets (like icons), but isn't great.

~~~
wereHamster
I work in web development and Git has usually been good enough for our
projects. But recently we have had a few projects that used large files and
started using git-lfs in those. Is git-lfs not good enough for storing large
assets in a git repository?

Speaking of non-coders using a VCS, our designers use Sketch (a macOS app) and
until recently have used Dropbox for sharing and put the date into the file
name as a form of version control. But they have now started using the
Abstract app ([https://www.goabstract.com/](https://www.goabstract.com/))
which is a fancy UI around git (I think, not sure though), but none of that
VCS complexity is leaking through. And they seem to like it. So maybe all it
takes is a custom GUI that's tailored for a narrow and specific use case.

~~~
pfranz
On those web projects, is Git used just for versioning the delivered assets
(psd, jpg, and gifs) or is it used for the working files (Illustrator,
Photoshop, etc)? I've used it for the former, which is why I said it was more
manageable, but not a great solution. If you were to treat it like we do code,
you'd only commit the working files and have a build script to generate the
deliverables.

Those are tooling problems, but I also think there are architectural problems.
I don't have a lot of experience, but I have looked at git-lfs. You need a
separate repo and also a separate path for data, right? It's also an add-on.
It's all working around Git itself. For artists, what's the value add over
Perforce or SVN? I can see that maybe you could use the same tools as the
coders, but you have a bunch of new problems. I'm not saying it doesn't have
its place. I can see myself using it in the future, it just doesn't look like
an out of the box solution.

A few years ago I was toying with something that would be more like how a lot
of backup systems work (and I think macOS attempted something like this a few
years ago and abandoned it). Each time you save it will make an auto-commit,
if your app has integration, it takes a screen shot and stores any metadata
from the scene (this was targeting a 3d application, but with the goal of also
working with the filesystem directly). You could also make an explicit commit
and give a commit message. Auto-commits would get flattened into hourly,
daily, weekly. Explicit commits would stay around as long as you'd
want. Git was a poor backend for this because I don't think you can merge
commits in the background, and doing so also rewrites history. This was focusing on
version control for an individual, ignoring collaboration and merging.
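
The flattening policy described above can be sketched roughly as follows. This is only an illustration of the idea, independent of any VCS backend: the `thin` function, the `(timestamp, is_explicit)` tuples, and the 1-day and 7-day thresholds are all assumptions, not something from the original tool.

```python
from datetime import datetime, timedelta


def thin(commits, now):
    """Return the commits to keep from a list of (timestamp, is_explicit).

    Explicit commits are always kept. Auto-commits are reduced to the
    newest one per hour (last day), per day (last week), or per ISO week.
    """
    def bucket(ts):
        age = now - ts
        if age < timedelta(days=1):
            return ("hour", ts.date(), ts.hour)
        if age < timedelta(days=7):
            return ("day", ts.date())
        return ("week",) + tuple(ts.isocalendar()[:2])

    explicit_commits = []
    newest_auto = {}
    for ts, is_explicit in sorted(commits):
        if is_explicit:
            explicit_commits.append((ts, is_explicit))
        else:
            # Later auto-commits overwrite earlier ones in the same bucket.
            newest_auto[bucket(ts)] = (ts, is_explicit)
    return sorted(explicit_commits + list(newest_auto.values()))


now = datetime(2024, 1, 31, 12, 0)
history = [
    (now - timedelta(minutes=50), False),        # same hour as the next one
    (now - timedelta(minutes=40), False),
    (now - timedelta(days=3, hours=2), True),    # explicit, always kept
    (now - timedelta(days=3, hours=1), False),   # same day as the next one
    (now - timedelta(days=3), False),
]
kept = thin(history, now)
print(len(kept))
```

Note that dropping the intermediate auto-commits is exactly the history rewriting mentioned above, which is why a backend built around immutable, hash-linked commits is an awkward fit.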

The whole reason Git was written is because the data-structures facilitate the
needs of the Linux kernel's programmers more cleanly (“Bad programmers worry
about the code. Good programmers worry about data structures and their
relationships.”) I feel like a lot of these Git tools for artists are coercing
Git's data structures into an awkward workflow. That's the biggest reason why
I hope it's not the end of the road for VCS.

Thanks for introducing me to Abstract. I know I've seen similar attempts in
the past. I hope it's useful to artists and I hope Abstract is a successful
business, but I'm a bit bummed that Open Source is relegated to a sub-group of
programming.

------
dmoreno
I loved the AOSA books, but it's been some time since they published a
new one.

Does anybody know if there is something new in the works?

And of any other similar books?

~~~
emmanueloga_
Have you already read all of them? I was calculating how long it would take me
to read and understand every chapter of every book. I usually like to write
small snippets of code or look for references and stuff for books like these.

Anyway, if I read, say, one chapter every Monday, it would take me about two
years to complete the books!

Probably not a problem that there's "nothing new" since the books seem to be
more about timeless design principles and less about novelty.

------
chmaynard
The fourth paragraph in section 6.2 refers to "BitMover" with no previous
context. Copy-paste error?

~~~
sitzkrieg
there is context in the intro (see bitkeeper)

~~~
chmaynard
I found nothing about BitMover here:

[http://aosabook.org/en/intro2.html](http://aosabook.org/en/intro2.html)

~~~
sitzkrieg
sorry, I meant in the git background section 6.2, it briefly mentions BitMover
as the developer of BitKeeper, but that's all you need to know about it

------
ArchTypical
> To understand Git's design philosophy better it is helpful to understand the
> circumstances in which the Git project was started in the Linux Kernel
> Community.

I don't understand this sentiment. It's not helpful to know the history at
all. At best, it romanticizes the choices made. Stating the goals would be an
intro that shows some level of analysis.

~~~
gmueckl
No, the author is right. Git is at its core a database for managing patches.
Understanding the needs of Linus Torvalds in his role as kernel maintainer is
about the only good way to understand why git is so strangely designed.

~~~
avar
Darcs is a database for managing patches. Nothing in Git inherently cares
about patches. To a first approximation it's a database for managing full
snapshots of trees of files.
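
A rough sketch of the snapshot idea: Git names a file's content by hashing a `blob <size>\0` header followed by the raw bytes, so identical content always gets the same id. The `snapshot` helper below is a toy stand-in for a tree (a plain path-to-id mapping, not Git's real tree object format).

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """SHA-1 over Git's blob header plus the raw file content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def snapshot(files: dict) -> dict:
    """Toy snapshot: every path maps to the id of its full content."""
    return {path: git_blob_id(data) for path, data in sorted(files.items())}


v1 = snapshot({"README": b"hello world\n", "notes.txt": b""})
v2 = snapshot({"README": b"hello world\n", "notes.txt": b"todo\n"})

# The empty blob has a well-known id in every Git repository.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Unchanged files keep the same id across snapshots; no patch is recorded.
print(v1["README"] == v2["README"], v1["notes.txt"] == v2["notes.txt"])
```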

~~~
gmueckl
Git only needs lists of files because it needs entry points into its lists of
patch fragments that make up the file and to assign file names to them. Other
than that, a changeset is just another name for a patch that can be added,
altered, rewritten or removed. That makes git a patch database in my book.

Darcs feels more like a research project to me. The developers try to find a
theoretical foundation on which they can base a VCS, but they have not managed
to make their theory work with the level of perfection that they want. But if
they eventually get it right, it will probably have the provably best text-
based merge tool possible.

~~~
avar
This is not how Git's data model works. You may be thinking of delta
compression, which "git gc" applies across content in the repository
_purely_ as an optimization step.

But that's purely an optimization that has nothing to do with the intrinsic
data model. There's no point at which the patch output you see with "git
diff/show" is actually stored as-is in Git. It's computed on-the-fly.
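
As a small illustration of "computed on-the-fly": if only full snapshots are stored, a patch can be derived from any two of them whenever it is asked for. The `snapshots` dict and `show_diff` helper are hypothetical, and Python's `difflib` stands in for Git's own diff machinery.

```python
import difflib

# Only full contents are stored; no patch is kept anywhere.
snapshots = {
    "v1": "line one\nline two\n",
    "v2": "line one\nline 2\nline three\n",
}


def show_diff(old: str, new: str) -> str:
    """Derive a unified diff between two stored snapshots on demand."""
    a = snapshots[old].splitlines(keepends=True)
    b = snapshots[new].splitlines(keepends=True)
    return "".join(difflib.unified_diff(a, b, fromfile=old, tofile=new))


patch = show_diff("v1", "v2")
print(patch)
```

Because the diff is derived rather than stored, a better diff algorithm or a different pair of snapshots can be used later without touching the history.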

This separates Git from many other SCMs where patches or other deltas are
permanently stored at the time of commit in a way that can't be modified
afterwards.

The distinction matters because those systems generally have storage that
doesn't compress as well, since they need to compute and store a diff at the
time, whereas a system like Git can keep finding better delta candidates as
history progresses.

This goes all the way back to the likes of RCS. The Subversion FSFS backend
also works like this, and I believe Mercurial to some extent, and certainly
Darcs since storing a history of patches is what it's for.

