Understanding Git for real by exploring the .git directory (medium.com/pierreda)
365 points by adamnemecek on Feb 21, 2016 | 87 comments

I worked with someone whose approach was very interesting: he committed the .git directory of a newly initialized repo to a separate, newly initialized repo. And then watched what changed when he added a file, changed things, branched, etc, in the committed .git directory.
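A rough sketch of that experiment, for anyone who wants to try it. The directory names (`project`, `observer`, `dotgit`) and the `snapshot` helper are made up for illustration. One assumption worth stating: git will not track a directory literally named `.git`, so this version copies it under another name before committing it to the second repo.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/project"     # the repo being studied
git init -q "$work/observer"    # a second repo that records snapshots of the first one's .git

snapshot() {
    # Copy .git under a different name, because git refuses to track ".git" itself.
    rm -rf "$work/observer/dotgit"
    cp -R "$work/project/.git" "$work/observer/dotgit"
    git -C "$work/observer" add -A
    git -C "$work/observer" -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"
}

snapshot "after git init"

echo "hello" > "$work/project/hello.txt"
git -C "$work/project" add hello.txt
snapshot "after git add hello.txt"

# Which files inside .git appeared or changed because of the 'git add'?
changed=$(git -C "$work/observer" diff --name-status HEAD~1 HEAD)
echo "$changed"
```

The diff between the two snapshots should list a new `index` file and a new entry under `objects/`, which is the blob for `hello.txt`. Repeating `snapshot` after a commit, a branch, and so on shows the rest.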

It's always seemed worthwhile to me to dig into git's model more, but if you're already comfortable and productive with stuff like detached head/rebasing/basic workflows, it's hard to justify when you're already trying to find time to learn new languages, frameworks and devops tools ...

for about six months my git workflow was this:

git add -A

git commit -am "fixed some stuff"

but I've finally found some time to start digging into how to really use it.

The issue I have with it is that if you step outside the basics it's so easy to get yourself into a thorn bush and the way that git is explained most places is really not intuitive at all.

Try using a tool like 'git cola' that allows you to selectively commit specific lines, instead of just by file.

Start a project and attempt to maintain a clean history. Break up changes logically into separate commits. To the point where 90%+ don't need more than the subject to describe a change and the history can be read like a story.

Use 'gitk --all' to view the tree of changes. Get comfortable with using feature branches, using interactive rebasing to clean up a messy history of changes, etc.

The CLI works great for most things but visual tools make it much easier to write a clean history and reason about the changes over time.

You can also commit specific lines using `git add -p`

Adding on to your helpful comment, you can also get a more concise, gitk-like graph of commits with `git log --graph`. This is my standard alias for viewing history, which also adds colors and tag/branch names:

    lg = log --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %C(magenta)(%cn)%Creset %Cgreen(%cr)%Creset' --abbrev-commit --date=relative --left-right

It's much easier when you can see all of the staged/unstaged files and drill down to staging/unstaging individual lines within those files.

It also highlights whitespace nastiness (ie trailing spaces, missing newline at end of file, inconsistent newline chars, etc).

The CLI is the ideal tool for a lot of things. Preparing commits is not one of them.

You can see all the files and drill down with "git add -i". Git also highlights whitespace problems.

I was excited when I first heard about git add interactive mode but I find the interface to be quite unintuitive and idiosyncratic. I just stick to using "git add -p" to stage hunks selectively in a much more straightforward manner.

Having said that, I stage all my commits on command line, and generally go to a GUI rarely for certain visualization tasks.

I use GitX (forked/updated version) so I can stage/commit line-by-line and see a nice branch/merge visualization: https://rowanj.github.io/gitx/

Ah yes, the "subversion" method of using git.


I also do this. :|

Also the method that gets logging, debuggers and temp files committed by accident.

Or authentication tokens. How many ssh keys or database passwords have been lost like this?

Been there. You stop doing it when you start collaborating with others. It just doesn't work then. ;-)

And I don't think it's an issue if you are using continuous delivery.

This is still my git workflow. Aside from the off times I have to rebase or revert a commit.

I'm curious as to what git commands you've found the most valuable or you've used the most since digging deeper into git.

* Using stash to store stuff when I want to pull a remote in that will overwrite things I'm not ready to commit

* git add -p, git add -i are nicer ways to add files

* git grep

* git reset, revert, and checking out old commits

     - these commands I currently find tough to get right

     - this is mostly because I don't really get the HEAD~2 ^ and what the syntax is for accessing older stuff
* git fetch and merge -i instead of pull. I got burned by using pull a few times.
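On the `HEAD~2` and `^` confusion in the list above, a small throwaway repo makes the difference visible. This is only a sketch: the branch name `side`, the one-letter commit messages and the `subject` helper are invented for the demo. `~N` follows first parents N times, while `^N` picks the Nth parent of a merge commit.

```shell
set -eu
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name demo               # throwaway identity for the demo repo
git config user.email demo@example.com

git commit -q --allow-empty -m "A"
git commit -q --allow-empty -m "B"
git commit -q --allow-empty -m "C"
git checkout -q -b side HEAD~1          # branch off from B
git commit -q --allow-empty -m "D"
git checkout -q -                       # back to the original branch (at C)
git merge -q --no-ff -m "M" side        # M has two parents: C (first) and D (second)

subject() { git log -1 --format=%s "$1"; }
subject HEAD      # M
subject HEAD^     # C, the first parent (same as HEAD~1)
subject HEAD^2    # D, the second parent, i.e. the branch that was merged in
subject HEAD~2    # B, the first parent of the first parent
subject HEAD^^    # B, the same commit spelled differently
```

So on a linear history `HEAD~2` and `HEAD^^` are interchangeable, and `^2` only matters once merges are involved.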

Most of this stuff is stuff that I've known about since I started using git but I was afraid to use it because I didn't really know how it worked and didn't want to "mess up". Since the previous comment describes 90% of what I need to do, there wasn't really any point to doing it any differently.

The biggest problem I have with git that I have yet to solve is that I will be working on something on my laptop and then want to switch to my desktop and pick up where I left off. This leaves the obnoxious necessity to commit for just syncing things instead of for actually finishing a feature. I don't want to do rebasing because I don't want to lose history. This is the main reason I don't think git is currently an ideal solution for me but since I have nothing better I'm stuck with it.

It needs a simple semantic interface and it needs the ability to "sync" in-between commits.


    git checkout -b wip-syncing
    git add -A
    git commit -m "wip means work in progress"
    git push <whatevs> wip-syncing
on your laptop, then

    git fetch --all
    git checkout <whatevs>/wip-syncing -- .
    git push <whatevs> :wip-syncing
on the desktop. Of course, this "rewrites history", but only in a very localized way.

In general, you're going to be fighting against git if you take an absolutist stance against rewriting history. Which is fine! But a little bit of controlled rewriting can open up a lot of options.

Edit: and I'm typing this from memory on my phone so please don't copy and paste the commands without verifying that they work correctly first!

not a bad idea to keep a separate branch for doing that. I might try that out.

> The biggest problem I have with git that I have yet to solve is that I will be working on something on my laptop and then want to switch to my desktop and pick up where I left off.

Could you use something like rsync or unison to sync the working directory (including the .git directory) between your desktop and laptop? I'm new to git myself, but after reading through the OP article I imagine this would work.

Yeah I've thought about rsync. It just seems like a half solution and I'm not really sure how well it would work when I'm off my home network. Sometimes I ssh into the desktop because my laptop is old and run into limitations with front end build tools.

a couple things helped me get into a comfortable flow w/ git: realizing git stash creates a commit (accessible via git reflog show stash). v helpful for managing interrupts, and gaining confidence you're not going to lose any work.

also, learning to be quick to create (and dispose of) branches, as they're just names.

> accessible via git reflog show stash

You can just do "git stash list".

IIRC, standard git clients don't show you the commit sha with 'git stash list', hence the extra few chars (easily skipped w an alias) are worthwhile. shrug
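To make the "stash creates a commit" point concrete, here is a minimal sketch in a scratch repo. The file name and messages are placeholders, and `git stash push` assumes a reasonably recent git (older versions use `git stash save`).

```shell
set -eu
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name demo                 # throwaway identity for the demo repo
git config user.email demo@example.com

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "initial"

echo "v2 (unfinished)" > notes.txt
git stash push -q -m "interrupted work"   # working tree goes back to v1

git cat-file -t stash                     # prints "commit": the stash entry is a commit object
git stash list                            # entry name and message, no hash
git reflog show stash                     # same entry, prefixed with the abbreviated hash
git stash list --format='%h %gd %gs'      # or ask 'stash list' for the hash directly
```

Since the stash is an ordinary commit, the usual tools (`git show`, `git diff`, `git cherry-pick`) work on it as well.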

For some time now, we use the rebase workflow. (create your branch, do some work, rebase on master, push).

It is a great way to have a clean linear history.

But it makes git pull 'illegal' because it does a merge implicitly.

That's typically something I didn't think about the first times I used git.

> It is a great way to have a clean linear history.

Why is this considered by so many people to be a Good Thing? Engineering is an inherently messy human process, and the repository history should reflect what actually happened. To that end, I've been advocating a merge-based workflow instead:

- The fundamental unit of code review is a branch.

- Review feedback is incorporated as additional commits to the branch under review.

- The verb used to commit to the trunk or other release series is 'merge --no-ff'.

Under that model, merges are very common, particularly merges from the trunk to the feature being developed. But that's OK, because it's what actually happened. When most people perform a 'rebase', they are actually performing a merge, while dropping the metadata for that merge.
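For readers who have not used `merge --no-ff`, a minimal sketch of what it records. The branch name `feature` and the commit messages are placeholders, and empty commits stand in for real work.

```shell
set -eu
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name demo               # throwaway identity for the demo repo
git config user.email demo@example.com

git commit -q --allow-empty -m "trunk: initial"
git checkout -q -b feature
git commit -q --allow-empty -m "feature: first pass"
git commit -q --allow-empty -m "feature: address review feedback"
git checkout -q -                       # back to the trunk branch

# A plain 'git merge feature' would fast-forward here and leave no trace of the branch.
git merge -q --no-ff -m "Merge branch 'feature'" feature

git log --graph --format=%s             # the branch and its merge stay visible in the graph
parents=$(git rev-list --parents -n 1 HEAD)
echo "$parents"                         # the merge commit followed by its two parents
```

The review-feedback commit stays in the history as its own step, which is the "what actually happened" record the workflow is after.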

Before reading more about rebasing, I wouldn't have an opinion here, but like most things in programming I think it's a matter of philosophy. Do we want the history to be "record of what actually happened" or "story of how your project was made." [0]

I see merits in both approaches: Rebase seems to be good when you want to focus on the project minus the process, while merging seems to be good when you want to know the process behind the project. For larger projects with multiple contributors, I think the merging approach is better because of the process visibility. For smaller projects with one or two developers, a rebase approach could be "cleaner" when looking through the logs later on.

I'm interested to hear others' opinions on the topic as well.

[0] - https://git-scm.com/book/en/v2/Git-Branching-Rebasing#Rebase...

It is an interesting analysis. I think you're right in saying that it's a matter of philosophy after all.

In my experience, the clean linear history can be important when you build a product which is going to be certified, since the development process is key to obtaining the certification.

Also, I like the fact you can always reorganize your commits before rebasing, making them more atomic / cleaner.

    git pull --rebase
doesn't merge implicitly and

    git config --global pull.rebase true
will set that as the default `pull` behavior.

Note that

    git config --global pull.rebase true
was added in v1.7.9 - if you're using an earlier version of Git for whatever reason, the config you should be setting is

    git config --global branch.autosetuprebase always

Didn't know that. Thanks

I once had a filesystem watcher watching a git repository. It does not give you detailed information about what changed in the files, but shows in real time what files are changing while you do your regular work.
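A crude approximation of that idea with nothing but `find`, for anyone without a watcher installed. This is a polling sketch rather than a real-time watcher: it uses a marker file (a made-up helper here) as a timestamp and lists whatever inside `.git` was written after it. Tools such as `inotifywait` or `fswatch` would give the live view.

```shell
set -eu
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

marker=$(mktemp)        # timestamp reference: anything newer was written after this point
sleep 1                 # keep later mtimes clearly after the marker on coarse filesystems

echo "hello" > hello.txt
git add hello.txt

# Files inside .git that were created or modified by the 'git add'
touched=$(find .git -type f -newer "$marker" | sort)
echo "$touched"
```

Run after a `git add`, the list should contain the index and one new file under `.git/objects`.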

The most enlightening introduction to git internal model (graph of commits) and how the main commands alter it I have read so far: https://jwiegley.github.io/git-from-the-bottom-up/

I think it is slightly more relevant to understand the model than the .git/ structure since the .git/ folder is just an implementation detail.

The point of this is to also understand just the implementation.

Could someone please post a link to a PDF version of this article?

For offline use? But git is a dvcs... ahem, anyhow, wkhtmltopdf[1] is a reasonable choice for converting arbitrary urls to pdfs, and can be installed via apt, homebrew cask etc.

[1] http://wkhtmltopdf.org/

Obligatory link to Charles Duan's most excellent git tutorial:

> you can only really use Git if you understand how Git works. Merely memorizing which commands you should run at what times will work in the short run, but it’s only a matter of time before you get stuck or, worse, break something.


Somehow I do not see this mentioned often even though it is, in my opinion, hands down the best tutorial for git. Not too short, not too long, not too simplifying, not too complicated. Just right.

Is this article available as a PDF document?

The article claims to be aimed at beginners: "There are a lot of posts out there about learning the basic commands of git, this is not one of them. What I’m going to try here is a different approach."

The article rings a lot of bells because I actually do understand how Git works. I'm not so sure it is understandable by someone who is new to git.

The second sentence of the actual tutorial part: "When you create a git repo, using git init". I'm sorry, what is a repository? Some sentences later: "Here is what’s your .git will look like before your first commit:". What is a commit exactly?

I once read another tutorial, which I can't find now. It reads as a story of a group of people exchanging files and in order to avoid administrative mess, eventually end up inventing the core of git, because it solves their problems. I think that tutorial is much better for explaining the concepts of git to beginners.

Edit: I think this is the tutorial i mentioned: http://tom.preston-werner.com/2009/05/19/the-git-parable.htm...

One can be new to git but familiar with e.g. SVN or with VCS theory in general. Sometimes I really do hate it when I'm trying to learn how to use a new tool and every beginner's tutorial starts with explaining how computers work or with some story that is somewhat relevant.

That's a perfectly fine parable that, except for the distributed part and some vocabulary, could describe _ANY_ modern version control system not just git.

The problem with git is not that it can't do those things, but that it is hard to attain mastery of git commands even if you know what version control is supposed to do.

That's unfortunately the problem with most tutorials: they just assume you have all the required knowledge. It's hard to make good learning material.

When I was still quite new to git, I attended a presentation based on The Git Parable.

I found it to be an enlightening talk and it helped me understand git better than what I had until then.

I saw this posted a couple years ago, but it was a great guide to starting out with git: http://pcottle.github.io/learnGitBranching/

I just have to say, I've gone through that guide and it was great. It really helped me understand the refs (branching) model.

It was a couple of years ago, so it may be even better now.

Thanks for posting it. I always had a really hard time trying to track down the link.

I feel this one is more well-written and complete:


That and The Git Parable (http://tom.preston-werner.com/2009/05/19/the-git-parable.htm...) gave me a wonderful understanding of Git.

I've been using git at work for a couple of years and I haven't spent very much time thinking about the internals at all. It just seems to work pretty much, though sometimes I get into a weird state and just delete and redownload.

That's the thing, there is almost no reason to ever have to delete and redownload. Moreover, the reason that git is dominant is (network effects, and) because it is solving the right problem with the right internal abstractions. It has even managed to become popular in spite of its user interface.

There is a reason to re-clone. It fixes every weird git state. And you don't even need to learn a bunch of theory to do it.

We are all ignorant in most respects; there is an opportunity cost in knowing anything. If you use git daily, I submit that it does not make sense to be ignorant of how it works. Yes, you will take no physical harm from misusing this tool, but you're setting yourself up for failure. Reading about how git works will prevent you from making the mistakes that lead to an unworkable state, and allow you to resolve any unexpected situations or errant keystrokes. To make something of an analogy, git has the most powerful and flexible form of "undo" that has been conceived to date, and you are discarding this because you can always re-download an old copy and redo your work. If it is something that you use only rarely, there is nothing wrong with choosing to study other things. If you are employed as a software developer then I would consider you to have very little excuse for ignorance.

I have never had to resort to this technique. However, there's a lot in git I don't understand. I've spent multiple days learning git, and at some point, the diminishing returns of further investment don't justify themselves. If I broke my repo in such a way that caused me to consider re-cloning to "get out of jail", I probably would only do it if I didn't lose a significant amount of work. (e.g. copy in-progress files to a safe location before recloning)

You say I have very little excuse for my ignorance, but I would also say that I have very little excuse to spend any more time learning a tool with as many dark corners as git.

Yeah usually there's a way I can figure it out but if it takes more than 15 seconds it's easier to just clone the repo again.

You should never have to redownload unless somebody is carelessly force pushing changes to the remote.

If you're about to do something 'weird' like fix a merge conflict, you can save the changes by creating a branch.

Sometimes you have to go back and fix a previous commit to reconcile a conflict. The 'gitk --all' command is your friend here. With it it's easy to visualize the history, reset, cherry-pick commits, etc.

How many "Understanding Git" posts have hit #1 on Hacker News? More than a few. How many have hit top 10? Surely dozens.

Can we, please, take this as an indicator that Git is too fucking complicated? After the first thousand "Git made easy" blog posts it should have been apparent.

Le sigh.

But maybe everyone agrees that it's too complicated but it's used. You know, English is way too complicated, but you can't just make everyone switch to Esperanto. People are interested in things that help people understand English and things that help people understand Git.

Indeed, the miracle here is how git got so popular.

It has _NOTHING_ to do with meeting needs or doing things spectacularly better than other tools. There was version control before git and it worked just fine. I just think some "cool kids" started using it, it developed a certain cachet that made it desirable and that was it: here we are with the most popular version control system in the world with an absolutely shitty, inscrutable interface.

For the vast majority of git "users", this doesn't matter. They use git as little more than filestorage.


I think most of us do agree. The official documentation is written as though the reader is already intimately familiar with the internals of git. I'm a smart guy, and I've read the official docs numerous times but every single time I failed to come away with a greater understanding than I had before. The commands are not particularly intuitive either.

I personally think git is fucking awful.

... but it's the best version control system anyone's come up with so far. It sucks less than the alternatives and I use it for all my projects.

I work in video games where everyone uses Perforce. It's not perfect but I think it's vastly superior to Git. I can train someone who's never even heard of source control how to use it in 5 minutes.

Git might qualify as best FREE version control system anyone's come up with so far. But best? No, no I don't think so.

>Git might qualify as best FREE version control system...

No way it is the best. I think Mercurial easily holds the winning position for the version control with most power to weight ratio...

Agreed. I've used Perforce for a long time and it just clicked from the start. A lot of people swear by DVCS, but somehow I've always worked on projects where centralized VCS makes more sense. I wonder what the difference is? It can't be just about size, because Google uses centralized and Linux uses distributed, and both seem to be happy...

It's about organization structure. Centralized organizations where work is assigned top-down and/or where people working on the same part of the code base are likely to be in the same timezone (if not in the same office) are better served by a centralized VCS which is simpler to understand and mirrors how the organization works.

Open Source will probably never produce a good centralized VCS because it's not a problem that exists for Open Source projects.

The most inscrutable part of Git is rebasing and how it works. It took me really long to form a mental model where I can visualise how it does its job.

I agree with the author that understanding how Git works is the only way to get comfortable with it. For example, rebasing is hard, but you will have your "Aha!" moment when you realise that it is nothing but a combination of hard reset and cherry-pick.
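That "hard reset plus cherry-pick" picture can be tried by hand in a scratch repo. This is a simplified sketch, not what `git rebase` literally runs: the branch names `trunk` and `feature` are invented, and the real command also deals with conflicts, already-applied patches and so on.

```shell
set -eu
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name demo                 # throwaway identity for the demo repo
git config user.email demo@example.com

echo base > base.txt; git add base.txt; git commit -q -m "base"
git branch trunk                          # hypothetical name for the upstream branch
git checkout -q -b feature
echo f1 > f1.txt; git add f1.txt; git commit -q -m "feature 1"
echo f2 > f2.txt; git add f2.txt; git commit -q -m "feature 2"
git checkout -q trunk
echo up > up.txt; git add up.txt; git commit -q -m "upstream work"
git checkout -q feature

# Manual equivalent of 'git rebase trunk' while on feature:
old_tip=$(git rev-parse HEAD)             # remember where the branch was
fork=$(git merge-base trunk HEAD)         # the commit where feature left trunk
git reset -q --hard trunk                 # 1. move the branch onto the new base
git cherry-pick "$fork..$old_tip" >/dev/null   # 2. replay the branch's own commits on top

git log --format=%s                       # feature 2, feature 1, upstream work, base
```

The replayed commits get new hashes because their parents changed, which is also why rebasing already-pushed commits causes trouble for other people.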

I don't understand why most people find the concept of rebasing so alien. Maybe it's because I've never used another VCS, but it's all just very rudimentary graph theory.

Nice article. https://codewords.recurse.com/issues/three/unpacking-git-pac... is another article that i found very useful to understand packing and unpacking. (I found this when I was building a standalone html viewer for .git directory)

I'm finding this comment thread to be a great example of the idea that everyone has different and particular learning styles. 6 comments, 4 different git tutorials (not counting the original post.)

The CLI is a complete mystery and that makes it hard to explain to people. Most of the commands and arguments differ so much that it makes little sense. You would expect to delete a branch, commit or tag in a similar way, but the commands to do this are totally different.

How Git works can be easily drawn out on paper to explain it to someone. The branches, commits and merges are simple to draw.

When we lose sense of our state we always take a piece of paper and draw it out. Most of the time you can just figure out what's up.

Yeah, I used to race through svn workflows without a second thought. With git, I still don't understand how to (just one example) throw away all the local changes. (No, not git reset --hard. At least that didn't work when I last tried).

It came to the point where I thought about spending a week just learning git to defend my nerd cred.

Instead, I just use a gui for anything beyond add/commit/push. I still don't like not understanding one of my daily tools, but I have real work to do.

I found I had to build a whole parallel patch management system to rationally deal with svn. Applying the same patch in multiple branches was a royal PITA in svn; in git, it's trivial.

git status will tell you how to throw away local changes, btw. Different kinds of changes (e.g. edited, deleted or added files, staged or not) need different commands.
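One likely reason `git reset --hard` "didn't work" is that it leaves untracked files alone. A small sketch of the difference, with invented file names. Note that `git clean` deletes files for good, so try it in a scratch repo first.

```shell
set -eu
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name demo           # throwaway identity for the demo repo
git config user.email demo@example.com

echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "initial"

echo "edited" > tracked.txt         # unstaged edit to a tracked file
echo "scratch" > staged.txt
git add staged.txt                  # newly added file, staged but not committed
echo "temp" > untracked.txt         # file git has never been told about

git reset -q --hard HEAD            # restores tracked.txt and drops staged.txt
ls                                  # untracked.txt is still there
git clean -q -fd                    # removes untracked files and directories (irreversible)
git status --porcelain              # prints nothing: the working tree is clean
```

`git clean -n` does a dry run that only lists what would be removed, which is worth doing before the real thing.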

Beware, once you learn some of the git useful patterns and features, most likely you'll start feeling very uncomfortable when you have to go back to svn ;)

The best way to understand Git is to learn Mercurial. It teaches you the things that matter with none of the useless implementation bullshit.

If you start with git, you won't have a clue where the interface ends and the implementation details start. Which is why, I think, people find git hard.

I still hope that the whole development world will come to their senses and start using Mercurial more and more...


This one really helped me in the beginning. Visualisation in general helps when learning git (branching, rebasing etc). Ungit is my goto when I want to just see what it (the repo) looks like.

I have read enough of all these tutorials that I understand how git and the git commands work. I could pass a test. But still I have a hard time feeling comfortable enough to spit out commands as needed. I've created a lot of aliases to do common things but that is not the same thing.

My entire git workflow is based on these tutorials (don't seem to need much else): https://git-scm.com/book/en/v2/Git-Branching-Basic-Branching... and http://nvie.com/posts/a-successful-git-branching-model/

Does anyone know what would be most useful outside the scope of the tutorials above?

Everyone has their favourite explanation of how git works. Mine is:

Knowledge is Power: Getting out of trouble by understanding Git by Steve Smith [1]

[1] https://www.youtube.com/watch?v=sevc6668cQ0

For anyone looking for a video/demo on this, here's a great presentation by Tim Berglund - https://www.youtube.com/watch?v=MYP56QJpDr4

I wrote a series back in 2011 that looked at a bite-sized chunk each week, including how the git repository works, and although the series is long over there is a summary/index page here:


As a meta note: it can be really difficult to keep up a continual blog series. It helped that I advertised it in advance and then wrote around two in advance so that writer's block or holidays didn't impact the schedule.

And this is my favourite tutorial so far: https://www.youtube.com/watch?v=ZDR433b0HJY I think it explains everything in 80 minutes.

Too late to edit. I meant this one: https://www.youtube.com/watch?v=xbLVvrb2-fY

If this kind of approach to learning Git interests you, I took a similar approach with A Hacker's Guide To Git (https://wildlyinaccurate.com/a-hackers-guide-to-git/). It is much longer and goes into a bit more detail than the OP but (hopefully) is arranged in a way that you can read it a few sections at a time.

There was a very cool resource (video, ebook) on how git works on a website called peepcode. I just realised they're no longer operational. It was by Scott Chacon I think. Can't find it anywhere else now. Would have made a good addition to the resources listed here.

It seems that the old website is available via archive.org[0], however it was part of a payment system at the time, so archive.org does not store a copy.

However, after some digging it turns out that Pluralsight put it under a Creative Commons license, so the git guide I think that you're talking about seems to be available on Github[1].

[0]: http://web.archive.org/web/20121015074953/https://peepcode.c...

[1]: https://github.com/pluralsight/git-internals-pdf

A little late, but this really opened my eyes:


    you can put the files you don’t want git to deal with in your .gitignore file. 
Since .gitignore is committed itself this is very useful.
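A minimal sketch of that in practice, which also covers the "debug logs and tokens committed by accident" problem mentioned earlier in the thread. The patterns and file names are only examples.

```shell
set -eu
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

printf '%s\n' '*.log' '.env' > .gitignore   # example patterns: log files and a secrets file
echo "debug output" > debug.log
echo "TOKEN=example" > .env
echo "print('hi')" > app.py

git add -A                  # even a blanket add skips the ignored files
git status --porcelain      # only .gitignore and app.py are staged
git check-ignore -v .env    # shows which .gitignore line matched the path
```

One caveat: `.gitignore` only affects untracked files. A file that was already committed stays tracked until it is removed with `git rm --cached`.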

I look at this, and then I look at people wondering why non-coders don't use version control, and I laugh and laugh.

I don't think this article is a very good argument why non-coders don't use it. They don't care about the internals. Git's CLI, well, that's a good reason why they don't. (but you can get quite a lot of the benefits with a few basic commands via a GUI tool, and you totally can explain that to many non-coding users)

Non-coders do use version control. e.g. MS Word documents have some version control features, as does Atlassian Confluence. Some even use Git!
