
New to git as well as programming in general. This site has helped me somewhat, but I'm still confused about some basic concepts. Aren't the arrows all facing the wrong way? What happens to a commit after 'git reset' abandons it; can the commit be accessed ever again? What's the difference between reset and revert? How can merge and rebase be trusted? What are the best practices for how many branches a project should have and how & when should they be merged? And what are remote tracking branches? Since I only program alone so far, my friends tell me not to concern myself with these questions until I am forced to do so in a practical situation... but that doesn't satisfy me.

Something no one has mentioned but I think is important is that reset is best not thought of in relation to revert but in relation to checkout. Each of these commands operates on very different things, but they all seem to have similar purposes at a very superficial level [1].

- Checkout manages HEAD. When you use checkout, HEAD is changed to point to the branch, commit, tag, or whatever you've told it to and the working copy is altered to match that ref.

- Reset manages what commit a branch points to (though, see [1] again for the disclaimer). Several of my sibling posts here talk about it altering history but I think this is not really a correct way to look at it. Branches, in git at least, are not really 'history'. They are dumbass pointers and they move around a lot. This is not an alteration of history most of the time.

- Revert creates a commit, so is actually not really like the above two commands at all. It's just an automatic way to create a commit that undoes an earlier commit (which can even be a merge commit, btw).

The more destructive version of revert is actually rebase, which can truly alter history in that it takes existing commits, replaces them with new ones that resemble the old ones, and then pretends that it was like that all along.

[1] In fact, they overlap in areas, and checkout and reset in particular have developed a swiss-army-knife set of options unrelated to their main purpose. I want to talk about their main purposes here, though. Reset especially has a lot of different uses.
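To make the distinction concrete, here is a minimal sketch in Python. It is a toy model, not how git is implemented: the names `commits`, `branches`, `head` and the commit ids are all made up for illustration. It only tries to show that checkout changes HEAD, reset moves a branch pointer, and revert adds a commit.

```python
# Toy model only: real git stores far more than this.
commits = {}        # commit id -> {"parent": id or None, "msg": str}
branches = {}       # branch name -> commit id (a movable pointer)
head = "master"     # HEAD names the branch we are currently on


def commit(cid, msg):
    """Create a commit on top of the current branch and advance that branch."""
    commits[cid] = {"parent": branches.get(head), "msg": msg}
    branches[head] = cid


def checkout(branch):
    """checkout changes HEAD; no branch pointer moves."""
    global head
    head = branch


def reset(cid):
    """reset moves the current branch pointer; no commit is created or deleted."""
    branches[head] = cid


def revert(cid, new_id):
    """revert adds a new commit on top; nothing is moved backwards."""
    commit(new_id, "Revert " + repr(commits[cid]["msg"]))


commit("A", "first")
commit("B", "second")
branches["topic"] = branches["master"]   # like `git branch topic`
checkout("topic")
commit("C", "third")     # only topic advances
reset("A")               # topic now points at A; B and C still exist
checkout("master")
revert("B", "D")         # master is now A <- B <- D

print(head, branches)
```

Note that after the reset, commit C is still sitting in `commits`; only the pointer moved.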

- The arrows point in the direction of references to other commits. Each commit (except the first) references one parent, or more than one in the case of a merge. Commits do not reference their children at all (because then adding commits would modify the previous ones). You can draw the arrows in the other direction, but that is not the direction in which git reads the commit graph.

- If a commit can't be reached from a branch or tag, then it may be deleted after a time (I think by default its reflog entry expires after 30 days, after which git gc is free to prune it). Before it is deleted you can recover it by adding it to a branch, checking it out, or otherwise making it visible again. The command 'git reflog' will print a history of the commits HEAD has pointed to, so you can recover them from there.

- Reset moves the current branch to a given commit. This changes the 'history' of the branch. 'revert' creates a new commit which undoes another commit, which does not change history.

- Merge and rebase are only semi-automatic. Part of git's philosophy is that they are 'stupid' and will produce an error if they cannot figure out how to apply a commit, and prompt you to do it manually (which you do by editing the files in question, git adding them, and then either committing them or running 'git rebase --continue'). They do occasionally mis-apply a patch automatically, however. It's good practice to inspect the result of a merge because of this.

- There are many different approaches you can use; which one is most appropriate depends on the project. A common one is 'feature branches', where you create a branch for each new feature you add, and then merge the branch into master when the feature is complete.

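- Strictly speaking, a remote-tracking branch is a ref like origin/master that records where the remote's branch was the last time you fetched. A local branch can be associated with one (its 'upstream'), so that git pull will automatically merge changes in the remote branch into the local branch.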
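The point about unreachable commits can be sketched in a few lines of Python. This is a toy graph with made-up commit ids and branch names, not git's actual garbage collector; it just walks parent links from every branch tip and reports what is left over.

```python
# Toy commit graph: commit id -> list of parent ids (hypothetical ids).
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}
branches = {"master": "C", "topic": "D"}


def reachable(tips):
    """Return every commit reachable by following parent links from the tips."""
    seen = set()
    stack = list(tips)
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(parents[cid])
    return seen


def unreachable():
    """Commits that no branch can reach; candidates for eventual cleanup."""
    return set(parents) - reachable(branches.values())


before = unreachable()         # nothing is dangling yet
branches["master"] = "B"       # like `git reset --hard B` while on master
after_reset = unreachable()    # C is no longer reachable from any branch
branches["rescue"] = "C"       # like `git branch rescue C`
after_rescue = unreachable()   # pointing a branch at C makes it visible again

print(before, after_reset, after_rescue)
```

The commit data for C never changes during any of this; only the set of pointers that can reach it does.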

I recommend using a GUI like gitk to visualise the repository (running gitk --all gives you a full view of all branches, and F5 will refresh the view with new changes). I still give git commands on the console, but gitk will show you a graph very much like on this website, which I find very helpful when manipulating the repository.

You got a lot of questions in there, and I'm writing in a hurry, so sorry if some of the following doesn't make sense.

> Aren't the arrows all facing the wrong way?

Technically, commits are linked to their parent (you can see this in the git log), which is why the arrows point the "wrong" way. If you think about it, a child node has a fixed number of parents, but a parent can have any number of children.

> What happens to a commit after 'git reset' abandons it; can the commit be accessed ever again?

The commit doesn't go anywhere (yet). What actually happens with `reset` is a touch complicated, but basically you are changing the label of the "thing" that you are resetting to point to a different commit. After you reset, you can still `git show <sha>`, i.e. the commit is still accessible, it's just not part of the "tree".

> What's the difference between reset and revert?

Reset will change labels (branch heads), whereas revert just takes the commit you've given it (say `C`), inverts it (so "add line X" becomes "delete line X"), and creates a new commit that undoes (reverts) the changes in `C`.
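The "inverting" step can be illustrated with a small Python sketch. The patch format here (a list of `("add" | "del", index, text)` tuples) is invented for the example and is not git's diff format; the idea is only that swapping add/delete and reversing the order gives a change that undoes the original.

```python
def apply(lines, patch):
    """Apply a list of ("add" | "del", index, text) operations in order."""
    result = list(lines)
    for op, index, text in patch:
        if op == "add":
            result.insert(index, text)
        else:
            if result[index] != text:
                raise ValueError("patch does not apply cleanly")
            del result[index]
    return result


def invert(patch):
    """Swap add/del and reverse the order, so the result undoes the patch."""
    flip = {"add": "del", "del": "add"}
    return [(flip[op], index, text) for op, index, text in reversed(patch)]


original = ["line 1", "line 2"]
change_c = [("add", 1, "line X"), ("del", 2, "line 2")]

after_c = apply(original, change_c)              # ["line 1", "line X"]
after_revert = apply(after_c, invert(change_c))  # back to the original lines

print(after_c, after_revert)
```

In this model a revert is just `apply(current, invert(change_c))` recorded as a new change, so `change_c` itself stays in the history.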

> How can merge and rebase be trusted?

... to do what exactly? One thing to note is that git will bail out when a merge/rebase causes a conflict, and leaves it up to you to fix it.

> What are the best practices for how many branches a project should have and how & when should they be merged?

This is close to an "Emacs vs Vim"-type question (Emacs, of course ;) ). We use feature branches in our 5-developer project quite successfully. Sometimes we have 10 branches in flight, and sometimes we can collapse down to `master`.

Branching is so cheap in Git that you can just use them as much as you want.

> And what are remote tracking branches?

A remote tracking branch (as far as I know) is just a setting for git saying local branch `A` will automatically use remote branch `B` for updates (push/pull) without having to tell git every time to update from `B`.

I can attempt to answer your questions briefly.

Aren't the arrows all facing the wrong way?

>> No, the arrows are in the right direction. Git maintains a Directed Acyclic Graph (DAG) of commits. Each new commit created points to 0..n parents. So let's say you were on the first commit, and then you created a new commit. The newly created commit is the "child" of the first commit, and has a "parent" pointer that is the SHA of the parent. The parent cannot have pointers to the children since they don't know that one will be created!

Now, if you were to do a merge between branch a and b, assuming it's not a fast-forward merge, a new child commit is created that will have 2 "parent" pointers - the SHA of a and b.

A commit will have 0 parents if it is the first commit in a repo.
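Here is a rough Python sketch of why the pointers can only go from child to parent. It assumes a simplified commit whose id is a SHA-1 over its parent ids and message; git's real object format contains more (tree, author, dates), and `commit_id` is a made-up helper. Because the id is derived from the content, adding a child pointer to a parent later would change the parent's id.

```python
import hashlib


def commit_id(message, parent_ids):
    """Hash the commit's content, which includes the ids of its parents."""
    body = "\n".join(["parents " + " ".join(parent_ids), message])
    return hashlib.sha1(body.encode("utf-8")).hexdigest()


root = commit_id("first commit", [])                 # 0 parents
child = commit_id("second commit", [root])           # 1 parent
other = commit_id("work on branch b", [root])        # root now has 2 children
merge = commit_id("merge b into a", [child, other])  # 2 parents

# root's id was fixed before any child existed; recomputing it gives the same id.
root_again = commit_id("first commit", [])

# The same message on a different parent is a different commit.
moved = commit_id("second commit", [other])

print(root == root_again, child == moved)
```

The last line also hints at why rebase has to create new commits: replaying a change onto a different parent necessarily produces a different id.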

What happens to a commit after 'git reset' abandons it; can the commit be accessed ever again?

>> That commit is "lost". I put that in quotes because git doesn't immediately throw away the commit (rather there is a garbage collection process that will eventually remove that commit). For the same reason, if you happen to know the SHA of the commit you just reset, then you can do a "git checkout -b <new_branch> <sha>" and create a branch off that commit.

One way to get to that "lost" commit is to use "git reflog" - the reflog (that is, ref-log) is a log that updates each time the "HEAD" moves. Since when you do a "git reset" the HEAD moves from the current commit to another one, the reflog will record that.
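The reflog idea is simple enough to sketch in Python. This is a toy with invented ids (`c1`, `c3`) and an illustrative log message, not git's actual reflog format: every time the branch moves, the old and new positions are written down, so the "lost" id can be read back.

```python
branches = {"master": "c3"}
head = "master"
reflog = []   # newest entry first, similar in spirit to `git reflog` output


def move_branch(new_id, reason):
    """Move the current branch and record where it pointed before and after."""
    old_id = branches[head]
    branches[head] = new_id
    reflog.insert(0, {"old": old_id, "new": new_id, "reason": reason})


move_branch("c1", "reset: moving to c1")   # c2 and c3 are now "lost"
lost = reflog[0]["old"]                    # the log still remembers c3
branches["rescued"] = lost                 # like `git checkout -b rescued <sha>`

print(lost, branches)
```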

What's the difference between reset and revert?

>> "reset" is "destructive" in that it can leave dangling commits. A dangling commit is a commit that nothing else points to. Remember the DAG? Well, that means that every commit (except the last commit) has a child pointing to it. And who's pointing to the last commit? The branch, and the HEAD. If a commit has NO ONE pointing to it, it is eligible for garbage collection.

Furthermore, since a reset can "delete" commits, one usually never "reset"s public commits - that is commits that have been pushed upstream (to Github or what-have-you). The reason being - if someone else pulled that branch and started working off that commit, and you reset it, it leaves the other person in an unknown state when you do a push.

If you wish to "undo" a public commit, you should revert. If you see the visualization you will notice that "reset" blurs out the last commit, while "revert" creates an "anti-commit" - that is a commit that undoes everything the other commit did.
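A small Python sketch of why a reset of public commits causes trouble and a revert does not. The graph and ids are hypothetical; the check is the usual "fast-forward" test, i.e. whether the commit the remote already has is an ancestor of what you are trying to push. By default git refuses a push that fails this test unless you force it.

```python
# Toy graph: commit id -> list of parent ids (hypothetical ids).
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],     # already pushed; other people have pulled it
    "C2": ["B"],    # what you get after resetting to B and committing again
    "R": ["C"],     # what you get after reverting C instead
}


def is_ancestor(old, new):
    """True if `old` can be reached from `new` by following parent links."""
    stack, seen = [new], set()
    while stack:
        cid = stack.pop()
        if cid == old:
            return True
        if cid not in seen:
            seen.add(cid)
            stack.extend(parents[cid])
    return False


remote_master = "C"
push_after_reset = is_ancestor(remote_master, "C2")   # not a fast-forward
push_after_revert = is_ancestor(remote_master, "R")   # a plain fast-forward

print(push_after_reset, push_after_revert)
```

With the revert, everyone who already has C can simply move forward to R; with the reset, their C is no longer part of the branch you are pushing.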

How can merge and rebase be trusted?

>> I am not sure what you mean by that? Do you mean will Git screw up? Or how do you know that a git pull (which could merge or rebase) pulls in commits that have been verified?

What are the best practices for how many branches a project should have and how & when should they be merged?

>> This is a "depends on" answer IMO.

ONE approach is that all "work" should happen on short topic branches off the "integration" branch (which in many cases is master). Commits are made, things are reviewed, and once tested they are merged into master.

This also means that each developer on the team merges one branch per feature into master. Every time you are ready to merge your code into master you first rebase onto master (which replays your commits on top of master) and then merge into master.
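The rebase-then-merge step can be sketched roughly like this in Python. It is a toy with single-parent commits and invented ids; the `rebase` helper and the trailing `'` naming for replayed commits are just for illustration. The point is that the replayed commits are new commits, and afterwards master can simply be moved forward to the feature tip.

```python
# Toy linear history: commit id -> parent id (single parent for simplicity).
parents = {"A": None, "B": "A", "M": "B", "F1": "B", "F2": "F1"}
changes = {"F1": "add login form", "F2": "validate login"}
branches = {"master": "M", "feature": "F2"}


def ancestors(cid):
    """Return cid followed by all of its ancestors, newest first."""
    out = []
    while cid is not None:
        out.append(cid)
        cid = parents[cid]
    return out


def rebase(branch, onto):
    """Replay the commits unique to `branch` on top of `onto` as new commits."""
    base = set(ancestors(branches[onto]))
    todo = [c for c in ancestors(branches[branch]) if c not in base]
    tip = branches[onto]
    for old in reversed(todo):          # oldest first
        new = old + "'"                 # a new commit resembling the old one
        parents[new] = tip
        changes[new] = changes[old]
        tip = new
    branches[branch] = tip


rebase("feature", "master")
branches["master"] = branches["feature"]   # the merge is now a fast-forward
history = ancestors(branches["master"])

print(history)
```

The original F1 and F2 are still in `parents` after this, but no branch reaches them any more.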

FWIW I am giving you the 30 second answer, but this needs a detailed explanation.

A few resources to get you started -




Good luck!

Thanks for answering so many questions. And thank you bodhi and rcxdude as well! I had an idea that best practices might still be an emerging topic. My unclear question about merging was indeed about conflicts, so I'm glad to hear that the merge philosophy is to be "stupid" as rcxdude said. I'll do some more research and check those links.

The best resources I've found for understanding git workflows are by Vincent Driessen[1] and Atlassian[2].

The Atlassian page is much more comprehensive, and in fact derives the 'GitFlow' workflow from Driessen. However, I prefer his original top-to-bottom representation (a left-to-right time axis seems less intuitive to me - it reminds me of a football playbook rather than a waterfall).

[1] http://nvie.com/posts/a-successful-git-branching-model/
[2] https://www.atlassian.com/git/workflows

for me, that git for computer scientists link was very clear

Not going to bother answering since I think everyone else has already covered it, but just wanted to say those are all the right questions.
