Visualizing Git Concepts with D3.js (wei-wang.com)
346 points by sebg on Mar 22, 2014 | 50 comments



See also "Learn Git Branching": http://pcottle.github.io/learnGitBranching/?demo


The Ungit[1] gui can also help you to visualize git operations like merging and rebasing commits. Here's the youtube demo: https://www.youtube.com/watch?v=hkBVAi3oKvo.

[1]: https://github.com/FredrikNoren/ungit


That's exactly what I was just thinking I wished existed!


That is pretty cool.


This is really cool - I wish I'd had this for teaching git a few weeks ago.

FYI - it looks like branch creation is broken. Creating a branch and following with `git commit` gives the error "Not a good idea to make commits while in a detached HEAD state." It's not a detached HEAD, it's a branch.


Purely anecdotal feedback coming up.

This is somewhat old (at least a year old, I think). I've been using git for a pretty long time, but as my main usage was on solo projects I was not making use of branches, rebasing and so on at all. I knew the concepts, but the entire thing was still a murky grey cloud for me up until a few months ago, even though I'd had previous experience dealing with git internals (libgit and libgit2). I saw this site before and I thought it was very cool, but it was not teaching me anything new. Although everything seems clear and simple, it is a representation I simply could not wrap my head around. It's like when you learn a new language: I know "libro" means "book" in Spanish, but I cannot associate it with the concept of a book until I move to Spain and speak the language.

Then I got my new job. New git workflow and I had to use branches and rebasing in the field on an hourly basis. And ever since, everything clicked. Now that I see this link again, it is not only clear, it feels like I speak the language.

All in all, I think this is a great piece of work, but pedagogically speaking, I think of it more as a visual cue to pair with the git actions when you do them than something to rely on.


Excuse the self-promotion: I wrote a tool aimed at teaching git, at https://github.com/Gazler/githug


ok, I need to download && install just to see a demo?

Maybe you should create a demo page.


New to git as well as programming in general. This site has helped me somewhat, but I'm still confused about some basic concepts. Aren't the arrows all facing the wrong way? What happens to a commit after 'git reset' abandons it; can the commit be accessed ever again? What's the difference between reset and revert? How can merge and rebase be trusted? What are the best practices for how many branches a project should have and how & when should they be merged? And what are remote tracking branches? Since I only program alone so far, my friends tell me not to concern myself with these questions until I am forced to do so in a practical situation... but that doesn't satisfy me.


Something no one has mentioned but I think is important is that reset is best not thought of in relation to revert but in relation to checkout. Each of these commands operates on very different things, but they all seem to have similar purposes at a very superficial level [1].

- Checkout manages HEAD. When you use checkout, HEAD is changed to point to the branch, commit, tag, or whatever you've told it to and the working copy is altered to match that ref.

- Reset manages what commit a branch points to (though, see [1] again for the disclaimer). Several of my sibling posts here talk about it altering history but I think this is not really a correct way to look at it. Branches, in git at least, are not really 'history'. They are dumbass pointers and they move around a lot. This is not an alteration of history most of the time.

- Revert creates a commit, so is actually not really like the above two commands at all. It's just an automatic way to create a commit that undoes an earlier commit (which can even be a merge commit, btw).

The more destructive version of revert is actually rebase, which can truly alter history in that it takes existing commits, replaces them with new ones that resemble the old ones, and then pretends that it was like that all along.

[1] In fact, they overlap in areas, and checkout and revert in particular have developed a swiss-army-knife set of commands unrelated to their main purpose. I want to talk about their main purposes here, though. Reset, in particular, has a lot of different uses.
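
To make the three roles above concrete, here is a minimal sketch in Python. It is a toy model, not how git is implemented: the `Repo` class, its field names and the commit ids `c1`..`c4` are all made up for illustration, and it ignores the working copy and detached HEAD entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Repo:
    """Toy model (hypothetical): commit id -> parent ids, branch name -> commit id."""
    parents: Dict[str, List[str]] = field(default_factory=dict)
    branches: Dict[str, str] = field(default_factory=dict)
    undoes: Dict[str, str] = field(default_factory=dict)
    head: str = "master"  # name of the checked-out branch

    def commit(self, new_id: str) -> None:
        # The new commit records the current tip as its parent (no parent for
        # the first commit), then the current branch pointer advances to it.
        tip = self.branches.get(self.head)
        self.parents[new_id] = [tip] if tip else []
        self.branches[self.head] = new_id

    def checkout(self, branch: str) -> None:
        # checkout only moves HEAD; no branch pointer changes.
        self.head = branch

    def reset(self, target: str) -> None:
        # reset moves the current branch pointer; no commit is created or deleted.
        self.branches[self.head] = target

    def revert(self, bad: str, new_id: str) -> None:
        # revert adds a brand-new commit on top of the current tip and
        # remembers which earlier commit it undoes.
        self.commit(new_id)
        self.undoes[new_id] = bad


repo = Repo()
for cid in ("c1", "c2", "c3"):
    repo.commit(cid)
repo.branches["topic"] = "c2"      # roughly `git branch topic c2`

repo.checkout("topic")             # HEAD moves, branch pointers do not
tips_after_checkout = dict(repo.branches)

repo.checkout("master")
repo.reset("c2")                   # master now points at c2; c3 still exists
repo.revert("c1", "c4")            # c4 is a new commit whose parent is c2
```

After this runs, `c3` is still stored but no branch points to it, which is the "pointer moved, history untouched" view of reset described above.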


- The arrows point in the direction of references to other commits. Each commit (except the first) will reference one or more parents (in the case of a merge). Commits do not reference their children at all (because then adding commits would modify the previous ones). You can point the arrows in the other direction but that is not the direction in which git reads the commit graph.

- If a commit can't be reached from a branch, then it may be deleted after a time (I think the default grace period is around 30 days, after which git gc can prune it). Before it is deleted you can recover it by adding it to a branch, checking it out, or otherwise making it visible again. The command 'git reflog' prints a history of where HEAD has pointed, so you can find the commit there and recover it.

- Reset moves the current branch to a given commit. This changes the 'history' of the branch. 'revert' creates a new commit which undoes another commit, which does not change history.

- Merge and rebase are only semi-automatic. Part of git's philosophy is that they are 'stupid' and will produce an error if they cannot figure out how to apply a commit, and prompt you to do it manually (which you do by editing the files in question, git adding them, and then either committing them or running 'git rebase --continue'). They do occasionally mis-apply a patch automatically, however. It's good practise to inspect the result of a merge because of this.

- There are many different approaches you can use; which one is most appropriate depends on the project. A common one is 'feature branches', where you create a branch for each new feature you add, and then merge the branch into master when the feature is complete.

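- Strictly speaking, remote-tracking branches are the refs like origin/master that record where a remote's branches were at the last fetch. A local branch can be associated with one (its 'upstream'), so that git pull will automatically merge changes from the remote branch into the local branch.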

I recommend using a GUI like gitk to visualise the repository (running gitk --all gives you a full view of all branches, and F5 will refresh the view with new changes). I still give git commands on the console, but gitk will show you a graph very much like on this website, which I find very helpful when manipulating the repository.
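
The point about unreachable commits and the reflog can be sketched as a graph walk. This is a rough Python model under stated assumptions: the dictionary `parents`, the commit ids and the `reflog` list are invented for the example, and real git tracks far more than this.

```python
from typing import Dict, Iterable, List, Set


def reachable(parents: Dict[str, List[str]], tips: Iterable[str]) -> Set[str]:
    """Follow parent pointers from the given tips and collect every commit seen."""
    seen: Set[str] = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return seen


# Hypothetical history: master was at c4, got reset back to c2,
# and then a new commit c3 was made on top of c2.
parents = {"c1": [], "c2": ["c1"], "c3": ["c2"], "c4": ["c2"]}
branches = {"master": "c3"}
reflog = ["c3", "c2", "c4"]  # places HEAD has recently pointed (assumed)

live = reachable(parents, branches.values())
unreachable = set(parents) - live          # candidates for eventual cleanup

# While the reflog still mentions c4, it counts as a starting point too,
# so nothing in this example is actually eligible for deletion yet.
protected = reachable(parents, list(branches.values()) + reflog)
```

Here `unreachable` contains only `c4`, and `protected` covers all four commits, which is why a commit dropped by reset can still be recovered for a while.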


You got a lot of questions in there, and I'm writing in a hurry, so sorry if some of the following doesn't make sense.

> Aren't the arrows all facing the wrong way?

Technically, commits are linked to their parent (you can see this in the git log), which is why the arrows point the "wrong" way. If you think about it, a child node has a fixed number of parents, but a parent can have an infinite number of children.

> What happens to a commit after 'git reset' abandons it; can the commit be accessed ever again?

The commit doesn't go anywhere (yet). What actually happens with `reset` is a touch complicated, but basically you are changing the label of the "thing" that you are resetting to point to a different commit. After you reset, you can still `git show <sha>`, i.e. the commit is still accessible, it's just not part of the "tree".

> What's the difference between reset and revert?

Reset will change labels (branch heads), whereas revert just takes the commit you've given it (say `C`), inverts it (so "add line X" becomes "delete line X"), and creates a new commit that undoes (reverts) the changes in `C`.
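
The "inverts it" step can be illustrated with a small Python sketch. This is a deliberately simplified model, not git's diff format: a patch here is just a list of `("add", line)` / `("delete", line)` pairs, line positions are ignored, and the names `invert` and `apply` are made up for the example.

```python
from typing import List, Tuple

Change = Tuple[str, str]  # (operation, line) where operation is "add" or "delete"


def invert(patch: List[Change]) -> List[Change]:
    """Build the 'anti-patch': flip each operation and undo in reverse order."""
    flipped = {"add": "delete", "delete": "add"}
    return [(flipped[op], line) for op, line in reversed(patch)]


def apply(lines: List[str], patch: List[Change]) -> List[str]:
    """Apply a patch to a copy of the file; 'add' appends, 'delete' removes."""
    result = list(lines)
    for op, line in patch:
        if op == "add":
            result.append(line)
        else:
            result.remove(line)
    return result


before = ["a", "b"]
commit_c = [("add", "X"), ("delete", "b")]   # the changes made by commit C

after = apply(before, commit_c)              # the file once C is applied
reverted = apply(after, invert(commit_c))    # the file once C is reverted
```

In this example `reverted` ends up equal to `before`, while both `commit_c` and its inverse remain as separate entries, mirroring how revert adds a commit instead of removing one.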

> How can merge and rebase be trusted?

... to do what exactly? One thing to note is that git will bail out when a merge/rebase causes a conflict, and leaves it up to you to fix it.

> What are the best practices for how many branches a project should have and how & when should they be merged?

This is close to an "Emacs vs Vim"-type question (Emacs, of course ;) ). We use feature branches in our 5-developer project quite successfully; sometimes we have 10 branches in flight, and sometimes we can collapse down to `master`.

Branching is so cheap in Git that you can just use them as much as you want.

> And what are remote tracking branches?

A remote tracking branch (as far as I know) is just a setting for git saying local branch `A` will automatically use remote branch `B` for updates (push/pull) without having to tell git every time to update from `B`.


I can attempt to answer your questions briefly

Aren't the arrows all facing the wrong way?

>> No, the arrows are in the right direction. Git maintains a Directed Acyclic Graph (DAG) of commits. Each new commit created points to 0..n parents. So let's say you were on the first commit, and then you created a new commit. The newly created commit is the "child" of the first commit, and has a "parent" pointer that is the SHA of the parent. The parent cannot have pointers to the children since they don't know that one will be created!

Now, if you were to do a merge between branch a and b, assuming it's not a fast-forward merge, a new child commit is created that will have 2 "parent" pointers - the SHA of a and b.

A commit will have 0 parents if it is the first commit in a repo.
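
One way to see why the pointers can only go from child to parent is to compute commit ids from their contents. The sketch below is a simplification and not git's real object format (which also hashes a header, the tree, author and committer); the function `commit_id` and the messages are assumptions made for illustration.

```python
import hashlib
from typing import List


def commit_id(message: str, parent_ids: List[str]) -> str:
    """Derive an id from the message plus the ids of the parents (toy format)."""
    body = "".join(f"parent {p}\n" for p in parent_ids) + "\n" + message
    return hashlib.sha1(body.encode("utf-8")).hexdigest()


root = commit_id("first commit", [])            # 0 parents
a = commit_id("work on branch a", [root])       # 1 parent
b = commit_id("work on branch b", [root])       # 1 parent
merge = commit_id("merge b into a", [a, b])     # 2 parents

# The same message on a different parent yields a different id, because the
# parent id is part of what gets hashed.
a_on_other_parent = commit_id("work on branch a", [b])
```

Since a commit's id depends on its parents, a parent could not also list its children without its own id changing every time a child was added.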

What happens to a commit after 'git reset' abandons it; can the commit be accessed ever again?

>> That commit is "lost". I put that in quotes because git doesn't immediately throw away the commit (rather there is a garbage collection process that will eventually remove that commit). For the same reason, if you happen to know the SHA of the commit you just reset, then you can do a "git checkout -b <new_branch> <sha>" and create a branch off that commit.

One way to get to that "lost" commit is to use "git reflog" - the reflog (that is ref-log) is a log that updates each time the "HEAD" moves. Since when you do a "git reset" the HEAD moves from the current commit to another one, the reflog will record that.

What's the difference between reset and revert?

>> "reset" is "destructive" in that it can leave dangling commits. A dangling commit is a commit that nothing else points to. Remember the DAG? Well that means that every commit (except the last commit) has a child pointing to it. And who's pointing to the last commit? The branch, and the HEAD. If a commit has NO ONE pointing to it, it is eligible for garbage collection.

Furthermore, since a reset can "delete" commits, one usually never "reset"s public commits - that is commits that have been pushed upstream (to Github or what-have-you). The reason being - if someone else pulled that branch and started working off that commit, and you reset it, it leaves the other person in an unknown state when you do a push.

If you wish to "undo" a public commit, you should revert. If you see the visualization you will notice that "reset" blurs out the last commit, while "revert" creates an "anti-commit" - that is a commit that undoes everything the other commit did.

How can merge and rebase be trusted?

>> I am not sure what you mean by that? Do you mean will Git screw up? Or how do you know that a git pull (which could merge or rebase) pulls in commits that have been verified?

What are the best practices for how many branches a project should have and how & when should they be merged?

>> This is a "depends on" answer IMO.

ONE approach is that all "work" should happen on short topic branches off the "integration" branch (which in many cases is master). Commits are made, and once things are reviewed and tested they are merged into master.

This also means that each developer on the team merges one branch per feature into master. Every time you are ready to merge your code into master you first rebase onto master (which replays your commits on top of master) and then merge into master.
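
The rebase-then-merge step can be sketched roughly as follows. This is a toy Python model, not git's algorithm: the `store` dictionary, the helpers `make` and `rebase`, the shortened ids and the commit messages are all hypothetical, and conflicts are not modelled.

```python
import hashlib
from typing import Dict, List, Tuple

Commit = Tuple[str, List[str]]   # (message, parent ids)
store: Dict[str, Commit] = {}    # every commit ever created, keyed by id


def make(message: str, parents: List[str]) -> str:
    """Create a commit whose id depends on its message and its parents."""
    raw = "|".join(parents) + ":" + message
    cid = hashlib.sha1(raw.encode("utf-8")).hexdigest()[:8]
    store[cid] = (message, parents)
    return cid


def rebase(commits: List[str], onto: str) -> str:
    """Replay each commit on top of `onto`, producing new commits; return the new tip."""
    tip = onto
    for cid in commits:
        message, _old_parents = store[cid]
        tip = make(message, [tip])
    return tip


c1 = make("base", [])
c2 = make("master work", [c1])          # master moved on while you worked
t1 = make("feature part 1", [c1])       # topic branch started from c1
t2 = make("feature part 2", [t1])

new_tip = rebase([t1, t2], onto=c2)     # topic commits replayed on top of master
master = new_tip                        # the merge is now a simple fast-forward
```

The replayed commits get new ids because their parents changed, while `t1` and `t2` are still in `store`; only the branch pointers decide which version is visible.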

FWIW I am giving you the 30 second answer, but this needs a detailed explanation.

A few resources to get you started -

http://git-scm.com/book

ftp.newartisans.com/pub/git.from.bottom.up.pdf

http://eagain.net/articles/git-for-computer-scientists/

Good luck!


Thanks for answering so many questions. And thank you bodhi and rcxdude as well! I had an idea that best practices might still be an emerging topic. My unclear question about merging was indeed about conflicts, so I'm glad to hear that the merge philosophy is to be "stupid" as rcxdude said. I'll do some more research and check those links.


The best resources I've found for understanding git workflows are by Vincent Driessen[1] and Atlassian[2].

The Atlassian page is much more comprehensive, and in fact derives the 'GitFlow' workflow from Driessen. However, I prefer his original top-to-bottom representation (a left-to-right time axis seems less intuitive to me - they remind me of a football playbook rather than a waterfall).

[1] http://nvie.com/posts/a-successful-git-branching-model/ [2] https://www.atlassian.com/git/workflows


for me, that git for computer scientists link was very clear


Not going to bother answering since I think everyone else has already covered it, but just wanted to say those are all the right questions.


After having seen a few learn-git tools on HN and other programming forums, I would have to say that this is my personal favorite. What this tool does exceptionally well is teach the core git concepts from a visual point (seeing as it's git this works very naturally) instead of asking the user to memorize an incredibly small subset of the git commands. Often this leads to a lot of trouble when you get into a real-world scenario that some inappropriate learn git tutorial didn't cover. Fortunately, I think visualization tools like this should help ;) Perhaps github could incorporate this into their Try Git page?


It seems cool, but I don't understand why I can't just click multiple times instead of having to type the command several times. That UI decision makes the site nearly unusable on an iPad.


I think I do understand: the author either did not think this was going to be used on a tablet (after all, it teaches git in its bare cli form, not by the aid of a point-and-click ui) - or just didn't care about it (for pretty much the same reason, or because he thought there aren't many people using git cli on a tablet anyway).


This is great, kudos to the dev!

Like others who wished they had this while teaching git, I wish I had this while learning git.

However, the interaction seems a bit clunky - after doing a `git fetch` I wanted to try a `git rebase` but realized I had to click on the `git rebase` option first before typing it out. Maybe remove the need to click the option first OR make clicking it automatically type stuff for you. Basically, it's a two-step process that only needs one.


Thanks, I'm glad you like it.

You do not have to click the 'git rebase' option before typing it out, actually. All of the (supported) commands can actually be used anywhere.


I was surprised and gratified to see that pressing 'up' in the terminal window did exactly what I was hoping it would do.


but not tab.


This is great for teaching git, a nice add-on to the tool would be to add command by command playback of custom scenarios and see them pan out. Also it would be interesting to grab tons of different history records from developers' shells and get common git patterns and play them back with this tool.


In addition to this, I think something that would help me a lot is being able to step back and forward through any history of commands that I'm experimenting with.

Great stuff!


I absolutely love D3, it's what we've been seeking for a very long time and finally a decent product was created. Prepare to see this blow up and take over the web soon.


D3 already underlies quite a few popular visualization libraries. While you can use D3 directly, it's kind of a toolkit for creating visualizations, so it's often hiding 'behind the scenes' in libraries you already know and love.


Very nice, thanks for making this. I tried crazy workflows and the history tree had multiple levels. If possible, could you add scrolling to the visualization viewport?


Can't wait to see this working as a plug-in for Atom (github's new text editor). Since Atom plugins are written like browser extensions (in javascript)...


Nice work!

This is a great complement to the codeschool tutorial [1]. It would be nice to similarly offer the feature that clicking on the command will auto-populate the virtual terminal. New git users might be more likely to use a client-side GUI where they are clicking buttons rather than typing in the terminal.

[1] http://try.github.io/


Oh thanks, never knew that one existed. :)


This is a great teaching aid, hits the mark.

Further, if this style of material were applied to crypto (gpg), more people might be comfortable using it.


This is fantastic. It's amazing how the right graphic can really clarify a somewhat tricky concept like merging.


This just taught me more than any git tutorial I've read. What a great visualization!


This is probably my favorite post ever on HN. Very enlightening. And, like lots of good things.... it's simple.

As others have mentioned, it would be nice to scroll down the diagram.

But, I guess if this were really wanted, git clone would be the first step:)


this is pretty cool and I have been meaning to do sort of this for some time now. I even registered a domain for it: simgity.com (a simulator for git).

Happy to "donate" it if the author is reading


This is a great visualization, this should definitely be taught alongside any existing Git tutorials. I just wish the canvas would allow scrolling, or resize automatically.


I thought the new comments rule would ban the whiners, nitpickers and nay-sayers, but it seems all just like it was before. :O Kudos to the developer, who made this tool.


I see this as a great tool for teaching the basics of git. Definitely saving this for the next time I have to explain why branching is useful and how it works. Great tool.


constructive criticism: I'm a git user, but, if you need D3 graphics to explain a svc... can we at least accept that git concepts are not intuitive?


We switched over to git from svn recently at work and the amount of confusion and accidents is crazy. I've been using git (mostly solo to two coworkers) for years, but there are still issues I run into where I'm stumped. I look things up all the time.

You kind of need git training in cases like this. Unfortunately, we didn't have any.


May I ask what it is about git concepts that you find un-intuitive? To me personally, git makes a lot of sense conceptually, although the use cases of some commands can be a little confusing at first (e.g. `git checkout` for both switching branches and undoing changes to a dirty working copy).


Oh, absolutely. I have yet to meet someone who didn't take several months to feel comfortable with it.


I thought this was pretty awesome :) Would be great if it were fleshed out a little more with more complex examples. Kudos!


Either something wasn't working for me, or the 'branch' function is completely over my head.


funny how the git push one doesn't work until you git pull, but it doesn't tell you you have to do that, it just assumes you knew.


Can anybody suggest good D3 books?


I'm using Interactive Data Visualization for the Web by Scott Murray

http://shop.oreilly.com/product/0636920026938.do

You can read it for free at O'Reilly's Chimera labs

http://chimera.labs.oreilly.com/books/1230000000345


This is great, thanks.



