How to teach Git (rachelcarmena.github.io)
544 points by amanzi 3 months ago | 266 comments



And here is something to take the garbage quality of Git manpages with some humor

https://git-man-page-generator.lokaltog.net/

"git-eliminate-head eliminates all downstream heads for a few forward-ported non-counted downstream indices, and you must log a few histories and run git-pioneer-object --pose-file instead. [...]"


This is a kind of very specific, targeted, niche comedy, excellently executed. I only use Git casually, and I can't tell how much of that text couldn't make sense in an actual doc; it all looked real enough at a glance.

Thank you for sharing, this made my week.


This is funny, but in reality it's not.

Git is one of the most amazing, powerful tools ever conceived, with one of the most byzantine and ridiculously designed 'interfaces' ever conceived.

People confuse the raw power of a tech with how well it can feasibly be used. Sadly, due to the latter issue, git will only ever be a shadow of what it could have been.

With all due respect to Linus, who'd be the first to admit he's not very good at UI stuff (I mean command line as well)... it's truly a sad thing.

This is a major 'problem that needs to be solved', and I'm interested to see how it could evolve into something 'better'.


I don't think the complexity of git's command options is a UI problem. It results from the basis of its operation. We could change some names, add or remove some concepts to how some of the operations are performed, but there are simply a large number of actions to handle many edge cases.

A better solution for prose was to always be merging with live multiple collaborator updates. Conflicts are visible in real-time. I can't see how something like this would work with code. Hmm interesting... unless we only allow additions and refactorings to working checkpoints.


Yes, I don't really understand all the people trying to "fix git". Git's fine, though the complexity makes it challenging, especially for new users. However the complexity is a direct result of useful features. "Keeping it simple" is great, except when the complexity is needed. I'm hard pressed to name any features I could do without.


There are two problems: one of inherent complexity, one of UI.

First - the UI is a mess, and that should have been fixed. It would make a big difference.

Second - is the inherent complexity. That's a good point, but I feel many things could be hidden or obfuscated.

Most poignantly, Git does something that most of us do not need: it was designed to work as a 'completely distributed system', i.e. for open source.

Almost none of us do that. 95% of use cases involve you and me working collaboratively on a project together.

The need to have repos which are essentially totally distinct from one another is a huge source of complexity and it simply doesn't need to exist in most cases.

So Git is basically an 'admin level tool' that is commonly used in scenarios for which it wasn't meant to be used, with a confusing interface.

It's costing a lot of time and money and headaches, I do believe someone may come along eventually and fix it.

This thread is essentially evidence of this - see how many people have difficulty teaching what should essentially be a simple thing in most cases.

Way too many very smart people still spend too much time clustering around in git.


Ever used a centralized VCS, such as SVN? That seems to be what you want? A distributed VCS is extremely useful, for mine and many others' uses, however.


>I'm hard pressed to name any features I could do without.

Nobody is complaining about there being too many features. People are complaining about the arcane incantations that one needs to conjure to call them.


Good point. I wouldn't say nobody though. There are people out there that think there are too many commands. I've even seen academic papers that claim the staging area is problematic. I love the staging area, and don't think it's terribly confusing; there's always commit -a if you don't want to use it. It does lead to some confusion, but it's worth it for the added features.

That's just one example, but there definitely are people that think git's too featureful. As for more valid criticisms, I'd agree. I've heard the CLI compared to being in an abusive relationship. All that said, I can't really think of a better way to handle things without losing useful functions. In which case, I don't have any better ideas, and don't really know what I'm criticizing.


Impressively small generator data too, given how realistic it reads: https://github.com/Lokaltog/baba-grammar-git-man-page-genera...

but maybe that's due to the low bar that it's being compared to.


Please note[1] that:

> To rev-parse an automatic FLOUNDER_LOG or diff the working subtrees, use the command git-link-submodule --retrieve-wrestle-change.

I often overlook this detail when trying to hulk smash some broken change I've made upstream (what the manual entry correctly refers to as RIP_OTHER_TIP).

[1] https://git-man-page-generator.lokaltog.net/#81394c8bf3806f9...


Brilliant, absolutely had me rolling. Most cleverly-executed satire I've seen in a long while.

"To parse a staged SKIRT_SUBTREE and blame the working histories, use the command git-purchase-pack --snuggle-muster-branch, as after reapplying subtrees..."


Thank you for this, it made my day. :)


Am I the only one who's getting an error on Firefox?

    Content Security Policy: The page’s settings blocked the loading of a resource at self (“script-src”).


Good to see tax accountants contributing to open source.


Love it. I'm sold on git but it often feels like I'm in one of those man pages.


In my experience, it gets better. After a while you know everything you need to not only do your job, but to understand and resolve common issues. After a couple of years you'll be at the point where the only time you have to dig through manpages is when you're trying to do something esoteric or clever.


That's hilarious


Nothing short of sublime. Thanks for sharing.


OMG THIS IS THE REALEST


That's fantastic!


oh my sides.


Here is my personal recommendation for getting more comfortable with git. Use "git status" a lot. Every time you do something in git, and before you do something, do a "git status" and see what you changed with your commands. And what you didn't change.

Also "git log".


`git log` on its own (with no flags) isn't that useful, as it's missing a lot of important information. I prefer `git config --global alias.lg "log --color --graph --oneline --decorate"`. Then you can just type `git lg` and get a much more useful overview of the state of your current branch.


Git on many occasions (including when using git log) feels like the designer just threw their hands up and said:

"F*ck this shit, make your own UI on top if you want to use this tool!"

I don't think Torvalds has a proper excuse for the pain and suffering he's inflicted on millions of developers worldwide :(

(To the people going: "Oh, it's Open Source!". Sure, so are Mercurial, Fossil, etc.)


This makes a lot more sense when you realise that Linus thought he was building a low level tool that people would build a UI on top of. The 'original' git command line was more a proof of concept and engineering tool than something aimed at actual use.


Right, doing a search through the git mailing list for the use of the word "porcelain" is fascinating. It's unfortunate some of those "porcelain" projects never finished.


... because none of the power users used them.


Heh. I think Linux on the desktop is like that, at least a bit; frequently, by the time someone could write a friendly GUI (or friendly CLI for that matter), they have little use for it.


That's a disturbingly insightful parable for everything from startups to linux/FOSS contributions.


I don't think it was ever really intended for worldwide use by Linus, it was intended to exactly replicate his workflow and his alone. The real responsible party is github; I'm not sure how they came to dominate?


By being much, much, much better than the competition at the time (sourceforge) - github was faster, without ads, used superior vcs. Github also had Octopuss/Octocat as a logo :-) .


Sure, but why github rather than mercurialhub etc? The website has great usability but the git CLI does not. Maybe the github gui was also critical.


Yes, the gui was good (and it still is, or maybe I am too familiar with it at this point, I do not know).

Another factor is that git was always FAST, and the most common operations (init, log, status, commit, checkout, push, pull) are not as complicated as people make them out to be, so you can very easily start to use it for new local projects, and continue without installing additional plugins, unlike mercurial ... And at some point, when you wanted a collaboration hub or just a public remote repo, you just pushed your code to github.


> Git [...] feels like the designer [...] said "F*ck this shit, make your own UI on top if you want to use this tool!"

Actually this is exactly what happened back then, which is why many people used https://en.wikipedia.org/wiki/Cogito_(software) as a frontend to git in the early days. That said, I personally find current git UX absolutely fine and I can't imagine being effective without having all those commands like git reset --whatever -p and git rebase --interactive.


What an absurd statement: "Torvalds has a proper excuse for the pain and suffering he's inflicted on millions of developers worldwide :(" !

Who are you exactly to judge people like that, oh failed incarnation of a tibetan Lama ???

He does not need an excuse to make a tool that has proven very useful to many (including me). In fact, I am grateful, that he chose to make it and publish it.


It definitely feels like it was written by someone used to writing an ABI or library rather than a set of command line tools.

Certain tools are organized by what code goes together rather than by what user functions make sense to go together. It's part of what makes it confusing.


    git config --global alias.lg "log --color --graph --oneline --decorate"
for easy copy-pasting. HN doesn't support backticks or triple backticks like a lot of markdown parsers. To get fixed-width fonts, you have to insert 4 spaces at the beginning of the 'code' block.


Two spaces (unlike Markdown).

Of course four will work, it just adds more indentation.


In some mobile browsers, doing this will result in the fixed-width font line getting cut off when viewing the resulting post.


I got into the habit of teaching people `gitk`. It's not the prettiest tool, but it's included with most distributions of git, defaults to a decorated color graph log, and thus avoids the "copy paste this weird config line" step on other people's machines.


I have a soft spot for Tk interfaces. I hope they become cool again (or, I mean, for the first time).


+1 these, I wish they were default for git. Much better both for beginners and experienced users.


I just use a GUI for that. I don't like using applications like Sourcetree for any actions, but they really are superior for visualisation of the commit tree, diffs between non-adjacent commits or between branches, etc.


I'd recommend Git Extensions, still, as a good compromise between porcelain and plumbing. The most useful commands are all at your fingertips, with deeper ones available if necessary. Visualization is the best I've seen - clean graph display, easy to read, and _doesn't lie in complex cases like SourceTree does._

SourceTree tends to treat commits with multiple ancestry in a very weird way that has led to difficulty on multiple occasions, where people think changes 'went missing' because SourceTree decided the other ancestor wasn't important enough to show.


I don't use Windows. Seems a little clunky to get it to work on Linux, even more so on the Mac. Haven't had any multiple-ancestry issues with SourceTree - perhaps resolved in an update since you encountered it?


Zsh users can use glgga to get a nice full detailed graph log as well. It's not as concise as the custom config, but I use it quite often.


Classic git CLI. The option everyone wants is behind 5 optional flags.


`git log -p` is my favorite obscure git command. Shows you commit by commit changes. If you specify a path, it limits to only those files. If you do a single file you can do `git log -p --follow <file_path>` and it will track the file across moves and renames.

Also `git whatchanged` is a super helpful command to see just the list of files that changed in each commit
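
For anyone who wants to see what `--follow` buys you, here is a minimal sketch in a throwaway repo. The file names, commit messages and identity settings are made up for illustration; only the git commands themselves come from the comment above.

```shell
set -e
# Throwaway repository so nothing real is touched
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Create a file, rename it, then edit it under the new name
printf 'line one\n' > old_name.txt
git add old_name.txt
git commit -q -m "Add file"

git mv old_name.txt new_name.txt
git commit -q -m "Rename file"

printf 'line two\n' >> new_name.txt
git commit -q -am "Extend file"

# Without --follow the history stops at the rename;
# with --follow it continues into the old name
plain=$(git log --oneline -- new_name.txt | wc -l)
followed=$(git log --oneline --follow -- new_name.txt | wc -l)
echo "without --follow: $plain commits, with --follow: $followed commits"

# Full patch view of the file's history across the rename
git --no-pager log -p --follow -- new_name.txt
```

The count without `--follow` leaves out the commit made before the rename, which is usually the surprise that sends people looking for the flag.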


`git add -p` is also very, very useful. I use it all the time, to the point that I've aliased it to `a`.

If you're interested, here's my .gitconfig, including all of my aliases: https://gitlab.com/lyndsysimon/dotfiles/blob/master/git/gitc...


Completely agree. I think a lot of people would enjoy git more if they made better use of the `-p` options.


Do you know about `git commit -p`? You might like it even more than `git add -p`.


I didn't know "whatchanged", thanks! However, look at the description: https://git-scm.com/docs/git-whatchanged


Yea. It's technically deprecated, but it is also super useful so I don't see any reason not to use it.


Yikes, do people consider `git log -p` obscure? I can't live without it!


here's my git log alias: alias gl='git log --date=local --pretty=format:"%C(124)%ad %C(24)%h %C(34)%an %C(252)%s%C(178)%d" --stat -p'

has nice color coding around the date, person committing, commit hash, and message.


Better yet, alias "git status" and "git log" so you'll be more inclined to use them frequently.
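
One possible reading of this, sketched with git's own alias mechanism rather than shell aliases. The alias names `st` and `lg` are just examples, and the config is written to a scratch repo here instead of `--global` so the sketch has no side effects.

```shell
set -e
# Scratch repository; use --global instead of repo-local config for real use
repo=$(mktemp -d)
cd "$repo"
git init -q

# Short names for the two commands you run most
git config alias.st "status --short --branch"
git config alias.lg "log --oneline --graph --decorate"

# Show what was configured
st_alias=$(git config alias.st)
lg_alias=$(git config alias.lg)
echo "git st -> git $st_alias"
echo "git lg -> git $lg_alias"

# The alias is now usable like any other subcommand
echo "scratch" > file.txt
git st
```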


I have taught git for university classes for some years. To be honest, git is a mess. It is conceptually not that hard, but the nomenclature is inconsistent and dangerously ambiguous (quick, what is the difference between reset, rebase, revert, and checkout?).

The most effective work flow I have found so far, is teaching only status/clone/pull/add/commit/push. Show them explicitly what happens normally, what happens when two changes conflict, and how to resolve merge conflicts. Using git on the command line only.

Then, have the students use git for a big-ish multi-student project. They will figure out the workflow themselves. After that project, once they understand the basics, you can talk about branches, rebasing, the log and reflog, pull requests, and all the rest. Don't introduce graphical front ends before this point.

This method works well. It takes about one hour of teaching, and five weeks of active use afterwards. Git is a total pain to learn, and can only be understood by actively using it. I often get very positive feedback for having taught git.

I have gone through a few iterations with this topic, and have found that stripping down the initial instructions to an absolute minimum works best. All those fancy box diagrams are actively harmful to beginners.


I found that running workshops was the most effective way to teach git. I wrote a set of tools against the github api to spin-up and manage a large number of repositories and students in a way that would result in merge conflicts (and doesn't require knowledge of any programming language).

https://github.com/probinso/ABC


I don't know if this is true. For me personally I have had to use git for a lot of different projects, but I still don't understand anything besides commit, push, pull, add, and force-push (lol). In my experience you can get away with just learning those commands, but I still don't feel comfortable with git.


Does anyone really feel comfortable with git? I've been using it for over a decade, in a handful of different organizations, on projects large and small, both at work and for personal projects, and it still feels like a convoluted mess. Its terminology is wildly inconsistent and my mental model of its behavior remains stubbornly full of fog. I have a library of recipes committed to memory, but I still don't really understand it, and sorting out what's gone wrong when things inevitably go wrong remains challenging.


An honest criticism:

If I had trouble with understanding the reason behind add->commit->push workflow, I would definitely have no idea what this article talks about when it says things like "merge, rebase, diamond shape". The flow chart looks almost exactly the same for "pull" and "pull --rebase". The only difference between the charts is the wording which has no meaning at all for a newbie.


It sounds stupid but something that really simplified git to me as an absolute beginner about a year ago was the fact that it doesn't exist on GitHub. Knowing that git was just something that sat in the folder on your filesystem and monitored changes took away the notion that there is some kind of sync between your remote and local. Working just on my own laptop made me realise how intuitive all the commands and strategies for solving problems were. Then came pushing, pulling etc between branches and it all just fell into place.


I wouldn't say it monitors the directory, because there is no git daemon running. Git just looks at the changes whenever you run a git command


This is an article, not for people who don't understand git, but for people who do understand git and want to explain it to others.


Right, thanks!!


> The flow chart looks almost exactly the same for "pull" and "pull --rebase".

"pull" and "pull --rebase" can cause two kinds of conflicts:

1. Merge or rebase conflict inside the repository.

2. The would-be merged or rebased HEAD conflicting with the dirty working directory.

The article demonstrates the latter and it's the less important one as it's avoidable by pulling from a clean working directory or pulling a non-HEAD branch.


I would suggest making sure to explain why the staging step exists: to create coherent commits. Then go into "why distributed version control."

How to write a Good Commit Message (https://chris.beams.io/posts/git-commit/).


Thanks for the feedback! Right, this is only a help to have a mental map. Only some initial drawings for a mentor or trainer, not for the newbie.


If so, I'd love to see a more elaborate, guide-like version of this model.

I've tried a similar approach years ago for teaching and failed spectacularly. Eventually, my peers became comfortable when they got used to the Github Desktop Client. They compared the buttons they click with my terminal commands. We also compared our graph views on Github website to visualize the logic.

It's been years and still none of them used rebase even a single time. A sad story in my teaching non-career. :(


I drew those diagrams after reading Pro Git Book. I missed what appeared at my head, but the entire book is very useful.

For me, command line is freedom. GUIs are very limited.

Don't give up! From my humble experience:

- Try to know your peers, how they work, what difficulties they face when using command line, ...

- "I'm lost" > "git status"

- "How did we solve...?" > "git log"

- "This command is difficult to remember." or "This command makes no sense, I prefer this another name for that action" > "git alias"


This is nice, but I'd like a 201-level handholding on git. I've been using it for 5 years and I'm still just a clone/commit/merge/(bang head)/push user yet I know there is tons more it can do that would probably make me more effective. (I'd also like to switch my team off SVN. Someday....)


My recommendations:

* Git From the Bottom Up: https://jwiegley.github.io/git-from-the-bottom-up/ (PDF: http://ftp.newartisans.com/pub/git.from.bottom.up.pdf) (See also: "Linus Torvalds' greatest invention": http://perl.plover.com/yak/git/) Once your mental model matches the program, you'll be able to understand everything, hack your own solutions if necessary, etc.

* Learn Git Branching: https://learngitbranching.js.org -- in fact I think this should be the UI for Git; wish someone would make that happen (i.e., so that you can point it to any Git repo and get that sort of visualization for it).


Thanks! Awesome resources! I'll add it to my post as soon as I finish reading all the answers. I'm learning a lot from you!



SublimeMerge is also a wonderful tool. Been using that since it came out and has helped tremendously with difficult merges. It's lightning quick as well.

Similar to magit - Sublime Text also has SublimeGit, not as fast as SublimeMerge.

Visual Studio Code built in git functionality is also nice.


A mention for Tower. I only do the basics but it totally took away my Git fear.

https://www.git-tower.com/mac


gitKraken is nice too

https://www.gitkraken.com/


I think once you understand the graph, then looking at what the graph looks like and what you want it to look like usually leads you to being able to figure out what operation you want.

And then git fetch/push/pull are mostly about copying parts of that graph from one place to another.

After that, it's mostly a question of what workflow you want and that becomes much more than a git question because git works for a variety of workflows, and many of the features of git are really only relevant for certain flows.


100% agree. Once you know what you want the tree to look like it's usually just a matter of making a branch/tag (in case you mess up, and need to start over) and not being afraid of reset, even with the --hard flag.
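
A rough sketch of that "bookmark first, then reset without fear" routine, run in a temporary repo. The branch name `backup`, the file name and the commit messages are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Build a small history of three commits
for n in 1 2 3; do
  echo "change $n" >> log.txt
  git add log.txt
  git commit -q -m "Commit $n"
done

# Safety net: a branch pointing at the current tip ("Commit 3")
git branch backup

# Reshape the history: move the current branch back two commits
git reset -q --hard HEAD~2
after_reset=$(git log -1 --format=%s)

# Messed up? Jump straight back to the bookmark
git reset -q --hard backup
after_restore=$(git log -1 --format=%s)

echo "after reset: $after_reset, after restore: $after_restore"
```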


Use a GUI. Git repos are graphs and trying to understand a graph from the command line is like trying to paint over the phone.

I recommend GitX (FOSS, Mac only, a little buggy but has the most logical UI and lets you easily amend commits), SourceTree (free, quite slow, Windows/Mac), Tower (paid, cross platform) or SublimeMerge (paid, easily the fastest, cross platform).


If you're an emacs user (and maybe even if you're not), try the magit package.

That taught me in a few weeks of usage more git 102 and 201 type functionality than a year of using only the basics did.


As a new emacs / spacemacs user, I bounced off magit hard. Too many weird things going on, hard to tell what I can do.

A couple months later, and a bit more experience with how emacs structures things, and I've returned to Magit and _love_ it. It just feels natural to hop around in for standard tasks. Certainly a good replacement for other porcelain UIs I'd normally use.


I use Spacemacs ONLY to have access to Magit. And I've successfully converted many non-emacs non-vim folks to use it, all of them love it.


This is so so fitting, and describes me to a "T" as well.

I'd add oh-crap-I-screwed-up-so-let's-clone-the-repo-and-start-over to the list.


Alternative to nuking from orbit is taking a look at "Flight rules for git": https://github.com/k88hudson/git-flight-rules


Awesome! I'll add it to my post! Thanks!


What can you screw up that requires re-cloning the repo rather than just a reset --hard?


Not knowing about the "reset --hard" feature? :)


For one, accidentally deleting the .git directory.


Honestly, you just need to try some things. Your reflog is your friend. You'll get the hang of reset/rebase/etc. with a little experience. I also think it's worth it to learn a bit more about diff and log.

I have no idea why anyone would want to go back to SVN. If you don't want to use all the features, I understand; by all means, continue to only ever branch -> commit -> push -> create pull request. There's no need to subject the whole team to version control that doesn't work. That's what sold me on git; it's the only version control system that never failed at its job of letting me track my changes. It doesn't break its promise to always be there for me.

SVN, centralized systems, and even mercurial to some extent, prevent users from tracking their changes. This leads to questionable workflows (lots of copying directories to "save things"), or even worse, developers just don't track their changes at times. It sounds weird that I have to say this, but I feel that version control systems should be available to track changes 100% of the time. It seems like many of the people who dislike git, don't see the value in this, which I find absolutely baffling. Git means never being scared to create a commit.


"and even mercurial to some extent, prevent users from tracking their changes"

False. Stop spreading misinformation about Mercurial.

"Git means never being scared to create a commit."

This is indeed one of the biggest benefits of DVCS tools, including git and mercurial.


Thanks for the feedback. Quite frankly, I never found a comparable workflow in Mercurial (but that doesn't mean one doesn't exist). Commits still felt like a permanent thing that made me pause and think about whether something was "worthy" of a commit, thus I was unable to track my changes. This problem just doesn't exist in Git, though some care is needed when publishing (as w/any system).

I'm unaware of anything within base Mercurial that lets me do what I can do with git. Maybe the answer is to use the extensions, however that's a bit unsettling when some of them are being deprecated (e.g. hg queues). Oftentimes I found myself cloning and maintaining multiple directories for something that was better off as one thing. This led to reluctance to track changes, which is clearly evil. I guess I must have been doing something wrong, but I've heard similar feedback from others regarding Mercurial. I loved using hg and would have no problem using it (it was my first); I just prefer git now.


I agree with sjburt when he says "I think once you understand the graph, then looking at what the graph looks like and what you want it to look like usually leads you to being able to figure out what operation you want." When I was first getting in to Git ~10 years ago, I remember that I found the PeepCode "Git Internals" book[1] very helpful for getting that understanding of the graph.

Once you have the better understanding of the graph, it's hard to find resources on how to improve from there; most resources focus on beginner stuff, or function more as a technical reference without really talking about use-cases. I've found following Mark Dominus' blog[2] for his posts about Git to be the single best thing to "level up" my Git usage once already being at a high-level.

[1]: At the time, I had to pay money for the eBook, now they have the whole thing on GitHub: https://github.com/pluralsight/git-internals-pdf

[2]: https://blog.plover.com/


I took it upon myself to introduce Git to my colleagues: statisticians who've never used version control. I tried to preempt hard questions by going into a repo, wrecking it like a mad bull, and trying to undo the damage with Git. Turns out, it was a great way to learn for myself.

Exercises:

- Commit a "secret" and involve it in other branches, commits, merges, etc. Then remove the secret so that nobody can ever learn it from the repo.

- Clone a repo from another local repo and then rewrite history with rebase and revert. Then commit different work in each repo. What's the least painful way to get them "compatible" without losing any work?


Humbly, I'd disagree that's the best, though this is a superb _part_ of the picture to teach Git well. These are nearly the last steps, I would say.

When new colleagues joined our firm and hadn't yet learned Git, the problems were always the same: uncertainty. They didn't know which Git operations were safe, and they didn't understand how to perform seemingly risky maneuvers with zero risk. They're used to even more dangerous tools that can wipe your work in a second - and, to be fair, Git can as well. The difference is that once you understand Git, you never have to worry about losing work.

So the way I would teach Git is to honestly start with the graph. Show it in action with pictures. Show how to always keep references to commits around to ensure work sticks around. Show how branching and stashing work, let them be confident that the tool will keep everything right where you left it.

_Then_, once they're confident in the basics, weave in the remote repositories.


> So the way I would teach Git is to honestly start with the graph. Show it in action with pictures. Show how to always keep references to commits around to ensure work sticks around. Show how branching and stashing work, let them be confident that the tool will keep everything right where you left it.

Personally, I think this should be coupled with teaching `git reflog` as the universal undo (as long as they don't `gc`).
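
To make the "universal undo" idea concrete, here is a small sketch in a scratch repo, assuming default settings (the reflog is on by default in non-bare repositories). File names and messages are invented for the example.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "first" > notes.txt
git add notes.txt
git commit -q -m "First commit"

echo "second" >> notes.txt
git commit -q -am "Second commit"
lost=$(git rev-parse HEAD)

# "Accidentally" throw away the second commit
git reset -q --hard HEAD~1

# The reflog still remembers where HEAD used to point
git --no-pager reflog

# HEAD@{1} is where HEAD was before the reset, so this undoes it
git reset -q --hard 'HEAD@{1}'
recovered=$(git rev-parse HEAD)

echo "lost:      $lost"
echo "recovered: $recovered"
```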


Don't teach anybody git gc. The people who really need to find it will come upon it all on their own.


Having taught git several times within a data science course I find two concepts especially worth extra time: WHY there is a staging area, and what is the difference between “git” and “github”.


> WHY there is a staging area

I understand your second point, but I have a hard time understanding the difficulty with this part. Why is it hard for people to understand the idea of staging?

You put things in a box one at a time before closing the box. Does it require more explanation than that? What do people find difficult about it?


People are very used to the web "save always" style: There is one document, and you're editing it. Most people will be familiar with the traditional desktop "save" model where you have to do something to make your changes permanent.

People often then learn that there is a local file and some remote file: they can cope with a save -> upload workflow. Lots of traditional VCS turn this into a save -> commit workflow.

Git adds two stages to this that people can't see the need for without understanding the internals: an extra step between save and commit, and an extra step after commit.

(The discussion reminds me of all those people who think that if they just start by talking about monads then people will find Haskell easy and natural...)


There are a whole bunch of layers now, though they're all useful.

1. Is my document saved?

2. Are the changes staged?

3. Are the changes committed?

4. Are the changes pushed to my fork on e.g. github?

5. Are the changes merged into the upstream repository on e.g. github?
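
The first four layers can be watched one at a time with `git status`. This is only a sketch: a local bare repository stands in for the fork on a hosting service, the paths and file names are made up, and layer 5 (merged upstream) is left out because it happens on the hosting side.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/fork.git"   # stand-in for "my fork on e.g. github"
git init -q "$work/local"
cd "$work/local"
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin "$work/fork.git"

# 1. Saved: the file exists on disk, git only sees it as untracked
echo "hello" > doc.txt
saved=$(git status --short)

# 2. Staged: the change is in the index
git add doc.txt
staged=$(git status --short)

# 3. Committed: nothing left to report locally
git commit -q -m "Add doc"
committed=$(git status --short)

# 4. Pushed: the fork now has the same commit
git push -q origin HEAD
branch=$(git symbolic-ref --short HEAD)
pushed=$(git --git-dir="$work/fork.git" rev-parse "refs/heads/$branch")

echo "saved:     '$saved'"
echo "staged:    '$staged'"
echo "committed: '$committed'"
echo "pushed:    $pushed"
```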


They don't need to understand the internals for this: just knowing that every save you do will be stored forever as-is makes you think twice about what you put inside.


So I have a solid mental model of git, and I understand the theoretical need for the staging area.

However, I find the occasions for using the staging area in practice are few and far between, for the simple reason that I can't test and execute the code that's in the staging area without also having the code from the working directory also be there. It feels like after having partially staged some of my working directory, it would be a blind commit with no guarantee that things are working.

Very rare is the situation that I can break out a list of files over here that are for feature A and some over there for feature B, and never the two shall interact.

I think this is probably what most struggle with regarding the staging area, without being able to articulate it.


I use it quite a lot, especially with `git add -p` to stage only parts of a file for an atomic commit.


I second this. It wasn't until I adopted this practice that the staging area really made sense to me. I find it helpful not just for making atomic commits, but as a way of remembering what I was actually doing, so that I can write a good commit statement.


This has never made sense to me. I've seen others say that they commit only parts of a file. How does this scenario start? Are you working on solving one problem, but then notice some other unrelated issue and fix that too, before committing the first change?


Partly, yes. Or, I'll be working on a task overall, and have to touch multiple files in the process. Then when I'm ready to commit, I review all the modified files on disk, and look for ways to break those down into smaller discrete logical changes. I prefer to avoid "big bang" commits as much as possible, because smaller individual commits are easier to inspect, easier to back out if necessary, and provide a better "story" when inspecting a file's history sometime down the road.


But then, you either never run/tested those smaller individual commits, or you have to do extra work (stash changes, test, restore stash) to do that.

I do not see why a source control system should make it easier to make a commit that hasn’t ever existed on disk and thus cannot have been tested.

I think the better model would be to stash your changes and have a diff editor between the on-disk working copy and the stashed version that allows you to commit a set of changes as several smaller, more coherent commits.

That wouldn’t guarantee that each of those intermediate commits gets tested or even built, but it would guarantee that each smaller commit is in the on-disk copy at some time.


> But then, you either never run/tested those smaller individual commits

Not necessarily. One nice option that the git rebase command has is --exec (which can be specified multiple times). So you can run a rebase and have git execute a command (like running a test suite) for each commit in the branch. If any commit fails, the rebase process will stop and let you amend the commit to fix the issue.
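To make the `--exec` idea concrete without touching a real repository, here is a toy Python model (not Git itself). Each commit is represented as a dict of file changes, the changes are replayed oldest-first, and a check runs after every step. The names `replay_with_check`, `Tree`, and `helper_is_defined_before_use` are made up for illustration; a real run would use `git rebase --exec` with your own test command.

```python
from typing import Callable, Dict, List, Optional

Tree = Dict[str, str]  # path -> file content


def replay_with_check(
    base: Tree, patches: List[Tree], check: Callable[[Tree], bool]
) -> Optional[int]:
    """Apply patches oldest-first and run `check` on every intermediate tree.

    Returns the index of the first patch whose resulting tree fails the
    check, or None if every intermediate state passes.
    """
    tree = dict(base)  # copy so the caller's base is left untouched
    for index, patch in enumerate(patches):
        tree.update(patch)  # toy "commit": overwrite or add files
        if not check(tree):
            return index  # stop here, like a rebase pausing for a fix
    return None


# Hypothetical check: app.py may only call helper() once lib.py defines it.
def helper_is_defined_before_use(tree: Tree) -> bool:
    uses = "helper()" in tree.get("app.py", "")
    defines = "def helper" in tree.get("lib.py", "")
    return defines or not uses
```

Committing the call site before the definition produces the same final tree as the other order, but the first intermediate state is broken. That intermediate state is exactly what a per-commit check is meant to catch.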

> or you have to do extra work (stash changes, test, restore stash) to do that.

I've found that it's easier to write and locally test a given feature and then incrementally stage parts of it and create commits before pushing the code up for review. To me, that's easier than just making a large commit and then trying to split it out into a better set of commits after the fact.

For example, I may write a new method and then call it several places in the code. So my first commit would be to add the new method along with its unit tests and my second commit would be to add calls to it in the code base and update the associated integration tests (if necessary).


Did not know about rebase with exec! I'll have to try that! Thank you for the insight.


One common scenario is that I'm working on one problem, and in the process of solving that issue do some refactoring of related code. In this case, I want to commit the refactoring (which does not change the program's behaviour) before committing the changes that do change the program's behaviour.


I typically then send that first refactoring commit to Github (on its own branch) so that it gets full CI test coverage. And then continue working on the fix/feature while it runs.


One use case is to exclude extra lines of the file you don't want to commit. For example, I might have some debug print statements in my file that I want to keep in my local copy of the file while testing, but I don't want to include in the commit I push up for review.


> Are you working on solving one problem, but then notice some other unrelated issue and fix that too, before committing the first change?

Almost. Most often it's:

- Working on solving problem A
- Notice problem B
- Start to solve problem B
- Notice I'm getting distracted from A, and return to finish it.
- Want to commit my fix for A, but don't want to lose or forget the partial work on B.

Two different approaches I might take in this situation, depending on whether B is related to A.

1. If they are related (eg, B depends on A), use `add --patch` to commit A, then finish and commit B.
2. If unrelated, use `git stash --patch` to stash B, then commit A, then switch to a different branch to finish B.

Honestly, I see the point of both stash and staging, but not both together. Too many tools for the same job. On my long list of projects to do is a git porcelain that combines some of these concepts (eg, stash and working directory which would be tied to a branch):

- Each branch would have a single stash.
- When you check out a new branch, all uncommitted changes are automatically stashed.
- If the branch you're switching to has anything stashed, that stash gets popped.
- Any current workflow that involves stashing can be replicated by using a branch instead of a stash.

This way, branches can be thought of as "state of the working directory", which is more intuitive with the branching tree model, imo; commits are a snapshot of the repo at that point in time; and the staging area is just a way to choose what should be included in those commits.
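The per-branch stash rules above are simple enough to sketch as a toy model. This is only an illustration of the proposed behaviour, not an existing Git porcelain; `ToyRepo` and its methods are hypothetical names, and "uncommitted changes" are modelled as a plain dict of path to content.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyRepo:
    """Hypothetical model: one stash slot per branch, swapped on checkout."""

    current: str = "main"
    # Uncommitted changes in the working directory: path -> content.
    working: Dict[str, str] = field(default_factory=dict)
    # One stash per branch: branch name -> stashed uncommitted changes.
    stashes: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def edit(self, path: str, content: str) -> None:
        self.working[path] = content

    def checkout(self, branch: str) -> None:
        # Automatically stash whatever is uncommitted on the branch we leave.
        if self.working:
            self.stashes[self.current] = self.working
        # Pop the target branch's stash if it has one, else start clean.
        self.working = self.stashes.pop(branch, {})
        self.current = branch
```

With this model, switching away from a branch and back restores the half-finished work without any explicit stash command, which is the "branch as state of the working directory" idea.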


Amending the last commit does basically the same thing and records each state in the reflog.


You never amend commits or rebase locally before pushing? I rebase before pushing almost every time.

Git’s workflow wouldn’t even be sane without the staging area. This is what allows you to fix mistakes and make your work presentable for remotes.


> Git’s workflow wouldn’t even be sane without the staging area. This is what allows you to fix mistakes and make your work presentable for remotes.

I did exactly the same diff/tidy/diff workflow when I used p4 and svn, neither of which make a distinction between "working directory" and "staging area".


Right, but p4 & svn have “checkout” which is similar to staging. Staging is part of what we get because we can edit files without having to checkout / open for edit.

P4 and svn don’t have a strict commit parentage, which is why you can push commits in those systems in any order. Git’s strict concept of parentage is what makes the staging area so important for keeping your workflow similar to p4 & svn Workflows. Without a staging area, you’d either have to always fix mistakes with new commits, which is bad, or rewrite already pushed history, which is worse.


> without having to checkout / open for edit.

The terminology is a bit different - unless configured with mandatory locking (essential for some workflows) you don't have to open for edit. You just edit stuff and it goes in the "default changelist", roughly equivalent to automatic staging.

> Without a staging area, you’d either have to always fix mistakes with new commits

Mistakes at what point? In the normal svn workflow you can review with svn diff, then when you're happy do svn commit; it's just that there's no local place you're committing to. In both cases there's a critical point, either "svn commit" or "git push".


> unless configured with mandatory locking ... you don’t have to open for edit.

I’d guess you’re leaning toward talking about svn, which I don’t remember very well, and I am leaning toward talking about p4, which always does mandatory locking.

You’re right the terminology is different between these different systems, I’m just pointing out that the git staging area has what you can think of as some equivalences in the other systems. Or, you can think of it as tradeoffs. Either way, the git staging area is something that helps you pretend like you’re using svn or p4 in the sense that it helps support editing multiple changes at the same time before pushing them to a server.

> Mistakes at what point?

With git I’m referring to mistakes between commit and push. But there’s a philosophical difference here that I glossed over. With git it’s easier to commit early and often than it is with svn or p4. With svn & p4 it’s easier to lose your work because version control doesn’t know anything about it before you push. If I make micro-commits, which I want and I like, then I put more “mistakes” along the way into my local history, and I can use the staging area to clean everything up before I push. With svn & p4, you make those mistakes and do the cleanup without ever telling the version control, and you run a greater risk of losing that work while you do the cleanup.


Never, and can never remember what rebase actually means.

At work I’ll hit the squash option on GitLab's merge request, which moots all local machinations.


judging by the atrocious management of remote history I've seen at workplaces, "making work presentable" is pretty far down the line of priorities


Amending commits and rebasing involve the staging area?


Usually. You can also amend and rebase remote commits, but that’s usually a big no-no.


Committing isn't a commitment. After making the first commit, you can use the `git stash` command to put the rest of your changes aside, and go through the normal test->amend loop until you're happy with that first commit. Then you just retrieve your other changes from the stash to make your second commit.

It's also possible to do this without the stash command, by making both commits right away, and testing them later. However, that would involve rebasing(?) your second commit on top of any changes you end up making to your first commit, so using the stash makes more sense to me personally.


Fwiw, stash can get you into trouble more easily than commit. It’s no more typing to commit or branch, so I recommend preferring those to stash when it makes sense, or when you’re playing with changes you don’t want to lose. Stash is handy for a bunch of things, so use it by all means, just remember that there’s often an equivalent way that is just as easy and much safer.

The git stash man page talks about this: https://git-scm.com/docs/git-stash

“If you mistakenly drop or clear stash entries, they cannot be recovered through the normal safety mechanisms.”

One of the best things about git is how big the safety net is, as long as you tell git about your changes. Almost any mistake can be fixed, so why use features that aren’t sitting over the safety net?


A scenario:

You're adding a feature to your proggie. That involves modifying the main bits to add the feature and, say, adding a couple of interfaces to internal library modules.

Split out the changes to the library modules into separate commits---it's safe because nothing uses them, they're logically separate from the feature changes (although they don't appear to have a justification without the feature), the log will be marginally cleaner, and git bisect will have more granularity.


Why is the staging area needed in such a case ? In more traditional systems, you'd just do, say, "svn commit library/" and then commit the rest. (and you could do just the same in git too without seeing the staging area)


Understanding the staging area first requires understanding the need for it: The need for atomic commits. The need to create commits that have specific changes in them and are not always a snapshot of the entire world below the git root exactly as is right now.


Yes, it requires more explanation than that. I've used git for years, and never really understood why staging is even a thing.

Your example is an implementation of the box-putting algorithm, but it doesn't need to be mirrored in the put-box CLI.

    put-close-box file1 file2
This command could encompass all the putting and closing. Since you only close boxes when you are done putting things in it, I don't see a need or purpose to split it up.

    put-box file1 file2
    close-box
A closed box (commit) is always going to contain stuff that was put in it, so why separate commands?


That's not convenient when you're putting things into the box piecemeal, especially with `git add -p`. A thing I do frequently is to run `git diff`, scan through it, and add files (or parts of files) one by one in a second terminal. Then I do a final review of the staging area (with `git diff --cached`) to make sure it only has the changes I want and commit. I'm the sole devops engineer at my company and my workflow is a bit more scattered than a typical developer's.

Anyway, `git commit file1 file2` by itself is most of the way to being the put-close-box function you want; it just doesn't work for adding/deleting files from the repo. Seems like they could make a lot of people happy by closing that gap and letting `git add` be an intermediate-level feature.


To me, that ought to be a concern of the "porcelain", although no one uses that word anymore. CLI is particularly bad at certain types of interaction. So to compensate, a mitigation is moved into the underlying model of git. That mitigation is staging. The inconvenience of "piecemeal adding" could have easily been addressed in the UI layer using a more suitable presentation, rather than forcing all clients to follow the stage/commit dichotomy.


For simple projects (like ppl experimenting with git) you will always want to save all changes. So why stage first ?


Not everyone stays a beginner forever, and it's nice to have a tool that doesn't play to the lowest common denominator. It's really not that hard to just do a "git commit -a" if you want to avoid staging.


> Not everyone stays a beginner forever

But the vast majority do, or at best become perpetual intermediates (https://blog.codinghorror.com/defending-perpetual-intermedia...).

99% of developers out there didn't need a power tool for source control (source control is already quite a power tool many devs can barely handle, even in SVN form...), yet here we are: Git is imposed everywhere, with its horrible UX.


Git's UX isn't that bad if you're only cloning projects to build them locally and keep them updated. The UX only gets really crufty as you use more and more of the features.


I think people find it difficult because most beginners at git just want to put everything in the box. Having the option to put just some things in the box seems more complicated than needed. Obviously, as you get better with the tool, you realize the power of literally "staging" your changes into multiple commits, but as a beginner, it's not even in your purview.


My hurdle was 15-20yrs of no staging area from previous VCSes so the extra step took some time to understand why it was needed.


Isn't the staging area closer to an intermediary box? That's where it can get confusing.


Staging puts things in the box, commit closes the box, puts it on the pile with the other boxes, and gives you a new empty staging box.
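The box analogy can be written down as a toy model. This follows the analogy rather than Git's actual index (in real Git the index holds a full snapshot and is not emptied after a commit); `ToyBox` and its attribute names are invented for the sketch.

```python
class ToyBox:
    """Toy model of the box analogy: add() fills the box, commit() closes it."""

    def __init__(self):
        self.working = {}  # files on disk: path -> content
        self.staged = {}   # the open box
        self.commits = []  # pile of closed boxes: (message, snapshot)

    def add(self, *paths):
        # Put the current on-disk version of each path into the open box.
        for path in paths:
            self.staged[path] = self.working[path]

    def commit(self, message):
        # A commit is the previous snapshot plus whatever is in the box.
        snapshot = dict(self.commits[-1][1]) if self.commits else {}
        snapshot.update(self.staged)
        self.commits.append((message, snapshot))
        self.staged = {}  # hand out a new, empty box


box = ToyBox()
box.working = {"lib.py": "v1", "app.py": "v1", "debug.log": "noise"}
box.add("lib.py")
box.commit("add library hook")
box.add("app.py")
box.commit("use hook in app")
```

Here the working directory is split into two commits, and `debug.log` never ends up in either because it was never put in the box.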


But why is it an extra step? It's basically just a "longterm" selection of what you want to commit.


Because you don't always want to put everything in the box (and if you do, there's a shortcut to do it), and "git commit file1 folder/folder/*.cpp folder/folder/*.h ..." for a complex set would be annoying and require you to mentally keep track of it from the beginning.

Many beginners will start by always doing "git commit -a" and that's fine, as long as they know there's an alternative once they need it.


But why is the exceptional case the default?

Surely, most of the time when you go to commit, it's all the files you've changed?


Not for me! I often find myself refactoring tangential features while producing a new one. Sometimes that will even intersect in a single file. But that refactoring doesn't come with any changes relevant to the feature I am working on in my branch. So I save them for their own isolated commit(s). While this doesn't happen on every commit, it probably happens for me about every other push. The alternative is bundling in a bunch of changes that have very little to do with the feature that my branch is ostensibly about.

EDIT: Now that I think about it, I also have several repos where I have changes that I never intend to commit, because they are development conveniences for me personally.


Same for me! Webpack config changes to cache settings, config changes to hit a different API for testing, using a different database for testing. Most of these live in my staging area and get stashed and popped when I switch branches/rebase.


Not really. I think of my git use case at work as pretty simple. I usually stash, pull down, fast-forward and then pop my stash on top. Occasionally I'll need to rebase too. Just to show I'm not a super advanced user or anything.

I'm a JS dev mainly working in React on a web app with a backend team using PHP. Often I'll be working on a branch with maybe 2 or 3 people and I often end up working on a few things at a time. Say I'm working on a feature, and I notice some bug I'll fix that and then get on with my feature. Once I go to commit I pretty much always do a 'git add . -p' and I very rarely want to add all the files I've worked on!

Even things like switching a config file to use a service like apiary where I don't want to commit my change to the config to use apiary.. Or change to my webpack config for testing, etc.

I've used Perforce, SVN and Git and the whole 'staging' area thing always felt very natural to me. Here are the files you've edited, which ones want to be committed? It gives me a second chance to go through and check everything before I've committed, and often that stops me leaving in any odd comments or debug code.


Almost never actually. I never commit all the changes in my repo (for big projects I often have some small changes in other places, I don't want to commit them)


My point was more why staging is a special feature that even has a name. You're basically just selecting what changes you want to commit.

What is the usecase where one needs to remember that selection for more than just a few minutes?


probably related changes grouped together


The staging area is really an extraneous concept that isn't required. It's like a commit that isn't a commit.

In Mercurial, I much prefer to just make it an actual commit in the draft phase (the default phase) and just keep rewriting that commit. Mercurial provides tools for both selectively adding and removing hunks from a commit (both `hg amend` and `hg uncommit` accept --interactive for hunk selection). If you're extra paranoid, you can make it a commit in the secret phase so it's not shared prematurely by accident.

It's pretty much functionally equivalent and doesn't require an extra location in which your code can be. It's either in your working directory or in a commit.

A bonus of this approach is that now you have a meta-history, hidden by default, of what you've "staged" and "unstaged". It's kind of like a reflog but with, in my opinion, a better UI. And of course, the index/cache/staging area in git doesn't use refs, so there's no reflog there.


I've helped move a couple teams (kicking and screaming) from TFS to git, and I start back even further than that - why is it so much more complicated than clicking a button to save and share my work, and what is the benefit of that complication?


I'm very experienced with git, approaching expert level, and I don't use the staging area. I use

    git commit --verbose --patch
and bypass the staging area entirely. I don't find it helpful.


Git itself is so simple, it's all the stuff around it that can be overwhelming. Social mores about all the possible workflows are maybe the biggest (rebasing vs merging, granularity of branches and their longevity, acceptance of partial commits reflecting a state never realized in isolation on disk, commit hooks, requirement of every commit to satisfy properties x,y,z, direct access to common parent repo copy vs requiring some sort of pull request flow (and dependence on github and all their stuff)) but there's also work tracking, code review, build automation, test automation, deploy automation...


From the article: "If you take care the commit history, consider the use of git pull --rebase. Instead of fetch + merge, it consists of fetch + rebase. Your local commits will be replayed and you won’t see the known diamond shape in commit history."

No, not how to teach Git.

Open source just can't do good user interfaces. The result is almost always a zillion features in search of an architecture.

Blender managed to almost dig itself out of that hole, but it took over a decade. Gimp is still down in the pit.


Perhaps managing git is finally the business case for VR, Hold my coffee I'm going rebase jumping.


I was just running into this dichotomy this morning. To test my app Autumn on High Sierra, I had to install it in a VM, so I tried VirtualBox (open source) and Parallels Desktop Lite (in-app purchases). Not only did Parallels have a smoother, cleaner, more modern and easier GUI, VirtualBox just out-right doesn't work when trying to install High Sierra, and I had to find third-party instructions online just to bypass this bug. Plus it likes to crash right after shutting down the VM. I'm not really sure if there's some deeper philosophical reason behind this dichotomy, but I've seen it hold true for a lot of apps and their open source alternatives, and many people have said the same thing holds true about my app Autumn and open source alternatives like Hammerspoon and Mjolnir. As a rule, we really do seem to get what we pay for.


You're pinning THAT on opensource?

a) VirtualBox is an oracle product. That by itself should be telling.

b) High sierra is unsupported as a Guest OS in VirtualBox. You do know what that means, right?

c) You seriously complain about the dearth of open source virtualization for an OS, which disallows virtualization on anything but apple hardware? ...really?

If you want good open source virtualization you'll want to use Qemu/KVM. Which obviously doesn't support any apple OS either, because they're not allowed to virtualize it. Take that up with Apple, not open source


It was the most recent thing I did (literally this morning) so it was fresh in my mind. I'm also not an expert in virtualization solutions given how I haven't needed to use it until now. But this has held true for many, many apps and their open source alternatives. Paid products generally tend to be higher quality than open source, for whatever reason.


> Paid products generally tend to be higher quality than open source, for whatever reason.

Putting aside the fact that this isn't true, and that there are quite a few quality open source apps on macOS, it's pretty clear why paid products have higher quality: the bar for people to buy them is much higher, so they generally need to be at least somewhat decent for people to consider paying for them.


Virtualizing macOS on a macOS host is allowed, and I think QEMU will let you do this.


Their wording was, iirc, "you're allowed to run one instance of Mac OS per Apple chipset."

So, virtualization is technically only allowed if you're running your apple hardware with anything besides Mac OS.

But apple isn't enforcing that limitation, as products like VMware fusion on Mac OS are (at least as far as I can tell) officially sanctioned.


Here's the phrasing for macOS Mojave:

If you obtained a license for the Apple Software from the Mac App Store or through an automatic download, then subject to the terms and conditions of this License and as permitted by the Services and Content Usage Rules set forth in the Apple Media Services Terms and Conditions (https://www.apple.com/legal/internet-services/itunes/) (“Usage Rules”), you are granted a limited, non-transferable, non-exclusive license:

(i) to download, install, use and run for personal, non-commercial use, one (1) copy of the Apple Software directly on each Apple-branded computer running macOS High Sierra, macOS Sierra, OS X El Capitan, OS X Yosemite, OS X Mavericks, OS X Mountain Lion or OS X Lion (“Mac Computer”) that you own or control;

(ii) If you are a commercial enterprise or educational institution, to download, install, use and run one (1) copy of the Apple Software for use either: (a) by a single individual on each of the Mac Computer(s) that you own or control, or (b) by multiple individuals on a single shared Mac Computer that you own or control. For example, a single employee may use the Apple Software on both the employee’s desktop Mac Computer and laptop Mac Computer, or multiple students may serially use the Apple Software on a single Mac Computer located at a resource center or library; and

(iii) to install, use and run up to two (2) additional copies or instances of the Apple Software within virtual operating system environments on each Mac Computer you own or control that is already running the Apple Software, for purposes of: (a) software development; (b) testing during software development; (c) using macOS Server; or (d) personal, non-commercial use.


No one is appreciating the effort and a different take here. Let me congratulate the author on a job well done.

There are people who love illustrated explanation and for those these are perfect. This is just meant as a template which others can use to build the illustrated material and in no way a comprehensive git tutorial.


I completely agree. I've seen a lot of these, and this one is simple, to the point and easy to understand.


Agree, I think this is sort of a very simple introduction to what git is about, and to me it does the job.


When I was first learning git, I found an online visualizer like this [0] that really helped make concrete the ideas of git history being a graph, and what different operations did on that graph.

There was still obviously the issue of memorizing the commands, but at least I knew what the commands were doing on a deeper level.

[0] https://learngitbranching.js.org/


The other problem I find in git is that there are many GUI interfaces and none of them are consistent. In Eclipse I had a different interface depending on what project I opened, despite both the projects being in Python.


I disagree with this idea. The best way to learn git is to read the git book, in this order: chapters 1, 10, 2, 3, and the rest at your discretion. This way teaches you about the internals first, and if you understand the internals the rest of git is pretty intuitive.

https://git-scm.com/book/en/v2


Great idea. This kind of culture is why there are a lot of people that don't and probably never will use git.


There's something to be said about reading documentation rather than relying on stackoverflow answers or possibly inaccurate tutorials.

Substitute C, Java, Python, etc for git. You can probably do something with those languages, but you aren't going to get very far without reading some sort of documentation.


The article is about how to get a more fundamental understanding of how git works, and this book demonstrates fundamental ideas on how git works. I don't see a problem with this recommendation. If you want just a cursory knowledge of how to use git to get by, this is probably not the right choice, but that's not really what this discussion is about, is it?


Well, that is what the article is about.


Unfortunately, it isn't possible to effectively use git without knowing something about the internals. You can do the basics taught more 'by rote', but sooner or later you're going to run into something unexpected, or something complex you need to do and you need to understand the data model in order to have a chance of sorting it out.


That's a lot of reading for a tool that should be making life easier.


It's an engineering tool. You'll be using it all day every day for the rest of your career, the investment is worth it.


That's simply not true for many Git users. I'm a developer, so I do use Git every day, but I work with a bunch of researchers that absolutely do not need to use Git more than once a week at most. Convincing those people to care enough to learn the internals has been a constant uphill battle for me.


I will save this answer for when anyone complains about C++ or Rust being complex languages.


How about now :) Rust and C++ are complicated languages, and that's bad.

But git isn't complicated. Git is a handful of simple ideas composed in interesting ways. It looks complicated because there's a lot of porcelain commands with a lot of options, but all of them are just manipulating the same simple internals which, once understood, are clear and intuitive.


Many people would disagree---the permutation of simple ideas gets...less simple quickly.

On the other hand, I've used ClearCase.

(Do not use ClearCase.)


That is the thing, dealing with git issues brings back flashbacks of using Clearcase views on UNIX.


I see you got my devil's advocate gist. :)

Thing is, dealing with git feels like being in the early 2000's using Clearcase views.


The difference being that you will likely spend hours of your day thinking in rust, while git should be taking minutes of your day, but often ends up taking hours when you screw up a command and need to restore things to how they were.


You could say the same about Stack Overflow, but it's a lot more intuitive.


I gave a git talk at work recently, and what I found works is teaching the graph from the beginning. This includes a lot of diagrams of what the graph looks like as you commit and branch:

- show what happens as you add commits to a branch
- show that branches are just pointers to commits
- show what creating new branches looks like, i.e. creating a new pointer
- show what merging looks like (a new merge commit is added, or else fast-forward merge and that the branch pointer just moves)
- show what happens if you don't rebase (i.e. "ugly" non-linear graph) ... then teach what rebasing does (i.e. creates new nodes and moves the pointer)

I found that building up from the ground up and illustrating graphs allowed people to conceptualize things much better. There was still some confusion once merging was introduced (what happens to those earlier commits? do we need them?) and mostly because people hadn't thought of the graph before.

Git is one of those tools where you can totally do your job just by knowing the basic commands but not really know what is happening under the surface, which I think is a testament to the tool. But, that leads people to conceptualize their own idea of what is going on... and getting it wrong and being confused when they want to do something outside of their basic toolset.
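For anyone who prefers code to diagrams, the "branches are just pointers" picture fits in a few lines. This is a toy graph, not Git's data model: commit IDs are simple counters (`c0`, `c1`, ...) and `ToyGraph` is a name invented for the sketch.

```python
class ToyGraph:
    """Toy commit graph: commits know their parents, branches point at commits."""

    def __init__(self):
        self.parents = {}   # commit id -> tuple of parent ids
        self.branches = {}  # branch name -> commit id (just a pointer)
        self._next = 0

    def _new_commit(self, *parents):
        cid = f"c{self._next}"
        self._next += 1
        self.parents[cid] = tuple(parents)
        return cid

    def commit(self, branch):
        # Add a node on top of the branch tip and move the pointer to it.
        tip = self.branches.get(branch)
        self.branches[branch] = self._new_commit(*([tip] if tip else []))
        return self.branches[branch]

    def branch(self, name, start):
        # Creating a branch adds a pointer, not a commit.
        self.branches[name] = self.branches[start]

    def is_ancestor(self, old, new):
        stack, seen = [new], set()
        while stack:
            cid = stack.pop()
            if cid == old:
                return True
            if cid not in seen:
                seen.add(cid)
                stack.extend(self.parents[cid])
        return False

    def merge(self, into, other):
        ours, theirs = self.branches[into], self.branches[other]
        if self.is_ancestor(ours, theirs):
            self.branches[into] = theirs  # fast-forward: only the pointer moves
            return "fast-forward"
        if self.is_ancestor(theirs, ours):
            return "already up to date"
        self.branches[into] = self._new_commit(ours, theirs)  # two parents
        return "merge commit"
```

A merge is a fast-forward exactly when the target branch's tip is an ancestor of the other tip; otherwise a new node with two parents is created, which is where the diamond shape comes from.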


Yeah, it was definitely a set of images [0] that made git click for me, but even those are not necessarily the best pictures to use.

Git's mental model is basically:

> Ok, what if we took our SVN repository and the first thing we did was check in an SVN repository to that. So now you check out the repository, and you get a complete local repository that you can do anything you want with!

> So Pull/Push are the terms we use for the check out and commit of your local repository to the remote repository, and then Checkout/Commit works from your local repository just like you're used to with SVN.

And the real magic is what they did by building a system on top of this model to let you merge changes all the way up while looking at the code like you'd expect you'd need to.

The problem is that git's toolchain still feels arcane to use, and it requires that you have good working knowledge of the underlying models. It's confusing enough that you can't function unless you have that because you don't know where you are or where you're going. It's a fantastic tool, but it's like driving a car with two steering wheels, two gear shifters, and six pedals. Then you say to yourself, "How do I get to the market, buy some milk, and come right home?" and your brain starts to melt a little bit.

You shouldn't need to know the nitty gritty of how git works internally just to get it to work right any more than you should need to know how a disk works in order to use a file system, but over and over we keep seeing that knowing that is really the only way to use the tool correctly and that it takes quite a while for people to get.

[0]: https://blog.osteele.com/2008/05/my-git-workflow/


I did this quite a while ago following your same teaching methodology. Feedback welcome! :-)

http://www.robertames.com/blog.cgi/entries/git-in-two-ten-mi...


The best way to learn git is to learn what's happening at the DAG level. That way you can think about what should happen on the DAG and then think of how you can use git to achieve that. For example, a fast-forward merge and a reset can be used to achieve the same thing.

It's also very important to learn to use the reflog. When I was learning to climb they told me I'd never get really good until I'd fallen once. The same thing goes for git. People are really scared of it because they think they could lose work or something. Thanks to the reflog and the way git works, that's actually quite difficult to do.


I learned that Git stores data as a DAG at the very beginning, when I was introduced to it. But it didn't click until I realized that it is not only a DAG but an immutable one. That is, existing nodes of the DAG are never changed once created. The only operation supported by the system is more or less: create. Also, using the plumbing commands to peek into the content of the objects and refs inside .git helps a lot.


Yeah, that's a very important point. Even changing the parent of a commit, that is, an edge in the graph, changes the hash of the commit. Therefore to change the parent (like in a rebase) you have to make a new commit, but the old commit doesn't go anywhere, you just can't see it because you have no reference to it any more.

Git will garbage collect these commits that can't be reached by any reference after a while, but usually that's long after you've forgotten they ever existed.
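
Here is a small Python sketch of that idea. It assumes a simplified commit format (just a parent id and a message, hashed with SHA-1); a real git commit also includes the tree, author, committer and timestamps, so the ids here will not match anything git produces. The function names are made up for illustration.

```python
import hashlib

store = {}  # id -> commit; entries are only ever added, never modified


def commit_id(message, parent):
    """Derive the id from the content, including the parent id."""
    payload = f"parent {parent}\n\n{message}".encode()
    return hashlib.sha1(payload).hexdigest()


def commit(message, parent=None):
    """Create a new commit object and return its id."""
    cid = commit_id(message, parent)
    store[cid] = {"message": message, "parent": parent}
    return cid


def reachable(store, refs):
    """Collect every commit reachable from any ref by following parents."""
    seen = set()
    for tip in refs.values():
        while tip is not None and tip not in seen:
            seen.add(tip)
            tip = store[tip]["parent"]
    return seen


base = commit("base")
other = commit("other base")
old = commit("feature work", parent=base)
refs = {"feature": old}

# A toy "rebase": same message, different parent, therefore a different id.
new = commit("feature work", parent=other)
refs["feature"] = new

unreachable = set(store) - reachable(store, refs)
print(old != new, old in store, old in unreachable)
```

The old commit is still in the store after the "rebase"; it is simply no longer reachable from any ref, which is what makes it a candidate for garbage collection later.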


I like the diagrams, as many explanations often skip the staging portion of the whole git process. I use that so much that it baffles me that most people skip it. Then again, I typically don't like having tons of "WIP" commits so I stage a lot and if I need to switch to another branch I'll commit a WIP that I quickly `rebase -i` to get back to a clean status.

As for the teaching part, I have found the best results by having you and the learner actually "working" on the project at the same time on your own machines, both pushing to a centralized server. It becomes too easy to go over all the commands and feel like you understand them. Most of the scary parts of git come up when it's multiple developers on the same project. And oddly enough, having the learner do both Developer A's and Developer B's tasks doesn't seem to work as well as having the learner just do Developer B while I do Developer A. Explaining to someone when to use merge, when to rebase, and when to cherry-pick to get the code I just pushed into their current working branch is so much easier when it's hands-on AND everyone knows exactly who is doing which steps.


I found this very helpful: http://eagain.net/articles/git-for-computer-scientists/. Git's data structures are well designed. Once you have internalized them, you will be better equipped to navigate through the jungle of command-line options.

To understand the data structures interactively, use "git cat-file -p HEAD" and continue drilling down to an individual file in a subdirectory with "git cat-file -p OBJECTHASH"
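
If you are curious where those object hashes come from, here is a minimal Python sketch for the simplest object type, a blob. It assumes the default SHA-1 object format, where the id is the SHA-1 of a short header (`blob`, the content length, a NUL byte) followed by the content. The function name `blob_id` is just for illustration.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Compute the id git gives a blob (SHA-1 object format assumed)."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the well-known empty blob
```

Because the id depends only on the content, two files with identical content anywhere in the history share one blob.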


I've never had to understand the internals of a web browser or text editor in order to use it; drivers ed courses don't start with a discussion of thermodynamics. Why should it be necessary for git?


Because every source management tool has a model, and to use it at all you need to know the model. Else you're jabbing buttons and turning dials on a complex machine and the outcome is going to be tragic.


I've used SVN reasonably well without knowing its internal model. I kind of knew a bit about it, I'd never call myself an SVN expert and I still managed to do my job efficiently.

Git fails majorly in this regard.


Maybe because git has a different model. Learn that, the objections go away. As the OP attempts.


This logic is kind of circular.

"Git sucks, the UX is atrocious, I don't want to spend half my life learning a tool that shouldn't even need that much hand holding."

"Learn Git!!!"


No the logic is not circular, and the advice to learn git is a good one.

It may seem paradoxical to you at first, but it is true (as are many things in this profession). Another piece of paradoxical advice like that is to learn vim or emacs, but I digress.

Git does not suck. Like any other tool, it just has strengths and weaknesses (for example, working with very large binary assets is its main weakness).

The UX of most common git CLI operations is clean actually, as they are fast, and you do not need many arcane options (although they are there, and are documented well, for people who read...).

If you screw up something, you just use the reflog to fix the state of your repo in most cases. Even if you can not (or do not want to), the troubleshooting is still easy - you can always do a fresh clone from your remote repository in a new folder and copy what you want there.


You're assuming that I haven't read about Git. I've read a ton about it and its internal data structures. And regarding your digression, I'm a vim user.

Regarding Git, Git does suck. It does the job Linus designed it to do, but that job is not what most software engineers across the world need it to do.

In smaller or in corporate shops, Subversion was almost adequate and several bad implementation details, mostly related to branching, led to its demise. So that world needed Subversion++, not Git.

In the FAANG world, there's basically no company that uses Git as-is. Its strengths/weaknesses, aka tradeoffs, aren't good enough for them.

Git won because tech is a popularity contest and people in our domain like to do a lot of virtue signalling ("this tool is hard to use, I use it, so I'm special/cool").


My response was to your sarcastic mini dialog above my reply. Do not try to read other people's minds - it is impossible, and if you really want to, you can simply ask.

>> It does the job Linus designed it to do, but that job is not what most software engineers across the world need it to do.

Speak for yourself, you are not most software engineers.

>>In smaller or in corporate shops, Subversion was almost adequate ...

I have administered SVN professionally for several years (2005-2007), and was paid to unfuck screwups made by other developers using it (which many times involved restoring from incremental hourly backups done on the SVN server side). Dealing with git problems and helping others with theirs is many times easier.

The FAANG world (which I had to google just now) I imagine has unique requirements (many teams that must coordinate, super gigantic legacy source code base), and the resources to do whatever they want (money, humans to develop and maintain tools and do research). For them, the integration pain from managing multiple smaller repos may be significant. Outside this world however, teams are more independent and the source code size is much much smaller (even for legacy projects).

>> Git won because tech is a popularity contest

You have a point here, but this factor (and network effects in general) is just inertia, and does not explain why git won, given that for example SVN or Perforce had such a head start (in tooling, and in mindshare), and there were other distributed contenders like mercurial and darcs and BitKeeper developed at approximately the same time, and even earlier. It won in my opinion because it was simply superior tech - faster, good enough and very very easy to get started.

(edited to clean up formatting)


Git won because it is technically superior (IMHO, but I've used most other VCS only sparingly). It's well designed, flexible, and fast. Apart from a command-line interface with some unfortunately named options and a "big file issue" (that has never been a problem for me), I don't think there is anything wrong with it.

What do you think is wrong about it? Or have you only "read a ton" but never used it for a while? In the latter case, I suggest you start with the things I mentioned above, and make good use of "git reflog" as suggested by somebody else. If you know git reflog, "delete tree and clone a fresh copy" is not a thing anymore.

> In the FAANG world, there's basically no company that uses Git as-is. Its strengths/weaknesses, aka tradeoffs, aren't good enough for them.

Do they use svn then?


git is much more powerful. It gives you control to manipulate the repository like say, a relational database. If svn is enough for you, that's fine.


> It gives you control to manipulate the repository like say, a relational database

And how many people do that? Not that many. This angle is a bit like the guy who said that he doesn't want Unix file names to be UTF-8 text-only, because he crafted a sort of relational database on top of a Unix FS and by having file names be just text he couldn't do some super niche trickery. I think it was in response to this article: https://dwheeler.com/essays/fixing-unix-linux-filenames.html

> If svn is enough for you, that's fine.

It is, but every job these days forces you to use git. And most places I've worked at, git is used as a glorified SVN where people just have a 2-step commit to a remote server.


> And how many people do that?

I regularly fix my commits, (just like I edit most comments on hn within a few minutes after sending them). And sometimes I take back commits even from the server (working in small teams).

Locally, I regularly switch to older commits and branches to try things out. In svn all this requires a network connection and it is quite costly. Which means you do a hell of a lot less of it. And I would assume it shows in software quality.

Man, how I used to freak out regularly waiting for a stupid svn diff or svn log to finish. These operations are the bread and butter of version control, and they are instantaneous in git.

Or, aren't you regularly blocked from doing quick fixes using svn for problems that you saw, but couldn't fix them because you had a different pending commit? I have that often when working with svn (admittedly I don't know svn very well, but I'm sure it is a real blocker in many situations). Situations like these are easy in git.

And how idiotic, after all, is it that we have to set up a SERVER to keep a simple log of changes to a few files? I have so many small ephemeral projects that I simply put in my laptop's home directory. There is no point in maintaining server repositories for those. Git is simply a tool that lets you do that. It lets you do bookkeeping of your data. It's a tool. While svn feels like an inflexible process. (yes, svn has the file:// protocol, but you still have to set up a separate repository, right?)

In a sense, git is the sqlite of version control. You are really missing out.

> It is, but every job these days forces you to use git.

Here in Germany, svn seems to be the norm still in the engineering domain. But I figure the situation is quite different for web shops.

> And most places I've worked at, git is used as a glorified SVN where people just have a 2-step commit to a remote server.

That's my experience as well. When it's about communicating changes most setups will be centralized just like a regular svn setup. It's a fine approach for small teams. But with git you still have the huge advantage of being independent of the server. And you can easily fix problems that are already committed to the server.


You don't need to understand git's internals; I for sure don't know how it does delta compression of pack files etc. to provide you with efficient storage of snapshots and whatever else it does.

However, what you do need to understand is the model that git uses, which is extremely simple.

Git provides you a way to store snapshots of data into an on-disk graph data structure* that you can sync to and from remote repositories. You also get refs to store symbolic references to the snapshots in the data structure, and you can sync those too.

That's pretty much my mental model of git in its entirety, and it allows me to merge, branch, rebase and perform all kinds of commit surgery with ease, because I can always tell what effect an operation is going to have on my data structure (and even if I'm wrong, I can't lose data)

I seriously can't see how it could get any simpler.

(*) a git repository can actually contain several separate graph data structures, but that's usually not what you want...


As unorthodox as it is, the way I learned Git was through re-implementing it as a project for the data structures course I was taking [1]. This isn't the method I would entirely recommend to most people: instead, like the author, make a diagram for each of the (most-used/basic) commands and sketch out how they interact with the data structures. [2] is a good starting point.

[1] https://inst.eecs.berkeley.edu/~cs61b/fa17/materials/proj/pr...

[2] https://git-scm.com/


I got mixed results with a completely different approach: starting with what actually exists within a Git repository (i.e. roughly: focusing on aspects of the plumbing layer first).

However, this only works with people who can make the mental leap to be able to deduce knowledge of what should be from knowledge about what is. In the end, I concluded it's a bit like teaching cooking. There are those folks who need to be taught about full recipes and those who need to be taught about resources and corresponding steps.

I have not yet seen a working approach that makes both of these groups happy, unfortunately.


This is a pretty great overview! However, I learned Git and Github from Udacity's free course [1] and it was amazing to me since I am more of a visual learner. It got me up and running with Git within a week or two. I recommend others who have no prior experience with Git to check it out.

[1] https://www.udacity.com/course/how-to-use-git-and-github--ud...


I once taught[1] the building blocks of git (basically bullet time view of a commit) and people found it a bit too theoretical - even though it contained all the elements, that helped me to understand and appreciate the simplicity.

There is a point where you go from memorization (add, push, commit) to deduction (graph, objects and refs), but when this point is reached depends on many individual factors.

[1] https://git.io/fhWxg


Personally, I think using git from the commandline is too complicated for the purpose it serves in most companies.

Using a git GUI works quite well for people inexperienced with git.


It's the opposite for me. I am very comfortable with the command line. It lets me do crazy things and, if I mess up, quietly crawl back to the peaceful place.

GUI integrations have some advantages but the knowledge is not transferable. Different GUIs work differently but the commands stay the same. I think both can complement each other. I use PyCharm a lot and it is a lot easier to see diffs or file history there. I think the same can be done from the command line as well, but I don’t think I ever bothered to learn the advanced commands.


Committing to version control is one of the ideal uses of a GUI. You can skim your eye over the files you've changed, and flick between their diffs by just clicking on their file names, before going ahead with the commit. You can simultaneously cast your eye over recent commits in the revision history. Of course these are easily accessible with git status and git log (|less) but it's not the same as dealing with the information graphically.

I would posit the ed text editor for comparison. Why bother with a graphical text editor (including terminal editors like Vi and Emacs)? After all, you can look at the context surrounding lines you wish to edit by entering the appropriate ed commands.

I think it comes down to developers being so used to working with command line tools that they don't give GUIs a proper chance - ironically the exact objection they have with less experienced users not using the command line.


I like the commandline as well, but most of my coworkers are happy using a git GUI, and I can see why:

  1. Most GUIs give you an overview of the changes before committing.

  2. Most GUIs let you commit and push in one go, and also show unstaged changes so you don't forget to add/commit/push anything.

  3. A good git GUI is explorative. Newbies just remember the icons to click at first, and they learn more by exploring menus and reading messages.

  4. Commit histories are easier to view and filter.

  5. Exotic steps are easier to do, since you don't have to remember commands you barely use.

  6. Adding remotes is a piece of cake.


> and if I mess up, quietly crawl back to the peaceful place.

Ok, but getting to THAT level of comfort takes a long time and many failed attempts.

While I understand why folks may want a GUI for git, I am continually amazed that there hasn't been an effort to "refactor" git commands so they're more consistent, easier to remember and easier to discover. It's a miracle that git has taken root so strongly, given its shitty user experience.

I've been using git for years, and STILL, I need my cheatsheets and google far more than I would ever admit in person. Anything that's outside of my heavily practiced workflow, and I'm in a world of confusion.


> Using a git GUI works quite well for people inexperienced with git.

I agree, and I wish there was a really good git GUI that either abstracted git nicely, or was as powerful as the CLI, or both. I’ve tried many of them, in production, and there aren’t any that give you a UI for everything git can do.

The problem I’ve noticed is that people raised on the GUI often don’t understand how to get out of trouble once they have a serious problem. Also the GUI tends to be a crutch that prevents them from learning the CLI well.

Git is natively a CLI and it really shows once you know both. Git’s CLI interface is awkward and hard to learn, and all the GUIs for git are somewhat awkward, both due to git being awkward, and also because trying to fit UI workflow onto CLI commands introduces awkwardness. All git GUIs are incomplete interfaces to git, there are none that give you access to all of git... specifically things like finding lost data and managing repos is something you’ll need to drop to the CLI for.


Same here. I have been using Git for over 10 years now (with breaks), but it always pains me to get into the CLI (apart from basic add/commit/push/pull). Almost all commands require flags to give you decent behaviour and the documentation is seriously lacking.

For example, I've never had any problems with Mercurial. Everything just works there and it is intuitive. You can never lose data there, and it has extremely sensible defaults.


Even as an experienced user, I'd rather spend my time in GUI tools and only drop into the CLI on an as-needed basis.

From UI/UX point of view, developers are users as well, but many seem to think developers should put up with bad interfaces.


I'm in the same boat. When I do something bad CLI helps me fix it, but otherwise an interface is really useful for me.


The problem is that most Git GUIs are just that—Git GUIs, with much of their functionality being a thin layer over a subset of the decidedly poor-usability Git command line interface. There are small areas of their functionality where I find that some do a really good job, but mostly they still require you to learn the underlying concepts and nuances of Git rather than of version control in general. I am not aware of any that have evidently been constructed from the other end, focusing on the users and workflows, bending Git into shape around that. Naturally, there are dangers of being opinionated like that when it comes to playing ball with other users that might not use that particular client; this is a hard problem, which explains the paucity of attempts. (Making “a Git client” is easy; making a good tool for users is hard. It’s similar in other domains where there’s an easy path and a far better path—the far better path is seldom trodden.)


This is like teaching people to drive a car with an automatic gearbox.


Which happens to be the only option on future car driving technologies.


Not sure whether your statement is ignorant or insightful.

Yes, once electric cars (with superior traction and brake control) take over, automatic gearboxes (actually electrics don't even have gearboxes) will be the norm, but as long as there are engines driven on dinosaur fuel there will be a need for manual gearboxes. Which is why you should learn how to drive one. Once you're able to do that, driving an automatic is trivial, which was the point I was trying to illustrate.


> Not sure whether your statement is ignorant or insightful

I’ll answer that for you. It’s insightful.

In the U.S. today, less than 2% of new cars sold are manual now. Automatic gearboxes are well beyond the norm already.

https://www.chicagotribune.com/classified/automotive/sc-auto...

I didn’t understand why you said there will always need to be manuals for vehicles that run on gas... what do you mean?


> In the U.S. today, less than 2% of new cars sold are manual now. Automatic gearboxes are well beyond the norm already.

At that percentage, the U.S. is a huge outlier however, and accounts for quite some chunk of worldwide automatic gearbox equipped vehicle production on its own.

The automatic gearbox remains the less popular option worldwide [1], though as you can see they are more popular than they were before.

[1] https://www.statista.com/statistics/204123/transmission-type...


The real reason that's the case is that most of the rest of the world is much poorer than the US. Also, many of the people in those other countries don't have to drive as much or as far as people in the US.

At some point it's just Stockholm syndrome (we can only afford manual, everyone has it, so let's at least pretend it's cool!).


Indeed, if I could afford semi-automatic gearbox for the same price as manual, including the regular servicing, I would certainly use it.

After all, as GC fan, it does not make much sense to malloc()/free() my gearbox. :)


I mentioned the future; naturally I drive manual today.

The point being that what is right to learn today won't stay like that forever.

Only people that maintain old timers still know how to use a motor starter.


I think the best thing you could do for novices is to avoid priming them with preconceptions of Git being difficult. Psyching people up before instructing them never seems to do people any good, yet it's very common.


I agree with this. For many people who are learning Git for the first time, it's also their first experience with contributing code to a repository at all.

The last thing you want to do is turn them off entirely, or gatekeep the profession to exclude people who aren't good at reading dense documentation.

The best way to teach git is to get them comfortable with "Add, Commit, and Push" and then explain what's happening at each stage.


Despite using CVS and SVN for years, I was never truly comfortable with either. When I first learned git, it was like a breath of fresh air. Suddenly the VCS was behaving in predictable ways. Yet I'd put off learning git for two or three years because everybody was telling me "git is so much more confusing than subversion." I shouldn't have listened to them, and they shouldn't have said that!

Maybe if somebody has already mastered subversion then git will confuse them a lot, but I'm not convinced even that is necessarily true. Regardless, it seems clear to me that novice users shouldn't be told that git is complicated.


This is missing pointing everyone at a phenomenal ncurses git log viewer: https://github.com/jonas/tig

And another great bit: always do `git commit -v` for verbose commit edits. Read your diffs before you commit them.


How can anyone have a project home page without showing me why I should use the project?

Would it have killed Jonas to include a screenshot in the README?

I'm not going to install something just because someone says it's good. I have to go multiple links deep to get taken to a Flickr gallery, and even then I'm not sure what is so good about it: https://www.flickr.com/photos/jonasfonseca/sets/721576144707...


You make a good point, and I was similarly disappointed in the homepage, so let me elaborate a bit on why I use it:

1. I want a Git log viewer right in my terminal, that gives me most of the benefits of visualizing a Git log like a "real" GUI.

2. Fast startup. I work in dozens of Git repositories a day, many submodules, etc. To visualize these effectively, I want to type a few characters in my terminal and see the Git history in a digestible way.

3. This is essentially `git log --graph --pretty` with the ability to deep-dive into a single commit view of `git log -p` with nothing more than a stroke of the return key. To exit, I hit ESC.

4. Vim-like bindings in a Git viewer. Every visual app that I use daily in my terminal has "roughly Vim" bindings, from Emacs (https://github.com/emacs-evil/evil) to Zsh (bindkey -v) to Tig.

5. This is just as fast to start as `git log --graph --oneline`, but I get a fully interactive view of a Git Repo history. For example, the Linux source [1].

Perhaps I should compile my thoughts and send some README updates and screenshots to help the project.

[1]: https://i.imgur.com/yd7cSd1.jpg


Years ago, I tried to teach git to my (college) students, and it was a complete disaster. Like a lot of technical things, it's easy to get them totally confused with a single off-the-cuff sentence (and to be honest, I think I underestimated how difficult it would be to explain it and the kinds of pain points they'd encounter).

Reading this article, it occurs to me how useful the idea of a "staging area" would be to helping them understand. I don't think of it that way myself when I'm working with it (I suppose I do, but not in those precise terms). But looking back, that's what was tripping them up. If you're just talking about local and remote repositories, you're not really giving them the right idea of the workflow.


I found teaching Git to someone unfamiliar with any other VCS was far easier than teaching someone who already knew SVN.


I only have a sample of 1 (my wife) but I've found Ungit: https://github.com/FredrikNoren/ungit unparalleled for explaining the graph model behind git — what's a merge, what it means to fetch vs pull, what's the difference between committing locally vs push etc...

The specific points that make it great for such teaching:

1. pretty graph

2. hovering over actions such as Commit / Merge / Rebase / Push shows what would happen to the graph if you do it.

3. you can manually "Move" local & remote branches anywhere you want! This is mildly risky as a habit, but much clearer to explain than fast-forward and push, especially with multiple remotes.

4. automatic fetch that works pretty well (though explicit fetch UI with multiple remotes is clunky). For people scared of merging and conflicts, it's liberating to teach "fetch is always safe" and "local commit is always safe", and that you can fast-forward or merge/rebase as a separate step.
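
The "fetch is always safe" part can be sketched in a few lines of Python. This is a toy model under stated assumptions: a repository is a dict of objects plus a dict of refs, the commit ids are made up, and the remote is called `origin`. Real git does much more, but the shape is the same: fetch adds objects and moves remote-tracking refs, and never touches your local branches.

```python
def fetch(local, remote, name="origin"):
    """Copy missing objects and update remote-tracking refs only."""
    objects = {**local["objects"], **remote["objects"]}
    refs = dict(local["refs"])
    for branch, tip in remote["refs"].items():
        refs[f"{name}/{branch}"] = tip  # local branch refs are left alone
    return {"objects": objects, "refs": refs}


# Hypothetical repositories: both share a1, each has one commit the other lacks.
remote = {"objects": {"a1": None, "b2": "a1"}, "refs": {"master": "b2"}}
local = {
    "objects": {"a1": None, "x9": "a1"},
    "refs": {"master": "x9", "origin/master": "a1"},
}

after = fetch(local, remote)
print(after["refs"])
```

After the fetch, `master` still points at the local commit and only `origin/master` has moved, so deciding whether to fast-forward, merge or rebase is a separate step.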


It would be so much easier to use git if there was simply a "git undo" that would cleanly undo the last git command (or last n commands would be even better).

Then you could learn more easily via exploration and experimentation.

And I do mean just one command to undo all the things, I know that you can undo a lot of things in git but each command is different, and I know that undoing pushes is hard.


Shameless plug: "GIT isn't perfect, and other blasphemies."

https://blog.hackensplat.com/2018/12/git-isnt-perfect-and-ot...

(Addresses some of the points raised in this article.)


I like the painting analogy and have used it frequently.

Let's say you were commissioned to create a landscape painting. It'll be a big payday if you get it right and it's due in 30 days.

Because you know you can make mistakes, you make a photocopy of your work every day and stack the copies neatly on your desk. This photocopier is really high quality, as copies are made at the molecular level.

On day 16, you find yourself working on the mountains in the picture. You sneeze and Oh no! You got paint all over your work. But since you have your copy pile, you just make another copy of day 15 and keep going.

But let's say you think it's a lot and want a friend to work on the painting too. Just allow her to copy your latest photocopy and you both are off to the races!

Git seems complex but when you bring it down to a practical, non technical level, beginners pick it up faster :)


Some ideas on how to teach it non-verbally in class, regarding the add, commit, push workflow:

In class I'd stand on my right side, so students see me on the left (people are more used to left to right motions because of reading).

I'd demarcate the working directory, staging area and local repository non-verbally as 3 different spaces. My rightmost space (the students' left) was the working directory; my leftmost space (the students' right) was the local repository.

Every time I'd make a transition from one space to another through add or commit, I'd make a hand gesture as I said "add" or "commit".

In order to add some humor (humor is memorability), with push I'd point upwards to the ceiling as if I were looking to God. I'm not religious, but people got the reference and they all laughed out loud.


Many developers view git (and other "helper" tools like text editors, linters, grep, SQL, ... everything but their programming language) as second-class citizens.

The way to teach these tools is to make them primary citizens in your company culture. Emphasize that, without a strong grasp on a variety of important tools, devs will be unable to perform at or above the bar. Don't promote anyone to "senior" engineer who can't easily show the benefits of these tools while a junior engineer pairs with them.

Otherwise, developers will teach themselves the absolute bare minimum necessary to get their jobs done and go home. The quality of the training materials is of minimal importance. I learned git because I wanted to be better at it.


I applaud this effort, but it's still not hitting the right notes, eg:

> how to show the changes of a file in the staging area: git diff --staged

Ok, but this shows the diff against what: the file in the working directory, the file in the local repo, or the file in the remote repo?


Agreed that this is lacking. Staged is what will be included in the next commit (as made by git commit). So it is a diff against HEAD (current commit).

Also, it does not show a particular file, but all the changes in the staging area. To show only one file, you have to do `git diff --staged -- src/myfile.c`
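
If it helps, here is a tiny Python sketch of which two things each diff compares. It models HEAD, the staging area (index) and the working directory as plain dicts from path to content; the file contents are made up, and the `diff` helper only reports which paths differ rather than producing a real patch.

```python
def diff(old, new):
    """Return the paths whose content differs between two snapshots."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))


# Hypothetical state: myfile.c was edited and staged, README was edited but not staged.
head = {"src/myfile.c": "v1", "README": "v1"}
index = {"src/myfile.c": "v2", "README": "v1"}
worktree = {"src/myfile.c": "v2", "README": "v2"}

staged = diff(head, index)        # what `git diff --staged` compares
unstaged = diff(index, worktree)  # what plain `git diff` compares
print(staged, unstaged)
```

So the staged diff never looks at the working directory or the remote at all: it is purely HEAD versus the index.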


I agree with the author - many developers learn git (or version control in general) along the way, even though it's a fundamental skill needed for almost every project out there. This leads to insecurities, inefficiencies and wasted time when (not if) something breaks. That's why our company training for new graduate hires consists - among others - of 2 days of git/github training. I love git (even though I might experience some sort of Stockholm syndrome), and while it's the most popular VCS, it is not an easy tool to learn and should not be condensed into "commit - pull - push - now start working on some tickets".


1. Install TortoiseGit

2. Use Visual Studio to commit (it automatically does "git add" with the new files you added to the project - I always forget to do this manually)

3. Use TortoiseGit to do "git push", because Visual Studio has a problem with ssh keys (or at least had a while ago and I didn't bother to check since)

4. When something breaks (and it will), google like crazy until you find a solution

There. I never needed more in a number of companies.

[Yes, somewhat tongue-in-cheek and of course limited to people who use Windows and Visual Studio, but I have discovered that's a significant number.]


Yeah that's probably a good bare minimum amount of git knowledge. But I feel like that's making git which can be a super helpful and powerful tool nothing more than a means to an end.

Learning the functionality of git can really help you out. For example, I almost always use the `-p` flag (particularly `git add -p`), which gives me a chance to manually review every change I've made; often I'll find some junk that I left in somewhere. Also, being familiar with how to rebase a series of commits onto and off of other branches can be a huge time saver.


> But I feel like that's making git which can be a super helpful and powerful tool nothing more than a means to an end.

It also feels like a clunkier workflow than just using git from the command line. It's probably easier to get started that way, but learning the basics of git from the command line isn't too hard.


TortoiseGit always required me to check way too many boxes for each individual operation. It took me way longer than on the command line, where you just define one alias and you're good.

My biggest issues with git were when you were working in projects that had other usage patterns. It's a bit like C++, apparently everyone uses different subsets and strategies.


"...it automatically does "git add" with the new files you added to the project..."

Uh, yeah. You don't add random temporary files (notes, temporary testing data, the florist's phone number that you'll need after the meeting you're already late for) to your projects?


I want the notes in git. I never added temporary testing data (or the florist's phone number) to a project unless it was in code.


I've found the Visual Studio git integration to be not good with large solutions, just slows everything down. So I have it disabled in my projects.

What I do: I use GitHub desktop to get everything configured, then use the PowerShell command line to do all my work. Merge conflicts are fixed with VS Code.

