Certain systems can best be understood as black boxes. You put some commands in and magic happens. Git was not designed to be such a system and early users of git know this.
During the last 5 years, many GUIs have filled in this gap, making it increasingly likely to find people completely stuck because they miss knowledge of the foundations.
Git is a utility to manage an append-only repository of tree-objects, blobs and commits. To help humans, git adds
- human-readable pointers (branches, HEAD, stash)
- a method to incrementally add changes (staging/index/working area)
- a method to append tree-objects, blobs and commits from one repository to another
- some commands which alleviate steps in common tasks
This last set of commands causes pain, as users without foundational knowledge do not realize that these commands compound many small steps.
> During the last 5 years, many GUIs have filled in this gap, making it increasingly likely to find people completely stuck because they miss knowledge of the foundations.
Arguably, these GUIs became popular because Git by itself offers awful UX. I am all for obscure commands and switches (I use Linux and prefer CLI to mouse), but Git really took it to the next level. And it's not as if it couldn't be done better (as Mercurial shows).
As a user I don't want to know the intrinsic details about how some system is implemented. I just don't care - give me an external model which helps me use it, and leave it at that. I have my own code I need to worry about.
If most users get the "wrong" mental model when using Git, then the problem lies with Git, not with users. </rant>
A large part of the point of Git (in contrast to earlier tools like CVS) is that the internal data model that it enforces is one that works in the long term in distributed project settings.
Distributed source control requires a kind of hygiene, much like dental hygiene, or like cryptographic security. In the practice of hygiene, what's convenient is usually opposed to what's sustainable. Hygiene is effectively the set of discoveries of un-natural or non-intuitive ways to do things that work better than the equivalent intuitive/natural practices. Thus, any process that is "hygienic" is going to cause at least a little bit of pain or annoyance to follow; if it didn't, it'd be "natural."
And that's the fundamental problem with Git: that it tries to paper over the fact that it's "hygienic" by presenting itself as something seemingly "natural." People have the wrong mental model of Git because Git tries to meet them in the middle, by translating its (correct, sustainable) internal model into (broken, leaky) familiar abstractions.
Git isn't your friend. Git is your toothbrush. Imagine your dental hygienist telling you they're disappointed in how infrequently and badly you're brushing your teeth. The right response to that isn't blaming the hygienist for making toothbrushes annoying to use, right? It's shame, shame that you haven't bothered to overcome the stupid stubborn ignorance getting in the way of you taking better care of your teeth. Shame that you can't "meet your toothbrush where it lives."
If git exposed a consistent, understandable interface to its data model you'd have a point.
Mercurial exposes a much more consistent, much more understandable interface to a very-similar-to-git data model, so it's not like this is impossible.
So it'd be really nice if people could be allowed to point out the shortcomings of git's command-line interface without other people implying their complaints must be rooted in being too dumb and/or too lazy to understand git.
See my other replies below. I can't edit my above post now, but I'd like to make clear that I wasn't disagreeing that git(1) has a horrible, obtuse interface.
When I say to "meet Git where it lives", I mean "ignore git(1) and focus on the merkle-tree datastore it creates." That datastore (to really stretch the analogy) is a toothbrush-head with rather excellent bristles. Used correctly, it can get your teeth very clean. git(1), on the other hand, is a horrible, obtuse handle attached to it. Ignore the handle; focus on the bristles.
(What this statement really means, to me personally, is that I'd love nothing more than spending a few months factoring that merkle-tree datastore out of git(1) into its own library—and then writing a new, much lower-level frontend to expose it. That would be "meeting Git where it lives" in a real sense.)
But, presuming that none of us have time to do that but need to get Real Work done (and that work, for whatever reason, requires using git(1) instead of Mercurial et al), the best thing you can do to "understand Git" is to not try to understand git(1), but rather to understand that merkle-tree library that exists in a nascent form inside git(1)—and then understand git(1) itself as a bad and leaky wrapper around it. As long as you understand what each command you give is doing to the object pool, you can "see through" git(1) and it will no longer be a mysterious dental-cleaning machine, but rather just a simple toothbrush with a really weird handle you have to finagle to get it into your mouth.
Or, to put it another way: as long as you understand the hygienic goal you're attempting to head toward, then you can understand Git in terms of that goal, rather than in terms of its interface.
In this case, the "goal" is building up a content-addressable object pool that you can dump a bunch of people's individual commits into as entirely independent trees (which share objects), that can then be compared and worked with.
git(1) doesn't make very clear that this is the goal. git(1) explains itself only in terms of how to operate the handle of the merkle-tree toothbrush, and where you point it; not in terms of what the brush is doing (cleaning your teeth.) git(1), learned in a cargo-cult manner or taught by someone only familiar with centralized or patch-based SCM, will not result in clean teeth. You'll just be waving the toothbrush around in your mouth. That's git(1)'s fault, of course. But that doesn't mean there isn't a powerful toothbrush-head there, if you look past the handle.
The basic mercurial command set is directly descended from the time-tested cvs and svn commands. Significantly different tasks are executed with distinct commands. In contrast, git dispatches to a subprogram with the top-level command, and then also gets the subprogram to execute different distinguished tasks with its arguments. It's a rather leaky abstraction that pollutes the command syntax.
But since I've said something disparaging about git, I want to balance it out with something nice about it. Git handles remote tracking branches much more nicely than mercurial. For a while I found myself in a mode where I did most of my work on my workstation, saved to a central repository for collaboration, and took a laptop into the lab/field (I'm an embedded developer). Git trivially allows me to handle multiple remotes from the laptop, with one remote referring to the desktop and one referring to the central repository. Since git lays down distinguished branch names automatically for all of (in mercurial terminology) the remote heads, it's easy to diff/merge between them as appropriate when I'm in the field. In Hg, it's impossible as far as I know. I had to set up multiple clones to track the two.
A good UI exposes the underlying structure where necessary, while offering shortcuts for commonly used tasks or groups of tasks, and most importantly of all, it is consistent. The internal design is elegant and simple, but the UI has grown organically and does not show the same level of care. I'm not convinced the problems with git lie with some fundamental difficulty with its internal model; they are more basic: it simply hasn't had as much care put into the UI as it should have. I use the command line, and off the top of my head, problems with the UI are:
- Inconsistent options between subcommands, e.g. tag and stash
- Complex options for everyday commands, e.g. git reset HEAD -- file, git rebase -i HEAD~3, git checkout -b mybranch
- The mix of options and subcommands, with no clear rationale and unclear naming (things like reset)
- Separation of the stages of adding files with committing, I think the default should be combined, with options to separate the stages if required.
- The man page assuming understanding of fundamentals for simple commands like git push or rebase
Finally, as an example of how unintuitive the UI is around something as simple as reset/undo - here is one of the top stack overflow questions for git with 12195 votes, which just wants to know how to undo a commit:
I agree with most of what you said but I strongly disagree with the following:
>Separation of the stages of adding files with committing, I think the default should be combined, with options to separate the stages if required.
My workflow is that I make changes which may include many files. Once I'm done, I list the files that have changed and look over the changes to ensure that I don't have any unrelated ones. If I do, I use "git add -p" for the files where I only want to add part of the changes, and a regular "git add" for the remaining files, since there is no point in going through the "git add -p" process with those. That means I need to be able to add files in multiple steps. Once I have all my changes staged so far, I sometimes remember that there was yet another file that needed to change, so I change that file and stage it for commit. When I'm ready to commit, I review the whole diff that is staged for commit and then I commit it. To me, separating the stage of adding files from committing is the only sane default.
This does depend on how you work and what you're working on. Since the default is your workflow, it'd be nice to have a command which dealt with the workflow of only changing files you wish to be in the next commit (which I imagine is also very common).
git push origin HEAD:mainline
git rebase origin/mainline
git checkout -b origin/mainline origin/mainline # kappa, caused me hours of pain
Some day we'll hopefully get a re-write of the command line, and replace some of this stuff that is obviously just cruft... and cruft that you could see coming a long time ago (been using it since rails core switched to git). I remember a conversation on the mailing list where people were talking about re-aligning this stuff but they were also worried about adoption and consistency so they left it. Maybe with git version 2.0, they said. This was like 2009.
I like my toothbrush, I just wish it didn't hurt so much if I align it incorrectly. Does it really need those sharp edges? Does it need to be made of steel? I know, I should have known better by now, but...
There is this other toothbrush (called "Mercurial") that is made of plastic and has round edges, and according to my dentist, does the job perfectly unless you are among 0.5% of population with REALLY large teeth. But the local shops (GitHub) don't sell it and I have to drive to the other part of town to some obscure shops to get it (BitBucket)...
Git's GUI is full of horrible features, which are almost impossible to learn.
Why does 'git checkout' do double-duty, of checking out a clean copy of a file/directory in the current branch, and checking out a different branch? How does that fit into "learning the mental model of git"?
If I have a directory called "master", and a branch called "master", what does "git checkout master" do? How do I force git to do the other one? (I know this, but I shouldn't have to)
Actually, git checkout makes perfect sense if you have the git mental model: git checkout makes your index and working copy equivalent to something in the repository. By default, that thing is HEAD, but it can be some other branch if you desire. Checking out a clean copy of a file from the current branch is just a special case of checking out clean copies of files from some other branch.
`git checkout my_branch` does something else that is quite different: it changes the current branch!
Also, `git checkout my_branch` is very different from `git checkout my_branch -- .`
I strongly dislike the double duty of `checkout`. It's made worse by the fact that `git checkout my_branch` will never lose uncommitted changes, whereas `git checkout files` will.
It changes the current branch as it brings in clean files from that branch. Bringing in specific clean files from another branch is a special case of bringing in all clean files from another branch. I look at the current HEAD pointer being changed as a side effect when used without file arguments, as it is simply changing another file: the HEAD symlink. Think of the HEAD symlink as existing above your target commit (and it kind of does) and it makes a lot of sense.
But regardless: the goal of git checkout without file arguments is something you have to have, and if you then ask yourself what tool should implement just the tiny bit of functionality of reverting changes, adding a separate command for that seems awkward when that is 99% of what git checkout is doing.
I feel you are over-defending git. There are effectively two commands here, so why not just give them different names?
The man page starts "switch branches or restore working tree files", which sounds like ramming two things together. Then the manual has to have a special section on how to disambiguate the two uses (files and branches). It would be less work to have an unambiguous command.
Perhaps, at least, make the -- separating files and branches mandatory (so you have to start -- always with files)
I'm not disagreeing with you. I'm more saying that in a world where people "met Git where it lives"—learned the internal model Git is based on—the Git GUI with its misfeatures wouldn't exist.
Picture factoring Git out into 1. a library that manipulates content-addressable manifest-tree pool directories; and 2. a GUI that consumes that library. Now, throw away the second thing. The fact that even #1 hasn't happened already (and that there's no demand for it to happen) is a testament to just how obscure Git makes what it does.
Git's internals could be the next LevelDB; they're that much of a swiss-army knife. But instead, we just have git(1), in all its ugliness.
> Imagine your dental hygienist telling you they're disappointed in how infrequently and badly you're brushing your teeth. The right response to that isn't blaming the hygienist for making toothbrushes annoying to use, right?
I'm very happy with my electric toothbrush, a purely UI/UX improvement to the problem of toothbrushing.
There's no reason to take it away from me, just to make me realize the pure awesomeness of the procedure of toothbrushing.
An electric toothbrush doesn't paper over the abstraction of brushing teeth. The thing the electric toothbrush is doing for you is a clear, 1:1 automation of the thing you do yourself with a manual toothbrush.
This isn't true of git's high-level commands. If Git was a tooth-cleaning tool, it wouldn't be one that helps you to brush your own teeth. Instead, Git['s high-level command set] would be a fancy machine that asks you to take your teeth out (and hopefully they're dentures), put them into a black box, and then they'll come out cleaned. But sometimes they come out with the teeth rearranged and it's hard to understand why. Sometimes they come out tasting of fluoride and merge-commits. Sometimes you have to take your tongue-ring out before Git will accept your teeth. It's all very mysterious.
Which is silly. Inside the box, Git is brushing your teeth. Inside the box, a very simple thing is going on. It's the abstraction layer that's complicated and unintuitive. The "mystery" is all in how they map their black-box interface to the tooth-brushing internals.
If Git actually was an electric toothbrush—a way to accelerate and ease brushing teeth, but keeping 1:1 clarity on what's going on, and complete control of the result—then nobody would ever be confused about Git. It'd be slightly harder to use than a black box that you put whole dentures into—in about the same way that sed(1) is harder to use than the "Find and Replace" in a text editor—but those familiar with it wouldn't find the workflow any more strenuous.
Let's take this (obviously broken) analogy a bit further.
Toothbrushing, as a skill, is learned early in life and passed down through generations. Best practices, such as in which order to brush the teeth and at which moments during the day we are supposed to use the brush, are repeatedly given during dentist visits, at schools and early in life. Reasons need to be given why to do it this way and these reasons involve some understanding of microbiology, sugary food and plaque. So, also with toothbrushing, it is important to RTFM.
It is important to understand that the task at hand is extremely repetitive, hardly requires a mental model and is shared among all humans.
Now, compare this with git. If I started using an electric toothbrush without having any understanding of how to use it and what effects I'm trying to achieve, I would hardly be effective. Seeing the dentist bill after a long and painful visit, I would point out the advertisement: "helps prevent caries" and ask what all this toothbrushing is supposed to be about.
I'm not saying the interface of git is intuitive or provides useful affordances. It can be improved. Nevertheless, using a distributed version control system is inherently a difficult task and requires a good understanding to prevent problems down the road. Just like plaque, microbiology and sugary food give a model for brushing, so do blobs, tree objects and commits give a good model for (distributed) version control.
As I'm reading these comments I can't help but wonder, why hasn't anyone come up with a sane git cli interface that people can use in place of the standard git cli commands (of course the sane git cli commands would just call the "old" git cli commands under the hood)?
Because such abstraction will always be leaky and when things break, you will need to get down to the git level anyway. Also it would be harder to find contributors to your project if you don't use git directly, or you'd have to maintain two sets of instructions, for git users and for easy-git-whatever users.
It's kind of like wondering, "why hasn't anyone come up with a sane language that people can use in place of Javascript (of course the sane language would just use old Javascript under the hood)". Sure, there is TypeScript and the like, and it solves many problems, but it introduces others.
Sure you would need somebody on the team who can get down to the git level when things go wrong, but do all the rest of the team members have to work at that level? Not everyone has to be a DBA to work with a database. It seems that we could have a level of abstraction for the majority of use cases.
People complain about missing abstractions, not leaky ones. Mercurial has leaky abstractions imho. A Mercurial changeset ("commit" in git terminology) has a branch property. In git a commit does not belong to a certain branch. With both SCMs, most changesets/commits are on multiple branches, thus I consider the Mercurial abstraction leaky.
Mercurial people seem to like this, but it just doesn't fit my mental model.
Exactly, there is the flexibility. If you need to track the source of commits together with the permanent branch info, you use Mercurial branches; if you want "symlink branches" in Git style, bookmarks are the way to go.
It works nicely with pull requests when you can move the branch pointer after a rebase or updates of commits.
It's arguably become much better in terms of "native" GUIs that don't try to hide the details. Between gitk and git-gui, what's really missing?
(Both of those examples could probably be polished here and there, but the basic concepts are sound.)
Also, you really don't have to know the intrinsic details of how Git is implemented. What you do need to know is its model. The "native" model of Git is already very clean and sound; tacking on an external model with subtle incompatibilities is precisely what will make life difficult.
> Certain systems can best be understood as black boxes. You put some commands in and magic happens. Git was not designed to be such a system and early users of git know this.
This is true. A less favorable reading of the same story is "Certain systems are designed so well that they can be understood as black boxes. You put some commands in and magic happens. Git was not designed as well and instead requires the user to fully understand its internal workings or risk running into trouble".
My theory is that people are so proud that they finally mastered git that they like to forget that it simply has a horrible UI (and by that I mean the CLI and the design of the core set of commands).
I know of only one other popular productivity tool that has a similar sense of "you need to learn how it thinks, deep inside, or stuff will mess up" and that's Microsoft Word¹. I haven't seen shares of MS Word geeks loudly heralding its awesomeness here on HN yet. Maybe hackers just don't write documents? I can't come up with any other reason.
¹) Word works great once you learn how to use its styles (a bit like CSS classes), its fields, and its cross-reference system. MS Word's font and paragraph buttons are there in the toolbar to lure you to the dark side, or something. Don't use them! Friends don't let friends touch Word's font settings from anywhere but the styles editor. Like Git's, MS Word's UI is awful (awful) but its internals are like CSS-done-right for paper documents.
I am totally aware of the internal model and exploiting it (and other features) daily as much as I can, and it absolutely doesn't change the fact, that I see the UX/UI as irregular, accidental, chaotic, and more grown than designed. A classical example of irregularity I'm constantly stumbling upon: `git remote remove` vs. `git tag -d` vs. `git rm`. Every now and then I type `git tag rm foobar` and have to break my flow to check if this did what I wanted or not, and then check manpages in angry frustration to find out what is the proper incantation to clean the resulting mess.
Now, don't get me wrong: I love what I can do with git — the dance of rebases, add -p, splitting, merging, reverting, diff + vim + apply --recount, etc etc; and I'm now much more versed in git than in hg; but I never can forget that Mercurial's author did "miraculously" manage to make the UX and docs approachable, regular, easy and carefully designed for friendliness (...uhm, maybe except MQ?... :/)
The parent definitely did read the manpage if you read the post. The problem with git is that it has a different vocabulary for the same action (e.g.: remove) for different entities. (git push origin :branchName anyone?)
By not breaking the compatibility the git interface accumulates cruft over time. I think they would gain by versioning the cli interface somehow and just removing old commands that were since replaced by better alternatives.
Something must be wrong with me because I almost never remember the command to do that. Perhaps it's because I only do it once in 2 months or so. Thankfully my browser has me covered. I type in "git delete" and it autocompletes to links describing deletion of branches and tags.
@yoz-y: I did read the post, thanks very much. It took no time at all to search (`/\(delete\|remove\)`) those alternatives on the man pages, hence suspected lack of RTFM. :)
My point was a lot of these 'quirks' have been aligned over time. For example, your `git push origin :branchName` can also be expressed `git push -d origin branchName`. There is an obvious convergence toward using the `remove` command in command sub-groups, likewise using a `[-d|--delete]` flag in cases where commands are singular actions.
Your point about versioning is a good one. Default behaviours have been updated using a config flag and a warning in the past, perhaps something similar could be done. I think we're in agreement that Git's porcelain interface could do with a good deal of refurbishment. However, Git's backwards compatibility has become a prison, given you have a ton of code out in the wild running on the promise of: "Any `git` is good `git`". This is where a lot of the cruft lies.
IMO part of the responsibility lies with the community who are creating learning resources to keep up with the changes made - especially those monetising. So newcomers learn a sane, modern, consistent interface, instead of a bunch of esoteric legacy incantations. Indeed, these resources too rely on Git's promise of backwards compatibility to maintain their level of complacency.
Having said that, there's no reason some individual(s) who feel(s) strongly couldn't write a 'training potty' level interface as a separate binary; intended to be user-friendly, OCD consistent, and to be understood as a functional but completely non-conceptual black box. But they haven't.
This comment made me think back to a period of time just as git became popular when every second blog post was on the topic of "git is easy once you understand X" in which people who'd just had that moment of inspiration and discovered the innards of git would share their newfound knowledge. Inevitably, the comments would point out various mistakes and bad advice contained in the blog post. Often comments would then correct the authoritative sounding corrections in the other comments.
It's almost like git hit some weird sweet spot where its UX was so confusing that it could confuse smart people into thinking they understood it, even when they didn't.
The problem is many of us don't want / don't have time to learn all the intricacies of git's model. It's a tool to help us with our jobs, not the focus of our jobs. It's also very powerful, and as a result most of us only need a fraction of its features.
First line in the description of the 'git-commit (1)' man-page:
Stores the current contents of the index in a new commit along with a log message from the user describing the changes.
Since most people are searching the man-pages for a certain task (in this case, the task is to record the changes), the title makes sense. As a user, you are kindly requested to read at least the description part of the man-page.
It is not a snapshot of a directory, if you want that use btrfs snapshot support, or use tar and gzip or duplicity.
A commit records changes to a repository, which can be changes to only a file and not a whole directory of files, and what is stored is only the delta of the previous change, and not a "snapshot" in the sense of storing 2 different versions of your files in full - that's a waste of space, it stores the deltas, the patch - and other meta-data such as author and dates.
A commit is not deltas. It contains a hash of a tree object, which in turn contains a list of hashes of blob and tree objects. The tree hashes are of course of tree objects which are your subdirectories, and the blob hashes are the hashes of your full files. When you make a change to a file, and then add it, git saves the new file as a full copy, and gives it its own sha1 hash.
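To make the "full copy with its own sha1 hash" part concrete, here is a minimal Python sketch of how a blob id is derived. It assumes the default SHA-1 object format (repositories created with the SHA-256 format will differ), and `git_blob_id` is just a name picked for this example, not something git ships.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute the id git would give a blob holding exactly these bytes.

    Assumption: SHA-1 object format. The hash covers a small header
    ("blob <size>\\0") followed by the complete file content, not a diff.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# Two versions of the same file: each one is hashed in full and gets
# its own, unrelated id.
version_1 = b"hello world\n"
version_2 = b"hello world!\n"
id_1 = git_blob_id(version_1)
id_2 = git_blob_id(version_2)
print(id_1)
print(id_2)
```

You can compare the output against `git hash-object` on a file with the same bytes. A one-character edit produces a completely different id, which is what you would expect if the whole file, rather than a delta, is the thing being stored.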
If you go into `.git/objects` and find the file whose name is the hash of your most recent commit, you can decompress it (zlib inflate) and the first few characters of the file will be something like "commit 485\0tree 6f3eeb2952a...". This tells us that this object is a commit object of size 485 bytes, and then after the null character is the commit itself. If you then take that tree hash and do the same thing, you'll see a list of blob hashes next to filenames, and tree hashes next to directory names (in a tree object, git stores the hashes as the raw bits, instead of ascii encoded, so if you want to follow the hash list, you'll have to convert the hashes to their ascii equivalent to find the appropriate object in the object store).
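If you'd rather not poke around by hand, the same steps can be sketched in Python. This is an illustration only: the helper names are made up, it assumes 20-byte SHA-1 ids, and it parses a hand-built stand-in object instead of reading a real file from `.git/objects`.

```python
import zlib

def parse_loose_object(compressed: bytes):
    """Inflate a loose object and split it into (type, body)."""
    raw = zlib.decompress(compressed)
    header, _, body = raw.partition(b"\0")      # header ends at the first NUL
    kind, size = header.split(b" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return kind.decode("ascii"), body

def parse_tree(body: bytes):
    """Return (mode, name, hex_id) for each entry of a tree object body."""
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space)
        mode = body[pos:space].decode("ascii")
        name = body[space + 1:nul].decode("utf-8")
        raw_id = body[nul + 1:nul + 21]         # 20 raw bytes, not hex text
        entries.append((mode, name, raw_id.hex()))
        pos = nul + 21
    return entries

# Hand-built stand-in for a tree with a single file entry (hypothetical data).
blob_id = "3b18e512dba79e4c8300dd08aeb37f8e728b8dad"
tree_body = b"100644 hello.txt\0" + bytes.fromhex(blob_id)
loose = zlib.compress(b"tree " + str(len(tree_body)).encode("ascii") + b"\0" + tree_body)

kind, body = parse_loose_object(loose)
entries = parse_tree(body)
print(kind, entries)
```

The `raw_id.hex()` call is the "convert the hashes to their ascii equivalent" step: the hex string is what you would look up next in the object store.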
You are correct that git uses deltas, but it doesn't use them for commits, it uses them when it recompresses your objects into a packfile (which happens when there are too many loose objects or when you pull and push).
Every commit can reconstruct the state of your project at the time the commit was made. Each commit can do this without the help of any other commit.
Once I understood this, things like merging and shallow clones made a lot more sense.
Although it makes rebasing more confusing. What's happening there is that git is creating patches on the fly, and then applying them to the new base, and creating new commits. But the resulting commits are snapshots of what the files would have been had you applied the patches yourself.
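A toy Python model of that idea, heavily simplified: snapshots are plain dicts from path to content, patches are computed per whole file, and there is no conflict detection at all. Real git works at a much finer granularity; `make_patch` and `apply_patch` are names invented for this sketch.

```python
def make_patch(parent: dict, commit: dict) -> dict:
    """Derive a patch on the fly by comparing two full snapshots.

    A value of None means the path was deleted.
    """
    patch = {}
    for path in parent.keys() | commit.keys():
        if parent.get(path) != commit.get(path):
            patch[path] = commit.get(path)
    return patch

def apply_patch(base: dict, patch: dict) -> dict:
    """Apply a patch to a base snapshot; the result is again a full snapshot."""
    result = dict(base)
    for path, content in patch.items():
        if content is None:
            result.pop(path, None)
        else:
            result[path] = content
    return result

old_base = {"a.txt": "1", "b.txt": "2"}
feature = {"a.txt": "1", "b.txt": "2 edited"}        # commit made on top of old_base
new_base = {"a.txt": "1", "b.txt": "2", "c.txt": "3"}  # upstream moved on

rebased = apply_patch(new_base, make_patch(old_base, feature))
print(rebased)
```

The patch only exists for the duration of the replay. What ends up "committed" is `rebased`, a complete snapshot that includes `c.txt` even though the original feature commit never knew about it.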
Of course, there can still be "merge conflicts" since both branches might make changes to the same place in the same file. But since everything is a snapshot, if you have no pending changes in your working directory, hopping around the commit history is a safe action, so long as you have a branch pointing to where you left off.
I don't think it's fair to assume they didn't read the article. It's difficult to know when an explanation is succinct vs analogous, especially if you have a working model of knowledge. "A is just B" can mean more than one thing.
Nah, semantically, the parent commenter is closer to the truth.
There are SCMs that literally store commits as patches or diffs, and then, when you want to check out anything other than HEAD, they have to run history backward by applying those patches, in series, to move through time.
A git commit, on the other hand, is more like a handle to a pure-functional tree data structure that happens to share some of its pointers with earlier versions of the same tree. Each commit is still the whole tree—not a diff; any marginal commit just happens to not take up much space, because a lot of the objects within it are objects that were already entered into the pool in previous commits.
You can easily see that this is true by cloning a large repo, using git's `git checkout --orphan` command to create a new entirely-disconnected branch, and then committing the whole repo to it. If git was diff-based, the size of the .git folder would balloon when you did this and your computer would chug computing the commit. But instead, this operation is approximately free, because your new commit will create a tree object that shares all its child objects with existing objects already in the pool, even though the "diff" of this commit is against a blank-slate state.
(Mind you, git will chug if you ask it to `git show` this orphan commit; it will have to convert, right then, the (cheap) tree representation into a huge diff for you to look at. But that huge diff is just for your convenience; it has nothing to do with git's data model.)
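Here is a rough Python sketch of why the orphan commit is nearly free. It models the object pool as a dict keyed by SHA-1 and handles only a flat directory; the commit format is trimmed down (no author or committer lines), and `ObjectPool`, `write_tree` and `write_commit` are hypothetical names, so treat it as a model of the idea and not of git's actual code.

```python
import hashlib

class ObjectPool:
    """Toy content-addressed store: identical content is stored once."""
    def __init__(self):
        self.objects = {}

    def put(self, kind: str, body: bytes) -> str:
        data = kind.encode("ascii") + b" " + str(len(body)).encode("ascii") + b"\0" + body
        oid = hashlib.sha1(data).hexdigest()
        self.objects.setdefault(oid, data)   # no-op if the object already exists
        return oid

def write_tree(pool: ObjectPool, files: dict) -> str:
    """Store every file as a blob, then a tree listing them (flat directory only)."""
    body = b""
    for name in sorted(files):
        blob_id = pool.put("blob", files[name])
        body += b"100644 " + name.encode("utf-8") + b"\0" + bytes.fromhex(blob_id)
    return pool.put("tree", body)

def write_commit(pool: ObjectPool, tree_id: str, parent_id, message: str) -> str:
    """Simplified commit: tree reference, optional parent, message."""
    lines = ["tree " + tree_id]
    if parent_id is not None:
        lines.append("parent " + parent_id)
    lines += ["", message]
    return pool.put("commit", ("\n".join(lines) + "\n").encode("utf-8"))

pool = ObjectPool()
files = {f"file{i}.txt": (f"contents {i}\n" * 100).encode() for i in range(1000)}

first = write_commit(pool, write_tree(pool, files), None, "initial")
before = len(pool.objects)

# "Orphan" commit of the very same files: no parent, a diff against nothing.
orphan = write_commit(pool, write_tree(pool, files), None, "orphan snapshot")
new_objects = len(pool.objects) - before
print(before, new_objects)
```

All the blobs and the tree hash to ids that are already in the pool, so the only thing the orphan commit adds is the commit object itself.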
A git commit is not a "patch" or "delta" as it may seem from the CLI/UI. A git commit consists of a reference to the previous commit and a reference to a git "tree" object.
A git "tree" object is essentially a directory listing of sub-trees (folders) and blobs (files). Each blob and sub-tree is addressed by a hash of the respective object. So in a sense, each commit by virtue of the tree it references represents the entire state of the repository at that commit. You do not need to accumulate the diffs of each commit to find the repo state at that commit.
The way git saves space is twofold:
1) Since blobs are hashed (content-addressable), if the same file exists in two trees or in two locations in the repository it will not take up additional space.
2) Git occasionally re-packs the objects into packfiles that are compressed and leverage deltas to reduce storage space. This is the closest you get to storing deltas, but this is at the blob level and not the commit level, and is more an aspect of storage than how the Git data model actually works.
- Some meta-data (one or more parent commits, author, date, message).
The commit command (git commit) basically takes the tree object built from the index, your HEAD pointer (the commit you are currently on) and some user-entered information, and creates a commit object from them.
So, notably, the commit does not store the changes! It stores a reference to the root of the index tree object (which contains references to the root of other tree objects or blobs). There are no diffs recorded anywhere, only calculated when needed.
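To make that concrete, here is a rough sketch of what a commit object contains. The helper names, the author identity and the timestamp are invented for illustration, and the empty tree stands in for a real index tree; the layout (tree line, parent lines, author, committer, blank line, message) follows git's commit format for SHA-1 repositories.

```python
import hashlib

def object_id(kind: str, body: bytes) -> str:
    """Hash an object the way git does: '<kind> <size>', a NUL byte, then the body."""
    return hashlib.sha1(f"{kind} {len(body)}".encode() + b"\0" + body).hexdigest()

def commit_body(tree: str, parents: list, author: str, when: str, message: str) -> bytes:
    """Serialize a commit: references and metadata only, no file contents, no diff."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author} {when}")
    lines.append(f"committer {author} {when}")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()

EMPTY_TREE = object_id("tree", b"")  # id of a tree with no entries
WHO = "A U Thor <author@example.com>"  # placeholder identity
WHEN = "1700000000 +0000"              # placeholder timestamp

root = commit_body(EMPTY_TREE, [], WHO, WHEN, "initial commit")
root_id = object_id("commit", root)

child = commit_body(EMPTY_TREE, [root_id], WHO, WHEN, "second commit")
print(child.decode())
```

The printed object is a handful of lines no matter how large the change was; anything that looks like a diff is computed later by comparing the two trees.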
"Certain systems can best be understood as black boxes. Git was not designed to be such a system"
"In software development, a leaky abstraction is an abstraction that exposes details and limitations of its underlying implementation to its users that should ideally be hidden away. Leaky abstractions are considered problematic, since the purpose of abstractions is to manage complexity by concealing unnecessary details from the user." https://en.wikipedia.org/wiki/Leaky_abstraction
You may dress git's flaws as "design choices" or however else you like, but the king is naked to me! I believe the time will come when our descendants will look back at git only in a social research context.
Yes, the book is excellent and also recommended :) But for me, having an understanding of git's internals really helped provide a foundation for the rest. Otherwise a lot of the stuff (like rebases, or amending commits...) was just black magic.
I like the writing style and the scope of the piece. Well done.
I kind of wish there were at least mentions of git plumbing commands where appropriate, to shake off one more level (half a level?) of magic. For example, just link to some information on `git hash-object` in the section on `git add`. Footnotes would probably be enough. No need to bog down the relatively quick pacing. Sometimes it can be hard to discover which plumbing commands correspond to the actions mentioned.
Most git tutorials come with diagrams of blobs and trees and branches with all the arrows and color coding. They get the meaning across but often seem to come with a bit of a mental disconnect from what is actually happening in the working directory and .git directory. Does anyone know of a tool to display that kind of diagram in real time while you are making commits or checking out new branches? It could bring an extra level of interactivity to the presentation. Imagine if the graphs on this page were updating live while you had to type the git commands to get them to update AND you could monitor the filesystem at the same time, showing exactly which files were changed by the command.
I haven't seen a git article link to this amazing website recently. By far one of the best ways to teach someone git is to walk someone through git by executing commands and allowing them to see the visual representation of those commands.
There is also an amazing single-player learning mode.
Not just that, the descriptions you find online of how it works simply don't prepare you for the reality of using it. It's a bunch of boxes and arrows when what you really want is what to do when things inevitably don't work how they are supposed to.
I mean, everyone who has written a guide to rebasing must live on Mars. They show how your branch used to point to this box, but now it points to THIS box! And then what you really get when you actually try to use it is a bunch of conflicts between changes that affect files you haven't done anything to and a shit ton of confusion.
The two top questions are "How to modify existing, unpushed commits?" and "How to undo last commit(s) in Git?". That could be interpreted as evidence in favor of a staging area. People actually need to change committed things very often.
On the other hand, in 20th place is "How to undo 'git add' before commit?", which displays a UI shortcoming with staging.
A simple way to make it clearer: have 'git stage <file>' and 'git unstage <file>' commands, instead of the extraordinarily unintuitive 'git reset HEAD <file>' for the latter.
It was only 1000 lines of C at the time, so it couldn't have needed a million different articles and blogs to explain.
That doesn't give you a workflow or explain any of the more advanced features. The workflow you can get from any cookbook list of shell commands, and the advanced features you can get from the manual.
It's interesting to note that what is currently the index/staging area was, back then, only a cache. (which is also why you'll still find it called any of these three names, participating in the overall confusion)
Very good read. I was just hacking around with a tiny Python program that implemented enough to init, add, commit, and push itself to GitHub. It's all very simple until you get to the index format ... which isn't that bad, but it's definitely more complicated than this article makes out.
She refers to .git/index as a text file where "each line of the file maps a tracked file to the hash of its content". However, .git/index is actually a binary file where each entry is a bunch of different fields like creation time and modify time and SHA-1 encoded in binary. See https://github.com/git/git/blob/master/Documentation/technic...
So I wasn't sure whether this part of the article was simply wrong, or whether git index format "version 1" was text, or something else?
For future reference: I asked Mary (the author) about this and she said, "Great point! As far as structure of the data, the file may as well be a text file of lines of information. I decided to omit talk about how git actually stores data to focus on the structural concepts. Maybe I should have included a bullet point. I don't actually control the version of the article that was posted on HN. but I'll certainly update the version on my website."
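For anyone curious how binary it is, the fixed 12-byte header of the index is simple to read. A small sketch, using a fabricated header so it runs anywhere; `parse_index_header` is a made-up name, and the entries that follow the header (timestamps, mode, hash, path and so on) are not parsed here.

```python
import struct

def parse_index_header(data: bytes):
    """Parse the 12-byte header: 4-byte signature, 4-byte version, 4-byte entry count (big-endian)."""
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a git index file")
    return version, entry_count

# Fabricated header for illustration: version 2, three entries.
fake_header = struct.pack(">4sII", b"DIRC", 2, 3)
print(parse_index_header(fake_header))  # (2, 3)

# On a real repository (path assumed) this would be:
# with open(".git/index", "rb") as f:
#     print(parse_index_header(f.read(12)))
```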
I approach the whole thing similarly during my trainings and wrote a few dirty scripts to generate an image of what the repository looks like using graphviz https://gist.github.com/nibrahim/6119925
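A picture like that needs very little code. The following is not the script from the gist, just a minimal sketch of the idea: `to_dot` is a made-up function that takes an in-memory description of commits and branches (both invented here) and emits Graphviz DOT text.

```python
def to_dot(commits: dict, refs: dict) -> str:
    """commits maps commit id -> list of parent ids; refs maps branch name -> commit id."""
    lines = ["digraph repo {", "  rankdir=RL;"]
    for cid, parents in commits.items():
        lines.append(f'  "{cid}" [shape=circle];')
        for parent in parents:
            lines.append(f'  "{cid}" -> "{parent}";')  # child points back at its parent
    for name, cid in refs.items():
        lines.append(f'  "{name}" [shape=box];')
        lines.append(f'  "{name}" -> "{cid}" [style=dashed];')  # branch is only a pointer
    lines.append("}")
    return "\n".join(lines)

# Toy history: c3 is a merge of c2 and a side commit s1.
commits = {"c1": [], "c2": ["c1"], "s1": ["c1"], "c3": ["c2", "s1"]}
refs = {"master": "c3", "feature": "s1"}
dot = to_dot(commits, refs)
print(dot)
```

Saving the output to a file and running it through the `dot` tool would render the image.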
This is how I teach git when I do training sessions for companies. I really dig this approach; the plumbing on top of Git is not really sufficiently abstracted to avoid knowing this stuff, but at the same time it tries to hide just enough of it to end up biting you when (not if) something goes sideways.
This reminds me of The Git Parable which helped me understand git when I was first getting started. I really do believe that you have to understand how git actually works under the UI in order to have long term success using it. Whether it's bad or good to need this understanding can be debated but I believe that it is truth.
The thing that is missing from all tutorials is that branches, tags and everything else are just pointers to a commit in the commit graph. That is, branches have no "content".
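A sketch of how little a branch is, using a throwaway directory laid out like a `.git` folder. Assumptions: refs are stored as loose files (packed refs are ignored), the commit id is made up, and `resolve` and `write` are illustrative helpers, not git functions.

```python
import os
import tempfile

def resolve(git_dir: str, ref: str = "HEAD") -> str:
    """Follow 'ref: ...' indirections until a commit id is reached."""
    with open(os.path.join(git_dir, ref)) as f:
        value = f.read().strip()
    if value.startswith("ref: "):
        return resolve(git_dir, value[len("ref: "):])
    return value

git_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(git_dir, "refs", "heads"))
commit_id = "a" * 40  # made-up commit id

def write(path: str, text: str) -> None:
    with open(os.path.join(git_dir, path), "w") as f:
        f.write(text + "\n")

write("refs/heads/master", commit_id)      # a branch: one file holding one id
write("HEAD", "ref: refs/heads/master")    # HEAD: a pointer to a pointer

# "Creating a branch" amounts to writing one more small file.
write("refs/heads/feature", resolve(git_dir))
print(resolve(git_dir), resolve(git_dir, "refs/heads/feature"))
```

No files or history are copied when the new branch appears; both names resolve to the same commit.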
Good stuff. Can't help but agree with others that I don't really want to be forced to be intimate with Git at a gory insides level, though. I use TortoiseGit, and perhaps it's removed the temptation to experiment with things that could blow my foot off, but 99% of the time I have no reason to drop to the command line, nor do I want to.
This is all simple stuff. It's when, after a rebase, there are conflicts in files where there shouldn't be any and the CI says "failed" that my knees weaken and I yearn for a guide that would explain everything.
Explaining version control might be a bit challenging. Distributed version control is a bit more challenging. So teaching git can be a lot to take for a newcomer.
I went from Subversion to git. In retrospect, Subversion was much simpler conceptually (but problems like syncing branches were harder).
I once found myself explaining a git concept based on the plot of Back to the Future II. I think it was a perfect example of how to resolve some merge problem.
There are some "git cheatsheets" that provide a very straightforward graphical explanation of what some commands do. That helped me to consolidate some concepts.
This is really excellent. I have now read it three times, running through the commands as I go, and for the first time I feel like I actually understand git.
Here are some commands I found useful as I went through the tutorial:
To show a given commit without diffs but with tree and parent sha1s:
git cat-file -p sha-1
To show a given commit including diffs:
git show sha-1
To show a given tree (blobs and subtrees):
git ls-tree sha-1
To show the tree recursively (i.e. show the sha1 of every file, without listing the folders themselves):
git ls-tree -r sha-1
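For the curious, here is roughly what sits behind `git cat-file` for a loose (unpacked) object: a zlib-compressed byte string with a small header. This is a sketch under assumptions: `encode_loose` and `decode_loose` are made-up names, only loose objects in a SHA-1 repository are covered, and packfiles are ignored. Blob and commit bodies are readable as-is; tree bodies are binary, so `cat-file -p` does extra formatting for them.

```python
import hashlib
import zlib

def encode_loose(kind: str, body: bytes):
    """Build the id and the on-disk bytes of a loose object."""
    raw = f"{kind} {len(body)}".encode() + b"\0" + body
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)

def decode_loose(compressed: bytes):
    """Inverse of encode_loose: return (kind, body)."""
    raw = zlib.decompress(compressed)
    header, _, body = raw.partition(b"\0")
    kind, size = header.decode().split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return kind, body

oid, stored = encode_loose("blob", b"test content\n")
# In a real repository these bytes would live at .git/objects/<first 2 hex chars>/<remaining 38>.
print(oid[:2], oid[2:])
print(decode_loose(stored))  # ('blob', b'test content\n')
```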
This is one of the best ways to learn new technology. I remember that, long ago when I was in college, I had some trouble understanding some aspects of web servers, so I decided to write a small web server in Perl and soon I knew it inside out. The same goes for writing my own SMTP client; the knowledge I gained from it will be with me forever.
This kind of makes me wonder why you have separate objects/ stores for each repository you have checked out - it seems like you could have just one in your $HOME that all your git repositories share.
There are a lot of comments here such as "Git has a horrible UI/UX", "I don't want to have to learn the inner workings of a tool", "Certain systems are designed so well that they can be understood as black boxes", and a lot of other complaining about Git and its idiosyncrasies.
I think this needs to be dissected a bit. First, Git operates in a manner (internally) that is foreign to most users of other SCM systems. Second, Git has a bit of a "tacked-on" nature to its CLI that can make it cumbersome for newcomers, especially when they have been taught the shortcuts before the fundamentals.
For the first problem, I think this is where the black-box comments apply. And honestly, I think treating SCMs as black boxes is what got us into the situation we were in before Git. Version control, branching, merging, change management, and change deconfliction are hard problems, IMO. Personally, I think the base level functions that Git provides, combined with Git workflows from Atlassian (and others), really help provide a daily routine to handle these situations. After cloning: branch -> change -> index (or update) -> commit -> pull -> fix conflicts -> commit -> push -> merge (or pull request) -> repeat. There are some variations depending on your branching model, but by and large this is what prevents regressions and forces the people doing the committing to resolve the changes and not break master.
I think you need to understand the "internals" of any SCM to really be able to conquer the challenges of distributed version control and the complexities of modern software development. I've worked in Rational ClearCase shops, and we needed a ClearCase guru on site too. Every team should have a Git guru.
For the second problem, yes, the CLI is a bit clunky at times. This, combined with a misunderstanding of Git fundamentals, can lead you down some bad paths. Cleaning up the CLI is an independent problem from Git internals, and I'll admit some taxonomy/hierarchy/ontology/whatever of commands is probably needed to refine the day-to-day workflows. However, if you mess up the repo because you don't understand the branching and merging model, you are going to have to use the more "specialized" commands which, let's face it, are going to be a bit more cryptic. This is the same for any system that has some maintenance or repair type functionality.
This is why I say that learning how Git works allows you to learn the branching model better, which will hopefully allow you to avoid those particularly thorny paths.
Sure, you can choose some other SCM system that seems less cryptic or easier to use, but you will likely find yourself in a bind someday in which you need its cryptic commands to get out. Or, more likely, doing a lot of work that Git would have allowed you to do in a fraction of the time.