Key Git Concepts Explained the Hard Way (zwischenzugs.com)
361 points by ingve on March 14, 2018 | 182 comments



So many people have a hard time with git. I have met very few people who truly understand git and use it proficiently.

What I tell every new programmer joining my team is to read chapter 10 (Git Internals) section 10.2 and 10.3 of the amazing online git book[0][1][2].

I cannot stress enough how important it is to understand the fundamental design of git, and it is really not complicated.

This is what made me understand git, and it changed git from a hard-to-use tool into amazing productivity software that I love.

The kicker for me was to realize that in git (mostly) everything is a reference to a commit id.

What's a branch? It's a human readable string, made for humans, by humans, that ultimately simply references a commit id. What is a new branch? It's just a new human readable string pointing to a specific commit in your history.

[0] https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Po...

[1] https://git-scm.com/book/en/v2/Git-Internals-Git-Objects

[2] https://git-scm.com/book/en/v2/Git-Internals-Git-References
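The "a branch is just a human-readable name for a commit id" point is easy to verify yourself. A minimal sketch, assuming git is installed (the branch name `feature` is made up):

```shell
# Create a throwaway repo with one commit (identity set inline so it runs anywhere)
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "first"

# A "new branch" is just a new human-readable name for an existing commit:
git branch feature
git rev-parse HEAD feature        # both names resolve to the same commit id

# The branch is literally a file containing that commit id:
cat .git/refs/heads/feature
```

With git's default (loose-refs) storage, that last command prints the same 40-character hash that `git rev-parse` does, which is the whole trick.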


A lot of people simply don't understand why source control isn't a problem solved 100% by tooling in a way that requires little to no interaction at all. To them, hearing that they need to read a chapter about the internals of a source control system before working on a project is like hearing they need to understand the details of how a transmission works before they go and drive an automatic - it's inane detail about a workflow they shouldn't even have to deal with because technology has solved the problem, or should have by now.

I was on a team that was forced to transition from TFS to git. Among those who weren't excited about it, the majority of the questions, challenges and issues weren't even as far down the road as how to use git, they were mostly along the lines of "why do we have to do all this other stuff now? Today I go to this screen, click this button, my work is now safe and other people can see it." History is linear, full stop, and conflicts are handled by essentially forcing a rebase on commit. Some admin does branching and merging and emails us if there are weird conflicts, we don't worry about that. Why make it more complicated?

These folks didn't need git tutorials, they needed to be convinced that curation and management of source history was one of their job responsibilities, which was tough because it really wasn't previously, at least not in any meaningful way.


> ... hearing they need to understand the details of how a transmission works before they go and drive an automatic

I feel like a much better metaphor is the difference between understanding that turning the steering wheel turns the car and understanding that turning the steering wheel turns the front wheels. You can drive a car without knowing that, but at some point you will find that not knowing that prevents you from having an intuitive grasp of things like parallel parking.

In software engineering, an even more apt analogy is understanding how compilers/interpreters work. You can do a lot of programming treating those as black boxes that make your program run, but eventually you have to have the deeper understanding.


Interestingly, when teaching sailing it is really difficult to explain that the rudder doesn't just change the boat's direction on its own; it only works when water is flowing past it.

It especially happens after a student has just begun to understand that they can only move in some directions relative to the wind, and they feel stuck with the boat pointed straight into the wind. You have to move to change your heading, and you have to change your heading to move? The first reaction is to feel like the rudder is broken.


If the boat is small enough you can just rudder pump your way out of that one though. :)


Additionally, being "stuck" pointing straight into the wind isn't a situation where you are stuck with zero velocity indefinitely. The wind is forcing you backwards, and the arrangement of the rudder and keel/centerboard is unstable for going backwards without changing direction. Even if you don't know what you are doing you'll generally point one way or the other enough that you can then fill your sails again and have more normal control.

Ways to make this faster include pushing the main sail on a small boat out to the "wrong" side, tightening the front sail on the "wrong" side of a larger boat, and positioning the rudder only a little off center so that the slowly moving water relative to the boat is less turbulent.

Probably the only time when it is an actual problem is when people are drifting into some obstacle. It seemed to take students a matter of seconds to regain control of their boat when they were surrounded by large expanses of water. When they were frantic, their intuition about what they needed to do when in a hurry prolonged their loss of control.


If you're racing then you need to make sure you only pump in one direction (called sculling, in this case) or you'll fall foul of rule 42:

http://www.sailing.org/tools/documents/42Interpretationsforb...


Those people have not worked on large enough teams; we had 50+ committers to an SVN repo before switching to Git, and it was very apparent where the limitations with that 'simple' workflow were.


I don't know how many committers have been on the average project I've worked on, but it's probably 25+, and I've worked on several with 50+ - and I don't know how you'd even make Git work at that sort of scale. Obviously people do actually do this, so I assume it must work somehow; I just don't see how it's going to work particularly well.

The larger projects I've worked on have typically used Perforce, but I used Alien Brain (which is pretty terrible) for some of the older ones. The check in/check out workflow, which is the same in each case, is basically what makes it all work once you get past a handful of people. Just simply being able to see who else is working on a (almost certainly perfectly cleanly mergeable) file that you're contemplating modifying is a big time-saver.

(I've used SVN, at a much smaller scale. It has similar Lock/Unlock functionality, which is a good start, but the performance in general of that seemed to be really bad. Locking a few hundred files might take a full minute, for example. Meanwhile, Perforce will check out 1.9 gazillion files in ten seconds, and that's only because it takes nine seconds to change NTFS permissions for that many files.)


> I don't know how many committers have been on the average project I've worked on, but it's probably 25+, and I've worked on several with 50+ - and I don't know how you'd even make Git work at that sort of scale.

Well, I actually don't understand how you can make it NOT work :) You obviously have to work with branches split per project/sub-project and separate repositories for different apps. You have to find the branching model that works for you; it doesn't always work with a dev branch (we don't do that; we have bug, feature, release and master branches).

SVN is so far out of this league that I don't even try to understand why people use it.


When you've got a lot of people, you've got a lot of changes - that's the long and the short of it. This is one thing the check in/check out model (as exemplified by Perforce, among others) is really good for managing. When you go to check out a file, you find out straight away if someone else has it checked out.

If you're just going to make a quick tweak, you'll probably risk it. Either they check it in first, and you've got a very minor merge, or you do it first, and they've got a similar minor merge. Not a big deal, in either case. (And when your gamble doesn't pan out, tough luck. No sympathy from anybody. You knew the risks.)

But, if you're going to make a very large, sweeping change, you'll probably be a bit more cautious. And that's fine: you can go over and talk to them, or message them, or email them, or whatever, to find out what they're doing, and coordinate your modifications appropriately.

I've literally never once found this less than somewhat useful. It's, like, the source control analogue of static typing: a huge pain in the arse if you're not used to it, but, if you've seen it play out, it's a mystery how anybody gets any work done in its absence.

(Of course, if you use git, maybe you can just email/Slack/etc. everybody on the team before you go to edit a file, just in case, and then wait for anybody to mail you back before proceeding... well, I don't deny that would work, assuming everybody checks their mails/Slack/etc. regularly enough. After all, I hear people get work done in dynamically typed languages too! But just think how much better things could be, if the version control system could look after this for you!)


I don't understand the hatred for perforce. It works really well. The times I need an offline branch to work on and keep history of commits are very rare.


I moved from git to perforce when I switched companies, and even though I actually really like git and consider myself reasonably proficient, I don't mind perforce.

My one real pain point with it isn't so bad, but I dislike how perforce tends to work at a file level instead of a commit level. It's hard for me to make several related changes which all touch the same files, like a series of patches which all refactor the same area of code, but which I would like to review and discuss separately, and potentially revert some/all of.

It's hard to manage this with shelves, because perforce can't handle unshelving the same file in two different CLs. I could submit all the changes to a separate stream, but perforce streams just don't usually work well for us, and it's still hard to experiment by constantly making and rolling back changes.

I guess I'm probably only used to this workflow because I have experience with git, but this is the time when I really miss the granularity of a git commit (and I'm doing a pretty gigantic refactor right now... so it's hitting me quite hard).


I recently had to do something similar with an ancient SVN repo, that had to stay in SVN.

I simply started a git repo in the same base directory as the SVN repo, and did my work in there. Every time I merged a branch back to master I committed to SVN's ^/branches/dev. Just add `.svn` to `.gitignore` and `.git*` to the SVN prop ignore.

You _will_ want to merge from upstream (Whatever Perforce's equivalent to `svn up` or `git pull` is) often, I was merging from upstream before every SVN commit (SVN mostly forces you to do this, `svn status --show-updates` is a huge help here but I don't know if Perforce has a similar feature).


There are googlers who do the same thing, mostly work in git until the branch is ready for one big perforce changelist to review and commit.


Not ideal, but you can use shelves, branches or streams. Or even complement it with .git there (it still works).


Perforce is awesome, now that there is review web interface for it (swarm), I'm completely happy!


same. i mean, it could be that perforce has great visual tools and people prefer complicated, esoteric cli tools. it has the perception of being more hardcore.

perforce's APIs are actually pretty good as well. they aren't documented that well, but they are easy enough to build some complicated tools with.


Hmm... I have to say the APIs and command line tooling is not where Perforce shines ;)

I found the APIs generally a disaster, and rapidly gave up on them. It was much easier to just run p4.exe and scrape the output. But... oh god. That's not saying much. The command line client was shit too. It eventually proved possible to get everything I wanted, but the data was never quite in the right format, the available queries were never quite suitable, and the results were never quite normalized enough. In the end I had to run p4.exe several times, discarding a lot of redundant data while doing this, and then cross-referencing the multiple sets of results afterwards to get what I wanted.

(One thing I had hopes for initially was p4's supposedly Python pickle-friendly output. But this was no good either! - since, from memory, p4 produces multiple pickles'-worth of data in one go, but Python will only read one pickle at a time, so you couldn't actually read it in from Python without a bit of work. Really made me wonder whether anybody actually tested any of this stuff. Really felt like the thing had been coded by a bunch of Martians, working from a description of the process made by people who'd never used Python and weren't even programmers in the first place.)


There are a huge number of developers out there who spend the overwhelming majority of their time working solo or with a couple of peers.


Sure but for me, even on a solo project git works so much better than svn because I can create multiple branches and easily rebase or merge them as needed.

When exploring ideas I might have 3 or 4 possible paths and it’s really nice to have the option of breaking them into different branches until I figure out the right path.


Oh, I agree - but I already know git fairly well. IME those people who have been using TFS solo for a decade have a much harder time seeing the benefits, because those benefits are totally alien to the development workflow they have become used to.


You don't even really have to be that many people on a project for some limitations to become apparent.

With TFS and SVN, it's already enough to have stable versions you need to backport fixes and the occasional feature to in order to end up in branching hell, for example.


> A lot of people simply don't understand why source control isn't a problem solved 100% by tooling in a way that requires little to no interaction at all.

This is true of basically all problem domains: "Why isn't X easy? Can't you just do Y?" And it usually just indicates that the person asking the question doesn't appreciate how difficult and nuanced a real solution to X is.

Real-world problems are often difficult and have non-obvious, sometimes-intractable challenges. "Easy" solutions often cause more problems than they solve — there's a reason why more complicated tools like git have taken over from less-capable tools like Subversion and CVS.


that can happen, but it can also serve as an excuse.

in git, for example, why can’t i switch branches with uncommitted changes? it makes no sense why i can’t do that, but yet, there it is.

> there’s a reason why more complicated tools like git have taken over from less-capable tools like Subversion and CVS.

i don’t think the reasons have much to do with usability or even capability. i came to git from perforce, where the latter was much easier to use, reason about, fix issues, and has much better tooling, in particular visual tools. it’s also amazing how slow git is.


> in git, for example, why can’t i switch branches with uncommitted changes? it makes no sense why i can’t do that, but yet, there it is.

Except you can?

    [stouset:~/Development/some-project] some-branch(+8/-4)+ $ git status
    On branch some-branch
    Your branch is up to date with 'origin/some-branch'.

    Changes not staged for commit:
      (use "git add <file>..." to update what will be committed)
      (use "git checkout -- <file>..." to discard changes in working directory)
    
    	modified:   path/to/file1
    	modified:   path/to/file2

    no changes added to commit (use "git add" and/or "git commit -a")
    [stouset:~/Development/some-project] some-branch(+8/-4)+ $  git checkout master
    M	path/to/file1
    M	path/to/file2
    Switched to branch 'master'
    Your branch is up to date with 'origin/master'.

> i don’t think the reasons have much to do with usability or even capability. i came to git from perforce, where the latter was much easier to use, reason about, fix issues, and has much better tooling, in particular visual tools. it’s also amazing how slow git is.

Perforce isn't free. git's popular predecessors were.

That said, I've found in my experience that way more people seem to hate Perforce than seem to hate git.


You can only switch like that sometimes, and it would have been more intuitive if there were an implicit per-branch stash and stash pop when switching branches, which would always let you switch.

See https://stackoverflow.com/questions/22053757/checkout-anothe...


Git’s direct predecessor, bitkeeper, wasn’t free.


> in git, for example, why can’t i switch branches with uncommitted changes? it makes no sense why i can’t do that, but yet, there it is.

Because switching could blow away uncommitted changes in the working tree, which would mean terrible data loss or a conflicting merge, so when that could happen, git prevents it.

Use `git stash` to stash your uncommitted changes before switching branches and you'll no longer have data loss.

If you want to apply those changes again use `git stash pop`.

If you don't want those changes use `git stash drop`.
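The stash round-trip described above looks like this in practice. A sketch in a throwaway repo, with invented branch and file names:

```shell
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "base"
git branch other                  # a second branch to switch to

echo "work in progress" > notes.txt
git add notes.txt                 # an uncommitted (staged) change

git stash                         # park the change
git checkout -q other             # switch freely; the working tree is clean
git stash pop                     # bring the change back on the new branch
cat notes.txt                     # prints: work in progress
```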

Why isn't there a commandline option to checkout a branch and do a stash operation at the same time?

Because no one has needed it enough to add it.


>Because no one has needed it enough to add it.

Or because those that did need it whipped up a quick and dirty alias to cover that use case.
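As an example, such a quick-and-dirty alias might look like this (the name `sw` is made up; this naive version misbehaves when there is nothing to stash or when the stash doesn't apply cleanly on the target branch):

```shell
# Define the alias in the current repo (add --global to make it available everywhere)
git config alias.sw '!f() { git stash && git checkout "$1" && git stash pop; }; f'

# Usage: stash local changes, switch branch, reapply the changes
# git sw other-branch
```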


I wouldn't want this. This would only be useful if the stash is automatically popped when switching back.


I think the implication is that switching branches shouldn't blow away your uncommitted changes. And one shouldn't have to stash their changes to switch branches.


> in git, for example, why can’t i switch branches with uncommitted changes? it makes no sense why i can’t do that, but yet, there it is.

You can, if the changes don't conflict with the branch you're changing to.


> This is true of basically all problem domains: "Why isn't X easy? Can't you just do Y?" And it usually just indicates that the person asking the question doesn't appreciate how difficult and nuanced a real solution to X is.

Upvoted because this is so true, yet it is also only half the story -- let's call it the ignorant or lazy users half. The other half is the ignorant or lazy developer side: that is, developers who don't properly prioritize the 90% use case, and who don't appreciate or understand the natural mental models they should be accommodating and treating as problem constraints.


This is the crux:

> they needed to be convinced that curation and management of source history was one of their job responsibilities, which was tough because it really wasn't previously, at least not in any meaningful way.

Getting people to be responsible for the whole is hard.


Similar experience here - I had a team of developers who had to move off of an internal TFS server to Bitbucket / git. I gave them all of the reference/doc links, and gave them a brief overview of how git is fundamentally different from a centralized VCS. At the end of several sessions w/git (using git bash on Windows), their comment to me was: "But we're used to clicking on buttons in Visual Studio"


Visual Studio has a GUI for git, if your developers feel more comfortable that way, but in my limited experience I've had to help a few people who accidentally committed more files than they intended using it, so the opaqueness of the GUI may be a disadvantage here.


Does it also support Github pull requests and Gerrit reviews? From what I've seen, people rarely use "just" git.


The problem is even deeper than that: for a lot of IDE developers, and particularly Visual Studio ones, their entire world is in the IDE. Anything outside of it may as well not exist, and anything the IDE can't do is impossible. Many wouldn't even be able to compile their own projects if you took the IDE away.

An email I got the other day was "we can't debug a problem with x because it's built on the CI server" - but the problem was in the first few milliseconds of startup, so "attach to process" was no good, even if they knew that was possible. The idea of using a debugger outside of the IDE (mdbg in this case) never crossed their mind.

IDEs are great for coding, but too many devs are way too reliant on them.


Why shouldn’t they be? Why should compiling a file or debugging a project require so much command line hackery? Why can’t IDEs actually do what they promised (integrate dev tools and make them more usable)?

It is so weird that in the 90s I could do MORE in my IDE than I can today; somehow our tools have trended toward increasing complexity while IDE advancement has come almost to a standstill, with a preference for simpler, less integrated ones (e.g. VS Code).


I used to feel the same way, especially as a VIM user. Why is every IDE worse than VIM, and why do I have to learn a new IDE for Python, C++, PHP, and Java?

Enter the Jetbrains IDE. The same basic program for almost every programming language and a terrific VIM plugin to boot. It's on the order of $100 per year. Well worth the cost, and the only non-FOSS code on my computer other than the MP3 codecs.

I usually don't use the built-in Git commands in PyCharm, but I do use the built-in terminal to perform Git operations. The only Git work I do with the IDE directly is resolve merge conflicts, because that specific tool is terrific.


Git is the poster child for "leaky abstraction"


I disagree that this is the case. I see it more as git is a DAG, and git's toolchain is simply a set of tools for manipulating the DAG.

It's not so much that git is a leaky abstraction as it is that git isn't an abstraction. If you don't know what a DAG is or how commits, files (e.g., blobs), trees, etc. are mapped into the DAG, of course you're going to have a hard time!
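You can poke at that DAG directly with the plumbing. A sketch in a throwaway repo, assuming git is installed (the file name is invented):

```shell
tmp=$(mktemp -d) && cd "$tmp"
git init -q
echo "hello" > greeting.txt
git add greeting.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "first"

git cat-file -p HEAD                 # the commit object: a tree id, parents, author, message
git cat-file -p 'HEAD^{tree}'        # the tree: maps file names to blob ids
git cat-file -p 'HEAD:greeting.txt'  # the blob: the file contents themselves
```

Commits point at trees and parent commits, trees point at blobs and subtrees; that linked structure is the DAG the porcelain commands manipulate.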


I could write the git internals code from memory and it wouldn't help me know what `git checkout` was going to do.


That's almost patently false, since `git checkout` is pretty much just a tool that takes a commit (or a name that points to a commit, like a branch or tag name) and updates the working tree, index, and the HEAD pointer to match the contents of the commit, with some extra safeguards to make sure uncommitted data isn't lost.

Three examples (that cover 95%+ of use-cases):

    $ git checkout some-branch
> Updates the index and working tree to reflect the contents of `some-branch`, and updates the HEAD pointer to indicate we're on the `some-branch` branch.

    $ git checkout -b some-branch
> Updates the index and working tree to reflect the contents of `some-branch`, with the "convenience" of creating it first through wrapping `git branch`. This is identical to calling `git branch some-branch; git checkout some-branch`.

    $ git checkout <optional-commit> -- path/to/a/file
> Updates the index and working tree to reflect the contents of the named file on the current or named branch/commit/tag/whatever.

TL;DR, `git checkout` has a very clear and well-defined responsibility: update the filesystem and index to reflect the contents of some commit. Various flags and options cause it to perform reasonable variations of this core responsibility (create a branch that doesn't exist first, or update individual files rather than the working tree as a whole). If the commit-name provided is a branch (and you're not checking out individual files), it also updates the HEAD pointer to point to the named branch, so git "knows what branch you're on".

I would be absolutely astonished if you could write the git internals (or even just knew how they worked) and the idea of a tool to make the filesystem look like the internal contents of the repo wasn't immediately apparent. You might not have realized that that's all that `git checkout` does, but once you understand that, it's extremely straightforward.


So it always updates the HEAD pointer, except when it doesn't, and you give it the name of an existing branch that represents the commit to change your working tree into, except when you specify -bf to use the HEAD commit, and except when you specify -b and a branch name that doesn't exist. You can also specify start_point to use instead of HEAD.

So, the checked out commit is the second ref specified, or HEAD, unless the flags are such that you can only specify one ref.

Git checkout also "merges" uncommitted changes with your new HEAD, which you didn't mention and makes me wonder why the command isn't called "git merge".


Your comment would be at home on https://git-man-page-generator.lokaltog.net/


I can forgive the '-b' creating a branch as a convenience. Having checkout perform safe operations by modifying HEAD when you pass a branch but destructive operations on the working tree when you pass a pathspec is too much. I appreciate your attempt to build a mental model for checkout, but I just don't find modifying HEAD and destructively overwriting working changes to be related operations.


I don't think leaky abstraction correctly describes the problem here, because the leaky thing (the fact that history is not linear, and multiple people edit the same files, and that everything is distributed) is not actually an abstraction. In my mind, Git forces you to deal with the fact that history is inherently nonlinear, and then you can choose to create a linear history by the way you use Git.

Git's main abstraction is the fact that everything is a content-addressable blob. This "leaks" but it's not a major contributor to the problems people have using Git, at least from what I've seen.


> In my mind, Git forces you to deal with the fact that history is inherently nonlinear, and then you can choose to create a linear history by the way you use Git.

That is not the problem with Git. Mercurial or Darcs also force you to deal with it all the same yet are not anywhere near the UI clusterfuck that Git is.

> This "leaks" but it's not a major contributor to the problems people have using Git, at least from what I've seen.

It absolutely is, because it leads to commands having unintuitive and incoherent behaviours and thus to the porcelain being impossible to model top-down.


> is like hearing they need to understand the details of how a transmission works before they go and drive an automatic

Which would be a reasonable attitude for your average driver but not for a professional mechanical or automotive engineer.


Why?

Unless I'm misunderstanding, you seem to be saying that being a programmer implies having a desire to understand every program's internals. But I don't care how Photoshop stores the 30-layer document I'm editing, and I feel like I shouldn't have to. I can't see why Git should be treated differently.


That's a great question and I had to think a lot about how to answer it. I think there are 2 aspects to consider whether something is worth learning: relevance to your current skills, and use of the knowledge.

We insist, especially when hiring, that programmers know something about the internal design of other software that they use, but don't develop, on a day-to-day basis. Any programmer, depending on specialty, who doesn't have some idea about the design of their operating system, networking stack, databases, or hardware would flunk out of interview processes (and probably rightly so). The reason we insist on this specialized knowledge is that most of the time these things work fine, but when they don't, the good software engineer proves their worth by being able to dig deeper.

Version control software is used just as frequently as any of this other software. But somehow there seems to be this feeling that it's not worth learning about and should just be a black box (I think package managers are treated similarly). Understanding git internals and design is a valuable software engineering lesson in and of itself.

The other aspect is that having knowledge of Photoshop file formats is of limited utility unless you're an Adobe employee or Gimp developer (I don't know if the spec is open or not). Photoshop itself is closed-source and not as extensible or scriptable as an open-source command-line tool.

That's not true for git - knowing git internals reaps productivity benefits in your other work too. Treating it as a black box robs you of the knowledge to do some really cool stuff.

Broadly speaking this applies to any other software tool used by programmers - the more you know of its internal details, the more proficient you'll be in its use.

So in both respects (practical and from a learning standpoint) I think knowledge of git internals for a git-using programmer is a desirable thing.


While I mostly agree with you, I used linear VCS systems (mostly Perforce and SVN) for decades, and I still have no idea how they actually work. Git is fairly unique in that it basically forces you to understand how it works in order to use it, which I think is a major failing. I mean, it's nice that you can understand it and that that knowledge means that you can be more effective with it, but I don't think it should be compulsory. Currently it is compulsory for anything but the simplest workflows.


This doesn't seem specifically a linear vs distributed VCS thing, as much as it is a unique property of git. I previously used Mercurial, and at no point did I feel I had to understand how the VCS stores its commit graph in order to know what running some set of commands will do.

git is unique in its apparent assumption that it's nothing more than a series of convenience scripts that let you work on the raw filesystem representation a bit more easily. In a way it's more a filesystem-level representation of a DAG with some sugar on top than it is a DVCS.


> This doesn't seem specifically a linear vs distributed VCS thing, as much as it is a unique property of git. I previously used Mercurial, and at no point did I feel I had to understand how the VCS stores its commit graph in order to know what running some set of commands will do.

Ditto. I've used Darcs, Mercurial and Bazaar, all are distributed, all have various levels of issues, but none of them required that I understand everything from the storage model up in order to actually get a grasp on them. I still have better intuition for Mercurial than Git (and long for a number of its features like GIVE ME FUCKING REVSETS ALREADY) despite not having seriously used Mercurial in years and having no idea whatsoever what goes under the CLI (and not caring a whit).


I think the argument would be that you need to understand what those 30 layers in Photoshop represent and how they interact with each other, not how they are stored.


Specializations exist because nobody can be an expert or even reasonably well informed about how everything works, even their own tools. I don't expect most professional engineers to understand the internals of their text editors or their compilers for that matter. Why should we expect it for something like a source control tool?


The reality is that in basically all professional domains you are expected to be able to use, at a professional level, the software required to do your job.

For developers that includes version control and package managers, amongst other things.

Do you need to know how git is implemented? Of course not. No more than you need to know how your compiler converts source code to executables.

...but understanding the basics of the tools (e.g. what paths and layer groups are, to compare to Photoshop) isn't something you can just wave your hands in the air at and say "not my problem".

It's your job to know this stuff.


That isn't what I'm arguing; of course we must know how to use our tools. If that requires also understanding implementation details, even trivial ones (e.g., "it's a DAG"), or if the interface and words used to describe the usage are non-standard, awkward, or worse act to obscure what is actually going on, the tool and/or its documentation is poor.

For example: in this case, knowing what a branch is, how to create and delete one, and how to merge with one is necessary. Knowing that the branch identifier is a hash, called a "reference" or similar jargon and what that entails, internally, is superfluous and adds pointless complexity: if that's really required then git isn't good. I have more important things to think about as a developer than what internal machinery operates git.


The fact that commits are organized in a DAG is absolutely not an implementation detail. It's one of the core concepts of distributed version control.

It's also kind of silly to call git's terminology "non-standard". Git was the first truly widely used DVCS, so if anything, git's terminology is the standard terminology by definition.


> I have more important things to think about as a developer than what internal machinery operates git.

That could be extended to other tools like firewall configuration, TLS setup, library details, logging configuration, monitoring, etc. But you won't get very far in fixing any problems if you don't read documentation and investigate further.


But git doesn't actually require you to know these internals.

You don't need to know the format of git pack files or the network protocol in order to use git.

You do need to understand the concepts of commits and branches.


But most of us aren't version control engineers; we just use version control as one of many tools in our toolbox. I think a more accurate analogy would be that a taxi driver doesn't need to understand how an engine works.


A taxi driver does not need to know how an engine works. But I'd be amazed if a professional race car driver didn't understand their car.

We all expect software engineers, regardless of specialization, to understand operating systems, networks, databases and hardware at a better-than-layman level. Try saying "that's just a tool in my toolbox, I don't care how it works" about one of those topics in an interview.


We're stretching the analogy a bit at this point, but assuming there are much fewer race car drivers than other driving professionals (taxi drivers, etc), wouldn't the professional race car driver be equivalent to something like the top 1% of software engineers? In which case, I agree, because I'd expect a software engineer paid in the top percentile to know (at least roughly) how all their tools work. But for everyone else, you can get really far in the industry without knowing how any of that stuff works under the hood (or at least not more than you have to for the work you do).

> Try saying "that's just a tool in my toolbox, I don't care how it works" about one of those topics in an interview.

I've met plenty of engineers who couldn't tell you how any of those things work at a low level, but still manage to maintain high-paying jobs. I've also had plenty of interviews where these topics never got brought up. Maybe that says more about hiring practices than anything else, but I think you're vastly overestimating the worth of that knowledge to most companies and individuals.


>I've met plenty of engineers who couldn't tell you how any of those things work at a low level, but still manage to maintain high-paying jobs. I've also had plenty of interviews where these topics never got brought up. Maybe that says more about hiring practices than anything else, but I think you're vastly overestimating the worth of that knowledge to most companies and individuals.

The question is also simply what you pay for and what is in a job description. To simplify: you have to pay a lot less for someone treating software solutions as black boxes with known behavior than for someone who understands what is done.

You can of course ask in a job interview about everything from compiler optimization to network intrusion detection to image processing, but you had better make sure that is actually what you need.


> To them, hearing that they need to read a chapter about the internals of a source control system before working on a project is like hearing they need to understand the details of how a transmission works before they go and drive an automatic

Shouldn't a mechanic have an understanding of how an automatic transmission works? Similarly, shouldn't a software developer have an understanding of how version control works? In general, a professional should have a very good understanding of the tools they use for their work as well as what they're working on.


I can't speak to your experience on TFS, since I've never used it, but your description of it sounds somewhat similar to Perforce.

> History is linear, full stop, and conflicts are handled by essentially forcing a rebase on commit.

This is similar to Perforce, but not something I miss. That forced rebase in Perforce has the very real possibility of causing breakages, and there is no way (at least in Perforce) to reliably handle it.

I also don't think history should be linear; the cheaper branching model in Git in my experience leads to people actually taking advantage of it; contrast this to Perforce, where branches were expensive: people wouldn't bother.

At least in the case of Perforce, the fact people didn't use branches forced them to put all of a feature into a changeset (Perforce's closest equivalent to a commit), which was usually way too much; this results in multiple, distinct changes being squashed together. The lack of cheap branches makes it hard to build a change off of two in-progress changesets, again because the system cannot efficiently represent that state.

And then there was just tooling: Git's add -p option for splitting off partial files into a separate commit, the ability to rework the convoluted history that came from my discovery of how best to implement something to something that encapsulates discrete changes that can later be reverted or cherrypicked gracefully (without reverting everything, unless that's really required) are to me, extremely useful tools that just don't exist elsewhere.

> Some admin does branching and merging and emails us if there are weird conflicts, we don't worry about that. Why make it more complicated?

(To be clear: this does not match my understanding of Perforce; branches there are possible without an admin.)

I just simply can't fathom why someone would want to have to go through another human for a branch. It seems like so much bureaucracy for something that, as Git shows, does not require it.

> why source control isn't a problem solved 100% by tooling in a way that requires little to no interaction at all. To them, hearing that they need to read a chapter about the internals of a source control system before working on a project

I think because, if we accept that history is not linear, and should not be represented as such, then history must therefore be represented as a DAG. A DAG, in my experience, is about the point where people start trying to cheat, for whatever reason, with incompatible and flawed mental models. This bites them in the ass at some point, and they can't understand what they're doing wrong within that flawed mental model, because the model itself is flawed. (And I see this all the time in software engineering, not just with Git. People are too quick to permanently analogize something to something else that is at best a crude approximation.)

Now, as another poster in the comments notes, I'd temper with the above with: some of git's tooling and UI could be better. The mix of "index"/"cache"/"staging area" to all mean the same thing, the variety of things "git reset" can do, etc. all are terrible UI on top of a superior model, but do contribute to the confusion newcomers experience. I hope some day this changes.


> At least in the case of Perforce, the fact people didn't use branches forced them to put all of a feature into a changeset (Perforce's closest equivalent to a commit), which was usually way too much; this results in multiple, distinct changes being squashed together.

First add a feature flag for your feature, then you can have as many small changelists as you like.


This doesn't help at all with deep refactoring. The "git way" of producing small, self-contained changes covers all possible changes, not just feature additions.


Good point, thanks! In my experience all big refactorings (short of a rewrite) can be done incrementally if you're clever enough, and that has a lot of benefits. But maybe doing them in another branch is a valid approach too.


"These folks didn't need git tutorials, they needed to be convinced that curation and management of source history was one of their job responsibilities, which was tough because it really wasn't previously, at least not in any meaningful way."

I've had a very similar experience. IMO, your source control is an essential working tool, like your text editor, and you should be very proficient with it, but plenty of developers that I've met don't share that view.


I believe people have a hard time with Git because it's simply not user friendly. Once you've grokked a handful of essential concepts, Git becomes easy to use proficiently, but both the user interface and documentation are so user-hostile that most people never get to that point.

A few examples:

- `git reset` command can perform wildly different actions depending on its options

- `git revert` does not do what most users would hope it does

- if you actually need to revert your working tree changes, you might need to use `git checkout`

Naturally, people get confused and reach for the docs, which are filled with even more newbie-hostile terms like 'tree-ish', 'hunks' and 'refspec'.

As a result, people tend to see Git as this scary, incomprehensible hairball.
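To illustrate that last point concretely, here is the `git checkout` form for throwing away uncommitted edits. The repo setup and file name are made up for the demo:

```shell
# throwaway repo so the example is self-contained; notes.txt is hypothetical
cd "$(mktemp -d)" && git init -q .
git config user.email you@example.com && git config user.name you
echo original > notes.txt && git add notes.txt && git commit -qm init

echo scribble > notes.txt     # an edit we regret
git checkout -- notes.txt     # discard the uncommitted edits
cat notes.txt                 # back to "original"
```

Nothing in the word "checkout" suggests "revert my file", which is exactly the discoverability problem being described.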


A lot of it is easy UX wins left on the table.

"git revert" would have been clearer if it were identical but just named "git retract". It's a more natural match because reverting usually just means going back to how things were, whereas retracting can and often does mean to announce that you are going back. After a newspaper prints something untrue or unfounded, they issue a retraction.

"git checkout -b" is confusing because the main and most significant thing it does is create a new branch, and checking it out is secondary to that. If it had been "git branch -c" instead (create a branch also check it out) that would have been more natural than "git checkout -b" (check out a branch that, incidentally, you will have created).

"git reset", as you mentioned, does two things. This could have been made easier by just splitting it into two commands. Perhaps the first two forms could be called "git unstage". The git book ( https://git-scm.com/book/id/v2/Git-Basics-Undoing-Things ) explains it under a heading called "Unstaging a Staged File", which suggests that it's a more natural term.


git reset should be renamed to git remove, because add is used to add to the staging area. Or "git add" should be "git stage".

"git remove" should then be "git delete".


The staging area was a mistake because it's hidden in the repo where I can't build and test it. I should be stashing changes I don't want to commit, and then committing the known-good workspace.


I would argue git is not novice friendly. Its original intended audience was not lay people learning how to program; it was seasoned kernel developers working on the Linux source tree. It made the assumption that its userbase were experts in their domain.

> Naturally, people get confused and reach for the docs, which are filled with even more newbie-hostile terms like 'tree-ish', 'hunks' and 'refspec'.

Yup, these are everyday terms for an expert in computer science, algorithms, and software design. If I were a student in medical school I'd expect the books I'm learning from to use the expert terminology of my domain of study when describing the body and its functions, not layman's terms such as -- "the hip bone is connected to the Red-Squishy-Thing .... ELI5 me plz" ...

I agree git is complex, and can be scary. But building software is a complicated process rife with engineering pitfalls! If one is unable to properly use the Lego building blocks, then stay away :D

I do believe when used properly, a productive Git workflow using Branches and sane commit messages can help to give Visibility to your project development life-cycle and help to make an issue or task on a project understandable in scope and difficulty by segmenting ones development work into comprehensible mergable pieces (branches) within the project.


> I would argue git is not novice friendly.

Git is not friendly period, it's a huge pile of leaky abstraction and incoherent commands, the "high-level UI" being closer to shortcuts for common sequences of lower-level operations than an actual abstraction. It simply can not be understood "top-down".

> It's original intended audience was not lay people learning how to program. It was seasoned kernel developers working on the Linux source tree. It (made) assumption(s) it's userbase were experts in their domain.

Domain expertise has nothing to do with it, unless the domain is playing with footguns and expertise is having toes left.

> I agree git is complex

Git is not complex, it just has a very, very shitty API.


Git and its documentation are a prime example of what languagelog calls "Nerdview": http://languagelog.ldc.upenn.edu/nll/?p=276

    people with any kind of technical knowledge of a domain tend     
    to get hopelessly (and unwittingly) stuck in a frame of 
    reference that relates to their view of the issue, and 
    their trade's technical parlance, not that of the ordinary 
    humans with whom they so signally fail to engage.


Some example of "wrong frame of reference" when "ordinary humans" even include "all other programmers not in his own team":

https://blogs.msdn.microsoft.com/oldnewthing/20110512-00/?p=...

I know a few "nerds" like that. They tend to give technically completely correct but misleading or dangerous or simply unusable answers.


Could this principle not also apply to any journal/conference paper? For the most part, one has to have a basic understanding of the field, and read through prior research to have a better understanding of the current problem domain.

We don't expect journal papers to be written at an elementary level. But people who make an effort to get themselves up to speed in the area can understand them. The same thing applies to a lot of tools that are used in software development.


> Could this principle not also apply to any journal/conference paper?

Git's man pages are not journal papers.

> For the most part, one has to have a basic understanding of the field

"Basic understanding of the field" does nothing for understanding Git.

> and read through prior research to have a better understanding of the current problem domain.

You should not have to "read through prior research" to be able to use a basic VCS tool, literally no other VCS asks that of you, not even Darcs (especially not Darcs, which actually put real effort into a sensible CLI).

> We don't expect journal papers to be written at an elementary level.

We don't expect tooling documentation to require a phd unless the tool's entire purpose is something only a phd would need.

> The same thing applies to a lot of tools that are used in software development.

No. It applies to very few tools that are used in software development. And git most definitely is not one of them.


> "Basic understanding of the field" does nothing for understanding Git.

Assuming one has at least read up on data structures, then one would have read about graph structures and have heard about what a directed acyclic graph is.

> You should not have to "read through prior research" to be able to use a basic VCS tool

You should have some understanding of what source control is and have read at least some of the documentation of the VCS tool you are using. I've never used hg and wouldn't know where to start without reading some of the documentation first. But having prior knowledge of VCSs and distributed VCSs would allow me to target my search for documentation about how to commit changes, how to publish them to a remote repository, how to make a branch, etc.

> We don't expect tooling documentation to require a phd

Many of these papers can be read and understood by students who are pursuing their undergraduate degree. But they do need to put some effort into it.

> No. It applies to very few tools that are used in software development.

Can I use grep without a basic understanding of how regular expressions work? Could I use a debugger like gdb without understanding concepts like the call stack and variable addressing? You have to be able to read through documentation to effectively use a lot of tools in software development.


> Assuming one has at least read up on data structures, then one would have read about graph structures and have heard about what a directed acyclic graph is.

Knowing what a DAG is doesn't help with using git, and the fact that Git is singularly problematic despite every DVCS being a DAG indicates that DAGs have nothing to do with Git's issues.

> You should have some understanding of what source control is and have read at least some of the documentation of the VCS tool you are using.

Basic understanding of source control and the ability to read correctly-written documentation (which Git's is not) is not "read[ing] through prior research".

> Many of these papers can be read and understood by students who are pursuing their undergraduate degree. But they do need to put some effort into it.

Completely missing the point.

> Can I use grep without a basic understanding of how regular expressions work? Could I use a debugger like gdb without understanding concepts like the call stack and variable addressing? You have to be able to read through documentation to effectively use a lot of tools in software development.

Your original assertion was not the ability to read some documentation (under the assumption that the documentation actually makes sense, which isn't the case for git).

If you're not going to make even cursory attempts at basic honesty and coherence, there's no point to a discussion, enjoy playing with yourself.


> correctly-written documentation (which Git's is not)

How does git's documentation fundamentally differ from tools like ffmpeg or grep? That is, what makes it incorrect? I've read through those man pages and have been able to figure out how to use the CLIs for those commands.

>> We don't expect tooling documentation to require a phd

[ ... ]

> Completely missing the point.

Using hyperbole doesn't help you make a point.

> enjoy playing with yourself.

You don't have to resort to using insults.


I know a lot of CS experts who go weeks without saying any of “hunks”, “refspec”, or “tree-ish”. You are making excuses for needless, opaque jargon.


I was a pretty good CS student and do not remember any of those terms outside of git.


Out of these, only “hunk” is a somewhat common term.

“refspec” and “tree-ish” are obscure and git-specific.


I've always been puzzled about why programmers aren't considered worthy of tools designed with thought given to user interface.


Very few dev tool organizations employ dev-oriented UX designers. It’s mostly the developers and PMs themselves doing UX design, when they have been evaluated and promoted based on other things (coding, project management).


Because it's hard demanding proper UI when you intuitively know you couldn't build one to save your life, not only does it feel like whining, if it spreads you might be out of a job.


Yep. It's puzzling how it won over hg. Linux is a powerful mannequin I guess.


I suspect things would have gone differently if Github supported Mercurial.

Also Git is now more or less locked in to terrible naming for its actions - yes they could be renamed but that would probably cause more issues than its worth. And Mercurial isn't dead yet - Facebook uses it and they're rewriting it in Rust (yeah yeah) to make it faster. I don't think the VCS wars are over yet.


hg was slow. if linus, at the time, felt otherwise, it might have been hg that "won" when bitkeeper was dropped.


If Linus, at the time, felt otherwise, git probably wouldn't have been created.


One of the nice things about my current job is they are still with hg -- we are going to have to switch to start using Bitbucket. I expect Git will be easier to understand the second time around, but I'll never forgive it for making itself so hard to use.


Bitbucket supports mercurial.


Bitbucket Server (on premises) does not support Mercurial.

https://confluence.atlassian.com/confeval/development-tools-...


The best example I can think of right now as to where git's documentation is unusable: suppose you accidentally git add'd a file, and you want to remove it without deleting it on disk (since, you know, that's your only copy of it). This doesn't come up very often, so there's no way you're going to remember the command off the top of your head. The most reasonable guess is "it sounds like a special rm, so let's look at the documentation for rm."

Here's hg help rm:

    remove the specified files on the next commit
        Schedule the indicated files for removal from the current branch.
        This command schedules the files to be removed at the next commit. To undo
        a remove before that, see 'hg revert'. To undo added files, see 'hg
        forget'.
Oh, I don't want hg rm, I want hg forget. hg help forget confirms this:

    This only removes files from the current branch, not from the entire
    project history, and it does not delete them from the working directory.
Okay, let's ask git help rm:

       Remove files from the index, or from the working tree and the index.
       git rm will not remove a file from just your working directory. (There
       is no option to remove a file only from the working tree and yet keep
       it in the index; use /bin/rm if you want to do that.) The files being
       removed have to be identical to the tip of the branch, and no updates
       to their contents can be staged in the index, though that default
       behavior can be overridden with the -f option. When --cached is given,
       the staged content has to match either the tip of the branch or the
       file on disk, allowing the file to be removed from just the index.

For this to be any use, you have to remember that "index" refers to the staging area and not anything else that might reasonably be presumed to be an index (such as the commit history). Of course, if you knew there was a glossary, you could double-check the definition in the glossary, only to find that the definition (a collection of files with stat information, whose contents are stored as objects) is about as useful as the infamous "a monad is a monoid in the category of endofunctors," but the latter was supposed to be a joke.

Sure, some problems with git may be that people have trouble understanding the concepts. But many of them are due to git's impenetrable documentation and a few of the commands that do too many things. The sheer resistance of many adherents of git to recognize that it has a UX problem is just mind-boggling.
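For the record, the command being hunted for in this scenario is `git rm --cached`, per the doc excerpt above. The repo setup and file names here are made up for the demo:

```shell
cd "$(mktemp -d)" && git init -q .
git config user.email you@example.com && git config user.name you
echo readme > README && git add README && git commit -qm init

echo precious > notes.txt
git add notes.txt                 # oops -- didn't mean to stage that
git rm --cached -q notes.txt      # remove from the index only
test -f notes.txt && echo "still on disk"
```

Of course, the fact that the answer hinges on knowing "index" means "staging area" is the parent's whole point.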


In such cases, I look for my problem on stackoverflow. When I find a solution which seems right to me, I look up https://git-scm.com/docs to get a better understanding of what I am doing and why.

The problem with a lot of documentation is that it is not use-case driven. Much of it is pure functionality description. Such as:

- To fasten, you have to turn the nut clockwise.

instead of:

- To fit these two parts together you need to fasten the nuts onto the screws.


At some point, git status started showing the commands for moving changes in and out of the index, and it is an amazing timesaver for me.


Not to mention that the most common way of creating a branch is by using git checkout, and it also reverts files, and also switches branches, again all based on what flags you feed it


`git checkout` has one main responsibility: take stuff from a commit and put it on the filesystem (and put it in the index as well).

All of its functionality is simple variants on that. `-b` is the only real odd man out, and it’s just a shortcut for calling `git branch $branchname` first.
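The equivalence is easy to demonstrate (throwaway repo; the branch names are hypothetical):

```shell
cd "$(mktemp -d)" && git init -q .
git config user.email you@example.com && git config user.name you
echo hi > f && git add f && git commit -qm init

git checkout -q -b topic     # create-and-switch in one step
# ...which is the same as the two-step form:
git branch topic2
git checkout -q topic2
git branch                   # both branches now exist; HEAD is on topic2
```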


Git is much like Vim...user-unfriendly on the surface...but user friendly at the core. It is a simple tool that works, but is hard to understand for the beginner.


There's one key difference: vim's user manual is useful. git's user manual is not (see https://news.ycombinator.com/item?id=16590497 for an example).


Yeah that's not an excuse though. It could easily have been user-friendly on the surface too!


I suffered using SVN for years; when I switched to Git it was like pulling my head out of a plastic bag and breathing fresh air. It was unbelievably rejuvenating.

Maybe if you are already skilled with one kind of version control system, Git is easier to understand. But if you are starting from scratch with Git, is it more difficult that way? (I am not sure, I only have one experience to lean on.)


For a newbie I think SVN is definitively easier to learn at first. But then you have to live with the limitations. Git is the better choice in the end, especially if there are more than a few people working on the same codebase.


> The kicker for me was to realize that in git (mostly) everything is a reference to a commit id.

Same here, and in fact this is almost verbatim how a friend and colleague explained it to me when I was first introduced to git some seven years ago. Once I got a grasp of that, it was almost like walking out of a dense fog. I learned to be very proficient – comfortable even – with git very quickly thanks in no small part to his mentorship. One of the first things he showed me after I royally screwed up a couple of weeks into the new job, was how to use the reflog. This is also when he explained the importance of the commit IDs to me, much like how you did above. I won't say I magically knew how to do anything in git from then on, but it certainly made understanding things a whole lot easier.
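For anyone who hasn't met the reflog: it records every position your local HEAD has had, so even a botched reset is usually recoverable. A minimal sketch in a throwaway repo:

```shell
cd "$(mktemp -d)" && git init -q .
git config user.email you@example.com && git config user.name you
echo v1 > f && git add f && git commit -qm first
echo v2 > f && git commit -qam second

git reset -q --hard HEAD~     # the screw-up: the second commit seems gone
git reflog -n 3               # ...but every move of HEAD is still recorded
git reset -q --hard HEAD@{1}  # jump back to where HEAD was one move ago
cat f                         # prints "v2" again
```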


> I can not insist enough on how important it is to understand the fundamental design of git, and it is really not complicated.

I absolutely agree.

Everyone I know who has expert-level knowledge of git, and can do arbitrary things with it or solve arbitrary problems with it, thinks along the lines of "I want to get from this repository state to that repository state, what commands will get me there?".


If you grew up with Unix commands it makes sense (mostly), if you grew up with Photoshop it's really not very friendly.

Stackoverflow top item on how to redo a commit (one of most common things a new user may need to do):

  $ git commit -m "Something terribly misguided"
  $ git reset HEAD~
  << edit files as necessary >>
  $ git add ...
  $ git commit -c ORIG_HEAD
I run a ton of aliases - some of them functions - just to do basics. I really like git - but it is a very steep learning curve for many.


I grew up with Linux and have used cvs, svn, git, and mercurial, not doing complex stuff, just basic. Out of all of them git is the most inconsistent and difficult to learn, both at the command level and in trying to deal with the docs and community around it. Everyone seems to think everyone else is a mathematician and every doc or message has this air of “if you don’t intrinsically understand DAGs then you’re just not worthy to use this tool”.


Isn’t `git commit --amend` enough? No need to reset and commit again.


Sometimes it's easier to undo & redo. That's the reset approach. (Reset, your mistake goes away, do it over again but right, then commit.)

Sometimes it's easier to just make another change. That's the amend approach. (Keep what you've got, which was almost right, make another change--add, delete, or edit--so it's right, and re-commit as if you'd made that additional change from the start.)
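The amend approach, for comparison with the reset recipe quoted upthread (throwaway repo; the messages are just placeholders):

```shell
cd "$(mktemp -d)" && git init -q .
git config user.email you@example.com && git config user.name you
echo hi > f && git add f
git commit -qm "Something terribly misguided"

echo fixed > f                                 # make it right
git add f
git commit -q --amend -m "Something sensible"  # replaces the previous commit
git log --oneline                              # still just one commit
```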


Maybe if you only need to add something, but not if you included something you didn't want to in the commit, but you still want to leave the change there.


> If you grew up with Unix commands it makes sense (mostly)

> I run a ton of aliases ... just to do basics.

So... it doesn't make sense then.


Git is ... alright, but it doesn't deserve quite as much praise as it gets. I have well over a decade of experience using source control at a fairly high level, and I have experience with a wide variety of source control systems. Git is currently more or less the best thing that's out there in a lot of ways, but it still has a whole shit-ton of shortcomings.

Even when you use git according to the finest shade-grown, hand-picked, organic best practices available, you are still going to routinely run into situations where dealing with git is just a straight-up pain in the ass for no good reason. Every source control system has its weaknesses, but in some ways git's are even more annoying because so much of the rest of it is engineered so well. It's like opening up the trunk on a Maserati and finding a compartment that's lined with splintery unfinished recycled pallet wood and held together by chewing gum; it makes you wonder why it's there and why having some piece of such comparatively low quality continues to be tolerated year after year.

Again, overall I think git is pretty great, but the persistent lack of effort toward improving git's core weaknesses is incredibly frustrating.


> The kicker for me was to realize that in git (mostly) everything is a reference to a commit id.

sigh~

I think this is the actual problem with git.

The idea that there is a fundamental abstraction that no one ever told you about git that will let you derive and understand its behavior from first principles is just wrong.

What you need to understand (and all you really need to understand) about git is how to make it behave in a predictable manner, for your particular use case.

Everything else is to support that, not vice versa.


I would say a "tag" is a better example of a human-readable pointer to a commit. Branches are similar, but automatically "grow" to always point to the tip of the branch.


Tags are in general more immutable -- you wouldn't see a remote-tracking tag, for example!


That there aren't remote tracking tags is a bug, the reason to have remote tracking isn't purely to watch modifications, but to see where a ref came from.

The git version that's about to be released has a fetch.pruneTags feature I wrote which has more caveats than it should have because of missing remote tracking for tags.

There was a patch series to implement remote tracking for tags a while ago, but it's a lot of work to get right so it stalled. Hopefully it'll be re-visited.


How are tags immutable? They can be deleted and added again to point at a different commit id.


Tags can be added (created).

Tags can be deleted (permanently destroyed, though of course with the distributed nature of git, remote tags must then be handled in a separate step).

But those are not intrinsic properties of the tag itself. Those are properties of the repo that holds the tag.

The tag itself is immutable: It is a human readable name pointing to a commit id. And, to distinguish it from a branch, it can't be updated after it is created.


Setting aside the distributed nature of git, if I delete a tag and add it again to point at a different commit id, how can the tag be considered immutable? If I step away from my desk and come back and look at a commit id, I know it is exactly the same thing as before. I can’t say the same thing about a tag.


The tag is immutable because you had to first delete it, then recreate it, to change its value. Technically the tag might have the same label (say v0.0.1), but it still is a new tag, that so happens to be labeled the same as your old immutable tag.


It's the same way strings in Python are immutable yet still allow you to delete and recreate them. Git repos are not blockchains.


I don't know enough about blockchains to understand the phrase git repos are not blockchains.

If I say git log 191edc5e2e515aab1075a3f0ef23599e80be5f59 anywhere at any time it means the same thing each and every time. That is not the case with git log v0.0.1


The v0.0.1 is just a label for the tag object itself.

E.g. if you clone git.git you can see a tag for v2.16.2, the latest release. You can re-create that tag to point to something entirely different.

But if you do:

    $ git rev-parse v2.16.2
    86aabcca24951ccfb9392014c8a379992434a7df
    $ git rev-parse v2.16.2^{commit}
    ffa952497288d29d94b16675c6789ef83850def3
You can see that the annotated tag object (GPG signature and all) is really called 86aabcca24951ccfb9392014c8a379992434a7df, and that it points to a commit ffa952497288d29d94b16675c6789ef83850def3.

That 86aabcca... is the immutable part of tags, just as commit objects are immutable.

Note that this only applies to annotated tags, lightweight tags just point to the commit object themselves, so they're really just a sort of branch name that git treats differently in not ever advancing it to another commit.


Are you confusing immutable with tamper-resistant or something along those lines?


I am not sure :/. Is there some reading material you could point me at? Thanks!


I think the problem is the arcane interface and documentation. The verbiage in the documentation is unnecessarily difficult if you're not a computer science major.

I agree The Git Book is by far the best way I have found to learn it, but the commands remain arcane. I imagine they strike some kind of balance between what operations are useful day-to-day and what is possible with respect to the program design, but they fail so often because they show almost no consideration for interface design.

I just wrote a script for determining which files are ignored by .gitignore in a repository and the solution I arrived at is nothing short of painful to think about. It is ugly, anything but intuitive and impossible to build a test for. What is the excuse for this?


I'd be interested in that technique, he said, completely understanding the masochistic request.

I use git-based deployments in a fair number of scenarios (where the .git directory is not web accessible). Git status is the perfect mechanism for detecting code changes, the primary ones being intrusion, or developers working directly on production, which should never happen.

Being able to detect the breadth of all the .gitignore files over the entire directory structure doesn't sound like a fun exercise but it would be a nice addition to be able to report on what ignored files exist. It's possible there's a command for this but I feel like you're confirming my assumption that such a thing doesn't exist.


Well, to give some perspective, the hg command to do what you want is hg status -i.

There are equivalent options to the hg status command to list only added, deleted, modified, or deleted-on-disk-but-still-tracked files, files in the repo that are neither ignored nor tracked, and files that match the current checked-out commit. Admittedly, git status shows more information, but then again, hg usually doesn't need you to know what your current state is to predict what will happen when you run a command.


    git ls-files --others --ignored --exclude-standard
Or if you want to see files ignored by a specific .gitignore:

    git ls-files --others --ignored --exclude-file=<file>
There's also

    git status --ignored
if you just want a quick view of the ignored files (i.e. not for scripting).


Really? It seems like

    find * -type f | sort | join -v 1 - <(git ls-files | sort)
gets me maybe 90% of the way there.


I do agree that reading the documentation is helpful. But nothing beats using it.

If people have a test machine or a VM to download/install git and create toy projects to learn the basics of git, that would be far more effective. Try branching, merging, diffing, committing, etc to get a better feel of how the source control system works.

If you have background in subversion or other source control ( hell even experience in source safe might help ), then it would help you get caught up much faster.

Also, there are tons of youtube videos on git if you learn better via video.

But at the end of the day, if you don't use it, you aren't going to understand git.


Easier said than done. My last team said they had used git a lot. Some were even very experienced.

What they referred to was what I could most accurately describe with "Treating the GitHub website as a Dropbox except it forces you to confirm conflicting files on upload."

I wish I could find the right words to convey what a nightmare that was. They ignored directory structure, straight up just committed any merge conflicts to the default branch, no sense in what happened in which commit, etc. Frequently overwrote each other's work, since their most reliable merge strategy was deleting the file entirely and then reuploading their own copy. By doing so, they created an endless series of conflicts where they alternatingly uploaded each of their copies -- both locally having a cohesive version, but anyone pulling from them got one or the other, depending on who uploaded last. They usually "fixed" this by hunting down the git conflict markers in the code and deleting just those, leaving two nearly indistinguishable copies of the conflicting hunk. Frequently the only difference was whitespace.

If you tell a person like that to "just use git more!" they would, best case, not know where to start. Worst case, they'd keep doing that.


Learning git has given me many a-ha moments. Years ago it was even worse, with unstable porcelain and only the man pages for company.

What made it all work for me was conducting courses in it, so I had to make the concepts vivid for people that weren't that interested in the theory. That forced me to reduce it to simple examples, which really worked.


Understanding that in git 'branches' are not really the branches but merely post-its that can be trivially moved between the 'real' branches on the tree of commits really opened my eyes. Along with the use of visual tools like https://github.com/gitx/gitx, https://github.com/FredrikNoren/ungit, https://github.com/jonas/tig and aliases like `log --oneline --decorate -50 --all --graph --remotes`.

It really changes Git from a bunch of vague commands to a toolkit that you can use to apply the changes you desire on the database and worktree.

But even then I still get confused, as the concepts of Git are not well translated into the commands (the various functions of reset and checkout, for example). I think it has been mentioned often before on HN, but http://gitless.com/ is a good approach to creating a new API/concepts on top of the same internals.


GitUp[1] is an excellent alternative to gitx and ungit, for those unhappy with either.

[1]: https://github.com/git-up/GitUp


The raw shell for the workout is:

  rm -rf lgthw_*
  mkdir lgthw_origin
  cd lgthw_origin
  git init
  echo 1 > afile
  git add afile
  git commit -m firstcommit
  git log --oneline --decorate --all --graph
  git branch otherbranch
  git tag firstcommittag
  git log --oneline --decorate --all --graph
  echo 2 >> afile
  git commit -am secondcommit
  git checkout otherbranch
  git log --oneline --decorate --all --graph
  echo 3 >> afile
  git commit -am thirdcommit
  git log --oneline --decorate --all --graph
  git checkout firstcommittag
  git log --oneline --decorate --all --graph
  git checkout -b firstcommitbranch
  git log --oneline --decorate --all --graph
  cd ..
  git clone lgthw_origin lgthw_cloned
  cd lgthw_cloned
  git remote -v
  git log --oneline --decorate --all --graph
  git branch -a
  git checkout master
  cd ../lgthw_origin
  git checkout master
  echo origin_change >> afile
  git commit -am 'Change on the origin'
  cd ../lgthw_cloned
  git fetch origin
  git log --oneline --decorate --all --graph
  git merge origin/master
  git log --oneline --decorate --all --graph
  cd ../lgthw_origin
  echo origin_change_rebase >> afile
  git commit -am 'origin change rebase'
  git log --oneline --decorate --all --graph
  cd ../lgthw_cloned
  echo cloned_change_rebase >> anewfile
  git add anewfile
  git commit -m 'cloned change rebase in anewfile'
  git log --oneline --decorate --all --graph
  git fetch origin
  git log --oneline --decorate --all --graph
  git rebase origin/master
  git log --oneline --decorate --all --graph


There are amazing UIs which provide really nice interfaces to git. I always cheerlead http://gitup.co when I come across git complaint threads (I love gitup!). Try it and you'll quickly realize that git is hard to use because it has a terrible command line interface, and because the git commands manipulate state which is not at all easy to visualize from the command line.

Being able to visualize the state of the repositories provides a huge win when learning the actual concepts behind DVCS (which are NOT complicated)


Indeed, I'm a big fan of SmartGit: https://www.syntevo.com/smartgit/

SmartGit isn't an oversimplified GUI designed mainly for beginners, it is very powerful. It gives you a direct view of the state of your repo along with high level tools to manipulate it.

Taking a couple of examples from this thread:

> Stackoverflow top item on how to redo a commit (one of most common things a new user may need to do):

  $ git commit -m "Something terribly misguided"
  $ git reset HEAD~
  << edit files as necessary >>
  $ git add ...
  $ git commit -c ORIG_HEAD
In SmartGit you just edit your files and commit again, and in the commit dialog check the "Amend previous commit" box.

Or the reflog. Ah, the reflog. You get this list of hashes and commit pointers and commit comments. Now what? Which was really the commit you were looking for? If you're like me you may have several commits with the same comment as a result of rebasing and amending and generally fooling around. How do you see the actual diff for one of those commits?

In SmartGit, you open the log window and click the "Recyclable Commits" box at the bottom of the Branches panel. Now everything from the reflog is displayed as part of your commit log, just like any other commit or branch. You can click on any of those commits to see the list of changed files, and click on any of those files to see the diff.

This is orders of magnitude more usable than the command line reflog. And yet, I often find a tool like SmartGit a tough "sell" when I'm talking with other developers. One very talented developer told me, "I would never use a GUI, I'm much more productive in the command line."

But the same developer also told me they never use rebase or amend a commit for the reason that you could lose your prior commits. They'd heard of the reflog but never used it. And really, who would want to if you could avoid it? It's a real second-class feature and very awkward to use.

In SmartGit, you just click that box and the entire reflog becomes part of your normal log with all of the tools available.


I use emacs's magit, which is a great labor-saver, but I don't think it protects you from the need to understand the basics of how git represents your history.


This post is almost impossible to understand if those concepts are truly new to you. The hard way is supposed to just dive in, not skip necessary bits. I understand it's a promotional piece for a book, but this makes me wonder whether the book is also like this (probably (hopefully?) not because there is more space).


And even if the concepts are not new, the post seems very misleading. Something describing itself as teaching "the hard way" should certainly avoid fuzzy statements such as the following:

> HEAD is a special reference that always points to where the git repository is.

This kind of makes sense if you think about it a certain way (it generally points to something (possibly indirectly) that the index/working tree is based off), but off-hand, "where the git repository is" sounds like it means the location of the repository within the filesystem or something.

> If you checked out a branch, it’s pointed to the last commit in that branch. If you checked out a specific commit, it’s pointed to that commit.

This is inaccurate. When you check out a branch, HEAD points to the ref for the branch, which in turn usually points to a commit.

> Every time you commit, the HEAD reference/pointer is moved from the old to the new commit.

This is wrong unless you're in a "detached HEAD" state. `git commit` updates a ref that does not point to another ref. If you're in a "detached HEAD" state, that means HEAD is updated (since HEAD is pointing directly to a commit). If you've checked out a branch, that branch's ref will usually be updated (since HEAD is pointing to a ref which points directly to a commit).

In my opinion, git is best described in terms of what is ACTUALLY happening in terms of modifications to the refs database (essentially a `Map FilePath (Either FilePath ObjectId)`), rather than trying to come up with metaphors for what it means to perform certain actions. When I say that "HEAD points to the ref for the branch", that is demonstrably exactly what happens:

    $ cat .git/HEAD
    ref: refs/heads/master
    $ git checkout -b other-branch
    $ cat .git/HEAD
    ref: refs/heads/other-branch
    $ git checkout --detach
    ffd914475c61c18c684ccb0024ca141c54255e28

Understanding this basic concept should make most of the concepts explored in sections 1) and 2) of the article self-explanatory.


This feels more like an Ad than an article to be honest.


Mods, isn't it time you add something to the guidelines [1] to discourage advertisements disguised as articles?

https://news.ycombinator.com/newsguidelines.html


I'm not sure about prohibiting it. I didn't really like the article, but if more people feel that way, the community just shouldn't upvote it -- or at least other, better things get more upvotes than this one. I feel like prohibition should be the last resort for things which the mods notice actively causing conflict among otherwise civil members of the community, like political discussions (e.g. there have been attempts at no politics days because of this, but that didn't work well for multiple reasons).


Git comes down to knowing 6 commands... and you don't even need to be an expert with all 6. If you can grasp the following 6 commands, you will know about 99% of everyday uses of git:

git add .

git commit -m "my commit message"

git merge my_feature_branch | or my personal favorite: git merge --squash my_feature_branch

git rebase my_feature_branch

git log

git checkout -b my_feature_branch


I don't know. I just grepped my own shell history, selecting those lines that start with git and printed the first two words. Even removing those commands that I use just once or twice, I have a much longer list than yours. Here are mine that are not mentioned by you:

git cherry-pick

git shortlog

git push

git pull

git fetch

git stash

git branch

git diff

git blame

git show

git reset

git help


...but do you actually need to understand all of those to use git?

I doubt it. Pull, push, branch, diff perhaps?

I know people who have been using git for years and they never bisect, cherry-pick or stash. Heck, I was talking to a guy yesterday who asked how to push a tag to a remote branch; he'd literally never had to do that before.

I'm just saying, you don't have to use (or know how to use) all the tools in the toolbox to be able to use any of them.

/shrug


I don't know - cloning, pushing, pulling, and arguably fetching are all part of a normal workflow with git. You're up to 9-10 commands now.


I never use log or rebase. I do use push, pull, and checkout. I guess everyone has a different 99%.


until you use rebase to squash commits and accidentally squash the wrong ones...


then just run `git reflog` to go back to the commit before you rebased


reflog is really out in the tail of git commands you ought to know to use it effectively, though. I'd say you're at much more than six commands at that point.


to learn git this is amazing:

https://learngitbranching.js.org/

It's almost like one of those games like TIS-100: you play until you grok it, and then you understand git :D


git is one technology that I was able to really understand quickly. Once you understand the architecture, what a commit is and how they relate to one another, all the git commands make a lot of sense, including the more complicated commands like rebasing and branch history rewriting.

The simplicity of the architecture is a great feature of git.


Git is only simple if you haven't been exposed to a simpler version control system before. Despite being one of the newer version control systems, git does worse than most at actually hiding details. If you are not used to a version control system where you don't have to care about the details, git seems great. However, those of us who come from something else (mercurial in my case) are not happy about having to learn git internals to do things that in mercurial didn't require knowing mercurial internals. I know what I want to do, and I've been using distributed version control for years, but git is forcing me to learn something that has always been hidden.

Sadly github and the like have built enough momentum around git that I'm forced to learn it. (Note mercurial does have its warts, but they are on a different level)


why isn't `git log --oneline --decorate --all --graph` the default behavior of `git log`? It makes things so much easier to understand IMHO.


Try it on something that isn't a trivially sized repository. E.g. on linux.git it takes 4-5 seconds on a warm cache on SSD to show anything, whereas "git log" is instantaneous. This is because to show the graph it needs to search forward through the history before knowing how to paint it.

Then when it does show something it's pretty useless on complex histories, e.g. on linux.git a few dozen pages down into the output we're at 140 characters on graph lines before we even get to showing the commit id and climbing. Scroll down a bit more and all you're going to see is jumbled graph line output as the commit info is entirely pushed off-screen.


Fair enough - that's a great point. But I do wonder if the average git beginner is dealing with such complex repositories. Perhaps there could be a `git log --simple` or whatever equivalent, and those developers working on gigantic repositories could alias as necessary.


There's no rule of thumb that git beginners start with their own repositories. Maybe they clone a big project like linux or git, or start working as a junior dev at a company with an established code base. It's important that git work sensibly by default without you needing to know there's a --simple.

Also --oneline --graph can get really nasty on even small repos, particularly those made by beginners who are doing redundant merging.

But yes, all of those are technically solvable, e.g. git itself could fall back on the simpler view given some heuristic, but I think given the principle of least astonishment it makes sense for all the commands to work sensibly by default whatever the complexity of the repo.


You can add aliases to your ~/.gitconfig. These are mine

    [alias]
        co = checkout
        br = branch
        ci = commit
        st = status
        ll = log --pretty=format:'%C(magenta)%h%Creset -%C(bold yellow)%d%Creset %s %Cgreen(%cr) %C(bold cyan)<%an>%Creset'
        lg = log --graph --pretty=format:'%C(magenta)%h%Creset -%C(bold yellow)%d%Creset %s %Cgreen(%cr) %C(bold cyan)<%an>%Creset'


Yeah, aliases are great for those of us who have had the exposure necessary to learn about all kinds of flags and features. But I think the default `git log` view could do with a lot of improvement to help out newer git users.


What does this git command do? I have used git for a decent amount of time and still have never used this.


Give it a try - it produces git logs which are much more 'graph-like', and really help visualize the different branches used in a given repo. I especially like the `glola` alias provided by the `oh-my-zsh` git plugin.


Ah I see, it's the git repository history. The 2 places I have worked at thankfully have other ways of visualizing this. I also don't view the different commits using git; rather, I use a different tool like Phab etc.


I thought all we needed was the short "Git for Computer Scientists" [0], which consists mostly of images and which had me saying "oh, that wasn't so hard after all".

[0] http://eagain.net/articles/git-for-computer-scientists/


If you'd like a guide to git that explains the basic concepts first and explains the data model only after it's shown it's useful, give Git Magic a look.

http://www-cs-students.stanford.edu/~blynn/gitmagic/


Had a git repo that I messed up pretty badly while in college; went to the professor to ask him to delete the repo so that I could start over. He told me to fix it, as he wouldn't delete it. Best advice I've gotten, as it truly was a learning experience.


Did somebody mention magit? It helps a lot


'the Hard Way' but no mention of Merkle Trees. Perhaps it's time for Git for Rocket Scientists.


git commit -a -m "Commit" && git push

90% of the time, all you need :)


Fixed this for you:

git commit -a -m "Commit" && git push || ( git push --force; echo "Bob, my git is weird again, could you please help?" > /run/slack/dm/bob-the-local-git-expert )

Obviously, you do this from master.


Until you actually need

git add . && git commit -m "message" && git push <remote> <branch>


Sure, but it's the 10% that hurts!


The other 10% of the time:

git commit -a -m "Commit" && git push -f


Forcing your coworkers to deal with your incompetence, I see.


Since the OP never mentioned a need to use 'git pull', I assume the 10% of the time it didn't work was when other people had gone and corrupted the repository by pushing their own commits. Clearly that's not something we want to have to deal with.



