I love Git because its core is so simple it fits on a napkin [1]. From such a simple mental model, everything else flows. It's an extremely simple mental model because it compresses so well: from only the first principles, I can derive the rest.
For any other computer system, I try to dig down and find its first principles, but nothing is quite so clean, so I have to keep a bunch of exceptions and add-on warts in mind to understand it, which is a larger mental load.
That said, I often seem to be the one on my teams most comfortable with Git. I can't explain it. Perhaps I've erased my initial struggle to understand it. Perhaps others don't conceive of Git as a datastructure...
[1] files=blobs (named after the SHA1 hash of its content), dirs=trees=collections of blobs+trees (named after the SHA1 hash of its content), a commit=top-level tree + parent commit(s) + commit comment (named after the SHA1 hash of its content). A branch is a reference is a pointer to the latest commit in a series. Creating a new commit adds to the top and updates the pointer.
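The napkin model in [1] can be sketched in a few lines of Python. This is a minimal sketch, not Git's implementation: the function names are made up for illustration, trees are flat with a fixed file mode, and commits omit the author/committer lines that real commits carry. The blob encoding (SHA-1 over `blob <size>\0` plus the content) is the part that should line up with what `git hash-object` prints.

```python
import hashlib
from typing import Dict, List


def object_id(kind: str, body: bytes) -> str:
    """Name an object by the SHA-1 of '<kind> <size>\\0' followed by its body."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content: bytes) -> str:
    # A file's name is not part of the blob; only its bytes are hashed.
    return object_id("blob", content)


def tree_id(files: Dict[str, str]) -> str:
    # files maps name -> blob id. Flat directories only, and the mode is
    # fixed to a regular file: a simplification of real trees.
    body = b"".join(
        b"100644 " + name.encode() + b"\x00" + bytes.fromhex(oid)
        for name, oid in sorted(files.items())
    )
    return object_id("tree", body)


def commit_id(tree: str, parents: List[str], message: str) -> str:
    # Assumption: author/committer lines are left out to keep the sketch
    # deterministic, so these ids will not match real commit ids.
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    body = "\n".join(lines) + "\n\n" + message + "\n"
    return object_id("commit", body.encode())


hello = blob_id(b"hello\n")
root = tree_id({"hello.txt": hello})
first = commit_id(root, [], "initial commit")
second = commit_id(root, [first], "same tree, new parent")
print(hello)
print(first != second)  # True: the parent is part of a commit's identity
```

Everything is named by a hash of its content, and a commit's content includes its parents, which is why history cannot be edited in place.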
I think the disconnect is that the "first principles" that you can derive from are implementation details; in other words, git makes the most sense if you embrace the inherent leakiness of the abstraction.
> Perhaps others don't conceive of Git as a datastructure
The problem is that git doesn't really look much like a datastructure at first glance; it's ostensibly a shell command that you invoke, which makes people look at it in terms of inputs and outputs rather than spending time learning the details of all of the parts that aren't obviously in plain sight.
> A branch is a reference is a pointer to the latest commit in a series. Creating a new commit adds to the top and updates the pointer.
I think this is a pretty good demonstration of the issue I'm talking about; I've talked to engineers who totally understand git's underlying model but still are incredibly resistant to this definition of a branch, even though it's the accurate one. The idea of a branch conceptually being the entire chain of commits from the most recent one to the initial one is so pervasive that even the term "branch" heavily implies it (when arguably the more accurate analogy would be to call them "leaves"). When the definition of one of the most fundamental concepts defies people's common intuition, the finer details of advanced operations on them like rebasing don't have a chance of being widely understood. Git's failing isn't that it's complex, it's that the only effective logical unit to think about the system is the entire thing, and that lack of abstraction is the alien model to the way we're encouraged to think of things as programmers.
I've shown coworkers "git log --graph --oneline --decorate --all -50" many times and always have them start with it (and even use it throughout stepping them through something) to try and show that structure as we go. I think it helps a bit, but their IDEs go to great lengths to keep them out of the command line, so they're never completely sure what happened when it does something odd to their repo.
> but their IDEs go to great lengths to keep them out of the command line
Their IDEs can show them the graph you're talking about in a much richer form, let them navigate through it, check what each commit did and who did it, and even change the graph (as everything in Git is changeable), for example by removing or rewording commits. I don't see how using the terminal actually helps anyone understand git; quite the opposite.
From what I've seen, they don't have that functionality. But my favorite is that it has a "sync" button for when a local and remote branch diverge due to a rebase, that doesn't tell you what it's going to do. The one time I saw it in use, it rebased new commits from master onto their branch.
I actually understood Git much better as soon as I switched from a graphical interface (which was Git Tower at the time) to just using the Git CLI. I feel like the graphical interface/overhead obscured what was really going on and made me not so much learn Git, but the tool on top of it.
Git Tower is actually a great tool, but I believe Git's core concepts cannot be understood well without lots of experiments with the bare Git CLI (what helped me most was making toy repos → testing my hypotheses on how things work → observing/inspecting results).
> I think the disconnect is that the "first principles" that you can derive from are implementation details; in other words, git makes the most sense if you embrace the inherent leakiness of the abstraction.
Of the things mentioned in that comment, only SHA1 is really an implementation detail. The rest, especially how commits and branches relate to each other, is basically the core model of using git. The only thing I'd really adjust in that statement is condensing down the description of trees to "commits are a snapshot of the repository" (as opposed to a diff, which a lot of people seem to think commits are despite it being a niche model in VCS design).
(To give an idea, I'd condense SVN's model down to similar terms: commits are a snapshot of the repository given a revision number, and branches are copies of the folder structure, or parts thereof, conventionally under branches/)
> The only thing I'd really adjust in that statement is condensing down the description of trees to "commits are a snapshot of the repository" (as opposed to a diff, which a lot of people seem to think commits are despite it being a niche model in VCS design).
A lot of VCSs have used diffs to implement commits. But... they have also used snapshots from time to time (every N commits) in order to make checking out old commits more efficient. Apparently this is how Mercurial works and how either Subversion or CVS implements it.
This is based on some article I read yesterday. I could dig it up if this turns out to be wrong.
Now that is an implementation detail: all of those tools treat commits as snapshots primarily. Internally they may use diffs in order to compress the data (as does git), but the commits are primarily presented to the user as snapshots, and only secondarily treated as diffs in some circumstances. This flipping is a confusing point, to be fair, and something which git does a lot more of, which I think can add to the confusion. E.g. a rebase really turns a series of commits into diffs, and then applies those diffs onto a different base commit to create a new string of commits. But the git man page just describes this as 'moving commits'. SVN and CVS just don't support this functionality, and Mercurial buries it behind an extension.
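To make the snapshot/diff flip concrete, here is a rough Python sketch of a rebase over toy snapshots. Everything in it is an assumption for illustration: snapshots are plain dicts of path to content, diffs are file-level (real Git works line by line and has to deal with conflicts), and the function names are invented.

```python
def diff(old, new):
    """File-level change set turning snapshot `old` into `new` (None = deleted)."""
    changes = {}
    for path in old.keys() | new.keys():
        if old.get(path) != new.get(path):
            changes[path] = new.get(path)
    return changes


def apply(snapshot, changes):
    """Return a new snapshot with `changes` applied; the input is not mutated."""
    result = dict(snapshot)
    for path, content in changes.items():
        if content is None:
            result.pop(path, None)
        else:
            result[path] = content
    return result


def rebase(fork_point, commits, new_base):
    """Replay each commit's change against its parent on top of `new_base`."""
    rebased, parent, tip = [], fork_point, new_base
    for snapshot in commits:
        # Snapshot -> diff (against its old parent) -> new snapshot on the new tip.
        tip = apply(tip, diff(parent, snapshot))
        rebased.append(tip)
        parent = snapshot
    return rebased


fork = {"a.txt": "1"}
feature = [{"a.txt": "1", "b.txt": "x"}, {"a.txt": "1", "b.txt": "y"}]
main = {"a.txt": "2"}                      # main moved on after the fork
print(rebase(fork, feature, main)[-1])     # {'a.txt': '2', 'b.txt': 'y'}
```

The rebased commits are new snapshots; the originals are untouched, which is why "moving commits" is a loose description at best.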
If you want examples of VCS that treat commits as diffs conceptually, look at darcs and pijul.
Good point about patch-based VCSs. The diff/snapshot thing is definitely an implementation detail for snapshot-based VCS. (I haven't tried out patch-based ones yet.)
> arguably the more accurate analogy would be to call them "leaves"
I don't think so. Branches are the more accurate analogy because two branches can each have many commits on top of their common parent commit - so branches grow as tree branches do, not leaves.
> so branches grow as tree branches do, not leaves
That’s exactly the mistake though - branches (the implementation detail) don’t grow. Branches - under the hood - are “one pointer to one commit”, and they are always the same size.
The suggestion to call the implementation “leaves” is because you get a pointer to one commit at the end of the branch, and you can then follow that branch backwards (via the linked-list of parent-commits) until you reach the trunk. But the “branch” is never actually stored anywhere, it is only implied by its ends.
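A small Python sketch of that point, with made-up commit ids and names: the only thing stored per branch is one id, and the chain people usually call "the branch" is computed by walking parent links.

```python
# Toy commit graph: commit id -> list of parent ids (all ids hypothetical).
parents = {
    "a1": [],
    "b2": ["a1"],
    "c3": ["b2"],   # tip of main
    "d4": ["b2"],   # tip of feature, forked from b2
}

# A branch is stored as exactly one entry: a name and the id of its tip.
branches = {"main": "c3", "feature": "d4"}


def history(tip):
    """Derive 'the branch' by following parent links back from the tip."""
    seen, stack, order = set(), [tip], []
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        order.append(commit)
        stack.extend(parents[commit])
    return order


def commit(branch, new_id):
    """Add a commit on top of a branch: one new node, one pointer update."""
    parents[new_id] = [branches[branch]]
    branches[branch] = new_id


commit("feature", "e5")
print(branches)                       # still one id per branch
print(history(branches["feature"]))   # ['e5', 'd4', 'b2', 'a1']
```

Both readings show up here: the stored pointer never gets bigger, while the derived history gets longer with every commit.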
That makes no sense. Different branches always have a common parent commit which is the "branching point". When you update the branch to point to a new commit, your branch is growing apart from the other. The branch analogy makes much more sense.
"Having a data structure" is very different from "being a unique data structure that you won't encounter anywhere else". I don't think most programmers have implemented their own version control systems, so the idea that you need to learn a new data structure in order to learn git isn't obvious, even if you consider it "inherent".
I was replying to a comment about how the fact that Git has a datastructure (or whatever was meant) is surprising since it's just a command that you call, which is absurd considering what it does (store revision history).
I tend to agree, however it took learning about git internals to get to this point. Most developers don't read stuff like that. Also, git's creators made the mistake of using terms like add and checkout to mean completely different things from what they mean in older version control systems. I think that's where a lot of the confusion stems from.
Yes, personally I also can't relate to git being hard. I mean, yes, during the initial days of learning you get stuck, but it is the same with everything, right?
Git is a tree and the nodes are commits. All of the git operations are operations on the tree except for the staging which is a special place to put things before you make a node out of it.
Edit: it’s not a tree it’s a directed graph. But I find tree a very simple approximation that most people without a computer science degree can understand.
A Git repo is a DAG of commits, and each commit is a snapshot of your files, along with some metadata. Commits are immutable and point to their own parents. Branches and tags are pointers or labels on commits. Everything follows from that.
For me the parts that break my mental model are stuff like trying to check out an historical revision and ending up with a scary warning about a detached HEAD (wtf), or not being able to easily start a new branch from an older commit (like you could very easily in mercurial).
Believe it or not this actually fits into the model, albeit with a little more complexity.
HEAD is a special label that can point to other labels, specifically to "heads" which are the labels that represent branch tips.
Normally when you make a new commit, the current pointed-to branch head advances forward to point to the new commit, and the HEAD pointer follows that branch head pointer.
"Detached HEAD" is whenever HEAD points directly to a commit rather than a branch head.
When making new commits in this state, there's no branch head to advance, so HEAD itself will advance to the newly-created commit. However this is a little dangerous because once you switch HEAD to something else, commits in the DAG without a branch head at the tip/leaf would be considered "unreachable" and would be invisible in logs, and eventually will be deleted by the garbage collection process.
It's still ultimately all about operations on the commit DAG and labels/pointers therein, which to me is really elegant.
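The HEAD rules above fit in a short toy model. This is a sketch under stated assumptions, not how Git stores anything: the `Repo` class, its method names and the `c1`, `c2`, ... ids are invented, and the reflog (which keeps "lost" commits findable for a while) is left out.

```python
class Repo:
    def __init__(self):
        self.parents = {"c1": []}      # commit id -> parent ids
        self.heads = {"main": "c1"}    # branch heads
        self.head = ("ref", "main")    # HEAD: ("ref", branch) or ("commit", id)
        self._n = 1

    def resolve_head(self):
        kind, value = self.head
        return self.heads[value] if kind == "ref" else value

    def commit(self):
        self._n += 1
        new_id = f"c{self._n}"
        self.parents[new_id] = [self.resolve_head()]
        kind, value = self.head
        if kind == "ref":
            self.heads[value] = new_id       # branch head advances, HEAD follows
        else:
            self.head = ("commit", new_id)   # detached: only HEAD itself moves
        return new_id

    def checkout(self, target):
        # A branch name attaches HEAD; anything else is treated as a commit id.
        self.head = ("ref", target) if target in self.heads else ("commit", target)

    def reachable(self):
        """Commits reachable from any branch head or from HEAD."""
        seen, stack = set(), list(self.heads.values()) + [self.resolve_head()]
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(self.parents[c])
        return seen


repo = Repo()
repo.commit()                        # c2 on main
repo.checkout("c1")                  # detached HEAD at an older commit
orphan = repo.commit()               # c3, with no branch head pointing at it
repo.checkout("main")
print(orphan in repo.reachable())    # False: nothing points at c3 any more
```

Creating a branch at `orphan` before switching away would add an entry to `heads` and keep the commit reachable, which is the whole fix.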
What is not necessarily clear from that mental model is what any given operation will actually do to the files in your repo (called the "work tree"). But Git is very consistent here and usually tries to keep the work tree in sync with whatever is ultimately pointed to by HEAD, unless you specifically ask it not to (as with `reset --soft`).
> not being able to easily start a new branch from an older commit
Unfortunately this is a case of CLI docs obfuscation: it can be done in one step, e.g. with `git checkout -b <new-branch> <old-commit>`.
They should have just called detached HEAD "branchless mode". That's all that it is: HEAD is directly on a commit instead of pointing to a branch. Clearer and less caps lock.
But people also complain about the “branch” term so maybe they would like the more internal “head” term? In which case we're back to square one.
I think I found it extra confusing because this action (check out previous commit, do some work, merge with main) was completely supported in my previous dvcs of choice, Mercurial.
The Git idiom in general is to make a branch for any work you intend to keep.
But I'm also curious what you mean by doing work on a previous commit and then merging with main. Maybe it's a way to solve a problem I hadn't considered before.
Been a long time since I used Hg, and I've mostly trained myself out of these ideas, but I'll try to recall the use cases.
1. Consider doing a bisect to find when a bug was introduced (which may now be in multiple branches) and being able to fix it immediately then merge and/or rebase over the current versions.
2. Working on a branch, you realize that you'd like to try an alternative approach to a problem. You check out a previous commit, leaving the branch in its non-working state, and try your alternative approach. If it works, you keep going on this version, otherwise you can just switch back. The temporary unnamed branch continues to exist until you choose to delete it (ideally before pushing!). This unlocks a much lower inertia way to do small experimental work without having to formally label new branches, which carries enough weight to make me hesitate. I would do this all the time in Hg and almost never in git. In git I would probably reach for stash to achieve something similar.
Git is one of the best examples of "great tool with terrible API".
Funniest thing is that people always argue "learn git internals and it'll click"
Yeah, git is the only tool where people argue with a straight face that you should know the internals of the software, and act as if that weren't a sign of terrible API design.
Imagine if people acted as if you needed to know Visual Studio's or Firefox's internals in order to use those tools even in "far from very advanced scenarios", rofl.
Yes I don't know how anyone can come to the conclusion that the CLI isn't by far the biggest problem. It is terrible.
The other big issue is that people tell you not to use a GUI, and there are a lot of GUIs that are not very good. Using a good GUI (I recommend GitExtensions, GitX on Mac or Git Graph in VSCode) makes Git waaaay easier to understand because the model is literally drawn for you.
Find me a Git tutorial with no diagrams and I will change my mind.
Also the author's "these are core facts about Git" list is not very accurate so I'm not sure how much they really understand it. A commit is not a snapshot and a patch. It is literally only a snapshot.
Edit: I actually searched Git tutorials and there are a surprising number of terrible Git tutorials with no diagrams! Though I guess I shouldn't be surprised about the official Git tutorial and W3Schools.
> A commit is not a snapshot and a patch. It is literally only a snapshot.
When you do cherry-pick, you supply a commit id, but actually a patch (diff between the commit and its parent) is applied. So yes, commit (/id) can mean a patch in some cases.
I'm not sure that people are (generally) arguing that learning how git works internally is required to use the tool. The thing I see repeated a lot is that learning how it works isn't as scary as it seems and will give you a really good understanding of what it's doing and why. Given that such advice is aimed at software developers, it doesn't seem that outlandish, either.
Also, you absolutely should understand how a debugger works to get the most out of Visual Studio. If you understand how the clutch and torque setting on an impact driver works you will have a less bad time using one and the tool will last longer. <add your own car analogy here>
The thing with git is that learning the general idea of the internals is not enough. You can learn enough about git to visualize a DAG of nodes identified by hashes, but what does that get you? You still have a thousand higher-level commands that build on top of it, and for each of those you have to learn how what it does translates into nodes and hashes.
The higher level commands are not individually leaky abstractions. But you can't connect them at a higher level of abstraction. In order to understand how "git foo" relates to "git bar", you go through the lower layer.
So git has you learn:
1. The fundamentals.
2. A thousand higher-level commands.
3. How each of the higher level commands translate into the fundamentals.
Version control is supposed to be a tool to help me, to take away chores so that I can focus on other, more interesting, things. This is not something I want to invest a lot of time on. Even as software developer, most of time when I use software I'm just a user, and I want the tool to do its job and get out of the way.
Car analogy: I drive cars and leave the rest to the mechanics.
You should understand the internals of tools that you use. Abstractions and simplifications do only one thing: delete features in the hope of being easy to understand.
As a software engineer you have the unique privilege where the people that make your tools have literally the exact same job as you, so you can pretty easily understand their work.
As for understanding Visual Studio's or Firefox's internals, that's basically what you do to write Visual Studio plugins or apps that run in Firefox.
>You should understand the internals of tools that you use. Abstractions and simplifications do only one thing: delete features in the hope of being easy to understand.
Just because you should understand internals, it still doesn't mean that git's cli isn't terrible.
Also, I have neither the time nor the desire to understand the internals of all the software that I use.
Do I have to learn the internals of GitHub, Jira, my email client, Teams/Discord/Slack/IRC, PowerPoint, etc.?
They have similar or higher value than git in a work setting (I mean, I could switch to another letters management system).
VS and Firefox don't demand you understand writing plugins to use them.
They wrap their internals with an interface that has well documented entry points, but you don't need to delve into them to get a lot of value out of them.
> As a software engineer you have the unique privilege where the people that make your tools have literally the exact same job as you, so you can pretty easily understand their work.
That's not how that works. Writing frontend apps doesn't teach you MUMPS.
At the end of the day Git gets "easy", but honestly most people who say Git is easy are like people saying JS is easy: the people who know either well enough will claim that anyone who thinks either is easy simply doesn't know it well enough.
Linus himself: Generally, the best way to learn git is probably to first only do very basic things and not even look at some of the things you can do until you are familiar and confident about the basics.
When I work with graduates, I often encourage them to try and understand "1 level down" - that is to say, understand the general design philosophy and how to navigate the tools and libraries you are using. That gives you a better understanding for how to use them, whilst also allowing you to side step the documentation when you need to. On the other hand, if you try to understand what the atoms are doing on the silicon when you run an SQL command, you might be a tad overwhelmed...
Some of it might be inherent though. The data structure consists of immutable commits which are addressed by the SHA1 which depends not only on the snapshot but also the metadata as well as the SHA1 of the parents. You run into this immediately when someone thinks that they can maybe just tweak the commit message a little and end up with the same commit. Or drop the second parent of a merge commit in order to “forget” that line of history (effectively making a “squash merge”).
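A quick Python sketch of why "just tweak the message" cannot give you the same commit back. The encoding here is a toy (an assumption for illustration, not Git's byte layout); the only property it shares with Git is that tree, parents and message all feed the hash.

```python
import hashlib


def commit_id(tree, parents, message):
    # Toy encoding: every field ends up in the hashed payload.
    payload = "\n".join([tree, *parents, message]).encode()
    return hashlib.sha1(payload).hexdigest()


def build_chain(messages, tree="t0"):
    """Build a linear history and return its commit ids, oldest first."""
    ids, parent = [], []
    for msg in messages:
        cid = commit_id(tree, parent, msg)
        ids.append(cid)
        parent = [cid]
    return ids


before = build_chain(["add parser", "fix tpyo", "add tests"])
after = build_chain(["add parser", "fix typo", "add tests"])
print(before[0] == after[0])   # True: the untouched ancestor keeps its id
print(before[1] == after[1])   # False: the reworded commit is a new object
print(before[2] == after[2])   # False: its child has a new parent, so a new id
```

The same reasoning covers dropping a merge parent: the parent list is part of the payload, so the result is a different commit, and so is everything built on top of it.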
Imagine if someone was working on a SQL database and they thought that an `update` would not change anything in-place but instead would create a new immutable row and that they could reset back to the previous row (before the update) right after that since the immutable row (they thought) would still exist until some arbitrary garbage collection time. You would end up having to explain the same thing but kind of reversed.
I think Git can be a pretty pleasant experience for most folks, as long as you use the basic features and maybe even consider a GUI, anything from Git Cola (free: https://git-cola.github.io/), to something like GitKraken (paid for all features: https://www.gitkraken.com/).
Curiously, the latter also let me setup different accounts that I can switch between with a simple dropdown, which was otherwise annoying when you have Gitea, GitHub, GitLab and others to manage, way easier than https://docs.github.com/en/account-and-profile/setting-up-an...
Either way, suddenly you see the graph of your repo and most of the common actions are a click away, you can just let your brain idle and think about other things you're doing instead, in addition to that working really well with staging chunks of your code, or individual files, cherrypicking and so on. Even Git LFS or ignoring stuff is a context menu away.
Then again, personally I prefer squashing in merge/pull requests instead of rebasing, or even just doing regular merge commits and leaving the history as is (which doesn't really scale, but I haven't gotten to the point where that matters that much), so how I use Git won't work for everyone.
> Git commits are stored on disk as snapshots, but that's not necessarily how Git treats them! Various Git commands treat a commit as either a complete repo snapshot or as a change set, depending on what you’re doing.
The problem is that the documentation and the developers go between these modes so fluently that you have to know the context in order to distinguish them. Of course to them it's obvious that you are treating the incoming changes as a “changeset” since it's a diff (a patch) from an email (not a snapshot).
This is the “name” section for `man git-rebase`.
> git-rebase - Reapply commits on top of another base tip
How should someone new guess that this means to (effectively) take the diff of each commit and apply those changes, instead of working on the snapshots?[1] Because I think you could also interpret it as: take the tree (the snapshot) of each of these commits and make an alternative subgraph where the parents are different, propagating from the new base tip. Which means that you can make a new root commit (disregarding old history by “squashing” seems to be popular on SO for some reason) starting from A by rebasing A on `--root`. But it doesn't work like that.
If you want to do that you should instead use git-replace.
In my experience, for every little question about cherry-picking or rebasing, you have to go down to the fundamental concepts. Because the way people work is not like a sponge which only absorbs the information that it is given; people learn 10% about something and then automatically infer the rest, and sometimes 50% of that is wrong. So then you have to unnest that.
I guess this is my long-winded way of agreeing with the author.
[1] Working on the snapshots would also be useful. You could use that to format all of the commits on the branch.
I must be some kind of genius, as I find git quite easy?
I've been using it daily for years now, but I started out with subversion which was quite nice because it had TortoiseSVN.
Later I went to Mercurial, which was pleasant to work with, but once GitHub arrived on the scene Git became the obvious choice both privately and professionally. Besides, git and hg are IMHO not that different anyway from a daily-use point of view.
When I learned Git, many tutorials were pretty clear that what goes into git stays there as they were added, which is the whole point of a vcs.
So I would dare to say that your example is not entirely a good one? In my 10 years of using Git, both privately and professionally, this has happened exactly zero times. But that's just me.
To use a somewhat odd analogy, it would be akin to asking the average car driver to replace the crown wheel in the differential. I'm certain everyone can do it with enough youtube tutorials and practice, but.. I digress.
But actually it makes sense: Git is like a car. Once people understand what basic problem it tries to solve and learn the commands to push, pull, commit, rebase and merge, that's enough, just as learning to navigate traffic is enough to take the car from point A to point B.
So I believe git is easy enough for normal use, but of course when situations like your example occur it's harder (but understandable considering what problem the tool is trying to solve).
I am not being condescending here, it's my real world experience that tells me that very few people have issues with git, and looking at how many use github/gitlab/gitea/etc everyday shows the same.
Delete the file from (past) commits? Isn't the whole point (idiomatic way) that you don't do that, and only delete it in the newest commit?
Unless you mean some intentionally weird scenario, where you want to just delete the file to prevent it from being accessible period.
But that's (afaik) against the whole concept of git, and so very much outside the intended use.
The rule was to not use any external tools so it doesn't qualify.
But you're right. filter-repo is a great tool. You can even use pleasant Python "callbacks" instead of a hodge-podge of env variables and shell snippets.
The rule is arbitrary and unhelpful and only helps drive a rhetorical point, not actual advice: you really don’t want to do this without a tool as safe and fast as filter-repo. Even recommending filter-branch is just aligning the footgun slightly.
If a point is to be made that confirms OP: removing a file completely isn’t trivial, and thinking one did when one didn’t is too easy.
It ships with git, so without extra tools, it is the implied solution to OP’s question, and the default choice for many.
I’m not saying they recommend it, as the question drives a rhetorical point. But it’s worth mentioning “git filter-repo” as a solution you will not stumble into as easily as the more difficult, brittle and slow “git filter-branch”.
What I said in my one-sentence reply is that there are no non-deprecated, built-in tools that help with this task in a streamlined manner. Which means by extension that it is not a trivial task.
I feel sort of in the same boat, at least from reading this blogpost. Though it might be time for me to master fetch rather than pull. What I wonder is how anyone (read: most people) get by with using GUIs for Git that tend to abstract some of these fundamentals away
You're smarter than me, anyway. I'll confess that I don't actually understand git and I sweat bullets every time I have to use it. My only consolation is that's true of most of my team as well.
> I must be some kind of genius, as I find git quite easy?
It’s hard to compare when you knew a conceptually simpler VCS before learning git, since you have no clear memory of getting blasted with the full conceptual model of git at once.
I had to teach git to two colleagues recently, and it helped greatly that they could start with GitHub Desktop, since they also weren’t command-line veterans. The two concepts that were hardest to make stick were
(a) that there is a difference between “main”, “origin/main” and the remote “main”: the idea of a local copy of a remote branch that is different from your local branch seems abstract and unnecessary until you have merge conflicts and you need the conflicting code to live somewhere locally.
(b) that you want to branch off from “main” instead of whatever branch you’re on right now, when you start on a new thing. This is true when you do a pull request and then begin something new.
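Point (a) can be sketched as three separate pointers. This is a toy Python model with hypothetical commit ids; the merge step is reduced to a fast-forward, which only holds when local “main” has no commits of its own.

```python
remote = {"main": "c3"}                        # the server's branch
local = {"main": "c1", "origin/main": "c1"}    # two separate local refs


def fetch():
    """Update the remote-tracking ref only; local 'main' is untouched."""
    local["origin/main"] = remote["main"]


def fast_forward_main():
    """Toy merge: move 'main' to 'origin/main' (assumes no local commits)."""
    local["main"] = local["origin/main"]


def pull():
    """Roughly a fetch followed by a merge of the tracking ref."""
    fetch()
    fast_forward_main()


fetch()
print(local)   # {'main': 'c1', 'origin/main': 'c3'}
pull()
print(local)   # {'main': 'c3', 'origin/main': 'c3'}
```

After `fetch()` the incoming commits already live locally under “origin/main”, which is exactly the place conflicting code sits while you decide how to combine it with your own “main”.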
I talk about branches as parallel dimensions and merging as portals between them.
I don’t remember when it happened, but there comes a point where you go from “it’s hard to use git without screwing up and having to start over” (https://xkcd.com/1597/) to “it’s hard to use git and screw up”. Having “git reflog” makes you quite invulnerable which makes you dare rebasing more.
Reaching consistent undo should happen sooner than reaching complex history rewriting ability, because it greatly affects what you dare do.
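The reason a reflog gives that kind of safety can be shown with a toy journal. This is only a sketch: the `Branch` class and its methods are invented for illustration, and the real reflog lists entries newest first and addresses them with names like `HEAD@{1}`.

```python
class Branch:
    """A branch pointer that journals every move, like a tiny reflog."""

    def __init__(self, tip):
        self.tip = tip
        self.log = [(tip, "created")]    # oldest first

    def move(self, new_tip, reason):
        self.tip = new_tip
        self.log.append((new_tip, reason))

    def reset_to(self, steps_back):
        # Go back to where the pointer was `steps_back` moves ago.
        # The reset is itself journaled, so it can be undone too.
        target = self.log[-1 - steps_back][0]
        self.move(target, f"reset: moving back {steps_back}")


main = Branch("c1")
main.move("c2", "commit")
main.move("c2x", "rebase (rewrote c2)")
main.reset_to(1)     # undo the rebase
print(main.tip)      # c2
```

Because old commits are immutable and the journal remembers where the pointer used to be, a botched rewrite is just one more pointer move away from being undone.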
> Commits are unique and immutable, and are anchored to a specific point in the graph of history and causality. This means that a commit’s identity is made up of both its content and its context.
> This in turn means that you need to be comfortable and fluent in a branching many-worlds cosmology, so you can distinguish between changes and snapshots that have the same intent and content but which are completely non-interchangeable and imply entirely different flows of historical events.
"Graph of history and causality", context, content, "many-worlds cosmology", changes, snapshots, "flows of historical events"... This makes Git even harder.
> "Graph of history and causality", context, content, "many-worlds cosmology", changes, snapshots, "flows of historical events"... This makes Git even harder.
Thank you for saying that. I had to pretend I didn't read it because it made me even more confused, and assumed it was because I was an idiot.
Yes, it is. What I found amusing was the preface to the article:
> While doing so, I have often failed to get the basics to stick, which is incredibly aggravating to someone who prides themselves on explaining things.
[...]
> So, I’ve spent some time thinking about this before this weekend.
And yet, what he wrote after thinking about it only muddied the waters for me, not clarified. Which tells me that git is fundamentally really difficult to explain.
Yeah I agree. I actually had the same thought as the author, use the example of wave/particle duality in Physics. The problem is most people don't understand that (how to grasp the paradox and have the right intuition about it) and you will just come across as a pedantic smartass. The ones who do know some advanced Physics will see you as a dilettante. Teaching Physics with correct pedagogy is hard. Even harder than Git.
> Also, Git's CLI aggressively obscures those core concepts, so any learner is in an uphill battle even if their mind does happen to work the right way
I'm not sure how many people struggling to learn git (the 'git novices' of the article) are competent with any comparable tool. Maybe "why git is hard" isn't the right question?
Speaking from personal experience, if you already learned Mercurial, you might find it annoying that commands with the same name do different things, or worse: almost the same thing, but it's practically the same[0] system. Coming from a centralised system to a decentralised one is a bigger jump but understanding the 'distributed' part of "distributed version control system" seemed to me (speaking from personal experience) pretty straightforward, compared to the V, C, & S.
It's just that a lot of the points made here seem only loosely git-related? Not that that makes them wrong or invalid, by any means -- they do all apply to git. Fully agree that teaching people `git fetch` as a separate thing to `git pull` is probably wise.
[0] the same: git doesn't believe in curtains, mercurial just painted and doesn't need curtains
> I'm not sure how many people struggling to learn git (the 'git novices' of the article) are competent with any comparable tool.
I am extremely competent with a couple of other version control systems, but that doesn't really help me in terms of understanding git.
I agree that many of the attempts to explain it that I've read (including a number of the comments here) are really explaining things that all version control systems have in common. For me, those aren't the parts that I struggle with. It's how those things map onto actually using git that I find difficult.
My big issue with Git is the way it handles Merges.
It throws up this dialog saying "there are conflicts", and by the time you see this, it has already monched a bunch of files in your local version and now the only way out is through. You need to drop everything and fix all this stuff right now before you can proceed. Even if it's last thing on Friday night and you'd rather just go home and try again on Monday.
If there was just an "Abort" button that stopped trying to push or commit or whatever and just put things back the way they were before, that would take all the worry out of it.
Because usually it would have been easy to avoid the conflict if only you knew it was coming before Git f'd your whole project.
Where in my comment did I scoff at the idea of using a GUI? I personally mix a GUI with the CLI. Good Git GUIs certainly have an "Abort merge" button. But it’s far easier to write out a command that everyone can use than hope someone is using a good GUI.
My comment is in agreement with yours! I'm just expanding on how absurd it is to struggle with merge because GP didn't know about a simple --abort, and how common it is even with senior devs.
And yeah I too find a mix of GUI and CLI works great.
Oh I'm the complete opposite. Why does everyone use a GUI for everything? You get by _OK_ while things go to plan, but if something goes wrong or you want to automate something, there's no flexibility.
Git's core abstractions map pretty dang well to a GUI. And the commit graph is inherently pretty visual. I find it a heck of a lot easier to navigate a repository graphically than via the CLI (it can be done, it's just not fast). For most people a GUI works better when they get into a sticky situation with git than the CLI. Automation is different, of course, but it's not generally hard to map actions in the GUI to commands if necessary.
Here, every veteran developer uses git and rarely needs to ask for help (although it might save time), and every novice developer uses GitHub Desktop.
This works great.
One exception: if someone does a local rebase on a feature branch, GitHub Desktop will recommend “pull” from their remote instead of “push --force-with-lease”.
In addition to the --abort option (with merge & rebase), there’s also the option of resetting to any state you desire, using either checkout or reset, with reflog for viewing a list of the states you might want to reset to. (Or even better, just use a tool that shows the graph of the repository.)
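To make the reflog part concrete: on the command line this is `git merge --abort` / `git rebase --abort` for an operation still in progress, and `git reflog` followed by something like `git reset --hard HEAD@{1}` once it has finished. The reason the second one works can be sketched with a toy model. This is only an illustration of the idea, not git's implementation, and `BranchWithReflog`, `move_to` and `undo` are names made up for the example:

```python
from typing import List


class BranchWithReflog:
    """Toy branch pointer that remembers every commit id it has pointed at."""

    def __init__(self, start: str) -> None:
        self.target = start
        # Oldest entry first here; `git reflog` prints newest first.
        self.reflog: List[str] = [start]

    def move_to(self, commit_id: str) -> None:
        # Commit, merge, rebase and reset all end up as "move the pointer".
        self.target = commit_id
        self.reflog.append(commit_id)

    def undo(self, steps: int = 1) -> str:
        """Point back at wherever the branch was `steps` moves ago."""
        previous = self.reflog[-1 - steps]
        # The undo is itself recorded, so it can be undone as well.
        self.move_to(previous)
        return previous


if __name__ == "__main__":
    main = BranchWithReflog("c1")
    main.move_to("c2")
    main.move_to("botched-merge")
    main.undo()
    print(main.target, main.reflog)
```

The point of the sketch is that nothing is overwritten: the old commit ids stay in the log, so "getting back" is just moving the pointer again. It only covers committed states; uncommitted changes in the working tree are not protected this way.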
I think the CLI is more to blame than the author says.
For example, the "branches are more like bookmarks" point. Well, I think it would have been easier if git called them bookmarks, like Mercurial.
About the "fuzzy identity" part, it is mostly a consequence of git's fuzzy naming scheme. Like the "git reset" command that does many different things, the result is that no one knows what a "reset" is.
I have never struggled with the idea that a commit is a "worldline". I think that calling it a worldline is making it more confusing than it is. For me, it just represents some work done, but git's tendency to mix high-level concepts with implementation details gives people weird ideas.
I don't think so, even as a game artist. It took me a while to understand the concept of branches (that they are basically modes or variants of a project). That clicked. But for daily game development, the main thing I do is just commit+push / pull. That's 90% of the work and Git does a great job here. There are desktop applications such as Anchorpoint (https://www.anchorpoint.app/), which make it even easier for non-coders. So the whole big ecosystem of Git makes it easier to use.
I think trying to avoid talking about SHA1s of content makes everything blurrier, not clearer. Because then people say things like this, which really don't make sense and instead seem like speculative SF:
> A commit is its entire worldline
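For anyone who wants the SHA1 version spelled out, here is a minimal sketch of how git names objects: the id is the SHA-1 of a short header (`<type> <size>` plus a NUL byte) followed by the content. The blob part should match `git hash-object`. The commit part is deliberately simplified (it leaves out the author and committer lines, so its ids will not match a real repository) and is only there to show that a commit's name depends on its parent's name. The function names are mine, not git's:

```python
import hashlib
from typing import List


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 of the header '<kind> <size>' + NUL byte, followed by the body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content: bytes) -> str:
    # Only the bytes matter: no filename, no timestamp, no path.
    return git_object_id("blob", content)


def simplified_commit_id(tree: str, parents: List[str], message: str) -> str:
    # ASSUMPTION: real commits also carry author/committer lines; omitted here.
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    body = "\n".join(lines) + "\n\n" + message + "\n"
    return git_object_id("commit", body.encode())


if __name__ == "__main__":
    print(blob_id(b"hello world\n"))
    tree = "0" * 40  # stand-in tree id, purely for illustration
    root = simplified_commit_id(tree, [], "first")
    child = simplified_commit_id(tree, [root], "second")
    print(root, child)
```

Since the parent's id is part of the bytes being hashed, changing anything in an ancestor changes every descendant's id. That is probably the sensible reading of the "worldline" phrase, and it is a lot less mystical when stated in terms of hashes.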
Honestly, most Git problems I see stem from long-lived branches. Git flow is a common culprit.
It's not the same thing but Git Flow reminds me of Conway's Law. Just that the branches end up representing the dev-test-release workflow.
Another variation is that you reify branches according to whichever team is working on them, like dev and qa and release. As in they have these literal branches and they need to follow a workflow diagram in order to remember what to merge into what and in what order. Then they mess something up and they go to StackOverflow to ask such easily intelligible questions as:
> We are using a standard branching model. But sometimes we find a bug in release/5613248 and need to deploy a hotfix. So naturally we branch hotfix/5613248 from release/5613248, fix the bug, do a PR against release/5613248/joe-the-gatekeeper, then Joe merges that into release/5613248 and also into pre-release/5613248, qa, and finally into develop. But sometimes we get weird merge conflicts when merging into for example qa or develop. How do we avoid that?
I think a lot of the git problems come along due to 'good ideas'.
Basic git isn't too bad. A learning curve, but manageable.
However, next thing you know you're in a sprint and your work depends on somebody else's (process problem, but happens often) and so you are now working on top of their branch.
You make some commits, and have to merge in (or rebase - force push) their changes...
... and then you come in to work after the weekend and find that they have merged their code into main. Cool ! You'll just merge/rebase main, except you find that they had also been working on top of someone else's branch, and now you have all sorts of fun trying to unpick how your changes actually relate to what is in main.
Sure, some people are just happy to rebase and force-push, but if you're happy to force-push then the 'control' has gone.
From my teaching experience it could be a fear of doing something wrong. I came from SVN and ClearCase, where doing something wrong was quite trivial. And Git relaxed me a lot. Knowing that only push/pull make actual changes and that you have reflog to revert any possible mess is a big relief.
I don't think I have ever met a beginner of git that hasn't fucked up their repo to a state where the effort to get back is more than just redo everything. In the beginning the fear is paralyzing and it is not without reason.
For sure it is recoverable, but not to a beginner.
I do agree that fear is a very important factor. I disagree that it was easier to do things wrong with SVN.
The times I got into a weird state with SVN were usually related to a merge. But it was the clunkiness of the tool and not fear of losing anything or not having a way out. The last resort was just to do the merge manually. No risk to it, just tedious work.
This is definitely why I hate git. It really is possible to lose your work by doing a bad reset (and really the vocabulary of hard, soft and mixed resets is a blatant cop-out from coming up with meaningful terms). Or by checking out another branch and not committing or staging all your files. Yes, I know this is a no-no, but it's trivially easy to do and git would be better if it just had a way to get them back. If only there was just a 'revert whatever I just did to git', even if what I did was destructive. Or better, if only navigating my git history was as easy as an undo tree in Emacs. Every change seems fraught, and so for me and probably a lot of others it easily becomes very stressful.
I find that since most students just `git add .` (a habit I strongly encourage them to break) and then `git commit`, they don't see why the extra staging step exists or is useful. It's just a nonsensical hoop to jump through.
I don't think the inconsistent, bad CLI should be dismissed so easily. That’s the barrier that first must be overcome to get to any mental model stuff. People have learned more complicated things, but have had better abstractions to work with.
Numpy is hard. But it’s less about accidental complexity of the interface and more about the array programming paradigm being unusual for most people.
Spark is hard. But it’s because distributed data frame processing is hard, whereas pyspark has made a lot of the actual abstractions easier to work with. Compare this to Hadoop back in the day, which had both accidental and real complexity.
I think of branches as variables pointing at immutable points in the commit tree. `git commit` creates a new node in the tree and updates the pointer. You can manually move the pointer around if needed.
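That description is close enough to code that it can be written down directly. A rough sketch, with the caveat that it is a toy with single-parent commits only, and that `Repo`, `commit`, `branch`, `move` and `log` are hypothetical names chosen for the example rather than anything from git itself:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Commit:
    """An immutable node; it only knows its parent, never its children."""
    message: str
    parent: Optional["Commit"] = None


class Repo:
    def __init__(self) -> None:
        # A branch is just a name mapped to one commit (or nothing yet).
        self.branches: Dict[str, Optional[Commit]] = {"main": None}
        self.head = "main"  # name of the currently checked-out branch

    def commit(self, message: str) -> Commit:
        # New node on top of the current tip, then advance the pointer.
        new = Commit(message, self.branches[self.head])
        self.branches[self.head] = new
        return new

    def branch(self, name: str) -> None:
        # Creating a branch copies a pointer; no commits are copied.
        self.branches[name] = self.branches[self.head]

    def move(self, name: str, target: Commit) -> None:
        # Manually repoint a branch, roughly what a hard reset does.
        self.branches[name] = target

    def log(self, name: str) -> List[str]:
        # The "chain of commits" people picture is derived by walking parents.
        out: List[str] = []
        node = self.branches[name]
        while node is not None:
            out.append(node.message)
            node = node.parent
        return out


if __name__ == "__main__":
    repo = Repo()
    a = repo.commit("a")
    repo.commit("b")
    repo.branch("feature")
    repo.head = "feature"
    repo.commit("c")
    print(repo.log("main"), repo.log("feature"))
    repo.move("feature", a)
    print(repo.log("feature"))
```

The history people think of as "the branch" is whatever `log` finds by walking parents from the pointer, so moving the pointer changes what the branch appears to contain without touching any commit.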
When I was learning to code, I first learned git. It was the most intuitive program I've ever used, and I still think it's one of the best pieces of software ever written.
> I first learned git. It was the most intuitive program I've ever used
Maybe that is because it was the only one, since it is the first you learned :p
I wouldn't call git "one of the best software ever written", though it certainly was influential. It was a hack job by Linus Torvalds because he was dissatisfied with the existing offering when it comes to source control. What he achieved in so little time was truly impressive, and the foundation of it is good, but it was still a hack job and I guess it gained most of its popularity because of Linux, and later GitHub.
Now, git has much improved, certainly due to its popularity, but its roots as a hack job still show. Maybe, in an alternate universe, Mercurial would have been the one, it is very similar, or maybe Bazaar, or Fossil, which have different philosophies, but are still very capable DVCSs. Interestingly, they all came out at about the same time (2005, except for Bazaar, which came out in 2006).
I mean the best in terms of the base UX. The hackiness that you encounter sometimes is probably mostly because of the coupling with remote services (GitHub, GitLab) where there is some friction, but imo pretty minimal (some unaccounted hashes with rebasing etc.). I'm now interested to try the alternatives just for comparison...
Git is hard because it solves a non-trivial problem.
But also, because it uses a lot of bad defaults (naming the remotes "origin" for example) and it uses the same name for different things. It's almost like the founders of the project didn't understand the problem space at the start of the project and now it's too late to set sane defaults and behavior. Just like bash, PHP, MySQL and others.
I remember seeing someone post a conversation with ChatGPT about making git make more sense. The naming of actions seemed to be one big hurdle. Pull, push, merge, commit are actually confusing terms that don't clearly explain what you're doing. Simply renaming the terms or being slightly more verbose was enough to demystify what was happening.
The things that this post describes are the easy parts (except pull).
Then you come at it with your new understanding of an immutable commit DAG, and run into things that shouldn’t be there at all, yet everybody uses — rebase, cherry-pick, &c.
How would one expect to use a tool they have no mental model of? It's a tool not a service. What tool do you not need to understand (at least some of) the core principles?
Hm.