Purposes, Concepts, Misfits, and a Redesign of Git (neverworkintheory.org)
161 points by ingve 266 days ago | 119 comments

While I don't think git is beyond improvement, especially in terms of UX, I do feel that losing the staging area altogether is quite a significant sacrifice, like throwing the baby out with the bathwater. The ability to build up and massage the commit in staging is a really powerful method, and I think it makes git actually easier to learn once you get beyond the initial hump, due to its interactive nature.

Perhaps the worst piece of UX in all of git is the naming of the staging area. The docs and arguments use 'index', 'cache' and 'staging' all to refer to the same thing. The names 'index' and 'cache' are awful, but the inconsistency is unforgivable.

I still have to check myself when I see references to the "work tree".

I disagree. Forcing a weird pseudo commit, with its own unique set of commands for interacting with it, into the basic workflow is not the right approach. Without an index, beginners can just commit. As people get more adventurous, they can learn commands like commit --amend and rebase --interactive to build up and massage their commits.

So what would be the workflow for committing (and perhaps pushing) partial changes in that case? Say you've made a lot of changes and they don't all belong together.

If I needed to do a partial commit I'd record the diff ("git diff > big.patch" or git stash), clean the repository, build the patch to be committed by manually picking from the recorded diff, test thoroughly, and commit only when the patch is coherent (i.e. it compiles, runs, and the tests pass). Then I'd just apply the rest of the patch to continue working. AFAIK the git model (git add [-p]) and history editing make it very easy, and probable, to produce incoherent commits which, when checked out, do not compile, do not pass the tests, or behave wrongly. I use CVS for my personal stuff for various reasons, but even if I used git I'd do it like this.

P.S. If your granularity is at the level of files and not individual hunks of the overall patch, then you can just "cvs/git commit files..." easily, but still, it'll go in untested and you may have forgotten a change that was required in order to make the committed patch valid and coherent.
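
A minimal sketch of the record-the-diff workflow described above, assuming file-level granularity. The repository, the file names, and the `big.patch` location are made up for illustration, and picking by file with `git apply --include` stands in for hand-editing the patch:

```shell
set -eu
work=$(mktemp -d)                 # throwaway directory for the demo
mkdir "$work/repo"
cd "$work/repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'a\n' > a.txt
printf 'b\n' > b.txt
git add a.txt b.txt
git commit -q -m "base"

# Two unrelated edits pile up in the working tree.
printf 'a2\n' >> a.txt
printf 'b2\n' >> b.txt

git diff > "$work/big.patch"                  # record the whole diff
git checkout -- .                             # clean the working tree
git apply --include=a.txt "$work/big.patch"   # rebuild only the part to commit
# ... compile and run the test suite here ...
git commit -q -a -m "change a"
git apply --exclude=a.txt "$work/big.patch"   # bring back the rest and keep working

git status --short
```

Because the working tree holds only the picked part when the tests run, what gets committed is what was tested.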

Do partial commits. Then run a new interactive rebase session and mark all those commits as "edit", so you can test out individual ones and fix them up some more if needed. Basically the same outcome, but less manual diff editing.
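
A related sketch, not quite the same thing: instead of marking every commit "edit" by hand, `git rebase --exec` replays the commits and runs a command after each one, stopping at the first failure. The `run_tests.sh` script and the `TEST_LOG` variable are hypothetical stand-ins for a real test suite:

```shell
set -eu
work=$(mktemp -d)
mkdir "$work/repo"
cd "$work/repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
export TEST_LOG="$work/test.log"   # kept outside the repository

cat > run_tests.sh <<'EOF'
#!/bin/sh
# Hypothetical test suite: "passes" when a.txt exists, and logs each run.
test -f a.txt && echo ok >> "$TEST_LOG"
EOF

printf 'a\n' > a.txt
git add .
git commit -q -m "base"
printf 'one\n' >> a.txt
git commit -q -a -m "partial commit 1"
printf 'two\n' >> a.txt
git commit -q -a -m "partial commit 2"

# Replay the last two commits, running the tests after each one.
# The rebase stops at the first commit whose test run fails.
git rebase --exec "sh ./run_tests.sh" HEAD~2
```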

I used to do that with SVN. What a pain in the ass.

Git's version is miles better.

While the staging area could use some UX improvements, it is the most important part of git, and proper usage of it fosters good history clarity and code quality.

For that reason I agree strongly. Any attempt to improve git that does not move the staging area closer to being the default way of operating is misguided at the core.

Mercurial gets around this issue (without having a staging area) by providing an interface for interactively committing files or subsets of files via "hg commit --interactive"

The staging area is a lot more useful than what you can do with an interactive commit, since it persists between operations. You can use it to accumulate changes that are "done" while you continue to experiment with some other changes on top, even if you're planning on committing the whole thing at once, and if you wind up not liking your new changes, revert to what you had in the staging area; or you can realize what you're working on will be easier to finish as two commits, stage part of it, stash the other part, finish the part that was staged and commit it, then apply the stashed part and finish that too.

This is a lot more than you get with just a single interactive commit operation.
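
A small sketch of the "checkpoint" use described above, assuming a throwaway repository and a made-up file name: stage the finished part, keep experimenting, then fall back to the staged version.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'v1\n' > notes.txt
git add notes.txt
git commit -q -m "base"

printf 'v2 (done)\n' > notes.txt
git add notes.txt                       # checkpoint the finished part in the index

printf 'v3 (experiment)\n' > notes.txt  # keep experimenting on top
git diff --stat                         # working tree vs. index: only the experiment

git checkout -- notes.txt               # experiment failed: restore the staged version
cat notes.txt
```

The staged change survives the restore and is still waiting to be committed.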

You could use a commit for this purpose, and continually amend or revert this commit, but in terms of UI, it's helpful to have a set of operations specifically for moving things in and out of the staging area that's different from what you do for manipulating commits in general.

How is it helpful to have more concepts and commands that simply duplicate advanced functionality and have to be used every time, complicating even the simplest cases? As you said, it can all be done with amend and other normal and optional commands already.

Or you can use an easy-to-use but powerful UI like TortoiseHG, where it's trivial to check files or hunks for commit, stash (shelve), amend, revert, rebase, ...

TortoiseHG with hggit is the best git client.

And that's before even mentioning the best parts like phases and obsolescence markers.

> and have to be used every time complicating even the simplest cases

Take a look at `git commit -a`. I sometimes teach this one to beginners if I feel that the idea of staging is confusing them.
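
For reference, a minimal sketch of what `git commit -a` does, using made-up file names: it commits every modified tracked file without a separate `git add`, but it leaves brand-new files alone.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'v1\n' > tracked.txt
git add tracked.txt
git commit -q -m "base"

printf 'v2\n' > tracked.txt       # modify a tracked file
printf 'new\n' > untracked.txt    # brand-new file, never added

git commit -q -a -m "update tracked file"   # no separate `git add` step
git status --short                          # only the untracked file is left over
```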

The whole problem with the staging area in git is that it persists between operations.

Why would you want to add things piece by piece and not commit them? If they are done enough to be added they are done enough to be committed and if they are not you keep working on them together with all other unfinished files. Once you actually are ready to commit you can choose which files that are in fact ready.

> Why would you want to add things piece by piece and not commit them?

It provides useful checkpointing, so in practice I find that there are lots of things that I will add to the index even though I am going to keep working on them. Committing these intermediate changes is annoying because it just creates extra work when I make the final (squashed) commit.

This strikes me as one of those disagreements that simplifies to, "That's not how I do things, that's horrible."

The other half to this is that commonly, during an interactive commit (git add -p or hg commit --interactive), I realize that I missed some small change. Being able to persist that state means I can add the small change without redoing the interactive commit.

Surely some experienced people would want to do it this way, and I'm not against providing an option to do so. The problem is that it's required, and the names of the commands are totally misleading for beginners. "add" sounds like something you do the first time you add a file to the repository, not something you have to do every time you modify it. Don't even get me started on unstaging files: git reset sounds like an extremely dangerous operation when all it does is unstage files, except that if you pass it some flags you really can lose data.
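
To make the two faces of `git reset` concrete, here is a small sketch in a throwaway repository (the file name is made up): the plain form only unstages, while `--hard` throws away uncommitted edits to tracked files.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'original\n' > f.txt
git add f.txt
git commit -q -m "base"

printf 'edited\n' > f.txt
git add f.txt

git reset -q -- f.txt          # unstage only; the working tree keeps the edit
after_unstage=$(cat f.txt)

git reset -q --hard            # discards uncommitted changes to tracked files
after_hard=$(cat f.txt)

echo "after unstage: $after_unstage"
echo "after --hard:  $after_hard"
```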

Git has the same with `git add -p`, and if you're operating on a gui system, git-gui is excellent.

I did not know that existed. Thank you for informing me.

There is also a very nice curses interface for interactive chunk selection. To turn it on set interface.chunkselector=curses under the ui section in your .hgrc.

Why is that important given that you can do exactly the same with a commit in the local tree?

I haven't read the paper, but the last paragraph in the article makes me a little sad. Then again, there was hardly any progress that wasn't met with significant resistance initially. (EDIT: typos)

It comes down to levels of "written in stone"-ness, for lack of better words. There's a hierarchy of how much thought and effort change requires at various levels, starting at requiring no additional thought, to requiring considerable thought, and maybe even coordination with others:

  - changes on hdd
  - changes in staging area
  - changes in commit
  - changes in commit that came before a recent commit
  - changes in pushed commit

I'm not entirely sure about the following, and I think it merits more thought, but here's my current state of mind about it:

Technically you could do the same as the staging area by forcing the user to create a new, empty, but named commit when they decide they're done with the last one, and have all the current staging-affecting commands affect the last commit instead.

However, for one, I think it may be easier to mess up and lose data unexpectedly when working that way, and for another, putting down a name for a commit when you're not even sure what it's going to be may be an awkward way of working. Lastly, you'd also need to implement "empty" commits. I'm not sure how useful/easy that would be with git.

Empty commits already are supported (see --allow-empty). And you can rename that commit at will using --amend, so the first name could simply be a placeholder - like "staging" :)
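
A minimal sketch of that placeholder-commit idea, assuming a throwaway repository; the file name and commit messages are made up:

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'v1\n' > f.txt
git add f.txt
git commit -q -m "base"

git commit -q --allow-empty -m "staging"   # empty placeholder commit

printf 'v2\n' > f.txt
git commit -q -a --amend --no-edit         # fold edits into the placeholder

git commit -q --amend -m "Describe the real change"   # rename it when done
git log --oneline
```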

Did you read the gitless instructions? [1] You get the same functionality as a staging area by keeping a list of all the files you want to commit, and then run once:

    gl commit -o file1 file2 file3 file4
[1] http://gitless.com

and if my change affects two hundred files, across dozens of directories?

You find it hard to create a list? Here's an example of making a staging file:

    echo file1 > staging                       # e.g. git add file1
    echo dir2/file2 >> staging                 # e.g. git add dir2/file2
    echo dir3/file1 >> staging
    echo dir3/file5 >> staging
    echo dir4/dir9/dir394/file384 >> staging
And committing the file:

    eval "$(echo gl commit -o `cat staging`)"  # e.g. git commit

Congrats. You just recreated a bad implementation of the staging area in a text file.

Congrats, you got the joke! ;)

A staging area is just a list of things to commit. There are many ways to make lists. I personally use emacs because direct manipulation rocks. Complaining about a change in one way to make lists is silly—gitless introduces much more powerful basic mechanisms like uncommitted branching, and lists belong on top of these more powerful basic features; not underneath like in the traditional porcelain.

The staging area is finer than files. It is more like a diff, where you can change parts of a file.

For example, I usually use git-add -p to put my changes in staging without the debug print statements. I still want to keep them in my working copy.


Is there a handy way to stage the current changes, then make further unstaged changes prior to the commit?

Yes, by making a branch. Gitless allows you to branch and have unstaged, uncommitted changes on different branches. This extra power in branching is useful.

It's important to note that gitless aims to be a porcelain on top of git. That means git stash isn't going anywhere: both systems can operate together seamlessly on the same repository.

How do you test what is in the staging area, as a subset of all the changes, for consistency (i.e. can you be sure that, when committed and later checked out, it will be a coherent stage of the project history that will compile, pass tests, and run okay)?

Commit (don't push), stash remaining changed files, compile and run your tests. If anything goes wrong, either fix it and commit --amend, or reset --soft HEAD~ to return the commit to the staging area and fix from there. Unstash as required.
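
Sketched out in a throwaway repository, with made-up file names and the test run left as a comment, that sequence might look like this:

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'a\n' > a.txt
printf 'b\n' > b.txt
git add a.txt b.txt
git commit -q -m "base"

printf 'feature\n' >> a.txt
printf 'wip\n' >> b.txt

git add a.txt
git commit -q -m "feature"        # commit, but don't push
git stash -q                      # park the remaining edits

# The working tree now matches the commit: compile and run the tests here.
tree_state=$(git status --porcelain)

# If the tests fail, either fix and `git commit --amend`, or run
# `git reset --soft HEAD~` to move the commit back into the staging area.

git stash pop -q                  # unstash and carry on
git status --short
```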

Aside from selective commits (which they provide a different interface for), what do you use the staging area for?

That's kinda the point, though. I don't want to run a single command to both define and complete the commit. I want to be able to do it in steps. Review my modified files, pick and choose what's staged, maybe make some more edits, stage those, double-check the staged diffs, verify the commit has what I want, and _then_ hit the button.

But isn't this what changelists do? I've certainly done pretty much what you describe in hg, svn and Perforce, which don't have staging areas.

Also, if you end up committing only parts of files, then how do you know that what you commit works?

Well, you don't, so... don't do that. Same thing goes for Perforce; you can already create multiple changelists that you commit independently, so it's easy enough to create a bad commit, because it contains only some of the changes you've got locally.

What you're supposed to do in Perforce of course is to create your changelist, shelve the others, make sure your proposed changelist works independently, and commit it only when it does. Which is analogous to what you're supposed to do with git:

    - stage parts of files
    - `git stash -k`
    - check it works
    - `git stash pop`
    - commit
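
The steps above can be sketched as follows, assuming a throwaway repository in which the staged and unstaged edits sit in different (made-up) files; the check itself is left as a comment:

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'a\n' > a.txt
printf 'b\n' > b.txt
git add a.txt b.txt
git commit -q -m "base"

printf 'ready\n' >> a.txt
printf 'wip\n' >> b.txt

git add a.txt                     # stage the part to commit
git stash -q --keep-index         # same as -k: working tree now matches the index

# Check it works: build and run the tests here.
wip_hidden=$(git diff --name-only)   # empty while the stash is in place

git stash pop -q                  # restore the unstaged edits
git commit -q -m "ready change"   # commit what was just tested
git status --short
```
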
But let's be serious here - the true advantage of git comes when you do the pro move.

    - stage parts of files
    - commit
    - (90% of the time) it's good
    - (10% of the time) bzzt... it's bad
Now, if you were using Perforce, in the 10% case, you'd be stuffed. Somebody would come over to your desk.

    - (them) "Your commit is broken"
Now what can you do? There's no denying it... you gone done fucked up, and there's no denying it.

But with git, you have options. So let's try that again.

    - (them) "Your commit is broken"
    - (you) (confident tone of voice) "Ah no, you are using git incorrectly, you forgot to git fetchpack
      --inflate-upstream-probability-leaves --deny-downstream-packs 
      --integrate-sideway-arguments --discombobulate-flat-file-compressed-alignment-jobs
      -P -Q -J 5 -N 311 -X 0771 -R 0x134774"
    - (them) (uncertain)
    - (you) (confident gaze. Straight into their eyes. YOU = TIGER)
    - (them) (increasingly uncertain)
    - (you) (unblinking gaze. You have the upper hand. YOU = 2 TIGERS)
    - (them) (wanders off to try your technobabble)
    - (you) (fixes your commit)
There's an xkcd about this.

The advantages, I think, are clear.

That's a personal problem. Don't blame the tooling for a bad coworker.

As long as a change or refactoring stands independently of the intended feature change, there is often value in keeping that fine-grained version history. It replicates your thought process at the time in the history messages, gives more manageable git-bisects, etc.

If you can't reliably look at the modified files and pick out what changes stand independently then this approach is obviously not for you, or at least you're gonna need some help from your tooling.

TeamCity has the concept of a 'delayed commit' that is submitted as a patch and tests are run. If the tests fail the patch is not accepted. You can also do a ghetto version of this in any system just by running a local copy of your CI server. Push to its repository target and have it run the tests. If it works push it out to the real build server, if not then there's no problem just force-pushing an amended commit since everything is still local.

Alternately you can commit the independent changeset and then stash everything else and run a build/test.

Obviously you can't always pull out an independent changeset and in those cases you should commit the whole thing at once. But a lot of times it's not actually that difficult to identify independent steps that could be built/tested independently.

Aah, sweet, someone else who uses Perforce! I've been wondering, do you use version control as:

1. a way to record logical changes to files (e.g. implement two features without making any commits, then when you're done, pick out the files/chunks that encode each feature and create commits out of them),

2. a record of history (e.g. just start writing code and make a commit every time you compile/run tests without unexpected failures),

3. something else?

I've found it very painful to apply my git-adapted workflow to Perforce: I _want_ to just start coding, testing out various possible design choices and implementations instead of only theorizing about them, but can't (e.g. "I wonder if I could factor out these methods + fields into a different class?" Perforce: oh well, I guess I should write it down somewhere to remember for later. Git: branch my current work, spend five minutes sketching an extraction, then realize it's insane and continue working). Am I crazy and just don't realize how much better the Perforce model is?

Well, I used to use it, at least!

I actually came to quite like it, but the workflow was sometimes a bit tiresome :( When I found myself in the situation you describe, potentially wanting to make an additional change while already working on another, I did exactly what you suggest: add a todo item, and carry on until I'm done with the task at hand. Then start my second change with nothing checked out, so I can undo checkout on everything should I make a mess.

This works, and you get used to it, and of course many would say that it's a better approach - but it would be nice if the Perforce client tools could be a bit more imaginative.

As for how I think of version control, if it's git, #1. If it's Perforce, #2, plus #3 - backup and distribution.

I don't know how much value I get from being careful about my commits with git, but it does make me feel better (which I suppose could be reason enough). On the large-scale, goal-oriented projects that I've used Perforce for, worrying about logical changes has never felt very important. Does it make a big difference if you have one commit that implements 3 features, or 3 commits? When you're trying to fix a bug, you don't really mind either way, because (a) bugs don't respect feature boundaries, (b) all the features are non-negotiable, so it's not like you can back one out and carry on anyway, and (c) the project is large and fast-moving enough that even if you could, there'd be a good chance that actually, you couldn't.

What history tends to come in useful for on this sort of project is finding the code that corresponds to a particular build that you have a bug report for, and finding who made a particular change so that you can ask them about it (using the timeline view, say). These both work fine whether you make granular changes or not.

You might want to be a bit more careful about things if you're working on changes that might want to be merged into another branch on an individual basis. (Suppose you're moving spot fixes from release branch to main branch - some you'll want, some you won't.) But you usually know when you're making this sort of change, and so you work that way anyway.

> Also, if you end up committing only parts of files, then how do you know that what you commit works?

Because one part was, say, whitespace-errors corrected by my editor, or added/updated comments, or removed dead-and-commented code paths, while the other part was an actual logic change. There are many reasons to modify a file of source-code beyond making the code behave differently, but frequently those aspects come together as "oh, I'll just fix this while it's staring me in the face" and must be picked back apart later.

> Also, if you end up committing only parts of files, then how do you know that what you commit works?

You use your knowledge as a programmer to determine what that part of the file does and whether it can stand independently as a change. This is not always obvious, so in that case, you don't do it. But when it is obvious, it makes it quick and easy to commit it separately.

For example, say you rewrote a function in a script, and while you were at it you improved the usage info (--help). Then you go to commit your changes. You look at each changed hunk, and you see that there are two changed hunks: the function you changed and the help message. With Magit, it's dead simple to commit them separately: 1) select the first hunk, 2) s to stage, 3) c to commit, 4) write the commit message and execute the commit, 5) select the second hunk...

And let's say that, while you were rewriting the function, you commented out some code in the old function. You obviously don't want to keep that old code in the new function, so when you are looking through the changed hunks, you select those lines and press k to revert (or kill) those lines containing the commented code. If you have already staged them, then you can simply select those lines and press u to unstage just those lines, removing them from the staging area but preserving them in the working tree.

The workflow of 1) develop, 2) review and stage related hunks, 3) commit staged hunks, 4) goto 2, also helps a lot by forcing the developer to look at the code he's about to commit. Of course you can just stage everything that's changed and make a big commit, but I find that using the staging area to build up commits before making them helps me catch things that I don't want to commit in the first place. It's really useful for managing config files, because you can commit only the parts you want to save or sync to other systems, while leaving some parts active in the working tree but out of the committed branch.

For example, the Picard music tagger stores window and dialog positions in the same config file along with behavioral settings. I want to store and sync changes to the options, but I don't want to commit and sync every time the window position changes.

The staging area is a powerful (yet essentially optional) feature that, when used correctly, gives the developer great power to make commits more logically independent, while allowing him to develop freely, with little concern for how he will eventually commit the changes he's making.

Thank you. Exactly the kind of workflow I was trying to describe, but said better.

I do this all the time and the answer is:

I test it.

If there's a certain set of changes I wish to commit in one unit, but I have other changes in flight, then I use staging to prepare and reason about them while being able to diff both staging<->previous and staging<->hdd, make the commit, then stash the hdd state and run tests. Afterwards I can restore the hdd from stash and either amend the commit if necessary, or keep on working.

This. I can't test the staging area so I have zero interest in committing it. Actually I have less than zero interest; I would like my team to be prevented from ever creating untested commits.

This isn't an argument against using staging areas, but against creating untested commits.

If your workflow is that every commit must be tested, the correct approach is to enforce that rule, rather than banning an only tangentially related feature.

Does the staging area have any use other than creating a commit that differed from the workspace, which guarantees that it cannot have been tested?

If you're trying to enforce tested commits without running tests on what's actually committed, you're doing it wrong.

There's an inherent incompatibility between the supposed ways that the staging area can be replaced with other things. If you replace it with a commit you keep amending as you go, but also all commits must pass tests, then you can never commit partially complete code, which defeats the purpose of having the amending commit.

If you interactively choose what to commit at commit time, then you run the risk of failing the test and then, after fixing whatever was wrong, having to remember which parts of which files it was you chose to commit last time. Now, you could choose to have your VCS remember that for you, but then you're basically reinventing a less-capable version of the staging area.

If you simply commit everything, then you can never have any code in your working tree which breaks tests, nor can you have code which isn't logically part of the commit you're currently working on. In practice that means that when you find things that need fixing that aren't directly related to what you're doing, you either fix it and commit it with unrelated changes - which interacts poorly with merging or reverting those commits - or you don't fix it and, most probably, forget about it.

I worked for many years with version systems that didn't include staging. People still frequently forgot to commit necessary files. Opinions may differ, but I feel the best place to enforce that constraint is with a CI server that merges only working changes.

This is a symptom of offering an option to perform an incomplete commit. Word processors don't have this problem because they don't make you specify which paragraphs are important enough to save.

I commit the workspace. If I'm not happy with some of the workspace, I fix it (or stash it for later when chaos happens).

Its use is to help in reasoning about the changes by offering an easy way to diff staging to the previous commit and to hdd, while being easy to change quickly.
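
A minimal sketch of those two comparisons, reading "hdd" as the working tree. The repository and file name are made up:

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'v1\n' > f.txt
git add f.txt
git commit -q -m "base"

printf 'v2\n' > f.txt
git add f.txt                           # staged change: v1 -> v2
printf 'v3\n' > f.txt                   # further unstaged change: v2 -> v3

staged_vs_commit=$(git diff --cached)   # index vs. the previous commit
worktree_vs_staged=$(git diff)          # working tree vs. index

printf '%s\n' "$staged_vs_commit"
printf '%s\n' "$worktree_vs_staged"
```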

As for testing, it is trivial to stash or commit your current changes, go to the commit just made, test it, then carry on. A commit having been made partial only guarantees it wasn't tested if the developer is bad.

To me that sounds harder than making a known-good commit of my entire workspace, and refactoring that commit into several by selectively applying it to a clean workspace (this doesn't need to be done at the same time or by the original author).

The staging area is extremely helpful, beyond just making a single selective/partial commit.

Other uses:

1. Breaking a large commit into multiple smaller commits.

2. As a simple "review" system. You stage files that have been reviewed so you know you don't have to look at them again, while continuing on a change.

3. Cleaning the working copy of superfluous changes/files. If you stage the files you want to keep, then clean and/or checkout force, it wipes out the files you didn't stage. Then just reset to unstage all files and continue making changes.

4. Splitting a branch into two different branches, when you realize that a branch could be broken down into smaller separate changes that are both incomplete, but COULD be committed separately. You merge squash to master, staging and commit creating a new branch for feature1, then stage, stash, and stash pop the rest back on to master (since sometimes you cannot just switch back to master), and create the feature2, and feature3 branches, etc.
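
Use 3 above (cleaning the working copy) can be sketched like this, assuming a throwaway repository and made-up file names. Note that these commands really do delete the unstaged edits and the untracked file:

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'keep v1\n' > keep.txt
printf 'junk v1\n' > junk.txt
git add keep.txt junk.txt
git commit -q -m "base"

printf 'keep v2\n' >> keep.txt    # an edit worth keeping
printf 'oops\n' >> junk.txt       # a superfluous edit
printf 'tmp\n' > scratch.tmp      # a superfluous untracked file

git add keep.txt                  # stage only what should survive
git checkout -- .                 # drop unstaged edits to tracked files
git clean -q -f                   # delete untracked files
git reset -q                      # unstage; the kept edit stays in the working tree

git status --short
```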

I also use Magit in Emacs, which makes the staging area like the Swiss army knife of Git. I cannot imagine the amount of extra pointless tedious work that would be required using commits, branches, and merges to simulate the equivalent behavior of the staging area. The staging area is used heavily by folks who like to get a lot of things done with minimal effort required.

I understand the difficulty for new users switching to Git, and I agree that the commands could be more clear and consistent, but beginner (commit all) and novice (selective commit) users should not be left to determine the utility of Git features they very likely do not fully understand or appreciate.

> 2. As a simple "review" system. You stage files that have been reviewed so you know you don't have to look at them again, while continuing on a change.

(How) does this work if you want/need a virtual paper trail of reviews and/or work in a distributed team? Do you email your changes for review, without making a commit and push (to a branch and/or fork)? Apologies if I misunderstand what you mean, but I read "review" here as a colleague walks by, has a look, and says lgtm?

This is personal review, not peer review. I try to review my code after completion and a period of rest before submitting for peer review. This discipline finds a surprising number of issues with my code that I just didn't see when I was tired, sick of looking at the code, and just ready to be done with it. I stage the changes as I go (particularly the small changes) and leave the meaty changes for last to be analyzed carefully against the original source.

> I would like my team to be prevented from ever creating untested commits

Frankly, this demonstrates that you have exactly zero understanding of how git works. If I were the sort to force my team to do something, I would force them to never test uncommitted code. Committed /= pushed. A commit means you can keep working on the code while the tests run, and you won't forget what you actually tested. Then, when the tests come back clean, you won't accidentally push some additional, untested change along with the tested changes.

If you can concurrently build/test and edit, you necessarily have more than one workspace, one of which can take the place of the staging area.

I do like the idea of having a snapshot of a test run, though we'd have to amend the commit or something to indicate that it passed its tests and is a sane parent for new commits on its branch, and I would switch to working on an independent feature rather than trying to do work on code which may or may not be broken.

> If you can concurrently build/test and edit, you necessarily have more than one workspace, one of which can take the place of the staging area.

This takes a really narrow view of what software projects can be hosted in git. If you're working on a huge project which must be built (at least for full builds) and tested "in the cloud", which you trigger by pushing to a temporary remote branch, then no, you need not have more than one workspace. If you're a webdev, then yeah, gitless or any other git workflow without the staging area might be useful for you.

I'm sure that workflow can be made to work, but if I had to answer the question "assume you can't test on your editing box nor edit on your testing box, so how do you get source from your editing box to your testing box?" git would not be anywhere near the top of my list of solutions because it leaves known-bad commits pushed that need to be contained like waste.

I think you still don't get this whole "git" thing. Commits are just read-only collections of diff chunks plus metadata, nothing more. It's up to you to define where the boundary is for "bad commits shouldn't be here". Git commits are nothing like SVN or CVS commits (or whatever they're called).

Trying to use some other arcane or error-prone mechanism for getting changes onto test machines is exactly what will leak bad code, because inevitably people will end up sending around zips full of code from who-knows-where for who-knows-which-change.

If you're going to condescend to a stranger, be right. A commit conceptually has a tree of complete blobs that git will diff as needed. git has options that make it more or less sensitive to differences even between unrelated files, so it ignores any delta encoding that may or may not be used in packfiles (and never in loose objects).

If you're making an efficiency argument, git doesn't really try to beat the rsync algorithm even if you ignore the metadata. If you're making a "some people know git and nothing else" argument, fix the team, because that's a serious deficiency.

> A commit conceptually has a tree of complete blobs that git will diff as needed

You're a bit confused about the difference between "concept" and "implementation". There's multiple ways to conceptualize git, but the one I offered is the one presented by the default porcelain while the one you offered is closer to the implementation. The encoding used for storage is entirely irrelevant to my point.

Anyways, my point had nothing to do with efficiency but was related to the topic at hand; that is, whether it is safe to use git to put code on test machines, and indeed whether that's safer than the alternatives. Nothing you said really addresses that point.

You make it sound like a commit is some final immutable set-in-stone thing and so you need staging area. Commits are only "set in stone" after you share (push) them with others.

You're still missing my point. My preferred workflow is to iteratively put together the pieces I want, and only hit the "commit" button when I'm pretty sure I've got it crafted the way I want. Sure, I _can_ rebase, amend, and rewrite commits after creating them, assuming I haven't pushed them yet, but I'd much rather do it _before_ the commit is created. At first glance, this "gitless" tool doesn't seem to really support that workflow.

Why, though? What's the difference between a staging area that you convert into a commit, and an initially-empty commit that acts as a staging area, that you "commit" by just not modifying it any more, and instead creating a new empty commit (new staging area) to work in?

They're bijective mappings of the same data, with the same actions triggered at the same points in a workflow. The only difference is what you label the actions.

To go further: how do you feel if you think of all your un-pushed commits as simply "a chain of staging areas"? What if "committing" was something that happened at git-push time, and until then, you just created staging-area after staging-area and populated them with patch-chunks? Because this would also be a bijective mapping, involving the same actions in the same places.

My commit workflow goes like this: I run git diff, look at the first file that has changed, and then if it looks good, I add that file. At this point I may notice a stray print() statement or something, and delete it, then hit add. I rinse and repeat until I've read everything that I'm about to commit and verified it's what I want.

I've been using mercurial for a couple months for work, and I miss the git staging area workflow. I dislike just assuming all the modifications I've made are good to go, I think the focus git's staging area steers you toward is really valuable.

Mercurial patch queues are very similar to the index:


Did you reply to the wrong comment? Your comment seems off topic relative to the parent.

Regardless, your workflow of 'git diff' on staging with 'add's functions the same as 'git diff HEAD~' on a commit with 'commit --amend's

I think you (and the parent) are totally misunderstanding what I and probably most other people use git's staging area for if you think that git diff HEAD~ does the same thing.

To illustrate a little more explicitly, a typical workflow goes like this:

    git diff
    git add file1
    git diff
    git add file2
    git diff
    git commit
Now yes, you could replace `git add fileX` with `git commit --amend fileX` but I don't see that as a better interface. There are lots of circumstances where I rewrite history, but if there's a workflow that doesn't involve rewriting history, I'll take that one please.

To make this point a little more directly, and in a similar way to what the article talks about, using "git commit" for this is using the same command for two different things, and makes it easier to make mistakes. Treating the staging area differently from commits means that you'll never accidentally alter a commit that has already been pushed, or accidentally add changes to a commit that you didn't intend them to be in. Yes, you can implement this workflow with commit --amend, but it's nice to have support from the tools to distinguish between "possibly-incomplete changes that I've signed off on as part of preparing a larger change set" and "completed, approved changes that may have been pushed to other people". Yes, you can implement this by committing each change and then later rebasing them, but that's even worse from a perspective of the tools supporting the workflow directly.

I absolutely agree with you. There is good reason to differentiate staging from commits, I was just curious why he seemed to think it couldn't be done.

I am imagining treating the last commit as your staging area. Am I missing something that makes that impossible? Specifically, what is the operation that you can't map from staging to a final commit?

Unstaging, maybe? I use SourceTree, so I don't have to remember the specifics of staging-related commands vs commit-related commands. It's just right-click > "Stage / Unstage File", click "Stage / Unstage Hunk" or shift-click "Stage/Unstage Lines".

Undoing part of a commit to be the equivalent of an "Unstage" action is feasible, I'm sure, but I couldn't tell you off the top of my head exactly what that command would be.
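
One way that seems to work as an "unstage" on the last commit, as a rough sketch: reset the file's index entry to the parent commit and amend. The file names and commit messages are hypothetical, and this assumes the commit has not been pushed.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"

echo base > keep.txt
echo base > oops.txt
git add keep.txt oops.txt
git commit -q -m "base"

# The "staging" commit picks up changes to both files.
echo change >> keep.txt
echo change >> oops.txt
git commit -q -am "work in progress"

# "Unstage" oops.txt: point its index entry back at the parent's version,
# then amend. The edit to oops.txt stays in the working tree.
git reset -q HEAD~ -- oops.txt
git commit -q --amend --no-edit

files_in_commit=$(git diff --name-only HEAD~ HEAD)
pending=$(git diff --name-only)
echo "in commit: $files_in_commit"
echo "still modified locally: $pending"
```

That is two commands where the staging area needs one, which is roughly the point being made here.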

So overall, part of it is that it's a mental model, part of it is that the concept is built right into Git.

Which means that if you amend commits, you need to keep track of whether or not you've shared them with others yet.

Which mercurial does for you (see: phases), but you are right, I forget that git does not help you there.
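
Git has nothing like phases, but as a rough manual check (assuming the branch has an upstream configured) you can list the commits that have not been pushed yet. The local bare "remote" and all names below are made up for the sketch.

```shell
set -eu
tmp=$(mktemp -d)
git init -q --bare "$tmp/remote.git"
git init -q "$tmp/work"
cd "$tmp/work"
git config user.email "you@example.com"
git config user.name "Example"
git remote add origin "$tmp/remote.git"

echo one > f.txt
git add f.txt
git commit -q -m "already shared"
git push -q -u origin HEAD

echo two >> f.txt
git commit -q -am "local only"

# Commits reachable from HEAD but not from the upstream branch:
# these are the ones nobody else has seen through this remote.
unpushed=$(git log --format=%s '@{upstream}..HEAD')
echo "not pushed yet: $unpushed"
```

This only knows about the one upstream, so it is a convention you have to remember rather than something the tool enforces.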

This is so tautological that it might sound sarcastic, but I frequently use it as a staging area before committing. Both for whole- and partial-file changes. This seems easier and more sensible than amending commits to me. When I use other version control systems that don't have this feature (which is maybe all of them?), I miss it.

Not the parent but I'm using it to stage sections of the file and then revert everything that is not staged. For example all logging statements. Or use staging as a mini commit. It's sometimes easier to look at the diff and stage what you consider to be working code without introducing commits. That of course could also be done with committing and then squashing but that's more work :)
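
A minimal sketch of that "stage the good parts, throw away the rest" routine, with made-up file names. It stages a whole file to stay non-interactive; for hunks inside one file you would use `git add -p` instead.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"

echo v1 > code.txt
echo v1 > logging.txt
git add code.txt logging.txt
git commit -q -m "base"

echo "real fix" >> code.txt
echo "debug print" >> logging.txt

# Stage only what should survive.
git add code.txt
# Restore every tracked file from the index, dropping unstaged edits.
git checkout -- .
git commit -q -m "real fix"

leftover=$(git status --porcelain)
logging_now=$(cat logging.txt)
```

Note that the `git checkout -- .` step is destructive: the discarded edits are not recoverable from git.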

or committing and amending that commit, which is really no extra work

A quick peruse doesn't show that there's a way to only commit certain parts of a file (like you can do in vanilla git via `git add --patch`)

"There's also a p/partial flag that allows you to interactively select segments of files to commit." http://gitless.com/#gl-commit

But looking at it, this seems like it would strongly couple the decision to include a part of a file with the commit process rather than something that I can be thinking about as I write the change. I think I'd likely find that irritating.

The paper doesn't discuss much how their implementation does better on their listed issues. One particularly interesting issue git has is referred to at one point as "Incoherent Commit":

"The problem with commit is that it constitutes a violation of the coherence criterion: the same concept (commit) has more than one, unrelated, purpose: make a set of changes persistent (P1) and group logically related changes (P2).

"These two purposes are not only unrelated, but in tension with each other. On the one hand, you would like to save your changes as often as possible, so that if something bad happens you lose as little data as possible (thus encouraging early committing). On the other hand, a logically related group of changes usually involves multiple individual changes, which means that you might be working for quite some time before you have enough changes to group (thus encouraging late committing).

Now a fix for that would be useful. Git only provides you with the ability to create a graph, and you get to choose what you use it for. You have to choose though because there is only one graph. Gitless doesn't seem to fix this problem, you still have to choose what goes into your final commits. Though I don't know what you could do about it without changing git to support something like grouping commits into blocks that are modified as a unit.

You can use a branch to accomplish P1 and P2. Create a branch for the group of logically related changes. Commit as often as you want (P1). When all are done with the group of related changes (P2), merge back to base branch.

Git branching is lightweight so I don't see how it can't be used to support P1 and P2.
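
As a sketch of what that could look like, assuming a `--no-ff` merge so the merge commit acts as the logical group (branch name, file name and messages are all hypothetical):

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"

echo base > f.txt
git add f.txt
git commit -q -m "base"
base_branch=$(git symbolic-ref --short HEAD)

# P1: save early and often on a topic branch.
git checkout -q -b feature
echo step1 >> f.txt
git commit -q -am "wip 1"
echo step2 >> f.txt
git commit -q -am "wip 2"

# P2: one merge commit on the base branch groups the related changes.
git checkout -q "$base_branch"
git merge -q --no-ff -m "Add feature" feature

grouped=$(git rev-list --count --first-parent HEAD)
all_commits=$(git rev-list --count HEAD)
echo "first-parent history: $grouped commits, full history: $all_commits"
```

Reading the base branch with `git log --first-parent` then shows only the grouped changes, while the small saves stay reachable underneath.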

Git allows you to do violence to history: it's very much possible to commit regularly as you work, and then rearrange those commits later to be more logical.

I have no idea how easy this is, but I've seen plenty about how horribly impolite it is when you rearrange any history that other people have seen.

It helps if you keep unrelated code in separate commits. That way all you have to do is rearrange and merge commits. It gets more complicated if you have to split commits. I've been wanting to make a GUI for this purpose that allows you to drag and drop hunks/commits. Essentially a GUI for interactive rebase.

And yes, don't change history others have seen, but do it as much as you need locally.
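
Until such a GUI exists, a rough non-interactive sketch of the "rearrange and merge" case uses fixup commits with autosquash. File names and messages are made up, and `GIT_SEQUENCE_EDITOR=true` is only there to accept the generated todo list without opening an editor.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"

echo base > base.txt
git add base.txt
git commit -q -m "base"

echo a > a.txt
git add a.txt
git commit -q -m "Add a"
echo b > b.txt
git add b.txt
git commit -q -m "Add b"

# A late correction that logically belongs to "Add a".
echo a2 >> a.txt
git commit -q -a --fixup=HEAD~

# Autosquash moves the fixup next to "Add a" and folds it in.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~3

subjects=$(git log --format=%s)
echo "$subjects"
```

Splitting a commit is still the awkward part; this only covers reordering and folding.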

EDIT: typo/clarity

what this really means in practice is "never branch off anything but master"

which is ok for maintenance bug fixes, but precludes a lot of interesting workflows involving speculative work, late binding the order of commits, early integration branches, etc.

my workaround for this usually just involves doing what i want up front, and then abandoning git for the merge process (since my history is polluted by false conflicts), constructing diffs and applying them as single commits against master.

which works, it's just a bit silly

There's nothing stopping you from using an amend-commit and just adding every change as you go, as long as you never push to other people. That's the exact veneer that gitless provides, albeit it renames what some of the commands do.

One of the things that separates git from other command line tools is the way it handles program state. If you ask git to do something ambiguous (even when there are clear options) it bails and gives you a hint. It would be nice if it prompted with options.

`git checkout branch` with unstaged changes could interactively prompt you to stash, commit, discard, or keep the changes.

I hold my code and documents on a COW filesystem and create a snapshot every hour. The snapshots take care of P1 and git takes care of P2. I would argue that P1 is not the job of a version control system anyway.

Wow, I'm trying this out on my git repositories and it's great! Look how simple it is to "install":

    tar xzvf gl-v0.8.3-darwin-x86_64.tar.gz
    sudo cp gl-v0.8.3-darwin-x86_64/gl /usr/local/bin/
Now you can play with the commands on any git repository:

    $ cd ~/projects/farmhouse
    $ gl diff
      # Woah cool!  It uses whitespace to separate blocks!
    $ gl branch
      # Wow, this not only shows you the branches, but explains
      # all the other branching commands, like how to switch
      # branches!  Now I know that I can just:
    $ gl switch master
      # done!
And because it's just an alternative wrapper for git, you can switch back and forth, at any time, on any repo.

The original paper is here: http://people.csail.mit.edu/sperezde/pre-print-oopsla16.pdf

Their "veneer" of git is here: http://gitless.com/

> Many people complain that Git is hard to use.

Really? Do people think that?

Yes and the paper in the article tries to explain conceptually why it can be hard to understand.

I had to teach git to 10 non-developers (for small configuration code and configuration files) who were used to SVN. The staging area wasn't really a problem to learn.

Most of the problems they had came from stashing: "I want to pull but git doesn't want to!"

Or changes following you when switching branches: "I worked in develop, now I'm back to master and I have all these files modified!" Which could lead to a "reset --hard" for getting rid of those changes, which was generally regretted after that!

Also I had a few detached head states due to simple double-click on a commit name in the UI that resulted in a checkout, when they just wanted to see the commit log.

Finally, a lot of confusion arose from using SourceTree, which shows you a window with ten options every time you try to pull. (SourceTree wasn't my choice)

So I think gitless does provide an improvement by getting rid of the stash, but I don't really see the point of removing the staging area.
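
For reference, the non-destructive alternative to "reset --hard" in the branch-switching case is roughly the following. This is a sketch with hypothetical branch and file names, not what SourceTree does under the hood.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"

echo base > config.txt
git add config.txt
git commit -q -m "base"
main_branch=$(git symbolic-ref --short HEAD)

git checkout -q -b develop
echo "half-done edit" >> config.txt

# Park the uncommitted edit instead of carrying it to the other branch.
git stash -q
git checkout -q "$main_branch"
clean_on_main=$(git status --porcelain)

# Back on develop, bring the parked edit back.
git checkout -q develop
git stash pop -q
restored=$(git diff --name-only)
```

Gitless keeps uncommitted changes attached to the branch they were made on, which is the manual step this sketch spells out.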

Can anyone share other git porcelain projects? I'm especially interested in any that are not necessarily intended for use with code. Think more along the lines of writing or something, maybe. Any kind of refs juggling that isn't just standard git.

magit being the most prominent and most widely used. Within the emacs GUI. Makes most of those tasks much easier.

Hub is a prominent GitHub remote porcelain tool.

I also want to point out shell prompt tools to display the current git status, which would have helped in many of those misfits.

> magit being the most prominent and most widely used. Within the emacs GUI.

Since magit was mentioned, I'll add 'fugitive', the Git wrapper for Vim described to be 'so awesome, it should be illegal.'

I'm a big fan of GitUp. http://gitup.co/

I am sure that, like all things, Git can be improved, and I think the idea that users are not open to improvement is more often than not a defensive response from those whose idea of 'improvements' does not resonate.

It's always dubious for those with new offerings to dismiss criticism as 'resistance' and 'defensiveness'. This is extremely disingenuous and intentionally takes things from the technical to the political.

They have taken up a marginal use case which matters to them which is fair but then try to dress it up as a git weakness with an elaborate paper which plays up their use case and scatters around strawmen like 'Git is difficult to use' unquestioned. Isn't it possible that the 'shortcomings' are not an issue to other users who may not identify with the use case highlighted and may not perceive it as an 'improvement'.

I think git stands alone in stark defiance to a growing culture of complexity. Given the scope Git could have been an extremely complex and bulky piece of software.

It's a testament to the experience of the authors that it is accessible to most and used by millions.

I have no doubt in lesser hands Git would be hugely more intricate and a complete pain to use.

People take version control for granted (thanks to git) and then claim it has "design flaws and misfits".

It can be tailored to be more user friendly, sure. But the original philosophy of git was to build a distributed version control system that is efficient, straightforward, and dependable.

All of those goals are achieved with its design and improving on this foundation is certainly possible. These were all hard problems to solve at the time, given the state of the art was CVS, subversion and perforce.

> All of those goals are achieved with its design and improving on this foundation is certainly possible. These were all hard problems to solve at the time, given the state of the art was CVS, subversion and perforce.

The state of the art was not CVS, Subversion, and Perforce when git had its first release in 2005. The state of the art was BitKeeper (which you might remember had a certain connection to Linus and git), Arch, Monotone, and Darcs, and Mercurial itself was first released almost simultaneously with git. And one can debate the extent to which git was an improvement on any of those on key features like cryptographic signing or UI/UX.

"People take version control for granted (thanks to git)..."

Nonsense. Version control software has existed and been in widespread use since at least the 1980's.


Walter Tichy released RCS in 1982. CVS is derived from RCS, Subversion from CVS.

CVS was released to Usenet as shell script sources only in 1986; not that far after RCS.

Subversion is not derived from CVS; it was a project inspired by problems in CVS, to replace it.

From http://svnbook.red-bean.com/en/1.7/svn.intro.whatis.html#svn...

Subversion's History

In early 2000, CollabNet, Inc. (http://www.collab.net) began seeking developers to write a replacement for CVS. CollabNet offered a collaboration software suite called CollabNet Enterprise Edition (CEE), of which one component was version control. Although CEE used CVS as its initial version control system, CVS's limitations were obvious from the beginning, and CollabNet knew it would eventually have to find something better. Unfortunately, CVS had become the de facto standard in the open source world largely because there wasn't anything better, at least not under a free license. So CollabNet determined to write a new version control system from scratch, retaining the basic ideas of CVS, but without the bugs and misfeatures.

Note "retaining the basic ideas of CVS, but without the bugs and misfeatures."


base a concept on a logical extension or modification of (another concept).

Subversion doesn't have a shred of CVS code in it and uses a completely different storage algorithm and versioning paradigm, radically different from CVS's "tree of RCS files permanently shaped like the working copy and individually versioned".

The basic ideas of CVS in SVN are the central repo model with unreserved checkouts.

In SVN, new versions create new entire tree nodes which are numbered. Branches look like subdirectories and there is renaming support. It's just an entirely different animal.

The SVN project blogged about CVS because that was their target "market" that they wanted to overtake. They trumpeted that they are making something better than CVS to which CVS users could convert.

It's not derived from CVS any more than Adobe Photoshop is derived from MS Paint.

SCCS (Source Code Control System) was developed in 1972 and predated RCS as the dominant version control system for Unix.


Version control is not a new concept and was in widespread use in the 1980's.


I fail to see how Git has a bad UX. The way it took over from other VCSs by storm suggests that those systems had UX that was significantly worse. In fact, I think you would be hard-pressed to find a recent piece of software which was a bigger success, UX-wise.

Or why is Git popular, then? Could Github be built on, say, SVN? Many times in IT, popularity is also driven by support of a large corporation, but it was non-existent in this case.

I think Git's success over solutions like CVS or SVN is the superiority of the underlying data model in its ability to take into account the complexity inherent in software being developed by multiple people at once, while at the same time allowing for individual developers to effortlessly fork their own various tasks.

I guess git's success owes most to Torvalds, as Mercurial was started contemporaneously and was quite similar in being a DVCS.

git is popular because git is popular (and also because of the DVCS model). I'm pretty sure that it didn't win because of UX but because of network effects (https://en.m.wikipedia.org/wiki/Network_effect). There are plenty of irrational decisions made by developers based on "geek cred" and status signaling.

Regardless of the reason for its success, git has a huge installed base and plenty of online help to find one's way out of a corner. Having git skills is also a big plus for employability. I have migrated teams onto git rather than Mercurial solely for these reasons. UX was a secondary consideration.

So do you think Linus made a mistake writing Git and should have gone with Mercurial instead? I mean, the argument by popularity can apply to CVS and SVN too, which were at times more popular than Git. I don't find it a very convincing explanation. Popularity could have been the reason why people chose Git over Mercurial specifically (if you're talking about DVCSs), but this kinda makes what Linus did a bad choice. What I am saying is that if people chose an inferior product (with worse UX) due to popularity or network effects, then this was a mistake, and you should be able to point out what exactly the mistake is.

IIRC git and mercurial were being developed around the same time as each other. And lots of git's design has to do with Linux-kernel workflows, which are quite peculiar.

Those who do not remember cogito and easygit are doomed to reimplement them.

There's not a goddamn thing that's fundamentally wrong with git. My worthless 2c.

Sure there is. It's inscrutable and utterly non-discoverable. Even programmers need a dedicated Git expert on staff to help escape bad states. Non programmers are utterly helpless.

Also it fundamentally can't handle binary files well. This causes great pain.

It's from 2012, and some (but not all) of the things that are complained about here have been fixed, but Steve Bennett's classic blog post verbalises all the problems I have with the git UI way better than I can, and is still quite valid:


I guess lots of things have barriers and might not be for everyone. Don't mean they're flawed. Also, git can't handle my laundry. That causes great pain.

Right, but barriers that a high percentage of users experience mean it's a flaw. This factor cannot be ignored.

Taken to the extreme your statement is essentially: "Git works for me and me only therefore it is flawless."

That's fair. But tools used by everyone on a team should probably be, at least, for those captive users.
