P.S. If your granularity is at the level of files rather than individual hunks of the overall patch, then you can just "cvs/git commit files..." easily. Still, it'll go in untested, and you may have forgotten a change that was required to make the committed patch valid and coherent.
Git's version is miles better.
For that reason I agree strongly. Any attempt to improve git that does not move the staging area closer to being the default way of operating is misguided at the core.
This is a lot more than you get with just a single interactive commit operation.
You could use a commit for this purpose, and continually amend or revert this commit, but in terms of UI, it's helpful to have a set of operations specifically for moving things in and out of the staging area that's different from what you do for manipulating commits in general.
Or you can use an easy-to-use but powerful UI like TortoiseHG, where it's trivial to check files or hunks for commit, stash (shelve), amend, revert, rebase, ...
TortoiseHG with hggit is the best git client.
And that's before even mentioning the best parts like phases and obsolescence markers.
Take a look at `git commit -a`. I sometimes teach this one to beginners if I feel that the idea of staging is confusing them.
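A quick throwaway demo of the difference (file names and messages made up): `git commit -a` stages and commits every tracked, modified file in one step, but it won't pick up brand-new untracked files.

```shell
set -e
cd "$(mktemp -d)" && git init -q demo && cd demo
git config user.email you@example.com
git config user.name you
echo v1 > tracked.txt
git add tracked.txt && git commit -qm 'base'
echo v2 > tracked.txt          # modify an already-tracked file
echo new > untracked.txt       # create a brand-new file
git commit -qam 'update'       # no `git add` needed for tracked.txt
git show --stat --oneline HEAD # includes tracked.txt only
git status --porcelain         # untracked.txt is still untracked
```

So a beginner can ignore the index entirely until they actually need it.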
Why would you want to add things piece by piece and not commit them? If they're done enough to be added, they're done enough to be committed; if they're not, you keep working on them together with all the other unfinished files. Once you're actually ready to commit, you can choose which files are in fact ready.
It provides useful checkpointing, so in practice I find that there are lots of things that I will add to the index even though I am going to keep working on them. Committing these intermediate changes is annoying because it just creates extra work when I make the final (squashed) commit.
This strikes me as one of those disagreements that simplifies to, "That's not how I do things, that's horrible."
The other half to this is that commonly, during an interactive commit (git add -p or hg commit --interactive), I realize that I missed some small change. Being able to persist that state means I can add the small change without redoing the interactive commit.
I haven't read the paper, but the last paragraph in the article makes me a little sad. Then again, there was hardly any progress that wasn't met with significant resistance initially. (EDIT: typos)
- changes on hdd
- changes in staging area
- changes in commit
- changes in commit that came before a recent commit
- changes in pushed commit
Technically you could do the same as the staging area by forcing the user to create a new, empty, but named commit when they decide they're done with the last one, and have all the current staging-affecting commands affect the last commit instead.
However, for one, I think it may be easier to mess up and lose data unexpectedly when working that way, and for another, putting down a name for a commit when you're not even sure what it's gonna be may be an awkward way of working. Lastly, you'd also need to implement "empty" commits. I'm not sure how useful/easy that would be with git.
gl commit -o file1 file2 file3 file4
echo file1 > staging # e.g. git add file1
echo dir2/file2 >> staging # e.g. git add dir2/file2
echo dir3/file1 >> staging
echo dir3/file5 >> staging
echo dir4/dir9/dir394/file384 >> staging
gl commit -o $(cat staging) # e.g. git commit
A staging area is just a list of things to commit. There are many ways to make lists. I personally use emacs because direct manipulation rocks. Complaining about a change in one way to make lists is silly—gitless introduces much more powerful basic mechanisms like uncommitted branching, and lists belong on top of these more powerful basic features; not underneath like in the traditional porcelain.
For example, I usually use git-add -p to put my changes in staging without the debug print statements. I still want to keep them in my working copy.
Also, if you end up committing only parts of files, then how do you know that what you commit works?
What you're supposed to do in Perforce, of course, is to create your changelist, shelve the others, make sure your proposed changelist works independently, and commit it only when it does. Which is analogous to what you're supposed to do with git:
- stage parts of files
- `git stash -k`
- check it works
- `git stash pop`
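As a throwaway demo, the steps above look something like this (file names made up; in real life you'd use `git add -p` and your actual test command at the marked spots):

```shell
set -e
cd "$(mktemp -d)" && git init -q demo && cd demo
git config user.email you@example.com
git config user.name you
echo base > ready.txt && echo base > wip.txt
git add . && git commit -qm 'base'
echo done >> ready.txt                # the change you want to commit
echo half >> wip.txt                  # unrelated work in flight
git add ready.txt                     # stage only the intended change
git stash push --quiet --keep-index   # -k: worktree now matches the index
grep -q done ready.txt                # staged change is present: test here
test "$(cat wip.txt)" = base          # in-flight work is safely out of the way
git commit -qm 'independent change'
git stash pop --quiet                 # bring the in-flight work back
grep -q half wip.txt
```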
- stage parts of files
- (90% of the time) it's good
- (10% of the time) bzzt... it's bad
- (them) "Your commit is broken"
But with git, you have options. So let's try that again.
- (them) "Your commit is broken"
- (you) (confident tone of voice) "Ah no, you are using git incorrectly, you forgot to git fetchpack -P -Q -J 5 -N 311 -X 0771 -R 0x134774"
- (them) (uncertain)
- (you) (confident gaze. Straight into their eyes. YOU = TIGER)
- (them) (increasingly uncertain)
- (you) (unblinking gaze. You have the upper hand. YOU = 2 TIGERS)
- (them) (wanders off to try your technobabble)
- (you) (fixes your commit)
The advantages, I think, are clear.
As long as a change or refactoring stands independently of the intended feature change, there is often value in keeping that fine-grained version history. It replicates your thought process at the time in the history messages, gives more manageable git-bisects, etc.
If you can't reliably look at the modified files and pick out what changes stand independently then this approach is obviously not for you, or at least you're gonna need some help from your tooling.
TeamCity has the concept of a 'delayed commit' that is submitted as a patch and tests are run. If the tests fail the patch is not accepted. You can also do a ghetto version of this in any system just by running a local copy of your CI server. Push to its repository target and have it run the tests. If it works push it out to the real build server, if not then there's no problem just force-pushing an amended commit since everything is still local.
Alternately you can commit the independent changeset and then stash everything else and run a build/test.
Obviously you can't always pull out an independent changeset and in those cases you should commit the whole thing at once. But a lot of times it's not actually that difficult to identify independent steps that could be built/tested independently.
1. a way to record logical changes to files (e.g. implement two features without making any commits, then when you're done, pick out the files/chunks that encode each feature and create commits out of them),
2. a record of history (e.g. just start writing code and make a commit every time you compile/run tests without unexpected failures),
3. something else?
I've found it very painful to apply my git-adapted workflow to Perforce: I _want_ to just start coding, testing out various possible design choices and implementations instead of only theorizing about them, but can't (e.g. "I wonder if I could factor out these methods + fields into a different class?" Perforce: oh well, I guess I should write it down somewhere to remember for later. Git: branch my current work, spend five minutes sketching an extraction, then realize it's insane and continue working). Am I crazy and just don't realize how much better the Perforce model is?
I actually came to quite like it, but the workflow was sometimes a bit tiresome :( When I found myself in the situation you describe, potentially wanting to make an additional change while already working on another, I did exactly what you suggest: add a todo item, and carry on until I'm done with the task at hand. Then start my second change with nothing checked out, so I can undo checkout on everything should I make a mess.
This works, and you get used to it, and of course many would say that it's a better approach - but it would be nice if the Perforce client tools could be a bit more imaginative.
As for how I think of version control, if it's git, #1. If it's Perforce, #2, plus #3 - backup and distribution.
I don't know how much value I get from being careful about my commits with git, but it does make me feel better (which I suppose could be reason enough). On the large-scale, goal-oriented projects that I've used Perforce for, worrying about logical changes has never felt very important. Does it make a big difference if you have one commit that implements 3 features, or 3 commits? When you're trying to fix a bug, you don't really mind either way, because (a) bugs don't respect feature boundaries, (b) all the features are non-negotiable, so it's not like you can back one out and carry on anyway, and (c) the project is large and fast-moving enough that even if you could, there'd be a good chance that actually, you couldn't.
What history tends to come in useful for with this sort of project: finding the code that corresponds to a particular build that you have a bug report for, and finding who made a particular change so that you can ask them about it (using the timeline view, say). These both work fine whether you make granular changes or not.
You might want to be a bit more careful about things if you're working on changes that might want to be merged into another branch on an individual basis. (Suppose you're moving spot fixes from release branch to main branch - some you'll want, some you won't.) But you usually know when you're making this sort of change, and so you work that way anyway.
Because one part was, say, whitespace-errors corrected by my editor, or added/updated comments, or removed dead-and-commented code paths, while the other part was an actual logic change. There are many reasons to modify a file of source-code beyond making the code behave differently, but frequently those aspects come together as "oh, I'll just fix this while it's staring me in the face" and must be picked back apart later.
You use your knowledge as a programmer to determine what that part of the file does and whether it can stand independently as a change. This is not always obvious, so in that case, you don't do it. But when it is obvious, it makes it quick and easy to commit it separately.
For example, say you rewrote a function in a script, and while you were at it you improved the usage info (--help). Then you go to commit your changes. You look at each changed hunk, and you see that there are two changed hunks: the function you changed and the help message. With Magit, it's dead simple to commit them separately: 1) select the first hunk, 2) s to stage, 3) c to commit, 4) write the commit message and execute the commit, 5) select the second hunk...
And let's say that, while you were rewriting the function, you commented out some code in the old function. You obviously don't want to keep that old code in the new function, so when you are looking through the changed hunks, you select those lines and press k to revert (or kill) those lines containing the commented code. If you have already staged them, then you can simply select those lines and press u to unstage just those lines, removing them from the staging area but preserving them in the working tree.
The workflow of 1) develop, 2) review and stage related hunks, 3) commit staged hunks, 4) goto 2, also helps a lot by forcing the developer to look at the code he's about to commit. Of course you can just stage everything that's changed and make a big commit, but I find that using the staging area to build up commits before making them helps me catch things that I don't want to commit in the first place. It's really useful for managing config files, because you can commit only the parts you want to save or sync to other systems, while leaving some parts active in the working tree but out of the committed branch.
For example, the Picard music tagger stores window and dialog positions in the same config file along with behavioral settings. I want to store and sync changes to the options, but I don't want to commit and sync every time the window position changes.
The staging area is a powerful (yet essentially optional) feature that, when used correctly, gives the developer great power to make commits more logically independent, while allowing him to develop freely, with little concern for how he will eventually commit the changes he's making.
I test it.
If there's a certain set of changes i wish to commit in one unit, but i have other changes in flight, then i use staging to prepare and reason about them while being able to diff both staging<->previous and staging<->hdd, make the commit, then stash the hdd state and run tests. Afterwards i can restore the hdd from stash and either amend the commit if necessary, or keep on working.
If your workflow is that every commit must be tested, the correct approach is to enforce that rule, rather than banning an only tangentially related feature.
There's an inherent incompatibility between the supposed ways that the staging area can be replaced with other things. If you replace it with a commit you keep amending as you go, but also all commits must pass tests, then you can never commit partially complete code, which defeats the purpose of having the amending commit.
If you interactively choose what to commit at commit time, then you run the risk of failing the test and then, after fixing whatever was wrong, having to remember which parts of which files it was you chose to commit last time. Now, you could choose to have your VCS remember that for you, but then you're basically reinventing a less-capable version of the staging area.
If you simply commit everything, then you can never have any code in your working tree which breaks tests, nor can you have code which isn't logically part of the commit you're currently working on. In practice that means that when you find things that need fixing that aren't directly related to what you're doing, you either fix it and commit it with unrelated changes - which interacts poorly with merging or reverting those commits - or you don't fix it and, most probably, forget about it.
I commit the workspace. If I'm not happy with some of the workspace, I fix it (or stash it for later when chaos happens).
As for testing, it is trivial to stash or commit your current changes, go to the commit just made, test it, then carry on. The fact that a commit was partial only guarantees it went untested if the developer is bad.
1. Breaking a large commit into multiple smaller commits.
2. As a simple "review" system. You stage files that have been reviewed so you know you don't have to look at them again, while continuing on a change.
3. Cleaning the working copy of superfluous changes/files. If you stage the files you want to keep, then clean and/or checkout force, it wipes out the files you didn't stage. Then just reset to unstage all files and continue making changes.
4. Splitting a branch into two different branches, when you realize that a branch could be broken down into smaller separate changes that are each incomplete but COULD be committed separately. You squash-merge to master, stage and commit, creating a new branch for feature1; then stage, stash, and stash-pop the rest back onto master (since sometimes you cannot just switch back to master), and create the feature2 and feature3 branches, etc.
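The cleanup trick in item 3 can be sketched as a throwaway demo (file names made up):

```shell
set -e
cd "$(mktemp -d)" && git init -q demo && cd demo
git config user.email you@example.com
git config user.name you
echo base > keep.txt && echo base > drop.txt
git add . && git commit -qm 'base'
echo good >> keep.txt       # a change you want to keep
echo bad  >> drop.txt       # a change you don't
echo junk > scratch.txt     # a file you don't
git add keep.txt            # stage only what you want to keep
git checkout -- .           # worktree <- index: drop.txt's edit is discarded
git clean -fdq              # scratch.txt (untracked) is removed
git reset -q                # unstage; keep.txt's edit survives in the worktree
```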
I also use Magit in Emacs, which makes the staging area like the Swiss Army knife of Git. I cannot imagine the amount of extra pointless tedious work that would be required using commits, branches, and merges to simulate the equivalent behavior of the staging area. The staging area is used heavily by folks who like to get a lot of things done with minimal effort required.
I understand the difficulty for new users switching to Git, and I agree that the commands could be more clear and consistent, but beginner (commit all) and novice (selective commit) users should not be left to determine the utility of Git features they very likely do not fully understand or appreciate.
(How) does this work if you want/need a virtual paper trail of reviews and/or in a distributed team? Do you email your changes for review, without making a commit and push (to a branch and/or fork)? Apologies if I misunderstand what you mean, but I read "review" here as colleague walks by and have a look, and says lgtm?
Frankly, this demonstrates that you have exactly zero understanding of how git works. If I were the sort to force my team to do something, I would force them to never test uncommitted code. Committed != pushed. A commit means you can keep working on the code while the tests run, and you won't forget what you actually tested. Then, when the tests come back clean, you won't accidentally push some additional, untested change along with the tested changes.
I do like the idea of having a snapshot of a test run, though we'd have to amend the commit or something to indicate that it passed its tests and is a sane parent for new commits on its branch, and I would switch to working on an independent feature rather than trying to do work on code which may or may not be broken.
This takes a really narrow view of what software projects can be hosted in git. If you're working on a huge project which must be built (at least for full builds) and tested "in the cloud", which you trigger by pushing to a temporary remote branch, then no, you need not have more than one workspace. If you're a webdev, then yeah, gitless or any other git workflow without the staging area might be useful for you.
Trying to use some other arcane or error-prone mechanism for getting changes onto test machines is exactly what will leak bad code, because inevitably people will end up sending around zips full of code from who-knows-where for who-knows-which-change.
If you're making an efficiency argument, git doesn't really try to beat the rsync algorithm even if you ignore the metadata. If you're making a "some people know git and nothing else" argument, fix the team, because that's a serious deficiency.
You're a bit confused about the difference between "concept" and "implementation". There are multiple ways to conceptualize git, but the one I offered is the one presented by the default porcelain, while the one you offered is closer to the implementation. The encoding used for storage is entirely irrelevant to my point.
Anyway, my point had nothing to do with efficiency but was related to the topic at hand; that is, whether it is safe to use git to put code on test machines, and indeed whether that's safer than the alternatives. Nothing you said really addresses that point.
They're bijective mappings of the same data, with the same actions triggered at the same points in a workflow. The only difference is what you label the actions.
To go further: how do you feel if you think of all your un-pushed commits as simply "a chain of staging areas"? What if "committing" was something that happened at git-push time, and until then, you just created staging-area after staging-area and populated them with patch-chunks? Because this would also be a bijective mapping, involving the same actions in the same places.
I've been using mercurial for a couple months for work, and I miss the git staging area workflow. I dislike just assuming all the modifications I've made are good to go, I think the focus git's staging area steers you toward is really valuable.
Regardless, your workflow of `git diff` against staging with `add`s functions the same as `git diff HEAD~` against a commit with `commit --amend`s.
To illustrate a little more explicitly, a typical workflow goes like this:
git add file1
git add file2
git commit
Undoing part of a commit to be the equivalent of an "Unstage" action is feasible, I'm sure, but I couldn't tell you off the top of my head exactly what that command would be.
So overall, part of it is that it's a mental model, part of it is that the concept is built right into Git.
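For what it's worth, one way to do an "unstage from the last commit" is `git reset HEAD^ -- <file>` followed by `git commit --amend`. A throwaway demo (file names made up; only sane for commits that haven't been pushed):

```shell
set -e
cd "$(mktemp -d)" && git init -q demo && cd demo
git config user.email you@example.com
git config user.name you
echo base > a.txt && echo base > b.txt
git add . && git commit -qm 'base'
echo change-a >> a.txt && echo change-b >> b.txt
git commit -qam 'both changes'
git reset -q HEAD^ -- b.txt      # index entry for b.txt back to the old version
git commit -q --amend --no-edit  # rewrite the commit from the index
# b.txt's change is now out of the commit but still in the worktree:
git diff --name-only             # -> b.txt
```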
But looking at it, this seems like it would strongly couple the decision to include a part of a file with the commit process rather than something that I can be thinking about as I write the change. I think I'd likely find that irritating.
"The problem with commit is that it constitutes a violation of the coherence criterion: the same concept (commit) has more than one, unrelated, purpose: make a set of changes persistent (P1) and group logically related changes (P2).
"These two purposes are not only unrelated, but in tension with each other. On the one hand, you would like to save your changes as often as possible, so that if something bad happens you lose as little data as possible (thus encouraging early committing). On the other hand, a logically related group of changes usually involves multiple individual changes, which means that you might be working for quite some time before you have enough changes to group (thus encouraging late committing).
Now a fix for that would be useful. Git only provides you with the ability to create a graph, and you get to choose what you use it for. You have to choose though because there is only one graph. Gitless doesn't seem to fix this problem, you still have to choose what goes into your final commits. Though I don't know what you could do about it without changing git to support something like grouping commits into blocks that are modified as a unit.
Git branching is lightweight so I don't see how it can't be used to support P1 and P2.
I have no idea how easy this is, but I've seen plenty about how horribly impolite it is when you rearrange any history that other people have seen.
And yes, don't change history others have seen, but do it as much as you need locally.
which is ok for maintenance bug fixes, but precludes a lot of interesting workflows involving speculative work, late binding the order of commits, early integration branches, etc.
my workaround for this usually just involves doing what i want up front, and then abandoning git for the merge process (since my history is polluted by false conflicts), constructing diffs and applying them as single commits against master.
which works, it's just a bit silly
`git checkout branch` with unstaged changes could interactively prompt you to stash, commit, discard, or keep the changes.
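Something like that prompt could even be approximated today with a small wrapper; this is only a sketch (the function name `gco`, the prompt text, and the stash/WIP messages are all made up, not anything git provides):

```shell
# Prompt before switching branches when the tree or index is dirty.
gco() {
  if ! git diff --quiet || ! git diff --cached --quiet; then
    printf 'You have uncommitted changes: [s]tash, [c]ommit, [d]iscard, [k]eep? '
    read -r choice || return 1
    case "$choice" in
      s) git stash push -m "gco: auto-stash before $1" ;;
      c) git commit -am "WIP before switching to $1" ;;
      d) git reset --hard ;;
      k) ;;  # git's default: try to carry the changes over
      *) echo 'aborted' >&2; return 1 ;;
    esac
  fi
  git checkout "$1"
}
```

With a clean tree it behaves exactly like plain `git checkout`.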
tar xzvf gl-v0.8.3-darwin-x86_64.tar.gz
sudo cp gl-v0.8.3-darwin-x86_64/gl /usr/local/bin/
$ cd ~/projects/farmhouse
$ gl diff
# Woah cool! It uses whitespace to separate blocks!
$ gl branch
# Wow, this not only shows you the branches, but explains
# all the other branching commands, like how to switch
# branches! Now I know that I can just:
$ gl switch master
Their "veneer" of git is here: http://gitless.com/
Really? Do people think that?
Most of the problems they had came from stashing: "I want to pull but git doesn't want to!"
Or changes following you when switching branches: "I worked in develop, now I'm back on master and I have all these files modified!"
Which could lead to a `reset --hard` to get rid of those changes, which was generally regretted afterward!
Also I had a few detached head states due to simple double-click on a commit name in the UI that resulted in a checkout, when they just wanted to see the commit log.
Finally, a lot of confusion arose from using SourceTree, which shows you a window with ten options every time you try to pull. (SourceTree wasn't my choice.)
So I think gitless does provide an improvement by getting rid of the stash, but I don't really see the point of removing the staging area.
Hub is a prominent GitHub remote porcelain tool.
I also want to point out shell-prompt tools that display the current git status, which would have helped with many of those mishaps.
Since magit was mentioned, I'll add 'fugitive', the Git wrapper for Vim described to be 'so awesome, it should be illegal.'
It's always dubious for those with new offerings to dismiss criticism as 'resistance' and 'defensiveness'. This is extremely disingenuous and intentionally takes things from the technical to the political.
They have taken up a marginal use case which matters to them, which is fair, but then try to dress it up as a git weakness with an elaborate paper which plays up their use case and scatters around strawmen like 'Git is difficult to use' unquestioned. Isn't it possible that the 'shortcomings' are not an issue for other users, who may not identify with the highlighted use case and may not perceive the changes as an 'improvement'?
I think git stands alone in stark defiance of a growing culture of complexity. Given its scope, Git could have been an extremely complex and bulky piece of software.
It's a testament to the experience of the authors that it is accessible to most and used by millions.
I have no doubt in lesser hands Git would be hugely more intricate and a complete pain to use.
It can be tailored to be more user-friendly, sure. But the original philosophy of git was to build a distributed version control system that is efficient, straightforward, and dependable.
All of those goals are achieved with its design and improving on this foundation is certainly possible. These were all hard problems to solve at the time, given the state of the art was CVS, subversion and perforce.
The state of the art was not CVS, Subversion, and Perforce when git had its first release in 2005. The state of the art was BitKeeper (which you might remember had a certain connection to Linus and git), Arch, Monotone, and Darcs, and Mercurial itself was first released almost simultaneously with git. And one can debate the extent to which git was an improvement on any of those on key features like cryptographic signing or UI/UX.
Nonsense. Version control software has existed and been in widespread use since at least the 1980's.
Walter Tichy released RCS in 1982. CVS is derived from RCS, Subversion from CVS.
Subversion is not derived from CVS; it was a project inspired by problems in CVS, to replace it.
In early 2000, CollabNet, Inc. (http://www.collab.net) began seeking developers to write a replacement for CVS. CollabNet offered a collaboration software suite called CollabNet Enterprise Edition (CEE), of which one component was version control. Although CEE used CVS as its initial version control system, CVS's limitations were obvious from the beginning, and CollabNet knew it would eventually have to find something better. Unfortunately, CVS had become the de facto standard in the open source world largely because there wasn't anything better, at least not under a free license. So CollabNet determined to write a new version control system from scratch, retaining the basic ideas of CVS, but without the bugs and misfeatures.
Note "retaining the basic ideas of CVS, but without the bugs and misfeatures."
base a concept on a logical extension or modification of (another concept).
The basic ideas of CVS in SVN are the central-repo model with unreserved checkouts.
In SVN, each new version creates an entire new tree node, and these are numbered. Branches look like subdirectories and there is renaming support. It's just an entirely different animal.
The SVN project blogged about CVS because that was their target "market" that they wanted to overtake. They trumpeted that they are making something better than CVS to which CVS users could convert.
It's not derived from CVS any more than Adobe Photoshop is derived from MS Paint.
Version control is not a new concept and was in widespread use in the 1980's.
Or why is Git popular, then? Could Github be built on, say, SVN? Many times in IT, popularity is also driven by support of a large corporation, but it was non-existent in this case.
Regardless of the reason for its success, git has a huge installed base and plenty of online help to find one's way out of a corner. Having git skills is also a big plus for employability. I have migrated teams onto git rather than Mercurial solely for these reasons. UX was a secondary consideration.
Also it fundamentally can't handle binary files well. This causes great pain.
Taken to the extreme your statement is essentially: "Git works for me and me only therefore it is flawless."