I'm fairly new to git. I've only been using it for about two months. It seems like this is a lot of work, with the end result only being that the commit log is cleaner and that cherry-picking a feature/fix is perhaps a bit easier.
I can understand the need for this kind of cleanup when pushing a fix to an open-source repository that needs pull requests to be self-contained, but for an internal company repo, how important is it to keep the commit history this clean?
If I have changes I'm not ready to commit and need to switch branches to work on another issue, I find that a stash is an easier way to go. I can stash my working-copy changes, switch branches, then come back, apply the stash, keep going, and make one final commit with just the changes I want to commit.
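For what it's worth, a minimal sketch of that stash-and-switch flow (the branch names here are just placeholders):
% git stash                      # shelve the unfinished working-copy changes
% git checkout other-branch      # go deal with the other issue
...                              # fix, commit as usual
% git checkout my-feature        # come back to where I was
% git stash pop                  # reapply the shelved changes and keep going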
Other DVCSs actually believe that being able to modify commit history is a bad thing and lean toward immutable commit history (e.g., Veracity). Git makes it pretty easy to modify commit history, which is fine for local branches but can easily be misused to break your branch if you modify commits that have already been pushed to a remote repo.
I like the idea of keeping the commit history clean but I'm not sure that it's worth the effort that it takes to manage the process. In the end, only your final good code is going to be merged into an integration or master branch anyway.
There are a few advantages to doing things this way.
If you ever want to git-bisect your code-base to track down when you introduced a bug then you'll be very thankful that every commit is functional and passes your test-suite (apart from the tests relevant to new features on a topic branch of course). Having to hop forwards and backwards from each git-bisect point looking for a functional commit is a huge waste of time.
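To make that concrete, a typical bisect session looks something like this (v1.2 and make test are stand-ins for your own known-good revision and test command):
% git bisect start
% git bisect bad                 # current HEAD exhibits the bug
% git bisect good v1.2           # last revision known to be good
% git bisect run make test       # let git walk the history and test each midpoint
% git bisect reset               # return to where you started
This only works smoothly if every commit in the range actually builds and passes its tests, which is exactly the property the cleaned-up history gives you.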
Breaking your changes into clean commits with proper explanatory commit messages also makes it much easier for people working on the code in the future to work out the intent of various parts of the code.
Think of a messy commit history as a form of technical debt: sure, it's quicker to just move forward with development, but if you have a history that breaks things down into clean commits with well-written explanatory commit messages, then you're making life much, much easier for whoever has to debug that code in the future. Odds are that that person is going to be you & by the time you come back to the code you'll have forgotten all about it and be forced to spelunk through the commit history in order to work out what on earth you were doing. Think of it as a service to your future self :)
"If you ever want to git-bisect your code-base to track down when you introduced a bug then you'll be very thankful that every commit is functional and passes your test-suite"
I agree that this property is very useful, but I disagree that it is necessarily implied by the workflow as described in the article. By using "git add -p", he is constructing a tree that probably never actually existed during development - hence there is no guarantee that it works and passes the tests.
I strongly agree with you that a clean logical progression of commits is a good thing (especially for code review). However, making sure that each stage works and passes the tests takes extra discipline.
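One way to impose that discipline mechanically (not something the article does, just a sketch; origin/master stands in for wherever the branch forked from) is to have rebase run the test suite against every rewritten commit:
% git rebase -i --exec "make test" origin/master
Rebase stops at the first commit whose tests fail, so you can fix it up before continuing.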
The author runs his tests after his partial commits. He does this by stashing the remaining changes, testing the newly-committed code, and then continuing to make partial commits. Here's the example in the article:
% git stash
% make test
...
OK
% git stash pop
Also keep in mind that the initial tree contains WIP commits that are unlikely to work or pass tests, so reordering commits can hardly make things worse.
I care very little about the cleanliness of my history. I mean, I care that it's not horrendous, but a bunch of extra merge commits or a few random fix-up commits don't make me think twice. However, I've found the tools for maintaining clean history to be extremely useful for getting real work done.
My typical workflow with Git is one of incrementally appending commits to a branch, and then using git rebase --interactive to bubble sort commits by impact.
For example, let's say I am working on a feature and I stumble across an unrelated bug that's easy to fix. I fix it. Then I commit just that fix. And then I go about my day. Sometimes, my fix may depend on a small refactor or ancillary change that's not finished. So I make the change anyway, and commit it, completely broken. After finishing the ancillary dependency, I use rebase to reorder the commits so that I can test the change in absence of the bug fix, and then again with it applied.
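Concretely, that reordering is just a matter of shuffling lines in the rebase todo list. A sketch, with made-up hashes and messages:
% git rebase -i HEAD~3
# the editor opens with something like:
#   pick 1a2b3c4 fix unrelated bug
#   pick 5d6e7f8 WIP: start ancillary refactor
#   pick 9a0b1c2 finish ancillary refactor
# reorder (and squash the WIP into its follow-up), then save:
#   pick 5d6e7f8 WIP: start ancillary refactor
#   squash 9a0b1c2 finish ancillary refactor
#   pick 1a2b3c4 fix unrelated bug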
I've had feature branches grow up to 20 or 30 commits, where 10+ of them are totally borked or otherwise need to be re-ordered. By the time I've made sense of it all, I send it out to my team as 2 or 3 pull requests with 1 to 5 commits each. In the process, I look at my code diff over, and over, and over again. I find lots of bugs by inspection this way. I enjoy writing code this way.
This way my code can be reviewed as a series of logical transformations. It's much easier to review this way. I really appreciate it when my co-workers send me clean patches to review, so I respond in kind.
In summary: It's not about a clean history (although that's a nice side effect). It's about using Git as a tool to help you and your team reason about changes.
I agree that in many cases this meticulous approach is overkill. But what it demonstrates nicely is, as the man says, 'Git means never having to say "If only I'd realized sooner."'
When you do need your commit history to be as neat as possible, git gives you plenty of tools to make it happen. Imagine the nightmare a task like this would be with Subversion. (I haven't used Hg a lot so can't say how it compares in this regard--I'd be curious to know.)
Here are some of the situations I run into on a regular basis at my current company, which is Perforce based, where I wish we were git based instead. These would be just as true in Subversion. They might be less true in Mercurial or Bzr; I'm not as familiar with those.
* I am hacking on some bit of code, so I set up a Perforce branch. While I'm at it I notice some ugliness and want to refactor it. I could make a new Perforce branch, put the refactor there, and then set up some horrible merging apparatus, but that's complicated, those branches live forever, and it takes me out of the flow state, so I just make a commit. It is now very difficult to extract that commit from its context so that I can apply it independently.
* I make a mistake on a commit that is on a private Perforce branch, or maybe I did something out of order. The Perforce way is "you shouldn't have done that, then." and so my commit logs are full of "fix stupid typo" type commits that are basically just noise.
* I forget to add a file to Perforce and nothing tells me about it, ever, until we get to testing (or sometimes to production!) and Puppet won't run because it can't find some file <foo>. This is fixed with the 2012 betas and p4 status, but those require a server upgrade (because the p4 client is a thin wrapper around what the p4d server understands) and that isn't in the cards yet. This alone is enough to make me want to use git-p4 for everything.
* You can really tell that Perforce is built around a file system of RCS files sometimes. While it has atomic commits, many, many operations are based on individual file revisions (labelling, cherry picking, merging). This leads to the following:
* Cherry picking in Perforce is hard, because it's hard to get a handle on the content of a changeset, as opposed to individual upticks in each of the files referenced. It can be done, but it is a lot of effort, way more so than in git.
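For comparison, once the refactor lives in its own commit, pulling it out of its context in git is a one-liner (the hash is a placeholder):
% git checkout master
% git cherry-pick 1a2b3c4        # apply just that commit's change here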
This is not the entirety of my complaints about Perforce, by any means, but they're the most visible UI limitations compared to what I'm used to from git.
Mercurial is more like Git than not. It differs in the way it stores revision history (Git stores files, Mercurial stores deltas, basically).
And branching in Mercurial isn't nearly as neatly implemented as it is in Git.
But it would seem to address the issues you listed here. For example, you can roll back a commit if you make the "stupid typo" mistake. And you have a lot of power to "change history" and modify the DAG any way you wish.
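For instance (just a sketch), undoing a freshly made typo commit in Mercurial:
% hg rollback                    # undo the last commit; the file changes stay in the working directory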
In my opinion branching _used_ to not be as neatly implemented as it is in Git, but that is no longer the case.
A couple of versions ago a new feature was added to core Mercurial, called "bookmarks". Mercurial bookmarks are essentially the same as Git branches.
So now you can use "git-style branches" (i.e., bookmarks) in Mercurial, or you can use the regular Mercurial-style branches (which I personally prefer to Git's anyway). In addition you can use "anonymous" branches, which AFAIK do not exist in git at all.
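For example (just a sketch; feature-x is a made-up name), a git-style lightweight branch in Mercurial looks like:
% hg bookmark feature-x          # create a bookmark at the current revision
% hg update feature-x            # make it active; new commits move it along
% hg update default              # switch back, leaving the bookmark behind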
I've been streamlining my commit histories this way for several years now, and it's worth the effort.
I do it pretty much the same way the author does (only I use Emacs and Magit mostly). The reason I do it is because my work has to make sense to other people, and also to me six months later. So I think of my commit histories as stories that must tell people how my software got from point A to B, and tell them clearly.
The chronological history of how I wrote some batch of code is just the first draft of the story. It's messy. It contains WIP commits, false starts, backtracking, and all sorts of cruft that tells more about the noise of software development (and when I stopped for dinner) than about how the software logically moved from one sensible point to another.
Once I've got a chunk of work all figured out, all that noise has to go. It must be edited away before I publish the work upstream. Otherwise, I'm just weighing down everybody who has to understand the work later (including me).
I also don't see the point. If I need to experiment, I'll create a branch, mess around there, and merge it back when I've figured things out and have everything working.
The workflow in the article seems like it's trying to be clever for the sake of being clever. I guess it works for him, but I think most people would find it too confusing to be practical.
I'm a release engineer, and I couldn't agree more with the sentiment of this post. I have immense respect for mjd's work (his articles on Perl saved my bacon more than once - _Suffering with Buffering_ and _Coping with Scoping_ in particular), but this workflow just plain stinks and feels like building a house, then demolishing it and picking through the rubble to build a better house you're satisfied with, rather than (as git was designed to do) experimenting off to the side in a branch and only committing Good Stuff to your 'for real' branch (master or otherwise).
Assume that while working on your branch you made a few commits which were not perfect (say you wanted to take a day off and wanted a commit point to work from when you got back). This is all well and good: you come back, get the code working on your branch, make a commit, and merge it into the main branch. When you look at the commit log on the main branch, it lists all the commits you made (even the ones where the code was not in good shape). Since the latest commit on main is good, this isn't a big problem. But if you really want the history on the main branch to stay clean and show only good commits, a plain merge doesn't give you that.
Though I don't follow what the author practices, I don't think it is a practice done merely for the sake of cleverness.
There is a benefit in keeping the history clean in the main branch.
I only wish git let you achieve the same thing without jumping through so many hoops.
I find this to be overkill, but I see the point. If you're always trying to make "good" commits you kind of lose the "get all the shit done" mentality, since you have to separate everything into smaller ideas, whereas it's sometimes faster to really hack on lots of things and commit, and then clean things up at the end.
This is a lot of work if you are OCD and print out your diffs to mark with a highlighter. A decent git GUI like GitX on the Mac will let you do it with almost no effort at all.
Maybe it's just me, but using short lived topic/feature branches and squash merges seems much easier than remembering all of these steps to "fix up" all these things.
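A sketch of that flow, with made-up branch names:
% git checkout -b fix-login master   # short-lived topic branch
...                                  # hack and commit freely
% git checkout master
% git merge --squash fix-login       # stage the branch's net change
% git commit                         # one clean commit on master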
I think it becomes somewhat of a game to use all the more obscure corners of git when a good 20-30% of it goes a long way.
I agree, but that also requires a discipline/foresight not all of us have. :) I try to keep my feature branches as atomic as possible, but sometimes I get ahead of myself and work on multiple things that really shouldn't be in the same commit. I wouldn't use this model as my main workflow, but it's definitely useful in certain scenarios.
If your code is 'live' and you plan a few features ahead: feature branches.
If you are running as fast as you can toward the first releases, what I do is use feature prefixes on the commit messages. And of course, a commit can have several prefixes.
Of course, if I have to cherry-pick something, I still have to look at each commit, but with prefixes it's easy. And I save the time I would have spent thinking ahead about something I don't even know I'll need.
This seems like a pretty terrible workflow to me. Do others here actually work like this?
I only use git-add -p when I've screwed up and didn't commit when I should have, so I have to split the current commit into two. It seems to me that rebase -i and merge --squash are better suited to re-writing history in the way that's being done here. I'm especially distrustful of any workflow that includes the line "I eyeball the diff".
But I'm no git guru. Is this a common way to work? Are there advantages over the alternatives?
I have to shift my brain into totally different modes between programming and reviewing/testing/version-controlling, and git add -p is an important tool. Maybe you're disciplined enough to keep a queue of everything you'd like to do in your head at once, and only stick to one task at a time, and shift into review/test/commit mode between quantized chunks. Me, I just go in, hack for a bit, and when I run out of ideas, I shift modes, read the git diff, break out my work into chunks, and then add -p/stash/test/review/commit chunk by chunk.
If you want all of my commits to be functionally and semantically separate and individually tested, I can give that to you. git-add -p is just an interface between that well-disciplined software-engineering expectation and what my brain actually does when it gets into flow.
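For anyone who hasn't tried it, git add -p walks through each hunk of your diff and asks what to do with it, roughly like this:
% git add -p
# for each hunk git shows the diff and prompts:
#   Stage this hunk [y,n,q,a,d,s,e,?]?
# y = stage it, n = skip it, s = split it into smaller hunks,
# e = edit the hunk by hand (handy for staging individual lines)
% git commit                     # commit only what was staged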
It means you keep your original history around & build a new one which breaks the code changes into functional chunks.
Rewriting your existing history with git rebase -i is fine until it goes horribly wrong & you have to go groveling through the reflog to work out which commits you need to rescue in order to retrieve your lost work.
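The groveling itself isn't too bad once you know the reflog is there; something like this (the hash is a placeholder):
% git reflog                     # every position HEAD has recently been at
% git branch rescue 1a2b3c4      # pin the lost commit to a new branch
But you do have to know to look, and to recognize which entry is the work you lost.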
I don't see that. Keeping your original history around is a function of doing cleanup on a separate branch. It has nothing to do with how that cleanup is achieved.
Rewriting your existing history with git reset can also go horribly wrong, which is why it's done on a separate branch here.
I think in part he's documenting modules. Because modules often crosscut the primary hierarchy of files/classes/methods, it's awkward to document in the source itself, and so usually isn't documented at all.
Git allows documentation across files; and because commits are naturally associated with specific revisions, it can't get out of date.
{ Still, it does seem a lot of work, and it would be nice to document within the source itself; and in a way that helps and is needed by the code (so it can't get out of date). A bad example: specifically requiring/importing another class before being able to use it. This documents dependencies, and remains current or your code stops working (it would need to be an error to require a class without using it). It is a "bad example" because it doesn't help you, just raises a barrier then "helps" you cross it, like a stand-over man in an extortion racket.
What's needed is some immediate benefit (e.g. reduce code) to associating files/classes/methods in a crosscutting "module". }
I said this in a reply, but lest it get lost in the shuffle, here goes. I have a tremendous amount of respect for mjd. His seminal articles on Perl, particularly "Suffering with Buffering" and "Coping with Scoping", saved my bacon more than once. However, I'm a release engineer, and this is an awful, awful Git workflow. As others have noted, Git was designed so that you can make wacky wild experimental changes off to the side in branches, and only when you're happy, merge them back to your working branch and eventually into master. This approach needlessly complicates things and seems incredibly error prone, especially the bit about having to do house cleaning on the morass of broken commits.
I think he mentions that he does test these new commits that he creates as he goes about re-arranging history, but I think that should be emphasized more. I'd rather have a messy looking commit that passes tests than some nice looking commit that doesn't. bisect is a powerful command that shouldn't be broken.
By the way, that article page is too hard to read comfortably. Shortening the width of each paragraph and upsizing the font would make it much easier for me to read.
Once you get used to reading diffs in your terminal, git add -p (and -i) turn out to be very user-friendly. It's really easy to just slam y or n a bunch of times, rather than click around a lot.
tig just uses text diffs as well, only color-coded, which really adds to the readability. I don't need GUIs for most things. The benefit of GitX over tig or git add -p is that it easily allows you to stage a single line instead of a whole hunk.
Often I'll be in the middle of something, with a dirty work tree, when it's time to leave for the day. Then I'll just commit everything with the subject WIP ("work-in-progress"). First thing the next morning I'll git-reset HEAD^ and continue where I left off.
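In case it helps anyone, the whole dance is just (a sketch):
% git commit -am "WIP"           # end of day: snapshot everything, even if broken
% git reset HEAD^                # next morning: drop the WIP commit, keep its changes in the working tree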
Curious, what is the advantage of doing this? Why commit something locally just to reset it out the next morning?
> Such commits rarely survive beyond the following morning, but if I didn't make them, I wouldn't be able to continue work from home if the mood took me to do that.
I think the underlying assumption here is that he commits and pushes these changes somewhere accessible from home, rather than taking them with him on his laptop or something. Otherwise there really isn't a good reason for committing them.
git-add -p is new to me, but looks like something I wish I'd known about for a long while. The number of times I end up copying changes (like a new function) into some temp file while I commit is more than I'd care to admit.
When you delete a feature or complex function that you wrote, only to discover you don't need it, do you mark it somehow so you know that something that worked was removed in that commit?
The proliferation of "git tips", "my git habits", and "git best practices" articles makes me think that Git is too complicated for its own good if it needs all of them.
(Maybe "need" is a bad choice of words: maybe it doesn't need those articles, but it's too complicated if it gets to have them. You don't get such avalanche of advice for a simple, no BS, tool).
Now, the complicated part means it's flexible --in the rare cases you need it to be. But it could probably use a facade that makes the common use cases more intuitive (there are some half-baked attempts that I'm aware of).
It's not exactly hard to find people discussing different best practices and their own individual workflows in Mercurial, either[1].
The existence of these articles doesn't tell us anything negative about Mercurial, or Git, it merely tells us that these tools are powerful enough to be used for Real Work in Real Workflows, and that there are few things people love more than talking about their workflows.
Probably you meant to use the </misguided snark> tag.
The Photoshop Tips and Tutorials are about doing something new in a realm with inherently infinite possibilities, that is bitmap editing/drawing.
Managing source code shouldn't have "infinite possibilities" -- there are a few common use cases, and several more uncommon.
If you have "infinite flexibility" in your source code management system, you are doing it wrong, or at least inefficiently structured or streamlined, that is, in the wrong end of the scale of 1 => you do everything by hand, 100 => the computer does all for you as you want it to.
Not every team fits a one-size-fits-all workflow, sure.
But that doesn't mean that you should have to micromanage the workflow even in the most common use cases.
I.e., Git is more of a "source code management DIY kit" than a "source code management tool".
Some people like micromanaging their workflow. Apparently Mark Dominus is one of them. Other people don't like managing their workflow, and thus they end up with just `git commit -a; git push` as the entirety of what they use. Just because some people micromanage their workflow doesn't mean everyone has to.
One thing that I read about Git is that Git is really just a document store, not a version control system. That means that you can use the document store to do whatever you want. It's up to you to decide the workflow that you want to use.
In my team we've had long discussions about how we want to code, what to branch, when to branch, when to merge, etc. all about how to keep things organized.
I think people like Git because of the flexibility. Different people and teams can use it in different ways. They don't feel boxed in to a predefined workflow.
Not sure why you were downvoted, gitfanbois I guess.
I've used both git and mercurial. I would tend to agree that git has more of a feel of non-conformity when it comes to best practices and default ways that it behaves.
That said, mercurial has an equally bewildering array of choices when it comes to all the different add-ons that you can install to do similar things.
I think they're both great tools and I love the way that they've both innovated ahead of each other and copied the best of each other.
Had to switch from mercurial to git. Both are nice, neither is perfect.
I long for the time when we won't have to do so much ancillary work to just sync up files. My bet: In 5 years, all these contrived workflows will be replaced by:
You know, syncing is the least I do with Git. In fact, for my personal projects, the syncing is entirely incidental--just a lazy way to publish and back up my files.
In addition to syncing, I use Git to keep track of my history, to maintain multiple versions in parallel, to work on different features at the same time without interfering with each other and probably a bunch of other things I'm forgetting. All this just for projects with one programmer--for work and group projects, I use even more complicated features.
So yes, if all you want is syncing, perhaps Git is contrived. But if you actually want version control, it's actually pretty simple.
I see what you mean, but one could consider version control a byproduct of syncing a working directory with a trunk. That doesn't need to mean syncing between different users' working directories.