If you rebase, aren't you destroying that history of experimentation? I feel like this is destroying the whole idea of a VCS as a safety net and making developers self-conscious about something that's supposed to be tolerant of mistakes.
Cleaning up your history before merging is important. For one, before you merge you should usually have someone do code review. No one wants to do a code review on a branch that has a bunch of false starts, typo fixups, debug print statements being added and removed, and so on. Code reviewing a branch that breaks something and then fixes it three commits later is a real pain; you sit there puzzling over the first commit, wondering how it could possibly work, and writing up a big explanation for why they need to fix something, then you go on to a later commit and realize they already fixed it.
Furthermore, dirty branches lose you a lot of the power that having a good, clean history gives you. When you do a blame on a line of code, to figure out when the last change was, do you want to see the "fix whitespace to match style guide" commit that someone inserted in the branch at the end, or the actual meaningful change that occurred earlier? If you don't squash your commits to deal with these kinds of issues, you lose a lot of the power and convenience that good history gives you.
There's more. One of Git's most powerful tools is bisect, but even in a VCS without an automated bisect, doing it manually can be useful too (I've done this in SVN before). If you have a regression, but have no idea what caused it, it can be very useful to bisect your commits: find a known good version and a known bad version, then go to the commit halfway in between, test that, and depending on whether that commit is good or bad, test the one halfway between that and the known good or known bad commit. Keep doing this until you find the commit that broke your code. But this process is seriously impeded if you have a bunch of half-done commits that implement a part of a feature but break something else that's fixed up three commits later.
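To make the bisect procedure concrete, here is a rough sketch on a throwaway repository. Everything in it is made up for illustration: the file names, the commit messages, and the "test" (a grep for the word "good"). In a real project the test command would be your build or test suite.

```shell
#!/bin/sh
# Sketch only: a scratch repo with eight commits, where commit 5 is
# the (hypothetical) regression. All names here are invented.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

for i in 1 2 3 4 5 6 7 8; do
  # From commit 5 onward the "code" is broken.
  if [ "$i" -ge 5 ]; then echo "bad" > status.txt; else echo "good" > status.txt; fi
  echo "$i" > counter.txt
  git add -A
  git commit -q -m "commit $i"
done

good=$(git rev-list --max-parents=0 HEAD)   # root commit: known good
git bisect start HEAD "$good" >/dev/null    # HEAD: known bad

# `git bisect run` marks a commit good when the command exits 0
# and bad when it exits 1..127 (125 means "skip this commit").
git bisect run sh -c 'grep -q good status.txt' >/dev/null

first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $first_bad"   # prints "first bad commit: commit 5"
```

Note that this only works because every commit can be tested on its own. If commits 3 and 4 were half-done and failed for unrelated reasons, bisect would happily point at the wrong one.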
The "history of experimentation" nature of VCS history is just not all that interesting. Think of your VCS history more as an extended form of comments, that document why everything is the way it is. If you actually wrote comments on every line describing why you had changed it in a particular way every time you changed it, your code would wind up being more than 90% comments in not too long. Most of the time, you don't need to see this; but when you are left wondering "hmm, why is this the way it is?", good history is invaluable. The experimental changes in between aren't all that useful; if you got any information from them, then feel free to summarize that in the cleaned up commit message after you've squashed them out.
Now, that's not to say that you should always produce perfect history while working on a branch. Feel free, when you're in exploratory coding mode, to make lots of checkpoint commits, experiments, and so on. Just clean it up before you present it for review and merge. The nice thing about Git is that you have your own local branches that no one else ever has to see, so you can clean things up quickly and easily with "git rebase -i" and present a much nicer history when it's ready for merge.
No, you should never be afraid of committing anything you have at any point in time. Git works as a development tool as well as a central VCS. As long as you have committed something, it will be restorable in case you overwrite it or delete it. Telling someone to wait before committing is a bad idea. They may get a lot of work done and then inadvertently lose it somehow, permanently. Instead, you should commit often and then use interactive rebase later to clean things up. You want to be able to have the freedom to switch branches, navigate history, and work on multiple features/bugfixes at the same time. You're restricting your ability to do these things if you wait too long to commit, and you're increasing the danger of losing your work.
Explain to me how tags get me to an understandable view of my DAG so that I can see clearly what has been happening to the code, by whom, and why. Tags are just labels put on commits. How can I get a clean view of the history by feature? Do you put a release tag on every single bug fix and logical change that someone makes? Why would I go through the hassle of putting a tag at the tip of every single code reviewed chunk of changes? Why would I want all of these tag names cluttering my git log alias that shows me the history? How are tags going to compensate for the endless bubbles of "merged master into master" that inevitably clutter up the graph when people don't bother to rebase? How do you tell git bisect to skip all the intermediate bullshit meandering commits between the countless tags?
> Explain to me how tags get me to an understandable view of my DAG so that I can see clearly what has been happening to the code, by whom, and why.
That's what the commit history is for. If you don't like seeing merges use git log --no-merges. You can use rebase to avoid seeing merge commits, but it's awfully unnecessary with the nasty side-effect of destroying history.
I was suggesting tags as a way to keep an alternate history of features or releases. Features can be developed in separate branches, but you could tag features when you merge them in if you want an easy history of feature merges. You can list tags by date, use prefixes for sorting, etc.
The history of tags happens at the release level. That is not granular enough. The history of every last little typo fix is too granular; it's just worthless to preserve. Using tags for every merge isn't all that useful; you already have the merge commits for it.
What you want is a logical sequence of correct changes (or, as correct as anyone could tell at the time; of course no one's perfect).
If you have to do code review, track down a bug by bisecting a commit history, or figure out what patches from one branch need to be ported to another, you want to have good history. False starts and fixes to typos from previous patches have no value; in fact, they have negative value, as they obscure the interesting information that a good history provides.
Cleaning up history really doesn't take that long. When something is about ready to merge, take a quick look through the history to figure out which patches are redundant or logically belong as part of previous patches, do a "git rebase -i", and squash them into the appropriate patches. In the process, make sure your commit messages are actually good enough that someone doing a code review can follow what you're doing (no "fixed a bug in this function; fixed a bug in that function"; actually explain what you fixed and why your fix is the right one).
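For what it's worth, here is a minimal sketch of that cleanup on a scratch repo, using fixup commits plus "git rebase -i --autosquash" so that it runs without an editor. The repo, file names, and the "base" tag are hypothetical; in normal use you would leave GIT_SEQUENCE_EDITOR alone and edit the todo list by hand.

```shell
#!/bin/sh
# Sketch only: fold a typo fix into the commit it belongs to.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo "base" > README
git add README
git commit -q -m "Initial commit"
git tag base                      # hypothetical marker for "where the branch started"

echo "widget v1" > widget.txt
git add widget.txt
git commit -q -m "Add widget"
widget=$(git rev-parse HEAD)

echo "docs" > docs.txt
git add docs.txt
git commit -q -m "Document widget"

# A typo fix that logically belongs in "Add widget".
echo "widget v1 (typo fixed)" > widget.txt
git add widget.txt
git commit -q --fixup="$widget"   # creates a "fixup! Add widget" commit

# --autosquash reorders the todo list so the fixup lands right after
# its target; GIT_SEQUENCE_EDITOR=true accepts that list unchanged.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash base

git log --format=%s base..HEAD    # two commits remain; the fixup is gone
```

The reviewer now sees "Add widget" with the typo already fixed, followed by "Document widget", and never has to puzzle over the intermediate state.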
What do you mean that's what the commit history is for? That's what a DAG is; that's what I'm talking about. You know what the DAG is, right? I don't want to exclude all merge commits when looking at the DAG. I merely don't want to see all of your "merged master into master" bubbles because you can't be bothered to clean up a bit and rebase before pushing your changes.
I don't know what you're going on about with this "destroying history" as if the sequence of your little typo mistakes is some kind of precious documentary that needs to be preserved in case some forensic expert wants to trace every step you made along the process of adding a widget. You might as well go find a system that records and tracks every key you type, because after all, every time you hit the backspace key, you are destroying history.
Tags do not keep alternate histories. They are simply labels on commits. You use them to mark certain commits as releases, you do not use them to track every logical change to the codebase. They are used sparingly to track the occasional version number bump as a result of a sufficiently large number of changes. These version tags do not provide the granularity I need when I look to see what is happening on a single branch at any point in time. To add them to every non-trivial commit as a way of distinguishing them from the just-dicking-around commits would be ludicrous.
edit: One more thing. I think it is absolutely silly to say in one comment "stop committing non-workable intermediate stuff and finish what you're doing before committing" and then turn around in another comment and talk about how rebase has a "nasty side-effect of destroying history". You do realize that all the editing and polishing you're doing before you make your commit is the same type of destroying history that would happen if you made small, incremental commits and then cleaned them up with rebase, right? The only difference is that your way is way more dangerous as far as losing history is concerned, and you're not taking advantage of any of the benefits of Git in the process.
Squashing is not the purpose of rebase. Rebase allows you to clean up history. Sometimes, that means _separating_ large commits into smaller, atomic ones. Sometimes that means re-ordering things to make more sense for the reader. And yes, sometimes, an atomic unit requires squashing two or more commits together.
Commits should be logical units of the codebase, not units of developer productivity over time.
You can easily squash those experimental commits and have "incremental, atomic" commits in history. What it gives you is the freedom of a clean slate after each commit. And no, stashing doesn't work, because the next experiment might depend on the last one. Not happy? Interactively rebase HEAD~n and get rid of all the experimental stuff. Changed your mind? Git reflog is your friend.
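A small sketch of the "changed your mind" part, on a scratch repo with invented file names and messages. To keep it non-interactive I use "git reset --hard" as a stand-in for dropping the commits in an interactive rebase; the recovery through the reflog is the same idea either way.

```shell
#!/bin/sh
# Sketch only: throw away experimental commits, then get them back.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo "base" > file.txt
git add file.txt
git commit -q -m "Base"
echo "experiment 1" >> file.txt
git commit -q -am "WIP: experiment 1"
echo "experiment 2" >> file.txt
git commit -q -am "WIP: experiment 2"
before=$(git rev-parse HEAD)

# Discard both experiments (stand-in for dropping them in rebase -i).
git reset -q --hard HEAD~2

# The commits are no longer on the branch, but the reflog still
# records every position HEAD has been at.
git reflog -n 3

# HEAD@{1} is where HEAD pointed just before the reset above.
git reset -q --hard 'HEAD@{1}'
```

After a real interactive rebase the reflog has more entries (one per rebase step), so you would read the output of "git reflog" and pick the entry you want rather than assuming it is HEAD@{1}.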
> When you do a blame on a line of code, to figure out when the last change was, do you want to see the "fix whitespace to match style guide" commit that someone inserted in the branch at the end, or the actual meaningful change that occurred earlier?
git blame -w # works with git diff and git show too
(You might also be interested in --word-diff=color for git diff and git show)
Well, why not a concept of 'soft' and 'hard' commits (or sub commits, or major and minor)? Let people do what they must, let the logic behind it stand, and give a nice clean history by ignoring the soft commits unless you explicitly access them?
I see both sides of this, but on my own team, where we are all meant to be experts on the project, I really like to be able to see the experiments, because there's a reasonable chance I'll be trying something similar to or perhaps inspired by those throwaway experiments at some point. I think there is a different trade-off in open source projects, where it's more helpful to have a history that isn't confusing to newcomers.
I think I wasn't clear. I like to be able to see completely experimental, entirely thrown-away approaches in branches like you suggest, but I also like to see the little hints of partial experiments that a "dirty" history shows. My point is that on projects with focused teams where everybody is or should be an expert, literally the more history I can access, the better. Things like "did somebody already try and fail to refactor this class? What was their approach? Can I actually do better, or am I just headed down the same rabbit hole?" are invaluable to me, and the best way to answer those questions is to see "dirty" remnants of things people have tried and un-tried in the history. History can be forensics, and in forensics you don't want things to be "clean".
Exactly this. I don't mind leaving the history of experiments in if they lead fairly logically and cleanly to the final product, but sometimes I will have 2 or 3 "WIP" commits in a row, that turn out to be completely irrelevant to the final product.
> You need clean commits on the history to be able to understand the code later on.
I think this whole debate hinges on people's view of that sentence. Sometimes clean history helps comprehensibility and sometimes it obscures things. I think the degree to which each is true varies author to author, reader to reader, and project to project.
> I think this whole debate hinges on people's view of that sentence.
Prescient observation. How does clean history obscure things though? You mean as failed experiments get removed? Important things ought to be mentioned in commit messages. Relevant things to document can be showcased like "Tried X but it turns out Y is better because Z." I often find that code alone is not enough to describe why something did or didn't work. One ends up having to explain in commit messages anyway.
> I often find that code alone is not enough to describe why something did or didn't work.
Me too. And I also often find that commit messages alone are not enough to describe why something did or didn't work. Code and commit messages both help.
"Clean history" can obscure things when it leaves out information about the often messy process of creating the software. It's impossible to know ahead of time what information will and will not be useful when attempting to grok a piece of code in the future, so sometimes it makes sense to err on the side of more information, instead of less.
> You need clean commits on the history to be able to understand the code later on.
Really? That seems like an extraordinarily obtuse way to understand code. I would think comments directly in the source files would be more useful. Commit history shows how they arrived at that result and that's what I would rather see there.
What do you think is easier for the next guy?
1) three commits that do:
- a = 1;
- a = 7;
- a = 3;
2) one commit that says:
a = 3;
My point is that experimentation is slightly different from changing your mind about the whole implementation.
It is the same as writing your homework. You have a separate piece of paper where you make your experiments.
Not to mention that in practice, if this sort of history cleaning is forbidden then people will just attempt to not commit those first two lines, meaning that they are not getting the full benefit of version control.
When does the "next guy" ever look at revision history to see what's going on? I only ever look at the current state. If I want to see how it diverged from my last commit or the last commit before whatever milestone or release, I will diff the current version against that version ignoring every commit in between.
> When does the "next guy" ever look at revision history to see what's going on?
_All the time_. I read all the commits in the codebase I'm responsible for. I need to keep track of what people are doing and how the system is changing. This is also on top of the need for code review and ensuring each patch is correct.
Not on most projects, no. We are on my current project, and the review tool is based on tasks, not commits, and can review multiple commits in a single session if they are all related to the same task. Regardless, I am only ever reviewing the end state.
You have NEVER finished work on a feature, looked at the history, and found tens of commits with crappy commit messages and extremely minor changes? If so, good for you. Not so for me. I don't always squash ALL commits on my feature branch, but I often remove a good number, so the history looks helpful for my future self.
You can always keep the un-rebased branch around if you want to preserve the history somewhere, and then squash the commits down to a smaller number on top of master. That solves the safety net problem, and master keeps its sanitized history.
This is what I do and it's kind of required when you're using e.g. gerrit code reviews.
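In case it helps anyone, a rough sketch of the "keep the un-rebased branch around" approach on a scratch repo. The branch names "feature" and "feature-unsquashed", the file name, and the commit messages are all made up; "git reset --soft" is just one way to collapse the commits, and "git rebase -i" or "git merge --squash" would do as well.

```shell
#!/bin/sh
# Sketch only: preserve the messy branch, squash a copy onto master.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the initial branch name
git config user.email "dev@example.com"
git config user.name "Dev"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

git checkout -q -b feature
for step in 1 2 3; do
  echo "step $step" >> app.txt
  git commit -q -am "WIP $step"
done

# Keep the original history under another name before rewriting.
git branch feature-unsquashed

# Move the branch pointer back to master but keep all changes staged,
# then record them as a single commit.
git reset -q --soft master
git commit -q -m "Add feature (squashed from 3 WIP commits)"

git log --oneline master..feature              # one clean commit
git log --oneline master..feature-unsquashed   # the original three
```

Both branches end up with identical file contents; only the shape of the history differs, so the safety net is still there if you ever need to dig through the experiments.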