Rebasing destroys historical information. I can't really see any advantages of rebasing when a merge does the same thing but leaves two things rebasing does not: 1) a point to rollback to if things don't work out, and 2) an explicit entry of when your branch was brought up to date with master.
You may say I'm doing it wrong, and that's fine, but it works for me.
git checkout master
git diff master..featurebranch | git apply
During code review at a later time, the history of the experimentation is useless once you find several commits that touch the same code before settling on a final version.
I think this whole debate hinges on peoples' view of that sentence. Sometimes clean history helps comprehensibility and sometimes it obscures things. I think the amount to which each is true varies author to author, reader to reader, and project to project.
Prescient observation. How does clean history obscure things though? You mean as failed experiments get removed? Important things ought to be mentioned in commit messages. Relevant things to document can be showcased like "Tried X but it turns out Y is better because Z." I often find that code alone is not enough to describe why something did or didn't work. One ends up having to explain in commit messages anyway.
Me too. And I also often find that commit messages alone are not enough to describe why something did or didn't work. Code and commit messages both help.
"Clean history" can obscure things when it leaves out information about the often messy process of creating the software. It's impossible to know ahead of time what information will and will not be useful when attempting to grok a piece of code in the future, so sometimes it makes sense to err on the side of more information, instead of less.
Really? That seems like an extraordinarily obtuse way to understand code. I would think comments directly the source files would be more useful. Commit history shows how they arrived at that result and that's what I would rather see there.
- a = 1;
- a = 7;
- a = 3;
2) one commit that says:
a = 3;
My point is that experimentation is slightly different from changing your mind about the whole implementation.
It is the same as writing your homework. You have a separate piece of paper where you make your experiments.
_All the time_. I read all the commits in the codebase I'm responsible for. I need to keep track of what people are doing and how the system is changing. This is also on top of the need for code review and ensuring each patch is correct.
Check out http://en.wikipedia.org/wiki/Atomic_commit#Atomic_Commit_Con...
You have NEVER finished work on a feature and you look at the history and there are 10's of commits with crappy commit messages and extremely minor changes? If so, good for you. Not so for me. I don't always squash ALL commits on my feature branch, but I often remove a good number, so the history looks helpful for my future self.
This is what I do and it's kind of required when you're using e.g. gerrit code reviews.
Furthermore, dirty branches lose you a lot of the power that having a good, clean history gives you. When you do a blame on a line of code, to figure out when the last change was, do you want to see the "fix whitespace to match style guide" commit that someone insert in the branch at the end, or the actual meaningful change that occurred earlier? If you don't squash your commits to deal with these kinds of issues, you lose a lot of the power and convenience that good history gives you.
There's more. One of Git's most powerful tools is bisect, but even in a VCS without an automated bisect, doing it manually can be useful to (I've done this in SVN before). If you have a regression, but have no idea what caused it, it can be very useful to bisect your commits; find a known good version and a known bad version, then go to the commit halfway in between, test that, and depending on whether that commit is good or bad, test the one halfway between that and the known good or known bad commit. Keep doing this until you find the commit that broke your code. But this process is seriously impeded if you have a bunch of half-done commits that implement a part of a feature but break something else that's fixed up three commits later.
The "history of experimentation" nature of VCS history is just not all that interesting. Think of your VCS history more as an extended form of comments, that document why everything is the way it is. If you actually wrote comments on every line describing why you had changed it in a particular way every time you changed it, your code would wind up being more than 90% comments in not too long. Most of the time, you don't need to see this; but when you are left wondering "hmm, why is this the way it is?", good history is invaluable. The experimental changes in between aren't all that useful; if you got any information from them, then feel free to summarize that in the cleaned up commit message after you've squashed them out.
Now, that's not to say that you should always produce perfect history while working on a branch. Feel free, when you're in exploratory coding mode, to make lots of checkpoint commits, experiments, and so on. Just clean it up before you present it for review and merge. The nice thing about Git is that you have your own local branches that no one else ever has to see, clean things up quickly and easily with "git rebase -i", and present a much nicer history when it's ready for merge.
If you want to mark new features or releases, use tags for that.
Whether you commit often or not does not change the fact that rebase is unnecessary to keep a clean history of features/releases and obscures real commit history.
You can have a clean history of features and/or releases with tags, without destroying commit history.
That's what the commit history is for. If you don't like seeing merges use git log --no-merges. You can use rebase to avoid seeing merge commits, but it's awfully unnecessary with the nasty side-effect of destroying history.
I was suggesting tags as way to keep an alternate history of features or releases. Features can be developed in separate branches for them, but you could tag features when you merge them in if you want an easy history of feature merges. You can list tags by date, use prefix's for sorting, etc.
What you want is a logical sequence of correct changes (or, as correct as anyone could tell at the time; of course no one's perfect).
If you you have to do code review, track down a bug by bisecting a commit history, or figure out what patches from one branch need to be ported to another, you want to have good history. False starts and fixes to typos from previous patches have no value; in fact, they have negative value, as they obscure the interesting information that a good history provides.
Cleaning up history really doesn't take that long. When something is about ready to merge, take a quick look through the history to figure out which patches are redundant or logically belong as part of previous patches, do a "git rebase -i", and squash them into the appropriate patches. In the process, make sure your commit message are actually good enough that someone doing a code review can actually follow what you're doing (no "fixed a bug in this function; fixed a bug in that function"; actually explain what you fixed and why your fix is the right one).
> make sure your commit message are actually good enough that someone doing a code review can actually follow what you're doing
Yes, this is a very important point for rebasing.
I don't know what you're going on about with this "destroying history" as if the sequence of your little typo mistakes are some kind of precious documentary that needs to be preserved in case some forensic expert wants to trace every step you made along the process of adding a widget. You might as well go find a system that records and tracks every key you type, because after all, every time you hit the backspace key, you are destroying history.
Tags do not keep alternate histories. They are simply labels on commits. You use them to mark certain commits as releases, you do not use them to track every logical change to the codebase. They are used sparingly to track the occasional version number bump as a result of a sufficiently large number of changes. These version tags do not provide the granularity I need when I look to see what is happening on a single branch at any point in time. To add them to every non-trivial commit as a way of distinguishing them from the just-dicking-around commits would be ludicrous.
edit: One more thing. I think it is absolutely silly to say in one comment "stop committing non-workable intermediate stuff and finish what you're doing before committing" and then turn around in another comment and talk about how rebase has a "nasty side-effect of destroying history". You do realize that all the editing and polishing you're doing before you make your commit is the same type of destroying history that would happen if you made small, incremental commits and then cleaned them up with rebase, right? The only difference is that your way is way more dangerous as far as losing history is concerned, and you're not taking advantage of any of the benefits of Git in the process.
Squashing is not the purpose of rebase. Rebase allows you to clean up history. Sometimes, that means _separating_ large commits into smaller, atomic ones. Sometimes that means re-ordering things to make more sense for the reader. And yes, sometimes, an atomic unit requires squashing two or more commits together.
Commits should be logical units of the codebase, not units of developer productivity over time.
git blame -w # works with git diff and git show too
That's exactly what `git branch` is for. For experimentation.
>that I possibly undo later in the branch.
That's exactly what `git branch -D` is for. Or even just `git checkout` and leave the branch there. That way if you ever change your mind you can revisit it.
Imagine we are both working on a project. I don't care to know that you merged 3 times from master yesterday before pushing your feature. Also I don't care to know details like you forgot to put a config file in your first commit and had to do a second one, or that it took you 3 commits to have the spelling alright in the UI.
Mainly that information is useful to you. It is also mostly only usable efficiently by you ( I will read your whole feature, most likely I will not be able to rollback to the middle of your change )
For example I commit several time an hour - my coworkers would be pissed if I make 10 commits for each minor feature I develop.
Couldn't there be some way to just tag the "main" commits and mark the dead ends as "extraneous" rather than destroying them? And then have your history-viewing tool hide/squash the unmarked "invisible" commits by default and only expose them when specifically requested?
I mean, it seems to make more sense to just look at a blind alley of commits you made and just flag them all as a mistake rather than actually rearranging the DAG.
In Mercurial, you can kindasorta have this if you do all your development on named branches, and only merge the named branches back into the default branch at these significant moments. Then, you can merge willy-nilly, without rebasing or otherwise destroying or lying about history, and distinguish ignorable work-in-progress merges from significant feature-complete merges by which branch they were on. Most query commands let you filter by branch, so you can easily do that.
For those not familiar with Mercurial, the difference that allows this is that Mercurial permanently records the name of the branch a commit was added to. That means there is an observable difference between merging A into B and merging B into A. This is not universally agreed to be a good feature, but it does allow this particular approach.
Then you just have to choose between having a single shared development branch, a branch per developer, a branch per story, a branch per task, etc, and come up with a coping strategy for any resulting proliferation of branches.
You could do exactly that; look into git-update-ref for how you could implement that so that garbage collection doesn't wipe out those dead ends (git-notes basically does what you would need to do).
Note that you would still be rewriting history, still rearranging the DAG, but you would have references to the old states. Basically like a permanent reflog, though perhaps with an interface tailored to this usage.
Just wanted to point out that rebasing is not just to remove or squash commits. I use it to:
- _separate_ large commits into atomic, logical units
- fixup changes missed the first time around
- rewrite commit messages to ensure they're clear
- reorder commits
So you do commit early and often, just that nobody else sees your commits until the feature is complete and working. And then only after you rearranged and squashed your commits. From the outside it looks like you always commit top quality code from the first try.
$ git rebase -i
(That's just how it is at my company. PCs aren't backed up, central repos are.)
Rebases on the official repo are not allowed and the per-user repos are public and can be used also for collaboration.
Some people use git add -i to be very selective about what they commit, and deliberately increase the number of independent commits, rather than have a single commit that has unrelated changes in it.
Other people have a lot of WIP commits.
We started out doing more rebasing, but do a lot less now, after having run into issues with having e.g. branched off of a branch that has since been rebased and merged (e.g. front end / back end feature split). You try to rebase, but the replay of commits continuously hits merge conflicts, and you spend about 30 minutes repeatedly fixing the same conflicts. And then there's the risk of your force push accidentally chopping off someone else's commit (though that's never happened).
We had one PITA case where a profanity was checked into the codebase, and branches with the offending commit in the history lurked in various places surprisingly long after the commit had been excised from the main trunk lines. Since we're a startup in the financial sector, our code will be in escrow situations, potentially examined by humorless auditors, we don't want profanities in.
But anyhow, just give people private repos on the server. What I do is push my private WIP branches to my home directory on the server, and once it's ready for code review and merge, push it to the central repository.
Less entagled workflow is easier to untangle and consequently understand even if it hides some stuff.
Also it's visual clutter. If your history looks like train map, your project is probably a train wreck.
It would be better to fix this, so you get nice diffs, blame etc., than deal with all the other issues rebase causes.
I'm not convinced on the bisect issue either. If your app is trivial, sure, but if the features are more complex, the squashed commits will be too chunky to narrow down as usefully as a fuller history can.
git bisect might work ok (I haven't used it extensively) but the more important is if you made a mistake and call someone to help, to ease that person's work by presenting a trimmed though more understandable tree. You should strive to keep the branching as simple as possible, but not simpler than that (to paraphrase an infinitely more smarter man).
We follow a similar branching model. Master is kept clean. We do all development on feature branches and merge when complete. History is fully preserved. History is a mess but it works well overall.
While this is often true, `git rebase` changes your commits. Your history is now a lie. I've been in the situation where a `git rebase` ended up breaking commits. It's possible to merge broken history with `git rebase`. Only a `git merge --no-commit` will give you the opportunity to tweak the merge commit so that the resulting merge isn't broken.
I don't think anybody ever recommends rebasing public code. There is a reason for that --force flag on git-push advises the user to use with care.
I mean, I could configure my development setup to automatically commit my persistent undo files so that every single keystroke I make is preserved for prosperity... but I don't do that of course. I could also configure my setup so that every time I write out a file it commits, preserving that history forever... but I don't of course. Why should a bunch of temporary local commits be preserved forever?
Using the terminology "rewriting history" to describe rebasing local commits is misleading. "deciding what history will be" is more accurate.
>I don't think anybody ever recommends rebasing public code.
This very article says it's acceptable to rebase a public feature branch.
# optional: feel free to rebase within your feature branch at will.
# ok to rebase after pushing if your team can handle it!
The problem with the article is that it's wording is imprecise.
On the other hand, being familiar with modern version control is a good idea, if only just in case you inherit code that will be shit.
On the other other hand, regressions occur and go unnoticed even on the best of projects.
You don't submit your first draft almost anywhere else, why do you think nailed it the first time writing your commits? Sometimes you don't get things right, and rebasing is one of the tools that helps you make sure that the written record of your work is helpful.
So git history is not necessarily "human history" but "engineering history" and as such, may be much more important than you think and "curating" it may be a mistake.
Except for the "merge --no-ff" I'm using exactly this model and I think it is great. It can't get any more simple than that and still have a working master.
Regarding the "straight line, clean" history, I've found that most people think that a straight line is "the" history to have. I have no idea what to tell them.
Even if more than one developer is involved, the model doesn't change: "master and feature" branches become "feature and developer-private-feature". The developer still relies on a (partial) working feature branch, and his contribution still has to be self-contained.
But since you have more developers you will have an out of band sync communication between them.
Which I assume means it's only you working on it. Not all topic branches have multiple committers.
I've learned about squashing here (the specific page is dead now): http://www.denx.de/wiki/U-Boot/CustodianGitTrees
untothebreach explains it well . When you are working on changes which are not public yet, sometimes it's helpful to rebase them into a single or fewer number of commits. Period.
No one is arguing to always use rebase and never use merge.
Ok, thats slightly tongue in cheek, but I think its a subtle balance between clean and accurate history, some people are happy to use rebase to, some are not. In my view, rebasing is no different to using undo in your editor, but I appreciate some people feel differently.
1. You rollback exactly to the same point as you would without the rebase - one commit before the merge point.
2. This is also obvious - it was brought up to date one commit before the merge point (by definition of this workflow).
Have you ever had an argument that was just mediocre? But later you think of a witty retort that would have been just perfect. Thats what rebasing is. Its re-structuring the conversation the way you would have liked it to go.
If you restructured the "I Have a Dream" speech, you'd be hacking up history. If Martin Luther King Jr restructured the speech prior to August 28, 1963, as I am sure he did many times, history remains untarnished.
It's the rebase-and-fast-forward merge strategy that causes problems. :)
I care more about the merge then the commits. (In github parlance, the Pull Request) This workflow turns commits throughout time into a single change against the master (while still preserving the commit history). Conceptually this is a lot easier for me to see what's changing, and to deal with issues. So I guess I disagree with point #1)
In a continues delivery environment this workflow makes sense because the log accurately shows the history of the changes to the app in production instead of the development environments. (Merge, Merge, Merge)
#2 doesn't make sense to me, if you rebase you're up to date with master as of the branching commit.
Whatever works best for your team!