Git pull --rebase until it hurts (jrb.tumblr.com)
156 points by johnb 1422 days ago | 67 comments

Am I the only one here who's frustrated by this entire discussion? I've a very strong underbelly feeling that we should simply build tooling that make these entire discussions unnecessary.

I don't mean a non-sucky CLI for git. I mean something more fundamental, something that connects with common programming workflows so well that we can stop discussing the tool altogether.

I'm not sure what that would be, but I hope that one day someone smarter than me will invent it.

I've been using bzr for years and its porcelain is much, much more intuitive. I'm amazed how much we're willing to put up with with git because it's popular.

Bzr just does what you mean. Revert reverts, pull pulls, merge merges. I don't have to remember whether I need a soft or hard reset or which takes a file as an argument or which doesn't or which can potentially destroy my changes (also, no command in bzr ever destroys your changes, not even hard reset, it keeps backups you must delete yourself).

The main benefit of Bzr in this context is that it inherited Arch's concept of strictly linear, physically separate branches [1] rather than dealing with an arbitrary DAG.

This makes for a simpler mental model and also makes it simpler to keep separate things separate (it also has its downsides, but life is full of trade-offs). It also makes it easier to visualize the revision history and allows one to identify versions globally via branch + serial number rather than a hash.

It is rather unfortunate that Bzr development has stagnated and that DAG-based tools (Git and Mercurial) are the only major players left. Different workflows and organizational requirements benefit from different tools, and the Git/HG monoculture has started to worry me a bit.

[1] To be clear, Bzr has added co-located branches as an option since then on its own.

You mean Darcs [0], which kind-of-sort-of does the rebase automagically. Whenever it can, at least.

Fortunately, Git does 99% of that, and with rebase it's just the right tool for the job. Especially with git-rerere enabled.

As far as I am concerned, the need to (sometimes!) do rebase by hand is an artifact of Git's commit history being strictly ordered by time. But just try to remove that constraint, and whoever considered rebase complex will go completely crazy ^^

[0] http://darcs.net/

Came here to say this, and drop this video[1] as well. It explains the differences between git and darcs/camp using an example editing session with both tools.

[1]: http://projects.haskell.org/camp/unique

But what about changes that depend on each other that don't happen to edit the same line? Neither git nor camp will be able to detect that (since it relies on semantic knowledge of your application). Merge conflicts are places where the VCS can't even be wrong about what to do, not places where it won't be right.

Yep. Sadly, camp seems pretty much dead.

I found darcs quite intuitive to use and not crazy at all. Yes, it's not ordered by time, but neither is your rewritten history.

> Git's commit history being strictly ordered by time

That's the default of git log output but can be adjusted via --topo-order and --date-order. It's also worth pointing out that git commits have two timestamps, the author timestamp which is not normally affected by rebasing/amending, and the commit timestamp which is reset by rebasing/amending. Git log (again by default) shows the author timestamp but orders by commit timestamp.
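The two timestamps are easy to see in a scratch repository. This is only a sketch: the identity, file name, and dates are made up for illustration, and it uses `commit --amend` as the simplest way to rewrite a commit.

```shell
set -e
# Hypothetical identity so the commits work on any machine.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

# First commit: author and committer dates start out identical.
echo one > file.txt
git add file.txt
GIT_AUTHOR_DATE="2013-01-01 12:00:00 +0000" \
GIT_COMMITTER_DATE="2013-01-01 12:00:00 +0000" \
  git commit -q -m "first"

# Amending rewrites the commit: the committer date moves,
# the author date is carried over from the original commit.
echo two >> file.txt
git add file.txt
GIT_COMMITTER_DATE="2013-02-01 12:00:00 +0000" \
  git commit -q --amend --no-edit

author_date=$(git log -1 --format=%ad --date=short)
commit_date=$(git log -1 --format=%cd --date=short)
echo "author:    $author_date"
echo "committer: $commit_date"
```

`git log --format=fuller` shows both dates side by side if you want to check a real repository.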

This. I get a similar feeling when I read about the latest javascript workflow innovation.

It signals to me that these tools are complicated in a way that will irritate me and that I should avoid these topics until the smart people have fixed the problem and reached a consensus.

Hey. It worked for CSS. I avoided it for a couple of years and most of the problems have been solved for me. ;-)

Are you still waiting until the vim vs emacs vs Sublime Text 2 debate reaches consensus? Or the Eclipse vs IntelliJ vs IDEA vs VisualStudio debate?

Workflows are very specific and personal. I do not believe that there will be any consensus.

The problem that --rebase patches up is the ugliness of merge bubbles in git. As soon as you have a merge bubble, tools like gitk won't tell you which is the actual mainline because it's not recorded, and you end up trying to make sense of a tangled mess of a tree. It would all be a non-issue if git could tell which side of the merge bubble was some developer's private commits and hide them away.

Bazaar is one revision control system that does this right. Each commit is tagged with the branch to which it belongs, so any visualization of the commit graph will by default hide all the side commits. History appears neat and linear at first, but if you need to track something down to the original commits, you can expand the merges to see the exact order in which commits were made.

Nobody uses beta max anymore except me. git rulez! etc.

Agreed. Git rebase is patching the problem at the wrong end. The problem of elaborate history trees should be solved by grouping & summarizing or hiding sets of commits at the display end, not by rewriting history. Once you have a way of summarizing commits, you can start to do things like integrating version control with the undo of the text editor (with every keystroke being a commit). Another problem with current version control is that it does not understand the structure of code, which makes merges involve much more manual effort than necessary.

I have a different reading of the discussion. What I see is that lots of people have different philosophies and workflows, all of which are being supported by git, which is a good thing. What happens when your one-right-tool picks a workflow that I don't like?

The question is, are there better abstractions that would give us all the power we need while simplifying things? Or is to simplify a straight trade on how many use cases the tool will support?

i suspect the latter, but i also suspect that there exists an abstraction that makes things significantly simpler while allowing, say, the 95% most common workflows.

These suspicions aren't really based on anything, though.

I may be wrong (and I'm too lazy to search) but didn't Linus say that the git command-line is basically a series of dev tools for someone to build a "proper" interface over the top of?

> Am I the only one here who's frustrated by this entire discussion? I've a very strong underbelly feeling that we should simply build tooling that make these entire discussions unnecessary.

This feeling lasted about five minutes. Then I moved on doing real work.

PS: you won't be able to pry rebase -i and --onto from my cold, dead hands because I'll be clinging on them all the way into whatever form of afterlife.

What is a "strong underbelly feeling"?

It exists and it's called Mercurial :p

yah, programs should read programmers' mind and do what we want automagically.

Like I said on the other thread, tread carefully friends; there's dogma at work here.

Also, take a step back and look at the history of git. Git was created by Linus Torvalds specifically for Linux kernel development. I'd argue that a key reason that the kernel is so successful is because people are able to maintain history as a first-class entity in their project. The idea that you can 'rebase -i' to build up small, neat commits that will almost always apply cleanly to a sane codebase is wonderful. The fact that I don't need extreme foresight to capture my meaningful units of work into individual commits means that years from now I can look back and see what I was actually doing instead of "wait, was that line deleted as part of the feature, or was he just cleaning up warnings?"

Remember that these features aren't for developers, they're for maintainers. If you want your code in the kernel, you follow the kernel development process or GTFO. Linus doesn't sit around saying "shucks darn, it didn't merge cleanly, I guess I'll go fix it for them." He just doesn't have the time, and neither do his "deputies."

That's not to say that these features don't benefit developers; they do. It's just that you need to have seen them in action to understand why.

And finally, I'm genuinely curious... Why are some people so obsessed with perfect preservation of history? Is this some sense of fear/paranoia? In practice I've never found project history to be useful without modification, so what am I missing? What are people trying to preserve?

> And finally, I'm genuinely curious... Why are some people so obsessed with perfect preservation of history? Is this some sense of fear/paranoia? In practice I've never found project history to be useful without modification, so what am I missing? What are people trying to preserve?

I think it's a conflation of having something like incremental backups versus having (as you so eloquently put it) a cleaned up log of development. Sure, you can use a VCS to record the minutiae of every little thing that changes so you have a "snapshot" of the code at any point in time. And git will do that if you want it.

But I'd also have to second your thoughts that git is VCS done right, that is, by maintainers. All code will have to be maintained sooner or later, and as someone who has had to maintain plenty of code, I can tell you I don't care at all about every little change that's made. Even when I'm bisecting a bug, I don't want to have to skip over every stupid bit that was twiddled, or see commits that are immediately reverted by the next commit. That's garbage. I want to see conceptual chunks, things that hang together because a human thought of them in the terms of "this is a feature" or "this fixes a bug". Should commits make the Minimum Necessary Change? Yes. Should a new feature or bug fix be split across several commits, possibly separated by other, unrelated commits, because that's the way some sleep deprived programmer thought of them? Do you like to read author's notes about their novels instead of the edited novels?

I can't speak for everyone, but the main reason I'm interested in a reasonably perfect preservation of history is to account for every line of code in the repository and why it's there. I think there is a difference between being the consumer of a library and not caring about the internals, and being actively involved in the development of a library. Being able to look back in time and see what state a file was in when it was changed, what was changed, who changed it, and the reason for the change (with possibly more metadata such as links to tickets/bugs/stories) is very valuable before I start mucking around and changing code.

To me, it's the same as testing code. You don't need tests when things work perfectly. You only need tests/history when things aren't... And then you are seriously happy you have them.

On the topic of `git pull --rebase`, I think if you have a hard-and-fast rule that you employ without thinking about what you are doing to your commits and the state of the repository then you are doing it wrong (whether that is blindly merging or rebasing)... But that's just me.

> to account for every line of code in the repository and why it's there.

I've found that on projects which disallow the modification of history answering this question is more difficult than if each committer was responsible for recomposing their commits before merging their features (preferably a FF-merge, of course). Meaningful/useful code isn't lost as you're not modifying the long-term history of the project, just your own recent commits relative to the task at hand. Authorship isn't lost, as even if the recomposition is handled by another person, you can always set the author for a commit arbitrarily, and indicate your presence as the maintainer by signing.

Put differently, responsible devs never modify other people's history (and unless you're sharing the same machine, git makes this difficult with push vs push -f). They modify their own history in an effort to limit the noise that other devs are exposed to and to make the maintainer's job easier. The goal is to treat the repository as a full-fledged mechanism for communication and coordination with the rest of the team.

I agree. It doesn't matter what order a line of code was added to the system in, it matters why it was added. When I can take the 15 commits in which I played with solutions (adding code, nuking code, etc) and slim them down to the one set of code that just works, I've saved everybody who looks at it significant effort in figuring out what I was thinking.

There is some information lost in the process, since you can never see what I did that failed, but if you were to add up the amount of time spent redoing failed experiments and subtract it from the amount of time spent wading through experimental, dead commits, my experience says you wind up with a large balance of time wading through junk. Or those experimental changes never get committed, so you, the developer, waste time copying files around to make backups and you still don't know the failed experiments.

I think there are plenty of workflows that make sure everything is accounted for, without cluttering things up with unimportant information.

For example, have a central repo that is the source of immutable history, and have every developer clean up their history into a small linear set of commits before they merge into that. You still have just as much accountability -- nothing can get into master without a developer looking at it and tagging it with a commit message. It's just that the commit message comes from a developer looking at and curating the work he just did on a feature or bugfix, instead of the vague assumptions and notions he was working with during development.

If you think people looking back on their recent work will be better at summarizing their motives and achievements than they were while working and experimenting, as I do, then rewriting local history makes a lot of sense. If you don't trust people, and think they are likely to lose relevant information by haphazardly rebasing with messages like "squash for pushing to master, bug #1933" then you might not.

All in all, I think that, for example, 2 clean messages from 2 developers (even relatively uninformative ones) are better to look at than 1 commit from one developer and 13 from the other with messages like "first stab at xyz" and "Oops, forgot to also change the name here".

Maybe I'm projecting, but I think the main point of doing a --rebase on every pull is that if you have upwards of 20 or so developers constantly doing a pull without a rebase, you will have a lot of merge commits that are essentially worthless. Especially because they'll probably just be the default message.

So, sure, falling back to the merge when things went wrong is ok and all, but odds are high you should go ahead and relook at all of your commits anyway. (Another thing, doesn't the rebase keep the initial author date? It isn't like the history is completely fabricated at this point.)

Of course, I'm a big fan of git rebase -i to do some basic cleanup of your commits before pushing. Leave an excessive amount of log messages in? Rebase them out. Neglect basic documentation since you weren't sure if things were going to change? Rebase them in. Sure, I can sympathise with the "you are messing with history" argument, but I find it challenging to believe that I actually care that you commented last. Or that you actually had a few extra helper classes at some point. etc.
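For the "oops, forgot something" kind of cleanup, one way to do it without hand-editing the todo list is a fixup commit plus `--autosquash`. A rough sketch in a throwaway repo; the identity, file name, and commit messages are invented, and `GIT_SEQUENCE_EDITOR=true` is only there so the interactive rebase accepts the generated todo list unattended.

```shell
set -e
# Hypothetical identity so the commits work on any machine.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

echo base > notes.txt
git add notes.txt
git commit -q -m "base"

echo "feature line" >> notes.txt
git commit -q -a -m "Add feature"

# The follow-up that should have been part of "Add feature".
echo "forgotten line" >> notes.txt
git commit -q -a --fixup=HEAD

# Fold the fixup into its target; no editor interaction needed.
GIT_SEQUENCE_EDITOR=true GIT_EDITOR=true \
  git rebase -q -i --autosquash HEAD~2

commit_count=$(git rev-list --count HEAD)
top_subject=$(git log -1 --format=%s)
git log --format=%s
```

The end result is two commits ("base" and "Add feature"), with the forgotten line living inside the feature commit.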

What's worse than all the worthless merge commits is merge commits with actual functional changes introduced while resolving merge conflicts, but still with just the default merge commit message.

> if you have upwards of 20 or so developers constantly doing a pull without a rebase

Or 2 developers; it's still just as annoying. As long as you and the other developer are working at the same time you'll have almost as many merges.

> It’s essentially an anonymous branch. ... Maybe you should have explicitly branched, but hey, we’re all human.

This is the real key here. Most don't really want git-merge(1) or git-rebase(1). They want git-go-back-and-extract-my-commits-into-a-topic-branch(1).

If you have some commits:

    A-B-C-D-E-F-G

Where B is master and origin/master, and you decide you want to make C..G into a topic branch, topicA:

    git branch topicA

    A-B-C-D-E-F-G (master)
    A-B-C-D-E-F-G (topicA)

    git reset --hard B

    A-B (master)
    A-B-C-D-E-F-G (topicA)

    git merge --no-ff topicA

    A-B-----------H (master)
       \         /
        C-D-E-F-G (topicA)
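If you want to try those three steps without risking a real checkout, here is a sketch that replays them in a scratch repo. The commits A..G, the file name, and the `upstream-tip` tag (standing in for B / origin/master) are all made up for the demo.

```shell
set -e
# Hypothetical identity so the commits work on any machine.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

# Build the linear history A-B-C-D-E-F-G on master.
for name in A B C D E F G; do
  echo "$name" >> log.txt
  git add log.txt
  git commit -q -m "$name"
  if [ "$name" = "B" ]; then
    git tag upstream-tip   # stand-in for where origin/master points
  fi
done

git branch topicA                     # topicA now also points at G
git reset -q --hard upstream-tip      # master goes back to B
git merge -q --no-ff -m "Merge topicA" topicA   # H joins B and G

top_subject=$(git log -1 --format=%s master)
merge_count=$(git rev-list --count --merges master)
total_count=$(git rev-list --count master)
git log --first-parent --format=%s master
```

The first-parent log on master reads "Merge topicA", "B", "A", which is the A-B-H mainline from the diagram. Note that `reset --hard` throws away uncommitted changes, so only run it on a clean working tree.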

    git checkout -b a-topic-branch
    git checkout master

Then you reset master to the remote master. There is a command to do it, but I do this kind of thing with gitk.

    git checkout master
    git reset --hard origin/master

Assuming you're already on master, you can replace those 2 commands with a single one: git branch a-topic-branch

Is there any way to track both concepts with Git? The logical commit history, like what rebase will produce, and the physical Git history.

The reflog tracks this locally, but is there any way to push it alongside commits centrally so that the people who wish to preserve a physical development commit history can achieve that? I imagine it will work something like: by default, you see the logical history; but if you wish to delve into the physical history (including a history of who ran rebase commands, and when), you could do that.
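For the local half of this, a small sketch of what the reflog keeps after history is rewritten. It says nothing about sharing that record with others; the identity, file name, and messages are invented.

```shell
set -e
# Hypothetical identity so the commits work on any machine.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

echo one > file.txt
git add file.txt
git commit -q -m "first draft"
old=$(git rev-parse HEAD)

# Rewrite the commit; the branch now points at a new object.
echo two >> file.txt
git commit -q -a --amend -m "polished"
new=$(git rev-parse HEAD)

# The reflog still remembers where HEAD was before the rewrite.
previous=$(git rev-parse 'HEAD@{1}')
old_type=$(git cat-file -t "$old")
git reflog -n 2
```

Here `HEAD@{1}` resolves to the pre-amend commit, which is the "physical" history the logical log no longer shows.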

Does this make sense and would it be valuable?

There's another post on the frontpage saying to NOT use rebase.

This video comes to mind: http://www.youtube.com/watch?v=CDeG4S-mJts

Git is fast, but it's a clusterfuck of weird command calls and esoteric flags. I kind of miss Mercurial in this regard, but I had to make the switch due to the popularity of Github. Having open source projects is a very nice way to show potential employers that you are a good asset.

I switched to BitBucket a while ago and have been so happy. It has free private repositories, HG or Git, and most other important features that GitHub has.

Everything about GitHub is great except for the fact that you have to use Git.

This is what recent versions of GitHub for Windows do by default. There are definitely advantages to merge commits, the biggest one being that force-undoing an unwanted merge commit is as straightforward as resetting to the first parent of the merge commit.
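A sketch of that undo in a scratch repo, assuming the merge has not been pushed anywhere yet. Branch, file names, and identity are invented for the demo.

```shell
set -e
# Hypothetical identity so the commits work on any machine.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

echo base > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b topic
echo t > topic.txt
git add topic.txt
git commit -q -m "topic work"

git checkout -q master
echo m > master.txt
git add master.txt
git commit -q -m "master work"
before=$(git rev-parse master)

# The merge we later decide we did not want.
git merge -q --no-ff -m "Merge topic" topic

# HEAD^1 is the first parent, i.e. master as it was before the merge.
git reset -q --hard HEAD^1
after=$(git rev-parse master)
```

As with any `reset --hard`, uncommitted changes in the working tree are discarded.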

or cherry-picking features across version branches.

I do the same thing really, because I'm lazy, it's easy, and it usually doesn't make much difference to how I'm using git. However, different strokes for different folks.

Sounds like the workflow of a developer working by himself.

I have 40 developers working in my company, all doing pull --rebase, I even blocked trivial merges on the server itself (see my answer at http://stackoverflow.com/a/8936474/258689)

Laziness is only acceptable when you work alone.

If you are curious, check out this project: https://github.com/orefalo/g2

You know, we wouldn't even be having this discussion if people just didn't commit work in progress onto their upstream tracking branches in the first place.

People do this? Like put unfinished clumps of code out in the wild? GitHub is not your personal backup drive!

Along the same lines, what is the point of the "it builds" widgets that I'm seeing lately? Unless you have some kind of stable release available, it had better build.

The central problem here is neglecting to identify the purpose of branches [1] and a haphazard attitude toward "merging from upstream" [2,3].

If you use topic branches for every feature and bug fix, then you can even test them in an integration branch (often called 'next') so that they can interact with other new features before graduating to 'master'. This makes 'master' more stable which is good for users and good for developers because they can be more confident that a bug in their topic branch was introduced in their branch. It is also easier to make releases.

Use of a 'next' integration branch also relieves some of the pressure from merging new features. Other developers' _work_ is not affected if 'next' is broken and the merge can be reverted without impacting the history that ultimately makes it into 'master'. Running 'git log --first-parent master' [4] will show only merges, one per feature, and each feature has already been tested in 'next', interacting with everything in 'master' as well as other new features. See gitworkflows(7) [5] for more on 'master'/'next'.

If we acknowledge that 'master' (and possibly 'next') are only for integration, then we don't have the problem of 'git pull' creating a funny merge commit because we're developing in a topic branch, but the same behavior occurs when we run 'git merge master' (or 'git pull origin master'). This is a merge from upstream and usually brings a lot of code that we don't understand into our branch. These "just keeping current" commits annoy Linus [2,3] because they do not advance the purpose of the topic branch ("to complete feature/bugfix X so that it can be merged to 'master'"). Linus' short and sweet rule of thumb [3] is

    If you cannot explain what and why you merged, you
    probably shouldn't be merging.

We can usually only explain a merge from upstream when we (a) merge a known stable point like a release or (b) merge because of a specific conflict/interaction, in which case that should go into the merge commit. If you use 'git merge --log', merges from topic branches contain a nice summary while merges from upstream usually have hundreds or thousands of commits that are unrelated to the purpose of your branch.

[1] http://gitster.livejournal.com/42247.html (Junio Hamano: Fun with merges and purposes of branches)

[2] http://lwn.net/Articles/328436/ (Rebasing and merging: some git best practices)

[3] http://yarchive.net/comp/linux/git_merges_from_upstream.html (Linus Torvalds: Merges from upstream)

[4] http://git-blame.blogspot.com/2012/03/fun-with-first-parent.... (Junio Hamano: Fun with --first-parent)

[5] https://www.kernel.org/pub/software/scm/git/docs/gitworkflow...

How do you know when to merge 'next' to master? It seems to me like you have the exact same problem as before, only now you're being interrupted because someone else broke next instead of master.

I could see it making more sense if you're on a well understood periodic release cycle, where breaking next isn't critical, and everyone knows to have it stabilized in time for the next release.

You never merge 'next' to 'master'. You merge topic branches when the topic is considered to be complete and stable (it "graduates"). The rerere [1,2,3] feature (a fantastic set-and-forget feature) ensures that you won't have to resolve the same conflict multiple times.

The amount of time required for a topic to stabilize in 'next' depends on the topic and what it affects, but you can easily summarize "branches in next, but not in master" to look for candidates.

Feature releases are tagged on 'master' and 'next' is usually rewound at a release (create a new 'next' branch starting at the release, merge all the branches that failed to graduate in this release cycle, and discard the old 'next'). This is easy to automate.

[1] http://git-scm.com/2010/03/08/rerere.html

[2] http://www.kernel.org/pub/software/scm/git/docs/git-rerere.h...

[3] http://gitster.livejournal.com/41795.html
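One possible way to get the "branches in next, but not in master" summary is to test each topic branch with `git merge-base --is-ancestor`. This is a sketch, not how any particular project does it: the `topic/` naming prefix, the branch names, and the identity are assumptions made for the demo.

```shell
set -e
# Hypothetical identity so the commits work on any machine.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

echo base > base.txt
git add base.txt
git commit -q -m "base"

# Two topic branches forked from master.
git checkout -q -b topic/a master
echo a > a.txt
git add a.txt
git commit -q -m "topic a"

git checkout -q -b topic/b master
echo b > b.txt
git add b.txt
git commit -q -m "topic b"

# Both topics are cooking in next.
git checkout -q -b next master
git merge -q --no-ff -m "Merge topic/a into next" topic/a
git merge -q --no-ff -m "Merge topic/b into next" topic/b

# Only topic/a has graduated to master so far.
git checkout -q master
git merge -q --no-ff -m "Merge topic/a" topic/a

# Topics reachable from next but not yet from master.
candidates=""
for branch in $(git for-each-ref --format='%(refname:short)' refs/heads/topic/); do
  if git merge-base --is-ancestor "$branch" next &&
     ! git merge-base --is-ancestor "$branch" master; then
    candidates="$candidates$branch "
  fi
done
echo "in next but not in master: $candidates"
```

In this toy setup only topic/b is listed, since topic/a has already been merged to master.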

The central problem here is neglecting to acknowledge that other people and projects may have different needs and workflows. Linus doesn't hold the git truth, that's just how he decided to manage his project. Thankfully, one of git's redeeming features is that it's very flexible and can accommodate different ways of using it.

I'm not a fan of rebasing as it makes for a confusing git history when you are working with Gitflow. I find it much nicer to see the merge bubbles which indicate how features were introduced into a release. Flattening the history makes it tricky to get a clean overview and pick precisely when certain actions were performed.


When merging a rebased feature branch, make sure to use merge --no-ff so that a merge commit is introduced even though fast forwarding could be done.

Yeah. I like that and the approach is outlined in this short guide I found:


i.e. before merging a feature branch, always rebase it on the tip of the integration branch, then merge it in with --no-ff to record an explicit merge commit on the integration branch, even though a fast-forward is possible. This gets you the temporal straightforwardness of rebase while preserving the fact that there WERE feature branches and their commits are partitioned in history.
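A sketch of that rebase-then-`--no-ff` sequence in a scratch repo, to show the resulting shape. The branch name `feature`, the files, and the identity are made up, and the commits touch different files so the rebase cannot conflict.

```shell
set -e
# Hypothetical identity so the commits work on any machine.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

echo base > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b feature
echo f1 > feature.txt
git add feature.txt
git commit -q -m "feature: part 1"
echo f2 >> feature.txt
git commit -q -a -m "feature: part 2"

# Meanwhile the integration branch moves on.
git checkout -q master
echo m > master.txt
git add master.txt
git commit -q -m "unrelated work on master"

# 1. Replay the feature on top of the current tip of master.
git checkout -q feature
git rebase -q master

# 2. Merge with an explicit merge commit even though a
#    fast-forward would be possible.
git checkout -q master
git merge -q --no-ff -m "Merge feature" feature

first_parent=$(git rev-parse master^1)
fork_point=$(git merge-base master^1 master^2)
merge_count=$(git rev-list --count --merges master)
git log --graph --format=%s
```

The merge commit's first parent is the old master tip and the feature commits sit directly on top of it, so the bubble contains only the feature's own commits.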

You nailed it. Commit history is for people to read.

Check out git flow, you might like it. It could add even more structure and readability to your codebase history.

Do you agree with my edit? I'm no git pro, so still trying to get things straight in my mind.

Yes, --no-ff merge after a rebase gives a clear indication that's a feature merged from a feature branch. It's easy to cherry-pick it to another branch (for example for a backport to an old version), easy to bisect this branch or remove the entire feature.

In this debate I am a strong rebase advocate (though more than that I'm for very carefully and actively avoiding there being remote head contention - for instance by having feature branches with clear owners, or having a PR and code-review based integration style), but when merging features in these should definitely be (--no-ff) merges.

It's not a dichotomy, it's about clear semantics - what a feature branch is (clearly defined linear progress off of an upstream) and what a merge means (integrating that progress and vetting the result).

I feel like this echoes the tables vs divs debate: use a table for tabular data and position containers for layout.

In both cases, there are some fringes who argue that you should do one or the other for both use cases - semantics be damned. Git is newer so the fringes are just bigger.

I usually suggest changing the config so that people don't forget the --rebase argument:

    git config branch.master.rebase true
    git config branch.develop.rebase true

This makes every pull on master/develop behave as pull --rebase.
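To see the effect of that setting, here is a sketch with a local bare repository standing in for the shared remote. The `seed`/`alice`/`bob` directories, file names, and identity are all invented for the demo.

```shell
set -e
# Hypothetical identity so the commits work on any machine.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
cd "$work"

# Seed repository with one commit on master.
git init -q seed
(
  cd seed
  git symbolic-ref HEAD refs/heads/master
  echo base > base.txt
  git add base.txt
  git commit -q -m "base"
)

# A bare "server" and two developer clones.
git clone -q --bare seed origin.git
git clone -q origin.git alice
git clone -q origin.git bob

# Alice pushes first.
(
  cd alice
  echo a > alice.txt
  git add alice.txt
  git commit -q -m "alice"
  git push -q origin master
)

# Bob has a local commit, so a plain pull would create a merge.
cd bob
echo b > bob.txt
git add bob.txt
git commit -q -m "bob"

git config branch.master.rebase true
git pull -q

merges=$(git rev-list --count --merges HEAD)
top_subject=$(git log -1 --format=%s)
git log --format=%s
```

Bob's history ends up linear (bob, alice, base) with no merge commit, and he only typed `git pull`.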

Doing stuff like that is dangerous in my opinion. People may forget that they're actually doing something different from what they typed.

Explicit is better than implicit.

But even typing git pull isn't canonical, bar what the defaults are. git pull pretty much does a git fetch && git merge for you.

Yes but that's the default behaviour of git pull so it's expected to fetch and merge when you pull. Changing the default behaviour can lead to confusion or mistakes.

But as noselasd said git pull = Fetch + merge

Setting up the rebase in config of a specific branch stay explicit because Git will

If the rebase is not straightforward then you can still abort it.

I would love for git to have a config feature to force ff-only on pull, but based on what I know you need to create an alias to have `pull --ff-only` replace `pull`.
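A sketch of both routes, shown as repo-local config in a scratch repository. The alias name `ffpull` is made up; as far as I know Git ignores aliases that shadow a built-in command, so the alias cannot literally be called `pull`. The `pull.ff` setting is documented in newer Git releases, so treat its availability on your version as an assumption to check.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Route 1: an alias under a different name (hypothetical name).
git config alias.ffpull "pull --ff-only"

# Route 2: on newer Git, make plain `git pull` refuse anything
# that is not a fast-forward.
git config pull.ff only

alias_value=$(git config --get alias.ffpull)
pull_ff_value=$(git config --get pull.ff)
echo "alias.ffpull = $alias_value"
echo "pull.ff      = $pull_ff_value"
```

Add `--global` to either `git config` call to apply it to every repository for your user.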

I'm very confused. I work entirely from private feature branches; I use GitHub pull requests to manage merging those into master, but never touch master myself.

Does this fit into the above workflow at all, or is it only for those who are working off master or sharing branches with other developers?

(I usually follow something approximating this flow: http://julio-ody.tumblr.com/post/31694093196/working-remotel...)

It generally only applies to people sharing branches with others, whether they be master, feature, or other types of branches.

I still prefer trunk based development with very frequent commits and a strong test suite. Write 5 lines of code and a test, commit. When everyone is doing this, continuous integration is running and QA is testing continuously, most problems get found fast. Merging is easy as well because all the changes are so small. For stability of the system and speed of development this works pretty well.

What's so wrong with a merge? Git is made for it. Sometimes things are nonlinear. I prefer to roll-forward anyway instead of rollback.

It makes figuring out what happened much harder. When 3 people in a row make some small change to a comment or log output, I'd like to just see those 3 rather than follow both sides of a merge and walk the tree endlessly only to realize I've wasted my time on a bunch of commits that couldn't possibly be causing the production bug.
