

Git pull --rebase until it hurts - johnb
http://jrb.tumblr.com/post/49248876242/git-pull-rebase-until-it-hurts

======
jedbrown
The central problem here is neglecting to identify the _purpose_ of branches
[1] and a haphazard attitude toward "merging from upstream" [2,3].

If you use topic branches for every feature and bug fix, then you can even
test them in an integration branch (often called 'next') so that they can
interact with other new features before graduating to 'master'. This makes
'master' more stable which is good for users and good for developers because
they can be more confident that a bug in their topic branch was introduced in
their branch. It is also easier to make releases.

Use of a 'next' integration branch also relieves some of the pressure from
merging new features. Other developers' _work_ is not affected if 'next' is
broken and the merge can be reverted without impacting the history that
ultimately makes it into 'master'. Running 'git log --first-parent master' [4]
will show only merges, one per feature, and each feature has already been
tested in 'next', interacting with everything in 'master' as well as other new
features. See gitworkflows(7) [5] for more on 'master'/'next'.

If we acknowledge that 'master' (and possibly 'next') are only for
integration, then we don't have the problem of 'git pull' creating a funny
merge commit because we're developing in a topic branch, but the same behavior
occurs when we run 'git merge master' (or 'git pull origin master'). This is a
merge from upstream and usually brings a lot of code that we don't understand
into our branch. These "just keeping current" commits annoy Linus [2,3]
because they do not advance the purpose of the topic branch ("to complete
feature/bugfix X so that it can be merged to 'master'"). Linus' short and
sweet rule of thumb [3] is:

        If you cannot explain what and why you merged, you
        probably shouldn't be merging.

We can usually only explain a merge from upstream when we (a) merge a known
stable point like a release or (b) merge because of a specific
conflict/interaction, in which case that should go into the merge commit. If
you use 'git merge --log', merges from topic branches contain a nice summary
while merges from upstream usually have hundreds or thousands of commits that
are unrelated to the purpose of your branch.
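
A minimal sketch of the "one merge per topic" history described above, run in a throwaway repository. The branch names (topic/a, topic/b), file names, and identity are made up for illustration, and 'next' is left out to keep it short.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master     # use the branch name from this thread
git config user.name "Example Dev"          # placeholder identity
git config user.email "dev@example.com"
git commit -q --allow-empty -m "Initial commit"

for topic in a b; do
    # Each feature lives on its own topic branch (hypothetical names)
    git checkout -q -b "topic/$topic" master
    echo "$topic step 1" > "$topic.txt"
    git add "$topic.txt"
    git commit -q -m "$topic: first step"
    echo "$topic step 2" >> "$topic.txt"
    git commit -q -am "$topic: second step"

    # --no-ff forces a merge commit; --log appends a summary of the merged commits
    git checkout -q master
    git merge -q --no-ff --log -m "Merge topic/$topic" "topic/$topic"
done

# Only the initial commit and one merge per topic appear here
git log --first-parent --oneline master

first_parent=$(git rev-list --count --first-parent master)
all_commits=$(git rev-list --count master)
echo "first-parent commits: $first_parent, all commits: $all_commits"
```

The individual topic commits are still reachable from 'master'; --first-parent only hides them from the summary view.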

[1] <http://gitster.livejournal.com/42247.html> (Junio Hamano: Fun with merges
and purposes of branches)

[2] <http://lwn.net/Articles/328436/> (Rebasing and merging: some git best
practices)

[3] <http://yarchive.net/comp/linux/git_merges_from_upstream.html> (Linus
Torvalds: Merges from upstream)

[4] <http://git-blame.blogspot.com/2012/03/fun-with-first-parent.html> (Junio
Hamano: Fun with --first-parent)

[5] <https://www.kernel.org/pub/software/scm/git/docs/gitworkflows.html>

~~~
russell_h
How do you know when to merge 'next' to master? It seems to me like you have
the exact same problem as before, only now you're being interrupted because
someone else broke next instead of master.

I could see it making more sense if you're on a well understood periodic
release cycle, where breaking next isn't critical, and everyone knows to have
it stabilized in time for the next release.

~~~
jedbrown
You _never_ merge 'next' to 'master'. You merge topic branches when the topic
is considered to be complete and stable (it "graduates"). The rerere [1,2,3]
feature (a fantastic set-and-forget feature) ensures that you won't have to
resolve the same conflict multiple times.

The amount of time required for a topic to stabilize in 'next' depends on the
topic and what it affects, but you can easily summarize "branches in next, but
not in master" to look for candidates.

Feature releases are tagged on 'master' and 'next' is usually rewound at a
release (create a new 'next' branch starting at the release, merge all the
branches that failed to graduate in this release cycle, and discard the old
'next'). This is easy to automate.
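
One possible way to summarize "branches in next, but not in master", sketched in a throwaway repository. The topic names (topic/stable, topic/cooking) are hypothetical, and the loop is only one of several ways to ask the question. The rerere line just switches the feature on; nothing here triggers a conflict.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"          # placeholder identity
git config user.email "dev@example.com"
git config rerere.enabled true              # record and replay conflict resolutions
git commit -q --allow-empty -m "Initial commit"
git branch next

# Two topics start from master and are both merged into next for testing
for topic in stable cooking; do
    git checkout -q -b "topic/$topic" master
    echo "$topic" > "$topic.txt"
    git add "$topic.txt"
    git commit -q -m "$topic: add file"
    git checkout -q next
    git merge -q --no-ff -m "Merge topic/$topic into next" "topic/$topic"
done

# Only one topic graduates: the topic branch itself is merged, never 'next'
git checkout -q master
git merge -q --no-ff -m "Merge topic/stable" topic/stable

# Topics whose tip is reachable from next but not yet from master
candidates=""
for branch in $(git for-each-ref --format='%(refname:short)' refs/heads/topic/); do
    if git merge-base --is-ancestor "$branch" next &&
       ! git merge-base --is-ancestor "$branch" master; then
        candidates="$candidates $branch"
    fi
done
echo "in next, not in master:$candidates"
```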

[1] <http://git-scm.com/2010/03/08/rerere.html>

[2] <http://www.kernel.org/pub/software/scm/git/docs/git-rerere.html>

[3] <http://gitster.livejournal.com/41795.html>

------
skrebbel
Am I the only one here who's frustrated by this entire discussion? I have a
very strong gut feeling that we should simply build tooling that makes this
entire discussion unnecessary.

I don't mean a non-sucky CLI for git. I mean something more fundamental,
something that connects with common programming workflows so well that we can
stop discussing the tool altogether.

I'm not sure what that would be, but I hope that one day someone smarter than
me will invent it.

~~~
dexen
You mean Darcs [0], which kind-of-sort-of does the rebase automagically.
Whenever it can, at least.

Fortunately, Git does 99% of that, and with rebase it's just the right tool
for the job. Especially with git-rerere enabled.

As far as I am concerned, the need to (sometimes!) do a rebase by hand is an
artifact of Git's commit history being strictly ordered by time. But just try
to remove that constraint, and whoever considered rebase complex will go
completely crazy ^^

[0] <http://darcs.net/>

~~~
seliopou
Came here to say this, and drop this video[1] as well. It explains the
differences between git and darcs/camp using an example editing session with
both tools.

[1]: <http://projects.haskell.org/camp/unique>

~~~
aidenn0
But what about changes that depend on each other that don't happen to edit the
same line? Neither git nor camp will be able to detect that (since it relies
on semantic knowledge of your application). Merge conflicts are places where
the VCS can't even be wrong about what to do, not places where it won't be
right.

------
benjamincburns
Like I said on the other thread, tread carefully friends; there's dogma at
work here.

Also, take a step back and look at the history of git. Git was created by
Linus Torvalds specifically for Linux kernel development. I'd argue that a key
reason that the kernel is so successful is because people are able to maintain
history as a first-class entity in their project. The idea that you can 'rebase
-i' to build up small, neat commits that will almost always apply cleanly to a
sane codebase is _wonderful_. The fact that I don't need extreme foresight to
capture my meaningful units of work into individual commits means that years
from now I can look back and see what I was actually doing instead of "wait,
was that line deleted as part of the feature, or was he just cleaning up
warnings?"

Remember that these features aren't for developers, they're for maintainers.
If you want your code in the kernel, you follow the kernel development process
or GTFO. Linus doesn't sit around saying "shucks darn, it didn't merge
cleanly, I guess I'll go fix it for them." He just doesn't have the time, and
neither do his "deputies."

That's not to say that these features don't benefit developers; they do. It's
just that you need to have seen them in action to understand why.

And finally, I'm genuinely curious... Why are some people so obsessed with
_perfect_ preservation of history? Is this some sense of fear/paranoia? In
practice I've never found project history to be useful _without_ modification,
so what am I missing? What are people trying to preserve?

~~~
32bitkid
I can't speak for everyone, but the main reason I'm interested in a reasonably
perfect preservation of history is to account for every line of code in the
repository and _why_ it's there. I think there is a difference between being a
consumer of a library who doesn't care about the internals and being actively
involved in the development of a library. Being able to look back in time and
see what state a file was in when it was changed, what was changed, who changed
it, and the reason for the change (possibly with more metadata such as links to
tickets/bugs/stories) is very valuable before I start mucking around and
changing code.

To me, it's the same as testing code. You don't need tests when things work
perfectly. You only need tests/history when things aren't... And then you are
seriously happy you have them.

On the topic of `git pull --rebase`, I think if you have a hard-and-fast rule
that you employ without thinking about what you are doing to your commits and
the state of the repository then you are doing it wrong (whether that is
blindly merging _or_ rebasing)... But that's just me.

~~~
benjamincburns
> to account for every line of code in the repository and _why_ it's there.

I've found that on projects which disallow the modification of history
answering this question is more difficult than if each committer was
responsible for recomposing their commits before merging their features
(preferably a FF-merge, of course). Meaningful/useful code isn't lost as
you're not modifying the long-term history of the project, just your own
recent commits relative to the task at hand. Authorship isn't lost, as even if
the recomposition is handled by another person, you can always set the author
for a commit arbitrarily, and indicate your presence as the maintainer by
signing.

Put differently, responsible devs never modify other people's history (and
unless you're sharing the same machine, git makes this difficult with push vs
push -f). They modify their own history in an effort to limit the noise that
other devs are exposed to and to make the maintainer's job easier. The goal is
to treat the repository as a full-fledged mechanism for communication and
coordination with the rest of the team.

~~~
SoftwareMaven
I agree. It doesn't matter what order a line of code was added to the system
in, it matters _why_ it was added. When I can take the 15 commits in which I
played with solutions (adding code, nuking code, etc.) and slim them down to the
one set of code that _just works_, I've saved everybody who looks at it
significant effort in figuring out what I was thinking.

There is some information lost in the process, since you can never see what I
did that failed, but if you were to add up the amount of time spent redoing
failed experiments and subtract it from the amount of time spent wading
through experimental, dead commits, my experience says you wind up with a
large balance of time wading through junk. Or those experimental changes never
get committed, so the developer wastes time copying files around to make
backups and you still don't know about the failed experiments.

------
taeric
Maybe I'm projecting, but I think the main point of doing a --rebase on every
pull is that if you have upwards of 20 or so developers constantly doing a
pull without a rebase, you will have a lot of merge commits that are
essentially worthless. Especially because they'll probably just be the default
message.

So, sure, falling back to the merge when things went wrong is ok and all, but
odds are high you should go ahead and relook at all of your commits anyway.
(Another thing, doesn't the rebase keep the initial author date? It isn't like
the history is completely fabricated at this point.)

Of course, I'm a big fan of git rebase -i to do some basic cleanup of your
commits before pushing. Leave an excessive amount of log messages in? Rebase
them out. Neglect basic documentation since you weren't sure if things were
going to change? Rebase them in. Sure, I can sympathise with the "you are
messing with history" argument, but I find it challenging to believe that I
actually care that you commented last. Or that you actually had a few extra
helper classes at some point. etc.
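
On the author-date question above, a small experiment in a throwaway repository, assuming a plain 'git rebase' with no date-related options. The backdated timestamp and file names are arbitrary. The rebased commit keeps its original author date and only the committer date moves.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"          # placeholder identity
git config user.email "dev@example.com"
echo base > base.txt
git add base.txt
git commit -q -m "Initial commit"

# Backdate the topic commit (2013-04-29 12:00 UTC) so any change is visible
git checkout -q -b topic
echo work > work.txt
git add work.txt
GIT_AUTHOR_DATE="1367236800 +0000" \
GIT_COMMITTER_DATE="1367236800 +0000" \
    git commit -q -m "Topic work"
author_before=$(git log -1 --format=%at topic)

# Upstream moves on, so the rebase really has to replay the commit
git checkout -q master
echo more >> base.txt
git commit -q -am "Upstream change"

git checkout -q topic
git rebase -q master

author_after=$(git log -1 --format=%at topic)
committer_after=$(git log -1 --format=%ct topic)
echo "author date:    $author_before -> $author_after"
echo "committer date: 1367236800 -> $committer_after"
```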

~~~
plorkyeran
What's worse than all the worthless merge commits is merge commits with actual
functional changes introduced while resolving merge conflicts, but still with
just the default merge commit message.

------
derefr
> It’s essentially an anonymous branch. ... Maybe you should have explicitly
> branched, but hey, we’re all human.

This is the real key here. Most don't really want git-merge(1) _or_ git-
rebase(1). They want git-go-back-and-extract-my-commits-into-a-topic-
branch(1).

~~~
EdiX

        git checkout -b a-topic-branch
        git checkout master
    

then you reset master to the remote master. There is a command to do it, but I
do this kind of thing with gitk.

~~~
michaelmior

        git checkout master
        git reset --hard origin/master
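
Putting the two replies together, a sketch of the whole rescue in a throwaway clone. The bare repository standing in for the remote and the branch name a-topic-branch are only for illustration. Note that 'reset --hard' also discards uncommitted changes, so this assumes a clean working tree.

```shell
#!/bin/sh
set -e
scratch=$(mktemp -d) && cd "$scratch"
git init -q --bare origin.git               # stand-in for the shared remote
git clone -q origin.git work 2>/dev/null    # silence the "empty repository" warning
cd work
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"          # placeholder identity
git config user.email "dev@example.com"
git commit -q --allow-empty -m "Initial commit"
git push -q -u origin master

# Oops: two work-in-progress commits made directly on master
git commit -q --allow-empty -m "WIP 1"
git commit -q --allow-empty -m "WIP 2"

# Rescue: name the commits, then rewind local master to the remote's master
git branch a-topic-branch
git reset -q --hard origin/master
git checkout -q a-topic-branch

git log --oneline origin/master..a-topic-branch
```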

------
jcrites
Is there any way to track both concepts with Git? The logical commit history,
like what rebase will produce, and the physical Git history.

The reflog tracks this locally, but is there any way to push it alongside
commits centrally so that the people who wish to preserve a physical
development commit history can achieve that? I imagine it will work something
like: by default, you see the logical history; but if you wish to delve into
the physical history (including a history of who ran rebase commands, and
when), you could do that.

Does this make sense and would it be valuable?

------
sergiotapia
There's another post on the frontpage saying to NOT use rebase.

This video comes to mind: <http://www.youtube.com/watch?v=CDeG4S-mJts>

Git is fast, but it's a clusterfuck of weird command calls and esoteric flags.
I kind of miss Mercurial in this regard, but I had to make the switch due to
the popularity of Github. Having open source projects is a very nice way to
show potential employers that you are a good asset.

~~~
gagege
I switched to BitBucket a while ago and have been so happy. It has free
private repositories, HG or Git, and most other important features that GitHub
has.

Everything about GitHub is great except for the fact that you have to use Git.

------
xpaulbettsx
This is what recent versions of GitHub for Windows do by default. There are
definitely advantages to merge commits, the biggest one being that force-
undoing an unwanted merge commit is as straightforward as resetting to the
first parent of the merge commit.
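
A sketch of that undo in a throwaway repository, assuming the unwanted merge is the tip of the current branch and has not been pushed. The branch and file names are made up.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"          # placeholder identity
git config user.email "dev@example.com"
git commit -q --allow-empty -m "Initial commit"

git checkout -q -b feature
echo feature > feature.txt
git add feature.txt
git commit -q -m "Add feature"

git checkout -q master
before=$(git rev-parse HEAD)
git merge -q --no-ff -m "Merge branch 'feature'" feature

# HEAD^1 is the first parent: where master was before the merge
git reset -q --hard HEAD^1
after=$(git rev-parse HEAD)
echo "before merge: $before"
echo "after undo:   $after"
```

The feature branch itself is untouched, so the merge can be redone later.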

~~~
32bitkid
or cherry-picking features across version branches.

------
orefalo
Sounds like the workflow of a developer working by himself.

I have 40 developers working in my company, all doing pull --rebase, I even
blocked trivial merges on the server itself (see my answer at
<http://stackoverflow.com/a/8936474/258689>)

Laziness is only acceptable when you work alone.

If you are curious, check out this project: <https://github.com/orefalo/g2>

------
nicholassmith
I do the same thing really, because I'm lazy, it's easy, and it usually
doesn't make much difference to how I'm using git. However, different strokes
for different folks.

------
bdcribbs
You know, we wouldn't even be having this discussion if people just didn't
commit work in progress onto their upstream tracking branches in the first
place.

~~~
gagege
People do this? Like put unfinished clumps of code out in the wild? GitHub is
not your personal backup drive!

Along the same lines, what is the point of the "it builds" widgets that I'm
seeing lately? Unless you have some kind of stable release available, it had
better build.

------
TeamMCS
I'm not a fan of rebasing as it makes for a confusing git history when you are
working with Gitflow. I find it much nicer to see the merge bubbles which
indicate how features were introduced into a release. Flattening the history
makes it tricky to get a clean overview and pick precisely when certain
actions were performed.

-imo

~~~
mzl
When merging a rebased feature branch, make sure to use merge --no-ff so that
a merge commit is introduced even though fast forwarding could be done.

~~~
frou_dh
Yeah. I like that and the approach is outlined in this short guide I found:

<http://williamdurand.fr/2012/01/17/my-git-branching-model/>

i.e. before merging a feature branch, always rebase it on the tip of the
integration branch, then merge it in with --no-ff to record an explicit merge
commit on the integration branch, even though a fast-forward is possible. This
gets you the temporal straightforwardness of rebase while preserving the fact
that there WERE feature branches and their commits are partitioned in history.
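
A sketch of that rebase-then-merge sequence in a throwaway repository, assuming 'master' is the integration branch. The branch and file names are invented for the example.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"          # placeholder identity
git config user.email "dev@example.com"
echo base > base.txt
git add base.txt
git commit -q -m "Initial commit"

git checkout -q -b feature
echo feature > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# Meanwhile the integration branch moves on
git checkout -q master
echo more >> base.txt
git commit -q -am "Unrelated change on master"

# 1. Replay the feature on the tip of the integration branch
git checkout -q feature
git rebase -q master

# 2. Record an explicit merge even though a fast-forward is possible
git checkout -q master
git merge -q --no-ff -m "Merge branch 'feature'" feature

git log --graph --oneline master
```

The graph shows a single "bubble": the merge commit's second parent is the rebased feature tip, and its first parent is the previous tip of master.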

~~~
khasinski
You nailed it. Commit history is for people to read.

Check out git flow, you might like it. It could add even more structure and
readability to your codebase history.

~~~
frou_dh
Do you agree with my edit? I'm no git pro, so still trying to get things
straight in my mind.

~~~
khasinski
Yes, --no-ff merge after a rebase gives a clear indication that's a feature
merged from a feature branch. It's easy to cherry-pick it to another branch
(for example for a backport to an old version), easy to bisect this branch or
remove the entire feature.

------
rachbelaid
I usually suggest changing the config so that people don't forget the
--rebase argument:

        git config branch.master.rebase true
        git config branch.develop.rebase true

This makes every pull on master/develop behave like pull --rebase.
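
For reference, a sketch of these settings next to two broader ones that Git also provides (pull.rebase and branch.autosetuprebase), applied to a throwaway repository so nothing global is touched. Whether the broader settings suit a team is a judgment call.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q

# Per-branch: 'git pull' on these branches rebases instead of merging
git config branch.master.rebase true
git config branch.develop.rebase true

# Repository-wide alternative: every 'git pull' rebases unless overridden
git config pull.rebase true

# Newly created tracking branches get branch.<name>.rebase set automatically
git config branch.autosetuprebase always

# Show what is now in effect
git config --get-regexp 'rebase'
```

A one-off merge is still possible with 'git pull --no-rebase'.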

~~~
reledi
Doing stuff like that is dangerous in my opinion. People may forget that
they're actually doing something different from what they typed.

Explicit is better than implicit.

~~~
noselasd
But even typing git pull isn't canonical, bar what the defaults are. git pull
pretty much does a git fetch && git merge for you.

~~~
reledi
Yes but that's the default behaviour of git pull so it's expected to fetch and
merge when you pull. Changing the default behaviour can lead to confusion or
mistakes.

------
damncabbage
I'm very confused. I work entirely from private feature branches; I use GitHub
pull requests to manage merging those into master, but never touch master
myself.

Does this fit into the above workflow at all, or is it only for those who are
working off master or sharing branches with other developers?

(I usually follow something approximating this flow:
<http://julio-ody.tumblr.com/post/31694093196/working-remotely-with-github>)

~~~
raylu
It generally only applies to people sharing branches with others, whether they
be master, feature, or other types of branches.

------
d4vlx
I still prefer trunk based development with very frequent commits and a strong
test suite. Write 5 lines of code and a test, commit. When everyone is doing
this, continuous integration is running, and QA is testing continuously, most
problems get found fast. Merging is easy as well because all the changes are
so small. For stability of the system and speed of development this works
pretty well.

------
bbwharris
What's so wrong with a merge? Git is made for it. Sometimes things are
nonlinear. I prefer to roll-forward anyway instead of rollback.

~~~
dasil003
It's not bisectable. <http://darwinweb.net/articles/the-case-for-git-rebase>

