
Understanding the Git Workflow - RSkuja
http://sandofsky.com/blog/git-workflow.html
======
pilif
Learning about "rebase -i" and "add -p" changed how I think about commits. I
learned how easily I could keep the history clean and, conversely, the huge
value a clean history has for maintenance.

Now, building commits as self-contained entities that don't break the build in
between not only helps me when hunting bugs later on, it sometimes helps me
detect code smells around unneeded dependencies.

That said, I still like to merge big features with --no-ff if they change a
lot of code and evolved over a long time, as that, again, helps keep the
history clean: a reader can clearly distinguish code before the big change
from code after it.

Of course the individual commits in the branch are still clean and readable,
but the explicit merge still helps if you look at the history after some time.

"you said 'a long time in development' - surely the merge target has changed
in between. Why still -no-ff?" you might ask.

The reason, again, is clean history: before merging I usually rebase on top of
the merge target to remove any bitrot and to keep the merge commit clean.
Having merge commits containing huge code changes that were caused by fixing
merge conflicts, again, feels bad.
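
As a rough sketch (the branch names `feature` and `master` are placeholders),
the rebase-then-merge flow described here looks like:

```shell
# replay the feature branch on top of the current merge target
git checkout feature
git rebase master

# then record an explicit merge commit, even though it could fast-forward
git checkout master
git merge --no-ff feature
```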

But this is certainly a matter of taste.

~~~
diminish
Just like you, I enjoy rebase -i to change history, but I also hear some
people claim the history should be kept as it is and should not be rewritten.
What are your arguments for rebase?

~~~
pilif
I would never ever rewrite the public history.

The public history is what ends up on the repository we deploy from. Whenever
a commit is pushed there, it stays there. There will never be any rebasing
(minus emergencies like removing accidentally committed files we don't own a
license for - that hasn't happened so far, though).

"rebase -i" is a tool for personal development use. It's not a tool to use on
a public repository as it will make following history incredibly hard and it
will screw with the clones other developers might have.

Conversely though, what I do on my personal development machine or on my
personal public clone (every one of us has a personal public clone we use for
code reviews or discussions around code) is my business.

Nobody is telling me which editor to use and nobody is telling me whether I
can clean up my commits or not.

Now in general, since learning that having clean commits is possible (it's not
in subversion for example), I encourage my fellow developers to have clean
commits and I discourage them from committing those famous "oops - removed
typo" or "oops - added forgotten file" commits as they are completely useless
for the overall history of the project.
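
The cleanup described here might look like this before pushing (the hashes and
messages below are made up for illustration):

```shell
git rebase -i HEAD~3
# git opens the todo list in your editor; change "pick" to "fixup" on the
# noise commits to fold them into the real one:
#
#   pick  a1b2c3d Add feature X
#   fixup d4e5f6a oops - added forgotten file
#   fixup 0789abc oops - removed typo
```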

Two months from now, nobody is going to care about you forgetting to add a
file. But I'm likely going to care about when a feature has been added and why
some lines have changed. So that's what I want to have in the public
repository. Not a history of your personal forgetfulness.

If they manage to do that without ever rebasing (you can do it with add -p,
it's just easy to make a mistake), then fine. In the end, I only care about a
clean history on our public repository.

~~~
eropple
IMO, that's really the wrong way to go, and it's one of the big reasons I
absolutely _loathe_ git. I want any changes that are in my tree, ever, to be
in the order and position in which they happened. If somebody screwed up and
forgot to add a file, fine--add it in another commit. It's not like commits
cost money.

As far as rolling back later--meh? I've never had a 300K-rev, heavily branched
SVN repo barf, and I strongly doubt it's suddenly harder in a DVCS. Merge tags
are your friend, and indelible history is a good thing.

~~~
pilif
Commits don't cost money, but time wasted on "added forgotten files"-commits
while parsing the history to trace a bug does cost money, so I'd rather not
have the commits.

Additionally, it's impossible for you or anybody else to find out whether I
have rebased my personal history before pushing. As such, it's totally
inconsequential for the main repository whether I rebased or not.

As I said: I think rebase is a personal development tool, not one you would
alter public history with.

~~~
eropple
I dunno, I think the claim that those commits "waste time" (in the sense of
any meaningful amount of time, even cumulatively) is a little hyperbolic.

I guess you view history differently than I do: I consider all development
history to be "public history" regardless of whether it was pulled in from a
clone or not. If you commit it to a repository I am going to be fulfilling a
pull request from, I want the history there.

~~~
planckscnst
So, let's get down to business here, then. What specifically is better about
this:

    
    
        commit 123facdf Add asynchThingerBopper() to Thinger class.
        commit 9f9babd8 Forgot to add Thinger.h file to 123facdf
    

compared with this:

    
    
        commit 123facdf Add asynchThingerBopper() to Thinger class.
    

What specific value does the first scenario add that the second does not?

~~~
koopajah
It can help with git bisect, for example, by avoiding commits that don't
compile because one of the files was forgotten from them.

~~~
fr0sty
> It can help for git bisect

Not if the first commit doesn't compile.

Indelible history is good for public projects (and no one is arguing about
that) and for change control systems, but less good, in my estimation, in
cases where mistakes are easily made, have small to non-existent consequences,
and serve no historic purpose.

------
decklin
The idea that fast-forward merges are easier to follow is subjective. I find
my --no-ff history easier to read. This author doesn't.

What always using fast-forward merges _really_ means is that you rebase each
branch onto master once it's ready to be public. Therefore, instead of
resolving conflicts when the branch is merged, the commits are rewritten to
avoid introducing the conflict in the first place.

Sometimes, this is really simple -- I added a line in one spot, you added
another line in the same spot, you merged first, so I rewrite my commit to add
my line next to yours instead of merging and resolving the conflict.
Sometimes, it's not -- maybe there's not even any text-level conflict, but
your feature and my feature interact in subtle and unanticipated ways and
something breaks. Now, there's no "good" point in my branch to refer to,
because I rewrote it on top of something where (I didn't realize) it was never
really going to work. The unit test I now need couldn't have existed because
it involves things that, when I was developing the branch, didn't exist.

Rebasing first is trading off _when_ you do that work. There's more to review
when the branch is ready, and there's a stronger incentive to get it right the
first time. I think this may work better for the "two founders deploying from
master when they feel like it" scenario -- you pay for manageability with
context switches. If you have a formal QA process, I think being able to
distinguish between "this branch failed QA" and "the combination of these
branches failed" may be more helpful -- you can parallelize work and hack on a
different private branch.

Git, thankfully, does not force us to choose one model or the other :-)

~~~
sandofsky
In my experience, on large distributed projects the person integrating changes
into master is rarely the same person who authored the change.

For example, when Linux branches are pulled upstream, if your code creates a
conflict your branch will simply be rejected and you'll be told to fix it.

Rebase forces the author to solve more of these problems before submitting
their change for integration.

I don't think rebase is an end-all solution for the reasons you've described.
It's perfect for medium sized changes you can easily verify afterwards. My
day-to-day work usually falls into this category.

In the case of larger sets of all-or-none changes, such as a site redesign, it
makes perfect sense to maintain a parallel line of development. Cleanup
probably isn't worth it, and the separate branch serves as documentation. You
should consciously create a new public branch.

In this case, I can understand wanting a "no-ff" merge for documentation. I
think you should first consider tags, but sometimes it makes sense to set a
stake in the ground with a placebo commit.

The problem is that if you use "no-ff" all the time on trivial changes, then
these branches lose meaning.

This post wasn't supposed to be an embargo on "no-ff." My case is that people
default to "no-ff" to pave over deeper issues.

~~~
cpeterso
A --no-ff merge also makes reverting a change from master easier because there
is just one commit. You don't need to dig through the log to find the first
commit from the merged branch fast-forwarded onto master.
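
For illustration (the hash is a placeholder), reverting the whole branch then
takes a single command, with `-m 1` selecting the mainline parent:

```shell
git revert -m 1 1234abcd   # one revert commit undoes the entire merged branch
```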

~~~
fr0sty
Do you actually mean "revert" there or are you talking about rewinding?

Using "git revert <merge_commit>" is very nasty[1]; using "git reset
<before_bad_merge>" is less so.

[1] <http://kernel.org/pub/software/scm/git/docs/howto/revert-a-faulty-merge.txt>

------
sunchild
This opened my eyes a bit. I am a walking, talking git anti-pattern today. I'm
mostly on a two-man team, so I can get away with it. I'm definitely going to
start thinking more about a clean history on master.

What are some other best-practice git workflows that HN readers use?

------
gruseom
I work this way and agree about the value of a clean, linear history. It makes
working with past versions of your code a breeze. There's one thing the OP
doesn't mention that I've found important.

Say you're working on a major design change in a private branch and it has 100
commits. When it's ready to be put on top of master, you'd really like not to
squash all 100 commits. Unfortunately, if there are conflicts, then rebasing
B1,B2,...,B100 onto master is likely to be much harder than squashing
B1,...,B99 into B100 and then rebasing. Why? In the squashed case you only
have to deal with conflicts between B100 and master, while in the unsquashed
case you have to deal with all the conflicts that ever existed as you
progressed from B1 to B100. It's frustrating to find yourself fixing conflicts
in code that you know doesn't exist any more. It's also error-prone since it
forces you to remember what you were doing at all those steps. In such
situations, I give up and squash. That's not great either, since you now have
the disadvantages of a single monolithic commit.

The solution is to be diligent about rebasing B onto master as frequently as
master changes, so B never has a chance to drift too far afield. This at least
gets rid of the worst pain, which is conflicts that compounded unnecessarily.
It also keeps you aware of what's happening on master.
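
A sketch of that routine, assuming the branch is called `feature` and the
upstream remote is `origin`:

```shell
# whenever master moves, rebase the long-running branch right away
git fetch origin
git checkout feature
git rebase origin/master   # conflicts stay small because they're resolved early
```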

~~~
js2
Here's a trick for you: make sure you have rerere enabled. Merge the end
commit, resolve all the conflicts and commit the merge (or just run rerere to
record the conflict resolution). Then abort the merge or reset back to undo
it. Now do the rebase, which will re-use the resolutions for any identical
conflicts. You still have to deal with conflicts unique to the intermediate
state, but in my experience rerere helps a lot.
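
Spelled out as commands (branch names are placeholders), the trick looks
roughly like this:

```shell
git config rerere.enabled true
git checkout feature
git merge master            # resolve every conflict once, then commit the merge
git reset --hard ORIG_HEAD  # throw the merge away; the resolutions stay recorded
git rebase master           # identical conflicts are pre-filled from the cache
```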

~~~
gruseom
I tried rerere once and it felt too much like magic to me, i.e. too
complicated in a way that I didn't trust. Experience with conflicts has led me
to eschew magic merge tools and rely on the simplest strategies: 1. minimize
conflicts; 2. bite the bullet and deal with them manually. (Edit: my question
about rerere is: how identical is "identical"? How can I be sure that it will
redo what I did before in exactly the way I would do it now? Doesn't it have
to understand my intent to achieve that?)

The diligent-rebasing-along-the-way workflow I proposed is all about #1. You
still have to deal with intermediate conflicts this way too, but at least
they're minimized. If something you commit to master conflicts with my B49, I
have to fix B1..B49 but at least I can write B50..B100 in a way that takes
your work into account.

~~~
js2
There is nothing really magic about rerere. During a merge, it records each
conflict. When you commit the merge, it records your resolution of the
conflict. If that _identical_ conflict occurs again, it re-applies the same
resolution. You can choose whether it marks the file as resolved or not,
which allows you to easily review what was done before committing the merge.

------
pflanze
I've always been an extensive user of rebase -i. Committing partial work often
with git commit -a is easier, or at least takes less concentration, than
always being careful to commit selectively with git add -p and git commit
$files, but it requires squashing those partial commits later on. I found that
git rebase -i wouldn't scale to several days' worth of work: I would
frequently make errors when dealing with conflicts, and restarting rebase -i
from scratch would mean redoing much of the work.

Because of this, I wrote a tool[1] that lets me do the same thing as git
rebase -i, but allows me to edit the history changes incrementally, by keeping
the history edit file and conflict resolutions around between runs; it does
this by creating git patch files from all commits in question. I now always
use this whenever I need to do more than one or two changes on some history;
also, I'm now often creating commits to just store a note about a
thought/idea/issue (the tool automatically adds a tag to the original history
head, so I can look at those later on).

I originally wrote this just for me, which is the reason its own history isn't
particularly clean and that I'm relying on a set of never-released libraries
of mine; also maybe there are other, perhaps more well-known or polished tools
than this, I don't know. I guess I should announce this on the Git mailing
list to get feedback by the core devs.

[1] <https://github.com/pflanze/cj-git-patchtool>

/plug

------
simonw
This is the first argument for using rebase that I've found truly convincing -
really worth reading. This will probably change the way I use git.

~~~
eropple
It wouldn't change mine, if I used git (I avoid git specifically for this
reason, actually, and use Mercurial). If you're actually looking at your
commit logs, I find that rolling back is trivial; I can't remember the last
time I accidentally rolled back into an incremental commit.

Personally it feels more like an apology for git's bad behavior than a good
method of development.

~~~
gruseom
What exactly does Mercurial do differently that is better?

~~~
eropple
Short of explicitly installing a rebase extension, it simply does not allow
you to do this sort of mucking about with the commit history. For "oops, typo"
commits, you can very quickly (and I mean, "it's a button in Tortoise"
quickly) roll back your change and keep it in abeyance until you've fixed the
typo.

~~~
chousuke
So basically you reject a powerful tool out of idealism ("must never ever edit
history") and fix your commits manually, also forgoing the possibility of
fixing earlier unpublished commits.

Instead you could code and commit (you know, use the VCS :P) without worry in
a _private_ branch, checking for problems afterwards and fixing them using
rebase prior to merging the commits into the main branch. (where "must never
ever edit history" actually applies)

Sorry if I sound snarky, but that's what this seems like to me.

------
alunny
For very short, "oh there's a syntax error I missed" commits, "commit --amend"
is very useful, and quicker than "rebase -i".

~~~
daemin
"git commit --amend" is very useful if you realise you forgot to include some
files in the last commit.

Although if you committed since then you might be better off adding a new
commit with the missing files and then doing a "git rebase -i" to move and
squash the commits as appropriate.
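
As a concrete sketch (the file name is borrowed from the example elsewhere in
this thread):

```shell
git add Thinger.h             # stage the file you forgot
git commit --amend --no-edit  # fold it into the previous commit, keeping its message
```

This is only safe as long as the commit hasn't been pushed yet.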

------
zwieback
Nice post, thanks.

I've been using traditional RCSs for years but find that whenever I introduce
SVN (or CVS before that) to a team it's very easy for new users to fall into
bad habits around branching and committing transitory changes.

I'd like to try git to help manage the mess during the prototyping phase but
I'm wondering how suitable it is for new users to learn git vs. learning svn.

Any opinions out there on the suitability of git as a first version control
system? My team consists of highly experienced engineers (EE/FW) with little
or no software engineering experience.

~~~
mooneater
They sound like smart people. Why hobble them with SVN in 2011?

I put off the transition as long as I could out of inertia (I switched from
SVN in '08 out of desperation when I started needing a lot of branching and
merging). But once you go git, you don't look back, not one bit.

~~~
ulrich
When you are used to the SVN/CVS workflow, it takes a long time to get over
it. It took me a long time to understand why the distributed approach is
better, despite having read a lot about it. In my company we are using git as
well, but most developers refuse to work anywhere other than on master. They
probably had their share of trouble with branching in other systems.

~~~
eropple
It's Mercurial-based, not git-based, but you might have them read HGInit, by
Joel Spolsky:

<http://hginit.com>

Really, really approachable guide to how to properly use a DVCS.

------
andrew311
I'm wondering how people address one of the scenarios raised in the post,
specifically this:

"It’s safest to keep private branches local. If you do need to push one, maybe
to synchronize your work and home computers, tell your teammates that the
branch you pushed is private so they don’t base work off of it.

You should never merge a private branch directly into a public branch with a
vanilla merge. First, clean up your branch with tools like reset, rebase,
squash merges, and commit amending."

I'm wondering how people go about cleaning up a private branch that has been
pushed (when the goal is to get its changes into master cleanly). Rebasing the
private branch is pretty much out of the picture since it has been pushed
(unless you don't care about pushing it again). I can see a few ways of doing
this:

1) You could do a diff patch and apply it to master, then commit.

2) You could check out your private feature branch, do a git reset to master
in such a way that your index still reflects the private branch, then commit
it. Ex:

        # currently on the private branch
        git reset --soft master

Now all the changes from the private branch are changes to be committed on
master. This is easy, but it puts everything in one commit.
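
Written out in full (branch names are placeholders), the reset trick is:

```shell
git checkout private
git reset --soft master   # move the tip to master; all the changes stay staged
git commit -m "Feature X as a single clean commit"
git checkout master
git merge private         # fast-forwards, since master is now an ancestor
```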

If you wanted to do a few commits for different, but stable points, but you
already pushed the private branch and can't rebase it, you could instead do
"git reset --soft" on successive points in the private branch commit chain,
committing to master as you go.

If you wanted to reorder commits from the private branch, I guess you could
rebase the private branch (which means you can't push it again since you
already pushed it), then use the tactic from the last paragraph, then ditch
the private branch because it's no longer pushable.

Does anyone have better ways of putting changes to master for private branches
that have already been pushed?

~~~
gruseom
Whether a branch is private and therefore can be rebased has nothing to do
with whether there's a copy of it on the server. I push my work-in-progress to
the server often for backup purposes anyway. If I want to rebase, I just push
-f.

I can't think of why that would be a problem, but if someone objected to push
-f on a private branch, I'd just make a new branch with a new name and push
that. And if that were a problem, I'd just find another server to push -f to
and only ever commit to master on the official server. But these are silly
workarounds. Why make things harder than they need to be?

~~~
andrew311
This is true. If the assumption is that it's a private branch, then other
people shouldn't care if you push -f because no one else should be using it.

Sometimes there are cases where people want to pull a private branch because
they are working on something that is in the same code path but will be
deployed after the private branch is integrated and deployed. They want to
work off the newest code and avoid a larger merge to their private branch
later. Would rebasing that private branch make their life harder? If so, one
could always stage changes in a feature branch at stable points for them.
Thoughts?

Basically, my understanding is that push -f can be a hassle for others to pull
if they made commits to the same branch already. You're totally right that if
it's truly a private branch, though, this should be irrelevant.

~~~
gruseom
If someone wants to work off my private stuff I would tell them "sure, but be
careful cause I'm push -f'ing" (after all, it's usually pretty easy to fix)
and give them a heads-up when I do. If that weren't acceptable, I'd add a tag
like "stable" to my branch, tell them to use it only up to that tag, and move
the tag forward as the work progresses. If that weren't acceptable I'd make a
branch instead of a tag and tell them to use that.
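
The tag variant might look like this (`stable` is just a conventional name):

```shell
git tag -f stable            # mark the newest known-good commit
git push -f origin stable    # move the published tag as work progresses
# collaborators base their work on "stable" rather than the branch tip
```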

~~~
andrew311
Thanks! Excellent advice.

------
motherwell
<https://github.com/nvie/gitflow> works really well. The original post
<http://nvie.com/posts/a-successful-git-branching-model/> was really
compelling, and using it has really helped, at least for what I do.

------
stretchwithme
What really helped me grasp git was attending one of Scott Chacon's speeches
on the topic. Scott works for github, knows what he's talking about and
explains things thoroughly.

    
    
      http://www.youtube.com/watch?v=QF_OlomyKQQ

------
joelhaasnoot
Hmm, this makes sense to me: lots of Git features I'd forgotten or not used
before.

Can anyone sketch the "merging" strategy I should be using in my scenario?

- I have 3 branches: dev, stage and master.
- Bugs are fixed on master, bigger bugs/changes on stage, and new features on
dev.
- Big functionality changes/additions come in the form of new branches, which
I currently merge first into dev, then into stage, and, if everything is OK,
into master.

This doesn't always work well due to the timing of things: sometimes my dev
branch is out of date with master and needs fixes from master before applying.

How should I handle merging the branches?

~~~
cvandyck76
I wouldn't have two separate branches for bugfixes and then one for new
features - as you noted, it can get hairy. Personally I find the git-flow
model very straightforward.

Do normal feature development and bug fixes on the develop branch; save master
for production releases. When it's time to make a release, cut a release
branch (e.g. r/1.0.1) from the develop branch. Bug fixes that are made on that
branch should also be merged into develop. Once the release is made, merge
r/1.0.1 back into master and develop and continue on as normal.
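
A minimal sketch of that cycle (branch and version names follow the example
above):

```shell
git checkout -b r/1.0.1 develop   # cut the release branch from develop
# ...QA fixes are committed on r/1.0.1 here...
git checkout develop
git merge r/1.0.1                 # fixes flow back into develop
git checkout master
git merge r/1.0.1                 # the release lands on master
git tag v1.0.1
```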

Also see: <https://github.com/nvie/gitflow>

~~~
joelhaasnoot
The problem is that between a feature being ready and it going into production
there is a certain amount of QA/tweaking that goes on. Before, I was running
into issues where I couldn't fix a much smaller bug than the new functionality
because I didn't have a branch for that. The flow handles that with hotfixes,
which I guess works well.

I do think the numbering is excessive, however: web software releases so often
that on a one-man team it's mostly extra work.

------
Maro
Great post. It calls attention to the importance of having clean, stable
commits in the 'master' branch, and thus avoiding a plain vanilla 'git merge'
in favor of 'squash' and 'rebase'.

<http://stackoverflow.com/questions/2427238/in-git-what-is-the-difference-between-merge-squash-and-rebase>

------
swah
He should start the article with the last paragraph.

~~~
sandofsky
People could then read the summary and skim through the rest. The summary is
there just to help you remember.

If people don't internalize the reasoning, it's a disaster waiting to happen.

------
echostar
Under "Declaring Branch Bankruptcy", why does the author throw in a "git
reset" as the last step in the example?

------
endlessvoid94
After reading this, I finally motivated myself to read through the man pages
for git pull, fetch, merge, and rebase.

Thanks :-)

------
trusko
Good article. Thanks.

------
jebblue
Git is plain scary. We should stick with SVN.

~~~
j-kidd
Try <http://hginit.com> for a fantastic introduction to Mercurial for people
familiar with SVN (or not).

Git was designed to suit kernel development (as shown in the article). For us
simple-minded mortals who like SVN, it is much easier to migrate to Mercurial.

~~~
jebblue
Good article, well written. If we are to start using distributed version
control then I guess it might as well be git since it seems to have the most
traction in the press.

~~~
koenigdavidmj
<http://hg-git.github.com/>

That is a plugin for Mercurial, written by the Github people, to support
targeting git repositories.

