

My Common Git Workflow - knowtheory
http://yehudakatz.com/2010/05/13/common-git-workflows/

======
buro9
This is incredibly timely and I for one appreciate the post. I'm just moving
myself and a team from past experience with SVN into the world of git.

None of us know git, but we're all vaguely aware of it. Unfortunately the
place we are at requires us to integrate with ClearCase and SVN, and to make
matters worse we're all on Macs and Linux for which ClearCase tools don't
appear to exist.

Our proposed solution to bring sanity to this is to use git with git-cc and
git-svn to act as bridges to the legacy systems, whilst putting all new work
natively into git.

Now... I'll be the first to admit that we're not full of confidence about this
as it's the fear of the unknown, so if anyone else has solid advice on how to
work with git and workflows and integrating into existing legacy systems I
would be keen to hear it.

~~~
fragmede
Are you trying to integrate ClearCase _with_ svn (with git as the go-between
bridge)?

That's a bit more than I've had to deal with (though I've heard of
perforce->(git)->svn being done).

As for git-svn itself, I'll say what I said on the other thread - what finally
got me to switch is that with git-svn, my local copy is a real live Git repo.
Which means the code I write, my actual work, is all still there and accessible
via standard Git commands that I can look up. The fix for the worst-case
scenario? I do a fresh svn checkout to a new directory, do a git checkout,
copy the files out, and do an svn commit of that version of the files.

Far from ideal, but knowing that I could recover from the worst the system
could throw at me put me quite at ease. (Yeah, yeah, git svn dcommit, but we're
talking about a fearsome worst-case scenario.)

~~~
buro9
We all have experience with SVN, and that's been our preferred setup in our
prior roles.

Here, ClearCase is used by the back end services team and they use that to
push things through the different environments too. So at some level we need
to access their code, and to contribute code to their repository.

SVN is used by other rogue developers to try and avoid ClearCase... very
understandable really. So we also need to get some of their code and
contribute to them.

So neither holds all knowledge, and IBM doesn't provide ClearCase software for
Mac OS X.

So what we're looking at is a git-cc bridge to go in that direction, installed
on a Windows box and then allowing us internally to use git to work and then
just to periodically push upstream from the Win box.

And as we'll be using git, it seems desirable (though not required) to use the
git-svn bridge if that works well as then we only need to use the one tool
daily.

I've read a lot about how to set up the CC2Git bridge, and no solution seems
great, but all appear to be adequate enough to work at some level. I haven't
looked into git-svn much, but what I have read was mostly good.

Our concern is what happens once we're in the git world: how do you actually
use the thing? I've seen a lot of material stressing the importance of
understanding git and workflows, but not a great deal that describes git for
SVN users and then offers simple-to-follow guides that communicate not just
the commands but also a nice, solid workflow.

~~~
fragmede
I assume you've seen the git svn crash course
(<http://git.or.cz/course/svn.html>), which is a good overview but falls short
on the subtleties of merging (particularly using git pull --rebase). Its
side-by-side listing of commands in the two VCSs is great. The ParrotVM wiki
also has a git-svn guide (<http://trac.parrot.org/parrot/wiki/git-svn-tutorial>)
that I've found quite helpful and that includes an introductory workflow.

(Alternately, it looks like svnimporter
(<http://www.subversionary.org/projects/svnimporter>) can import ClearCase
which gets at a tiny subset of your problem.)

------
knowtheory
As a git user, I do really think it's critical that people think about their
workflows and talk over what strategy they're going to use w/ their fellow
committers.

There's (definitely) more than one way to do it with Git, so it's best to read
up on how other people use it.

------
acqq
OK now I know about:

"git pull --rebase"

If only somebody would explain what this really does! The man pages are not
helpful!

git-pull(1) Manual Page says: "--rebase: Instead of a merge, perform a rebase
after fetching. If there is a remote ref for the upstream branch, and this
branch was rebased since last fetched, the rebase uses that information to
avoid rebasing non-local changes. To make this the default for branch <name>,
set configuration branch.<name>.rebase to true."

Oh how clear. So let's try git-rebase:

"If <branch> is specified, git rebase will perform an automatic git checkout
<branch> before doing anything else. Otherwise it remains on the current
branch.

All changes made by commits in the current branch but that are not in
<upstream> are saved to a temporary area. This is the same set of commits that
would be shown by git log <upstream>..HEAD (or git log HEAD, if --root is
specified).

The current branch is reset to <upstream>, or <newbase> if the --onto option
was supplied. This has the exact same effect as git reset --hard <upstream>
(or <newbase>). ORIG_HEAD is set to point at the tip of the branch before the
reset.

The commits that were previously saved into the temporary area are then
reapplied to the current branch, one by one, in order. Note that any commits
in HEAD which introduce the same textual changes as a commit in
HEAD..<upstream> are omitted (i.e., a patch already accepted upstream with a
different commit message or timestamp will be skipped)."

I'm lost!

Now can anybody explain, as nicely as Yehuda did for the workflow, what plain

git pull

did "wrong" and why, and what the hell it is that "--rebase" does "right"?
Especially as here and there "--rebase" is described as something that
"doesn't keep history" and is therefore "bad."

Note that I put "bad", "wrong" and "right" in quotes. I know that's all
relative, I just want to get some simple explanation of what's going on without
reading "recursion see recursion" man pages!

Thanks in advance!

~~~
js2

      "git pull" -> git fetch + git merge
    
      "git pull --rebase" -> git fetch + git rebase
    

git fetch is what syncs the remote changes to your local repo. If you want
to see these changes, they are referenced by so-called remote-tracking
branches. To view them use:

    
    
      git branch -r
    

Now, if your local branch (let's say it's master) has not diverged from the
remote branch to which it corresponds (typically origin/master), then it
doesn't matter whether you use rebase or merge. In the case where your local
master is a superset of the remote master, there are no new remote changes to
incorporate. In the case where your local master is a strict subset of the
remote master, then either rebase or merge will perform a so-called
"fast-forward" operation which basically just updates your local master to the
same commit as the remote master.

However, if your local master has diverged from the remote master, then there
is work to do:

Rebase will take all your new local commits, set them aside, reset your local
master to the same commit as the remote master, then replay your commits one
at a time. If there are conflicts, you will of course have to resolve them as
you go.

Merge, on the other hand, will attempt to create a merge commit. A merge
commit is one that has two parents. In this case, the parents are the tip of
the remote master and the tip of the local master. Of course, here too there
may be conflicts for you to resolve.

Regardless of whether you performed rebase or merge, when done, your local
master is now a superset of the remote master and you can push out the change.
(Unless of course, while you were rebasing/merging, someone else pushed their
changes. In this case, wash, rinse, repeat…).
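
For anyone who wants to watch the diverged case happen, here is a minimal sketch of the fetch + merge path in a throwaway sandbox. The repository names (`remote.git`, `me`, `friend`), file names, and commit messages are all made up for the demo, and it assumes a reasonably recent `git` on the PATH.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; every path and name below is hypothetical.
demo=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A bare repository stands in for the shared remote.
git init -q --bare "$demo/remote.git"
git -C "$demo/remote.git" symbolic-ref HEAD refs/heads/master

# "me" publishes the history that everyone shares.
git init -q "$demo/me"
git -C "$demo/me" symbolic-ref HEAD refs/heads/master
git -C "$demo/me" remote add origin "$demo/remote.git"
echo shared > "$demo/me/shared.txt"
git -C "$demo/me" add shared.txt
git -C "$demo/me" commit -q -m "shared commit"
git -C "$demo/me" push -q -u origin master

# "friend" clones, commits, and pushes first.
git clone -q "$demo/remote.git" "$demo/friend"
echo theirs > "$demo/friend/theirs.txt"
git -C "$demo/friend" add theirs.txt
git -C "$demo/friend" commit -q -m "remote commit"
git -C "$demo/friend" push -q origin master

# Meanwhile "me" commits locally, so the two masters have diverged.
echo mine > "$demo/me/mine.txt"
git -C "$demo/me" add mine.txt
git -C "$demo/me" commit -q -m "local commit"

# The two halves of a plain "git pull", run by hand.
git -C "$demo/me" fetch -q origin
git -C "$demo/me" merge -q --no-edit origin/master

# The graph shows a merge commit whose parents are the local and remote tips.
git -C "$demo/me" log --graph --format=%s

# Local master now contains everything on the remote, so the push goes through.
git -C "$demo/me" push -q origin master
```

Swapping the `merge` line for `git rebase origin/master` gives the other path; the setup stays the same.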

Now, as to whether to use merge or rebase… there is no right answer. It very
much depends on the workflow.

Rebase will give you a "cleaner" history in the sense that the history remains
linear which some folks prefer. A linear history is easier to understand and
if you ever need to use "git bisect" to find where something broke, a linear
history is much easier to deal with.

However, some folks prefer to see the true history of development in the sense
of "when so-and-so started this work, on which commit was it originally
based?". In that case, merge preserves that information (as long as it wasn't
a fast-forward -- but you can always force a merge commit with "git merge
--no-ff").
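
A small sketch of the `--no-ff` case, assuming a scratch repository with a hypothetical `topic` branch: master has not moved, so a plain merge would fast-forward, but `--no-ff` records a merge commit anyway.

```shell
#!/bin/sh
set -eu

# Scratch repository; branch and file names are made up for the demo.
demo=$(mktemp -d)
repo="$demo/repo"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
echo base > "$repo/base.txt"
git -C "$repo" add base.txt
git -C "$repo" commit -q -m "base"

# One commit on a topic branch; master stays where it was.
git -C "$repo" checkout -q -b topic
echo work > "$repo/work.txt"
git -C "$repo" add work.txt
git -C "$repo" commit -q -m "topic work"

# A plain merge would fast-forward here; --no-ff forces a merge commit.
git -C "$repo" checkout -q master
git -C "$repo" merge -q --no-ff --no-edit topic

# The graph keeps the topic branch visible as a side line.
git -C "$repo" log --graph --format=%s
```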

In some cases, it makes sense to use both rebase and merge -- rebase for
simpler changes and merge to incorporate long-lived topic-branches.

It pains me that the git man pages are so awful. Patches welcome (it is
unfortunate that newbies provide the best perspective on how awful the docs
are, while simultaneously being the least likely to be able to contribute
improvements to them). Ultimately, git is actually based on some fairly simple
concepts and I happen to think that once you have a conceptual understanding
of it, all the commands (more or less…) start to make sense. So I strongly
advocate trying to understand what the heck is going on under the hood.

That said, the Pro Git book is a much better place to start learning git than
the git man pages.

Hope this helps.

~~~
acqq
"Merge, on the other hand, will attempt to create a merge commit. A merge
commit is one that has two parents. In this case, the parents are the tip of
the remote master and the tip of the local master."

Now what I understand from that is that "merge" (that is pull without rebase)
should not be worse in any way!

In both scenarios git doesn't have less information to begin with, it doesn't
start with any false assumption, it's just that without --rebase the resulting
local repo is supposed to contain a bit more information.

Then why in the world is "pull --rebase" supposed to be the "easy" way to get
the result, and why is plain "pull" something that's "hard" for a beginner?

So I think it must be some side effect of one or the other that makes the whole
story relevant?

------
ivenkys
"is that there are a set of high-level version control operations that you’d
expect git to be able to handle in simple cases without a lot of fuss." True.

git is a "particular kind of" swiss knife of Version Control - tonnes of
options, not all of them intuitive but extremely powerful and useful if you
like to work in the git way.

I, for one, am a bit surprised about all this noise on workflows of Version
Control. You use what you think is best suited to you and your team.

It's as useful as a language war: not all languages are equal, but you use what
suits your use-case and what you are most comfortable with. End of.

~~~
lincolnq
The reason language and version control wars exist is because these are the
mechanisms by which developers interoperate. It is worthwhile for me to
convince you to use Git instead of Subversion, or Python instead of C++,
because I might one day work with you and have to use your tools. If there's
consensus about what the best tools are, then I am less likely to experience
pain when moving.

If we were talking about which editor to use, I am rarely bound to use a
certain editor based on the team I'm on, so that's why editor wars are,
indeed, largely noise.

~~~
ivenkys
"I might one day work with you and have to use your tools. If there's
consensus about what the best tools are, then I am less likely to experience
pain when moving."

Your point, though correct in the abstract, misses one crucial piece of
information: is it the best tool for the job? It is simply not enough to
say "use git" or Python or <foo> etc. The context around it is very important.

For example, in my current team everyone is comfortable using git as we have a
largely de-centralised development model. On the other hand, in my previous
role it was a very centralised model, with one true copy of the source code,
hence SVN.

The comparisons and wars that come out are devoid of this context and just say
"I use this and therefore you should too." My point is that there is no one
true way, and attempting to posit one, even though interesting, is largely
futile.

~~~
lincolnq
When we get to the point where two tools have legitimate, reasonable
tradeoffs, then I agree with you.

However, your example isn't a good one because Subversion has serious
deficiencies and Git works perfectly well in a centralized model. (Just
"bless" one repo on a server and have everyone push to it.) I can't think of a
single use-case where Subversion is a preferred way to solve the version
control problem.

~~~
ivenkys
"I can't think of a single use-case where Subversion is a preferred way to
solve the version control problem".

Developer expertise.

My point is not that Subversion is a better VCS than git; it is not. I am a
big fan of git and will use it everywhere I can, but I don't get to make the
decision to choose the "right" VCS every time. Sometimes you end up in places
where there are factors other than technical merit at play.

However, I think we are discussing a point orthogonal to the main topic at
hand.

------
mark_l_watson
I like Yehuda's advice to use --rebase. I'll have to try that.

I used to really like svn, but after using git for about a year I am now
feeling some cognitive dissonance using svn on a customer project. Great tools
are better than good tools.

~~~
knowtheory
I understand why --rebase isn't the default, but in the overwhelming majority
of cases, you want to --rebase when pulling. I always tell newbies that they
should default to --rebasing.
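
The man page quoted elsewhere in this thread names `branch.<name>.rebase` as the setting that makes this the default for one branch. A minimal sketch, assuming a scratch repository and a branch called `master`:

```shell
#!/bin/sh
set -eu

# Scratch repository; the path is made up for the demo.
demo=$(mktemp -d)
git init -q "$demo/repo"

# Per-repository setting: a bare "git pull" on master rebases instead of merging.
git -C "$demo/repo" config branch.master.rebase true

# Read the value back to confirm it was stored.
git -C "$demo/repo" config --get branch.master.rebase
```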

~~~
acqq
Why isn't --rebase the default? Can anybody explain what it actually does and
why it's not the default? The man pages are not clear to me. Everything is
explained by referring to a bunch of other things and there's not a trace of a
simple explanation.

~~~
knowtheory
Okay, so you have to think about repository state in order to understand
rebasing.

You share some common history with your remote repository (for the sake of
argument let's assume you've got a single remote repo that you and your friends
push to).

When you make changes and commit them locally, what you're doing is adding new
commit nodes to the history tree of your local repository.

When you push to the remote repo, the chain/branch of commit nodes from your
local repository gets added on to the remote repository's history.

BUT, if, while you were working and committing, someone else adds commits to
the remote repository... There is now new history in the remote repository
that you do not share.

What rebasing does is say "take all the commit nodes from my local branch
(remember them), and rewind the history back to the point where my repository
was in synch with the remote repository. THEN update my repository with the
remote changes, and try and apply all my local commits on top of my newly
synched tree."

TL;DR:

if your commit tree looks like this (where 'H' are commits which you share
with the remote repository. 'L' are local commits and 'R' are new commits to
the remote repo, which you don't have):

H-H-H-H-H-L-L-L-L

Your friend pushes his changes so the remote repository now looks like

H-H-H-H-H-R-R-R-R

Rebasing does this:

1) H-H-H-H-H [snip] L-L-L-L

2) H-H-H-H-H << R-R-R-R

3) H-H-H-H-H-R-R-R-R << L-L-L-L

If there are conflicts, Git steps through each of your commits and lets you
fix the problems and merge. If there are no conflicts, it just seamlessly
merges.
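
The H/R/L picture can be reproduced in a sandbox. This is only a sketch: the repository names, the `add_commit` helper, and the commit labels (one H, two R, two L instead of the longer chains in the diagram) are all invented for the demo.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; every name below is hypothetical.
demo=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Helper: create one file named after the label and commit it in the given repo.
add_commit() {
    echo "$2" > "$1/$2.txt"
    git -C "$1" add "$2.txt"
    git -C "$1" commit -q -m "$2"
}

# Shared remote plus my clone, with one shared commit (H1).
git init -q --bare "$demo/remote.git"
git -C "$demo/remote.git" symbolic-ref HEAD refs/heads/master
git init -q "$demo/me"
git -C "$demo/me" symbolic-ref HEAD refs/heads/master
git -C "$demo/me" remote add origin "$demo/remote.git"
add_commit "$demo/me" H1
git -C "$demo/me" push -q -u origin master

# My friend pushes R1 and R2 while I am still working.
git clone -q "$demo/remote.git" "$demo/friend"
add_commit "$demo/friend" R1
add_commit "$demo/friend" R2
git -C "$demo/friend" push -q origin master

# My local commits L1 and L2 sit on top of H1 only.
add_commit "$demo/me" L1
add_commit "$demo/me" L2

# Set L1 and L2 aside, move to the remote tip, then replay them on top.
git -C "$demo/me" pull -q --rebase origin master

# History is now one straight line, with the L commits after the R commits.
git -C "$demo/me" log --format=%s
```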

~~~
acqq
Thanks knowtheory, that's exactly what I wanted to read, that git pull
--rebase is "take all the commit nodes from my local branch (remember them),
and rewind the history back to the point where my repository was in synch with
the remote repository. THEN update my repository with the remote changes, and
try and apply all my local commits on top of my newly synched tree." Now
what's "git pull" explained in the same terms?

Is it "produce errors as if remote were in sync with local even if it's
obvious it isn't"? Who needs that?

If it's not, then what is a proper description?

And is there any scenario in which the remote was changed, you want to push
your changes there, and you wouldn't want "git pull --rebase"?

Thanks!

~~~
lincolnq
"Normal" git development (without rebasing) naturally creates a branching
history when two people make changes based off a common root. As was explained
above, rebasing causes this branching history to be rewritten into a linear
one. The alternative is merging, where you cause Git to record what actually
happened -- which is that two changes occurred simultaneously and you
explicitly brought them in sync again.

Sometimes (often) this is what you want -- it's especially useful for longer-
lived branches of work, where having the history of the branch can be valuable
in itself.

Also, if you share a commit with someone, you can no longer safely rebase it!
The reason is that part of the definition of a commit is its set of parents --
the commits it depends on (usually 1, but can be 2 or more for a merge
commit). When you rebase, you are rewriting all the parents, so you end up
creating a whole stream of new commits and discarding the old ones. If you
then attempt to share history with someone who has the old ones, presumably
Git will become confused. (I've never actually run into this problem, but I
don't use rebasing much.)
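
The "whole stream of new commits" part is easy to check in a scratch repository. A minimal sketch, with a made-up `topic` branch: the same change gets a different commit ID once it is rebased onto a new parent.

```shell
#!/bin/sh
set -eu

# Scratch repository; branch and file names are hypothetical.
demo=$(mktemp -d)
repo="$demo/repo"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
echo base > "$repo/base.txt"
git -C "$repo" add base.txt
git -C "$repo" commit -q -m "base"

# One commit on topic, then one more on master, so the branches diverge.
git -C "$repo" checkout -q -b topic
echo work > "$repo/work.txt"
git -C "$repo" add work.txt
git -C "$repo" commit -q -m "topic work"
git -C "$repo" checkout -q master
echo upstream > "$repo/upstream.txt"
git -C "$repo" add upstream.txt
git -C "$repo" commit -q -m "upstream change"

# Remember the topic commit's ID, rebase, and look at the ID again.
before=$(git -C "$repo" rev-parse topic)
git -C "$repo" checkout -q topic
git -C "$repo" rebase -q master
after=$(git -C "$repo" rev-parse topic)

echo "before rebase: $before"
echo "after rebase:  $after"
```

Anyone who already fetched the commit under its old ID still has that old commit, which is why rebasing shared history causes trouble.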

~~~
acqq
Those are good arguments why "pull --rebase" is not a default action, thank
you. Now I don't understand why either "pull" or "pull --rebase" is supposed
to have any different behaviour when somebody does it! As far as I understand,
git has enough information not to complain more in one case and less in
another, and as far as I understood, the main argument of the main article was
that "--rebase" somehow "eases some pain"!? How come?

------
vlucas
I moved all my projects and code over to git from svn about a year ago. My
workflow has improved tremendously as a result - especially working in larger
teams.

That being said, resolving conflicts in git has almost always been more
difficult and counter-intuitive than resolving conflicts in SVN. With git, you
have to learn the "right way" to do it, which typically involves several steps
or a "git reset --hard origin". I can understand the frustration coming from
the guy in the linked post that this post is a response to.

------
codesink
There's a free book about git available at

<http://progit.org/book/>

Both the basics and advanced topics are clearly discussed.

~~~
morbidkk
Yes, I agree. This was clear and easy to understand. Now I keep
<http://cheat.errtheblog.com/s/git> open all the time for easy reference.

------
theBobMcCormick
My two favorite resources for getting started with git:

1) Using Git with a central repository:
<http://toroid.org/ams/git-central-repo-howto>

2) A successful Git branching model: <http://nvie.com/git-model>

------
mhb
I'm using git on AWS running Ubuntu Hardy. Is there a GUI tool I can use to
look at my repository from a remote Windows box (where I use SecureCRT to
access the instance)?

~~~
morbidkk
SmartGit

------
gorm
Comparing git with svn doesn't make sense. They're two different tools with
different use-cases, and a transition from svn to git only makes sense for some
(motivated) teams.

~~~
wycats
While there are quite a few different workflows, there are a number of similar
operations (get a repository, get updates from a remote, push changes to the
remote). My point was not to compare git (as a whole) to subversion, it was to
demonstrate how the tools handle those operations.

I think it's reasonable for people who are moving from subversion to git to
expect that commands for these operations will exist (and they, of course, do).

Finally, even in my limited description, I managed to demonstrate a benefit of
git (namely, that the slightly different default workflow handles the common
problem of rollback much more elegantly).

