
 What's the difference between 'git pull' and 'git fetch'? - leonegresima
http://stackoverflow.com/questions/292357/whats-the-difference-between-git-pull-and-git-fetch
======
losvedir
I think what many people miss is that if you have a remote repository called,
say, "origin", and a local branch synced with it called, say, "master", then
_there's a behind-the-scenes local branch called "origin/master"_. This third
branch is what your local git repository knows about the remote branch.

    
    
        [ master ] -------------- [origin/master | master]
         remote                           local
    

With this model in mind the difference is pretty clear. `git fetch` pulls down
all the new stuff from origin and updates _origin/master_ but leaves your
local _master_ untouched. You can then merge _origin/master_ into your local
_master_ to bring it up to date. `git pull` just does both steps: pull down
the remote data into origin/whatever, and then merge origin/whatever into
whatever.

    
    
        master   |    origin/master    | master
          ----------------->
          git fetch
    
          ----------------->--------------->
          git pull
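
To make the picture concrete, here is a toy sketch of the same model. It is not how git is implemented: each repository is reduced to a dict of ref names, the commit ids `c1` and `c2` are made up, and `pull` only covers the fast-forward case where no merge commit is needed.

```python
# Toy model: a repository is just a dict mapping ref names to commit ids.
# The commit ids and helper names are hypothetical, for illustration only.

def fetch(remote, local, remote_name="origin"):
    """Copy every branch of the remote into the local remote-tracking refs."""
    for branch, commit in remote.items():
        local[f"{remote_name}/{branch}"] = commit

def pull(remote, local, branch, remote_name="origin"):
    """Fetch, then bring the local branch up to date.

    A real merge may create a new commit; this sketch only handles the
    fast-forward case, where the local branch simply moves forward.
    """
    fetch(remote, local, remote_name)
    local[branch] = local[f"{remote_name}/{branch}"]

remote = {"master": "c2"}
local = {"master": "c1", "origin/master": "c1"}

fetch(remote, local)
after_fetch = dict(local)   # origin/master moved, master did not

pull(remote, local, "master")
after_pull = dict(local)    # both now point at c2
```

After `fetch`, only `origin/master` has changed; after `pull`, `master` has caught up as well.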

------
mwfunk
I feel like git has turned into the Perl of version control systems, in both
the best and worst senses of the analogy. It's a system that was designed at a
very low level but has grown organically from there, in directions dictated by
how people use it (with Linus himself having an overwhelming influence here).

The real-world analogy is the probably apocryphal story about how some
university/corporate campus/park whatever didn't put in paths, but instead
waited a year to see where people walked and put in paths there. I first heard
this story on John Siracusa's old podcast, but it's apparently been around for
a long time. Some googling turned this up for yet another retelling:
<http://opensource.com/business/10/12/discovering-desire-lines-how-break-down-barriers-and-let-paths-emerge>.

It's great because it really is insanely featureful compared to older VCS's,
and it has absolutely enabled workflows that simply were not possible 10 years
ago. On the other hand, the complexity and the inconsistencies and the
TIMTOWTDIness of it all mean that it is one more tool that you have to
dedicate yourself to knowing inside out and thinking about all the time, as
opposed to a tool that's more of a fire-and-forget thing that's easy to learn,
easy to use, and never screws up (and doesn't give you enough rope to hang
yourself).

Somewhere among git, Perl, Python, the Linux kernel, Mercurial, various Linux
distros, various BSDs, and many many proprietary software projects, there is
some really awesome classic book on software engineering waiting to be
written, about the pros and cons of top-down vs. bottom-up design, having a
BDFL vs. community-driven design and goals, user- vs. marketing- vs.
developer-driven designs, etc.

Cathedral and the Bazaar doesn't count; it's way too shallow and opinionated
about there being a right way to do everything. There really is no one true
way to choose any of these approaches, and they all have their upsides and
downsides. Maybe the best we can hope for is to be aware of these things and
choose the best model on a per-project basis, and be able to adapt when that
project's ideal model changes.

------
js2
First let's start with the simple answer: "pull" starts by invoking "fetch".
It then performs either a merge or rebase depending upon various "git config"
settings and CLI options you can pass to git pull.

Now let's pop the hood.

Git commit history is represented by a graph. Typically each commit has a
pointer to a single parent. When history diverges and needs to be merged, a
merge commit is created which has two parents (it can have more than two but
such a commit is extremely atypical). The first commit obviously has no
parents and is also called a root commit.

Okay, so we've got a history of commits pointing to each other in a directed
acyclic graph. Now we want to traverse that graph with "git log". Where do we
start? This is what branches are, a mapping from a name to a particular
commit. Git stores the branch names under .git/refs/heads. Go look. These are
just files whose names are the branch names, and whose values are the SHA-1 of
a particular commit. (For performance reasons, git will occasionally remove
the files and instead use .git/packed-refs. But again this is a file you can
go cat.)
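
If you'd rather not poke around by hand, the lookup can be sketched in a few lines. This assumes the classic files-based ref storage described above, ignores symbolic refs, and the helper name `read_branches` is made up; the demo runs against a fake directory layout rather than a real repo.

```python
import os
import tempfile

def read_branches(git_dir):
    """Return {branch_name: sha} from packed-refs and loose ref files."""
    branches = {}
    # packed-refs lines look like "<sha> <refname>". Lines starting with
    # "#" are headers and lines starting with "^" are peeled tag entries.
    packed = os.path.join(git_dir, "packed-refs")
    if os.path.exists(packed):
        with open(packed) as f:
            for line in f:
                line = line.strip()
                if not line or line[0] in "#^":
                    continue
                sha, ref = line.split(" ", 1)
                if ref.startswith("refs/heads/"):
                    branches[ref[len("refs/heads/"):]] = sha
    # Loose files under refs/heads take precedence over packed entries.
    heads = os.path.join(git_dir, "refs", "heads")
    for root, _dirs, files in os.walk(heads):
        for name in files:
            path = os.path.join(root, name)
            branch = os.path.relpath(path, heads).replace(os.sep, "/")
            with open(path) as f:
                branches[branch] = f.read().strip()
    return branches

# Demo against a fake .git layout with one loose and one packed branch.
with tempfile.TemporaryDirectory() as git_dir:
    os.makedirs(os.path.join(git_dir, "refs", "heads"))
    with open(os.path.join(git_dir, "refs", "heads", "master"), "w") as f:
        f.write("a" * 40 + "\n")
    with open(os.path.join(git_dir, "packed-refs"), "w") as f:
        f.write("# pack-refs with: peeled fully-peeled sorted\n")
        f.write("b" * 40 + " refs/heads/topic\n")
    demo = read_branches(git_dir)
```

Pointing `read_branches` at the `.git` directory of a real clone should list the same names as `git branch`, as long as that repo stores its refs as plain files.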

Now, there are two types of branches: 1) local branches; 2) remote branches.
Local branches are just the names (those things under refs/heads) which git
updates whenever you create a new commit. Remote branches are the things which
git updates when you perform a git fetch. That's it.

So a fetch operation examines a remote repo's local branches (refs/heads),
examines the corresponding remote branches in your repo
(refs/remotes/<remote_name>), pulls over the differences, then updates your
remote branches to match the remote's local branches. It does so according to
.git/config with a section that looks like this:

    
    
      [remote "origin"]
        url = https://github.com/gitster/git.git
        fetch = +refs/heads/master:refs/remotes/origin/master
    

That "fetch =" line tells git what to do when you invoke "git fetch" or "git
fetch origin". It says to update your repo's refs/remotes/origin/master to
match refs/heads/master in gitster's repo on github. The "+" at the start
means to force it to happen even if the remote end has been rewritten (that
is, the remote master's history does not "contain" refs/remotes/origin/master
on your end). When you invoke git fetch you can see this happen in its output:

    
    
      From https://github.com/gitster/git
         52a3e01..edca415  master     -> origin/master
    

Well what happened here? Fetch examined refs/remotes/origin/master in my repo
and refs/heads/master in gitster's repo, pulled over the commits I was
missing, then updated my refs/remotes/origin/master (aka origin/master) from
52a3e01 to edca415. So "git log 52a3e01..edca415" will show me exactly the
commits that fetch just brought over.
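
Here is a rough sketch of what that "fetch =" line means, under heavy simplification: no wildcards, refs are plain dict entries, and the fast-forward check is passed in as a function. The helper names are made up and this is not git's actual code.

```python
def parse_refspec(spec):
    """Split '+src:dst' into (force, src, dst)."""
    force = spec.startswith("+")
    if force:
        spec = spec[1:]
    src, dst = spec.split(":", 1)
    return force, src, dst

def fetch_ref(spec, remote_refs, local_refs, is_fast_forward):
    """Update local_refs according to spec; return True if the ref moved."""
    force, src, dst = parse_refspec(spec)
    old, new = local_refs.get(dst), remote_refs[src]
    # Without "+", the update is only allowed when the new commit
    # "contains" the old one (or when there is no old value yet).
    if old is None or force or is_fast_forward(old, new):
        local_refs[dst] = new
        return True
    return False

spec = "+refs/heads/master:refs/remotes/origin/master"
remote_refs = {"refs/heads/master": "edca415"}

# Pretend the remote history was rewritten, so nothing is a fast-forward.
def rewritten(old, new):
    return False

forced = {"refs/remotes/origin/master": "52a3e01"}
forced_ok = fetch_ref(spec, remote_refs, forced, rewritten)

unforced = {"refs/remotes/origin/master": "52a3e01"}
unforced_ok = fetch_ref(spec.lstrip("+"), remote_refs, unforced, rewritten)
```

With the "+" the tracking ref moves to edca415 even though history was rewritten; without it, the ref stays at 52a3e01.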

You'd typically perform a git fetch on its own so that you can then do
something like "git log master..origin/master" which tells git to show you all
the commits that are in refs/remotes/origin/master but that are NOT in
refs/heads/master, i.e. exactly those commits you either need to merge in or
rebase upon.
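
The ".." range is just a set difference over the commit graph. A minimal sketch, with a made-up four-commit history and hypothetical helper names:

```python
def ancestors(commit, parents):
    """All commits reachable from `commit`, including the commit itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def rev_range(exclude, include, parents):
    """Model of `git log exclude..include` as a set of commits."""
    return ancestors(include, parents) - ancestors(exclude, parents)

# Hypothetical history: "mine" and "theirs1" both branch off "base".
parents = {
    "base": [],
    "mine": ["base"],
    "theirs1": ["base"],
    "theirs2": ["theirs1"],
}
refs = {"master": "mine", "origin/master": "theirs2"}

incoming = rev_range(refs["master"], refs["origin/master"], parents)
outgoing = rev_range(refs["origin/master"], refs["master"], parents)
```

Here `incoming` corresponds to "git log master..origin/master" and `outgoing` to the reverse range, i.e. your commits that the remote does not have yet.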

So that's fetch.

Git pull then invokes either merge or rebase. To talk about these let's add
some history.

Pretend your local repo started as a clone. The remote repo ("origin") has a
single branch, master, and at the time you cloned it there was a single commit
on master, A. So your local repo after the clone:

    
    
      refs/heads/master: A
      refs/remotes/origin/master: A
    

Now you create a new commit:

    
    
      refs/heads/master: B
      refs/remotes/origin/master: A
    

Someone else pushes a new commit to the remote repo and you fetch that commit:

    
    
      refs/heads/master: B
      refs/remotes/origin/master: C
    

Now we have a case where history has diverged. Two people have created
commits, both of which have the same parent commit A, and we need to tie
both into master. Let's do it with a merge first:

    
    
      refs/heads/master: D
      refs/remotes/origin/master: C
    

But what is "D"? D is a merge commit with two parents, B and C. And because D
"contains" C, you can push it to the remote repo, updating refs/heads/master
in the remote repo.

But what if instead you want to rebase?

    
    
      refs/heads/master: B'
      refs/remotes/origin/master: C
    

This has linearized history. B' has a single parent, C, whose parent is A.

Similar to creating the merge, you can push B' to the remote repo because B'
contains C (unlike the original B). The rebase operation "rewrote"
refs/heads/master, dropping the original B which had A as its parent and
replacing it with B', which has C as its parent. Your original B is still in
your repo btw, and will be there for some time until "git gc" removes it. You
can find the original B in your reflog.
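
The "contains" check that decides whether a push is accepted can be sketched over the same A/B/C history. This is a toy model of the graph only; the helper names are made up and nothing here talks to git.

```python
def ancestors(commit, parents):
    """All commits reachable from `commit`, including the commit itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def contains(tip, commit, parents):
    """True if `commit` is reachable from `tip`."""
    return commit in ancestors(tip, parents)

# Diverged history: B (yours) and C (theirs) both have parent A.
parents = {"A": [], "B": ["A"], "C": ["A"]}
can_push_diverged = contains("B", "C", parents)

# Merge: D has two parents, B and C.
merged = dict(parents)
merged["D"] = ["B", "C"]
can_push_merge = contains("D", "C", merged)

# Rebase: B' replays B on top of C. The original B stays in the graph
# but is no longer reachable from master.
rebased = dict(parents)
rebased["B'"] = ["C"]
can_push_rebase = contains("B'", "C", rebased)
old_b_reachable = contains("B'", "B", rebased)
```

Before the merge or rebase, master (B) does not contain C, so a plain push would be rejected; afterwards both D and B' contain C.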

------
Zigurd
When I see people say git is/is not complicated, I think that doesn't quite
capture the problem. Git, like the set of chess moves, is simple. But git,
like playing chess without being able to see the chessboard, puts a big
cognitive load on users that many users think isn't necessary.

Moreover, that analogy might approach illustrating the problem, but is still
unsatisfying. Unlike, say, Eclipse, where one can readily sum up how Eclipse
fails at being a well behaved GUI - it offers to do things that will fail and
are nonsensical - git is, on the one hand, a lot better designed and it
provides more value than Eclipse, and on the other hand git is even more
baffling to beginners.

I want to see what's happening. I want to see what's going on at every
accessible point in the workflow of every project I am working on. I want to
see the results of actions. I want to be presented with all and only the
sensible actions in ways that are discoverable. I do not want a lame fake
"GUI" that is just dialogs pasted on a CLI.

------
oxtopus
Consider that git is a database of snapshots-in-time with an interface to
append your own snapshot(s), and a protocol for distributing the revised
timeline to others.

`git fetch` is how you retrieve those changes from a remote (like github).
`git pull` takes it a step further and attempts to merge those changes into
your branch automatically. Think of it as two separate "download" and "sync"
operations: depending on your workflow, you may want to download, but defer the sync.
Eventually, you will need to sync before sending your changes upstream (`git
push`), but you have some flexibility in deciding when.

For example, if you are travelling, you may want to fetch your remote(s) so
that you have access to a relatively recent copy of the remote, but you aren't
ready to apply those changes to your current source tree. Or you may want to
fetch the remote(s) and compare the differences before applying the changes
locally in case there's risk of incorporating a breaking change that could
disrupt things.

------
pbreit
Wow, after reading a dozen or so responses, I still don't really get it.

~~~
lmm
I never liked stack overflow, so let me try here: because git is distributed,
you actually have two "copies" of your code; one that's a mirror of what's on
the server (origin/master), and one that's your local checkout (master).
Things that in SVN you would just do by contacting the server (e.g. svn diff .
<https://server/repo/trunk>) you instead do against your "local server copy"
(git diff . origin/master).

So there's an extra stage to think about; rather than making changes to your
local checkout and committing them to the server, in git you make changes to
your local checkout, commit them to your local repository, and then push them
to the remote server. But this also works in the other direction: rather than
fetching changes from the server directly into your local checkout, you fetch
changes from the server to your local repository, then from there into your
local checkout. At least, if you want to. "fetch" does the first of these,
while "pull" does both of them together.

Now, you can naturally ask why we would ever want this additional complexity,
but as I said at the start it's the whole point of a DVCS. Conceptually, your
repository on github and your local repository are the same kind of thing,
rather than a client/server relationship. You can use git without using
remotes at all, in which case you'd never use push, fetch or pull; in that
mode it behaves rather like SVN (just with the repository being on the same
machine as the checkout). Conversely, you can use it in truly distributed
fashion, where there are several federated repositories, none of which is
physically distinguished from the other. At that point there is no real notion
of committing to the "canonical" repository, because there isn't one, but what
you can do is commit to, and checkout from, your local repository, and you can
synchronize your repository with another repository in either direction (i.e.
you can send commits from your repository to another, or receive commits from
another repository to yours). When using it in this fashion it becomes very
important that these are different operations and you want to be able to
manually control when each step happens.

------
ams6110
If you want a DVCS that IMO works a lot more like what you are accustomed to
from svn, at least in the routine "checkout, change, commit" workflow, have a
look at Mercurial (hg).

------
daigoba66
I think an important aspect to "getting" git is realizing and accepting that
git has several somewhat redundant commands and usually two or more ways to
accomplish something.

Personally I always do "git remote update -p" to synchronize all my remotes
and then explicitly merge or rebase depending on what I'm trying to do.

------
lquist
Simple:

`git pull` = `git fetch` + `git merge`

`git fetch` is much more often used by those who use the rebase workflow (vs.
the merge workflow).

------
chris_wot
My understanding of git fetch is that the remote changes are stored out of
the way, ready to be merged into a branch. git pull is just a convenience
function.

Now a git pull -r, that's interesting. Does this just fetch the commits and
then do a rebase instead of a merge?

~~~
the_mitsuhiko
Pull with rebase: yes, that's how it works. You can also make it the default
behavior globally or per-repository if you prefer.

------
ohwp
With Mercurial it's the opposite where fetch is the "automated" command. Pull
just pulls the latest version from the server while fetch also tries to
update, merge and commit.

------
artagnon
Just read git-pull.sh: it's a 300-line shell script. How much simpler does it
get?

~~~
diggan
It gets simpler when someone explains the difference briefly, so you don't
have to read and understand the code.

Just because you understand the basics of Git doesn't mean you can understand
the language it was written in.

~~~
artagnon
Then being able to read basic code is a nice thing to aspire to, no?

Read and ask questions about what you don't understand.

~~~
jebblue
I know how to use drills, saws and other tools but never took them apart to
see how they work inside.

------
shadowmint
Why do so many people have such trouble with this simple concept?

Git isn't complicated, you just have to understand that a git repository is
composed of two things:

1) The _index_ which is the git information about all your files (i.e. .git/)

2) The files you currently have checked out.

When you git fetch, you update the index, nothing else.

When you git merge or git rebase or reset or checkout, you update your _files_
from _the index_.

-____-

It makes me extremely sad to see this repeated over and over and people don't
get it.

~~~
tednaleid
The index isn't the information on all your files (it's not the .git
directory). The index is an intermediate holding location between the file
system and actually storing all of those files as a commit in the commit tree.
It's the current state of the proposed next commit.

Your description misses (or conflates) an entire tree of information (the HEAD
tree), and it's arguably the most important one as it holds the whole of your
git repo's history.

I didn't fully understand git till I read Scott Chacon's "A Tale of 3 Trees"
which explains what reset is all about and goes into the details:
<http://git-scm.com/blog/2011/07/11/reset.html>

I love git, but I do not think it's obvious or intuitive without some
explanation. It's different than any other SCM I've used in the past. I
created this presentation a while ago that I think highlights some of the real
concepts that people need to know to really understand git:
<http://tednaleid.github.io/showoff-git-core-concepts/>

~~~
shadowmint
Ok, not the index. The point is:

You have repository meta data in your .git folder.

You can download _new information_ into your .git folder using git fetch.

...but you can only apply _changes_ to your .git folder's data locally (e.g.
git merge)

...and you can only apply changes to your _file system_ from your local .git
folder.

Four, basic concepts. Why is this difficult?

There's tonnes of complexity in git, sure. If you have trouble merging after a
rebase, that's totally understandable.

...but the basic failure to understand the difference between applying a
remote change directly to your current file system (not possible) and
downloading that change and then applying it locally in various ways
frustrates me, I've got to say.

~~~
ams6110
This might be best explained with pictures, which of course aren't really
possible on a forum like this. The problem is when people try to explain git
in words they often start talking about "directed acyclic graphs" and "local
remote" or "remote local" branches (what?) and using terms like "clone",
"checkout", "rebase", "master", "head", "origin" without defining them; terms
that have specific meanings in git that are different from their meanings in
other systems and different from what many uninitiated users might think they
mean intuitively.

