
Why is git pull considered harmful? (2013) - aleem
http://stackoverflow.com/questions/15316601/why-is-git-pull-considered-harmful
======
sergiosgc
I feel tempted to just answer the question using Betteridge's Law of
Headlines: Why is git pull considered harmful? It isn't.

- Nonlinearities aren't intrinsically bad. If they represent the actual
history they are OK.

- Accidental reintroduction of commits _rebased_ upstream is the result of
wrongly rewriting history upstream. You can't rewrite history when history is
replicated across several repos.

- Modifying the working directory is an expected result; of debatable
usefulness, namely in the face of the behaviour of
hg/monotone/darcs/other_dvcs_predating_git, but again not intrinsically bad.

- Pausing to review others' work is needed for a merge, again an expected
behaviour of git pull.

- Making it hard to rebase against a remote branch is good. Don't rewrite
history unless you absolutely need to. I can't for the life of me understand
this pursuit of a (fake) linear history.

- Not cleaning up branches is good. Each repo knows what it wants to hold.
Git has no notion of master-slave relationships.

~~~
jenius
Ok so let me post an opinion from the other side. I think that histories that
represent actual history in git are actually not useful at all. Let's look at
an example.

I make a commit that introduces a bug and contains a typo. Someone else points
out the bug, so I fix the bug. Then someone else points out the typo so I fix
the typo, one small commit for each fix. While this is indeed an accurate
depiction of history, having three commits (one broken, and two silly short
one character/line changes) in your commit history instead of one commit is
not in any way useful here. For anyone reviewing a pull request or doing
anything else that involves looking through the history (that's what history
is for, right?), this is a waste of time, and unnecessarily sloppy. Or if you
need to cherry-pick or bisect, for example, you now have three commits that
represent one change, rather than one commit for one change.

Let's say someone makes a pull request to one of my projects for a change they
made that ends up having 5 commits fixing things they did wrong initially,
that could be 1 or 2 commits. There is zero chance I'm going to say "Ah well,
I guess that's an accurate representation of history! Let's merge it!" I'm
going to tell them to squash and rebase their commits to clean it up, then
force push. And to be honest, I think anyone who didn't do this would be doing
it wrong, promoting a messy history in their project.

Generally, this whole debate within git is referred to as whether you should
"hide the sausage" or not. Further discussion can be found here, with lots of
arguments I didn't touch on at all in this comment:
[http://sethrobertson.github.io/GitBestPractices/#sausage](http://sethrobertson.github.io/GitBestPractices/#sausage)
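
The squash step described above can be sketched in shell. This is only an illustration, not the commenter's own commands: the throwaway repository, the file `notes.txt`, and the commit messages are made up for the demo, and it uses `git reset --soft` followed by a single commit as a non-interactive stand-in for an interactive `git rebase -i` squash.

```shell
#!/bin/sh
# Sketch: collapse three commits that represent one change into a single commit.
# The repository, file name, and commit messages are hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/feature   # name the first branch explicitly

echo "base" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"
base=$(git rev-parse HEAD)

# One logical change spread over three commits.
printf 'base\nfeatrue with bug\n' > notes.txt
git commit -q -am "Add feature"
printf 'base\nfeatrue\n' > notes.txt
git commit -q -am "Fix bug"
printf 'base\nfeature\n' > notes.txt
git commit -q -am "Fix typo"
before_count=$(git rev-list --count "$base"..HEAD)

# Move the branch back to the base commit while keeping the final content
# staged, then record that content as one commit.
git reset -q --soft "$base"
git commit -q -m "Add feature"
after_count=$(git rev-list --count "$base"..HEAD)

echo "commits on top of base: $before_count before, $after_count after"
```

If the branch had already been pushed, publishing the squashed commit would need the force push mentioned above; `git push --force-with-lease` is the more cautious form of it.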

~~~
pierrebai
Every time I hear this sort of justification my little rage meter goes up a
notch. These arguments are all based on unverified and unnecessary
optimizations.

* How many times have you read a project history commit by commit? How much time did the occasional typo commit take you to read? The probable answer, if measured, would be negligible. Yet you're ready to spend time rewriting history, which can cause real and known time waste as people have to rebase and remerge, and sometimes causes all the problems that have been outlined.

* The whole point of bisect is using a logarithmic search into history. The rare typo-fix commit won't affect its runtime. Again, this is wasting time for no measurable effect.

* Stop making micro-fix commits in the first place!

But what really annoys me is that all these are the symptoms of a deeper
disease: mismanaged repos. The first thing to do if you have the legendary
50GB log show up in your repo is not to rewrite history. The first thing to do
is to make an urgent note to review your repo management processes. How did
that file get in there in the first place? Are you pulling directly into your
main repo!? The correct practice is to always pull into a staging repo and
only merge into main if clean. How do you know it's clean? Easy: use the
double-staging repo trick: pull into a staging repo, make sure everything is
in shape (human manual process), then pull into another clean-from-main staging
repo and _diff the history_. If the diff is not empty, you know something is
wrong. Only when the diff is clean do you pull from that staging repo into main.

(BTW, that double staging is only necessary if you allow yourself to do
cleanup in the staging repos. If you always insist on clean pulls, then all the
cleaning up is done elsewhere. This is not always possible / easy / efficient
to do on a busy repo. And on your private working repo, do as you please, as
long as the pull then comes off clean.)

(Also, I recommend doing the same on your private working repo. I always find
it easier to have a clean copy of the main repo, one staging repo where my own
version of cleaned-up history is kept, and the real dirty-work repo as yet a
third repo. The fact that I prefer to work with Mercurial, which works best by
cloning rather than branching, pretty much enforces this discipline. It promotes
happy collaboration since you never pull into your work repo and you never
push from it.)

~~~
nathanvanfleet
The more I read about this stuff the more I realize there are a lot of ways to
manage things that fall entirely outside of the actual software that does the
management. We're going into the muddy waters of "best practices" but it's
interesting to read about different implementations.

------
bcl
If git pull is causing problems for you, you are not using git correctly. You
shouldn't be doing work on upstream branches -- that's one of the wonderful
things about git, branching is cheap.

When working with more than 0 other people you should reserve the upstream
branches for merging _your_ work and pushing. Do your actual work in a branch
and you can easily commit/stash your working tree, switch to the other branch
and examine their changes.

~~~
pjc50
This is the classic "if you use a prominent feature in a seemingly obvious
way, then it causing problems for you is your own fault". People familiar with
`svn update` or other VCS actions which synchronise a local copy with a
central repository will get surprised by `git pull`.

~~~
asolove
This is the classic "I want to use a new, more powerful tool, but I'm going to
rely on concepts from the old tool and not bother to learn the new one." If
you're using git in this way, you should just use svn. You'll be missing out
on the power of git, but you already are, and at least your tool will match
your mental model of it.

~~~
phillmv
That's a false dichotomy. Matter of fact is there is no right way to use git,
because the way you use git is going to change according to your organization
size and complexity.

The Linux kernel is a very different animal than your average rinky-dink hobby
project, which is different from your ten-dev consultancy.

Matter of fact is: the git UX is awful and it punishes everyone who doesn't
have a high level understanding of how the underlying persistence model
works.

~~~
sethrin
> Matter of fact is: the git UX is awful and it punishes everyone who doesn't
> have a high level understanding of how the underlying persistence model
> works.

I believe this is true, but reading through a description of how the
underlying persistence model works was for me a revelatory experience. This
was presented together with a description of the precise problem git was
designed to solve in the O'Reilly book, _Version Control with Git_. It seemed
to be a problem solved in every particular -- an exact solution, that rarest
of all things. I did struggle with git before I read about it, which is why I
picked up the book in the first place. Since then I have convinced myself that
any failures with git are due to my own deficiencies and not its, but my tasks
with git are far from exotic; I rarely encounter any difficulties.

The value in the software is not so much the UX, which I consider acceptable
(at least with aliases), but in the underlying data model. For me, it is a
tool of daily use, and so the knowledge required to use it is a trivial
investment. If we were discussing something other than a command-line tool, I
think I might be more amenable to the argument that a simpler UX is a better
one. With the CLI though, you pretty much need to know what you're doing
before you attempt to do it, and any problems with the process are rarely
considered UX issues.

~~~
johnbm
Isn't it a problem then that Git does not expose or visualize its internal
model in any comprehensible way to its end users? That and the lack of instant
undo seem ridiculous for a program designed to manage change.

~~~
sethrin
It has a steep learning curve. It is not optimized for novices. In my opinion,
it does exactly what it should do internally, while I consider other tools to
be deficient in some respects.

In the common case, undo is provided by _git commit --amend_. For anything
more complicated, I don't think that there is an explanation or GUI which
could be considered "intuitive". I've used gitk and git-gui, and a variety of
other visual interfaces before Reading the Figurative Manual, and while I'll
be the first to cry my own ignorance, I have not found any of them to give
much information about even the available options. What is cherry-picking?
What about reverse cherry-picking? When would you want to use either?

It's not impossible to design a user interface which would translate all of
git's graph manipulations into a simple visual system. However, its current
textual interface is extremely flexible, and it exposes a very useful
scripting interface. It is a tool that rewards knowledge and experience, like
emacs or vi(m). Which is not to say that it's necessarily a good value
proposition to you, but it is (imo) a good time investment. One of these days
I need to invest in vim too :(

~~~
johnbm
Eh? "Git undo" should restore the state of the repository to what it was
before I typed the last command. How is this not the simplest and most
intuitive implementation of this feature?

Unix geeks, you make my head hurt with your stockholm syndrome.

------
googletron
Although the answer is quite thorough, in almost every example you had to
be doing something else incorrectly for git pull to be an issue.

I have often railed with other developers working on mid/large sized projects
about having this pristine git history where commits are all tailed back to
back. My main issue with this is the fact that you never get a commit for when
two branches come together so you never really know when an issue was
introduced.

i.e.

Branch A works. Branch B works.

Branch A-B doesn't work. No merge commit to show introduction of bug.

People in general practice should not be force pushing.

Most of the arguments seem to be against not knowing what you are going to get,
which can be preempted by running the fetch portion of git pull first; but if
you don't trust the people you are working with, work with new people.

git pull doesn't delete old branches, just as git add doesn't delete old
branches. I am not even sure what making an argument like that means; it's not
what git pull is for.
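
The "Branch A works, Branch B works, A-B doesn't" case above can be sketched in shell. This is a rough illustration under assumptions: the branch names, file names, and commit messages are invented, and `--no-ff` is used so that a merge commit is recorded even where git could have fast-forwarded.

```shell
#!/bin/sh
# Sketch: keep merge commits so the point where two branches met stays visible.
# Branch names, file names, and commit messages are hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main

echo base > base.txt
git add base.txt
git commit -q -m "Base"

# Branch A and branch B each add their own work on top of the base.
git checkout -q -b branch-a
echo a > a.txt
git add a.txt
git commit -q -m "Work on A"

git checkout -q -b branch-b main
echo b > b.txt
git add b.txt
git commit -q -m "Work on B"

# Bring both into main, always recording a merge commit.
git checkout -q main
git merge -q --no-ff -m "Merge branch-a" branch-a
git merge -q --no-ff -m "Merge branch-b" branch-b

merge_count=$(git rev-list --merges --count HEAD)
echo "merge commits on main: $merge_count"
git --no-pager log --merges --format='%s'
```

With the merge commits in place, the combined state A-B exists as its own commit that can be checked out and tested, which is the point being argued for here.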

------
tristan_juricek
This sort of "git pull" slam is really just an aftershock of shooting yourself
in the foot.

The UX of git is so different from other tools, but seems so similar. It's a
real problem for newbies. It's easy to think you know what's going on, because
it is vaguely similar to other operations. And, for the impatient, the
documentation is really, really obnoxious. And most developers I've met are
fairly impatient.

End result: most developers I've met have shot themselves in the foot when they
started to use git.

When I bring people onto git now, I start them with a nice visual tool in an
existing repo; my current favorite is SourceTree. But that's not a
requirement. The simple fact that they can see a nice history log with tags of
"origin/master" and "master" usually triggers that "WTF" experience and they
start asking the right questions. If I start them with a new repo, and then
have them add, and stage, it's all a bunch of things they could figure out,
and they get impatient, and then bad things happen when it comes time to play
with others.

------
jordigh
Meanwhile, "hg pull" is just fine (it really works more like git fetch).

This really is just another case of bad UI design in git.

~~~
Crito
Why is "hg pull" being equivalent to "git fetch" bad UI on the part of git?
Just because it is different from Hg?

I use git-fetch, and have never had a problem with it being called "fetch".

------
Jacen
Why was this link put on Hacker News? It's old, and it is also based on an
inaccurate understanding of git. Experienced git users don't need this(1). Is
Hacker News designed to publish hints for beginners?

Hacker News is aimed at links that are deeply interesting.

> "A crap link is one that's only superficially interesting. Stories on HN
> don't have to be about hacking, because good hackers aren't only interested
> in hacking, but they do have to be deeply interesting."

In my opinion, this link is superficial, and does not go into the real topic:
the workflow.

(1) In my opinion, an experienced git user should know what he needs to fetch,
and when he should have a merge instead of rebasing some commits, and should
also know the commands to do so. It's more a matter of workflow than a matter
of git command.

~~~
recursive
Many readers of Hacker News are not experienced git users. I know from first
hand experience.

------
websitescenes
I have never had an issue with git pull. With proper communication and
push/pull etiquette, there will rarely be conflicts and when there are, just
rebase. I don't think it is ever OK to force a push. If you are having trouble
reviewing others' commits then try using gitx to view the changes in an easy to
read way. I like that it doesn't remove deleted branches because sometimes I
still want it when the rest of the team doesn't. I'm not convinced..

------
mcfunley
So I take it that the poster is also the one who gets worked up when people
use git pull? This person is criminally insane and I feel for his colleagues.

------
anton_gogolev
It's not harmful. It's just the surprising asymmetry between "git pull" and
"git push": for some reason (which is obvious), "git push" does _not_ merge,
whereas "git pull" does.

Again, compare with how perfectly symmetrical pull/push operations are in
Mercurial.

~~~
etherealG
mind explaining what hg push/pull do to a git only user?

perhaps it's that i'm just only used to git's definitions of the words, but
they seem symmetrical to me in the sense that they both try to synchronise
history, either upwards or downwards.

i can't really imagine what a pull would do differently than a merge or rebase
that would satisfy the idea of synchronisation but be more symmetrical as you
describe. i'd really appreciate an elaboration.

~~~
anton_gogolev
"hg push" does nothing fancy: it just prepares a bundle [1] and sends it over
to the server. The server can either reject (when committing a bundle will
result in creation of a new head) or accept (either by virtue of --force flag
or just because it does not create any new heads) the bundle and
transactionally commit it to the repository. If the repository being pushed to
is not "bare" (in Git parlance), its working copy will not be affected by a
push operation.

Note that there's no merging going on here. All the merging/rebasing/rewriting
happens before "hg push".

Now, "hg pull" does a perfectly symmetrical thing: it downloads a bundle from
the server and commits it to the local repository. Working copy is not
affected at all. Again, merging/rebasing happens elsewhere, outside of "hg
pull" workflow.

[1]:
[http://mercurial.selenic.com/wiki/BundleFormat](http://mercurial.selenic.com/wiki/BundleFormat)

------
SeanDav
Although SO does give one the option to answer one's own question, it is clear
here that this guy posted his question in order to answer it, perhaps to farm
points, perhaps because he thought his answer is genuinely useful. The answer
is time stamped the same as the question.

Not sure why we should care one way or the other but to me it does seem a bit
against the spirit of how SO should be used.

~~~
DrJokepu
AFAIK you don't get reputation for answers to your own questions.

~~~
maaaats
You do get rep. I occasionally get reputation for my own answer to my own
question (
[http://stackoverflow.com/q/12077126/923847](http://stackoverflow.com/q/12077126/923847)
).

If you got some nice info, why should one not be able to share it? If you get
rep, it means that it was useful for someone.

------
badman_ting
Oh cripes. I just want to do my work and commit my code.

~~~
mcv
Then just stick to `git pull`, `git commit` and `git push`. They do everything
you need. All that messing about with your history that this guy is talking
about is totally unnecessary.

~~~
jebblue
Agreed, that's what I stick to and git has been working great, very fast. I've
noticed that if a push fails, all I have to do on the master is:

    
    
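      # on the master repo: move off the branch that is about to be pushed to
      git checkout -b foo
      # do the push from the client repo
      # switch back to the previous branch; a checked-out branch cannot be deleted
      git checkout -
      git branch -d foo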

------
zwieback
Seems obvious to me that git is just the tool we use to implement a specific
workflow. If the developers don't understand or disagree about the workflow it
will be a bumpy road. git pull is certainly part of our workflow but doesn't
generate much pain at all.

------
bsg75
The accepted answer on SO is to create an alias of a command sequence,
including a git config for the command session.

Is it really a git best practice to create custom commands for common
operations?

~~~
frabcus
Excellent question! I've several times been told to make git aliases. I'm very
reluctant to, because I know in practice different people and different
machines will have different aliases, creating friction in collaboration and
knowledge sharing.

------
merloen
I have a cronjob that does a "git fetch" every 5 minutes. That means I never
pull. I merge-fastforward, and I rebase. Very satisfied with that workflow.
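
The fetch-then-decide workflow above can be sketched in shell. This is a guess at its shape, not the commenter's actual setup: the two throwaway repositories, the branch name `main`, and the file names are all assumptions made for the demo.

```shell
#!/bin/sh
# Sketch: fetch first, then fast-forward or rebase explicitly instead of pulling.
# Repository paths, branch name, and file names are hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# A stand-in for the shared repository.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/main
echo one > shared.txt
git add shared.txt
git commit -q -m "First upstream commit"

git clone -q "$work/upstream" "$work/local"

# Upstream moves ahead.
echo two >> shared.txt
git commit -q -am "Second upstream commit"

cd "$work/local"
head_before=$(git rev-parse HEAD)
git fetch -q origin                      # updates origin/main only
head_after_fetch=$(git rev-parse HEAD)   # the local branch has not moved

# No local commits yet, so a fast-forward is enough.
git merge -q --ff-only origin/main
head_after_ff=$(git rev-parse HEAD)

# Now diverge: one local commit and one more upstream commit.
echo mine > local.txt
git add local.txt
git commit -q -m "Local work"
cd "$work/upstream"
echo three >> shared.txt
git commit -q -am "Third upstream commit"

cd "$work/local"
git fetch -q origin
git rebase -q origin/main                # replay local work on top of upstream
merges_after_rebase=$(git rev-list --merges --count HEAD)
ahead=$(git rev-list --count origin/main..HEAD)
echo "merge commits: $merges_after_rebase, local commits ahead: $ahead"
```

The periodic fetch itself would be a crontab entry along the lines of `*/5 * * * * cd /path/to/repo && git fetch --quiet`, where the path is a placeholder.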

------
joshlegs
That post really irks me. Granted, S.O. encourages you to answer your own
question, but this guy would have been much better served by a blog post. A
Q&A is not the appropriate format for him to ask a question when he has no
need for an answer. Seriously, blog it, dude.

------
richardjordan
Ah. So THIS is how you juice your stack overflow ranking. Good to know.

~~~
chc
By asking questions lots of people want to know about and then waiting a year
for votes to trickle in? Sure, but I would have thought that would be obvious.

------
Guvante
Apparently a solution has been released for git 2.0.

> "git pull" can be told to only accept fast-forward by setting the new
> "pull.ff" configuration.
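
A rough sketch of what that setting does, assuming git 2.0 or later and no other pull-related configuration (such as `pull.rebase`) in effect. The repositories, the branch name `main`, and the file names are invented for the demo.

```shell
#!/bin/sh
# Sketch: with pull.ff set to "only", git pull fast-forwards or refuses.
# Repository paths, branch name, and file names are hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/main
echo one > shared.txt
git add shared.txt
git commit -q -m "First upstream commit"

git clone -q "$work/upstream" "$work/local"
cd "$work/local"
git config pull.ff only                  # per-repository setting

# Case 1: upstream advances, nothing new locally, so pull fast-forwards.
cd "$work/upstream"
echo two >> shared.txt
git commit -q -am "Second upstream commit"
cd "$work/local"
git pull -q
ff_head=$(git rev-parse HEAD)
ff_target=$(git rev-parse origin/main)

# Case 2: both sides add a commit, so a fast-forward is impossible.
echo mine > local.txt
git add local.txt
git commit -q -m "Local work"
cd "$work/upstream"
echo three >> shared.txt
git commit -q -am "Third upstream commit"
cd "$work/local"
head_before=$(git rev-parse HEAD)
if git pull -q 2>/dev/null; then
  diverged_pull=merged
else
  diverged_pull=refused
fi
head_after=$(git rev-parse HEAD)
echo "pull on diverged history: $diverged_pull"
```

After the refused pull the local branch is untouched, so the choice between merging and rebasing is left to an explicit follow-up command.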

------
nikon
git pull --rebase

~~~
pascal_cuoq
Did you read the linked webpage? The problems of “pull --rebase” are
addressed, and the author offers an alternative solution. I am not expert
enough to judge whether there still are problems with his solution, but I can
assure you that your comment is not adding anything to the debate.

~~~
krisdol
The linked webpage does not mention git pull --rebase. The author talks about
problems with setting the configuration options of git pull to any one
default, but doesn't address the fact that we can perform a fetch, merge,
rebase in one command when necessary with "git pull --rebase". The default
configuration is preserved in this case. That said, I have none of the issues
the SO guy has. It's insane that he has so many problems with people "cleaning
up" a remote's history by making it more linear than it actually is. You
should never force push and you should never rebase pushed commits. There are
very few exceptions to this rule.
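
For reference, a small sketch of the one-command form discussed above. The repositories, the branch name `main`, and the file names are made up; the local commit in the demo has never been pushed, so rebasing it stays within the rule stated above.

```shell
#!/bin/sh
# Sketch: git pull --rebase on diverged history leaves no merge commit.
# Repository paths, branch name, and file names are hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/main
echo one > shared.txt
git add shared.txt
git commit -q -m "First upstream commit"

git clone -q "$work/upstream" "$work/local"

# Local and upstream each gain one commit, in different files.
cd "$work/local"
echo mine > local.txt
git add local.txt
git commit -q -m "Local work"
cd "$work/upstream"
echo two >> shared.txt
git commit -q -am "Second upstream commit"

# Fetch and replay the unpushed local commit on top of upstream in one step.
cd "$work/local"
git pull -q --rebase
merge_count=$(git rev-list --merges --count HEAD)
ahead=$(git rev-list --count origin/main..HEAD)
top_subject=$(git --no-pager log -n 1 --format='%s')
echo "merge commits: $merge_count, ahead of origin/main: $ahead, tip: $top_subject"
```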

------
jheriko
isn't this just another very special case of 'modifying history is harmful'?

------
mokkol
I use git-up for this issue.
[https://github.com/aanand/git-up](https://github.com/aanand/git-up)

------
njharman
Why is git so god damn hard / complicated? I mean other than having been
created by and for kernel developers.

------
unethical_ban
Another example of how complex git can be. How I wish git's interactions were
more like bzr!

------
evanm
w/e. i <3 git pull

------
a3voices
It's amusing how the same person posted and answered the question. It's almost
like he's using Stack Overflow as a blog.

~~~
dgellow
It's a StackOverflow feature. The question has been asked before on Meta :
[http://meta.stackoverflow.com/a/17467](http://meta.stackoverflow.com/a/17467)

Question: "Can I answer my own questions, even if I knew the answer before
asking?"

Answer: "Yes! There are already numerous posts that answer their own
questions. There's nothing wrong with it. It's even encouraged."

~~~
why-el
I think the amusing part is how little time it took the asker to answer their
own question, suggesting that he asked only to answer right after. Nothing
wrong with that, in fact I think it's great.

