I feel tempted to just answer the question using Betteridge's law of headlines: Why is git pull considered harmful? It isn't.
- Nonlinearities aren't intrinsically bad. If they represent the actual history, they are OK.
- Accidental reintroduction of commits rebased upstream is the result of wrongly rewriting history upstream. You can't rewrite history when history is replicated across several repos.
- Modifying the working directory is an expected result; its usefulness is debatable, particularly compared with the behaviour of hg/monotone/darcs/other_dvcs_predating_git, but again it is not intrinsically bad.
- Pausing to review others' work is needed for a merge; again, this is expected behaviour for git pull.
- Making it hard to rebase against a remote branch is good. Don't rewrite history unless you absolutely need to. I can't for the life of me understand this pursuit of a (fake) linear history.
- Not cleaning up branches is good. Each repo knows what it wants to hold. Git has no notion of master-slave relationships.
OK, so let me post an opinion from the other side. I think histories in git that represent actual history are not useful at all. Let's look at an example.
I make a commit that introduces a bug and contains a typo. Someone else points out the bug, so I fix the bug. Then someone else points out the typo, so I fix the typo, one small commit for each fix. While this is indeed an accurate depiction of history, having three commits (one broken, and two silly one-character/one-line changes) in your commit history instead of one is not in any way useful here. For anyone reviewing a pull request or doing anything else that involves looking through the history (that's what history is for, right?), this is a waste of time and unnecessarily sloppy. Or if you need to cherry-pick or bisect, for example, you now have three commits that represent one change, rather than one commit for one change.
Let's say someone makes a pull request to one of my projects for a change that ends up having 5 commits fixing things they did wrong initially, when it could be 1 or 2 commits. There is zero chance I'm going to say "Ah well, I guess that's an accurate representation of history! Let's merge it!" I'm going to tell them to squash and rebase their commits to clean it up, then force push. And to be honest, I think anyone who didn't do this would be doing it wrong, promoting a messy history in their project.
Generally, this whole debate within git is referred to as whether you should "hide the sausage" or not. Further discussion can be found here, with lots of arguments I didn't touch on at all in this comment: http://sethrobertson.github.io/GitBestPractices/#sausage
In the precise case of the bug and typo, you should probably use git commit --amend.
Anyhow, generically, while your changes are local, rewrite history to your heart's content. Have it tell a development story that makes sense. The really important part is: you can rewrite history only while it is private. Once you push the commits, never ever rewrite history.
Namely, rebasing is not a replacement for merging.
git commit --amend is rewriting history, just not very much of it. If you --amend after pushing, you run into the exact same problems mentioned before.
Once you push a change to a public repo, you should not change it. It sucks that it's broken, but that's your fault; you should either roll back your change or submit a second change that fixes it. However, you shouldn't rebase your second change onto your first one. There be dragons.
If you want to have a perfect public repo history, you can revert your change, fix it in private, then push the modified version to the public repo.
I'd narrow that to "once you push to a shared branch in a public repo": rebasing and force-pushing feature branches is OK (how else does one maintain a clean history after code review?).
Every time I hear this sort of justification, my little rage meter goes up a notch. These arguments are all based on unverified and unnecessary optimizations.
* How many times have you read a project history commit by commit? How much time did the occasional typo commit take you to read? The probable answer, if measured, would be negligible. Yet you're ready to spend time rewriting history, which can cause real and known time waste as people have to rebase and re-merge, and sometimes causes all the problems that have been outlined.
* The whole point of bisect is using a logarithmic search through history. The rare typo-fix commit won't affect its runtime. Again, this is wasting time for no measurable effect.
* Stop making micro-fix commits in the first place!
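For a rough sense of scale on the bisect point: a binary search whose known-good and known-bad ends are n commits apart needs at most about ceil(log2 n) tests, so one extra typo-fix commit almost never adds a step. Here is a minimal Python sketch. It assumes a purely linear history where every commit after the first bad one is also bad. Real `git bisect` also copes with merges and skipped commits, which this toy ignores, and the function names are illustrative only.

```python
import math


def bisect_first_bad(commits, is_bad):
    """Binary-search a linear history for the first bad commit.

    Toy model: assumes commits[0] is known good, commits[-1] is known bad,
    and that every commit after the first bad one is also bad.
    Returns (first_bad_commit, number_of_tests_run).
    """
    good, bad = 0, len(commits) - 1  # indices of the known-good / known-bad ends
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad], tests


def max_steps(num_commits):
    """Worst-case number of tests for a history of num_commits (first good, last bad)."""
    gap = num_commits - 1
    return math.ceil(math.log2(gap)) if gap > 1 else 0


print(max_steps(1000), max_steps(1001))  # 10 10
```

In other words, adding one more commit to a 1,000-commit range leaves the worst case at 10 tests.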
But what really annoys me is that all these are symptoms of a deeper disease: mismanaged repos. The first thing to do if the legendary 50GB log file shows up in your repo is not to rewrite history. The first thing to do is to make an urgent note to review your repo management processes. How did that file get in there in the first place? Are you pulling directly into your main repo!? The correct practice is to always pull into a staging repo and only merge into main if clean. How do you know it's clean? Easy: use the double-staging repo trick. Pull into a staging repo and make sure everything is in shape (a manual, human process), then pull into a second staging repo that is a clean copy of main, and diff the histories. If the diff is not empty, you know something is wrong. Only when the diff is clean do you pull from that staging repo into main.
(BTW, that double staging is only necessary if you allow yourself to do cleanup in the staging repos. If you always insist on clean pulls, then all the cleaning up is done elsewhere. That is not always possible, easy or efficient on a busy repo. And on your private working repo, do as you please, as long as the pull then comes off clean.)
(Also, I recommend doing the same with your private working repo. I always find it easier to have a clean copy of the main repo, one staging repo where my own version of the cleaned-up history is kept, and the real dirty-work repo as yet a third repo. The fact that I prefer to work with Mercurial, which works best by cloning rather than branching, pretty much enforces this discipline. It promotes happy collaboration, since you never pull into your work repo and you never push from it.)
The more I read about this stuff the more I realize there are a lot of ways to manage things that fall entirely outside of the actual software that does the management. We're going into the muddy waters of "best practices" but it's interesting to read about different implementations.
Hey so think about this -- what if instead of a "staging repo" you just used branches? This is what they are supposed to be used for. In fact, you can make a pull request from a branch, then other developers can review it and the branch can be rebased and cleaned up before it's merged.
> I can't for the life of me understand this pursuit of a (fake) linear history.
Because we, the rebasers, don't consider the repository to be "history".
If you stop thinking of it as history, your whole perspective changes.
It may fool you into thinking that it's history because there are dates everywhere, but it's really just a big collection of ordered changesets.
With that in mind:
> - Accidental reintroduction of commits rebased upstream is the result of wrongly rewriting history upstream. You can't rewrite history when history is replicated across several repos.
Of course you can. Your local repo and the remote servers you have access to are malleable and can be manipulated however you'd like. I happily force push when the situation calls for it.
However, it does place a burden on everyone else who fetches/pulls from that repo, so one needs to take that into consideration. For major screwups (like the accidental 50GB file check-in from the article), it's very, very good that this is possible.
For one of my projects I'm using git-multimail[1] on our "master" repo and it handles force pushes brilliantly (sends out emails with a list of commits removed/added). This is a great way to communicate with the rest of the team when you have a rewrite-after-push culture.
If there is such a thing as "rebasers", reading the git-rebase man page should be mandatory for admittance:
„Rebasing (or any other form of rewriting) a branch that others have based work on is a bad idea: anyone downstream of it is forced to manually fix their history. This section explains how to do the fix from the downstream’s point of view. The real fix, however, would be to avoid rebasing the upstream in the first place.“--https://www.kernel.org/pub/software/scm/git/docs/git-rebase....
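The man page's warning is easy to see in a toy model. In git, a commit's ID is a hash that covers its parents. Rebasing a published commit therefore gives it a new ID even though the change is the same. A downstream clone that still has the old commit and then does a plain merge of the rewritten branch ends up carrying that change twice. Below is a minimal Python sketch under simplified assumptions. The ID hashes only a change label plus the parent IDs, whereas git hashes the whole commit object (tree, author, message, parents), and the class and commit names are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    change: str          # stand-in for the patch the commit carries
    parents: tuple = ()  # parent Commit objects

    @property
    def sha(self):
        # Simplified ID (assumption): hash of the change plus the parents' IDs,
        # so the same change on a different parent gets a different ID.
        h = hashlib.sha1(self.change.encode())
        for parent in self.parents:
            h.update(parent.sha.encode())
        return h.hexdigest()


def reachable(tip):
    """Every commit reachable from tip, each listed once."""
    seen, stack = {}, [tip]
    while stack:
        commit = stack.pop()
        if commit.sha not in seen:
            seen[commit.sha] = commit
            stack.extend(commit.parents)
    return list(seen.values())


base = Commit("initial import")
feature = Commit("add feature", (base,))            # published, then fetched downstream
downstream = Commit("downstream work", (feature,))

# Upstream rewrites published history: "add feature" is rebased onto a hotfix.
hotfix = Commit("hotfix", (base,))
feature_rebased = Commit("add feature", (hotfix,))

# Downstream does a plain merge of the rewritten upstream branch.
merged = Commit("merge upstream", (downstream, feature_rebased))
changes = [c.change for c in reachable(merged)]
print(changes.count("add feature"))  # 2: the same change under two different IDs
```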
>I can't for the life of me understand this pursuit of a (fake) linear history.
Me too... I really enjoy history being immutable and accurate. All of the arguments I've heard for pruning history tend to boil down to ease of use, but I strongly suspect it's a spurious argument, and that the perceived ease of use stems from not even having considered the alternatives and from following the crowd (which, AFAIK, is broadly why most people are using git).
My policy is this: You may rewrite commits you've made to your local repo since your last push, but you may never rewrite a commit that has been pushed.
Your teammates don't care much if you do the former. From their perspective, it's the same as if you'd committed at different points in your development process. So long as your final, rewritten history doesn't violate your team's preferences as to the granularity of commits, you're fine.
So go ahead and rewrite your local, un-pushed changes if you feel it makes the history more organized.
But if you do the latter--rewriting history that has been pushed--you're in trouble. All developers should be able to assume that the history in the central repo is immutable. Most likely, they will rely on that assumption.
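The policy above boils down to one question. Which commits are reachable from your local branch but not from what you last pushed? Those are the only ones you may amend, squash or rebase. In git you can list them with the range syntax `git log origin/master..HEAD`, assuming your remote-tracking ref is current. The Python below is only a toy model of the same reachability rule. It assumes the history is given as a hypothetical {commit: [parents]} map, and the function names are illustrative.

```python
def ancestors(tip, parents):
    """All commits reachable from tip, given a {commit: [parent, ...]} map."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def rewritable(local_head, last_pushed, parents):
    """Commits the policy still lets you rewrite: local and not yet pushed."""
    return ancestors(local_head, parents) - ancestors(last_pushed, parents)


# a <- b <- c <- d, where b is the last commit you pushed
history = {"b": ["a"], "c": ["b"], "d": ["c"]}
print(sorted(rewritable("d", "b", history)))  # ['c', 'd']
```

Once `d` is pushed, `rewritable("d", "d", history)` is empty, which is exactly the "never rewrite a pushed commit" rule.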
I'd like to separate private from local. I allow private branches on the central repo to be mutable. (By "private", I just mean marked in such a way that it's clear it's owned by a particular user.) Otherwise, people end up keeping their changes local until they're ready to merge, and you lose the backup aspect of git. But no one should be merging private branches (regardless of location) without their author giving an explicit OK anyway, so this doesn't seem dangerous.
That's a good point. How do you synchronize history rewriting across the local and remote copies of the private branches? Do you have a formalized way to ensure it doesn't get messed up?
If you enforce a convention that private branches start with the developer's initials (nb-restructure-everything), and that developers can force-push only their own branches, you can rely on each developer to keep their remote up to date.
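The naming convention is simple enough to check mechanically. Here is a minimal Python sketch of the rule itself, using hypothetical function names and a hypothetical set of shared branch names. Wiring it into a server-side hook or your hosting service's branch permissions is left out, since that depends on your setup.

```python
def owns_branch(initials: str, branch: str) -> bool:
    """Hypothetical convention: private branches start with the owner's
    initials and a hyphen, e.g. "nb-restructure-everything" belongs to "NB"."""
    return branch.startswith(initials.lower() + "-")


def may_force_push(initials: str, branch: str, shared_branches: frozenset) -> bool:
    """Shared branches are never rewritten; private ones only by their owner."""
    if branch in shared_branches:
        return False
    return owns_branch(initials, branch)


SHARED = frozenset({"master"})
print(may_force_push("NB", "nb-restructure-everything", SHARED))  # True
print(may_force_push("AC", "nb-restructure-everything", SHARED))  # False
```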
You can even rewrite a branch shared between developers that hasn't been merged mainline: once it's done, have one of the developers create a duplicate branch in their own private namespace, and rebase and merge that.
Accidents happen, though. If you want to be guaranteed safety, enforce that every developer have their own remote fork. They can rebase at-will on their own fork, and submit pull requests to the mainline repository. (If you're using GitHub, this is dead simple, since you can open pull requests across forks.) This method introduces a fair bit of hassle and overhead for a smallish team, though.
Where I work, you prefix the name of your branch with your username, like "acchow/new-feature". This tells other people "do not touch this branch because its history will be changed by acchow".
Alternatively, if you use github, you can set access controls on forks to read-only and send pull requests from your fork to the main repo.
It's not just the excessive number of merges that git pull creates – they also have the wrong orientation.
Since pull merges others' commits on top of yours, the diff it creates is full of their changes but has your name on it, which clouds blame and history and makes conflict resolution more difficult. I think this is something git should fix.
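Here is a toy model of the orientation problem. `git pull` records the merge with your local branch as the first parent. Diffing that merge against its first parent therefore shows upstream's work, in a commit authored by you. In the Python sketch below, snapshots are plain dicts and the three-way merge works on whole files only, with no line-level merging. Treat it as an illustration under those assumptions, not as how git merges internally.

```python
def diff(old, new):
    """Paths whose content differs between two snapshots."""
    return {p for p in old.keys() | new.keys() if old.get(p) != new.get(p)}


def three_way_merge(base, ours, theirs):
    """Whole-file three-way merge (toy model); raises on a conflicting path."""
    result = {}
    for path in base.keys() | ours.keys() | theirs.keys():
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t or t == b:
            value = o          # both sides agree, or only we changed it
        elif o == b:
            value = t          # only they changed it
        else:
            raise ValueError(f"conflict in {path}")
        if value is not None:  # None means the file was deleted
            result[path] = value
    return result


base = {"README": "v1", "app.py": "v1", "lib.py": "v1"}
mine = {**base, "app.py": "my change"}       # your local commit
theirs = {**base, "lib.py": "their change"}  # what upstream pushed

merged = three_way_merge(base, mine, theirs)
# `git pull` on your branch: parents are (mine, theirs), so the first-parent
# diff of the merge commit you authored is entirely their work.
print(diff(mine, merged))    # {'lib.py'}
# Merging your branch into upstream instead: parents (theirs, mine).
print(diff(theirs, merged))  # {'app.py'}
```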
> I can't for the life of me understand this pursuit of a (fake) linear history
Git is a highly useful tool but it's also an incredibly flawed one, especially in that most critical aspect for all software: the merger of technical and human concerns.
Git is a useful set of source control primitives but it's lacking the cohesion necessary to make it a truly great tool. The fact that it's so popular is largely a consequence of the state of competition rather than its own internal awesomeness. One consequence of this is that instead of a robust, well thought out set of use cases which mesh well with various real-world applications git instead has a few hacky workarounds for certain scenarios. This results in the inevitable wobbliness of any system built on top of hacks and leaky abstractions. Rebasing being perhaps the ultimate example of just that.
Personally I think history should largely be treated as immutable (it's funny how in programming we've become more and more attuned to the values of immutability while in source control we've become more amenable to mutability) but with some sort of system designed to make viewing history easier for the user (lots of other source control products have precisely this, it's sort of silly that git doesn't).
Yeah, I'd vote that answer up. If sergiosgc won't post it on Stack Overflow, maybe I will. (Note that my comment there has already been upvoted quite a bit.)
If git pull is causing problems for you, you are not using git correctly. You shouldn't be doing work on upstream branches -- that's one of the wonderful things about git, branching is cheap.
When working with more than zero other people, you should reserve the upstream branches for merging your work and pushing. Do your actual work in a branch, and you can easily commit/stash your working tree, switch to the other branch and examine their changes.
This is the classic "if you use a prominent feature in a seemingly obvious way, then it causing problems for you is your own fault". People familiar with `svn update` or other VCS actions which synchronise a local copy with a central repository will get surprised by `git pull`.
This is the classic "I want to use a new, more powerful tool, but I'm going to rely on concepts from the old tool and not bother to learn the new one." If you're using git in this way, you should just use svn. You'll be missing out on the power of git, but you already are, and at least your tool will match your mental model of it.
That's a false dichotomy. Matter of fact is there is no right way to use git, because the way you use git is going to change according to your organization size and complexity.
The Linux kernel is a very different animal than your average rinky-dink hobby project, which is different from your ten-dev consultancy.
Matter of fact is: the git UX is awful and it punishes everyone who doesn't have a high-level understanding of how the underlying persistence model works.
> Matter of fact is: the git UX is awful and it punishes everyone who doesn't have a high-level understanding of how the underlying persistence model works.
I believe this is true, but reading through a description of how the underlying persistence model works was for me a revelatory experience. This was presented together with a description of the precise problem git was designed to solve in the O'Reilly book, Version Control with Git. It seemed to be a problem solved in every particular -- an exact solution, that rarest of all things. I did struggle with git before I read about it, which is why I picked up the book in the first place. Since then I have convinced myself that any failures with git are due to my own deficiencies and not its, but my tasks with git are far from exotic; I rarely encounter any difficulties.
The value in the software is not so much the UX, which I consider acceptable (at least with aliases), but in the underlying data model. For me, it is a tool of daily use, and so the knowledge required to use it is a trivial investment. If we were discussing something other than a command-line tool, I think I might be more amenable to the argument that a simpler UX is a better one. With the CLI though, you pretty much need to know what you're doing before you attempt to do it, and any problems with the process are rarely considered UX issues.
Isn't it a problem then that Git does not expose or visualize its internal model in any comprehensible way to its end users? That and the lack of instant undo seem ridiculous for a program designed to manage change.
It has a steep learning curve. It is not optimized for novices. In my opinion, it does exactly what it should do internally, while I consider other tools to be deficient in some respects.
In the common case, undo is provided by git commit --amend. For anything more complicated, I don't think that there is an explanation or GUI which could be considered "intuitive". I've used gitk and git-gui, and a variety of other visual interfaces before Reading the Figurative Manual, and while I'll be the first to cry my own ignorance, I have not found any of them to give much information about even the available options. What is cherry-picking? What about reverse cherry-picking? When would you want to use either?
It's not impossible to design a user interface which would translate all of git's graph manipulations into a simple visual system. However, its current textual interface is extremely flexible, and it exposes a very useful scripting interface. It is a tool that rewards knowledge and experience, like emacs or vi(m). Which is not to say that it's necessarily a good value proposition to you, but it is (imo) a good time investment. One of these days I need to invest in vim too :(
Eh? "Git undo" should restore the state of the repository to what it was before I typed the last command. How is this not the simplest and most intuitive implementation of this feature?
Unix geeks, you make my head hurt with your Stockholm syndrome.
gitk --all is excellent for this. I didn't really fully understand what git rebase was for until I worked through a few tests on my repository, refreshing the GUI after each command I typed.
> Matter of fact is: the git UX is awful and it punishes everyone who doesn't have a high-level understanding of how the underlying persistence model works.
That is very true IMO. Reverting a merge once it's pushed is a real headache in git compared to Mercurial, and it's caused me problems often enough at critical moments that I simply refuse to use git.
It can solve my problem, but even 'experts' who like to tell me how to do it and how 'easy' it is seem to miss crucial details. The documentation is a little lacking in that area too...
In short, I can learn how to do it with hg from no previous knowledge so quickly that I don't need to worry about forgetting it. With git, the overhead of learning how to perform this one task is far too great, mainly due to ultra-configurability and what IMO are extremely poor choices of defaults.
> you don't work on upstream branch unless you don't understand the whole concept of git
As painful as it is, it is unavoidable in oddball situations. The problem is everyone leaves it as "don't do that" rather than "don't do that, but if you do, here is how to fix it".
Normally trudging through the doco would get you to the fix, but back to problem #1.
>>Matter of fact is: the git UX is awful and it punishes everyone who doesn't have a high-level understanding of how the underlying persistence model works.
I learned Git very recently, and I agree. It took me a while to grasp the basics because they just are not intuitive unless you understand how everything works under the hood.
Actively choosing a version control system is a rare decision; the vast majority of the time you have to use whatever the project leader (OSS) / company wants. If a project is on github I can't exactly choose to use svn.
This has led to me using some truly terrible version control systems, of which the worst was Cadence DesignSync. When using this system I wrote a rant detailing it (which is very offtopic for this thread but might be amusing):
- by default, a "checkout" contains symlinks to (non-writable) files, not the files themselves.
- this means that files can change under you. I suppose it saves a bit of disk space.
- want to edit a file? Mandatory locking!
- yes, that's right, mandatory locking is used instead of merges. No built in merging at all.
- In theory that avoids conflicts. In practice, what happens is: people take their own copy of the file, edit it, get the lock, write their edited version in.
- because it's hand-merged or unmerged, we've frequently lost changes and had to reapply them.
- towards the end of the project, finding out that the file you needed was locked by someone in a different timezone who'd gone home was getting really, really infuriating.
`git pull` is not what's causing problems here. It's changing history that already exists across several repositories that's causing the problem.
The other points are merely the result of not really accepting the distributed nature of git. That distributed nature means that history is never going to be linear, and trying to force it is wasted effort.
Is git rebase the "prominent feature" you're referring to? If so, I don't see why you would justify using it in a way that is known to cause problems, even if you're using it for its intended purpose. To use a crude analogy, the `rm` command is a prominent feature which is intended to remove files, and we all know the obvious way to use it, but it's still your fault if you deliberately go around rm'ing your important files.
I think this practice tends to encourage merging code later rather than sooner, which has caused more problems in code I've worked on than anything else. With N developers, you may end up with N or more increasingly diverging branches. If feature A and feature B subtly break one another, neither developer A nor B will catch this problem until both branches are merged into master, at which point both developers are probably convinced their code works, and may have moved on. If they had worked on the same branch, the problem would have been apparent from the moment the code was written.
Nope, it encourages merging in your own playground, incrementally. You can (and arguably should) still merge mainline into your branch with some regularity, say once a day.
With the issue you're talking about, that's why we have staging environments. If you're going to release A and then B, you should be staging A, and then staging A+B. It's true that if they had ZERO idea on what the other were doing, they could step on each other's toes. But it will get caught before release. And hopefully there's enough communication to at least have a vague idea of who's working on the same parts of the code as you. If you're designing things that span large parts of the codebase without consulting or warning teammates, you have bigger problems.
Without distinct feature branches, you end up in abysmal horror situations where you push something, realize you have a bug in production, but can't roll it back without rolling N other features back. Then you start hacking in code to turn the feature off temporarily, etc, which rapidly turns into a mess.
Branches A and B can merge in from mainline every hour, but you won't catch problems with A+B until either is merged _into_ master.
Yes, you can catch the problem in staging, but you might have caught it before that if you had merged earlier.
> Without distinct feature branches, you end up in abysmal horror situations where you push something, realize you have a bug in production, but can't roll it back without rolling N other features back.
I don't think having a dev branch in an unshippable state is an abysmal horror situation. And even if you use feature branches for everything, that does not mean you can necessarily roll back the commits easily.
As someone new to large-scale DVCS usage, what do the remote repositories look like in a scenario like this?
Do the developers of feature A and B push their respective feature branches to a repo in the staging environment where they are merged in some local integration branch there? Logistically, who does the merging? Is it this integration branch that is then merged into mainline?
Create an A+B integration branch, then merge in both directions: A -> A+B, then A+B -> A (to pick up changes from B). The integration branch can be tested and merged to master later instead of the individual branches A and B. It's not as complex as it sounds.
Probably too late to get another reply, but I'll ask anyway..
I'm thinking about the scenario where you create a branch to do some dev work. It takes a week or so, during which, other commits are happening on master. Probably at multiple points during your week of dev work, you are going to want to pull in updates from master to make sure you don't hit conflicts or run into issues that weren't found until the day you try to merge your final work.
It sounds like you are saying that every time you want to update, you'd create a new integration branch of your work + master and then switch to that for further work until you are finally ready to commit to master? That feels a bit unwieldy..
But you can't actually 'work on the same code'. You have to have a copy. And some projects are significant, not committed in a day or two. So merging is a fact of life.
And in an open-source community, as you rightly point out, nobody wants to do the dirty work. And merging is as dirty as it gets.
> But you can't actually 'work on the same code'. You have to have a copy.
Surely this is a little overly pedantic? The point is that you want developers A and B to work on code bases that are as close to each other as possible, so that any problem in the interaction between these developers' work is caught early. Your project doesn't need to be completed in a day or two, you just need to accept that you have a dev branch which isn't always in a shippable state (which I think is usually unproblematic).
If they work on the same branch, what happens if A finishes early and B is late, or abandoned as a bad idea?
Instead, A and B should be on separate branches. Maintain integration environments where the feature branches are regularly merged together when each gets to the level of doneness that environment represents. The final step is to get promoted to master-candidate, where you are going to launch tomorrow unless we find a problem. If things break there, blow that away and recreate it from master, and fix your problems one integration environment lower.
> what happens if A finishes early and B is late, or abandoned as a bad idea?
In those cases, you have to fix the code base, and git won't help you. In my experience, it is much less common to have code that needs to be removed than to have bugs that were introduced because the code bases were synced too late. How much of a problem this is depends on how flexible your release cycle is.
Firstly, it is actually very easy for new work to happen upstream while you're merging your work in. Now you have the same problems the OP is discussing - your git pull will merge the upstream branch in, which you don't want (if you are like the OP). The only solution to this is to fast forward to upstream and re-do the merge (hopefully faster this time, especially if you're using rerere!), which is basically what the OP is suggesting.
Also, it seems to me that the OP's point is more "why using pull with the default options is considered harmful", and I think it's worth noting that the "correct" workflow you're advocating doesn't seem to be the one "advocated" by those defaults. You're saying "you shouldn't be merging upstream in, you should be merging your branch into upstream", but git pull's default behavior is precisely "merge upstream in". I think people can be forgiven for "not using git correctly" when they're using it in the way it seems to encourage! I'm being uncharitable to you; I know you mean "not using git in the manner I've found to work most nicely" (and I agree with you), but I think it's worth pointing out that a lot of git usage patterns are different from what the tool seems to point people toward.
I agree, but I've been bitten by this more times than I would like to admit.
Usually it's for "silly" things, like updating the readme with the online editor on GitHub (to quickly fix a typo, for example, when I'm not at my desktop) and then forgetting to do a "git pull" as the next thing on my local branch.
You get those "Merge branch 'master' of github.com:Username/Repository" commits, that introduce "unnecessary nonlinearities in the history" (to quote the reply).
I mean, it's not a huge problem (and that's why I still use git pull), but it is an annoyance.
It seems that the SO answer doesn't understand that there is a `git pull --rebase` option. It rebases local commits on top of what was fetched and, by default, stops to show merge conflicts if they could not be resolved automatically. It is the default in my workflow, but I will use a different merge strategy when necessary (which is almost never).
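To make the difference concrete, here is a toy Python model of what each mode leaves behind when your branch and upstream have each gained one commit since they diverged. Commit names stand in for hashes and conflicts are not modeled. The helpers `pull_merge` and `pull_rebase` are hypothetical names for this sketch, not git internals. Replayed commits get a prime because a rebase creates new commits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    name: str            # stands in for the commit hash (toy model)
    parents: tuple = ()


def history(tip):
    """Commits reachable from tip, each listed once."""
    seen, order, stack = set(), [], [tip]
    while stack:
        commit = stack.pop()
        if commit.name not in seen:
            seen.add(commit.name)
            order.append(commit)
            stack.extend(commit.parents)
    return order


def pull_merge(local_tip, upstream_tip):
    # Plain `git pull`: fetch, then record a merge commit with two parents.
    return Commit("merge", (local_tip, upstream_tip))


def pull_rebase(local_only, upstream_tip):
    # `git pull --rebase`: replay each local-only commit, oldest first, on top
    # of the fetched tip. Replayed commits are new commits, hence the prime.
    tip = upstream_tip
    for commit in local_only:
        tip = Commit(commit.name + "'", (tip,))
    return tip


def merge_count(tip):
    return sum(len(c.parents) > 1 for c in history(tip))


base = Commit("base")
mine = Commit("my-fix", (base,))        # local, not yet pushed
theirs = Commit("their-fix", (base,))   # fetched from upstream

merged = pull_merge(mine, theirs)
rebased = pull_rebase([mine], theirs)
print(merge_count(merged), merge_count(rebased))  # 1 0
print([c.name for c in history(rebased)])         # ["my-fix'", 'their-fix', 'base']
```

The merge version keeps both original commits plus a merge commit. The rebase version is linear, but `my-fix` has been replaced by a new commit, which is harmless only because it was never pushed.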
Let's say I've got a copy on my server and a copy on my laptop.
I fix a typo in the README on the server, as that's the window I've got in front of me. I push the change, but don't pull it onto my laptop.
Still without pulling that change to my laptop, I fix a different typo there and push it.
I pull down the new version on the server. I fix another typo. Now this commit has two parents: the one from the server earlier, and the new one from my laptop. Because it has two parents, it's a merge commit.
But why is that a problem? If you're merging changes from two different sources, of course the merge is going to have two parents. What else do you expect?
This is a fine suggestion for large, systematically developed projects. But for hobby projects supported by an occasional community contribution, branching every time can be cumbersome.
Except that you need to know what you will do in advance, in order to give the branch a meaningful name, and you need to delete odd branches in order to find your way when switching, et cetera.
The first point is stronger than it seems, at least for me. It is a bit like requiring a definitive title to a novel I'm starting to write before the first line is there.
Branches are literally just text files that contain a sha in refs/heads. The name of the file is the name of your branch. They are an incredibly lightweight concept that should never be considered the source of any sort of technical debt.
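That's easy to verify. Below is a minimal Python sketch of reading a branch straight out of the `.git` directory. It assumes the traditional files-based ref storage. A branch is normally a loose file under `refs/heads/` holding the commit hash. After `git pack-refs` (which `git gc` runs), the branch may instead live in a line of `.git/packed-refs`, so the sketch checks both. In practice you would just ask git with `git rev-parse <branch>`. The function name and the fake repo layout in the demo are hypothetical.

```python
import tempfile
from pathlib import Path


def resolve_branch(git_dir, branch):
    """Return the commit hash a branch points at, or None if there is none.

    Assumes the classic "files" ref storage: a loose ref file under
    refs/heads/, falling back to the packed-refs file.
    """
    git_dir = Path(git_dir)
    loose = git_dir / "refs" / "heads" / branch
    if loose.is_file():
        return loose.read_text().strip()
    packed = git_dir / "packed-refs"
    if packed.is_file():
        for line in packed.read_text().splitlines():
            if not line or line.startswith(("#", "^")):
                continue  # header comment, or a peeled-tag line
            sha, _, ref = line.partition(" ")
            if ref == f"refs/heads/{branch}":
                return sha
    return None


if __name__ == "__main__":
    # Build a throwaway, fake .git layout instead of touching a real repo.
    with tempfile.TemporaryDirectory() as tmp:
        heads = Path(tmp, "refs", "heads")
        heads.mkdir(parents=True)
        (heads / "master").write_text("a" * 40 + "\n")
        Path(tmp, "packed-refs").write_text(
            "# pack-refs with: peeled fully-peeled sorted\n"
            + "b" * 40 + " refs/heads/old-topic\n"
        )
        print(resolve_branch(tmp, "master"))     # aaaa... (40 chars)
        print(resolve_branch(tmp, "old-topic"))  # bbbb... (40 chars)
```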
Luckily with Git, you have infinite power with that kinda thing - at least until you push and share your code. Branching is cheap; moving commits from one branch to another is easy; creating a branch after you've done a couple commits is simple.
The difficult part, I think, is on the one side realizing what power you have and how to use it, and on the other side plain discipline.
Do you think Da Vinci sat at his table and thought "let's invent the helicopter" (with a nice unique memorable name for his work in progress)?
I would rather imagine Da Vinci being supposed to draw some boring architectural work, or even writing, against his will, to his great-grand-aunt. And then a paper fell off his table, spinning away. And then I imagine Da Vinci drawing a spinning paper in the margins, adding equations and schemas, and getting deep into this new project, which becomes clearer in his mind as he works on it, without ever having a name.
This is exactly why I hate deciding on a branch name before some tasks, and I love `git add -p`, which allows me to make a bunch of unrelated changes in the work tree and extract the juice later.
With git you can experiment to your heart's content. Then, once you decide you have something ready to go, create the branch and commit to it. You don't have to create the branch first.
Once you know what you're doing you don't have to branch preemptively, because it's easy to branch in an ad-hoc manner. For instance, I often develop directly on master, but if some change request comes in that needs urgent attention I simply `git branch tmp && git reset --hard origin/master`.
I do it even on projects where I am absolutely the only developer. I can then commit away whenever I want a checkpoint in my work and merge to master when it is in a stable and sensible state.
I have tried that for personal projects, and more than once I have forgotten about a branch entirely. Imagine: I spend a couple of days working on a feature, commit into a feature branch, tick it off the todo list, and after several weeks realize that I didn't merge it into master! And by now there are going to be merge conflicts.
The only time I branch for personal projects is when it is going to be a truck-load of changes, which are going to be hard to forget.
And if you work on a large project, people will push their own development branches upstream and other people will complain there are branches there. Git pull offers very little good stuff, compared to fetch and whatever.
Although the answer is quite thorough, in almost every example you had to be doing something else incorrectly for git pull to be an issue.
I have often argued with other developers working on mid/large-sized projects about having this pristine git history where commits are all tailed back to back. My main issue with this is the fact that you never get a commit for when two branches come together, so you never really know when an issue was introduced.
e.g.
Branch A works.
Branch B works.
Branch A-B doesn't work. No merge commit to show introduction of bug.
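The A/B example can be made concrete with a toy model. The Python sketch below is a plain binary search over a linear list of commits, assuming (as bisecting does) that history flips from good to bad exactly once. The commit names and the `first_bad` helper are hypothetical, and real `git bisect` walks the full commit graph rather than a list.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Return the first bad commit by binary search.

    Assumes commits[0] is good, commits[-1] is bad, and that once a commit
    is bad every later commit is bad too (the usual bisect assumption).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid
        else:
            hi = mid
    return commits[hi]


# Hypothetical histories: A and B each work alone, but not together.
# Merged: the mainline keeps a merge commit where the two branches meet.
merged = ["base", "a1", "a2", "merge-A-B"]
# Rebased: B's commits are replayed on top of A and the merge disappears.
rebased = ["base", "a1", "a2", "b1'", "b2'"]

broken = {"merge-A-B", "b2'"}  # commits where the combination fails


def works(commit: str) -> bool:
    return commit not in broken


print(first_bad(merged, works))   # merge-A-B
print(first_bad(rebased, works))  # b2'
```

In the merged history the search lands on the commit that records the integration; in the rebased one it lands on b2', a rewritten commit whose original, b2, worked on top of the old base.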
As a general practice, people should not be force pushing.
Most of the arguments seem to be about not knowing what you are going to get, which can be preempted by running the fetch portion of git pull on its own; but if you don't trust the people you are working with, work with new people.
git pull doesn't delete old branches, just as git add doesn't delete old branches. I am not even sure what making an argument like that means; it's not what git pull is for.
This sort of "git pull" slam is really just an aftershock of shooting yourself in the foot.
The UX of git is so different from other tools, but seems so similar. It's a real problem for newbies. It's easy to think you know what's going on, because it's vaguely similar to other operations. And, for the impatient, the documentation is really, really obnoxious. And most developers I've met are fairly impatient.
End result: most developers I've met have shot themselves in the foot when they started to use git.
When I bring people onto git now, I start them with a nice visual tool in an existing repo; my current favorite is SourceTree. But that's not a requirement. The simple fact that they can see a nice history log with tags like "origin/master" and "master" usually triggers that "WTF" experience, and they start asking the right questions. If I start them with a new repo and then have them add and stage, it's all a bunch of things they could figure out, and they get impatient, and then bad things happen when it comes time to play with others.
Why was this link posted to Hacker News? It's old, and it is also based on an inaccurate understanding of git. Experienced git users don't need this(1). Is Hacker News designed to publish hints for beginners?
Hacker News aims at links that are deeply interesting:
> "A crap link is one that's only superficially interesting. Stories on HN don't have to be about hacking, because good hackers aren't only interested in hacking, but they do have to be deeply interesting."
In my opinion, this link is superficial, and does not go into the real topic: the workflow.
(1) In my opinion, an experienced git user should know what he needs to fetch, and when he should merge instead of rebasing some commits, and should also know the commands to do so. It's more a matter of workflow than a matter of git commands.
I have never had an issue with git pull. With proper communication and push/pull etiquette, there will rarely be conflicts, and when there are, just rebase. I don't think it is ever OK to force a push. If you are having trouble reviewing others' commits, then try using gitx to view the changes in an easy-to-read way. I like that it doesn't remove branches that were deleted upstream, because sometimes I still want them when the rest of the team doesn't. I'm not convinced.
So I take it that the poster is also the one who gets worked up when people use git pull? This person is criminally insane and I feel for his colleagues.
It's not harmful. It's just the surprising asymmetry between "git pull" and "git push": for some reason (which is obvious), "git push" does _not_ merge, whereas "git pull" does.
Again, compare with how perfectly symmetrical pull/push operations are in Mercurial.
mind explaining what hg push/pull do to a git-only user?
perhaps it's just that i'm only used to git's definitions of the words, but they seem symmetrical to me in the sense that they both try to synchronise history, either upwards or downwards.
i can't really imagine what a pull would do differently than a merge or rebase that would satisfy the idea of synchronisation but be more symmetrical as you describe. i'd really appreciate an elaboration.
"hg push" does nothing fancy: it just prepares a bundle [1] and sends it over to the server. The server can either reject the bundle (when committing it would result in the creation of a new head) or accept it (either by virtue of the --force flag or just because it does not create any new heads) and transactionally commit it to the repository. If the repository being pushed to is not "bare" (in Git parlance), its working copy will not be affected by a push operation.
Note that there's no merging going on here. All the merging/rebasing/rewriting happens before "hg push".
Now, "hg pull" does the perfectly symmetrical thing: it downloads a bundle from the server and commits it to the local repository. The working copy is not affected at all. Again, merging/rebasing happens elsewhere, outside of the "hg pull" workflow.
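The symmetry described above can be sketched as a toy model in Python, assuming a commit graph stored as a dict from commit id to parent ids. It ignores Mercurial's named branches, bookmarks, phases and the real bundle format; `heads`, `pull` and `push_accepted` are hypothetical helpers, not Mercurial's API. Neither operation merges anything: pulling may leave two heads, and merging them is a separate step.

```python
from typing import Dict, List, Set

# Toy commit graph: commit id -> list of parent ids.
Graph = Dict[str, List[str]]


def heads(graph: Graph) -> Set[str]:
    """Commits that are not the parent of any other commit."""
    parents = {p for ps in graph.values() for p in ps}
    return set(graph) - parents


def pull(local: Graph, bundle: Graph) -> Graph:
    """hg-style pull: add incoming commits; never merges or touches files."""
    return {**local, **bundle}


def push_accepted(server: Graph, bundle: Graph, force: bool = False) -> bool:
    """hg-style push check: refuse a bundle that would add heads, unless forced."""
    combined = {**server, **bundle}
    return force or len(heads(combined)) <= len(heads(server))


server = {"a": [], "b": ["a"]}
print(push_accepted(server, {"c": ["b"]}))              # True: builds on the head
print(push_accepted(server, {"x": ["a"]}))              # False: would add a head
print(push_accepted(server, {"x": ["a"]}, force=True))  # True
print(sorted(heads(pull(server, {"x": ["a"]}))))        # ['b', 'x']: merge is up to you
```

Pushing `x` together with a merge commit whose parents are `b` and `x` would leave a single head again, which is why, in this model, the merge has to happen before the push.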
Although SO does give one the option to answer one's own question, it is clear here that this guy posted his question in order to answer it, perhaps to farm points, perhaps because he thought his answer was genuinely useful. The answer is timestamped the same as the question.
Not sure why we should care one way or the other but to me it does seem a bit against the spirit of how SO should be used.
"If you have a question that you already know the answer to, and you would like to document that knowledge in public so that others (including yourself) can find it later, it's perfectly okay to ask and answer your own question on a Stack Exchange site."
In this case, however, it's not so much an answer, as an opinion that he vents dressed up as an answer. The real answer is that there's nothing wrong with `git pull`, but for some unusual ways of working with git, it may not be the tool you need. And that's no problem; git has other tools.
On the contrary, the stated purpose of SO is to be the repository of programming related questions and answers - to the point that moderators routinely go in and edit questions so that they are more clear. So, the goal isn't so much to help the one person who has the question at that moment in time, but the masses of people who will have that question in the future.
With that goal in mind, then answering your own question is clearly in the spirit of the site.
My issue isn't that he answered his own question. My issue is that this question might have been closed if he hadn't, since it's likely to solicit opinion.
Apparently we're supposed to care because there are still some sour-grapes, git-hating Mercurial users who can't understand why anyone would use git. They keep pointing and going "see? this is why git sucks!", but most everyone else has learned how their VCS of choice works and moved on.
Meanwhile, Mercurial has picked up history rewriting, which reminds me in so many ways of Apple fanboys who loudly decry Android, then blithely ignore when features from Android get cherry picked for iOS.
Use what you like; if you don't have a choice, either change projects (quit) or learn how the tool works. Nobody cares that you think the git UI is "godawful". Nobody cares that you think history should be inviolate. If you want that, enforce it on your project. These "complaints" have been answered multiple times, and if you can't be bothered to look into the reasons why git does what it does, then don't be surprised when people ignore your self-serving rants.
Then just stick to `git pull`, `git commit` and `git push`. They do everything you need. All that messing about with your history that this guy is talking about is totally unnecessary.
Seems obvious to me that git is just the tool we use to implement a specific workflow. If the developers don't understand or disagree about the workflow it will be a bumpy road. git pull is certainly part of our workflow but doesn't generate much pain at all.
Excellent question! I've several times been told to make git aliases. I'm very reluctant to, because I know in practice different people and different machines will have different aliases, creating friction in collaboration and knowledge sharing.
> It is really a git best practice to create custom commands for common operations?
No, of course not. But this isn't a guy asking an honest question, this is a guy venting his opinion and showing how to use git with his unusual way of working, which is possible. But his way of working is not a git best practice. It's just one of the many, many options you have with git.
I have a cronjob that does a "git fetch" every 5 minutes. That means I never pull: I fast-forward merge, and I rebase. Very satisfied with that workflow.
That post really irks me. Granted, S.O. encourages you to answer your own question, but this guy would have been much better served by a blog post. A Q&A is not the appropriate format for him to ask a question when he has no need for an answer. Seriously, blog it, dude.
By asking questions lots of people want to know about and then waiting a year for votes to trickle in? Sure, but I would have thought that would be obvious.
Did you read the linked webpage? The problems of “pull --rebase” are addressed, and the author offers an alternative solution. I am not expert enough to judge whether there still are problems with his solution, but I can assure you that your comment is not adding anything to the debate.
The linked webpage does not mention git pull --rebase. The author talks about problems with setting the configuration options of git pull to any one default, but doesn't address the fact that we can perform a fetch and a rebase in one command, when necessary, with "git pull --rebase". The default configuration is preserved in this case. That said, I have none of the issues the SO guy has. It's insane that he has so many problems with people "cleaning up" a remote's history by making it more linear than it actually is. You should never force push and you should never rebase pushed commits. There are very few exceptions to this rule.
I read the linked webpage, and I have had none of the issues he seems to think occur with git pull --rebase. I believe he uses git differently than I, and many other people, do.
Whenever I'm about to begin work I create a new branch. I git pull --rebase that branch with the upstream branch I am going to merge with frequently while I work. I've never had any unexpected behaviour.
I think the amusing part is how long it took the asker to answer their own question, suggesting that he asked only to answer right after. Nothing wrong with that, in fact I think it's great.
When you "ask a question" there's a checkbox at the end to "answer your own question". I guess it's the stuff one could do after spending hours tracking an issue and realising others could benefit from a google-juiced answer.
You mean the kind of question that goes beyond 'how to read a file in <language>?' TBH I wish SO consisted only of that kind of question (the difficult ones, that is).
It may be amusing, but it's a feature of SO. When you ask a question it gives you the option of answering it yourself. This is so you can share the solution to a problem you have solved in case anyone else has the same problem in the future.
I've done this a few times actually. If it's a problem that I can't find the solution to using Google and I suddenly solve it, I'll answer it on SO so other people can benefit in future :)