Hacker News
What is wrong with “A successful Git branching model”? (barro.github.io)
103 points by GolDDranks on Feb 28, 2016 | 120 comments



I disagree with the author.

Feature branches should not be long lived in the first place. If a feature branch encompasses more than a story of 1-5 story points it's likely that it is too large in the first place. Once you dispel this notion that feature branches are allowed to be long lived all the other points fall as well.

Since feature branches are merged to master frequently the problem of integration between multiple features is mitigated. Personally I find that the extra merge commits and the commit bubbles generated by this model make the log easier to read and it's clearer how the code came to be at the state it is.

Additionally, the PR/MR process codifies when code review should happen, and it does not require an extra tool if a web-based git UI is being used (GitLab, GitHub, Bitbucket). During the PR/MR the author can create many small commits with fixes to aid the review of incremental change after the initial full review. When the PR/MR is accepted, a rebase on origin/master and heavy use of squashing removes the extra noise created by these smaller commits, as well as resolving most problems with integration into the master branch.


This. This is the correct way to think about it.

Ballooned feature branches suck either at merge time or at deployment time. The problem in either case isn't the model -- it's the balloon of changes.

Commit early and often -- and merge back to master. :-)


This argument assumes that useful changes can always be broken down into bite-sized chunks and implemented incrementally without unwanted side effects. That's an ideal situation, and it's also a realistic one a lot of the time, but not always. A development process that can't cope with major changes being needed from time to time isn't going to be appropriate for a lot of projects.


To me, it sounds like the author is advocating for what we did when we used Subversion and merging was painful for non-trivial differences. In my opinion, the reason for switching from SVN to Git isn't that it's so distributed (you need to push centrally to get CI and share changes after all) it's that branching and merging are so much easier there's no excuse not to.

We create a branch for nearly every ticket and definitely for all features.

Master is essentially production ready or already in production, while our development branch gets run through CI/CD on every commit. The development branch is rarely broken and in almost all cases is where we branch our features/fixes from. It's up to the branch developer(s) to merge development back into their branch before bringing their branch back to dev, but since we're a small team and branches rarely live longer than a week, merging back causes so few issues that we don't even flinch.

Gone are the days of SVN where you'd spend two weeks off commit and pray your integration went smoothly.


Totally agree, shared remote branches are a good idea to build new features. A remote branch allows you to leverage CI in advance of merging and to discuss a work in progress with a team member, for more details see http://doc.gitlab.com/ee/workflow/gitlab_flow.html


You just reminded me of a coworker I had that would work for months on a branch, then spend a couple weeks trying to merge it. Eventually I had to yell at him before he would stop. I felt bad about yelling and apologized to his cubicle-mate.

Small feature branches, or if you handle it by merging daily, cause no problem though.


    > or if you handle it by merging daily
Right, as far as I remember the "successful git branching model" still allows - even encourages - merging master back into feature branches.

"Long-lived" doesn't have to mean "really out of date".


I've done this previously, and it works, but it creates a very messy history where you have merges going back and forth. We typically had several merges to get changes in from master for each feature branch, even if they only lasted for a few days (inside a sprint). Then there's the final merge back to master. If you look at the repo from afar after a couple of features and weeks, it gets very hard to read.


use:

  > git pull --rebase origin master
Keeps the history clean, if that's something you care about


Yeah, that's what I prefer at all times. Haven't figured out a good way to share feature branches with that model yet though. Sometimes you need to work on a feature together with other people without checking it in to the main branch, and doing force pushes isn't pretty...


I prefer the concept that master should always be deployable to production. My CI creates a build artifact on every commit to master and the artifact can be deployed to staging/production at any time. Thus, master should always be considered stable.

Development should happen in a branch. When it's done, checkout master LOCALLY, merge in your branch, run unit tests, if tests pass, push master. If tests don't pass, reset master back to origin and keep working on your feature.
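The merge-locally, test, then push-or-reset loop described above can be sketched as a script. This is only an illustration under stated assumptions: the throwaway repositories, the `feature/greeting` branch, and the `run_tests` function are all hypothetical stand-ins, not anyone's actual setup.

```shell
set -eu
tmp=$(mktemp -d)

# Throwaway "origin" and working clone (hypothetical paths).
git init -q --bare "$tmp/origin.git"
git init -q "$tmp/work"
cd "$tmp/work"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
git remote add origin "$tmp/origin.git"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"
git push -q -u origin master

# Development happens on a branch.
git checkout -q -b feature/greeting
echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"

# Stand-in for a real unit test suite (assumption).
run_tests() { test -f greeting.txt; }

# Merge locally, run the tests, then either publish or roll back.
git checkout -q master
git merge -q --no-ff -m "Merge feature/greeting" feature/greeting
if run_tests; then
    git push -q origin master
    result="pushed"
else
    # Throw the local merge away and return to what origin has.
    git reset -q --hard origin/master
    result="reset"
fi
echo "$result"
```

The `--no-ff` flag is what keeps the merge commit as a marker for the complete chunk of work, even when a fast-forward would have been possible.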

This way master is always ready to go and you also get the benefit of seeing how a feature was developed and the merge commit is a representation of a complete chunk of work.

If you do need to back out a feature, how would you do it on a rebased, fast-forward merge? You have NO CLUE where the start of the feature was. You have to just guess based on the author maybe? In my experience, I've never used git bisect so I don't optimize my branching model for that experience. For me, backing out bad merges is much more common so I optimize for that.
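For comparison, here is a minimal sketch of backing out a feature that arrived through a merge commit. The repository, branch, and file names are made up for the example.

```shell
set -eu
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "Base"

# A feature built from two commits (hypothetical).
git checkout -q -b feature/x
echo one > x1.txt
git add x1.txt
git commit -q -m "Feature X, part 1"
echo two > x2.txt
git add x2.txt
git commit -q -m "Feature X, part 2"

git checkout -q master
git merge -q --no-ff -m "Merge feature/x" feature/x

# The merge commit marks where the feature entered master.
# -m 1 tells revert to treat the first parent (master's side)
# as the mainline, so everything the branch brought in is undone.
git revert --no-edit -m 1 HEAD >/dev/null
files=$(ls)
echo "$files"
```

Both feature commits are undone by one revert, without having to work out where the feature started.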


What you describe for testing and merging is a needlessly manual process. Using e.g. Phabricator, I can make a local feature branch and when I think it's ready open a request for code review. My CI system can run tests. I can update my request in response to test failures or reviewers' comments. When my code is ready, Phabricator can squash all my commits into one and rebase that onto master.

If for whatever reason I did want to back out this code, there's a single commit to revert. This is unlikely though, since until a feature has proven itself in production it will be behind a feature flag.

https://secure.phabricator.com/book/phabflavor/article/recom...


That's a nice process. Currently, GitLab doesn't support running tests on a merge request in that sense, making a temporary merge of master and the branch and running the tests to confirm the merge will work. When/If GitLab + GitLab CI adds that ability, then yes, it would make that process a lot easier.


We're discussing adding a 'test the merge' ability in 8.7 in https://gitlab.com/gitlab-org/gitlab-ce/issues/4176

This will test the 'current merge' as soon as a commit is pushed, not the merge result when it happens.


> If you do need to back out a feature, how would you do it on a rebased, fast-forward merge?

I've never been on a team where we fast-forward-merged multiple commits as a feature. (I'm sure it happens, but I can't see it working well.) What I have seen is fast-forward-merging a single commit (which is equivalent to a cherry-pick, at that point), after using git rebase to squash the feature branch into one single atomic commit and rebasing that commit against master in preparation for mainlining.

Then, you'll never have merge conflicts (if you do, you didn't rebase against master properly prior to the merge), and if we need to revert a feature, the feature lives in master as a single commit.
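A rough sketch of that squash, rebase, fast-forward sequence follows. One substitution to note: the comment squashes with `git rebase`, while this script uses `git reset --soft` to the fork point, because it gives the same single commit without an interactive editor. All branch and file names are hypothetical.

```shell
set -eu
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "Base"

git checkout -q -b feature/y
echo a > y1.txt
git add y1.txt
git commit -q -m "WIP: start feature Y"
echo b > y2.txt
git add y2.txt
git commit -q -m "WIP: finish feature Y"

# Meanwhile master moves on.
git checkout -q master
echo m > other.txt
git add other.txt
git commit -q -m "Unrelated change on master"

# Squash: move the branch back to the fork point while keeping
# the combined changes staged, then commit them once.
git checkout -q feature/y
git reset -q --soft "$(git merge-base master feature/y)"
git commit -q -m "Add feature Y"

# Rebase the single commit onto current master, then fast-forward.
git rebase -q master
git checkout -q master
git merge -q --ff-only feature/y

total=$(git rev-list --count master)
merges=$(git rev-list --count --merges master)

# Backing the feature out is a plain single-commit revert.
git revert --no-edit HEAD >/dev/null
echo "$total commits, $merges merges"
```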


Squashing a large feature branch for rebase has a downside that it loses some history and timeline context; if for example an api changed slightly in master in between branch development and branch squash, it can be very confusing trying to understand why the commit did XYZ thing looking at the squashed diff - while if you keep the branch through a merge, it's easy to see that XYZ was written with the older API in mind.


This is exactly why I prefer merge commits, so you can see how the development of the feature occurred and if there's a bug where/when/how it was introduced. I don't like erasing development history for the sake of a "clean" or "linear" history.


A nice side-effect of having master===production is that your documentation on GitHub (or whatever web viewer you use) is always consistent with the deployed version.

Otherwise, you could have users come to your repo looking for docs for the version they've installed, and instead get docs about APIs that aren't released yet.


That's how I like to work as well.

For libraries/open source, homu [1] is quite a nice gatekeeper to pushing to master (you basically tell the bot to integrate a PR into master, and it runs the tests on CI for you before pushing).

[1]: http://homu.io


IMHO, semaphoreci is better: it runs the build without anyone needing to write anything in the PR, so no additional teaching is required.


My biggest problem with "A successful Git branching model" is that it scares people into thinking they have to design and commit to a workflow before they can develop software.

Having a defined workflow is not a bad thing. And it certainly wasn't the intention of the author for anybody to stagnate. Unfortunately, it's very natural for technical people to overthink and over-engineer process.

In my mind, when an organization is new to source control or is haphazardly using it, it's best to simply start using the tool. Each product is different, each organization is different, and each developer is different. You should mature into an organized development model based on what you know about your own situation.

If you develop on the develop branch and release on the master branch, but don't communicate that clearly to the new guy, I guarantee he's going to start working on master right out of the gate. The same goes for every model where there's a lack of communication around procedures and no workflow controls.

Git saw a rise in popularity because it was easy where every other solution was difficult. If you're having issues with code review, patches, release versioning, or whatever - address that particular problem. If the problems are multiplying to the point where it's significant, it's time to think through a process. Spending your time and energy ferreting out a process based on the fear of what might one day happen will result in significantly more time spent doing that than addressing the would-be problem in the first place.


I agree, communication is essential. The most important thing is that the developers agree on how they work with git, and all these branching models are just usable defaults and starting-off points from which you figure out what works for your organization.

But preventing the new guy from pushing straight to master is not that hard. On my current project, a git hook prevents merging commits into the remote master if that commit doesn't already exist in another remote branch. So you have to do your work elsewhere or it won't be accepted.
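The comment doesn't show the hook itself, so the following is only a guess at how such a rule could be enforced: a server-side `update` hook that rejects any push to master whose new tip is not already contained in some other branch on the remote. The repository paths and the `feature/z` branch are hypothetical, and the hook does not handle deletion of master.

```shell
set -eu
tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"
mkdir -p "$tmp/origin.git/hooks"

# "update" runs once per pushed ref with: <refname> <old-sha> <new-sha>
cat > "$tmp/origin.git/hooks/update" <<'EOF'
#!/bin/sh
refname=$1
newrev=$3
# Only guard master; allow pushes to every other ref.
[ "$refname" = "refs/heads/master" ] || exit 0
# Branches other than master that already contain the new tip.
others=$(git for-each-ref --format='%(refname)' --contains "$newrev" refs/heads/ |
         grep -v '^refs/heads/master$' || true)
if [ -z "$others" ]; then
    echo "rejected: push your work to another branch first" >&2
    exit 1
fi
EOF
chmod +x "$tmp/origin.git/hooks/update"

git init -q "$tmp/work"
cd "$tmp/work"
git symbolic-ref HEAD refs/heads/feature/z
git config user.name "Example Dev"
git config user.email "dev@example.com"
git remote add origin "$tmp/origin.git"

echo z > z.txt
git add z.txt
git commit -q -m "Add Z"
git push -q origin feature/z

# The commit already lives on the remote feature/z branch: accepted.
if git push -q origin feature/z:refs/heads/master 2>/dev/null; then
    reviewed="accepted"
else
    reviewed="rejected"
fi

# A commit that exists only locally goes straight to master: rejected.
git checkout -q -b master
echo direct > direct.txt
git add direct.txt
git commit -q -m "Direct change"
if git push -q origin master 2>/dev/null; then
    direct="accepted"
else
    direct="rejected"
fi
echo "$reviewed / $direct"
```

Because a hook like this runs in the remote repository, `git push --no-verify` does not skip it. That flag only bypasses the client-side pre-push hook.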


Unless of course he sees an error and runs `git push --no-verify`.

I'm not really saying any of the advice around git workflow is bad. It's that too much time spent on process instead of development is. If you take the advice for what it is - a good starting point - I think it's great to use.


> Unless of course he sees an error and runs `git push --no-verify`.

Yes, but then you know he's someone you need to fire. That's also useful knowledge.


Letting devs continuously push their shitty commits to master without passing any sort of CI suite is a recipe for constantly broken builds, which in turn blocks deploys from master. Master should always be deployable, or better yet, auto-deploy when anything is merged in. That way your devs are forced to write good, CI-passing code in their branches before they're merged into master.


If your devs are creating shitty commits in the first place maybe you are facing a completely different problem, don't you think?

We only push to master after rebasing locally which has worked out well for us for several years. Every commit is supposed to be deployable to live and CI friendly, that is, tested. So, by definition, we don't create shitty commits (most of the time :D).


Maybe your code isn't done yet but you'd like it replicated to github? This model doesn't allow that


Sometimes, but very rarely, we do create WIP (Work In Progress) branches so we can push the code to the server. But the idea we follow is to integrate with master as fast as possible, ideally with several commits per day.

I used to be a fan of the git branching model, but since I started working like this I find it much easier and less error prone. I guess it's the way we've found comfortable for all of us.

Just choose what works for you!


Does this mean all your features are able to be completed in a day?


>Letting devs continuously push their shitty commits

Do you mind explaining/disclosing your position to dev teams?

I hope this is not your point of view of a dev in general.

Lastly, please keep it constructive: this doesn't address the source of the problem, how you solved it, or what you have tried. It just rants about your perception (no reason is given).


I am the dev making shitty commits. I've also worked with devs much smarter than me, and they also make some shitty commits. I have no problem explaining this reasoning to them. I wouldn't use the same language I use as an anonymous internet poster. But everyone makes mistakes. Even the best of the star players break the build every now and then. Even the CTO sometimes takes down the whole system by accident. (Always using) feature branches and auto-deploying master are a solution I like. This also allows for easy code reviews. It also encourages a culture of things like good, detailed commit messages (it's easy to push an un-reviewed commit of "fixes" to master). It's also easy to run auto CI tests on these branches. It also allows for gatekeeping if you have a contractor or someone on your team who shouldn't have full commit access.


We use Gerrit and Jenkins. Any patch gets hit with CI pre-merge.


The main thing to keep in mind with git workflows is that one size does not fit all. The number of people working on the code, and the way they work on it, has a big impact on which git workflow works best for you. The nice thing about git is that it's flexible enough to accommodate very different ways of working.

His primary objection to git flow seems to be integration hell, which is easily avoided by regularly pulling from develop, which is something you also need to do in his model. His point about git bisect is interesting, though I've never heard of anyone using it.

The primary advantage of shared feature branches is that multiple developers can work on the same feature that's not ready for release yet, which is impossible if you've got a single shared branch, and it means others have easy access to your code if something happens to you. Code shouldn't live just on the developer's machine. And while other backup solutions exist, git can keep it in the regular developer work flow.

That said, I don't think feature branches are sacred; I'm totally fine with developing small features straight on develop. Or master, if you prefer. Whether you want master to be stable or unstable is a matter of taste. Although I personally think it's useful to have a single branch that always contains the latest stable release.


> The main thing to keep in mind with git workflows is that one size does not fit all.

This. This is the only true comment about this. Every software development project is different, and while there are categories they can be sorted into, every category needs a different kind of source repo layout.

Anyone who starts discussing this issue without that notion in the front of their thoughts might as well be banging their head against a wall for all the good it'll do.


You can often find ways to ship parts of a feature, or hide it behind a toggle.

http://martinfowler.com/bliki/FeatureToggle.html


IMO, there's one big feature missing from git (and other SCMs in my limited knowledge of the field) that results in posts like this every so often.

There really needs to be some concept of a commit group, one or more commits (probably the diff in commits from one branch to another) that are packaged together so that they can be addressed as one object instead of trying to keep track of a whole bunch of commit hashes. Merging a commit group is revertible and trivially cherry picked across branches. This addresses one of the weirdest behaviors in git: the rebase squash. We want commits to be understandable and usable, but that probably assumes the committer either held off committing or rewrote history to make it seem like things were written correctly the first time. It seems to me like the whole purpose of an scm is to keep track of history, whether it's the neatified, readable feature/branch merges or an individual git user's frantic, probably unorganized development progress.


We use ticket references in our commit messages. So you'll be able to find all relevant commits of a specific feature or bug fix. Cherry-picking them is then also no big problem.

  git cherry-pick $(git log release/3 --reverse --grep ISSUE-123 --pretty=%h)
The commit message contains the technical details about the changes. The ticket reference is a link to the feature details themselves, so you can quickly find out what the commit tries to achieve on a higher, non-technical level.


Do you put `ISSUE-123` in the title, or the body of the commit message?


Maybe we need "git commit --signoff" for some kind of QA or review process to bless a branch as ready to deploy and give a high-level summary of it, and a mode of "git log" that only shows the graph of commits having a particular Signed-Off-By value.
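Existing flags can get part of the way there. The sketch below uses `git commit --signoff` and filters the log by the resulting trailer with `--grep`. It does not give the branch-level "blessing" the comment asks for, and the identity and commit contents are invented for the example.

```shell
set -eu
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo a > a.txt
git add a.txt
git commit -q -m "Unreviewed change"

echo b > b.txt
git add b.txt
# --signoff appends "Signed-off-by: Example Dev <dev@example.com>"
git commit -q --signoff -m "Reviewed change"

# Show only commits whose message carries that trailer.
signed=$(git log --format=%s --grep="^Signed-off-by: Example Dev")
echo "$signed"
```

Adding `--graph` to the `git log` call draws the commit graph for the commits that match.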


There's more than one way to do it and what works well in one project may not be suitable in another, depending upon things like team size, development velocity, code base size, external dependencies, etc.

The model described by this article is very close to how Chromium development works. Meanwhile, git itself uses a model much closer to the one this article dislikes. But git is a smaller code base, with many fewer commits per day, and has a single person responsible for integrating all the changes which come in over the mailing list.

Generally, I'd say that a smaller team that produces relatively fewer commits per day can use a more complex branching model, while a larger team that produces much more code churn requires a simpler model.


I'm usually a silent reader of HN, but I can't help myself.

I definitely disagree with pretty much everything. The merits of merge commits have already been proven, and using rebases is basically rewriting history. If you want all work done precisely as happened in real life you should always use merge commits. Saved my ass a bunch of times.

Also, when something breaks after a merge, it's super easy to undo and to deal with - "Yo Don, your merge of feature X just broke feature Y. Run some tests".

I just recently moved to a new job where they love the "everything is on master" approach and that is absolutely terrible IMO. Everything is constantly broken, and developers always end up breaking builds and stepping on each others' toes. CI doesn't help in this case either because you always need to wait for some dev to fix the build. Wouldn't it be better to just undo a merge of some feature (i.e. reject the merge) and then let the dev fix that on her own time? Working just on master is just the wrong way of doing anything on a team with more than 2 people.


> The merits of merge commits have already been proven

That is an opinion, many people clearly disagree with it.

> Using rebases is basically rewriting history

Yes. Why is that a bad thing?

> If you want all work done precisely as happened in real life you should always use merge commits

Why is that important at all?

> Saved my ass a bunch of times.

So has reflog.

I personally don't find any compelling rationale for why I need the git history to reflect what actually happened in time. That seems entirely incidental. Who cares if you made 10 commits on your branch or 2 or 1? Who cares if you actually wrote one part of a feature before another?

The purpose of having history is another way of communicating design intent to other people that look at the history, no more, no less. Therefore why shouldn't it be editable? It's like arguing that a typewriter and whiteout produced better books than a word processor.

I also use rebase to extract portions of the feature that are mergeable before the rest. Or you can pull out sections of a branch when you are happy with them and branch off a few experiments on top of that solid base, finally merging in the one that works out best. Not employing those techniques is akin to not knowing how to refactor code.


Indeed, from my observations people using rebase are usually working solo or in very small teams, or just like spending insane amounts of time fixing their broken code.


The typical effective way to work with Git is to develop on master while running "git pull --rebase" regularly, and before you commit. Prefer to build up a feature as one commit, with "git commit --amend" or "git commit --fixup". Since you pull and rebase regularly, your code is closely in sync with all shipped changes, and if there are any conflicts you get to resolve them in small incremental pieces. When your change passes CR, push it.

The result is a simple, clear, linear commit history with a minimum of effort spent on branching and merging fuss. The fuss is rarely worth it. This model works for quite large teams. After a certain size, it might be better to split the package into several rather than add branching/merging complexity.
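A small sketch of the amend/fixup part of this workflow, leaving out the `git pull --rebase` step since there is no remote here. The repository and file names are hypothetical, and `GIT_SEQUENCE_EDITOR=true` is used only so the interactive rebase accepts its pre-arranged todo list without opening an editor.

```shell
set -eu
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "Base"

echo "step 1" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# Option 1: fold a follow-up straight into the previous commit.
echo "step 2" >> feature.txt
git commit -q -a --amend --no-edit

# Option 2: record the follow-up as a "fixup!" commit now...
echo "step 3" >> feature.txt
git commit -q -a --fixup HEAD

# ...and let --autosquash fold it into its target later.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~2

count=$(git rev-list --count HEAD)
subject=$(git log -1 --format=%s)
lines=$(wc -l < feature.txt | tr -d ' ')
echo "$count commits, last: $subject"
```

Either way the feature ends up as a single commit on top of the base.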


Doesn't that prevent you from using commits while developing the feature?


You can make as many commits as you need on your feature branch. When you're done with the feature, you do `git pull --rebase origin master` (or whatever the main branch is) and squash your commits into one (or a few -- when it makes sense) using `git rebase -i`.


Better to squash before the rebase so you resolve conflicts once rather than x times...and hope there are no nasty merge commits to ruin the history.


I know the workflow of rebasing a feature branch into one commit on master, but that doesn't sound like what Pyxl101 was talking about, regarding the use of "git commit --amend" and "git commit --fixup".


Rebasing is safe as long as you only do it to commits you haven't shared yet. Of course, if rebasing becomes your default, there's a risk you might also do it on a commit you have already shared.

As any time traveler knows, you need to be careful when you rewrite history.


This sounds like horrible advice. Especially the part about git bisect: it is indeed a common situation that both the feature branch changes and the master branch changes are correct by themselves, and only introduce the bug when combined. And in this situation, git bisect pointing to the merge commit as the culprit is exactly what should happen, because an incorrectly implemented merge, which didn't take care of interface changes (for example), is exactly the reason for such a problem.

This exact situation is also the reason why I hate git rebase: it rewrites history and hides its real complexity under a leaky abstraction. When I tried to use it, I found myself guessing a few months later: were these commits really how I (or a team member) wrote this code? Or maybe these are actually rebased commits?

So, this statement:

> having the history linear without any merge commits could immediately point out the commit that causes issues.

Is laughably incorrect. When you make your history linear, you can no longer pinpoint the commit that caused the issue, because after doing the rebase, you destroyed it. The original commit that caused the issue is no longer there. Or, it never was there: if the issue was introduced not by individual change set A or B, but by their combination, then the merge commit would be the one that caused the issues. But since you did a rebase instead, the `git rebase` command was the "commit" that caused your issues! But, of course, you won't see it in git's history.


> git bisect pointing to the merge commit as culprit is exactly what should happen

in theory, maybe. in practice, this simply does not help if you are looking at massive merge commits. sometimes, the merge goes wrong even with no conflicts, and then you can debug that massive merge with little clue where to start.

compare that to rebasing, where you have a hopefully small commit that breaks. you indeed lose the information whether the rebase broke it or the initial version was broken as well, but I don't see the practical difference, it needs to be fixed either way.


Because the notion that commit broke that code is false, and out of that false information you will be able to get a lot of other false ideas about how the code was developed, which will lead to your fix not being entirely correct.


That notion is false, right, thus I'd rather think "this commit is not correct anymore".

Regarding getting ideas about how the code was developed, for the commit that is not correct anymore I have a commit message telling me why it was added. For reasonably small commits, I don't see a lot of room for misunderstandings and wrong fixes.

This applies if the required changes in your branch are rather small. I think we can agree that rebasing breaks if some changes in master require lots of changes in your branch. But so does merging, as the merge becomes even more horrible and error-prone in that case. Basically, large changes ("real" development) should happen neither in merge commits nor during rebases in my opinion.


> I'd rather think "this commit is not correct anymore"

Using the word "anymore" violates the whole abstraction of git, which captures the exact state of the project in each commit. If you look at a commit, there's no "anymore", no other timeframe than the commit's; it's either correct or incorrect on its own.

> But so does merging, as the merge becomes even more horrible and error-prone in that case.

Why?

Just as you don't put all your feature work in one commit, why would you implement a complex merge as one, single merge commit? It makes much more sense to merge each interface change from master to feature branch individually, resulting in atomic merge commits, each of which handles specific aspect, while staying correct.


> If you look at a commit, there's no "anymore", no other timeframe than the commit's; it's either correct or incorrect on its own.

this is true but frankly nothing i ever cared about. I think of commits as something that i can change and break and fix through rebasing, amending and the like. This mindset makes you a lot more flexible when handling commits, and when keeping in mind that the commit hashes change while doing that, I don't see any downsides.

> It makes much more sense to merge each interface change from master to feature branch individually, resulting in atomic merge commits, each of which handles specific aspect, while staying correct.

Then we have different views on what a horrible merge is. For me, that is having dozens of merge conflicts plus multiple necessary changes not indicated by conflicts, and most of those problems need to be resolved before the project compiles and the rest has to be resolved before all tests go green. There is no "while staying correct" in there. And this, working on lots of breakages at once while needing to fix all of them before tests are green again, is what i don't like at all, and it is partly mitigated by the rebasing strategy.

There certainly are advantages of merge commits, but in my (admittedly limited, especially when talking about larger projects) experience, resolving merge issues does get harder with them.


Unless we live in some parallel universe, everything is linear...

I think you simply miss the idea of linearizability...


I must respectfully disagree. In this universe, nothing is linear in terms of how we perceive events and their simultaneity. If we consider a commit to be an event, then who can say one commit comes before another? As in relativity, if both developers are working independently then they each have their own frame of reference. In effect, a merge commit reconciles those different frames of reference in its own discrete and well-documented entity. You could think of it as the equivalent of a Lorentz transformation. I understand the analogy is quite stretched here, but I still find it useful.

It follows then that if a merge commit is a genuine reconciliation, then a rebase is the act of one party rewriting their own memory in order to hide from themselves the fact that they were working in isolation and later had to agree on the actual outcome of events. Thus, in effect, rebasing one's code is no different than suffering from a self-inflicted delusion. That is, a delusion that we are all working in one shared directory or, to a lesser extent, that there is one source of truth as in SVN or other centralised systems.

This is, in fact, quite a sad thing in our community. Many a developer -- while they may be using git -- are not truly using git in the way it was intended. They have limited themselves to their old ways of thinking and acting and, in the pursuit of what they have been taught is "the ideal", they have hidden from themselves the possibilities, efficiencies and understanding that might be discovered by the move into a world which can capture the fact that not everyone works in a single shared directory with immediate knowledge of each other's work.

We have to learn to embrace the oddity of a non-linear history because, while we might not like it, it represents what really happens when we write code. The better our tools help us model reality the better answers we can produce for ourselves now and in the future. Six months from now when we say, "How the hell did this ever work?" or "What the fuck was I thinking when I wrote that?", we'll want answers. If we choose rebase, corrupt our own memories and burn the reflogs we'll never know.


This is a poor and incorrect analogy for rebase. Someone using rebase is conducting the reconciliation and then rewriting their commit to account for it, while simplifying the commit history and hiding it.

When a commit lands in master, it doesn't matter to anyone else how it was developed. Those events are almost outside the "light cone". Those who rebase a commit onto the upstream branch recognize that the little details of how it was made have no relevance to others and thus do not belong in shared history.

People often speak about preserving history while missing the point that source history is most meaningfully logical, not physical. Imagine an editor that made a new commit for every key press. That would be a true recounting of history, and yet would be irrelevant. Most feature development can and should land as one commit; the details of how it was made are, like the key presses that composed it, minimally relevant to anyone but the author.


Fair assessment. Commits should represent logical groupings of modifications. This is always a trade-off between the extremes of recording each change atomically and dropping one tarball release of all code to replace the last with no changelog. However, the purpose of having a history is to, in as much as possible, reconstruct the contents of the writer's mind when evaluating their actions and the changes they have made. This is the heart of why we have revision control -- to allow us to manage this information and query it effectively.

Since the purpose of a revision history is to enable historians to gather a complete picture of a feature's genesis, so as to interpret the author's changes and the mental states prompting those changes in their correct context, all of recorded history should facilitate that purpose. If rebase workflows do indeed encourage big atomic commits which drop features into a repository as though they were commandments from a god, then obviously they do not help historians understand the authorial intent that went into their creation. You yourself argue for just that, as do, in my observation, most users of rebase-oriented workflows. Additionally, the heart of rebase workflows themselves, namely retroactively changing the parent of a commit, misrepresents what the author knew at the time that code was written. This clearly misleads the historian. I find the workflow dishonest and unhelpful in preserving the very reason we use revision control, and thus I judge it harmful.

If one uses rebase (say, in interactive squash mode) to clean up and regroup commits into small logical chunks in order to facilitate the understanding of authorial intent then I don't have any issues with it at all. It's when it is used to misrepresent, overload and mislead the historian for the sake of a "good looking" git log that I find the usage of rebase distasteful.

An author should never be so self-absorbed that they believe that no-one but themselves could possibly care how they did their work and arrived at the conclusion they did. Authors that abuse their tools really have no excuse, and would do best to remember that they themselves are also historians with respect to other people's code and, in time, their own.


> When a commit lands in master, it doesn't matter to anyone else how it was developed.

On the contrary — this is one of the most important pieces of information I hope to get from my VCS.


Well people work concurrently, and IMO the repository should show that, too.


The work of several people who are active at the same time cannot be linear by definition.


> Is laughably incorrect

It depends on your workflow. It's correct in context. Don't rebase. You see the commit that caused the breakage if you have a testsuite-on-commit workflow.


But you will have to either merge or rebase if someone else pushed into master while you were working. The difference is, merge keeps the history honest about the fact that you were working concurrently, while rebase tries to construct an alternate, false version of these events.


The interesting part of a commit is not when you did something, but how it will change the repository.


Rebase also doesn't change when you did something either.


But "how it will change the repository" depends on "when" did you change something, e.g. parent commit.


Yes, which is why you want a straight history so that it is obvious that no shenanigans had to happen in the merge commit.


> merge keeps the history honest

The history is a record of changes. You're suggesting there's a good reason to track timeshifting by implication (you could rebase at multiple points) because...honesty?

I'm not trying to argue about the why. I'm trying to understand the why.

My view is that you merge your change into the head and there's a record. That's the story. The particulars of how a particular commit came to be is irrelevant to the repo and less problematic when it's treated the same as any other commit.
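
To make the "parent commit" point from this subthread concrete, here is a minimal Python sketch of a toy commit model. It is a deliberate simplification and not Git's real object format: `Commit`, `rebase`, the patch strings, and the integer timestamps are all hypothetical names and values made up for illustration. The only thing it tries to show is that replaying a commit onto a new parent keeps the author time but yields a different commit id.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    parent: Optional[str]  # id of the parent commit, None for the root
    patch: str             # stand-in for the change this commit makes
    author_time: int       # when the author wrote the change (toy value)

    @property
    def id(self) -> str:
        # As in Git, the id depends on the content *and* on the parent id.
        raw = f"{self.parent}|{self.patch}|{self.author_time}".encode()
        return hashlib.sha1(raw).hexdigest()[:8]


def rebase(commit: Commit, new_parent: Commit) -> Commit:
    # Replaying keeps the patch and the author time, but swaps the parent.
    return Commit(new_parent.id, commit.patch, commit.author_time)


base = Commit(None, "initial", 100)
upstream = Commit(base.id, "someone else's work", 200)
mine = Commit(base.id, "my feature", 150)  # written concurrently with upstream
rebased = rebase(mine, upstream)
```

In this toy model `rebased.author_time` still equals `mine.author_time`, while `rebased.id` differs from `mine.id` because the parent is part of the commit's identity. Whether that rewritten parent is "cleaner" or "dishonest" is exactly what the comments above disagree about.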


I recently had to work at a place that followed the 'successful Git branching model', and it was quite awful, but I also felt it might be suitable for them because they probably had several hundred developers running around building their large media website (once you count the Drupal, .NET, Node.js, and a fourth stack I forget, as well as the frontend work and media management processes they had to do).

On my own projects there are often only 2 or 3 internal developers (sometimes it is just me, which further decreases the need for any complicated process). There I like to have master be deployable, a branch named development for sprint-type work where you don't expect to push to master for a week or two, and bugfix branches where you expect the work to be merged and deployed in, hopefully, a matter of hours.

Sometimes self-contained feature additions that will not affect anything else are put on their own branch; these are often done by just one developer, who might even be external to the main project.

Is this the perfect way to do it? Probably not, but I tend not to believe in perfection of process; I just want something reasonably manageable for the team size and the complexity of the project.


The whole series of articles by Junio C. Hamano, and in particular "Fun with merges and purposes of branches" (http://gitster.livejournal.com/42247.html), is full of good advice on how to use branches, merging and rebasing.


This is a good idea - Junio understands git. Linus understands git too, but you only get insight out of him when someone screws things up badly.


From the article:

> The biggest issue with [the article "A successful Git branching model"] is that it comes up as one of the first ones in many git branching related searches when it should serve as a warning how not to use branches in software development.

This is so true! That one seems to be a classic example of over-engineering. Although that model may be useful for some types of development, ultimately every project has to find its own branching model, which should be the simplest model that serves its needs - no more, no less. Going crazy with branches is just another way to fail at using them.

> I will explain next why merge commits are bad and what you will lose by using them.

To add to that, I think it is quite telling that the Linux kernel developers themselves prefer a simple, linear history in the end (using branches only as intermediate steps), especially since they were the ones who created Git in the first place.


> I think it is quite telling that the Linux kernel developers themselves prefer a simple, linear history in the end

What do you mean? This is what the current git history of the linux kernel looks like:

    *   12b9fa6 Merge branch 'for-linus' of git://git.kernel.org/pu
    |\  
    | * 5129fa4 do_last(): ELOOP failure exit should be done after 
    | * a7f7754 should_follow_link(): validate ->d_seq after having
    | * d456564 namei: ->d_inode of a pinned dentry is stable only 
    | * c80567c do_last(): don't let a bogus return value from ->op
    | * 0fcbf99 fs: return -EOPNOTSUPP if clone is not supported
    | * b6853f7 hpfs: don't truncate the file when delete fails
    * |   340b3a5 Merge tag 'armsoc-fixes' of git://git.kernel.org/
    |\ \  
    | * \   d877a21 Merge tag 'renesas-soc-fixes-for-v4.5' of git:/
    | |\ \  
    | | * | 901c5ff ARM: shmobile: Remove shmobile_boot_arg
    | | * | 4e960f5 ARM: shmobile: Move shmobile_smp_{mpidr, fn, ar
    | | * | b1568d8 ARM: shmobile: r8a7779: Remove remainings of re
    | | * | d2613f5 ARM: shmobile: Move shmobile_scu_base from .tex
    | * | | 7931845 MAINTAINERS: Extend info, add wiki and ml for m
    | * | |   9fa6c2b Merge tag 'omap-for-v4.5/fixes-rc5' of git://
    | |\ \ \  
    | | * | | 3f315c5 ARM: OMAP2+: Fix onenand initialization to av
    | | * | | e327b3f Revert "regulator: tps65217: remove tps65217.
    | * | | | a9e5547 MAINTAINERS: alpine: add a new maintainer and
    | * | | | 5e45a25 ARM: at91/dt: fix typo in sama5d2 pinmux desc
    | * | | |   b223c9f Merge tag 'imx-fixes-4.5' of git://git.kern
    | |\ \ \ \  
    | | * | | | f5d0ca2 ARM: dts: imx6: remove bogus interrupt-pare
    | | | |/ /  
    | | |/| |   
    | * | | |   e3acd74 Merge tag 'omap-for-v4.5/fixes-rc3-v2' of g
    | |\ \ \ \  
    | | | |/ /  
    | | |/| |   
    | | * | | cf26f11 ARM: OMAP2+: Fix omap_device for module reloa
    | | * | | 08c78e9 ARM: OMAP2+: Improve omap_device error for dr
    | | * | | bf26927 ARM: DTS: am57xx-beagle-x15: Select SYS_CLK2 
    | | * | | a5b8751 ARM: dts: am335x/am57xx: replace gpio-key,wak
    | | * | | 5f35dc4 ARM: OMAP2+: Set system_rev from ATAGS for n9
    | * | | |   74a46ec Merge tag 'mvebu-fixes-4.5-2' of git://git.
    | |\ \ \ \  
    | | * | | | 44361a2 ARM: dts: orion5x: fix the missing mtd flas
    | | * | | | 9d021c9 ARM: dts: kirkwood: use unique machine name
    * | | | | |   691429e Merge branch 'akpm' (patches from Andrew)
    |\ \ \ \ \ \  
    | * | | | | | 7f6d5b5 dax: move writeback calls into the filesy

Merge commits galore.


It's true that at the top level, most of Linus' commits are merge commits. However, on a per-subsystem level, we very much favour a nice linear history. In most parts of the kernel, it's rare to go beyond 2 levels of merging, which given that the kernel has something like 1500+ developers and well over 10k+ commits per release cycle, is fairly linear...


Interesting (I'm not a kernel developer). What's the workflow at the subsystem level -- does the maintainer do a `git fetch` followed by `git rebase`? Or is the rebasing done via email patches and `git am`?


It's pretty much all done by emailed patches with git am - I maintain a personal tree on GitHub and one privately in my company, but they're purely for experimentation, not for sending pull requests. Email is how we submit code, discuss code and review code. You use git send-email to fire your patches off to the appropriate maintainer + mailing list, you make your modifications/rebase it/etc then send off V2, V3, ... V14 of your patch until everyone's happy. Among other things, we tend to be quite picky about getting commit messages right, and making sure that patch series are "structured" in a "nice" way - so git rebase -i is one of the most common commands I run on my private branches...

Maintainers all have their own tools and scripts to help automate and track the whole process - in the area where I work, we track things using Patchwork (see http://patchwork.ozlabs.org/project/linuxppc-dev/list/). When the maintainer's happy with a patch, it gets applied and pushed to one of the trees they use (e.g. with powerpc, we have powerpc-next for feature development and powerpc-fixes for important fixes). Eventually, they send Linus a pull request, and it makes its way into the kernel mainline.

It's a bit of a tricky system that requires understanding of the kernel community's social norms to get right - which I'm not entirely happy with but I don't think it'll change particularly quickly. However, it's also surprisingly effective - the kernel is one of the largest and most distributed individual projects in the open source world, and as a community we keep pushing out releases.


Not just Linux - most major open source projects I've come across on GitHub also practice similar processes, if not identical.


Unfortunately, GitHub pull requests encourage explicit merge commits for every pull request.

This is not helpful and can be quite annoying - especially for pull requests that contain just a single commit. Those merge commits are introduced even if the branch could simply be fast-forwarded.

Maybe there should be a GitHub feature that automatically rebases pull requests (as long as that is possible safely and automatically). Or the GUI could allow a pull request to be "fast-forwarded" instead of "merged", if possible.
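
Whether a pull request can be fast-forwarded comes down to an ancestry check: the target branch tip has to be an ancestor of the pull request tip. Here is a rough Python sketch of that check over a toy history. The `parents` map and the single-letter commit names are hypothetical, and this is only a model of the idea, not how GitHub or Git implements it (on the command line, `git merge --ff-only` refuses the merge when a fast-forward is not possible).

```python
from typing import Dict, List


def is_ancestor(parents: Dict[str, List[str]], ancestor: str, commit: str) -> bool:
    """Return True if `ancestor` is reachable from `commit` via parent links."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return False


def can_fast_forward(parents: Dict[str, List[str]], target_tip: str, pr_tip: str) -> bool:
    # A fast-forward only moves the target branch pointer forward,
    # so the target tip must already be part of the PR's history.
    return is_ancestor(parents, target_tip, pr_tip)


# Toy history: A <- B <- C (the pull request), and A <- B <- D (new work on master).
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}

up_to_date = can_fast_forward(parents, target_tip="B", pr_tip="C")  # master still at B
diverged = can_fast_forward(parents, target_tip="D", pr_tip="C")    # master moved to D
```

In the first case the single-commit pull request could land without any merge commit; in the second it would need a rebase or a real merge first.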


once more, gitlab does it right and offers a "rebase" button on merge requests along with a "fast-forward merges only" option for repositories.

See http://doc.gitlab.com/ee/workflow/rebase_before_merge.html


Not sure why this was downvoted. This is a really good point, and great to know!


Huh, didn't know that - I'm starting to like GitLab more and more!


I need to update our GitLab ...


Ideally it would be a setting on the repo to allow both kinds of processes.


This method doesn't assist code reviews. Everything is on master, and I imagine it would be difficult to find all the commits for a given feature, unless every feature is just a single huge commit (terrible).

It ain't hard people, just do:

master (production) -> develop (staging) -> feature/foo-bar (feature)

No biggie.


Feature should be tracked 1 layer above change history, in some issue tracking system. ...


What's wrong with a single commit for a feature?


You lose a whole lot of development history information. You have no chance of finding out why these five lines of code came to be when all your commits touch hundreds or thousands of lines. Maybe the intent of the code should've been in a comment, but if there are none, it's nice to have commit messages as a backup.

Also, bisecting (and then finding the actual fault in the code) gets harder the larger the commits are.
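
A minimal Python sketch of why commit size matters for bisecting, assuming a linear history and a bug that stays present once introduced. The `bisect` function, the commit names, and the `is_bad` predicate are hypothetical stand-ins, not `git bisect` itself: the search finds the first bad commit in about log2(n) tests, but whatever that commit contains still has to be read by hand.

```python
from typing import Callable, List, Tuple


def bisect(commits: List[str], is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Return the first bad commit and the number of tests that were run.

    Assumes `commits` is ordered oldest to newest, the newest commit is bad,
    and once a commit is bad every later commit is bad too.
    """
    lo, hi = 0, len(commits) - 1
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(commits[mid]):
            hi = mid          # the bug was introduced at mid or earlier
        else:
            lo = mid + 1      # the bug was introduced after mid
    return commits[lo], tests


# Toy history of 64 small commits; the bug arrives in commit c37.
history = [f"c{i}" for i in range(64)]
first_bad, tests_run = bisect(history, lambda c: int(c[1:]) >= 37)
```

With 64 small commits the search ends at `c37` after 6 tests, and only that one small diff needs inspecting. If the same work had been squashed into a handful of huge commits, the search would finish even sooner, but it would end at a commit touching hundreds or thousands of lines.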


> You lose a whole lot of development history information

What information do you lose, and why is that a problem? The feature commit tells you everything you need to know. Beneath those details tend to be irrelevant minutiae.

Let's imagine an editor that made one commit for each key press you type, or every time you save a file. That would be the closest thing to true history. I might accuse your practice of not committing that level of detail with "losing a whole lot of development history", and it would be true but only in a banal sense.

How an author built up their feature commit is almost never (and shouldn't be) relevant to everyone else. What's relevant to others is how they changed the repository and why. A good feature commit or commit series should stand alone.


You're going too far in the other direction in an attempt to counter. A single commit per feature is bad because you lose context that links changes together. A commit every keystroke is bad because you again lose the context that links changes together. A commit should be a logical grouping of something you accomplished towards the overarching feature. In my mind, commits should happen any time you could envision yourself at a point where you could leave for the day and come back to continue the next without having any difficulty spinning back up (what was I trying to do here, again?) Sometimes that overlaps with the feature, if your feature is really small, but often it won't.


One thing you lose is history across filename changes. Git can track renaming, but if in the same commit you also change the contents of the file, git has no way to figure that out. Separate it over two changes, and git knows what happened.

For that reason I also try to spread big refactorings over a couple of commits (always with the code working after each commit, of course).

I also like to have formatting changes in a separate commit. Mixing functional changes with formatting changes means every line in the file has changed, and the functional changes become invisible. (Though commit hooks that demand a (Jira) issue number in the commit message make this hard.)


Isn't this mostly true if you both change the file name and change most of the file's content? In which case it might be best to just see it as a deletion and creation anyway?
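
A rough Python sketch of the similarity idea behind this question. Git decides whether a delete plus an add counts as a rename by comparing content against a similarity threshold (50% by default); the code below uses `difflib` as a stand-in, which is an assumption for illustration and not Git's actual algorithm. The function name and sample file contents are hypothetical.

```python
import difflib


def looks_like_rename(old: str, new: str, threshold: float = 0.5) -> bool:
    """Treat two file contents as 'the same file, renamed' if enough lines match."""
    matcher = difflib.SequenceMatcher(None, old.splitlines(), new.splitlines())
    return matcher.ratio() >= threshold


old_file = "import os\n\ndef frob(path):\n    return os.path.basename(path)\n"

# Renamed and lightly edited in the same commit: most lines survive.
lightly_edited = "import os\n\ndef frob(path):\n    return os.path.basename(path).lower()\n"

# Renamed and rewritten in the same commit: almost nothing survives.
rewritten = "import sys\n\nclass Gizmo:\n    pass\n"

light_is_rename = looks_like_rename(old_file, lightly_edited)
rewrite_is_rename = looks_like_rename(old_file, rewritten)
```

Under this model the lightly edited file is still recognised as a rename, while the rewritten one is not, which matches the intuition that a mostly rewritten file is better seen as a deletion and a creation.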


If the file still serves roughly the same functionality, just in refactored form, I think it's nice if the old version still shows up in the history.


I often don't want to record my development history in the master branch because it's messy. Instead I squash my work into a single cohesive commit with a decently well written commit message.

I'm not convinced that more commits generally make it easier to understand why code changed. Often when I blame a line, if I see it was changed by a cohesive feature commit, that tells me all I need to know—and if something's wrong with it, I can say "hey Joe, you broke the frobber when you implemented the new gizmo."


> I often don't want to record my development history in the master branch because it's messy. Instead I squash my work into a single cohesive commit

personally i rebase my commits into fewer ones so the result is a reasonable compromise between how-the-code-actually-evolved (which is messy, true) and a clear, simple progression from start to completion of the feature (one single commit if taken to the extreme).

> I can say "hey Joe, you broke the frobber when you implemented the new gizmo."

That breaks if Joe is not available anymore or the change is complex (or old) enough that the detailed history would help debugging, but Joe doesn't have the branch lying around anymore because he deleted it or switched machines etc.

There are certainly advantages to a single commit per feature :) I just think having more information available through more commits is a good thing, and i don't see large disadvantages of having larger amounts of commits, provided you really do squash those "fix last commit" commits etc. to keep the commit count reasonable.


Mostly agreed. I have some notion that a commit "should" be a well-defined change in terms of program behavior, so if I can develop a feature in actual stages, I like to make separate commits. That's often the case with complex features.

Multiple commits can be confusing too, when commit messages aren't good, so I think making coherent changes is a problem beyond commit frequency...


You're throwing away your backup of info which now exists nowhere outside Joe's head. What happens if he gets poached next quarter, or even just forgets the tricky part of the new gizmo?


I'm not convinced that having more of Joe's intermediate commits would help very much with that problem. If I don't understand his code or what it's supposed to do, I probably won't understand his commit messages either.


Having a feature implementation split into a sequence of commits helps the review process a lot.

This often involves the use of interactive rebase to cleanup the history before presenting the branch for the review.


I wish github had a way of showing the contents of all separate commits instead of everything in a big blob :(


I find all of these posts on "the right way to use git" pretty tiresome.

At the end of the day it's really not that complicated and you will probably have more luck by keeping it that way. Bringing in a complex workflow that isn't rooted in the flow of work in your organisation makes no sense.

Stick to master being the newest code. Fixes are done in a branch, reviewed, and rebased on top of master. Features work the same way, they just hang around in a branch for longer, and are still rebased on top of master. Maintain branches for releases and cherry-pick from master onto them.

If your workflow takes a blog post to describe, it's just too damn complicated.


Gitflow is exactly this, only they add an extra branch which tracks all of the releases. And, for some weird reason, call it 'master'.


It's worth pointing out that the entire point is that you use SHORT-LIVED feature branches. The long-lived branches are the releases that you need to support.

You're basically formalizing a working copy by calling it a branch; with the added advantage of being able to make a nice clean set of commits ready for code review.

I introduced using short-lived feature branches with rebase/merge after having a great experience doing something similar with Perforce. Gitflow simply adds a master which contains only releases and formal fix-bug-on-release-merge-back-to-development rules.


Not another one.

All good git workflows are different; all bad git workflows are bad in the same way - okay, one of the same two ways: superstitiously declaring important parts of the git model "bad" (merge commits are bad! rebasing and amending are bad!), or superstitiously merging things that have no business being merged together left and right (this is really a special case of the first: it's 'branches are bad!', though the person doing it often doesn't realize it).

GitHub certainly doesn't help matters by putting a frontend onto git that doesn't match the conceptual model of git in the least - you can't even see the graph unless you dig deep and find "Network", the world's worst graph visualizer, and the pull-request module is absolutely unreviewable and has giant inviting buttons to instantly do nontrivial things to your immutable (without inflicting cascading rebases on the entire planet) history that you may or may not even want without telling you exactly what they'll do.

It's a pretty serious indictment that Bitbucket, originally designed for Mercurial, is the only web frontend to git that does a passable imitation of gitk. (Git (the command-line porcelain) doesn't help matters by assuming the user is a seasoned LKML veteran with Linus or a vice-Linus above them willing to ruthlessly reject their history if it sucks, either.)

"We are a small two-person collaboration on separate chapters of a book in LaTeX, so we will forego branching, work on master, use pull --rebase, and push to the same central remote, to basically have a more robust CVS" is a git workflow you can have.

"I am the maintainer of a decently large open-source project, so I will accept only signed tag pull requests of work based on the last signed and tagged stable release, with cleanly logically separated commits and no internal merges allowed unless explicitly justified, and --no-ff merge them onto my ongoing blessed integration branch after review" is another git workflow you can have, and it uses more of git's featureset - it's close to how Linux works, if you factor out the mailing lists and ignore the presence of vice-Linuses.

"Rebase everything willy-nilly because merge commits are BAD" and its just-as-evil twin, "have multiple branches but merge them all into each other whenever someone blinks, because separating separate work is BAD but 'branch' is a buzzword" are not workflows you can have, they're symptoms you can exhibit.

GitHub, et "considered harmful" blog posts, et superstition, delendae sunt.


You shouldn't do continuous integration without continuous deployment, because you don't know whether you're relying on code that is not ready to go to prod. And you shouldn't do continuous deployment unless your test coverage (including load+perf) is so exhaustive that every possible change has zero risk in prod. Literally every team I've ever worked on had to manage production deployment tactically in terms of "are the right people available if we roll this right now" and "how much effort would the rollback be" and "is this an especially important day for our customers".

New features should be on branches off master, and master is the code we've already pushed to prod and agreed is good enough not to roll back.


Some people seem to think the only software that gets developed any more is running on a web server, that its developers can and should always have direct access to production servers, that pushing minor changes to those production servers several times a day with minimal oversight or control is a badge of honour, and that having serious failures in production now and then is acceptable.

No doubt for some software development projects this is all true. For many others, it is not. Processes and tools that work well for a team in one context might be completely inappropriate for a team in another.


I do prefer rebases over merges because they lead to a cleaner history, but when you have a set of features in separate branches that are all dependent, rebasing becomes a pain. I like to have one local branch per CL/pull request. If I have a branch feature1 with commits on top of master, then feature2 with commits on top of feature1, feature3 with commits on top of feature1, and feature4 that merges feature2 and feature3 and then has commits on top of that, that thing is a pain to rebase. (I know about --preserve-merges, but it does not always do the right thing.) The main problem here is that the end result should be feature4 rebased on top of master, but the other branches should also be updated to point to the new commits.
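
Here is a small Python sketch of the end result described above: every commit in the stack is replayed onto the new base and every branch pointer is moved to its replayed commit. It is a toy model, not Git: `add`, `rebase_stack`, the store layout, and the patch strings are hypothetical, conflicts are ignored entirely, and it assumes every commit in the stack descends from `old_base`.

```python
import hashlib
from typing import Dict, List, Sequence, Tuple

Store = Dict[str, Tuple[Tuple[str, ...], str]]  # id -> (parent ids, patch)


def add(store: Store, parents: Sequence[str], patch: str) -> str:
    """Create a commit whose id depends on its parents and its patch."""
    raw = ",".join(parents) + "|" + patch
    cid = hashlib.sha1(raw.encode()).hexdigest()[:8]
    store[cid] = (tuple(parents), patch)
    return cid


def rebase_stack(store: Store, branches: Dict[str, str],
                 old_base: str, new_base: str) -> Dict[str, str]:
    """Replay everything built on old_base onto new_base; return moved branches."""
    mapping = {old_base: new_base}  # old commit id -> replayed commit id

    def replay(cid: str) -> str:
        if cid not in mapping:
            parents, patch = store[cid]
            new_parents: List[str] = [replay(p) for p in parents]
            mapping[cid] = add(store, new_parents, patch)
        return mapping[cid]

    return {name: replay(tip) for name, tip in branches.items()}


store: Store = {}
root = add(store, [], "root")
m1 = add(store, [root], "new work on master")
f1 = add(store, [root], "feature1")
f2 = add(store, [f1], "feature2")
f3 = add(store, [f1], "feature3")
merge = add(store, [f2, f3], "merge feature2 and feature3")
f4 = add(store, [merge], "feature4")

branches = {"feature1": f1, "feature2": f2, "feature3": f3, "feature4": f4}
moved = rebase_stack(store, branches, old_base=root, new_base=m1)
```

After the call, `moved["feature1"]` sits directly on top of `m1`, `feature2` and `feature3` sit on the replayed `feature1`, and the merge under `feature4` joins the two replayed branches, so all four pointers stay consistent with each other.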


It's rare to need to develop multiple features as concurrent branches, as long as you can release regularly.

The practice that works really well is: just develop against master while running "git pull --rebase" regularly. Code review and then ship the feature, repeat. Use branches only for rare concurrent development, and only after the need has arisen (Git makes it trivial to move changes into a branch after the fact).


It is not rare at all. The features I am talking about are small. In a large project code review can take several days if the reviewers are busy, even for a small change, so it is essential to work on multiple things concurrently.


Good points.

Two issues:

(1) All code review tools I know of currently use merges http://programmers.stackexchange.com/q/256789/108980

(2) If you want to keep your local work up-to-date, you need to rebase. It can be painful to do this after a while; many like to do it often. If you have a conflict, you will need to resolve it every time you rebase, unlike merge, which saves the conflict resolution in the merge commit. You can use git rerere, though it takes some effort https://git-scm.com/docs/git-rerere
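
The idea behind rerere ("reuse recorded resolution") can be sketched as a cache keyed by the conflict itself. The Python below is only a toy model under that assumption; `ResolutionCache` and its methods are hypothetical names, and real Git stores and matches conflicts differently.

```python
import hashlib
from typing import Dict, Optional


class ResolutionCache:
    """Remember how a conflict was resolved and replay it when it recurs."""

    def __init__(self) -> None:
        self._resolutions: Dict[str, str] = {}

    @staticmethod
    def _key(ours: str, theirs: str) -> str:
        # Fingerprint the two conflicting sides of a hunk.
        return hashlib.sha1(f"{ours}\0{theirs}".encode()).hexdigest()

    def record(self, ours: str, theirs: str, resolved: str) -> None:
        self._resolutions[self._key(ours, theirs)] = resolved

    def replay(self, ours: str, theirs: str) -> Optional[str]:
        # Returns None when this conflict has never been resolved before.
        return self._resolutions.get(self._key(ours, theirs))


cache = ResolutionCache()
before = cache.replay("timeout = 30", "timeout = 60")      # first rebase: resolve by hand
cache.record("timeout = 30", "timeout = 60", "timeout = 45")
after = cache.replay("timeout = 30", "timeout = 60")       # later rebase: reuse the answer
```

The first time the conflict appears nothing is known and it has to be resolved manually; on later rebases the same pair of sides maps straight to the recorded resolution.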


Basically don't have long lived branches.

Everyone should know that by now: it doesn't matter what development model or process you are using; if you've gone off on a branch for too long, you will have problems.


This poses questions of the caliber of "Why not avoid version control and just have a shared folder?" It's "simpler"... There's a reason: organization.


Not sure why the aversion to merges. Bisect works fine even with merges, and the avoidance of the "christmas tree look" is IMO just aesthetics.


The model described in this article is a very reasonable one. I've been using something quite similar with various teams at different software companies over the last years and it has always worked great. Little form, lots of function.

One note, though — feature branches should IMO be short-lived. Long-lived branches cause huge and difficult merges and rapidly start being costly to maintain.


> when one developer changes some internal interface and other developer builds something new based on the old interface definition

Well, the compiler would just catch this, right‽ Except if, I suppose, you're not using a strongly-typed compiled language like Haskell… ;)

(okay, I know this doesn't actually address the git issue at hand and isn't foolproof either)


This "Something more simple/Cactus Model" is exactly what we've been doing very successfully for years with SVN. We focus on release branches and very rarely do feature branches, because branch merging with SVN is a PITA and something you want to avoid.


We used feature branches with Subversion very often, then switched to Git, because it is much faster and has built-in support for rebase, instead of a custom rebase-branch script for SVN.


With git, branching and merging is pretty natural.


"How changes get live" is one of the first, most important things I tackle with a new project. This means I consider my "live system" to be effectively part of my development environment.

Right now, I have a "live system" repository, and then development actually occurs in a second repository which is a submodule of the first.

This has a lot of advantages:

• Configuration and data is out-of-tree, so it is easy for me to have a read-only view of the live database, and read-write to a local (temporary) database

• I can select a branch using a signed cookie or a URL which makes it very easy to demonstrate features on the live system

• I can try out tags on 1% of traffic, or 5% of traffic, testing some features with my live users

• I can use git bisect on the live system to find regressions
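
As a sketch of how "a tag on 1% or 5% of traffic" might be routed, the Python below hashes a user id into a stable bucket from 0 to 99 and picks a ref from it. This is an assumption about one possible mechanism, not a description of the setup above; `bucket`, `pick_ref`, and the ref names are hypothetical.

```python
import hashlib


def bucket(user_id: str) -> int:
    """Map a user id to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100


def pick_ref(user_id: str, canary_ref: str, stable_ref: str, percent: int) -> str:
    """Serve canary_ref to roughly `percent` percent of users, stable_ref to the rest."""
    return canary_ref if bucket(user_id) < percent else stable_ref


# The same user always lands on the same ref for a given percentage.
first = pick_ref("user-42", "v2.1-rc1", "v2.0", percent=5)
second = pick_ref("user-42", "v2.1-rc1", "v2.0", percent=5)
```

Because the bucket depends only on the user id, raising `percent` from 1 to 5 keeps the original 1% of users on the new tag and adds more users to it, rather than reshuffling everyone.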

There are a few disadvantages, but most of them are psychological: testing on live sounds scary, working out how to do database migrations is a lot more work, etc.


GitFlow, when used correctly with Agile, solves all these problems quite well.



