Google's vs. Facebook's Trunk Based Development (paulhammant.com)
96 points by cpeterso 1167 days ago | 41 comments

I have to say I struggle with advanced source control management and DVCS.

Every time I have been in a development team that proposes anything more complicated than a basic branch, I start to break out in hot sweats.

I think this has come from experiences where we have had multiple live branches and then got into a mess with merging, testing, incompatibilities, not having the right thing on the right branch etc.

When it does all work, it feels more luck than judgement.

I agree with a poster above that some of the teams that I have been on seem to spend massive amounts of time moving code between branches and accounting for it. It feels like such a drag and a distraction that it's often not worth branching even for fairly substantial changes.

I think I need to set aside a few days to really deepen my understanding of Git or similar.

In my experience the motivation to use a particular SCM scheme is tooling. That makes some of the choices non-obvious to developers that aren't familiar with the tools and their quirks. A lot of people developing those schemes forget to answer "Why do we do this?" because it is so obvious to them and you really should ask that question.

The trouble is that VCS pain points are far from universal. Everyone has their own experience, for better or worse, and will have a view on what's important based on their personal experience (or lack of experience).

In other words, arguing for a model (mainline or trunk) without first setting a context is pointless.

For instance, consider the needs of a very large team versus a very small team. Consider a team that releases often versus a team that releases infrequently. Consider a team releasing multiple projects according to a roadmap versus a team releasing a single project on an ad hoc basis.

The variations are many, and so too are the strategies for managing source code.

I myself am never going to argue for Mainline - http://paulhammant.com/2013/12/04/what_is_your_branching_mod...

It is not clear to me how

"Trunk based development with tests and code-review pre-commit"

is tangibly different from

"Branch based development with tests and code-review pre-merge"

The article talks a lot about trunk-based development, but if you're doing any sort of checking before "committing", then don't you essentially have a short-lived branch?

You got it, it's a smoke-and-mirrors screed to justify the lack of dependency management. 'One giant repo' might work ok in Perforce, but it's bad in git and SVN--forget about it.

Au Contraire. Google (and I'm assuming Facebook) have very detailed dependency rules that are explicitly described in files in nearly every directory throughout the codebase. There's no way you can do massive cloud compiles without this kind of information.
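To make the idea of per-directory dependency declarations concrete, here is a minimal sketch. It is not Google's or Facebook's actual build format: the directory names and the `DECLARED_DEPS` mapping are invented for illustration, and the function just walks the declared edges to find what must be built before a target.

```python
# Hypothetical per-directory dependency declarations; all names are invented.
DECLARED_DEPS = {
    "search/frontend": ["search/index", "common/logging"],
    "search/index": ["common/storage", "common/logging"],
    "common/storage": ["common/logging"],
    "common/logging": [],
    "ads/server": ["common/storage"],
}


def transitive_deps(target, declared=DECLARED_DEPS):
    """Return every directory that must be built before `target`, in build order."""
    ordered = []
    seen = set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for dep in declared[node]:
            visit(dep)
        ordered.append(node)  # appended only after all of its dependencies

    visit(target)
    return ordered[:-1]  # drop the target itself


print(transitive_deps("search/frontend"))
```

With declarations like these, a build of `search/frontend` never needs to look at `ads/server`, which is the property that makes building a subset of a very large tree feasible. The sketch assumes the declarations contain no cycles.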

Subversion could do it with sparse-checkouts, but there is a huge gap in capability between it and P4 :- http://paulhammant.com/2014/01/06/googlers-subset-their-trun...

If everyone is on trunk, then a team communicates their changes to each other through the trunk.

If everyone is on a branch, a team is tempted to use their branch to communicate their changes to each other just on the branch. Next thing you know, you've drifted far from main, and another team is trying to touch the same files, and there's no way to merge, and someone who did a refactoring has broken your APIs, and you're tempted to release from your branch because you don't have time to reconcile the merge, and you have code that solves this problem in ANOTHER branch and you manually copy it in to this project in the same place, but it needs to be slightly different so they drift apart from each other, and then your project gets deferred and none of your code gets merged...

If you're on trunk, you HAVE TO commit in order to share your code with your team. And that makes all the difference!

It's possible to use branches the same way he's proposing to use trunk, but it's awful tempting to do bad things.

I'm reminded of the Indian proverb I saw on Reddit: "If you want to go fast, go alone. If you want to go far, go together." Meaning, if you quickly want to make a demo, a branch is your friend. If you want to make sure your code lives on, make sure it lives on in trunk as quickly as you can.

The problem comes when you start releasing from your branches. Then repositories diverge, different projects have different codebases with a common ancestor. Fixes don't get propagated everywhere, etc.

The approach that has worked for me is:

1. Develop against trunk.
2. Branch for release.
3. When a defect is reported in that release, fix it in trunk if it manifests there, and then merge the commit to that release branch. This will prevent regression in the next branch.
3b. If the defect only manifests in the release branch, fix it there, and then skip merging from release-branch back into trunk.
4. Disallow any new feature work to be merged from trunk to the release branch after it is cut. Only defect work.
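The rules above can be written down as a small policy sketch. This only restates the steps as code; the function names and the string labels are made up for illustration and are not part of any SCM tool.

```python
def plan_defect_fix(reproduces_on_trunk):
    """Return the ordered steps for a defect reported against a release (steps 3 and 3b)."""
    if reproduces_on_trunk:
        return ["fix on trunk", "merge that commit into the release branch"]
    return ["fix on the release branch", "do not merge back into trunk"]


def allowed_on_release_branch(change_kind):
    """Step 4: once the release branch is cut, only defect work may be merged into it."""
    return change_kind == "defect"


print(plan_defect_fix(reproduces_on_trunk=True))
print(allowed_on_release_branch("feature"))
```

The point of encoding it this way is that fixes only ever flow from trunk to the release branch, never the other way.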

I don't like the idea of merging from a release branch back into the trunk. I see branches as things that are cut, potentially hardened, and then discarded.

This post puzzled me. I feel like it was missing an introductory paragraph with a thesis and summary of the two branching models. You have to read part way through the article (ignoring the confusing diagram) before this becomes apparent.

I agree and I'm still not really sure what the difference is between Trunk Based Development and Mainline development. I searched around a bit as well.

My guess is that trunk based development is the idea that all commits pushed to the canonical repository are pushed to trunk (rather than a remote branch) with incomplete code being hidden using feature toggles?

In contrast mainline would push incomplete code to long lived remote feature branches and those branches would only be reintegrated into trunk once the code was complete.

However, I don't really see how this relates to a lot of the rest of the article which seems to be more to do with versioning, dependency management, and testing.

There's also some more specific points I'd like to pick up on:

Is it really easier to rebase my local branch than it is to merge from one remote branch to another? Seems like half a dozen of one and six of the other to me.

The article contrasts Google's & Facebook's model with the pull-request model of Etsy and Github but again I don't really see much of a difference. Facebook sends a patch to phabricator for review, someone looks over it and then it gets committed to trunk.

> My guess is that trunk based development is the idea that all commits pushed to the canonical repository are pushed to trunk (rather than a remote branch) with incomplete code being hidden using feature toggles?

Perhaps I'm misunderstanding things, but the impression I got from the original post is that most (many? all?) developers have local repositories where they manage their features. So, instead of using branches in the central repository, those branches are employed in local repos.

> The article contrasts Google's & Facebook's model with the pull-request model of Etsy and Github but again I don't really see much of a difference. Facebook sends a patch to phabricator for review, someone looks over it and then it gets committed to trunk.

I agree. I suspect it may be just workflow/jargon differences?

Feature toggles (and Branch by Abstraction) can figure, yes. Facebook do dark launches, which is related - http://www.facebook.com/note.php?note_id=96390263919
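For anyone who hasn't seen a feature toggle, a minimal sketch looks something like this. The `TOGGLES` dict, the toggle name and the two checkout functions are all hypothetical; real systems usually read toggles from configuration or a service rather than a module-level dict.

```python
# Hypothetical toggle store; a real one would likely come from config or a service.
TOGGLES = {"new_checkout": False}


def is_enabled(name, toggles=TOGGLES):
    """Unknown toggles default to off, so unfinished code stays hidden."""
    return toggles.get(name, False)


def legacy_checkout(cart):
    return {"total": sum(cart), "flow": "legacy"}


def new_checkout(cart):
    # Incomplete work can be committed to trunk here without users seeing it.
    return {"total": sum(cart), "flow": "new"}


def checkout(cart, toggles=TOGGLES):
    if is_enabled("new_checkout", toggles):
        return new_checkout(cart)
    return legacy_checkout(cart)


print(checkout([10, 5]))
print(checkout([10, 5], {"new_checkout": True}))
```

Both code paths live on trunk at the same time; flipping the toggle, not merging a branch, is what releases the feature.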

Developers, if they are local branching, are not marshaling long running 'in progress' changes there. By habit they're working on something that's going to hit the trunk after a matter of hours or a day or three. They might flip to a new branch for a defect fix (and push that), before coming back to the thing they were working on.

Paul links to it above in his comments: http://paulhammant.com/2013/12/04/what_is_your_branching_mod...

Click the "Trunk Based Development" category. This article is in a series that has spanned a decade.

I've worked with a similar workflow in a number of places with SVN. It's horrible.

These workflows which avoid branching, avoid merging, and avoid multiple services with separation of concerns are a product of the limitations of the SCM system. Too many times, the designers have praised their workflows for so perfectly utilizing the features of the chosen SCM system. They put the cart before the horse, and don't see that their workflow is an attempt to work around the places where the SCM falls short.

I've heard this same line of thinking many times, usually in allusion to Git but often not stated explicitly. But I've never heard a good answer to the question of how you can actually have an adequate SCM-merge unless it's language and refactoring aware (and even then...)?

Every SCM that exists will fail spectacularly at merging two refactorings that touched the same code. You simply can't solve this with software today.

Enter... merge pain.

The only real solution that doesn't involve developers avoiding refactoring unless they really need to (either because it's painful or because they don't know they should be doing it), is trunk-based development.

As a footnote, for what it's worth... multiple services with SoC is definitely a good thing, but I don't think trunk-based development precludes that.

> Every SCM that exists will fail spectacularly at merging two refactorings that touched the same code. You simply can't solve this with software today.

Same code is not supposed to be refactored twice. If it happens, a human with knowledge of the two refactors must resolve the conflict.

If there's a line of code which has been touched twice in two different refactoring efforts, I don't think it's a good idea to let a machine decide between them.

Unless the machine knows the exact purpose of the code. If we get to that point, I think machines will be able to write programs themselves :)

The good folks at PlasticSCM are working in the direction of smarter (language-aware) history and merges with Plastic and SemanticMerge. References:

- http://www.semanticmerge.com/ - http://herdingcode.com/herding-code-183-semantic-merge-with-...

But you are doing a merge possibly every time you push to the trunk, are you not? So if I got it right, the difference between branch-based and trunk-based style is how long you let your branches develop, both in time and commits. One extreme is a branch for the whole release and the other is a "branch" for just one commit. Feature branches would be something in between.

Now the problem with long branches and painful merges sounds like a communication problem between the master and the branch, just at the level of the code. We know that at the level of people timely and terse communication is key to a successful project. So the same would kinda make sense at the code level too.

So, if I understand you correctly:

  1. merge conflicts are almost unavoidable
     in a non-trivial codebase
  2. resolving merge conflicts in some SCM systems
     is unreasonably difficult

I'd change 2 to "resolving merge conflicts in all SCM systems is necessarily difficult."

As far as I know there's no SCM system which can understand the /intent/ of the change and without being able to reconcile the intents of two conflicting merges there's no way of reliably merging the code (at least as far as I know).
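A toy line-based three-way merge shows where the trouble comes from. This is a deliberately simplified sketch, not how any particular SCM implements merging: it assumes all three versions have the same number of lines and only compares them position by position.

```python
def three_way_merge(base, ours, theirs):
    """Merge equal-length line lists; return (merged_lines, conflicting_indexes)."""
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:
            merged.append(o)       # both sides agree
        elif o == b:
            merged.append(t)       # only theirs changed this line
        elif t == b:
            merged.append(o)       # only ours changed this line
        else:
            merged.append(None)    # both changed it differently: a human must decide
            conflicts.append(i)
    return merged, conflicts


base = ["def total(items):", "    return sum(items)"]
# Refactoring 1: sum prices instead of raw items.
ours = ["def total(items):", "    return sum(i.price for i in items)"]
# Refactoring 2: rename the parameter.
theirs = ["def total(lines):", "    return sum(lines)"]

print(three_way_merge(base, ours, theirs))
```

The tool can see that line 1 was changed on both sides, but it has no way to know that the right answer is `sum(i.price for i in lines)`. Worse, if the second line had only been touched by one refactoring, the merge would succeed textually and still produce code that refers to a renamed parameter.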

I'm reminded of an earlier article about how Facebook contributed to Mercurial to optimize it[0] because their code base had become so large. Wouldn't it provide a much better flow if they built their system around separating things into packages and using an internally built (or something like that) package manager to update, etc.? That would make it much easier to divide things up into different repos instead of one giant one (which, quite frankly, seems like a terrible idea at this size).

[0] https://code.facebook.com/posts/218678814984400/scaling-merc...

A central company-wide repository and modular code are not mutually exclusive. It's not that people at Facebook or Google have all the code in one messy, unmanageable soup.

The build system offers fine-grained dependency management and the repository is organized so that different teams have responsibility over their components, just as if they were different repositories for all practical purposes.

The advantages of having them in a single repository are:

* atomic operations, i.e. you can refactor components and avoid code rot, or apply an API upgrade to all the clients, thus reducing the amount of time you have to maintain backward-compatible APIs

* having a single, monotonically increasing number that describes exactly which bugs or bug fixes your code has. This greatly simplifies the management of rollouts in case of complex component dependencies.
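The second point can be sketched in a few lines. The bug IDs and revision numbers below are invented, and the check only holds under the assumption of a single linear trunk history with no fixes cherry-picked elsewhere.

```python
# Hypothetical mapping from bug ID to the trunk revision that fixed it.
FIX_REVISIONS = {"BUG-101": 4120, "BUG-102": 4187}


def missing_fixes(build_revision, fixes=FIX_REVISIONS):
    """List the known fixes that a build cut at `build_revision` does not contain."""
    return sorted(bug for bug, rev in fixes.items() if rev > build_revision)


print(missing_fixes(4150))
print(missing_fixes(4200))
```

With several repositories you would need one such number per repository plus a record of which combinations were built together, which is where the rollout bookkeeping gets complicated.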

Good idea. I think this article is a disguised anti-service-oriented-architecture & anti-git argument.

Many people are so fond of what I call commit accounting that they spend more effort on all those branches, merges and rituals than on actually getting their code to work.

One more reason to use trunk based development. Or SVN.

Could you share more details about 'commit accounting'?

Moving commits from one branch to another according to a ritual. This way even a simple change seems like a lot of work. The idea that the machine should be the one doing rituals, not humans by hand, is promptly ignored.

Sure, that's pointless busywork. Doesn't trunk-based development enshrine this, with the moving of last-minute hotfixes between the release branch & trunk?

What came to mind personally was overuse of rebasing to turn a commit log into something artificially pristine at merge time.

People do want some minimum of commit hygiene so that the repo avoids being a mystery vortex. But because distributed VCS lets you hide everything about how you actually work, people are able to get unnecessarily competitive about it.

I'm not fond of broken commits in a public timeline, and often my local commits are broken - they are snapshots of development milestones that make sense to me, but have no business being made public in the persisted timeline of a package.

I see this a lot in public timelines:

    Commit A - splendid new feature
    Commit B - oops, missed a semicolon
    Commit C - typo
    Commit D - addressing code review
    Commit E - typo

That's 5 commits for one new feature, and the commit I want to build from has a commit message which has nothing to do with the feature in question.

Rebasing that to a single commit takes little effort or time, and makes the timeline clear. On top of that, if you are using a code review tool which persists changes you may already have history of commit C which is probably the only other relevant commit you'd care to be able to find in the future.

It seems like this is something that should be handled by flags/tagging or some other object rather than destroying history.

Like, instead of collapsing A-E into a single commit, A should be marked as the start of project/feature/milestone/whatever Foo, and E as the completion of it, and A-E shown as a single object when viewing the history but allowing the user to drill-down to expose the internal commits.
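As a sketch of what that could look like (no existing VCS is claimed to work this way; the `Commit` and `FeatureGroup` types and the `render` function are invented for illustration), the history view just needs a grouping object that can be shown collapsed or expanded:

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    sha: str
    message: str


@dataclass
class FeatureGroup:
    """Marks a run of commits as one feature without rewriting any of them."""
    title: str
    commits: list = field(default_factory=list)


def render(history, expand=False):
    """Return log lines; groups show as one line unless `expand` is set."""
    lines = []
    for entry in history:
        if isinstance(entry, FeatureGroup):
            lines.append(f"[{len(entry.commits)} commits] {entry.title}")
            if expand:
                lines.extend(f"    {c.sha} {c.message}" for c in entry.commits)
        else:
            lines.append(f"{entry.sha} {entry.message}")
    return lines


feature = FeatureGroup("splendid new feature", [
    Commit("A", "splendid new feature"),
    Commit("B", "oops, missed a semicolon"),
    Commit("C", "typo"),
    Commit("D", "addressing code review"),
    Commit("E", "typo"),
])

print("\n".join(render([feature])))
print("\n".join(render([feature], expand=True)))
```

The collapsed view gives the clean timeline people rebase for, while the individual commits stay available for anyone who wants to drill down.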

Destroying information is never good.

Yes. This. The whole religion about keeping history "clean" would collapse if we had this single technical ability.

You can get some of the way there now by making use of tags or branch labels or something.

At one point, I wrote a Mercurial hook that would sit in the central repository and add a tag every time someone pushed. Then, only tagged commits would be considered first-class parts of history. There are numerous problems with this, not least that none of the existing tooling is aware of this convention. The fact that Mercurial tags live in commits, which pushers then have to immediately pull, was also very awkward.

You can use branches in a similar way: do all the intermediate, historical-footnote, commits in a development branch, and merge into a master branch to publish them. Then only consider commits in the master branch to be first-class. This is roughly Git Flow, isn't it? Again, the tooling doesn't quite do everything you'd want it to around this.

Yes, you can still rebase to one commit in a Trunk model. Git-Svn and Perforce's Git-Fusion allow you to knock yourself out with DVCS (squash, local-branches) before you dcommit / push back to the canonical repo.

Moving commits to a release branch is a problem too, but it is usually ritual-free.

> Google have many [...] buildable and deployable things, which have very different release schedules. Facebook don’t as they substantially have the PHP web-app, and apps for iOS and Android in different repos.

At least when I was there, the first sentence was true for Facebook. There was/is a separate repo for Thrift services (largely C++, but also Java, Python, and more), and each Thrift service is a deployable. The deployables could be pushed at separate times and were usually pushed by teams themselves on their own schedules. The process was still frictionless and trunk-based: the only difference was having to type a few well-documented commands to push the service yourself. This may have changed since, however.

The article is a bit confusing, but it's very good anyway.


Though this is fascinating to read, basing your repository decisions on a ginormous company's requirements is not really useful. I have advocated trunk development everywhere I have worked and usually people come around to it. But 99.99% of people are not Google or Facebook sized.

Given the known cultural emphasis on separation and redundancy at Amazon I'm interested to know what their tooling and source code control implementations are.
