Stacked changes: how FB and Google engineers stay unblocked and ship faster (graphite.dev)
467 points by tomasreimers 63 days ago | 279 comments



At my last job, I found our code review process was terribly slow, so I set about analyzing it and coming up with solutions. I found a lot of developers were blocked on reviews. They either task switched (expensive) or babysat their review (expensive). Even though we were using Phab, few did stacked commits because the tools were so unfamiliar that they felt brittle.

Improving the stacked workflow is one part of the equation, but I'd recommend being careful of letting that mask other problems.

I identified a series of busy-waits including:

- Post, wait, CI failure, fix, post

- Post: wait for reviewer, ping reviewer, wait, find feedback was posted, fix, post

Some of the root causes included:

- Developers weren't being pinged when the ball was in their court (as author or reviewer)

- Well, some notifications happened, but review groups were so noisy no one looked

- The shared responsibility of review groups meant everyone assumed someone else would look.

- The tools we were using made it hard to identify active conversations

My ideal solution:

- Auto-assign reviewers from a review group, allowing shared responsibility of the code base while keeping individual accountability of reviews

- Review tools that ping the person the review is blocked on when they are ready to act

- Review tools that give a reminder after a set period of time

- Azure DevOps' tracking of conversation state with filtering (and toggling between original code and the "fixed" version)

- Auto-merge-on-green-checks (including reviewer sign off)

With this, responses are timely without people polling the service, letting machines micro-manage the process rather than humans.


Gerrit does many of the "ideal solution" items, as it replicates the internal Google workflow.

It's a code review system with browsing as an afterthought, as opposed to GitHub which is a code browser with review as an afterthought.


Gerrit is nice, but to me feels a bit dated, UX-wise. Before joining Google, I used Azure DevOps and I feel that it was easier to review code over there (less jumping around the page). At first I thought it's a matter of (in)familiarity with the tool, but I still feel like I'm fighting with it after 3 years of use...


The Gerrit UI is just bad (confusing). I don't like the "dated" criticism for work tools because whether it works well is mostly independent of it being fashionable - cf. Steve Jobs quote "Design is how it works". Gerrit's UI is both dated and confusing, but confusing is the major thing here. For example, "+1" and "-1" meaning "I like / don't like it" and "+2" / "-2" meaning "I approve / block it". The numbers are not summed up, the names are misleading. Or the workflow of writing a comment and then publishing it. Probably makes sense when making several comments (on different blocks of code), but it's too easy to forget the publishing, especially when writing a top-level comment where there isn't anything else to publish together.

Another thing I don't like about Gerrit is that they seem to play Tetris with UI elements, i.e. "wherever there's room to put it is where we put it". It's not as bad as it used to be and improving, and it looks less dated than it used to, too. There are also newish features like patch editing in the web interface.

All that said, patch queues / stacks and the rebase based workflow are nice.


Definitely, "feels dated" is very subjective. The other problem ("I feel like I'm fighting with the tool", "I'm jumping all over the place to do what I want") may be related to your "UI tetris" point, and I may be unnecessarily tying it to the UI feeling "dated".

ISTM that what is needed is to observe how people interact with the tool and then try to figure out how the workflow can be improved based on those observations. Or maybe I'm trying to interact with gerrit in some strange way - that's also possible.

The +1 / +2 for different labels aren't that confusing to me - additionally, in Chromium I found that gerrit is very descriptive when you on-mouse-over the buttons, but IDK if it's specific to this deployment or if that aspect got improved.


I really like that with Gerrit I'm not encouraged to force-push a PR branch. There are a couple of other things that are nicer on Gerrit, but one thing GitHub/GHE does well is markdown in comments.


Gerrit does support some Markdown-like formatting:

     Some basic markdown-like syntax is supported which renders indented lines preformatted, lines starting with "- " or "* " as list items, and lines starting with "> " as block quotes
from: https://gerrit-documentation.storage.googleapis.com/Document...


Gerrit only has force push for the PR equivalent.

The nice thing it has is really easy to rebase off other people's in-progress work (for GitHub you need to fetch the other person's commit by url and hash -- I guess after writing that down it's not a huge advantage).
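For what it's worth, GitHub does expose read-only refs for pull requests, which saves hunting for the url and hash. A rough sketch (the PR number and branch names are just placeholders):

    # pull the in-progress work from PR #123 into a local branch
    git fetch origin pull/123/head:their-work
    # rebase my own branch on top of it
    git switch my-feature
    git rebase their-work

Still more ceremony than Gerrit, where fetching someone else's review revision is the normal path.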


There has been a lot of investment in Gerrit in the last few years and the UX is improving dramatically. It's still not perfect, but I've been very happy with it lately.


Totally totally totally agree. I think a lot of the slowdown comes from not understanding when the ball is in your court. And Github doesn't prioritize this b/c I think this problem is less prevalent in open source, where there is usually no expectation of a reasonable time until review.

A lot of what you described is exactly what we've built / we're planning for the code review portion of the site - you should sign up and check it out :)


Agreed 100%

Making it easier to stack PRs feels like entirely the wrong direction - "when you're in a hole, stop digging" - stacking is giving the people in the hole better shovels to keep digging with, instead of getting PRs flowing through more easily so that stacking is not needed so much.


"Stop digging" is not an option when working on a project that isn't entirely internal to one (sub)organization. These are not only open source projects.


Github has been improving on this. I felt like they were rolling out the features I wanted right as I was enumerating them!

They have

- Auto-assign reviewers

- Pings for people (unsure how sophisticated)

- Auto-merge

The review conversation itself is the big issue.


To add to your list:

> Reviewer able to make tiny changes to the code without a further round trip to the developer. Things like "missing full stop here" or "Please put a comment warning about X here". Developer should get a heads up that their code has been merged with changes, just so the tiny changes don't go unreviewed.


I feel the hard part is putting automation around this.

I loved it when I worked on a small, high trust team where we were comfortable with each person making the decision on whether it was safe to do a post-merge review or that it needed attention first with a pre-merge review.

EDIT: Something I have liked at a couple of my companies was a "cowboy commit audit" where we allowed people to directly push, but doing so would ping the code owners and they would have to review to clear the state. People understood they shouldn't but also understood when it happened.


Github suggestion comments usually do this pretty easily. You can prompt the submitter to accept a diff. It does have two annoying problems though:

- You must use a browser editor, no attaching a diff you generated elsewhere. Anything more than 1 line is hell because Github's editor gives no fucks about running gofmt or whatever.

- If you request a change, approve the PR, and then the submitter accepts your change, it dismisses your approval.


i'm a bit ambivalent about this sort of idea - reviewers adding "tiny" changes without extra reviews could be a place where bugs leak in. what's tiny or insignificant might turn out to be big - that's why code is reviewed by multiple people!

I think for style formatting, making it an auto-format step during commit/push is the "better" way. Codified formatting means you argue and hash it all out once and be done with it forever.


Nothing stops an organisation having standards for what is sufficiently tiny to use the process...

Eg.: "Touches comments and whitespace only". "Resulting compiled binary does not differ". "Variable and function renames are okay". etc.

Remember that with the original developer still able to review, they can always rollback a change they weren't happy with.


i've heard it said that pair-programming can alleviate the need for doing code review... or at least the typical in-depth reviews that usually become bottlenecks...

any experience with that?


My primary experience with pairing was not ideal. It was being used in a mentoring capacity rather than when people are more equal. I also found that the person doing the mentoring usually drove, which meant that the mentee just saw things whizzing by on the screen and had no idea what happened, and even when they did, they didn't have the first-hand experience to commit it to memory.

I do remember a situation of hot seat programming where we had a critical problem right before release that crossed domains and three of us worked at one desk consulting the others or swapping out as needed. Again, it was worth it because of the urgency and priority but not something worth doing generally.


> the mentee just saw things whizzing by on the screen and had no idea what happened

That's a bad pairing experience, and I've had a handful of those in my last five years of pair programming.

On occasion I've said, "Look, why don't you tell me what to do, and I'll drive?"

But I think that the larger issue is that some developers are better at solo'ing than pair'ing, and there's nothing wrong with that, and at the company I worked (Pivotal), we'd find positions for the people who were more inclined to solo.


An easy fix is to have the mentee drive.


Now navigate to the FOO service

FOO is the background event scheduler, it's connected to BAR via RPC

Ok don't worry about what that is for now, just have a look at the service logs

So it's a daemon that's running on port 5000

You'll have to tail the logs in /var/foo/logs

So type T. A. I. L, and then dash f

No, the hyphen character, not literally dash

And so on and so forth. It's not that a person more senior than you can't drive, but I have found that a lot of the time the implicit knowledge of so many systems and topics and accreted domain knowledge can really be a productivity drain. The best pairing usually comes from equally matched individuals.


A software engineer at a tech company that doesn't know about commandline flags? That doesn't seem very realistic.

Anyway, this person will learn that dash means - , and not make that mistake again. It's a one-time mistake. So I think the pairing helps even in this case.


> A software engineer at a tech company that doesn't know about commandline flags? That doesn't seem very realistic.

Why not? 95% of most people's work can be done in an IDE/text editor, and that last 5% is probably very exploratory anyway. I work in games, and gameplay programmers work almost exclusively out of IDEs with GUIs for source control and most other dev tools etc.

> Anyway, this person will learn that dash means -, and not make that mistake again. It's a one-time mistake.

Disagree that it's a one time mistake. On any given day/task there might be 5/10/15 of these: dash, ctrl + r, tab complete, !!, different shells, shebangs, flaky backend services, etc etc. If you want to teach someone how you work then yes this might be appropriate.

> So I think the pairing helps even in this case.

That said, I agree :) It depends though, is your goal to bring two programmers up to speed or is it to crunch through tasks right now? Because pairing helps with one of those goals but not the other


Okay, solution:

teach the person to use a computer first


Have you ever paired with a junior before? This is definitely the kind of thing you would encounter with a fresh grad/intern.


Sounds good -- I know that you're a great teacher. You'll still have those features done by tomorrow, right?


I read your parent as "that guy should not be anywhere near programming, he needs to learn how to use a computer first".

Okay I might exaggerate a bit but if someone literally writes 'dash' at that point, don't pair on something that has a deadline like that.

Do take him aside and let him know he's got to brush up on the damn basics and that you're there for him through it, but you're gonna finish that feature with a deadline first while he takes a stab at that easy bug over there.


Exactly. The easy fix is not to have the mentee drive.


Tell the PM that we need to go slower to get faster.


Pair programming can't alleviate the need for code review. Good pair programming may reduce the amount of changes needed, and thus speed up the review phase, as a secondary effect of increased code quality.

Bad pair programming will suck up time by occupying two people at reduced efficiency, and won't lead to better code.

I've had some really great pair programming experiences, but we as a team aren't mandated to use it. We see it as a tool to pull out when cross-domain knowledge is needed, or when facing a particularly thorny problem. If you get the right two people together, they can fill in the gaps of each other's knowledge. The person not driving has more mental capacity available to think through potential alternatives.

However, the success of pair programming depends as much on the personalities of the individuals doing the pairing as anything else. They need to function as teammates, without any ego issues or any other thing that may prevent either party from feeling fully comfortable sharing their knowledge. Trying to pair program with someone who is defensive about their code/abilities is a true nightmare.


I've had great experiences with pair programming (and also bad experiences, but this is assuming a great experience), and I'd still want code review. The benefits can actually stack, if not compound, often enough.


I suspect something missing in pair programming that you get in a code review is someone stepping back and looking at the change as a whole, rather than the incremental bits as you go along. It can be helpful! I know at least when reviewing, there are things I miss when I only look at the update vs looking at the PR as a whole.


Pair programming doesn't scale for all the purposes of code review. Pair programming can be great for bursts and unblocking when there is good alignment, but code review serves as an enduring artifact for the whole team including future team members.


The way I'm used to pair program is that the pairing counts as a much more intense review than the typical corporate review case.

So the whole waiting aspect is completely gone.

Sure, this can have bad results with less skilled and/or serious people. But that's always true.


Once a review is done, it can help to sit with the person to avoid an endless back & forth via whatever app's hosting your repo. In my experience developers usually just put aside 1-2 moments per day to spend on reviews, so if there's 3 back & forths that can mean the changes don't get accepted for several days. Sometimes for just a few lines of code.


Well that's obvious, because the person who did the pairing is already pretty familiar with the code, so even if you do perform a review it will go fast.


that's like saying meetings alleviate the need for emails (documented decisions). Ideally you need both, but pair programming/meetings need to be scheduled, with all the associated problems; async communication doesn't. If you have to pick one over the other, then obviously that's async communication. A meeting without minutes is a meeting that didn't happen.


At my company, we do exactly that using Github+Slack integration. Github Codeowners + Github Teams lets you auto-assign teams as reviewers. When the team is assigned, a random person from the team is pinged using round robin. Furthermore, with Github Scheduled Reminders you can get notifications in Slack according to some schedule and/or when some event happens, e.g. your PR was approved, got a comment or a mention, etc.


>Auto-merge-on-green-checks

ADO has that.

> Post, wait, CI failure, fix, post

This is the worst, especially when the CI system is homegrown, rickety, and difficult or impossible to reproduce locally.


When I was doing product evaluation at my last company, I was torn between AzDO and Github. After being burned by the lack of progress on Phab (before it went away), one of our criteria was what I called a "living platform", meaning that it either has a vision that aligns with us and progress is being made, or there is a vibrant community where we can collaborate. AzDO has a lot of good features but it doesn't seem like it's getting much attention, and everyone has been keeping their integrations with it private, killing off collaboration.


> This idea isn't new, and if you've worked at a company like Facebook, Google, Uber, Dropbox, or Lyft (among others), you've either had first-class support for stacked changes built into your code review platform (i.e. Phabricator at Facebook...

Phabricator got a lot of stuff about code review right (I may be biased from working at FAANG). I'm happy to see someone trying to improve the code review situation on GitHub. When I did use GitHub at the startups I worked at, we were never fully satisfied with the code review experience.

I hope they surface "Phabricator versions" as first-class objects. I never liked how GitHub just clobbers the history left behind by force-pushes to a branch.


Gitlab got this right, Merge Requests are versioned and you can look at the differences between versions of a MR.

It's somewhat amazing and shocking how little the review experience on GitHub has improved over the last 10 years.


Honestly what i want is for github to purchase https://reviewable.io/ and make it native.

Maybe gate some of the more advanced features behind an opt-in so beginners are not too overwhelmed (and because you probably don't need them on smaller projects), but the current state of things is just appalling, and more importantly has not at all kept up with workflows on large projects or long / complicated PRs.

The complete lack of support for reviewing over force pushes is insane.
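Locally you can at least approximate review-over-force-push with `git range-diff`, assuming you still have the old tip around somewhere (the branch names below are placeholders):

    # compare the old and new versions of the series against the same base
    git range-diff origin/main old-tip new-tip

It pairs up the old and new commits and shows a diff of each diff -- a workaround, not a replacement for the platform actually supporting it.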


The ability to quickly manage branches and reorder commits within them in Mercurial is super powerful, something that Git and `rebase -i` could only dream of.

Doing the same in Phabricator is awesome. One blocking commit PR on the bottom of a stack can easily be moved to the top or a new stack, unblocking the rest of your changes. Re-writing an entire stack due to one small change is mostly eliminated.

(Good) Tooling is a major gain in efficiency.


> The ability to quickly manage branches and reorder commits within them in Mercurial is super powerful, something that Git and `rebase -i` could only dream of.

This is a git CLI wart more than anything. Magit (git interface in Emacs) makes juggling branches/worktrees and rewriting history intuitive to humans. I feel super-powerful in Magit and absolutely feeble using the CLI.


I completely agree. When Google decided to invest in a mercurial front end to their version control system a few years ago, I was confused, but after using it for a while, I can honestly say it has much better ergonomics for common workflows. Creating a tree of dependent commits is simple and trivial. I sometimes wish I could create a dag instead, but that's probably more trouble than it's worth.


One thing Phabricator did/does worse than GitHub is the interface for browsing the individual commits that make up a PR. I like being able to tell a story with the sequence of commits and their messages.


I honestly don't feel like Github did a good job there either, I mean at least on the web UI because the VSCode Github extension helps a lot (when it's not buggy).


Gerrit is the only thing I've seen do change exploration well.

Github is annoying as crap. Can't even comment on a line in the file that needs changing but was missed in the PR, can only comment on actual changes.

Phab is god awful, it's like someone built that whole system with a purist mindset, everything is perfectly designed with perfect referential IDs. But every screen is a view into the db table, no screens join information and make it usable. This is worse when you have n:m relationships because they expose those tables via screens too, it's just plain awful.


> Phab is god awful, it's like someone built that whole system with a purist mindset, everything is perfectly designed with perfect referential IDs. But every screen is a view into the db table, no screens join information and make it usable. This is worse when you have n:m relationships because they expose those tables via screens too, it's just plain awful.

I actually like Phabricator, but the DB schema is insane. Like, literally insane. I spent hours trying to figure it out once before giving up in disgust.


The actual observed problem in the article is that the team's SDLC pipeline can't handle the current flow it's being subjected to. Work is getting "blocked" in code review.

Devising a coping mechanism for developers so that they can increase the inbound flow to an already overloaded system is not actually a solution.

The real solution to this rather common problem is to introduce a culture where "done" means "shipped" and not "I created a bunch of PRs for my team mates to read and while they all do that I'm gonna create some more", and to use tech where you can to increase the flow the SDLC pipeline can handle.

This involves things like moving as much "code review" into the compiler/linter as you can, doing a little more design/consensus building upfront, streamlining all acceptance testing and deployment steps to their bare essentials, etc...

https://trunkbaseddevelopment.com/


None of which works well if your reviewer sits in another timezone and your next business day is their public holiday. Waiting 3 days or more is not an option when I could just comfortably stack my changes and have the whole stack reviewed whenever my reviewer is available.


Why would reviewing have to be done by a specific individual? If reviewing is the blocker, you need to raise the priority and allocate more resources to it.


For me, code reviews fall into two categories:

1. Just pass it by a second pair of eyes to make sure I'm not doing anything really dumb or nefarious. Here I don't care who reviews it, and in my team the person currently on-call is usually the choice.

2. I'm working on something complex or that I'm not yet familiar with. In that case, I want a specific person, who I know is familiar with the technology or the project.


I work for FAANG and making changes to code owned by a team in the US is very common. However I'm from Europe and nobody else around me can review that code.


> Waiting 3 days or more is not an option

Waiting for more than 20-30 minutes for a review is ridiculous and means some combination of three problems are present:

* commits are too big to be reviewed quickly

* Tooling/testing is not catching a lot of blatant problems before the pull request happens.

* Review process has turned into some bizarre management ritual


Or:

* Developers on the team value their sanity and timebox chunks of time for deep focus work, leaving only parts of the day for reviews. Some developers may even take vacations or go to talk to customers, leaving a small team with less review bandwidth.


> Waiting for more than 20-30 minutes for a review is ridiculous

Does your entire team just sit around doing nothing waiting to review your code? That timeline is asinine.


I don't know that it is optimal, but our team generally batches PRs once per day, to avoid interrupting people. So typically it's going to take about a half day for a review, as different people end their day at different times.

Exceptions of course, but this seems close to optimal to me (from the reviewers perspective).


If reviewing is the bottleneck for your team then having some slack where people sit around waiting to do reviews may well be the most efficient use of time (e.g. having a reviewer rotation where each week one person is a designated reviewer).


Do you have a dedicated review team?


I suspect the parent is saying something like "much code review is not necessary, can be replaced with automated checks". So once you remove the 80% of code review that isn't actually needed, the review capacity is 5x more.


That is kind of my point. Review capacity is meaningless if external factors require a certain review _latency_.

Even if the actual review takes 30 seconds, it does not matter if my reviewer is asleep for the next 8 hours.


I really didn't understand what the big deal was with Stacked Commits when I was just working in vanilla github, and it used to cause friction with members of my team who knew the stacked workflow, but didn't know the Github workflow. After moving to a company that uses stacked commits I will never go back to the pull request workflow. I think there are some famous people saying that Pull Requests aren't Code Reviews, and that becomes obvious once you start using stacked commits for code reviews.

This is essentially how cli git is set up to function, and mailing list driven workflows use this concept to have small incremental reviews. I am excited to see this tool, but honestly I have soured so much on github because of the lack of code review that I would just as soon it didn't try to shoehorn into pull requests, and just did what people do with Phabricator: optionally use GH to host the git repo if you need to, but do everything else in a different tool.


I still don't understand the big deal so maybe you could help: what is the difference between stacked commits and pull requests targeting branches in other pull requests?

I've used GH for code review so I'm not sure what's missing there, either. Reading this whole comment section leaves me a bit confused, like I'm the only one using a hammer to pound in screws when everyone else has moved on to a screwdriver.


The big win for me is when someone reviews my code I can easily rebase the entire stack and push it back up for review, and that's all done with `git rebase -i HEAD~n`. That means when someone is actually reviewing my code it's not uncommon for me to change something somewhere near the base of the stack and push up changes to the entire stack. You in theory can do that with branch based workflows but it's a pain. As I understand it, this is what graphite is automating.

Once you can easily do that, it naturally leads to being able to do review per commit, which makes it so easy to unblock your work because you're not fighting against git, and you aren't worried about getting immediate reviews because I know if any changes are requested it's trivial to make those changes, and I still have branches, so if I do have unrelated work I can still use branches to isolate them. The last thing is another tool feature: when I rebase and push up changes I have to give a comment on that new commit so reviewers can see the changes that caused the new revision.
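To make the mechanics concrete, here's roughly what editing a commit near the base of the stack looks like (the stack depth and branch name are just examples):

    git rebase -i HEAD~5          # mark the offending commit as "edit"
    # ...make the fix...
    git add -p
    git commit --amend
    git rebase --continue         # replays the rest of the stack on top
    git push --force-with-lease origin my-stack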

Honestly I was a heavy user of Github and GitLab having instituted one of them at three different companies, and built out CI pipelines and development procedures for multiple large engineering teams, and I now feel like I was using a hammer to pound in screws and I will never go back.


Could you elaborate what you mean with Stack? I'm confused. Is it some kind of logical grouping of commits that aren't branches?


Correct it is just the group of sequential commits. You can keep multiple stacks on a branch, but the big idea is the branch is not the unit of review, each commit is the unit of review


I'm curious, how is this achieved?

The way I logically group commits is by stacking branches which are merged without fast forward. Works well with interactive rebase.


I think I would need to see it in action to truly understand. Reviewing commit by commit sounds good to me, though. I do try to keep my commits focused, reviewable, and "complete" (basically shippable) so that part would fit with my current workflow.


GH will also automatically retarget the dependent PRs to the `main` branch after the dependency PR is merged. I usually just make a note in each stacked PR, "Depends on #N", mostly so that the review and merge sequence is followed. But even this would be clear on its own by the PR number and target branch.

I agree that the PR code review process leaves a lot to be desired, but this particular workflow doesn't seem like a big problem. Though I've never used Phabricator and similar tooling, so there must be something we're missing.


Only for branches in the main repo, not those in forks.


It's equivalent but horrible to actually use. Try managing a stack of 40 that way.


That sounds miserable no matter the tooling tbh!


It's actually fantastic with the right tooling. Keep in mind that the Linux kernel works this way too, with long series of patches.


I figure I can probably keep two or three problems in my head at once before I start making mistakes, and my branches stay in my head until they're merged or abandoned. 40? It'd be impossible. Maybe I just have a weird brain though (or am just not a good engineer, based on what I've read about stacks elsewhere since reading this post.)


You wouldn't do 40 completely unrelated things, they would be 40 aspects of a single thing. I would think 40 is getting a bit big as I have trouble getting stacks of more than 15 past a reasonable review, but the good news is (at least with some tooling) you can commit partial stacks, so I can let my team review the first 10 while I am working on 20-30, and everything still just works (even after I merge the first 10 into the main branch)


The stack of 40 is usually quite related and often would be a 2-3 PRs in GitHub -- it also lets you perform pipelining where you can add new commits to the end of the stack at the same time as merging commits earlier in the stack.


Even when the tool support it, I'm not sure it's a good practice.

Small code reviews often miss the forest for the trees. When you make the extent of the feature changes even harder to mentally trace back and put together by having a diff over another set of code that is itself an unapproved diff, the code review becomes pretty low quality.

I've also done this where my change that depend on other changes get approved, but then the other set of code gets critiqued so I have to change it and that breaks the code that depends on it so I have to change that as well, and now I still need to get it all reviewed again, so really I just wasted someone's time the first round.

What I do now is that I make commits over it in my branch, so I keep working while they review my first commit, and when that's approved I send the next commit to be reviewed, and so on. If there are comments I do an interactive rebase to fix them.


Google had a good paper on the benefits of smaller changes: https://sback.it/publications/icse2018seip.pdf

> A correlation between change size and review quality is acknowledged by Google and developers are strongly encouraged to make small, incremental changes

I hear you though about wanting to maintain context within a stack. I think the best balance is tooling that both lets you create a stack of small changes, while also seeing each change in the broader context (forest).

The benefit of breaking out PRs instead of commits, is that each small change gets to pass review and CI atomically. I like your strategy of gating commit pushes until the first is reviewed, but I think the dream is to decouple those processes :)


Having worked in startups for most my career, and now recently joining a BigTechCo, I think some of the cross-talk comes from the commingling of concerns within a PR. PRs are used to review architecture, enforce security/code-conventions/style, check for logic bugs, catch potential conflicts with other efforts, and validate the logic of a feature.

In BigTechCo you have more coordination problems, so you want to merge code to trunk as soon as possible to prevent integration risks. But that means you drive discussion of the architecture and features to other venues than the code review. Code review becomes more of a "will this break things" check, and enabling a feature-flag is more the review stage for the feature itself.


Thanks for that comment, now I understand why BigTechCo's like stacked changes.

If reviews are only "will this break things" I would gladly review their big PRs :D


Hum, thanks for the study link:

> Previous studies have found that the number of useful comments decreases [11, 14] and the review latency increases [8, 24] as the size of the change increases. Size also influences developers’ perception of the code review process; a survey of Mozilla contributors found that developers feel that size-related factors have the greatest effect on review latency [26]

I'll have to dig into those. It makes sense that smaller changes decrease latency, but I'm curious how they assessed useful comments.

That said, I'm not saying to make massive code reviews, because reviewers are lazy, they won't want to spend more than an hour to review no matter the size of it. But when the reviewed code is made on top of code that itself has not been reviewed, the quality of the review suffers in my opinion.


That's what we do as well, but we've got merge block labels. If a PR depends on another PR, we put a merge blocked label on it and reference the PR in the comments. We've got a GitHub action which doesn't let a merge happen unless zero labels are present.


I've worked at both FB and at smaller companies. The "stacked diff" workflow is quite good, although it can be replicated via Github's PRs via "logical commits." The basic workflow is:

1. Each commit is a Phabricator-style diff. Put the whole idea into a single commit via `git add -p` and rebasing.

2. Multiple ideas/diffs for the same feature? Split them into multiple commits on the same PR.

3. Review the PR commit-by-commit: you can click on the links in Github to the individual commits and see+comment on the changes there. Comments on commits automatically propagate back to the PR FWIW, so this works pretty well.

The mental model becomes: the PR is the overview of the full feature, but not how reviewers are expected to review. Reviewers review commits.

It works pretty well! Assuming you have buy-in from your team to operate this way. In fact, I actually somewhat prefer it: with "stacked diffs," it can be harder to get the birds-eye view of the entire feature's set of changes. Git and Github are very flexible, so in practice I find you do need some established workflows that everyone agrees to follow, or else it's chaos.
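If it helps, the day-to-day mechanics look something like this (the commit messages and the autosquash step are just one way to do it, not the only one):

    # stage and commit each idea separately
    git add -p
    git commit -m "Add the new data model"
    git add -p
    git commit -m "Use the new data model in the handler"
    # later, fold review fixes back into the commit they belong to
    git add -p
    git commit --fixup <sha-of-commit-being-fixed>
    git rebase -i --autosquash main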


This should describe commits for a normal PR workflow too though - each commit is the smallest deployable unit with a meaningful title and description. I guess squashing merges by default allows / encourages people to put less thought into individual commits since they aren’t expected to become part of the permanent history?


But, when using GitHub, isn't the missing puzzle piece here (which Graphite and other "stacked review tools" mentioned here attempt to solve) that you then cannot just merge those individual PRs one by one – you either have to merge the whole PR or nothing? Which leads us to the second option: to use multiple dependent PRs, which causes its own issues explained in other comments.


Stacked diffs in practice were often deployed as a unit at FB, in my experience, with the "Ship stacked diffs" button. There's maybe a little value in merging individual commits in a PR, but I think not a lot: if all the commits aren't in master, the feature doesn't work anyway. Technically you get a little bit of an edge from merging individual commits if earlier ones are acceptable and later ones aren't, in that those commits land in master faster and are less likely to have to deal with merge conflicts later — but in practice I didn't find this to be a dealbreaker. Most PRs weren't enormous, and most files weren't subject to many engineers modifying the exact same lines concurrently, so overall review time didn't usually cause too many conflicts — and conflicts usually weren't hard to solve anyway.


The "stacked diff" workflow means each diff lands as a separate squashed commit on master, even you click Ship Stack to land all of them in one CI run. This has advantages for code archaeology reasons.


I think his/her perspective, and mine, is that we don't want to merge individual commits.

My personal feeling is that I need to review each commit and then I need to review the whole PR (list of commits). If both are good then the PR can be merged but each individual commit is not OK to merge unless it's part of a PR.


> you either have to merge the whole PR or nothing

That's not true, though. You absolutely can merge a single commit, via git push, there's just no button in the GitHub UI for it.


> You absolutely can merge a single commit, via git push, there's just no button in the GitHub UI for it.

Well, I'm assuming we are talking about PRs against some protected branch where manually pushing changes directly is forbidden.


But you can't amend a commit without destroying the old version of it.

Stacked diffs run in two dimensions. In one dimension is the stack of diffs -- logical incremental changes to the code, which make sense to code historians from the future. But each diff in the stack also has a history as it progressed through code review.


Second.

Squashing and amending commits is equivalent to the workflow described in the article.


Usually when feature-branch-1 is still under review and feature 2 relies on feature 1, I just open a PR from feature-branch-2 onto feature-branch-1 to get the review process started, knowing that feature-branch-1 might still change which I can deal with by rebasing.


I came from Airbnb (vanilla github) and some friends converted me to a stacked workflow. I think the challenge with vanilla git is that rebasing and re-pushing becomes a brutal process if you stack 3+ changes. (At facebook people regularly stack over 10). Beyond recursive rebases, Github offers no support for navigating your stack, merging in a stack of approved PRs together, adjusting merge bases, etc.

Most of these practices are just inspired by what Phabricator and a few of the big companies are already doing well :)


>(At facebook people regularly stack over 10).

Meaning branch 1 depends on branch 2 which depends on branch 3... all the way to 10?

That just seems crazy to me. What happens if branch 4 changes their implementation and borks everything depending on it?


Glue code is pretty easily split up into a stack. 90% of what I’ve worked on has been glue code.

To ship your change you need to update what’s there already. Tool A needs an integration point. Tool B needs a hook added. Tool C finally joins A to B.

Implementing the bits for A and B found two useful refactorings in each codebase. At the end of it all there was also some dead code.

That’s 8 commits. The downside is you end up yak shaving a lot.


You have to fix a lot of merge conflicts. I used this workflow at AWS -- code reviews can take some time, you're writing a lot of code, and you want your CRs to be small.

If someone leaves a comment on the CR for the first branch you fix the issue and rebase. It's not fun, but this is the only effective way to do it (that I know of).


So are the 10 branches mostly yours then? That seems slightly less crazy.


Yup. I rarely even pushed the branches — they’re local branches


Many FB devs don't use branches or maybe use one branch with the entire stack.

What you're describing is sometimes a problem but often not really one.


Are you sure about merge bases? I regularly use GH to specify a branch that is part of a different open pull request.


Yeah this shouldn't be an issue when following standard git branching practices. It's kinda the point


One problem with GitHub is that you can't do this with branches from a fork.


Can you be more specific what you think can't be done?

I just tried it, and I can without problem open a PR from a fork against an arbitrary branch of an arbitrary fork (or main repository).


You can't open a PR in the parent repository against a branch in the fork. You can create that PR inside the forked repository, but that doesn't allow the maintainers of the parent repository to do anything with it. So if you've got two stacked branch in a fork, you can't open two PRs for them in the manner GP described.


but if everyone works at the same company they can create the branches in the same repo


Zuul is an open-source CI/CD/project-gating-system that implements this for Github, GitLab, Gerrit, and pagure. It actually lets you stack dependencies between any of those systems. Just say "Depends-On: <url>" in the PR.

See https://zuul-ci.org/ and https://zuul-ci.org/docs/zuul/discussion/gating.html#cross-p...


Zuul is great. After using it for about a year in production, I must say Zuul is everything I hoped for and more (coming from a Jenkins deployment previously.) Some of the concepts take a bit of effort to wrap my head around but once I understood them they make a lot of sense. Leveraging of Ansible is great. Making jobs cheap and inheritable is awesome.


As the parent poster well knows (as they invented it) the trick here is that you write your jobs to install from a checkout the CI system does for you.

https://opendev.org/opendev/system-config/src/commit/0a27974...

is really nice practical example. What it does isn't even really that important; but it runs a simulation of deploying production code in OpenDev. This is a "devel" job, we deliberately test any changes against all the latest HEADs of the master/main/development branches of projects we use. This is a non-voting job -- a notice that what you're introducing might be fine now, but there is trouble brewing when our dependencies release their next version. Sometimes that's fine and a known issue, or upstream is broken, and other times it's something totally unique and needs to be fixed (this is why, despite AI being able to write code for you, so far the implications of using that code still need a human in the loop :)

The "required-projects" in the job definition tells Zuul what repositories this test needs.

You can clearly see how it handles various projects having "devel", "main" or "master" branches to pull from to get their latest versions.

In the "vars" section you can see we're setting variables that get passed to the job roles flagging "this is the devel job, don't install the latest release but use the source Zuul will checkout for you from here".

The amazing thing? If I make a change and this job fails, I may debug it and find that it wasn't actually my fault, but something in upstream Ansible committed recently. I can propose the fix upstream. They do all their CI, and that's fine. But I now put in my change comment

Depends-On: https://github.com/ansible/ansible/pull/1234

and magically Zuul will recognise that I want to apply that pull request to the Ansible tree in testing and set it up for me (as noted, Zuul can do this for all sorts of systems, not just github). Additionally, Zuul will not merge the change until the dependency is satisfied -- I can NOT commit broken code!


This is awesome! We should add support for this to the CLI, I took a note


For anyone interested, I've been collecting notes on various tools in this space:

https://github.com/epage/git-stack/blob/main/docs/comparison...

(granted the page doesn't mention git-stack since that is assumed)


Do you have a favorite?

I've struggled with this problem for a long time, but never knew that there were tools to solve it.


I'm biased as the author of git-stack but struggling with this problem is why I wrote it. When I started it, I didn't know there were so many tools in this space. However, if I knew then what I know now, I would have still written it. Too many tools require adopting a specialized workflow which doesn't play well with everyone's situation or other git tools. Some do but they require manually defining each change which is too much boilerplate for me.

My goal with git-stack has been to automate my PR workflow and sit alongside other git tools. I've even tried to keep it open so you can use parts of it with Phab or other tools. You should be able to start using it out of the box with a standard PR model. It'll have better performance and behavior if you configure which branches are shared branches and mark them as "protected" in git-stack. If you don't use a PR model, then you might just not run `git stack --push`, but the rest should still do a reasonable job.

If I were to pick a tool besides git-stack, it would either be git-branchless or Graphite.


> How do you tell reviewers that your reactions PR depends on the comments PR?

It's at the top of the PR.

"someone wants to merge 1 commit into random-branch from another-random-branch"

> How will reviewers see the relevant diff for the reactions PR (excluding the comments PR changes)?

It will show it by default? If they want to see comments PR changes, they will need to go to comments PR.

> How can you propagate changes to the reactions PR if you ever need to update the comments PR before you land it (i.e. to address review comments)?

Use git rebase --onto.
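For example, with the comments/reactions branches from the article (a rough sketch; `<old-comments-tip>` stands for whatever commit the reactions branch was originally based on):

    # comments was rewritten to address review feedback;
    # replay only the reactions commits on top of the new version
    git switch reactions
    git rebase --onto comments <old-comments-tip>
    git push --force-with-lease origin reactions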

It would be nice if the marketing page told prospective users what exact difference their tool provides - since I'm already using that workflow and have never seen one of those internal tools.


Hi, sadly no marketing people at this company. Just engineers and product people.

We were trying to get this to work on Github for almost a year before we decided to just build this ourselves https://twitter.com/TomasReimers/status/1325647290128850950

Going through everything that's different is hard, but I'll try: When was the last time you made a stack of changes that was over 30 branches tall? Why?

Smaller PRs are generally agreed to be strictly better, and you've probably written a 2000-4000ln pr at least once (which could have been 30 100ln PRs). So why did you not break it out? What part of the tooling was broken?


> When was the last time you made a stcak of changes that was over 30 branches tall?

Is this a thing? I think the highest stack I've ever had was 2. I suppose it could depend on company culture/tech stack/code organization (monorepo vs. services).

> 2000-4000ln pr at least once (which could have been 30 100ln PRs).

I can't say I have. Maybe deleting a lot of code where it's very obvious what is being deleted and why. What sort of work takes 2000 lines that can't be broken down into smaller, incremental pieces?


You can break down a big feature into smaller bits of code, but what's the point of releasing some new code that's not being called by anything yet, when you might change it as you converge on the final solution?

What's the point of a PR that's just an HTTP handler + adapter that's not exposed via an endpoint yet, and might change once we implement the related domain layer methods? What kind of context will a reviewer have on whether this is a decent handler + adapter when there is no other context about which methods the handler will call and what data will be returned?

If we release the domain layer first, we may want to adjust the domain data model in order to better serve the application layer, or work better with the infrastructure/datastore layer.

And so on.

IMO the PRs need to be big enough to encompass an atomic, complete unit of functionality or else it's basically impossible to tell if the approach and code is any good.

"It doesn't break anything" is not good enough, you can still create tech debt by releasing code that may need to be changed to complete the feature. Or someone will just implement the rest of the feature in a worse way in order to integrate with the non-ideal code that you already released when you didn't have a good enough view of the final solution


The point is that others see that it's there, they can start to take it into account when they develop their solutions.

Of course if there are problems with people depending on not yet final code that seems more of a higher level communication issue than git workflow issue.

Again, if an atomic unit of functionality needs huge towers of stacked PRs, that indicates that they are not really atomic and the design phase (communication) seems to be missing.

After all the reviewer should know about the design, the HTTP handler + adapter should not be a mystery to the reviewer. And so on...

...

Now, that said, sometimes looking at a big PR as whole takes up less review time than in daily/weekly increments. But if the reviewer only sees the thing late in the dev process plus the design is not agreed upon then that means the developer writes a lot of code that might go straight to /dev/null when the reviewer disagrees with the fundamental design of it.

The assumptions needed for these huge dangling WIP PRs to make sense seem very strange (risky and suboptimal) :o


It doesn't have to be an either/or. The method that I recommend to my reports is to develop the feature in a large WIP PR branch, get some rough reviews on the macro architectural choices in that PR, and then once you have a greenlight on the larger approach, break it out into smaller PR's that each represent a smaller change necessary to effect the feature.

There are definite benefits to shipping less code to production in one atomic deploy. When something breaks (and it will), you want to wade through as little code as possible while things are burning in order to identify what went wrong. Sure, you could just revert everything that went out with the last deploy, but sometimes things aren't that straightforward. Say when there are other sibling changes that are not easily roll-backable that you'd have to clobber to do a pure rollback. In these scenarios, it may be safer (albeit slower) to revert your individual change and deploy the revert by itself.


I get what you mean, but rolling out in chunks can make it even harder to pinpoint and revert issues.

If you roll out chunks A, B, then C, and things broke after C, the issue can still be in A, it just doesn't get exposed until it is accessed in a certain way by C.

How quickly can you determine which one(s) to revert?


You revert C, because that's what caused the issue. Then you investigate.


Better yet when possible, you turn off the feature flag that enabled the code path.


I’m working on having a large scale open source project support a new SQL database (Postgres) and the migration work is about 2500 lines currently since I have to touch basically every part of the project. I don’t want to do the migration piecemeal since turning on the Postgres unit tests for only sections of the code base and keeping track of that and worrying about all the dependencies of code changing as other people touch the project is a lot more overhead than doing it all waterfall style.

Now, I will say that these type of things are not everyday kind of PR’s, but they have certainly happened several times over the course of my career. Probably once or twice a year if I had to take a rough guess.


I think you have a useful feature, but needing to get someone else to answer what it is in a comment (https://news.ycombinator.com/item?id=29256481) isn't a good way to explain the product.

Regarding your questions, the most I've done was 4, usually at most 2. My PRs are single commits that I amend and force-with-lease after getting useful code review feedback. I don't think rebasing one after another is particularly annoying, but I think not having to do it is a nice feature. I wonder whether it's a product tho. I wish you success anyway.


Parent did not imply that you have or should have marketing people. Your first sentence comes across as defensive. You should express thanks for this feedback and avoid sounding defensive, as that is alienating to people, stifles dialog, and casts you in a weak light. This is your moment in the HN sun, soak it up!


That did not come across as defensive at all to me. It was pretty clearly tongue in cheek. Your comment, in fact, comes across as judgmental and condescending


Thanks. I can see that I could have worded my feedback more tactfully, and will do so in the future.


Oh hey, I'm sorry! :(

That sincerely was not my intention, and a lot of context and playfulness can get lost in comments. For what it's worth, I was simply trying to make a joke about why our marketing materials may not be as good as they should be and continue the conversation. I apologize if that came off as defensive and will make sure I double check my future comments so that they don't come off in the same light.


Ironically, this reply is also an apology!

No judgment, I just found it funny.


No worries! My intention was just helping you all look your best -- which you are doing with this friendly reply.


Thank you for the feedback :)


The docs for the CLI and dashboard go into some more depth - might be what you're looking for?

- https://docs.graphite.dev/guides/graphite-cli

- https://docs.graphite.dev/guides/graphite-dashboard

Re `git rebase --onto`, the open source CLI offers a recursive implementation to prevent you from having to carefully rebase each branch in your stack (https://github.com/screenplaydev/graphite-cli/blob/main/src/...)

One of the reasons you can't use a simple rebase --onto is that you don't want to accidentally copy all downstack commits between the merge base and what you're restacking onto. The CLI tracks branch base commits through git refs to ensure that restacking never duplicates commits.


>Re `git rebase --onto`, the open source CLI offers a recursive implementation to prevent you from having to carefully rebase each branch in your stack

Thanks - you've answered what I've been looking for and I'd prefer that this information was in the marketing copy.

I agree that this would be a useful feature. I don't think this is a product though - I don't see myself doing more than 3 or 4 stacked diffs when this functionality would really shine.


As far as I’ve used it, git rebases ignore duplicate commits.


They are not duplicate if you change something (like, using amend) in one of the preceding branches.


yes that’s a possibility too, but then the typical merge resolution rules should apply. If there are common areas modified that is going to incur a manual review. There’s no magic automation I’d really entrust to fixing that.


I second that. It's a bit surprising to me that teams could have such big PRs and branch stacks. I can only say from my own perspective, but I imagine this kind of nightmare would be our reality now if we didn't pay attention to improving the way we work and use PRs along the way.

PRs work very well for us, and if we find ourselves stacking them, then we keep an eye on getting them into master (yes, we don't use the 'main' bs) in reasonable order. Then Github is actually doing the job for us, changing the target branch once the underlying changes are merged into it.


Merging a branch into another branch is fine, but then you have to merge that other branch up into main. And that shows effectively two branches' worth of diffs, which really muddles things if some of the changes have already been reviewed.


As long as you review the 'comments' branch first, merge it, then move on to reviewing the 'reactions' branch it should be fine.


> The blocker here isn’t Git. In fact, the reason you can’t go with option #4 is that your code review tools probably don’t support it

I do stacked commits (branches off branches) all the time. Github certainly supports PRs based on such.


I still haven't found a reliable way to prevent merging branches that aren't the first in the stack :/


(Graphite engineer) We had the same problem. Coded in a feature that prevents you from casually merging branches that are not bottom of the stack :)


What I usually do is if I need to stack more PRs I open them as a draft to avoid accidental merges. I then convert each PR to a non-draft as I “pop” the stack.


How do you deal with rebases when you have to?


Like so:

    git switch offshoot
    git rebase --onto feature branchpoint^
Will replay your offshoot branch onto its underlying feature branch after the feature branch has been rebased. The branchpoint is the first commit that diverges from the feature branch.

Finding the branchpoint commit is the only real annoyance here -- if there's a ref or a better trick for finding it, I'm all ears.


My usual workflow to rebase a stacked set of feature branches off of main is:

    git checkout main
    git pull
    git checkout feature-branch-1
    git rebase -i -
    git checkout feature-branch-2
    git rebase -i -
    ...
There's most likely a more efficient way to do this, but I find that referring to the most recently checked-out branch with a single hyphen is a lesser-known feature, so perhaps it will be helpful :)


Nice, didn't know about `-`. What's your process once you're in the interactive rebase? The `git rebase --onto` I use is noninteractive and could be an alias, were it not for the business of finding the `branchpoint` commit.


I have an extra script in the CI job to detect rebased/squashed commits and fail the build if they're present.

