Hacker News new | past | comments | ask | show | jobs | submit login
Stacked Diffs versus Pull Requests (2018) (jg.gg)
40 points by codesuki 9 months ago | hide | past | favorite | 49 comments

I've used both, and I wouldn't describe the time savings as significant.

Compare the case study of Yang, who runs either

    git checkout neighbour-fix
    git checkout -b yang-work
    <time passes>
    git checkout neighbour-fix
    git pull
    git checkout yang-work
    git rebase neighbour-fix

    arc patch --diff D12345 # arc's a CLI tool for Phabricator
    <time passes>
    git fetch
    git rebase -i HEAD~2
    <goes into phabricator and finds the new commit hash for D12345>
    <pastes it into vim in place of the original commit hash>
From this, it's not clear to me that Yang is getting the better deal if he uses Phabricator. Certainly, he has to enter fewer commands, but he also has to leave the terminal to go copy a commit hash from Phabricator, and then paste it into an interactive rebase session. If given the choice, I'd choose the PR workflow over the Phabricator workflow, based on not switching to the browser and not needing to copy-paste things into an interactive rebase. Phabricator and arc handle deep stacks of diffs more nicely, but that's rare enough that I'd rather optimise for the common case.

Apologies if I've misunderstood exactly which commands Yang would run in either case.

After <time passes>, you can run arc patch --diff D12345 again and arc creates another branch called arc-diff-d12345-1 or something like that, then you can rebase -i onto that.

I feel like phab had a couple workflows that were very efficient and if you ever had to do multiple git operations you were probably using it wrong. You could do everything but the actual code review in the CLI. Github is hampered because they can only use git operations and they just try to figure everything out server-side.

That said, phab certainly has it's idiosyncrasies and anything outside of the workflow that they intended was not supported.

One related aspect that the article doesn't go into too much is that there is a tension in the size of the unit of code review: there are reasons for reviewing big chunks at once, but also reasons for reviewing individual changes that are as small as possible. I've gone into more detail on this in the past.[0]

Stacked diffs make that possible because you can review either individual commits or an entire stack at once.

The irony is that this is largely the way that Linux kernel development works -- and the Linux kernel was the first user of Git! Most projects who later adopted Git seem to have never learned this lesson that was there from the beginning, and have since been reinventing wheels.

[0] http://nhaehnle.blogspot.com/2020/06/they-want-to-be-small-t...

Great post!

Hmm, I don't get it:

> If you want to have your code reviewed, you first have to branch master, then commit to that branch, then push it remotely, then create a PR.

When I want to create a PR, I always create a feature/WIP branch locally and push that. Branches are cheap in Git.

I find when the unit of scrutiny is an individual commit, like when you contribute to the Linux kernel, I spend way to much time fabricating nice atomic commits. Like maybe 50%, with 30% actual development time and 20% normal cleanup.

The only thing I miss with PRs is the ability to rewrite my branch after showing it. I think it is not supported to do `rebase -i` and then `push --force` to a PR, is it? It would be really useful, especially for the cases where you have multiple commits that cancel each other out (e.g. add debugging output, remove debugging output).

What you describe is standard merge request practice in Gitlab (it is possible to view the diff between different force pushed ”versions” of the branch as well).

Don’t know about Github.

GitHub is hostile to rebased PRs, going as far as erasing commits not present in the latest version of a PR. It does record a "... was force pushed from commit xxxxxx to xxxxx" in the PR's chronology though; but if you click on the first commit in that message it shows a 404.

Good to know, we use GitLab and occasionally would use that. I tried it a couple of years ago on GitHub, and it seemed to break the UI. I read it got a bit better now.

To make it easiest on review, I do fixup commits (commit --fixup <sha>) and then clean them up at the end. If the conflicts are too messy, I don't rewrite. I have a check to prevent merging with fixups present. Id love it if that was built into the merge itself.

This is one problem with the "don't rewrite history" comments people make: the advise isn't clear enough, People advise against it once "push"ed or "public", but that isn't right. Its when collaborating, like on a long lived branch. PR branches are 99% of the time single contributor.

you can rewrite it and force push it. it is part of the rebase-based work-flow ( always rebase PR to the head of main/master)

Sure you can force push it, but the reviewer has to review the whole branch all over again. No diff between versions of the branch.

Actually, the words 'force pushed' in the force push message on the PR (or very nearby to that), are a link to a diff between the pre and post push states of the branch.

You can also construct such a diff yourself by going to something like /compare/<old commit hash>..<new commit hash>, and those hashes can also be tags or branch names, and should even allow relative stuff like HEAD^.

(I haven't checked the URL just now, but I use both often. It's a little annoyingly hidden, but it is there.)

Yuck. That sounds like a github bug to me. bitbucket has its flaws, but this isn't one of them. We rebase review branches on a regular basis without difficulty and have a useable linear history as a result.

This is something that Gerrit is really good at. The model is a little weird, because it tracks all pushes to a code review essentially as separate commits. But it makes the workflow really really easy for reviewers: you can diff any two versions of the proposed patch. Github has no equivalent that I've found. You can look at a subset of individual commits in the PR, but they're not diffed against each other, which makes it harder to check that the one or two things you were concerned about have been addressed.

<plug>You might want to take a look at https://reviewable.io -- we can diff arbitrary commits without Gerrit's weird model, and a whole lot more besides. </plug>

You're still missing the point: diff of diffs. When GitHub compares two commits, it diffs the whole repo as of one commit against the other (like `git diff commitA commitB); what is requested is diffing the change contained in one commit against the other (like `diff <(git show commitA) <(git show commitB)`).

PS: I checked out your demo. I can't see how to diff arbitrary commits, only revisions of a PR.

IIUC, you're looking for true diffs between revisions with different bases. If so, Reviewable deals with that automatically, running 4 separate diffs in the background and integrating the results so that base-only diffs are hidden by default.

As for diffing commits, you can switch a review into "review each commit" mode so that it will create one revision per commit. You can make this the default for a repo but it's not in the demo as it's not the most common way to work. The demo does really need updating, though!

I do something in between. After the initial set of commits and review, I'll add a new commit 'Changes from PR review'. If there are additional changes from further review, I do usually amend and force push this commit.

If there are extensive changes from a follow-up review it gets its own commit and specific commit message.

Github shows the force push, and from which commit to which commit it happened. You can then manually put the into the diff view of Github and see what changed. I agree that it should have a UI though.

The diff view is unusable if master has moved in the meanwhile. What you really need is a UI for "git range-diff".

You can do that in azure devops. By default, it shows all commits and then you can go to the commits tab and select the ones you want to view ( multiple or single)

Can it show a diff-of-diffs, and detect patches that were added, removed or reordered? Something like https://patchew.org/QEMU/20210414134112.25620-1-iii@linux.ib....

> You can then manually put the into the diff view of Github and see what changed.

GitHub throws up 404 when I do this. Same with simply trying to view the old commit directly using the `.../commit/<hash>` link.

In GitLab you can diff the states between force pushes of a branch, which is a feature I love and really miss when working on GitHub.

The author is describing creating a local branch too, but that branch needs to be created from a commit, which will typically be creating your local branch from the most recent master commit. e.g git checkout origin/master -b my-work

The piece seems to make the most negative assumptions about one workflow and the most optimistic ones about the other. Or rather the assumption that you take the effort to learn the optimal way to use one workflow but take no effort to do the same for the other. Most everything looks great when you take that approach.

Things like making a branch off of another branch being somehow so difficult that no one would ever even consider it. Or that there isn't a Github CLI tool that allows you to avoid having to go to the UI.

Good points.

> "a Github CLI tool that allows you to avoid having to go to the UI."

-- better yet, git-extras (https://github.com/tj/git-extras) provides CLI utils that work across github, gitlab, and bitbucket.

Speaking from personal experience I’ve done both. The “optimal” PR experience is still more painful than stacked diffs if you’re working on a team that cares about commit quality (which everyone should IMO)

Which is a valid argument imho, I just found the article to make too many pessimistic/optimistic assumptions based on the workflow they dislike/like.

If you care about having nice self-contained commits, what is your process for sending rough work-in-progress to colleagues, or to continuous integration?

You post PRs in a draft or WIP state and don’t request review until you are ready. This lets you leverage CI, but not take any time away from others who are doing review.

same in both cases. Either you put up a PR or a stack set with no reviewers and send that to whoever needs it. In the stack diff approach you can actually add reviewers for the “non controversial” parts that are ready for review and not add reviewers to the ones that aren’t ready yet.

In both cases you can rewrite your commit stack after-the-fact to add new ones, make fixes etc.

I agree with most of the point of the article (I'm a huge Phab fan), but I feel like the explanation of what stack diffs are and why it matters was a bit lacking.

I usually describe it by comparing it to floating patches instead of anchored branches:

(Phabricator) diffs are more like git patches than commits/branches for review. They can be "grafted" on master or other places by applying the patch anywhere master or local branch, the diff is "floating". PRs are specific commits / commit ranges to review and adopt by merging/rebase-squashing, but are "anchored" to the place they branched off of.

This matters most when fixing 1 bug means multiple separate change that would be best staying separate commits, having to make N pull requests that depend on each other leads to rebase/forcepush cascades, which sucks. Having "patches" ready to apply means that halfway through "landing" (closing/merging) the diffs, the remaining diffs can be grafted on actual master instead of the (now closed) diff below, instead of having to rebase forever to update Github UI.

One major downside of this workflow is that it's different from what git teaches you, so every newcomer needs to learn "the Phab way", which gets people grumpy, and is only realistically feasible in a workplace, not really in a FOSS project as it increases the bar to contribution.

And then there's tools like Gerrit, which Just Work. Phabricator and GitHub are both very hostile to "stacked diffs".

Github does just fine. It's true that continuous rebasing works poorly in the Github UI (though I still do it anyway), which tends to forget ancestor commits if they get orphaned (you can end up with comments hanging off of stale commits that it forgot). Github's review UI prefers that changes to the PR be presented as new patches and not reworked versions of old ones.

I agree this is a poor choice, but it's not about merge strategy. In fact Github, when you apply a pull request, rebases it linearly by default and it goes into the tree the same way a "stacked diff"[1] would, preserving the linear history feature that is the whole point of that strategy.

[1] I hate that term, FWIW. The value being provided is a continuous linear history in the main archive, we should call it "branchless" or something instead of naming it after minutiae about how it's implemented.

> Github does just fine.

I wish this were true, but it's really not. Not remotely. It's garbage. bradfitz's twitter thread[0] does a good job enumerating the problems. My shortlist: unable to comment on arbitrary lines; unable to comment on specific/old commits (surprise! you force pushed and they're gone anyways);

> I hate that term, FWIW.

The traditional term is "patch set" -- you apply the patches in series.

Mailing lists are truly the only good (publicly available) review tool.

[0] https://twitter.com/bradfitz/status/1375112189636304898

> I wish this were true, but it's really not. Not remotely

Again, you're off on a tangent talking about the Github review UI and not git branching strategies. As far as github: meh. I use it every day to manage large PRs on a large project, and it's fine. Gerrit and mailing list processes fix the problems you have, but have their own pain points. Gitlab is worse than all three.

Pick the tools that work for you, adapt to the ones that work for your codevelopers, and don't start fights. Calling someone else's tool garbage helps nothing but your upvote count.

> Again, you're off on a tangent talking about the Github review UI and not git branching strategies.

> I use it every day to manage large PRs on a large project, and it's fine.

I feel like you're becoming borderline disingenuous here. I demonstrated how Github is not fine, it's actually quite deficient, regardless of your branching strategy. The title of TFA is literally "Stacked Diffs versus Pull Requests". Here's the second line of TFA:

"People who have never used it and only use Pull Requests with GitHub/GitLabs generally don’t understand what the fuss is all about."

I'm beginning to believe you're those people.

> Pick the tools that work for you, adapt to the ones that work for your codevelopers, and don't start fights. Calling someone else's tool garbage helps nothing but your upvote count.

I'm not looking to pick a fight here, I'm honestly not sure why any sane person would defend the Github PR workflow. Beyond "it exists" it really doesn't have much going for it.

I call it Garbage because that's how I see it. And I hope others see it that way too. And I hope that we as an industry work to build tools that are not Garbage and I hope that those folks mentioned in the article learn to use and experience other tools and realize there is A Better Way to do things. We don't all have to put up with mediocre software.

If I see a friend eating literal garbage, I would tell them "hey, you're eating literal garbage, maybe you should stop" and not just keep it to myself.

I think https://news.ycombinator.com/item?id=26925235 put it nicely.

PRs are expensive. So when you think 'this is a self contained change, but I need it to continue development' you are unlikely to send a PR for it. You will just pile on more code.

Because if you make another PR based on that first PR and someone finally does code review and tells you to fix something then you suddenly need to propagate these changes to all follow up PRs which is quite some work.

And while I think I might just miss something obvious it seems many people miss it that's why we have blog posts about 'stacked PRs'. (On HN a day ago or so)

They seem to be conflating a few different things here. We are right now doing branch-from-master-and-squash-merge-pr, just do we don't have the bisection issues they mention.

Having a far greater emphasis on even smaller units of work for a single feature (per commit) seems to unnecessarily increase cognitive load? Like now I have to do even more "planning ahead" of how my changes are going to inter-relate, and find some way to separate them out. I mean, that is ideal, but... software also has to be written.

You actually have less visibility into your bisect because your bisect step is much larger than in a stacked diff. You can’t distinguish if it was one of N refactor a that had a typo or if it was one of the 3 logical changes you made.

I’ve never plan out my commits and I manage to have well-structured commits. I use a mixture of git add -p and noticing when I’m needing to do something else and being diligent. It’s a skill for sure, but I think it’s one that anyone can get good at with practice and mentorship.

The easiest way actually to start this is to have a single giant WIP commit. Then git reset HEAD^ once you are done and now start committing piecemeal. Once you do this enough times you’ll develop a sixth sense and start making the interim commits on the fly and use fixups for mistakes that you autosquash. That’s what I do 90-95% of the time and the rest of the time I might have to do some rewrites or I might get some merge conflicts with myself (if I change a preceding commit that a subsequent commit changes). Here’s the thing though. If you’re being diligent about your commits in PR land and making sure each is reviewable (which really is what you should always do regardless), then you’ve already done all the work. Representing it as a stacked diff in the review tool just makes it easier to land piecemeal and avoid situations like “well your PR had other pieces but I needed this one piece from it so I extracted it into its own PR and fixed the issues I had with it. Sorry about the merge conflict”.

Stacked diffs aren’t necessarily the end of where I want review tools to get to. What I really want is to be able to generate a hydra where I might merge in entire stacks. If diffs sit in review long enough, other things I’m working on start to depend on previous commits in other stacks (eg cool feature X). I want to be able to rebase the entire stack and have the tools work correctly. Unfortunately that’s not possible with any tool I’m aware of and stacked diffs still suffer from rebase hell (Facebook internally had a GUI tool that made this slightly more bearable by letting you drag branches around visually). However, this piece requires support from both review tools and the SCM system (so I can rebase an entire body of work without doing each piece manually and resolving BS conflicts multiple times)

> I want to be able to rebase the entire stack and have the tools work correctly.

<plug> https://github.com/krobelus/git-branchless/ </plug>

> resolving BS conflicts multiple times

For identical conflicts you can enable git rerere

Or you could just do trunk based development.

AFAIU this is trunk based development. It's just way easier with a system like Gerrit than it is with GitHub.

In cases like that, you miss the magic of svn where you could easily commit/push small parts of the code independently in the current branch.

I haven't used SVN but you can do this with git by using git add --patch. I do it every day!

No because you can't pull or push until your branch is clean or you have to stash everything.

Put uncommitted changes into a WIP commit (or multiple WIP commits), then fetch & rebase.

Putting stuff in commits ASAP locally is good practice anyway, since it guarantees you will never accidentally lose your work.

So it is exactly what i'm saying. You need 3 operations, that will completely change the state of your repo folder.

with svn, it was common for me to have something with 50 files.

Currently working on 5 files "v,w,x,y,z".

I noticed a simple typo in file "c":

modify "c", commit "c"

3 pending modifications in "f,g,h".

commit "f"

commit "g"

commit "h"

Someone ask me to correct something else in "c".

svn up, correction and commit "c".

then "svn up", just updating everything without messing with current work on "v,w,x,y,z".

(no need to stash, or rebase, or anything as no one else touched them anyway)

And let's suppose that i just want the new version file "k" without modifying everything in my current working folder:

svn up "k"

Good old time was as easy as that... even if git has its own advantages for some other cases.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact