Improving the stacked workflow is one part of the equation, but I'd be careful not to let that mask other problems
I identified a series of busy-waits including:
- Post, wait, CI failure, fix, post
- Post, wait for reviewer, ping reviewer, wait, find feedback was posted, fix, post
Some of the root causes included:
- Developers weren't being pinged when the ball was in their court (as author or reviewer)
- Well, some notifications happened, but review groups were so noisy no one looked
- The shared responsibility of review groups meant everyone assumed someone else would look.
- The tools we were using made it hard to identify active conversations
My ideal solution:
- Auto-assign reviewers from a review group, allowing shared responsibility of the code base while keeping individual accountability of reviews
- Review tools that ping the person the review is blocked on when they are ready to act
- Review tools that give a reminder after a set period of time
- Azure DevOps' tracking of conversation state with filtering (and toggling between original code and the "fixed" version)
- Auto-merge-on-green-checks (including reviewer sign off)
With this, responses are timely without people polling the service, letting machines micro-manage the process rather than humans.
It's a code review system with browsing as an afterthought, as opposed to GitHub which is a code browser with review as an afterthought.
Another thing I don't like about Gerrit is that they seem to play Tetris with UI elements, i.e. "wherever there's room to put it is where we put it". It's not as bad as it used to be and improving, and it looks less dated than it used to, too. There are also newish features like patch editing in the web interface.
All that said, patch queues / stacks and the rebase based workflow are nice.
ISTM that what is needed is to observe how people interact with the tool and then try to figure out how the workflow can be improved based on those observations. Or maybe I'm trying to interact with gerrit in some strange way - that's also possible.
The +1 / +2 for different labels aren't that confusing to me - additionally, in Chromium I found that gerrit is very descriptive when you on-mouse-over the buttons, but IDK if it's specific to this deployment or if that aspect got improved.
Some basic markdown-like syntax is supported which renders indented lines preformatted, lines starting with "- " or "* " as list items, and lines starting with "> " as block quotes
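For illustration, a comment written like this (the content is a made-up example):

```
Indented lines render preformatted:
    tail -f /var/foo/logs/service.log
- first point
* second point
> the text being replied to
```

would come out with the indented line preformatted, the `- ` and `* ` lines as list items, and the `> ` line as a block quote.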
The nice thing it has is really easy rebasing off other people's in-progress work (for GitHub you need to fetch the other person's commit by url and hash -- I guess after writing that down it's not a huge advantage).
A lot of what you described is exactly what we've built / we're planning for the code review portion of the site - you should sign up and check it out :)
Making it easier to stack PRs feels like entirely the wrong direction - "when you're in a hole, stop digging" - stacking is giving the people in the hole better shovels to keep digging with, instead of getting PRs flowing through more easily so that stacking is not needed so much.
- Auto-assign reviewers
- Pings for people (unsure how sophisticated)
The review conversation itself is the big issue.
> Reviewer able to make tiny changes to the code without a further round trip to the developer. Things like "missing full stop here" or "Please put a comment warning about X here". Developer should get a heads up that their code has been merged with changes, just so the tiny changes don't go unreviewed.
I loved it when I worked on a small, high trust team where we were comfortable with each person making the decision on whether it was safe to do a post-merge review or that it needed attention first with a pre-merge review.
EDIT: Something I have liked at a couple of my companies was a "cowboy commit audit" where we allowed people to directly push, but doing so would ping the code owners and they would have to review to clear the state. People understood they shouldn't, but also understood when it happened.
- You must use a browser editor, no attaching a diff you generated elsewhere. Anything more than 1 line is hell because Github's editor gives no fucks about running gofmt or whatever.
- If you request a change, approve the PR, and then the submitter accepts your change, it dismisses your approval.
I think for style formatting, making it an auto-format step during commit/push is the "better" way. Codified formatting means you argue and hash it all out once and be done with it forever.
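A minimal sketch of that idea, assuming a plain git pre-commit hook. The lowercasing via `tr` is a stand-in for a real formatter (gofmt, prettier, black) so the demo is runnable anywhere; it builds its own throwaway repo:

```shell
# Self-contained demo: a pre-commit hook that "formats" staged files
# in place and re-stages them, so style never reaches review.
set -e
cd "$(mktemp -d)"; git init -q repo; cd repo
git config user.email demo@example.com; git config user.name demo
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
# Format every staged (added/copied/modified) file, then re-stage it.
# `tr` stands in for your project's codified formatter.
for f in $(git diff --cached --name-only --diff-filter=ACM); do
    tr 'A-Z' 'a-z' < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    git add "$f"
done
EOF
chmod +x .git/hooks/pre-commit
printf 'MiXeD CaSe\n' > style.txt
git add style.txt
git commit -qm "formatted on commit"
git show HEAD:style.txt    # the committed content is "mixed case"
```

The point is that the argument over style happens once, in the hook (or CI equivalent), and never again in review comments.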
Eg.: "Touches comments and whitespace only". "Resulting compiled binary does not differ". "Variable and function renames are okay". etc.
Remember that with the original developer still able to review, they can always rollback a change they weren't happy with.
any experience with that?
I do remember a situation of hot seat programming where we had a critical problem right before release that crossed domains and three of us worked at one desk consulting the others or swapping out as needed. Again, it was worth it because of the urgency and priority but not something worth doing generally.
That's a bad pairing experience, and I've had a handful of those in my last five years of pair programming.
On occasion I've said, "Look, why don't you tell me what to do, and I'll drive?"
But I think that the larger issue is that some developers are better at solo'ing than pair'ing, and there's nothing wrong with that, and at the company I worked (Pivotal), we'd find positions for the people who were more inclined to solo.
FOO is the background event scheduler, it's connected to BAR via RPC
Ok don't worry about what that is for now, just have a look at the service logs
So it's a daemon that's running on port 5000
You'll have to tail the logs in /var/foo/logs
So type T. A. I. L, and then dash f
No, the hyphen character, not literally dash
And so on and so forth. It's not that a person more senior than you can't drive, but I have found that a lot of the time the implicit knowledge of so many systems and topics, and accreted domain knowledge, can really be a productivity drain. The best pairing usually comes from equally matched individuals.
Anyway, this person will learn that dash means - , and not make that mistake again. It's a one-time mistake. So I think the pairing helps even in this case.
Why not? 95% of most people's work can be done in an IDE/text editor, and that last 5% is probably very exploratory anyway. I work in games, and gameplay programmers work almost exclusively out of IDEs with GUIs for source control and most other dev tools etc.
> Anyway, this person will learn that dash means -, and not make that mistake again. It's a one-time mistake.
Disagree that it's a one time mistake. On any given day/task there might be 5/10/15 of these. dash, ctrl + r, tab complete, !!, different shells, shebangs, flaky backend services, etc etc. If you want to teach someone how you work then yes this might be appropriate.
> So I think the pairing helps even in this case.
That said, I agree :) It depends though, is your goal to bring two programmers up to speed or is it to crunch through tasks right now? Because pairing helps with one of those goals but not the other
teach the person to use a computer first
Okay, I might exaggerate a bit, but if someone literally writes 'dash' at that point, don't pair on something that has a deadline like that.
Do take him aside and let him know he gotta brush up on the damn basics and that you are there for him through it but you're gonna finish that feature with a deadline first while he takes a stab at that easy bug over there.
Bad pair programming will suck up time by occupying two people at reduced efficiency, and won't lead to better code.
I've had some really great pair programming experiences, but we as a team aren't mandated to use it. We see it as a tool to pull out when cross-domain knowledge is needed, or when facing a particularly thorny problem. If you get the right two people together, they can fill in the gaps of each other's knowledge. The person not driving has more mental capacity available to think through potential alternatives.
However, the success of pair programming depends as much on the personalities of the individuals doing the pairing as anything else. They need to function as teammates, without any ego issues or any other thing that may prevent either party from feeling fully comfortable sharing their knowledge. Trying to pair program with someone who is defensive about their code/abilities is a true nightmare.
So the whole waiting aspect is completely gone.
Sure, this can have bad results with less skilled and/or serious people. But that's always true.
ADO has that.
Post, wait, CI failure, fix, post
This is the worst, especially when the CI system is homegrown, rickety, and difficult or impossible to reproduce locally.
Phabricator got a lot of stuff about code review right (I may be biased from working at FAANG). I'm happy to see someone trying to improve the code review situation on GitHub. When I did use GitHub at the startups I worked at, we were never fully satisfied with the code review experience.
I hope they surface "Phabricator versions" as first-class objects. I never liked how GitHub just clobbers the history left behind by force-pushes to a branch.
It's somewhat amazing and shocking how little the review experience on GitHub has improved over the last 10 years.
Maybe gate some of the more advanced features behind an opt-in so beginners are not too overwhelmed (and because you probably don’t need them on smaller projects), but the current state of things is just appalling, and more importantly has not at all kept up with workflows on large projects or long / complicated PRs.
The complete lack of support for reviewing over force pushes is insane.
Doing the same in Phabricator is awesome. One blocking commit/PR at the bottom of a stack can easily be moved to the top or to a new stack, unblocking the rest of your changes. Re-writing an entire stack due to one small change is mostly eliminated.
(Good) Tooling is a major gain in efficiency.
This is a git CLI wart more than anything. Magit (git interface in Emacs) makes juggling branches/worktrees and rewriting history intuitive to humans. I feel super-powerful in Magit and absolutely feeble using the CLI.
Github is annoying as crap. You can't even comment on a line in a file that needs changing but was missed in the PR; you can only comment on actual changes.
Phab is god awful, it's like someone built that whole system with a purist mindset, everything is perfectly designed with perfect referential IDs. But every screen is a view into the db table, no screens join information and make it usable. This is worse when you have n:m relationships because they expose those tables via screens too, it's just plain awful.
I actually like Phabricator, but the DB schema is insane. Like, literally insane. I spent hours trying to figure it out once before giving up in disgust.
Devising a coping mechanism for developers so that they can increase the inbound flow to an already overloaded system is not actually a solution.
The real solution to this rather common problem is to introduce a culture where "done" means "shipped" and not "I created a bunch of PRs for my team mates to read and while they all do that Im gonna create some more" and to use tech where you can to increase the flow the SDLC pipeline can handle.
This involves things like moving as much "code review" into the compiler/linter as you can, do a little more design/consensus building upfront, streamline all acceptance testing and deployment steps to their bare essentials etc...
1. Just pass it by a second pair of eyes to make sure I'm not doing anything really dumb or nefarious. Here I don't care who reviews it, and in my team the person currently on-call is usually the choice.
2. I'm working on something complex or that I'm not yet familiar with. In that case, I want a specific person, who I know is familiar with the technology or the project.
Waiting for more than 20-30 minutes for a review is ridiculous and means some combination of three problems is present:
* commits are too big to be reviewed quickly
* Tooling/testing is not catching a lot of blatant problems before the pull request happens.
* Review process has turned into some bizarre management ritual
* Developers on the team value their sanity and timebox chunks of time for deep focus work, leaving only parts of the day for reviews. Some developers may even take vacations or go to talk to customers, leaving a small team with less review bandwidth.
Does your entire team just sit around doing nothing waiting to review your code? That timeline is asinine.
Exceptions of course, but this seems close to optimal to me (from the reviewers perspective).
Even if the actual review takes 30 seconds, it does not matter if my reviewer is asleep for the next 8 hours.
This is essentially how cli git is set up to function, and mailing list driven workflows use this concept to have small incremental reviews. I am excited to see this tool, but honestly I have soured so much on github because of the lack of code review features that I would just as soon it didn't try to shoehorn into pull requests, and just did what people do with Phabricator: optionally use GH to host the git repo if you need to, but do everything else in a different tool
I've used GH for code review so I'm not sure what's missing there, either. Reading this whole comment section leaves me a bit confused, like I'm the only one using a hammer to pound in screws when everyone else has moved on to a screwdriver.
Once you can easily do that, it naturally leads to being able to do review per commit, which makes it so easy to unblock your work because you're not fighting against git, and you aren't worried about getting immediate reviews because I know if any changes are requested it's trivial to make those changes. And I still have branches, so if I do have unrelated work I can use branches to isolate it. The last thing is another tool feature: when I rebase and push up changes I have to give a comment on that new commit so reviewers can see the changes that caused the new revision.
Honestly I was a heavy user of Github and GitLab having instituted one of them at three different companies, and built out CI pipelines and development procedures for multiple large engineering teams, and I now feel like I was using a hammer to pound in screws and I will never go back.
The way I logically group commits is by stacking branches which are merged without fast forward. Works well with interactive rebase.
I agree that the PR code review process leaves a lot to be desired, but this particular workflow doesn't seem like a big problem. Though I've never used Phabricator and similar tooling, so there must be something we're missing.
Small code reviews often miss the forest for the trees. When you make the extent of the feature changes even harder to mentally trace back and put together, by having a diff over another set of code that is itself an unapproved diff, the code review becomes pretty low quality.
I've also done this where my change that depends on other changes gets approved, but then the other set of code gets critiqued, so I have to change it, and that breaks the code that depends on it, so I have to change that as well; now I still need to get it all reviewed again, so really I just wasted someone's time the first round.
What I do now is that I make commits over it in my branch, so I keep working while they review my first commit, and when that's approved I send the next commit to be reviewed, and so on. If there are comments I do an interactive rebase to fix them.
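That loop can be sketched with fixup commits plus a non-interactive autosquash rebase, so review feedback folds back into the right commit without manually reordering the todo list. A self-contained demo in a throwaway repo; the commit names are hypothetical:

```shell
set -e
cd "$(mktemp -d)"; git init -q repo; cd repo
git config user.email demo@example.com; git config user.name demo
echo a > a; git add a; git commit -qm "commit 1: in review"
echo b > b; git add b; git commit -qm "commit 2: next slice"
# Reviewer comments on commit 1; record the fix as a fixup commit
# aimed at it (":/" addresses a commit by message match):
echo a-fixed > a; git add a
git commit -q --fixup=':/commit 1'
# Fold the fixup back into commit 1 with one non-interactive rebase:
GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash --root >/dev/null
git log --format=%s    # commit 2: next slice / commit 1: in review
```

The history ends up as the two clean commits, with the fix absorbed into the first, which is exactly the shape you want before sending the next commit out for review.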
> A correlation between change size and review quality is acknowledged by Google and developers are strongly encouraged to make small, incremental changes
Heard, though, on wanting to maintain context within a stack. I think the best balance is tooling that both lets you create a stack of small changes, while also seeing each change in the broader context (forest).
The benefit of breaking out PRs instead of commits, is that each small change gets to pass review and CI atomically. I like your strategy of gating commit pushes until the first is reviewed, but I think the dream is to decouple those processes :)
In BigTechCo you have more coordination problems, so you want to merge code to trunk as soon as possible to prevent integration risks. But that means you drive discussion of the architecture and features to venues other than the code review. Code review becomes more of a "will this break things" check, and enabling a feature-flag is more the review stage for the feature itself.
If reviews were only "will this break things" I would gladly review their big PRs :D
> Previous studies have found that the number of useful comments decreases [11, 14] and the review latency increases [8, 24] as the size of the change increases. Size also influences developers’ perception of the code review process; a survey of Mozilla contributors found that developers feel that size-related factors have the greatest effect on review latency
I'll have to dig into those. It makes sense that size increases latency, but I'm curious how they assessed useful comments.
That said, I'm not saying to make massive code reviews, because reviewers are lazy, they won't want to spend more than an hour to review no matter the size of it. But when the reviewed code is made on top of code that itself has not been reviewed, the quality of the review suffers in my opinion.
1. Each commit is a Phabricator-style diff. Put the whole idea into a single commit via `git add -p` and rebasing.
2. Multiple ideas/diffs for the same feature? Split them into multiple commits on the same PR.
3. Review the PR commit-by-commit: you can click on the links in Github to the individual commits and see+comment on the changes there. Comments on commits automatically propagate back to the PR FWIW, so this works pretty well.
The mental model becomes: the PR is the overview of the full feature, but not how reviewers are expected to review. Reviewers review commits.
It works pretty well! Assuming you have buy-in from your team to operate this way. In fact, I actually somewhat prefer it: with "stacked diffs," it can be harder to get the birds-eye view of the entire feature's set of changes. Git and Github are very flexible, so in practice I find you do need some established workflows that everyone agrees to follow, or else it's chaos.
My personal feeling is that I need to review each commit and then I need to review the whole PR (list of commits). If both are good then the PR can be merged but each individual commit is not OK to merge unless it's part of a PR.
That's not true, though. You absolutely can merge a single commit, via git push, there's just no button in the GitHub UI for it.
Well, I'm assuming we are talking about PRs against some protected branch where manually pushing changes directly is forbidden.
Stacked diffs run in two dimensions. In one dimension is the stack of diffs -- logical incremental changes to the code, which make sense to code historians from the future. But each diff in the stack also has a history as it progressed through code review.
Squashing and amending commits is equivalent to the workflow described in the article.
Most of these practices are just inspired by what Phabricator and a few of the big companies are already doing well :)
Meaning branch 1 depends on branch 2, which depends on branch 3... all the way to 10?
That just seems crazy to me. What happens if branch 4 changes their implementation and borks everything depending on it?
To ship your change you need to update what’s there already. Tool A needs an integration point. Tool B needs a hook added. Tool C finally joins A to B.
Implementing the bits for A and B found two useful refactorings in each codebase. At the end of it all there was also some dead code.
That’s 8 commits. The downside is you end up yak shaving a lot.
If someone leaves a comment on the CR for the first branch you fix the issue and rebase. It's not fun, but this is the only effective way to do it (that I know of).
What you're describing is sometimes a problem but often not really one.
I just tried it, and I can without problem open a PR from a fork against an arbitrary branch of an arbitrary fork (or main repository).
See https://zuul-ci.org/ and https://zuul-ci.org/docs/zuul/discussion/gating.html#cross-p...
is a really nice practical example. What it does isn't even that important, but it runs a simulation of deploying production code in OpenDev. This is a "devel" job: we deliberately test any changes against the latest HEADs of the master/main/development branches of the projects we use. It's a non-voting job -- a notice that what you're introducing might be fine now, but that there is trouble brewing when our dependencies release their next version. Sometimes that's fine and a known issue, or upstream is broken; other times it's something totally unique that needs to be fixed (this is why, despite AI being able to write code for you, so far the implications of using that code still need a human in the loop :)
The "required-projects" in the job definition tells Zuul what repositories this test needs.
You can clearly see how it handles various projects having "devel", "main" or "master" branches to pull from to get their latest versions.
In the "vars" section you can see we're setting variables that get passed to the job roles flagging "this is the devel job, don't install the latest release but use the source Zuul will checkout for you from here".
The amazing thing? If I make a change and this job fails, I may debug it and find that it wasn't actually my fault, but something in upstream Ansible committed recently. I can propose the fix upstream. They do all their CI, and that's fine. But I now put in my change comment
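(Per Zuul's documentation, the mechanism is a Depends-On footer in the change message; the repo and PR number below are hypothetical placeholders:)

```
Depends-On: https://github.com/ansible/ansible/pull/NNNN
```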
and magically Zuul will recognise that I want to apply that pull request to the Ansible tree in testing and set it up for me (as noted, Zuul can do this for all sorts of systems, not just github). Additionally, Zuul will not merge the change until the dependency is satisfied -- I can NOT commit broken code!
(granted the page doesn't mention git-stack since that is assumed)
I've struggled with this problem for a long time, but never knew that there were tools to solve it.
My goal with git-stack has been to automate my PR workflow and sit alongside other git tools. I've even tried to keep it open so you can use parts of it with Phab or other tools. You should be able to start using it out of the box with a standard PR model. It'll have better performance and behavior if you configure which branches are shared branches and mark them as "protected" in git-stack. If you don't use a PR model, then just don't run `git stack --push`; the rest should still do a reasonable job.
If I were to pick a tool besides git-stack, it would either be git-branchless or Graphite.
It's at the top of the PR.
"someone wants to merge 1 commit into random-branch from another-random-branch"
> How will reviewers see the relevant diff for the reactions PR (excluding the comments PR changes)?
It will show it by default? If they want to see comments PR changes, they will need to go to comments PR.
> How can you propagate changes to the reactions PR if you ever need to update the comments PR before you land it (i.e. to address review comments)?
Use git rebase --onto.
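A self-contained sketch of that, assuming a two-branch stack with hypothetical names: the bottom branch (`part-1`, the comments PR) gets amended after review, and the top branch (`part-2`, the reactions PR) is carried over with `git rebase --onto`, replaying only its own commits:

```shell
set -e
cd "$(mktemp -d)"; git init -q repo; cd repo
git config user.email demo@example.com; git config user.name demo
echo base > f; git add f; git commit -qm base
git checkout -qb part-1
echo one > f1; git add f1; git commit -qm "part 1"
git checkout -qb part-2
echo two > f2; git add f2; git commit -qm "part 2"
# Review feedback lands on part-1: amend it, remembering the old tip.
git checkout -q part-1
old=$(git rev-parse HEAD)
echo one-fixed > f1; git add f1; git commit -q --amend -m "part 1 (fixed)"
# Re-parent part-2 onto the rewritten part-1, replaying only the
# commits that came after the old part-1 tip:
git rebase -q --onto part-1 "$old" part-2
git log --format=%s    # part 2 / part 1 (fixed) / base
```

The key detail is passing the old tip as the upstream argument, so only `part-2`'s own commits are replayed rather than duplicating the rewritten ones underneath.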
It would be nice if the marketing page told prospective users what exact difference their tool provides - since I'm already using that workflow and have never seen one of those internal tools.
We were trying to get this to work on Github for almost a year before we decided to just build this ourselves https://twitter.com/TomasReimers/status/1325647290128850950
Going through everything that's different is hard, but I'll try: When was the last time you made a stack of changes that was over 30 branches tall? Why?
Smaller PRs are generally agreed to be strictly better, and you've probably written a 2000-4000 line PR at least once (which could have been 30 100-line PRs). So why did you not break it out? What part of the tooling was broken?
Is this a thing? I think the highest stack I've ever had was 2. I suppose it could depend on company culture/tech stack/code organization (monorepo vs. services).
> 2000-4000 line PR at least once (which could have been 30 100-line PRs).
I can't say I have. Maybe deleting a lot of code where it's very obvious what is being deleted and why. What sort of work takes 2000 lines that can't be broken down into smaller, incremental pieces?
What's the point of a PR that's just an HTTP handler + adapter that's not exposed via an endpoint yet, and might change once we implement the related domain layer methods? What kind of context will a reviewer have on whether this is a decent handler + adapter when there is no other context about which methods the handler will call and what data will be returned?
If we release the domain layer first, we may want to adjust the domain data model in order to better serve the application layer, or work better with the infrastructure/datastore layer.
And so on.
IMO the PRs need to be big enough to encompass an atomic, complete unit of functionality or else it's basically impossible to tell if the approach and code is any good.
"It doesn't break anything" is not good enough, you can still create tech debt by releasing code that may need to be changed to complete the feature. Or someone will just implement the rest of the feature in a worse way in order to integrate with the non-ideal code that you already released when you didn't have a good enough view of the final solution
Of course if there are problems with people depending on not yet final code that seems more of a higher level communication issue than git workflow issue.
Again, if an atomic unit of functionality needs huge towers of stacked PRs, that indicates that they are not really atomic and that the design phase (communication) seems to be missing.
After all the reviewer should know about the design, the HTTP handler + adapter should not be a mystery to the reviewer. And so on...
Now, that said, sometimes looking at a big PR as whole takes up less review time than in daily/weekly increments. But if the reviewer only sees the thing late in the dev process plus the design is not agreed upon then that means the developer writes a lot of code that might go straight to /dev/null when the reviewer disagrees with the fundamental design of it.
The assumptions needed for these huge dangling WIP PRs to make sense seem very strange (risky and suboptimal) :o
There are definite benefits to shipping less code to production in one atomic deploy. When something breaks (and it will), you want to wade through as little code as possible while things are burning in order to identify what went wrong. Sure, you could just revert everything that went out with the last deploy, but sometimes things aren't that straightforward. Say when there are other sibling changes that are not easily roll-backable that you'd have to clobber to do a pure rollback. In these scenarios, it may be safer (albeit slower) to revert your individual change and deploy the revert by itself.
If you roll out chunks A, B, then C, and things broke after C, the issue can still be in A, it just doesn't get exposed until it is accessed in a certain way by C.
How quickly can you determine which one(s) to revert?
Now, I will say that these types of things are not everyday kinds of PRs, but they have certainly happened several times over the course of my career. Probably once or twice a year if I had to take a rough guess.
Regarding your questions, the most I've done was 4, usually at most 2. My PRs are single commits that I amend and force-with-lease after getting useful code review feedback. I don't think rebasing one after another is particularly annoying, but I think not having to do it is a nice feature. I wonder whether it's a product tho. I wish you success anyway.
That sincerely was not my intention, and a lot of context and playfulness can get lost in comments. For what it's worth, I was simply trying to make a joke about why our marketing materials may not be as good as they should be and continue the conversation. I apologize if that came off as defensive and will make sure I double check my future comments so that they don't come off in the same light.
No judgment, I just found it funny.
Re `git rebase --onto`, the open source CLI offers a recursive implementation to prevent you from having to carefully rebase each branch in your stack (https://github.com/screenplaydev/graphite-cli/blob/main/src/...)
One of the reasons you can't use a simple rebase --onto is that you don't want to accidentally copy all downstack commits between the merge base and what you're restacking onto. The CLI tracks branch base commits through git refs to ensure that restacking never duplicates commits.
Thanks - you've answered what I've been looking for and I'd prefer that this information was in the marketing copy.
I agree that this would be a useful feature. I don't think this is a product though - I don't see myself doing more than 3 or 4 stacked diffs, which is where this functionality would really shine.
PRs work very well for us, and if we find ourselves stacking them, then we keep an eye on getting them into master (yes, we don't use the 'main' bs) in reasonable order. Then Github actually does the job for us, changing the target branch once the underlying changes are merged into it.
I do stacked commits (branches off branches) all the time. Github certainly supports PRs based on such.
git switch offshoot
git rebase --onto feature branchpoint^
Finding the branchpoint commit is the only real annoyance here -- if there's a ref or a better trick for finding it, I'm all ears.
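One trick, at least when `feature` has only gained commits since the fork: `git merge-base` recovers the fork point (the "branchpoint^" above), so you never have to name the branchpoint by hand. A self-contained demo with hypothetical branch names:

```shell
set -e
cd "$(mktemp -d)"; git init -q repo; cd repo
git config user.email demo@example.com; git config user.name demo
echo base > f; git add f; git commit -qm base
git checkout -qb feature        # offshoot forks from feature here
git checkout -qb offshoot
echo o > o; git add o; git commit -qm "offshoot work"
git checkout -q feature
echo f2 > f2; git add f2; git commit -qm "feature moves on"
# merge-base finds the fork point, so the rebase needs no manual digging:
git checkout -q offshoot
git rebase -q --onto feature "$(git merge-base feature offshoot)"
git log --format=%s    # offshoot work / feature moves on / base
```

(If `feature` itself gets rewritten rather than just extended, the merge base can land in the wrong place; on git 2.38+, `git rebase --update-refs` sidesteps the question entirely.)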
git checkout main
git checkout feature-branch-1
git rebase -i -
git checkout feature-branch-2
git rebase -i -