Modern Code Review: A Case Study at Google (2018) [pdf] (sback.it)
120 points by ra7 10 days ago | 45 comments

>In terms of speed, we find that developers have to wait for initial feedback on their change a median time of under an hour for small changes and about 5 hours for very large changes. The overall (all code sizes) median latency for the entire review process is under 4 hours

I worked at Google from 2014-2018. I'm extremely skeptical of these numbers.

Based on my experience working on three different teams and sending changelists to many other teams outside my own, I'd expect something more like 3-4 hours for small changes (<50 LOC) and 1-2 days for large changelists (500+ LOC).

Google has so many automated changelists that I'm wondering if that skewed their numbers. Like if someone just clicks through 10 auto-generated CLs and approves them all, did that count for a bunch of near-instant review cycles?
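The skew is easy to see with made-up numbers (these latencies are purely illustrative, not real Google data):

```python
import statistics

# Purely illustrative review latencies, in hours.
human_reviewed = [3, 4, 5, 8, 24, 48]   # CLs that waited for a human reviewer
auto_generated = [0.05] * 10            # rubber-stamped, near-instant auto CLs

# Median among human-reviewed CLs alone:
print(statistics.median(human_reviewed))                   # 6.5
# A batch of near-instant auto-approvals drags the overall median way down:
print(statistics.median(human_reviewed + auto_generated))  # 0.05
```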

That said, I was a big fan of Google's code review process. I know people complained about the unnecessary level of process and the hassle of getting readability, but I found it all a great way to spread knowledge across teammates and improve programming skills.

I think pocket reviewers probably account for the generally short time. In my day-to-day I usually did not send changes to people who weren’t already primed to receive them and sitting right next to me, so I could turn around and say “I sent you the thing” and they’d review it straight away because we were working towards the same goal anyway, and in some cases had already pair-designed or pair-coded the thing so the review was already a formality.

Caution though: this can lead to blind spots. I witnessed a critical bug become exacerbated by the first fix because the developer and reviewer had essentially co-written the solution, so the sign-off was just a rubber stamp.

If you want the least bias, you should find a reviewer who hasn't been deeply involved in designing the change. And for pieces of code that are small enough, reading and getting up to speed shouldn't take that much time. I prefer sending reviews to engineers who are familiar with the area of the code, but we don't do pair-programming, so the implementation hasn't been seen before.

We do, however, do a lot of design work before a single line of code is written.

*also this is at google.

Yep, "pocket reviewer" generally has a negative connotation. But, let's face it, most changes are minor. You have to be judicious about the level of review required for a given changelist.

Pair programming really is the best code review!

Yes it is. At places that require two reviews to merge, having a pair gives you one for free, and they have practical knowledge of what was done.

Even if the median or 90%ile review is quite quick, the slow ones are where all the pain is.

Sure, you can split out 9 little cleanups and typo fixes while working on a larger change, but it's still days to get the main change approved.

That's definitely curious. If the median latency is under 4 hours, does that mean there are lots of people sitting around doing nothing, or working on nothing urgent, so that they can immediately turn around a code review?

> does that mean there are lots of people sitting around doing nothing, or working on nothing urgent, so that they can immediately turn around a code review?

No, it means there is a cultural expectation / etiquette that you prioritize keeping your teammates unblocked by reviewing their code quickly, even if it means putting aside your own coding to do so.

In addition to what the other user said, most changes are also small. Google's infra and culture encourages changes that are the size of a commit, not a pull request.

This depends on the particular language and context, but simple, isolated, and usually small changes are preferred and encouraged. This allows relatively quick review for lots of things.

If I see a change that modifies a single function and adds a test to validate the change in functionality, that's very quick to review.

Discussed at the time:

Modern Code Review: A Case Study at Google [pdf] - https://news.ycombinator.com/item?id=18035548 - Sept 2018 (64 comments)

By means of 12 interviews, a survey with 44 respondents, and the analysis of review logs for 9 million reviewed changes, we investigate motivations behind code review at Google, current practices, and developers’ satisfaction and challenges.

While that is quite a large number of changes, it seems a rather small sample size for interviews, and especially for survey responses.

A general pattern of tech folks trying to understand a subject is that they will analyze billions of data points while doing everything in their power to avoid or minimize talking with people.

Perhaps one of those two activities scales easier than the other.

I worked at Google circa 2015, and found the code review process to be actively terrible.

I was writing JS, while my sole teammate was a firmware engineer. He was not willing or able to meaningfully review my code, so I had to try to scrounge random reviewers from other teams. They naturally de-prioritized it, so it took forever. And because they were not involved in my project, they would only have local stylistic feedback.

Months in, I finally figured out that other engineers were just self-reviewing: you can add yourself as a reviewer and just click merge. So I started doing that too.

That honestly sounds very atypical. I've been at Google 9 years and never experienced the scenario you describe. You should probably have reached outside of your team for assistance or flagged a manager.

Yes, there have been times when code review wasn't super useful, but on the whole I never experienced the kind of dysfunction you're talking about and merging reviews without external review is a pretty big no-no.

2015 was 6 years ago. Google has possibly changed? Doubled in size (or more) by staff? Has more rigorous approaches? Has more people capable of meaningfully reviewing code?

Your comment seems to be exactly the same as this one [1] from the original HN story in 2018, except with less detail. Are you ridiculous_fish, or is this just a huge coincidence or something that happens all the time at Google?

[1] https://news.ycombinator.com/item?id=18037184

Might be sensible not to directly out people in the future, they could change usernames for privacy reasons. Think of it like deadnaming someone.

Ha, busted.


> Months in, I finally figured out that other engineers were just self-reviewing: you can add yourself as a reviewer and just click merge. So I started doing that too.

Where? This certainly isn't possible in google3 (and wasn't possible in 2015 either).

That said, some changes have since been made to streamline the code review process for people who don't have readability and whose teammates don't have it either.

It was in the Android repo, which used Gerrit.

I never dealt with readability, because Android did not require it. In fact that was part of their internal recruiting pitch: fed up with readability? Come work on Android! Heh.

Well, Android was always a bit of a special snowflake compared to the rest of the company. Separate codebase, separate build system, separate SCM, etc. You can't really compare an Android team to a google3 team.

Culturally, it was worse when Andy was in charge, but it was still true when I left in 2018.

Ah yes, that's been rectified now =D

Sounds like a team health issue, and a hard time; my sympathies.

In my N years of experience, very few engineers are in an environment where only they have meaningful context on what they're working on. Generally any non-trivial project worth staffing is worth having two headcount work in the space, even if one is a TL with divided attention.

Self merging your code at a company like Google is wild.

That does not sound like they were working in the main trunk. I do not believe you can run any production code that was self-approved like this.

You can't. Borg binary authentication is integrated with build and review. A build that does not descend entirely from reviewed, approved, and committed code running as a production user with access to userdata will raise alerts. Individuals are able to run non-committed code on Borg under their own accounts, but not under production role accounts.

You can break glass in emergencies by committing code TBR, or "to be reviewed"; however, this automatically escalates to owners of the code in question plus your manager and director, and all TBRs have to be resolved by actual review within a short time. An author cannot TBR arbitrary code; they have to be an owner of the code in question (personally or transitively included in the OWNERS file) to TBR.
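The alerting rule described above can be distilled into a few lines; this is a hypothetical sketch with made-up names, not the real system, which is far more involved:

```python
# Hypothetical distillation of the policy: alert when code that doesn't
# descend entirely from reviewed, approved, committed sources runs under
# a production role account; personal accounts may run uncommitted code.
def should_alert(build_fully_reviewed: bool, account: str) -> bool:
    return (not build_fully_reviewed) and account == "production-role"

print(should_alert(False, "production-role"))  # True: unreviewed code in prod
print(should_alert(False, "personal"))         # False: allowed under own account
print(should_alert(True, "production-role"))   # False: fully reviewed build
```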

You can read about this system here: https://cloud.google.com/security/binary-authorization-for-b...

The review experience can really vary between teams, and depending on whether you have readability or not and whether you're changing code you own or not. This can make the difference between a couple of hours and a week to submit the same change. Readability in general is a process that, while valuable, could be really streamlined. I was working in a fast-moving team using Python where everyone had readability, and I just didn't have time to delay most non-trivial CLs, so even after submitting hundreds of CLs I did not have readability, having sent only a handful for review. Another negative effect I saw was people opting not to use a less popular language that was the right choice for a project, just because not many people in their office had readability in that language.

A much easier process would go something like this: Suppose my teammate has readability in language X. They can tag review comments as being readability-related or not. Once someone has N CLs under their belt with readability-related comments under some low percentage in the past M CLs, they are granted readability automatically, or have to submit just 1-2 CLs for a speedy readability process.
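That proposed auto-grant rule can be sketched in a few lines; N, M, and the threshold below are invented for illustration:

```python
# Sketch of the proposed rule: grant readability automatically once the
# fraction of recent CLs that drew readability-related comments is low.
def auto_grant_readability(history, m=20, threshold=0.10):
    """history: one bool per past CL (most recent last), True if that CL
    drew readability-related comments from the certified reviewer."""
    recent = history[-m:]
    if len(recent) < m:          # not enough track record yet
        return False
    return sum(recent) / len(recent) < threshold

# 1 readability nit in the last 20 CLs -> 5% < 10%, so grant it:
print(auto_grant_readability([False] * 19 + [True]))      # True
# 5 nits in the last 20 -> 25%, keep requiring review:
print(auto_grant_readability([True] * 5 + [False] * 15))  # False
```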

I'm realizing you are using "have readability" in a specialized way I may not be following. Can you say more about what you mean by "whether you have readability" and "readability is a process..."? Readability is a process (not an adjective)? That only some teams have?

Based on [1], readability at Google means you've been certified to review code according to Google's code style in a given language.

Seems like if you've been certified, you don't need an extra person who has been certified in "readability" to review your code, so it makes things easier.

1) https://www.pullrequest.com/blog/google-code-review-readabil...

Code changes generally require 3 things:

1. LGTM (that is, any second set of eyes)
2. Readability approval
3. Owners approval (similar to GitHub ownership)

The CL author can provide readability and ownership: if I have Python readability and own the file I'm changing, anyone else can LGTM. If not, at least one of my reviewers needs to provide the language approvals.
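The three checks can be modeled in a few lines; this is a hypothetical sketch of the logic as described, not Google's actual tooling:

```python
from dataclasses import dataclass, field

@dataclass
class Person:
    name: str
    readability: set = field(default_factory=set)  # languages certified in

def can_submit(author, approvers, language, owners):
    """A change needs an LGTM, readability in its language, and ownership;
    the author themselves can satisfy the latter two."""
    lgtm = len(approvers) > 0                # any second set of eyes
    everyone = [author, *approvers]
    has_readability = any(language in p.readability for p in everyone)
    has_ownership = any(p.name in owners for p in everyone)
    return lgtm and has_readability and has_ownership

alice = Person("alice", {"python"})  # author with python readability
bob = Person("bob")                  # reviewer with neither
carol = Person("carol")

# Author provides readability and ownership, so any LGTM suffices:
print(can_submit(alice, [bob], "python", owners={"alice"}))  # True
# No participant has python readability, so the change is blocked:
print(can_submit(bob, [carol], "python", owners={"bob"}))    # False
```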

There's a process for gaining readability in a particular language, where you send your normal code changes to be additionally reviewed by experts who give deeper feedback on language idioms/use. This process is contentious because Google is fairly strict about style guides, and the readability reviewer's goal is to ensure that you are writing idiomatic code, not to ensure your changes get submitted fast, so they'll often request changes and delay things (but this is known in advance, and for a high-priority change, there are other avenues to get language approvals).

After you are granted readability, you no longer need approval from someone else with readability for changes in that language.

Personally, I found the process valuable for all of the languages I've gone through it in, but I can absolutely understand people's frustration, especially when dealing with deadlines or similar.

It was certainly valuable, especially in C++ which includes a lot of internal standards and libraries, although at times I felt like a lot of these conventions can be enforced by a smarter linter, but I am writing better C++ code after this process despite having years of experience pre-Google.

I didn't feel the same for Python though.

thedance provided a nice overview of the "readability" process here: https://news.ycombinator.com/item?id=22620455

Essentially what it means is that, when writing code in certain programming languages, one of your code reviewers has to have proved that they're fluent in that language and the style guide for it.

Yeah, Python and C++ readability there is full-on BS. No two readability reviewers agree on things, the nitpicking was ridiculous and very superficial IME, and most of those issues could be solved with some light linting/static analysis. Then you watch all those L6s+ shipping obviously terribly styled code because no one dares to say anything... FWIW, Go had a much better process for this, but I got mine when the readability list was like a hundred folks, so not sure about now.

Google has heaps of horrible code. That's the problem with lots of small isolated CLs and little time invested in revisiting broader architectures.

Then they’ll finally rewrite the thing some day, after a massive and often unnecessary design doc filled with needless bike-shedding that ultimately proposes an overengineered solution to help the person make L+1.

The thing that always blows my mind is how every individual piece of code you read is clean, well documented, and well tested. But overall it's just a big mess.

Not everything, though. Infrastructure code and libraries are usually very high quality and very nice to work with.

That first paragraph resonates 200%.

As long as you’re only looking at like 30 lines it’s lovely. Then you zoom out to view the beast.

"Modern" was a poor choice of words for this title. It's three years old at this point and will only become less applicable over time. "A 2018 Investigation Into Code Review at Google" might have aged better.

What has changed in 3 years?

Within Google? I suspect enough to make this paper obsolete.

The paper itself says: "Employing lightweight, tool-based code review of code changes (aka modern code review)"; I think that substitution was a disservice. Practically all case studies are on modern processes so it's not really worth specifying.

After a quick scan of the article, I can tell you that not much has changed in the interim.

Will that hold true for the rest of time? If not, a more descriptive word would have been better than 'modern'.

I don't follow. "Modern" in this usage obviously means contemporaneous with the article's publication date.
