Modern Code Review: A Case Study at Google [pdf] (sback.it)
182 points by beliu 81 days ago | 64 comments



Most code review ends up being pair/team programming gone horribly awry. There's a lot of ego involved, it's slow and cumbersome, it's emotionally painful, and the metaphor is wrong (a panel of judges inspecting a lone defendant rather than a team collaborating on an engineering artifact).

I recently had the opportunity to work with other developers on a project by displaying a laptop onto a giant screen while we all sat on a couch and drank coffee: it was fun and we produced a high quality result. After years of code review, on the other hand, I can say that code review is painful, expensive, and mediocre.

I'm hoping that as software 'eats the world' and provides ever more value to larger numbers of people, companies will realize that putting a single developer on an engineering problem, and having the solution be submitted for review to the rest of the team after the solution has been designed is both unkind to your employees and bad business.


Where I have worked, I am semi-famous for a style of code review that is based almost exclusively on asking questions. A lot of the questions are about design rationale, because the "why" is often not captured. Questions can lead either to the code changing ("oh, you're right, that doesn't make sense" or "missed that!") or to a comment being added to ensure the rationale is captured. Developers seem to really appreciate this type of review, in contrast to the style you have outlined.


This is such an underrated comment.

I love this style of code review; it's the one I've had the best discussions with. It goes hand in hand with assuming that you, as a reviewer, don't know what the intentions/rationale are, rather than blindly dictating your solution. A lot of the time, the reviewer who says "this is wrong" is then told why it was done a certain way and turns out to be the one proven wrong. Questioning yourself as a reviewer is always a good idea. Asking questions gives feedback without coming off as aggressive/judgemental/patronizing.


This is essentially what I do. While I do note some obvious "bad things", I'm usually trying to understand the code and the thought process behind it. The result is usually a) me learning something, b) better-documented code, and c) the occasional code change resulting from the aforementioned "missed that" or "that makes sense" moments.


I do something similar. I try to start most of my comments with "What would happen if...". I find that people are more responsive if they work through it themselves rather than if I'd said "X is wrong, you should have done Y".


“This is wrong. Please do the needful!”

- Anonymous code reviewer


Socratic method style.


I think you're doing code review wrong. In several places I've seen code reviews used as a venue for religious wars about code style after the code has been written and tested. This is the worst way to do code reviews.

Instead I've had a lot of success limiting code reviews to three types of critiques.

1. This code is broken. It does not do what it's supposed to do, in a way that will affect current users. (Avoid what-if scenarios.)

2. The code does not meet a clearly delineated style guideline that the team has agreed to.

3. The code does not meet our clearly defined checklist for possible bugs.

The idea is to keep the code reviews objective and boring. People can fight religious wars over the style guidelines, but not over any given lines of code.


There's a 4th that I find is, at the very least, incredibly useful for mentoring and knowledge sharing: Comments to the effect of, "Did you know about this other way to do that? It would have X effect."

With a clear understanding that it is up to the person submitting the code for review to decide whether or not to make the change. Because, I agree, arguing about stylistic things after the fact tends to do more harm than good.


If you have no steps in between assignment and code review, then the problem is not code review.

> companies will realize that putting a single developer on an engineering problem, and having the solution be submitted for review to the rest of the team after the solution has been designed is both unkind to your employees and bad business.

Let's give people a little agency here. "Companies" can't realize anything. If you, as a software engineer, are operating this way and blaming it on "the company" then you are part of the problem. You are the company. Do it better and it'll be better.

External locus of control, and using "the company" as a scapegoat to absolve oneself of responsibility, is what's ailing such a company.


Just like code review doesn't happen on its own, pair programming does need to be a team effort. A single developer deciding to do pair-programming more tends not to change an entrenched culture. Working in the same place and time and on the same code change doesn't happen automatically, particularly in places that support flextime and working from home.

The one place I worked where we pair programmed all the time and it worked well was a startup where everyone hired knew going in that it's an XP shop (and this was part of the attraction).

On the other hand, flextime and working from home actually are important benefits. It's a tradeoff.


[flagged]


Um, what? Is any of that supposed to be related to the conversation we're having?


You make a perceptive point about the metaphor being wrong. Code reviewers should be more like advisors and less like gatekeepers. I've had to go through too many reviews where I end up having to bend over backwards to please the reviewer.

Gatekeeping is occasionally needed, as in the case of someone outside the team contributing a change, serious bugs, egregiously bad coding style like global variables and gotos, new team members/employees, people fresh out of college, etc. But I've seen reviewers go too far in the direction of gatekeeping.

I tried to improve myself as a reviewer by asking if I can approve the change in the first round of review. I don't hold back from giving as many comments as I feel like, but I also try to approve in the first round, reminding myself that the other person is also a competent engineer who can make the right decisions even if I'm not looking over his shoulder every second.


In the "How it all started section" -

> E explained that the main impetus behind the introduction of code review was to force developers to write code that other developers could understand; this was deemed important since code must act as a teacher for future developers.

This should always be at the forefront of the code reviewer's mindset. It's quite easy for the review process to degenerate into a style argument, a quest to find every inefficiency, or an attempt to make code as "clever" and brief as possible.


I think a key to instilling positive code review culture is to make sure that the comments are based on some objective, such as readability and clarity, and not based on personal preference. In particular, if one person is reviewing code of another, it is not helpful for the reviewer to remark that they would have written it differently, if they had written it at all. Some people for some reason can't keep themselves from doing this. They should remember that they did not, in fact, write the change, so their personal style is irrelevant. If code is clear, obviously correct, well-tested, and correctly formatted, it LGTM.


Except the definition of readability varies, and we're back to square one.



https://developers.google.com/edu/python/introduction: According to the official Python style guide (PEP 8), you should indent with 4 spaces. (Fun fact: Google's internal style guideline dictates indenting by 2 spaces!)

https://github.com/google/styleguide/blob/gh-pages/pyguide.m...: Indent your code blocks with 4 spaces.


Not really.

Between serious professionals there will be a few disagreements, but (1) you can work through them with some give and take, and (2) over time you get to know each other, which usually means one person agrees the other's style is better, or you agree to disagree on one point.


Yes, well, in the final analysis all problems are recruiting problems, aren’t they?


I think putting opinions in code review comments is fine, as long as you’re clearly indicating it as an opinion. The worst is when a reviewer tries to justify their opinionated comment with some vague appeal to style or readability instead of “I hate this part, ship it”


Reading another human's code (or prose) is the closest thing we have to mind reading. Whenever I can't avoid doing code reviews, most often, I really wish I didn't know that much about the author.

--

"...code must act as a teacher..."

Agree. Alas, most teachers aren't very good.

https://en.wikipedia.org/wiki/Sturgeon's_law

I used to really care about this stuff. I started a design pattern study group 25 (?) years ago. It's still going strong. (Geeks love to eat pizza and argue about the rules. :) )

Professionally, I've all but given up. Anymore, I just try to mimic the code style of whatever goo I'm working on, and do whatever kabuki is required of me to get my PRs merged.

Programming is now my hobby. Whenever I feel the need to code with intent and feeling, I work on a personal project. (For some reason, this reminds me of an Al Pacino quote about exercising. Something like "I just lie down until the feeling goes away." Happily, I'm not that bleak.)

--

When I am feeling optimistic, I imagine a future where we adopt a hybrid of Pieter Hintjens' Social Architecture and Michael Bryzek's Test in Production.

https://legacy.gitbook.com/book/hintjens/social-architecture...

https://qconsf.com/sf2017/sf2017/presentation/testing-produc...

TL;DR: Accept all PRs, radically reduce the cost of changes.


I think I mostly accept anything that comes my way, except for ones that have obvious, non-blocking fixes.

People tell me to be more critical, but there’s no point in holding everything up because someone wrote a triple-nested ternary operator.

It’s icky, but they, or someone else, will fix it the next time they come across that piece of code. That’s why we have tests.
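(For illustration, a hypothetical sketch in Java of the offending pattern and the cleanup someone will eventually apply; the names and thresholds are made up:)

    public class Grades {
      // The triple-nested ternary a hurried author might write:
      static String gradeTerse(int score) {
        return score >= 90 ? "A" : score >= 75 ? "B" : score >= 60 ? "C" : "F";
      }

      // The equivalent if/else chain a later reader will thank you for:
      static String gradeClear(int score) {
        if (score >= 90) return "A";
        if (score >= 75) return "B";
        if (score >= 60) return "C";
        return "F";
      }
    }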


Xoogler here, circa 2014. Noogler orientation impressed upon us that every change was code reviewed, period! However, my experience was that code reviews were heavyweight and frustrating, often to the point of just being skipped: completely different from the article's observation of latency in hours and "fast-paced iterative development."

1. My initial team consisted of a firmware engineer and myself doing Android development. My peer felt unqualified to review Android code, so I spent a lot of energy begging code review time from tangentially related teams. Their feedback was not deep or useful, for obvious reasons, and the days of latency posed a major problem because our project was tied to a hardware schedule.

I learned later my teammate was simply stamping his own changes, which the tool permits, and so I started doing the same. Bypassing code review resolved the problem.

2. My next team was mostly in East Asia, while I was in Mountain View. If they had a question on a change, it would add two days latency due to time zone differences. And because I was physically detached from my peers, I believe they felt more comfortable deferring my changes. Again we were tied to a hardware schedule, so this latency was very painful.

3. The last scenario was overseeing builds on (say) a Foxconn factory floor. For every build, the hardware coming off the line would behave unexpectedly, and we would have to very quickly update some code to make things work. Code reviews in this situation would have been absurd: any changes only had to last as long as the build itself, and, well, there wasn't even network access. This is admittedly an exotic and specialized scenario (but recall orientation's "every change, period!").

As a Googler, I found code reviews were either skipped or a major burden. I made some mistakes: I ought to have escalated quicker, increased my visibility, and been less cowed by the orientation. Google could also improve in some areas:

1. Code review expectations need to be adjusted for small heterogeneous teams. Such teams are more likely as Google ramps up its hardware efforts.

2. The engineering culture ought to recognize the burden imposed by time zones. Code reviews must be more forgiving when the communication cost is high.

3. Shift orientation towards teams instead of company-wide. CR for self-driving cars must be fundamentally different from CR for matching selfies with famous artwork, or CRs for updating factory floor test fixtures.

Some of these may have changed since I left and if so I'd love to hear how.


A lot of your perspective seems to be the result of a very strange work situation. If you regularly need people on the other side of the world to review complex changes, then your team shouldn't have an ocean between its halves.

That being said:

1. Given that OWNERS approval must come from your own team, yes, your team dictates many of the code expectations. I'm not sure how this was ever not the case, unless you're arguing that teams should be exempt from readability, in which case my rebuttal is that although people gripe about it a lot, the vast majority agree on its utility after they go through the process. (Rereading your comment it sounds like you had difficulty getting Java approvals, and today there is a system where you can request a review from the pool of Java readability approvers at Google, and the turnaround tends to be pretty good, generally within a few hours.)

2. Absolutely not. If anything those need to be more careful to make sure nothing's lost in communication.

3. See 1. This is also driven by general test coverage and design processes, as well as the ability to define custom presubmit checks on a path-by-path basis.


Mozilla requires code review for all code changes, too. Time zones are a big challenge because many employees work remotely and few feature teams are collocated in the same office. Engineers have to accept some latency, like you describe. To keep their pipeline full, an engineer typically must multitask multiple bugs in different stages of the review process.

Reviewers try to prioritize requests from volunteer contributors so they don't get discouraged. These reviews often require extra feedback on basics of patch creation or coding style. Over time, some experienced volunteer contributors become reviewers, too.

Instead of requesting review from a specific individual, some teams have experimented with assigning some review requests to a team alias. Anyone on the team who is available or has some free time can approve and land the patch. This works well for small patches that don't depend on specific expertise. This process also depends on everyone sharing the review burden.


I have also seen the "mandatory code reviews" bit fall apart with teams in mixed time zones. One tiny back and forth with the Paris team can delay a change by two days, sometimes blocking other work.

I have wanted to switch to something closer to "just commit, notify people, and let them revert or fix it as needed".


The paper is mostly discussing the culture within mainline "google3" development, which contains most of the code and in which most Googlers work. Android has a very different culture. Chrome and kernel teams have slightly different cultures.


> If they had a question on a change, it would add two days latency

Wouldn't that be just one day? You ask the question while they sleep, they answer while you sleep, you get the answer the next day.


You publish the code review at 5pm on day 1; it gets reviewed and a question added at 5pm (local time) on day 2; you see it, answer the question or make a change, and publish at 5pm on day 2; it gets a "ship it" on day 3 (local time); and then you push the change on day 3.


Oh, alright. I thought the "it" would refer to the question, not the whole review process.


Putting the human code review aside, one of the things I've been more impressed by during my time at Google has been the automated presubmits, generally the static analysis ones. I do a lot of Java, so Error Prone (https://errorprone.info/bugpatterns) has been a useful one (cf https://errorprone.info/bugpattern/FormatString); we have some others that Tricorder runs but I can't seem to find public references to.
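(As a concrete illustration: FormatString flags printf-style format calls whose arguments don't match the format specifiers. A made-up snippet, not Google code:)

    public class FormatExample {
      public static void main(String[] args) {
        String user = "ada";
        int items = 3;
        // Error Prone's FormatString check flags calls like the following,
        // where the format string expects two arguments but gets one:
        //   String.format("user %s has %d items", user);
        // and is satisfied by the corrected call:
        System.out.println(String.format("user %s has %d items", user, items));
      }
    }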

One thing that's also an interesting challenge is the work it takes to make something a blocking presubmit: sometimes you don't get useful feedback from one, or a test doesn't provide a very good signal, and then people start force-submitting to bypass the check because it's not providing utility, but in the process they also skip other checks.


What do you mean by automated presubmits? Are those "just" CI checks for static analysis, or are there more sophisticated things going on, like merging to a generated branch, deploying it, running all the tests in that infrastructure, and more?

Can you elaborate a little bit more on this? Or point me to a resource that is doing this?


I will be glad when people stop using the word “modern” for this kind of thing.

It’s an artifact of some imagined distinction we seem to cling to in the computer world, between the present and the past. No one talks about how to write modern novels. They’re just novels. It’s presumed that they are changing over time. And that things are situated in a historical progression.

Also, modernism is an actual thing, and it would be nice if we could use the word for that.


No doubt. I believe the word they are looking for is contemporary.


I'd rather say "current", because to me "contemporary" is something like a transitive adjective. Things really should be contemporary with (or contemporaries of) something else.


I’d rather use an actually descriptive adjective that situates practices in logical space, rather than a time-relative term that will become meaningless within a year.


Unpopular (?) Opinion:

In most "modern" environments, with constant online updates, where shipped defects are not that costly anymore, code review is a net loss. You can just skip it. This study fails to compare with "no code review whatsoever" and it measures by "defects found" instead of "total effort spent" (i.e. the only metric that truly matters).

In code review, most developers will not do the kind of thorough analysis that uncovers serious bugs; if they actually did, code review would be too costly. Instead, disagreements over superficialities and bike-shed arguments are promoted, often leading to low-value follow-up changes that can easily cause worse issues than they solve.

Code review satisfies the urge for process and structure, but it also satisfies the need for diffusion of responsibility, which is a bad thing. People need to own their code.

Code reviews have some value during onboarding or with junior devs, to get everybody on the same page. After that, you can disregard it; the effort is better spent on other things, like automated testing.


In my experience, code reviews:

1. Check that the code is addressing the problem at hand and is fulfilling a business need (mostly making sure it’s linked to a reasonably-written ticket)

2. Align style and make code look consistent across repos/teams (which includes pointing out similar work which could be reused)

3. Force other people to read and understand the change, so the original author isn't on the hook for all support/oncall questions

4. Highlight hard-to-understand sections that might be unclear to someone newly-arriving to the code, suggesting those parts need more explanation or a simpler implementation

5. Copy editing on error messages and docs to remove typos/passive-aggressiveness/make important information easier to see

6. Find bugs by humans running static analysis ("this will race/deadlock"; a sketch follows at the end of this comment) and provide pointers to resources to help the reviewee learn ("see <some link> for why and how to avoid it")

7. Human-run linting (if the automatic linters miss/mess up stuff)

I’ve also seen it used as a way to bully less-knowledgeable devs, with nitpicking and issue-raising when the commenter isn't included on the original list of reviewers and leaves criticism without any constructive feedback ("this is wrong").

Rarely have we spotted bugs in code review: that’s what tests, oncall, and a robust rollback mechanism are for.
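(To make the race/deadlock point concrete, a hypothetical Java sketch of the check-then-act race a reviewer would flag; all names are invented:)

    import java.util.HashMap;
    import java.util.Map;

    public class RacyCache {
      private final Map<String, String> cache = new HashMap<>();

      // "This will race": two threads can both miss the key and both compute
      // the value, and HashMap itself is unsafe under concurrent mutation.
      public String get(String key) {
        if (!cache.containsKey(key)) {  // check
          cache.put(key, compute(key)); // act, not atomic with the check
        }
        return cache.get(key);
      }

      private String compute(String key) {
        return key.toUpperCase();
      }
    }

(One fix is ConcurrentHashMap.computeIfAbsent, which makes the check and the act a single atomic step.)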


> mostly making sure it’s linked to a reasonably-written ticket

This one seems to come up quite often recently, but to me sounds an awful lot like "satisfying the urge for process and structure". Yes, code people write in professional environments should generally be satisfying user/business requirements, but that's something that can be discussed periodically, perhaps with your manager -- it doesn't seem like something that needs somebody looking over your shoulder all the time.

Ticketing systems can be useful tools, especially in environments with lots of user-submitted requests, but I'm suspicious of trying to tie every single piece of work to a ticket. That seems rather like a tool which should be your servant becoming master.

Edit: if you can't have a sensible conversation, potentially with a not-especially-technical person, about what value you've created in the past few months, then you might have a problem. One of my worries about ticket-focussed environments is that it's possible to focus on the small-scale and lose sight of what you're creating.


The point of the tickets is to have traceability from requirements down to the code. In some industries, this is mandatory so they can satisfy e.g. safety standards.

It's a form of documentation.


Sure, it's process which is mandated in some fields.

I still think it's net harmful, and can lead to over-emphasising the small/local vs understanding the application and its value as a whole.

In cases where "you really need to understand requirement OBTUSE-1234 before you even think about touching this function" really does apply, a comment in the code addresses this much better, and will be seen by future developers even if they aren't religiously reading the VCS blame output for everything they edit (or if it's been clobbered by reformatting or other changes).
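(Something like this hypothetical Java snippet, reusing the made-up OBTUSE-1234 requirement from above:)

    import java.math.BigDecimal;
    import java.math.RoundingMode;

    public class InvoiceTotals {
      static BigDecimal roundTotal(BigDecimal subtotal) {
        // OBTUSE-1234: the (hypothetical) requirement mandates half-up
        // rounding; don't "simplify" this to the default rounding mode.
        return subtotal.setScale(2, RoundingMode.HALF_UP);
      }
    }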


A lot of these points are highly subjective and therefore dangerous points of friction. If you're going to start having a discussion about the "business need" when some presumably competent developer needs something they have already written merged, there's going to be a problem and it's not the code in question.

Again, I'm presuming competence. Code reviews can help with tutoring new arrivals, but at some point they need to graduate to a responsible, self-directed actor. At that point, chances are their code reviews will get signed off with a superficial "lgtm" anyway.


This is mostly to make it easier to understand why the hell someone did something in a particular way, and whether it’s safe to change it to a more contemporary style years later. The reviewer should trust the author and the business requirements that led to the change being created, and just check that it’s linked to a ticket that would make sense to that future maintenance programmer.


They're a tool. If they're used to their maximum potential they can improve quality and increase knowledge in the team.

I am experienced and welcome reviews. They give me the opportunity to see if the team understands what I wrote or if it still needs to be adapted.

When reviewing I check for readability, to see if architectural constraints are respected and also offer advice on design and coding idioms.


>In code review, most developers will not do the kind of thorough analysis that uncovers serious bugs, if they actually did, code review would be too costly.

The point isn't to go in depth and pick apart the design (if a dev needs that kind of deep intervention on a regular basis, it's time to have some hard conversations); it's to share knowledge. The reviewer gets to see another dev's code along with a good description of what that change does, so if there's an emergency someone on the team might actually know where to look. And the submitter gets to learn of any footguns the reviewer has seen in similar code.

>Instead, disagreements on superficialities and bike-shed arguments are promoted, often leading to low-value follow-up changes that can easily cause worse issues than they solve.

Which is why it's important to have a system where the submitter can choose the reviewer(s) to avoid bike-shedding and also choose to ignore the reviewers' comments. That's Google's system. Is someone throwing a lot of BS nits at you? Ack the comments, take them off the reviewers list and pick someone else. Do that a couple times and most folks get the message. Talk to management about those that don't.


> The reviewer gets to see a another dev's code along with a good description of what that change does so if there's an emergency someone on the team might actually know where to look.

In an emergency, you need developers to be able to figure things out regardless of whether they have seen something in a code review or not. Knowledge of a code base is built much more efficiently by actually working on it and, to a lesser degree, reading the code. With the time you save not doing reviews, you can let people write tests or do refactorings to build that exact skill directly, instead of having it be a side-effect.


You may be underestimating how effective this type of knowledge sharing is. Knowing where to look is half the battle.


> In code review, most developers will not do the kind of thorough analysis that uncovers serious bugs, if they actually did, code review would be too costly

We replaced code reviews of every commit with in-depth code walkthroughs.

Basically, whenever we start having a lot of issues with a subsystem, we organise up to five half-day sessions to go through the entire subsystem line by line as a group.

It brings team progress on other areas to a standstill while we do it but works way better than normal code reviews.


I completely disagree; if I had a penny for every bug I've found in a code review, I'd be rich.


Completely disagree also. Around 50% of all patches I review have real logic bugs in them. Patches I write myself aren't flawless either, and someone often finds bugs in them as well.

And it's not like you have to spend several hours on thorough analysis either. Small changes take a few minutes, medium changes 30 minutes, and larger changes deserve a few hours.

If your team is bikeshedding in code reviews, install an automatic code formatter.


Personally, I'm at the other end of things: I routinely ask people to look at my code multiple times even before submitting it for code review. Chances are I'll learn about some edge case I did not consider.


A link that's actually loading for me: https://ai.google/research/pubs/pub47025


12 interviews and 44 survey responses? Is this a joke?


I'm surprised by the finding that Google changes have a median size of 24 lines changed. While I'm a big fan of small commits, I was under the impression that some of the large open source Google projects (Chrome, Android) have relatively large commits flowing into their repositories.


I am sure they are rebased several times before going public


After years of trying to make it work at various companies I have stopped supporting "modern code review" implementations.

There is a host of problems with those, in my opinion, and there are better alternatives.

MCR is typically practiced as a very shallow analysis by a colleague; this is even solidified in the typical feedback given, "Looks Good To Me". That is not even remotely near what is needed to get quality results. Most problems discovered this way are very easy problems that jump right out at you and could have been prevented with just a bit more focus during the initial implementation.

The reviewer is typically prevented from doing the code review correctly. She is usually engrossed in some other problem when the review arrives, requiring her to switch context. Aside from the annoyance factor, the typical developer will want to get back to the original problem as quickly as possible.

Even more annoyingly, people take review notes personally, and it is very difficult to be the person who does most of the reviews for your team and hence constantly points out errors in their judgement.

The reviewer is forced into the needlessly artificial role of having to reject or accept colleagues' results with little time for analysis, while facing the consequences of being on the other end of the stick the next time.

There is no incentive to spend the required time on code review. Organizations are typically not willing to support this expense of time, even though everybody is willing to support the concept of review.

The review is typically done AFTER the implementation is already complete. This is possibly the worst moment to give feedback: the original developer is pressed for time to get the product shipped, and typically very little can be fixed. This causes people to get defensive, which is a straight path into conflict. It takes effort to avoid conflict if you try to be diligent in your reviews, which is the whole point. So if you don't like how the solution is structured, it is typically too late to fix it, and there is an incentive to just let it go.

Finally, the code review typically delays the deployment of the solution as it sits in a queue. This is against the agile idea of getting stuff shipped as fast as possible, hopefully immediately.

A much better strategy, one that I am promoting now, is pair programming, where you physically sit with the other developer and spend time discussing the problem, the requirements, the solution, and then the implementation and verification. No work is allowed to be done outside of the pair.

The problem and requirements are analyzed by both people, so they both understand them well. The implementation is done by both people, so they have an equal, real chance of catching problems. Possible problems are caught early in the process, even at the analysis stage, which prevents large amounts of work from being done before review. The discussion is friendly and does not force one of the developers into the artificial position of having to accept or reject the solution. The developers take full responsibility for the implementation, as they understand that when they ship the code there will be no further review to catch errors -- only (hopefully) automated testing. Finally, when the implementation is done it is done, and immediately available for shipment, which means the process is not delayed.


I generally agree with all of this.


Off topic, this reminded me how much I hate that PDFs are the standard for papers.


Off topic, this reminded me how verbose research papers are.


What's the alternative?


HTML was literally invented for sharing academic research. The fact that it can also be used for shopping, banking, and showing your friends your holiday slides is a happy accident.


HTML makes perfect sense, as another poster said.

I've also found Authorea[1] to be pretty promising.

[1] https://www.authorea.com/


And two columns? Really?




