
Modern Code Review: A Case Study at Google [pdf] - beliu
https://sback.it/publications/icse2018seip.pdf
======
pmcollins
Most code review ends up being pair/team programming gone horribly awry.
There's a lot of ego involved, it's slow and cumbersome, it's emotionally
painful, and the metaphor is wrong (a panel of judges inspecting a lone
defendant rather than a team collaborating on an engineering artifact).

I recently had the opportunity to work with other developers on a project
by mirroring a laptop onto a giant screen while we all sat on a couch and
drank coffee: it was fun and we produced a high-quality result. After years
of code review, on the other hand, I can say that code review is painful,
expensive, and mediocre.

I'm hoping that as software 'eats the world' and provides ever more value
to larger numbers of people, companies will realize that putting a single
developer on an engineering problem, and then submitting the solution to the
rest of the team for review _after it has been designed_, is both unkind to
your employees and bad business.

~~~
kikoreis
Where I have worked, I am semi-famous for a style of code review based
almost exclusively on asking questions. A lot of the questions are about
design rationale, because the "why" is often not captured. Questions can
lead either to the code changing ("oh, you're right, that doesn't make
sense", "missed that!") or to a comment being added to ensure the rationale
is captured. Developers seem to really appreciate this type of review, in
contrast to the style you have outlined.

~~~
kangax
This is such an underrated comment.

I love this style of code review, and it's the one I've had the best
discussions with. It goes hand in hand with assuming that you, as a
reviewer, don't know what the intentions/rationale are, rather than blindly
dictating your solution. A lot of the time, the reviewer who says "this is
wrong" then gets an explanation of why it was done a certain way and turns
out to be the one who's wrong. Questioning yourself as a reviewer is always
a good idea. Asking questions gives feedback without coming off as
aggressive/judgemental/patronizing.

------
colemorrison
In the "How it all started section" -

> E explained that the main impetus behind the introduction of code review was
> to force developers to write code that other developers could understand;
> this was deemed important since code must act as a teacher for future
> developers.

This should always be at the forefront of the code reviewer's mindset. It's
quite easy for the review process to degenerate into a style argument, a
quest to find every inefficiency, or an attempt to make code as "clever" and
brief as possible.

~~~
romed
I think a key to instilling positive code review culture is to make sure that
the comments are based on some objective, such as readability and clarity, and
not based on personal preference. In particular, if one person is reviewing
code of another, it is not helpful for the reviewer to remark that they would
have written it differently, if they had written it at all. Some people for
some reason can't keep themselves from doing this. They should remember that
they did not, in fact, write the change, so their personal style is
irrelevant. If code is clear, obviously correct, well-tested, and correctly
formatted, it LGTM.

~~~
gear54rus
except the definition of readability varies and we're back to square one

~~~
dirkgently
Unless company-wide guidelines are agreed upon and published.

E.g.
[http://google.github.io/styleguide/javaguide.html](http://google.github.io/styleguide/javaguide.html)

[https://github.com/google/styleguide/blob/gh-pages/pyguide.md](https://github.com/google/styleguide/blob/gh-pages/pyguide.md)
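
For a sense of what such a published guideline pins down, here is a minimal
sketch (class and names invented for illustration) following the Java
guide's headline conventions: 2-space block indentation, K&R-style braces,
and UPPER_SNAKE_CASE constants.

    // Illustrative fragment only, following the Google Java Style Guide's
    // +2-space block indentation and K&R ("Egyptian") brace placement.
    public class OrderValidator {
      private static final int MAX_ITEMS = 100;  // constant: UPPER_SNAKE_CASE

      public boolean isValid(int itemCount) {
        if (itemCount <= 0 || itemCount > MAX_ITEMS) {
          return false;
        }
        return true;
      }
    }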

~~~
mikewhy
[https://developers.google.com/edu/python/introduction](https://developers.google.com/edu/python/introduction):
According to the official Python style guide (PEP 8), you should indent with 4
spaces. (Fun fact: Google's internal style guideline dictates indenting by 2
spaces!)

[https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation](https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation): Indent your code blocks with 4 spaces.

------
ridiculous_fish
Xoogler here, circa 2014. Noogler orientation impressed upon us that every
change was code reviewed, period! However my experience was that code reviews
were heavyweight and frustrating, often to the point of just being skipped:
completely different from the article's observation of latency in hours and
"fast-paced iterative development."

1\. My initial team was a firmware engineer and myself, doing Android
development. My peer felt unqualified to review Android code, so I spent a
lot of energy begging for code review time from tangentially related teams.
Their feedback was not deep or useful, for obvious reasons, and the days of
latency posed a major problem because our project was tied to a hardware
schedule.

I learned later my teammate was simply stamping his own changes, which the
tool permits, and so I started doing the same. Bypassing code review resolved
the problem.

2\. My next team was mostly in East Asia, while I was in Mountain View. If
they had a question on a change, it would add two days latency due to time
zone differences. And because I was physically detached from my peers, I
believe they felt more comfortable deferring my changes. Again we were tied to
a hardware schedule, so this latency was very painful.

3\. The last scenario was overseeing builds on (say) a Foxconn factory
floor. For every build, the hardware coming off the line would behave
unexpectedly, and we would have to update some code very quickly to make
things work. Code reviews in this situation would have been absurd: any
changes only had to last as long as the build itself, and, well, there
wasn't even network access. This is admittedly an exotic and specialized
scenario (but recall orientation's "every change, period!").

As a Googler, I found code reviews were either skipped or a major burden. I
made some mistakes: I ought to have escalated sooner, increased my
visibility, and been less cowed by the orientation. Google could also
improve in some areas:

1\. Code review expectations need to be adjusted for small heterogeneous
teams. Such teams become more likely as Google ramps up its hardware
efforts.

2\. The engineering culture ought to recognize the burden imposed by time
zones. Code reviews must be more forgiving when the communication cost is
high.

3\. Shift orientation towards teams instead of company-wide. CR for self-
driving cars must be fundamentally different from CR for matching selfies
with famous artwork, or CRs for updating factory-floor test fixtures.

Some of these may have changed since I left, and if so, I'd love to hear
how.

~~~
ndh2
> _If they had a question on a change, it would add two days latency_

Wouldn't that be just one day? You ask the question while they sleep, they
answer while you sleep, you get the answer the next day.

~~~
falsedan
You publish the code review at 5pm on day 1; it gets reviewed and a question
added at 5pm (local time) on day 2; you see it, answer the question/make a
change, and publish at 5pm on day 2; it gets a ship-it on day 3 (local
time); and then you push the change on day 3.

~~~
ndh2
Oh, alright. I thought the "it" would refer to the question, not the whole
review process.

------
joatmon-snoo
Putting human code review aside, one of the things I've been more impressed
by during my time at Google has been the automated presubmits, especially
the static analysis ones. I do a lot of Java, so Error Prone
([https://errorprone.info/bugpatterns](https://errorprone.info/bugpatterns))
has been a useful one (cf
[https://errorprone.info/bugpattern/FormatString](https://errorprone.info/bugpattern/FormatString));
we have some others that Tricorder runs, but I can't seem to find public
references to them.
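
As a made-up illustration of the FormatString class of bug (example mine,
not from the article): the check rejects printf-style calls whose arguments
don't match the format string, which plain javac compiles without complaint.

    // Hypothetical example of what Error Prone's FormatString check catches
    // at compile time; javac alone accepts this code.
    public class FormatStringDemo {
      public static void main(String[] args) {
        String user = "ada";
        // BUG: the format string expects two arguments but only gets one.
        // With Error Prone enabled, the build fails here; without it, this
        // throws MissingFormatArgumentException at runtime.
        System.out.printf("%s logged in %d times%n", user);
      }
    }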

One thing that's also an interesting challenge is the work it takes to make
something a _blocking_ presubmit: sometimes you don't get useful feedback
from one, or maybe a test doesn't provide very good signal, and then people
start force-submitting to bypass the check because it's not providing
utility, but in the process they also skip other checks.

~~~
andygrunwald
What do you mean by automated presubmits? Are those "just" CI checks for
static analysis, or are there more sophisticated things going on, like
merging to a generated branch, deploying it, running all the tests in the
infrastructure there, and more?

Can you elaborate a little more on this? Or point me to a resource that
describes this?

------
erikpukinskis
I will be glad when people stop using the word “modern” for this kind of
thing.

It’s an artifact of some imagined distinction we seem to cling to in the
computer world, between the present and the past. No one talks about how to
write modern novels. They’re just novels. It’s presumed that they are changing
over time. And that things are situated in a historical progression.

Also, modernism is an actual thing, and it would be nice if we could use the
word for that.

~~~
jeffnappi
No doubt. I believe the word they are looking for is _contemporary_.

~~~
kirkules
I'd rather say "current", because to me "contemporary" is something like a
transitive adjective. Things really should be contemporary _with_ (or
contemporaries _of_) something else.

~~~
erikpukinskis
I’d rather use an actually descriptive adjective that situates practices in
logical space, rather than use a time-relative term that will become
meaningless within a year.

------
beefheart
Unpopular (?) Opinion:

In most "modern" environments, with constant online updates, where shipped
defects are no longer that costly, code review is a net loss. You can just
skip it. This study fails to compare against "no code review whatsoever",
and it measures by "defects found" instead of "total effort spent" (i.e. the
only metric that truly matters).

In code review, most developers will not do the kind of thorough analysis
that uncovers serious bugs; if they actually did, code review would be too
costly. Instead, disagreements over superficialities and bike-shed arguments
are promoted, often leading to low-value follow-up changes that can easily
cause worse issues than they solve.

Code review satisfies the urge for process and structure, but it also
satisfies the need for diffusion of responsibility, which is _a bad thing_.
People need to own their code.

Code reviews have some value during onboarding or with junior devs, to get
everybody on the same page. After that, you can disregard them; the effort
is better spent on other things, like automated testing.

~~~
falsedan
In my experience, code reviews

1\. Check that the code is addressing the problem at hand and is fulfilling a
business need (mostly making sure it’s linked to a reasonably-written ticket)

2\. Align style and make code look consistent across repos/teams (which
includes pointing out similar work which could be reused)

3\. Force other people to read and understand the change, so the original
author isn’t on the hook for all support/oncall questions

4\. Highlight hard-to-understand sections that might be unclear to someone
newly-arriving to the code, suggesting those parts need more explanation or a
simpler implementation

5\. Copy-edit error messages and docs to remove typos and passive-
aggressiveness, and to make important information easier to see

6\. Find bugs by humans running static analysis (“this will race/deadlock”;
see the sketch after this list) and provide pointers to resources to help
the reviewee learn (“see <some link> for why and how to avoid it”)

7\. Human-run linting (if the automatic linters miss/mess up stuff)
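
On point 6, a minimal sketch (all names hypothetical) of the check-then-act
race a reviewer can spot just by reading, before any test ever fails:

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical example: two threads calling record("home") can both read
    // the same old count and both write old + 1, silently losing an update.
    class HitCounter {
      private final Map<String, Integer> counts = new HashMap<>();

      void record(String page) {
        Integer old = counts.get(page);              // read
        counts.put(page, old == null ? 1 : old + 1); // write, not atomic
      }
      // A typical review suggestion: switch to ConcurrentHashMap and use
      // counts.merge(page, 1, Integer::sum) to make the update atomic.
    }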

I’ve also seen it used as a way to bully other, less-knowledgeable devs by
nitpicking and issue-raising, when the commenter isn’t included on the
original list of reviewers and leaves criticism without any constructive
feedback (“this is wrong”).

Rarely have we spotted bugs in code review: that’s what tests, oncall, and
a robust rollback mechanism are for.

~~~
beefheart
A lot of these points are highly subjective and therefore dangerous points of
friction. If you're going to start having a discussion about the "business
need" when some presumably competent developer needs something they have
_already written_ merged, there's going to be a problem and it's not the code
in question.

Again, I'm presuming competence. Code reviews can help with tutoring new
arrivals, but at some point they need to graduate to being responsible,
self-directed actors. At that point, chances are their code reviews will get
signed off with a superficial "lgtm" anyway.

~~~
falsedan
This is mostly to make it easier to understand why the hell someone did
something in a particular way, and whether it’s safe to change it to a more
contemporary style years later. The reviewer should trust the author and the
business requirements that led to the change being created, and just check
that it’s linked to a ticket that would make sense to that future
maintenance programmer.

------
vlovich123
A link that's actually loading for me:
[https://ai.google/research/pubs/pub47025](https://ai.google/research/pubs/pub47025)

------
foreigner
12 interviews and 44 survey responses? Is this a joke?

------
dochtman
I'm surprised by the finding that Google changes have a median size of 24
lines changed. While I'm a big fan of small commits, I was under the
impression that some of the large open-source Google projects (Chrome,
Android) have relatively large commits flowing into their repositories.

~~~
y0ghur7_xxx
I am sure they are rebased several times before going public

------
lmilcin
After years of trying to make it work at various companies I have stopped
supporting "modern code review" implementations.

There is a host of problems with those, in my opinion, and there are better
alternatives.

MCR is typically practiced as a very shallow analysis by a colleague. This
is even solidified in the typical feedback given: "Looks Good To Me". That
is not even remotely near what is needed to get quality results. Most
problems discovered this way are very easy problems that jump right out at
you and could have been prevented with just a bit more focus on the initial
implementation.

The reviewer is typically prevented from doing the code review correctly.
She is typically engrossed in some other problem when the review comes in,
requiring her to switch context. Aside from the annoyance factor, the
typical developer will want to get back to her original problem as quickly
as possible.

Even more annoyingly, people will take review notes personally, and it is
very difficult to be the person who does most of the reviews for your team
and hence constantly points out errors in their judgement.

The reviewer is forced into the needlessly artificial role of having to
reject or accept colleagues' results with little time for analysis, while
facing the consequences of being on the other end of the stick next time.

There is no incentive to spend the required time on code review.
Organizations are typically not willing to support this expenditure of time,
even though everybody is willing to support the concept of the review.

The review is typically done AFTER the implementation is already completed.
This is possibly the worst moment to give feedback: the original developer
is pressed for time to get his product shipped, and typically very little
can be fixed. This causes people to get defensive, which is a straight path
into conflict. It takes effort to avoid conflict if you try to be diligent
in your reviews, which is the whole point. So if you don't like how the
solution is structured, it is typically too late to fix, and there is an
incentive to just let it go.

Finally, the code review typically delays deployment of the solution as it
sits in the queue. This works against the agile idea of getting stuff
shipped as fast as possible, ideally immediately.

A much better strategy, one that I am promoting now, is pair programming,
where you physically sit with the other developer and spend time discussing
the problem, the requirements, the solution, and then the implementation and
verification. No work is allowed to be done outside of the pair.

The problem and requirements are analyzed by both people, so they both
understand them well. The implementation is done by both people, so they
have an equal, real chance of catching problems. Possible problems are
caught early in the process, even at the analysis stage, which prevents
large amounts of work from being done before review. The discussion is
friendly and does not force one of the developers into the artificial
position of having to accept or reject the solution. The developers take
full responsibility for the implementation, as they understand that when
they ship the code there will be no further review to catch errors -- only
(hopefully) automated testing. Finally, when the implementation is done, it
is done and immediately available for shipment, which means the process is
not delayed.

~~~
ian1321
I generally agree with all of this.

------
mezzode
Off topic, but this reminded me how much I hate that PDFs are the standard
for papers.

~~~
jokh
What's the alternative?

~~~
kybernetikos
HTML was literally invented for sharing academic research. The fact that it
can also be used for shopping, banking, and showing your friends your
holiday slides is a happy accident.

