
How I review code - peterstensmyr
https://engineering.tumblr.com/post/170040992289/how-i-review-code
======
taylodl
_" Senior engineers sometimes need to be reminded that highly performant,
abstract, or clever code is often difficult to read and understand later,
which usually means asking them for more inline comments and documentation."_

Ha! That's not a senior engineer. Senior engineers write the most simple-
looking code that just works. In every rainy day scenario imaginable. The
clever code writers aren't there yet.

~~~
scalesolved
Ha it does make me think of this tweet
[https://twitter.com/KevlinHenney/status/381021802941906944](https://twitter.com/KevlinHenney/status/381021802941906944)

You nailed it really, senior engineers code is the most simple looking as
generally they've picked the right abstraction for the problem.

~~~
weego
Is this a senior engineers are literally superheroes meme I've missed?

Everyone is capable of making poor decisions and straight up logic errors.
Senior devs sometimes more-so because we tend to get entrenched in a
particular issue solo for longer periods.

~~~
scalesolved
Not at all and it's now how I view senior engineers. True senior engineers
though do tend to find the right abstractions more often than other engineers
but it doesn't mean they can't make huge mistakes with bigger implications.

My thoughts are also on actual senior engineers, not 2 years of experience
etc.

------
yeukhon
I echo the author's point in "Review the code with its author in mind".

Without comments, sometimes it's really really difficult to navigate the code.
I have been adding more comments than ever: don't assume every line is
obvious, write a comment to explain what the next few lines really do.

    
    
        # base case: stop dividing when we find the largest square.
        if width == height:
            return width, height
        else:
            # otherwise, we know that we can break the land into several
            # pieces, some are already square-sized, but there must be
            # left over, which is the difference of the width and height,
            # and can be divided again.
    
            remain = abs(width - height)
            return largest_square_plot(remain, min([width, height]))
    

^ This code is only for me to read so I didn't really care much about
grammar... but a year from now I shouldn't have trouble understand the code in
a minute or two.

Some of my function/method has a pretty long docstring which may include
explaining the rationale, and perhaps even some ascii diagrams. If you have
trouble understand a piece of code after a few passes, that's a bad code.
Also, use more newlines...

> I check every Github email I get; I make sure that I don’t get notified for
> everything that happens in the repo

Not sure about others, but I am tired of reading PR notifications in my
mailbox. I don't know how kernel developers can live with this.

I have been thinking about just build a bot.

* receives PUSH from GitHub

* adds events to a queue

* notifies based on priority

* pings me once in a while to remind me that I have outstanding PRs to review (as reviewer and as author of the patch).

If someone needs me to review right away, he/she can reach out to me directly
in chat.

~~~
xerophyte12932
A good counter argument to Comments:

[https://blog.codinghorror.com/coding-without-
comments/](https://blog.codinghorror.com/coding-without-comments/)

~~~
Stratoscope
That article doesn't make a compelling case.

It suggests taking this code:

    
    
      // square root of n with Newton-Raphson approximation
      r = n / 2;
      while ( abs( r - (n/r) ) > t ) {
          r = 0.5 * ( r + (n/r) );
      }
    
      System.out.println( "r = " + r );
    

And refactoring it to this function:

    
    
      private double SquareRootApproximation(n) {
          r = n / 2;
          while ( abs( r - (n/r) ) > t ) {
              r = 0.5 * ( r + (n/r) );
          }
          return r;
      }
    
      System.out.println( "r = " + SquareRootApproximation(r) );
    

I'm all for this refactoring, but something was lost in the process. What kind
of square root approximation is being used? Does the algorithm have a name?
What would I search for if I wanted to read more about it? That information
was in the original comment.

~~~
macobo
There's an infinite amount of detail that's impossible to capture in a comment
and which invariably changes over time and doesn't hold in the future.

For my team, the solution has been writing longer commit messages detailing
not only what has changed, but also the why and other considerations,
potential pitfalls and so forth.

So in this case, a good commit message might read like:

``` Created square root approximation function

This is needed for rendering new polygons in renderer Foo in an efficient way
as those don't need high degree of accuracy.

The algorithm used was Newton-Raphson approximation, accuracy was chosen by
initial testing:

[[Test code here showing why a thing was chosen]]

Potential pitfalls here include foo and bar. X and Y were also considered, but
left out due to unclear benefit over the simpler algorithm. ```

With an editor with good `git blame` support (or using github to dig through
the layers) this gives me a lot of confidence about reading code as I can go
back in time and read what the author was thinking about originally. This way
I can evaluate properly if conditions have changed, rather than worry about
the next Chthulu comment that does not apply.

~~~
dhimes
Now you have to look in two places for the information- the code and the
commit messages.

~~~
macobo
How so? The code still documents what is happening, the commits however lay
out the whys.

The point is that these two are separate questions, and that trying to use
comments as a crutch to join the two religiously is a headache. It's
impossible to keep everything in sync and I don't want to read needless or
worse misleading information.

What's worse, in comments we often omit the important details such as why was
the change made, what other choices were considered, how was the thing
benchmarked, etcetc.

That said, comments still have a place. Just not everywhere for everything and
especially not for documenting history.

~~~
dhimes
I disagree. I think the "whys" belong in the comments- in fact, that's the
most important part of the comment if the code is cleanly written. I don't
want to be happily coding along, get to a glob and have to go to the repo
pane, hunt for the commit that explains this particular thing, then read a
commit message. Put it in a comment in the code. Pretty please.

------
anameaname
A conundrum for me is how to get other people to code review the way I want to
be code reviewed? Particularly, I noticed code reviewers on my team are pretty
pedantic, obsessed with correctness, and need to be explained why each change
is okay. These are people that regularly write good quality code themselves,
but there is a high amount of distrust. Why doesn't a team of talented
programmers trust each other?

(in case it needs to be stated, to date, I have reviewed about as much code as
I have written, upwards of 100k lines, as have most of the other people on my
team. We aren't amateurs, but it often seems like we're babies.)

~~~
adrianN
I'm an overly pedantic reviewer because I feel the review is one of the few
places where I can counteract the accumulation of technical debt. Once the
less than ideal code is in the codebase it's there to stay. I also ask a lot
of pretty dumb questions during a review because I want to make sure that my
understanding of the requirements matches the understanding the other engineer
had.

~~~
watwut
> Once the less than ideal code is in the codebase it's there to stay.

That is not objective fact across projects. That means either the process or
culture is bad.

~~~
adrianN
I've never seen a project where you go back to fix something just because its
implementation is not ideal. If it works it stays, because there is always
something more important to do.

~~~
LandR
This, this and this.

You get told to quickly write something as a Proof of Concept. Then get told,
it works so put it in the code and reelase it. We will rewrite it later.

You _never_ get to rewrite it later.

Over time the code base goes to hell. _Every single time_

~~~
watwut
I have hard time to believe you never refactor function you come across. I see
how refactoring whole architecture or something major is avoided, but when it
comes to smaller functions and pieces of code, refactoring it as you go takes
around same amount of time as refactoring it during that detailed code review
(e.g. often very little).

The really big mess tend to be emergent - when features and code base grew too
much for original architecture over time. You cant prevent that one by
detailed code review and it takes a lot of effort to fix it.

~~~
LandR
At my last job, if you had unecessary changes for the work item (e.g.
refactoring) you failed the code review.

But then we had 5% unit test coverage on a 5million+ LOC code base, they were
terrified that refactoring (with no tests to back it up) might break things.

We were writing medical software and they were very risk adverse (obviously),
so if i refactored a method I would have to raise that and would mean test
cases would have to be written and manually run to ensure I hadn't broken
anything.

So refactoring hardly happened (The code was an absolute mess), when i left
they were working on a project to fix it but the timescales for it were 5 to
10 years.

------
pmcollins
> We have repositories for the PHP backend, our database schemas, our iOS
> (Swift/Obj-C) and Android (Java/Kotlin) mobile apps, infrastructure projects
> written in Go, C/C++, Lua, Ruby, Perl, and many other projects written in
> Scala, Node.js, Python, and more

Why do organizations allow this? I realize that some platforms require their
own languages (iOS, Android), but outside of that, just pick one or two and
hold the line.

~~~
athenot
Why? Having many languages allows to pick the sweetspot for each subsystem.
Each language has special strengths and weaknesses, there is no silver bullet
that excels at everything. Go, C/C++, Lua, Ruby, Perl, Scala, Node.js,
Python... each of these are THE best choice for certain classes of problems
(and terrible for others). It may be because of language features that
elegantly express a solution, or particular efficiency considerations (speed,
latency or memory footprint), or integrations with specific libraries (if
language X has a lib that does 90% of the requirement, use it instead of
reimplementing from scratch in language Y). Also, it may be a team
consideration: the talent pool for front-end engineers has different
preferences than the talent pool for some back-end systems.

In a large system like Tumblr, they are likely highly distributed with many
different subsystems running in different parts of their infrastructure, each
with their own SLA and resilience requirements.

Their team is probably large enough that most engineers focus on only some of
the subsystems, and communication between them is via defined interfaces.

Though having a diverse ecosystem fosters an openess of mind, instead of
enforcing "One Metaphore To Rule Them All". You find yourself borrowing
efficient patterns from other environments, as they eventually start cross-
pollinating.

~~~
pencilhappen
> Each language has special strengths and weaknesses, there is no silver
> bullet that excels at everything. Go, C/C++, Lua, Ruby, Perl, Scala,
> Node.js, Python... each of these are THE best choice for certain classes of
> problems (and terrible for others).

Not quite true... not all languages have a sweet spot in a production
environment. Node.js isn't particularly excellent at anything, the attraction
is mainly "I know JS, and I don't want to learn anything else right now",
which is a terrible attitude for someone who wants to have a career in tech.
Knowing more than one tool in the toolbox is a key skill, since there is no
one-size-fits-all.

Jury is still out on Scala. It's a big language, but unclear if there's a
production sweet spot or not compared to other JVM hosted languages.

The rest of the ones you listed have their definite sweet spots. There are
tons of languages though, and most don't have one.

~~~
athenot
> Node.js isn't particularly excellent at anything

That's not very kind. Yes it has its flaws but it's not bad at being a higher
abstraction over async-everything, augmented with a huge module repository.
Support for isomorphic javascript can come in handy for some applications.

I've never done Scala but my understanding of it is that it could be a good
fit for modelling certain types of business logic while being able to take
advantage of exiting java libs out there.

> Knowing more than one tool in the toolbox is a key skill, since there is no
> one-size-fits-all.

Yes, we agree here. That was the point I was making.

~~~
pencilhappen
Node is adequate at async, but not excellent. As I said in a sibling comment,
things that are synchronous still, like GC, make it fall down at scale.
Isomorphic JS isn't that useful in most real situations.

My point about Scala is that there doesn't seem to be a clear cut niche that
it excels at. There are numerous other non-Java languages that run on the JVM
that could be in the running, and they all get to take advantages of the whole
Java ecosystem too.

------
skate22
My code is well commented. //eslint-disable-line all over the place.

~~~
lsadam0
Invoking eslint-disable needs to have comment of it's own justifying why the
line in question is being skipped. What use is the linter if we just disable
it everytime it complains?

~~~
skate22
I dont have permission to remove lint rules but some of them are absurd.

I refuse to remove 'extrenious' parenthesis that make the code more readable
to junior devs who may not know the language specific order of evaluation in a
logical expression.

------
nimbix
For me the most important part of reviewing any nontrivial changes it actually
check out the branch and test every change I see. This keeps a lot of issues
from reaching the QA team and catches issues they could have missed since they
don't actually go through the code.

~~~
ryanianian
This is an ideal but is hardly scalable if you're doing 3-4+ PRs per day and
are expected to do your own coding as well (plus attend bureaucracy).

You can effectively do this by checking that every nontrivial change has
sufficient automated test coverage. This saves you from having to test changes
yourself and saves future devs from having to go through your thought-process
when they touch that code next.

------
bwest87
Something the author doesn't bring up, but that we started doing about 9
months ago at my company, is synchronous reviews. Meaning the committer is on
the phone or in person with the reviewer. It's great. We don't do it for all
PR's, but anything medium sized or above, or even small one's if they involve
critical logic. The way we usually do it is the committer walks through the
changes with the reviewer. Often the committer will realize their own ways of
improving the code. And with the added context, the reviewer can often provide
better feedback. Plus the X factor of just two people talking who come up with
ideas, improvements, etc. And half our team is remote, so this wasn't a
natural outgrowth. We make it happen, but I think it's worth it.

~~~
uremog
That sounds very similar to Rubber Duck Debugging.

------
soneca
I am a junior developer and my latest feedback was that one of the main skills
I should develop is to make better, more well-thought, critic and deep code
reviews (including of PRs from more senior developers).

Any tips on how to improve this?

Would a checklist help? Have a clear process on what to review first?

~~~
wiredfool
A checklist can help. (he says, and then does a mental one because there isn't
one nearby).

What I look for is:

1) What's the problem being solved? Does this look like a reasonable approach?
Is the code pythonic (Obv: for python)?

2) What edge cases are there? Does this handle the important ones? Does it
punt properly on the less important ones?

3) Look for a short list of bug classes that have come up in the project
before that have lead to emergency patches. E.g. Decrefing, Checking mallocs,
any exec sorts of things. (This is a clear application for a checklist)

4) Are there tests/documentation/other required fixtures and stuff?

5) Does the code generally match the style of the project?

1000) Code formating and whitespace and line wrapping and all that
bikeshedding stuff.

Feel free to short circuit anywhere once it becomes clear that there's more
work required.

~~~
kripke
1000 should really be handled by automated tools. Takes useless burden from
the reviewer, and emotionally easier for both sides too.

~~~
wiredfool
It can be, especially at a company. (But then watch the bikeshed discussion on
the tools. And PEP8 is a guideline, not a set of hard and fast rules.
Beautiful is better than ugly and all that. ).

What I see as one of 5 maintainers of an open source project is that when a
review comes back with a bunch of formatting comments, it's because the
reviewer didn't step back and see the other parts. Raymond Hettinger has a
talk where he discusses it, but it's the sort of feedback that can be given in
almost any case and can obscure the more fundamental issues with the code.

------
chriswarbo
> I look for code that is well-documented (both inline and externally), and
> code that is clear rather than clever. I’d rather read ten lines of verbose-
> but-understandable code than someone’s ninja-tastic one-liner that involves
> four nested ternaries.

"Clear" and "clever" aren't in opposition, and likewise "verbose" and
"understandable" aren't correlated.

I think this characterisation, and especially the example, shows a lowest-
common-denominator straw man of "clever one-liners" which seems to miss the
reason that some people like them. In particular, it seems to be bikeshedding
about how to write branches. The author doesn't say what those "ten lines of
verbose-but-understandable code" would be, but given the context I took it to
mean "exactly the same solution, but written with intermediate variables or
if/else blocks instead".

This seems like an analogous situation to
[https://wiki.haskell.org/Wadler's_Law](https://wiki.haskell.org/Wadler's_Law)
where little thought is given to what the code means, more thought is given to
how that meaning is encoded (e.g. ternaries vs branches) and religious
crusades are dedicated to how those encodings are written down (tabs vs
spaces, braces on same/new lines, etc.).

Note that even in this simple example there lurks a slightly more important
issue which the author could have mentioned instead: nested ternaries involve
boolean expressions; every boolean expression can be rewritten in a number of
ways; some of those expressions are more clear and meaningful to a human than
others. For example, `loggedIn && !isAdmin` seems pretty clear to me; playing
around with truth tables, I found that `!(loggedIn -> isAdmin)` is apparently
equivalent, but it seems rather cryptic to me. This is more obvious if
intermediate variables are used, since they're easier to name if they're
meaningful.

In any case, compressing code by encoding the same thing with different
symbols doesn't make something "clever". It's a purely mechanical process
which doesn't involve any insights into the domain.

To me, code is "clever" if it works by exploiting some non-obvious
structure/pattern in the domain or system. For example, code which calculates
a particular index/offset in a non-obvious way, based on knowledge about
invariants in the data model. Another example would be using a language
construct in a way which is unusual to a human, but has the perfect semantics
for the desired behaviour (e.g. duff's device, exceptions for control flow,
etc.).

Such "clever" code is _often_ more terse than a "straightforward" alternative,
but that's a side-effect of the "cleverness" (finding an existing thing which
behaves like the thing we want) rather than the goal.

If the alternative to some "clever" code is "10 lines of verbose but
understandable code" then it's probably not that clever; so it's probably a
safe bet to go with the latter. The _real_ issues with clever code are:

\- Whether the pattern it relies on is robust or subject to change. Would it
end up coupling components together, or complicate implementation changes in
unrelated modules?

\- How hard it is to understand. Even if it's non-obvious, can it be
understood after a moment's pondering; or does it require working through a
textbook and several research papers?

\- Whether the insights it relies on are enlightening or incidental, i.e. the
payoff gained from figuring it out. This is more important if it's harder to
understand. Enlightening insights can change the way we understand the
system/domain, which may have many benefits going forward. Incidental insights
are one-off tricks that won't help us in the future.

\- How difficult it would be to replace; or whether it's possible to replace
at all.

This last point is what annoys me in naive "clever vs verbose" debates, and
prompted this rant, since it's often assumed that the only difference is line
count. To me, the best "clever" code isn't that which reduces _its own_ line
count; it's the code which _removes problems entirely_ ; i.e. where the
alternative has caveats like "before calling, make sure to...", "you must
manually free the resulting...", "watch out for race conditions with...", etc.

One example which comes to mind is some Javascript I wrote to estimated prices
based on user-provided sliders and tick-boxes, and some formulas and constants
which sales could edit in our CMS (basically, I had to implement a spreadsheet
engine).

Recalculating after user input was pretty gnarly, since formulas could depend
on each other in arbitrary ways, resulting in infinite loops and undefined
variables when I tried to do it in a "straightforward" way. The "clever"
solution I came up with was to evaluate formulas and values lazily: wrapping
everything in thunks and using a memo table to turn exponential calculations
into linear ones. It was small, simple and heavily-commented; but the team's
unfamiliarity with concepts like lazy evaluation and memoising made it hard to
get through code review.

Also, regarding "straightforward" or "verbose" code being "readable": it's
certainly the case that any _particular_ part of such code can be read and
understood _locally_ , but it can make the _overall_ behaviour harder to
understand. Just look at machine code: it's very verbose and straightforward:
'load address X into register A then add the value of register B', simple! Yet
it's very hard to understand the "big picture" of what's going on. Making code
more concise, either by simplifying it or at least abstracting away low-level,
nitty-gritty details into well-named functions, can help with this.

When used well, "clever" code can reframe problems into a form which have very
concise solutions; not because they've been code-golfed, but because there's
so little left to say. This can mean the difference between a comprehensible
system and a sprawling tangle of interfering patches. This may harm local
reasoning in the short term, since it requires the reader to view things from
that new perspective, when they may be expecting something else.

When used poorly, it results in things like nested ternaries, chasing
conciseness without offering any deeper understanding of anything.

~~~
tome
An HN gem of a comment!

------
thetruthseeker1
I don’t know how much time he spends code reviewing. But if at the end of the
day he wants anybody to get the complete context of what the change entails by
looking at the PR... I would think the code review process is more elaborate
and time consuming than many companies can afford.

------
agentgt
> _clever code is often difficult to read and understand later_

I have seen this many times and they are actually usually talented developers
that are just not used to working in groups....

but what I have seen more often (back when I did code reviews)... is lazy copy
and pasting or something analogous.

------
DarkVador
I don't understand why you need to know the progamer behind the code ? You
need to be totaly impartial when you judge something.

So i think it's a wrong way to review the code. You don't need the WHO but the
WHY.

------
dingo_bat
> I’d rather read ten lines of verbose-but-understandable code than someone’s
> ninja-tastic one-liner that involves four nested ternaries.

One of my pet peeves is the inability to solve a K-Map for 6 variables and do
stuff based on the resulting boolean expression. Just because it is utterly
unreadable. I've tried this with many reviewers but nope.

