
Code Review – A Needed Habit in Science - bootload
http://software-carpentry.org/blog/2015/10/code-review-habit.html
======
ngoldbaum
Pull requests are slowly becoming my main way of contributing to scientific
software. The main project I work on, yt, has seen almost 2000 pull requests
since we started actively doing code review on bitbucket [1]. We regularly
catch bugs and style issues through this process, and it makes it easy to test
out and look at in-development code before it's merged into the development
branch.

That said, pull requests aren't a panacea. In particular, as the amount of
code to review increases, the utility of reviewing the pull request decreases
proportionately. It's important that new code comes in in bite-sized pieces,
and the code gets reviewed changeset by changeset. We're still struggling with
this, since often the code review happens at a late stage in a development
project when a developer wants to get some code used in a research project
merged into the public distribution. We have to strike a balance between code
quality and not putting up arbitrarily high barriers to getting code merged.

[1] [https://bitbucket.org/yt_analysis/yt/pull-requests/](https://bitbucket.org/yt_analysis/yt/pull-requests/)

~~~
sitkack
[http://yt-project.org/](http://yt-project.org/)

I just had an idea about using deep learning to train a bot to judge the
quality of a merge. Does it have tests? Does it pass lint? Is the subset of
the grammar used in the patch the same as the rest of the source? If a patch
is "clean" it should get automatically merged and only require human
intervention to reject it.
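
To make the "same subset of the grammar" idea concrete, one toy heuristic
(entirely hypothetical, thresholds and all) could compare the AST node-type
distributions of the patch against the existing source:

    import ast, collections

    def grammar_profile(source: str) -> collections.Counter:
        # Frequency of AST node types: a crude proxy for which subset of
        # the language's grammar a piece of code uses.
        return collections.Counter(
            type(node).__name__ for node in ast.walk(ast.parse(source))
        )

    def style_distance(patch_src: str, repo_src: str) -> float:
        # L1 distance between normalized node-type distributions; a small
        # distance means the patch "speaks the same dialect" as the repo.
        p, r = grammar_profile(patch_src), grammar_profile(repo_src)
        p_total, r_total = sum(p.values()) or 1, sum(r.values()) or 1
        return sum(
            abs(p[k] / p_total - r[k] / r_total) for k in set(p) | set(r)
        )

A real bot would combine a signal like this with the test and lint results and
only escalate to a human when the combined score falls below some threshold.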

~~~
Gankro
So basically, anything that looks like a real patch should be merged? This is
a terrifying idea. Human review is necessary to determine if a change is
semantically sound and technically/politically desirable. No amount of
automation could ever address these concerns.

Lint checking is handled by basic CI integration (e.g. Travis). Having tests
is something a reviewer can trivially check (and many changes don't
necessarily warrant test changes anyway).
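
A check along those lines is a few lines of scripting in CI; everything here
(the directory layout, the branch name) is an assumption, not any particular
project's setup:

    import subprocess, sys

    # List files changed relative to the target branch (branch name assumed).
    changed = subprocess.run(
        ["git", "diff", "--name-only", "origin/master...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()

    touched_source = any(
        p.endswith(".py") and not p.startswith("tests/") for p in changed
    )
    touched_tests = any(p.startswith("tests/") for p in changed)

    if touched_source and not touched_tests:
        # Exit non-zero so CI flags it; a reviewer decides whether it's fine.
        sys.exit("Source changed without test changes; needs human judgment.")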

~~~
sitkack
> So basically, anything that looks like a real patch should be merged?

Yes.

> No amount of automation could ever address these concerns.

Watch out for when experts say no.

If tests and coverage can be trivially checked by a reviewer, why not let a
bot do it? Conversely, if a patch doesn't meet basic standards, why not let a
bot ask for changes?

What if I train a refactor bot to make trivial refactorings? Naming, extract
method, extract interface, warn on over-coupling? If the cost of backing out a
patch in the future is near zero, the cost of applying a patch today is also
near zero. If the person making the change has a good track record, why not
automerge?
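
Even the Python standard library is enough to sketch the rename case (a toy
pass only; a real tool like rope handles scoping, imports, and attributes):

    import ast

    class RenameFunction(ast.NodeTransformer):
        # Toy refactor-bot pass: rename a function and its call sites.
        # Ignores shadowing and attribute access, which a real tool must handle.
        def __init__(self, old: str, new: str):
            self.old, self.new = old, new

        def visit_FunctionDef(self, node):
            if node.name == self.old:
                node.name = self.new
            self.generic_visit(node)
            return node

        def visit_Name(self, node):
            if node.id == self.old:
                node.id = self.new
            return node

    source = "def calc(x): return x * 2\nprint(calc(21))\n"
    tree = ast.fix_missing_locations(
        RenameFunction("calc", "calculate_double").visit(ast.parse(source))
    )
    print(ast.unparse(tree))  # ast.unparse needs Python 3.9+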

~~~
Gankro
Doing lots of automated analysis on a patch is definitely great. However, I'm
talking about the kind of analysis that really requires humans who understand
the larger ecosystem. A bot can't understand that some interface design is
broken on Windows because some kernel API exists that subverts its
assumptions.

Further, the cost of backing out or applying a patch isn't zero. A bad patch
landing (even one that passed all the analysis at your disposal!) can cause
big projects to have to basically shut down their tree as they scramble to
figure out what's wrong. This is a huge cost. The classic example of this is
creating a sporadic test failure, causing random patches to bounce.

I've definitely worked in projects where I'll basically automerge some PRs
because I know the person is great and the risks are low, but this is
generally conditional on the purpose of the PR. If it's cleanup and
refactoring... great, ship it, I don't care. If it's adding some
functionality, I need to review if that functionality should be added.

However a big value of PRs is to force us to uncut corners. We all cut corners
(or miss obvious details) when we program, and I see human review constantly
catching this kind of sloppiness and forcing great programmers (the kind I'd
automerge on trivial changes) to do the job right. If you just automerge, you
accumulate code debt much more rapidly. There's also the aspect that human
review often helps ensure that at least two people understand the decision
process behind some piece of code.

~~~
sitkack
Don't disagree on much.

What if the cost of backing out a change actually was zero or close to zero?
What if the prospective change actually was run on all platforms with all of
the historic input the function has ever seen?

What happens when our undo buffer and our log of refactorings get checked
into version control? Not a log of pure text, but a log of semantic
transformations.

Of course bad patches make it into the tree. Right now all the bad patches
require human intervention. I am saying that it might be economically
advantageous to let in some of the bad patches automatically so it takes less
work.

For most codebases I see in industry, simply rejecting a patch on lint
failure, lack of tests or documentation would effectively freeze most trees.

------
sitkack
I was just talking with a friend of mine who teaches writing at a university,
and the dean wants to focus on STEM, so much so that they are considering
drastically cutting back the English department.

This is a huge mistake.

Efficient and succinct communication of ideas as expressed in writing
transcends all boundaries. Many of the best programmers I have known were as
skilled at writing an essay as at decomposing a problem into functions. An
essay can be lyrical and it can also be computational. A lack of clarity in
writing, in general, is the cause of bad code. This crushingly intense focus
on STEM is shortsighted. The root of the problem is a lack of clarity of
writing, both in English and in code.

~~~
lmm
Good writing is a vital skill for programmers. But are English departments any
good at teaching it? None of the great writers I know have English degrees.

~~~
dragonwriter
> None of the great writers I know have English degrees.

Plenty of notable writers have English degrees; "None (or, equally, 'all',
'most', etc.) of the _X_ that I know" usually introduces a statement that
reveals more about the speaker's social bubble than the broader world.

------
unwind
Why is it that as soon as you talk about software in science, the term "codes"
gets used when all software-industry people would say "code" or "programs"?

Example from the article:

 _We had three codes to review, they were written in IDL [...]_

That is clearly talking about _programs_ to review, but uses "codes" as if it
is synonymous with "program". So annoying!

Is it just me, or is this usage very typical of the scientific community?

~~~
dalke
Is it the use of "code" instead of "program", or the use of "code" as a
countable noun rather than an uncountable one, which you find jarring?

The author is a native Spanish speaker, and I've seen fluent but non-native
speakers of English make similar mistakes with other mass nouns and
irregular plurals.

So could you be more specific by pointing to how a native English speaker uses
it in a scientific software context?

~~~
dalke
Methodology: I did a search of [http://arxiv.org/find/hep-ex/1/abs:+codes/0/1/0/all/0/1?quer...](http://arxiv.org/find/hep-ex/1/abs:+codes/0/1/0/all/0/1?query_id=d5fca86498d3f01c&form=yes)
for "codes" in hep-ex entries at arXiv.org and reported the first few papers
which actually contained "codes". (It appears that stemming is in place, as
some papers only had 'code'.)
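
(For anyone who wants to redo this programmatically rather than through the
'find' page, here is a minimal sketch against arXiv's public Atom API; it is a
different interface, so counts and ranking may not match mine:)

    import urllib.parse
    import urllib.request
    import xml.etree.ElementTree as ET

    # Fetch hep-ex papers whose abstracts mention "codes".
    params = urllib.parse.urlencode({
        "search_query": "abs:codes AND cat:hep-ex",
        "max_results": "10",
    })
    url = "http://export.arxiv.org/api/query?" + params
    with urllib.request.urlopen(url) as response:
        root = ET.fromstring(response.read())

    atom = {"a": "http://www.w3.org/2005/Atom"}
    for entry in root.findall("a:entry", atom):
        paper_id = entry.find("a:id", atom).text.strip()
        title = " ".join(entry.find("a:title", atom).text.split())
        print(paper_id, "-", title)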

[http://arxiv.org/pdf/1510.08141.pdf](http://arxiv.org/pdf/1510.08141.pdf) -
"The processes are simulated with the pythia and MadGraph 5 Monte Carlo codes"
(authors are from Brazil and Switzerland)

[http://arxiv.org/pdf/1510.04063.pdf](http://arxiv.org/pdf/1510.04063.pdf) -
"They present new challenges to the simulation codes" (author is from Germany)

[http://arxiv.org/pdf/1509.00209.pdf](http://arxiv.org/pdf/1509.00209.pdf) -
"The codes and setups used for the formulation of these plots are listed
below:" (from Italy) (Note the '_and setups_'. This is a clear mistake made
by a non-native speaker.)

[http://arxiv.org/pdf/1508.00589.pdf](http://arxiv.org/pdf/1508.00589.pdf) -
"The Cambridge variable MT2 can already be reliably computed with one of
several publicly available codes" (3 of the 8 authors have a US address,
though only one of those three has a common name for a native English speaker)

[http://arxiv.org/pdf/1507.08764.pdf](http://arxiv.org/pdf/1507.08764.pdf) -
"a benchmark for the hadronic interaction Monte Carlo simulations codes" (1 of
the 10 authors has a US address. That author has a typically English name)

[http://arxiv.org/pdf/1507.06706.pdf](http://arxiv.org/pdf/1507.06706.pdf) -
"Both codes are available at" (1 US-based author of 5).

I then repeated the same search with 'programs':

[http://arxiv.org/pdf/1510.00391.pdf](http://arxiv.org/pdf/1510.00391.pdf) -
"Using a fully automated framework based on the FeynRules and
MadGraph5.aMC@NLO programs" (1 UK-based author of 5)

[http://arxiv.org/pdf/1509.01918.pdf](http://arxiv.org/pdf/1509.01918.pdf) -
"The results are compared with the values used in the simulation programs
GEANT4 and UNIMOD." (All authors are based in Russia.)

[http://arxiv.org/pdf/1508.05895.pdf](http://arxiv.org/pdf/1508.05895.pdf) -
"various high-energy physics programs such as Monte Carlo event generators."
(1 UK-based author of 6)

[http://arxiv.org/pdf/1507.00556.pdf](http://arxiv.org/pdf/1507.00556.pdf) -
"This requires an interface between the higher-order theory programs and the
fast interpolation frameworks" (23 authors, 18 institutions, 10 of those
institutions in English-speaking countries)

[http://arxiv.org/pdf/1506.08759.pdf](http://arxiv.org/pdf/1506.08759.pdf) -
"The new estimates presented here are based both on simulation programs
(GEANT4 libraries) and theoretical calculations" (Authors are from Italy.)

[http://arxiv.org/pdf/1504.06469.pdf](http://arxiv.org/pdf/1504.06469.pdf) -
"While many fixed-order and parton shower programs allow" (4 authors, 2 based
in English-speaking countries, one with a common name for a British person)

This is surely not proof of anything, but it does demonstrate that 1) "codes"
is used in scientific computing (though it doesn't show the ratio of "codes"
to "programs", nor its use in fields other than experimental HEP), 2) there's
a suggestion that non-native speakers use "codes" more often, though it's not
clear at this level, and 3) there are a lot of non-native speakers publishing
papers, while I think the 'software-industry people' who publish in English
skew much more strongly towards native English speakers.

A more thorough analysis would also need to see if the modern scientific sense
is historically consistent; perhaps the industry usage is a newer, though
wider, coinage.

------
methou
In my workplace, code review was done mostly during debugging sessions, when
people asked their peers for help. If the code went through smoothly, then no
code review was done.

In this specific case, I believe well-written tests and good coverage would
take priority over code review.

Besides, most codes we wrote are merely prototypes that won't be used again.

~~~
maxxxxx
"Besides, most codes we wrote are merely prototypes that won't be used again."

I used to work with scientists who said pretty much the same thing. Once
their stuff started working, we had to deal with a huge pile of very badly
written code and turn it into a product quickly.

~~~
danieltillett
As a scientist who writes code, a good 95% of all the code I write is
prototype code that gets written just to see if an approach will work. Almost
everything I write never gets used again because the idea failed. This is just
the nature of research.

I do make an effort to rewrite what I have hacked together once I have a
working approach, but if I wrote everything correctly from the start I would
waste a huge amount of time and actually get very little done.

~~~
analog31
I'm a scientist too. A lot of my code is not even what I would call a
prototype. Many of my programs get used exactly once, by myself. I write a
program to solve a specific problem, and when the problem is solved, I move on
to something else.

Now, I also do some prototyping, but my workplace has a formal software
development process, and my code would never leak into a product. For one
thing, I use a different language than the programming department. Likewise
for the electronics and embedded portions of my prototypes.

Sometimes my code continues to serve a useful purpose, for instance allowing
testing of the hardware portions of a product, months before production
software is ready to use. And if our codes produce different results, well,
let's just say that I've lost a few bets, but have also won a few. ;-)

With all those things said, I'd welcome review of my code in any "critical"
situation, including preparation of results for publication.

~~~
danieltillett
I think the problem is expecting a one-size-fits-all approach to writing
software. Over the years I have had many discussions with developers from a
pure CS background who really disparage the code scientists write. Yes, a lot
of the code is not good code, but it would not have been possible to get
anything done at all if strict coding standards had been applied from the
start.

The best approach, I think, is a hybrid: hack and then rewrite. I have worked
on problems where I wrote 10,000s of lines of very hacky code over many
months and, once I had a viable solution, rewrote the whole module from
scratch in a week. The key is to make sure you (or someone else) rewrites the
code once it is working, if you are going to use it again, and not to pretend
that what you have hacked together is acceptable just because it (sometimes)
works.

~~~
collyw
As a software engineer (who works with scientists), that's pretty much how I
write new bits of code where I am unsure if the approach will work. Try a
quick hacky version, and if that appears to be working, rewrite it "properly".
If it's going to be complex, then start off writing it reasonably well in the
first place, as that will make debugging much easier.

~~~
danieltillett
You raise a good point about debugging - I have had to rewrite some hacky
code just because it became too complex to debug efficiently, long before it
was ready to be rewritten for final use.

------
strictfp
IMO code review as it is commonly done today focuses too much on small details
and way too little on the general approach, algorithms and architectural
soundness. We should talk more about the solution, not about the "code".

~~~
mattexx
+1. We review pretty much all code at Climate Corp (even science code) but
code review seldom reveals design flaws. I still think code review is
incredibly useful. It makes all of us better coders. It just isn't everything.

------
moondowner
I've worked on memory and performance optimizations on algorithms written in
Java by scientists, and I'd definitely agree - code review is really needed.
Some things are better handled while the design of the algorithm is taking
place.

------
Ch_livecodingtv
I love seeing the work of other coders. I learn from them. If I make a
substantial review, they learn from me. If my review seems to be off, it gets
challenged. I say yes, it's a good habit, and a useful one when fairly
applied.

------
a3voices
Code reviews add a layer of bureaucracy and greatly slow down software
development, because you have to worry about what others think of your code
in addition to whether it actually works. Most comments are nit-picks about
minor flaws at best. I think code reviews are beneficial for junior
engineers, but should be avoided by everyone else.

