Code Review – A Needed Habit in Science (software-carpentry.org)
91 points by bootload on Oct 30, 2015 | 51 comments

Pull requests are slowly becoming my main way of contributing to scientific software. The main project I work on, yt, has had almost 2000 pull requests since we started actively doing code review on Bitbucket [1]. We regularly catch bugs and style issues through this process, and it makes it easy to test out and look at in-development code before it's merged into the development branch.

That said, pull requests aren't a panacea. In particular, as the amount of code to review increases, the utility of reviewing the pull request decreases proportionately. It's important that new code arrives in bite-sized pieces and gets reviewed changeset by changeset. We're still struggling with this, since the code review often happens at a late stage in a development project, when a developer wants to get code used in a research project merged into the public distribution. We have to strike a balance between code quality and not putting up arbitrarily high barriers to getting code merged.

[1] https://bitbucket.org/yt_analysis/yt/pull-requests/


I just had an idea about using deep learning to train a bot to judge the quality of a merge. Does it have tests? Does it pass lint? Is the subset of the grammar used in the patch the same as the rest of the source? If a patch is "clean" it should get automatically merged and only require human intervention to reject it.
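The mechanical signals in that list (does it have tests? does it pass lint?) don't actually need deep learning; a rule-based gate is easy to sketch. Everything below is hypothetical (the branch name, the choice of flake8, the helper names), just to show the shape of such a bot:

```python
import subprocess

def patch_signals(patch_dir):
    """Collect mechanical quality signals for a patch checkout.

    The branch names and tools here are assumptions; a real project
    would wire this into its own CI.
    """
    signals = {}
    # Which files did the patch touch, relative to the main branch?
    diff = subprocess.run(
        ["git", "diff", "--name-only", "main...HEAD"],
        cwd=patch_dir, capture_output=True, text=True, check=True,
    )
    changed = diff.stdout.split()
    # Does the patch include any test changes?
    signals["touches_tests"] = any("test" in path for path in changed)
    # Does it pass lint? (flake8 exits non-zero on violations.)
    lint = subprocess.run(["flake8", "."], cwd=patch_dir)
    signals["passes_lint"] = lint.returncode == 0
    return signals

def looks_clean(signals):
    # Only auto-merge when every signal is positive;
    # anything else still goes to a human.
    return all(signals.values())
```

The interesting (and contentious) part is everything this sketch can't check, which is what the replies below get into.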

So basically, anything that looks like a real patch should be merged? This is a terrifying idea. Human review is necessary to determine if a change is semantically sound and technically/politically desirable. No amount of automation could ever address these concerns.

Lint checking is handled by basic CI integration (e.g. Travis). Having tests is something a reviewer can trivially check (and many changes don't necessarily warrant test changes anyway).
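As a concrete illustration of lint checking via CI, a minimal Travis configuration of that era might look something like this (the package name is hypothetical); any lint violation fails the build and flags the pull request:

```
language: python
python:
  - "2.7"
install:
  - pip install flake8
script:
  # Fail the build, and hence flag the PR, on any lint violation.
  - flake8 mypackage/
```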

There's a great talk on this [1] by Raymond Hettinger where he argues that style guides (e.g. PEP8) are no alternative to thinking hard about the code. Blindly following rules (like a bot would do) just gives a false sense of security that it's good code.

[1] https://www.youtube.com/watch?v=wf-BqAjZb8M

> So basically, anything that looks like a real patch should be merged?


> No amount of automation could ever address these concerns.

Watch out for when experts say no.

If tests and coverage can be trivially checked by a reviewer, why not let a bot do it? Conversely, if a patch doesn't meet basic standards, why not let a bot ask for changes?

What if I train a refactor bot to make trivial refactorings? Naming, extract method, extract interface, warn on over-coupling? If the cost of backing out a patch in the future is near zero, the cost of applying a patch today is also near zero. If the person making the change has a good track record, why not automerge?

Doing lots of automated analysis on a patch is definitely great. However, I'm talking about the kind of analysis that really requires humans who understand the larger ecosystem. A bot can't understand that some interface design is broken on Windows because some kernel API exists that subverts its assumptions.

Further, the cost of backing out or applying a patch isn't zero. A bad patch landing (even one that passed all the analysis at your disposal!) can cause big projects to have to basically shut down their tree as they scramble to figure out what's wrong. This is a huge cost. The classic example of this is creating a sporadic test failure, causing random patches to bounce.

I've definitely worked in projects where I'll basically automerge some PRs because I know the person is great and the risks are low, but this is generally conditional on the purpose of the PR. If it's cleanup and refactoring... great, ship it, I don't care. If it's adding some functionality, I need to review if that functionality should be added.

However a big value of PRs is to force us to uncut corners. We all cut corners (or miss obvious details) when we program, and I see human review constantly catching this kind of sloppiness and forcing great programmers (the kind I'd automerge on trivial changes) to do the job right. If you just automerge, you accumulate code debt much more rapidly. There's also the aspect that human review often helps ensure that at least two people understand the decision process behind some piece of code.

Don't disagree on much.

What if the cost of backing out a change actually was zero or close to zero? What if the prospective change actually was run on all platforms with all of the historic input the function has ever seen?

What happens when our undo buffer and our log of refactorings get checked into version control? Not a log of pure text, but a log of semantic transformations.

Of course bad patches make it into the tree. Right now all the bad patches require human intervention. I am saying that it might be economically advantageous to let in some of the bad patches automatically so it takes less work.

For most codebases I see in industry, simply rejecting a patch on lint failure, lack of tests or documentation would effectively freeze most trees.

I didn't follow the last sentence. Does every pull request get reviewed by a human before merge? At what stage does the human rejection happen?

The smart deep learning bot can determine if a branch is "clean", but it surely can't determine if it's correct.

I find the quality measures you have described very superficial. I joined a startup a couple of months back, and it may well have passed all of your quality measures. But the overall design was terrible. I could have rewritten it in around a tenth of the code. The database design was poor, with lots of redundant data. Joins were being done at the application level, it was making way too many calls to the database, and it was just doing things in a really inefficient, roundabout way. 10x more code than is needed makes understanding it way harder than it should be.

I certainly won't trust a bot to validate code for all the operational and trust issues it commingles. Not anytime soon.

What if validating code was your job, and you wanted to automate an unimportant part of it? What part would you automate and why?

I was just talking with a friend of mine who teaches writing at a university. The dean wants to focus on STEM, so much so that they are considering drastically cutting back the English department.

This is a huge mistake.

Efficient and succinct communication of ideas as expressed in writing transcends all boundaries. Many of the best programmers I have known were as skilled at writing an essay as at decomposing a problem into functions. An essay can be lyrical, and it can also be computational. The lack of clarity in writing, in general, is the cause of bad code. This crushingly intense focus on STEM is shortsighted. The root of the problem is a lack of clarity of writing, both in English and in code.

> This crushingly intense focus on STEM is shortsighted. The root of the problem is a lack of clarity of writing, both in English and in code.

I think lack of clarity in writing -- either in code or in natural language -- isn't the root of the problem; it's a symptom of a lack of a good, clear general analytical process. Lots of programmers I've met are great at analyzing how to implement a very explicitly described thing in code, but aren't great at analyzing domain information to determine how to solve a problem.

I don't think that STEM-focused education is bad for this, but I think that developing generalized analytic skills seems to be helped by having experience applying analysis in varied and not-closely-related contexts, particularly contexts where the problems are often fuzzily defined and you need to do work to discover, or impose, structured order to compose a response.

Collegiate English departments can be an important venue for that, but I don't think uniquely so -- so can Philosophy or social science departments. (But it's all highly dependent on the particular institution, professor, and courses chosen.) I'm personally partial to legal writing specifically (often addressed in undergraduate coursework in law-focused political science courses) as a vehicle for developing this skill, but then I'm a programmer who's also a law geek and was a political science major, so there's probably some bias there.

The best programmers I know are also great writers.

Not good, but great.

Their ability to organize ideas to write about them also serves their ability to organize ideas to code about them. I really think we give too little credit to analytical training available from English departments. I would train an English major as happily as a CS major, other factors being roughly similar.

Good writing is a vital skill for programmers. But are English departments any good at teaching it? None of the great writers I know have English degrees.

> None of the great writers I know have English degrees.

Plenty of notable writers have English degrees; "None (or, equally, 'all', 'most', etc.) of the X that I know" usually introduces a statement that reveals more about the speaker's social bubble than the broader world.

> None of the great writers I know have English degrees.

Off the top of my head: Tolkien and C.S. Lewis (the creators of Lord of the Rings and Narnia, respectively) were both Oxbridge professors of English.

One problem is that English departments often don't focus on efficient and succinct communication of ideas. Occasionally it seems as if they are focused on encouraging willful misinterpretation of another person's writing. I've had the experience of telling a teacher that I wanted to get better at writing technical documentation and being laughed at and told that was boring. Of course, these experiences aren't universal and certainly my memory of them suffers from selection bias. But so do the memories of other folks, leading them to be highly skeptical of the argument that the English department is a valuable way to provide students with skill in communicating clearly.

What kind of research do English departments do? If you're just looking at creative writing classes, then you're missing out on a lot of what is important for the state of the art of the field.

Agree. I took a non-STEM education then transitioned into coding and after many years I now find myself writing proposals, task specifications, and research reports. I increasingly conduct independent reviews of documents other people have written - not just the quality of the English, but primarily whether they expose us as a business to technical or commercial risk because they are confused or misleading.

And after many late nights fixing documents just before their due dates, I've developed some basic guidelines that aim to increase clarity, including:

* Not using multiple distinct terms for the same thing in different parts of the document
* Making responsibilities clear, especially in multi-party contracts! E.g. saying "X will do Y" rather than "Y will be done"
* Not expecting the reader to guess what the writer means (e.g. by saying "there will be an impact")
* Not using 'this' after a discussion that references multiple distinct things

The results might not look like the sort of literature I enjoy reading outside work, but they have much more clarity!

Yesterday, I had the idea that I should try working as a technical writer as a way to realign my past with my future - I don't want the stress of debugging things every day, and I've done this writing informally and always thought, "I wish I could spend more time getting this right but I can't justify it."

An hour and a half later, I had started a conversation with the maintainer of an open source project, speccing out a portfolio project with him. It's a bit early to tell, but treating the writing as my only goal, rather than as a supplement to coding work, is doing things to my perspective... there's definitely a need for the humanities in technical environments. A really great piece of writing can do more than instruct clearly: it can motivate creative uses, put the techniques into perspective, and lead people away from dangerous ideas. As well, in-depth writing about technology is a great way to review its real utility and add discipline to development that has gone astray.

And there is so much technology that would benefit from doing this better. I think it might be the proper antidote to cargo culting.

I never thought about this but you're absolutely right, I've yet to see a great programmer who doesn't also write really well. This works both ways, too: sloppier programmers also tend to be sloppier writers.

Even if what you say about programming is true, that doesn't mean you need English majors.

You don't need CS majors either. You can write software as a hobby.

Why is it that as soon as you talk about software in science, the term "codes" gets used when all software-industry people would say "code" or "programs"?

Example from the article:

We had three codes to review, they were written in IDL [...]

That is clearly talking about programs to review, but uses "codes" as if it is synonymous with "program". So annoying!

Is it just me, or is this usage very typical of the scientific community?

> Is it just me, or is this usage very typical of the scientific community?

It is a well-established tradition in scientific programming to use "code" as a count noun. It's even acknowledged in the Jargon file:


I find it jarring too, despite being a scientific programmer myself, but you learn to go with it. It is a tradition that this linguistic community has established, and there's no reason to rail against it.

I work at a supercomputer company; they've always referred to programs as codes.

Consider it the same as humor versus humour in computing and don't read too much into it. It comes from a much earlier time, from what I can gather: probably the '60s or even earlier, when codes were run on mainframes.

Is it the use of "code" instead of "program", or the use of code as a countable noun rather than uncountable which you find jarring?

The author is a native Spanish speaker, and I've seen fluent but non-native speakers of English make similar confusions concerning other mass nouns and irregular plurals.

So could you be more specific by pointing to how a native English speaker uses it in a scientific software context?

Methodology: I did a search of http://arxiv.org/find/hep-ex/1/abs:+codes/0/1/0/all/0/1?quer... for "codes" in hep-ex entries at arXiv.org and reported the first few papers which actually contained "codes". (It appears that stemming is in place, as some papers only had 'code'.)

http://arxiv.org/pdf/1510.08141.pdf - "The processes are simulated with the pythia and MadGraph 5 Monte Carlo codes" (authors are from Brazil and Switzerland)

http://arxiv.org/pdf/1510.04063.pdf - "They present new challenges to the simulation codes" (author is from Germany)

http://arxiv.org/pdf/1509.00209.pdf - "The codes and setups used for the formulation of these plots are listed below:" (from Italy) (Note the 'and setups'. This is a clear mistake made by a non-native speaker.)

http://arxiv.org/pdf/1508.00589.pdf - "The Cambridge variable MT2 can already be reliably computed with one of several publicly available codes" (3 of the 8 authors have a US address, though only one of those three is a common name for a native English speaker)

http://arxiv.org/pdf/1507.08764.pdf - "a benchmark for the hadronic interaction Monte Carlo simulations codes" (1 of the 10 authors has a US address. That author has a typically English name)

http://arxiv.org/pdf/1507.06706.pdf - "Both codes are available at" (1 US-based author of 5).

I then repeated the same search with 'programs':

http://arxiv.org/pdf/1510.00391.pdf - "Using a fully automated framework based on the FeynRules and MadGraph5.aMC@NLO programs" (1 UK-based author of 5)

http://arxiv.org/pdf/1509.01918.pdf - "The results are compared with the values used in the simulation programs GEANT4 and UNIMOD." (All authors are based in Russia.)

http://arxiv.org/pdf/1508.05895.pdf - "various high-energy physics programs such as Monte Carlo event generators." (1 UK based author of 6)

http://arxiv.org/pdf/1507.00556.pdf - "This requires an interface between the higher-order theory programs and the fast interpolation frameworks" (23 authors, 18 institutions, and 10 institutions in English speaking countries)

http://arxiv.org/pdf/1506.08759.pdf - "The new estimates presented here are based both on simulation programs (GEANT4 libraries) and theoretical calculations" (Authors are from Italy.)

http://arxiv.org/pdf/1504.06469.pdf - "While many fixed-order and parton shower programs allow" (4 authors, 2 based in English speaking countries, one with a common name for a British person)

This is surely not proof of anything, but it does demonstrate that 1) "codes" is used in scientific computing (though it doesn't show the ratio of "codes" to "programs", nor its use in fields other than experimental HEP), 2) there's a suggestion that non-native speakers use "codes" more often, but it's not clear at this level, and 3) there are a lot of non-native speakers publishing papers, while I think the 'software-industry people' who publish in English skew much more heavily towards native English speakers.

A more thorough analysis would also need to see if the modern scientific sense is historically consistent; perhaps the industry use is newer though wider coinage.

I think it's both.

But of course, I'm not a native English speaker.

I saw this prevalently used in the supercomputing industry, and at big government labs. I also think it's bizarre :)

I don't think it's typical of the scientific community as a whole (certainly not in, say, math), but I don't often interact with computational biologists and such.

In my workplace, code reviews were done mostly during debug sessions, when people asked their peers for help. If code went through smoothly, then no code review was done.

In this specific case, I believe well-written tests and good coverage would be a higher priority than code review.

Besides, most codes we wrote are merely prototypes that won't be used again.

"Besides, most codes we wrote are merely prototypes that won't be used again."

I used to work with scientists that said pretty much the same thing. Once their stuff started working we had to deal with a huge pile of very badly written code and make this into a product quickly.

As a scientist who writes code, a good 95% of all the code I write is prototype code that gets written just to see if an approach will work. Almost everything I write never gets used again because the idea failed. This is just the nature of research.

I do make an effort to rewrite what I have hacked together once I have a working approach, but if I wrote everything correctly from the start I would be wasting a huge amount of time and get very little actually done.

I'm a scientist too. A lot of my code is not even what I would call a prototype. Many of my programs get used exactly once, by myself. I write a program to solve a specific problem, and when the problem is solved, I move on to something else.

Now, I also do some prototyping, but my workplace has a formal software development process, and my code would never leak into a product. For one thing, I use a different language than the programming department. Likewise for the electronics and embedded portions of my prototypes.

Sometimes my code continues to serve a useful purpose, for instance allowing testing of the hardware portions of a product, months before production software is ready to use. And if our codes produce different results, well, let's just say that I've lost a few bets, but have also won a few. ;-)

With all those things said, I'd welcome review of my code in any "critical" situation, including preparation of results for publication.

I think the problem is expecting that there is a one-size-fits-all approach to writing software. I have had many discussions over the years with developers from a pure CS background who really disparage the code scientists write. Yes, a lot of it is not good code, but it would not have been possible to get anything done at all if strict coding standards had been applied from the start.

The best approach, I think, is a hybrid: hack and then rewrite. I have worked on problems where I wrote tens of thousands of lines of very hacky code over many months and, once I had a viable solution, rewrote the whole module from scratch in a week. The key is to make sure you (or someone else) rewrites the code once it is working, if you are going to use it again, and not pretend that what you have hacked together is acceptable just because it (sometimes) works.

As a software engineer (who works with scientists), that's pretty much how I write new bits of code when I am unsure if the approach will work. Try a quick hacky version, and if that appears to be working, rewrite it "properly". If it's going to be complex, then start off writing it reasonably well in the first place, as that will make debugging much easier.

You raise a good point about debugging - I have had to rewrite some hacky code just because it just became too complex to debug efficiently long before it was ready to be rewritten for final use.

Call it "refactoring" and you're golden. ;-)

For me, it's less about standards than about tools. I don't think software development is slow because of standards.

I don't use the same tool chain as the developers, so I'm oblivious to their standards. Their code is unreadable to me. I follow what would probably have been considered good practices 15 years ago when programs tended to be smaller. I actually learned good hygiene, but in the context of BASIC and Pascal in the 1980s, not the massive tool chains of today. And my programs probably look like a throwback to that era in terms of their conceptual structure.

I re-write early and often. Transforming a program is a good way to find out if it made sense in the first place.

I have always considered refactoring as improving your current code rather than starting from scratch. I like to start over once I have a working approach for two reasons:

1. The original code is, by the time I have finished hacking, a mess, and it is not a good place to start.

2. I can use the original code and the new code to find bugs I missed the first time around. A good percentage of the time I find bugs in my original code that only a clean rewrite exposes.
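That second point is essentially differential testing: run the old and new implementations on the same generated inputs and collect any cases where they disagree. A minimal sketch, with trivial stand-in implementations (the real ones would be whatever you hacked together and then rewrote):

```python
import random

def original_mean(xs):
    # Stand-in for the original, hacky implementation.
    total = 0.0
    for x in xs:
        total += x
    return total / len(xs)

def rewritten_mean(xs):
    # Stand-in for the clean rewrite.
    return sum(xs) / len(xs)

def differential_test(old, new, trials=1000, seed=0):
    """Run both implementations on the same random inputs and
    collect any inputs on which they disagree."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(trials):
        xs = [rng.uniform(-1e6, 1e6) for _ in range(rng.randint(1, 50))]
        a, b = old(xs), new(xs)
        # Relative tolerance, since floating-point order can differ.
        if abs(a - b) > 1e-9 * max(1.0, abs(a)):
            mismatches.append(xs)
    return mismatches
```

Any input that lands in `mismatches` is a bug in one version or the other, and (as the comment says) it is surprisingly often the original.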

That's perfectly valid. In fact I did a re-write of a favorite code, a few weeks ago. It's something a colleague and I are working on, and I explained to him that I no longer trusted the old code because it was such a mess.

Also, I try to incorporate new techniques that I've learned when I write new code.

I am not advocating for doing everything perfectly from the start but I think scientists who write code should at least try to learn some basic software engineering skills.

I am talking about simple things: writing a function instead of copy/pasting the same 500 lines of code 10 times, using STL containers instead of heap-allocated arrays everywhere, meaningful variable names, or (my favorite) using named variables for important numbers instead of scattering the literal 8 through the code, so you can't tell whether a nearby 7 means "8 - 1" or really is just 7.
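The magic-number point is language-independent (the comment's examples are C++-flavored), so here is a minimal Python sketch of the before and after; the file-format numbers are made up for illustration:

```python
# Before: the literal 8 appears everywhere, so a reader can't tell
# whether a nearby 7 means "8 - 1" or is an unrelated quantity.
def header_size_bad(n_records):
    return 8 + n_records * 8

# After: each important number gets a name, so intent is explicit
# and a change happens in exactly one place.
MAGIC_BYTES = 8   # length of the (hypothetical) file magic string
RECORD_SIZE = 8   # bytes per record in this hypothetical format

def header_size(n_records):
    return MAGIC_BYTES + n_records * RECORD_SIZE
```

Both functions return the same values today; the difference is what happens when the record size changes, or when someone has to read the code six months later.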

Some of the scientists I worked with somehow took pride in not following any coding practices because they were scientists, not mere coders.

I believe that at this particular startup these practices contributed to the eventual failure, since we had hundreds of thousands of lines of prototype code that crashed left and right, and nobody knew exactly what it was doing because there was almost no documentation.

I think a little respect for engineering practices is a good thing for scientists and I don't think it impedes their creativity.

A lot of scientists are either self-taught coders or were taught by other self-taught scientists, and so are ignorant of the bad practices to avoid. Developers with experience should be helping educate the scientists in a positive way, so that the scientists don't feel they are considered idiots by the developers. Respect is the key.

The problem with the startup you describe sounds more like a management problem. Hacked code that sometimes works is not code that should be used more than once and it should be rewritten before check in. If the scientists don't know how to do this then they need to be helped.

This reminds me of a recent popular quote: "The only difference between screwing around and science is writing it down."

Can we compare "Screwing around" to "hacked together" (undocumented?) throw-away code, and "Writing it down" to "correctly written" (documented?) code?

Aren't you as a scientist supposed to also document what ideas failed?

Now it depends a bit on what you mean exactly by "hacked together" and "correctly written" I guess. I'm all for prototyping and getting to the "heart" of a problem quickly to confirm / reject ideas. But I wonder.

Do e.g. chemists feel they could try more experiments quicker if they didn't have to take notes on what they tried and what happened? Is there a comparable activity to hacking together throw-away code in chemistry (or other older fields of science)?

It is not a matter of not taking notes (code is notes); it is more a case of not writing code that handles all the edge cases, not writing good comments, and, most importantly, not taking the time to design the software and APIs well. All of this is worth doing, but it takes a lot more time than just hacking together some code to test if an idea will work.

To answer your other question in science yes there is the equivalent of hacking something together just to see if the idea will work. It is very common to do a quick experiment to see if something will work before going back and doing all the controls and working out how robust the experimental conditions are. Science is hard and almost everything you do fails.

IMO code review as it is commonly done today focuses too much on small details and way too little on the general approach, algorithms and architectural soundness. We should talk more about the solution, not about the "code".

+1. We review pretty much all code at Climate Corp (even science code) but code review seldom reveals design flaws. I still think code review is incredibly useful. It makes all of us better coders. It just isn't everything.

Code reviews focus on small things because they occur after the code is written. Major design changes at that point are quite expensive.

I've worked on memory and performance optimizations on algorithms written in Java by scientists, and I'd definitely agree - code review is really needed. Some things are better handled while the design of the algorithm is taking place.

I love seeing the work of other coders. I learn from them. If I give a substantial review, they learn from me. If my review seems to be off, it gets challenged. I say yes: it's a good habit, and a useful one when fairly used.

Code reviews add a layer of bureaucracy and greatly slow down software development, because you have to worry about what others think about your code in addition to whether it actually works. Most comments are nit-picking about minor flaws at best. I think code reviews are beneficial for junior engineers, but should be avoided by everyone else.
