
Why code review beats testing: evidence from decades of programming research - kevinburke
http://kev.inburke.com/kevin/the-best-ways-to-find-bugs-in-your-code/
======
bankim
At my previous job at VMware I found code reviews to be very useful. Apart
from finding bugs at an early stage, as pointed out by this article:

\- Code reviews are an effective means of knowledge transfer among engineers.

\- It makes engineers more conscientious while writing code, since the patch
will be read by others, and hence less likely to cut corners or put in hacks.

\- Since the patch must be understood by others, the author needs to ensure
the code and the change/commit description are well documented.

\- It lets your co-workers/manager know about the amount of progress made on a
task.

~~~
dekz
Completely agree with the above but would add:

\- Provides a mechanism for developers to receive feedback to promote learning
and improve code quality.

------
100k
I gave a talk a couple of years ago on this topic (I read Code Complete and
some of the same papers referenced). My goal was to get people to think more
about doing things _in addition to_ developer testing to find problems.
Personally, I've found (cheap, informal) usability tests to be pretty amazing
at finding fundamental problems with software -- the kind of stuff that
doesn't show up in a unit test. After all, 100% test coverage can't tell you
if your app sucks.

<http://www.infoq.com/presentations/francl-testing-overrated>

<http://railspikes.com/2008/7/11/testing-is-overrated>

[http://www.scribd.com/doc/8585284/Testing-is-Overrated-Hando...](http://www.scribd.com/doc/8585284/Testing-is-Overrated-Handout)

------
tarekayna
Comparing code reviews against software testing is a bit odd to me. They are
both necessary in a professional software environment.

Code reviews are great for ensuring a consistent code base that follows a
certain coding standard. They are also great for passing on knowledge between
developers (avoiding common pitfalls, better ways of doing things, etc.).

Testing is such a big field that it is hard to compare directly with reviewing
code. Manual tests help verify the user experience and end-to-end scenarios,
especially for UI (something that code reviews can't do). Automated
functionality tests are great for verifying daily builds and catching
regressions (which is also very hard to do in code reviews of fairly
complicated systems). Various other types of automated tests are necessary for
different considerations: performance tests, security tests, etc.

~~~
narag
The first paragraph of the article says exactly that: you need to invest in
more than one of testing, reviews and QA, and different techniques may detect
different types of errors.

------
jbrechtel
Given how much more popular automated testing and TDD have become in the past
7 years since that book was published (and much longer for some of the sources
cited for those data), I'd be very surprised if we haven't gotten much better
at that particular form of testing.

Also I'm quite curious how you can actually count the number of bugs found
from each of these methods. Test-after unit testing would even be preventative
if you wrote tests immediately after writing the code. If doing that made you
think about more edge cases and thus fix up the code then it seems like those
bugs would not get counted.

Seems like many of these would have different degrees of preventative effect
that would be difficult to measure... especially if you wanted to isolate
their effects from each other.

~~~
kevinburke
As I understand the paper, the researchers asked programmers to use certain
techniques to debug a program with known bugs, then measured the number of
bugs they caught against the known total.

In addition, they went to larger companies and checked which bug-detection
techniques they were using and how well each one detected bugs.

I have the gated PDF, email me if you want a copy.

------
rgraham
To me, unit testing is a way to prevent regressions and try to enforce
contracts. Because the same developer usually writes the tests and the code,
they have 'the curse of knowledge' and seem to me unlikely to find a lot of
bugs this way.

That said, preventing regression is tremendously important. Especially after
some turnover on who works on the code.

------
godarderik
"It’s well known that it’s more expensive to find bugs early on in program
development."

I think that's a little backwards, shouldn't it say less expensive?

~~~
kevinburke
Yes it should, nice catch, it's fixed now.

~~~
currere
What I like about this is that by writing it the wrong way round you have
collected plenty of evidence that your intended meaning is indeed well known!

------
modeless
Number of bugs found seems to me a poor metric. Some bugs are much (much) more
important than others, and different bug-finding methods will find different
kinds of bugs.

~~~
kevinburke
There have been a number of papers showing that total number of bugs is a
pretty good heuristic for code quality.

I agree that it's hard to tell which is picking out more important bugs. My
guess is that less prominent papers by the same authors get into the topic in
more detail.

------
discreteevent
Be careful about the statistics that are quoted. If they are from Code
Complete, it may be the case that the code that was reviewed was C or C++. In
my experience formal code inspection does catch a fair number of bugs in C or
C++, especially where the developer is not that experienced, but this is much
less the case with other languages.

~~~
kevinburke
It may be even worse than that... one study was using FORTRAN.

------
melling
Modeling or prototyping scored high and sounds interesting. For my Android and
iPhone apps, I've started to add some unit tests in the hope of getting some
benefit months down the road. I'm doing all the development myself and I'd
like to reduce the testing effort before each release.

Anyway, has anyone applied "modeling" or "prototyping" to mobile development?

~~~
kevinburke
To be perfectly honest I'm just quoting Steve McConnell there and I have no
idea what those terms mean. I would appreciate it if someone could provide
some insight.

~~~
zoul
Problem definition is a part of the problem solution, especially in software
design. To prototype is to partially solve a problem to find a better
definition of it. I guess modelling is simply a synonym or a slightly
different approach with the same results.

------
tylerneylon
Most of this content is already in Code Complete.

There are two anti-bug ideas I strongly agree with: that bugs are found by use
of the code, whether through testing or actual use; and that great, unhindered
programmers tend to write solid code. It's very logical that bugs will be
found by usage - this is basically the test-driven reasoning. As for
programmer quality, the claim is also logical, but it is underemphasized in
the article.

I distrust articles that emphasize one-size-fits-all processes for software
development. In the end, there's talent and something like culture. Government
agencies that can't go out of business follow policy, and provide results just
above the legally required standards. Old companies with seniority-based
hierarchies feel similar to code monkeys. A small company with respectful,
productive, and creative builders is ripe for a culture of sincerely-desired
quality. This is what I mean by the importance of culture over process.

~~~
HeyLaughingBoy
I agree wholeheartedly about the importance of culture over process. I live in
a process-heavy environment, but I still think that culture matters more.

Where process definitely helps is in finding those bugs that _can't_ be found
"by use of the code." i.e., the bug will be experienced by the user, but it
may be invisible to the developer. In just the last year, we've uncovered two
nasty race conditions that have been in our codebase for almost a decade, but
only just started to show up because some unrelated code changed the timing of
certain behaviors. Even having knowledge of the problem (thank heavens for
good log files!!), we could not design tests to verify that the bugs were
fixed: the windows of opportunity were just too small. Code inspection was the
only way to verify that the fix matched the problem.
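
For illustration only (entirely hypothetical code, not from any real
codebase), the classic shape of such a bug is a non-atomic read-modify-write
whose window is so narrow that a test can pass by sheer luck:

```java
// Hypothetical sketch: a lost-update race with a tiny window.
public class Counter {
    private int count = 0;

    // BUG: read-modify-write is not atomic. Two threads can both
    // read the same value and each write back value + 1, losing
    // one increment -- obvious on inspection, rare under test.
    public void increment() {
        count = count + 1;
    }

    public int get() {
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        Counter c = new Counter();
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 100000; j++) {
                    c.increment();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        // Nondeterministic: often less than the 800000 increments
        // performed, but any single run may happen to pass.
        System.out.println(c.get());
    }
}
```

A single run can come out correct by chance, which is exactly why a bug like
this survives testing and has to be caught by reading the code.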

------
GlennS
There's an assumption that there is a formal specification, and that it itself
has no bugs in it:

'Giving programmers a specification, measuring how long it took them to write
the code, and how many bugs existed in the code base'

------
shaggyfrog
I'm on my iPhone and so the material is a little hard to read, but to my eye
the chart separates all testing into four segments and has just one for formal
code review. The thesis relies on this graph to make its point. It seems to me
this is incorrect; rather, code review beats the best _single_ method of
testing. In aggregate, testing is hard to beat.

Like another poster said, this was just linkbait. Would rather see a more
reasoned analysis.

------
Volpe
The headline is pure link bait. The article concludes that you should do more
than 'just testing'.

It does make one statement about code review being faster at finding bugs
than testing. But it discounts TDD (i.e. it looks at programs that already
have bugs, and times how long it takes the programmer to find them).

So code reviews are faster at finding bugs than testing, as long as you
discount a number of testing techniques.

------
Tommabeeng
It should be noted that code reviews, though useful for catching defects, are
incredibly expensive.

I think there are cheaper ways -- along the lines of automated testing and
_design_ reviews in lieu of code reviews -- to reduce risk and defects, and
obtain high quality software, than to spend such massive time/$ on code
reviews.

~~~
HeyLaughingBoy
They are, but having users find them is even more so.

We use design reviews, code reviews, unit tests, integration tests and final
validation tests. Our bug rate is well below 1/kloc, but bugs still get out
the door.

In the end it depends on how you calculate Cost of Quality. In some
environments, having a customer experience a bug can have disastrous
consequences, in others it's not a big deal at all. We're in the former
category :-(

------
Semiapies
I'm a little amazed by an "X beats Y" post showing up that's thoughtful, well-
argued, and acknowledges value in both X and Y. Or one that goes beyond X and
Y partisanship to _what is the actual goal, here_.

------
ars
I think this chart has cause and effect backwards. Places with programmers
capable of doing formal code reviews are less likely to have bugs in the first
place.

It's really hard to do a (proper) code review; a programmer who is capable of
it is probably a high-caliber programmer (I assume programmers review each
other's code).

Also I don't like this sentence: "code reading detected 80 percent more faults
per hour than testing".

If you look at the article that sentence links to, you will find that the
problem is poor technique in creating the tests. You can't do "white box"
testing without reading and understanding the code in the first place. And if
their "black box" testing didn't find bugs, they didn't write the test
properly in the first place.

~~~
DarkShikari
_And if their "black box" testing didn't find bugs, they didn't write the test
properly in the first place._

That's practically begging the question: a test isn't a real test unless it
catches all possible bugs? Really, any code for which such a magical test can
be written is probably so simple that it only exists inside Fizzbuzz
interviews.

Years of experience have consistently shown me that 10 minutes of looking over
code by another developer will find more bugs than days of testing. A huge
number of bugs are obvious to the eye, but have high odds of escaping tests.
Doubly so in code where there's no "right" output to test for: as in the case
of _any_ non-optimal optimization function.

This doesn't mean tests are bad. Tests are good for the sort of code that's
easily testable, and large-scale regression testing is needed for any project,
even if unit tests aren't feasible. But testing is no substitute for having
people read the bloody code.

~~~
felipemnoa
>>10 minutes of looking over code by another developer will find more bugs
than days of testing

I find this surprising, unless you are talking about new developers. Automated
test cases should be more than unit test cases. They should include system-
wide test cases as much as possible, e.g. for a game you would write test
cases on the physics engine. The physics engine includes collision detection
and collision response. Of course, you would also have test cases for each of
those modules independently.
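
For illustration only (a toy engine, all names hypothetical), a system-wide
test of that shape drives detection and response together rather than each
module alone:

```java
// Hypothetical sketch: exercise the whole physics step end to end.
public class PhysicsSystemTest {
    // Toy "engine": two equal-mass circles of radius r moving on a
    // line; detection checks overlap, response swaps velocities.
    static double[] step(double x1, double v1, double x2, double v2,
                         double r) {
        if (Math.abs(x2 - x1) <= 2 * r) { // collision detection
            double t = v1;                // collision response:
            v1 = v2;                      // equal masses swap
            v2 = t;                       // velocities
        }
        return new double[] { x1 + v1, v1, x2 + v2, v2 };
    }

    public static void main(String[] args) {
        // System-level check: circles approaching head-on should
        // bounce apart, exercising detection AND response at once.
        double[] s = step(0.0, 1.0, 1.5, -1.0, 1.0);
        System.out.println(s[1] == -1.0 && s[3] == 1.0
                ? "system test passed"
                : "system test FAILED");
    }
}
```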

~~~
nl
in Code:

    static final String HELLO = "helo wolrd";
    ...
    out.println(HELLO);

in test:

    input = in.readln(); // yeah massively simplified, I know
    assertEqual(Code.HELLO, input);

pass.

But a developer will (should!) find the error immediately.

~~~
boyter
That's a bad test, though. It's not brittle, but assuming it was written in a
TDD manner there should have been at least two chances for the developer to
catch the bug, had they not used the constant they are testing in the test
itself.

I have no source on this, but isn't it bad practice to use constants in your
tests? I personally use a new string with the expected value. It forces me to
look twice and really consider what I am doing.
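
For illustration (hypothetical names), the literal-value approach next to the
constant-based comparison that can never fail:

```java
// Hypothetical sketch: compare against a hand-typed literal, not
// the constant under test, so the typo can actually fail the test.
public class HelloTest {
    static final String HELLO = "helo wolrd"; // same typo as above

    public static void main(String[] args) {
        // Comparing the constant to itself can never fail:
        System.out.println(HELLO.equals(HELLO));         // true
        // Comparing to a fresh literal catches the typo:
        System.out.println("hello world".equals(HELLO)); // false
    }
}
```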

~~~
nl
It's a comment on Hacker News illustrating a point, not a guide to writing
test cases.

The point is that it is possible to have good test coverage and _still_ have
bugs that are best found by humans.

------
Mc_Big_G
TDD while pairing is even better.

------
jen_h
Headline is deceiving. Article confirms conventional wisdom: a good product
requires a testing _sieve_.

