
The Cost of Skipping TDD and Code Reviews (2016) - nreece
https://medium.com/javascript-scene/the-outrageous-cost-of-skipping-tdd-code-reviews-57887064c412
======
Morendil
I'm a big fan of TDD, but I quit reading this article the second my eyes came
across "IBM System Sciences Institute" and the chart that accompanies it.

This is by now one of the most thoroughly debunked memes in our profession. I
wrote a couple of chapters about it:
[http://leanpub.com/leprechauns](http://leanpub.com/leprechauns). I also wrote
a handy little guide for people to know just how much BS was involved in any
one citation:
[https://plus.google.com/+LaurentBossavit/posts/aNKut1QV8pT](https://plus.google.com/+LaurentBossavit/posts/aNKut1QV8pT)

If that wasn't enough, there is now an actual negative research result on the
so-called defect cost increase:
[https://arxiv.org/pdf/1609.04886.pdf](https://arxiv.org/pdf/1609.04886.pdf)

It should be quite clear by now that this type of argument doesn't do TDD any
favors; it taints it by association with intellectual dishonesty.

~~~
mannykannot
I hung on a bit longer, but lost any hope of seeing something relevant when he
launched into an imaginary case study. Later on, we find him mentioning an
actual study (of code reviews), only to dismiss its results as being
implausible to him.

The author appears to have drunk so deeply of the dogma that he has no idea
how to make an objective case.

~~~
whipoodle
I thought it was funny that he waved off the code review study as well. If the
studies don't actually mean anything, then why can't we similarly disregard
the one that says TDD is good?

------
PhilWright
A multiple of 100x the cost to fix a bug in production compared to development
is very context-dependent. These examples always assume you're working on a
large project with many developers and complex testing and deployment
processes.

But there are many developers, like myself, who work on relatively small
projects in small teams. The multiple is more like 5x or less in these
scenarios. My current project is developed by two of us and deployed to just a
couple of dozen users inside our company. I can walk to a user's computer, be
shown the bug, fix and test it on my machine, and deploy it in minutes. Do I
need TDD and code reviews? Will they pay back all the extra time? I would
argue not. One size does not fit all. Sometimes they are appropriate, but
sometimes not. I hate this type of article, claiming its way should be adopted
by everyone all the time. Context is everything!

~~~
cjsuk
Personally, I have found that small teams in small companies are more
vulnerable to critical failures. That doesn't mean the TDD mantra needs to be
followed word for word, but covering your core functionality with test cases
allows you to manage risk, and getting another set of eyes on the problem
tends to pick up obvious issues you are blinkered to. It also enables large
changes to take place effectively, which is a big problem when you have very
few people working on a project. Without coverage you end up with something
people are afraid to change, which means every fix snowballs into a muddy
mess, because the shortest path is taken rather than the most architecturally
sound solution.

While defects happen, it's massively more expensive to handle them with the
end user even if they are in the same office. You're costing their time as
well as yours.

I'm middle of the road but I'd rather sleep easy knowing my ass is covered at
least.

This experience comes from watching a company fail miserably due to its
internal software failing. In the end it required a rewrite, which the company
couldn't foot the bill for, so it spiralled into decline. A massive cock-up
and a bad failure mode, but a real one, I'm afraid.

------
zxcmx
Not buying the dogma.

OK, having tests and code reviews, sure, but spec-first has generally beaten
test-first in my experience.

Not "design the whole thing waterfall-style" specs, but just a wide-ranging
discussion of what the thing is more or less supposed to do. Why? Because it's
not just you! The way it works is about your team, the business, and the
context of that code. And 5 bazillion test cases aren't actually the best way
to communicate this to all the stakeholders involved.

~~~
knocte
The most unambiguous way to define a spec is actually to write tests.

~~~
eesmith
I disagree. Consider this counter-example: "This function takes a float64 'x'
where 1.0d <= x <= 100000.0d and returns a float64 'y' which is the square
root of 'x' to within 10 ulp."

Tests can verify that the implementation is correct for certain inputs. If
done incorrectly, they can even over-specify the spec, like asserting that
sqrt(16.0d) exactly equals 4.0d when it should also allow 4.000000000000001.
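
(For illustration, a rough sketch in Python of a tolerance-aware assertion;
'assert_within_ulps' is made up, and 'my_sqrt' stands in for whatever
implementation is under test:)

    import math

    def assert_within_ulps(actual, expected, max_ulps=10):
        # Made-up helper: passes when 'actual' is within 'max_ulps'
        # units-in-the-last-place of 'expected'.
        assert abs(actual - expected) <= max_ulps * math.ulp(expected)

    # Exact equality over-specifies; this also accepts 4.000000000000001.
    assert_within_ulps(my_sqrt(16.0), 4.0)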

But I don't see how they are less ambiguous or why they are better.

(As a real-world example, look to the Pentium FDIV bug.)

~~~
Ace17
Tests are _formal_, which means there's a lot less room for ambiguity, and
that you can check an implementation at nearly zero cost.

I'm sure you can guess what the following function does by only looking at the
following (incomplete) test suite:

    assertEquals([], f([], []));
    assertEquals([1,2,3,4], f([1,2,3,4], []));
    assertEquals([1,2,3,4,5,6,7,8], f([1,2,3,4], [5,6,7,8]));
    assertEquals([1,2,3,4,5,6,7,8], f([1,3,5,7], [2,4,6,8]));
    assertEquals([1,2,3,4,5], f([1,2,3,5], [4]));

... now, imagine the same test suite, but with the name 'f' replaced with
'mergeSortedLists'.

By the way, I don't get how your example actually proves your point: you're
basically saying that tests "done incorrectly" could be harmful. So what?

~~~
eesmith
It appears that your f is:

    def f(x, y):
        return sorted(x + y)

However, it could also be the more memory-efficient in-place version:

    def f(x, y):
        x.extend(y)
        x.sort()
        return x

It could even be:

    def f(x, y):
        return list(set(x).union(y))

... and it turns out I was wrong. It's:

    import heapq

    def f(x, y):
        return list(heapq.merge(x, y))

I only figured that out when I read the name "mergeSortedLists". It's the text
description which clued me in to what it was supposed to be, not the test
cases.

I'm saying that test cases only verify specific data points. They don't define
what happens across the entire range of inputs. I'm sure you _can't_ guess
what the following tests verify:

    assertEquals(f(1), 1)
    assertEquals(f(2), 2)
    assertEquals(f(3), 3)
    assertEquals(f(4), 4)
    assertEquals(f(5), 5)
    assertEquals(f(6), 6)
    assertEquals(f(7), 7)
    assertEquals(f(8), 8)
    assertEquals(f(9), 9)
    assertEquals(f(10), 10)
    assertEquals(f(11), 11)
    assertEquals(f(12), 12)
    assertEquals(f(13), 14)

Now replace this exemplar-based specification with "f(i) returns the i^th
11-smooth number: numbers whose prime divisors are all <= 11; i >= 1".

What test suite gives a better definition of what f is supposed to do than the
one-line text-based specification?

(The OEIS says those tests could also be the divisors of 27720, the "Paradigm
Shift Sequence for a (-4,5) production scheme with replacement", "Numbers all
of whose prime factors are palindromes", or even the positive "Numbers in
decimal representation, such that in German their digits are in alphabetic
order", and more.)

------
seanwilson
I'd like better stats on why you should write tests first. You can't test
everything anyway, and writing all your tests first eats up time when your
architecture and code are still in a high state of flux, because you have to
keep refactoring your tests.

I've worked on TDD projects before that had huge numbers of tests for the
smallest pieces of behaviour, and refactoring them can be such a hassle that
you sometimes avoid it (though to be fair, you knew when you broke something).
The cost of writing some tests sometimes isn't worth it either, once you weigh
up how likely those tests are to catch a bug that wouldn't be caught by an
integration test or during normal QA, and the impact of that bug.

Strong typing to cut down on writing tests is also much preferable to me when
it's an option.
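
(A rough sketch of the idea in Python, with a made-up function; a checker like
mypy rejects the bad call before any test runs:)

    def apply_discount(price: float, percent: float) -> float:
        # Made-up example: apply a percentage discount to a price.
        return price * (1 - percent / 100)

    # A type checker flags this call without a single unit test:
    #   apply_discount("100", 10)   # rejected: "100" is not a float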

~~~
ygra
I've found TDD to be useful when you're implementing an interface outside your
control, so the interface you test against is fixed and so are its semantics.
This basically means that the only thing in flux is the internal
implementation, and it's easy to come up with a fairly complete set of tests
beforehand.

In other cases I pretty much write my tests more or less randomly before, at
the same time, or after the implementation.

~~~
seanwilson
> In other cases I pretty much write my tests more or less randomly before, at
> the same time, or after the implementation.

Similar most of the time. Once I know the interfaces have settled down and the
problem I'm solving is well understood, it feels like a better time to write
tests. Once you have a feel for which parts are most error-prone, you can
concentrate on the tests that have the most benefit.

------
heisenbit
Of course catching bugs early is much, much better, and well-executed code
reviews and TDD are the way to go.

But what do you do if your team struggles to get the basics right? When asking
them to write tests results in doubling the code base with trivial module
tests that find no bugs and only prevent the all-too-necessary refactoring?
When it slows the team down so much that it has to be scaled up? What if the
software is not intended to live long?

What if manual tests are the fastest and most reliable way to establish
feedback?

~~~
clu3l355
Sure, if you run manual tests once they might be faster than implementing
automated tests, but an automated test suite protects against regressions.
Having to do a full manual test pass for every change is going to consume a
lot of time.

------
fulafel
"According to the IBM System Sciences Institute, fixing a production bug costs
100x more than fixing a bug at design time, and over 15x more than fixing a
bug at implementation time"

For most common software bugs, a design-time fix would imply a significantly
heavier, verification-based design and specification process. Comparing this
cost to implementation/production bug fixes doesn't seem straightforward.

~~~
keithnz
Unfortunately there seems to be no reference to the study and how it reached
its conclusion. I know that in the embedded world the factor can be a lot
higher; in the web world, it can be a lot lower.

Also, this article multiplies a number of stats together, but it doesn't look
to me like the stats referenced are super solid. In fact, it seems the stats
are all over the place trying to measure the benefit of TDD, and live in the
realm of "That's really interesting, further research needed".

~~~
fulafel
Googling more, the same vaguely sourced "IBM System Sciences" figure seems to
be passed on from one poorly researched web post to another. The trail went
cold, but it seems to date back to the mainframe IBM of the 1970s or 1980s.
Certainly it's from a different software engineering culture than today's
continuous delivery processes.

A promising looking current review of the question is here:
[https://link.springer.com/article/10.1007/s10664-016-9469-x](https://link.springer.com/article/10.1007/s10664-016-9469-x)
(PDF link:
[https://arxiv.org/pdf/1609.04886.pdf](https://arxiv.org/pdf/1609.04886.pdf))

They conclude: "We checked for traces of this effect in 171 projects from the
period 2006–2014. That data held no trace of the delayed issue effect. To the
best of our knowledge, this paper is the largest study of this effect yet
performed." ... "Our results beg the question: why does the delayed issue
effect persist as a truism in software engineering literature? No doubt the
original evidence was compelling at the time, but much has changed in the
realm of software development in the subsequent 40 years."

------
Ensorceled
I've seen productivity gains from unit testing and no gains when adding TDD to
a process and team already invested in unit testing. Anecdotally, in my
experience TDD crystallizes sub-optimal implementations early on, especially
for junior developers. Unit testing allows fearless refactoring but TDD makes
developers less likely to refactor in the early stages where it's most
beneficial.

Gains from code reviews are hard to quantify because people code differently
when they know they are being code reviewed. Most of the gains come from
simply announcing that code will be formally peer reviewed.

Critical bugs can cost 100x to fix, since the out-of-band release costs can be
so high. Non-critical bugs just get added to the next sprint, and the extra
cost is in the support costs, if any.

------
jorgeleo
Lack of understanding of the business model produces bugs that are much more
costly to fix, and TDD cannot catch those.

TDD has a place, and finding bugs in production can range from annoying to
flat-out dangerous depending on the system, but the primary reason for failed
IT projects is not bugs; it is lack of soft skills and of proper business
communication. Check
[https://www.cio.com/article/3211485/project-management/why-it-projects-still-fail.html](https://www.cio.com/article/3211485/project-management/why-it-projects-still-fail.html)
and see the root of the problems, and notice that the proposed solutions do
not solve the root problems; instead they keep tooting dogmas.

------
ivanhoe
I don't understand why people keep painting it black or white, as if you
either use TDD or write no tests at all. You can implement a feature and then
write a test afterwards to check that it works as planned. It'll work just as
well as TDD. IMHO the main benefit of tests comes when refactoring at a later
time, while during the initial prototyping and development process they're
usually just telling you the obvious (you didn't return the expected value...
yeah, I know, I didn't finish the freakin' method yet, give me a break...)

~~~
eesmith
Because if you are for a certain approach, it's easier to put all the
alternatives in the same category, pick one as the antithesis, show that your
approach is better, and conclude that everything which isn't your approach is
worse.

You'll see that it's mostly TDD people who treat things as either "test first"
or "test last", where "test last" occurs after _all_ development.

As another example, agile people say there's "agile" and "waterfall", even
though there are many software development approaches other than those two.

~~~
marcosdumay
Or, in other words, those people are shooting for political dominance and
don't care about truth.

I am quite sure it is an unconscious bias, but it's tiring to keep hearing
those dishonest arguments everywhere.

------
rockostrich
Assuming the bug is just a logical one, if the time to fix it in production is
15 hours, then there is a serious problem with your team's release workflow.
He estimates 1 hour to fix it while developing, so that leaves 14 hours of
code review, building, and deploying. If it takes you an hour to find and fix
the bug, then I would assume it takes 30-60 minutes for people to review and
OK the fix (if it's an urgent bug, this shouldn't even take that long). Once
it's merged, if it takes longer than an hour to build/deploy the code then I'm
really sorry for you. Maybe I've just been lucky to have worked on teams where
CI/CD was very important.

------
ljf
Really liking and pushing for a trial of ATDD in my current role - Acceptance
Test-Driven Development:

[https://en.wikipedia.org/wiki/Acceptance_test%E2%80%93driven...](https://en.wikipedia.org/wiki/Acceptance_test%E2%80%93driven_development)

------
a_imho
[https://news.ycombinator.com/item?id=3033446](https://news.ycombinator.com/item?id=3033446)

------
trapperkeeper74
Untested code... even if you're writing a kernel, you've got to find a way to
mock/stub/double/fake your way into unit testing, smoke testing, and
integration testing. Cucumber isn't a requirement, but having appropriate
layers of assurance that production code is correct is fundamental to
sustainable software engineering.
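
(A minimal sketch of that idea in Python, with made-up names; the fake stands
in for hardware or kernel state you can't touch in a unit test:)

    from unittest import mock

    def read_sensor(bus):
        # Made-up production code: read a byte from a device on a bus.
        return bus.read(0x48)

    def test_read_sensor():
        fake_bus = mock.Mock()
        fake_bus.read.return_value = 42
        assert read_sensor(fake_bus) == 42
        fake_bus.read.assert_called_once_with(0x48)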

------
niahmiah
Using something like [https://flow.org](https://flow.org) goes a long way too.

