
Developers, confess why you don't TDD - lancerkind
https://confessionsofanagilecoach.blogspot.com/2017/08/developers-confess-why-you-dont-tdd.html?m=1
======
andreasgonewild
I'm sure the world learned a thing or two from the agile movement. But like
all movements it got twisted into semi-religious caricatures of the original
message, like this post.

Even Kent Beck admits to skipping tests when he feels like it; no big deal.

I hereby confess; I don't give a shit about icons, worship and obedience; I am
creating.

~~~
lancerkind
How was TDD presented/introduced to you?

~~~
andreasgonewild
I introduced myself to TDD/eXtreme programming long before anyone was talking
about it, read all the books, and preached the ideas. Since then I've had the
misfortune of seeing several fucked up interpretations in the wild, from
hip-shooting startups to ISO-certified Scrum shops; and they all manage to
miss the entire point of the exercise, which is applying common sense:
distributing authority and focusing on the important parts, not least
creativity. Agile is kind of like the Jeet Kune Do or Ving Tsun of software
development; since it contains almost no form in itself, it's very easy to
embrace and extend into whatever you fancy, and therefore very easy to
corrupt.

~~~
lancerkind
I'm curious how much time was spent trying to write test driven code before
deciding it wasn't working for you. Did you successfully write a few small
programs using TDD (coding katas)? Or did you apply TDD to your project at
work? For how many hours or lines of code did you try TDD before deciding it
wasn't going to work?

------
eesmith
TDD doesn't really seem to be useful, at least, not as clearly useful as its
proponents generally advocate.

That is, it's generally advocated as a clear win, but across the various
attempts to quantify it, the results are mixed at best.

For example,
[https://www.computer.org/csdl/trans/ts/2013/06/tts2013060835.html](https://www.computer.org/csdl/trans/ts/2013/06/tts2013060835.html)
says "This paper provides a systematic meta-analysis of 27 studies that
investigate the impact of Test-Driven Development (TDD) on external code
quality and productivity. The results indicate that, in general, TDD has a
small positive effect on quality but little to no discernible effect on
productivity."

It then goes on about subgroup analysis, but we must take that with a grain of
salt as subgroup analysis is more likely to have "statistically significant"
correlations due to random fluctuations.

When I try TDD on my own projects, I find that I lock in the internal API too
quickly, that is, I test against internal functions, because they are easy to
test. Then when I refactor, I find I avoid refactoring which might, for
example, merge a couple of functions into one or otherwise change that
internal API, because of the effort of rewriting the tests to meet the new
API.

I work best with "Spike and Stabilize", that is, a spike solution, flesh it
out, then once the API is mostly finished, develop the tests. I also use
coverage testing to help identify which tests are missing.

Side note: I hate it when TDD people say that TDD code naturally has 100%
coverage. It doesn't. At the process level, the refactor step can include
"substitute algorithm", but the new algorithm may have different edge cases
than the original algorithm, and so may require additional tests to ensure
correctness.

For example, you might replace a simple quicksort with timsort because it's
possible to construct a test case which drives quicksort to its quadratic
worst-case performance. But if the input has 63 elements or fewer, then only
the insertion sort part of timsort will be exercised. You need a very
different test suite to really exercise timsort, compared to what's needed to
test a simple quicksort.
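
To make the gap concrete, here's a minimal toy sketch (a hypothetical hybrid
sort, not CPython's actual timsort): any test with fewer than MIN_RUN
elements leaves merge() entirely untested, so a suite written for the old
quicksort can stay green while covering only half the new algorithm.

    MIN_RUN = 64

    def insertion_sort(items):
        for i in range(1, len(items)):
            j = i
            while j > 0 and items[j - 1] > items[j]:
                items[j - 1], items[j] = items[j], items[j - 1]
                j -= 1
        return items

    def merge(left, right):
        out = []
        while left and right:
            out.append(left.pop(0) if left[0] <= right[0] else right.pop(0))
        return out + left + right

    def hybrid_sort(items):
        if len(items) < MIN_RUN:
            return insertion_sort(items)   # a 63-element test stops here
        mid = len(items) // 2
        return merge(hybrid_sort(items[:mid]), hybrid_sort(items[mid:]))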

Or for a simpler example, performance analysis may say that the case n=1 takes
90% of the time, so rather than use the slower general-purpose algorithm, the
new strategy is to special-case n=1. However, deeper in the code there was
already special support for n=1, which is now no longer used. There's no way
TDD can identify this dead code.
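
A hypothetical sketch of that scenario: every test of handle() stays green,
yet the inner n=1 branch can never execute, and only a coverage tool (not
the red/green bar) will point it out.

    def fast_single(x):
        return [x * 2]                 # new special-cased fast path

    def handle(items):
        if len(items) == 1:            # special case added after profiling
            return fast_single(items[0])
        return general(items)

    def general(items):
        if len(items) == 1:            # old special case: now dead code,
            return fast_single(items[0])   # handle() already catches n=1
        return [x * 2 for x in items]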

Has there been any analysis of coverage analysis on TDD-based projects, where
coverage was never evaluated before, to see what the coverage actually is?

~~~
lancerkind
Regarding the coverage comment, you're right that code can contain
capabilities (or bugs) beyond what the TDD coder conceived. (If they weren't
conceived of, they could be bugs, or things that don't happen in practical
operation.) The TDD people are right in that if you follow the TDD process,
all your code comes up as covered by a code coverage tool. But this is only
code coverage, not all-possible-functionality coverage. I imagine "all
possible functionality coverage" is an intractable problem in the same class
as the Halting Problem. Branch/line coverage should be near 100% for TDD
code. (Some I/O coupling areas are intentionally saved for other types of
testing, as they aren't unit testable.) It's an easy thing to measure. It's a
common industry metric. It's not the whole picture, but it is something.
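
For anyone who wants to measure this on their own code base, a minimal
sketch using the coverage.py API (the module and test entry point names here
are placeholders):

    import coverage

    cov = coverage.Coverage(branch=True)   # track branches, not just lines
    cov.start()

    import mymodule                # hypothetical module under test
    mymodule.run_all_tests()       # hypothetical test entry point

    cov.stop()
    cov.report(show_missing=True)  # line/branch coverage, with missed lines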

As for measuring code coverage on TDD-created software that was built
without monitoring coverage, then using coverage at the end as an
evaluation: I have only a few anecdotal instances (two) where I did this, and I
found the code coverage to be well above 95% for the code base and at 100% for
modules that were pure logic. In one of those instances, after many rounds of
defect injection, I discovered one or two locations where another unit test
could be added.

All in all, I've been pretty happy with TDD. But who cares? I want to hear
from the industry why others aren't doing it. :-)

Thank you eesmith, I enjoyed the discussion!

How about hearing some more from others? What's stopping you from doing TDD?
Is it because you don't believe it? If you don't believe it, how was it
presented to you? Did you try it?

~~~
eesmith
"your right in that code can contain capabilities (or bugs) beyond the TDD
coded conceived"

I do not understand what you mean by my being right. I did not say that. As
you wrote it, it's a trivially true statement, and I didn't think what I wrote
was so trite.

I will try to reword it. In the "Red - Green - Refactor" style of TDD you are
allowed to refactor the code so long as the tests remain green. This assumes
there are sufficient tests to catch errors in the refactoring. However, one of
the allowed refactorings is "Substitute Algorithm"[0]. The new algorithm may
have corner cases which the old algorithm did not have, and so wasn't
originally tested.

R-G-R omits the part where you add new unit tests which are expected to be
green, but added in order to verify that the new algorithm really does work
correctly for new corner cases. It should be "Red-Green-Refactor-Yellow-
Green", where "Yellow" means "do the failure analysis to ensure that the
refactoring didn't add new failure cases." Yet TDD people don't teach that.
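
A hypothetical illustration: suppose find() is refactored from a linear scan
to binary search. Every old test stays green, yet the substituted algorithm
has corner cases the original never had, and nothing in R-G-R prompts you to
test them.

    def find(haystack, needle):
        # Was: a linear scan, which worked on unsorted input.
        # Now: binary search. Old tests (which happened to use sorted data)
        # stay green, but there are new corner cases: unsorted input now
        # silently fails, and duplicates may return a different index.
        # Those need new "yellow" tests.
        lo, hi = 0, len(haystack)
        while lo < hi:
            mid = (lo + hi) // 2
            if haystack[mid] < needle:
                lo = mid + 1
            else:
                hi = mid
        if lo < len(haystack) and haystack[lo] == needle:
            return lo
        return -1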

And I think the reason why is that this "Yellow" stage is the normal part of
non-TDD programming. If you drop "Red-Green" and just iterate over "Code-
Green-Yellow-Green" (write the code, make sure the old tests work, do the
failure analysis, and make the new tests work), you'll still end up with good
quality code.

This comes back to another aspect of TDD. Suppose that the programmers in a
company are in their 20s and 30s while middle management is in their 50s.
There is an asymmetry as middle management has much more experience in
applying pressure than the younger programmers. Managers, after all, have more
experience trying to manipulate people.

Some programmers in this situation use TDD as a tactic to resist external
pressure to release code with insufficient testing. "The code's done, but it
isn't fully tested." "It's done? Ship it, and we'll deal with the problems
later." With TDD this becomes "we're using TDD because <famous people> say
it's a best industry practice". This gives the programmers a way to
externalize the resistance to releasing poorly tested software.

However, it's also unprofessional. Some code can be lightly tested, perhaps
manually tested instead of a unit test suite. TDD shuts off the dialog about
how to make those trade-offs at the business level, in the name of either
technical purity or an unwillingness or inability to really negotiate.

"if you follow the TDD process, all your code comes up as covered by a code
coverage tool"

No, that is incorrect. I have given two examples of how TDD code can end up
with less than 100% statement coverage, much less 100% branch coverage.

Kent Beck wrote "TDD followed religiously should result in 100% statement
coverage." You backed that off a little with "should be near 100%". But my
point is that these are assertions of faith, which TDD proponents repeat
without evidence.

If they repeat this without evidence, what else are they repeating without
evidence?

I myself have written non-TDD code, then tests, then verified code coverage,
and ended with 98% coverage. (I wasn't able to test malloc() failures, though
I coded in handlers for them and manually exercised those code paths to make
sure they worked.)

[0] As Fowler originally described it, "Substitute Algorithm" is used "to
replace an algorithm with one that is clearer." However, if a new test appears
to hang, and analysis shows it will take several days to finish, then you may
need to use a more complex algorithm with better scaling performance. If this
isn't a refactoring, then what is it? Others, like
[https://refactoring.guru/substitute-algorithm](https://refactoring.guru/substitute-algorithm),
don't require that the new algorithm be simpler.

~~~
lancerkind
I'm interested in your test-later approach. When you're attempting 100%
branch coverage, do you sometimes find writing the tests later causes a
redesign of the production code? On average, how many days does it take to
complete a cycle of:
write the code and then write automated unit test code? What ratio of that
time is spent doing test automation versus writing production code?

~~~
eesmith
I rarely try for 100% branch coverage. I find that the extra tests aren't
worthwhile.

I don't even try for 100% statement coverage. I use coverage to help identify
important missing test cases.

For example, some of my code will read/write a binary format of my own
devising. The format is used in a "friendly" environment, so I don't have to
worry about malicious users trying to cause failures. (The code is in Python,
which eliminates a lot of the memory errors that a malicious user could
exploit.)

I could get 100% coverage by creating corrupted binary files in all of the
different ways to fail. But why go through the work of testing unrealistic
cases?

Instead, I use my experience and intuition to only test the realistic cases. I
also know that my code is not mission critical, so if one of my users reports
a failure, I'll send them a patch or point release.

I can't give you a breakdown of time because I don't track it. My usual
estimate is 1/3 programming, 1/3 testing, and 1/3 documentation, for
commercial-quality packages. I find that writing good documentation helps find
bugs that users are likely to experience.

(Now that I think of it, that binary format took several months to write. The
first two attempts didn't scale and I had to toss them. They had almost no
tests, so think of them as spikes.)

For consulting, it depends on the project. In one two-week project I had no
tests. The control flow was simple and most of the code was executed during
normal use. Plus, the support staff was able to maintain the software in case
of errors, and we did a thorough code review for knowledge transfer.

In another project, I was brought in to optimize some existing code, which
didn't have tests but which had been extensively tested manually. It was
presumed to be correct. The new code I wrote also didn't have unit tests.
Instead, I cross-compared a few million inputs to check that they gave the
same results, at about 10x performance.
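
A sketch of that cross-comparison (the implementation names are
placeholders): run the trusted old code and the optimized new code over the
same inputs and fail loudly on the first divergence.

    def cross_check(old_impl, new_impl, inputs):
        # old_impl: slow but presumed correct; new_impl: optimized rewrite
        for x in inputs:
            expected = old_impl(x)
            actual = new_impl(x)
            if expected != actual:
                raise AssertionError(
                    f"mismatch on {x!r}: {expected!r} != {actual!r}")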

In yet another project, I had unit tests for the user-facing parts, but not
for the main algorithm. Rather than explain the program, I'll use an analogy.
Consider a program to factor a number into its primes. This is difficult. On
the other hand, it's easy to test that the result is correct. Multiply the
numbers to see if they give the original number, and use a primality test to
ensure all of the factors are prime.

In essence, I have a #define which enables verification. This also slows
things down, so it's not enabled by default. I then (once again) processed a
few million inputs. This sort of integration testing is much more complete
than manual unit tests.

(I do have a simple unit test used as a burn test, so it's not completely
untested.)
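
A Python analogue of that #define-guarded check (factor_impl stands in for
the hard algorithm, which is assumed elsewhere): verifying a factorization
is far cheaper than producing one.

    VERIFY = False   # off by default, like the #define, since it's slow

    def is_prime(n):
        if n < 2:
            return False
        d = 2
        while d * d <= n:
            if n % d == 0:
                return False
            d += 1
        return True

    def checked_factor(n, factor_impl):
        factors = factor_impl(n)         # the difficult part
        if VERIFY:
            product = 1
            for f in factors:
                product *= f
            assert product == n, f"factors of {n} don't multiply back"
            assert all(is_prime(f) for f in factors), f"non-prime factor of {n}"
        return factors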

For an extreme case, see
[https://randomascii.wordpress.com/2014/01/27/theres-only-four-billion-floatsso-test-them-all/](https://randomascii.wordpress.com/2014/01/27/theres-only-four-billion-floatsso-test-them-all/).
Why write unit tests when it's possible to examine all 2^32 possibilities in
90 seconds?
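
The idea, sketched in Python (the article's C++ does the full sweep in about
90 seconds; pure Python is orders of magnitude slower, so treat this as an
illustration of the approach, not the tool):

    import math
    import struct

    def all_float32():
        # enumerate every 32-bit float by reinterpreting each bit pattern
        for bits in range(2 ** 32):
            (x,) = struct.unpack('<f', struct.pack('<I', bits))
            yield x

    def exhaustive_check(fast_fn, reference_fn):
        for x in all_float32():
            a, b = fast_fn(x), reference_fn(x)
            if a != b and not (math.isnan(a) and math.isnan(b)):
                raise AssertionError(f"mismatch at {x!r}: {a!r} != {b!r}")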

~~~
lancerkind
Thank you for that. I've got the idea now. In these projects, are you
responsible for the maintenance and later code changes? Or are these
projects one-time deliveries where someone else handles things from then on?

Also, if you don't mind my asking, how many years of experience did you have
in working in the IT industry when first trying your hand at TDD?

~~~
eesmith
I don't think that's an informative question, because the answer is "yes."

Which cases? For software I sell, I am responsible.

For software I write as a contractor/consultant, it depends on the contract.
After all, I'm not going to maintain it forever, for free.

For the two week contract I said I would fix things for 3 months [0], if it
was something I should have caught during development. I don't recall them
asking for any changes, but like I said, we did a code review at the end to
get the two local people up to speed with what it does, and they are good
people. The project used a combination of technologies which they hadn't used
before.

The "yet another project" was a series of work orders across 18 months, where
I did two major rewrites to handle new capabilities. Someone local was also
reviewing the code, doing validation, and making a few patches. Nearly all of
it was me, including maintenance and code changes. I've done several projects
for that company for about 5 years.

Again, I'll fix errors that should have been caught during development. On
that last project, which finished in February or so, there was a bug report
two weeks ago which I investigated and suggested two possible fixes. They
chose one and implemented it themselves.

I started programming in 1983 and got my first full-time job in the IT
industry in 1995. I didn't try TDD until around 2007 or so. Our local
programming user's group does code katas. The organizer is an Agile/XP/TDD
facilitator and testing advocate, so I've also learned and had practice that
way. Also, she was the project lead for a project where I was involved as a
contractor, so when we pair programmed [1] we did it as TDD.

[0] Their standard contract said 2 years, which I said was extreme for a two
week contract, but if they wanted that then I would adjust my rates
accordingly. We settled on 3 months.

[1] This wasn't often. I mostly worked on a back-end server component, while
she worked on front-end and middleware.

~~~
lancerkind
Some languages and platforms are easier to do TDD in than, say, proprietary
computing "appliances" in proprietary languages. You've mentioned
Python, which is a pretty flexible language and conducive to doing TDD. Do you
do most of your project work in Python?

You mentioned getting bug fix requests. How many times a year do requests
like this interrupt your other project work? On average, how much of your
effort does servicing such a request take (research, meeting over it,
planning a fix, then making the agreed-upon fix, testing it, and deploying
it)? Hours, days, ...?

~~~
eesmith
I'm tired of this conversation. I don't see the point of your questions, and I
think I've explained my "confession" quite well enough.

I rarely get bug reports concerning the software I delivered. The fixes almost
always take only a few hours.

