
When TDD Doesn't Work - jodosha
http://blog.8thlight.com/uncle-bob/2014/04/30/When-tdd-does-not-work.html
======
Nursie
IMHO TDD, like a lot of the agile stuff, is a good idea with solid foundations
that people get wrong all the time and end up making things worse with.

Agile was supposed to ease up on process and make teams adapt to changing
requirements. It wasn't supposed to use up >30% of your working time just to
service the methodology, but that's what it ends up doing when you get in the
Agile Evangelists.

TDD was supposed to ensure more correct software at the cost of some overhead
(perhaps 30%?) by making sure every unit had its tests written ahead of the
code. In practice I've seen it kill productivity entirely as people write test
harnesses, dummy systems and frameworks galore, and never produce anything.

A combination of these two approaches recently cost an entire team (30+
people) their jobs, as they produced almost nothing for almost a year, despite
being busy and ostensibly working hard all year. We kept one guy to deal with
some of the stuff they left behind and do some new development. When asked for
an estimate to do a trivial change he gave a massive timescale and then
explained that 'in the gateway team we like to write extensive tests before
the code'.

The only response we had for him was 'and do you see the rest of the gateway
team here now?'

~~~
TelmoMenezes
> IMHO TDD, like a lot of the agile stuff, is a good idea with solid
> foundations that people get wrong all the time and end up making things
> worse with.

So one could reasonably suspect that people who "get agile" are just talented
and would be good developers anyway. Occam's razor invites us to assume that
agile has no effect. Are there any scientific studies on the effectiveness of
agile (or TDD), or is this just a homeopathy situation?

~~~
hox
Nonsense. I'm not an advocate of heavyweight process by any means, but a
number of principles of the agile movement can benefit developers of any skill
level or experience. More importantly, they can help a team collectively more
than they can help any individual. The trick is applying these principles
carefully, which almost every agile leader I've met fails to do.

I've never seen anything wrong with more communication between stakeholders
and adapting a solution to meet their ever-changing needs.

~~~
TelmoMenezes
> I've never seen anything wrong with more communication between stakeholders
> and adapting a solution to meet their ever-changing needs.

Surely there's a point where more communication starts being detrimental.
Reducing it to the absurd: if you spend 100% of the time communicating then
you have no time left to actually build the thing. So there's a trade-off, as
usual.

It could also be argued that ease of communication to deal with "ever-changing
needs" encourages more superficial requirements and less deep thinking about
the actual problem, leading to wasted effort and lower quality results.

Maybe you are right, maybe the above paragraph is right. I don't know and
neither do you. Replying "nonsense" is not really an argument.

------
yanowitz
Except for this statement, beware of absolutism in statements of How To Do
Software Development.

The specifics of this debate are kind of uninteresting because of the
(general) lack of nuance from various sides, albeit all informed by their own
lived experience.

OTOH, the recurrent reality of <insert topic> debate in our industry _is_ very
interesting.

I think it's some combination of:

* a bunch of problems are still unsolved

* software is so powerful that sub-optimal solutions are usually Good Enough

* industry amnesia, driven by developer/engineer turnover

* the relative infancy of the industry, especially as a function of the rate of change (I'm not sure how you would normalize for rate-of-change, social structure and communication speed, but it would be interesting to compare these debates to medieval guilds in Europe).

* ???

To take up the first two above:

Things are better than they used to be -- as late as the 90s, code reuse was
still an unsolved problem. Of course, code quality is still hard--we are
reusing broken code, but at least we "only" have to fix it once.

I think it's hard to overestimate the importance of Good Enough as a factor in
these recurring debates. Everyone can be right from the business's point of
view--tons of money is still being saved. Once you get past the initial ramp
of a company, how to structure for continuing velocity of a team and make
headway in your chosen market(s) seems like a different optimization problem
than what got you there (again, not a new topic!)

Just some partially formed thoughts...

~~~
praptak
About code reuse - there is some disagreement about whether it is a good thing
in the first place: _' I also must confess to a strong bias against the
fashion for reusable code. To me, "re-editable code" is much, much better than
an untouchable black box or toolkit. I could go on and on about this. If
you’re totally convinced that reusable code is wonderful, I probably won’t be
able to sway you anyway, but you’ll never convince me that reusable code isn’t
mostly a menace.'_ This is from Donald Knuth:
[http://www.informit.com/articles/article.aspx?p=1193856](http://www.informit.com/articles/article.aspx?p=1193856)

~~~
jarrett
I have to wonder if he was thinking about, for example, a reusable
implementation of a hash table. And if he was, why in the world _wouldn't_ he
want that? Running with the hash table example: I use them many, many times a
day, and if I had to reimplement them every time, I'd be sunk. Just a few
moments ago, I wrote a script to check for duplicate entries that looked
something like this:

    
    
      # Track which elements have appeared; fail fast on a repeat.
      seen = {}
      elements.each do |element|
        if seen[element]
          raise "Duplicate element: #{element.inspect}"
        end
        seen[element] = true
      end
    

It's quick and dirty and isn't a shining example of architecture. But it found
my duplicates and let me move on with my day. But what if I didn't have a
reusable hash implementation to lean on? Would I even have attempted to write
that script? Or would I have done my duplicate checking manually, wasting
about an hour?

~~~
sheriff
Looks like Ruby. Why use a Hash instead of the built-in Set class?
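For comparison, here is a minimal sketch of the same duplicate check using the
standard library's Set (the method name `check_duplicates!` is just
illustrative, not from the thread):

```ruby
require "set"

# Raise on the first duplicate. Set#add? returns nil when the
# element is already in the set, which makes the check a one-liner.
def check_duplicates!(elements)
  seen = Set.new
  elements.each do |element|
    unless seen.add?(element)
      raise "Duplicate element: #{element.inspect}"
    end
  end
  elements
end

check_duplicates!([1, 2, 3])       # passes quietly
# check_duplicates!([1, 2, 2])     # would raise "Duplicate element: 2"
```

The behavior is the same as the Hash version; Set just states the intent
(membership tracking) more directly.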

------
grandalf
While DHH's rant has spawned an interesting discussion, it feels to me like
he's arguing in reverse in defense of his framework.

Many Rails apps are tightly coupled, and many unit tests written by developers
using rails test 10% program logic and 90% framework features.

Of course this is going to be slow. We can argue about hacks to make it faster
but at a certain point it's a problem whose solutions start to distract us
from solving the important problems.

If you have a web app and get to write a single test to determine whether it's
safe to deploy, that test would be an integration test.

The decision to write tests more granular than integration tests is a decision
to be made based on assumptions about the rate of change of components of the
system.

TDD is tangential to the above observation.

There are many cases where an implementation is easy to figure out (though
possibly time consuming) while the optimal interface design is less obvious.
TDD can be really useful to quickly iterate interfaces and verify that all the
moving parts work as expected together, before worrying about the
implementation details... This makes it possible to work on a larger system
with more focus on problem solving, fewer mistakes, and less overall cognitive
load.

~~~
jshen
It's not slow to test a rails app. See DHH's latest post.

~~~
insensible
In which he enumerates measurements of what are quite slow test runs by TDD
standards.

~~~
jshen
And TDD is of no use to me if it expects me to run my full test suite whenever
I change a line of code. If my code isn't decoupled enough to run a subset of
tests for a small change, then my code is bad, so there is no good reason for
TDD to demand I run the full suite every 5 seconds.

~~~
grandalf
It depends on how coupled your code is. Pure ruby application logic is really
fast to test, so there is no reason not to test _all of it_.

But due to the high degree of coupling of logic to AR model code, callbacks,
etc., you end up testing much of Rails along with your own logic, which makes
the tests slow.

If your "business logic" is just Rails associations and one or two callbacks,
then arguably (and this is what DHH is arguing, I think) it probably doesn't
need to be unit tested. Other parts of your app are far more likely to break
or behave unexpectedly.

However if you are doing any kind of nontrivial software design, unit tests +
TDD can be extremely helpful.

~~~
jshen
I don't think a full test suite that takes 4 minutes is slow, and I don't see
much if any value in adding a bunch of abstractions and indirection for the
purpose of running the full suite every 5 seconds. I get 99% of the value by
limiting my test execution to the test file for the chunk of code I'm editing
and occasionally running the full suite. Running the full suite after each
edit is of very little value.

------
programminggeek
I have a really crazy idea on how you could test the GUI in a way that would
both save time and provide something more valuable than just testing true ==
true.

My idea is to create tests that take screenshots and diff them over time. You
could then set a change-percentage threshold that would trigger a test
"failure". Your QA team could then run through that process and see that
something significant changed. Maybe the change is fine, but maybe it is
hugely unintended.

Having a Time Machine of screenshots of different processes, you could compare
changes easily and see if they are worth further investigation. For example,
this would be useful if you change some CSS or JS for just one page, and it
ends up breaking another page.

The key point of this system is not that it would tell when your system is
broken, but rather that there was a significant change that occurred that
might have broken something. It's not a substitute for human analysis or
thought.

Is anyone doing something like this and would it be useful to anyone else?
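The threshold part of this idea is simple to sketch. Here's a toy version in
Ruby (the helper names are made up, and it assumes screenshots have already
been decoded into equal-length arrays of pixel values; real use would need an
image library to get that far):

```ruby
# Fraction of pixels that differ between two decoded screenshots,
# given as equal-length flat arrays of pixel values.
def diff_fraction(baseline, current)
  raise ArgumentError, "size mismatch" unless baseline.size == current.size
  changed = baseline.zip(current).count { |a, b| a != b }
  changed.to_f / baseline.size
end

# Flag for human review only when more than `threshold` of the
# pixels changed, so tiny rendering noise doesn't page anyone.
def significant_change?(baseline, current, threshold: 0.01)
  diff_fraction(baseline, current) > threshold
end
```

The interesting design question is picking the threshold: too low and you
drown in false positives (see the reply below about 1999), too high and a
broken page slips through.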

~~~
DougWebb
My company's QA team set up a system like that around 1999. There were far too
many false-positives, because the GUI intentionally changes all of the time
during development. So instead of testing functionality, the team spent all of
their time updating screenshots. The worst was when we made a very simple
style change that affected every page in the application; they'd have to redo
every single screenshot instead of just doing a 5-second test that results in
"Yeah, the banner is the right shade of blue now, and I know it's used on
every page."

------
lnanek2
This isn't really true. At smartphone OEMs we certainly do have boxes to put
the devices in that perform physical tests on the touch screen, microphones,
speakers, antennas, etc. And in mobile development we have UI automation tests
that confirm buttons are a certain color, have certain text or state, and that
the right screens pop up when they're pressed. Heck, we have a program called
the monkey that presses everything that can be pressed, in addition to the UI
automation scripts. I know the web side of things has Selenium and similar
robots. I think he just hasn't ever worked somewhere where everything is
tested, which is reasonable: in many cases you have nothing to do with the OS
your software is running on, for example, so there isn't much point in testing
beyond what your app outputs to it.

~~~
reedlaw
I think what Uncle Bob is referring to are things like layout, color, and so
on. Of course you can automate browser interactions with Selenium, but you
can't easily catch layout changes, broken UI elements, or regressions. The
only method I know of that can come close is automated screen capture
comparison. But that wouldn't work perfectly and still requires human
intervention to check out false positives.

~~~
dllthomas
_" The only method I know of that can come close is automated screen capture
comparison. But that wouldn't work perfectly and still requires human
intervention to check out false positives."_

It would require human intervention in the case of a failure, to be sure, but
those cases where it can guarantee I don't need to bother looking because my
change didn't change anything visible in those screenshots is potentially a
significant boon.

------
jshen
"Over the years many people have complained about the so-called "religiosity"
of some of the proponents of Test Driven Development. "

And Bob is guilty of that religiosity himself. See here
[https://www.youtube.com/watch?v=WpkDN78P884&feature=youtu.be...](https://www.youtube.com/watch?v=WpkDN78P884&feature=youtu.be&t=58m)

Jump to the 58-minute mark if it doesn't jump there automatically.

------
planetjones
>> So near the physical boundary of the system there is a layer that requires
fiddling. It is useless to try to write tests first (or tests at all) for this
layer.

Maybe I can see what he's trying to say, but I don't think the statement alone
is accurate.

For the GUI, i.e. at the human boundary of the system, the most value
(especially in stopping regressions and catching side effects) is often added
with tests, e.g. automated tests which perform some user function in the GUI
and assert the results.

Another physical boundary of the system is a database. Writing tests which
cross this boundary adds a lot of value too.

I'd favour tests which hit these boundaries and go over them, over a codebase
with only unit tests and endless mocking, any day of the week.

Also, these types of tests can be written first. We do it.

~~~
agentultra
Unit tests don't have to test everything... just the units. Integration tests
should be testing the interactions between different modules. And system tests
should be the whole stack top-to-bottom. It ends up looking like a pyramid.

~~~
planetjones
Yes, thank you, I know the testing pyramid. But testing should concentrate on
what adds value, and a zillion unit tests with everything mocked often don't.
In fact, they can be a distraction from the bigger picture. Rather than rules,
methodologies, etc., we should TWMS - Test What Makes Sense. Or even better,
TWAV - Test What Adds Value. Or DTFTS - Don't Test For Testing's Sake.

~~~
agentultra
If I understood correctly I think that's what Uncle Bob was trying to argue
for in this blog post.

I don't think having 3-4 tests per LOC is such a bad thing. Far more testing
goes into the sqlite codebase and I think it'd be hard to argue that it could
have been better if the developers had stopped wasting their time and
concentrated on what added value.

------
nsfyn55
This article is a breath of fresh air. I can't count how many times I've
encountered the dogmatic TDD adherent. I write more tests than anyone I know,
and what I have learned is that TDD is great except when its cost outweighs
its benefits. I've seen a dev spend 8 hours fiddling around with Mocha/Chai
trying to test whether a button changes color in response to a successful
callback. Sometimes it's good enough to click the button and see if it changes
color.

------
nimblegorilla
We've seen most of this argument before, but the most interesting (new) part
of the article is the implication that CSS is the final layer between software
and the physical world and thus hard to test. I'm sure the people on the
Mozilla, Chrome, Opera, and IE projects would disagree that CSS is untestable.

It seems Uncle Bob implies that it's ok to skip TDD if you think it is hard to
test something. There are much better reasons for most apps to avoid testing
their CSS. Likewise there are many reasons for some projects to have extensive
automated testing around CSS even if it might be hard.

------
asgard1024
"I have often compared TDD to double-entry bookkeeping."

It always seemed to me that if you were to write perfect, automated tests that
cover your application 100%, you would basically have reimplemented it. (Or in
other words: if you want to check that your calculations are correct, you have
to do the calculations again.) Ideal, fully automated tests basically take two
implementations, run them side by side, and compare the results.

That's why I am not a big fan of tests, in the sense that there is too much
focus on them in the SW industry, and they seem like a hammer (useful but
overused).

I think there should be more focus on writing the _one_ implementation
correctly. This can be done with better abstractions (e.g. actor model for
concurrency, functional programming, ..) and asserts (programming by
contract), and maybe even automated SW proving. I don't think these techniques
are as popular as testing, but I wish they were more popular, because they let
you write programs only once and correctly.

[Update: To expand specifically on the point about asserts: if you can
trivially convert a test to an assert, why not do it? Unfortunately, tooling
doesn't support asserts as well as it supports tests.]
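To illustrate the test-to-assert conversion with a made-up example (the
`apply_discount` method is hypothetical, not from the thread): the
postcondition a unit test would check once can instead be asserted inline, in
the contract style, so it is checked on every call:

```ruby
# Contract-style version: the invariant a separate unit test would
# assert is checked inside the method itself, every time it runs.
def apply_discount(price, percent)
  # Precondition
  raise ArgumentError, "percent out of range" unless (0..100).cover?(percent)
  result = price * (100 - percent) / 100.0
  # Postcondition (the "assert" form of the test)
  raise "discount increased the price" if result > price
  result
end
```

The trade-off is that the check now runs in production too, which is either a
cost or a safety net depending on your point of view.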

~~~
jdlshore
You're right in a sense; in TDD, you're basically saying a thing, and saying
the inverse of a thing, and then making sure they line up. I find that my
codebases are about 50% test code and 50% production code.

Your implication that this is wasteful is wrong, though. My production code is
smaller when I work this way because I do more refactoring, and for projects
that live longer than a month or two, I go faster.

By the way, TDD is absolutely in the same vein as design by contract and
formal proofs. Both involve saying the same thing twice, in two different
ways. TDD is a sloppier, more practical version of the same basic idea. See
"Worse is Better."

------
williamcotton
A good tool for testing visual interfaces is an image diff combined with a
manual testing process.

Initially the tester will view each screenshot of an application state that is
being tested and set that as "passing". Next an automated test runs and the
latest screenshots are compared to the passing screenshots.

If they are different then the test fails. A manual tester then needs to take
a look at the tests that failed and decide if the test actually failed or if
the changes were supposed to be there.

If the changes were supposed to be there the tester can make this image the
new passing screenshot. Passing screenshots should probably be reset BEFORE
the tests are run. I see no reason why not to just check these images in to
the repo along with all of the other test conditions.

I've been scheming on ways to do video diffs for testing transitions and
animations although I'm not sure if this provides much value. It would be
mostly an academic pursuit.

------
mbrock
There seem to be two aspects to this discussion.

One involves questions of process, enforcing TDD, and whether or not TDD can
save a bad team from producing bad stuff. The other is the question of what
TDD can offer for a skilled team with an intelligent approach to development.

Mixing these aspects leads people to dismiss TDD because they've seen teams
fail by doing TDD in a bad way.

Another question: is there a way to structure software so that questions of
boundaries and collaborators become less troublesome for testing? I think a
promising road is in value-oriented programming without side effects.

Another way to see that: if you need a lot of tedious mocking to test your
unit, maybe the unit should be redesigned to have fewer collaborators, or
maybe you should move the complex logic to a pure function, and so on. Maybe
TDD difficulties are showing us that there is something wrong with how we
write code. After all that's what it's supposed to do.
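That "move the complex logic to a pure function" suggestion can be sketched in
Ruby (the names here are hypothetical): instead of a unit that reads the clock
and talks to a mailer, and so needs both collaborators mocked, the decision
itself becomes a pure function of its inputs.

```ruby
# Pure decision logic: no clock, no mailer, no mocks needed.
# Both times are passed in, so tests just construct Time values.
def reminder_due?(last_contacted_at, now, interval_days: 30)
  (now - last_contacted_at) >= interval_days * 24 * 60 * 60
end

# The thin impure shell wires in the real clock and mailer, and is
# covered (if at all) by an integration test rather than mocks:
#   send_reminder(user) if reminder_due?(user.last_contacted_at, Time.now)
```

The point is the one in the comment above: when testing hurts, it is often the
design, not the testing, that needs to change.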

------
lclarkmichalek
So you're telling me that being pragmatic will result in sensible solutions? I
could do with more blog posts like this!

------
dllthomas
_" So near the physical boundary of the system there is a layer that requires
fiddling. It is useless to try to write tests first (or tests at all) for this
layer. The only way to get it right is to use human interaction; and once it's
right there's no point in writing a test."_

This seems dead wrong. There is probably no way to write tests first in this
environment, but with so many different browsers interpreting your CSS (to run
with preceding example) you need to be aware of when changes in your code
cause changes in rendering that might need to be revalidated and further
fiddled! I _do_ agree that it doesn't fit well with T _D_ D, but it absolutely
can work with automated testing.

~~~
_ikke_
He _is_ talking about TDD / unit tests in this case. He is not saying anything
about automated testing in general.

~~~
dllthomas
He is _focusing_ on TDD, but in a few places he clearly generalizes. In the
quoted bit, the _" (or tests at all)"_, and a paragraph down we have,

 _" Anything that requires human interaction and fiddling to get correct [...]
doesn't require automated tests."_

Again, I believe I agree with the narrower case of TDD. I was objecting to the
generalization.

------
DanielBMarkham
<standard TDD comment>

 _"...software controls machines that physically interact with the world..."_

See, that's not always true. I would love it if all software interacted with
the outside world. But a lot of software doesn't interact -- just take a look
at some of that code sitting in your repository sometime. Some of that isn't
deployed, isn't being used. You could test that until the cows come home and
have a whole bucket full of nothing.

Because the Bobster and the other TDD guys are correct: you gotta test to know
that the code is doing what it's supposed to. Testing has to come first. In a
way, the test is actually more important than the code. If you get the tests
right, and the code passes them, the code itself really doesn't matter.

Where we fall down is when we confuse the situation of a commercial software
development team working on WhipSnapper 2.0 with a startup team working on
SnapWhipper 0.1. The commercial guys? They are working on a piece of code with
established value, with a funding agent in place, with a future of many years
(hopefully) in production. Everything they create will be touched and used
over a long period of time. The startup guys? They've got a 1-in-10 shot that
they're alive next year. Any energy they put into solving a problem that
hasn't been economically validated is 90% likely to be wasted.

Tests are important, but only when you're testing the right thing. The test
for the startup guys is a _business_ test, not a code test. Is this business
doing something useful? If so, then just about any kind of way of
accomplishing that -- perhaps without any programming at all -- provides
business value.

That's a powerful lesson for startup junkies to assimilate. In the startup
world, you don't get rewarded based on the correctness or craftsmanship of
your code. You're looking at one or two weeds instead of realizing the entire
yard needs work.

Put a different way, we have Markham's Law: The cost of Technical Debt can
never exceed the economic value of the software to begin with.

</standard TDD comment>

------
robmcm
I have never found it to work in visual/interactive development. A lot of the
time you are working on something you evolve as you develop: try, iterate, try
again.

I can see its benefits if your code has a simpler I/O boundary.

------
pornel
TDD for CSS is indeed an odd concept, but it's possible to do automated CSS
regression testing:

[http://tldr.huddle.com/blog/css-testing/](http://tldr.huddle.com/blog/css-testing/)

------
harel
As a non-religious person, there's one thing I don't get in the whole
to-TDD-or-not-to-TDD debate that's going on now.

Does it matter whether "TDD says this or says that"? Aren't these
methodologies more like 'suggestions' for us to adopt as they fit our needs,
while trimming the stuff that doesn't? Once you adhere to a methodology
religiously, you lose the flexibility and pragmatism that methodology was
intended to give you. It just becomes systematic, dogmatic rule-book
following, like any religion.

------
dllthomas
_" How can I test that the right stuff is drawn on the screen? Either I set up
a camera and write code that can interpret what the camera sees, or I look at
the screen while running manual tests."_

Or screenshots, of course, which still relies on code, obviously, but probably
not _your_ code. Of course, defining what you're looking for in a screenshot
is going to be nontrivial, unless you're doing a simple check that it hasn't
changed from the last manually approved version, or something like that.

------
platz
I am reminded of this somewhat recent discussion of when TDD doesn't work
(which I believe Uncle Bob responded to as well):

[https://news.ycombinator.com/item?id=7130765](https://news.ycombinator.com/item?id=7130765)

------
tempodox
Why is it that we need articles like this to tell us that there is no silver
bullet and we had better use our brains instead of the Methodology Du Jour?
Sometimes, HN just makes me want to cry...

~~~
nsfyn55
I think critical programmers from every generation back to Ada Lovelace have
made this observation: when the solution starts costing more than the problem,
stop.

------
dllthomas
_" However, software is different from accounting in one critical way:
software controls machines that physically interact with the world."_

Doesn't accounting?

------
pjmlp
So there's an acceptance that there are lots of scenarios where one cannot
write tests first.

A good post for the TDD evangelists I have met so far.

------
raverbashing
"But if I want to be sure that the bell rings when the proper signals are sent
to the driver, I either have to set up that microphone or just listen to the
bell.

How can I test that the right stuff is drawn on the screen? Either I set up a
camera and write code that can interpret what the camera sees, or I look at
the screen while running manual tests."

I have nothing else to add

~~~
tonyarkles
Or you programmatically capture the screen output and compare it to a known-
good capture. Not the best solution, but it's a solution that I've seen used.
99% of the time, the screen output is the same, and occasionally it'll be
different and require human intervention to determine whether the change is
right or wrong.

Getting notified when things aren't what you'd expect them to be is pretty
valuable.

~~~
JabavuAdams
This can work well for form-like things, but how would you do this for, say, a
3D game?

Hmm. Computer vision project #346.

~~~
fixermark
[http://www.sikuli.org/](http://www.sikuli.org/)

We used this in a videogame engine test framework to verify specific game
states against gold-master images. It was particularly useful for verifying
when our physics engine had changed in subtle ways; one of our tests involved
dropping a couple of boxes on top of each other and verifying where they
landed.

All kinds of stuff would disrupt that test---which made it great for knowing
when we'd changed something subtle that would have real impact on our game
engine's users.

~~~
raverbashing
Yeah, this is complicated: the output may have changed but still be correct,
like a value of 15.000002 instead of 15.000013, though errors may accumulate
in the end.
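The usual mitigation for that kind of float noise is to compare with a
tolerance rather than exactly. A minimal Ruby sketch (the helper name and the
epsilon value are illustrative; picking epsilon is exactly the hard part,
since too loose a tolerance hides real accumulated drift):

```ruby
# Treat two numeric results as equal if they agree within epsilon,
# so harmless float noise doesn't fail the comparison.
def approx_equal?(a, b, epsilon: 1e-4)
  (a - b).abs <= epsilon
end

approx_equal?(15.000002, 15.000013)   # => true  (difference ~1.1e-5)
approx_equal?(15.000002, 15.1)        # => false (difference ~0.1)
```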

