

The End of Bugs? - sant0sk1
http://adam.blog.heroku.com/past/2008/7/6/the_end_of_bugs/

======
akeefer
Having done TDD a bunch over the last five or so years, I couldn't ever go
back to a world without extensive unit tests.

That said, there are interesting scaling problems with tests that I don't feel
like many people seem to write about or know how to deal with; all the TDD
books and sites describe techniques that work with simple, small code-bases
but which often break down in the face of real-world problems. In particular,
writing tests that fail only when the code is broken (and not just because the
test is broken) is often incredibly difficult. If you have (like we do) more
than 40,000 unit tests, even a small false-positive rate like 0.1% means that
a simple check-in can require hours of test rewriting, not because the code is
broken but because the tests are poorly written or make too many assumptions
about implementation details. Even the best, most disciplined teams create
some "bad" tests, and it doesn't take many to start killing you.
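A minimal sketch of the false-positive problem described above (the `Cart` class and both tests are hypothetical, just for illustration): a test that reaches into implementation details breaks on a harmless refactor, while a behavioral test survives it.

```python
# Hypothetical example: a test that over-specifies implementation details
# will "fail" on a harmless refactor even though no caller is affected.

class Cart:
    def __init__(self):
        self._items = []          # private storage; happens to be a list today

    def add(self, name, price):
        self._items.append((name, price))

    def total(self):
        return sum(price for _, price in self._items)

# Brittle test: asserts the exact shape of private state. Refactoring
# _items into a dict breaks this test while total() still works fine.
def test_cart_brittle():
    cart = Cart()
    cart.add("book", 10)
    assert cart._items == [("book", 10)]

# Robust test: only exercises the public behavior callers depend on.
def test_cart_behavior():
    cart = Cart()
    cart.add("book", 10)
    cart.add("pen", 2)
    assert cart.total() == 12

test_cart_brittle()
test_cart_behavior()
```

At 40,000 tests, even a small fraction written in the first style means every internal refactor triggers a wave of spurious failures.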

Similarly, in a real software environment where a code base lives for years
and undergoes numerous changes, you often end up with stale tests that either
no longer test anything useful or which enforce requirements that have since
been changed.

So while the tests are invaluable and I can't imagine not having them, at the
same time there are a lot of issues with test maintenance over the long term
that become very difficult to deal with.

~~~
gruseom
This and your other comment are obviously coming from real experience using
these techniques on production systems. (In fact, they're so lucid that I went
back and read all your previous HN comments and quite enjoyed them.) They are
refreshing since most of the discourse on unit testing falls into binary
for/against mode (q.v. the OP). I also agree with you about test maintenance.
(That's one reason I believe we can do better than the object-model-plus-unit-
tests approach.)

Here's one idea I've come to: test code and production code are different
species that need to develop differently. Good production code is a well-
factored, abstract machine. Good unit tests are concrete examples illustrating
one thing each that are independent of one another. The same principles do not
apply, much as a car's manual is not built like a car.

For example, I allow almost no duplicate code in production, but with unit
tests my tolerance is much higher. Duplication is bad in either case, but
trying to eliminate it from tests has negative consequences that are worse: it
prevents them from being simple, self-contained, and independent. A good unit
test reads like a story. There are many such stories you might want to tell
about your system and typically there is lots of overlap between them, working
out different permutations of the same thing and so on (imagine an enormously
complicated Venn diagram). Production code is not a story at all (except maybe
inside the occasional well-defined function), it's a set of abstractions.

When people treat tests like production code and try to factor out the common
bits, they end up creating a whole new set of abstractions that sits _between_
the production and tests. This layer soon becomes a sink for all kinds of
bloat (ObjectMotherFactoryManagers, anyone?), eventually so thick that you
can't even see the production code from the tests. You spend hours tracking
down test failures only to find that the problem was in this test
infrastructure. Regardless of what one thinks about unit testing in general,
this clearly isn't a good way to scale it.
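A small sketch of the "story" style being argued for (the `Account` class and tests are hypothetical): each test repeats its own setup on purpose, so it can be read, and can fail, on its own, with no shared layer sitting between test and production code.

```python
# Hypothetical production class.
class Account:
    def __init__(self, balance=0):
        self.balance = balance

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount

# Each test is a self-contained story. The setup line is duplicated
# across tests deliberately -- tweaking one story can't hose the others.
def test_withdraw_reduces_balance():
    account = Account(balance=100)   # duplicated setup, on purpose
    account.withdraw(30)
    assert account.balance == 70

def test_withdraw_rejects_overdraft():
    account = Account(balance=100)   # same setup again; still readable alone
    try:
        account.withdraw(200)
        assert False, "expected ValueError"
    except ValueError:
        pass
    assert account.balance == 100    # balance unchanged on failure

test_withdraw_reduces_balance()
test_withdraw_rejects_overdraft()
```

Factoring that one shared line into a common fixture buys almost nothing here, and each extraction step moves the tests further from being readable chapters.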

Here's an analogy that popped into my head one day when helping some people
work on a large system that had this problem. In the 1920s it was fashionable
to do formal analyses of fairy tales. People discovered all kinds of common
patterns and overlaps ("Once upon a time", things come in threes, etc.). Yet
each tale is also unique. Now imagine if someone said: All this duplication is
bad. Instead of duplicating "Once upon a time" and "In a deep forest there
lived...", let's refer the reader to a "setup" section. Of course we'll need
to provide some parameters (how many boys, how many girls, etc) because
they're not _quite_ identical, only mostly. And witches always do such-and-
such, so let's factor out a Witch and then we'll only have to write each
witchy action once and can refer to these from different stories. My point is
that, if you did this to a book of fairy tales, you'd no longer have a
storybook. You'd have a weird assortment of meta fairy tale bits. Critically,
you could no longer pick it up and read a chapter at a time. In fact, a person
coming in cold would have trouble reading any of it.

~~~
akeefer
Indeed, there's a real tension there between treating tests as "real" code and
letting them be separate, and we've been bitten both ways. Things can't all
work the same: if TestA and TestB share a common setup routine or helper and
TestC comes along and needs to tweak it, you don't want that to hose TestA and
TestB. On the other hand, if you have one production implementation of some
interface and 50 different test mocks, you're in huge trouble if you need to
change that interface, and life will be better if the tests either use the
real implementation or all share a single mock (though even then there's a
danger of the mock diverging from the main interface).
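One way to sketch the "single shared mock" option (the `MailSender` interface and names here are hypothetical): keep exactly one fake implementation next to the real one, so an interface change touches two classes instead of fifty ad-hoc mocks.

```python
# Hypothetical production interface.
class MailSender:
    def send(self, to, subject, body):
        raise NotImplementedError

# The single shared test double. If MailSender's interface changes,
# only the real implementation and this one fake need updating.
class FakeMailSender(MailSender):
    def __init__(self):
        self.sent = []

    def send(self, to, subject, body):
        self.sent.append((to, subject, body))

# Hypothetical production code under test.
def notify_user(sender, user_email):
    sender.send(user_email, "Welcome", "Thanks for signing up")

def test_notify_user():
    sender = FakeMailSender()
    notify_user(sender, "a@example.com")
    assert sender.sent == [("a@example.com", "Welcome", "Thanks for signing up")]

test_notify_user()
```

Because the fake subclasses the real interface, at least some divergence (a renamed method, say) shows up immediately rather than silently in fifty places.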

There's a similar problem with tests using their own code paths to set things
up instead of the normal system path. For example, suppose you're testing a
method that takes a User object. In your production system, users might only
be created in a certain way, they might have certain fields required, they
might have sub-objects attached to them, the database might enforce
nullability or fk constraints, there might be insert callbacks in the code or
in the DB, etc. If you try to just hack up a User in memory and pass it to the
test method outside of the normal creation path, your test User object might
differ in important ways from your actual User objects in production. So
changes in the production code, for example assuming a field isn't null or
relying on a default value, might cause the tests to break erroneously even
though the app is fine. Then you have to find all the places you create Users
in tests and fix them, or try to centralize things so you only have to fix
them in one place.
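A sketch of the "centralize it" option (the `User`/`create_user`/`make_test_user` names are hypothetical): test fixtures go through the same creation path production uses, so defaults and validation changes propagate to every test automatically.

```python
# Hypothetical production code.
class User:
    def __init__(self, email, name, active):
        self.email = email
        self.name = name
        self.active = active

def create_user(email, name=None):
    # The "real" creation path: validation and defaults live here.
    if "@" not in email:
        raise ValueError("invalid email")
    return User(email=email, name=name or email.split("@")[0], active=True)

# Centralized test helper: builds users via create_user rather than hacking
# up a bare User in memory, so a new required field or default added to
# create_user is picked up by every test that uses this helper.
def make_test_user(**overrides):
    user = create_user(overrides.pop("email", "test@example.com"))
    for field, value in overrides.items():
        setattr(user, field, value)
    return user

def test_default_user_looks_like_production():
    user = make_test_user()
    assert user.name == "test"       # default derived inside create_user
    assert user.active is True

test_default_user_looks_like_production()
```

This doesn't remove the tension with the story-style duplication argument above; it just confines the fixture knowledge to one helper instead of scattering it across every test.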

Some of those problems can be partially avoided by proper decomposition and
decoupling of the code, though oftentimes you have to have the foresight to do
that before the tests get out of hand (and having the tests get out of hand is
a good canary in the coal mine that your code is in trouble).

We actually went all the way to one extreme whereby we run our tests in the
production environment as much as we can and avoid mocks and stubs; we're a
Java servlet app (kinda the only way to do enterprise software these days,
unfortunately), so we start up a Jetty server, an H2 database in memory, and
go through the normal server-startup sequence before running most unit tests,
which at least eliminates the test setup problem and a lot of the
test/production divergence. It comes at a huge cost in terms of local test
execution times, unfortunately.

------
richcollins
Writing specs that are incorrect but pass is also a bug. I'm surprised he
hasn't run into this situation.

------
jdale27
<http://en.wikipedia.org/wiki/No_Silver_Bullet>

~~~
silentbicycle
The distinction between accidental complexity and essential complexity is very
relevant here. Test-driven design is not a perfect solution ("The End of Bugs"
is pretty sensationalized), but it's good at reining in many kinds of bugs in
dynamically typed languages, much as static type systems do in Haskell or
OCaml.

Sure, it won't prevent all bugs, but I wouldn't write off testing so quickly.
Consider Proebsting's Law.
(<http://research.microsoft.com/~toddpro/papers/law.htm>)

(I'm responding to anti-testing backlash in general; I'm not clear what
position, if any, you're taking by just dropping in the link. Also, akeefer
makes some really good points here.)

------
ntoshev
You can't do exploratory programming writing unit tests first, can you?

~~~
akeefer
My experience is that it's painful if you don't have any idea of what you're
doing; it's not the testing itself per se, but rather that if you're iterating
rapidly on the code, your tests will need to be rewritten over and over again.
Large libraries of unit tests are invaluable for catching regressions, but
they tend to impose a certain amount of friction on code changes and
refactorings. So personally, I tend to do TDD when I know where I'm going; for
exploratory programming, I write the minimal set of high-level tests to
exercise things end-to-end and make sure they basically work, then back-fill
more detailed tests once the code has settled down a bit.
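A sketch of that exploratory-phase style (the `word_report` pipeline is hypothetical): one coarse end-to-end check proves the whole path works, while deliberately avoiding assertions on internals that are still in flux.

```python
# Hypothetical pipeline still under exploration.
def tokenize(text):
    return text.lower().split()

def count_words(tokens):
    counts = {}
    for token in tokens:
        counts[token] = counts.get(token, 0) + 1
    return counts

def word_report(text):
    return count_words(tokenize(text))

# Coarse end-to-end smoke test: just proves the pipeline runs and produces
# something sane. No assertions on tokenization details or internal data
# shapes -- those detailed tests get back-filled once the design settles,
# so rapid iteration doesn't force constant test rewrites.
def test_end_to_end_smoke():
    report = word_report("the cat and the hat")
    assert report["the"] == 2

test_end_to_end_smoke()
```

The trade-off is exactly the one described above: this test survives aggressive refactoring of `tokenize` and `count_words`, at the cost of catching fewer fine-grained regressions until the detailed tests are added.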

------
greyman
Please can someone explain more about "BDD" here?

~~~
ntoshev
<http://en.wikipedia.org/wiki/Behavior_driven_development>

------
wallflower
Nice to hear that BDD/TDD is doing well for you. I love RSpec's stories.
Unfortunately in a land where using the Spring framework is a big risk (e.g.
corporate software development), BDD is not going to be widely adopted, let
alone seriously considered. Startups are for those who want to expand their
technological comfort zone. Average companies, on the other hand, are fine
with remaining average.

~~~
gruseom
One hears the exact opposite argument too, which is that these techniques are
suited for large/average teams in corporate environments and not at all for
startups where people are talented and have to work quickly.

------
aditya
Can someone point out any compelling reasons to use RSpec over Test::Unit?

~~~
tyler
I've used both to a limited degree. The only thing I can say is that RSpec
felt more "natural". I suppose it's up to you whether feeling natural is
compelling or not.

