

TDD, BDD, and the tea tasting lady - rdfi
http://blinkingcaret.wordpress.com/2012/10/02/tdd-bdd-add-every-other-method-that-promises-software-quality-here-and-the-tea-tasting-lady/

======
swalsh
It's perhaps not rigorously scientific, but our Redmine server is tied into
our Jenkins continuous integration server, which also has a code coverage tool
tied in. We've been doing this for a little while, and can now visualize bug
counts broken down by severity alongside test count and code coverage
percentage.

There is an order of magnitude difference between the projects that did not use
TDD and the projects that did. Another nice part of Redmine is that I can
also visualize how close we are to meeting our time estimates
(it has a task tracking feature). Overall, adding TDD has made
development a bit slower at the front end, but a lot shorter at the back end
(during QA). Looking at past projects, something
like 60% to 70% of our time was actually spent fixing issues found in QA,
and that time was usually underestimated. So overall we're definitely seeing
quantitative proof that TDD is a better methodology compared to our past
process.

Another cool project management tool: we've built a set of tools that
let us link a requirement from our functional spec to a unit test,
which runs on every check-in. So our project manager can get a near-real-time
assessment of our progress. I also cross-reference that report with the
code coverage report. This part of the system is newer, though, so I have no
data on how effective it is.

~~~
coopdog
Do you find linking unit tests to functional requirements useful? I've also
been toying with visual traceability (requirementweaver.com), and it seems to
have a lot of potential to speed up the traditionally slow 'quality process'.

~~~
swalsh
Like I said, it's a really new thing for us, so I don't have a lot of experience
yet. The most immediate benefit I saw, though, was that we spent more time than
usual on the functional spec, making sure it was complete. That
process yields a lot of benefits (it's far easier to delete paragraphs than
code).

Beyond that, it's nice to see more detail on your progress as
development continues. We definitely have more productive status meetings. We
used to sit down once a week and go around the table asking people where they
were, but now everyone knows the status all the time (we can all see it). That
lets us spend more time discussing solutions to problems. During
development, developers keep track of their own progress with less work. As a
lead, it's useful for me to see areas of weakness so I can do more targeted
code reviews.

------
noelwelsh
The problem is one of cost. The tea tasting experiment is very easy to
reproduce and very cheap to run. Reproducing, say, the development process of
Myna is practically impossible, not to mention the prohibitive cost of finding
devs as awesome as Dave and I ;-)

To apply the scientific method to software development you need to apply
methods from the social sciences. Now I don't know a great deal about these,
but I do know you need lots of data, which is very hard to find. The typical
solution is to test hypotheses on undergrad students, because that is what the
experimenters have plentiful access to. The problem, which is also apparent in
psychology, is generalising these results beyond this group. Are the
experiences of 2nd-year undergrads using Java for a 2-week project predictive
of those of developers with 10+ years of experience working on a year-long project? One
can reasonably argue they are not.
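To put a rough number on "lots of data": a standard normal-approximation formula estimates how many participants each of two groups needs to detect a standardized difference (Cohen's d) between them. This is a sketch under textbook assumptions (two independent groups, two-sided test, 5% significance, 80% power); the effect sizes are hypothetical, not measured TDD effects.

```python
import math
from statistics import NormalDist


def participants_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate number of participants needed in EACH of two independent
    groups to detect a standardized mean difference `effect_size` (Cohen's d)
    with a two-sided test, using the normal approximation
        n = 2 * (z_{1 - alpha/2} + z_{power})**2 / d**2
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = std_normal.inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


for d in (0.8, 0.5, 0.2):  # hypothetical "large", "medium", "small" effects
    print(d, participants_per_group(d))
```

Under these assumptions, a medium effect (d = 0.5) needs about 63 developers per group and a small one (d = 0.2) about 393, which is far more experienced developers than most studies can recruit.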

------
evolve2k
There is a great book which teaches you statistics by looking through its
fascinating history.

The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth
Century
<http://www.goodreads.com/book/show/106350.The_Lady_Tasting_Tea>

------
rlpb
It's very difficult to perform a fair test.

If I test random programmers by getting them to do the same project either
with TDD or without, TDD proponents might (justifiably) complain that the
programmers didn't do TDD properly, since they didn't know how.

If I use TDD proponents to do the TDD side of the test, and TDD skeptics the
non-TDD side, then one might claim that there is a bias in the quality of
programmer (either people who like TDD make better programmers, or TDD
skeptics make better programmers, depending on the results).

For every test you come up with, there's a potential bias that you cannot
eliminate, since there will be some hypothetical correlation somewhere that
doesn't correspond to TDD itself.

~~~
JohnLBevan
Do four tests: TDDers doing TDD, TDDers doing non-TDD, non-TDDers doing TDD,
and non-TDDers doing non-TDD. From this you can get a better idea of whether
it's the method or the coder's skill affecting the results. Clearly, if TDD is
better, the TDDers will do better than the non-TDDers when both are doing TDD,
because they have more experience with it. Comparing the non-TDDers doing TDD
vs. non-TDD will show you whether the method alone is enough, or whether other
factors have affected the results (e.g. the learning curve of the new method,
or the ability of the programmers who chose that method). The TDDers doing
non-TDD help answer whether they're just better programmers (i.e. do they
still do comparatively well without TDD, or are they worse than the TDDers
doing TDD?).
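The four-group comparison above is a 2x2 factorial design (programmer group x method). A minimal sketch of how its main effects and interaction could be computed from the four group means; the `CellMeans` fields, the outcome measure, and the numbers are hypothetical placeholders, not results.

```python
from dataclasses import dataclass


@dataclass
class CellMeans:
    """Mean outcome for each of the four groups in a 2x2 design
    (programmer group x method). The outcome is a placeholder, e.g. defects
    found in QA, where lower is better."""
    tdders_tdd: float
    tdders_non_tdd: float
    non_tdders_tdd: float
    non_tdders_non_tdd: float


def factorial_effects(m: CellMeans) -> dict[str, float]:
    # Effect of the method within each programmer group (TDD minus non-TDD).
    method_in_tdders = m.tdders_tdd - m.tdders_non_tdd
    method_in_non_tdders = m.non_tdders_tdd - m.non_tdders_non_tdd
    # Effect of the programmer group within each method (TDDers minus non-TDDers).
    group_with_tdd = m.tdders_tdd - m.non_tdders_tdd
    group_without_tdd = m.tdders_non_tdd - m.non_tdders_non_tdd
    return {
        # Main effects: average over the other factor.
        "method": (method_in_tdders + method_in_non_tdders) / 2,
        "programmers": (group_with_tdd + group_without_tdd) / 2,
        # Interaction: does the method help one group more than the other?
        "interaction": method_in_tdders - method_in_non_tdders,
    }


# Hypothetical numbers, purely to show the arithmetic.
print(factorial_effects(CellMeans(8, 12, 10, 14)))
```

With these made-up means, the method effect is -4 (TDD groups average 4 fewer defects), the programmer effect is -2 (TDDers average 2 fewer regardless of method), and the interaction is 0, meaning the method helps both groups equally. A nonzero interaction would suggest the benefit depends on who is using it.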

Doing other tests with entirely randomly chosen programmers, making sure
they're trained up in these techniques, would be another way, though as you
point out the training would have to ensure they actually learned the
techniques they were being tested on, and you'd need to avoid the bias of
people having spent the last few weeks working exclusively in a TDD
environment because of the training.
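The random-assignment step itself is simple. Here is a sketch, assuming every participant has already been trained in both approaches; the function name, group labels, and participant IDs are hypothetical.

```python
import random
from typing import Optional


def assign_randomly(participants: list[str], seed: Optional[int] = None) -> dict[str, list[str]]:
    """Randomly split participants into a TDD group and a non-TDD group whose
    sizes differ by at most one. Because chance, not the participants,
    decides who uses which method, skill and prior enthusiasm for TDD are
    balanced between the groups in expectation."""
    rng = random.Random(seed)      # seeded so the assignment can be reproduced
    shuffled = list(participants)  # copy: don't reorder the caller's list
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"tdd": shuffled[:half], "non_tdd": shuffled[half:]}


# Hypothetical participant IDs.
print(assign_randomly([f"dev{i}" for i in range(10)], seed=42))
```

Randomization only balances the groups on average; with a small pool of developers, one group can still end up stronger by chance, which is another reason sample size matters.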

------
waivej
I've been thinking about using a random process to guide bug seeding. In other
words, putting bugs of random "types" into random locations and then seeing how
many are caught by the test suite.

~~~
colonelxc
This is called mutation testing. Here is an article I read recently that I
thought was a good introduction to the idea:

<http://dev.theladders.com/2013/02/mutation-testing-with-pit-a-step-beyond-normal-code-coverage/>

------
JohnLBevan
Another thing to take into account is who's coding, what they're coding in,
what they're building, how big their team is, how mature the team is (i.e. how
long they have been working together), and how their environment is set up. As
with most questions, context is everything, so if anyone experiments to find
the best solution, you'd want them to include those details in their
conclusions, and ideally to vary each of those variables and show the effects
of BDD/TDD under those conditions.
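Varying each of those variables amounts to a full factorial design, and the number of conditions grows multiplicatively. A sketch that enumerates them with `itertools.product`; the factor names and levels are hypothetical examples drawn from the list above.

```python
from itertools import product

# Hypothetical factors and levels; a real study would choose its own.
FACTORS = {
    "method": ["TDD", "BDD", "no test-first"],
    "team_size": ["1-3", "4-8", "9+"],
    "team_maturity": ["new team", "established team"],
    "language": ["Java", "Ruby"],
}


def experimental_conditions(factors: dict[str, list[str]]) -> list[dict[str, str]]:
    """Every combination of factor levels (a full factorial design)."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*(factors[n] for n in names))]


conditions = experimental_conditions(FACTORS)
print(len(conditions))  # 3 * 3 * 2 * 2 = 36
```

Even these four coarse factors give 36 conditions, each of which would need several teams to compare meaningfully.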

------
Sandman
I'd be genuinely interested in reading a paper on what impact these
methodologies have on software development. Does anybody know of any research
on this topic?

~~~
stonemetal
<http://evidencebasedse.com/> collects papers on SE practices. They claim to
have 33 papers on TDD and 40 on testing in general.

~~~
rdfi
Thanks for this

------
mistercow
>Your reaction (as mine was) is that it is impossible to tell the difference.

If you pour the milk into the tea, the first bit of milk will heat to nearly
boiling, scalding it and changing its flavor. If you pour the tea into the
milk, you don't have that problem.

~~~
skore
And, according to some research[0], no matter which way you pour it, milk will
make the tea worse.

(And as a personal note - if you need milk to make your tea taste good, maybe
you just don't have good tea?)

[0] <http://news.bbc.co.uk/2/hi/6241139.stm>

~~~
mistercow
The health problem can be solved by using a non-dairy milk like almond milk
(which is also objectively tastier). As for snootiness, I really think it's
just a matter of personal preference. I enjoy it either way, although it can
certainly mask the flavor of bad tea.

------
ajanuary
The article talks about how to blind the tea tasting lady to make the results
more accurate.

It's pretty hard to apply that same blinding to reading code with large or
small methods. There are definitely ways to test it, but it's not quite as
simple.
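The blinded tea test is usually described in Fisher's form: eight cups, four milk-first and four tea-first, served in random order, with the taster told there are four of each. Assuming that setup (the article's exact version may differ), this sketch shows the randomized preparation and how likely a given score is from pure guessing, via the hypergeometric distribution. The function names are hypothetical.

```python
import random
from math import comb


def blinded_trial(rng: random.Random, cups: int = 8) -> list[str]:
    """Prepare a blinded trial: half the cups milk-first, half tea-first,
    served in random order so position gives the taster no hint."""
    order = ["milk first"] * (cups // 2) + ["tea first"] * (cups // 2)
    rng.shuffle(order)
    return order


def chance_of_at_least(correct: int, cups: int = 8) -> float:
    """Probability that pure guessing picks at least `correct` of the
    milk-first cups, when the taster knows exactly half are milk-first."""
    half = cups // 2
    total = comb(cups, half)  # ways to choose which cups to call "milk first"
    ways = sum(comb(half, k) * comb(half, half - k) for k in range(correct, half + 1))
    return ways / total


print(blinded_trial(random.Random(0)))
print(chance_of_at_least(4), chance_of_at_least(3))
```

Under these assumptions, guessing identifies all four milk-first cups with probability 1/70 (about 1.4%), but gets at least three right 17/70 of the time (about 24%), so a single near-miss would not be convincing evidence.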

------
abraininavat
I think that Uncle Bob and his ilk would dispute the idea that breaking a
large function into smaller functions ever makes the original function less
readable. They'd argue that each of the new, small functions is proven to be
correct (since you'd have fully unit-tested it), so its contents are of no
consequence when reading the larger function.

How true that is I'm not really sure.
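To make the argument concrete, here is a hypothetical example (the order-summary domain is invented for illustration) of one function broken into smaller ones. Under the view described above, once `parse_order`, `validate_quantity`, and `format_order` each have their own tests, `summarize_order` can be read as three steps without opening the helpers; whether that actually reads better is exactly what's in dispute.

```python
# Before: one function doing parsing, validation, and formatting.
def summarize_order_long(line: str) -> str:
    name, qty, price = line.split(",")
    qty_n = int(qty)
    if qty_n <= 0:
        raise ValueError("quantity must be positive")
    total = qty_n * float(price)
    return f"{name.strip()}: {qty_n} x {float(price):.2f} = {total:.2f}"


# After: the same behavior, split into small functions that can each be tested.
def parse_order(line: str) -> tuple[str, int, float]:
    name, qty, price = line.split(",")
    return name.strip(), int(qty), float(price)


def validate_quantity(qty: int) -> None:
    if qty <= 0:
        raise ValueError("quantity must be positive")


def format_order(name: str, qty: int, price: float) -> str:
    return f"{name}: {qty} x {price:.2f} = {qty * price:.2f}"


def summarize_order(line: str) -> str:
    # Reads as a sequence of steps; each step's details live (and are tested) elsewhere.
    name, qty, price = parse_order(line)
    validate_quantity(qty)
    return format_order(name, qty, price)


print(summarize_order("widget, 3, 2.50"))
```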

~~~
jbrechtel
In an OO world, function extraction usually results in private methods on a
class. You (generally) don't test those directly, so it doesn't necessarily
result in more specific unit tests.

~~~
maio
Well, you can extract methods into a new object, where they will be public.
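A sketch of the two positions, using a hypothetical invoice example in Python, where "private" is only a naming convention (a leading underscore) rather than enforced as in Java or C#. Amounts are integer cents to keep the arithmetic exact; `InvoiceBefore`, `DiscountPolicy`, and `Invoice` are invented names.

```python
from typing import Optional


# Before: the discount logic is a "private" helper (leading-underscore
# convention), so tests usually exercise it only indirectly through total().
class InvoiceBefore:
    def __init__(self, amounts_cents: list[int]):
        self.amounts_cents = amounts_cents

    def total(self) -> int:
        return self._apply_discount(sum(self.amounts_cents))

    def _apply_discount(self, subtotal: int) -> int:
        return subtotal - subtotal * 10 // 100 if subtotal >= 10_000 else subtotal


# After: the helper moves into its own class, where it is public and can be
# unit-tested directly; Invoice just delegates to it.
class DiscountPolicy:
    def __init__(self, threshold_cents: int = 10_000, percent: int = 10):
        self.threshold_cents = threshold_cents
        self.percent = percent

    def apply(self, subtotal: int) -> int:
        if subtotal < self.threshold_cents:
            return subtotal
        return subtotal - subtotal * self.percent // 100


class Invoice:
    def __init__(self, amounts_cents: list[int], policy: Optional[DiscountPolicy] = None):
        self.amounts_cents = amounts_cents
        self.policy = policy if policy is not None else DiscountPolicy()

    def total(self) -> int:
        return self.policy.apply(sum(self.amounts_cents))


print(Invoice([12_000, 8_000]).total(), InvoiceBefore([12_000, 8_000]).total())
```

The cost is an extra class and a constructor parameter; the benefit is that `DiscountPolicy.apply` gets focused tests of its own, which is the trade-off being debated here.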

