
> While it is useful to tease out the contributory causes for why adopting even a half-baked TDD had such a powerful effect, in the meantime, the fact is that even a half-baked TDD had a powerful effect.

You've invented this term "half-baked TDD", but that seems a little unfair. My point was that the groups in the Nagappan study were doing significantly more than just TDD. For example, they also had varying levels of dedicated design activities in addition to anything test-driven they were doing. It also wasn't clear to what extent some of them were doing similar kinds of unit testing in the alternative scenarios from the study. So you have neither a clear baseline nor a clear change.

For example, one alternative possibility that still seems reasonably consistent with the evidence in that study is that unit testing is good at improving quality, that TDD promotes writing more tests, and that any design weaknesses that a TDD style might encourage in isolation were mitigated by the separate design activities those groups were doing beyond the basic fail-pass-refactor cycle required for TDD.
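
(For concreteness, since I keep referring to it: the basic cycle is roughly the following, sketched here in Python with pytest and an invented slugify example; nothing below comes from the study itself.)

    # Red: write a small failing test first.
    def test_slugify_lowercases_and_hyphenates():
        assert slugify("Hello World") == "hello-world"

    # Green: write the simplest code that makes the test pass.
    def slugify(text):
        return "-".join(text.lower().split())

    # Refactor: tidy the code with the test as a safety net,
    # then repeat the cycle for the next small behaviour.
    # Run with: pytest this_file.py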

> The point is "could they have panned out better?"

Erm... Yes. I just made exactly that point, in my very last post, the one you were replying to.

> I present Pivotal as a falsification of the argument that "true" TDD hasn't been tried for a long time in commercial settings, with real clients, with real projects, with real consequences.

Who was making that argument? I certainly wasn't.

> Your argument is "if it's so good, why hasn't everyone switched?"

No, it isn't. Let's be clear about this.

Prominent TDD advocates, Bob Martin among them, claim quite unambiguously that TDD is essential to writing good software, even using patronising and insulting language like "unprofessional" to describe anyone who doesn't do it.

If that were actually true, if TDD is inherently superior to any other development process and anyone not following it is actively doing programming wrong, then the results achieved by organisations using TDD should be clearly and consistently superior to the results achieved by those not using TDD, other factors being reasonably equivalent.

I claim that this is not the case, and I cite as evidence the simple fact that after nearly two decades TDD still represents a tiny part of the industry practice. This is an industry that over the same time period has brought numerous minor programming languages to prominence and mainstream acceptance; brought many ideas from fields like functional programming and distributed systems from niche applications or academic studies into the mainstream along the way; shifted a large proportion of mainstream development from the previous desktop/server model to web apps, cloud hosting, mobile development, and much more sophisticated embedded systems; made dramatic shifts in development processes such as the migration to a DevOps style of integration and deployment in many cases; and seen the rise of Open Source and more generally of collaborative development from a fun pastime for geeks to a strong influence behind much of the software we run every day.

Given that, I'm sorry but I find it patently absurd to argue that the only reasons hardly anyone is doing TDD, even though it is so inherently superior in both quality of results and cost effectiveness, are that it is hard or unfamiliar. Many of those other changes I mentioned above have been adopted across large parts of the industry within much less time, even though the ideas were completely new and/or required completely different mindsets and understanding.

If TDD really were essential to get good results and so clearly superior to other development processes as the evangelists frequently claim, then as I said before, by now we should see a vast pile of evidence, not just the same study with a sample size of four being cited nearly a decade later, and not just the occasional business -- particularly one that by your own admission is "probably the most doctrinaire TDD company in the world" -- that has used TDD and not failed. Many of us worked on software projects that have not failed. Not failing is table stakes for this debate.




> You've invented this term "half-baked TDD", but that seems a little unfair.

I was rolling with your characterisation that the study wasn't about "real" TDD.

> Prominent TDD advocates, Bob Martin among them, claim quite unambiguously that TDD is essential to writing good software, even using patronising and insulting language like "unprofessional" to describe anyone who doesn't do it.

I personally find Bob Martin quite infuriating.

Doubly so, because I am apparently being grouped with him.

> Given that, I'm sorry but I find it patently absurd to argue that the only reasons hardly anyone is doing TDD, even though it is so inherently superior in both quality of results and cost effectiveness, are that it is hard or unfamiliar.

My actual argument is that TDD is a practice that is hard to learn alone. Every anecdote I read about someone trying and rejecting TDD is an individual trying it by themselves.

> Many of us worked on software projects that have not failed. Not failing is table stakes for this debate.

Reducing defects found in production by 40-90% on a first encounter with TDD is more than table stakes, especially considering how many projects utterly fail.

Consider for contrast Fagan-style code inspections. These too boast studies with ~90% bug yields. I don't see many people doing them.

Or formal methods. Again, claims of remarkable bug prevention outcomes on very challenging projects, for long spans of time. Yet they haven't swept the industry.

Some practices are, frankly, harder to learn than others. That the industry is quicker to adopt the more easily learned practices says nothing else about those practices.

We clearly aren't going to agree.

Edit: one more thing. I was struck by your point that people only ever cite the one paper. So I began looking for reviews.

Here are two recent ones of interest:

"The effects of test driven development on internal quality, external quality and productivity: A systematic review"

http://www.sciencedirect.com/science/article/pii/S0950584916...

and

"Considering rigor and relevance when evaluating test driven development: A systematic review"

http://www.sciencedirect.com/science/article/pii/S0950584914...

This second one in particular is of interest: its authors include Munir, who was an author of early research showing equivocal results for TDD.

Unfortunately, both are behind paywalls, so a closer reading may weaken the fairly strong statements in the abstracts.


> My actual argument is that TDD is a practice that is hard to learn alone. Every anecdote I read about someone trying and rejecting TDD is an individual trying it by themselves.

In itself this is a fair point, but I think this kind of argument only stands up for so long. The same could be said of previously relatively obscure programming styles like functional programming, but they have slowly worked their way into the mainstream as more people have learned them. The same could be said of the modern emphasis on DevOps, but again knowledge and tooling for that have evolved rapidly and gained widespread acceptance in an industry where they were mostly alien just a few years ago.

> Consider for contrast Fagan-style code inspections. These too boast studies with ~90% bug yields. I don't see many people doing them.

Fagan-style inspection is too heavyweight to be practical in most software development organisations, and rightly meets resistance as such. However, this is an area where I have considerable personal experience, and I can tell you there are a lot of places that have successfully implemented lighter-weight code reviews and/or broader technical reviews of project assets, with very favourable results. Even major Open Source projects typically have some level of mandatory review, and often super-review, before new code is allowed into the master branch. Almost every project that is serious about software quality has at least some form of code review process today.

> Or formal methods. Again, claims of remarkable bug prevention outcomes on very challenging projects, for long spans of time. Yet they haven't swept the industry.

Formal methods are too expensive for most projects with today's techniques. They have their place, and they can achieve excellent results in the right context. I'm bullish about the future of this field, not because I expect it to take over completely any time soon, but because I expect that some of its ideas will drift into the mainstream and become common practice as they become incorporated into our languages and tools, just as today strong, static type systems can eliminate entire classes of programmer error that are possible in more dynamic environments. However, for now the cost of heavyweight formal methods is so high that you really are into the territory where alternative engineering solutions involving completely redundant systems and the like can actually be more cost-effective.
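
(To illustrate the type-system point with a trivial Python sketch: annotate the nullable return and a checker such as mypy rejects the unsafe call before the program ever runs. The find_user example is invented.)

    from typing import Optional

    def find_user(user_id: int) -> Optional[str]:
        """Return a username, or None if no such user exists."""
        users = {1: "alice", 2: "bob"}
        return users.get(user_id)

    name = find_user(3)
    # print(name.upper())   # a checker like mypy rejects this line:
    #                       # 'name' may be None, so .upper() is unsafe
    if name is not None:
        print(name.upper())  # fine: None has been ruled out first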

> I was struck by your point that people only ever cite the one paper. So I began looking for reviews.

I've only read one of those (the Munir one) but I'm afraid you might be disappointed. For example, of the 41 (mostly primary) sources they considered, just 9 were in their high-rigour, high-relevance quadrant. Of those, they report that 7 did conclude that the external quality of the TDD-based development was significantly better (one of the 7 being the Nagappan paper).

However, when you look at the primary sources, you find that, as with Nagappan, what they were looking at often wasn't really TDD either. For example, one was actually about moving away from TDD at a class/method level and towards testing at a higher level with components, and it was the latter that gave the better results.
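
(Roughly the distinction that paper was drawing, if it helps, as a Python sketch; the cart/checkout names are invented, not taken from the paper.)

    from dataclasses import dataclass

    @dataclass
    class Item:
        name: str
        price: float

    class Cart:
        def __init__(self):
            self.items = []
        def add(self, item):
            self.items.append(item)
        def total(self):
            return sum(i.price for i in self.items)

    class Checkout:
        """The 'component': a public interface over the internals."""
        def process(self, lines):
            cart = Cart()
            for name, price in lines:
                cart.add(Item(name, price))
            return cart.total()

    # Class/method-level test: pins down one method of one class,
    # so refactoring the internals tends to break it.
    def test_cart_total_sums_item_prices():
        cart = Cart()
        cart.add(Item("book", 10.0))
        cart.add(Item("pen", 2.5))
        assert cart.total() == 12.5

    # Component-level test: drives the component only through its
    # public interface, leaving Cart/Item free to change underneath.
    def test_checkout_totals_the_order():
        assert Checkout().process([("book", 10.0), ("pen", 2.5)]) == 12.5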

I might also challenge the classification of some of those papers as rigorous and relevant. For example, one of the key metrics used in the Slyngstad case study is defects per SLOC, which is questionable in itself. The case study compared several releases of the same project, between which the number of SLOC varied widely (notably changing quite dramatically at the very release in which TDD was introduced) but in all cases was quite small by professional development standards (only a few thousand lines). And then the paper does some extremely dubious arithmetic to reach its headline statistic of TDD reducing the mean defect density by around 35%, glossing over things like a sharp rise in defect density in the release when TDD was introduced and the fact that the average for the test-last releases was completely dominated by a much worse score for the very first release.
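
(A toy illustration of that last problem, with entirely made-up numbers rather than the paper's data: one bad early release can dominate a mean defect density.)

    # Invented defect densities (defects per KSLOC) for four
    # test-last releases and three TDD releases.
    test_last = [30.0, 6.0, 5.0, 4.0]   # first release is terrible
    tdd       = [9.0, 7.0, 8.0]

    def mean(xs):
        return sum(xs) / len(xs)

    print(mean(test_last))       # 11.25
    print(mean(tdd))             # 8.0  -> headline "TDD improvement"
    print(mean(test_last[1:]))   # 5.0  -> drop the single outlier and
                                 #         test-last looks *better*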

In at least one case, the Siniaalto paper, the survey appears to have almost completely reversed the position of the original paper, perhaps as a result of scanning for key words and phrases a little too loosely and failing to notice that the paper was actually disputing some of those claims rather than supporting them.

Overall, it's still much the same story here: some of the generalisations presented in the summaries aren't necessarily supported by the primary data when you look at the details. There are lots of examples of the understandable but still real distortions that these kinds of surveys always seem to exhibit.

So while I appreciate the interesting discussion, I'm afraid we might still have to agree to disagree on this one. I'm not saying TDD doesn't or can't work for the right team in the right context, but the idea that it is innately superior to other development methods in general and the evidence typically cited to support such a claim just don't stand up to scrutiny.


Are you a researcher or a practitioner? The last half of your answer was much more interesting than the slogans at twenty paces we exchanged in the early part of the discussion.


I'd say I'm a practitioner, but one who has been around the block a few times and perhaps done more research than most along the way.

Once upon a time I did spend several years doing fairly serious investigations into ways to improve software development processes and what evidence was out there. The majority of that work wasn't primary research, but it was fascinating and sometimes enlightening to separate advocacy from evidence, and I suppose I've maintained the habit ever since.

I find some of the ideas popularised by the Agile movement particularly interesting. Often there is decent evidence of effectiveness to some degree or in some context, the kernel of a good idea, if you like. Unfortunately, there is also the whole dogmatic advocacy thing, where evangelists extrapolate beyond the evidence and the benefits get overstated.

Just to be clear, I'm not suggesting that you've been doing this in our discussions here. I'm happy that TDD seems to work for your organisation, I've no reason to doubt that you find it effective, and if you have any write-ups of what you've found does or doesn't work well then I'd be happy to read about it.

However, I've mentored more than one junior developer who really has told me point blank that we were doing software development wrong just because we weren't following the gospel according to Bob Martin, Joel Spolsky, or whoever it is this week. That gets old, so I tend to comment when discussions get into evidence-based debate, in the hope that it will point others towards information that took me a long time to find and reconcile.


I hear you. I often feel the same way.

Most of what coalesced into agile in the late 90s was already "in the air", just taken further and tied together.

For example, it's normal now to have CI/CD.

In 1996, McConnell listed the "daily build and smoke test" as a best practice, describing it as the "heartbeat" of a project: without it, you're dead. It didn't have a sexy name, and it was slow and fragile, but the concept was there.
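
(The modern descendant is a few lines of script. A minimal sketch in Python, where the make targets are placeholders for whatever a project actually uses:)

    import subprocess
    import sys

    # A minimal "daily build and smoke test" script: build the
    # product, then run a quick end-to-end sanity check.
    STEPS = [
        ["make", "build"],        # compile/package everything
        ["make", "smoke-test"],   # fast end-to-end sanity check
    ]

    def main():
        for step in STEPS:
            result = subprocess.run(step)
            if result.returncode != 0:
                print("heartbeat failed at: " + " ".join(step))
                sys.exit(1)   # the build is broken: fix it today
        print("build is alive")

    if __name__ == "__main__":
        main()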

Or take sprints and iterations: various spiral models existed before Scrum and XP became the talk of the town; it's just that hardly anybody tried them.

This was a good discussion. I continue to disagree with you about the parallels you drew and what I see as a line of argument that adoption of a practice is commensurate solely with its value and effectiveness.

However, I now appreciate why you were forceful. I fit a pattern you recognise.



