Why TDD isn't crap (hillelwayne.com)
159 points by mzl 77 days ago | 161 comments

Like the author, I subscribe to the less strict view that TDD isn't necessarily about writing the test first, but rather about having tests play some part in how the code takes shape.

Unlike the author, I absolutely believe that tests are about design. More specifically, they're about identifying coupling so that you can reduce it. The function being "awkward" to use is part of it, but code which is hard to test is almost always going to be hard to maintain.

If you do this for enough time, you should naturally start to write code that is less and less tightly coupled. At that point, the value of tests as a design canary decreases, but never completely goes away.

These are all broad generalizations. Individual vigilance and giving-a-fuck matter more than anything else. But if you show me some random code and it's spaghetti, I'll bet every single time that the author doesn't test.

I think TDD (or rather, unit tests):

1) ... over-emphasise the importance of reduced coupling, and can actually increase the chances of integration failures. Why? Because components are tested in isolation rather than together, but are still considered "tested" - especially when using stubs and mocks. I've seldom seen mocking used well; it almost always over-specifies implementations.

2) ... increase the tested "surface area" of the code, making it much more costly to modify in a way that changes the "surface area". When two bits that normally only ever face one another are tested on both sides, you now have 4 places to update when you move functionality between them instead of 2. And that's just for a small refactoring.

3) ... encourage a spurious modularity that increases the overall conceptual complexity of the solution. When every dependency needs to be replaced for testing, they become parameters, one way or another; and thus the code becomes over-generalized and over-abstracted, more removed from the work it's doing and more concerned with the bureaucracy of communication and coordination. It's this spurious modularity that's the cause of Enterprise Java FactoryFactories and friends.

That sounds quite negative on unit tests, and actually I'm not negative at all. I think code can be modified in more than one way to accommodate tests; poorly, by introducing single-implementation interfaces everywhere; or better, by reducing the number of types of interfaces, converting control flow into data flow, and generally adopting more functional and higher-order compositional architectural patterns.

It's hard to get more specific without talking about examples, though, so I'll stop here.

I worked at a place recently where all of these factors, but especially (3), imposed such a huge cost on an ostensibly simple service that it severely hampered any further development. Since it was a hard TDD shop, any discussion of cutting out all the intermediate layering was completely off the table. Internal structure was easily 75% of the size of the project.

After all, a lot of effort had gone into building all of the internal mocks, and while many of the early design decisions enshrined in the tests were quite dubious, to revisit those assumptions would have been backsliding. Even though there were no users to speak of yet, the existing tests had to pass.

And of course because of (1), the product didn't actually work because there were no integration tests at all. Because all of the unit tests passed, the ongoing assumption was that the platform was basically sound.

The most charitable thing I can say about TDD given that context is that it might be valuable, but isn't sufficient.

“Any technique, however worthy and desirable, becomes a disease, when the mind is obsessed with it.” —Bruce Lee

I've always thought that the bulk of the value of tests is for the unfortunate person who has to refactor or extend your code years down the track.

By that logic, if you write testable, but untested code, then you're still making it difficult for people to refactor later. This applies even if the code is well designed and uncoupled.

> the value of tests is for the unfortunate person who has to refactor or extend your code

This is probably one of the easiest things to quantify. I just finished working on a project that used BDD and TDD. The project was a rewrite of an older system that required manual testing. It took about 6 months to get a change into production. After the new system went live, they found a simple way to improve it. The code change was very small, but because all of the TDD and BDD efforts could be leveraged as regression tests, the change was deployed in a matter of weeks. That single change was able to capture an additional $5 million in value that would have been lost if things had gone through the previous 6-month cycle of testing the entire application manually.

You don't even have to think that far ahead. Consider some code you wrote in week 1 that was then modified by a colleague in week 2; now you return to the code in week 3. What does the code do? Any mental model you had at week 1 is now irrelevant as the code has changed in unknown ways. What is the current intent of the code? Do you feel safe modifying it, given that you no longer know how to manually test all the endpoints? You only have to miss one to have clients and managers breathing down your neck when the blame game is played over who broke the app in production.

> Any mental model you had at week 1 is now irrelevant as the code has changed in unknown ways.

Isn't that what good git commit messages are for though? (Or any other source code versioning tool.)

I'm not saying tests aren't useful, but oftentimes I find reading a file's diff history far more useful for understanding it than its tests.

IMO, tests should be there only to provide some level of safety when modifying files, since you can easily tell if you messed something up.

It'll definitely show you the changes, but how far back do you start from? And how many files do you do this for? In addition, it doesn't show you how the data flow changed; you still need to model that in your head.

Tests will also do this for you, but without the mental burden of brain-compiling. It's nice to be able to set some breakpoints in the code, then start a quick debugging session with the relevant test and watch the data flow through the code - you can usually understand any function within 5 minutes.

What do I do? Review the changes in git. Am I the only one that doesn't have much faith in unit tests preventing a regression even on a well covered project?

Helping with regressions is a big benefit (and not just for years down the road). But that's more about unit testing in general than TDD. Really, we only disagree about which is bigger. I think for the first few years of a developer's career, learning about solid (pun intended) design is hugely valuable.

Many times that unfortunate person is you just a month or two later.

How do you test your test if you don't write the test first?

If you write the code, and then you write the test, then you haven't tested your test. You have no proof that your test will detect broken code. The only way to prove that your test can detect broken code is to run it against broken code. So you can write the code, then write the test, and then break the code and run the test. Or you could just write the test first, and run it. It should fail. Surprisingly, sometimes it doesn't: either because the test was bad, or the existing code didn't work like you thought, or sometimes the language doesn't behave like you think.

I've dealt with hundreds of thousands of lines of tests where I could go into the code and just start deleting functionality en masse, and no tests failed. If you don't test your tests, they're just bloat. They're the kind of pointless bloat people are complaining about in these unit testing threads.

On the other hand, writing your test first with no code and having it fail isn't really telling you much about the test. It just shows that the test handles the least interesting case (the case where no code exists). You still don't know that your test is correct when you get it passing. You just know that it is correct when no code exists.

For example,

    void TestSort(ISorter sorter)
    {
        var input = new[] { 1, 2, 3, 4 };
        var output = sorter.Sort(input);
        Assert.IsSorted(output);
    }

This test will fail if you run it before ISorter is implemented. But then it will pass against an empty ISorter implementation that just returns its input, because the test still has problems (i.e. it starts with already-sorted input).

Cue the ever familiar: You still have to use your brain when using TDD.

If TDD needs brain usage to function, maybe the real thing that is useful is the brain usage and TDD is vestigial.

If someone gave me that test, I'd implement the code as "return input;"

Forget that you, a human, think you know what an ISorter should do. What do the tests demand that you do?

Doing TDD, the goal, when coding, is to write the minimal amount of broken code that makes the test pass. If you passed me {4,3,2,1}, then my code would be "return {1,2,3,4};". If you wrote a second test that passed me {6,5,4,3}, my code would be "return (input[0]==4)?{1,2,3,4}:{3,4,5,6};". You've got to write a test that makes me actually code what you want. I use this adversarial approach even when I'm writing both the test and the code.

Usually the first test I write, if I'm starting with a blank slate, is to pass null and check that I get a NullPointerException. That gets the class created. After that, it's got to be randomly generated data.

And that works fine ... as long as you know all of the cases that will cause the minimal amount of broken code to fail in a test case.

What happens if you're implementing something like a unification algorithm, but you don't know about the occurs check? You'll progressively add test cases that break your code, and eventually stop. Until an end user creates a query that contains itself, and your algorithm fails to terminate.
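
To make the occurs-check point concrete, here is a toy unifier (Python, illustrative only: variables are uppercase strings, compound terms are tuples, atoms are lowercase strings). Without the occurs check, unifying X with f(X) would record the cyclic binding X = f(X), the classic source of non-termination.

```python
# Toy first-order unifier. Substitutions are plain dicts.
def walk(t, subst):
    # Follow variable bindings until we hit an unbound variable or a term.
    while isinstance(t, str) and t[:1].isupper() and t in subst:
        t = subst[t]
    return t

def occurs(var, t, subst):
    # Does `var` appear inside `t`? This is the occurs check.
    t = walk(t, subst)
    if t == var:
        return True
    if isinstance(t, tuple):
        return any(occurs(var, arg, subst) for arg in t[1:])
    return False

def unify(a, b, subst=None):
    subst = dict(subst or {})
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if isinstance(a, str) and a[:1].isupper():
        if occurs(a, b, subst):
            return None          # refuse the cyclic binding X = f(X)
        subst[a] = b
        return subst
    if isinstance(b, str) and b[:1].isupper():
        return unify(b, a, subst)
    if isinstance(a, tuple) and isinstance(b, tuple) \
            and len(a) == len(b) and a[0] == b[0]:
        for x, y in zip(a[1:], b[1:]):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None

print(unify("X", ("f", "a")))   # {'X': ('f', 'a')}
print(unify("X", ("f", "X")))   # None: the occurs check rejects it
```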

The solution to get TDD to work is to know what all of the weird edge cases of your algorithm are. However, I assert that the thing that makes TDD ever work is that the practitioners who are "doing it right" already know all of the weird edge cases of their algorithms. And I assert that if you know the weird edge cases you can drop the test first part of TDD and still get working algorithms.

Random data is interesting, because it's not something I've seen anyone mention when they talk about how they do TDD before now. I suppose if you knew nothing about your problem and had some sort of tool progressively feed you intelligently generated random data, that might work for algorithms that have few edge cases or relatively simple edge cases. However, it seems like you would want something more like QuickCheck for cases where you have really esoteric edge cases. And I don't see how you avoid implementing bubble sort (or its equivalent in your domain).

I don't think TDD can't work, but so far nobody has described TDD to me where I have any reason to assume that TDD is actually doing anything worthwhile. It's always sounded like TDD was irrelevant and the real key to success was thinking about the problem carefully. Additionally, it sounds like there are problem domains where anything that TDD might bring to the table would be nullified (ie certain problems that exhibit a certain level of complexity).

I can only suggest that you commit to it for some considerable time, such as six months. I, too, thought it was, at first, a fucking stupid idea, and then, when people I respected continued to advocate it, grudgingly tried it, and thought it was a pain in the ass - this was in 2006. However, the number of times where I've written some code that I thought worked, and then commented it out and rewrote it using TDD, and found bugs, is too high. It's rare, but it's too high. So either I'm not that great a programmer, or TDD is useful. Another way to think of TDD is rubber ducking, perhaps.

So you're in a local maximum, and to get to the next, higher, maximum, you're going to have to go through a trough.

I'm not discounting or discrediting your experience. If it worked for you, then great. Unfortunately, my personal experience involves a few decades of "just try it my way and you'll see" ending in disappointment at best and catastrophe at worst. Maybe it's just the way I'm wired.

That being said, if anyone wants to fund me for six months (plus either some compensation to my employer for having me gone for six months OR a significant bonus for my own financial security) to try TDD, then I'll gladly do it.

That being said, I plan on putting it through one heck of a trial. At the end I'll either have a pretty good idea of exactly why TDD works OR I'll have a compelling argument for why its gains are illusory.

When I write a test after altering/adding code, I will generally stash the altered application code, and run the test I wrote on unaltered app code to confirm that it fails.

As long as you maintain decent version control discipline, you can be pretty confident that the test is valid with this technique. (It does get a bit more difficult when there are schema changes, but then it also encourages good discipline and vigilance when you're making schema changes ;)
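
In shell terms, the workflow is roughly this (a sketch assuming git; it uses a throwaway repo, and all file names are made up). The new test file stays untracked, so `git stash` hides only the fix:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "you"

echo "old app code" > app.code
git add app.code
git commit -qm "before the fix"

# Make the fix (the new test would live in a separate, untracked file).
echo "fixed app code" > app.code

git stash -q        # hide the fix; now run the new test: it should FAIL
grep -q "old app code" app.code && echo "app code reverted; test should fail now"

git stash pop -q    # restore the fix; the test should pass again
grep -q "fixed app code" app.code && echo "fix restored"
```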

That's good. I suggest that you're doing unnecessary work (having to stash and unstash) when you had "broken" code at first, but at least you are testing your tests, so well done.

In my experience coaching teams on TDD, it's quite easy, when writing code first, to add more code than is needed to solve the problem. And then only the problem is tested for, meaning that there is code that is not tested - maybe an if statement where a branch is never taken. Strict TDD means you don't write any code without a failing test, and you write the minimal code needed to make it pass.

Now quite often, a programmer has no idea how to tackle a problem, so I advise that they just write something that works. Just have a go and get something working. Most programmers (if they test at all) will then just go and write a test for it, usually for happy path. And that usually results in code with bugs. I know this because many times the programmer has done as I suggested and commented out their code, and reimplemented from scratch using TDD, and doing TDD finds a bug in the original that nobody saw.

Oh absolutely. I've done my fair share of "pure" TDD, but my style has evolved over the years to not always favor a pure approach (usually in the interest of time).

It really comes down to the nature of the problem - if I foresee a chance that the final architecture is not immediately obvious, I'll start breaking down the problem into small pieces and TDD the units, then the integrations.

I guess I'm of the opinion that not everything needs TDD, but everything does need diligence - my code after doing years of TDD has improved drastically, whether or not it was strictly test driven. Merely considering whether it will be easy to reliably test the code improves it, IME.

You could test your tests using mutation testing [0]. It's not bulletproof, but quite close.

[0] https://en.wikipedia.org/wiki/Mutation_testing
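
A toy illustration of the idea in Python (real tools like PIT or mutmut automate the mutation step; everything here is hand-rolled and hypothetical for clarity):

```python
# Mutation testing in miniature: mutate an operator in the source and
# check that the test suite notices ("kills" the mutant).
src = "def clamp(x, lo, hi):\n    return max(lo, min(x, hi))\n"

def suite(ns):
    """Return True if all assertions pass against the candidate code."""
    clamp = ns["clamp"]
    try:
        assert clamp(5, 0, 10) == 5
        assert clamp(-1, 0, 10) == 0
        assert clamp(99, 0, 10) == 10
        return True
    except AssertionError:
        return False

ns = {}
exec(src, ns)
assert suite(ns)                       # the original code passes

mutant = src.replace("min", "max", 1)  # introduce a deliberate bug
ns = {}
exec(mutant, ns)
killed = not suite(ns)                 # a good suite fails on the mutant
print("mutant killed:", killed)        # mutant killed: True
```

If a mutant survives, you have code whose behavior no test pins down.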

For years I have written too many tests. Recently I have been trying something more like this:

- One test to prove I can cleanly load the component or object in a module. This is about loose coupling and ensuring that each module contains only one object.

- One test to prove the component or object does what it says it does. This is primarily about the Single Responsibility Principle. If I need six tests to prove an object works, then the object might be doing too much and might need to be refactored.

Other tests can be added if bugs appear, but just two tests per object helps keep things light. Of course, if I have a module full of utility functions, then that is a different matter. But I think limiting the amount of code in my tests, and the overall number of tests, is valuable.
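
As a sketch, that two-test budget might look like this in Python (RateLimiter is a hypothetical component standing in for "the object"):

```python
import unittest

# Hypothetical component under test: one object per module.
class RateLimiter:
    """Allows at most `limit` calls."""
    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def allow(self):
        self.count += 1
        return self.count <= self.limit

class TestRateLimiter(unittest.TestCase):
    def test_loads_cleanly(self):
        # Test 1: the object can be constructed in isolation (loose coupling).
        self.assertIsNotNone(RateLimiter(limit=2))

    def test_does_what_it_says(self):
        # Test 2: one test for the object's single responsibility.
        limiter = RateLimiter(limit=2)
        self.assertTrue(limiter.allow())
        self.assertTrue(limiter.allow())
        self.assertFalse(limiter.allow())
```

If that second test won't fit in one method, that's the refactoring signal described above.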

"having test play some part in how the code takes shape"

That is exactly the point why I don't like TDD.

I mean, there are enough constraints that shape the code already; why should something artificial like tests shape it too?

With mocking and everything you end up writing code for tests and not code for your problems.

If you use mocks extensively in your tests then you probably aren't getting much of the benefit of TDD. Mocks indicate high coupling, so using them is just powering through testing your bad design rather than driving good design via testing.

How do you handle dependencies of the class under test then?

This is the biggest challenge. What I find works is separating your orchestrations from your data processing.

For example, instead of creating one class that does the network call, and processes the JSON response, break these out.

One class does the network call (keep it extremely thin). Dependency inject stuff in if you want. But view this purely as an orchestration.

    if (connectionValid)
        json = makeNetworkCall()
        anotherObject.processJson(json)   // <- test separately
    end

Then, in the other plain old data-processing class, do your heavy off-by-one edge-case testing.

In the first case, the orchestration, I can write tests like "Only make the network call if the connection is valid". But that's all that test would do.

I am at the point now where I actually don't even test that. I assume that is going to work, because it is so simple. That and I generally hate mocking. Way too much coupling. Way too hard to refactor.

Where I do test heavily is on the plain old object side, which has no outside ports or connectors. These are just plain old objects.

I agree with some other comments I have read here. Mocking is a bitch and I generally don't like it. But it's there if I need it, so I do use it occasionally.

But it has so many downsides I try not to. Instead I separate the orchestrations from the processing. Unit test the processing (in an integration test sort of way), and try to stick to testing the public APIs of my class, so I am free to change the internals without things breaking.
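
A minimal Python sketch of that separation (all names hypothetical): the processing is a pure function that's easy to test exhaustively, and the orchestration is a thin wrapper that barely needs a test.

```python
import json

# Pure data processing: the off-by-one, edge-case testing effort goes here.
def summarize_orders(payload):
    orders = json.loads(payload)
    return {"count": len(orders), "total": sum(o["amount"] for o in orders)}

# Thin orchestration: just wires the pieces together.
def fetch_summary(fetch, connection_valid):
    if not connection_valid:
        return None
    return summarize_orders(fetch())

# Testing the pure part needs no mocking library; a fake fetch is a lambda.
fake_fetch = lambda: '[{"amount": 3}, {"amount": 4}]'
print(fetch_summary(fake_fetch, True))   # {'count': 2, 'total': 7}
```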

My 2c. Hope that helps.

Would love an answer to this question, as well. I've only come up with DI as a long-term mostly win for this problem and it's something that I grapple with in every code base at work, since we have some microservices and other HTTP API layers that we need to interface with and I want to write testable code.

Well, you write code for design flexibility. Testing just forces the issue and helps you explore possible design needs. For example, what if the database query you're using isn't suitable any more? Instead of designing around a User table, you start passing around User records. Then your design is more future proof in case you start getting Users from a service or in-memory cache instead of a database.
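
For example, a sketch of the record-passing style in Python (names are illustrative):

```python
from dataclasses import dataclass

# Hypothetical record type: callers depend on its shape, not on where it
# came from (database row, service response, or in-memory cache).
@dataclass
class User:
    id: int
    email: str

def welcome_subject(user):
    return f"Welcome, {user.email}!"

# Today the record is built by hand in a test; tomorrow it can come from
# any source without this function changing.
print(welcome_subject(User(id=1, email="a@example.com")))   # Welcome, a@example.com!
```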

> ...not code for your problems...

Right. It's not your problem now. It may be your (or your successors') problem in the future.

Is it always worth the effort? Nope. But experience, communication, and good teamwork will help with balancing short-term and long-term goals.

This is exactly the problem. In general you should not be writing code for design flexibility. Your code should contain the minimum number of abstractions needed to satisfy its requirements. If flexibility is a requirement right now, then that's fine. Otherwise refactor in the flexibility later. But don't make the code flexible for the sake of tests; it vastly over-complicates it for no benefit. Instead, write tests as close to the requirements level as possible (ideally most of your tests should sit just below the UI).

> Otherwise refactor in the flexibility later.

I'm saying, with experience, teamwork, and communication (including through tests), you'll know when this proposition makes sense or not. It's not universally true that "we'll worry about it later" makes sense.

I don't consider taking steps to make sure the code actually works as you expect it to "artificial". That feels pretty essential, to me.

If you're writing lots of mocks, odds are you're writing a lot of unit tests. In my experience these are rarely useful. Most often they are either really testing a compiler's output (indirectly), supplying an ad-hoc and incomplete type checking system to a dynamic language, or more appropriately covered by asserts that will scream when integration tests fail.

Because tests tend to make your code do what you originally intended, and keep your attention on the quality of the result.

Tests are not artificial constraints. They are the guarantees which (should) matter most to your users and stakeholders.

I am quite convinced stakeholders couldn't really care less about the guarantees being made by unit tests. Higher level integration tests, maybe.

I'm not hearing anything negative here. Why is any of this innately bad?

Tests shaping your code is bad, because? Because you already have "enough constraints"? Sorry, I don't get it.

> code which is hard to test is almost always going to be hard to maintain

This is a pointless equivalence to make. You're trying to suggest that code which is hard to test could be written differently, and better. This is not always true, and therefore useless to say.

STUXNET was very difficult to test. Reports are that it required an entire Israeli nuclear facility as a test environment. Would you suggest that STUXNET was written to a substandard level of professionalism? I doubt it.

You're missing the word "almost". It's important. Attacking parent comment with an absurd statement no one would agree with isn't a great argument.

He literally admits it's not always true in that very sentence.

Maybe it's only difficult to test end-to-end? I would assume there's code in there that's algorithm-like. Give it these inputs, it should return these outputs. If so, it should be possible to isolate that code and test it.

But, I dunno, I haven't looked at the source code. It might be very difficult to maintain.

I expect the reason TDD is so controversial on here is people can't see the long term benefits of tests, and instead only think in the short term. But in the commercial world, code you write can potentially have a lifespan of 30+ years. In this case, making a choice to write tests is the difference between writing a maintainable component in the future vs writing a soul-destroying 'legacy system'.

If you agree tests are a good idea, but think TDD is too extreme, consider that TDD simply makes sure you write testable code from the outset. When you have a test wrapping a method, and need to add a dependency, you actually decide to use DI (Dependency Injection) because otherwise your current tests will break / become integration tests. TDD makes you think upfront about things like mocking, separation of concerns, etc.

When you have the mindset that you will absolutely 100% write tests at some point anyway, TDD is actually a faster and more fun way to develop than bolting on tests afterwards.
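
A minimal sketch of the DI style being described, in Python (hypothetical names; no mocking framework needed, because the dependency is a constructor parameter):

```python
# The fake implements the same informal interface as a real SMTP mailer.
class FakeMailer:
    def __init__(self):
        self.sent = []

    def send(self, to, body):
        self.sent.append((to, body))

class SignupService:
    def __init__(self, mailer):      # injected, not constructed internally
        self.mailer = mailer

    def register(self, email):
        # ... create the account ...
        self.mailer.send(email, "Welcome!")

# The test wires in the fake; production wires in the real mailer.
fake = FakeMailer()
SignupService(fake).register("a@example.com")
print(fake.sent)   # [('a@example.com', 'Welcome!')]
```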

This whole thread is a response to: https://news.ycombinator.com/item?id=15591190

Your points are directly addressed in the pdf. One of his general points being, in practice, the tests become the legacy system instead. And I'd add to that. Given that you've at minimum doubled the code (and doubled the bugs), it seems like a really bad long-term trade off.

Also DI does not reduce coupling. I've seen plenty of code with DI that's just injecting like 30 things, which is obviously therefore coupled. It just makes it really obvious, but DI itself has massive downsides.

If you've ever worked with bad programmers and seen it in the wild, I'm sure we can agree DI and TDD don't stop bad programmers writing bad code. In fact, all it seems to do is make even more of a mess.

Not only do you have to pick apart the bad code, you have to start dealing with carefully moving methods to the right places, because DI can make it hard to figure out what's being used where. And then on top of that, tests break all over the place because they're entirely dependent on the implementation instead of the functionality.

What exactly is your experience working in a commercial environment? When the updates you ship can affect tens of thousands of customers? In these situations, 'Sod's Law' often comes to mind - "what can happen, will happen". If you have a defect in your untested code, you can absolutely bet it will come out at the most painful time possible, in front of all your clients. Being blamed for that kind of stuff is a stressful way to live your life; much nicer to have tests shout at you instead.

* and to reply to your edit: of course tests break when you change the code they are testing! But after reviewing the broken tests, you see the intent and re-adjust the test. But what often happens, is you realise you didn't fully understand the code previously, and actually after reading the test you need to undo your refactoring as it didn't make sense in the first place.

I think that everyone should have a 5 year stint with zero tests. I genuinely believe it offered me an insight into the deepest parts of depression.

In saying that, it was raw and I think I enjoyed my job more. There is no safety harness. You write decent working code and sign your name by it, or you run home to your mother's nipple.

I definitely learned a lot about programming during that time. To take a terrible system and try to research ways to make it better - for your own mental health, not the company's - is a personal drive and a golden age of discovery which plays a part in my programming to this day.

I get none of this from TDD. It is intensely dull and unsatisfying, but gets the job done.

Had over a million people use my code last year, when I checked for vanity. 12 years commercial experience.

I'm effectively the tech lead for two startups, one getting about 750k users per year, the other 250k. Freelancer, but I do a lot of time for 2 clients at the moment.

Both were projects started by other people, but a significant chunk of the code is mine now (one of them I've virtually totally rewritten from VB.Net to C#) and the other I've done huge amounts of refactoring to fix big performance problems, reducing the "main" pages from 10 sec load times for complicated orders to 250ms.

I've also refactored a lot of javascript for both of them without any tests, significantly improving client-side compile times and page load times.

I played with unit tests a few years ago; one of my clients has some, but we mainly don't add any new ones and they never catch anything[1].

[1]That's a slight fib, one caught a bug last month for the first time in the 1.5 years I've worked for this client. It would almost certainly have been caught in testing though.

Pick the right tool for the job. If your experience is with things that last 5+ years, then OK. I work on a system which is 5 years old, and nothing from 5 years ago is still relevant. Actually I've only been working on it for 3 years, but we rebuild the whole thing pretty much every year, with minor things going in and out. Of course we spent a lot of money on automated tests, but those tests had to be thrown away because of the amount of change in the system.

So you disagree with TDD, but you need to rebuild your system from scratch every 5 years? I feel like this is an argument for TDD.

As for throwing away tests; Unit tests are meant to be pretty simple - rule of thumb is you can run a thousand tests in ten seconds. Arrange, Act, Assert - they don't need to be complex, they just need to imprint the intent into the codebase. If the intent changes, by all means remove the test.
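
For illustration, an Arrange, Act, Assert unit test might look like this in Python (a made-up discount example, not from the thread):

```python
import unittest

# Hypothetical function under test.
def apply_discount(price, rate):
    return price * (1 - rate)

class TestDiscount(unittest.TestCase):
    def test_ten_percent_off(self):
        # Arrange
        price, rate = 200, 0.10
        # Act
        discounted = apply_discount(price, rate)
        # Assert
        self.assertAlmostEqual(discounted, 180)
```

Three lines of setup, one action, one check: cheap to write, cheap to delete when the intent changes.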

> So you disagree with TDD, but you need to rebuild your system from scratch every 5 years? I feel like this is an argument for TDD.

Not necessarily. If the rewrite is only addressing issues that would have been prevented with tests, then sure, this is clearly an argument for TDD. However, if the rewrite is going beyond what could have been provided by tests, then having a large testing system could actually make the rewrites harder, which means it's an argument against TDD.

It's hard to say which is the case without knowing the details, and I'd certainly err on the side of saying a yearly rewrite is not a good sign. But if a company is undergoing rapid growth, it's not that unusual for certain systems to be rewritten frequently as fundamental new insights are gathered about how to tackle problems that are hard to scale. And if the system isn't that large to begin with, periodic rewrites could be easier than writing a single version that's supposed to last 10+ years as business requirements dramatically change and expand.

I agree. My experience is:

1. Wow it made 10k! Lets rewrite

2. Wow it made 100k! Lets rewrite

3. Wow it made 1M! Lets rewrite but let's properly think hard about the future of this product (Introduce tests and more people).

ALWAYS DO TESTING should always come with the context "if budget allows".

We rebuild because business needs change, because of compliance and stuff. It's not like we rewrite because the code is crap. We have like 30 percent test coverage, because that's the part that changed only a bit.

> Also DI does not reduce coupling. I've seen plenty of code with DI that's just injecting like 30 things, which is obviously therefore coupled. It just makes it really obvious, but DI itself has massive downsides.

This matches much of my experience with DI on real projects. Dependency injection is, unfortunately, used to reduce the labor required to achieve massive coupling. Of course the intended and potentially useful application of DI is being able to rewire an application with different components for different purposes or different environments, and the tradeoff for this flexibility is that you lose explicitness. You can no longer see the explicit wiring in code, which is a big downside. Yet I've worked on several DI-heavy projects with seasoned engineers who get this completely backwards. They see DI as a labor-saving device that allows them to write heavily coupled systems without having to explicitly work out the dependencies between things. In fact, they even see DI as eliminating the cost of complex interdependencies because it reduces the work and cleverness required to create them.

Partly I think this reflects a desire to work on "real" "enterprise" systems. Instead of fighting to keep complexity below the level where DI actually helps, they embrace the flattering thought of, hey, we're doing big boy work here — it's going to get complex, so we'd better use DI from the outset. People who take a lot of pride in working on big systems can't help creating them.

When TDD first started coming out years ago, in an email thread I asked one of the proponents how they handle the fact that tests can easily increase the complexity of a system by adding more code. I was expecting a rational discussion of how to balance tests making your code simpler against tests being a source of added complexity. Instead it was met with anger over the fact that I would "count" tests as part of the codebase, and how stupid I was for even considering such a thing.

I still stand by my point. 100 lines of code with 1000 lines of testing code means your codebase is 10x what it would be without the tests. That extra 10x had better result in a hell of a lot of benefit.

Additionally, over the years I've found that easily-tested code sacrifices readability, and that tiny functions aren't always the best way to go. I've gone back to liking larger, more monolithic functions, as often I don't want to create a bunch of generic functions for hypothetical use in other projects; I just want some code that fits my needs to a T.

> I asked one of the proponents how they handle the fact that tests can easily increase the complexity of a system by adding more code.

Adding more code != more complexity. Complexity comes from highly coupled code that doesn't separate concerns, which makes it difficult to untangle, aka spaghetti. Unit tests are meant to be simple, and anything they test will be no more complicated than how the system will actually be used. In this way they clarify the code by encoding its intent, available for all to see for years to come.

> I've gone back to liking larger, more monolithic functions as often I don't want to create a bunch of generic functions for hypothetical use for other projects, I just want some code that fits my needs to a T.

This isn't what TDD causes; it just makes you break things down into small testable units. It doesn't make you create super-generic wrappers, which I also agree are useless because of their unreadability.

Again, this is the argument made to me over 10 years ago, and again, they are wrong. While it is possible to add more code without adding more complexity, it is very difficult. Generally speaking, adding code adds complexity. Full stop. Excellently executed TDD will reduce overall complexity, but I almost never see that level of execution. More typically, TDD's bug-catching is just enough to reduce bugs (and deliver its other benefits) on net. But the costs are high.

Tests make the intent of the code clear; how else can you get that, exactly? And the next time you have to fix a broken unit test, think of how much time it has saved you by constantly running autonomously as part of your build process.

> Given that you've at minimum doubled the code (and doubled the bugs), it seems like a really bad long-term trade off.

I'd say you've at maximum doubled the code. The tests ensure you write only what you need to make them pass. Without them, devs get distracted and wander until the feature works, usually committing tons of YAGNI violations along the way. In my experience, untested code bases have a ton of unnecessary code.

I don't understand how it would double the bugs. The article has references saying it reduces them. But, even thinking about it, I don't see why you'd say that.

That article is garbage. For example, the claim that an object has trillions of states ignores our ability to classify ranges of values. Sure, an int may contain "four billion states", but for the requirements (which he makes such a big deal about) it's highly likely that we can classify the integer into three states: less than zero, zero, greater than zero. As a bank, I might not care how much money you have, only that you have more than zero. In a transaction, I don't care how much money changes hands, as long as no money is lost. The article is a tantrum.

> we can classify the integer into three states: less than zero, zero, greater than zero

Err, not quite. We're running on computers, remember. A comprehensive unit test that doesn't go through every possible integer input should still include:

* maximum and minimum integer values for that size

* maximum and minimum expected inputs

* 0, 1, -1

So, there's 7 test cases for a single integral input. And since you bring up banking, let's imagine you're working with a function with two inputs. Since most bugs come about because of the interaction between two variables [0], you'd want to check out each combination. So, 7 possible inputs for the first integer, 7 for the second: 49 test cases.
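Generating those pairwise boundary cases mechanically is straightforward. A sketch, where `add_to_balance` and its business limits are hypothetical stand-ins for the banking example:

```python
import itertools

# Boundary values for a 32-bit signed integer input.
INT_MIN, INT_MAX = -2**31, 2**31 - 1
EXPECTED_MIN, EXPECTED_MAX = -10**6, 10**6  # assumed business limits
BOUNDARIES = [INT_MIN, INT_MAX, EXPECTED_MIN, EXPECTED_MAX, 0, 1, -1]

def add_to_balance(balance, amount):
    """Hypothetical function under test: addition with an explicit range check."""
    result = balance + amount
    if not (INT_MIN <= result <= INT_MAX):
        raise OverflowError("balance out of 32-bit range")
    return result

# 7 boundary values per input, two inputs: 7 x 7 = 49 combinations.
cases = list(itertools.product(BOUNDARIES, BOUNDARIES))
assert len(cases) == 49

for balance, amount in cases:
    try:
        assert add_to_balance(balance, amount) == balance + amount
    except OverflowError:
        # Overflow must occur only when the true sum leaves the 32-bit range.
        assert not (INT_MIN <= balance + amount <= INT_MAX)
```

With the combinations generated rather than hand-written, the 49 cases cost about as much to maintain as one.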

Why bother? You're working for a bank... imagine facing a client whose balance shot negative because of an overflow error and started accruing "overdraft coverage" fees.

Back to the original point: when was the last time anybody but the AFL tool wrote the "proper" 49+ unit tests for an "add_to_balance(int, int)" style function, when one test would give you 100% coverage for that function?

[0] https://csrc.nist.gov/Projects/Automated-Combinatorial-Testi...

If you mean "nobody would write 49 tests for a function when they could get away with writing one test and have it show up as 100% code coverage because people are lazy bastards" then I agree, and that's the problem: cargo culting the metrics. Look we have 100% code coverage! We are awesome!

If you mean, "no, look, seriously, you'd need 49 tests for every method because you're a bank", well then, no, you actually need fewer than 49 (since some are subsets of the others), and in reality even this would get very tiring, so you'd write a currency class that can't overflow. And yes, being a bank, we would test every case and certainly use automated test generation while we're at it.
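A minimal sketch of such a currency class (the name `Balance` and the limits are hypothetical; real banking code would use fixed-point decimals and an audited library):

```python
class Balance:
    """Hypothetical currency amount that refuses to leave a safe range."""
    MIN, MAX = -2**63, 2**63 - 1  # assumed storage limits

    def __init__(self, cents):
        if not (self.MIN <= cents <= self.MAX):
            raise OverflowError("amount out of range")
        self.cents = cents

    def __add__(self, other):
        # The constructor re-checks the range, so an overflowed
        # value can never be constructed in the first place.
        return Balance(self.cents + other.cents)

    def __eq__(self, other):
        return self.cents == other.cents

a = Balance(Balance.MAX)
try:
    a + Balance(1)
except OverflowError:
    print("overflow rejected")
```

Once the range check lives in one class, the 49-case boundary suite only needs writing once, for `Balance`, instead of once per method that handles money.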

But if you mean that in general, for non-banks, the classes of values are too numerous to manually write unit tests for, then we disagree. That means that your method is too complex.

> some are subsets of the others

In a purely mathematical sense, yes, this is true. But with computers those subsets matter. For example, I wouldn't drop the "max expected" for the "maxint" test; an overflow exception or implicit type change in the case of the latter shouldn't be tolerated for the former.

> That means that your method is too complex.

Would you consider the Python `requests` library's `get` method to be too complex? It has 10 parameters to it: 4 strings, 4 dictionaries, 1 file-like object, and 1 tuple/AuthHandler option.

If you conservatively say there are 5 test cases which should be tested for each parameter (a very low count, especially with strings and dicts), there's approximately 1,125 test cases that should be explored.

Not quite a trillion, but it's still a lot.

Without getting into the pros and cons of python (which will likely get me downvoted to oblivion), 'get' is an example of what I would call a coordination method.

I coach my teams to think of two kinds of methods: methods that do data processing, and methods that coordinate. To test a coordination method, you don't need to care about the values of the things it has to deal with. Maybe you care that they are not null, if your language has nulls. But other than that, what you are testing is that if your method A is supposed to pass arguments 1 thru 5 to method B, and 4 thru 10 to method C, then that's what it does. You don't need to test that argument 1 is a positive integer; the test for method B will do that, if it needs to. And finally, if we expect the result of method B to be passed to method C, the test for our method A would verify that using a mock B. Again, what mock B returns could be any object: we only care that it gets passed to C.

So it looks like it's 1,125, but it's actually three.

If we have classes that are SOLID, then we tend to see the processing methods and the coordination methods end up in different classes. sessions.py isn't a great example of SOLID, and I would have a hard time writing tests for it. It is certainly not the kind of code you get if it was written with TDD.
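The coordination-method test described above might look like this (the A/B/C names and the argument slices mirror the hypothetical example; `unittest.mock` is Python standard library):

```python
from unittest.mock import Mock

def coordinate(b, c, args):
    """Hypothetical coordination method: pass args 1-5 to B,
    then feed B's result plus args 4-10 to C."""
    result = b.process(*args[0:5])
    return c.process(result, *args[3:10])

def test_coordinate_wires_b_into_c():
    b, c = Mock(), Mock()
    coordinate(b, c, list(range(10)))
    # We don't care what the values are, only that they flow correctly.
    b.process.assert_called_once_with(0, 1, 2, 3, 4)
    c.process.assert_called_once_with(b.process.return_value, 3, 4, 5, 6, 7, 8, 9)

test_coordinate_wires_b_into_c()
```

Note the test never inspects any value: `b.process.return_value` is itself an opaque mock, and the assertion only checks it was handed to C.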

I forgot who said it but it goes something like, "any code without unit tests is legacy code."

Which by definition means your tests are legacy code. It's turtles all the way down. For what it's worth I do think Unit Tests provide value.

Nah, the code tests the tests just as the tests test the code. If you expect your tests to pass and they don't, something's wrong. (Either with the code, or with the tests.) Same goes for when you expect the tests to fail and they don't.

The code is definitely NOT the test of the tests. It may seem that way because (sometimes) when the code breaks (e.g. fails to run at all), the tests also break.

But consider this: what if your tests are poorly written and fail to detect bugs? What if they fail because the test code is buggy? Failing to detect bugs is a bug (undesirable behavior/output) of test code. There are some techniques to address this, for example mutation testing ( https://en.wikipedia.org/wiki/Mutation_testing ), which effectively become some part of "the tests of the tests".
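A hand-rolled illustration of the mutation-testing idea (real tools such as PIT for Java automate the mutating; everything here is a toy):

```python
def is_positive(x):
    return x > 0

def mutant_is_positive(x):   # mutation: '>' flipped to '>='
    return x >= 0

def weak_test(fn):
    # Probes only one value, so it cannot tell the mutant apart.
    return fn(5) is True

def strong_test(fn):
    # Also probes the boundary, so it kills the mutant.
    return fn(5) is True and fn(0) is False

# The weak test "survives" the mutation: it is not testing enough.
assert weak_test(is_positive) and weak_test(mutant_is_positive)

# The strong test kills the mutant: a higher mutation score.
assert strong_test(is_positive) and not strong_test(mutant_is_positive)
```

A surviving mutant is exactly the "bug the tests failed to detect" described above, made visible.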

The take away: no, the code is not enough. You need to test your tests.

> what if your tests are poorly written

This seems like tautological reasoning. Naturally, if you write bad tests then they won't be effective, but that seems like a poor argument against testing. That's like saying "Yes, bypass surgery could save your life, but consider this: what if the surgeon is reckless and kills you on the operating table?". Obviously, there is a minimum expectation that the surgeon will follow proper medical procedures when operating.

If it seems like a poor argument against testing, that's because it's not meant to be one! It's meant to be an argument against the notion that "production code tests the tests". It's also an argument for making sure your tests are effective.

> Naturally, if you write bad tests then they won't be effective

But how can you tell if your tests are effective, i.e. if you wrote "good tests"? Production code alone is not enough; you cannot tell if a test is green because everything is ok or because it's buggy/incomplete. You must introduce some degree of quality control in your test code.

I missed this before:

> what if the surgeon is reckless and kills you on the operating table?

Excellent question! What's this surgeon's track record? How many patients have died on her operating table? How up to date with current medical research is she? Is she daring and reckless, or does she play it conservatively safe?

Note that mutation testing is mostly just an advanced (and more brittle) form of code coverage. The flip side is that code coverage reports can also be considered tests of your tests. Not exhaustive, but I'd argue that unit tests hardly ever are as well - and that's OK.

I disagree.

Coverage tests how much of your code is exercised by your tests. Mutation testing is (one way) of testing how effective your tests are -- i.e. how sensitive they are to bugs in your code. It follows that good tests must have both good coverage and good effectiveness, but the two are not the same.

The former seems to me like a (less comprehensive but also less brittle) way of doing the latter. Higher coverage usually means that your tests are more sensitive to bugs in your code - i.e. if a bug occurs in uncovered code, your tests won't catch it.

Of course they're not literally the same, but both are ways of measuring how good your tests are at catching bugs in your code.

It might sound pedantic, but this way of thinking shows you that it's not turtles all the way down: if they'd really be "tests of the tests", then you might also want "tests of the tests of the tests". And since we obviously don't want to keep doing that, you might conclude that we don't need "tests of our tests" as well.

But since they're simply measurements of test quality, we can see that both test coverage and mutation testing can be useful and not a never-ending story.

I disagree. Coverage and mutation measure completely different things, one is not "a way of doing" the other:

- Coverage measures how much of your code is exercised by the tests. It does NOT measure test effectiveness (it's not like mutation testing, only "less/more brittle"). This can be trivially shown by writing tests that exercise all of your code but have no asserts (or trivial asserts such as 1 == 1). Unsurprisingly, this kind of obviously ineffective test code is often found in the wild (mostly written by junior devs), but less obvious cases of ineffective tests are also common, such as failing to test border conditions. This happens, and it's more common than we'd like.

- Mutation testing is a way (but not the only one!) to measure how effective your tests are at actually finding bugs. That this technique exists shows that there is indeed a need to "test the tests", i.e. measure test quality & effectiveness. Production code is NOT "the test of the tests", as someone up this thread erroneously said.

This is like measuring altitude and airspeed: yes, they both measure something useful about your airplane, but they measure different things!

I confess I did not understand the rest of your post.
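The assert-free failure mode mentioned above is easy to demonstrate (toy code; any coverage tool would report 100% line coverage for `test_no_asserts`):

```python
def median(xs):
    xs = sorted(xs)
    return xs[len(xs) // 2]   # BUG: wrong for even-length lists

def test_no_asserts():
    # Executes every line of median(), so coverage reports 100%...
    median([3, 1, 2])
    median([4, 3, 2, 1])
    # ...but asserts nothing, so the even-length bug goes undetected.

test_no_asserts()                 # "passes"
assert median([3, 1, 2]) == 2     # odd length: happens to be right
assert median([4, 3, 2, 1]) == 3  # even length: 3 instead of 2.5, the bug
```

Mutation testing would flag this immediately: mutate `median` and `test_no_asserts` still passes, so every mutant survives.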

I didn't mean coverage measurement is a way of doing mutation testing, I meant that measuring how much of your code is exercised by your tests is a way of measuring test effectiveness.

Yes, it can be inaccurate, as your trivial example shows, but it does guide you towards unhandled test cases (i.e. where your tests are ineffective). Likewise, mutation testing guides you towards unhandled test cases (including ones that coverage can't detect).

> I meant that measuring how much of your code is exercised by your tests is a way of measuring test effectiveness.

The problem is that coverage alone is a very poor way of measuring effectiveness, which is why other techniques -- including, but not limited to, mutation testing -- are needed. This is nothing new: limitations of coverage are well known in software engineering.

The code is the “test” for your tests ;)

The code and the test are probably to some degree based on the same logical construct. If that logical construct is malformed your tests may pass and your code might not work.

Regardless: if your code works, your tests could still fail; likewise, your code could fail while your tests pass. Since the tests and the code can vary independently when either is buggy, I don't see how this can possibly be true.

If the logical construct is flawed, no amount of tests will ever catch it, only user testing would.

The point I’m trying to make is that the verification of correctness is bidirectional. If I’m writing tests around existing code, I rely on the code to test my expectations. If I’m writing new code, I write tests to assert my new assumptions. All automated tests do is assert that I’ve written the same logic twice. Writing it a third time would decrease the likelihood of transcription error again, but at diminishing returns.

I believe that was Michael Feathers in Working Effectively with Legacy Code

I've heard it attributed to Michael Feathers.

There are undoubtedly multiple reasons. One reason is that there are domains where the hard problems are integration problems. If the point of your program is to poke/sample some physical object, talk to another opaque chassis/address space, execute realtime tasks (which in practice often includes UI/UX concerns), etc., then extensive integration testing is absolutely vital. If you focus on some other testing discipline and try to de-emphasize integration, there is a considerable risk of fooling yourself about whether your program actually does what it says on the tin.

Yeah I didn't mean integration tests were bad, just that you don't want to turn your unit tests into them unintentionally. Following the TDD mantra, you should absolutely have a failing integration test before writing integration code.

>I expect the reason TDD is so controversial on here is people can't see the long term benefits of tests, and instead only think in the short term.

There's a number of reasons why you could try TDD and get frustrated with it because it ends up sucking for you:

* You're using unit tests where integration tests would be more appropriate.

* Your integration tests would take longer to build than the thing you're building itself because you have to set up some sort of elaborate mocking.

* Time is of the essence but quality isn't / your code will be run a limited number of times.

The code you write might also live for years or for only 3 months, and it can be hard to predict which it's going to be.

I feel that TDD is an attempt to impose engineering discipline onto something which is still largely at the craftsman stage. As an engineer I like the idea of pushing the field forward, but I don't think software development tools are ready yet to make TDD and the like broadly accepted practice. That means you need to decide case-by-case if they make sense.

Tests define intent. Usually when you craft something, you have an idea of what you want going in. You may even draw it, and in the case of construction projects, fully 3d render it.

Tests are the programmer equivalent, they define a contract you expect from your functions, which can then help you shape the function itself.

> you actually decide to use DI

Whats DI?

Dependency injection

TDD isn't bad at all for a mature product where you have clear requirements for additional feature development and granular developer tasks.

I've just seen a lot of it where a product is still in broad strokes development and developers get stuck between whether having tests written based on early assumptions are correct and the code should conform to those, or whether new ways of thinking that invalidates the early assumptions and tests are the right way and the tests should be changed.

From the article "We don’t actually know that much about what good software engineering looks like." sums the issue up nicely. There is no definitive playbook on whether a strategy like this is good or bad. It's a tool that is good if you use it right.

There's some truth to this, but it's also possible that the team just wasn't very good at writing tests. It's not easy to do well. Changes will break tests, but given that changes are applied one at a time (you're not erasing your ./src folder, dumping in new code and expecting all your tests to pass), then it should be possible to keep the tests up to date. If a few changes break many tests, then tests might be over-specifying, not making use of factories, testing too much, or a lot of ways that are easy to get it wrong.

I've seen and heard this a lot, and it's often the result of code that does too much (not cohesive, too much coupling) and tests that correspondingly do too much.

Getting good at TDD requires being bad at TDD for a while, which requires someone to write bad test code at some point.

But if the entire population as a whole is still having problems with TDD for as long as TDD has been around, then it needs to be a niche methodology.

Our decisions to adopt methodologies/technologies can't revolve around the perfect case of a crack A-team of developers who are good at everything they do.

Think of the mediocre developers!

Tests allow you to easily refactor implementation and allow you to add features without worrying as much about breaking the existing interface.

However, they're terrible if you want to refactor your design and correct problems with the interface itself. Tests freeze the interface. This is really good for whole classes of problems but terrible for other classes of problems.

The debate is going to continue endlessly as those on either side are looking at it from entirely different frames of reference.

That is why I almost exclusively test against external interfaces, since those should be frozen anyway (so others can depend on them). If the system is a web service, that means the HTTP API; if it is a JS library, the public API; etc. And conversely, almost no tests against internal interfaces, since internals don't matter and should be free to change.

This submission appears to be a reaction to the following submission:

"Why Most Unit Testing Is Waste"


(just for the record, i.e. for future readers)

The problem with many TDD critiques is that they offer no alternative. The original presentation:


is a case in point. The presenter takes what he believes to be TDD's four main points, some of them strawmen, and mocks them. He makes some good points, but here's the problem: he offers no alternative.

If you're not writing tests as you go, that you run before every commit (or before you move onto the next thing), then what is your standard for putting code into a production repository?

You're right, I also don't know good alternatives.

I wrote my first big API with many tests, hundreds of them. Then the requirements changed and all of them failed. People worked for weeks to get the tests passing again. So just writing many tests up front in a new project doesn't seem to help anyone.

Also, I went from feature to bugfix sprints. First I implemented some features, then they got tested by non-devs, then I fixed all the bugs.

Often this was faster than the whole tests up front stuff with refinement of tests afterwards.

I also saw that >90% of my bugs came from the dynamic nature of my language of choice (JS). So I could imagine that these feature->bugfix cycles (especially the bugfix parts) could be greatly shortened by using a typed language instead.

I am generally for TDD, but I don't think APIs really need heavy code coverage.

That said, testing the input/output directly doesn't work well, because then a single change to the output data (adding a field to the JSON, for example) will cause tons of tests to fail.

I usually use or create some JSON serialization/deserialization classes and use those to generate fake data and ensure that the API returns 200 responses.

That way if I add a field to the JSON objects, my tests are unaffected and will generate the new fields in the tests.

So I may do something like this (Python-ish):

    new_user = User.generate_user()
    response = api.post('/users', new_user.serialize())
    assert response.status_code == 200
    response = api.get('/users')
    users_json = json.loads(response)
    users = [User.deserialize(u) for u in users_json]
    assert new_user in users
That way changes to the User JSON will not impact tests and I still have some basic sanity checks.

Hope that helps.

Nice, thanks :)

Do you have any good resources on testing? Books/articles etc.?

I often have the feeling the only people who write about this stuff are the die hard TDD gurus.

No problem!

I did read Clean Code by Robert Martin. He's definitely in the "die hard TDD guru" category, but there are still useful examples to be extracted from the book.

Besides that, I mainly learned a lot of testing tricks in the wild, reading through Github projects. My learning process now-a-days is mostly:

1. Discover a really cool trick or concept I didn't know existed in a Github project.

2. Research the hell out of it.

3. Try to use it in some personal example project.

In most statically typed languages, the bugfixing (usually) takes longer because you need to do many more things each time you refactor. An enterprise Java application doesn't typically pass around the equivalent of a JS object. It passes around Users, Admins, and Guests. But then some features in Admins need to be added to Users without affecting Guests. But Guests inherit from Users, so you have to go back and restructure the type hierarchy. But it turns out that User is a Hibernate entity, so you have to make sure you don't change your DB schema by accident.

Using a static language does help with some bugs (where it effectively serves as a compile-time test suite), but it can also increase coupling a lot. It's not easy to quantify that tradeoff in abstract. I'd be wary of people overselling aggressive compiler checks leading to productivity boosts.

I'm beginning to suspect that, in the context of these debates, "most static languages" is a roundabout way of saying, "Java", or other static languages with a similarly anemic type system. Also it's a roundabout way of describing the particular way that Java code tends to be structured.

In one that has better support for generics (i.e., reification, contravariance and covariance) and some form of mixin, you generally shouldn't have that much trouble adding behaviors to an entire class of types without having to modify any of them. This is the fabled open/closed principle that is oft lauded and rarely practiced.

To take it a step further, if you're using interface polymorphism and decorators to build your types instead of relying on subclasses, you won't be able to paint yourself into the corner you describe in the first place. The problem is, of course, that a language like Java that doesn't let you add new behaviors for types without either modifying the original source file or resorting to some Gang of Four awfulness, will tend to punish people for writing cleanly-structured code like that.

> ...is a roundabout way of saying, "Java", or other static languages with a similarly anemic type system...

I agree. In my experience, it's how the median developer writes code, though. Even in more flexible languages they'll reach for type hierarchies and abstract base classes. I've seen people create these things in languages like Lua and Javascript that don't really need them.

I think TDD-like approaches make the case for other approaches more clear, for what it's worth.

Further agreement.

I haven't looked at a textbook on programming recently, but I'm worried that the standard is still to actively teach new developers to program this way, even though we've _known_ for decades that towering piles of subclasses invariably collapse under their own weight.

That said, I still wouldn't lay this tendency for damaged design at the feet of static typing in general. Not when some of the most vigorous arguments for static typing tend to come out of language communities that don't have subclassing in the first place (e.g., Haskell), and when (as you point out) similar mistakes are just as often made in dynamic languages. Dynamic languages are certainly more forgiving about poor design, but whether that's a good thing is yet another fun debate.

I don't blame statically typed languages. But, yeah, they are a bit more unforgiving if you need to refactor your way out of that mess is all.

Static languages make refactoring easier, not harder. The more information you have at compile time, the more automated tooling can do for you.

The more compile time information you have, the greater coupling you have as well. Make User subclass Entity for polymorphism reasons and now taking a User parameter makes your business logic depend explicitly on your ORM.

But now the conversation's going in a circle. In a decent static language, you should never have to run into that particular problem.

Assuming it's a modern static OO language, your business logic should depend on a User interface, so that it never has to take a dependency on implementation details like that. Even if User was a class beforehand, you can easily extract an interface at a later date, when you find that you need to avoid some tight coupling.
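The extract-an-interface move works even in Python's gradual typing, via structural `Protocol`s (the names here are hypothetical; `OrmUser` stands in for the Hibernate-style entity):

```python
from typing import Protocol

class User(Protocol):
    """The interface the business logic depends on -- no ORM in sight."""
    name: str
    def is_active(self) -> bool: ...

def greet(user: User) -> str:
    # Depends only on the interface, not on any Entity base class.
    return f"Hello, {user.name}" if user.is_active() else "Account disabled"

class OrmUser:
    """Stand-in for the ORM entity; satisfies User structurally,
    with no inheritance relationship at all."""
    def __init__(self, name: str) -> None:
        self.name = name

    def is_active(self) -> bool:
        return True

print(greet(OrmUser("Ada")))
```

Because the conformance is structural, the business logic never imports the ORM module, and tests can pass in any object with a `name` and an `is_active()`.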

Don't blame the gun for what happens when you point it toward your foot and pull the trigger.

Somewhat disagree - you still have 'types' in JS, but the runtime just sees maps instead of concrete types. The concept is still there, but you won't get an error until runtime if you invalidate the models. At least with static languages the problems are made clear up front and prevent compilation.

There are other limitations though, such as not being able to treat static types as a hashset at runtime, which can be good or bad thing depending on what camp you're in.

The 'types' in JS don't typically include implementation implications as much as they do in more strongly typed languages. If your algorithm is looping through a set of widgets, you just need a container and things that look like widgets. In Java, C++, etc., you often are pulling in a type with baked-in assumptions about logging, DB access, etc.

>If you're not writing tests as you go, that you run before every commit (or before you move onto the next thing), then what is your standard for putting code into a production repository?

It seems to me that that would be the same standard that you would otherwise try to write a test for.

Now writing a test isn't necessarily a bad idea, but I think it's important to realise that the test isn't the standard itself, but is itself an implementation.

That has to be one of the worst presentations I've seen. Is it ironic?

It's a bit of a tongue in cheek and a bit of a roast. Eric Smith does TDD consulting.

You know what's awesome about TDD? It means I, a nobody, can contribute to a huge open source project with lots of moving pieces and be pretty confident that I'm not catastrophically breaking anything and that my feature works as intended.

That's awesome!

Right. Good tests are a passive communication tool. They communicate expectations. Most of the discussion here revolves around different people having different expectations. I think the "please test" people have a better argument mostly because they're advocating for tools for good communication.

Totally. I used to hate testing. It felt like someone making me do homework I didn't want to do.

Then it dawned on me that I was already testing, the hard way, by opening up a console and manually setting up test conditions over and over and over again, and that I could do this much faster and in a reusable way by writing tests and running them. What an epiphany that was!

I still open up the console all the time. It's a really useful thing. I think ideally, my testing environment would dump me into a full-fledged console if something went wrong, but this is not something I've taken the time to set up.

That seems to do more with automated testing (unit, regression, integration, etc) than with TDD. It's important not to conflate the two because TDD proponents say that TDD is not primarily about testing.

Thanks for clarifying this. I wasn't aware of the hardcore TDD thing, I moreso just interpreted TDD to be the concept of "write tests as you code".

Yeah. "Pure" TDD is strict cycles of "Write failing test, write the bare minimum of code that passes, refactor", which most studies suggest isn't much more helpful than the "common practice" TDD of "write tests as you code".

It's not more helpful as tests than "write tests as you code". But "pure" TDD is more helpful as a guide to low-level design. I'm not sure that most studies can quantify that.

My first resource on these things is the absolutely fantastic book "Making Software", which has overviews of a lot of these questions how they're studied. The studies they looked at didn't find a correlation between pure TDD and 'software quality', but they admit the data is pretty scarce.

All projects that have been created using TDD have tests, but not all projects that have tests have been created using TDD. You're saying that having tests is good and I would assume most people agree... But that doesn't mean that TDD is automatically great. It's the DD part a lot of people find themselves skeptical towards.

My journey with TDD started with hater, moved to skeptic, and is currently on cautionary-supporter.

It's a design methodology, not some new way of unit testing. In fact, I think the more you think of TDD as being testing, the more you're probably missing the point.

Modern OO languages are full of hidden dependencies and perverse side effects. The only sane way to write clear and maintainable code is to write the spec first, that is, you code by writing tests, then writing the code to make the tests pass. In this manner your code is always up-to-date with your spec.

Where is it a bad idea? Exploratory or academic code, for one. Startup code where there's no clear benefit to maintainability or even knowledge of what the app is supposed to do.

Pure functional code is another case entirely. Lately, I've switched to writing small pure-FP microservices, usually with less than 200 lines of code. Writing code like this creates very simple and small pieces of functionality with little hidden state or adverse side effects and a limited cyclomatic complexity factor. I don't see any reason to use TDD here, because there's nothing happening that isn't obvious. (It's a horse of a different color with larger pure FP projects, however. Having said that, one should pay careful attention to whether or not you need to build out huge pure FP execution units in the first place)

>It's a design methodology, not some new way of unit testing. In fact, I think the more you think of TDD as being testing, the more you're probably missing the point.

Weirdly it seems to be the loudest advocates of TDD who are the most confused about this.

Yes. Every time I read an essay where in one paragraph the author uses "TDD" and in the next "unit testing" to mean the same thing, I cringe; I know we're off the rails.

I've seen many outsourced teams say they're doing TDD and when you look at the code it's obvious it's just the same old unit testing as before. I have no idea how vendors get away with this. It's no less than fraud, really.

ADD: I think the danger here is that, even with hardcore TDD boosters, they don't understand why they're doing it. It's a discipline, not an engineering skill (Choosing the tests is the engineering skill). Over time they tend to get lax. After all, the code always does mostly what I wanted it to do, right? So I can look at it and by inspection reason through the execution.

At this point, when you don't understand the rationale behind it and you've started to slip up in your application of it, TDD has become nothing but some weird way of writing unit tests. Then, sure, you can use the terms interchangeably. But then you've missed the entire point of what you're doing, so you might as well just call it "unit test ahead of coding" or something.

I have been attempting TDD on and off for about 3 years. It never really took off for me. I would spend most of the time writing the tests, then implementing the code, only to find out that a lot of my tests didn't make sense, or were overthought and didn't add business or technical value.

However, I think I reached my sweet spot just weeks ago. Here is my optimum workflow now:

1) Write the test cases (one-sentence descriptions, in plain English, of what is expected)
2) Implement the code
3) Implement the test code

The test cases become neatly arranged in bullet-like layout in a test case file. I'm able to read through and be confident that I'm probably covering most if not all the cases that need to be covered. I'm always able to switch between the code and this file to make sure my code covered all that the tests need.

Then once the code is done, I come back to the test cases and implement them one by one, catching error after error and seeing my code come to life. As I code now, I have an implementation I'm quite confident in, and the tests that have business value are being covered one by one. It's been a pleasure since then; I'm more confident in my code and that my time is spent efficiently.
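As a minimal sketch of that workflow (all names hypothetical), the plain-English test-case names from step 1 become the test function names, and their bodies are filled in during step 3 against the finished implementation:

```python
# Step 2: the implementation, written after the test-case names were listed.
class EmptyCartError(Exception):
    pass

class Cart:
    def __init__(self):
        self.items = {}

    def add(self, name, price):
        self.items[name] = price

    def total(self):
        return sum(self.items.values())

    def checkout(self):
        if not self.items:
            raise EmptyCartError("cannot check out an empty cart")
        return self.total()

# Step 3: test bodies implemented one by one, under the step-1 sentences.
def test_total_is_zero_for_an_empty_cart():
    assert Cart().total() == 0

def test_total_sums_item_prices():
    cart = Cart()
    cart.add("apple", price=3)
    cart.add("pear", price=2)
    assert cart.total() == 5

def test_checkout_rejects_an_empty_cart():
    try:
        Cart().checkout()
    except EmptyCartError:
        pass
    else:
        raise AssertionError("expected EmptyCartError")
```

The test names alone already read as the bullet-like case list described above, which a runner can report even before the bodies exist.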

I've been trying to get into TDD myself for years. I've read the two most-suggested books and another book with "testing" in the title, but I still came away with issues. I don't program in Java, so I didn't understand some of the code and couldn't just convert it over to the languages I know. The examples were basic stuff like "let's make something that adds two numbers" or "here's a basic mock, let's not worry about DB queries". They really didn't teach me how to think in TDD, weren't specific to the languages I program in, and didn't touch on things like real-time data coming from a stream source, which I think is the crux: how to think in TDD and how to program to fit it.

This works very well for me too when I'm reluctant to go full TDD. That said, step 2 isn't necessarily the complete code needed for the tests, so there is some back-and-forth between step 2 and 3, resulting in some test code still being written before the code under test.

I never was a fan of TDD, until I saw this talk by Ian Cooper: https://www.infoq.com/presentations/tdd-original

The whole idea of testing functions and/or classes separately means tightly coupling your test code to the implementation of the real code, while you should only care about testing the functionality.

Nowadays, I try to write tests that test a unit of functionality. And the tests should only change when the functionality changes, not after every refactoring.
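A toy sketch of that distinction (hypothetical names): the test pins down a unit of functionality through the public entry point only, so restructuring or renaming the internal helper is a refactoring that leaves the test untouched.

```python
def _normalize(word):
    # Internal detail: free to rename, inline, or restructure in a refactoring.
    return word.strip(".,!?").lower()

def count_words(text):
    # Public functionality: the only thing the test is allowed to touch.
    counts = {}
    for word in text.split():
        w = _normalize(word)
        counts[w] = counts.get(w, 0) + 1
    return counts

def test_counting_is_case_and_punctuation_insensitive():
    # Asserts on observable behavior, not on how _normalize does its job.
    assert count_words("Dog dog, cat!") == {"dog": 2, "cat": 1}
```

A test written directly against `_normalize` would break the moment that helper was folded into `count_words`, even though the functionality never changed.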

That said, code coverage is a metric with no inherent value (well, unless it's 0, of course).

Not a word for word quote, but I heard this:

> How many of you have a code base where, if you refactored, tests would break?

> Most people raise their hands

and it made me realize maybe I'm not as stupid as I think I am.

I've been trying to understand how to create unit tests that allow me to refactor for years, and have been completely unsuccessful. The only way I can achieve this is using the "classicist" viewpoint of creating many "unit test" and only isolating the architecturally significant boundaries. That doesn't make me happy either, though.

But, yeah, good to know (or bad to know) that I'm not the only one that struggles with this.

Back to the video I go.

I don't think that any of the common development methodologies are crap. I see no problem with writing tests as a way to work out ideas. I see no problem with not doing so. I think the main flaw in these methodologies is that they assume people problem solve in the same way. It only becomes a problem when you have a methodology zealot telling you to work in that way only. I am happy to write unit tests to cover my code. Unit tests can and do prevent bugs/outages in many cases.

What I do think is kind of crap is the holy war of methodologies that exists. In my experience, it usually happens that some new CTO or other manager comes in and says, "We are doing it wrong! From this day forward, WE ALL MUST USE TDD/AGILE/WATERFALL/YOURFAVORITEWAYHERE".

There is no latitude given, and folks who are not used to the methodology now take 1.5 to 3x longer to complete things, and deadlines slip. Then the push to complete work means that things get written sloppily and tests are not well thought out, or people aren't fooing their bar with the baz properly. Quality suffers in the short term, but eventually everyone catches on, and life goes back to normal.

Then the new CTO shows up...

I'm glad that developers are finally having honest discussions about this. Not long ago, it seemed like 100% unit test coverage with TDD was the only valid point of view to have.

I'm not a fan of TDD (in the sense of writing the test first) but I do think that some sort of testing-as-you-go is important for back end API work in particular. I don't think that 100% test coverage is a good idea for 99% of business use cases.

I’ve been viewing tests as writing the logic twice. That might sound like a waste but it’s kinda like a form asking you to type your new password twice; you may make a mistake the first time, but making the same mistake twice has much lower odds.

And really, we’re playing a probability game with ourselves, trying to reduce the probability of typing the wrong thing. Writing it twice is one way to do that.

I think DHH sums TDD up best in this talk (which is absolutely fantastic btw): https://youtu.be/9LfmrkyP81M?t=24m

For those who don't use that acronym:

TDD here means Test-Driven Development.


The most important part to me is to write testable code. And you cannot be sure that something is testable unless you write at least some tests to check that objects do in fact work in a mocked environment and that you can replicate user interaction at any level of your application.

Writing a full test battery after that is probably overkill, but making sure everything can be taken out of the running app and tested singularly is essential to be able, at a later date, to take a user bug report and convert it into a testable case to narrow down the root cause.

A good strategy for that goal is to use dependency injection at each layer separation and to make sure that every user-generated event can also be triggered programmatically. That's especially useful because, while relying on something like Selenium does work, it's exceptionally costly and aggravating in the long run.

Being able to isolate the bugged behavior and the responsible component is the major advantage that comes with a full TDD implementation, but that doesn't mean you can't get enough of that with a lighter approach.
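A minimal sketch of both points (all names hypothetical): the storage dependency is injected at the layer boundary, and the event handler the UI would call is also callable programmatically, so a bug report can be replayed without Selenium or a running app.

```python
class InMemoryStore:
    """Test double standing in for the real persistence layer."""
    def __init__(self):
        self.rows = {}

    def save(self, key, value):
        self.rows[key] = value

    def load(self, key):
        return self.rows.get(key)

class SignupService:
    def __init__(self, store):
        self.store = store              # dependency injected, not hard-wired

    def on_submit(self, username):      # same handler a UI button would call
        if not username:
            raise ValueError("username required")
        self.store.save(username, {"name": username})
        return True

# Replaying the user interaction programmatically, no browser involved:
service = SignupService(InMemoryStore())
assert service.on_submit("alice") is True
assert service.store.load("alice") == {"name": "alice"}
```

Swapping `InMemoryStore` for the real database adapter is the only change needed to run the same service in production.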

> you’ll need a bunch of servers and a hug if you have microservices

I laughed out loud

Hi Hillel Wayne (Author), I love your writing style and that you are so carefully articulate about what people say, and reading their arguments charitably! The world needs a lot more of that. This article has an inspirational writing style that I will try to leverage in my own writings. PS maybe put your name at the top of your posts :)

Thank you! I was heavily inspired by Dan Luu's blog (https://danluu.com/), so I'd recommend checking that out too.

My view is very pragmatic: Some code is far easier to write quickly and correctly using a TDD approach.

In other cases the cost of refactoring with TDD is very high if any significant design or architecture changes occur, so in those cases I find it useful to stabilize the broad strokes/patterns a bit before investing too much in tests.

I haven't shared this on HN but I saw this on another forum and thought this was a really fantastic example of what you can end up with, in the real world, when you are driven by tests. Real-world example:


It doesn't look like a joke to me. It only works over integers so the code is absolutely correct.

It also strikes me as the kind of convoluted logic that someone took a really, really long time to come up with, before it finally worked. (As indicated in the comment.)

A test can hardly capture what's wrong with this code. But any human can see it instantly. (And it's kind of weird that the programmer didn't.) I think most people can think of braindead decisions that are not really captured by testing.

Are there any proponents of serious testing or tdd that don't also promote code review? Why would tests make this code more likely? If anything, I am more confident that I can change it to be better if I have a test suite.

Yeah the way to write better code is to write better code. I happen to find TDD a useful tool in that but doing things badly is still possible.

Someone mentioned DI not getting rid of coupling, and I agree. DI is a tool you might use, but getting rid of coupling is not one simple thing; it's a process involving a bunch of different tools and techniques. You can't just slavishly follow some process and expect it to fix all your issues. You have to think and do the work yourself.

>Why would tests make this code more likely?

Code like this is basically only possible through testing, because the programmer doesn't understand why it is working: it says so in the comment, which I believe.

So if it weren't written against testing (manual or automatic) it simply couldn't be written like this.

There are a lot of really broken designs that "work". Testing gets you "working" code. Often you could do much better starting with correct code and then adding testing afterward - which I believe is not the essence of TDD. TDD is about driving the writing of code by tests, and I believe the example I shared is one of these bad outcomes. (There are others.)

This is a very poorly researched article, and the previous was as well.

With people like Capers Jones and others doing piles of studies 30+ years ago, I'm confused why someone says there are no studies on TDD.

My bet is the author doesn't have access to the relevant historical papers, and doesn't know they exist.

Could you provide some citations please?

Look up the talk "what we know about software engineering" on YouTube. Also, search for Mills, Cleanroom, and obviously Capers Jones as I mentioned.

I'm on my phone so I can't link a lot, but I maintain that the author should do a lit review of software quality in engineering; they'll reach better conclusions.

TDD is only ~15 years old. Cleanroom and correct-by-construction both have some solid studies supporting them, but take very different approaches than TDD does. They tend to be more rigorous and also take longer so again, it's a case-by-case basis thing.

> They tend to be more rigorous and also take longer so again, it's a case-by-case basis thing

I'm not sure if we're taking away the same things from these studies, as their whole conclusion is it actually takes less time and costs less, due to early wins in quality. The papers claim it's not a case-by-case thing.

I created a document explaining TDD (unit testing in general) in a real-case scenario:


tl;dr: In theory TDD sounds great, but in a real example TDD is not magic, and its coverage is quite limited.

It's not crap if there's tight coupling between the testing frameworks, the programming language, and the frameworks used.

The only thing I disagree with in this article is the leeway given to supposedly legendary programmers who can somehow write bug-free code without tests, specifications, or an inkling of communication with others.

First, it's probably not true. Linus Torvalds is not a legendary human being who can write critical systems without a single flaw. He relies on legions of human beings to carefully check and review every line of code before he even looks at it. There are discussions on mailing lists. There are arguments and disagreements. There's a process there. He doesn't just flit his fingers across the keyboard and output amazing, error free code. It probably has tonnes of errors.

Linus' philosophy is that errors aren't the end of the world and someone will patch them when they are uncovered.

For some use cases that's fine. However there are plenty of applications where a more proactive approach to correctness is necessary: real-time systems, safety critical systems, and yes... even security.

Maybe TDD is a misnomer. I think we should call it specification-driven development. Unit tests and integration tests are just a weak form of specification. They provide theorems in the form of examples that we try to prove with an implementation. Property-based tests give us more examples to quantify assertions over. Model checking can test liveness as well as safety in our high-level designs. How much you need to specify, and how thoroughly, really should be a function of the risk and complexity present in the requirements of the system.
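As a hand-rolled sketch of a property-based test (libraries like Hypothesis automate the generation and shrinking; the sorting function here is a hypothetical implementation under test), the "specification" is a universally quantified assertion checked on many random examples rather than one hand-picked case:

```python
import random
from collections import Counter

def insertion_sort(xs):
    """The implementation under test."""
    out = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out

# Property (the spec): the output is ordered and is a permutation of the input.
# Checked over many random inputs instead of a single example.
for _ in range(500):
    xs = [random.randrange(-100, 100) for _ in range(random.randrange(30))]
    ys = insertion_sort(xs)
    assert all(a <= b for a, b in zip(ys, ys[1:]))  # ordered
    assert Counter(ys) == Counter(xs)               # same elements
```

Note this is still experimental evidence, not a proof: it raises confidence in the spec without deductively establishing it.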

To use an analogy: blueprints. If you're just building a shed or a small footpath then it's enough to sketch your idea on a napkin. If you're building a house you need to have a more specific and detailed plan that passes by a civil engineer. And if you're building a sky scraper then you need to be thorough and able to convince others of the validity of your designs.

(credit for the analogy should go to Leslie Lamport).

I think most software projects are at the house level in terms of risk. You could get by with using a dynamic language and a few unit tests if you value productivity more than correctness. That just means you're willing to accept that you will have higher reported error rates and are comfortable with potentially losing customer data or a higher risk for security vulnerabilities. You can lower your risk if the project requires more sensitivity to data consistency or security by using a sound type system and encoding your assertions at the type level, add some property-based tests, and more integration and unit tests. It's a spectrum one should consider.

I know we all like to write code and sometimes we even hear ourselves saying, "Well if you wrote the perfect specification you might as well have written the software," but don't be fooled.

"Software engineering is the part of computer science which is too difficult for the computer scientist." -- Friedrich Bauer

Not disagreeing with your overall message; I just want to point out something that bugs me, and you're not the first one I've heard say something like this: "They provide theorems in the form of examples". This is incorrect. Theorems are deductive while tests are experimental, and this is a crucial difference in my opinion. To give an example: say we have a function isDAG :: Graph -> Bool, which determines whether a graph is a DAG, and we want to _prove_ that it works; then no amount of experimental test cases will be sufficient. In real life, however, most of us (including me) settle for having _some_ confidence in the correctness of our code by utilising tests.

You're absolutely right. Thanks for pointing that out.

I was using theorem and proof as an analogy to illustrate the separation of specification and implementation.

A useful distinction as you get further along in writing formal specifications.

Update: typo.

I think it's more than just "specification driven development". The question is, does your code match the specification? What is your objective evidence that it does? (Yes, I know, a test is not perfectly objective evidence. It's better than "looks like it should work", though.)

> The only thing I disagree with in this article is the leeway given to supposedly legendary programmers who can somehow write bug-free code without tests, specifications, or an inkling of communication with others.

I absolutely agree with your disagreement here.

TDD always reminds me of Ron Jeffries' attempt at solving Sudoku, in contrast with Peter Norvig's approach...

Ron Jeffries' mistake was thinking that he could just use TDD to solve sudoku, as opposed to research and up-front design. Testing does not substitute for thinking.

What is TLA+?

Author has a nice tutorial on a subset of it with practical examples you might find interesting:


I guess TDD can be great for parsers.

In the classical way of setting up the parser, passing some example input, and then asserting some things about the output? It's good for covering basic features/use cases. Since the setup is always the same, it should be written as a list of input/expectation pairs (data-driven). But coverage is typically limited by the effort needed to write the assertions, which grows along with the input complexity.

If one also writes a serializer, then one can additionally test, for any input, the property `serialize(parse(input)) === input`. This means adding a new test is just dropping in more example inputs (say, from bug reports).

Now one can go further, and define a set of mutation operators that can act on input to produce a new valid input. Reorder tokens, change data at leaves, delete and insert new data, etc. Now one can generate arbitrary amounts of new test cases based on existing input examples.

Other mutation operations can be designed to generate invalid inputs, which should always give an error (never crash, halt or unexpected exception).
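A minimal sketch of both ideas, using a hypothetical mini-parser for comma-separated integers: the round-trip property makes valid examples cheap to add, and a junk-insertion mutation operator generates invalid inputs that must fail with a clean error rather than a crash.

```python
import re

def parse(text):
    """Parse a comma-separated list of integers; reject anything else."""
    if text == "":
        return []
    parts = text.split(",")
    if not all(re.fullmatch(r"-?\d+", p) for p in parts):
        raise ValueError("invalid input: %r" % text)
    return [int(p) for p in parts]

def serialize(values):
    return ",".join(str(v) for v in values)

# Round-trip property: serialize(parse(input)) == input for canonical inputs,
# so a new test case is just another example string dropped into the list.
for example in ["", "7", "1,2,3", "-4,0,12"]:
    assert serialize(parse(example)) == example

# Mutation operator producing invalid input: inserting a junk character at
# any position must always raise ValueError, never crash or silently succeed.
for example in ["1,2,3", "-4,0,12"]:
    for pos in range(len(example) + 1):
        mutated = example[:pos] + "x" + example[pos:]
        try:
            parse(mutated)
        except ValueError:
            pass
        else:
            raise AssertionError("accepted invalid input %r" % mutated)
```

Token-reordering and leaf-editing mutations for generating new *valid* inputs would follow the same shape, just asserting the round-trip property instead of an error.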

This is great.
