Tests Are Overhyped (sturgill.github.io)
149 points by sturgill on Apr 15, 2013 | 149 comments

My experience is that you can write an essay like this about any development practice. You can cite your experience ignoring it, you can say some nice things about the importance of experience and priorities and managing trade-offs, and you can satirize people who are dogmatic to the point of becoming cult-zombies.

It appears as if such an essay provides great advice. But does it?

What is the reproducible, objective process or behaviour this essay is advocating? If I give this essay to five teams and then look at what they are doing to follow its advice, will they all be doing the same thing? What results does this essay suggest we will obtain from following its advice? How can we be sure it will be better than the results we would obtain from mindlessly chaining ourselves to the red/green light?

For better or for worse, the red/green light protocol is specific. As such, it can be criticized. A specific protocol has edge cases. You can argue slippery slopes. It's a stationary target. Whereas, vague and nonspecific generalities are more difficult to criticize. There are no details, so how does one identify edge cases?

It's great to hear about this one specific anecdote. Beyond that, what am I supposed to walk away with besides the knowledge that slavishly following a process can be bad?

On the other side:

Today I was working with a new library. I wrote a 30 line sample program just to make sure I understood the library semantics, and that it would at least do what I thought in a situation similar to the actual code base.

I called over another developer to sanity check my thinking, and he spent 10 minutes going on how I should have just written tests for it in our codebase instead.

He would not take any of these as valid discussion points:

* I spent nearly 7 minutes on my sample code; writing tests, including the appropriate mocks for our codebase, would have taken 70.

* I didn't even know we would be moving forward with that library, so why start integrating it into our infrastructure?

* I'll write tests of our code using the library once we figure out what the thing we're building actually does.

* I would have to write that same sample program before I can even write tests, to know how to use the library.

I do know now not to include that guy in my thinking process ever again. If tests aren't perfect, any time I might gain by including someone is lost to his "TDD is my holy saviour" rants about how I should learn.

Worst part: this guy takes days to write "tested" code that still doesn't actually do what it is supposed to.

You did a spike.

It's ok to be untested for discovery. Even in TDD.

In pure TDD you would be expected to throw it away afterwards and redo it cleanly (applying what you learned), test-first.

And that is exactly what is wrong with TDD - the sheer dogma involved in an insane statement like that. Throw out all the code - tests weren't written first!

It doesn't seem to matter much that the end result will be the same if you write up some tests for that code, or if you throw the code out, write tests, then just write the same code again. If the tests didn't come first, it's not TDD, and now we can't follow some holy procedure.

Pure madness and everything that is wrong with TDD is summed up in that simple statement: "In TDD you would throw it away and redo it".

I am not a TDD advocate, as it tends to be too cost-intensive in early-stage startups.

But if you are working on critical parts or products you know will be in use for a longer time it is usually the right approach.

To defend why you should throw spike results away in TDD:

- Usually during a spike you learn how to use an api/framework. You are focusing on very low level solutions.

- You made it work the first time, but if you did it again you wouldn't do it the same way. Usually you don't redo it.

- Usually the second time the solution is clean and properly structured.

TDD is very focused around proper software design. And for the areas where you actually need that it is a great tool.

In any development, you probably want to throw away your exploratory prototyping - that's not TDD specific. I've seen/heard "write it; throw it away; rewrite it" attributed to Knuth on several occasions, but I'm not finding the original source. (The closest well-known version may be Fred Brooks's "plan to throw one away; you will, anyhow" from The Mythical Man-Month.)

I feel your pain. It sounds like your colleague really likes his hammer and now sees all problems as nails. What you were doing was a little experimental probing of a possible line of enquiry. Basically some stuff on which TDD has nothing to say.

That guy is an asshole.

Yet it is still better in general to solidify your gains into repeatable tests, instead of throwing away all your research.

To the GP, yes, the less you write tests, the more your code base rolls up into a giant ball of hair and wax, and you can't investigate the behavior of any part of it without mocking out the rest of it. That is why we say testable code is usable code.

I've been on both sides of this.

You do not want to wind up with important code that is not tested. I spent hours today chasing down bugs in code that I wrote over the last month that I would have never known were there if I had not been writing tests. I value tests.

But on the other hand there is a valid role for experimenting and trying things out. In that mode you're not writing production ready code - you're throwing stuff together. When you write code in that mode, following a testing mantra is worse than useless. I say worse than useless because the mental energy spent testing gets in the way of experimenting and refactoring.

The right way to reconcile those two modes is to write your experimental code, think about it, figure out how you want it written, then once you know how to write it, go back and write it "for real", complete with tests. In fact the carefully tested code that I was writing today is based in large part on ideas that I figured out in a throw away prototype written some time ago, in a different language, that had very little in the way of tests.

I'm not sure that there is a sure-fire way to quantify what tests are or are not necessary. In my opinion, this is something that comes with experience and is more of an art than a science. But I'm okay with that.

If all you got out of the article was that slavishly following a process (like TDD) can be bad, I'll consider that a win for my writing. Because my general feel of the community is that testing is some sacrosanct practice that must be adhered to without question. And I question that.

I write tests. I'm not anti-test. I just imagine I write far fewer tests than most. In my opinion we should constantly question whether a certain practice (in this case, writing a specific test) is worth it. I think that's a healthier approach to development than blind red/green light.

I very much agree that knowing when to test and when testing is not needed is an art, not a science. I also hope that your readers know the difference between writing tests and writing code using tdd.

My biggest problem with your statements comes down to the idea that you claim tdd can be bad (particularly, it would seem, for startups) and yet you do not give an alternative which has equal or greater benefits without the perceived overhead cost. You say we should constantly question whether certain practices are worth it instead of blindly following red/green lights. Fair. But how will questioning practices give you the knowledge that your code is not broken when a junior developer makes a change "somewhere"? How will questioning practices allow you to better refactor when you pivot quickly?

I have worked in many startups and I know time is money and bank roll is king. But I believe that tdd allows you to keep those financial concerns in the black longer. If not tdd, what is an alternative method? "Not writing code in tdd" is certainly an alternative, but is it a good one?

To be clear: I did not say TDD was bad. I simply said it might not always be best. I took that a step further and suggested that for a start-up, and especially an early-stage one, it's probably not worth it. Not that it's bad or wrong. Simply that the costs might outweigh the benefits.

If you aren't market tested and/or market validated, I would suggest that getting something out the door is probably more important than test coverage. Because odds are those tests you wrote will be worthless when you pivot. There's a cost to writing tests, and the benefit from those tests is increased when they are reused. If you use a test once, then pivot and have to scrap it, you would have been better off hacking your way through the first release and getting to that first pivot faster.

Once you start to gain market traction and you actually know what the market wants you to be then your world changes and your approach to writing code should as well.

And as for junior devs, that's a tough one and a reason I avoid hiring junior talent. Not avoidable for all managers, but I didn't write my post to be the new bible. Quite the opposite: I think we're a bit too worried about making sure we're doing what everyone else says we should be doing. Rails is omakase and all...

I'm just tired of hearing everyone talk about TDD as if it were a given. Protecting against SQL-injection? That's a given. Being a TDD-based web startup? Not so much.

The problem with articles like this is that people who don't work in web startups and aren't writing tests at all will see this and go "writing tests is hard, that guy on HN said testing wasn't worth it, therefore I'll carry on without writing tests". You seem to live in an environment where people are already advanced in their software practices, but I doubt this is representative of our industry as a whole. Your analogy with SQL-injection is quite revealing. I can assure you that protecting against SQL-injection is not a given for everyone.

The irony is that if testing is over-hyped, it's not where it would be much needed. Too many big-budget long term developments don't use testing at all. Because there are no tests people are scared to make changes because it might break something, so they never refactor and the code becomes a big mess which obviously makes it even more scary to change.

I felt like a hungry man reading an article about how food is over-hyped :)

Sometimes writing tests takes an inordinate amount of time compared to the amount of value they provide. I worked on one project in which we were spending 70% of the time writing tests. The scope of the project was small, and the uptime requirements were light, so we ended up just throwing out all tests. It never crashed in production and we were able to iterate on new features much more quickly.

If you have a general feel of the community, your community is very small.

Is your community full of laptop researching prototypers, or million request per second distributed system operators?

By "community" I'm referencing the Rails startup scene in particular, and technology startups in general. Here's a fun exercise: go find a job posting for a Rails startup that doesn't expound on the virtues of its TDD practices.

"million request per second distributed system operators" is the definition of a tiny community.

A tiny community with tens of thousands of members in each of several major cities.

> My experience is that you can write an essay like this about any development practice

Right, but in this case, he's writing about testing. One would argue that's relevant now because the foolishness around how to test and how often has reached ridiculously dogmatic levels.

> What is the reproducible, objective process or behaviour this essay is advocating?

Why does it have to have one? There are many articles that make it to HN that call out software practices that are just ridiculous and help spark a discussion about them; why can't this be one of those? Why can't it be the article that makes a young developer who's slowly turning into a TDD zombie ("must ... test ... everything ... all ... the ... time") pause for a second and go "hmmmm"?

Any popular development practice is overhyped; many of us know so little about writing good software that we often cargo cult. How does one manage a bunch of inexperienced software developers, when maybe the manager is inexperienced themselves? Well, try to copy what others say has worked!

Really, the only cure is judicious/experienced application of development practice. Is unit testing always appropriate? No! Is it sometimes appropriate? Yes! But that is not an easy message for people to understand.

I've seen unit tests on prototypes before because that was standard practice, only to have the entire prototype scrapped at the end...because that was also standard practice (never ship prototypes :) ). Confusing.

Tests are to software development what crime is to a political agenda.

This is one of those issues you are supposed to lie about.

Tests are like crime in politics -- no politician is going to say "I am soft on crime, I think we should reduce sentences." So crazier and crazier laws are created. Mandatory minimum sentences for possessing small amounts of drugs, all kinds of craziness. People know it is crazy, but nobody can speak up and remain standing.

Same thing with tests. Nobody can say "fuck it, stop writing tests, the customer will never run this code, or this is too fucking obvious, test stuff that matters". That is considered irresponsible. Everyone is supposed to worry about not having enough tests.

Now there is a subtle difference, and that is when it comes to shipping on a schedule: in many companies tests are ignored -- BUT silently. Everyone loves talking about tests, but try telling your boss you spent 2 weeks on adding tests to your code. They might not like it. Try to double the time you promise for a feature by saying you need to write tests for it. Or, if you are given free rein, try saying you won't do all the new awesome features, you'll write more tests -- it will probably be approved, but in the end everyone will praise the guy who chose to work on and deliver features -- even though they might be completely buggy and unusable. So that is the other side, if you will.

> Same thing with tests. Nobody can say "fuck it, stop writing tests, the customer will never run this code, or this is too fucking obvious, test stuff that matters". That is considered irresponsible. Everyone is supposed to worry about not having enough tests.

I don't know about everyone else, but I definitely worry about having too many tests. For every test I write, I have to weigh how useful the test is versus how likely it is that the thing it is testing will change. If its usefulness is overshadowed by the likelihood that it will be a burden later on, it does not get written.

Yep. It is very easy to write bad tests -- tests that test too much, tests that test the wrong thing, tests that are too fragile (very timing dependent, they break intermittently and suck up resources in tracking down the cause), tests that are not repeatable (someone inserted random input generation and seeded the generator with the time function).

Apply the same thinking to the feature code you write as well.

Oh, that threshold is even higher.

Try to double the time you promise to do a feature by saying I need to write tests for it.

That, IMO, is the best argument for TDD. By having a process that forces tests first, you never get into the situation where you are cutting critical tests for the schedule's sake.

For me, I ask the question, "Is this code adding more value than other code I could be writing?" If the answer is no, I do something else. Of course, this is completely subjective, but my years of experience count for something.

That, IMO, is the best argument for TDD. By having a process that forces tests first, you never get into the situation where you are cutting critical tests for the schedule's sake.

On the other hand, with a process like TDD you also never get into the situation where you can cut non-critical tests.

If the tests aren't critical, is the feature critical?

Surely that depends on many factors, just like using any other tool or process?

Does each test tell you something valuable?

Is the cost of implementing and maintaining that test less than whatever value the test offers? That is, is the test cost-effective?

Could you have found the same information more effectively some other way? That is, what is the opportunity cost of writing and maintaining that test?

> stop writing tests, the customer will never run this code

git rm <code that customer will never run>

"Checkbox features."

The customer will never run the code, but that doesn't mean they don't buy your product because of the code.

Agree! I was thinking also of in house support code, tooling, infrastructure maintenance, local IT stuff, build tools. Stuff that runs in-house, might be important but if it breaks, there won't be a 2am call from Australia or a public tweet about how "Product <X> from company <Y> just lost me all my data".

Sure, it's not important whether your in-house backend database backups work. Customers never see that.

I have seen in-house bug tracking database backups fail and I have seen upset customers that spread the word to other potential customers. I'll take an in-house database backup failure any day.

This is in a small niche industry where everyone knows everyone; a bad word or a reputational hit can mean the end of the company.

tl;dr: Tests are overhyped because we once wrote tests that became irrelevant when the thing we were testing changed

Really? I mean, tests might be overhyped, but so far it looks like the author draws incredibly general conclusions from some isolated incident. Why did the tests become irrelevant? Is their scoring algorithm now doing something completely different? Do they now have different use cases? Were they hard-coding scores when the actual thing that matters was, say, the ordering of the things they score?

> Only test what needs to be tested

well, thanks for the helpful advice :) Care to share what, in your opinion, needs to be tested?

OP here. My conclusion wasn't drawn from an isolated incident; I shared the incident to highlight what I imagine is a common pitfall in testing.

I used the phrase "only test what needs to be tested" intentionally. How should I know what you need to test? But if you accept that you should only test what needs to be tested, then there is an implication that some (I'd venture to say most) of what you write doesn't need to be tested. And that's a liberating concept. You aren't obligated to test everything, but you should test what really matters.

What needs to be tested is probably directly related to the size and stability of the company (assuming size and stability have a positive correlation). I would venture to surmise that young start-ups have almost no need to test anything. That comment isn't meant to be inflammatory, but I look at testing too early like optimizing too early. There's no reason to shard a database until you absolutely have to, and I don't think you need to test something until you know it's mission critical.

For example, Stripe offers a service where, were components of it to fail, their entire business would be jeopardized. Their code base is obligated, by its nature, to have more stringent tests. But the startup in the garage next to yours who is still trying to determine MVP and will probably end up pivoting five times in the next two weeks? Save yourself a lot of grief and don't worry about the tests. Once you know what you are going to be, once you know what you can't afford to lose, well, wrap those portions up with some solid tests.

> I used the phrase "only test what needs to be tested" intentionally. How should I know what you need to test? But if you accept that you should only test what needs to be tested, then there is an implication that some (I'd venture to say most) of what you write doesn't need to be tested. And that's a liberating concept. You aren't obligated to test everything, but you should test what really matters.

In -my- anecdotal experience, this is the excuse developers use while they write bug-ridden code without tests. Maybe they save a little time at the early stage of the development cycle, but the real reason seems to be avoiding work they don't find to be as fun as hacking out code.

After that code is hacked out, we then waste a bunch of cycles across multiple teams while we find and fix the most obvious of their bugs.

After the release goes out, we spend even more time finding previously undiscovered bugs and fixing them -- which hits actual user satisfaction.

All because some lazy sod thought he was too smart to need tests, and that he could be trusted to decide when his code didn't require them.

Fuck that. I've been doing this too long and seen this happen too many times. If you're on my team, you write the damn tests.

Ok, so if you have people that are legitimately lazy, then they should be fired, full stop. However, there are probably other reasons they are not writing enough tests:

- They don't see the benefit because they are forced to write lots of tests that do not, and never would have, provided value (stupid getter and setter tests being the canonical example).

- Writing tests is hard or slow, either due to things not being tooled out or because the code has not been architected to be testable.

- Writing tests causes lots of false positives, fails to catch real bugs, or ends up being extremely brittle. The classic example here is religious mocking of the database in the name of performance, which fails to catch most bugs since it's, you know, not actually hitting a database.

- The suite is slow or causes deployments to slow down such that adding tests adds a "tax" to everything else people do.

I've had enough experiences with all of the above phenomena to become completely skeptical of extensive testing, since unless you are a real pro you will inevitably slip on at least one of these things.

We have a slightly different approach to the same problem: if people keep putting out crap code that breaks (or worse, introduces regression bugs), I will let them go. I have no time for people who do crap work.

You use tests to weed those people out. That's fine, if it works for you.

I'm anal about knowing everything that's happening in my code. I compare git commits to ticket numbers to keep abreast of how things are changing over time. A bug comes through on Goalee and I know which handful of files it will be in. Honestly, I'm quite uptight about this stuff. I catch most shortcuts personally before they go out, and have the appropriate conversation with the responsible party.

I've never seen untested code that wasn't crap, or didn't require just as much work (if not more) to ferret out all the bugs (including watching everyone like a hawk).

A huge percentage of the code you use every day doesn't have tests:

- Linux kernel? No tests

- TCP/IP stack? No tests

- Windows Kernel (up till WinXP anyway)? No tests

- Android? No tests

- Quake engine? No tests

Don't make stupidly broad statements like "I've never seen untested code that wasn't crap". Well, unless by "untested" you mean "code that was never even run or QA'd a little". I'm pretty sure you mean "code that didn't go through TDD" though, in which case - you are very, very wrong.

Even if we make your assumption that he's specifically referring to TDD only, just because good code that didn't go through the more rigorous definitions of TDD exists doesn't mean he's seen it. And plenty of crap code ships.

I'd argue Linux has significantly more than 'no' tests: http://stackoverflow.com/questions/3177338/how-is-linux-kern... -- it's not TDD, and it's not centralized, but then again, neither is Linux development. And while you might convince me that pre-XP kernels went without unit testing, I'd point out we hardly call them bastions of stability and security. Try certifying a 360 or ModernUI title under the Microsoft banner and see if they'll let you get away without writing a single test. I'd wager anyone who successfully accomplishes this has spent far more effort arguing why it should be allowed than writing some tests would take.

And, given the sheer girth of their security processes (patching, QA certification, tracking) and toolset (through virtualization, fuzzing, unit testing, CI build servers), it would take much more effort to convince me that they do absolutely no unit testing whatsoever on their kernel in this day and age. Far more than you could reasonably give: I suspect such convincing would require breaking several NDAs and committing several felonies.

I'm curious as to whether you've actually worked on the Linux kernel, or the UNIX descendants, or TCP/IP stacks.

I have. The only reason they work at all is the enormous number of man years spent banging on the poorly tested code after it's written. The number of regressions that pop up in core subsystems of the kernel are staggering. VM bugs still happen in commercially shipping systems like OS X because of small changes to extremely fragile and untested code.

Linux isn't a paragon of code quality, and neither are most network stacks. If anything, you're proving my point -- code written in that methodology has required enormous amounts of work to stabilize and prevent/fix regressions going forward.

Case in point: a few years ago, I found a trivial signed/unsigned comparison bug in FreeBSD device node cloning that was triggered by adding/removing devices in a certain order; this triggered a kernel panic, and took two days to track down. The most trivial of unit tests would have caught this bug immediately, but instead it sat there languishing until someone wrote some network interface cloning code (me) that happened to trigger this bug far down below the section of the kernel I was working in.

This kind of thing happens all the time in the projects you list, and its incredibly costly compared to using testing to not ship bugs in the first place.

"I've never seen untested code that wasn't crap"

That's the statement I was replying to - and yes, code with tests is going to be better. But to call all code without tests crap is directly saying that the Linux code is crap, that the BSD code is crap, tcp/ip code is crap, etc. I disagree completely - the code is awesome and has built billion dollar industries and has made most peoples lives better. Could it be improved with more tests? Sure. Everything can be improved. Calling it 'crap' however is insane.

I, personally, would be more than happy to have produced the 'crap' BSD code that has propelled Apple into one of the most valuable companies today.

> I disagree completely - the code is awesome and has built billion dollar industries and has made most peoples lives better.

It shipped. That doesn't prove your point. Code that is expensive to produce and ships is better than no code at all. That doesn't mean that this is the best way to produce code.

What exactly do you think is awesome about the code?

> I, personally, would be more than happy to have produced the 'crap' BSD code that has propelled Apple into one of the most valuable companies today.

I did produce some of that code.

Nothing that you're saying justifies why producing worse code, less efficiently, is better than producing good code, more efficiently. Your position assumes a false dichotomy where the choice is between shipping or not shipping.

The truth of the matter is that the legacy approach to software development used in those industries has more to do with the historical ignorance of better practices and the resulting modern cultural and technical constraints. In the era that most of that code was originally architected, it was also normal to write C programs with all globals, non-reentrant, and non-thread-safe. Are you going to claim that this is also the best way to write code, just because it produced some mainstays of our industry?

Nothing that you're saying justifies why producing worse code, less efficiently, is better than producing good code, more efficiently. Your position assumes a false dichotomy where the choice is between shipping or not shipping.

Well, you say it's a false dichotomy, but if TDD really does reliably produce better code and more efficiently than a non-TDD approach, how come hardly any of the most successful software projects seem to be built using TDD? It's been around a long time now, with no shortage of vocal advocates. If it's as good as they claim, why aren't there large numbers of successful TDD projects to cite as examples every time this discussion comes up?

> If it's as good as they claim, why aren't there large numbers of successful TDD projects to cite as examples every time this discussion comes up?

What exactly makes you think there aren't?

What exactly makes you think there aren't?

The fact that every time I have ever had this discussion, the person taking your position comes back with a question like that instead of just listing counterexamples.

Can you name a few major projects, preferably on the same kind of scale as the examples that have been mentioned in other posts in this discussion, that were built using TDD? Not just projects that use some form of automated testing, or projects that use unit tests, or projects that use a test-first approach to unit tests, or anything else related to but not implying TDD, but projects where they really follow the TDD process?

The development processes for major open source projects tend to be public by their nature, and there are also plenty of people who have written about their experiences working for major software companies on well-known closed source projects, so telling a few TDD success stories shouldn't be a difficult challenge at all.

> Can you name a few major projects, preferably on the same kind of scale as the examples that have been mentioned in other posts in this discussion, that were built using TDD?

Operating systems? No. Sorry. Operating systems, as a rule, predate modern practices by some 1-4 decades. Making a kernel like Linux, Darwin, or FreeBSD testable would be a task somewhere between "massive headache" and "really massive headache". I've done some in-kernel unit testing, but only on our own driver subsystems that could be abstracted from the difficult-to-test underlying monolithic code base.

Outside of kernels/operating systems, just Google. Unit/automated testing is prevalent in just about all modern OSS software ecosystems.

A few examples, off the top of my head.

Apache APR: http://svn.apache.org/viewvc/apr/apr/trunk/test/README?revis...

Clang: http://llvm.org/svn/llvm-project/cfe/trunk/test/

Go: https://code.google.com/p/go/source/browse/src/pkg/math/all_... (see all *_test.go files)

My apologies if I've misinterpreted your posts. I thought we were talking specifically about TDD, since that seemed to be the subject of the original article, but from the examples you gave, you seem to be talking about automated unit testing more generally now. In that case, I would certainly agree that there are successful projects using the approach.

However, I would also say that if you've never seen code untested in that way that wasn't crap then you're not looking hard enough. Some of the most impressively bug-free projects ever written used very different development processes with no unit testing at all. Space shuttle software, which is obviously about as mission critical as you can get, went in a very different direction[1]. Donald Knuth, the man who wrote TeX among other things, isn't much of a fan either[2].

[1] http://www.fastcompany.com/28121/they-write-right-stuff

[2] http://www.informit.com/articles/article.aspx?p=1193856

There are also other means of assuring high code quality. Testing is just one of many tools, and even 100% coverage does not guarantee your code isn't crap; your tests might be crap as well. Testing can only prove that a bug exists, never that there are none. You can only hope that, if you tested the right things, your code likely works fine.

For example, static code analysis is sometimes superior to testing, because it can prove the absence of whole classes of bugs.

And in code reviews we often find subtle bugs that would be extremely hard to write tests for (e.g. concurrency related bugs).

You can also decrease bug rates by writing clean, understandable code. Which is often related to hiring a few great programmers instead of a bunch of cheap ones.

I've heard this saying: "Test until fear turns to boredom." Then you need to make sure you get bored at a properly calibrated rate (i.e. not too soon given your business's criticality, not too late given your need to move quickly). For example, if a lot of expensive bugs are found during root cause analysis to have been preventable via unit tests, then your calibration is probably tuned to getting bored too quickly.

This is a great line. I think it goes deeper though, in that after doing this for a while, if it hurts, you are probably doing it wrong. Writing code with too few tests hurts, because you are scared shitless. Writing code with too many tests hurts, because you are performing a religious rite rather than actually building something useful.

The trouble is that you have to do this for a while to get to the point where you can both feel and recognize pain, or the lack thereof.

That's an excellent saying. It's exactly the line I try to tread with testing. I don't know what to do with the angst of getting the tradeoff wrong though.

1) Test your APIs. A public API, especially one that is key to your business, should be near 100% coverage. And attacked to look for security/usability/load problems.

2) Test enough during development to support later regression tests, and to make sure that the design is testable. This can usually be achieved with less than 20% coverage. But if you write production code that's so screwy it can't be regression tested, then you've got big problems.

3) Test any parts that scare you or confuse you or make you nervous. Use "test until you're more bored than scared" here.

Terse, but extremely helpful and well formulated. I'll print these out for future reference ;-) I'd add 4) Continuous integration testing doubling as operations monitoring. I know that this doesn't "really" qualify as testing, but in my experience, it's far easier to get bitten by components not working together than by a single broken component alone. The key here is monitoring and testing business parameters from end to end. It doesn't help you find out where the bug is, but it lets you know early on that there is one.

>well, thanks for the helpful advice :) Care to share what, in your opinion, needs to be tested?

I am not the OP, but I think unit tests are more useful for code that has a lot of edge cases (such as string parsing) and for code which causes the most bugs (as seen in black box testing or integration tests.)

Also, some code is just easier to develop if you create a harness that runs it directly instead of having to work your way to the point in the program where it would execute it. If you do this, you might as well turn it into a test.

Principles I've found:

* A test should accept any correct behaviour from the tested code. Anything which is not in the requirements, should not be enforced by the test.

* A test should not use the same logic as the code to find the "right" answer.

* A function whose semantics are likely to be changed in the next refactor should not be invoked from a test.

* Whenever a test fails, make a note of whether you got it to pass by changing the test or by changing the code. If it's the test more than 2/3 of the time, it's a bad test.

* If you can't write a good test, don't write a test at all. See if you can write a good test for the code that calls this code instead.
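A toy sketch of the first two principles, using a hypothetical `dedupe` function (all names here are invented for illustration): the test accepts any correct behaviour by not enforcing an output order the requirements never specified, and it writes down the expected answer by hand rather than reusing the code's own logic.

```python
import unittest

def dedupe(xs):
    """Hypothetical function under test: returns the unique items of xs.
    The requirements say nothing about output order."""
    return list(dict.fromkeys(xs))

class DedupeTest(unittest.TestCase):
    def test_accepts_any_correct_behaviour(self):
        result = dedupe([3, 1, 3, 2, 1])
        # Compare as a set: asserting a particular order would enforce
        # something the requirements don't, so any correct output passes.
        self.assertEqual(set(result), {1, 2, 3})
        self.assertEqual(len(result), 3)  # no duplicates remain

    def test_does_not_reuse_the_code_logic(self):
        # The expected answer is written down by hand, not computed with
        # the same algorithm the implementation uses.
        self.assertEqual(dedupe([5, 5, 5]), [5])
```

Run with `python -m unittest` as usual.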

In my admittedly limited experience, unit tests are way overhyped, especially when things like mocking are brought into the mix. It's easy to end up with a test that is not about correctness, but "did line 1 get called, did line 2 get called, etc.". Then you change the implementation, and 20% of your test cases break. That's not to say that they are valueless, but that I think unit tests should be used pretty sparingly.

Where I've found a ton of value has been in writing almost artificial-intelligence-driven integration tests. Write a bot that uses your service in some way, as fast as possible. Run fifty of these bots simultaneously, and see what happens. Then have some way to validate state at various points (either by tallying the bots' actions, or via sanity checks). Bugs will come falling out of the sky. Then, in the future, when you get a bug, the challenge becomes updating the integration test bots' behavior so that they can (preferably quickly) reproduce the bug.

I mean, I think that this is dependent on the domain of your software, but I think it's a good strategy for many areas.
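A minimal single-process sketch of the bot idea, with a toy in-memory "service" (all names invented for illustration; a real setup would run many bots concurrently against the network API):

```python
import random

class AccountService:
    """Toy stand-in for the service under test."""
    def __init__(self):
        self.balance = 0
    def deposit(self, n):
        self.balance += n
    def withdraw(self, n):
        if n > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= n

def run_bot(service, steps, seed):
    """One bot issuing random operations while tallying what it did."""
    rng = random.Random(seed)
    tally = 0
    for _ in range(steps):
        amount = rng.randint(1, 100)
        if rng.random() < 0.5:
            service.deposit(amount)
            tally += amount
        else:
            try:
                service.withdraw(amount)
                tally -= amount
            except ValueError:
                pass  # a rejected withdrawal must not change state
    return tally

service = AccountService()
total = sum(run_bot(service, steps=1000, seed=s) for s in range(5))
# State validation: the service's final state must match the tally
# of everything the bots successfully did.
assert service.balance == total
assert service.balance >= 0
```

Running the bots in threads instead would start flushing out concurrency bugs, which is where this technique really shines.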

It's easy to end up with a test that is not about correctness, but "did line 1 get called, did line 2 get called, etc.".

Yes, it's easy to write bad tests. But that does not reduce the value of good tests!

If we're only using unit tests to see if line N gets called, I think we're doing it wrong. Instead, we want to use unit tests to tell us if the answer we get is correct -- while exercising line N.

e.g.:

    def test_foo(self):
        result = get_foo(bar=12)
        self.assertEquals(result.category, 'FOO')
        self.assertEquals(result.name, 'my-foo-12')
        self.assertEquals(result.unit_price, 97.12)

This lets you verify that this particular branch (when bar=12) is executed, and that your results are as you expect. If you change some of the underlying calculations, things can break (as you get different answers), but then you at least have a test that lets you ensure that changing the answers is what you want to do. Sometimes, you want to change the way you calculate something and get the same answer, after all.

So the question is, what if bar = 12 means "retrieve foo with ID 12 from the FooService", and we want to make sure that get_foo does something sensible when the server returns an error code.

So what do you do? Typically one writes a mocked-up service that returns the expected results, passes it in somehow, and writes tests like the one above. So if you have a lot of services, you end up doing what I said: testing lines of code. Here you're testing that FooService gets called. If it also called BarService, you'd be testing that BarService gets called. And so on.

But then later, we decide that FooService is no good; we want, nay demand, a FooService2, with a new API. However, get_foo is to behave the same. So what now? Only one choice: update all of the tests to have a mocked-up FooService2.

As the code base grows, I find this becomes annoying to maintain (although to some extent a necessary evil, of course).

The alternative I'm suggesting is that instead of spending a ton of energy unit testing get_foo, hammer on it and write good state validators. Your unit test shows that get_foo for bar = 12 returns a unit_price of 97.12. Great, but I don't think that's as interesting as building a fake database (so we know what the expected returns are), making 50,000 requests per second with random data for 12 hours, and then validating, through runtime assertions and as a post-processing step, that the service actually did what it was supposed to.
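To make the coupling concrete, here is a hedged sketch of the kind of mocked test being described (the `fetch` method and `get_foo`'s dependency-injected signature are invented here; the thread doesn't specify the real API):

```python
from unittest import mock

def get_foo(bar, service):
    """Hypothetical implementation matching the thread's example:
    fetches foo number `bar` from a FooService passed in as a dependency."""
    data = service.fetch(bar)  # <- the seam being mocked
    return {"name": data["name"], "unit_price": data["price"]}

def test_get_foo_with_mocked_service():
    service = mock.Mock()
    service.fetch.return_value = {"name": "my-foo-12", "price": 97.12}
    result = get_foo(12, service)
    assert result["unit_price"] == 97.12
    # This assertion is the coupling the comment above warns about:
    # it pins the test to the fact that FooService.fetch was called.
    # Swap in a FooService2 with a different method name and every
    # test like this one has to change, even if behaviour is identical.
    service.fetch.assert_called_once_with(12)

test_get_foo_with_mocked_service()
```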

If writing tests (or production code) is boring, then you probably have design problems. Boredom is a sign of duplication--not cut-and-paste duplication, but the more insidious duplication caused by poor abstractions.

In the example you gave (of a mocked up service), what would happen if you didn't mock the service? That would mean you couldn't call the service in your tests, right? (Except the tests that tested the service integration explicitly.) How could you change your design to make your code testable under that constraint?

The answer depends on your situation, but one way I've seen this play out in the past is to make more value objects and create a domain layer that operates on value objects rather than the service. The service is responsible for creating and persisting the value objects, but the domain layer handles the real work of the application.

As a result, testing the important part of the application can be done completely independently of the service. No mocks required. Now I can upgrade to FooService2 without changing any tests except for FooServiceConnection's. My value objects and domain layer remain unchanged.

If you're bored while programming, look for the duplication hiding in your design. In the case you're describing, it's hiding in the oft-repeated dependencies on FooService.
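One way the value-object split might look, as a hedged sketch (`Foo` and `apply_discount` are invented names, not from the thread):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Foo:
    """Value object: the service creates and persists these, but the
    domain layer never touches the service itself."""
    name: str
    unit_price: float

def apply_discount(foo, percent):
    """Pure domain logic, testable with no mocks: just build a value."""
    return Foo(foo.name, round(foo.unit_price * (1 - percent / 100), 2))

# No FooService (or FooService2) anywhere in sight:
discounted = apply_discount(Foo("my-foo-12", 97.12), 10)
assert discounted.unit_price == 87.41
```

Swapping FooService for FooService2 then only affects the thin connection layer and its tests.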

Interesting? Testing isn't about programmer enjoyment, it is about correctness.

A 12 hour stress test is going to catch different things than a unit test. A simple suite of regression tests can be automatically run by a continuous integration tool to flag if the build is broken and alert the dev/team so things can be fixed quickly.

In this example it seems you are thinking of writing tests as a separate step from writing the code, which is part of why it seems like a chore. Make small changes, update the tests until everything passes & new code is exercised, commit, repeat. (Or if you are better disciplined than I, update your tests first and then write code until tests pass, repeat) Monster refactors, while sometimes necessary, are best avoided when possible.

What I meant by interesting is that they tell me more about the health of the system, not that they're more interesting to write.

Suppose there's a race condition in some library that you're using. No unit test in the world is going to catch that. Now, unit tests certainly have their place, but my point is that from what I have seen, unit tests catch the boring bugs, while integration tests catch the interesting ones (by interesting here I mean obscure or subtle: deadlocks, invalid state, etc.) while also covering the boring bugs (i.e. add(5, 3) returns 7 instead of 8). So hour for hour, especially with limited resources (i.e. a small startup), integration testing has the potential to give you a lot more value.

That makes much more sense!

However, I would still add that a simple suite of regression tests (in my CRUD app these are almost entirely integration tests) often speeds up development by more than the time it takes to write the tests in the first place. So to say a startup doesn't have time for them seems shortsighted.

I've had great success with this technique. In my book automated and randomized integration stress tests cannot be beaten when it comes to hardening software as efficiently as possible. Randomized stress tests come across pathological conditions quickly, and it only takes a fraction of the imagination that it would take to write a unit test to successfully catch each one. Great for data structures and distributed systems, and probably applicable wherever non-trivial unit tests would also be useful...

I find it interesting that the premise for this entire diatribe is that once some code was written, a pivot was made, and a number of tests then had to be thrown away (or reworked, or refactored?).

Recently I had a code base where we decided that due to a new delightful feature our customers were going to be quite pleased with we would need to switch out our old queuing system for a new one. In doing so well more than half of our huge test suite turned red. This told us two important things: 1) that the queuing system touched a lot of areas of code we needed to think about, and 2) where in the code the queuing system had touch points.

Ultimately we were able to put in the new queuing system, fix the areas that were broken by the change, and have the confidence at the end of the process that we had not broken any of the areas of the code that were previously under test. (This does not mean that our code was bug free of course, only that the areas under test were still working in the prescribed way, but that is a discussion for a different article.)

I believe that this would have taken a team of people weeks to do previously. I was confident that the change was ready after only 3 days with 2 developers. I would not trade my tests. There is a cost associated with everything, but I believe tests are the least costly way to get highly confident software built.

Get halfway through a project, realize you have to make a big change, then make it. Without tests, I'll guarantee you'll watch your stability plummet. With tests, you might just go home at 5 pm. Alternatively, don't make the change, and deal with a problem in your design. I've seen it more than a few times, enough to have become convinced that there's a lot of value in striving to cover as much of your code as you can.

If you have a statically-typed language - and it doesn't even have to be a good one, with C++ being fine, in my experience - then making such changes is usually a case of making the change, or perhaps removing the bit of code you intend to change, compiling, and seeing what happens.

What happens is usually that you have 14,000,000 compile errors. Well, congratulations! That's the difficult part over. Now it's time to relax! Start at error 1, and fix it. Repeat. Every now and again, build. Once it builds again, it will usually work; if not, it's always something quite simple. If you have enough asserts in your code, then you can be doubly confident that there's nothing wrong with the new code.

I've had good success with this approach, for all sorts of changes, large and small. I've given up being surprised when the result just builds and runs.

I have no real idea how you would do this reliably in a large program written in something like Python. Sure, you'd fix up your tests easily enough... but then what? Don't you have a program to maintain as well? :)

Jonathan Blow wrote about this years ago: http://lerp.org/notes_4/index.html

It's rarely the case that you can fix 100 errors and have everything working just 100%. There's often one corner case that works just a little bit differently.

Depends on the power of the type system. In languages with stronger / more expressive type systems than C++ (e.g. Haskell or Scala) the compiler catches really a huge amount of possible problems. The downside is it sometimes catches non-problems, too.

> There's often one corner case that works just a little bit differently.

You might just as well miss that one corner case in your test suite. This is the problem with tests - you can never be sure.

If you pick just the right thing to break, the corner cases break as well. (Sometimes you do need to temporarily break a few things in order to figure out where extra changes need making.)

I'm not saying it can't go wrong when you're finished - it just usually doesn't. And it just doesn't take much effort to fix when it does.

If you have 140k compile errors, you might need to factor out repeated idioms into functions.

Erm - 140k? Actually, 14,000K!

Anyway, while it was a slight :) exaggeration for effect, even in well-factored code small changes can have wide-ranging repercussions.

140k is probably huge, but when talking about refactoring, error counts can go quite high. Refactoring is usually not just replacing a few private methods.

Yes but there are also cases where relatively safe changes break a lot of tests, and it tells you nothing except that you now have a lot of tests to fix-up. I've held off of refactorings that I knew would scrap a bunch of tests. There is no easy answer to this stuff.

The argument would be that tests that are easily broken by relatively safe changes are bad tests.

Or that the tests are correctly telling you that your proposed change will break your clients.

Care to comment on your experience with larger code bases? The content of your post seems short-sighted, and there's an exponential function of complexity increase as LOC and developer headcount both go up.

You're right people pay for features, but lagging a little at the beginning to establish good TDD culture pays off in spades later on. Shipping product is something you have to do continuously, and you arguably create more value as time goes on, so ensuring you can continue to ship product in a timely manner is a great thing for organizations.

I'm not the original poster, but I find that integration tests have a far larger payoff than unit tests, in general. A good release process that tests differences in behavior between a test system and the current version of the service in production is also valuable.

Being able to test that the whole system works as intended gives a better return on investment, in my experience, than testing small bits in isolation. The errors are, often as not, in the glue between the small bits.

The main advantage I've found from testing little bits in isolation is that it tells you a lot more about where exactly the bug is. However, there's no reason why you need to write them in advance to get this benefit.

The general workflow I've grown fond of is to write mostly higher-level tests that test the system as a whole, and then only write fiddly unit tests when there actually is a bug to fix. Those unit tests then stick around, but I don't feel bad about blowing them away without replacing them if the thing being tested changes significantly enough to make them useless.

Agree, integration test is cost-effective. And I would like to rephrase "integration test" as "test from outside as much as possible".

Unit tests work great for us for testing well-isolated pieces of complex logic with a simple API (e.g. algorithmic stuff). But we don't unit-test simple things like getters or setters, or glue code that just delegates real work to other components. For that, we do functional / integration testing and it works well.

Unfortunately, integration tests tend to take a long time to run, whereas unit tests tend to run almost instantaneously.

I've worked on quite a few different large code bases. I agree with the author 100%. Tests are a tool. They are a particularly useful one, but they are just a tool. Unit tests are significantly less useful than TDD folks would have you believe when compared to things like integration tests. And integration tests are vastly more costly than what TDD folks would have you believe. Personally, I much prefer gold file tests with manual inspection when it comes to test automation. I've seen large compiler projects that get by pretty far with just that. Never mind things that make no sense to test: Like "Is this game fun?"

The key, to me, is determining what needs to be tested. That doesn't change with larger code bases. Note that I didn't say that tests were pointless or that you shouldn't write tests. That's incredibly shortsighted.

Originally I was going to title this "Tests are overrated" but that both seemed like linkbait and distorts my actual opinion.

I've been on projects where they tested to make sure that certain values were unique in the database and I couldn't help but think they: didn't understand the framework; didn't understand what tests are meant to do; didn't understand database key constraints; or all of the above.

Tests have their place. But they are a means, not an end. And I see a lot of people confusing them for the end.

But, again, I don't dislike tests. I just dislike what I perceive to be a current overemphasis.

Behavior/test-driven development needs all the hype it can get. There are so many developers, even entire software development subcommunities (I'm looking at you, video game developers), who haven't written a single test in their lives and don't understand the value of testing.

Yes, it is time-consuming and invisible to your customers, but I imagine so is setting up a frame for a house instead of just stacking bricks. The structure, flexibility and peace of mind you get from a comprehensive test suite pays off when you have a 50-brick tall structure to put a roof on.

In my experience test-driven development is not time consuming, even for small developments. I don't think we even need to say things like "yes it takes time now but it will pay off later". I have found that it saves time right from the beginning. It seems other people have a similar experience : http://blog.8thlight.com/uncle-bob/2013/03/11/TheFrenziedPan...

I'm not necessarily disagreeing with the author, but I think if you are going to 'do agile', especially in a breaky language like C++, you need to do tests. Lots.

The current shop I'm at is maintaining a huge code base, parts of which go back 20 years. Because test coverage is so low, there is a real reluctance to refactor.

The first thing I did when I started here was to clean up the code, renaming misnamed variables to get it in line with the coding standard, adding auto pointers here and there to head off memory leaks. By gosh, I nearly got fired.

If you are going to fearlessly edit your codebase, you need to know that regressions are going to be caught. You need automated testing.

"We wrote the first scoring algorithm at Goalee based on the red-green light. Within a couple of weeks we made our first algorithmic change, and made several quick fix releases to update the scoring methods in the following weeks. By the end of the month, a whole section of our testing suite was almost completely worthless. In a world where cash is king, I wouldn’t mind having those dollars back."

What I don't understand about comments like this is that a whole section of your code, both runtime / deliverable code and test code had become worthless. But, you only seem to view the discarded test code as wasted effort. Either the tests have value or they don't. And, if you write tests, and then discard the code they test, you'll likely also discard the tests. But, that doesn't change whether or not the tests had value, nor whether the new tests that you'll write for the new code have value.

> What I don't understand about comments like this is that a whole section of your code, both runtime / deliverable code and test code had become worthless

Not so. The code demonstrated that the first algorithm wasn't good enough and provided the experience needed to write the second one. The tests (hopefully) made the first algorithm's code maintainable, but it turns out there was no need to maintain it.

I think the hype around tests is driven by agencies that charge by the hour. One thing I have noticed, especially on an existing system, is that you can sit a very inexperienced developer down and instruct them to 'write tests for untested functionality', and they will do so while producing negligible value.

The same goes for greenfield development. I've seen steaming piles of shit with huge test suites. Absolutely zero insight into the problem. No craftsmanship at all, nothing interesting about the application. But a set of tests.

It's like the suite is proof enough that there was a job well done. I fear that a lot of development is devolving into nothing more than superstition and hype, backed up by agencies that like to bill a lot and amateurs who need a justification for their timelines and ineptitude.

Amen! I run https://circleci.com, where we make all of our money by actually running tests for people, and I still believe this. The goal of your software is to achieve goals, typically business goals. Often the software itself is already one or two steps removed from the value provided to customers. Tests are an extra step removed at least.

I wrote a similar piece here: http://blog.circleci.com/testing-is-bullshit/

If you don't care whether your CI tool works, I won't care either, because I won't use it.

Perhaps I missed where I said I didn't care - where did you see that?

Tests are not overhyped. They are under-understood. Unit tests are not regression tests are not integration tests. I've encountered tons of teams that don't seem to understand this.

Unit tests are more than any other factor a design tool. Like any other design tool (UML, specifications, etc.), when the design needs to change, you throw them out. If it takes longer to design a system with unit tests than without them, one of two things is true: 1) you should not write unit tests, or 2) you should learn how to write unit tests.

In total agreement with the OP.

I think it just goes to the way human beings handle original ideas, first they fight them, then they embrace them, then they take them to ridiculous extremes as they try to substitute rules for common sense in applying them.

You can see it in politics, religion and almost any really popular area of human endeavor.

Testing falls in the same category, I have had interviewers look me in the eye and in all seriousness, declare that developers who don't write tests for their code should be fired, or that their test suites cannot drop below x% of code coverage. Dogma is a horrible thing to afflict software teams, whether it is pair programming, or mandatory code reviews, if there are no exceptions to the rule or situations where you don't have to apply it, its probably a bullshit rule IMO.

Me, I like to ship shit, and I like to look at tests as a way to help me ship shit faster, because the less time I spend fixing regressions the more time I can spend actually getting more code/features out that door.

So my only metric for writing tests is this ... "is this going to help (me or a team member) not break something months from now, when I change code somewhere else".

I honestly don't care about anything else.

Tests certainly interfere with your ability to ship shit.

I use a strongly typed functional language, so I don't need to write tests. If my code compiles, it's correct.

This is not accurate unless you use a nice dependently typed language like Agda or Coq. It is true that you need far fewer tests, but hardly none at all.

This is particularly evidenced by the fact that Haskell--certainly a "strongly typed functional language"--also has some of the best testing facilities of any language I've seen. QuickCheck simply can't be beat, and you can cover other parts of the code with more standard tools like HUnit.

Now, there is some code--only very polymorphic code--where the type system is strong enough to give a proof of correctness. For that sort of code, which you're only likely to encounter if writing a very generic library, you can get away without testing. But that is not even the majority of all Haskell code! And even there you have to be careful of infinite loops (e.g. bottom).
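For readers unfamiliar with QuickCheck: it checks stated properties against many randomly generated inputs. A hand-rolled toy version of the idea in Python (real QuickCheck, like Python's hypothesis library, also shrinks failing cases down to minimal examples, which is omitted here):

```python
import random

def check_property(prop, gen, trials=200, seed=0):
    """Generate random inputs and assert the property holds for each."""
    rng = random.Random(seed)
    for _ in range(trials):
        case = gen(rng)
        assert prop(case), "property failed for %r" % (case,)

def sort_properties(xs):
    # Properties of sorting: idempotent, length-preserving, ordered.
    s = sorted(xs)
    return (sorted(s) == s
            and len(s) == len(xs)
            and all(a <= b for a, b in zip(s, s[1:])))

# Random lists of random length exercise far more cases than a few
# hand-picked examples would.
check_property(sort_properties,
               lambda rng: [rng.randint(-50, 50)
                            for _ in range(rng.randint(0, 20))])
```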

Comments like this make functional programmers sound much more arrogant and clueless than they really are.

If you try hard enough, you can make code that doesn't work in Haskell too!

Poe's law applies here.

Thanks I didn't know about Poe's law. My first thought was that it was a joke and a rather good one, but then seeing replies taking it seriously I started to doubt :)


I have a long experience in enterprise software and I agree with the premise.

There are two kinds of unit tests: workflow and functional.

1 - Workflow unit tests are a waste of time because no single test stays valid when there is a change. In other words, whenever we added/removed steps in the workflow, 99% of the time we had to change the test to fit the new workflow, which breaks the "write once, test all the time" concept. In my experience, having proactive developers who test the areas around the workflow they changed is much faster and more reliable.

2 - Functional unit tests are great. They test one function that takes certain parameters and is expected to produce a certain output, e.g. a function to calculate dollar amounts or do any kind of mathematical operation. However, these functions tend to stay unchanged during the lifetime of a project. Therefore, the tests are rarely run.

From my experience workflow changes/bugs represent 80% of the problems we face in enterprise software. Functional changes/bug are rare and can be detected quickly.

This is why I agree with the author premise that unit testing is overhyped.

I tended to do API-centric functional testing (does it spew JSON with the structure and data I expect?), and found it to be a time-saver in that it was faster to write a test and re-run it to check output than to manually use the web app or make the calls myself.

However if the test exceeded this cost/benefit metric where it wasn't really helping me get the feature written, out the window it went.

Helped when I went to refactor/fix fairly major chunks of the backend as all those tests from back when I did initial development were still there. It wasn't really "test first" because I didn't know what to test for until the basics of the API endpoint were in place.

This was Python if it matters (default unittest2). I do mostly Clojure when I have the choice lately.

It's not even really a matter of "does it need to be tested?", although you should be asking that question and building up coverage for the critical bits.

For me it was a question of, "is this going to save me time/sanity?"

I advocated tests to the other engineers at my startup only when they were experiencing regressions/repeat bugs. I left them alone about the lack of test discipline otherwise.

My Clojure code tends to "just work" (after I'm done experiencing insanity and disconnection from reality in my REPL) to the degree that I mostly write tests when I'm making a library or touching somebody else's Git repo.

This is all fitting though. I use Clojure instead of Haskell precisely because I'm impatient, lazy, etc. Would kill for a "quickcheck" for Clojure though.

This whole debate has a whiff of excluded middle (we have 3 UND ONLY 3 TESTS <---> TDD ALL THE TIME!), not to speak of people not mentioning how tests can simply...save time sometimes.

Funnily enough, what the article describes is almost the perfect case for testing: "We wrote the first scoring algorithm...based on the red-green light." This sounds like it describes some heuristic weighting, which couldn't have been solved by types, but reasonable tests would have shown if the new algorithm weighted some special cases higher or lower than it should.

The problem with a heuristic weight, though, is that it's a heuristic, judged against other heuristics by taste, not proof.

The obvious testing approach, ensuring that the score for each test case retains the same order as you tweak the algorithm, is overtesting. You don't care about this total order; you more likely care about the ordering of classes of things, rather than ordering within those classes, or simply that 'likely' cases follow an order. Hence, you hit far too many test failures.

I'd agree that it's possible to overtest in general, but it's so easy to overtest heuristics that it needs to be called out as a special case, and it sounds like the problem here.
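A sketch of what testing only between-class ordering might look like (the scoring function and its weights are invented here; the point is what the assertion does and doesn't pin down):

```python
def score(item):
    # Toy heuristic: weight a couple of features (weights are arbitrary
    # and expected to be tweaked over time).
    return 2.0 * item["relevance"] + 0.5 * item["freshness"]

high_class = [{"relevance": 0.9, "freshness": 0.2},
              {"relevance": 0.8, "freshness": 0.9}]
low_class = [{"relevance": 0.1, "freshness": 0.3},
             {"relevance": 0.2, "freshness": 0.1}]

# Every "clearly good" item should outrank every "clearly bad" one...
assert min(score(i) for i in high_class) > max(score(i) for i in low_class)
# ...but nothing is asserted about the order *within* each class, so
# routine tweaks to the weights don't break the test.
```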

No amount of tests can help you if don't know how to solve the problem. And the better you know how to solve the problem the less tests you really need. ;)

Old, but quite relevant: http://ravimohan.blogspot.com/2007/04/learning-from-sudoku-s...

Some sane amount of testing is good. However, I'm not convinced that writing tests first is a good idea. I saw a few programmers practicing this, and when writing code they often concentrated too much on just passing the damn test cases instead of really solving the problem in a generic way. Their code looked like a set of if-then cases patched one over another. Therefore, if they missed an important case in their testing suite, they had a 99% chance of a bug. Once I ended up writing a separate, more generic implementation to validate the test suite, and it turned out there were a few missed/wrong cases in the suite.

Seeing how many people agree with the post, I am left wondering who the commenters on HN are. With a large code base and actual customers, comprehensive unit testing is by far the most cost-effective way to create and maintain software. This is not being dogmatic; it's the shared experience of the majority of established software engineering firms. In the case of my company, we experienced so many growing pains that our productivity almost came down to zero before we switched to TDD. Many developers in our group were skeptics (including me), but today you won't find one of us arguing for less testing.

The kind of large pivot that the author refers to is only possible when you don't have established customers and you have a minimal product. You may as well call it prototyping. And prototyping with or without tests is indeed more a matter of taste than effectiveness.

This post is over-hyped: "the cool guy who ships products and applies to cool startups, but they only hire super talented and passionate engineers."


The text above is just like the blog post: unfounded and purely based on personal opinions.

Having a strong test culture in a company of any size is important for a number of reasons, all of them backed by books and articles, but also by AMAZING products.

An engineer who has a talent for writing readable and well-structured tests also tends to have a special skill for creating robust systems and reverse-engineering existing ones. That makes him exactly the kind of person to be hired by a startup with NO (or very little) test culture, one whose code is buggy, error-prone, coupled, and hard to maintain, so that he can apply all this knowledge to clean up crap written by people who don't like tests.

An ex-colleague of mine used to put it well in saying: "It's best to be pragmatic rather than dogmatic while testing and pair programming."

I think that can be easily generalized to: "It's best to be pragmatic rather than dogmatic."

It is pragmatic to be dogmatic in a context where there is a temptation to cut corners which will result in non-obvious, severe negative consequences.

Ahh, but is it not dogmatic to assert a statement is always true for one class of situation? Accepting imperfection, are we not pragmatists? 道可道非常道 (The Tao that can be told is not the eternal Tao) 名可名非常名 (The name that can be named is not the eternal name.)

So because his particular tool (or use of a tool) adds inconvenience or overhead means the entire concept is bunk? Don't confuse the thing with the concept.

Tests are double-checking your work ("Yup. That condition I said would be true still is!"). The tighter and more lightweight we can get that feedback loop, the better our output and the more confident in our work we'll be.

I find unit tests written after the fact to be pointless. TDD makes a lot of sense because it forces you to think about how you are going to write your code. I like that. If you're writing tests after the fact, integration tests are much more satisfying to finish, and I can't imagine going without them (unless they are painful to implement).

>>> We wrote the first scoring algorithm at Goalee based on the red-green light. Within a couple of weeks we made our first algorithmic change,

I think he actually means a specification change. Tests would help you refactor in the case where you want to change the algorithm against the same specification.

But yeah, it's a flawed metric, so it shouldn't be put on a pedestal.

[I]f any portion of your site changes (pivot, any one?) you’re now in a position of having to rewrite large portions of your tests. And the larger the pivot, the more useless your old testing becomes.

I'm under the impression that the tests are written first. If you are writing new code, then first there's a (failing) test, then there is the new code. If you are altering existing code, then pivot or not if something breaks you need to see if the reason is broken code or an inappropriate test, and you fix it before moving on.

In any event this should be a gradual process. How common is it that large slabs of code get changed while ignoring failing tests, or large slabs of new code get added while skipping the tests?

Is it that TDD is, in actual practice, unsustainable, or is the problem with not adhering to TDD? Or something else altogether?

I don't know what to do with tests, to be honest. I am a very inexperienced developer, and the general sentiment is "knowing what to test is a learned skill"


I could use some opinions actually. The current workplace arguments about what we should or shouldn't test go like this:

   "We write tests to establish specifications, then we write code to meet
    those specifications. We should have tests covering every aspect of the 
    system as a way to assert that things are meeting the designed specs."

   "but our model objects have a lot of tedious detail. Why should we write 
    tests about what the name field can and can't accept? It's a max length of 
    32, that's it. Asserting that the constraint fails is beyond tedious and a 
    waste of our time. Most of the code we'd be testing is auto-generated. Are 
    we running tests on our framework or our code?"

   "Some of this does seem like a waste of time, but what about when 
    someone changes something as part of the development cycle? eg. 
    Someone needs to remove validation from the name field to test other 
    parts of the systems behavior with an overly large name. If we had a 
    complete testing system in place it would alert us that the model isn't 
    meeting the spec. We've already had instances where temporary code to 
    bypass authorization was accidentally pushed to the develop branch."
On one end of the spectrum there is this base level of tests that are actually REALLY useful when developing features. Doing simple testing has helped me catch off-by-one errors and other kinds of bugs, and I can stand by my code confidently. These kinds of ad-hoc, in-development tests don't cover everything, but they're useful. They're like a substitute for step-through debugging to verify code.

On the other, I don't know how big an issue bug regressions will be; we've heard of other developers suffering for lack of a solid testing base to detect problems. Without a full battery of tests asserting everything we know about the spec, there's no way we will catch many regressions.

HN what do you think?

> I don't know what to do with tests, to be honest. ... HN what do you think?

Well, since you asked!

The most valuable tests in any system are functional tests. Unfortunately, most tests are unit tests that provide little value.

Here's something I encountered recently in the wild. Imagine a situation like this: a calculation process with three distinct black boxes of code.

Block A - Pull data from 3rd party source

Block B - Distribute data to individual nodes (multi-threaded environment)

Block C - Individual nodes processing data

Each of these blocks is around fifty thousand lines of code, and each has hundreds if not thousands of unit tests. Every line of code is covered by a unit test. Yet there is a very real and dangerous bug in the system.

Why? Because the object passed from B to C has a transient variable, and when it is serialized and deserialized that value is lost and reverts to the default. Most places leave this value as the default, so only one in a thousand customers hits the problem, but when they do, it's a huge problem.
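The original bug was presumably in Java (a `transient` field dropped during serialization), but the shape of it can be sketched in Python, with `__getstate__`/`__setstate__` standing in for `transient`; the class and field names here are made up. Unit tests that exercise the object in isolation pass, while only a check across the serialization boundary exposes the reverted value.

```python
import pickle

class WorkItem:
    """Hypothetical object handed from block B to block C.
    'mode' plays the role of the Java transient field: it is
    deliberately excluded from serialization."""
    def __init__(self, payload, mode="standard"):
        self.payload = payload
        self.mode = mode

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["mode"]          # the "transient" field is not serialized
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.mode = "standard"     # reverts to the default on deserialization

# A unit test inside one block passes: the object is fine in isolation.
item = WorkItem("data", mode="premium")
assert item.mode == "premium"

# Only a functional test across the B -> C hand-off catches the bug:
received = pickle.loads(pickle.dumps(item))
print(received.mode)  # "standard", not "premium" -- the value was lost
```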

Functional testing is the process of saying "I put meat/spices in this side and sausage comes out the other." It doesn't try to determine the RPM of the grinder, the number of bolts, the rotation of the gears, or the type of metal used to build the sausage-maker. It is simply the process of determining that meat becomes sausage. Functional tests have high value and are absolutely worth writing.

In most CRUD applications, unit tests generally wind up testing that the universe still exists as expected (true is still equal to true, zero is not one, and a null value is not the same as a string; oh happy days, the system works!).

That's not always the case. If you are working on something more advanced, say something that should measure the chance a hurricane might knock down a given 3D model, you'll wind up with a huge stack of unit tests, and these will be very valuable. More often than not you'll know they are valuable because they'll be spent proving the science behind the application, not simply that the application 'does something'.

That's good advice, thanks.

That bug with the transient variable: you would only really catch that if you had e2e or integration tests covering a majority of the code in your application, right? Even then, only if you were to persist the data, read the data back out, and then run more tests on it.

I accept testing isn't a silver bullet, but ouch.

As a general strategy, it sounds like it's better to run your higher-level tests as a gauntlet (feeding the result of test one into test two) than with tightly controlled inputs (using explicit data for each test).

FWIW, here are a few things about automated testing that I’ve found to be true very generally.

Other things being equal, testing is good and automated testing is more valuable than manual testing, for the same reasons that having any systematic, repeatable process is generally more efficient and reliable than doing the same things manually and hoping to avoid human error. Many of the following notes are just reasons why other things aren’t always equal and sometimes you might prefer other priorities.

Automated test suites get diminishing returns beyond a certain point. Even having a few tests to make sure things aren’t completely broken when you make a change can be a great time-saver. On the other hand, writing lots of detailed tests for lots of edge cases takes a lot of time and only helps if you break those specific cases. For most projects, a middle ground will be better than an extreme. But remember that as a practical reality, an absurdly high proportion of bugs are introduced in those quick one-line changes in simple functions that couldn’t possibly go wrong, so sometimes testing even simple things in key parts of the code can be worthwhile.

Automated test suites have an opportunity cost. Time you spend writing and maintaining tests is time you’re not spending performing code reviews, or formally proving that a key algorithm does what you think it does, or conducting usability tests, or taking another pass over a requirements spec to make sure it’s self-consistent and really does describe what you want to build. These things can all help to develop better software, too.

Automated test suites do not have to be automated unit test suites. For example, higher-level functional or integration tests can be very useful, and for some projects may offer better value than trying to maintain a comprehensive unit test suite.

Unit tests tend to work best with pure code that has no side effects. As soon as you have any kind of external dependency, and you start talking about faking a database or file access or network stack or whatever other form of I/O, unit testing tends to become messy and much more expensive, and often you’re not even testing the same set-up that will run for real any more.

A corollary to the above is that separating code that deals with external interactions from code that deals with any serious internal logic is often a good idea. Different testing strategies might be best for the different parts of the system. (IME, this kind of separation of concerns is also helpful for many other reasons when you’re designing software, but those are off-topic here.)
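A minimal sketch of that separation, with made-up names: the interesting logic lives in a pure function that is trivially unit-testable with no mocks at all, while the external interaction is confined to a thin shell that a few integration tests can cover.

```python
def summarize_orders(lines):
    """Pure logic: no I/O, no side effects -- easy to unit test.
    Each line is assumed (hypothetically) to look like 'sku,quantity'."""
    total = 0
    for line in lines:
        sku, qty = line.split(",")
        total += int(qty)  # int() tolerates the trailing newline
    return total

def summarize_orders_file(path):
    """Thin I/O shell: the only part that touches the filesystem.
    Test this layer sparingly, with integration tests, if at all."""
    with open(path) as f:
        return summarize_orders(f)

# The pure core is tested without faking a file, database, or network:
assert summarize_orders(["a,2", "b,3"]) == 5
```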

Modifying your software design just to support unit tests can have serious consequences and can harm other valuable testing/quality activities. For example, mechanics that you introduce just to support unit testing might make language-level tools for encapsulation and modular design less effective, or might split up related code into different places so code reviews are more difficult or time-consuming.

Automated testing is there to make sure your code is working properly. It is not a substitute for having robust specifications, writing useful documentation, thinking about the design of your system, or, most importantly of all, understanding the problem you’re trying to solve and how you’re trying to solve it.

No-one really only writes code that is necessary to pass their tests. Even if they religiously adhere to writing tests first, at some point they generalise the underlying code because that’s what makes it useful, and the test suite didn’t drive that generalisation or verify that it was correct. In TDD terms, the tests only drive the red/green part, not the refactor part.

For similar reasons, just because someone has a test suite, that does not mean they can safely refactor at will without thinking. This may be the most dangerous illusion in all of TDD advocacy, but unfortunately it seems to be a widespread belief.

A lot of “evidence” cited in support of various test strategies and wider development processes is fundamentally flawed. Read critically, and be sceptical of conclusions that over-generalise.

And finally, what works for someone else’s project might not be a good choice for yours, and something that didn’t fit for someone else might still be useful for you. If you’re experimenting, it can be very informative just to keep even basic records of roughly how much time you’re really spending on different activities and what is really happening with interesting things like speed of adding new features or how many bugs are getting reported in released code. You’re allowed to change your mind and try a different approach if whatever you’re doing right now isn’t paying off.

Even though I can sympathize with the tone of the OP, I think his argument cannot be generalized. Consider the following scenarios:

A one-person engineering team. This is some 20-something working on a start-up in his parents' garage or the 40-something code hobbyist out there that likes writing fun little apps every now and then. I would argue that there is virtually no need for testing unless you're either bad at writing code or inexperienced. I've written dozens of applications (in various languages) and once you get into the swing of things, you don't need TDD to actually "test." Debugging/edge cases/etc. just becomes a natural process of writing code.

A two-person engineering team. This is a secondary stage in every start-up -- sometimes, start-ups even are founded by two engineers. Here is where TDD starts being important. I may be an expert coder, but if my partner isn't that great (or if our code doesn't work well together), not testing can be a huge nightmare. But the impact is still relatively small. Bugs are easy to spot as both engineers are (or at least should be) incredibly familiar with the entire codebase.

Three-person to 10-person engineering team. Here is where things get really tricky and TDD becomes integral to meeting deadlines and actually, you know, putting out products. You've got Jim working on some financial stuff while Bob is optimizing some JPA code. At the same time, Susan is implementing a couple of new features. Having a test suite ensures that behavior will remain predictable given non-centralized iterations. Without TDD, large teams would be fixing bugs more than they would be writing code (most teams probably do that anyway -- not that it's a good thing).

10+ people; open source projects, etc. When you have more than 10 people working on a project, I think that TDD is simply a necessity. There are simply too many moving parts. Without a cohesive test suite, having this many people mess around with code will undoubtedly break stuff.

Ironically, I think that small teams put too much emphasis on TDD, whereas teams in larger companies (AT&T and IBM, for example) put too little.

There seems to be a sense that writing tests slows you down, and that is understandable. However, I find that tests help me move in small steps and provide a feedback loop that keeps me in a groove moving towards the finish line. And they help me move faster. I think we as developers need to continually learn how to write better tests that provide deeper value and allow us to move more quickly.

I find that when testing is tough, it is often because the underlying design is deficient, and tests shine a light on code and design smells. Discarding tests can be valuable if the tests have outlived their value. It might also be the case that excessive test maintenance is telling you something about your production code.

My general rule:

If you are planning on some sort of reuse (API, library, protocol, etc) a test spec is a very good idea if only to hash out practical use of the interface and identify caveats in the interface spec. Once released, the test spec is great for answering the question "did I accidentally break something that used to work?" thus vastly speeding up the bug-fix process for underlying components or widely used components.

If you are writing an end-user app, or a service that will be used non-programmatically (such as a GUI or web app), then a test spec is not required and probably a waste of time as things will likely change too quickly.

I'm sorry, but I've seen both sides of the coin here and think that in the right situations, having a suite of automated-tests to give you confidence in an application is invaluable. Especially when you consider the alternative, which is hours wasted debugging. Sure writing tests can be hard, but it will make you write better, more modular code.

If you really do need to just get something out the door that quick, then you should just be prepared to accept the consequences of no tests, or put some in later for maintenance sake.

Depending on the context, I fully agree (most stuff here is about web development, so in general shipping a broken web page is fine: it won't cause too much damage even when it happens, and it can be fixed quickly).

In safety-critical software, however, I hope tests have been written for every single change, and everything has been checked over and over again. Tests become a lot more valuable, and simply having to rewrite them may make you think of problems you have introduced in the code.

It's a good idea to record invariants about your code and your problem. Tests are one way, and there are some other ways. It's a good idea to have the checking of invariants be automatic somehow.

Any assumptions about the output of your code that you make but don't record in a checkable form are something that could potentially be silently broken by a change in the code. And then you're wasting your time debugging rather than solving the problem you're working on.
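For example, one lightweight way (besides tests) to record an invariant in checkable form is an assertion at the point where the assumption is made, so a change that silently breaks it fails loudly instead of costing a debugging session later. The function and numbers below are hypothetical.

```python
def allocate_discounts(prices, budget):
    """Toy allocation: each item gets 10% off, capped by an even
    share of the budget."""
    discounts = [min(p * 0.1, budget / len(prices)) for p in prices]
    # Invariants we would otherwise only *assume* about the output:
    # the total never exceeds the budget, and no discount is negative.
    assert sum(discounts) <= budget + 1e-9, "budget invariant broken"
    assert all(d >= 0 for d in discounts), "negative discount produced"
    return discounts

result = allocate_discounts([100, 50, 20], budget=12.0)
```

If someone later tweaks the formula in a way that overspends the budget, the assertion pinpoints the break at its source rather than three modules downstream.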

Sounds not very sustainable to me. Ruthless and speedy refactoring is only possible with tests. Better onboarding of new team members is only possible with tests. Updating external libraries is only possible with tests. How many hours did I spend in projects w/o tests figuring out what's going on... things that an easy test could have told me in the first place. In a world where cash (= time) is king, I wouldn't mind having those dollars back.

Another day, another link-bait, faux controversy post about testing on HN. TLDR: Writing bad tests makes for a bad time. Do we need a whole article to cover that?

As a side note, all this recent hand wringing over testing reminds me so much of the time when the new breed of version control software was coming out, it was the same kind of accusations about things being "overhyped" and "you can just email a tar file!"

My approach to tests is that where you control all dependencies, do tests for core functionality and any tricky parts. After all, you can always hot-fix any breakage since everything is under your control. However, where you have external dependencies (eg: an API that third parties use) you should have fairly robust tests to ensure you are not introducing any regressions.

Without due scientific process to back up TDD, we are left with such arguments having the same validity as TDD's. Without scientific proof that TDD is superior, we are left with mere opinions on both sides.

NAARP! http://paddy3118.blogspot.co.uk/2013/02/no-attempt-at-rigoro...

Like anything in life, you have to make a judgement. There is no silver bullet. Tests are very painful to maintain in all but the most well-understood problems. There is a ton of great software we all rely on that doesn't have much testing. Having a ton of people rely on and test beta versions of your code is much more valuable than any test suite. Dare to think.

If the author stopped at the title, he would be one hundred percent correct.

It's easy to rant about unit tests. Don't fall into that trap. Be aware that it's a hype, and correct for that fact - and once you do, use your best judgement. In the end, it's you, the engineer, who's shipping the product - not the countless ever-arguing denizens of the internet.

Man, I like this article so much. I worked at a company that aimed at 90% test coverage without taking into account how important the piece of code under test was.

I think the 80/20 rule applies here. Only 20% of your code does 80% of the work. Focus on writing tests for that 20% and your time is much better spent.

The importance of testing obviously depends on how severe the consequences of shipping broken code are. If not testing means that some broken UI element on some webpage slips by, that's obviously easier to live with than if not testing means you blow up some $150M piece of hardware.

It's true that some tests might get redundant when your functionality changes, but you don't know which part you will throw away at the point you're writing it. Better have that safety net for the part that will need to continue to work after you refactor.

[X] is overhyped != [X] is useless

His beef is more with unit tests than functional tests, and I can't disagree with that. Functional tests cover code that directly impacts your users, while unit tests are merely there to make the developer's life more comfortable.

I'll take a crack at my own over-generalization, albeit one I think is more useful.

Adding strict types (TypeScript!) to your code base gives you way more maintainability than testing. Do it first; then, if you still need tests, write them.

Can we please stop referring to 'testing' as if it's the same thing as TDD?

I think finding the right balance between what to test and what to skip is often overlooked.

Test early to minimize the cash investment in testing that you're worried about.

I agree and this is not the only thing that is overhyped right now...

Also overhyped... trains, The Lumineers, IPAs.

Tests should be fast and easy to write; that's why any library should be bundled with things that make them easy (null classes, mocked classes, etc.).
